SQL Subquery Complete Tutorial: Subquery Syntax, CTE Comparison & Practical Applications [2026 Update]

SQL Subquery Complete Tutorial: Subquery Syntax and Practical Applications
Subqueries are one of the most powerful features in SQL, allowing you to nest one query within another query to solve complex problems that a single query cannot handle. This tutorial will start from basic concepts, gradually introduce various subquery types, and demonstrate their application scenarios through real cases.
What is a Subquery?
A subquery is a SELECT query nested within another SQL statement. The outer query is called the Main Query or Outer Query, and the nested query is called the Subquery or Inner Query.
Basic Subquery Syntax
-- Basic structure
SELECT column_list
FROM table_name
WHERE column_name operator (
SELECT column_name
FROM table_name
WHERE condition
);
Where Subqueries Can Appear
- In WHERE clause: Most common usage
- In FROM clause: As a derived table
- In SELECT clause: As a scalar value
- In HAVING clause: For group filtering
Types of Subqueries
1. Scalar Subquery
A scalar subquery returns only a single value (one row, one column) and can be used anywhere a single value is needed.
Basic Example: Find products priced above average
SELECT product_name, price
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);
Using Scalar Subquery in SELECT Clause
SELECT
employee_name,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
Calculating Percentage Application
SELECT
category,
SUM(sales) AS category_sales,
ROUND(
100.0 * SUM(sales) / (SELECT SUM(sales) FROM orders),
2
) AS percentage
FROM orders
GROUP BY category;
2. Row Subquery
A row subquery returns a single row (one row, multiple columns), typically used for multi-column comparisons.
Comparing Multiple Columns
SELECT *
FROM products
WHERE (category, brand) = (
SELECT category, brand
FROM products
WHERE product_id = 'P001'
);
Find Colleagues with Same Department and Position as a Specific Employee
SELECT employee_name, department, position
FROM employees
WHERE (department, position) = (
SELECT department, position
FROM employees
WHERE employee_id = 'E100'
)
AND employee_id != 'E100';
3. Table Subquery
A table subquery returns multiple rows and columns, commonly used with IN, ANY, ALL operators, or as a derived table in FROM clause.
Using IN Operator
-- Find customers who have placed orders
SELECT customer_name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2025-01-01'
);
Using NOT IN Operator
-- Find customers who have never placed orders
SELECT customer_name, email
FROM customers
WHERE customer_id NOT IN (
SELECT DISTINCT customer_id
FROM orders
);
⚠️ Note: Be careful with NULL values when using NOT IN. If subquery results contain NULL, the entire NOT IN will return empty results.
-- Safe approach: Exclude NULL
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id
FROM orders
WHERE customer_id IS NOT NULL
);
4. Derived Table
Place a subquery in the FROM clause to use as a temporary table.
Basic Syntax
SELECT dt.column1, dt.column2
FROM (
SELECT column1, column2, column3
FROM table_name
WHERE condition
) AS dt -- Derived table must have an alias
WHERE dt.column3 > 100;
Practical Application: Calculate Order Statistics per Customer
SELECT
c.customer_name,
order_stats.order_count,
order_stats.total_amount,
order_stats.avg_amount
FROM customers c
INNER JOIN (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_amount,
AVG(total_amount) AS avg_amount
FROM orders
GROUP BY customer_id
) AS order_stats ON c.customer_id = order_stats.customer_id
WHERE order_stats.order_count >= 5;
EXISTS and NOT EXISTS
EXISTS is an operator used to check whether a subquery returns any data rows. It only cares about whether data exists, not the actual content.
EXISTS Basic Syntax
SELECT column_list
FROM table1
WHERE EXISTS (
SELECT 1
FROM table2
WHERE table2.foreign_key = table1.primary_key
);
EXISTS vs IN Differences
Using IN
-- Find customers with orders
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
);
Using EXISTS
-- Same functionality using EXISTS
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
Performance Comparison
| Scenario | Recommended | Reason |
|---|---|---|
| Small subquery result set | IN | Direct list comparison |
| Large subquery result set | EXISTS | Stops at first match |
| Small outer table, large inner table | EXISTS | Reduces subquery execution count |
| Large outer table, small inner table | IN | Subquery executes only once |
| Need to handle NULL | EXISTS | Avoids NULL issues |
NOT EXISTS Applications
-- Find products without any orders
SELECT product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
-- Find warehouses with stock for all products
SELECT warehouse_name
FROM warehouses w
WHERE NOT EXISTS (
SELECT 1
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM inventory i
WHERE i.warehouse_id = w.warehouse_id
AND i.product_id = p.product_id
AND i.quantity > 0
)
);
Correlated Subquery
A correlated subquery references columns from the outer query within the subquery. Each time the outer query processes a row, the subquery executes once.
Non-Correlated vs Correlated Subquery
Non-Correlated Subquery (Independent Execution)
-- Subquery executes independently, only once
SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);
Correlated Subquery (Depends on Outer Query)
-- Subquery references outer p.category, executes once per category
SELECT p.product_name, p.category, p.price
FROM products p
WHERE p.price > (
SELECT AVG(price)
FROM products
WHERE category = p.category -- References outer query's p.category
);
Correlated Subquery Practical Examples
Find Highest Paid Employee in Each Department
SELECT e1.department, e1.employee_name, e1.salary
FROM employees e1
WHERE e1.salary = (
SELECT MAX(e2.salary)
FROM employees e2
WHERE e2.department = e1.department
);
Find Orders with Amount Higher than Customer's Average Order Amount
SELECT o1.order_id, o1.customer_id, o1.total_amount
FROM orders o1
WHERE o1.total_amount > (
SELECT AVG(o2.total_amount)
FROM orders o2
WHERE o2.customer_id = o1.customer_id
);
Calculate Cumulative Ranking
SELECT
e1.employee_name,
e1.salary,
(
SELECT COUNT(*)
FROM employees e2
WHERE e2.salary > e1.salary
) + 1 AS salary_rank
FROM employees e1
ORDER BY salary_rank;
ANY and ALL Operators
ANY (SOME) Operator
ANY means "any one"—true if any value in the subquery result matches the condition.
-- Find products priced higher than ANY electronics product
-- Equivalent to: price greater than lowest electronics price
SELECT product_name, price
FROM products
WHERE price > ANY (
SELECT price
FROM products
WHERE category = 'Electronics'
);
ANY Equivalents
| Expression | Equivalent To |
|---|---|
> ANY | > MIN() |
< ANY | < MAX() |
= ANY | IN |
ALL Operator
ALL means "all"—all values in subquery results must match the condition.
-- Find products priced higher than ALL electronics products
-- Equivalent to: price greater than highest electronics price
SELECT product_name, price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category = 'Electronics'
);
ALL Equivalents
| Expression | Equivalent To |
|---|---|
> ALL | > MAX() |
< ALL | < MIN() |
<> ALL | NOT IN |
Subquery Practical Cases
Case 1: Customer Value Analysis
Find VIP customers (top 10% by total orders)
WITH customer_totals AS (
SELECT
customer_id,
SUM(total_amount) AS lifetime_value
FROM orders
GROUP BY customer_id
)
SELECT
c.customer_name,
c.email,
ct.lifetime_value
FROM customers c
INNER JOIN customer_totals ct ON c.customer_id = ct.customer_id
WHERE ct.lifetime_value >= (
SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY lifetime_value)
FROM customer_totals
)
ORDER BY ct.lifetime_value DESC;
Case 2: Inventory Alert
Find products with stock below average sales volume
SELECT
p.product_name,
p.stock_quantity,
avg_sales.monthly_avg
FROM products p
INNER JOIN (
SELECT
product_id,
AVG(quantity) AS monthly_avg
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATEADD(MONTH, -3, GETDATE())
GROUP BY product_id
) AS avg_sales ON p.product_id = avg_sales.product_id
WHERE p.stock_quantity < avg_sales.monthly_avg * 2;
Case 3: Consecutive Purchasing Customers
Find customers who ordered in three consecutive months
WITH monthly_orders AS (
SELECT DISTINCT
customer_id,
FORMAT(order_date, 'yyyy-MM') AS order_month
FROM orders
WHERE order_date >= DATEADD(MONTH, -6, GETDATE())
)
SELECT c.customer_name
FROM customers c
WHERE (
SELECT COUNT(DISTINCT order_month)
FROM monthly_orders mo
WHERE mo.customer_id = c.customer_id
AND order_month IN (
FORMAT(DATEADD(MONTH, -1, GETDATE()), 'yyyy-MM'),
FORMAT(DATEADD(MONTH, -2, GETDATE()), 'yyyy-MM'),
FORMAT(DATEADD(MONTH, -3, GETDATE()), 'yyyy-MM')
)
) = 3;
Case 4: Product Recommendations
Find other products commonly purchased by customers who bought Product A
SELECT
p.product_name,
COUNT(DISTINCT oi.order_id) AS co_purchase_count
FROM order_items oi
INNER JOIN products p ON oi.product_id = p.product_id
WHERE oi.order_id IN (
-- Find orders containing Product A
SELECT order_id
FROM order_items
WHERE product_id = 'PROD_A'
)
AND oi.product_id != 'PROD_A' -- Exclude Product A itself
GROUP BY p.product_name
ORDER BY co_purchase_count DESC
LIMIT 10;
Choosing Between Subqueries and JOINs
When to Use Subqueries
- Clear logic: The problem itself is "find things that meet certain conditions"
- Single value comparison: Need to compare with aggregate results
- EXISTS checks: Checking if related records exist
- Complex filtering: Need multi-level condition judgment
When to Use JOINs
- Need columns from multiple tables: Results need to display data from multiple tables
- Performance considerations: JOINs usually perform better in most cases
- Simple relationships: Straightforward table connections
Comparing Different Approaches for Same Function
Problem: Find customer names with orders and their order count
Method 1: Subquery
SELECT
customer_name,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
Method 2: JOIN
SELECT
c.customer_name,
COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;
Method 3: CTE
WITH order_counts AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
)
SELECT c.customer_name, oc.order_count
FROM customers c
INNER JOIN order_counts oc ON c.customer_id = oc.customer_id;
2026 Perspective: Subquery vs CTE vs Window Function
In modern SQL development, many problems traditionally solved with subqueries now have clearer alternatives.
When to Choose Which Approach?
| Scenario | Recommended Approach | Reason |
|---|---|---|
| Simple existence check | EXISTS subquery | Clear semantics, good performance |
| Calculate rank/top N per group | Window Function | Single table scan |
| Multi-step complex logic | CTE | Best readability |
| Recursive queries (hierarchies) | Recursive CTE | Only option |
| Single value comparison (avg, max) | Scalar subquery or CTE | Either works |
Practical Example Comparison
Problem: Find the highest-paid employee in each department
Method 1: Correlated Subquery (Traditional)
SELECT department, employee_name, salary
FROM employees e1
WHERE salary = (
SELECT MAX(salary)
FROM employees e2
WHERE e2.department = e1.department
);
Method 2: Window Function (Recommended)
WITH ranked AS (
SELECT department, employee_name, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT department, employee_name, salary
FROM ranked WHERE rn = 1;
Method 3: JOIN + Derived Table
SELECT e.department, e.employee_name, e.salary
FROM employees e
INNER JOIN (
SELECT department, MAX(salary) AS max_salary
FROM employees GROUP BY department
) m ON e.department = m.department AND e.salary = m.max_salary;
Performance Comparison (typical cases):
- Window Function usually requires only one table scan
- Correlated subquery may execute once per row
- JOIN method performance is between the two
Conclusion: The 2026 best practice is to prefer CTE + Window Function, reserving subqueries for semantically appropriate scenarios.
Subquery Performance Optimization
Optimization Tips
1. Avoid Correlated Subqueries in SELECT
-- Worse: Subquery executes for each row
SELECT
p.product_name,
(SELECT SUM(quantity) FROM order_items WHERE product_id = p.product_id) AS total_sold
FROM products p;
-- Better: Use JOIN
SELECT
p.product_name,
COALESCE(oi.total_sold, 0) AS total_sold
FROM products p
LEFT JOIN (
SELECT product_id, SUM(quantity) AS total_sold
FROM order_items
GROUP BY product_id
) oi ON p.product_id = oi.product_id;
2. Use EXISTS Instead of IN (For Large Data Sets)
-- When subquery result is large, EXISTS is usually faster
-- Instead of IN
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
3. Use CTE for Readability and Potential Performance
WITH high_value_orders AS (
SELECT customer_id, SUM(total_amount) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 10000
)
SELECT c.*, hvo.total
FROM customers c
INNER JOIN high_value_orders hvo ON c.customer_id = hvo.customer_id;
Conclusion
Subqueries are an important tool in SQL for handling complex data requirements. Mastering the characteristics and applicable scenarios of various subquery types enables you to more flexibly solve practical problems.
Key Points Review
- Scalar subquery: Returns single value, used for comparison operations
- Table subquery: Used with IN, ANY, ALL
- EXISTS: Checks existence, usually better performance
- Correlated subquery: Powerful but watch performance
- Derived table: Makes complex queries more readable
With experience, you'll gradually develop intuition for choosing between subqueries and JOINs, writing SQL queries that are both correct and efficient.
Need to optimize complex data query architecture? CloudInsight provides professional database architecture consulting services, helping you design high-performance query strategies and data structures to improve overall system performance.
💡 Schedule Data Architecture Consultation
FAQ
Q1: Which performs better, subqueries or JOINs?
It depends on the specific situation. Generally, modern database optimizers will rewrite many subqueries as JOINs for execution, so performance differences are minimal. However, some principles apply: (1) EXISTS is usually faster than IN when checking existence, especially with large subquery result sets; (2) Correlated subqueries in SELECT clauses execute once per row, potentially causing performance issues; (3) Actual performance needs to be determined by examining the execution plan. Recommend writing logically clear queries first, then optimizing based on execution plans.
Q2: Why does NOT IN have problems with NULL?
When subquery results contain NULL, NOT IN returns empty results. This is because of SQL's three-valued logic: any value compared with NULL is UNKNOWN, and NOT IN requires all comparisons to be TRUE to return that row. Three solutions: (1) Add WHERE column IS NOT NULL in subquery; (2) Use NOT EXISTS instead; (3) Use LEFT JOIN with IS NULL check. NOT EXISTS is the most robust approach because it's not affected by NULL values.
Further Reading
- SQL Basic Syntax Complete Tutorial - Complete syntax guide from SELECT to JOIN
- SQL JOIN Complete Tutorial - Deep understanding of various JOIN operations
- SQL Window Function Tutorial - Advanced analytics function applications
- SQL Performance Tuning Complete Guide - Query tuning and index strategy
- SQL Advanced Techniques Practical Tips - Advanced SQL techniques overview
References
- Microsoft SQL Server Documentation - Subqueries
- PostgreSQL Documentation - Subquery Expressions
- Oracle Database SQL Language Reference - Subqueries
- MySQL Reference Manual - Subqueries
- SQL Performance Explained - Markus Winand
Need Professional Cloud Advice?
Whether you're evaluating cloud platforms, optimizing existing architecture, or looking for cost-saving solutions, we can help
Book Free ConsultationRelated Articles
SQL Stored Procedure Tutorial: Creating Procedures and Best Practices
Complete SQL stored procedure tutorial covering CREATE PROCEDURE syntax, INPUT/OUTPUT parameters, TRY CATCH error handling, performance optimization techniques, and practical examples to improve database development efficiency.
SQLSQL CRUD Operations Complete Guide: INSERT, UPDATE, DELETE Syntax Tutorial [2025]
Systematic introduction to SQL Data Manipulation Language (DML) core commands. Covers INSERT, UPDATE, DELETE complete syntax, TRUNCATE comparison, transaction control, and common error reminders.
SQLSQL Functions Complete Guide: String, Date, Math, Aggregate Functions Summary [2025]
Complete SQL built-in functions reference. Covers string functions (CONCAT, SUBSTRING, REPLACE), date functions (DATEADD, DATEDIFF), math functions (ROUND, CEILING), aggregate functions (COUNT, SUM, AVG) with syntax and examples.