Back to HomeSQL

SQL Subquery Complete Tutorial: Subquery Syntax, CTE Comparison & Practical Applications [2026 Update]

13 min min read
#SQL#Subquery#CTE#EXISTS#Database#Performance Optimization

SQL Subquery Complete Tutorial: Subquery Syntax and Practical Applications

SQL Subquery Complete Tutorial: Subquery Syntax and Practical Applications

Subqueries are one of the most powerful features in SQL, allowing you to nest one query within another query to solve complex problems that a single query cannot handle. This tutorial will start from basic concepts, gradually introduce various subquery types, and demonstrate their application scenarios through real cases.

What is a Subquery?

A subquery is a SELECT query nested within another SQL statement. The outer query is called the Main Query or Outer Query, and the nested query is called the Subquery or Inner Query.

Basic Subquery Syntax

-- Basic structure
SELECT column_list
FROM table_name
WHERE column_name operator (
    SELECT column_name
    FROM table_name
    WHERE condition
);

Where Subqueries Can Appear

  1. In WHERE clause: Most common usage
  2. In FROM clause: As a derived table
  3. In SELECT clause: As a scalar value
  4. In HAVING clause: For group filtering

Types of Subqueries

1. Scalar Subquery

A scalar subquery returns only a single value (one row, one column) and can be used anywhere a single value is needed.

Basic Example: Find products priced above average

SELECT product_name, price
FROM products
WHERE price > (
    SELECT AVG(price)
    FROM products
);

Using Scalar Subquery in SELECT Clause

SELECT
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS avg_salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

Calculating Percentage Application

SELECT
    category,
    SUM(sales) AS category_sales,
    ROUND(
        100.0 * SUM(sales) / (SELECT SUM(sales) FROM orders),
        2
    ) AS percentage
FROM orders
GROUP BY category;

2. Row Subquery

A row subquery returns a single row (one row, multiple columns), typically used for multi-column comparisons.

Comparing Multiple Columns

SELECT *
FROM products
WHERE (category, brand) = (
    SELECT category, brand
    FROM products
    WHERE product_id = 'P001'
);

Find Colleagues with Same Department and Position as a Specific Employee

SELECT employee_name, department, position
FROM employees
WHERE (department, position) = (
    SELECT department, position
    FROM employees
    WHERE employee_id = 'E100'
)
AND employee_id != 'E100';

3. Table Subquery

A table subquery returns multiple rows and columns, commonly used with IN, ANY, ALL operators, or as a derived table in FROM clause.

Using IN Operator

-- Find customers who have placed orders
SELECT customer_name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= '2025-01-01'
);

Using NOT IN Operator

-- Find customers who have never placed orders
SELECT customer_name, email
FROM customers
WHERE customer_id NOT IN (
    SELECT DISTINCT customer_id
    FROM orders
);

⚠️ Note: Be careful with NULL values when using NOT IN. If subquery results contain NULL, the entire NOT IN will return empty results.

-- Safe approach: Exclude NULL
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (
    SELECT customer_id
    FROM orders
    WHERE customer_id IS NOT NULL
);

4. Derived Table

Place a subquery in the FROM clause to use as a temporary table.

Basic Syntax

SELECT dt.column1, dt.column2
FROM (
    SELECT column1, column2, column3
    FROM table_name
    WHERE condition
) AS dt  -- Derived table must have an alias
WHERE dt.column3 > 100;

Practical Application: Calculate Order Statistics per Customer

SELECT
    c.customer_name,
    order_stats.order_count,
    order_stats.total_amount,
    order_stats.avg_amount
FROM customers c
INNER JOIN (
    SELECT
        customer_id,
        COUNT(*) AS order_count,
        SUM(total_amount) AS total_amount,
        AVG(total_amount) AS avg_amount
    FROM orders
    GROUP BY customer_id
) AS order_stats ON c.customer_id = order_stats.customer_id
WHERE order_stats.order_count >= 5;

EXISTS and NOT EXISTS

EXISTS is an operator used to check whether a subquery returns any data rows. It only cares about whether data exists, not the actual content.

EXISTS Basic Syntax

SELECT column_list
FROM table1
WHERE EXISTS (
    SELECT 1
    FROM table2
    WHERE table2.foreign_key = table1.primary_key
);

EXISTS vs IN Differences

Using IN

-- Find customers with orders
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
);

Using EXISTS

-- Same functionality using EXISTS
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

Performance Comparison

ScenarioRecommendedReason
Small subquery result setINDirect list comparison
Large subquery result setEXISTSStops at first match
Small outer table, large inner tableEXISTSReduces subquery execution count
Large outer table, small inner tableINSubquery executes only once
Need to handle NULLEXISTSAvoids NULL issues

NOT EXISTS Applications

-- Find products without any orders
SELECT product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

-- Find warehouses with stock for all products
SELECT warehouse_name
FROM warehouses w
WHERE NOT EXISTS (
    SELECT 1
    FROM products p
    WHERE NOT EXISTS (
        SELECT 1
        FROM inventory i
        WHERE i.warehouse_id = w.warehouse_id
        AND i.product_id = p.product_id
        AND i.quantity > 0
    )
);

Correlated Subquery

A correlated subquery references columns from the outer query within the subquery. Each time the outer query processes a row, the subquery executes once.

Non-Correlated vs Correlated Subquery

Non-Correlated Subquery (Independent Execution)

-- Subquery executes independently, only once
SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);

Correlated Subquery (Depends on Outer Query)

-- Subquery references outer p.category, executes once per category
SELECT p.product_name, p.category, p.price
FROM products p
WHERE p.price > (
    SELECT AVG(price)
    FROM products
    WHERE category = p.category  -- References outer query's p.category
);

Correlated Subquery Practical Examples

Find Highest Paid Employee in Each Department

SELECT e1.department, e1.employee_name, e1.salary
FROM employees e1
WHERE e1.salary = (
    SELECT MAX(e2.salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

Find Orders with Amount Higher than Customer's Average Order Amount

SELECT o1.order_id, o1.customer_id, o1.total_amount
FROM orders o1
WHERE o1.total_amount > (
    SELECT AVG(o2.total_amount)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id
);

Calculate Cumulative Ranking

SELECT
    e1.employee_name,
    e1.salary,
    (
        SELECT COUNT(*)
        FROM employees e2
        WHERE e2.salary > e1.salary
    ) + 1 AS salary_rank
FROM employees e1
ORDER BY salary_rank;

ANY and ALL Operators

ANY (SOME) Operator

ANY means "any one"—true if any value in the subquery result matches the condition.

-- Find products priced higher than ANY electronics product
-- Equivalent to: price greater than lowest electronics price
SELECT product_name, price
FROM products
WHERE price > ANY (
    SELECT price
    FROM products
    WHERE category = 'Electronics'
);

ANY Equivalents

ExpressionEquivalent To
> ANY> MIN()
< ANY< MAX()
= ANYIN

ALL Operator

ALL means "all"—all values in subquery results must match the condition.

-- Find products priced higher than ALL electronics products
-- Equivalent to: price greater than highest electronics price
SELECT product_name, price
FROM products
WHERE price > ALL (
    SELECT price
    FROM products
    WHERE category = 'Electronics'
);

ALL Equivalents

ExpressionEquivalent To
> ALL> MAX()
< ALL< MIN()
<> ALLNOT IN

Subquery Practical Cases

Case 1: Customer Value Analysis

Find VIP customers (top 10% by total orders)

WITH customer_totals AS (
    SELECT
        customer_id,
        SUM(total_amount) AS lifetime_value
    FROM orders
    GROUP BY customer_id
)
SELECT
    c.customer_name,
    c.email,
    ct.lifetime_value
FROM customers c
INNER JOIN customer_totals ct ON c.customer_id = ct.customer_id
WHERE ct.lifetime_value >= (
    SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY lifetime_value)
    FROM customer_totals
)
ORDER BY ct.lifetime_value DESC;

Case 2: Inventory Alert

Find products with stock below average sales volume

SELECT
    p.product_name,
    p.stock_quantity,
    avg_sales.monthly_avg
FROM products p
INNER JOIN (
    SELECT
        product_id,
        AVG(quantity) AS monthly_avg
    FROM order_items oi
    INNER JOIN orders o ON oi.order_id = o.order_id
    WHERE o.order_date >= DATEADD(MONTH, -3, GETDATE())
    GROUP BY product_id
) AS avg_sales ON p.product_id = avg_sales.product_id
WHERE p.stock_quantity < avg_sales.monthly_avg * 2;

Case 3: Consecutive Purchasing Customers

Find customers who ordered in three consecutive months

WITH monthly_orders AS (
    SELECT DISTINCT
        customer_id,
        FORMAT(order_date, 'yyyy-MM') AS order_month
    FROM orders
    WHERE order_date >= DATEADD(MONTH, -6, GETDATE())
)
SELECT c.customer_name
FROM customers c
WHERE (
    SELECT COUNT(DISTINCT order_month)
    FROM monthly_orders mo
    WHERE mo.customer_id = c.customer_id
    AND order_month IN (
        FORMAT(DATEADD(MONTH, -1, GETDATE()), 'yyyy-MM'),
        FORMAT(DATEADD(MONTH, -2, GETDATE()), 'yyyy-MM'),
        FORMAT(DATEADD(MONTH, -3, GETDATE()), 'yyyy-MM')
    )
) = 3;

Case 4: Product Recommendations

Find other products commonly purchased by customers who bought Product A

SELECT
    p.product_name,
    COUNT(DISTINCT oi.order_id) AS co_purchase_count
FROM order_items oi
INNER JOIN products p ON oi.product_id = p.product_id
WHERE oi.order_id IN (
    -- Find orders containing Product A
    SELECT order_id
    FROM order_items
    WHERE product_id = 'PROD_A'
)
AND oi.product_id != 'PROD_A'  -- Exclude Product A itself
GROUP BY p.product_name
ORDER BY co_purchase_count DESC
LIMIT 10;

Choosing Between Subqueries and JOINs

When to Use Subqueries

  1. Clear logic: The problem itself is "find things that meet certain conditions"
  2. Single value comparison: Need to compare with aggregate results
  3. EXISTS checks: Checking if related records exist
  4. Complex filtering: Need multi-level condition judgment

When to Use JOINs

  1. Need columns from multiple tables: Results need to display data from multiple tables
  2. Performance considerations: JOINs usually perform better in most cases
  3. Simple relationships: Straightforward table connections

Comparing Different Approaches for Same Function

Problem: Find customer names with orders and their order count

Method 1: Subquery

SELECT
    customer_name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

Method 2: JOIN

SELECT
    c.customer_name,
    COUNT(o.order_id) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

Method 3: CTE

WITH order_counts AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
)
SELECT c.customer_name, oc.order_count
FROM customers c
INNER JOIN order_counts oc ON c.customer_id = oc.customer_id;

2026 Perspective: Subquery vs CTE vs Window Function

In modern SQL development, many problems traditionally solved with subqueries now have clearer alternatives.

When to Choose Which Approach?

ScenarioRecommended ApproachReason
Simple existence checkEXISTS subqueryClear semantics, good performance
Calculate rank/top N per groupWindow FunctionSingle table scan
Multi-step complex logicCTEBest readability
Recursive queries (hierarchies)Recursive CTEOnly option
Single value comparison (avg, max)Scalar subquery or CTEEither works

Practical Example Comparison

Problem: Find the highest-paid employee in each department

Method 1: Correlated Subquery (Traditional)

SELECT department, employee_name, salary
FROM employees e1
WHERE salary = (
    SELECT MAX(salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

Method 2: Window Function (Recommended)

WITH ranked AS (
    SELECT department, employee_name, salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT department, employee_name, salary
FROM ranked WHERE rn = 1;

Method 3: JOIN + Derived Table

SELECT e.department, e.employee_name, e.salary
FROM employees e
INNER JOIN (
    SELECT department, MAX(salary) AS max_salary
    FROM employees GROUP BY department
) m ON e.department = m.department AND e.salary = m.max_salary;

Performance Comparison (typical cases):

  • Window Function usually requires only one table scan
  • Correlated subquery may execute once per row
  • JOIN method performance is between the two

Conclusion: The 2026 best practice is to prefer CTE + Window Function, reserving subqueries for semantically appropriate scenarios.


Subquery Performance Optimization

Optimization Tips

1. Avoid Correlated Subqueries in SELECT

-- Worse: Subquery executes for each row
SELECT
    p.product_name,
    (SELECT SUM(quantity) FROM order_items WHERE product_id = p.product_id) AS total_sold
FROM products p;

-- Better: Use JOIN
SELECT
    p.product_name,
    COALESCE(oi.total_sold, 0) AS total_sold
FROM products p
LEFT JOIN (
    SELECT product_id, SUM(quantity) AS total_sold
    FROM order_items
    GROUP BY product_id
) oi ON p.product_id = oi.product_id;

2. Use EXISTS Instead of IN (For Large Data Sets)

-- When subquery result is large, EXISTS is usually faster
-- Instead of IN
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

3. Use CTE for Readability and Potential Performance

WITH high_value_orders AS (
    SELECT customer_id, SUM(total_amount) AS total
    FROM orders
    GROUP BY customer_id
    HAVING SUM(total_amount) > 10000
)
SELECT c.*, hvo.total
FROM customers c
INNER JOIN high_value_orders hvo ON c.customer_id = hvo.customer_id;

Conclusion

Subqueries are an important tool in SQL for handling complex data requirements. Mastering the characteristics and applicable scenarios of various subquery types enables you to more flexibly solve practical problems.

Key Points Review

  1. Scalar subquery: Returns single value, used for comparison operations
  2. Table subquery: Used with IN, ANY, ALL
  3. EXISTS: Checks existence, usually better performance
  4. Correlated subquery: Powerful but watch performance
  5. Derived table: Makes complex queries more readable

With experience, you'll gradually develop intuition for choosing between subqueries and JOINs, writing SQL queries that are both correct and efficient.


Need to optimize complex data query architecture? CloudInsight provides professional database architecture consulting services, helping you design high-performance query strategies and data structures to improve overall system performance.

💡 Schedule Data Architecture Consultation


FAQ

Q1: Which performs better, subqueries or JOINs?

It depends on the specific situation. Generally, modern database optimizers will rewrite many subqueries as JOINs for execution, so performance differences are minimal. However, some principles apply: (1) EXISTS is usually faster than IN when checking existence, especially with large subquery result sets; (2) Correlated subqueries in SELECT clauses execute once per row, potentially causing performance issues; (3) Actual performance needs to be determined by examining the execution plan. Recommend writing logically clear queries first, then optimizing based on execution plans.

Q2: Why does NOT IN have problems with NULL?

When subquery results contain NULL, NOT IN returns empty results. This is because of SQL's three-valued logic: any value compared with NULL is UNKNOWN, and NOT IN requires all comparisons to be TRUE to return that row. Three solutions: (1) Add WHERE column IS NOT NULL in subquery; (2) Use NOT EXISTS instead; (3) Use LEFT JOIN with IS NULL check. NOT EXISTS is the most robust approach because it's not affected by NULL values.

Further Reading

References

  1. Microsoft SQL Server Documentation - Subqueries
  2. PostgreSQL Documentation - Subquery Expressions
  3. Oracle Database SQL Language Reference - Subqueries
  4. MySQL Reference Manual - Subqueries
  5. SQL Performance Explained - Markus Winand

Need Professional Cloud Advice?

Whether you're evaluating cloud platforms, optimizing existing architecture, or looking for cost-saving solutions, we can help

Book Free Consultation

Related Articles