SQL Intermediate Guide — Hands-On with Joins, CTE's & Window Functions
SQL Intermediate Guide — Hands-On with Joins, CTEs & Window Functions

SQL Intermediate Guide — Hands-On with Joins, CTE's & Window Functions


SQL JOINs Explained: INNER, LEFT, RIGHT, FULL OUTER

In relational databases, data is often split across multiple tables. To extract meaningful insights, you frequently need to combine data from two or more tables — this is where SQL JOINs come in.

What is a JOIN?

A JOIN in SQL combines rows from two or more tables based on a related column (typically a primary-foreign key relationship). The resulting dataset will vary depending on the type of JOIN used.


INNER JOIN

Definition: Returns only the rows that have matching values in both tables.

Syntax:

SELECT table1.column, table2.column
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;        

Example: Join customers and orders tables to get customers who have placed an order.

SELECT c.customer_name, o.order_date
FROM customers c
INNER JOIN orders o
ON c.customer_id = o.customer_id;        

Key Points:

  • Filters out unmatched records.
  • Most commonly used JOIN type.
  • Results will exclude customers with no orders.


LEFT JOIN (LEFT OUTER JOIN)

Definition: Returns all rows from the left table, and matched rows from the right table. If there's no match, NULLs are returned for columns from the right table.

Syntax:

SELECT table1.column, table2.column
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;        

Example: List all customers and their orders (if any).

SELECT c.customer_name, o.order_date
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id;        

Key Points:

  • Ensures all records from the left table are included.
  • Use when you want to identify unmatched records (e.g., customers with no orders).


RIGHT JOIN (RIGHT OUTER JOIN)

Definition: Returns all rows from the right table, and matched rows from the left table. If there's no match, NULLs are returned for columns from the left table.

Syntax:

SELECT table1.column, table2.column
FROM table1
RIGHT JOIN table2
ON table1.common_column = table2.common_column;        

Example: List all orders and the customer who placed them. Even if the customer record no longer exists, the order will appear.

SELECT c.customer_name, o.order_date
FROM customers c
RIGHT JOIN orders o
ON c.customer_id = o.customer_id;        

Key Points:

  • Useful for keeping all right-side data, even if there's no corresponding left-side entry.
  • Good for auditing orphan records (e.g., deleted customers with remaining orders).


FULL OUTER JOIN

Definition: Returns all rows from both tables. Where no match exists, NULLs are returned in place.

Syntax:

SELECT table1.column, table2.column
FROM table1
FULL OUTER JOIN table2
ON table1.common_column = table2.common_column;        

Example: List all customers and all orders, matching where possible.

SELECT c.customer_name, o.order_date
FROM customers c
FULL OUTER JOIN orders o
ON c.customer_id = o.customer_id;        

Key Points:

  • Useful for comprehensive comparison.
  • Highlights unmatched rows from both tables.
  • May not be supported in some DBs (like MySQL) — in such cases, use UNION of LEFT JOIN and RIGHT JOIN.


Summary of JOIN Types

  1. INNER JOIN: Only matching rows in both tables
  2. LEFT JOIN: All rows from the left table, matched rows from the right table
  3. RIGHT JOIN: All rows from the right table, matched rows from the left table
  4. FULL OUTER JOIN: All rows from both tables, matched where possible

Pro Tips:

  • Always use table aliases (c, o) for clarity in multi-table queries.
  • Combine JOINs with WHERE, GROUP BY, and ORDER BY for advanced queries.
  • For performance, ensure join columns are indexed — especially in large datasets.


Subqueries vs CTEs (Common Table Expressions)

When writing SQL queries to solve real-world problems, you often need to structure your logic using temporary result sets. These can be created using either subqueries or Common Table Expressions (CTEs). Both approaches are used to improve modularity and handle complexity, but they differ in flexibility, readability, and performance.

1. Subqueries (Nested Queries)

Definition: A subquery is a query nested inside another SQL query. It can appear in the SELECT, FROM, or WHERE clause.

Syntax Example:

SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date >= '2024-01-01'
);        

Common Use Cases:

  • Filtering results using WHERE or HAVING clauses.
  • Calculating values within SELECT as scalar subqueries.
  • Aggregating or transforming data within a limited scope.

Types of Subqueries:

  • Scalar subquery: Returns a single value.
  • Non-correlated subquery: Independent of the outer query.
  • Correlated subquery: Refers to columns from the outer query.

Limitations:

  • Difficult to read when deeply nested.
  • Cannot be reused multiple times within the same query.
  • Correlated subqueries may lead to performance bottlenecks.
  • Not suitable for recursion or multi-step transformations.

2. Common Table Expressions (CTEs):

Definition: A CTE (Common Table Expression) is a named temporary result set that can be referenced within the main SQL query. It improves clarity and enables reuse of intermediate logic.

Syntax Example:

WITH recent_orders AS (
    SELECT customer_id
    FROM orders
    WHERE order_date >= '2024-01-01'
)
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM recent_orders);        

Common Use Cases:

  • Structuring complex queries into readable blocks.
  • Reusing derived result sets multiple times.
  • Enabling recursive queries (e.g., organizational hierarchy, graph traversal).
  • Improving maintainability for debugging and step-by-step logic testing.

Advantages:

  • Cleaner, more modular structure.
  • Improves readability for multi-step queries.
  • Can be referenced multiple times in a single query.
  • Supports recursion using WITH RECURSIVE.

Subquery vs CTE – Comparison Table

Article content
Fig. Subquery vs CTE – Comparison Table

Real-World Scenario:

Requirement: List all customers who placed orders in the last 30 days and also show how many total orders they placed.

Approach 1: Using Subqueries

SELECT c.customer_name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS total_orders
FROM customers c
WHERE c.customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);        

Approach 2: Using CTE

WITH recent_orders AS (
    SELECT customer_id, COUNT(*) AS total_orders
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT c.customer_name, ro.total_orders
FROM customers c
JOIN recent_orders ro ON c.customer_id = ro.customer_id;        

Why CTE is better here

  • Cleaner to read and debug
  • Reuses the result set (recent_orders)
  • Easier to maintain if logic changes later

Summary:

  • Use subqueries when dealing with simple, one-off filtering or scalar value derivation.
  • Use CTEs for multi-step logic, recursive problems, and whenever clarity or modularity is a priority.
  • CTEs enhance maintainability in enterprise-scale SQL scripts and BI dashboards.


Window Functions in SQL – Deep Dive

What is a Window Function?

A Window Function performs a calculation across a set of rows (a window) related to the current row. This is defined using the OVER() clause.

Unlike aggregate functions, window functions do not reduce the number of rows returned.

Syntax:

function_name(column) OVER (
    PARTITION BY column1
    ORDER BY column2
)

-- PARTITION BY: Divides the dataset into groups (optional).
-- RDER BY: Defines the order of rows within each partition (required for ranking functions).        

1. ROW_NUMBER():

Definition: Assigns a unique sequential number to each row within a partition, ordered by the specified column(s).

Syntax:

SELECT column1, column2,
       ROW_NUMBER() OVER (PARTITION BY columnA ORDER BY columnB) AS row_num
FROM table_name;
        

Example: Assign a row number to each employee within their department based on salary.

SELECT employee_id, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;        

Key Points:

  • Always unique within each partition.
  • Resets for each partition group.
  • Commonly used for pagination and top-N queries

2. RANK()

Definition: Assigns a rank to each row within a partition based on the ordering. Tied values receive the same rank, and the next rank(s) are skipped.

Syntax:

SELECT column1, column2,
       RANK() OVER (PARTITION BY columnA ORDER BY columnB) AS ranking
FROM table_name;        

Example: Rank employees by salary within departments.

SELECT employee_id, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;        

Key Points:

  • Equal values share the same rank.
  • Skips ranks for ties (e.g., 1, 2, 2, 4).
  • Useful for leaderboards or competition scoring.

3. LEAD()

Definition: Accesses the value from the next row in the result set, relative to the current row.

Syntax:

SELECT column1, column2,
       LEAD(column_to_look_ahead) OVER (ORDER BY columnB) AS next_value
FROM table_name;        

Example: Compare an employee’s salary to the one just below.

SELECT employee_id, salary,
       LEAD(salary) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;        

Key Points:

  • Returns NULL for the last row.
  • Helps with comparisons, forecasting, or trend tracking.

4. LAG()

Definition: Accesses the value from the previous row in the result set, relative to the current row.

Syntax:

SELECT column1, column2,
       LAG(column_to_look_behind) OVER (ORDER BY columnB) AS previous_value
FROM table_name;        

Example: Compare current and previous salary.

SELECT employee_id, salary,
       LAG(salary) OVER (ORDER BY salary DESC) AS previous_salary
FROM employees;        

Key Points:

  • Returns NULL for the first row.
  • Helps in detecting changes over time or differences between consecutive rows.

Summary Table:

Article content
Fig. Window Functions – Feature Comparison

UNION vs UNION ALL

Combining result sets from multiple SELECT queries is a common requirement in SQL. The UNION and UNION ALL operators serve this purpose — but with key differences in performance, behavior, and result handling.

UNION

Definition: UNION combines the result sets of two or more SELECT statements and removes duplicate rows from the final output.

Syntax:

SELECT column1, column2
FROM table1
UNION
SELECT column1, column2
FROM table2;        

Example: Get a unique list of customer emails from both online_customers and retail_customers.

SELECT email FROM online_customers
UNION
SELECT email FROM retail_customers;        

Key Points:

  • Automatically eliminates duplicates.
  • Performs a sort operation internally, which may affect performance.
  • Can be slower for large datasets.
  • Useful when distinct data is required from multiple sources.

UNION ALL

Definition: UNION ALL combines result sets from two or more SELECT statements and retains all rows, including duplicates.

Syntax:

SELECT column1, column2
FROM table1
UNION ALL
SELECT column1, column2
FROM table2;        

Example: Get all emails (including duplicates) from online_customers and retail_customers.

SELECT email FROM online_customers
UNION ALL
SELECT email FROM retail_customers;        

Key Points:

  • Does not remove duplicates — includes all rows as they are.
  • No sorting required → faster performance.
  • Preferred when duplicates are expected or acceptable.
  • Better for raw logs, transaction audits, or analytics where counts matter.

Use Case Comparison:

Article content
Fig. Use Case Comparison

UNION vs UNION ALL – Feature Comparison:

Article content
Fig. UNION vs UNION ALL – Feature Comparison

Real-Life Scenario Example

Scenario: You are analyzing all the customers who interacted via app or website.

Using UNION (for unique users):

SELECT customer_id FROM app_logins
UNION
SELECT customer_id FROM web_logins;        

Using UNION ALL (for total login activity count):

SELECT customer_id FROM app_logins
UNION ALL
SELECT customer_id FROM web_logins;        

Best Practices

  • Use UNION when you care about uniqueness.
  • Use UNION ALL for performance-critical applications or when duplicates are required or expected.
  • Always ensure column count and data types match across all SELECT statements.
  • Add ORDER BY after the final query to sort the overall result.


CASE Statements in SQL

Definition:

The CASE expression allows you to implement conditional logic in SQL — similar to IF-ELSE statements in programming. It is often used to transform data, derive new columns, or apply logic-based categorization within queries.

Syntax:

CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ...
    ELSE default_result
END        

Example 1: Categorizing Salary Levels

SELECT employee_id, salary,
       CASE
           WHEN salary < 40000 THEN 'Low'
           WHEN salary BETWEEN 40000 AND 70000 THEN 'Medium'
           ELSE 'High'
       END AS salary_level
FROM employees;        

Purpose: Classifies employees into salary bands based on their income.

Example 2: Flagging Order Status

SELECT order_id, delivery_date, CASE WHEN delivery_date IS NULL THEN 'Pending' WHEN delivery_date <= CURRENT_DATE THEN 'Delivered' ELSE 'In Transit' END AS status FROM orders;        

Purpose: Derives a custom column to show human-readable order status.

Types of CASE

1. Simple CASE

Compares a single expression to multiple values.

CASE department
    WHEN 'HR' THEN 'Human Resources'
    WHEN 'IT' THEN 'Technology'
    ELSE 'Other'
END        

2. Searched CASE

Evaluates multiple logical conditions (most common form).

CASE
    WHEN salary < 40000 THEN 'Low'
    WHEN salary BETWEEN 40000 AND 70000 THEN 'Medium'
    ELSE 'High'
END        

Use Cases

Article content

Example 3: Conditional Aggregation

SELECT department,
       COUNT(*) AS total_employees,
       SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) AS female_count
FROM employees
GROUP BY department;        

Purpose: Aggregate conditionally to count only female employees per department.


Best Practices

  • Always use ELSE to handle unexpected or default cases.
  • CASE can be used in SELECT, WHERE, ORDER BY, and GROUP BY clauses.
  • For deeply nested logic, consider using a lookup table or CTE for maintainability.
  • Combine with NULL checks to create fallback logic.


✅ Enjoyed this Edition?

📩 Follow & Subscribe

To never miss an update, subscribe to DataTalks: SQL Edition for practical, real-world SQL learning — from basics to advanced.

🔗 Follow Ashish Zope 📬 Subscribe to this newsletter for more hands-on examples and insights.

To view or add a comment, sign in

Others also viewed

Explore topics