A Beginner's Guide to Subqueries and CTEs

A Beginner's Guide to Subqueries and CTEs

SQL is a dynamic language which plays an important role in the field of data science. From data engineers to data analysts, professionals rely on SQL to perform critical data operations and analysis.

It offers powerful ways to manipulate and query data, and two essential tools for structuring complex queries are Common Table Expressions (CTEs) and Subqueries. Both allow you to write more readable and organized code, but they serve specific purposes, and knowing when to use each can enhance your SQL skills.

In this article, we will explore the differences between CTEs and subqueries, when to use each, and their advantages and disadvantages. By understanding their differences and use cases, you can choose the right tool to optimize your queries.


What is a subquery?

A sub query as the name suggests, is a query within another query. The output of a subquery is used as an input of the main query. In other words, it can also be called as nested query.

Subqueries are helpful when performing operations that rely on the result of another query. This makes them ideal for tasks like filtering data, calculating aggregates etc.

Subqueries can be utilized in different parts of a query depending on the task at hand. They are commonly used in clauses like WHERE, HAVING, or FROM, enabling developers to break down complex operations into smaller, more manageable parts.

Subqueries can return a single value, a list or even a table as an output.

1. 𝗦̲𝘂̲𝗯̲𝗾̲𝘂̲𝗲̲𝗿̲𝗶̲𝗲̲𝘀̲ ̲𝘁̲𝗵̲𝗮̲𝘁̲ ̲𝗿̲𝗲̲𝘁̲𝘂̲𝗿̲𝗻̲ ̲𝗮̲ ̲𝘀̲𝗶̲𝗻̲𝗴̲𝗹̲𝗲̲ ̲𝘃̲𝗮̲𝗹̲𝘂̲𝗲̲

A subquery that return a single value can either be one row or one column. This value can be used in SELECT, WHERE, or HAVING clauses.

For example, to find the employee whose salary is the highest, we can write the query as:

SELECT name 
FROM employees  
WHERE salary = MAX(salary);        

However, this will give an error because aggregate functions like MAX() cannot be used directly in the WHERE clause. To achieve the desired result, we can rewrite the query using a subquery:

SELECT name 
FROM employees 
WHERE salary = (SELECT MAX(salary) FROM employees);         

The subquery "(SELECT MAX(salary) FROM employees)" returns the highest salary from the employees table. The outer query selects the name of the employee whose salary matches this value.

2. 𝗦̲𝘂̲𝗯̲𝗾̲𝘂̲𝗲̲𝗿̲𝗶̲𝗲̲𝘀̲ ̲𝘁̲𝗵̲𝗮̲𝘁̲ ̲𝗿̲𝗲̲𝘁̲𝘂̲𝗿̲𝗻̲ ̲𝗮̲ ̲𝗹̲𝗶̲𝘀̲𝘁̲ ̲𝗼̲𝗳̲ ̲𝘃̲𝗮̲𝗹̲𝘂̲𝗲̲𝘀̲

A subquery that returns a list of values is called as "column subquery". Such subquery is often used with IN, NOT IN, or ANY operators.

For example, to find the employees who work in departments located in New York, we can write the sub query as:

SELECT name 
FROM employees 
WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York');        

The subquery (SELECT id FROM departments WHERE location = 'New York') returns a list of department IDs located in New York. The outer query selects employee names where their department_id matches one from the list of IDs.

3. 𝗦̲𝘂̲𝗯̲𝗾̲𝘂̲𝗲̲𝗿̲𝗶̲𝗲̲𝘀̲ ̲𝘁̲𝗵̲𝗮̲𝘁̲ ̲𝗿̲𝗲̲𝘁̲𝘂̲𝗿̲𝗻̲ ̲𝗮̲ ̲𝘁̲𝗮̲𝗯̲𝗹̲𝗲̲

A subquery that returns an entire table (multiple rows and columns) can be used in the FROM clause, acting as a temporary table for the main query.

SELECT * 
FROM 
  (SELECT name, salary, department_id 
   FROM employees 
   WHERE salary > 50000) AS high_earners 
WHERE department_id = 101;        

The subquery (SELECT name, salary, department_id FROM employees WHERE salary > 50000) acts as a temporary table that contains details of employees earning more than 50,000. The outer query retrieves only the employees from this temporary table - high_earners who belong to department 101.

4. 𝗖̲𝗼̲𝗿̲𝗿̲𝗲̲𝗹̲𝗮̲𝘁̲𝗲̲𝗱̲ ̲𝗦̲𝘂̲𝗯̲𝗾̲𝘂̲𝗲̲𝗿̲𝘆̲

A correlated subquery is a subquery that depends on the outer query for its values. It is re-evaluated for each row in the outer query.

For example, the below query includes correlated subquery.

For each employee in the employees table, the subquery calculates the average salary of employees in the same department_id. The outer query then compares each employee's salary to the average salary of their department and selects those whose salary is higher.

SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
)
ORDER BY e.employee_id;        

What is a CTE?

CTE stands for Common Table Expression. It creates a temporary table that is used only within that query. It can be written using WITH and AS clauses.

Basic syntax:

WITH CTE_name AS (             ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀-- defining the CTE
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition...
)
SELECT * FROM CTE_name...;⠀⠀⠀⠀⠀⠀⠀ ⠀⠀⠀⠀⠀⠀-- main query using CTE        

Now, to find the employees whose salary is greater than 50,000 and have department_id as 101, we can use the subqueries and write the query as:

SELECT * 
FROM 
  (SELECT name, salary, department_id 
   FROM employees 
   WHERE salary > 50000) AS high_earners 
WHERE department_id = 101;        

But this looks a bit messy and it takes time to figure out what are we trying to find by executing this query. But when written using CTEs, the query becomes simpler and easier to follow!

The same query using CTEs can be written as follows:

WITH high_earners AS (
    SELECT name, salary, department_id
    FROM employees
    WHERE salary > 50000
)
SELECT *
FROM high_earners
WHERE department_id = 101;
        

The CTE high_earners as defined using WITH and AS clauses selects all employees who have a salary greater than 50,000. The main query then selects all records from the high_earners CTE where the department_id is 101.

Doesn't that look much cleaner and easier to follow?


Subqueries vs. CTEs: When to Use What?

𝚂̲𝚞̲𝚋̲𝚚̲𝚞̲𝚎̲𝚛̲𝚒̲𝚎̲𝚜̲

  • When you need to perform a quick, one-time operation within a specific part of a query, such as filtering or calculating aggregate values.
  • Useful for simple scenarios where the result is only needed once.
  • Preferred when the query is simple, and readability is not a concern.

𝙲̲𝚃̲𝙴̲𝚜̲ (̲𝙲̲𝚘̲𝚖̲𝚖̲𝚘̲𝚗̲ ̲𝚃̲𝚊̲𝚋̲𝚕̲𝚎̲ ̲𝙴̲𝚡̲𝚙̲𝚛̲𝚎̲𝚜̲𝚜̲𝚒̲𝚘̲𝚗̲𝚜̲)̲

  • When dealing with more complex queries.
  • When you need to reuse the result of a query multiple times in the main query.
  • To improve the readability and organization of queries, making them easier to follow and maintain.
  • When working with multiple joins or nested queries to make the query easier to understand.
  • To reduce redundancy and improving performance as the result can be used multiple times within the same query.

Therefore, the choice of whether to use a subquery or a CTE depends on the complexity and requirements of the query.


Conclusion

Subqueries and CTEs are powerful SQL techniques essential for efficient data analysis. Gaining a solid understanding of these concepts will enhance your ability to write more effective and optimized queries. Now that you know the differences, try implementing both in your own SQL queries and see which works best for you!

I hope this guide helped clarify the differences between subqueries and CTEs!

Thanks for reading and Happy querying!!!

To view or add a comment, sign in

Others also viewed

Explore topics