🚀 SQL Challenge: Month-Over-Month Revenue Growth! 📈

🚀 SQL Challenge: Month-Over-Month Revenue Growth! 📈

Who’s Ready to Level Up Their SQL Game?

Hey LinkedIn Network! 👋

If you’ve ever struggled with SQL problems that test your ability to analyse trends over time, this article is for you. Today, we’re dissecting a real-world SQL challenge that’s a favourite in data interviews and business analytics roles: calculating month-over-month revenue growth.

But this isn’t just another coding tutorial. It’s about:

🔹 Thinking like a data pro – mastering date manipulations and window functions.

🔹 Writing interview-ready SQL – clean, optimised code that stands out to hiring managers.

🔹 Solving problems that matter – skills you’ll use in financial reporting, product analytics, and more.

Why This Will Transform Your SQL Skills

Short posts are great for quick wins, but career growth demands deep, practical learning. By the end of this guide, you’ll:

✅ Solve complex SQL problems faster – Learn how to structure queries for scalability.

✅ Avoid rookie mistakes – like mishandling edge cases or writing unreadable queries.

✅ Interview-ready techniques – Discover how to explain your logic clearly (a skill recruiters love!).

Let’s Collaborate!

This isn’t a lecture—it’s a conversation. As you read:

💬 Drop a comment with your solution or a question—let’s learn together! Compare your approach with peers and refine your strategy.

🔗 Tag a peer – Who else would benefit from this breakdown?

🚀 Share your “aha!” moment – did you spot an optimisation I missed?

Ready to dive in? Let’s go! ⬇️


What’s the SQL Challenge?

Objective: Calculate the month-over-month (MoM) percentage change in revenue using transaction data from the sf_transactions table.

Key Requirements:

  1. Data Aggregation: Group transactions by month using created_at (datetime). Sum value (bigint) to compute monthly revenue.
  2. MoM Calculation: Apply the formula ((Current Month Revenue - Previous Month Revenue) / Previous Month Revenue) 100 *****Round results to 2 decimal places.
  3. Output Format: Columns: year_month (YYYY-MM), revenue_change_pct.Sort chronologically. Exclude the first month (no prior data for comparison).

Breakdown of the sf_transactions table structure:

Pro Tip: Always validate assumptions about keys and relationships with stakeholders or documentation!


Why Is This Problem Tricky?

  • Time-Series Logic: Requires comparing data across adjacent months, not just simple aggregation.
  • Edge Cases: Handling the first month’s NULL value and ensuring correct sorting.
  • Efficiency: Avoiding redundant calculations (e.g., computing the previous month’s revenue twice).


How Does the SQL Query Evolve?

Below is a step-by-step walkthrough of the SQL solution to calculate the month-over-month percentage change in revenue.

Step 1: Explore Raw Data

Query:

SELECT id, created_at, value, purchase_id
FROM sf_transactions
LIMIT 10;        

Purpose: View raw transactional data.

Sample Output:

Article content

Step 2: Extract Year-Month

Query:

SELECT 
  DATE_FORMAT(created_at, '%Y-%m') AS yr_mo,
  value 
FROM sf_transactions;        

Purpose: Convert created_at to a YYYY-MM format.

Sample Output:

Article content

Step 3: Aggregate Monthly Revenue

Query:

SELECT 
  DATE_FORMAT(created_at, '%Y-%m') AS yr_mo,
  SUM(value) AS total_revenue 
FROM sf_transactions 
GROUP BY yr_mo;
        

Purpose: Calculate total revenue per month.

Sample Output:

Article content

Step 4: Fetch Previous Month’s Revenue

Query:

WITH monthly_revenue AS (
  SELECT
    DATE_FORMAT(created_at, '%Y-%m') AS yr_mo,
    SUM(value) AS total_revenue,
    LAG(SUM(value)) OVER (ORDER BY DATE_FORMAT(created_at, '%Y-%m')) AS prev_month_revenue
  FROM sf_transactions
  GROUP BY yr_mo
)
SELECT * FROM monthly_revenue;
        

Purpose: Compare each month’s revenue to the prior month.

Sample Output:


Article content

Step 5: Calculate Percentage Change

Query:

WITH monthly_revenue AS (
  SELECT
    DATE_FORMAT(created_at, '%Y-%m') AS yr_mo,
    SUM(value) AS total_revenue,
		 LAG(SUM(value)) 
		    OVER (ORDER BY DATE_FORMAT(created_at, '%Y-%m')) AS prev_month_revenue
  FROM sf_transactions
  GROUP BY yr_mo
)

SELECT
  yr_mo,
  ROUND(
    (total_revenue - prev_month_revenue) 
    / NULLIF(prev_month_revenue, 0) * 100, 
    2
  ) AS revenue_change_pct
FROM monthly_revenue
ORDER BY yr_mo;
        

Purpose: Final result with MoM % change.

Sample Output:

Article content

Keyword Explanations with Syntax, Tips, and Examples


1. DATE_FORMAT()

Purpose: Converts a date/time value into a specified string format. Syntax:

DATE_FORMAT(date_column, format_string)          

Parameters:

  • date_column: The datetime column (e.g., created_at).
  • format_string: Specifies the output format using

format specifiers:

  • %Y: 4-digit year (e.g., 2023).
  • %m: 2-digit month (01-12).
  • %d: 2-digit day (01-31).
  • %H: Hour (00-23).
  • %i: Minutes (00-59).

Example:

SELECT  
  created_at,  
  DATE_FORMAT(created_at, '%Y-%m') AS yr_mo  
FROM sf_transactions;          

Tips:

  • Use %Y-%m for monthly grouping.
  • Avoid ambiguous formats like %y (2-digit year) or %c (1-12 months).
  • Always validate date formats against your database’s locale settings.
  • Use STR_TO_DATE() to reverse the process (string → datetime).

SELECT STR_TO_DATE('2023-Jan', '%Y-%b'); -- Returns 2023-01-01        

  • Cross-Database: Use TO_CHAR() in PostgreSQL and FORMAT() in SQL Server.


2. CTE (Common Table Expression)

Purpose: Creates a temporary named result set for use within a query. Syntax:

WITH cte_name AS (  
  SELECT ... FROM ...  
)  
SELECT * FROM cte_name;        

Tips:

  • Use CTEs to break complex queries into modular, readable parts.
  • CTEs can reference other CTEs (e.g., WITH cte1 AS (...), cte2 AS (...)).

WITH cte1 AS (...), cte2 AS (SELECT * FROM cte1) ...        

  • Avoid Overuse: Subqueries may be more efficient for simple queries.
  • Replace nested subqueries with CTEs for readability.
  • Use recursive CTEs for hierarchical data (e.g., organisational charts).


3. GROUP BY

Purpose: Aggregates rows into summary rows based on shared column values. Syntax:

SELECT aggregate_function(column), grouped_column  
FROM table  
GROUP BY grouped_column;          

Example:

SELECT  
  DATE_FORMAT(created_at, '%Y-%m') AS yr_mo,  
  COUNT(id) AS transactions,  
  SUM(value) AS total_revenue  
FROM sf_transactions  
GROUP BY yr_mo;        

Tips:

  • Every non-aggregated column in SELECT must be in GROUP BY.
  • Use HAVING to filter aggregated results (e.g., HAVING total_revenue > 1000).

GROUP BY yr_mo
HAVING total_revenue > 1000;
        

  • Performance: Index columns used in GROUP BY (e.g., created_at).
  • Use WITH ROLLUP for subtotals (e.g., GROUP BY yr_mo WITH ROLLUP).
  • For large datasets, pre-aggregate data in staging tables.


4. LAG()

Purpose: Accesses data from a previous row in the result set (window function). Syntax:

LAG(column, offset, default_value) OVER (  
  [PARTITION BY partition_column]  
  ORDER BY order_column  
)        

Parameters:

  • Column: The column to fetch from the prior row.
  • Offset: Number of rows back (default = 1).
  • default_value: Value returned if no prior row exists (default = NULL).
  • PARTITION BY: Divides data into groups (e.g., by region).
  • ORDER BY: Defines the row order within partitions.

Example:

SELECT  
  yr_mo,  
  total_revenue,  
  LAG(total_revenue, 1, 0) OVER (ORDER BY yr_mo) AS prev_month  
FROM monthly_revenue;        

Tips:1

  • Use COALESCE() or NULLIF() to handle NULL values from LAG().
  • Use LEAD() for future comparisons:
  • Handle NULLs gracefully:
  • Performance: Window functions are efficient but avoid over-partitioning.
  • Always include ORDER BY in the OVER() clause.
  • Use PARTITION BY to reset comparisons (e.g., MoM changes per region).

LAG() vs LEAD() with Offset=3: Unified Table & Explanation

LAG(column_name, 3):

  • "What is the value 3 rows before?": It's about looking backwards in the ordered result set.
  • "Results Start with 3 NULLs (Think because of lag or delay in starting), then start showing the values":
  • "Think LAst Gone": Looks Backward - Imagine the original column is dragged downward by 3 rows.
  • The bottom 3 floors of a building sank underground – so the bottom 3 rows are missing from ****the result table.

LEAD(column_name, 3):

  • "What is the value 3 rows ahead?": It's about looking forward to the ordered result set.
  • "Results End with 3 NULLs (because it started early or ahead)":
  • "Think Looks ahEAD": Looks Forward. - Imagine the original column is pulled upward by 3 rows.
  • Eat the first 3 layers of Dairy Milk chocolate, so the 4th layer becomes the first!

Article content

Key Analogies

  1. LAG(3) = Building Floors Sinking: The bottom 3 floors vanish underground (rows 1-3). When standing on the 4th floor, looking 3 floors down lands you on the (now-invisible) 1st floor, i.e., the 4th floor becomes the 1st floor. (Think column dragged downwards similar to building sink)
  2. LEAD(3) = Chocolate Layers Eaten: The first 3 layers are eaten. When starting at the 1st layer, looking 3 layers ahead lands you on the 4th (now the "first" edible layer). If you ate the first 3 layers, then the first layer is now 4th layer. (Think column dragged upwards similar to the diary milk chocolate after eating next layer become the first layer so on.

Why This Works

  • LAG(3): Backward-looking: NULLs appear at the top (first 3 rows) because prior data is missing. Example: LAG(3) for row 4 = row 1’s value.
  • LEAD(3): Forward-looking: NULLs appear at the bottom (last 3 rows) because future data is missing. Example: LEAD(3) for row 3 = row 6’s value (doesn’t exist → NULL).

Key Takeaway

LAG/LEAD offsets act like time travel relative to the current row:

  • LAG(3) = "What was the value 3 steps ago?" (NULL if history is too short).
  • LEAD(3) = "What will the value be 3 steps later?" (NULL if future is undefined).


What’s Your Approach to This Problem?

Take the Challenge: Rebuild the query using a date spine to include months with no transactions (treat missing values as zero). Remember, failing to handle division by zero (use NULLIF) or to account for missing months can break your analysis.

Share Your Solution: Post your optimized query in the comments—we’ll highlight the most efficient approaches!

Collaborate: Tag a data engineer or analyst who might find this useful.

Follow for More:

🔔 Follow me on LinkedIn for weekly SQL deep dives and insights.

🚀 Ready for the next challenge? Let’s continue to conquer SQL together!

To view or add a comment, sign in

Others also viewed

Explore topics