Understanding Causal Analysis

Charuka Gunawardhane

Data Scientist at OCTAVE | Microsoft Certified: Azure Data Scientist Associate | Solving complex business problems & building data-driven, scalable systems. And creating impactful, user-friendly applications.

Published May 13, 2025

In data science, we often want to understand relationships between different factors. Most models can tell us what happened, but not why it happened. Causal analysis exists to close that gap and identify true cause-and-effect relationships.

Causal analysis helps us separate true causes and effects from patterns that merely occur together in the data. You've probably heard the phrase, "correlation does not imply causation."

Examples of correlation:

Website traffic went up after ads were run.
Employee productivity increased after flexible working hours started.

However, causal analysis tries to answer tougher questions:

How much did the ad campaign actually affect website traffic?
Did flexible working hours really boost productivity, or were there other reasons?

To answer these questions, we might first compare results before and after a change. But this is often not enough. What if several changes happened at once, or something unrelated drove the difference?
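A minimal sketch of why a before-and-after comparison can mislead. Every number here (the baseline, the organic growth rate, the ad effect) is an assumption made up for illustration, not real data. Traffic is simulated so that it grows on its own while the ads also add a fixed lift.

```python
from statistics import mean

# All numbers are hypothetical, chosen only to illustrate the problem.
BASE_VISITS = 1000      # assumed visits on day 0
TREND_PER_DAY = 10      # assumed organic growth, unrelated to the ads
TRUE_AD_EFFECT = 50     # assumed real lift from the ad campaign
AD_START_DAY = 10       # ads run from this day onward


def visits(day: int) -> int:
    """Daily visits = baseline + organic trend + ad lift (once ads are running)."""
    ad_lift = TRUE_AD_EFFECT if day >= AD_START_DAY else 0
    return BASE_VISITS + TREND_PER_DAY * day + ad_lift


before = [visits(day) for day in range(0, AD_START_DAY)]
after = [visits(day) for day in range(AD_START_DAY, 2 * AD_START_DAY)]

# Naive estimate: average after the change minus average before it.
naive_estimate = mean(after) - mean(before)

print(f"Naive before/after estimate: {naive_estimate}")
print(f"True ad effect:              {TRUE_AD_EFFECT}")
```

In this toy setup the naive comparison credits the ads with 150 extra visits per day, although only 50 come from the campaign. The rest is growth that would have happened anyway.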

Causal Discovery vs. Causal Inference

Causal questions can be categorised into two types.

1. Causal Discovery

Here, the goal is to find out which factors cause an outcome.

Examples:

What factors cause lower user retention?
Which marketing channels drive the most traffic?

This involves examining the data and building causal graphs that show how the variables relate to each other. It helps us identify likely causal factors and generates ideas for experiments.
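As a rough illustration of what a causal graph can look like in code, the sketch below stores a small graph as a dictionary and reads off which variables influence both a treatment and an outcome. The graph is specified by hand and its variables and edges are hypothetical. This is not a discovery algorithm, only a way to inspect a graph once one has been proposed.

```python
# A hand-specified causal graph: each key causes the variables in its list.
# The variables and edges are hypothetical, for illustration only.
causal_graph = {
    "season": ["ad_spend", "website_traffic"],
    "marketing_budget": ["ad_spend"],
    "ad_spend": ["website_traffic"],
    "website_traffic": [],
}


def ancestors(graph: dict, node: str) -> set:
    """Return every variable with a directed path into `node` (graph must be acyclic)."""
    parents = {cause for cause, effects in graph.items() if node in effects}
    found = set(parents)
    for parent in parents:
        found |= ancestors(graph, parent)
    return found


def without_node(graph: dict, node: str) -> dict:
    """Copy of the graph with `node` and all edges into it removed."""
    return {
        cause: [effect for effect in effects if effect != node]
        for cause, effects in graph.items()
        if cause != node
    }


def common_causes(graph: dict, treatment: str, outcome: str) -> set:
    """Variables that affect the treatment and also affect the outcome
    through some path that does not pass through the treatment."""
    affects_treatment = ancestors(graph, treatment)
    affects_outcome_directly = ancestors(without_node(graph, treatment), outcome)
    return affects_treatment & affects_outcome_directly


confounders = common_causes(causal_graph, "ad_spend", "website_traffic")
print(confounders)
```

In this hypothetical graph, season is flagged because it influences both ad spend and traffic, while marketing_budget is not, since it reaches traffic only through ad spend.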

2. Causal Inference

Here, the goal is to measure how much one factor affects another.

Examples:

How much did the new feature improve user engagement?
What's the impact of a price change on sales?
Did a new marketing campaign increase customer retention?

Two Approaches to Causal Inference

A. Experimentation

Experiments involve changing one or more things intentionally and observing what happens. A common type of experiment is the Randomized Controlled Trial (RCT), often called an A/B test. It is considered the gold standard for measuring true effects.

In an RCT, participants are randomly assigned to a treatment (test) group or a control group. Randomization makes the two groups similar on average, so a difference in their outcomes can be attributed to the treatment.
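The idea can be sketched with a small simulation. The function name simulate_ab_test, the engagement scale, and the effect size are all assumptions made for this example. Users are assigned by a coin flip, and the effect is estimated as the difference in group means.

```python
import random
from statistics import mean


def simulate_ab_test(n_users: int, true_effect: float, seed: int = 0) -> float:
    """Simulate a randomized experiment and return the estimated effect."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n_users):
        # Each user's baseline engagement varies for reasons we do not observe.
        baseline = rng.gauss(10.0, 2.0)
        # A coin flip decides the group, independently of the baseline.
        if rng.random() < 0.5:
            treatment.append(baseline + true_effect)
        else:
            control.append(baseline)
    # With random assignment, the difference in group means estimates the effect.
    return mean(treatment) - mean(control)


estimate = simulate_ab_test(n_users=10_000, true_effect=1.5)
print(f"Estimated effect: {estimate:.2f} (simulated true effect: 1.5)")
```

Because assignment does not depend on the baseline, the unobserved variation averages out across the two groups, and the estimate should land close to the simulated effect.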

B. Non-Experimental Methods

Sometimes experiments aren't feasible. Instead, we use methods like:

Propensity Score Matching
Difference-in-Differences
Bayesian Structural Time Series

These methods estimate causal effects from observational data, but their results are only as reliable as the assumptions they rest on.
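To make one of these methods concrete, here is a minimal sketch of the difference-in-differences calculation. The store groups and sales figures are hypothetical. The estimate is only meaningful under the parallel-trends assumption, meaning the treated group would have changed by the same amount as the control group had nothing been done.

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    """Average outcome for one group before and after the change."""
    pre: float
    post: float

    @property
    def change(self) -> float:
        return self.post - self.pre


def difference_in_differences(treated: GroupMeans, control: GroupMeans) -> float:
    """Treated group's change minus the control group's change."""
    return treated.change - control.change


# Hypothetical average weekly sales per store (illustrative numbers only).
treated_stores = GroupMeans(pre=100.0, post=130.0)  # received the price change
control_stores = GroupMeans(pre=90.0, post=105.0)   # did not

effect = difference_in_differences(treated_stores, control_stores)
print(f"Difference-in-differences estimate: {effect}")
```

The treated stores rose by 30 and the control stores by 15, so the estimate attributes 15 to the price change and treats the other 15 as background movement shared by both groups.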

How to Decide What Works

Choosing the right method can be tricky. It depends on:

The causal question you're asking
The data you have
The assumptions you're comfortable making

For example, if you want to find out which variables affect an outcome, start with causal discovery. Once you've identified potential causes, state your causal inference hypothesis clearly. For instance: if users adopt feature X, how much does user engagement increase?

If you can run an experiment, do so. If not, use non-experimental methods. If these require extra assumptions, go back and refine your causal discovery approach.

By asking the right causal questions and choosing suitable methods, we can gain useful insights and make better decisions.
