Understanding Causal Analysis
In data science, we often want to understand relationships between different factors. Most predictive models can tell us what happened, but they can’t explain why it happened. Causal analysis fills this gap: it aims to identify true cause-and-effect relationships from data.
You've probably heard the phrase, "correlation does not imply causation."
Examples of correlation:
However, causal analysis tries to answer tougher questions:
To answer these questions, we might first compare results before and after a change. But this is often not enough. What if several changes happened at once, or something unrelated caused the difference in results?
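To make this concern concrete, here is a minimal simulation of a before/after comparison that goes wrong. All numbers are assumptions chosen for illustration: a change with a true effect of 2.0 is rolled out at the same time as an unrelated shift of 3.0, and the naive comparison cannot tell them apart.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 2.0      # assumed effect of the change we care about
UNRELATED_SHIFT = 3.0  # assumed effect of something else that happened at the same time


def simulate_metric(n, change_applied):
    """Draw n noisy observations of a metric, before or after the change."""
    values = []
    for _ in range(n):
        value = 10.0 + random.gauss(0, 1)
        if change_applied:
            # Both shifts arrive together, so they are indistinguishable in the data.
            value += TRUE_EFFECT + UNRELATED_SHIFT
        values.append(value)
    return values


before = simulate_metric(5000, change_applied=False)
after = simulate_metric(5000, change_applied=True)

naive_estimate = statistics.mean(after) - statistics.mean(before)
print(f"true effect:    {TRUE_EFFECT:.2f}")
print(f"naive estimate: {naive_estimate:.2f}")
```

The naive estimate lands near the sum of both shifts rather than near the true effect, which is the problem the questions above point to.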
Causal Discovery vs. Causal Inference
Causal questions can be categorised into two types.
1. Causal Discovery
Here, the goal is to find out which factors cause an outcome.
Examples:
This involves analysing the data and building causal graphs, in which nodes are variables and arrows point from cause to effect. It helps us identify candidate causal factors and generates hypotheses for experiments.
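As a rough illustration of what a causal graph looks like in code, the sketch below stores one as a plain dictionary and lists every direct or indirect cause of an outcome. The variable names (`marketing_spend`, `season`, `site_traffic`, `sales`) and the helper `causes_of` are hypothetical, and the graph is written by hand; this shows only how a graph can be represented and queried, not how to learn one from data.

```python
# Hypothetical causal graph: each key is a cause, each value lists its direct effects.
causal_graph = {
    "marketing_spend": ["site_traffic"],
    "season": ["site_traffic", "sales"],
    "site_traffic": ["sales"],
    "sales": [],
}


def causes_of(graph, outcome):
    """Return every variable with a directed path into `outcome`."""
    # Invert the edges so we can walk from an effect back to its causes.
    parents = {node: set() for node in graph}
    for cause, effects in graph.items():
        for effect in effects:
            parents[effect].add(cause)

    found = set()
    stack = [outcome]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(causes_of(causal_graph, "sales")))
```

For this hand-written graph, the call returns `marketing_spend`, `season`, and `site_traffic`: each is a candidate cause of `sales` that could then be tested in an experiment.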
2. Causal Inference
Here, the goal is to estimate how much one factor affects another.
Examples:
Two Approaches to Causal Inference
A. Experimentation
Experiments involve changing one or more things intentionally and observing what happens. A common type of experiment is the Randomized Controlled Trial (RCT), also known as an A/B test. It is considered the gold standard for measuring causal effects.
In an RCT, participants are randomly assigned to a treatment (test) group or a control group. Randomization makes the two groups similar on average, in both observed and unobserved characteristics, so a difference in outcomes between them can be attributed to the treatment.
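The simulation below sketches why random assignment works. It assumes a made-up treatment effect of 1.5 and a user-level baseline that we do not control; because a coin flip decides who gets the treatment, the baseline averages out and a simple difference in group means recovers the effect.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 1.5  # assumed treatment effect, chosen for the simulation

treatment, control = [], []
for _ in range(20000):
    baseline = random.gauss(50, 5)   # user-level trait we do not control
    noise = random.gauss(0, 1)
    if random.random() < 0.5:        # coin-flip assignment, independent of baseline
        treatment.append(baseline + TRUE_EFFECT + noise)
    else:
        control.append(baseline + noise)

# With random assignment, the difference in means estimates the treatment effect.
estimate = statistics.mean(treatment) - statistics.mean(control)
print(f"true effect: {TRUE_EFFECT:.2f}")
print(f"estimate:    {estimate:.2f}")
```

The estimate is close to the true effect but not identical to it; in a real test that remaining uncertainty would be quantified with a confidence interval or significance test.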
B. Non-Experimental Methods
Sometimes experiments aren't feasible. Instead, we use methods like:
These methods estimate causal effects from observational data, but their validity depends on assumptions that cannot always be tested, such as having measured every important confounder.
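To show what such an assumption looks like in practice, here is a minimal sketch of one non-experimental technique, stratification on a confounder. It is chosen for illustration and is not necessarily among the methods the author has in mind. The scenario and numbers are hypothetical: power users are more likely to adopt a feature and are also more engaged anyway, so a naive comparison overstates the feature's effect. The adjustment works here only because the confounder (`power_user`) is observed.

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 2.0  # assumed effect of adopting the feature on engagement

rows = []
for _ in range(20000):
    power_user = random.random() < 0.5        # confounder
    p_adopt = 0.8 if power_user else 0.2      # power users adopt more often
    adopted = random.random() < p_adopt
    engagement = (
        10.0
        + (5.0 if power_user else 0.0)        # power users are more engaged regardless
        + (TRUE_EFFECT if adopted else 0.0)
        + random.gauss(0, 1)
    )
    rows.append((power_user, adopted, engagement))


def mean_diff(data):
    """Mean engagement of adopters minus mean engagement of non-adopters."""
    treated = [y for _, a, y in data if a]
    untreated = [y for _, a, y in data if not a]
    return statistics.mean(treated) - statistics.mean(untreated)


naive = mean_diff(rows)

# Compare adopters and non-adopters within each confounder level,
# then average the differences weighted by stratum size.
adjusted = 0.0
for level in (True, False):
    stratum = [r for r in rows if r[0] == level]
    adjusted += mean_diff(stratum) * len(stratum) / len(rows)

print(f"true effect: {TRUE_EFFECT:.2f}")
print(f"naive:       {naive:.2f}")
print(f"adjusted:    {adjusted:.2f}")
```

The naive difference is inflated by the confounder, while the stratified estimate sits near the true effect. If `power_user` had not been recorded, no adjustment of this kind would be possible, which is why the assumptions behind these methods matter.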
How to Decide What Works
Choosing the right method can be tricky. It depends on:
For example, if you want to find out what variables affect an outcome, start with causal discovery. Once you've identified potential causes, define your causal inference hypothesis clearly. For instance, if users adopt feature X, how much might user engagement increase?
If you can run an experiment, run one. If not, use non-experimental methods. If these depend on assumptions you cannot justify, go back and refine your causal discovery approach.
By asking the right causal questions and choosing suitable methods, we can gain useful insights and make better decisions.