Understanding Causal Analysis

In data science, we often want to understand relationships between different factors. Most models can tell us what happened, but not why it happened. Causal analysis fills that gap by identifying true cause-and-effect relationships in data.

You've probably heard the phrase "correlation does not imply causation": two things can move together without one causing the other.

Examples of correlation:

  • Website traffic went up after ads were run.
  • Employee productivity increased after flexible working hours started.

However, causal analysis tries to answer tougher questions:

  • How much did the ad campaign actually affect website traffic?
  • Did flexible working hours really boost productivity, or were there other reasons?

To answer these questions, a natural first step is to compare results before and after the change. But this is often not enough: several changes may have happened at once, or something unrelated may have caused the difference.
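To see why a before/after comparison can mislead, here is a minimal simulation sketch. All numbers are hypothetical: it assumes the ads add 100 daily visits while an unrelated seasonal peak, which starts at the same time, adds another 200.

```python
import random
import statistics

random.seed(0)

TRUE_AD_EFFECT = 100  # hypothetical: extra daily visits caused by the ads
SEASONAL_LIFT = 200   # hypothetical: extra daily visits from an unrelated seasonal peak


def daily_visits(ads_running, in_season):
    """Simulate one day of traffic: noisy baseline plus whichever effects are active."""
    base = 1000 + random.gauss(0, 20)
    return base + TRUE_AD_EFFECT * ads_running + SEASONAL_LIFT * in_season


# The ads and the seasonal peak start at the same time.
before = [daily_visits(ads_running=0, in_season=0) for _ in range(500)]
after = [daily_visits(ads_running=1, in_season=1) for _ in range(500)]

naive_estimate = statistics.mean(after) - statistics.mean(before)
print(f"True ad effect: {TRUE_AD_EFFECT}")
print(f"Naive before/after estimate: {naive_estimate:.1f}")
```

The naive estimate lands near 300, three times the true effect, because the comparison cannot separate the ads from the seasonal lift.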

Causal Discovery vs. Causal Inference

Causal questions can be categorised into two types.

1. Causal Discovery

Here, the goal is to find out which factors cause an outcome.

Examples:

  • What factors lower user retention?
  • Which marketing channels drive the most traffic?

This involves analysing data and building causal graphs that show how the factors relate to each other. It helps us identify likely causal factors and generates ideas for experiments.
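As a rough illustration of what a causal graph looks like in code, the sketch below stores a graph as a plain dictionary and lists every variable with a directed path into an outcome. The variable names and edges are made up for the example, and the code does not perform discovery itself: it only shows the kind of structure a discovery step would hand over.

```python
# Hypothetical causal graph: each key is a cause, each value lists its direct effects.
causal_graph = {
    "onboarding_length": ["early_engagement"],
    "app_crashes": ["early_engagement", "retention"],
    "early_engagement": ["retention"],
    "retention": [],
}


def causes_of(graph, outcome):
    """Return every variable that has a directed path into `outcome`."""
    # Invert the graph so we can walk from an effect back to its causes.
    parents = {node: set() for node in graph}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)

    found, stack = set(), [outcome]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(causes_of(causal_graph, "retention")))
# ['app_crashes', 'early_engagement', 'onboarding_length']
```

Each variable returned is a candidate to test later with an experiment or an inference method.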

2. Causal Inference

Here, the goal is to measure how much one factor affects another.

Examples:

  • How much did the new feature improve user engagement?
  • What's the impact of a price change on sales?
  • Did a new marketing campaign increase customer retention?

Two Approaches to Causal Inference

A. Experimentation

Experiments involve intentionally changing one or more things and observing what happens. A common type is the Randomized Controlled Trial (RCT), often called an A/B test. It is considered the gold standard for measuring true effects.

In an RCT, participants are randomly assigned to a treatment (test) group or a control group. Randomization makes the two groups similar on average, so any difference in outcomes can be attributed to the treatment.
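The logic of random assignment can be sketched with a small simulation. The numbers are assumptions chosen for illustration: baseline engagement of about 30 minutes per user and a true treatment effect of 5 minutes.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 5.0  # hypothetical lift in engagement minutes caused by the treatment

# Each user has their own baseline engagement, which we never observe directly.
baselines = [random.gauss(30, 8) for _ in range(10_000)]

treatment, control = [], []
for baseline in baselines:
    # A coin flip decides the group, so assignment is unrelated to the baseline.
    if random.random() < 0.5:
        treatment.append(baseline + TRUE_EFFECT)
    else:
        control.append(baseline)

estimated_effect = statistics.mean(treatment) - statistics.mean(control)
print(f"True effect: {TRUE_EFFECT}")
print(f"Estimated effect: {estimated_effect:.2f}")
```

Because the coin flip ignores each user's baseline, the simple difference in group means lands close to the true effect.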

B. Non-Experimental Methods

Sometimes experiments aren't feasible. Instead, we use methods like:

  • Propensity Score Matching
  • Difference-in-Differences
  • Bayesian Structural Time Series

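These methods estimate causal effects from observational data, but each depends on assumptions that the data alone cannot fully verify. Difference-in-Differences, for example, assumes the treated and control groups would have followed parallel trends without the treatment.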
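Of the three methods, Difference-in-Differences is the easiest to show with plain arithmetic. The sketch below uses hypothetical average weekly sales for a region that received a change ("treated") and one that did not ("control"). It assumes the control group's change is what the treated group would have experienced without the change.

```python
# Hypothetical average weekly sales per group and period.
sales = {
    ("treated", "before"): 100.0,
    ("treated", "after"): 130.0,
    ("control", "before"): 90.0,
    ("control", "after"): 105.0,
}


def difference_in_differences(means):
    """Change in the treated group minus change in the control group."""
    treated_change = means[("treated", "after")] - means[("treated", "before")]
    control_change = means[("control", "after")] - means[("control", "before")]
    return treated_change - control_change


effect = difference_in_differences(sales)
print(effect)  # 15.0
```

The treated region grew by 30 and the control region by 15, so only the remaining 15 is attributed to the change.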

How to Decide What Works

Choosing the right method can be tricky. It depends on:

  • The causal question you're asking
  • The data you have
  • The assumptions you're comfortable making

For example, if you want to find out what variables affect an outcome, start with causal discovery. Once you've identified potential causes, state your causal inference hypothesis clearly, such as: if users adopt feature X, by how much does user engagement increase?

If you can run an experiment, do so. If not, use non-experimental methods. If those methods require assumptions you are not comfortable making, go back and refine your causal discovery approach.

Matching the method to the causal question, the data, and the assumptions leads to more useful insights and better decisions.
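The decision flow described here can be written down as a small helper. This is only a sketch: the function name and its two yes/no inputs are invented for illustration, and a real decision would also weigh the data and assumptions listed earlier.

```python
def choose_approach(candidate_causes_known, experiment_feasible):
    """Map the two questions from the text to a suggested starting point."""
    if not candidate_causes_known:
        # We do not yet know which factors matter, so find them first.
        return "causal discovery"
    if experiment_feasible:
        return "experiment (RCT or A/B test)"
    return "non-experimental method"


print(choose_approach(candidate_causes_known=False, experiment_feasible=True))
# causal discovery
print(choose_approach(candidate_causes_known=True, experiment_feasible=False))
# non-experimental method
```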
