Causality in Python PyCon 2021 ISRAEL

Causal Python
PyCon Israel 2021
Dr. Hanan Shteingart

Summary
Data + Assumptions → Causal Inference → Better Decisions

Mission Impossible
A typical causal workshop… PyCon 2021
45 min!
https://github.com/amit-sharma/causal-inference-tutorial

Agenda
MOTIVATION · THEORY · PRACTICE

Motivation
The need to go beyond predictions

Most Important Business Question?
What ACTIONS should I take to maximize my KPIs?
• ACTIONS: business interventions
• KPI(s): the outcome(s) you care about

Three Layers of Analytics
1. There are three types of analytic questions.
2. What the business needs is better decisions (not better predictions).
3. "There is a gap between making a prediction and making a decision" - S. Athey 2017, Science.
Griffin, D. K. (2020).
Athey, S. (2017).
Bertsimas, D., & Kallus, N. (2020).

Prescriptive is Neglected
• Prescriptive methods seem to be neglected
• What is the effect of doing an action A?
• What is the optimal policy π to maximize the KPI(s)?
https://www.kaggle.com/kaggle-survey-2020

Most $$$ in AI will be in 2 areas!
• Two main target markets:
  • Marketing & Sales
  • Supply-chain management and manufacturing
• What's common?
  • Increase some KPI – 𝑅
  • by doing some actions – 𝐴
  • in some context – 𝑆
  • under some constraints – 𝐶
• Causal Inference and Reinforcement Learning!
https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/most-of-ais-business-uses-will-be-in-two-areas

Don’t Believe Me? (2)
• "causal inference helps us provide a better user experience for customers on the Uber platform"
• "we rely on quasi-experiments and causal inference methods, especially to measure new marketing and advertising ideas."
• "we analyze marketing campaigns and the impact of app preloads using a fourth type of observational study format."
• "figuring out whether booking an attraction ticket increases long-term user engagement"
• "Leveraging a market-level approach to measure landing page effectiveness on Airbnb" (difference in differences)

Summary
$P(X)$
$P(Y \mid X)$
$P(Y \mid X, do(T))$

Why Is Causal Inference Different from Supervised Learning?

Causal Inference Main Concepts
• Directed Acyclic Causal Graph (DACG): the truth about what causes what, e.g. $T \to Y$
• Causal Discovery: find the causal dependencies given a dataset (otherwise you need an expert)
• Potential Outcomes: what would have happened if (only) the treatment was set to $t$: $Y^{t} = Y \mid do(T = t)$
• Causal Inference: find the average treatment effect (ATE): $\tau_{ATE} = E(Y^{1}) - E(Y^{0})$
• CATE/ITE/HTE: what is the effect per unit? $\tau_{ITE}(X) = E(Y^{1} \mid X) - E(Y^{0} \mid X)$
• Policy Evaluation: what is the value of a policy $\pi(X) = \Pr(T \mid X)$? $PE = E_{T \sim \pi(X)}(Y)$
• Policy Optimization: what is the best policy? $PO = \arg\max_{\pi} E_{T \sim \pi(X)}(Y)$
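To make these definitions concrete, here is a minimal Python sketch in a toy "oracle" world. It is a hypothetical example: it assumes we know both potential outcomes of every unit (real data never reveals both) and a single binary covariate x. It computes the ATE, the CATE for each value of x, the value of a policy given as π(x) = Pr(T = 1 | x), and the best deterministic policy by brute force.

```python
from itertools import product

# Hypothetical oracle data: (x, y0, y1) per unit. Real data never reveals both y0 and y1.
UNITS = [
    (0, 1.0, 2.0),
    (0, 0.0, 1.0),
    (1, 3.0, 2.5),
    (1, 2.0, 1.5),
]


def ate(units):
    """Average treatment effect: E[Y^1] - E[Y^0]."""
    return sum(y1 - y0 for _, y0, y1 in units) / len(units)


def cate(units, x):
    """Conditional average treatment effect for the units with covariate value x."""
    diffs = [y1 - y0 for xi, y0, y1 in units if xi == x]
    return sum(diffs) / len(diffs)


def policy_value(units, policy):
    """E_{T ~ pi(X)}[Y], where policy(x) returns Pr(T = 1 | x)."""
    total = 0.0
    for x, y0, y1 in units:
        p = policy(x)
        total += p * y1 + (1 - p) * y0
    return total / len(units)


def best_deterministic_policy(units, xs=(0, 1)):
    """Policy optimization by brute force over all rules x -> {0, 1}."""
    best_rule, best_value = None, float("-inf")
    for choice in product((0, 1), repeat=len(xs)):
        rule = dict(zip(xs, choice))
        value = policy_value(units, rule.__getitem__)
        if value > best_value:
            best_rule, best_value = rule, value
    return best_rule, best_value


print(ate(UNITS))                          # 0.25
print(cate(UNITS, 0), cate(UNITS, 1))      # 1.0 -0.5
print(policy_value(UNITS, lambda x: 1.0))  # treat everyone: 1.75
print(best_deterministic_policy(UNITS))    # ({0: 1, 1: 0}, 2.0)
```

Treating only x = 0 beats both "treat everyone" (1.75) and "treat no one" (1.5), because the CATE is negative for x = 1 even though the ATE is positive.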

What is the Fundamental Problem?
• The counterfactual is a missing-data problem
• Play make-believe with potential outcomes
https://www.bradyneal.com/causal-inference-course
i | T | Y | $Y^1$ | $Y^0$ | $\tau = Y^1 - Y^0$
1 | 0 | 0 | ? | 0 | ?
2 | 1 | 1 | 1 | ? | ?
3 | 1 | 0 | 0 | ? | ?
4 | 0 | 1 | ? | 1 | ?
5 | 0 | 1 | ? | 1 | ?
(T: treatment, Y: outcome, $Y^1$ and $Y^0$: potential outcomes, $\tau$: individual treatment effect)
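The missing-data view can be spelled out in a few lines of Python using the five units from the table. The sketch fills in only the potential outcome that matches the observed treatment, leaves the other as None, and computes the naive difference in means E[Y | T = 1] - E[Y | T = 0], which estimates the ATE only under extra assumptions (for example, randomized treatment).

```python
# The five units from the table: (i, T, Y).
ROWS = [(1, 0, 0), (2, 1, 1), (3, 1, 0), (4, 0, 1), (5, 0, 1)]


def potential_outcome_table(rows):
    """Return (i, T, Y, Y1, Y0, tau), with None marking the missing counterfactuals."""
    table = []
    for i, t, y in rows:
        y1 = y if t == 1 else None  # Y^1 is observed only for treated units
        y0 = y if t == 0 else None  # Y^0 is observed only for control units
        tau = None                  # needs both Y^1 and Y^0, so it is never observed
        table.append((i, t, y, y1, y0, tau))
    return table


def naive_difference(rows):
    """E[Y | T=1] - E[Y | T=0]: equals the ATE only under assumptions such as randomization."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


for row in potential_outcome_table(ROWS):
    print(row)
print(round(naive_difference(ROWS), 3))  # 0.5 - 0.667 = -0.167
```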

Quiz
• Exercise is known to reduce cholesterol levels
• You collected a medical dataset and plotted these variables against each other
• What can explain this?

Confounders Create Bias in Effect Estimation
• Age is a confounder which affects both the treatment (Exercise) and the outcome (Cholesterol)
• This creates a bias!
https://towardsdatascience.com/implementing-causal-inference-a-key-step-towards-agi-de2cde8ea599
[Figure: two causal graphs over X, T and Y]
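With the roles from the quiz (X = age, T = exercise, Y = cholesterol), a small simulation shows how a confounder can flip the sign. The data-generating process below is hypothetical, with a true effect of exercise of -10, yet the naive difference in means comes out positive: about 40 × (0.8 - 0.2) - 10 = +14, because older people make up 80% of exercisers but only 20% of non-exercisers.

```python
import random


def simulate(n=20_000, seed=0):
    """Hypothetical data-generating process (an assumption for illustration):
    older people exercise more AND have higher cholesterol; exercise itself lowers it by 10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        old = rng.random() < 0.5                                 # confounder X: age group
        exercise = rng.random() < (0.8 if old else 0.2)          # treatment T depends on X
        chol = 180 + 40 * old - 10 * exercise + rng.gauss(0, 5)  # outcome Y depends on X and T
        rows.append((old, exercise, chol))
    return rows


def mean(values):
    return sum(values) / len(values)


rows = simulate()
naive = mean([y for _, t, y in rows if t]) - mean([y for _, t, y in rows if not t])
print(f"true effect: -10, naive difference: {naive:.1f}")
```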

Beyond Confounders
Red lines should not be accounted for.
Lederer et al., 2019

Identifiability
• The ability to estimate a causal effect from observed data.
• If the following assumptions hold, then
$E(Y^{a} \mid X = x) = E(Y \mid A = a, X = x)$
$E(Y^{a}) = E_{X}\left[E(Y^{a} \mid X)\right]$
1. Stable Unit Treatment Value Assumption (SUTVA): for $i \neq j$: $A_i \perp A_j$ and $Y_i \perp A_j$
2. Consistency: $A = a \Rightarrow Y = Y^{a} \quad \forall a$
3. Ignorability: $(Y^{0}, Y^{1}) \perp A \mid X$
4. Positivity: $P(A = a \mid X = x) > 0 \quad \forall a, x$
Not to be confused with the law of total expectation
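Positivity can be checked directly in the data. The sketch below is a hypothetical helper (not from a specific library) that, for a discrete covariate and a binary treatment, lists the strata where one of the two arms is empty, so that E(Y | A = a, X = x) cannot be computed there.

```python
from collections import defaultdict


def positivity_violations(records):
    """Return the strata x where one treatment arm is empty (empirical P(A = a | X = x) = 0).

    records: iterable of (x, a) pairs, with a binary treatment a in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [n_control, n_treated]
    for x, a in records:
        counts[x][a] += 1
    return sorted(x for x, (n0, n1) in counts.items() if n0 == 0 or n1 == 0)


# Hypothetical example: nobody under 30 in this sample was ever treated.
records = [("<30", 0), ("<30", 0), ("30-50", 0), ("30-50", 1), (">50", 1), (">50", 0)]
print(positivity_violations(records))  # ['<30']
```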

Quiz: which assumption is this?

What to control for (what is 𝑋)?
When the causal DAG is complicated, do-calculus (Pearl) helps with identification
• Input: DAG + data
• Output: identification (a recipe for how to estimate the effect)
Sacerdote, et al International journal of epidemiology 2012
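For simple graphs, a sufficient identification rule is to adjust for the parents of the treatment: if all of them are observed, they block every back-door path. The sketch below implements only that rule (it is not do-calculus, which handles cases this rule cannot), on a hypothetical DAG where Age is a confounder, Weight is a mediator and Hospitalized is a collider. Descendants of the treatment, such as mediators and colliders, should not be added to the adjustment set.

```python
def parents(dag, node):
    """Nodes with an edge into `node`; dag maps each node to a list of its children."""
    return {u for u, children in dag.items() if node in children}


def descendants(dag, node):
    """All nodes reachable from `node` by following edges forward."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def parent_adjustment_set(dag, treatment):
    """Adjust for the treatment's parents: a valid back-door set when all of them are observed."""
    return parents(dag, treatment)


# Hypothetical DAG for the quiz.
DAG = {
    "Age": ["Exercise", "Cholesterol"],
    "Exercise": ["Weight", "Cholesterol", "Hospitalized"],
    "Weight": ["Cholesterol"],
    "Cholesterol": ["Hospitalized"],
}
print(parent_adjustment_set(DAG, "Exercise"))  # {'Age'}
print(sorted(descendants(DAG, "Exercise")))    # ['Cholesterol', 'Hospitalized', 'Weight']
```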

Estimation Methods
1. Stratification – aggregate over strata
If $E(Y \mid A = a, X = x) = E(Y^{a} \mid X = x)$, then:
$E(Y^{a}) = \sum_{x} P(X = x)\, E(Y \mid A = a, X = x)$
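The stratification formula translates almost line by line into code. The sketch uses a hypothetical data-generating process (a binary age group X, exercise A, cholesterol Y, true effect -10); the per-stratum means play the role of E(Y | A = a, X = x) and the stratum shares play the role of P(X = x). It assumes both arms appear in every stratum (positivity).

```python
import random
from collections import defaultdict


def simulate(n=20_000, seed=0):
    """Hypothetical data: x = old (bool), a = exercise (bool), y = cholesterol; true effect -10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        a = rng.random() < (0.8 if x else 0.2)
        y = 180 + 40 * x - 10 * a + rng.gauss(0, 5)
        rows.append((x, a, y))
    return rows


def standardized_mean(rows, a):
    """E[Y^a] = sum_x P(X = x) * E[Y | A = a, X = x]."""
    n_x = defaultdict(int)     # stratum sizes, for P(X = x)
    sums = defaultdict(float)  # sum of Y within (X = x, A = a)
    counts = defaultdict(int)  # size of (X = x, A = a)
    for x, ai, y in rows:
        n_x[x] += 1
        if ai == a:
            sums[x] += y
            counts[x] += 1
    return sum(n_x[x] / len(rows) * sums[x] / counts[x] for x in n_x)


rows = simulate()
ate = standardized_mean(rows, True) - standardized_mean(rows, False)
print(f"stratified ATE: {ate:.1f}")  # close to the true effect of -10
```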

Estimation Methods
$E(Y^{a}) = \sum_{x} P(X = x)\, E(Y \mid A = a, X = x)$
2. Matching – find “twins” in high dimension
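A minimal sketch of matching, assuming a single continuous covariate (age) for brevity and hypothetical data with a true effect of -10: each treated unit is paired with the control of closest age (with replacement), which estimates the effect on the treated. With many covariates the same idea needs a multivariate distance, such as the Euclidean distance on standardized covariates, and close "twins" become hard to find.

```python
import bisect
import random


def simulate(n=4_000, seed=1):
    """Hypothetical data: P(exercise) grows with age, and the true effect is -10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 70)
        a = rng.random() < (age - 20) / 50
        y = 150 + 1.0 * age - 10 * a + rng.gauss(0, 5)
        rows.append((age, a, y))
    return rows


def matched_att(rows):
    """1-nearest-neighbour matching on age, with replacement: pair every treated unit with the
    control of closest age and average the outcome differences (effect on the treated)."""
    controls = sorted((age, y) for age, a, y in rows if not a)
    ages = [age for age, _ in controls]
    diffs = []
    for age, a, y in rows:
        if not a:
            continue
        k = bisect.bisect_left(ages, age)
        # In a sorted list, the closest age sits at index k - 1 or k.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(ages)]
        best = min(candidates, key=lambda j: abs(ages[j] - age))
        diffs.append(y - controls[best][1])
    return sum(diffs) / len(diffs)


def mean(values):
    return sum(values) / len(values)


rows = simulate()
naive = mean([y for _, a, y in rows if a]) - mean([y for _, a, y in rows if not a])
att = matched_att(rows)
print(f"naive: {naive:.1f}, matched: {att:.1f}")  # naive is biased upwards; matched is near -10
```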

Propensity Score
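• Define $A = 1$ for treatment and $A = 0$ for control; we denote the propensity score for subject $i$ by
$\pi_i = \Pr(A = 1 \mid X_i)$
• The propensity score is a “balancing score”: units with the same score have the same covariate distribution in both arms,
$P(X \mid \pi(X) = p, A = 1) = P(X \mid \pi(X) = p, A = 0)$
so if we control/match for it, we get an unbiased effect estimate (under ignorability and positivity)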
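The balancing property can be checked by simulation. In the hypothetical setup below, two binary covariates drive treatment only through the propensity π(x) = 0.2 + 0.3 (x1 + x2), so the covariate patterns (1, 0) and (0, 1) share π = 0.5. Overall the arms are imbalanced, but within the level π = 0.5 the share of pattern (1, 0) should be about the same among treated and control units.

```python
import random


def simulate(n=40_000, seed=2):
    """Hypothetical data: treatment depends on (x1, x2) only through pi(x) = 0.2 + 0.3 * (x1 + x2)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (int(rng.random() < 0.5), int(rng.random() < 0.5))
        pi = 0.2 + 0.3 * (x[0] + x[1])
        a = int(rng.random() < pi)
        rows.append((x, pi, a))
    return rows


def share_of(rows, x, p, a):
    """Empirical P(X = x | pi(X) = p, A = a)."""
    group = [xi for xi, pi, ai in rows if abs(pi - p) < 1e-9 and ai == a]
    return sum(xi == x for xi in group) / len(group)


def overall_share(rows, x, a):
    """Empirical P(X = x | A = a), ignoring the propensity score."""
    group = [xi for xi, _, ai in rows if ai == a]
    return sum(xi == x for xi in group) / len(group)


rows = simulate()
# Without conditioning on pi, the arms differ: pattern (1, 1) is ~0.4 of treated vs ~0.1 of controls.
print(overall_share(rows, (1, 1), 1), overall_share(rows, (1, 1), 0))
# Within pi = 0.5, pattern (1, 0) is ~0.5 of both treated and control units.
s1, s0 = share_of(rows, (1, 0), 0.5, 1), share_of(rows, (1, 0), 0.5, 0)
print(s1, s0)
```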

Estimation Methods
$E(Y^{a}) = \sum_{x} P(X = x)\, E(Y \mid A = a, X = x)$
2. Matching – find “twins” in high dimension
3. Propensity Matching – find twins in one dimension
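Propensity matching collapses many covariates into one number. The sketch below assumes two hypothetical confounders (age and BMI) and, to stay short, a known logistic propensity model; in practice the propensity score is estimated, e.g. by logistic regression. Matching then becomes a 1-D nearest-neighbour search on the score, and the estimate targets the effect on the treated (true value -10 here).

```python
import bisect
import math
import random


def propensity(age, bmi):
    """Assumed, known propensity model (in practice it would be estimated from data)."""
    z = 0.08 * (age - 45) + 0.15 * (bmi - 29)
    return 1 / (1 + math.exp(-z))


def simulate(n=6_000, seed=3):
    """Hypothetical data with two confounders and a true treatment effect of -10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age, bmi = rng.uniform(20, 70), rng.uniform(18, 40)
        a = rng.random() < propensity(age, bmi)
        y = 100 + 1.0 * age + 0.5 * bmi - 10 * a + rng.gauss(0, 5)
        rows.append((propensity(age, bmi), a, y))
    return rows


def propensity_matched_att(rows):
    """Match each treated unit to the control with the closest propensity score (with replacement)."""
    controls = sorted((p, y) for p, a, y in rows if not a)
    scores = [p for p, _ in controls]
    diffs = []
    for p, a, y in rows:
        if a:
            k = bisect.bisect_left(scores, p)
            nearest = min((j for j in (k - 1, k) if 0 <= j < len(scores)),
                          key=lambda j: abs(scores[j] - p))
            diffs.append(y - controls[nearest][1])
    return sum(diffs) / len(diffs)


rows = simulate()
att = propensity_matched_att(rows)
print(f"propensity-matched ATT: {att:.1f}")  # close to -10
```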

Inverse Propensity Weighting
• $\pi_i = \Pr(A_i = 1 \mid X_i = x_i)$
• $ATE = E(Y^{1} - Y^{0}) \approx \frac{1}{n} \sum_{i} Y_i \frac{A_i - \pi_i}{\pi_i (1 - \pi_i)}$
It can be shown that IPTW and standardization are equivalent (Technical Point 2.3, see Appendix)
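The IPW estimator and its equivalence with standardization can both be checked numerically. In this sketch (hypothetical data, discrete confounder x, true effect -10), the propensity is estimated nonparametrically as the treated share within each stratum; with that choice, the IPW formula and the standardized estimate agree up to floating-point error.

```python
import random
from collections import defaultdict


def simulate(n=20_000, seed=4):
    """Hypothetical data: discrete confounder x in {0, 1, 2}, true effect -10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(3)
        a = int(rng.random() < (0.2, 0.5, 0.8)[x])
        y = 180 + 20 * x - 10 * a + rng.gauss(0, 5)
        rows.append((x, a, y))
    return rows


def propensity_by_stratum(rows):
    """pi(x) = Pr(A = 1 | X = x), estimated by the treated share in each stratum."""
    n, n1 = defaultdict(int), defaultdict(int)
    for x, a, _ in rows:
        n[x] += 1
        n1[x] += a
    return {x: n1[x] / n[x] for x in n}


def ipw_ate(rows):
    """ATE estimate: (1/n) * sum_i Y_i (A_i - pi_i) / (pi_i (1 - pi_i))."""
    pi = propensity_by_stratum(rows)
    return sum(y * (a - pi[x]) / (pi[x] * (1 - pi[x])) for x, a, y in rows) / len(rows)


def standardized_ate(rows):
    """sum_x P(X = x) [E(Y | A = 1, X = x) - E(Y | A = 0, X = x)]."""
    groups = defaultdict(list)
    for x, a, y in rows:
        groups[(x, a)].append(y)
    total = 0.0
    for x in {x for x, _, _ in rows}:
        treated, control = groups[(x, 1)], groups[(x, 0)]
        p_x = (len(treated) + len(control)) / len(rows)
        total += p_x * (sum(treated) / len(treated) - sum(control) / len(control))
    return total


rows = simulate()
print(f"IPW: {ipw_ate(rows):.3f}, standardization: {standardized_ate(rows):.3f}")
```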

Estimation Methods
$E(Y^{a}) = \sum_{x} P(X = x)\, E(Y \mid A = a, X = x)$
2. Matching – find “twins” in high dimension
3. Propensity Matching – find twins in one dimension
4. IPTW – Inverse Propensity Treatment Weighting

Better Predictions ↛ Better Effect Estimation
Which model is more accurate?
• Model A is more accurate on the outcome
• Model B is more accurate on the causal effect
• “The effect of ads is positive, and in small companies it is twice the effect in large ones”
Uplift in CTR (effect), small / large companies:
• True effect: 1 / 0.5
• Model A: 0.25 / -1
• Model B: 1 / 0.5
CTR outcome, small untreated / small treated / large untreated / large treated:
• True outcome (unknown): 1 / 2 / 5 / 5.5
• Model A: 1.5 / 1.75 / 5.5 / 4.5
• Model B: -0.5 / 0.5 / 3 / 3.5
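The two models can be compared with the numbers shown in the charts. Model A has the lower outcome error, which is what a hold-out set measures, while Model B recovers the effects exactly, which could only be verified with the unobservable true effects.

```python
# Numbers read off the charts: CTR outcome per (company size, treated?).
TRUE = {("small", 0): 1.0, ("small", 1): 2.0, ("large", 0): 5.0, ("large", 1): 5.5}
MODEL_A = {("small", 0): 1.5, ("small", 1): 1.75, ("large", 0): 5.5, ("large", 1): 4.5}
MODEL_B = {("small", 0): -0.5, ("small", 1): 0.5, ("large", 0): 3.0, ("large", 1): 3.5}


def outcome_mae(model):
    """Mean absolute error of the predicted outcomes (what a hold-out set can measure)."""
    return sum(abs(model[k] - TRUE[k]) for k in TRUE) / len(TRUE)


def effects(model):
    """Predicted uplift per segment: outcome if treated minus outcome if untreated."""
    return {s: model[(s, 1)] - model[(s, 0)] for s in ("small", "large")}


def effect_mae(model):
    """Mean absolute error of the predicted effects (needs the unobservable true effects)."""
    true_effects = effects(TRUE)
    return sum(abs(e - true_effects[s]) for s, e in effects(model).items()) / 2


for name, model in (("A", MODEL_A), ("B", MODEL_B)):
    print(name, outcome_mae(model), effect_mae(model), effects(model))
# A 0.5625 1.125 {'small': 0.25, 'large': -1.0}
# B 1.75 0.0 {'small': 1.0, 'large': 0.5}
```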

Refutation (aka Model Validation)
• Placebo Treatment: replace the treatment with a random variable.
• Irrelevant Additional Confounder: add a random common-cause variable.
• Subset Validation: remove a random subset of the data.
• Random Replace: randomly replace a covariate with an irrelevant variable.
• Selection Bias: Blackwell, 2013.
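A minimal sketch of three of these refutation tests, wrapped around a simple stratified estimator on hypothetical data (true effect -10). The function names are illustrative, not the API of a particular library: the placebo test permutes the treatment column, the random common cause adds an irrelevant covariate to the strata, and subset validation re-estimates on 80% of the rows.

```python
import random
from collections import defaultdict


def simulate(n=20_000, seed=5):
    """Hypothetical data: binary confounder x, true effect -10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        a = int(rng.random() < (0.8 if x else 0.2))
        y = 180 + 40 * x - 10 * a + rng.gauss(0, 5)
        rows.append((x, a, y))
    return rows


def stratified_ate(rows):
    """sum_x P(x) * [mean(Y | A=1, x) - mean(Y | A=0, x)]; assumes both arms in every stratum."""
    groups = defaultdict(list)
    for x, a, y in rows:
        groups[(x, a)].append(y)
    total = 0.0
    for x in {x for x, _ in groups}:
        treated, control = groups[(x, 1)], groups[(x, 0)]
        weight = (len(treated) + len(control)) / len(rows)
        total += weight * (sum(treated) / len(treated) - sum(control) / len(control))
    return total


def placebo_treatment(rows, seed=0):
    """Permute the treatment column: the placebo has no effect, so the estimate should be near 0."""
    rng = random.Random(seed)
    placebo = [a for _, a, _ in rows]
    rng.shuffle(placebo)
    return stratified_ate([(x, p, y) for (x, _, y), p in zip(rows, placebo)])


def random_common_cause(rows, seed=0):
    """Add an irrelevant random binary covariate to the strata: the estimate should barely move."""
    rng = random.Random(seed)
    return stratified_ate([((x, int(rng.random() < 0.5)), a, y) for x, a, y in rows])


def data_subset(rows, fraction=0.8, seed=0):
    """Re-estimate on a random subset of the rows: the estimate should barely move."""
    rng = random.Random(seed)
    return stratified_ate(rng.sample(rows, int(fraction * len(rows))))


rows = simulate()
est = stratified_ate(rows)
print(round(est, 2), round(placebo_treatment(rows), 2),
      round(random_common_cause(rows), 2), round(data_subset(rows), 2))
```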

Application example: Uplift Modeling
• E.g., instead of predicting who will churn, predict whose churn is most likely to be reduced by the treatment
• Steps:
1. Estimate CATE
2. Rank users according to expected effect size
3. The more users you target, the lower the marginal performance (diminishing returns)
• See CausalML
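The three steps can be sketched on a hypothetical randomized experiment where Y = 1 means the user was retained, so a positive uplift means reduced churn. CATE is estimated per segment as a difference of means (a simple two-model approach, standing in for the richer estimators in libraries such as CausalML), segments are ranked by it, and the cumulative gain shows the diminishing return. The true uplifts and segment sizes are made up.

```python
import random
from collections import defaultdict

TRUE_UPLIFT = {0: 0.30, 1: 0.10, 2: -0.05}  # hypothetical effect of the offer on retention, per segment


def simulate_experiment(n=30_000, seed=6):
    """Hypothetical randomized experiment: T is a coin flip, Y = 1 if the user is retained."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        seg = rng.randrange(3)
        t = rng.random() < 0.5
        p_retain = 0.5 + (TRUE_UPLIFT[seg] if t else 0.0)
        rows.append((seg, t, int(rng.random() < p_retain)))
    return rows


def estimate_cate(rows):
    """Step 1: CATE per segment as mean(Y | T=1, seg) - mean(Y | T=0, seg)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for seg, t, y in rows:
        sums[(seg, t)] += y
        counts[(seg, t)] += 1
    segs = {seg for seg, _ in counts}
    return {s: sums[(s, True)] / counts[(s, True)] - sums[(s, False)] / counts[(s, False)]
            for s in segs}


def targeting_curve(cate, segment_sizes):
    """Steps 2-3: rank segments by estimated uplift and accumulate the expected extra retained users."""
    ranking = sorted(cate, key=cate.get, reverse=True)
    curve, gain = [], 0.0
    for s in ranking:
        gain += cate[s] * segment_sizes[s]
        curve.append((s, round(gain)))
    return ranking, curve


rows = simulate_experiment()
cate = estimate_cate(rows)
ranking, curve = targeting_curve(cate, {0: 1000, 1: 1000, 2: 1000})
print(ranking)  # segments ordered by estimated uplift
print(curve)    # cumulative extra retained users: gains shrink, and the last segment lowers the total
```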

Summary – Supervised vs Causal Learning
Aspect | Supervised Learning | Causal Inference
Predicts | outcome $P(Y \mid X)$ | effect of change $P(Y \mid do(X))$
Assumption | passive observer | decision maker
Train-test | equally distributed | distribution shift
Validation | easy, via hold-out | fundamental challenge: better prediction is NOT better causal estimation
Feature set | quantitative (overfit / underfit) | qualitative: could cause a bias in the estimate
Domain knowledge | nice to have; deep neural networks go beyond human performance without it | essential for making assumptions and avoiding pitfalls
For whom? | |

Causality in Python
https://vesoft-inc.github.io/github-statistics/
Like scikit-learn 10 years ago

Typical Stages in a Causal Project
1. Model – assumptions as a graph (DAG). If this is missing, you can try causal discovery methods
2. Identify – turn assumptions into a list of what to control for
3. Estimate – use estimation methods to estimate the effect
4. Refute – validate and check for robustness
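The four stages can be strung together in a small pure-Python sketch. Every piece is a deliberately simple stand-in, not any library's method: the model is a hand-written DAG, identification adjusts for the treatment's parents, estimation is stratification (assuming both arms appear in every stratum), and refutation uses a placebo (permuted) treatment. The data are hypothetical, with a true effect of -10.

```python
import random
from collections import defaultdict

# 1. Model: hypothetical causal assumptions as a DAG (node -> children).
DAG = {"Age": ["Exercise", "Cholesterol"], "Exercise": ["Cholesterol"]}


def identify(dag, treatment):
    """2. Identify: adjust for the treatment's parents (a valid back-door set if all are observed)."""
    return sorted(u for u, children in dag.items() if treatment in children)


def estimate(rows, treatment, outcome, adjust):
    """3. Estimate: stratify on the adjustment set and weight within-stratum differences by P(stratum)."""
    groups = defaultdict(list)
    for r in rows:
        groups[(tuple(r[c] for c in adjust), r[treatment])].append(r[outcome])
    total = 0.0
    for s in {s for s, _ in groups}:
        treated, control = groups[(s, 1)], groups[(s, 0)]
        weight = (len(treated) + len(control)) / len(rows)
        total += weight * (sum(treated) / len(treated) - sum(control) / len(control))
    return total


def refute_placebo(rows, treatment, outcome, adjust, seed=0):
    """4. Refute: with a permuted (placebo) treatment, the estimate should be close to 0."""
    rng = random.Random(seed)
    shuffled = [r[treatment] for r in rows]
    rng.shuffle(shuffled)
    placebo_rows = [{**r, treatment: t} for r, t in zip(rows, shuffled)]
    return estimate(placebo_rows, treatment, outcome, adjust)


def simulate(n=20_000, seed=7):
    """Hypothetical data consistent with DAG; the true effect of Exercise on Cholesterol is -10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randrange(3)  # 0 = young, 1 = middle-aged, 2 = old
        ex = int(rng.random() < (0.2, 0.5, 0.8)[age])
        chol = 170 + 20 * age - 10 * ex + rng.gauss(0, 5)
        rows.append({"Age": age, "Exercise": ex, "Cholesterol": chol})
    return rows


rows = simulate()
adjust = identify(DAG, "Exercise")
effect = estimate(rows, "Exercise", "Cholesterol", adjust)
placebo = refute_placebo(rows, "Exercise", "Cholesterol", adjust)
print(adjust, round(effect, 1), round(placebo, 1))  # ['Age'], close to -10, close to 0
```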
