A whirlwind tour of Causal Inference
The Path From Cause To Effect
By
Viswanath Gangavaram
Senior Data Scientist
Topics
• Introduction to Causal Inference
• Simpson's Paradox: let's understand the damage done by lurking variables (selection bias at play)
• Ceteris Paribus: Other Things Equal Comparison
• Rubin Causal Model: Potential Outcomes, Observed Outcomes, Counterfactuals
• Average Treatment Effects ( ATE )
• Five important theorems from the world of propensity score theory
• IPTW: Slaying the Lurking Variable
• Doubly Robust Estimation
• Instrumental Variables: 2SLS, Complier Average Causal Effect
• Heterogeneous Treatment Effects
• Single Model Approach (SMA), Two Model Approach (TMA)
• Transformed Outcome Approach
• Applications of Causal Inference
• Counterfactual Inference framework for Learning To Rank
• Churn reasons through a Counterfactual Inference framework
• Causal Explanations on Quasi-Experiments
Simpson's Paradox
Y=1 Y=0 Row Sums Success Rate
T=1 350 3,650 4,000 0.088
T=0 500 6,500 7,000 0.071
Col Sums 850 10,150 11,000
Naive difference in observed success rates:
E[Y1 | T=1] − E[Y0 | T=0] = 0.088 − 0.071 = 0.017
X=1
Y=1 Y=0 Row Sums Success Rate
T=1 300 2,700 3,000 0.100
T=0 300 2,700 3,000 0.100
Col Sums 600 5,400 6,000
X=0
Y=1 Y=0 Row Sums Success Rate
T=1 50 950 1,000 0.050
T=0 200 3,800 4,000 0.050
Col Sums 250 4,750 5,000
• Standardization: stratification followed by normalization
E[Y1] = (6000/11000)*0.1 + (5000/11000)*0.05 = 0.077
E[Y0] = (6000/11000)*0.1 + (5000/11000)*0.05 = 0.077
In later slides, we will resolve Simpson's Paradox with Inverse Propensity Treatment Weighting (IPTW), a causal inference technique.
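As a minimal sketch (plain Python, using only the counts from the stratified tables above), the standardized means can be computed directly:

```python
# Standardization on the stratified tables: weight each stratum's
# success rate by the stratum's share of the whole population, P(X=x).
n_total = 11_000
strata = [
    # (stratum size, success rate under T=1, success rate under T=0)
    (6_000, 300 / 3_000, 300 / 3_000),   # X=1
    (5_000, 50 / 1_000, 200 / 4_000),    # X=0
]

ey1 = sum(n / n_total * r1 for n, r1, _ in strata)
ey0 = sum(n / n_total * r0 for n, _, r0 in strata)
print(f"E[Y1] = {ey1:.4f}, E[Y0] = {ey0:.4f}, ATE = {ey1 - ey0:.4f}")
# E[Y1] = 0.0773, E[Y0] = 0.0773, ATE = 0.0000
```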
Standardization
Pitfalls:
• Stratification: some strata might have no representation from one of the treatment groups (a positivity violation), leaving that stratum's comparison undefined
Ceteris Paribus: Other Things Equal Comparison
“The notion of Ideal Experiment disciplines our approach to causal inference”
• Comparisons made under ceteris paribus conditions have a causal interpretation
• The craft of causal inference uses data to get to "other things equal" in spite of the obstacles, called selection bias or omitted variables, found on the path running from raw numbers to causal knowledge
• Random assignment, together with the Law of Large Numbers, ensures ceteris paribus in Randomized Controlled Trials
• Random assignment isn't the same as holding everything else fixed, but it has the same effect
• The notion of an ideal experiment disciplines our approach to causal inference: it is nothing but ensuring ceteris paribus before making a causal statement
Rubin Causal Model or Potential Outcome Framework
• Think of the potential outcomes Y0 and Y1 as the outcomes we would see under each possible treatment option
• Notation: Ya is the outcome that would be observed if treatment were set to A=a
• Counterfactual outcomes are the ones that would have been observed, had the treatment been different
• Average Causal Effect: E[Y1 – Y0]
Average Causal Effect through Potential Outcome Framework
Population Of Interest
World 1: everyone gets A=0 → mean(Y)
World 2: everyone gets A=1 → mean(Y)
The difference between the two means is the Average Causal Effect: E[Y1 – Y0]
• E[Y1 – Y0] ≠ E[Y|A=1] – E[Y|A=0]
• Conditioning vs. setting: the right-hand side compares two different populations
• E[Y1/Y0]: causal relative risk
• E[Y1 – Y0 | A=1]: causal effect of treatment on the treated
• Heterogeneity of Treatment Effects
• Fundamental Problem Of Causal Inference
• How do we use observed data to link observed outcomes to potential outcomes?
• What assumptions are necessary to estimate causal effects from observed data?
Untestable Causal Assumptions to link Observed and Potential Outcomes
• Stable Unit Treatment Value Assumption ( SUTVA )
• No Interference
• One Version Of Treatment
• Consistency Assumption: Y = Ya if A=a for all a
• Positivity Assumption: P(A=a | X=x) > 0 for all a and x
• Ensures variability in treatment assignment, which is needed to identify causal effects
• Ignorability Assumption: (Y0, Y1) ⊥ A | X
• Ensures treatment is as good as randomly assigned given the covariates
• In the Simpson's Paradox example, the ignorability assumption is violated unless we condition on X
• Linking observed data and potential outcomes:
• E[Y | A=a, X=x] = E[Ya | A=a, X=x], by the consistency assumption
• = E[Ya | X=x], by the ignorability assumption
• Averaging over X then identifies E[Ya] = Σx E[Y | A=a, X=x] P(X=x), which is exactly standardization
Difference in group means = Average Causal Effect + Selection Bias

Assume a constant causal effect: Y1i = Y0i + k. Then

E[Yi | Ti=1] − E[Yi | Ti=0]
  = E[Y1i | Ti=1] − E[Y0i | Ti=0]
  = E[Y0i + k | Ti=1] − E[Y0i | Ti=0]
  = k + ( E[Y0i | Ti=1] − E[Y0i | Ti=0] )
  = k + Selection Bias

Random assignment eliminates selection bias: it makes E[Y0i | Ti=1] = E[Y0i | Ti=0], so the same chain collapses to

E[Yi | Ti=1] − E[Yi | Ti=0] = k + ( E[Y0i | Ti=1] − E[Y0i | Ti=0] ) = k
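A small simulation (synthetic data, illustrative only) makes the decomposition concrete: under self-selection the naive difference in group means overstates k, while under random assignment it recovers k:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200_000, 2.0
y0 = rng.normal(size=n)          # potential outcome without treatment
y1 = y0 + k                      # constant causal effect: Y1 = Y0 + k

# Self-selection: units with larger Y0 opt into treatment more often,
# so E[Y0 | T=1] > E[Y0 | T=0] and the selection bias term is positive.
t_self = rng.random(n) < 1 / (1 + np.exp(-2 * y0))
# Random assignment: T is independent of (Y0, Y1), so the bias term is zero.
t_rand = rng.random(n) < 0.5

for label, t in [("self-selected", t_self), ("randomized", t_rand)]:
    y = np.where(t, y1, y0)                 # observed outcome
    diff = y[t].mean() - y[~t].mean()       # difference in group means
    bias = y0[t].mean() - y0[~t].mean()     # E[Y0|T=1] - E[Y0|T=0]
    print(f"{label:13s}: diff = {diff:.2f} = k + bias ({k} + {bias:.2f})")
```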
Propensity Score & Propensity Score Matching
• Propensity score: e(X) = P(A=1 | X), the probability of receiving treatment given the covariates
• The propensity score is a balancing score: conditional on e(X), the distribution of X is the same in the treated and control groups
• Propensity score matching: pair each treated unit with a control unit whose propensity score is close
• Shortcomings: matching discards some data (unmatched units)
Propensity Score Matching
[Figure: distribution of propensity scores before & after matching]
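A hedged sketch of 1-nearest-neighbour matching on an estimated propensity score (the function name psm_att and the caliper value are illustrative, not from the slides); note how unmatched treated units are simply dropped, the shortcoming listed above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_att(X, a, y, caliper=0.05):
    """Effect of treatment on the treated via 1-NN propensity score matching."""
    e = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]  # e(x)=P(A=1|X=x)
    treated = np.flatnonzero(a == 1)
    controls = np.flatnonzero(a == 0)
    diffs = []
    for i in treated:
        j = controls[np.argmin(np.abs(e[controls] - e[i]))]  # closest control
        if abs(e[j] - e[i]) <= caliper:      # drop poor matches: data discarded
            diffs.append(y[i] - y[j])
    return float(np.mean(diffs))
```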
Five important theorems from the world of propensity score theory (Rosenbaum & Rubin, 1983)
1. The propensity score is a balancing score.
2. Any score that is finer than the propensity score is a balancing score; moreover, x is the finest balancing score and the propensity score is the coarsest.
3. If treatment assignment is strongly ignorable given x, then it is strongly ignorable given any balancing score.
4. At any value of a balancing score, the difference between the treatment and control means is an unbiased estimate of the average treatment effect at that value of the balancing score, if treatment assignment is strongly ignorable. Consequently, with strongly ignorable treatment assignment, pair matching on a balancing score, subclassification on a balancing score, and covariance adjustment on a balancing score can all produce unbiased estimates of treatment effects.
5. Using sample estimates of balancing scores can produce sample balance on x.
Inverse Propensity Treatment Weighting
• Rather than match, we could use all of the data, but down-weight some and up-weight
others
• This is accomplished by weighting by the inverse of the probability of treatment received
• For treated subjects, weight by the inverse of P(A=1 | X )
• For control subjects, weight by the inverse of P(A=0 | X)
This is known as Inverse Probability Of Treatment Weighting
• There is confounding in the original population
• IPTW creates a pseudo-population in which treatment assignment no longer depends on X
• There is no confounding in the pseudo-population
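A minimal sketch of the resulting Horvitz–Thompson style estimator, assuming a logistic-regression propensity model (the function name iptw_ate is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_ate(X, a, y):
    """ATE via inverse probability of treatment weighting: treated units get
    weight 1/e(X), controls 1/(1 - e(X)), forming the pseudo-population."""
    e = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]
    n = len(y)
    ey1 = np.sum(a * y / e) / n                # weighted mean under treatment
    ey0 = np.sum((1 - a) * y / (1 - e)) / n    # weighted mean under control
    return ey1 - ey0
```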
IPTW
Resolving Simpson's Paradox through IPTW
• E[Y1] = ( 300*(1/0.5) + 50*(1/0.2) ) / 11,000 = 850/11,000 ≈ 0.077
• E[Y0] = ( 300*(1/0.5) + 200*(1/0.8) ) / 11,000 = 850/11,000 ≈ 0.077
• E[Y1] − E[Y0] = 0
• Note:
• Propensity scores in the X=1 stratum: P(T=1|X=1) = 0.5, P(T=0|X=1) = 0.5
• Propensity scores in the X=0 stratum: P(T=1|X=0) = 0.2, P(T=0|X=0) = 0.8
X=1
Y=1 Y=0 Row Sums Success Rate
T=1 300 2,700 3,000 0.100
T=0 300 2,700 3,000 0.100
Col Sums 600 5,400 6,000

X=0
Y=1 Y=0 Row Sums Success Rate
T=1 50 950 1,000 0.050
T=0 200 3,800 4,000 0.050
Col Sums 250 4,750 5,000
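The calculation above can be reproduced mechanically from the two tables (a sketch; here the propensity scores are read directly off the strata rather than estimated):

```python
import numpy as np

# Expand the two stratified tables into unit records: (x, t, y, count).
cells = [(1, 1, 1, 300), (1, 1, 0, 2_700), (1, 0, 1, 300), (1, 0, 0, 2_700),
         (0, 1, 1, 50),  (0, 1, 0, 950),   (0, 0, 1, 200), (0, 0, 0, 3_800)]
counts = [c[3] for c in cells]
x = np.repeat([c[0] for c in cells], counts)
t = np.repeat([c[1] for c in cells], counts)
y = np.repeat([c[2] for c in cells], counts)

e = np.where(x == 1, 0.5, 0.2)      # P(T=1|X): 0.5 if X=1, 0.2 if X=0
n = len(y)                          # 11,000
ey1 = np.sum(t * y / e) / n         # (300/0.5 + 50/0.2) / 11,000
ey0 = np.sum((1 - t) * y / (1 - e)) / n
print(ey1, ey0, ey1 - ey0)          # ≈ 0.0773, 0.0773, 0.0
```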
Some Terminology
• Marginal treatment probability: P(Wi = 1)
• Conditional treatment probability (propensity score): e(x) = P(Wi = 1 | Xi = x)
• The unit-level causal effect: τi = Yi(1) − Yi(0)
• Observed outcome: Yi^obs = Wi·Yi(1) + (1 − Wi)·Yi(0)
• Conditional Average Treatment Effect (CATE): τ(x) = E[Yi(1) − Yi(0) | Xi = x]
• Population Average Treatment Effect (ATE): τ = E[Yi(1) − Yi(0)]
Heterogeneous Treatment Effects: SMA & TMA
Single Model Approach: use conventional ML techniques with the observed outcome Yi^obs as the outcome and both the treatment Wi and the covariates Xi as features; the estimated CATE is the difference between the model's predictions at Wi=1 and Wi=0.
Two Model Approach: fit separate models (e.g., trees) for the observed outcome in each treatment group and difference their predictions.
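A hedged sketch of both approaches with scikit-learn regressors (model choice and function names are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def single_model_cate(X, w, y, X_new):
    """SMA: one model f(x, w); CATE(x) = f(x, 1) - f(x, 0)."""
    f = GradientBoostingRegressor().fit(np.column_stack([X, w]), y)
    one = np.ones(len(X_new))
    return (f.predict(np.column_stack([X_new, one]))
            - f.predict(np.column_stack([X_new, 1 - one])))

def two_model_cate(X, w, y, X_new):
    """TMA: separate models per treatment group; CATE(x) = f1(x) - f0(x)."""
    f1 = GradientBoostingRegressor().fit(X[w == 1], y[w == 1])
    f0 = GradientBoostingRegressor().fit(X[w == 0], y[w == 0])
    return f1.predict(X_new) - f0.predict(X_new)
```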
Heterogeneous Treatment Effects: Transformed Outcome Approach
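The standard transformed outcome for a binary treatment is Yi* = Yi^obs · (Wi − e(Xi)) / ( e(Xi)·(1 − e(Xi)) ), which satisfies E[Y* | X=x] = τ(x) under ignorability, so any off-the-shelf regression of Y* on X estimates the CATE. A minimal sketch (the regressor choice is illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def transformed_outcome_cate(X, w, y, e):
    """Regress the transformed outcome Y* on X; E[Y*|X=x] equals the CATE.
    e is the (known or estimated) propensity score P(W=1|X)."""
    y_star = y * (w - e) / (e * (1 - e))
    return GradientBoostingRegressor().fit(X, y_star)  # .predict(X_new) -> CATE
```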
Use Cases of Causal Inference
• Unbiased Learning-to-Rank with Biased Feedback: optimizing a propensity-weighted ERM objective
• Position bias in search rankings strongly influences how many clicks a result receives, so directly using click data as a training signal in Learning-to-Rank methods yields sub-optimal results
• To overcome this bias problem, the paper presents a Counterfactual Inference Framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data
• A Propensity-Weighted Ranking SVM enables discriminative learning from implicit feedback data, with click models taking the role of the propensity estimators
• Other quasi-experiment designs
• Use IPTW
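A sketch of the inverse-propensity-scoring correction at the heart of that framework: each click is reweighted by the inverse of its examination propensity, so position bias cancels in expectation (array names are illustrative, and this is a self-normalized variant, not the paper's exact objective):

```python
import numpy as np

def ips_ranking_risk(ranks, clicked, exam_prop):
    """IPS estimate of the average rank of relevant results: a click observed
    at a rarely-examined position counts for more, de-biasing raw click data.
    ranks: rank shown to the user; clicked: 0/1; exam_prop: P(examined)."""
    w = clicked / exam_prop                 # inverse-propensity weights
    return np.sum(w * ranks) / np.sum(w)    # self-normalized IPS average
```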
Instrumental Variables
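A minimal sketch of the simplest IV estimator: the Wald ratio for a binary instrument, which coincides with 2SLS when there is one instrument and no covariates, and estimates the Complier Average Causal Effect under the usual IV assumptions (the function name is illustrative):

```python
import numpy as np

def wald_late(z, t, y):
    """Complier (local) average treatment effect with a binary instrument z:
    reduced form (effect of z on y) divided by first stage (effect of z on t)."""
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = t[z == 1].mean() - t[z == 0].mean()
    return reduced_form / first_stage
```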
References
1. A Crash Course in Causality (Jason A. Roy), Coursera
2. Causal inference at Uber: https://eng.uber.com/causal-inference-at-uber/
3. Judea Pearl and Dana Mackenzie, The Book of Why
4. Joshua Angrist and Jörn-Steffen Pischke, Mastering Metrics: The Path from Cause to Effect
5. Joshua Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics