CS592 Presentation #20
Causal Effect Inference
with Deep Latent-Variable Models
20173586 Jeongmin Cha
20193168 Hyunsu Kim
20183239 Jongjin Park
C. Louizos, U. Shalit, J. Mooij, D. Sontag, R. Zemel and M. Welling
Contents
1. Introduction
2. Identification of causal effect
3. Causal effect variational autoencoder
4. Experiments
5. Group Discussion Point
Introduction
[Figure: causal graph over a latent confounder Z, its proxy X, the treatment t, and the outcome y]
Introduction
[Figure: the same graph with the running example filled in: Z = socio-economic status, X = average income, t = medicine, y = # of deaths]
Introduction
[Figure: a candidate causal graph over Z (socio-economic status), X (average income), t (medicine), y (# of deaths)]
Q] Is this reasonable?
Introduction
[Figure: an alternative candidate graph over the same variables]
Q] Isn’t this more general?
Introduction
[Figure: the same alternative graph]
Q] Isn’t this more general?
Could be, but X is just a noisy view on Z. → We can say “only Z can cause y and t”.
Introduction
[Figure: another candidate graph over the same variables]
Q] How about this case?
Introduction
[Figure: a candidate graph with X = average consumption instead of average income]
Q] How about this case?
Introduction
[Figure: the graph finally adopted: Z → X, Z → t, Z → y, t → y, with Z = socio-economic status, X = average income, t = medicine, y = # of deaths]
Identification of Causal Effect
● Individual Treatment Effect (ITE) — reconstructed below
● Example) “How will the number of deaths (y) vary if we do (t=1) or do not (t=0) treat the poor (Z), whose annual salary (X) is about 10,000 dollars?”
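The ITE definition on this slide was an equation image; a reconstruction consistent with the paper's setup:

$$ \mathrm{ITE}(x) := \mathbb{E}\left[y \mid X = x,\, do(t=1)\right] - \mathbb{E}\left[y \mid X = x,\, do(t=0)\right] $$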
Do-Calculus
● What if we set X to x?
● Interventional probability is generally not the same as conditional probability.
● Example) (illustrated below)
[Figure: two two-node causal graphs over X and Y]
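A worked contrast (my illustration of the point above, since the slide's example was an image): if the true graph is X → Y, intervening on X is equivalent to conditioning on it, whereas if the graph is X ← Y, intervening on X leaves Y untouched:

$$ X \to Y:\;\; p(y \mid do(x)) = p(y \mid x), \qquad X \leftarrow Y:\;\; p(y \mid do(x)) = p(y) $$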
Problem Setup
[Figure: the adopted causal graph with t=1, and the identification derivation with steps labeled “Bayes’ rule” and “Do-calculus”; reconstructed below]
( The case for t=0 is identical )
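The derivation itself was an image; a reconstruction following the paper's identification argument. The first equality marginalizes over z (Bayes' rule); the second applies do-calculus on the graph, since z blocks every back-door path from t to y and the intervention on t does not affect p(z | X):

$$ p(y \mid X, do(t=1)) = \int_z p(y \mid X, do(t=1), z)\, p(z \mid X, do(t=1))\, dz = \int_z p(y \mid t=1, z)\, p(z \mid X)\, dz $$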
When ‘t’ causes ‘X’
[Figure: the derivation repeated for a graph in which t also causes X, again with “Bayes’ rule” and “Do-calculus” steps]
( The case for t=0 is identical )
Causal effect variational autoencoder
Parametrize the causal graph as a latent-variable model with neural networks:
● Encoder (inference network): model q(z|x,t,y)
● Decoder (model network): model p(x|z)
Causal effect variational autoencoder
● Assume the observations factorize conditioned on the latent variables (model network; the factorization is reconstructed below)
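The factorization shown on the slide, reconstructed in LaTeX with the standard-normal prior on z the paper uses:

$$ p(z, x, t, y) = p(z)\, p(x \mid z)\, p(t \mid z)\, p(y \mid t, z), \qquad p(z) = \prod_{j} \mathcal{N}(z_j \mid 0, 1) $$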
Causal effect variational autoencoder (model network)
● First, compute the distribution p(t|z) and sample t
● Next, compute p(y|t,z) and sample y
○ For a continuous outcome: Gaussian distribution
○ For a discrete outcome: Bernoulli distribution
● Then compute p(x|z) for reconstruction (a sketch of this network follows)
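A minimal PyTorch-style sketch of the sampling path just described, assuming a binary outcome and binary proxies; the class name, hidden sizes, and likelihood choices are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn
import torch.distributions as dist

class ModelNetwork(nn.Module):
    """Generative model p(t|z) p(y|t,z) p(x|z) over a latent confounder z."""
    def __init__(self, z_dim=20, x_dim=25, h=200):
        super().__init__()
        self.t_logit = nn.Sequential(nn.Linear(z_dim, h), nn.ELU(), nn.Linear(h, 1))
        # Separate outcome heads for t=0 and t=1 (TARnet-style split).
        self.y0 = nn.Sequential(nn.Linear(z_dim, h), nn.ELU(), nn.Linear(h, 1))
        self.y1 = nn.Sequential(nn.Linear(z_dim, h), nn.ELU(), nn.Linear(h, 1))
        self.x_logit = nn.Sequential(nn.Linear(z_dim, h), nn.ELU(), nn.Linear(h, x_dim))

    def forward(self, z):
        # 1) p(t|z): Bernoulli treatment assignment.
        t = dist.Bernoulli(logits=self.t_logit(z)).sample()
        # 2) p(y|t,z): Bernoulli for a binary outcome (Gaussian if continuous).
        y = dist.Bernoulli(logits=torch.where(t.bool(), self.y1(z), self.y0(z))).sample()
        # 3) p(x|z): logits for reconstructing the binary proxies x.
        return t, y, self.x_logit(z)
```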
Causal effect variational autoencoder
A neural network outputs the parameters of a posterior approximation over the latent variables z, e.g. a Gaussian (a sketch follows).
[Figure: inference network taking (x, t, y) as input, with separate heads q(z|x,t=0,y) and q(z|x,t=1,y) for the mean and variance parameters]
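A matching sketch of the encoder: a shared representation of (x, y) feeds two treatment-specific heads for the Gaussian mean and log-variance, mirroring the q(z|x,t=0,y) / q(z|x,t=1,y) split on the slide; hidden sizes and names are again illustrative assumptions:

```python
import torch
import torch.nn as nn

class InferenceNetwork(nn.Module):
    """Posterior approximation q(z|x,t,y) with treatment-specific heads."""
    def __init__(self, x_dim=25, z_dim=20, h=200):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(x_dim + 1, h), nn.ELU())  # input (x, y)
        self.mu = nn.ModuleList([nn.Linear(h, z_dim) for _ in range(2)])
        self.log_var = nn.ModuleList([nn.Linear(h, z_dim) for _ in range(2)])

    def forward(self, x, t, y):
        rep = self.shared(torch.cat([x, y], dim=-1))
        # Select the head matching each sample's observed treatment.
        mu = torch.where(t.bool(), self.mu[1](rep), self.mu[0](rep))
        log_var = torch.where(t.bool(), self.log_var[1](rep), self.log_var[0](rep))
        # Reparameterized sample z ~ N(mu, sigma^2) for the ELBO.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        return z, mu, log_var
```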
Causal effect variational autoencoder
The objective is the ELBO from the original VAE (reconstructed below).
[Figure: the model network and inference network side by side, with heads q(z|x,t=0,y) and q(z|x,t=1,y)]
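The ELBO from the slide, reconstructed for the model above (per-sample latent z_i, approximate posterior q):

$$ \mathcal{L} = \sum_{i=1}^{N} \mathbb{E}_{q(z_i \mid x_i, t_i, y_i)}\!\left[ \log p(x_i, t_i \mid z_i) + \log p(y_i \mid t_i, z_i) + \log p(z_i) - \log q(z_i \mid x_i, t_i, y_i) \right] $$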
Causal effect variational autoencoder
● We need two auxiliary distributions that predict t and y for new samples (inference network)
○ For a continuous outcome: Gaussian distribution
○ For a discrete outcome: Bernoulli distribution
● ELBO term (reconstructed above)
● The objective of the Causal Effect Variational Autoencoder (CEVAE): the ELBO plus the two auxiliary terms (reconstructed below)
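The full objective, reconstructed: the ELBO plus the auxiliary log-likelihoods of the observed treatment and outcome (starred values denote the observed ones):

$$ \mathcal{F}_{\mathrm{CEVAE}} = \mathcal{L} + \sum_{i=1}^{N} \left( \log q(t_i = t_i^{*} \mid x_i^{*}) + \log q(y_i = y_i^{*} \mid x_i^{*}, t_i^{*}) \right) $$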
Causal effect variational autoencoder
Q: Why are these two extra terms needed?
● The objective of the Causal Effect Variational Autoencoder (CEVAE)
Causal effect variational autoencoder
Q: Why are these two extra terms needed?
● For unseen data x, we need to know the treatment assignment t and the outcome y before inferring the distribution over z.
● So we need two auxiliary distributions that predict t and y for new samples; the two extra terms estimate the parameters of these distributions.
Experiment - Dataset
Three main experiments in the paper:
1. Two existing benchmark datasets
● IHDP (Infant Health and Development Program), Jobs
2. A synthetic toy dataset
3. A new benchmark
● based on twin births and deaths in the USA
Experiment - Implementation
● Neural network architecture
○ NNs with 3 hidden layers and ELU nonlinearities for:
■ the approximate posterior over the latent variables q(Z|X,t,y)
■ the generative model p(X|Z)
■ the outcome models p(y|t, Z), q(y|t, X)
○ NNs with a single hidden layer and ELU nonlinearities for:
■ the treatment models p(t|Z), q(t|X)
● Latent variable
○ 20-dimensional latent variable z
○ weight decay term for all of the parameters
Experiment - Baseline models
● LR1 = a single logistic regression
● LR2 = two separate logistic regressions, fit
○ to the treated (t=1)
○ to the control (t=0)
● TARnet
○ a feed-forward NN for causal inference
Experiment 1 - Benchmark datasets
● IHDP (Infant Health and Development Program)
● the effect of home visits by specialists on future cognitive test scores
● Metrics (written out below)
○ PEHE (Precision in Estimation of Heterogeneous Effect)
○ absolute error on ATE (Average Treatment Effect)
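For reference, standard definitions of these metrics (written with per-unit potential outcomes y_{i1}, y_{i0} and model estimates ŷ; PEHE is often reported as its square root):

$$ \epsilon_{\mathrm{PEHE}} = \frac{1}{N} \sum_{i=1}^{N} \left( (y_{i1} - y_{i0}) - (\hat{y}_{i1} - \hat{y}_{i0}) \right)^2, \qquad \epsilon_{\mathrm{ATE}} = \left| \frac{1}{N} \sum_{i=1}^{N} (y_{i1} - y_{i0}) - \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i1} - \hat{y}_{i0}) \right| $$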
Experiment 1 - Benchmark datasets
[Table: IHDP results]
Experiment 1 - Benchmark datasets
● Jobs: the effect of job training (treatment) on employment after training (outcome)
● Metrics (written out below)
○ absolute error on Average Treatment effect on the Treated (ATT)
○ Policy risk
■ acts as a proxy to the individual treatment effect.
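One common formulation of these two metrics (my paraphrase of the literature, not copied from the slide; π(x) denotes the policy that treats when the estimated effect is positive):

$$ \mathrm{ATT} = \mathbb{E}\left[ y_1 - y_0 \mid t = 1 \right], \qquad R_{\mathrm{pol}}(\pi) = 1 - \mathbb{E}\left[ y_1 \mid \pi(x) = 1 \right] p(\pi = 1) - \mathbb{E}\left[ y_0 \mid \pi(x) = 0 \right] p(\pi = 0) $$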
Experiment 1 - Benchmark datasets
[Table: Jobs results]
Experiment 2 - Synthetic experiment on toy data
● the marginal distribution of X is a mixture of Gaussians
● the hidden variable Z determines the mixture component
● the true latent variable z is binary
● Question: how well can models recover the true latent variable? (a sketch of such a generator follows)
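A hypothetical generator in this spirit: a binary confounder selects the Gaussian mixture component of X and also drives t and y. The specific constants are illustrative assumptions, not necessarily the paper's:

```python
import numpy as np

def simulate_toy(n, sigma0=3.0, sigma1=5.0, rng=None):
    """Toy data: a binary confounder z picks the mixture component of x
    and also drives treatment t and outcome y (illustrative parameters)."""
    rng = rng or np.random.default_rng(0)
    z = rng.binomial(1, 0.5, size=n)                               # hidden confounder
    x = rng.normal(loc=z, scale=np.where(z == 1, sigma1, sigma0))  # Gaussian-mixture proxy
    t = rng.binomial(1, 0.75 * z + 0.25 * (1 - z))                 # confounded treatment
    p_y = 1.0 / (1.0 + np.exp(-3.0 * (z + 2.0 * (2 * t - 1))))     # outcome probability
    y = rng.binomial(1, p_y)
    return x, t, y, z
```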
Experiment 2 - Synthetic experiment on toy data
● CEVAE bin: 5-dim binary latent z
○ the latent model is correctly specified
○ better at every sample size
Experiment 2 - Synthetic experiment on toy data
● CEVAE cont: 5-dim continuous latent z
○ the latent model is not correctly specified
● More samples are required for the latent space to imitate the true latent variable closely
Experiment 3 - Binary treatment outcome on Twins
● Introduce a new benchmark dataset about twin births in the USA
● t = 1
○ being born the heavier of the two
● y (outcome)
○ mortality of each of the twins in their first year of life
● Z (latent variable)
○ the number of gestation weeks, binned into 10 categories (20 weeks, 20-27, 27-34, …)
● X (proxy variables)
○ a noisy view of Z
○ the encoded vector for Z is flipped with probability 0.05-0.5 (a sketch follows)
○ a flip probability of 0.5 means X carries no direct information about Z
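A hypothetical sketch of the proxy construction described above: one-hot encode the hidden gestation category and flip each bit independently with the given probability. The function name and shapes are assumptions for illustration:

```python
import numpy as np

def noisy_proxy(z, n_categories=10, flip_prob=0.05, rng=None):
    """One-hot encode the hidden category z, then flip each bit with
    probability flip_prob to produce the noisy proxy X (illustrative)."""
    rng = rng or np.random.default_rng(0)
    onehot = np.eye(n_categories, dtype=int)[z]    # shape (n, n_categories)
    flips = rng.random(onehot.shape) < flip_prob   # which bits to corrupt
    return np.where(flips, 1 - onehot, onehot)
```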
Experiment 3 - Binary treatment outcome on Twins
● inferring the mortality of the unobserved twin (counterfactual)
Experiment 3 - Binary treatment outcome on Twins
● inferring the mortality of the unobserved twin (counterfactual)
Q: Why do all methods perform similarly when the proxy noise is small?
Experiment 3 - Binary treatment outcome on Twins
● inferring the mortality of the unobserved twin (counterfactual)
Q: Why do all methods perform similarly when the proxy noise is small?
● X ≃ Z, so X is nearly as informative as Z itself
● Only CEVAE models Z; the others use only X
● Z (the gestation-length feature) is very informative
Experiment 3 - Binary treatment outcome on Twins
● inferring the mortality of the unobserved twin (counterfactual)
● nh = number of hidden layers
● for CEVAE, larger nh gives better AUC
Experiment 3 - Binary treatment outcome on Twins
● inferring the mortality of the unobserved twin (counterfactual)
● nh = number of hidden layers
● for CEVAE, larger nh gives better AUC
Q: “TARnet (nh=0) == LR2”, but why do [CEVAE nh=0] and [LR2] perform differently at higher proxy-noise levels?
Experiment 3 - Binary treatment outcome on Twins
● inferring the mortality of the unobserved twin (counterfactual)
● nh = number of hidden layers
● for CEVAE, larger nh gives better AUC
Q: “TARnet (nh=0) == LR2”, but why do [CEVAE nh=0] and [LR2] perform differently at higher proxy-noise levels?
● LR2 relies directly on the noisy proxies instead of the inferred latent state.
Experiment 3 - Binary treatment outcome on Twins
● inferring the average treatment effect
● CEVAE nh=0: not so good
● CEVAE is robust to increasing proxy noise
Group Discussion Point
Balancing Neural Network (BNN) vs. CEVAE
Group Discussion Point
Balancing Neural Network (BNN) vs. CEVAE
- Model choice
- Discriminative model vs. Generative model
- CEVAE learns a latent variable z (i.e. the unobserved confounder), which BNN does not
- Architecture
- CEVAE splits its outputs for each treatment group in t after a shared representation, so it can learn the effect of t more explicitly
[Figure: BNN architecture vs. CEVAE (~ TARnet architecture)]
Thank you