11-1
Experiments and Quasi-Experiments
(SW Chapter 11)
Why study experiments?
 Ideal randomized controlled experiments provide a
benchmark for assessing observational studies.
 Actual experiments are rare ($$$) but influential.
 Experiments can solve the threats to internal validity
of observational studies, but they have their own
threats to internal validity.
 Thinking about experiments helps us to understand
quasi-experiments, or “natural experiments,” in which
some variation is “as if” randomly assigned.
11-2
Terminology: experiments and quasi-experiments
 An experiment is designed and implemented
consciously by human researchers. An experiment
entails conscious use of a treatment and control group
with random assignment (e.g. clinical trials of a drug)
 A quasi-experiment or natural experiment has a
source of randomization that is “as if” randomly
assigned, but this variation was not part of a conscious
randomized treatment and control design.
 Program evaluation is the field of statistics aimed at
evaluating the effect of a program or policy, for
example, an ad campaign to cut smoking.
11-3
Different types of experiments: three examples
 Clinical drug trial: does a proposed drug lower
cholesterol?
o Y = cholesterol level
o X = treatment or control group (or dose of drug)
 Job training program (Job Training Partnership Act)
o Y = has a job, or not (or Y = wage income)
o X = went through experimental program, or not
 Class size effect (Tennessee class size experiment)
o Y = test score (Stanford Achievement Test)
o X = class size treatment group (regular, regular +
aide, small)
11-4
Our treatment of experiments: brief outline
 Why (precisely) do ideal randomized controlled
experiments provide estimates of causal effects?
 What are the main threats to the validity (internal and
external) of actual experiments – that is, experiments
actually conducted with human subjects?
 Flaws in actual experiments can result in X and u being
correlated (threats to internal validity).
 Some of these threats can be addressed using the
regression estimation methods we have used so far:
multiple regression, panel data, IV regression.
11-5
Idealized Experiments and Causal Effects
(SW Section 11.1)
 An ideal randomized controlled experiment randomly
assigns subjects to treatment and control groups.
 More generally, the treatment level X is randomly
assigned:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
 If X is randomly assigned (for example by computer)
then u and X are independently distributed and $E(u_i|X_i) = 0$,
so OLS yields an unbiased estimator of $\beta_1$.
 The causal effect is the population value of $\beta_1$ in an
ideal randomized controlled experiment.
11-6
Estimation of causal effects in an ideal randomized
controlled experiment
 Random assignment of X implies that $E(u_i|X_i) = 0$.
 Thus the OLS estimator $\hat\beta_1$ is unbiased.
 When the treatment is binary, $\hat\beta_1$ is just the difference
in mean outcome (Y) in the treatment vs. control
group ($\bar Y^{treated} - \bar Y^{control}$).
 This difference in means is sometimes called the
differences estimator.
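The claim that the OLS slope equals the difference in group means when X is binary can be checked numerically. The sketch below uses simulated data; the sample size, the coefficient values, and all variable names are assumptions made for illustration and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical ideal experiment: X is assigned by coin flip, independently of u.
x = rng.integers(0, 2, size=n)           # 1 = treated, 0 = control
u = rng.normal(0.0, 1.0, size=n)
beta0, beta1 = 1.0, 2.0                  # assumed true intercept and causal effect
y = beta0 + beta1 * x + u

# OLS regression of Y on a constant and X.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta1_hat = coef[1]

# Differences estimator: mean outcome of treated minus mean outcome of controls.
diff_in_means = y[x == 1].mean() - y[x == 0].mean()

print(f"OLS slope:           {beta1_hat:.4f}")
print(f"difference in means: {diff_in_means:.4f}")
```

The two printed numbers agree up to floating-point error, and both are close to the assumed causal effect because X is independent of u by construction.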
11-7
Potential Problems with Experiments in Practice
(SW Section 11.2)
Threats to Internal Validity
1. Failure to randomize (or imperfect randomization)
 for example, openings in a job training program are
filled on a first-come, first-served basis; latecomers are
controls
 result is correlation between X and u
11-8
Threats to internal validity, ctd.
2. Failure to follow treatment protocol (or “partial
compliance”)
 some controls get the treatment
 some “treated” subjects do not get the treatment
 “errors-in-variables” bias: corr(X,u) ≠ 0
 Attrition (some subjects drop out)
 suppose the controls who get jobs move out of town;
then corr(X,u) ≠ 0
11-9
Threats to internal validity, ctd.
3. Experimental effects
 experimenter bias (conscious or subconscious):
treatment X is associated with “extra effort” or
“extra care,” so corr(X,u) ≠ 0
 subject behavior might be affected by being in an
experiment, so corr(X,u) ≠ 0 (Hawthorne effect)
Just as in regression analysis with observational data,
threats to the internal validity of regression with
experimental data imply that corr(X,u) ≠ 0, so OLS
(the differences estimator) is biased.
George Elton Mayo and the Hawthorne Experiment
11-10
Subjects in the Hawthorne plant experiments, 1924 – 1932
11-11
Threats to External Validity
1. Nonrepresentative sample
2. Nonrepresentative “treatment” (that is, program or
policy)
3. General equilibrium effects (effect of a program can
depend on its scale; admissions counseling)
4. Treatment vs. eligibility effects (which is it you want
to measure: the effect on those who take the program, or
the effect on those who are eligible?)
11-12
Regression Estimators of Causal Effects Using
Experimental Data
(SW Section 11.3)
 Focus on the case that X is binary (treatment/control).
 Often you observe subject characteristics, $W_{1i},…,W_{ri}$.
 Extensions of the differences estimator:
o can improve efficiency (reduce standard errors)
o can eliminate bias that arises when:
 treatment and control groups differ
 there is “conditional randomization”
 there is partial compliance
 These extensions involve methods we have already
seen – multiple regression, panel data, IV regression
11-13
Estimators of the Treatment Effect $\beta_1$ using
Experimental Data (X = 1 if treated, 0 if control)
| Estimator | Dep. vble | Ind. vble(s) | Method |
| differences | $Y$ | $X$ | OLS |
| differences-in-differences | $\Delta Y = Y^{after} - Y^{before}$ | $X$ | OLS; adjusts for initial differences between treatment and control groups |
| differences with add’l regressors | $Y$ | $X, W_1,…,W_r$ | OLS; controls for additional subject characteristics W |
11-14
Estimators with experimental data, ctd.
| Estimator | Dep. vble | Ind. vble(s) | Method |
| differences-in-differences with add’l regressors | $\Delta Y = Y^{after} - Y^{before}$ | $X, W_1,…,W_r$ | OLS; adjusts for group differences + controls for subject char’s W |
| instrumental variables | $Y$ | $X$ | TSLS with Z = initial random assignment; eliminates bias from partial compliance |
 TSLS with Z = initial random assignment also can be
applied to the differences-in-differences estimator and
the estimators with additional regressors (W’s)
11-15
The differences-in-differences estimator
 Suppose the treatment and control groups differ
systematically; maybe the control group is healthier
(wealthier; better educated; etc.)
 Then X is correlated with u, and the differences
estimator is biased.
 The differences-in-differences estimator adjusts for pre-
experimental differences by subtracting off each
subject’s pre-experimental value of Y
o $Y_i^{before}$ = value of Y for subject i before the expt
o $Y_i^{after}$ = value of Y for subject i after the expt
o $\Delta Y_i = Y_i^{after} - Y_i^{before}$ = change over course of expt
11-16
$\hat\beta_1^{diffs\text{-}in\text{-}diffs} = (\bar Y^{treat,after} - \bar Y^{treat,before}) - (\bar Y^{control,after} - \bar Y^{control,before})$
11-17
The differences-in-differences estimator, ctd.
(1) “Differences” formulation:
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$
where
$\Delta Y_i = Y_i^{after} - Y_i^{before}$
$X_i$ = 1 if treated, = 0 otherwise
 $\hat\beta_1$ is the diffs-in-diffs estimator
11-18
The differences-in-differences estimator, ctd.
(2) Equivalent “panel data” version:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{it} + \beta_3 G_{it} + v_{it}$, i = 1,…,n
where
t = 1 (before experiment), 2 (after experiment)
$D_{it}$ = 0 for t = 1, = 1 for t = 2
$G_{it}$ = 0 for control group, = 1 for treatment group
$X_{it}$ = 1 if treated, = 0 otherwise
= $D_{it} \times G_{it}$ = interaction effect of being in treatment
group in the second period
 $\hat\beta_1$ is the diffs-in-diffs estimator
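The equivalence of the “differences” formulation, the “panel data” formulation, and the four-group-means formula can be illustrated on simulated data. This is a minimal sketch: the data-generating process, the parameter values, and the helper `ols` are assumptions introduced for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical two-period data in which the groups differ before treatment.
g = rng.integers(0, 2, size=n)                     # G_i: 1 = treatment group
alpha = 1.5 * g + rng.normal(0.0, 1.0, size=n)     # subject effect, correlated with G
true_effect = 2.0                                  # assumed beta_1
y_before = 10.0 + alpha + rng.normal(0.0, 1.0, size=n)
y_after = 10.5 + alpha + true_effect * g + rng.normal(0.0, 1.0, size=n)

def ols(columns, y):
    """OLS coefficients from regressing y on a constant plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# (1) "Differences" formulation: regress the change in Y on X = G.
b_changes = ols([g], y_after - y_before)[1]

# (2) "Panel data" formulation: stack both periods, regress on X = D*G, D and G.
y_panel = np.concatenate([y_before, y_after])
d = np.concatenate([np.zeros(n), np.ones(n)])
g_panel = np.concatenate([g, g])
b_panel = ols([d * g_panel, d, g_panel], y_panel)[1]

# Four-group-means formula.
b_means = ((y_after[g == 1].mean() - y_before[g == 1].mean())
           - (y_after[g == 0].mean() - y_before[g == 0].mean()))

# Plain differences estimator on post-period data, for comparison.
b_naive = ols([g], y_after)[1]

print(b_changes, b_panel, b_means, b_naive)
```

The first three numbers coincide. The last one absorbs the assumed pre-existing gap of 1.5 between the groups, which is the bias that differencing removes.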
11-19
Including additional subject characteristics (W’s)
 Typically you observe additional subject characteristics,
$W_{1i},…,W_{ri}$
 Differences estimator with add’l regressors:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + … + \beta_{r+1} W_{ri} + u_i$
 Differences-in-differences estimator with W’s:
$\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + … + \beta_{r+1} W_{ri} + u_i$
where $\Delta Y_i = Y_i^{after} - Y_i^{before}$.
11-20
Why include additional subject characteristics (W’s)?
1. Efficiency: more precise estimator of $\beta_1$ (smaller
standard errors)
2. Check for randomization. If X is randomly assigned,
then the OLS estimators with and without the W’s
should be similar – if they aren’t, this suggests that X
wasn’t randomly assigned (a problem with the expt.)
 Note: To check directly for randomization,
regress X on the W’s and do an F-test.
3. Adjust for conditional randomization (we’ll return to
this later…)
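The direct randomization check (regress X on the W’s and test that all slopes are zero) might look like the following sketch. The data are simulated, the function name `f_stat_x_on_w` is hypothetical, and the F-statistic uses the homoskedasticity-only formula for brevity; a heteroskedasticity-robust version would usually be preferred in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Hypothetical subject characteristics W1, W2.
w = rng.normal(0.0, 1.0, size=(n, 2))

def f_stat_x_on_w(x, w):
    """Homoskedasticity-only F-statistic for H0: all slopes are zero in a regression of x on w."""
    n_obs, q = w.shape
    design = np.column_stack([np.ones(n_obs), w])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    ssr_unrestricted = np.sum((x - design @ coef) ** 2)
    ssr_restricted = np.sum((x - x.mean()) ** 2)      # constant-only regression
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n_obs - q - 1))

# Case 1: X assigned by coin flip, unrelated to W.
x_random = rng.integers(0, 2, size=n).astype(float)

# Case 2: failed randomization, where subjects with high W1 are more likely to be treated.
x_selected = (w[:, 0] + rng.normal(0.0, 1.0, size=n) > 0).astype(float)

f_random = f_stat_x_on_w(x_random, w)
f_selected = f_stat_x_on_w(x_selected, w)
print(f"F (random assignment):     {f_random:.2f}")
print(f"F (assignment tied to W1): {f_selected:.2f}")
```

A large F-statistic, as in the second case, is evidence that treatment status is predictable from the W’s, which random assignment rules out.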
11-21
Estimation when there is partial compliance
Consider diffs-in-diffs estimator, X = actual treatment
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$
 Suppose there is partial compliance: some of the
treated don’t take the drug; some of the controls go to
job training anyway
 Then X is correlated with u, and OLS is biased
 Suppose initial assignment, Z, is random
 Then (1) corr(Z,X) ≠ 0 and (2) corr(Z,u) = 0
 Thus $\beta_1$ can be estimated by TSLS, with instrumental
variable Z = initial assignment
 This can be extended to W’s (included exog. variables)
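A small simulation can illustrate why initial assignment works as an instrument under partial compliance. Everything in this sketch is assumed for illustration: the compliance rule (driven by an unobserved “motivation” that is correlated with u), the parameter values, and the variable names. With a single instrument and no W’s, TSLS reduces to cov(Z,Y)/cov(Z,X), which is what the code computes.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta0, beta1 = 1.0, 2.0                      # assumed true coefficients

z = rng.integers(0, 2, size=n).astype(float) # random initial assignment
u = rng.normal(0.0, 1.0, size=n)

# Partial compliance: actual treatment depends on assignment AND on
# motivation, which is correlated with the error term u.
motivation = u + rng.normal(0.0, 1.0, size=n)
x = (2.0 * z - 1.0 + motivation > 0).astype(float)

y = beta0 + beta1 * x + u

# OLS of Y on actual treatment X (biased because corr(X,u) != 0).
beta1_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# TSLS with the single instrument Z: ratio of sample covariances.
beta1_tsls = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS:  {beta1_ols:.3f}")
print(f"TSLS: {beta1_tsls:.3f}")
```

In this setup OLS overstates the effect because motivated subjects both take up treatment and have better outcomes, while the TSLS estimate stays close to the assumed value of 2.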
11-22
Experimental Estimates of the Effect of Class Size
Reduction: The Tennessee Class Size Experiment
(SW Section 11.4)
Project STAR (Student-Teacher Achievement Ratio)
 4-year study, $12 million
 Upon entering the school system, a student was
randomly assigned to one of three groups:
o regular class (22 – 25 students)
o regular class + aide
o small class (13 – 17 students)
 regular class students re-randomized after first year to
regular or regular+aide
 Y = Stanford Achievement Test scores
11-23
Deviations from experimental design
 Partial compliance:
o 10% of students switched treatment groups because
of “incompatibility” and “behavior problems” – how
much of this was because of parental pressure?
o Newcomers: incomplete receipt of treatment for
those who move into district after grade 1
 Attrition
o students move out of district
o students leave for private/religious schools
11-24
Regression analysis
 The “differences” regression model:
$Y_i = \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + u_i$
where
$SmallClass_i$ = 1 if in a small class
$RegAide_i$ = 1 if in regular class with aide
 Additional regressors (W’s)
o teacher experience
o free lunch eligibility
o gender, race
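To see how the “differences” regression with two treatment indicators works, here is a sketch on simulated data. These are not the STAR data; the score scale, effect sizes, and variable names are arbitrary assumptions. With only the two dummies as regressors, each coefficient is the difference between that group’s mean and the regular-class mean, and dividing by the standard deviation of Y expresses the effect in standard deviation units.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6_000

# Simulated three-arm experiment (arbitrary numbers, not STAR results).
group = rng.integers(0, 3, size=n)           # 0 = regular, 1 = small, 2 = regular + aide
small_class = (group == 1).astype(float)
reg_aide = (group == 2).astype(float)
y = 500.0 + 10.0 * small_class + 0.0 * reg_aide + rng.normal(0.0, 50.0, size=n)

# OLS of Y on a constant, SmallClass and RegAide.
design = np.column_stack([np.ones(n), small_class, reg_aide])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# The same quantities computed as differences in group means.
small_vs_regular = y[group == 1].mean() - y[group == 0].mean()
aide_vs_regular = y[group == 2].mean() - y[group == 0].mean()

# Effect of the small class in units of standard deviations of Y.
standardized_small_effect = coef[1] / y.std(ddof=1)

print(coef[1], small_vs_regular)
print(coef[2], aide_vs_regular)
print(standardized_small_effect)
```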
11-25
Differences estimates (no W’s)
11-26
11-27
How big are these estimated effects?
 Put on same basis by dividing by std. dev. of Y
 Units are now standard deviations of test scores
11-28
How do these estimates compare to those from the
California, Mass. observational studies? (Ch. 4 – 7)
11-29
Summary: The Tennessee Class Size Experiment
Remaining threats to internal validity
 partial compliance/incomplete treatment
o can use TSLS with Z = initial assignment
o Turns out, TSLS and OLS estimates are similar
(Krueger (1999)), so this bias seems not to be large
Main findings:
 The effects are small quantitatively (same size as
gender difference)
 Effect is sustained but not cumulative or increasing:
the biggest effect is at the youngest grades
11-30
What is the Difference Between a Control Variable
and the Variable of Interest?
(SW App. 11.3)
Example: “free lunch eligible” in the STAR regressions
 Coefficient is large, negative, statistically significant
 Policy interpretation: Making students ineligible for a
free school lunch will improve their test scores.
 Is this really an estimate of a causal effect?
 Is the OLS estimator of its coefficient unbiased?
 Can it be that the coefficient on “free lunch eligible”
is biased but the coefficient on SmallClass is not?
11-31
11-32
Example: “free lunch eligible,” ctd.
 Coefficient on “free lunch eligible” is large, negative,
statistically significant
 Policy interpretation: Making students ineligible for a
free school lunch will improve their test scores.
 Why (precisely) can we interpret the coefficient on
SmallClass as an unbiased estimate of a causal effect,
but not the coefficient on “free lunch eligible”?
 This is not an isolated example!
o Other “control variables” we have used: gender,
race, district income, state fixed effects, time fixed
effects, city (or state) population,…
 What is a “control variable” anyway?
11-33
Simplest case: one X, one control variable W
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$
For example,
 W = free lunch eligible (binary)
 X = small class/large class (binary)
 Suppose random assignment of X depends on W
o for example, 60% of free-lunch eligibles get small
class, 40% of ineligibles get small class
o note: this wasn’t the actual STAR randomization
procedure – this is a hypothetical example
 Further suppose W is correlated with u
11-34
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$
Suppose:
 The control variable W is correlated with u
 Given W = 0 (ineligible), X is randomly assigned
 Given W = 1 (eligible), X is randomly assigned.
Then:
 Given the value of W, X is randomly assigned;
 That is, controlling for W, X is randomly assigned;
 Thus, controlling for W, X is uncorrelated with u
 Moreover, E(u|X,W) doesn’t depend on X
 That is, we have conditional mean independence:
E(u|X,W) = E(u|W)
11-35
Implications of conditional mean independence
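$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$
Suppose E(u|W) is linear in W (not restrictive – could add
quadratics etc.): then,
$E(u_i|X_i,W_i) = E(u_i|W_i) = \gamma_0 + \gamma_1 W_i$ (*)
so
$E(Y_i|X_i,W_i) = E(\beta_0 + \beta_1 X_i + \beta_2 W_i + u_i | X_i,W_i)$
$= \beta_0 + \beta_1 X_i + \beta_2 W_i + E(u_i|X_i,W_i)$
$= \beta_0 + \beta_1 X_i + \beta_2 W_i + \gamma_0 + \gamma_1 W_i$ by (*)
$= (\beta_0+\gamma_0) + \beta_1 X_i + (\gamma_1+\beta_2) W_i$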
11-36
Implications of conditional mean independence:
 The conditional mean of Y given X and W is
$E(Y_i|X_i,W_i) = (\beta_0+\gamma_0) + \beta_1 X_i + (\gamma_1+\beta_2) W_i$
 The effect of a change in X under conditional mean
independence is the desired causal effect:
$E(Y_i|X_i = x+\Delta x, W_i) - E(Y_i|X_i = x, W_i) = \beta_1 \Delta x$
or
$\beta_1 = \frac{E(Y_i|X_i = x+\Delta x, W_i) - E(Y_i|X_i = x, W_i)}{\Delta x}$
 If X is binary (treatment/control), then $\Delta x = 1$ and this becomes:
$\beta_1 = E(Y_i|X_i = 1, W_i) - E(Y_i|X_i = 0, W_i)$
which is the desired treatment effect.
11-37
Implications of conditional mean independence, ctd.
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$
Conditional mean independence says:
E(u|X,W) = E(u|W)
which, with linearity, implies:
$E(Y_i|X_i,W_i) = (\beta_0+\gamma_0) + \beta_1 X_i + (\gamma_1+\beta_2) W_i$
Then:
 The OLS estimator $\hat\beta_1$ is unbiased.
 $\hat\beta_2$ is not consistent for $\beta_2$ (it estimates $\gamma_1+\beta_2$)
and is not meaningful
 The usual inference methods (standard errors,
hypothesis tests, etc.) apply to $\hat\beta_1$.
11-38
So, what is a control variable?
A control variable W is a variable that results in X
satisfying the conditional mean independence condition:
E(u|X,W) = E(u|W)
 Upon including a control variable in the regression, X
ceases to be correlated with the error term.
 The control variable itself can be (in general will be)
correlated with the error term.
 The coefficient on X has a causal interpretation.
 The coefficient on W does not have a causal
interpretation.
11-39
Example: Effect of teacher experience on test scores
More on the design of Project STAR:
 Teachers didn’t change school because of the expt.
 Within their normal school, teachers were randomly
assigned to small/regular/reg+aide classrooms.
 What is the effect of X = years of teacher experience?
The design implies conditional mean independence:
 W = school binary indicator
 Given W (school), X is randomly assigned
 That is, E(u|X,W) = E(u|W)
 W is plausibly correlated with u (nonzero school fixed
effects: some schools are better/richer/etc than others)
11-40
11-41
Example: teacher experience, ctd.
 Without school fixed effects (2), the estimated effect of
an additional year of experience is 1.47 (SE = .17)
 “Controlling for the school” (3), the estimated effect of
an additional year of experience is .74 (SE = .17)
 Direction of bias makes sense:
oless experienced teachers at worse schools
oyears of experience picks up this school effect
 OLS estimator of coefficient on years of experience is
biased up without school effects; with school effects,
OLS yields unbiased estimator of causal effect
 School effect coefficients don’t have a causal
interpretation (effect of student changing schools)
11-42
Quasi-Experiments
(SW Section 11.5)
A quasi-experiment or natural experiment has a source
of randomization that is “as if” randomly assigned, but
this variation was not part of a conscious randomized
treatment and control design.
Two cases:
(a) Treatment (X) is “as if” randomly assigned (OLS)
(b) A variable (Z) that influences treatment (X) is
“as if” randomly assigned (IV)
11-43
Two types of quasi-experiments
(a) Treatment (X) is “as if” randomly assigned (perhaps
conditional on some control variables W)
 Ex: Effect of marginal tax rates on labor supply
o X = marginal tax rate (rate changes in one state,
not another; state is “as if” randomly assigned)
(b) A variable (Z) that influences treatment (X) is
“as if” randomly assigned (IV)
 Effect on survival of cardiac catheterization
X = cardiac catheterization;
Z = differential distance to CC hospital
11-44
Econometric methods
(a) Treatment (X) is “as if” randomly assigned (OLS)
Diffs-in-diffs estimator using panel data methods:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{it} + \beta_3 G_{it} + u_{it}$, i = 1,…,n
where
t = 1 (before experiment), 2 (after experiment)
$D_{it}$ = 0 for t = 1, = 1 for t = 2
$G_{it}$ = 0 for control group, = 1 for treatment group
$X_{it}$ = 1 if treated, = 0 otherwise
= $D_{it} \times G_{it}$ = interaction effect of being in treatment
group in the second period
 $\hat\beta_1$ is the diffs-in-diffs estimator…
11-45
The panel data diffs-in-diffs estimator simplifies to
the “changes” diffs-in-diffs estimator when T = 2
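$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{it} + \beta_3 G_{it} + u_{it}$, i = 1,…,n (*)
For t = 1: $D_{i1} = 0$ and $X_{i1} = 0$ (nobody treated), so
$Y_{i1} = \beta_0 + \beta_3 G_{i1} + u_{i1}$
For t = 2: $D_{i2} = 1$ and $X_{i2}$ = 1 if treated, = 0 if not, so
$Y_{i2} = \beta_0 + \beta_1 X_{i2} + \beta_2 + \beta_3 G_{i2} + u_{i2}$
so
$\Delta Y_i = Y_{i2} - Y_{i1} = (\beta_0 + \beta_1 X_{i2} + \beta_2 + \beta_3 G_{i2} + u_{i2}) - (\beta_0 + \beta_3 G_{i1} + u_{i1})$
$= \beta_1 \Delta X_i + \beta_2 + (u_{i2} - u_{i1})$ (since $G_{i1} = G_{i2}$ and $\Delta X_i = X_{i2} - X_{i1} = X_{i2}$)
or
$\Delta Y_i = \beta_2 + \beta_1 \Delta X_i + v_i$, where $v_i = u_{i2} - u_{i1}$ (**)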
11-46
Differences-in-differences with control variables
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{it} + \beta_3 G_{it} + \beta_4 W_{1it} + … + \beta_{3+r} W_{rit} + u_{it}$,
$X_{it}$ = 1 if the treatment is received, = 0 otherwise
= $G_{it} \times D_{it}$ (= 1 for treatment group in second period)
 If the treatment (X) is “as if” randomly assigned,
given W, then u is conditionally mean indep. of X:
E(u|X,D,G,W) = E(u|D,G,W)
 OLS is a consistent estimator of $\beta_1$, the causal effect
of a change in X
 In general, the OLS estimators of the other
coefficients do not have a causal interpretation.
11-47
(b) A variable (Z) that influences treatment (X) is
“as if” randomly assigned (IV)
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{it} + \beta_3 G_{it} + \beta_4 W_{1it} + … + \beta_{3+r} W_{rit} + u_{it}$,
$X_{it}$ = 1 if the treatment is received, = 0 otherwise
= $G_{it} \times D_{it}$ (= 1 for treatment group in second period)
$Z_{it}$ = variable that influences treatment but is
uncorrelated with $u_{it}$ (given W’s)
TSLS:
 X = endogenous regressor
 D,G,W1,…,Wr = included exogenous variables
 Z = instrumental variable
11-48
Potential Threats to Quasi-Experiments
(SW Section 11.6)
The threats to the internal validity of a quasi-
experiment are the same as for a true experiment, with
one addition.
1. Failure to randomize (imperfect randomization)
Is the “as if” randomization really random, so that X
(or Z) is uncorrelated with u?
2. Failure to follow treatment protocol & attrition
3. Experimental effects (not applicable)
4. Instrument invalidity (relevance + exogeneity)
(Maybe healthier patients do live closer to CC hospitals
– they might have better access to care in general)
11-49
The threats to the external validity of a quasi-
experiment are the same as for an observational study.
1. Nonrepresentative sample
2. Nonrepresentative “treatment” (that is, program or
policy)
Example: Cardiac catheterization
 The CC study has better external validity than
controlled clinical trials because the CC study uses
observational data based on real-world
implementation of cardiac catheterization.
However that study used data from the early 90’s – do its
findings apply to CC usage today?
11-50
Experimental and Quasi-Experimental Estimates in
Heterogeneous Populations
(SW Section 11.7)
 We have discussed “the” treatment effect
 But the treatment effect could vary across individuals:
o Effect of a job training program probably depends on
education (years of education, etc.)
o Effect of a cholesterol-lowering drug could depend
on other health factors (smoking, age, diabetes,…)
 If this variation depends on observed variables, then
this is a job for interaction variables!
 But what if the source of variation is unobserved?
11-51
Heterogeneity of causal effects
When the causal effect (treatment effect) varies among
individuals, the population is said to be heterogeneous.
When there are heterogeneous causal effects that are not
linked to an observed variable:
 What do we want to estimate?
o Often, the average causal effect in the population
o But there are other choices, for example the average
causal effect for those who participate (effect of
treatment on the treated)
 What do we actually estimate?
o using OLS? using TSLS?
11-52
Population regression model with heterogeneous
causal effects:
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$, i = 1,…,n
 $\beta_{1i}$ is the causal effect (treatment effect) for the ith
individual in the sample
 For example, in the JTPA experiment, $\beta_{1i}$ could be zero
if person i already has good job search skills
 What do we want to estimate?
o effect of the program on a randomly selected person
(the “average causal effect”) – our main focus
o effect on those most (least?) benefited
o effect on those who choose to go into the program?
11-53
The Average Causal Effect
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$, i = 1,…,n
 The average causal effect (or average treatment effect)
is the mean value of $\beta_{1i}$ in the population.
 We can think of $\beta_1$ as a random variable: it has a
distribution in the population, and drawing a different
person yields a different value of $\beta_1$ (just like X and Y)
 For example, for person #34 the treatment effect is not
random – it is her true treatment effect – but before she
is selected at random from the population, her value of
$\beta_1$ can be thought of as randomly distributed.
11-54
The average causal effect, ctd.
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$, i = 1,…,n
 The average causal effect is $E(\beta_{1i})$.
 What does OLS estimate:
(a) When the conditional mean of u given X is zero?
(b) Under the stronger assumption that X is randomly
assigned (as in a randomized experiment)?
In this case, OLS is a consistent estimator of the
average causal effect.
11-55
OLS with Heterogeneous Causal Effects
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$, i = 1,…,n
(a) Suppose $E(u_i|X_i) = 0$ so $\mathrm{cov}(u_i,X_i) = 0$.
 If X is binary (treated/untreated), $\hat\beta_1 = \bar Y^{treated} - \bar Y^{control}$
estimates the causal effect among those who receive
the treatment.
 Why? For those treated, $\bar Y^{treated}$ reflects the effect of
the treatment on them. But we don’t know how the
untreated would have responded had they been
treated!
11-56
The math: suppose X is binary and $E(u_i|X_i) = 0$.
Then
$\hat\beta_1 = \bar Y^{treated} - \bar Y^{control}$
For the treated:
$E(Y_i|X_i=1) = \beta_0 + E(\beta_{1i}X_i|X_i=1) + E(u_i|X_i=1)$
$= \beta_0 + E(\beta_{1i}|X_i=1)$
For the controls:
$E(Y_i|X_i=0) = \beta_0 + E(\beta_{1i}X_i|X_i=0) + E(u_i|X_i=0)$
$= \beta_0$
Thus:
$\hat\beta_1 \xrightarrow{p} E(Y_i|X_i=1) - E(Y_i|X_i=0) = E(\beta_{1i}|X_i=1)$
= average effect of the treatment on the treated
11-57
OLS with heterogeneous treatment effects: general X
with $E(u_i|X_i) = 0$
$\hat\beta_1 = \frac{s_{XY}}{s_X^2} \xrightarrow{p} \frac{\sigma_{XY}}{\sigma_X^2}$
$= \frac{\mathrm{cov}(\beta_0 + \beta_{1i}X_i + u_i, X_i)}{\mathrm{var}(X_i)}$
$= \frac{\mathrm{cov}(\beta_0, X_i) + \mathrm{cov}(\beta_{1i}X_i, X_i) + \mathrm{cov}(u_i, X_i)}{\mathrm{var}(X_i)}$
$= \frac{\mathrm{cov}(\beta_{1i}X_i, X_i)}{\mathrm{var}(X_i)}$
(because $\mathrm{cov}(u_i,X_i) = 0$)
 If X is binary, this simplifies to the “effect of
treatment on the treated”
 Without heterogeneity, $\beta_{1i} = \beta_1$ and $\hat\beta_1 \xrightarrow{p} \beta_1$
 In general, the treatment effects of individuals with
large values of X are given the most weight
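The binary-X simplification mentioned above can be worked out in a few lines. This is a sketch of the algebra, writing $p = P(X_i = 1)$ (a symbol introduced here, not used on the slides) and using $X_i^2 = X_i$ for a binary variable:

```latex
\begin{aligned}
\mathrm{cov}(\beta_{1i}X_i, X_i)
  &= E(\beta_{1i}X_i^2) - E(\beta_{1i}X_i)\,E(X_i) \\
  &= E(\beta_{1i}X_i)\,(1 - p)
     && \text{since } X_i^2 = X_i \text{ and } E(X_i) = p \\
  &= p\,E(\beta_{1i} \mid X_i = 1)\,(1 - p), \\[4pt]
\mathrm{var}(X_i) &= p\,(1 - p), \\[4pt]
\frac{\mathrm{cov}(\beta_{1i}X_i, X_i)}{\mathrm{var}(X_i)}
  &= E(\beta_{1i} \mid X_i = 1).
\end{aligned}
```

The last line is the average effect of the treatment on the treated, matching the result obtained directly from the difference in means.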
11-58
(b) Now make a stronger assumption: that X is randomly
assigned (experiment or quasi-experiment). Then
what does OLS actually estimate?
 If $X_i$ is randomly assigned, it is distributed
independently of $\beta_{1i}$, so there is no difference
between the population of controls and the
population in the treatment group
 Thus the effect of treatment on the treated = the
average treatment effect in the population.
11-59
The math:
$\hat\beta_1 \xrightarrow{p} \frac{\mathrm{cov}(\beta_{1i}X_i, X_i)}{\mathrm{var}(X_i)}$
$= E\left\{ E\left[ \frac{\mathrm{cov}(\beta_{1i}X_i, X_i)}{\mathrm{var}(X_i)} \,\Big|\, \beta_{1i} \right] \right\}$ (condition on $\beta_{1i}$)
$= E\left[ \beta_{1i} \frac{\mathrm{cov}(X_i, X_i)}{\mathrm{var}(X_i)} \right]$ (because $X_i$ is independent of $\beta_{1i}$)
$= E\left[ \beta_{1i} \frac{\mathrm{var}(X_i)}{\mathrm{var}(X_i)} \right]$
$= E(\beta_{1i})$
Summary
 If $X_i$ and $\beta_{1i}$ are independent ($X_i$ is randomly
assigned), OLS estimates the average treatment effect.
 If $X_i$ is binary and not randomly assigned but $E(u_i|X_i) = 0$,
OLS estimates the effect of treatment on the treated.
 Without heterogeneity, the effect of treatment on the
treated and the average treatment effect are the same
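The distinction between the average treatment effect and the effect of treatment on the treated can be made concrete with a simulation. In this sketch the distribution of β1i, the selection rule (people with larger individual effects are more likely to take the treatment), and all names are assumptions chosen for illustration; u is drawn independently of X so that E(u|X) = 0 holds in both cases.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

beta0 = 1.0
beta1_i = rng.normal(2.0, 1.0, size=n)     # heterogeneous individual effects (assumed)
u = rng.normal(0.0, 1.0, size=n)           # independent of X in both cases below

def differences_estimator(x, y):
    """Mean outcome of the treated minus mean outcome of the controls."""
    return y[x == 1].mean() - y[x == 0].mean()

# Case (b): X randomly assigned, hence independent of beta1_i.
x_random = rng.integers(0, 2, size=n)
est_random = differences_estimator(x_random, beta0 + beta1_i * x_random + u)

# Case (a): X not randomly assigned; take-up is more likely when beta1_i is large.
x_selected = (beta1_i + rng.normal(0.0, 1.0, size=n) > 2.0).astype(int)
est_selected = differences_estimator(x_selected, beta0 + beta1_i * x_selected + u)

ate = beta1_i.mean()                              # average treatment effect
att_selected = beta1_i[x_selected == 1].mean()    # effect on the treated, case (a)

print(f"ATE: {ate:.3f}   OLS with random X:   {est_random:.3f}")
print(f"ATT: {att_selected:.3f}   OLS with selected X: {est_selected:.3f}")
```

With random assignment the estimator tracks the average treatment effect; with selection on gains it tracks the larger effect of treatment on the treated.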
11-60
IV Regression with Heterogeneous Causal Effects
Suppose the treatment effect is heterogeneous and the
effect of the instrument on X is heterogeneous:
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$ (equation of interest)
$X_i = \pi_0 + \pi_{1i} Z_i + v_i$ (first stage of TSLS)
In general, TSLS estimates the causal effect for those
whose value of X (probability of treatment) is most
influenced by the instrument.
11-61
IV with heterogeneous causal effects, ctd.
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$ (equation of interest)
$X_i = \pi_0 + \pi_{1i} Z_i + v_i$ (first stage of TSLS)
Intuition:
 Suppose the $\pi_{1i}$’s were known. If for some people $\pi_{1i} = 0$,
then their predicted value of $X_i$ wouldn’t depend
on Z, so the IV estimator would ignore them.
 The IV estimator puts most of the weight on
individuals for whom Z has a large influence on X.
 TSLS measures the treatment effect for those whose
probability of treatment is most influenced by Z.
11-62
The math…
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$ (equation of interest)
$X_i = \pi_0 + \pi_{1i} Z_i + v_i$ (first stage of TSLS)
To simplify things, suppose:
 $\beta_{1i}$ and $\pi_{1i}$ are distributed independently of $(u_i, v_i, Z_i)$
 $E(u_i|Z_i) = 0$ and $E(v_i|Z_i) = 0$
 $E(\pi_{1i}) \neq 0$
Then $\hat\beta_1^{TSLS} \xrightarrow{p} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$ (derived in SW App. 11.4)
 TSLS estimates the causal effect for those individuals
for whom Z is most influential (those with large $\pi_{1i}$).
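The probability limit E(β1i π1i)/E(π1i) can be checked by simulation. The sketch below assumes a population with two equally likely types, one weakly and one strongly influenced by the instrument; these types, their parameter values, and the variable names are illustrative assumptions. With one instrument, TSLS is computed as cov(Z,Y)/cov(Z,X).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Two hypothetical types: type B responds strongly to Z and has a large effect.
is_type_b = rng.random(n) < 0.5
beta1_i = np.where(is_type_b, 3.0, 1.0)    # individual causal effects
pi1_i = np.where(is_type_b, 0.9, 0.1)      # individual first-stage coefficients

z = rng.normal(0.0, 1.0, size=n)           # instrument, independent of everything else
v = rng.normal(0.0, 1.0, size=n)
u = 0.5 * v + rng.normal(0.0, 1.0, size=n) # u correlated with v, so X is endogenous

x = 0.0 + pi1_i * z + v                    # first stage
y = 1.0 + beta1_i * x + u                  # equation of interest

beta1_tsls = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
predicted_limit = np.mean(beta1_i * pi1_i) / np.mean(pi1_i)
average_causal_effect = beta1_i.mean()

print(f"TSLS estimate:               {beta1_tsls:.3f}")
print(f"E(beta1*pi1)/E(pi1):         {predicted_limit:.3f}")
print(f"average causal effect E(b1): {average_causal_effect:.3f}")
```

The TSLS estimate lines up with the weighted average, which here sits well above the average causal effect because the type most influenced by Z also has the larger treatment effect.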
11-63
When there are heterogeneous causal effects, what
TSLS estimates depends on the choice of instruments!
 With different instruments, TSLS estimates different
weighted averages!!!
 Suppose you have two instruments, Z1 and Z2.
o In general these instruments will be influential for
different members of the population.
o Using $Z_1$, TSLS will estimate the treatment effect for
those people whose probability of treatment (X) is
most influenced by $Z_1$
o The treatment effect for those most influenced by $Z_1$
might differ from the treatment effect for those most
influenced by $Z_2$
11-64
When does TSLS estimate the average causal effect?
$Y_i = \beta_0 + \beta_{1i} X_i + u_i$ (equation of interest)
$X_i = \pi_0 + \pi_{1i} Z_i + v_i$ (first stage of TSLS)
$\hat\beta_1^{TSLS} \xrightarrow{p} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$
 TSLS estimates the average causal effect (that is,
$\hat\beta_1^{TSLS} \xrightarrow{p} E(\beta_{1i})$) if any one of the following holds:
o $\beta_{1i}$ and $\pi_{1i}$ are independent
o $\beta_{1i} = \beta_1$ (no heterogeneity in equation of interest)
o $\pi_{1i} = \pi_1$ (no heterogeneity in first stage equation)
 But in general $\hat\beta_1^{TSLS}$ does not estimate $E(\beta_{1i})$!
11-65
Example: Cardiac catheterization
Yi = survival time (days) for AMI patients
Xi = received cardiac catheterization (or not)
Zi = differential distance to CC hospital
Equation of interest:
$SurvivalDays_i = \beta_0 + \beta_{1i} CardCath_i + u_i$
First stage (linear probability model):
$CardCath_i = \pi_0 + \pi_{1i} Distance_i + v_i$
 For whom does distance have the greatest effect on the
probability of treatment?
 For those patients, what is their causal effect $\beta_{1i}$?
11-66
Equation of interest:
$SurvivalDays_i = \beta_0 + \beta_{1i} CardCath_i + u_i$
First stage (linear probability model):
$CardCath_i = \pi_0 + \pi_{1i} Distance_i + v_i$
 TSLS estimates the causal effect for those whose
value of $X_i$ is most heavily influenced by $Z_i$
 TSLS estimates the causal effect for those for whom
distance most influences the probability of treatment
 What is their causal effect? (“We might as well go to
the CC hospital, it’s not too much farther”)
 This is one explanation of why the TSLS estimate is
smaller than the clinical trial OLS estimate.
11-67
Heterogeneous Causal Effects: Summary
 Heterogeneous causal effects means that the causal (or
treatment) effect varies across individuals.
 When these differences depend on observable variables,
heterogeneous causal effects can be estimated using
interactions (nothing new here).
 When these differences are unobserved ($\beta_{1i}$) the
average causal (or treatment) effect is the average value
in the population, $E(\beta_{1i})$.
 When causal effects are heterogeneous, OLS and TSLS
estimate….
11-68
OLS with Heterogeneous Causal Effects
| X is: | Relation between $X_i$ and $u_i$: | Then OLS estimates: |
| binary | $E(u_i|X_i) = 0$ | effect of treatment on the treated: $E(\beta_{1i}|X_i=1)$ |
| binary | X randomly assigned (so $X_i$ and $u_i$ are independent) | average causal effect $E(\beta_{1i})$ |
| general | $E(u_i|X_i) = 0$ | weighted average of $\beta_{1i}$, placing most weight on those with large $|X_i - \bar X|$ |
| general | X randomly assigned | average causal effect $E(\beta_{1i})$ |
Without heterogeneity, $\beta_{1i} = \beta_1$ and $\hat\beta_1 \xrightarrow{p} \beta_1$ in all these
cases.
11-69
TSLS with Heterogeneous Causal Effects
 TSLS estimates the causal effect for those individuals
for whom Z is most influential (those with large $\pi_{1i}$).
 What TSLS estimates depends on the choice of Z!!
 In CC example, these were the individuals for whom
the decision to drive to a CC lab was heavily
influenced by the extra distance (those patients for
whom the EMT was otherwise “on the fence”)
 Thus TSLS also estimates a causal effect: the average
effect of treatment on those most influenced by the
instrument
o In general, this is neither the average causal effect
nor the effect of treatment on the treated
11-70
Summary: Experiments and Quasi-Experiments
(SW Section 11.8)
Experiments:
 Average causal effects are defined as expected values
of ideal randomized controlled experiments
 Actual experiments have threats to internal validity
 These threats to internal validity can be addressed (in
part) by:
o panel methods (differences-in-differences)
o multiple regression
o IV (using initial assignment as an instrument)
11-71
Summary, ctd.
Quasi-experiments:
 Quasi-experiments have an “as-if” randomly assigned
source of variation.
 This as-if random variation can generate:
o $X_i$ which satisfies $E(u_i|X_i) = 0$ (so estimation
proceeds using OLS); or
o instrumental variable(s) which satisfy $E(u_i|Z_i) = 0$
(so estimation proceeds using TSLS)
 Quasi-experiments also have threats to internal validity
11-72
Summary, ctd.
Two additional subtle issues:
 What is a control variable?
o A variable W for which X and u are uncorrelated,
given the value of W (conditional mean
independence: $E(u_i|X_i,W_i) = E(u_i|W_i)$)
o Example: STAR & effect of teacher experience
 within their school, teachers were randomly
assigned to regular/reg+aide/small class
 OLS provides an unbiased estimator of the causal
effect, but only after controlling for school
effects.
11-73
Summary, ctd.
 What do OLS and TSLS estimate when there is
unobserved heterogeneity of causal effects?
 In general, weighted averages of causal effects:
o If X is randomly assigned, then OLS estimates the
average causal effect.
o If $X_i$ is not randomly assigned but $E(u_i|X_i) = 0$, OLS
estimates the average effect of treatment on the
treated.
o If $E(u_i|Z_i) = 0$, TSLS estimates the average effect of
treatment on those most influenced by $Z_i$.

Ch 11 Slides.doc. Introduction to econometric modeling

  • 1. 11-1 Experiments and Quasi-Experiments (SW Chapter 11) Why study experiments?  Ideal randomized controlled experiments provide a benchmark for assessing observational studies.  Actual experiments are rare ($$$) but influential.  Experiments can solve the threats to internal validity of observational studies, but they have their own threats to internal validity.  Thinking about experiments helps us to understand quasi-experiments, or “natural experiments,” in which there some variation is “as if” randomly assigned.
  • 2. 11-2 Terminology: experiments and quasi-experiments  An experiment is designed and implemented consciously by human researchers. An experiment entails conscious use of a treatment and control group with random assignment (e.g. clinical trials of a drug)  A quasi-experiment or natural experiment has a source of randomization that is “as if” randomly assigned, but this variation was not part of a conscious randomized treatment and control design.  Program evaluation is the field of statistics aimed at evaluating the effect of a program or policy, for example, an ad campaign to cut smoking.
  • 3. 11-3 Different types of experiments: three examples  Clinical drug trial: does a proposed drug lower cholesterol? oY = cholesterol level oX = treatment or control group (or dose of drug)  Job training program (Job Training Partnership Act) oY = has a job, or not (or Y = wage income) oX = went through experimental program, or not  Class size effect (Tennessee class size experiment) oY = test score (Stanford Achievement Test) oX = class size treatment group (regular, regular + aide, small)
  • 4. 11-4 Our treatment of experiments: brief outline  Why (precisely) do ideal randomized controlled experiments provide estimates of causal effects?  What are the main threats to the validity (internal and external) of actual experiments – that is, experiments actually conducted with human subjects?  Flaws in actual experiments can result in X and u being correlated (threats to internal validity).  Some of these threats can be addressed using the regression estimation methods we have used so far: multiple regression, panel data, IV regression.
  • 5. 11-5 Idealized Experiments and Causal Effects (SW Section 11.1)  An ideal randomized controlled experiment randomly assigns subjects to treatment and control groups.  More generally, the treatment level X is randomly assigned: Yi = 0 + 1Xi + ui  If X is randomly assigned (for example by computer) then u and X are independently distributed and E(ui|Xi) = 0, so OLS yields an unbiased estimator of 1.  The causal effect is the population value of 1 in an ideal randomized controlled experiment
  • 6. 11-6 Estimation of causal effects in an ideal randomized controlled experiment  Random assignment of X implies that E(ui|Xi) = 0.  Thus the OLS estimator 1 ˆ  is unbiased.  When the treatment is binary, 1 ˆ  is just the difference in mean outcome (Y) in the treatment vs. control group ( treated Y – control Y ).  This differences in means is sometimes called the differences estimator.
  • 7. 11-7 Potential Problems with Experiments in Practice (SW Section 11.2) Threats to Internal Validity 1. Failure to randomize (or imperfect randomization)  for example, openings in job treatment program are filled on first-come, first-serve basis; latecomers are controls  result is correlation between X and u
  • 8. 11-8 Threats to internal validity, ctd. 2. Failure to follow treatment protocol (or “partial compliance”)  some controls get the treatment  some “treated” get controls  “errors-in-variables” bias: corr(X,u) 0  Attrition (some subjects drop out)  suppose the controls who get jobs move out of town; then corr(X,u) 0
  • 9. 11-9 Threats to internal validity, ctd. 3. Experimental effects  experimenter bias (conscious or subconscious): treatment X is associated with “extra effort” or “extra care,” so corr(X,u) 0  subject behavior might be affected by being in an experiment, so corr(X,u) 0 (Hawthorne effect) Just as in regression analysis with observational data, threats to the internal validity of regression with experimental data implies that corr(X,u) 0 so OLS (the differences estimator) is biased. George Elton Mayo and the Hawthorne Experiment
  • 10. 11-10 Subjects in the Hawthorne plant experiments, 1924 – 1932
  • 11. 11-11 Threats to External Validity 1. Nonrepresentative sample 2. Nonrepresentative “treatment” (that is, program or policy) 3. General equilibrium effects (effect of a program can depend on its scale; admissions counseling ) 4. Treatment v. eligibility effects (which is it you want to measure: effect on those who take the program, or the effect on those are eligible)
  • 12. 11-12 Regression Estimators of Causal Effects Using Experimental Data (SW Section 11.3)  Focus on the case that X is binary (treatment/control).  Often you observe subject characteristics, W1i,…,Wri.  Extensions of the differences estimator: ocan improve efficiency (reduce standard errors) ocan eliminate bias that arises when:  treatment and control groups differ  there is “conditional randomization”  there is partial compliance  These extensions involve methods we have already seen – multiple regression, panel data, IV regression
  • 13. 11-13 Estimators of the Treatment Effect 1 using Experimental Data (X = 1 if treated, 0 if control) Dep. vble Ind. vble(s) method differences Y X OLS differences-in- differences Y = Yafter – Ybefore X OLS adjusts for initial differences between treatment and control groups differences with add’l regressors Y X,W1, …,Wn OLS controls for additional subject characteristics W
  • 14. 11-14 Estimators with experimental data, ctd. Dep. vble Ind. vble(s) method differences-in- differences with add’l regressors Y = Yafter – Ybefore X,W1, …,Wn OLS adjusts for group differences + controls for subject char’s W Instrumental variables Y X TSLS Z = initial random assignment; eliminates bias from partial compliance  TSLS with Z = initial random assignment also can be applied to the differences-in-differences estimator and the estimators with additional regressors (W’s)
  • 15. 11-15 The differences-in-differences estimator  Suppose the treatment and control groups differ systematically; maybe the control group is healthier (wealthier; better educated; etc.)  Then X is correlated with u, and the differences estimator is biased.  The differences-in-differences estimator adjusts for pre- experimental differences by subtracting off each subject’s pre-experimental value of Y o before i Y = value of Y for subject i before the expt o after i Y = value of Y for subject i after the expt oYi = after i Y – before i Y = change over course of expt
  • 16. 11-16 1 ˆdiffs in diffs    = ( , treat after Y – , treat before Y ) – ( , control after Y – , control before Y )
  • 17. 11-17 The differences-in-differences estimator, ctd. (1) “Differences” formulation: Yi = 0 + 1Xi + ui where Yi = after i Y – before i Y Xi = 1 if treated, = 0 otherwise  1 ˆ  is the diffs-in-diffs estimator
  • 18. 11-18 The differences-in-differences estimator, ctd. (2) Equivalent “panel data” version: Yit = 0 + 1Xit + 2Dit + 3Git + vit, i = 1,…,n where t = 1 (before experiment), 2 (after experiment) Dit = 0 for t = 1, = 1 for t = 2 Git = 0 for control group, = 1 for treatment group Xit = 1 if treated, = 0 otherwise = Dit Git = interaction effect of being in treatment group in the second period  1 ˆ  is the diffs-in-diffs estimator
  • 19. 11-19 Including additional subject characteristics (W’s)  Typically you observe additional subject characteristics, W1i,…,Wri  Differences estimator with add’l regressors: Yi = 0 + 1Xi + 2W1i + … + r+1Wri + ui  Differences-in-differences estimator with W’s: Yi = 0 + 1Xi + 2W1i + … + r+1Wri + ui where Yi = after i Y – before i Y .
  • 20. 11-20 Why include additional subject characteristics (W’s)? 1. Efficiency: more precise estimator of 1 (smaller standard errors) 2. Check for randomization. If X is randomly assigned, then the OLS estimators with and without the W’s should be similar – if they aren’t, this suggests that X wasn’t randomly designed (a problem with the expt.)  Note: To check directly for randomization, regress X on the W’s and do a F-test. 3. Adjust for conditional randomization (we’ll return to this later…)
  • 21. 11-21 Estimation when there is partial compliance Consider diffs-in-diffs estimator, X = actual treatment Yi = 0 + 1Xi + ui  Suppose there is partial compliance: some of the treated don’t take the drug; some of the controls go to job training anyway  Then X is correlated with u, and OLS is biased  Suppose initial assignment, Z, is random  Then (1) corr(Z,X) 0 and (2) corr(Z,u) = 0  Thus 1 can be estimated by TSLS, with instrumental variable Z = initial assignment  This can be extended to W’s (included exog. variables)
  • 22. 11-22 Experimental Estimates of the Effect of Reduction: The Tennessee Class Size Experiment (SW Section 11.4) Project STAR (Student-Teacher Achievement Ratio)  4-year study, $12 million  Upon entering the school system, a student was randomly assigned to one of three groups: oregular class (22 – 25 students) oregular class + aide osmall class (13 – 17 students)  regular class students re-randomized after first year to regular or regular+aide  Y = Stanford Achievement Test scores
  • 23. 11-23 Deviations from experimental design  Partial compliance: o10% of students switched treatment groups because of “incompatibility” and “behavior problems” – how much of this was because of parental pressure? oNewcomers: incomplete receipt of treatment for those who move into district after grade 1  Attrition ostudents move out of district ostudents leave for private/religious schools
  • 24. 11-24 Regression analysis  The “differences” regression model: Yi = 0 + 1SmallClassi + 2RegAidei + ui where SmallClassi = 1 if in a small class RegAidei = 1 if in regular class with aide  Additional regressors (W’s) oteacher experience ofree lunch eligibility ogender, race
  • 26. 11-26
  • 27. 11-27 How big are these estimated effects?  Put on same basis by dividing by std. dev. of Y  Units are now standard deviations of test scores
  • 28. 11-28 How do these estimates compare to those from the California, Mass. observational studies? (Ch. 4 – 7)
  • 29. 11-29 Summary: The Tennessee Class Size Experiment Remaining threats to internal validity  partial compliance/incomplete treatment ocan use TSLS with Z = initial assignment oTurns out, TSLS and OLS estimates are similar (Krueger (1999)), so this bias seems not to be large Main findings:  The effects are small quantitatively (same size as gender difference)  Effect is sustained but not cumulative or increasing biggest effect at the youngest grades
  • 30. 11-30 What is the Difference Between a Control Variable and the Variable of Interest? (SW App. 11.3) Example: “free lunch eligible” in the STAR regressions  Coefficient is large, negative, statistically significant  Policy interpretation: Making students ineligible for a free school lunch will improve their test scores.  Is this really an estimate of a causal effect?  Is the OLS estimator of its coefficient unbiased?  Can it be that the coefficient on “free lunch eligible” is biased but the coefficient on SmallClass is not?
  • 31. 11-31
  • 32. 11-32 Example: “free lunch eligible,” ctd.  Coefficient on “free lunch eligible” is large, negative, statistically significant  Policy interpretation: Making students ineligible for a free school lunch will improve their test scores.  Why (precisely) can we interpret the coefficient on SmallClass as an unbiased estimate of a causal effect, but not the coefficient on “free lunch eligible”?  This is not an isolated example! oOther “control variables” we have used: gender, race, district income, state fixed effects, time fixed effects, city (or state) population,…  What is a “control variable” anyway?
  • 33. 11-33 Simplest case: one X, one control variable W Yi = 0 + 1 Xi + 2Wi + ui For example,  W = free lunch eligible (binary)  X = small class/large class (binary)  Suppose random assignment of X depends on W ofor example, 60% of free-lunch eligibles get small class, 40% of ineligibles get small class) onote: this wasn’t the actual STAR randomization procedure – this is a hypothetical example  Further suppose W is correlated with u
  • 34. 11-34 Yi = 0 + 1 Xi + 2Wi + ui Suppose:  The control variable W is correlated with u  Given W = 0 (ineligible), X is randomly assigned  Given W = 1 (eligible), X is randomly assigned. Then:  Given the value of W, X is randomly assigned;  That is, controlling for W, X is randomly assigned;  Thus, controlling for W, X is uncorrelated with u  Moreover, E(u|X,W) doesn’t depend on X  That is, we have conditional mean independence: E(u|X,W) = E(u|W)
  • 35. 11-35 Implications of conditional mean independence Yi = 0 + 1 Xi + 2Wi + ui Suppose E(u|W) is linear in W (not restrictive – could add quadratics etc.): then, E(u|X,W) = E(u|W) = 0 + 1Wi (*) so E(Yi|Xi,Wi) = E(0 + 1 Xi + 2Wi + ui|Xi,Wi) = 0 + 1Xi + 2Wi + E(ui|Xi,Wi) = 0 + 1Xi + 2Wi + 0 + 1Wi by (*) = (0+0) + 1Xi + (1+2)Wi
  • 36. 11-36 Implications of conditional mean independence:  The conditional mean of Y given X and W is E(Yi|Xi,Wi) = (0+0) + 1Xi + (1+2)Wi  The effect of a change in X under conditional mean independence is the desired causal effect: E(Yi|Xi = x+x,Wi) – E(Yi|Xi = x,Wi) = 1x or 1 = ( | , ) ( | , ) i i i i i i E Y X x x W E Y X x W x        If X is binary (treatment/control), this becomes: 1 = ( | 1, ) ( | 0, ) i i i i i i E Y X W E Y X W x     which is the desired treatment effect.
  • 37. 11-37 Implications of conditional mean independence, ctd. Yi = 0 + 1 Xi + 2Wi + ui Conditional mean independence says: E(u|X,W) = E(u|W) which, with linearity, implies: E(Yi|Xi,Wi) = (0+0) + 1Xi + (1+2)Wi Then:  The OLS estimator 1 ˆ  is unbiased.  2 ˆ  is not consistent and not meaningful  The usual inference methods (standard errors, hypothesis tests, etc.) apply to 1 ˆ  .
  • 38. 11-38 So, what is a control variable? A control variable W is a variable that results in X satisfying the conditional mean independence condition: E(u|X,W) = E(u|W)  Upon including a control variable in the regression, X ceases to be correlated with the error term.  The control variable itself can be (in general will be) correlated with the error term.  The coefficient on X has a causal interpretation.  The coefficient on W does not have a causal interpretation.
  • 39. 11-39 Example: Effect of teacher experience on test scores More on the design of Project STAR:  Teachers didn’t change school because of the expt.  Within their normal school, teachers were randomly assigned to small/regular/reg+aide classrooms.  What is the effect of X = years of teacher education? The design implies conditional mean independence:  W = school binary indicator  Given W (school), X is randomly assigned  That is, E(u|X,W) = E(u|W)  W is plausibly correlated with u (nonzero school fixed effects: some schools are better/richer/etc than others)
  • 40. 11-40
  • 41. 11-41 Example: teacher experience, ctd.  Without school fixed effects (2), the estimated effect of an additional year of experience is 1.47 (SE = .17)  “Controlling for the school” (3), the estimated effect of an additional year of experience is .74 (SE = .17)  Direction of bias makes sense: oless experienced teachers at worse schools oyears of experience picks up this school effect  OLS estimator of coefficient on years of experience is biased up without school effects; with school effects, OLS yields unbiased estimator of causal effect  School effect coefficients don’t have a causal interpretation (effect of student changing schools)
  • 42. 11-42 Quasi-Experiments (SW Section 11.5) A quasi-experiment or natural experiment has a source of randomization that is “as if” randomly assigned, but this variation was not part of a conscious randomized treatment and control design. Two cases: (a) Treatment (X) is “as if” randomly assigned (OLS) (b) A variable (Z) that influences treatment (X) is “as if” randomly assigned (IV)
  • 43. 11-43 Two types of quasi-experiments (a) Treatment (X) is “as if” randomly assigned (perhaps conditional on some control variables W)  Ex: Effect of marginal tax rates on labor supply oX = marginal tax rate (rate changes in one state, not another; state is “as if” randomly assigned) (b) A variable (Z) that influences treatment (X) is “as if” randomly assigned (IV)  Effect on survival of cardiac catheterization X = cardiac catheterization; Z = differential distance to CC hospital
  • 44. 11-44 Econometric methods (a) Treatment (X) is “as if” randomly assigned (OLS) Diffs-in-diffs estimator using panel data methods: Yit = 0 + 1Xit + 2Dit + 3Git + uit, i = 1,…,n where t = 1 (before experiment), 2 (after experiment) Dit = 0 for t = 1, = 1 for t = 2 Git = 0 for control group, = 1 for treatment group Xit = 1 if treated, = 0 otherwise = Dit Git = interaction effect of being in treatment group in the second period  1 ˆ  is the diffs-in-diffs estimator…
  • 45. 11-45 The panel data diffs-in-diffs estimator simplifies to the “changes” diffs-in-diffs estimator when T = 2 Yit = 0 + 1Xit + 2Dit + 3Git + uit, i = 1,…,n (*) For t = 1: Di1 = 0 and Xi1 = 0 (nobody treated), so Yi1 = 0 + 3Gi1 + ui1 For t = 2: Di2 = 1 and Xi2 = 1 if treated, = 0 if not, so Yi2 = 0 + 1Xi2 + 2 + 3Gi2 + ui2 so Yi = Yi2–Yi1 = (0+1Xi2+2+3Gi2+ui2) – (0+3Gi1+ui1) = 1Xi + 2 + (ui1 – ui2) (since Gi1 = Gi2) or Yi = 2 + 1Xi + vi, where vi = ui1 – ui2 (**)
  • 46. 11-46 Differences-in-differences with control variables Yit = 0 + 1Xit + 2Dit + 3Git + 4W1it + … + 3+rWrit + uit, Xit = 1 if the treatment is received, = 0 otherwise = Git Dit (= 1 for treatment group in second period)  If the treatment (X) is “as if” randomly assigned, given W, then u is conditionally mean indep. of X: E(u|X,D,G,W) = E(u|D,G,W)  OLS is a consistent estimator of 1, the causal effect of a change in X  In general, the OLS estimators of the other coefficients do not have a causal interpretation.
  • 47. 11-47 (b) A variable (Z) that influences treatment (X) is “as if” randomly assigned (IV) Yit = 0 + 1Xit + 2Dit + 3Git + 4W1it + … + 3+rWrit + uit, Xit = 1 if the treatment is received, = 0 otherwise = Git Dit (= 1 for treatment group in second period) Zit = variable that influences treatment but is uncorrelated with uit (given W’s) TSLS:  X = endogenous regressor  D,G,W1,…,Wr = included exogenous variables  Z = instrumental variable
  • 48. 11-48 Potential Threats to Quasi-Experiments (SW Section 11.6) The threats to the internal validity of a quasi- experiment are the same as for a true experiment, with one addition. 4. Failure to randomize (imperfect randomization) Is the “as if” randomization really random, so that X (or Z) is uncorrelated with u? 5. Failure to follow treatment protocol & attrition 6. Experimental effects (not applicable) 7. Instrument invalidity (relevance + exogeneity) (Maybe healthier patients do live closer to CC hospitals –they might have better access to care in general)
  • 49. 11-49 The threats to the external validity of a quasi- experiment are the same as for an observational study. 5. Nonrepresentative sample 6. Nonrepresentative “treatment” (that is, program or policy) Example: Cardiac catheterization  The CC study has better external validity than controlled clinical trials because the CC study uses observational data based on real-world implementation of cardiac catheterization. However that study used data from the early 90’s – do its findings apply to CC usage today?
  • 50. 11-50 Experimental and Quasi-Experiments Estimates in Heterogeneous Populations (SW Section 11.7)  We have discussed “the” treatment effect  But the treatment effect could vary across individuals: oEffect of job training program probably depends on education, years of education, etc. oEffect of a cholesterol-lowering drug could depend other health factors (smoking, age, diabetes,…)  If this variation depends on observed variables, then this is a job for interaction variables!  But what if the source of variation is unobserved?
  • 51. 11-51 Heterogeneity of causal effects When the causal effect (treatment effect) varies among individuals, the population is said to be heterogeneous. When there are heterogeneous causal effects that are not linked to an observed variable:  What do we want to estimate? oOften, the average causal effect in the population oBut there are other choices, for example the average causal effect for those who participate (effect of treatment on the treated)  What do we actually estimate? ousing OLS? using TSLS?
  • 52. 11-52 Population regression model with heterogeneous causal effects: Yi = 0 + 1iXi + ui, i = 1,…,n  1i is the causal effect (treatment effect) for the ith individual in the sample  For example, in the JTPA experiment, 1i could be zero if person i already has good job search skills  What do we want to estimate? oeffect of the program on a randomly selected person (the “average causal effect”) – our main focus oeffect on those most (least?) benefited oeffect on those who choose to go into the program?
  • 53. 11-53 The Average Causal Effect Yi = 0 + 1iXi + ui, i = 1,…,n  The average causal effect (or average treatment effect) is the mean value of 1i in the population.  We can think of 1 as a random variable: it has a distribution in the population, and drawing a different person yields a different value of 1 (just like X and Y)  For example, for person #34 the treatment effect is not random – it is her true treatment effect – but before she is selected at random from the population, her value of 1 can be thought of as randomly distributed.
  • 54. 11-54 The average causal effect, ctd. Yi = 0 + 1iXi + ui, i = 1,…,n  The average causal effect is E(1).  What does OLS estimate: (a) When the conditional mean of u given X is zero? (b)Under the stronger assumption that X is randomly assigned (as in a randomized experiment)? In this case, OLS is a consistent estimator of the average causal effect.
  • 55. 11-55 OLS with Heterogeneous Causal Effects Yi = 0 + 1iXi + ui, i = 1,…,n (a) Suppose E(ui|Xi) = 0 so cov(ui,Xi) = 0.  If X is binary (treated/untreated), 1 ˆ  = treated Y – control Y estimates the causal effect among those who receive the treatment.  Why? For those treated, treated Y reflects the effect of the treatment on them. But we don’t know how the untreated would have responded had they been treated!
  • 56. 11-56 The math: suppose X is binary and E(ui|Xi) = 0. Then 1 ˆ  = treated Y – control Y For the treated: E(Yi|Xi=1) = 0 + E(1iXi|Xi=1) + E(ui|Xi=1) = 0 + E(1i|Xi=1) For the controls: E(Yi|Xi=0) = 0 + E(1iXi|Xi=0) + E(ui|Xi=0) = 0 Thus: 1 ˆ  p  E(Yi|Xi=1) – E(Yi|Xi=0) = E(1i|Xi=1) = average effect of the treatment on the treated
  • 57. 11-57 OLS with heterogeneous treatment effects: general X with E(ui|Xi) = 0 1 ˆ  = 2 XY X s s p  2 XY X   = 0 1 cov( , ) var( ) i i i i i X u X X     = 0 1 cov( , ) cov( , ) cov( , ) var( ) i i i i i i i X X X u X X     = 1 cov( , ) var( ) i i i i X X X  (because cov(ui,Xi) = 0)  If X is binary, this simplifies to the “effect of treatment on the treated”  Without heterogeneity, 1i = 1 and 1 ˆ  p  1  In general, the treatment effects of individuals with large values of X are given the most weight
  • 58. 11-58 (b) Now make a stronger assumption: that X is randomly assigned (experiment or quasi-experiment). Then what does OLS actually estimate?  I Xi is randomly assigned, it is distributed independently of 1i, so there is no difference between the population of controls and the population in the treatment group  Thus the effect of treatment on the treated = the average treatment effect in the population.
  • 59. 11-59 The math: 1 ˆ  p  1 cov( , ) var( ) i i i i X X X  = 1 1 cov( , ) | var( ) i i i i i X X E E X               = 1 cov( , ) var( ) i i i i X X E X        = 1 var( ) var( ) i i i X E X        = E(1i) Summary  If Xi and 1i are independent (Xi is randomly assigned), OLS estimates the average treatment effect.  If Xi is not randomly assigned but E(ui|Xi) = 0, OLS estimates the effect of treatment on the treated.  Without heterogeneity, the effect of treatment on the treated and the average treatment effect are the same
  • 60. 11-60 IV Regression with Heterogeneous Causal Effects Suppose the treatment effect is heterogeneous and the effect of the instrument on X is heterogeneous: Yi = 0 + 1iXi + ui (equation of interest) Xi = 0 + 1iZi + vi (first stage of TSLS) In general, TSLS estimates the causal effect for those whose value of X (probability of treatment) is most influenced by the instrument.
  • 61. 11-61 IV with heterogeneous causal effects, ctd. Yi = 0 + 1iXi + ui (equation of interest) Xi = 0 + 1iZi + vi (first stage of TSLS) Intuition:  Suppose 1i’s were known. If for some people 1i = 0, then their predicted value of Xi wouldn’t depend on Z, so the IV estimator would ignore them.  The IV estimator puts most of the weight on individuals for whom Z has a large influence on X.  TSLS measures the treatment effect for those whose probability of treatment is most influenced by X.
  • 62. 11-62 The math… Yi = 0 + 1iXi + ui (equation of interest) Xi = 0 + 1iZi + vi (first stage of TSLS) To simplify things, suppose:  1i and 1i are distributed independently of (ui,vi,Zi)  E(ui|Zi) = 0 and E(vi|Zi) = 0  E(1i) 0 Then 1 ˆTSLS  p  1 1 1 ( ) ( ) i i i E E    (derived in SW App. 11.4)  TSLS estimates the causal effect for those individuals for whom Z is most influential (those with large 1i).
  • 63. 11-63 When there are heterogeneous causal effects, what TSLS estimates depends on the choice of instruments!  With different instruments, TSLS estimates different weighted averages!!!  Suppose you have two instruments, Z1 and Z2. oIn general these instruments will be influential for different members of the population. oUsing Z1, TSLS will estimate the treatment effect for those people whose probability of treatment (X) is most influenced by Z1 oThe treatment effect for those most influenced by Z1 might differ from the treatment effect for those most influenced by Z2
  • 64. 11-64 When does TSLS estimate the average causal effect? Yi = 0 + 1iXi + ui (equation of interest) Xi = 0 + 1iZi + vi (first stage of TSLS) 1 ˆTSLS  p  1 1 1 ( ) ( ) i i i E E     TSLS estimates the average causal effect (that is, 1 ˆTSLS  p  E(1i)) if: oIf 1i and 1i are independent oIf 1i = 1 (no heterogeneity in equation of interest) oIf 1i = 1 (no heterogeneity in first stage equation)  But in general 1 ˆTSLS  does not estimate E(1i)!
  • 65. 11-65 Example: Cardiac catheterization Yi = survival time (days) for AMI patients Xi = received cardiac catheterization (or not) Zi = differential distance to CC hospital Equation of interest: SurvivalDaysi = 0 + 1iCardCathi + ui First stage (linear probability model): CardCathi = 0 + 1iDistancei + vi  For whom does distance have the great effect on the probability of treatment?  For those patients, what is their causal effect 1i?
  • 66. 11-66 Equation of interest: SurvivalDaysi = 0 + 1iCardCathi + ui First stage (linear probability model): CardCathi = 0 + 1iDistancei + vi  TSLS estimates the causal effect for those whose value of Xi is most heavily influenced by Zi  TSLS estimates the causal effect for those for whom distance most influences the probability of treatment  What is their causal effect? (“We might as well go to the CC hospital, its not too much farther”)  This is one explanation of why the TSLS estimate is smaller than the clinical trial OLS estimate.
  • 67. 11-67 Heterogeneous Causal Effects: Summary  Heterogeneous causal effects means that the causal (or treatment) effect varies across individuals.  When these differences depend on observable variables, heterogeneous causal effects can be estimated using interactions (nothing new here).  When these differences are unobserved ($\beta_{1i}$), the average causal (or treatment) effect is the average value in the population, $E(\beta_{1i})$.  When causal effects are heterogeneous, OLS and TSLS estimate…
  • 68. 11-68 OLS with Heterogeneous Causal Effects
      X is    | Relation between $X_i$ and $u_i$                              | Then OLS estimates
      binary  | $E(u_i \mid X_i) = 0$                                         | effect of treatment on the treated: $E(\beta_{1i} \mid X_i = 1)$
      binary  | X randomly assigned (so $X_i$ and $u_i$ are independent)      | average causal effect $E(\beta_{1i})$
      general | $E(u_i \mid X_i) = 0$                                         | weighted average of $\beta_{1i}$, placing most weight on those with large $|X_i - \bar{X}|$
      general | X randomly assigned                                           | average causal effect $E(\beta_{1i})$
      Without heterogeneity, $\beta_{1i} = \beta_1$ and $\hat{\beta}_1 \xrightarrow{p} \beta_1$ in all these cases.
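The binary-X rows of this table can be checked numerically. The sketch below assumes a made-up population in which individual effects are drawn from a normal distribution with mean 2, and compares two hypothetical assignment rules: self-selection on the size of the gain (with u still independent of X) versus a coin flip. The names `ols_slope` and `simulate_ols` are illustrative.

```python
import numpy as np

def ols_slope(y, x):
    """Slope from a simple OLS regression of y on x with an intercept."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def simulate_ols(n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    beta = rng.normal(2.0, 1.0, size=n)   # individual causal effects (assumed distribution)
    u = rng.normal(size=n)                # independent of everything, so E(u|X) = 0
    # Self-selection: people with larger gains are more likely to take treatment.
    x_selected = (beta + rng.normal(size=n) > 2.0).astype(float)
    # Random assignment: a fair coin flip.
    x_random = (rng.random(n) < 0.5).astype(float)
    return {
        "ate": beta.mean(),
        "tot_selected": beta[x_selected == 1].mean(),
        "ols_selected": ols_slope(1.0 + beta * x_selected + u, x_selected),
        "ols_random": ols_slope(1.0 + beta * x_random + u, x_random),
    }

if __name__ == "__main__":
    for name, value in simulate_ols().items():
        print(f"{name}: {value:.3f}")
```

Under self-selection the OLS slope tracks the effect of treatment on the treated, which exceeds the average causal effect here; under random assignment it tracks the average causal effect.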
  • 69. 11-69 TSLS with Heterogeneous Causal Effects  TSLS estimates the causal effect for those individuals for whom Z is most influential (those with large $\pi_{1i}$).  What TSLS estimates depends on the choice of Z!!  In the CC example, these were the individuals for whom the decision to drive to a CC lab was heavily influenced by the extra distance (those patients for whom the EMT was otherwise “on the fence”).  Thus TSLS also estimates a causal effect: the average effect of treatment on those most influenced by the instrument. o In general, this is neither the average causal effect nor the effect of treatment on the treated.
  • 70. 11-70 Summary: Experiments and Quasi-Experiments (SW Section 11.8) Experiments:  Average causal effects are defined as expected values of ideal randomized controlled experiments  Actual experiments have threats to internal validity  These threats to internal validity can be addressed (in part) by: o panel methods (differences-in-differences) o multiple regression o IV (using initial assignment as an instrument)
  • 71. 11-71 Summary, ctd. Quasi-experiments:  Quasi-experiments have an “as-if” randomly assigned source of variation.  This as-if random variation can generate: o $X_i$ which satisfies $E(u_i \mid X_i) = 0$ (so estimation proceeds using OLS); or o instrumental variable(s) which satisfy $E(u_i \mid Z_i) = 0$ (so estimation proceeds using TSLS)  Quasi-experiments also have threats to internal validity.
  • 72. 11-72 Summary, ctd. Two additional subtle issues:  What is a control variable? o A variable W for which X and u are uncorrelated, given the value of W (conditional mean independence: $E(u_i \mid X_i, W_i) = E(u_i \mid W_i)$) o Example: STAR & effect of teacher experience:  within their school, teachers were randomly assigned to regular/reg+aide/small class.  OLS provides an unbiased estimator of the causal effect, but only after controlling for school effects.
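The role of a control variable under conditional mean independence can be sketched with a simulation loosely patterned on within-school randomization. Everything here is hypothetical: the number of schools, the size of the school effects, the true treatment effect of 5, and the assumption that schools with higher baseline scores assign a larger share of units to treatment. Treatment is random within each school, but not across schools.

```python
import numpy as np

def slope(y, x):
    """Simple OLS slope of y on x with an intercept."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def simulate_school_controls(n_schools=50, n_per_school=400, effect=5.0, seed=2):
    rng = np.random.default_rng(seed)
    school = np.repeat(np.arange(n_schools), n_per_school)
    alpha = rng.normal(0.0, 10.0, size=n_schools)    # school effects on the outcome
    # Assumption: the treated share differs by school and rises with the school effect.
    p_treat = 1.0 / (1.0 + np.exp(-alpha / 10.0))
    x = (rng.random(school.size) < p_treat[school]).astype(float)  # random within school
    y = alpha[school] + effect * x + rng.normal(0.0, 5.0, size=school.size)

    def demean(a):
        # Subtract school means; equivalent to including a full set of school dummies.
        means = np.bincount(school, weights=a) / np.bincount(school)
        return a - means[school]

    naive = slope(y, x)                        # ignores school effects
    controlled = slope(demean(y), demean(x))   # controls for school effects
    return naive, controlled

if __name__ == "__main__":
    naive, controlled = simulate_school_controls()
    print(f"OLS without school controls: {naive:.2f}")
    print(f"OLS with school controls:    {controlled:.2f}")
```

Without school controls the slope absorbs between-school differences and overstates the effect in this setup; after removing school means it should be close to the assumed true effect.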
  • 73. 11-73 Summary, ctd.  What do OLS and TSLS estimate when there is unobserved heterogeneity of causal effects?  In general, weighted averages of causal effects: o If X is randomly assigned, then OLS estimates the average causal effect. o If $X_i$ is binary and not randomly assigned but $E(u_i \mid X_i) = 0$, OLS estimates the average effect of treatment on the treated. o If $E(u_i \mid Z_i) = 0$, TSLS estimates the average effect of treatment on those most influenced by $Z_i$.