Part 2A: Endogeneity [ 1/53]
Econometric Analysis of Panel Data
William Greene
Department of Economics
University of South Florida
Part 2A: Endogeneity [ 2/53]
lnSPI = α + β*lnGDPPC(PPP) + ε, 0 < β < 1.
(Huffington Post, 2/16/16)
Reverse Causality in the Preston Curve?
Part 2A: Endogeneity [ 3/53]
In two of my projects, I was asked by reviewers to address the endogeneity concerns. In one
project, I regress employee departure on project termination. Arguably project termination is
not exogenous.
DEPARTURE = f(a + b*PROJECT TERMINATION + e)
(Time until departure???)
In the other, I regress firms’ charitable giving in specific countries on their business activities
in the local community. Again, business presence in countries are not exogenous. The
problem is, both papers used non-linear models (Hazard model in one, and hurdle model
in the other), which are required by the data I have. Are you aware of any econometric
methods to deal with endogeneity in non-linear models? My search online did not go
anywhere.
Hazard Model: Not a linear model.
Prob[event happens in time interval t to t+Δ| event happens after time t] =
a function of (x’β)
http://people.stern.nyu.edu/wgreene/Econometrics/NonlinearPanelDataModels.pdf
Part 2A: Endogeneity [ 4/53]
I have been asked this question (or ones like it) dozens of times. I think the issue is
getting way overplayed. But, I'm not the majority voice, so you are going to have to deal
with this.
Step 1: you or the referee need to figure out (make a case for) by what construction is
"project termination" endogenous. What is correlated with what **in your hazard
model** that makes the variable endogenous? There must be a second equation that
implies that project termination is endogenous. What is it? What unobservable in that
equation is correlated with what unobservable in the hazard model that makes it
endogenous. Same questions for your hurdle model.
Step 2: Depends on the outcome of Step 1…
Part 2A: Endogeneity [ 6/53]
By what construction is SNAP endogenous in the
HEALTH equation?
SNAP = Xβ_SNAP + Zδ + ε
HEALTH = Xβ_HEALTH + η·SNAP + v
Part 2A: Endogeneity [ 8/53]
Poisson Regression
Prob(y = j | x, S) = exp(-λ) λ^j / j!,
λ = exp(β’x + δS) [How is S endogenous?]
Negative Binomial Regression
Prob(y = j | x, S, v) = exp(-λ) λ^j / j!,
λ = exp(β’x + δS + v) = E[y | x, S, v]
Prob(y = j | x, S) = ∫ Prob(y = j | x, S, v) f(v) dv
Negative Binomial Regression with Common Factor
Prob(y = j | x, S, v, u) = exp(-λ) λ^j / j!,
λ = exp(β’x + δS + v + θu) = E[y | x, S, v, u]
Part 2A: Endogeneity [ 11/53]
Control Function Approach
S* = β_S’x + w, S = 1[S* > 0], w ~ N[0,1]
lnL = Σ_{i=1}^N lnL_i = Σ_{i=1}^N ln Φ[(2S_i - 1) β_S’x_i]
GENERALIZED RESIDUAL
u_i = ∂lnL_i / ∂(Constant term) = (2S_i - 1) φ[(2S_i - 1) β_S’x_i] / Φ[(2S_i - 1) β_S’x_i] = Control Function
(For a linear regression, the generalized residual is e_i / s².)
Poisson or NB1 Model with "Residual Inclusion"
E[C_i | x_i, S_i, û_i] = exp[β’x_i + δS_i + θû_i]
Part 2A: Endogeneity [ 16/53]
Endogeneity
• y = Xβ + ε,
• Definition: E[ε|x] ≠ 0
• Why not?
   • Omitted variables
   • Unobserved heterogeneity (equivalent to omitted variables)
   • Measurement error on the RHS (equivalent to omitted variables)
   • Endogenous sampling and attrition
   • Simultaneity (?) (“reverse causality”)
Part 2A: Endogeneity [ 17/53]
Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar,
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with
Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal
of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further
analysis. The data were downloaded from the website for Baltagi's text.
Part 2A: Endogeneity [ 18/53]
Specification: Quadratic Effect of Experience
Part 2A: Endogeneity [ 19/53]
The Effect of Education on LWAGE
LWAGE = β_1 + β_2 EDUC + β_3 EXP + β_4 EXP² + ... + ε
What is ε? Motivation, ... + everything else
EDUC = f(GENDER, SMSA, SOUTH, Motivation, ...)
Part 2A: Endogeneity [ 20/53]
What Influences LWAGE?
LWAGE = β_1 + β_2 EDUC(X, Motivation, ...) + β_3 EXP + β_4 EXP² + ... + ε(Motivation)
Variation in Motivation is associated with variation in EDUC(X, Motivation, ...) and ε(Motivation).
What looks like an effect due to variation in EDUC may be due to variation in Motivation. The estimate of β_2 picks up the effect of EDUC and the hidden effect of Motivation.
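The hidden effect described on this slide can be illustrated with a small simulation. The sketch below is not from the lecture: the data are synthetic, and the coefficient 0.08, the strength of the Motivation effects, the sample size and the seed are all hypothetical choices. It lets an unobserved Motivation move both EDUC and ε, then fits OLS of LWAGE on EDUC.

```python
import numpy as np

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(0)
n = 20_000
motivation = rng.normal(size=n)                   # unobserved by the analyst
educ = 12 + 2.0 * motivation + rng.normal(size=n) # EDUC depends on Motivation
eps = 0.5 * motivation + rng.normal(scale=0.3, size=n)  # so does the disturbance
beta2 = 0.08                                      # assumed true effect of EDUC
lwage = 1.0 + beta2 * educ + eps

# OLS of LWAGE on a constant and EDUC (Motivation is omitted)
X = np.column_stack([np.ones(n), educ])
b_ols = np.linalg.lstsq(X, lwage, rcond=None)[0]
print("assumed beta2:", beta2, " OLS slope:", round(b_ols[1], 3))
```

Under these assumed parameters the OLS slope should come out well above 0.08, because it absorbs the part of Motivation that moves with EDUC.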
Part 2A: Endogeneity [ 21/53]
The General Problem
y = X_1β_1 + X_2β_2 + ε
Cov(X_1, ε) = 0, K_1 variables
Cov(X_2, ε) ≠ 0, K_2 variables
X_2 is endogenous
OLS regression of y on (X_1, X_2) cannot estimate (β_1, β_2) consistently. Some other estimator is needed.
Additional structure: How does X_2 become endogenous?
X_2 = ZΠ + V where Cov(V, ε) ≠ 0 but Cov(Z, ε) = 0.
An estimator based on (X_1, X_2, Z) may be able to estimate (β_1, β_2) consistently.
→ instrumental variable (IV)
Part 2A: Endogeneity [ 22/53]
Instrumental Variables
• Framework: y = Xβ + ε, K variables in X.
• There exists a set of K variables, Z, such that plim(Z’X/n) ≠ 0 but plim(Z’ε/n) = 0. The variables in Z are called instrumental variables.
• An alternative (to least squares) estimator of β is b_IV = (Z’X)⁻¹Z’y ~ Cov(Z,y) / Cov(Z,X)
• We consider the following:
   • Why use this estimator?
   • What are its properties compared to least squares?
• We will also examine an important application
An Exogenous Influence
LWAGE = β_1 + β_2 EDUC(X, Z, Motivation, ...) + β_3 EXP + β_4 EXP² + ... + ε(Motivation)
Variation in Z is associated with variation in EDUC(X, Z, Motivation, ...) and not ε(Motivation).
An effect due to the effect of variation in Z on EDUC will only be due to variation in EDUC. The estimate of β_2 picks up the effect of EDUC only.
Z is an Instrumental Variable
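A minimal sketch of the IV idea, using the estimator b_IV = (Z’X)⁻¹Z’y from the Instrumental Variables slide. The data are synthetic and every parameter value is a hypothetical assumption; the variable `z` stands in for an instrument that moves EDUC but is unrelated to Motivation.

```python
import numpy as np

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(1)
n = 20_000
motivation = rng.normal(size=n)        # unobserved
z = rng.normal(size=n)                 # assumed exogenous influence on EDUC
educ = 12 + 1.0 * z + 2.0 * motivation + rng.normal(size=n)
eps = 0.5 * motivation + rng.normal(scale=0.3, size=n)
beta2 = 0.08                           # assumed true effect of EDUC
lwage = 1.0 + beta2 * educ + eps

X = np.column_stack([np.ones(n), educ])   # regressors
Z = np.column_stack([np.ones(n), z])      # instruments (constant instruments itself)

b_ols = np.linalg.solve(X.T @ X, X.T @ lwage)   # least squares
b_iv = np.linalg.solve(Z.T @ X, Z.T @ lwage)    # (Z'X)^{-1} Z'y

# With one regressor and one instrument the IV slope is Cov(z,y)/Cov(z,x).
cov_ratio = np.cov(z, lwage)[0, 1] / np.cov(z, educ)[0, 1]
print("OLS:", round(b_ols[1], 3), " IV:", round(b_iv[1], 3))
```

In this design the IV slope should sit close to the assumed 0.08 while the OLS slope does not, and `b_iv[1]` coincides with the covariance ratio.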
Part 2A: Endogeneity [ 24/53]
Instrumental Variables
• My theory claims that MS and FEM are instruments
• Structural equations
   • LWAGE (ED,EXP,EXPSQ,WKS,OCC,SOUTH,SMSA,UNION)
   • ED (…,MS,FEM) (this equation explains the endogeneity)
Reduced Form:
LWAGE[ ED (…,MS,FEM), EXP,EXPSQ,WKS,OCC,SOUTH,SMSA,UNION ]
Part 2A: Endogeneity [ 25/53]
SNAP Model. X is in both equations. Z is in the SNAP equation. SNAP is in the HEALTH equation.
Part 2A: Endogeneity [ 26/53]
Instrumental Variables in Regression
• Typical Case: One “problem” variable, the “last” one
• y_it = β_1 x_1it + β_2 x_2it + … + β_K x_Kit + ε_it
• E[ε_it | x_1it, …, x_Kit] ≠ 0. (0 for all others)
• There exists a variable z_it such that
Relevance
• E[x_Kit | x_1it, x_2it, …, x_K-1,it, z_it] = g(x_1it, x_2it, …, x_K-1,it, z_it)
In the presence of the other variables, z_it “explains” x_Kit
• A projection interpretation: In the projection,
x_Kit = θ_1 x_1it + θ_2 x_2it + … + θ_K-1 x_K-1,it + θ_K z_it, θ_K ≠ 0.
Exogeneity
• E[ε_it | x_1it, x_2it, …, x_K-1,it, z_it] = 0
In the presence of the other variables, z_it and ε_it are uncorrelated.
Part 2A: Endogeneity [ 27/53]
Two Stage Least Squares Strategy
• Reduced Form: LWAGE[ ED (MS,FEM,X), EXP,EXPSQ,WKS,OCC,SOUTH,SMSA,UNION ]
• Strategy
   • (1) Purge ED of the influence of everything but MS, FEM and the other X variables. Predict ED using all exogenous information in the sample (X,MS,FEM).
   • (2) Regress LWAGE on this prediction of ED and everything else.
   • Standard errors must be adjusted for the predicted ED
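The two steps can be sketched in a few lines. This is an illustration on synthetic data, not the Cornwell and Rupert regression: `x1` stands in for the exogenous X variables, `z1` and `z2` for the excluded instruments, and all coefficients are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(2)
n = 20_000
one = np.ones(n)
x1 = rng.normal(size=n)                            # exogenous regressor
z1, z2 = rng.normal(size=n), rng.normal(size=n)    # excluded instruments
u = rng.normal(size=n)
ed = 1 + 0.5 * x1 + 0.8 * z1 + 0.6 * z2 + u        # endogenous through u
y = 2 + 0.1 * ed + 0.3 * x1 + 0.7 * u + rng.normal(scale=0.5, size=n)

X = np.column_stack([one, x1, ed])
Zall = np.column_stack([one, x1, z1, z2])          # all exogenous information

# Step 1: predict ED from all exogenous variables
ed_hat = Zall @ ols(Zall, ed)
# Step 2: regress y on the prediction and everything else
Xhat = np.column_stack([one, x1, ed_hat])
b_2sls = ols(Xhat, y)

# Same estimator in one formula, (Xhat'X)^{-1} Xhat'y
b_direct = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
b_ols = ols(X, y)
print("OLS:", b_ols.round(3), " 2SLS:", b_2sls.round(3))
```

The second-step coefficients are the 2SLS estimates, but the standard errors printed by an ordinary second-step regression would not be the right ones, which is the point of the last bullet.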
Part 2A: Endogeneity [ 28/53]
OLS Regression (Inconsistent)
Part 2A: Endogeneity [ 29/53]
2SLS Regression (Maybe not a very good theory)
2SLS coefficient estimate is implausible. Now what?
The weird results for the coefficient on ED may be due to the instruments, MS and FEM, being dummy variables. There is not much variation in these variables and not much covariation with the other variables.
Part 2A: Endogeneity [ 30/53]
An Interpretation
The Source of the Endogeneity
• LWAGE = f(ED, EXP,EXPSQ,WKS,OCC,SOUTH,SMSA,UNION) + ε
• ED = f(MS,FEM, EXP,EXPSQ,WKS,OCC,SOUTH,SMSA,UNION) + u
Part 2A: Endogeneity [ 31/53]
Can We Remove the Endogeneity?
• LWAGE = f(ED, EXP,EXPSQ,WKS,OCC,SOUTH,SMSA,UNION) + u + ε
• Strategy
   • Estimate u
   • Add u to the equation.
ED is correlated with u+ε because it is correlated with u.
ED is uncorrelated with u+ε if u is in the equation.
Part 2A: Endogeneity [ 32/53]
Auxiliary Regression for ED to Obtain Residuals
[Regression output; callouts mark the IVs and the exogenous variables.]
Part 2A: Endogeneity [ 33/53]
OLS with Residual Added (Control Function)
2SLS
Part 2A: Endogeneity [ 34/53]
A Warning About Control Functions
Sum of squares is not computed correctly because U is in the regression.
A general result. Control function estimators usually require a fix to the
estimated covariance matrix for the estimator.
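The warning can be checked numerically. The sketch below uses synthetic data with hypothetical parameter values; `fit` is a small helper written for this illustration, not a library routine. It adds the first-stage residual to the OLS regression and compares the result with 2SLS.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients and conventional (unadjusted) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, np.sqrt(s2 * np.diag(XtX_inv))

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(3)
n = 20_000
one = np.ones(n)
x1 = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
ed = 1 + 0.5 * x1 + 0.8 * z1 + 0.6 * z2 + u
y = 2 + 0.1 * ed + 0.3 * x1 + 0.7 * u + rng.normal(scale=0.5, size=n)

Zall = np.column_stack([one, x1, z1, z2])
ed_hat = Zall @ fit(Zall, ed)[0]
u_hat = ed - ed_hat                                   # estimated u

# Control function: OLS of y on (1, x1, ED, u_hat)
b_cf, se_cf = fit(np.column_stack([one, x1, ed, u_hat]), y)

# 2SLS, with the residual variance computed from the actual ED
X = np.column_stack([one, x1, ed])
Xhat = np.column_stack([one, x1, ed_hat])
b_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
e = y - X @ b_2sls
s2 = e @ e / (n - X.shape[1])
se_2sls = np.sqrt(s2 * np.diag(np.linalg.inv(Xhat.T @ Xhat)))

print("coef on ED:", b_cf[2].round(4), b_2sls[2].round(4))
print("std. err. :", se_cf[2].round(4), se_2sls[2].round(4))
```

The coefficients on the original regressors agree, but the unadjusted control function standard error is smaller than the 2SLS one because the sum of squares is computed with û in the regression.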
Part 2A: Endogeneity [ 35/53]
Estimating σ²
Estimating the asymptotic covariance matrix - a caution about estimating σ².
Since the regression is computed by regressing y on x̂, one might use
σ̂² = (1/n) Σ_{i=1}^n (y_i - x̂_i’b_2sls)²   (uses x̂)
This is inconsistent. Use
σ̂² = (1/n) Σ_{i=1}^n (y_i - x_i’b_2sls)²   (uses x)
(Degrees of freedom correction is optional; usually done.)
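A short numerical comparison of the two variance estimators, as a sketch on synthetic data. The design is hypothetical and chosen so that the true disturbance variance is 0.49 + 0.25 = 0.74.

```python
import numpy as np

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(4)
n = 20_000
one = np.ones(n)
z = rng.normal(size=n)                              # instrument
u = rng.normal(size=n)
x = 1.0 * z + u                                     # endogenous regressor
eps = 0.7 * u + rng.normal(scale=0.5, size=n)       # Var[eps] = 0.74 by construction
y = 1.0 + 1.0 * x + eps

X = np.column_stack([one, x])
Z = np.column_stack([one, z])
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)        # predicted X
b_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

s2_wrong = np.mean((y - Xhat @ b_2sls) ** 2)        # residuals built from x-hat
s2_right = np.mean((y - X @ b_2sls) ** 2)           # residuals built from actual x
print("using x-hat:", round(s2_wrong, 3), " using x:", round(s2_right, 3))
```

Only the version built from the actual x should land near the assumed 0.74; the x̂ version also picks up the variation in x that the first stage removed.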
Part 2A: Endogeneity [ 36/53]
Robust estimation of VC
Counterpart to the White estimator allows heteroscedasticity
Est.Asy.Var[β̂] = (X̂’X̂)⁻¹ [Σ_{i,t} (y_it - x_it’β̂)² x̂_it x̂_it’] (X̂’X̂)⁻¹
“Actual” X: x_it in the residual y_it - x_it’β̂
“Predicted” X: x̂_it and X̂ everywhere else
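The sandwich form can be sketched as follows. The data are synthetic with a hypothetical heteroscedastic disturbance, and the example is a single cross section, so the sum runs over i only rather than over (i,t).

```python
import numpy as np

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(5)
n = 5_000
one = np.ones(n)
z = rng.normal(size=n)                                    # instrument
u = rng.normal(size=n)
x = z + u                                                 # endogenous regressor
eps = (0.7 * u + rng.normal(size=n)) * (0.5 + np.abs(z))  # variance depends on z
y = 1.0 + 0.5 * x + eps

X = np.column_stack([one, x])
Z = np.column_stack([one, z])
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)              # "predicted" X
b = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)               # 2SLS

e = y - X @ b                                             # residuals use the "actual" X
A_inv = np.linalg.inv(Xhat.T @ Xhat)
meat = (Xhat * (e ** 2)[:, None]).T @ Xhat                # sum of e_i^2 xhat_i xhat_i'
V_robust = A_inv @ meat @ A_inv
se_robust = np.sqrt(np.diag(V_robust))

s2 = e @ e / (n - X.shape[1])
se_conv = np.sqrt(s2 * np.diag(A_inv))                    # conventional 2SLS
print("robust:", se_robust.round(4), " conventional:", se_conv.round(4))
```

For panel data a cluster correction would usually be layered on top of this; that extension is not shown here.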
Part 2A: Endogeneity [ 37/53]
2SLS vs. Robust Standard Errors
+----------------------------------------------------+
| Robust Standard Errors                             |
+----------+-------------+----------------+----------+
| Variable | Coefficient | Standard Error | b/St.Er. |
+----------+-------------+----------------+----------+
| B_1      | 45.4842872  | 4.02597121     |  11.298  |
| B_2      | .05354484   | .01264923      |   4.233  |
| B_3      | -.00169664  | .00029006      |  -5.849  |
| B_4      | .01294854   | .05757179      |    .225  |
| B_5      | .38537223   | .07065602      |   5.454  |
| B_6      | .36777247   | .06472185      |   5.682  |
| B_7      | .95530115   | .08681261      |  11.000  |
+----------------------------------------------------+
| 2SLS Standard Errors                               |
+----------+-------------+----------------+----------+
| Variable | Coefficient | Standard Error | b/St.Er. |
+----------+-------------+----------------+----------+
| B_1      | 45.4842872  | .36908158      | 123.236  |
| B_2      | .05354484   | .03139904      |   1.705  |
| B_3      | -.00169664  | .00069138      |  -2.454  |
| B_4      | .01294854   | .16266435      |    .080  |
| B_5      | .38537223   | .17645815      |   2.184  |
| B_6      | .36777247   | .17284574      |   2.128  |
| B_7      | .95530115   | .20846241      |   4.583  |
+----------+-------------+----------------+----------+
Part 2A: Endogeneity [ 38/53]
Inference with IV Estimators
(1) Wald Statistics:
(Rβ̂ - q)’ {R Est.Asy.Var[β̂] R’}⁻¹ (Rβ̂ - q)
(E.g., the usual 't-statistics')
(2) A type of F statistic:
Compute SSUA = (y - Xβ̂_U)’(y - Xβ̂_U) without restrictions (Note, X)
Compute SSR = (y - X̂β̂_R)’(y - X̂β̂_R) with restrictions
Compute SSU = (y - X̂β̂_U)’(y - X̂β̂_U) without restrictions (Note, X̂)
F = [(SSR - SSU) / J] / [SSUA / (N - K)] ~ F[J, N - K]
Part 2A: Endogeneity [ 39/53]
Endogeneity Test? (Hausman)
        Exogenous                  Endogenous
OLS     Consistent, Efficient      Inconsistent
2SLS    Consistent, Inefficient    Consistent
Base a test on d = b_2SLS - b_OLS
Use a Wald statistic, d’[Var(d)]⁻¹d
What to use for the variance matrix?
Hausman: V_2SLS - V_OLS
Part 2A: Endogeneity [ 40/53]
Hausman Test
Part 2A: Endogeneity [ 41/53]
Hausman Test: One at a Time?
Part 2A: Endogeneity [ 42/53]
Endogeneity Test: Wu
• Considerable complication in Hausman test (text, pp. 276-277)
• Simplification: Wu test.
• Regress y on X and X̂ estimated for the endogenous part of X. Then use an ordinary Wald test.
Part 2A: Endogeneity [ 43/53]
Monday, 2/6/17
Part 2A: Endogeneity [ 44/53]
Regression Based Endogeneity Test
An easy t test. (Wooldridge 2010, p. 127)
y_it = x_it’δ + γq_it + ε_it
Z = a set of M instruments.
Write q = Zπ + v. Can be estimated by ordinary least squares.
Endogeneity concerns correlation between v and ε.
Add v̂_it = q_it - z_it’π̂ to the equation and use OLS:
y_it = x_it’δ + γq_it + ρv̂_it + {error}
Simple t test on whether ρ equals 0.
Even easier, algebraically identical, (Wu, 1973), add q̂_it to the equation and do the same test.
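A sketch of the test on synthetic data, with q made endogenous by construction. All parameter values are hypothetical, and `fit` is a helper defined here for the illustration. It runs both variants, adding v̂ and adding q̂, so the two t ratios can be compared.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, np.sqrt(s2 * np.diag(XtX_inv))

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(6)
n = 5_000
one = np.ones(n)
x = rng.normal(size=n)                 # exogenous regressor
z = rng.normal(size=n)                 # excluded instrument
v = rng.normal(size=n)
q = 0.5 * x + 1.0 * z + v              # q is endogenous: v also enters y below
y = 1.0 + 0.3 * x + 0.1 * q + 0.7 * v + rng.normal(scale=0.5, size=n)

Zall = np.column_stack([one, x, z])
q_hat = Zall @ fit(Zall, q)[0]         # first-stage prediction
v_hat = q - q_hat                      # first-stage residual

b_v, se_v = fit(np.column_stack([one, x, q, v_hat]), y)   # add v-hat
b_q, se_q = fit(np.column_stack([one, x, q, q_hat]), y)   # add q-hat instead
t_v = b_v[3] / se_v[3]
t_qhat = b_q[3] / se_q[3]
print("t on v_hat:", round(t_v, 3), " t on q_hat:", round(t_qhat, 3))
```

The two t ratios have the same magnitude and opposite signs, matching the pattern of the U and WKSHAT rows in the WKS example output.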
Part 2A: Endogeneity [ 45/53]
Wu Test
Since this is 2SLS using a control function, the standard errors should have
been adjusted to carry out this test. (The sum of squares is too small.)
Part 2A: Endogeneity [ 46/53]
Testing Endogeneity of WKS
(1) Regress WKS on 1,EXP,EXPSQ,OCC,SOUTH,SMSA,MS.
U=residual, WKSHAT=prediction
(2) Regress LWAGE on 1,EXP,EXPSQ,OCC,SOUTH,SMSA,WKS, U or WKSHAT
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant -9.97734299 .75652186 -13.188 .0000
EXP .01833440 .00259373 7.069 .0000 19.8537815
EXPSQ -.799491D-04 .603484D-04 -1.325 .1852 514.405042
OCC -.28885529 .01222533 -23.628 .0000 .51116447
SOUTH -.26279891 .01439561 -18.255 .0000 .29027611
SMSA .03616514 .01369743 2.640 .0083 .65378151
WKS .35314170 .01638709 21.550 .0000 46.8115246
U -.34960141 .01642842 -21.280 .0000 -.341879D-14
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Constant -9.97734299 .75652186 -13.188 .0000
EXP .01833440 .00259373 7.069 .0000 19.8537815
EXPSQ -.799491D-04 .603484D-04 -1.325 .1852 514.405042
OCC -.28885529 .01222533 -23.628 .0000 .51116447
SOUTH -.26279891 .01439561 -18.255 .0000 .29027611
SMSA .03616514 .01369743 2.640 .0083 .65378151
WKS .00354028 .00116459 3.040 .0024 46.8115246
WKSHAT .34960141 .01642842 21.280 .0000 46.8115246
Part 2A: Endogeneity [ 47/53]
Weak Instruments
• One endogenous variable: y = Xβ + x_k δ + ε; Instruments (X,Z); Z is exogenous
• Symptom: The relevance condition, plim Z’X/n not zero, is close to being violated. Relevance: Z must “explain” x_k after controlling for X.
• Detection:
   • Standard F test in the regression of x_k on (X,Z). Wald test of coefficients on Z equal 0; F < 10 suggests a problem. (Staiger and Stock)
   • Other versions by Stock and Yogo (2005), Cragg and Donald (1993), Kleibergen and Paap (2006) for when x_k is more than one variable and for when x_k = (X,Z)π + u and u is heteroscedastic, clustered, nonnormal, etc.
• Remedy:
   • Not much; most of the discussion is about the condition, not what to do about it.
   • Use LIML instead of 2SLS? Requires a normality assumption. Probably
Part 2A: Endogeneity [ 48/53]
Weak Instruments (cont.)
plim β̂ = β + [Cov(Z,X)]⁻¹ Cov(Z,ε)
If Cov(Z,ε) is "small" but nonzero, small Cov(Z,X) may hugely magnify the effect.
IV is not only inefficient, may be very badly biased by "weak" instruments.
Solutions? Can one "test" for weak instruments?
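The first-stage F statistic listed under Detection on the previous slide can be computed directly from two sums of squared residuals. The sketch below uses synthetic data; `first_stage_F` is a hypothetical helper name, and the "strong" and "weak" coefficient values are assumptions chosen for contrast.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def first_stage_F(xk, X, Z):
    """F statistic for H0: coefficients on Z are zero in the regression of xk on (X, Z)."""
    XZ = np.column_stack([X, Z])
    J = Z.shape[1]                         # number of excluded instruments
    ssr_r = ssr(X, xk)                     # restricted: Z left out
    ssr_u = ssr(XZ, xk)                    # unrestricted: Z included
    return ((ssr_r - ssr_u) / J) / (ssr_u / (len(xk) - XZ.shape[1]))

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(7)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
noise = rng.normal(size=n)
xk_strong = X @ np.array([1.0, 0.5]) + Z @ np.array([0.8, 0.6]) + noise
xk_weak = X @ np.array([1.0, 0.5]) + Z @ np.array([0.02, 0.01]) + noise

F_strong = first_stage_F(xk_strong, X, Z)
F_weak = first_stage_F(xk_weak, X, Z)
print("F (strong):", round(F_strong, 1), " F (weak):", round(F_weak, 2))
```

This is the homoscedastic version of the statistic only; the heteroscedasticity- and cluster-robust variants cited on the slide are not implemented here.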
Part 2A: Endogeneity [ 49/53]
Weak Instruments
Which is better?
LS is inconsistent, but probably has smaller variance. LS may be more precise.
IV is consistent, but probably has larger variance.
Asy.Var[β̂] = Q_ZX⁻¹ Ω_ZZ Q_XZ⁻¹
Q_ZX⁻¹ may be large. (Compared to what?)
Strange results with "small" Q_ZX:
IV estimator tends to resemble OLS (bias) (not a function of sample size).
Contradictory result. Suppose z is perfectly correlated with x. IV MUST be the same as OLS.
Part 2A: Endogeneity [ 50/53]
Endogenous Union Effect
Name ; Xunion = one,occ,smsa,ed,exp,union $
Name ; Zinst = fem,ind,south $
Name ; Zunion = one,occ,smsa,ed,exp,Zinst $
? Inconsistent OLS
Regr ; Lhs = lwage ; Rhs = Xunion ; Cluster = 7 $
? Two Stage Least Squares gives a nonsense result
2sls ; Lhs=lwage;rhs=Xunion;Inst=Zunion ; Cluster=7 $
? Test for weak instruments
Regr ; Lhs=union ; Rhs=Zunion ; res = u
; Cluster = 7 ; test:zinst $
? Control function estimator
? 2SLS coefficients with the wrong standard errors
Regr ; Lhs=lwage;rhs=xunion,u;cluster=7$
Part 2A: Endogeneity [ 51/53]
OLS Should Be Inconsistent
Part 2A: Endogeneity [ 52/53]
Nonsense 2SLS Result
Part 2A: Endogeneity [ 53/53]
Weak Instruments? No
What is going on here? When the endogenous variable and/or the excluded
instruments are binary, the actual results are sometimes a bit unstable. The
theoretical results are generally about covariation of continuous variables.
Part 2A: Endogeneity [ 54/53]
Appendix
Miscellaneous
Part 2A: Endogeneity [ 55/53]
The First IV Study Was a Natural Experiment
(Snow, J., On the Mode of Communication of Cholera, 1855)
http://www.ph.ucla.edu/epi/snow/snowbook3.html
• London Cholera epidemic, ca 1853-4
• Cholera = f(Water Purity,u) + ε.
• ‘Causal’ effect of water purity on cholera?
• Purity = f(cholera prone environment (poor, garbage in streets, rodents, etc.)). Regression does not work.
Two London water companies: Lambeth; Southwark & Vauxhall
[Map labels: River Thames; main sewage discharge]
Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects…
http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf
A review of instrumental variables estimation in the applied health sciences. Health Services and Outcomes Research Methodology 2007; 7(3-4):159-179.
Part 2A: Endogeneity [ 56/53]
Investigation Using an Instrumental Variable
Theory: Cholera = β_0 + β_1 BadWater + Other Factors
Model: C = β_0 + β_1 B + ε (Stylized)
(C=0/1=no/yes) (B=0/1=good/bad) (ε=other factors)
Interesting measure of causal effect of bad water: β_1
Endogeneity Problem: Cholera prone environment u affects B and ε. Interpret this to say B(u) and ε(u) are correlated because of u.
Confounding Effect: E[C|B] ≠ β_0 + β_1 B because E[ε|B] ≠ 0
E[C|B=1] = β_0 + β_1 + E[ε|B=1]
E[C|B=0] = β_0 + E[ε|B=0]
E[C|B=1] - E[C|B=0] = β_1 + {E[ε|B=1] - E[ε|B=0]}
Conclusion: Comparing cholera rates of those with bad water (measurable) to those with good water, P(C|B=1) - P(C|B=0), does not reveal the water effect.
Part 2A: Endogeneity [ 57/53]
Instrumental Variable:
L = 1 if water supplied by Lambeth
L = 0 if water supplied by Southwark/Vauxhall
Relevant? Is E[B|L=1] ≠ E[B|L=0]? That is Snow's theory, that the water supply is partly the culprit, and because of their location, Lambeth provided purer water than Southwark.
Exogenous? Is E[ε|L=1] - E[ε|L=0] = 0? Water supply is randomly supplied to houses. Homeowners do not even know which supplier is providing their water. "Assignment is random."
Using the IV in E[C|L] = β_0 + β_1 E[B|L] + E[ε|L]:
E[C|L=1] = β_0 + β_1 E[B|L=1] + E[ε|L=1]
E[C|L=0] = β_0 + β_1 E[B|L=0] + E[ε|L=0]
Estimating Equation:
E[C|L=1] - E[C|L=0] = β_1 {E[B|L=1] - E[B|L=0]} + {E[ε|L=1] - E[ε|L=0]}
(The last term is zero because L is exogenous.)
Part 2A: Endogeneity [ 58/53]
E[C|L=1] - E[C|L=0] = β_1 {E[B|L=1] - E[B|L=0]}
IV Estimator:
β_1 = {E[C|L=1] - E[C|L=0]} / {E[B|L=1] - E[B|L=0]}
(Note: nonzero denominator is the relevance condition.)
Operational:
P(C|L=1) = Proportion of observations supplied by Lambeth that have Cholera
P(C|L=0) = Proportion of observations supplied by Southwark that have Cholera
P(B|L=1) = Proportion of observations supplied by Lambeth with Bad Water
P(B|L=0) = Proportion of observations supplied by Southwark with Bad Water
Estimate:
b_1 = {P(C|L=1) - P(C|L=0)} / {P(B|L=1) - P(B|L=0)} = Cov(C,L) / Cov(B,L) (broadly)
(The Wald estimator)
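The Wald estimator can be tried on simulated data shaped like the stylized cholera model. Nothing below is Snow's data: the probabilities, the assumed effect of 0.3, and the variable `env` (standing in for the unobserved cholera prone environment u) are all hypothetical.

```python
import numpy as np

# Hypothetical data generating process; all numbers are illustrative assumptions.
rng = np.random.default_rng(8)
n = 200_000
env = (rng.random(n) < 0.5).astype(float)   # unobserved cholera prone environment
L = (rng.random(n) < 0.5).astype(float)     # 1 = Lambeth, assigned independently of env
# Bad water is less likely with Lambeth and more likely in a bad environment
B = (rng.random(n) < 0.5 - 0.4 * L + 0.4 * env).astype(float)
beta1 = 0.3                                 # assumed causal effect of bad water
C = (rng.random(n) < 0.05 + beta1 * B + 0.2 * env).astype(float)

# Confounded comparison: P(C|B=1) - P(C|B=0)
naive = C[B == 1].mean() - C[B == 0].mean()

# Wald estimator: ratio of differences in proportions across suppliers
wald = (C[L == 1].mean() - C[L == 0].mean()) / (B[L == 1].mean() - B[L == 0].mean())

# Equivalent covariance form, Cov(C,L) / Cov(B,L)
cov_ratio = np.cov(C, L)[0, 1] / np.cov(B, L)[0, 1]
print("naive:", round(naive, 3), " Wald:", round(wald, 3))
```

In this design the naive difference overstates the assumed effect because `env` raises both B and C, while the Wald ratio should come out close to 0.3 and equal to the covariance ratio.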
Part 2A: Endogeneity [ 59/53]
On Sat, May 3, 2014 at 4:48 PM, … wrote:
Dear Professor Greene,
I am giving an Econometrics course in Brazil and we are using
your textbook. I got a question which I think only you can help
me. In our last class, I did a formal proof that
var(beta_hat_OLS) is lower or equal than var(beta_hat_2SLS),
under homoscedasticity.
We know this assertive is also valid under heteroscedasticity,
but a graduate student asked me the proof (which is my
problem).
Do you know where can I find it?
  • 1. Part 2A: Endogeneity [ 1/53] Econometric Analysis of Panel Data William Greene Department of Economics University of South Florida
  • 2. Part 2A: Endogeneity [ 2/53] lnSPI =  + *lnGDPPC(PPP) + , 0 <  < 1. (Huffington Post, 2/16/16) Reverse Causality in the Preston Curve?
  • 3. Part 2A: Endogeneity [ 3/53] In two of my projects, I was asked by reviewers to address the endogeneity concerns. In one project, I regress employee departure on project termination. Arguably project termination is not exogenous. DEPARTURE = f(a + b*PROJECT TERMINATION + e) (Time until departure???) In the other, I regress firms’ charitable giving in specific countries on their business activities in the local community. Again, business presence in countries are not exogenous. The problem is, both papers used non-linear models (Hazard model in one, and hurdle model in the other), which are required by the data I have. Are you aware of any econometric methods to deal with endogeneity in non-linear models? My search online did not go anywhere. Hazard Model: Not a linear model. Prob[event happens in time interval t to t+Δ| event happens after time t] = a function of (x’β) http://people.stern.nyu.edu/wgreene/Econometrics/ NonlinearPanelDataModels.pdf
  • 4. Part 2A: Endogeneity [ 4/53] I have been asked this question (or ones like it) dozens of times. I think the issue is getting way overplayed. But, I'm not the majority voice, so you are going to have to deal with this. Step 1: you or the referee need to figure out (make a case for) by what construction is "project termination" endogenous. What is correlated with what **in your hazard model** that makes the variable endogenous? There must be a second equation that implies that project termination is endogenous. What is it? What unobservable in that equation is correlated with what unobservable in the hazard model that makes it endogenous. Same questions for your hurdle model. Step 2: Depends on the outcome of Step 1…
  • 6. Part 2A: Endogeneity [ 6/53] By what construction is SNAP endogenous in the HEALTH equation? SNAP = XβSNAP + Zδ + ε HEALTH = XβHEALTH + ηSNAP + v
  • 8. Part 2A: Endogeneity [ 8/53]                j j Poisson Regression exp( ) Prob(y = j|x,S) = , j! = exp( x+ S) [How is S endogenous?] Negative Binomial Regression exp( ) Prob(y = j|x,S,v) = , j! = exp( x+ S+ ) = E[y| v            0 j x,S, ] Prob(y = j|x,S) = Prob(y = j|x,S,v)f( )d Negative Binomial Regression with Common Factor exp( ) Prob(y = j|x,S,v,u) = , j! = exp( x+ S+ + u) = E[y|x,S, u] v v v v v,
  • 11. Part 2A: Endogeneity [ 11/53]       S N N i S i i 1 i 1 i S i i i i S Control Function Approach S* = w, S = 1[S* > 0], w ~ N[0,1] lnL = ln (2S 1) f GENERALIZED RESIDUAL (2S 1) f u (2S 1) Control Function Constant term (2S 1) (For a l                       x x x x     2 i i i i i S i i i inear regression, the generalized residual is e / s .) Poisson or NB1 Model with "Residual Inclusion" ˆ ˆ ˆ E[C | x ,S ,u ] exp[ S u ]      x v 
  • 16. Part 2A: Endogeneity [ 16/53] Endogeneity  y = X+ε,  Definition: E[ε|x]≠0  Why not?  Omitted variables  Unobserved heterogeneity (equivalent to omitted variables)  Measurement error on the RHS (equivalent to omitted variables)  Endogenous sampling and attrition  Simultaneity (?) (“reverse causality”)
  • 17. Part 2A: Endogeneity [ 17/53] Cornwell and Rupert Data Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years Variables in the file are EXP = work experience WKS = weeks worked OCC = occupation, 1 if blue collar, IND = 1 if manufacturing industry SOUTH = 1 if resides in south SMSA = 1 if resides in a city (SMSA) MS = 1 if married FEM = 1 if female UNION = 1 if wage set by union contract ED = years of education LWAGE = log of wage = dependent variable in regressions These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.
  • 18. Part 2A: Endogeneity [ 18/53] Specification: Quadratic Effect of Experience
  • 19. Part 2A: Endogeneity [ 19/53] The Effect of Education on LWAGE       1 2 3 4 ... ε What is ε? ,...+ everything M e ot ls ivat e = f( , , , ,...) ion Motivation LWAGE EDUC EXP EDUC GENDER SMSA SOUTH 2 EXP
  • 20. Part 2A: Endogeneity [ 20/53] What Influences LWAGE?       1 2 3 4 Motivation Motiva ( , ,...) ... ε( ) Variation in is associated with variation in tion Motivation Motivation Motivatio ( , ,...) and ε( LWAGE EDUC X EXP EDUC X 2 EXP 2 n Motivatio ) What lookslike an effect due to variationin may be due to variationin . The estimate of picks up the effect of and the hidden effect of n Motivation. EDUC EDUC
  • 21. Part 2A: Endogeneity [ 21/53] The General Problem 1 2 1 1 2 2 2 1 2 ( Cov( , ) , K variables Cov( , ) , K variables is cannot estimate ( , ) consistently. Some other estimator is needed. Additional structur , ) e: H      endogenous OLS regression of y o y X X X 0 X 0 n X X X        2 2 1 2 ow does X become endogenous? = + where Cov( , ) but Cov( , )= . An estimator based on ( , , ) may be able to estimate ( , ) consistently.  instrumental varia X Z V V 0 Z 0 X bl X Z e (IV)     
  • 22. Part 2A: Endogeneity [ 22/53] Instrumental Variables  Framework: y = X + , K variables in X.  There exists a set of K variables, Z such that plim(Z’X/n)  0 but plim(Z’/n) = 0 The variables in Z are called instrumental variables.  An alternative (to least squares) estimator of  is bIV = (Z’X)-1 Z’y ~ Cov(Z,y) / Cov(Z,X)  We consider the following:  Why use this estimator?  What are its properties compared to least squares?  We will also examine an important application
  • 23. Part 2A: Endogeneity [ 23/53] An Exogenous Influence       1 2 3 4 Motivation Moti ( , , ,...) ... ε( ) Variation in is associated with variation in ( , vation Motivation , ,...) andnot Motiva n ( o ε ti LWA Z GE EDUC X EXP EDU Z C Z X 2 EXP 2 ) An effect due to the effect of variationin on will only be due to variationin . The estimate of picks up the effect of only. Z Z is anInstrument EDUC EDU al Vari C EDUC able
  • 24. Part 2A: Endogeneity [ 24/53] Instrumental Variables
    - My theory claims that MS and FEM are instruments.
    - Structural equations:
      LWAGE (ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION)
      ED (…, MS, FEM)
    - The ED equation explains the endogeneity.
    Reduced Form: LWAGE[ ED (…, MS, FEM), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
  • 25. Part 2A: Endogeneity [ 25/53] SNAP Model. X is in both equations. Z is in the SNAP equation. SNAP is in the Health equation.
  • 26. Part 2A: Endogeneity [ 26/53] Instrumental Variables in Regression
    - Typical Case: One "problem" variable, the "last" one
    - $y_{it} = \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + \varepsilon_{it}$
    - $E[\varepsilon_{it} | x_{1it}, \dots, x_{Kit}] \neq 0$. (0 for all others)
    - There exists a variable $z_{it}$ such that
      Relevance: $E[x_{Kit} | x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = g(x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it})$. In the presence of the other variables, $z_{it}$ "explains" $x_{Kit}$.
      A projection interpretation: in the projection $x_{Kit} = \theta_1 x_{1it} + \theta_2 x_{2it} + \dots + \theta_{K-1} x_{K-1,it} + \theta_K z_{it}$, $\theta_K \neq 0$.
      Exogeneity: $E[\varepsilon_{it} | x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = 0$. In the presence of the other variables, $z_{it}$ and $\varepsilon_{it}$ are uncorrelated.
  • 27. Part 2A: Endogeneity [ 27/53] Two Stage Least Squares Strategy
    - Reduced Form: LWAGE[ ED (MS, FEM, X), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
    - Strategy
      (1) Purge ED of the influence of everything but MS, FEM and the other X variables. Predict ED using all exogenous information in the sample (X, MS, FEM).
      (2) Regress LWAGE on this prediction of ED and everything else.
    - Standard errors must be adjusted for the predicted ED.
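The two steps of the strategy can be sketched in Python with NumPy as follows. This uses simulated data rather than the wage data from the lecture; the names `ed`, `exper`, `z1`, `z2` and every coefficient are hypothetical stand-ins. The sketch computes point estimates only and does not produce the adjusted standard errors.

```python
import numpy as np

def ols(A, b):
    """Least squares coefficients for the regression of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

rng = np.random.default_rng(1)
n = 20_000
exper = rng.normal(size=n)                         # exogenous regressor
z1, z2 = rng.normal(size=n), rng.normal(size=n)    # excluded instruments
m = rng.normal(size=n)                             # unobserved common factor
ed = 0.5 * exper + 0.7 * z1 + 0.7 * z2 + m + rng.normal(size=n)
y = 1.0 + 0.1 * ed + 0.3 * exper + m + rng.normal(size=n)   # true ED effect 0.1

const = np.ones(n)
W = np.column_stack([const, exper, z1, z2])   # all exogenous information

# Step 1: predict the endogenous variable from all exogenous variables
ed_hat = W @ ols(W, ed)

# Step 2: regress y on the prediction and the other regressors
X = np.column_stack([const, ed, exper])
X_hat = np.column_stack([const, ed_hat, exper])
b_2sls = ols(X_hat, y)

# The same coefficients from the one-line formula (X_hat'X)^{-1} X_hat'y
b_direct = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
b_ols = ols(X, y)

print("OLS ED coefficient: ", b_ols[1])
print("2SLS ED coefficient:", b_2sls[1])
```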
  • 28. Part 2A: Endogeneity [ 28/53] OLS Regression (Inconsistent)
  • 29. Part 2A: Endogeneity [ 29/53] 2SLS Regression (Maybe not a very good theory) The 2SLS coefficient estimate is implausible. Now what? The weird results for the coefficient on ED may be due to the instruments, MS and FEM, being dummy variables. There is not much variation in these variables and not much covariation with the other variables.
  • 30. Part 2A: Endogeneity [ 30/53] An Interpretation: The Source of the Endogeneity
    - LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + ε
    - ED = f(MS, FEM, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u
  • 31. Part 2A: Endogeneity [ 31/53] Can We Remove the Endogeneity?
    - LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u + ε
    - Strategy
      - Estimate u.
      - Add u to the equation. ED is correlated with u+ε because it is correlated with u.
      - ED is uncorrelated with the remaining disturbance, ε, once u is in the equation.
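A small NumPy sketch of this strategy on simulated data (all names and coefficient values are assumptions made for illustration). It checks the algebraic point that adding the first-stage residual to the equation reproduces the 2SLS coefficients on the original regressors. The standard errors reported by the augmented OLS regression are not the correct ones and are not computed here.

```python
import numpy as np

def ols(A, b):
    """Least squares coefficients for the regression of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

rng = np.random.default_rng(2)
n = 5_000
exper = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # hypothetical instruments
m = rng.normal(size=n)
ed = 0.5 * exper + 0.7 * z1 + 0.7 * z2 + m + rng.normal(size=n)
y = 1.0 + 0.1 * ed + 0.3 * exper + m + rng.normal(size=n)

const = np.ones(n)
W = np.column_stack([const, exper, z1, z2])

# Auxiliary regression for ED: fitted values and residuals
ed_hat = W @ ols(W, ed)
u_hat = ed - ed_hat

# 2SLS: replace ED by its prediction
b_2sls = ols(np.column_stack([const, ed_hat, exper]), y)

# Control function: keep ED and add the residual as an extra regressor
b_cf = ols(np.column_stack([const, ed, exper, u_hat]), y)

print("2SLS coefficients:            ", b_2sls)
print("Control function coefficients:", b_cf[:3])
```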
  • 32. Part 2A: Endogeneity [ 32/53] Auxiliary Regression for ED to Obtain Residuals IVs Exog. Vars
  • 33. Part 2A: Endogeneity [ 33/53] OLS with Residual Added (Control Function) 2SLS
  • 34. Part 2A: Endogeneity [ 34/53] A Warning About Control Functions Sum of squares is not computed correctly because U is in the regression. A general result. Control function estimators usually require a fix to the estimated covariance matrix for the estimator.
  • 35. Part 2A: Endogeneity [ 35/53] Estimating σ2
    Estimating the asymptotic covariance matrix: a caution about estimating $\sigma^2$.
    Since the regression is computed by regressing $y$ on $\hat x$, one might use
    $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat x_i' b_{2sls})^2$ (uses $\hat x$).
    This is inconsistent. Use
    $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i' b_{2sls})^2$ (uses $x$).
    (Degrees of freedom correction is optional; usually done.)
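To see how much the choice of residuals can matter, here is a hedged NumPy sketch on simulated data. The design is an assumption chosen so that the true disturbance variance is 2; the two estimators differ only in whether the residual uses the predicted or the actual regressors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z = rng.normal(size=n)
m = rng.normal(size=n)
x = z + m + rng.normal(size=n)
y = 1.0 + 1.0 * x + m + rng.normal(size=n)   # disturbance m + noise, variance 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # predicted X
b = np.linalg.lstsq(X_hat, y, rcond=None)[0]       # 2SLS coefficients

K = X.shape[1]
e_wrong = y - X_hat @ b     # second-stage residuals (use predicted X)
e_right = y - X @ b         # 2SLS residuals (use actual X)
s2_wrong = e_wrong @ e_wrong / (n - K)
s2_right = e_right @ e_right / (n - K)

print("sigma^2 from predicted X (inconsistent):", s2_wrong)
print("sigma^2 from actual X:                  ", s2_right)
```

In this particular design the second-stage residuals overstate the variance; in other designs the error can go the other way.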
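  • 36. Part 2A: Endogeneity [ 36/53] Robust estimation of VC
    Counterpart to the White estimator allows heteroscedasticity:
    $\text{Est.Asy.Var}[\hat\beta] = (\hat X'\hat X)^{-1}\left[\sum_{i,t}(y_{it} - x_{it}'\hat\beta)^2\,\hat x_{it}\hat x_{it}'\right](\hat X'\hat X)^{-1}$
    The residual $y_{it} - x_{it}'\hat\beta$ uses the "Actual" X; the other terms use the "Predicted" X.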
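A NumPy sketch of this sandwich formula on simulated heteroscedastic data. The design, in which the disturbance variance grows with the square of the instrument, is an assumption made so that the robust and conventional standard errors visibly differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
z = rng.normal(size=n)
m = rng.normal(size=n)
x = z + m + rng.normal(size=n)
eps = m + np.abs(z) * rng.normal(size=n)    # Var(eps | z) = 1 + z^2
y = 1.0 + 0.5 * x + eps

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b = np.linalg.lstsq(X_hat, y, rcond=None)[0]     # 2SLS coefficients

e = y - X @ b                                    # residuals use the actual X
A_inv = np.linalg.inv(X_hat.T @ X_hat)           # bread uses the predicted X
meat = (X_hat * (e ** 2)[:, None]).T @ X_hat     # sum of e^2 * xhat xhat'
V_robust = A_inv @ meat @ A_inv
V_conv = (e @ e / n) * A_inv                     # conventional 2SLS covariance

se_robust = np.sqrt(np.diag(V_robust))
se_conv = np.sqrt(np.diag(V_conv))
print("robust s.e.:      ", se_robust)
print("conventional s.e.:", se_conv)
```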
  • 37. Part 2A: Endogeneity [ 37/53] 2SLS vs. Robust Standard Errors
    +--------------------------------------------------+
    | Robust Standard Errors                           |
    +---------+--------------+----------------+--------+
    |Variable | Coefficient  | Standard Error |b/St.Er.|
    +---------+--------------+----------------+--------+
     B_1        45.4842872      4.02597121      11.298
     B_2         .05354484       .01264923       4.233
     B_3        -.00169664       .00029006      -5.849
     B_4         .01294854       .05757179        .225
     B_5         .38537223       .07065602       5.454
     B_6         .36777247       .06472185       5.682
     B_7         .95530115       .08681261      11.000
    +--------------------------------------------------+
    | 2SLS Standard Errors                             |
    +---------+--------------+----------------+--------+
    |Variable | Coefficient  | Standard Error |b/St.Er.|
    +---------+--------------+----------------+--------+
     B_1        45.4842872       .36908158     123.236
     B_2         .05354484       .03139904       1.705
     B_3        -.00169664       .00069138      -2.454
     B_4         .01294854       .16266435        .080
     B_5         .38537223       .17645815       2.184
     B_6         .36777247       .17284574       2.128
     B_7         .95530115       .20846241       4.583
  • 38. Part 2A: Endogeneity [ 38/53] Inference with IV Estimators
    (1) Wald Statistics: $(R\hat\beta - q)'\{R\,\text{Est.Asy.Var}[\hat\beta]\,R'\}^{-1}(R\hat\beta - q)$ (E.g., the usual 't-statistics')
    (2) A type of F statistic:
      Compute $SSUA = (y - X\hat\beta_U)'(y - X\hat\beta_U)$ without restrictions (Note, $X$)
      Compute $SSR = (y - \hat X\hat\beta_R)'(y - \hat X\hat\beta_R)$ with restrictions
      Compute $SSU = (y - \hat X\hat\beta_U)'(y - \hat X\hat\beta_U)$ without restrictions (Note, $\hat X$)
      $F = \dfrac{(SSR - SSU)/J}{SSUA/(N-K)} \sim F[J, N-K]$
  • 39. Part 2A: Endogeneity [ 39/53] Endogeneity Test? (Hausman)
               Exogenous                  Endogenous
    OLS        Consistent, Efficient      Inconsistent
    2SLS       Consistent, Inefficient    Consistent
    Base a test on $d = b_{2SLS} - b_{OLS}$. Use a Wald statistic, $d'[\mathrm{Var}(d)]^{-1}d$. What to use for the variance matrix? Hausman: $V_{2SLS} - V_{OLS}$
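A hedged NumPy sketch of the Hausman idea for a single coefficient, on simulated data. Two choices here are assumptions of the sketch rather than anything stated on the slide: the statistic is computed only for the coefficient on the one endogenous variable (which avoids inverting a rank-deficient matrix difference), and both covariance matrices use the same OLS-based estimate of $\sigma^2$, which keeps the variance difference positive.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
z = rng.normal(size=n)
m = rng.normal(size=n)
x = z + m + rng.normal(size=n)               # endogenous by construction
y = 1.0 + 0.5 * x + m + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Common sigma^2 estimate (from OLS, appropriate under the null of exogeneity)
e = y - X @ b_ols
s2 = e @ e / (n - X.shape[1])
V_ols = s2 * np.linalg.inv(X.T @ X)
V_2sls = s2 * np.linalg.inv(X_hat.T @ X_hat)

d = b_2sls[1] - b_ols[1]
var_d = V_2sls[1, 1] - V_ols[1, 1]    # Hausman: V_2SLS - V_OLS
H = d ** 2 / var_d                    # compare with chi-squared[1], 5% value 3.84

print("d =", d, " H =", H)
```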
  • 40. Part 2A: Endogeneity [ 40/53] Hausman Test
  • 41. Part 2A: Endogeneity [ 41/53] Hausman Test: One at a Time?
  • 42. Part 2A: Endogeneity [ 42/53] Endogeneity Test: Wu
    - Considerable complication in Hausman test (text, pp. 276-277)
    - Simplification: Wu test.
    - Regress y on X and $\hat X$, the estimated (fitted) values for the endogenous part of X. Then use an ordinary Wald test.
  • 43. Part 2A: Endogeneity [ 43/53] Monday, 2/6/17
  • 44. Part 2A: Endogeneity [ 44/53] Regression Based Endogeneity Test
    An easy t test. (Wooldridge 2010, p. 127)
    $y_{it} = x_{it}'\delta + \gamma q_{it} + \varepsilon_{it}$; $Z$ = a set of M instruments.
    Write $q = Z\pi + v$. Can be estimated by ordinary least squares.
    Endogeneity concerns correlation between $v$ and $\varepsilon$.
    Add $\hat v_{it} = q_{it} - z_{it}'\hat\pi$ to the equation and use OLS:
    $y_{it} = x_{it}'\delta + \gamma q_{it} + \theta\hat v_{it} + \{\text{error}\}$
    Simple t test on whether $\theta$ equals 0.
    Even easier, algebraically identical (Wu, 1973): add $\hat q$ to the equation and do the same test.
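The two versions of the test can be compared in a short NumPy sketch on simulated data (the design and the helper name `ols_fit` are assumptions for illustration). Because $\hat v = q - \hat q$, the coefficient on $\hat q$ is the negative of the coefficient on $\hat v$, and the two t ratios differ only in sign. The t ratios use conventional OLS standard errors, without any adjustment for the generated regressor.

```python
import numpy as np

def ols_fit(A, b):
    """OLS coefficients and conventional standard errors."""
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    resid = b - A @ coef
    s2 = resid @ resid / (A.shape[0] - A.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
    return coef, se

rng = np.random.default_rng(6)
n = 5_000
x = rng.normal(size=n)                      # exogenous regressor
z = rng.normal(size=n)                      # excluded instrument
m = rng.normal(size=n)                      # source of endogeneity
q = 0.5 * x + z + m + rng.normal(size=n)    # suspect variable
y = 1.0 + 0.3 * x + 0.5 * q + m + rng.normal(size=n)

const = np.ones(n)
Zmat = np.column_stack([const, x, z])
q_hat = Zmat @ np.linalg.lstsq(Zmat, q, rcond=None)[0]
v_hat = q - q_hat

# Version 1: add the first-stage residual
coef_v, se_v = ols_fit(np.column_stack([const, x, q, v_hat]), y)
# Version 2 (Wu): add the first-stage prediction
coef_q, se_q = ols_fit(np.column_stack([const, x, q, q_hat]), y)

t_v = coef_v[3] / se_v[3]
t_qhat = coef_q[3] / se_q[3]
print("t on v_hat:", t_v, " t on q_hat:", t_qhat)
```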
  • 45. Part 2A: Endogeneity [ 45/53] Wu Test Since this is 2SLS using a control function, the standard errors should have been adjusted to carry out this test. (The sum of squares is too small.)
  • 46. Part 2A: Endogeneity [ 46/53] Testing Endogeneity of WKS
    (1) Regress WKS on 1,EXP,EXPSQ,OCC,SOUTH,SMSA,MS. U=residual, WKSHAT=prediction
    (2) Regress LWAGE on 1,EXP,EXPSQ,OCC,SOUTH,SMSA,WKS, U or WKSHAT
    +---------+--------------+----------------+--------+---------+----------+
    |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
    +---------+--------------+----------------+--------+---------+----------+
     Constant   -9.97734299      .75652186      -13.188    .0000
     EXP          .01833440      .00259373        7.069    .0000   19.8537815
     EXPSQ       -.799491D-04    .603484D-04     -1.325    .1852   514.405042
     OCC         -.28885529      .01222533      -23.628    .0000    .51116447
     SOUTH       -.26279891      .01439561      -18.255    .0000    .29027611
     SMSA         .03616514      .01369743        2.640    .0083    .65378151
     WKS          .35314170      .01638709       21.550    .0000   46.8115246
     U           -.34960141      .01642842      -21.280    .0000   -.341879D-14
    +---------+--------------+----------------+--------+---------+----------+
    |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
    +---------+--------------+----------------+--------+---------+----------+
     Constant   -9.97734299      .75652186      -13.188    .0000
     EXP          .01833440      .00259373        7.069    .0000   19.8537815
     EXPSQ       -.799491D-04    .603484D-04     -1.325    .1852   514.405042
     OCC         -.28885529      .01222533      -23.628    .0000    .51116447
     SOUTH       -.26279891      .01439561      -18.255    .0000    .29027611
     SMSA         .03616514      .01369743        2.640    .0083    .65378151
     WKS          .00354028      .00116459        3.040    .0024   46.8115246
     WKSHAT       .34960141      .01642842       21.280    .0000   46.8115246
  • 47. Part 2A: Endogeneity [ 47/53] Weak Instruments
    - One endogenous variable: $y = X\beta + x_k\delta + \varepsilon$; Instruments (X,Z); Z is exogenous
    - Symptom: The relevance condition, plim Z'X/n not zero, is close to being violated. Relevance: Z must "explain" $x_k$ after controlling for X.
    - Detection:
      - Standard F test in the regression of $x_k$ on (X,Z). Wald test of coefficients on Z equal 0; F < 10 suggests a problem. (Staiger and Stock)
      - Other versions by Stock and Yogo (2005), Cragg and Donald (1993), Kleibergen and Paap (2006) for when $x_k$ is more than one variable and for when $x_k = (X,Z)\pi + u$ and $u$ is heteroscedastic, clustered, nonnormal, etc.
    - Remedy:
      - Not much: most of the discussion is about the condition, not what to do about it.
      - Use LIML instead of 2SLS? Requires a normality assumption. Probably
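The standard first-stage F test described under "Detection" can be sketched with NumPy as below. The function name `first_stage_F`, the simulated designs and the coefficient values are assumptions for illustration; the statistic is the usual homoscedastic F for the joint hypothesis that the coefficients on Z are zero, not one of the robust variants cited above.

```python
import numpy as np

def first_stage_F(xk, X, Z):
    """F statistic for H0: coefficients on Z are zero in the regression of xk on (X, Z)."""
    n = len(xk)
    W = np.column_stack([X, Z])
    ssr_u = np.sum((xk - W @ np.linalg.lstsq(W, xk, rcond=None)[0]) ** 2)
    ssr_r = np.sum((xk - X @ np.linalg.lstsq(X, xk, rcond=None)[0]) ** 2)
    J = Z.shape[1]
    return ((ssr_r - ssr_u) / J) / (ssr_u / (n - W.shape[1]))

rng = np.random.default_rng(7)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # included exogenous variables
Z = rng.normal(size=(n, 2))                             # excluded instruments
v = rng.normal(size=n)

# Same instruments, two first stages: one strong, one nearly irrelevant
xk_strong = X @ np.array([0.5, 0.5]) + Z @ np.array([0.5, 0.5]) + v
xk_weak = X @ np.array([0.5, 0.5]) + Z @ np.array([0.02, 0.02]) + v

F_strong = first_stage_F(xk_strong, X, Z)
F_weak = first_stage_F(xk_weak, X, Z)
print("F (strong):", F_strong, " F (weak):", F_weak)
```

Against the rule of thumb on the slide, the strong design should clear 10 comfortably, while the nearly irrelevant design would typically fall well below it.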
  • 48. Part 2A: Endogeneity [ 48/53] Weak Instruments (cont.)
    $\mathrm{plim}\,\hat\beta = \beta + [\mathrm{Cov}(Z, X)]^{-1}\,\mathrm{Cov}(Z, \varepsilon)$
    If $\mathrm{Cov}(Z, \varepsilon)$ is "small" but nonzero, small $\mathrm{Cov}(Z, X)$ may hugely magnify the effect.
    IV is not only inefficient, it may be very badly biased by "weak" instruments.
    Solutions? Can one "test" for weak instruments?
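A quick NumPy simulation of the magnification effect, under assumed values: the instrument is slightly invalid, with Cov(z, ε) = 0.05, and the first-stage coefficient `pi` is either 1.0 or 0.1. The large-sample bias of the IV slope is then roughly 0.05/pi, so the same small contamination does ten times the damage with the weaker instrument.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400_000
beta = 0.5          # true slope (assumed)

def iv_slope(pi):
    """IV slope estimate when Cov(z, x) = pi and Cov(z, eps) = 0.05."""
    z = rng.normal(size=n)
    m = rng.normal(size=n)
    x = pi * z + m + rng.normal(size=n)
    eps = 0.05 * z + m + rng.normal(size=n)   # slightly invalid instrument
    y = beta * x + eps
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

bias_strong = iv_slope(1.0) - beta    # approximately 0.05 / 1.0
bias_weak = iv_slope(0.1) - beta      # approximately 0.05 / 0.1

print("bias with strong instrument:", bias_strong)
print("bias with weak instrument:  ", bias_weak)
```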
  • 49. Part 2A: Endogeneity [ 49/53] Weak Instruments
    Which is better?
    - LS is inconsistent, but probably has smaller variance. LS may be more precise.
    - IV is consistent, but probably has larger variance.
    $\text{Asy.Var}[\hat\beta] = Q_{ZX}^{-1}\,\Omega_{ZZ}\,Q_{XZ}^{-1}$ may be large. (Compared to what?)
    Strange results with "small" $Q_{ZX}$: the IV estimator tends to resemble OLS (bias) (not a function of sample size).
    Contradictory result: Suppose $z$ is perfectly correlated with $x$. IV MUST be the same as OLS.
  • 50. Part 2A: Endogeneity [ 50/53] Endogenous Union Effect
    Name ; Xunion = one,occ,smsa,ed,exp,union $
    Name ; Zinst = fem,ind,south $
    Name ; Zunion = one,occ,smsa,ed,exp,Zinst $
    ? Inconsistent OLS
    Regr ; Lhs = lwage ; Rhs = Xunion ; Cluster = 7 $
    ? Two Stage Least Squares gives a nonsense result
    2sls ; Lhs=lwage;rhs=Xunion;Inst=Zunion ; Cluster=7 $
    ? Test for weak instruments
    Regr ; Lhs=union ; Rhs=Zunion ; res = u ; Cluster = 7 ; test:zinst $
    ? Control function estimator
    ? 2SLS coefficients with the wrong standard errors
    Regr ; Lhs=lwage;rhs=xunion,u;cluster=7$
  • 51. Part 2A: Endogeneity [ 51/53] OLS Should Be Inconsistent
  • 52. Part 2A: Endogeneity [ 52/53] Nonsense 2SLS Result
  • 53. Part 2A: Endogeneity [ 53/53] Weak Instruments? No What is going on here? When the endogenous variable and/or the excluded instruments are binary, the actual results are sometimes a bit unstable. The theoretical results are generally about covariation of continuous variables.
  • 54. Part 2A: Endogeneity [ 54/53] Appendix Miscellaneous
  • 55. Part 2A: Endogeneity [ 55/53] The First IV Study Was a Natural Experiment (Snow, J., On the Mode of Communication of Cholera, 1855) http://www.ph.ucla.edu/epi/snow/snowbook3.html
    - London Cholera epidemic, ca 1853-4
    - Cholera = f(Water Purity, u) + ε.
    - 'Causal' effect of water purity on cholera?
    - Purity = f(cholera prone environment (poor, garbage in streets, rodents, etc.)). Regression does not work.
    [Figure: map of the River Thames marking the two London water companies (Lambeth; Southwark & Vauxhall) and the main sewage discharge]
    Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects… http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf A review of instrumental variables estimation in the applied health sciences. Health Services and Outcomes Research Methodology 2007; 7(3-4):159-179.
  • 56. Part 2A: Endogeneity [ 56/53] Investigation Using an Instrumental Variable
    Theory: Cholera = $\beta_0 + \beta_1$ BadWater + Other Factors
    Model (stylized): $C = \beta_0 + \beta_1 B + \varepsilon$ (C=0/1=no/yes) (B=0/1=good/bad) ($\varepsilon$=other factors)
    Interesting measure of causal effect of bad water: $\beta_1$
    Endogeneity Problem: Cholera prone environment u affects B and $\varepsilon$. Interpret this to say B(u) and $\varepsilon$(u) are correlated because of u.
    Confounding Effect: $E[C|B] \neq \beta_0 + \beta_1 B$ because $E[\varepsilon|B] \neq 0$
      $E[C|B=1] = \beta_0 + \beta_1 + E[\varepsilon|B=1]$
      $E[C|B=0] = \beta_0 + E[\varepsilon|B=0]$
      $E[C|B=1] - E[C|B=0] = \beta_1 + \{E[\varepsilon|B=1] - E[\varepsilon|B=0]\}$
    Conclusion: Comparing cholera rates of those with bad water (measurable) to those with good water, P(C|B=1) - P(C|B=0), does not reveal the water effect.
  • 57. Part 2A: Endogeneity [ 57/53] Instrumental Variable:
    L = 1 if water supplied by Lambeth
    L = 0 if water supplied by Southwark/Vauxhall
    Relevant? Is $E[B|L=1] \neq E[B|L=0]$? That is Snow's theory, that the water supply is partly the culprit, and because of their location, Lambeth provided purer water than Southwark.
    Exogenous? Is $E[\varepsilon|L=1] - E[\varepsilon|L=0] = 0$? Water supply is randomly supplied to houses. Homeowners do not even know which supplier is providing their water. "Assignment is random."
    Using the IV in $E[C|L] = \beta_0 + \beta_1 E[B|L] + E[\varepsilon|L]$:
      $E[C|L=1] = \beta_0 + \beta_1 E[B|L=1] + E[\varepsilon|L=1]$
      $E[C|L=0] = \beta_0 + \beta_1 E[B|L=0] + E[\varepsilon|L=0]$
    Estimating Equation:
      $E[C|L=1] - E[C|L=0] = \beta_1\{E[B|L=1] - E[B|L=0]\} + \{E[\varepsilon|L=1] - E[\varepsilon|L=0]\}$ (the last term is zero because L is exogenous)
  • 58. Part 2A: Endogeneity [ 58/53]
    $E[C|L=1] - E[C|L=0] = \beta_1\{E[B|L=1] - E[B|L=0]\}$
    IV Estimator: $\beta_1 = \dfrac{E[C|L=1] - E[C|L=0]}{E[B|L=1] - E[B|L=0]}$ (Note: nonzero denominator is the relevance condition.)
    Operational:
      P(C|L=1) = Proportion of observations supplied by Lambeth that have Cholera
      P(C|L=0) = Proportion of observations supplied by Southwark that have Cholera
      P(B|L=1) = Proportion of observations supplied by Lambeth with Bad Water
      P(B|L=0) = Proportion of observations supplied by Southwark with Bad Water
    Estimate: $b_1 = \dfrac{P(C|L=1) - P(C|L=0)}{P(B|L=1) - P(B|L=0)} = \dfrac{\mathrm{Cov}(C,L)}{\mathrm{Cov}(B,L)}$ (broadly) (The Wald estimator)
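The Wald estimator can be illustrated with a stylized NumPy simulation. None of the numbers below come from Snow's data: the probabilities, the true effect of 0.10 and the binary "cholera prone environment" indicator `u` are all assumptions chosen to reproduce the confounding story, with `L` assigned at random.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
u = (rng.random(n) < 0.5).astype(float)    # cholera prone environment (unobserved)
L = (rng.random(n) < 0.5).astype(float)    # 1 = Lambeth, assigned at random

# Bad water is less likely under Lambeth and more likely in prone environments
B = (rng.random(n) < 0.5 - 0.4 * L + 0.4 * u).astype(float)
# Cholera: true effect of bad water is 0.10; u raises risk directly
C = (rng.random(n) < 0.05 + 0.10 * B + 0.30 * u).astype(float)

# Naive comparison P(C|B=1) - P(C|B=0): confounded by u
naive = C[B == 1].mean() - C[B == 0].mean()

# Wald estimator: difference in cholera rates over difference in bad-water rates
wald = (C[L == 1].mean() - C[L == 0].mean()) / (B[L == 1].mean() - B[L == 0].mean())

# Equivalent covariance form, Cov(C, L) / Cov(B, L)
cov_ratio = np.cov(C, L)[0, 1] / np.cov(B, L)[0, 1]

print("naive difference:", naive)
print("Wald estimate:   ", wald, " covariance ratio:", cov_ratio)
```

In this design the naive difference overstates the water effect because prone environments raise both bad water and cholera, while the Wald estimate stays close to the assumed 0.10.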
  • 59. Part 2A: Endogeneity [ 59/53] On Sat, May 3, 2014 at 4:48 PM, … wrote: Dear Professor Greene, I am giving an Econometrics course in Brazil and we are using your textbook. I got a question which I think only you can help me. In our last class, I did a formal proof that var(beta_hat_OLS) is lower or equal than var(beta_hat_2SLS), under homoscedasticity. We know this assertive is also valid under heteroscedasticity, but a graduate student asked me the proof (which is my problem). Do you know where can I find it?