Nonparametric testing for exogeneity with discrete
regressors and instruments
Katarzyna Bech and Grant Hillier
Warsaw School of Economics
and
University of Southampton
July 8, 2016
1/28
Outline
1 Motivation.
2 Simplest nonparametric additive-error model: setup, identification, estimation.
3 Two test statistics and critical value computation.
4 Generalization to several variables of each type: tested, exogenous, instrument.
5 Applications: Card (1995) and Angrist and Krueger (1991).
2/28
Motivation
Endogeneity is one of the most common problems in econometric models.
In nonparametric models with discrete regressors and instruments, the presence of endogenous regressors produces bias (in the identified case) or non-existence of any consistent estimator (in the partially identified case).
IV for nonparametric models with discrete regressors: Das (2005) and Florens and Malavolti (2003).
Nonparametric testing for exogeneity with continuous regressors: Blundell and Horowitz (2007) and Lavergne and Patilea (2008), among others.
3/28
Simple model
Nonparametric additive error model
Y = h(X) + ε,  E[ε | Z = z_j] = 0 for all j,
where we have i.i.d. data (x_i, y_i, z_i) on (X, Y, Z), and
Y is a continuous scalar dependent variable,
X is a single discrete regressor with support {x_k, k = 1, ..., K} that may be endogenous, with associated probabilities p_k > 0,
Z is a discrete instrumental variable with support {z_j, j = 1, ..., J}, with associated probabilities q_j > 0.
4/28
Hypothesis of interest
Null hypothesis (exogeneity):
H_0 : E[ε | X = x_k] = 0, k = 1, ..., K.
Under the null, h(·) can be consistently estimated using standard nonparametric techniques.
Under the alternative, the IV solution to endogeneity is possible only under point identification.
5/28
Identification
Since
Y = Σ_{k=1}^{K} h(x_k) I(X = x_k) + ε,
the conditional expectation of Y given Z = z_j is
E[Y | Z = z_j] = Σ_{k=1}^{K} Pr[X = x_k | Z = z_j] h(x_k).
⇒ the instrument Z supplies the equations
π = Π β,
where β_k = h(x_k), π_j = E[Y | Z = z_j], Π_jk = Pr[X = x_k | Z = z_j].
h(·) is identified at ALL support points of X iff J ≥ K.
6/28
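The mapping from observables to h can be checked in a small simulation. A minimal sketch (the data-generating choices below are hypothetical, not from the paper): estimate π̂_j as the mean of y in the Z = z_j cell and Π̂_jk as the sample share of X = x_k within that cell, then solve π̂ = Π̂ β̂ when Π̂ has rank K.

```python
import numpy as np

def estimate_pi_Pi(y, x, z, x_support, z_support):
    """Empirical pi_j = E[Y | Z = z_j] and Pi_jk = Pr[X = x_k | Z = z_j]."""
    pi = np.array([y[z == zj].mean() for zj in z_support])
    Pi = np.array([[np.mean(x[z == zj] == xk) for xk in x_support]
                   for zj in z_support])
    return pi, Pi

rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, n)                      # binary instrument: J = 2
x = np.minimum(z + (rng.random(n) < 0.3), 1)   # binary regressor: K = 2
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)        # h(0) = 1, h(1) = 3
pi, Pi = estimate_pi_Pi(y, x, z, [0, 1], [0, 1])
# h is identified at all support points iff rank(Pi) = K (requires J >= K);
# then beta solves pi = Pi @ beta, here by least squares
beta_hat, *_ = np.linalg.lstsq(Pi, pi, rcond=None)
```

With J = K = 2 and distinct conditional distributions of X across instrument cells, Π̂ has full rank and β̂ recovers (h(x_1), h(x_2)).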
Identification when J < K
h(·) is partially identified when J < K:
Theorem 1
Let L(β) = c'β be a linear functional of the elements of β. When rank(Π) = J < K, the following are true:
(1) for any c orthogonal to the null space of Π, L(β) is point-identified; the dimension of this set is J;
(2) for c not orthogonal to the null space of Π, L(β) is completely unconstrained; the dimension of this set is K − J.
That is: when J < K, some linear functionals are point-identified, some completely arbitrary (not even set-identified!).
Point-identifiability of L(β) can be tested (for a given choice of c). Partition c' = (c_1', c_2') and Π̂ = [Π̂_1 : Π̂_2] with Π̂_1 (J × J) nonsingular; then, under the hypothesis that L(β) is point-identified,
G_n = n (c_2' − c_1' Π̂_1^{-1} Π̂_2) V̂^{-1} (c_2' − c_1' Π̂_1^{-1} Π̂_2)' →_d χ²_{K−J},
where V̂ is a consistent estimate of the asymptotic covariance matrix of the bracketed term.
7/28
Linear Model Setup
We define the (0,1) matrix L_X (n × K) with elements (L_X)_ik = I(x_i = x_k). Likewise L_Z (n × J). Then H_0 says:
y = L_X β + ε,  E[ε | X = x_k] = 0 for all k.
β can be consistently estimated by OLS under exogeneity:
β̂ = (L_X' L_X)^{-1} L_X' y = ( Σ_{i=1}^n y_i I(x_i = x_1) / Σ_{i=1}^n I(x_i = x_1), ..., Σ_{i=1}^n y_i I(x_i = x_K) / Σ_{i=1}^n I(x_i = x_K) )',
i.e., the vector of within-cell sample means of y.
Theorem 2
If X is exogenous, the nonparametric (OLS) estimator β̂ is consistent and
√n (β̂ − β) →_d N(0, σ² D_X^{-1}),
where D_X = diag(p_k).
8/28
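Under H_0 the OLS estimator is exactly the vector of cell means, as a quick sketch confirms (the data-generating numbers here are hypothetical):

```python
import numpy as np

def nonparametric_ols(y, x, x_support):
    """(L_X' L_X)^{-1} L_X' y with L_X the n x K indicator matrix:
    component k is the sample mean of y over the cell {x = x_k}."""
    L_X = np.column_stack([(x == xk).astype(float) for xk in x_support])
    return np.linalg.solve(L_X.T @ L_X, L_X.T @ y)

rng = np.random.default_rng(1)
x = rng.integers(0, 3, 4000)                              # K = 3 support points
y = np.array([0.5, 1.5, 2.5])[x] + rng.normal(0, 1, 4000)  # exogenous error
beta_hat = nonparametric_ols(y, x, [0, 1, 2])
# agrees with the explicit cell-mean formula on the slide
cell_means = np.array([y[x == k].mean() for k in range(3)])
```

L_X'L_X is the diagonal matrix of cell counts, so the matrix formula and the ratio-of-sums formula coincide.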
Linear Model Setup
...or by IV (with L_Z as instruments) under endogeneity, when J ≥ K:
β̂_IV = (L_X' P_{L_Z} L_X)^{-1} L_X' P_{L_Z} y,
where P_{L_Z} = L_Z (L_Z' L_Z)^{-1} L_Z' is the orthogonal projection onto the columns of L_Z.
Theorem 3
Under the assumptions above, the IV estimator β̂_IV is consistent and
√n (β̂_IV − β) →_d N(0, σ² (P' D_Z^{-1} P)^{-1}),
where P is the matrix of joint probabilities with elements
p_jk = Pr[Z = z_j, X = x_k], j = 1, ..., J, k = 1, ..., K,
and D_Z = diag(q_j).
BUT no consistent estimator exists for K − J linear functionals if X is endogenous and J < K.
9/28
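A sketch of the IV estimator with indicator matrices on simulated endogenous data (the design is hypothetical): ε is correlated with X through a common shock, so OLS cell means are biased, while the IV estimator recovers h. The cross-moment form avoids building the n × n projection matrix.

```python
import numpy as np

def indicator(v, support):
    return np.column_stack([(v == s).astype(float) for s in support])

def nonparametric_iv(y, x, z, x_support, z_support):
    """beta_IV = (L_X' P_{L_Z} L_X)^{-1} L_X' P_{L_Z} y (needs J >= K),
    computed via cross-moments rather than the n x n projection matrix."""
    L_X, L_Z = indicator(x, x_support), indicator(z, z_support)
    Szz = L_Z.T @ L_Z                       # diagonal of instrument cell counts
    Szx = L_Z.T @ L_X
    Szy = L_Z.T @ y
    A = Szx.T @ np.linalg.solve(Szz, Szx)   # L_X' P_{L_Z} L_X
    b = Szx.T @ np.linalg.solve(Szz, Szy)   # L_X' P_{L_Z} y
    return np.linalg.solve(A, b)

rng = np.random.default_rng(2)
n = 20000
z = rng.integers(0, 2, n)
u = rng.normal(0, 1, n)                       # shock driving both X and eps
x = ((z + u) > 0.5).astype(int)               # X endogenous: depends on u
y = 1.0 + 2.0 * x + u + rng.normal(0, 1, n)   # eps = u + noise, E[eps | Z] = 0
beta_iv = nonparametric_iv(y, x, z, [0, 1], [0, 1])  # approx (1, 3)
```

Since u is independent of Z, the instrument validity condition E[ε | Z = z_j] = 0 holds even though E[ε | X = x_k] ≠ 0.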
Test for exogeneity
Test statistics differ depending on whether J ≥ K or J < K.
For J ≥ K, (a modified version of) the Wu-Hausman test:
Theorem 4
Under H_0, and the assumptions above,
T_n →_d χ²_{K−1}.
Theorem 5
Under the sequence of local alternatives, and the assumptions above, the test statistic
T_n →_d Gamma(α, λ, θ),
with shape parameter α = (K − 1)/2, scale parameter θ = 2σ²/σ̄², and noncentrality parameter λ = 2δ², where
δ² = ξ' Σ_11^{-1} ξ / σ².
10/28
Test for exogeneity
For J < K, the test is based on the two SSEs:
Unrestricted: in the model y = L_X β + ε, i.e., y' M_{L_X} y, and
Restricted: minimising the SSE in this model subject to π̂ = Π̂ β.
Test statistic:
R_n = [y' M_{L_X} L_Z (L_Z' P_{L_X} L_Z)^{-1} L_Z' M_{L_X} y] / [n^{-1} y' M_{L_X} y],
where P_{L_X} = L_X (L_X' L_X)^{-1} L_X' and M_{L_X} = I − P_{L_X}.
11/28
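The statistic is computable directly from cell counts and OLS residuals. A sketch transcribing the R_n formula above (the simulated design is hypothetical; it uses an exogenous error, i.e., a draw under H_0):

```python
import numpy as np

def indicator(v, support):
    return np.column_stack([(v == s).astype(float) for s in support])

def exogeneity_Rn(y, x, z, x_support, z_support):
    """R_n = [y' M L_Z (L_Z' P L_Z)^{-1} L_Z' M y] / [y' M y / n],
    with P = P_{L_X} and M = I - P, computed without forming n x n matrices."""
    n = len(y)
    L_X, L_Z = indicator(x, x_support), indicator(z, z_support)
    Sxx = L_X.T @ L_X                             # diagonal of X cell counts
    Py = L_X @ np.linalg.solve(Sxx, L_X.T @ y)    # P_{L_X} y: fitted cell means
    My = y - Py                                   # M_{L_X} y: OLS residuals
    v = L_Z.T @ My                                # L_Z' M_{L_X} y
    Sxz = L_X.T @ L_Z
    W = Sxz.T @ np.linalg.solve(Sxx, Sxz)         # L_Z' P_{L_X} L_Z
    return (v @ np.linalg.solve(W, v)) / (My @ My / n)

rng = np.random.default_rng(3)
n = 3000
z = rng.integers(0, 2, n)                         # J = 2
x = z + rng.integers(0, 2, n)                     # K = 3 > J: partially identified
y = np.array([0.0, 1.0, 2.0])[x] + rng.normal(0, 1, n)  # exogenous design
rn = exogeneity_Rn(y, x, z, [0, 1, 2], [0, 1])
```

Since the residuals M_{L_X} y sum to zero, the J-vector L_Z' M_{L_X} y has only J − 1 free components, matching the J − 1 weighted chi-square terms in Theorem 6 below.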
Test for exogeneity
Theorem 6
Under H_0 and the assumptions above,
R_n →_d z' Ω^{-1} z = Σ_{j=1}^{J−1} ω_j χ²_j(1),
where z ~ N(0, Σ), with Σ as defined above,
Ω := C_J' (P D_X^{-1} P' − p_Z p_Z') C_J,
and the ω_j are the positive eigenvalues satisfying
det[Σ − ω Ω] = 0,
with the χ²_j(1) variables independent copies of a χ²_1 random variable.
12/28
Critical value computation
Using consistent estimates ω̂_j, simulate the distribution of Σ_{j=1}^{J−1} ω̂_j χ²_j(1) to get the appropriate 1 − α quantiles;
simulate the quadratic form z' Ω̂^{-1} z, with z ~ N(0, Σ̂), and compute the quantiles; or
approximate by the distribution of a χ²(v) + b, choosing (a, b, v) to match the first three cumulants.
13/28
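The first simulation method is a one-liner. A sketch (the weights passed in are placeholders for the consistent estimates ω̂_j):

```python
import numpy as np

def weighted_chi2_quantile(omegas, alpha, reps=200_000, seed=0):
    """Simulate sum_j omega_j * chi2_j(1), with independent chi-square(1)
    draws, and return its 1 - alpha quantile as the critical value."""
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1, size=(reps, len(omegas))) @ np.asarray(omegas, float)
    return np.quantile(draws, 1 - alpha)

# sanity check: a single unit weight gives back the plain chi-square(1)
# 95% quantile, which is about 3.84
cv = weighted_chi2_quantile([1.0], 0.05)
```

The same simulated draws can also feed the three-cumulant a·χ²(v) + b approximation as a cross-check.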
Generalizations: model with two discrete regressors
Y = h(W, X) + ε,  E[ε | Z = z_j, W = w_d] = 0 for all j, d.
We define L_WX (n × DK) with elements (L_WX)_{i,dk} = I(W_i = w_d) I(X_i = x_k). Likewise L_WZ (n × DJ), and H_0 says:
y = L_WX β + ε,  E[ε | W = w_d, X = x_k] = 0 for all d, k.
14/28
Structure of regression matrix
L_WX is a permutation of the rows of the block-diagonal matrix
diag(L_X^1, L_X^2, ..., L_X^D).
Observations corresponding to L_X^d all have W = w_d, and the rows identify which values of X occur where.
Similarly for L_WZ.
We assume that all possible combinations of the K support points of X, the J support points of Z, and the D support points of W occur in the sample.
15/28
Identification in general model
The vector with elements h(w_d, x_k) can be split into D K × 1 vectors, with component vectors h_d(x_k), say, one for each w_d.
So we have D problems of the same type as the case with W absent. [Split y into D subvectors y_d.]
The instrument is valid for each subsample.
For h to be point-identified, each h_d must be, so the condition is again J ≥ K.
16/28
Most general model
Now assume several variables of each type: X_1, ..., X_R, W_1, ..., W_U, Z_1, ..., Z_T, with respective supports of dimensions K_r, S_u, J_t.
We want to test the joint endogeneity of (X_1, ..., X_R).
Label combinations of support points thus:
x_α = (x_{α_1}, ..., x_{α_R}), 1 ≤ α_r ≤ K_r,
w_β = (w_{β_1}, ..., w_{β_U}), 1 ≤ β_u ≤ S_u,
z_γ = (z_{γ_1}, ..., z_{γ_T}), 1 ≤ γ_t ≤ J_t.
Order the sequences lexicographically.
Indistinguishable from the case of one variable of each type, except that
J = Π_{t=1}^T J_t,  K = Π_{r=1}^R K_r,  S = Π_{u=1}^U S_u.
17/28
In particular - Identification
The necessary and sufficient condition remains J ≥ K, but with J and K defined as products of the J_t and K_r.
BUT note:
There is NO requirement that there be at least as many instruments as endogenous variables (T ≥ R);
All that is needed is J ≥ K.
Of course, more instruments increase J = Π_{t=1}^T J_t.
18/28
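The product rule makes the point concrete. A sketch with hypothetical support sizes: two binary endogenous regressors (R = 2, so K = 4) are identified by a single instrument with five support points (T = 1 < R), since all that matters is J ≥ K.

```python
from math import prod

# Identification needs J >= K, with J and K the PRODUCTS of the
# per-variable support sizes, not the counts of variables.
K = prod([2, 2])       # K_r: supports of X_1, X_2 -> K = 4
J = prod([5])          # J_t: one instrument with 5 support points -> J = 5
identified = J >= K    # True: identified despite T = 1 < R = 2
```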
Applications motivation
There are many published applications where a discrete endogenous regressor is instrumented by a variable with insufficient support, e.g. Card (1995), Angrist and Krueger (1991), Bronars and Grogger (1994), Lochner and Moretti (2004).
Point identification is achieved by assuming a parametric (linear) specification.
Parametric vs. nonparametric specification testing (e.g. Horowitz (2006)) is not possible in this case.
19/28
Application - Card (1995) on returns to schooling
We are interested in the relationship between an individual's wage Y and education X (in the presence of exogenous covariates W) in
Y = h(X, W) + ε.
Card (1995) treats education as endogenous and estimates
ln(wage_i) = β_0 + β_1 X_i + Σ_{s=1}^S γ_s W_si + ε_i
by 2SLS using a binary instrument Z, which takes value 1 if there is a college in the neighbourhood, 0 otherwise. Point identification is achieved by imposing the parametric (linear) specification, which is not testable.
20/28
Data
The dataset consists of 3010 observations from the National Longitudinal Survey of Young Men.
The (sample) support of the education variable consists of K = 18 different values, and for a binary instrument, J = 2.
Data limitation: the more exogenous covariates, the less likely it is to observe all possible combinations of the support points.
Educational levels (K = 4): less than high school, high school, some college, post-college education.
Potential labour market experience levels: low and high.
21/28
Results
Covariates                 Rn      cv.1    cv.2    cv.3    α
Educ                       1.765   0.239   0.232   0.238   1%
                                   0.136   0.132   0.138   5%
                                   0.094   0.096   0.097   10%
Educ*, Exp*                4.147   1.221   1.259   1.217   1%
                                   0.715   0.696   0.719   5%
                                   0.511   0.500   0.515   10%
Educ*, Exp*, Race          3.572   1.771   1.692   1.688   1%
                                   1.107   1.131   1.108   5%
                                   0.849   0.871   0.860   10%
Educ*, Exp*, Race, SMSA    2.955   2.382   2.330   2.415   1%
                                   1.702   1.679   1.735   5%
                                   1.399   1.365   1.430   10%
(cv.1-cv.3: critical values from the three computation methods above. Rn exceeds every critical value, so exogeneity is rejected in each specification.)
22/28
Outcome
Education is endogenous, whatever the specification of the W's.
So: linearity is not testable, because there is no consistent estimator for h(·).
Some linear functionals of interest may be point-identified; use the test above to check.
We can consistently estimate any identified linear combination.
23/28
Testing for point-identifiability of linear functionals
As J = 2, only linear functionals of 2 parameters might be point-identified, e.g. the difference in earnings across different years of education.
linear combination    Gn        L̂(β)      Chesher's bounds
h(3) − h(2)           0.1356    0.0040     -
h(7) − h(6)           1.9332    0.1017     (0.0365, 0.2895)
h(8) − h(7)           0.1494    0.2395     (−0.1732, 0.352)
h(9) − h(8)           26.5527   -          (−0.2742, 0.1334)
h(10) − h(9)          75.2217   -          -
h(11) − h(10)         4.7003    0.1317     (−0.057, 0.3187)
h(14) − h(13)         61.5525   -          -
h(17) − h(16)         10.7344   −0.1900    -
h(18) − h(17)         74.1413   -          -
(Functionals with large Gn are not point-identified, so no estimate is reported.)
24/28
Application - Angrist and Krueger (1991) on returns to schooling
Angrist and Krueger (1991) estimate
ln(wage_i) = β X_i + Σ_c δ_c Y_ci + Σ_{s=1}^S γ_s W_si + ε_i
by 2SLS using quarter of birth as an instrument for (assumed) endogenous education.
Data: 1980 Census, split into the 1930-1939 cohort (40-49 year-old men) and the 1940-1949 cohort (30-39 year-old men).
Now K = 21 and J = 4.
25/28
Results: 1930s cohort
                      critical values
             Rn        1%       5%       10%
1930         0.645     17.144   11.026   8.433
1931         1.000     19.806   12.614   9.582
1932         10.843    21.541   14.313   11.184
1933         2.385     18.980   12.685   9.952
1934         6.498     25.674   16.824   13.025
1935         2.728     20.451   13.374   10.340
1936         10.990    29.102   18.465   13.980
1937         1.614     13.467   9.032    7.101
1938         1.344     22.932   15.107   11.737
1939         9.649     22.130   14.837   11.664
full cohort  38.044    85.933   72.138   65.465
(Rn is below every critical value: exogeneity is not rejected for this cohort.)
26/28
Results: 1940s cohort
                      critical values
             Rn        1%        5%       10%
1940         3.528     24.137    15.704   12.096
1941         18.143    24.733    16.005*  12.286*
1942         6.517     34.282    21.810   16.535
1943         99.840    55.818*   35.712*  27.202*
1944         22.665    39.214    24.860   18.823*
1945         31.736    26.623*   17.705*  13.847*
1946         17.181    23.478    15.183*  11.642*
1947         22.803    33.000    21.830*  17.012*
1948         34.116    46.991    29.790*  22.552*
1949         32.952    36.445    23.627*  18.168*
full cohort  278.703   138.344*  114.551* 103.182*
(* marks critical values exceeded by Rn, i.e., exogeneity rejected at that level; for the full 1940s cohort it is rejected at all levels.)
27/28
To conclude...
we propose consistent nonparametric exogeneity test(s) applicable in models with discrete regressors,
the tests confirm endogeneity of the education variable in some classic applied work, but
suggest that linearity of these models might be a bold assumption;
we suggest a nonparametric approach, or finding instruments with more support points!
THE END
28/28

PDF
1a In Search of the Numbers ssrn 1488130 Oct 2009.pdf
PPTX
Basic Concepts of Economics.pvhjkl;vbjkl;ptx
PPT
KPMG FA Benefits Report_FINAL_Jan 27_2010.ppt
PDF
Blockchain Pesa Research by Samuel Mefane
PPT
features and equilibrium under MONOPOLY 17.11.20.ppt
PDF
Statistics for Management and Economics Keller 10th Edition by Gerald Keller ...
PPTX
Maths science sst hindi english cucumber
PPTX
social-studies-subject-for-high-school-globalization.pptx
PDF
Why Ignoring Passive Income for Retirees Could Cost You Big.pdf
PDF
Bitcoin Layer August 2025: Power Laws of Bitcoin: The Core and Bubbles
ML Credit Scoring of Thin-File Borrowers
Fundamentals of Financial Management Chapter 3
The Right Social Media Strategy Can Transform Your Business
How to join illuminati agent in Uganda Kampala call 0782561496/0756664682
OAT_ORI_Fed Independence_August 2025.pptx
Module5_Session1 (mlzrkfbbbbbbbbbbbz1).pptx
discourse-2025-02-building-a-trillion-dollar-dream.pdf
THE EFFECT OF FOREIGN AID ON ECONOMIC GROWTH IN ETHIOPIA
6a Transition Through Old Age in a Dynamic Retirement Distribution Model JFP ...
Financial discipline for educational purpose
1a In Search of the Numbers ssrn 1488130 Oct 2009.pdf
Basic Concepts of Economics.pvhjkl;vbjkl;ptx
KPMG FA Benefits Report_FINAL_Jan 27_2010.ppt
Blockchain Pesa Research by Samuel Mefane
features and equilibrium under MONOPOLY 17.11.20.ppt
Statistics for Management and Economics Keller 10th Edition by Gerald Keller ...
Maths science sst hindi english cucumber
social-studies-subject-for-high-school-globalization.pptx
Why Ignoring Passive Income for Retirees Could Cost You Big.pdf
Bitcoin Layer August 2025: Power Laws of Bitcoin: The Core and Bubbles

Nonparametric testing for exogeneity with discrete regressors and instruments

  • 1. Nonparametric testing for exogeneity with discrete regressors and instruments Katarzyna Bech and Grant Hillier Warsaw School of Economics and University of Southampton July 8, 2016 1/28
  • 6. Outline 1 Motivation. 2 Simplest nonparametric additive error model: setup, identification, estimation. 3 Two test statistics and critical value computation. 4 Generalization to several variables of each type: tested, exogenous, instrument. 5 Applications: Card (1995) and Angrist and Krueger (1991). 2/28
  • 10. Motivation Endogeneity is one of the most common problems in econometric models. In nonparametric models with discrete regressors and instruments, the presence of endogenous regressors produces bias (in the identified case) or non-existence of any consistent estimator (in the partially identified case). IV for nonparametric models with discrete regressors: Das (2005) and Florens and Malavolti (2003). Nonparametric testing for exogeneity with continuous regressors: Blundell and Horowitz (2007) and Lavergne and Patilea (2008), among others. 3/28
  • 14. Simple model Nonparametric additive error model Y = h(X) + ε, E[ε | Z = z_j] = 0 ∀j, where we have i.i.d. data (x_i^s, y_i^s, z_i^s) on (X, Y, Z), and: Y is a continuous scalar dependent variable; X is a single discrete regressor with support {x_k, k = 1, ..., K}, that may be endogenous, with associated probabilities p_k > 0; Z is a discrete instrumental variable with support {z_j, j = 1, ..., J}, with associated probabilities q_j > 0. 4/28
  • 17. Hypothesis of interest Null hypothesis (exogeneity): H_0 : E[ε | X = x_k] = 0, k = 1, ..., K. Under the null, h(·) can be consistently estimated using standard nonparametric techniques. Under the alternative, the IV solution to endogeneity is only possible under point identification. 5/28
  • 20. Identification Since Y = Σ_{k=1}^K h(x_k) I(X = x_k) + ε, the conditional expectation of Y given Z = z_j is E[Y | Z = z_j] = Σ_{k=1}^K Pr[X = x_k | Z = z_j] h(x_k). ⇒ the instrument Z supplies the equations π = Πβ, where β_k = h(x_k), π_j = E[Y | Z = z_j], Π_{jk} = P[X = x_k | Z = z_j]. h(·) is identified at ALL support points of X iff J ≥ K. 6/28
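The rank condition can be checked directly from a sample estimate of Π. A minimal sketch in Python (the data-generating process and all values here are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical discrete data: K = 3 support points for X, J = 3 for Z.
rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 3, size=n)                 # instrument values
x = (z + rng.integers(0, 2, size=n)) % 3       # regressor, correlated with Z

J, K = 3, 3
# Pi_hat[j, k] estimates P[X = x_k | Z = z_j] by within-cell frequencies
counts = np.zeros((J, K))
np.add.at(counts, (z, x), 1)
Pi_hat = counts / counts.sum(axis=1, keepdims=True)

# h is point-identified at all K support points iff rank(Pi) = K (needs J >= K)
print(np.linalg.matrix_rank(Pi_hat) == K)      # True for this DGP
```

With J < K the matrix Π̂ has more columns than rows, so its rank is at most J and the condition necessarily fails.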
  • 24. Identification when J < K h(·) is partially identified when J < K: Theorem 1. Let L(β) = c′β be a linear functional of the elements of β. When rank(Π) = J < K, the following are true: (1) for any c orthogonal to the null space of Π, L(β) is point-identified; the dimension of this set is J; (2) for c not orthogonal to the null space of Π, L(β) is completely unconstrained; the dimension of this set is K − J. That is: when J < K, some linear functionals are point-identified, some completely arbitrary (not even set-identified!). Point-identifiability of L(β) can be tested (for a given choice of c): G_n = n (c′_2 − c′_1 Π̂_1^{-1} Π̂_2) V̂_P^{-1} (c′_2 − c′_1 Π̂_1^{-1} Π̂_2)′ →_d χ²_{K−J}. 7/28
  • 27. Linear model setup We define the (0,1) matrix L_X (n × K) with elements (L_X)_{ik} = I(x_i^s = x_k); likewise L_Z (n × J). Then H_0 says: y = L_X β + ε, E[ε | X = x_k] = 0 ∀k. β can be consistently estimated by OLS under exogeneity: β̂ = (L′_X L_X)^{-1} L′_X y = (Σ_{i=1}^n y_i I(x_i^s = x_1) / Σ_{i=1}^n I(x_i^s = x_1), ..., Σ_{i=1}^n y_i I(x_i^s = x_K) / Σ_{i=1}^n I(x_i^s = x_K))′, i.e. the vector of cell means. Theorem 2. If X is exogenous, then the nonparametric (OLS) estimator β̂ is consistent and √n(β̂ − β) →_d N(0, σ² D_X^{-1}), where D_X is diag(p_k). 8/28
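Under exogeneity the estimator is nothing more than the vector of within-cell sample means of y. A short self-contained sketch on simulated data (all values hypothetical) confirming that the matrix formula and the cell-mean form agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 2000, 4
x = rng.integers(0, K, size=n)                      # discrete regressor
beta = np.array([1.0, 2.0, 3.0, 4.0])               # h(x_k) at the K support points
y = beta[x] + rng.normal(0, 0.5, size=n)            # exogenous additive error

# L_X is the n x K indicator matrix with (L_X)_{ik} = I(x_i = x_k)
L_X = (x[:, None] == np.arange(K)).astype(float)
beta_ols = np.linalg.solve(L_X.T @ L_X, L_X.T @ y)  # (L_X' L_X)^{-1} L_X' y

# identical to the vector of cell means, since L_X' L_X is diagonal
beta_cells = np.array([y[x == k].mean() for k in range(K)])
print(np.allclose(beta_ols, beta_cells))            # True
```

The equivalence holds because L′_X L_X = diag(cell counts) and L′_X y stacks the cell sums.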
  • 30. Linear model setup ... or by IV (with L_Z as instruments) under endogeneity, when J ≥ K: β̂_IV = (L′_X P_{L_Z} L_X)^{-1} L′_X P_{L_Z} y. Theorem 3. Under the assumptions above, the IV estimator β̂_IV is consistent and √n(β̂_IV − β) →_d N(0, σ² (P′ D_Z^{-1} P)^{-1}), where P is the matrix of joint probabilities with elements p_{jk} = Pr[Z = z_j, X = x_k], j = 1, ..., J, k = 1, ..., K, and D_Z is diag(q_j). BUT no consistent estimator exists for K − J linear functionals if X is endogenous and J < K. 9/28
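A sketch of the IV form with indicator-matrix instruments, on simulated data. The DGP below is hypothetical, chosen only so that ε is correlated with X but mean-independent of Z; the projection quadratic forms are computed through the small J × J Gram matrix rather than the n × n projection matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, J = 20_000, 2, 2
z = rng.integers(0, J, size=n)                      # binary instrument
u = rng.normal(size=n)                              # unobservable causing endogeneity
x = (0.8 * z + u > 0.4).astype(int)                 # X depends on both Z and u
beta = np.array([0.0, 1.0])
y = beta[x] + u                                     # error u is correlated with X

L_X = (x[:, None] == np.arange(K)).astype(float)
L_Z = (z[:, None] == np.arange(J)).astype(float)

# L_X' P_{L_Z} L_X and L_X' P_{L_Z} y without forming the n x n projection
A = L_X.T @ L_Z                                     # K x J cross counts
W = np.linalg.inv(L_Z.T @ L_Z)                      # inverse instrument cell counts
beta_ols = np.linalg.solve(L_X.T @ L_X, L_X.T @ y)  # badly biased here
beta_iv = np.linalg.solve(A @ W @ A.T, A @ W @ (L_Z.T @ y))
print(beta_ols, beta_iv)                            # IV lands near (0, 1); OLS does not
```

Here E[u | Z] = 0, so Z is a valid instrument, while E[u | X] ≠ 0, so the cell-mean OLS estimator is inconsistent.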
  • 34. Test for exogeneity Test statistics differ depending on whether J ≥ K or J < K. For J ≥ K, (a modified version of) the Wu-Hausman test: Theorem 4. Under H_0 and the assumptions above, T_n →_d χ²_{K−1}. Theorem 5. Under the sequence of local alternatives and the assumptions above, the test statistic T_n →_d Gamma(α, λ, θ), with shape parameter α = (K − 1)/2, scale parameter θ = 2σ̄²/σ², and noncentrality parameter λ = 2δ², where δ² = ξ′ Σ_{11}^{-1} ξ / σ². 10/28
  • 38. Test for exogeneity For J < K, the test is based on two SSEs: Unrestricted: in the model y = L_X β + ε, i.e., y′ M_{L_X} y; and Restricted: minimising the SSE in this model subject to π̂ = Π̂β. Test statistic: R_n = y′ M_{L_X} L_Z (L′_Z P_{L_X} L_Z)^{-1} L′_Z M_{L_X} y / (n^{-1} y′ M_{L_X} y). 11/28
  • 39. Test for exogeneity Theorem 6. Under H_0 and the assumptions above, R_n →_d z′ Ω^{-1} z ~ Σ_{j=1}^{J−1} ω_j χ²_j(1), where z ~ N(0, Σ), with Σ as defined above, Ω := C′_J (P D_X^{-1} P′ − p_Z p′_Z) C_J, and the ω_j are positive eigenvalues satisfying det[Σ − ωΩ] = 0, with the χ²_j(1) variables independent copies of a χ²_1 random variable. 12/28
  • 42. Critical value computation Using consistent estimates ω̂_j, simulate the distribution of Σ_{j=1}^{J−1} ω̂_j χ²_j(1) to get the appropriate 1 − α quantiles; or simulate the quadratic form z′ Ω̂^{-1} z, with z ~ N(0, Σ̂), and compute the quantiles; or approximate by the distribution of aχ²(v) + b, choosing (a, b, v) to match the first three cumulants. 13/28
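The first and third methods are cheap to sketch. In the snippet below the ω̂_j are stand-in values (hypothetical, not estimates from any dataset); it simulates the quantiles of Σ_j ω̂_j χ²_j(1) directly and then via the three-cumulant aχ²(v) + b approximation:

```python
import numpy as np

rng = np.random.default_rng(3)
w = np.array([0.7, 0.2, 0.1])                 # stand-ins for the estimated omega_j

# Method 1: simulate sum_j w_j chi2_j(1) and read off the 1 - alpha quantiles
draws = rng.chisquare(1, size=(200_000, w.size)) @ w
crit = np.quantile(draws, [0.90, 0.95, 0.99])

# Method 3: match the first three cumulants. For sum_j w_j chi2(1):
#   k1 = sum w_j, k2 = 2 sum w_j^2, k3 = 8 sum w_j^3,
# while a*chi2(v) + b has k1 = a v + b, k2 = 2 a^2 v, k3 = 8 a^3 v.
k1, k2, k3 = w.sum(), 2 * (w**2).sum(), 8 * (w**3).sum()
a = k3 / (4 * k2)
v = k2 / (2 * a**2)
b = k1 - a * v
approx = a * np.quantile(rng.chisquare(v, 200_000), [0.90, 0.95, 0.99]) + b
print(crit, approx)                           # the two sets of quantiles are close
```

Equating the three cumulants gives a = k3/(4 k2), v = k2/(2a²), b = k1 − av, so no numerical solver is needed.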
  • 43. Generalizations: model with two discrete regressors Y = h(W, X) + ε, E[ε | Z = z_j, W = w_d] = 0 ∀j, d. We define L_WX (n × DK) with elements (L_WX)_{i,dk} = I(W = w_d) I(X = x_k); likewise L_WZ (n × DJ), and H_0 says: y = L_WX β + ε, E[ε | W = w_d, X = x_k] = 0 ∀d, k. 14/28
  • 47. Structure of regression matrix L_WX is a permutation of the rows of the block-diagonal matrix diag(L_X^1, L_X^2, ..., L_X^D). Observations corresponding to L_X^d all have W = w_d, and rows identify which values of X occur where. Similarly for L_WZ. We assume that all possible combinations of the K support points of X, the J support points of Z and the D support points of W occur in the sample. 15/28
  • 51. Identification in the general model The vector with elements h(w_d, x_k) can be split into D K × 1 vectors, with component vectors h_d(x_k), say, one for each w_d. So we have D problems of the same type as the case with W absent [split y into D subvectors y_d]. The instrument is valid for each subsample. For h to be point-identified, each h_d must be, so the condition is again J ≥ K. 16/28
  • 56. Most general model Now assume several variables of each type: X_1, ..., X_R, W_1, ..., W_U, Z_1, ..., Z_T, with respective supports of dimensions K_r, S_u, J_t. We want to test the joint endogeneity of (X_1, ..., X_R). Label combinations of support points thus: x_α = (x_{α_1}, ..., x_{α_R}), 1 ≤ α_r ≤ K_r; w_β = (w_{β_1}, ..., w_{β_U}), 1 ≤ β_u ≤ S_u; z_γ = (z_{γ_1}, ..., z_{γ_T}), 1 ≤ γ_t ≤ J_t. Order the sequences lexicographically. This is indistinguishable from the case of one variable of each type, except that J = Π_{t=1}^T J_t, K = Π_{r=1}^R K_r, S = Π_{u=1}^U S_u. 17/28
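The lexicographic bookkeeping is mechanical. A sketch (with hypothetical supports) of how several discrete instruments collapse into a single instrument whose support has J = Π_t J_t points:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
J_t = (2, 3, 2)                    # support sizes of three instruments Z_1, Z_2, Z_3
Z = np.column_stack([rng.integers(0, j, size=n) for j in J_t])

# lexicographic index of each combination (z_{gamma_1}, z_{gamma_2}, z_{gamma_3})
combined = np.ravel_multi_index(Z.T, dims=J_t)

J = int(np.prod(J_t))              # 12 combined support points
print(J, combined.min(), combined.max())
```

The combined index plays exactly the role of a single discrete Z in the earlier slides, which is why the identification condition depends only on the product J, not on the number of instruments T.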
  • 61. In particular: identification The necessary and sufficient condition remains J ≥ K, but with J and K defined as products of the J_t and K_r. BUT note: there is NO requirement that there be at least as many instruments as endogenous variables (T ≥ R); all that is needed is J ≥ K. Of course, more instruments increases J = Π_{t=1}^T J_t. 18/28
  • 64. Applications: motivation There are many published applications where a discrete endogenous regressor is instrumented by a variable with insufficient support, e.g. Card (1995), Angrist and Krueger (1991), Bronars and Grogger (1994), Lochner and Moretti (2004). Point identification is achieved by assuming a parametric (linear) specification. Parametric vs. nonparametric specification testing (e.g. Horowitz (2006)) is not possible in this case. 19/28
  • 66. Application: Card (1995) on returns to schooling We are interested in the relationship between an individual's wage Y and education X (in the presence of exogenous covariates W) in Y = h(X, W) + ε. Card (1995) treats education as endogenous and estimates ln(wage_i) = β_0 + β_1 X_i + Σ_{s=1}^S γ_s W_{si} + ε_i by 2SLS, using a binary instrument Z, which takes value 1 if there is a college in the neighbourhood and 0 otherwise. Point identification is achieved by imposing the parametric (linear) specification, which is not testable. 20/28
  • 71. Data The dataset consists of 3010 observations from the National Longitudinal Survey of Young Men. The (sample) support of the education variable consists of K = 18 different values, and for a binary instrument J = 2. Data limitation: the more exogenous covariates, the less likely it is to get observations for all possible combinations of the support points. Educational levels (K = 4): less than high school, high school, some college, post-college education. Potential labour market experience levels: low and high. 21/28
  • 72. Results

    Covariates                  Rn      cv.1    cv.2    cv.3    α
    Educ                        1.765   0.239   0.232   0.238   1%
                                        0.136   0.132   0.138   5%
                                        0.094   0.096   0.097   10%
    Educ*, Exp*                 4.147   1.221   1.259   1.217   1%
                                        0.715   0.696   0.719   5%
                                        0.511   0.500   0.515   10%
    Educ*, Exp*, Race           3.572   1.771   1.692   1.688   1%
                                        1.107   1.131   1.108   5%
                                        0.849   0.871   0.860   10%
    Educ*, Exp*, Race, SMSA     2.955   2.382   2.330   2.415   1%
                                        1.702   1.679   1.735   5%
                                        1.399   1.365   1.430   10%
    22/28
  • 76. Outcome Education is endogenous, whatever the specification of the W's. So linearity is not testable, because no consistent estimator exists for h(·). Some linear functionals of interest may be point-identified: use the test above to check. We can consistently estimate an identified linear combination. 23/28
  • 77. Testing for point-identifiability of linear functionals As J = 2, linear functionals of only 2 parameters might be point-identified, e.g. the difference in earnings across different years of education.

    linear combination    Gn        L̂(β)      Chesher's bounds
    h(3) − h(2)           0.1356    0.0040     –
    h(7) − h(6)           1.9332    0.1017     (0.0365, 0.2895)
    h(8) − h(7)           0.1494    0.2395     (−0.1732, 0.352)
    h(9) − h(8)           26.5527   –          (−0.2742, 0.1334)
    h(10) − h(9)          75.2217   –          –
    h(11) − h(10)         4.7003    0.1317     (−0.057, 0.3187)
    h(14) − h(13)         61.5525   –          –
    h(17) − h(16)         10.7344   −0.1900    –
    h(18) − h(17)         74.1413   –          –
    24/28
  • 80. Application: Angrist and Krueger (1991) on returns to schooling Angrist and Krueger (1991) estimate ln(wage_i) = β X_i + Σ_c δ_c Y_{ci} + Σ_{s=1}^S γ_s W_{si} + ε_i by 2SLS, using quarter of birth as an instrument for (assumed) endogenous education. Data: 1980 Census, split into a 1930-1939 cohort (40-49 year-old men) and a 1940-1949 cohort (30-39 year-old men). Now K = 21 and J = 4. 25/28
  • 81. Results: 1930s cohort

                           critical values
    Year          Rn       1%        5%        10%
    1930          0.645    17.144    11.026    8.433
    1931          1.000    19.806    12.614    9.582
    1932          10.843   21.541    14.313    11.184
    1933          2.385    18.980    12.685    9.952
    1934          6.498    25.674    16.824    13.025
    1935          2.728    20.451    13.374    10.340
    1936          10.990   29.102    18.465    13.980
    1937          1.614    13.467    9.032     7.101
    1938          1.344    22.932    15.107    11.737
    1939          9.649    22.130    14.837    11.664
    full cohort   38.044   85.933    72.138    65.465
    26/28
  • 82. Results: 1940s cohort

                            critical values
    Year          Rn        1%         5%         10%
    1940          3.528     24.137     15.704     12.096
    1941          18.143    24.733     16.005*    12.286*
    1942          6.517     34.282     21.810     16.535
    1943          99.840    55.818*    35.712*    27.202*
    1944          22.665    39.214     24.860     18.823*
    1945          31.736    26.623*    17.705*    13.847*
    1946          17.181    23.478     15.183*    11.642*
    1947          22.803    33.000     21.830*    17.012*
    1948          34.116    46.991     29.790*    22.552*
    1949          32.952    36.445     23.627*    18.168*
    full cohort   278.703   138.344*   114.551*   103.182*
    27/28
  • 87. To conclude... We propose consistent nonparametric exogeneity tests applicable in models with discrete regressors. The tests confirm endogeneity of the education variable in some classic applied work, but suggest that linearity of these models might be a bold assumption; we suggest a nonparametric approach, or finding instruments with more support points! THE END 28/28