Cunningham slides-ch2

Chapter 2: Variably Parametric Nonlinear Regression with Endogenous Switching

James Cunningham

(September, 2012)

Introduction

• Most empirical research in health economics (HE) focuses on measurement of policy-relevant

causal effects: what effect would an exogenously mandated change in the (policy) variable

have on the outcome of interest?

• HE is replete with nonlinear outcomes: non-negative; count-valued; highly skewed; etc.

• The dissertation as a whole treats practical methods of estimating endogenous treatment

effects in nonlinear models.

• This paper — Chapter 2 — develops some flexible but parametric estimators in the case of

binary endogenous switching, methods which are either

o Minimally parametric (requiring specification of the conditional mean)

o Full information (requiring specification of the conditional density)

Introduction (contd)

• These form the foundation of the dissertation, drawing upon the research of Terza (1998, 2009, etc.).

• I demonstrate two estimators:

o Minimally parametric with specification of a conditional mean; by example we use an

exponential conditional mean with a linear index.

o Fully parametric with specification of the conditional density of the outcome; by example

we use the three-parameter generalized gamma (Manning et al. [2005]).

• In the sections that follow we introduce the estimation objective (average treatment effect);

give detail on the estimators; provide a Monte Carlo study of their efficiency properties; and

apply them to real data.

Estimation Objective: Average Treatment Effect from a Potential Outcomes Perspective

• Consider measurement of the effect of a policy-relevant variable $X_p$ on an outcome $Y$.

• Distinguish between the observed $X_p$ and its exogenously mandated counterpart $X_p^*$, and similarly between $Y$ and its potential (possibly counterfactual) value $Y_{X_p^*}$.

• Then the average treatment effect is given by

$$E[Y_1] - E[Y_0] \tag{1}$$

• Due to the (possibly) counterfactual nature of the random variables $Y_1$ and $Y_0$, (1) cannot be estimated directly.

Estimation Objective: Average Treatment Effect (contd)

• But when controlling for a comprehensive set of variables $X_o$ (observed) and $X_u$ (unobserved), we can iterate expectations:

$$\mathrm{ATE} = E[Y_1] - E[Y_0] = E_{X_o, X_u}\big[\, E[Y \mid X_p = 1, X_o, X_u] - E[Y \mid X_p = 0, X_o, X_u] \,\big] \tag{2}$$

• When $X_u$ is correlated with $X_p$, ignoring the unobserved $X_u$ spuriously attributes some of its effect to $X_p$.

• We can recover causal interpretation by formalizing the correlation between $X_p$ and $X_u$, as in

$$X_p = 1(W\alpha + X_u > 0) \tag{3}$$

where $1(\cdot)$ is a standard indicator function, $W = [X_o \;\; W^+]$, $W^+$ is a vector of identifying instrumental variables, and $(X_u \mid W) \sim N(0, 1)$.
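The spurious attribution described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: an exponential conditional mean, a single uniform instrument, and parameter values chosen for the example rather than taken from the slides. The function name is hypothetical.

```python
import numpy as np

def naive_vs_true_ate(n=200_000, beta_p=0.5, beta_u=0.5, alpha=1.0, seed=0):
    """Compare a naive difference in means with the true ATE when X_u is ignored.

    Assumed (illustrative) design: E[Y | X_p, X_u] = exp(X_p*beta_p + X_u*beta_u),
    with X_p = 1(W*alpha + X_u > 0) as in (3).
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, n)      # instrument
    x_u = rng.standard_normal(n)      # unobserved confounder
    x_p = (w * alpha + x_u > 0).astype(float)

    # Conditional means of the two potential outcomes for every unit,
    # i.e. with X_p set exogenously to 1 and to 0.
    mean_1 = np.exp(beta_p + beta_u * x_u)
    mean_0 = np.exp(beta_u * x_u)
    true_ate = np.mean(mean_1 - mean_0)

    # The naive comparison uses the realized (endogenous) X_p, so the
    # treated group has systematically higher X_u than the untreated group.
    y_mean = np.where(x_p == 1, mean_1, mean_0)
    naive = y_mean[x_p == 1].mean() - y_mean[x_p == 0].mean()
    return naive, true_ate

naive, true_ate = naive_vs_true_ate()
print(f"naive difference: {naive:.3f}, true ATE: {true_ate:.3f}")
```

With a positive $\beta_u$, the naive contrast exceeds the true effect because part of the influence of $X_u$ is credited to $X_p$.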

Estimation Objective: Average Treatment Effect (contd)

• By iterating expectations, we can then write (2) as

$$\mathrm{ATE} = E_{X_o}\left[ \int_{-\infty}^{\infty} \big\{ E[Y \mid X_p = 1, X_o, X_u] - E[Y \mid X_p = 0, X_o, X_u] \big\}\, \phi(X_u)\, dX_u \right] \tag{4}$$

• Then an estimator of (1), through (2) and (3), is

$$\widehat{\mathrm{ATE}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \int_{-\infty}^{\infty} \big\{ \hat{E}[Y \mid X_p = 1, X_o, X_u] - \hat{E}[Y \mid X_p = 0, X_o, X_u] \big\}\, \phi(X_u)\, dX_u \right] \tag{5}$$

where $\hat{E}[\cdot]$ denotes an estimate of an expected value.

• We thus proceed by specifying estimators as if $X_u$ were observed, just one variable among others.
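The integral over $X_u$ in (5) can be approximated numerically for any fitted conditional mean. The following is a minimal sketch using Gauss-Hermite quadrature; the function names, the `cond_mean(x_p, x_o, x_u)` signature, and the parameter values are assumptions for illustration, not part of the original slides.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def ate_by_quadrature(cond_mean, x_o, n_nodes=40):
    """Approximate (5): the sample average of the integral over X_u of
    {E[Y | X_p=1, X_o, X_u] - E[Y | X_p=0, X_o, X_u]} * phi(X_u).

    cond_mean(x_p, x_o, x_u) is any fitted conditional mean function
    (hypothetical signature); x_o is an (n, k) array of observed controls.
    """
    # Probabilists' Gauss-Hermite nodes and weights integrate against
    # exp(-x^2 / 2); dividing by sqrt(2*pi) gives a N(0, 1) expectation.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)

    total = np.zeros(x_o.shape[0])
    for u, wt in zip(nodes, weights):
        total += wt * (cond_mean(1.0, x_o, u) - cond_mean(0.0, x_o, u))
    return total.mean()

# Illustrative use with an exponential conditional mean (values are made up).
beta_p, beta_o, beta_u = 0.5, np.array([0.2, 1.0]), 0.5

def exp_mean(x_p, x_o, x_u):
    return np.exp(x_p * beta_p + x_o @ beta_o + x_u * beta_u)

rng = np.random.default_rng(0)
x_o = np.column_stack([np.ones(1000), rng.uniform(-0.5, 1.0, 1000)])
ate = ate_by_quadrature(exp_mean, x_o)
print(f"quadrature ATE: {ate:.4f}")
```

Quadrature is one choice among several; simulation over draws of $X_u$ would serve the same purpose.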

Endogenous Treatment Effects in Continuous Nonnegative Models

• Consider the common specification

$$E[Y \mid X_p, X_o, X_u] = \exp(X_p\beta_p + X_o\beta_o + X_u\beta_u) \tag{6}$$
• After some algebra the treatment effect from (5) can be written

$$\widehat{\mathrm{ATE}} = \frac{1}{n} \sum_{i=1}^{n} \exp(X_o \hat\beta_o^{+}) \big( \exp(\hat\beta_p) - 1 \big) \tag{7}$$

where $\hat\beta$ denotes an estimate of $\beta$, and $\beta_o^{+}$ is $\beta_o$ with its constant term shifted by $\tfrac{1}{2}\beta_u^2$.

• We consider minimally and fully parametric approaches to the estimation of the parameters

necessary for (7).
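The algebra behind (7) rests on the standard normal moment $E[\exp(\beta_u X_u)] = \exp(\beta_u^2/2)$, which is why only the constant term shifts. A minimal Python sketch of the resulting estimator follows; the function name, the argument layout, and the `const_col` convention for locating the constant are assumptions made for illustration.

```python
import numpy as np

def ate_exponential_mean(x_o, beta_p_hat, beta_o_hat, beta_u_hat, const_col=0):
    """Closed-form ATE estimator (7) under the exponential conditional mean (6).

    x_o        : (n, k) array of observed controls, including a constant column
    beta_o_hat : length-k coefficient estimates for x_o
    const_col  : position of the constant in x_o (assumed convention)
    """
    # beta_o_plus is beta_o with its constant shifted by 0.5 * beta_u^2.
    beta_o_plus = np.array(beta_o_hat, dtype=float)  # copy, input is untouched
    beta_o_plus[const_col] += 0.5 * beta_u_hat**2
    return np.mean(np.exp(x_o @ beta_o_plus)) * (np.exp(beta_p_hat) - 1.0)
```

Given estimates from either estimator in this chapter, the call would look like `ate_exponential_mean(x_o, b_p, b_o, b_u)`, where the argument names are placeholders.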

Endogenous Treatment Effects in Continuous Nonnegative Models: Minimally Parametric

• If the conditional mean assumption (6) holds, no further assumption is required (beyond the relationship between $X_p$ and $X_u$).

• To obtain consistent estimates of the parameters, we use the result that

$$E[Y \mid X_p, W] = \exp(X_p\beta_p + X_o\beta_o^{+}) \left[ X_p \frac{\Phi(\beta_u + W\alpha)}{\Phi(W\alpha)} + (1 - X_p) \frac{1 - \Phi(\beta_u + W\alpha)}{1 - \Phi(W\alpha)} \right] \tag{8}$$

with $\beta_o^{+}$ as defined under (7).

• (8) can be employed in estimation via a two-step procedure: probit in the first stage and nonlinear least squares in the second.
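The two-step procedure can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the results in these slides: the function names are hypothetical, the probit is fit by direct likelihood maximization with SciPy, the starting values are zeros, and no standard errors (which would need a multi-step correction) are computed.

```python
import numpy as np
from scipy.optimize import least_squares, minimize
from scipy.stats import norm

def probit_fit(x_p, w):
    """First stage: probit of X_p on W by maximum likelihood."""
    def neg_loglik(alpha):
        idx = w @ alpha
        # log Phi(idx) if treated, log(1 - Phi(idx)) = log Phi(-idx) otherwise
        return -np.sum(norm.logcdf(np.where(x_p == 1, idx, -idx)))
    return minimize(neg_loglik, np.zeros(w.shape[1]), method="BFGS").x

def cond_mean_8(theta, x_p, x_o, w_alpha):
    """Right-hand side of (8); theta = (beta_p, beta_o_plus..., beta_u)."""
    beta_p, beta_o_plus, beta_u = theta[0], theta[1:-1], theta[-1]
    # norm.sf(x) = 1 - Phi(x), computed without cancellation
    corr = np.where(x_p == 1,
                    norm.cdf(beta_u + w_alpha) / norm.cdf(w_alpha),
                    norm.sf(beta_u + w_alpha) / norm.sf(w_alpha))
    return np.exp(x_p * beta_p + x_o @ beta_o_plus) * corr

def two_step(y, x_p, x_o, w):
    """Probit for alpha, then NLS on (8) with W*alpha_hat plugged in."""
    alpha_hat = probit_fit(x_p, w)
    w_alpha = w @ alpha_hat
    start = np.zeros(x_o.shape[1] + 2)
    fit = least_squares(lambda t: y - cond_mean_8(t, x_p, x_o, w_alpha), start)
    return alpha_hat, fit.x
```

Here `x_o` is assumed to contain a constant column and `w` to stack `x_o` with the instruments; the constant returned by the second stage estimates the shifted constant in $\beta_o^{+}$, not the original one.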

Endogenous Treatment Effects in Continuous Nonnegative Models: Fully Parametric

• When further assumptions can or must be made, we can consider a full-information version of the model above. Letting gg refer to the generalized gamma, assume that

$$f(Y \mid X_o, X_p, X_u) = gg(Y \mid X; \mu, \kappa, \sigma) = \frac{\gamma^{\gamma}}{\sigma Y \sqrt{\gamma}\, \Gamma(\gamma)} \exp\big( Z\sqrt{\gamma} - U \big) \tag{9}$$

where $X = [X_p \;\; X_o \;\; X_u]$, $\mu = X_p\beta_p + X_o\beta_o + X_u\beta_u$, $\gamma = |\kappa|^{-2}$, $Z = \mathrm{sgn}(\kappa)(\log Y - \mu)/\sigma$, and $U = \gamma \exp(|\kappa| Z)$.

• The generalized gamma is highly flexible: it fits the nonnegative, highly skewed outcomes common in HE, and subsumes many popular distributions (gamma, Weibull, exponential, lognormal).
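For concreteness, the density in (9) can be coded directly. The sketch below assumes the three-parameter form with $\gamma = |\kappa|^{-2}$ and $\kappa \neq 0$ (the lognormal arises only as a limit and is not handled); the function name is hypothetical.

```python
import math
import numpy as np

def gg_logpdf(y, mu, kappa, sigma):
    """Log of the generalized gamma density in (9); requires kappa != 0, y > 0."""
    y = np.asarray(y, dtype=float)
    gamma = abs(kappa) ** -2
    z = np.sign(kappa) * (np.log(y) - mu) / sigma
    u = gamma * np.exp(abs(kappa) * z)
    # log of gamma^gamma / (sigma * y * sqrt(gamma) * Gamma(gamma)) * exp(z*sqrt(gamma) - u)
    return (gamma * math.log(gamma) + z * math.sqrt(gamma) - u
            - np.log(sigma * y) - 0.5 * math.log(gamma) - math.lgamma(gamma))

# Special case: kappa = sigma = 1 reduces to an exponential with mean exp(mu).
print(gg_logpdf(2.0, 0.0, 1.0, 1.0))
```

Working on the log scale avoids underflow when the density is later multiplied across observations.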

Endogenous Treatment Effects in Continuous Nonnegative Models: Fully Parametric (contd)

• Further:

$$E[Y \mid X_p, X_o, X_u] = \exp(\mu + k) \tag{10}$$

where $k = (\sigma/\kappa)\log(\kappa^2) + \log\Gamma(\kappa^{-2} + \sigma/\kappa) - \log\Gamma(\kappa^{-2})$.

• Thus the average treatment effect estimator takes the form of (7), after adding the correction $k$ to the constant term.
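The correction $k$ in (10) is straightforward to compute, and it can be checked against simulated draws. The sketch below is illustrative: the function names are hypothetical, and the sampler assumes $\kappa > 0$ and uses a gamma transformation, $Y = \exp(\mu)(\kappa^2 G)^{\sigma/\kappa}$ with $G \sim \mathrm{Gamma}(\kappa^{-2}, 1)$, which is one way to generate draws with density (9).

```python
import math
import numpy as np

def gg_mean_correction(kappa, sigma):
    """k in (10), so that E[Y | X] = exp(mu + k).

    Requires kappa != 0 and kappa**-2 + sigma/kappa > 0.
    """
    g = kappa ** -2
    return ((sigma / kappa) * math.log(kappa ** 2)
            + math.lgamma(g + sigma / kappa) - math.lgamma(g))

def gg_draw(mu, kappa, sigma, rng):
    """Draw Y for kappa > 0 via a gamma transformation (assumed sampler)."""
    g = rng.gamma(shape=kappa ** -2, size=np.shape(mu))
    return np.exp(mu) * (kappa ** 2 * g) ** (sigma / kappa)

# Compare the simulated mean with exp(mu + k) at mu = 0.
rng = np.random.default_rng(1)
y = gg_draw(np.zeros(400_000), 0.8, 0.4, rng)
k = gg_mean_correction(0.8, 0.4)
print(f"simulated mean: {y.mean():.4f}, exp(k): {math.exp(k):.4f}")
```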

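• It can be shown that the log-likelihood is

$$L(\alpha, \beta, \mu, \kappa, \sigma \mid Y, X_p; W) = \sum_{i=1}^{n} \log\left\{ X_{pi} \int_{-W_i\alpha}^{\infty} gg(Y_i \mid X_i; \kappa, \mu, \sigma)\, \phi(X_u)\, dX_u + (1 - X_{pi}) \int_{-\infty}^{-W_i\alpha} gg(Y_i \mid X_i; \kappa, \mu, \sigma)\, \phi(X_u)\, dX_u \right\} \tag{11}$$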

• The parameters β and α can be jointly estimated via maximum likelihood using (11).
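Each term in (11) is a one-dimensional integral over a half-line, which can be evaluated by quadrature. The following is a minimal sketch for a single observation under stated assumptions: the infinite limit is truncated at a finite `cutoff` (an approximation, and $|W_i\alpha|$ is assumed to be below it), Gauss-Legendre nodes are used, and all function and argument names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.legendre import leggauss

def gg_logpdf(y, mu, kappa, sigma):
    """Log generalized gamma density as in (9); requires kappa != 0."""
    gamma = abs(kappa) ** -2
    z = np.sign(kappa) * (np.log(y) - mu) / sigma
    u = gamma * np.exp(abs(kappa) * z)
    return (gamma * math.log(gamma) + z * math.sqrt(gamma) - u
            - np.log(sigma * y) - 0.5 * math.log(gamma) - math.lgamma(gamma))

def loglik_i(y, x_p, index_no_u, beta_u, w_alpha, kappa, sigma,
             n_nodes=64, cutoff=8.0):
    """Log-likelihood contribution of one observation in (11).

    index_no_u = X_p*beta_p + X_o*beta_o. X_u is integrated over
    (-w_alpha, cutoff) if x_p == 1 and over (-cutoff, -w_alpha) otherwise.
    """
    lo, hi = (-w_alpha, cutoff) if x_p == 1 else (-cutoff, -w_alpha)
    nodes, weights = leggauss(n_nodes)
    x_u = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)   # map [-1, 1] to [lo, hi]
    phi = np.exp(-0.5 * x_u**2) / math.sqrt(2.0 * math.pi)
    dens = np.exp(gg_logpdf(y, index_no_u + beta_u * x_u, kappa, sigma))
    return math.log(0.5 * (hi - lo) * np.sum(weights * dens * phi))
```

Summing `loglik_i` over observations and passing the negative of that sum to a numerical optimizer would give a maximum likelihood routine; that outer step is omitted here.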

Monte Carlo Simulations

• To evaluate the consistency properties of the above estimators, we undertake a Monte Carlo

study. In all simulations the data generating process takes the following form:

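$$X_o \sim U(-0.5, 1), \quad W \sim U(0, 1), \quad X_u \sim N(0, 1)$$

$$X_p = 1(X_o\alpha_o + W\alpha_w + \alpha_c + X_u > 0)$$

$$\mu = X_p\beta_p + X_o\beta_o + X_u\beta_u + \beta_c, \quad \kappa = 0.8, \quad \sigma = 0.4$$

$$Y \sim \mathrm{GeneralizedGamma}(\mu, \sigma, \kappa)$$

$$[\alpha_o \;\; \alpha_w \;\; \alpha_c] = [1 \;\; 1 \;\; 0.5]$$

$$[\beta_p \;\; \beta_o \;\; \beta_u \;\; \beta_c] = [1 \;\; 1 \;\; 0.5 \;\; 0.25]$$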

• The average treatment effect was estimated using the estimators described above.
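One way to generate a single sample from this data generating process is sketched below. The slides do not say how the generalized gamma draws were produced; the gamma transformation used here is an assumption (valid for $\kappa > 0$ under the parameterization in (9)), and the function name and return layout are hypothetical.

```python
import numpy as np

def simulate(n, seed=0):
    """Draw one sample of size n from the Monte Carlo design."""
    rng = np.random.default_rng(seed)
    alpha_o, alpha_w, alpha_c = 1.0, 1.0, 0.5
    beta_p, beta_o, beta_u, beta_c = 1.0, 1.0, 0.5, 0.25
    kappa, sigma = 0.8, 0.4

    x_o = rng.uniform(-0.5, 1.0, n)
    w = rng.uniform(0.0, 1.0, n)
    x_u = rng.standard_normal(n)

    # Endogenous switching: X_u enters both treatment and outcome.
    x_p = (x_o * alpha_o + w * alpha_w + alpha_c + x_u > 0).astype(int)
    mu = x_p * beta_p + x_o * beta_o + x_u * beta_u + beta_c

    # Assumed sampler: Y = exp(mu) * (kappa^2 * G)^(sigma/kappa),
    # with G ~ Gamma(shape = kappa^-2, scale = 1).
    g = rng.gamma(shape=kappa ** -2, size=n)
    y = np.exp(mu) * (kappa ** 2 * g) ** (sigma / kappa)
    return {"y": y, "x_p": x_p, "x_o": x_o, "w": w, "x_u": x_u}

data = simulate(5000)
print(f"share treated: {data['x_p'].mean():.3f}")
```

In estimation, `x_u` would be discarded; it is returned here only so the design can be inspected.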

Monte Carlo Simulations (contd)

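With 500 repetitions at each of the sample sizes 5,000; 10,000; 50,000; and 100,000, we compute the absolute percentage bias for each parameter:

$$\mathrm{ABP}(\hat\beta) = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{\hat\beta_i - \beta}{\beta} \right|$$

where $m$ is the number of repetitions and $\hat\beta_i$ is the estimate from repetition $i$.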
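As a small worked example, the absolute percentage bias can be computed from a vector of replicate estimates as below. The sketch assumes the measure averages absolute relative deviations across repetitions; the function name and the toy numbers are made up for illustration.

```python
import numpy as np

def abs_pct_bias(estimates, truth):
    """Mean absolute deviation of the estimates from the true value,
    expressed as a percentage of the true value."""
    est = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean(np.abs(est - truth) / abs(truth))

# Three toy replicate estimates of a parameter whose true value is 1.
print(f"{abs_pct_bias([0.9, 1.1, 1.0], 1.0):.2f}%")
```

Because deviations enter in absolute value, this measure reflects sampling variability as well as systematic bias, so it can be sizeable even when the average estimate is close to the truth.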

Endogenous Treatment: Minimally Parametric Exponential Conditional Mean Estimator

           βp = 1          βo = 1          βu = 0.5        βc = 0.25       ATE = 2.22
n          Est    ABP      Est    ABP      Est    ABP      Est    ABP      Est    ABP
5,000      0.995  7.65%    1.002  2.82%    0.504  11.24%   0.247  12.97%   2.201  6.24%
10,000     0.996  5.58%    1.002  1.97%    0.504  8.14%    0.249  9.88%    2.208  4.47%
50,000     1.002  2.38%    1.000  0.90%    0.498  3.55%    0.249  4.07%    2.219  1.91%
100,000    0.998  1.72%    1.000  0.67%    0.501  2.53%    0.250  2.84%    2.212  1.41%


Endogenous Treatment: Full-Information Generalized Gamma Estimator

           βp = 1          βo = 1          βu = 0.5        βc = 0.25
n          Est    ABP      Est    ABP      Est    ABP      Est    ABP
5,000      1.008  2.20%    0.998  0.91%    0.494  2.60%    0.240  8.34%
10,000     1.007  1.62%    0.999  0.67%    0.495  1.80%    0.243  6.12%
50,000     1.006  0.86%    0.999  0.32%    0.496  1.04%    0.243  3.45%
100,000    1.006  0.71%    0.999  0.23%    0.496  0.92%    0.243  3.03%

           ATE = 2.22      κ = 0.8         σ = 0.4
n          Est    ABP      Est    ABP      Est    ABP
5,000      2.226  2.12%    0.773  7.10%    0.406  2.95%
10,000     2.229  1.64%    0.777  5.25%    0.406  2.25%
50,000     2.227  0.86%    0.777  3.21%    0.406  1.53%
100,000    2.223  0.62%    0.778  2.83%    0.405  1.38%


• On average, both estimators recover the true parameter values well.

• There are clear efficiency advantages to using the full-information estimator: percentage biases are low even in small samples.

• In small samples using the minimally parametric estimator, $\beta_u$ appears subject to some bias, but the implications for treatment effect estimation seem minimal.

• In future revisions simulations should draw upon correct standard errors to characterize the

seriousness of these implications in determining (and correcting for) endogeneity bias in small

samples.

Real Data Example

• To provide an empirical demonstration, we applied both estimators above to the birthweight

data from Mullahy (1997), who investigated the role played by maternal cigarette smoking in

determining birthweight.

• Consider birthweight production to be a function of a binary indicator (cig) for whether the mother smoked during pregnancy, other relevant covariates ($X_o$), and any unobservable determinants of birthweight ($X_u$):

$$E[\mathrm{BirthWeight} \mid cig, X_o, X_u] = \exp(cig \cdot \beta_{cig} + X_o\beta_o + X_u\beta_u) \tag{12}$$

in the minimally parametric case, and

$$(\mathrm{BirthWeight} \mid cig, X_o, X_u) \sim \mathrm{GeneralizedGamma}(\kappa,\; \mu = cig \cdot \beta_{cig} + X_o\beta_o + X_u\beta_u,\; \sigma) \tag{13}$$

in the fully parametric case.

Real Data Example (contd)

• The observable vector $X_o$ contains birth order (parity), an indicator for race (white v. nonwhite), an indicator for gender, and a constant.

• The vector of instruments contains parental education, family income, and the per-state cigarette excise tax.

Results

Birthweight Model with Endogenous Treatment Effect

                               Minimally Parametric                Fully Parametric
                               (Exp Cond Mean)                     (Generalized Gamma)
                               Coefficient  T-Statistic  P-Value   Coefficient  T-Statistic  P-Value
Smoked During Pregnancy        -0.17        -3.82        0.00      -0.15        -7.10        0.00
Parity                          0.02         3.06        0.00       0.01         2.81        0.01
White                           0.06         4.65        0.00       0.05         4.19        0.00
Male                            0.02         2.31        0.02       0.02         1.91        0.06
Constant                        1.95       124.33        0.00       1.99       130.90        0.00
Xu                              0.05         2.23        0.03       0.04         5.30        0.00
Effect of Cig on B.Wt. (lbs)   -1.18        -4.26        0.00      -1.03        -7.80        0.00
κ                                                                   0.60         4.77        0.00
σ                                                                   0.16        20.00        0.00

All parameter estimates significant at conventional levels.
Standard errors corrected for multi-step estimation.

Real Data Example (contd)

• Results are broadly consistent between the minimally and fully parametric estimators, although there appear to be some efficiency gains from using maximum likelihood.

• In the minimally parametric case, maternal smoking appears to lead to a loss of 1.18 pounds;

and in the fully parametric case a loss of 1.03 pounds.

• Both are considerably different from a treatment effect estimate using NLS with an

exponential conditional mean that did not correct for endogeneity, which implies an average

drop in birthweight of about 0.57 pounds.

• Estimates of the parameters κ and σ are statistically significant, so use of the generalized gamma does appear to offer an opportunity for a better fit.
