Chapter 2: Variably Parametric Nonlinear Regression with Endogenous Switching



                             James Cunningham


                              (September, 2012)
Introduction

• Most empirical research in health economics (HE) focuses on measurement of policy-relevant
  causal effects: what effect would an exogenously mandated change in the (policy) variable
  have on the outcome of interest?

• HE is replete with nonlinear outcomes: non-negative; count-valued; highly skewed; etc.

• The dissertation as a whole treats practical methods of estimating endogenous treatment
  effects in nonlinear models.

• This paper (Chapter 2) develops some flexible but parametric estimators in the case of
  binary endogenous switching, methods which are either

    o Minimally parametric (requiring specification of the conditional mean)

    o Full information (requiring specification of the conditional density)
Introduction (contd)

• These form the foundation of the dissertation, drawing upon the research of Terza (1998, 2009,
  etc.).

• I demonstrate two estimators:

     o Minimally parametric, with specification of a conditional mean; as an example we use an
       exponential conditional mean with a linear index.

     o Fully parametric, with specification of the conditional density of the outcome; as an
       example we use the three-parameter generalized gamma (Manning et al. [2005]).

• In the sections that follow we introduce the estimation objective (average treatment effect);
  give detail on the estimators; provide a Monte Carlo study of their efficiency properties; and
  apply them to real data.
Estimation Objective: Average Treatment Effect from a Potential Outcomes Perspective

• Consider measurement of the effect of a policy-relevant variable $X_p$ on an outcome $Y$.

• Distinguish between the observed $X_p$ and its exogenously mandated counterpart $X_p^*$, and
  similarly between $Y$ and its potential (possibly counterfactual) value $Y_{X_p^*}$.

• Then the average treatment effect is given by

  $$E[Y_1] - E[Y_0] \tag{1}$$

• Due to the (possibly) counterfactual nature of the random variables $Y_1$ and $Y_0$, (1) cannot be
  estimated directly.
Estimation Objective: Average Treatment Effect (contd)

• But when controlling for a comprehensive set of variables $X_o$ (observed) and $X_u$ (unobserved),
  we can iterate expectations:

  $$\mathrm{ATE} = E[Y_1] - E[Y_0]
     = E_{X_o, X_u}\big[\, E[Y \mid X_p = 1, X_o, X_u] - E[Y \mid X_p = 0, X_o, X_u] \,\big] \tag{2}$$

• When $X_u$ is correlated with $X_p$, ignoring the unobserved $X_u$ will spuriously attribute some of
  its effect to $X_p$.

• We can recover a causal interpretation by formalizing the correlation between $X_p$ and $X_u$, as in

  $$X_p = 1(W\alpha + X_u > 0) \tag{3}$$

  where $1(\cdot)$ is a standard indicator function, $W = [X_o \;\; W^+]$, $W^+$ is a vector of identifying
  instrumental variables, and $(X_u \mid W) \sim N(0, 1)$.
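The spurious attribution described above can be illustrated with a small simulation. This is only a sketch under assumed values: the toy model, the function name `naive_vs_true_ate`, and the coefficients are hypothetical and are not the deck's Monte Carlo design. Treatment follows a switching rule like (3) with a constant index, and the outcome has an exponential conditional mean.

```python
import numpy as np

def naive_vs_true_ate(beta_p=1.0, beta_u=0.5, a=0.5, n=200_000, seed=0):
    """Compare a naive difference in means with the true ATE.

    Hypothetical toy model: X_u ~ N(0, 1), X_p = 1(a + X_u > 0),
    E[Y | X_p, X_u] = exp(X_p * beta_p + X_u * beta_u).
    """
    rng = np.random.default_rng(seed)
    x_u = rng.standard_normal(n)
    x_p = (a + x_u > 0).astype(float)
    # Use the conditional mean itself as the outcome to keep the example noise-free.
    y = np.exp(x_p * beta_p + x_u * beta_u)
    # Naive estimate: ignores X_u and simply compares treated with untreated.
    naive = y[x_p == 1].mean() - y[x_p == 0].mean()
    # True ATE: exp(beta_p + X_u*beta_u) - exp(X_u*beta_u) averaged over X_u ~ N(0, 1).
    true_ate = (np.exp(beta_p) - 1.0) * np.exp(0.5 * beta_u**2)
    return naive, true_ate

naive, true_ate = naive_vs_true_ate()
print(f"naive difference in means: {naive:.3f}, true ATE: {true_ate:.3f}")
```

Because treated units have systematically higher $X_u$ in this toy setup, the naive comparison overstates the effect whenever $\beta_u > 0$.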
Estimation Objective: Average Treatment Effect (contd)

• By iterating expectations, we can then write (2) as

  $$\mathrm{ATE} = E_{X_o}\left[ \int_{-\infty}^{\infty} \big\{ E[Y \mid X_p = 1, X_o, X_u] - E[Y \mid X_p = 0, X_o, X_u] \big\}\, \varphi(X_u)\, dX_u \right] \tag{4}$$

  where $\varphi(\cdot)$ is the standard normal density.

• Then an estimator of (1), through (2) and (3), is

  $$\widehat{\mathrm{ATE}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \int_{-\infty}^{\infty} \big\{ \hat{E}[Y \mid X_p = 1, X_o, X_u] - \hat{E}[Y \mid X_p = 0, X_o, X_u] \big\}\, \varphi(X_u)\, dX_u \right] \tag{5}$$

  where $\hat{E}[\cdot]$ denotes an estimate of an expected value.

• We thus proceed by specifying estimators as if $X_u$ were observed, just one variable among
  others.
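When the inner integral in (5) has no closed form, it can be approximated numerically. The sketch below uses Gauss-Hermite quadrature; this is one possible choice, not necessarily the method used in the dissertation. The helper names `expect_over_xu` and `ate_contribution` and the coefficient values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expect_over_xu(g, n_nodes=40):
    """Approximate E[g(X_u)] for X_u ~ N(0, 1) by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)  # weight function exp(-x^2 / 2)
    return np.sum(weights * g(nodes)) / np.sqrt(2.0 * np.pi)

def ate_contribution(cond_mean, x_o, n_nodes=40):
    """Inner integral of (5) for one observation with covariate value x_o."""
    diff = lambda x_u: cond_mean(1, x_o, x_u) - cond_mean(0, x_o, x_u)
    return expect_over_xu(diff, n_nodes)

# Illustration with an exponential conditional mean and made-up coefficients.
beta_p, beta_o, beta_u = 1.0, 1.0, 0.5
cond_mean = lambda x_p, x_o, x_u: np.exp(x_p * beta_p + x_o * beta_o + x_u * beta_u)

value = ate_contribution(cond_mean, x_o=0.2)
# For this particular mean function the integral is also available in closed form.
closed_form = np.exp(0.2 * beta_o + 0.5 * beta_u**2) * (np.exp(beta_p) - 1.0)
print(value, closed_form)
```

Averaging `ate_contribution` over the observed covariate values would then give the sample analogue in (5).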
Endogenous Treatment Effects in Continuous Nonnegative Models

• Consider the common specification

  $$E[Y \mid X_p, X_o, X_u] = \exp(X_p\beta_p + X_o\beta_o + X_u\beta_u) \tag{6}$$

• After some algebra the treatment effect from (5) can be written

  $$\widehat{\mathrm{ATE}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \exp(X_o\hat\beta_o^+) \big( \exp(\hat\beta_p) - 1 \big) \right] \tag{7}$$

  where $\hat\beta$ denotes an estimate of $\beta$, and $\beta_o^+$ is $\beta_o$ with its constant term shifted by
  $\tfrac{1}{2}\beta_u^2$.

• We consider minimally and fully parametric approaches to the estimation of the parameters
  necessary for (7).
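The algebra behind (7) can be sketched as follows, assuming (as in (3)) that $X_u$ is standard normal with density $\varphi$. The only extra ingredient is the standard normal moment generating function, $E[\exp(tX_u)] = \exp(t^2/2)$. Substituting (6) into the inner integral of (5) gives:

```latex
\begin{align*}
\int_{-\infty}^{\infty} \big\{ \exp(\beta_p + X_o\beta_o + X_u\beta_u)
      - \exp(X_o\beta_o + X_u\beta_u) \big\}\, \varphi(X_u)\, dX_u
  &= \exp(X_o\beta_o)\big(\exp(\beta_p) - 1\big)
     \int_{-\infty}^{\infty} \exp(X_u\beta_u)\, \varphi(X_u)\, dX_u \\
  &= \exp\!\big(X_o\beta_o + \tfrac{1}{2}\beta_u^2\big)\big(\exp(\beta_p) - 1\big) \\
  &= \exp(X_o\beta_o^+)\big(\exp(\beta_p) - 1\big).
\end{align*}
```

Averaging the last line over the sample and replacing the parameters with their estimates yields (7).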
Endogenous Treatment Effects in Continuous Nonnegative Models: Minimally Parametric

• If the conditional mean assumption (6) holds, no further assumption is required (beyond the
  relationship between $X_p$ and $X_u$).

• To derive consistent estimates of the parameters, it can be shown that

  $$E[Y \mid X_p, W] = \exp(X_p\beta_p + X_o\beta_o^+) \left[ x_p \frac{\Phi(\beta_u + w\alpha)}{\Phi(w\alpha)} + (1 - x_p) \frac{1 - \Phi(\beta_u + w\alpha)}{1 - \Phi(w\alpha)} \right] \tag{8}$$

• (8) can be employed in estimation via a two-step procedure: probit in the first stage and
  nonlinear least squares in the second.
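The two-step procedure can be sketched in Python as below. This is a minimal illustration, not the author's implementation: the function names, the toy data design, the gamma-distributed multiplicative error, and the use of SciPy's generic optimizers are all assumptions made for the example. The constant is taken to be the first column of $X_o$, so the second-stage constant estimates the shifted value $\beta_c + \tfrac{1}{2}\beta_u^2$.

```python
import numpy as np
from scipy.optimize import least_squares, minimize
from scipy.stats import norm

def fit_probit(x_p, W):
    """First stage: probit of X_p on W by maximum likelihood."""
    def neg_loglik(alpha):
        index = W @ alpha
        return -np.mean(x_p * norm.logcdf(index) + (1.0 - x_p) * norm.logcdf(-index))
    return minimize(neg_loglik, np.zeros(W.shape[1]), method="BFGS").x

def mean_eq8(theta, x_p, X_o, w_alpha):
    """Right-hand side of (8), with theta = (beta_p, beta_o^+, beta_u)."""
    beta_p, beta_o_plus, beta_u = theta[0], theta[1:-1], theta[-1]
    treated = norm.cdf(beta_u + w_alpha) / norm.cdf(w_alpha)
    untreated = norm.sf(beta_u + w_alpha) / norm.sf(w_alpha)
    return np.exp(x_p * beta_p + X_o @ beta_o_plus) * (x_p * treated + (1.0 - x_p) * untreated)

def two_step(y, x_p, X_o, W):
    """Probit first stage, then nonlinear least squares on (8)."""
    alpha_hat = fit_probit(x_p, W)
    w_alpha = W @ alpha_hat
    start = np.zeros(X_o.shape[1] + 2)
    start[1] = np.log(y.mean())  # the constant is the first column of X_o
    fit = least_squares(lambda t: y - mean_eq8(t, x_p, X_o, w_alpha), start)
    return alpha_hat, fit.x

# Toy data (hypothetical design, chosen only for illustration).
rng = np.random.default_rng(0)
n = 50_000
x_o = rng.uniform(-0.5, 1.0, n)
w_plus = rng.uniform(0.0, 1.0, n)
x_u = rng.standard_normal(n)
X_o = np.column_stack([np.ones(n), x_o])   # constant first
W = np.column_stack([X_o, w_plus])         # W = [X_o, W+]
alpha = np.array([0.5, 1.0, 1.0])
x_p = (W @ alpha + x_u > 0).astype(float)
noise = rng.gamma(shape=10.0, scale=0.1, size=n)  # mean-one multiplicative error
y = np.exp(0.25 + 1.0 * x_p + 1.0 * x_o + 0.5 * x_u) * noise

alpha_hat, theta_hat = two_step(y, x_p, X_o, W)
# Plug the second-stage estimates into the ATE expression (7).
ate_hat = np.mean(np.exp(X_o @ theta_hat[1:-1])) * (np.exp(theta_hat[0]) - 1.0)
print("alpha_hat:", alpha_hat)
print("beta_p, constant^+, beta_o, beta_u:", theta_hat)
print("ATE estimate:", ate_hat)
```

The sketch reports point estimates only; standard errors would need to account for the estimated first stage.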
Endogenous Treatment Effects in Continuous Nonnegative Models: Fully Parametric

• When further assumptions can or must be made, we can consider a full-information version
  of the model above. Letting gg refer to the generalized gamma, assume that

  $$f(Y \mid X_o, X_p, X_u) = gg(Y \mid X; \mu, \kappa, \sigma)
     = \frac{\gamma^{\gamma}}{\sigma Y \sqrt{\gamma}\, \Gamma(\gamma)} \exp\!\big( Z\sqrt{\gamma} - U \big) \tag{9}$$

  where $X = [X_p \;\; X_o \;\; X_u]$, $\mu = X_p\beta_p + X_o\beta_o + X_u\beta_u$, $\gamma = \kappa^{-2}$,
  $Z = \operatorname{sgn}(\kappa)(\log Y - \mu)/\sigma$, and $U = \gamma \exp(|\kappa| Z)$.

• The generalized gamma is highly flexible: it fits the nonnegative, highly skewed outcomes
  common in HE, and subsumes many popular distributions (gamma, Weibull, exponential,
  lognormal).
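The density in (9) can be coded directly. The sketch below assumes the parameterization of Manning et al. (2005), in which $U = \gamma \exp(|\kappa| Z)$ and $\kappa \neq 0$; the function names are hypothetical. As a sanity check, it integrates the density numerically and evaluates the exponential special case $\kappa = \sigma = 1$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def gg_logpdf(y, mu, kappa, sigma):
    """Log of the generalized gamma density in (9); requires kappa != 0."""
    gam = kappa ** -2.0
    z = np.sign(kappa) * (np.log(y) - mu) / sigma
    u = gam * np.exp(np.abs(kappa) * z)
    return (gam * np.log(gam) - np.log(sigma * y * np.sqrt(gam)) - gammaln(gam)
            + z * np.sqrt(gam) - u)

def gg_pdf(y, mu, kappa, sigma):
    """Generalized gamma density, obtained by exponentiating the log density."""
    return np.exp(gg_logpdf(y, mu, kappa, sigma))

# The density should integrate to one over (0, infinity).
total, _ = quad(gg_pdf, 0.0, np.inf, args=(1.0, 0.8, 0.4))
print(total)

# With kappa = sigma = 1 the density reduces to an exponential with mean exp(mu).
print(gg_pdf(2.0, 0.0, 1.0, 1.0), np.exp(-2.0))
```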
Endogenous Treatment Effects in Continuous Nonnegative Models: Fully Parametric

• Further:

  $$E[Y \mid X_p, X_o, X_u] = \exp(\mu + k) \tag{10}$$

  where $k = (\sigma/\kappa)\log(\kappa^2) + \log\Gamma(\kappa^{-2} + \sigma/\kappa) - \log\Gamma(\kappa^{-2})$.

• Thus the average treatment effect estimator takes the form of (7), after adding the correction
  $k$ to the index.

• It can be shown that the log-likelihood is

  $$L(\alpha, \beta, \mu, \kappa, \sigma \mid Y, X_p; W) = \sum_{i=1}^{n} \log\left( X_{pi} \int_{-w_i\alpha}^{\infty} gg(Y_i \mid X_i; \kappa, \mu, \sigma)\, \varphi(X_u)\, dX_u + (1 - X_{pi}) \int_{-\infty}^{-w_i\alpha} gg(Y_i \mid X_i; \kappa, \mu, \sigma)\, \varphi(X_u)\, dX_u \right) \tag{11}$$

• The parameters $\beta$ and $\alpha$ can be jointly estimated via maximum likelihood using (11).
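One observation's contribution to the log-likelihood (11) can be evaluated by numerical integration over $X_u$. The sketch below is an illustration under assumptions, not the author's code: it uses `scipy.integrate.quad` for the integral, takes $\kappa \neq 0$, and the function names and argument layout (`xo_beta_o` for $X_o\beta_o$, `w_alpha` for $w\alpha$) are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import norm

def gg_pdf(y, mu, kappa, sigma):
    """Generalized gamma density in the parameterization of (9); kappa != 0."""
    gam = kappa ** -2.0
    z = np.sign(kappa) * (np.log(y) - mu) / sigma
    with np.errstate(over="ignore"):  # a huge u simply drives the density to zero
        u = gam * np.exp(np.abs(kappa) * z)
    log_f = (gam * np.log(gam) - np.log(sigma * y * np.sqrt(gam)) - gammaln(gam)
             + z * np.sqrt(gam) - u)
    return np.exp(log_f)

def loglik_i(y, x_p, xo_beta_o, w_alpha, beta_p, beta_u, kappa, sigma):
    """Log-likelihood contribution of one observation, integrating out X_u."""
    def integrand(x_u):
        mu = x_p * beta_p + xo_beta_o + x_u * beta_u
        return gg_pdf(y, mu, kappa, sigma) * norm.pdf(x_u)
    # Treated units have X_u > -w*alpha; untreated units have X_u <= -w*alpha.
    lower, upper = (-w_alpha, np.inf) if x_p == 1 else (-np.inf, -w_alpha)
    value, _ = quad(integrand, lower, upper)
    return np.log(value)

# Example evaluation with made-up values.
print(loglik_i(y=3.0, x_p=1, xo_beta_o=0.5, w_alpha=0.7,
               beta_p=1.0, beta_u=0.5, kappa=0.8, sigma=0.4))
```

When $\beta_u = 0$ the integral factors into the density times a probit probability, which gives a simple check. Summing `loglik_i` over observations and passing the negative to a numerical optimizer would be one way to carry out the joint estimation.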
Monte Carlo Simulations

• To evaluate the consistency properties of the above estimators, we undertake a Monte Carlo
  study. In all simulations the data generating process takes the following form:

  $X_o \sim U(-0.5, 1)$, $W \sim U(0, 1)$, $X_u \sim N(0, 1)$

  $X_p = 1(X_o\alpha_o + W\alpha_w + \alpha_c + X_u > 0)$

  $\mu = X_p\beta_p + X_o\beta_o + X_u\beta_u + \beta_c$, $\kappa = 0.8$, $\sigma = 0.4$

  $Y \sim \mathrm{GeneralizedGamma}(\mu, \sigma, \kappa)$

  $[\alpha_o \;\; \alpha_w \;\; \alpha_c] = [1 \;\; 1 \;\; 0.5]$

  $[\beta_p \;\; \beta_o \;\; \beta_u \;\; \beta_c] = [1 \;\; 1 \;\; 0.5 \;\; 0.25]$

• The average treatment effect was estimated with each of the estimators described above.
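The data generating process above can be sketched in Python. The generalized gamma draw is an assumption about implementation: for $\kappa > 0$ and the parameterization of (9), if $G \sim \mathrm{Gamma}(\kappa^{-2}, 1)$ then $Y = \exp\{\mu + (\sigma/\kappa)\log(\kappa^2 G)\}$ has the required density. The function names are hypothetical. The last lines compare the simulated mean of $Y/\exp(\mu)$ with $\exp(k)$, using the log-gamma form of the mean correction $k$ that follows from this representation.

```python
import numpy as np
from scipy.special import gammaln

KAPPA, SIGMA = 0.8, 0.4

def gg_mean_shift(kappa, sigma):
    """Mean correction k, i.e. log E[Y | X] - mu, for kappa > 0."""
    gam = kappa ** -2.0
    return (sigma / kappa) * np.log(kappa ** 2) + gammaln(gam + sigma / kappa) - gammaln(gam)

def simulate(n, seed=0):
    """Draw one sample of size n from the Monte Carlo design."""
    rng = np.random.default_rng(seed)
    alpha_o, alpha_w, alpha_c = 1.0, 1.0, 0.5
    beta_p, beta_o, beta_u, beta_c = 1.0, 1.0, 0.5, 0.25
    x_o = rng.uniform(-0.5, 1.0, n)
    w = rng.uniform(0.0, 1.0, n)
    x_u = rng.standard_normal(n)
    x_p = (x_o * alpha_o + w * alpha_w + alpha_c + x_u > 0).astype(float)
    mu = x_p * beta_p + x_o * beta_o + x_u * beta_u + beta_c
    # Generalized gamma draw via a gamma variate (assumes kappa > 0).
    g = rng.gamma(shape=KAPPA ** -2.0, size=n)
    y = np.exp(mu + (SIGMA / KAPPA) * np.log(KAPPA ** 2 * g))
    return {"y": y, "x_p": x_p, "x_o": x_o, "w": w, "mu": mu}

data = simulate(200_000)
ratio = np.mean(data["y"] / np.exp(data["mu"]))
print("simulated E[Y / exp(mu)]:", ratio)
print("exp(k):", np.exp(gg_mean_shift(KAPPA, SIGMA)))
print("share treated:", data["x_p"].mean())
```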
Monte Carlo Simulations (contd)

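With 500 repetitions at each of the sample sizes 5,000; 10,000; 50,000; and 100,000, we compute the
absolute percentage bias (ABP) for each parameter:

  $$\mathrm{ABP}(\hat\beta) = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{\hat\beta_i - \beta}{\beta} \right|$$

where $m$ is the number of repetitions.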



     Endogenous Treatment: Minimally Parametric Exponential Conditional Mean Estimator
              βp = 1           βo = 1           βu = 0.5         βc = 0.25       ATE = 2.22
n           Est    ABP       Est    ABP       Est      ABP      Est     ABP       Est    ABP
5,000      0.995   7.65%    1.002   2.82%   0.504     11.24%   0.247   12.97%    2.201   6.24%
10,000     0.996   5.58%    1.002   1.97%   0.504     8.14%    0.249   9.88%     2.208   4.47%
50,000     1.002   2.38%    1.000   0.90%   0.498     3.55%    0.249   4.07%     2.219   1.91%
100,000    0.998   1.72%    1.000   0.67%   0.501     2.53%    0.250   2.84%     2.212   1.41%
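The absolute percentage bias reported in these tables can be computed with a few lines of Python. This is a sketch: the function name `abp` and the replication values are made up for illustration.

```python
import numpy as np

def abp(estimates, true_value):
    """Absolute percentage bias: mean over replications of |estimate - truth| / |truth|."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(np.abs(estimates - true_value) / abs(true_value))

# Made-up replication estimates of a parameter whose true value is 1.0.
replications = [0.95, 1.02, 1.10, 0.97]
print(f"{abp(replications, 1.0):.2%}")
```

Because deviations are averaged in absolute value, the ABP can be sizeable even when the average estimate sits very close to the truth, as in the tables here.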
Monte Carlo Simulations (contd)



    Endogenous Treatment: Full-Information Generalized Gamma Estimator
                   βp = 1            βo = 1        βu = 0.5       βc = 0.25
n            Est       ABP     Est       ABP     Est    ABP      Est    ABP
5,000       1.008     2.20%   0.998     0.91%   0.494   2.60%   0.240   8.34%
10,000      1.007     1.62%   0.999     0.67%   0.495   1.80%   0.243   6.12%
50,000      1.006     0.86%   0.999     0.32%   0.496   1.04%   0.243   3.45%
100,000     1.006     0.71%   0.999     0.23%   0.496   0.92%   0.243   3.03%
             ATE = 2.22          κ = 0.8           σ = 0.4
             Est       ABP     Est       ABP     Est    ABP
5,000       2.226     2.12%   0.773     7.10%   0.406   2.95%
10,000      2.229     1.64%   0.777     5.25%   0.406   2.25%
50,000      2.227     0.86%   0.777     3.21%   0.406   1.53%
100,000     2.223     0.62%   0.778     2.83%   0.405   1.38%
Monte Carlo Simulations (contd)

• On average, both estimators recover the true parameter values reasonably well.

• There are clear efficiency advantages to using the full-information estimator: its percentage
  biases are low even in the smallest samples considered.

• In small samples using the minimally parametric estimator, $\beta_u$ appears subject to some bias,
  but the implications for treatment effect estimation seem minimal.

• In future revisions, simulations should draw upon correct standard errors to characterize the
  seriousness of these implications in determining (and correcting for) endogeneity bias in small
  samples.
Real Data Example

• To provide an empirical demonstration, we applied both estimators above to the birthweight
  data from Mullahy (1997), who investigated the role played by maternal cigarette smoking in
  determining birthweight.

• Consider birthweight production to be a function of a binary indicator (cig) for whether the
  mother smoked during pregnancy, other relevant covariates ($X_o$), and any unobservable
  determinants of birthweight ($X_u$):

  $$E[\mathrm{BirthWeight} \mid \mathrm{cig}, X_o, X_u] = \exp(\mathrm{cig} \cdot \beta_{cig} + X_o\beta_o + X_u\beta_u) \tag{12}$$

  in the minimally parametric case, and

  $$(\mathrm{BirthWeight} \mid \mathrm{cig}, X_o, X_u) \sim \mathrm{GeneralizedGamma}(\kappa,\; \mu = \mathrm{cig} \cdot \beta_{cig} + X_o\beta_o + X_u\beta_u,\; \sigma) \tag{13}$$
Real Data Example (contd)

  • The observable vector $X_o$ contains birth order (parity), an indicator for race (white v.
    nonwhite), an indicator for gender, and a constant.

  • The vector of instruments contains parental education, family income, and the per-state
    cigarette excise tax.

Results

                        Birthweight Model with Endogenous Treatment Effect
                                    Minimally Parametric                 Fully Parametric
                                       (Exp Cond Mean)                  (Generalized Gamma)
                              Coefficient  T-Statistic  P-Value   Coefficient  T-Statistic  P-Value
Smoked During Pregnancy             -0.17        -3.82     0.00         -0.15        -7.10     0.00
Parity                               0.02         3.06     0.00          0.01         2.81     0.01
White                                0.06         4.65     0.00          0.05         4.19     0.00
Male                                 0.02         2.31     0.02          0.02         1.91     0.06
Constant                             1.95       124.33     0.00          1.99       130.90     0.00
Xu                                   0.05         2.23     0.03          0.04         5.30     0.00
Effect of Cig on B.Wt. (lbs)        -1.18        -4.26     0.00         -1.03        -7.80     0.00
κ                                                                        0.60         4.77     0.00
σ                                                                        0.16        20.00     0.00

All parameter estimates significant at conventional levels.
Standard errors corrected for multi-step estimation.
Real Data Example (contd)

• Results are broadly consistent between the minimally and fully parametric estimators,
  although there appear to be some efficiency gains from using maximum likelihood.

• In the minimally parametric case, maternal smoking appears to lead to a loss of 1.18 pounds;
  in the fully parametric case, a loss of 1.03 pounds.

• Both are considerably different from a treatment effect estimate using NLS with an
  exponential conditional mean that did not correct for endogeneity, which implies an average
  drop in birthweight of about 0.57 pounds.

• Estimates of parameters $\kappa$ and $\sigma$ are statistically significant, so use of the generalized gamma
  does appear to offer an opportunity for a better fit.