UNIVERSITY OF SOUTH FLORIDA
Discrete Choice Model - Part 2
Dr. Shivendu
Agenda
5/24/2022
Discrete choice models: Multiple Choices
• Multinomial Models
• Ordinal Logit Models
• Censored Regression or Count Data Models: Tobit Models
Quiz 8: Based on Class 9 Readings
Class 9_SAS_Module
Statistical Analysis IV: Non-Parametric Procedures
• Chap 27 and 28 of DAU_SAS
• SAS Assignment 9 posted: Due before class 10
Examples of multinomial choice (polytomous) situations:
1. Choice of a laundry detergent: Tide, Cheer, Arm & Hammer, Wisk, etc.
2. Choice of a major: economics, marketing, management, finance or accounting.
3. Choices after graduating from high school: not going to college, going to a private 4-year college, a public 4-year college, or a 2-year college.
Principles of Econometrics, 3rd Edition
The explanatory variable $x_i$ is individual-specific but does not change across alternatives; for example, the age of the individual.
The dependent variable is nominal.
Key features of multinomial choice situations:
1. There are more than two choices.
2. There is no meaningful ordering to the choices. Otherwise, we would want to use that information (with an ordered probit or ordered logit model).
In essence, this model is like a set of simultaneous binary logistic regressions, with appropriate weighting, since the comparisons between different pairs of categories generally involve different numbers of observations.
$$p_{ij} = P[\text{individual } i \text{ chooses alternative } j]$$

$$p_{i1} = \frac{1}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 1$$

$$p_{i2} = \frac{\exp(\beta_{12} + \beta_{22} x_i)}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 2$$

$$p_{i3} = \frac{\exp(\beta_{13} + \beta_{23} x_i)}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 3$$
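The three formulas can be evaluated directly. A minimal sketch, assuming (as the formulas imply) that alternative 1 is the base whose coefficients are normalized to zero, $\beta_{11} = \beta_{21} = 0$; the function name and parameter values are hypothetical:

```python
import math

def mnl_probs(x, b12, b22, b13, b23):
    """Multinomial logit choice probabilities for J = 3 alternatives.

    Alternative 1 is the base: its coefficients are normalized to zero,
    which is why a 1 (= exp(0)) appears in every denominator.
    """
    e2 = math.exp(b12 + b22 * x)  # exp(beta_12 + beta_22 * x_i)
    e3 = math.exp(b13 + b23 * x)  # exp(beta_13 + beta_23 * x_i)
    denom = 1.0 + e2 + e3
    return 1.0 / denom, e2 / denom, e3 / denom

# Hypothetical coefficients, chosen only for illustration.
p1, p2, p3 = mnl_probs(x=2.0, b12=-1.0, b22=0.5, b13=0.5, b23=-0.25)
print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.3333 0.3333 0.3333
```

With these values both exponents are exactly zero at x = 2, so each alternative gets probability 1/3; at other values of x the probabilities differ but always sum to 1.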
An interesting feature of the odds ratio (16.21) is that the odds of choosing alternative j rather than alternative 1 do not depend on how many alternatives there are in total. There is the implicit assumption in logit models that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA).
$$\frac{p_{ij}}{p_{i1}} = \frac{P(y_i = j)}{P(y_i = 1)} = \exp(\beta_{1j} + \beta_{2j} x_i), \quad j = 2, 3$$

$$\frac{\partial \left( p_{ij} / p_{i1} \right)}{\partial x_i} = \beta_{2j} \exp(\beta_{1j} + \beta_{2j} x_i), \quad j = 2, 3$$
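Both properties can be checked numerically: the odds $p_{i2}/p_{i1}$ equal $\exp(\beta_{12} + \beta_{22} x_i)$ and do not change when a fourth alternative is added, and a finite-difference derivative of the odds matches $\beta_{22}\exp(\beta_{12} + \beta_{22} x_i)$. A small sketch with hypothetical coefficients and a hypothetical helper name:

```python
import math

def mnl_probs_general(x, alphas, betas):
    """Multinomial logit probabilities for any number of alternatives.

    alphas[j], betas[j] play the roles of beta_1j and beta_2j; the base
    alternative (index 0) has both fixed at 0.
    """
    scores = [math.exp(a + b * x) for a, b in zip(alphas, betas)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical coefficients: base alternative first, then j = 2, 3.
alphas = [0.0, 0.4, -0.3]
betas = [0.0, 0.2, 0.7]
x = 1.5

p = mnl_probs_general(x, alphas, betas)
odds_21 = p[1] / p[0]

# Add a fourth alternative with arbitrary coefficients: the odds of 2 vs 1 are unchanged.
p_more = mnl_probs_general(x, alphas + [1.2], betas + [-0.5])
odds_21_with_4 = p_more[1] / p_more[0]

def odds_2_vs_1(xv):
    q = mnl_probs_general(xv, alphas, betas)
    return q[1] / q[0]

# Central finite difference versus the analytic derivative of the odds.
h = 1e-6
numeric = (odds_2_vs_1(x + h) - odds_2_vs_1(x - h)) / (2 * h)
analytic = betas[1] * math.exp(alphas[1] + betas[1] * x)
print(odds_21, odds_21_with_4, numeric, analytic)
```

The unchanged odds after adding an alternative are exactly the IIA property discussed next.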
• There is the implicit assumption in logit models that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA).
One way to state the assumption:
• If choice A is preferred to choice B out of the choice set {A, B}, then introducing a third alternative X, thus expanding that choice set to {A, B, X}, must not make B preferable to A.
• This seems intuitively reasonable.
IIA assumption
In the case of the multinomial logit model, IIA implies that adding another alternative or changing the characteristics of a third alternative must not affect the relative odds between the two alternatives considered.
This is not realistic for many real-life applications involving similar (substitute) alternatives.
IIA assumption
Examples of similar (substitute) alternatives for which IIA is unrealistic:
• Beethoven/Debussy versus another of Beethoven’s Symphonies
(Debreu 1960; Tversky 1972)
• Bicycle/Pony (Luce and Suppes 1965)
• Red Bus/Blue Bus (McFadden 1974).
• Black slacks, jeans, shorts versus blue slacks (Hoffman, 2004)
• Etc.
IIA assumption
Red Bus/Blue Bus (McFadden 1974).
• Imagine commuters first face a decision between two modes of transportation: car and red bus
• Suppose that a consumer chooses between these two options with equal probability, 0.5, so that
the odds ratio equals 1.
• Now add a third mode, blue bus. Assuming bus commuters do not care about the color of the
bus (they are perfect substitutes), consumers are expected to choose between bus and car still
with equal probability, so the probability of car is still 0.5, while the probabilities of each of the
two bus types should go down to 0.25
• However, this violates IIA: the odds of car versus red bus would rise from 1 to 2 (0.5/0.25). For the odds ratio between car and red bus to be preserved at 1, the new probabilities must be: car 1/3, red bus 1/3, blue bus 1/3 (about 0.33 each).
• The IIA axiom does not mix well with perfect substitutes.
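The red bus/blue bus arithmetic can be reproduced with the logit formula. A minimal sketch, assuming hypothetical equal representative utilities for car and red bus (so they split 50/50) and giving the blue bus the same utility as the red bus:

```python
import math

def logit_shares(utilities):
    """Multinomial logit choice probabilities from representative utilities."""
    expu = [math.exp(u) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

car, red_bus = 0.0, 0.0               # hypothetical: equal utilities -> 50/50 split
before = logit_shares([car, red_bus])

blue_bus = red_bus                    # perfect substitute: identical utility
after = logit_shares([car, red_bus, blue_bus])

print("before:", [round(p, 3) for p in before])   # [0.5, 0.5]
print("after: ", [round(p, 3) for p in after])    # [0.333, 0.333, 0.333]
print("car/red-bus odds:", before[0] / before[1], after[0] / after[1])  # 1.0 1.0
```

The logit model keeps the car/red-bus odds at 1 and so pulls the car share down to 1/3, instead of the intuitive 0.5 / 0.25 / 0.25 split.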
IIA assumption
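We can test this assumption with a Hausman-McFadden test, which compares the coefficient estimates from a logit model with all the choices against those from a model with a restricted set of choices; under IIA, dropping an alternative should not change the remaining estimates systematically.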
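A sketch of the Hausman-McFadden statistic, assuming one already has the coefficient vector and covariance matrix, for the coefficients common to both fits, from the restricted-choice model ($b_r$, $V_r$) and the full-choice model ($b_f$, $V_f$). The statistic is $H = (b_r - b_f)'(V_r - V_f)^{-1}(b_r - b_f)$, approximately chi-squared under IIA with degrees of freedom equal to the number of compared coefficients. The numbers below are hypothetical; 5.991 is the 95th percentile of a chi-squared distribution with 2 degrees of freedom.

```python
import numpy as np

def hausman_mcfadden(b_r, V_r, b_f, V_f):
    """H = (b_r - b_f)' [V_r - V_f]^{-1} (b_r - b_f).

    b_r, V_r: coefficients and covariance from the restricted-choice model.
    b_f, V_f: the same coefficients and covariance from the full-choice model.
    """
    d = np.asarray(b_r, dtype=float) - np.asarray(b_f, dtype=float)
    V_diff = np.asarray(V_r, dtype=float) - np.asarray(V_f, dtype=float)
    # solve() avoids forming the matrix inverse explicitly
    return float(d @ np.linalg.solve(V_diff, d))

# Hypothetical estimates for two coefficients common to both models.
b_full = [0.50, -1.20]
V_full = [[0.010, 0.002], [0.002, 0.020]]
b_restricted = [0.62, -1.05]
V_restricted = [[0.020, 0.003], [0.003, 0.035]]

H = hausman_mcfadden(b_restricted, V_restricted, b_full, V_full)
CRIT_5PCT_DF2 = 5.991  # chi-squared(2), 95th percentile
print(round(H, 3), "reject IIA" if H > CRIT_5PCT_DF2 else "do not reject IIA")
```

In practice $V_r - V_f$ may fail to be positive definite, in which case the statistic can come out negative and is hard to interpret.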
Model for three categories
Need k-1 generalized logits to represent a dependent
variable with k categories
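The generalized-logit formulas on this slide did not survive extraction. As a sketch, assuming category 3 is the reference category and a single predictor $x$, with $\pi_j = P(Y = j)$, the model for three categories can be written as:

$$L_1 = \log\left(\frac{\pi_1}{\pi_3}\right) = \beta_{11} + \beta_{21} x, \qquad L_2 = \log\left(\frac{\pi_2}{\pi_3}\right) = \beta_{12} + \beta_{22} x$$

With k categories and reference category k, the same pattern gives the k - 1 logits $L_j = \log(\pi_j / \pi_k)$, $j = 1, \ldots, k-1$.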
Meaning of the regression coefficients
A positive regression coefficient for logit j means that higher
values of the independent variable are associated with
greater chances of response category j, compared to
the reference category.
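Solve for the probabilities
Exponentiating each logit gives each category probability as a multiple of the reference-category probability; together with the constraint that the probabilities sum to 1, this yields three linear equations in 3 unknowns.
In general, solve k equations in k unknowns to obtain the general solution.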
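Assuming category k is the reference and $L_j = \log(\pi_j/\pi_k)$, the equations $\pi_j = \pi_k e^{L_j}$ and $\sum_j \pi_j = 1$ give $\pi_k = 1/(1 + \sum_{l=1}^{k-1} e^{L_l})$ and $\pi_j = e^{L_j}/(1 + \sum_{l=1}^{k-1} e^{L_l})$. A minimal sketch of that solution (helper name and logit values hypothetical):

```python
import math

def probs_from_logits(logits):
    """Recover k category probabilities from k-1 generalized logits.

    logits[j] = log(pi_j / pi_k) for the non-reference categories;
    the last category k is the reference. From pi_j = pi_k * exp(L_j)
    and sum(pi) = 1:  pi_k = 1 / (1 + sum_j exp(L_j)),  pi_j = exp(L_j) * pi_k.
    """
    expl = [math.exp(L) for L in logits]
    pi_k = 1.0 / (1.0 + sum(expl))
    return [e * pi_k for e in expl] + [pi_k]

pi = probs_from_logits([0.4, -0.7])  # hypothetical L_1, L_2 for k = 3
print([round(p, 4) for p in pi], round(sum(pi), 10))
```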
Using the solution, one can:
• Calculate the probability of obtaining the observed data as a function of the regression coefficients, and so get maximum likelihood estimates (beta-hat values).
• From the maximum likelihood estimates, get tests and confidence intervals.
• Using the beta-hat values in $L_j$, estimate probabilities of category membership for any set of x values.
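The maximum likelihood step can be sketched in a few lines of NumPy. This is an illustration on simulated data, not how SAS fits the model: it assumes the last category is the reference and uses plain gradient ascent on the mean log-likelihood (which is concave), whereas statistical packages typically use Newton-type methods. All names and true coefficients are hypothetical.

```python
import numpy as np

def category_probs(X, B):
    """Probabilities for k categories; the last category is the reference.

    X: (n, p) design matrix with an intercept column.
    B: (p, k-1) coefficients; column j gives the logit of category j vs. category k.
    """
    eta = np.column_stack([X @ B, np.zeros(X.shape[0])])  # reference logit fixed at 0
    eta -= eta.max(axis=1, keepdims=True)                  # guard against overflow
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def log_likelihood(X, y, B):
    """Log-likelihood of the observed labels y (0..k-1) given coefficients B."""
    P = category_probs(X, B)
    return np.log(P[np.arange(len(y)), y]).sum()

def fit_mnl(X, y, k, lr=1.0, iters=2000):
    """Maximum likelihood estimates by gradient ascent on the mean log-likelihood."""
    n, p = X.shape
    B = np.zeros((p, k - 1))
    Y = np.eye(k)[y][:, : k - 1]   # indicators of the non-reference categories
    for _ in range(iters):
        P = category_probs(X, B)
        B += lr * X.T @ (Y - P[:, : k - 1]) / n   # score of the log-likelihood
    return B

# Simulated data with hypothetical true coefficients (k = 3 categories).
rng = np.random.default_rng(0)
n, k = 20_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B_true = np.array([[0.5, -0.3],    # intercepts of logits 1 and 2
                   [1.0, -0.8]])   # slopes of logits 1 and 2
P_true = category_probs(X, B_true)
u = rng.random(n)
y = np.minimum((u[:, None] > P_true.cumsum(axis=1)).sum(axis=1), k - 1)

B_hat = fit_mnl(X, y, k)
print(B_hat.round(2))
```

The estimates land close to the true coefficients; plugging `B_hat` into `category_probs` gives the estimated probabilities of category membership for any x.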
Multinomial Logistic Regression in SAS
• For a good example and implementation, see:
• https://stats.idre.ucla.edu/sas/dae/multinomiallogistic-regression/
• For an example and more statistical theory, see:
• https://support.sas.com/resources/papers/proceedings12/427-2012.pdf
IIA assumption
• Extensions have arisen to deal with this issue.
• The multinomial probit and the mixed logit are alternative models for nominal outcomes that relax IIA by allowing correlation among the errors (to reflect similarity among options), but these models have their own problems and assumptions.
• IIA can also be relaxed by specifying a hierarchical model, ranking the choice alternatives. The most popular of these is McFadden's nested logit model, which allows correlation among some errors, but not all (e.g. Heiss 2002).
• Generalized extreme value and multinomial probit models possess another property, the Invariant Proportion of Substitution (Steenburgh 2008), which also suggests similarly counterintuitive individual choice behavior in real life.
• The multinomial probit also has serious computational disadvantages, since it requires evaluating multiple integrals (of dimension one less than the number of categories). Integration by simulation is now ameliorating this problem.
Combining Categories
Consider testing whether two categories could be combined
If none of the independent variables helps explain the odds of choosing A versus B, the two categories should be merged.
Multinomial Logit versus Probit
Computational issues make the multinomial probit very rare in practice.
Its advantage: it does not need IIA.
Example: choice between three types (J = 3) of soft drinks, say Pepsi,
7-Up and Coke Classic.
Let yi1, yi2 and yi3 be dummy variables that indicate the choice made by individual i. The price facing
individual i for brand j is PRICEij.
Variables like price are individual- and alternative-specific, because they vary from individual to individual and are different for each choice the consumer might make.
Another example is the choice of transportation mode, with the time from home to work by train, car, or bus.
For more details and example in implementing conditional logit in SAS, see
https://stats.idre.ucla.edu/sas/faq/how-do-i-do-a-conditional-logit-model-analysis-in-sas-9-1/
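A minimal sketch of conditional logit probabilities for one individual, assuming the specification $p_{ij} = \exp(\beta_{1j} + \beta_2 PRICE_{ij}) / \sum_k \exp(\beta_{1k} + \beta_2 PRICE_{ik})$: alternative-specific intercepts (one normalized to zero) and a single price coefficient shared by all brands. Prices and coefficients are hypothetical. The example also shows IIA at work: cutting Pepsi's price leaves the 7-Up/Coke odds unchanged.

```python
import math

def conditional_logit_probs(prices, intercepts, beta_price):
    """p_ij = exp(b_1j + b_2 * PRICE_ij) / sum_k exp(b_1k + b_2 * PRICE_ik).

    prices: PRICE_ij for one individual i, one entry per alternative.
    intercepts: alternative-specific constants (one normalized to 0).
    beta_price: a single price coefficient shared by all alternatives.
    """
    scores = [math.exp(a + beta_price * p) for a, p in zip(intercepts, prices)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical values for Pepsi, 7-Up and Coke Classic (Coke's constant set to 0).
intercepts = [0.3, -0.2, 0.0]
beta_price = -2.0
base = conditional_logit_probs([1.00, 1.10, 1.20], intercepts, beta_price)
cut = conditional_logit_probs([0.90, 1.10, 1.20], intercepts, beta_price)  # Pepsi price cut

print([round(p, 3) for p in base])
print([round(p, 3) for p in cut])
```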
Logit Models for Ordinal Responses
• Response variable is ordinal (categorical with natural ordering)
• Predictor variable(s) can be numeric or qualitative (dummy
variables)
• Labeling the ordinal categories from 1 (lowest level) to c (highest), we can obtain the cumulative probabilities:
$$P(Y \le j) = P(Y = 1) + \cdots + P(Y = j), \quad j = 1, \ldots, c$$
Logistic Regression for Ordinal Response
• The odds of falling in category j or below:
$$\frac{P(Y \le j)}{1 - P(Y \le j)}, \quad j = 1, \ldots, c-1$$
• Logit (log odds) of cumulative probabilities are modeled as linear functions of predictor variable(s):
$$\operatorname{logit}[P(Y \le j)] = \log\left[\frac{P(Y \le j)}{1 - P(Y \le j)}\right] = \alpha_j + \beta X, \quad j = 1, \ldots, c-1$$
This is called the proportional odds model, and assumes the effect of X is the
same for each cumulative probability
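Under the proportional odds model, the cumulative odds ratio for a one-unit increase in X is $e^{\beta}$ at every cut-point j. A minimal sketch using the $\alpha_j + \beta X$ form from this slide, with hypothetical $\alpha_j$ and $\beta$ and hypothetical helper names:

```python
import math

def prop_odds_probs(x, alphas, beta):
    """Category probabilities under logit[P(Y <= j)] = alpha_j + beta * x.

    alphas must be increasing (alpha_1 < ... < alpha_{c-1}) so that the
    cumulative probabilities increase with j.
    """
    cum = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in alphas] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def cum_odds(x, alphas, beta, j):
    """Odds of Y <= j (j is 1-based and j < c)."""
    c_le = sum(prop_odds_probs(x, alphas, beta)[:j])
    return c_le / (1.0 - c_le)

alphas = [-1.0, 0.5]   # hypothetical cut-points for c = 3 categories
beta = 0.8
for j in (1, 2):
    # the same ratio, exp(beta), at every cut-point
    print(j, cum_odds(1.0, alphas, beta, j) / cum_odds(0.0, alphas, beta, j), math.exp(beta))
```

With this sign convention, $\beta > 0$ raises $P(Y \le j)$ and so shifts probability toward the lower categories.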
Example - Urban Renewal Attitudes
• Response: Attitude toward urban renewal project (Negative (Y=1),
Moderate (Y=2), Positive (Y=3))
• Predictor Variable: Respondent’s Race (White, Nonwhite)
• Contingency Table:
Attitude \ Race     White   Nonwhite
Negative (Y=1)        101        106
Moderate (Y=2)         91        127
Positive (Y=3)        170        190
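The table can be turned into sample cumulative logits, which shows what the proportional odds model is summarizing. A small sketch (the helper name is hypothetical):

```python
import math

table = {  # counts from the contingency table: Negative, Moderate, Positive
    "White": [101, 91, 170],
    "Nonwhite": [106, 127, 190],
}

def cumulative_logits(counts):
    """Sample logit[P(Y <= j)] for j = 1..c-1 from category counts."""
    n = sum(counts)
    out, running = [], 0
    for c in counts[:-1]:
        running += c
        out.append(math.log(running / (n - running)))
    return out

for race, counts in table.items():
    print(race, [round(v, 3) for v in cumulative_logits(counts)])
```

This gives roughly (-0.949, 0.122) for whites and (-1.095, 0.204) for nonwhites. The White minus Nonwhite gap has opposite signs at j = 1 and j = 2, which is consistent with a single common race effect coming out near zero.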
SPSS Output
• Note that SPSS fits the model in the following form:
$$\operatorname{logit}[P(Y \le j)] = \log\left[\frac{P(Y \le j)}{1 - P(Y \le j)}\right] = \alpha_j - \beta X, \quad j = 1, \ldots, c-1$$
[SPSS parameter-estimates table (thresholds for Attitude, location parameters for Race); the values did not survive extraction]
Note that the race variable is not significant (not even close).
Fitted Equation
• The fitted equation for each group/category:
$$\text{Negative/White:} \quad \operatorname{logit}[\hat P(Y \le 1 \mid \text{White})] = -1.027 - (-0.001) = -1.026$$

$$\text{Negative/Nonwhite:} \quad \operatorname{logit}[\hat P(Y \le 1 \mid \text{Nonwhite})] = -1.027 - 0 = -1.027$$

$$\text{Neg or Mod/White:} \quad \operatorname{logit}[\hat P(Y \le 2 \mid \text{White})] = 0.165 - (-0.001) = 0.166$$

$$\text{Neg or Mod/Nonwhite:} \quad \operatorname{logit}[\hat P(Y \le 2 \mid \text{Nonwhite})] = 0.165 - 0 = 0.165$$
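For each group, the fitted probability of falling in that set of categories is e^L/(1 + e^L), where L is the logit value. This gives fitted cumulative probabilities of 0.264 for Negative (both races) and 0.541 for Negative or Moderate (both races).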
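The fitted probabilities can be checked directly. This sketch reads the slide's fitted equations as thresholds -1.027 and 0.165 and a White coefficient of -0.001 (Nonwhite = 0), in SPSS's $\alpha_j - \beta X$ form with X = 1 for White; the variable names are illustrative.

```python
import math

alpha = {1: -1.027, 2: 0.165}   # thresholds read from the fitted equations
beta_spss = -0.001              # SPSS location estimate for White (Nonwhite = 0)

def inv_logit(L):
    """e^L / (1 + e^L)"""
    return 1.0 / (1.0 + math.exp(-L))

fitted = {}
for race, x in [("White", 1), ("Nonwhite", 0)]:
    # SPSS parameterization: logit[P(Y <= j)] = alpha_j - beta * x
    cum = [inv_logit(alpha[j] - beta_spss * x) for j in (1, 2)]
    fitted[race] = cum
    cats = [cum[0], cum[1] - cum[0], 1.0 - cum[1]]   # Negative, Moderate, Positive
    print(race, "cumulative:", [round(c, 3) for c in cum],
          "categories:", [round(c, 3) for c in cats])
```

Differencing the cumulative probabilities gives the individual category probabilities, which are essentially identical for the two races.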
Inference for Regression Coefficients
• If β = 0, the response (Y) is independent of X
• Z-test can be conducted to test this (estimate divided by its standard error)
• Most software will conduct the Wald test, with the statistic being the z-statistic
squared, which has a chi-squared distribution with 1 degree of freedom under
the null hypothesis
• Odds ratio of increasing X by 1 unit and its confidence interval are obtained by
raising e to the power of the regression coefficient and its upper and lower
bounds
Example - Urban Renewal Attitudes
• Z-statistic for testing for race differences: Z = 0.001/0.133 = 0.0075 (recall that SPSS estimates -β)
• Wald statistic: .000 (P-value = .993)
• Estimated odds ratio: e^0.001 = 1.001
• 95% confidence interval: (e^-0.260, e^0.263) = (0.771, 1.301)
• The interval contains 1, so the odds of being in a given category or below are not significantly different for whites and nonwhites
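The arithmetic on this slide can be reproduced from the rounded estimate (0.001) and standard error (0.133). With these rounded inputs the upper confidence limit comes out near 1.299 rather than 1.301, and the two-sided p-value near 0.994 rather than .993; the slide's figures were presumably computed from unrounded output.

```python
import math

est, se = 0.001, 0.133          # rounded values quoted on the slide
z = est / se                    # Z-statistic
wald = z ** 2                   # Wald statistic: chi-squared with 1 df under H0
p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
odds_ratio = math.exp(est)
z_crit = 1.96                   # 97.5th percentile of the standard normal
ci = (math.exp(est - z_crit * se), math.exp(est + z_crit * se))

print(round(z, 4), round(wald, 6), round(p_value, 3), round(odds_ratio, 3),
      [round(v, 3) for v in ci])
```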
Ordinal Predictors
• Creating dummy variables for ordinal categories treats them as if
nominal
• To treat the predictor as ordinal, create a new numeric variable X that scores the levels of the ordinal variable
• Results depend on how the scores are assigned (the simplest form is to let X = 1, ..., c for the categories, which treats the levels as equally spaced)
Censored Regression or Count Data Models:
Tobit Models
INTRODUCTION
• Data on y is censored if for part of the range of y
we observe only that y is in that range, rather than
observing the exact value of y.
e.g. income is top-coded at $75,000 per year.
• Data on y is truncated if for part of the range of y we
do not observe y at all.
e.g. people with income above $75,000 per year are
excluded from the sample.
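The difference between censoring and truncation is easy to see on a toy list of incomes (values hypothetical), using the $75,000 cap from the examples above:

```python
incomes = [32_000, 58_500, 75_000, 91_000, 140_000, 67_250]  # hypothetical data
CAP = 75_000

# Censored (top-coded): everyone stays in the sample, but incomes above the
# cap are recorded as the cap, so we only know they are at least 75,000.
censored = [min(y, CAP) for y in incomes]

# Truncated: people with income above the cap are dropped from the sample.
truncated = [y for y in incomes if y <= CAP]

print(censored)   # [32000, 58500, 75000, 75000, 75000, 67250]
print(truncated)  # [32000, 58500, 75000, 67250]
```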
• Meaningful policy analysis requires extrapolation from
the restricted sample to the population as a whole.
• But running regressions on censored or truncated data,
without controlling for censoring or truncation, leads
to inconsistent parameter estimates.
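A small simulation illustrates the inconsistency for censoring at zero. It assumes a hypothetical latent model y* = 0.5 + 1.0x + ε with x and ε standard normal, censors y* at zero, and runs OLS on both versions. Because x and y* are jointly normal here, the OLS slope on the censored data should settle near β1 · P(y* > 0) ≈ 0.64 instead of 1, no matter how large the sample is.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
beta0, beta1 = 0.5, 1.0                            # hypothetical true parameters
x = rng.normal(size=n)
y_star = beta0 + beta1 * x + rng.normal(size=n)    # latent y*
y = np.maximum(y_star, 0.0)                        # observed y, censored at zero

X = np.column_stack([np.ones(n), x])
b_latent, *_ = np.linalg.lstsq(X, y_star, rcond=None)   # OLS if y* were observed
b_censored, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS ignoring censoring
print("OLS on y*:        ", b_latent.round(3))
print("OLS on censored y:", b_censored.round(3))
```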
• We focus on the normal distribution, with censoring or truncation
at zero.
e.g. annual hours worked, and annual expenditure on
automobiles.
• The class of models presented in this chapter is called
limited dependent variable models or latent variable
models. Econometricians also use the terminology
tobit models or generalized tobit models.
• Censoring can arise for distributions other than the
normal.
• For example, censored count data are treated similarly, except that different distributions are used.
• For duration data, e.g. the length of a spell of unemployment, censoring is treated separately, because the censoring mechanism (random censoring) differs from the one considered here.
Many Models for Censored Data
• Tobit model: MLE, NLS and Heckman 2-step.
• Sample selectivity model, a generalization of Tobit.
• Semiparametric estimation.
• Structural economic models for censored choice.
• Simultaneous equation models
TOBIT MODEL
• Interest lies in a latent dependent variable y*:
$$y^* = X'\beta + \varepsilon$$
• This variable is only partially observed.
• In censored regression we observe
$$y = \begin{cases} y^* & \text{if } y^* > 0 \\ 0 & \text{if } y^* \le 0 \end{cases}$$
• In truncated regression we observe
$$y = y^* \quad \text{if } y^* > 0$$
• For censored and truncated data, linear regression is inappropriate.
• The standard estimators require stochastic assumptions about the distribution of ε and hence y*.
• The Tobit model assumes normality of the error term.
• OLS does not give consistent estimates; we should use maximum likelihood estimation (MLE).
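As a rough illustration of Tobit maximum likelihood (not SAS's implementation), the sketch below maximizes the censored-at-zero log-likelihood with Newton's method and step halving. It uses Olsen's reparameterization γ = β/σ, θ = 1/σ, under which the log-likelihood is concave. Function names and the simulated data are hypothetical.

```python
import math
import numpy as np

# otypes=[float] lets np.vectorize accept empty arrays (e.g. no censored rows)
_erfc = np.vectorize(math.erfc, otypes=[float])

def norm_sf(z):
    """P(Z > z) = Phi(-z) for a standard normal Z."""
    return 0.5 * _erfc(np.asarray(z) / math.sqrt(2.0))

def norm_pdf(z):
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_loglik(params, Xc, Xu, yu):
    """Tobit log-likelihood, censoring at zero, params = (gamma, theta).

    Xc: regressors of censored rows (y = 0); Xu, yu: uncensored rows.
    """
    p = Xc.shape[1]
    gamma, theta = params[:p], params[p]
    ll_c = np.log(norm_sf(Xc @ gamma))             # log P(y* <= 0) = log Phi(-x'b/s)
    r = theta * yu - Xu @ gamma                     # standardized residuals
    ll_u = np.log(theta) - 0.5 * np.log(2.0 * np.pi) - 0.5 * r ** 2
    return ll_c.sum() + ll_u.sum()

def fit_tobit(X, y, max_iter=100, tol=1e-10):
    """Maximum likelihood Tobit fit; returns (beta_hat, sigma_hat)."""
    p = X.shape[1]
    cens = y <= 0
    Xc, Xu, yu = X[cens], X[~cens], y[~cens]
    # Start from OLS on the uncensored rows (biased, but only a starting point).
    b0, *_ = np.linalg.lstsq(Xu, yu, rcond=None)
    s0 = np.std(yu - Xu @ b0)
    params = np.append(b0 / s0, 1.0 / s0)
    ll = tobit_loglik(params, Xc, Xu, yu)
    for _ in range(max_iter):
        gamma, theta = params[:p], params[p]
        zc = Xc @ gamma
        lam = norm_pdf(zc) / norm_sf(zc)            # d/dz log Phi(-z) = -lam
        r = theta * yu - Xu @ gamma
        grad = np.append(-(Xc.T @ lam) + Xu.T @ r, np.sum(1.0 / theta - r * yu))
        H = np.zeros((p + 1, p + 1))
        H[:p, :p] = (Xc * (-lam * (lam - zc))[:, None]).T @ Xc - Xu.T @ Xu
        H[:p, p] = H[p, :p] = Xu.T @ yu
        H[p, p] = -len(yu) / theta ** 2 - np.sum(yu ** 2)
        step = np.linalg.solve(H, grad)             # Newton step: params - H^{-1} grad
        t = 1.0
        for _ in range(50):                         # halve the step until ll improves
            cand = params - t * step
            if cand[p] > 0 and tobit_loglik(cand, Xc, Xu, yu) >= ll:
                break
            t *= 0.5
        else:
            break                                   # no improving step: at the optimum
        new_ll = tobit_loglik(cand, Xc, Xu, yu)
        params, converged = cand, abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    gamma, theta = params[:p], params[p]
    return gamma / theta, 1.0 / theta

rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(X @ np.array([0.5, 1.0]) + rng.normal(size=n), 0.0)  # hypothetical data
beta_hat, sigma_hat = fit_tobit(X, y)
print(beta_hat.round(3), round(float(sigma_hat), 3))
```

Unlike OLS on the same censored data, the MLE recovers estimates close to the hypothetical true values β = (0.5, 1.0) and σ = 1.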
Tobit in SAS
• For example, and using SAS to estimate Tobit regression, see
• https://stats.idre.ucla.edu/sas/dae/tobit-analysis/
Takeaway
• Multinomial and ordered logit models are increasingly used in consumer product choice research as well as in practice.
• Tobit models are useful when the dependent variable is censored (for example, piled up at zero); censored count and duration data call for related models with different distributions. These models are often used only in specific contexts.
Going forward
• No more face-to-face classes. Sessions 10 and 11 will be held only on Teams.
• The final exam is on 4th May from 8:30 am to 11:00 am and will be on Teams through Proctorio.