Discrete Choice Model - Part 2

UNIVERSITY OF SOUTH FLORIDA
Discrete Choice Model
Dr. Shivendu

Agenda
5/24/2022 2
Discrete choice models: Multiple Choices
• Multinomial Models
• Ordinal Logit Models
• Censored Regression or Count Data Models: Tobit Models
Quiz 8: Based on Class 9 Readings
Class 9_SAS_Module
Statistical Analysis IV: Non-Parametric Procedures
• Chap 27 and 28 of DAU_SAS
• SAS Assignment 9 posted: Due before class 10

Examples of multinomial choice (polytomous) situations:
1. Choice of a laundry detergent: Tide, Cheer, Arm & Hammer, Wisk, etc.
2. Choice of a major: economics, marketing, management, finance or accounting.
3. Choices after graduating from high school: not going to college, going to a private 4-year college, a public 4-year college, or a 2-year college.
Principles of Econometrics, 3rd Edition

The explanatory variable x_i is individual-specific but does not change across alternatives, for example the age of the individual.
The dependent variable is nominal.

Key features of multinomial choice situations:
1. There are more than two choices.
2. There is no meaningful ordering of the choices. Otherwise, we would want to use that information (with an ordered probit or ordered logit).

In essence, this model is like a set of simultaneous binary logistic regressions, with appropriate weighting, since the comparisons between different pairs of categories generally involve different numbers of observations.

$$p_{i1} = \frac{1}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 1$$

$$p_{i2} = \frac{\exp(\beta_{12} + \beta_{22} x_i)}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 2$$

$$p_{i3} = \frac{\exp(\beta_{13} + \beta_{23} x_i)}{1 + \exp(\beta_{12} + \beta_{22} x_i) + \exp(\beta_{13} + \beta_{23} x_i)}, \quad j = 3$$

where $p_{ij} = P[\text{individual } i \text{ chooses alternative } j]$.
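As a quick numerical check of the three probability formulas, here is a minimal Python sketch. The coefficient values are hypothetical (not from the slides), and `mnl_probs` is an illustrative helper name, not a library function.

```python
import math


def mnl_probs(x, betas):
    """Multinomial logit probabilities p_i1, ..., p_iJ for one individual.

    betas holds (beta_1j, beta_2j) for j = 2, ..., J; alternative 1 is the
    base, so its coefficients are normalized to zero and exp(0) = 1.
    """
    scores = [0.0] + [b1 + b2 * x for b1, b2 in betas]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)  # 1 + exp(b12 + b22*x) + exp(b13 + b23*x) + ...
    return [e / total for e in exps]


# Hypothetical coefficients (beta_12, beta_22) and (beta_13, beta_23).
probs = mnl_probs(x=1.5, betas=[(0.4, -0.2), (-1.0, 0.6)])
print([round(p, 4) for p in probs])
```

The probabilities always sum to 1 because all three share the same denominator.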


An interesting feature of the odds ratio (16.21) is that the odds of choosing alternative j rather than alternative 1 do not depend on how many alternatives there are in total. There is an implicit assumption in logit models that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA).

$$\frac{p_{ij}}{p_{i1}} = \frac{P(y_i = j)}{P(y_i = 1)} = \exp(\beta_{1j} + \beta_{2j} x_i), \quad j = 2, 3 \qquad (16.21)$$

$$\frac{\partial\,(p_{ij}/p_{i1})}{\partial x_i} = \beta_{2j} \exp(\beta_{1j} + \beta_{2j} x_i), \quad j = 2, 3$$
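The IIA property and the derivative of the odds ratio can be checked numerically. This minimal Python sketch uses hypothetical coefficients, computes p_i2/p_i1 with and without alternative 3 in the choice set, and compares the derivative formula with a central finite difference. The helper names are illustrative only.

```python
import math

# Hypothetical coefficients for alternatives 2 and 3 (alternative 1 is the base).
b12, b22 = 0.4, -0.2
b13, b23 = -1.0, 0.6
x = 1.5


def odds_vs_base(x, b1j, b2j):
    # p_ij / p_i1 = exp(beta_1j + beta_2j * x_i)
    return math.exp(b1j + b2j * x)


def odds_slope(x, b1j, b2j):
    # d(p_ij / p_i1) / dx_i = beta_2j * exp(beta_1j + beta_2j * x_i)
    return b2j * math.exp(b1j + b2j * x)


e2, e3 = odds_vs_base(x, b12, b22), odds_vs_base(x, b13, b23)
ratio_three = (e2 / (1 + e2 + e3)) / (1 / (1 + e2 + e3))  # all three alternatives
ratio_two = (e2 / (1 + e2)) / (1 / (1 + e2))              # alternative 3 removed
print(ratio_three, ratio_two, e2)  # identical: the IIA property

h = 1e-6  # central finite difference as a check on the derivative formula
numeric = (odds_vs_base(x + h, b12, b22) - odds_vs_base(x - h, b12, b22)) / (2 * h)
print(numeric, odds_slope(x, b12, b22))
```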

IIA assumption
• There is an implicit assumption in logit models that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA).
• One way to state the assumption: if choice A is preferred to choice B out of the choice set {A,B}, then introducing a third alternative X, thus expanding that choice set to {A,B,X}, must not make B preferable to A.
• Stated this way, the assumption seems reasonable.

• In the multinomial logit model, IIA implies that adding another alternative, or changing the characteristics of a third alternative, must not affect the relative odds between the two alternatives considered.
• This is not realistic for many real-life applications involving similar (substitute) alternatives.

Examples of IIA failures with similar (substitute) alternatives:
• Beethoven/Debussy versus another of Beethoven’s symphonies (Debreu 1960; Tversky 1972)
• Bicycle/Pony (Luce and Suppes 1965)
• Red Bus/Blue Bus (McFadden 1974)
• Black slacks, jeans, shorts versus blue slacks (Hoffman 2004)
• Etc.

Red Bus/Blue Bus (McFadden 1974)
• Imagine commuters first face a decision between two modes of transportation: car and red bus.
• Suppose that a consumer chooses between these two options with equal probability, 0.5, so that the odds ratio equals 1.
• Now add a third mode, blue bus. Assuming bus commuters do not care about the color of the bus (the two buses are perfect substitutes), consumers are expected to choose between bus and car still with equal probability, so the probability of car is still 0.5, while the probability of each of the two bus types should go down to 0.25.
• However, this intuitive outcome violates IIA, because the car/red bus odds ratio would rise from 1 to 0.5/0.25 = 2. For the odds ratio between car and red bus to stay at 1, the new probabilities must be car 1/3, red bus 1/3, blue bus 1/3 (about 0.33 each).
• The IIA axiom does not mix well with perfect substitutes.
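The red bus/blue bus arithmetic can be reproduced with the plain logit share formula. This minimal Python sketch assumes all modes have the same hypothetical utility of zero; `logit_shares` is an illustrative helper, not a library function.

```python
import math


def logit_shares(utilities):
    # Standard logit choice probabilities: exp(V_j) / sum_k exp(V_k).
    exps = {alt: math.exp(v) for alt, v in utilities.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}


# Equal (hypothetical) utilities: car and red bus get 0.5 each.
before = logit_shares({"car": 0.0, "red bus": 0.0})
# Add a blue bus identical to the red bus: the logit model splits shares three ways.
after = logit_shares({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})
print(before)
print(after)
```

Because the logit model keeps the car/red bus odds fixed at 1, it assigns 1/3 to each mode instead of the intuitive 0.5/0.25/0.25.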

We can test this assumption with a Hausman-McFadden test, which compares the coefficients of a logit model estimated with all the choices to those of a model estimated on a restricted choice set (with one or more alternatives dropped).
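A minimal sketch of the Hausman-McFadden statistic in Python with NumPy, assuming the coefficient vectors and covariance matrices from the two fits are already available (the numbers below are made up, and `hausman_mcfadden` is a hypothetical helper). Only coefficients estimable in both models should be compared. Under IIA both estimators are consistent and the full-set one is efficient, so the statistic is asymptotically chi-squared.

```python
import numpy as np


def hausman_mcfadden(b_restricted, V_restricted, b_full, V_full):
    """Hausman-McFadden statistic for IIA.

    b_restricted, V_restricted: coefficients and covariance matrix from the
    model fitted on the restricted choice set; b_full, V_full: the same
    coefficients from the model fitted on the full choice set. Under IIA the
    statistic is asymptotically chi-squared with df = number of coefficients.
    """
    diff = np.asarray(b_restricted, dtype=float) - np.asarray(b_full, dtype=float)
    v_diff = np.asarray(V_restricted, dtype=float) - np.asarray(V_full, dtype=float)
    stat = float(diff @ np.linalg.solve(v_diff, diff))  # diff' (V_r - V_f)^{-1} diff
    return stat, diff.size


# Hypothetical estimates, for illustration only.
stat, df = hausman_mcfadden([1.0, 2.0], [[0.5, 0.0], [0.0, 0.5]],
                            [0.5, 1.5], [[0.25, 0.0], [0.0, 0.25]])
print(stat, df)
```

Compare the statistic with a chi-squared critical value with `df` degrees of freedom (about 5.99 for df = 2 at the 5% level). In finite samples the difference of covariance matrices may not be positive definite, which can make the statistic negative.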

Model for three categories
Need k-1 generalized logits to represent a dependent
variable with k categories

Meaning of the regression coefficients
A positive regression coefficient for logit j means that higher
values of the independent variable are associated with
greater chances of response category j, compared to
the reference category.

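Solve for the probabilities
Taking category 3 as the reference category, $L_1 = \log(\pi_1/\pi_3)$ and $L_2 = \log(\pi_2/\pi_3)$, so $\pi_1 = \pi_3 e^{L_1}$ and $\pi_2 = \pi_3 e^{L_2}$. Together with $\pi_1 + \pi_2 + \pi_3 = 1$, these are three linear equations in three unknowns.

In general, solve k equations in k unknowns.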
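The solve step has a closed form. A minimal Python sketch, assuming the last category k is the reference (so each logit is L_j = log(pi_j / pi_k)); the logit values are hypothetical and `probs_from_logits` is an illustrative name.

```python
import math


def probs_from_logits(logits):
    """Category probabilities from k-1 generalized logits.

    Each logit is L_j = log(pi_j / pi_k), with the last category k as the
    reference. Exponentiating gives pi_j = pi_k * exp(L_j); adding the
    constraint sum(pi) = 1 pins down pi_k.
    """
    ratios = [math.exp(L) for L in logits]   # pi_j / pi_k for j = 1, ..., k-1
    pi_k = 1.0 / (1.0 + sum(ratios))         # from sum_j pi_j = 1
    return [r * pi_k for r in ratios] + [pi_k]


# Hypothetical linear predictors L_1 and L_2 for a three-category response.
pi = probs_from_logits([0.8, -0.3])
print([round(p, 4) for p in pi])
```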

Using the solution, one can:
• Calculate the probability of obtaining the observed data as a function of the regression coefficients, and get maximum likelihood estimates (beta-hat values).
• From the maximum likelihood estimates, get tests and confidence intervals.
• Using the beta-hat values in Lj, estimate the probabilities of category membership for any set of x values.
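To make the maximum likelihood step concrete, here is a minimal Python sketch for the simplest special case: a model with intercepts only (no x variables), the last category as the reference, and hypothetical counts. In this case the MLE has a closed form, L_j = log(n_j / n_k), so the fitted probabilities equal the observed proportions; with predictors, a numerical optimizer is needed instead.

```python
import math


def log_likelihood(intercepts, counts):
    """Log-likelihood of an intercept-only generalized-logit model.

    intercepts holds L_1, ..., L_{k-1} (last category = reference);
    counts holds the observed number of cases in each of the k categories.
    """
    ratios = [math.exp(a) for a in intercepts] + [1.0]
    total = sum(ratios)
    probs = [r / total for r in ratios]
    return sum(n * math.log(p) for n, p in zip(counts, probs))


counts = [30, 50, 20]  # hypothetical category counts
# With no predictors, the MLE is L_j = log(n_j / n_k).
mle = [math.log(n / counts[-1]) for n in counts[:-1]]
print(mle, log_likelihood(mle, counts))
```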


Multinomial Logistic Regression in SAS
• For a good example and implementation, see:
• https://stats.idre.ucla.edu/sas/dae/multinomiallogistic-regression/
• For an example and more statistical theory, see:
• https://support.sas.com/resources/papers/proceedings12/427-2012.pdf

IIA assumption
Extensions have arisen to deal with the IIA restriction:
• The multinomial probit and the mixed logit are alternative models for nominal outcomes that relax IIA by allowing correlation among the errors (to reflect similarity among options).
• However, these models often have issues and assumptions of their own.
• IIA can also be relaxed by specifying a hierarchical model, ranking the choice alternatives. The most popular of these is McFadden’s nested logit model, which allows correlation among some errors, but not all (e.g. Heiss 2002).
• Generalized extreme value and multinomial probit models possess another property, the Invariant Proportion of Substitution (Steenburgh 2008), which itself also suggests similarly counterintuitive real-life individual choice behavior.
• The multinomial probit also has serious computational disadvantages, since it involves calculating multiple integrals (one fewer than the number of categories). Integration by simulation is now ameliorating this problem.
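To see how a nested logit relaxes IIA in the red bus/blue bus example, here is a minimal Python sketch of the standard two-level nested logit probability formula, with car in its own nest and both buses in a second nest. The utilities and dissimilarity parameters (lambda) are hypothetical; lambda = 1 in every nest reproduces the ordinary logit, and lambda near 0 treats the buses as near-perfect substitutes.

```python
import math


def nested_logit_probs(utilities, nests, lambdas):
    """Nested logit choice probabilities.

    utilities: alternative -> representative utility V_j
    nests:     nest name -> list of alternatives in that nest
    lambdas:   nest name -> dissimilarity parameter (lambda = 1 for every
               nest reduces the model to the ordinary multinomial logit)
    """
    # Sum of exp(V_j / lambda_k) within each nest.
    sums = {k: sum(math.exp(utilities[a] / lambdas[k]) for a in alts)
            for k, alts in nests.items()}
    denom = sum(sums[k] ** lambdas[k] for k in nests)
    probs = {}
    for k, alts in nests.items():
        for a in alts:
            probs[a] = (math.exp(utilities[a] / lambdas[k])
                        * sums[k] ** (lambdas[k] - 1) / denom)
    return probs


V = {"car": 0.0, "red bus": 0.0, "blue bus": 0.0}   # hypothetical equal utilities
nests = {"car": ["car"], "bus": ["red bus", "blue bus"]}
print(nested_logit_probs(V, nests, {"car": 1.0, "bus": 1.0}))   # same as plain logit
print(nested_logit_probs(V, nests, {"car": 1.0, "bus": 0.01}))  # buses nearly identical
```

With lambda = 1 every mode gets 1/3, exactly as in the multinomial logit; as lambda for the bus nest approaches 0, the car share approaches 0.5 and the two buses split the remaining half.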

Combining Categories
Consider testing whether two categories could be combined
If none of the independent variables really explains the odds of choosing A versus B (all slope coefficients in that logit are zero), you should merge the two categories.

Multinomial Logit versus Probit
Computational issues make the multinomial probit very rare in practice.
Advantage: it does not require IIA.

Example: choice between three types (J = 3) of soft drinks, say Pepsi, 7-Up and Coke Classic.
Let y_i1, y_i2 and y_i3 be dummy variables that indicate the choice made by individual i. The price facing individual i for brand j is PRICE_ij.
Variables like price are individual- and alternative-specific, because they vary from individual to individual and are different for each choice the consumer might make.
Another example is transportation mode choice: the time from home to work by train, car, or bus.
For more details and an example of implementing conditional logit in SAS, see
https://stats.idre.ucla.edu/sas/faq/how-do-i-do-a-conditional-logit-model-analysis-in-sas-9-1/
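A minimal Python sketch of conditional logit probabilities for the soft-drink example, assuming the specification p_ij = exp(beta_1j + beta_2 * PRICE_ij) / sum_k exp(beta_1k + beta_2 * PRICE_ik): alternative-specific constants (the last normalized to zero) plus one common price coefficient. All prices and coefficients are hypothetical.

```python
import math


def conditional_logit_probs(prices, alt_constants, beta_price):
    """Conditional logit probabilities for one individual.

    prices[j] is PRICE_ij for brand j, alt_constants[j] is the
    alternative-specific constant beta_1j (the last one normalized to 0),
    and beta_price is the common price coefficient beta_2.
    """
    scores = [b1 + beta_price * p for b1, p in zip(alt_constants, prices)]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prices for Pepsi, 7-Up and Coke Classic, and hypothetical coefficients.
p = conditional_logit_probs(prices=[1.0, 1.25, 1.10],
                            alt_constants=[0.3, -0.2, 0.0],
                            beta_price=-2.0)
print([round(v, 4) for v in p])
```

Unlike the multinomial logit above, the price variable here changes across alternatives, so a single price coefficient is shared by all brands.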

Logit Models for Ordinal Responses
• Response variable is ordinal (categorical with natural ordering)
• Predictor variable(s) can be numeric or qualitative (dummy
variables)
• Labeling the ordinal categories from 1 (lowest level) to c (highest), we can obtain the cumulative probabilities:
$$P(Y \le j) = P(Y = 1) + \cdots + P(Y = j), \quad j = 1, \ldots, c$$

Logistic Regression for Ordinal Response
• The odds of falling in category j or below:
$$\frac{P(Y \le j)}{P(Y > j)}, \quad j = 1, \ldots, c-1 \qquad [\text{note } P(Y \le c) = 1]$$
• The logits (log odds) of the cumulative probabilities are modeled as linear functions of the predictor variable(s):
$$\operatorname{logit}[P(Y \le j)] = \log\left[\frac{P(Y \le j)}{P(Y > j)}\right] = \alpha_j + \beta X, \quad j = 1, \ldots, c-1$$


This is called the proportional odds model. It assumes the effect of X (the single coefficient $\beta$) is the same for each cumulative probability; only the intercepts $\alpha_j$ differ.
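A minimal Python sketch of how the proportional odds model turns cumulative logits into category probabilities, using the alpha_j + beta*X form above. The thresholds, slope and x values are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(alphas, beta, x):
    """Category probabilities from logit[P(Y <= j)] = alpha_j + beta * x.

    alphas must be increasing (alpha_1 < ... < alpha_{c-1}) so that the
    cumulative probabilities are increasing in j.
    """
    cumulative = [logistic(a + beta * x) for a in alphas] + [1.0]  # P(Y <= c) = 1
    # P(Y = j) = P(Y <= j) - P(Y <= j-1), with P(Y <= 0) = 0.
    return [c - prev for prev, c in zip([0.0] + cumulative[:-1], cumulative)]


# Hypothetical thresholds and slope for a three-category response.
probs = proportional_odds_probs(alphas=[-1.0, 0.5], beta=0.8, x=1.0)
print([round(p, 4) for p in probs])
```

Because one beta is shared by every cumulative logit, moving x from 0 to 1 multiplies the odds of Y <= j by exp(beta) for every j. In this parameterization a positive beta shifts probability toward the lower categories.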

Example - Urban Renewal Attitudes
• Response: Attitude toward urban renewal project (Negative (Y=1),
Moderate (Y=2), Positive (Y=3))
• Predictor Variable: Respondent’s Race (White, Nonwhite)
• Contingency Table:
Attitude \ Race     White   Nonwhite
Negative (Y=1)        101        106
Moderate (Y=2)         91        127
Positive (Y=3)        170        190

SPSS Output
• Note that SPSS fits the model in the following form:
$$\operatorname{logit}[P(Y \le j)] = \log\left[\frac{P(Y \le j)}{P(Y > j)}\right] = \alpha_j - \beta X, \quad j = 1, \ldots, c-1$$

[SPSS parameter estimates table (Threshold and Location rows); values not recoverable from the extracted text]
Note that the race variable is not significant, or even close to significant.

Fitted Equation
• The fitted equation for each group/category:
Negative/White: $\operatorname{logit}[P(Y \le 1 \mid \text{White})] = -1.027 - (-0.001) = -1.026$
Negative/Nonwhite: $\operatorname{logit}[P(Y \le 1 \mid \text{Nonwhite})] = -1.027 - 0 = -1.027$
Neg or Mod/White: $\operatorname{logit}[P(Y \le 2 \mid \text{White})] = 0.165 - (-0.001) = 0.166$
Neg or Mod/Nonwhite: $\operatorname{logit}[P(Y \le 2 \mid \text{Nonwhite})] = 0.165 - 0 = 0.165$
For each group, the fitted probability of falling in that set of categories is $e^L/(1+e^L)$, where L is the logit value; the fitted probabilities are (0.264, 0.264, 0.541, 0.541).
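The fitted probabilities can be checked directly from the four fitted logits of the urban renewal example. This minimal Python sketch also prints the pooled observed cumulative proportions from the contingency table for comparison.

```python
import math

# Fitted cumulative logits L from the urban renewal example.
logits = {
    "Negative / White": -1.026,
    "Negative / Nonwhite": -1.027,
    "Neg or Mod / White": 0.166,
    "Neg or Mod / Nonwhite": 0.165,
}
# Fitted probability of falling in that set of categories: e^L / (1 + e^L).
fitted = {group: math.exp(L) / (1 + math.exp(L)) for group, L in logits.items()}
for group, p in fitted.items():
    print(f"{group}: {p:.3f}")

# Pooled observed cumulative proportions from the contingency table.
n_total = 101 + 91 + 170 + 106 + 127 + 190  # 785 respondents
print((101 + 106) / n_total, (101 + 91 + 106 + 127) / n_total)
```

Both sets round to 0.264 and 0.541, as expected when the race coefficient is essentially zero.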

Inference for Regression Coefficients
• If $\beta = 0$, the response (Y) is independent of X.
• A z-test can be conducted to test this (the estimate divided by its standard error).
• Most software conducts the Wald test, whose statistic is the z-statistic squared; it has a chi-squared distribution with 1 degree of freedom under the null hypothesis.
• The odds ratio for increasing X by 1 unit, and its confidence interval, are obtained by raising e to the power of the regression coefficient and of its lower and upper confidence bounds.

Example - Urban Renewal Attitudes
• Z-statistic for testing for race differences: Z = 0.001/0.133 = 0.0075 (recall that the model estimates $-\beta$)
• Wald statistic: .000 (P-value = .993)
• Estimated odds ratio: $e^{.001} = 1.001$
• 95% confidence interval: $(e^{-.260}, e^{.263}) = (0.771, 1.301)$
• The interval contains 1, so there is no evidence that the odds of being in a given category or below differ between whites and nonwhites.
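The inference recipe above can be scripted in a few lines. This minimal Python sketch uses the rounded estimate (0.001) and standard error (0.133) from the slide; `wald_summary` is a hypothetical helper name. Because the inputs are rounded, the p-value and interval may differ slightly in the last digit from the SPSS output.

```python
import math


def wald_summary(estimate, se, z_crit=1.96):
    z = estimate / se
    wald = z ** 2                                  # chi-squared with 1 df under H0
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    odds_ratio = math.exp(estimate)
    ci = (math.exp(estimate - z_crit * se), math.exp(estimate + z_crit * se))
    return z, wald, p_value, odds_ratio, ci


results = wald_summary(0.001, 0.133)
print(results)
```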

Ordinal Predictors
• Creating dummy variables for ordinal categories treats them as if they were nominal.
• To treat a predictor as ordinal, create a new variable X that assigns a numeric score to each level of the ordinal variable.
• Results depend on the scores assigned to the levels (the simplest choice is X = 1, ..., c for the categories, which treats the levels as equally spaced).
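A minimal Python sketch contrasting the two codings for a hypothetical four-level ordinal predictor: c - 1 dummy variables (nominal treatment) versus a single equally spaced score X = 1, ..., c (ordinal treatment). The level names are illustrative.

```python
# Hypothetical ordinal predictor with c = 4 levels, in increasing order.
levels = ["none", "low", "medium", "high"]


def dummy_code(value):
    # Nominal treatment: c - 1 indicator variables, first level as baseline.
    return [1 if value == lvl else 0 for lvl in levels[1:]]


def score_code(value):
    # Ordinal treatment: a single variable X = 1, ..., c (equally spaced scores).
    return levels.index(value) + 1


print(dummy_code("medium"), score_code("medium"))
```

The dummy coding estimates c - 1 separate effects; the score coding estimates one slope, at the cost of assuming equal spacing.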

Censored Regression or Count Data Models:
Tobit Models

INTRODUCTION
• Data on y is censored if for part of the range of y
we observe only that y is in that range, rather than
observing the exact value of y.
e.g. income is top-coded at $75,000 per year.
• Data on y is truncated if for part of the range of y we
do not observe y at all.
e.g. people with income above $75,000 per year are
excluded from the sample.
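A minimal Python sketch of the difference between censoring and truncation, using the $75,000 top-code from the examples and a hypothetical list of true incomes.

```python
TOP_CODE = 75_000  # top-coding threshold from the example

incomes = [42_000, 58_500, 75_000, 91_000, 120_000]  # hypothetical true incomes

# Censored: every person stays in the sample, but values above the limit
# are recorded only at the limit (we know only that y >= 75,000).
censored = [min(y, TOP_CODE) for y in incomes]

# Truncated: people above the limit are not in the sample at all.
truncated = [y for y in incomes if y <= TOP_CODE]

print(censored)
print(truncated)
```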

• Meaningful policy analysis requires extrapolation from the restricted sample to the population as a whole.
• But running regressions on censored or truncated data,
without controlling for censoring or truncation, leads
to inconsistent parameter estimates.
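A small simulation makes the inconsistency concrete. This minimal Python sketch assumes a hypothetical latent model y* = 0.5 + 1.0 x + e with e ~ N(0, 1), censors or truncates y* at zero, and compares simple-regression OLS slopes. The censored and truncated fits should both come out well below the true slope of 1, and a larger sample does not fix this.

```python
import random


def ols_slope(x, y):
    # Simple-regression OLS slope: cov(x, y) / var(x).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
n, beta0, beta1 = 20_000, 0.5, 1.0  # hypothetical true parameters
x = [random.uniform(-2, 2) for _ in range(n)]
y_star = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]  # latent y*
y_cens = [max(0.0, v) for v in y_star]                           # censored at zero
pairs = [(xi, v) for xi, v in zip(x, y_star) if v > 0]           # truncated at zero
x_tr, y_tr = zip(*pairs)

print("latent y*:", round(ols_slope(x, y_star), 3))
print("censored: ", round(ols_slope(x, y_cens), 3))
print("truncated:", round(ols_slope(x_tr, y_tr), 3))
```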

• We focus on the normal distribution, with censoring or truncation at zero, e.g. annual hours worked and annual expenditure on automobiles.
• The class of models presented in this chapter is called
limited dependent variable models or latent variable
models. Econometricians also use the terminology
tobit models or generalized tobit models.

• Censoring can arise for distributions other than the
normal.
• For count data, for example, the treatment is similar to the one here except that different distributions are used.
• For duration data, e.g. the length of a spell of unemployment, censoring is treated separately because the censoring mechanism (random censoring) differs from the one considered here.

Many Models for Censored Data
• Tobit model: MLE, NLS and Heckman 2-step.
• Sample selectivity model, a generalization of Tobit.
• Semiparametric estimation.
• Structural economic models for censored choice.
• Simultaneous equation models

TOBIT MODEL
• Interest lies in a latent dependent variable y*:
$$y^* = X'\beta + \varepsilon$$
• This variable is only partially observed.
• In censored regression we observe
$$y = \begin{cases} y^* & \text{if } y^* > 0 \\ 0 & \text{if } y^* \le 0 \end{cases}$$
• In truncated regression we observe y = y* only if y* > 0.

• For censored and truncated data, linear regression is
inappropriate.

• The standard estimators require stochastic assumptions about the distribution of $\varepsilon$ and hence y*.
• The Tobit model assumes normality of the error term.
• OLS does not give consistent estimates here; we should use MLE.
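A minimal Python sketch of the Tobit log-likelihood for censoring at zero, assuming y* = x'beta + e with e ~ N(0, sigma^2). Censored observations contribute log Phi(-x'beta / sigma) and uncensored ones the normal log density of y; `tobit_loglik` and the sample data are hypothetical.

```python
import math


def norm_cdf(z):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of the Tobit model censored at zero.

    y_i = max(0, x_i'beta + e_i), e_i ~ N(0, sigma^2).
    """
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi <= 0:
            # Censored: log P(y* <= 0) = log Phi(-x'beta / sigma)
            ll += math.log(norm_cdf(-xb / sigma))
        else:
            # Uncensored: normal log density of y_i
            z = (yi - xb) / sigma
            ll += -0.5 * math.log(2 * math.pi) - 0.5 * z * z - math.log(sigma)
    return ll


# Hypothetical data: each row of X is (intercept, x).
X = [[1.0, 0.2], [1.0, -1.5], [1.0, 1.1], [1.0, -0.4]]
y = [0.9, 0.0, 2.3, 0.0]
print(tobit_loglik([0.3, 1.0], 1.0, X, y))
```

In practice, the MLE maximizes this function numerically over beta and sigma, for example by passing its negative to a general-purpose optimizer, or by using a dedicated Tobit routine in statistical software.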

Tobit in SAS
• For an example of using SAS to estimate a Tobit regression, see
• https://stats.idre.ucla.edu/sas/dae/tobit-analysis/

Takeaway
• Multinomial and ordered logit models are increasingly used in consumer product choice research as well as in practice.
• Tobit models are useful when the dependent variable is censored or truncated (for example, at zero); censored count data and duration data call for related treatments. These models are often used in specific contexts only.

Going forward
• No more face-to-face classes. Sessions 10 and 11 will be held only on Teams.
• The final exam is on 4th May from 8.30 am to 11.00 am and will be on Teams through Proctorio.
