2. Outline
• Variable Types
• Review of Binomial Distributions
• Likelihood and Maximum Likelihood Method
• Tests for Binomial Proportions
• Confidence Intervals for Binomial Proportions
3. Variable Types
Regression methods are used to analyze data when the
response variable is numerical.
• e.g., temperature, blood pressure, heights, speeds, income
• Covered in Stat 222 & 224
Methods in categorical data analysis are used when the
response variable is categorical, e.g.,
• gender (male, female),
• political philosophy (liberal, moderate, conservative),
• region (metropolitan, urban, suburban, rural)
• Covered in Stat 226 & 227 (Don’t take both STAT 226 and
227)
In either case, the explanatory variables can be numerical or
categorical.
4. Nominal and Ordinal Categorical Variables
• Nominal: unordered categories, e.g.,
• transport to work (car, bus, bicycle, walk, other)
• favorite music (rock, hip-hop, pop, classical, jazz, country, folk)
• Ordinal: ordered categories
• patient condition (excellent, good, fair, poor)
• government spending (too high, about right, too low)
We pay special attention to binary variables (success or failure),
for which the nominal-ordinal distinction is unimportant.
6. Binomial Distributions (Review)
If n Bernoulli trials are performed:
• only two possible outcomes for each trial (success, failure)
• π = P(success), 1 − π = P(failure), for each trial,
• trials are independent
• Y = number of successes out of n trials
then we say Y has a binomial distribution, denoted as
Y ∼ Binomial (n, π).
The probability function of Y is
P(Y = y) = C(n, y) π^y (1 − π)^(n−y),  y = 0, 1, . . . , n,
where
C(n, y) = n! / (y! (n − y)!)
is the binomial coefficient and
m! = m factorial = m × (m − 1) × (m − 2) × · · · × 1. Note that 0! = 1.
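As a quick check, the factorial formula and R's built-in dbinom() agree (a minimal sketch; n = 3 and π = 0.4 are taken from the example on the next slide, and y = 2 is arbitrary):

# binomial probability by the formula vs. R's built-in dbinom()
n = 3; y = 2; pi = 0.4
choose(n, y) * pi^y * (1 - pi)^(n - y) # 0.288
dbinom(y, size = n, prob = pi)         # 0.288, the same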
7. Example: Are You Comfortable Getting a Covid Booster?
Response (Yes, No). Suppose π = Pr(Yes) = 0.4.
Let y = # answering Yes among n = 3 randomly selected people.
10. Binomial Distribution Facts
If Y is a Binomial (n, π) random variable, then
• E(Y) = nπ
• SD(Y) = σ(Y) = √Var(Y) = √(nπ(1 − π))
• Binomial (n, π) can be approximated by Normal (nπ, nπ(1 − π)) when
n is large (nπ ≥ 5 and n(1 − π) ≥ 5).
[Figure: probability functions P(y) of Binomial(n = 8, π = 0.2) and Binomial(n = 25, π = 0.2).]
12. A Probability Question
Let π be the proportion of US adults that are willing to get an
Omicron booster.
A sample of 5 subjects are randomly selected. Let Y be the
number of them that are willing to get an Omicron booster. What is
P(Y = 3)?
Answer: Y is Binomial (n = 5, π) (Why?)
P(Y = y; π) = (n! / (y! (n − y)!)) π^y (1 − π)^(n−y).
If π is known to be 0.3, then
P(Y = 3; π) = (5! / (3! 2!)) (0.3)^3 (0.7)^2 = 0.1323.
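The same number comes from one line of R:

dbinom(3, size = 5, prob = 0.3) # 0.1323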
13. A Statistics Question
Of course, in practice we don’t know π
and we collect data to estimate it.
How shall we choose a “good” estimator for π?
An estimator is a formula based on the data (a statistic) that we
plan to use to estimate a parameter (π) after we collect the data.
Once the data are collected, we can calculate the value of the
statistic: an estimate for π.
14. A Statistics Question
Suppose 8 of 20 randomly selected U.S. adults said they are
willing to get an Omicron booster.
What can we infer about the value of
π = proportion of U.S. adults that are
comfortable getting a booster?
The chance to observe Y = 8 in a random sample of size n = 20 is
P(Y = 8; π) = C(20, 8) (0.3)^8 (0.7)^12 ≈ 0.1143 if π = 0.3,
P(Y = 8; π) = C(20, 8) (0.6)^8 (0.4)^12 ≈ 0.0354 if π = 0.6.
It appears that 0.3 is more plausible as the value of π than 0.6, since
the former gives a higher probability to the observed outcome y = 8.
We say the likelihood of π = 0.3 is higher than that of π = 0.6.
15. Maximum Likelihood Estimate (MLE)
The maximum likelihood estimate (MLE) of a parameter (like π) is
the value at which the likelihood function is maximized.
Example. If 8 of 20 randomly selected U.S. adults are comfortable
getting the booster, the likelihood function
ℓ(π | y = 8) = C(20, 8) π^8 (1 − π)^12
reaches its max at π = 0.4, so the MLE for π is π̂ = 0.4 given the
data y = 8.
[Figure: the likelihood ℓ(π | y = 8) plotted over 0 ≤ π ≤ 1, peaking at π = 0.4.]
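To see this numerically, one can maximize the likelihood in R (a minimal sketch using optimize(); the peak lands at the MLE 8/20 = 0.4):

# likelihood of pi for y = 8 successes in n = 20 trials
lik = function(pi) dbinom(8, size = 20, prob = pi)
# numerical maximization over (0, 1)
optimize(lik, interval = c(0, 1), maximum = TRUE)$maximum # approx 0.4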
16. Maximum Likelihood Estimate (MLE)
The probability
P(Y = y; π) = C(n, y) π^y (1 − π)^(n−y) = ℓ(π | y),
viewed as a function of π, is called the likelihood function
(or just likelihood) of π, denoted as ℓ(π | y).
It measures the “plausibility” of a value being the true value of π.
[Figure: likelihood functions ℓ(π | y) for y = 0, 2, 8, 14 when n = 20.]
17. [Figures: likelihood functions ℓ(π | y) for y = 0, 2, 8, 14 when n = 20, and for y = 0, 20, 80, 140 when n = 200. With the larger n, each likelihood is much more sharply peaked.]
18. Likelihood in General
In general, suppose the observed data (Y1, Y2, . . . , Yn) have a joint
probability distribution with some parameter(s) called θ
P(Y1 = y1, Y2 = y2, . . . , Yn = yn) = f(y1, y2, . . . , yn | θ)
The likelihood function for the parameter θ is
ℓ(θ | data) = ℓ(θ | y1, y2, . . . , yn) = f(y1, y2, . . . , yn | θ).
• Note the likelihood function regards the probability as a
function of the parameter θ rather than as a function of the
data y1, y2, . . . , yn.
• If
ℓ(θ1 | y1, . . . , yn) > ℓ(θ2 | y1, . . . , yn),
then θ1 appears more plausible to be the true value of θ than
θ2 does, given the observed data y1, . . . , yn.
19. Maximizing the Log-likelihood
Rather than maximizing the likelihood, it is often computationally
easier to maximize its natural logarithm, called the log-likelihood,
log ℓ(π | y)
which results in the same answer since logarithm is strictly
increasing,
x1 > x2 ⇐⇒ log(x1) > log(x2).
So
ℓ(π1 | y) > ℓ(π2 | y) ⇐⇒ log ℓ(π1 | y) > log ℓ(π2 | y).
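A quick grid search in R illustrates this: the likelihood and the log-likelihood peak at the same π (a small sketch for n = 20, y = 8):

pi.grid = seq(0.01, 0.99, by = 0.01)
lik = dbinom(8, size = 20, prob = pi.grid) # likelihood on a grid
pi.grid[which.max(lik)]      # 0.4
pi.grid[which.max(log(lik))] # 0.4, the same maximizer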
20. Example (MLE for Binomial)
If the observed data Y ∼ Binomial (n, π) but π is unknown, the
likelihood of π is
ℓ(π | y) = P(Y = y | π) = C(n, y) π^y (1 − π)^(n−y)
and the log-likelihood is
log ℓ(π | y) = log C(n, y) + y log(π) + (n − y) log(1 − π).
From calculus, we know a function f(x) reaches its max at x = x₀ if
f′(x₀) = 0 and f″(x₀) < 0.
21. Example (MLE for Binomial)
d/dπ log ℓ(π | y) = y/π − (n − y)/(1 − π) = (y − nπ) / (π(1 − π)),
which equals 0 when y − nπ = 0.
Solving for π gives π = y/n, and
d²/dπ² log ℓ(π | y) = −y/π² − (n − y)/(1 − π)² < 0 for any 0 < π < 1.
Thus, we know log ℓ(π | y) reaches its max when π = y/n.
So the MLE of π is
π̂ = y/n = sample proportion of successes.
22. MLEs for Other Inference Problems
• If Y1, Y2, . . . , Yn are i.i.d. N(µ, σ2),
the MLE for µ is the sample mean Ȳ = (Y₁ + · · · + Yₙ)/n.
• In simple linear regression,
Yi = β0 + β1xi + εi
When the errors εi are i.i.d. normal,
the usual least squares estimates for β0 and β1 are the MLEs.
i.i.d. = independent and identically distributed
(the εi are independent and all have the same distribution).
24. Hypothesis Tests of a Binomial Proportion
If the observed data Y ∼ Binomial (n, π), recall the MLE for π is
π̂ = Y/n.
Recall that since Y ∼ Binomial (n, π), the mean and standard
deviation (SD) of Y are respectively,
E[Y] = nπ,  SD(Y) = √(nπ(1 − π)).
The mean and SD of π̂ are thus, respectively,
E(π̂) = E(Y/n) = E(Y)/n = π,
SD(π̂) = SD(Y/n) = SD(Y)/n = √(π(1 − π)/n).
By the CLT, as n gets large,
(π̂ − π) / √(π(1 − π)/n) ∼ N(0, 1) approximately.
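A small simulation illustrates this normal approximation (a sketch; the choices n = 200, π = 0.3, and 10000 replications are arbitrary):

set.seed(1) # for reproducibility
n = 200; pi = 0.3
pi.hat = rbinom(10000, size = n, prob = pi) / n # simulated sample proportions
z = (pi.hat - pi) / sqrt(pi * (1 - pi) / n)     # standardized values
mean(z); sd(z) # close to 0 and 1
hist(z, freq = FALSE, breaks = 40) # roughly the N(0,1) bell shape
curve(dnorm(x), add = TRUE)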
25. Hypothesis Tests for a Binomial Proportion
The textbook lists 3 different tests for testing
H0: π = π0 vs. Ha: π ≠ π0 (or a one-sided alternative).
• Score Test uses the score statistic zs = (π̂ − π0) / √(π0(1 − π0)/n)
• Wald Test uses the Wald statistic zw = (π̂ − π0) / √(π̂(1 − π̂)/n)
• Likelihood Ratio Test: we'll introduce it shortly
As n gets large, both zs and zw are approximately N(0, 1), and both
zs² and zw² are approximately χ²₁, based on which the P-value can
be computed.
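Both statistics are easy to compute directly (a minimal sketch; the helper name prop_z_tests is ours):

# score and Wald statistics with 2-sided P-values (hypothetical helper)
prop_z_tests = function(y, n, pi0) {
  pi.hat = y / n
  zs = (pi.hat - pi0) / sqrt(pi0 * (1 - pi0) / n)       # score
  zw = (pi.hat - pi0) / sqrt(pi.hat * (1 - pi.hat) / n) # Wald
  c(zs = zs, p.score = 2 * pnorm(-abs(zs)),
    zw = zw, p.wald = 2 * pnorm(-abs(zw)))
}
prop_z_tests(695, 1264, 0.6) # reproduces the example on the next slide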
27. Example (Will You Get the COVID-19 Vaccine?)
Pew Research Institute surveyed 12,648 U.S. adults during
Nov. 18-29, 2020 about their intention to be vaccinated for
COVID-19. Among the 1264 respondents in the 18-29 age group,
695 said they would probably or definitely get the vaccine if it’s
available today.
• estimate of π: π̂ = 695/1264 ≈ 0.55
We want to test whether 60% of 18-29 year-olds in the U.S. would
probably or definitely get the vaccine:
H0: π = 0.6 vs. Ha: π ≠ 0.6
• Score statistic zs = (0.55 − 0.6) / √(0.6 × 0.4/1264) ≈ −3.64
• Wald statistic zw = (0.55 − 0.6) / √(0.55 × 0.45/1264) ≈ −3.58
28. Note that the P-values computed using N(0, 1) or χ²₁ are identical.
P-value for the score test
2*pnorm(-3.64)
[1] 0.0002726
pchisq(3.64^2, df=1, lower.tail=F)
[1] 0.0002726
P-value for the Wald test
2*pnorm(-3.58)
[1] 0.0003436
pchisq(3.58^2, df=1, lower.tail=F)
[1] 0.0003436
See slides L01_supp_chisq_table.pdf for more details about
chi-squared distributions.
33. Likelihood Ratio Test (LRT)
Recall the likelihood function for a binomial proportion π is
ℓ(π | y) = C(n, y) π^y (1 − π)^(n−y).
To test H0: π = π0 vs. Ha: π ≠ π0, let
• ℓ0 be the max. likelihood under H0, which is ℓ(π0 | y)
• ℓ1 be the max. likelihood over all possible π, which is ℓ(π̂ | y),
where π̂ = y/n is the MLE of π.
Observe that
• ℓ0 ≤ ℓ1 always
• Under H0, we expect π̂ ≈ π0 and hence ℓ0 ≈ ℓ1.
• ℓ0 ≪ ℓ1 is a sign to reject H0
34. Likelihood Ratio Test Statistic (LRT Statistic)
The likelihood-ratio test statistic (LRT statistic) for testing
H0: π = π0 vs. Ha: π ≠ π0 equals
−2 log(ℓ0/ℓ1).
• Here log is the natural log
• The LRT statistic −2 log(ℓ0/ℓ1) is always nonnegative since ℓ0 ≤ ℓ1
• When n is large, −2 log(ℓ0/ℓ1) ∼ χ²₁ approximately.
• Reject H0 at level α if −2 log(ℓ0/ℓ1) > χ²₁,α = qchisq(1-alpha, df=1)
• P-value = P(χ²₁ > observed LRT statistic)
[Figures: chi-squared curve with df = 1. Left: the critical value χ²₁,α cuts off upper-tail area α. Right: the P-value is the upper-tail area beyond the observed value of the LRT statistic.]
35. Likelihood Ratio Test Statistic for a Binomial Proportion
Recall the likelihood function for a binomial proportion π is
ℓ(π | y) = C(n, y) π^y (1 − π)^(n−y).
Thus
ℓ0/ℓ1 = [C(n, y) π0^y (1 − π0)^(n−y)] / [C(n, y) (y/n)^y (1 − y/n)^(n−y)]
      = (nπ0/y)^y (n(1 − π0)/(n − y))^(n−y)
and hence the LRT statistic is
−2 log(ℓ0/ℓ1) = 2y log(y/(nπ0)) + 2(n − y) log((n − y)/(n(1 − π0)))
              = 2 { Oyes × log(Oyes/Eyes) + Ono × log(Ono/Eno) }
where Oyes = y and Ono = n − y are the observed counts of yes and
no, and Eyes = nπ0 and Eno = n(1 − π0) are the expected counts of
yes and no under H0.
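This formula translates directly into R (a minimal sketch; the helper name lrt_stat is ours, and it assumes 0 < y < n so both logs are defined):

# LRT statistic for H0: pi = pi0 (hypothetical helper; requires 0 < y < n)
lrt_stat = function(y, n, pi0) {
  O = c(y, n - y)               # observed counts of yes and no
  E = c(n * pi0, n * (1 - pi0)) # expected counts under H0
  2 * sum(O * log(O / E))
}
lrt_stat(695, 1264, 0.6) # approx 13.09, as on the next slide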
36. Example (COVID-19 , Cont’d)
Among the 1264 respondents in the 18-29 age group, 695
answered “yes” and 569 answered “no”, so
Oyes = y = 695,  Ono = n − y = 569.
Under H0: π = 0.6, we expect 60% of the 1264 subjects to answer
“yes” and 40% to answer “no” (don't round nπ0 and n(1 − π0) to
integers):
Eyes = nπ0 = 1264 × 0.6 = 758.4,
Eno = n(1 − π0) = 1264 × 0.4 = 505.6.
LRT statistic = 2 [695 log(695/758.4) + 569 log(569/505.6)] ≈ 13.09,
which exceeds the critical value χ²₁,α = χ²₁,0.05 = 3.84 at α = 0.05,
and hence H0 is rejected at the 5% level.
qchisq(1-0.05, df=1)
[1] 3.841
37. P-value of the LRT Test of Proportions
Even though Ha is two-sided, the P-value remains the upper-tail
probability below, since a large deviation of π̂ = y/n from π0,
whether π0 < π̂ or π0 > π̂, leads to a large LRT statistic.
[Figure: chi-squared curve with df = 1; P-value = shaded upper-tail area beyond the observed value of the LRT statistic.]
For the COVID-19 example, the P-value is P(χ²₁ > 13.09), which is
pchisq(13.09, df=1, lower.tail=F)
[1] 0.0002969
44. Duality of Confidence Intervals and Significance Tests
For a 2-sided test of θ, the dual 100(1 − α)% confidence interval
(CI) for the parameter θ consists of all those θ∗ values for which a
two-sided test of H0: θ = θ∗ is not rejected at level α. E.g.,
• the dual 90% Wald CI for π is the collection of all π0 such that
a 2-sided Wald test of H0: π = π0 has a P-value > 10%
• the dual 95% score CI for π is the collection of all π0 such that
a 2-sided score test of H0: π = π0 has a P-value > 5%
E.g., if the 2-sided P-value for testing H0: π = 0.2 is 6%, then
• 0.2 is in the 95% CI
• The corresponding α for a 95% CI is 5%. As P-value = 6% > α = 5%,
H0: π = 0.2 is not rejected, so 0.2 is in the 95% CI.
• but 0.2 is NOT in the 90% CI
• The corresponding α for a 90% CI is 10%. As P-value = 6% < α = 10%,
H0: π = 0.2 is rejected, so 0.2 is NOT in the 90% CI.
45. Wald Confidence Intervals (Wald CIs)
For a Wald test, H0: π = π∗ is not rejected at level α if
|π̂ − π∗| / √(π̂(1 − π̂)/n) ≤ zα/2,
so a 100(1 − α)% Wald CI is
( π̂ − zα/2 √(π̂(1 − π̂)/n),  π̂ + zα/2 √(π̂(1 − π̂)/n) ),
where
confidence level 100(1 − α)%:  90%    95%   99%
zα/2:                          1.645  1.96  2.576
• Introduced in STAT 220 and 234
Drawbacks:
• The Wald CI for π collapses (to a single point) whenever π̂ = 0 or 1.
• The actual coverage prob. of the Wald CI is usually much less than
100(1 − α)% if π is close to 0 or 1, unless n is quite large.
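As a sketch, the Wald CI is a few lines of R (the helper name wald_ci is ours):

# Wald CI for a binomial proportion (hypothetical helper)
wald_ci = function(y, n, conf = 0.95) {
  p = y / n
  z = qnorm(1 - (1 - conf) / 2) # e.g., 1.96 for 95%
  p + c(-1, 1) * z * sqrt(p * (1 - p) / n)
}
wald_ci(4, 400) # approx (0.00025, 0.01975), as in the example later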
46. Score Confidence Intervals (Score CIs)
For a score test, H0: π = π∗ is not rejected at level α if
|π̂ − π∗| / √(π∗(1 − π∗)/n) ≤ zα/2.
A 100(1 − α)% score confidence interval consists of those π∗
satisfying the inequality above.
Example. If π̂ = 0, the 95% score CI consists of those π∗ satisfying
|0 − π∗| / √(π∗(1 − π∗)/n) ≤ 1.96.
After a few steps of algebra, we can show such π∗'s are those
satisfying 0 ≤ π∗ ≤ 1.96²/(n + 1.96²). The 95% score CI for π when
π̂ = 0 is thus
( 0,  1.96²/(n + 1.96²) ),
which does NOT collapse!
47. Score CI (Cont’d)
The end points of the score CI can be shown to be
[ (y + z²/2) ± zα/2 √( nπ̂(1 − π̂) + z²/4 ) ] / (n + z²),  where z = zα/2.
• The midpoint of the score CI, (π̂ + z²/(2n)) / (1 + z²/n), lies
between π̂ and 0.5.
• The score CI is better than the Wald CI in that its actual coverage
probabilities are closer to the nominal levels.
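A sketch of these closed-form end points in R (the helper name score_ci is ours); it reproduces the “wilson” interval from binom.confint() shown later:

# score (Wilson) CI from the closed-form end points (hypothetical helper)
score_ci = function(y, n, conf = 0.95) {
  p = y / n
  z = qnorm(1 - (1 - conf) / 2)
  center = (y + z^2 / 2) / (n + z^2)                     # midpoint
  half = z * sqrt(n * p * (1 - p) + z^2 / 4) / (n + z^2) # half-width
  center + c(-1, 1) * half
}
score_ci(4, 400) # approx (0.0039, 0.0254)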
48. Agresti-Coull Confidence Intervals
Recall the midpoint of a 100(1 − α)% score CI is
π̃ = (y + z²/2) / (n + z²),  where z = zα/2,
which looks as if we add z²/2 more successes and z²/2 more
failures to the data before we estimate π.
This inspires the Agresti-Coull 100(1 − α)% confidence interval:
π̃ ± z √( π̃(1 − π̃) / (n + z²) ),  where π̃ = (y + z²/2)/(n + z²) and z = zα/2,
which is essentially a Wald-type interval after adding z²/2 more
successes and z²/2 more failures to the data.
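The formula is again a few lines of R (the helper name ac_ci is ours):

# Agresti-Coull CI (hypothetical helper)
ac_ci = function(y, n, conf = 0.95) {
  z = qnorm(1 - (1 - conf) / 2)
  p.tilde = (y + z^2 / 2) / (n + z^2) # shrunken estimate
  p.tilde + c(-1, 1) * z * sqrt(p.tilde * (1 - p.tilde) / (n + z^2))
}
ac_ci(4, 400) # approx (0.00294, 0.02638)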
51. 95% “Plus-Four” Confidence Intervals
At the 95% level, zα/2 = z0.025 = 1.96, so the midpoint of the
Agresti-Coull CI is
(y + (zα/2)²/2) / (n + (zα/2)²) = (y + 1.96²/2) / (n + 1.96²) ≈ (y + 2)/(n + 4).
Hence some approximate the 95% Agresti-Coull correction to the
Wald CI by adding 2 successes and 2 failures before computing π̂,
and then compute the Wald CI:
π̂∗ ± 1.96 √( π̂∗(1 − π̂∗)/(n + 4) ),  where π̂∗ = (y + 2)/(n + 4).
• This is the so-called “Plus-Four” confidence interval
• Note the “Plus-Four” CI is for the 95% confidence level only
• At the 90% level, zα/2 = z0.05 = 1.645, so the Agresti-Coull CI
would add (zα/2)²/2 = 1.645²/2 ≈ 1.35 more successes and 1.35
more failures.
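For the party-affiliation example coming up (y = 4, n = 400), the plus-four and Agresti-Coull intervals are nearly identical, as this sketch shows:

y = 4; n = 400; z = 1.96
p.ac = (y + z^2/2) / (n + z^2) # Agresti-Coull midpoint, approx 0.01466
p.p4 = (y + 2) / (n + 4)       # plus-four midpoint, approx 0.01485
p.ac + c(-1, 1) * z * sqrt(p.ac * (1 - p.ac) / (n + z^2)) # approx (0.0029, 0.0264)
p.p4 + c(-1, 1) * z * sqrt(p.p4 * (1 - p.p4) / (n + 4))   # approx (0.0031, 0.0266)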
52. Likelihood Ratio Confidence Intervals (LR CIs)
An LR test will not reject H0: π = π∗ at level α if
−2 log(ℓ0/ℓ1) = −2 log( ℓ(π∗ | y) / ℓ(π̂ | y) ) ≤ χ²₁,α.
A 100(1 − α)% likelihood ratio CI consists of those π∗ with likelihood
ℓ(π∗ | y) ≥ e^(−χ²₁,α / 2) ℓ(π̂ | y).
E.g., the 95% LR CI contains those π∗ whose likelihood is above
e^(−χ²₁,0.05 / 2) = e^(−3.84/2) ≈ 0.147 times the max. likelihood.
[Figure: likelihood ℓ(π | y) for n = 20, y = 8; the 95% LR CI is the range of π over which the likelihood stays above 0.147 times its maximum.]
• No closed-form expression for the end points of a LR CI
• Can use software to find the end points numerically
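For instance, for n = 20 and y = 8, the end points can be found with uniroot() (a sketch; the search brackets on either side of the MLE 0.4 are our choice):

y = 8; n = 20
lrt = function(pi0) 2*y*log(y/(n*pi0)) + 2*(n-y)*log((n-y)/(n*(1-pi0)))
crit = qchisq(0.95, df = 1) # 3.841
uniroot(function(p) lrt(p) - crit, c(0.01, y/n))$root # lower end, approx 0.21
uniroot(function(p) lrt(p) - crit, c(y/n, 0.99))$root # upper end, approx 0.62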
53. Likelihood Ratio Confidence Intervals Do Not Collapse at 0
Recall the LRT statistic for testing H0: π = π0 against Ha: π ≠ π0 is
−2 log(ℓ0/ℓ1) = 2y log( y/(nπ0) ) + 2(n − y) log( (n − y)/(n(1 − π0)) )
and H0: π = π0 is rejected if −2 log(ℓ0/ℓ1) > χ²₁,α. Hence the
100(1 − α)% LR confidence interval consists of those π0 satisfying
2y log( y/(nπ0) ) + 2(n − y) log( (n − y)/(n(1 − π0)) ) ≤ χ²₁,α.
In particular, when y = 0, the first term vanishes and the second
becomes −2n log(1 − π0), so the 95% LR CI consists of those π0
satisfying
−2n log(1 − π0) ≤ χ²₁,0.05 = 3.84.
That is, (0, 1 − e^(−3.84/(2n))), which does NOT collapse, either!
54. Example (Political Party Affiliation)
A survey about the political party affiliation of residents in a town
found 4 of 400 in the sample to be Independents.
We want a 95% CI for π = proportion of Independents in the town.
• estimate of π: π̂ = 4/400 = 0.01
• Wald CI: 0.01 ± 1.96 √(0.01 × (1 − 0.01)/400) ≈ (0.00025, 0.01975)
• 95% score CI: contains those π∗ satisfying
|0.01 − π∗| / √(π∗(1 − π∗)/400) ≤ 1.96,
which is the interval (0.0039, 0.0254)
• 95% Agresti-Coull CI: add z²/2 = 1.96²/2 ≈ 1.92 successes and
1.92 failures. The estimate of π becomes (4 + 1.92)/(400 + 3.84) ≈ 0.01466,
and the CI is
0.01466 ± 1.96 √(0.01466 × (1 − 0.01466)/403.84) ≈ (0.00294, 0.02638)
55. R Function “prop.test()” for Score Test and CI
The R function prop.test() performs the score test and produces
the score CI.
• It tests H0: π = 0.5 vs. Ha: π ≠ 0.5 by default
• It uses a continuity correction by default.
prop.test(4,400)
1-sample proportions test with continuity correction
data: 4 out of 400, null probability 0.5
X-squared = 382, df = 1, p-value < 2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.003208 0.027187
sample estimates:
p
0.01
56. R Function “prop.test()” for Score Test and CI
To perform a score test of H0: π = 0.02 vs Ha: π , 0.02 without the
continuity correction . . .
prop.test(4,400, p=0.02, correct=F)
1-sample proportions test without continuity correction
data: 4 out of 400, null probability 0.02
X-squared = 2, df = 1, p-value = 0.2
alternative hypothesis: true p is not equal to 0.02
95 percent confidence interval:
0.003895 0.025427
sample estimates:
p
0.01
The 95% CI matches the score CI computed earlier.
57. R function for Other CIs of Binomial Proportions
The function binom.confint() in the package binom can
produce confidence intervals by several methods.
You need to install the binom package only once. To check whether
it is already installed on your computer, run
library(binom)
If you get an error message,
# Error in library(binom) : there is no package called ‘binom’
the binom package is not installed. You can install it with the
following command:
install.packages("binom")
58. Now one can use binom.confint() to find the CIs.
# Wald CI
binom.confint(4, 400, conf.level = 0.95, method = "asymptotic")
method x n mean lower upper
1 asymptotic 4 400 0.01 0.0002493 0.01975
# Score CI, also called "Wilson"
binom.confint(4, 400, conf.level = 0.95, method = "wilson")
method x n mean lower upper
1 wilson 4 400 0.01 0.003895 0.02543
# Agresti-Coull CI
binom.confint(4, 400, conf.level = 0.95, method = "ac")
method x n mean lower upper
1 agresti-coull 4 400 0.01 0.002939 0.02638
# Likelihood ratio CI
binom.confint(4, 400, conf.level = 0.95, method = "lrt")
method x n mean lower upper
1 lrt 4 400 0.01 0.003136 0.02308
59. Example (Political Party Affiliation) LR CI
Recall the 95% LR confidence interval consists of those π0
satisfying
2y log( y/(nπ0) ) + 2(n − y) log( (n − y)/(n(1 − π0)) ) ≤ χ²₁,0.05 = 3.8415.
To verify the LR confidence interval (0.003135542, 0.02307655)
given by binom.confint(), let's plug the end points into the LRT
statistic above and see if we obtain 3.8415:
y = 4
n = 400
pi0 = c(0.003135542, 0.02307655)
2*y*log(y/n/pi0) + 2*(n-y)*log((n-y)/n/(1-pi0))
[1] 3.806 3.841
pi0 = c(0.003115255, 0.02307735)
2*y*log(y/n/pi0) + 2*(n-y)*log((n-y)/n/(1-pi0))
[1] 3.841 3.841
The lower end point reported by binom.confint() is slightly off;
nudging it to 0.003115255 makes both end points hit 3.841.
63. Comparison of Wald, Score, Agresti-Coull, and LRT CIs
[Figure: the four types of 95% CIs (Wald, Score, Agresti-Coull, LRT) plotted for each y = 0, 1, . . . , 12 when n = 12.]
• End points of Score, Agresti-Coull, and LRT CIs are generally
closer to 0.5 than those of the Wald CIs
• End points of Wald and Agresti-Coull CIs may fall outside of
[0, 1], while those of Score and LRT CIs always fall between
0 and 1
• Agresti-Coull CIs always contain the Score CIs
• Score CIs are narrower than Wald CIs unless y/n is close to
0 or 1.
71. True Confidence Levels of Various CIs
• How are true confidence levels computed? Why do the curves
look jumpy? See HW2.
• Wald CIs tend to be farthest below the 0.95 level. In fact, the
true level can be as low as 0 when π is close to 0 or 1
• Score CIs are closer to the 0.95 level, though they may fall
below 0.95 when π is close to 0 or 1
• Agresti-Coull CIs are usually conservative (true levels are
above 0.95), especially when π is close to 0 or 1
• LRT CIs are better than Wald CIs but generally not as good as
Score or Agresti-Coull CIs
• As n gets larger, all 4 types of intervals become closer to
the 0.95 level, though Wald CIs remain poor when π is close
to 0 or 1
72. How To Compute the True Confidence Levels? (1)
Consider the true confidence level of the 95% Wald CI when n = 12
and π = 0.1, i.e., the probability that the 95% Wald confidence
interval (Wald CI) below
( π̂ − 1.96 √(π̂(1 − π̂)/n),  π̂ + 1.96 √(π̂(1 − π̂)/n) ),  where π̂ = y/n,
contains π = 0.1 when y ∼ Binomial(n = 12, π = 0.1).
If y has a Binomial(n = 12, π = 0.1) distribution, the possible values
of y are the integers 0, 1, 2, . . . , 12.
We can calculate the corresponding Wald CI for each possible
value of y on the next page.
See also: https://yibi-huang.shinyapps.io/shiny/
74.
n = 12
y = 0:n
p = y/n
CI.lower = p - 1.96*sqrt(p*(1-p)/n)
CI.upper = p + 1.96*sqrt(p*(1-p)/n)
data.frame(y, CI.lower, CI.upper)
y CI.lower CI.upper
1 0 0.00000 0.0000
2 1 -0.07305 0.2397
3 2 -0.04420 0.3775
4 3 0.00500 0.4950
5 4 0.06661 0.6001
6 5 0.13772 0.6956
7 6 0.21710 0.7829
8 7 0.30439 0.8623
9 8 0.39994 0.9334
10 9 0.50500 0.9950
11 10 0.62247 1.0442
12 11 0.76029 1.0730
13 12 1.00000 1.0000
Which of the Wald intervals contain
π = 0.1?
Only the CIs for y = 1, 2, 3, 4.
75. When y ∼ Binomial(n = 12, π = 0.1),
P(95% Wald CI contains π = 0.1)
= P(y = 1) + P(y = 2) + P(y = 3) + P(y = 4)
= C(12,1)(0.1)^1(0.9)^11 + C(12,2)(0.1)^2(0.9)^10
  + C(12,3)(0.1)^3(0.9)^9 + C(12,4)(0.1)^4(0.9)^8.
The four Binomial probabilities above can be found using
dbinom(1:4, size = 12, p=0.1)
[1] 0.37657 0.23013 0.08523 0.02131
and hence their total is
sum(dbinom(1:4, size = 12, p=0.1))
[1] 0.7132
The true confidence level of the 95% Wald CI here is thus just 71%,
far below the nominal 95% level.
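The same calculation can be wrapped into a function of π, which is how the true-confidence-level curves on the earlier slides can be traced out (a sketch; the helper name wald_coverage is ours):

# true coverage probability of the 95% Wald CI (hypothetical helper)
wald_coverage = function(n, pi) {
  y = 0:n
  p = y / n
  lower = p - 1.96 * sqrt(p * (1 - p) / n)
  upper = p + 1.96 * sqrt(p * (1 - p) / n)
  covers = (lower < pi) & (pi < upper) # which CIs contain the true pi
  sum(dbinom(y[covers], size = n, prob = pi))
}
wald_coverage(12, 0.1) # 0.7132, as computed above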