STAT 226 Lecture 1 & 2
Yibi Huang
Outline
• Variable Types
• Review of Binomial Distributions
• Likelihood and Maximum Likelihood Method
• Tests for Binomial Proportions
• Confidence Intervals for Binomial Proportions
Variable Types
Regression methods are used to analyze data when the
response variable is numerical.
• e.g., temperature, blood pressure, heights, speeds, income
• Covered in Stat 222 & 224
Methods in categorical data analysis are used when the
response variable is categorical, e.g.,
• gender (male, female),
• political philosophy (liberal, moderate, conservative),
• region (metropolitan, urban, suburban, rural)
• Covered in Stat 226 & 227 (Don’t take both STAT 226 and
227)
In either case, the explanatory variables can be numerical or
categorical.
Nominal and Ordinal Categorical Variables
• Nominal: unordered categories, e.g.,
• transport to work (car, bus, bicycle, walk, other)
• favorite music (rock, hiphop, pop, classical, jazz, country, folk)
• Ordinal: ordered categories
• patient condition (excellent, good, fair, poor)
• government spending (too high, about right, too low)
We pay special attention to binary variables (success or failure),
for which the nominal-ordinal distinction is unimportant.
Review of Binomial Distributions
Binomial Distributions (Review)
If n Bernoulli trials are performed:
• only two possible outcomes for each trial (success, failure)
• π = P(success), 1 − π = P(failure), for each trial,
• trials are independent
• Y = number of successes out of n trials
then we say Y has a binomial distribution, denoted as
Y ∼ Binomial (n, π).
The probability function of Y is
$$P(Y = y) = \binom{n}{y} \pi^y (1-\pi)^{n-y}, \quad y = 0, 1, \ldots, n,$$
where
$$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$$
is the binomial coefficient and m! = m factorial = m × (m − 1) × (m − 2) × · · · × 1. Note that 0! = 1.
Example: Are You Comfortable Getting a Covid Booster?
Response (Yes, No). Suppose π = Pr(Yes) = 0.4.
Let y = # answering Yes among n = 3 randomly selected people.
$$P(y) = \frac{n!}{y!(n-y)!} \pi^y (1-\pi)^{n-y} = \frac{3!}{y!(3-y)!} (0.4)^y (0.6)^{3-y}$$
$$P(0) = \frac{3!}{0!\,3!}(0.4)^0(0.6)^3 = (0.6)^3 = 0.216$$
$$P(1) = \frac{3!}{1!\,2!}(0.4)^1(0.6)^2 = 3(0.4)(0.6)^2 = 0.432$$
$$P(2) = \frac{3!}{2!\,1!}(0.4)^2(0.6)^1 = 3(0.4)^2(0.6) = 0.288$$
$$P(3) = \frac{3!}{3!\,0!}(0.4)^3(0.6)^0 = (0.4)^3 = 0.064$$
y      0      1      2      3      Total
P(y)   0.216  0.432  0.288  0.064  1
Binomial Probabilities in R
dbinom(x=0, size=3, p=0.4)
[1] 0.216
dbinom(0, 3, 0.4)
[1] 0.216
dbinom(1, 3, 0.4)
[1] 0.432
dbinom(x=0:3, size=3, p=0.4)
[1] 0.216 0.432 0.288 0.064
plot(0:3, dbinom(0:3, 3, .4), type = "h", xlab = "y", ylab = "P(y)")
[Figure: spike plot of P(y) against y for the Binomial(3, 0.4) distribution]
Binomial Distribution Facts
If Y is a Binomial (n, π) random variable, then
• E(Y) = nπ
• SD(Y) = σ(Y) = √(Var(Y)) = √(nπ(1 − π))
• Binomial(n, π) can be approximated by Normal(nπ, nπ(1 − π)) when
n is large (nπ ≥ 5 and n(1 − π) ≥ 5).
[Figure: P(y) vs. y for Binomial(n = 8, π = 0.2) and Binomial(n = 25, π = 0.2)]
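A quick numerical sanity check of these facts (a sketch; the choice n = 25, π = 0.2 matches the right panel of the figure):

n <- 25; p <- 0.2
y <- 0:n
pmf <- dbinom(y, size = n, prob = p)
sum(y * pmf)                 # E(Y) = n*pi = 5
sqrt(sum((y - 5)^2 * pmf))   # SD(Y) = sqrt(n*pi*(1-pi)) = 2
dbinom(5, n, p)              # exact P(Y = 5), approx 0.196
dnorm(5, mean = 5, sd = 2)   # normal approximation, approx 0.199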
Likelihood & Maximum Likelihood Estimation
A Probability Question
Let π be the proportion of US adults that are willing to get an
Omicron booster.
A sample of 5 subjects is randomly selected. Let Y be the
number of them that are willing to get an Omicron booster. What is
P(Y = 3)?
Answer: Y is Binomial (n = 5, π) (Why?)
$$P(Y = y; \pi) = \frac{n!}{y!\,(n-y)!} \pi^y (1-\pi)^{n-y}$$
If π is known to be 0.3, then
$$P(Y = 3; \pi) = \frac{5!}{3!\,2!} (0.3)^3 (0.7)^2 = 0.1323.$$
A Statistics Question
Of course, in practice we don’t know π
and we collect data to estimate it.
How shall we choose a “good” estimator for π?
An estimator is a formula based on the data (a statistic) that we
plan to use to estimate a parameter (π) after we collect the data.
Once the data are collected, we can calculate the value of the
statistic: an estimate for π.
A Statistics Question
Suppose 8 of 20 randomly selected U.S. adults said they are
willing to get an Omicron booster
What can we infer about the value of
π = proportion of U.S. adults that are
comfortable getting a booster?
The chance to observe Y = 8 in a random sample of size n = 20 is
$$P(Y = 8; \pi) = \begin{cases} \dbinom{20}{8} (0.3)^8 (0.7)^{12} \approx 0.1143 & \text{if } \pi = 0.3 \\[6pt] \dbinom{20}{8} (0.6)^8 (0.4)^{12} \approx 0.0354 & \text{if } \pi = 0.6 \end{cases}$$
It appears that 0.3 is more plausible as the true value of π than 0.6,
since the former gives a higher probability to the observed outcome y = 8.
We say the likelihood of π = 0.3 is higher than that of π = 0.6.
Maximum Likelihood Estimate (MLE)
The maximum likelihood estimate (MLE) of a parameter (like π) is
the value at which the likelihood function is maximized.
Example. If 8 of 20 randomly selected U.S. adults are comfortable
getting the booster, the likelihood function
$$\ell(\pi \mid y = 8) = \binom{20}{8} \pi^8 (1-\pi)^{12}$$
reaches its max at π = 0.4, so
the MLE for π is π̂ = 0.4 given the data y = 8.
[Figure: likelihood ℓ(π | y = 8) as a function of π, peaking at π = 0.4]
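The likelihood curve above can be reproduced, and its maximizer located on a grid, with a few lines of R (a sketch; the grid resolution is an arbitrary choice):

n <- 20; y <- 8
pi.grid <- seq(0, 1, by = 0.001)
lik <- dbinom(y, size = n, prob = pi.grid)  # likelihood of each candidate pi
plot(pi.grid, lik, type = "l", xlab = "pi", ylab = "Likelihood")
pi.grid[which.max(lik)]                     # maximizer on the grid: 0.4 = y/n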
Maximum Likelihood Estimate (MLE)
The probability
$$P(Y = y; \pi) = \binom{n}{y} \pi^y (1-\pi)^{n-y} = \ell(\pi \mid y),$$
viewed as a function of π, is called the likelihood function
(or just likelihood) of π, denoted as ℓ(π | y).
It measures the “plausibility” of a value being the true value of π.
[Figure: likelihood functions ℓ(π | y) at different values of y (y = 0, 2, 8, 14) for n = 20]
[Figure: likelihood functions ℓ(π | y) for y = 0, 2, 8, 14 when n = 20]
[Figure: likelihood functions ℓ(π | y) for y = 0, 20, 80, 140 when n = 200]
Likelihood in General
In general, suppose the observed data (Y1, Y2, . . . , Yn) have a joint
probability distribution with some parameter(s) called θ
P(Y1 = y1, Y2 = y2, . . . , Yn = yn) = f(y1, y2, . . . , yn | θ)
The likelihood function for the parameter θ is
ℓ(θ | data) = ℓ(θ | y1, y2, . . . , yn) = f(y1, y2, . . . , yn | θ).
• Note the likelihood function regards the probability as a
function of the parameter θ rather than as a function of the
data y1, y2, . . . , yn.
• If
ℓ(θ1 | y1, . . . , yn) > ℓ(θ2 | y1, . . . , yn),
then θ1 appears more plausible to be the true value of θ than
θ2 does, given the observed data y1, . . . , yn.
Maximizing the Log-likelihood
Rather than maximizing the likelihood, it is often computationally
easier to maximize its natural logarithm, called the log-likelihood,
log ℓ(π | y)
which results in the same answer since logarithm is strictly
increasing,
x1 > x2 ⇐⇒ log(x1) > log(x2).
So
ℓ(π1 | y) > ℓ(π2 | y) ⇐⇒ log ℓ(π1 | y) > log ℓ(π2 | y).
Example (MLE for Binomial)
If the observed data Y ∼ Binomial (n, π) but π is unknown, the
likelihood of π is
$$\ell(\pi \mid y) = P(Y = y \mid \pi) = \binom{n}{y} \pi^y (1-\pi)^{n-y}$$
and the log-likelihood is
$$\log \ell(\pi \mid y) = \log \binom{n}{y} + y \log(\pi) + (n-y) \log(1-\pi).$$
From calculus, we know a function f(x) reaches its max at x = x0 if
$$\frac{d}{dx} f(x) = 0 \ \text{at}\ x = x_0, \qquad \text{and} \qquad \frac{d^2}{dx^2} f(x) < 0 \ \text{at}\ x = x_0.$$
Example (MLE for Binomial)
$$\frac{d}{d\pi} \log \ell(\pi \mid y) = \frac{y}{\pi} - \frac{n-y}{1-\pi} = \frac{y - n\pi}{\pi(1-\pi)}$$
equals 0 when y − nπ = 0.
Solving for π gives the ML estimator (MLE) π̂ = y/n. Moreover,
$$\frac{d^2}{d\pi^2} \log \ell(\pi \mid y) = -\frac{y}{\pi^2} - \frac{n-y}{(1-\pi)^2} < 0 \quad \text{for any } 0 < \pi < 1.$$
Thus, we know log ℓ(π | y) reaches its max when π = y/n.
So the MLE of π is
$$\hat{\pi} = \frac{y}{n} = \text{sample proportion of successes}.$$
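This calculus result can be confirmed numerically by maximizing the log-likelihood with optimize(), here for the y = 8, n = 20 booster example (a sketch; the search interval avoids the boundary):

n <- 20; y <- 8
loglik <- function(pi) dbinom(y, size = n, prob = pi, log = TRUE)
optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum
# approx 0.4, i.e., y/n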
MLEs for Other Inference Problems
• If Y1, Y2, . . . , Yn are i.i.d. N(µ, σ²),
the MLE for µ is the sample mean $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.
• In simple linear regression,
Yi = β0 + β1xi + εi.
When the errors εi are i.i.d. normal,
the usual least squares estimates for β0 and β1 are the MLEs.
i.i.d. = independent and identically distributed
(each εi has the same distribution).
Hypothesis Tests of a Binomial Proportion
Hypothesis Tests of a Binomial Proportion
If the observed data Y ∼ Binomial (n, π), recall the MLE for π is
π̂ = Y/n.
Recall that since Y ∼ Binomial (n, π), the mean and standard
deviation (SD) of Y are respectively,
$$E[Y] = n\pi, \qquad SD(Y) = \sqrt{n\pi(1-\pi)}.$$
The mean and SD of π̂ are thus respectively
$$E(\hat{\pi}) = E\!\left(\frac{Y}{n}\right) = \frac{E(Y)}{n} = \pi, \qquad SD(\hat{\pi}) = SD\!\left(\frac{Y}{n}\right) = \frac{SD(Y)}{n} = \sqrt{\frac{\pi(1-\pi)}{n}}.$$
By the CLT, as n gets large,
$$\frac{\hat{\pi} - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0, 1) \quad \text{approximately}.$$
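A short simulation illustrates the CLT approximation (a sketch; the seed, n = 100, π = 0.3, and 10,000 replications are arbitrary choices):

set.seed(226)                                   # arbitrary seed
n <- 100; pi <- 0.3
phat <- rbinom(10000, size = n, prob = pi) / n  # simulated sample proportions
z <- (phat - pi) / sqrt(pi * (1 - pi) / n)      # standardized
c(mean(z), sd(z))                               # should be near 0 and 1
hist(z, breaks = 40, freq = FALSE)              # roughly bell-shaped
curve(dnorm(x), add = TRUE)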
Hypothesis Tests for a Binomial Proportion
The textbook lists 3 different tests for testing
H0: π = π0 vs. Ha: π ≠ π0 (or a 1-sided alternative).
• Score Test uses the score statistic $z_s = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$
• Wald Test uses the Wald statistic $z_w = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\hat{\pi}(1-\hat{\pi})/n}}$
• Likelihood Ratio Test: we’ll introduce shortly
As n gets large, both $z_s$ and $z_w \sim N(0, 1)$, and both $z_s^2$ and $z_w^2 \sim \chi^2_1$,
based on which the P-value can be computed.
Example (Will You Get the COVID-19 Vaccine?)
Pew Research Institute surveyed 12,648 U.S. adults during
Nov. 18-29, 2020 about their intention to be vaccinated for
COVID-19. Among the 1264 respondents in the 18-29 age group,
695 said they would probably or definitely get the vaccine if it’s
available today.
• estimate of π: π̂ = 695/1264 ≈ 0.55
We want to test whether 60% of 18-29 year-olds in the U.S. would
probably or definitely get the vaccine:
H0: π = 0.6 vs. Ha: π ≠ 0.6
• Score statistic $z_s = \dfrac{0.55 - 0.6}{\sqrt{0.6 \times 0.4/1264}} \approx -3.64$
• Wald statistic $z_w = \dfrac{0.55 - 0.6}{\sqrt{0.55 \times 0.45/1264}} \approx -3.58$
Note that the P-values computed using N(0, 1) or $\chi^2_1$ are identical.
P-value for the score test:
2*pnorm(-3.64)
[1] 0.0002726
pchisq(3.64^2, df=1, lower.tail=F)
[1] 0.0002726
P-value for the Wald test:
2*pnorm(-3.58)
[1] 0.0003436
pchisq(3.58^2, df=1, lower.tail=F)
[1] 0.0003436
See slides L01_supp_chisq_table.pdf for more details about
chi-squared distributions.
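Both statistics and their two-sided P-values can be wrapped in a small helper (a sketch; the function name binom.ztests is ours, not a built-in):

binom.ztests <- function(y, n, pi0) {
  phat <- y / n
  zs <- (phat - pi0) / sqrt(pi0 * (1 - pi0) / n)    # score statistic
  zw <- (phat - pi0) / sqrt(phat * (1 - phat) / n)  # Wald statistic
  c(z.score = zs, p.score = 2 * pnorm(-abs(zs)),
    z.wald  = zw, p.wald  = 2 * pnorm(-abs(zw)))
}
binom.ztests(695, 1264, 0.6)  # reproduces zs = -3.64, zw = -3.58 above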
Likelihood Ratio Test (LRT)
Recall the likelihood function for a binomial proportion π is
$$\ell(\pi \mid y) = \binom{n}{y} \pi^y (1-\pi)^{n-y}.$$
To test H0: π = π0 vs. Ha: π ≠ π0, let
• ℓ0 be the max. likelihood under H0, which is ℓ(π0 | y)
• ℓ1 be the max. likelihood over all possible π, which is ℓ(π̂ | y)
where π̂ = y/n is the MLE of π.
Observe that
• ℓ0 ≤ ℓ1 always
• Under H0, we expect π̂ ≈ π0 and hence ℓ0 ≈ ℓ1.
• ℓ0 ≪ ℓ1 is a sign to reject H0
Likelihood Ratio Test Statistic (LRT Statistic)
The likelihood-ratio test statistic (LRT statistic) for testing
H0: π = π0 vs. Ha: π ≠ π0 equals
$$-2 \log(\ell_0/\ell_1).$$
• Here log is the natural log
• The LRT statistic −2 log(ℓ0/ℓ1) is always nonnegative since ℓ0 ≤ ℓ1
• When n is large, $-2\log(\ell_0/\ell_1) \sim \chi^2_1$.
• Reject H0 at level α if $-2\log(\ell_0/\ell_1) > \chi^2_{1,\alpha}$ = qchisq(1-alpha, df=1)
• P-value = $P(\chi^2_1 > \text{observed LRT statistic})$
[Figure: chi-square curve with df = 1; α is the area above $\chi^2_{1,\alpha}$, and the P-value is the area above the observed value of the LRT statistic]
Likelihood Ratio Test Statistic for a Binomial Proportion
Recall the likelihood function for a binomial proportion π is
$$\ell(\pi \mid y) = \binom{n}{y} \pi^y (1-\pi)^{n-y}.$$
Thus
$$\frac{\ell_0}{\ell_1} = \frac{\binom{n}{y} \pi_0^y (1-\pi_0)^{n-y}}{\binom{n}{y} (\tfrac{y}{n})^y (1-\tfrac{y}{n})^{n-y}} = \left(\frac{n\pi_0}{y}\right)^{y} \left(\frac{n(1-\pi_0)}{n-y}\right)^{n-y}$$
and hence the LRT statistic is
$$-2\log(\ell_0/\ell_1) = 2y \log\!\left(\frac{y}{n\pi_0}\right) + 2(n-y) \log\!\left(\frac{n-y}{n(1-\pi_0)}\right) = 2\left\{ O_{\text{yes}} \log\!\left(\frac{O_{\text{yes}}}{E_{\text{yes}}}\right) + O_{\text{no}} \log\!\left(\frac{O_{\text{no}}}{E_{\text{no}}}\right) \right\}$$
where O_yes = y and O_no = n − y are the observed counts of yes &
no, and E_yes = nπ0 and E_no = n(1 − π0) are the expected counts of
yes & no under H0.
Example (COVID-19 , Cont’d)
Among the 1264 respondents in the 18-29 age group , 695
answered “yes”, 569 answered “no”, so
Oyes = y = 695, Ono = n − y = 569.
Under H0: π = 0.6, we expect 60% of the 1264 subjects to answer
“yes” and 40% to answer “no.” Don’t round nπ0 and n(1 − π0) to integers.
$$E_{\text{yes}} = n\pi_0 = 1264 \times 0.6 = 758.4, \qquad E_{\text{no}} = n(1-\pi_0) = 1264 \times 0.4 = 505.6.$$
$$\text{LRT statistic} = 2\left[ 695 \log\!\left(\frac{695}{758.4}\right) + 569 \log\!\left(\frac{569}{505.6}\right) \right] \approx 13.09,$$
which exceeds the critical value $\chi^2_{1,\alpha} = \chi^2_{1,0.05} = 3.84$ at α = 0.05,
and hence H0 is rejected at the 5% level.
qchisq(1-0.05, df=1)
[1] 3.841
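The same statistic can be computed directly in R, mirroring the hand calculation above (a sketch):

O <- c(695, 569)         # observed counts of yes & no
E <- 1264 * c(0.6, 0.4)  # expected counts under H0: pi = 0.6
2 * sum(O * log(O / E))  # LRT statistic, approx 13.09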
P-value of the LRT for Proportions
Even though Ha is two-sided, the P-value remains the upper-tail
probability below, since a large deviation of π̂ = y/n from π0
leads to a large LRT statistic whether π0 > π̂ or π0 < π̂.
[Figure: chi-squared curve with df = 1; P-value = area above the observed value of the LRT statistic]
For the COVID-19 example, the P-value is $P(\chi^2_1 > 13.09)$, which is
pchisq(13.09, df=1, lower.tail=F)
[1] 0.0002969
Confidence Intervals for Binomial Proportions
Duality of Confidence Intervals and Significance Tests
For a 2-sided test of θ, the dual 100(1 − α)% confidence interval
(CI) for the parameter θ consists of all those θ∗ values for which a
two-sided test of H0: θ = θ∗ is not rejected at level α. E.g.,
• the dual 90% Wald CI for π is the collection of all π0 such that
a 2-sided Wald test of H0: π = π0 has a P-value > 10%
• the dual 95% score CI for π is the collection of all π0 such that
a 2-sided score test of H0: π = π0 has a P-value > 5%
E.g., if the 2-sided P-value for testing H0: π = 0.2 is 6%, then
• 0.2 is in the 95% CI
• The corresponding α for a 95% CI is 5%. As P-value = 6% >
α = 5%, H0: π = 0.2 is not rejected, so 0.2 is in the 95% CI.
• but 0.2 is NOT in the 90% CI
• The corresponding α for a 90% CI is 10%. As P-value = 6% <
α = 10%, H0: π = 0.2 is rejected, so 0.2 is NOT in the 90% CI.
Wald Confidence Intervals (Wald CIs)
For a Wald test, H0: π = π∗ is not rejected at level α if
$$\left| \frac{\hat{\pi} - \pi^*}{\sqrt{\hat{\pi}(1-\hat{\pi})/n}} \right| < z_{\alpha/2},$$
so a 100(1 − α)% Wald CI is
$$\left( \hat{\pi} - z_{\alpha/2} \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}},\ \ \hat{\pi} + z_{\alpha/2} \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \right),$$
where
confidence level 100(1 − α)%   90%     95%     99%
z_{α/2}                        1.645   1.96    2.576
• Introduced in STAT 220 and 234
Drawbacks:
• The Wald CI for π collapses whenever π̂ = 0 or 1 (see the sketch below).
• The actual coverage prob. of the Wald CI is usually much less than
100(1 − α)% if π is close to 0 or 1, unless n is quite large.
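A minimal implementation makes the collapsing drawback concrete (a sketch; the function name wald.ci is ours):

wald.ci <- function(y, n, conf = 0.95) {
  phat <- y / n
  z <- qnorm(1 - (1 - conf) / 2)
  phat + c(-1, 1) * z * sqrt(phat * (1 - phat) / n)
}
wald.ci(8, 20)  # approx (0.185, 0.615)
wald.ci(0, 20)  # (0, 0): the interval collapses to a point when phat = 0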
Score Confidence Intervals (Score CIs)
For a score test, H0: π = π∗ is not rejected at level α if
$$\left| \frac{\hat{\pi} - \pi^*}{\sqrt{\pi^*(1-\pi^*)/n}} \right| < z_{\alpha/2}.$$
A 100(1 − α)% score confidence interval consists of those π∗
satisfying the inequality above.
Example. If π̂ = 0, the 95% score CI consists of those π∗ satisfying
$$\left| \frac{0 - \pi^*}{\sqrt{\pi^*(1-\pi^*)/n}} \right| < 1.96.$$
After a few steps of algebra, we can show such π∗’s are those
satisfying $0 < \pi^* < \frac{1.96^2}{n+1.96^2}$. The 95% score CI for π when π̂ = 0 is
thus
$$\left( 0,\ \frac{1.96^2}{n + 1.96^2} \right),$$
which does NOT collapse!
Score CI (Cont’d)
The end points of the score CI can be shown to be
$$\frac{(y + z^2/2) \pm z_{\alpha/2} \sqrt{n\hat{\pi}(1-\hat{\pi}) + z^2/4}}{n + z^2}, \quad \text{where } z = z_{\alpha/2}.$$
• The midpoint of the score CI, $\dfrac{\hat{\pi} + z^2/(2n)}{1 + z^2/n}$, is between π̂ and 0.5.
• Better than the Wald CI, in that the actual coverage probabilities
are closer to the nominal levels.
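These closed-form end points can be coded directly and checked against the prop.test() output shown later (a sketch; the function name score.ci is ours):

score.ci <- function(y, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  phat <- y / n
  ((y + z^2/2) + c(-1, 1) * z * sqrt(n * phat * (1 - phat) + z^2/4)) / (n + z^2)
}
score.ci(4, 400)  # approx (0.003895, 0.025427), matching prop.test(4, 400, correct = FALSE)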
Agresti-Coull Confidence Intervals
Recall the midpoint of a 100(1 − α)% score CI is
$$\tilde{\pi} = \frac{y + z^2/2}{n + z^2}, \quad \text{where } z = z_{\alpha/2},$$
which looks as if we add z²/2 more successes and z²/2 more
failures to the data before we estimate π.
This inspires the Agresti-Coull 100(1 − α)% confidence interval:
$$\tilde{\pi} \pm z \sqrt{\frac{\tilde{\pi}(1-\tilde{\pi})}{n + z^2}}, \quad \text{where } \tilde{\pi} = \frac{y + z^2/2}{n + z^2} \text{ and } z = z_{\alpha/2},$$
which is essentially a Wald-type interval after adding z²/2 more
successes and z²/2 more failures to the data.
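In code, the Agresti-Coull interval is just a Wald-type computation on the adjusted estimate (a sketch; the function name ac.ci is ours):

ac.ci <- function(y, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ptilde <- (y + z^2/2) / (n + z^2)  # adjusted estimate
  ptilde + c(-1, 1) * z * sqrt(ptilde * (1 - ptilde) / (n + z^2))
}
ac.ci(4, 400)  # approx (0.00294, 0.02638), as in the example later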
95% “Plus-Four” Confidence Intervals
At the 95% level, $z_{\alpha/2} = z_{0.025} = 1.96$, so the midpoint of the Agresti-Coull
CI is
$$\frac{y + z_{\alpha/2}^2/2}{n + z_{\alpha/2}^2} = \frac{y + 1.96^2/2}{n + 1.96^2} \approx \frac{y + 2}{n + 4}.$$
Hence some approximate the 95% Agresti-Coull correction to the
Wald CI by adding 2 successes and 2 failures before computing
π̂, and then compute the Wald CI:
$$\hat{\pi}^* \pm 1.96 \sqrt{\frac{\hat{\pi}^*(1-\hat{\pi}^*)}{n + 4}}, \quad \text{where } \hat{\pi}^* = \frac{y + 2}{n + 4}.$$
• This is the so-called “Plus-Four” confidence interval
• Note the “Plus-Four” CI is for the 95% confidence level only
• At the 90% level, $z_{\alpha/2} = z_{0.05} = 1.645$, so the Agresti-Coull CI would add
$z_{\alpha/2}^2/2 = 1.645^2/2 \approx 1.35$ more successes and 1.35 more
failures.
Likelihood Ratio Confidence Intervals (LR CIs)
An LR test will not reject H0: π = π∗ at level α if
$$-2\log(\ell_0/\ell_1) = -2\log\!\left(\frac{\ell(\pi^* \mid y)}{\ell(\hat{\pi} \mid y)}\right) < \chi^2_{1,\alpha}.$$
A 100(1 − α)% likelihood ratio CI consists of those π∗ with likelihood
$$\ell(\pi^* \mid y) > e^{-\chi^2_{1,\alpha}/2}\, \ell(\hat{\pi} \mid y).$$
E.g., the 95% LR CI contains those π∗ with likelihood above
$e^{-\chi^2_{1,0.05}/2} = e^{-3.84/2} \approx 0.147$ times the max. likelihood.
[Figure: likelihood ℓ(π | y) for n = 20, y = 8, with a horizontal cut marking the 95% LR CI]
• There is no closed-form expression for the end points of an LR CI
• Software can be used to find the end points numerically, as sketched below
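For instance, uniroot() can solve for where the LRT statistic crosses the critical value (a sketch for n = 20, y = 8; the bracketing intervals are our choice):

n <- 20; y <- 8; phat <- y / n
lrt <- function(pi0) 2*y*log(y/(n*pi0)) + 2*(n-y)*log((n-y)/(n*(1-pi0)))
crit <- qchisq(0.95, df = 1)  # 3.841
lower <- uniroot(function(p) lrt(p) - crit, c(1e-6, phat))$root
upper <- uniroot(function(p) lrt(p) - crit, c(phat, 1 - 1e-6))$root
c(lower, upper)               # 95% LR CI for pi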
Likelihood Ratio Confidence Intervals Do Not Collapse at 0
Recall the LRT statistic for testing H0: π = π0 against Ha: π ≠ π0 is
$$-2\log(\ell_0/\ell_1) = 2y \log\!\left(\frac{y}{n\pi_0}\right) + 2(n-y) \log\!\left(\frac{n-y}{n(1-\pi_0)}\right)$$
and H0: π = π0 is rejected if $-2\log(\ell_0/\ell_1) > \chi^2_{1,\alpha}$. Hence the
100(1 − α)% LR confidence interval consists of those π0 satisfying
$$2y \log\!\left(\frac{y}{n\pi_0}\right) + 2(n-y) \log\!\left(\frac{n-y}{n(1-\pi_0)}\right) \le \chi^2_{1,\alpha}.$$
In particular, when y = 0, the 95% LR CI consists of those π0
satisfying
$$-2n \log(1-\pi_0) < \chi^2_{1,0.05} = 3.84.$$
That is, $(0,\ 1 - e^{-3.84/(2n)})$, which does NOT collapse, either!
For example, with n = 20 this is $(0,\ 1 - e^{-3.84/40}) \approx (0, 0.092)$.
Example (Political Party Affiliation)
A survey about the political party affiliation of residents in a town
found 4 of 400 in the sample to be Independents.
We want a 95% CI for π = proportion of Independents in the town.
• estimate of π = 4/400 = 0.01
• Wald CI: $0.01 \pm 1.96 \sqrt{\dfrac{0.01 \times (1 - 0.01)}{400}} \approx (0.00025,\ 0.01975)$.
• The 95% score CI contains those π∗ satisfying
$$\left| \frac{0.01 - \pi^*}{\sqrt{\pi^*(1-\pi^*)/400}} \right| < 1.96,$$
which is the interval (0.0039, 0.0254).
• 95% Agresti-Coull CI: add $z^2/2 = z_{0.025}^2/2 = 1.96^2/2 \approx 1.92$ successes and failures.
The estimate of π is (4 + 1.92)/(400 + 3.84) ≈ 0.01466, and the CI is
$$0.01466 \pm 1.96 \sqrt{\frac{0.01466 \times (1 - 0.01466)}{403.84}} \approx (0.00294,\ 0.02638).$$
R Function “prop.test()” for Score Test and CI
The R function prop.test() performs the score test and produces
the score CI.
• It tests H0: π = 0.5 vs Ha: π ≠ 0.5 by default
• Uses a continuity correction by default.
prop.test(4,400)
1-sample proportions test with continuity correction
data: 4 out of 400, null probability 0.5
X-squared = 382, df = 1, p-value < 2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.003208 0.027187
sample estimates:
p
0.01
R Function “prop.test()” for Score Test and CI
To perform a score test of H0: π = 0.02 vs Ha: π ≠ 0.02 without the
continuity correction . . .
prop.test(4,400, p=0.02, correct=F)
1-sample proportions test without continuity correction
data: 4 out of 400, null probability 0.02
X-squared = 2, df = 1, p-value = 0.2
alternative hypothesis: true p is not equal to 0.02
95 percent confidence interval:
0.003895 0.025427
sample estimates:
p
0.01
The 95% CI matches the score CI computed earlier.
R function for Other CIs of Binomial Proportions
The function binom.confint() in the package binom can
produce confidence intervals for several methods.
You need to install the binom package first (only once, ever).
To check whether the binom package is installed on your computer, run
library(binom)
If you get an error message
# Error in library(binom) : there is no package called ‘binom’
then the binom package is not installed. You can install it with the
command below.
install.packages("binom")
Now one can use binom.confint() to find the CIs.
# Wald CI
binom.confint(4, 400, conf.level = 0.95, method = "asymptotic")
method x n mean lower upper
1 asymptotic 4 400 0.01 0.0002493 0.01975
# Score CI, also called "Wilson"
binom.confint(4, 400, conf.level = 0.95, method = "wilson")
method x n mean lower upper
1 wilson 4 400 0.01 0.003895 0.02543
# Agresti-Coull CI
binom.confint(4, 400, conf.level = 0.95, method = "ac")
method x n mean lower upper
1 agresti-coull 4 400 0.01 0.002939 0.02638
# Likelihood-Ratio Test CI
binom.confint(4, 400, conf.level = 0.95, method = "lrt")
method x n mean lower upper
1 lrt 4 400 0.01 0.003136 0.02308
Example (Political Party Affiliation) LR CI
Recall the 95% LR confidence interval consists of those π0
satisfying
$$2y \log\!\left(\frac{y}{n\pi_0}\right) + 2(n-y) \log\!\left(\frac{n-y}{n(1-\pi_0)}\right) \le \chi^2_{1,0.05} = 3.8415.$$
To verify the LRT confidence interval (0.003135542, 0.02307655)
given by binom.confint(), let’s plug the end points into the LRT
statistic above and see if we obtain 3.8415.
y = 4
n = 400
pi0 = c(0.003135542, 0.02307655)
2*y*log(y/n/pi0) + 2*(n-y)*log((n-y)/n/(1-pi0))
[1] 3.806 3.841
The reported lower end point is slightly off; refining it gives 3.841 at both ends:
pi0 = c(0.003115255, 0.02307735)
2*y*log(y/n/pi0) + 2*(n-y)*log((n-y)/n/(1-pi0))
[1] 3.841 3.841
Comparison of Wald, Score, Agresti-Coull, and LRT CIs
[Figure: 95% Wald, Score, Agresti-Coull, and LRT confidence intervals for each y = 0, 1, . . . , 12 when n = 12]
• End points of Score, Agresti-Coull, and LRT CIs are generally
closer to 0.5 than those of the Wald CIs
• End points of Wald and Agresti-Coull CIs may fall outside of
[0, 1], while those of Score and LRT CIs always fall between 0 and 1
• Agresti-Coull CIs always contain the Score CIs
• Score CIs are narrower than Wald CIs unless y/n is close to 0 or 1.
True Confidence Levels for Various Types of CIs When n = 12
[Figure: true confidence levels of 95% Wald, Score, Agresti-Coull, and LRT CIs as functions of π, for n = 12]
True Coverage Probabilities for Various CIs When n = 200
[Figure: true coverage probabilities of 95% Wald, Score, Agresti-Coull, and LRT CIs as functions of π, for n = 200]
True Confidence Levels of Various CIs
• How are true confidence levels computed? Why do the curves
look jumpy? See HW2.
• Wald CIs tend to be farthest below the 0.95 level. In fact, the
true level can be as low as 0 when π is close to 0 or 1.
• Score CIs are closer to the 0.95 level, though they may fall below
0.95 when π is close to 0 or 1.
• Agresti-Coull CIs are usually conservative (true levels are
above 0.95), especially when π is close to 0 or 1.
• LRT CIs are better than Wald CIs but generally not as good as
Score or Agresti-Coull CIs.
• When n gets larger, all 4 types of intervals get closer to
the 0.95 level, though Wald CIs remain poor when π is close
to 0 or 1.
How To Compute the True Confidence Levels? (1)
Consider the true confidence level of the 95% Wald CI when n = 12
and π = 0.1, i.e., the probability that the 95% Wald confidence
interval (Wald CI)
$$\left( \hat{\pi} - 1.96 \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}},\ \ \hat{\pi} + 1.96 \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \right), \quad \text{where } \hat{\pi} = y/n,$$
contains π = 0.1 when y ∼ Binomial(n = 12, π = 0.1).
If y has a Binomial(n = 12, π = 0.1) distribution, the possible values
of y are the integers 0, 1, 2, . . . , 12.
We can calculate the corresponding Wald CI for each possible
value of y on the next page.
See also: https://yibi-huang.shinyapps.io/shiny/
n = 12
y = 0:n
p = y/n
CI.lower = p - 1.96*sqrt(p*(1-p)/n)
CI.upper = p + 1.96*sqrt(p*(1-p)/n)
data.frame(y, CI.lower, CI.upper)
y CI.lower CI.upper
1 0 0.00000 0.0000
2 1 -0.07305 0.2397
3 2 -0.04420 0.3775
4 3 0.00500 0.4950
5 4 0.06661 0.6001
6 5 0.13772 0.6956
7 6 0.21710 0.7829
8 7 0.30439 0.8623
9 8 0.39994 0.9334
10 9 0.50500 0.9950
11 10 0.62247 1.0442
12 11 0.76029 1.0730
13 12 1.00000 1.0000
Which of the Wald intervals contain
π = 0.1?
Only the CIs for y = 1, 2, 3, 4.
When y ∼ Binomial(n = 12, π = 0.1),
$$\begin{aligned} P(\text{95\% Wald CI contains } \pi = 0.1) &= P(y = 1) + P(y = 2) + P(y = 3) + P(y = 4) \\ &= \binom{12}{1}(0.1)^1(0.9)^{11} + \binom{12}{2}(0.1)^2(0.9)^{10} + \binom{12}{3}(0.1)^3(0.9)^{9} + \binom{12}{4}(0.1)^4(0.9)^{8}. \end{aligned}$$
The four binomial probabilities above can be found using
dbinom(1:4, size = 12, p=0.1)
[1] 0.37657 0.23013 0.08523 0.02131
and hence their total is
sum(dbinom(1:4, size = 12, p=0.1))
[1] 0.7132
The true confidence level of the 95% Wald CI is just 71%, far below
the nominal 95% level.
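The same logic, looped over a grid of π values, traces out the entire coverage curve shown earlier (a sketch for the 95% Wald CI with n = 12):

n <- 12; y <- 0:n; phat <- y / n
lo <- phat - 1.96 * sqrt(phat * (1 - phat) / n)
hi <- phat + 1.96 * sqrt(phat * (1 - phat) / n)
coverage <- function(pi)  # true level = P(the CI for a random y contains pi)
  sum(dbinom(y[lo < pi & pi < hi], size = n, prob = pi))
pi.grid <- seq(0.005, 0.995, by = 0.005)
plot(pi.grid, sapply(pi.grid, coverage), type = "l",
     xlab = "pi", ylab = "True Confidence Level")
abline(h = 0.95, lty = 2)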
More Related Content

PPTX
Categorical data analysis full lecture note PPT.pptx
PPT
Sociology 601 class 7
PDF
An introduction to small samples binomial inference
PDF
Introduction to categorical data analysis
PDF
Introduction to small samples binomial inference
PDF
An introduction to Small Sample Binomial Inference
PDF
Federico Vegetti_GLM and Maximum Likelihood.pdf
PPTX
Probability
Categorical data analysis full lecture note PPT.pptx
Sociology 601 class 7
An introduction to small samples binomial inference
Introduction to categorical data analysis
Introduction to small samples binomial inference
An introduction to Small Sample Binomial Inference
Federico Vegetti_GLM and Maximum Likelihood.pdf
Probability

Similar to An introduction to categorical data analysis (20)

PPTX
introduction CDA.pptx
PDF
An Introduction To Probability And Statistical Inference 1st Edition George G...
PDF
Statistical Inference & Hypothesis Testing.pdf
PDF
Statistical inference: Probability and Distribution
PPT
review of statistics for schools and colleges.ppt
PDF
Probability and basic statistics with R
PDF
lec2_CS540_handouts.pdf
PPTX
Final examexamplesapr2013
PPT
Chapter 2 Probabilty And Distribution
PPTX
Math Exam Help
PPTX
Statistics Applied to Biomedical Sciences
PDF
Logit model testing and interpretation
PPTX
2.statistical DEcision makig.pptx
PPTX
StatBasicsRefffffffffffffffffffffffffffffffffffffffffv2.pptx
DOCX
Inferential statistics
PPTX
Bayesian statistics for biologists and ecologists
PPT
05inference_2011.ppt
PDF
MLE.pdf
introduction CDA.pptx
An Introduction To Probability And Statistical Inference 1st Edition George G...
Statistical Inference & Hypothesis Testing.pdf
Statistical inference: Probability and Distribution
review of statistics for schools and colleges.ppt
Probability and basic statistics with R
lec2_CS540_handouts.pdf
Final examexamplesapr2013
Chapter 2 Probabilty And Distribution
Math Exam Help
Statistics Applied to Biomedical Sciences
Logit model testing and interpretation
2.statistical DEcision makig.pptx
StatBasicsRefffffffffffffffffffffffffffffffffffffffffv2.pptx
Inferential statistics
Bayesian statistics for biologists and ecologists
05inference_2011.ppt
MLE.pdf
Ad

Recently uploaded (20)

PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
master seminar digital applications in india
PDF
Complications of Minimal Access Surgery at WLH
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
A systematic review of self-coping strategies used by university students to ...
PPTX
Cell Structure & Organelles in detailed.
PDF
Classroom Observation Tools for Teachers
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
Cell Types and Its function , kingdom of life
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
Institutional Correction lecture only . . .
Anesthesia in Laparoscopic Surgery in India
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
master seminar digital applications in india
Complications of Minimal Access Surgery at WLH
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
2.FourierTransform-ShortQuestionswithAnswers.pdf
Microbial disease of the cardiovascular and lymphatic systems
A systematic review of self-coping strategies used by university students to ...
Cell Structure & Organelles in detailed.
Classroom Observation Tools for Teachers
Final Presentation General Medicine 03-08-2024.pptx
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
O5-L3 Freight Transport Ops (International) V1.pdf
O7-L3 Supply Chain Operations - ICLT Program
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Cell Types and Its function , kingdom of life
Final Presentation General Medicine 03-08-2024.pptx
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Institutional Correction lecture only . . .
Ad

An introduction to categorical data analysis

  • 1. STAT 226 Lecture 1 & 2 Yibi Huang 1
  • 2. Outline • Variable Types • Review of Binomial Distributions • Likelihood and Maximum Likelihood Method • Tests for Binomial Proportions • Confidence Intervals for Binomial Proportions 2
  • 3. Variable Types Regression methods are used to analyze data when the response variable is numerical. • e.g., temperature, blood pressure, heights, speeds, income • Covered in Stat 222 & 224 Methods in categorical data analysis are used when the response variable are categorical, e.g., • gender (male, female), • political philosophy (liberal, moderate, conservative), • region (metropolitan, urban, suburban, rural) • Covered in Stat 226 & 227 (Don’t take both STAT 226 and 227) In either case, the explanatory variables can be numerical or categorical. 3
  • 4. Nominal and Ordinal Categorical Variables • Nominal: unordered categories, e.g., • transport to work (car, bus, bicycle, walk, other) • favorite music (rock, hiphop, pop, classical, jazz, country, folk) • Ordinal: ordered categories • patient condition (excellent, good, fair, poor) • government spending (too high, about right, too low) We pay special attention to — binary variables: success or failure for which nominal-ordinal distinction is unimportant. 4
  • 5. Review of Binomial Distributions
  • 6. Binomial Distributions (Review) If n Bernoulli trials are performed: • only two possible outcomes for each trial (success, failure) • π = P(success), 1 − π = P(failure), for each trial, • trials are independent • Y = number of successes out of n trials then we say Y has a binomial distribution, denoted as Y ∼ Binomial (n, π). The probability function of Y is P(Y = y) = n y ! πy (1 − π)n−y , y = 0, 1, . . . , n. where n y ! = n! y! (n − y)! is the binomial coefficient and m! = m factorial = m × (m − 1) × (m − 2) × · · · × 1 Note that 0! = 1 5
  • 7. Example: Are You Comfortable Getting a Covid Booster? Response (Yes, No). Suppose π = Pr(Yes) = 0.4. Let y = # answering Yes among n = 3 randomly selected people. 6
  • 8. Example: Are You Comfortable Getting a Covid Booster? Response (Yes, No). Suppose π = Pr(Yes) = 0.4. Let y = # answering Yes among n = 3 randomly selected people. P(y) = n! y!(n − y)! πy (1 − π)n−y = 3! y!(3 − y)! (0.4)y (0.6)3−y P(0) = 3! 0!3! (0.4)0 (0.6)3 = (0.6)3 = 0.216 P(1) = 3! 1!2! (0.4)1 (0.6)2 = 3(0.4)(0.6)2 = 0.432 P(2) = 3! 2!1! (0.4)2 (0.6)1 = 3(0.4)2 (0.6) = 0.288 P(3) = 3! 3!0! (0.4)3 (0.6)0 = (0.4)3 = 0.064 y 0 1 2 3 Total P(y) 0.216 0.432 0.288 0.064 1 6
  • 9. Binomial Probabilities in R dbinom(x=0, size=3, p=0.4) [1] 0.216 dbinom(0, 3, 0.4) [1] 0.216 dbinom(1, 3, 0.4) [1] 0.432 dbinom(x=0:3, size=3, p=0.4) [1] 0.216 0.432 0.288 0.064 plot(0:3, dbinom(0:3, 3, .4), type = "h", xlab = "y", ylab = "P(y)") 0.0 1.0 2.0 3.0 0.1 0.3 y P(y) 7
  • 10. Binomial Distribution Facts If Y is a Binomial (n, π) random variable, then • E(Y) = nπ • SD = σ(Y) = √ Var(Y) = √ nπ(1 − π) • Binomial (n, π) can be approx. by Normal (nπ, nπ(1 − π)) when n is large (nπ ≥ 5 and n(1 − π) ≥ 5). 0 2 4 6 8 0.00 0.15 0.30 Binomial(n = 8, π = 0.2) y P(y) 0 5 10 15 20 25 0.00 0.10 0.20 Binomial(n = 25, π = 0.2) y P(y) 8
  • 11. Likelihood & Maximum Likelihood Estimation
  • 12. A Probability Question Let π be the proportion of US adults that are willing to get an Omicron booster. A sample of 5 subjects are randomly selected. Let Y be the number of them that are willing to get an Omicron booster. What is P(Y = 3)? Answer: Y is Binomial (n = 5, π) (Why?) P(Y = y; π) = n! y! (n − y)! πy (1 − π)n−y If π is known to be 0.3, then P(Y = 3; π) = 5! 3!2! (0.3)3 (0.7)2 = 0.1323. 9
  • 13. A Statistics Question Of course, in practice we don’t know π and we collect data to estimate it. How shall we choose a “good” estimator for π? An estimator is a formula based on the data (a statistic) that we plan to use to estimate a parameter (π) after we collect the data. Once the data are collected, we can calculate the value of the statistic: an estimate for π. 10
  • 14. A Statistics Question Suppose 8 of 20 randomly selected U.S. adults said they are willing to get an Omicron booster What can we infer about the value of π = proportion of U.S. adults that are comfortable getting a booster? The chance to observe Y = 8 in a random sample of size n = 20 is P(Y = 8; π) =                20 8 ! (0.3)8 (0.7)12 ≈ 0.1143 if π = 0.3 20 8 ! (0.6)8 (0.4)12 ≈ 0.0354 if π = 0.6 It appears that π = 0.3 is more likely to be π than π = 0.6, since the former gives a higher prob. to observe the outcome y = 8. We say the likelihood of π = 0.3 is higher than that of π = 0.6. 11
  • 15. Maximum Likelihood Estimate (MLE) The maximum likelihood estimate (MLE) of a parameter (like π) is the value at which the likelihood function is maximized. Example. If 8 of 20 randomly selected U.S. adults are comfortable getting the booster, the likelihood function ℓ(π | y = 8) = 20 8 ! π8 (1 − π)12 reaches its max at π = 0.4, the MLE for π is b π = 0.4 given the data y = 8. 0.0 0.2 0.4 0.6 0.8 1.0 0.00 0.05 0.10 0.15 π 12
  • 16. Maximum Likelihood Estimate (MLE) The probability P(Y = y; π) = n y ! πy (1 − π)n−y = ℓ(π | y) viewed as a function of π, is called the likelihood function, (or just likelihood) of π, denoted as ℓ(π | y). It measure the “plausibility” of a value being the true value of π. 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 π Likelihood y=0 y=2 y=8 y=14 Likelihood functions ℓ(π | y) at different values of y for n = 20. 13
  • 17. 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 π Likelihood y=0 y=2 y=8 y=14 Likelihood functions ℓ(π | y) for various values of y when n = 20. 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 π Likelihood y=0 y=20 y=80 y=140 Likelihood functions ℓ(π | y) at various values of y when n = 200. 14
  • 18. Likelihood in General In general, suppose the observed data (Y1, Y2, . . . , Yn) have a joint probability distribution with some parameter(s) called θ P(Y1 = y1, Y2 = y2, . . . , Yn = yn) = f(y1, y2, . . . , yn | θ) The likelihood function for the parameterθ is ℓ(θ | data) = ℓ(θ | y1, y2, . . . , yn) = f(y1, y2, . . . , yn | θ). • Note the likelihood function regards the probability as a function of the parameter θ rather than as a function of the data y1, y2, . . . , yn. • If ℓ(θ1 | y1, . . . , yn) > ℓ(θ2 | y1, . . . , yn), then θ1 appears more plausible to be the true value of θ than θ2 does, given the observed data y1, . . . , yn. 15
  • 19. Maximizing the Log-likelihood Rather than maximizing the likelihood, it is often computationally easier to maximize its natural logarithm, called the log-likelihood, log ℓ(π | y) which results in the same answer since logarithm is strictly increasing, x1 > x2 ⇐⇒ log(x1) > log(x2). So ℓ(π1 | y) > ℓ(π2 | y) ⇐⇒ log ℓ(π1 | y) > log ℓ(π2 | y). 16
  • 20. Example (MLE for Binomial) If the observed data Y ∼ Binomial (n, π) but π is unknown, the likelihood of π is ℓ(π | y) = p(Y = y|π) = n y ! πy (1 − π)n−y and the log-likelihood is log ℓ(π | y) = log n y ! + y log(π) + (n − y) log(1 − π). From calculus, we know a function f(x) reaches its max at x = x0 if d dx f(x) = 0 at x = x0, and d2 dx2 f(x) < 0 at x = x0. 17
  • 21. Example (MLE for Binomial) d dπ log ℓ(π | y) = y π − n − y 1 − π = y − nπ π(1 − π). equals 0 when y − nπ π(1 − π) = 0 That is, when y − nπ = 0. Solving for π gives the ML estimator (MLE) b π = y n . and d2 dπ2 log ℓ(π | y) = − y π2 − n − y (1 − π)2 < 0 for any 0 < π < 1 Thus, we know log ℓ(π | y) reaches its max when π = y/n. So MLE of π is b π = y n = sample proportion of successes. 18
  • 22. MLEs for Other Inference Problems • If Y1, Y2, . . . , Yn are i.i.d. N(µ, σ2), the MLE for µ is the sample mean Y = Pn i=1 Yi n . • In simple linear regression, Yi = β0 + β1xi + εi When the errors εi are i.i.d. normal, the usual least squares estimates for β0 and β1 are the MLEs. i.i.d. = Independent and identically distributed (same distribution each εi). 19
  • 23. Hypothesis Tests of a Binomial Proportion
  • 24. Hypothesis Tests of a Binomial Proportion If the observed data Y ∼ Binomial (n, π), recall the MLE for π is π̂ = Y/n. Recall that since Y ∼ Binomial (n, π), the mean and standard deviation (SD) of Y are respectively, E[Y] = nπ, SD(Y) = p nπ(1 − π). The mean and SD of π̂ are thus respectively E(π̂) = E Y n = E(Y) n = π, SD(π̂) = SD Y n = SD(Y) n = r π(1 − π) n . By CLT, as n gets large, π̂ − π √ π(1 − π)/n ∼ N(0, 1). 20
  • 25. Hypothesis Tests for a Binomial Proportion The textbook lists 3 different tests for testing H0: π = π0 v.s. Ha: π , π0 (or 1-sided alternative.) • Score Test uses the score statistic zs = π̂ − π0 √ π0(1 − π0)/n • Wald Test uses the Wald statistic zw = π̂ − π0 √ π̂(1 − π̂)/n • Likelihood Ratio Test: we’ll introduce shortly As n gets large, both zs and zw ∼ N(0, 1), both z2 s and z2 w ∼ χ2 1. based on which, P-value can be computed. 21
  • 26. Example (Will You Get the COVID-19 Vaccine?) Pew Research Institute surveyed 12,648 U.S. adults during Nov. 18-29, 2020 about their intention to be vaccinated for COVID-19. Among the 1264 respondents in the 18-29 age group, 695 said they would probably or definitely get the vaccine if it’s available today. • estimate of π = π̂ = 695 1264 ≈ 0.55 22
  • 27. Example (Will You Get the COVID-19 Vaccine?) Pew Research Institute surveyed 12,648 U.S. adults during Nov. 18-29, 2020 about their intention to be vaccinated for COVID-19. Among the 1264 respondents in the 18-29 age group, 695 said they would probably or definitely get the vaccine if it’s available today. • estimate of π = π̂ = 695 1264 ≈ 0.55 Want to test whether 60% of 18-29 year-olds in the U.S. would probably or definitely get the vaccine. H0: π = 0.6 v.s. Ha: π , 0.6 • Score statistic zs = 0.55 − 0.6 √ 0.6 × 0.4/1264 ≈ −3.64 • Wald statistic zw = 0.55 − 0.6 √ 0.55 × 0.45/1264 ≈ −3.58 22
  • 28. Note that the P-values computed using N(0, 1) or χ2 1 are identical. P-value for the score test 2*pnorm(-3.64) [1] 0.0002726 pchisq(3.64ˆ2,df=1,lower.tail=F) [1] 0.0002726 P-value for the Wald test 2*pnorm(-3.58) [1] 0.0003436 pchisq(3.58ˆ2,df=1,lower.tail=F) [1] 0.0003436 See slides L01_supp_chisq_table.pdf for more details about chi-squared distributions. 23
  • 29. Likelihood Ratio Test (LRT) Recall the likelihood function for a binomial proportion π is ℓ(π|y) = n y ! πy (1 − π)n−y . To test H0: π = π0 v.s. Ha: π , π0, let • ℓ0 be the max. likelihood under H0, which is ℓ(π0|y) 24
  • 30. Likelihood Ratio Test (LRT) Recall the likelihood function for a binomial proportion π is ℓ(π|y) = n y ! πy (1 − π)n−y . To test H0: π = π0 v.s. Ha: π , π0, let • ℓ0 be the max. likelihood under H0, which is ℓ(π0|y) • ℓ1 be the max. likelihood over all possible π, which is ℓ(π̂|y) where π̂ = y/n is the MLE of π. 24
  • 31. Likelihood Ratio Test (LRT) Recall the likelihood function for a binomial proportion π is ℓ(π|y) = n y ! πy (1 − π)n−y . To test H0: π = π0 v.s. Ha: π , π0, let • ℓ0 be the max. likelihood under H0, which is ℓ(π0|y) • ℓ1 be the max. likelihood over all possible π, which is ℓ(π̂|y) where π̂ = y/n is the MLE of π. Observe that • ℓ0 ≤ ℓ1 always 24
  • 32. Likelihood Ratio Test (LRT) Recall the likelihood function for a binomial proportion π is ℓ(π|y) = n y ! πy (1 − π)n−y . To test H0: π = π0 v.s. Ha: π , π0, let • ℓ0 be the max. likelihood under H0, which is ℓ(π0|y) • ℓ1 be the max. likelihood over all possible π, which is ℓ(π̂|y) where π̂ = y/n is the MLE of π. Observe that • ℓ0 ≤ ℓ1 always • Under H0, we expect π̂ ≈ π0 and hence ℓ0 ≈ ℓ1. 24
  • 33. Likelihood Ratio Test (LRT) Recall the likelihood function for a binomial proportion π is ℓ(π|y) = n y ! πy (1 − π)n−y . To test H0: π = π0 v.s. Ha: π , π0, let • ℓ0 be the max. likelihood under H0, which is ℓ(π0|y) • ℓ1 be the max. likelihood over all possible π, which is ℓ(π̂|y) where π̂ = y/n is the MLE of π. Observe that • ℓ0 ≤ ℓ1 always • Under H0, we expect π̂ ≈ π0 and hence ℓ0 ≈ ℓ1. • ℓ0 ≪ ℓ1 is a sign to reject H0 24
  • 34. Likelihood Ratio Test Statistic (LRT Statistic) The likelihood-ratio test statistic (LRT statistic) for testing H0: π = π0 v.s. Ha: π , π0 equals −2 log(ℓ0/ℓ1). • Here log is the natural log • LRT statistic −2 log(ℓ0/ℓ1) is always nonnegative since ℓ0 ≤ ℓ1 • When n is large, −2 log(ℓ0/ℓ1) ∼ χ2 1. • Reject H0 at level α if −2 log(ℓ0/ℓ1) χ2 1,α = qchisq(1-alpha, df=1) • P-value = P(χ2 1 observed LRT statistic) 0 χ1,α 2 α = shaded area chi−square curve w/ df = 1 0 observed value of the LRT−statistic P−value = shaded area chi−square curve w/ df = 1 25
  • 35. Likelihood Ratio Test Statistic for a Binomial Proportion Recall the likelihood function for a binomial proportion π is ℓ(π|y) = n y ! πy (1 − π)n−y . Thus ℓ0 ℓ1 = n y π y 0(1 − π0)n−y n y (y n)y(1 − (y n ))n−y = nπ0 y !y n (1 − π0) n − y !n−y and hence the LRT statistic is −2 log(ℓ0/ℓ1) = 2y log y nπ0 ! + 2(n − y) log n − y n (1 − π0) ! = 2 ( Oyes × log Oyes Eyes !# + Ono × log Ono Eno !#) where Oyes = y and Ono = n − y are the observed counts of yes no, and Eyes = nπ0 and Eno = n(1 − π0) are the expected counts of yes no under H0. 26
  • 36. Example (COVID-19 , Cont’d) Among the 1264 respondents in the 18-29 age group , 695 answered “yes”, 569 answered “no”, so Oyes = y = 695, Ono = n − y = 569. Under H0: π = 0.6, we expect 60% of the 1264 subjects to answer “yes” and 40% to answer “no.” Don’t round nπ0 and n(1 − π0) to integers. Eyes = nπ0 = 1264 × 0.6 = 758.4, Eno = n(1 − π0) = 1264 × 0.4 = 505.6. LRT statistic = 2 695 log 695 758.4 ! + 569 log 569 505.6 !# ≈ 13.091 which exceeds the critical value χ2 1,α = χ2 1,0.05 = 3.84 at α = 0.05 and hence H0 is rejected 5% level qchisq(1-0.05, df=1) [1] 3.841 27
  • 37. P-value of LRT test of Porportions Even though Ha is two-sided, the P-value remains to be the upper tail probability below, since a large deviation of b π = y/n from π0 would lead to a large LRT statistic, no matter π0 b π or π0 b π. chi−squared curve w/ df = 1 0 observed value of the LRT statistic P−value = shaded area For the COVID-19 example, the P-value is P(χ2 1 13.09), which is pchisq(13.09, df=1, lower.tail=F) [1] 0.0002969 28
  • 38. Confidence Intervals for Binomial Proportions
  • 39. Duality of Confidence Intervals and Significance Tests For a 2-sided test of θ, the dual 100(1 − α)% confidence interval (CI) for the parameter θ consists of all those θ∗ values that a two-sided test of H0: θ = θ∗ is not rejected at level α. E.g., • the dual 90% Wald CI for π is the collection of all π0 such that a 2-sided Wald test of H0: π = π0 having a P-value 10% 29
  • 40. Duality of Confidence Intervals and Significance Tests For a 2-sided test of θ, the dual 100(1 − α)% confidence interval (CI) for the parameter θ consists of all those θ∗ values that a two-sided test of H0: θ = θ∗ is not rejected at level α. E.g., • the dual 90% Wald CI for π is the collection of all π0 such that a 2-sided Wald test of H0: π = π0 having a P-value 10% • the dual 95% score CI for π is the collection of all π0 such that a 2-sided score test of H0: π = π0 having a P-value 5% 29
  • 41. Duality of Confidence Intervals and Significance Tests For a 2-sided test of θ, the dual 100(1 − α)% confidence interval (CI) for the parameter θ consists of all those θ∗ values that a two-sided test of H0: θ = θ∗ is not rejected at level α. E.g., • the dual 90% Wald CI for π is the collection of all π0 such that a 2-sided Wald test of H0: π = π0 having a P-value 10% • the dual 95% score CI for π is the collection of all π0 such that a 2-sided score test of H0: π = π0 having a P-value 5% E.g., If the 2-sided P-value for testing H0: π = 0.2 is 6%, then • 0.2 is in the 95% CI 29
  • 42. Duality of Confidence Intervals and Significance Tests For a 2-sided test of θ, the dual 100(1 − α)% confidence interval (CI) for the parameter θ consists of all those θ∗ values that a two-sided test of H0: θ = θ∗ is not rejected at level α. E.g., • the dual 90% Wald CI for π is the collection of all π0 such that a 2-sided Wald test of H0: π = π0 having a P-value 10% • the dual 95% score CI for π is the collection of all π0 such that a 2-sided score test of H0: π = π0 having a P-value 5% E.g., If the 2-sided P-value for testing H0: π = 0.2 is 6%, then • 0.2 is in the 95% CI • The corresponding α for a 95% CI is 5%. As p-value = 6% α = 5%, H0: π = 0.2 is not rejected so 0.2 in the 95% CI. 29
  • 43. Duality of Confidence Intervals and Significance Tests For a 2-sided test of θ, the dual 100(1 − α)% confidence interval (CI) for the parameter θ consists of all those θ∗ values that a two-sided test of H0: θ = θ∗ is not rejected at level α. E.g., • the dual 90% Wald CI for π is the collection of all π0 such that a 2-sided Wald test of H0: π = π0 having a P-value 10% • the dual 95% score CI for π is the collection of all π0 such that a 2-sided score test of H0: π = π0 having a P-value 5% E.g., If the 2-sided P-value for testing H0: π = 0.2 is 6%, then • 0.2 is in the 95% CI • The corresponding α for a 95% CI is 5%. As p-value = 6% α = 5%, H0: π = 0.2 is not rejected so 0.2 in the 95% CI. • but 0.2 is NOT in the 90% CI 29
  • 44. Duality of Confidence Intervals and Significance Tests For a 2-sided test of θ, the dual 100(1 − α)% confidence interval (CI) for the parameter θ consists of all those θ∗ values that a two-sided test of H0: θ = θ∗ is not rejected at level α. E.g., • the dual 90% Wald CI for π is the collection of all π0 such that a 2-sided Wald test of H0: π = π0 having a P-value 10% • the dual 95% score CI for π is the collection of all π0 such that a 2-sided score test of H0: π = π0 having a P-value 5% E.g., If the 2-sided P-value for testing H0: π = 0.2 is 6%, then • 0.2 is in the 95% CI • The corresponding α for a 95% CI is 5%. As p-value = 6% α = 5%, H0: π = 0.2 is not rejected so 0.2 in the 95% CI. • but 0.2 is NOT in the 90% CI • The corresponding α for a 90% CI is 10%. As p-value = 6% α = 10%, H0: π = 0.2 is rejected so 0.2 NOT in the 90% CI. 29
  • 45. Wald Confidence Intervals (Wald CIs) For a Wald test, H0: π = π∗ is not rejected at level α if π̂ − π∗ √ π̂(1 − π̂)/n zα/2, so a 100(1 − α)% Wald CI is       π̂ − zα/2 r π̂(1 − π̂) n , π̂ + zα/2 r π̂(1 − π̂) n        . where confidence level 100(1 − α)% 90% 95% 99% zα/2 1.645 1.96 2.576 • Introduced in STAT 220 and 234 Drawbacks: • Wald CI for π collapses whenever π̂ = 0 or 1. • Actual coverage prob. for Wald CI is usually much less than 100(1 − α)% if π close to 0 or 1, unless n is quite large. 30
  • 46. Score Confidence Intervals (Score CIs) For a Score test, H0 π = π∗ is not rejected at level α if π̂ − π∗ √ π∗(1 − π∗)/n zα/2. A 100(1 − α)% score confidence interval consists of those π∗ satisfying the inequality above. Example. If π̂ = 0, the 95% score CI consists of those π∗ satisfying 0 − π∗ √ π∗(1 − π∗)/n 1.96. After a few steps of algebra, we can show such π∗’s are those satisfying 0 π∗ 1.962 n+1.962 . The 95% score CI for π when π̂ = 0 is thus 0, 1.962 n + 1.962 ! , which is NOT collapsing! 31
  • 47. Score CI (Cont’d) The end points of the score CI can be shown to be (y + z2/2) ± zα/2 p nπ̂(1 − π̂) + z2/4 n + z2 where z = zα/2. • midpoint of the score CI, π̂ + z2/2n 1 + z2/n , is between π̂ and 0.5. • better than the Wald CI, that the actual coverage probabilities are closer to the nominal levels. 32
  • 48. Agresti-Coull Confidence Intervals Recall the midpoint for a 100(1 − α)% score CI is π̃ = y + z2/2 n + z2 , where z = zα/2, which looks as if we add z2/2 more successes and z2/2 more failures to the data before we estimate π. This inspires the Agresti-Coull 100(1 − α)% confidence interval: π̃ ± z r π̃(1 − π̃) n + z2 where π̃ = y + z2/2 n + z2 and z = zα/2. which is essentially a Wald-type interval after adding z2/2 more successes and z2/2 more failures to the data, where z = zα/2. 33
95% “Plus-Four” Confidence Intervals
At the 95% level, z = zα/2 = z0.025 = 1.96, and the midpoint of the
Agresti-Coull CI is

    (y + z²/2) / (n + z²) = (y + 1.96²/2) / (n + 1.96²) ≈ (y + 2)/(n + 4).

Hence some approximate the 95% Agresti-Coull correction to the Wald CI
by adding 2 successes and 2 failures before computing π̂, and then
compute the Wald CI:

    π̂∗ ± 1.96 √( π̂∗(1 − π̂∗) / (n + 4) ),    where π̂∗ = (y + 2)/(n + 4).

• This is the so-called “Plus-Four” confidence interval
• Note the “Plus-Four” CI is for the 95% confidence level only
• At the 90% level, zα/2 = z0.05 = 1.645, so the Agresti-Coull CI would
  add z²/2 = 1.645²/2 ≈ 1.35 more successes and 1.35 more failures.
34
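The corresponding sketch for the Plus-Four interval (assumed toy counts
again):

y = 8; n = 20                       # assumed toy data
p.star = (y + 2)/(n + 4)            # add 2 successes and 2 failures
me = 1.96*sqrt(p.star*(1-p.star)/(n + 4))
c(p.star - me, p.star + me)         # 95% "Plus-Four" CI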
Likelihood Ratio Confidence Intervals (LR CIs)
A LR test will not reject H0: π = π∗ at level α if

    −2 log(ℓ0/ℓ1) = −2 log( ℓ(π∗ | y) / ℓ(π̂ | y) ) < χ²1,α.

A 100(1 − α)% likelihood ratio CI consists of those π∗ with likelihood

    ℓ(π∗ | y) > e^(−χ²1,α / 2) ℓ(π̂ | y).

E.g., the 95% LR CI contains those π∗ with likelihood above
e^(−χ²1,0.05 / 2) = e^(−3.84/2) ≈ 0.147 multiple of the maximum
likelihood.

[Figure: likelihood ℓ(π | y) for n = 20, y = 8, with the 95% LR CI
marked as the set of π whose likelihood lies above the cutoff.]

• No closed-form expression for the end points of a LR CI
• Can use software to find the end points numerically, as in the
  sketch below
35
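One way (a sketch, not the slides’ method) to find the end points
numerically is root-finding on the LRT statistic with uniroot(), using
the plotted example n = 20, y = 8:

y = 8; n = 20
p.hat = y/n
lrt = function(p0)                  # LRT statistic minus the cutoff
  2*y*log(y/(n*p0)) + 2*(n-y)*log((n-y)/(n*(1-p0))) - qchisq(0.95, 1)
lower = uniroot(lrt, c(1e-6, p.hat))$root
upper = uniroot(lrt, c(p.hat, 1 - 1e-6))$root
c(lower, upper)                     # 95% LR CI end points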
Likelihood Ratio Confidence Intervals Do Not Collapse at 0
Recall the LRT statistic for testing H0: π = π0 against Ha: π ≠ π0 is

    −2 log(ℓ0/ℓ1) = 2y log( y/(nπ0) ) + 2(n − y) log( (n − y)/(n(1 − π0)) )

and H0: π = π0 is rejected if −2 log(ℓ0/ℓ1) > χ²1,α. Hence the
100(1 − α)% LR confidence interval consists of those π0 satisfying

    2y log( y/(nπ0) ) + 2(n − y) log( (n − y)/(n(1 − π0)) ) ≤ χ²1,α.

In particular, when y = 0, the 95% LR CI consists of those π0 satisfying
−2n log(1 − π0) < χ²1,0.05 = 3.84. That is,

    ( 0 , 1 − e^(−3.84/(2n)) ),

which does NOT collapse, either!
36
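The y = 0 upper end point in one line (n = 20 assumed):

n = 20                              # assumed toy sample size
1 - exp(-qchisq(0.95, 1)/(2*n))     # upper end of the 95% LR CI when y = 0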
Example (Political Party Affiliation)
A survey about the political party affiliation of residents in a town
found 4 of 400 in the sample to be Independents. We want a 95% CI for
π = proportion of Independents in the town.
• estimate of π: π̂ = 4/400 = 0.01
• Wald CI:

      0.01 ± 1.96 √( 0.01 × (1 − 0.01) / 400 ) ≈ (0.00025, 0.01975).

• 95% score CI contains those π∗ satisfying

      |0.01 − π∗| / √( π∗(1 − π∗)/400 ) < 1.96,

  which is the interval (0.0039, 0.0254).
• 95% Agresti-Coull CI: adding z²/2 = z²0.025/2 = 1.96²/2 ≈ 1.92
  successes and 1.92 failures, the estimate of π is
  (4 + 1.92)/(400 + 3.84) ≈ 0.01466, and the CI is

      0.01466 ± 1.96 √( 0.01466 × (1 − 0.01466) / 403.84 )
        ≈ (0.00294, 0.02638).
37
R Function “prop.test()” for Score Test and CI
The R function prop.test() performs the score test and produces the
score CI.
• It tests H0: π = 0.5 vs Ha: π ≠ 0.5 by default
• Uses continuity correction by default.

prop.test(4, 400)

    1-sample proportions test with continuity correction

data:  4 out of 400, null probability 0.5
X-squared = 382, df = 1, p-value < 2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.003208 0.027187
sample estimates:
   p
0.01
38
R Function “prop.test()” for Score Test and CI
To perform a score test of H0: π = 0.02 vs Ha: π ≠ 0.02 without the
continuity correction . . .

prop.test(4, 400, p = 0.02, correct = F)

    1-sample proportions test without continuity correction

data:  4 out of 400, null probability 0.02
X-squared = 2, df = 1, p-value = 0.2
alternative hypothesis: true p is not equal to 0.02
95 percent confidence interval:
 0.003895 0.025427
sample estimates:
   p
0.01

The 95% CI matches the score CI computed earlier.
39
R Function for Other CIs of Binomial Proportions
The function binom.confint() in the package binom can produce
confidence intervals by several methods. You only need to install the
binom package once, ever. To check whether the binom package is
installed on your computer, run

library(binom)

If you get an error message,

# Error in library(binom) : there is no package called ‘binom’

the binom package is not installed. You can install it using the
command below.

install.packages("binom")
40
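A common idiom (a sketch, not from the slides) that installs the
package only when it is missing:

if (!requireNamespace("binom", quietly = TRUE)) install.packages("binom")
library(binom)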
Now one can use binom.confint() to find the CIs.

# Wald CI
binom.confint(4, 400, conf.level = 0.95, method = "asymptotic")
      method x   n mean     lower   upper
1 asymptotic 4 400 0.01 0.0002493 0.01975

# Score CI, also called "Wilson"
binom.confint(4, 400, conf.level = 0.95, method = "wilson")
  method x   n mean    lower   upper
1 wilson 4 400 0.01 0.003895 0.02543

# Agresti-Coull CI
binom.confint(4, 400, conf.level = 0.95, method = "ac")
         method x   n mean    lower   upper
1 agresti-coull 4 400 0.01 0.002939 0.02638

# Likelihood ratio CI
binom.confint(4, 400, conf.level = 0.95, method = "lrt")
  method x   n mean    lower   upper
1    lrt 4 400 0.01 0.003136 0.02308
41
Example (Political Party Affiliation) LR CI
Recall the 95% LR confidence interval consists of those π0 satisfying

    2y log( y/(nπ0) ) + 2(n − y) log( (n − y)/(n(1 − π0)) ) ≤ χ²1,0.05 = 3.8415.

To verify the LR confidence interval (0.003135542, 0.02307655) given by
binom.confint(), let’s plug the end points into the LRT statistic above
and see if we obtain 3.8415.

y = 4
n = 400
pi0 = c(0.003135542, 0.02307655)
2*y*log(y/n/pi0) + 2*(n-y)*log((n-y)/n/(1-pi0))
[1] 3.806 3.841

The statistic at the reported lower end point is a bit below 3.8415; a
slightly adjusted lower end point hits the cutoff at both ends:

pi0 = c(0.003115255, 0.02307735)
2*y*log(y/n/pi0) + 2*(n-y)*log((n-y)/n/(1-pi0))
[1] 3.841 3.841
42
Comparison of Wald, Score, Agresti-Coull, and LRT CIs

[Figure: the four types of 95% CIs (Wald, Score, Agresti-Coull, LRT)
plotted for every possible count of successes y = 0, 1, . . . , 12 when
n = 12.]

• End points of Score, Agresti-Coull, and LRT CIs are generally closer
  to 0.5 than those of the Wald CIs
• End points of Wald and Agresti-Coull CIs may fall outside of [0, 1],
  while those of Score and LRT CIs always fall between 0 and 1
• Agresti-Coull CIs always contain the Score CIs
• Score CIs are narrower than Wald CIs unless y/n is close to 0 or 1.
43
True Confidence Levels for Various Types of CIs When n = 12

[Figure: true confidence level of the nominal 95% Wald, Score,
Agresti-Coull, and LRT CIs as a function of π, for n = 12.]
44
True Coverage Probabilities for Various CIs When n = 200

[Figure: true confidence level of the nominal 95% Wald, Score,
Agresti-Coull, and LRT CIs as a function of π, for n = 200.]
45
True Confidence Levels of Various CIs
• How are true confidence levels computed? Why do the curves look
  jumpy? See HW2.
• Wald CIs tend to be farthest below the 0.95 level. In fact, the true
  level can be as low as 0 when π is close to 0 or 1.
• Score CIs are closer to the 0.95 level, though they may fall below
  0.95 when π is close to 0 or 1.
• Agresti-Coull CIs are usually conservative (true levels are above
  0.95), especially when π is close to 0 or 1.
• LRT CIs are better than Wald CIs but generally not as good as Score
  or Agresti-Coull CIs.
• As n gets larger, all 4 types of intervals get closer to the 0.95
  level, though Wald CIs remain poor when π is close to 0 or 1.
46
How To Compute the True Confidence Levels? (1)
Consider the true confidence level of the 95% Wald CI when n = 12 and
π = 0.1, i.e., the probability that the 95% Wald confidence interval

    ( π̂ − 1.96 √( π̂(1 − π̂)/n ) , π̂ + 1.96 √( π̂(1 − π̂)/n ) ),
    where π̂ = y/n,

contains π = 0.1 when y ∼ Binomial(n = 12, π = 0.1).

If y has a Binomial(n = 12, π = 0.1) distribution, the possible values
of y are the integers 0, 1, 2, . . . , 12. We can calculate the
corresponding Wald CI for each possible value of y on the next page.

See also: https://yibi-huang.shinyapps.io/shiny/
47
n = 12
y = 0:n
p = y/n
CI.lower = p - 1.96*sqrt(p*(1-p)/n)
CI.upper = p + 1.96*sqrt(p*(1-p)/n)
data.frame(y, CI.lower, CI.upper)
    y CI.lower CI.upper
1   0  0.00000   0.0000
2   1 -0.07305   0.2397
3   2 -0.04420   0.3775
4   3  0.00500   0.4950
5   4  0.06661   0.6001
6   5  0.13772   0.6956
7   6  0.21710   0.7829
8   7  0.30439   0.8623
9   8  0.39994   0.9334
10  9  0.50500   0.9950
11 10  0.62247   1.0442
12 11  0.76029   1.0730
13 12  1.00000   1.0000

Which of the Wald intervals contain π = 0.1? Only the CIs for
y = 1, 2, 3, 4.
48
When y ∼ Binomial(n = 12, π = 0.1),

P(95% Wald CI contains π = 0.1)
  = P(y = 1) + P(y = 2) + P(y = 3) + P(y = 4)
  = (12 choose 1)(0.1)^1(0.9)^11 + (12 choose 2)(0.1)^2(0.9)^10
    + (12 choose 3)(0.1)^3(0.9)^9 + (12 choose 4)(0.1)^4(0.9)^8.

The four Binomial probabilities above can be found using

dbinom(1:4, size = 12, p = 0.1)
[1] 0.37657 0.23013 0.08523 0.02131

and hence their total is

sum(dbinom(1:4, size = 12, p = 0.1))
[1] 0.7132

The true confidence level of a 95% Wald CI is just 71%, far below the
nominal 95% level.
49
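The same computation can be wrapped into a short function (a sketch;
the name wald.coverage is assumed) that returns the true coverage of
the 95% Wald CI for any n and π:

wald.coverage = function(n, pi) {
  y = 0:n
  p = y/n
  me = 1.96*sqrt(p*(1-p)/n)
  covers = (p - me <= pi) & (pi <= p + me)    # which CIs contain the true pi?
  sum(dbinom(y[covers], size = n, p = pi))    # total prob. of those y values
}
wald.coverage(12, 0.1)                        # reproduces the 0.7132 above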