Topics
1. Introduction: Distributions and Inference for
Categorical Data
2. Describing Contingency Tables
3. Inference for Contingency Tables
4. Introduction to Generalized Linear Models
5. Logistic Regression
6. Building and Applying Logistic Regression Models
7. Logit Models for Multinomial Responses
Chapter 1 - Outline
1.1 Categorical Response Data
1.2 Distributions for Categorical Data
1.3 Statistical Inference for Categorical
Data
1.4 Statistical Inference for Binomial
Parameters
1.5 Statistical Inference for Multinomial
Parameters
1.1 CATEGORICAL RESPONSE DATA
 A categorical variable has a measurement scale
consisting of a set of categories.
 political philosophy: liberal, moderate, or
conservative.
 brands of a product: brand A, brand B, and brand C
 A categorical variable can be a response variable or an independent variable
 We consider primarily the CATEGORICAL RESPONSE
DATA in this course
1.1.1 Response–Explanatory Variable
Distinction
 Most statistical analyses distinguish between response
(or dependent) variables and explanatory (or
independent) variables.
 For instance, regression models:
selling price of a house = f(square footage, location)
 In this book we focus on methods for categorical
response variables.
 As in ordinary regression, explanatory variables can be
of any type.
1.1.2 Nominal–Ordinal Scale
Distinction
 Nominal: Variables having categories without a natural
ordering
 religious affiliation: Catholic, Protestant, Jewish,
Muslim, other.
 mode of transportation: automobile, bicycle, bus,
subway, walk
 favorite type of music: classical, country, folk, jazz,
rock
 choice of residence: apartment, condominium,
house, other.
 For nominal variables, the order of listing the
categories is irrelevant.
 The statistical analysis does not depend on that
ordering.
Nominal or Ordinal
 Ordinal: ordered categories
 automobile: subcompact, compact, midsize, large
 social class: upper, middle, lower
 political philosophy: liberal, moderate, conservative
 patient condition: good, fair, serious, critical.
 Ordinal variables have ordered categories, but
distances between categories are unknown.
 Although a person categorized as moderate is more
liberal than a person categorized as conservative, no
numerical value describes how much more liberal that
person is. Methods for ordinal variables utilize the
category ordering.
Interval variable
 An interval variable is one that does have numerical
distances between any two values.
 blood pressure level
 functional life length of television set
 length of prison term
 annual income
 An interval variable is sometimes called a ratio variable
if ratios of values are also valid. It has a clear
definition of 0:
 Height
 Weight
 enzyme activity
Categories are not as clear-cut as they sound
 What kind of variable is color?
 In a psychological study of perception, different
colors would be regarded as nominal.
 In a physics study, color is quantified by
wavelength, so color would be considered a ratio
variable.
 What about counts?
 If your dependent variable is the number of cells in
a certain volume, what kind of variable is that? It
has all the properties of a ratio variable, except that it
must be an integer.
 Is that a ratio variable or not? These questions just
point out that the classification scheme appears
to be more comprehensive than it is.
 A variable’s measurement scale determines which statistical
methods are appropriate.
 In the measurement hierarchy,
 interval variables are highest,
 ordinal variables are next,
 and nominal variables are lowest.
 Statistical methods for variables of one type can also be used
with variables at higher levels but not at lower levels.
 For instance, statistical methods for nominal variables can be
used with ordinal variables by ignoring the ordering of
categories.
 Methods for ordinal variables cannot, however, be used with
nominal variables, since their categories have no meaningful
ordering.
 It is usually best to apply methods appropriate for the actual
scale.
1.1.3 Continuous–Discrete Variable
Distinction
 Variables are also classified as continuous or discrete according to the number of values they can take.
 Actual measurement of all variables occurs in a discrete
manner, due to precision limitations in measuring
instruments.
 The continuous / discrete classification, in practice,
distinguishes between variables that take lots of values
and variables that take few values.
 Statisticians often treat discrete interval variables
having a large number of values, such as test scores,
as continuous
This class: Discretely measured
responses can be:
 Binary (two categories)
 nominal variables (unordered)
 ordinal variables (ordered)
 discrete interval variables having relatively few values,
and
 continuous variables grouped into a small number of
categories.
1.1.4 Quantitative–Qualitative
Variable Distinction
 Nominal variables are qualitative: distinct categories
differ in quality, not in quantity.
 Interval variables are quantitative: distinct levels have
differing amounts of the characteristic of interest.
 The position of ordinal variables in the quantitative or
qualitative classification is fuzzy.
 Analysts often utilize the quantitative nature of ordinal
variables by assigning numerical scores to categories or
assuming an underlying continuous distribution.
 This requires good judgment and guidance from
researchers who use the scale, but it provides benefits
in the variety of methods available for data analysis.
Summary
 Continuous variable
 Ratio
 Interval
 Discrete
 Categorical
 Binary
 Ordinal
 Nominal
Calculation:
OK to compute....                    Nominal  Ordinal  Interval  Ratio
frequency distribution                 Yes      Yes      Yes      Yes
median and percentiles                 No       Yes      Yes      Yes
add or subtract                        No       No       Yes      Yes
mean, standard deviation,
  standard error of the mean           No       No       Yes      Yes
ratio, or coefficient of variation     No       No       No       Yes
Example 1: Grades measured as
 pass/fail
 A,B,C,D,F
 3.2, 4.1, 5.0, 2.1, …
 86,71,99 … of 100
Example 2
o Did you get the flu? (Yes or No) – a binary nominal
categorical variable
o What was the severity of your flu? (low, medium, or
high) – an ordinal categorical variable
 Context is important. The context of the study and
corresponding questions are important in specifying
what kind of variable we will analyze.
1.2 DISTRIBUTIONS FOR
CATEGORICAL DATA
 Inferential data analyses require assumptions about the
random mechanism that generated the data.
 For continuous variables: the normal distribution
 For categorical variables:
 Binomial
 Hypergeometric
 Multinomial
 Poisson
Overview of probability and
inference
 The basic problem we study in probability: Given a data
generating process, what are the properties of the
outcomes?
 The basic problem of statistical inference: Given the
outcomes (data), what can we say about the process
that generated the data?
[Diagram: data generating process → observed data (probability);
observed data → data generating process (inference)]
Random variable
 A random variable is the outcome of an experiment
(i.e. a random process) expressed as a number.
 We use capital letters near the end of the alphabet (X,
Y , Z, etc.) to denote random variables.
 Just like variables, probability distributions can be
classified as discrete or continuous.
Continuous Probability Distributions
 If a random variable is a continuous variable, its
probability distribution is called a continuous probability
distribution.
 A continuous probability distribution differs from a
discrete probability distribution in several ways.
 The probability that a continuous random variable
will assume a particular value is zero.
 As a result, a continuous probability distribution
cannot be expressed in tabular form.
 Instead, an equation or formula is used to describe
a continuous probability distribution.
Normal
 Most often, the equation used to describe a continuous
probability distribution is called a probability density
function. Sometimes, it is referred to as a density
function, or a PDF.
 Normal N(µ, σ²) PDF:
f(x; µ, σ²) = [1/√(2πσ²)] exp{−(x − µ)²/(2σ²)}
Chi-square distribution, PDF
Discrete random variables
 A discrete random variable is one which may take on only
a countable number of distinct values, such as 0, 1, 2, 3, 4, . . .
 Discrete random variables are usually (but not necessarily)
counts. If a random variable can take only a finite number of
distinct values, then it must be discrete.
 Examples:
 the number of children in a family
 the Friday night attendance at a cinema
 the number of patients in a doctor's surgery
 the number of defective light bulbs in a box of ten.
discrete random variable
 The probability distribution of a discrete random variable
is a list of probabilities associated with each of its possible
values. It is also sometimes called the probability function or
the probability mass function.
 Suppose a random variable X may take k different values,
with the probability that X = xi defined to be P(X = xi) = pi.
The probabilities pi must satisfy the following:
 0 ≤ pi ≤ 1 for each i
 p1 + p2 + ... + pk = 1.
Example
 Suppose a variable X can take the values 1, 2, 3, or 4.
The probabilities associated with each outcome are
described by the following table:
Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2
 The probability that X is equal to 2 or 3 is the sum of
the two probabilities: P(X = 2 or X = 3) = P(X = 2) +
P(X = 3) = 0.3 + 0.4 = 0.7.
 Similarly, the probability that X is
greater than 1 is equal to
1 - P(X = 1) = 1 - 0.1 = 0.9,
by the complement rule.
 This distribution may also be described by a probability histogram [figure not shown].
Properties
 E(X) = Σ x f(x)
 var(X) = Σ (x − E(X))² f(x)
 If the distribution depends on unknown parameters θ,
we write it as f(x; θ) or f(x | θ)
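A quick numerical check of these formulas in MATLAB, using the values and probabilities from the earlier example (the snippet itself is an illustrative addition, not from the text):

% Expectation and variance of a discrete random variable via the formulas above
x = 1:4;
f = [0.1 0.3 0.4 0.2];      % probabilities from the earlier example
EX = sum(x .* f)            % E(X) = 2.7
VX = sum((x - EX).^2 .* f)  % var(X) = 0.81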
1.2.0 Bernoulli Distribution
 The Bernoulli distribution is a discrete
probability distribution which takes value 1 with
success probability π and value 0 with failure
probability 1 − π. So if X is a random variable with this
distribution, we have
P(X = 1) = π, P(X = 0) = 1 − π,
or write it as
f(x; π) = π^x (1 − π)^(1−x), x = 0, 1.
Then E(X) = π and var(X) = π(1 − π).
1.2.1 Binomial Distribution
 Many applications refer to a fixed number n of binary
observations.
 Let y1 , y2 , . . . , yn denote responses for n
independent and identical trials (Bernoulli trials)
 Identical trials means that the probability of success π
is the same for each trial.
 Independent trials means that the Yi are independent
random variables.
 The total number of successes, Y = Y1 + Y2 + · · · + Yn,
has the binomial distribution with index n and
parameter π, denoted by bin(n, π)
 The probability mass function is
p(y) = C(n, y) π^y (1 − π)^(n−y), y = 0, 1, . . . , n,
where C(n, y) = n! / [y! (n − y)!] is the binomial coefficient.
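As a sanity check, the pmf can be evaluated directly from this formula; a one-line MATLAB sketch with illustrative numbers (n = 10, π = 0.5, y = 3, not from the text):

% P(Y = 3) for Y ~ bin(10, 0.5), straight from the pmf formula
n = 10; pi0 = 0.5; y = 3;
p = nchoosek(n, y) * pi0^y * (1 - pi0)^(n - y)   % = 0.1172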
[Figure: binomial pmf bin(25, π) for π = 0.10, 0.25, 0.50]
Moments
Because Yi = 1 or 0, Yi = Yi², so
E(Yi) = E(Yi²) = 1 · π + 0 · (1 − π) = π and var(Yi) = π(1 − π).
Hence E(Y) = nπ and var(Y) = nπ(1 − π).
Skewness: (1 − 2π) / √[nπ(1 − π)]
The distribution converges to
normality as n increases
[Figure: Binomial(25, 0.25) pmf with a normal density overlaid]
[Figure: Binomial(5, 0.25) pmf compared with the Normal(1.25, 0.96825) density]
1.2.2 Multinomial Distribution
Multiple possible outcomes
 Suppose that each of n independent, identical trials can
have outcome in any of c categories.
Let yij = 1 if trial i has outcome in category j,
and yij = 0 otherwise.
 Then yi = (yi1, yi2, . . . , yic) represents a multinomial trial, with Σj yij = 1.
 Let nj = Σi yij denote the number of trials having
outcome in category j.
 The counts (n1, n2, . . . , nc) have the multinomial
distribution.
pmf:
P(n1, n2, . . . , n(c−1)) = [n! / (n1! n2! · · · nc!)] π1^n1 π2^n2 · · · πc^nc,
where nc = n − (n1 + · · · + n(c−1)) and πj denotes the probability of outcome in category j.
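The pmf can likewise be evaluated directly; a short MATLAB sketch with illustrative numbers (c = 3 categories; the counts and probabilities are made up):

% Multinomial pmf for counts (3, 2, 1) in c = 3 categories with probabilities (0.5, 0.3, 0.2)
nj = [3 2 1]; pj = [0.5 0.3 0.2];
prob = factorial(sum(nj)) / prod(factorial(nj)) * prod(pj .^ nj)   % = 0.135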
1.2.3 Poisson Distribution
 Some count data do not result from a fixed number of trials.
 y = number of deaths due to automobile accidents on
motorways in Italy
 y takes values 0, 1, 2, . . . , with no fixed upper bound
 Poisson probability mass function (Poisson 1837):
p(y) = e^(−µ) µ^y / y!, y = 0, 1, 2, . . .
 It satisfies E(Y) = var(Y) = µ.
 Skewness: 1/√µ, so the distribution is closer to normal for larger µ.
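A small MATLAB check of the mean-variance property (µ = 4 is an illustrative choice; the sum is truncated at y = 30, where the remaining tail mass is negligible):

% Poisson pmf from the formula, and a numerical check that E(Y) = var(Y) = mu
mu = 4; y = 0:30;
p = exp(-mu) .* mu.^y ./ factorial(y);
EY = sum(y .* p)             % approximately 4
VY = sum((y - EY).^2 .* p)   % approximately 4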
Poisson
[Figure: Poisson pmf for µ = 5, 10, 15]
Poisson Distribution
 used for counts of events that occur randomly over
time or space, when outcomes in disjoint periods or
regions are independent.
 an approximation for the binomial when n is large and
π is small, with µ = nπ
 For example:
 n = 50 million people driving in Italy
 death rate/week: π = 0.000002
 the number of deaths is bin(n, π)
 or approximately Poisson with µ = nπ = 100
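A log-scale MATLAB check of this approximation using the slide's numbers, evaluated at y = 90 deaths (an arbitrary illustrative point; gammaln(k+1) = log k! avoids overflow):

% Compare the bin(n, pi) and Poisson(mu = n*pi) pmfs at y = 90
n = 5e7; pi0 = 2e-6; mu = n*pi0;   % mu = 100
y = 90;
logBin  = gammaln(n+1) - gammaln(y+1) - gammaln(n-y+1) ...
          + y*log(pi0) + (n-y)*log(1-pi0);
logPois = -mu + y*log(mu) - gammaln(y+1);
[exp(logBin) exp(logPois)]         % nearly identical probabilities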
1.2.4 Overdispersion
 A key feature of the Poisson distribution is that its
variance equals its mean.
 Sample counts vary more when their mean is
higher.
 Overdispersion: Count observations often exhibit
variability exceeding that predicted by the binomial or
Poisson.
1.2.5 Connection between Poisson
and Multinomial Distributions
 Example: In Italy this next week, let
 y1 = number of people who die in automobile accidents
 y2 = number who die in airplane accidents
 y3 = number who die in railway accidents
 (Y1, Y2, Y3) ~ independent Poisson(µ1), Poisson(µ2), Poisson(µ3)
 The total n = Y1 + Y2 + Y3 ~ Poisson(µ1 + µ2 + µ3)
 Here n is a random variable rather than fixed
 If n is given, (Y1, Y2, Y3) are no longer independent
Poisson. WHY? Once the total is fixed at n, the counts are
constrained to sum to n, so they are dependent.
The conditional distribution given that Y1 + Y2 + · · · + Yc = n:
let πj = µj / (µ1 + · · · + µc); then, given n,
(Y1, . . . , Yc) ~ multinomial(n; π1, . . . , πc)
multinomial distribution
vs. Poisson distribution
 Many categorical data analyses assume a multinomial
distribution.
 Such analyses usually have the same parameter
estimates as those of analyses assuming a Poisson
distribution, because of the similarity in the likelihood
functions.
1.3 STATISTICAL INFERENCE FOR
CATEGORICAL DATA (general)
 Once you have chosen a distribution for the categorical
variable, you need to estimate the parameters of that
distribution
 We first review general methods:
 Point estimation
 Confidence intervals
 Section 1.4: MLE for the binomial
 Section 1.5: MLE for the multinomial
Likelihood
 Likelihood is a tool for summarizing the data's evidence
about unknown parameters. Let us denote the
unknown parameter(s) of a distribution generically by θ.
 If we observe a random variable X = x from distribution
f(x|θ), then the likelihood associated with x, l(θ|x), is
simply the distribution f(x|θ) regarded as a function of
θ with x fixed.
 For example, if we observe x from bin(n, π), the
likelihood function is
l(π|x) = C(n, x) π^x (1 − π)^(n−x)
Likelihood
 The formula for the likelihood looks similar algebraically
to the f (x|) but the distinction should be clear!
 The distribution function is defined over the support of
discrete variable x with  given, whereas the likelihood
is defined over the continuous parameter space for .
 Consequently, a graph of the likelihood usually looks
different from a graph of the probability distribution.
 In most cases, we work with loglikelihood
)
|
(
log
)
|
( x
l
x
L 
 
)
1
log(
)
(
log
)
1
log(
)
(
log
log
)
|
(
log
)
|
(
























x
n
x
x
n
x
x
n
x
l
x
L
Loglikelihood function for bin(5, π) when we observe x = 0, x = 1, and x = 2
[Figure: plots of L(π|x) = x log π + (n − x) log(1 − π) against π for x = 0, 1, 2,
plus a fourth panel for bin(842 + 982, π) with x = 842 yes responses]
Likelihood
 In many problems of interest, we will derive our
loglikelihood from a sample rather than from a single
observation. If we observe an independent sample x1,
x2, …, xn from a distribution f(x|θ), then the overall
likelihood is the product of the individual likelihoods:
l(θ | x1, …, xn) = ∏ f(xi|θ) = ∏ l(θ|xi)   (products over i = 1, …, n)
and the loglikelihood is the sum:
L(θ | x1, …, xn) = Σ log f(xi|θ) = Σ L(θ|xi)   (sums over i = 1, …, n)
Log likelihood
 In regular problems, as the total sample size n grows,
the loglikelihood function does two things:
(a) it becomes more sharply peaked around its
maximum,
and (b) its shape becomes nearly quadratic
 the loglikelihood for a normal-mean problem is exactly
quadratic.
 That is, if we observe y1, . . . , yn from a normal
population with known variance σ², the loglikelihood is
L(µ) = −Σ (yi − µ)² / (2σ²) + constant,
and the analogous quadratic form holds in the multiparameter case.
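Both points can be seen numerically; a MATLAB sketch for the binomial loglikelihood, holding π̂ = 0.3 fixed while n grows (an illustrative example, not from the text):

% Binomial loglikelihoods, shifted so each maximum sits at 0:
% the peak sharpens and its shape becomes nearly quadratic as n grows
pi = 0.01:0.01:0.99;
hold on
for n = [10 100 1000]
    y = 0.3 * n;                        % keep pihat = y/n = 0.3 fixed
    L = y*log(pi) + (n-y)*log(1-pi);
    plot(pi, L - max(L))
end
hold off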
MLE (maximum likelihood
estimation)
 ML estimate for θ is the maximizer of L(θ) or,
equivalently, the maximizer of l(θ). This is the
parameter value under which the data observed have
the highest probability of occurrence.
 In regular problems, the ML estimate can be found by
setting to zero the first derivative(s) of l(θ) with
respect to θ.
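The likelihood equation can also be solved numerically; a minimal MATLAB sketch with made-up data (y = 9 successes in n = 10 trials):

% Numerical ML estimate for the binomial: minimize the negative loglikelihood
n = 10; y = 9;
negL = @(p) -(y*log(p) + (n-y)*log(1-p));   % negative binomial loglikelihood
phat = fminbnd(negL, 0.001, 0.999)          % about 0.9, matching y/n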
Transformations of parameters
 If l(θ) is a likelihood and φ = g(θ) is a one-to-one
function of the parameter with back-transformation
θ = g⁻¹(φ), then we can express the likelihood in terms of
φ as l(g⁻¹(φ)).
 Transformations may help us to improve the shape of
the loglikelihood.
 If the parameter space for θ has boundaries, we may
want to choose a transformation to the entire real
space.
 For example, consider the binomial loglikelihood
L(π|y) = y log π + (n − y) log(1 − π)
binomial loglikelihood
 If we apply the logit transformation β = log[π/(1 − π)],
 whose back-transformation is π = e^β/(1 + e^β),
 the loglikelihood in terms of β is
L(β|y) = yβ − n log(1 + e^β)
 If we observe y = 1 from a binomial with n = 5, the
loglikelihood in terms of β looks like this: [figure not shown]
% MATLAB: binomial loglikelihood on the pi scale (left) and the logit (beta) scale (right)
n=5; y=2;
pi=0.01:0.01:0.99;   % stay inside (0,1) so log() is finite
subplot(1,2,1)
plot(pi, y*log(pi)+(n-y)*log(1-pi), '-')
subplot(1,2,2)
beta=-3:0.1:3;
plot(beta, y*beta+n*log(1./(1+exp(beta))), '-')
[Figure: loglikelihood in π (left) and in β = logit(π) (right) for y = 2, n = 5]
 Transformations do not affect the location of the
maximum-likelihood (ML) estimate.
 If l(θ) is maximized at θ̂, then l(φ) is maximized at
φ̂ = g(θ̂).
score function
 A first derivative of L(θ) with respect to θ is called a
score function or simply a score.
 In a one-parameter problem, the score function from
an independent sample y1, . . . , yn is
L′(θ) = Σ ui(θ), where ui(θ) = ∂ log f(yi|θ)/∂θ
is the score contribution for yi.
 The ML estimate is usually the solution of the likelihood
equation L′(θ) = 0.
Mean of the score function.
 A well known property of the score is that it has mean
zero.
 The score is an expression that involves both the
parameter θ and the data Y. Because it involves Y, we
can take its expectation with respect to the data
distribution f(y|θ). The expected score is no longer a
function of Y, but it is still a function of θ. If we
evaluate this expected score at the "true value" of θ,
that is, at the same value of θ assumed when we took
the expectation, we get zero:
E[L′(θ)] = ∫ [∂ log f(y|θ)/∂θ] f(y|θ) dy = 0.
If certain differentiability conditions are met, the integral may be
rewritten as
∫ ∂f(y|θ)/∂θ dy = (∂/∂θ) ∫ f(y|θ) dy = ∂(1)/∂θ = 0.
 For example, in the case of the binomial proportion, we
have
E[u(π)] = E[Y/π − (n − Y)/(1 − π)] = nπ/π − (n − nπ)/(1 − π) = n − n = 0,
which is zero because E(Y) = nπ.
 If we apply a one-to-one transformation to the
parameter φ = g(θ), then the score function with
respect to the new parameter φ also has mean zero.
Estimating functions.
 This property of the score function—that it has an
expectation of zero when evaluated at the true
parameter θ—is a key to the modern theory of
statistical estimation.
 In the original theory of likelihood-based estimation, as
developed by R. A. Fisher and others, the ML estimate
θ̂ is viewed as the value of the parameter that, under
the parametric model, makes the observed data
most likely.
 More recently, statisticians have come to view θ̂ as the
solution of the score equation(s). That is, we now often
view an ML estimate as the solution to
L′(θ) = 0
estimating equations
 Any function of the data and the parameters having
mean zero at the true θ has this property as well.
Functions having the mean-zero property are called
estimating functions.
 Setting the estimating functions to zero gives
the estimating equations.
 In the case of the binomial proportion, for example,
Y − nπ
is a mean-zero estimating function, and so is the score itself,
[π(1 − π)]⁻¹ (Y − nπ).
Information and variance estimation.
 The variance of the score is known as the Fisher
information. In the case of a single parameter, the
Fisher information is
i(θ) = Var[L′(θ)] = E{[L′(θ)]²}.
 If θ has k parameters, the Fisher information is the
k × k covariance matrix of the score vector.
 Like the score function, the Fisher information is also a
function of θ. So we can evaluate it at any given value
of θ.
 Notice that i(θ) as we defined it is the expectation of a squared sum,
which, in many problems, can be messy.
 To actually compute the Fisher information, we usually
make use of the well known identity
i(θ) = −E[L″(θ)].
 In the multiparameter case, L″(θ) is the k × k matrix of
second derivatives whose (l, m)th element is
∂²L(θ) / ∂θl ∂θm.
 Why do we care about the Fisher information?
 It provides us with a way (several ways, actually) of
assessing the uncertainty in the ML estimate.
 It is well known that, in regular problems, θ̂ is
approximately normally distributed about the true θ
with variance given by the reciprocal (or, in the
multiparameter case, the matrix inverse) of the Fisher
information.
Two common ways to approximate the variance of θ̂
 The first way is to plug θ̂ into i(θ) and invert:
var(θ̂) ≈ [i(θ̂)]⁻¹.
This is commonly called the "expected information."
 The second way is to invert (minus one times) the
actual second derivative of the loglikelihood at θ = θ̂:
var(θ̂) ≈ [−L″(θ̂)]⁻¹.
This is commonly called the "observed information."
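A small MATLAB illustration with made-up data (y = 9, n = 10). For the binomial, the two versions coincide when evaluated at π̂, so they give the same variance estimate here:

% Expected vs observed information for the binomial, both evaluated at pihat
n = 10; y = 9; pihat = y/n;
i_exp = n / (pihat*(1-pihat));            % expected information i(pihat)
i_obs = y/pihat^2 + (n-y)/(1-pihat)^2;    % observed information -L''(pihat)
[1/i_exp, 1/i_obs]                        % identical variance estimates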
1.3.2 Likelihood Function and ML
Estimate for Binomial Parameter
 The binomial log likelihood is
L(π) = y log π + (n − y) log(1 − π).
 Differentiating with respect to π yields
L′(π) = y/π − (n − y)/(1 − π).
 Equating this to 0 gives the likelihood equation, which
has solution
π̂ = y/n,
the sample proportion of successes for the n trials.
 Calculating −L″(π), taking the expectation, and simplifying,
we get
i(π) = n / [π(1 − π)].
 Thus, the asymptotic variance of π̂ is [i(π)]⁻¹ = π(1 − π)/n.
 Indeed, from E(Y) = nπ and var(Y) = nπ(1 − π), the
distribution of π̂ = Y/n has mean π and standard error
√[π(1 − π)/n].
Likelihood function and MLE
summary
 We use maximum likelihood estimate (MLE)
 asymptotically normal
 asymptotically consistent
 asymptotically efficient
 Likelihood function
 probability of those data, treated as a function of
the unknown parameter.
 maximum likelihood (ML) estimate
 parameter value that maximizes this function
MLE and its variance
 If y1, y2, … , yn is a random sample from distribution
f(y|θ), then the score function is
L′(θ) = Σ ∂ log f(yi|θ)/∂θ.
 In regular problems, we can find the ML estimate by
setting the score function(s) to zero and solving for θ.
 The equations L′(θ) = 0 are called the score equations.
More generally, they can be called estimating equations
because their solution is the estimate for θ.
 We defined the Fisher information as the variance of
the score function: i(θ) = Var[L′(θ)] = −E[L″(θ)].
1.3.3 Wald–Likelihood Ratio–Score
Test Triad
 Three standard ways exist to use the likelihood function
to perform large-sample inference.
 Wald test
 Score test
 Likelihood ratio test
 We introduce these for a significance test of a null
hypothesis
H0: θ = θ0,
and then discuss their relation to interval estimation.
 They all exploit the large-sample normality of ML
estimators.
Wald test
 With nonnull standard error SE of θ̂, the test statistic
z = (θ̂ − θ0)/SE
has an approximate standard normal distribution when θ = θ0.
 One- or two-sided P-value from z.
 Or: z² has a chi-squared null distribution with 1 df.
The P-value is then the right-tailed chi-squared
probability above the observed value.
This type of statistic, using the nonnull standard
error, is called a Wald statistic (Wald 1943).
Wald test
 For a 0.05-level two-sided test, we reject H0 if
|θ̂ − θ0| / SE ≥ 1.96.
 Equivalently, if
(θ̂ − θ0)² / var(θ̂) ≥ 3.84 = 1.96²,
where 3.84 is the 95th percentile of χ²(1).
Wald test
 The multivariate extension for the Wald test of
H0: θ = θ0 has test statistic
W = (θ̂ − θ0)ᵀ [cov(θ̂)]⁻¹ (θ̂ − θ0),
where cov(θ̂) is the inverse of the information matrix.
 W has an asymptotic chi-squared distribution with df =
rank of cov(θ̂).
 The Wald test is not invariant to transformations. That is, a
Wald test on a transformed parameter φ = g(θ) may
yield a different P-value than a Wald test on the
original scale.
Likelihood ratio test
 Uses the likelihood function through the ratio of two
maximizations:
(1) the maximum over the possible parameter values
under H0;
(2) the maximum over the larger set of parameter values
permitting H0 or an alternative Ha to be true.
 The likelihood-ratio test statistic equals
−2(L0 − L1),
where L0 and L1 denote the maximized log-likelihood
functions under H0 and under H0 ∪ Ha, respectively.
 It has an asymptotic χ² distribution with
df = dim(H0 ∪ Ha) − dim(H0).
 Reject H0 if −2(L0 − L1) > χ²df(α = 0.05).
Score test
 The score test is based on the slope and expected
curvature of the log-likelihood function L(θ) at the null
value θ0.
 Score function:
u(θ0) = ∂L(θ)/∂θ, evaluated at θ = θ0.
The value u(θ0) tends to be larger in absolute value
when θ̂ is farther from θ0.
 Score statistic:
z = u(θ0) / √i(θ0), with i(θ0) = E[−∂²L(θ)/∂θ²] evaluated at θ0,
has an approximate standard normal null distribution.
 The chi-squared form of the score statistic is
z² = [u(θ0)]² / i(θ0).
Why is score statistic reasonable?
 Recall that the mean of the score is zero and its
variance is equal to the Fisher information.
 In a large sample, the score will also be approximately
normally distributed because it's a sum of iid random
variables.
 Therefore, the chi-squared form of the score statistic will
behave like a squared standard normal, χ²(1),
if H0 is true.
Wald–Likelihood Ratio–Score Test
 The three test statistics - Wald, LR and score are
asymptotically equivalent.
 The differences among them vanish in large samples if
the null hypothesis is true.
 If the null hypothesis is false, they may take very
different values. But in that case, all the test statistics
will be large, the p-values will be essentially zero, and
they will all lead us to reject H0.
 The score test does not require calculating the MLE.
 The LR test is invariant to transformations of the parameter.
 LR statistic uses the most information of the three
types of test statistic and is the most versatile.
1.3.4 Constructing confidence
intervals
 In practice, it is more informative to construct
confidence intervals for parameters than to test
hypotheses about their values.
 For any of the three test methods, a confidence interval
results from inverting the test. For instance, a 95%
confidence interval for θ is the set of θ0 for which the
test of H0: θ = θ0 has a P-value exceeding 0.05.
 Let z(a) denote the z-score from the standard normal
distribution having right-tailed probability a; this is the
100(1 − a) percentile of that distribution.
 Let χ²df(a) denote the 100(1 − a) percentile of the chi-
squared distribution with degrees of freedom df.
Tests and Confidence Intervals
At significance level α,
reject H0: θ = θ0 if
|θ̂ − θ0| / SE ≥ z(α/2).
The corresponding 100(1 − α)% confidence interval is
{θ0 : |θ̂ − θ0| / SE < z(α/2)},
that is, θ̂ ± z(α/2) · SE.
Confidence Intervals
 The Wald confidence interval is most common in
practice because it is simple to construct using ML
estimates and standard errors reported by statistical
software.
 The likelihood-ratio-based interval is becoming more
widely available in software and is preferable for
categorical data with small to moderate n.
 For the best known statistical model, regression for a
normal response, the three types of inference
necessarily provide identical results.
1.4 STATISTICAL INFERENCE FOR
BINOMIAL PARAMETERS
 Recall the log likelihood:
L(π|y) = y log π + (n − y) log(1 − π)
 Score function:
u(π) = y/π − (n − y)/(1 − π)
 MLE:
π̂ = y/n, with SE = √[π̂(1 − π̂)/n]
1.4.1 Tests about a Binomial
Parameter
 Since H0 has a single parameter, we use the normal
rather than chi-squared forms of the Wald and score test
statistics. They permit tests against one-sided as well
as two-sided alternatives.
 Wald statistic:
zW = (π̂ − π0) / √[π̂(1 − π̂)/n]
 Evaluating the binomial score and information at π0,
the normal form of the score statistic simplifies to
zS = (π̂ − π0) / √[π0(1 − π0)/n]
 Binomial log-likelihood under H0:
L0 = y log π0 + (n − y) log(1 − π0)
 Under Ha:
L1 = y log π̂ + (n − y) log(1 − π̂)
 The likelihood-ratio test statistic
−2(L0 − L1), or equivalently 2[y log(y/(nπ0)) + (n − y) log((n − y)/(n − nπ0))],
has an asymptotic chi-squared distribution with df = 1.
Test
 At significance level α, two-sided, reject H0 if:
 |zW| ≥ z(α/2) (Wald test)
 |zS| ≥ z(α/2) (score test)
 −2(L0 − L1) ≥ χ²1(α) (LR test)
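A worked MATLAB comparison of the three statistics with made-up data (y = 9 successes in n = 10 trials, H0: π = 0.5); in this small sample the three take visibly different values:

% Wald, score, and LR statistics for a binomial proportion
n = 10; y = 9; pi0 = 0.5;
pihat = y/n;
zW = (pihat - pi0) / sqrt(pihat*(1-pihat)/n)   % Wald:  about 4.22
zS = (pihat - pi0) / sqrt(pi0*(1-pi0)/n)       % score: about 2.53
LR = 2*( y*log(pihat/pi0) + (n-y)*log((1-pihat)/(1-pi0)) )  % LR chi-square: about 7.36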
1.4.2 Confidence Intervals for a
Binomial Parameter
 Inverting the Wald test gives the interval
π̂ ± z(α/2) √[π̂(1 − π̂)/n].
 Unfortunately, it performs poorly unless n is very large.
 The actual coverage probability usually falls below the
nominal confidence coefficient, much below when π is
near 0 or 1.
 An adjustment is needed (Problem 1.24).
Simulation to calculate coverage prob.
%let n=1000; %let pi=0.5; %let simuN=10000;
/* simulate simuN binomial counts k ~ bin(n, pi) */
data simu; drop i;
do i=1 to &simuN;
k=RAND('BINOMIAL',&pi,&n); output;
end;
run;
/* Wald 95% CI from each count; cover=1 if the interval contains pi */
data res; set simu;
pihat=k/&n;
lci=pihat-1.96*sqrt(pihat*(1-pihat)/&n);
uci=pihat+1.96*sqrt(pihat*(1-pihat)/&n);
if lci>&pi or uci<&pi then cover=0; else cover=1;
run;
proc sql;
select sum(cover)/&simuN as CoverageProbability from res;
quit;
It performs poorly if 1) n is small or 2) π is near 0 or 1. Try:
 %let n=1000; %let pi=0.5; %let simuN=10000;
 %let n=20; %let pi=0.5; %let simuN=10000;
 %let n=20; %let pi=0.1; %let simuN=10000;
 %let n=20; %let pi=0.9; %let simuN=10000;
An adjustment is needed. (Problem
1.24)
%let n=20; %let pi=0.5; %let simuN=10000;
data simu; drop i;
do i=1 to &simuN;
k=RAND('BINOMIAL',&pi,&n); output;
end;
run;
/* adjusted Wald: add 1.96**2/2 successes and 1.96**2 trials before forming the interval */
data res; set simu;
pihat=(k+1.96**2/2)/(&n+1.96*1.96);
lci=pihat-1.96*sqrt(pihat*(1-pihat)/(&n+1.96*1.96));
uci=pihat+1.96*sqrt(pihat*(1-pihat)/(&n+1.96*1.96));
if lci>&pi or uci<&pi then cover=0; else cover=1;
run;
proc sql;
select sum(cover)/&simuN as CoverageProbability from res;
quit;
score confidence interval
 The score confidence interval contains the π0 values for which
|π̂ − π0| / √[π0(1 − π0)/n] < z(α/2).
 Its endpoints are the π0 solutions to the equations
(π̂ − π0) / √[π0(1 − π0)/n] = ± z(α/2),
which are quadratic in π0. With z = z(α/2), this interval is
(y + z²/2)/(n + z²) ± [z/(n + z²)] √[nπ̂(1 − π̂) + z²/4],
so its midpoint is a weighted average of π̂ and 1/2.
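A minimal MATLAB sketch of this interval (95%, with made-up data y = 9, n = 10):

% Score (Wilson) interval computed from the closed form above
y = 9; n = 10; z = 1.96;
pihat = y/n;
mid  = (y + z^2/2) / (n + z^2);
half = z/(n + z^2) * sqrt(n*pihat*(1-pihat) + z^2/4);
[mid - half, mid + half]    % roughly (0.60, 0.98)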
LR-based confidence interval
 The likelihood-ratio-based confidence interval is more
complex computationally, but simple in principle.
 It is the set of π0 for which the likelihood-ratio test has
a P-value exceeding α.
 Equivalently, it is the set of π0 for which double the log
likelihood drops by less than χ²1(α) from its value at
the ML estimate.
1.4.3 Proportion of Vegetarians
Example

More Related Content

PPT
A review of statistics
PPT
Edison S Statistics
PPT
Edisons Statistics
PDF
Statistics and permeability engineering reports
PPTX
Meaning and Importance of Statistics
PPTX
050325Online SPSS.pptx spss social science
PDF
Review of Basic Statistics and Terminology
DOCX
·Quantitative Data Analysis StatisticsIntroductionUnd.docx
A review of statistics
Edison S Statistics
Edisons Statistics
Statistics and permeability engineering reports
Meaning and Importance of Statistics
050325Online SPSS.pptx spss social science
Review of Basic Statistics and Terminology
·Quantitative Data Analysis StatisticsIntroductionUnd.docx

Similar to mapping disease risk in space and time, re-mapping (20)

PPTX
Analyzing quantitative data
PDF
00 - Lecture - 04_MVA - Applications and Assumptions of MVA.pdf
PDF
Data presentation by nndd data presentation.pdf
PDF
76a15ed521b7679e372aab35412ab78ab583436a-1602816156135.pdf
PPTX
Presentation1
PDF
Introduction to basic statistics
PDF
Introduction to basic statistics
PDF
Frequency Distribution.pdf
PPTX
Chapter_1_Lecture.pptx
PDF
Selection of appropriate data analysis technique
PPTX
Breakdown of Regression Models for Dissertations
PPTX
Section 6 - Chapter 1 - Introduction to Statistics Part I
PPTX
Lect1.pptxdglsgldjtzjgd csjfsjtskysngfkgfhxvxfhhdhz
DOCX
this activity is designed for you to explore the continuum of an a.docx
PPTX
Introduction to Educational statistics and measurement
PDF
Lect 1_Biostat.pdf
PDF
Lecture 1 - Introduction to Data Analysis (1).pdf
PPT
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
PPTX
practical research -----PR2-LESSON1.pptx
PPTX
discriminate analysis of Biostatistics ppt for MPH Students
Analyzing quantitative data
00 - Lecture - 04_MVA - Applications and Assumptions of MVA.pdf
Data presentation by nndd data presentation.pdf
76a15ed521b7679e372aab35412ab78ab583436a-1602816156135.pdf
Presentation1
Introduction to basic statistics
Introduction to basic statistics
Frequency Distribution.pdf
Chapter_1_Lecture.pptx
Selection of appropriate data analysis technique
Breakdown of Regression Models for Dissertations
Section 6 - Chapter 1 - Introduction to Statistics Part I
Lect1.pptxdglsgldjtzjgd csjfsjtskysngfkgfhxvxfhhdhz
this activity is designed for you to explore the continuum of an a.docx
Introduction to Educational statistics and measurement
Lect 1_Biostat.pdf
Lecture 1 - Introduction to Data Analysis (1).pdf
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
practical research -----PR2-LESSON1.pptx
discriminate analysis of Biostatistics ppt for MPH Students
Ad

More from yonas381043 (6)

PPT
spatio-temporal modelling, in samall area
PPT
Statistical tests for categorical data(2020)88.ppt
PPT
spatial modeling of aggregated data in small area
PPT
Non-parametric methods of data analysis for non normal data
PPT
Categorical data analysis which part of the generalized linear model
PPT
Non-parametric statistics - a class of statistics associated with non paramet...
spatio-temporal modelling, in samall area
Statistical tests for categorical data(2020)88.ppt
spatial modeling of aggregated data in small area
Non-parametric methods of data analysis for non normal data
Categorical data analysis which part of the generalized linear model
Non-parametric statistics - a class of statistics associated with non paramet...
Ad

Recently uploaded (20)

PDF
Skin Care and Cosmetic Ingredients Dictionary ( PDFDrive ).pdf
PDF
CISA (Certified Information Systems Auditor) Domain-Wise Summary.pdf
PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PDF
Uderstanding digital marketing and marketing stratergie for engaging the digi...
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
PDF
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
PDF
advance database management system book.pdf
PDF
What if we spent less time fighting change, and more time building what’s rig...
PDF
FOISHS ANNUAL IMPLEMENTATION PLAN 2025.pdf
PDF
HVAC Specification 2024 according to central public works department
PPTX
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
PPTX
Core Concepts of Personalized Learning and Virtual Learning Environments
PDF
Mucosal Drug Delivery system_NDDS_BPHARMACY__SEM VII_PCI.pdf
PDF
Race Reva University – Shaping Future Leaders in Artificial Intelligence
PDF
MBA _Common_ 2nd year Syllabus _2021-22_.pdf
PDF
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
PDF
International_Financial_Reporting_Standa.pdf
PPTX
What’s under the hood: Parsing standardized learning content for AI
PDF
LIFE & LIVING TRILOGY - PART - (2) THE PURPOSE OF LIFE.pdf
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
Skin Care and Cosmetic Ingredients Dictionary ( PDFDrive ).pdf
CISA (Certified Information Systems Auditor) Domain-Wise Summary.pdf
Paper A Mock Exam 9_ Attempt review.pdf.
Uderstanding digital marketing and marketing stratergie for engaging the digi...
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
advance database management system book.pdf
What if we spent less time fighting change, and more time building what’s rig...
FOISHS ANNUAL IMPLEMENTATION PLAN 2025.pdf
HVAC Specification 2024 according to central public works department
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
Core Concepts of Personalized Learning and Virtual Learning Environments
Mucosal Drug Delivery system_NDDS_BPHARMACY__SEM VII_PCI.pdf
Race Reva University – Shaping Future Leaders in Artificial Intelligence
MBA _Common_ 2nd year Syllabus _2021-22_.pdf
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
International_Financial_Reporting_Standa.pdf
What’s under the hood: Parsing standardized learning content for AI
LIFE & LIVING TRILOGY - PART - (2) THE PURPOSE OF LIFE.pdf
Share_Module_2_Power_conflict_and_negotiation.pptx

mapping disease risk in space and time, re-mapping

  • 1. 1 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Topics 1. Introduction: Distributions and Inference for Categorical Data 2. Describing Contingency Tables 3. Inference for Contingency Tables 4. Introduction to Generalized Linear Models 5. Logistic Regression 6. Building and Applying Logistic Regression Models 7. Logit Models for Multinomial Responses
  • 2. 2 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Chapter 1 - Outline 1.1 Categorical Response Data 1.2 Distributions for Categorical Data 1.3 Statistical Inference for Categorical Data 1.4 Statistical Inference for Binomial Parameters 1.5 Statistical Inference for Multinomial Parameters
  • 3. 3 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference 1.1 CATEGORICAL RESPONSE DATA  A categorical variable has a measurement scale consisting of a set of categories.  political philosophy: liberal, moderate, or conservative.  brands of a product: brand A, brand B, and brand C  A categorical variable can be a response variable or independent variable  We consider primarily the CATEGORICAL RESPONSE DATA in this course
  • 4. 4 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference 1.1.1 Response–Explanatory Variable Distinction  Most statistical analyses distinguish between response (or dependent) variables and explanatory (or independent) variables.  For instance, regression models: selling price of a house = f(square footage, location)  In this book we focus on methods for categorical response variables.  As in ordinary regression, explanatory variables can be of any type.
  • 5. 5 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference 1.1.2 Nominal–Ordinal Scale Distinction  Nominal: Variables having categories without a natural ordering  religious affiliation: Catholic, Protestant, Jewish, Muslim, other.  mode of transportation: automobile, bicycle, bus, subway, walk  favorite type of music: classical, country, folk, jazz, rock  choice of residence: apartment, condominium, house, other.  For nominal variables, the order of listing the categories is irrelevant.  The statistical analysis does not depend on that ordering.
  • 6. 6 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Nominal or Ordinal  Ordinal: ordered categories  automobile: subcompact, compact, midsize, large  social class: upper, middle, lower  political philosophy: liberal, moderate, conservative  patient condition: good, fair, serious, critical.  Ordinal variables have ordered categories, but distances between categories are unknown.  Although a person categorized as moderate is more liberal than a person categorized as conservative, no numerical value describes how much more liberal that person is. Methods for ordinal variables utilize the category ordering.
  • 7. 7 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Interval variable  An interval variable is one that does have numerical distances between any two values.  blood pressure level  functional life length of television set  length of prison term  annual income  An internal variable is sometimes called a ratio variable if ratios of values are also valid. It has a clear definition of 0:  Height  Weight  enzyme activity
  • 8. 8 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference categories are not as clear cut as they sound  What kind of variable is color?  In a psychological study of perception, different colors would be regarded as nominal.  In a physics study, color is quantified by wavelength, so color would be considered a ratio variable.  What about counts?  If your dependent variable is the number of cells in a certain volume, what kind of variable is that. It has all the properties of a ratio variable, except it must be an integer.  Is that a ratio variable or not? These questions just point out that the classification scheme is appears to be more comprehensive than it is
  • 9. 9 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference  A variable’s measurement scale determines which statistical methods are appropriate.  In the measurement hierarchy,  interval variables are highest,  ordinal variables are next,  and nominal variables are lowest.  Statistical methods for variables of one type can also be used with variables at higher levels but not at lower levels.  For instance, statistical methods for nominal variables can be used with ordinal variables by ignoring the ordering of categories.  Methods for ordinal variables cannot, however, be used with nominal variables, since their categories have no meaningful ordering.  It is usually best to apply methods appropriate for the actual scale.
  • 10. 10 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference 1.1.3 Continuous–Discrete Variable Distinction  according to the number of values they can take  Actual measurement of all variables occurs in a discrete manner, due to precision limitations in measuring instruments.  The continuous / discrete classification, in practice, distinguishes between variables that take lots of values and variables that take few values.  Statisticians often treat discrete interval variables having a large number of values, such as test scores, as continuous
  • 11. 11 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference This class: Discretely measured responses can be:  Binary (two categories)  nominal variables (unordered)  ordinal variables (ordered)  discrete interval variables having relatively few values, and  continuous variables grouped into a small number of categories.
  • 12. 12 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference 1.1.4 Quantitative–Qualitative Variable Distinction  Nominal variables are qualitative distinct categories differ in quality, not in quantity.  Interval variables are quantitative distinct levels have differing amounts of the characteristic of interest.  The position of ordinal variables in the quantitative or qualitative classification is fuzzy.
  • 13. 13 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference  Analysts often utilize the quantitative nature of ordinal variables by assigning numerical scores to categories or assuming an underlying continuous distribution.  This requires good judgment and guidance from researchers who use the scale, but it provides benefits in the variety of methods available for data analysis.
  • 14. 14 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Summary  Continuous variable  Ratio  Interval  Discrete  Categorical  Binary  Ordinal  Nominal
  • 15. 15 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Calculation: OK to compute.... Nominal Ordinal Interval Ratio frequency distribution Yes Yes Yes Yes median and percentiles No Yes Yes Yes add or subtract No No Yes Yes mean, standard deviation, standard error of the mean No No Yes Yes ratio, or coefficient of variation No No No Yes
  • 16. 16 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Example1: Grades measured  pass/fail  A,B,C,D,F  3.2, 4.1, 5.0, 2.1, …  86,71,99 … of 100
  • 17. 17 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Example 2 o Did you get a flu? (Yes or No) – is a binary nominal categorical variable o What was the severity of your flu? (low, medium, or high) – is an ordinal categorical variable  Context is important. The context of the study and corresponding questions are important in specifying what kind of variable we will analyze.
  • 18. 18 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference 1.2 DISTRIBUTIONS FOR CATEGORICAL DATA  Inferential data analyses require assumptions about the random mechanism that generated the data.  For continuous variable, Normal distribution  For categorical variable  Binomial  hypergeometric distribution  Multinomial  Poisson
  • 19. 19 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Overview of probability and inference  The basic problem we study in probability: Given a data generating process, what are the properties of the outcomes?  The basic problem of statistical inference: Given the outcomes (data), what we can say about the process that generated the data? Observed data Data generating process probability inference
  • 20. 20 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Random variable  A random variable is the outcome of an experiment (i.e. a random process) expressed as a number.  We use capital letters near the end of the alphabet (X, Y , Z, etc.) to denote random variables.  Just like variables, probability distributions can be classified as discrete or continuous.
  • 21. 21 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Continuous Probability Distributions  If a random variable is a continuous variable, its probability distribution is called a continuous probability distribution.  A continuous probability distribution differs from a discrete probability distribution in several ways.  The probability that a continuous random variable will assume a particular value is zero.  As a result, a continuous probability distribution cannot be expressed in tabular form.  Instead, an equation or formula is used to describe a continuous probability distribution.
  • 22. 22 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Normal  Most often, the equation used to describe a continuous probability distribution is called a probability density function. Sometimes, it is referred to as a density function, or a PDF.  Normal N(µ, 2 ) PDF } 2 ) ( exp{ 2 1 ) , ; ( 2 2 2 2         x x f
  • 23. 23 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Chi-square distribution, PDF
  • 24. 24 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Discrete random variables  A discrete random variable is one which may take on only a countable number of distinct values such as 0,1,2,3,4,........  Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete.  Examples:  the number of children in a family  the Friday night attendance at a cinema  the number of patients in a doctor's surgery  the number of defective light bulbs in a box of ten.
  • 25. 25 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference discrete random variable  The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.  Suppose a random variable X may take k different values, with the probability that X = xi defined to be P(X = xi) = pi. The probabilities pi must satisfy the following:  0 < pi < 1 for each i  p1 + p2 + ... + pk = 1.
  • 26. 26 STA 517 – Introduction: Distribution and Inference STA 517 – Introduction: Distribution and Inference Example  Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities associated with each outcome are described by the following table: Outcome 1 2 3 4 Probability 0.1 0.3 0.4 0.2  The probability that X is equal to 2 or 3 is the sum of the two probabilities: P(X = 2 or X = 3) = P(X = 2) + P(X = 3) = 0.3 + 0.4 = 0.7.  Similarly, the probability that X is greater than 1 is equal to 1 - P(X = 1) = 1 - 0.1 = 0.9, by the complement rule.  This distribution may also be described by the probability histogram shown to the right
• 27. Properties
 E(X) = \sum_x x\, f(x)
 var(X) = \sum_x (x - E(X))^2 f(x)
 If the distribution depends on unknown parameters θ, we write it as f(x; θ) or f(x | θ).
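 A quick check of the slide-26 example and these moment formulas (an illustrative Python sketch):
   import numpy as np

   x = np.array([1, 2, 3, 4])
   p = np.array([0.1, 0.3, 0.4, 0.2])

   # P(X = 2 or X = 3) = P(X = 2) + P(X = 3)
   print(p[1] + p[2])                      # 0.7
   # Complement rule: P(X > 1) = 1 - P(X = 1)
   print(1 - p[0])                         # 0.9
   # E(X) = sum_x x f(x); var(X) = sum_x (x - E(X))^2 f(x)
   mean = np.sum(x * p)                    # 2.7
   var = np.sum((x - mean) ** 2 * p)       # 0.81
   print(mean, var)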
• 28. 1.2.0 Bernoulli Distribution
 The Bernoulli distribution is a discrete probability distribution which takes value 1 with success probability π and value 0 with failure probability 1 − π. So if X is a random variable with this distribution, we have
   P(X = 1) = \pi, \quad P(X = 0) = 1 - \pi,
 or, written as a probability mass function,
   f(x; \pi) = \pi^x (1 - \pi)^{1 - x}, \quad x \in \{0, 1\}.
 Then E(X) = π and var(X) = π(1 − π).
• 29. 1.2.1 Binomial Distribution
 Many applications refer to a fixed number n of binary observations.
 Let y_1, y_2, ..., y_n denote responses for n independent and identical trials (Bernoulli trials).
 Identical trials means that the probability of success π is the same for each trial.
 Independent trials means that the Y_i are independent random variables.
• 30.
 The total number of successes Y = \sum_{i=1}^n Y_i has the binomial distribution with index n and parameter π, denoted bin(n, π).
 The probability mass function is
   P(Y = y) = \binom{n}{y} \pi^y (1 - \pi)^{n - y}, \quad y = 0, 1, \ldots, n,
 where \binom{n}{y} = \frac{n!}{y!\,(n - y)!}.
• 31. [Figure: binomial pmf bin(25, π) for π = 0.10, 0.25, 0.50]
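 A short Python sketch (illustrative only) that reproduces this figure:
   import numpy as np
   import matplotlib.pyplot as plt
   from scipy.stats import binom

   n = 25
   y = np.arange(n + 1)
   for pi in (0.10, 0.25, 0.50):
       # pmf of bin(25, pi) at y = 0, ..., 25
       plt.plot(y, binom.pmf(y, n, pi), marker="o", label=f"pi = {pi}")
   plt.xlabel("y"); plt.ylabel("P(Y = y)"); plt.legend(); plt.show()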
• 32. Moments
 Because Y_i = 1 or 0, Y_i = Y_i^2, so
   E(Y_i) = E(Y_i^2) = 1 \times \pi + 0 \times (1 - \pi) = \pi.
 Hence var(Y_i) = π(1 − π), and for Y = \sum_i Y_i,
   E(Y) = n\pi, \quad \mathrm{var}(Y) = n\pi(1 - \pi).
 Skewness:
   \frac{1 - 2\pi}{\sqrt{n\pi(1 - \pi)}}
• 33.
 The distribution converges to normality as n increases.
 [Figure: pmf of Binomial(25, 0.25) with overlaid normal density]
• 34. [Figure: pmf of Binomial(5, 0.25) with overlaid Normal(1.25, 0.96825) density]
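 An illustrative Python comparison of the two panels (the normal parameters are mean nπ and SD √(nπ(1 − π)), which gives 1.25 and 0.96825 for n = 5):
   import numpy as np
   import matplotlib.pyplot as plt
   from scipy.stats import binom, norm

   pi = 0.25
   for n in (25, 5):
       mu, sd = n * pi, np.sqrt(n * pi * (1 - pi))
       y = np.arange(n + 1)
       plt.figure()
       plt.bar(y, binom.pmf(y, n, pi), alpha=0.5, label=f"Binomial({n}, {pi})")
       xs = np.linspace(0, n, 200)
       plt.plot(xs, norm.pdf(xs, mu, sd), label=f"Normal({mu}, {sd:.5f})")
       plt.legend()
   plt.show()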
• 35. 1.2.2 Multinomial Distribution
 Multiple possible outcomes: suppose that each of n independent, identical trials can have an outcome in any of c categories.
 Let y_{ij} = 1 if trial i has outcome in category j, and y_{ij} = 0 otherwise.
 Then (y_{i1}, ..., y_{ic}) represents a multinomial trial, with \sum_j y_{ij} = 1.
 Let n_j = \sum_i y_{ij} denote the number of trials having outcome in category j.
 The counts (n_1, ..., n_c) have the multinomial distribution.
• 36.
 The multinomial probability mass function: for counts with \sum_j n_j = n and probabilities with \sum_j \pi_j = 1,
   P(N_1 = n_1, \ldots, N_c = n_c) = \frac{n!}{n_1! \cdots n_c!}\, \pi_1^{n_1} \cdots \pi_c^{n_c}
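 A minimal Python sketch evaluating this pmf (the counts and probabilities below are hypothetical):
   from scipy.stats import multinomial

   # P(N1 = 2, N2 = 3, N3 = 5) in n = 10 trials with category probabilities (0.2, 0.3, 0.5)
   print(multinomial.pmf([2, 3, 5], n=10, p=[0.2, 0.3, 0.5]))   # = 0.08505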
• 37. 1.2.3 Poisson Distribution
 Sometimes count data do not result from a fixed number of trials.
 e.g., y = number of deaths due to automobile accidents on motorways in Italy; y can be any nonnegative integer.
 Poisson probability mass function (Poisson 1837):
   p(y; \mu) = \frac{e^{-\mu} \mu^y}{y!}, \quad y = 0, 1, 2, \ldots
 It satisfies E(Y) = var(Y) = μ.
 Skewness: 1 / \sqrt{\mu}
• 38. Poisson
 [Figure: Poisson pmfs for μ = 5, 10, 15]
• 39. Poisson Distribution
 Used for counts of events that occur randomly over time or space, when outcomes in disjoint periods or regions are independent.
 Also an approximation for the binomial when n is large and π is small, with μ = nπ.
 For example:
 n = 50 million people driving in Italy
 death rate per week: π = 0.000002
 the number of deaths is bin(n, π)
 or approximately Poisson with μ = nπ = 100
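 A quick numerical check of this approximation, using the slide's numbers (illustrative Python sketch):
   from scipy.stats import binom, poisson

   n, pi = 50_000_000, 0.000002
   mu = n * pi                              # 100
   for k in (80, 100, 120):
       # exact binomial probability vs. its Poisson approximation
       print(k, binom.pmf(k, n, pi), poisson.pmf(k, mu))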
• 40. 1.2.4 Overdispersion
 A key feature of the Poisson distribution is that its variance equals its mean.
 So sample counts vary more when their mean is higher.
 Overdispersion: count observations often exhibit variability exceeding that predicted by the binomial or Poisson.
• 41. 1.2.5 Connection between Poisson and Multinomial Distributions
 Example: in Italy this next week, let
 y_1 = number of people who die in automobile accidents
 y_2 = number who die in airplane accidents
 y_3 = number who die in railway accidents
 Suppose (Y_1, Y_2, Y_3) are independent Poisson with means (\mu_1, \mu_2, \mu_3).
 Then the total n = Y_1 + Y_2 + Y_3 ~ Poisson(\mu_1 + \mu_2 + \mu_3).
 Here n is a random variable rather than fixed.
 If n is given, (Y_1, Y_2, Y_3) are no longer independent Poisson. Why?
• 42.
 The conditional distribution, given that \sum_j Y_j = n, is multinomial: letting \pi_j = \mu_j / \sum_k \mu_k,
   (Y_1, \ldots, Y_c) \mid n \;\sim\; \text{multinomial}(n;\, \pi_1, \ldots, \pi_c)
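 A simulation sketch of this fact (Python, with hypothetical means): draw independent Poissons, keep draws with a given total, and compare the conditional mean counts with the multinomial means n μ_j / Σ μ_k:
   import numpy as np

   rng = np.random.default_rng(1)
   mu = np.array([8.0, 2.0, 5.0])
   draws = rng.poisson(mu, size=(200_000, 3))

   n = 15
   conditioned = draws[draws.sum(axis=1) == n]
   # empirical mean counts given the total, vs. multinomial means n * mu_j / sum(mu)
   print(conditioned.mean(axis=0))
   print(n * mu / mu.sum())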
• 43. Multinomial distribution vs. Poisson distribution
 Many categorical data analyses assume a multinomial distribution.
 Such analyses usually have the same parameter estimates as those of analyses assuming a Poisson distribution, because of the similarity in the likelihood functions.
• 44. 1.3 STATISTICAL INFERENCE FOR CATEGORICAL DATA (general)
 Once you choose the distribution of the categorical variable, you need to estimate the parameters of that distribution.
 We first review general methods:
 point estimation
 confidence intervals
 Section 1.4: MLE for the binomial
 Section 1.5: MLE for the multinomial
• 45. Likelihood
 Likelihood is a tool for summarizing the data's evidence about unknown parameters. Let us denote the unknown parameter(s) of a distribution generically by θ.
 If we observe a random variable X = x from distribution f(x|θ), then the likelihood associated with x, l(θ|x), is simply f(x|θ) regarded as a function of θ with x fixed.
 For example, if we observe x from bin(n, π), the likelihood function is
   l(\pi \mid x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x}
• 46. Likelihood
 The formula for the likelihood looks algebraically similar to f(x|θ), but the distinction should be clear!
 The distribution function is defined over the support of the discrete variable x with θ given, whereas the likelihood is defined over the continuous parameter space for θ.
 Consequently, a graph of the likelihood usually looks different from a graph of the probability distribution.
 In most cases, we work with the loglikelihood L(θ|x) = log l(θ|x); for the binomial,
   L(\pi \mid x) = \log \binom{n}{x} + x \log \pi + (n - x) \log(1 - \pi)
• 47. Loglikelihood function
 [Figure: loglikelihood l(π|x) = x log π + (n − x) log(1 − π) for bin(5, π) with observed x = 0, x = 1, and x = 2, and for bin(842 + 982, π) with x = 842 (yes)]
• 48. Likelihood
 In many problems of interest, we will derive our loglikelihood from a sample rather than from a single observation. If we observe an independent sample x_1, x_2, ..., x_n from a distribution f(x|θ), then the overall likelihood is the product of the individual likelihoods:
   l(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i \mid \theta) = \prod_{i=1}^n l(\theta \mid x_i)
 and the loglikelihood is
   L(\theta \mid x_1, \ldots, x_n) = \log \prod_{i=1}^n f(x_i \mid \theta) = \sum_{i=1}^n \log f(x_i \mid \theta) = \sum_{i=1}^n L(\theta \mid x_i)
• 49. Loglikelihood
 In regular problems, as the total sample size n grows, the loglikelihood function does two things: (a) it becomes more sharply peaked around its maximum, and (b) its shape becomes nearly quadratic.
 The loglikelihood for a normal-mean problem is exactly quadratic: if we observe y_1, ..., y_n from a normal population with known variance σ², the loglikelihood is
   L(\mu) = \text{const} - \frac{n(\bar{y} - \mu)^2}{2\sigma^2},
 or, in the multiparameter case with mean vector μ and known covariance matrix Σ,
   L(\mu) = \text{const} - \frac{n}{2} (\bar{y} - \mu)' \Sigma^{-1} (\bar{y} - \mu)
• 50. MLE (maximum likelihood estimation)
 The ML estimate for θ is the maximizer of l(θ) or, equivalently, the maximizer of L(θ). This is the parameter value under which the observed data have the highest probability of occurrence.
 In regular problems, the ML estimate can be found by setting to zero the first derivative(s) of L(θ) with respect to θ.
• 51. Transformations of parameters
 If l(θ) is a likelihood and φ = g(θ) is a one-to-one function of the parameter with back-transformation θ = g⁻¹(φ), then we can express the likelihood in terms of φ as l(g⁻¹(φ)).
 Transformations may help us to improve the shape of the loglikelihood.
 If the parameter space for θ has boundaries, we may want to choose a transformation onto the entire real line.
 For example, consider the binomial loglikelihood (dropping the constant term)
   L(\pi) = y \log \pi + (n - y) \log(1 - \pi)
• 52. Binomial loglikelihood
 If we apply the logit transformation
   \beta = \log \frac{\pi}{1 - \pi},
 whose back-transformation is
   \pi = \frac{e^\beta}{1 + e^\beta},
 then the loglikelihood in terms of β is
   L(\beta) = y\beta - n \log(1 + e^\beta)
• 53.
 If we observe y = 1 from a binomial with n = 5, the loglikelihood in terms of β looks like this:
 [Figure: loglikelihood L(β) for y = 1, n = 5]
• 54.
 MATLAB code plotting the loglikelihood on both scales:
   n=5; y=2;
   pi=0:0.01:1;
   subplot(1,2,1)
   plot(pi, y*log(pi)+(n-y)*log(1-pi), '-')
   subplot(1,2,2)
   beta=-3:0.1:3;
   plot(beta, y*beta+n*log(1./(1+exp(beta))), '-')
 [Figure: the two resulting panels, L(π) on the left and L(β) on the right]
• 55.
 Transformations do not affect the location of the maximum likelihood (ML) estimate.
 If l(θ) is maximized at θ̂, then the likelihood in terms of φ is maximized at φ̂ = g(θ̂).
• 56. Score function
 The first derivative of L(θ) with respect to θ is called a score function, or simply a score.
 In a one-parameter problem, the score function from an independent sample y_1, ..., y_n is
   L'(\theta) = \sum_{i=1}^n u_i(\theta),
 where u_i(\theta) = \partial \log f(y_i \mid \theta) / \partial \theta is the score contribution for y_i.
 The ML estimate is usually the solution of the likelihood equation L'(θ) = 0.
• 57. Mean of the score function
 A well-known property of the score is that it has mean zero.
 The score is an expression that involves both the parameter θ and the data Y. Because it involves Y, we can take its expectation with respect to the data distribution f(y|θ). The expected score is no longer a function of Y, but it is still a function of θ. If we evaluate this expected score at the "true value" of θ (that is, at the same value of θ assumed when we took the expectation), we get zero:
   E\!\left[ \frac{\partial L(\theta)}{\partial \theta} \right] = \int \frac{\partial \log f(y \mid \theta)}{\partial \theta}\, f(y \mid \theta)\, dy = 0
 If certain differentiability conditions are met, the integral may be rewritten as
   \int \frac{\partial f(y \mid \theta)}{\partial \theta}\, dy = \frac{\partial}{\partial \theta} \int f(y \mid \theta)\, dy = \frac{\partial}{\partial \theta}(1) = 0
• 58.
 For example, in the case of the binomial proportion, we have
   E[u(\pi)] = E\!\left[ \frac{Y}{\pi} - \frac{n - Y}{1 - \pi} \right] = \frac{n\pi}{\pi} - \frac{n - n\pi}{1 - \pi} = n - n = 0,
 which is zero because E(Y) = nπ.
 If we apply a one-to-one transformation to the parameter, φ = g(θ), then the score function with respect to the new parameter φ also has mean zero.
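 A small Python check of this property, enumerating the binomial support exactly (illustrative; n and π are arbitrary). The second line also previews the Fisher information defined a few slides later, since the variance of the score equals n/(π(1 − π)) here:
   import numpy as np
   from scipy.stats import binom

   n, pi = 10, 0.3
   y = np.arange(n + 1)
   w = binom.pmf(y, n, pi)                  # P(Y = y)
   u = y / pi - (n - y) / (1 - pi)          # score u(pi) evaluated at each y

   print(np.sum(w * u))                     # E[u(pi)] = 0 (up to rounding)
   print(np.sum(w * u**2), n / (pi * (1 - pi)))   # var of score vs. n/(pi(1-pi))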
• 59. Estimating functions
 This property of the score function (that it has an expectation of zero when evaluated at the true parameter θ) is a key to the modern theory of statistical estimation.
 In the original theory of likelihood-based estimation, as developed by R. A. Fisher and others, the ML estimate θ̂ is viewed as the value of the parameter that, under the parametric model, makes the observed data most likely.
 More recently, statisticians have come to view θ̂ as the solution of the score equation(s). That is, we now often view an ML estimate as the solution to L'(θ) = 0.
• 60. Estimating equations
 Any function of the data and the parameters having mean zero at the true θ has this property as well. Functions having the mean-zero property are called estimating functions.
 Setting the estimating functions to zero gives the estimating equations.
 In the case of the binomial proportion, for example, Y − nπ is a mean-zero estimating function, and so is [\pi(1 - \pi)]^{-1}(Y - n\pi), which is the score itself.
• 61. Information and variance estimation
 The variance of the score is known as the Fisher information. In the case of a single parameter, the Fisher information is
   i(\theta) = \mathrm{Var}\left[ L'(\theta) \right] = E\!\left[ \left( \frac{\partial L(\theta)}{\partial \theta} \right)^{\!2} \right]
 If θ has k parameters, the Fisher information is the k × k covariance matrix of the scores.
• 62.
 Like the score function, the Fisher information is also a function of θ, so we can evaluate it at any given value of θ.
 Notice that i(θ) as we defined it is the expected square of a sum, which in many problems can be messy.
 To actually compute the Fisher information, we usually make use of the well-known identity
   i(\theta) = E\!\left[ \left( L'(\theta) \right)^2 \right] = -E\!\left[ L''(\theta) \right]
• 63.
 In the multiparameter case, L''(θ) is the k × k matrix of second derivatives whose (l, m)th element is
   \frac{\partial^2 L(\theta)}{\partial \theta_l\, \partial \theta_m}
• 64.
 Why do we care about the Fisher information?
 Because it provides us with a way (several ways, actually) of assessing the uncertainty in the ML estimate.
 It is well known that, in regular problems, θ̂ is approximately normally distributed about the true θ, with variance given by the reciprocal (or, in the multiparameter case, the matrix inverse) of the Fisher information.
• 65. Two common ways to approximate the variance of θ̂
 The first way is to plug θ̂ into i(θ) and invert; this is commonly called the "expected information."
 The second way is to invert (minus one times) the actual second derivative of the loglikelihood at θ = θ̂; this is commonly called the "observed information."
• 66. 1.3.2 Likelihood Function and ML Estimate for Binomial Parameter
 The binomial loglikelihood is
   L(\pi) = y \log \pi + (n - y) \log(1 - \pi)
 Differentiating with respect to π yields
   L'(\pi) = \frac{y}{\pi} - \frac{n - y}{1 - \pi}
 Equating this to 0 gives the likelihood equation, which has solution \hat{\pi} = y/n, the sample proportion of successes for the n trials.
• 67.
 Calculating −L''(π) and taking the expectation, we get
   i(\pi) = -E[L''(\pi)] = E\!\left[ \frac{Y}{\pi^2} + \frac{n - Y}{(1 - \pi)^2} \right] = \frac{n}{\pi(1 - \pi)}
 Thus, the asymptotic variance of \hat{\pi} is i(\pi)^{-1} = \pi(1 - \pi)/n.
 In fact, from E(Y) = nπ and var(Y) = nπ(1 − π), the distribution of \hat{\pi} = Y/n has mean π and standard error
   \sigma(\hat{\pi}) = \sqrt{ \frac{\pi(1 - \pi)}{n} }
• 68. Likelihood function and MLE summary
 We use the maximum likelihood estimate (MLE), which is
 asymptotically normal
 asymptotically consistent
 asymptotically efficient
 Likelihood function: the probability of the observed data, treated as a function of the unknown parameter.
 Maximum likelihood (ML) estimate: the parameter value that maximizes this function.
• 69. MLE and its variance
 If y_1, y_2, ..., y_n is a random sample from distribution f(y|θ), then the score function is
   u(\theta) = L'(\theta) = \sum_{i=1}^n \frac{\partial \log f(y_i \mid \theta)}{\partial \theta}
 In regular problems, we can find the ML estimate by setting the score function(s) to zero and solving for θ.
 The equations L'(θ) = 0 are called the score equations. More generally, they can be called estimating equations, because their solution is the estimate for θ.
 We defined the Fisher information as the variance of the score function; it satisfies i(\theta) = -E[L''(\theta)].
• 70. 1.3.3 Wald–Likelihood Ratio–Score Test Triad
 Three standard ways exist to use the likelihood function to perform large-sample inference:
 Wald test
 score test
 likelihood ratio test
 We introduce these for a significance test of a null hypothesis H0: θ = θ0, and then discuss their relation to interval estimation.
 They all exploit the large-sample normality of ML estimators.
• 71. Wald test
 With nonnull standard error SE of θ̂, the test statistic
   z = \frac{\hat{\theta} - \theta_0}{SE}
 has an approximate standard normal distribution when θ = θ0.
 One- or two-sided P-values are obtained from z.
 Alternatively, z² has a chi-squared null distribution with 1 df; the P-value is then the right-tailed chi-squared probability above the observed value.
 This type of statistic, using the nonnull standard error, is called a Wald statistic (Wald 1943).
• 72. Wald test
 For a two-sided test at the 0.05 level, we reject H0 if
   \left| \frac{\hat{\theta} - \theta_0}{SE} \right| \ge 1.96
 Equivalently, if
   \frac{(\hat{\theta} - \theta_0)^2}{\mathrm{var}(\hat{\theta})} \ge 3.84 = 1.96^2,
 where 3.84 is the 95th percentile of χ²(1).
• 73. Wald test
 The multivariate extension for the Wald test of H0: θ = θ0 has test statistic
   W = (\hat{\theta} - \theta_0)' \left[ \mathrm{cov}(\hat{\theta}) \right]^{-1} (\hat{\theta} - \theta_0),
 where cov(θ̂) is the inverse of the information matrix.
 W has an asymptotic chi-squared distribution with df = rank of cov(θ̂).
 The Wald test is not invariant to transformations. That is, a Wald test on a transformed parameter φ = g(θ) may yield a different P-value than a Wald test on the original scale.
• 74. Likelihood ratio test
 Uses the likelihood function through the ratio of two maximizations:
 (1) the maximum over the possible parameter values under H0;
 (2) the maximum over the larger set of parameter values permitting H0 or an alternative Ha to be true.
 The likelihood-ratio test statistic equals
   -2(L_0 - L_1),
 where L0 and L1 denote the maximized loglikelihood functions under (1) and (2).
 It has an asymptotic χ² distribution with df = dim(Ha ∪ H0) − dim(H0).
 Reject H0 if the statistic exceeds χ²_df(α = 0.05).
• 75. Score test
 The score test is based on the slope and expected curvature of the loglikelihood function L(θ) at the null value θ0.
 The score function u(\theta_0) = L'(\theta_0) tends to be larger in absolute value when θ̂ is farther from θ0.
 The score statistic
   z = \frac{u(\theta_0)}{\sqrt{i(\theta_0)}} = \frac{u(\theta_0)}{\left[ -E\!\left( \partial^2 L(\theta_0)/\partial \theta^2 \right) \right]^{1/2}}
 has an approximate standard normal null distribution.
 The chi-squared form of the score statistic is
   z^2 = \frac{[u(\theta_0)]^2}{-E\!\left( \partial^2 L(\theta_0)/\partial \theta^2 \right)}
• 76. Why is the score statistic reasonable?
 Recall that the mean of the score is zero and its variance is equal to the Fisher information.
 In a large sample, the score will also be approximately normally distributed, because it is a sum of iid random variables.
 Therefore, the chi-squared form of the score statistic will behave like a squared standard normal, χ²(1), if H0 is true.
• 77. Wald–Likelihood Ratio–Score Test
 [Figure: loglikelihood curve illustrating the three tests: the Wald test uses the distance θ̂ − θ0, the likelihood ratio test uses the drop in the loglikelihood from its maximum, and the score test uses the slope at θ0]
• 78.
 The three test statistics (Wald, LR, and score) are asymptotically equivalent.
 The differences among them vanish in large samples if the null hypothesis is true.
 If the null hypothesis is false, they may take very different values. But in that case, all the test statistics will be large, the P-values will be essentially zero, and they will all lead us to reject H0.
 The score test does not require calculating the MLE.
 The LR test is scale-invariant.
 The LR statistic uses the most information of the three types of test statistic and is the most versatile.
• 79. 1.3.4 Constructing confidence intervals
 In practice, it is more informative to construct confidence intervals for parameters than to test hypotheses about their values.
 For any of the three test methods, a confidence interval results from inverting the test. For instance, a 95% confidence interval for θ is the set of θ0 for which the test of H0: θ = θ0 has a P-value exceeding 0.05.
 Let z_a denote the z-score from the standard normal distribution having right-tailed probability a; this is the 100(1 − a) percentile of that distribution.
 Let χ²_df(a) denote the 100(1 − a) percentile of the chi-squared distribution with degrees of freedom df.
• 80. Tests and confidence intervals
 At significance level α, reject H0: θ = θ0 if
   \left| \frac{\hat{\theta} - \theta_0}{SE} \right| \ge z_{\alpha/2}
 The 100(1 − α)% confidence interval is
   \left\{ \theta_0 : |\hat{\theta} - \theta_0| / SE < z_{\alpha/2} \right\} = \hat{\theta} \pm z_{\alpha/2}\, SE
• 81. Confidence intervals
 The Wald confidence interval is most common in practice because it is simple to construct using the ML estimates and standard errors reported by statistical software.
 The likelihood-ratio-based interval is becoming more widely available in software and is preferable for categorical data with small to moderate n.
 For the best-known statistical model, regression for a normal response, the three types of inference necessarily provide identical results.
• 82. 1.4 STATISTICAL INFERENCE FOR BINOMIAL PARAMETERS
 Recall the loglikelihood
   L(\pi \mid y) = y \log \pi + (n - y) \log(1 - \pi)
 Score function:
   u(\pi) = \frac{y}{\pi} - \frac{n - y}{1 - \pi}
 MLE: \hat{\pi} = y/n, with SE = \sqrt{ \hat{\pi}(1 - \hat{\pi}) / n }
• 83. 1.4.1 Tests about a Binomial Parameter
 Consider H0: π = π0. Since H0 has a single parameter, we use the normal rather than chi-squared forms of the Wald and score test statistics. They permit tests against one-sided as well as two-sided alternatives.
 Wald statistic:
   z_W = \frac{\hat{\pi} - \pi_0}{\sqrt{\hat{\pi}(1 - \hat{\pi})/n}}
 Evaluating the binomial score and information at π0, the normal form of the score statistic simplifies to
   z_S = \frac{u(\pi_0)}{\sqrt{i(\pi_0)}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}
• 84.
 The binomial loglikelihood under H0 is
   L_0 = y \log \pi_0 + (n - y) \log(1 - \pi_0)
 Under Ha,
   L_1 = y \log \hat{\pi} + (n - y) \log(1 - \hat{\pi})
 The likelihood-ratio test statistic
   -2(L_0 - L_1) = 2\left[ y \log \frac{\hat{\pi}}{\pi_0} + (n - y) \log \frac{1 - \hat{\pi}}{1 - \pi_0} \right]
 has an asymptotic chi-squared distribution with df = 1.
• 85. Test
 At significance level α, two-sided, reject H0 if:
 (Wald test) |z_W| = |\hat{\pi} - \pi_0| / \sqrt{\hat{\pi}(1 - \hat{\pi})/n} \ge z_{\alpha/2}
 (Score test) |z_S| = |\hat{\pi} - \pi_0| / \sqrt{\pi_0(1 - \pi_0)/n} \ge z_{\alpha/2}
 (LR test) -2(L_0 - L_1) \ge \chi^2_1(\alpha)
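 An illustrative Python sketch computing all three statistics for hypothetical data (y = 60 successes in n = 100 trials, testing H0: π = 0.5):
   import numpy as np
   from scipy.stats import norm, chi2

   y, n, pi0 = 60, 100, 0.5
   pihat = y / n

   z_wald = (pihat - pi0) / np.sqrt(pihat * (1 - pihat) / n)
   z_score = (pihat - pi0) / np.sqrt(pi0 * (1 - pi0) / n)
   lr = 2 * (y * np.log(pihat / pi0) + (n - y) * np.log((1 - pihat) / (1 - pi0)))

   # two-sided P-values
   print("Wald:  z =", z_wald,  "P =", 2 * norm.sf(abs(z_wald)))
   print("Score: z =", z_score, "P =", 2 * norm.sf(abs(z_score)))
   print("LR:    stat =", lr,   "P =", chi2.sf(lr, df=1))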
• 86. 1.4.2 Confidence Intervals for a Binomial Parameter
 Inverting the Wald test gives the Wald interval
   \hat{\pi} \pm z_{\alpha/2} \sqrt{ \hat{\pi}(1 - \hat{\pi}) / n }
 Unfortunately, it performs poorly unless n is very large.
 The actual coverage probability usually falls below the nominal confidence coefficient, much below when π is near 0 or 1.
 An adjustment is needed. (Problem 1.24)
• 87. Simulation to calculate coverage probability
   %let n=1000; %let pi=0.5; %let simuN=10000;
   data simu;
     drop i;
     do i=1 to &simuN;
       k=RAND('BINOMIAL',&pi,&n);
       output;
     end;
   run;
   data res;
     set simu;
     pihat=k/&n;
     lci=pihat-1.96*sqrt(pihat*(1-pihat)/&n);
     uci=pihat+1.96*sqrt(pihat*(1-pihat)/&n);
     if lci>&pi or uci<&pi then cover=0;
     else cover=1;
   run;
   proc sql;
     select sum(cover)/&simuN as coverageprobability from res;
   quit;
• 88. The Wald interval performs poorly if (1) n is small, or (2) π is near 0 or 1
 Rerun the simulation with these settings to see the coverage degrade:
   %let n=1000; %let pi=0.5; %let simuN=10000;
   %let n=20;   %let pi=0.5; %let simuN=10000;
   %let n=20;   %let pi=0.1; %let simuN=10000;
   %let n=20;   %let pi=0.9; %let simuN=10000;
• 89. An adjustment is needed. (Problem 1.24)
 One simple adjustment, used in the simulation on the next slide, adds z²_{α/2}/2 to the success count and z²_{α/2} to n before forming the Wald interval:
   \tilde{\pi} = \frac{y + z_{\alpha/2}^2 / 2}{n + z_{\alpha/2}^2}, \qquad \tilde{\pi} \pm z_{\alpha/2} \sqrt{ \frac{\tilde{\pi}(1 - \tilde{\pi})}{n + z_{\alpha/2}^2} }
• 90.
   %let n=20; %let pi=0.5; %let simuN=10000;
   data simu;
     drop i;
     do i=1 to &simuN;
       k=RAND('BINOMIAL',&pi,&n);
       output;
     end;
   run;
   data res;
     set simu;
     pihat=(k+1.96**2/2)/(&n+1.96*1.96);
     lci=pihat-1.96*sqrt(pihat*(1-pihat)/(&n+1.96*1.96));
     uci=pihat+1.96*sqrt(pihat*(1-pihat)/(&n+1.96*1.96));
     if lci>&pi or uci<&pi then cover=0;
     else cover=1;
   run;
   proc sql;
     select sum(cover)/&simuN as coverageprobability from res;
   quit;
• 91. Score confidence interval
 The score confidence interval contains the π0 values for which
   \frac{|\hat{\pi} - \pi_0|}{\sqrt{\pi_0(1 - \pi_0)/n}} < z_{\alpha/2}
 Its endpoints are the π0 solutions to the equations
   \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}} = \pm z_{\alpha/2},
 which are quadratic in π0. This interval is
   \frac{ \hat{\pi} + z_{\alpha/2}^2/(2n) \;\pm\; z_{\alpha/2} \sqrt{ \hat{\pi}(1 - \hat{\pi})/n + z_{\alpha/2}^2/(4n^2) } }{ 1 + z_{\alpha/2}^2/n }
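 A minimal Python sketch computing this interval directly from the formula (the data y = 8, n = 20 are hypothetical):
   import numpy as np
   from scipy.stats import norm

   y, n, alpha = 8, 20, 0.05
   z = norm.ppf(1 - alpha / 2)              # about 1.96 for alpha = 0.05
   pihat = y / n

   center = (pihat + z**2 / (2 * n)) / (1 + z**2 / n)
   half = (z / (1 + z**2 / n)) * np.sqrt(pihat * (1 - pihat) / n + z**2 / (4 * n**2))
   print(center - half, center + half)      # score interval endpoints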
• 92. LR-based confidence interval
 The likelihood-ratio-based confidence interval is more complex computationally, but simple in principle.
 It is the set of π0 for which the likelihood ratio test has a P-value exceeding α.
 Equivalently, it is the set of π0 for which double the loglikelihood drops by less than χ²_1(α) from its value at the ML estimate.
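 The "simple in principle" construction can be sketched with a grid search in Python (same hypothetical data, y = 8, n = 20):
   import numpy as np
   from scipy.stats import chi2

   y, n, alpha = 8, 20, 0.05
   pihat = y / n
   cutoff = chi2.ppf(1 - alpha, df=1)       # 3.84 for alpha = 0.05

   def loglik(pi):
       return y * np.log(pi) + (n - y) * np.log(1 - pi)

   grid = np.linspace(0.001, 0.999, 9999)
   keep = grid[2 * (loglik(pihat) - loglik(grid)) < cutoff]
   print(keep.min(), keep.max())            # approximate LR interval endpoints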
• 93. 1.4.3 Proportion of Vegetarians Example