Quadratic forms
Cochran’s theorem, degrees of freedom, and all that…
Dr. Frank Wood, fwood@stat.columbia.edu
Linear Regression Models
Why We Care
• Cochran’s theorem tells us about the distributions of
partitioned sums of squares of normally distributed
random variables.
• Traditional linear regression analysis relies upon
making statistical claims about the distribution of
sums of squares of normally distributed random
variables (and ratios between them)
– i.e. in the simple normal regression model
• Where does this come from?
SSE/σ² = Σ(Yi − Ŷi)²/σ² ∼ χ²(n − 2)
Outline
• Review some properties of multivariate
Gaussian distributions and sums of squares
• Establish the fact that the multivariate
Gaussian sum of squares is χ²(n) distributed
• Provide intuition for Cochran’s theorem
• Prove a lemma in support of Cochran’s
theorem
• Prove Cochran’s theorem
• Connect Cochran’s theorem back to matrix
linear regression
Preliminaries
• Let Y1, Y2, …, Yn be N(µi, σi²) random
variables.
• As usual define Zi = (Yi − µi)/σi
• Then we know that each Zi ~ N(0,1)
From Wackerly et al, 306
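A tiny numerical check of this standardization (a sketch; the particular µi and σi below are arbitrary illustrations, not from the slides): simulate the Yi, standardize them, and confirm the Zi look like N(0,1).

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0, -3.0, 10.0])      # assumed means for illustration
sigma = np.array([0.5, 2.0, 4.0])     # assumed standard deviations

# Draw many replicates of (Y1, Y2, Y3) and standardize each coordinate
Y = rng.normal(mu, sigma, size=(100_000, 3))
Z = (Y - mu) / sigma

print("mean(Z) ~ 0:", np.round(Z.mean(axis=0), 3))
print("std(Z)  ~ 1:", np.round(Z.std(axis=0), 3))
```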
Theorem 0 : Statement
• The sum of squares of n independent N(0,1)
random variables is χ² distributed with n
degrees of freedom
Σ_{i=1}^n Zi² ∼ χ²(n)
Theorem 0: Givens
• Proof requires knowing both
1. Zi² ∼ χ²(ν), ν = 1, or equivalently Zi² ∼ Γ(ν/2, 2), ν = 1
2. If Y1, Y2, …, Yn are independent random variables with
moment generating functions mY1(t), mY2(t), …, mYn(t),
then when U = Y1 + Y2 + … + Yn,
mU(t) = mY1(t) × mY2(t) × . . . × mYn(t)
and, from the uniqueness of moment generating functions,
mU(t) fully characterizes the distribution of U
Homework, midterm?
Theorem 0: Proof
• The moment generating function for a χ²(ν)
distribution is (Wackerly et al., back cover)
mZi²(t) = (1 − 2t)^{-ν/2}, where here ν = 1
• The moment generating function for V = Σ_{i=1}^n Zi²
is (by the given prerequisite)
mV(t) = mZ1²(t) × mZ2²(t) × · · · × mZn²(t)
Theorem 0: Proof (cont.)
• But
mV(t) = mZ1²(t) × mZ2²(t) × · · · × mZn²(t)
is just
mV(t) = (1 − 2t)^{-1/2} × (1 − 2t)^{-1/2} × · · · × (1 − 2t)^{-1/2} = (1 − 2t)^{-n/2}
• This is, by inspection, just the moment
generating function for a χ²(n) random variable,
which implies (by uniqueness) that
V = Σ_{i=1}^n Zi² ∼ χ²(n)
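A quick Monte Carlo sanity check of Theorem 0 (a minimal sketch; the sample size, n, and seed are arbitrary choices, not from the slides): simulate sums of squared standard normals and compare them against the χ²(n) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# V = sum of n squared independent N(0,1) variables, repeated many times
Z = rng.standard_normal((reps, n))
V = (Z**2).sum(axis=1)

# Compare the empirical distribution of V with chi-square(n)
ks = stats.kstest(V, stats.chi2(df=n).cdf)
print(f"mean(V) = {V.mean():.3f} (theory: {n})")
print(f"var(V)  = {V.var():.3f} (theory: {2*n})")
print(f"KS statistic vs chi2({n}): {ks.statistic:.4f}")
```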
Quadratic Forms and Cochran’s Theorem
• Quadratic forms of normal random variables
are of great importance in many branches of
statistics
– Least squares
– ANOVA
– Regression analysis
– etc.
• General idea
– Split the sum of the squares of observations into
a number of quadratic forms where each
corresponds to some cause of variation
Quadratic Forms and Cochran’s Theorem
• The conclusion of Cochran’s theorem is that,
under the assumption of normality, the
various quadratic forms are independent and
χ distributed.
• This fact is the foundation upon which many
statistical tests rest.
Preliminaries: A Common Quadratic Form
• Let x ∼ N(µ, Λ)
• Consider the (important) quadratic form that
appears in the exponent of the normal density
(x − µ)′ Λ⁻¹ (x − µ)
• In the special case of µ = 0 and Λ = I this
reduces to x′x, which by what we just proved
we know is χ²(n) distributed
• Let’s prove that this holds in the general case
Lemma 1
• Suppose that x ∼ N(µ, Λ) with |Λ| > 0. Then
(where n is the dimension of x)
(x − µ)′ Λ⁻¹ (x − µ) ∼ χ²(n)
• Proof: Set y = Λ^{-1/2}(x − µ); then
– E(y) = 0
– Cov(y) = Λ^{-1/2} Λ Λ^{-1/2} = I
– That is, y ∼ N(0, I) and thus
(x − µ)′ Λ⁻¹ (x − µ) = y′y ∼ χ²(n)
Note: this is sometimes called “sphering” the data
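A short simulation of Lemma 1 (a sketch; the specific µ, Λ, and sample size are made up for illustration): draw x ∼ N(µ, Λ), form the quadratic form in the exponent, and check it against χ²(n).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
Lam = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])   # an assumed positive definite covariance
n = len(mu)

# Draw many x ~ N(mu, Lam) and compute (x - mu)' Lam^{-1} (x - mu)
X = rng.multivariate_normal(mu, Lam, size=100_000)
D = X - mu
Q = np.einsum('ij,jk,ik->i', D, np.linalg.inv(Lam), D)

print(f"mean(Q) = {Q.mean():.3f} (theory: {n})")
print(f"KS vs chi2({n}): {stats.kstest(Q, stats.chi2(df=n).cdf).statistic:.4f}")
```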
The Path
• What do we have?
(x − µ)′ Λ⁻¹ (x − µ) = y′y ∼ χ²(n)
• Where are we going?
– (Cochran’s Theorem) Let X1, X2, …, Xn be
independent N(0, σ²)-distributed random
variables, and suppose that
Σ_{i=1}^n Xi² = Q1 + Q2 + . . . + Qk
where Q1, Q2, …, Qk are positive semi-definite
quadratic forms in the random variables X1, X2,
…, Xn, that is,
Qi = X′AiX, i = 1, 2, . . . , k
Cochran’s Theorem Statement
Set rank(Ai) = ri, i = 1, 2, …, k. If
r1 + r2 + . . . + rk = n
• Then
1. Q1, Q2, …, Qk are independent
2. Qi ∼ σ²χ²(ri)
Reminder: the rank of a matrix is the number
of linearly independent rows / columns in the matrix,
or, equivalently for the symmetric matrices used here, the number
of its non-zero eigenvalues
Closing the Gap
• We start with a lemma that will help us prove
Cochran’s theorem
• This lemma is a linear algebra result
• We also need to know a couple results
regarding linear transformations of normal
vectors
– We attend to those first.
Linear transformations
• Theorem 1: Let X be a normal random vector.
The components of X are independent iff they
are uncorrelated.
– Demonstrated in class by setting Cov(Xi, Xj) = 0
and then deriving product form of joint density
Linear transformations
• Theorem 2: Let X ~ N(µ, Λ) and set Y = C’X
where the orthogonal matrix C is such that
C’Λ C = D. Then Y ~ N(C’µ, D); the
components of Y are independent; and Var Yk
= λk, k = 1, …, n, where λ1, λ2, …, λn are the
eigenvalues of Λ
Look up singular value decomposition.
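A small sketch of Theorem 2 (the covariance matrix and sample size below are illustrative assumptions): diagonalize Λ with an orthogonal C, set Y = C′X, and check that the components of Y are empirically uncorrelated with variances equal to the eigenvalues of Λ.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.zeros(3)
Lam = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])   # assumed covariance for illustration

# Spectral decomposition of the symmetric covariance: C' Lam C = D
eigvals, C = np.linalg.eigh(Lam)     # columns of C are orthonormal eigenvectors
X = rng.multivariate_normal(mu, Lam, size=200_000)
Y = X @ C                            # row-wise version of Y = C'X

print("eigenvalues of Lam:", np.round(eigvals, 3))
print("sample Var(Y_k):   ", np.round(Y.var(axis=0), 3))
print("sample Cov(Y) (off-diagonals ~ 0):")
print(np.round(np.cov(Y, rowvar=False), 3))
```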
Orthogonal transforms of iid N(0, σ²) variables
• Let X ∼ N(µ, σ²I) where σ² > 0 and set Y =
CX where C is an orthogonal matrix. Then
Cov{Y} = C σ²I C′ = σ²I
• This leads to
• Theorem 3: Let X ∼ N(µ, σ²I) where σ² > 0,
let C be an arbitrary orthogonal matrix, and
set Y = CX. Then Y ∼ N(Cµ, σ²I); in particular,
Y1, Y2, …, Yn are independent normal random
variables with the same variance σ².
Where we are
• Now we can transform N(µ, Σ) random
vectors into N(C′µ, D) random vectors (and,
after centering, N(0, D)).
• We know that an orthogonal transformation of a
random vector X ∼ N(µ, σ²I) results in a
transformed vector whose elements are still
independent
• The preliminaries are over, now we proceed
to proving a lemma that forms the backbone
of Cochran’s theorem.
Lemma 2
• Let x1, x2, …, xn be real numbers. Suppose
that Σ xi² can be split into a sum of positive
semidefinite quadratic forms, that is,
Σ_{i=1}^n xi² = Q1 + Q2 + . . . + Qk
where Qi = x′Aix and (rank Qi =) rank Ai = ri,
i = 1, 2, …, k. If Σ ri = n then there exists an
orthogonal matrix C such that, with x = Cy, we
have…
Lemma 2 cont.
• Remark: Note that different quadratic forms contain
different y-variables and that the number of terms in
each Qi equals the rank, ri, of Qi
Q1 = y1² + y2² + . . . + y_{r1}²
Q2 = y_{r1+1}² + y_{r1+2}² + . . . + y_{r1+r2}²
Q3 = y_{r1+r2+1}² + y_{r1+r2+2}² + . . . + y_{r1+r2+r3}²
...
Qk = y_{n−rk+1}² + y_{n−rk+2}² + . . . + yn²
What’s the point?
• We won’t construct this matrix C, it’s just
useful for proving Cochran’s theorem.
• We care that
– The yi²’s end up in different sums – we’ll use this
to prove independence of the different quadratic
forms.
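A concrete sketch of what the lemma promises, using an illustrative decomposition (the choice Σ xi² = x′(I − (1/n)J)x + x′((1/n)J)x, with J the all-ones matrix, is the same decomposition that reappears in the sample-variance example later in the deck, not part of the lemma itself): build an orthogonal C whose first column is the normalized ones vector and check that the two quadratic forms use disjoint sets of yi in the y = C′x coordinates.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
x = rng.standard_normal(n)

J = np.ones((n, n))
A1 = np.eye(n) - J / n      # rank n-1, Q1 = sum of squared deviations from the mean
A2 = J / n                  # rank 1,   Q2 = (sum of xi)^2 / n

# Orthogonal C whose first column spans the ones direction (the range of A2)
M = np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))])
C, _ = np.linalg.qr(M)      # first column is ones/sqrt(n), up to sign
y = C.T @ x

Q1, Q2 = x @ A1 @ x, x @ A2 @ x
print(f"Q1 = {Q1:.4f}  vs  y_2^2 + ... + y_n^2 = {np.sum(y[1:]**2):.4f}")
print(f"Q2 = {Q2:.4f}  vs  y_1^2               = {y[0]**2:.4f}")
```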
Proof
• We prove the case of two quadratic forms (k = 2).
The general case is obtained by induction. [Gut 95]
• For k = 2 we have
Q = Σ_{i=1}^n xi² = x′A1x + x′A2x (= Q1 + Q2)
where A1 and A2 are positive semi-definite
matrices with ranks r1 and r2 respectively and
r1 + r2 = n
Proof: Cont.
• Since A1 is symmetric there exists an orthogonal matrix C
such that
C′A1C = D
where D is a diagonal matrix, the diagonal elements
of which are the eigenvalues of A1: λ1, λ2, …, λn.
• Since rank(A1) = r1, r1 eigenvalues are positive
and n − r1 eigenvalues equal zero.
• Suppose without loss of generality that the first r1
eigenvalues are positive and the rest are zero.
Proof: Cont.
• Set x = Cy
and remember that when C is an orthogonal
matrix,
x′x = (Cy)′(Cy) = y′C′Cy = y′y
Then
Q = Σ yi² = Σ_{i=1}^{r1} λi yi² + y′C′A2Cy
Proof: Cont.
• Or, rearranging terms slightly and expanding
the second matrix product,
Σ_{i=1}^{r1} (1 − λi) yi² + Σ_{i=r1+1}^{n} yi² = y′C′A2Cy
• The right-hand side, y′(C′A2C)y, is a positive semi-definite
quadratic form whose rank equals the rank of A2, namely
r2 (= n − r1); the left-hand side is diagonal with its last n − r1
coefficients already equal to one, so the first r1 coefficients
must vanish and we can conclude that
λ1 = λ2 = . . . = λ_{r1} = 1
and hence
Q1 = Σ_{i=1}^{r1} yi² and Q2 = Σ_{i=r1+1}^{n} yi²
which proves the lemma for the case k = 2.
What does this mean again?
• This lemma only has to do with real numbers,
not random variables.
• It says that if Σ xi² can be split into a sum of
positive semi-definite quadratic forms then
there is an orthogonal change of variables x = Cy
(or C′x = y) that gives each of the quadratic
forms some very nice properties,
foremost of which is that
– Each yi appears in only one resulting sum of
squares.
Cochran’s Theorem
Let X1, X2, …, Xn be independent N(0, σ²)-distributed
random variables, and suppose that
Σ_{i=1}^n Xi² = Q1 + Q2 + . . . + Qk
where Q1, Q2, …, Qk are positive semi-definite quadratic
forms in the random variables X1, X2, …, Xn, that is,
Qi = X′AiX, i = 1, 2, . . . , k
Set rank(Ai) = ri, i = 1, 2, …, k. If
r1 + r2 + . . . + rk = n
then
1. Q1, Q2, …, Qk are independent
2. Qi ∼ σ²χ²(ri)
Proof [from Gut 95]
• From the previous lemma we know that there exists
an orthogonal matrix C such that the transformation
X = CY yields
Q1 = Y1² + Y2² + . . . + Y_{r1}²
Q2 = Y_{r1+1}² + Y_{r1+2}² + . . . + Y_{r1+r2}²
Q3 = Y_{r1+r2+1}² + Y_{r1+r2+2}² + . . . + Y_{r1+r2+r3}²
...
Qk = Y_{n−rk+1}² + Y_{n−rk+2}² + . . . + Yn²
• But since every Yi² occurs in exactly one Qj and the
Yi’s are all independent N(0, σ²) RVs (because C is
an orthogonal matrix), Cochran’s theorem follows.
Huh?
• Best to work an example to understand why
this is important
• Let’s consider the distribution of a sample
variance (not a regression model yet). Let Yi,
i = 1…n, be samples from Y ∼ N(0, σ²). We can
use Cochran’s theorem to establish the
distribution of the sample variance (and its
independence from the sample mean).
Example
• Recall the form of SSTO for the regression model,
and note that SSTO = (n − 1) s²{Y}
SSTO = Σ(Yi − Ȳ)² = Σ Yi² − (Σ Yi)²/n
• Recognize that this can be rearranged and
then re-expressed in matrix form
Σ Yi² = Σ(Yi − Ȳ)² + (Σ Yi)²/n
Y′IY = Y′(I − (1/n)J)Y + Y′((1/n)J)Y
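A quick sketch verifying that the scalar and matrix forms agree (the data vector is arbitrary): Y′IY, Y′(I − (1/n)J)Y, and Y′((1/n)J)Y reproduce Σ Yi², Σ(Yi − Ȳ)², and (Σ Yi)²/n.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 8
Y = rng.standard_normal(n)

I, J = np.eye(n), np.ones((n, n))
print(np.isclose(Y @ I @ Y, np.sum(Y**2)))                       # Y'IY = sum Yi^2
print(np.isclose(Y @ (I - J/n) @ Y, np.sum((Y - Y.mean())**2)))  # deviations form
print(np.isclose(Y @ (J/n) @ Y, Y.sum()**2 / n))                 # (sum Yi)^2 / n
```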
Example cont.
• From earlier we know that
Y′IY ∼ σ²χ²(n)
but we can read off the rank of the quadratic
form as well (rank(I) = n)
• The ranks of the remaining quadratic forms
can be read off too (with some linear algebra
reminders)
Y′Y = Y′(I − (1/n)J)Y + Y′((1/n)J)Y
Linear Algebra Reminders
• For a symmetric and idempotent matrix A,
rank(A) = trace(A), the number of non-zero
eigenvalues of A.
– Is (1/n)J symmetric and idempotent?
– How about (I-(1/n)J)?
• trace(A + B) = trace(A) + trace(B)
• Assuming they are, we can read off the ranks
of each quadratic form
Y′IY = Y′(I − (1/n)J)Y + Y′((1/n)J)Y
rank: n        rank: n − 1        rank: 1
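A quick check of these reminders (a sketch; n is an arbitrary choice): build (1/n)J and I − (1/n)J, confirm they are symmetric and idempotent, and read their ranks off the trace.

```python
import numpy as np

n = 7
I = np.eye(n)
J = np.ones((n, n))
P_mean = J / n            # projects onto the ones direction
P_dev = I - J / n         # projects onto deviations from the mean

for name, A in [("(1/n)J", P_mean), ("I - (1/n)J", P_dev)]:
    symmetric = np.allclose(A, A.T)
    idempotent = np.allclose(A @ A, A)
    print(f"{name}: symmetric={symmetric}, idempotent={idempotent}, "
          f"trace={A.trace():.0f}, rank={np.linalg.matrix_rank(A)}")
```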
Cochran’s Theorem Usage
• Cochran’s theorem tells us, immediately, that for
Y′IY = Y′(I − (1/n)J)Y + Y′((1/n)J)Y
rank: n        rank: n − 1        rank: 1
each of the quadratic forms is χ² distributed, with degrees
of freedom given by the rank of the corresponding quadratic
form, and each sum of squares is independent of the others:
Σ Yi² ∼ σ²χ²(n),   Σ(Yi − Ȳ)² ∼ σ²χ²(n − 1),   (Σ Yi)²/n ∼ σ²χ²(1)
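A simulation sketch of this usage (the sample size, σ, and seed are illustrative choices): for Y ∼ N(0, σ²), check that Σ(Yi − Ȳ)² behaves like σ²χ²(n − 1) and is empirically uncorrelated with the sample mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, sigma, reps = 10, 2.0, 100_000

Y = rng.normal(0.0, sigma, size=(reps, n))
ybar = Y.mean(axis=1)
ss_dev = ((Y - ybar[:, None])**2).sum(axis=1)       # sum of squared deviations

# Sum of squared deviations / sigma^2 should be chi-square with n-1 df
ks = stats.kstest(ss_dev / sigma**2, stats.chi2(df=n - 1).cdf)
print(f"KS vs chi2({n-1}): {ks.statistic:.4f}")
# Independence of the two quadratic forms shows up here as ~zero correlation
print(f"corr(sample mean, sum of sq. deviations) = {np.corrcoef(ybar, ss_dev)[0,1]:.4f}")
```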
What about regression?
• Quick comment: in the preceding, one can
think about having modeled the population
with a single parameter model – the
parameter being the mean. The number of
degrees of freedom in the sample variance
sum of squares is reduced by the number of
parameters fit in the linear model (one, the
mean)
• Now – regression.
Rank of ANOVA Sums of Squares
• A slightly stronger version of Cochran’s
theorem is needed (we will assume it here) to
prove the following claims.
SSTO = Y′[I − (1/n)J]Y     rank: n − 1
SSE = Y′(I − H)Y           rank: n − p
SSR = Y′[H − (1/n)J]Y      rank: p − 1
(good midterm question)
Distribution of General Multiple Regression ANOVA Sums of Squares
• From Cochran’s theorem, knowing the ranks of
SSTO = Y′[I − (1/n)J]Y
SSE = Y′(I − H)Y
SSR = Y′[H − (1/n)J]Y
gives you this immediately
SSTO ∼ σ²χ²(n − 1)
SSE ∼ σ²χ²(n − p)
SSR ∼ σ²χ²(p − 1)
(For SSE this holds in general; for SSTO and SSR these are the
distributions under H0: β1 = . . . = βp−1 = 0, the setting of the
F test that follows.)
F Test for Regression Relation
• Now the test of whether there is a regression
relation between the response variable Y and
the set of X variables X1, …, Xp-1 makes more
sense
• The F distribution is defined to be the ratio of
two independent χ² random variables, each
normalized by its number of degrees of
freedom.
F Test Hypotheses
• If we want to choose between the alternatives
– H0: β1 = β2 = . . . = βp−1 = 0
– Ha: not all βk, k = 1, …, p − 1, equal zero
• We can use the defined test statistic
F∗ = MSR/MSE ∼ [σ²χ²(p − 1)/(p − 1)] / [σ²χ²(n − p)/(n − p)]   under H0
• The decision rule to control the Type I error at
α is
If F∗ ≤ F(1 − α; p − 1, n − p), conclude H0
If F∗ > F(1 − α; p − 1, n − p), conclude Ha
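A sketch of carrying out this F test numerically (the simulated data and α = 0.05 are illustrative assumptions, reusing the same kind of setup as the regression sketch above): compute MSR and MSE from the sums of squares and compare F∗ with the F(1 − α; p − 1, n − p) critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, alpha = 50, 3, 0.05
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(0.0, 1.0, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
SSE = Y @ (np.eye(n) - H) @ Y
SSR = Y @ (H - J / n) @ Y

MSR, MSE = SSR / (p - 1), SSE / (n - p)
F_star = MSR / MSE
F_crit = stats.f.ppf(1 - alpha, p - 1, n - p)       # F(1 - alpha; p-1, n-p)
p_value = stats.f.sf(F_star, p - 1, n - p)

print(f"F* = {F_star:.3f}, critical value = {F_crit:.3f}, p-value = {p_value:.4f}")
print("conclude Ha (regression relation)" if F_star > F_crit else "conclude H0")
```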