Lecture 10: Linear Mixed Models
(Linear Models with Random Effects)
Claudia Czado
TU München
Overview
West, Welch, and Galecki (2007)
Fahrmeir, Kneib, and Lang (2007) (Chapter 6)
• Introduction
• Likelihood Inference for Linear Mixed Models
– Parameter Estimation for known Covariance Structure
– Parameter Estimation for unknown Covariance Structure
– Confidence Intervals and Hypothesis Tests
Introduction
So far: independent response variables, but often
• Clustered Data
– response is measured for each subject
– each subject belongs to a group of subjects (cluster)
Ex.:
– math scores of students grouped by classroom (classroom forms a cluster)
– birth weights of rats grouped by litter (litter forms a cluster)
• Longitudinal Data
– response is measured at several time points
– number of time points is not too large (in contrast to time series)
Ex.: sales of a product at each month in a year (12 measurements)
Fixed and Random Factors/Effects
How can we extend the linear model to allow for such dependent data structures?
fixed factor = qualitative covariate (e.g. gender, age group)
fixed effect = quantitative covariate (e.g. age)
random factor = qualitative variable whose levels are randomly
sampled from a population of levels being studied
Ex.: 20 supermarkets were selected and their number of cashiers was reported:
10 supermarkets with 2 cashiers
5 supermarkets with 1 cashier
5 supermarkets with 5 cashiers
These are the observed levels of the random factor “number of cashiers”.
random effect = quantitative variable whose levels are randomly
sampled from a population of levels being studied
Ex.: 20 supermarkets were selected and their sizes were reported. These size values
are random samples from the population of size values of all supermarkets.
Modeling Clustered Data
Yij = response of j-th member of cluster i, i = 1, . . . , m, j = 1, . . . , ni
m = number of clusters
ni = size of cluster i
xij = covariate vector of j-th member of cluster i for fixed effects, ∈ Rp
β = fixed effects parameter, ∈ Rp
uij = covariate vector of j-th member of cluster i for random effects, ∈ Rq
γi = random effect parameter, ∈ Rq
Model:
Yij = xij^t β + uij^t γi + ǫij,   i = 1, . . . , m; j = 1, . . . , ni
where xij^t β is the fixed part and uij^t γi and ǫij are the random parts.
Mixed Linear Model (LMM) I
Assumptions:
γi ∼ Nq(0, D), D ∈ R^{q×q}
ǫi := (ǫi1, . . . , ǫi,ni)^t ∼ N_{ni}(0, Σi), Σi ∈ R^{ni×ni}
γ1, . . . , γm, ǫ1, . . . , ǫm independent
D = covariance matrix of random effects γi
Σi = covariance matrix of error vector ǫi in cluster i
Mixed Linear Model (LMM) II
Matrix Notation:
Xi := (xi1, . . . , xi,ni)^t ∈ R^{ni×p},  Ui := (ui1, . . . , ui,ni)^t ∈ R^{ni×q},  Yi := (Yi1, . . . , Yi,ni)^t ∈ R^{ni}
⇒
Yi = Xiβ + Uiγi + ǫi,   i = 1, . . . , m
γi ∼ Nq(0, D),  ǫi ∼ N_{ni}(0, Σi),  γ1, . . . , γm, ǫ1, . . . , ǫm independent   (1)
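To make the ingredients of (1) concrete, here is a minimal numpy sketch that simulates clustered data from a random-intercept LMM. All dimensions and parameter values are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m, ni, q = 10, 5, 1                  # clusters, cluster size, dim(gamma_i) -- toy values
beta = np.array([1.0, -0.5])         # fixed effects (intercept, slope)
D = np.array([[0.8]])                # Cov(gamma_i): random-intercept variance
sigma2 = 0.5                         # Sigma_i = sigma2 * I_{ni}

Ys, Xs, Us = [], [], []
for i in range(m):
    X_i = np.column_stack([np.ones(ni), rng.normal(size=ni)])  # intercept + one covariate
    U_i = np.ones((ni, q))                                     # random-intercept design
    gamma_i = rng.multivariate_normal(np.zeros(q), D)          # gamma_i ~ N_q(0, D)
    eps_i = rng.normal(scale=np.sqrt(sigma2), size=ni)         # eps_i ~ N(0, sigma2 * I)
    Ys.append(X_i @ beta + U_i @ gamma_i + eps_i)              # model (1)
    Xs.append(X_i); Us.append(U_i)
```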
Modeling Longitudinal Data
Yij = response of subject i at j-th measurement, i = 1, . . . , m, j = 1, . . . , ni
ni = number of measurements for subject i
m = number of subjects
xij = covariate vector of i-th subject at j-th measurement
for fixed effects β ∈ Rp
uij = covariate vector of i-th subject at j-th measurement
for random effects γi ∈ Rq
In matrix notation:
Yi = Xiβ + Uiγi + ǫi,   γi ∼ Nq(0, D),  ǫi ∼ N_{ni}(0, Σi),  γ1, . . . , γm, ǫ1, . . . , ǫm independent
Remark: The general form of the mixed linear model is the same for clustered
and longitudinal observations.
Matrix Formulation of the Linear Mixed Model
Y := (Y1^t, . . . , Ym^t)^t ∈ R^n, where n := Σ_{i=1}^m ni
X := (X1^t, . . . , Xm^t)^t ∈ R^{n×p},  β ∈ R^p
G := diag(D, . . . , D) ∈ R^{mq×mq}
U := blockdiag(U1, . . . , Um) ∈ R^{n×(m·q)}, i.e. block row i carries Ui and zero blocks 0_{ni×q} elsewhere
γ := (γ1^t, . . . , γm^t)^t ∈ R^{m·q},  ǫ := (ǫ1^t, . . . , ǫm^t)^t ∈ R^n
R := blockdiag(Σ1, . . . , Σm) ∈ R^{n×n}
Linear Mixed Model (LMM) in matrix formulation
With this, the linear mixed model (1) can be rewritten as
Y = Xβ + Uγ + ǫ (2)
where

(γ^t, ǫ^t)^t ∼ N_{mq+n}( 0, [G  0_{mq×n};  0_{n×mq}  R] )
Remarks:
• LMM (2) can be rewritten as a two-level hierarchical model:
Y | γ ∼ Nn(Xβ + Uγ, R)   (3)
γ ∼ N_{mq}(0, G)   (4)
• Let Y = Xβ + ǫ∗, where ǫ∗ := Uγ + ǫ = A (γ^t, ǫ^t)^t with A := (U  I_{n×n}).
By (2), ǫ∗ ∼ Nn(0, V), where
V = A [G 0; 0 R] A^t = (U  I_{n×n}) [G 0; 0 R] (U  I_{n×n})^t = (UG  R) (U  I_{n×n})^t = UGU^t + R
Therefore (2) implies
Y = Xβ + ǫ∗,  ǫ∗ ∼ Nn(0, V)   (5)  (marginal model)
• (2) or (3)+(4) implies (5); however, (5) does not imply (3)+(4).
⇒ If one is only interested in estimating β, one can use the ordinary linear model (5).
If one is interested in estimating β and γ, one has to use model (3)+(4).
Likelihood Inference for LMM:
1) Estimation of β and γ for known G and R
Estimation of β: Using (5), we have as MLE or weighted LSE of β
β̃ := (X^t V^{-1} X)^{-1} X^t V^{-1} Y   (6)
Recall: Y = Xβ + ǫ, ǫ ∼ Nn(0, Σ), Σ known, Σ = Σ^{1/2} (Σ^{1/2})^t
⇒ Σ^{-1/2} Y = Σ^{-1/2} Xβ + Σ^{-1/2} ǫ, where Σ^{-1/2} ǫ ∼ Nn(0, Σ^{-1/2} Σ (Σ^{-1/2})^t = In)   (7)
⇒ LSE of β in (7):
β̂ = (X^t (Σ^{-1/2})^t Σ^{-1/2} X)^{-1} X^t (Σ^{-1/2})^t Σ^{-1/2} Y = (X^t Σ^{-1} X)^{-1} X^t Σ^{-1} Y   (8)
This estimate is called the weighted LSE.
Exercise: Show that (8) is the MLE in Y = Xβ + ǫ, ǫ ∼ Nn(0, Σ).
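A short numpy sketch of the weighted LSE (6)/(8), assuming the covariance matrix (V or Σ) is known, as in this section; the helper name weighted_lse is my own.

```python
import numpy as np

def weighted_lse(X, V, y):
    """Weighted LSE (6)/(8): (X^t V^-1 X)^-1 X^t V^-1 y,
    computed via linear solves rather than forming V^-1 explicitly."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
```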
Estimation of γ:
From (3) and (4) it follows that Y ∼ Nn(Xβ, V) and γ ∼ N_{mq}(0, G).
Cov(Y, γ) = Cov(Xβ + Uγ + ǫ, γ) = Cov(Xβ, γ) + U Var(γ) + Cov(ǫ, γ) = 0 + UG + 0 = UG
⇒ (Y^t, γ^t)^t ∼ N_{n+mq}( ((Xβ)^t, 0^t)^t, [V  UG;  GU^t  G] )
Recall: if X = (Y^t, Z^t)^t ∼ Np( (µY^t, µZ^t)^t, [ΣY  ΣYZ;  ΣZY  ΣZ] ), then Z | Y ∼ N(µ_{Z|Y}, Σ_{Z|Y}) with
µ_{Z|Y} = µZ + ΣZY ΣY^{-1} (Y − µY),  Σ_{Z|Y} = ΣZ − ΣZY ΣY^{-1} ΣYZ
Hence
E(γ | Y) = 0 + GU^t V^{-1} (Y − Xβ)   (9)
is the best linear unbiased predictor (BLUP) of γ.
Therefore γ̃ := GU^t V^{-1} (Y − Xβ̃) is the empirical BLUP (EBLUP).
Joint maximization of the log likelihood of (Y^t, γ^t)^t with respect to (β, γ):
By (3)+(4),
f(y, γ) = f(y | γ) · f(γ) ∝ exp{−½ (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ)} · exp{−½ γ^t G^{-1} γ}
⇒ ln f(y, γ) = −½ (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ) − ½ γ^t G^{-1} γ + constants ind. of (β, γ),
where −½ γ^t G^{-1} γ acts as a penalty term for γ.
So it is enough to minimize
Q(β, γ) := (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ) + γ^t G^{-1} γ
= y^t R^{-1} y − 2β^t X^t R^{-1} y + 2β^t X^t R^{-1} Uγ − 2γ^t U^t R^{-1} y + β^t X^t R^{-1} Xβ + γ^t U^t R^{-1} Uγ + γ^t G^{-1} γ
Recall (vector derivatives):
f(α) := α^t b = Σ_{j=1}^n αj bj  ⇒  ∂f/∂αi = bi,  ∂f/∂α = b
g(α) := α^t A α = Σ_i Σ_j αi αj aij
⇒ ∂g/∂αi = 2αi aii + Σ_{j≠i} αj aij + Σ_{j≠i} αj aji = 2 Σ_{j=1}^n αj aij = 2 Ai^t α  (for symmetric A; Ai^t denotes the i-th row of A)
⇒ ∂g/∂α = 2Aα
Mixed Model Equation
∂Q/∂β = −2X^t R^{-1} y + 2X^t R^{-1} Uγ + 2X^t R^{-1} Xβ  (set = 0)
∂Q/∂γ = −2U^t R^{-1} Xβ − 2U^t R^{-1} y + 2U^t R^{-1} Uγ + 2G^{-1} γ  (set = 0)
⇔ X^t R^{-1} X β̃ + X^t R^{-1} U γ̃ = X^t R^{-1} y
U^t R^{-1} X β̃ + (U^t R^{-1} U + G^{-1}) γ̃ = U^t R^{-1} y
⇔ [X^t R^{-1} X  X^t R^{-1} U;  U^t R^{-1} X  U^t R^{-1} U + G^{-1}] (β̃^t, γ̃^t)^t = ((X^t R^{-1} y)^t, (U^t R^{-1} y)^t)^t   (10)
Exercise: Show that β̃ and γ̃, defined by (6) and (9) respectively, solve (10).
Define C := (X  U),  B := [0  0;  0  G^{-1}]
⇒ C^t R^{-1} C = [X^t; U^t] R^{-1} (X  U) = [X^t R^{-1} X  X^t R^{-1} U;  U^t R^{-1} X  U^t R^{-1} U]
⇒ (10) ⇔ (C^t R^{-1} C + B) (β̃^t, γ̃^t)^t = C^t R^{-1} y
⇔ (β̃^t, γ̃^t)^t = (C^t R^{-1} C + B)^{-1} C^t R^{-1} y
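A numpy sketch solving (10) in this compact form, assuming G is invertible; the helper name solve_mme is illustrative.

```python
import numpy as np

def solve_mme(X, U, R, G, y):
    """Solve (C^t R^-1 C + B) (beta, gamma) = C^t R^-1 y,
    with C = (X, U) and B = blockdiag(0, G^-1), i.e. the MME (10)."""
    C = np.hstack([X, U])
    p = X.shape[1]
    B = np.zeros((C.shape[1], C.shape[1]))
    B[p:, p:] = np.linalg.inv(G)                       # G^-1 block; G assumed invertible
    lhs = C.T @ np.linalg.solve(R, C) + B              # C^t R^-1 C + B
    rhs = C.T @ np.linalg.solve(R, y)                  # C^t R^-1 y
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]                            # (beta_tilde, gamma_tilde)
```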
2) Estimation for unknown covariance structure
We assume now in the marginal model (5)
Y = Xβ + ǫ∗,  ǫ∗ ∼ Nn(0, V) with V = UGU^t + R,
that G and R are only known up to the variance parameter ϑ, i.e. we write
V(ϑ) = UG(ϑ)U^t + R(ϑ)
ML Estimation in extended marginal model
Y = Xβ + ǫ∗,  ǫ∗ ∼ Nn(0, V(ϑ)) with V(ϑ) = UG(ϑ)U^t + R(ϑ)
Log-likelihood for (β, ϑ):
l(β, ϑ) = −½ {ln |V(ϑ)| + (y − Xβ)^t V(ϑ)^{-1} (y − Xβ)} + const. ind. of (β, ϑ)   (11)
If we maximize (11) for fixed ϑ with respect to β, we get
β̃(ϑ) := (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} y
Then the profile log-likelihood is
lp(ϑ) := l(β̃(ϑ), ϑ) = −½ {ln |V(ϑ)| + (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))}
Maximizing lp(ϑ) with respect to ϑ gives the MLE ϑ̂ML. However, ϑ̂ML is biased; this is
why one often uses restricted ML estimation (REML) instead.
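A numpy/scipy sketch of lp(ϑ), assuming a user-supplied map build_V from ϑ to V(ϑ); the maximization can then be done numerically, e.g. with scipy.optimize.minimize applied to −lp.

```python
import numpy as np
from scipy.optimize import minimize

def profile_loglik(theta, X, y, build_V):
    """Profile log-likelihood l_p(theta); build_V(theta) returns V(theta)."""
    V = build_V(theta)
    _, logdet_V = np.linalg.slogdet(V)
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ np.linalg.solve(V, y))  # beta_tilde(theta)
    r = y - X @ beta
    return -0.5 * (logdet_V + r @ np.linalg.solve(V, r))

# e.g.: theta_ml = minimize(lambda th: -profile_loglik(th, X, y, build_V), theta0).x
```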
Restricted ML Estimation in extended marginal model
Here we use for the estimation of ϑ the marginal log-likelihood
lR(ϑ) := ln( ∫ L(β, ϑ) dβ ),
∫ L(β, ϑ) dβ = ∫ (2π)^{-n/2} |V(ϑ)|^{-1/2} exp{−½ (y − Xβ)^t V(ϑ)^{-1} (y − Xβ)} dβ
Consider:
(y − Xβ)^t V(ϑ)^{-1} (y − Xβ) = β^t A(ϑ) β − 2y^t V(ϑ)^{-1} Xβ + y^t V(ϑ)^{-1} y, where A(ϑ) := X^t V(ϑ)^{-1} X
= (β − B(ϑ)y)^t A(ϑ) (β − B(ϑ)y) + y^t V(ϑ)^{-1} y − y^t B(ϑ)^t A(ϑ) B(ϑ) y
where B(ϑ) := A(ϑ)^{-1} X^t V(ϑ)^{-1}
(Note that B(ϑ)^t A(ϑ) = V(ϑ)^{-1} X A(ϑ)^{-1} A(ϑ) = V(ϑ)^{-1} X.)
Therefore we have
∫ L(β, ϑ) dβ = (2π)^{-n/2} |V(ϑ)|^{-1/2} exp{−½ y^t [V(ϑ)^{-1} − B(ϑ)^t A(ϑ) B(ϑ)] y} · ∫ exp{−½ (β − B(ϑ)y)^t A(ϑ) (β − B(ϑ)y)} dβ   (12)
The last integral is a Gaussian integral with covariance A(ϑ)^{-1}, so it equals (2π)^{p/2} |A(ϑ)|^{-1/2}.
Now
(y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))
= y^t V(ϑ)^{-1} y − 2y^t V(ϑ)^{-1} X β̃(ϑ) + β̃(ϑ)^t A(ϑ) β̃(ϑ)
= y^t V(ϑ)^{-1} y − 2y^t V(ϑ)^{-1} X B(ϑ) y + y^t B(ϑ)^t A(ϑ) B(ϑ) y
= y^t V(ϑ)^{-1} y − y^t B(ϑ)^t A(ϑ) B(ϑ) y
Here we used
β̃(ϑ) = (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} y = A(ϑ)^{-1} X^t V(ϑ)^{-1} y = B(ϑ) y
and
B(ϑ)^t A(ϑ) B(ϑ) = V(ϑ)^{-1} X A(ϑ)^{-1} A(ϑ) B(ϑ) = V(ϑ)^{-1} X B(ϑ).
Therefore we can rewrite (12) as
∫ L(β, ϑ) dβ = (2π)^{-(n−p)/2} |V(ϑ)|^{-1/2} |A(ϑ)|^{-1/2} exp{−½ (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))}   (using |A(ϑ)^{-1}| = 1/|A(ϑ)|)
⇒ lR(ϑ) = ln( ∫ L(β, ϑ) dβ )
= −½ {ln |V(ϑ)| + (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))} − ½ ln |A(ϑ)| + C
= lp(ϑ) − ½ ln |A(ϑ)| + C
Therefore the restricted ML (REML) estimate of ϑ is given by ϑ̂REML, which maximizes
lR(ϑ) = lp(ϑ) − ½ ln |X^t V(ϑ)^{-1} X|
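The REML criterion only adds the log-determinant correction to lp(ϑ); a sketch reusing profile_loglik from the ML sketch above:

```python
import numpy as np

def reml_criterion(theta, X, y, build_V):
    """l_R(theta) = l_p(theta) - 0.5 * ln|X^t V(theta)^-1 X|."""
    V = build_V(theta)
    _, logdet_A = np.linalg.slogdet(X.T @ np.linalg.solve(V, X))
    return profile_loglik(theta, X, y, build_V) - 0.5 * logdet_A
```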
Summary: Estimation in LMM with unknown covariance
For the linear mixed model
Y = Xβ + Uγ + ǫ,  (γ^t, ǫ^t)^t ∼ N_{mq+n}( 0, [G(ϑ)  0_{mq×n};  0_{n×mq}  R(ϑ)] ),  V(ϑ) = UG(ϑ)U^t + R(ϑ),
the covariance parameter vector ϑ is estimated by either
ϑ̂ML, which maximizes
lp(ϑ) = −½ {ln |V(ϑ)| + (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))},  where β̃(ϑ) = (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} Y,
or by
ϑ̂REML, which maximizes lR(ϑ) = lp(ϑ) − ½ ln |X^t V(ϑ)^{-1} X|.
The fixed effects β and random effects γ are then estimated by
β̂ = (X^t V̂^{-1} X)^{-1} X^t V̂^{-1} Y
γ̂ = Ĝ U^t V̂^{-1} (Y − Xβ̂),  where V̂ = V(ϑ̂ML) or V(ϑ̂REML).
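In practice these steps are implemented in standard software; for instance, statsmodels in Python fits a random-intercept LMM by REML. A usage sketch, where the data file and column names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 'score' = response, 'age' = fixed covariate, 'classroom' = cluster id.
df = pd.read_csv("students.csv")
model = smf.mixedlm("score ~ age", df, groups=df["classroom"])  # random intercept per classroom
result = model.fit(reml=True)   # set reml=False for the (biased) ML variant
print(result.summary())         # fixed effects, variance components, Wald statistics
```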
Special Case
(Dependence on ϑ is ignored to ease notation)
G = diag(D, . . . , D),  U = blockdiag(U1, . . . , Um),  R = blockdiag(Σ1, . . . , Σm),
X = (X1^t, . . . , Xm^t)^t,  Y = (Y1^t, . . . , Ym^t)^t
⇒ V = UGU^t + R = blockdiag(U1DU1^t + Σ1, . . . , UmDUm^t + Σm)
= blockdiag(V1, . . . , Vm),  where Vi := UiDUi^t + Σi
Define
V̂i := Ui D(ϑ̂) Ui^t + Σi(ϑ̂), where ϑ̂ = ϑ̂ML or ϑ̂REML
⇒ β̂ = (X^t V̂^{-1} X)^{-1} X^t V̂^{-1} Y = ( Σ_{i=1}^m Xi^t V̂i^{-1} Xi )^{-1} ( Σ_{i=1}^m Xi^t V̂i^{-1} Yi )
and γ̂ = Ĝ U^t V̂^{-1} (Y − Xβ̂) has components
γ̂i = D(ϑ̂) Ui^t V̂i^{-1} (yi − Xi β̂)
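A numpy sketch of this blockwise computation of β̂, assuming the cluster blocks Xi, Ui, Σi(ϑ̂) and D(ϑ̂) are given as lists/arrays; the helper name is illustrative.

```python
import numpy as np

def beta_hat_blockwise(Xs, Us, D, Sigmas, Ys):
    """beta-hat = (sum_i X_i^t V_i^-1 X_i)^-1 (sum_i X_i^t V_i^-1 Y_i),
    with V_i = U_i D U_i^t + Sigma_i, avoiding the full n x n matrix V."""
    p = Xs[0].shape[1]
    lhs, rhs = np.zeros((p, p)), np.zeros(p)
    for X_i, U_i, S_i, Y_i in zip(Xs, Us, Sigmas, Ys):
        V_i = U_i @ D @ U_i.T + S_i
        lhs += X_i.T @ np.linalg.solve(V_i, X_i)
        rhs += X_i.T @ np.linalg.solve(V_i, Y_i)
    return np.linalg.solve(lhs, rhs)
```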
3) Confidence intervals and hypothesis tests
Since Y ∼ Nn(Xβ, V(ϑ)) holds, an approximation to the covariance of
β̂ = (X^t V^{-1}(ϑ̂) X)^{-1} X^t V^{-1}(ϑ̂) Y is given by
A(ϑ̂) := (X^t V^{-1}(ϑ̂) X)^{-1}
Note: here one assumes that V(ϑ̂) is fixed and does not depend on Y.
Therefore σ̂j² := [(X^t V^{-1}(ϑ̂) X)^{-1}]jj is considered an estimate of Var(β̂j), and
β̂j ± z_{1−α/2} ( [(X^t V^{-1}(ϑ̂) X)^{-1}]jj )^{1/2}
gives an approximate 100(1 − α)% CI for βj.
It is expected that [(X^t V^{-1}(ϑ̂) X)^{-1}]jj underestimates Var(β̂j), since the variation
in ϑ̂ is not taken into account.
A full Bayesian analysis using MCMC methods is preferable to these
approximations.
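A numpy/scipy sketch of these approximate Wald intervals, assuming V̂ = V(ϑ̂) has already been plugged in:

```python
import numpy as np
from scipy.stats import norm

def wald_ci(X, V_hat, y, alpha=0.05):
    """Approximate CIs: beta_j +/- z_{1-alpha/2} * sqrt([(X^t V^-1 X)^-1]_jj)."""
    A = np.linalg.inv(X.T @ np.linalg.solve(V_hat, X))  # approximate Cov(beta-hat)
    beta = A @ X.T @ np.linalg.solve(V_hat, y)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(np.diag(A))
    return np.column_stack([beta - half, beta + half])  # one row per coefficient
```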
Under the assumption that β̂ is asymptotically normal with mean β and
covariance matrix A(ϑ), the usual hypothesis tests can be carried out; i.e. for
• H0: βj = 0 versus H1: βj ≠ 0:
Reject H0 ⇔ |tj| = |β̂j / σ̂j| > z_{1−α/2}
• H0: Cβ = d versus H1: Cβ ≠ d, rank(C) = r:
Reject H0 ⇔ W := (Cβ̂ − d)^t (C A(ϑ̂) C^t)^{-1} (Cβ̂ − d) > χ²_{1−α,r}
(Wald test)
or
Reject H0 ⇔ −2[l(β̂R, γ̂R) − l(β̂, γ̂)] > χ²_{1−α,r}
where β̂, γ̂ are the estimates in the unrestricted model and
β̂R, γ̂R are the estimates in the restricted model (Cβ = d)
(likelihood ratio test)
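A numpy/scipy sketch of the Wald test for H0: Cβ = d, assuming β̂ and A(ϑ̂) have been computed as above:

```python
import numpy as np
from scipy.stats import chi2

def wald_test(C, d, beta_hat, A_hat, alpha=0.05):
    """Wald test of H0: C beta = d.
    W = (C b - d)^t (C A C^t)^-1 (C b - d), compared with chi2_{1-alpha, r}."""
    diff = C @ beta_hat - d
    W = diff @ np.linalg.solve(C @ A_hat @ C.T, diff)
    r = np.linalg.matrix_rank(C)
    return W, W > chi2.ppf(1 - alpha, df=r)   # (statistic, reject H0?)
```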
References
Fahrmeir, L., T. Kneib, and S. Lang (2007). Regression: Modelle, Methoden
und Anwendungen. Berlin: Springer-Verlag.
West, B., K. E. Welch, and A. T. Galecki (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. Boca Raton: Chapman & Hall/CRC.
