Lecture 10: Linear Mixed Models
(Linear Models with Random Effects)
Claudia Czado
TU München
Overview
West, Welch, and Galecki (2007)
Fahrmeir, Kneib, and Lang (2007) (Chapter 6)
• Introduction
• Likelihood Inference for Linear Mixed Models
– Parameter Estimation for known Covariance Structure
– Parameter Estimation for unknown Covariance Structure
– Confidence Intervals and Hypothesis Tests
Introduction
So far: independent response variables, but often
• Clustered Data
– response is measured for each subject
– each subject belongs to a group of subjects (cluster)
Ex.:
– math scores of students grouped by classroom (classroom forms a cluster)
– birth weights of rats grouped by litter (litter forms a cluster)
• Longitudinal Data
– response is measured at several time points
– number of time points is not too large (in contrast to time series)
Ex.: sales of a product at each month in a year (12 measurements)
Fixed and Random Factors/Effects
How can we extend the linear model to allow for such dependent data structures?
fixed factor = qualitative covariate (e.g. gender, age group)
fixed effect = quantitative covariate (e.g. age)
random factor = qualitative variable whose levels are randomly
sampled from a population of levels being studied
Ex.: 20 supermarkets were selected and their number of cashiers was reported:
10 supermarkets with 2 cashiers
5 supermarkets with 1 cashier
5 supermarkets with 5 cashiers
These are the observed levels of the random factor “number of cashiers”.
random effect = quantitative variable whose levels are randomly
sampled from a population of levels being studied
Ex.: 20 supermarkets were selected and their sizes were reported. These size values
are random samples from the population of size values of all supermarkets.
Modeling Clustered Data
Yij = response of j-th member of cluster i, i = 1, . . . , m, j = 1, . . . , ni
m = number of clusters
ni = size of cluster i
xij = covariate vector of j-th member of cluster i for fixed effects, ∈ Rp
β = fixed effects parameter, ∈ Rp
uij = covariate vector of j-th member of cluster i for random effects, ∈ Rq
γi = random effect parameter, ∈ Rq
Model:
Yij = xij^t β + uij^t γi + ǫij,   i = 1, . . . , m; j = 1, . . . , ni
where xij^t β is the fixed part and uij^t γi and ǫij are the random parts.
Mixed Linear Model (LMM) I
Assumptions:
γi ∼ Nq(0, D), D ∈ R^{q×q}
ǫi := (ǫi1, . . . , ǫi,ni)^t ∼ N_{ni}(0, Σi), Σi ∈ R^{ni×ni}
γ1, . . . , γm, ǫ1, . . . , ǫm independent
D = covariance matrix of random effects γi
Σi = covariance matrix of error vector ǫi in cluster i
Mixed Linear Model (LMM) II
Matrix Notation:
Xi := (xi1, . . . , xi,ni)^t ∈ R^{ni×p},  Ui := (ui1, . . . , ui,ni)^t ∈ R^{ni×q},  Yi := (Yi1, . . . , Yi,ni)^t ∈ R^{ni}
⇒
Yi = Xiβ + Uiγi + ǫi,   i = 1, . . . , m
γi ∼ Nq(0, D),  ǫi ∼ N_{ni}(0, Σi),  γ1, . . . , γm, ǫ1, . . . , ǫm independent   (1)
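To make the ingredients of (1) concrete, here is a minimal numpy sketch that simulates clustered data from a random-intercept LMM. All dimensions and parameter values are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m, ni, q = 10, 5, 1                  # clusters, cluster size, dim(gamma_i) -- toy values
beta = np.array([1.0, -0.5])         # fixed effects (intercept, slope)
D = np.array([[0.8]])                # Cov(gamma_i): random-intercept variance
sigma2 = 0.5                         # Sigma_i = sigma2 * I_{ni}

Ys, Xs, Us = [], [], []
for i in range(m):
    X_i = np.column_stack([np.ones(ni), rng.normal(size=ni)])  # intercept + one covariate
    U_i = np.ones((ni, q))                                     # random-intercept design
    gamma_i = rng.multivariate_normal(np.zeros(q), D)          # gamma_i ~ N_q(0, D)
    eps_i = rng.normal(scale=np.sqrt(sigma2), size=ni)         # eps_i ~ N(0, sigma2 * I)
    Ys.append(X_i @ beta + U_i @ gamma_i + eps_i)              # model (1)
    Xs.append(X_i); Us.append(U_i)
```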
Modeling Longitudinal Data
Yij = response of subject i at j-th measurement, i = 1, . . . , m, j = 1, . . . , ni
ni = number of measurements for subject i
m = number of subjects
xij = covariate vector of i-th subject at j-th measurement
for fixed effects β ∈ Rp
uij = covariate vector of i-th subject at j-th measurement
for random effects γi ∈ Rq
In matrix notation:
Yi = Xiβ + Uiγi + ǫi,   γi ∼ Nq(0, D),  ǫi ∼ N_{ni}(0, Σi),  γ1, . . . , γm, ǫ1, . . . , ǫm independent
Remark: The general form of the mixed linear model is the same for clustered
and longitudinal observations.
Matrix Formulation of the Linear Mixed Model
Y := (Y1^t, . . . , Ym^t)^t ∈ R^n, where n := Σ_{i=1}^m ni
X := (X1^t, . . . , Xm^t)^t ∈ R^{n×p},  β ∈ R^p
G := diag(D, . . . , D) ∈ R^{mq×mq}
U := blockdiag(U1, . . . , Um) ∈ R^{n×(m·q)}, i.e. block row i carries Ui and zero blocks 0_{ni×q} elsewhere
γ := (γ1^t, . . . , γm^t)^t ∈ R^{m·q},  ǫ := (ǫ1^t, . . . , ǫm^t)^t ∈ R^n
R := blockdiag(Σ1, . . . , Σm) ∈ R^{n×n}
Linear Mixed Model (LMM) in matrix formulation
With this, the linear mixed model (1) can be rewritten as
Y = Xβ + Uγ + ǫ (2)
where

(γ^t, ǫ^t)^t ∼ N_{mq+n}( 0, [G  0_{mq×n};  0_{n×mq}  R] )
Remarks:
• LMM (2) can be rewritten as a two-level hierarchical model:
Y | γ ∼ Nn(Xβ + Uγ, R)   (3)
γ ∼ N_{mq}(0, G)   (4)
• Let Y = Xβ + ǫ∗, where ǫ∗ := Uγ + ǫ = A (γ^t, ǫ^t)^t with A := (U  I_{n×n}).
By (2), ǫ∗ ∼ Nn(0, V), where
V = A [G 0; 0 R] A^t = (U  I_{n×n}) [G 0; 0 R] (U  I_{n×n})^t = (UG  R) (U  I_{n×n})^t = UGU^t + R
Therefore (2) implies
Y = Xβ + ǫ∗,  ǫ∗ ∼ Nn(0, V)   (5)  (marginal model)
• (2) or (3)+(4) implies (5); however, (5) does not imply (3)+(4).
⇒ If one is only interested in estimating β, one can use the ordinary linear model (5).
If one is interested in estimating β and γ, one has to use model (3)+(4).
Likelihood Inference for LMM:
1) Estimation of β and γ for known G and R
Estimation of β: Using (5), we have as MLE or weighted LSE of β
β̃ := (X^t V^{-1} X)^{-1} X^t V^{-1} Y   (6)
Recall: Y = Xβ + ǫ, ǫ ∼ Nn(0, Σ), Σ known, Σ = Σ^{1/2} (Σ^{1/2})^t
⇒ Σ^{-1/2} Y = Σ^{-1/2} Xβ + Σ^{-1/2} ǫ, where Σ^{-1/2} ǫ ∼ Nn(0, Σ^{-1/2} Σ (Σ^{-1/2})^t = In)   (7)
⇒ LSE of β in (7):
β̂ = (X^t (Σ^{-1/2})^t Σ^{-1/2} X)^{-1} X^t (Σ^{-1/2})^t Σ^{-1/2} Y = (X^t Σ^{-1} X)^{-1} X^t Σ^{-1} Y   (8)
This estimate is called the weighted LSE.
Exercise: Show that (8) is the MLE in Y = Xβ + ǫ, ǫ ∼ Nn(0, Σ).
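A short numpy sketch of the weighted LSE (6)/(8), assuming the covariance matrix (V or Σ) is known, as in this section; the helper name weighted_lse is my own.

```python
import numpy as np

def weighted_lse(X, V, y):
    """Weighted LSE (6)/(8): (X^t V^-1 X)^-1 X^t V^-1 y,
    computed via linear solves rather than forming V^-1 explicitly."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
```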
Estimation of γ:
From (3) and (4) it follows that Y ∼ Nn(Xβ, V) and γ ∼ N_{mq}(0, G).
Cov(Y, γ) = Cov(Xβ + Uγ + ǫ, γ) = Cov(Xβ, γ) + U Var(γ) + Cov(ǫ, γ) = 0 + UG + 0 = UG
⇒ (Y^t, γ^t)^t ∼ N_{n+mq}( ((Xβ)^t, 0^t)^t, [V  UG;  GU^t  G] )
Recall: if X = (Y^t, Z^t)^t ∼ Np( (µY^t, µZ^t)^t, [ΣY  ΣYZ;  ΣZY  ΣZ] ), then Z | Y ∼ N(µ_{Z|Y}, Σ_{Z|Y}) with
µ_{Z|Y} = µZ + ΣZY ΣY^{-1} (Y − µY),  Σ_{Z|Y} = ΣZ − ΣZY ΣY^{-1} ΣYZ
Hence
E(γ | Y) = 0 + GU^t V^{-1} (Y − Xβ)   (9)
is the best linear unbiased predictor (BLUP) of γ.
Therefore γ̃ := GU^t V^{-1} (Y − Xβ̃) is the empirical BLUP (EBLUP).
Joint maximization of the log likelihood of (Y^t, γ^t)^t with respect to (β, γ):
By (3)+(4),
f(y, γ) = f(y | γ) · f(γ) ∝ exp{−½ (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ)} · exp{−½ γ^t G^{-1} γ}
⇒ ln f(y, γ) = −½ (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ) − ½ γ^t G^{-1} γ + constants ind. of (β, γ),
where −½ γ^t G^{-1} γ acts as a penalty term for γ.
So it is enough to minimize
Q(β, γ) := (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ) + γ^t G^{-1} γ
= y^t R^{-1} y − 2β^t X^t R^{-1} y + 2β^t X^t R^{-1} Uγ − 2γ^t U^t R^{-1} y + β^t X^t R^{-1} Xβ + γ^t U^t R^{-1} Uγ + γ^t G^{-1} γ
Recall (vector derivatives):
f(α) := α^t b = Σ_{j=1}^n αj bj  ⇒  ∂f/∂αi = bi,  ∂f/∂α = b
g(α) := α^t A α = Σ_i Σ_j αi αj aij
⇒ ∂g/∂αi = 2αi aii + Σ_{j≠i} αj aij + Σ_{j≠i} αj aji = 2 Σ_{j=1}^n αj aij = 2 Ai^t α  (for symmetric A; Ai^t denotes the i-th row of A)
⇒ ∂g/∂α = 2Aα
Mixed Model Equation
∂Q/∂β = −2X^t R^{-1} y + 2X^t R^{-1} Uγ + 2X^t R^{-1} Xβ  (set = 0)
∂Q/∂γ = −2U^t R^{-1} Xβ − 2U^t R^{-1} y + 2U^t R^{-1} Uγ + 2G^{-1} γ  (set = 0)
⇔ X^t R^{-1} X β̃ + X^t R^{-1} U γ̃ = X^t R^{-1} y
U^t R^{-1} X β̃ + (U^t R^{-1} U + G^{-1}) γ̃ = U^t R^{-1} y
⇔ [X^t R^{-1} X  X^t R^{-1} U;  U^t R^{-1} X  U^t R^{-1} U + G^{-1}] (β̃^t, γ̃^t)^t = ((X^t R^{-1} y)^t, (U^t R^{-1} y)^t)^t   (10)
Exercise: Show that β̃ and γ̃, defined by (6) and (9) respectively, solve (10).
Define C := (X  U),  B := [0  0;  0  G^{-1}]
⇒ C^t R^{-1} C = [X^t; U^t] R^{-1} (X  U) = [X^t R^{-1} X  X^t R^{-1} U;  U^t R^{-1} X  U^t R^{-1} U]
⇒ (10) ⇔ (C^t R^{-1} C + B) (β̃^t, γ̃^t)^t = C^t R^{-1} y
⇔ (β̃^t, γ̃^t)^t = (C^t R^{-1} C + B)^{-1} C^t R^{-1} y
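A numpy sketch solving (10) in this compact form, assuming G is invertible; the helper name solve_mme is illustrative.

```python
import numpy as np

def solve_mme(X, U, R, G, y):
    """Solve (C^t R^-1 C + B) (beta, gamma) = C^t R^-1 y,
    with C = (X, U) and B = blockdiag(0, G^-1), i.e. the MME (10)."""
    C = np.hstack([X, U])
    p = X.shape[1]
    B = np.zeros((C.shape[1], C.shape[1]))
    B[p:, p:] = np.linalg.inv(G)                       # G^-1 block; G assumed invertible
    lhs = C.T @ np.linalg.solve(R, C) + B              # C^t R^-1 C + B
    rhs = C.T @ np.linalg.solve(R, y)                  # C^t R^-1 y
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]                            # (beta_tilde, gamma_tilde)
```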
2) Estimation for unknown covariance structure
We assume now in the marginal model (5)
Y = Xβ + ǫ∗,  ǫ∗ ∼ Nn(0, V) with V = UGU^t + R,
that G and R are only known up to the variance parameter ϑ, i.e. we write
V(ϑ) = UG(ϑ)U^t + R(ϑ)
ML Estimation in extended marginal model
Y = Xβ + ǫ∗,  ǫ∗ ∼ Nn(0, V(ϑ)) with V(ϑ) = UG(ϑ)U^t + R(ϑ)
Log-likelihood for (β, ϑ):
l(β, ϑ) = −½ {ln |V(ϑ)| + (y − Xβ)^t V(ϑ)^{-1} (y − Xβ)} + const. ind. of (β, ϑ)   (11)
If we maximize (11) for fixed ϑ with respect to β, we get
β̃(ϑ) := (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} y
Then the profile log-likelihood is
lp(ϑ) := l(β̃(ϑ), ϑ) = −½ {ln |V(ϑ)| + (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))}
Maximizing lp(ϑ) with respect to ϑ gives the MLE ϑ̂ML. However, ϑ̂ML is biased; this is
why one often uses restricted ML estimation (REML) instead.
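A numpy/scipy sketch of lp(ϑ), assuming a user-supplied map build_V from ϑ to V(ϑ); the maximization can then be done numerically, e.g. with scipy.optimize.minimize applied to −lp.

```python
import numpy as np
from scipy.optimize import minimize

def profile_loglik(theta, X, y, build_V):
    """Profile log-likelihood l_p(theta); build_V(theta) returns V(theta)."""
    V = build_V(theta)
    _, logdet_V = np.linalg.slogdet(V)
    Vinv_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ np.linalg.solve(V, y))  # beta_tilde(theta)
    r = y - X @ beta
    return -0.5 * (logdet_V + r @ np.linalg.solve(V, r))

# e.g.: theta_ml = minimize(lambda th: -profile_loglik(th, X, y, build_V), theta0).x
```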
Restricted ML Estimation in extended marginal model
Here we use for the estimation of ϑ the marginal log-likelihood
lR(ϑ) := ln( ∫ L(β, ϑ) dβ ),
∫ L(β, ϑ) dβ = ∫ (2π)^{-n/2} |V(ϑ)|^{-1/2} exp{−½ (y − Xβ)^t V(ϑ)^{-1} (y − Xβ)} dβ
Consider:
(y − Xβ)^t V(ϑ)^{-1} (y − Xβ) = β^t A(ϑ) β − 2y^t V(ϑ)^{-1} Xβ + y^t V(ϑ)^{-1} y, where A(ϑ) := X^t V(ϑ)^{-1} X
= (β − B(ϑ)y)^t A(ϑ) (β − B(ϑ)y) + y^t V(ϑ)^{-1} y − y^t B(ϑ)^t A(ϑ) B(ϑ) y
where B(ϑ) := A(ϑ)^{-1} X^t V(ϑ)^{-1}
(Note that B(ϑ)^t A(ϑ) = V(ϑ)^{-1} X A(ϑ)^{-1} A(ϑ) = V(ϑ)^{-1} X.)
Therefore we have
∫ L(β, ϑ) dβ = (2π)^{-n/2} |V(ϑ)|^{-1/2} exp{−½ y^t [V(ϑ)^{-1} − B(ϑ)^t A(ϑ) B(ϑ)] y} · ∫ exp{−½ (β − B(ϑ)y)^t A(ϑ) (β − B(ϑ)y)} dβ   (12)
The last integral is a Gaussian integral with covariance A(ϑ)^{-1}, so it equals (2π)^{p/2} |A(ϑ)|^{-1/2}.
Now
(y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))
= y^t V(ϑ)^{-1} y − 2y^t V(ϑ)^{-1} X β̃(ϑ) + β̃(ϑ)^t A(ϑ) β̃(ϑ)
= y^t V(ϑ)^{-1} y − 2y^t V(ϑ)^{-1} X B(ϑ) y + y^t B(ϑ)^t A(ϑ) B(ϑ) y
= y^t V(ϑ)^{-1} y − y^t B(ϑ)^t A(ϑ) B(ϑ) y
Here we used
β̃(ϑ) = (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} y = A(ϑ)^{-1} X^t V(ϑ)^{-1} y = B(ϑ) y
and
B(ϑ)^t A(ϑ) B(ϑ) = V(ϑ)^{-1} X A(ϑ)^{-1} A(ϑ) B(ϑ) = V(ϑ)^{-1} X B(ϑ).
Therefore we can rewrite (12) as
∫ L(β, ϑ) dβ = (2π)^{-(n−p)/2} |V(ϑ)|^{-1/2} |A(ϑ)|^{-1/2} exp{−½ (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))}   (using |A(ϑ)^{-1}| = 1/|A(ϑ)|)
⇒ lR(ϑ) = ln( ∫ L(β, ϑ) dβ )
= −½ {ln |V(ϑ)| + (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))} − ½ ln |A(ϑ)| + C
= lp(ϑ) − ½ ln |A(ϑ)| + C
Therefore the restricted ML (REML) estimate of ϑ is given by ϑ̂REML, which maximizes
lR(ϑ) = lp(ϑ) − ½ ln |X^t V(ϑ)^{-1} X|
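The REML criterion only adds the log-determinant correction to lp(ϑ); a sketch reusing profile_loglik from the ML sketch above:

```python
import numpy as np

def reml_criterion(theta, X, y, build_V):
    """l_R(theta) = l_p(theta) - 0.5 * ln|X^t V(theta)^-1 X|."""
    V = build_V(theta)
    _, logdet_A = np.linalg.slogdet(X.T @ np.linalg.solve(V, X))
    return profile_loglik(theta, X, y, build_V) - 0.5 * logdet_A
```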
Summary: Estimation in LMM with unknown covariance
For the linear mixed model
Y = Xβ + Uγ + ǫ,  (γ^t, ǫ^t)^t ∼ N_{mq+n}( 0, [G(ϑ)  0_{mq×n};  0_{n×mq}  R(ϑ)] ),  V(ϑ) = UG(ϑ)U^t + R(ϑ),
the covariance parameter vector ϑ is estimated by either
ϑ̂ML, which maximizes
lp(ϑ) = −½ {ln |V(ϑ)| + (y − Xβ̃(ϑ))^t V(ϑ)^{-1} (y − Xβ̃(ϑ))},  where β̃(ϑ) = (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} Y,
or by
ϑ̂REML, which maximizes lR(ϑ) = lp(ϑ) − ½ ln |X^t V(ϑ)^{-1} X|.
The fixed effects β and random effects γ are then estimated by
β̂ = (X^t V̂^{-1} X)^{-1} X^t V̂^{-1} Y
γ̂ = Ĝ U^t V̂^{-1} (Y − Xβ̂),  where V̂ = V(ϑ̂ML) or V(ϑ̂REML).
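In practice these steps are implemented in standard software; for instance, statsmodels in Python fits a random-intercept LMM by REML. A usage sketch, where the data file and column names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 'score' = response, 'age' = fixed covariate, 'classroom' = cluster id.
df = pd.read_csv("students.csv")
model = smf.mixedlm("score ~ age", df, groups=df["classroom"])  # random intercept per classroom
result = model.fit(reml=True)   # set reml=False for the (biased) ML variant
print(result.summary())         # fixed effects, variance components, Wald statistics
```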
Special Case
(Dependence on ϑ is ignored to ease notation)
G = diag(D, . . . , D),  U = blockdiag(U1, . . . , Um),  R = blockdiag(Σ1, . . . , Σm),
X = (X1^t, . . . , Xm^t)^t,  Y = (Y1^t, . . . , Ym^t)^t
⇒ V = UGU^t + R = blockdiag(U1DU1^t + Σ1, . . . , UmDUm^t + Σm)
= blockdiag(V1, . . . , Vm),  where Vi := UiDUi^t + Σi
Define
V̂i := Ui D(ϑ̂) Ui^t + Σi(ϑ̂), where ϑ̂ = ϑ̂ML or ϑ̂REML
⇒ β̂ = (X^t V̂^{-1} X)^{-1} X^t V̂^{-1} Y = ( Σ_{i=1}^m Xi^t V̂i^{-1} Xi )^{-1} ( Σ_{i=1}^m Xi^t V̂i^{-1} Yi )
and γ̂ = Ĝ U^t V̂^{-1} (Y − Xβ̂) has components
γ̂i = D(ϑ̂) Ui^t V̂i^{-1} (yi − Xi β̂)
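A numpy sketch of this blockwise computation of β̂, assuming the cluster blocks Xi, Ui, Σi(ϑ̂) and D(ϑ̂) are given as lists/arrays; the helper name is illustrative.

```python
import numpy as np

def beta_hat_blockwise(Xs, Us, D, Sigmas, Ys):
    """beta-hat = (sum_i X_i^t V_i^-1 X_i)^-1 (sum_i X_i^t V_i^-1 Y_i),
    with V_i = U_i D U_i^t + Sigma_i, avoiding the full n x n matrix V."""
    p = Xs[0].shape[1]
    lhs, rhs = np.zeros((p, p)), np.zeros(p)
    for X_i, U_i, S_i, Y_i in zip(Xs, Us, Sigmas, Ys):
        V_i = U_i @ D @ U_i.T + S_i
        lhs += X_i.T @ np.linalg.solve(V_i, X_i)
        rhs += X_i.T @ np.linalg.solve(V_i, Y_i)
    return np.linalg.solve(lhs, rhs)
```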
3) Confidence intervals and hypothesis tests
Since Y ∼ Nn(Xβ, V(ϑ)) holds, an approximation to the covariance of
β̂ = (X^t V^{-1}(ϑ̂) X)^{-1} X^t V^{-1}(ϑ̂) Y is given by
A(ϑ̂) := (X^t V^{-1}(ϑ̂) X)^{-1}
Note: here one assumes that V(ϑ̂) is fixed and does not depend on Y.
Therefore σ̂j² := [(X^t V^{-1}(ϑ̂) X)^{-1}]jj is considered an estimate of Var(β̂j), and
β̂j ± z_{1−α/2} ( [(X^t V^{-1}(ϑ̂) X)^{-1}]jj )^{1/2}
gives an approximate 100(1 − α)% CI for βj.
It is expected that [(X^t V^{-1}(ϑ̂) X)^{-1}]jj underestimates Var(β̂j), since the variation
in ϑ̂ is not taken into account.
A full Bayesian analysis using MCMC methods is preferable to these
approximations.
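A numpy/scipy sketch of these approximate Wald intervals, assuming V̂ = V(ϑ̂) has already been plugged in:

```python
import numpy as np
from scipy.stats import norm

def wald_ci(X, V_hat, y, alpha=0.05):
    """Approximate CIs: beta_j +/- z_{1-alpha/2} * sqrt([(X^t V^-1 X)^-1]_jj)."""
    A = np.linalg.inv(X.T @ np.linalg.solve(V_hat, X))  # approximate Cov(beta-hat)
    beta = A @ X.T @ np.linalg.solve(V_hat, y)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(np.diag(A))
    return np.column_stack([beta - half, beta + half])  # one row per coefficient
```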
Under the assumption that β̂ is asymptotically normal with mean β and
covariance matrix A(ϑ), the usual hypothesis tests can be carried out; i.e. for
• H0: βj = 0 versus H1: βj ≠ 0:
Reject H0 ⇔ |tj| = |β̂j / σ̂j| > z_{1−α/2}
• H0: Cβ = d versus H1: Cβ ≠ d, rank(C) = r:
Reject H0 ⇔ W := (Cβ̂ − d)^t (C A(ϑ̂) C^t)^{-1} (Cβ̂ − d) > χ²_{1−α,r}
(Wald test)
or
Reject H0 ⇔ −2[l(β̂R, γ̂R) − l(β̂, γ̂)] > χ²_{1−α,r}
where β̂, γ̂ are the estimates in the unrestricted model and
β̂R, γ̂R are the estimates in the restricted model (Cβ = d)
(likelihood ratio test)
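A numpy/scipy sketch of the Wald test for H0: Cβ = d, assuming β̂ and A(ϑ̂) have been computed as above:

```python
import numpy as np
from scipy.stats import chi2

def wald_test(C, d, beta_hat, A_hat, alpha=0.05):
    """Wald test of H0: C beta = d.
    W = (C b - d)^t (C A C^t)^-1 (C b - d), compared with chi2_{1-alpha, r}."""
    diff = C @ beta_hat - d
    W = diff @ np.linalg.solve(C @ A_hat @ C.T, diff)
    r = np.linalg.matrix_rank(C)
    return W, W > chi2.ppf(1 - alpha, df=r)   # (statistic, reject H0?)
```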
References
Fahrmeir, L., T. Kneib, and S. Lang (2007). Regression: Modelle, Methoden
und Anwendungen. Berlin: Springer-Verlag.
West, B., K. E. Welch, and A. T. Galecki (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. Boca Raton: Chapman & Hall/CRC.
