Non-informative reparametrisations for location-scale mixtures
Kaniav Kamary1, Kate Lee2, Christian P. Robert1,3
1CEREMADE, Université Paris–Dauphine, Paris 2Auckland University of Technology, New Zealand 3Dept. of Statistics, University of
Warwick, and CREST, Paris
Introduction
Traditional definition of mixture density:
f(x | θ, p) = ∑_{i=1}^k pi f(x | θi),   ∑_{i=1}^k pi = 1,   (1)
which gives a separate meaning to each component.
For the location-scale Gaussian mixture:
f(x | θ, p) = ∑_{i=1}^k pi N(x | µi, σi)
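As a quick illustration (ours, not part of the poster), the following minimal numpy/scipy sketch evaluates such a Gaussian mixture density; all names and values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def mixture_density(x, weights, means, sds):
    """Evaluate f(x | theta, p) = sum_i p_i N(x | mu_i, sigma_i) pointwise."""
    x = np.atleast_1d(x)
    # one column per component, then weight and sum across components
    comps = norm.pdf(x[:, None], loc=means, scale=sds)
    return comps @ np.asarray(weights)

# e.g. the two-component mixture used later on the poster
f = mixture_density(np.linspace(-15, 5, 101), [0.65, 0.35], [-8.0, -0.5], [2.0, 1.0])
```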
Mengersen and Robert (1996) [2] established that an improper prior on (µ1, σ1) still leads to a proper posterior when
µi = µi−1 + σi−1 δi and σi = τi σi−1, with τi < 1.
Diebolt and Robert (1994) [3] discussed the alternative approach of obtaining proper posteriors from improper priors by excluding almost-empty components from the likelihood function.
Setting the global mean and variance, Eθ,p(X) = µ and varθ,p(X) = σ², imposes natural constraints on the component parameters:
µ = ∑_{i=1}^k pi µi;   σ² = ∑_{i=1}^k pi µi² + ∑_{i=1}^k pi σi² − µ²;   Eθ,p(X²) = ∑_{i=1}^k pi µi² + ∑_{i=1}^k pi σi²
which implies that (µ1,...,µk,σ1,...,σk) belongs to a specific ellipse.
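A quick numerical check (ours, with illustrative values) of these moment identities against Monte Carlo estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
p, means, sds = np.array([0.65, 0.35]), np.array([-8.0, -0.5]), np.array([2.0, 1.0])

# theoretical global moments from the identities above
mu = np.sum(p * means)
sigma2 = np.sum(p * means**2) + np.sum(p * sds**2) - mu**2

# Monte Carlo check: simulate from the mixture and compare
z = rng.choice(len(p), size=200_000, p=p)          # component labels
x = rng.normal(means[z], sds[z])
print(mu, x.mean(), sigma2, x.var())               # the pairs agree closely
```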
New reparametrisation: re-express the location-scale mixture in terms of the global mean and global variance of the mixture distribution.
Writing
f(x | θ, p) = ∑_{i=1}^k pi f(x | µ + σγi/√pi, σηi/√pi),   (2)
leads to a parameter space in which (p1,...,pk,γ1,...,γk,η1,...,ηk) is constrained by
pi, ηi ≥ 0 (1 ≤ i ≤ k),   ∑_{i=1}^k pi = 1,   ∑_{i=1}^k √pi γi = 0,   ∑_{i=1}^k (ηi² + γi²) = 1,
which implies that for all i, 0 ≤ pi ≤ 1, −1 ≤ γi ≤ 1 and 0 ≤ ηi ≤ 1. These constraints mean that (γ1,...,γk,η1,...,ηk) belongs to the intersection of the unit hypersphere of R^{2k} centred at the origin with a hyperplane of this space passing through the origin, which is itself a sphere centred at the origin with radius 1.
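A minimal sketch (ours, not the poster's; names are illustrative) of how the component parameters are recovered from this reparametrisation, with the constraints checked numerically:

```python
import numpy as np

def component_parameters(mu, sigma, p, gamma, eta):
    """Recover component means and sds from reparametrisation (2):
    mu_i = mu + sigma*gamma_i/sqrt(p_i),  sigma_i = sigma*eta_i/sqrt(p_i)."""
    p, gamma, eta = map(np.asarray, (p, gamma, eta))
    # constraints of the reparametrised parameter space
    assert np.isclose(p.sum(), 1.0)
    assert np.isclose(np.sum(np.sqrt(p) * gamma), 0.0)
    assert np.isclose(np.sum(gamma**2 + eta**2), 1.0)
    return mu + sigma * gamma / np.sqrt(p), sigma * eta / np.sqrt(p)

# illustrative values satisfying the constraints for k = 2
p = np.array([0.5, 0.5])
gamma = np.array([0.6, -0.6])                       # sqrt(p)-weighted sum is zero
eta = np.sqrt((1 - np.sum(gamma**2)) / 2) * np.ones(2)
means, sds = component_parameters(mu=0.0, sigma=1.0, p=p, gamma=gamma, eta=eta)
```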
Spherical coordinate representation of the γ's:
Suppose that ∑_{i=1}^k γi² = ϕ². The vector γ belongs both to the hypersphere of radius ϕ and to the hyperplane orthogonal to (√p1,...,√pk).
The orthogonal basis vectors Λ̃s of this hyperplane are defined by
Λ̃_{1,j} = −√p2 if j = 1;   √p1 if j = 2;   0 if j > 2,
and, for s > 1, the s-th vector is given by
Λ̃_{s,j} = −(pj p_{s+1})^{1/2} / (∑_{l=1}^s pl)^{1/2} if j ≤ s;   (∑_{l=1}^s pl)^{1/2} if j = s+1;   0 if j > s+1.
The s-th orthonormal basis vector is Fs = Λ̃s / ‖Λ̃s‖.
Figure: Image from Robert Osserman.
(γ1,...,γk) can be written as
(γ1,...,γk) = ϕ cos(ϖ1) F1 + ϕ sin(ϖ1) cos(ϖ2) F2 + ... + ϕ sin(ϖ1)⋯sin(ϖ_{k−2}) F_{k−1}
with the angles ϖ1,...,ϖ_{k−3} in [0,π] and ϖ_{k−2} in [0,2π].
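A sketch (ours, illustrative function names) of this construction: build the orthonormal basis F1,...,F_{k−1} of the hyperplane orthogonal to (√p1,...,√pk) from the Λ̃ formula above, then map the radius ϕ and angles ϖ to γ.

```python
import numpy as np

def orthonormal_basis(p):
    """Basis F_1,...,F_{k-1} of the hyperplane orthogonal to (sqrt(p_1),...,sqrt(p_k)),
    following the Lambda-tilde construction, normalised to unit length."""
    p = np.asarray(p, dtype=float)
    k, sp = len(p), np.sqrt(p)
    basis = []
    for s in range(1, k):                       # s = 1, ..., k-1
        lam = np.zeros(k)
        if s == 1:
            lam[0], lam[1] = -sp[1], sp[0]
        else:
            head = np.sum(p[:s])                # sum_{l<=s} p_l
            lam[:s] = -sp[:s] * sp[s] / np.sqrt(head)
            lam[s] = np.sqrt(head)
        basis.append(lam / np.linalg.norm(lam))
    return np.array(basis)                      # shape (k-1, k)

def gamma_from_angles(phi, angles, p):
    """gamma = phi [cos(w1) F1 + sin(w1)cos(w2) F2 + ... + sin(w1)...sin(w_{k-2}) F_{k-1}]."""
    F = orthonormal_basis(p)
    coords = np.ones(len(p) - 1)
    for s, w in enumerate(angles):              # the k-2 angles
        coords[s] *= np.cos(w)
        coords[s + 1:] *= np.sin(w)
    return phi * coords @ F

gamma = gamma_from_angles(phi=0.4, angles=[0.7], p=[0.2, 0.3, 0.5])
# gamma is orthogonal to sqrt(p) and has Euclidean norm phi
```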
Fundamental consequence: the restricted parameter space is compact, which is helpful for selecting non-informative and even improper priors over mixtures.
Prior modeling:
Global mean and variance: The posterior distribution associated with the prior π(µ,σ) = 1/σ is proper
when (a) proper distributions are used on the other parameters and (b) there are at least two
observations in the sample.
Component weights: (p1,...,pk) ∼ Dir(α0,...,α0),
Angles ϖ's: ϖ1,...,ϖ_{k−3} ∼ U[0,π] and ϖ_{k−2} ∼ U[0,2π],
Radius ϕ and η1,...,ηk: if k is small, (ϕ², η1²,...,ηk²) ∼ Dir(α,...,α), while for k larger than 3, (η1,...,ηk) is written in spherical coordinates
ηi = √(1 − ϕ²) cos(ξ1) if i = 1;   √(1 − ϕ²) ∏_{j=1}^{i−1} sin(ξj) cos(ξi) if 1 < i < k;   √(1 − ϕ²) ∏_{j=1}^{k−1} sin(ξj) if i = k.
Unlike the ϖ's, the support of the angles ξ1,⋯,ξ_{k−1} is limited to [0,π/2], due to the positivity requirement on the ηi's:
(ξ1,⋯,ξ_{k−1}) ∼ U([0,π/2]^{k−1}).
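A companion sketch (ours, illustrative names) mapping the radius ϕ and angles ξ back to (η1,...,ηk):

```python
import numpy as np

def eta_from_angles(phi, xi):
    """Map xi_1,...,xi_{k-1} in [0, pi/2] and the radius phi to (eta_1,...,eta_k),
    all nonnegative, with sum_i eta_i^2 = 1 - phi^2."""
    xi = np.asarray(xi, dtype=float)
    k = len(xi) + 1
    r = np.sqrt(1.0 - phi**2)
    eta, running = np.empty(k), 1.0
    for i in range(k - 1):
        eta[i] = r * running * np.cos(xi[i])    # cosine term for i < k
        running *= np.sin(xi[i])
    eta[k - 1] = r * running                    # last coordinate: product of sines only
    return eta

eta = eta_from_angles(phi=0.4, xi=[0.3, 1.0])   # k = 3 components
# check: np.sum(eta**2) + 0.4**2 equals 1 up to rounding, and all eta >= 0
```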
MCMC algorithm
Metropolis-within-Gibbs algorithm for the reparameterised mixture model:
1. Generate initial values (µ^(0), σ^(0), p^(0), ϕ^(0), ξ1^(0),...,ξ_{k−1}^(0), ϖ1^(0),...,ϖ_{k−2}^(0)).
2. For t = 1,...,T, the update of (µ^(t), σ^(t), p^(t), ϕ^(t), ξ1^(t),...,ξ_{k−1}^(t), ϖ1^(t),...,ϖ_{k−2}^(t)) proceeds as follows:
  2.1 Generate a proposal µ′ ∼ N(µ^(t−1), εµ) and update µ^(t) against π(⋅ | x, σ^(t−1), p^(t−1), ϕ^(t−1), ξ^(t−1), ϖ^(t−1)).
  2.2 Generate a proposal log(σ)′ ∼ N(log(σ^(t−1)), εσ) and update σ^(t) against π(⋅ | x, µ^(t), p^(t−1), ϕ^(t−1), ξ^(t−1), ϖ^(t−1)).
  2.3 Generate a proposal (ϕ²)′ ∼ Beta((ϕ²)^(t) εϕ + 1, (1 − (ϕ²)^(t)) εϕ + 1) and update ϕ^(t) against π(⋅ | x, µ^(t), σ^(t), p^(t−1), ξ^(t), ϖ^(t)).
  2.4 Generate a proposal p′ ∼ Dir(p1^(t−1) εp + 1,...,pk^(t−1) εp + 1) and update p^(t) against π(⋅ | x, µ^(t), σ^(t), ϕ^(t), ξ^(t), ϖ^(t)).
  2.5 Generate proposals ξi′ ∼ U[ξi^(t) − εξ, ξi^(t) + εξ], i = 1,⋯,k−1, and update (ξ1^(t),...,ξ_{k−1}^(t)) against π(⋅ | x, µ^(t), σ^(t), p^(t), ϕ^(t), ϖ^(t)).
  2.6 Generate proposals ϖi′ ∼ U[ϖi^(t) − εϖ, ϖi^(t) + εϖ], i = 1,⋯,k−2, and update (ϖ1^(t),...,ϖ_{k−2}^(t)) against π(⋅ | x, µ^(t), σ^(t), p^(t), ϕ^(t), ξ^(t)),
where p^(t) = (p1^(t),...,pk^(t)), x = (x1,...,xn), ξ^(t) = (ξ1^(t),...,ξ_{k−1}^(t)) and ϖ^(t) = (ϖ1^(t),...,ϖ_{k−2}^(t)).
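For concreteness, here is a minimal sketch (ours) of a single random-walk Metropolis update in the style of step 2.1; log_post and my_log_conditional are placeholder names for the log of the corresponding full conditional density, not functions from the poster or the package.

```python
import numpy as np

rng = np.random.default_rng(0)

def rw_metropolis_step(mu_current, log_post, eps_mu):
    """One Metropolis update of mu (cf. step 2.1): Gaussian random-walk proposal,
    accepted with probability min(1, pi(mu') / pi(mu)), computed on the log scale."""
    mu_prop = rng.normal(mu_current, eps_mu)
    log_ratio = log_post(mu_prop) - log_post(mu_current)
    if np.log(rng.uniform()) < log_ratio:
        return mu_prop, True        # accepted
    return mu_current, False        # rejected; keep the current value

# usage: mu, accepted = rw_metropolis_step(mu, lambda m: my_log_conditional(m), 0.5)
```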
Ultimixt package
▸ Implements the Metropolis-within-Gibbs algorithm for the reparameterised mixture distribution;
▸ Calibrates the scales of the various proposals by aiming at an average acceptance rate of either 0.44 or 0.234, depending on the dimension of the simulated parameter;
▸ Accurately estimates the component parameters;
Point estimation of the component parameters in the presence of label switching (see the sketch below):
▸ K-means clustering algorithm;
▸ Reordering of the labels so as to minimise the distance between the current posterior sample and the (or a) maximum a posteriori (MAP) estimate [1].
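A sketch of the relabelling idea (ours, not the Ultimixt implementation): each posterior draw is permuted so that its component means best match a reference MAP ordering, here via an optimal assignment rather than the poster's k-means step.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_to_map(draw_means, map_means):
    """Return the permutation that reorders one posterior draw so its component
    means are closest (in squared distance) to a reference MAP ordering, cf. [1]."""
    # cost[i, j]: squared distance between draw component i and MAP component j
    cost = (draw_means[:, None] - map_means[None, :]) ** 2
    row, col = linear_sum_assignment(cost)
    perm = np.empty_like(col)
    perm[col] = row
    return perm          # apply this permutation to every parameter of the draw

# e.g. a draw of means (1.9, -8.3) reordered against the MAP estimate (-8.0, -0.5)
perm = relabel_to_map(np.array([1.9, -8.3]), np.array([-8.0, -0.5]))   # -> [1, 0]
```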
Mixture of two normal distributions
A sample of size 50 is simulated from 0.65 N(−8, 2) + 0.35 N(−0.5, 1).
Figure: Empirical densities from 10 parallel runs of the Metropolis-within-Gibbs algorithm, each of 2×10⁵ iterations.
▸ The outcomes of 10 parallel chains started from different random starting values are indistinguishable;
▸ The chains are well mixed;
▸ The sampler output covers the entire sample space;
▸ The estimated densities converge to a neighbourhood of the true values;
▸ The estimated mixture density is remarkably smooth.
Mixture of three normal distributions
A sample of size 50 is simulated from the model 0.27 N(−4.5, 1) + 0.4 N(10, 1) + 0.33 N(3, 1).
Figure: Sequences of µi, σi and pi and the estimated mixture density, based on 10⁴ MCMC iterations.
Overfitting case
Extreme-valued posterior samples appear under an overfitted model.
Galaxy dataset: point estimates of the parameters of a mixture of (left) 6 components; (right) 4 components.
References
[1] S. Frühwirth-Schnatter (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. American Statist. Assoc., 96, 194–209.
[2] K. Mengersen and C. Robert (1996). Testing for mixtures: A Bayesian entropic approach (with discussion). In Bayesian Statistics 5 (J. Berger, J. Bernardo, A. Dawid, D. Lindley and A. Smith, eds.). Oxford University Press, Oxford, 255–276.
[3] J. Diebolt and C. Robert (1994). Estimation of finite mixture distributions by Bayesian sampling. J. Royal Statist. Society Series B, 56, 363–375.
kamary@ceremade.dauphine.fr
