Gibbs flow transport for Bayesian inference
Jeremy Heng
ESSEC Business School
Joint work with Arnaud Doucet (Oxford) & Yvo Pokern (UCL)
SciCADE 2019
Selected topics in computation and dynamics: machine learning and
multiscale methods
Innsbruck
22 July 2019
Problem specification
• Target distribution on R^d
      π(dx) = γ(x) dx / Z,
  where the unnormalised density γ : R^d → R_+ can be evaluated pointwise and
      Z = ∫_{R^d} γ(x) dx
  is unknown
• Problem 1: Obtain a consistent estimator of π(φ) := ∫_{R^d} φ(x) π(dx)
• Problem 2: Obtain an unbiased and consistent estimator of Z
• Main challenge: dimension d is typically large
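As a concrete baseline for Problems 1 and 2 (not taken from the slides), self-normalised importance sampling with a tractable proposal q gives a consistent estimator of π(φ) and an unbiased estimator of Z; the unnormalised target γ, proposal q and test function φ below are my own toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
log_gamma = lambda x: -0.5 * np.sum((x - 1.0)**2, axis=-1)                    # toy unnormalised target, Z = (2*pi)^{d/2}
log_q = lambda x: -0.5 * np.sum(x**2, axis=-1) - 0.5 * d * np.log(2 * np.pi)  # proposal q = N(0, I_d)
phi = lambda x: x[..., 0]                                                      # test function phi(x) = x_1

X = rng.standard_normal((100_000, d))                  # X_i ~ q
logw = log_gamma(X) - log_q(X)                         # unnormalised importance weights
w = np.exp(logw - logw.max())

Z_hat = np.exp(logw.max()) * w.mean()                  # unbiased estimator of Z
pi_phi_hat = np.sum(w * phi(X)) / np.sum(w)            # consistent (self-normalised) estimator of pi(phi)
print(Z_hat, (2 * np.pi)**(d / 2), pi_phi_hat)         # here Z = (2*pi)^{d/2} and pi(phi) = 1
```

In large dimensions the weights of such a one-shot scheme degenerate, which is exactly the challenge that motivates the bridging strategies on the following slides.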
Motivation: Bayesian computation
• Prior distribution π_0 on the unknown parameters of a model
• Likelihood function L : R^d → R_+ of data y
• Bayes update gives the posterior distribution on R^d
      π(dx) = π_0(x) L(x) dx / Z,
  where Z = ∫_{R^d} π_0(x) L(x) dx is the marginal likelihood of y
• Problem: π(φ) and Z are typically intractable
• Main challenge: complex models require a large number of parameters d
Monte Carlo methods
• Typically sampling from π is intractable, so we rely on Markov chain Monte Carlo (MCMC) methods
• MCMC constructs a π-invariant Markov transition kernel K : R^d × B(R^d) → [0, 1]
• Sample X_0 ∼ π_0 and iterate X_n ∼ K(X_{n−1}, ·) until convergence
• MCMC methods have been successful in many applications, but can also fail in practice, e.g. when π is highly multi-modal
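For concreteness, a minimal sketch of one standard π-invariant kernel K, a Gaussian random-walk Metropolis step; the unnormalised log-density log_gamma, the bimodal toy target and the step size are my own illustrative choices, not taken from the talk.

```python
import numpy as np

def rwm_kernel(x, log_gamma, step, rng):
    """One random-walk Metropolis step: propose x' = x + step * noise and
    accept with probability min(1, gamma(x') / gamma(x)), which leaves pi invariant."""
    prop = x + step * rng.standard_normal(x.shape)
    if np.log(rng.uniform()) < log_gamma(prop) - log_gamma(x):
        return prop
    return x

# usage: iterate X_n ~ K(X_{n-1}, .) on a toy bimodal target with modes at +/- 3
rng = np.random.default_rng(1)
log_gamma = lambda x: np.logaddexp(-0.5 * np.sum((x - 3.0)**2), -0.5 * np.sum((x + 3.0)**2))
x = np.zeros(2)
chain = [x]
for n in range(5000):
    x = rwm_kernel(x, log_gamma, step=1.0, rng=rng)
    chain.append(x)
```

On this toy target the chain rarely crosses between the two modes, which illustrates the multi-modality failure mentioned in the last bullet.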
Annealed importance sampling
• If π_0 and π are distant, define bridges
      π_{λ_m}(dx) = π_0(x) L(x)^{λ_m} dx / Z(λ_m),
  with 0 = λ_0 < λ_1 < … < λ_M = 1 so that π_{λ_M} = π
• Initialize X_0 ∼ π_0 and move X_m ∼ K_m(X_{m−1}, ·) for m = 1, …, M, where K_m is π_{λ_m}-invariant
• Annealed importance sampling constructs w : (R^d)^{M+1} → R_+ so that
      π(φ) = E[φ(X_M) w(X_{0:M})] / E[w(X_{0:M})],    Z = E[w(X_{0:M})]
• AIS (Neal, 2001) and SMC samplers (Del Moral et al., 2006) are considered state-of-the-art in statistics and machine learning
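A compact AIS sketch in the notation above, assuming a normalised prior π_0 we can sample from, a pointwise log-likelihood, and a random-walk Metropolis move reused as K_m; these implementation choices, and the toy Gaussian usage at the end, are mine. The weight is accumulated as log w = Σ_m (λ_m − λ_{m−1}) log L(X_{m−1}).

```python
import numpy as np

def ais(log_lik, sample_prior, log_prior, lambdas, n_particles, n_mcmc, step, rng):
    """Annealed importance sampling with bridges gamma_m = pi_0 * L^{lambda_m} and
    random-walk Metropolis moves targeting each bridge. Returns particles and log-weights."""
    X = sample_prior(n_particles)                            # X_0^i ~ pi_0
    logw = np.zeros(n_particles)
    for m in range(1, len(lambdas)):
        # incremental weight gamma_{lambda_m}(X_{m-1}) / gamma_{lambda_{m-1}}(X_{m-1})
        logw += (lambdas[m] - lambdas[m - 1]) * np.apply_along_axis(log_lik, 1, X)
        # move each particle with a pi_{lambda_m}-invariant kernel K_m
        log_target = lambda x: log_prior(x) + lambdas[m] * log_lik(x)
        for _ in range(n_mcmc):
            prop = X + step * rng.standard_normal(X.shape)
            lp = np.apply_along_axis(log_target, 1, prop) - np.apply_along_axis(log_target, 1, X)
            accept = np.log(rng.uniform(size=len(X))) < lp
            X[accept] = prop[accept]
    return X, logw

# toy usage: pi_0 = N(0, I_2), L(x) = N((1,1) | x, I_2), so Z = N((1,1); 0, 2 I_2) ~ 0.048
rng = np.random.default_rng(2)
X, logw = ais(log_lik=lambda x: -0.5 * np.sum((x - 1.0)**2) - np.log(2 * np.pi),
              sample_prior=lambda n: rng.standard_normal((n, 2)),
              log_prior=lambda x: -0.5 * np.sum(x**2) - np.log(2 * np.pi),
              lambdas=np.linspace(0.0, 1.0, 51), n_particles=500, n_mcmc=5, step=0.5, rng=rng)
print(np.exp(logw).mean())      # estimator of Z; pi(phi) is estimated by sum(w_i phi(X_M^i)) / sum(w_i)
```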
Jarzynski nonequilibrium equality
• Consider M → ∞, i.e. define the curve of distributions {π_t}_{t∈[0,1]}
      π_t(dx) = π_0(x) L(x)^{λ(t)} dx / Z(t),
  where λ : [0, 1] → [0, 1] is a strictly increasing C^1 function
• Initialize X_0 ∼ π_0 and run the time-inhomogeneous Langevin dynamics
      dX_t = ½ ∇ log π_t(X_t) dt + dW_t,    t ∈ [0, 1]
• The Jarzynski equality (Jarzynski, 1997; Crooks, 1998) constructs w : C([0, 1], R^d) → R_+ so that
      π(φ) = E[φ(X_1) w(X_{[0,1]})] / E[w(X_{[0,1]})],    Z = E[w(X_{[0,1]})]
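Although the explicit form of w is not written on the slide, for this annealing path the standard construction (stated here as my own completion, under the dynamics above and λ(0) = 0, λ(1) = 1) takes the "work" form

      w(X_{[0,1]}) = exp( ∫_0^1 λ'(t) log L(X_t) dt ),

since ∂_t log γ_t(x) = λ'(t) log L(x) for the unnormalised densities γ_t = π_0 L^{λ(t)}; it is the continuous-time limit of the incremental AIS weights Σ_m (λ_m − λ_{m−1}) log L(X_{m−1}) used in the sketch above.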
Optimal dynamics
• The dynamical lag ‖Law(X_t) − π_t‖ impacts the variance of the estimators
• Vaikuntanathan & Jarzynski (2011) considered adding a drift f : [0, 1] × R^d → R^d to reduce the lag
      dX_t = f(t, X_t) dt + ½ ∇ log π_t(X_t) dt + dW_t,    t ∈ [0, 1],    X_0 ∼ π_0
• An optimal choice of f results in zero lag, i.e. X_t ∼ π_t for t ∈ [0, 1], and a zero-variance estimator of Z
• Any optimal choice of f satisfies the Liouville PDE/continuity equation
      ∇ · (π_t(x) f(t, x)) = −∂_t π_t(x)
• Zero lag is also achieved by running the deterministic dynamics
      dX_t = f(t, X_t) dt,    X_0 ∼ π_0
• Main idea: solve the Liouville PDE for f and run the ODE to get a trajectory
Time evolution of distributions
• The time evolution of π_t is given by
      ∂_t π_t(x) = λ'(t) (log L(x) − I_t) π_t(x),
  where
      I_t = (1/λ'(t)) d/dt log Z(t) = E_{π_t}[log L(X_t)] < ∞
• Integrating recovers the path sampling (Gelman and Meng, 1998) or thermodynamic integration (Kirkwood, 1935) identity
      log( Z(1) / Z(0) ) = ∫_0^1 λ'(t) I_t dt.
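A one-line justification of the expression for I_t (this step is not shown on the slide): differentiating the normalising constant Z(t) = ∫_{R^d} π_0(x) L(x)^{λ(t)} dx gives

      d/dt Z(t) = λ'(t) ∫_{R^d} log L(x) π_0(x) L(x)^{λ(t)} dx = λ'(t) Z(t) E_{π_t}[log L(X_t)],

so (1/λ'(t)) d/dt log Z(t) = E_{π_t}[log L(X_t)] = I_t, and integrating λ'(t) I_t over [0, 1] yields log Z(1) − log Z(0).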
Defining the flow transport problem
• We want to solve the Liouville equation
      ∇ · (π_t(x) f(t, x)) = −∂_t π_t(x)
  for a drift f ... but not all solutions will work!
• Validity relies on the following result:
  Theorem (Ambrosio et al., 2005). Under the assumptions
  A1  f is locally Lipschitz;
  A2  ∫_0^1 ∫_{R^d} |f(t, x)| π_t(x) dx dt < ∞;
  the Eulerian Liouville PDE ⟺ the Lagrangian ODE
• Define the flow transport problem as solving the Liouville equation for an f satisfying [A1] & [A2]
Ill-posedness and regularization
• Under-determined: consider π_t = N((0, 0)^T, I_2) for t ∈ [0, 1];
      f(x_1, x_2) = (0, 0)   and   f(x_1, x_2) = (−x_2, x_1)
  are both solutions
• Regularization: seek the minimal kinetic energy solution
      argmin_f { ∫_0^1 ∫_{R^d} |f(t, x)|² π_t(x) dx dt : f solves Liouville }
• Euler-Lagrange ⟹ f*(t, x) = ∇ψ_t(x), where ∇ · (π_t(x) ∇ψ_t(x)) = −∂_t π_t(x)
• An analytical solution is available when the distributions are (mixtures of) Gaussians (Reich, 2012)
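To see why the rotational field is also a solution (a check not spelled out on the slide): here ∂_t π_t = 0 since π_t does not depend on t, and for π_t(x) ∝ exp(−(x_1² + x_2²)/2) with f(x_1, x_2) = (−x_2, x_1),

      ∇ · (π_t f) = ∂_{x_1}(−x_2 π_t) + ∂_{x_2}(x_1 π_t) = x_1 x_2 π_t − x_1 x_2 π_t = 0,

so adding any such field to a solution leaves the Liouville equation satisfied, which is the source of the non-uniqueness.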
Flow transport problem on R
• Minimal kinetic energy solution
      f(t, x) = − ( ∫_{−∞}^{x} ∂_t π_t(u) du ) / π_t(x)
• Checking Liouville:
      ∇ · (π_t(x) f(t, x)) = −∂_x ∫_{−∞}^{x} ∂_t π_t(u) du = −∂_t π_t(x)
A1  For f to be locally Lipschitz, assume
      π_0, L ∈ C^1(R, R_+) ⟹ f ∈ C^1([0, 1] × R, R)
A2  For integrability ∫_0^1 ∫_R |f(t, x)| π_t(x) dx dt < ∞, necessarily
      |π_t f|(t, x) = | ∫_{−∞}^{x} ∂_t π_t(u) du | → 0 as |x| → ∞,
  which holds since ∫_{−∞}^{∞} ∂_t π_t(u) du = 0
• Optimality: f(t, x) = ∇ψ_t(x) holds trivially (on R, any such f is a gradient)
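A minimal numerical sketch of the one-dimensional solution on a toy Gaussian example of my own (prior N(0, 2²), likelihood N(y = 1 | x, 1), λ(t) = t): ∂_t π_t is approximated by finite differences in t, the numerator by cumulative trapezoidal quadrature in x on a grid, and prior samples are transported with forward Euler; the transported samples should match the exact Gaussian posterior N(0.8, 0.8).

```python
import numpy as np

# toy setup: prior pi_0 = N(0, 2^2), likelihood L(x) = N(y = 1 | x, 1^2), lambda(t) = t
y, sig0, sig = 1.0, 2.0, 1.0
xs = np.linspace(-12.0, 12.0, 2001)                       # spatial grid

def trapz(v):                                             # trapezoidal rule on the grid xs
    return np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(xs))

def cumtrapz(v):                                          # cumulative trapezoidal rule on xs
    return np.concatenate(([0.0], np.cumsum(0.5 * (v[1:] + v[:-1]) * np.diff(xs))))

def pi_t(t):                                              # normalised density of pi_t on the grid
    logg = -0.5 * xs**2 / sig0**2 - 0.5 * t * (y - xs)**2 / sig**2
    g = np.exp(logg - logg.max())
    return g / trapz(g)

def drift(t, X, h=1e-4):
    # f(t, x) = - ( int_{-inf}^{x} d/dt pi_t(u) du ) / pi_t(x), d/dt by central differences
    tp, tm = min(t + h, 1.0), max(t - h, 0.0)
    dpi = (pi_t(tp) - pi_t(tm)) / (tp - tm)
    f_grid = -cumtrapz(dpi) / pi_t(t)
    return np.interp(X, xs, f_grid)                       # drift at the particle locations

rng = np.random.default_rng(0)
X = sig0 * rng.standard_normal(5000)                      # X_0 ~ pi_0
M = 200
for m in range(M):                                        # forward Euler on dX = f(t, X) dt
    X += (1.0 / M) * drift(m / M, X)

post_var = 1.0 / (1.0 / sig0**2 + 1.0 / sig**2)           # exact Gaussian posterior N(0.8, 0.8)
print(X.mean(), X.var(), "vs", post_var * y / sig**2, post_var)
```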
Flow transport problem on R
• Re-write the solution as
      f(t, x) = λ'(t) I_t { F_t(x) − I_t^x / I_t } / π_t(x),
  where I_t^x = E_{π_t}[ 1_{(−∞, x]}(X_t) log L(X_t) ] and F_t is the CDF of π_t
• Speed is controlled by λ'(t) and π_t(x)
• Sign is given by the difference between F_t(x) and I_t^x / I_t ∈ [0, 1]
Flow transport problem on R^d, d ≥ 1
• Multivariate solution for d = 3:
      (π_t f_1)(t, x_{1:3}) = − ∫_{−∞}^{x_1} ∂_t π_t(u_1, x_2, x_3) du_1
                              + g_1(t, x_1) ∫_{−∞}^{∞} ∂_t π_t(u_1, x_2, x_3) du_1
      (π_t f_2)(t, x_{1:3}) = − g_1'(t, x_1) ∫_{−∞}^{∞} ∫_{−∞}^{x_2} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
                              + g_1'(t, x_1) g_2(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
      (π_t f_3)(t, x_{1:3}) = − g_1'(t, x_1) g_2'(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{x_3} ∂_t π_t(u_1, u_2, u_3) du_{1:3}
  where g_1, g_2 ∈ C^2([0, 1] × R, [0, 1])
Flow transport problem on R^d, d ≥ 1
• Checking Liouville:
      ∂_{x_1}(π_t f_1)(t, x_{1:3}) = −∂_t π_t(x_1, x_2, x_3) + g_1'(t, x_1) ∫_{−∞}^{∞} ∂_t π_t(u_1, x_2, x_3) du_1
      ∂_{x_2}(π_t f_2)(t, x_{1:3}) = −g_1'(t, x_1) ∫_{−∞}^{∞} ∂_t π_t(u_1, x_2, x_3) du_1
                                     + g_1'(t, x_1) g_2'(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
      ∂_{x_3}(π_t f_3)(t, x_{1:3}) = −g_1'(t, x_1) g_2'(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
• Taking the divergence gives a telescoping sum
      ∇ · (π_t f)(t, x_{1:3}) = Σ_{i=1}^3 ∂_{x_i}(π_t f_i)(t, x_{1:3}) = −∂_t π_t(x_{1:3})
Flow transport problem on R^d, d ≥ 1
A1  For f to be locally Lipschitz, assume
      π_0, L ∈ C^1(R^d, R_+) ⟹ f ∈ C^1([0, 1] × R^d, R^d)
A2  For integrability of ∫_0^1 ∫_{R^d} |f(t, x)| π_t(x) dx dt < ∞, necessarily
      |π_t f|(t, x) → 0 as |x| → ∞,
  which holds if the {g_i} are non-decreasing functions with tail behaviour
      g_i(t, x_i) → 0 as x_i → −∞,    g_i(t, x_i) → 1 as x_i → ∞
• Choosing g_i(t, x_i) = F_t(x_i), the marginal CDF of π_t, allows f to decouple if the distributions are independent
Approximate Gibbs flow transport
• The exact solution involves integrals of increasing dimension, as it tracks the conditional distributions
      π_t(x_1 | x_{2:d}), π_t(x_2 | x_{3:d}), …, π_t(x_d),    x_i ∈ R
• Trade off accuracy for computational tractability: track the full conditional distributions
      π_t(x_i | x_{−i}),    x_i ∈ R
• This gives a system of Liouville equations
      ∂_{x_i} { π_t(x_i | x_{−i}) f̃_i(t, x) } = −∂_t π_t(x_i | x_{−i}),
  each defined on (0, 1) × R
Approximate Gibbs flow transport
• The solution is
      f̃_i(t, x) = − ( ∫_{−∞}^{x_i} ∂_t π_t(u_i | x_{−i}) du_i ) / π_t(x_i | x_{−i})
  (a quadrature sketch of f̃_i follows below)
• If π_0, L ∈ C^1(R^d, R_+) with appropriate tail behaviour, the ODE
      dX_t = f̃(t, X_t) dt,    X_0 ∼ π_0
  admits a unique solution on [0, 1], referred to as the Gibbs flow
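A sketch of how one coordinate of the Gibbs flow drift can be evaluated with one-dimensional quadrature, using the conditional analogue of the earlier time-evolution identity, ∂_t π_t(u_i | x_{−i}) = λ'(t) (log L(u_i, x_{−i}) − Ī_i) π_t(u_i | x_{−i}) with Ī_i = E_{π_t(·|x_{−i})}[log L]; the grid, λ(t) = t and the toy Gaussian log-densities are my own choices.

```python
import numpy as np

def log_prior(x): return -0.5 * np.sum(x**2)             # toy pi_0 = N(0, I_d), up to constants
def log_lik(x):   return -0.5 * np.sum((x - 1.0)**2)     # toy L(x) = N(1 | x, I_d), up to constants

def gibbs_flow_drift_i(t, x, i, grid):
    """tilde f_i(t, x) = - ( int_{-inf}^{x_i} d/dt pi_t(u | x_{-i}) du ) / pi_t(x_i | x_{-i}),
    with d/dt pi_t(u | x_{-i}) = (log L(u, x_{-i}) - Ibar) pi_t(u | x_{-i}) for lambda(t) = t,
    evaluated by trapezoidal quadrature along coordinate i."""
    dx = np.diff(grid)
    pts = np.repeat(x[None, :], len(grid), axis=0)
    pts[:, i] = grid                                      # vary coordinate i, freeze x_{-i}
    logg = np.array([log_prior(z) + t * log_lik(z) for z in pts])
    logL = np.array([log_lik(z) for z in pts])
    g = np.exp(logg - logg.max())                         # unnormalised full conditional (stabilised)
    lg = logL * g
    Ibar = np.sum(0.5 * (lg[1:] + lg[:-1]) * dx) / np.sum(0.5 * (g[1:] + g[:-1]) * dx)
    h = (logL - Ibar) * g                                 # numerator integrand, up to the conditional normaliser
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * dx)))
    num = np.interp(x[i], grid, cum)                      # int_{-inf}^{x_i} ...
    den = np.exp(log_prior(x) + t * log_lik(x) - logg.max())
    return -num / den

# example: drift of the first coordinate at the origin, halfway along the path
print(gibbs_flow_drift_i(0.5, np.zeros(4), 0, np.linspace(-8.0, 8.0, 400)))
```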
Error control
• Define the local error
      ε_t(x) = ∂_t π_t(x) + ∇ · (π_t(x) f̃(t, x))
             = ∂_t π_t(x) − Σ_{i=1}^d ∂_t π_t(x_i | x_{−i}) π_t(x_{−i})
  (each term ∂_{x_i}(π_t(x) f̃_i(t, x)) equals −∂_t π_t(x_i | x_{−i}) π_t(x_{−i}) since π_t(x_{−i}) does not depend on x_i)
• The Gibbs flow exploits local independence:
      π_t(x) = Π_{i=1}^d π_t(x_i) ⟹ ε_t(x) = 0
• If the Gibbs flow induces {π̃_t}_{t∈[0,1]} with π̃_0 = π_0, then
      ‖π̃_t − π_t‖²_{L²} ≤ t ∫_0^t ‖ε_u‖²_{L²} du · exp( 1 + ∫_0^t ‖∇ · f̃(u, ·)‖_∞ du )
Numerical implementation of Gibbs flow
• Implementation of the Gibbs flow involves one-dimensional quadrature and numerical integration
• Previously, we considered the forward Euler scheme
      Y_m = Y_{m−1} + Δt f̃(t_{m−1}, Y_{m−1}) =: Φ_m(Y_{m−1})
• To get Law(Y_m), we need the Jacobian determinant of Φ_m, which typically costs O(d³)
• In contrast, the scheme that cycles through each dimension,
      Y_m[i] = Y_{m−1}[i] + Δt f̃_i(t_{m−1}, Y_m[1 : i−1], Y_{m−1}[i : d]),
      Y_m = Φ_{m,d} ∘ ⋯ ∘ Φ_{m,1}(Y_{m−1}),
  is also order one, but costs O(d) (a sketch of one sweep follows below)
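A sketch of one sweep of the dimension-cycling scheme, assuming a user-supplied drift_i(t, x, i) that returns f̃_i(t, x) (for instance the quadrature sketch two slides back); since each map Φ_{m,i} changes a single coordinate, the Jacobian determinant of the composition is a product of d scalar factors 1 + Δt ∂f̃_i/∂x_i, accumulated here by finite differences, hence the O(d) cost.

```python
import numpy as np

def euler_gibbs_sweep(Y, t, dt, drift_i, eps=1e-5):
    """One step Y_m = Phi_{m,d} o ... o Phi_{m,1}(Y_{m-1}):
    Y[i] <- Y[i] + dt * f_i(t, Y[1:i-1] already updated, Y[i:d] old).
    Returns the new state and log |det Jacobian| of the composed map."""
    Y = Y.copy()
    log_jac = 0.0
    for i in range(len(Y)):
        f = drift_i(t, Y, i)
        Yp = Y.copy()
        Yp[i] += eps                                      # finite-difference d f_i / d x_i
        dfdx = (drift_i(t, Yp, i) - f) / eps
        log_jac += np.log(abs(1.0 + dt * dfdx))           # scalar Jacobian factor of Phi_{m,i}
        Y[i] = Y[i] + dt * f
    return Y, log_jac

# hypothetical wiring with the quadrature sketch above:
# grid = np.linspace(-8.0, 8.0, 400)
# Y, lj = euler_gibbs_sweep(np.zeros(4), t=0.0, dt=1/200,
#                           drift_i=lambda t, x, i: gibbs_flow_drift_i(t, x, i, grid))
```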
Mixture modelling example
• Lack of identifiability induces π on R^4 with 4! = 24 well-separated and identical modes
• Gibbs flow approximation
  [Figure: scatter plots of the Gibbs flow approximation in the (x_1, x_2) plane at times t = 0.0006, 0.0058, 0.0542 and 1.0000, with both axes ranging over −10 to 10]
Mixture modelling example
• Proportion of samples in each of the 24 modes
  [Figure: bar chart of the proportion of particles in each of the 24 modes]
• Pearson's chi-squared test for uniformity gives a p-value of 0.85
Cox point process model
• Effective sample size % in dimension d
• AIS: AIS with HMC moves
• GF-SIS: Gibbs flow
• GF-AIS: Gibbs flow with HMC moves
  [Figure: ESS% against time t ∈ [0, 1] for AIS, GF-SIS and GF-AIS, in dimensions d = 100, 225 and 400]
End
• Slides will be uploaded to my webpage
• Heng, J., Doucet, A., & Pokern, Y. (2015). Gibbs Flow for
Approximate Transport with Applications to Bayesian Computation.
arXiv preprint arXiv:1509.08787.
• Updated article and R package coming soon!