Gibbs flow transport for Bayesian inference
Jeremy Heng
ESSEC Business School
Joint work with Arnaud Doucet (Oxford) & Yvo Pokern (UCL)
SciCADE 2019
Selected topics in computation and dynamics: machine learning and
multiscale methods
Innsbruck
22 July 2019
Problem specification
• Target distribution on R^d
      π(dx) = γ(x) dx / Z,
  where the unnormalised density γ : R^d → R_+ can be evaluated pointwise and
      Z = ∫_{R^d} γ(x) dx
  is unknown
• Problem 1: Obtain a consistent estimator of π(φ) := ∫_{R^d} φ(x) π(dx)
• Problem 2: Obtain an unbiased and consistent estimator of Z
• Main challenge: dimension d is typically large
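As a concrete baseline for Problems 1 and 2 (not taken from the slides), self-normalised importance sampling with a tractable proposal q gives a consistent estimator of π(φ) and an unbiased estimator of Z; the unnormalised target γ, proposal q and test function φ below are my own toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
log_gamma = lambda x: -0.5 * np.sum((x - 1.0)**2, axis=-1)                    # toy unnormalised target, Z = (2*pi)^{d/2}
log_q = lambda x: -0.5 * np.sum(x**2, axis=-1) - 0.5 * d * np.log(2 * np.pi)  # proposal q = N(0, I_d)
phi = lambda x: x[..., 0]                                                      # test function phi(x) = x_1

X = rng.standard_normal((100_000, d))                  # X_i ~ q
logw = log_gamma(X) - log_q(X)                         # unnormalised importance weights
w = np.exp(logw - logw.max())

Z_hat = np.exp(logw.max()) * w.mean()                  # unbiased estimator of Z
pi_phi_hat = np.sum(w * phi(X)) / np.sum(w)            # consistent (self-normalised) estimator of pi(phi)
print(Z_hat, (2 * np.pi)**(d / 2), pi_phi_hat)         # here Z = (2*pi)^{d/2} and pi(phi) = 1
```

In large dimensions the weights of such a one-shot scheme degenerate, which is exactly the challenge that motivates the bridging strategies on the following slides.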
Motivation: Bayesian computation
• Prior distribution π_0 on the unknown parameters of a model
• Likelihood function L : R^d → R_+ of data y
• Bayes update gives the posterior distribution on R^d
      π(dx) = π_0(x) L(x) dx / Z,
  where Z = ∫_{R^d} π_0(x) L(x) dx is the marginal likelihood of y
• Problem: π(φ) and Z are typically intractable
• Main challenge: complex models require a large number of parameters d
Monte Carlo methods
• Typically sampling from π is intractable, so we rely on Markov chain Monte Carlo (MCMC) methods
• MCMC constructs a π-invariant Markov transition kernel K : R^d × B(R^d) → [0, 1]
• Sample X_0 ∼ π_0 and iterate X_n ∼ K(X_{n−1}, ·) until convergence
• MCMC methods have been successful in many applications, but can also fail in practice, e.g. when π is highly multi-modal
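For concreteness, a minimal sketch of one standard π-invariant kernel K, a Gaussian random-walk Metropolis step; the unnormalised log-density log_gamma, the bimodal toy target and the step size are my own illustrative choices, not taken from the talk.

```python
import numpy as np

def rwm_kernel(x, log_gamma, step, rng):
    """One random-walk Metropolis step: propose x' = x + step * noise and
    accept with probability min(1, gamma(x') / gamma(x)), which leaves pi invariant."""
    prop = x + step * rng.standard_normal(x.shape)
    if np.log(rng.uniform()) < log_gamma(prop) - log_gamma(x):
        return prop
    return x

# usage: iterate X_n ~ K(X_{n-1}, .) on a toy bimodal target with modes at +/- 3
rng = np.random.default_rng(1)
log_gamma = lambda x: np.logaddexp(-0.5 * np.sum((x - 3.0)**2), -0.5 * np.sum((x + 3.0)**2))
x = np.zeros(2)
chain = [x]
for n in range(5000):
    x = rwm_kernel(x, log_gamma, step=1.0, rng=rng)
    chain.append(x)
```

On this toy target the chain rarely crosses between the two modes, which illustrates the multi-modality failure mentioned in the last bullet.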
Annealed importance sampling
• If π_0 and π are distant, define bridges
      π_{λ_m}(dx) = π_0(x) L(x)^{λ_m} dx / Z(λ_m),
  with 0 = λ_0 < λ_1 < … < λ_M = 1 so that π_{λ_M} = π
• Initialize X_0 ∼ π_0 and move X_m ∼ K_m(X_{m−1}, ·) for m = 1, …, M, where K_m is π_{λ_m}-invariant
• Annealed importance sampling constructs w : (R^d)^{M+1} → R_+ so that
      π(φ) = E[φ(X_M) w(X_{0:M})] / E[w(X_{0:M})],    Z = E[w(X_{0:M})]
• AIS (Neal, 2001) and SMC samplers (Del Moral et al., 2006) are considered state-of-the-art in statistics and machine learning
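A compact AIS sketch in the notation above, assuming a normalised prior π_0 we can sample from, a pointwise log-likelihood, and a random-walk Metropolis move reused as K_m; these implementation choices, and the toy Gaussian usage at the end, are mine. The weight is accumulated as log w = Σ_m (λ_m − λ_{m−1}) log L(X_{m−1}).

```python
import numpy as np

def ais(log_lik, sample_prior, log_prior, lambdas, n_particles, n_mcmc, step, rng):
    """Annealed importance sampling with bridges gamma_m = pi_0 * L^{lambda_m} and
    random-walk Metropolis moves targeting each bridge. Returns particles and log-weights."""
    X = sample_prior(n_particles)                            # X_0^i ~ pi_0
    logw = np.zeros(n_particles)
    for m in range(1, len(lambdas)):
        # incremental weight gamma_{lambda_m}(X_{m-1}) / gamma_{lambda_{m-1}}(X_{m-1})
        logw += (lambdas[m] - lambdas[m - 1]) * np.apply_along_axis(log_lik, 1, X)
        # move each particle with a pi_{lambda_m}-invariant kernel K_m
        log_target = lambda x: log_prior(x) + lambdas[m] * log_lik(x)
        for _ in range(n_mcmc):
            prop = X + step * rng.standard_normal(X.shape)
            lp = np.apply_along_axis(log_target, 1, prop) - np.apply_along_axis(log_target, 1, X)
            accept = np.log(rng.uniform(size=len(X))) < lp
            X[accept] = prop[accept]
    return X, logw

# toy usage: pi_0 = N(0, I_2), L(x) = N((1,1) | x, I_2), so Z = N((1,1); 0, 2 I_2) ~ 0.048
rng = np.random.default_rng(2)
X, logw = ais(log_lik=lambda x: -0.5 * np.sum((x - 1.0)**2) - np.log(2 * np.pi),
              sample_prior=lambda n: rng.standard_normal((n, 2)),
              log_prior=lambda x: -0.5 * np.sum(x**2) - np.log(2 * np.pi),
              lambdas=np.linspace(0.0, 1.0, 51), n_particles=500, n_mcmc=5, step=0.5, rng=rng)
print(np.exp(logw).mean())      # estimator of Z; pi(phi) is estimated by sum(w_i phi(X_M^i)) / sum(w_i)
```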
Jarzynski nonequilibrium equality
• Consider M → ∞, i.e. define the curve of distributions {π_t}_{t∈[0,1]}
      π_t(dx) = π_0(x) L(x)^{λ(t)} dx / Z(t),
  where λ : [0, 1] → [0, 1] is a strictly increasing C^1 function
• Initialize X_0 ∼ π_0 and run the time-inhomogeneous Langevin dynamics
      dX_t = ½ ∇ log π_t(X_t) dt + dW_t,    t ∈ [0, 1]
• The Jarzynski equality (Jarzynski, 1997; Crooks, 1998) constructs w : C([0, 1], R^d) → R_+ so that
      π(φ) = E[φ(X_1) w(X_{[0,1]})] / E[w(X_{[0,1]})],    Z = E[w(X_{[0,1]})]
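Although the explicit form of w is not written on the slide, for this annealing path the standard construction (stated here as my own completion, under the dynamics above and λ(0) = 0, λ(1) = 1) takes the "work" form

      w(X_{[0,1]}) = exp( ∫_0^1 λ'(t) log L(X_t) dt ),

since ∂_t log γ_t(x) = λ'(t) log L(x) for the unnormalised densities γ_t = π_0 L^{λ(t)}; it is the continuous-time limit of the incremental AIS weights Σ_m (λ_m − λ_{m−1}) log L(X_{m−1}) used in the sketch above.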
Optimal dynamics
• The dynamical lag ‖Law(X_t) − π_t‖ impacts the variance of the estimators
• Vaikuntanathan & Jarzynski (2011) considered adding a drift f : [0, 1] × R^d → R^d to reduce the lag
      dX_t = f(t, X_t) dt + ½ ∇ log π_t(X_t) dt + dW_t,    t ∈ [0, 1],    X_0 ∼ π_0
• An optimal choice of f results in zero lag, i.e. X_t ∼ π_t for t ∈ [0, 1], and a zero-variance estimator of Z
• Any optimal choice of f satisfies the Liouville PDE/continuity equation
      ∇ · (π_t(x) f(t, x)) = −∂_t π_t(x)
• Zero lag is also achieved by running the deterministic dynamics
      dX_t = f(t, X_t) dt,    X_0 ∼ π_0
• Main idea: solve the Liouville PDE for f and run the ODE to get a trajectory
Time evolution of distributions
• The time evolution of π_t is given by
      ∂_t π_t(x) = λ'(t) (log L(x) − I_t) π_t(x),
  where
      I_t = (1/λ'(t)) d/dt log Z(t) = E_{π_t}[log L(X_t)] < ∞
• Integrating recovers the path sampling (Gelman and Meng, 1998) or thermodynamic integration (Kirkwood, 1935) identity
      log( Z(1) / Z(0) ) = ∫_0^1 λ'(t) I_t dt.
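A one-line justification of the expression for I_t (this step is not shown on the slide): differentiating the normalising constant Z(t) = ∫_{R^d} π_0(x) L(x)^{λ(t)} dx gives

      d/dt Z(t) = λ'(t) ∫_{R^d} log L(x) π_0(x) L(x)^{λ(t)} dx = λ'(t) Z(t) E_{π_t}[log L(X_t)],

so (1/λ'(t)) d/dt log Z(t) = E_{π_t}[log L(X_t)] = I_t, and integrating λ'(t) I_t over [0, 1] yields log Z(1) − log Z(0).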
Defining the flow transport problem
• We want to solve the Liouville equation
      ∇ · (π_t(x) f(t, x)) = −∂_t π_t(x)
  for a drift f ... but not all solutions will work!
• Validity relies on the following result:
  Theorem (Ambrosio et al., 2005). Under the assumptions
  A1  f is locally Lipschitz;
  A2  ∫_0^1 ∫_{R^d} |f(t, x)| π_t(x) dx dt < ∞;
  the Eulerian Liouville PDE ⟺ the Lagrangian ODE
• Define the flow transport problem as solving the Liouville equation for an f satisfying [A1] & [A2]
Ill-posedness and regularization
• Under-determined: consider π_t = N((0, 0)^T, I_2) for t ∈ [0, 1];
      f(x_1, x_2) = (0, 0)   and   f(x_1, x_2) = (−x_2, x_1)
  are both solutions
• Regularization: seek the minimal kinetic energy solution
      argmin_f { ∫_0^1 ∫_{R^d} |f(t, x)|² π_t(x) dx dt : f solves Liouville }
• Euler-Lagrange ⟹ f*(t, x) = ∇ψ_t(x), where ∇ · (π_t(x) ∇ψ_t(x)) = −∂_t π_t(x)
• An analytical solution is available when the distributions are (mixtures of) Gaussians (Reich, 2012)
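To see why the rotational field is also a solution (a check not spelled out on the slide): here ∂_t π_t = 0 since π_t does not depend on t, and for π_t(x) ∝ exp(−(x_1² + x_2²)/2) with f(x_1, x_2) = (−x_2, x_1),

      ∇ · (π_t f) = ∂_{x_1}(−x_2 π_t) + ∂_{x_2}(x_1 π_t) = x_1 x_2 π_t − x_1 x_2 π_t = 0,

so adding any such field to a solution leaves the Liouville equation satisfied, which is the source of the non-uniqueness.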
Flow transport problem on R
• Minimal kinetic energy solution
      f(t, x) = − ( ∫_{−∞}^{x} ∂_t π_t(u) du ) / π_t(x)
• Checking Liouville:
      ∇ · (π_t(x) f(t, x)) = −∂_x ∫_{−∞}^{x} ∂_t π_t(u) du = −∂_t π_t(x)
A1  For f to be locally Lipschitz, assume
      π_0, L ∈ C^1(R, R_+) ⟹ f ∈ C^1([0, 1] × R, R)
A2  For integrability ∫_0^1 ∫_R |f(t, x)| π_t(x) dx dt < ∞, necessarily
      |π_t f|(t, x) = | ∫_{−∞}^{x} ∂_t π_t(u) du | → 0 as |x| → ∞,
  which holds since ∫_{−∞}^{∞} ∂_t π_t(u) du = 0
• Optimality: f(t, x) = ∇ψ_t(x) holds trivially (on R, any such f is a gradient)
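A minimal numerical sketch of the one-dimensional solution on a toy Gaussian example of my own (prior N(0, 2²), likelihood N(y = 1 | x, 1), λ(t) = t): ∂_t π_t is approximated by finite differences in t, the numerator by cumulative trapezoidal quadrature in x on a grid, and prior samples are transported with forward Euler; the transported samples should match the exact Gaussian posterior N(0.8, 0.8).

```python
import numpy as np

# toy setup: prior pi_0 = N(0, 2^2), likelihood L(x) = N(y = 1 | x, 1^2), lambda(t) = t
y, sig0, sig = 1.0, 2.0, 1.0
xs = np.linspace(-12.0, 12.0, 2001)                       # spatial grid

def trapz(v):                                             # trapezoidal rule on the grid xs
    return np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(xs))

def cumtrapz(v):                                          # cumulative trapezoidal rule on xs
    return np.concatenate(([0.0], np.cumsum(0.5 * (v[1:] + v[:-1]) * np.diff(xs))))

def pi_t(t):                                              # normalised density of pi_t on the grid
    logg = -0.5 * xs**2 / sig0**2 - 0.5 * t * (y - xs)**2 / sig**2
    g = np.exp(logg - logg.max())
    return g / trapz(g)

def drift(t, X, h=1e-4):
    # f(t, x) = - ( int_{-inf}^{x} d/dt pi_t(u) du ) / pi_t(x), d/dt by central differences
    tp, tm = min(t + h, 1.0), max(t - h, 0.0)
    dpi = (pi_t(tp) - pi_t(tm)) / (tp - tm)
    f_grid = -cumtrapz(dpi) / pi_t(t)
    return np.interp(X, xs, f_grid)                       # drift at the particle locations

rng = np.random.default_rng(0)
X = sig0 * rng.standard_normal(5000)                      # X_0 ~ pi_0
M = 200
for m in range(M):                                        # forward Euler on dX = f(t, X) dt
    X += (1.0 / M) * drift(m / M, X)

post_var = 1.0 / (1.0 / sig0**2 + 1.0 / sig**2)           # exact Gaussian posterior N(0.8, 0.8)
print(X.mean(), X.var(), "vs", post_var * y / sig**2, post_var)
```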
Flow transport problem on R
• Re-write the solution as
      f(t, x) = λ'(t) I_t { F_t(x) − I_t^x / I_t } / π_t(x),
  where I_t^x = E_{π_t}[ 1_{(−∞, x]}(X_t) log L(X_t) ] and F_t is the CDF of π_t
• Speed is controlled by λ'(t) and π_t(x)
• Sign is given by the difference between F_t(x) and I_t^x / I_t ∈ [0, 1]
Flow transport problem on R^d, d ≥ 1
• Multivariate solution for d = 3:
      (π_t f_1)(t, x_{1:3}) = − ∫_{−∞}^{x_1} ∂_t π_t(u_1, x_2, x_3) du_1
                              + g_1(t, x_1) ∫_{−∞}^{∞} ∂_t π_t(u_1, x_2, x_3) du_1
      (π_t f_2)(t, x_{1:3}) = − g_1'(t, x_1) ∫_{−∞}^{∞} ∫_{−∞}^{x_2} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
                              + g_1'(t, x_1) g_2(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
      (π_t f_3)(t, x_{1:3}) = − g_1'(t, x_1) g_2'(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{x_3} ∂_t π_t(u_1, u_2, u_3) du_{1:3}
  where g_1, g_2 ∈ C^2([0, 1] × R, [0, 1])
Flow transport problem on R^d, d ≥ 1
• Checking Liouville:
      ∂_{x_1}(π_t f_1)(t, x_{1:3}) = −∂_t π_t(x_1, x_2, x_3) + g_1'(t, x_1) ∫_{−∞}^{∞} ∂_t π_t(u_1, x_2, x_3) du_1
      ∂_{x_2}(π_t f_2)(t, x_{1:3}) = −g_1'(t, x_1) ∫_{−∞}^{∞} ∂_t π_t(u_1, x_2, x_3) du_1
                                     + g_1'(t, x_1) g_2'(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
      ∂_{x_3}(π_t f_3)(t, x_{1:3}) = −g_1'(t, x_1) g_2'(t, x_2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂_t π_t(u_1, u_2, x_3) du_{1:2}
• Taking the divergence gives a telescoping sum
      ∇ · (π_t f)(t, x_{1:3}) = Σ_{i=1}^3 ∂_{x_i}(π_t f_i)(t, x_{1:3}) = −∂_t π_t(x_{1:3})
Flow transport problem on R^d, d ≥ 1
A1  For f to be locally Lipschitz, assume
      π_0, L ∈ C^1(R^d, R_+) ⟹ f ∈ C^1([0, 1] × R^d, R^d)
A2  For integrability of ∫_0^1 ∫_{R^d} |f(t, x)| π_t(x) dx dt < ∞, necessarily
      |π_t f|(t, x) → 0 as |x| → ∞,
  which holds if the {g_i} are non-decreasing functions with tail behaviour
      g_i(t, x_i) → 0 as x_i → −∞,    g_i(t, x_i) → 1 as x_i → ∞
• Choosing g_i(t, x_i) = F_t(x_i), the marginal CDF of π_t, allows f to decouple if the distributions are independent
Approximate Gibbs flow transport
• The exact solution involves integrals of increasing dimension, as it tracks the conditional distributions
      π_t(x_1 | x_{2:d}), π_t(x_2 | x_{3:d}), …, π_t(x_d),    x_i ∈ R
• Trade off accuracy for computational tractability: track the full conditional distributions
      π_t(x_i | x_{−i}),    x_i ∈ R
• This gives a system of Liouville equations
      ∂_{x_i} { π_t(x_i | x_{−i}) f̃_i(t, x) } = −∂_t π_t(x_i | x_{−i}),
  each defined on (0, 1) × R
Approximate Gibbs flow transport
• The solution is
      f̃_i(t, x) = − ( ∫_{−∞}^{x_i} ∂_t π_t(u_i | x_{−i}) du_i ) / π_t(x_i | x_{−i})
  (a quadrature sketch of f̃_i follows below)
• If π_0, L ∈ C^1(R^d, R_+) with appropriate tail behaviour, the ODE
      dX_t = f̃(t, X_t) dt,    X_0 ∼ π_0
  admits a unique solution on [0, 1], referred to as the Gibbs flow
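A sketch of how one coordinate of the Gibbs flow drift can be evaluated with one-dimensional quadrature, using the conditional analogue of the earlier time-evolution identity, ∂_t π_t(u_i | x_{−i}) = λ'(t) (log L(u_i, x_{−i}) − Ī_i) π_t(u_i | x_{−i}) with Ī_i = E_{π_t(·|x_{−i})}[log L]; the grid, λ(t) = t and the toy Gaussian log-densities are my own choices.

```python
import numpy as np

def log_prior(x): return -0.5 * np.sum(x**2)             # toy pi_0 = N(0, I_d), up to constants
def log_lik(x):   return -0.5 * np.sum((x - 1.0)**2)     # toy L(x) = N(1 | x, I_d), up to constants

def gibbs_flow_drift_i(t, x, i, grid):
    """tilde f_i(t, x) = - ( int_{-inf}^{x_i} d/dt pi_t(u | x_{-i}) du ) / pi_t(x_i | x_{-i}),
    with d/dt pi_t(u | x_{-i}) = (log L(u, x_{-i}) - Ibar) pi_t(u | x_{-i}) for lambda(t) = t,
    evaluated by trapezoidal quadrature along coordinate i."""
    dx = np.diff(grid)
    pts = np.repeat(x[None, :], len(grid), axis=0)
    pts[:, i] = grid                                      # vary coordinate i, freeze x_{-i}
    logg = np.array([log_prior(z) + t * log_lik(z) for z in pts])
    logL = np.array([log_lik(z) for z in pts])
    g = np.exp(logg - logg.max())                         # unnormalised full conditional (stabilised)
    lg = logL * g
    Ibar = np.sum(0.5 * (lg[1:] + lg[:-1]) * dx) / np.sum(0.5 * (g[1:] + g[:-1]) * dx)
    h = (logL - Ibar) * g                                 # numerator integrand, up to the conditional normaliser
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * dx)))
    num = np.interp(x[i], grid, cum)                      # int_{-inf}^{x_i} ...
    den = np.exp(log_prior(x) + t * log_lik(x) - logg.max())
    return -num / den

# example: drift of the first coordinate at the origin, halfway along the path
print(gibbs_flow_drift_i(0.5, np.zeros(4), 0, np.linspace(-8.0, 8.0, 400)))
```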
Error control
• Define the local error
      ε_t(x) = ∂_t π_t(x) + ∇ · (π_t(x) f̃(t, x))
             = ∂_t π_t(x) − Σ_{i=1}^d ∂_t π_t(x_i | x_{−i}) π_t(x_{−i})
  (each term ∂_{x_i}(π_t(x) f̃_i(t, x)) equals −∂_t π_t(x_i | x_{−i}) π_t(x_{−i}) since π_t(x_{−i}) does not depend on x_i)
• The Gibbs flow exploits local independence:
      π_t(x) = Π_{i=1}^d π_t(x_i) ⟹ ε_t(x) = 0
• If the Gibbs flow induces {π̃_t}_{t∈[0,1]} with π̃_0 = π_0, then
      ‖π̃_t − π_t‖²_{L²} ≤ t ∫_0^t ‖ε_u‖²_{L²} du · exp( 1 + ∫_0^t ‖∇ · f̃(u, ·)‖_∞ du )
Numerical implementation of Gibbs flow
• Implementation of the Gibbs flow involves one-dimensional quadrature and numerical integration
• Previously, we considered the forward Euler scheme
      Y_m = Y_{m−1} + Δt f̃(t_{m−1}, Y_{m−1}) =: Φ_m(Y_{m−1})
• To get Law(Y_m), we need the Jacobian determinant of Φ_m, which typically costs O(d³)
• In contrast, the scheme that cycles through each dimension,
      Y_m[i] = Y_{m−1}[i] + Δt f̃_i(t_{m−1}, Y_m[1 : i−1], Y_{m−1}[i : d]),
      Y_m = Φ_{m,d} ∘ ⋯ ∘ Φ_{m,1}(Y_{m−1}),
  is also order one, but costs O(d) (a sketch of one sweep follows below)
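A sketch of one sweep of the dimension-cycling scheme, assuming a user-supplied drift_i(t, x, i) that returns f̃_i(t, x) (for instance the quadrature sketch two slides back); since each map Φ_{m,i} changes a single coordinate, the Jacobian determinant of the composition is a product of d scalar factors 1 + Δt ∂f̃_i/∂x_i, accumulated here by finite differences, hence the O(d) cost.

```python
import numpy as np

def euler_gibbs_sweep(Y, t, dt, drift_i, eps=1e-5):
    """One step Y_m = Phi_{m,d} o ... o Phi_{m,1}(Y_{m-1}):
    Y[i] <- Y[i] + dt * f_i(t, Y[1:i-1] already updated, Y[i:d] old).
    Returns the new state and log |det Jacobian| of the composed map."""
    Y = Y.copy()
    log_jac = 0.0
    for i in range(len(Y)):
        f = drift_i(t, Y, i)
        Yp = Y.copy()
        Yp[i] += eps                                      # finite-difference d f_i / d x_i
        dfdx = (drift_i(t, Yp, i) - f) / eps
        log_jac += np.log(abs(1.0 + dt * dfdx))           # scalar Jacobian factor of Phi_{m,i}
        Y[i] = Y[i] + dt * f
    return Y, log_jac

# hypothetical wiring with the quadrature sketch above:
# grid = np.linspace(-8.0, 8.0, 400)
# Y, lj = euler_gibbs_sweep(np.zeros(4), t=0.0, dt=1/200,
#                           drift_i=lambda t, x, i: gibbs_flow_drift_i(t, x, i, grid))
```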
Mixture modelling example
• Lack of identifiability induces π on R^4 with 4! = 24 well-separated and identical modes
• Gibbs flow approximation
  [Figure: scatter plots of the Gibbs flow approximation in the (x_1, x_2) plane at times t = 0.0006, 0.0058, 0.0542 and 1.0000, with both axes ranging over −10 to 10]
Mixture modelling example
• Proportion of samples in each of the 24 modes
  [Figure: bar chart of the proportion of particles in each of the 24 modes]
• Pearson's chi-squared test for uniformity gives a p-value of 0.85
Cox point process model
• Effective sample size % in dimension d
• AIS: AIS with HMC moves
• GF-SIS: Gibbs flow
• GF-AIS: Gibbs flow with HMC moves
  [Figure: ESS% against time t ∈ [0, 1] for AIS, GF-SIS and GF-AIS, in dimensions d = 100, 225 and 400]
End
• Slides will be uploaded to my webpage
• Heng, J., Doucet, A., & Pokern, Y. (2015). Gibbs Flow for
Approximate Transport with Applications to Bayesian Computation.
arXiv preprint arXiv:1509.08787.
• Updated article and R package coming soon!