Controlled Sequential Monte Carlo

Jeremy Heng
Department of Statistics, Harvard University
Joint work with Adrian Bishop (UTS & CSIRO), George Deligiannidis and Arnaud Doucet (Oxford)
BayesComp 2018
Intractable likelihoods

This work is about efficiently estimating
p(y|θ) = ∫ p(y|x, θ) p(dx|θ)

In state space models, the law of the latent Markov chain is
p(dx|θ) = µ_θ(dx_0) ∏_{t=1}^{T} M_{t,θ}(x_{t−1}, dx_t),
i.e. X_0 ∼ µ_θ and X_t | X_{t−1} ∼ M_{t,θ}(X_{t−1}, ·)

Observations are conditionally independent given (X_t)_{t∈[0:T]}:
p(y|x, θ) = ∏_{t=0}^{T} g_θ(x_t, y_t)

Want to also approximate the smoothing distribution
p(dx|y, θ) = ∏_{t=0}^{T} g_θ(x_t, y_t) p(dx|θ) p(y|θ)^{−1}
Motivating example

Measurements (y_0, . . . , y_T) ∈ {0, . . . , 50}^{3000} collected from a neuroscience experiment (Temereanca et al., 2008)

[Figure: observed counts over time t = 0, . . . , 2999]

Observation model:
Y_t | X_t = x_t ∼ Binomial(50, κ(x_t)),  κ(u) := (1 + exp(−u))^{−1}

Latent Markov chain:
X_0 ∼ N(0, 1),  X_t | X_{t−1} = x_{t−1} ∼ N(α x_{t−1}, σ^2),  t ∈ [1 : 2999],
where θ = (α, σ^2) ∈ [0, 1] × R_+ are the unknown parameters
Feynman-Kac formulation

Consider a proposal Markov chain (X_t)_{t∈[0:T]} with law
Q(dx_{0:T}) = µ(dx_0) ∏_{t=1}^{T} M_t(x_{t−1}, dx_t)

Consider estimating
Z = ∫_{X^{T+1}} G_0(x_0) ∏_{t=1}^{T} G_t(x_{t−1}, x_t) Q(dx_{0:T}),
given positive bounded potential functions (G_t)_{t∈[0:T]}

Define the target path measure
P(dx_{0:T}) = G_0(x_0) ∏_{t=1}^{T} G_t(x_{t−1}, x_t) Q(dx_{0:T}) Z^{−1}

The quantities µ, (M_t)_{t∈[1:T]}, (G_t)_{t∈[0:T]} depend on the specific application
Sequential Monte Carlo methods

SMC methods simulate an interacting particle system of size N ∈ N

At time t = 0, for each particle n ∈ [1 : N]:
  sample X_0^n ∼ µ and weight W_0^n ∝ G_0(X_0^n);
  sample an ancestor index A_0^n ∼ R(W_0^1, . . . , W_0^N)

For times t ∈ [1 : T], for each particle n ∈ [1 : N]:
  sample X_t^n ∼ M_t(X_{t−1}^{A_{t−1}^n}, ·) and weight W_t^n ∝ G_t(X_{t−1}^{A_{t−1}^n}, X_t^n);
  sample an ancestor index A_t^n ∼ R(W_t^1, . . . , W_t^N)

Here R(·) denotes a resampling distribution on [1 : N] (e.g. multinomial resampling)
Sequential Monte Carlo methods

Unbiased estimator of Z:
Z^N = { (1/N) ∑_{n=1}^{N} G_0(X_0^n) } ∏_{t=1}^{T} { (1/N) ∑_{n=1}^{N} G_t(X_{t−1}^{A_{t−1}^n}, X_t^n) }

Particle approximation of P:
P^N = (1/N) ∑_{n=1}^{N} δ_{X_{0:T}^n},
where X_{0:T}^n is obtained by tracing the ancestral lineage of particle X_T^n

Convergence properties of Z^N and P^N as N → ∞ are now well understood

However, the quality of approximation can be inadequate for practical choices of N

Performance crucially depends on the discrepancy between P and Q
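To make the recursion and the estimator Z^N concrete, here is a minimal Python sketch of a generic SMC step with multinomial resampling at every time. It is an illustrative sketch, not the authors' implementation (their MATLAB code is linked at the end): the callables mu_sample, M_sample, G0, G and the resampling choice are assumptions for this example, and the ancestral-lineage tracing needed for P^N is omitted.

```python
import numpy as np

def smc(mu_sample, M_sample, G0, G, T, N, rng=None):
    """Generic SMC sketch with multinomial resampling at every step.

    mu_sample(N)       -> array of N initial states X_0^n ~ mu
    M_sample(t, x)     -> array of N states, the n-th sampled from M_t(x[n], .)
    G0(x)              -> G_0 evaluated at the N initial states
    G(t, xprev, x)     -> G_t evaluated at the N pairs (x_{t-1}, x_t)
    Returns the unbiased estimate Z^N and the final particles.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = mu_sample(N)
    w = G0(x)                                      # unnormalised weights W_0^n
    logZ = np.log(np.mean(w))                      # accumulate log Z^N for numerical stability
    for t in range(1, T + 1):
        a = rng.choice(N, size=N, p=w / w.sum())   # ancestor indices A_{t-1}^n
        xprev = x[a]
        x = M_sample(t, xprev)                     # X_t^n ~ M_t(X_{t-1}^{A_{t-1}^n}, .)
        w = G(t, xprev, x)                         # W_t^n proportional to G_t(X_{t-1}^{A_{t-1}^n}, X_t^n)
        logZ += np.log(np.mean(w))
    return np.exp(logZ), x                         # (lineage tracing for P^N omitted)
```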
Neuroscience model continued

Using the bootstrap particle filter (BPF), i.e. define Q(dx_{0:T}) = µ(dx_0) ∏_{t=1}^{T} M_t(x_{t−1}, dx_t) by
X_0 ∼ N(0, 1),  X_t | X_{t−1} = x_{t−1} ∼ N(α x_{t−1}, σ^2)

The BPF struggles whenever the observations change abruptly

[Figures: observed counts (left) and BPF diagnostics over time (right)]

Better performance is obtained with observation-dependent dynamics
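For the neuroscience model, the bootstrap choices above can be plugged into the generic smc sketch from the previous block. This is a sketch under stated assumptions: scipy is used for the Binomial(50, κ(x_t)) likelihood, the helper name make_bpf_model is hypothetical, and the parameter values in the usage comment are placeholders rather than values from the experiment.

```python
import numpy as np
from scipy.stats import binom

def make_bpf_model(y, alpha, sigma2, rng):
    """Bootstrap proposal and potentials for the neuroscience model, to feed into smc()."""
    sigma = np.sqrt(sigma2)
    kappa = lambda u: 1.0 / (1.0 + np.exp(-u))              # logistic link
    mu_sample = lambda N: rng.normal(0.0, 1.0, size=N)      # X_0 ~ N(0, 1)
    M_sample = lambda t, x: rng.normal(alpha * x, sigma)    # X_t | X_{t-1} = x ~ N(alpha x, sigma^2)
    G0 = lambda x: binom.pmf(y[0], 50, kappa(x))            # bootstrap potential g(x_0, y_0)
    G = lambda t, xprev, x: binom.pmf(y[t], 50, kappa(x))   # bootstrap potential g(x_t, y_t)
    return mu_sample, M_sample, G0, G

# usage sketch (y: array of the 3000 observed counts; 0.99 and 0.1 are placeholder parameter values)
# rng = np.random.default_rng(1)
# Zhat, _ = smc(*make_bpf_model(y, 0.99, 0.1, rng), T=len(y) - 1, N=1024, rng=rng)
```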
Twisted path measures

Note that P(dx_{0:T}|y_{0:T}) = P(dx_0|y_{0:T}) ∏_{t=1}^{T} P(dx_t|x_{t−1}, y_{t:T}), with
P(dx_0|y_{0:T}) = µ(dx_0) ψ_0^*(x_0) / µ(ψ_0^*),  P(dx_t|x_{t−1}, y_{t:T}) = M_t(x_{t−1}, dx_t) ψ_t^*(x_t) / M_t(ψ_t^*)(x_{t−1}),
where ψ_t^*(x_t) = P(y_{t:T}|x_t) is the backward information filter

Given a policy ψ = (ψ_t)_{t∈[0:T]}, i.e. a collection of positive and bounded functions, define the ψ-twisted path measure of Q as
Q^ψ(dx_{0:T}) = µ^ψ(dx_0) ∏_{t=1}^{T} M_t^ψ(x_{t−1}, dx_t),
where
µ^ψ(dx_0) := µ(dx_0) ψ_0(x_0) / µ(ψ_0),  M_t^ψ(x_{t−1}, dx_t) := M_t(x_{t−1}, dx_t) ψ_t(x_{t−1}, x_t) / M_t(ψ_t)(x_{t−1})
Twisted path measures

Now define twisted potentials (G_t^ψ)_{t∈[0:T]} such that
P(dx_{0:T}) = G_0^ψ(x_0) ∏_{t=1}^{T} G_t^ψ(x_{t−1}, x_t) Q^ψ(dx_{0:T}) Z^{−1},
hence Z = E_{Q^ψ}[ G_0^ψ(X_0) ∏_{t=1}^{T} G_t^ψ(X_{t−1}, X_t) ]

Achieved with
G_0^ψ(x_0) := µ(ψ_0) G_0(x_0) M_1(ψ_1)(x_0) / ψ_0(x_0),
G_t^ψ(x_{t−1}, x_t) := G_t(x_{t−1}, x_t) M_{t+1}(ψ_{t+1})(x_t) / ψ_t(x_{t−1}, x_t),  t ∈ [1 : T − 1],
G_T^ψ(x_{T−1}, x_T) := G_T(x_{T−1}, x_T) / ψ_T(x_{T−1}, x_T)
Twisted path measures

Assume the policy ψ is such that:
  sampling µ^ψ and (M_t^ψ)_{t∈[1:T]} is feasible
  evaluating (G_t^ψ)_{t∈[0:T]} is tractable

For the neuroscience model, we consider the Gaussian approximation
ψ_t(x_t) = exp(−a_t x_t^2 − b_t x_t − c_t),  (a_t, b_t, c_t) ∈ R^3,
so that
µ^ψ(x_0) = N(x_0; −k_0 b_0, k_0),  M_t^ψ(x_{t−1}, x_t) = N(x_t; k_t(α σ^{−2} x_{t−1} − b_t), k_t),
with k_0 := (1 + 2a_0)^{−1} and k_t := (σ^{−2} + 2a_t)^{−1}

The integrals µ(ψ_0) and x_t → M_{t+1}(ψ_{t+1})(x_t) are tractable
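The Gaussian computations behind these formulas are short. Below is a minimal sketch, assuming the same quadratic parameterisation log ψ_t(x) = −(a_t x_t^2 + b_t x_t + c_t) and a_t > −1/(2σ^2) so the twisted variance k_t is positive; the function names are illustrative, not from the authors' code.

```python
import numpy as np

def twisted_moments(xprev, a, b, alpha, sigma2):
    """Mean and variance of M_t^psi(x_{t-1}, .) when psi_t(x) = exp(-a x^2 - b x - c)."""
    k = 1.0 / (1.0 / sigma2 + 2.0 * a)          # k_t = (sigma^{-2} + 2 a_t)^{-1}
    mean = k * (alpha * xprev / sigma2 - b)     # k_t (alpha sigma^{-2} x_{t-1} - b_t)
    return mean, k

def log_M_psi(xprev, a, b, c, alpha, sigma2):
    """log M_t(psi_t)(x_{t-1}) = log of the Gaussian integral of N(x; alpha x_{t-1}, sigma^2) psi_t(x)."""
    m = alpha * xprev
    k = 1.0 / (1.0 / sigma2 + 2.0 * a)
    L = m / sigma2 - b
    return 0.5 * np.log(k / sigma2) + 0.5 * k * L**2 - 0.5 * m**2 / sigma2 - c

# mu(psi_0) is the same integral with m = 0 and sigma^2 = 1, i.e. log_M_psi(0.0, a0, b0, c0, 0.0, 1.0)
```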
Twisted SMC

Construct ψ-twisted SMC as standard SMC applied to µ^ψ, (M_t^ψ)_{t∈[1:T]}, (G_t^ψ)_{t∈[0:T]}

Unbiased estimator of Z:
Z^{ψ,N} = { (1/N) ∑_{n=1}^{N} G_0^ψ(X_0^n) } ∏_{t=1}^{T} { (1/N) ∑_{n=1}^{N} G_t^ψ(X_{t−1}^{A_{t−1}^n}, X_t^n) }

Particle approximation of P:
P^{ψ,N} = (1/N) ∑_{n=1}^{N} δ_{X_{0:T}^n}
Optimal policies

A policy of constant functions recovers standard SMC

Optimal policy:
ψ_T^*(x_{T−1}, x_T) = G_T(x_{T−1}, x_T),
ψ_t^*(x_{t−1}, x_t) = G_t(x_{t−1}, x_t) M_{t+1}(ψ_{t+1}^*)(x_t),  t ∈ [T − 1 : 1],
ψ_0^*(x_0) = G_0(x_0) M_1(ψ_1^*)(x_0)

Under ψ^* = (ψ_t^*)_{t∈[0:T]},
Q^{ψ^*} = P and Z^{ψ^*,N} = Z for all N ≥ 1

The backward recursion defining ψ^* is typically intractable

If the potentials (G_t)_{t∈[0:T]} and the transition densities of (M_t)_{t∈[1:T]} are log-concave, then ψ^* is log-concave

For the neuroscience model, this justifies the Gaussian approximation class
F := { ξ(x) = exp(−a x^2 − b x − c) : (a, b, c) ∈ R^3 }
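For intuition on the backward recursion, when the state space is finite it can be computed exactly with matrix-vector products. The sketch below assumes state-only potentials G_t(x_t) (as in the bootstrap formulation) and a time-homogeneous transition matrix; both are simplifying assumptions for illustration.

```python
import numpy as np

def optimal_policy_finite(G, M):
    """Exact backward recursion psi*_t = G_t * M_{t+1}(psi*_{t+1}) on a finite state space.

    G: array of shape (T+1, S) with G[t, x] = G_t(x) (state-only potentials)
    M: transition matrix of shape (S, S) with M[x, x'] = M(x, x')
    Returns psi of shape (T+1, S) with psi[t] = psi*_t.
    """
    T = G.shape[0] - 1
    psi = np.empty_like(G, dtype=float)
    psi[T] = G[T]                            # psi*_T = G_T
    for t in range(T - 1, -1, -1):
        psi[t] = G[t] * (M @ psi[t + 1])     # psi*_t(x) = G_t(x) * sum_x' M(x, x') psi*_{t+1}(x')
    return psi
```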
Connection to optimal control

V_t^* := − log ψ_t^* are the optimal value functions of the Kullback-Leibler control problem
inf_{ψ∈Φ} KL(Q^ψ | P),  Φ := {ψ ∈ Ψ : KL(Q^ψ | P) < ∞}

Methods to approximate the backward recursion are known as approximate dynamic programming (ADP) for finite horizon control problems (Bertsekas and Tsitsiklis, 1996)

The connection is also useful for analysis (Tsitsiklis and Van Roy, 2001)
Approximate dynamic programming

First run standard SMC to obtain particles (X_t^n), n ∈ [1 : N], t ∈ [0 : T]

Approximate the backward recursion:

For time T: approximate ψ_T^* = G_T using
ψ̂_T = argmin_{ξ∈F_T} ∑_{n=1}^{N} {log ξ − log G_T}^2 (X_{T−1}^{A_{T−1}^n}, X_T^n)

For times t ∈ [T − 1 : 0]: approximate ψ_t^* = G_t M_{t+1}(ψ_{t+1}^*) using
ψ̂_t = argmin_{ξ∈F_t} ∑_{n=1}^{N} {log ξ − log G_t − log M_{t+1}(ψ̂_{t+1})}^2 (X_{t−1}^{A_{t−1}^n}, X_t^n)

Output: policy ψ̂ = (ψ̂_t)_{t∈[0:T]}

Run ψ̂-twisted SMC

The algorithm is essentially that of Guarniero, Johansen & Lee (2017) if projecting on the natural scale
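A sketch of one ADP pass with the quadratic class F used for the neuroscience model: at each time, the target log G_t + log M_{t+1}(ψ̂_{t+1}) is regressed onto (x^2, x, 1) by least squares at the particle locations. This is not the authors' implementation; it assumes state-only log-potentials logG and reuses the hypothetical log_M_psi helper from the earlier Gaussian sketch.

```python
import numpy as np

def fit_quadratic(x, target):
    """Least-squares fit of log xi(x) = -(a x^2 + b x + c) to target at points x; returns (a, b, c)."""
    design = np.column_stack([x**2, x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, -target, rcond=None)
    return coef

def adp_pass(xs, logG, alpha, sigma2):
    """One backward ADP pass. xs[t]: particles at time t; logG[t]: log G_t at those particles."""
    T = len(xs) - 1
    coefs = [None] * (T + 1)
    coefs[T] = fit_quadratic(xs[T], logG[T])                             # psi_hat_T approximates G_T
    for t in range(T - 1, -1, -1):
        a, b, c = coefs[t + 1]
        target = logG[t] + log_M_psi(xs[t], a, b, c, alpha, sigma2)      # log G_t + log M_{t+1}(psi_hat_{t+1})
        coefs[t] = fit_quadratic(xs[t], target)
    return coefs                                                         # coefficients (a_t, b_t, c_t) of psi_hat_t
```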
Approximate dynamic programming

We obtain error bounds of the form
E‖ψ̂_t − ψ_t^*‖_{L^2} ≤ ∑_{s=t}^{T} C_{t−1,s−1} e_s^N,
where the C_{t,s} are stability constants and the e_t^N are approximation errors

As N → ∞, ψ̂ converges (LLN & CLT) to the idealized ADP
ψ̃_T = argmin_{ξ∈F_T} E[ {log ξ − log G_T}^2 (X_{T−1}, X_T) ],
ψ̃_t = argmin_{ξ∈F_t} E[ {log ξ − log G_t − log M_{t+1}(ψ̃_{t+1})}^2 (X_{t−1}, X_t) ]
Neuroscience model continued

ψ̂-twisted SMC improves upon standard SMC

[Figures: comparison of uncontrolled SMC and iteration 1 over time]

Uncontrolled SMC approximates P(y_{0:t}) at each t ∈ [0 : T]

Controlled SMC approximates P(y_{0:T}) at each t ∈ [0 : T]

Can we do better?
Policy refinement

Want to refine the current policy ψ̂

Twisting Q^ψ̂ further with a policy φ = (φ_t)_{t∈[0:T]} gives
(Q^ψ̂)^φ = Q^{ψ̂·φ},  where ψ̂ · φ = (ψ̂_t · φ_t)_{t∈[0:T]}

Optimal choice of φ (with respect to ψ̂):
φ_T^*(x_{T−1}, x_T) = G_T^ψ̂(x_{T−1}, x_T),
φ_t^*(x_{t−1}, x_t) = G_t^ψ̂(x_{t−1}, x_t) M_{t+1}^ψ̂(φ_{t+1}^*)(x_t),  t ∈ [T − 1 : 1],
φ_0^*(x_0) = G_0^ψ̂(x_0) M_1^ψ̂(φ_1^*)(x_0),
since ψ̂ · φ^* = ψ^*

Approximate this backward recursion to obtain φ̂ = (φ̂_t)_{t∈[0:T]}, using the particles from ψ̂-twisted SMC and the same function classes (F_t)_{t∈[0:T]}

Construct the refined policy ψ̂ · φ̂

We call this iterative scheme to refine policies controlled SMC (a schematic sketch follows below)
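A schematic sketch of the full controlled SMC loop under the quadratic class, where combining ψ̂ · φ̂ reduces to adding coefficients (as made explicit on a later slide). run_twisted_smc and adp_pass_fn are placeholders for a ψ-twisted SMC run and the residual refit; their signatures are assumptions for illustration, not the authors' interface.

```python
import numpy as np

def controlled_smc(coefs0, run_twisted_smc, adp_pass_fn, n_iters=3):
    """Schematic controlled SMC loop: refine a quadratic policy over several iterations.

    coefs0          : initial coefficients (a_t, b_t, c_t) for each t (all zeros = uncontrolled SMC)
    run_twisted_smc : placeholder returning (particles, twisted log-potentials, logZ) under a policy
    adp_pass_fn     : placeholder fitting the residual policy phi_hat from those particles
    """
    coefs = [np.asarray(c, dtype=float) for c in coefs0]
    history = []
    for _ in range(n_iters):
        xs, log_twisted_G, logZ = run_twisted_smc(coefs)      # psi_hat-twisted SMC
        history.append(logZ)
        phi_coefs = adp_pass_fn(xs, log_twisted_G, coefs)     # fit phi_hat w.r.t. the current policy
        coefs = [c + dc for c, dc in zip(coefs, phi_coefs)]   # psi_hat · phi_hat: add quadratic coefficients
    return coefs, history
```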
Neuroscience model continued

Iteration 2

[Figures: comparison over time of uncontrolled SMC, iteration 1 and iteration 2, with a zoomed inset]
Neuroscience model continued

Iteration 3

[Figures: comparison over time of uncontrolled SMC and iterations 1 to 3, with a zoomed inset]
Neuroscience model continued

Under F_t := { ξ(x) = exp(−a_t x^2 − b_t x − c_t) : (a_t, b_t, c_t) ∈ R^3 }, the policy at iteration i ≥ 1 is
ψ_t^{(i)}(x_t) = exp(−a_t^{(i)} x_t^2 − b_t^{(i)} x_t − c_t^{(i)}),
where (a_t^{(i)}, b_t^{(i)}, c_t^{(i)}) = ∑_{k=1}^{i} (a_t^k, b_t^k, c_t^k)

(a_t^k, b_t^k, c_t^k) are the coefficients estimated at iteration k ≥ 1

[Figures: estimated policy coefficients over time for iterations 1 to 3]
Policy refinement

The residual from the first ADP pass, when fitting ψ̂, is
ε_t := log ψ̂_t − log G_t − log M_{t+1}(ψ̂_{t+1}),
so that log G_t^ψ̂ = −ε_t

The next ADP refinement re-fits this residual (like L^2-boosting):
φ̂_t = argmin_{ξ∈F_t} ∑_{n=1}^{N} {log ξ − log G_t^ψ̂ − log M_{t+1}^ψ̂(φ̂_{t+1})}^2 (X_{t−1}^{A_{t−1}^n}, X_t^n)

The twisted potential of the refined policy ψ̂ · φ̂ satisfies
− log G_t^{ψ̂·φ̂} = log φ̂_t − log G_t^ψ̂ − log M_{t+1}^ψ̂(φ̂_{t+1}),
i.e. it is the new residual from fitting φ̂
Policy refinement

The iterative scheme generates a Markov chain on policy space,
F_N(ψ̂, U) = ψ̂ · φ̂,
where ψ̂ is the current policy and φ̂ is the ADP output

The chain converges to a unique invariant distribution that concentrates around fixed points of
F(ψ̂) = ψ̂ · φ̃,
where ψ̂ is the current policy and φ̃ is the idealized ADP output

[Figures: illustration of the concentration of policy iterates around fixed points]
Neuroscience model continued

(Left) Comparison to the bootstrap particle filter (BPF)

(Right) Comparison to the forward filtering backward smoothing (FFBS) algorithm for the functional x_{0:T} → 50 κ(x_{0:T})

Figure: Relative variance of marginal likelihood estimates (left) and estimates of the smoothing expectation (right).
Neuroscience model continued

Bayesian inference for the parameters θ = (α, σ^2) within particle marginal Metropolis-Hastings (PMMH)

cSMC and BPF are used to produce unbiased estimates of the marginal likelihood

[Figures: autocorrelation function (ACF) of each PMMH chain, BPF vs cSMC, for the two parameters]

The ESS improvement is roughly 5-fold for both parameters
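For context, a minimal random-walk PMMH sketch that could wrap either likelihood estimator; log_prior, step_sd, and estimate_logZ (returning the log of an unbiased marginal likelihood estimate from BPF or cSMC) are assumed placeholders, and the symmetric random-walk proposal is an illustrative choice rather than the one used in the talk.

```python
import numpy as np

def pmmh(theta0, estimate_logZ, log_prior, n_iters, step_sd, rng=None):
    """Random-walk particle marginal Metropolis-Hastings sketch."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    logZ = estimate_logZ(theta)                        # log of an unbiased marginal likelihood estimate
    chain = [theta.copy()]
    for _ in range(n_iters):
        prop = theta + step_sd * rng.standard_normal(theta.shape)
        lp_prop = log_prior(prop)
        if np.isfinite(lp_prop):                       # reject proposals outside the prior support
            logZ_prop = estimate_logZ(prop)
            log_ratio = logZ_prop + lp_prop - logZ - log_prior(theta)
            if np.log(rng.uniform()) < log_ratio:
                theta, logZ = prop, logZ_prop
        chain.append(theta.copy())
    return np.array(chain)
```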
Concluding remarks

Methodology can be extended to static models or SMC samplers targeting
π_t(dx) = π_0(dx) L(x)^{λ_t} / Z_t,  where 0 = λ_0 < λ_1 < · · · < λ_T = 1

Draft: Controlled Sequential Monte Carlo, arXiv:1708.08396, 2017.

MATLAB code: https://github.com/jeremyhengjm/controlledSMC
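For the static-model extension, one standard Feynman-Kac construction takes M_t to be an MCMC kernel invariant for π_t and G_t(x_{t−1}) = L(x_{t−1})^{λ_t − λ_{t−1}}. The sketch below is a generic tempered SMC sampler under those assumptions, with univariate states and a random-walk Metropolis move as an illustrative kernel choice; it is not tied to the controlled/twisted machinery of the talk.

```python
import numpy as np

def tempered_smc(log_pi0, log_L, sample_pi0, lambdas, N, n_mh=5, step=0.5, rng=None):
    """SMC sampler for pi_t proportional to pi_0 * L^{lambda_t}; univariate states, multinomial resampling."""
    rng = np.random.default_rng() if rng is None else rng
    x = sample_pi0(N)                                        # X_0^n ~ pi_0
    logZ = 0.0
    for t in range(1, len(lambdas)):
        logw = (lambdas[t] - lambdas[t - 1]) * log_L(x)      # log G_t(x_{t-1})
        m = logw.max()
        logZ += m + np.log(np.mean(np.exp(logw - m)))        # accumulate the log normalizing constant estimate
        p = np.exp(logw - m); p /= p.sum()
        x = x[rng.choice(N, size=N, p=p)]                    # multinomial resampling
        for _ in range(n_mh):                                # random-walk MH moves targeting pi_t
            prop = x + step * rng.standard_normal(N)
            log_acc = (log_pi0(prop) + lambdas[t] * log_L(prop)
                       - log_pi0(x) - lambdas[t] * log_L(x))
            accept = np.log(rng.uniform(size=N)) < log_acc
            x = np.where(accept, prop, x)
    return logZ, x
```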

More Related Content

PDF
Natalini nse slide_giu2013
PDF
Introduction to MCMC methods
PDF
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
PDF
Freezing of energy of a soliton in an external potential
PDF
Particle filtering
PDF
Controlled sequential Monte Carlo
PDF
kinks and cusps in the transition dynamics of a bloch state
PDF
Recent developments on unbiased MCMC
Natalini nse slide_giu2013
Introduction to MCMC methods
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Freezing of energy of a soliton in an external potential
Particle filtering
Controlled sequential Monte Carlo
kinks and cusps in the transition dynamics of a bloch state
Recent developments on unbiased MCMC

What's hot (20)

PDF
Sequential Monte Carlo algorithms for agent-based models of disease transmission
PDF
Chris Sherlock's slides
PDF
Random Variables
PDF
Autoregression
PDF
Stochastic Processes - part 5
PDF
On problem-of-parameters-identification-of-dynamic-object
PDF
May the Force NOT be with you
PDF
Stochastic Processes - part 6
PDF
Stochastic Processes - part 1
PDF
2014 spring crunch seminar (SDE/levy/fractional/spectral method)
PDF
Cyclo-stationary processes
PDF
Unbiased Markov chain Monte Carlo
PDF
Stochastic Processes - part 4
PPT
Ecte401 notes week3
PDF
A current perspectives of corrected operator splitting (os) for systems
PDF
Stochastic Processes - part 3
PDF
Magnetohydrodynamic Rayleigh Problem with Hall Effect in a porous Plate
PDF
Stochastic Processes - part 2
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Chris Sherlock's slides
Random Variables
Autoregression
Stochastic Processes - part 5
On problem-of-parameters-identification-of-dynamic-object
May the Force NOT be with you
Stochastic Processes - part 6
Stochastic Processes - part 1
2014 spring crunch seminar (SDE/levy/fractional/spectral method)
Cyclo-stationary processes
Unbiased Markov chain Monte Carlo
Stochastic Processes - part 4
Ecte401 notes week3
A current perspectives of corrected operator splitting (os) for systems
Stochastic Processes - part 3
Magnetohydrodynamic Rayleigh Problem with Hall Effect in a porous Plate
Stochastic Processes - part 2
Ad

Similar to Talk in BayesComp 2018 (20)

PDF
Sequential Monte Carlo (SMC) and Particle Filters
PDF
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
PDF
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
PDF
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
PDF
Unbiased Markov chain Monte Carlo
PDF
Presentation MCB seminar 09032011
PDF
Presentation of SMC^2 at BISP7
PDF
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
PDF
2012 mdsp pr05 particle filter
PDF
Zap Q-Learning - ISMP 2018
PDF
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
PPT
Particle filter and Markov Chain Monte Carlo
PDF
Quantum Variables in Finance and Neuroscience Lecture Slides
PPTX
Monte Carlo Berkeley.pptx
PPT
Gracheva Inessa - Fast Global Image Denoising Algorithm on the Basis of Nonst...
PDF
Machine Learning With MapReduce, K-Means, MLE
PDF
MUMS: Transition & SPUQ Workshop - Complexities in Bayesian Inverse Problems:...
PDF
SMC^2: an algorithm for sequential analysis of state-space models
PDF
Inference for stochastic differential equations via approximate Bayesian comp...
PDF
Unbiased Hamiltonian Monte Carlo
Sequential Monte Carlo (SMC) and Particle Filters
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
Unbiased Markov chain Monte Carlo
Presentation MCB seminar 09032011
Presentation of SMC^2 at BISP7
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
2012 mdsp pr05 particle filter
Zap Q-Learning - ISMP 2018
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Particle filter and Markov Chain Monte Carlo
Quantum Variables in Finance and Neuroscience Lecture Slides
Monte Carlo Berkeley.pptx
Gracheva Inessa - Fast Global Image Denoising Algorithm on the Basis of Nonst...
Machine Learning With MapReduce, K-Means, MLE
MUMS: Transition & SPUQ Workshop - Complexities in Bayesian Inverse Problems:...
SMC^2: an algorithm for sequential analysis of state-space models
Inference for stochastic differential equations via approximate Bayesian comp...
Unbiased Hamiltonian Monte Carlo
Ad

More from JeremyHeng10 (8)

PDF
Sequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
PDF
Diffusion Schrödinger bridges for score-based generative modeling
PDF
Diffusion Schrödinger bridges for score-based generative modeling
PDF
Sequential Monte Carlo algorithms for agent-based models of disease transmission
PDF
Statistical inference for agent-based SIS and SIR models
PDF
Gibbs flow transport for Bayesian inference
PDF
Unbiased Hamiltonian Monte Carlo
PDF
Gibbs flow transport for Bayesian inference
Sequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Statistical inference for agent-based SIS and SIR models
Gibbs flow transport for Bayesian inference
Unbiased Hamiltonian Monte Carlo
Gibbs flow transport for Bayesian inference

Recently uploaded (20)

PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPT
ISS -ESG Data flows What is ESG and HowHow
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
climate analysis of Dhaka ,Banglades.pptx
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PDF
Foundation of Data Science unit number two notes
PPT
Reliability_Chapter_ presentation 1221.5784
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPT
Miokarditis (Inflamasi pada Otot Jantung)
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
ISS -ESG Data flows What is ESG and HowHow
Qualitative Qantitative and Mixed Methods.pptx
Galatica Smart Energy Infrastructure Startup Pitch Deck
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
IB Computer Science - Internal Assessment.pptx
climate analysis of Dhaka ,Banglades.pptx
.pdf is not working space design for the following data for the following dat...
IBA_Chapter_11_Slides_Final_Accessible.pptx
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Foundation of Data Science unit number two notes
Reliability_Chapter_ presentation 1221.5784
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
Miokarditis (Inflamasi pada Otot Jantung)

Talk in BayesComp 2018

  • 1. Controlled Sequential Monte Carlo Jeremy Heng Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS & CSIRO), George Deligiannidis and Arnaud Doucet (Oxford) BayesComp 2018 Jeremy Heng Controlled SMC 1 / 25
  • 2. Intractable likelihoods This work is about efficiently estimating p(y|θ) = p(y|x, θ)p(dx|θ) Jeremy Heng Controlled SMC 2 / 25
  • 3. Intractable likelihoods This work is about efficiently estimating p(y|θ) = p(y|x, θ)p(dx|θ) In state space models, law of latent Markov chain p(dx|θ) = µθ(dx0) T t=1 Mt,θ(xt−1, dxt ) i.e. X0 ∼ µθ and Xt |Xt−1 ∼ Mt,θ(Xt−1, ·) Jeremy Heng Controlled SMC 2 / 25
  • 4. Intractable likelihoods This work is about efficiently estimating p(y|θ) = p(y|x, θ)p(dx|θ) In state space models, law of latent Markov chain p(dx|θ) = µθ(dx0) T t=1 Mt,θ(xt−1, dxt ) i.e. X0 ∼ µθ and Xt |Xt−1 ∼ Mt,θ(Xt−1, ·) Observations are conditionally independent given (Xt )t∈[0:T] p(y|x, θ) = T t=0 gθ(xt , yt ) Jeremy Heng Controlled SMC 2 / 25
  • 5. Intractable likelihoods This work is about efficiently estimating p(y|θ) = p(y|x, θ)p(dx|θ) In state space models, law of latent Markov chain p(dx|θ) = µθ(dx0) T t=1 Mt,θ(xt−1, dxt ) i.e. X0 ∼ µθ and Xt |Xt−1 ∼ Mt,θ(Xt−1, ·) Observations are conditionally independent given (Xt )t∈[0:T] p(y|x, θ) = T t=0 gθ(xt , yt ) Want to also approximate smoothing distribution p(x|y, θ) = T t=0 gθ(xt , yt )p(dx|θ)p(y|θ)−1 Jeremy Heng Controlled SMC 2 / 25
  • 6. Motivating example Measurements (y0, . . . , yT ) ∈ {0, . . . , 50}3000 collected from a neuroscience experiment (Temereanca et al., 2008) 500 1000 1500 2000 2500 3000 0 2 4 6 8 10 12 14 Jeremy Heng Controlled SMC 3 / 25
  • 7. Motivating example Measurements (y0, . . . , yT ) ∈ {0, . . . , 50}3000 collected from a neuroscience experiment (Temereanca et al., 2008) 500 1000 1500 2000 2500 3000 0 2 4 6 8 10 12 14 Observation model: Yt|Xt = xt ∼ Binomial (50, κ(xt)) , κ(u) := (1 + exp(−u))−1 Jeremy Heng Controlled SMC 3 / 25
  • 8. Motivating example Measurements (y0, . . . , yT ) ∈ {0, . . . , 50}3000 collected from a neuroscience experiment (Temereanca et al., 2008) 500 1000 1500 2000 2500 3000 0 2 4 6 8 10 12 14 Observation model: Yt|Xt = xt ∼ Binomial (50, κ(xt)) , κ(u) := (1 + exp(−u))−1 Latent Markov chain: X0 ∼ N(0, 1), Xt |Xt−1 = xt−1 ∼ N(αxt−1, σ2 ), t ∈ [1 : 2999], θ = (α, σ2 ) ∈ [0, 1] × R+ are the unknown parameters Jeremy Heng Controlled SMC 3 / 25
  • 9. Feynman-Kac formulation Consider proposal Markov chain (Xt )t∈[0:T] with law Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) Jeremy Heng Controlled SMC 4 / 25
  • 10. Feynman-Kac formulation Consider proposal Markov chain (Xt )t∈[0:T] with law Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) Consider estimating Z = XT+1 G0(x0) T t=1 Gt(xt−1, xt)Q(dx0:T ) given positive bounded potential functions (Gt )t∈[0:T] Jeremy Heng Controlled SMC 4 / 25
  • 11. Feynman-Kac formulation Consider proposal Markov chain (Xt )t∈[0:T] with law Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) Consider estimating Z = XT+1 G0(x0) T t=1 Gt(xt−1, xt)Q(dx0:T ) given positive bounded potential functions (Gt )t∈[0:T] Define target path measure P(dx0:T ) = G0(x0) T t=1 Gt (xt−1, xt )Q(dx0:T )Z−1 Jeremy Heng Controlled SMC 4 / 25
  • 12. Feynman-Kac formulation Consider proposal Markov chain (Xt )t∈[0:T] with law Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) Consider estimating Z = XT+1 G0(x0) T t=1 Gt(xt−1, xt)Q(dx0:T ) given positive bounded potential functions (Gt )t∈[0:T] Define target path measure P(dx0:T ) = G0(x0) T t=1 Gt (xt−1, xt )Q(dx0:T )Z−1 The quantities µ, (Mt )t∈[1:T], (Gt )t∈[0:T] depend on the specific application Jeremy Heng Controlled SMC 4 / 25
  • 13. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N Jeremy Heng Controlled SMC 5 / 25
  • 14. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N At time t = 0 and particle n ∈ [1 : N] Jeremy Heng Controlled SMC 5 / 25
  • 15. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N At time t = 0 and particle n ∈ [1 : N] sample Xn 0 ∼ µ and weight W n 0 ∝ G0(Xn 0 ); Jeremy Heng Controlled SMC 5 / 25
  • 16. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N At time t = 0 and particle n ∈ [1 : N] sample Xn 0 ∼ µ and weight W n 0 ∝ G0(Xn 0 ); sample ancestor index An 0 ∼ R W 1 0 , . . . , W N 0 Jeremy Heng Controlled SMC 5 / 25
  • 17. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N At time t = 0 and particle n ∈ [1 : N] sample Xn 0 ∼ µ and weight W n 0 ∝ G0(Xn 0 ); sample ancestor index An 0 ∼ R W 1 0 , . . . , W N 0 For time t ∈ [1 : T] and particle n ∈ [1 : N] Jeremy Heng Controlled SMC 5 / 25
  • 18. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N At time t = 0 and particle n ∈ [1 : N] sample Xn 0 ∼ µ and weight W n 0 ∝ G0(Xn 0 ); sample ancestor index An 0 ∼ R W 1 0 , . . . , W N 0 For time t ∈ [1 : T] and particle n ∈ [1 : N] sample Xn t ∼ Mt (X An t−1 t−1 , ·) and weight W n t ∝ Gt (X An t−1 t−1 , Xn t ); Jeremy Heng Controlled SMC 5 / 25
  • 19. Sequential Monte Carlo methods SMC methods simulate an interacting particle system of size N ∈ N At time t = 0 and particle n ∈ [1 : N] sample Xn 0 ∼ µ and weight W n 0 ∝ G0(Xn 0 ); sample ancestor index An 0 ∼ R W 1 0 , . . . , W N 0 For time t ∈ [1 : T] and particle n ∈ [1 : N] sample Xn t ∼ Mt (X An t−1 t−1 , ·) and weight W n t ∝ Gt (X An t−1 t−1 , Xn t ); sample ancestor index An t ∼ R W 1 t , . . . , W N t Jeremy Heng Controlled SMC 5 / 25
  • 20. Sequential Monte Carlo methods Unbiased estimator of Z ZN = 1 N N n=1 G0(Xn 0 ) T t=1 1 N N n=1 Gt (X An t−1 t−1 , Xn t ) Jeremy Heng Controlled SMC 6 / 25
  • 21. Sequential Monte Carlo methods Unbiased estimator of Z ZN = 1 N N n=1 G0(Xn 0 ) T t=1 1 N N n=1 Gt (X An t−1 t−1 , Xn t ) Particle approximation of P PN = 1 N N n=1 δXn 0:T where Xn 0:T is obtained by tracing ancestral lineage of particle Xn T Jeremy Heng Controlled SMC 6 / 25
  • 22. Sequential Monte Carlo methods Unbiased estimator of Z ZN = 1 N N n=1 G0(Xn 0 ) T t=1 1 N N n=1 Gt (X An t−1 t−1 , Xn t ) Particle approximation of P PN = 1 N N n=1 δXn 0:T where Xn 0:T is obtained by tracing ancestral lineage of particle Xn T Convergence properties of ZN and PN as N → ∞ are now well-understood Jeremy Heng Controlled SMC 6 / 25
  • 23. Sequential Monte Carlo methods Unbiased estimator of Z ZN = 1 N N n=1 G0(Xn 0 ) T t=1 1 N N n=1 Gt (X An t−1 t−1 , Xn t ) Particle approximation of P PN = 1 N N n=1 δXn 0:T where Xn 0:T is obtained by tracing ancestral lineage of particle Xn T Convergence properties of ZN and PN as N → ∞ are now well-understood However quality of approximation can be inadequate for practical choices of N Jeremy Heng Controlled SMC 6 / 25
  • 24. Sequential Monte Carlo methods Unbiased estimator of Z ZN = 1 N N n=1 G0(Xn 0 ) T t=1 1 N N n=1 Gt (X An t−1 t−1 , Xn t ) Particle approximation of P PN = 1 N N n=1 δXn 0:T where Xn 0:T is obtained by tracing ancestral lineage of particle Xn T Convergence properties of ZN and PN as N → ∞ are now well-understood However quality of approximation can be inadequate for practical choices of N Performance crucially depends on discrepancy between P and Q Jeremy Heng Controlled SMC 6 / 25
  • 25. Neuroscience model continued Using bootstrap particle filter (BPF), i.e. define Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) by X0 ∼ N(0, 1), Xt |Xt−1 = xt−1 ∼ N(αxt−1, σ2 ) Jeremy Heng Controlled SMC 7 / 25
  • 26. Neuroscience model continued Using bootstrap particle filter (BPF), i.e. define Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) by X0 ∼ N(0, 1), Xt |Xt−1 = xt−1 ∼ N(αxt−1, σ2 ) BPF struggles whenever the observations change abruptly 500 1000 1500 2000 2500 3000 0 2 4 6 8 10 12 14 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 100 Jeremy Heng Controlled SMC 7 / 25
  • 27. Neuroscience model continued Using bootstrap particle filter (BPF), i.e. define Q(dx0:T ) = µ(dx0) T t=1 Mt(xt−1, dxt ) by X0 ∼ N(0, 1), Xt |Xt−1 = xt−1 ∼ N(αxt−1, σ2 ) BPF struggles whenever the observations change abruptly 500 1000 1500 2000 2500 3000 0 2 4 6 8 10 12 14 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 100 Better performance obtained with observations dependent dynamics Jeremy Heng Controlled SMC 7 / 25
  • 28. Twisted path measures Note that P(dx0:T |y0:T ) = P(dx0|y0:T ) T t=1 P(dxt |xt−1, yt:T ) P(dx0|y0:T ) = µ(dx0)ψ∗ 0 (x0) µ(ψ∗ 0 ) , P(dxt |xt−1, yt:T ) = Mt (xt−1, dxt)ψ∗ t (xt ) Mt(ψ∗ t )(xt−1) where ψ∗ t (xt ) = P(yt:T |xt ) is the backward information filter Jeremy Heng Controlled SMC 8 / 25
  • 29. Twisted path measures Note that P(dx0:T |y0:T ) = P(dx0|y0:T ) T t=1 P(dxt |xt−1, yt:T ) P(dx0|y0:T ) = µ(dx0)ψ∗ 0 (x0) µ(ψ∗ 0 ) , P(dxt |xt−1, yt:T ) = Mt (xt−1, dxt)ψ∗ t (xt ) Mt(ψ∗ t )(xt−1) where ψ∗ t (xt ) = P(yt:T |xt ) is the backward information filter Given a policy ψ = (ψt )t∈[0:T], i.e. positive and bounded functions Jeremy Heng Controlled SMC 8 / 25
  • 30. Twisted path measures Note that P(dx0:T |y0:T ) = P(dx0|y0:T ) T t=1 P(dxt |xt−1, yt:T ) P(dx0|y0:T ) = µ(dx0)ψ∗ 0 (x0) µ(ψ∗ 0 ) , P(dxt |xt−1, yt:T ) = Mt (xt−1, dxt)ψ∗ t (xt ) Mt(ψ∗ t )(xt−1) where ψ∗ t (xt ) = P(yt:T |xt ) is the backward information filter Given a policy ψ = (ψt )t∈[0:T], i.e. positive and bounded functions Define ψ-twisted path measure of Q as Qψ (dx0:T ) = µψ (dx0) T t=1 Mψ t (xt−1, dxt ) where µψ (dx0) := µ(dx0)ψ0(x0) µ(ψ0) , Mψ t (xt−1, dxt) := Mt(xt−1, dxt)ψt (xt−1, xt ) Mt(ψt )(xt−1) Jeremy Heng Controlled SMC 8 / 25
  • 31. Twisted path measures Now define twisted potentials (Gψ t )t∈[0;T] such that P(dx0:T ) = Gψ 0 (x0) T t=1 Gψ t (xt−1, xt ) Qψ (dx0:T )Z−1 hence Z = EQψ Gψ 0 (X0) T t=1 Gψ t (Xt−1, Xt ) Jeremy Heng Controlled SMC 9 / 25
  • 32. Twisted path measures Now define twisted potentials (Gψ t )t∈[0;T] such that P(dx0:T ) = Gψ 0 (x0) T t=1 Gψ t (xt−1, xt ) Qψ (dx0:T )Z−1 hence Z = EQψ Gψ 0 (X0) T t=1 Gψ t (Xt−1, Xt ) Achieved with Gψ 0 (x0) := µ(ψ0)G0(x0)M1(ψ1)(x0) ψ0(x0) , Gψ t (xt−1, xt ) := Gt(xt−1, xt)Mt+1(ψt+1)(xt ) ψt (xt−1, xt ) , t ∈ [1 : T − 1], Gψ T (xT−1, xT ) := GT (xT−1, xT ) ψT (xT−1, xT ) Jeremy Heng Controlled SMC 9 / 25
  • 33. Twisted path measures Assume policy ψ is such that: Jeremy Heng Controlled SMC 10 / 25
  • 34. Twisted path measures Assume policy ψ is such that: sampling µψ and (Mψ t )t∈[1:T] feasible Jeremy Heng Controlled SMC 10 / 25
  • 35. Twisted path measures Assume policy ψ is such that: sampling µψ and (Mψ t )t∈[1:T] feasible evaluating (Gψ t )t∈[0:T] tractable Jeremy Heng Controlled SMC 10 / 25
  • 36. Twisted path measures Assume policy ψ is such that: sampling µψ and (Mψ t )t∈[1:T] feasible evaluating (Gψ t )t∈[0:T] tractable For neuroscience model, we consider Gaussian approximation ψt(xt ) = exp −atx2 t − bt xt − ct , (at , bt, ct ) ∈ R3 so µψ (x0) = N(x0; −k0b0, k0), Mψ t (xt−1, xt ) = N xt ; kt(ασ−2 xt−1 − bt), kt with k0 := (1 + 2a0)−1 and kt := (σ−2 + 2at)−1 Jeremy Heng Controlled SMC 10 / 25
  • 37. Twisted path measures Assume policy ψ is such that: sampling µψ and (Mψ t )t∈[1:T] feasible evaluating (Gψ t )t∈[0:T] tractable For neuroscience model, we consider Gaussian approximation ψt(xt ) = exp −atx2 t − bt xt − ct , (at , bt, ct ) ∈ R3 so µψ (x0) = N(x0; −k0b0, k0), Mψ t (xt−1, xt ) = N xt ; kt(ασ−2 xt−1 − bt), kt with k0 := (1 + 2a0)−1 and kt := (σ−2 + 2at)−1 Integrals µ(ψ0) and xt → Mt+1(ψt+1)(xt ) are tractable Jeremy Heng Controlled SMC 10 / 25
  • 38. Twisted SMC Construct ψ-twisted SMC as standard SMC applied to µψ , (Mψ t )t∈[1:T], (Gψ t )t∈[0:T] Jeremy Heng Controlled SMC 11 / 25
  • 39. Twisted SMC Construct ψ-twisted SMC as standard SMC applied to µψ , (Mψ t )t∈[1:T], (Gψ t )t∈[0:T] Unbiased estimator of Z Zψ,N = 1 N N n=1 Gψ 0 (Xn 0 ) T t=1 1 N N n=1 Gψ t (X An t−1 t−1 , Xn t ) Jeremy Heng Controlled SMC 11 / 25
  • 40. Twisted SMC Construct ψ-twisted SMC as standard SMC applied to µψ , (Mψ t )t∈[1:T], (Gψ t )t∈[0:T] Unbiased estimator of Z Zψ,N = 1 N N n=1 Gψ 0 (Xn 0 ) T t=1 1 N N n=1 Gψ t (X An t−1 t−1 , Xn t ) Particle approximation of P Pψ,N = 1 N N n=1 δXn 0:T Jeremy Heng Controlled SMC 11 / 25
  • 41. Optimal policies Policy with constant functions recover standard SMC Jeremy Heng Controlled SMC 12 / 25
  • 42. Optimal policies Policy with constant functions recover standard SMC Optimal policy ψ∗ T (xT−1xT ) = GT (xT−1, xT ), ψ∗ t (xt−1, xt) = Gt (xt−1, xt )Mt+1(ψ∗ t+1)(xt ), t ∈ [T − 1 : 1], ψ∗ 0 (x0) = G0(x0)M1(ψ∗ 1 )(x0) Jeremy Heng Controlled SMC 12 / 25
  • 43. Optimal policies Policy with constant functions recover standard SMC Optimal policy ψ∗ T (xT−1xT ) = GT (xT−1, xT ), ψ∗ t (xt−1, xt) = Gt (xt−1, xt )Mt+1(ψ∗ t+1)(xt ), t ∈ [T − 1 : 1], ψ∗ 0 (x0) = G0(x0)M1(ψ∗ 1 )(x0) Under ψ∗ = (ψ∗ t )t∈[0:T] Qψ∗ = P and Zψ∗ ,N = Z for N ≥ 1 Jeremy Heng Controlled SMC 12 / 25
  • 44. Optimal policies Policy with constant functions recover standard SMC Optimal policy ψ∗ T (xT−1xT ) = GT (xT−1, xT ), ψ∗ t (xt−1, xt) = Gt (xt−1, xt )Mt+1(ψ∗ t+1)(xt ), t ∈ [T − 1 : 1], ψ∗ 0 (x0) = G0(x0)M1(ψ∗ 1 )(x0) Under ψ∗ = (ψ∗ t )t∈[0:T] Qψ∗ = P and Zψ∗ ,N = Z for N ≥ 1 Backward recursion defining ψ∗ is typically intractable Jeremy Heng Controlled SMC 12 / 25
  • 45. Optimal policies Policy with constant functions recover standard SMC Optimal policy ψ∗ T (xT−1xT ) = GT (xT−1, xT ), ψ∗ t (xt−1, xt) = Gt (xt−1, xt )Mt+1(ψ∗ t+1)(xt ), t ∈ [T − 1 : 1], ψ∗ 0 (x0) = G0(x0)M1(ψ∗ 1 )(x0) Under ψ∗ = (ψ∗ t )t∈[0:T] Qψ∗ = P and Zψ∗ ,N = Z for N ≥ 1 Backward recursion defining ψ∗ is typically intractable If potentials (Gt )t∈[0:T] and transition densities of (Mt )t∈[1:T] are log-concave, then ψ∗ is log-concave Jeremy Heng Controlled SMC 12 / 25
  • 46. Optimal policies Policy with constant functions recover standard SMC Optimal policy ψ∗ T (xT−1xT ) = GT (xT−1, xT ), ψ∗ t (xt−1, xt) = Gt (xt−1, xt )Mt+1(ψ∗ t+1)(xt ), t ∈ [T − 1 : 1], ψ∗ 0 (x0) = G0(x0)M1(ψ∗ 1 )(x0) Under ψ∗ = (ψ∗ t )t∈[0:T] Qψ∗ = P and Zψ∗ ,N = Z for N ≥ 1 Backward recursion defining ψ∗ is typically intractable If potentials (Gt )t∈[0:T] and transition densities of (Mt )t∈[1:T] are log-concave, then ψ∗ is log-concave For neuroscience model, this justifies a Gaussian approximation F := ξ(x) = exp −ax2 − bx − c : (a, b, c) ∈ R3 Jeremy Heng Controlled SMC 12 / 25
  • 47. Connection to optimal control V ∗ t := − log ψ∗ t are the optimal value functions of the Kullback-Leibler control problem inf ψ∈Φ KL Qψ |P Φ := {ψ ∈ Ψ : KL(Qψ |P) < ∞} Jeremy Heng Controlled SMC 13 / 25
  • 48. Connection to optimal control V ∗ t := − log ψ∗ t are the optimal value functions of the Kullback-Leibler control problem inf ψ∈Φ KL Qψ |P Φ := {ψ ∈ Ψ : KL(Qψ |P) < ∞} Methods to approximate backward recursion are known as approximate dynamic programming (ADP) for finite horizon control problems (Bertsekas and Tsitsiklis, 1996) Jeremy Heng Controlled SMC 13 / 25
  • 49. Connection to optimal control V ∗ t := − log ψ∗ t are the optimal value functions of the Kullback-Leibler control problem inf ψ∈Φ KL Qψ |P Φ := {ψ ∈ Ψ : KL(Qψ |P) < ∞} Methods to approximate backward recursion are known as approximate dynamic programming (ADP) for finite horizon control problems (Bertsekas and Tsitsiklis, 1996) Connection also useful for analysis (Tsitsiklis and Van Roy, 2001) Jeremy Heng Controlled SMC 13 / 25
  • 50. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Jeremy Heng Controlled SMC 14 / 25
  • 51. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Approximate backward recursion Jeremy Heng Controlled SMC 14 / 25
  • 52. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Approximate backward recursion For time T: approximate ψ∗ T = GT using ˆψT = arg min ξ∈FT N n=1 {log ξ − log GT }2 (X An T−1 T−1 , Xn T ) Jeremy Heng Controlled SMC 14 / 25
  • 53. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Approximate backward recursion For time T: approximate ψ∗ T = GT using ˆψT = arg min ξ∈FT N n=1 {log ξ − log GT }2 (X An T−1 T−1 , Xn T ) For time t ∈ [T − 1 : 0]: approximate ψ∗ t = Gt Mt+1(ψ∗ t+1) using ˆψt = arg min ξ∈Ft N n=1 log ξ − log Gt − log Mt+1( ˆψt+1) 2 (X An t−1 t−1 , Xn t ) Jeremy Heng Controlled SMC 14 / 25
  • 54. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Approximate backward recursion For time T: approximate ψ∗ T = GT using ˆψT = arg min ξ∈FT N n=1 {log ξ − log GT }2 (X An T−1 T−1 , Xn T ) For time t ∈ [T − 1 : 0]: approximate ψ∗ t = Gt Mt+1(ψ∗ t+1) using ˆψt = arg min ξ∈Ft N n=1 log ξ − log Gt − log Mt+1( ˆψt+1) 2 (X An t−1 t−1 , Xn t ) Output: policy ˆψ = ( ˆψt )t∈[0:T] Jeremy Heng Controlled SMC 14 / 25
  • 55. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Approximate backward recursion For time T: approximate ψ∗ T = GT using ˆψT = arg min ξ∈FT N n=1 {log ξ − log GT }2 (X An T−1 T−1 , Xn T ) For time t ∈ [T − 1 : 0]: approximate ψ∗ t = Gt Mt+1(ψ∗ t+1) using ˆψt = arg min ξ∈Ft N n=1 log ξ − log Gt − log Mt+1( ˆψt+1) 2 (X An t−1 t−1 , Xn t ) Output: policy ˆψ = ( ˆψt )t∈[0:T] Run ˆψ-twisted SMC Jeremy Heng Controlled SMC 14 / 25
  • 56. Approximate dynamic programming First run standard SMC to get particles (Xn t ) n ∈ [1 : N], t ∈ [0 : T] Approximate backward recursion For time T: approximate ψ∗ T = GT using ˆψT = arg min ξ∈FT N n=1 {log ξ − log GT }2 (X An T−1 T−1 , Xn T ) For time t ∈ [T − 1 : 0]: approximate ψ∗ t = Gt Mt+1(ψ∗ t+1) using ˆψt = arg min ξ∈Ft N n=1 log ξ − log Gt − log Mt+1( ˆψt+1) 2 (X An t−1 t−1 , Xn t ) Output: policy ˆψ = ( ˆψt )t∈[0:T] Run ˆψ-twisted SMC Algorithm is essentially Guarniero, Johansen & Lee (2017) if projecting in natural scale Jeremy Heng Controlled SMC 14 / 25
  • 57. Approximate dynamic programming We obtain error bounds like E ˆψt − ψ∗ t L2 ≤ T s=t Ct−1,s−1eN s where Ct,s are stability constants and eN t are approximation errors Jeremy Heng Controlled SMC 15 / 25
  • 58. Approximate dynamic programming We obtain error bounds like E ˆψt − ψ∗ t L2 ≤ T s=t Ct−1,s−1eN s where Ct,s are stability constants and eN t are approximation errors As N → ∞, ˆψ converges (LLN & CLT) to idealized ADP ˜ψT = arg min ξ∈FT E {log ξ − log GT } 2 (XT−1, XT ) , ˜ψt = arg min ξ∈Ft E log ξ − log Gt − log Mt+1( ˜ψt+1) 2 (Xt−1, Xt ) Jeremy Heng Controlled SMC 15 / 25
  • 59. Neuroscience model continued ˆψ-twisted SMC improves upon standard SMC 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 100 Uncontrolled Iteration 1 0 500 1000 1500 2000 2500 -3000 -2500 -2000 -1500 -1000 -500 Uncontrolled Iteration 1 Jeremy Heng Controlled SMC 16 / 25
  • 60. Neuroscience model continued ˆψ-twisted SMC improves upon standard SMC 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 100 Uncontrolled Iteration 1 0 500 1000 1500 2000 2500 -3000 -2500 -2000 -1500 -1000 -500 Uncontrolled Iteration 1 Uncontrolled SMC approximates P(y0:t ), t ∈ [0 : T] Jeremy Heng Controlled SMC 16 / 25
  • 61. Neuroscience model continued ˆψ-twisted SMC improves upon standard SMC 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 100 Uncontrolled Iteration 1 0 500 1000 1500 2000 2500 -3000 -2500 -2000 -1500 -1000 -500 Uncontrolled Iteration 1 Uncontrolled SMC approximates P(y0:t ), t ∈ [0 : T] Controlled SMC approximates P(y0:T ), t ∈ [0 : T] Jeremy Heng Controlled SMC 16 / 25
  • 62. Neuroscience model continued ˆψ-twisted SMC improves upon standard SMC 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 100 Uncontrolled Iteration 1 0 500 1000 1500 2000 2500 -3000 -2500 -2000 -1500 -1000 -500 Uncontrolled Iteration 1 Uncontrolled SMC approximates P(y0:t ), t ∈ [0 : T] Controlled SMC approximates P(y0:T ), t ∈ [0 : T] Can we do better? Jeremy Heng Controlled SMC 16 / 25
  • 69. Policy refinement
Want to refine the current policy $\hat{\psi}$.
Twisting $Q^{\hat{\psi}}$ further with a policy $\phi = (\phi_t)_{t \in [0:T]}$ gives
$$\big(Q^{\hat{\psi}}\big)^{\phi} = Q^{\hat{\psi} \cdot \phi}, \qquad \hat{\psi} \cdot \phi = (\hat{\psi}_t \cdot \phi_t)_{t \in [0:T]}.$$
Optimal choice of $\phi$ (with respect to $\hat{\psi}$):
$$\phi_T^*(x_{T-1}, x_T) = G_T^{\hat{\psi}}(x_{T-1}, x_T),$$
$$\phi_t^*(x_{t-1}, x_t) = G_t^{\hat{\psi}}(x_{t-1}, x_t)\, M_{t+1}^{\hat{\psi}}(\phi_{t+1}^*)(x_t), \quad t \in [T-1:1],$$
$$\phi_0^*(x_0) = G_0^{\hat{\psi}}(x_0)\, M_1^{\hat{\psi}}(\phi_1^*)(x_0),$$
since $\hat{\psi} \cdot \phi^* = \psi^*$.
Approximate this backward recursion to obtain $\hat{\phi} = (\hat{\phi}_t)_{t \in [0:T]}$, using particles from the $\hat{\psi}$-twisted SMC and the same function classes $(\mathcal{F}_t)_{t \in [0:T]}$.
Construct the refined policy $\hat{\psi} \cdot \hat{\phi}$.
Call this iterative scheme for refining policies controlled SMC (a sketch of the loop follows below).
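As a rough illustration of the iterative scheme, the sketch below wires the pieces together. Both `run_twisted_smc` (running $\hat{\psi}$-twisted SMC) and `adp_backward` (the backward regression, as in the earlier sketch) are hypothetical callables supplied by the caller; only the combination of policies is spelled out, under the assumption that each policy is stored as a list of per-time-step NumPy coefficient arrays.

```python
def controlled_smc(initial_policy, run_twisted_smc, adp_backward, n_iterations):
    """Iteratively refine a policy: run psi-hat-twisted SMC, fit the incremental
    policy phi-hat by ADP on the twisted model, and form psi-hat * phi-hat."""
    policy = initial_policy                     # e.g. zero coefficients (uncontrolled SMC)
    for _ in range(n_iterations):
        smc_output = run_twisted_smc(policy)    # particles etc. under Q^{psi-hat}
        phi = adp_backward(*smc_output)         # approximate backward recursion for phi*
        # psi-hat * phi-hat: in the log-quadratic class, multiplying policies
        # amounts to adding their (a_t, b_t, c_t) coefficient arrays
        policy = [p + q for p, q in zip(policy, phi)]
    return policy
```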
  • 70. Neuroscience model continued
Iteration 2.
[Figure: two panels over $t = 0, \dots, 2500$; legend: Uncontrolled, Iteration 1, Iteration 2; an inset zooms into the range $-3100$ to $-3070$.]
  • 71. Neuroscience model continued
Iteration 3.
[Figure: two panels over $t = 0, \dots, 2500$; legend: Uncontrolled, Iteration 1, Iteration 2, Iteration 3; an inset zooms into the range $-3100$ to $-3070$.]
  • 73. Neuroscience model continued
Under $\mathcal{F}_t := \big\{\xi(x) = \exp(-a_t x^2 - b_t x - c_t) : (a_t, b_t, c_t) \in \mathbb{R}^3\big\}$, the policy at iteration $i \ge 1$ is
$$\psi_t^{(i)}(x_t) = \exp\!\big(-a_t^{(i)} x_t^2 - b_t^{(i)} x_t - c_t^{(i)}\big), \qquad \big(a_t^{(i)}, b_t^{(i)}, c_t^{(i)}\big) = \sum_{k=1}^{i} \big(a_t^{k}, b_t^{k}, c_t^{k}\big),$$
where $(a_t^{k}, b_t^{k}, c_t^{k})$ are the coefficients estimated at iteration $k \ge 1$ (a small numerical check follows below).
[Figure: two panels over $t = 0, \dots, 3000$; legend: Iteration 1, Iteration 2, Iteration 3.]
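A small numerical check of the coefficient bookkeeping above (the numbers are made up for illustration): multiplying log-quadratic policies across iterations is the same as summing their coefficients.

```python
import numpy as np

# per-iteration coefficient estimates (a_t^k, b_t^k, c_t^k); values are made up
coeffs = [np.array([0.80, -0.10, 0.00]),   # iteration k = 1
          np.array([0.10,  0.05, 0.20]),   # iteration k = 2
          np.array([0.02,  0.01, 0.05])]   # iteration k = 3

def psi(x, c):
    a, b, const = c
    return np.exp(-a * x**2 - b * x - const)

summed = np.sum(coeffs, axis=0)            # (a_t^(3), b_t^(3), c_t^(3))
x = 0.7
# the product of the per-iteration twists equals the single log-quadratic
# policy whose coefficients are the sums
assert np.isclose(np.prod([psi(x, c) for c in coeffs]), psi(x, summed))
```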
  • 76. Policy refinement
The residual from the first ADP when fitting $\hat{\psi}$ is
$$\varepsilon_t := \log \hat{\psi}_t - \log G_t - \log M_{t+1}(\hat{\psi}_{t+1}).$$
The next ADP refinement re-fits the residual (like $L^2$-boosting):
$$\hat{\phi}_t = \arg\min_{\xi \in \mathcal{F}_t} \sum_{n=1}^{N} \big\{\log \xi - \varepsilon_t - \log M_{t+1}^{\hat{\psi}}(\hat{\phi}_{t+1})\big\}^2\big(X_{t-1}^{A_{t-1}^n}, X_t^n\big).$$
Twisted potential of the refined policy $\hat{\psi} \cdot \hat{\phi}$:
$$-\log G_t^{\hat{\psi} \cdot \hat{\phi}} = \log \hat{\phi}_t - \log G_t^{\hat{\psi}} - \log M_{t+1}^{\hat{\psi}}(\hat{\phi}_{t+1}),$$
the new residual from fitting $\hat{\phi}$.
  • 78. Policy refinement
The iterative scheme generates a Markov chain on policy space,
$$F^N(\hat{\psi}, U) = \hat{\psi} \cdot \hat{\phi},$$
where $\hat{\psi}$ is the current policy and $\hat{\phi}$ is the ADP output.
It converges to a unique invariant distribution that concentrates around fixed points of
$$F(\hat{\psi}) = \hat{\psi} \cdot \tilde{\phi},$$
where $\hat{\psi}$ is the current policy and $\tilde{\phi}$ is the idealized ADP output.
[Figure: two panels.]
  • 80. Neuroscience model continued
(Left) Comparison to the bootstrap particle filter (BPF).
(Right) Comparison to forward filtering backward smoothing (FFBS) for the functional $x_{0:T} \mapsto 50\,\kappa(x_{0:T})$.
Figure: relative variance of marginal likelihood estimates (left) and estimates of the smoothing expectation (right); legend: BPF vs cSMC (left), FFBS vs cSMC (right).
  • 84. Neuroscience model continued
Bayesian inference for the parameters $\theta = (\alpha, \sigma^2)$ within particle marginal Metropolis–Hastings (PMMH).
cSMC and BPF are used to produce unbiased estimates of the marginal likelihood.
Autocorrelation function (ACF) of each PMMH chain:
[Figure: ACF up to lag 50 for each of the two parameters; legend: BPF, cSMC.]
ESS improvement is roughly 5-fold for both parameters (a minimal PMMH sketch follows below).
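The following is a minimal pseudo-marginal PMMH sketch, assuming a user-supplied `log_Z_hat(theta)` that returns the log of an unbiased marginal likelihood estimate (from cSMC or the BPF) and a `log_prior(theta)`; the Gaussian random-walk proposal and its step sizes are illustrative choices, not values from the talk.

```python
import numpy as np

def pmmh(log_Z_hat, log_prior, theta0, n_iters,
         step=np.array([0.02, 0.05]), rng=np.random.default_rng(0)):
    theta = np.asarray(theta0, dtype=float)
    log_post = log_Z_hat(theta) + log_prior(theta)   # stored noisy log-posterior value
    chain = np.empty((n_iters, theta.size))
    for i in range(n_iters):
        prop = theta + step * rng.standard_normal(theta.size)
        # log_prior can return -inf outside the support (alpha in [0,1], sigma^2 > 0)
        log_post_prop = log_Z_hat(prop) + log_prior(prop)
        # accept/reject with the estimated likelihood; reusing the stored estimate
        # for the current state and unbiasedness of the estimator keep the exact
        # posterior invariant (pseudo-marginal MCMC)
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = prop, log_post_prop
        chain[i] = theta
    return chain
```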
  • 87. Concluding remarks
Methodology can be extended to static models or SMC samplers with tempered targets
$$\pi_t(dx) = \frac{\pi_0(dx)\, L(x)^{\lambda_t}}{Z_t}, \qquad 0 = \lambda_0 < \lambda_1 < \cdots < \lambda_T = 1$$
(a minimal tempering sketch follows below).
Draft: Controlled Sequential Monte Carlo, arXiv:1708.08396, 2017.
MATLAB code: https://github.com/jeremyhengjm/controlledSMC
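For the static-model extension, here is a minimal sketch of the tempered sequence of targets; the linear schedule and the Gaussian prior/likelihood are illustrative assumptions, and the repository linked above contains the authors' actual MATLAB code.

```python
import numpy as np

def tempered_logpdfs(log_prior, log_lik, n_steps):
    """Unnormalised log-densities log pi_t = log pi_0 + lambda_t * log L, t = 0..n_steps."""
    lambdas = np.linspace(0.0, 1.0, n_steps + 1)       # linear schedule, for illustration
    return [lambda x, lam=lam: log_prior(x) + lam * log_lik(x) for lam in lambdas]

# Example: standard normal prior, a simple Gaussian likelihood centred at 2
targets = tempered_logpdfs(lambda x: -0.5 * x**2,
                           lambda x: -0.5 * (x - 2.0)**2,
                           n_steps=10)
print(targets[0](0.0), targets[-1](0.0))               # log pi_0 and log pi_T at x = 0
```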