Estimation of the score vector and observed
information matrix in intractable models
Arnaud Doucet (University of Oxford)
Pierre E. Jacob (University of Oxford)
Sylvain Rubenthaler (Université Nice Sophia Antipolis)
April 15th, 2015
Outline
1 Context
2 General results
3 Monte Carlo
4 Hidden Markov models
Motivation
Derivatives of the likelihood help with optimization and sampling.
For many models they are not available.
One can resort to approximation techniques.
Motivation
Let ℓ(θ) denote a “log-likelihood”, though it could be any function.
Let L(θ) = exp ℓ(θ), the “likelihood”.
Assume that we have access to estimators L̂(θ) of L(θ) such that
E[L̂(θ)] = L(θ) and V[L̂(θ)] = L(θ)² v(θ) / M.
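As a concrete illustration (not from the original slides), such an unbiased estimator arises when L(θ) is an integral over a latent variable and is estimated by averaging M Monte Carlo terms; the variance then decreases like 1/M as assumed above. A minimal Python sketch with a toy model and illustrative names:

import numpy as np

def likelihood_estimator(theta, y, M, rng):
    # Unbiased estimator of L(theta) = E_X[ p(y | X, theta) ] for a toy
    # latent-variable model: X ~ N(0, 1) and y | X, theta ~ N(theta + X, 1).
    # Averaging M i.i.d. terms gives E[L_hat] = L(theta) and V[L_hat] = O(1/M).
    x = rng.normal(0.0, 1.0, size=M)
    terms = np.exp(-0.5 * (y - theta - x) ** 2) / np.sqrt(2.0 * np.pi)
    return terms.mean()

rng = np.random.default_rng(1)
print(likelihood_estimator(0.0, y=0.3, M=10_000, rng=rng))
# For comparison: marginally y ~ N(theta, 2), so L(0) is the N(0, 2) density at 0.3.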
Finite difference
First derivative:
ℓ̂^(1)(θ⋆) = [ log L̂(θ⋆ + h) − log L̂(θ⋆ − h) ] / (2h),
which converges to ∇ℓ(θ⋆) when M → ∞ and h → 0.
Second derivative:
ℓ̂^(2)(θ⋆) = [ log L̂(θ⋆ + h) − 2 log L̂(θ⋆) + log L̂(θ⋆ − h) ] / h²,
which converges to ∇²ℓ(θ⋆) when M → ∞ and h → 0.
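A minimal sketch of these finite-difference estimators applied to noisy log-likelihood evaluations; the toy noisy log-likelihood below (same latent-variable model as in the previous sketch, observation fixed at 0.3) is an assumption made for illustration:

import numpy as np

rng = np.random.default_rng(2)

def loglik_hat(theta, M):
    # log of an unbiased likelihood estimator: X ~ N(0, 1), y | X, theta ~ N(theta + X, 1), y = 0.3
    x = rng.normal(0.0, 1.0, size=M)
    return np.log(np.mean(np.exp(-0.5 * (0.3 - theta - x) ** 2) / np.sqrt(2.0 * np.pi)))

def fd_score(theta_star, h, M):
    # central finite difference for the first derivative, from independent noisy evaluations
    return (loglik_hat(theta_star + h, M) - loglik_hat(theta_star - h, M)) / (2.0 * h)

def fd_hessian(theta_star, h, M):
    # second-order central difference for the second derivative
    return (loglik_hat(theta_star + h, M) - 2.0 * loglik_hat(theta_star, M)
            + loglik_hat(theta_star - h, M)) / h ** 2

M = 10_000
print(fd_score(0.0, h=M ** (-1.0 / 6.0), M=M))    # exact score here is (0.3 - 0)/2 = 0.15
print(fd_hessian(0.0, h=M ** (-1.0 / 8.0), M=M))  # exact second derivative here is -1/2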
Finite difference
Optimal rate of convergence for the first derivative:
h ∼ M^{−1/6}, leading to MSE ∼ M^{−2/3}.
For the second derivative:
h ∼ M^{−1/8}, leading to MSE ∼ M^{−1/2}.
Outline
1 Context
2 General results
3 Monte Carlo
4 Hidden Markov models
Iterated Filtering
Given a log-likelihood ℓ and a point θ⋆, consider a prior θ ∼ N(θ⋆, τ²).
Posterior expectation when the prior variance goes to zero
First-order moments give first-order derivatives:
|τ⁻² (E[θ|Y ] − θ⋆) − ∇ℓ(θ⋆)| ≤ Cτ.
Phrased simply,
(posterior mean − prior mean) / prior variance ≈ score.
Result from Ionides, Bhadra, Atchadé, King, Iterated filtering,
2011.
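For a Gaussian toy likelihood the posterior is available in closed form, so this identity can be checked exactly; a small sketch (the toy model and all names are chosen purely for illustration):

import numpy as np

# Toy model: one observation y ~ N(theta, sigma2), so the exact score at
# theta_star is (y - theta_star) / sigma2, and the posterior under the
# prior N(theta_star, tau2) is Gaussian in closed form.
y, sigma2, theta_star = 1.3, 0.5, 0.2
exact_score = (y - theta_star) / sigma2

for tau2 in [1e-1, 1e-2, 1e-3]:
    post_mean = (tau2 * y + sigma2 * theta_star) / (tau2 + sigma2)
    approx = (post_mean - theta_star) / tau2   # (posterior mean - prior mean) / prior variance
    print(tau2, approx, exact_score)           # approx -> exact_score as tau2 -> 0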
Stein’s lemma
Stein’s lemma states that
θ ∼ N(θ⋆, τ²)
if and only if for any function g such that E[|∇g(θ)|] < ∞,
E[(θ − θ⋆) g(θ)] = τ² E[∇g(θ)].
If we choose the function g : θ → exp ℓ(θ) / Z with
Z = E[exp ℓ(θ)] and apply Stein’s lemma, we obtain
(1/Z) E[θ exp ℓ(θ)] − θ⋆ = (τ²/Z) E[∇ℓ(θ) exp(ℓ(θ))]
⇔ τ⁻² (E[θ | Y ] − θ⋆) = E[∇ℓ(θ) | Y ].
Notation: E[φ(θ) | Y ] := E[φ(θ) exp ℓ(θ)] / Z.
Stein’s lemma
For the second derivative, we consider
h : θ → (θ − θ⋆) exp ℓ(θ) / Z.
Then
E[(θ − θ⋆)² | Y ] = τ² + τ⁴ E[∇²ℓ(θ) + ∇ℓ(θ)² | Y ].
Adding and subtracting terms also yields
τ⁻⁴ ( V[θ | Y ] − τ² ) = E[∇²ℓ(θ) | Y ] + { E[∇ℓ(θ)² | Y ] − (E[∇ℓ(θ) | Y ])² }.
. . . but what we really want is
∇ℓ(θ⋆), ∇²ℓ(θ⋆)
and not
E[∇ℓ(θ) | Y ], E[∇²ℓ(θ) | Y ].
Core Idea
The prior is a normal distribution N(θ⋆, τ²).
The prior moments behave like:
Eτ[φ(Θ)] = φ(θ⋆) + (τ²/2) ∇²φ(θ⋆) + O(τ⁴).
The posterior moments behave like:
Eτ[φ(Θ) | Y ] = φ(θ⋆) + (τ²/2) ( ∇²φ(θ⋆) + 2 ∇φ(θ⋆) ∇ℓ(θ⋆) ) + O(τ⁴).
Our arXived proof suffers from an overdose of Taylor
expansions.
Proof: prior moments
Let φ : R → R be a four times continuously differentiable
function. Assume that there exist a constant M < ∞ and a
δ > 0 such that |d⁴φ(θ)/dθ⁴| ≤ M for all θ ∈ B(θ⋆, δ).
Cut the expectation into two parts:
Eτ[φ(Θ)] = ∫_{B(θ⋆,δ)} φ(θ) pτ(θ) dθ + ∫_{Bᶜ(θ⋆,δ)} φ(θ) pτ(θ) dθ.
The second term is o(τ^k) for any power k.
Proof: prior moments
Let’s deal with the first term:
∫_{B(θ⋆,δ)} φ(θ) pτ(dθ).
Taylor expansion: for all θ ∈ B(θ⋆, δ),
φ(θ) = φ(θ⋆) + Σ_{k=1,2,3} (1/k!) (dᵏφ(θ⋆)/dθᵏ) (θ − θ⋆)^k + R₃(θ, θ⋆).
The Gaussian prior integrates any (θ − θ⋆)^k with odd k to zero
over B(θ⋆, δ), so
∫_{B(θ⋆,δ)} φ(θ) pτ(dθ) = φ(θ⋆) + (d²φ(θ⋆)/dθ²) ∫_{B(θ⋆,δ)} (1/2) (θ − θ⋆)² pτ(dθ)
+ ∫_{B(θ⋆,δ)} R₃(θ, θ⋆) pτ(dθ).
Proof: prior moments
For
(d²φ(θ⋆)/dθ²) ∫_{B(θ⋆,δ)} (1/2) (θ − θ⋆)² pτ(dθ)
we “complete the integral” over all of R and subtract an integral
over Bᶜ(θ⋆, δ), which is o(τ^k) for any k. We are left with
(d²φ(θ⋆)/dθ²) (τ²/2) + o(τ^k) for any k.
For
∫_{B(θ⋆,δ)} R₃(θ, θ⋆) pτ(dθ),
we use the assumption to say that for all θ there is a θ̃ in B(θ⋆, δ)
such that
R₃(θ, θ⋆) = (d⁴φ(θ̃)/dθ⁴) (θ − θ⋆)⁴ / 4!.
Proof: prior moments
Since d⁴φ(θ̃)/dθ⁴ is upper bounded by some M by assumption:
| ∫_{B(θ⋆,δ)} R₃(θ, θ⋆) pτ(dθ) |
≤ M ∫_{B(θ⋆,δ)} ((θ − θ⋆)⁴ / 4!) pτ(dθ)
≤ M ∫_R ((θ − θ⋆)⁴ / 4!) pτ(dθ)
= τ⁴ × C.
Combining all the terms, we obtain
Eτ[φ(Θ)] = φ(θ⋆) + (τ²/2) ∇²φ(θ⋆) + O(τ⁴).
Proof: posterior moments
We want to obtain the posterior moments:
Eτ[φ(Θ) | Y ] = φ(θ⋆) + (τ²/2) ( ∇²φ(θ⋆) + 2 ∇φ(θ⋆) ∇ℓ(θ⋆) ) + O(τ⁴).
We write:
Eτ[φ(Θ) | Y ] = Eτ[φ(Θ) L(Θ)] / Eτ[L(Θ)].
Then we apply the prior moment expansion to φ × L and to
L, and we simplify the ratio of the two expansions.
We need to assume that the likelihood is four times continuously
differentiable, with bounded fourth derivatives around θ⋆.
Main results
In general, with a prior N(θ⋆, τ²Σ), when Σ is fixed and τ goes
to zero,
τ⁻² Σ⁻¹ Epost[(Θ − θ⋆)] = ∇ℓ(θ⋆) + O(τ²),
τ⁻⁴ Σ⁻¹ ( Vpost[Θ] − τ²Σ ) Σ⁻¹ = ∇²ℓ(θ⋆) + O(τ²).
Extension of Iterated Filtering
Posterior variance when the prior variance goes to zero
Second-order moments give second-order derivatives:
|τ⁻⁴ ( Cov[θ|Y ] − τ² ) − ∇²ℓ(θ⋆)| ≤ Cτ².
Phrased simply,
(posterior variance − prior variance) / prior variance² ≈ hessian.
Result from Doucet, Jacob, Rubenthaler on arXiv, 2013.
Proximity mapping
Given a real function f and a point θ⋆, consider for any τ² > 0
θ → f(θ) exp{ −(θ − θ⋆)² / (2τ²) }.
Figure : Example for f : θ → exp(−|θ|) and three values of τ².
Proximity mapping
Proximity mapping
The τ²-proximity mapping is defined by
proxf : θ0 → argmax_{θ∈R} f(θ) exp{ −(θ − θ0)² / (2τ²) }.
Moreau approximation
The τ²-Moreau approximation is defined by
fτ² : θ0 → C sup_{θ∈R} f(θ) exp{ −(θ − θ0)² / (2τ²) },
where C is a normalizing constant.
Proximity mapping
Figure : θ → f(θ) and θ → fτ²(θ) for three values of τ².
Proximity mapping
Property
Those objects are such that
( proxf(θ0) − θ0 ) / τ² = ∇ log fτ²(θ0) −→ ∇ log f(θ0) as τ² → 0.
Moreau (1962), Fonctions convexes duales et points proximaux
dans un espace Hilbertien.
Pereyra (2013), Proximal Markov chain Monte Carlo
algorithms.
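A numerical sketch of this property for the example f : θ → exp(−|θ|) from the figure, using a crude grid search; the grid, the point θ0 and all names are assumptions made for this illustration:

import numpy as np

def prox_and_moreau(theta0, tau2, f, grid):
    # grid approximations of the tau^2-proximity mapping and Moreau approximation of f
    vals = f(grid) * np.exp(-(grid - theta0) ** 2 / (2.0 * tau2))
    return grid[np.argmax(vals)], vals.max()

f = lambda t: np.exp(-np.abs(t))          # example from the figure; grad log f = -1 for theta > 0
grid = np.linspace(-5.0, 5.0, 200001)
theta0 = 1.5
for tau2 in [1.0, 0.1, 0.01]:
    prox, _ = prox_and_moreau(theta0, tau2, f, grid)
    print(tau2, (prox - theta0) / tau2)   # -> grad log f(theta0) = -1 as tau2 -> 0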
Proximity mapping
Bayesian interpretation
If f is seen as a likelihood function, then
θ → f(θ) exp{ −(θ − θ0)² / (2τ²) }
is an unnormalized posterior density function based on a
Normal prior with mean θ0 and variance τ².
Hence
( proxf(θ0) − θ0 ) / τ² −→ ∇ log f(θ0) as τ² → 0
can be read as
(posterior mode − prior mode) / prior variance ≈ score.
Outline
1 Context
2 General results
3 Monte Carlo
4 Hidden Markov models
Moment shift estimator
How to estimate
S(θ⋆) = τ⁻² Σ⁻¹ Epost[(Θ − θ⋆)] = ∇ℓ(θ⋆) + O(τ²) ?
Importance Sampling estimator from the prior:
S_N(θ⋆) = τ⁻² Σ⁻¹ ( Σ_{i=1}^N Ŵi θi − θ⋆ ),
where
Ŵi = L̂(θi) / Σ_{j=1}^N L̂(θj).
It turns out to be better to use
S_N(θ⋆) = τ⁻² Σ⁻¹ ( Σ_{i=1}^N Ŵi θi − (1/N) Σ_{i=1}^N θi ).
Any idea why?
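A one-dimensional sketch of the recentred estimator, with the weights written as self-normalised; L_hat stands for any unbiased likelihood estimator (e.g. the toy one sketched earlier), and the function and argument names are illustrative rather than the authors' code:

import numpy as np

def score_estimator(theta_star, tau, L_hat, N, rng, Sigma=1.0):
    # moment-shift estimator of the score: draw theta_i ~ N(theta_star, tau^2 Sigma),
    # weight by (possibly noisy) likelihood estimates, compare weighted and unweighted means
    thetas = rng.normal(theta_star, tau * np.sqrt(Sigma), size=N)
    L_vals = np.array([L_hat(t) for t in thetas])
    W = L_vals / L_vals.sum()                         # self-normalised weights
    return (W @ thetas - thetas.mean()) / (tau ** 2 * Sigma)

# possible usage, reusing the earlier toy likelihood_estimator:
# rng = np.random.default_rng(3)
# print(score_estimator(0.0, tau=0.1,
#                       L_hat=lambda t: likelihood_estimator(t, 0.3, 1000, rng),
#                       N=1000, rng=rng))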
Moment shift estimator
For the second-order derivative, we want to estimate:
τ⁻⁴ Σ⁻¹ ( Vpost[Θ] − τ²Σ ) Σ⁻¹ = ∇²ℓ(θ⋆) + O(τ²).
We propose:
τ⁻⁴ Σ⁻¹ [ Σ_{i=1}^N Ŵi ( θi − Σ_{j=1}^N Ŵj θj )² − (1/N) Σ_{i=1}^N ( θi − (1/N) Σ_{j=1}^N θj )² ] Σ⁻¹.
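A matching one-dimensional sketch for the second-order estimator, again with self-normalised weights (illustrative, under the same assumptions as the previous sketch):

import numpy as np

def information_estimator(theta_star, tau, L_hat, N, rng, Sigma=1.0):
    # moment-shift estimator of the second derivative:
    # weighted variance of the theta_i minus their unweighted sample variance, rescaled
    thetas = rng.normal(theta_star, tau * np.sqrt(Sigma), size=N)
    L_vals = np.array([L_hat(t) for t in thetas])
    W = L_vals / L_vals.sum()
    weighted_var = W @ (thetas - W @ thetas) ** 2           # weighted variance
    sample_var = np.mean((thetas - thetas.mean()) ** 2)     # unweighted sample variance
    return (weighted_var - sample_var) / (tau ** 4 * Sigma ** 2)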
Moment shift estimator
We retrieve the same rates of convergence as in finite difference,
e.g. if τ ∼ N^{−1/6}, the MSE is in N^{−2/3}.
Then why bother? Why has better performance been observed
in practice in some scenarios?
Different behaviour in the non-asymptotic regime.
For any function υ and tuning parameter M,
V( S_N(θ⋆) ) ≤ τ⁻² C_N,
where C_N depends neither on υ nor on M.
Outline
1 Context
2 General results
3 Monte Carlo
4 Hidden Markov models
Hidden Markov models
Figure : Graph representation of a general hidden Markov model, with states X0, X1, . . . , XT, observations y1, . . . , yT and parameter θ.
Hidden Markov models
Direct application of the previous results
1 Prior distribution N(θ0, σ²) on the parameter θ.
2 The derivative approximations involve E[θ|Y ] and Cov[θ|Y ].
3 Posterior moments for HMMs can be estimated by particle MCMC, SMC², ABC, or your favourite method.
Ionides et al. proposed another approach.
Iterated Filtering
Modification of the model: θ is time-varying.
The associated log-likelihood is
ℓ̄(θ1:T) = log p(y1:T ; θ1:T)
= log ∫_{X^{T+1}} Π_{t=1}^T g(yt | xt, θt) µ(dx1 | θ1) Π_{t=2}^T f(dxt | xt−1, θt).
Introducing θ → (θ, θ, . . . , θ) := θ^[T] ∈ R^T, we have
ℓ̄(θ^[T]) = ℓ(θ)
and the chain rule yields
dℓ(θ)/dθ = Σ_{t=1}^T ∂ℓ̄(θ^[T]) / ∂θt.
Iterated Filtering
Choice of prior on θ1:T :
θ1 = θ0 + V1, V1 ∼ τ⁻¹ κ{τ⁻¹(·)}
θt+1 − θ0 = ρ (θt − θ0) + Vt+1, Vt+1 ∼ σ⁻¹ κ{σ⁻¹(·)}
Choose σ² such that τ² = σ²/(1 − ρ²). Covariance of the prior
on θ1:T :
Σ_T = τ² [ ρ^{|s−t|} ]_{s,t=1,...,T},
i.e. the T × T Toeplitz matrix with 1 on the diagonal, ρ on the first
off-diagonals, and so on down to ρ^{T−1} in the corners.
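This prior covariance is simply a stationary AR(1) covariance matrix; a small helper to build it (the function name is illustrative):

import numpy as np

def ar1_prior_cov(T, tau2, rho):
    # stationary AR(1) covariance: entry (s, t) equals tau^2 * rho^|s - t|
    idx = np.arange(T)
    return tau2 * rho ** np.abs(idx[:, None] - idx[None, :])

print(ar1_prior_cov(4, tau2=0.25, rho=0.9))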
Iterated Filtering
Applying the general results for this prior yields, with
|x| = Σ_{t=1}^T |xt| :
| ∇ℓ̄(θ0^[T]) − Σ_T⁻¹ ( E[θ1:T | Y ] − θ0^[T] ) | ≤ Cτ².
Moreover we have
| Σ_{t=1}^T ∂ℓ̄(θ^[T])/∂θt − Σ_{t=1}^T { Σ_T⁻¹ ( E[θ1:T | Y ] − θ0^[T] ) }t |
≤ Σ_{t=1}^T | ∂ℓ̄(θ^[T])/∂θt − { Σ_T⁻¹ ( E[θ1:T | Y ] − θ0^[T] ) }t |
and
dℓ(θ)/dθ = Σ_{t=1}^T ∂ℓ̄(θ^[T]) / ∂θt.
Iterated Filtering
The estimator of the score is thus given by
Σ_{t=1}^T { Σ_T⁻¹ ( E[θ1:T | Y ] − θ0^[T] ) }t,
which can be reduced to
Sτ,ρ,T(θ0) = τ⁻²/(1 + ρ) [ (1 − ρ) Σ_{t=2}^{T−1} E(θt | Y ) − {(1 − ρ) T + 2ρ} θ0 + E(θ1 | Y ) + E(θT | Y ) ],
given the form of Σ_T⁻¹. Note that in the quantities E(θt | Y ),
Y = Y1:T is the complete dataset, thus those expectations are
with respect to the smoothing distribution.
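A sketch of this reduced estimator, taking the smoothing means E(θt | Y ) as inputs; how they are computed (e.g. by a particle smoother) is left out, and the function name is illustrative:

import numpy as np

def score_estimate(theta0, tau2, rho, smoothed_means):
    # smoothed_means[t-1] holds E(theta_t | Y) for t = 1, ..., T
    m = np.asarray(smoothed_means, dtype=float)
    T = len(m)
    inner = ((1.0 - rho) * m[1:T - 1].sum()
             - ((1.0 - rho) * T + 2.0 * rho) * theta0
             + m[0] + m[-1])
    return inner / (tau2 * (1.0 + rho))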
Iterated Filtering
If ρ = 1, then the parameters follow a random walk:
θ1 = θ0 + N(0, τ²) and θt+1 = θt + N(0, σ²).
In this case Ionides et al. proposed the estimator
Sτ,σ,T = τ⁻² ( E(θT | Y ) − θ0 )
as well as
S(bis)τ,σ,T = Σ_{t=1}^T V_{P,t}⁻¹ ( θ̃F,t − θ̃F,t−1 ),
with V_{P,t} = Cov[θ̃t | y1:t−1] and θ̃F,t = E[θt | y1:t].
Those expressions only involve expectations with respect to
filtering distributions.
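A sketch of these two estimators given filtering outputs; note that taking θ̃F,0 = θ0 in the second one is an assumption of this sketch, not something stated on the slide, and the names are illustrative:

import numpy as np

def score_rw_smoothing(theta0, tau2, smoothed_last):
    # S_{tau, sigma, T}: only the smoothing mean E(theta_T | Y) is needed
    return (smoothed_last - theta0) / tau2

def score_rw_filtering(theta0, pred_vars, filt_means):
    # S^(bis)_{tau, sigma, T}: filtering means theta_F,t = E[theta_t | y_{1:t}] and
    # one-step predictive variances V_{P,t}; theta_F,0 is set to theta0 (assumption)
    m = np.concatenate(([theta0], np.asarray(filt_means, dtype=float)))
    return np.sum((m[1:] - m[:-1]) / np.asarray(pred_vars, dtype=float))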
Iterated Filtering
If ρ = 0, then the parameters are i.i.d.:
θ1 = θ0 + N(0, τ²) and θt+1 = θ0 + N(0, τ²).
In this case the expression of the score estimator reduces to
Sτ,T = τ⁻² Σ_{t=1}^T ( E(θt | Y ) − θ0 ),
which involves smoothing distributions.
There is only one parameter, τ², to choose for the prior.
However, smoothing for general hidden Markov models is
difficult, and typically resorts to “fixed-lag approximations”.
Numerical results
Linear Gaussian state space model where the ground truth is
available through the Kalman filter.
X0 ∼ N(0, 1) and Xt = ρXt−1 + N(0, V )
Yt = ηXt + N(0, W ).
Generate T = 100 observations and set
ρ = 0.9, V = 0.7, η = 0.9 and W = 0.1, 0.2, 0.4, 0.9.
240 independent runs, matching the computational costs
between methods in terms of number of calls to the transition
kernel.
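A sketch of the data-generating process used in these experiments (the Kalman-filter ground truth and the matched computational budgets are not reproduced here):

import numpy as np

def simulate_lgssm(T, rho, V, eta, W, rng):
    # X_0 ~ N(0, 1), X_t = rho X_{t-1} + N(0, V), Y_t = eta X_t + N(0, W)
    x = np.empty(T + 1)
    y = np.empty(T)
    x[0] = rng.normal(0.0, 1.0)
    for t in range(1, T + 1):
        x[t] = rho * x[t - 1] + rng.normal(0.0, np.sqrt(V))
        y[t - 1] = eta * x[t] + rng.normal(0.0, np.sqrt(W))
    return x, y

rng = np.random.default_rng(2015)
states, observations = simulate_lgssm(T=100, rho=0.9, V=0.7, eta=0.9, W=0.2)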
Numerical results
[Two panels of RMSE curves (log scale, 10 to 10000): RMSE against h for Finite Difference and RMSE against tau for Iterated Smoothing, one curve per parameter 1–4.]
Figure : 240 runs for Iterated Smoothing and Finite Difference.
Numerical results
[Three panels of RMSE curves (log scale) against tau: Iterated Smoothing, Iterated Filtering 1 and Iterated Filtering 2, one curve per parameter 1–4.]
Figure : 240 runs for Iterated Smoothing and Iterated Filtering.
Bibliography
Main references:
Inference for nonlinear dynamical systems, Ionides, Breto,
King, PNAS, 2006.
Iterated filtering, Ionides, Bhadra, Atchadé, King, Annals
of Statistics, 2011.
Efficient iterated filtering, Lindström, Ionides, Frydendall,
Madsen, 16th IFAC Symposium on System Identification.
Derivative-Free Estimation of the Score Vector
and Observed Information Matrix,
Doucet, Jacob, Rubenthaler, 2013 (on arXiv).