Estimation of the score vector and observed
information matrix in intractable models
Arnaud Doucet (University of Oxford)
Pierre E. Jacob (University of Oxford)
Sylvain Rubenthaler (Université Nice Sophia Antipolis)
October 30th, 2014
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Motivation
Derivatives of the likelihood can help with optimization and sampling.
For many complex models, the likelihood isn’t available, let alone its
derivatives.
One can resort to approximation techniques, and plug the estimates
of the derivatives into optimization / sampling methods.
Using derivatives in sampling algorithms
Metropolis-Adjusted Langevin Algorithm (MALA)
At step t, given a point θt ∈ Θ, do:
propose
    θ⋆ ∼ q(dθ | θt) ≡ N( θt + (σ²/2) ∇θ log π(θt), σ² ),
then, with probability
    1 ∧ [ π(θ⋆) q(θt | θ⋆) ] / [ π(θt) q(θ⋆ | θt) ],
set θt+1 = θ⋆; otherwise set θt+1 = θt.
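As an illustration of how such derivative estimates get used, here is a minimal sketch of one MALA step in Python (log_pi and grad_log_pi are hypothetical user-supplied functions returning log π(θ) and ∇θ log π(θ); in the intractable setting below they would be replaced by estimates):

```python
import numpy as np

def mala_step(theta, log_pi, grad_log_pi, sigma2, rng):
    # Langevin proposal: shift the current point along the gradient, add Gaussian noise.
    mean_fwd = theta + 0.5 * sigma2 * grad_log_pi(theta)
    proposal = mean_fwd + np.sqrt(sigma2) * rng.standard_normal(theta.shape)
    # Reverse proposal mean, needed for the Metropolis-Hastings ratio.
    mean_bwd = proposal + 0.5 * sigma2 * grad_log_pi(proposal)
    log_q_fwd = -0.5 * np.sum((proposal - mean_fwd) ** 2) / sigma2
    log_q_bwd = -0.5 * np.sum((theta - mean_bwd) ** 2) / sigma2
    log_alpha = log_pi(proposal) + log_q_bwd - log_pi(theta) - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return proposal          # accept
    return theta                 # reject

# Toy usage on a standard normal target.
rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(1000):
    theta = mala_step(theta, lambda t: -0.5 * np.sum(t ** 2), lambda t: -t, 0.5, rng)
```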
Using derivatives in sampling algorithms
Figure : Proposal mechanism for random walk Metropolis–Hastings.
Using derivatives in sampling algorithms
Figure : Proposal mechanism for MALA.
Using derivatives in sampling algorithms
In what sense is this algorithm better?
Scaling with the dimension of the state space
For Metropolis–Hastings, optimal scaling leads to
    σ² = O(d^{-1});
for MALA, optimal scaling leads to
    σ² = O(d^{-1/3}).
Roberts & Rosenthal, Optimal Scaling for Various Metropolis-Hastings
Algorithms, 2001.
Hidden Markov models
Figure : Graph representation of a general hidden Markov model (hidden states X0, X1, . . . , XT, observations y1, . . . , yT, parameter θ).
Hidden process: initial distribution µθ, transition fθ.
Observations conditional upon the hidden process, from gθ.
Assumptions
Input:
Parameter θ : unknown, prior distribution p.
Initial condition µθ(dx0) : can be sampled from.
Transition fθ(dxt|xt−1) : can be sampled from.
Measurement gθ(yt|xt) : can be evaluated point-wise.
Observations y1:T = (y1, . . . , yT ).
Goals:
score: ∇θ log L(θ; y1:T) for any θ,
observed information matrix: −∇²θ log L(θ; y1:T) for any θ.
Then we could apply any fancy sampling algorithm.
Why is it an intractable model?
The likelihood function does not admit a closed form expression:
L(θ; y1, . . . , yT) = ∫_{X^{T+1}} p(y1, . . . , yT | x0, . . . , xT, θ) p(dx0, . . . , dxT | θ)
                     = ∫_{X^{T+1}} ∏_{t=1}^{T} gθ(yt | xt) µθ(dx0) ∏_{t=1}^{T} fθ(dxt | xt−1).
Hence the likelihood can only be estimated, e.g. by standard Monte
Carlo, or by particle filters.
What about the derivatives of the likelihood?
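To make "estimated by particle filters" concrete, here is a minimal bootstrap particle filter sketch returning an estimate of log L(θ; y1:T) (the underlying likelihood estimate is unbiased); sample_init, sample_transition and log_obs_density are hypothetical placeholders standing in for µθ, fθ and gθ:

```python
import numpy as np

def pf_loglik(y, sample_init, sample_transition, log_obs_density, n_particles, rng):
    # Draw x0 particles from µ_θ.
    x = sample_init(n_particles, rng)
    loglik = 0.0
    for t in range(len(y)):
        # Propagate through f_θ and weight by g_θ(y_t | x_t).
        x = sample_transition(x, rng)
        logw = log_obs_density(y[t], x)
        maxw = np.max(logw)
        w = np.exp(logw - maxw)
        loglik += maxw + np.log(np.mean(w))
        # Multinomial resampling.
        idx = rng.choice(n_particles, size=n_particles, p=w / np.sum(w))
        x = x[idx]
    return loglik
```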
Fisher and Louis’ identities
Write the score as
    ∇ℓ(θ) = ∫ ∇ log p(x0:T, y1:T | θ) p(dx0:T | y1:T, θ),
which is an integral, with respect to the smoothing distribution p(dx0:T | y1:T, θ), of
    ∇ log p(x0:T, y1:T | θ) = ∇ log µθ(x0) + Σ_{t=1}^{T} ∇ log fθ(xt | xt−1) + Σ_{t=1}^{T} ∇ log gθ(yt | xt).
However pointwise evaluations of ∇ log µθ(x0) and ∇ log fθ(xt | xt−1) are
not always available.
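When those gradients are available, Fisher's identity turns smoothing draws into a score estimate; a sketch for a scalar parameter, assuming M smoothing trajectories and hypothetical gradient functions (none of these names come from the slides):

```python
import numpy as np

def score_fisher(x_paths, y, grad_log_mu, grad_log_f, grad_log_g):
    # x_paths: array of shape (M, T+1) holding M draws of x_{0:T} from p(dx_{0:T} | y_{1:T}, θ).
    M, T_plus_1 = x_paths.shape
    score = 0.0
    for m in range(M):
        s = grad_log_mu(x_paths[m, 0])
        for t in range(1, T_plus_1):
            s += grad_log_f(x_paths[m, t], x_paths[m, t - 1])   # ∇ log f_θ(x_t | x_{t-1})
            s += grad_log_g(y[t - 1], x_paths[m, t])            # ∇ log g_θ(y_t | x_t)
        score += s / M                                          # average over trajectories
    return score
```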
New kid on the block: Iterated Filtering
Perturbed model
Hidden states X̃t = (θ̃t, Xt):
    θ̃0 ∼ N(θ0, τ² Σ),   X0 ∼ µ_{θ̃0}(·),
and
    θ̃t ∼ N(θ̃t−1, σ² Σ),   Xt ∼ f_{θ̃t}(· | Xt−1 = xt−1).
Observations Ỹt ∼ g_{θ̃t}(· | Xt).
Score estimate
    | Σ_{t=1}^{T} VP,t⁻¹ ( θ̃F,t − θ̃F,t−1 ) − ∇ℓ(θ0) | ≤ C ( τ + σ²/τ² ),
with VP,t = Cov[θ̃t | y1:t−1] and θ̃F,t = E[θ̃t | y1:t].
Ionides, Breto, King, PNAS, 2006.
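A sketch of the score estimate above, assuming the filtering means θ̃F,t and prediction variances VP,t have already been computed for a scalar parameter (for instance by running a particle filter on the perturbed model); this is only an illustration of the formula, not the authors' implementation:

```python
import numpy as np

def if_score_estimate(theta_F, V_P, theta0):
    # theta_F[t] ≈ E[θ̃_t | y_{1:t}], V_P[t] ≈ Cov[θ̃_t | y_{1:t-1}], both of length T.
    estimate = 0.0
    previous = theta0
    for t in range(len(theta_F)):
        estimate += (theta_F[t] - previous) / V_P[t]
        previous = theta_F[t]
    return estimate
```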
Iterated Filtering: the mystery
Why is it valid?
Is it related to any known techniques for derivative estimation?
How does it compare to other methods such as finite difference?
Can it be extended to estimate the observed information matrix?
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Proximity mapping
Given a real function f and a point θ0, consider, for any σ² > 0,
    θ → f(θ) exp{ − (θ − θ0)² / (2σ²) }.
Figure : Example for f : θ → exp(−|θ|) and three values of σ².
Proximity mapping
Proximity mapping
The σ²-proximity mapping is defined by
    prox_f : θ0 → argmax_{θ∈R} f(θ) exp{ − (θ − θ0)² / (2σ²) }.
Moreau approximation
The σ²-Moreau approximation is defined by
    f_{σ²} : θ0 → C sup_{θ∈R} f(θ) exp{ − (θ − θ0)² / (2σ²) },
where C is a normalizing constant.
Proximity mapping
Figure : θ → f(θ) and θ → f_{σ²}(θ) for three values of σ².
Proximity mapping
Property
Those objects are such that
    ( prox_f(θ0) − θ0 ) / σ² = ∇ log f_{σ²}(θ0) → ∇ log f(θ0)   as σ² → 0.
Moreau (1962), Fonctions convexes duales et points proximaux dans un
espace Hilbertien.
Pereyra (2013), Proximal Markov chain Monte Carlo algorithms.
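A quick numerical sketch of the proximity mapping and this limit on the slides' example f(θ) = exp(−|θ|), maximizing over a grid (the grid bounds, resolution and σ² values are arbitrary choices for illustration):

```python
import numpy as np

def prox(log_f, theta0, sigma2, grid):
    # argmax over the grid of log f(θ) - (θ - θ0)² / (2σ²).
    penalized = log_f(grid) - 0.5 * (grid - theta0) ** 2 / sigma2
    return grid[np.argmax(penalized)]

log_f = lambda t: -np.abs(t)                      # f(θ) = exp(-|θ|)
grid = np.linspace(-5.0, 5.0, 200001)
theta0 = 1.5
for sigma2 in [1.0, 0.1, 0.01]:
    approx_score = (prox(log_f, theta0, sigma2, grid) - theta0) / sigma2
    print(sigma2, approx_score)                   # approaches ∇ log f(θ0) = -1
```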
Proximity mapping
Bayesian interpretation
If f is seen as a likelihood function, then
    θ → f(θ) exp{ − (θ − θ0)² / (2σ²) }
is an unnormalized posterior density function based on a Normal prior with mean θ0 and variance σ².
Hence
    ( prox_f(θ0) − θ0 ) / σ² → ∇ log f(θ0)   as σ² → 0
can be read as
    (posterior mode − prior mode) / prior variance ≈ score.
Iterated Filtering
Posterior expectation instead of mode
Based on a prior θ ∼ N(θ0, σ²),
    | σ⁻² ( E[θ | Y] − θ0 ) − ∇ log f(θ0) | ≤ C σ².
Phrased simply,
    (posterior mean − prior mean) / prior variance ≈ score.
Result from Ionides, Bhadra, Atchadé, King, Iterated filtering, 2011.
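A Monte Carlo sketch of this relation on a toy one-dimensional log-likelihood whose score is known in closed form (the particular ℓ below is made up purely for illustration); the posterior moments are obtained by self-normalized importance sampling from the prior:

```python
import numpy as np

rng = np.random.default_rng(1)
loglik = lambda t: -0.5 * (t - 2.0) ** 2 + np.sin(t)   # toy ℓ(θ)
score = lambda t: -(t - 2.0) + np.cos(t)               # exact ∇ℓ(θ)

theta0, sigma2, n = 0.3, 0.01, 10**6
theta = theta0 + np.sqrt(sigma2) * rng.standard_normal(n)   # draws from the N(θ0, σ²) prior
ll = loglik(theta)
w = np.exp(ll - np.max(ll))                                 # self-normalized importance weights
post_mean = np.sum(w * theta) / np.sum(w)
print((post_mean - theta0) / sigma2, score(theta0))         # the two numbers should be close
```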
Extension of Iterated Filtering
Observed information matrix
Second-order moments give second-order derivatives:
    | σ⁻⁴ ( Cov[θ | Y] − σ² ) − ∇² log f(θ0) | ≤ C σ².
Phrased simply,
    (posterior variance − prior variance) / prior variance² ≈ −observed information matrix.
Result from Doucet, Jacob, Rubenthaler on arXiv, 2013.
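Continuing the toy sketch above, the second-order relation can be checked from the same weighted sample (for that toy ℓ, ∇²ℓ(θ) = −1 − sin(θ), so the observed information at θ0 is 1 + sin(θ0)):

```python
# Reuses theta, w, post_mean, theta0 and sigma2 from the previous sketch.
post_var = np.sum(w * (theta - post_mean) ** 2) / np.sum(w)
info_estimate = -(post_var - sigma2) / sigma2 ** 2
print(info_estimate, 1.0 + np.sin(theta0))   # roughly equal, up to O(σ²) bias and Monte Carlo noise
```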
A connection with Stein’s method
Stein’s lemma states that
    θ ∼ N(θ0, σ²)
if and only if, for any function g such that the following objects exist,
    E[(θ − θ0) g(θ)] = σ² E[∇g(θ)].
If we choose the function g : θ → exp ℓ(θ) / E[exp ℓ(θ)] and apply Stein’s lemma, we obtain
    E[(θ − θ0) g(θ)] = σ² E[∇g(θ)] = σ² E[∇ℓ(θ) exp(ℓ(θ))] / E[exp ℓ(θ)].
A connection with Stein’s method
Hence we obtain
    E[(θ − θ0) exp ℓ(θ)] / E[exp ℓ(θ)] = σ² E[∇ℓ(θ) exp(ℓ(θ))] / E[exp ℓ(θ)].
On the left we have E[θ | Y] − θ0. On the right we have σ² E[∇ℓ(θ) | Y].
When σ² → 0, E[∇ℓ(θ) | Y] should go to ∇ℓ(θ0).
The Iterated Filtering method indeed relies on the approximation
    E[θ | Y] − θ0 ≈ σ² ∇ℓ(θ0).
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Core Idea
Let’s take an informal look at the proofs, in one-dimensional notation.
Introduce a normal prior distribution: N(θ0, σ²).
Posterior concentration induced by the prior
Under minimal assumptions, when σ → 0:
the posterior is going to look more and more like the prior,
the difference between the posterior and the prior moments is related to the derivatives of the log-likelihood:
    | ∇ℓ(θ0) − σ⁻² { E(θ | y) − θ0 } | ≤ C σ²,
    | ∇²ℓ(θ0) − σ⁻⁴ { Cov(θ | y) − σ² } | ≤ C′ σ².
Details
Assumptions
1 Prior p(θ) = σ^{-d} κ((θ − θ0)/σ), where κ is symmetric, has finite moments of all orders, and unit variance.
2 κ has tails that decrease at a faster rate than the likelihood increases.
3 The log-likelihood is four times continuously differentiable.
Introduce a test function h such that |h(u)| < c |u|^α for some c, α.
Details
We start by writing
    E{ h(θ − θ0) | y } = ∫ h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du / ∫ exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du,
using u = (θ − θ0)/σ, and then focus on the numerator
    ∫ h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du,
since the denominator is a particular instance of this expression with h : u → 1.
Details
For the numerator:
    ∫ h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du,
we use a Taylor expansion of ℓ around θ0 and a Taylor expansion of exp around 0, and then take the integral with respect to κ.
Notation:
    ℓ^{(k)}(θ).u^{⊗k} = Σ_{1≤i1,...,ik≤d} [ ∂^k ℓ(θ) / (∂θ_{i1} · · · ∂θ_{ik}) ] u_{i1} · · · u_{ik},
which in one dimension becomes
    ℓ^{(k)}(θ).u^{⊗k} = (d^k ℓ(θ) / dθ^k) u^k.
Details
Main expansion:
    ∫ h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du
      = ∫ h(σu) κ(u) du + σ ∫ h(σu) ℓ^{(1)}(θ0).u κ(u) du
        + σ² ∫ h(σu) { ½ ℓ^{(2)}(θ0).u^{⊗2} + ½ (ℓ^{(1)}(θ0).u)² } κ(u) du
        + σ³ ∫ h(σu) { (1/3!) (ℓ^{(1)}(θ0).u)³ + ½ (ℓ^{(1)}(θ0).u)(ℓ^{(2)}(θ0).u^{⊗2}) + (1/3!) ℓ^{(3)}(θ0).u^{⊗3} } κ(u) du
        + O(σ^{4+α}).
The assumptions on the tails of the prior and the likelihood are used to control the remainder terms and to ensure they are O(σ^{4+α}).
Details
We cut the integral into two bits:
    ∫ h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du
      = ∫_{σ|u|≤ρ} h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du
      + ∫_{σ|u|>ρ} h(σu) exp{ ℓ(θ0 + σu) − ℓ(θ0) } κ(u) du.
The expansion stems from the first term, where σ|u| is small.
The second term ends up in the remainder in O(σ^{4+α}), using the assumptions.
Classic technique in Bayesian asymptotics theory, but here the likelihood
is fixed and the prior concentrates, instead of the other way around.
Details
To get the score from the expansion, choose
    h : u → u.
To get the observed information matrix from the expansion, choose
    h : u → u²,
and surprisingly (?) further assume that κ is mesokurtic, i.e.
    ∫ u⁴ κ(u) du = 3 ( ∫ u² κ(u) du )²
⇒ choose a Gaussian prior to obtain the observed information matrix.
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Hidden Markov models
Figure : Graph representation of a general hidden Markov model (hidden states X0, X1, . . . , XT, observations y1, . . . , yT, parameter θ).
Hidden Markov models
Direct application of the previous results
1 Prior distribution N(θ0, σ²) on the parameter θ.
2 The derivative approximations involve E[θ | Y] and Cov[θ | Y].
3 Posterior moments for HMMs can be estimated by particle MCMC, SMC², ABC, or your favourite method (a rough sketch follows below).
Ionides et al. proposed another approach, more specific to HMMs.
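A rough sketch of this direct approach (not the authors' algorithm): draw parameters from the N(θ0, σ²) prior, weight them by particle-filter likelihood estimates as in the earlier pf_loglik sketch, and plug the resulting posterior moments into the score and observed-information approximations. All function and variable names are illustrative:

```python
import numpy as np

def derivative_estimates(pf_loglik, theta0, sigma2, n_draws, rng):
    # theta0: parameter vector of dimension d; pf_loglik(θ) returns a log-likelihood estimate.
    d = theta0.shape[0]
    thetas = theta0 + np.sqrt(sigma2) * rng.standard_normal((n_draws, d))
    logw = np.array([pf_loglik(t) for t in thetas])
    w = np.exp(logw - np.max(logw))
    w /= np.sum(w)                                   # self-normalized importance weights
    post_mean = w @ thetas
    centred = thetas - post_mean
    post_cov = (w[:, None] * centred).T @ centred
    score_estimate = (post_mean - theta0) / sigma2
    info_estimate = -(post_cov - sigma2 * np.eye(d)) / sigma2 ** 2
    return score_estimate, info_estimate
```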
Iterated Filtering
Modification of the model: θ is allowed to be different at each time.
The associated log-likelihood is
    ℓ̄(θ1:T) = log p(y1:T ; θ1:T)
             = log ∫_{X^{T+1}} ∏_{t=1}^{T} g(yt | xt, θt) µ(dx1 | θ1) ∏_{t=2}^{T} f(dxt | xt−1, θt).
Introducing θ → (θ, θ, . . . , θ) := θ^{[T]} ∈ R^T, we have
    ℓ̄(θ^{[T]}) = ℓ(θ),
and the chain rule yields
    dℓ(θ)/dθ = Σ_{t=1}^{T} ∂ℓ̄(θ^{[T]})/∂θt.
Iterated Filtering
Choice of prior on θ1:T :
    θ1 = θ0 + V1,   V1 ∼ τ⁻¹ κ{ τ⁻¹ (·) },
    θt+1 − θ0 = ρ (θt − θ0) + Vt+1,   Vt+1 ∼ σ⁻¹ κ{ σ⁻¹ (·) }.
Choose σ² such that τ² = σ²/(1 − ρ²). The covariance of the prior on θ1:T is then the T × T Toeplitz matrix
    ΣT = τ² [ ρ^{|s−t|} ]_{1≤s,t≤T},
with ones on the diagonal, ρ on the first off-diagonals, down to ρ^{T−1} in the corners.
Iterated Filtering
Applying the general results for this prior yields, with |x| = Σ_{t=1}^{T} |xt|,
    | ∇ℓ̄(θ0^{[T]}) − ΣT⁻¹ ( E[θ1:T | Y] − θ0^{[T]} ) | ≤ C τ².
Moreover we have
    | Σ_{t=1}^{T} ∂ℓ̄(θ^{[T]})/∂θt − Σ_{t=1}^{T} { ΣT⁻¹ ( E[θ1:T | Y] − θ0^{[T]} ) }t |
      ≤ Σ_{t=1}^{T} | ∂ℓ̄(θ^{[T]})/∂θt − { ΣT⁻¹ ( E[θ1:T | Y] − θ0^{[T]} ) }t |,
and
    dℓ(θ)/dθ = Σ_{t=1}^{T} ∂ℓ̄(θ^{[T]})/∂θt.
Iterated Filtering
The estimator of the score is thus given by
    Σ_{t=1}^{T} { ΣT⁻¹ ( E[θ1:T | Y] − θ0^{[T]} ) }t,
which can be reduced to
    Sτ,ρ,T(θ0) = [ τ⁻² / (1 + ρ) ] [ (1 − ρ) Σ_{t=2}^{T−1} E(θt | Y) − { (1 − ρ) T + 2ρ } θ0 + E(θ1 | Y) + E(θT | Y) ],
given the form of ΣT⁻¹. Note that in the quantities E(θt | Y), Y = Y1:T is the complete dataset, so those expectations are with respect to the smoothing distribution.
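A sketch of Sτ,ρ,T(θ0) as a function of the smoothing means E(θt | Y), which would themselves have to be estimated, e.g. by a particle smoother on the perturbed model (scalar parameter for simplicity):

```python
import numpy as np

def score_estimator(smoothing_means, theta0, tau2, rho):
    # smoothing_means[t-1] ≈ E[θ_t | Y] for t = 1, ..., T.
    m = np.asarray(smoothing_means, dtype=float)
    T = len(m)
    bracket = (1.0 - rho) * np.sum(m[1:T - 1])            # (1-ρ) Σ_{t=2}^{T-1} E(θ_t | Y)
    bracket += m[0] + m[T - 1]                            # E(θ_1 | Y) + E(θ_T | Y)
    bracket -= ((1.0 - rho) * T + 2.0 * rho) * theta0
    return bracket / (tau2 * (1.0 + rho))
```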
Iterated Filtering
If ρ = 1, then the parameters follow a random walk:
    θ1 = θ0 + N(0, τ²)  and  θt+1 = θt + N(0, σ²).
In this case Ionides et al. proposed the estimator
    Sτ,σ,T = τ⁻² ( E(θT | Y) − θ0 ),
as well as
    S^{(bis)}_{τ,σ,T} = Σ_{t=1}^{T} VP,t⁻¹ ( θ̃F,t − θ̃F,t−1 ),
with VP,t = Cov[θ̃t | y1:t−1] and θ̃F,t = E[θ̃t | y1:t].
Those expressions only involve expectations with respect to filtering distributions.
Iterated Filtering
If ρ = 0, then the parameters are i.i.d.:
    θ1 = θ0 + N(0, τ²)  and  θt+1 = θ0 + N(0, σ²).
In this case the expression of the score estimator reduces to
    Sτ,T = τ⁻² Σ_{t=1}^{T} ( E(θt | Y) − θ0 ),
which involves smoothing distributions.
There’s only one parameter, τ², to choose for the prior.
However, smoothing for general hidden Markov models is difficult, and typically resorts to “fixed-lag approximations”.
Iterated Smoothing
Only for the case ρ = 0 are we able to obtain simple expressions for the observed information matrix. We propose the following estimator:
    Iτ,T(θ0) = −τ⁻⁴ { Σ_{s=1}^{T} Σ_{t=1}^{T} Cov(θs, θt | Y) − τ² T },
for which we can show that
    | Iτ,T − (−∇²ℓ(θ0)) | ≤ C τ².
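A sketch of Iτ,T(θ0) given an estimated T × T matrix of pairwise smoothing covariances Cov(θs, θt | Y) (scalar parameter):

```python
import numpy as np

def information_estimator(smoothing_cov, tau2):
    # smoothing_cov[s-1, t-1] ≈ Cov(θ_s, θ_t | Y); the estimator sums all pairwise terms.
    C = np.asarray(smoothing_cov, dtype=float)
    T = C.shape[0]
    return -(np.sum(C) - tau2 * T) / tau2 ** 2
```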
Numerical results
Linear Gaussian state space model where the ground truth is available
through the Kalman filter.
X0 ∼ N(0, 1) and Xt = ρ Xt−1 + N(0, V), Yt = η Xt + N(0, W).
The parameters are (ρ, V, η, W). We generate T = 100 observations.
Easy set of parameters: ρ = 0.8, V = 0.7², η = 0.9, W = 1².
Hard set of parameters: ρ = 0.8, V = 1², η = 0.9, W = 0.1².
The gradient being four-dimensional, we plot only the first component
estimated over time.
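A sketch of how the ground truth for this model can be obtained: a scalar-state Kalman filter for the exact log-likelihood, with a reference score computed here by central finite differences of that exact log-likelihood (the step size is an arbitrary illustrative choice):

```python
import numpy as np

def kalman_loglik(y, rho, V, eta, W):
    # Kalman filter for X_t = ρ X_{t-1} + N(0,V), Y_t = η X_t + N(0,W), X_0 ~ N(0,1).
    m, P, ll = 0.0, 1.0, 0.0
    for yt in y:
        m_pred, P_pred = rho * m, rho ** 2 * P + V
        S = eta ** 2 * P_pred + W                    # predictive variance of y_t
        ll += -0.5 * (np.log(2 * np.pi * S) + (yt - eta * m_pred) ** 2 / S)
        K = P_pred * eta / S                         # Kalman gain
        m = m_pred + K * (yt - eta * m_pred)
        P = (1.0 - K * eta) * P_pred
    return ll

def reference_score(y, theta, h=1e-5):
    # Central finite differences of the exact log-likelihood in (ρ, V, η, W).
    grad = np.zeros(len(theta))
    for i in range(len(theta)):
        up, down = np.array(theta, dtype=float), np.array(theta, dtype=float)
        up[i] += h
        down[i] -= h
        grad[i] = (kalman_loglik(y, *up) - kalman_loglik(y, *down)) / (2 * h)
    return grad
```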
Numerical results
Figure : 250 runs for Iterated Smoothing (IS, left panel) and Finite Difference (FD, right panel), “easy” scenario; estimates plotted against time.
Numerical results
Figure : 250 runs for Iterated Smoothing (IS, left panel) and Finite Difference (FD, right panel), “hard” scenario; estimates plotted against time.
Bibliography
Main references:
Inference for nonlinear dynamical systems, Ionides, Breto, King,
PNAS, 2006.
Iterated filtering, Ionides, Bhadra, Atchadé, King, Annals of
Statistics, 2011.
Efficient iterated filtering, Lindström, Ionides, Frydendall, Madsen,
16th IFAC Symposium on System Identification.
Derivative-Free Estimation of the Score Vector
and Observed Information Matrix,
Doucet, Jacob, Rubenthaler, 2013 (on arXiv).
