Automatic Bayesian method for Numerical Integration
Jagadeeswaran Rathinavel, Fred J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology
jrathin1@iit.edu
Thanks to the CASSC 2017 organizers
Introduction Bayesian Cubature Simulation results Conclusion References
Multivariate Integration

Approximate the d-dimensional integral over $[0,1)^d$,
$$\mu := \mathbb{E}[f(X)] := \int_{[0,1)^d} f(x)\,\nu(dx),$$
by a simple cubature rule,
$$\mu \approx \hat{\mu} := \sum_{j=0}^{n-1} w_j f(x_j) = \int_{[0,1)^d} f(x)\,\hat{\nu}(dx),$$
using points $(x_j)_{j=0}^{n-1}$ and associated weights $(w_j)_{j=0}^{n-1}$. The approximation error is then
$$\mu - \hat{\mu} = \int_{[0,1)^d} f(x)\,(\nu - \hat{\nu})(dx).$$
We use an extensible point set and an algorithm that can add more points if needed.
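A cubature rule of this form is simply a weighted sum of function values. A minimal sketch (the trapezoid-style points and weights below are illustrative, not the method of these slides):

```python
def cubature(f, points, weights):
    """Generic cubature rule: mu_hat = sum_j w_j * f(x_j)."""
    return sum(w * f(x) for x, w in zip(points, weights))

# Illustrative 1-d example: composite trapezoid rule on [0, 1]
estimate = cubature(lambda x: x * x, [0.0, 0.5, 1.0], [0.25, 0.5, 0.25])
```

The whole question of the talk is how to pick the points and weights so that this sum is close to the true integral.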
Motivating Example

How can we measure the water volume of a creek over a given area (e.g., 10 sq ft)?

Figure: (Nuyens, 2007)
What is multivariate integration?

Figure: (Nuyens, 2007)

A d-dimensional integral:
$$\int_{[0,1)^d} f(x)\,dx = \int_0^1 \int_0^1 \cdots \int_0^1 f(x_1, x_2, \ldots, x_d)\,dx_1\,dx_2 \cdots dx_d$$

Grid points: curse of dimensionality.
IID Monte Carlo: converges as $O(n^{-1/2})$, with $\hat{\mu} = \frac{1}{n}\sum_{j=0}^{n-1} f(x_j)$.
Typical quasi-Monte Carlo: converges as $O(n^{-1+\epsilon})$, with the $x_j$ chosen carefully (low-discrepancy points).

Can we do better?
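For contrast, the plain IID Monte Carlo estimator $\hat{\mu} = \frac{1}{n}\sum_j f(x_j)$ above can be sketched as:

```python
import random

def mc_estimate(f, d, n, seed=0):
    """IID Monte Carlo estimate of the integral of f over [0,1)^d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # one uniform sample in [0,1)^d
        total += f(x)
    return total / n  # converges at the O(n^{-1/2}) Monte Carlo rate
```

For $f(x) = x_1 + x_2$ the true integral over $[0,1)^2$ is 1, and the estimate approaches it only at the slow $n^{-1/2}$ rate.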
Algorithm

1: procedure AutoCubature(f, errtol)  ▷ Integrate within the error tolerance
2:   n ← 2^8
3:   do
4:     Generate (x_i)_{i=0}^{n-1}
5:     Sample (f(x_i))_{i=0}^{n-1}
6:     Compute err_n
7:     n ← 2 × n
8:   while err_n > errtol  ▷ Iterate until the error tolerance is met
9:   Compute weights (w_i)_{i=0}^{n-1}
10:  Compute the integral estimate \hat{\mu}_n
11:  return \hat{\mu}_n  ▷ Integral estimate
12: end procedure

Problem:
How to choose $(x_j)_{j=0}^{n-1}$ and $(w_j)_{j=0}^{n-1}$ to make $|\mu - \hat{\mu}|$ small?
Given an error tolerance errtol, how big must n be to guarantee $|\mu - \hat{\mu}| \le$ errtol?
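The doubling loop of AutoCubature can be sketched as follows; `generate_points` and `compute_error` are hypothetical stand-ins for the lattice-point generator and the error bound developed later, and equal weights are used here purely for illustration.

```python
def auto_cubature(f, err_tol, generate_points, compute_error, max_n=2**20):
    """Double n until the error bound err_n meets the tolerance err_tol."""
    n = 2**8
    while True:
        x = generate_points(n)            # (x_i)_{i=0}^{n-1}
        y = [f(xi) for xi in x]           # sample the integrand
        err = compute_error(x, y)         # err_n
        if err <= err_tol or n >= max_n:  # stop when the tolerance (or cap) is met
            break
        n *= 2
    mu_hat = sum(y) / n                   # equal-weight estimate, for illustration
    return mu_hat, err, n
```

With equally spaced points and an exact error oracle for $f(t) = t$, the loop stops at the first n whose bound meets the tolerance.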
Bayesian Trio Identity

Random f: $f \sim \mathcal{GP}(0, s^2 C_\theta)$, a Gaussian process from the sample space $\mathcal{F}$ with zero mean and covariance kernel $s^2 C_\theta$, $C_\theta : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Then define
$$c_0 = \int_{\mathcal{X}\times\mathcal{X}} C(x,t)\,\nu(dx)\,\nu(dt), \qquad c = \left(\int_{\mathcal{X}} C(x_i,t)\,\nu(dt)\right)_{i=0}^{n-1},$$
$$\mathsf{C} = \left(C(x_i,x_j)\right)_{i,j=0}^{n-1}, \qquad w = (w_i)_{i=0}^{n-1},$$
and the error factors as
$$\mu - \hat{\mu} = \underbrace{\frac{\int_{\mathcal{X}} f(x)\,(\nu-\hat{\nu})(dx)}{s\sqrt{c_0 - 2c^T w + w^T \mathsf{C} w}}}_{\mathrm{ALN}^B(f,\,\nu-\hat{\nu})} \times \underbrace{\sqrt{c_0 - 2c^T w + w^T \mathsf{C} w}}_{\mathrm{DSC}^B(\nu-\hat{\nu})} \times \underbrace{s}_{\mathrm{VAR}^B(f)}.$$

The scale parameter, s, and the shape parameter, θ, must be estimated.
Choosing $w = \mathsf{C}^{-1} c$ minimizes $\mathrm{DSC}^B(\nu-\hat{\nu})$ and makes
$$\mathrm{ALN}^B(f,\,\nu-\hat{\nu}) \,\big|\, (f(x_i) = y_i)_{i=0}^{n-1} \sim \mathcal{N}(0,1).$$

Ref: Diaconis (1988), O'Hagan (1991), Ritter (2000), Rasmussen (2003) and others.
Covariance kernel

Shift-invariant kernel:
$$C_\theta(x,t) := \sum_{m \in \mathbb{Z}^d} \lambda_{m,\theta}\, e^{\sqrt{-1}\,2\pi m^T (x-t)}, \qquad 0 \le |x - t| \le 1.$$
When
$$\lambda_{m,\theta} := \prod_{l=1}^{d} \frac{\theta_l^{\mathbb{1}\{m_l \neq 0\}}}{\max(|m_l|, 1)^r}, \quad \text{with } \lambda_{0,\theta} = 1,\ \theta \in (0,1]^d,\ r \in 2\mathbb{N},$$
the kernel has the closed form
$$C_\theta(x,t) = \prod_{l=1}^{d} \left(1 - \theta_l\, \frac{(2\pi\sqrt{-1})^r}{r!}\, B_r(|x_l - t_l|)\right), \qquad \theta \in (0,1]^d,\ r \in 2\mathbb{N},$$
where $B_r$ is the Bernoulli polynomial of order r (Olver et al., 2013):
$$B_r(x) = \frac{-r!}{(2\pi\sqrt{-1})^r} \sum_{k=-\infty,\,k\neq 0}^{\infty} \frac{e^{2\pi\sqrt{-1}\,kx}}{k^r}, \qquad \begin{cases} 0 < x < 1 & \text{for } r = 1, \\ 0 \le x \le 1 & \text{for } r = 2, 3, \ldots \end{cases}$$

Given $(x_i : i = 0, \ldots, n-1)$, the symmetric kernel matrix is formed:
$$\mathsf{C}_\theta = \left(C_\theta(x_i, x_j)\right)_{i,j=0}^{n-1}.$$
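For r = 2, since $(2\pi\sqrt{-1})^2/2! = -2\pi^2$ and $B_2(x) = x^2 - x + \tfrac{1}{6}$, each factor reduces to $1 + 2\pi^2 \theta_l B_2(|x_l - t_l|)$. A sketch of this kernel and its matrix, assuming the closed form above with inputs in $[0,1)^d$:

```python
import numpy as np

def kernel_r2(x, t, theta):
    """Product kernel for r = 2: prod_l (1 + 2*pi^2*theta_l*B_2(|x_l - t_l|))."""
    delta = np.abs(np.asarray(x) - np.asarray(t))
    b2 = delta**2 - delta + 1.0 / 6.0   # Bernoulli polynomial B_2
    return float(np.prod(1.0 + 2.0 * np.pi**2 * np.asarray(theta) * b2))

def kernel_matrix(points, theta):
    """Symmetric kernel matrix C_theta = (C_theta(x_i, x_j))_{i,j}."""
    return np.array([[kernel_r2(xi, xj, theta) for xj in points] for xi in points])
```

Since $B_2$ integrates to zero over $[0,1]$, this kernel has unit integral in each coordinate, matching $\lambda_{0,\theta} = 1$.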
Quasi-Monte Carlo: Sample "more uniformly"
Rank-1 Lattice rules: Low-discrepancy point set

Given the "generating vector" g, the n rank-1 lattice points are constructed as
$$\left\{ \frac{k g}{n} \right\}_{k=0}^{n-1},$$
and the lattice rule approximation is
$$\frac{1}{n} \sum_{k=0}^{n-1} f\!\left(\left\{ \frac{k g}{n} \right\}\right),$$
with $\{\cdot\}$ the fractional part, i.e., the 'modulo 1' operator.
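A sketch of this construction; the generating vector `g` in the test below is illustrative, not an optimized one.

```python
import numpy as np

def rank1_lattice(n, g):
    """Rank-1 lattice points {k*g/n mod 1} for k = 0, ..., n-1."""
    k = np.arange(n).reshape(-1, 1)        # column of indices
    return (k * np.asarray(g) / n) % 1.0   # componentwise fractional part

def lattice_rule(f, n, g):
    """Equal-weight lattice rule approximation of int_{[0,1)^d} f(x) dx."""
    return float(np.mean([f(x) for x in rank1_lattice(n, g)]))
```

When each component of g is coprime to n, every coordinate of the point set is a permutation of $\{0, 1/n, \ldots, (n-1)/n\}$, which is what makes the points well spread.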
Shift-invariant kernel + lattice points = symmetric circulant kernel matrix.
Bayesian Cubature

$$\mu = \int_{\mathcal{X}} f(x)\,\nu(dx) \approx \hat{\mu}_n = \sum_{i=0}^{n-1} w_i f(x_i)$$

Assume $f \sim \mathcal{GP}(0, s^2 C_\theta)$. Then
$$\mu - \hat{\mu}_n \,\big|\, (f(x_i) = y_i)_{i=0}^{n-1} \sim \mathcal{N}\!\left( y^T (\mathsf{C}^{-1} c - w),\; s^2 (c_0 - c^T \mathsf{C}^{-1} c) \right).$$

Choosing $w = \mathsf{C}_{\hat{\theta}}^{-1} c_{\hat{\theta}}$ is optimal:
$$\mathbb{P}\left[ |\mu - \hat{\mu}_n| \le \mathrm{err}_n \right] \ge 99\% \quad \text{for} \quad \mathrm{err}_n = 2.58 \sqrt{ \left( c_{\hat{\theta},0} - c_{\hat{\theta}}^T \mathsf{C}_{\hat{\theta}}^{-1} c_{\hat{\theta}} \right) \frac{y^T \mathsf{C}_{\hat{\theta}}^{-1} y}{n} }.$$

MLE: $\hat{\theta} = \operatorname*{argmin}_{\theta} \dfrac{y^T \mathsf{C}_\theta^{-1} y}{[\det(\mathsf{C}_\theta^{-1})]^{1/n}}$, where $y = (f(x_i))_{i=0}^{n-1}$.

Computing $\mathsf{C}^{-1}$ typically requires $O(n^3)$ operations. But we use a covariance kernel for which the matrix $\mathsf{C}$ is symmetric circulant, so operations on $\mathsf{C}$ require only $O(n \log n)$ operations.
Optimal Shape parameter θ

Maximum likelihood estimate of θ:
$$\hat{\theta} = \operatorname*{argmin}_{\theta} \frac{y^T \mathsf{C}_\theta^{-1} y}{[\det(\mathsf{C}_\theta^{-1})]^{1/n}},$$
simplified (Bernstein, 2009) to
$$\hat{\theta} = \operatorname*{argmin}_{\theta} \left[ \log\left( \sum_{i=0}^{n-1} \frac{|z_i|^2}{\gamma_i} \right) + \frac{1}{n} \sum_{i=0}^{n-1} \log(\gamma_i) \right],$$
where $(\gamma_i)$ are the eigenvalues of $\mathsf{C}_\theta$:
$$z := (z_i)_{i=0}^{n-1} = \mathrm{DFT}(y), \qquad \gamma := (\gamma_i)_{i=0}^{n-1} = \mathrm{DFT}\!\left( (C(x_i, x_0))_{i=0}^{n-1} \right).$$
Only $O(n \log n)$ operations are needed to estimate $\hat{\theta}$.
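This objective can be sketched with the FFT; here `first_row` stands for $(C_\theta(x_i, x_0))_{i=0}^{n-1}$, the first row of the circulant matrix. NumPy's unnormalized DFT convention only shifts the objective by an additive constant in θ, so the argmin is unchanged.

```python
import numpy as np

def mle_objective(y, first_row):
    """MLE objective log(sum |z_i|^2/gamma_i) + (1/n) sum log(gamma_i)
    for a symmetric circulant kernel matrix with first row `first_row`."""
    n = len(y)
    z = np.fft.fft(y)                    # z = DFT(y)
    gamma = np.fft.fft(first_row).real   # eigenvalues of the circulant matrix
    return float(np.log(np.sum(np.abs(z) ** 2 / gamma)) + np.sum(np.log(gamma)) / n)
```

Because the γ_i are exactly the eigenvalues of the circulant matrix, this agrees (up to the constant log n) with the dense computation $\log(y^T \mathsf{C}^{-1} y) + \frac{1}{n}\log\det \mathsf{C}$, at FFT cost instead of $O(n^3)$.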
Cubature rule: Computing $\hat{\mu}$ efficiently

Define
$$\mathsf{C}_\theta = \left( C_\theta(x_i, x_j) \right)_{i,j=0}^{n-1}, \qquad c_\theta = \left( \int_{[0,1)^d} C_\theta(x_i, x)\,dx \right)_{i=0}^{n-1}.$$
The approximate mean $\hat{\mu}$ is
$$\hat{\mu} = w^T y = \underbrace{(\mathsf{C}_\theta^{-1} c_\theta)^T}_{w^T} y,$$
which simplifies further using the shift-invariance and circulant-matrix properties (Bernstein, 2009):
$$\hat{\mu} = \int_{[0,1]^d} C(x, x_0)\,dx \times \left( \sum_{i=0}^{n-1} C(x_0, x_i) \right)^{-1} \times \sum_{i=0}^{n-1} y_i.$$
Only $O(n)$ operations are needed to compute $\hat{\mu}$.
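For the shift-invariant kernels above, $\lambda_{0,\theta} = 1$ implies $\int_{[0,1]^d} C(x, x_0)\,dx = 1$, so the $O(n)$ estimate collapses to a ratio of two sums; a sketch:

```python
import numpy as np

def mu_hat_fast(y, first_row):
    """O(n) cubature estimate for a circulant kernel with unit integral:
    mu_hat = (sum_i y_i) / (sum_i C(x_0, x_i))."""
    return float(np.sum(y) / np.sum(first_row))
```

No matrix inversion is needed: the circulant structure turns $\mathsf{C}_\theta^{-1} c_\theta$ into a scalar multiple of the all-ones vector.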
Computing the error bound efficiently

Let's simplify
$$\mathrm{err}_n = 2.58 \sqrt{ \left( c_{\hat{\theta},0} - c_{\hat{\theta}}^T \mathsf{C}_{\hat{\theta}}^{-1} c_{\hat{\theta}} \right) \frac{y^T \mathsf{C}_{\hat{\theta}}^{-1} y}{n} }$$
using the facts about the shift-invariant kernel and rank-1 lattice points (Bernstein, 2009), where
$$z := (z_i)_{i=0}^{n-1} = \mathrm{DFT}(y), \qquad \gamma := (\gamma_i)_{i=0}^{n-1} = \mathrm{DFT}\!\left( (C(x_i, x_0))_{i=0}^{n-1} \right).$$
Finally,
$$\mathrm{err}_n = 2.58 \sqrt{ \left( 1 - \frac{1}{\sum_{i=0}^{n-1} C(x_0, x_i)} \right) \frac{1}{n} \sum_{i=0}^{n-1} \frac{|z_i|^2}{\gamma_i} }.$$
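The final expression is a few FFTs and sums; a sketch mirroring the formula above, with `first_row` standing for $(C(x_i, x_0))_{i=0}^{n-1}$:

```python
import numpy as np

def err_bound(y, first_row, quantile=2.58):
    """O(n log n) sketch of the error bound err_n for a circulant kernel."""
    n = len(y)
    z = np.fft.fft(y)                       # z = DFT(y)
    gamma = np.fft.fft(first_row).real      # eigenvalues of the circulant matrix
    disc2 = 1.0 - 1.0 / np.sum(first_row)   # 1 - 1/sum_i C(x_0, x_i)
    return float(quantile * np.sqrt(disc2 * np.sum(np.abs(z) ** 2 / gamma) / n))
```

This is the quantity the doubling loop of AutoCubature compares against errtol at each iteration.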
Periodization Transforms

Our algorithm works best with periodic functions.

Baker: $\tilde{f}(t) = f\!\left( 1 - 2\left| t - \tfrac{1}{2} \right| \right)$
$C^0$: $\tilde{f}(t) = f\!\left( 3t^2 - 2t^3 \right) \prod_{j=1}^{d} 6 t_j (1 - t_j)$
$C^1$: $\tilde{f}(t) = f\!\left( t^3 (10 - 15t + 6t^2) \right) \prod_{j=1}^{d} 30\, t_j^2 (1 - t_j)^2$
Sidi's $C^1$: $\tilde{f}(t) = f\!\left( \left( t_j - \frac{\sin(2\pi t_j)}{2\pi} \right)_{j=1}^{d} \right) \prod_{j=1}^{d} (1 - \cos(2\pi t_j))$

(The transforms are applied componentwise; the product is the Jacobian factor.)
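The baker (tent) transform, for instance, is a one-liner; a sketch assuming componentwise application:

```python
import numpy as np

def baker_transform(f):
    """Periodize f via the tent map t -> 1 - 2|t - 1/2|, applied componentwise."""
    return lambda t: f(1.0 - 2.0 * np.abs(np.asarray(t, dtype=float) - 0.5))
```

The tent map matches function values at 0 and 1 in every coordinate, so the transformed integrand is continuous and periodic on $[0,1)^d$ while leaving the integral unchanged.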
Test functions for numerical integration:

Multivariate Normal (MVN):
$$\mu = \int_{[a,b]} \frac{\exp\!\left( -\tfrac{1}{2} t^T \Sigma^{-1} t \right)}{\sqrt{(2\pi)^d \det(\Sigma)}}\,dt \;\overset{\text{Genz (1993)}}{=}\; \int_{[0,1]^{d-1}} f(x)\,dx$$

Keister: $\mu = \int_{\mathbb{R}^d} \cos(\|x\|) \exp(-\|x\|^2)\,dx, \quad d = 1, 2, \ldots$

Exp(Cos): $\mu = \int_{(0,1]^d} \exp(\cos(x))\,dx, \quad d = 1, 2, \ldots$
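The Keister integrand, for example, depends on x only through its Euclidean norm; a sketch:

```python
import numpy as np

def keister_integrand(x):
    """cos(||x||) * exp(-||x||^2), the Keister test integrand over R^d."""
    r = np.linalg.norm(np.asarray(x, dtype=float))
    return float(np.cos(r) * np.exp(-r * r))
```

In practice the integral over $\mathbb{R}^d$ is mapped to the unit cube by a change of variables before the cubature rule is applied.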
Integrating Multivariate Normal Probability
Figure: simulation results (plot omitted)

Exponential of Cosine
Figure: simulation results (plot omitted)

Keister Integral Example
Figure: simulation results (plot omitted)
Summary

Automatic Bayesian cubature with O(n) cubature cost and O(n log n) MLE cost.
It has the advantages of a kernel method and the low computational cost of quasi-Monte Carlo.
It is scalable to the complexity of the integrand: the kernel order and lattice points can be chosen to suit the smoothness of the integrand.

More about Guaranteed Automatic Algorithms (GAIL): http://gailgithub.github.io/GAIL_Dev/
These slides will also be available there.
Future work

Dimension independence
Tighter error bounds
Roundoff error
Lattice-point optimization specific to the kernel
Choosing the kernel order automatically as part of the MLE
Can we compute n directly instead of in a loop?
Choosing the periodization transform automatically
Thank you
References

Bernstein, Dennis S. 2009. Matrix Mathematics: Theory, Facts, and Formulas, Princeton University Press.
Diaconis, P. 1988. Bayesian numerical analysis, Statistical Decision Theory and Related Topics IV, Papers from the 4th Purdue Symp., West Lafayette, Indiana, 1986, pp. 163–175.
Genz, A. 1993. Comparison of methods for the computation of multivariate normal probabilities, Computing Science and Statistics 25, 400–405.
Nuyens, Dirk. 2007. Fast construction of good lattice rules, Ph.D. Thesis.
O'Hagan, A. 1991. Bayes–Hermite quadrature, J. Statist. Plann. Inference 29, 245–260.
Olver, F. W. J., D. W. Lozier, R. F. Boisvert, C. W. Clark, and A. B. O. Dalhuis. 2013. Digital Library of Mathematical Functions.
Rasmussen, C. E. 2003. Bayesian Monte Carlo, Advances in Neural Information Processing Systems, pp. 489–496.
Ritter, K. 2000. Average-Case Analysis of Numerical Problems, Lecture Notes in Mathematics, vol. 1733, Springer-Verlag, Berlin.

More Related Content

PDF
Optimal interval clustering: Application to Bregman clustering and statistica...
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
PDF
Lesson 28: The Fundamental Theorem of Calculus
PDF
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
PDF
Learning Sparse Representation
PDF
ABC based on Wasserstein distances
PDF
Can we estimate a constant?
Optimal interval clustering: Application to Bregman clustering and statistica...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
Lesson 28: The Fundamental Theorem of Calculus
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
Learning Sparse Representation
ABC based on Wasserstein distances
Can we estimate a constant?

What's hot (20)

PDF
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
PDF
Tulane March 2017 Talk
PDF
Delayed acceptance for Metropolis-Hastings algorithms
PDF
ABC convergence under well- and mis-specified models
PDF
Slides: Jeffreys centroids for a set of weighted histograms
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
PDF
Multiple estimators for Monte Carlo approximations
PDF
Geodesic Method in Computer Vision and Graphics
PDF
accurate ABC Oliver Ratmann
PDF
Lesson 31: Evaluating Definite Integrals
PDF
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
PDF
Low Complexity Regularization of Inverse Problems
PDF
Proximal Splitting and Optimal Transport
PDF
ABC with Wasserstein distances
PDF
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
PDF
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
PDF
2018 MUMS Fall Course - Mathematical surrogate and reduced-order models - Ral...
PDF
Coordinate sampler: A non-reversible Gibbs-like sampler
PDF
comments on exponential ergodicity of the bouncy particle sampler
PDF
Maximum likelihood estimation of regularisation parameters in inverse problem...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
Tulane March 2017 Talk
Delayed acceptance for Metropolis-Hastings algorithms
ABC convergence under well- and mis-specified models
Slides: Jeffreys centroids for a set of weighted histograms
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
Multiple estimators for Monte Carlo approximations
Geodesic Method in Computer Vision and Graphics
accurate ABC Oliver Ratmann
Lesson 31: Evaluating Definite Integrals
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Low Complexity Regularization of Inverse Problems
Proximal Splitting and Optimal Transport
ABC with Wasserstein distances
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
2018 MUMS Fall Course - Mathematical surrogate and reduced-order models - Ral...
Coordinate sampler: A non-reversible Gibbs-like sampler
comments on exponential ergodicity of the bouncy particle sampler
Maximum likelihood estimation of regularisation parameters in inverse problem...
Ad

Similar to Automatic Bayesian method for Numerical Integration (20)

PDF
Automatic bayesian cubature
PDF
SIAM - Minisymposium on Guaranteed numerical algorithms
PDF
Georgia Tech 2017 March Talk
PDF
SIAM CSE 2017 talk
PDF
Monte Carlo Methods 2017 July Talk in Montreal
PDF
Mines April 2017 Colloquium
PDF
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
PDF
QMC Error SAMSI Tutorial Aug 2017
PDF
Kernel Bayes Rule
PPTX
NICE Implementations of Variational Inference
PPTX
NICE Research -Variational inference project
PPTX
A machine learning method for efficient design optimization in nano-optics
PDF
Variational inference
ODP
Iwsmbvs
PDF
A Solution Manual and Notes for The Elements of Statistical Learning.pdf
PDF
ABC workshop: 17w5025
PDF
11 Machine Learning Important Issues in Machine Learning
PPTX
A machine learning method for efficient design optimization in nano-optics
PDF
Asymptotics of ABC, lecture, Collège de France
PDF
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Automatic bayesian cubature
SIAM - Minisymposium on Guaranteed numerical algorithms
Georgia Tech 2017 March Talk
SIAM CSE 2017 talk
Monte Carlo Methods 2017 July Talk in Montreal
Mines April 2017 Colloquium
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
QMC Error SAMSI Tutorial Aug 2017
Kernel Bayes Rule
NICE Implementations of Variational Inference
NICE Research -Variational inference project
A machine learning method for efficient design optimization in nano-optics
Variational inference
Iwsmbvs
A Solution Manual and Notes for The Elements of Statistical Learning.pdf
ABC workshop: 17w5025
11 Machine Learning Important Issues in Machine Learning
A machine learning method for efficient design optimization in nano-optics
Asymptotics of ABC, lecture, Collège de France
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Ad

Recently uploaded (20)

PPTX
2. Earth - The Living Planet Module 2ELS
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PDF
Placing the Near-Earth Object Impact Probability in Context
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PDF
The scientific heritage No 166 (166) (2025)
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
neck nodes and dissection types and lymph nodes levels
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PDF
. Radiology Case Scenariosssssssssssssss
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
DOCX
Viruses (History, structure and composition, classification, Bacteriophage Re...
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPTX
famous lake in india and its disturibution and importance
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PPTX
2. Earth - The Living Planet earth and life
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
2. Earth - The Living Planet Module 2ELS
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
Placing the Near-Earth Object Impact Probability in Context
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
The scientific heritage No 166 (166) (2025)
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
neck nodes and dissection types and lymph nodes levels
The KM-GBF monitoring framework – status & key messages.pptx
Introduction to Cardiovascular system_structure and functions-1
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
Derivatives of integument scales, beaks, horns,.pptx
. Radiology Case Scenariosssssssssssssss
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
Viruses (History, structure and composition, classification, Bacteriophage Re...
ECG_Course_Presentation د.محمد صقران ppt
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
famous lake in india and its disturibution and importance
Introduction to Fisheries Biotechnology_Lesson 1.pptx
2. Earth - The Living Planet earth and life
AlphaEarth Foundations and the Satellite Embedding dataset

Automatic Bayesian method for Numerical Integration

  • 1. Automatic Bayesian method for Numerical Integration Jagadeeswaran Rathinavel, Fred J. Hickernell Department of Applied Mathematics, Illinois Institute of Technology jrathin1@iit.edu Thanks to the CASSC 2017 organizers
  • 2. Introduction Bayesian Cubature Simulation results Conclusion References Multivariate Integration Approximate the d-dimensional integral over [0, 1)d µ := E[f(X)] := ż [0,1)d f(x) ν(dx) by a simple cubature rule µ « ^µ := n´1ÿ j=0 f(xj)wj = ż [0,1)d f(x) ^ν(dx) using points (xj)n´1 j=0 and associated weights wj. Then the approximation error µ ´ ^µ = ş [0,1)d f(x) (ν ´ ^ν)(dx) use extensible pointset and an algorithm that allows to add more points if needed. 2/22
  • 3. Introduction Bayesian Cubature Simulation results Conclusion References Motivating Example How to measure the water volume of a creek in a given area (For Ex: 10sqft) ? Figure: (Nuyens, 2007) 3/22
  • 4. Introduction Bayesian Cubature Simulation results Conclusion References What is multivariate integration? Figure: (Nuyens, 2007) A d-dim integral: ż [0,1)d f(x) dx = ż 1 0 ż 1 0 . . . ż 1 0 f(x1, x2, . . . xd) dx1 dx2 . . . dxd 4/22
  • 5. Introduction Bayesian Cubature Simulation results Conclusion References What is multivariate integration? Figure: (Nuyens, 2007) A d-dim integral: ż [0,1)d f(x) dx = ż 1 0 ż 1 0 . . . ż 1 0 f(x1, x2, . . . xd) dx1 dx2 . . . dxd Grid points: Curse of dimensionality 4/22
  • 6. Introduction Bayesian Cubature Simulation results Conclusion References What is multivariate integration? Figure: (Nuyens, 2007) A d-dim integral: ż [0,1)d f(x) dx = ż 1 0 ż 1 0 . . . ż 1 0 f(x1, x2, . . . xd) dx1 dx2 . . . dxd Grid points: Curse of dimensionality IID Monte Carlo: Converges O(n´ 1 2 ) ^µ = 1 n řn´1 j=0 f(xj) 4/22
  • 7. Introduction Bayesian Cubature Simulation results Conclusion References What is multivariate integration? Figure: (Nuyens, 2007) A d-dim integral: ż [0,1)d f(x) dx = ż 1 0 ż 1 0 . . . ż 1 0 f(x1, x2, . . . xd) dx1 dx2 . . . dxd Grid points: Curse of dimensionality IID Monte Carlo: Converges O(n´ 1 2 ) ^µ = 1 n řn´1 j=0 f(xj) Typical Quasi Monte Carlo: Converges O(n´1+ ) xj chosen carefully (low-discrepancy points) 4/22
  • 8. Introduction Bayesian Cubature Simulation results Conclusion References What is multivariate integration? Figure: (Nuyens, 2007) A d-dim integral: ż [0,1)d f(x) dx = ż 1 0 ż 1 0 . . . ż 1 0 f(x1, x2, . . . xd) dx1 dx2 . . . dxd Grid points: Curse of dimensionality IID Monte Carlo: Converges O(n´ 1 2 ) ^µ = 1 n řn´1 j=0 f(xj) Typical Quasi Monte Carlo: Converges O(n´1+ ) xj chosen carefully (low-discrepancy points) Can we do better? 4/22
  • 9. Introduction Bayesian Cubature Simulation results Conclusion References Algorithm 1: procedure AutoCubature(f, errtol) Ź Integrate within the error tolerance 2: n Ð 28 3: do 4: Generate (xi)n´1 i=0 5: Sample (f(xi))n´1 i=0 6: Compute errn 7: n Ð 2 ˆ n 8: while errn ą errtol Ź Iterate till error tolerance is met 9: Compute weights (wi)n´1 i=0 10: Compute integral ^µn 11: return ^µn Ź Integral estimate ^µn 12: end procedure Problem: How to choose (xj)n´1 j=0 and (wj)n´1 j=0 to make |µ ´ ^µ| small? Given error tolerance errtol, how big must ‘n’ be to guarantee |µ ´ ^µ| ď errtol 5/22
  • 10. Introduction Bayesian Cubature Simulation results Conclusion References Algorithm 1: procedure AutoCubature(f, errtol) Ź Integrate within the error tolerance 2: n Ð 28 3: do 4: Generate (xi)n´1 i=0 5: Sample (f(xi))n´1 i=0 6: Compute errn 7: n Ð 2 ˆ n 8: while errn ą errtol Ź Iterate till error tolerance is met 9: Compute weights (wi)n´1 i=0 10: Compute integral ^µn 11: return ^µn Ź Integral estimate ^µn 12: end procedure Problem: How to choose (xj)n´1 j=0 and (wj)n´1 j=0 to make |µ ´ ^µ| small? Given error tolerance errtol, how big must ‘n’ be to guarantee |µ ´ ^µ| ď errtol 5/22
  • 11. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Trio Identity Random f : f „ GP(0, s2 Cθ), a Gaussian process from the sample space F with zero mean and covariance kernel, s2 Cθ, Cθ : X ˆ X Ñ R. Then 6/22
  • 12. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Trio Identity Random f : f „ GP(0, s2 Cθ), a Gaussian process from the sample space F with zero mean and covariance kernel, s2 Cθ, Cθ : X ˆ X Ñ R. Then c0 = ż XˆX C(x, t) ν(dx)ν(dt), c = ż X C(xi, t) ν(dt) n´1 i=0 C = C(xi, xj) n´1 i,j=0 , w = wi n´1 i=0 , µ ´ ^µ = ş X f(x) (ν ´ ^ν)(dx) s a c0 ´ 2cTw + wTCwloooooooooooooomoooooooooooooon ALNB (f, ν ´ ^ν) a c0 ´ 2cTw + wTCwlooooooooooooomooooooooooooon DSCB (ν ´ ^ν) sloomoon VARB (f) 6/22
  • 13. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Trio Identity Random f : f „ GP(0, s2 Cθ), a Gaussian process from the sample space F with zero mean and covariance kernel, s2 Cθ, Cθ : X ˆ X Ñ R. Then c0 = ż XˆX C(x, t) ν(dx)ν(dt), c = ż X C(xi, t) ν(dt) n´1 i=0 C = C(xi, xj) n´1 i,j=0 , w = wi n´1 i=0 , µ ´ ^µ = ş X f(x) (ν ´ ^ν)(dx) s a c0 ´ 2cTw + wTCwloooooooooooooomoooooooooooooon ALNB (f, ν ´ ^ν) a c0 ´ 2cTw + wTCwlooooooooooooomooooooooooooon DSCB (ν ´ ^ν) sloomoon VARB (f) The scale parameter, s, and shape parameter, θ, should be estimated. w = C´1 c minimizes DSCB (ν ´ ^ν) makes ALNB (f, ν ´ ^ν) ˇ ˇ(f(xi) = yi)n´1 i=0 „ N(0, 1) Ref: Diaconis (1988), O’Hagan (1991), Ritter (2000), Rasmussen (2003) and others 6/22
  • 14. Introduction Bayesian Cubature Simulation results Conclusion References Covariance kernel shift invariant kernel Cθ(x, t) := ÿ mPZd λm,θ e ? ´12πmT (x´t) , 0 ď |x ´ t| ď 1 7/22
  • 15. Introduction Bayesian Cubature Simulation results Conclusion References Covariance kernel shift invariant kernel Cθ(x, t) := ÿ mPZd λm,θ e ? ´12πmT (x´t) , 0 ď |x ´ t| ď 1 when λm,θ := dź l=1 1 max(|ml| θ , 1)r θď1 , with λ0,θ = 1, θ P (0, 1]d , r P 2N Cθ(x, t) = dź l=1 1 ´ θr l (2π ? ´1)r r! Br(|xl ´ tl|), θ P (0, 1]d , r P 2N 7/22
  • 16. Introduction Bayesian Cubature Simulation results Conclusion References Covariance kernel shift invariant kernel Cθ(x, t) := ÿ mPZd λm,θ e ? ´12πmT (x´t) , 0 ď |x ´ t| ď 1 when λm,θ := dź l=1 1 max(|ml| θ , 1)r θď1 , with λ0,θ = 1, θ P (0, 1]d , r P 2N Cθ(x, t) = dź l=1 1 ´ θr l (2π ? ´1)r r! Br(|xl ´ tl|), θ P (0, 1]d , r P 2N where Br is Bernoulli polynomial of order r (Olver et al., 2013). Br(x) = ´r! (2π ? ´1)r ∞ÿ k‰0,k=´∞ e2π ? ´1kx kr # for r = 1, 0 ă x ă 1 for r = 2, 3, . . . 0 ď x ď 1 7/22
  • 17. Introduction Bayesian Cubature Simulation results Conclusion References Covariance kernel shift invariant kernel Cθ(x, t) := ÿ mPZd λm,θ e ? ´12πmT (x´t) , 0 ď |x ´ t| ď 1 when λm,θ := dź l=1 1 max(|ml| θ , 1)r θď1 , with λ0,θ = 1, θ P (0, 1]d , r P 2N Cθ(x, t) = dź l=1 1 ´ θr l (2π ? ´1)r r! Br(|xl ´ tl|), θ P (0, 1]d , r P 2N where Br is Bernoulli polynomial of order r (Olver et al., 2013). Br(x) = ´r! (2π ? ´1)r ∞ÿ k‰0,k=´∞ e2π ? ´1kx kr # for r = 1, 0 ă x ă 1 for r = 2, 3, . . . 0 ď x ď 1 Given (xi : i = 0, ..., n ´ 1), the symmetric kernel matrix is formed Cθ = Cθ(xi, xj) n´1 i,j=0 7/22
  • 18. Introduction Bayesian Cubature Simulation results Conclusion References Quais Monte-Carlo : Sample “more uniformly” 8/22
  • 19. Introduction Bayesian Cubature Simulation results Conclusion References Rank-1 Lattice rules : Low discrepancy point set Given the “generating vector” g, the construction of n - Rank-1 lattice points is given by " kg n * n´1 k=0 then the Lattice rule approximation is 1 n n´1ÿ k=0 f " kg n * with t.u the fractional part, i.e, ‘modulo 1’ operator. 9/22
  • 20. Introduction Bayesian Cubature Simulation results Conclusion References Rank-1 Lattice rules : Low discrepancy point set Shift invariant kernel + Lattice points = ’Symmetric circulant kernel’ matrix 9/22
  • 21. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Cubature µ = ż X f(x)ν(dx) « ^µn = n´1ÿ i=0 wif(xi) 10/22
  • 22. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Cubature µ = ż X f(x)ν(dx) « ^µn = n´1ÿ i=0 wif(xi) Assume f „ GP(0, s2 Cθ) µ ´ ^µn ˇ ˇ(f(xi) = yi)n´1 i=0 „ N yT (C´1 c ´ w), s2 (c0 ´ cT C´1 c) 10/22
  • 23. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Cubature µ = ż X f(x)ν(dx) « ^µn = n´1ÿ i=0 wif(xi) Assume f „ GP(0, s2 Cθ) µ ´ ^µn ˇ ˇ(f(xi) = yi)n´1 i=0 „ N yT (C´1 c ´ w), s2 (c0 ´ cT C´1 c) Choosing w = C´1 ^θ c^θ is optimal P[|µ ´ ^µn| ď errn] ě 99% for errn = 2.58 d c^θ,0 ´ cT ^θ C´1 ^θ c^θ yTC´1 ^θ y n MLE ^θ = argmin θ yT C´1 θ y [det(C´1 θ )]1/n , where y = f(xi) n´1 i=0 . 10/22
  • 24. Introduction Bayesian Cubature Simulation results Conclusion References Bayesian Cubature µ = ż X f(x)ν(dx) « ^µn = n´1ÿ i=0 wif(xi) Assume f „ GP(0, s2 Cθ) µ ´ ^µn ˇ ˇ(f(xi) = yi)n´1 i=0 „ N yT (C´1 c ´ w), s2 (c0 ´ cT C´1 c) Choosing w = C´1 ^θ c^θ is optimal P[|µ ´ ^µn| ď errn] ě 99% for errn = 2.58 d c^θ,0 ´ cT ^θ C´1 ^θ c^θ yTC´1 ^θ y n MLE ^θ = argmin θ yT C´1 θ y [det(C´1 θ )]1/n , where y = f(xi) n´1 i=0 . C´1 typically requires O(n3 ) operations. But with covariance kernel C for which matrix C is symmetric circulant. So operations on C require only O(n log(n)) operations. 10/22
  • 25. Introduction Bayesian Cubature Simulation results Conclusion References Optimal Shape parameter θ Maximum likelihood estimate of θ by argmin θ yT C´1 θ y [det(C´1 θ )]1/n 11/22
  • 26. Introduction Bayesian Cubature Simulation results Conclusion References Optimal Shape parameter θ Maximum likelihood estimate of θ by argmin θ yT C´1 θ y [det(C´1 θ )]1/n simplified to (Bernstein, 2009) argmin θ log n´1ÿ i=0 |zi|2 γi + 1 n n´1ÿ i=0 log(γi) where (γi) eigen values of Cθ z := (zi)n´1 i=0 = DFT y , γ := (γi)n´1 i=0 = DFT C(xi, x0) n´1 i=0 O(n log(n)) operations to estimate the ^θ 11/22
Cubature Rule: Computing $\hat{\mu}$ Efficiently
Define
$$C_\theta = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=0}^{n-1}, \qquad c_\theta = \Biggl(\int_{[0,1)^d} C_\theta(x_i, x)\,dx\Biggr)_{i=0}^{n-1}$$
The approximate mean is
$$\hat{\mu} = w^T y = \underbrace{\bigl(C_\theta^{-1} c_\theta\bigr)^T}_{w^T}\, y$$
Further simplified using the shift-invariance of the kernel and the circulant-matrix property (Bernstein, 2009):
$$\hat{\mu} = \int_{[0,1]^d} C(x, x_0)\,dx \;\times\; \Biggl(\sum_{i=0}^{n-1} C(x_0, x_i)\Biggr)^{-1} \times\; \sum_{i=0}^{n-1} y_i$$
Only $O(n)$ operations are needed to compute $\hat{\mu}$.
12/22
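The simplified formula is a one-liner. This sketch assumes, as before, a kernel of the form $1 + \theta B_2(\{x - t\})$, for which $\int_{[0,1]} C(x, x_0)\,dx = 1$:

```python
import numpy as np

def fast_cubature(y, first_row, kernel_integral=1.0):
    """O(n) evaluation of the simplified cubature
        mu_hat = (int C(x, x_0) dx) * (sum_i C(x_0, x_i))^{-1} * sum_i y_i.
    first_row holds (C(x_0, x_i))_i; kernel_integral is int C(x, x_0) dx,
    which equals 1 for kernels of the form 1 + theta * B2({x - t})."""
    return kernel_integral * y.sum() / first_row.sum()
```

For circulant Gram matrices this returns the same value as the dense $w = C^{-1}c$ solve, at a fraction of the cost.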
Computing the Error Bound Efficiently
Simplify
$$\mathrm{err}_n = 2.58 \sqrt{\bigl(c_{\hat{\theta},0} - c_{\hat{\theta}}^T C_{\hat{\theta}}^{-1} c_{\hat{\theta}}\bigr)\,\frac{y^T C_{\hat{\theta}}^{-1} y}{n}}$$
using the shift-invariance of the kernel and rank-1 lattice points (Bernstein, 2009), where
$$z := (z_i)_{i=0}^{n-1} = \mathrm{DFT}(y), \qquad \gamma := (\gamma_i)_{i=0}^{n-1} = \mathrm{DFT}\bigl((C(x_i, x_0))_{i=0}^{n-1}\bigr)$$
Finally,
$$\mathrm{err}_n = 2.58 \sqrt{\Biggl(1 - \frac{1}{\sum_{i=0}^{n-1} C(x_0, x_i)}\Biggr)\;\frac{1}{n}\sum_{i=0}^{n-1} \frac{|z_i|^2}{\gamma_i}}$$
13/22
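A direct FFT transcription of the final formula (an illustrative sketch under the same circulant-kernel assumptions as above, not the GAIL routine):

```python
import numpy as np

def error_bound(y, first_row):
    """99% error bound
      err_n = 2.58 * sqrt((1 - 1/sum_i C(x_0,x_i)) * (1/n) * sum_i |z_i|^2/gamma_i),
    computed in O(n log n).  The quantity (1/n) * sum_i |z_i|^2/gamma_i equals
    y^T C^{-1} y for a circulant Gram matrix with eigenvalues gamma."""
    n = len(y)
    z = np.fft.fft(y)
    gamma = np.fft.fft(first_row).real
    scaled_norm = np.sum(np.abs(z) ** 2 / gamma) / n
    return 2.58 * np.sqrt((1.0 - 1.0 / first_row.sum()) * scaled_norm)
```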
Periodization Transforms
Our algorithm works best with periodic functions.
Baker: $\tilde{f}(t) = f\bigl(1 - 2\,|t - \tfrac{1}{2}|\bigr)$
$C^0$: $\tilde{f}(t) = f\bigl(3t^2 - 2t^3\bigr)\,\prod_{j=1}^{d} 6\,t_j (1 - t_j)$
$C^1$: $\tilde{f}(t) = f\bigl(t^3 (10 - 15t + 6t^2)\bigr)\,\prod_{j=1}^{d} 30\,t_j^2 (1 - t_j)^2$
Sidi's $C^1$: $\tilde{f}(t) = f\Bigl(\bigl(t_j - \tfrac{\sin(2\pi t_j)}{2\pi}\bigr)_{j=1}^{d}\Bigr)\,\prod_{j=1}^{d} \bigl(1 - \cos(2\pi t_j)\bigr)$
14/22
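The first two transforms can be sketched in one coordinate; both are changes of variables that preserve the integral over $[0,1]$ while making the transformed integrand match at the boundary:

```python
import numpy as np

def baker(f):
    """Baker's (tent) transform: f~(t) = f(1 - 2|t - 1/2|).
    Preserves the integral over [0,1] and removes the boundary jump."""
    return lambda t: f(1.0 - 2.0 * np.abs(t - 0.5))

def c0(f):
    """C^0 transform (one coordinate): f~(t) = f(3t^2 - 2t^3) * 6t(1 - t).
    Change of variables x = 3t^2 - 2t^3, dx = 6t(1 - t) dt, so the integral
    over [0,1] is preserved while f~ vanishes at both endpoints."""
    return lambda t: f(3.0 * t**2 - 2.0 * t**3) * 6.0 * t * (1.0 - t)
```

For example, a plain rectangle rule applied to `c0(lambda x: x)` on a lattice converges to $1/2$ far faster than the same rule applied to the untransformed integrand.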
Test Functions for Numerical Integration
Multivariate normal (MVN):
$$\mu = \int_{[a,b]} \frac{\exp\bigl(-\tfrac{1}{2} t^T \Sigma^{-1} t\bigr)}{\sqrt{(2\pi)^d \det(\Sigma)}}\,dt \;\overset{\text{Genz (1993)}}{=}\; \int_{[0,1]^{d-1}} f(x)\,dx$$
Keister:
$$\mu = \int_{\mathbb{R}^d} \cos(\lVert x \rVert)\,\exp(-\lVert x \rVert^2)\,dx, \quad d = 1, 2, \ldots$$
Exp(Cos):
$$\mu = \int_{(0,1]^d} \exp(\cos(x))\,dx, \quad d = 1, 2, \ldots$$
15/22
Integrating Multivariate Normal Probability (figure) 16/22
Exponential of Cosine (figure) 17/22
Keister Integral Example (figure) 18/22
Summary
Automatic Bayesian cubature with $O(n)$ cubature cost and $O(n \log n)$ MLE cost
Combines the advantages of a kernel method with the low computational cost of quasi-Monte Carlo
Scalable to the complexity of the integrand, i.e., the kernel order and lattice points can be chosen to suit the smoothness of the integrand
More about the Guaranteed Automatic Integration Library (GAIL): http://gailgithub.github.io/GAIL_Dev/
These slides will also be available here.
19/22
Future Work
Dimension independence
Tighter error bound
Roundoff error
Lattice-point optimization specific to the kernel
Choosing the kernel order automatically as part of the MLE
Can we compute $n$ directly instead of in a loop?
Choosing the periodization transform automatically
20/22
References I
Bernstein, Dennis S. 2009. Matrix Mathematics: Theory, Facts, and Formulas, Princeton University Press.
Diaconis, P. 1988. Bayesian numerical analysis, Statistical Decision Theory and Related Topics IV, papers from the 4th Purdue Symp., West Lafayette, Indiana, 1986, pp. 163–175.
Genz, A. 1993. Comparison of methods for the computation of multivariate normal probabilities, Computing Science and Statistics 25, 400–405.
Nuyens, Dirk. 2007. Fast construction of good lattice rules, Ph.D. Thesis.
O'Hagan, A. 1991. Bayes–Hermite quadrature, J. Statist. Plann. Inference 29, 245–260.
Olver, F. W. J., D. W. Lozier, R. F. Boisvert, C. W. Clark, and A. B. O. Dalhuis. 2013. Digital Library of Mathematical Functions.
Rasmussen, C. E. 2003. Bayesian Monte Carlo, Advances in Neural Information Processing Systems, pp. 489–496.
Ritter, K. 2000. Average-Case Analysis of Numerical Problems, Lecture Notes in Mathematics, vol. 1733, Springer-Verlag, Berlin.
22/22