Function Approximation When Function Values Are Expensive

Fred J. Hickernell¹ and Simon Mak²

¹Department of Applied Mathematics, Illinois Institute of Technology, hickernell@iit.edu
²School of Industrial and Systems Engineering, Georgia Institute of Technology

Supported by NSF-DMS-1522687 and DMS-1638521 (SAMSI)

SAMSI-QMC Transition Workshop, May 7, 2018
Thanks to ...

SAMSI for sponsoring this program in Quasi-Monte Carlo Methods and High Dimensional Sampling for Applied Mathematics, especially Ilse Ipsen, Karem Jackson, Sue McDonald, Rick Scoggins, Thomas Gehrmann, Richard Smith, and David Banks

Frances Kuo, Pierre L’Ecuyer, and Art Owen, fellow program leaders, especially Art, who keeps trying to push QMC out of its box

Many of you with whom I have had fruitful discussions, especially WGs 2, 4, and 5

Mac Hyman, who promoted QMC to SAMSI behind the scenes, introduced me to problems with expensive function values, and co-led WG 5

Henryk Woźniakowski, whose work on tractability has inspired some of what I will say

Kai-Tai Fang, who introduced me to experimental design
Approximating Functions When Function Values Are Expensive

Interested in $f : [-1,1]^d \to \mathbb{R}$, e.g., the result of a climate model or a financial calculation
  $d$ is dozens or a few hundred
  $\$(f)$ = cost to evaluate $f(\boldsymbol{x})$ for any $\boldsymbol{x} \in [-1,1]^d$ = hours or days or \$1M

Want to construct a surrogate model, $f_{\mathrm{app}} \approx f$, with $\$(f_{\mathrm{app}}) = \$0.000001$, so that we may quickly explore (plot, integrate, optimize, search for sharp gradients of) $f$
  $f_{\mathrm{app}}$ is constructed using $n$ pieces of information about $f$
  Want $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$ for $n = O(d^p \varepsilon^{-q})$ as $d \uparrow \infty$ or $\varepsilon \downarrow 0$ (with small $p$ and $q$)

Assume $\$(f) \gg n^r$ for any practical $n$ and any positive $r$, so the cost of the algorithm is $O(\$(f)\, n)$
Functions Expressed as Series

Let $f : [-1,1]^d \to \mathbb{R}$ have an $L^2([-1,1]^d, \varrho)$ orthogonal series expansion:

$$f(\boldsymbol{x}) = \sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \hat{f}(\boldsymbol{j}) \phi_{\boldsymbol{j}}(\boldsymbol{x}), \qquad \phi_{\boldsymbol{j}}(\boldsymbol{x}) = \phi_{j_1}(x_1) \cdots \phi_{j_d}(x_d), \qquad \|\phi_{\boldsymbol{j}}\|_\infty = 1$$

$$\hat{f}(\boldsymbol{j}) = \frac{\langle f, \phi_{\boldsymbol{j}} \rangle}{\langle \phi_{\boldsymbol{j}}, \phi_{\boldsymbol{j}} \rangle}, \qquad \langle f, g \rangle := \int_{[-1,1]^d} f(\boldsymbol{x}) g(\boldsymbol{x}) \varrho(\boldsymbol{x}) \, \mathrm{d}\boldsymbol{x}$$

Legendre polynomials: $\displaystyle \int_{-1}^{1} \phi_j(x) \phi_k(x) \, \mathrm{d}x = c_j \delta_{j,k}$

Chebyshev polynomials: $\phi_j(x) = \cos(j \arccos x)$, $\displaystyle \int_{-1}^{1} \frac{\phi_j(x) \phi_k(x)}{\sqrt{1 - x^2}} \, \mathrm{d}x = c_j \delta_{j,k}$
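The Chebyshev relation is easy to sanity-check numerically. Below is a minimal sketch (mine, not from the talk) that evaluates $\phi_j(x) = \cos(j \arccos x)$ and verifies the weighted orthogonality using Gauss–Chebyshev quadrature, whose nodes integrate exactly against the weight $1/\sqrt{1-x^2}$:

```python
import numpy as np

def chebyshev_basis(j, x):
    """Chebyshev polynomial T_j(x) = cos(j * arccos(x)) on [-1, 1]."""
    return np.cos(j * np.arccos(x))

# Gauss-Chebyshev quadrature: int_{-1}^{1} g(x)/sqrt(1-x^2) dx
# ~= (pi/m) * sum_i g(cos((2i-1) pi / (2m))), exact for polynomial g of
# degree <= 2m - 1.
m = 64
nodes = np.cos((2 * np.arange(1, m + 1) - 1) * np.pi / (2 * m))

for j, k in [(2, 2), (2, 5)]:
    inner = np.pi / m * np.sum(chebyshev_basis(j, nodes) * chebyshev_basis(k, nodes))
    print(f"<phi_{j}, phi_{k}> = {inner:.6f}")  # ~pi/2 when j == k > 0, ~0 otherwise
```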
Example Bases

[Figure: plots of the first few Legendre and Chebyshev basis functions]
Approximation by Series Coefficients

$$f(\boldsymbol{x}) = \sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \hat{f}(\boldsymbol{j}) \phi_{\boldsymbol{j}}(\boldsymbol{x}), \qquad \hat{f}(\boldsymbol{j}) = \frac{\langle f, \phi_{\boldsymbol{j}} \rangle}{\langle \phi_{\boldsymbol{j}}, \phi_{\boldsymbol{j}} \rangle}, \qquad \|\phi_{\boldsymbol{j}}\|_\infty = 1$$

Suppose that we may observe the series coefficients $\hat{f}(\boldsymbol{j})$ at a cost of \$1M each. (Eventually we want to consider the case of observing function values.) For any vector of non-negative constants, $\boldsymbol{\gamma} = (\gamma_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathbb{N}_0^d}$, define the norm

$$\|f\|_{q,\boldsymbol{\gamma}} := \left\| \left( \frac{|\hat{f}(\boldsymbol{j})|}{\gamma_{\boldsymbol{j}}} \right)_{\boldsymbol{j} \in \mathbb{N}_0^d} \right\|_q, \qquad 0/0 = 0, \qquad \gamma_{\boldsymbol{j}} = 0 \ \& \ \|f\|_{\infty,\boldsymbol{\gamma}} < \infty \implies \hat{f}(\boldsymbol{j}) = 0$$

Order the wavenumbers $\boldsymbol{j}$ such that $\gamma_{\boldsymbol{j}_1} \ge \gamma_{\boldsymbol{j}_2} \ge \cdots$. The optimal approximation (why? see the appendix slide “In What Sense Is This Optimal?”) to $f$, given the choice of $n$ series coefficients, is

$$f_{\mathrm{app}}(\boldsymbol{x}) = \sum_{i=1}^{n} \hat{f}(\boldsymbol{j}_i) \phi_{\boldsymbol{j}_i}, \qquad \|f - f_{\mathrm{app}}\|_\infty = \left\| \sum_{i=n+1}^{\infty} \hat{f}(\boldsymbol{j}_i) \phi_{\boldsymbol{j}_i} \right\|_\infty \underbrace{\le}_{\text{loose}} \|\widehat{f - f_{\mathrm{app}}}\|_1 \underbrace{\le}_{\text{tight, optimal}} \|f\|_{\infty,\boldsymbol{\gamma}} \sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i}$$
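To make the truncation rule concrete: a minimal sketch (the product weight $\gamma_{\boldsymbol{j}} = 2^{-\|\boldsymbol{j}\|_1}$ is a toy choice of mine) that ranks wavenumbers by $\gamma_{\boldsymbol{j}}$, keeps the $n$ largest, and reports the tail sum that multiplies $\|f\|_{\infty,\boldsymbol{\gamma}}$ in the error bound:

```python
import itertools

def best_n_terms(gamma, candidates, n):
    """Keep the n wavenumbers with the largest weights gamma(j); the error
    bound is ||f||_{inf,gamma} times the tail sum over the discarded ones."""
    ranked = sorted(candidates, key=gamma, reverse=True)
    tail = sum(gamma(j) for j in ranked[n:])
    return ranked[:n], tail

gamma = lambda j: 0.5 ** sum(j)                        # a toy product weight
candidates = list(itertools.product(range(4), repeat=2))
kept, tail = best_n_terms(gamma, candidates, n=5)
print(kept)   # the 5 largest-weight wavenumbers, starting with (0, 0)
print(tail)   # multiply by ||f||_{inf,gamma} to bound ||f - f_app||_inf
```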
How Quickly Does Error Decay?

$$f(\boldsymbol{x}) = \sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \hat{f}(\boldsymbol{j}) \phi_{\boldsymbol{j}}(\boldsymbol{x}), \quad \hat{f}(\boldsymbol{j}) = \frac{\langle f, \phi_{\boldsymbol{j}} \rangle}{\langle \phi_{\boldsymbol{j}}, \phi_{\boldsymbol{j}} \rangle}, \quad \|\phi_{\boldsymbol{j}}\|_\infty = 1, \quad \|f\|_{q,\boldsymbol{\gamma}} = \left\| \left( \frac{|\hat{f}(\boldsymbol{j})|}{\gamma_{\boldsymbol{j}}} \right)_{\boldsymbol{j} \in \mathbb{N}_0^d} \right\|_q$$

$$\gamma_{\boldsymbol{j}_1} \ge \gamma_{\boldsymbol{j}_2} \ge \cdots, \quad f_{\mathrm{app}}(\boldsymbol{x}) = \sum_{i=1}^{n} \hat{f}(\boldsymbol{j}_i) \phi_{\boldsymbol{j}_i}, \quad \|f - f_{\mathrm{app}}\|_\infty \underbrace{\le}_{\text{loose}} \|\widehat{f - f_{\mathrm{app}}}\|_1 \underbrace{\le}_{\text{tight, optimal}} \|f\|_{\infty,\boldsymbol{\gamma}} \sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i}$$

An often used trick ($q > 0$):

$$\gamma_{\boldsymbol{j}_{n+1}} \le \left[ \frac{1}{n} \left( \gamma_{\boldsymbol{j}_1}^{1/q} + \cdots + \gamma_{\boldsymbol{j}_n}^{1/q} \right) \right]^q \le \frac{1}{n^q} \|\boldsymbol{\gamma}\|_{1/q}, \qquad \|\boldsymbol{\gamma}\|_{1/q} = \left( \sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} \right)^q$$

$$\sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i} \le \|\boldsymbol{\gamma}\|_{1/q} \sum_{i=n}^{\infty} \frac{1}{i^q} \le \frac{\|\boldsymbol{\gamma}\|_{1/q}}{(q-1)(n-1)^{q-1}}$$

The rate is controlled by the finiteness of $\|\boldsymbol{\gamma}\|_{1/q}$.
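The first inequality of the trick is easy to verify empirically. A minimal numeric check (with randomly generated toy weights, not weights from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
q = 2.0
g = np.sort(rng.random(1000) ** 4)[::-1]     # toy weights, sorted decreasing
norm_1q = np.sum(g ** (1 / q)) ** q          # ||gamma||_{1/q}

for n in [10, 100, 500]:
    # gamma_{j_{n+1}} (0-indexed as g[n]) vs. the bound n^{-q} ||gamma||_{1/q}
    print(n, g[n] <= norm_1q / n ** q)       # True for every n
```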
Recap

$$f(\boldsymbol{x}) = \sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \hat{f}(\boldsymbol{j}) \phi_{\boldsymbol{j}}(\boldsymbol{x}), \quad \hat{f}(\boldsymbol{j}) = \frac{\langle f, \phi_{\boldsymbol{j}} \rangle}{\langle \phi_{\boldsymbol{j}}, \phi_{\boldsymbol{j}} \rangle}, \quad \|\phi_{\boldsymbol{j}}\|_\infty = 1, \quad \|f\|_{q,\boldsymbol{\gamma}} = \left\| \left( \frac{|\hat{f}(\boldsymbol{j})|}{\gamma_{\boldsymbol{j}}} \right)_{\boldsymbol{j} \in \mathbb{N}_0^d} \right\|_q \quad \text{(dependence of $f$ on $d$ is hidden)}$$

$$\gamma_{\boldsymbol{j}_1} \ge \gamma_{\boldsymbol{j}_2} \ge \cdots, \quad f_{\mathrm{app}}(\boldsymbol{x}) = \sum_{i=1}^{n} \hat{f}(\boldsymbol{j}_i) \phi_{\boldsymbol{j}_i},$$

$$\|f - f_{\mathrm{app}}\|_\infty \le \|\widehat{f - f_{\mathrm{app}}}\|_1 \le \|f\|_{\infty,\boldsymbol{\gamma}} \sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i} \le \frac{\|f\|_{\infty,\boldsymbol{\gamma}} \|\boldsymbol{\gamma}\|_{1/q}}{(q-1)(n-1)^{q-1}} \overset{\text{want}}{\le} \varepsilon$$

$$n = O\left( \left( \frac{\|f\|_{\infty,\boldsymbol{\gamma}} \|\boldsymbol{\gamma}\|_{1/q}}{\varepsilon} \right)^{1/(q-1)} \right) \text{ is sufficient}$$

To succeed with $n = O(d^p)$,¹ we need $\|\boldsymbol{\gamma}\|_{1/q} = O(d^p)$; indeed, $\|\boldsymbol{\gamma}\|_{1/q} = O(d^p) \implies n = O(d^p)$.

What remains? Assume that the function is nice enough to allow this inference.
  How do we infer $\boldsymbol{\gamma}$ in practice? Tradition fixes something convenient.
  How do we infer a bound on $\|f\|_{\infty,\boldsymbol{\gamma}}$?
  How do we approximate using function values, not series coefficients?

¹Novak, E. & Woźniakowski, H. Tractability of Multivariate Problems Volume I: Linear Information. EMS Tracts in Mathematics 6 (European Mathematical Society, Zürich, 2008); Kühn, T., Sickel, W. & Ullrich, T. Approximation numbers of Sobolev embeddings—Sharp constants and tractability. J. Complexity 30, 95–116 (2014).
Main New Ideas

It is assumed that $f$ is nice enough to justify the following:

Inferring $\boldsymbol{\gamma}$: Assume a structure informed by experimental design principles. Infer coordinate importance from a pilot sample with wavenumbers
$$\mathcal{J} := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 0, \ldots, n_0\} = \{j \boldsymbol{e}_k : j = 0, \ldots, n_0, \ k = 1, \ldots, d\}$$

Inferring $\|f\|_{\infty,\boldsymbol{\gamma}}$: Iteratively add the wavenumber with the largest $\gamma_{\boldsymbol{j}}$ to $\mathcal{J}$. Inflate the norm observed so far and assume
$$\|f\|_{\infty,\boldsymbol{\gamma}} \le C \, \big\| (\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}} \big\|_{\infty,\boldsymbol{\gamma}}$$

Function values: Let the new wavenumber, $\boldsymbol{j}$, pick the next design point via a shifted van der Corput sequence. Use interpolation to estimate $(\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}}$.
Product, Order, and Smoothness Dependent (POSD) Weights

$$f(\boldsymbol{x}) = \sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \hat{f}(\boldsymbol{j}) \phi_{\boldsymbol{j}}(\boldsymbol{x}), \quad \hat{f}(\boldsymbol{j}) = \frac{\langle f, \phi_{\boldsymbol{j}} \rangle}{\langle \phi_{\boldsymbol{j}}, \phi_{\boldsymbol{j}} \rangle}, \quad \|\phi_{\boldsymbol{j}}\|_\infty = 1, \quad \|f\|_{q,\boldsymbol{\gamma}} = \big\| \big( |\hat{f}(\boldsymbol{j})| / \gamma_{\boldsymbol{j}} \big)_{\boldsymbol{j} \in \mathbb{N}_0^d} \big\|_q$$

$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} = O(d^p) \implies \|f - f_{\mathrm{app}}\|_\infty \le \varepsilon \text{ for } n = O(d^p) \text{ if } \|f\|_{\infty,\boldsymbol{\gamma}} < \infty$$

Experimental design assumes²
  Effect sparsity: only a small number of effects are important
  Effect hierarchy: lower-order effects are more important than higher-order effects
  Effect heredity: an interaction is active only if both parent effects are also active
  Effect smoothness: coarse horizontal scales are more important than fine horizontal scales

Consider product, order, and smoothness dependent (POSD) weights (see the sketch below):

$$\gamma_{\boldsymbol{j}} = \Gamma_{\|\boldsymbol{j}\|_0} \prod_{\substack{\ell = 1 \\ j_\ell > 0}}^{d} w_\ell s_{j_\ell}, \qquad \Gamma_0 = s_1 = 1, \qquad \begin{cases} w_\ell = \text{coordinate importance} \\ \Gamma_r = \text{order size} \\ s_j = \text{smoothness degree} \end{cases}$$

²Wu, C. F. J. & Hamada, M. Experiments: Planning, Analysis, and Parameter Design Optimization. (John Wiley & Sons, Inc., New York, 2000).
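As a concrete illustration, here is a minimal sketch of evaluating a POSD weight; the particular values of $\Gamma$, $w$, and $s$ below are illustrative choices of mine, not the talk's:

```python
def posd_weight(j, Gamma, w, s):
    """gamma_j = Gamma_{||j||_0} * prod over active coordinates of w_l * s_{j_l}."""
    active = [l for l, jl in enumerate(j) if jl > 0]
    gamma = Gamma[len(active)]                     # order-size factor
    for l in active:
        gamma *= w[l] * s[j[l]]                    # coordinate importance * smoothness
    return gamma

Gamma = [1.0, 1.0, 0.5, 0.25, 0.125]               # Gamma_0 = 1, interactions downweighted
w = [1.0, 0.8, 0.5, 0.1]                           # coordinate importances
s = {j: 4.0 ** (1 - j) for j in range(1, 20)}      # smoothness degrees, s_1 = 1

print(posd_weight((0, 0, 0, 0), Gamma, w, s))      # 1.0: the constant term
print(posd_weight((2, 0, 1, 0), Gamma, w, s))      # Gamma_2 * (w_1 s_2) * (w_3 s_1) = 0.0625
```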
Product, Order, and Smoothness Dependent Weights (continued)

With the POSD weights above, the key sum factors over subsets $\mathfrak{u} \subseteq 1{:}d$ of active coordinates:

$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} = \sum_{\mathfrak{u} \subseteq 1{:}d} \Gamma_{|\mathfrak{u}|}^{1/q} \left( \prod_{\ell \in \mathfrak{u}} w_\ell^{1/q} \right) \left( \sum_{j=1}^{\infty} s_j^{1/q} \right)^{|\mathfrak{u}|} = O(d^p) \implies \|f - f_{\mathrm{app}}\|_\infty \le \varepsilon \text{ for } n = O(d^p) \text{ if } \|f\|_{\infty,\boldsymbol{\gamma}} < \infty$$
Special Cases of Weights

$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} = \sum_{\mathfrak{u} \subseteq 1{:}d} \Gamma_{|\mathfrak{u}|}^{1/q} \left( \prod_{\ell \in \mathfrak{u}} w_\ell^{1/q} \right) \left( \sum_{j=1}^{\infty} s_j^{1/q} \right)^{|\mathfrak{u}|} \qquad \text{Want} = O(d^p)$$

Coordinates, orders equally important: $\Gamma_r = w_\ell = 1$
$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} = \left( 1 + \sum_{j=1}^{\infty} s_j^{1/q} \right)^d \qquad \text{Fail}$$

Coordinates equally important, no interactions: $w_\ell = \Gamma_1 = 1$, $\Gamma_r = 0 \ \forall r > 1$
$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} = 1 + d \sum_{j=1}^{\infty} s_j^{1/q} \qquad \text{Success}$$

Coordinates differ in importance, interactions equally important: $\Gamma_r = 1$
$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}}^{1/q} \le \exp\left( \sum_{k=1}^{\infty} w_k^{1/q} \sum_{j=1}^{\infty} s_j^{1/q} \right) \qquad \text{Success}$$
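The contrast between the failing and succeeding cases is stark even for modest $d$. A minimal numeric sketch (with a toy smoothness sequence $s_j = 4^{1-j}$ of my own choosing, so $\sum_j s_j^{1/q} = 2$ for $q = 2$):

```python
q = 2.0
S = sum((4.0 ** (1 - j)) ** (1 / q) for j in range(1, 30))   # sum_j s_j^{1/q} ~ 2

for d in [10, 50, 100]:
    # all coordinates/orders equally important vs. no interactions
    print(d, (1 + S) ** d, 1 + d * S)   # exponential vs. linear growth in d
```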
Algorithm When Both $\boldsymbol{\gamma}$ and $\|f\|_{\infty,\boldsymbol{\gamma}}$ Are Inferred

Require: $\boldsymbol{\Gamma}$ = vector of order sizes; $\boldsymbol{s}$ = vector of smoothness degrees; $w^* = \max_k w_k$; $n_0$ = minimum number of wavenumbers in each coordinate; $C$ = inflation factor; $\hat{f}$ = a black-box series coefficient generator for the function of interest, $f$, where $\|f\|_{\infty,\boldsymbol{\gamma}} \le C \|(\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}}\|_{\infty,\boldsymbol{\gamma}}$, $\mathcal{J} := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 0, \ldots, n_0\}$ for all $\boldsymbol{\gamma}$; $\varepsilon$ = positive absolute error tolerance
Ensure: $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$
1: Evaluate $\hat{f}(\boldsymbol{j})$ for $\boldsymbol{j} \in \mathcal{J}$
2: Define $\boldsymbol{w} = \min \operatorname{argmin}_{\boldsymbol{w} \le w^*} \|(\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}}\|_{\infty,\boldsymbol{\gamma}}$
3: Let $n = \min \left\{ n' : \sum_{i=n'+1}^{\infty} \gamma_{\boldsymbol{j}_i} \le \dfrac{\varepsilon}{C \|(\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}}\|_{\infty,\boldsymbol{\gamma}}} \right\}$
4: Compute $f_{\mathrm{app}} = \sum_{i=1}^{n} \hat{f}(\boldsymbol{j}_i) \phi_{\boldsymbol{j}_i}$

Computational cost is $n = O\left( \left( \varepsilon^{-1} C \|f\|_{\infty,\boldsymbol{\gamma}} \|\boldsymbol{\gamma}\|_{1/q} \right)^{1/(q-1)} \right)$
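Here is a minimal sketch of this algorithm. The oracle `fhat_oracle`, the toy smoothness degrees, the choice $\Gamma_r = 1$, and the finite candidate set are all illustrative stand-ins of mine for pieces the slide leaves abstract:

```python
import itertools
import numpy as np

def adaptive_series_approx(fhat_oracle, d, n0=4, C=2.0, eps=1e-3):
    s = lambda j: 4.0 ** (1 - j)                 # toy smoothness degrees, s_1 = 1
    axis = lambda k, j: tuple(j if l == k else 0 for l in range(d))

    # Step 1: pilot sample J = {j e_k : j = 0..n0, k = 1..d}.
    pilot = [(0,) * d] + [axis(k, j) for k in range(d) for j in range(1, n0 + 1)]
    fh = {j: fhat_oracle(j) for j in pilot}

    # Step 2: infer coordinate importances w_k from the pilot coefficients.
    fmax = [max(abs(fh[axis(k, j)]) / s(j) for j in range(1, n0 + 1)) for k in range(d)]
    w = [fk / max(fmax) for fk in fmax]

    def gamma(j):                                # POSD weight with Gamma_r = 1
        factors = [w[k] * s(jk) for k, jk in enumerate(j) if jk > 0]
        return float(np.prod(factors)) if factors else 1.0

    norm = max(abs(fh[j]) / gamma(j) for j in pilot if gamma(j) > 0)

    # Step 3: rank a finite candidate set (fine for small d) by gamma and keep
    # terms until the remaining tail sum is at most eps / (C * observed norm).
    cand = sorted(itertools.product(range(n0 + 1), repeat=d), key=gamma, reverse=True)
    g = np.array([gamma(j) for j in cand])
    tail = g.sum() - np.cumsum(g)                # tail[i] = sum of g[i+1:]
    n = int(np.argmax(tail <= eps / (C * norm))) + 1
    return cand[:n]                              # wavenumbers defining f_app

# e.g., with a toy oracle whose coefficients decay geometrically:
print(adaptive_series_approx(lambda j: 0.5 ** sum(j), d=3)[:5])
```

In the talk the candidate set is effectively infinite and the tail sum can be computed in closed form (see the appendix slide “Tail Sum of $\boldsymbol{\gamma}$”); the finite set here is purely for illustration.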
Example

$f$ manufactured in terms of random series coefficients [figure omitted]
A Gap Between Theory and Practice

The theory above uses series coefficients; practice uses function values. [Photo credit: Xinhua]
A Very Sparse Grid on $[-1, 1]^d$

    j                                       0      1       2       3        4        ···
    van der Corput t_j                      0      1/2     1/4     3/4      1/8      ···
    ψ(t_j) := 2(t_j + 1/3 mod 1) − 1        −1/3   2/3     1/6     −5/6     −1/12    ···
    ψ(t_j) := −cos(π(t_j + 1/3 mod 1))      −0.5   0.8660  0.2588  −0.9659  −0.1305  ···

To estimate $\hat{f}(\boldsymbol{j})$, $\boldsymbol{j} \in \mathcal{J}$, use the design $\{(\psi(t_{j_1}), \ldots, \psi(t_{j_d})) : \boldsymbol{j} \in \mathcal{J}\}$, e.g., for
$$\mathcal{J} = \{(0,0,0,0), (1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1), (2,0,0,0), (3,0,0,0), (1,1,0,0)\}$$

[Figure: the resulting Even Points and ArcCos Points designs]
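A minimal sketch (not the talk's code) that generates the base-2 van der Corput sequence by radical inversion and applies the two shifted maps $\psi$; running it reproduces the table above:

```python
import math

def van_der_corput(j, base=2):
    """Radical inverse of j: t_0 = 0, t_1 = 1/2, t_2 = 1/4, t_3 = 3/4, ..."""
    t, denom = 0.0, 1.0
    while j > 0:
        j, digit = divmod(j, base)
        denom *= base
        t += digit / denom
    return t

def psi_even(t):      # evenly spaced points in [-1, 1)
    return 2 * ((t + 1 / 3) % 1) - 1

def psi_arccos(t):    # arccos-distributed points in [-1, 1]
    return -math.cos(math.pi * ((t + 1 / 3) % 1))

for j in range(5):
    t = van_der_corput(j)
    print(j, t, round(psi_even(t), 4), round(psi_arccos(t), 4))
# reproduces the table: (0, 0.0, -0.3333, -0.5), (1, 0.5, 0.6667, 0.866), ...
```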
Algorithm Using Function Values When Both $\boldsymbol{\gamma}$ and $\|f\|_{\infty,\boldsymbol{\gamma}}$ Are Inferred

Require: $\boldsymbol{\Gamma}$ = vector of order sizes; $\boldsymbol{s}$ = vector of smoothness degrees; $w^* = \max_k w_k$; $n_0$ = minimum number of wavenumbers in each coordinate; $C$ = inflation factor; $f$ = a black-box function value generator; $\varepsilon$ = positive absolute error tolerance
Ensure: $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$
1: Approximate $\hat{f}(\boldsymbol{j})$ for $\boldsymbol{j} \in \mathcal{J} := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 1, \ldots, n_0\}$ by interpolating the function data $\{(\boldsymbol{x}_{\boldsymbol{j}}, f(\boldsymbol{x}_{\boldsymbol{j}})) : \boldsymbol{x}_{\boldsymbol{j}} = (\psi(t_{j_1}), \ldots, \psi(t_{j_d})), \ \boldsymbol{j} \in \mathcal{J}\}$
2: Define $\boldsymbol{w} = \min \operatorname{argmin}_{\boldsymbol{w} \le w^*} \|(\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}}\|_{\infty,\boldsymbol{\gamma}}$
3: while $C \|(\hat{f}_{\boldsymbol{j}})_{\boldsymbol{j} \in \mathcal{J}}\|_{\infty,\boldsymbol{\gamma}} \sum_{\boldsymbol{j} \notin \mathcal{J}} \gamma_{\boldsymbol{j}} > \varepsilon$ do
4:   Add $\operatorname{argmax}_{\boldsymbol{j} \notin \mathcal{J}} \gamma_{\boldsymbol{j}}$ to $\mathcal{J}$
5:   Approximate $\hat{f}(\boldsymbol{j})$ for $\boldsymbol{j} \in \mathcal{J}$ by interpolating the function data $\{(\boldsymbol{x}_{\boldsymbol{j}}, f(\boldsymbol{x}_{\boldsymbol{j}})) : \boldsymbol{x}_{\boldsymbol{j}} = (\psi(t_{j_1}), \ldots, \psi(t_{j_d})), \ \boldsymbol{j} \in \mathcal{J}\}$
6: end while
7: Compute $f_{\mathrm{app}} = \sum_{\boldsymbol{j} \in \mathcal{J}} \hat{f}(\boldsymbol{j}) \phi_{\boldsymbol{j}}$
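Steps 1 and 5 hinge on interpolation from the very sparse design. Below is a minimal sketch of that step under simplifying assumptions of mine: one design point per wavenumber, the tensor-product Chebyshev basis, the ArcCos map from the sparse-grid slide, and a design matrix assumed nonsingular:

```python
import math
import numpy as np

def vdc(j):                       # base-2 van der Corput (as in the grid slide)
    t, denom = 0.0, 1.0
    while j > 0:
        j, digit = divmod(j, 2)
        denom *= 2
        t += digit / denom
    return t

def psi(t):                       # ArcCos map, matching the Chebyshev basis
    return -math.cos(math.pi * ((t + 1 / 3) % 1))

def phi(j, x):                    # tensor-product Chebyshev basis function
    return float(np.prod(np.cos(np.asarray(j) * np.arccos(np.asarray(x)))))

def interpolate_coefficients(f, J):
    """One design point per wavenumber j in J: x_j = (psi(t_{j_1}), ..., psi(t_{j_d})).
    Solve sum_j fhat_j phi_j(x_i) = f(x_i), assuming the matrix is nonsingular."""
    design = [tuple(psi(vdc(jl)) for jl in j) for j in J]
    Phi = np.array([[phi(j, x) for j in J] for x in design])
    y = np.array([f(x) for x in design])        # the expensive evaluations
    return dict(zip(J, np.linalg.solve(Phi, y)))

J = [(0, 0), (1, 0), (0, 1), (2, 0)]
print(interpolate_coefficients(lambda x: np.exp(x[0]) * (1 + x[1]), J))
```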
Example³

$f(\boldsymbol{x}) = \exp((x_2 + 1)(x_3 + 1)/4) \cos((x_2 + 1)/2 + (x_3 + 1)/2)$, $d = 6$ [figure omitted]

³Bingham, D. & Surjanovic, S. Virtual Library of Simulation Experiments. 2013. https://www.sfu.ca/~ssurjano/.
OTL Circuit Example⁴

[figure omitted]

⁴Bingham, D. & Surjanovic, S. Virtual Library of Simulation Experiments. 2013. https://www.sfu.ca/~ssurjano/.
Summary

Functions must be nice to succeed with few function values
  Ideas underlying experimental design and tractability show us how to define “nice”
    Effect sparsity, hierarchy, heredity, and smoothness
    Product, order, and smoothness dependent (POSD) weighted function spaces
Infer properties of $f$ from limited data ($\boldsymbol{\gamma}$, $\|f\|_{\infty,\boldsymbol{\gamma}}$, $\hat{f}$)
  Must assume some structure on the weights to make progress at all
The design is determined by the wavenumbers included in the approximation via van der Corput points, which preserves a low condition number of the design matrix
Gap in theory when sampling function values versus series coefficients
  Sample size seems to be larger than necessary
Can we also infer the smoothness weights?
Thank you

These slides are available at speakerdeck.com/fjhickernell/samsi-qmc-transition-2018-may
References

Novak, E. & Woźniakowski, H. Tractability of Multivariate Problems Volume I: Linear Information. EMS Tracts in Mathematics 6 (European Mathematical Society, Zürich, 2008).
Kühn, T., Sickel, W. & Ullrich, T. Approximation numbers of Sobolev embeddings—Sharp constants and tractability. J. Complexity 30, 95–116 (2014).
Wu, C. F. J. & Hamada, M. Experiments: Planning, Analysis, and Parameter Design Optimization. (John Wiley & Sons, Inc., New York, 2000).
Bingham, D. & Surjanovic, S. Virtual Library of Simulation Experiments. 2013. https://www.sfu.ca/~ssurjano/.
Appendix: In What Sense Is This Optimal?

Recall that
$$\gamma_{\boldsymbol{j}_1} \ge \gamma_{\boldsymbol{j}_2} \ge \cdots, \quad f_{\mathrm{app}}(\boldsymbol{x}) = \sum_{i=1}^{n} \hat{f}(\boldsymbol{j}_i) \phi_{\boldsymbol{j}_i}, \quad \|f - f_{\mathrm{app}}\|_\infty \underbrace{\le}_{\text{loose}} \|\widehat{f - f_{\mathrm{app}}}\|_1 \underbrace{\le}_{\text{tight, optimal}} \|f\|_{\infty,\boldsymbol{\gamma}} \sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i}$$

For any other approximation, $g$, based on series coefficients $\{\hat{f}(\boldsymbol{j})\}_{\boldsymbol{j} \in \mathcal{J}}$ with $|\mathcal{J}| = n$,

$$\sup_{\substack{h : \|\hat{h}\|_{\infty,\boldsymbol{\gamma}} = \|f\|_{\infty,\boldsymbol{\gamma}} \\ \hat{h}(\boldsymbol{j}) = \hat{f}(\boldsymbol{j}) \ \forall \boldsymbol{j} \in \mathcal{J}}} \|\hat{h} - \hat{g}\|_1 = \left\| \left( \hat{f}(\boldsymbol{j}) - \hat{g}(\boldsymbol{j}) \right)_{\boldsymbol{j} \in \mathcal{J}} \right\|_1 + \sup_{h : \|\hat{h}\|_{\infty,\boldsymbol{\gamma}} = \|f\|_{\infty,\boldsymbol{\gamma}}} \left\| \left( \hat{h}(\boldsymbol{j}) - \hat{g}(\boldsymbol{j}) \right)_{\boldsymbol{j} \notin \mathcal{J}} \right\|_1$$
$$\ge \sup_{h : \|\hat{h}\|_{\infty,\boldsymbol{\gamma}} = \|f\|_{\infty,\boldsymbol{\gamma}}} \left\| \left( \hat{h}(\boldsymbol{j}) \right)_{\boldsymbol{j} \notin \mathcal{J}} \right\|_1 = \sup_{h : \|\hat{h}\|_{\infty,\boldsymbol{\gamma}} = \|f\|_{\infty,\boldsymbol{\gamma}}} \|\hat{h}\|_{\infty,\boldsymbol{\gamma}} \sum_{\boldsymbol{j} \notin \mathcal{J}} \gamma_{\boldsymbol{j}} = \|f\|_{\infty,\boldsymbol{\gamma}} \sum_{\boldsymbol{j} \notin \mathcal{J}} \gamma_{\boldsymbol{j}} \ge \|f\|_{\infty,\boldsymbol{\gamma}} \sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i}$$

so the worst-case error of $g$ is no smaller than that of $f_{\mathrm{app}}$ built from the $n$ wavenumbers with the largest $\gamma_{\boldsymbol{j}}$.
Inferring $\boldsymbol{\gamma}$ from Data

Given (estimates of) series coefficients $\hat{f}(\boldsymbol{j})$ for $\boldsymbol{j} \in \mathcal{J} := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 1, \ldots, n_0\}$, and fixed $\{\Gamma_r\}_{r=0}^{d}$ and $\{s_j\}_{j=0}^{\infty}$, note that

$$\big\| (\hat{f}(\boldsymbol{j}))_{\boldsymbol{j} \in \mathcal{J}} \big\|_{\infty,\boldsymbol{\gamma}} = \max_{\boldsymbol{j} \in \mathcal{J}} \frac{|\hat{f}(\boldsymbol{j})|}{\gamma_{\boldsymbol{j}}} = \frac{1}{\Gamma_1} \max_{k=1,\ldots,d} \frac{f_{k,\max}}{w_k}, \qquad f_{k,\max} := \sup_{j=1,\ldots,n_0} \frac{|\hat{f}(j \boldsymbol{e}_k)|}{s_j}$$

We choose
$$w_k = \frac{f_{k,\max}}{\max_\ell f_{\ell,\max}}, \qquad \big\| (\hat{f}(\boldsymbol{j}))_{\boldsymbol{j} \in \mathcal{J}} \big\|_{\infty,\boldsymbol{\gamma}} = \frac{\max_\ell f_{\ell,\max}}{\Gamma_1}$$
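A minimal sketch of this choice (the pilot coefficients and smoothness degrees below are made-up numbers for illustration):

```python
def infer_w(fhat_axis, s):
    """fhat_axis[k][j-1] ~ fhat(j e_k) for j = 1..n0; s[j] = smoothness degree.
    Returns w_k = f_{k,max} / max_l f_{l,max} as on this slide."""
    fmax = [max(abs(fkj) / s[j] for j, fkj in enumerate(fk, start=1))
            for fk in fhat_axis]
    top = max(fmax)
    return [fk / top for fk in fmax]

# e.g., two coordinates, n0 = 3 pilot coefficients each:
s = {1: 1.0, 2: 0.25, 3: 0.0625}
print(infer_w([[0.9, 0.2, 0.01], [0.3, 0.05, 0.002]], s))   # [1.0, 0.333...]
```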
Tail Sum of $\boldsymbol{\gamma}$

The term
$$\sum_{i=n+1}^{\infty} \gamma_{\boldsymbol{j}_i} = \sum_{i=1}^{\infty} \gamma_{\boldsymbol{j}_i} - \sum_{i=1}^{n} \gamma_{\boldsymbol{j}_i}$$
appears in the error bound. For certain $\boldsymbol{\gamma}$ of product-and-smoothness-dependent (PSD) form (POSD with $\Gamma_r = 1$), we can compute the first sum on the right in closed form:

$$\sum_{\boldsymbol{j} \in \mathbb{N}_0^d} \gamma_{\boldsymbol{j}} = \sum_{\mathfrak{u} \subseteq 1{:}d} \left( \prod_{\ell \in \mathfrak{u}} w_\ell \right) \left( \sum_{j=1}^{\infty} s_j \right)^{|\mathfrak{u}|} = \prod_{\ell=1}^{d} (1 + w_\ell s_{\mathrm{sum}}), \qquad s_{\mathrm{sum}} = \sum_{j=1}^{\infty} s_j$$
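This identity is easy to confirm numerically when the smoothness sequence is truncated to finitely many terms; a minimal check with toy values of $w$ and $s$ (my own, for illustration):

```python
import itertools
import numpy as np

d, m = 3, 6
w = [1.0, 0.5, 0.25]                                   # coordinate importances
s = [4.0 ** (1 - j) for j in range(1, m + 1)]          # s_1 = 1, truncated at m terms

# Brute-force sum of gamma_j over all j in {0..m}^d ...
brute = sum(
    float(np.prod([w[l] * s[j[l] - 1] for l in range(d) if j[l] > 0] or [1.0]))
    for j in itertools.product(range(m + 1), repeat=d)
)
# ... matches the closed-form product prod_l (1 + w_l * s_sum).
closed = float(np.prod([1 + wl * sum(s) for wl in w]))
print(brute, closed)   # agree (both use the same truncation of s at m terms)
```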