Harmonic Analysis & Deep Learning
Sungbin Lim
In this talk…
A mathematical theory of filters, activations, and
pooling through multiple layers, based on deep CNNs
Encompasses the general ingredients:
Lipschitz continuity & Deformation sensitivity
WARNING: Very tough mathematics
…but without non-Euclidean geometry (e.g. Geometric DL)
What is Harmonic Analysis?
$$f(x) = \sum_{n \in \mathbb{N}} a_n \varphi_n(x), \qquad a_n := \langle f, \varphi_n \rangle_{\mathcal{H}}$$
How can we represent a function efficiently in the
sense of a Hilbert space?
Number theory
Signal processing
Quantum mechanics
Neuroscience, Statistics, Finance, etc…
Includes PDE theory and Stochastic Analysis
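To make the expansion concrete, here is a minimal numerical sketch, assuming the real Fourier basis of $L^2[0, 2\pi)$ and a square-wave $f$ (both illustrative choices, not from the talk): the coefficients $a_n = \langle f, \varphi_n \rangle$ are computed by quadrature, and the truncated sum converges to $f$ in $L^2$.

```python
# A minimal sketch of f = sum_n a_n phi_n with a_n = <f, phi_n>_H, using the
# real Fourier basis of L^2[0, 2pi). The choice of f and the truncation
# levels N are illustrative assumptions.
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
dx = x[1] - x[0]
f = np.sign(np.sin(x))  # a square wave: discontinuous, but in L^2

def inner(u, v):
    """<u, v>_{L^2} approximated by a Riemann sum."""
    return np.sum(u * v) * dx

def reconstruct(f, N):
    # Orthonormal basis: 1/sqrt(2pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)
    phi0 = np.ones_like(x) / np.sqrt(2.0 * np.pi)
    approx = inner(f, phi0) * phi0
    for k in range(1, N + 1):
        for phi in (np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)):
            approx += inner(f, phi) * phi   # a_n * phi_n(x)
    return approx

for N in (1, 4, 16, 64):
    r = reconstruct(f, N)
    err = np.sqrt(inner(f - r, f - r))
    print(f"N = {N:3d}   L2 error = {err:.4f}")  # error decreases as N grows
```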
Hilbert space & Inner product
Banach space: Normed space + Completeness
  e.g. $\mathbb{C}^n, \ L^p, \ W^n_p, \ \cdots$
Hilbert space: Banach space + Inner product
  e.g. $\mathbb{R}^d, \ L^2, \ W^n_2, \ \cdots$
$$\langle u, v \rangle = \sum_{k=1}^{d} u_k v_k$$
$$\langle f, g \rangle_{L^2} = \int f(x)\, g(x)\, dx$$
$$\langle f, g \rangle_{W^n_2} = \langle f, g \rangle_{L^2} + \sum_{k=1}^{n} \langle \partial_x^k f, \partial_x^k g \rangle_{L^2}$$
© Kyung-Min Rho
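The three inner products above can be checked numerically; a small sketch, with the grid resolution and the test functions $f, g$ as illustrative assumptions:

```python
import numpy as np

# <u, v> = sum_k u_k v_k on R^d
u, v = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
print("R^d  :", np.dot(u, v))

# <f, g>_{L^2} = int f g dx, approximated by a Riemann sum on [0, 1)
x = np.linspace(0.0, 1.0, 10000, endpoint=False)
dx = x[1] - x[0]
f, g = np.sin(2 * np.pi * x), x * (1.0 - x)
print("L^2  :", np.sum(f * g) * dx)

# <f, g>_{W^1_2} = <f, g>_{L^2} + <f', g'>_{L^2}  (Sobolev inner product, n = 1)
fp, gp = np.gradient(f, dx), np.gradient(g, dx)
print("W^1_2:", np.sum(f * g) * dx + np.sum(fp * gp) * dx)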
Why Harmonic Analysis?
$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
Encoding: $P_n \mapsto (a_n, a_{n-1}, \ldots, a_1, a_0)$
Decoding: $(a_n, a_{n-1}, \ldots, a_1, a_0) \mapsto P_n$
Why do we prefer polynomials?
Stone-Weierstrass theorem
Polynomials are universal approximators!
$$\forall f \in C(X),\ \forall \varepsilon > 0,\ \exists P_n \ \text{s.t.}\ \max_{x \in X} |f(x) - P_n(x)| < \varepsilon$$
Equivalently:
$$\forall f \in C(X),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \| f - P_n \|_\infty = 0$$
We can even approximate derivatives!
$$\forall f \in C^k(X),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \| f - P_n \|_{C^k} = 0$$
Universal approximators = {DL, polynomials, trees, …}
But why don't we use polynomials?
© Wikipedia
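A quick empirical companion to the theorem, assuming an illustrative target $f(x) = |x|$ and least-squares Chebyshev fits as the $P_n$ (one of many valid choices): the sup-norm error decays as the degree grows.

```python
# A sketch of Weierstrass approximation in practice: Chebyshev polynomial
# fits of increasing degree drive the sup-norm error toward 0 for a
# continuous f. The target f and the degrees are illustrative assumptions.
import numpy as np
from numpy.polynomial import chebyshev as C

x = np.linspace(-1.0, 1.0, 5001)
f = np.abs(x)  # continuous, but not differentiable at 0

for n in (2, 8, 32, 128):
    coeffs = C.chebfit(x, f, deg=n)               # least-squares fit P_n
    sup_err = np.max(np.abs(f - C.chebval(x, coeffs)))
    print(f"deg = {n:3d}   max_x |f - P_n| = {sup_err:.5f}")
```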
Local interpolation works well in low dimensions
Need $\varepsilon^{-d}$ points to cover $[0, 1]^d$ at a distance $\varepsilon$
High dimension ⇒ Curse of dimensionality!
© S. Mallat, © H. Bölcskei
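The $\varepsilon^{-d}$ count can be made tangible in a few lines ($\varepsilon = 0.1$ is an illustrative choice):

```python
# A sketch of the covering count: roughly eps^{-d} grid points are needed
# to cover [0, 1]^d at distance eps. Values of eps and d are illustrative.
eps = 0.1
for d in (1, 2, 3, 10, 100):
    print(f"d = {d:3d}   points ~ eps^-d = {eps ** (-d):.3e}")
# d = 100 already needs ~1e100 points: local interpolation is hopeless.
```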
Universal approximator = Good feature extractor?
…in HIGH dimension!
Nonlinear Feature Extraction
© S. Mallat, © H. Bölcskei
Dimension Reduction ⇒ Invariants
How?
© S. Mallat
Main Topic in Harmonic Analysis
Linear operator ⇒ Convolution + Multiplier
$$L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\, \widehat{f}(\omega)$$
Discriminability vs Invariance
Littlewood-Paley Condition ⇒ Semi-discrete Frame
$$A \|f\|_{\mathcal{H}} \leq \| L[f] \|_{\mathcal{H}} \leq B \|f\|_{\mathcal{H}}$$
The lower bound gives discriminability:
$$\| L[f_1] - L[f_2] \|_{\mathcal{H}} = \| L[f_1 - f_2] \|_{\mathcal{H}} \geq A \| f_1 - f_2 \|_{\mathcal{H}}, \quad \text{i.e.}\ f_1 \neq f_2 \Rightarrow L[f_1] \neq L[f_2]$$
The upper bound controls composition across layers:
$$\| \underbrace{L \circ \cdots \circ L}_{n\text{-fold}}[f] \|_{\mathcal{H}} \leq B \| \underbrace{L \circ \cdots \circ L}_{(n-1)\text{-fold}}[f] \|_{\mathcal{H}} \leq \cdots \leq B^n \|f\|_{\mathcal{H}}$$
Banach fixed-point theorem
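A sketch of the convolution-multiplier identity above, assuming a discrete circular setting and an illustrative exponential kernel $K$: up to a reflection of $K$, the direct sum is the $\langle T_x[K], f \rangle$ form, and on the Fourier side $L$ acts by multiplication with $\widehat{K}$.

```python
# Applying L is circular convolution with a kernel K; on the Fourier side
# it is multiplication by K_hat. Kernel and signal are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 256
f = rng.standard_normal(n)
K = np.exp(-np.arange(n) / 4.0)   # illustrative filter kernel
K /= K.sum()

# Multiplier side: L[f]^ = K^ * f^
Lf_fourier = np.real(np.fft.ifft(np.fft.fft(K) * np.fft.fft(f)))

# Direct side: circular convolution (K * f)[x] = sum_m K[m] f[(x - m) mod n]
Lf_direct = np.array([sum(K[m] * f[(x - m) % n] for m in range(n))
                      for x in range(n)])

print(np.allclose(Lf_fourier, Lf_direct))  # True: convolution <-> multiplier
```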
Main Tasks in Deep CNN
Representation learning
Feature Extraction
Nonlinear transform
Lipschitz continuity, e.g. ReLU, tanh, sigmoid …
$$|f(x) - f(y)| \leq C \|x - y\| \iff \|\nabla f(x)\| \leq C$$
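The Lipschitz constants of these activations can be probed by sampling difference quotients; a sketch with illustrative sampling ranges (the sampled maxima lower-bound the true constants, which are $C = 1$ for ReLU and tanh and $C = 1/4$ for the sigmoid):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.uniform(-10, 10, 100000), rng.uniform(-10, 10, 100000)
mask = np.abs(x - y) > 1e-9   # avoid division by (near-)zero gaps

acts = {
    "ReLU": lambda t: np.maximum(t, 0.0),
    "tanh": np.tanh,
    "sigmoid": lambda t: 1.0 / (1.0 + np.exp(-t)),
}
for name, f in acts.items():
    # max |f(x) - f(y)| / |x - y| over random pairs estimates C from below
    C = np.max(np.abs(f(x[mask]) - f(y[mask])) / np.abs(x[mask] - y[mask]))
    print(f"{name:8s} estimated C = {C:.4f}")
```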
How to control the Lipschitz constant?
Theorem
$$\| \rho(L[f]) \|_{\mathcal{H}} \leq N(B, C) \|f\|_{\mathcal{H}}$$
No change in Invariance!
Proof) Let $\rho = \mathrm{ReLU}$, $\mathcal{H} = W^1_2$. Then
$$\begin{aligned}
\| \rho(L[f]) \|_{W^1_2} &= \| \max\{L[f], 0\} \|_{L^2} + \| \nabla \rho(L[f]) \|_{L^2} \\
&\leq \| L[f] \|_{L^2} + \| \underbrace{\rho'(L[f])}_{=1 \text{ or } 0} \nabla(L[f]) \|_{L^2} \\
&\leq \| L[f] \|_{L^2} + \| \nabla(L[f]) \|_{L^2} \\
&= \| L[f] \|_{W^1_2} \leq B \|f\|_{W^1_2}
\end{aligned}$$
What about Discriminability?
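A numerical sanity check of the proof, assuming the equivalent norm $\|u\| = \|u\|_{L^2} + \|u'\|_{L^2}$ on $W^1_2$ used above, and an illustrative stand-in for the filtered input $L[f]$: since ReLU is 1-Lipschitz with $\rho' \in \{0, 1\}$, applying it never increases the norm.

```python
# rho = ReLU never increases the (equivalent) W^1_2 norm. The signal Lf is
# an illustrative stand-in for a filtered input L[f], not the talk's exact one.
import numpy as np

x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
Lf = np.sin(6 * np.pi * x) * np.exp(-x)     # pretend this is L[f]

def sobolev_norm(u):
    """||u||_{L^2} + ||u'||_{L^2}, via Riemann sums and finite differences."""
    up = np.gradient(u, dx)
    return np.sqrt(np.sum(u * u) * dx) + np.sqrt(np.sum(up * up) * dx)

rho_Lf = np.maximum(Lf, 0.0)                # rho = ReLU
print(sobolev_norm(rho_Lf) <= sobolev_norm(Lf) + 1e-8)  # True
```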
Scale Invariant Feature
Translation Invariant
Stable to Deformation
© S. Mallat
Scattering Network (Mallat, 2012)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots \big| f * g_{\lambda^{(j)}} \big| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
© H. Bölcskei
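A minimal 1-D sketch of this cascade, under strong assumptions: a hypothetical Morlet-like filter bank stands in for the $g_\lambda$ (not Mallat's exact wavelets), and a Gaussian low-pass plays the role of the output filter $\chi_n$.

```python
# Two scattering layers: modulus of band-pass convolutions, with a low-pass
# output filter at each depth. Filter shapes and scales are illustrative.
import numpy as np

def gaussian(n, sigma):
    t = np.arange(n) - n // 2
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

def bandpass(n, sigma, freq):
    t = np.arange(n) - n // 2
    return np.exp(-t**2 / (2 * sigma**2)) * np.cos(freq * t)  # Morlet-like g_lambda

def conv(f, g):  # circular convolution via FFT, g centered at the origin
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(np.fft.ifftshift(g))))

n = 512
rng = np.random.default_rng(2)
f = rng.standard_normal(n)

filters = [bandpass(n, sigma=8.0, freq=w) for w in (0.5, 1.0, 2.0)]
chi = gaussian(n, sigma=16.0)

features = [conv(f, chi)]                        # n = 0 output
layer1 = [np.abs(conv(f, g)) for g in filters]   # propagated |f * g_lambda|
features += [conv(u, chi) for u in layer1]       # n = 1 outputs
layer2 = [np.abs(conv(u, g)) for u in layer1 for g in filters]
features += [conv(u, chi) for u in layer2]       # n = 2 outputs
print(len(features), "feature maps")             # 1 + 3 + 9 = 13
```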
Generalized Scattering Network (Wiatowski, 2015)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots \big| f * g_{\lambda^{(j)}} \big| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
Gabor frame
Tensor wavelet
Directional wavelet
Ridgelet frame
Curvelet frame
© H. Bölcskei
Generalized Scattering Network (Wiatowski, 2015)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots \big| f * g_{\lambda^{(j)}} \big| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
Linearize symmetries
“Space folding”, Cho (2014)
© S. Mallat
Generalized Scattering Network (Wiatowski, 2015)
Pooling: $f \mapsto S_n^{d/2}\, P_n(f)(S_n \cdot)$
Theorem
$$||| \Phi_n(T_t f) - \Phi_n(f) ||| = O\!\left( \frac{\|t\|}{\prod_{j=1}^{n} S_j} \right)$$
Features become more translation invariant
with increasing network depth
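A toy experiment matching the theorem's message, with a simplistic low-pass filter and mean pooling as illustrative stand-ins for the network's layers (the $\sqrt{S}$ factor mirrors the $S_n^{d/2}$ normalization for $d = 1$): the relative feature distance between $f$ and its translate shrinks with depth.

```python
# With pooling factor S_j > 1 at each layer, the feature distance between
# f and T_t f decays roughly like ||t|| / prod_j S_j. Filters are illustrative.
import numpy as np

def layer(u, S):
    g = np.exp(-np.arange(u.size) / 8.0)
    g /= g.sum()                                       # simplistic low-pass filter
    y = np.abs(np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(g))))  # conv + modulus
    return np.sqrt(S) * y.reshape(-1, S).mean(axis=1)  # f -> S^{1/2} P(f)(S .)

rng = np.random.default_rng(3)
f = rng.standard_normal(1024)
ft = np.roll(f, 8)                                     # translated input T_t f

u, ut = f, ft
for depth in range(1, 5):
    u, ut = layer(u, 2), layer(ut, 2)
    rel = np.linalg.norm(u - ut) / np.linalg.norm(u)
    print(f"depth {depth}: relative feature distance = {rel:.5f}")
# The distance shrinks with depth: features become more translation invariant.
```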
Generalized Scattering Network (Wiatowski, 2015)
Theorem
$$F_{\tau,\omega}[f](x) = e^{2\pi i \omega(x)} f(x - \tau(x))$$
$$||| \Phi(F_{\tau,\omega}[f]) - \Phi(f) ||| \leq C \left( \|\tau\|_\infty + \|\omega\|_\infty \right) \|f\|_{L^2}$$
Multi-layer convolutions linearize features,
i.e. they are stable to deformations
© Philip Scott Johnson
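A sketch of the deformation bound with $\omega = 0$, assuming an illustrative one-layer feature map $\Phi$ (modulus followed by Gaussian averaging) rather than the full network: the feature distance grows roughly linearly in $\|\tau\|_\infty$, as the theorem's right-hand side suggests.

```python
# Compare features of f and of F_tau[f](x) = f(x - tau(x)) for smooth, small
# deformations tau. The signal f and the feature map Phi are illustrative.
import numpy as np

n = 1024
x = np.linspace(0.0, 1.0, n, endpoint=False)
f = np.sin(2 * np.pi * 5 * x) + 0.5 * np.sin(2 * np.pi * 11 * x)

def Phi(u):
    # One-layer stand-in: modulus, then circular Gaussian low-pass averaging
    g = np.exp(-((np.arange(n) - n // 2) ** 2) / (2 * 20.0 ** 2))
    g /= g.sum()
    return np.real(np.fft.ifft(np.fft.fft(np.abs(u)) * np.fft.fft(np.fft.ifftshift(g))))

for amp in (0.001, 0.005, 0.02):
    tau = amp * np.sin(2 * np.pi * x)              # smooth tau, ||tau||_inf = amp
    f_tau = np.interp(x - tau, x, f, period=1.0)   # f(x - tau(x)), periodic
    d = np.linalg.norm(Phi(f_tau) - Phi(f)) / np.sqrt(n)
    print(f"||tau||_inf = {amp:.3f}   feature distance = {d:.5f}")
# Distance grows roughly linearly in ||tau||_inf, matching C(||tau||_inf + ...)||f||.
```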
Ergodic Reconstructions
© Philip Scott Johnson
© S. Mallat
David Hilbert
Wir müssen wissen. Wir werden wissen.
(We must know. We will know.)
Q&A
