Harmonic Analysis & Deep Learning
Sungbin Lim
In this talk…
A mathematical theory of filters, activations, and
pooling through multiple layers, based on deep CNNs
Encompasses the general ingredients:
Lipschitz continuity & Deformation sensitivity
WARNING: Very tough mathematics
…but without non-Euclidean geometry (e.g. Geometric DL)
What is Harmonic Analysis?
$$f(x) = \sum_{n \in \mathbb{N}} a_n \varphi_n(x), \qquad a_n := \langle f, \varphi_n \rangle_{\mathcal{H}}$$
How can we represent a function efficiently in the
sense of a Hilbert space?
Number theory
Signal processing
Quantum mechanics
Neuroscience, Statistics, Finance, etc…
Includes PDE theory and Stochastic Analysis
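To make the expansion concrete, here is a minimal numerical sketch, assuming the real Fourier basis of $L^2[0, 2\pi)$ and a square-wave $f$ (both illustrative choices, not from the talk): the coefficients $a_n = \langle f, \varphi_n \rangle$ are computed by quadrature, and the truncated sum converges to $f$ in $L^2$.

```python
# A minimal sketch of f = sum_n a_n phi_n with a_n = <f, phi_n>_H, using the
# real Fourier basis of L^2[0, 2pi). The choice of f and the truncation
# levels N are illustrative assumptions.
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
dx = x[1] - x[0]
f = np.sign(np.sin(x))  # a square wave: discontinuous, but in L^2

def inner(u, v):
    """<u, v>_{L^2} approximated by a Riemann sum."""
    return np.sum(u * v) * dx

def reconstruct(f, N):
    # Orthonormal basis: 1/sqrt(2pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)
    phi0 = np.ones_like(x) / np.sqrt(2.0 * np.pi)
    approx = inner(f, phi0) * phi0
    for k in range(1, N + 1):
        for phi in (np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)):
            approx += inner(f, phi) * phi   # a_n * phi_n(x)
    return approx

for N in (1, 4, 16, 64):
    r = reconstruct(f, N)
    err = np.sqrt(inner(f - r, f - r))
    print(f"N = {N:3d}   L2 error = {err:.4f}")  # error decreases as N grows
```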
Hilbert space & Inner product
Banach space: Normed space + Completeness
  e.g. $\mathbb{C}^n, \ L^p, \ W^n_p, \ \cdots$
Hilbert space: Banach space + Inner product
  e.g. $\mathbb{R}^d, \ L^2, \ W^n_2, \ \cdots$
$$\langle u, v \rangle = \sum_{k=1}^{d} u_k v_k$$
$$\langle f, g \rangle_{L^2} = \int f(x)\, g(x)\, dx$$
$$\langle f, g \rangle_{W^n_2} = \langle f, g \rangle_{L^2} + \sum_{k=1}^{n} \langle \partial_x^k f, \partial_x^k g \rangle_{L^2}$$
© Kyung-Min Rho
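The three inner products above can be checked numerically; a small sketch, with the grid resolution and the test functions $f, g$ as illustrative assumptions:

```python
import numpy as np

# <u, v> = sum_k u_k v_k on R^d
u, v = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
print("R^d  :", np.dot(u, v))

# <f, g>_{L^2} = int f g dx, approximated by a Riemann sum on [0, 1)
x = np.linspace(0.0, 1.0, 10000, endpoint=False)
dx = x[1] - x[0]
f, g = np.sin(2 * np.pi * x), x * (1.0 - x)
print("L^2  :", np.sum(f * g) * dx)

# <f, g>_{W^1_2} = <f, g>_{L^2} + <f', g'>_{L^2}  (Sobolev inner product, n = 1)
fp, gp = np.gradient(f, dx), np.gradient(g, dx)
print("W^1_2:", np.sum(f * g) * dx + np.sum(fp * gp) * dx)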
Why Harmonic Analysis?
$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
Encoding: $P_n \mapsto (a_n, a_{n-1}, \ldots, a_1, a_0)$
Decoding: $(a_n, a_{n-1}, \ldots, a_1, a_0) \mapsto P_n$
Why do we prefer polynomials?
Stone-Weierstrass theorem
Polynomials are universal approximators!
$$\forall f \in C(X),\ \forall \varepsilon > 0,\ \exists P_n \ \text{s.t.}\ \max_{x \in X} |f(x) - P_n(x)| < \varepsilon$$
Equivalently:
$$\forall f \in C(X),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \| f - P_n \|_\infty = 0$$
We can even approximate derivatives!
$$\forall f \in C^k(X),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \| f - P_n \|_{C^k} = 0$$
Universal approximators = {DL, polynomials, trees, …}
But why don't we use polynomials?
© Wikipedia
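A quick empirical companion to the theorem, assuming an illustrative target $f(x) = |x|$ and least-squares Chebyshev fits as the $P_n$ (one of many valid choices): the sup-norm error decays as the degree grows.

```python
# A sketch of Weierstrass approximation in practice: Chebyshev polynomial
# fits of increasing degree drive the sup-norm error toward 0 for a
# continuous f. The target f and the degrees are illustrative assumptions.
import numpy as np
from numpy.polynomial import chebyshev as C

x = np.linspace(-1.0, 1.0, 5001)
f = np.abs(x)  # continuous, but not differentiable at 0

for n in (2, 8, 32, 128):
    coeffs = C.chebfit(x, f, deg=n)               # least-squares fit P_n
    sup_err = np.max(np.abs(f - C.chebval(x, coeffs)))
    print(f"deg = {n:3d}   max_x |f - P_n| = {sup_err:.5f}")
```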
Local interpolation works well in low dimensions
Need $\varepsilon^{-d}$ points to cover $[0, 1]^d$ at a distance $\varepsilon$
High dimension ⇒ Curse of dimensionality!
© S. Mallat, © H. Bölcskei
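The $\varepsilon^{-d}$ count can be made tangible in a few lines ($\varepsilon = 0.1$ is an illustrative choice):

```python
# A sketch of the covering count: roughly eps^{-d} grid points are needed
# to cover [0, 1]^d at distance eps. Values of eps and d are illustrative.
eps = 0.1
for d in (1, 2, 3, 10, 100):
    print(f"d = {d:3d}   points ~ eps^-d = {eps ** (-d):.3e}")
# d = 100 already needs ~1e100 points: local interpolation is hopeless.
```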
Universal approximator = Good feature extractor?
…in HIGH dimension!
Nonlinear Feature Extraction
© S. Mallat, © H. Bölcskei
Dimension Reduction ⇒ Invariants
How?
© S. Mallat
Main Topic in Harmonic Analysis
Linear operator ⇒ Convolution + Multiplier
$$L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \widehat{K}(\omega)\, \widehat{f}(\omega)$$
Discriminability vs Invariance
Littlewood-Paley Condition ⇒ Semi-discrete Frame
$$A \|f\|_{\mathcal{H}} \leq \| L[f] \|_{\mathcal{H}} \leq B \|f\|_{\mathcal{H}}$$
The lower bound gives discriminability:
$$\| L[f_1] - L[f_2] \|_{\mathcal{H}} = \| L[f_1 - f_2] \|_{\mathcal{H}} \geq A \| f_1 - f_2 \|_{\mathcal{H}}, \quad \text{i.e.}\ f_1 \neq f_2 \Rightarrow L[f_1] \neq L[f_2]$$
The upper bound controls composition across layers:
$$\| \underbrace{L \circ \cdots \circ L}_{n\text{-fold}}[f] \|_{\mathcal{H}} \leq B \| \underbrace{L \circ \cdots \circ L}_{(n-1)\text{-fold}}[f] \|_{\mathcal{H}} \leq \cdots \leq B^n \|f\|_{\mathcal{H}}$$
Banach fixed-point theorem
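A sketch of the convolution-multiplier identity above, assuming a discrete circular setting and an illustrative exponential kernel $K$: up to a reflection of $K$, the direct sum is the $\langle T_x[K], f \rangle$ form, and on the Fourier side $L$ acts by multiplication with $\widehat{K}$.

```python
# Applying L is circular convolution with a kernel K; on the Fourier side
# it is multiplication by K_hat. Kernel and signal are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 256
f = rng.standard_normal(n)
K = np.exp(-np.arange(n) / 4.0)   # illustrative filter kernel
K /= K.sum()

# Multiplier side: L[f]^ = K^ * f^
Lf_fourier = np.real(np.fft.ifft(np.fft.fft(K) * np.fft.fft(f)))

# Direct side: circular convolution (K * f)[x] = sum_m K[m] f[(x - m) mod n]
Lf_direct = np.array([sum(K[m] * f[(x - m) % n] for m in range(n))
                      for x in range(n)])

print(np.allclose(Lf_fourier, Lf_direct))  # True: convolution <-> multiplier
```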
Main Tasks in Deep CNN
Representation learning
Feature Extraction
Nonlinear transform
Lipschitz continuity, e.g. ReLU, tanh, sigmoid …
$$|f(x) - f(y)| \leq C \|x - y\| \iff \|\nabla f(x)\| \leq C$$
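The Lipschitz constants of these activations can be probed by sampling difference quotients; a sketch with illustrative sampling ranges (the sampled maxima lower-bound the true constants, which are $C = 1$ for ReLU and tanh and $C = 1/4$ for the sigmoid):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.uniform(-10, 10, 100000), rng.uniform(-10, 10, 100000)
mask = np.abs(x - y) > 1e-9   # avoid division by (near-)zero gaps

acts = {
    "ReLU": lambda t: np.maximum(t, 0.0),
    "tanh": np.tanh,
    "sigmoid": lambda t: 1.0 / (1.0 + np.exp(-t)),
}
for name, f in acts.items():
    # max |f(x) - f(y)| / |x - y| over random pairs estimates C from below
    C = np.max(np.abs(f(x[mask]) - f(y[mask])) / np.abs(x[mask] - y[mask]))
    print(f"{name:8s} estimated C = {C:.4f}")
```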
How to control the Lipschitz constant?
Theorem
$$\| \rho(L[f]) \|_{\mathcal{H}} \leq N(B, C) \|f\|_{\mathcal{H}}$$
No change in Invariance!
Proof) Let $\rho = \mathrm{ReLU}$, $\mathcal{H} = W^1_2$. Then
$$\begin{aligned}
\| \rho(L[f]) \|_{W^1_2} &= \| \max\{L[f], 0\} \|_{L^2} + \| \nabla \rho(L[f]) \|_{L^2} \\
&\leq \| L[f] \|_{L^2} + \| \underbrace{\rho'(L[f])}_{=1 \text{ or } 0} \nabla(L[f]) \|_{L^2} \\
&\leq \| L[f] \|_{L^2} + \| \nabla(L[f]) \|_{L^2} \\
&= \| L[f] \|_{W^1_2} \leq B \|f\|_{W^1_2}
\end{aligned}$$
What about Discriminability?
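A numerical sanity check of the proof, assuming the equivalent norm $\|u\| = \|u\|_{L^2} + \|u'\|_{L^2}$ on $W^1_2$ used above, and an illustrative stand-in for the filtered input $L[f]$: since ReLU is 1-Lipschitz with $\rho' \in \{0, 1\}$, applying it never increases the norm.

```python
# rho = ReLU never increases the (equivalent) W^1_2 norm. The signal Lf is
# an illustrative stand-in for a filtered input L[f], not the talk's exact one.
import numpy as np

x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
Lf = np.sin(6 * np.pi * x) * np.exp(-x)     # pretend this is L[f]

def sobolev_norm(u):
    """||u||_{L^2} + ||u'||_{L^2}, via Riemann sums and finite differences."""
    up = np.gradient(u, dx)
    return np.sqrt(np.sum(u * u) * dx) + np.sqrt(np.sum(up * up) * dx)

rho_Lf = np.maximum(Lf, 0.0)                # rho = ReLU
print(sobolev_norm(rho_Lf) <= sobolev_norm(Lf) + 1e-8)  # True
```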
Scale Invariant Feature
Translation Invariant
Stable to Deformation
© S. Mallat
Scattering Network (Mallat, 2012)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots \big| f * g_{\lambda^{(j)}} \big| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
© H. Bölcskei
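A minimal 1-D sketch of this cascade, under strong assumptions: a hypothetical Morlet-like filter bank stands in for the $g_\lambda$ (not Mallat's exact wavelets), and a Gaussian low-pass plays the role of the output filter $\chi_n$.

```python
# Two scattering layers: modulus of band-pass convolutions, with a low-pass
# output filter at each depth. Filter shapes and scales are illustrative.
import numpy as np

def gaussian(n, sigma):
    t = np.arange(n) - n // 2
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

def bandpass(n, sigma, freq):
    t = np.arange(n) - n // 2
    return np.exp(-t**2 / (2 * sigma**2)) * np.cos(freq * t)  # Morlet-like g_lambda

def conv(f, g):  # circular convolution via FFT, g centered at the origin
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(np.fft.ifftshift(g))))

n = 512
rng = np.random.default_rng(2)
f = rng.standard_normal(n)

filters = [bandpass(n, sigma=8.0, freq=w) for w in (0.5, 1.0, 2.0)]
chi = gaussian(n, sigma=16.0)

features = [conv(f, chi)]                        # n = 0 output
layer1 = [np.abs(conv(f, g)) for g in filters]   # propagated |f * g_lambda|
features += [conv(u, chi) for u in layer1]       # n = 1 outputs
layer2 = [np.abs(conv(u, g)) for u in layer1 for g in filters]
features += [conv(u, chi) for u in layer2]       # n = 2 outputs
print(len(features), "feature maps")             # 1 + 3 + 9 = 13
```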
Generalized Scattering Network (Wiatowski, 2015)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots \big| f * g_{\lambda^{(j)}} \big| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
Gabor frame
Tensor wavelet
Directional wavelet
Ridgelet frame
Curvelet frame
© H. Bölcskei
Generalized Scattering Network (Wiatowski, 2015)
$$\Phi(f) = \bigcup_{n} \Big\{ \underbrace{\cdots \big| f * g_{\lambda^{(j)}} \big| * g_{\lambda^{(k)}} \cdots * g_{\lambda^{(p)}}}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)}, \cdots, \lambda^{(p)}}$$
Linearize symmetries
“Space folding”, Cho (2014)
© S. Mallat
Generalized Scattering Network (Wiatowski, 2015)
Pooling: $f \mapsto S_n^{d/2}\, P_n(f)(S_n \cdot)$
Theorem
$$||| \Phi_n(T_t f) - \Phi_n(f) ||| = O\!\left( \frac{\|t\|}{\prod_{j=1}^{n} S_j} \right)$$
Features become more translation invariant
with increasing network depth
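A toy experiment matching the theorem's message, with a simplistic low-pass filter and mean pooling as illustrative stand-ins for the network's layers (the $\sqrt{S}$ factor mirrors the $S_n^{d/2}$ normalization for $d = 1$): the relative feature distance between $f$ and its translate shrinks with depth.

```python
# With pooling factor S_j > 1 at each layer, the feature distance between
# f and T_t f decays roughly like ||t|| / prod_j S_j. Filters are illustrative.
import numpy as np

def layer(u, S):
    g = np.exp(-np.arange(u.size) / 8.0)
    g /= g.sum()                                       # simplistic low-pass filter
    y = np.abs(np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(g))))  # conv + modulus
    return np.sqrt(S) * y.reshape(-1, S).mean(axis=1)  # f -> S^{1/2} P(f)(S .)

rng = np.random.default_rng(3)
f = rng.standard_normal(1024)
ft = np.roll(f, 8)                                     # translated input T_t f

u, ut = f, ft
for depth in range(1, 5):
    u, ut = layer(u, 2), layer(ut, 2)
    rel = np.linalg.norm(u - ut) / np.linalg.norm(u)
    print(f"depth {depth}: relative feature distance = {rel:.5f}")
# The distance shrinks with depth: features become more translation invariant.
```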
Generalized Scattering Network (Wiatowski, 2015)
Theorem
$$F_{\tau,\omega}[f](x) = e^{2\pi i \omega(x)} f(x - \tau(x))$$
$$||| \Phi(F_{\tau,\omega}[f]) - \Phi(f) ||| \leq C \left( \|\tau\|_\infty + \|\omega\|_\infty \right) \|f\|_{L^2}$$
Multi-layer convolutions linearize features,
i.e. they are stable to deformations
© Philip Scott Johnson
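A sketch of the deformation bound with $\omega = 0$, assuming an illustrative one-layer feature map $\Phi$ (modulus followed by Gaussian averaging) rather than the full network: the feature distance grows roughly linearly in $\|\tau\|_\infty$, as the theorem's right-hand side suggests.

```python
# Compare features of f and of F_tau[f](x) = f(x - tau(x)) for smooth, small
# deformations tau. The signal f and the feature map Phi are illustrative.
import numpy as np

n = 1024
x = np.linspace(0.0, 1.0, n, endpoint=False)
f = np.sin(2 * np.pi * 5 * x) + 0.5 * np.sin(2 * np.pi * 11 * x)

def Phi(u):
    # One-layer stand-in: modulus, then circular Gaussian low-pass averaging
    g = np.exp(-((np.arange(n) - n // 2) ** 2) / (2 * 20.0 ** 2))
    g /= g.sum()
    return np.real(np.fft.ifft(np.fft.fft(np.abs(u)) * np.fft.fft(np.fft.ifftshift(g))))

for amp in (0.001, 0.005, 0.02):
    tau = amp * np.sin(2 * np.pi * x)              # smooth tau, ||tau||_inf = amp
    f_tau = np.interp(x - tau, x, f, period=1.0)   # f(x - tau(x)), periodic
    d = np.linalg.norm(Phi(f_tau) - Phi(f)) / np.sqrt(n)
    print(f"||tau||_inf = {amp:.3f}   feature distance = {d:.5f}")
# Distance grows roughly linearly in ||tau||_inf, matching C(||tau||_inf + ...)||f||.
```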
Ergodic Reconstructions
© Philip Scott Johnson
© S. Mallat
David Hilbert
Wir müssen wissen. Wir werden wissen.
(We must know. We will know.)
Q&A
