Deep Learning Theory Lecture Note
Chapter 3 (part 2)
2022.04.06.
KAIST ALIN-LAB
Sangwoo Mo
• Maurey sampling technique
• Let 𝑋 = 𝔼𝑉 for random variable 𝑉 supported on a set 𝑆
• Finite-sample approx. of 𝑋 is X̂ = (1/𝑘) ∑ᵢ₌₁ᵏ 𝑉ᵢ for 𝑉ᵢ iid sampled from 𝑝(𝑉)
• Here, X̂ ≈ 𝑋 as 𝑘 → ∞ (precisely, 𝔼‖𝑋 − X̂‖² = 𝑂(1/𝑘))
• It is very intuitive… let’s prove it!
(3.3) How to sample finite networks?
𝑉 lies in a Hilbert space (i.e., a space with an inner product)
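As a warm-up before the formal statement, here is a minimal numerical sketch (my illustration, not part of the lecture note) of the claim: for 𝑉 drawn from a finite set 𝑆 with distribution 𝑝(𝑉), the squared error of the 𝑘-sample mean decays as 𝑂(1/𝑘). The choice of 𝑆, 𝑝, and all variable names are hypothetical.

```python
# Maurey sampling warm-up: empirically check E||X - X_hat||^2 = O(1/k).
# S, p, and all names below are illustrative choices, not from the note.
import numpy as np

rng = np.random.default_rng(0)
d, n_atoms = 10, 50
S = rng.normal(size=(n_atoms, d))         # support set S in R^d
p = rng.dirichlet(np.ones(n_atoms))       # distribution p(V) over S
X = p @ S                                 # X = E[V]
const = p @ (S ** 2).sum(axis=1) - X @ X  # E||V||^2 - ||X||^2 (cf. Lemma 3.1)

for k in [10, 100, 1000]:
    errs = []
    for _ in range(2000):                 # Monte Carlo over resamplings
        idx = rng.choice(n_atoms, size=k, p=p)
        X_hat = S[idx].mean(axis=0)       # X_hat = (1/k) * sum_i V_i
        errs.append(((X - X_hat) ** 2).sum())
    print(f"k={k:5d}  E||X-X_hat||^2 ≈ {np.mean(errs):.5f}"
          f"  (E||V||^2-||X||^2)/k = {const / k:.5f}")
```

The two printed columns should agree up to Monte Carlo noise, matching the exact rate stated in Lemma 3.1 below.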
• Maurey sampling technique
• Formal statement
(3.3) How to sample finite networks?
Lemma 3.1 (Maurey). Let 𝑋 = 𝔼𝑉 with 𝑉 supported on a set 𝑆 in a Hilbert space, and let 𝑉₁, …, 𝑉ₖ be iid copies of 𝑉. Then
𝔼‖𝑋 − (1/𝑘) ∑ᵢ 𝑉ᵢ‖² ≤ (𝔼‖𝑉‖² − ‖𝑋‖²)/𝑘 = 𝑂(1/𝑘), which goes to zero as 𝑘 → ∞.
Moreover, there exist fixed 𝑈₁, …, 𝑈ₖ ∈ 𝑆 achieving the same bound.
(1) We bound the 𝔼 form 𝔼‖𝑋 − (1/𝑘) ∑ᵢ 𝑉ᵢ‖²
(2) If the 𝔼 over 𝑉₁, …, 𝑉ₖ is some value 𝐾, there exists some realization 𝑈₁, …, 𝑈ₖ with value ≤ 𝐾
(we need only one realization of 𝑈₁, …, 𝑈ₖ that satisfies (1/𝑘) ∑ᵢ 𝑈ᵢ ≈ 𝑋; see the sketch below)
This technique is called the "probabilistic method"!
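Step (2) is just "minimum ≤ average". A tiny sketch (again my illustration, with hypothetical names, same setup as the previous sketch) making it concrete: among many random draws of (𝑉₁, …, 𝑉ₖ), the best draw always achieves error at most the mean error 𝐾.

```python
# Probabilistic method, concretely: min over realizations <= expectation K.
import numpy as np

rng = np.random.default_rng(1)
d, n_atoms, k = 10, 50, 100
S = rng.normal(size=(n_atoms, d))       # illustrative support set S
p = rng.dirichlet(np.ones(n_atoms))     # illustrative distribution p(V)
X = p @ S                               # X = E[V]

errs = []
for _ in range(1000):                   # many random realizations
    idx = rng.choice(n_atoms, size=k, p=p)
    errs.append(((X - S[idx].mean(axis=0)) ** 2).sum())
K = float(np.mean(errs))                # the expected error K
print(f"E[error] = {K:.5f}, best realization = {min(errs):.5f} <= K")
# The best draw certifies existence: one good (U_1, ..., U_k) suffices.
```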
• Maurey sampling technique
• Proof of Lemma 3.1
(3.3) How to sample finite networks?
Expanding 𝔼‖X̂ − 𝑋‖² = (1/𝑘²) ∑ᵢ,ⱼ 𝔼⟨𝑉ᵢ − 𝑋, 𝑉ⱼ − 𝑋⟩, the cross terms (𝑖 ≠ 𝑗) vanish by independence:
⟨𝔼𝑉ᵢ, 𝔼𝑉ⱼ⟩ − ⟨𝔼𝑉ᵢ, 𝑋⟩ − ⟨𝔼𝑉ⱼ, 𝑋⟩ + ‖𝑋‖² = ‖𝑋‖² − ‖𝑋‖² − ‖𝑋‖² + ‖𝑋‖² = 0
while each diagonal term (𝑖 = 𝑗) contributes:
𝔼[‖𝑉‖² − 2⟨𝑉, 𝑋⟩ + ‖𝑋‖²] = 𝔼‖𝑉‖² − 2‖𝑋‖² + ‖𝑋‖² = 𝔼‖𝑉‖² − ‖𝑋‖²
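Assembled into one chain (my transcription of the computation above, in the slide's notation):

```latex
\[
\mathbb{E}\bigl\|\hat{X} - X\bigr\|^{2}
  = \frac{1}{k^{2}} \sum_{i,j=1}^{k} \mathbb{E}\bigl\langle V_i - X,\; V_j - X\bigr\rangle
  = \frac{1}{k^{2}} \sum_{i=1}^{k} \mathbb{E}\bigl\|V_i - X\bigr\|^{2}
  = \frac{\mathbb{E}\|V\|^{2} - \|X\|^{2}}{k}.
\]
```

Here the second equality uses that the cross terms vanish by independence, and the last uses the diagonal computation 𝔼‖𝑉 − 𝑋‖² = 𝔼‖𝑉‖² − ‖𝑋‖².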
• Sampling finite-width network
• Lemma 3.1 assumes that 𝑝(𝑉) is a probability distribution – (1) nonnegative, (2) sums to 1
• However, the "weight distribution" of our infinite-width NN is not a probability!
• Our infinite-width NN
• The weight distribution of (𝑤, 𝑏) is sin ⋯ – (1) can be negative, (2) does not sum to 1
• Q. How do we extend Maurey sampling to a general weight distribution?
(3.3) How to sample finite networks?
• Sampling finite-width network
• For simplicity, let the infinite-width NN be 𝑓(𝑥) = ∫ 𝑔(𝑥; 𝑤) d𝜇(𝑤),
where 𝜇 is a signed measure over weight vectors 𝑤 ∈ ℝᵖ
• 𝑔 is some abstract node (e.g., 𝑔 𝑥; 𝑤 = 𝜎(𝑎⊺𝑥 + 𝑏) for 𝑤 = {𝑎, 𝑏})
• To convert the general signed measure 𝜇 to a probability measure,
1) Introduce a sign parameter 𝑠 ∈ {±1} and consider the nonnegative measures 𝜇₊, 𝜇₋
• For 𝜇 = 𝜇₊ − 𝜇₋, both 𝜇₊ and 𝜇₋ are nonnegative (Jordan decomposition)
• Multiply by 𝑠 = +1 on the 𝜇₊ region and 𝑠 = −1 on the 𝜇₋ region (Pr(𝑠 = +1) = ‖𝜇₊‖/‖𝜇‖)
2) Normalize the nonnegative measure to |𝜇|/‖𝜇‖ = (𝜇₊ + 𝜇₋)/‖𝜇‖ to make the sum 1
• Multiply the output 𝑔(𝑥; 𝑤, 𝑠) = 𝑠 ⋅ 𝑔(𝑥; 𝑤) by the normalizing constant ‖𝜇‖
• After the conversion, one can extend Maurey sampling to a general signed measure and bound ‖Infinite NN − Finite NN‖ (see the sketch below)
(3.3) How to sample finite networks?
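Putting steps 1) and 2) together, here is a minimal sketch (my illustration under stated assumptions, not the note's code) of Maurey sampling for a signed measure with finitely many atoms, taking 𝑔(𝑥; 𝑤) = 𝜎(𝑎⊺𝑥 + 𝑏) with ReLU 𝜎; all names and sizes are hypothetical.

```python
# Maurey sampling for a signed measure mu with atoms (A[i], b[i]) and
# signed weights mu_w[i]. Node: g(x; w) = relu(a^T x + b), w = (a, b).
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(2)
d, n_atoms, k = 5, 200, 1000
A = rng.normal(size=(n_atoms, d))   # hypothetical weight atoms a
b = rng.normal(size=n_atoms)        # hypothetical biases
mu_w = rng.normal(size=n_atoms)     # signed weights (the measure mu)

mu_norm = np.abs(mu_w).sum()        # ||mu|| (total variation)
s = np.sign(mu_w)                   # sign parameter s in {+1, -1}
p = np.abs(mu_w) / mu_norm          # probability measure |mu| / ||mu||

x = rng.normal(size=d)
f_inf = mu_w @ relu(A @ x + b)      # "infinite" NN: sum over all atoms

# Maurey sample: draw k nodes from p, output ||mu|| * mean of s_i g(x; w_i)
idx = rng.choice(n_atoms, size=k, p=p)
f_hat = mu_norm * np.mean(s[idx] * relu(A[idx] @ x + b[idx]))
print(f"infinite NN: {f_inf:.4f}, sampled k={k} NN: {f_hat:.4f}")
```

In expectation, ‖𝜇‖ ⋅ 𝔼[𝑠 ⋅ 𝑔(𝑥; 𝑤)] recovers the infinite NN exactly, so Lemma 3.1 bounds ‖Infinite NN − Finite NN‖ at each 𝑥.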
• Sampling finite-width network
• Applying Maurey sampling, the approximation error of the finite-width NN can be bounded for both earlier constructions:
• (3.1) Univariate case: Approx. error ≤ …
• (3.2) Barron's construction: Approx. error ≤ …
(3.3) How to sample finite networks?
Thank you for listening! 😀