Deep Learning Theory Lecture Note
Chapter 3 (part 2)
2022.04.06.
KAIST ALIN-LAB
Sangwoo Mo
• Maurey sampling technique
• Let 𝑋 = 𝔼𝑉 for random variable 𝑉 supported on a set 𝑆
• Finite-sample approx. of 𝑋 is X̂ = (1/𝑘) ∑ᵢ₌₁ᵏ 𝑉ᵢ for 𝑉ᵢ iid sampled from 𝑝(𝑉)
• Here, X̂ ≈ 𝑋 as 𝑘 → ∞ (precisely, 𝔼‖𝑋 − X̂‖² = 𝑂(1/𝑘))
• It is very intuitive… let’s prove it!
(3.3) How to sample finite networks?
𝑉 lies in a Hilbert space (i.e., a space with an inner product)
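As a warm-up before the formal statement, here is a minimal numerical sketch (my illustration, not part of the lecture note) of the claim: for 𝑉 drawn from a finite set 𝑆 with distribution 𝑝(𝑉), the squared error of the 𝑘-sample mean decays as 𝑂(1/𝑘). The choice of 𝑆, 𝑝, and all variable names are hypothetical.

```python
# Maurey sampling warm-up: empirically check E||X - X_hat||^2 = O(1/k).
# S, p, and all names below are illustrative choices, not from the note.
import numpy as np

rng = np.random.default_rng(0)
d, n_atoms = 10, 50
S = rng.normal(size=(n_atoms, d))         # support set S in R^d
p = rng.dirichlet(np.ones(n_atoms))       # distribution p(V) over S
X = p @ S                                 # X = E[V]
const = p @ (S ** 2).sum(axis=1) - X @ X  # E||V||^2 - ||X||^2 (cf. Lemma 3.1)

for k in [10, 100, 1000]:
    errs = []
    for _ in range(2000):                 # Monte Carlo over resamplings
        idx = rng.choice(n_atoms, size=k, p=p)
        X_hat = S[idx].mean(axis=0)       # X_hat = (1/k) * sum_i V_i
        errs.append(((X - X_hat) ** 2).sum())
    print(f"k={k:5d}  E||X-X_hat||^2 ≈ {np.mean(errs):.5f}"
          f"  (E||V||^2-||X||^2)/k = {const / k:.5f}")
```

The two printed columns should agree up to Monte Carlo noise, matching the exact rate stated in Lemma 3.1 below.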
• Maurey sampling technique
• Formal statement
(3.3) How to sample finite networks?
Lemma 3.1 (Maurey). Let 𝑋 = 𝔼𝑉 with 𝑉 supported on a set 𝑆 in a Hilbert space, and let 𝑉₁, …, 𝑉ₖ be iid copies of 𝑉. Then
𝔼‖𝑋 − (1/𝑘) ∑ᵢ 𝑉ᵢ‖² ≤ (𝔼‖𝑉‖² − ‖𝑋‖²)/𝑘 = 𝑂(1/𝑘), which goes to zero as 𝑘 → ∞.
Moreover, there exist fixed 𝑈₁, …, 𝑈ₖ ∈ 𝑆 achieving the same bound.
(1) We bound the 𝔼 form 𝔼‖𝑋 − (1/𝑘) ∑ᵢ 𝑉ᵢ‖²
(2) If the 𝔼 over 𝑉₁, …, 𝑉ₖ is some value 𝐾, there exists some realization 𝑈₁, …, 𝑈ₖ with value ≤ 𝐾
(we need only one realization of 𝑈₁, …, 𝑈ₖ that satisfies (1/𝑘) ∑ᵢ 𝑈ᵢ ≈ 𝑋; see the sketch below)
This technique is called the "probabilistic method"!
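Step (2) is just "minimum ≤ average". A tiny sketch (again my illustration, with hypothetical names, same setup as the previous sketch) making it concrete: among many random draws of (𝑉₁, …, 𝑉ₖ), the best draw always achieves error at most the mean error 𝐾.

```python
# Probabilistic method, concretely: min over realizations <= expectation K.
import numpy as np

rng = np.random.default_rng(1)
d, n_atoms, k = 10, 50, 100
S = rng.normal(size=(n_atoms, d))       # illustrative support set S
p = rng.dirichlet(np.ones(n_atoms))     # illustrative distribution p(V)
X = p @ S                               # X = E[V]

errs = []
for _ in range(1000):                   # many random realizations
    idx = rng.choice(n_atoms, size=k, p=p)
    errs.append(((X - S[idx].mean(axis=0)) ** 2).sum())
K = float(np.mean(errs))                # the expected error K
print(f"E[error] = {K:.5f}, best realization = {min(errs):.5f} <= K")
# The best draw certifies existence: one good (U_1, ..., U_k) suffices.
```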
• Maurey sampling technique
• Proof of Lemma 3.1
(3.3) How to sample finite networks?
Expanding 𝔼‖X̂ − 𝑋‖² = (1/𝑘²) ∑ᵢ,ⱼ 𝔼⟨𝑉ᵢ − 𝑋, 𝑉ⱼ − 𝑋⟩, the cross terms (𝑖 ≠ 𝑗) vanish by independence:
⟨𝔼𝑉ᵢ, 𝔼𝑉ⱼ⟩ − ⟨𝔼𝑉ᵢ, 𝑋⟩ − ⟨𝔼𝑉ⱼ, 𝑋⟩ + ‖𝑋‖² = ‖𝑋‖² − ‖𝑋‖² − ‖𝑋‖² + ‖𝑋‖² = 0
while each diagonal term (𝑖 = 𝑗) contributes:
𝔼[‖𝑉‖² − 2⟨𝑉, 𝑋⟩ + ‖𝑋‖²] = 𝔼‖𝑉‖² − 2‖𝑋‖² + ‖𝑋‖² = 𝔼‖𝑉‖² − ‖𝑋‖²
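Assembled into one chain (my transcription of the computation above, in the slide's notation):

```latex
\[
\mathbb{E}\bigl\|\hat{X} - X\bigr\|^{2}
  = \frac{1}{k^{2}} \sum_{i,j=1}^{k} \mathbb{E}\bigl\langle V_i - X,\; V_j - X\bigr\rangle
  = \frac{1}{k^{2}} \sum_{i=1}^{k} \mathbb{E}\bigl\|V_i - X\bigr\|^{2}
  = \frac{\mathbb{E}\|V\|^{2} - \|X\|^{2}}{k}.
\]
```

Here the second equality uses that the cross terms vanish by independence, and the last uses the diagonal computation 𝔼‖𝑉 − 𝑋‖² = 𝔼‖𝑉‖² − ‖𝑋‖².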
• Sampling finite-width network
• Lemma 3.1 assumes that 𝑝(𝑉) is a probability distribution – (1) nonnegative, (2) sums to 1
• However, the "weight distribution" of our infinite-width NN is not a probability!
• Our infinite-width NN
• The weight distribution of (𝑤, 𝑏) is sin ⋯ – (1) can be negative, (2) does not sum to 1
• Q. How do we extend Maurey sampling to a general weight distribution?
(3.3) How to sample finite networks?
• Sampling finite-width network
• For simplicity, let the infinite-width NN be 𝑓(𝑥) = ∫ 𝑔(𝑥; 𝑤) d𝜇(𝑤),
where 𝜇 is a signed measure over weight vectors 𝑤 ∈ ℝᵖ
• 𝑔 is some abstract node (e.g., 𝑔 𝑥; 𝑤 = 𝜎(𝑎⊺𝑥 + 𝑏) for 𝑤 = {𝑎, 𝑏})
• To convert the general signed measure 𝜇 to a probability measure,
1) Introduce a sign parameter 𝑠 ∈ {±1} and consider the nonnegative measures 𝜇₊, 𝜇₋
• For 𝜇 = 𝜇₊ − 𝜇₋, both 𝜇₊ and 𝜇₋ are nonnegative (Jordan decomposition)
• Multiply by 𝑠 = +1 on the 𝜇₊ region and 𝑠 = −1 on the 𝜇₋ region (Pr(𝑠 = +1) = ‖𝜇₊‖/‖𝜇‖)
2) Normalize the nonnegative measure to |𝜇|/‖𝜇‖ = (𝜇₊ + 𝜇₋)/‖𝜇‖ to make the sum 1
• Multiply the output 𝑔(𝑥; 𝑤, 𝑠) = 𝑠 ⋅ 𝑔(𝑥; 𝑤) by the normalizing constant ‖𝜇‖
• After the conversion, one can extend Maurey sampling to a general signed measure and bound ‖Infinite NN − Finite NN‖ (see the sketch below)
(3.3) How to sample finite networks?
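Putting steps 1) and 2) together, here is a minimal sketch (my illustration under stated assumptions, not the note's code) of Maurey sampling for a signed measure with finitely many atoms, taking 𝑔(𝑥; 𝑤) = 𝜎(𝑎⊺𝑥 + 𝑏) with ReLU 𝜎; all names and sizes are hypothetical.

```python
# Maurey sampling for a signed measure mu with atoms (A[i], b[i]) and
# signed weights mu_w[i]. Node: g(x; w) = relu(a^T x + b), w = (a, b).
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(2)
d, n_atoms, k = 5, 200, 1000
A = rng.normal(size=(n_atoms, d))   # hypothetical weight atoms a
b = rng.normal(size=n_atoms)        # hypothetical biases
mu_w = rng.normal(size=n_atoms)     # signed weights (the measure mu)

mu_norm = np.abs(mu_w).sum()        # ||mu|| (total variation)
s = np.sign(mu_w)                   # sign parameter s in {+1, -1}
p = np.abs(mu_w) / mu_norm          # probability measure |mu| / ||mu||

x = rng.normal(size=d)
f_inf = mu_w @ relu(A @ x + b)      # "infinite" NN: sum over all atoms

# Maurey sample: draw k nodes from p, output ||mu|| * mean of s_i g(x; w_i)
idx = rng.choice(n_atoms, size=k, p=p)
f_hat = mu_norm * np.mean(s[idx] * relu(A[idx] @ x + b[idx]))
print(f"infinite NN: {f_inf:.4f}, sampled k={k} NN: {f_hat:.4f}")
```

In expectation, ‖𝜇‖ ⋅ 𝔼[𝑠 ⋅ 𝑔(𝑥; 𝑤)] recovers the infinite NN exactly, so Lemma 3.1 bounds ‖Infinite NN − Finite NN‖ at each 𝑥.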
• Sampling finite-width network
• Applying Maurey sampling, the approximation error of the finite-width NN can be bounded for both earlier constructions:
• (3.1) Univariate case: Approx. error ≤ …
• (3.2) Barron's construction: Approx. error ≤ …
(3.3) How to sample finite networks?
Thank you for listening! 😀