Bayesian Deep Learning (ベイズ深層学習)
4.2 to 4.6
Variational Bayes
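$$
\begin{aligned}
D_{\mathrm{KL}}[q(Z;\xi) \,\|\, p(Z \mid X)]
&= -\int q(Z;\xi) \ln \frac{p(Z \mid X)}{q(Z;\xi)}\, dZ
= -\int q(Z;\xi) \ln \frac{p(Z, X)}{q(Z;\xi)\, p(X)}\, dZ \\
&= \ln p(X) - \int q(Z;\xi) \ln \frac{p(Z, X)}{q(Z;\xi)}\, dZ
= \ln p(X) - \mathcal{L}(\xi)
\end{aligned}
$$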
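Since $D_{\mathrm{KL}}(q \,\|\, p) \ge 0$, it follows that $\ln p(X) \ge \mathcal{L}(\xi)$; $\mathcal{L}(\xi)$ is the evidence lower bound (ELBO).
• Because $\ln p(X)$ does not depend on $\xi$, maximizing the ELBO is equivalent to minimizing the KL divergence.
• Computing the ELBO requires only the joint $p(Z, X)$, not the normalization constant $p(X)$.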
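As a quick numerical check of the identity $\ln p(X) = \mathcal{L}(\xi) + D_{\mathrm{KL}}[q \,\|\, p(Z \mid X)]$, the sketch below uses a hypothetical one-dimensional conjugate model that is not from the slides, $p(z) = \mathcal{N}(0, 1)$ and $p(x \mid z) = \mathcal{N}(z, 1)$, whose exact posterior $\mathcal{N}(x/2, 1/2)$ and evidence $\mathcal{N}(x \mid 0, 2)$ are known in closed form. The function names are illustrative.

```python
import math

# Hypothetical toy model (assumption, not from the slides):
#   p(z) = N(0, 1),  p(x | z) = N(z, 1)
# Exact posterior: p(z | x) = N(x / 2, 1 / 2); evidence: p(x) = N(x | 0, 2).


def log_evidence(x):
    """ln p(x) for the toy model."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x**2 / 4.0


def elbo(x, m, s2):
    """ELBO for q(z) = N(m, s2): E_q[ln p(x|z)] + E_q[ln p(z)] + entropy of q.
    Only the joint p(x, z) is used; p(x) never appears."""
    e_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    e_logprior = -0.5 * math.log(2 * math.pi) - 0.5 * (m**2 + s2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_loglik + e_logprior + entropy


def kl_gauss(m, s2, mu, var):
    """KL(N(m, s2) || N(mu, var)) between 1-D Gaussians."""
    return 0.5 * (math.log(var / s2) + (s2 + (m - mu) ** 2) / var - 1.0)


x = 1.3
for m, s2 in [(0.0, 1.0), (0.4, 0.3), (x / 2, 0.5)]:
    gap = log_evidence(x) - elbo(x, m, s2)   # ln p(x) - ELBO
    kl = kl_gauss(m, s2, x / 2, 0.5)         # KL(q || exact posterior)
    print(f"m={m:.2f}, s2={s2:.2f}: gap={gap:.6f}, KL={kl:.6f}")
```

For every $q$ the gap equals the KL divergence, and it vanishes when $q$ is the exact posterior, where the bound is tight.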
Linear dimensionality reduction
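Observation model
$$
p(X \mid Z, W) = \prod_{n=1}^{N} p(x_n \mid z_n, W) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid W z_n, \sigma_x^2 I)
$$
Prior
$$
p(Z) = \prod_{n=1}^{N} \mathcal{N}(z_n \mid 0, I), \qquad p(W) = \prod_{i,j} \mathcal{N}(w_{ij} \mid 0, \sigma_w^2)
$$
Variational posterior
$$
p(Z, W \mid X) \approx q(Z)\, q(W)
$$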
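Objective
$$
\mathcal{L} = \mathbb{E}_{q(Z)q(W)}[\ln p(X \mid Z, W)] - D_{\mathrm{KL}}[q(Z) \,\|\, p(Z)] - D_{\mathrm{KL}}[q(W) \,\|\, p(W)]
$$
The first term is the expected log-likelihood; the two KL terms act as regularizers that keep $q(Z)$ and $q(W)$ close to their priors.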
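Variational M-step: maximize $\mathcal{L}_{q_i(Z)}$ with respect to $q_{i+1}(W)$, with $q(Z) = q_i(Z)$ held fixed
$$
\begin{aligned}
\mathcal{L}_{q_i(Z)} &= \mathbb{E}_{q_i(Z)\, q_{i+1}(W)}[\ln p(X \mid Z, W)] - D_{\mathrm{KL}}[q_{i+1}(W) \,\|\, p(W)] + \text{const.} \\
&= \mathbb{E}_{q_{i+1}(W)}\left[\ln \frac{\exp\left(\mathbb{E}_{q_i(Z)}[\ln p(X \mid Z, W)]\right) p(W)}{q_{i+1}(W)}\right] + \text{const.} \\
&= -D_{\mathrm{KL}}[q_{i+1}(W) \,\|\, r_i(W)] + \text{const.}
\end{aligned}
$$
Here $r_i(W)$ is the normalized distribution proportional to the numerator, and the constant collects every term that does not depend on $q_{i+1}(W)$. The objective is maximized when the KL divergence is zero:
$$
\therefore\ q_{i+1}(W) = r_i(W) \propto \exp\left(\mathbb{E}_{q_i(Z)}[\ln p(X \mid Z, W)]\right) p(W)
$$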
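Variational E-step: maximize $\mathcal{L}_{q_i(W)}$ with respect to $q_{i+1}(Z)$, with $q(W) = q_i(W)$ held fixed. The same argument gives
$$
q_{i+1}(Z) = r_i(Z) \propto \exp\left(\mathbb{E}_{q_i(W)}[\ln p(X \mid Z, W)]\right) p(Z)
$$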
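For this Gaussian model both updates have closed forms. The sketch below is a minimal version, assuming $\sigma_x$ and $\sigma_w$ are known and fixed, with Gaussian factors $q(z_n) = \mathcal{N}(z_n \mid \bar z_n, \Sigma_z)$ and $q(w_i) = \mathcal{N}(w_i \mid \bar w_i, \Sigma_w)$ over the rows $w_i$ of $W$; these closed forms are our own derivation, not taken from the slides. One deliberate deviation: the M-step uses the freshly updated $q(Z)$ instead of $q_i(Z)$ (plain coordinate ascent), which makes the ELBO non-decreasing, and the sketch records the ELBO so this can be checked. All function names are hypothetical.

```python
import numpy as np


def elbo(X, Z_mean, Z_cov, W_mean, W_cov, s2, sw2):
    """ELBO for q(Z) = prod_n N(Z_mean[n], Z_cov) and q(W) = prod_i N(W_mean[i], W_cov)."""
    N, D = X.shape
    M = Z_cov.shape[0]
    EWtW = W_mean.T @ W_mean + D * W_cov   # E_q[W^T W]
    EZZ = Z_mean.T @ Z_mean + N * Z_cov    # sum_n E_q[z_n z_n^T]
    # E_q[sum_n ||x_n - W z_n||^2]
    sq = np.sum(X**2) - 2 * np.sum(X * (Z_mean @ W_mean.T)) + np.trace(EWtW @ EZZ)
    e_loglik = -0.5 * N * D * np.log(2 * np.pi * s2) - 0.5 * sq / s2
    logdet_z = np.linalg.slogdet(Z_cov)[1]
    kl_z = 0.5 * (N * np.trace(Z_cov) + np.sum(Z_mean**2) - N * M - N * logdet_z)
    logdet_w = np.linalg.slogdet(W_cov)[1]
    kl_w = 0.5 * (D * np.trace(W_cov) / sw2 + np.sum(W_mean**2) / sw2
                  - D * M + D * M * np.log(sw2) - D * logdet_w)
    return e_loglik - kl_z - kl_w


def vem_linear_dimred(X, M, sigma_x=1.0, sigma_w=1.0, n_iter=50, seed=0):
    """Mean-field variational EM for x_n ~ N(W z_n, sigma_x^2 I), z_n ~ N(0, I),
    w_ij ~ N(0, sigma_w^2). sigma_x and sigma_w are assumed known."""
    N, D = X.shape
    s2, sw2 = sigma_x**2, sigma_w**2
    rng = np.random.default_rng(seed)
    W_mean = rng.normal(size=(D, M))   # row i = E_q[w_i]
    W_cov = np.eye(M)                  # covariance shared by all rows of q(W)
    elbos = []
    for _ in range(n_iter):
        # E-step: q(z_n) proportional to exp(E_q(W)[ln p(x_n | z_n, W)]) p(z_n)
        Z_cov = np.linalg.inv((W_mean.T @ W_mean + D * W_cov) / s2 + np.eye(M))
        Z_mean = X @ W_mean @ Z_cov / s2
        # M-step: q(w_i) proportional to exp(E_q(Z)[ln p(X | Z, W)]) p(w_i), new q(Z)
        EZZ = Z_mean.T @ Z_mean + N * Z_cov
        W_cov = np.linalg.inv(EZZ / s2 + np.eye(M) / sw2)
        W_mean = X.T @ Z_mean @ W_cov / s2
        elbos.append(elbo(X, Z_mean, Z_cov, W_mean, W_cov, s2, sw2))
    return Z_mean, Z_cov, W_mean, W_cov, elbos


# Usage on synthetic data (illustrative values only).
rng = np.random.default_rng(1)
Z_true = rng.normal(size=(200, 2))
W_true = rng.normal(size=(5, 2))
X = Z_true @ W_true.T + 0.1 * rng.normal(size=(200, 5))
Z_mean, Z_cov, W_mean, W_cov, elbos = vem_linear_dimred(X, M=2, sigma_x=0.1)
print(elbos[0], elbos[-1])
```

Each update is the $q \propto \exp(\mathbb{E}[\ln p])\,p$ rule above specialized to Gaussians: the expectation of the quadratic log-likelihood stays quadratic, so the optimal factor is again Gaussian.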
Gaussian mixture model
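Observation model
$$
p(X \mid S, W) = \prod_{n=1}^{N} p(x_n \mid s_n, W) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid W s_n, \sigma_x^2 I), \qquad s_n \in \{0,1\}^K, \quad \sum_{k=1}^{K} s_{n,k} = 1
$$
Because $s_n$ is one-hot, $W s_n$ picks out the column $w_k$ of $W$ for the cluster $k$ with $s_{n,k} = 1$.
Prior
$$
p(S) = \prod_{n=1}^{N} \mathrm{Cat}(s_n \mid \pi)
$$
Variational posterior
$$
p(S, W \mid X) \approx q(S)\, q(W)
$$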
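The same variational EM applies to the mixture model. The sketch below is a minimal version under assumptions the slide does not state: an isotropic prior $w_k \sim \mathcal{N}(0, \sigma_w^2 I)$ on each column of $W$, and known $\pi$, $\sigma_x$, $\sigma_w$. Under these assumptions $q(s_n)$ is categorical (the responsibilities) and each $q(w_k)$ is an isotropic Gaussian $\mathcal{N}(m_k, \lambda_k^{-1} I)$. `vem_gmm` and the data are hypothetical.

```python
import numpy as np


def vem_gmm(X, pi, init_means, sigma_x=1.0, sigma_w=10.0, n_iter=30):
    """Mean-field variational EM for x_n ~ N(W s_n, sigma_x^2 I), s_n ~ Cat(pi).
    ASSUMPTION (not on the slide): prior w_k ~ N(0, sigma_w^2 I) on each column
    of W; pi, sigma_x and sigma_w are known."""
    N, D = X.shape
    s2, sw2 = sigma_x**2, sigma_w**2
    m = np.array(init_means, dtype=float)   # E_q[w_k], shape (K, D)
    lam = np.full(len(pi), 1.0 / sw2)       # q(w_k) = N(m_k, I / lam_k)
    for _ in range(n_iter):
        # E-step: ln q(s_nk = 1) = ln pi_k - E_q(W)||x_n - w_k||^2 / (2 s2) + const
        sq = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=-1) + D / lam
        logits = np.log(pi) - 0.5 * sq / s2
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)             # responsibilities E_q[s_nk]
        # M-step: Gaussian q(w_k) given the responsibilities
        lam = R.sum(axis=0) / s2 + 1.0 / sw2
        m = (R.T @ X) / s2 / lam[:, None]
    return R, m, lam


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])
R, m, lam = vem_gmm(X, pi=np.array([0.5, 0.5]), init_means=X[[0, -1]])
print(np.round(m, 2))
```

The term $D/\lambda_k$ is the posterior uncertainty of $w_k$ entering the E-step; with a point estimate of $W$ it would drop out and the E-step would reduce to the usual EM responsibilities.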
Laplace approximation
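Quadratic approximation of the log posterior around $Z_{\mathrm{MAP}}$
$$
p(Z \mid X) \approx \mathcal{N}\left(Z \mid Z_{\mathrm{MAP}}, \Lambda(Z_{\mathrm{MAP}})^{-1}\right), \qquad \Lambda(Z) = -\nabla_Z^2 \ln p(Z \mid X)
$$
$$
\because\ \ln p(Z \mid X) \approx \ln p(Z_{\mathrm{MAP}} \mid X) + \frac{1}{2}(Z - Z_{\mathrm{MAP}})^\top \left.\nabla_Z^2 \ln p(Z \mid X)\right|_{Z = Z_{\mathrm{MAP}}} (Z - Z_{\mathrm{MAP}})
$$
The first-order term of the Taylor expansion vanishes because
$$
\because\ \left.\nabla_Z \ln p(Z \mid X)\right|_{Z = Z_{\mathrm{MAP}}} = 0
$$
so exponentiating the quadratic gives a Gaussian whose precision matrix is $\Lambda(Z_{\mathrm{MAP}})$.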
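A minimal sketch of the Laplace approximation, assuming the gradient and Hessian of the unnormalized log posterior are available in closed form; Newton's method is used here to locate $Z_{\mathrm{MAP}}$, though any optimizer would do. The example target is a hypothetical one-dimensional density $\ln p(z \mid x) = (a-1)\ln z - bz + \text{const.}$, whose mode $(a-1)/b$ and curvature are known exactly.

```python
import numpy as np


def laplace_approx(grad, hess, z0, n_iter=100, tol=1e-10):
    """Gaussian N(z_map, Lambda^{-1}) around the mode of ln p(z | x).
    `grad` and `hess` give the gradient and Hessian of ln p(z | x); the
    normalization constant is not needed. Newton's method finds the mode,
    assuming the Hessian is negative definite along the way."""
    z = np.asarray(z0, dtype=float)
    for _ in range(n_iter):
        step = np.linalg.solve(hess(z), grad(z))
        z = z - step                 # Newton update z <- z - H^{-1} g
        if np.linalg.norm(step) < tol:
            break
    Lam = -hess(z)                   # Lambda(Z_MAP): negative Hessian at the mode
    return z, np.linalg.inv(Lam)     # mean and covariance of the approximation


# Hypothetical example: ln p(z | x) = (a - 1) ln z - b z + const.
# Mode (a - 1) / b = 2; Laplace variance (a - 1) / b^2 = 1.
a, b = 5.0, 2.0
grad = lambda z: np.array([(a - 1) / z[0] - b])
hess = lambda z: np.array([[-(a - 1) / z[0] ** 2]])
z_map, cov = laplace_approx(grad, hess, z0=[1.0])
print(z_map, cov)
```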
Moment matching
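Approximate $p(z)$ by an exponential-family distribution
$$
q(z;\eta) = h(z) \exp\left(\eta^\top t(z) - a(\eta)\right)
$$
by minimizing
$$
\begin{aligned}
D_{\mathrm{KL}}(p(z) \,\|\, q(z;\eta)) &= -\mathbb{E}_{p(z)}[\ln q(z;\eta)] + \mathbb{E}_{p(z)}[\ln p(z)] \\
&= -\eta^\top \mathbb{E}_{p(z)}[t(z)] + a(\eta) + \text{const.}
\end{aligned}
$$
Setting the gradient to zero and using $\nabla_\eta a(\eta) = \mathbb{E}_{q(z;\eta)}[t(z)]$:
$$
\nabla_\eta D_{\mathrm{KL}}(p(z) \,\|\, q(z;\eta)) = -\mathbb{E}_{p(z)}[t(z)] + \nabla_\eta a(\eta) = -\mathbb{E}_{p(z)}[t(z)] + \mathbb{E}_{q(z;\eta)}[t(z)] = 0
$$
$$
\therefore\ \mathbb{E}_{q(z;\eta)}[t(z)] = \mathbb{E}_{p(z)}[t(z)]
$$
That is, the optimal $q$ matches the expected sufficient statistics (the moments) of $p$.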
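To see this numerically, the sketch below fits a Gaussian ($t(z) = (z, z^2)$) to a hypothetical two-component mixture by matching its mean and second moment, then checks on a grid that perturbing either parameter increases $D_{\mathrm{KL}}(p \,\|\, q)$. The mixture parameters, the grid, and the helper names are illustrative assumptions.

```python
import math


def log_normal(z, mu, var):
    return -((z - mu) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def p(z):
    """Hypothetical target: 0.3 N(-2, 0.5^2) + 0.7 N(1.5, 1)."""
    return 0.3 * math.exp(log_normal(z, -2.0, 0.25)) + 0.7 * math.exp(log_normal(z, 1.5, 1.0))


DZ = 0.001
GRID = [-15.0 + DZ * k for k in range(30001)]


def expect_p(g):
    """E_p[g(z)] by a Riemann sum on GRID."""
    return sum(p(z) * g(z) for z in GRID) * DZ


def kl_p_q(mu, var):
    """KL(p || N(mu, var)) by a Riemann sum on GRID."""
    total = 0.0
    for z in GRID:
        pz = p(z)
        if pz > 0.0:
            total += pz * (math.log(pz) - log_normal(z, mu, var))
    return total * DZ


# Moment matching with t(z) = (z, z^2): match the mean and second moment of p.
mean = expect_p(lambda z: z)
var = expect_p(lambda z: z * z) - mean**2
best = kl_p_q(mean, var)
for mu_, var_ in [(mean + 0.3, var), (mean - 0.3, var), (mean, 1.3 * var), (mean, var / 1.3)]:
    print(f"KL at ({mu_:.3f}, {var_:.3f}) = {kl_p_q(mu_, var_):.5f} vs matched {best:.5f}")
```

Note the direction of the divergence: moment matching minimizes $D_{\mathrm{KL}}(p \,\|\, q)$, so the fitted Gaussian spreads over both modes instead of locking onto one.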
Assumed density filtering
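Consider estimation from a sequence of data $\mathcal{D}_1, \mathcal{D}_2, \dots$
With a conjugate prior, the posteriors $p_i(\theta)$ $(i = 0, 1, \dots)$ all belong to the same family:
$$
p_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, p_i(\theta)
$$
With a non-conjugate prior, …
Moment matching: starting from $q_0(\theta) = p_0(\theta)$, approximate each updated distribution $r_{i+1}(\theta)$ by a member $q_{i+1}(\theta)$ of the same family as $q_i(\theta)$, chosen by moment matching:
$$
q_{i+1}(\theta) \approx r_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta) = Z_{i+1}^{-1} f_{i+1}(\theta)\, q_i(\theta)
$$
where $f_{i+1}(\theta) = p(\mathcal{D}_{i+1} \mid \theta)$.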
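A minimal sketch of ADF with a one-dimensional Gaussian $q_i(\theta)$, computing the moments of $r_{i+1}$ by brute-force quadrature on a fixed grid (an assumption for illustration; practical implementations use closed forms where available). With Gaussian likelihood factors the prior is conjugate, so ADF should recover the exact posterior, which gives a built-in check; the probit factors show the non-conjugate case. The data values and function names are hypothetical.

```python
import math

DT = 0.001
GRID = [-10.0 + DT * k for k in range(20001)]


def adf_gaussian(factors, mu0, v0):
    """Assumed density filtering with q_i(theta) = N(mu_i, v_i).
    Each step forms r_{i+1}, proportional to f_{i+1}(theta) q_i(theta), on GRID
    and moment-matches it back to a Gaussian (mean and variance)."""
    mu, v = mu0, v0
    for f in factors:
        w = [f(t) * math.exp(-((t - mu) ** 2) / (2 * v)) for t in GRID]  # unnormalized r_{i+1}
        total = sum(w)   # proportional to Z_{i+1}; constant factors cancel below
        mu_new = sum(wi * t for wi, t in zip(w, GRID)) / total
        v = sum(wi * (t - mu_new) ** 2 for wi, t in zip(w, GRID)) / total
        mu = mu_new
    return mu, v


# Conjugate check (hypothetical data): likelihood N(y | theta, 1), prior N(0, 1).
# ADF must reproduce the exact posterior: precision 1 + n, mean sum(y) / (1 + n).
ys = [0.5, 1.2, -0.3, 0.8]
gauss_factors = [lambda t, y=y: math.exp(-0.5 * (y - t) ** 2) for y in ys]
print(adf_gaussian(gauss_factors, 0.0, 1.0))


# Non-conjugate case: probit factors f_i(theta) = Phi(y_i * theta), y_i in {-1, +1}.
def probit(y):
    return lambda t: 0.5 * math.erfc(-y * t / math.sqrt(2.0))


print(adf_gaussian([probit(1), probit(1), probit(-1)], 0.0, 1.0))
```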
MM with 1-dim Gaussian distribution
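$$
q_i(\theta) = \mathcal{N}(\theta \mid \mu_i, v_i)
$$
where $v_i$ is the variance.
Normalization constant
$$
Z_{i+1} = \int f_{i+1}(\theta) \frac{1}{\sqrt{2\pi v_i}} \exp\left(-\frac{(\theta - \mu_i)^2}{2 v_i}\right) d\theta
$$
Differentiate $\ln Z_{i+1}$ with respect to $\mu_i$
$$
\frac{\partial}{\partial \mu_i} \ln Z_{i+1} = \frac{1}{Z_{i+1}} \int f_{i+1}(\theta)\, \mathcal{N}(\theta \mid \mu_i, v_i)\, \frac{\theta - \mu_i}{v_i}\, d\theta = \frac{\mathbb{E}_{r_{i+1}}[\theta] - \mu_i}{v_i}
$$
$$
\therefore\ \mu_{i+1} = \mathbb{E}_{r_{i+1}}[\theta] = \mu_i + v_i \frac{\partial}{\partial \mu_i} \ln Z_{i+1}
$$
Differentiate $\ln Z_{i+1}$ with respect to $v_i$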
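The $v_i$ derivative is not worked out in this text. Differentiating $\ln Z_{i+1}$ the same way gives $\partial \ln Z_{i+1}/\partial v_i = (\mathbb{E}_{r_{i+1}}[(\theta-\mu_i)^2] - v_i)/(2v_i^2)$, which leads to the standard companion update $v_{i+1} = v_i - v_i^2\left[(\partial_{\mu_i} \ln Z_{i+1})^2 - 2\,\partial_{v_i} \ln Z_{i+1}\right]$. The sketch below checks both updates against direct quadrature for a hypothetical probit factor $f(\theta) = \Phi(\theta)$, for which $Z = \Phi(\mu/\sqrt{1+v})$ is known in closed form; the function names are illustrative.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))


def adf_step_from_lnZ(mu, v, dlnZ_dmu, dlnZ_dv):
    """Moment-matched Gaussian N(mu_new, v_new) from the derivatives of ln Z."""
    mu_new = mu + v * dlnZ_dmu
    v_new = v - v**2 * (dlnZ_dmu**2 - 2 * dlnZ_dv)
    return mu_new, v_new


def probit_lnZ_grads(mu, v):
    """Derivatives of ln Z for the hypothetical factor f(theta) = Phi(theta),
    where Z(mu, v) = Phi(mu / sqrt(1 + v))."""
    s = math.sqrt(1 + v)
    z = mu / s
    rho = norm_pdf(z) / norm_cdf(z)          # d ln Phi(z) / dz
    return rho / s, -rho * z / (2 * s**2)    # (d/dmu, d/dv) of ln Z


mu, v = 0.3, 2.0
mu_new, v_new = adf_step_from_lnZ(mu, v, *probit_lnZ_grads(mu, v))

# Cross-check: moments of r(theta), proportional to Phi(theta) N(theta | mu, v), on a grid.
grid = [-20 + 0.001 * k for k in range(40001)]
w = [norm_cdf(t) * math.exp(-((t - mu) ** 2) / (2 * v)) for t in grid]
total = sum(w)
m_direct = sum(wi * t for wi, t in zip(w, grid)) / total
v_direct = sum(wi * (t - m_direct) ** 2 for wi, t in zip(w, grid)) / total
print(mu_new, m_direct, v_new, v_direct)
```

The appeal of this form is that only $\ln Z_{i+1}$ is needed: whenever the one-dimensional integral has a closed form, moment matching reduces to differentiating it.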
MM with Gamma distribution
MM for probit regression
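The marginal likelihood is intractable:
$$
p(Y \mid X, w) = \prod_{n=1}^{N} \phi(y_n \mid x_n, w), \qquad p(w) = \mathcal{N}(w \mid 0, v_0)
$$
$$
Z = \int p(Y \mid X, w)\, p(w)\, dw
$$
Instead, apply the recursive update (with $\theta = w$)
$$
q_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta)
$$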
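For probit regression each factor has a closed-form normalizer, so ADF needs no quadrature. A minimal sketch, assuming $\phi(y_n \mid x_n, w) = \Phi(y_n x_n^\top w)$ with $y_n \in \{-1, +1\}$, one data point per step ($\mathcal{D}_n = (x_n, y_n)$), and an isotropic prior covariance $v_0 I$. Then $Z_{i+1} = \Phi(z)$ with $z = y_n x_n^\top \mu_i / \sqrt{1 + x_n^\top \Sigma_i x_n}$, and the multivariate analogues of the 1-dim updates, $\mu_{i+1} = \mu_i + \Sigma_i \nabla_{\mu_i} \ln Z_{i+1}$ and $\Sigma_{i+1} = \Sigma_i - \Sigma_i(\nabla_{\mu_i}\ln Z_{i+1}\, \nabla_{\mu_i}\ln Z_{i+1}^\top - 2\nabla_{\Sigma_i}\ln Z_{i+1})\Sigma_i$, reduce to the rank-one updates in the code. The function names and synthetic data are hypothetical.

```python
import math

import numpy as np


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def adf_probit(X, y, v0=1.0):
    """One ADF pass for likelihoods Phi(y_n x_n^T w), y_n in {-1, +1}, with prior
    N(w | 0, v0 I). Returns the Gaussian approximation q(w) = N(mu, Sigma)."""
    D = X.shape[1]
    mu = np.zeros(D)
    Sigma = v0 * np.eye(D)
    for x, yn in zip(X, y):
        Sx = Sigma @ x
        s = math.sqrt(1.0 + x @ Sx)      # sqrt(1 + x^T Sigma x)
        z = yn * (x @ mu) / s             # Z_{i+1} = Phi(z)
        rho = norm_pdf(z) / norm_cdf(z)   # d ln Phi(z) / dz; unstable for very negative z
        mu = mu + (yn * rho / s) * Sx     # mu + Sigma grad_mu ln Z
        Sigma = Sigma - (rho * (rho + z) / s**2) * np.outer(Sx, Sx)
    return mu, Sigma


# Synthetic data drawn from the probit model itself (hypothetical w_true).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(500, 2))
y = np.where(X @ w_true + rng.normal(size=500) > 0, 1.0, -1.0)
mu, Sigma = adf_probit(X, y, v0=1.0)
cos = mu @ w_true / (np.linalg.norm(mu) * np.linalg.norm(w_true))
print(mu, cos)
```

Since $0 < \rho(\rho + z) < 1$ and $x^\top \Sigma x / s^2 < 1$, each covariance update shrinks $\Sigma$ along $\Sigma x$ while keeping it positive definite.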
Expectation propagation

20191026 bayes dl (Taku Yoshioka)