Paper Summary:
beta-VAE: Learning Basic Visual Concepts with a
Constrained Variational Framework
Jun-sik Choi
Department of Brain and Cognitive Engineering,
Korea University
November 9, 2019
Overview of beta-VAE [1]
β-VAE is an unsupervised framework for learning disentangled
representations of independent generative factors of visual data.
β-VAE adds an extra hyperparameter β to the VAE objective,
which constrains the encoding capacity of the latent bottleneck
and encourages a factorized latent representation.
A protocol for quantitatively comparing the degree of
disentanglement learnt by different models is also proposed.
Derivation of beta-VAE framework I
Assumption
Let D = {X, V, W}, where
  x ∈ R^N : images,
  v ∈ R^K : conditionally independent factors,
  w ∈ R^H : conditionally dependent factors,
  p(x|v, w) = Sim(v, w) : the true world simulator using the
  ground-truth generative factors.
An unsupervised deep generative model pθ(x|z) can learn the joint
distribution of x and z ∈ R^M (M ≥ K) by maximizing

  maxθ E_{p(z)} [pθ(x|z)],  so that  p∗θ(x|z) ≈ p(x|v, w) = Sim(v, w)
The aim is to ensure that the inference model qφ(z|x) captures the
independent generative factors v in a disentangled manner, while the
conditionally dependent factors w remain entangled in a separate
subset of z.
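
As a concrete instance of this setup, the paper's 2D shapes experiments
use ground-truth factors along the lines of the toy sketch below (the
variable name and value ranges are illustrative placeholders, not the
dataset's exact specification):

  # Toy illustration of the independent factor space v (values are placeholders):
  factors_v = {
      "shape":       ["square", "ellipse", "heart"],
      "scale":       (0.5, 1.0),    # continuous range
      "orientation": (0.0, 6.28),   # radians
      "pos_x":       (0.0, 1.0),
      "pos_y":       (0.0, 1.0),
  }
  # x = Sim(v, w) would render an image from a sample of these factors.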
Derivation of beta-VAE framework II
To encourage the disentangling property of qφ(z|x),
1. the prior p(z) is set to an isotropic unit Gaussian N(0, I);
2. qφ(z|x) is constrained to match the prior p(z).
This constrained optimisation problem can be expressed as:

  maxφ,θ Ex∼D [ Eqφ(z|x) [log pθ(x|z)] ]  subject to  DKL(qφ(z|x) ‖ p(z)) < ε

Applying the Lagrangian transformation under the KKT conditions gives:

  F(θ, φ, β; x, z) = Eqφ(z|x) [log pθ(x|z)] − β (DKL(qφ(z|x) ‖ p(z)) − ε)
                   ≥ L(θ, φ; x, z, β)
                   = Eqφ(z|x) [log pθ(x|z)] − β DKL(qφ(z|x) ‖ p(z))
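
For concreteness, the bound L translates directly into code. Below is a
minimal PyTorch sketch of the β-VAE loss, assuming a diagonal-Gaussian
encoder that outputs mu and log_var and a Bernoulli (logit-output)
decoder; the function name and tensor conventions are illustrative, not
taken from the paper.

  import torch
  import torch.nn.functional as F

  def beta_vae_loss(x, x_recon_logits, mu, log_var, beta=4.0):
      """Return -L(theta, phi; x, z, beta), to be minimized.

      Assumes a Bernoulli decoder (pixel logits, targets in [0, 1]) and a
      diagonal Gaussian posterior q(z|x) = N(mu, diag(exp(log_var)))
      with an isotropic unit Gaussian prior p(z) = N(0, I).
      """
      # Reconstruction term: E_q(z|x)[log p(x|z)], one-sample estimate.
      recon = F.binary_cross_entropy_with_logits(
          x_recon_logits, x, reduction="sum") / x.size(0)
      # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
      kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / x.size(0)
      # beta = 1 recovers the standard VAE; beta > 1 strengthens the bottleneck.
      return recon + beta * kl

  # Typical use inside a training step (encoder/decoder are placeholder modules):
  #   mu, log_var = encoder(x)
  #   z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterization
  #   loss = beta_vae_loss(x, decoder(z), mu, log_var, beta=4.0)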
Derivation of beta-VAE framework III
Meaning of β
1. β changes the degree of applied learning pressure during
training, thus encouraging different learnt representations.
2. When β = 1, β-VAE corresponds to the original VAE
formulation.
3. Setting β > 1 puts a stronger constraint on the latent
bottleneck than in the original VAE formulation.
4. The pressure to match the prior via the KL divergence limits the
capacity of z, encouraging the model to learn the most efficient
representation of the data (one disentangled along the conditionally
independent factors v); see the closed-form KL term below.
5. There is therefore a trade-off between reconstruction fidelity and
the quality of disentanglement.
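
Point 4 can be made precise. With the isotropic unit Gaussian prior and
the usual diagonal-Gaussian posterior qφ(z|x) = N(µ(x), diag(σ²(x))),
the KL term has the standard closed form

  DKL(qφ(z|x) ‖ p(z)) = (1/2) Σ_{j=1}^{M} ( µj² + σj² − log σj² − 1 )

Each latent unit contributes independently, so a larger β pushes µj → 0
and σj → 1 for any unit whose information does not pay for itself in
reconstruction, effectively switching that unit off. This is the
capacity limit behind the fidelity/disentanglement trade-off of point 5.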
Disentanglement Metric I
The paper's [1] description of how the disentanglement metric is
computed is rather involved, so I summarize it here as pseudocode.
Data: D = {V ∈ R^K, C ∈ R^H, X ∈ R^N};
l_clf : linear classifier;  q(z|x) ∼ N(µ(x), σ(x));
for b = 1 to B do
    Sample y_b from Unif{1, ..., K};
    for l = 1 to L do
        Sample v1 from p(v) and sample v2 from p(v);
        [v2]_{y_b} ← [v1]_{y_b};
        Sample c1 and c2 from p(c);
        x1 ← Sim(v1, c1) and x2 ← Sim(v2, c2);
        z1 ← µ(x1) and z2 ← µ(x2);
        z_diff^l ← |z1 − z2|;
    end
    z_diff^b ← (1/L) Σ_{l=1}^{L} z_diff^l;
    Pred_b ← l_clf(z_diff^b);
end
Loss = Σ_{b=1}^{B} CrossEntropy(Pred_b, y_b);
Update l_clf;
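
To make the protocol concrete, here is a self-contained Python sketch of
the metric on a toy setup. The simulator sim, the encoder mean
encode_mu, and all dimensions are hypothetical stand-ins invented for
illustration; in the paper these would be the ground-truth data
simulator and the trained model's encoder.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  K, H, M, L, B = 5, 2, 10, 64, 500   # factors, nuisances, latents, pairs per point, batch

  # Hypothetical stand-ins for Sim(v, c) and the encoder mean mu(x):
  W_sim = rng.normal(size=(K + H, 32))
  def sim(v, c):                      # toy "renderer": factors -> 32-d image vector
      return np.tanh(np.concatenate([v, c]) @ W_sim)

  W_enc = rng.normal(size=(32, M))
  def encode_mu(x):                   # toy encoder mean mu(x)
      return x @ W_enc

  Z_diff, labels = [], []
  for _ in range(B):
      y_b = rng.integers(K)           # index of the factor to keep fixed
      diffs = []
      for _ in range(L):
          v1, v2 = rng.uniform(size=K), rng.uniform(size=K)
          v2[y_b] = v1[y_b]           # share the chosen factor across the pair
          c1, c2 = rng.uniform(size=H), rng.uniform(size=H)
          z1, z2 = encode_mu(sim(v1, c1)), encode_mu(sim(v2, c2))
          diffs.append(np.abs(z1 - z2))
      Z_diff.append(np.mean(diffs, axis=0))   # z_diff^b
      labels.append(y_b)

  # Low-capacity linear classifier l_clf (cf. the VC-dimension remark below).
  clf = LogisticRegression(max_iter=1000).fit(np.array(Z_diff), np.array(labels))
  print("disentanglement metric accuracy:", clf.score(np.array(Z_diff), np.array(labels)))

For brevity this sketch reports training accuracy; in practice one would
score the classifier on held-out z_diff batches.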
Disentanglement Metric II
The linear classifier predicts which generative factor [v]_i is shared
across the pair of images.
The more disentangled the representation learnt by q(z|x), the higher
the classifier's accuracy.
The linear classifier should be very simple and have a low
VC-dimension in order to ensure that it has no capacity to perform
nonlinear disentangling itself.
Qualitative Results - 3D chairs
Figure: Qualitative results comparing the disentangling performance of
β-VAE (β = 5) against other methods.
Qualitative Results - 3D faces
Figure: Qualitative results comparing the disentangling performance of
β-VAE (β = 20) against other methods.
Qualitative Results - CelebA
Figure: Traversals of individual latents demonstrate the generative
factors that β-VAE discovered on CelebA.
Quantitative Results
Figure: (Left) Disentanglement metric classification accuracy on the 2D
shapes dataset. (Right) Positive correlation between the normalized β
and the latent size z for disentangled factor learning with a fixed
β-VAE architecture.
Quantitative Results
Figure: Representations learnt by β-VAE (β = 4)
Figure: Representations learnt by InfoGAN
References
I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot,
M. Botvinick, S. Mohamed, and A. Lerchner, "β-VAE:
Learning basic visual concepts with a constrained variational
framework," in ICLR, 2017.
