Paper Summary:
beta-VAE: Learning Basic Visual Concepts with a
Constrained Variational Framework
Jun-sik Choi
Department of Brain and Cognitive Engineering,
Korea University
November 9, 2019
Overview of beta-VAE [1]
β-VAE is an unsupervised framework for learning disentangled
representations of independent generative factors of visual data.
β-VAE adds an extra hyperparameter β to the VAE objective,
which constrains the encoding capacity of the latent bottleneck
and encourages a factorized latent representation.
A protocol for quantitatively comparing the degree of
disentanglement learnt by different models is also proposed.
Derivation of beta-VAE framework I
Assumption
Let D = {X, V, W}, where
  x ∈ R^N : images,
  v ∈ R^K : conditionally independent factors,
  w ∈ R^H : conditionally dependent factors,
  p(x|v, w) = Sim(v, w) : the true world simulator using the
  ground-truth generative factors.
An unsupervised deep generative model pθ(x|z) can learn the joint
distribution of x and z ∈ R^M (M ≥ K) by maximizing

  maxθ E_{p(z)} [pθ(x|z)],  so that  p∗θ(x|z) ≈ p(x|v, w) = Sim(v, w)
The aim is to ensure that the inference model qφ(z|x) captures the
independent generative factors v in a disentangled manner, while the
conditionally dependent factors w remain entangled in a separate
subset of z.
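
As a concrete instance of this setup, the paper's 2D shapes experiments
use ground-truth factors along the lines of the toy sketch below (the
variable name and value ranges are illustrative placeholders, not the
dataset's exact specification):

  # Toy illustration of the independent factor space v (values are placeholders):
  factors_v = {
      "shape":       ["square", "ellipse", "heart"],
      "scale":       (0.5, 1.0),    # continuous range
      "orientation": (0.0, 6.28),   # radians
      "pos_x":       (0.0, 1.0),
      "pos_y":       (0.0, 1.0),
  }
  # x = Sim(v, w) would render an image from a sample of these factors.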
Derivation of beta-VAE framework II
To encourage the disentangling property of qφ(z|x),
1. the prior p(z) is set to an isotropic unit Gaussian N(0, I);
2. qφ(z|x) is constrained to match the prior p(z).
This constrained optimisation problem can be expressed as:

  maxφ,θ Ex∼D [ Eqφ(z|x) [log pθ(x|z)] ]  subject to  DKL(qφ(z|x) ‖ p(z)) < ε

Applying the Lagrangian transformation under the KKT conditions gives:

  F(θ, φ, β; x, z) = Eqφ(z|x) [log pθ(x|z)] − β (DKL(qφ(z|x) ‖ p(z)) − ε)
                   ≥ L(θ, φ; x, z, β)
                   = Eqφ(z|x) [log pθ(x|z)] − β DKL(qφ(z|x) ‖ p(z))
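
For concreteness, the bound L translates directly into code. Below is a
minimal PyTorch sketch of the β-VAE loss, assuming a diagonal-Gaussian
encoder that outputs mu and log_var and a Bernoulli (logit-output)
decoder; the function name and tensor conventions are illustrative, not
taken from the paper.

  import torch
  import torch.nn.functional as F

  def beta_vae_loss(x, x_recon_logits, mu, log_var, beta=4.0):
      """Return -L(theta, phi; x, z, beta), to be minimized.

      Assumes a Bernoulli decoder (pixel logits, targets in [0, 1]) and a
      diagonal Gaussian posterior q(z|x) = N(mu, diag(exp(log_var)))
      with an isotropic unit Gaussian prior p(z) = N(0, I).
      """
      # Reconstruction term: E_q(z|x)[log p(x|z)], one-sample estimate.
      recon = F.binary_cross_entropy_with_logits(
          x_recon_logits, x, reduction="sum") / x.size(0)
      # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
      kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / x.size(0)
      # beta = 1 recovers the standard VAE; beta > 1 strengthens the bottleneck.
      return recon + beta * kl

  # Typical use inside a training step (encoder/decoder are placeholder modules):
  #   mu, log_var = encoder(x)
  #   z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterization
  #   loss = beta_vae_loss(x, decoder(z), mu, log_var, beta=4.0)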
Derivation of beta-VAE framework III
Meaning of β
1. β changes the degree of applied learning pressure during
training, thus encouraging different learnt representations.
2. When β = 1, β-VAE corresponds to the original VAE
formulation.
3. Setting β > 1 puts a stronger constraint on the latent
bottleneck than in the original VAE formulation.
4. The pressure to match the prior via the KL divergence limits the
capacity of z, encouraging the model to learn the most efficient
representation of the data (one disentangled along the conditionally
independent factors v); see the closed-form KL term below.
5. There is therefore a trade-off between reconstruction fidelity and
the quality of disentanglement.
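
Point 4 can be made precise. With the isotropic unit Gaussian prior and
the usual diagonal-Gaussian posterior qφ(z|x) = N(µ(x), diag(σ²(x))),
the KL term has the standard closed form

  DKL(qφ(z|x) ‖ p(z)) = (1/2) Σ_{j=1}^{M} ( µj² + σj² − log σj² − 1 )

Each latent unit contributes independently, so a larger β pushes µj → 0
and σj → 1 for any unit whose information does not pay for itself in
reconstruction, effectively switching that unit off. This is the
capacity limit behind the fidelity/disentanglement trade-off of point 5.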
Disentanglement Metric I
The paper's [1] description of how the disentanglement metric is
computed is rather involved, so I summarize it here as pseudocode.
Data: D = {V ∈ R^K, C ∈ R^H, X ∈ R^N};
l_clf : linear classifier;  q(z|x) ∼ N(µ(x), σ(x));
for b = 1 to B do
    Sample y_b from Unif{1, ..., K};
    for l = 1 to L do
        Sample v1 from p(v) and sample v2 from p(v);
        [v2]_{y_b} ← [v1]_{y_b};
        Sample c1 and c2 from p(c);
        x1 ← Sim(v1, c1) and x2 ← Sim(v2, c2);
        z1 ← µ(x1) and z2 ← µ(x2);
        z_diff^l ← |z1 − z2|;
    end
    z_diff^b ← (1/L) Σ_{l=1}^{L} z_diff^l;
    Pred_b ← l_clf(z_diff^b);
end
Loss = Σ_{b=1}^{B} CrossEntropy(Pred_b, y_b);
Update l_clf;
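
To make the protocol concrete, here is a self-contained Python sketch of
the metric on a toy setup. The simulator sim, the encoder mean
encode_mu, and all dimensions are hypothetical stand-ins invented for
illustration; in the paper these would be the ground-truth data
simulator and the trained model's encoder.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  K, H, M, L, B = 5, 2, 10, 64, 500   # factors, nuisances, latents, pairs per point, batch

  # Hypothetical stand-ins for Sim(v, c) and the encoder mean mu(x):
  W_sim = rng.normal(size=(K + H, 32))
  def sim(v, c):                      # toy "renderer": factors -> 32-d image vector
      return np.tanh(np.concatenate([v, c]) @ W_sim)

  W_enc = rng.normal(size=(32, M))
  def encode_mu(x):                   # toy encoder mean mu(x)
      return x @ W_enc

  Z_diff, labels = [], []
  for _ in range(B):
      y_b = rng.integers(K)           # index of the factor to keep fixed
      diffs = []
      for _ in range(L):
          v1, v2 = rng.uniform(size=K), rng.uniform(size=K)
          v2[y_b] = v1[y_b]           # share the chosen factor across the pair
          c1, c2 = rng.uniform(size=H), rng.uniform(size=H)
          z1, z2 = encode_mu(sim(v1, c1)), encode_mu(sim(v2, c2))
          diffs.append(np.abs(z1 - z2))
      Z_diff.append(np.mean(diffs, axis=0))   # z_diff^b
      labels.append(y_b)

  # Low-capacity linear classifier l_clf (cf. the VC-dimension remark below).
  clf = LogisticRegression(max_iter=1000).fit(np.array(Z_diff), np.array(labels))
  print("disentanglement metric accuracy:", clf.score(np.array(Z_diff), np.array(labels)))

For brevity this sketch reports training accuracy; in practice one would
score the classifier on held-out z_diff batches.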
Disentanglement Metric II
The linear classifier predicts which generative factor [v]_i is shared
across the pair of images.
The more disentangled the representation learnt by q(z|x), the higher
the classifier's accuracy.
The linear classifier should be very simple and have a low
VC-dimension in order to ensure that it has no capacity to perform
nonlinear disentangling itself.
Qualitative Results - 3D chairs
Figure: Qualitative results comparing the disentangling performance of
β-VAE (β = 5) against other methods.
Qualitative Results - 3D faces
Figure: Qualitative results comparing the disentangling performance of
β-VAE (β = 20) against other methods.
Qualitative Results - CelebA
Figure: Traversals of individual latents demonstrate the generative
factors that β-VAE discovered on CelebA.
Quantitative Results
Figure: (Left) Disentanglement metric classification accuracy on the 2D
shapes dataset. (Right) Positive correlation between the normalized β
and the latent size z for disentangled factor learning with a fixed
β-VAE architecture.
Quantitative Results
Figure: Representations learnt by β-VAE (β = 4)
Figure: Representations learnt by InfoGAN
References
I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot,
M. Botvinick, S. Mohamed, and A. Lerchner, "β-VAE:
Learning basic visual concepts with a constrained variational
framework," in ICLR, 2017.
