On Unifying Deep Generative Models

Algorithmic Intelligence Lab Seminar:
On Unifying Deep Generative Models
2017.06.21.
Sangwoo Mo

Overview
• On Unifying Deep Generative Models (arXiv, 2 Jun 2017)
• Authors: Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
• Contribution
1. Establishes a formal connection between GANs and VAEs
2. Enables the exchange of ideas across the two models in a principled way
(apply ideas from VAEs to GANs, and ideas from GANs to VAEs)

Table of Contents
• Bridging the Gap
• ADA (Adversarial Domain Adaptation)
• GAN (Generative Adversarial Network)
• VAE (Variational Autoencoder)
• WS (Wake Sleep Algorithm)
• Applications
• IWGAN (Importance Weighted GAN)
• AAVAE (Adversary Activated VAE)
• Experiments
• Conclusion

ADA (Adversarial Domain Adaptation)
• Goal: Transfer knowledge from the source domain to the target domain
• => Learn a feature extractor whose outputs a discriminator cannot
attribute to either the source or the target domain
• Notations
• $z$: data (either from source or target domain)
• $y \in \{0, 1\}$: domain indicator (source: 1, target: 0)
• $x = G_\theta(z)$: feature ($G_\theta$: feature extractor)

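• The discriminator should guess the true domain,
and the feature extractor should fool the discriminator
• Thus, the objective function is adversarial: the discriminator maximizes
the likelihood of the true domain, while the feature extractor maximizes
the likelihood of the flipped domain
(the supervised learning part of the feature extractor is omitted)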
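
The adversarial objective described above can be written as follows. This is a sketch that assumes the notation of the paper: $q_\phi(y \mid x)$ is the discriminator, $p_\theta(x \mid y)$ is the feature distribution of domain $y$, and $p(y)$ is a uniform prior over domains. None of these symbols appear on this slide.

```latex
\begin{align}
\max_{\phi} \; \mathcal{L}_\phi
  &= \mathbb{E}_{p_\theta(x \mid y)\, p(y)} \big[ \log q_\phi(y \mid x) \big] \\
\max_{\theta} \; \mathcal{L}_\theta
  &= \mathbb{E}_{p_\theta(x \mid y)\, p(y)} \big[ \log q_\phi(1 - y \mid x) \big]
\end{align}
```

The first line trains the discriminator to predict the true domain; the second trains the feature extractor so that the discriminator predicts the flipped domain.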

GAN (Generative Adversarial Network)
• GAN can be seen as a special case of ADA
• Let real data be the source domain and generated data be the target domain
• Note that $p_g(x)$ is parametrized by $\theta$, while $p_{data}(x)$ is fixed
(the code space and the generator are degenerate for $y = 1$)
• Under this mapping, the ADA objective is identical to the GAN objective
(non-saturating version)
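
A minimal numerical sketch of this correspondence, assuming the discriminator outputs $D(x) = q_\phi(y = 1 \mid x)$ and a uniform prior over $y$. The function name `ada_gan_losses` and the sample values are hypothetical and only illustrate the two losses.

```python
import numpy as np

def ada_gan_losses(d_real, d_fake, eps=1e-12):
    """Losses for GAN viewed as ADA (real = source, y = 1; generated = target, y = 0).

    d_real, d_fake: discriminator outputs D(x) = q_phi(y = 1 | x)
    on real and generated samples (illustrative inputs).
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)

    # Discriminator: maximize E[log q_phi(y | x)] over both domains,
    # written here as a loss to minimize (uniform p(y) gives the 0.5 factor).
    disc_loss = -0.5 * (np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

    # Generator: maximize E[log q_phi(1 - y | x)]. Only the y = 0 term
    # depends on theta, which gives the non-saturating loss -log D(G(z)).
    gen_loss = -np.mean(np.log(d_fake))
    return float(disc_loss), float(gen_loss)

disc_loss, gen_loss = ada_gan_losses([0.9, 0.8], [0.2, 0.1])
# A generator that fools the discriminator more often gets a lower loss.
_, gen_loss_better = ada_gan_losses([0.9, 0.8], [0.6, 0.7])
```

The real-data term of the generator objective is dropped because it does not depend on $\theta$, which is why only generated samples enter `gen_loss`.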

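• GAN objective = $\mathrm{KL}(p_\theta(x \mid y) \,\|\, q^r(x \mid y)) - \mathrm{JSD}(p_g \,\|\, p_{data})$
• Treat $y$ as visible and $x$ as latent
• Then it is variational inference in which $q^r(x \mid y)$ plays the role of the posterior
• Since $q^r(x \mid y) \propto q^r_{\phi_0}(y \mid x)\, p_{\theta_0}(x)$ with
$p_{\theta_0}(x) = \frac{1}{2}\big(p_g(x) + p_{data}(x)\big)$, minimizing the KL term pulls $p_g$ toward $p_{data}$
• Note that this is the reverse KL, which leads to the mode collapse problem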

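• InfoGAN: additionally recovers the latent code $z$
• Simply introduce an extra conditional $q_\eta(z \mid x, y)$
• Then the objective minimizes $\mathrm{KL}(p_\theta(x \mid z, y) \,\|\, q^r(x \mid z, y))$,
where $q^r(x \mid z, y) \propto q_{\eta_0}(z \mid x, y)\, q^r_{\phi_0}(y \mid x)\, p_{\theta_0}(x)$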

VAE (Variational Autoencoder)
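• Assume the VAE has an optimal (degenerate) discriminator $q_*(y \mid x)$
• This discriminator detects every fake sample, so the VAE learns only from real data
• The original objective is identical to a reformulated objective in which
$y$ is treated as a latent variable together with $z$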
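
For reference, the standard VAE objective (the ELBO averaged over real data) is sketched below. The symbols $q_\eta(z \mid x)$ for the encoder and $p(z)$ for the prior are assumptions taken from common VAE notation. The reformulated version with $y$ is not reproduced here.

```latex
\begin{align}
\max_{\theta, \eta} \; \mathcal{L}^{vae}_{\theta, \eta}
  &= \mathbb{E}_{p_{data}(x)} \Big[
       \mathbb{E}_{q_\eta(z \mid x)} \big[ \log p_\theta(x \mid z) \big]
       - \mathrm{KL}\big( q_\eta(z \mid x) \,\|\, p(z) \big) \Big] \\
  &= \mathbb{E}_{p_{data}(x)} \Big[
       \log p_\theta(x)
       - \mathrm{KL}\big( q_\eta(z \mid x) \,\|\, p_\theta(z \mid x) \big) \Big]
\end{align}
```

The second line is the usual identity showing that the ELBO contains a KL term from the approximate posterior to the true posterior, which is the direction used when comparing against GAN.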

Compare GAN and VAE
• In summary
• InfoGAN objective: $\mathrm{KL}(p_\theta(x \mid z, y) \,\|\, q^r(x \mid z, y))$
• VAE objective: $\mathrm{KL}(q^r(z, y \mid x) \,\|\, p_\theta(z, y \mid x))$
• Note that (1) the positions of $p$ and $q$ are reversed, and
(2) the hidden and visible variables ($x$ versus $y, z$) are swapped
• VAE minimizes KL -> smoothed output
• GAN minimizes reverse KL -> mode collapse
=> VAE/GAN joint model
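
The difference between the two KL directions can be illustrated with a toy discrete example. This is only a sketch: the five-bin distributions `target`, `one_mode` and `spread` are made up for illustration and are not from the paper.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Bimodal "data" distribution with modes at the first and last bin.
target = np.array([0.48, 0.01, 0.02, 0.01, 0.48])
# Candidate model A: covers a single mode only.
one_mode = np.array([0.96, 0.01, 0.01, 0.01, 0.01])
# Candidate model B: spreads mass over everything (smooth / blurry).
spread = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Forward KL(target || model): the VAE-like direction, penalizes missing modes.
fwd_one, fwd_spread = kl(target, one_mode), kl(target, spread)
# Reverse KL(model || target): the GAN-like direction, penalizes mass off the data.
rev_one, rev_spread = kl(one_mode, target), kl(spread, target)

print(f"forward KL: one_mode={fwd_one:.3f}, spread={fwd_spread:.3f}")
print(f"reverse KL: one_mode={rev_one:.3f}, spread={rev_spread:.3f}")
```

In this toy setting the forward KL prefers the spread-out model (smoothed output), while the reverse KL prefers the single-mode model (mode collapse).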

WS (Wake Sleep Algorithm)
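• The classic wake-sleep algorithm alternates a wake step, which trains the
generative model $p_\theta(x \mid h)$, and a sleep step, which trains the
inference model $q_\lambda(h \mid x)$
• VAE = wake step
• Let $h$ be $z$ and $\lambda$ be $\eta$. The VAE trains $p_\theta$ with the wake objective, but also optimizes $q_\eta$ with it
• GAN = sleep step
• Let $h$ be $y$ and $\lambda$ be $\phi$. The GAN trains $q_\phi$ with the sleep objective, but also optimizes $p_\theta$ with it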
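
The classic wake-sleep updates referenced above can be sketched as follows, assuming a generative model $p_\theta(x \mid h)$ with prior $p(h)$ and an inference model $q_\lambda(h \mid x)$:

```latex
\begin{align}
\text{Wake:}  \quad & \max_{\theta} \;
  \mathbb{E}_{q_\lambda(h \mid x)\, p_{data}(x)} \big[ \log p_\theta(x \mid h) \big] \\
\text{Sleep:} \quad & \max_{\lambda} \;
  \mathbb{E}_{p_\theta(x \mid h)\, p(h)} \big[ \log q_\lambda(h \mid x) \big]
\end{align}
```

The wake step samples latents from the inference model on real data; the sleep step samples "dreamed" data from the generative model.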

IWGAN (Importance Weighted GAN)
• Importance Weighted GAN: applies importance weighting, an idea from the VAE literature, to the GAN generator update
• In practice, just assign a weight to each sample in the mini-batch
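
A minimal sketch of per-sample weighting in a mini-batch. It assumes that the unnormalized weight of a generated sample is $D(x) / (1 - D(x))$ with $D(x) = q_\phi(y = 1 \mid x)$, normalized over the batch. This form is an assumption here and should be checked against the paper; the function names are hypothetical.

```python
import numpy as np

def importance_weights(d_fake, eps=1e-12):
    """Normalized per-sample weights for generated samples in a mini-batch.

    Assumption: unnormalized weight w_i = D(x_i) / (1 - D(x_i)).
    """
    d = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    w = d / (1.0 - d)
    return w / w.sum()

def weighted_generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss with importance weights.

    In an autodiff framework the weights would be treated as constants,
    i.e. no gradient would flow through them.
    """
    d = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    w = importance_weights(d, eps)
    return float(-np.sum(w * np.log(d)))

weights = importance_weights([0.5, 0.8, 0.2])
loss = weighted_generator_loss([0.5, 0.8, 0.2])
```

Under this assumption, samples that the discriminator finds more realistic receive larger weights in the generator update.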

AAVAE (Adversary Activated VAE)
• Adversary Activated VAE
• Motivation: Utilize fake samples
• => Use a discriminator network $q_\phi(y \mid x)$ instead of the optimal $q_*(y \mid x)$
• Objective
• However, (18) is intractable since $p_\theta(x \mid z, y = 1) = p_{data}(x)$ is
an implicit distribution (its likelihood cannot be evaluated)
• In practice, AAVAE uses a binary classifier, the same as in GAN
• In my opinion, it is just a GAN variant with a different objective for $G$

Conclusion & Discussions
• Conclusion
• Traditional models usually distinguish visible and latent variables
• However, it may not be necessary to draw a clear boundary between
visible/latent variables or between generator/discriminator
• GANs and VAEs can be thought of as (in some sense) dual
• Research Directions
• Generalize the framework to connect to other learning paradigms,
e.g. reinforcement learning, energy-based models, etc.

Appendix: ADA objective = GAN objective

Appendix: Reverse KL divergence

Appendix: Proof of Lemma 1.

Appendix: Proof of Lemma 2.