InfoGAN: Interpretable Representation Learning by
Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman,
Ilya Sutskever, Pieter Abbeel (UC Berkeley, OpenAI)
Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo)
Goal: unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information between generated images and input codes
Benefit: interpretable representations obtained without supervision or substantial additional cost
Reference
https://arxiv.org/abs/1606.03657 (with Appendix sections)
Implementations
https://github.com/openai/InfoGAN (by the authors, with TensorFlow)
https://github.com/yoshum/InfoGAN (by the presenter, with Chainer)
NIPS 2016 paper-reading group (NIPS2016読み会)
Motivation
How can we achieve
unsupervised learning of disentangled representations?
In general, a learned representation is entangled,
i.e., encoded in the data space in a complicated manner.
A disentangled representation would be
more interpretable and easier to apply to downstream tasks.
Related works
• Unsupervised learning of representations
(no mechanism to force disentanglement)
Stacked (often denoising) autoencoders, RBMs
Many others, including semi-supervised approaches
• Supervised learning of disentangled representations
Bilinear models, multi-view perceptron
VAEs, adversarial autoencoders
• Weakly supervised learning of disentangled representations
disBM, DC-IGN
• Unsupervised learning of disentangled representations
hossRBM, applicable only to discrete latent factors
(the presenter has almost no knowledge about these methods)
This work:
unsupervised learning of disentangled representations,
applicable to both continuous and discrete latent factors
Generative Adversarial Nets (GANs)
Generative model trained by competition between
two neural nets:
Generator: $x = G(z)$, $z \sim p_z(Z)$
$p_z(Z)$: an arbitrary noise distribution
Discriminator: $D(x) \in [0, 1]$,
the probability that $x$ was sampled from the data distribution $p_\mathrm{data}(X)$
rather than generated by the generator $G(z)$
Optimization problem to solve:
$$\min_G \max_D V_\mathrm{GAN}(G, D), \quad \text{where} \quad
V_\mathrm{GAN}(G, D) \equiv E_{x \sim p_\mathrm{data}(X)}[\ln D(x)] + E_{z \sim p_z(Z)}[\ln(1 - D(G(z)))]$$
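As a concrete reference, here is a minimal PyTorch-style sketch of estimating $V_\mathrm{GAN}(G, D)$ on one minibatch. The module names `G`, `D` and the tensors `real_x`, `z` are assumptions for illustration, not part of the paper.

```python
import torch

def gan_value(G, D, real_x, z):
    """V_GAN(G, D) estimated on one minibatch (a sketch).

    G, D are assumed to be torch.nn.Module instances:
    D(x) returns probabilities in (0, 1), G(z) returns fake samples.
    """
    eps = 1e-8  # numerical safety for the logarithms
    v_real = torch.log(D(real_x) + eps).mean()      # E_{x~p_data}[ln D(x)]
    v_fake = torch.log(1 - D(G(z)) + eps).mean()    # E_{z~p_z}[ln(1 - D(G(z)))]
    return v_real + v_fake                          # D ascends this, G descends it
```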
Problems with GANs
From the perspective of representation learning:
No restrictions on how $G(z)$ uses $z$
• $z$ can be used in a highly entangled way
• Each dimension of $z$ does not represent
any salient feature of the training data
[Illustration: a latent space spanned by $z_1$ and $z_2$]
Proposed Resolution: InfoGAN
– Maximizing Mutual Information –
Observation in conventional GANs:
a generated sample $x$ does not carry much information
about the noise $z$ from which it was generated,
because of the heavily entangled use of $z$
Proposed resolution = InfoGAN:
the generator $G(z, c)$ is trained so that
it maximizes the mutual information $I(C; X)$ between
the latent code $C$ and the generated data $X$:
$$\min_G \max_D V_\mathrm{GAN}(G, D) - \lambda I(C; X = G(Z, C))$$
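For a concrete picture of the generator input in $G(z, c)$, below is a small sketch of sampling the noise and a categorical latent code and concatenating them. The sizes (`noise_dim=62`, 10 categories) are illustrative values borrowed from the MNIST setup described later, not prescribed here.

```python
import torch
import torch.nn.functional as F

def sample_generator_input(batch_size, noise_dim=62, n_categories=10):
    """Draw (z, c) and build the input used in G(z, c) (a sketch)."""
    z = torch.randn(batch_size, noise_dim)                 # z ~ p_z(Z)
    c_idx = torch.randint(0, n_categories, (batch_size,))  # c ~ Cat(1/n_categories)
    c = F.one_hot(c_idx, n_categories).float()             # one-hot latent code
    return torch.cat([z, c], dim=1), c_idx                 # G takes the concatenation [z, c]
```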
Mutual Information
$I(X; Y) = H(X) - H(X \mid Y)$, where
• $H(X) = E_{x \sim p(X)}[-\ln p(X = x)]$:
entropy of the prior distribution
• $H(X \mid Y) = E_{y \sim p(Y),\, x \sim p(X|Y=y)}[-\ln p(X = x \mid Y = y)]$:
entropy of the posterior distribution
[Illustration: the prior $p(X = x)$ and the posteriors $p(X = x \mid Y = y)$ for sampled $y \sim p(Y)$, contrasting $I(X; Y) = 0$ with $I(X; Y) > 0$]
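To make these definitions concrete, here is a small NumPy check of $I(X; Y) = H(X) - H(X \mid Y)$ on a made-up 2×2 joint distribution; the numbers are hypothetical.

```python
import numpy as np

# Toy joint distribution p(X, Y) over 2x2 outcomes (hypothetical numbers).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)                               # marginal p(X)
p_y = p_xy.sum(axis=0)                               # marginal p(Y)

H_x = -np.sum(p_x * np.log(p_x))                     # H(X)
p_x_given_y = p_xy / p_y                             # column j is p(X | Y = j)
H_x_given_y = -np.sum(p_xy * np.log(p_x_given_y))    # H(X|Y) = E_y[H(X | Y = y)]

mi = H_x - H_x_given_y                               # I(X; Y) = H(X) - H(X|Y)
print(mi)  # > 0 here; it would be 0 if X and Y were independent
```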
Avoiding an increase in computational cost
Major difficulty:
evaluating $I(C; X)$ requires
evaluating and sampling from the posterior $p(C \mid X)$
Two strategies:
Variational maximization of mutual information:
use an approximate function $Q(c, x) \approx p(C = c \mid X = x)$
Sharing the neural net
between $Q(c, x)$ and the discriminator $D(x)$
Variational Maximization of MI
For an arbitrary function $Q(c, x)$,
$$\begin{aligned}
E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln p(C = c \mid X = x)]
&= E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)]
 + E_{x \sim p_G(X),\, c \sim p(C|X=x)}\!\left[\ln \frac{p(C = c \mid X = x)}{Q(c, x)}\right] \\
&= E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)]
 + E_{x \sim p_G(X)}\!\left[D_{\mathrm{KL}}\big(p(C \mid X = x)\,\|\,Q(C, x)\big)\right] \\
&\ge E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)]
 \qquad (\because \text{non-negativity of the KL divergence})
\end{aligned}$$
Variational Maximization of MI
With $Q(c, x)$ approximating $p(C = c \mid X = x)$, we obtain
a variational estimate of the mutual information:
$$L_I(G, Q) \equiv E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)] + H(C)
 \lesssim I(C; X = G(Z, C))$$
Maximizing $L_I(G, Q)$ w.r.t. $G$ and $Q$ ⇔
• achieving the equality by setting $Q(c, x) = p(C = c \mid X = x)$, and
• maximizing the mutual information
Optimization problem to solve in InfoGAN:
$$\min_{G, Q} \max_D V_\mathrm{GAN}(G, D) - \lambda L_I(G, Q)$$
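Below is a minimal sketch of the resulting surrogate losses for a categorical code. It uses the non-saturating generator loss common in practice rather than the $\ln(1 - D)$ form above, works on discriminator logits rather than probabilities, and all tensor names are illustrative rather than taken from the authors' implementation.

```python
import torch
import torch.nn.functional as F

def infogan_losses(D_logits_real, D_logits_fake, Q_logits_fake, c_idx, lam=1.0):
    """Sketch of the losses for min_{G,Q} max_D V_GAN - lambda * L_I.

    D_logits_*: discriminator logits on real/fake batches,
    Q_logits_fake: Q's categorical logits on the fake batch,
    c_idx: the categorical codes actually fed to G (class indices).
    """
    # Discriminator ascends V_GAN -> minimize its negation (as BCE terms).
    d_loss = (F.binary_cross_entropy_with_logits(D_logits_real,
                                                 torch.ones_like(D_logits_real)) +
              F.binary_cross_entropy_with_logits(D_logits_fake,
                                                 torch.zeros_like(D_logits_fake)))

    # Generator/Q: non-saturating adversarial term plus -lambda * L_I.
    # For a categorical code, E[ln Q(c, x)] is minus the cross-entropy,
    # and H(C) is a constant that can be dropped from the gradient.
    g_adv = F.binary_cross_entropy_with_logits(D_logits_fake,
                                               torch.ones_like(D_logits_fake))
    neg_l_i = F.cross_entropy(Q_logits_fake, c_idx)   # = -E[ln Q(c, x)] (up to H(C))
    gq_loss = g_adv + lam * neg_l_i
    return d_loss, gq_loss
```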
Eliminate sampling from the posterior
Lemma:
$$E_{x \sim p(X),\, y \sim p(Y|X=x)}[f(x, y)]
 = E_{x \sim p(X),\, y \sim p(Y|X=x),\, x' \sim p(X'|Y=y)}[f(x', y)]$$
By using this lemma, and noting that sampling $c \sim p(C)$, $z \sim p_z(Z)$ and setting $x = G(z, c)$
yields samples from $p_G(X)$, we can eliminate the sampling from $p(C \mid X = x)$:
$$E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)]
 = E_{c \sim p(C),\, z \sim p_z(Z),\, x = G(z, c)}[\ln Q(c, x)],$$
which is easy to estimate!
Proof of lemma
Lemma:
$$E_{x \sim p(X),\, y \sim p(Y|X=x)}[f(x, y)]
 = E_{x \sim p(X),\, y \sim p(Y|X=x),\, x' \sim p(X'|Y=y)}[f(x', y)]$$
Proof:
$$\begin{aligned}
\text{l.h.s.} &= \sum_x \sum_y p(X = x)\, p(Y = y \mid X = x)\, f(x, y) \\
&= \sum_x \sum_y p(Y = y)\, p(X = x \mid Y = y)\, f(x, y)
 \qquad (\because \text{Bayes' theorem}) \\
&= \sum_x \sum_y \sum_{x'} p(X = x', Y = y)\, p(X = x \mid Y = y)\, f(x, y) \\
&= \sum_x \sum_y \sum_{x'} p(X = x')\, p(Y = y \mid X = x')\, p(X = x \mid Y = y)\, f(x, y)
 = \text{r.h.s.}
\end{aligned}$$
Sharing layers between $D$ and $Q$
Model $Q(c, x)$ with a neural network, and
reduce the computational cost by
sharing all the convolution layers with $D$
[Figure: the discriminator's convolution layers are shared and feed two output heads, $D$ and $Q$; image from Odena, et al., arXiv:1610.09585]
Given DCGANs,
InfoGAN comes at negligible additional cost!
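A sketch of this weight sharing in PyTorch, assuming 28×28 single-channel inputs and a 10-class categorical code; the layer sizes are illustrative and not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SharedDQ(nn.Module):
    """Discriminator D and code head Q sharing one convolutional trunk (a sketch)."""

    def __init__(self, n_categories=10):
        super().__init__()
        self.trunk = nn.Sequential(                    # shared convolution layers
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Flatten(),
        )
        feat = 128 * 7 * 7                             # for 28x28 inputs (e.g. MNIST)
        self.d_head = nn.Linear(feat, 1)               # real/fake logit
        self.q_head = nn.Linear(feat, n_categories)    # categorical code logits

    def forward(self, x):
        h = self.trunk(x)
        return self.d_head(h), self.q_head(h)          # D and Q at almost no extra cost
```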
Experiment – MI Maximization –
• InfoGAN on the MNIST dataset
• Latent code $c$:
10-class categorical code
$L_I$ quickly saturates to
$H(c) = \ln 10 \approx 2.3$ in InfoGAN
(Figure 1 in the original paper)
Experiment
– Disentangled Representation –
(Figure 2 in the original paper)
• InfoGAN on the MNIST dataset
• Latent codes
 $c_1$: 10-class categorical code
 $c_2$, $c_3$: continuous codes
 $c_1$ can be used as a classifier with a 5% error rate
 $c_2$ and $c_3$ captured the rotation and width, respectively
(illustrated by varying one code while fixing the others, as sketched below)
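A sketch of how such Figure-2-style grids can be produced by traversing one continuous code; the generator `G`, the argument names, and the code layout `[z, c1, c2, c3]` are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def traverse_c2(G, z, c1_idx, c2_values, n_categories=10):
    """Generate images while varying c2 and keeping z, c1 and c3 fixed (a sketch)."""
    c1 = F.one_hot(torch.tensor([c1_idx]), n_categories).float()  # fixed digit class
    c3 = torch.zeros(1, 1)                                        # other continuous code fixed
    images = []
    for v in c2_values:                                           # e.g. torch.linspace(-2, 2, 10)
        c2 = torch.full((1, 1), float(v))                         # traversed code dimension
        images.append(G(torch.cat([z, c1, c2, c3], dim=1)))
    return torch.cat(images, dim=0)                               # one image per c2 value
```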
Experiment
– Disentangled Representation –
Dataset: P. Paysan, et al., AVSS, 2009, pp. 296–301.
Figure 3 in the original paper
Experiment
– Disentangled Representation –
Dataset: M. Aubry, et al., CVPR, 2014, pp. 3762–3769.
InfoGAN learned salient features without supervision
Figure 4 in the original paper
Experiment
– Disentangled Representation –
Dataset: Street View House Numbers (SVHN)
Figure 5 in the original paper
Experiment
– Disentangled Representation –
Dataset: CelebA
Figure 6 in the original paper
Future Prospects and Conclusion
Mutual information maximization can be applied to
other methods, e.g. VAEs
Learning hierarchical latent representations
Improving semi-supervised learning
High-dimensional data discovery
Goal: unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information between generated images and input codes
Benefit: interpretable representations obtained without supervision or substantial additional cost