InfoGAN
Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Introduction
 In ordinary GANs, the latent vector z is used in an arbitrary, entangled way; its individual dimensions carry no intrinsic meaning.
 It would be desirable for the latent representation to correspond to the generated outputs in a more meaningful way.
Disentangled Representation
 We wish to disentangle the representation of the output
images in the input latent vectors.
 That is, we wish to make the values of the latent vector c
correspond to features in the generated images.
Information Theory
A brief introduction to relevant concepts in information theory.
Entropy
Entropy can be intuitively understood as the amount of information that a variable contains (a small numerical example follows the formulas below).
It is borrowed from statistical
thermodynamics and is directly
analogous to the physical definition
of randomness of particle states.
$$
H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)
$$

$$
H(X \mid Y) = -\sum_{i,j} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(y_j)}
$$
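To make the definitions concrete, here is a minimal NumPy sketch that evaluates H(X) and H(X|Y) for a discrete joint distribution. The joint distribution below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Illustrative joint distribution p(x, y) over a 2x2 space (rows: x, columns: y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# H(X) = -sum_i p(x_i) log p(x_i), using the natural log (nats).
H_x = -np.sum(p_x * np.log(p_x))

# H(X|Y) = -sum_{i,j} p(x_i, y_j) log [ p(x_i, y_j) / p(y_j) ]
# Broadcasting divides each column j by p(y_j).
H_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))

print(f"H(X)   = {H_x:.4f} nats")
print(f"H(X|Y) = {H_x_given_y:.4f} nats")
```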
Mutual Information: Definition
 In information theory, I(X;Y),
the mutual information
between X and Y, measures
the “amount of information”
learned from knowledge of
random variable Y about the
random variable X.
 The mutual information can be
expressed as the difference of
two entropy terms:
I(X;Y) = H(X)-H(X|Y) = H(Y)-H(Y|X)
Mutual Information: Interpretation
 If X and Y are independent,
then I(X;Y) = 0, because
knowing one variable reveals
nothing about the other.
 If X and Y are related by a
deterministic, invertible
function, I(X;Y) is at its
maximum.
 I(X;Y) is the reduction of uncertainty in X when Y is observed, and vice versa; both extreme cases are illustrated in the example below.
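Continuing the numerical sketch, mutual information can be computed directly as I(X;Y) = H(X) − H(X|Y). The function name and the two toy joint distributions below are illustrative assumptions, chosen to show the two extreme cases described above.

```python
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) = H(X) - H(X|Y) for a discrete joint distribution, in nats."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    H_x = -np.sum(p_x[p_x > 0] * np.log(p_x[p_x > 0]))
    # Mask zero-probability cells to avoid log(0); their contribution is 0.
    nz = p_xy > 0
    H_x_given_y = -np.sum(p_xy[nz] * np.log((p_xy / p_y)[nz]))
    return H_x - H_x_given_y

# Independent variables: p(x, y) = p(x) p(y)  ->  I(X;Y) = 0.
independent = np.outer([0.5, 0.5], [0.5, 0.5])

# Deterministic, invertible relation (y = x)  ->  I(X;Y) = H(X) = log 2.
deterministic = np.array([[0.5, 0.0],
                          [0.0, 0.5]])

print(mutual_information(independent))    # ~0.0
print(mutual_information(deterministic))  # ~0.6931 = log 2
```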
Mutual Information: Implications
 Mutual information can be used to formulate a cost function.
 This yields an information-regularized minimax game:
$$
\min_{G} \max_{D} \; V_I(D, G) = V(D, G) - \lambda\, I\big(c;\, G(z, c)\big)
$$
z: noise vector representing intrinsic, unstructured randomness
c: latent code representing meaningful, structured information
Implementation
Variational Mutual Information Maximization and Approximations.
Variational Mutual Information Maximization
 In practice, I(c; G(z, c)) cannot be maximized directly, as this requires access to the posterior distribution P(c|x).
 We can obtain a lower bound for I(c; G(z, c)) by using an auxiliary distribution Q(c|x) to approximate P(c|x).
 This technique is known as Variational Mutual Information
Maximization. The equations are in the next slide.
Variational Mutual Information Maximization
 The mutual information is decomposed into its entropy components.
 The auxiliary distribution Q is introduced so that the definition of KL divergence can be applied.
 The KL divergence is always non-negative, which yields the lower bound (see the derivation below).
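The equations on this slide were images and did not survive extraction; the derivation below is reproduced from the InfoGAN paper.

$$
\begin{aligned}
I\big(c;\, G(z,c)\big) &= H(c) - H\big(c \mid G(z,c)\big) \\
&= \mathbb{E}_{x \sim G(z,c)}\Big[\, \mathbb{E}_{c' \sim P(c \mid x)}\big[\log P(c' \mid x)\big] \,\Big] + H(c) \\
&= \mathbb{E}_{x \sim G(z,c)}\Big[\, \underbrace{D_{\mathrm{KL}}\big(P(\cdot \mid x)\,\|\,Q(\cdot \mid x)\big)}_{\ge\, 0} + \mathbb{E}_{c' \sim P(c \mid x)}\big[\log Q(c' \mid x)\big] \,\Big] + H(c) \\
&\ge \mathbb{E}_{x \sim G(z,c)}\Big[\, \mathbb{E}_{c' \sim P(c \mid x)}\big[\log Q(c' \mid x)\big] \,\Big] + H(c)
\end{aligned}
$$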
Variational Mutual Information Maximization
 Although H(c) could also be optimized, it is held constant for simplicity; this is done by drawing c from a fixed distribution.
 The remaining expectation is rewritten using a lemma proven in the appendix of the paper (Lemma 5.1), which holds under mild regularity conditions and removes the need to sample from the posterior P(c|x); the result is shown below.
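Again reproducing the paper's result where the slide image is missing: with H(c) held constant and the expectation rewritten via the lemma, the variational lower bound that is actually optimized becomes

$$
L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c) \;\le\; I\big(c;\, G(z,c)\big)
$$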
Variational Mutual Information Maximization
 Using the lower bound derived above, the information-regularized minimax game actually optimized is:
$$
\min_{G,Q} \max_{D} \; V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q)
$$

L_I: variational lower bound on the mutual information
λ: weighting hyperparameter
Practical Implementation
 In practice, we use a neural
network to represent Q.
 The KL term vanishes, and the bound becomes tight, as Q approaches the true posterior P(c|x).
 Q is simply D with an extra fully-connected head; it outputs an estimate of the latent code c (see the sketch below).
 L_I(G, Q) has been observed to converge faster than the GAN objective.
 InfoGAN thus comes essentially for free on top of GAN.
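A minimal PyTorch sketch of the shared discriminator/recognition network described above. The class name, layer sizes, MNIST-sized 1×28×28 input, 10-way discrete code, and two continuous codes are illustrative assumptions; the paper's exact architectures differ per dataset.

```python
import torch
import torch.nn as nn

class DQNetwork(nn.Module):
    """Discriminator D and recognition network Q sharing all layers except the final heads."""

    def __init__(self, n_discrete: int = 10, n_continuous: int = 2):
        super().__init__()
        # Shared convolutional trunk (illustrative sizes, for 1x28x28 inputs).
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 1024), nn.LeakyReLU(0.1),
        )
        # D head: a single real/fake logit.
        self.d_head = nn.Linear(1024, 1)
        # Q head: "D with an extra FC layer" -> parameters of the code distribution.
        self.q_head = nn.Sequential(nn.Linear(1024, 128), nn.LeakyReLU(0.1))
        self.q_logits = nn.Linear(128, n_discrete)       # logits for the categorical code
        self.q_mu = nn.Linear(128, n_continuous)         # mean of the continuous codes
        self.q_log_sigma = nn.Linear(128, n_continuous)  # log std of the continuous codes

    def forward(self, x):
        h = self.trunk(x)
        q = self.q_head(h)
        return self.d_head(h), self.q_logits(q), self.q_mu(q), self.q_log_sigma(q)
```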
Overview of the Architecture
Additional Explanation
 Cross-entropy loss is applied between the estimated and the actual c (some implementations use MSE for continuous codes).
 For discrete codes the outputs are softmax activations; for continuous codes they may be sigmoid, tanh, or linear.
 The original implementation is more involved: for continuous codes it outputs not a point estimate of the latent vector but the parameters of its distribution (mean and standard deviation of a factored Gaussian), as in the sketch below.
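A minimal sketch of how the lower-bound term L_I is typically computed in such implementations, assuming the hypothetical DQNetwork above, a 10-way discrete code (cross-entropy) and factored-Gaussian continuous codes (negative log-likelihood). The function name and the lambda weights are illustrative assumptions; H(c) is a constant and is dropped from the loss.

```python
import torch
import torch.nn.functional as F

def info_loss(q_logits, q_mu, q_log_sigma, c_discrete, c_continuous,
              lambda_disc=1.0, lambda_cont=0.1):
    """Negative variational lower bound -L_I (up to the constant H(c)) for one batch.

    c_discrete:   (N,) integer class indices of the sampled categorical code.
    c_continuous: (N, k) sampled continuous codes.
    """
    # Discrete code: cross-entropy between Q's categorical distribution and the true code.
    disc_nll = F.cross_entropy(q_logits, c_discrete)

    # Continuous codes: negative log-likelihood under a factored Gaussian Q(c|x)
    # (the constant 0.5 * log(2*pi) term is dropped).
    sigma = q_log_sigma.exp()
    cont_nll = (q_log_sigma + 0.5 * ((c_continuous - q_mu) / sigma) ** 2).mean()

    return lambda_disc * disc_nll + lambda_cont * cont_nll

# The generator and Q are then updated together on
#   generator_gan_loss + info_loss(...)
# while the D head is trained with the usual GAN discriminator loss.
```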
Analysis
 Without the information regularization term (i.e., in an ordinary GAN), the information lower bound L_I does not increase during training.
 The maximum appears to be about 2.3 in this case, consistent with H(c) = ln 10 ≈ 2.30 for a uniform ten-way categorical code.
Results: MNIST
The discrete latent vector input for digits is
highly correlated with the output number.
It can be used to classify MNIST with a 5%
error rate, even when InfoGAN is trained
without any labels.
The continuous latent vector has found
the angle of the characters.
We confirm that the latent code validity by
extending the range beyond the original.
The ordinary GAN has learned nothing.
There is no control over which vector
learns what.
Results: Faces
Multiple vectors with a range between -1
and 1 are used.
The 4 latent vectors appear to have
captured azimuth, elevation, lighting, and
width.
There is smooth interpolation within and
even beyond the range.
Moreover, the other details change to
make a much more natural image.
It is not a simple case where only the
target feature changes while the other
factors remain unnaturally constant.
Results: Chairs
Results: SVHN
Results: CelebA
Conclusion
 InfoGAN learns interpretable representations of salient features of the data without any labels or supervision.
 It discovers salient latent factors of variation automatically and encodes them in the latent code c.