Large Scale GAN Training for
High Fidelity Natural Image Synthesis
Andrew Brock, Jeff Donahue, Karen Simonyan
2018.10.14
Presented by Young Seok Kim
Under Review for ICLR 2019
https://arxiv.org/abs/1809.11096

https://openreview.net/forum?id=B1xsqj09Fm
BigGAN
Related papers in PR12
• Goodfellow, Ian J. et al. “Generative Adversarial Nets.” NIPS (2014)

• PR-001 : https://youtu.be/L3hz57whyNw

• Chen, Xi et al. “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative
Adversarial Nets.” NIPS (2016)

• PR-022 : https://youtu.be/_4jbgniqt_Q

• Mirza, Mehdi and Simon Osindero. “Conditional Generative Adversarial Nets.” CoRR abs/1411.1784 (2014)

• PR-051 : https://youtu.be/iCgT8G4PkqI

• Miyato, Takeru et al. “Spectral Normalization for Generative Adversarial Networks.” ICLR (2018)

• PR-087 : https://youtu.be/iXSYqohGQhM
Motivation
• Since the advent of GANs, there have been several attempts to stabilize GAN training.

• Although some were quite successful at small scale, generating high-resolution and diverse samples from datasets like ImageNet remains hard

• More analysis of GAN training instabilities is needed
Contributions
• We demonstrate that GANs benefit dramatically from scaling. We introduce two simple,
general architectural changes that improve scalability, and modify a regularization
scheme to improve conditioning, demonstrably boosting performance. 

• Our models become amenable to the “truncation trick,” a simple sampling technique
that allows explicit, fine-grained control of the tradeoff between sample variety and
fidelity.

• We discover instabilities specific to large scale GANs, and characterize them
empirically.
Core Methods
• Shared Embedding

• Hierarchical latent space

• Orthogonal Regularization

• Truncation Trick
Shared embedding
• Introduced in

• Perez, Ethan et al. “FiLM: Visual
Reasoning with a General
Conditioning Layer.” AAAI (2018)

• A single class embedding, shared across layers, is linearly projected to each layer's BatchNorm gains and biases (see the sketch below)

• Reduces computation and memory cost, and improves training speed by 37%
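
A minimal PyTorch sketch of the idea, as a hedged illustration rather than the authors' implementation: one class embedding is created once and reused by every layer, and each layer only learns a small linear projection of it to its own BatchNorm gain and bias. The class count, embedding size, and channel count are illustrative.

import torch
import torch.nn as nn

class SharedClassConditionalBN(nn.Module):
    """Class-conditional BatchNorm whose gain and bias come from a shared class embedding."""
    def __init__(self, num_features, shared_embedding):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)    # no per-layer learned affine
        self.shared_embedding = shared_embedding                 # one table shared by all layers
        embed_dim = shared_embedding.embedding_dim
        self.to_gain = nn.Linear(embed_dim, num_features)        # per-layer projection to gains
        self.to_bias = nn.Linear(embed_dim, num_features)        # per-layer projection to biases

    def forward(self, x, y):
        e = self.shared_embedding(y)                              # (B, embed_dim)
        gain = 1.0 + self.to_gain(e).unsqueeze(-1).unsqueeze(-1)  # gains centered around 1
        bias = self.to_bias(e).unsqueeze(-1).unsqueeze(-1)
        return self.bn(x) * gain + bias

shared = nn.Embedding(num_embeddings=1000, embedding_dim=128)     # shared across every block
cbn = SharedClassConditionalBN(num_features=256, shared_embedding=shared)
out = cbn(torch.randn(8, 256, 32, 32), torch.randint(0, 1000, (8,)))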
Hierarchical latent space
• Noise vector z is fed into multiple layers of G, instead of only the initial layer

• z is split into one chunk per resolution, and each chunk is concatenated to the conditional vector c, which is then projected to BatchNorm gains and biases (see the sketch below)

• Reduces memory and compute costs

• Improves performance by roughly 4%

• Improves training speed by a further 18%
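
A hedged sketch of the hierarchical latent; the number of blocks and the chunk size are assumptions for illustration, not the paper's exact configuration. Each chunk of z is concatenated with the shared class embedding, and the result is what gets projected to that block's BatchNorm gains and biases.

import torch

num_blocks = 5                                   # assumed number of generator resolution blocks
chunk_dim = 20                                   # assumed per-block slice of z
z = torch.randn(8, num_blocks * chunk_dim)       # full latent vector for a batch of 8
class_embed = torch.randn(8, 128)                # shared class embedding (see previous sketch)

z_chunks = torch.chunk(z, num_blocks, dim=1)     # one chunk per resolution block
per_block_cond = [torch.cat([chunk, class_embed], dim=1) for chunk in z_chunks]
print([t.shape for t in per_block_cond])         # each block gets an (8, 148) conditioning vector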
Truncation Trick
• Truncate the z vector by resampling any values whose magnitude exceeds a chosen threshold

• Improves individual sample quality at the cost of reduced overall sample variety (see the sketch below)
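
A minimal NumPy sketch of the trick, assuming z is drawn from a standard normal; the threshold value is arbitrary.

import numpy as np

def truncated_z(shape, threshold=0.5, seed=0):
    """Sample z ~ N(0, I) and resample any entries whose magnitude exceeds the threshold."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)
    mask = np.abs(z) > threshold
    while mask.any():
        z[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(z) > threshold
    return z

z = truncated_z((4, 128), threshold=0.5)
# Lower thresholds trade variety for fidelity; a very large threshold recovers ordinary sampling.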
Orthogonal Regularization
• Original thoughts
(excerpt from the Spectral Normalization paper)
Orthogonal Regularization
• Original thoughts
• Suggested Method
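
For reference, the two penalties these labels refer to, as given in the BigGAN paper (W is a weight matrix, \beta a hyperparameter, \mathbf{1} a matrix with every element set to 1, and \odot elementwise multiplication):

R_\beta(W) = \beta \, \lVert W^\top W - I \rVert_F^2   (original orthogonal regularization)

R_\beta(W) = \beta \, \lVert W^\top W \odot (\mathbf{1} - I) \rVert_F^2   (suggested modification)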
Orthogonal Regularization
• “The version we find to work best removes the diagonal terms from the regularization,
and aims to minimize the pairwise cosine similarity between filters but does not
constrain their norm”

• Without Orthogonal Regularization -> only 16% of models are amenable to truncation

• With Orthogonal Regularization -> 60% of models are amenable to truncation
• Suggested Method
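
A hedged PyTorch sketch of the suggested penalty; treating rows of the flattened weight as filters is an assumption of this illustration, and beta = 1e-4 is only indicative.

import torch

def ortho_penalty(weight, beta=1e-4):
    """Penalize off-diagonal entries of the filter Gram matrix (pairwise similarity
    between filters) without constraining each filter's norm."""
    W = weight.view(weight.size(0), -1)                       # flatten each filter to a row
    gram = W @ W.t()                                          # (num_filters, num_filters)
    mask = 1.0 - torch.eye(gram.size(0), device=gram.device)  # zero out the diagonal terms
    return beta * ((gram * mask) ** 2).sum()

w = torch.randn(64, 3, 3, 3, requires_grad=True)              # e.g. a conv weight
loss = ortho_penalty(w)                                       # added to the generator loss
loss.backward()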
Review
• Inception Score (IS)

• Fréchet Inception Distance (FID)

• Spectral Normalization
Inception Score (IS)
• Introduced in

• Salimans, Tim et al. “Improved Techniques for Training GANs.” NIPS (2016).
• p(y|x) is evaluated with an Inception model

• If the image x is recognizable as label y, p(y|x) should have low entropy
• p(y) is the marginal distribution over labels, computed across generated samples

• If the model outputs diverse images, p(y) should have high entropy
• TL;DR: higher is better (see the formula below)
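
For reference, the two entropies above combine into the standard Inception Score:

\mathrm{IS}(G) = \exp\Big( \mathbb{E}_{x \sim p_g} \big[ D_{\mathrm{KL}}\big( p(y \mid x) \,\Vert\, p(y) \big) \big] \Big)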
Fréchet Inception Distance (FID)
• Introduced in

• Heusel, Martin et al. “GANs Trained by a Two Time-Scale Update Rule Converge to a
Local Nash Equilibrium.” NIPS (2017).
• Improves on IS by directly comparing the statistics of generated samples to those of real samples

• TL;DR: lower is better (see the formula below)
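
For reference, the standard FID fits a Gaussian to the Inception features of each set and compares the two fits:

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\big( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \big)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the mean and covariance of Inception features for real and generated samples.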
Spectral Normalization Review
Definition of the spectral norm: the spectral norm of a matrix equals its largest singular value
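
A small NumPy sketch of the technique: estimate the spectral norm with power iteration and divide the weight by it. This is a generic illustration, not the implementation used in the paper; matrix sizes are arbitrary.

import numpy as np

def spectral_normalize(W, u, n_iter=1, eps=1e-12):
    """Divide W by an estimate of its largest singular value obtained by power iteration."""
    for _ in range(n_iter):
        v = W.T @ u
        v = v / (np.linalg.norm(v) + eps)
        u = W @ v
        u = u / (np.linalg.norm(u) + eps)
    sigma = u @ W @ v                                    # approx. largest singular value
    return W / sigma, u

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
u = rng.standard_normal(64)
W_sn, u = spectral_normalize(W, u, n_iter=20)
print(np.linalg.svd(W_sn, compute_uv=False)[0])          # ≈ 1.0 after normalization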
Result
(Result tables: batch experiment and ablation experiment)
Result
A typical plot of the first singular value σ0 in the layers of G (a) and D (b) before Spectral Normalization
Result
Class leakage
JFT-300M
Thank you!
References
• https://nealjean.com/ml/frechet-inception-distance/

• Explanation of IS and FID
Appendix
Spectral Norm on CNN
Model details
