TRAINING GENERATIVE ADVERSARIAL NETWORKS WITH
BINARY NEURONS BY END-TO-END BACKPROPAGATION
Hao-Wen Dong and Yi-Hsuan Yang
OUTLINE
 Background
 Generative Adversarial Networks
 Binary Neurons
 Straight-through Estimators
 BinaryGAN
 Experiments & Results
 Discussions & Conclusion
BACKGROUND
GENERATIVE ADVERSARIAL NETWORKS
 Goal—learn a mapping from the prior distribution to the data distribution [2]
[Figure: prior distribution → data distribution; the true mapping can be intractable]
GENERATIVE ADVERSARIAL NETWORKS
 Use a deep neural network to learn an implicit mapping
[Figure: prior distribution → generator → model distribution, to be matched to the data distribution]
GENERATIVE ADVERSARIAL NETWORKS
 Use another deep neural network to provide guidance/critiques
[Figure: prior distribution → generator → model distribution; a discriminator compares the model distribution with the data distribution and predicts real/fake]
BINARY NEURONS
 Definition: neurons that output binary-valued predictions
 Deterministic binary neurons (DBNs)—hard thresholding:

$$\mathrm{DBN}(x) \equiv \begin{cases} 1, & \text{if } \sigma(x) > 0.5 \\ 0, & \text{otherwise} \end{cases}$$

 Stochastic binary neurons (SBNs)—Bernoulli sampling:

$$\mathrm{SBN}(x) \equiv \begin{cases} 1, & \text{if } z < \sigma(x) \\ 0, & \text{otherwise} \end{cases}, \quad z \sim U[0, 1]$$
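As a concrete illustration of the two definitions above, here is a minimal NumPy sketch (the function names are ours, not from the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbn(x):
    """Deterministic binary neuron: hard thresholding at sigma(x) = 0.5."""
    return (sigmoid(x) > 0.5).astype(np.float32)

def sbn(x, rng=None):
    """Stochastic binary neuron: equivalent to Bernoulli sampling with
    success probability sigma(x)."""
    if rng is None:
        rng = np.random.default_rng()
    z = rng.uniform(0.0, 1.0, size=np.shape(x))  # z ~ U[0, 1)
    return (z < sigmoid(x)).astype(np.float32)
```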
BACKPROPAGATING THROUGH BINARY NEURONS
 Backpropagating through binary neurons is intractable
 For DBNs, it involves the nondifferentiable threshold function
 For SBNs, it requires computing the expected gradients over all possible combinations of values taken by the binary neurons (exponential in the number of binary neurons)
 We can instead introduce gradient estimators for the binary neurons
 Examples include the straight-through [3,4] (the one adopted in this work), REINFORCE [5], REBAR [6], and RELAX [7] estimators
STRAIGHT-THROUGH ESTIMATORS
 Straight-through estimators [3,4]
 Forward pass—use hard thresholding (DBNs) or Bernoulli sampling (SBNs)
 Backward pass—treat the binarization as the identity function
 Sigmoid-adjusted straight-through estimators
 Use the derivative of the sigmoid function in the backward pass
 Found to achieve better performance in a classification task presented in [4]
 Adopted in this work (see the sketch below)
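The paper's implementation builds on TensorFlow code from [12]; the following is our own minimal PyTorch-style sketch of the same idea, realized with the common "detach" trick (the slope argument anticipates the slope annealing described later):

```python
import torch

def sigmoid_adjusted_st(logits, stochastic=False, slope=1.0):
    """Forward: binarize (hard thresholding for DBNs, Bernoulli sampling
    for SBNs). Backward: use the derivative of sigmoid(slope * x) instead
    of the zero-almost-everywhere derivative of the threshold function."""
    probs = torch.sigmoid(slope * logits)
    if stochastic:
        hard = torch.bernoulli(probs)   # SBN: Bernoulli sampling
    else:
        hard = (probs > 0.5).float()    # DBN: hard thresholding
    # Straight-through: hard values in the forward pass, sigmoid gradients
    # in the backward pass.
    return hard.detach() + probs - probs.detach()
```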
BINARYGAN
BINARYGAN
 Use binary neurons at the output layer of the generator
 Use sigmoid-adjusted straight-through estimators to provide the
gradients for the binary neurons
 Train the whole network by end-to-end backpropagation
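A sketch of how these pieces fit together, reusing the sigmoid_adjusted_st helper above (layer sizes here are hypothetical; the paper's exact MLP architecture may differ):

```python
import torch
import torch.nn as nn

class BinaryGenerator(nn.Module):
    """MLP generator with binary neurons at the output layer."""
    def __init__(self, latent_dim=128, out_dim=28 * 28, stochastic=False):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, out_dim),   # preactivation logits
        )
        self.stochastic = stochastic

    def forward(self, z, slope=1.0):
        # Binary outputs; gradients flow back through the straight-through
        # estimator, so generator and discriminator train end to end.
        return sigmoid_adjusted_st(self.net(z), self.stochastic, slope)
```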
BINARYGAN
[Figure: model architecture, implemented by multilayer perceptrons. Legend: random neuron, binary neuron, normal neuron. The generator maps random neurons through normal neurons to binary neurons at its output layer; the discriminator, built from normal neurons, receives either the generator's binary outputs or real data.]
EXPERIMENTS & RESULTS
TRAINING DATA
 Binarized MNIST handwritten digit database [8]
 Pixels with nonzero intensities → 1
 Pixels with zero intensity → 0
[Figure: sample binarized MNIST digits]
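The binarization itself is a one-liner; a possible sketch using torchvision (the loader choice is ours, not necessarily the paper's):

```python
from torchvision import datasets, transforms

# Any nonzero intensity -> 1; zero intensity -> 0.
binarize = transforms.Compose([
    transforms.ToTensor(),                        # floats in [0, 1]
    transforms.Lambda(lambda x: (x > 0).float()),
])
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=binarize)
```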
IMPLEMENTATION DETAILS
 Batch size is 64
 Use WGAN-GP [9] objectives
 Use Adam optimizer [10]
 Apply batch normalization [11] to the generator (but not to the
discriminator)
 Binary neurons are implemented with the code kindly provided in
a blog post on the R2RT blog [12]
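For reference, a standard sketch of the WGAN-GP gradient penalty [9] for flat (MLP) inputs; the penalty weight of 10 is the common default, not necessarily the paper's exact setting:

```python
import torch

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """Penalize the critic's gradient norm on random interpolations
    between real and generated samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = discriminator(interp)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=interp,
                                create_graph=True)[0]
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```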
IMPLEMENTATION DETAILS
 Apply the slope annealing trick [13]
 Gradually increase the slopes of the sigmoid functions used in the sigmoid-adjusted straight-through estimators as training proceeds
 We multiply the slopes by 1.1 after each epoch
 The slopes start from 1.0 and reach roughly 6.1 (= 1.1¹⁹) by the final of the 20 epochs
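This schedule is easy to verify numerically; a tiny sketch:

```python
# Slope annealing: multiply the slope by 1.1 after each of 20 epochs.
slope, slopes = 1.0, []
for epoch in range(20):
    slopes.append(slope)  # slope used during this epoch
    slope *= 1.1          # anneal after the epoch
print(round(slopes[-1], 1))  # 6.1 (= 1.1**19), matching the slide
```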
EXPERIMENT I—DBNS VS SBNS
 DBNs and SBNs achieve similar sample quality
 They show distinct characteristics in their preactivated outputs
[Figure: sample generated digits for DBNs and SBNs; color scale 0 to 1]
EXPERIMENT I—DBNS VS SBNS
[Figure: histograms of the preactivated outputs (log-scale density vs preactivated values in [0, 1]).
DBNs—more values in the middle; a notch at 0.5 (the threshold).
SBNs—more values close to 0 and 1.]
EXPERIMENT II—REAL-VALUED MODEL
 Use no binary neurons
 Train the discriminator on the real-valued outputs of the generator
[Figure: raw predictions (color scale 0 to 1) and the binarized results obtained by hard thresholding and by Bernoulli sampling]
EXPERIMENT II—REAL-VALUED MODEL
[Figure: histograms of the preactivated outputs (log-scale density vs preactivated values in [0, 1]).
DBNs—more values in the middle; a notch at 0.5 (the threshold).
SBNs—more values close to 0 and 1.
Real-valued model—an even more U-shaped distribution.]
EXPERIMENT III—GAN OBJECTIVES
 The WGAN [14] model achieves sample quality similar to the WGAN-GP model
 The GAN [2] model suffers from mode collapse
[Figure: samples from GAN + DBNs, GAN + SBNs, WGAN + DBNs, and WGAN + SBNs]
EXPERIMENT IV—MLPS VS CNNS
 The CNN model produces fewer artifacts, even with a small number of trainable parameters (MLP—0.53M; CNN—1.4M)
[Figure: samples from the MLP model (DBNs, SBNs) and the CNN model (DBNs, SBNs)]
DISCUSSIONS & CONCLUSION
DISCUSSIONS
 Why are binary neurons important?
 They open up the possibility of conditional computation graphs [4,13]
 They move toward a stronger AI that can make reliable decisions
 Other approaches to modeling discrete distributions with GANs
 Replace the target discrete outputs with continuous relaxations
 View the generator as an agent in reinforcement learning (RL) and introduce RL-based training strategies
CONCLUSION
 A new GAN model that
 can generate binary-valued predictions without further post-processing
 can be trained by end-to-end backpropagation
 Experimental comparisons of
 deterministic and stochastic binary neurons
 the proposed model and the real-valued model
 the GAN, WGAN, and WGAN-GP objectives
 MLPs and CNNs
FUTURE WORK
 Examine the use of gradient estimators for training a GAN that has
a conditional computation graph
REFERENCES
[1] Hao-Wen Dong and Yi-Hsuan Yang. Training generative adversarial networks with binary neurons by end-to-end backpropagation. arXiv preprint arXiv:1810.04714, 2018.
[2] Ian J. Goodfellow et al. Generative adversarial nets. In Proc. NIPS, 2014.
[3] Geoffrey Hinton. Neural networks for machine learning—Using noise as a regularizer (lecture 9c), 2012. Coursera, video lectures. [Online] https://www.coursera.org/lecture/neural-networks/using-noise-as-a-regularizer-7-min-wbw7b.
[4] Yoshua Bengio, Nicholas Léonard, and Aaron C. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
[5] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
[6] George Tucker, Andriy Mnih, Chris J. Maddison, and Jascha Sohl-Dickstein. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. In Proc. NIPS, 2017.
[7] Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, and David Duvenaud. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation. In Proc. ICLR, 2018.
[8] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.
[9] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. In Proc. NIPS, 2017.
[10] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[11] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. ICML, 2015.
[12] Binary stochastic neurons in TensorFlow, 2016. Blog post on the R2RT blog. [Online] https://r2rt.com/binary-stochastic-neurons-in-tensorflow.
[13] Junyoung Chung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. In Proc. ICLR, 2017.
[14] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proc. ICML, 2017.
Thank you for your attention