2. Administrative
● A3 is out. Due May 22.
● Milestone is due next Wednesday.
○ Read Piazza post for milestone requirements.
○ Need to finish data preprocessing and have initial results by then.
● Don't discuss exam yet since people are still taking it.
3. Overview
● Unsupervised Learning
● Generative Models
○ PixelRNN and PixelCNN
○ Variational Autoencoders (VAE)
○ Generative Adversarial Networks (GAN)
4. Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
5. Supervised vs Unsupervised Learning
Supervised Learning example: Classification - map an image x to a label y (e.g. "Cat").
This image is CC0 public domain.
6. Supervised vs Unsupervised Learning
Supervised Learning example: Object Detection - map an image x to labeled boxes (e.g. DOG, DOG, CAT).
This image is CC0 public domain.
7. Supervised vs Unsupervised Learning
Supervised Learning example: Semantic Segmentation - map an image x to a per-pixel labeling (e.g. GRASS, CAT, TREE, SKY).
8. Supervised vs Unsupervised Learning
Supervised Learning example: Image Captioning - map an image x to a sentence (e.g. "A cat sitting on a suitcase on the floor").
Caption generated using neuraltalk2. Image is CC0 public domain.
9. Supervised vs Unsupervised Learning
Unsupervised Learning
Data: x - just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
10. Supervised vs Unsupervised Learning
Unsupervised Learning example: K-means clustering.
This image is CC0 public domain.
11. Supervised vs Unsupervised Learning
Unsupervised Learning example: Principal Component Analysis (dimensionality reduction), e.g. projecting 3-d data to 2-d.
This image from Matthias Scholz is CC0 public domain.
12. Supervised vs Unsupervised Learning
Unsupervised Learning example: Autoencoders (feature learning).
13. Supervised vs Unsupervised Learning
Unsupervised Learning example: 1-d and 2-d density estimation.
2-d density images left and right are CC0 public domain. Figure copyright Ian Goodfellow, 2016. Reproduced with permission.
14. Supervised vs Unsupervised Learning
Supervised Learning - Data: (x, y); x is data, y is label. Goal: learn a function to map x -> y. Examples: classification, regression, object detection, semantic segmentation, image captioning, etc.
Unsupervised Learning - Data: x; just data, no labels! Goal: learn some underlying hidden structure of the data. Examples: clustering, dimensionality reduction, feature learning, density estimation, etc.
15. Supervised vs Unsupervised Learning
Unsupervised Learning - training data is cheap (no labels needed).
Holy grail: solve unsupervised learning => understand the structure of the visual world.
16. Generative Models
Given training data, generate new samples from same
distribution
Training data ~ p_data(x); generated samples ~ p_model(x).
Want to learn p_model(x) similar to p_data(x).
17. Generative Models
Given training data, generate new samples from same
distribution
Training data ~ p_data(x); generated samples ~ p_model(x). Want to learn p_model(x) similar to p_data(x).
Addresses density estimation, a core problem in unsupervised learning.
Several flavors:
- Explicit density estimation: explicitly define and solve for p_model(x)
- Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it
18. Why Generative Models?
- Realistic samples for artwork, super-resolution, colorization, etc.
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features
Figures from L-R are copyright: (1) Alec Radford et al. 2016; (2) Phillip Isola et al. 2017, reproduced with authors' permission; (3) BAIR Blog.
19. Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN), NICE / RealNVP, Glow, Ffjord
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
20. Taxonomy of Generative Models
Today: discuss the 3 most popular types of generative models - PixelRNN/CNN (Fully Visible Belief Nets), Variational Autoencoder, and GAN.
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
21. PixelRNN and PixelCNN
22. Fully visible belief network
Explicit density model.
Use chain rule to decompose the likelihood of an image x into a product of 1-d distributions: the likelihood of image x is the product over pixels of the probability of the i'th pixel value given all previous pixels.
Then maximize the likelihood of the training data.
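The equation on this slide is rendered as an image and not captured in the transcript; the chain-rule factorization it describes is the standard one, with training maximizing the log-likelihood of the training images:

```latex
p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})
```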
23. Fully visible belief network
The conditional distribution over pixel values is complex => express it using a neural network!
Then maximize the likelihood of the training data.
24. Fully visible belief network
We will need to define an ordering of the "previous pixels".
25. PixelRNN [van den Oord et al. 2016]
Generate image pixels starting from the corner.
The dependency on previous pixels is modeled using an RNN (LSTM).
26-28. PixelRNN [van den Oord et al. 2016]
Drawback: sequential generation is slow!
29. PixelCNN [van den Oord et al. 2016]
Still generate image pixels starting from the corner.
The dependency on previous pixels is now modeled using a CNN over a context region.
Figure copyright van den Oord et al., 2016. Reproduced with permission.
30. PixelCNN [van den Oord et al. 2016]
Training: maximize the likelihood of the training images, with a softmax loss at each pixel.
31. PixelCNN [van den Oord et al. 2016]
Training is faster than PixelRNN (convolutions can be parallelized since the context-region values are known from the training images).
Generation must still proceed sequentially => still slow.
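A minimal sketch of PixelCNN-style training with a softmax loss at each pixel, assuming PyTorch; the masked convolution, layer sizes, and random stand-in batch below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv layer whose kernel is masked so each output pixel only sees
    pixels above it and to its left (the "context region")."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones_like(self.weight)
        mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0  # center row, at/right of center
        mask[:, :, kH // 2 + 1:, :] = 0                          # all rows below center
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

# A tiny PixelCNN: predicts a 256-way softmax over each pixel value.
model = nn.Sequential(
    MaskedConv2d('A', 1, 64, 7, padding=3), nn.ReLU(),
    MaskedConv2d('B', 64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, 1),                  # logits over 256 intensity levels per pixel
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randint(0, 256, (8, 1, 28, 28))   # stand-in minibatch of grayscale images
logits = model(x.float() / 255.0)           # (B, 256, H, W)
loss = F.cross_entropy(logits, x.squeeze(1))  # softmax loss at each pixel
loss.backward(); opt.step()
```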
32. Generation Samples
32x32 CIFAR-10 samples and 32x32 ImageNet samples.
Figures copyright Aaron van den Oord et al., 2016. Reproduced with permission.
33. PixelRNN and PixelCNN
Improving PixelCNN performance:
- Gated convolutional layers
- Short-cut connections
- Discretized logistic loss
- Multi-scale
- Training tricks
- Etc.
See van den Oord et al., NIPS 2016, and Salimans et al., 2017 (PixelCNN++).
Pros:
- Can explicitly compute likelihood p(x)
- Explicit likelihood of training data gives a good evaluation metric
- Good samples
Con:
- Sequential generation => slow
35. So far...
PixelCNNs define a tractable density function and optimize the likelihood of the training data.
36. So far...
VAEs define an intractable density function with latent z: we cannot optimize it directly, so we derive and optimize a lower bound on the likelihood instead.
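The two density expressions on these slides appeared as images; in the standard notation of the cited papers they are:

```latex
\text{PixelCNN:}\quad p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1})
\qquad
\text{VAE:}\quad p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz
```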
37. Some background first: Autoencoders
Input data x -> Encoder -> Features z.
An unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.
38. Some background first: Autoencoders
Encoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN.
39. Some background first: Autoencoders
z is usually smaller than x (dimensionality reduction).
Q: Why dimensionality reduction?
40. Some background first: Autoencoders
A: We want the features to capture meaningful factors of variation in the data.
41. Some background first: Autoencoders
How do we learn this feature representation?
42. Some background first: Autoencoders
Train such that the features can be used to reconstruct the original data ("autoencoding" - encoding the data itself).
Features z -> Decoder -> Reconstructed input data.
43. Some background first: Autoencoders
Decoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN (upconv).
44. Some background first: Autoencoders
Example: encoder is a 4-layer conv network, decoder is a 4-layer upconv network.
Input data -> Encoder -> Features -> Decoder -> Reconstructed data.
45. Some background first: Autoencoders
L2 loss function on the reconstruction, ||x - x_hat||^2: train such that the features can be used to reconstruct the original data.
Encoder: 4-layer conv; decoder: 4-layer upconv.
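A minimal sketch of such a conv encoder / upconv decoder trained with an L2 reconstruction loss, assuming PyTorch; the layer sizes here are illustrative, not the lecture's.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32 -> 16x16
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16 -> 8x8
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8 -> 4x4
    nn.Conv2d(128, 64, 4), nn.Flatten(),                    # 4x4 -> 1x1; z has 64 dims
)
decoder = nn.Sequential(
    nn.Unflatten(1, (64, 1, 1)),
    nn.ConvTranspose2d(64, 128, 4), nn.ReLU(),                        # 1x1 -> 4x4
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 4x4 -> 8x8
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 8x8 -> 16x16
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),                # 16x16 -> 32x32
)

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
x = torch.rand(8, 3, 32, 32)        # stand-in minibatch of images
z = encoder(x)                      # lower-dimensional features
x_hat = decoder(z)                  # reconstructed input
loss = ((x - x_hat) ** 2).mean()    # L2 reconstruction loss
loss.backward(); opt.step()
```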
47. Some background first: Autoencoders
After training, throw away the decoder.
48. Some background first: Autoencoders
The encoder can be used to initialize a supervised model: attach a classifier on top of the features, use a supervised loss (softmax, etc.) to predict labels (plane, dog, deer, bird, truck, ...), and fine-tune the encoder jointly with the classifier.
Train for the final task (sometimes with small data).
49. Some background first: Autoencoders
Autoencoders can reconstruct data and can learn features to initialize a supervised model.
The features capture factors of variation in the training data. Can we generate new images from an autoencoder?
50. Variational Autoencoders
A probabilistic spin on autoencoders - will let us sample from the model to generate data!
51. Variational Autoencoders
A probabilistic spin on autoencoders - will let us sample from the model to generate data!
Assume the training data is generated from an underlying unobserved (latent) representation z: sample z from the true prior, then sample x from the true conditional given z.
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
52. Variational Autoencoders
Intuition (remember from autoencoders!): x is an image, z is the latent factors used to generate x: attributes, orientation, etc.
53. Variational Autoencoders
We want to estimate the true parameters of this generative model.
54. Variational Autoencoders
How should we represent this model?
55. Variational Autoencoders
Choose the prior p(z) to be simple, e.g. Gaussian. This is reasonable for latent attributes, e.g. pose, how much smile.
56. Variational Autoencoders
The conditional p(x|z) is complex (it generates an image) => represent it with a neural network: the decoder network.
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
57. Variational Autoencoders
How do we train the model?
58. Variational Autoencoders
Remember the strategy for training generative models from FVBNs: learn model parameters to maximize the likelihood of the training data.
59. Variational Autoencoders
Now with latent z, the data likelihood requires marginalizing over z.
60. Variational Autoencoders
Q: What is the problem with this?
61. Variational Autoencoders
A: The likelihood is intractable!
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
66. Variational Autoencoders: Intractability
Data likelihood: pθ(x) = ∫ pθ(z) pθ(x|z) dz. The prior pθ(z) and the decoder output pθ(x|z) are each tractable (✔), but the integral over all z is intractable.
The posterior density pθ(z|x) = pθ(x|z) pθ(z) / pθ(x) is also intractable, since it requires the intractable pθ(x).
68. Variational Autoencoders: Intractability
Solution: In addition to the decoder network modeling pθ(x|z), define an additional encoder network qɸ(z|x) that approximates the intractable posterior pθ(z|x).
We will see that this allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
69. Variational Autoencoders
Since we are modeling probabilistic generation of data, the encoder and decoder networks are probabilistic:
- Encoder network (parameters ɸ): outputs the mean and (diagonal) covariance of z | x
- Decoder network (parameters θ): outputs the mean and (diagonal) covariance of x | z
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
70. Variational Autoencoders
Sample z from the encoder distribution qɸ(z|x); sample x|z from the decoder distribution pθ(x|z).
71. Variational Autoencoders
The encoder and decoder networks are also called "recognition"/"inference" and "generation" networks.
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
72. Variational Autoencoders
Now equipped with our encoder and decoder networks, let's work out the (log) data likelihood.
73-77. Variational Autoencoders
Key step: take the expectation of log pθ(x) with respect to z sampled from the encoder distribution qɸ(z|x) (this will come in handy later), then expand the log term and regroup it into an expectation plus two KL-divergence terms.
78. Variational Autoencoders
The expectation with respect to z (using the encoder network) lets us write nice KL terms.
79. Variational Autoencoders
- Decoder term: the decoder network gives pθ(x|z), so we can compute an estimate of this term through sampling (the sampling is differentiable through the reparameterization trick, see paper).
- KL term between the encoder distribution qɸ(z|x) and the prior on z: both are Gaussians, so it has a nice closed-form solution!
- KL term involving pθ(z|x): intractable (saw earlier), so we can't compute it, but we know KL divergence is always >= 0.
80. Variational Autoencoders
We want to maximize the data likelihood. The first two terms are tractable (sampling for the decoder term, closed form for the KL to the prior), while the third KL term is intractable but always >= 0.
81. Variational Autoencoders
Dropping the intractable KL term leaves a tractable lower bound which we can take the gradient of and optimize! (pθ(x|z) is differentiable, and the closed-form KL term is differentiable.)
82. Variational Autoencoders
This is the variational lower bound ("ELBO").
Training: maximize the lower bound.
83. Variational Autoencoders
The first term of the bound says: reconstruct the input data. The second (KL) term says: make the approximate posterior distribution close to the prior.
Training: maximize the lower bound.
84. Variational Autoencoders
Putting it all together: maximizing the likelihood lower bound.
85. Variational Autoencoders
Let's look at computing the bound (the forward pass) for a given minibatch of input data.
86. Variational Autoencoders
Pass the input minibatch through the encoder network to get qɸ(z|x).
87. Variational Autoencoders
Compute the KL term that makes the approximate posterior distribution close to the prior.
88. Variational Autoencoders
Sample z from qɸ(z|x).
90. Variational Autoencoders
Pass z through the decoder network to get pθ(x|z), sample x|z from it, and maximize the likelihood of the original input being reconstructed.
91. Variational Autoencoders
For every minibatch of input data: compute this forward pass, and then backprop!
92. Variational Autoencoders: Generating Data!
Use the decoder network, but now sample z from the prior!
93. Variational Autoencoders: Generating Data!
Sample z from the prior p(z), then sample x|z from the decoder pθ(x|z).
94. Variational Autoencoders: Generating Data!
Data manifold for 2-d z: varying z1 and z2 sweeps out the manifold of generated images.
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
95. Variational Autoencoders: Generating Data!
Varying z1 and z2 changes interpretable factors of variation such as degree of smile and head pose. The diagonal prior on z => independent latent variables, so different dimensions of z encode interpretable factors of variation.
96. Variational Autoencoders: Generating Data!
z is also a good feature representation, and it can be computed using qɸ(z|x)!
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
97. Variational Autoencoders: Generating Data!
Samples on 32x32 CIFAR-10 and on Labeled Faces in the Wild.
Figures copyright (L) Dirk Kingma et al. 2016; (R) Anders Larsen et al. 2017. Reproduced with permission.
98. Variational Autoencoders
A probabilistic spin on traditional autoencoders => allows generating data.
Defines an intractable density => derive and optimize a (variational) lower bound.
Pros:
- Principled approach to generative models
- Allows inference of q(z|x), which can be a useful feature representation for other tasks
Cons:
- Maximizes a lower bound of the likelihood: okay, but not as good an evaluation as PixelRNN/PixelCNN
- Samples are blurrier and lower quality compared to state-of-the-art (GANs)
Active areas of research:
- More flexible approximations, e.g. a richer approximate posterior instead of a diagonal Gaussian, e.g. Gaussian mixture models
- Incorporating structure in latent variables, e.g. categorical distributions
100. So far...
PixelCNNs define a tractable density function and optimize the likelihood of the training data.
VAEs define an intractable density function with latent z: we cannot optimize it directly, so we derive and optimize a lower bound on the likelihood instead.
101. So far...
What if we give up on explicitly modeling the density and just want the ability to sample?
102. So far...
GANs: don't work with any explicit density function!
Instead, take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
103. Generative Adversarial Networks
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
Problem: We want to sample from a complex, high-dimensional training distribution. There is no direct way to do this!
Solution: Sample from a simple distribution, e.g. random noise, and learn a transformation to the training distribution.
Q: What can we use to represent this complex transformation?
104. Generative Adversarial Networks
A: A neural network! Input: random noise z -> Generator Network -> Output: sample from the training distribution.
105. Training GANs: Two-player game
Generator network: tries to fool the discriminator by generating real-looking images.
Discriminator network: tries to distinguish between real and fake images.
106. Training GANs: Two-player game
Random noise z -> Generator Network -> fake images. These, together with real images from the training set, are fed to the Discriminator Network, which outputs Real or Fake.
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
107. Training GANs: Two-player game
Train the two networks jointly in a minimax game with a minimax objective function.
108. Training GANs: Two-player game
The discriminator outputs a likelihood in (0, 1) that an image is real: D(x) is the discriminator output for real data x, and D(G(z)) is the discriminator output for generated fake data G(z).
109. Training GANs: Two-player game
- The discriminator (θd) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake).
- The generator (θg) wants to minimize the objective such that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real).
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
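The objective itself appeared as an image on these slides; from the cited Goodfellow et al. paper it is:

```latex
\min_{\theta_g}\ \max_{\theta_d}\;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D_{\theta_d}(x)\big]
  + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]
```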
110. Training GANs: Two-player game
Alternate between:
1. Gradient ascent on the discriminator
2. Gradient descent on the generator
111. Training GANs: Two-player game
In practice, optimizing this generator objective does not work well! When a sample is likely fake, we want to learn from it to improve the generator, but the gradient of log(1 - D(G(z))) in that region is relatively flat; the gradient signal is dominated by the region where the sample is already good.
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
112. Training GANs: Two-player game
Alternate between:
1. Gradient ascent on the discriminator
2. Instead: gradient ascent on the generator with a different objective. Instead of minimizing the likelihood of the discriminator being correct, now maximize the likelihood of the discriminator being wrong.
Same objective of fooling the discriminator, but now there is a higher gradient signal for bad samples => works much better! Standard in practice.
113. Training GANs: Two-player game
Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training; this is an active area of research.
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
114. Training GANs: Two-player game
Putting it together: the GAN training algorithm alternates k discriminator updates with one generator update.
115. Training GANs: Two-player game
Some find k = 1 more stable, others use k > 1; there is no best rule. Recent work (e.g. Wasserstein GAN) alleviates this problem and gives better stability!
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
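A minimal sketch of that alternating loop (assuming PyTorch; the tiny fully-connected G and D, the learning rates, and the loader of flattened real images are illustrative assumptions, not the paper's setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, k = 100, 1   # k discriminator steps per generator step (k = 1 is common)
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def d_step(x_real):
    # Gradient ascent on the discriminator: push D(x) -> 1 and D(G(z)) -> 0.
    z = torch.randn(x_real.size(0), z_dim)
    logits_real, logits_fake = D(x_real), D(G(z).detach())
    loss = F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real)) + \
           F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake))
    opt_d.zero_grad(); loss.backward(); opt_d.step()

def g_step(batch_size):
    # Non-saturating generator objective: maximize log D(G(z)).
    z = torch.randn(batch_size, z_dim)
    logits_fake = D(G(z))
    loss = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
    opt_g.zero_grad(); loss.backward(); opt_g.step()

for x_real in loader:                  # assumed loader of flattened images scaled to [-1, 1]
    for _ in range(k):
        d_step(x_real)
    g_step(x_real.size(0))
```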
116. Training GANs: Two-player game
After training, use the generator network to generate new images: feed it random noise z.
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
117. Generative Adversarial Nets
Generated samples; the nearest neighbor from the training set is shown for comparison.
Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
118. Generative Adversarial Nets
Generated samples (CIFAR-10); the nearest neighbor from the training set is shown for comparison.
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
119. Generative Adversarial Nets: Convolutional Architectures
The generator is an upsampling network with fractionally-strided convolutions; the discriminator is a convolutional network.
Radford et al., "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016
120. Generative Adversarial Nets: Convolutional Architectures
The DCGAN generator architecture.
Radford et al., "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016
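A hypothetical sketch of such a generator (layer sizes follow the common DCGAN recipe, not necessarily the exact figure on the slide): fractionally-strided (transposed) convolutions upsample a noise vector into an RGB image.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, stride=1, padding=0), nn.BatchNorm2d(512), nn.ReLU(),  # 1x1 -> 4x4
    nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4 -> 8x8
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8 -> 16x16
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),    # 16x16 -> 32x32
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),                          # 32x32 -> 64x64
)

z = torch.randn(16, 100, 1, 1)   # noise vectors treated as 1x1 spatial maps
fake_images = generator(z)       # (16, 3, 64, 64), values in [-1, 1]
```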
122. Generative Adversarial Nets: Convolutional Architectures
Interpolating between random points in latent space.
Radford et al., ICLR 2016
123. Generative Adversarial Nets: Interpretable Vector Math
Samples from the model: smiling woman, neutral woman, neutral man.
124. Generative Adversarial Nets: Interpretable Vector Math
Average the z vectors for each group, then do arithmetic on them.
125. Generative Adversarial Nets: Interpretable Vector Math
Smiling woman - neutral woman + neutral man => smiling man.
Radford et al., ICLR 2016
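A tiny sketch of that arithmetic, assuming the hypothetical generator above and batches of z vectors already collected for each group (the variable names are placeholders):

```python
# Average the z vectors for each group, do arithmetic, and decode the result.
z_new = (z_smiling_woman.mean(0) - z_neutral_woman.mean(0) + z_neutral_man.mean(0))
image = generator(z_new.view(1, 100, 1, 1))
```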
126. Generative Adversarial Nets: Interpretable Vector Math
Samples from the model: glasses man, no glasses man, no glasses woman.
127. Generative Adversarial Nets: Interpretable Vector Math
Glasses man - no glasses man + no glasses woman => woman with glasses.
Radford et al., ICLR 2016
128. 2017: Explosion of GANs
"The GAN Zoo": https://github.com/hindupuravinash/the-gan-zoo
129. 2017: Explosion of GANs
See also https://github.com/soumith/ganhacks for tips and tricks for training GANs.
130. 2017: Explosion of GANs
Better training and generation: LSGAN (Mao et al. 2017), Wasserstein GAN (Arjovsky et al. 2017), Improved Wasserstein GAN (Gulrajani et al. 2017), Progressive GAN (Karras et al. 2018).
131. 2017: Explosion of GANs
Many GAN applications:
- Source -> target domain transfer: CycleGAN (Zhu et al. 2017)
- Image-to-image translation: Pix2pix (Isola et al. 2017); many examples at https://phillipi.github.io/pix2pix/
- Text -> image synthesis: Reed et al. 2017
132. 2019: BigGAN
Brock et al., 2019.
133. GANs
Don't work with an explicit density function.
Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
Pros:
- Beautiful, state-of-the-art samples!
Cons:
- Trickier / more unstable to train
- Can't solve inference queries such as p(x), p(z|x)
Active areas of research:
- Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
- Conditional GANs, GANs for all kinds of applications
134. Taxonomy of Generative Models
Recap of the taxonomy from slide 19: explicit density, either tractable (Fully Visible Belief Nets: NADE, MADE, PixelRNN/CNN; NICE / RealNVP; Glow; Ffjord) or approximate (Variational Autoencoder, Boltzmann Machine), and implicit density (GAN, GSN).
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
135. Useful Resources on Generative Models
CS 236: Deep Generative Models (Stanford)
CS 294-158: Deep Unsupervised Learning (Berkeley)
136. Recap
Generative Models:
- PixelRNN and PixelCNN: explicit density model, optimizes exact likelihood, good samples. But inefficient sequential generation.
- Variational Autoencoders (VAE): optimize a variational lower bound on the likelihood. Useful latent representation, inference queries. But current sample quality is not the best.
- Generative Adversarial Networks (GANs): game-theoretic approach, best samples! But can be tricky and unstable to train, and no inference queries.