Lecture 11: Generative Models
Fei-Fei Li & Justin Johnson & Serena Yeung, May 9, 2019
Administrative
● A3 is out. Due May 22.
● Milestone is due next Wednesday.
○ Read Piazza post for milestone requirements.
○ Need to finish data preprocessing and have initial results by then.
● Don't discuss exam yet since people are still taking it.
Overview
● Unsupervised Learning
● Generative Models
○ PixelRNN and PixelCNN
○ Variational Autoencoders (VAE)
○ Generative Adversarial Networks (GAN)
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Classification: "Cat"
This image is CC0 public domain
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Object Detection: DOG, DOG, CAT
This image is CC0 public domain
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Semantic Segmentation: GRASS, CAT, TREE, SKY
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Image captioning: "A cat sitting on a suitcase on the floor"
Caption generated using neuraltalk2. Image is CC0 public domain.
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised vs Unsupervised Learning
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised vs Unsupervised Learning
K-means clustering
This image is CC0 public domain
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised vs Unsupervised Learning
Principal Component Analysis (dimensionality reduction): 3-d -> 2-d
This image from Matthias Scholz is CC0 public domain
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised vs Unsupervised Learning
Autoencoders (feature learning)
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised vs Unsupervised Learning
1-d and 2-d density estimation
2-d density images left and right are CC0 public domain. Figure copyright Ian Goodfellow, 2016. Reproduced with permission.
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Unsupervised Learning
Data: x
Just data, no labels! Training data is cheap.
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Holy grail: Solve unsupervised learning => understand structure of the visual world
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Generative Models
Given training data, generate new samples from the same distribution
Training data ~ p_data(x); generated samples ~ p_model(x)
Want to learn p_model(x) similar to p_data(x)
Generative Models
Given training data, generate new samples from the same distribution
Training data ~ p_data(x); generated samples ~ p_model(x)
Want to learn p_model(x) similar to p_data(x)
Addresses density estimation, a core problem in unsupervised learning
Several flavors:
- Explicit density estimation: explicitly define and solve for p_model(x)
- Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it
Why Generative Models?
- Realistic samples for artwork, super-resolution, colorization, etc.
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features
Figures from L-R are copyright: (1) Alec Radford et al. 2016; (2) Phillip Isola et al. 2017, reproduced with authors' permission; (3) BAIR Blog.
Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, Ffjord)
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Taxonomy of Generative Models
Today: we discuss the 3 most popular types of generative models: PixelRNN/CNN (tractable explicit density), Variational Autoencoder (approximate explicit density), and GAN (implicit density).
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
PixelRNN and PixelCNN
Fully visible belief network
Explicit density model
Use chain rule to decompose the likelihood of an image x into a product of 1-d distributions:

p(x) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1})

where p(x) is the likelihood of image x and each factor is the probability of the i-th pixel value given all previous pixels.
Then maximize the likelihood of the training data.
Each per-pixel conditional p(x_i | x_1, ..., x_{i-1}) is a complex distribution over pixel values => express it using a neural network!
Will need to define an ordering of the "previous pixels". Then maximize the likelihood of the training data.
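A minimal PyTorch sketch of this factorized likelihood, assuming a hypothetical `model` that maps an image to per-pixel logits over 256 intensity values while only looking at "previous" pixels:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: log-likelihood of an image under an autoregressive model.
# `model` is an assumption: it maps an image to per-pixel logits over 256
# values, computed so that logits at pixel i depend only on pixels before i.
def image_log_likelihood(model, x):
    # x: (B, H, W) integer pixel values in [0, 255]
    logits = model(x)                                    # (B, 256, H, W)
    log_probs = F.log_softmax(logits, dim=1)
    # pick out log p(x_i | x_<i) at each pixel, then sum over all pixels
    ll = log_probs.gather(1, x.unsqueeze(1)).squeeze(1)  # (B, H, W)
    return ll.flatten(1).sum(dim=1)                      # log p(x) = sum_i log p(x_i | x_<i)
```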
PixelRNN [van den Oord et al. 2016]
Generate image pixels starting from the corner
Dependency on previous pixels is modeled using an RNN (LSTM)
Drawback: sequential generation is slow!
PixelCNN [van den Oord et al. 2016]
Still generate image pixels starting from the corner
Dependency on previous pixels is now modeled using a CNN over a context region
Figure copyright van den Oord et al., 2016. Reproduced with permission.
Training: maximize the likelihood of training images, with a softmax loss at each pixel
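One common way to express the context-region dependency with a CNN is a masked convolution; here is a minimal sketch (our illustration, not the authors' exact code), using the mask types 'A' (first layer, hides the center pixel) and 'B' (later layers, allows it):

```python
import torch
import torch.nn as nn

# Minimal sketch of a PixelCNN-style masked convolution. The mask zeroes out
# weights on the current pixel's right and on all rows below it, so each
# output depends only on "previous" pixels in raster-scan order.
class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kH, kW = self.weight.shape
        self.mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0  # right of center
        self.mask[:, :, kH // 2 + 1:] = 0                            # rows below center

    def forward(self, x):
        self.weight.data *= self.mask  # zero out "future" pixels before convolving
        return super().forward(x)
```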
Training is faster than PixelRNN (can parallelize the convolutions, since the context-region values are known from the training images)
Generation must still proceed sequentially => still slow
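A sketch of this sequential generation loop, assuming the same hypothetical `model` as above (one forward pass per pixel, which is why sampling is slow):

```python
import torch

# Hedged sketch of sequential PixelCNN sampling.
@torch.no_grad()
def sample(model, B, H, W):
    x = torch.zeros(B, H, W, dtype=torch.long)  # start from an empty image
    for i in range(H):
        for j in range(W):
            logits = model(x)                         # (B, 256, H, W)
            probs = logits[:, :, i, j].softmax(dim=1)
            x[:, i, j] = torch.multinomial(probs, 1).squeeze(1)  # draw pixel (i, j)
    return x
```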
Generation Samples: 32x32 CIFAR-10 and 32x32 ImageNet
Figures copyright Aäron van den Oord et al., 2016. Reproduced with permission.
PixelRNN and PixelCNN
Improving PixelCNN performance:
- Gated convolutional layers
- Short-cut connections
- Discretized logistic loss
- Multi-scale
- Training tricks
- Etc.
See van den Oord et al. NIPS 2016 and Salimans et al. 2017 (PixelCNN++)
Pros:
- Can explicitly compute likelihood p(x)
- Explicit likelihood of training data gives a good evaluation metric
- Good samples
Con:
- Sequential generation => slow
Variational Autoencoders (VAE)
So far...
PixelCNNs define a tractable density function and optimize the likelihood of the training data:

p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
VAEs define an intractable density function with latent z:

p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz

Cannot optimize this directly; derive and optimize a lower bound on the likelihood instead.
Some background first: Autoencoders
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data
[Diagram: input data x -> encoder -> features z]
Encoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN
z is usually smaller than x (dimensionality reduction)
Q: Why dimensionality reduction?
A: Want the features to capture meaningful factors of variation in the data
How to learn this feature representation?
Train such that the features can be used to reconstruct the original data ("autoencoding" = encoding itself)
[Diagram: input data x -> encoder -> features z -> decoder -> reconstructed input data]
Decoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN (upconv)
Example: the encoder is a 4-layer conv network and the decoder is a 4-layer upconv network; compare the reconstructed data to the input data
L2 loss function: ||x - x̂||². Train such that the features can be used to reconstruct the original data.
Training doesn't use labels!
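A minimal sketch of this training setup (the layer sizes and the random stand-in minibatch are our assumptions, not from the lecture):

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch: encoder compresses x to a smaller z, decoder
# reconstructs x from z; trained with an L2 reconstruction loss, no labels.
class Autoencoder(nn.Module):
    def __init__(self, d_in=784, d_z=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_z))
        self.decoder = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(), nn.Linear(256, d_in))

    def forward(self, x):
        z = self.encoder(x)          # features (dimensionality reduction)
        return self.decoder(z)       # reconstructed input

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)              # stand-in minibatch of flattened images
loss = ((model(x) - x) ** 2).mean()  # L2 reconstruction loss
opt.zero_grad(); loss.backward(); opt.step()
```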
After training, throw away the decoder
The encoder can be used to initialize a supervised model: attach a classifier to the features and fine-tune the encoder jointly with the classifier, using a loss function (softmax, etc.) on predicted labels (plane, dog, deer, bird, truck, ...)
Train for the final task (sometimes with small data)
Autoencoders can reconstruct data, and can learn features to initialize a supervised model
Features capture factors of variation in the training data. Can we generate new images from an autoencoder?
Variational Autoencoders
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
Assume the training data is generated from an underlying unobserved (latent) representation z
[Graphical model: sample z from the true prior p_θ*(z); sample x from the true conditional p_θ*(x|z)]
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
Intuition (remember from autoencoders!): x is an image, z is the latent factors used to generate x: attributes, orientation, etc.
We want to estimate the true parameters θ* of this generative model. How should we represent this model?
Choose the prior p(z) to be simple, e.g. Gaussian. Reasonable for latent attributes, e.g. pose, how much smile.
The conditional p(x|z) is complex (it generates an image) => represent it with a neural network: the decoder network
How to train the model?
Remember the strategy for training generative models from FVBNs: learn model parameters to maximize the likelihood of the training data, now with latent z:

p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Q: What is the problem with this? Intractable!
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
✔ The first term, the simple Gaussian prior p_θ(z), is tractable
✔ The second term, the decoder neural network p_θ(x|z), is tractable for any given z
But the integral is intractable: cannot compute p(x|z) for every z!
The posterior density is also intractable: p_θ(z|x) = p_θ(x|z) p_θ(z) / p_θ(x)
✔ The numerator terms are tractable, but the denominator p_θ(x) is the intractable data likelihood
Solution: In addition to the decoder network modeling p_θ(x|z), define an additional encoder network q_ɸ(z|x) that approximates p_θ(z|x)
We will see that this allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
Variational Autoencoders
Since we're modeling probabilistic generation of data, the encoder and decoder networks are probabilistic:
- Encoder network q_ɸ(z|x) (parameters ɸ): outputs the mean μ_{z|x} and (diagonal) covariance Σ_{z|x} of z given x
- Decoder network p_θ(x|z) (parameters θ): outputs the mean μ_{x|z} and (diagonal) covariance Σ_{x|z} of x given z
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
Sample z from z|x ~ N(μ_{z|x}, Σ_{z|x}); sample x|z from x|z ~ N(μ_{x|z}, Σ_{x|z})
Encoder and decoder networks are also called "recognition"/"inference" and "generation" networks
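A minimal sketch of such a probabilistic encoder/decoder pair (the layer sizes are illustrative assumptions); the z sample uses the reparameterization trick so it stays differentiable:

```python
import torch
import torch.nn as nn

# Hedged VAE sketch: the encoder outputs the mean and log-variance of z|x;
# z is sampled as mu + sigma * eps with eps ~ N(0, I) (reparameterization),
# and the decoder maps z back to pixel space.
class VAE(nn.Module):
    def __init__(self, d_in=784, d_z=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 400), nn.ReLU())
        self.mu, self.logvar = nn.Linear(400, d_z), nn.Linear(400, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 400), nn.ReLU(),
                                 nn.Linear(400, d_in), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar
```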
Variational Autoencoders
Now equipped with our encoder and decoder networks, let's work out the (log) data likelihood:

log p_θ(x) = E_{z ~ q_ɸ(z|x)} [ log p_θ(x) ]   (p_θ(x) does not depend on z; taking the expectation wrt. z using the encoder network will come in handy later)
= E_z [ log ( p_θ(x|z) p_θ(z) / p_θ(z|x) ) ]   (Bayes' rule)
= E_z [ log ( p_θ(x|z) p_θ(z) q_ɸ(z|x) / ( p_θ(z|x) q_ɸ(z|x) ) ) ]   (multiply by a constant)
= E_z [ log p_θ(x|z) ] - E_z [ log ( q_ɸ(z|x) / p_θ(z) ) ] + E_z [ log ( q_ɸ(z|x) / p_θ(z|x) ) ]
= E_z [ log p_θ(x|z) ] - D_KL( q_ɸ(z|x) || p_θ(z) ) + D_KL( q_ɸ(z|x) || p_θ(z|x) )

The expectation wrt. z (using the encoder network) lets us write nice KL terms:
- E_z [ log p_θ(x|z) ]: the decoder network gives p_θ(x|z), so we can compute an estimate of this term through sampling. (Sampling is differentiable through the reparameterization trick, see paper.)
- D_KL( q_ɸ(z|x) || p_θ(z) ): this KL term (between the Gaussians for the encoder and the z prior) has a nice closed-form solution!
- D_KL( q_ɸ(z|x) || p_θ(z|x) ): p_θ(z|x) is intractable (saw earlier), so we can't compute this KL term :( But we know KL divergence is always >= 0.

We want to maximize the data likelihood, and the first two terms form a tractable lower bound which we can take the gradient of and optimize! (p_θ(x|z) is differentiable and the KL term is differentiable.)

log p_θ(x) >= L(x; θ, ɸ) = E_z [ log p_θ(x|z) ] - D_KL( q_ɸ(z|x) || p_θ(z) )

Training: maximize this variational lower bound ("ELBO"). The first term says: reconstruct the input data. The second term says: make the approximate posterior distribution close to the prior.
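A sketch of this lower bound as a training loss (negated, so we minimize), assuming the `VAE` sketch above and inputs x in [0,1]; a Bernoulli decoder (binary cross-entropy) stands in for -E[log p(x|z)], and the KL uses the closed form for a diagonal Gaussian against the unit-Gaussian prior:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the negative ELBO for one minibatch.
def vae_loss(vae, x):
    x_hat, mu, logvar = vae(x)
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')     # -E[log p(x|z)]
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # D_KL(q(z|x) || N(0, I))
    return recon + kl
```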
Variational Autoencoders
Putting it all together: maximizing the likelihood lower bound. Let's look at computing the bound (forward pass) for a given minibatch of input data:
1. Pass the input data x through the encoder network to get μ_{z|x} and Σ_{z|x} (the KL term makes this approximate posterior distribution close to the prior)
2. Sample z from z|x ~ N(μ_{z|x}, Σ_{z|x})
3. Pass z through the decoder network to get μ_{x|z} and Σ_{x|z}
4. Sample x|z from x|z ~ N(μ_{x|z}, Σ_{x|z}), and maximize the likelihood of the original input being reconstructed
For every minibatch of input data: compute this forward pass, and then backprop!
Variational Autoencoders: Generating Data!
Use the decoder network, but now sample z from the prior: sample z ~ p(z), pass it through the decoder to get p_θ(x|z), then sample x|z
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
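A sketch of generation with the `VAE` sketch above, sampling z from the prior and decoding:

```python
import torch

# Hedged sketch: after training, generate new data by sampling z ~ N(0, I)
# and passing it through the decoder.
@torch.no_grad()
def generate(vae, n=16, d_z=20):
    z = torch.randn(n, d_z)  # sample z from the prior
    return vae.dec(z)        # decoder outputs new samples x
```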
Data manifold for 2-d z: varying z1 and z2 sweeps out the learned manifold of images
Variational Autoencoders: Generating Data!
Diagonal prior on z => independent latent variables
Different dimensions of z encode interpretable factors of variation (e.g., varying z1 changes the degree of smile, varying z2 changes the head pose)
Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
z is also a good feature representation, and it can be computed using q_ɸ(z|x)!
Variational Autoencoders: Generating Data!
Samples: 32x32 CIFAR-10 (left); Labeled Faces in the Wild (right)
Figures copyright (L) Dirk Kingma et al. 2016; (R) Anders Larsen et al. 2017. Reproduced with permission.
Variational Autoencoders
Probabilistic spin on traditional autoencoders => allows generating data
Defines an intractable density => derive and optimize a (variational) lower bound
Pros:
- Principled approach to generative models
- Allows inference of q(z|x), which can be a useful feature representation for other tasks
Cons:
- Maximizes a lower bound on the likelihood: okay, but not as good an evaluation as PixelRNN/PixelCNN
- Samples are blurrier and lower quality compared to the state of the art (GANs)
Active areas of research:
- More flexible approximations, e.g. a richer approximate posterior instead of a diagonal Gaussian, such as Gaussian mixture models
- Incorporating structure in the latent variables, e.g. categorical distributions
Generative Adversarial Networks (GAN)
So far...
PixelCNNs define a tractable density function and optimize the likelihood of the training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
VAEs define an intractable density function with latent z:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Cannot optimize this directly; derive and optimize a lower bound on the likelihood instead.
What if we give up on explicitly modeling density, and just want the ability to sample?
GANs: don't work with any explicit density function!
Instead, take a game-theoretic approach: learn to generate from the training distribution through a 2-player game
Generative Adversarial Networks
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
Problem: Want to sample from a complex, high-dimensional training distribution. There is no direct way to do this!
Solution: Sample from a simple distribution, e.g. random noise, and learn a transformation to the training distribution.
Q: What can we use to represent this complex transformation?
A: A neural network!
Input: random noise z -> Generator Network -> Output: sample from the training distribution
Training GANs: Two-player game
Generator network: tries to fool the discriminator by generating real-looking images
Discriminator network: tries to distinguish between real and fake images
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
[Diagram: random noise z -> Generator Network -> fake images; fake images (from the generator) and real images (from the training set) -> Discriminator Network -> real or fake]
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
Train jointly in a minimax game. Minimax objective function:

min_{θ_g} max_{θ_d} [ E_{x~p_data} log D_{θ_d}(x) + E_{z~p(z)} log(1 - D_{θ_d}(G_{θ_g}(z))) ]
The discriminator outputs the likelihood in (0,1) of an image being real: D_{θ_d}(x) is the discriminator output for real data x, and D_{θ_d}(G_{θ_g}(z)) is the discriminator output for the generated fake data G(z)
- The discriminator (θ_d) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake)
- The generator (θ_g) wants to minimize the objective such that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real)
Training GANs: Two-player game
Alternate between:
1. Gradient ascent on the discriminator:
   max_{θ_d} [ E_{x~p_data} log D_{θ_d}(x) + E_{z~p(z)} log(1 - D_{θ_d}(G_{θ_g}(z))) ]
2. Gradient descent on the generator:
   min_{θ_g} E_{z~p(z)} log(1 - D_{θ_d}(G_{θ_g}(z)))
In practice, optimizing this generator objective does not work well! When a sample is likely fake, we want to learn from it to improve the generator, but the gradient of log(1 - D(G(z))) is relatively flat in that region; the gradient signal is dominated by the region where the sample is already good.
2. Instead: gradient ascent on the generator, with a different objective:
   max_{θ_g} E_{z~p(z)} log D_{θ_d}(G_{θ_g}(z))
Instead of minimizing the likelihood of the discriminator being correct, now maximize the likelihood of the discriminator being wrong. This has the same objective of fooling the discriminator, but now there is a high gradient signal for bad samples (and a low one for already-good samples) => works much better! Standard in practice.
Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training; this is an active area of research.
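A sketch of one alternating update with this non-saturating generator objective; `G`, `D` (a discriminator outputting logits), and their optimizers are assumptions:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of one GAN training step.
def gan_step(G, D, opt_g, opt_d, real, d_z=100):
    B = real.size(0)
    ones, zeros = torch.ones(B, 1), torch.zeros(B, 1)

    # 1. Gradient ascent on the discriminator (written as descent on BCE):
    #    push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(B, d_z)).detach()  # detach: don't update G here
    d_loss = F.binary_cross_entropy_with_logits(D(real), ones) + \
             F.binary_cross_entropy_with_logits(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Non-saturating generator update: maximize log D(G(z)),
    #    i.e. make the discriminator call the fakes "real".
    fake = G(torch.randn(B, d_z))
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```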
Training GANs: Two-player game
Putting it together: the GAN training algorithm alternates k discriminator updates with one generator update per iteration
Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014
Some find k=1 more stable, others use k > 1; there is no best rule. Recent work (e.g. Wasserstein GAN) alleviates this problem, giving better stability!
After training, use the generator network to generate new images
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
Generative Adversarial Nets
Generated samples, with the nearest neighbor from the training set shown for comparison
Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
Generated samples (CIFAR-10), with the nearest neighbor from the training set shown for comparison
Generative Adversarial Nets: Convolutional Architectures
The generator is an upsampling network with fractionally-strided convolutions; the discriminator is a convolutional network
Radford et al, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016
[Diagram: DCGAN generator architecture]
Samples from the model look much better! Radford et al, ICLR 2016
Interpolating between random points in latent space. Radford et al, ICLR 2016
Generative Adversarial Nets: Interpretable Vector Math
Take samples from the model depicting a smiling woman, a neutral woman, and a neutral man; average the Z vectors for each concept, then do arithmetic
Radford et al, ICLR 2016
Smiling woman - neutral woman + neutral man = smiling man
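A sketch of this vector arithmetic; the per-concept z averages and the generator `G` are hypothetical stand-ins:

```python
import torch

# Hedged sketch: average z vectors per concept, combine, and decode.
# In practice each average would come from z's whose decoded samples
# actually depict the concept; random vectors here are placeholders.
z_smiling_woman = torch.randn(8, 100).mean(0)
z_neutral_woman = torch.randn(8, 100).mean(0)
z_neutral_man   = torch.randn(8, 100).mean(0)
z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
# img = G(z_new.unsqueeze(0))  # decoding should depict a smiling man
```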
Similarly: glasses man - no glasses man + no glasses woman = woman with glasses
Radford et al, ICLR 2016
2017: Explosion of GANs
"The GAN Zoo": https://github.com/hindupuravinash/the-gan-zoo
See also https://github.com/soumith/ganhacks for tips and tricks for training GANs
Better training and generation: LSGAN, Mao et al. 2017; Wasserstein GAN, Arjovsky et al. 2017; Improved Wasserstein GAN, Gulrajani et al. 2017; Progressive GAN, Karras et al. 2018.
Many GAN applications:
- Source->Target domain transfer: CycleGAN, Zhu et al. 2017
- Image-to-image translation: Pix2pix, Isola et al. 2017; many examples at https://phillipi.github.io/pix2pix/
- Text -> Image synthesis: Reed et al. 2017
2019: BigGAN. Brock et al., 2019
GANs
Don't work with an explicit density function
Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game
Pros:
- Beautiful, state-of-the-art samples!
Cons:
- Trickier / more unstable to train
- Can't solve inference queries such as p(x), p(z|x)
Active areas of research:
- Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
- Conditional GANs, GANs for all kinds of applications
Taxonomy of Generative Models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, Ffjord)
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Useful Resources on Generative Models
- CS 236: Deep Generative Models (Stanford)
- CS 294-158: Deep Unsupervised Learning (Berkeley)
Recap
Generative Models
- PixelRNN and PixelCNN: explicit density model, optimizes exact likelihood, good samples. But inefficient sequential generation.
- Variational Autoencoders (VAE): optimize a variational lower bound on the likelihood. Useful latent representation, allows inference queries. But current sample quality is not the best.
- Generative Adversarial Networks (GANs): game-theoretic approach, best samples! But can be tricky and unstable to train, and no inference queries.

More Related Content

PDF
Cs231n 2017 lecture13 Generative Model
PDF
lecture_13_jiajun.pdf Generative models GAN
PDF
GenerativeModelsMaskedSelf-Attention.pdf
PDF
Deep Generative Modelling (updated)
PPTX
GDC2019 - SEED - Towards Deep Generative Models in Game Development
PDF
Deep Generative Modelling
PDF
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
PPTX
GAN Generative Adversarial Networks.pptx
Cs231n 2017 lecture13 Generative Model
lecture_13_jiajun.pdf Generative models GAN
GenerativeModelsMaskedSelf-Attention.pdf
Deep Generative Modelling (updated)
GDC2019 - SEED - Towards Deep Generative Models in Game Development
Deep Generative Modelling
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
GAN Generative Adversarial Networks.pptx

Similar to cs231n_2019_lecture11_Tispptisneededforth.pptx (20)

PDF
lecture_5_ruohan image classification with CNN
PPTX
Gans - Generative Adversarial Nets
PDF
A Short Introduction to Generative Adversarial Networks
PPTX
GAN Deep Learning Approaches to Image Processing Applications (1).pptx
PDF
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
PDF
Lec 1-2 ssdsdffffsssssfsdfsdfstGenAI.pdf
PDF
Deep Generative Learning for All
PDF
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
PPTX
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
PDF
lecture_14_jiajun.pdf Self supervised Learning
PDF
lecture_2-classification and learning -dl -tutorial
PDF
Tutorial on Theory and Application of Generative Adversarial Networks
PDF
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
PPTX
Introduction to Generative Models.pptx
PPTX
GAN for business value @ Data Science Milan
PDF
Deep LearningフレームワークChainerと最近の技術動向
PDF
Generative Adversarial Networks
PPTX
CM20315_01_Intro_Machine_Learning_ap.pptx
PDF
M4L19 Generative Models - Slides v 3.pdf
PDF
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
lecture_5_ruohan image classification with CNN
Gans - Generative Adversarial Nets
A Short Introduction to Generative Adversarial Networks
GAN Deep Learning Approaches to Image Processing Applications (1).pptx
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Lec 1-2 ssdsdffffsssssfsdfsdfstGenAI.pdf
Deep Generative Learning for All
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
ICASSP 2018 Tutorial: Generative Adversarial Network and its Applications to ...
lecture_14_jiajun.pdf Self supervised Learning
lecture_2-classification and learning -dl -tutorial
Tutorial on Theory and Application of Generative Adversarial Networks
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Introduction to Generative Models.pptx
GAN for business value @ Data Science Milan
Deep LearningフレームワークChainerと最近の技術動向
Generative Adversarial Networks
CM20315_01_Intro_Machine_Learning_ap.pptx
M4L19 Generative Models - Slides v 3.pdf
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)

Recently uploaded (20)

PDF
John Deere 410E II Articulated Dump Truck Service Manual.pdf
PPT
Main/Core Business Application User Manual
PPTX
729193dbwbsve251-Calabarzon-Ppt-Copy.pptx
DOC
EAU-960 COMBINED INJECTION AND IGNITION SYSTEM WITH ELECTRONIC REGULATION.doc
PDF
Engine Volvo EC55 Compact Excavator Service Repair Manual.pdf
PPTX
Cloud_Computing_ppt[1].pptx132EQ342RRRRR1
PPTX
IOT-UNIT 3.pptxaaaasasasasasasaasasasasas
PDF
GMPL auto injector molding toollllllllllllllll
PDF
Articulated Dump Truck John Deere 370E 410E 460E Technical Manual.pdf
PDF
MES Chapter 3 Combined UNIVERSITY OF VISVESHWARAYA
PDF
150 caterpillar motor grader service repair manual EB4
PDF
Lubrication system for Automotive technologies
PPTX
368455847-Relibility RJS-Relibility-PPT-1.pptx
PDF
book-slidefsdljflsk fdslkfjslf sflgs.pdf
PDF
Transmission John Deere 370E 410E 460E Technical Manual.pdf
PDF
Cylinder head Volvo EC55 Service Repair Manual.pdf
PDF
harrier-ev-brochure___________________.pdf
PDF
Compact Excavator Volvo EC55 Service Repair Manual.pdf
PPTX
Constitutional Design PPT.pptxl from social science class IX
PDF
Dongguan Sunnew ESS Profile for the year of 2023
John Deere 410E II Articulated Dump Truck Service Manual.pdf
Main/Core Business Application User Manual
729193dbwbsve251-Calabarzon-Ppt-Copy.pptx
EAU-960 COMBINED INJECTION AND IGNITION SYSTEM WITH ELECTRONIC REGULATION.doc
Engine Volvo EC55 Compact Excavator Service Repair Manual.pdf
Cloud_Computing_ppt[1].pptx132EQ342RRRRR1
IOT-UNIT 3.pptxaaaasasasasasasaasasasasas
GMPL auto injector molding toollllllllllllllll
Articulated Dump Truck John Deere 370E 410E 460E Technical Manual.pdf
MES Chapter 3 Combined UNIVERSITY OF VISVESHWARAYA
150 caterpillar motor grader service repair manual EB4
Lubrication system for Automotive technologies
368455847-Relibility RJS-Relibility-PPT-1.pptx
book-slidefsdljflsk fdslkfjslf sflgs.pdf
Transmission John Deere 370E 410E 460E Technical Manual.pdf
Cylinder head Volvo EC55 Service Repair Manual.pdf
harrier-ev-brochure___________________.pdf
Compact Excavator Volvo EC55 Service Repair Manual.pdf
Constitutional Design PPT.pptxl from social science class IX
Dongguan Sunnew ESS Profile for the year of 2023

cs231n_2019_lecture11_Tispptisneededforth.pptx

  • 1. Lecture 11: Generative Models Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 11 - May 9, 2019 Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 1
  • 2. Administrative Lecture 11 -2 May 9, Fei-Fei Li & Justin Johnson & Serena ● A3 is out. Due May 22. ● Milestone is due next Wednesday. ○ Read Piazza post for milestone requirements. ○ Need to Finish data preprocessing and initial results by then. ● Don't discuss exam yet since people are still taking it.
  • 3. Overview Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 3 ● Unsupervised Learning ● Generative Models ○ PixelRNN and PixelCNN ○ Variational Autoencoders (VAE) ○ Generative Adversarial Networks (GAN)
  • 4. Supervised vs Unsupervised Learning Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 4 Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
  • 5. Supervised vs Unsupervised Learning Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc. Cat Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 5 Classificatio n This image is CC0 public domain
  • 6. Supervised vs Unsupervised Learning Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc. DOG, DOG, CAT Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 6 This image is CC0 public domain Object Detection
  • 7. Supervised vs Unsupervised Learning Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc. Semantic Segmentation GRASS, CAT, TREE, SKY Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 7
  • 8. Supervised vs Unsupervised Learning Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc. Image captioning A cat sitting on a suitcase on the floor Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 8 Caption generated using neuraltalk2 Image is CC0 Public domain.
  • 9. Unsupervised Learning Data: x Just data, no labels! Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 9 Supervised vs Unsupervised Learning
  • 10. Unsupervised Learning Data: x Just data, no labels! Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Supervised vs Unsupervised Learning K-means clustering Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 10 This image is CC0 public domain
  • 11. Unsupervised Learning Data: x Just data, no labels! Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Supervised vs Unsupervised Learning 3-d 2-d Principal Component Analysis (Dimensionality reduction) This image from Matthias Scholz is CC0 public domain Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 11
  • 12. Unsupervised Learning Data: x Just data, no labels! Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Supervised vs Unsupervised Learning Autoencoders (Feature learning) Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 12
  • 13. Unsupervised Learning Data: x Just data, no labels! Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Supervised vs Unsupervised Learning 2-d density estimation 2-d density images left and right are CC0 public domain Figure copyright Ian Goodfellow, 2016. Reproduced with permission. 1-d density estimation Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 13
  • 14. Unsupervised Learning Data: x Just data, no labels! Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 14 Supervised vs Unsupervised Learning Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
  • 15. Data: x Just data, no labels! Unsupervised Learning Training data is cheap Goal: Learn some underlying hidden structure of the data Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc. Holy grail: Solve unsupervised learning => understand structure of visual world Supervised vs Unsupervised Learning Supervised Learning Data: (x, y) x is data, y is label Goal: Learn a function to map x -> y Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc. Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 15
  • 16. Generative Models Given training data, generate new samples from same distribution Training data ~ pdata (x) Generated samples ~ pmodel (x) Want to learn pmodel (x) similar to pdata (x) Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 16
  • 17. Generative Models Given training data, generate new samples from same distribution Training data ~ pdata (x) Generated samples ~ pmodel (x) Want to learn pmodel (x) similar to pdata (x) Addresses density estimation, a core problem in unsupervised learning Several flavors: - Explicit density estimation: explicitly define and solve for pmodel (x) - Implicit density estimation: learn model that can sample from pmodel (x) w/o explicitly defining it Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 17
  • 18. Why Generative Models? - Realistic samples for artwork, super-resolution, colorization, etc. - Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!) - Training generative models can also enable inference of latent representations that can be useful as general features FIgures from L-R are copyright: (1) Alec Radford et al. 2016; (2) Phillip Isola et al. 2017. Reproduced with authors permission (3) BAIR Blog. Lecture 11 May 9, Fei-Fei Li & Justin Johnson & Serena 18
  • 19-20. Taxonomy of Generative Models. Explicit density models split into tractable density (Fully Visible Belief Nets: NADE, MADE, PixelRNN/CNN; also NICE / RealNVP, Glow, Ffjord) and approximate density (variational: Variational Autoencoder; Markov chain: Boltzmann Machine). Implicit density models split into direct (GAN) and Markov chain (GSN). Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017. Today: the three most popular types of generative models: PixelRNN/CNN, VAEs, and GANs.
  • 21. PixelRNN and PixelCNN
  • 22-24. Fully visible belief network. Explicit density model. Use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions: p(x) = prod_{i=1..n} p(x_i | x_1, ..., x_{i-1}), where p(x) is the likelihood of image x and each factor is the probability of the i'th pixel value given all previous pixels. Then maximize the likelihood of the training data. Each conditional is a complex distribution over pixel values => express it using a neural network! Will also need to define an ordering of "previous pixels".
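To make the factorization concrete, here is a minimal sketch (in PyTorch, which the slides do not specify) of the chain-rule likelihood as a training loss; the autoregressive `model` is a hypothetical stand-in whose output at pixel i must, by construction, depend only on pixels before i.

```python
import torch
import torch.nn.functional as F

def nll_loss(model, x):
    """x: (batch, n) long tensor of pixel values in [0, 255], in raster order."""
    logits = model(x)                     # (batch, n, 256): one softmax per pixel
    logp = F.log_softmax(logits, dim=-1)  # log p(x_i = k | x_<i) for every value k
    # log p(x) = sum_i log p(x_i | x_<i): pick out the observed value at each pixel
    logp_x = logp.gather(-1, x.unsqueeze(-1)).squeeze(-1).sum(dim=1)
    return -logp_x.mean()                 # maximizing likelihood = minimizing NLL
```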
  • 25-28. PixelRNN [van den Oord et al. 2016]. Generate image pixels starting from the corner. The dependency on previous pixels is modeled using an RNN (LSTM). Drawback: sequential generation is slow!
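A sketch of why generation is slow: sampling must proceed pixel by pixel, with one full forward pass per pixel (again assuming the hypothetical autoregressive `model` from above).

```python
import torch

@torch.no_grad()
def sample_image(model, H=32, W=32):
    x = torch.zeros(1, H * W, dtype=torch.long)        # start from the corner
    for i in range(H * W):                             # H*W sequential steps: slow!
        probs = torch.softmax(model(x)[:, i], dim=-1)  # p(x_i | pixels drawn so far)
        x[:, i] = torch.multinomial(probs, 1).squeeze(-1)
    return x.view(H, W)
```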
  • 29-31. PixelCNN [van den Oord et al. 2016]. Still generate image pixels starting from the corner, but the dependency on previous pixels is now modeled using a CNN over a context region. Training: maximize the likelihood of training images, with a softmax loss at each pixel. Training is faster than PixelRNN (convolutions can be parallelized, since the context region values are known from the training images), but generation must still proceed sequentially => still slow. Figure copyright van den Oord et al., 2016. Reproduced with permission.
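The context region is usually enforced with masked convolutions. Below is a simplified single-stack sketch of such a mask (the published PixelCNN uses gated, two-stack masked convolutions; treat the details here as illustrative assumptions).

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution that only sees pixels above and to the left of the center.
    Mask type 'A' (first layer) also hides the center pixel itself; type 'B'
    (later layers) allows it."""
    def __init__(self, *args, mask_type="A", **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2 + (mask_type == "B"):] = 0  # center row, from center on
        mask[kH // 2 + 1:] = 0                            # all rows below center
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask   # zero out the "future" part of the kernel
        return super().forward(x)
```

Stacking one type-'A' layer followed by type-'B' layers, ending in 256 output channels per pixel, gives the per-pixel softmax used in the training loss sketched earlier.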
  • 32. Generation samples: 32x32 CIFAR-10 and 32x32 ImageNet. Figures copyright Aaron van den Oord et al., 2016. Reproduced with permission.
  • 33. PixelRNN and PixelCNN. Pros: can explicitly compute the likelihood p(x); an explicit likelihood of the training data gives a good evaluation metric; good samples. Con: sequential generation => slow. Improving PixelCNN performance: gated convolutional layers, short-cut connections, discretized logistic loss, multi-scale, training tricks, etc. See van den Oord et al. NIPS 2016 and Salimans et al. 2017 (PixelCNN++).
  • 34. Variational Autoencoders (VAE)
  • 35-36. So far... PixelCNNs define a tractable density function and optimize the likelihood of the training data: p_θ(x) = prod_{i=1..n} p_θ(x_i | x_1, ..., x_{i-1}). VAEs instead define an intractable density function with latent z: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz. This cannot be optimized directly; derive and optimize a lower bound on the likelihood instead.
  • 37-40. Some background first: Autoencoders. An unsupervised approach for learning a lower-dimensional feature representation z from unlabeled training data x, via an encoder. Originally: linear + nonlinearity (sigmoid). Later: deep, fully-connected. Later: ReLU CNN. z is usually smaller than x (dimensionality reduction). Q: Why dimensionality reduction? A: Want features to capture meaningful factors of variation in the data.
  • 41-46. How to learn this feature representation? Train such that the features can be used to reconstruct the original data: "autoencoding" means encoding the input itself. A decoder maps the features z back to reconstructed input data. The decoder, originally: linear + nonlinearity (sigmoid); later: deep, fully-connected; later: ReLU CNN (upconv). Example: the encoder is a 4-layer conv network and the decoder a 4-layer upconv network. Train with an L2 loss between the input data and the reconstructed data. Doesn't use labels!
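A minimal sketch of this setup (layer sizes are illustrative assumptions rather than the slide's exact architecture): a small conv encoder, an upconv decoder, and an L2 reconstruction loss, with no labels anywhere.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                  # x -> features z
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                  # z -> reconstructed input
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(8, 3, 32, 32)          # a batch of unlabeled images
loss = ((model(x) - x) ** 2).mean()    # L2 loss between input and reconstruction
loss.backward()
```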
  • 47. After training, throw away the decoder and keep the encoder and its features.
  • 48. The encoder can be used to initialize a supervised model: attach a classifier to the features, with a loss function (softmax, etc.) on predicted labels (plane, dog, deer, bird, truck, ...), and fine-tune the encoder jointly with the classifier. Train for the final task (sometimes with small data), as sketched below.
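Continuing the sketch above (all names hypothetical): throw away the decoder, bolt a classifier onto the encoder, and fine-tune on labeled data.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    model.encoder,               # initialized from unsupervised pretraining
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),   # e.g. 10 classes: plane, dog, deer, bird, truck, ...
)
labels = torch.randint(0, 10, (8,))          # stand-in labels for the small dataset
loss = nn.CrossEntropyLoss()(classifier(x), labels)
loss.backward()                              # fine-tunes encoder and classifier jointly
```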
  • 49. Autoencoders can reconstruct data, and can learn features to initialize a supervised model. The features capture factors of variation in the training data. But can we generate new images from an autoencoder?
  • 50. Variational Autoencoders: a probabilistic spin on autoencoders - will let us sample from the model to generate data!
  • 51-52. Variational Autoencoders. Probabilistic spin on autoencoders - will let us sample from the model to generate data! Assume the training data is generated from an underlying unobserved (latent) representation z: sample z from the true prior p(z), then sample x from the true conditional p(x|z). Intuition (remember from autoencoders!): x is an image, z is the latent factors used to generate x: attributes, orientation, etc. Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
  • 53-56. We want to estimate the true parameters θ of this generative model. How should we represent this model? Choose the prior p(z) to be simple, e.g. Gaussian - reasonable for latent attributes such as pose or how much smile. The conditional p(x|z) is complex (it generates an image) => represent it with a neural network: the decoder network. Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
  • 57-61. How to train the model? Remember the strategy for training generative models from FVBNs: learn model parameters to maximize the likelihood of the training data, now with latent z: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz. Q: What is the problem with this? Intractable!
  • 62-68. Variational Autoencoders: Intractability. Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz. The prior p_θ(z) is a simple Gaussian and the decoder neural network gives p_θ(x|z), but the integral itself is intractable: we cannot compute p_θ(x|z) for every z! The posterior density is also intractable: p_θ(z|x) = p_θ(x|z) p_θ(z) / p_θ(x), since it involves the intractable data likelihood p_θ(x). Solution: in addition to the decoder network modeling p_θ(x|z), define an additional encoder network q_ɸ(z|x) that approximates p_θ(z|x). Will see that this allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize. Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
  • 69-71. Since we're modeling probabilistic generation of data, the encoder and decoder networks are probabilistic. The encoder network (parameters ɸ) outputs the mean and (diagonal) covariance of z|x; sample z from that Gaussian q_ɸ(z|x). The decoder network (parameters θ) outputs the mean and (diagonal) covariance of x|z; sample x|z from that Gaussian p_θ(x|z). Encoder and decoder networks are also called "recognition"/"inference" and "generation" networks. Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
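A minimal sketch of these two networks (fully-connected, with illustrative sizes; x_dim=784 assumes flattened MNIST-like inputs). The encoder outputs the mean and log-variance of z|x, and sampling uses the reparameterization trick so gradients can flow through.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 400), nn.ReLU())
        self.mu = nn.Linear(400, z_dim)        # mean of z|x
        self.logvar = nn.Linear(400, z_dim)    # log of the diagonal covariance of z|x
        self.dec = nn.Sequential(nn.Linear(z_dim, 400), nn.ReLU(),
                                 nn.Linear(400, x_dim))  # mean of x|z (as logits)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I): sampling stays differentiable
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar
```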
  • 72-77. Now equipped with our encoder and decoder networks, let's work out the (log) data likelihood. Taking an expectation with respect to z ~ q_ɸ(z|x) (using the encoder network) will come in handy later:
    log p_θ(x) = E_{z ~ q_ɸ(z|x)} [log p_θ(x)]  (p_θ(x) does not depend on z)
    = E_z [log (p_θ(x|z) p_θ(z) / p_θ(z|x))]  (Bayes' rule)
    = E_z [log (p_θ(x|z) p_θ(z) q_ɸ(z|x) / (p_θ(z|x) q_ɸ(z|x)))]  (multiply by a constant)
    = E_z [log p_θ(x|z)] - E_z [log (q_ɸ(z|x) / p_θ(z))] + E_z [log (q_ɸ(z|x) / p_θ(z|x))]
    = E_z [log p_θ(x|z)] - D_KL(q_ɸ(z|x) || p_θ(z)) + D_KL(q_ɸ(z|x) || p_θ(z|x))
  • 78-83. The expectation with respect to z (using the encoder network) lets us write nice KL terms. The first KL term, D_KL(q_ɸ(z|x) || p_θ(z)) (between the Gaussian encoder distribution and the z prior), has a nice closed-form solution! The last KL term involves p_θ(z|x), which is intractable (saw earlier), so we can't compute it :( But we know KL divergence is always >= 0. The decoder network gives p_θ(x|z), so the term E_z[log p_θ(x|z)] can be estimated through sampling (sampling is differentiable through the reparameterization trick, see paper). Dropping the intractable KL term therefore gives a tractable lower bound which we can take gradients of and optimize (p_θ(x|z) is differentiable, and the closed-form KL term is differentiable): log p_θ(x) >= E_z[log p_θ(x|z)] - D_KL(q_ɸ(z|x) || p_θ(z)), the variational lower bound ("ELBO"). The first term reconstructs the input data; the second makes the approximate posterior distribution close to the prior. We want to maximize the data likelihood, so training maximizes the lower bound.
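In code, the (negative) lower bound has exactly these two pieces. A sketch, assuming the VAE above and a Bernoulli likelihood on pixels (a common modeling assumption; a Gaussian likelihood would give an L2-style reconstruction term instead):

```python
import torch
import torch.nn.functional as F

def neg_elbo(x_hat, x, mu, logvar):
    # Reconstruction term: E_z[log p(x|z)], estimated with the single sampled z
    recon = F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian:
    # 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar)
    return recon + kl   # minimizing this maximizes the variational lower bound
```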
  • 84-91. Putting it all together: maximizing the likelihood lower bound. Let's look at computing the bound (forward pass) for a given minibatch of input data: (1) pass the input data through the encoder network to get the mean and covariance of z|x; this is where we make the approximate posterior distribution close to the prior (the KL term); (2) sample z from q_ɸ(z|x); (3) pass z through the decoder network to get the mean and covariance of x|z; (4) sample x|z from p_θ(x|z) and maximize the likelihood of the original input being reconstructed. For every minibatch of input data: compute this forward pass, and then backprop!
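A sketch of that training loop, assuming the VAE and neg_elbo sketches above and a hypothetical `loader` yielding flattened, unlabeled minibatches:

```python
import torch

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for x in loader:                        # minibatch of input data
    x_hat, mu, logvar = vae(x)          # forward pass computes the bound
    loss = neg_elbo(x_hat, x, mu, logvar)
    opt.zero_grad()
    loss.backward()                     # ... and then backprop!
    opt.step()
```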
  • 92-94. Variational Autoencoders: Generating Data! Use the decoder network, and now sample z from the prior: sample z ~ p(z), then sample x|z from p_θ(x|z). With a 2-d z, sweeping z1 and z2 traces out the data manifold. Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
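Generation then needs only the decoder (continuing the sketch above; the grid sweep mirrors the 2-d manifold figure):

```python
import torch

with torch.no_grad():
    z = torch.randn(64, 2)                    # sample z from the prior N(0, I)
    samples = torch.sigmoid(vae.dec(z))       # decode to images
    # Data manifold for 2-d z: vary z1 and z2 over a grid instead of sampling
    grid = torch.cartesian_prod(torch.linspace(-3, 3, 20),
                                torch.linspace(-3, 3, 20))
    manifold = torch.sigmoid(vae.dec(grid))
```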
  • 95-96. Varying z1 and z2 changes interpretable factors: degree of smile and head pose. The diagonal prior on z => independent latent variables, so different dimensions of z encode interpretable factors of variation. z is also a good feature representation that can be computed using q_ɸ(z|x)! Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
  • 97. Samples: 32x32 CIFAR-10 and Labeled Faces in the Wild. Figures copyright (L) Dirk Kingma et al. 2016; (R) Anders Larsen et al. 2017. Reproduced with permission.
  • 98. Variational Autoencoders: a probabilistic spin on traditional autoencoders => allows generating data. Defines an intractable density => derive and optimize a (variational) lower bound. Pros: a principled approach to generative models; allows inference of q(z|x), which can be a useful feature representation for other tasks. Cons: maximizes a lower bound of the likelihood - okay, but not as good an evaluation as PixelRNN/PixelCNN; samples are blurrier and lower quality compared to state-of-the-art (GANs). Active areas of research: more flexible approximations, e.g. a richer approximate posterior instead of a diagonal Gaussian, such as Gaussian mixture models (GMMs); incorporating structure in latent variables, e.g. categorical distributions.
  • 99. Generative Adversarial Networks (GAN)
  • 100-102. So far... PixelCNNs define a tractable density function and optimize the likelihood of the training data. VAEs define an intractable density function with latent z; we cannot optimize it directly, so we derive and optimize a lower bound on the likelihood instead. What if we give up on explicitly modeling density, and just want the ability to sample? GANs: don't work with any explicit density function! Instead, take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
  • 103-104. Generative Adversarial Networks [Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014]. Problem: want to sample from a complex, high-dimensional training distribution; there is no direct way to do this! Solution: sample from a simple distribution, e.g. random noise, and learn a transformation to the training distribution. Q: What can we use to represent this complex transformation? A: A neural network! Input: random noise z; a generator network outputs a sample from the training distribution.
  • 105-106. Training GANs: Two-player game. Generator network: tries to fool the discriminator by generating real-looking images. Discriminator network: tries to distinguish between real and fake images. The discriminator sees fake images (from the generator) and real images (from the training set) and outputs real or fake. Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014. Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
  • 107-109. Train the two networks jointly in a minimax game. Minimax objective function: min_{θg} max_{θd} [ E_{x ~ p_data} log D_{θd}(x) + E_{z ~ p(z)} log(1 - D_{θd}(G_{θg}(z))) ], where the discriminator outputs a likelihood in (0,1) of an image being real: D_{θd}(x) is the discriminator output for real data x, and D_{θd}(G_{θg}(z)) for generated fake data G(z). The discriminator (θd) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake). The generator (θg) wants to minimize the objective such that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real). Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014.
  • 110-113. Alternate between: (1) gradient ascent on the discriminator: max_{θd} [ E_{x ~ p_data} log D_{θd}(x) + E_{z ~ p(z)} log(1 - D_{θd}(G_{θg}(z))) ]; (2) gradient descent on the generator: min_{θg} E_{z ~ p(z)} log(1 - D_{θd}(G_{θg}(z))). In practice, optimizing this generator objective does not work well! When a sample is likely fake, we want to learn from it to improve the generator, but the gradient of log(1 - D) is relatively flat in that region; the gradient signal is dominated by the region where the sample is already good. Instead: gradient ascent on the generator with a different objective: max_{θg} E_{z ~ p(z)} log D_{θd}(G_{θg}(z)). Instead of minimizing the likelihood of the discriminator being correct, now maximize the likelihood of the discriminator being wrong. Same objective of fooling the discriminator, but now there is a high gradient signal for bad samples and low gradient signal only for good ones => works much better! Standard in practice. Aside: jointly training two networks is challenging and can be unstable; choosing objectives with better loss landscapes helps training and is an active area of research. A sketch of one alternating update follows.
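A sketch of one alternating update with the non-saturating generator objective, assuming hypothetical networks G (noise -> image) and D (image -> a single real/fake logit), each with its own optimizer:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=100):
    n = real.size(0)
    fake = G(torch.randn(n, z_dim))
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    # 1. Gradient ascent on the discriminator: push D(real) -> 1, D(fake) -> 0
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones) +
              F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # 2. Generator: maximize log D(G(z)) (discriminator "wrong") instead of
    #    minimizing log(1 - D(G(z))) -> stronger gradients for bad samples
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```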
  • 114-115. Putting it together: the GAN training algorithm alternates k discriminator updates with one generator update. Some find k=1 more stable, others use k > 1; there is no best rule. Recent work (e.g. Wasserstein GAN) alleviates this problem, with better stability! Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014.
  • 116. After training, use the generator network to generate new images. Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014. Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
  • 117-118. Generative Adversarial Nets: generated samples, including samples on CIFAR-10, each shown next to its nearest neighbor from the training set. Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
  • 119. Generative Adversarial Nets: Convolutional Architectures. The generator is an upsampling network with fractionally-strided convolutions; the discriminator is a convolutional network. Radford et al, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016.
  • 120. Generative Adversarial Nets: Convolutional Architectures. The generator. Radford et al, ICLR 2016.
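A DCGAN-style generator in the spirit of Radford et al. 2016 (channel sizes are illustrative assumptions): project the noise, then upsample with fractionally-strided, i.e. transposed, convolutions:

```python
import torch.nn as nn

G = nn.Sequential(                                                          # z: (N, 100, 1, 1)
    nn.ConvTranspose2d(100, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(),  # -> 4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),  # -> 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # -> 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # -> 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                          # -> 3x64x64 image
)
```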
  • 121. Samples from the model look much better! Radford et al, ICLR 2016.
  • 122. Interpolating between random points in latent space. Radford et al, ICLR 2016.
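Interpolation is just decoding points on the line between two latent vectors (using the G sketched above):

```python
import torch

z0, z1 = torch.randn(1, 100, 1, 1), torch.randn(1, 100, 1, 1)
frames = [G((1 - a) * z0 + a * z1) for a in torch.linspace(0, 1, 8)]
# Adjacent frames morph smoothly because nearby z's decode to similar images.
```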
  • 123-125. Generative Adversarial Nets: Interpretable Vector Math. Take samples from the model for "smiling woman", "neutral woman", and "neutral man", average their z vectors, and do arithmetic: smiling woman - neutral woman + neutral man => smiling man. Radford et al, ICLR 2016.
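The vector arithmetic works the same way in code; the averaged concept vectors below are random stand-ins for z's collected from curated samples:

```python
import torch

z_smiling_woman = torch.randn(16, 100, 1, 1).mean(0, keepdim=True)
z_neutral_woman = torch.randn(16, 100, 1, 1).mean(0, keepdim=True)
z_neutral_man   = torch.randn(16, 100, 1, 1).mean(0, keepdim=True)
smiling_man = G(z_smiling_woman - z_neutral_woman + z_neutral_man)
```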
  • 126-127. Similarly: glasses man - no glasses man + no glasses woman => woman with glasses. Radford et al, ICLR 2016.
  • 128-129. 2017: Explosion of GANs. See "The GAN Zoo": https://github.com/hindupuravinash/the-gan-zoo. See also https://github.com/soumith/ganhacks for tips and tricks for training GANs.
  • 130. 2017: Explosion of GANs. Better training and generation: LSGAN (Mao et al. 2017), Wasserstein GAN (Arjovsky 2017), Improved Wasserstein GAN (Gulrajani 2017), Progressive GAN (Karras 2018).
  • 131. 2017: Explosion of GANs. Many GAN applications: source -> target domain transfer (CycleGAN, Zhu et al. 2017; Pix2pix, Isola 2017, with many examples at https://phillipi.github.io/pix2pix/) and text -> image synthesis (Reed et al. 2017).
  • 132. 2019: BigGAN. Brock et al., 2019.
  • 133. GANs: don't work with an explicit density function; take a game-theoretic approach and learn to generate from the training distribution through a 2-player game. Pros: beautiful, state-of-the-art samples! Cons: trickier / more unstable to train; can't solve inference queries such as p(x), p(z|x). Active areas of research: better loss functions and more stable training (Wasserstein GAN, LSGAN, many others); conditional GANs; GANs for all kinds of applications.
  • 134. Taxonomy of Generative Models (recap of the taxonomy figure from slide 19). Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
  • 135. Useful Resources on Generative Models: CS 236: Deep Generative Models (Stanford); CS 294-158: Deep Unsupervised Learning (Berkeley).
  • 136. Recap. Generative Models: PixelRNN and PixelCNN - explicit density model, optimizes exact likelihood, good samples, but inefficient sequential generation. Variational Autoencoders (VAE) - optimize a variational lower bound on the likelihood; useful latent representation, allows inference queries, but current sample quality is not the best. Generative Adversarial Networks (GANs) - game-theoretic approach, best samples, but can be tricky and unstable to train, and no inference queries.