Improving Variational Inference with Inverse Autoregressive Flow

Jan. 19, 2017
Tatsuya Shirakawa (tatsuya@abeja.asia)
Diederik P. Kingma (OpenAI) Tim Salimans (OpenAI) Rafal Jozefowicz (OpenAI)
Xi Chen (OpenAI) Ilya Sutskever (OpenAI)
Max Welling (University of Amsterdam)

2
Requirements for the inference model q(z|x)
Computational Tractability
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Parallel computation
Accuracy
4. Sufficiently flexible to match the true posterior p(z|x)
[Figure: curves labeled p(z|x; μ*), q(z|x; ν*), p(z|x; μ), q(z|x; ν)]

3
Previous Designs of q(z|x)
Basic Designs
- Diagonal Gaussian Distribution
- Full Covariance Gaussian Distribution
Designs based on Change of Variables
- NICE
L. Dinh et al., “NICE: Non-linear independent components estimation”, 2014
- Normalizing Flow
D. J. Rezende et al., “Variational inference with normalizing flows”, ICML2015
Designs based on Adding Auxiliary Variables
- Hamiltonian Flow/Hamiltonian Variational Inference
T. Salimans et al., “Markov chain Monte Carlo and variational inference: Bridging the gap”, 2014

4
Diagonal/Full Covariance Gaussian Distribution
Diagonal: Efficient but not flexible
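$q(\mathbf{z}|\mathbf{x}) = \prod_i \mathcal{N}\big(z_i \mid \mu_i(\mathbf{x}), \sigma_i(\mathbf{x})\big)$
Full Covariance: Not efficient and not flexible (unimodal)
$q(\mathbf{z}|\mathbf{x}) = \mathcal{N}\big(\mathbf{z} \mid \boldsymbol{\mu}(\mathbf{x}), \boldsymbol{\Sigma}(\mathbf{x})\big)$
(✓ / ✗ = diagonal / full covariance)
1. Computationally cheap to compute and differentiate ✓ / ✗
2. Computationally cheap to sample from ✓ / ✗
3. Parallel computation ✓ / ✗
4. Sufficiently flexible ✗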
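A minimal sketch of why the diagonal family ticks the first three boxes, assuming σ_i(x) is a standard deviation and that an encoder has already produced μ(x) and σ(x) (plain lists here; `sample_diag_gaussian` and `log_q_diag` are hypothetical names): sampling and evaluating log q(z|x) each touch every coordinate once, independently of the others.

```python
import math
import random

def sample_diag_gaussian(mu, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I); O(D), coordinate-wise."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

def log_q_diag(z, mu, sigma):
    """log q(z|x) = sum_i log N(z_i | mu_i, sigma_i^2); O(D) and easy to differentiate."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((zi - m) / s) ** 2
        for zi, m, s in zip(z, mu, sigma)
    )

rng = random.Random(0)
mu, sigma = [0.0, 1.0, -2.0], [1.0, 0.5, 2.0]   # hypothetical encoder outputs for one x
z, eps = sample_diag_gaussian(mu, sigma, rng)
print(z, log_q_diag(z, mu, sigma))
```

A full-covariance Gaussian would instead need a D×D factorization of Σ(x) for both operations, which is where its ✗ marks come from.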

5
Change of Variables based methods
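Transform a simple $q(\mathbf{z}_0|\mathbf{x})$ into a more powerful distribution $q(\mathbf{z}|\mathbf{x})$
via sequential application of change of variables
$\mathbf{z}_t = f_t(\mathbf{z}_{t-1})$
$q(\mathbf{z}_t|\mathbf{x}) = q(\mathbf{z}_{t-1}|\mathbf{x}) \left|\det \dfrac{d f_t(\mathbf{z}_{t-1})}{d\mathbf{z}_{t-1}}\right|^{-1}$
$\Rightarrow \log q(\mathbf{z}_T|\mathbf{x}) = \log q(\mathbf{z}_0|\mathbf{x}) - \sum_{t=1}^{T} \log \left|\det \dfrac{d f_t(\mathbf{z}_{t-1})}{d\mathbf{z}_{t-1}}\right|$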
• NICE
L. Dinh et al., “NICE: Non-linear independent components estimation”, 2014
• Normalizing Flow
D. J. Rezende et al., “Variational inference with normalizing flows”, ICML2015
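The log-density bookkeeping above can be checked on a toy flow. This sketch assumes q(z_0|x) = N(0, I) and uses elementwise affine layers f_t(z) = a_t ⊙ z + b_t (hypothetical layers chosen because their Jacobian is diagonal, so the exact density of z_T is also known); `affine_flow` and `flow_log_density` are illustrative names.

```python
import math

def log_std_normal(z):
    """log N(z | 0, I)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * zi * zi for zi in z)

def affine_flow(z, a, b):
    """f_t(z) = a * z + b elementwise; Jacobian is diag(a), so log|det| = sum_i log|a_i|."""
    z_new = [ai * zi + bi for ai, zi, bi in zip(a, z, b)]
    log_det = sum(math.log(abs(ai)) for ai in a)
    return z_new, log_det

def flow_log_density(z0, layers):
    """Apply z_t = f_t(z_{t-1}) and track log q(z_T) = log q(z_0) - sum_t log|det df_t/dz_{t-1}|."""
    z, log_q = list(z0), log_std_normal(z0)
    for a, b in layers:
        z, log_det = affine_flow(z, a, b)
        log_q -= log_det
    return z, log_q

layers = [([2.0, 0.5], [1.0, -1.0]), ([-1.5, 3.0], [0.0, 2.0])]   # hypothetical (a_t, b_t)
zT, log_qT = flow_log_density([0.3, -0.7], layers)
print(zT, log_qT)
```

Because every layer here is affine, z_T is Gaussian, and the accumulated log q(z_T|x) can be compared with that Gaussian's closed-form density.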

6
Normalizing Flow
Transformation via
$\mathbf{z}_t = \mathbf{z}_{t-1} + \mathbf{u}_t\, f_t(\mathbf{w}_t^{T}\mathbf{z}_{t-1} + b_t)$
Key Features
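- Determinants are cheap to compute: $\det \dfrac{\partial \mathbf{z}_t}{\partial \mathbf{z}_{t-1}} = 1 + \mathbf{u}_t^{T} f_t'(\mathbf{w}_t^{T}\mathbf{z}_{t-1} + b_t)\,\mathbf{w}_t$, an O(D) computation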
Drawbacks
- Information passes through a single scalar bottleneck $\mathbf{w}_t^{T}\mathbf{z}_{t-1} + b_t$
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
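4. Sufficiently flexible ✗
[Figure: z_{t−1} is projected to the scalar w_tᵀz_{t−1} + b_t (single bottleneck), mapped to u_t f_t(w_tᵀz_{t−1} + b_t), and added (⊕) to z_{t−1} to give z_t]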
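A sketch of one planar step, assuming f_t = tanh (one of the choices in Rezende et al.) and plain Python lists; `planar_step` is a hypothetical helper. The scalar `a` is the single bottleneck from the slide, and the log-determinant comes from the matrix determinant lemma in O(D). For tanh the step is invertible when u_tᵀw_t ≥ −1, which this sketch does not enforce.

```python
import math

def planar_step(z, u, w, b):
    """z_t = z_{t-1} + u * tanh(w^T z_{t-1} + b); returns (z_t, log|det dz_t/dz_{t-1}|)."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b        # the single scalar bottleneck
    h = math.tanh(a)
    z_new = [zi + ui * h for zi, ui in zip(z, u)]
    psi = [(1.0 - h * h) * wi for wi in w]              # f'(a) * w, since tanh' = 1 - tanh^2
    det = 1.0 + sum(ui * pi for ui, pi in zip(u, psi))  # det(I + u psi^T) = 1 + u^T psi
    return z_new, math.log(abs(det))

z_next, log_det = planar_step([0.4, -0.2], [0.7, 0.3], [1.2, -0.5], 0.1)
print(z_next, log_det)
```

However large D is, each step only adds a rank-one update along u_t, which is why many steps are needed for a flexible q.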

7
Hamiltonian Flow / Hamiltonian Variational Inference
ELBO with auxiliary variables y
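$\log p(\mathbf{x}) \ge \log p(\mathbf{x}) - D_{KL}\big(q(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}|\mathbf{x})\big) - \mathbb{E}_{q(\mathbf{z}|\mathbf{x})}\Big[D_{KL}\big(q(\mathbf{y}|\mathbf{x},\mathbf{z}) \,\|\, r(\mathbf{y}|\mathbf{x},\mathbf{z})\big)\Big] =: \mathcal{L}(\mathbf{x})$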
Drawing (y, z) via HMC
$(\mathbf{y}_t, \mathbf{z}_t) \sim \mathrm{HMC}(\mathbf{y}_t, \mathbf{z}_t \mid \mathbf{y}_{t-1}, \mathbf{z}_{t-1})$
Key Features
- Can sample from the exact posterior (in the limit of many HMC steps)
Drawbacks
- Long mixing time, and the auxiliary bound is lower than the standard ELBO (by the extra KL term)
1. Computationally cheap to compute and differentiate ✗
2. Computationally cheap to sample from ✗
4. Sufficiently flexible ✓
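Here the auxiliary variables y play the role of the HMC momentum. A minimal sketch of the leapfrog integrator behind one HMC transition, assuming a standard normal target p(z|x) ∝ exp(−½‖z‖²), a fixed step size, and no Metropolis accept/reject step; `leapfrog`, `grad_log_target` and `hamiltonian` are hypothetical names.

```python
import math

def grad_log_target(z):
    """Gradient of a hypothetical target log p(z) = -0.5 * ||z||^2 + const."""
    return [-zi for zi in z]

def leapfrog(z, y, eps, n_steps, grad_log_p=grad_log_target):
    """Leapfrog integration of Hamiltonian dynamics with position z and momentum y."""
    z, y = list(z), list(y)
    g = grad_log_p(z)
    y = [yi + 0.5 * eps * gi for yi, gi in zip(y, g)]        # initial half step for momentum
    for step in range(n_steps):
        z = [zi + eps * yi for zi, yi in zip(z, y)]           # full step for position
        g = grad_log_p(z)
        scale = eps if step < n_steps - 1 else 0.5 * eps      # final momentum step is a half step
        y = [yi + scale * gi for yi, gi in zip(y, g)]
    return z, y

def hamiltonian(z, y):
    """H = -log p(z) + 0.5 * ||y||^2 for the standard normal target (constants dropped)."""
    return 0.5 * sum(zi * zi for zi in z) + 0.5 * sum(yi * yi for yi in y)

z1, y1 = leapfrog([1.0, -0.5], [0.3, 0.8], eps=0.1, n_steps=20)
print(z1, y1, hamiltonian(z1, y1))
```

Each transition costs n_steps sequential gradient evaluations, which is where the ✗ marks for cost come from.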

8
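NICE
Transform only half of z at each step
$\mathbf{z}_t = (\mathbf{z}_t^{\alpha}, \mathbf{z}_t^{\beta}) = \big(\mathbf{z}_{t-1}^{\alpha},\ \mathbf{z}_{t-1}^{\beta} + f_t(\mathbf{x}, \mathbf{z}_{t-1}^{\alpha})\big)$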
Key Features
- Determinant of the Jacobian $\det \dfrac{\partial \mathbf{z}_t}{\partial \mathbf{z}_{t-1}}$ is always 1
Drawbacks
- Limited form of transformation
- Less powerful than Normalizing Flow
4. Sufficiently flexible ✗
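A sketch of one additive coupling step in plain Python, assuming z has even length and using a hypothetical stand-in `shift_net` for the network f_t(x, z^α). The α half passes through unchanged, so the inverse only needs to recompute and subtract the same shift, and the Jacobian is triangular with a unit diagonal.

```python
import math

def shift_net(x, za):
    """Hypothetical stand-in for f_t(x, z^alpha); it never has to be inverted."""
    return [math.sin(za[0]) + x[0] * za[1], za[0] * za[1]]

def coupling_forward(z, x, f):
    """z_t = (z^alpha, z^beta + f(x, z^alpha)); log|det| = 0."""
    d = len(z) // 2
    za, zb = z[:d], z[d:]
    return za + [b + s for b, s in zip(zb, f(x, za))]

def coupling_inverse(z, x, f):
    """Exact inverse: z^alpha is unchanged, so the shift is recomputed and subtracted."""
    d = len(z) // 2
    za, zb = z[:d], z[d:]
    return za + [b - s for b, s in zip(zb, f(x, za))]

x = [0.5]                          # hypothetical conditioning input
z = [0.3, -1.1, 2.0, 0.7]
z_next = coupling_forward(z, x, shift_net)
z_back = coupling_inverse(z_next, x, shift_net)
print(z_next, z_back)
```

The price of the unit determinant is the limited form noted above: half of the coordinates are left untouched at every step.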

9
Autoregressive Flow (proposed)
Autoregressive Flow ($d\mu_{t,i}/dz_{t,j} = d\sigma_{t,i}/dz_{t,j} = 0$ if $i \le j$)
$z_{t,i} = \mu_{t,i}(\mathbf{z}_{t,0:i-1}) + \sigma_{t,i}(\mathbf{z}_{t,0:i-1}) \cdot z_{t-1,i}$
Key features
- Powerful
- Easy to compute $\det \partial\mathbf{z}_t/\partial\mathbf{z}_{t-1} = \prod_i \sigma_{t,i}(\mathbf{z}_{t,0:i-1})$
Drawbacks
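- Difficult to parallelize: each $z_{t,i}$ depends on $\mathbf{z}_{t,0:i-1}$, so sampling takes one sequential step per dimension
4. Sufficiently flexible ✓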
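To make the sequential dependency concrete, here is a sketch of one AF step. `mu_fn` and `log_sigma_fn` are hypothetical autoregressive functions (simple prefix sums standing in for a learned network, with σ = exp(log σ) kept positive); each output coordinate needs the outputs computed before it, hence the explicit loop.

```python
import math

def mu_fn(prefix):
    """Hypothetical autoregressive shift: may only look at earlier coordinates."""
    return 0.5 * sum(prefix)

def log_sigma_fn(prefix):
    """Hypothetical autoregressive log-scale; exp() keeps sigma positive."""
    return 0.1 * sum(prefix)

def af_step(z_prev):
    """AF: z_t[i] = mu(z_t[:i]) + sigma(z_t[:i]) * z_prev[i].
    mu and sigma read the *output* prefix, so the loop cannot be parallelized."""
    z, log_det = [], 0.0
    for i, zi_prev in enumerate(z_prev):
        prefix = z[:i]                      # already computed outputs z_t[0..i-1]
        log_s = log_sigma_fn(prefix)
        z.append(mu_fn(prefix) + math.exp(log_s) * zi_prev)
        log_det += log_s                    # det = prod_i sigma_{t,i}
    return z, log_det

z_t, log_det = af_step([1.0, 2.0, 3.0])
print(z_t, log_det)
```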

10
Inverse Autoregressive Flow (proposed)
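Inverting AF ($\boldsymbol{\mu}_t, \boldsymbol{\sigma}_t$ are again autoregressive, now as functions of $\mathbf{z}_{t-1}$)
$\mathbf{z}_t = \dfrac{\mathbf{z}_{t-1} - \boldsymbol{\mu}_t(\mathbf{z}_{t-1})}{\boldsymbol{\sigma}_t(\mathbf{z}_{t-1})}$ (elementwise)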
Key Features
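- As powerful as AF
- Easy to compute $\det \partial\mathbf{z}_t/\partial\mathbf{z}_{t-1} = 1/\prod_i \sigma_{t,i}(\mathbf{z}_{t-1})$
- Parallelizable: every coordinate of $\mathbf{z}_t$ depends only on the already known $\mathbf{z}_{t-1}$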
3. Parallel computation ✓
4. Sufficiently flexible ✓
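A sketch of the IAF step in the slide's form, again with hypothetical `mu_fn` and `log_sigma_fn` that read only z_{t−1}[:i]. Each coordinate is computed independently from the known input, so a vectorized implementation can do them all at once; `af_step` is included only to show that IAF exactly undoes the AF step built from the same functions.

```python
import math

def mu_fn(prefix):
    """Hypothetical autoregressive shift."""
    return 0.5 * sum(prefix)

def log_sigma_fn(prefix):
    """Hypothetical autoregressive log-scale."""
    return 0.1 * sum(prefix)

def iaf_step(z_prev):
    """IAF: z_t = (z_prev - mu(z_prev)) / sigma(z_prev), mu_i and sigma_i reading z_prev[:i].
    No coordinate depends on another output, so all i are independent."""
    n = len(z_prev)
    mus = [mu_fn(z_prev[:i]) for i in range(n)]
    log_sigmas = [log_sigma_fn(z_prev[:i]) for i in range(n)]
    z = [(zp - m) / math.exp(ls) for zp, m, ls in zip(z_prev, mus, log_sigmas)]
    return z, -sum(log_sigmas)              # log det = -sum_i log sigma_{t,i}

def af_step(z_prev):
    """Sequential AF step with the same mu/sigma, used here only for the round-trip check."""
    z = []
    for i, zp in enumerate(z_prev):
        z.append(mu_fn(z[:i]) + math.exp(log_sigma_fn(z[:i])) * zp)
    return z

x = [0.3, -1.2, 2.0]
y = af_step(x)
x_back, log_det = iaf_step(y)
print(x_back, log_det)
```

The trade-off runs the other way for density evaluation: given an arbitrary z_t, recovering z_{t−1} through this IAF requires the sequential AF direction.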

11
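IAF through Masked Autoencoder for Distribution Estimation (MADE)
Modeling autoregressive $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ with MADE
• Removes connections from "future" inputs in an autoencoder by masking its weight matrices
• MADE is a probabilistic model
$p(\mathbf{x}) = \prod_i p(x_i \mid \mathbf{x}_{0:i-1})$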
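A sketch of how MADE-style masks enforce the autoregressive property, assuming a single hidden layer and, for reproducibility, hidden-unit degrees assigned cyclically in 1..D−1 rather than sampled at random as in MADE. It needs D ≥ 2; `made_masks` and `connectivity` are hypothetical helper names. Output d ends up connected only to inputs 0..d−1, matching $\mathbf{x}_{0:i-1}$ above.

```python
def made_masks(D, H):
    """Binary masks for a one-hidden-layer MADE over D >= 2 inputs with H hidden units."""
    in_deg = list(range(1, D + 1))                       # input d gets degree d + 1
    hid_deg = [1 + (k % (D - 1)) for k in range(H)]      # degrees in 1..D-1 (cycled, not sampled)
    # hidden unit k may read input d only if hid_deg[k] >= in_deg[d]
    m_hidden = [[int(hid_deg[k] >= in_deg[d]) for d in range(D)] for k in range(H)]
    # output d may read hidden unit k only if in_deg[d] > hid_deg[k] (strict: no self-path)
    m_out = [[int(in_deg[d] > hid_deg[k]) for k in range(H)] for d in range(D)]
    return m_hidden, m_out

def connectivity(m_hidden, m_out):
    """Number of masked paths from input j to output d; zero whenever j >= d."""
    D, H = len(m_out), len(m_hidden)
    return [[sum(m_out[d][k] * m_hidden[k][j] for k in range(H)) for j in range(D)]
            for d in range(D)]

conn = connectivity(*made_masks(4, 6))
for row in conn:
    print(row)
```

In an IAF step, the masked network's outputs for coordinate i serve as μ_{t,i} and σ_{t,i}, so a single forward pass yields all of them.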

12
Experiments
IAF is evaluated on image generative models (VAEs)
Models for MNIST
- Convolutional VAE with ResNet blocks
- Each IAF step: a 2-layer MADE
- IAF steps are stacked, with the variable ordering reversed between successive steps
Models for CIFAR-10 (very complicated)
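A sketch of stacking IAF steps with the ordering reversed after each one. Assumptions: q(z_0|x) = N(0, I) instead of an encoder-produced diagonal Gaussian, and per-step μ/σ given by hypothetical prefix-sum functions scaled by a coefficient `c` in place of trained MADEs. A reversal is a permutation with |det| = 1, so only the IAF steps contribute to log q(z_T|x).

```python
import math

def log_std_normal(z):
    """log N(z | 0, I)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * zi * zi for zi in z)

def iaf_step(z_prev, c):
    """One IAF step z_t = (z_{t-1} - mu) / sigma with hypothetical
    mu_i = c * sum(z_prev[:i]) and log sigma_i = 0.1 * c * sum(z_prev[:i])."""
    prefix_sums = [sum(z_prev[:i]) for i in range(len(z_prev))]
    log_sig = [0.1 * c * s for s in prefix_sums]
    z = [(zp - c * s) / math.exp(ls) for zp, s, ls in zip(z_prev, prefix_sums, log_sig)]
    return z, -sum(log_sig)                  # log|det dz_t/dz_{t-1}|

def iaf_chain(z0, coeffs):
    """Stack IAF steps, reversing the variable order after each one."""
    z, log_q = list(z0), log_std_normal(z0)  # assumed q(z_0|x) = N(0, I)
    for c in coeffs:
        z, log_det = iaf_step(z, c)
        log_q -= log_det                     # log q(z_t) = log q(z_{t-1}) - log|det|
        z = z[::-1]                          # reverse the ordering for the next step
    return z, log_q

zT, log_qT = iaf_chain([0.4, -0.9], [0.5, -0.3, 0.8])
print(zT, log_qT)
```

Reversing lets the coordinate that was conditioned on nothing in one step be conditioned on all the others in the next.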

15
IAF in 1 slide
[Figure: distributions q(z_0|x; ν_0), q(z_t|x; ν_t), q(z_T|x; ν_T) shown with the posterior p(z|x; μ*); steps labeled "Autoregressive Flow" and "Inverse Autoregressive Flow"]
IAF is
✓ Easy to compute and differentiate
✓ Easy to sample from
✓ Parallelizable
✓ Flexible

We are hiring!
http://www.abeja.asia/
https://www.wantedly.com/companies/abeja
