Improving Variational Inference
with Inverse Autoregressive Flow
Jan. 19, 2017
Tatsuya Shirakawa (tatsuya@abeja.asia)
Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI),
Xi Chen (OpenAI), Ilya Sutskever (OpenAI),
Max Welling (University of Amsterdam)
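Variational Autoencoder (VAE)
$$
\begin{aligned}
\log p(\boldsymbol{x})
&\ge \mathbb{E}_{q(\boldsymbol{z}|\boldsymbol{x})}\left[\log p(\boldsymbol{x}, \boldsymbol{z}) - \log q(\boldsymbol{z}|\boldsymbol{x})\right] \\
&= \log p(\boldsymbol{x}) - D_{KL}\left(q(\boldsymbol{z}|\boldsymbol{x}) \,\|\, p(\boldsymbol{z}|\boldsymbol{x})\right) \\
&= \mathbb{E}_{q(\boldsymbol{z}|\boldsymbol{x})}\left[\log p(\boldsymbol{x}|\boldsymbol{z})\right] - D_{KL}\left(q(\boldsymbol{z}|\boldsymbol{x}) \,\|\, p(\boldsymbol{z})\right) \\
&=: \mathcal{L}(\boldsymbol{x}; \boldsymbol{\theta})
\end{aligned}
$$
Model
z ~ p(z;η)
x ~ p(x|z;η)
Optimization
$$\underset{\boldsymbol{\eta}}{\text{maximize}} \ \frac{1}{N} \sum_{n=1}^{N} \log p(\boldsymbol{x}_n; \boldsymbol{\eta})$$
Inference Model
z ~ q(z|x;ν)
Optimization (ELBO)
$$\underset{\boldsymbol{\theta} = (\boldsymbol{\eta}, \boldsymbol{\nu})}{\text{maximize}} \ \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}(\boldsymbol{x}_n; \boldsymbol{\theta})$$
[Figure: D_KL(q ∥ p) between the approximate posteriors q(z|x; ν), q(z|x; ν*) and the true posteriors p(z|x; μ), p(z|x; μ*); the figure writes the model parameters η as μ]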
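The ELBO identity can be checked numerically on a toy model. The sketch below is an illustration, not part of the slides: it assumes a 1-D model p(z) = N(0, 1), p(x|z) = N(z, 1), for which p(x) = N(0, 2) and the true posterior is N(x/2, 1/2), and a Gaussian q(z|x) = N(m, s2). The function names are hypothetical.

```python
import math

def log_marginal(x):
    # log p(x) for p(z) = N(0, 1), p(x|z) = N(z, 1), which gives p(x) = N(0, 2)
    return -0.5 * math.log(2 * math.pi * 2.0) - x * x / 4.0

def elbo(x, m, s2):
    # E_q[log p(x|z)] for q(z|x) = N(m, s2), in closed form
    expected_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    # D_KL(N(m, s2) || N(0, 1)), in closed form
    kl = 0.5 * (s2 + m * m - 1.0 - math.log(s2))
    return expected_loglik - kl

x = 1.5
exact = elbo(x, m=x / 2, s2=0.5)  # q equals the true posterior: the bound is tight
loose = elbo(x, m=0.0, s2=1.0)    # q equals the prior: the bound is strictly lower
print(log_marginal(x), exact, loose)
```

When q matches the true posterior, D_KL(q ∥ p(z|x)) vanishes and the ELBO equals log p(x); any other q gives a smaller value, which is why a more flexible q(z|x) matters.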
Requirements for the inference model q(z|x)
Computational Tractability
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Parallel computation
Accuracy
4. Sufficiently flexible to match the true posterior p(z|x)
Previous Designs of q(z|x)
Basic Designs
- Diagonal Gaussian Distribution
- Full Covariance Gaussian Distribution
Designs based on Change of Variables
- NICE
L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
- Normalizing Flow
D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
Designs based on Adding Auxiliary Variables
- Hamiltonian Flow / Hamiltonian Variational Inference
T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014
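Diagonal/Full Covariance Gaussian Distribution
Diagonal: Efficient but not flexible
$$q(\boldsymbol{z}|\boldsymbol{x}) = \prod_i N\left(z_i \mid \mu_i(\boldsymbol{x}), \sigma_i(\boldsymbol{x})\right)$$
Full Covariance: Not efficient and not flexible (unimodal)
$$q(\boldsymbol{z}|\boldsymbol{x}) = N\left(\boldsymbol{z} \mid \boldsymbol{\mu}(\boldsymbol{x}), \boldsymbol{\Sigma}(\boldsymbol{x})\right)$$
(✓ / ✗: Diagonal / Full Covariance)
1. Computationally cheap to compute and differentiate ✓ / ✗
2. Computationally cheap to sample from ✓ / ✗
3. Parallel computation ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗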
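A minimal sketch of why the diagonal Gaussian is cheap: sampling and the log-density are both independent per dimension. This is an illustration under assumptions, not code from the slides: sigma is taken to be the standard deviation, the values of mu and sigma stand in for encoder outputs μ(x), σ(x), and the function names are hypothetical.

```python
import math
import random

def sample_diag_gaussian(mu, sigma, rng):
    # Reparameterized sample: z_i = mu_i + sigma_i * eps_i with eps_i ~ N(0, 1)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def log_q_diag(z, mu, sigma):
    # log q(z|x) = sum_i log N(z_i | mu_i, sigma_i^2); each term is independent
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((zi - m) / s) ** 2
        for zi, m, s in zip(z, mu, sigma)
    )

rng = random.Random(0)
mu, sigma = [0.0, 1.0], [1.0, 2.0]  # stand-ins for mu(x), sigma(x)
z = sample_diag_gaussian(mu, sigma, rng)
log_q_at_mean = log_q_diag(mu, mu, sigma)
print(z, log_q_diag(z, mu, sigma))
```

A full covariance Gaussian would instead need a matrix factorization of Σ(x) for both sampling and the log-density, which is where the extra cost comes from.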
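Change of Variables based methods
Transform $q(\boldsymbol{z}_0|\boldsymbol{x})$ into a more powerful distribution $q(\boldsymbol{z}|\boldsymbol{x})$ via sequential application of changes of variables
$$\boldsymbol{z}_t = f_t(\boldsymbol{z}_{t-1})$$
$$q(\boldsymbol{z}_t|\boldsymbol{x}) = q(\boldsymbol{z}_{t-1}|\boldsymbol{x}) \left|\det \frac{d f_t(\boldsymbol{z}_{t-1})}{d \boldsymbol{z}_{t-1}}\right|^{-1}$$
$$\Rightarrow \log q(\boldsymbol{z}_T|\boldsymbol{x}) = \log q(\boldsymbol{z}_0|\boldsymbol{x}) - \sum_t \log \left|\det \frac{d f_t(\boldsymbol{z}_{t-1})}{d \boldsymbol{z}_{t-1}}\right|$$
• NICE
L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
• Normalizing Flow
D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015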
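The log-density bookkeeping in the change-of-variables formula can be sketched with the simplest invertible maps. The example below is an assumption-laden illustration: it uses 1-D affine steps z_t = a_t z_{t-1} + b_t (not a flow from the slides) and a standard normal q(z_0), because the resulting density is known in closed form and can be used as a check.

```python
import math

def log_std_normal(z):
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z

def flow_log_density(z0, steps):
    # steps: list of (a_t, b_t) for the invertible map z_t = a_t * z_{t-1} + b_t
    z, log_q = z0, log_std_normal(z0)
    for a, b in steps:
        z = a * z + b
        log_q -= math.log(abs(a))  # subtract log |det d f_t / d z_{t-1}|
    return z, log_q

steps = [(2.0, 1.0), (-0.5, 3.0), (4.0, 0.0)]
zT, log_q = flow_log_density(0.3, steps)

# Closed form: composing affine maps of N(0, 1) gives N(shift, scale^2)
scale, shift = 1.0, 0.0
for a, b in steps:
    scale, shift = a * scale, a * shift + b
exact = (-0.5 * math.log(2 * math.pi) - math.log(abs(scale))
         - 0.5 * ((zT - shift) / scale) ** 2)
print(log_q, exact)
```

The same accumulation works for any flow as long as each step's log-determinant is cheap to compute, which is the design constraint the following slides compare.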
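Normalizing Flow
Transformation via
$$\boldsymbol{z}_t = \boldsymbol{z}_{t-1} + \boldsymbol{u}_t f_t\left(\boldsymbol{w}_t^{T} \boldsymbol{z}_{t-1} + b_t\right)$$
Key Features
- Determinants are computable
Drawbacks
- Information goes through a single bottleneck
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
[Figure: $\boldsymbol{z}_{t-1}$ is projected onto the single bottleneck $\boldsymbol{w}_t^{T} \boldsymbol{z}_{t-1} + b_t$, and $\boldsymbol{u}_t f_t(\cdot)$ is added (⊕) to give $\boldsymbol{z}_t$]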
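A sketch of one such step and its determinant, assuming the nonlinearity f_t is tanh (the slide does not fix it). For this rank-one update, the matrix determinant lemma gives det = 1 + u^T ψ with ψ = f_t'(w^T z + b) w. The names and parameter values below are hypothetical; the finite-difference check is only there to confirm the closed form.

```python
import math

def planar_step(z, u, w, b):
    a = sum(wi * zi for wi, zi in zip(w, z)) + b  # scalar bottleneck w^T z + b
    h = math.tanh(a)
    z_new = [zi + ui * h for zi, ui in zip(z, u)]
    # det(I + u psi^T) = 1 + u^T psi, with psi = (1 - tanh(a)^2) * w
    u_dot_w = sum(ui * wi for ui, wi in zip(u, w))
    log_det = math.log(abs(1.0 + (1.0 - h * h) * u_dot_w))
    return z_new, log_det

z, u, w, b = [0.5, -1.0], [0.3, 0.2], [1.0, -0.5], 0.1
z_new, log_det = planar_step(z, u, w, b)

# Finite-difference check of the 2x2 Jacobian determinant
eps = 1e-6
cols = []  # cols[j][i] approximates d z_new[i] / d z[j]
for j in range(2):
    zp, zm = list(z), list(z)
    zp[j] += eps
    zm[j] -= eps
    fp, _ = planar_step(zp, u, w, b)
    fm, _ = planar_step(zm, u, w, b)
    cols.append([(p - m) / (2 * eps) for p, m in zip(fp, fm)])
numeric_det = cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]
print(log_det, math.log(abs(numeric_det)))
```

Every coordinate of z is updated through the single scalar a, which is the bottleneck the slide refers to.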
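Hamiltonian Flow / Hamiltonian Variational Inference
ELBO with auxiliary variables y
$$\log p(\boldsymbol{x}) \ge \log p(\boldsymbol{x}) - D_{KL}\left(q(\boldsymbol{z}|\boldsymbol{x}) \,\|\, p(\boldsymbol{z}|\boldsymbol{x})\right) - D_{KL}\left(q(\boldsymbol{y}|\boldsymbol{x}, \boldsymbol{z}) \,\|\, r(\boldsymbol{y}|\boldsymbol{x}, \boldsymbol{z})\right) =: \mathcal{L}(\boldsymbol{x})$$
Drawing (y, z) via HMC
$$(y_t, z_t) \sim \mathrm{HMC}\left(y_t, z_t \mid y_{t-1}, z_{t-1}\right)$$
Key Features
- Capability to sample from exact posterior
Drawbacks
- Long mixing time and lower ELBO
1. Computationally cheap to compute and differentiate ✗
2. Computationally cheap to sample from ✗
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✓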
NICE
Transform only half of z at each step
$$\boldsymbol{z}_t = \left(\boldsymbol{z}_t^{\alpha}, \boldsymbol{z}_t^{\beta}\right) = \left(\boldsymbol{z}_{t-1}^{\alpha},\ \boldsymbol{z}_{t-1}^{\beta} + f_t\left(\boldsymbol{x}, \boldsymbol{z}_{t-1}^{\alpha}\right)\right)$$
Key Features
- Determinant of the Jacobian, $\det \frac{\partial \boldsymbol{z}_t}{\partial \boldsymbol{z}_{t-1}}$, is always 1
Drawbacks
- Limited form of transformation
- Less powerful than Normalizing Flow
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
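The unit determinant follows from the triangular structure of the coupling step, and can be sketched in two dimensions. This is an illustration only: the coupling function `f` below is an arbitrary hypothetical choice (NICE uses a neural network), x is a scalar stand-in for the conditioning input, and each half of z is a single coordinate.

```python
import math

def f(x, za):
    # Hypothetical coupling function; any function of (x, z^alpha) keeps det = 1
    return math.sin(3.0 * za) + x * za * za

def coupling_forward(z, x):
    za, zb = z
    return [za, zb + f(x, za)]  # z^alpha is copied, z^beta is shifted

def coupling_inverse(z, x):
    za, zb = z
    return [za, zb - f(x, za)]  # exact inverse without inverting f

def numeric_jacobian(fn, z, eps=1e-6):
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = fn(zp), fn(zm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

z, x = [0.7, -0.2], 0.5
z_new = coupling_forward(z, x)
z_back = coupling_inverse(z_new, x)
J = numeric_jacobian(lambda v: coupling_forward(v, x), z)
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(z_back, det)
```

The Jacobian is lower triangular with ones on the diagonal, so the density is only rearranged, never rescaled; this is the "limited form of transformation" noted above.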
Autoregressive Flow (proposed)
Autoregressive Flow ($d\mu_{t,i}/dz_{t,j} = d\sigma_{t,i}/dz_{t,j} = 0$ if $i \le j$)
$$z_{t,i} = \mu_{t,i}\left(\boldsymbol{z}_{t,0:i-1}\right) + \sigma_{t,i}\left(\boldsymbol{z}_{t,0:i-1}\right) \cdot z_{t-1,i}$$
Key Features
- Powerful
- Easy to compute $\det \partial \boldsymbol{z}_t / \partial \boldsymbol{z}_{t-1} = \prod_i \sigma_{t,i}\left(\boldsymbol{z}_{t,0:i-1}\right)$
Drawbacks
- Difficult to parallelize
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✓
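A minimal sketch of one AF step in three dimensions, showing why it is sequential. The parameterization is an assumption for illustration: μ and log σ are linear and tanh-bounded functions of the already computed outputs, with hypothetical strictly lower-triangular weights `A` and `B` standing in for a neural network.

```python
import math

# Hypothetical strictly lower-triangular weights: row i only reads z[0..i-1]
A = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [-0.3, 0.8, 0.0]]
B = [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.4, -0.6, 0.0]]

def af_step(z_prev):
    z, log_det = [], 0.0
    for i in range(len(z_prev)):  # sequential: z[i] needs the outputs z[0..i-1]
        mu = sum(A[i][j] * z[j] for j in range(i))
        log_sigma = math.tanh(sum(B[i][j] * z[j] for j in range(i)))
        z.append(mu + math.exp(log_sigma) * z_prev[i])
        log_det += log_sigma  # log det = sum_i log sigma_{t,i}
    return z, log_det

def numeric_jacobian(fn, z, eps=1e-6):
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = fn(zp), fn(zm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

z_prev = [0.4, -1.2, 0.9]
z_t, log_det = af_step(z_prev)
J = numeric_jacobian(lambda v: af_step(v)[0], z_prev)
diag_product = J[0][0] * J[1][1] * J[2][2]
print(math.exp(log_det), diag_product)
```

The Jacobian is lower triangular, so its determinant is the product of the diagonal entries σ_{t,i}, but the loop over i cannot be run in parallel because each output feeds the next.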
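Inverse Autoregressive Flow (proposed)
Inverting AF ($\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ are also autoregressive)
$$\boldsymbol{z}_t = \frac{\boldsymbol{z}_{t-1} - \boldsymbol{\mu}_t(\boldsymbol{z}_{t-1})}{\boldsymbol{\sigma}_t(\boldsymbol{z}_{t-1})}$$
Key Features
- As powerful as AF
- Easy to compute $\det \partial \boldsymbol{z}_t / \partial \boldsymbol{z}_{t-1} = 1 / \prod_i \sigma_{t,i}(\boldsymbol{z}_{t-1})$
- Parallelizable
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✓
4. Sufficiently flexible to match the true posterior p(z|x) ✓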
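A sketch of one IAF step, under the same illustrative assumptions as a toy autoregressive model: μ and log σ are computed from hypothetical strictly lower-triangular weights `A` and `B` rather than a neural network. Because μ and σ read only the input z_{t-1}, every output coordinate can be computed independently; the `af_step` helper is included only to confirm that the step inverts the corresponding AF.

```python
import math

# Hypothetical strictly lower-triangular weights: row i only reads y[0..i-1]
A = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [-0.3, 0.8, 0.0]]
B = [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.4, -0.6, 0.0]]

def iaf_step(y):
    n = len(y)
    # mu_i and sigma_i depend on the *input* y[0..i-1], so all i are independent
    mu = [sum(A[i][j] * y[j] for j in range(i)) for i in range(n)]
    log_sigma = [math.tanh(sum(B[i][j] * y[j] for j in range(i))) for i in range(n)]
    z = [(y[i] - mu[i]) / math.exp(log_sigma[i]) for i in range(n)]
    return z, -sum(log_sigma)  # log det = -sum_i log sigma_{t,i}

def af_step(y):
    # The AF with the same mu and sigma; it must be evaluated sequentially
    z = []
    for i in range(len(y)):
        mu = sum(A[i][j] * z[j] for j in range(i))
        sigma = math.exp(math.tanh(sum(B[i][j] * z[j] for j in range(i))))
        z.append(mu + sigma * y[i])
    return z

y = [0.4, -1.2, 0.9]
z, log_det = iaf_step(y)
round_trip = af_step(z)  # recovers y, since IAF is the inverse map of AF
print(z, log_det, round_trip)
```

The list comprehensions in `iaf_step` have no dependency between indices, which is the property that lets a real implementation compute all dimensions in one parallel network pass.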
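IAF through Masked Autoencoder (MADE)
Modeling autoregressive $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ with MADE
• MADE removes the paths from future inputs in an autoencoder by introducing masks
• MADE is a probabilistic model
$$p(\boldsymbol{x}) = \prod_i p\left(x_i \mid \boldsymbol{x}_{0:i-1}\right)$$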
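The masking idea can be sketched for a network with one hidden layer. The construction below follows the degree-based rule from the MADE paper as I understand it, with assumed sizes D = 4 and H = 6 and 1-based degrees (the slide indexes from 0); it only builds the masks and checks connectivity, without any weights or training.

```python
D, H = 4, 6  # assumed input and hidden sizes, for illustration only

in_deg = list(range(1, D + 1))                   # input d has degree d
hid_deg = [1 + (k % (D - 1)) for k in range(H)]  # hidden degrees cycle over 1..D-1

# Hidden unit k may read input d only if hid_deg[k] >= in_deg[d]
mask_in = [[1 if hid_deg[k] >= in_deg[d] else 0 for d in range(D)]
           for k in range(H)]
# Output d may read hidden unit k only if in_deg[d] > hid_deg[k]
mask_out = [[1 if in_deg[d] > hid_deg[k] else 0 for k in range(H)]
            for d in range(D)]

# connectivity[i][j] counts the unmasked paths from input j to output i
connectivity = [[sum(mask_out[i][k] * mask_in[k][j] for k in range(H))
                 for j in range(D)]
                for i in range(D)]
for row in connectivity:
    print(row)
```

The connectivity matrix is strictly lower triangular, so output i depends only on earlier inputs; multiplying the weight matrices elementwise by these masks gives autoregressive μ and σ in a single forward pass.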
Experiments
IAF is evaluated on generative models of images
Models for MNIST
- Convolutional VAE with ResNet blocks
- IAF = 2-layer MADE
- IAF transformations are stacked, with the ordering reversed at alternate steps
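Stacking with a reversed ordering can be sketched in two dimensions. This is a toy illustration, not the experimental setup: each step uses a hypothetical pair of scalars (a, b) in place of a MADE network, and the vector is reversed before every step after the first. A reversal is a permutation, so it contributes nothing to log |det|.

```python
import math

def iaf_step(y, a, b):
    # 2-D IAF step: dimension 0 has no inputs, dimension 1 is conditioned on y[0]
    mu = [0.0, a * y[0]]
    log_sigma = [0.0, math.tanh(b * y[0])]
    z = [(y[i] - mu[i]) / math.exp(log_sigma[i]) for i in range(2)]
    return z, -sum(log_sigma)

def stacked_iaf(z0, params):
    z, total_log_det = list(z0), 0.0
    for t, (a, b) in enumerate(params):
        if t > 0:
            z = z[::-1]  # reverse the ordering; a permutation has |det| = 1
        z, log_det = iaf_step(z, a, b)
        total_log_det += log_det
    return z, total_log_det

params = [(0.7, 0.5), (-0.4, 1.1), (0.2, -0.8)]  # hypothetical step parameters
z0 = [0.3, -0.6]
zT, total_log_det = stacked_iaf(z0, params)

# Finite-difference check of log |det| for the whole stack
eps = 1e-6
cols = []  # cols[j][i] approximates d zT[i] / d z0[j]
for j in range(2):
    zp, zm = list(z0), list(z0)
    zp[j] += eps
    zm[j] -= eps
    fp, fm = stacked_iaf(zp, params)[0], stacked_iaf(zm, params)[0]
    cols.append([(p - m) / (2 * eps) for p, m in zip(fp, fm)])
numeric_det = cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]
print(total_log_det, math.log(abs(numeric_det)))
```

Without the reversal, the first coordinate would never be conditioned on the second; alternating the ordering lets every coordinate depend on every other after two steps.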
Models for CIFAR-10 (very complicated)
MNIST
CIFAR-10
IAF in 1 slide
[Figure: D_KL(q ∥ p) between the true posteriors p(z|x; μ), p(z|x; μ*) and the flow distributions q(z_0|x; ν_0), q(z_t|x; ν_t), q(z_T|x; ν_T), q(z|x; ν_T*); panels labeled "Autoregressive Flow" and "Inverse Autoregressive Flow"]
IAF is
✓ Easy to compute and differentiate
✓ Easy to sample from
✓ Parallelizable
✓ Flexible
We are hiring!
http://www.abeja.asia/
https://www.wantedly.com/companies/abeja
