Tutorial on
Variational Autoencoder
SKKU Data Mining Lab
Hojin Yang
Index
Autoencoder
Variational Autoencoder
Autoencoder
The output is the same as the input.
Focus on the middle layer (the encoding value).
Autoencoder = self-encoder = a neural net that produces a code (the encoding value) that represents its own input well.
Reducing the dimensionality of data with a neural net.
$$L(W, b, W', b') = \sum_{n=1}^{N} \lVert x_n - \hat{x}(x_n) \rVert^2$$
If the shape of $W$ is $[n, m]$, then the shape of $W'$ is $[m, n]$.
Generally, $W'$ is not $W^T$, but weight sharing is also possible (to reduce the number of parameters).
[Figure: encoder/decoder network with hidden units $y_1, \dots, y_n$]
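As a concrete illustration, here is a minimal autoencoder sketch (an assumption of this write-up, not code from the slides): 784-dimensional flattened inputs, a 32-dimensional code, and untied encoder/decoder weights.

```python
# Minimal autoencoder sketch (assumed sizes: 784-dim inputs, 32-dim code; untied W and W').
import torch
import torch.nn as nn

encoder = nn.Linear(784, 32)   # W : maps the input x to the code y (the middle layer)
decoder = nn.Linear(32, 784)   # W': maps the code y back to a reconstruction x_hat
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(128, 784)       # stand-in for a batch of real data
for step in range(100):
    y = torch.sigmoid(encoder(x))                  # encoding value
    x_hat = decoder(y)                             # reconstruction
    loss = ((x - x_hat) ** 2).sum(dim=1).mean()    # sum_n ||x_n - x_hat(x_n)||^2, averaged
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Tying the weights (using W' = Wᵀ) would halve the parameter count, as noted above.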
Autoencoder
Denoising Autoencoder: train on a noisy version of the input, but reconstruct the original (clean) input.
[Figure: original image, noisy input, and denoised reconstruction]
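A denoising autoencoder changes only the input side of this recipe: corrupt x before encoding, but still reconstruct the clean x. A hedged sketch of that corruption step (additive Gaussian noise with an arbitrary level of 0.3):

```python
import torch

def corrupt(x: torch.Tensor, noise_level: float = 0.3) -> torch.Tensor:
    """Additive Gaussian corruption for a denoising autoencoder (noise level is arbitrary)."""
    return x + noise_level * torch.randn_like(x)

# Training then differs from the plain autoencoder in one line:
#   x_hat = decoder(encoder(corrupt(x)))        # encode the NOISY input
#   loss  = ((x - x_hat) ** 2).sum(1).mean()    # ...but reconstruct the CLEAN x
```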
VAE
How should we interpret the equations?
How should they be connected to the neural net (autoencoder)?
Intro
VAE
The ultimate goal of statistical learning is learning an underlying distribution p(x) from finite data.
Example: the 28-by-28 MNIST data set, where each x is a 28-by-28 image.
Assume that the frequency of the digits is 0 > 1 > 2 > … > 6 > … > 9.
motivation
[Figure: a sample image x₁] The probability density p(x₁) is very high.
VAE
motivation
[Figure: a sample image x₂] The probability density p(x₂) is relatively low.
VAE
motivation
[Figure: a sample image x₃] The probability density p(x₃) is almost zero.
VAE
motivation
If you know p(x), you can sample data from it: generate x ~ p(x), where x is a 28-by-28 image.
[Figure: samples drawn with high probability density vs. with extremely low probability density]
Then, how can we learn such a distribution from data?
VAE
motivation
Set a parametric model P_θ(x), then find θ, possibly with maximum likelihood or maximum a posteriori estimation.
For example, if the parametric model is a Gaussian distribution, find μ and σ.
VAE
Explicit density model
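As a tiny illustration of that Gaussian case (my own example, not from the slides), fitting P_θ(x) to one-dimensional data by maximum likelihood just means taking the sample mean and standard deviation:

```python
# Fit a Gaussian P_theta(x) to toy 1-D data by maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)   # synthetic "dataset"

mu_mle = data.mean()        # MLE of mu
sigma_mle = data.std()      # MLE of sigma
print(mu_mle, sigma_mle)    # close to (3.0, 2.0)
```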
VAE
Explicit density model
p(x) exists as a fixed quantity determined by the underlying data, but we do not know it (an ideal, fixed value).
We posit P_θ(x) and estimate p(x) by raising the probability of the x values that appear in the dataset, i.e. argmax_θ P_θ(x).
[Figure: a neural net takes the input x and outputs f_θ(x); f_θ(x) is the parameter against which the data y is scored, giving the density p(y | f_θ1(x))]
maximize p_θ(y | x), i.e. p(y | f_θ(x))
⇔ minimize −log p(y | f_θ(x))
VAE
Explicit density model
Assume that p_θ(y | x) follows the normal distribution N(f_θ(x), 1).
Find the f_θ(x) that maximizes p_θ(y | x) using a neural net.
In practice this is exactly what the neural net does: its output is the parameter of a distribution.
From the normal distribution N(f_θ(x), 1) we can read off the probability density of the data y, p(y | f_θ(x)), and we update the network in the direction that maximizes it.
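A minimal sketch of this idea, assuming a toy 1-D regression problem (the data, layer sizes, and learning rate are my own assumptions): because the network output f_θ(x) is the mean of N(f_θ(x), 1), maximizing log p(y | f_θ(x)) is exactly minimizing squared error.

```python
# The network outputs f_theta(x), read as the mean of N(f_theta(x), 1).
# -log N(y; f, 1) = 0.5 * (y - f)^2 + const, so maximum likelihood = minimum squared error.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(-2, 2, 256).unsqueeze(1)
y = 2.0 * x + 0.3 * torch.randn_like(x)        # toy data

for step in range(300):
    f = net(x)                                  # f_theta(x)
    loss = 0.5 * ((y - f) ** 2).mean()          # = E[-log p(y | f_theta(x))] + const
    opt.zero_grad()
    loss.backward()
    opt.step()
```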
[Figure (build slides): with successive updates the predicted parameter moves toward y, so p(y | f_θ1(x)) < p(y | f_θ2(x)); eventually f_θ2(x) = f_θ3(x), i.e. further updates no longer change the output]
VAE
Explicit density model
[Two slides reproduced from Autoencoder, 이활석: https://mega.nz/#!tBo3zAKR!yE6tZ0g-GyUyizDf7uglDk2_ahP-zj5trVZSLW3GAjw]
From now on, we want to learn p(x) from data.
There are two ways to approximate p(x) with a parametric model.
[Figure: histogram of the data (x vs. frequency)]
VAE
Explicit density model
First way: set a parametric model P_θ(x) directly, then find θ.
For example, assume P(x) is Gaussian and find the parameters θ = (μ, σ) by MLE.
[Figure: the fitted density p(x) over x]
VAE
Explicit density model
Second way: introduce a new latent variable z ~ P_φ(z), set a parametric model P_θ(x|z), then find φ and θ.
[Figure: the resulting density p(x) over x]
VAE
Explicit density model
Again: introduce a latent variable z ~ P_φ(z), set a parametric model P_θ(x|z), then find φ and θ, e.g. maximize ln P(x | φ, θ) using the EM algorithm.
The log-likelihood of the dataset is
$$\ln P(X \mid \phi, \theta) = \sum_{n=1}^{N} \ln \sum_{z=0}^{1} P_\phi(z)\, P_\theta(x_n \mid z)$$
Here z ~ P_φ(z) is assumed to be Bernoulli, with parameter φ = the probability of z being true; P_θ(x|z) is assumed to be normal with parameters θ = (μ, σ), where μ = f(z) and σ = g(z).
[Figure: the resulting mixture density p(x) over x]
VAE
Explicit density model
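A small numerical sketch of that log-likelihood for the two-component case described above (Bernoulli z, Gaussian P_θ(x|z); all values here are made up for illustration):

```python
# ln P(X | phi, theta) = sum_n ln sum_{z in {0,1}} P_phi(z) * N(x_n; mu_z, sigma_z)
import numpy as np
from scipy.stats import norm

x = np.array([0.1, 0.3, 2.8, 3.1, 2.9])          # toy 1-D dataset
phi = 0.4                                         # P(z = 1)
mu = {0: 0.0, 1: 3.0}                             # mu = f(z)
sigma = {0: 1.0, 1: 0.5}                          # sigma = g(z)

log_lik = np.sum(np.log(
    (1 - phi) * norm.pdf(x, mu[0], sigma[0]) + phi * norm.pdf(x, mu[1], sigma[1])
))
print(log_lik)
```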
We aim to maximize the probability of each x in the dataset, according to p(x) = ∫ p_θ(x|z) p(z) dz.
From now on, to get p(x), instead of putting parameters on p_θ(x) directly,
let's use a latent variable z that follows the standard normal distribution N(0, I); z can encode factors such as stroke angle, stroke thickness, digit identity, and so on, while x is the 28-by-28 image.
Let's assume that p_θ(x|z), i.e. p(x | g_θ(z)), is Gaussian with N(g_θ(z), I).
The p(x) we derive this way should be ≈ the true p(x).
VAE
Explicit density model
$$\int f(z)\, p(z)\, dz = E_{z \sim p(z)}[f(z)], \qquad \int z \cdot p(z)\, dz = E_{z \sim p(z)}[z]$$
VAE
preliminary
Monte Carlo approximation:
$$\int f(z)\, p(z)\, dz = E_{z \sim p(z)}[f(z)] \approx \frac{1}{N} \sum_{i=1}^{N} f(z_i), \quad z_i \sim p(z)$$
1. Sample z_i from p(z), then compute f(z_i).
2. Repeat step 1 many times and take the average.
VAE
preliminary
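A quick numerical check of the Monte Carlo approximation (assuming the toy choice f(z) = z² and p(z) = N(0, 1), for which the true expectation is 1):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)     # 1. sample z_i ~ p(z) = N(0, 1)
estimate = np.mean(z ** 2)           # 2. average f(z_i) over the samples
print(estimate)                      # ~1.0, the true E[z^2]
```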
And log is concave, so by Jensen's inequality log E[f(z)] ≥ E[log f(z)].
VAE
preliminary
Let's use a latent variable z that follows the standard normal distribution N(0, I).
Let's assume that p_θ(x|z), i.e. p(x | g_θ(z)), is Gaussian with N(g_θ(z), I).
VAE
Naïve attempt
$$L(x_i) = \log p(x_i) \approx \log \int p_\theta(x_i \mid z)\, p(z)\, dz = \log \int p(x_i \mid g_\theta(z))\, p(z)\, dz$$
VAE
Naïve attempt
$$L(x_i) = \log p(x_i) \approx \log \int p(x_i \mid g_\theta(z))\, p(z)\, dz = \log E_{z \sim p(z)}[\,p(x_i \mid g_\theta(z))\,] \ge E_{z \sim p(z)}[\log p(x_i \mid g_\theta(z))] \approx \frac{1}{M} \sum_{j=1}^{M} \log p(x_i \mid g_\theta(z_j)), \quad z_j \sim p(z)$$
1. Sample z_j from the standard normal distribution p(z).
2. Run gradient descent so that the network output g_θ(z_j) moves closer to the actual x_i.
3. Repeat the above over all i and j.
VAE
Naïve attempt
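A hedged sketch of this naive procedure (assumed: a 2-D latent, a small fully connected decoder g_θ, and flattened 784-dimensional images; since p(x_i | g_θ(z_j)) is N(g_θ(z_j), I), its log-density is a squared distance up to a constant):

```python
# Naive attempt: sample z_j from the prior and pull g_theta(z_j) toward each x_i.
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.rand(64, 784)                     # a batch of data points x_i (stand-in)
M = 8                                       # z samples per data point

for step in range(200):
    z = torch.randn(x.size(0), M, 2)        # z_j ~ p(z) = N(0, I)
    g = decoder(z)                          # g_theta(z_j), shape (batch, M, 784)
    # -log N(x_i; g_theta(z_j), I) = 0.5 * ||x_i - g_theta(z_j)||^2 + const
    loss = 0.5 * ((x.unsqueeze(1) - g) ** 2).sum(-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```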
[Figure: the prior p(z) over the latent space, and the data points x₁, x₂, …, x₁₀, …, x_i]
VAE
Naïve attempt
[Figure: sample z₁,₁ = (0.2, 0.1) from p(z), decode it, and calculate the distance to x₁]
VAE
Naïve attempt
$$\frac{1}{M} \sum_{j=1}^{M} \log p(x_i \mid g_\theta(z_j)), \quad z_j \sim p(z)$$
[Figure: sample z₁,₂ = (0.4, 1.1) from p(z) and calculate the distance to x₁]
VAE
Naïve attempt
[Figure: sample z₁,ⱼ = (1.1, 0.1) from p(z) and calculate the distance to x₁]
VAE
Naïve attempt
[Figure: sample z₂,₁ = (0.1, 0.5) from p(z) and calculate the distance to x₂]
VAE
Naïve attempt
[Figure: decoded samples g_θ(z) for latent points labeled 1, 2, 3, 4 lining up (≈) with the data x₁, x₂, …, x₁₀, …, x_i]
VAE
Naïve attempt
If training has gone ideally…
VAE
Naïve attempt
[Slide from "VAE 여러 각도에서 이해하기" (Understanding VAE from various perspectives), 윤상웅: https://www.slideshare.net/haezoom/variational-autoencoder-understanding-variational-autoencoder-from-various-perspectives]
Recall the naive estimator:
$$E_{z \sim p(z)}[\log p(x_i \mid g_\theta(z))] \approx \frac{1}{M} \sum_{j=1}^{M} \log p(x_i \mid g_\theta(z_j)), \quad z_j \sim p(z)$$
1. Sample z_j from the standard normal distribution p(z).
2. Run gradient descent so that g_θ(z_j) moves closer to the actual x_i.
3. Repeat over all i and j.
VAE
variational distribution
Let's use E_{z~p(z|x_i)}[log p(x_i | g_θ(z))] instead of E_{z~p(z)}[log p(x_i | g_θ(z))]: we now get "differentiating" samples that are actually relevant to x_i.
That is, given x_i, sample z from p(z|x_i), compute g_θ(z), and then compare x_i with g_θ(z).
However, p(z|x_i) is intractable (we cannot calculate it).
Therefore, we go variational: we approximate the posterior p(z|x_i).
VAE
variational distribution
Since we will never know the posterior p(z|x_i), we approximate it with a variational distribution q_φ(z|x_i):
$$L(x_i) \approx E_{z \sim p(z)}[\log p(x_i \mid g_\theta(z))]$$
$$L(x_i) \approx E_{z \sim p(z \mid x_i)}[\log p(x_i \mid g_\theta(z))]$$
$$L(x_i) \approx E_{z \sim q_\phi(z \mid x_i)}[\log p(x_i \mid g_\theta(z))]$$
With a sufficiently good q_φ(z|x_i), we will get better gradients.
VAE
variational distribution
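A minimal sketch of one common choice for q_φ(z|x_i) (an assumption here, consistent with the encoder shown later): a diagonal Gaussian whose mean and log-variance come from a small encoder network.

```python
# q_phi(z | x_i): an encoder outputs the mean and log-variance of a diagonal Gaussian,
# and z is sampled from it instead of from the prior p(z).
import torch
import torch.nn as nn

latent_dim = 2
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 2 * latent_dim))

x_i = torch.rand(1, 784)                       # one data point (stand-in)
mu, logvar = encoder(x_i).chunk(2, dim=-1)     # parameters of q_phi(z | x_i)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # z ~ q_phi(z | x_i)
```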
VAE
variational distribution
[Slide from "VAE 여러 각도에서 이해하기" (Understanding VAE from various perspectives), 윤상웅: https://www.slideshare.net/haezoom/variational-autoencoder-understanding-variational-autoencoder-from-various-perspectives]
Now we have two problems:
1. Get a q_φ(z|x_i) that is similar to p(z|x_i).
2. Maximize E_{z~q_φ(z|x_i)}[log p(x_i | g_θ(z))].
Then the following questions arise:
1. How do we calculate the distance between q_φ(z|x_i) and p(z|x_i)?
2. How much does E_{z~q_φ(z|x_i)}[log p(x_i | g_θ(z))] deviate from the marginal likelihood p(x_i)?
VAE
deriving ELBO
$$D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z \mid x)\big) = \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p(z \mid x)}\, dz = \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x) \cdot p(x)}{p(z, x)}\, dz = \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x) \cdot p(x)}{p(x \mid z) \cdot p(z)}\, dz$$
VAE
deriving ELBO
$$D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z \mid x)\big) = \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p(z \mid x)}\, dz = \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x) \cdot p(x)}{p(z, x)}\, dz = \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x) \cdot p(x)}{p_\theta(x \mid z) \cdot p(z)}\, dz$$
$$= \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p(z)}\, dz + \int q_\phi(z \mid x) \log p(x)\, dz - \int q_\phi(z \mid x) \log p_\theta(x \mid z)\, dz$$
$$= \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p(z)}\, dz + \log p(x) \int q_\phi(z \mid x)\, dz - \int q_\phi(z \mid x) \log p_\theta(x \mid z)\, dz$$
$$= \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p(z)}\, dz + \log p(x) - \int q_\phi(z \mid x) \log p_\theta(x \mid z)\, dz$$
$$= D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) + \log p(x) - E_{z \sim q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$$
VAE
deriving ELBO
$$D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z \mid x)\big) = D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) + \log p(x) - E_{z \sim q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$$
VAE
deriving ELBO
$$D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z \mid x)\big) = D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) + \log p(x) - E_{z \sim q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$$
$$\log p(x) = D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z \mid x)\big) + E_{z \sim q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$
$$ELBO = E_{z \sim q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$
VAE
deriving ELBO
$$\log p(x) = D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z \mid x)\big) + \underbrace{E_{z \sim q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{ELBO}$$
log p(x) is fixed, so it always splits into the ELBO plus D_KL(q_φ(z|x) || p(z|x)).
Recall that we have two problems:
1. Get a q_φ(z|x) that is similar to p(z|x).
2. Maximize E_{q_φ(z|x)}[log p_θ(x|z)].
If we maximize the ELBO we solve both: since log p(x) is fixed, increasing the ELBO forces D_KL(q_φ(z|x) || p(z|x)) down (problem 1), and the ELBO's first term is exactly the expectation we want to maximize (problem 2).
VAE
deriving ELBO
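For the usual choice q_φ(z|x) = N(μ, diag(σ²)) with prior p(z) = N(0, I) (the setting assumed in this tutorial), the KL term of the ELBO has a closed form, D_KL = −½ Σ (1 + log σ² − μ² − σ²). A small sketch with arbitrary numbers:

```python
# Closed-form KL(q_phi(z|x) || p(z)) for a diagonal Gaussian q and standard normal prior.
import torch

mu = torch.tensor([0.5, -1.0])          # example parameters of q_phi(z|x) (arbitrary)
logvar = torch.tensor([0.1, -0.3])      # log sigma^2

kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
print(kl)                                # the KL term subtracted in the ELBO
```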
VAE
deriving ELBO
[Slide from Autoencoder, 이활석: https://mega.nz/#!tBo3zAKR!yE6tZ0g-GyUyizDf7uglDk2_ahP-zj5trVZSLW3GAjw]
VAE
neural net
[Slide from Autoencoder, 이활석: https://mega.nz/#!tBo3zAKR!yE6tZ0g-GyUyizDf7uglDk2_ahP-zj5trVZSLW3GAjw]
[Figure: X → Encoder → q(z|x) with parameters μ, σ → sample z from q(z|x) → Decoder → p(x|z), whose mean is μ = f(z)]
VAE
neural net
The reconstruction term is ‖X − f(z)‖².
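Putting the pieces together, a hedged end-to-end sketch of this architecture (assumptions: 784-dim inputs, 2-D latent, Gaussian decoder with identity covariance so the reconstruction term is ½‖X − f(z)‖², and the reparameterization z = μ + σ·ε so the sampling step stays differentiable; layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=2, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)                  # q_phi(z|x) = N(mu, diag(sigma^2))
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # reparameterization trick
        return self.dec(z), mu, logvar                             # f(z): mean of p(x|z)

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x = torch.rand(64, 784)                                            # stand-in batch

for step in range(200):
    x_hat, mu, logvar = vae(x)
    recon = 0.5 * ((x - x_hat) ** 2).sum(-1).mean()                # -E_q[log p(x|z)] up to a const
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)).mean()
    loss = recon + kl                                              # -ELBO (to be minimized)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Minimizing this loss is maximizing the ELBO derived above.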
VAE
neural net
[Further slides from Autoencoder, 이활석: https://mega.nz/#!tBo3zAKR!yE6tZ0g-GyUyizDf7uglDk2_ahP-zj5trVZSLW3GAjw]
References
Variational Inference
ratsgo's blog (Korean): https://ratsgo.github.io/generative%20model/2017/12/19/vi/
Variational Autoencoder
그림 그리는 AI, 이활석
PR12, 차준범
https://www.youtube.com/watch?v=RYRgX3WD178
https://www.youtube.com/watch?v=KYA-GEhObIs
Lecture by Ali Ghodsi: https://www.youtube.com/watch?v=uaaqyVS9-rM
ratsgo's blog (Korean): https://ratsgo.github.io/generative%20model/2018/01/27/VAE/
VAE 여러 각도에서 이해하기, 윤상웅*: https://www.slideshare.net/haezoom/variational-autoencoder-understanding-variational-autoencoder-from-various-perspectives
Autoencoder, 이활석*: https://mega.nz/#!tBo3zAKR!yE6tZ0g-GyUyizDf7uglDk2_ahP-zj5trVZSLW3GAjw
Some recommended items: http://fbsight.com/t/autoencoder-vae/132132/12