1. The Deep Feature Consistent Variational Autoencoder paper proposes replacing the pixel-wise reconstruction loss of a standard VAE, which tends to produce blurry images, with a perceptual loss that matches deep feature activations of a pretrained convolutional network between real and generated images.
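The feature-matching idea can be sketched as below. The paper extracts activations from a pretrained VGG network; here a small untrained conv stack stands in as the feature extractor so the sketch runs without downloading weights (the layer sizes are illustrative assumptions, not the paper's).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Stand-in for the pretrained network whose activations are matched."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.MaxPool2d(2),
                                    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return [h1, h2]  # activations from several depths

def perceptual_loss(extractor, real, generated):
    # Match deep feature activations layer by layer (mean squared error),
    # instead of comparing raw pixels.
    loss = 0.0
    for f_real, f_gen in zip(extractor(real), extractor(generated)):
        loss = loss + F.mse_loss(f_gen, f_real)
    return loss

x = torch.rand(2, 3, 32, 32)
y = torch.rand(2, 3, 32, 32)
ext = FeatureExtractor().eval()
print(perceptual_loss(ext, x, x).item())  # identical inputs -> 0.0
```

Because the comparison happens in feature space, the loss tolerates small pixel-level shifts that a pixel-wise loss would penalize, which is what reduces the characteristic VAE blur.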
2. The model consists of an encoder that outputs a mean and variance, a latent vector sampled via the reparameterization trick, and a decoder that reconstructs the image; it is trained with a KL divergence loss plus the perceptual loss on deep feature maps.
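A minimal sketch of that pipeline, assuming hypothetical layer sizes and fully connected encoder/decoder for brevity (the paper uses convolutional networks):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs [mu, logvar]
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        z = mu + eps * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)), summed over latent dims, averaged over batch
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()

x = torch.rand(4, 784)
recon, mu, logvar = VAE()(x)
kl = kl_divergence(mu, logvar)
# total loss = kl + perceptual loss between recon and x (feature matching)
```

The reparameterization trick keeps sampling differentiable: randomness enters only through `eps`, so gradients flow to `mu` and `logvar`.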
3. Experiments show the model generates sharper, higher-quality images from sampled latent vectors, supports semantic image manipulation through latent vector arithmetic, and yields a latent space useful for facial attribute prediction tasks.
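The vector-arithmetic manipulation works by estimating an attribute direction in latent space and adding it to an image's code. A sketch with random vectors standing in for encoder outputs (the data and the scaling factor are hypothetical):

```python
import torch

# Stand-in latents: in practice these come from encoding labeled face images.
z_with = torch.randn(100, 20)     # latents of images that have the attribute
z_without = torch.randn(100, 20)  # latents of images that lack it

# Attribute direction = difference of the two group means.
attr_vec = z_with.mean(dim=0) - z_without.mean(dim=0)

z = torch.randn(20)               # latent code of some face image
z_edited = z + 1.5 * attr_vec     # decoding z_edited should add the attribute
```

The same latent codes can be fed to a linear classifier for attribute prediction, which is how the latent space's usefulness is typically evaluated.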
Related topics: