Recent Progress on Utilizing Tag
Information with GANs
StarGAN & TD-GAN
Hao-Wen Dong
Dec 26, 2017
Research Center for IT Innovation, Academia Sinica
Table of contents
1. Introduction
2. StarGAN
3. TD-GAN
4. Conclusions & Discussions
1
Introduction
Main Idea
How to utilize tag information?
2
Straightforward Approach
Fig. 1: Girl Friend Factory∗: generating anime characters with
specific attributes by conditional GANs
∗https://hiroshiba.github.io/girl_friend_factory/index.html
3
Inspiring Viewpoints
StarGAN [1]
Key Assumption: images of different tags can be viewed
as different domains
Approach: multi-domain image-to-image translation
Paper:
Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun
Kim, and Jaegul Choo, “StarGAN: Unified Generative
Adversarial Networks for Multi-Domain Image-to-Image
Translation”, arXiv preprint arXiv:1711.09020, 2017.
4
Inspiring Viewpoints
TD-GAN [2]
Key Assumption: images and tags record the same object
from two different perspectives
Approach: enforce consistency between learned
disentangled representations for images and tags
Paper:
Chaoyue Wang, Chaohui Wang, Chang Xu, and Dacheng Tao,
“Tag Disentangled Generative Adversarial Networks for Object
Image Re-rendering”, in Proc. 36th Int. Joint Conf. on Artificial
Intelligence (IJCAI), 2017.
5
StarGAN
StarGAN - Motivation
Fig. 2: CycleGAN [3]: Cycle-Consistent Adversarial Networks
6
StarGAN - Motivation
Fig. 3: Comparison of cross-domain and multi-domain models
7
StarGAN - Qualitative Results
Fig. 4: Results on CelebA
8
StarGAN - Qualitative Results
Fig. 5: Results on CelebA via transferring knowledge learned
from RaFD
9
StarGAN - System Overview
Fig. 6: System overview
10
StarGAN - Formulation
G : (x, c) → y
D : x → (Dsrc(x), Dcls(x))
x : input image, c : domain label, y : output image
11
StarGAN - Objective Functions
Adversarial Loss
Ladv = Ex[log Dsrc(x)] + Ex,c[log(1 − Dsrc(G(x, c)))]
12
StarGAN - Objective Functions
Adversarial Loss
Ladv = Ex[log Dsrc(x)] + Ex,c[log(1 − Dsrc(G(x, c)))]
Domain Classification Loss
Lr_cls = Ex,c′ [− log Dcls(c′ | x)]   (real images)
Lf_cls = Ex,c [− log Dcls(c | G(x, c))]   (fake images)
12
StarGAN - Objective Functions
Adversarial Loss
Ladv = Ex[log Dsrc(x)] + Ex,c[log(1 − Dsrc(G(x, c)))]
Domain Classification Loss
Lr_cls = Ex,c′ [− log Dcls(c′ | x)]   (real images)
Lf_cls = Ex,c [− log Dcls(c | G(x, c))]   (fake images)
Reconstruction Loss
Lrec = Ex,c,c′ [ ||x − G(G(x, c), c′)||₁ ]
12
StarGAN - Objective Functions
Full Objective Functions
LD = −Ladv + λcls Lr_cls
LG = Ladv + λcls Lf_cls + λrec Lrec
13
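As a rough illustration of how these objectives could be computed for one batch, here is a minimal PyTorch-flavored sketch (my own, not the authors' code): the interfaces of G and D, the use of a binary-attribute classification loss, and the weights λcls = 1, λrec = 10 are assumptions.

import torch
import torch.nn.functional as F

def stargan_losses(G, D, x, c_org, c_trg, lambda_cls=1.0, lambda_rec=10.0):
    # Assumed interfaces: D(x) returns (src_logit, cls_logit); G(x, c) returns an image.
    # c_org / c_trg are float tensors of original / target domain labels.
    src_real, cls_real = D(x)
    x_fake = G(x, c_trg)
    src_fake, _ = D(x_fake.detach())

    # -Ladv for the discriminator (vanilla GAN form from the slides)
    loss_adv_d = -(torch.log(torch.sigmoid(src_real) + 1e-8).mean()
                   + torch.log(1 - torch.sigmoid(src_fake) + 1e-8).mean())

    # Lr_cls: classify real images into their original domains
    loss_cls_real = F.binary_cross_entropy_with_logits(cls_real, c_org)

    # Generator-side terms (gradients flow through G here)
    src_fake_g, cls_fake_g = D(x_fake)
    # G-dependent part of Ladv (the term E[log Dsrc(x)] is constant w.r.t. G)
    loss_adv_g = torch.log(1 - torch.sigmoid(src_fake_g) + 1e-8).mean()
    loss_cls_fake = F.binary_cross_entropy_with_logits(cls_fake_g, c_trg)

    # Lrec: translate back to the original domain and compare with the input
    x_rec = G(x_fake, c_org)
    loss_rec = F.l1_loss(x_rec, x)

    loss_d = loss_adv_d + lambda_cls * loss_cls_real
    loss_g = loss_adv_g + lambda_cls * loss_cls_fake + lambda_rec * loss_rec
    return loss_d, loss_g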
StarGAN - Qualitative Results
Fig. 7: Qualitative comparison among different models
14
StarGAN - Multiple Datasets
CelebA dataset
(binary) Black, Blond, Brown, Male, Young, etc.
RaFD dataset
(categorical) Angry, Fearful, Happy, Sad, Disgusted, etc.
15
StarGAN - Multiple Datasets
CelebA dataset
(binary) Black, Blond, Brown, Male, Young, etc.
RaFD dataset
(categorical) Angry, Fearful, Happy, Sad, Disgusted, etc.
Fig. 8: Label vectors and mask vector
15
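As a minimal sketch of the idea (hypothetical helper; the attribute counts and names are placeholders, not from the paper's code), the input label can be assembled by concatenating the CelebA label vector, the RaFD label vector, and a two-dimensional mask marking which dataset the current sample comes from; the unknown part is simply zero-filled.

import torch

def make_unified_label(celeba_labels=None, rafd_labels=None, n_celeba=5, n_rafd=8):
    # Build [CelebA labels | RaFD labels | mask]; exactly one label set is given.
    if celeba_labels is not None:
        c1 = celeba_labels                                    # (N, n_celeba) binary attributes
        c2 = torch.zeros(c1.size(0), n_rafd)                  # unknown RaFD labels
        mask = torch.tensor([[1.0, 0.0]]).repeat(c1.size(0), 1)
    else:
        c2 = rafd_labels                                      # (N, n_rafd) one-hot expression
        c1 = torch.zeros(c2.size(0), n_celeba)                # unknown CelebA labels
        mask = torch.tensor([[0.0, 1.0]]).repeat(c2.size(0), 1)
    return torch.cat([c1, c2, mask], dim=1)                   # fed to G together with the image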
StarGAN - Training on Multiple Datasets
Fig. 9: System overview - multiple datasets (I)
16
StarGAN - Training on Multiple Datasets
Fig. 9 (cont.): System overview - multiple datasets (II)
17
StarGAN - Qualitative Results
Fig. 10: Results on CelebA via transferring knowledge learned
from RaFD
18
StarGAN - Mask Vector
Fig. 11: Learned role of mask vector
19
StarGAN - Quantitative Experiments
Amazon Mechanical Turk (AMT) workers vote for the best generated images based on:
• perceptual realism
• quality of transfer in attribute(s)
• preservation of a figure's original identity
20
StarGAN - Quantitative Results
Method Hair color Gender Aged
DIAT 9.3% 31.4% 6.9%
CycleGAN 20.0% 16.6% 13.3%
IcGAN 4.5% 12.9% 9.2%
StarGAN 66.2% 39.1% 70.6%
Table 1: AMT perceptual evaluation (by votes)
21
StarGAN - Quantitative Results
Method H+G H+A G+A H+G+A
DIAT 20.4% 15.6% 18.7% 15.6%
CycleGAN 14.0% 12.0% 11.2% 11.9%
IcGAN 18.2% 10.9% 20.3% 20.3%
StarGAN 47.4% 61.5% 49.8% 52.2%
Table 1 (cont.): AMT perceptual evaluation (by votes)
22
TD-GAN
TD-GAN - Motivation & Goals
Key Assumption
the image and its tags record the same object from two
different perspectives, so they should share the same
disentangled representations
Goals
• to extract disentangled and interpretable
representations for both the image and its tags
• to explore the consistency between the image and
its tags by integrating the tag mapping net
23
TD-GAN - Qualitative Results
Fig. 12: Multi-factor transformation
24
TD-GAN - System Overview
Disentangling network R
x → R(x)
image → disentangled representations
25
TD-GAN - System Overview
Disentangling network R
x → R(x)
image → disentangled representations
Tag mapping net g
C → g(C)
tag code → disentangled representations
25
TD-GAN - System Overview
Generative network G
g(C) or R(x) → G(g(C)) or G(R(x))
disentangled representations → re-rendered image
26
TD-GAN - System Overview
Generative network G
g(C) or R(x) → G(g(C)) or G(R(x))
disentangled representations → re-rendered image
Discriminative network D
adversarial training with G and R
26
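Putting these components together, the two rendering paths, G(g(C)) and G(R(x)), can be sketched as follows (the function names and the optional edit hook are my own illustration, not the paper's code):

def render_from_tags(G, g, C):
    # tag code -> shared disentangled representation -> re-rendered image
    return G(g(C))

def rerender_image(G, R, x, edit=None):
    # image -> shared disentangled representation -> (optionally edited) -> image
    z = R(x)
    if edit is not None:
        z = edit(z)   # e.g. a function that replaces the viewpoint factor
    return G(z)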
TD-GAN - System Overview
Fig. 13: System overview
27
TD-GAN - Formulation
Let the tagged training dataset be
XL = {(x1, C1), . . . , (x|XL|, C|XL|)},
where the tag codes are
Ci = (ci^ide, ci^view, ci^exp, . . .)
and ci^ide, ci^view, ci^exp, . . . are one-hot encoding vectors.
Let the untagged training dataset be XU.
28
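For illustration, a tag code Ci can be built by concatenating the one-hot vectors of its factors; the sketch below is a hypothetical helper with made-up vocabulary sizes (they are not specified on this slide).

import torch
import torch.nn.functional as F

def make_tag_code(identity, view, expression, n_id=500, n_view=31, n_exp=6):
    # Ci = (ci^ide, ci^view, ci^exp): concatenation of one-hot encodings
    return torch.cat([
        F.one_hot(torch.tensor(identity), n_id).float(),
        F.one_hot(torch.tensor(view), n_view).float(),
        F.one_hot(torch.tensor(expression), n_exp).float(),
    ], dim=-1)

# e.g. make_tag_code(identity=3, view=10, expression=1) -> a vector of length 500 + 31 + 6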
TD-GAN - Objective Functions
Discrepancy between disentangled representations
f1(R, g) = (1/|XL|) ∑xi∈XL ||R(xi) − g(Ci)||₂²
29
TD-GAN - Objective Functions
Discrepancy between disentangled representations
f1(R, g) = (1/|XL|) ∑xi∈XL ||R(xi) − g(Ci)||₂²
Discrepancy between real and rendered images
f2(G, g) = (1/|XL|) ∑xi∈XL ||G(g(Ci)) − xi||₂²
29
TD-GAN - Objective Functions
Reconstruction Loss for untagged images
f̃1(G, R) = (1/|XU|) ∑xi∈XU ||G(R(xi)) − xi||₂²
Adversarial Loss
f3(R, G, D) = E[log D(x)] + E[log(1 − D(G(R(x))))]
30
TD-GAN - Objective Functions
Full Objective Functions
LR = min_R  λ1 f1(R, g∗) + λ3 f3(R, G∗, D∗)
Lg = min_g  λ1 f1(R∗, g) + λ2 f2(G∗, g)
LG = min_G  λ2 f2(G, g∗) + λ3 f3(R∗, G, D∗)
LD = max_D  λ3 f3(R∗, G∗, D)
(R∗, g∗, G∗, D∗ are fixed as the configurations
obtained from the previous iteration)
31
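To make the alternating optimization concrete, here is a minimal training-step sketch under assumed interfaces (PyTorch-style networks R, g, G, D, a dict of optimizers, D outputting probabilities, and placeholder weights λ1 = λ2 = λ3 = 1); F.mse_loss, which averages over elements, stands in for the averaged squared L2 distances. It illustrates the scheme above and is not the authors' implementation.

import torch
import torch.nn.functional as F

def tdgan_step(R, g, G, D, opts, x_tag, C_tag, x_untag, lams=(1.0, 1.0, 1.0)):
    # One alternating update over a tagged batch (x_tag, C_tag) and an untagged batch x_untag.
    lam1, lam2, lam3 = lams
    eps = 1e-8

    def f3(real, fake):
        # adversarial objective: E[log D(x)] + E[log(1 - D(G(R(x))))]
        return torch.log(D(real) + eps).mean() + torch.log(1 - D(fake) + eps).mean()

    # update R:  min_R  lam1*f1(R, g*) + lam3*f3(R, G*, D*)
    loss_R = (lam1 * F.mse_loss(R(x_tag), g(C_tag).detach())
              + lam3 * f3(x_tag, G(R(x_tag))))
    opts['R'].zero_grad()
    loss_R.backward()
    opts['R'].step()

    # update g:  min_g  lam1*f1(R*, g) + lam2*f2(G*, g)
    loss_g = (lam1 * F.mse_loss(R(x_tag).detach(), g(C_tag))
              + lam2 * F.mse_loss(G(g(C_tag)), x_tag))
    opts['g'].zero_grad()
    loss_g.backward()
    opts['g'].step()

    # update G:  min_G  lam2*f2(G, g*) + lam3*f3(R*, G, D*)
    # (the untagged reconstruction term f~1 is added here; where exactly it
    #  enters each update is my assumption, not stated on the slide)
    loss_G = (lam2 * F.mse_loss(G(g(C_tag).detach()), x_tag)
              + lam3 * f3(x_tag, G(R(x_tag).detach()))
              + F.mse_loss(G(R(x_untag).detach()), x_untag))
    opts['G'].zero_grad()
    loss_G.backward()
    opts['G'].step()

    # update D:  max_D  lam3*f3(R*, G*, D)  (minimize the negative)
    loss_D = -lam3 * f3(x_tag, G(R(x_tag)).detach())
    opts['D'].zero_grad()
    loss_D.backward()
    opts['D'].step()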
TD-GAN - Qualitative Results
Fig. 14: Novel view synthesis results on 3D-chairs dataset
32
TD-GAN - Semi-supervised Extension
Test three representative training settings:
• fully-500: all the 500 training models and their tags
• fully-100: only the first 100 models and their tags
• semi-(100,400): all the 500 training models but only
the tags of the first 100 models
33
TD-GAN - Qualitative Results
Fig. 15: Novel view synthesis results trained in three settings
34
TD-GAN - Quantitative Results
Fig. 16: MSE of novel view synthesis results trained in three
settings (using testing chair images under 0° as inputs)
35
TD-GAN - Qualitative Results
Fig. 17: Illumination transformation of human face results on
Multi-PIE dataset
36
TD-GAN - Quantitative Results
XCov TD-GAN TD-GAN
Flash-1 0.5776 0.5623 0.5280
Flash-4 5.0692 0.8972 0.8818
Flash-6 4.3991 1.2509 0.1079
Flash-9 3.4639 0.6145 0.5870
Flash-12 2.4624 0.7142 0.6973
All Flash (mean) 3.8675 0.6966 0.6667
Table 2: MSE (×10−2) of illumination transformation results
37
TD-GAN - Possible Applications
• virtual reality systems
e.g. naturally ‘pasting’ a person into a virtual
environment by re-rendering their face under
continuously varying pose, illumination direction,
and expression
• architecture
• simulators
• video games
• movies
• visual effects
38
Conclusions & Discussions
Conclusions & Discussions
• Assumptions matter.
39
Conclusions & Discussions
• Assumptions matter. (are they reasonable?)
• images of different tags can be viewed as different
domains
• images and tags record the same object from two
different perspectives
39
Conclusions & Discussions
• Assumptions matter. (are they reasonable?)
• images of different tags can be viewed as different
domains
• images and tags record the same object from two
different perspectives
• How to utilize tag information?
39
Conclusions & Discussions
• Assumptions matter. (are they reasonable?)
• images of different tags can be viewed as different
domains
• images and tags record the same object from two
different perspectives
• How to utilize tag information? (plenty of ways)
• multi-domain image-to-image translation
• enforce consistency between learned disentangled
representations for images and tags
39
Conclusions & Discussions
• Assumptions matter. (are they reasonable?)
• images of different tags can be viewed as different
domains
• images and tags record the same object from two
different perspectives
• How to utilize tag information? (plenty of ways)
• multi-domain image-to-image translation
• enforce consistency between learned disentangled
representations for images and tags
• Does disentangling make sense?
39
Conclusions & Discussions
• Assumptions matter. (are they reasonable?)
• images of different tags can be viewed as different
domains
• images and tags record the same object from two
different perspectives
• How to utilize tag information? (plenty of ways)
• multi-domain image-to-image translation
• enforce consistency between learned disentangled
representations for images and tags
• Does disentangling make sense? (some trade-offs?)
• interpretability vs. effectiveness (efficiency)
• information underneath entangled factors
39
Questions?
IcGAN - System Overview
Fig. 18: IcGAN [4] - system overview
DC-IGN - System Overview
Fig. 19: DC-IGN [5] - system overview
References
[1] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “StarGAN: Unified generative
adversarial networks for multi-domain image-to-image translation,” arXiv
preprint arXiv:1711.09020, 2017.
[2] C. Wang, C. Wang, C. Xu, and D. Tao, “Tag disentangled generative adversarial
networks for object image re-rendering,” in Proc. 36th Int. Joint Conf. on
Artificial Intelligence (IJCAI), 2017.
[3] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation
using cycle-consistent adversarial networks,” in The IEEE International
Conference on Computer Vision (ICCV), 2017.
[4] G. Perarnau, J. van de Weijer, B. Raducanu, and J. M. Álvarez, “Invertible
conditional GANs for image editing,” in Proc. NIPS 2016 Workshop on Adversarial
Training, 2016.
[5] T. D. Kulkarni, W. Whitney, P. Kohli, and J. B. Tenenbaum, “Deep convolutional
inverse graphics network,” in Advances in Neural Information Processing
Systems (NIPS) 28, 2015.
