Domain Adaptation with Adversarial Learning
Rishiraj Chakraborty, Sourya Sengupta and Chengzhu Xu
Dept. of Applied Math and System Design Eng.
May 12, 2019
Contents
1 Introduction
2 Domain Divergence
3 Proxy Distance
4 Adversarial Models
5 Generative models
6 Results
Introduction
Domain adaptation is the process of training a machine learning
model on one input domain so that it also performs well on data
from a different domain.
Domain adaptation is a common requirement in tasks such as object
recognition, object detection, image categorization, speech
recognition and sentiment analysis, because labelling newly
obtained data is expensive.
This is done either by finding a common embedding for the two
domains or by transforming data from one domain into the other.
We discuss domain adaptation techniques for both adversarial models
and generative models.
Setup for Adversarial Models
For simplicity, we consider a binary classification problem.
A domain is a pair consisting of a distribution D on inputs X and a
labelling function f : X → [0, 1].
We consider a source domain $\langle D_S, f_S \rangle$ and a target domain $\langle D_T, f_T \rangle$.
Objective
We want to learn a hypothesis h : X → [0, 1] because the labelling function f is unknown.
The expected disagreement between h and f under D_S (the source risk) is
$\epsilon_S(h, f) = \mathbb{E}_{x \sim D_S}\big[\,|h(x) - f(x)|\,\big]$
We want our hypothesis h to minimize the source risk $\epsilon_S$ and also to
generalize well on D_T, thereby minimizing the target risk $\epsilon_T$ as well.
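In practice the source risk is estimated from a finite labelled sample rather than in expectation. A minimal sketch of such a Monte Carlo estimate (the helper name and toy data are illustrative, not from the slides):

```python
import numpy as np

def source_risk(h, f, x_source):
    """Monte Carlo estimate of eps_S(h, f) = E_{x ~ D_S}[|h(x) - f(x)|].

    h, f : callables mapping a batch of inputs to values in [0, 1]
    x_source : samples drawn from the source distribution D_S
    """
    return np.mean(np.abs(h(x_source) - f(x_source)))

# Toy usage: f plays the (normally unknown) labelling function, h a candidate hypothesis.
rng = np.random.default_rng(0)
x_s = rng.normal(size=(1000, 2))
f = lambda x: (x[:, 0] > 0).astype(float)                    # true labelling function
h = lambda x: (x[:, 0] + 0.1 * x[:, 1] > 0).astype(float)    # learned hypothesis
print(source_risk(h, f, x_s))
```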
Domain Divergence
Domain divergence is a notion of the distance between the source and
the target distributions.
A natural measure of domain divergence is the L1 or variation
divergence,
$d_1(D_S, D_T) = 2 \sup_{B \in \mathcal{B}} \big| \Pr_{D_S}[B] - \Pr_{D_T}[B] \big|$
where $\mathcal{B}$ is the set of measurable subsets under D_S and D_T.
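For distributions on a finite support the supremum is attained by the set of points where one distribution exceeds the other, so $d_1$ reduces to the sum of absolute differences of the probability masses. A small sketch under that assumption (the general case cannot be estimated reliably from samples, as discussed later):

```python
import numpy as np

def d1_discrete(p, q):
    """Variation divergence d_1 = 2 * sup_B |P(B) - Q(B)| for finite supports.

    The supremum is attained at B = {x : p(x) > q(x)}, which gives
    d_1 = sum_x |p(x) - q(x)|.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.abs(p - q).sum()

# Two distributions over four outcomes.
print(d1_discrete([0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]))
```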
Bounding the Target Risk
For a hypothesis h,
$\epsilon_T(h) \leq \epsilon_S(h) + d_1(D_S, D_T) + \min\!\big\{ \mathbb{E}_{D_S}\big[\,|f_S(x) - f_T(x)|\,\big],\; \mathbb{E}_{D_T}\big[\,|f_S(x) - f_T(x)|\,\big] \big\}$
The target risk is bounded by the source risk, the domain divergence
and the difference in the labelling functions across the two domains.
It is safe to assume that the difference between the labelling functions,
which reflects the inherent Bayes error, is small.
Hence the domain divergence term is critical for bounding the target
risk.
Problems with the L1 Divergence
The L1 divergence cannot be accurately estimated from finite samples of
arbitrary distributions.
It unnecessarily inflates the bound by taking the supremum over all
measurable subsets.
We are only interested in subsets on which hypotheses from a hypothesis
class of finite complexity can disagree.
H-divergence
Given a domain X with two probability distributions DS and DT , let H be
a hypothesis class on X and I(h) denote the set for which h ∈ H is the
characteristic function, i.e. x ∈ I(h) ⇔ h(x) = 1. The H-divergence
between DS and DT is given by,
$d_H(D_S, D_T) = 2 \sup_{h \in H} \big| \Pr_{D_S}[I(h)] - \Pr_{D_T}[I(h)] \big|$
The H-divergence can be estimated from a finite number of samples
for a hypothesis class of finite VC dimension.
The H-divergence for any H is never larger than the L1 divergence.
This is because the H-divergence compares the two distributions only on
the subsets that hypotheses in H can pick out.
Empirical H-divergence
For a symmetric hypothesis class H, it can be proved that one can
compute the empirical H-divergence between two samples S ∼ (D_S^X)^n
and T ∼ (D_T^X)^n as
$\hat{d}_H = 2\left(1 - \min_{h \in H}\left[\frac{1}{n}\sum_{i=1}^{n} I[h(x_i) = 0] + \frac{1}{n}\sum_{i=n+1}^{N} I[h(x_i) = 1]\right]\right)$
Proxy Distance
$\hat{d}_H = 2\left(1 - \min_{h \in H}\left[\frac{1}{n}\sum_{i=1}^{n} I[h(x_i) = 0] + \frac{1}{n}\sum_{i=n+1}^{N} I[h(x_i) = 1]\right]\right)$
We can approximate the empirical H-divergence $\hat{d}_H$ by training a
neural network to classify between the source and the target domains.
We construct a new dataset U, with source samples labelled 0 and
target samples labelled 1. If $\epsilon$ is the risk of a classifier trained on U,
the proxy distance between the two domains is $\hat{d}_A = 2(1 - 2\epsilon)$.
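A minimal sketch of this procedure, using a logistic-regression domain classifier in place of the neural network (scikit-learn and the helper name are assumptions, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(x_source, x_target):
    """Proxy distance d_A = 2 * (1 - 2 * eps), where eps is the held-out error
    of a classifier trained to separate source (label 0) from target (label 1)."""
    X = np.vstack([x_source, x_target])
    y = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    eps = 1.0 - clf.score(X_te, y_te)        # domain classification error
    return 2.0 * (1.0 - 2.0 * eps)

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(500, 5))
tgt = rng.normal(0.5, 1.0, size=(500, 5))    # shifted target domain
print(proxy_a_distance(src, tgt))
```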
Shallow Neural Networks
The objective is to learn the classification task without discriminating
between the source and the target domains.
We take a hidden layer G_f that maps an input into a new D-dimensional
representation.
We have a prediction layer G_y : $\mathbb{R}^D \to [0, 1]$ that maps the D-dimensional
representation into probabilities of the corresponding classes.
Given a source example (x_i, y_i), the classification cross-entropy loss is
$L_y(G_y(G_f(x_i)), y_i) = -\log\big(G_y(G_f(x_i))_{y_i}\big)$
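A sketch of this shallow architecture in PyTorch (layer sizes and names are illustrative, not from the slides); nn.CrossEntropyLoss takes the negative log of the predicted probability of the true class, matching L_y above but applied to logits:

```python
import torch
import torch.nn as nn

D = 100            # dimension of the hidden representation
num_classes = 10

G_f = nn.Sequential(nn.Linear(28 * 28, D), nn.ReLU())   # feature extractor
G_y = nn.Linear(D, num_classes)                          # label predictor (logits)

loss_y = nn.CrossEntropyLoss()   # -log softmax(G_y(G_f(x)))[y]

x = torch.randn(32, 28 * 28)                 # a batch of source examples
y = torch.randint(0, num_classes, (32,))     # their class labels
L_y = loss_y(G_y(G_f(x)), y)
L_y.backward()                               # gradients for theta_f and theta_y
```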
Domain Regularizer
The optimization problem is
$\min_{\theta_f, \theta_y} \; \frac{1}{n}\sum_{i=1}^{n} L_y^i(\theta_f, \theta_y) + \lambda \, R(\theta_f)$
where $R(\theta_f)$ is an optional regularizer weighted by the hyper-parameter $\lambda$.
Domain Regularizer
Empirical H-divergence:
$\hat{d}_H = 2\left(1 - \min_{h \in H}\left[\frac{1}{n}\sum_{i=1}^{n} I[h(x_i) = 0] + \frac{1}{n}\sum_{i=n+1}^{N} I[h(x_i) = 1]\right]\right)$
We use a domain classification layer G_d that maps the D-dimensional
representation from G_f to a predicted domain label: 0 for source and 1 for target.
The domain cross-entropy loss for an example x_i with domain label d_i is
$L_d(G_d(G_f(x_i)), d_i) = -d_i \log\big(G_d(G_f(x_i))\big) - (1 - d_i)\log\big(1 - G_d(G_f(x_i))\big)$
The min part in $\hat{d}_H$ can then be approximated by
$R(\theta_f) = \max_{\theta_d}\left[-\frac{1}{n}\sum_{i=1}^{n} L_d^i(\theta_f, \theta_d) - \frac{1}{n}\sum_{i=n+1}^{N} L_d^i(\theta_f, \theta_d)\right]$
$R(\theta_f)$ introduces a trade-off between the source risk and $\hat{d}_H$, and $\lambda$ is
used to tune this trade-off.
Complete Optimization Problem
$E(\theta_f, \theta_y, \theta_d) = \frac{1}{n}\sum_{i=1}^{n} L_y^i(\theta_f, \theta_y) - \lambda\left(\frac{1}{n}\sum_{i=1}^{n} L_d^i(\theta_f, \theta_d) + \frac{1}{n}\sum_{i=n+1}^{N} L_d^i(\theta_f, \theta_d)\right)$
$(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \theta_d)$
$\hat{\theta}_d = \arg\max_{\theta_d} E(\theta_f, \theta_y, \theta_d)$
The discriminator tries to minimize the domain classification error, which
approximates $\hat{d}_H$, while the hidden layer tries to learn representation
vectors that are indistinguishable across domains.
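One way to optimize this saddle point is to alternate updates: a step maximizing E over θ_d (i.e. minimizing the domain loss), then a step minimizing E over (θ_f, θ_y). A schematic PyTorch sketch under that interpretation (all module and variable names are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

G_f = nn.Sequential(nn.Linear(784, 100), nn.ReLU())   # feature extractor
G_y = nn.Linear(100, 10)                               # label predictor
G_d = nn.Linear(100, 1)                                # domain discriminator (logit)

opt_fy = torch.optim.SGD(list(G_f.parameters()) + list(G_y.parameters()), lr=0.01)
opt_d = torch.optim.SGD(G_d.parameters(), lr=0.01)
ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
lam = 0.1

def train_step(x_s, y_s, x_t):
    # 1) Discriminator step: argmax_theta_d E  <=>  minimize the domain loss.
    feats = torch.cat([G_f(x_s), G_f(x_t)]).detach()
    dom_labels = torch.cat([torch.zeros(len(x_s), 1), torch.ones(len(x_t), 1)])
    d_loss = bce(G_d(feats), dom_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Feature/label step: argmin_{theta_f, theta_y} E = L_y - lambda * L_d.
    f_s, f_t = G_f(x_s), G_f(x_t)
    label_loss = ce(G_y(f_s), y_s)
    domain_loss = bce(G_d(torch.cat([f_s, f_t])), dom_labels)
    e = label_loss - lam * domain_loss   # the feature extractor tries to fool G_d
    opt_fy.zero_grad(); e.backward(); opt_fy.step()

# Toy usage with random tensors standing in for MNIST-like batches.
train_step(torch.randn(32, 784), torch.randint(0, 10, (32,)), torch.randn(32, 784))
```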
Generalized Architectures
Instead of the example shallow network, we can use more
sophisticated architectures for both the feature extractor and the
discriminator.
Generalized weight updates:
$\theta_f \leftarrow \theta_f - \mu\left(\frac{\partial L_y^i}{\partial \theta_f} - \lambda \frac{\partial L_d^i}{\partial \theta_f}\right)$
$\theta_y \leftarrow \theta_y - \mu \frac{\partial L_y^i}{\partial \theta_y}$
$\theta_d \leftarrow \theta_d - \mu \lambda \frac{\partial L_d^i}{\partial \theta_d}$
Generalized Architectures
We see that for the classifier and the discriminator the gradients are
subtracted as in ordinary gradient descent, but for the feature extractor
the gradient of the discriminator loss is added.
To implement this with standard gradient descent optimizers we add a
gradient reversal layer, which acts as the identity during the forward
pass and changes the sign of the incoming gradient during backpropagation.
Figure: Domain-Adversarial Neural Network (DANN)
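A common PyTorch sketch of such a gradient reversal layer, written as a custom autograd Function (the λ scaling is folded into the backward pass; names are illustrative):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: features -> grad_reverse -> domain classifier. The domain classifier is
# trained by ordinary gradient descent, while the feature extractor receives the
# negated gradient, implementing the adversarial update from the previous slide.
feats = torch.randn(8, 100, requires_grad=True)
domain_head = torch.nn.Linear(100, 1)
loss = domain_head(grad_reverse(feats, lam=0.1)).sum()
loss.backward()
```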
Generative Models
So far, we discussed how adversarial learning can be used to learn
features that do not discriminate between the two domains. Domain
adaptation can also be done by generating synthetic data that share
high-level semantics across the two domains, using Generative
Adversarial Networks (GANs).
A GAN combines a generative model with a discriminative model.
GAN-based domain adaptation techniques use source data, noise vectors,
or both to generate samples that are similar to the target data.
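For reference, a minimal sketch of the standard GAN training step (generator G maps noise to samples, discriminator D outputs a real-vs-fake logit); the architectures below build on this basic recipe, and the toy networks here are assumptions, not from the slides:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))   # noise -> sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))    # sample -> real/fake logit
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(32, 2) + 3.0    # stand-in for real (target-domain) data
z = torch.randn(32, 16)            # noise vectors

# Discriminator step: real samples labelled 1, generated samples labelled 0.
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(G(z).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make generated samples be classified as real.
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```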
Generative Models
Figure: GAN Architecture
Generative Models
We discuss two GAN architectures for domain adaptation:
Coupled Generative Adversarial Networks (CoGAN)
CyCADA: Cycle-Consistent Adversarial Domain Adaptation
Generative Models
CoGAN
Figure: CoGAN architecture
Generative Models
Objective Function of CoGAN:
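The objective itself did not survive extraction. As a rough sketch, following Liu and Tuzel's CoGAN paper (notation from that paper, not from this slide), it is a constrained minimax game over the two GANs:

$\max_{g_1, g_2} \; \min_{f_1, f_2} \; V(f_1, f_2, g_1, g_2)$, with
$V = \mathbb{E}_{x_1 \sim p_{X_1}}[-\log f_1(x_1)] + \mathbb{E}_{z \sim p_Z}[-\log(1 - f_1(g_1(z)))] + \mathbb{E}_{x_2 \sim p_{X_2}}[-\log f_2(x_2)] + \mathbb{E}_{z \sim p_Z}[-\log(1 - f_2(g_2(z)))]$,
subject to weight-sharing constraints between the generators' first layers and the discriminators' last layers.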
Generative Models
CoGAN
CoGAN consists of two GANs, GAN1 and GAN2.
The GANs learn a joint distribution of multi-domain images by weight
sharing between the two generators and between the two discriminators.
Both GANs generate samples resembling the source and target images, in
which the features common to both domains are more prevalent.
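A schematic sketch of the weight-sharing idea in PyTorch (a shared module reused by both GANs; layer sizes and names are illustrative, not taken from the CoGAN paper):

```python
import torch
import torch.nn as nn

# The first (semantic) layers are shared between the two generators;
# only the last layers, which render domain-specific details, differ.
shared_g = nn.Sequential(nn.Linear(100, 256), nn.ReLU())
head_g1 = nn.Linear(256, 784)   # generator head for domain 1 (e.g. MNIST-like)
head_g2 = nn.Linear(256, 784)   # generator head for domain 2 (e.g. USPS-like)

def g1(z): return head_g1(shared_g(z))
def g2(z): return head_g2(shared_g(z))

# Symmetrically, the two discriminators share their last layers.
body_d1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
body_d2 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
shared_d = nn.Linear(256, 1)

def d1(x): return shared_d(body_d1(x))
def d2(x): return shared_d(body_d2(x))

z = torch.randn(16, 100)
x1, x2 = g1(z), g2(z)   # a pair of images generated from the same latent code
```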
Generative Models
CyCADA
Figure: CyCADA architecture
Generative Models
CyCADA
CyCADA generates source samples styled as target samples: the generator
starts from a source image and tries to make it look like a target image.
In addition to the GAN losses, it incorporates two other important losses:
a cycle-consistency loss and a semantic-consistency loss.
It tries to produce target-styled source samples that preserve the content
of the original source samples, so that the corresponding class labels do
not change.
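A sketch of these two extra losses (the mapping networks G_st, G_ts and the pretrained source classifier f_src are assumed to exist; names are illustrative, not from the CyCADA paper):

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
ce = nn.CrossEntropyLoss()

def cycle_loss(x_s, G_st, G_ts):
    """Cycle-consistency: source -> target style -> back to source should reconstruct x_s."""
    return l1(G_ts(G_st(x_s)), x_s)

def semantic_loss(x_s, G_st, f_src):
    """Semantic consistency: a fixed classifier pretrained on the source domain should
    assign the stylized image the same label it assigns the original image."""
    with torch.no_grad():
        pseudo_labels = f_src(x_s).argmax(dim=1)
    return ce(f_src(G_st(x_s)), pseudo_labels)

# These terms are added to the usual GAN losses when training the mappings.
```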
Results
Model Dataset Source only DA
DANN MNIST to MNIST-M 0.5185 0.7886
Coupled GAN MNIST to USPS 0.64 0.910
CyCADA MNIST to USPS 0.64 0.96
Table: Target-domain accuracy when training on the source only vs. with domain adaptation (DA)
References I
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and
J. W. Vaughan.
A theory of learning from different domains.
Machine learning, 79(1-2):151–175, 2010.
S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira.
Analysis of representations for domain adaptation.
In Advances in neural information processing systems, pages 137–144,
2007.
Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle,
F. Laviolette, M. Marchand, and V. Lempitsky.
Domain-adversarial training of neural networks.
The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
References II
J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A.
Efros, and T. Darrell.
Cycada: Cycle-consistent adversarial domain adaptation.
arXiv preprint arXiv:1711.03213, 2017.
M.-Y. Liu and O. Tuzel.
Coupled generative adversarial networks.
In Advances in neural information processing systems, pages 469–477,
2016.
M. Long, Z. Cao, J. Wang, and M. I. Jordan.
Conditional adversarial domain adaptation.
In Advances in Neural Information Processing Systems, pages
1640–1650, 2018.
Thank You