Transfer Learning
Part I: Overview
Sinno Jialin Pan
Institute for Infocomm Research (I2R), Singapore
Transfer of Learning
A psychological point of view
• The study of how human conduct, learning, or performance depends on prior experience.
• [Thorndike and Woodworth, 1901] explored how individuals transfer learning in one context to another context that shares similar characteristics.
 C++ → Java
 Maths/Physics → Computer Science/Economics
Transfer Learning
In the machine learning community
• The ability of a system to recognize and apply
knowledge and skills learned in previous tasks to
novel tasks or new domains, which share some
commonality.
• Given a target task, how to identify the
commonality between the task and previous
(source) tasks, and transfer knowledge from the
previous tasks to the target one?
Fields of Transfer Learning
• Transfer learning for reinforcement learning. [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
• Transfer learning for classification and regression problems. [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2009] ← Focus of this tutorial!
Motivating Example I:
Indoor WiFi localization
[Figure: a mobile device receives signal strength (RSS) readings from several WiFi access points, e.g., -30dBm, -70dBm, -40dBm; the goal is to predict the device's location.]
Indoor WiFi Localization (cont.)
[Figure: localization across time periods. A localization model is trained on labeled data collected in Time Period A, where each example pairs a vector of received signal strengths with a location, e.g., S=(-37dbm, .., -77dbm), L=(1, 3). Tested on unlabeled data, e.g., S=(-37dbm, .., -77dbm), from the same time period, the model achieves an average error distance of ~1.5 meters; tested on data collected in Time Period B, the average error distance degrades to ~6 meters. Performance drops!]
Indoor WiFi Localization (cont.)
[Figure: localization across devices. A localization model trained on labeled data collected by Device A, e.g., S=(-37dbm, .., -77dbm), L=(1, 3), achieves an average error distance of ~1.5 meters when tested on data from Device A. Tested on data collected by Device B, for which only a few labeled examples exist, e.g., S=(-33dbm, .., -82dbm), L=(1, 3), the average error distance degrades to ~10 meters. Performance drops!]
Difference between Tasks/Domains
[Figure: received-signal-strength distributions differ between Time Period A and Time Period B, and between Device A and Device B.]
Motivating Example II:
Sentiment Classification
Sentiment Classification (cont.)
[Figure: sentiment classification across product domains. A sentiment classifier trained on labeled Electronics reviews achieves a classification accuracy of ~84.6% when tested on Electronics reviews; a classifier trained on DVD reviews and tested on Electronics reviews achieves only ~72.65%. Performance drops!]
Difference between Tasks/Domains
Electronics reviews:
(1) Compact; easy to operate; very good picture quality; looks sharp!
(3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp.
(5) It is also quite blurry in very dark settings. I will never buy HP again.
Video Games reviews:
(2) A very good game! It is action packed and full of excitement. I am very much hooked on this game.
(4) Very realistic shooting action and good plots. We played this and were hooked.
(6) The game is so boring. I am extremely unhappy and will probably never buy UbiSoft again.
A Major Assumption
Training and future (test) data come from the same task and the same domain:
 They are represented in the same feature and label spaces.
 They follow the same distribution.
The Goal of Transfer Learning
[Figure: plentiful labeled training data from source tasks/domains (e.g., DVD reviews, Time Period B, Device B) are used, together with only a few labeled training data from the target task/domain (e.g., Electronics reviews, Time Period A, Device A), to train classification or regression models for the target task/domain.]
Notations
Domain: $\mathcal{D} = \{\mathcal{X}, P(X)\}$, i.e., a feature space $\mathcal{X}$ and a marginal distribution $P(X)$ with $X \in \mathcal{X}$.
Task: $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, i.e., a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$, which can also be interpreted as the conditional distribution $P(Y \mid X)$.
Transfer learning settings:
 Heterogeneous feature space → Heterogeneous Transfer Learning
 Homogeneous feature space → split by whether the tasks are identical or different:
   Tasks identical → Single-Task Transfer Learning
     Domain Adaptation: domain difference is caused by feature representations
     Sample Selection Bias / Covariate Shift: domain difference is caused by sample bias
   Tasks different → Inductive Transfer Learning
     Multi-Task Learning: tasks are learned simultaneously
     Target-task-driven methods: focus on optimizing a target task
(Recap of the settings above: with a homogeneous feature space, identical tasks lead to Single-Task Transfer Learning, covering Domain Adaptation and Sample Selection Bias / Covariate Shift, while different tasks lead to Inductive Transfer Learning, covering Multi-Task Learning and target-task-driven methods.)
Single-Task Transfer Learning
Assumption: the source and target tasks are identical while the domains differ, i.e., $\mathcal{T}_S = \mathcal{T}_T$ but $\mathcal{D}_S \neq \mathcal{D}_T$. Two cases arise:
 Case 1: Sample Selection Bias / Covariate Shift → Instance-based Transfer Learning Approaches
 Case 2: Domain Adaptation (e.g., in NLP) → Feature-based Transfer Learning Approaches
Single-Task Transfer Learning
Case 1: Sample Selection Bias / Covariate Shift (Instance-based Transfer Learning Approaches)
Problem setting: plenty of labeled source domain data $\{(x_{S_i}, y_{S_i})\}$ and unlabeled target domain data $\{x_{T_i}\}$ are given, with $P_S(X) \neq P_T(X)$.
Assumption: the conditional distributions are identical across domains, $P_S(Y \mid X) = P_T(Y \mid X)$.
Single-Task Transfer Learning
Instance-based Approaches
Recall, given a target task, the goal is to learn parameters that minimize the expected risk under the target distribution:
$\theta^* = \arg\min_{\theta}\ \mathbb{E}_{(x,y) \sim P_T}\big[\ell(x, y, \theta)\big].$
Since labeled data are only available from the source domain, the target risk can be rewritten as an importance-weighted expectation over the source distribution:
$\mathbb{E}_{(x,y) \sim P_T}\big[\ell(x, y, \theta)\big] = \mathbb{E}_{(x,y) \sim P_S}\Big[\tfrac{P_T(x, y)}{P_S(x, y)}\, \ell(x, y, \theta)\Big].$
Assumption: under Sample Selection Bias / Covariate Shift, $P_S(Y \mid X) = P_T(Y \mid X)$, so the weight of each source instance reduces to the density ratio $\tfrac{P_T(x)}{P_S(x)}$, which can be estimated from unlabeled data of both domains.
[Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
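To make the re-weighting concrete, here is a minimal sketch, not from the tutorial itself, that estimates the density ratio $P_T(x)/P_S(x)$ with a probabilistic domain classifier and then trains an importance-weighted model; the function names and the use of scikit-learn are illustrative assumptions.

```python
# Hypothetical sketch: estimate instance weights P_T(x)/P_S(x) with a
# logistic-regression domain classifier, then train a weighted model.
# Xs, ys: labeled source data; Xt: unlabeled target data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_weights(Xs, Xt):
    # Label source = 0, target = 1, and fit a probabilistic domain classifier.
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(Xs)[:, 1]           # P(domain = target | x)
    # By Bayes' rule, P_T(x)/P_S(x) is proportional to p / (1 - p).
    w = p / np.clip(1.0 - p, 1e-6, None)
    return w * (len(Xs) / len(Xt))            # correct for sample-size imbalance

def train_importance_weighted(Xs, ys, Xt):
    w = density_ratio_weights(Xs, Xt)
    model = LogisticRegression(max_iter=1000)
    model.fit(Xs, ys, sample_weight=w)        # weighted empirical risk minimization
    return model
```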
Single-Task Transfer Learning
Feature-based Approaches
Case 2 problem setting: labeled source domain data and unlabeled target domain data are given, and the domain difference is caused by feature representations.
Explicit/implicit assumption: there exists a feature transformation $\varphi$ such that, after mapping, the domains look alike, i.e., $P(\varphi(X_S)) \approx P(\varphi(X_T))$ and $P(Y_S \mid \varphi(X_S)) \approx P(Y_T \mid \varphi(X_T))$.
Single-Task Transfer Learning
Feature-based Approaches (cont.)
How to learn the transformation $\varphi$?
 Solution 1: Encode domain knowledge to learn the transformation.
 Solution 2: Learn the transformation by designing objective functions that minimize the domain difference directly.
Single-Task Transfer Learning
Solution 1: Encode domain knowledge to learn the transformation
Revisit the Electronics and Video Games reviews above, treating frequent negations as single tokens (e.g., never_buy). Some sentiment words occur in both domains; these serve as common (pivot) features, while the remaining sentiment words are domain specific.
Single-Task Transfer Learning
Solution 1: Encode domain knowledge to learn the transformation (cont.)
[Figure: domain-specific features, such as compact, sharp, and blurry for Electronics and realistic, hooked, and boring for Video Games, are connected through common pivot features such as good, exciting, and never_buy, which occur in both domains.]
Single-Task Transfer Learning
Solution 1: Encode domain knowledge to learn the transformation (cont.)
 How to select good pivot features is an open problem (see the sketch below). Candidate criteria:
   Mutual information with the labels on source domain labeled data.
   Term frequency on both source and target domain data.
 How to estimate correlations between pivot and domain-specific features?
   Structural Correspondence Learning (SCL) [Blitzer et al., 2006]
   Spectral Feature Alignment (SFA) [Pan et al., 2010]
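A minimal sketch of pivot selection combining the two criteria above; it assumes bag-of-words count matrices and scikit-learn, and the function name and thresholds are illustrative, not taken from SCL or SFA.

```python
# Hypothetical pivot selection: pivots should be frequent in BOTH domains
# and informative about the sentiment label in the labeled source domain.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_pivots(Xs, ys, Xt, vocab, min_df=10, k=100):
    # Keep features that occur often enough in both domains...
    common = ((Xs > 0).sum(axis=0) >= min_df) & ((Xt > 0).sum(axis=0) >= min_df)
    idx = np.flatnonzero(common)
    # ...then rank them by mutual information with the source labels.
    mi = mutual_info_classif(Xs[:, idx], ys, discrete_features=True)
    top = idx[np.argsort(mi)[::-1][:k]]
    return [vocab[i] for i in top]
```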
Single-Task Transfer Learning
Solution 2: Learning the transformation without domain knowledge
[Figure: in the WiFi localization example, both source and target domain data are generated by latent factors: temperature, signal properties, building structure, and the power of access points (APs). Variations in some of these latent factors cause the data distributions of the two domains to differ.]
Single-Task Transfer Learning
Solution 2: Learning the transformation without domain knowledge (cont.)
[Figure: among the latent factors, some (e.g., signal properties, building structure) correspond to principal components that matter for the task, while others are noisy components.]
Single-Task Transfer Learning
Solution 2: Learning the transformation without domain knowledge (cont.)
Learning the transformation by only minimizing the distance between distributions may map the data onto the noisy factors.
Single-Task Transfer Learning
Transfer Component Analysis (TCA) [Pan et al., 2009]
Main idea: the learned transformation $\varphi$ should map the source and target domain data to the latent space spanned by the factors that reduce the domain difference and preserve the original data structure.
High-level optimization problem:
$\min_{\varphi}\ \text{Distance}\big(\varphi(X_S), \varphi(X_T)\big) + \lambda\,\Omega(\varphi) \quad \text{s.t. data properties are preserved.}$
Single-Task Transfer Learning
Maximum Mean Discrepancy (MMD)
The distance between two distributions can be estimated by the distance between the empirical means of the two samples after mapping them into a reproducing kernel Hilbert space $\mathcal{H}$ via a feature map $\phi$:
$\text{MMD}(X_S, X_T) = \Big\| \frac{1}{n_S} \sum_{i=1}^{n_S} \phi(x_{S_i}) - \frac{1}{n_T} \sum_{i=1}^{n_T} \phi(x_{T_i}) \Big\|_{\mathcal{H}}$
[Smola, Gretton, and Fukumizu, ICML-08 tutorial]
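For intuition, a minimal sketch of the (biased) empirical squared MMD with an RBF kernel, computed through the kernel trick; the use of scikit-learn and the bandwidth parameter gamma are illustrative assumptions.

```python
# Biased empirical MMD^2 via the kernel trick:
# MMD^2 = mean(Kss) - 2 * mean(Kst) + mean(Ktt).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2(Xs, Xt, gamma=1.0):
    Kss = rbf_kernel(Xs, Xs, gamma=gamma)   # kernel within the source sample
    Ktt = rbf_kernel(Xt, Xt, gamma=gamma)   # kernel within the target sample
    Kst = rbf_kernel(Xs, Xt, gamma=gamma)   # cross-domain kernel
    return Kss.mean() - 2.0 * Kst.mean() + Ktt.mean()
```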
Single-Task Transfer Learning
Transfer Component Analysis (cont.)
An earlier formulation [Pan et al., 2008] learns the kernel matrix $K$ of the embedded source and target data directly: by the kernel trick, the squared MMD of the embedded data can be written as $\text{tr}(KL)$, where $L_{ij} = \frac{1}{n_S^2}$ if $x_i, x_j \in X_S$, $L_{ij} = \frac{1}{n_T^2}$ if $x_i, x_j \in X_T$, and $L_{ij} = -\frac{1}{n_S n_T}$ otherwise.
The kernel matrix is learned by optimizing three criteria:
 To minimize the distance between domains,
 To maximize the data variance,
 To preserve the local geometric structure.
However [Pan et al., 2008]:
 It is an SDP problem, which is expensive!
 It is transductive and cannot generalize to unseen instances!
 PCA is post-processed on the learned kernel matrix, which may potentially discard useful information.
Single-Task Transfer Learning
Transfer Component Analysis (cont.)
TCA addresses these drawbacks through the empirical kernel map: the embedding is parameterized by a projection matrix $W$, giving the resultant parametric kernel $\tilde{K} = K W W^\top K$, which permits out-of-sample kernel evaluation on unseen instances.
Single-Task Transfer Learning
Transfer Component Analysis (cont.)
$\min_{W}\ \underbrace{\text{tr}(W^\top K L K W)}_{\text{to minimize the distance between domains}}\ +\ \underbrace{\mu\,\text{tr}(W^\top W)}_{\text{regularization on } W} \quad \text{s.t.}\ \underbrace{W^\top K H K W = I}_{\text{to maximize the data variance}}$
where $H$ is the centering matrix. The solution is given by the leading eigenvectors of $(K L K + \mu I)^{-1} K H K$.
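The following compact sketch puts the pieces together under the formulation above; it is an illustrative numpy/scikit-learn implementation, not the authors' reference code, and the kernel choice and hyperparameters are assumptions.

```python
# Sketch of TCA: embed combined source/target data so domains align.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def tca(Xs, Xt, dim=10, mu=1.0, gamma=1.0):
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, gamma=gamma)            # kernel on combined data
    # MMD coefficient matrix L (1/ns^2, 1/nt^2, -1/(ns*nt) blocks).
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    # Leading eigenvectors of (K L K + mu I)^{-1} K H K.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(-vals.real)[:dim]
    W = vecs[:, order].real
    Z = K @ W                                    # embedded source + target data
    return Z[:ns], Z[ns:], W
```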
Inductive Transfer Learning
(Recap: when the source and target tasks are different, we are in Inductive Transfer Learning, which splits into Multi-Task Learning, where tasks are learned simultaneously, and target-task-driven methods, which focus on optimizing a target task.)
Inductive Transfer Learning
Multi-Task Learning
Problem setting: labeled training data are available for several different but related tasks.
Assumption: the tasks are related, so they share some commonality that can be exploited by learning them simultaneously rather than optimizing a single target task in isolation.
Inductive Transfer Learning
 Methods modified from Multi-Task Learning: feature-based and parameter-based transfer learning approaches.
 Target-Task-Driven Transfer Learning Methods, including Self-Taught Learning Methods: instance-based and feature-based transfer learning approaches.
Inductive Transfer Learning
Multi-Task Learning Methods
Setting: labeled training data $\{(x_i^t, y_i^t)\}_{i=1}^{n_t}$ are given for each task $t = 1, \dots, T$ (source or target); the feature-based and parameter-based approaches below are modified from Multi-Task Learning.
Inductive Transfer Learning
Multi-Task Learning Methods
Recall that each task (source or target) can be learned independently:
$\min_{w_t}\ \sum_{i=1}^{n_t} \ell(x_i^t, y_i^t, w_t), \quad t = 1, \dots, T.$
Motivation of Multi-Task Learning:
 Can the related tasks be learned jointly?
 Which kind of commonality can be used across tasks?
Inductive Transfer Learning
Multi-Task Learning Methods
-- Parameter-based approaches
Assumption: if tasks are related, they should share similar parameter vectors.
For example [Evgeniou and Pontil, 2004], each task's parameter vector is decomposed as
$w_t = w_0 + v_t,$
where $w_0$ is a common part shared by all tasks and $v_t$ is a specific part for the individual task.
Inductive Transfer Learning
Multi-Task Learning Methods
-- Parameter-based approaches (cont.)
The common and specific parts are learned jointly by trading off the empirical loss over all tasks against the size of the task-specific deviations:
$\min_{w_0, \{v_t\}}\ \sum_{t=1}^{T} \sum_{i=1}^{n_t} \ell(x_i^t, y_i^t, w_0 + v_t) + \lambda_1 \sum_{t=1}^{T} \|v_t\|^2 + \lambda_2 \|w_0\|^2.$
Inductive Transfer Learning
Multi-Task Learning Methods
-- Parameter-based approaches (summary)
A general framework: jointly minimize the loss over all tasks plus a regularizer that encodes how the task parameter vectors are assumed to be related. [Zhang and Yeung, 2010] [Saha et al., 2010]
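As an illustration of the shared-plus-specific decomposition, here is a minimal sketch for the squared loss, solved by plain gradient descent; it simplifies the regularized formulation above (which [Evgeniou and Pontil, 2004] develop for SVMs), and all names and hyperparameters are illustrative.

```python
# Multi-task least squares with w_t = w0 + v_t, optimized by gradient descent.
import numpy as np

def mtl_shared_specific(tasks, lam1=1.0, lam2=0.1, lr=1e-2, steps=2000):
    """tasks: list of (X_t, y_t) pairs sharing a common feature dimension d."""
    d = tasks[0][0].shape[1]
    w0 = np.zeros(d)                            # common part
    V = np.zeros((len(tasks), d))               # task-specific parts
    for _ in range(steps):
        g0 = 2 * lam2 * w0
        for t, (X, y) in enumerate(tasks):
            r = X @ (w0 + V[t]) - y             # residual of task t
            g = 2 * X.T @ r / len(y)            # squared-loss gradient
            V[t] -= lr * (g + 2 * lam1 * V[t])  # update the specific part
            g0 += g                             # accumulate for the common part
        w0 -= lr * g0
    return w0, V
```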
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches
Assumption:
If tasks are related, they should share some good common features.
Goal:
Learn a low-dimensional representation shared across related tasks.
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches (cont.)
[Argyriou et al., 2007] learn an orthogonal feature transformation $U$ and task parameters $A = [a_1, \dots, a_T]$ jointly:
$\min_{A, U}\ \sum_{t=1}^{T} \sum_{i=1}^{n_t} \ell\big(y_i^t, \langle a_t, U^\top x_i^t \rangle\big) + \gamma\, \|A\|_{2,1}^2,$
where the $(2,1)$-norm, the sum of the $\ell_2$ norms of the rows of $A$, encourages all tasks to rely on a small common set of transformed features.
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches (cont.)
[Figure: illustration of the shared low-dimensional representation; the row-sparsity of $A$ selects a few directions of $U$ that are reused by all tasks.]
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches (cont.)
Related formulations learn a shared predictive structure or subspace across tasks. [Ando and Zhang, 2005] [Ji et al., 2008]
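A hedged sketch of multi-task feature learning in the spirit of [Argyriou et al., 2007] for the squared loss: alternate between per-task generalized ridge solutions under a shared matrix $D$ and a closed-form update of $D$; the epsilon smoothing and hyperparameters are illustrative assumptions.

```python
# Alternating minimization for multi-task feature learning (squared loss).
import numpy as np
from scipy.linalg import sqrtm

def mtl_feature_learning(tasks, gamma=1.0, iters=20, eps=1e-6):
    d = tasks[0][0].shape[1]
    D = np.eye(d) / d                           # shared feature structure
    A = np.zeros((d, len(tasks)))               # per-task parameters
    for _ in range(iters):
        Dinv = np.linalg.pinv(D)
        for t, (X, y) in enumerate(tasks):
            # Per-task generalized ridge: (X^T X + gamma * D^{-1}) a = X^T y
            A[:, t] = np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
        # Closed-form update: D = (A A^T)^{1/2} / tr((A A^T)^{1/2})
        S = np.real(sqrtm(A @ A.T + eps * np.eye(d)))
        D = S / np.trace(S)
    return A, D
```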
Inductive Transfer Learning
Target-Task-Driven Transfer Learning Methods, including Self-Taught Learning Methods: instance-based and feature-based transfer learning approaches.
Inductive Transfer Learning
Self-Taught Learning Methods
-- Feature-based approaches
Motivation: there exist some higher-level features that can help the target learning task even when only a few labeled data are given.
Steps:
1. Learn higher-level features from a large amount of unlabeled data from the source tasks.
2. Use the learned higher-level features to represent the data of the target task.
3. Train models on the new representations of the target task with the corresponding labels.
Inductive Transfer Learning
Self-taught Learning Methods
-- Feature-based approaches (cont.)
Higher-level feature construction:
 Solution 1: Sparse Coding [Raina et al., 2007]
 Solution 2: Deep Learning [Glorot et al., 2011]
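A minimal sketch of the three steps above using sparse coding for step 1; the scikit-learn dictionary learner stands in for the formulation of [Raina et al., 2007], and the function name and hyperparameters are illustrative.

```python
# Self-taught learning: dictionary from unlabeled source data, sparse codes
# as the new representation, then a standard classifier on the target task.
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

def self_taught(X_unlabeled, X_target, y_target, n_atoms=64):
    # Step 1: learn a dictionary of higher-level features ("basis vectors")
    # from plentiful unlabeled source data.
    dico = DictionaryLearning(n_components=n_atoms, alpha=1.0,
                              transform_algorithm='lasso_lars')
    dico.fit(X_unlabeled)
    # Step 2: re-represent the few labeled target examples as sparse codes.
    Z = dico.transform(X_target)
    # Step 3: train a standard classifier on the new representation.
    return LogisticRegression(max_iter=1000).fit(Z, y_target)
```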
Inductive Transfer Learning
Target-Task-Driven Methods
-- Instance-based approaches
Intuition/Assumption: part of the labeled data from the source domain can be reused, after re-weighting, to help learn the target task.
Main idea: TrAdaBoost [Dai et al., 2007]. For each boosting iteration,
 Use the same strategy as AdaBoost to update the weights of the target domain data.
 Use a new mechanism to decrease the weights of misclassified source domain data (see the sketch below).
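A pared-down sketch of the two weight updates for binary labels in {0, 1}; training the weak learner and forming the final ensemble vote are omitted, and the variable names are illustrative. The update rules follow [Dai et al., 2007].

```python
# TrAdaBoost weight updates for one boosting round.
import numpy as np

def tradaboost_weights(err_src, err_tgt, w_src, w_tgt, n_rounds, n_src):
    """err_*: per-instance absolute errors |h(x) - y|; w_*: current weights."""
    # Source-data factor is fixed across rounds.
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_rounds))
    # Target-data factor follows AdaBoost, from the weighted target error.
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)
    beta_tgt = eps / (1.0 - eps)
    # Decrease the weights of misclassified SOURCE instances...
    w_src = w_src * beta_src ** err_src
    # ...and increase the weights of misclassified TARGET instances.
    w_tgt = w_tgt * beta_tgt ** (-err_tgt)
    return w_src, w_tgt
```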
Summary
 Single-Task Transfer Learning (tasks identical):
   Instance-based Transfer Learning Approaches
   Feature-based Transfer Learning Approaches
 Inductive Transfer Learning (tasks different):
   Instance-based Transfer Learning Approaches
   Feature-based Transfer Learning Approaches
   Parameter-based Transfer Learning Approaches
Some Research Issues
 How to avoid negative transfer? Given a target domain/task, how to find source domains/tasks that ensure positive transfer?
 Transfer learning meets active learning.
 Given a specific application, which kind of transfer learning method should be used?
Reference
 [Thorndike and Woodworth, The Influence of Improvement in One Mental Function upon the Efficiency of Other Functions, 1901]
 [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
 [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2009]
 [Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
 [Blitzer et al., Domain Adaptation with Structural Correspondence Learning, EMNLP 2006]
 [Pan et al., Cross-Domain Sentiment Classification via Spectral Feature Alignment, WWW 2010]
 [Pan et al., Transfer Learning via Dimensionality Reduction, AAAI 2008]
Reference (cont.)
 [Pan et al., Domain Adaptation via Transfer Component Analysis, IJCAI 2009]
 [Evgeniou and Pontil, Regularized Multi-Task Learning, KDD 2004]
 [Zhang and Yeung, A Convex Formulation for Learning Task Relationships in Multi-Task Learning, UAI 2010]
 [Saha et al., Learning Multiple Tasks using Manifold Regularization, NIPS 2010]
 [Argyriou et al., Multi-Task Feature Learning, NIPS 2007]
 [Ando and Zhang, A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, JMLR 2005]
 [Ji et al., Extracting Shared Subspace for Multi-label Classification, KDD 2008]
Reference (cont.)
 [Raina et al., Self-taught Learning: Transfer Learning from Unlabeled Data, ICML 2007]
 [Dai et al., Boosting for Transfer Learning, ICML 2007]
 [Glorot et al., Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, ICML 2011]
Thank You