Transfer Learning
Part I: Overview
Sinno Jialin Pan
Institute for Infocomm Research (I2R), Singapore
Transfer of Learning
A psychological point of view
• The study of how human conduct, learning, or performance depends on prior experience.
• [Thorndike and Woodworth, 1901] explored how individuals transfer learning in one context to another context that shares similar characteristics.
 C++ → Java
 Maths/Physics → Computer Science/Economics
Transfer Learning
In the machine learning community
• The ability of a system to recognize and apply
knowledge and skills learned in previous tasks to
novel tasks or new domains, which share some
commonality.
• Given a target task, how to identify the
commonality between the task and previous
(source) tasks, and transfer knowledge from the
previous tasks to the target one?
Fields of Transfer Learning
• Transfer learning for reinforcement learning. [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
• Transfer learning for classification and regression problems. [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2009] ← Focus of this tutorial!
Motivating Example I:
Indoor WiFi localization
[Figure: a mobile device receives signal strength (RSS) readings from several WiFi access points, e.g., -30dBm, -70dBm, -40dBm; the goal is to predict the device's location.]
Indoor WiFi Localization (cont.)
[Figure: localization across time periods. A localization model is trained on labeled data collected in Time Period A, where each example pairs a vector of received signal strengths with a location, e.g., S=(-37dbm, .., -77dbm), L=(1, 3). Tested on unlabeled data, e.g., S=(-37dbm, .., -77dbm), from the same time period, the model achieves an average error distance of ~1.5 meters; tested on data collected in Time Period B, the average error distance degrades to ~6 meters. Performance drops!]
Indoor WiFi Localization (cont.)
[Figure: localization across devices. A localization model trained on labeled data collected by Device A, e.g., S=(-37dbm, .., -77dbm), L=(1, 3), achieves an average error distance of ~1.5 meters when tested on data from Device A. Tested on data collected by Device B, for which only a few labeled examples exist, e.g., S=(-33dbm, .., -82dbm), L=(1, 3), the average error distance degrades to ~10 meters. Performance drops!]
Difference between Tasks/Domains
[Figure: received-signal-strength distributions differ between Time Period A and Time Period B, and between Device A and Device B.]
Motivating Example II:
Sentiment Classification
Sentiment Classification (cont.)
[Figure: sentiment classification across product domains. A sentiment classifier trained on labeled Electronics reviews achieves a classification accuracy of ~84.6% when tested on Electronics reviews; a classifier trained on DVD reviews and tested on Electronics reviews achieves only ~72.65%. Performance drops!]
Difference between Tasks/Domains
Electronics reviews:
(1) Compact; easy to operate; very good picture quality; looks sharp!
(3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp.
(5) It is also quite blurry in very dark settings. I will never buy HP again.
Video Games reviews:
(2) A very good game! It is action packed and full of excitement. I am very much hooked on this game.
(4) Very realistic shooting action and good plots. We played this and were hooked.
(6) The game is so boring. I am extremely unhappy and will probably never buy UbiSoft again.
A Major Assumption
Training and future (test) data come from the same task and the same domain:
 They are represented in the same feature and label spaces.
 They follow the same distribution.
The Goal of Transfer Learning
[Figure: plentiful labeled training data from source tasks/domains (e.g., DVD reviews, Time Period B, Device B) are used, together with only a few labeled training data from the target task/domain (e.g., Electronics reviews, Time Period A, Device A), to train classification or regression models for the target task/domain.]
Notations
Domain: $\mathcal{D} = \{\mathcal{X}, P(X)\}$, i.e., a feature space $\mathcal{X}$ and a marginal distribution $P(X)$ with $X \in \mathcal{X}$.
Task: $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, i.e., a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$, which can also be interpreted as the conditional distribution $P(Y \mid X)$.
Transfer learning settings:
 Heterogeneous feature space → Heterogeneous Transfer Learning
 Homogeneous feature space → split by whether the tasks are identical or different:
   Tasks identical → Single-Task Transfer Learning
     Domain Adaptation: domain difference is caused by feature representations
     Sample Selection Bias / Covariate Shift: domain difference is caused by sample bias
   Tasks different → Inductive Transfer Learning
     Multi-Task Learning: tasks are learned simultaneously
     Target-task-driven methods: focus on optimizing a target task
(Recap of the settings above: with a homogeneous feature space, identical tasks lead to Single-Task Transfer Learning, covering Domain Adaptation and Sample Selection Bias / Covariate Shift, while different tasks lead to Inductive Transfer Learning, covering Multi-Task Learning and target-task-driven methods.)
Single-Task Transfer Learning
Assumption: the source and target tasks are identical while the domains differ, i.e., $\mathcal{T}_S = \mathcal{T}_T$ but $\mathcal{D}_S \neq \mathcal{D}_T$. Two cases arise:
 Case 1: Sample Selection Bias / Covariate Shift → Instance-based Transfer Learning Approaches
 Case 2: Domain Adaptation (e.g., in NLP) → Feature-based Transfer Learning Approaches
Single-Task Transfer Learning
Case 1: Sample Selection Bias / Covariate Shift (Instance-based Transfer Learning Approaches)
Problem setting: plenty of labeled source domain data $\{(x_{S_i}, y_{S_i})\}$ and unlabeled target domain data $\{x_{T_i}\}$ are given, with $P_S(X) \neq P_T(X)$.
Assumption: the conditional distributions are identical across domains, $P_S(Y \mid X) = P_T(Y \mid X)$.
Single-Task Transfer Learning
Instance-based Approaches
Recall, given a target task, the goal is to learn parameters that minimize the expected risk under the target distribution:
$\theta^* = \arg\min_{\theta}\ \mathbb{E}_{(x,y) \sim P_T}\big[\ell(x, y, \theta)\big].$
Since labeled data are only available from the source domain, the target risk can be rewritten as an importance-weighted expectation over the source distribution:
$\mathbb{E}_{(x,y) \sim P_T}\big[\ell(x, y, \theta)\big] = \mathbb{E}_{(x,y) \sim P_S}\Big[\tfrac{P_T(x, y)}{P_S(x, y)}\, \ell(x, y, \theta)\Big].$
Assumption: under Sample Selection Bias / Covariate Shift, $P_S(Y \mid X) = P_T(Y \mid X)$, so the weight of each source instance reduces to the density ratio $\tfrac{P_T(x)}{P_S(x)}$, which can be estimated from unlabeled data of both domains.
[Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
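To make the re-weighting concrete, here is a minimal sketch, not from the tutorial itself, that estimates the density ratio $P_T(x)/P_S(x)$ with a probabilistic domain classifier and then trains an importance-weighted model; the function names and the use of scikit-learn are illustrative assumptions.

```python
# Hypothetical sketch: estimate instance weights P_T(x)/P_S(x) with a
# logistic-regression domain classifier, then train a weighted model.
# Xs, ys: labeled source data; Xt: unlabeled target data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_weights(Xs, Xt):
    # Label source = 0, target = 1, and fit a probabilistic domain classifier.
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(Xs)[:, 1]           # P(domain = target | x)
    # By Bayes' rule, P_T(x)/P_S(x) is proportional to p / (1 - p).
    w = p / np.clip(1.0 - p, 1e-6, None)
    return w * (len(Xs) / len(Xt))            # correct for sample-size imbalance

def train_importance_weighted(Xs, ys, Xt):
    w = density_ratio_weights(Xs, Xt)
    model = LogisticRegression(max_iter=1000)
    model.fit(Xs, ys, sample_weight=w)        # weighted empirical risk minimization
    return model
```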
Single-Task Transfer Learning
Feature-based Approaches
Case 2 problem setting: labeled source domain data and unlabeled target domain data are given, and the domain difference is caused by feature representations.
Explicit/implicit assumption: there exists a feature transformation $\varphi$ such that, after mapping, the domains look alike, i.e., $P(\varphi(X_S)) \approx P(\varphi(X_T))$ and $P(Y_S \mid \varphi(X_S)) \approx P(Y_T \mid \varphi(X_T))$.
Single-Task Transfer Learning
Feature-based Approaches (cont.)
How to learn the transformation $\varphi$?
 Solution 1: Encode domain knowledge to learn the transformation.
 Solution 2: Learn the transformation by designing objective functions that minimize the domain difference directly.
Single-Task Transfer Learning
Solution 1: Encode domain knowledge to learn the transformation
Revisit the Electronics and Video Games reviews above, treating frequent negations as single tokens (e.g., never_buy). Some sentiment words occur in both domains; these serve as common (pivot) features, while the remaining sentiment words are domain specific.
Single-Task Transfer Learning
Solution 1: Encode domain knowledge to learn the transformation (cont.)
[Figure: domain-specific features, such as compact, sharp, and blurry for Electronics and realistic, hooked, and boring for Video Games, are connected through common pivot features such as good, exciting, and never_buy, which occur in both domains.]
Single-Task Transfer Learning
Solution 1: Encode domain knowledge to learn the transformation (cont.)
 How to select good pivot features is an open problem (see the sketch below). Candidate criteria:
   Mutual information with the labels on source domain labeled data.
   Term frequency on both source and target domain data.
 How to estimate correlations between pivot and domain-specific features?
   Structural Correspondence Learning (SCL) [Blitzer et al., 2006]
   Spectral Feature Alignment (SFA) [Pan et al., 2010]
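A minimal sketch of pivot selection combining the two criteria above; it assumes bag-of-words count matrices and scikit-learn, and the function name and thresholds are illustrative, not taken from SCL or SFA.

```python
# Hypothetical pivot selection: pivots should be frequent in BOTH domains
# and informative about the sentiment label in the labeled source domain.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_pivots(Xs, ys, Xt, vocab, min_df=10, k=100):
    # Keep features that occur often enough in both domains...
    common = ((Xs > 0).sum(axis=0) >= min_df) & ((Xt > 0).sum(axis=0) >= min_df)
    idx = np.flatnonzero(common)
    # ...then rank them by mutual information with the source labels.
    mi = mutual_info_classif(Xs[:, idx], ys, discrete_features=True)
    top = idx[np.argsort(mi)[::-1][:k]]
    return [vocab[i] for i in top]
```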
Single-Task Transfer Learning
Solution 2: Learning the transformation without domain knowledge
[Figure: in the WiFi localization example, both source and target domain data are generated by latent factors: temperature, signal properties, building structure, and the power of access points (APs). Variations in some of these latent factors cause the data distributions of the two domains to differ.]
Single-Task Transfer Learning
Solution 2: Learning the transformation without domain knowledge (cont.)
[Figure: among the latent factors, some (e.g., signal properties, building structure) correspond to principal components that matter for the task, while others are noisy components.]
Single-Task Transfer Learning
Solution 2: Learning the transformation without domain knowledge (cont.)
Learning the transformation by only minimizing the distance between distributions may map the data onto the noisy factors.
Single-Task Transfer Learning
Transfer Component Analysis (TCA) [Pan et al., 2009]
Main idea: the learned transformation $\varphi$ should map the source and target domain data to the latent space spanned by the factors that reduce the domain difference and preserve the original data structure.
High-level optimization problem:
$\min_{\varphi}\ \text{Distance}\big(\varphi(X_S), \varphi(X_T)\big) + \lambda\,\Omega(\varphi) \quad \text{s.t. data properties are preserved.}$
Single-Task Transfer Learning
Maximum Mean Discrepancy (MMD)
The distance between two distributions can be estimated by the distance between the empirical means of the two samples after mapping them into a reproducing kernel Hilbert space $\mathcal{H}$ via a feature map $\phi$:
$\text{MMD}(X_S, X_T) = \Big\| \frac{1}{n_S} \sum_{i=1}^{n_S} \phi(x_{S_i}) - \frac{1}{n_T} \sum_{i=1}^{n_T} \phi(x_{T_i}) \Big\|_{\mathcal{H}}$
[Smola, Gretton, and Fukumizu, ICML-08 tutorial]
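For intuition, a minimal sketch of the (biased) empirical squared MMD with an RBF kernel, computed through the kernel trick; the use of scikit-learn and the bandwidth parameter gamma are illustrative assumptions.

```python
# Biased empirical MMD^2 via the kernel trick:
# MMD^2 = mean(Kss) - 2 * mean(Kst) + mean(Ktt).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2(Xs, Xt, gamma=1.0):
    Kss = rbf_kernel(Xs, Xs, gamma=gamma)   # kernel within the source sample
    Ktt = rbf_kernel(Xt, Xt, gamma=gamma)   # kernel within the target sample
    Kst = rbf_kernel(Xs, Xt, gamma=gamma)   # cross-domain kernel
    return Kss.mean() - 2.0 * Kst.mean() + Ktt.mean()
```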
Single-Task Transfer Learning
Transfer Component Analysis (cont.)
An earlier formulation [Pan et al., 2008] learns the kernel matrix $K$ of the embedded source and target data directly: by the kernel trick, the squared MMD of the embedded data can be written as $\text{tr}(KL)$, where $L_{ij} = \frac{1}{n_S^2}$ if $x_i, x_j \in X_S$, $L_{ij} = \frac{1}{n_T^2}$ if $x_i, x_j \in X_T$, and $L_{ij} = -\frac{1}{n_S n_T}$ otherwise.
The kernel matrix is learned by optimizing three criteria:
 To minimize the distance between domains,
 To maximize the data variance,
 To preserve the local geometric structure.
However [Pan et al., 2008]:
 It is an SDP problem, which is expensive!
 It is transductive and cannot generalize to unseen instances!
 PCA is post-processed on the learned kernel matrix, which may potentially discard useful information.
Single-Task Transfer Learning
Transfer Component Analysis (cont.)
TCA addresses these drawbacks through the empirical kernel map: the embedding is parameterized by a projection matrix $W$, giving the resultant parametric kernel $\tilde{K} = K W W^\top K$, which permits out-of-sample kernel evaluation on unseen instances.
Single-Task Transfer Learning
Transfer Component Analysis (cont.)
$\min_{W}\ \underbrace{\text{tr}(W^\top K L K W)}_{\text{to minimize the distance between domains}}\ +\ \underbrace{\mu\,\text{tr}(W^\top W)}_{\text{regularization on } W} \quad \text{s.t.}\ \underbrace{W^\top K H K W = I}_{\text{to maximize the data variance}}$
where $H$ is the centering matrix. The solution is given by the leading eigenvectors of $(K L K + \mu I)^{-1} K H K$.
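The following compact sketch puts the pieces together under the formulation above; it is an illustrative numpy/scikit-learn implementation, not the authors' reference code, and the kernel choice and hyperparameters are assumptions.

```python
# Sketch of TCA: embed combined source/target data so domains align.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def tca(Xs, Xt, dim=10, mu=1.0, gamma=1.0):
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, gamma=gamma)            # kernel on combined data
    # MMD coefficient matrix L (1/ns^2, 1/nt^2, -1/(ns*nt) blocks).
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    # Leading eigenvectors of (K L K + mu I)^{-1} K H K.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(-vals.real)[:dim]
    W = vecs[:, order].real
    Z = K @ W                                    # embedded source + target data
    return Z[:ns], Z[ns:], W
```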
Inductive Transfer Learning
(Recap: when the source and target tasks are different, we are in Inductive Transfer Learning, which splits into Multi-Task Learning, where tasks are learned simultaneously, and target-task-driven methods, which focus on optimizing a target task.)
Inductive Transfer Learning
Multi-Task Learning
Problem setting: labeled training data are available for several different but related tasks.
Assumption: the tasks are related, so they share some commonality that can be exploited by learning them simultaneously rather than optimizing a single target task in isolation.
Inductive Transfer Learning
 Methods modified from Multi-Task Learning: feature-based and parameter-based transfer learning approaches.
 Target-Task-Driven Transfer Learning Methods, including Self-Taught Learning Methods: instance-based and feature-based transfer learning approaches.
Inductive Transfer Learning
Multi-Task Learning Methods
Setting: labeled training data $\{(x_i^t, y_i^t)\}_{i=1}^{n_t}$ are given for each task $t = 1, \dots, T$ (source or target); the feature-based and parameter-based approaches below are modified from Multi-Task Learning.
Inductive Transfer Learning
Multi-Task Learning Methods
Recall that each task (source or target) can be learned independently:
$\min_{w_t}\ \sum_{i=1}^{n_t} \ell(x_i^t, y_i^t, w_t), \quad t = 1, \dots, T.$
Motivation of Multi-Task Learning:
 Can the related tasks be learned jointly?
 Which kind of commonality can be used across tasks?
Inductive Transfer Learning
Multi-Task Learning Methods
-- Parameter-based approaches
Assumption: if tasks are related, they should share similar parameter vectors.
For example [Evgeniou and Pontil, 2004], each task's parameter vector is decomposed as
$w_t = w_0 + v_t,$
where $w_0$ is a common part shared by all tasks and $v_t$ is a specific part for the individual task.
Inductive Transfer Learning
Multi-Task Learning Methods
-- Parameter-based approaches (cont.)
The common and specific parts are learned jointly by trading off the empirical loss over all tasks against the size of the task-specific deviations:
$\min_{w_0, \{v_t\}}\ \sum_{t=1}^{T} \sum_{i=1}^{n_t} \ell(x_i^t, y_i^t, w_0 + v_t) + \lambda_1 \sum_{t=1}^{T} \|v_t\|^2 + \lambda_2 \|w_0\|^2.$
Inductive Transfer Learning
Multi-Task Learning Methods
-- Parameter-based approaches (summary)
A general framework: jointly minimize the loss over all tasks plus a regularizer that encodes how the task parameter vectors are assumed to be related. [Zhang and Yeung, 2010] [Saha et al., 2010]
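As an illustration of the shared-plus-specific decomposition, here is a minimal sketch for the squared loss, solved by plain gradient descent; it simplifies the regularized formulation above (which [Evgeniou and Pontil, 2004] develop for SVMs), and all names and hyperparameters are illustrative.

```python
# Multi-task least squares with w_t = w0 + v_t, optimized by gradient descent.
import numpy as np

def mtl_shared_specific(tasks, lam1=1.0, lam2=0.1, lr=1e-2, steps=2000):
    """tasks: list of (X_t, y_t) pairs sharing a common feature dimension d."""
    d = tasks[0][0].shape[1]
    w0 = np.zeros(d)                            # common part
    V = np.zeros((len(tasks), d))               # task-specific parts
    for _ in range(steps):
        g0 = 2 * lam2 * w0
        for t, (X, y) in enumerate(tasks):
            r = X @ (w0 + V[t]) - y             # residual of task t
            g = 2 * X.T @ r / len(y)            # squared-loss gradient
            V[t] -= lr * (g + 2 * lam1 * V[t])  # update the specific part
            g0 += g                             # accumulate for the common part
        w0 -= lr * g0
    return w0, V
```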
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches
Assumption:
If tasks are related, they should share some good common features.
Goal:
Learn a low-dimensional representation shared across related tasks.
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches (cont.)
[Argyriou et al., 2007] learn an orthogonal feature transformation $U$ and task parameters $A = [a_1, \dots, a_T]$ jointly:
$\min_{A, U}\ \sum_{t=1}^{T} \sum_{i=1}^{n_t} \ell\big(y_i^t, \langle a_t, U^\top x_i^t \rangle\big) + \gamma\, \|A\|_{2,1}^2,$
where the $(2,1)$-norm, the sum of the $\ell_2$ norms of the rows of $A$, encourages all tasks to rely on a small common set of transformed features.
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches (cont.)
[Figure: illustration of the shared low-dimensional representation; the row-sparsity of $A$ selects a few directions of $U$ that are reused by all tasks.]
Inductive Transfer Learning
Multi-Task Learning Methods
-- Feature-based approaches (cont.)
Related formulations learn a shared predictive structure or subspace across tasks. [Ando and Zhang, 2005] [Ji et al., 2008]
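A hedged sketch of multi-task feature learning in the spirit of [Argyriou et al., 2007] for the squared loss: alternate between per-task generalized ridge solutions under a shared matrix $D$ and a closed-form update of $D$; the epsilon smoothing and hyperparameters are illustrative assumptions.

```python
# Alternating minimization for multi-task feature learning (squared loss).
import numpy as np
from scipy.linalg import sqrtm

def mtl_feature_learning(tasks, gamma=1.0, iters=20, eps=1e-6):
    d = tasks[0][0].shape[1]
    D = np.eye(d) / d                           # shared feature structure
    A = np.zeros((d, len(tasks)))               # per-task parameters
    for _ in range(iters):
        Dinv = np.linalg.pinv(D)
        for t, (X, y) in enumerate(tasks):
            # Per-task generalized ridge: (X^T X + gamma * D^{-1}) a = X^T y
            A[:, t] = np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
        # Closed-form update: D = (A A^T)^{1/2} / tr((A A^T)^{1/2})
        S = np.real(sqrtm(A @ A.T + eps * np.eye(d)))
        D = S / np.trace(S)
    return A, D
```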
Inductive Transfer Learning
Target-Task-Driven Transfer Learning Methods, including Self-Taught Learning Methods: instance-based and feature-based transfer learning approaches.
Inductive Transfer Learning
Self-Taught Learning Methods
-- Feature-based approaches
Motivation: there exist some higher-level features that can help the target learning task even when only a few labeled data are given.
Steps:
1. Learn higher-level features from a large amount of unlabeled data from the source tasks.
2. Use the learned higher-level features to represent the data of the target task.
3. Train models on the new representations of the target task with the corresponding labels.
Inductive Transfer Learning
Self-taught Learning Methods
-- Feature-based approaches (cont.)
Higher-level feature construction:
 Solution 1: Sparse Coding [Raina et al., 2007]
 Solution 2: Deep Learning [Glorot et al., 2011]
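A minimal sketch of the three steps above using sparse coding for step 1; the scikit-learn dictionary learner stands in for the formulation of [Raina et al., 2007], and the function name and hyperparameters are illustrative.

```python
# Self-taught learning: dictionary from unlabeled source data, sparse codes
# as the new representation, then a standard classifier on the target task.
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

def self_taught(X_unlabeled, X_target, y_target, n_atoms=64):
    # Step 1: learn a dictionary of higher-level features ("basis vectors")
    # from plentiful unlabeled source data.
    dico = DictionaryLearning(n_components=n_atoms, alpha=1.0,
                              transform_algorithm='lasso_lars')
    dico.fit(X_unlabeled)
    # Step 2: re-represent the few labeled target examples as sparse codes.
    Z = dico.transform(X_target)
    # Step 3: train a standard classifier on the new representation.
    return LogisticRegression(max_iter=1000).fit(Z, y_target)
```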
Inductive Transfer Learning
Target-Task-Driven Methods
-- Instance-based approaches
Intuition/Assumption: part of the labeled data from the source domain can be reused, after re-weighting, to help learn the target task.
Main idea: TrAdaBoost [Dai et al., 2007]. For each boosting iteration,
 Use the same strategy as AdaBoost to update the weights of the target domain data.
 Use a new mechanism to decrease the weights of misclassified source domain data (see the sketch below).
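A pared-down sketch of the two weight updates for binary labels in {0, 1}; training the weak learner and forming the final ensemble vote are omitted, and the variable names are illustrative. The update rules follow [Dai et al., 2007].

```python
# TrAdaBoost weight updates for one boosting round.
import numpy as np

def tradaboost_weights(err_src, err_tgt, w_src, w_tgt, n_rounds, n_src):
    """err_*: per-instance absolute errors |h(x) - y|; w_*: current weights."""
    # Source-data factor is fixed across rounds.
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_rounds))
    # Target-data factor follows AdaBoost, from the weighted target error.
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)
    beta_tgt = eps / (1.0 - eps)
    # Decrease the weights of misclassified SOURCE instances...
    w_src = w_src * beta_src ** err_src
    # ...and increase the weights of misclassified TARGET instances.
    w_tgt = w_tgt * beta_tgt ** (-err_tgt)
    return w_src, w_tgt
```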
Summary
 Single-Task Transfer Learning (tasks identical):
   Instance-based Transfer Learning Approaches
   Feature-based Transfer Learning Approaches
 Inductive Transfer Learning (tasks different):
   Instance-based Transfer Learning Approaches
   Feature-based Transfer Learning Approaches
   Parameter-based Transfer Learning Approaches
Some Research Issues
 How to avoid negative transfer? Given a target domain/task, how to find source domains/tasks that ensure positive transfer?
 Transfer learning meets active learning.
 Given a specific application, which kind of transfer learning method should be used?
Reference
 [Thorndike and Woodworth, The Influence of Improvement in One Mental Function upon the Efficiency of Other Functions, 1901]
 [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
 [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2009]
 [Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
 [Blitzer et al., Domain Adaptation with Structural Correspondence Learning, EMNLP 2006]
 [Pan et al., Cross-Domain Sentiment Classification via Spectral Feature Alignment, WWW 2010]
 [Pan et al., Transfer Learning via Dimensionality Reduction, AAAI 2008]
Reference (cont.)
 [Pan et al., Domain Adaptation via Transfer Component Analysis, IJCAI 2009]
 [Evgeniou and Pontil, Regularized Multi-Task Learning, KDD 2004]
 [Zhang and Yeung, A Convex Formulation for Learning Task Relationships in Multi-Task Learning, UAI 2010]
 [Saha et al., Learning Multiple Tasks using Manifold Regularization, NIPS 2010]
 [Argyriou et al., Multi-Task Feature Learning, NIPS 2007]
 [Ando and Zhang, A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, JMLR 2005]
 [Ji et al., Extracting Shared Subspace for Multi-label Classification, KDD 2008]
Reference (cont.)
 [Raina et al., Self-taught Learning: Transfer Learning from Unlabeled Data, ICML 2007]
 [Dai et al., Boosting for Transfer Learning, ICML 2007]
 [Glorot et al., Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, ICML 2011]
Thank You