1
On Transfer Learning Techniques for Machine Learning
Assistive Robotics Technology Laboratory
School of Electrical and Computer Engineering
Purdue University, West Lafayette, IN, USA
Debasmit Das
Advisory Committee
C. S. George Lee (Chair)
Stanley Chan
Guang Lin
Guang Cheng
2
Current ML Methods
[Canziani et al. ISCAS’17]
Evolution of Deep Architectures
INTRO
• Focus only on recognition performance.
• Highly resource-intensive, requiring large amounts of labeled data, compute, memory, and energy.
• Cannot be deployed in resource-constrained environments, e.g., mobile devices or annotation-free novel environments.
• Most of these models are closed-set.
• Need efficient machine learning techniques.
3
Efficient ML Goals INTRO
Goal 1: Use less training data and fewer labels.
Effects
• Less training time.
• Less memory footprint (for data storage).
• Overall, less energy consumed.
Goal 2: Produce models with fewer parameters.
Effects
• Less inference time.
• Less memory footprint (for model storage).
• Overall, less energy consumed.
• Data-efficient models imply model-efficient models, but not the other way around.
Without sacrificing recognition performance.
4
Learning with Fewer Labels INTRO
Data-efficient
Learning
Transfer
Learning
Self-supervised
Learning
Generative
Learning
Transfer Learning: learns to transfer knowledge from a data-abundant source domain to a sparsely labeled target domain, e.g., domain adaptation, few-shot learning.
Generative Learning: learns generative models that synthesize data from few labeled and unlabeled examples, e.g., GANs, VAEs.
Self-supervised Learning: defines a surrogate task on unlabeled data to learn features useful for a different task, e.g., predicting rotation or relative location.
• Transfer Learning is similar to the way humans learn from experience and apply it to new situations.
(Focus of my thesis)
5
Transfer Learning (TL)
• Allows pre-trained machine learning models to
be adapted and applied to label-starved new
tasks and new domains.
• New tasks can be novel categories.
• New domain can be a novel variety of the same
category.
• Automatic Annotation : Reduces human effort
of labeling new domains/tasks.
• Faster Learning : Learning novel tasks from
less data prevents long training time.
• Data Efficiency : In some domains, obtaining
data is cumbersome. E.g. Medical tests, Robotics.
Added Benefits
INTRO
6
Transfer Learning Tasks
Transfer Learning
• Domain Adaptation (same categories)
  - Unsupervised Domain Adaptation (UDA): target domain fully unlabeled.
  - Semi-supervised Domain Adaptation (SSDA): target domain sparsely labeled. Not included in this thesis.
• Small Sample Learning (different categories)
  - Few-shot Learning (FSL): target domain sparsely labeled.
  - Zero-shot Learning (ZSL): target domain fully unlabeled.
  - Hypothesis Transfer Learning (HTL): only source prototypes/models available.
(Diagram annotations: UDA, FSL, and embedding-based ZSL were completed before the prelim; generative ZSL and HTL were completed after the prelim.)
INTRO
7
Training and Testing conditions
UDA
• Distribution discrepancy between training and testing conditions.
• Testing data unlabeled but same categories as training.
HTL
• Base categories used as training and novel categories used for testing.
• Base (novel) categories contain models/prototypes (few labeled data).
FSL
• Base categories used as training and novel categories used for testing.
• Base categories contain abundant labeled data; novel categories contain few labeled data.
ZSL
• Base categories used as training and novel categories used for testing.
• Base (novel) categories contain abundant labeled (unlabeled) data; class-level semantic information available.
INTRO
8
Proposed Approach
Structural Priors: Graphs/Hyper-graphs, Manifolds, Neural Networks
• Structural priors are constructed from source-domain data.
• These priors learn a structure, i.e., an encoding between different data entities.
• Structural priors extract relational information and enable better transfer learning.
Problem | Structure | Entity
Unsupervised Domain Adaptation (UDA) | Graphs and Hyper-graphs | Sample - Sample
Few-Shot Learning (FSL) | Neural Network | Sample - Class Prototype
Hypothesis Transfer Learning (HTL) | Manifold | Class Prototype - Class Prototype
Zero-Shot Learning (ZSL) | Neural Network | Sample - Semantics
INTRO
9
Publications
Unsupervised Domain Adaptation
[J1] Debasmit Das and C. S. George Lee. "Sample-to-sample correspondence for unsupervised domain adaptation." Engineering Applications of Artificial Intelligence (EAAI), vol. 73, pp. 80-91, 2018.
[C1] Debasmit Das and C. S. George Lee. "Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation." Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2018, pp. 342-352.
[C2] Debasmit Das and C. S. George Lee. "Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching." Proceedings of the IEEE International Conference on Image Processing (ICIP), 2018, pp. 3758-3762.
Few-Shot Learning
[J2] Debasmit Das and C. S. George Lee. "A Two-Stage Approach to Few-Shot Learning for Image Recognition." IEEE Transactions on Image Processing (TIP), 2020.
Zero-Shot Learning
[J3] Debasmit Das and C. S. George Lee. "A Constrained Generative Approach to Zero-shot Object Recognition." Under review at IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
[C3] Debasmit Das and C. S. George Lee. "Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibration." Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2019.
Hypothesis Transfer Learning
[P1] Debasmit Das, J. H. Moon, and C. S. George Lee. "Parametric and Non-parametric Approach to Few-shot Learning." To be submitted.
INTRO
10
Approach Overview
Method | Problem | Main Concept
J1 [EAAI] | UDA | Graph Matching
C1 [ICANN] | UDA | Deep Graph Matching
C2 [ICIP] | UDA | Hyper-graph Matching
J2 [TIP] | FSL | Predictive Statistics
P1 | HTL | Manifold Projection
J3 [TNNLS] | ZSL | Constrained Generation
C3 [IJCNN] | ZSL | Constrained Embedding
INTRO
11
Unsupervised Domain Adaptation (UDA)
OBJECTIVE
• Minimize domain discrepancy.
• Labeled source and unlabeled target.
• Transform source to target.
MOTIVATION
• Local information more useful.
• Structural information preserved across domains.
• Pseudo-labels refine classifiers.
APPROACH
• Graph matching to minimize domain discrepancy.
• Create a maximum-margin classifier.
RESULTS
• Better but slower than global methods.
• Representation learning slower but better.
• Third-order matching better than second-order matching.
(Figure: class-wise samples matched across the source and target domains.)
INTRO
12
Few-Shot Learning (FSL)
OBJECTIVE
• Estimate novel class prototypes.
• Labeled source and sparsely labeled target.
MOTIVATION
• Curse of dimensionality causes over-fitting.
• Ill-sampling produces incorrect prototype estimation.
• Uncertain variance causes misclassification.
APPROACH
• Extract a prior from the source.
RESULTS
• Competitive results with respect to previous work.
• Relative feature extractor most effective.
• More discriminative feature space.
INTRO
13
Hypothesis Transfer Learning (HTL)
OBJECTIVE
• Estimate novel class prototypes.
• Source prototypes and sparsely labeled target.
MOTIVATION
• A neural-network prior over-fits to little data.
APPROACH
• Extract a prior from the source: use a non-parametric method based on manifolds, or a simple parametric method such as Bayes.
• Base prototypes used for manifold construction, or used to form the prior.
• Novel samples projected onto the manifold, or used for the likelihood calculation.
RESULTS
• Manifold approach better than Bayesian approach.
• Both approaches most effective in the few-class regime.
• Closed-form Bayes method better than approximation methods.
(Figure: base-class prototypes, an unknown novel-class prototype, and novel-class samples; transferable vs. novel knowledge.)
INTRO
14
Zero-Shot Learning (ZSL)
OBJECTIVE
• Recognize novel categories with zero labeled data.
• Labeled source and unlabeled target.
• Relate features and semantics.
MOTIVATION
• Nearest-neighbor predictions produce hubs.
• Domain shift between predictions and ground truth.
• Predictions biased towards seen classes.
APPROACH
• Embed or generate between the feature space and the semantic space; adapt generated/embedded data to the unlabeled test data.
RESULTS
• Generative approach performs better than embedding approach.
• Domain adaptation is the most effective component.
• Structural matching improves generalization.
• Discrimination between seen and unseen classes prevents bias.
INTRO
15
Impact of proposed approaches
UDA [J1, C1, C2]
Core Idea: Match distributions using graphs/hyper-graphs.
Impact: Generative models, anomaly detection.
FSL [J2]
Core Idea: Discriminative low-dimensional space and generating statistics.
Impact: Discriminative/generative learning.
HTL [P1]
Core Idea: Estimate novel class prototypes.
Impact: Manifold distance metric.
ZSL [J3, C3]
Core Idea: Adaptive matching between features and semantics.
Impact: Media retrieval, description generation.
INTRO
16
BEFORE PRELIM
17
Graph Matching UDA
• Previous UDA methods use global information to minimize domain discrepancy.
• Local information is useful but can cause ambiguous one-to-one matching.
• Therefore, structural matching in the form of higher-order information is proposed.
Motivation
Source Domain Class 1 sample
Source Domain Class 2 sample
Target domain sample
UDA
18
Proposed Approach
Matching Formulation
Method 1
• Set
• Uses Conditional
Gradient Descent
+ Network
Simplex for
optimization.
Method 2
• Initial preprocessing to
obtain exemplars.
• Uses Conditional
Gradient Descent
+ ADMM for
optimization.
Method 3
• Set
• Learn features as
well as matching.
• Optimization
using stochastic
gradient descent.
• Second refining stage to
obtain maximum margin
classifier.
UDA
Source Domain
Target Domain
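For concreteness, a minimal sketch of this kind of matching is given below. It is not the exact formulation or optimizer of [J1]/[C1]/[C2] (which use conditional gradient descent with a network-simplex or ADMM sub-solver, or learn features jointly); instead it combines a first-order feature cost with a second-order structural term and optimizes a soft correspondence matrix with a plain Frank-Wolfe loop, assuming equally sized source and target sets. All function and variable names are illustrative.

```python
# Illustrative graph-matching UDA sketch (not the thesis formulation):
# find a soft source-to-target correspondence P that trades off a
# first-order feature cost against a second-order structural term.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def graph_matching_uda(Xs, Xt, lam=1.0, n_iter=50):
    """Return a doubly-stochastic correspondence matrix P (source x target)."""
    C = cdist(Xs, Xt, metric="sqeuclidean")           # first-order cost
    As = np.exp(-cdist(Xs, Xs, "sqeuclidean"))        # source graph affinity
    At = np.exp(-cdist(Xt, Xt, "sqeuclidean"))        # target graph affinity
    n = Xs.shape[0]
    P = np.full((n, n), 1.0 / n)                      # uniform initialization

    for k in range(n_iter):
        R = As @ P - P @ At                           # structural residual
        grad = C + 2.0 * lam * (As.T @ R - R @ At.T)  # gradient of the objective
        # Linear-minimization oracle over the Birkhoff polytope: a permutation.
        rows, cols = linear_sum_assignment(grad)
        Q = np.zeros_like(P)
        Q[rows, cols] = 1.0
        gamma = 2.0 / (k + 2.0)                       # standard Frank-Wolfe step
        P = (1.0 - gamma) * P + gamma * Q
    return P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(20, 5))                     # toy source features
    Xt = Xs + 0.1 * rng.normal(size=(20, 5))          # shifted target features
    P = graph_matching_uda(Xs, Xt)
    Xs_transformed = P @ Xt                           # transport source onto target
    print(P.sum(axis=1))                              # rows are (approximately) 1
```

After matching, the transported source samples can be used to train a classifier for the target domain, which is where the second, maximum-margin refinement stage of the approach would come in.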
19
Experimental Results
Ablation Studies
Without Adaptation
With Graph Matching
With Graph Matching
& Pseudo-labeling
UDA
Dataset: Office-Caltech
20
Two Stage FSL
• Curse of dimensionality: Addressed by using
a new representation that uses relative distances
between features.
• Ill-sampling of data: The novel class prototype is
estimated by learning a model that predicts the
mean.
• Uncertain Class Variance: The novel class
variance is estimated by learning a model that
predicts the variance.
Motivation
Feature Space
FSL
21
Proposed Approach
Relative Feature
Probability with absolute & relative features
Loss Function
FSL
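A toy sketch of the two-stage idea is shown below. The learned mean and variance predictors of [J2] are replaced here by simple sample estimates, and the relative feature is taken to be the vector of distances to the base-class prototypes; the exact relative-feature definition, probability model, and loss in [J2] may differ. Names and dimensions are illustrative.

```python
# Illustrative two-stage FSL sketch: augment absolute features with
# "relative" features (distances to base-class prototypes), estimate a
# prototype and variance per novel class, and classify queries with a
# variance-scaled distance.
import numpy as np


def relative_features(X, base_prototypes):
    """Concatenate absolute features with distances to base-class prototypes."""
    d = np.linalg.norm(X[:, None, :] - base_prototypes[None, :, :], axis=-1)
    return np.concatenate([X, d], axis=1)


def estimate_class_stats(support):
    """Stand-ins for the learned mean/variance predictors: sample estimates."""
    mu = support.mean(axis=0)
    var = support.var(axis=0) + 1e-3        # small floor avoids divide-by-zero
    return mu, var


def classify(query, class_stats):
    """Assign each query to the class with the smallest variance-scaled distance."""
    scores = [((query - mu) ** 2 / var).sum(axis=1) for mu, var in class_stats]
    return np.argmin(np.stack(scores, axis=1), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base_prototypes = rng.normal(size=(10, 8))             # 10 base classes
    centers = rng.normal(size=(2, 8))                      # 2 novel classes
    supports = [c + 0.2 * rng.normal(size=(5, 8)) for c in centers]   # 5 shots
    queries = np.vstack([c + 0.2 * rng.normal(size=(20, 8)) for c in centers])

    stats = [estimate_class_stats(relative_features(s, base_prototypes))
             for s in supports]
    preds = classify(relative_features(queries, base_prototypes), stats)
    print(preds)                                            # mostly 0s then 1s
```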
22
Experimental Results
Ablation Study
PN – Prototypical Network V – Variance Estimator R – Relative Feature T – Category-agnostic Transformer
Feature visualization
without (left) and
with (right) relative
features
Dataset: MiniImageNet
Performance change with no. of base
categories
FSL
23
Embedding-based ZSL
Feature Space
Semantic Space
• Hubness problem: addressed using pairwise structural matching.
• Domain shift problem: addressed using our sample-to-sample correspondence approach.
• Seen-class bias problem: addressed using a scaled calibration mechanism.
Motivation
ZSL
Each semantic vector of a class is a
histogram of attributes.
24
Proposed Approach
Relational Matching
Calibration
Minimize loss using gradient descent
Seen
Unseen
Total
We use sample-to-sample
matching for domain
adaptation
ZSL
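The calibration step can be illustrated with the following sketch of the general idea (the exact scaled-calibration rule and its hyper-parameters in [C3] may differ): scores of seen classes are penalized by a calibration factor before taking the arg-max over the union of seen and unseen classes.

```python
# Illustrative calibration against seen-class bias.
import numpy as np


def calibrated_predict(scores, seen_mask, gamma=0.5):
    """scores: (n_samples, n_classes); seen_mask: boolean flag per class."""
    adjusted = scores.copy()
    adjusted[:, seen_mask] -= gamma        # down-weight seen-class scores
    return adjusted.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random((4, 6))            # toy compatibility scores
    seen_mask = np.array([True, True, True, False, False, False])
    print(calibrated_predict(scores, seen_mask, gamma=0.3))
```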
25
Experimental Results
Hubness Measurement
Hubness measured using skewness of NN
prediction distribution
Effect of the
calibration factor
Effect of the
structural matching
weight
Without Domain Adaptation
With Domain Adaptation
Unseen Features
Seen Features
Unseen Semantic Embedding
Seen Semantic Embedding
ZSL
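The hubness measure mentioned above can be sketched as follows: count how often each class embedding is the nearest neighbour of a test feature and take the skewness of that occurrence distribution; a large positive skew indicates that a few "hub" classes attract most predictions. The toy data and names below are illustrative.

```python
# Illustrative hubness measurement via skewness of the NN-occurrence counts.
import numpy as np
from scipy.stats import skew
from scipy.spatial.distance import cdist


def hubness(test_features, class_embeddings):
    nn = cdist(test_features, class_embeddings).argmin(axis=1)    # NN class per sample
    counts = np.bincount(nn, minlength=class_embeddings.shape[0]) # occurrences per class
    return skew(counts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 16))
    classes = rng.normal(size=(20, 16))
    print(hubness(feats, classes))
```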
26
AFTER PRELIM
27
Generative ZSL
Feature Space
Semantic Space
• Base Categories (source domain) contain
abundant labeled data.
• Novel Categories (target domain) contain
unlabeled data.
• However, class level semantic information
available for all categories.
• Need to relate the feature space and the semantic space.
ZSL
Each semantic vector of a class is a
histogram of attributes.
28
Motivation ZSL
• The relation between the semantic and feature spaces is biased towards seen classes because there is no training data for unseen classes.
• As a result, generative methods have been proposed to generate data for unseen classes.
• However, the generative model itself may be biased towards seen classes.
• This is because no labeled data from unseen classes is used for learning the generative model.
• Need to constrain the generation process such that seen and unseen classes are distinguished from one another.
• Need to also close a cycle between the semantic and feature spaces to preserve semantic consistency.
Base class semantic
embedding
Novel class semantic
embedding
Novel class
test sample
29
Constrained Training
• A discriminator is used to distinguish synthetic unseen-class data from real seen-class data.
• The generated features are also reconstructed back to their corresponding semantic descriptor.
(Figure: a generator takes noise and class semantics and produces fake features; a critic C scores real vs. fake features; a reconstructor R maps generated features back to semantics under a reconstruction loss; a discriminator D separates seen-class real features from unseen-class fake features.)
A simplified sketch of this training loop follows below.
ZSL
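The sketch below illustrates one possible form of the constrained training loop. It is a simplified stand-in, not the exact architecture, losses, or hyper-parameters of [J3]: the WGAN gradient penalty is omitted, the networks are small MLPs, and the sign and weighting of the seen/unseen-discriminator term are assumptions.

```python
# Illustrative constrained generative ZSL training step (simplified).
import torch
import torch.nn as nn

FEAT, SEM, NOISE = 64, 16, 16

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(), nn.Linear(128, n_out))

G = mlp(NOISE + SEM, FEAT)            # generator: (noise, semantics) -> feature
C = mlp(FEAT + SEM, 1)                # critic: conditional real/fake score
R = mlp(FEAT, SEM)                    # reconstructor: feature -> semantics
D = mlp(FEAT, 1)                      # seen vs. unseen discriminator

opt_G = torch.optim.Adam(list(G.parameters()) + list(R.parameters()), lr=1e-4)
opt_C = torch.optim.Adam(C.parameters(), lr=1e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x_seen, s_seen, s_unseen, lam_rec=1.0, lam_dis=0.1):
    """One simplified update; x_seen: real seen features, s_*: class semantics."""
    z = torch.randn(x_seen.size(0), NOISE)
    x_fake_seen = G(torch.cat([z, s_seen], dim=1))
    x_fake_unseen = G(torch.cat([z, s_unseen], dim=1))

    # Critic update (WGAN-style objective, gradient penalty omitted for brevity).
    loss_C = C(torch.cat([x_fake_seen.detach(), s_seen], 1)).mean() \
           - C(torch.cat([x_seen, s_seen], 1)).mean()
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()

    # Discriminator update: seen real features (label 1) vs. generated unseen (label 0).
    logits = torch.cat([D(x_seen), D(x_fake_unseen.detach())])
    labels = torch.cat([torch.ones(x_seen.size(0), 1), torch.zeros(x_seen.size(0), 1)])
    loss_D = bce(logits, labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator/reconstructor update: fool the critic, reconstruct semantics,
    # and keep generated unseen features separable from seen features
    # (cooperative term; its sign/weighting is an assumption in this sketch).
    loss_G = -C(torch.cat([x_fake_seen, s_seen], 1)).mean() \
           + lam_rec * ((R(x_fake_seen) - s_seen) ** 2).mean() \
           + lam_dis * bce(D(x_fake_unseen), torch.zeros(x_seen.size(0), 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

if __name__ == "__main__":
    B = 8
    train_step(torch.randn(B, FEAT), torch.randn(B, SEM), torch.randn(B, SEM))
```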
30
Selective Domain Adaptation
• The discriminator is used to separate the test data into seen and unseen, from which the unseen samples are selected.
• The generated data of the unseen classes are adapted with respect to the selected unseen test data.
(Figure: the discriminator D splits the unlabeled test data into unlabeled seen and unlabeled unseen test data using a threshold; the generated unseen data are aligned to the selected unseen test data via sample-to-sample domain adaptation; the generator takes unseen-class semantics, and reconstructed semantics provide a matching regularization.)
A minimal sketch of the selection step follows below.
ZSL
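A minimal sketch of the selection step; the threshold value and the convention that the discriminator outputs a "probability of being seen" are assumptions.

```python
# Illustrative seen/unseen split of unlabeled test data by thresholding D.
import numpy as np

def select_unseen(test_features, d_score, threshold=0.5):
    """d_score maps a batch of features to a 'probability of being seen'."""
    seen_prob = d_score(test_features)
    unseen = test_features[seen_prob < threshold]      # kept for domain adaptation
    seen = test_features[seen_prob >= threshold]       # not adapted
    return unseen, seen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 64))
    toy_score = lambda x: 1.0 / (1.0 + np.exp(-x[:, 0]))   # stand-in for D
    unseen, seen = select_unseen(feats, toy_score)
    print(len(unseen), len(seen))
```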
31
Comparative Analysis
tr – Unseen class accuracy in traditional setting
u – Unseen class accuracy in generalized setting
s – Seen class accuracy in generalized setting
H – Harmonic mean of u and s
• Animals with Attributes (AwA)
[Lampert et al. TPAMI’14]
(Att – 85, Ysrc - 40 , Ytar - 10 )
• Pascal & Yahoo (aPY)
[Farhadi et al. CVPR’09]
(Att – 64, Ysrc - 20 , Ytar - 12 )
• Caltech-UCSD Birds (CUB)
[Welinder et al. ‘10]
(Att – 312, Ysrc - 150 , Ytar - 50 )
• Scene Understanding (SUN)
[Patterson et al. CVPR’12]
(Att – 102, Ysrc - 645, Ytar - 72 )
• Flowers Dataset (FLO)
[Nilsback et al. ICVGIP’08]
(Att – 1024, Ysrc - 82, Ytar - 20 )
Datasets
ZSL
32
Further Analyses ZSL
Generated novel class
features without
domain adaptation
on the AwA dataset
Generated novel class
features with
domain adaptation
on the AwA dataset
Convergence study
with increasing epochs
on AWA dataset
Ablation study
B - WGAN Baseline
R - Reconstructor
D - Discriminator
A – Domain Adaptation
33
Sensitivity Studies
Sensitivity to threshold
Sensitivity to number of generated
features on FLO dataset
ZSL
34
Conclusion
• Results on standard image recognition datasets are better than most of the previous generative and non-generative approaches.
• Ablation studies show that all of the contributions are important, but domain adaptation is the most effective.
• Results on fine-grained datasets show that there is a lot of scope for improvement.
• Need to incorporate fine-grained learning architectures into our framework.
• Can explore using the same set of architectural constraints on other generative models such as variational auto-encoders, normalizing flows, etc.
ZSL
35
Hypothesis Transfer Learning (HTL)
Feature Space
• No access to base-category (source-domain) data.
• Only high-level information about the source categories is available, e.g., model parameters, class prototypes.
• Novel categories (target domain) contain sparsely labeled data.
• Need to estimate the location of the novel class parameters.
HTL
36
Motivation HTL
Relatively unexplored topic. Prior work constrains the target model to be some combination of source models:
• Linear Combination [Tommasi et al. TPAMI'14]
• Non-Linear Combination [Jie et al. ICCV'11]
• Feature Selection [Kuzborskij et al. CVIU'17]
(Figure: Source Models 1-3 combined into a Target Model.)
• Source models are not consistent and do not provide a reliable benchmark for comparison.
• Source class prototypes can provide a reliable benchmark because they depend only on the data.
(Figure: Source Prototypes 1-3 and the Target Prototype.)
37
Proposed Solution HTL
• The limited information from the source cannot be used for training neural networks because of over-fitting.
• Need a non-parametric method, or a parametric method with minimal parameters.
Manifold Approach
• Non-parametric approach.
• Inspired by the assumption that class data lie on a subspace [Basri et al. TPAMI'03].
• Construct a manifold from the source prototypes.
• Project novel-class samples onto it to obtain the novel class prototype.
Bayesian Approach
• Parametric approach.
• Inspired by the assumption that class data belong to a model family.
• Construct a prior distribution from the source prototypes and a likelihood from the novel-class samples.
• The posterior distribution is used to obtain the novel class prototype.
38
Manifold Approach HTL
Estimate novel class prototypes (M1)
• Estimate the novel class prototype as the manifold mean; a closed-form solution exists.
Predict using an absorbing Markov chain (M2)
• Choose novel classes as transient and base classes as absorbing; obtain the most probable base class.
• Choose base classes as transient and novel classes as absorbing; obtain the most probable novel class.
• Between the most probable base class and the most probable novel class, use the nearest neighbor to decide.
A simplified sketch of the absorption-probability computation follows below.
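The sketch below computes absorption probabilities for a simplified version of M2: transient states are test samples and absorbing states are class prototypes, with transitions built from a Gaussian affinity (an assumption here); the transient/absorbing role swap and the final nearest-neighbour tie-break are omitted.

```python
# Illustrative absorbing-Markov-chain prediction: B = (I - Q)^{-1} R.
import numpy as np
from scipy.spatial.distance import cdist


def absorption_probs(transient_pts, absorbing_pts, sigma=1.0):
    """Rows: transient states; columns: probability of absorption per absorbing state."""
    pts = np.vstack([transient_pts, absorbing_pts])
    W = np.exp(-cdist(transient_pts, pts, "sqeuclidean") / (2 * sigma ** 2))
    P = W / W.sum(axis=1, keepdims=True)            # row-stochastic transitions
    n_t = transient_pts.shape[0]
    Q, R = P[:, :n_t], P[:, n_t:]                   # canonical form [Q | R]
    return np.linalg.solve(np.eye(n_t) - Q, R)      # absorption probabilities


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prototypes = rng.normal(size=(5, 8))            # absorbing: class prototypes
    sample = prototypes[2] + 0.1 * rng.normal(size=(1, 8))    # transient: test sample
    print(absorption_probs(sample, prototypes).argmax())       # most probable class, likely 2
```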
39
Bayesian Approach HTL
The prior distribution is varied, but the likelihood is fixed.
Case 1 (B1): Normal prior on the mean. The variance is fixed but obtained heuristically from the source prototypes. The posterior density is normal, with a closed-form mean.
Case 2 (B2, B3): Normal-Gamma prior on the mean and precision. The posterior density is normal-gamma, with a closed-form mode. The Gamma prior reduces to a uniform prior in a limiting case.
Case 3 (B4): Normal prior on the mean and Gamma prior on the precision. The posterior distribution is not closed-form and requires a variational-Bayes approximation.
A minimal sketch of the Case 1 update follows below.
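The Case 1 (B1) update can be sketched as a standard conjugate normal-normal update; the particular heuristics below for the prior mean and variance (the mean and spread of the source prototypes) are assumptions, not necessarily those of [P1].

```python
# Illustrative conjugate normal-normal estimate of a novel class prototype.
import numpy as np


def posterior_prototype(source_prototypes, novel_samples, obs_var=1.0):
    mu0 = source_prototypes.mean(axis=0)             # prior mean from source prototypes
    var0 = source_prototypes.var(axis=0) + 1e-6      # prior variance (heuristic)
    n = novel_samples.shape[0]
    xbar = novel_samples.mean(axis=0)
    # Per-dimension conjugate update:
    # mu_post = (mu0/var0 + n*xbar/obs_var) / (1/var0 + n/obs_var)
    return (mu0 / var0 + n * xbar / obs_var) / (1.0 / var0 + n / obs_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protos = rng.normal(size=(10, 8))                # base-class prototypes
    novel = 2.0 + 0.3 * rng.normal(size=(3, 8))      # 3-shot novel class
    print(posterior_prototype(protos, novel))        # shrinks the sample mean toward mu0
```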
40
Experimental Results HTL
Recognition performance on ImageNet
as shots are varied
Recognition performance on CUB-200
as shots are varied
Recognition performance on ImageNet
as total no. of classes are varied
Recognition performance on ImageNet
as fraction of base classes are varied
41
Further Analyses HTL
For Model B4
42
Conclusion
• The manifold approach performs better than the Bayesian approach in the few-shot and few-class regimes.
• Both approaches are the most effective in the few-class regime, that is, when the number of base classes is small.
• Closed-form Bayes methods perform better than approximation methods.
• The Markov-chain-based distance has an incremental effect and can be used as a metric during the training stage.
• Explore other priors and hyper-priors for the Bayesian model.
HTL
43
Research Summary
MOTIVATION
• Current ML methods consume lots of resources.
• Goal is to make ML more data-efficient.
• TL simulates human learning by reusing models.
APPROACH
• Structural priors: graphs/hyper-graphs for UDA, neural networks for FSL and ZSL, manifolds for HTL, relating samples, prototypes, and semantics.
• Additional constraints and post-processing.
RESULTS
• Competitive when compared with previous work.
• Ablation studies show the importance of each component.
• Need to improve on fine-grained datasets.
44
Limitations of Current Work
• Abundant Source Data: the proposed transfer learning approaches mostly require abundant labeled data from the source domain.
• Batched Target: target-domain data are available as a batch instead of in a sequential, incremental manner.
• Same Modalities: the source and target domains share the same feature space and cannot have different modalities.
• Hence, there is a need to tackle more realistic transfer learning settings.
45
Future Direction
Unsupervised Transfer Learning (Source → Target)
• Use self-supervised learning.
• Clustering for pre-processing.
Sequential Transfer Learning (Source → Target 1 → Target 2 → Target 3)
• Need to develop incremental algorithms.
• Correspondence can be time-dependent.
Heterogeneous Transfer Learning (Source → Target, different modalities)
• Need a common subspace for the two domains.
• Metric minimization is required for the subspace.
46
THANK YOU
Any Questions ?
