1
Zero-shot Image Recognition Using Relational
Matching, Adaptation and Calibration
Debasmit Das, C.S. George Lee
Assistive Robotics Technology Laboratory
School of Electrical and Computer Engineering
Purdue University, West Lafayette, IN, USA
Funding Source: National Science Foundation (IIS-1813935), NVIDIA Hardware Grant
2
Outline
• INTRODUCTION
- Problem Description.
- Previous Work.
- Challenges.
• PROPOSED APPROACH
- Relational Matching.
- Domain Adaptation.
- Scaled Calibration.
• EXPERIMENTAL RESULTS
- Comparative studies.
- Parameter sensitivity, convergence and hubness analyses.
3
Introduction: Zero-Shot Learning (ZSL)
[Figure: mapping between the feature space and the semantic space]
• Base categories (source domain) contain abundant labeled data.
• Novel categories (target domain) contain unlabeled data.
• However, class-level semantic information is available for all categories.
• Goal: find the relationship between the feature space and the semantic space.
[Figure: example with source-domain (base) and target-domain (novel) categories]
4
Introduction: Related Work on ZSL
Zero-shot learning approaches:
• Embedding methods [relate features & semantics]
  - Linear embedding [Bernardino et al., ICML'15]
  - Deep embedding [Zhang et al., CVPR'17]
• Transductive approaches [use unlabeled test data]
  - Multiview [Fu et al., TPAMI'15]
  - Dictionary learning [Kodirov et al., ICCV'15]
• Generative approaches [generate data]
  - Constrained VAE [Verma et al., CVPR'18]
  - Feature GAN [Xian et al., CVPR'18]
• Hybrid approaches [novel class from old classes]
  - Semantic similarity [Zhang et al., CVPR'15]
  - Convex combination [Norouzi et al., ICLR'13]
5
Introduction: Challenges of ZSL

Hubness
• Phenomenon where only a few candidates become nearest-neighbor predictions.
• Caused by the curse of dimensionality.
• Initially studied by Radovanovic et al., JMLR'10.

Domain Shift
• Domain shift between the unseen test data and the unseen semantic embeddings.
• Arises because the unseen test data are not used during training.

Seen-Class Biasedness
• In the GZSL setting, test data can come from both seen and unseen categories.
• Most unseen test data get predicted as seen categories.
• Initially studied by Chao et al., ECCV'16.
6
Proposed Approach: Proposed Solution

One-to-one and pairwise regression (addresses hubness)
• Structural matching between semantics and features.
• Implicit reduction of dimensionality.

Domain Adaptation (addresses domain shift)
• Need to adapt the semantic embeddings to the unseen test data.
• Uses our previous DA approach [Das & Lee, EAAI'18].
• Finds correspondences between semantic embeddings and unseen test samples.

Calibration (addresses seen-class biasedness)
• Scaled calibration reduces the scores of seen classes.
• Implicitly reduces the variance of the seen classes.
7
Proposed Approach: Proposed Framework
[Figure: block diagram of the overall framework with relational matching, domain adaptation and scaled calibration]
8
Proposed Approach: Relational Matching
• First, match each seen sample to its corresponding semantic embedding.
• Second, match the structure (pairwise-distance matrix) of the seen class prototypes to that of the semantic embeddings (a code sketch follows below).
[Equations: one-to-one regression loss and pairwise regression loss, minimized with gradient descent]
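The two regression terms can be combined into one objective and minimized with gradient descent, as the slide indicates. The following is a minimal PyTorch sketch, not the paper's exact formulation: the linear map `W` from semantic to feature space, the squared-distance form of the structural term, and the names `lam` and `relational_matching` are assumptions for illustration.

```python
import torch

def pairwise_sqdist(Z):
    # Squared pairwise Euclidean distances (smooth everywhere, so gradients are safe).
    sq = (Z * Z).sum(dim=1)
    return sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.t())

def relational_matching(S, X_samples, y, X_proto, lam=0.1, lr=1e-3, epochs=500):
    """
    S         : (C, a) semantic embeddings of the C seen classes
    X_samples : (N, d) seen training features; y : (N,) class index of each sample
    X_proto   : (C, d) seen-class prototypes (e.g., mean feature per class)
    lam       : weight of the structural (pairwise) term (illustrative name)
    Returns W : (a, d) linear map from semantic space to feature space
    """
    a, d = S.shape[1], X_proto.shape[1]
    W = (0.01 * torch.randn(a, d)).requires_grad_()
    opt = torch.optim.Adam([W], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # One-to-one regression: each seen sample should sit near the projection
        # of its own class's semantic embedding.
        loss_one = ((S[y] @ W - X_samples) ** 2).sum(dim=1).mean()
        # Pairwise regression: the distance structure of the projected embeddings
        # should match the distance structure of the seen prototypes.
        loss_pair = ((pairwise_sqdist(S @ W) - pairwise_sqdist(X_proto)) ** 2).mean()
        (loss_one + lam * loss_pair).backward()
        opt.step()
    return W.detach()
```

At test time, semantic embeddings are projected with the learned W and classification reduces to nearest-neighbor search in feature space.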
9
Proposed Approach: Domain Adaptation
• Adapt the unseen semantic embeddings (A) toward the unseen test data (U).
• Find correspondences (C) between each test data point and each semantic embedding, with class-level regularization (a simplified sketch follows below).
[Equations: correspondence-based loss with group-lasso regularization, solved by conditional-gradient optimization]
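The slide's loss couples a correspondence term with a group-lasso class regularizer and is solved by conditional-gradient (Frank-Wolfe) optimization [Das & Lee, EAAI'18]. The sketch below is a deliberately simplified stand-in under different assumptions: soft correspondences and a closed-form update of the embeddings, with `tau` and the function name chosen for illustration only.

```python
import numpy as np

def adapt_embeddings(A, U, n_iter=20, tau=1.0):
    """
    A : (K, d) unseen-class semantic embeddings projected into feature space
    U : (M, d) unlabeled unseen test features
    Alternately (i) assigns each test point a soft correspondence over the
    embeddings and (ii) moves each embedding toward the points assigned to it.
    """
    A = A.copy()
    C = None
    for _ in range(n_iter):
        # Soft correspondences: rows of C lie on the probability simplex.
        d2 = ((U[:, None, :] - A[None, :, :]) ** 2).sum(-1)          # (M, K)
        C = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / tau)
        C /= C.sum(axis=1, keepdims=True)
        # Update: pull each embedding toward its (softly) corresponding test points.
        weights = C.sum(axis=0)[:, None] + 1e-8
        A = (C.T @ U) / weights
    return A, C
```

In the paper's formulation the class regularization discourages degenerate correspondences in which some class receives no test samples; the plain softmax assignment above does not enforce that.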
10
Proposed Approach: Scaled Calibration
• Modify the nearest-neighbor Euclidean distance scores.
• Euclidean distance scores of seen classes are scaled, while those of unseen classes are kept the same (a sketch follows below).
[Equation: calibrated distance scores, with seen, unseen, and total class terms]
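A minimal sketch of the calibrated nearest-neighbor decision, assuming that "scaling" means multiplying the Euclidean distances of seen classes by a factor greater than one; the exact rule and the symbol for the calibration factor are not reproduced here, and `gamma` is an illustrative name.

```python
import numpy as np

def calibrated_predict(x, prototypes, seen_mask, gamma=1.2):
    """
    x          : (d,) test feature
    prototypes : (C, d) class representatives in feature space
                 (seen prototypes and adapted unseen embeddings)
    seen_mask  : (C,) boolean, True for seen classes
    gamma      : calibration factor > 1 (illustrative name)
    """
    dist = np.linalg.norm(prototypes - x, axis=1)
    # Scale up seen-class distances; unseen-class distances are left unchanged,
    # so seen classes become less likely to win the nearest-neighbor search.
    dist = np.where(seen_mask, gamma * dist, dist)
    return int(np.argmin(dist))
```

The calibration factor is the quantity varied in the sensitivity study on the later slides.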
11
Experimental Results
Datasets
• Animals with Attributes (AwA2) [Lampert et al., TPAMI'14] (Att: 85, Ysrc: 40, Ytar: 10)
• Pascal & Yahoo (aPY) [Farhadi et al., CVPR'09] (Att: 64, Ysrc: 20, Ytar: 12)
• Caltech-UCSD Birds (CUB) [Welinder et al., '10] (Att: 312, Ysrc: 150, Ytar: 50)
• Scene Understanding (SUN) [Patterson et al., CVPR'12] (Att: 102, Ysrc: 645, Ytar: 72)
[Table: comparison with previous work on the four datasets]
Comparative Study
tr – unseen-class accuracy in the traditional ZSL setting
u – unseen-class accuracy in the generalized (GZSL) setting
s – seen-class accuracy in the generalized setting
H – harmonic mean of u and s, i.e., H = 2us / (u + s)
R – Relational Matching
RA – Relational Matching + Domain Adaptation
RC – Relational Matching + Scaled Calibration
RAC – Relational Matching + Domain Adaptation + Scaled Calibration
12
Experimental Results: Sensitivity Studies I
[Plots: effect of the calibration factor; effect of the structural-matching weight]
13
Experimental Results: Sensitivity Studies II
[Plots on AwA2 and SUN: effect of changing the proportion of seen classes; effect of changing the number of test samples]
14
Experimental Results: Convergence Analysis
[Plots: convergence results on the AwA2 dataset; convergence results on the SUN dataset; effect of the number of epochs on test accuracy]
15
Experimental Results: Visualization & Hubness

Feature Visualization
[Plots: unseen features, seen features, unseen semantic embeddings and seen semantic embeddings, shown without and with domain adaptation]

Hubness Measurement
• Hubness is measured using the skewness of the nearest-neighbor prediction distribution (a sketch follows below).
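A small sketch of that hubness measure, following the skewness statistic of Radovanovic et al., JMLR'10; the neighborhood size used in the slides is not stated, so 1-NN counts are assumed.

```python
import numpy as np
from scipy.stats import skew

def hubness_skewness(test_feats, prototypes):
    """Skewness of the nearest-neighbor prediction distribution: count how often
    each prototype is the 1-NN of a test sample; a large positive skew means a
    few prototypes act as hubs and attract most predictions."""
    d = np.linalg.norm(test_feats[:, None, :] - prototypes[None, :, :], axis=-1)
    nn = d.argmin(axis=1)                                 # 1-NN prototype per test sample
    counts = np.bincount(nn, minlength=prototypes.shape[0])
    return skew(counts)
```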
16
Conclusion
• Three-step approach to ZSL with structural matching, domain adaptation and calibration.
• Tested on four challenging ZSL datasets, on which it achieves substantial improvements in performance.
• Domain adaptation is found to be the most effective component; hubness is also reduced.

Future Work
• Distinguishing between novel and base categories, and investigating generative models.
17
THANK YOU
Any Questions?



Editor's Notes

  • #3: Introduction: introduction to the problem and previous work. Our method: problem formulation, optimization and proposed solution. Experimental results: experimental results and discussions.
  • #4: Give an example of a classification setting.
  • #5: Just talk about the limitations of these methods, no further details.
  • #6: Talk in detail about and study hubness.
  • #7: Combination of embedding and transductive approaches, except that post-processing is used instead of direct transductive learning.
  • #8: Three-step procedure.
  • #9: Mention that a local method may have better accuracy but might be slower.
  • #10: Mention that a local method may have better accuracy but might be slower.
  • #11: Mention that a local method may have better accuracy but might be slower.
  • #12–#16: Explain.