Few Shot Learning
Asif Ali
M.E SCUT CHINA
Date: May 31, 2019
Contents
• Introduction
• Problem statement, Why?
• Approaches
– Meta learning
• Matching Networks
• MAML
– Metric Learning
• Relation Networks
• Prototypical Networks
– Augmentation based
• Delta encoder
• Few-shot learning through an information retrieval lens
Introduction
• The ability of deep neural networks to extract complex statistics and learn high-level features from vast
datasets is well proven. Yet current deep learning approaches suffer from poor sample efficiency, in stark
contrast to human perception: even a child can recognise a giraffe after seeing a single picture.
• Fine-tuning a pre-trained model is a popular strategy for improving sample efficiency, but it is a post-hoc
hack.
Can machine learning do better?
Few-shot learning aims to solve these issues
Few shot learning
• Whereas most machine-learning-based object categorization algorithms require
training on hundreds or thousands of samples and very large datasets,
one-/few-shot learning aims to learn information about object categories from
one, or only a few, training samples.
• It is estimated that a child has learned almost all of the 10,000 to 30,000 object
categories in the world by the age of six. This is due not only to the human mind's
computational power, but also to its ability to synthesize and learn new object
classes from existing information about different, previously learned classes.
Problem statement
Using a large annotated offline dataset (e.g. dog, elephant, monkey, …), perform a given task for novel
categories (e.g. lemur, rabbit, mongoose), each represented by just a few samples. Offline training on the
base classes is followed by knowledge transfer, and online training then produces a model for the novel
categories. The same scheme is instantiated per task: a classifier for novel categories (few-shot
classification), a detector for novel categories (few-shot detection), and a regressor for novel categories
(few-shot regression).
Why work on few-shot learning?
1. It brings DL closer to real-world business use cases.
• Companies hesitate to spend much time and money on annotating data for a solution whose
profitability is uncertain.
• Relevant objects are continuously replaced with new ones; DL has to be agile.
2. It involves a bunch of exciting cutting-edge technologies:
• Meta-learning methods
• Networks generating networks
• Data synthesizers
• Semantic metric spaces
• Graph neural networks
• Neural Turing Machines
• GANs
Approaches
Few-shot learning: each category is represented by just a few examples; learn to perform classification,
detection, or regression. Three families of approaches:
• Meta-learning: learn a learning strategy to adjust well to a new few-shot learning task.
• Metric learning: learn a `semantic` embedding space using a distance loss function.
• Data augmentation: synthesize more data from the novel classes to facilitate the regular learning.
The n-shot, k-way task
• The ability of an algorithm to perform few-shot learning is typically measured by its
performance on n-shot, k-way tasks. These are run as follows:
1. A model is given a query sample belonging to a new, previously unseen class.
2. It is also given a support set, S, consisting of n examples from each of k different unseen classes.
3. The algorithm then has to determine which of the support-set classes the query sample belongs to. (A minimal episode-sampling sketch follows below.)
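To make the episode protocol concrete, here is a minimal sketch of sampling one n-shot, k-way episode. The `dataset` structure (a dict mapping class name to a list of samples) and all names are illustrative assumptions, not from the slides:

```python
import random

def sample_episode(dataset, n_shot, k_way, n_query=1):
    """Sample one n-shot, k-way episode from `dataset`, a dict mapping
    class name -> list of samples (an illustrative structure)."""
    classes = random.sample(sorted(dataset), k_way)         # k unseen classes
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(dataset[cls], n_shot + n_query)
        support += [(x, label) for x in picks[:n_shot]]     # n examples per class
        query += [(x, label) for x in picks[n_shot:]]       # held-out queries
    return support, query

# e.g. a 1-shot, 5-way episode:
# support, query = sample_episode(images_by_class, n_shot=1, k_way=5)
```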
Meta-Learning
Standard learning: data instances → training a learner on the data → a model (task-specific, tied to
specific classes).
Meta-learning: many training tasks → training a meta-learner to learn on each task → a learning strategy
(task-agnostic data knowledge).
New task: the task's training and target data, combined with the meta-learner, yield a task-specific
learner.
Recurrent meta-learners
Matching Networks, Vinyals et al., NIPS 2016
Distance-based classification: based on the similarity between the query and support samples in the
embedding space (an adaptive metric):

$$\hat{y} = \sum_i a(x, x_i)\, y_i, \qquad a(x, x_i) = \mathrm{similarity}\big(f(x, S),\, g(x_i, S)\big)$$

where $f, g$ are LSTM embeddings of $x$ that depend on the support set $S$.
• The embedding space is class-agnostic.
• An LSTM attention mechanism adjusts the embedding to the task (to be elaborated later).
Concept of episodes: reproduce the test conditions during training.
• N new categories
• M training examples per category
• one query example from the {1..N} categories
• Typically, N = 5 and M = 1 or 5.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
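A minimal numpy sketch of the prediction rule above, assuming the query and support samples have already been embedded by f and g; function and variable names are illustrative:

```python
import numpy as np

def matching_predict(f_query, g_support, support_labels, k_way):
    """Attention-weighted classification: ŷ = Σ_i a(x, x_i) y_i, with
    a(x, x_i) the softmax over cosine similarities (a sketch; inputs are
    already-embedded vectors)."""
    f = f_query / np.linalg.norm(f_query)
    g = g_support / np.linalg.norm(g_support, axis=1, keepdims=True)
    cos = g @ f                                    # cosine similarity to each x_i
    a = np.exp(cos - cos.max())
    a /= a.sum()                                   # softmax attention a(x, x_i)
    y = np.eye(k_way)[np.asarray(support_labels)]  # one-hot labels y_i
    return a @ y                                   # class probabilities for x
```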
Optimization as a model for few-shot learning
• The META-LEARN LSTM (Ravi & Larochelle, ICLR 2017) learns a general initialization
of the learner (classifier) network that allows for quick convergence of training.
Problem: gradient-based optimization in high-capacity classifiers requires many iterative steps over
many examples to perform well.
Solution: an LSTM-based meta-learner model that learns the exact optimization algorithm used to train
another learner neural-network classifier in the few-shot regime.
Optimizers
Optimize the learner to perform well after fine-tuning on the task data, done by a single (or a few)
step(s) of gradient descent.
MAML (Model-Agnostic Meta-Learning), Finn et al., ICML 2017
Standard objective (task-specific, for task T):
$$\min_\theta \mathcal{L}_T(\theta), \quad \text{learned via the update} \quad \theta' = \theta - \alpha \nabla_\theta \mathcal{L}_T(\theta)$$
Meta-objective (across tasks):
$$\min_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta'), \quad \text{learned via the update} \quad \theta \leftarrow \theta - \beta \nabla_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$$
(figure reprinted from Li et al., 2017)
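A minimal sketch of one MAML meta-update under these equations, assuming PyTorch 2's `torch.func.functional_call` and tasks supplied as (x_support, y_support, x_query, y_query) tensor tuples; names and hyperparameters are illustrative:

```python
import torch
from torch.func import functional_call

def maml_meta_step(model, tasks, alpha=0.01, beta=0.001,
                   loss_fn=torch.nn.functional.cross_entropy):
    """One MAML meta-update: per-task inner step θ' = θ - α∇θ L_T(θ) on the
    support set, then outer step θ ← θ - β∇θ Σ_T L_T(θ') on the query sets."""
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        # inner step (create_graph=True keeps second-order terms for the meta-gradient)
        inner = loss_fn(functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
        # Meta-SGD would replace the scalar alpha with a learned per-parameter vector.
        adapted = {k: p - alpha * g for (k, p), g in zip(params.items(), grads)}
        # evaluate the adapted parameters on the task's query set
        meta_loss = meta_loss + loss_fn(functional_call(model, adapted, (x_q,)), y_q)
    meta_grads = torch.autograd.grad(meta_loss, list(params.values()))
    with torch.no_grad():                          # outer SGD step on θ
        for p, g in zip(params.values(), meta_grads):
            p -= beta * g
    return float(meta_loss)
```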
Meta-SGD, Li et al., 2017
"Interestingly, the learning process can continue forever, thus enabling life-long learning,
and at any moment, the meta-learner can be applied to learn a learner for any new task."
Meta-SGD renders α as a learned vector of the same size as θ, giving a per-parameter learning rate.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Meta-SGD | 54.24 / 70.86
Metric Learning
Offline training: data instances → a deep embedding model mapping into a semantic embedding space
(class 1, class 2, class 3, …).
Training: achieve good class distributions for the offline categories.
Inference: nearest neighbour in the embedding space. Given new task data with classes A, B, C, a
query q is assigned to the class minimizing the distance among d(q, A), d(q, B), d(q, C).
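A minimal sketch of the nearest-neighbour inference step, assuming some trained embedding function `embed`; names are illustrative:

```python
import numpy as np

def nn_classify(embed, query, support_x, support_y):
    """Assign the query to the class of its nearest support sample in the
    embedding space; `embed` is any trained embedding function."""
    q = embed(query)
    dists = [np.linalg.norm(q - embed(x)) for x in support_x]  # d(q, ·)
    return support_y[int(np.argmin(dists))]
```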
Metric Learning
Relation Networks, Sung et al., CVPR 2018
Uses the Siamese-networks principle:
• Concatenate the embeddings of the query and support samples.
• A relation module is trained to produce a score of 1 for the correct class and 0 for the others. (A
sketch of such a module follows below.)
• Extends to zero-shot learning by replacing support embeddings with semantic features.
(figure replicated from Sung et al., Learning to Compare: Relation Network for Few-Shot Learning,
CVPR 2018)
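A sketch of the relation-module idea in PyTorch. The actual model of Sung et al. uses convolutional blocks over concatenated feature maps and is trained with an MSE loss against the 0/1 targets; the MLP form and layer sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Scores a (query, support) embedding pair; trained so the score is
    1 for the correct class and 0 otherwise (MSE loss, per the paper)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),  # illustrative sizes
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, query_emb, support_emb):
        pair = torch.cat([query_emb, support_emb], dim=-1)  # concatenation step
        return self.net(pair).squeeze(-1)                   # relation score in [0, 1]
```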
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
Metric Learning
Matching Networks, Vinyals et al., NIPS 2016
Objective: maximize the log-likelihood $\sum_{(x,y)} \log P_\theta(y \mid x, S)$ of the non-parametric
softmax classifier, with
$$P_\theta(y \mid x, S) = \mathrm{softmax}\big(\cos(f(x, S),\, g(x_i, S))\big)$$
Prototypical Networks, Snell et al., 2017:
Each category is represented by its mean sample (the prototype).
Objective: maximize the log-likelihood of the prototype-based softmax classifier.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
Prototypical Networks
• In Prototypical Networks, Snell et al. apply a compelling inductive bias in the form of class
prototypes to achieve impressive few-shot performance, exceeding Matching Networks
without the complication of FCE (full context embeddings). The key assumption is that there exists an
embedding in which samples from each class cluster around a single prototypical
representation, which is simply the mean of the embedded individual samples. (A sketch follows below.)
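A minimal numpy sketch of prototype-based classification under this assumption; names are illustrative:

```python
import numpy as np

def prototype_logits(query_emb, support_emb, support_y, k_way):
    """Each class prototype is the mean of its embedded support samples;
    the query is scored by negative Euclidean distance to each prototype."""
    protos = np.stack([support_emb[support_y == c].mean(axis=0)
                       for c in range(k_way)])         # class means (prototypes)
    dists = np.linalg.norm(protos - query_emb, axis=1)
    return -dists  # softmax(-dists) yields class probabilities
```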
Sample synthesis
Offline stage: use many offline data instances to train a synthesizer model that can sample from a class
distribution (data knowledge).
On new task data: the few data instances of the novel classes are fed to the synthesizer model, which
generates many synthetic data instances; a task model is then trained on these together with the
offline data.
More augmentation approaches
Δ-encoder, Schwartz et al., NeurIPS 2018
• Use a variant of an autoencoder to capture, in a latent space Z, the intra-class difference (delta)
between two samples of the same class: the encoder maps a sampled target and a sampled reference
to a delta, and the decoder reconstructs the target from the delta and the reference.
• Transfer class distributions from the training classes to novel classes: at synthesis time, a sampled
delta is decoded together with a new-class reference to yield a synthesized new-class example.
Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes and Alex M. Bronstein. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition. NeurIPS 2018.
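A sketch of the Δ-encoder structure in PyTorch; the original operates on pre-computed image features, and the layer sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeltaEncoder(nn.Module):
    """Encodes the intra-class difference between a target and a reference
    sample into a low-dimensional delta z; the decoder applies a delta to a
    reference to produce a sample. Layer sizes are illustrative."""
    def __init__(self, feat_dim, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, target, reference):
        # training: reconstruct the sampled target from (delta, sampled reference)
        z = self.enc(torch.cat([target, reference], dim=-1))
        return self.dec(torch.cat([z, reference], dim=-1))

    def synthesize(self, z, new_class_reference):
        # synthesis: decode a delta (encoded from a seen-class pair)
        # together with a one-shot reference from the novel class
        return self.dec(torch.cat([z, new_class_reference], dim=-1))
```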
Few-shot learning through an Information Retrieval lens
Goal: ranking for classification.
We want to classify a query point by finding out which class is the most similar one, so we rank all the
other points with respect to some similarity measure.
Eleni Triantafillou, Richard Zemel, and Raquel Urtasun. Few-Shot Learning Through an Information Retrieval Lens. In Advances in Neural Information Processing Systems, 2252-2262, 2017. https://arxiv.org/abs/1707.02610
Mean Average Precision
PROBLEMS AHEAD
The mean Average Precision is a terrible loss function (for gradient-descent purposes): it depends on
the model's scores only through their ordering, so its gradient is zero almost everywhere.
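A minimal numpy sketch of Average Precision for one query's ranking, which also shows why it resists gradient descent; names are illustrative:

```python
import numpy as np

def average_precision(scores, relevant):
    """AP of one ranking: `scores` are model similarities, `relevant` flags
    the points sharing the query's class. The value depends on `scores`
    only through their ordering, so its gradient is zero almost everywhere."""
    order = np.argsort(-np.asarray(scores))        # rank by descending similarity
    rel = np.asarray(relevant, dtype=float)[order]
    prec_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((prec_at_k * rel).sum() / max(rel.sum(), 1.0))
```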
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky Submitted on 20 May 2019
Thank you
References
• Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching Networks for One Shot Learning. NIPS 2016.
• Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-Learning with Memory-Augmented Neural Networks. ICML 2016.
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017.
• Z. Li, F. Zhou, F. Chen, and H. Li. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv:1707.09835, 2017.
• Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy Hospedales. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018.
• Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes, and Alex M. Bronstein. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition. NeurIPS 2018.
• Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, and L. Sigal. Semantic Feature Augmentation in Few-Shot Learning. arXiv:1804.05298, 2018.
• Eleni Triantafillou, Richard Zemel, and Raquel Urtasun. Few-Shot Learning Through an Information Retrieval Lens. NIPS 2017. https://arxiv.org/abs/1707.02610