Few Shot Learning
Asif Ali
M.E SCUT CHINA
Date: May 31, 2019
Contents
• Introduction
• Problem statement, Why?
• Approaches
– Meta learning
• Matching Networks
• MAML
– Metric Learning
• Relation Networks
• Prototypical Networks
– Augmentation based
• Delta encoder
• Few-shot learning through an information retrieval lens
Introduction
• The ability of deep neural networks to extract complex statistics and learn high-level features from vast
datasets is well proven. Yet current deep learning approaches suffer from poor sample efficiency, in stark
contrast to human perception: even a child can recognise a giraffe after seeing a single picture.
• Fine-tuning a pre-trained model is a popular strategy for improving sample efficiency, but it is a post-hoc
hack.
Can machine learning do better?
Few-shot learning aims to solve these issues
Few shot learning
• Whereas most machine-learning-based object categorization algorithms require
training on hundreds or thousands of samples and very large datasets,
one-/few-shot learning aims to learn information about object categories from
one, or only a few, training samples.
• It is estimated that a child has learned almost all of the 10,000 to 30,000 object
categories in the world by the age of six. This is due not only to the human mind's
computational power, but also to its ability to synthesize and learn new object
classes from existing information about different, previously learned classes.
Problem statement
Using a large annotated offline dataset (e.g. dog, elephant, monkey, …), perform a given task for novel
categories (e.g. lemur, rabbit, mongoose), each represented by just a few samples. Offline training on the
base classes is followed by knowledge transfer, and online training then produces a model for the novel
categories. The same scheme is instantiated per task: a classifier for novel categories (few-shot
classification), a detector for novel categories (few-shot detection), and a regressor for novel categories
(few-shot regression).
Why work on few-shot learning?
1. It brings DL closer to real-world business use cases.
• Companies hesitate to spend much time and money on annotating data for a solution whose
profitability is uncertain.
• Relevant objects are continuously replaced with new ones; DL has to be agile.
2. It involves a bunch of exciting cutting-edge technologies:
• Meta-learning methods
• Networks generating networks
• Data synthesizers
• Semantic metric spaces
• Graph neural networks
• Neural Turing Machines
• GANs
Approaches
Few-shot learning: each category is represented by just a few examples; learn to perform classification,
detection, or regression. Three families of approaches:
• Meta-learning: learn a learning strategy to adjust well to a new few-shot learning task.
• Metric learning: learn a `semantic` embedding space using a distance loss function.
• Data augmentation: synthesize more data from the novel classes to facilitate the regular learning.
The n-shot, k-way task
• The ability of an algorithm to perform few-shot learning is typically measured by its
performance on n-shot, k-way tasks. These are run as follows:
1. A model is given a query sample belonging to a new, previously unseen class.
2. It is also given a support set, S, consisting of n examples from each of k different unseen classes.
3. The algorithm then has to determine which of the support-set classes the query sample belongs to. (A minimal episode-sampling sketch follows below.)
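To make the episode protocol concrete, here is a minimal sketch of sampling one n-shot, k-way episode. The `dataset` structure (a dict mapping class name to a list of samples) and all names are illustrative assumptions, not from the slides:

```python
import random

def sample_episode(dataset, n_shot, k_way, n_query=1):
    """Sample one n-shot, k-way episode from `dataset`, a dict mapping
    class name -> list of samples (an illustrative structure)."""
    classes = random.sample(sorted(dataset), k_way)         # k unseen classes
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(dataset[cls], n_shot + n_query)
        support += [(x, label) for x in picks[:n_shot]]     # n examples per class
        query += [(x, label) for x in picks[n_shot:]]       # held-out queries
    return support, query

# e.g. a 1-shot, 5-way episode:
# support, query = sample_episode(images_by_class, n_shot=1, k_way=5)
```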
Meta-Learning
Standard learning: data instances → training a learner on the data → a model (task-specific, tied to
specific classes).
Meta-learning: many training tasks → training a meta-learner to learn on each task → a learning strategy
(task-agnostic data knowledge).
New task: the task's training and target data, combined with the meta-learner, yield a task-specific
learner.
Recurrent meta-learners
Matching Networks, Vinyals et al., NIPS 2016
Distance-based classification: based on the similarity between the query and support samples in the
embedding space (an adaptive metric):

$$\hat{y} = \sum_i a(x, x_i)\, y_i, \qquad a(x, x_i) = \mathrm{similarity}\big(f(x, S),\, g(x_i, S)\big)$$

where $f, g$ are LSTM embeddings of $x$ that depend on the support set $S$.
• The embedding space is class-agnostic.
• An LSTM attention mechanism adjusts the embedding to the task (to be elaborated later).
Concept of episodes: reproduce the test conditions during training.
• N new categories
• M training examples per category
• one query example from the {1..N} categories
• Typically, N = 5 and M = 1 or 5.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
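A minimal numpy sketch of the prediction rule above, assuming the query and support samples have already been embedded by f and g; function and variable names are illustrative:

```python
import numpy as np

def matching_predict(f_query, g_support, support_labels, k_way):
    """Attention-weighted classification: ŷ = Σ_i a(x, x_i) y_i, with
    a(x, x_i) the softmax over cosine similarities (a sketch; inputs are
    already-embedded vectors)."""
    f = f_query / np.linalg.norm(f_query)
    g = g_support / np.linalg.norm(g_support, axis=1, keepdims=True)
    cos = g @ f                                    # cosine similarity to each x_i
    a = np.exp(cos - cos.max())
    a /= a.sum()                                   # softmax attention a(x, x_i)
    y = np.eye(k_way)[np.asarray(support_labels)]  # one-hot labels y_i
    return a @ y                                   # class probabilities for x
```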
Optimization as a model for few-shot learning
• The META-LEARN LSTM (Ravi & Larochelle, ICLR 2017) learns a general initialization
of the learner (classifier) network that allows for quick convergence of training.
Problem: gradient-based optimization in high-capacity classifiers requires many iterative steps over
many examples to perform well.
Solution: an LSTM-based meta-learner model that learns the exact optimization algorithm used to train
another learner neural-network classifier in the few-shot regime.
Optimizers
Optimize the learner to perform well after fine-tuning on the task data, done by a single (or a few)
step(s) of gradient descent.
MAML (Model-Agnostic Meta-Learning), Finn et al., ICML 2017
Standard objective (task-specific, for task T):
$$\min_\theta \mathcal{L}_T(\theta), \quad \text{learned via the update} \quad \theta' = \theta - \alpha \nabla_\theta \mathcal{L}_T(\theta)$$
Meta-objective (across tasks):
$$\min_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta'), \quad \text{learned via the update} \quad \theta \leftarrow \theta - \beta \nabla_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$$
(figure reprinted from Li et al., 2017)
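A minimal sketch of one MAML meta-update under these equations, assuming PyTorch 2's `torch.func.functional_call` and tasks supplied as (x_support, y_support, x_query, y_query) tensor tuples; names and hyperparameters are illustrative:

```python
import torch
from torch.func import functional_call

def maml_meta_step(model, tasks, alpha=0.01, beta=0.001,
                   loss_fn=torch.nn.functional.cross_entropy):
    """One MAML meta-update: per-task inner step θ' = θ - α∇θ L_T(θ) on the
    support set, then outer step θ ← θ - β∇θ Σ_T L_T(θ') on the query sets."""
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        # inner step (create_graph=True keeps second-order terms for the meta-gradient)
        inner = loss_fn(functional_call(model, params, (x_s,)), y_s)
        grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
        # Meta-SGD would replace the scalar alpha with a learned per-parameter vector.
        adapted = {k: p - alpha * g for (k, p), g in zip(params.items(), grads)}
        # evaluate the adapted parameters on the task's query set
        meta_loss = meta_loss + loss_fn(functional_call(model, adapted, (x_q,)), y_q)
    meta_grads = torch.autograd.grad(meta_loss, list(params.values()))
    with torch.no_grad():                          # outer SGD step on θ
        for p, g in zip(params.values(), meta_grads):
            p -= beta * g
    return float(meta_loss)
```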
Meta-SGD, Li et al., 2017
"Interestingly, the learning process can continue forever, thus enabling life-long learning,
and at any moment, the meta-learner can be applied to learn a learner for any new task."
Meta-SGD renders α as a learned vector of the same size as θ, giving a per-parameter learning rate.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Meta-SGD | 54.24 / 70.86
Metric Learning
Offline training: data instances → a deep embedding model mapping into a semantic embedding space
(class 1, class 2, class 3, …).
Training: achieve good class distributions for the offline categories.
Inference: nearest neighbour in the embedding space. Given new task data with classes A, B, C, a
query q is assigned to the class minimizing the distance among d(q, A), d(q, B), d(q, C).
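A minimal sketch of the nearest-neighbour inference step, assuming some trained embedding function `embed`; names are illustrative:

```python
import numpy as np

def nn_classify(embed, query, support_x, support_y):
    """Assign the query to the class of its nearest support sample in the
    embedding space; `embed` is any trained embedding function."""
    q = embed(query)
    dists = [np.linalg.norm(q - embed(x)) for x in support_x]  # d(q, ·)
    return support_y[int(np.argmin(dists))]
```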
Metric Learning
Relation Networks, Sung et al., CVPR 2018
Uses the Siamese-networks principle:
• Concatenate the embeddings of the query and support samples.
• A relation module is trained to produce a score of 1 for the correct class and 0 for the others. (A
sketch of such a module follows below.)
• Extends to zero-shot learning by replacing support embeddings with semantic features.
(figure replicated from Sung et al., Learning to Compare: Relation Network for Few-Shot Learning,
CVPR 2018)
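A sketch of the relation-module idea in PyTorch. The actual model of Sung et al. uses convolutional blocks over concatenated feature maps and is trained with an MSE loss against the 0/1 targets; the MLP form and layer sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Scores a (query, support) embedding pair; trained so the score is
    1 for the correct class and 0 otherwise (MSE loss, per the paper)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),  # illustrative sizes
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, query_emb, support_emb):
        pair = torch.cat([query_emb, support_emb], dim=-1)  # concatenation step
        return self.net(pair).squeeze(-1)                   # relation score in [0, 1]
```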
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
Metric Learning
Matching Networks, Vinyals et al., NIPS 2016
Objective: maximize the log-likelihood $\sum_{(x,y)} \log P_\theta(y \mid x, S)$ of the non-parametric
softmax classifier, with
$$P_\theta(y \mid x, S) = \mathrm{softmax}\big(\cos(f(x, S),\, g(x_i, S))\big)$$
Prototypical Networks, Snell et al., 2017:
Each category is represented by its mean sample (the prototype).
Objective: maximize the log-likelihood of the prototype-based softmax classifier.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
Prototypical Networks
• In Prototypical Networks, Snell et al. apply a compelling inductive bias in the form of class
prototypes to achieve impressive few-shot performance, exceeding Matching Networks
without the complication of FCE (full context embeddings). The key assumption is that there exists an
embedding in which samples from each class cluster around a single prototypical
representation, which is simply the mean of the embedded individual samples. (A sketch follows below.)
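A minimal numpy sketch of prototype-based classification under this assumption; names are illustrative:

```python
import numpy as np

def prototype_logits(query_emb, support_emb, support_y, k_way):
    """Each class prototype is the mean of its embedded support samples;
    the query is scored by negative Euclidean distance to each prototype."""
    protos = np.stack([support_emb[support_y == c].mean(axis=0)
                       for c in range(k_way)])         # class means (prototypes)
    dists = np.linalg.norm(protos - query_emb, axis=1)
    return -dists  # softmax(-dists) yields class probabilities
```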
Sample synthesis
Offline stage: use many offline data instances to train a synthesizer model that can sample from a class
distribution (data knowledge).
On new task data: the few data instances of the novel classes are fed to the synthesizer model, which
generates many synthetic data instances; a task model is then trained on these together with the
offline data.
More augmentation approaches
Δ-encoder, Schwartz et al., NeurIPS 2018
• Use a variant of an autoencoder to capture, in a latent space Z, the intra-class difference (delta)
between two samples of the same class: the encoder maps a sampled target and a sampled reference
to a delta, and the decoder reconstructs the target from the delta and the reference.
• Transfer class distributions from the training classes to novel classes: at synthesis time, a sampled
delta is decoded together with a new-class reference to yield a synthesized new-class example.
Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes and Alex M. Bronstein. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition. NeurIPS 2018.
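A sketch of the Δ-encoder structure in PyTorch; the original operates on pre-computed image features, and the layer sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeltaEncoder(nn.Module):
    """Encodes the intra-class difference between a target and a reference
    sample into a low-dimensional delta z; the decoder applies a delta to a
    reference to produce a sample. Layer sizes are illustrative."""
    def __init__(self, feat_dim, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, target, reference):
        # training: reconstruct the sampled target from (delta, sampled reference)
        z = self.enc(torch.cat([target, reference], dim=-1))
        return self.dec(torch.cat([z, reference], dim=-1))

    def synthesize(self, z, new_class_reference):
        # synthesis: decode a delta (encoded from a seen-class pair)
        # together with a one-shot reference from the novel class
        return self.dec(torch.cat([z, new_class_reference], dim=-1))
```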
Few-shot learning through an Information Retrieval lens
Goal: ranking for classification.
We want to classify a query point by finding out which class is the most similar one, so we rank all the
other points with respect to some similarity measure.
Eleni Triantafillou, Richard Zemel, and Raquel Urtasun. Few-Shot Learning Through an Information Retrieval Lens. In Advances in Neural Information Processing Systems, 2252-2262, 2017. https://arxiv.org/abs/1707.02610
Mean Average Precision
PROBLEMS AHEAD
The mean Average Precision is a terrible loss function (for gradient-descent purposes): it depends on
the model's scores only through their ordering, so its gradient is zero almost everywhere.
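A minimal numpy sketch of Average Precision for one query's ranking, which also shows why it resists gradient descent; names are illustrative:

```python
import numpy as np

def average_precision(scores, relevant):
    """AP of one ranking: `scores` are model similarities, `relevant` flags
    the points sharing the query's class. The value depends on `scores`
    only through their ordering, so its gradient is zero almost everywhere."""
    order = np.argsort(-np.asarray(scores))        # rank by descending similarity
    rel = np.asarray(relevant, dtype=float)[order]
    prec_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((prec_at_k * rel).sum() / max(rel.sum(), 1.0))
```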
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky Submitted on 20 May 2019
Thank you
References
• Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching Networks for One Shot Learning. NIPS 2016.
• Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-Learning with Memory-Augmented Neural Networks. ICML 2016.
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017.
• Z. Li, F. Zhou, F. Chen, and H. Li. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv:1707.09835, 2017.
• Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy Hospedales. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018.
• Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes, and Alex M. Bronstein. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition. NeurIPS 2018.
• Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, and L. Sigal. Semantic Feature Augmentation in Few-Shot Learning. arXiv:1804.05298, 2018.
• Eleni Triantafillou, Richard Zemel, and Raquel Urtasun. Few-Shot Learning Through an Information Retrieval Lens. NIPS 2017. https://arxiv.org/abs/1707.02610