[N]-Shot Learning in deep learning models
davinnovation@gmail.com
Learning process: Human vs. Deep Learning
What they need…
A human needs just a single picture book; a deep learning model needs 60,000 training examples (MNIST).
In traditional deep learning…
http://pinktentacle.com/tag/waseda-university/
One-Shot Learning
Give the learning model the ability to infer from very little data.
One-Shot Learning
Train on one (or a few) examples, and it still works well.
One Shot Learning – Many approaches
Transfer Learning
Domain Adaptation
Imitation Learning
…..
Started from Meta-Learning
"Meta" means something that is self-referencing: 'knowing about knowing'.
Today…
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Basic structure: like a Turing machine, a Controller sits between External Input / External Output and a Memory bank, which it accesses through Read Heads and Write Heads.
The Controller is a neural network (an RNN). The Memory is an N x M block: N memory locations, each holding a vector of M elements. M_t denotes the memory at time t.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Read Heads
Weight vector w_t over the memory locations: Σ_i w_t(i) = 1, with 0 ≤ w_t(i) ≤ 1 (e.g. w_t = [0.9, 0.1, 0, ...]).
Read vector: r_t ← Σ_i w_t(i) M_t(i)
Example: with memory locations M_t(0) = (1, 2, 1) and M_t(1) = (2, 1, 3) and w_t = [0.9, 0.1], the read vector is r_t = 0.9·(1, 2, 1) + 0.1·(2, 1, 3) = (1.1, 1.9, 1.2).
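A minimal numpy sketch of this read operation (the array layout and variable names are mine, for illustration only):

import numpy as np

# Memory with two locations, each a 3-element vector, matching the example above.
M_t = np.array([[1.0, 2.0, 1.0],
                [2.0, 1.0, 3.0]])

# Read weighting over locations: non-negative, sums to 1.
w_t = np.array([0.9, 0.1])

# Read vector: r_t = sum_i w_t(i) * M_t(i)
r_t = w_t @ M_t
print(r_t)  # [1.1 1.9 1.2]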
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Write Heads – writing happens in two steps: Erase, then Add.
Weight vector w_t: Σ_i w_t(i) = 1, with 0 ≤ w_t(i) ≤ 1 (e.g. w_t = [0.9, 0.1, 0, ...]).
Erase vector e_t, every component in (0, 1):
M'_t(i) ← M_{t-1}(i) [1 − w_t(i) e_t]
Example: starting from M_{t-1}(0) = (1, 2, 1) and M_{t-1}(1) = (2, 1, 3), with w_t = [0.9, 0.1] and e_t ≈ (1, 0, 1) (for hand computation the components are set 'nearly' to 0 and 1; they are never exactly 0 or 1), the erased memory is M'_t(0) = (0.1, 2, 0.1) and M'_t(1) = (1.8, 1, 2.7).
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Write Heads – Add step (after Erase).
Add vector a_t, every component in (0, 1):
M_t(i) ← M'_t(i) + w_t(i) a_t
Example: continuing from M'_t(0) = (0.1, 2, 0.1) and M'_t(1) = (1.8, 1, 2.7), with w_t = [0.9, 0.1] and a_t ≈ (1, 1, 0) (again 'nearly' 0 and 1), the updated memory is M_t(0) = (1, 2.9, 0.1) and M_t(1) = (1.9, 1.1, 2.7).
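A numpy sketch of the erase-then-add write, continuing the same toy example (the layout and names are my own, not the paper's):

import numpy as np

M_prev = np.array([[1.0, 2.0, 1.0],   # M_{t-1}(0)
                   [2.0, 1.0, 3.0]])  # M_{t-1}(1)
w_t = np.array([0.9, 0.1])            # write weighting over locations
e_t = np.array([1.0, 0.0, 1.0])       # erase vector (the slide's "nearly" 0/1 values)
a_t = np.array([1.0, 1.0, 0.0])       # add vector

# Erase: M'_t(i) = M_{t-1}(i) * (1 - w_t(i) * e_t)
M_erased = M_prev * (1.0 - w_t[:, None] * e_t[None, :])

# Add: M_t(i) = M'_t(i) + w_t(i) * a_t
M_t = M_erased + w_t[:, None] * a_t[None, :]

print(M_erased)  # [[0.1 2.  0.1] [1.8 1.  2.7]]
print(M_t)       # [[1.  2.9 0.1] [1.9 1.1 2.7]]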
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism
How is the weight vector w_t calculated?
Content-based addressing:
- based on the similarity between the current memory contents and a value predicted by the controller (a key).
Location-based addressing:
- based on the memory location itself. For example, to compute f(x, y) = x × y, the variables x and y are stored at different addresses, retrieved by location, and then multiplied.
The NTM uses both mechanisms together.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism – how the weight vector w_t is calculated, using both mechanisms together.
Pipeline: (M_t, k_t, β_t) → Content Addressing → w_t^c → Interpolation (with previous state w_{t-1} and gate g_t) → w_t^g → Convolutional Shift (with s_t) → w'_t → Sharpening (with γ_t) → w_t
Previous state: w_{t-1}. Controller outputs: k_t, β_t, g_t, s_t, γ_t.
k_t : key vector
β_t : key strength
g_t : interpolation gate, in (0, 1)
s_t : shift weighting (integer shifts only)
γ_t : sharpening weight
Content Addressing is the content-based part; Interpolation, Convolutional Shift and Sharpening form the location-based part.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing step 1 – Content Addressing: (M_t, k_t, β_t) → w_t^c
w_t^c(i) ← exp(β_t K(k_t, M_t(i))) / Σ_j exp(β_t K(k_t, M_t(j)))
K(u, v) = (u · v) / (|u| |v|) : cosine similarity
Example: with memory locations M_t(0) = (1, 2, 1) and M_t(1) = (2, 1, 3) and key k_t = (3, 2, 1):
β_t = 0 → w_t^c = [0.5, 0.5]
β_t = 5 → w_t^c = [0.61, 0.39]
β_t = 50 → w_t^c = [0.98, 0.02]
A larger key strength β_t concentrates the weighting on the best-matching location.
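A small Python sketch of content addressing (a softmax over key-memory cosine similarities); the helper names are mine, and the numbers reproduce the example above:

import numpy as np

def cosine_similarity(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def content_addressing(M_t, k_t, beta_t):
    # w^c_t(i) = exp(beta_t * K(k_t, M_t(i))) / sum_j exp(beta_t * K(k_t, M_t(j)))
    sims = np.array([cosine_similarity(k_t, row) for row in M_t])
    scores = np.exp(beta_t * sims)
    return scores / scores.sum()

M_t = np.array([[1.0, 2.0, 1.0],
                [2.0, 1.0, 3.0]])
k_t = np.array([3.0, 2.0, 1.0])

for beta in (0.0, 5.0, 50.0):
    print(beta, content_addressing(M_t, k_t, beta))
# beta = 0  -> [0.5, 0.5]
# beta = 5  -> ~[0.61, 0.39]
# beta = 50 -> ~[0.99, 0.01] (the slide rounds this to [0.98, 0.02])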
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing step 2 – Interpolation: (w_t^c, w_{t-1}, g_t) → w_t^g
w_t^g ← g_t w_t^c + (1 − g_t) w_{t-1}
Example: with w_t^c = [0.61, 0.39] and previous weighting w_{t-1} = [0.01, 0.90]:
g_t = 0 → w_t^g = [0.01, 0.90] (the previous weighting is kept)
g_t = 0.5 → w_t^g = [0.36, 0.64] (an even blend of old and new)
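A short sketch of the interpolation step (the gate g_t blends the new content weighting with the previous one); variable names are mine:

import numpy as np

def interpolate(w_c, w_prev, g_t):
    # w^g_t = g_t * w^c_t + (1 - g_t) * w_{t-1}
    return g_t * w_c + (1.0 - g_t) * w_prev

w_c, w_prev = np.array([0.61, 0.39]), np.array([0.01, 0.90])
print(interpolate(w_c, w_prev, 0.0))  # g = 0:   the previous weighting is kept unchanged
print(interpolate(w_c, w_prev, 0.5))  # g = 0.5: an even blend of old and new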
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing step 3 – Convolutional Shift: (w_t^g, s_t) → w'_t
w'_t(i) ← Σ_{j=0..N−1} w_t^g(j) s_t(i − j)   (circular convolution)
Example: with w_t^g = [0.01, 0.90] and s_t = [0, 1]: w'_t(0) = 0.90 and w'_t(1) = 0.01, i.e. the weighting is rotated by one position.
The shift weighting s_t chooses the rotation: s_t = [0, 1] shifts one way (right shift), s_t = [1, 0] the other (left shift), and s_t = [0.5, 0.5] spreads the weighting across neighbouring locations (spread shift).
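A sketch of the circular convolution; the double loop keeps the modular index arithmetic explicit (names are mine):

import numpy as np

def convolutional_shift(w_g, s_t):
    # w'_t(i) = sum_j w^g_t(j) * s_t(i - j), indices taken modulo N (circular)
    N = len(w_g)
    w_shift = np.zeros(N)
    for i in range(N):
        for j in range(N):
            w_shift[i] += w_g[j] * s_t[(i - j) % N]
    return w_shift

w_g = np.array([0.01, 0.90])
print(convolutional_shift(w_g, np.array([0.0, 1.0])))  # rotated: [0.9   0.01 ]
print(convolutional_shift(w_g, np.array([1.0, 0.0])))  # kept:    [0.01  0.9  ]
print(convolutional_shift(w_g, np.array([0.5, 0.5])))  # spread:  [0.455 0.455]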
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing step 4 – Sharpening: (w'_t, γ_t) → w_t
w_t(i) ← w'_t(i)^γ_t / Σ_j w'_t(j)^γ_t
Example: with w'_t = [0.01, 0.90]:
γ_t = 0 → w_t = [0.5, 0.5]
γ_t = 1 → w_t = [0.01, 0.90]
γ_t = 2 → w_t = [0.0001, 0.9998]
Sharpening (γ_t ≥ 1) counteracts the blurring introduced by the convolutional shift.
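A sketch of sharpening; chaining content_addressing, interpolate, convolutional_shift and sharpen from the earlier sketches gives the full addressing pipeline for one head (all helper names are mine, not from the paper):

import numpy as np

def sharpen(w_shift, gamma_t):
    # w_t(i) = w'_t(i)^gamma_t / sum_j w'_t(j)^gamma_t
    p = w_shift ** gamma_t
    return p / p.sum()

w_shift = np.array([0.01, 0.90])
for gamma in (0.0, 1.0, 2.0):
    print(gamma, sharpen(w_shift, gamma))
# gamma = 0 -> [0.5, 0.5]; gamma = 1 -> ~[0.011, 0.989]; gamma = 2 -> ~[0.0001, 0.9999]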
Neural Turing Machine
https://arxiv.org/abs/1410.5401
The Controller is a neural network; in the paper an RNN with LSTM cells is used.
Experiment – Copy task: training time
Experiment – Copy task: result
Input: random 8-bit binary vectors. The NTM learns the copy task faster and reproduces sequences more cleanly than a plain LSTM (NTM > LSTM).
Memory-Augmented Neural Networks
https://arxiv.org/pdf/1605.06065.pdf
The NTM is a form of meta-learning => use its memory augmentation to build a model that learns rapidly from few examples.
Recall the NTM addressing mechanism: content-based plus location-based, used together. Memory-Augmented Neural Networks (MANN) keep the same Controller / Read Heads / Write Heads / Memory structure, but do not use location-based addressing: they use content-based addressing only.
Memory-Augmented Neural Networks (MANN)
https://arxiv.org/pdf/1605.06065.pdf
In the NTM addressing pipeline (Content Addressing → Interpolation → Convolutional Shift → Sharpening), MANN drops the location-based stages (Interpolation, Convolutional Shift, Sharpening). Only Content Addressing remains, so the read weighting w_t^r is produced directly from the key k_t and the memory M_t.
MANN – Content Addressing (read)
https://arxiv.org/pdf/1605.06065.pdf
The read weighting is a softmax over cosine similarities between the key and each memory location (note: no key strength β_t here, unlike the NTM):
w_t^r(i) ← exp(K(k_t, M_t(i))) / Σ_j exp(K(k_t, M_t(j)))
K(u, v) = (u · v) / (|u| |v|) : cosine similarity
Example: with memory locations M_t(0) = (1, 2, 1) and M_t(1) = (2, 1, 3) and key k_t = (3, 2, 1), w_t^r = [0.53, 0.47].
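This is the same softmax-over-cosine-similarity computation as NTM content addressing; reusing the content_addressing sketch from above with the key strength fixed to 1 reproduces the example (up to rounding):

w_r = content_addressing(M_t, k_t, beta_t=1.0)
print(w_r)  # ~[0.52, 0.48]; the slide rounds this to [0.53, 0.47]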
Memory-Augmented Neural Networks – Write Heads: Least Recently Used Access (LRUA)
https://arxiv.org/pdf/1605.06065.pdf
Usage weighting (γ is a decay parameter; usage accumulates the current read and write weightings):
w_t^u ← γ w_{t-1}^u + w_t^r + w_t^w
Least-used weighting (1 only at the least-used locations):
w_t^lu(i) = 0 if w_t^u(i) is large enough, 1 otherwise
Write weighting (σ is the sigmoid function, α a scalar gate parameter):
w_t^w ← σ(α) w_{t-1}^r + (1 − σ(α)) w_{t-1}^lu
Memory update (k_t is the write vector):
M_t(i) ← M_{t-1}(i) + w_t^w(i) k_t
So each write goes either to the most recently read locations or to the least recently used ones.
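A numpy sketch of one LRUA write step under these equations, treating "big enough" as "among the n largest usage weights" (function and argument names are mine):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def least_used(w_u, n=1):
    # w^lu(i) = 0 if w^u(i) is "big enough" (here: among the n largest), else 1
    w_lu = np.ones_like(w_u)
    w_lu[np.argsort(w_u)[-n:]] = 0.0
    return w_lu

def lrua_write(M_prev, w_u_prev, w_r_prev, w_r_t, k_t, alpha, gamma):
    g = sigmoid(alpha)                                        # scalar gate
    w_w_t = g * w_r_prev + (1.0 - g) * least_used(w_u_prev)   # write weighting
    M_t = M_prev + np.outer(w_w_t, k_t)                       # memory update with write vector k_t
    w_u_t = gamma * w_u_prev + w_r_t + w_w_t                  # decayed usage + current read/write use
    return M_t, w_u_t, w_w_t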
Experiment – Data
One episode:
Input: (x_0, null), (x_1, y_0), (x_2, y_1), …, (x_T, y_{T-1})
Output: y'_0, y'_1, y'_2, …, y'_T
The model learns p(y_t | x_t; D_{1:t-1}; θ): at each step the RNN receives the current input together with the label of the previous input, so it must remember the example-label bindings seen earlier in the episode.
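A sketch of how such an episode could be assembled, pairing each input with the previous step's one-hot label and a null label at the first step; xs and ys are hypothetical lists of inputs and integer labels:

import numpy as np

def make_episode(xs, ys, num_classes):
    inputs, targets = [], []
    prev_label = np.zeros(num_classes)          # "null" label presented with x_0
    for x, y in zip(xs, ys):
        inputs.append((x, prev_label.copy()))   # step t sees (x_t, y_{t-1})
        targets.append(y)                       # but must predict y_t
        prev_label = np.eye(num_classes)[y]     # one-hot label revealed at the next step
    return inputs, targets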
Experiment – Data
Omniglot dataset: more than 1,600 classes
→ 1,200 classes for training, 423 classes for testing (images downscaled to 20x20), plus rotation augmentation.
Experiment – Classification Result
Labels are one-hot vectors. Each episode uses five randomly chosen labels, and the model is trained for 100,000 episodes; the class-to-label binding is new in every episode.
"Instance" in the results counts how many times an example of that class has already appeared within the episode.
Active One-Shot Learning
https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
MANN plus a decision to 'predict or pass' – just like a quiz show!
At each step the RNN either requests the true label (outputs [0, 1]) or makes a prediction (outputs (y'_t, 0)).
If it requests the label: the reward is r_t = −0.05 and the next input is (y_t, x_{t+1}).
If it predicts: the next input is (0, x_{t+1}) and the reward is r_t = +1 if y'_t = y_t, −1 otherwise.
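The reward rule above in a few lines (a sketch; names are mine):

def active_one_shot_reward(requested_label, prediction=None, true_label=None):
    if requested_label:        # the model chose to see y_t instead of answering
        return -0.05
    return 1.0 if prediction == true_label else -1.0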
Results: https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
Want some more?
http://proceedings.mlr.press/v37/romera-paredes15.pdf
https://arxiv.org/pdf/1606.04080.pdf
Reference
• https://tensorflow.blog/tag/one-shot-learning/
• http://www.modulabs.co.kr/DeepLAB_library/11115
• https://www.youtube.com/watch?v=CzQSQ_0Z-QU
• https://www.slideshare.net/JisungDavidKim/oneshot-learning
• https://norman3.github.io/papers/docs/neural_turing_machine.html
• https://www.slideshare.net/webaquebec/guillaume-chevalier-deep-learning-avec-tensor-flow
• https://www.slideshare.net/ssuser06e0c5/metalearning-with-memory-augmented-neural-networks