Meta Dropout:
Learning to Perturb Latent Features for Generalization
Hae Beom Lee¹, Taewook Nam¹, Eunho Yang¹², Sung Ju Hwang¹²
KAIST¹, AITRICS²
Few-shot Learning
Humans can generalize even with a single observation of a class.
[Lake et al. 11] One shot Learning of Simple Visual Concepts, CogSci 2011
[Figure: a single observation of a novel class, with query examples classified by a human.]
Few-shot Learning
On the other hand, deep neural networks require a large number of training instances
to generalize well, and overfit when given only a few.
[Figure: a human generalizes from a single observation, while a deep neural network fails in the few-shot learning setting.]
How can we learn a model that generalizes well even with few training instances?
Learning to Perturb Latent Features
Lack of data results in poor estimation of the decision boundary.
[Figure: with only a few training examples per class, the learned decision boundary misclassifies the test examples.]
What if we learn to perturb the latent features with an input-dependent noise distribution $p_\phi(\mathbf{z}|\mathbf{x})$, so that the perturbed features explain the test examples?
But how do we learn $\phi$? → Test examples are not observable in the standard learning framework.
Meta-Learning for Few-shot Classification
Meta-learning: learn a model that can generalize over a task distribution!
[Figure: meta-training tasks, each with its own training and test split, transfer meta-knowledge (here, the noise distribution $p_\phi(\mathbf{z}|\mathbf{x})$) to meta-test tasks.]
[Ravi and Larochelle. 17] Optimization as a Model for Few-shot Learning, ICLR 2017
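For concreteness, a few-shot classification "task" is an N-way K-shot episode sampled from a dataset. A minimal sketch of episode sampling follows; the `data_by_class` layout and all names here are assumptions for illustration, not from the authors' code.

```python
import random
import torch

def sample_episode(data_by_class, n_way=5, k_shot=1, q_queries=15):
    """Sample one N-way K-shot task: a small training split and a test split.

    `data_by_class` maps each class label to a tensor of its examples
    (an assumed layout, for illustration only).
    """
    classes = random.sample(list(data_by_class), n_way)
    x_tr, y_tr, x_te, y_te = [], [], [], []
    for label, cls in enumerate(classes):
        idx = torch.randperm(len(data_by_class[cls]))[: k_shot + q_queries]
        examples = data_by_class[cls][idx]
        x_tr.append(examples[:k_shot]);  y_tr += [label] * k_shot
        x_te.append(examples[k_shot:]);  y_te += [label] * q_queries
    return (torch.cat(x_tr), torch.tensor(y_tr),
            torch.cat(x_te), torch.tensor(y_te))
```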
Model-Agnostic Meta-Learning (MAML)
Model-Agnostic Meta-Learning (MAML) aims to find a good initial model
parameter that can rapidly adapt to any task with only a few gradient steps.
[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017
[Figure: the initial model parameter adapts to task-specific parameters for three meta-training tasks via gradients $\nabla\mathcal{L}_1$, $\nabla\mathcal{L}_2$, $\nabla\mathcal{L}_3$.]
[Figure: at meta-test time, the same initialization rapidly adapts to a task-specific parameter for a novel task via $\nabla\mathcal{L}_{*}$.]
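As a rough sketch of this procedure (assuming a generic PyTorch classifier `model`, a batch of few-shot `tasks` as produced above, and a cross-entropy `loss_fn`; none of these names come from the authors' code), one MAML meta-training step could look like:

```python
import torch

def maml_step(model, tasks, loss_fn, meta_opt, inner_lr=0.01):
    """One MAML meta-training step (sketch, not the authors' implementation).

    For each task, adapt a functional copy of the initial parameters with
    one inner gradient step on the training split, then accumulate the
    meta-gradient from the adapted parameters' loss on the test split.
    """
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for x_tr, y_tr, x_te, y_te in tasks:
        # Inner step: theta_i = theta - alpha * grad L_i^train(theta)
        train_loss = loss_fn(
            torch.func.functional_call(model, params, (x_tr,)), y_tr)
        grads = torch.autograd.grad(train_loss, list(params.values()),
                                    create_graph=True)  # keep graph for 2nd order
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # Outer objective: loss of the adapted parameters on the test split.
        meta_loss = meta_loss + loss_fn(
            torch.func.functional_call(model, adapted, (x_te,)), y_te)
    meta_opt.zero_grad()
    meta_loss.backward()  # meta-gradient flows back through the inner step
    meta_opt.step()
```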
[Figure: starting from the shared initialization $\theta$, the inner gradient $\nabla_\theta \mathcal{L}^{\text{train}}$ yields $\theta^{*}_{\text{MAML}}$, whose decision boundary fits the training points but misses the test points.]
Beyond sharing the initial model parameter $\theta$, the MAML inner gradient
$\nabla_\theta \mathcal{L}^{\text{train}}$ involves no knowledge other than $D^{\text{train}}$,
which may result in suboptimal decision boundaries at the end of task adaptation.
MAML + Meta Dropout
[Figure: perturbing the training features with $p_\phi(\mathbf{z}|\mathbf{x})$ changes the inner gradient from $\nabla_\theta \mathcal{L}^{\text{train}}$ to $\nabla_\theta\, \mathbb{E}_{p_\phi(\mathbf{Z}|\mathbf{X})}\!\left[\mathcal{L}^{\text{train}}\right]$, so that $\theta^{*}_{\text{MetaDrop}}$ lands on a better decision boundary than $\theta^{*}_{\text{MAML}}$.]
In Meta-dropout, we introduce an input-dependent noise distribution and
compute the gradient of the expected loss over that noise distribution.
This improves the model's decision boundary at the end of task adaptation.
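Conceptually, the only change to the inner loop is that the deterministic training loss is replaced by a Monte Carlo estimate of the expected loss over the noise. A minimal sketch follows, assuming the noise branch is a submodule of `model` (so `theta` and `phi` are disjoint subsets of its named parameters) and that `forward()` accepts a `sample_noise` flag; this interface is an assumption, not the authors' API.

```python
def metadrop_inner_step(model, theta, phi, x_tr, y_tr, loss_fn,
                        inner_lr=0.01, n_samples=1):
    """Inner step on a Monte Carlo estimate of E_{p_phi(Z|X)}[L^train] (sketch).

    The number of samples is a free choice; a single sample is the
    simplest estimator.
    """
    expected_loss = 0.0
    for _ in range(n_samples):
        # Each forward pass draws z ~ p_phi(z|x) (reparameterized) and
        # perturbs the latent features with it.
        logits = torch.func.functional_call(
            model, {**theta, **phi}, (x_tr,), {"sample_noise": True})
        expected_loss = expected_loss + loss_fn(logits, y_tr) / n_samples
    # Differentiate only w.r.t. theta; the dependence on phi stays in the
    # graph so the outer update can meta-learn the perturbation.
    grads = torch.autograd.grad(expected_loss, list(theta.values()),
                                create_graph=True)
    return {name: p - inner_lr * g
            for (name, p), g in zip(theta.items(), grads)}
```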
Model Architecture
[Figure: the main model is a 4-conv network; at each layer, a noise branch with parameters $\phi$ generates the multiplicative noise $\mathbf{z}$ applied to the Conv and FC layers above.]
Each lower layer generates the noise for the layer above it, and the
multiplicative noise takes the form of a softplus transformation of a Gaussian distribution.
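A minimal sketch of one plausible reading of such a noise layer; the 1x1-conv noise branch and the fixed unit variance are assumptions for illustration, not the authors' exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiplicativeSoftplusNoise(nn.Module):
    """Input-dependent multiplicative noise: z = softplus(mu(h) + eps), eps ~ N(0, I)."""

    def __init__(self, channels):
        super().__init__()
        # Noise branch (phi): predicts the Gaussian mean from the lower
        # layer's feature map h.
        self.mu = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, h, sample_noise=True):
        mu = self.mu(h)
        # At evaluation time, use the deterministic value instead of sampling.
        eps = torch.randn_like(mu) if sample_noise else torch.zeros_like(mu)
        z = F.softplus(mu + eps)  # softplus keeps the multiplicative mask positive
        return h * z              # perturbed features fed to the layer above
```

Here `h` would be the activation of one conv block, and the product `h * z` is what the next block consumes.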
Learning Objective
Meta-learning → maximize the performance on the test examples.

$$\max_{\theta,\,\phi}\;\underbrace{\log p\!\left(\mathbf{Y}^{\text{test}}\mid \mathbf{X}^{\text{test}};\,\theta^{*}\right)}_{\text{test log-likelihood (no perturbation)}},\qquad \theta^{*} \;=\; \theta-\alpha\,\underbrace{\nabla_\theta\,\mathbb{E}_{p_\phi(\mathbf{Z}\mid\mathbf{X})}\!\left[\mathcal{L}^{\text{train}}\right]}_{\text{inner-gradient step}}$$
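Putting the pieces together, one meta-training step might look like the following sketch (reusing the assumed `sample_noise` interface and the `metadrop_inner_step` sketch above). Note that the test examples are evaluated without perturbation, matching the "no perturbation" annotation in the objective:

```python
def metadrop_meta_step(model, tasks, theta, phi, loss_fn, meta_opt,
                       inner_lr=0.01):
    """One Meta-dropout meta-training step (sketch)."""
    meta_loss = 0.0
    for x_tr, y_tr, x_te, y_te in tasks:
        # Inner step: adapt theta on the expected perturbed training loss.
        adapted = metadrop_inner_step(model, theta, phi, x_tr, y_tr,
                                      loss_fn, inner_lr)
        # Outer objective: test negative log-likelihood with NO perturbation.
        logits = torch.func.functional_call(
            model, {**adapted, **phi}, (x_te,), {"sample_noise": False})
        meta_loss = meta_loss + loss_fn(logits, y_te)
    meta_opt.zero_grad()
    meta_loss.backward()  # updates both the initialization theta and phi
    meta_opt.step()
```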
Generalization Performance
miniImageNet 5-way (accuracy)   MAML     Meta-dropout
1-shot                          49.58%   51.93%
5-shot                          64.55%   67.42%
[Figure: decision boundaries over training examples (Train 1, Train 2) and test examples, comparing MAML and Meta-dropout.]
Visualization of Stochastic Features
[Figure: original images alongside two stochastic feature channels (Stochastic Channel 1, Stochastic Channel 2).]
Comparison against Existing Regularizers
Models (accuracy, %)                Omniglot          miniImageNet
                                    1-shot   5-shot   1-shot   5-shot
No Perturbation                     95.23    98.38    49.58    64.55
Manifold Mixup                      89.78    97.86    48.62    63.86
Variational Information Bottleneck  94.98    98.85    48.12    64.78
Information Dropout                 94.49    98.65    50.36    65.91
Meta-dropout                        96.63    99.04    51.93    67.42
Meta-dropout outperforms existing regularizers such as Manifold Mixup
and the information-theoretic regularizers.
Adversarial Robustness
Meta-dropout improves both clean and adversarial accuracy.
[Figure: clean and adversarial accuracy under an $L_\infty$-norm attack, Omniglot 20-way 1-shot, comparing Meta-dropout with MAML.]
Adversarial Robustness
The defense of Meta-dropout also generalizes across different attacks.
[Figure: accuracy under $L_\infty$-, $L_1$-, and $L_2$-norm attacks, Omniglot 20-way 1-shot.]
Summary
• In this work, we showed that we can learn to perturb latent features in an
input-dependent manner in order to improve generalization.
• The meta-learning framework enables effective learning of the perturbation
function.
• Meta-dropout outperforms existing regularizers in meta-learning settings.
• Meta-dropout improves both clean and adversarial accuracy under various
types of attacks.