Meta-GMVAE: Mixture of Gaussian VAEs for
Unsupervised Meta-Learning
Dong Bok Lee1, Dongchan Min1, Seanie Lee1, and Sung Ju Hwang1,2
KAIST1, AITRICS2, Seoul
Introduction
Unsupervised learning aims to learn meaningful representations from unlabeled data
that can be transferred to downstream tasks.
Illustration: unsupervised learning maps an image to a representation, which transfers to downstream tasks such as image recognition and image segmentation.
Introduction
Meta-learning shares the spirit of unsupervised learning in that both seek to learn a more
effective learning procedure than learning from scratch.
Illustration: unsupervised learning maps images to representations for downstream tasks (image recognition, image segmentation, ...), while meta-learning trains a model across tasks (e.g., Task 1: Tiger vs. Cat, Task 2: Car vs. Bicycle, ...).
Introduction
The fundamental difference between the two is that most meta-learning approaches are
supervised, assuming full access to labels.
Illustration: Prototypical Networks [1] and MAML [2], both of which rely on supervision.
[1] Jake Snell, Kevin Swersky, and Richard S. Zemel: “Prototypical Networks for Few-shot Learning”, NeurIPS 2017
[2] Chelsea Finn, Pieter Abbeel, and Sergey Levine: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, ICML 2017
Introduction
Due to this assumption of supervision, existing meta-learning methods have a limitation:
they require massive amounts of human effort for labeling.
Illustration: images manually labeled as Cat, Dog, and Car.
Introduction
To overcome this, two recent works [3, 4] have proposed unsupervised meta-learning:
Illustration: an unlabeled dataset is used for unsupervised meta-training, followed by a supervised meta-test over Task 1, ..., Task N, each with a support set and a query set.
Introduction
They focus on constructing a supervised meta-training dataset by pseudo-labeling via
clustering or augmentation.
Illustration: the same pipeline, where pseudo-labeling turns the unlabeled dataset into supervised meta-training tasks (Task 1, ..., Task N with support and query sets) before the supervised meta-test.
Method
In this work, we focus on developing a principled unsupervised meta-learning
method, namely Meta-GMVAE.
The main idea is to bridge the gap between the process of unsupervised meta-training
and that of the supervised meta-test.
Method (Unsupervised Meta-training)
Specifically, we start from the Variational Autoencoder [5], whose prior is modeled as a
Gaussian mixture.
The assumption is that each mode of the mixture can represent a label at meta-test time.
Its generative process is as follows:
1. Draw a mixture component: y ~ Cat(π)
2. Draw a latent code: z ~ N(μ_y, Σ_y)
3. Generate an observation: x ~ p_θ(x | z)
[5] Diederik P. Kingma and Max Welling: “Auto-Encoding Variational Bayes”, ICLR 2014
A graphical illustration of GMVAE
Method (Unsupervised Meta-training)
However, a difference from the previous work on GMVAE [6] is that they fix the prior
parameters, since they target single-task learning.
[6] Nat Dilokthanakul et al.: “Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders”, arXiv 2016
1. y ~ Cat(π)
2. z ~ N(μ_y, Σ_y)
3. x ~ p_θ(x | z)
A graphical illustration of GMVAE; the prior parameters in steps 1 and 2 are the part that is fixed in [6]
Method (Unsupervised Meta-training)
To learn set-dependent multi-modality, we assume that a separate prior parameter exists
for each randomly drawn episode dataset.
A graphical illustration of Meta-GMVAE
A graphical illustration of GMVAE
Method (Unsupervised Meta-training)
Then we derive the following variational lower bound for the marginal log-likelihood:
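The bound itself appeared as an image on the slide. As a sketch in notation introduced here (an episode dataset X = {x_1, ..., x_M} with latent variables Z), the standard form is

\[
\log p_\theta(X) \;\ge\; \mathbb{E}_{q_\phi(Z \mid X)}\!\left[\log p_\theta(X \mid Z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(Z \mid X)\,\|\,p(Z)\right).
\]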
Method (Unsupervised Meta-training)
Here we use the i.i.d. assumption on the data log-likelihood, where the number of
datapoints is M.
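The factorization was shown as an image; in the same assumed notation, the i.i.d. assumption reads

\[
p_\theta(X \mid Z) = \prod_{i=1}^{M} p_\theta(x_i \mid z_i),
\qquad
\log p_\theta(X \mid Z) = \sum_{i=1}^{M} \log p_\theta(x_i \mid z_i).
\]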
Method (Unsupervised Meta-training)
Then we introduce a Gaussian mixture prior and a set-dependent variational posterior,
where z is the latent variable.
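The exact forms were shown as images; a sketch under assumed notation, with K mixture components and set-dependent prior parameters ψ = {π_k, μ_k}:

\[
p(z \,;\, \psi) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(z \,;\, \mu_k, I\right),
\qquad
q_\phi(z_i \mid x_i, X) \ \text{ for each } x_i \in X.
\]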
Method (Unsupervised Meta-training)
Here we model a set-dependent posterior to encode each datapoint of the given dataset
into the latent space.
Specifically, this is implemented with the Transformer encoder proposed by Vaswani et al. [7].
[7] Ashish Vaswani et al.: “Attention is All You Need”, NeurIPS 2017
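As a minimal sketch (my own illustration, not the authors' code), a Transformer encoder can produce a set-dependent Gaussian posterior by letting every datapoint attend to the rest of the episode; the dimensions and layer sizes below are assumptions.

import torch
import torch.nn as nn

class SetPosterior(nn.Module):
    # Maps an episode of features to per-datapoint posterior parameters (mu, log-variance).
    def __init__(self, in_dim, latent_dim, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, x_set):                 # x_set: (batch, M, in_dim)
        h = self.encoder(self.embed(x_set))   # each element attends to the whole set
        return self.to_mu(h), self.to_logvar(h)

# usage sketch: mu, logvar = SetPosterior(512, 64)(features)  # features: (1, M, 512)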
Method (Unsupervised Meta-training)
Then we derive the variational lower bound as follows:
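The bound was again an image; combining the i.i.d. likelihood and the mixture prior above, a sketch of the resulting lower bound is

\[
\mathcal{L}(\theta, \phi, \psi; X)
= \sum_{i=1}^{M} \Big( \mathbb{E}_{q_\phi(z_i \mid x_i, X)}\!\left[\log p_\theta(x_i \mid z_i)\right]
- D_{\mathrm{KL}}\!\left(q_\phi(z_i \mid x_i, X)\,\|\,p(z_i \,;\, \psi)\right) \Big).
\]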
Method (Unsupervised Meta-training)
Finally, we estimate the variational lower bound with Monte Carlo estimation,
where N is the number of Monte Carlo samples.
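Since the KL divergence to a Gaussian mixture has no closed form, the estimator (shown as an image on the slide) can be sketched as

\[
D_{\mathrm{KL}}\!\left(q_\phi(z_i \mid x_i, X)\,\|\,p(z_i \,;\, \psi)\right)
\approx \frac{1}{N} \sum_{n=1}^{N}
\left[ \log q_\phi\!\left(z_i^{(n)} \mid x_i, X\right) - \log p\!\left(z_i^{(n)} \,;\, \psi\right) \right],
\qquad z_i^{(n)} \sim q_\phi(z_i \mid x_i, X).
\]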
Method (Unsupervised Meta-training)
In our setting, the prior parameter characterizes the given dataset.
To obtain the parameter that optimally explains the given dataset, we propose to locally
maximize the lower bound with respect to it.
This leads to the MLE solution of the prior distribution.
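Only the prior term of the bound depends on ψ, so in the assumed notation the local maximization amounts to

\[
\psi^{*} = \arg\max_{\psi}\; \mathcal{L}(\theta, \phi, \psi; X)
= \arg\max_{\psi}\; \sum_{i=1}^{M} \frac{1}{N} \sum_{n=1}^{N} \log p\!\left(z_i^{(n)} \,;\, \psi\right),
\]

i.e., a maximum-likelihood fit of the Gaussian mixture to the latent Monte Carlo samples.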
Method (Unsupervised Meta-training)
However, we do not have an analytic MLE solution for a GMM.
To this end, we propose to obtain the optimal prior parameters using the EM algorithm,
where we tie the covariance matrices to the identity matrix.
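As a minimal sketch (my own illustration, not the paper's code) of EM for a K-component GMM with identity covariance, fitted to latent samples z of shape (n_samples, d):

import numpy as np

def em_identity_gmm(z, K, n_iters=10):
    n, d = z.shape
    pi = np.full(K, 1.0 / K)
    mu = z[np.random.choice(n, K, replace=False)]        # initialize means from the data
    for _ in range(n_iters):
        # E-step: responsibilities under N(z; mu_k, I); constants cancel in the normalization
        logits = -0.5 * ((z[:, None, :] - mu[None]) ** 2).sum(-1) + np.log(pi)
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)           # (n, K)
        # M-step: closed-form updates for mixing weights and means
        Nk = resp.sum(axis=0)
        pi = Nk / n
        mu = (resp.T @ z) / Nk[:, None]
    return pi, mu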
Method (Unsupervised Meta-training)
Then the training objective of Meta-GMVAE is the variational lower bound evaluated at the
prior parameters obtained by performing the EM algorithm on the latent Monte Carlo samples.
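As a sketch of the objective (the actual formula was an image): with ψ*(X) obtained by EM as above, the encoder and decoder are trained by maximizing the bound across sampled episodes,

\[
\max_{\theta, \phi}\ \mathbb{E}_{X}\!\left[\, \mathcal{L}\!\left(\theta, \phi, \psi^{*}(X); X\right) \right].
\]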
Method (Supervised Meta-test)
However, there is no guarantee that each mode obtained by the EM algorithm
corresponds to a label.
To overcome this, we use the support set as observations for the EM algorithm.
A graphical illustration of predicting labels
A graphical illustration of Meta-GMVAE
Method (Supervised Meta-test)
Then, we perform semi-supervised EM to obtain the prior parameters,
where the support set is treated as labeled data.
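The update equations were shown as an image; a common semi-supervised EM variant, which I assume is close to what is intended, fixes the responsibilities of support latents to their one-hot labels and keeps soft responsibilities for unlabeled latents:

\[
r_{ik} =
\begin{cases}
\mathbf{1}[y_i = k], & z_i \text{ from a labeled support example},\\[4pt]
\dfrac{\pi_k\, \mathcal{N}(z_i \,;\, \mu_k, I)}{\sum_{j} \pi_j\, \mathcal{N}(z_i \,;\, \mu_j, I)}, & \text{otherwise},
\end{cases}
\qquad
\pi_k = \frac{\sum_i r_{ik}}{\sum_{i,k'} r_{ik'}},
\qquad
\mu_k = \frac{\sum_i r_{ik}\, z_i}{\sum_i r_{ik}}.
\]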
Method (Supervised Meta-test)
Finally, we predict the labels of the query set using the aggregated posterior,
where we reuse the Monte Carlo samples.
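The prediction rule was shown as an image; a sketch under the assumed notation, averaging mixture responsibilities over the reused Monte Carlo samples z^{(n)} of a query point x:

\[
\hat{y}(x) = \arg\max_{k}\ \frac{1}{N} \sum_{n=1}^{N}
\frac{\pi_k\, \mathcal{N}\!\left(z^{(n)} \,;\, \mu_k, I\right)}{\sum_{j} \pi_j\, \mathcal{N}\!\left(z^{(n)} \,;\, \mu_j, I\right)},
\qquad z^{(n)} \sim q_\phi(z \mid x, X).
\]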
Method (SimCLR)
The ability to generate samples may not be necessary for discriminative tasks.
Moreover, it is known to be challenging to train generative models on real data.
Therefore, we propose to use features pretrained with SimCLR [8] as inputs.
[8] Ting Chen et al.: “A Simple Framework for Contrastive Learning of Visual Representations”, ICML 2020
A graphical illustration of SimCLR
Image recognition accuracy on ImageNet
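As a minimal sketch of this input pipeline (an assumed setup, not the authors' code): a frozen SimCLR-pretrained backbone turns raw images into features, and Meta-GMVAE is trained on these features instead of pixels.

import torch

@torch.no_grad()
def extract_features(simclr_backbone, images):   # images: (M, C, H, W); backbone name is hypothetical
    simclr_backbone.eval()                       # the pretrained backbone stays frozen
    return simclr_backbone(images)               # (M, feature_dim) features fed to Meta-GMVAE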
Experimental Setups (Dataset)
We ran experiments using two benchmark datasets:
1) Omniglot dataset
28 x 28 resolution
grayscale
1200 meta-training classes
323 meta-test classes
Experimental Setups (Dataset)
We ran experiments using two benchmark datasets:
2) Mini-ImageNet dataset
84 x 84 resolution
RGB
64 meta-training classes
20 meta-test classes
Experimental Setups (Baselines)
We compare our Meta-GMVAE with four baselines:
1) Prototypical Networks (oracle) [1]: metric-based meta-learning with supervision.
2) MAML (oracle) [2]: gradient-based meta-learning with supervision.
3) CACTUs [3]: constructs pseudo-tasks by clustering in a deep embedding space.
4) UMTRA [4]: constructs pseudo-tasks using augmentation.
[1] Jake Snell, Kevin Swersky, and Richard S. Zemel: “Prototypical Networks for Few-shot Learning”, NeurIPS 2017
[2] Chelsea Finn, Pieter Abbeel, and Sergey Levine: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, ICML 2017
[3] Kyle Hsu, Sergey Levine, and Chelsea Finn: “Unsupervised Learning via Meta-Learning”, ICLR 2019
[4] Siavash Khodadadeh, Ladislau Bölöni, and Mubarak Shah: “Unsupervised Meta-Learning for Few-shot Image Classification”, NeurIPS 2019
Experimental Results (Few-shot Classification)
Meta-GMVAE obtains better performance than existing unsupervised methods in 5
settings and matches their performance in 3 settings.
The few-shot classification results (way, shot) on the Omniglot and Mini-ImageNet datasets
Experimental Results (Few-shot Classification)
Interestingly, Meta-GMVAE even obtains better performance than MAML in a certain
setting (i.e., (5, 1) on Omniglot), while utilizing as little as 0.1% of the labels.
The few-shot classification results (way, shot) on the Omniglot and Mini-ImageNet datasets
Experimental Results (Visualization)
The figures below show how Meta-GMVAE learns and realizes class concepts.
At meta-training, Meta-GMVAE captures similar visual structures but not class concepts.
However, it easily recovers the class concepts at meta-test.
The samples obtained and generated for each mode of Meta-GMVAE
Experimental Results (Ablation Study)
We conduct an ablation study on Meta-GMVAE by eliminating each component.
The results below show that all the components are critical to performance.
The results of the ablation study on Meta-GMVAE
Experimental Results (Cross-way)
In more realistic settings, we cannot know the way (number of classes) at meta-test time.
Meta-GMVAE also shows robustness in the cross-way classification experiments.
The results of the cross-way 1-shot experiments on the Omniglot dataset
Experimental Results (Cross-way Visualization)
We further visualize the latent space for the cross-way generalization experiment.
The figure below shows that Meta-GMVAE trained on 20-way tasks can cluster a 5-way task.
The visualization of the latent space
Conclusion
1. We propose a principled meta-learning model, namely Meta-GMVAE, which meta-
learns the set-conditioned prior and posterior network for a VAE.
2. We propose to learn the multi-modal structure of a given dataset with the Gaussian
mixture prior, such that it can adapt to a novel dataset via the EM algorithm.
3. We show that Meta-GMVAE largely outperforms relevant unsupervised meta-
learning baselines on two benchmark datasets, while obtaining even better
performance than a supervised meta-learning model under a specific setting.
