Meta-GMVAE: Mixture of Gaussian VAEs for
Unsupervised Meta-Learning
Dong Bok Lee1, Dongchan Min1, Seanie Lee1, and Sung Ju Hwang1,2
KAIST1, AITRICS2, Seoul
Introduction
Unsupervised learning aims to learn meaningful representations from unlabeled data
that can be transferred to downstream tasks.
Illustration: unsupervised learning maps an image to a representation, which transfers to downstream tasks such as image recognition and image segmentation.
Introduction
Meta-learning shares the spirit of unsupervised learning in that both seek to learn a more
effective learning procedure than learning from scratch.
Illustration: unsupervised learning maps images to representations for downstream tasks (image recognition, image segmentation, ...), while meta-learning trains a model across tasks (e.g., Task 1: Tiger vs. Cat, Task 2: Car vs. Bicycle, ...).
Introduction
The fundamental difference between the two is that most meta-learning approaches are
supervised, assuming full access to labels.
Illustration: Prototypical Networks [1] and MAML [2], both of which rely on supervision.
[1] Jake Snell, Kevin Swersky, and Richard S. Zemel: “Prototypical Networks for Few-shot Learning”, NeurIPS 2017
[2] Chelsea Finn, Pieter Abbeel, and Sergey Levine: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, ICML 2017
Introduction
Due to this assumption of supervision, existing meta-learning methods have a limitation:
they require massive amounts of human effort for labeling.
Illustration: images manually labeled as Cat, Dog, and Car.
Introduction
To overcome this, two recent works [3, 4] have proposed unsupervised meta-learning:
Illustration: an unlabeled dataset is used for unsupervised meta-training, followed by a supervised meta-test over Task 1, ..., Task N, each with a support set and a query set.
Introduction
They focus on constructing a supervised meta-training dataset by pseudo-labeling via
clustering or augmentation.
Illustration: the same pipeline, where pseudo-labeling turns the unlabeled dataset into supervised meta-training tasks (Task 1, ..., Task N with support and query sets) before the supervised meta-test.
Method
In this work, we focus on developing a principled unsupervised meta-learning
method, namely Meta-GMVAE.
The main idea is to bridge the gap between the process of unsupervised meta-training
and that of the supervised meta-test.
Method (Unsupervised Meta-training)
Specifically, we start from the Variational Autoencoder [5], whose prior is modeled as a
Gaussian mixture.
The assumption is that each mode of the mixture can represent a label at meta-test time.
Its generative process is as follows:
1. Draw a mixture component: y ~ Cat(π)
2. Draw a latent code: z ~ N(μ_y, Σ_y)
3. Generate an observation: x ~ p_θ(x | z)
[5] Diederik P. Kingma and Max Welling: “Auto-Encoding Variational Bayes”, ICLR 2014
A graphical illustration of GMVAE
Method (Unsupervised Meta-training)
However, a difference from the previous work on GMVAE [6] is that they fix the prior
parameters, since they target single-task learning.
[6] Nat Dilokthanakul et al.: “Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders”, arXiv 2016
1. y ~ Cat(π)
2. z ~ N(μ_y, Σ_y)
3. x ~ p_θ(x | z)
A graphical illustration of GMVAE; the prior parameters in steps 1 and 2 are the part that is fixed in [6]
Method (Unsupervised Meta-training)
To learn set-dependent multi-modality, we assume that a separate prior parameter exists
for each randomly drawn episode dataset.
A graphical illustration of Meta-GMVAE
A graphical illustration of GMVAE
Method (Unsupervised Meta-training)
Then we derive the following variational lower bound for the marginal log-likelihood:
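The bound itself appeared as an image on the slide. As a sketch in notation introduced here (an episode dataset X = {x_1, ..., x_M} with latent variables Z), the standard form is

\[
\log p_\theta(X) \;\ge\; \mathbb{E}_{q_\phi(Z \mid X)}\!\left[\log p_\theta(X \mid Z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(Z \mid X)\,\|\,p(Z)\right).
\]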
Method (Unsupervised Meta-training)
Here we use the i.i.d. assumption on the data log-likelihood, where the number of
datapoints is M.
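The factorization was shown as an image; in the same assumed notation, the i.i.d. assumption reads

\[
p_\theta(X \mid Z) = \prod_{i=1}^{M} p_\theta(x_i \mid z_i),
\qquad
\log p_\theta(X \mid Z) = \sum_{i=1}^{M} \log p_\theta(x_i \mid z_i).
\]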
Method (Unsupervised Meta-training)
Then we introduce a Gaussian mixture prior and a set-dependent variational posterior,
where z is the latent variable.
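The exact forms were shown as images; a sketch under assumed notation, with K mixture components and set-dependent prior parameters ψ = {π_k, μ_k}:

\[
p(z \,;\, \psi) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(z \,;\, \mu_k, I\right),
\qquad
q_\phi(z_i \mid x_i, X) \ \text{ for each } x_i \in X.
\]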
Method (Unsupervised Meta-training)
Here we model a set-dependent posterior to encode each datapoint of the given dataset
into the latent space.
Specifically, this is implemented with the Transformer encoder proposed by Vaswani et al. [7].
[7] Ashish Vaswani et al.: “Attention is All You Need”, NeurIPS 2017
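As a minimal sketch (my own illustration, not the authors' code), a Transformer encoder can produce a set-dependent Gaussian posterior by letting every datapoint attend to the rest of the episode; the dimensions and layer sizes below are assumptions.

import torch
import torch.nn as nn

class SetPosterior(nn.Module):
    # Maps an episode of features to per-datapoint posterior parameters (mu, log-variance).
    def __init__(self, in_dim, latent_dim, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, x_set):                 # x_set: (batch, M, in_dim)
        h = self.encoder(self.embed(x_set))   # each element attends to the whole set
        return self.to_mu(h), self.to_logvar(h)

# usage sketch: mu, logvar = SetPosterior(512, 64)(features)  # features: (1, M, 512)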
Method (Unsupervised Meta-training)
Then we derive the variational lower bound as follows:
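The bound was again an image; combining the i.i.d. likelihood and the mixture prior above, a sketch of the resulting lower bound is

\[
\mathcal{L}(\theta, \phi, \psi; X)
= \sum_{i=1}^{M} \Big( \mathbb{E}_{q_\phi(z_i \mid x_i, X)}\!\left[\log p_\theta(x_i \mid z_i)\right]
- D_{\mathrm{KL}}\!\left(q_\phi(z_i \mid x_i, X)\,\|\,p(z_i \,;\, \psi)\right) \Big).
\]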
Method (Unsupervised Meta-training)
Finally, we estimate the variational lower bound with Monte Carlo estimation,
where N is the number of Monte Carlo samples.
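Since the KL divergence to a Gaussian mixture has no closed form, the estimator (shown as an image on the slide) can be sketched as

\[
D_{\mathrm{KL}}\!\left(q_\phi(z_i \mid x_i, X)\,\|\,p(z_i \,;\, \psi)\right)
\approx \frac{1}{N} \sum_{n=1}^{N}
\left[ \log q_\phi\!\left(z_i^{(n)} \mid x_i, X\right) - \log p\!\left(z_i^{(n)} \,;\, \psi\right) \right],
\qquad z_i^{(n)} \sim q_\phi(z_i \mid x_i, X).
\]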
Method (Unsupervised Meta-training)
In our setting, the prior parameter characterizes the given dataset.
To obtain the parameter that optimally explains the given dataset, we propose to locally
maximize the lower bound with respect to it.
This leads to the MLE solution of the prior distribution.
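Only the prior term of the bound depends on ψ, so in the assumed notation the local maximization amounts to

\[
\psi^{*} = \arg\max_{\psi}\; \mathcal{L}(\theta, \phi, \psi; X)
= \arg\max_{\psi}\; \sum_{i=1}^{M} \frac{1}{N} \sum_{n=1}^{N} \log p\!\left(z_i^{(n)} \,;\, \psi\right),
\]

i.e., a maximum-likelihood fit of the Gaussian mixture to the latent Monte Carlo samples.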
Method (Unsupervised Meta-training)
However, we do not have an analytic MLE solution for a GMM.
To this end, we propose to obtain the optimal prior parameters using the EM algorithm,
where we tie the covariance matrices to the identity matrix.
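As a minimal sketch (my own illustration, not the paper's code) of EM for a K-component GMM with identity covariance, fitted to latent samples z of shape (n_samples, d):

import numpy as np

def em_identity_gmm(z, K, n_iters=10):
    n, d = z.shape
    pi = np.full(K, 1.0 / K)
    mu = z[np.random.choice(n, K, replace=False)]        # initialize means from the data
    for _ in range(n_iters):
        # E-step: responsibilities under N(z; mu_k, I); constants cancel in the normalization
        logits = -0.5 * ((z[:, None, :] - mu[None]) ** 2).sum(-1) + np.log(pi)
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)           # (n, K)
        # M-step: closed-form updates for mixing weights and means
        Nk = resp.sum(axis=0)
        pi = Nk / n
        mu = (resp.T @ z) / Nk[:, None]
    return pi, mu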
Method (Unsupervised Meta-training)
Then the training objective of Meta-GMVAE is the variational lower bound evaluated at the
prior parameters obtained by performing the EM algorithm on the latent Monte Carlo samples.
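As a sketch of the objective (the actual formula was an image): with ψ*(X) obtained by EM as above, the encoder and decoder are trained by maximizing the bound across sampled episodes,

\[
\max_{\theta, \phi}\ \mathbb{E}_{X}\!\left[\, \mathcal{L}\!\left(\theta, \phi, \psi^{*}(X); X\right) \right].
\]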
Method (Supervised Meta-test)
However, there is no guarantee that each mode obtained by the EM algorithm
corresponds to a label.
To overcome this, we use the support set as observations for the EM algorithm.
A graphical illustration of predicting labels
A graphical illustration of Meta-GMVAE
Method (Supervised Meta-test)
Then, we perform semi-supervised EM to obtain the prior parameters,
where the support set is treated as labeled data.
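The update equations were shown as an image; a common semi-supervised EM variant, which I assume is close to what is intended, fixes the responsibilities of support latents to their one-hot labels and keeps soft responsibilities for unlabeled latents:

\[
r_{ik} =
\begin{cases}
\mathbf{1}[y_i = k], & z_i \text{ from a labeled support example},\\[4pt]
\dfrac{\pi_k\, \mathcal{N}(z_i \,;\, \mu_k, I)}{\sum_{j} \pi_j\, \mathcal{N}(z_i \,;\, \mu_j, I)}, & \text{otherwise},
\end{cases}
\qquad
\pi_k = \frac{\sum_i r_{ik}}{\sum_{i,k'} r_{ik'}},
\qquad
\mu_k = \frac{\sum_i r_{ik}\, z_i}{\sum_i r_{ik}}.
\]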
Method (Supervised Meta-test)
Finally, we predict the labels of the query set using the aggregated posterior,
where we reuse the Monte Carlo samples.
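The prediction rule was shown as an image; a sketch under the assumed notation, averaging mixture responsibilities over the reused Monte Carlo samples z^{(n)} of a query point x:

\[
\hat{y}(x) = \arg\max_{k}\ \frac{1}{N} \sum_{n=1}^{N}
\frac{\pi_k\, \mathcal{N}\!\left(z^{(n)} \,;\, \mu_k, I\right)}{\sum_{j} \pi_j\, \mathcal{N}\!\left(z^{(n)} \,;\, \mu_j, I\right)},
\qquad z^{(n)} \sim q_\phi(z \mid x, X).
\]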
Method (SimCLR)
The ability to generate samples may not be necessary for discriminative tasks.
Moreover, it is known to be challenging to train generative models on real data.
Therefore, we propose to use features pretrained with SimCLR [8] as inputs.
[8] Ting Chen et al.: “A Simple Framework for Contrastive Learning of Visual Representations”, ICML 2020
A graphical illustration of SimCLR
Image recognition accuracy on ImageNet
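As a minimal sketch of this input pipeline (an assumed setup, not the authors' code): a frozen SimCLR-pretrained backbone turns raw images into features, and Meta-GMVAE is trained on these features instead of pixels.

import torch

@torch.no_grad()
def extract_features(simclr_backbone, images):   # images: (M, C, H, W); backbone name is hypothetical
    simclr_backbone.eval()                       # the pretrained backbone stays frozen
    return simclr_backbone(images)               # (M, feature_dim) features fed to Meta-GMVAE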
Experimental Setups (Dataset)
We ran experiments using two benchmark datasets:
1) Omniglot dataset
28 x 28 resolution
grayscale
1200 meta-training classes
323 meta-test classes
Experimental Setups (Dataset)
We ran experiments using two benchmark datasets:
2) Mini-ImageNet dataset
84 x 84 resolution
RGB
64 meta-training classes
20 meta-test classes
Experimental Setups (Baselines)
We compare our Meta-GMVAE with four baselines:
1) Prototypical Networks (oracle) [1]: metric-based meta-learning with supervision.
2) MAML (oracle) [2]: gradient-based meta-learning with supervision.
3) CACTUs [3]: constructs pseudo-tasks by clustering in a deep embedding space.
4) UMTRA [4]: constructs pseudo-tasks using augmentation.
[1] Jake Snell, Kevin Swersky, and Richard S. Zemel: “Prototypical Networks for Few-shot Learning”, NeurIPS 2017
[2] Chelsea Finn, Pieter Abbeel, and Sergey Levine: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, ICML 2017
[3] Kyle Hsu, Sergey Levine, and Chelsea Finn: “Unsupervised Learning via Meta-Learning”, ICLR 2019
[4] Siavash Khodadadeh, Ladislau Bölöni, and Mubarak Shah: “Unsupervised Meta-Learning for Few-shot Image Classification”, NeurIPS 2019
Experimental Results (Few-shot Classification)
Meta-GMVAE obtains better performance than existing unsupervised methods in 5
settings and matches their performance in 3 settings.
The few-shot classification results (way, shot) on the Omniglot and Mini-ImageNet datasets
Experimental Results (Few-shot Classification)
Interestingly, Meta-GMVAE even obtains better performance than MAML in a certain
setting (i.e., (5, 1) on Omniglot), while utilizing as little as 0.1% of the labels.
The few-shot classification results (way, shot) on the Omniglot and Mini-ImageNet datasets
Experimental Results (Visualization)
The figures below show how Meta-GMVAE learns and realizes class concepts.
At meta-training, Meta-GMVAE captures similar visual structures but not class concepts.
However, it easily recovers the class concepts at meta-test.
The samples obtained and generated for each mode of Meta-GMVAE
Experimental Results (Ablation Study)
We conduct an ablation study on Meta-GMVAE by eliminating each component.
The results below show that all the components are critical to performance.
The results of the ablation study on Meta-GMVAE
Experimental Results (Cross-way)
In more realistic settings, we cannot know the way (number of classes) at meta-test time.
Meta-GMVAE also shows robustness in the cross-way classification experiments.
The results of the cross-way 1-shot experiments on the Omniglot dataset
Experimental Results (Cross-way Visualization)
We further visualize the latent space for the cross-way generalization experiment.
The figure below shows that Meta-GMVAE trained on 20-way tasks can cluster a 5-way task.
The visualization of the latent space
Conclusion
1. We propose a principled meta-learning model, namely Meta-GMVAE, which meta-
learns the set-conditioned prior and posterior network for a VAE.
2. We propose to learn the multi-modal structure of a given dataset with the Gaussian
mixture prior, such that it can adapt to a novel dataset via the EM algorithm.
3. We show that Meta-GMVAE largely outperforms relevant unsupervised meta-
learning baselines on two benchmark datasets, while obtaining even better
performance than a supervised meta-learning model under a specific setting.
