Review : Prototype Mixture Models for Few-shot Semantic Segmentation

Prototype Mixture Models
for Few-shot Semantic Segmentation
University of Chinese Academy of Sciences, Beijing, China
Yonsei University Severance Hospital CCIDS
Choi Dongmin

Abstract
• Few-shot segmentation 
- challenging 
- single prototype from the support image causes semantic ambiguity
• Prototype mixture models (PMMs) 
- correlate diverse image regions with multiple prototypes 
- leverage the semantics to activate objects in the query image 
- state-of-the-art results on Pascal VOC and MS-COCO

Introduction
Nguyen et al. Feature Weighting and Boosting for Few-Shot Segmentation. ICCV 2019
Few-shot Segmentation
Segmenting the Query image based on a feature representation learned on training images
given Support images and the related segmentation Support masks

Introduction
Single Prototype Model vs Prototype Mixture Model
A single prototype causes "semantic ambiguity" and deteriorates the distribution of features.

PMMs focus on solving the semantic ambiguity problem.

Introduction
Prototype Mixture Model
Expectation-Maximization (EM) algorithm 
treats each feature vector within the mask region as a positive sample
Mixed prototypes
Diverse foreground regions

Related Works
Semantic Segmentation
Chen et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2018
State-of-the-art methods: UNet, PSPNet, DeepLab

Related Works
Few-shot learning
• Metric Learning 
- train networks to predict whether two images/regions belong to the
same category
• Meta-learning 
- specify optimization or loss functions which force faster adaptation
of the parameters to new categories with few examples

• Data Augmentation 
- generate additional examples for unseen categories

Related Works
Few-shot learning
• Metric Learning 
Chen et al. A Closer Look at Few-shot Classification. ICLR 2019
simple prototypes for each class, which capture representative and discriminative features

Related Works
• Largely following the Metric Learning framework 
- Feed learned knowledge to a metric module to segment query images
Shaban et al. One-Shot Learning for Semantic Segmentation. BMVC 2017
OSLSM (two-branch network)
Support branch
Query branch

Related Works
Zhang et al. SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation. CoRR abs/1810.09091 (2018)
SG-One, which uses a prototype vector
Prototype vector

Related Works
Wang et al. PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment. ICCV 2019
PANet w/ a prototype alignment regularization between support and query branches

Related Works
• Metric Learning in few-shot segmentation 
- The core component is the prototype vector, commonly computed by global average pooling (GAP) 
- However, GAP typically disregards the spatial extent of objects and 
tends to mix semantics from various parts 
- With a single prototype representing each object region, 
the semantic ambiguity problem remains unsolved
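
To make the "mixing" concrete, here is a minimal NumPy sketch of a single prototype obtained by averaging support features inside the mask. The function name `masked_average_prototype`, the (H, W, C) layout, and the toy feature map are assumptions for illustration, not code from the reviewed paper.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Average the (H, W, C) feature map over foreground positions of the (H, W) mask.

    Returns one (C,) prototype vector; all spatial layout is discarded.
    """
    mask = mask.astype(features.dtype)
    total = (features * mask[..., None]).sum(axis=(0, 1))
    count = max(float(mask.sum()), 1.0)  # guard against an empty mask
    return total / count

# Toy support features: the left half responds on channel 0 (one object part),
# the right half responds on channel 1 (another object part).
features = np.zeros((2, 4, 2))
features[:, :2] = [1.0, 0.0]
features[:, 2:] = [0.0, 1.0]
mask = np.ones((2, 4))

prototype = masked_average_prototype(features, mask)
```

In this toy case the single prototype lands halfway between the two part features and matches neither part exactly, which is one way to picture the ambiguity described above.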

The Proposed Approach
Overview

Overview
Support branch
Query branch
Negative sample set S−
Positive sample set S+
Activate query features in a duplex way (P-Match and P-Conv)

Features $S \in \mathbb{R}^{W \times H \times C}$ are spatially partitioned into 
foreground samples $S^+$ and background samples $S^-$ 
($S^+$: feature vectors within the mask of the support image)

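PMMs : a probability mixture model
$$p(s_i \mid \theta) = \sum_{k=1}^{K} w_k \, p_k(s_i \mid \theta)$$
- $w_k$ ($0 \le w_k \le 1$, $\sum_{k=1}^{K} w_k = 1$) : the mixing weights 
- $\theta$ : the model parameters 
- $s_i \in S$ : the $i$-th feature sample 
- $p_k(s_i \mid \theta)$ : the $k$-th base model, which is a probability model 
based on a kernel distance function (vector distance) 
$$p_k(s_i \mid \theta) = \beta(\theta) \, e^{\mathrm{Kernel}(s_i, \mu_k)} = \beta_c(\kappa) \, e^{\kappa \mu_k^T s_i}$$
- $\beta_c(\kappa) = \dfrac{\kappa^{c/2-1}}{(2\pi)^{c/2} I_{c/2-1}(\kappa)}$ : the normalization constant 
- $\mu_k \in \theta$ : one of the parameters 
* $\theta = \{\mu, \kappa\}$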

Model Learning using EM algorithm
E-step :
Given the model parameters and the extracted sample features, 
compute the expectation of the sample $s_i$
$$E_{ik} = \frac{p_k(s_i \mid \theta)}{\sum_{k=1}^{K} p_k(s_i \mid \theta)} = \frac{e^{\kappa \mu_k^T s_i}}{\sum_{k=1}^{K} e^{\kappa \mu_k^T s_i}}$$
M-step :
The expectation is used to update the mean vectors of PMMs 
$$\mu_k = \frac{\sum_{i=1}^{N} E_{ik} \, s_i}{\sum_{i=1}^{N} E_{ik}}$$
($N = W \times H$ is the number of samples)
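
The E-step and M-step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `fit_pmm`, the random initialisation from the samples, the L2 normalisation of samples and means, and the value `kappa=20.0` are all assumptions. The default of 10 iterations mirrors the setting listed under Experiments.

```python
import numpy as np

def fit_pmm(samples, num_prototypes=2, kappa=20.0, num_iters=10, seed=0):
    """Fit K mean vectors to (N, C) samples with the E/M updates.

    Returns (mu, resp): mu is (K, C) with unit-norm rows, resp is the (N, K)
    responsibility matrix E_ik from the last E-step.
    """
    rng = np.random.default_rng(seed)
    # Assumption: samples are L2-normalised so mu_k^T s_i is a cosine similarity.
    s = samples / np.maximum(np.linalg.norm(samples, axis=1, keepdims=True), 1e-12)
    # Assumption: initialise the means with K distinct randomly chosen samples.
    mu = s[rng.choice(len(s), size=num_prototypes, replace=False)]
    for _ in range(num_iters):
        # E-step: E_ik = softmax over k of kappa * mu_k^T s_i (numerically stabilised).
        logits = kappa * (s @ mu.T)
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mu_k = sum_i E_ik s_i / sum_i E_ik, then renormalise (assumption).
        mu = (resp.T @ s) / np.maximum(resp.sum(axis=0)[:, None], 1e-12)
        mu /= np.maximum(np.linalg.norm(mu, axis=1, keepdims=True), 1e-12)
    return mu, resp

# Toy foreground samples: three vectors near one direction, three near another.
samples = np.array([[1.0, 0.1], [0.9, 0.0], [1.0, 0.2],
                    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]])
mu, resp = fit_pmm(samples, num_prototypes=2)
```

Running the same routine separately on the foreground and background samples would give one set of mean vectors per region.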

Model Learning using EM algorithm
The mean vectors $\mu^+ = \{\mu_k^+, k = 1, \dots, K\}$ and 
$\mu^- = \{\mu_k^-, k = 1, \dots, K\}$ are used as 
prototype vectors to extract convolution features 
for the query image. 

Such a prototype vector can represent 
a region around an object part

PMMs as Representation (P-Match)
$\mu^+$ squeezes representation information about an object part 
and can be used to match and activate the query features $Q$ 

$$Q' = \text{P-Match}(\mu_k^+, Q), \quad k = 1, \dots, K$$
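
The slide does not spell out what P-Match does internally. The sketch below assumes one plausible reading: each positive prototype is tiled over the spatial grid of the query features and concatenated along the channel axis, leaving any subsequent convolution to the segmentation head. The name `p_match_concat` and the (H, W, C) layout are hypothetical.

```python
import numpy as np

def p_match_concat(prototype, query):
    """Tile a (C,) prototype over the (H, W) grid and concatenate it to the
    (H, W, C) query features, giving an (H, W, 2C) array."""
    h, w, c = query.shape
    tiled = np.broadcast_to(prototype, (h, w, c))
    return np.concatenate([query, tiled], axis=-1)

# Toy query features and K = 2 positive prototypes.
query = np.arange(24, dtype=float).reshape(3, 4, 2)
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])

# One activated feature map per prototype, k = 1, ..., K.
activated = [p_match_concat(mu_k, query) for mu_k in prototypes]
```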

PMMs as Classifiers (P-Conv)
Each prototype vector, incorporating discriminative information 
across feature channels, can be seen as a classifier 
that produces probability maps $M_k = \{M_k^+, M_k^-\}$ 

$$M_k = \text{P-Conv}(\mu_k^+, \mu_k^-, Q), \quad k = 1, \dots, K$$
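
A rough sketch of the classifier view, under the assumption that P-Conv scores every query location against a foreground and a background prototype by a channel-wise dot product and turns the two scores into probabilities with a softmax. The function name `p_conv` and the toy inputs are hypothetical.

```python
import numpy as np

def p_conv(mu_pos, mu_neg, query):
    """Score an (H, W, C) query map against a positive and a negative (C,) prototype.

    Returns (m_pos, m_neg), two (H, W) probability maps that sum to 1 per location.
    """
    # Dot product across channels acts like a 1x1 convolution with the prototype as kernel.
    logits = np.stack([query @ mu_pos, query @ mu_neg], axis=-1)  # (H, W, 2)
    logits -= logits.max(axis=-1, keepdims=True)  # stabilise the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs[..., 0], probs[..., 1]

mu_pos = np.array([1.0, 0.0])
mu_neg = np.array([0.0, 1.0])
query = np.zeros((1, 2, 2))
query[0, 0] = [1.0, 0.0]  # resembles the foreground prototype
query[0, 1] = [0.0, 1.0]  # resembles the background prototype

m_pos, m_neg = p_conv(mu_pos, mu_neg, query)
```

With these toy inputs the foreground map is above 0.5 at the first location and below 0.5 at the second.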

P-Match and P-Conv
The semantic information across channels and the discriminative information related to object
parts are collected from the support features $S$ to activate the query features $Q$

Residual Prototype Mixture Models
Ensemble by stacking multiple PMMs 
to further enhance the model's representational capacity

Experiments
• Baseline : CANet w/o iterative optimization

• Data Augmentation 
: normalization, horizontal flipping, random cropping and random resizing

• PyTorch 1.0 and Nvidia 2080Ti GPUs

• The EM algorithm runs for 10 iterations

• Optimization 
: Cross-entropy Loss with SGD (init lr = 0.0035, momentum 0.9, 
200,000 iterations, 8 pairs of support-query images per batch), 
LR decay following DeepLab’s policy

• For each training step, the categories in the train split are randomly selected
and then the support-query pairs are randomly sampled in the selected
categories.
Zhang et al. CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning. CVPR 2019

Chen et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2018
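
As a small illustration of the learning-rate decay listed above, the sketch below assumes that "DeepLab's policy" refers to the "poly" schedule with power 0.9. The function name `poly_lr` is hypothetical; the initial rate and iteration count are taken from the optimization settings on this slide.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly schedule: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - step / max_steps) ** power

base_lr = 0.0035       # initial learning rate from the slide
max_steps = 200_000    # total training iterations from the slide

lr_start = poly_lr(base_lr, 0, max_steps)
lr_mid = poly_lr(base_lr, max_steps // 2, max_steps)
lr_end = poly_lr(base_lr, max_steps, max_steps)
```

The rate starts at the initial value, decreases monotonically, and reaches zero at the final iteration.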

Experiments
• Dataset 
- Pascal-$5^i$ : 20 object categories are partitioned into 4 splits, 
with 3 for training and 1 for testing 
- COCO-$20^i$ : 80 classes are divided into 4 splits of 20 classes each, 
and the val dataset is used for evaluation

• Evaluation Metric : mIoU
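
For reference, here is a minimal sketch of how mIoU might be computed over test episodes. It assumes the common convention of accumulating foreground intersection and union per class and then averaging the per-class IoUs; the slide does not state which variant is used. The function name `mean_iou` and the episode format are hypothetical.

```python
import numpy as np

def mean_iou(episodes):
    """episodes: iterable of (class_id, pred_mask, target_mask) with binary masks.

    Accumulates foreground intersection and union per class, then averages the
    per-class IoU values.
    """
    inter, union = {}, {}
    for class_id, pred, target in episodes:
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        inter[class_id] = inter.get(class_id, 0) + np.logical_and(pred, target).sum()
        union[class_id] = union.get(class_id, 0) + np.logical_or(pred, target).sum()
    # Classes whose union is empty are skipped to avoid dividing by zero.
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    return float(np.mean(ious))

episodes = [
    (0, [1, 1, 0, 0], [1, 0, 0, 0]),  # class 0: intersection 1, union 2
    (1, [1, 1], [1, 1]),              # class 1: perfect prediction
]
score = mean_iou(episodes)
```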

Conclusion
• PMMs 
- correlate diverse image regions with multiple prototypes to solve the
semantic ambiguity problem 
- During training, PMMs incorporate rich channel-wise and spatial
semantics from limited support images 
- During inference, PMMs are matched with query features in a duplex
manner to perform accurate semantic segmentation 
- achieve state-of-the-art few-shot segmentation performance 
- capture the diverse semantics of object parts given few support
examples
