FickleNet: Weakly and Semi-supervised Semantic
Image Segmentation using Stochastic Inference
Hwang Seung Hyun
Yonsei University Severance Hospital CCIDS
SNU, Korea | CVPR 2019
2020.03.22
Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
Yonsei University Severance Hospital CCIDS
FickleNet
Introduction – Limitations of Prior Work
• Semantic segmentation in real-world settings requires a large variety of object classes and a large amount of labeled data
• Current weakly supervised segmentation methods show inferior results to fully supervised segmentation
• The main obstacle to weakly supervised semantic image segmentation is obtaining pixel-level information (locations or boundaries)
• Most weakly supervised segmentation methods depend on localization maps obtained by a classification network
• These localization maps focus only on small, discriminative parts of objects, which makes boundaries hard to locate
FickleNet
Introduction – FickleNet
• Generates a variety of localization maps from a single image using random combinations of hidden units in a CNN
• Chooses hidden units at random at each sliding window position (similar to the dropout technique)
• The random selection of hidden units (a stochastic approach) produces regions of different shapes
• Many existing studies use stochastic regularization (e.g. dropout) during training, but not in the inference phase
FickleNet
Introduction – FickleNet
[Figure: random hidden unit selection yields multiple localization maps from a single image]
FickleNet
Introduction – Contributions
• FickleNet discovers the relationships between locations in an image and enlarges the regions activated by the classifier
• Introduces a feature-map expansion method that makes the model run faster at only a small cost in GPU memory
• FickleNet achieves SOTA performance on the PASCAL VOC 2012 benchmark in both weakly and semi-supervised settings
Related Work
Image Level Processing
• Class Activation Mapping (CAM) is a good starting point for classifying pixels from image-level annotations
• CAM measures the contribution of each hidden unit in the network to a class score, but it tends to focus on a small discriminative region of the target (minimal sketch below)
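A minimal CAM sketch, not the authors' code: it assumes the final convolutional features and the linear classifier weights are available as tensors, and the names and shapes are illustrative.

```python
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx, out_size):
    # features: (1, C, h, w) final conv feature maps; fc_weight: (num_classes, C)
    w = fc_weight[class_idx].view(1, -1, 1, 1)
    cam = (w * features).sum(dim=1, keepdim=True)          # weighted sum over channels
    cam = F.relu(cam)                                       # keep positive evidence only
    cam = F.interpolate(cam, size=out_size, mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```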
Related Work
Feature Level Processing
• Multi-dilated convolution (MDC) uses several convolutional blocks dilated at different rates and aggregates the CAMs obtained from each block, in a manner that resembles ensemble learning (sketch below)
• The dilation rates are limited
• The receptive field of a standard dilated convolution is a square of fixed size, so MDC tends to identify false-positive regions
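A hedged sketch of the MDC idea only; the module name, dilation rates, and the simple averaging are illustrative rather than the original architecture.

```python
import torch
import torch.nn as nn

class MultiDilatedHead(nn.Module):
    """Parallel 3x3 conv branches with different dilation rates over a shared feature map."""
    def __init__(self, in_ch, num_classes, rates=(1, 3, 6, 9)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Averaging the per-branch class maps resembles an ensemble over
        # square receptive fields of different, fixed sizes.
        return torch.stack([b(x) for b in self.branches], dim=0).mean(dim=0)
```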
Related Work
Region Growing
• SRG (Seeded Region Growing)
Related Work
Region Growing
• DSRG (Deep Seeded Region Growing)
→ Seeds for region growing are obtained from CAM
→ VGG as the classification network
→ DeepLab-ASPP as the segmentation network
→ Seeds come only from discriminative parts of objects, so they are difficult to grow into non-discriminative parts
Methods and Experiments
Stochastic Hidden Unit Selection
• Randomly select hidden units so that a non-discriminative part of an object becomes associated with a discriminative part of the same object
Methods and Experiments
Stochastic Hidden Unit Selection - Feature Map Expansion
• Apply spatial dropout to the feature map x at each sliding window position
• This differs from the standard dropout technique, which samples hidden units in the feature maps only once
• Selecting hidden units this way generates receptive fields of many different shapes and sizes
• Calling the convolution and dropout functions w × h times in each forward pass is very inefficient
• Therefore, the feature maps are expanded so that no sliding window positions overlap (sketch below)
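A hedged sketch of the expansion trick, assuming an s × s convolution layer: the unfold-based layout and the explicit Bernoulli mask are my reading of the slides, not released code.

```python
import torch
import torch.nn.functional as F

def expanded_stochastic_conv(x, conv, kernel_size=3, drop_rate=0.9):
    # x: (b, c, h, w) feature map; conv: an s x s nn.Conv2d applied after dropout.
    b, c, h, w = x.shape
    s = kernel_size
    # Extract every s x s window, then lay the windows out side by side so
    # that sliding-window positions no longer overlap: (b, c, h*s, w*s).
    patches = F.unfold(x, kernel_size=s, padding=s // 2)        # (b, c*s*s, h*w)
    patches = patches.view(b, c, s, s, h, w)
    expanded = patches.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * s, w * s)
    # Spatial dropout on the expanded map: drop whole spatial positions
    # (all channels at once), independently for every window, and keep it
    # active at inference time as well.
    keep = (torch.rand(b, 1, h * s, w * s, device=x.device) > drop_rate).float()
    expanded = expanded * keep / (1.0 - drop_rate)
    # Stride s visits each non-overlapping window exactly once.
    return F.conv2d(expanded, conv.weight, conv.bias, stride=s)
```

Because dropout is sampled once on the expanded map, a single convolution call replaces the w × h per-window calls while still giving each window its own random selection.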
Methods and Experiments
Stochastic Hidden Unit Selection – Center preserving spatial dropout
• Do not drop the center of the kernel at each sliding window position
• This way, relationships between the kernel center and the other locations in each stride can be found (see the mask sketch below)
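A hedged sketch of the center-preserving mask on the expanded map from the previous sketch; the helper name is hypothetical.

```python
import torch

def center_preserving_mask(b, h, w, s, drop_rate, device=None):
    # One keep/drop decision per spatial position of the expanded (h*s, w*s) map,
    # shared across channels, as in the previous sketch.
    keep = (torch.rand(b, 1, h * s, w * s, device=device) > drop_rate).float()
    # Force the center cell of every s x s window to survive, so each kernel
    # center can be related to whichever neighbours happen to be kept.
    keep[:, :, s // 2::s, s // 2::s] = 1.0
    return keep
```

This mask would replace the plain Bernoulli mask in the previous sketch (`expanded * keep / (1 - drop_rate)`); with the centers always kept, the rescaling is only approximate.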
Methods and Experiments
Inference Localization Map
• Use gradient-based CAM (Grad-CAM), which is a generalization of the class activation map (CAM)
• Grad-CAM discovers the class-specific contribution of each hidden unit to the classification score from the gradient flow
• From the final output feature map, apply global average pooling (GAP) and a sigmoid function to obtain the classification score (sketch below)
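A hedged Grad-CAM sketch for a single class, assuming `features` is the (1, C, h, w) output of the stochastic layer and `classifier` is a linear layer mapping the GAP vector to class logits (both names are assumptions).

```python
import torch
import torch.nn.functional as F

def grad_cam(features, classifier, class_idx):
    features = features.detach().requires_grad_(True)
    # GAP + sigmoid gives the classification score for the chosen class.
    logits = classifier(F.adaptive_avg_pool2d(features, 1).flatten(1))   # (1, num_classes)
    score = torch.sigmoid(logits)[0, class_idx]
    # Gradient of the score w.r.t. the feature map yields class-specific
    # channel importances.
    grads, = torch.autograd.grad(score, features)
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * features).sum(dim=1, keepdim=True))
    return cam.squeeze(0).detach()                                       # (1, h, w) map
```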
Methods and Experiments
Inference Localization Map – Aggregate localization map
• FickleNet constructs N different localization maps from a single image and aggregates them into a single localization map (sketch below)
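A minimal aggregation sketch: the per-pixel maximum (a union of the activated regions) is one simple rule consistent with the slides, and `single_pass` is a hypothetical callable wrapping one stochastic forward pass plus Grad-CAM.

```python
import torch

def aggregate_localization_maps(single_pass, image, class_idx, n_maps=200):
    # single_pass(image, class_idx) runs one random hidden-unit selection
    # and returns one (h, w) localization map.
    maps = [single_pass(image, class_idx) for _ in range(n_maps)]
    # Merge by per-pixel maximum, i.e. the union of the activated regions.
    return torch.stack(maps, dim=0).max(dim=0).values
```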
Methods and Experiments
Inference Localization Map – Training Process
• The localization map provides pseudo-labels for training a semantic image segmentation network
• Uses the same background cues as DSRG
• Using the aggregated map as a seed, a region-growing method is applied based on the probabilities obtained from the segmentation network (seed-mask sketch below)
[Diagram: the aggregated map serves as the seed for the segmentation network]
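A hedged sketch of turning the aggregated maps into a seed mask: the thresholds and the saliency-style background cue are illustrative, and DSRG's exact cues follow its own paper.

```python
import torch

def build_seed_mask(class_maps, background_cue, fg_thresh=0.5, bg_thresh=0.1,
                    ignore_index=255):
    # class_maps: (num_fg_classes, H, W) aggregated localization maps in [0, 1]
    # background_cue: (H, W) map where low values indicate likely background
    h, w = background_cue.shape
    seeds = torch.full((h, w), ignore_index, dtype=torch.long)
    seeds[background_cue < bg_thresh] = 0                   # background is class 0
    scores, labels = class_maps.max(dim=0)
    fg = scores > fg_thresh
    seeds[fg] = labels[fg] + 1                              # foreground classes 1..C
    return seeds                                            # everything else stays "ignore"
```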
Methods and Experiments
FickleNet – Experimental Setup
• Dataset – PASCAL VOC 2012 image segmentation
(21 object classes / 10,582 training images with image-level annotations)
• Based on a VGG-16 network pre-trained on ImageNet
(modified by removing all fully connected layers and the last pooling layer)
• Segmentation is performed by DSRG, based on DeepLab-CRF
• The number of different localization maps N is set to 200
Methods and Experiments
FickleNet – Weakly Supervised Semantic Segmentation
Methods and Experiments
FickleNet – Weakly Supervised Semantic Segmentation with ResNet
Methods and Experiments
FickleNet – Semi Supervised Semantic Segmentation with ResNet
Methods and Experiments
FickleNet – Semi and Weakly Supervised Semantic Segmentation
Methods and Experiments
Ablation Study
1. Effects of the Map Expansion Technique
• Training and CAM-extraction times are reduced by factors of 15.4 and 14.2, respectively, at the cost of a 12% increase in GPU memory use
Methods and Experiments
Ablation Study
2. Iterative Inference and Dropout Rate
• Additional random selections identify more regions of a target object
• The segmentation performance converges as N increases
• A dropout rate of 0.9 allows FickleNet to cover larger regions of the target object than DSRG: more randomness exposes more non-discriminative parts
Methods and Experiments
Ablation Study
3. Comparison to General Dropout
• A hidden unit in FickleNet may be activated at some window positions and dropped at others, so every hidden unit is able to affect the classification score
Conclusion
• Addressed the problem of semantic image segmentation using only image-level annotations
• Obtains many different localization maps and aggregates them into a single localization map
• Implemented efficiently by expanding the feature maps
• FickleNet's results on both weakly supervised and semi-supervised segmentation are better than those produced by other state-of-the-art methods