FickleNet: Weakly and Semi-supervised Semantic
Image Segmentation using Stochastic Inference
Hwang Seung Hyun
Yonsei University Severance Hospital CCIDS
SNU, Korea | CVPR 2019
2020.03.22
Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
Yonsei University Severance Hospital CCIDS
FickleNet
Introduction – Limitations of Prior Work
• Semantic segmentation in real-world settings requires a large variety of object classes and a large amount of labeled data
• Current weakly supervised segmentation methods show inferior results to fully supervised segmentation
• The main obstacle to weakly supervised semantic image segmentation is obtaining pixel-level information (locations or boundaries)
• Most weakly supervised segmentation methods depend on localization maps obtained by a classification network
• These localization maps focus only on small, discriminative parts of objects, which makes boundaries hard to locate
FickleNet
Introduction – FickleNet
• Generates a variety of localization maps from a single image using random combinations of hidden units in a CNN
• Chooses hidden units at random at each sliding window position (similar to the dropout technique)
• The random selection of hidden units (a stochastic approach) produces regions of different shapes
• Many existing studies use stochastic regularization (e.g. dropout) during training, but not in the inference phase
FickleNet
Introduction – FickleNet
[Figure: random hidden unit selection yields multiple localization maps from a single image]
FickleNet
Introduction – Contributions
• FickleNet discovers the relationships between locations in an image and enlarges the regions activated by the classifier
• Introduces a feature-map expansion method that makes the model run faster at only a small cost in GPU memory
• FickleNet achieves SOTA performance on the PASCAL VOC 2012 benchmark in both weakly and semi-supervised settings
Related Work
Image Level Processing
• Class Activation Mapping (CAM) is a good starting point for classifying pixels from image-level annotations
• CAM measures the contribution of each hidden unit in the network to a class score, but it tends to focus on a small discriminative region of the target (minimal sketch below)
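A minimal CAM sketch, not the authors' code: it assumes the final convolutional features and the linear classifier weights are available as tensors, and the names and shapes are illustrative.

```python
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx, out_size):
    # features: (1, C, h, w) final conv feature maps; fc_weight: (num_classes, C)
    w = fc_weight[class_idx].view(1, -1, 1, 1)
    cam = (w * features).sum(dim=1, keepdim=True)          # weighted sum over channels
    cam = F.relu(cam)                                       # keep positive evidence only
    cam = F.interpolate(cam, size=out_size, mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```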
Related Work
Feature Level Processing
• Multi-dilated convolution (MDC) uses several convolutional blocks dilated at different rates and aggregates the CAMs obtained from each block, in a manner that resembles ensemble learning (sketch below)
• The dilation rates are limited
• The receptive field of a standard dilated convolution is a square of fixed size, so MDC tends to identify false-positive regions
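A hedged sketch of the MDC idea only; the module name, dilation rates, and the simple averaging are illustrative rather than the original architecture.

```python
import torch
import torch.nn as nn

class MultiDilatedHead(nn.Module):
    """Parallel 3x3 conv branches with different dilation rates over a shared feature map."""
    def __init__(self, in_ch, num_classes, rates=(1, 3, 6, 9)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Averaging the per-branch class maps resembles an ensemble over
        # square receptive fields of different, fixed sizes.
        return torch.stack([b(x) for b in self.branches], dim=0).mean(dim=0)
```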
Related Work
Region Growing
• SRG (Seeded Region Growing)
Related Work
Region Growing
• DSRG (Deep Seeded Region Growing)
→ Seeds for region growing are obtained from CAM
→ VGG as the classification network
→ DeepLab-ASPP as the segmentation network
→ Seeds come only from discriminative parts of objects, so they are difficult to grow into non-discriminative parts
Methods and Experiments
Stochastic Hidden Unit Selection
• Randomly select hidden units so that a non-discriminative part of an object becomes associated with a discriminative part of the same object
Methods and Experiments
Stochastic Hidden Unit Selection - Feature Map Expansion
• Apply spatial dropout to the feature map x at each sliding window position
• This differs from the standard dropout technique, which samples hidden units in the feature maps only once
• Selecting hidden units this way generates receptive fields of many different shapes and sizes
• Calling the convolution and dropout functions w × h times in each forward pass is very inefficient
• Therefore, the feature maps are expanded so that no sliding window positions overlap (sketch below)
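A hedged sketch of the expansion trick, assuming an s × s convolution layer: the unfold-based layout and the explicit Bernoulli mask are my reading of the slides, not released code.

```python
import torch
import torch.nn.functional as F

def expanded_stochastic_conv(x, conv, kernel_size=3, drop_rate=0.9):
    # x: (b, c, h, w) feature map; conv: an s x s nn.Conv2d applied after dropout.
    b, c, h, w = x.shape
    s = kernel_size
    # Extract every s x s window, then lay the windows out side by side so
    # that sliding-window positions no longer overlap: (b, c, h*s, w*s).
    patches = F.unfold(x, kernel_size=s, padding=s // 2)        # (b, c*s*s, h*w)
    patches = patches.view(b, c, s, s, h, w)
    expanded = patches.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * s, w * s)
    # Spatial dropout on the expanded map: drop whole spatial positions
    # (all channels at once), independently for every window, and keep it
    # active at inference time as well.
    keep = (torch.rand(b, 1, h * s, w * s, device=x.device) > drop_rate).float()
    expanded = expanded * keep / (1.0 - drop_rate)
    # Stride s visits each non-overlapping window exactly once.
    return F.conv2d(expanded, conv.weight, conv.bias, stride=s)
```

Because dropout is sampled once on the expanded map, a single convolution call replaces the w × h per-window calls while still giving each window its own random selection.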
Methods and Experiments
Stochastic Hidden Unit Selection – Center preserving spatial dropout
• Do not drop the center of the kernel at each sliding window position
• This way, relationships between the kernel center and the other locations in each stride can be found (see the mask sketch below)
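A hedged sketch of the center-preserving mask on the expanded map from the previous sketch; the helper name is hypothetical.

```python
import torch

def center_preserving_mask(b, h, w, s, drop_rate, device=None):
    # One keep/drop decision per spatial position of the expanded (h*s, w*s) map,
    # shared across channels, as in the previous sketch.
    keep = (torch.rand(b, 1, h * s, w * s, device=device) > drop_rate).float()
    # Force the center cell of every s x s window to survive, so each kernel
    # center can be related to whichever neighbours happen to be kept.
    keep[:, :, s // 2::s, s // 2::s] = 1.0
    return keep
```

This mask would replace the plain Bernoulli mask in the previous sketch (`expanded * keep / (1 - drop_rate)`); with the centers always kept, the rescaling is only approximate.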
Methods and Experiments
Inference Localization Map
• Use gradient-based CAM (Grad-CAM), which is a generalization of the class activation map (CAM)
• Grad-CAM discovers the class-specific contribution of each hidden unit to the classification score from the gradient flow
• From the final output feature map, apply global average pooling (GAP) and a sigmoid function to obtain the classification score (sketch below)
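A hedged Grad-CAM sketch for a single class, assuming `features` is the (1, C, h, w) output of the stochastic layer and `classifier` is a linear layer mapping the GAP vector to class logits (both names are assumptions).

```python
import torch
import torch.nn.functional as F

def grad_cam(features, classifier, class_idx):
    features = features.detach().requires_grad_(True)
    # GAP + sigmoid gives the classification score for the chosen class.
    logits = classifier(F.adaptive_avg_pool2d(features, 1).flatten(1))   # (1, num_classes)
    score = torch.sigmoid(logits)[0, class_idx]
    # Gradient of the score w.r.t. the feature map yields class-specific
    # channel importances.
    grads, = torch.autograd.grad(score, features)
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * features).sum(dim=1, keepdim=True))
    return cam.squeeze(0).detach()                                       # (1, h, w) map
```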
Methods and Experiments
Inference Localization Map – Aggregate localization map
• FickleNet constructs N different localization maps from a single image and aggregates them into a single localization map (sketch below)
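A minimal aggregation sketch: the per-pixel maximum (a union of the activated regions) is one simple rule consistent with the slides, and `single_pass` is a hypothetical callable wrapping one stochastic forward pass plus Grad-CAM.

```python
import torch

def aggregate_localization_maps(single_pass, image, class_idx, n_maps=200):
    # single_pass(image, class_idx) runs one random hidden-unit selection
    # and returns one (h, w) localization map.
    maps = [single_pass(image, class_idx) for _ in range(n_maps)]
    # Merge by per-pixel maximum, i.e. the union of the activated regions.
    return torch.stack(maps, dim=0).max(dim=0).values
```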
Methods and Experiments
Inference Localization Map – Training Process
• The localization map provides pseudo-labels for training a semantic image segmentation network
• Uses the same background cues as DSRG
• Using the aggregated map as a seed, a region-growing method is applied based on the probabilities obtained from the segmentation network (seed-mask sketch below)
[Diagram: the aggregated map serves as the seed for the segmentation network]
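A hedged sketch of turning the aggregated maps into a seed mask: the thresholds and the saliency-style background cue are illustrative, and DSRG's exact cues follow its own paper.

```python
import torch

def build_seed_mask(class_maps, background_cue, fg_thresh=0.5, bg_thresh=0.1,
                    ignore_index=255):
    # class_maps: (num_fg_classes, H, W) aggregated localization maps in [0, 1]
    # background_cue: (H, W) map where low values indicate likely background
    h, w = background_cue.shape
    seeds = torch.full((h, w), ignore_index, dtype=torch.long)
    seeds[background_cue < bg_thresh] = 0                   # background is class 0
    scores, labels = class_maps.max(dim=0)
    fg = scores > fg_thresh
    seeds[fg] = labels[fg] + 1                              # foreground classes 1..C
    return seeds                                            # everything else stays "ignore"
```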
Methods and Experiments
FickleNet – Experimental Setup
• Dataset – PASCAL VOC 2012 image segmentation
(21 object classes / 10,582 training images with image-level annotations)
• Based on a VGG-16 network pre-trained on ImageNet
(modified by removing all fully connected layers and the last pooling layer)
• Segmentation is performed by DSRG, based on DeepLab-CRF
• The number of different localization maps N is set to 200
Methods and Experiments
FickleNet – Weakly Supervised Semantic Segmentation
Methods and Experiments
FickleNet – Weakly Supervised Semantic Segmentation with ResNet
Methods and Experiments
FickleNet – Semi Supervised Semantic Segmentation with ResNet
Methods and Experiments
FickleNet – Semi and Weakly Supervised Semantic Segmentation
Methods and Experiments
Ablation Study
1. Effects of the Map Expansion Technique
• Training and CAM-extraction times are reduced by factors of 15.4 and 14.2, respectively, at the cost of a 12% increase in GPU memory use
Methods and Experiments
Ablation Study
2. Iterative Inference and Dropout Rate
• Additional random selections identify more regions of a target object
• The segmentation performance converges as N increases
• A dropout rate of 0.9 allows FickleNet to cover larger regions of the target object than DSRG: more randomness exposes more non-discriminative parts
Methods and Experiments
Ablation Study
3. Comparison to General Dropout
• A hidden unit in FickleNet may be activated at some window positions and dropped at others, so every hidden unit is able to affect the classification score
Conclusion
• Addressed the problem of semantic image segmentation using only image-level annotations
• Obtains many different localization maps and aggregates them into a single localization map
• Implemented efficiently by expanding the feature maps
• FickleNet's results on both weakly supervised and semi-supervised segmentation are better than those produced by other state-of-the-art methods