DeconvNet, DecoupledNet,
TransferNet in Image Segmentation
NamHyuk Ahn @ Ajou Univ.
2016. 05. 11
Contents
- Semantic Segmentation
- Deconvolution Network for Supervised Learning
- Decoupled Network for Semi-Supervised Learning
- Transfer Learning in Semantic Segmentation
Semantic Segmentation
Semantic Segmentation
- Predict a class label for every pixel in an image
[Shotton et al. 2007]
Datasets
PASCAL VOC
- 20 classes
- 12K training / 1K test images

MS COCO
- 91 classes
- 120K training / 40K test images
Deconvolution Network for
Supervised Learning
Problems of FCN
- FCN handles only a single scale of semantics, since it has a fixed-size
receptive field
- The label map is very small (coarse), so detailed structures of objects
tend to be lost
DeconvNet
- To address these issues, DeconvNet uses “deconvolution”
- A convolution network extracts features (VGG-16 net)
- A deconvolution network generates a probability map (same size as the
input image)
- The probability map gives, for each pixel, the probability that it belongs
to each class
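Deconvolution Network
- Unpooling
• Reconstructs the structure of the original activation map by putting
each activation back at its location before pooling
• The activation map is enlarged back to its pre-pooling size,
but it is still sparse
- Deconvolution
• Densifies the sparse, enlarged activation map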
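The unpooling/deconvolution pair above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes 2x2 max pooling with stride 2 on a single-channel map with even height and width, records the argmax location ("switch") of each window, and unpools by writing each value back at its switch with zeros elsewhere. Deconvolution is shown as a 1-D transposed convolution with a made-up filter; all function names are hypothetical.

```python
def max_pool_with_switches(x):
    """2x2 max pooling (stride 2) on a 2-D list; returns the pooled map and switch positions."""
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for i in range(0, h, 2):
        prow, srow = [], []
        for j in range(0, w, 2):
            # (value, row, col) for each entry of the 2x2 window
            window = [(x[r][c], r, c) for r in (i, i + 1) for c in (j, j + 1)]
            best = max(window)
            prow.append(best[0])
            srow.append((best[1], best[2]))  # remember where the max came from
        pooled.append(prow)
        switches.append(srow)
    return pooled, switches


def unpool(pooled, switches, h, w):
    """Place each pooled value back at its recorded location; every other entry stays 0 (sparse)."""
    out = [[0.0] * w for _ in range(h)]
    for prow, srow in zip(pooled, switches):
        for v, (r, c) in zip(prow, srow):
            out[r][c] = v
    return out


def transposed_conv1d(x, kernel, stride=2):
    """1-D transposed convolution: each input value adds a scaled copy of the kernel to the output."""
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        for t, wt in enumerate(kernel):
            out[i * stride + t] += v * wt
    return out
```

For example, pooling `[[1, 3, 0, 2], [4, 2, 1, 0], [0, 1, 5, 6], [2, 0, 7, 1]]` gives `[[4, 2], [2, 7]]`, and unpooling restores a 4x4 map with only those four non-zero entries; the transposed convolution then spreads each value over several output positions, which is the densifying step.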
Analysis of DeconvNet
- DeconvNet is better at segmentation since it produces a dense, enlarged
pixel-wise map
- Shallow layers tend to capture the overall structure of an object
(shape, region, position), while deeper layers capture more complicated
patterns
- Unpooling captures example-specific structure, so it can
reconstruct object details at higher resolution
- Deconvolution captures class-specific shapes, so activations closely
related to the target class are amplified and noisy activations
are suppressed
Analysis of DeconvNet
More details of DeconvNet
- Instance-wise segmentation
- Batch normalization in both networks
- Two-stage training
- Ensemble with FCN
• FCN and DeconvNet are complementary
• The ensemble gives the best result
Instance-wise Segmentation
- Feed object proposals into the network (not the entire image)
- Obtain proposals with the EdgeBox algorithm
- Captures more object details and handles multiple scales
- Reduces the search space, so memory usage during training is reduced
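Since the network sees proposals rather than the whole image, the per-proposal outputs have to be merged back into one image-level map. As I understand the DeconvNet paper, this is done with a pixel-wise maximum (or average) over proposals; below is a minimal sketch of the max variant only. It assumes each proposal's class score map has already been resized to its box, boxes are given by their (top, left) offsets, and class 0 is background; the function names are hypothetical.

```python
def aggregate_proposals(image_h, image_w, num_classes, proposals):
    """Merge per-proposal class score maps into one image-level map by pixel-wise max.

    proposals: list of (top, left, scores), where scores[c][r][col] is the score of
    class c at pixel (r, col) inside the proposal box.
    """
    # Start at -inf so that any proposal score overrides it
    agg = [[[float("-inf")] * image_w for _ in range(image_h)] for _ in range(num_classes)]
    for top, left, scores in proposals:
        for c in range(num_classes):
            for r, row in enumerate(scores[c]):
                for col, s in enumerate(row):
                    y, x = top + r, left + col
                    if 0 <= y < image_h and 0 <= x < image_w:
                        agg[c][y][x] = max(agg[c][y][x], s)
    return agg


def label_map(agg, background=0):
    """Pixel-wise argmax over classes; pixels covered by no proposal get the background label."""
    num_classes, h, w = len(agg), len(agg[0]), len(agg[0][0])
    labels = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            col = [agg[c][y][x] for c in range(num_classes)]
            best = max(range(num_classes), key=lambda c: col[c])
            if col[best] != float("-inf"):
                labels[y][x] = best
    return labels
```

Overlapping proposals simply compete per pixel, so a confident small proposal can refine the boundary predicted by a larger one.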
Two-stage Training
- DeconvNet has many parameters, but there is little segmentation data
(10K images in PASCAL VOC)
• Two-stage training addresses this issue
• First stage: center-cropped images (easier examples)
• Second stage: proposal sub-images
- As a result, the network generalizes better
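A first-stage example might be produced by cropping a square window centred on a ground-truth object box. The sketch below is an assumption-laden illustration, not the paper's procedure: the `margin` enlargement factor and the function name are hypothetical, and the window is shifted back inside the image (so it is only centred where the image boundary allows).

```python
def centered_crop(box, image_w, image_h, margin=1.2):
    """Square crop (x0, y0, x1, y1) centred on box = (x0, y0, x1, y1) where possible.

    margin is a hypothetical enlargement factor; the crop is shifted to stay inside
    the image and shrunk only if it would be larger than the image itself.
    """
    bx0, by0, bx1, by1 = box
    cx, cy = (bx0 + bx1) / 2, (by0 + by1) / 2
    side = max(bx1 - bx0, by1 - by0) * margin
    side = min(side, image_w, image_h)  # the crop cannot exceed the image
    x0 = min(max(cx - side / 2, 0), image_w - side)
    y0 = min(max(cy - side / 2, 0), image_h - side)
    return (x0, y0, x0 + side, y0 + side)
```

In the second stage the crops would instead come from object proposals, which are noisier and harder, so the network first learns on clean, centred objects.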
Result
- Second-best result among methods trained only on PASCAL VOC data
- Note: the paper reports a mean IoU of 72.5, but the authors'
presentation slides report 74.8
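The results are reported as mean IoU. For reference, a minimal sketch of the standard metric: per class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over classes. It assumes flat lists of integer labels and skips "void" pixels labelled 255 (the PASCAL VOC convention); the official VOC evaluation accumulates the confusion matrix over the whole test set before computing IoU, which this function does if you pass all pixels at once.

```python
def mean_iou(gt, pred, num_classes, ignore=255):
    """Mean intersection-over-union over classes, from flat lists of pixel labels."""
    # confusion[g][p] counts pixels with ground truth g predicted as p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore:
            continue  # 'void' pixels are not evaluated
        confusion[g][p] += 1
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[g][c] for g in range(num_classes)) - tp
        if tp + fp + fn > 0:  # skip classes absent from both gt and prediction
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0
```

For `gt = [0, 0, 1, 1]` and `pred = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean IoU is 7/12.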
Qualitative Example
Recap
- Produces dense, precise segmentation masks, since it reconstructs
objects in a coarse-to-fine manner
- With instance-wise segmentation, it can handle object
scale variation
- But it has many parameters (almost 2x VGG-16), so an additional
training stage is needed
Decoupled Network for Semi-
Supervised Learning
Motivation
- Creating segmentation ground truth is costly, so treat the problem as
semi-supervised learning
- Use many image-level annotations and few pixel-level annotations
- Modifies DeconvNet
- With little pixel-level data (25 images per class), achieves good results
(62.5 mean IoU)
Main idea
- Semantic segmentation can be decomposed into multi-label
classification and binary segmentation
[Figure: semantic segmentation = multi-label classification (Person, Bottle) + binary segmentation]
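The decomposition can be made concrete on a label map: the set of non-background labels is the multi-label classification target, and each label has its own binary mask. A minimal sketch, assuming 0 is background; the class ids in the example (5 and 15, which are bottle and person in the usual PASCAL VOC ordering) are only illustrative.

```python
def decompose(label_map, background=0):
    """Split a semantic label map into image-level labels and one binary mask per label."""
    labels = sorted({v for row in label_map for v in row if v != background})
    masks = {l: [[1 if v == l else 0 for v in row] for row in label_map] for l in labels}
    return labels, masks


def recompose(labels, masks, h, w, background=0):
    """Inverse of decompose: paint each binary mask with its class id (masks do not overlap here)."""
    out = [[background] * w for _ in range(h)]
    for l in labels:
        for y in range(h):
            for x in range(w):
                if masks[l][y][x]:
                    out[y][x] = l
    return out
```

For `[[0, 15, 15], [5, 0, 15]]` this yields the image-level labels `[5, 15]` and two binary masks; recomposing them returns the original map.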
Overview
- Classification network for multi-label classification
- Segmentation network for binary segmentation
- Bridging layers for delivering class-specific
information to segmentation network
Architecture
- Classification Network (same as VGG-16)
- Segmentation Network
• Takes a class-specific activation map from the bridging layers and performs
binary segmentation (main difference from DeconvNet)
• Binary segmentation reduces the number of parameters, so the network can be
trained with few pixel-wise annotations
Architecture
- Bridging Layers
• The segmentation network needs class-specific and spatial information to
produce a class-specific segmentation mask
• Spatial information comes from pool5 of the classification network
• pool5 has useful information for shape generation, but it mixes information
from all relevant labels → class-specific activations must be identified
• A saliency map is computed to identify class-specific activations
Architecture
- Saliency Map
1. Compute the score vector, then set
dscore (the gradient of the scores) to all 0s
except 1 at the index of the label
to track
2. Backpropagate to an arbitrary
layer (pool5 in this paper)
- The saliency map gives class-specific
information for each label (class)
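The two saliency steps can be sketched on a toy two-layer ReLU classifier in pure Python, s = W2 · relu(W1 · f). This is a simplified stand-in for backpropagating through VGG-16: the input `features` plays the role of pool5, and the weights and function names are hypothetical. dscore is one-hot for the tracked label, and the backward pass returns the gradient of that class score with respect to the features.

```python
def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def scores(features, w1, w2):
    """Forward pass: s = W2 relu(W1 f)."""
    hidden = [max(0.0, a) for a in matvec(w1, features)]
    return matvec(w2, hidden)


def saliency(features, w1, w2, label):
    """Gradient of scores[label] with respect to the input features."""
    pre = matvec(w1, features)  # hidden pre-activations
    # Step 1: dscore is all zeros except 1 at the tracked label
    dscore = [1.0 if k == label else 0.0 for k in range(len(w2))]
    # Backprop through W2: dh = W2^T dscore
    dh = [sum(w2[k][j] * dscore[k] for k in range(len(w2))) for j in range(len(w1))]
    # Backprop through ReLU: the gradient passes only where the pre-activation is positive
    dpre = [g if p > 0 else 0.0 for g, p in zip(dh, pre)]
    # Step 2: backprop through W1 to the features: df = W1^T dpre
    return [sum(w1[j][i] * dpre[j] for j in range(len(w1))) for i in range(len(features))]
```

Running `saliency` once per identified label gives one class-specific map per label from the same shared features, which is exactly what the bridging layers need.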
Qualitative example of saliency map [Karen Simonyan et al., 2014]
Architecture
- Bridging Layers
• Combine the spatial feature (pool5) and the class-specific saliency map to
produce a class-specific activation map
• Pass it through an fc layer and feed it to the segmentation network
• The resulting g has both spatial and class-specific information
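A minimal sketch of the bridging step on flattened toy vectors. It assumes, as I read the DecoupledNet paper, that the pool5 feature and the class-specific saliency are combined by concatenation before the fc layer; the ReLU activation, the sizes, the weights, and the name `bridge` are all hypothetical.

```python
def bridge(f_spat, f_cls, weight, bias):
    """Concatenate spatial and class-specific features, then apply a fully connected layer + ReLU."""
    g_in = list(f_spat) + list(f_cls)  # concatenation of the two inputs
    return [max(0.0, sum(w * x for w, x in zip(row, g_in)) + b) for row, b in zip(weight, bias)]
```

Because the pool5 part is shared across labels and only the saliency part changes, each call produces a different g for each identified label while reusing the same spatial information.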
Inference
- Compute a segmentation map for each identified label
- Aggregate the segmentation maps M pixel-wise
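A minimal sketch of the pixel-wise aggregation. It assumes each binary segmentation output has been reduced to a foreground-probability map per identified label, and it uses a hypothetical rule for background: a pixel keeps the label with the highest foreground probability only if that probability exceeds `threshold` (0.5 here), otherwise it becomes background (0). The paper's exact aggregation rule may differ.

```python
def aggregate(fg_maps, threshold=0.5, background=0):
    """fg_maps: dict label -> 2-D list of foreground probabilities from binary segmentation."""
    labels = list(fg_maps)
    h, w = len(fg_maps[labels[0]]), len(fg_maps[labels[0]][0])
    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Label whose binary segmentation is most confident at this pixel
            best = max(labels, key=lambda l: fg_maps[l][y][x])
            if fg_maps[best][y][x] > threshold:
                out[y][x] = best
    return out
```

Only labels predicted by the classification network are passed in, so the classifier decides which classes may appear and the binary maps decide where.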
Training
- Train the classification network with many image-level
annotations
- Train the segmentation network and bridging layers with
few pixel-level annotations
Result
Qualitative Example
Recap
- Uses many image-level annotations and few pixel-level
annotations
- Adds bridging layers to DeconvNet so that it performs binary segmentation,
which reduces parameters
- The bridging layers output both spatial and class-specific information
for each class (label)
- Trains the two networks separately (decoupled)
• Performs worse under full supervision, where joint optimization is
more desirable
- With few strongly annotated images (25 per class), achieves good
results (62.5 mean IoU)
Transfer Learning in Semantic
Segmentation
Motivation
- Pre-train a network and apply it to a new dataset
(e.g., train with MS COCO, infer on PASCAL VOC)
- This idea does not work well with DecoupledNet
• DecoupledNet is trained with class-specific input, so it
cannot generalize to new classes
• Train the network with class-independent input!
Overview
- An attention model identifies salient regions of each class associated with the input
image
• The output of the attention model carries location information for each class in a
coarse feature map
- The encoder extracts features; the decoder generates a dense foreground
segmentation mask for each focused region
- Training stage
• Fix the encoder (pre-trained) and train the decoder and attention model using pixel-level
annotations from the source domain
• Train the attention model using image-level annotations from both domains
- After training, the decoder has been trained only on the source domain, while the attention
model has been trained on both domains, so the attention is adapted to the target domain
Overview
- The decoupled encoder-decoder makes it possible to share shape-generation
information among different classes
- The attention model provides
• Predictions for localization
• Class-specific information → enables adapting the decoder to the target domain
- With the attention model, the network obtains information that transfers across
domains and provides a useful segmentation prior
Architecture
- Encoder
• Extracts feature descriptors A = [a_1, ..., a_M] ∈ R^(D×M)
A is obtained from the last conv layer to retain spatial information
• M and D are the number of hidden units (20x20) and the number of channels, respectively
- Attention model
• Learns a weight vector α^l ∈ R^M, where α^l_i represents the
relevance of location i to class l
• Formally, α^l = softmax(v^l), where v^l scores each location using A and label l
• An extra technique from [R. Memisevic. 2013] is used to reduce the number of parameters
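The encoder's descriptor matrix A can be laid out by flattening a conv feature map of height H, width W and D channels into a D×M matrix with M = H·W (20×20 = 400 in the slides), so that column i is the descriptor a_i of location i. A minimal sketch; the row-major location ordering and the function names are assumptions.

```python
def to_descriptors(feature_map):
    """feature_map[y][x][d] (H x W x D) -> A as a D x M list of lists, M = H * W (row-major)."""
    h, w, d = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[y][x][c] for y in range(h) for x in range(w)] for c in range(d)]


def location_index(y, x, w):
    """Column of A holding the descriptor of location (y, x) in row-major order."""
    return y * w + x
```

With this layout an attention vector α^l of length M assigns one weight per spatial location, and reshaping it back to H×W gives a coarse localization map.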
Architecture
- Attention model
• To apply attention in this model, it has to be trainable in both
domains
• Add additional layers on top of the attention model, and train
both under a classification objective
• Finally, z^l = A α^l, where z represents the class-specific
feature
• z can be optimized using weak (image-level) annotations from both domains
• [Figure: example of attention]
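Given per-location scores v^l, the attention weights and the class-specific feature can be computed as α^l = softmax(v^l) and z^l = A α^l (a weighted sum of the location descriptors). A minimal sketch; how v^l is computed from A and the label is not reproduced here, so the scores in the example are made up, and `attend` is a hypothetical name.

```python
import math


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def attend(A, v):
    """A: D x M descriptors, v: M location scores -> (alpha, z) with z = A alpha (length D)."""
    alpha = softmax(v)
    z = [sum(a * w for a, w in zip(row, alpha)) for row in A]
    return alpha, z
```

With equal scores the weights are uniform and z is the average descriptor; with one dominant score (e.g. `[10.0, 0.0, 0.0]`) nearly all weight goes to one location, which shows why the softmax output is sparse.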
Architecture
- Decoder
• The output of the attention model is sparse due to the softmax, so it may lose
information needed for shape generation
• Feed the encoder features A as additional input, multiplied with z → densified attention
• With densified attention, optimize the segmentation loss; the procedure is the
same as in DecoupledNet, but the decoder is optimized only with the source domain
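One possible reading of "multiply A with z" is an element-wise product of the class-specific feature z with every location's descriptor a_i, giving a D×M map that is non-zero wherever the features are, unlike weighting locations by the peaked α. This interpretation is an assumption; the sketch only contrasts the two shapes of output, and both function names are hypothetical.

```python
def densify(A, z):
    """A: D x M descriptors, z: length-D feature -> D x M map with s[d][i] = z[d] * A[d][i]."""
    return [[zd * a for a in row] for zd, row in zip(z, A)]


def alpha_weighted(A, alpha):
    """Location-wise weighting s[d][i] = alpha[i] * A[d][i]; close to zero wherever alpha is tiny."""
    return [[w * a for w, a in zip(alpha, row)] for row in A]
```

In the example `A = [[1, 2], [3, 4]]`, weighting by `alpha = [1, 0]` wipes out the second location entirely, while multiplying by `z = [2, 0.5]` rescales channels but keeps every location, which is the information the decoder needs for shape generation.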
Analysis of TransferNet
- The decoder generates a foreground segmentation of the attended
region for each label
- By decoupling classification (a domain-specific task), the decoder
captures class-independent information for shape
generation and can apply it to unseen classes
- Because the attention model is trained with both pixel-level and
image-level annotations, it can handle unseen classes
• In DecoupledNet, the bridging layers are trained only with pixel-level data
Train / Inference
- During training, the classification and segmentation objectives are optimized jointly
• Training with class labels alone works, but training jointly with
segmentation labels regularizes noise
• After training, the additional classification layers are removed, since they are
needed only during training to learn attention from the target domain
- Inference
1. Iteratively obtain the attention and segmentation mask for each label
2. Aggregate the masks (same as DecoupledNet)
Result
Qualitative Example
Reference
- Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2015.
- Seunghoon Hong, Hyeonwoo Noh, and Bohyung Han. "Decoupled deep neural network for semi-supervised semantic segmentation." Advances in Neural Information Processing Systems. 2015.
- Seunghoon Hong, et al. "Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network." arXiv preprint arXiv:1512.07928 (2015).
- Hyeonwoo Noh. "Semantic Segmentation and Visual Question Answering." (https://drive.google.com/file/d/0B5xl2L77gZfVRXZxQWNmSGlBemc/view)
