From Image-level to Pixel-level
Labeling with Convolutional Networks
Pedro O. Pinheiro, Ronan Collobert
Idiap Research Institute, Martigny, Switzerland
École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Facebook AI Research, Menlo Park, CA, USA
Contents
• Introduction
• Method
• Result
• Conclusion & Discussion
Introduction
• From Image-level to Pixel-level Labeling with Convolutional Networks (2015, CVPR)
• Proposes a method that uses image-level labels for Weakly Supervised Semantic Segmentation
• Assumes that each image contains a single object
[Figure: Inference pipeline, schematic illustration]
Introduction
• Other related papers
• Weakly- and semi-supervised learning of a DCNN for semantic image segmentation (2015, ICCV)
• Constrained Convolutional Neural Networks for Weakly Supervised Segmentation (2015, ICCV)
• Fully Convolutional Multi-Class Multiple Instance Learning (2015, ICLR Workshop)
• STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation (2016, TPAMI)
→ These papers appear difficult to apply straightforwardly.
Method
• Architecture
• Uses a modified version of OverFeat pretrained on ImageNet (for classification)
• Segmentation Net: consists of four “Conv + ReLU” layers
• 400 × 400 RGB input → 21 × 21 × (|C| + 1) output
• Because this is a 2015 study, the experiments are built on an older network → there seems to be room for performance gains
Method
• Multiple Instance Learning (training phase)
• Compute pixel-level scores from the network output
• $s^k_{i,j}$, where $(i, j)$ is the pixel location and $k \in C$
• Aggregate the pixel-level scores into a single image-level classification score
• $s^k = \mathrm{aggreg}_{i,j}\left(s^k_{i,j}\right)$
• Instead of a simple average or max, $\mathrm{aggreg}_{i,j}(\cdot)$ is defined as Log-Sum-Exp (LSE)
• $r$ is a hyper-parameter: a large $r$ makes LSE behave like max, a small $r$ like average ($r = 5$ in the experiments)
• LSE is a smooth and convex approximation of max, which improves training stability
LSE Function
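A minimal sketch of LSE pooling for one class, assuming the normalized form $s^k = \frac{1}{r}\log\left[\frac{1}{h^o w^o}\sum_{i,j}\exp\left(r\,s^k_{i,j}\right)\right]$, where $h^o \times w^o$ is the size of the output score map. The averaging inside the log is what makes a small $r$ approach the mean and a large $r$ approach the max. The function name `lse_aggregate` and the list-of-lists input are hypothetical choices for illustration.

```python
import math


def lse_aggregate(pixel_scores, r=5.0):
    """Aggregate one class's pixel-level scores s^k_{i,j} into an image-level score s^k.

    pixel_scores: 2-D list (h_o x w_o) of scores for a single class k.
    r: sharpness hyper-parameter (the slides use r = 5).
    Computes (1/r) * log(mean_{i,j} exp(r * s^k_{i,j})).
    """
    flat = [s for row in pixel_scores for s in row]
    m = max(flat)  # subtract the max so exp() cannot overflow for large r
    mean_exp = sum(math.exp(r * (s - m)) for s in flat) / len(flat)
    return m + math.log(mean_exp) / r


scores = [[0.0, 1.0], [2.0, 3.0]]       # mean = 1.5, max = 3.0
print(lse_aggregate(scores, r=5.0))     # lies between the mean and the max
print(lse_aggregate(scores, r=100.0))   # close to the max
print(lse_aggregate(scores, r=0.01))    # close to the mean
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically but keeps large values of $r$ numerically safe.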
Method
• Inference phase
• Light post-processing is used to reduce false positives
• An Image-Level Prior (ILP) and three Smoothing Priors (SP)
• Image-Level Prior (ILP)
• At inference, class information is taken from the image-level score $s^k = \mathrm{aggreg}_{i,j}\left(s^k_{i,j}\right)$ used during training
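The slide only states that ILP reuses the image-level score $s^k$ at inference. Below is a minimal sketch, assuming that ILP multiplies each pixel's class distribution (softmax over classes) by the image-level class distribution (softmax over the LSE-pooled scores) and then takes the argmax. The function names, the flattened `pixel_scores[k][p]` layout, and treating index 0 as background are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def lse(values, r):
    # (1/r) * log(mean(exp(r * v))), computed stably
    m = max(values)
    return m + math.log(sum(math.exp(r * (v - m)) for v in values) / len(values)) / r


def label_with_ilp(pixel_scores, r=5.0):
    """pixel_scores[k][p]: score of class k (0 = background, by assumption)
    at flattened pixel p. Returns one class label per pixel."""
    num_classes = len(pixel_scores)
    num_pixels = len(pixel_scores[0])
    # image-level prior: LSE-pool each class over all pixels, then softmax over classes
    image_prob = softmax([lse(pixel_scores[k], r) for k in range(num_classes)])
    labels = []
    for p in range(num_pixels):
        # pixel-level class distribution
        pixel_prob = softmax([pixel_scores[k][p] for k in range(num_classes)])
        # weight by the image-level prior and keep the most likely class
        combined = [image_prob[k] * pixel_prob[k] for k in range(num_classes)]
        labels.append(max(range(num_classes), key=combined.__getitem__))
    return labels


scores = [
    [0.0, 0.0, 0.0, 0.0],  # background
    [5.0, 5.0, 5.0, 1.0],  # class 1: strong over most of the image
    [0.0, 0.0, 0.0, 1.2],  # class 2: wins only the last pixel locally
]
print(label_with_ilp(scores))  # → [1, 1, 1, 1]
```

A plain per-pixel argmax would label the last pixel as class 2, but class 2 has a small image-level probability, so the prior suppresses that isolated false positive.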
Method
• Inference phase
• Three Smoothing Priors (SP) are evaluated
• SP-sppxl (uses superpixels) / SP-bb (uses bounding box candidates) / SP-seg (uses class-independent segmentation)
• Performance improves in the order listed, from left to right
• SP-seg
• Based on Multiscale Combinatorial Grouping (MCG), which was also used in WSISS (reviewed previously)
• The product of the MCG output and the CNN output is used as the result; $\delta^k$ is a threshold hyper-parameter selected by grid search
MCG output
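A hypothetical sketch of the SP-seg step as the slide describes it: multiply the segmentation (MCG) output by the CNN output, then keep a class only if its combined score exceeds that class's threshold $\delta^k$. It assumes the MCG proposals have already been turned into a class-agnostic foreground score per pixel; how that map is built is not covered here, and the names `sp_seg_sketch`, `fg_map`, and `delta` are illustrative, not the paper's.

```python
def sp_seg_sketch(class_probs, fg_map, delta):
    """Hypothetical SP-seg-style refinement.

    class_probs[k][p]: CNN probability of object class k at flattened pixel p.
    fg_map[p]: class-agnostic foreground score in [0, 1] derived from MCG
               proposals (its construction is an assumption, not shown).
    delta[k]: per-class threshold (chosen by grid search in the slides).
    Returns labels: 0 = background, k + 1 = object class k.
    """
    labels = []
    for p, fg in enumerate(fg_map):
        # product of the segmentation (MCG) output and the CNN output
        combined = [probs[p] * fg for probs in class_probs]
        best = max(range(len(combined)), key=combined.__getitem__)
        # keep the winning class only if it clears that class's threshold
        labels.append(best + 1 if combined[best] > delta[best] else 0)
    return labels


class_probs = [[0.9, 0.9, 0.2], [0.1, 0.1, 0.8]]
fg_map = [1.0, 0.2, 1.0]
print(sp_seg_sketch(class_probs, fg_map, delta=[0.5, 0.5]))  # → [1, 0, 2]
```

The middle pixel has a confident CNN prediction, but the low foreground score from the segmentation pulls the product under the threshold, so it falls back to background.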
Method
• Dataset
• Test set
• Evaluated on the PASCAL VOC segmentation dataset (20 classes)
• Training set
• ImageNet images from the 20 classes corresponding to PASCAL VOC, plus background (BG), 21 classes in total, are used for training
• 20 classes → about 700k images
• Background → 60k images selected from the remaining images outside those 20 classes
• Augmentation: horizontal flip, rotation, scaling, brightness and contrast modification
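A minimal sketch of two of the listed augmentations, assuming a grayscale image stored as a 2-D list of intensities in [0, 1] (a simplification of the RGB input); rotation and scaling are left out for brevity. The function name `augment`, the flip probability, and the brightness/contrast ranges are hypothetical choices, not values from the paper.

```python
import random


def augment(image, rng):
    """image: 2-D list of grayscale intensities in [0, 1].
    Applies a random horizontal flip and a brightness/contrast change."""
    # horizontal flip with probability 0.5 (hypothetical)
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # out = contrast * (x - 0.5) + 0.5 + brightness, clipped to [0, 1]
    contrast = rng.uniform(0.8, 1.2)      # hypothetical range
    brightness = rng.uniform(-0.1, 0.1)   # hypothetical range
    return [
        [min(1.0, max(0.0, contrast * (x - 0.5) + 0.5 + brightness)) for x in row]
        for row in image
    ]


rng = random.Random(0)
print(augment([[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]], rng))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible for a given seed.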
Method
• Training details
• 400x400 RGB input
• Learning rate = 0.001
• Decreased by a factor of 0.5 every 5 million examples
• SGD with batch size=16, momentum = 0.9, weight decay of 0.00005
• Dropout rate = 0.5 on each layer
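The schedule above is a step decay on the number of examples seen. A minimal sketch of that arithmetic, plus one SGD-with-momentum update for a single scalar weight; folding weight decay into the gradient as an L2 term is an assumption about how the original code applied it, and both function names are hypothetical.

```python
def learning_rate(examples_seen, base_lr=0.001, factor=0.5, step=5_000_000):
    """Halve the learning rate every 5 million training examples."""
    return base_lr * factor ** (examples_seen // step)


def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=0.00005):
    """One SGD update for a single scalar weight (weight decay as an L2 term, assumed)."""
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity


batch_size = 16
for examples in (0, 4_999_999, 5_000_000, 12_000_000):
    # examples seen, iterations taken, learning rate in effect
    print(examples, examples // batch_size, learning_rate(examples))
```

With a batch size of 16, each halving happens every 5,000,000 / 16 = 312,500 iterations.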
Result
• Experimental results (vs. weakly supervised methods)
• Compared against the state-of-the-art WSSS methods of the time
• Multi-Image Model(MIM)
• Generalized Multi-Image Model(GMIM)
• Probabilistic Graphlet Cut(PGC)
• Compared using the averaged per-class accuracy metric
• Defined as the proportion of correctly classified pixels in each class, averaged over classes
• The LSE aggregation function performs best
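A minimal sketch of averaged per-class accuracy on flattened label maps. Skipping classes that never appear in the ground truth is an assumption, and the function name is hypothetical.

```python
def averaged_per_class_accuracy(pred, gt, num_classes):
    """Mean over classes of (correct pixels of class c) / (ground-truth pixels of class c).
    Classes absent from the ground truth are skipped (an assumption)."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for p, g in zip(pred, gt):
        total[g] += 1
        if p == g:
            correct[g] += 1
    accs = [c / t for c, t in zip(correct, total) if t > 0]
    return sum(accs) / len(accs)


gt = [0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 0, 1]
print(averaged_per_class_accuracy(pred, gt, num_classes=3))  # → 0.75
```

Plain pixel accuracy on the same example would be 5/6, so averaging per class keeps a dominant class (typically background) from hiding errors on smaller classes.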
Result
• Experimental results (vs. fully supervised methods)
• Compared against top-ranked methods on PASCAL VOC 2012
• Second Order Pooling(O2P)
• DivMBest
• Simultaneous Detection and Segmentation(SDS)
• Compared using the Average Precision metric
• As expected, performance still falls short of fully supervised methods
Result
• Experimental results (inference results, possibly cherry-picked)
[Figure, from left to right: original image / ILP / ILP + SP]
Conclusion
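• From Image-level to Pixel-level Labeling with Convolutional Networks (2015, CVPR)
• Proposes a method that uses image-level labels for Weakly Supervised Semantic Segmentation
• As befits an older paper, the proposed method is very simple and its performance is low
• However, because the method is so simple, it should be quick to experiment with
• Along similar lines, what if Class Activation Maps (CAM) were used in the same way the feature maps are used here?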
Conclusion
• Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation (2018, CVPR)
Conclusion
• Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features (2018, CVPR)
