visionNoob (Jaewon Lee)
PR-110: An Analysis of Scale Invariance in Object Detection – SNIP
Singh, B., & Davis, L. S. CVPR’18
https://arxiv.org/abs/1711.08189
References for Object Detection
PR-002: Deformable Convolutional Networks (2017)
PR-012: Faster R-CNN : Towards Real-Time Object Detection with Region Proposal
PR-016: You only look once: Unified, real-time object detection
PR-023: YOLO9000: Better, Faster, Stronger
PR-033: PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
PR-057: Mask R-CNN
PR-084: MegDet: A Large Mini-Batch Object Detector (CVPR2018)
MS COCO Results
MegDet: A Large Mini-Batch Object Detector (https://arxiv.org/abs/1711.07240)
Path Aggregation Network for Instance Segmentation (https://arxiv.org/abs/1803.01534)
Deformable ConvNets + Xception
Mask R-CNN + Feature Pyramid Networks (FPN) + ResNeXt
Ensemble of multiple models using unlabeled data with multiple scales.
(today!) An Analysis of Scale Invariance in Object Detection – SNIP
What makes object detection harder than image classification?
MSCOCO: http://cocodataset.org/
ImageNet: http://www.image-net.org/
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
[16] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks.
In less than five years, the top-5 error on ImageNet dropped from 15% [20] to 2% [16].
The mAP of the best-performing detector [18] on COCO [25] is only 62%, even at 50% overlap.
# of classes in COCO = 80, # of classes in ImageNet = 1000
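The "50% overlap" criterion refers to the intersection over union (IoU) between a predicted box and a ground-truth box. A minimal sketch of that check, assuming boxes are given as (x1, y1, x2, y2) in pixels; the function names `iou` and `is_true_positive` are hypothetical and not taken from the slides:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when IoU reaches the threshold (0.5 here)."""
    return iou(pred_box, gt_box) >= threshold
```

For a 10 x 10 box shifted by half its width, the intersection is 50 and the union is 150, so the IoU is about 0.33 and the detection would not count at the 50% threshold.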
$$\text{Relative Scale} = \frac{\sqrt{\text{area}(\text{Object})}}{\sqrt{\text{area}(\text{Image})}}$$
The MS COCO dataset has
- Mostly small objects (median relative scale 0.106)
- Large scale variation (20x)
- Large domain shift from the pre-trained classification network
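The relative scale defined on this slide is straightforward to compute per object. A minimal sketch, assuming box and image sizes are given as width and height in pixels; the function name `relative_scale` and the example box are illustrative only:

```python
import math


def relative_scale(box_w, box_h, img_w, img_h):
    """sqrt(area of object) / sqrt(area of image)."""
    return math.sqrt(box_w * box_h) / math.sqrt(img_w * img_h)


# Hypothetical example: a 64 x 48 box in a 640 x 480 image.
example = relative_scale(64, 48, 640, 480)
```

The example box covers 1% of the image area, so its relative scale is about 0.1, which is close to the COCO median of 0.106 quoted above.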
Current Practices for Object Detection
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." IEEE CVPR. Vol. 4. 2017.
Convolutional Neural Networks for Classification
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
The feature maps that contain high-level semantic features have much lower spatial resolution
-> This makes object detection harder
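To make the resolution loss concrete, one can estimate how many feature-map cells an object spans at a given downsampling stride. A rough sketch, assuming strides of 4, 8, 16 and 32 (typical of common backbones, but an assumption here, not a figure from the slides); the function names are hypothetical:

```python
def feature_map_footprint(object_size_px, stride):
    """Approximate side length, in feature-map cells, covered by an object."""
    return object_size_px / stride


def footprints(object_size_px, strides=(4, 8, 16, 32)):
    """Footprint of one object at several assumed backbone strides."""
    return {s: feature_map_footprint(object_size_px, s) for s in strides}


# A 32-pixel object shrinks to a single cell at stride 32.
small_object = footprints(32)
```

Under these assumptions a 32-pixel object spans 8 cells at stride 4 but only 1 cell at stride 32, which is where the most semantic features live.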
(PR-002)
Tutorial: Deep Learning for Objects and Scenes
Current Practices for Object Detection
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." IEEE CVPR. Vol. 4. 2017.
High-resolution models lead to significantly better mAP on small objects
(note that the typical image resolution in COCO is 640 x 480)
Are CNNs robust to up-sampling?
(pose, appearance, etc)
Pretrained classification network: 224 x 224
Original: 640 x 480
Inference: 1400 x 2000
AP (small objects)
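The gap between the original and inference resolutions can be expressed as an up-sampling factor. A minimal sketch, assuming an aspect-ratio-preserving resize bounded by a target short side and long side (a common detector convention, assumed here rather than stated on the slide); the function names are hypothetical:

```python
def resize_scale(orig_hw, target_short, target_long):
    """Largest scale factor that keeps both image sides within the targets."""
    h, w = orig_hw
    short_side, long_side = min(h, w), max(h, w)
    return min(target_short / short_side, target_long / long_side)


def resized_object_size(object_size_px, orig_hw, target_short, target_long):
    """Object side length in pixels after the resize."""
    return object_size_px * resize_scale(orig_hw, target_short, target_long)


# 640 x 480 image (height 480, width 640) up-sampled toward 1400 x 2000.
factor = resize_scale((480, 640), 1400, 2000)
```

Under this assumption the image is enlarged by a factor of roughly 2.9, so a 32-pixel object becomes roughly 93 pixels wide at inference time.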
Reduce variation in scale without reducing the total number of training samples!
[0, 80]
[40, 160]
[120, ∞]
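The three ranges can be read as per-resolution validity ranges for object size: at each training resolution, only objects whose size falls inside that resolution's range are used. A minimal sketch, assuming size is measured as the square root of box area in original-image pixels and that the ranges pair with resolutions from highest to lowest; the pairing, the labels `high_res`, `mid_res`, `low_res` and the function names are assumptions, not taken from the slides:

```python
import math

# Assumed pairing: small objects are valid at the highest resolution,
# large objects at the lowest one.
VALID_RANGES = {
    "high_res": (0.0, 80.0),
    "mid_res": (40.0, 160.0),
    "low_res": (120.0, math.inf),
}


def box_size(box):
    """Square root of the area of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(0.0, x2 - x1) * max(0.0, y2 - y1))


def valid_boxes(boxes, scale_name):
    """Keep only the boxes whose size lies in the range for this scale."""
    lo, hi = VALID_RANGES[scale_name]
    return [b for b in boxes if lo <= box_size(b) <= hi]


def scales_for(box):
    """All scales at which a box would contribute to training."""
    return [name for name in VALID_RANGES if valid_boxes([box], name)]
```

Because the ranges overlap and together cover every size, each object is valid at one or more resolutions, so no object is excluded from training entirely.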
Object scales: too small, small, medium, large, too large
- Large data variation
- Large scale variation
- Out of receptive field
- Too low spatial resolution
Normalize Scale: medium
- Large data variation
- Small scale variation
Singh, Bharat, Mahyar Najibi, and Larry S. Davis. "SNIPER: Efficient Multi-Scale Training." arXiv preprint arXiv:1805.09300 (2018).
Q&A