Recent Object Detection Development
& Person Detection Survey
kv
Outline
- Review Object Detection
- Research Trends: Anchor-free detector
- Person Detection
Review Object Detection
Object Detection
● A domain dominated by deep learning
● Modularized, reusable design
○ Components
○ Pipeline
○ Feature scaling design
Object Detection in 20 Years: A Survey
RCNN
General Object Detector Architecture
[Figure: Backbone → Neck → Dense Head / Head]
One-Stage
● YOLO
● SSD
● RetinaNet
Two-Stage
● Faster-RCNN
● ThunderNet
Components in the Object Detection Pipeline
Backbone (feature extractor)
- ResNet50, ResNeXt, MobileNet
- Hourglass, DLA
Neck (in-net preprocessor)
- FPN, BFP, HRFPN
Dense Head
- RPN
Head (task)
- AnchorHead
- retina, ssd
- fcos, ctdet
- BoxHead
Loss function
- CE, BCE
- Focal loss
- L1, Smooth L1
Computation Module
- Deformable Conv (v1, v2)
- GN (Group Normalization)
- SyncBN
- NMS, SoftNMS
- GA (Guided Anchoring)
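The NMS and SoftNMS entries above differ in a single step. Hard NMS deletes every box that overlaps a kept box too much. Soft-NMS, in its Gaussian variant, decays those scores instead. The following is a minimal NumPy sketch that assumes boxes are stored as (x1, y1, x2, y2) rows. The IoU threshold of 0.5 and sigma of 0.5 are illustrative defaults, not values taken from this deck.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy hard NMS: keep the best box, delete neighbours with IoU >= iou_thr."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thr]
    return keep

def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS: decay neighbour scores by exp(-IoU^2 / sigma) instead of deleting them."""
    scores = scores.astype(float).copy()
    idx = np.arange(len(scores))
    keep = []
    while idx.size > 0:
        best = idx[np.argmax(scores[idx])]
        keep.append(int(best))
        idx = idx[idx != best]
        # Overlapping boxes lose score smoothly, so crowded true positives can survive.
        scores[idx] *= np.exp(-iou(boxes[best], boxes[idx]) ** 2 / sigma)
        idx = idx[scores[idx] > score_thr]
    return keep, scores
```

Soft-NMS is often preferred in crowded scenes. Two heavily overlapping people would be collapsed into one box by hard NMS, whereas Soft-NMS only lowers the score of the second box.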
Two-Stage: Faster RCNN
[Figure: per-image computation (ResNet backbone, RPN) and per-RoI computation (RoIPool, MLP, Softmax classifier, BoxReg)]
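In the per-RoI branch, RoIPool turns each proposal, whatever its size, into a fixed-size grid that the MLP head can consume. The sketch below is simplified to keep the idea visible. It works on a single channel and assumes integer RoI coordinates that are already in feature-map space. The function name and the 2×2 output size are hypothetical. Real implementations also handle channels, batching, and the image-to-feature scale factor.

```python
import numpy as np

def roi_max_pool(feature, roi, output_size=(2, 2)):
    """Max-pool one RoI of a single-channel (H, W) feature map into a fixed grid.

    roi: (x1, y1, x2, y2) integer feature-map coordinates, inclusive.
    """
    x1, y1, x2, y2 = roi
    region = feature[y1:y2 + 1, x1:x2 + 1]
    out_h, out_w = output_size
    # Quantized bin edges, in the spirit of the original RoIPool.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    out = np.empty(output_size, dtype=feature.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Every bin covers at least one cell, even for tiny RoIs.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out
```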
Scale in Object Detection
Backbone
● Without scale
○ ConvNet
● With scale
○ DLA
○ Hourglass
○ Modified-ResNet
Backbone Parameters
Backbone | Top-1 error (%) | # of parameters | FLOPs/2
ResNet-50 | 22.28 | 25,557,032 | 3,877.95M
DLA-34 | 25.36 | 15,742,104 | 3,071.37M
ResNet-101 | 21.90 | 44,549,160 | 7,597.95M
Hourglass | - | - | -
reference: https://github.com/osmr/imgclsmob/blob/master/pytorch/README.md
Person Detection
Object Detection & Person Detection
Person detection ≈ class-agnostic object detection with crowdedness problems
● Crowdedness & Occlusion
● Scale & fine-grained
● Unusual pose
● Non-person, distractor
● Night scene
● Background distribution (domain shift)
Datasets
[Figure: sample images from COCOPerson, CrowdHuman, Caltech Pedestrian, WiderPerson, WiderPerson19, CUHK Person]
Dataset | # of images | # of persons | Density (persons/image)
COCOPerson | 64,115 | 257,252 | 4.01
CrowdHuman | 15,000 | 339,565 | 22.64
WiderPerson | 9,000 | 399,786 | 39.87
CUHK Person | 18,184 | 99,809 | 5.48
WiderPerson19 (surveillance / autonomous driving) | 8,240 / 88,260 | 58,190 / 248,993 | 7.05 / 2.82
Caltech Pedestrian | 72,782 | 13,674 | 0.32
CityPerson | 2,975 | 19,654 | 6.61
train, test, benchmark
Dataset: COCOPerson
Dataset: CrowdHuman
Annotations
● Full box
● Visible box
● Head box
Features
● Targets the crowdedness issue
Dataset: WiderPerson
TMM2019 http://www.cbsr.ia.ac.cn/users/jwan/papers/TMM2019-WiderPerson.pdf
Features
● Questionable annotation quality
● Limited scene distribution (by observation)
Annotations
● Full box
● class, tag
Features
● More balanced location distribution
Dataset: WiderPerson2019
https://wider-challenge.org/2019.html
Features
● vehicle & surveillance
● Low-quality but high-resolution images
Observations
[Figure: datasets arranged by image domain (general image, vehicle, surveillance): COCOPerson, CrowdHuman, Caltech Pedestrian, WiderPerson, CUHK Person, Market1501, WiderPerson19]
● Models trained on COCOPerson may not perform well in real-world scenarios (not confirmed)
● COCOPerson contains some unreasonable annotations
● The WiderPerson dataset is too noisy to use directly
● Full-box prediction is hard; visible-box prediction may cause a higher false-positive rate
● CrowdHuman is hard, but it is designed to tackle the crowdedness problem
Crowdedness Problem: Repulsion Loss
Attraction
RepGT (Repulsion Term)
RepBox (Repulsion Term)
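In the Repulsion Loss paper, the three terms combine as L = L_Attr + α·L_RepGT + β·L_RepBox. The sketch below covers only the RepGT piece and assumes axis-aligned (x0, y0, x1, y1) boxes. For one positive proposal, it picks the non-target ground truth that the proposal overlaps most. It then applies the Smooth_ln penalty to their intersection over ground truth (IoG). The helper names and the default sigma = 0.5 are illustrative choices.

```python
import math

# Boxes are (x0, y0, x1, y1) tuples.

def area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a, b):
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def iou(a, b):
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)

def iog(pred, gt):
    # Intersection over the ground-truth area (RepGT uses IoG rather than IoU).
    return intersection(pred, gt) / area(gt)

def smooth_ln(x, sigma=0.5):
    # -ln(1 - x) below sigma, linear continuation above it, so the value stays finite at x = 1.
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def rep_gt(pred, target_idx, gts, sigma=0.5):
    """RepGT for one positive proposal: push it away from the most-overlapping non-target GT."""
    others = [g for k, g in enumerate(gts) if k != target_idx]
    if not others:
        return 0.0
    rep = max(others, key=lambda g: iou(pred, g))
    return smooth_ln(iog(pred, rep), sigma)
```

The full loss averages this over all positives and adds the attraction term (Smooth L1 to the assigned GT). It also adds RepBox, which applies Smooth_ln to the IoU between predictions assigned to different targets.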
Crowdedness Problem: Adaptive-NMS
Adaptive-NMS
● Dynamic suppression threshold according to target density
● Subnetwork to learn density scores
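The core of Adaptive-NMS is a greedy NMS whose threshold depends on the current box. For the top box M, suppression uses N_M = max(N_t, d_M), where d_M is the predicted density. The sketch below assumes the density scores arrive as an input array, standing in for the output of the density subnetwork. The function names are hypothetical.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU of one (x1, y1, x2, y2) box against an (N, 4) array of boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def adaptive_nms(boxes, scores, densities, base_thr=0.5):
    """Greedy NMS with a per-box threshold N_M = max(base_thr, d_M)."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        m = order[0]
        keep.append(int(m))
        rest = order[1:]
        # Crowded targets (high density) get a looser threshold,
        # so heavily overlapping neighbours are not suppressed.
        thr = max(base_thr, densities[m])
        order = rest[iou_one_to_many(boxes[m], boxes[rest]) < thr]
    return keep
```

With low density the function behaves like plain NMS. In a crowd it keeps overlapping boxes that plain NMS would delete.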
Drawbacks of anchor box
● Large number of anchors (SSD ~40k, RetinaNet ~100k)
○ Faster R-CNN still performs well with relatively few proposals
● Introduces extra hyperparameters
● May fail in multi-scale scenarios
● Imbalance between positive & negative anchors
Recent Trend in Object Detection
Era of anchor-free detector
One-Stage: Fast, Simple
Two-Stage: High Precision (Recall)
Anchor-Free: Hybrid of both methods
2018
- 8/3 CornerNet (pair)
2019
- 1/23 ExtremeNet (4 pts)
- 4/2 FCOS
- 4/8 FoveaBox
- 4/18 CornerNet-Lite
- 4/19 CenterNet (triplet)
- 4/23 Center and Scale Prediction (CSP)
- 4/25 Objects as Points (CenterNet)
- 10/21 CSID (CSP+ID)
Algo Relations
[Figure: anchor-free detectors grouped into single-point and multiple-point methods; shown: ExtremeNet, Triplet (CenterNet: Keypoint Triplets), FCOS, CSP, CSID, CornerNet, CenterNet]
CornerNet
Object as a pair of keypoints (top-left & bottom-right corners)
[Figure: find corners → associative embedding → grouping]
CornerNet
[Figure: corner pooling for top-left and bottom-right corners]
Backbone matters: Hourglass gives about 8 AP more than FPN
● One-dimensional embedding
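Corner pooling for a top-left corner works as follows. At each location it takes the strongest response looking down, along the object's left edge. It adds the strongest response looking right, along the top edge. This can be written as two reverse cumulative maxima. The sketch below runs on single-channel NumPy maps and is only illustrative. CornerNet applies the same operation per channel on convolutional features, and the bottom-right variant looks up and to the left instead.

```python
import numpy as np

def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling over two (H, W) feature maps.

    Output at (i, j) = max(f_t[i:, j]) + max(f_l[i, j:]).
    """
    # Reverse cumulative max down each column: flip, accumulate from the bottom, flip back.
    down = np.flip(np.maximum.accumulate(np.flip(f_t, axis=0), axis=0), axis=0)
    # Same idea along each row, accumulating from the right edge.
    right = np.flip(np.maximum.accumulate(np.flip(f_l, axis=1), axis=1), axis=1)
    return down + right
```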
CornerNet: Loss function
● Pixel-wise regression on heatmap with focal loss
● Smooth L1 on offset map
[Figure: heatmap, offset, and grouping losses]
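The "focal loss" on the heatmap is a penalty-reduced variant, later reused by Objects as Points. Negatives close to a ground-truth keypoint are down-weighted by (1 - y)^β, where y comes from a Gaussian bump placed around each keypoint. In CornerNet, α = 2 and β = 4. The sketch below is a minimal NumPy version. It assumes `pred` already holds probabilities and that `gt` equals exactly 1 at keypoints.

```python
import numpy as np

def cornernet_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss on a keypoint heatmap.

    pred: predicted probabilities; gt: Gaussian-splatted targets in [0, 1], exactly 1 at keypoints.
    """
    pred = np.clip(pred, eps, 1 - eps)
    pos = gt == 1
    # Positives: standard focal term.
    pos_loss = ((1 - pred) ** alpha * np.log(pred))[pos].sum()
    # Negatives: focal term, further reduced near keypoints by (1 - gt)^beta.
    neg_loss = ((1 - gt) ** beta * pred ** alpha * np.log(1 - pred))[~pos].sum()
    num_pos = max(pos.sum(), 1)  # normalize by the number of keypoints
    return -(pos_loss + neg_loss) / num_pos
```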
CenterNet: Keypoint Triplets
Problems of CornerNet
● Sensitive to object edges (top 100)
● High false-positive rate
Improvement
● Correct predictions by checking the central region
Object as a keypoint triplet
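The central-region check can be sketched as a small geometric test. The CenterNet (Keypoint Triplets) paper defines the central region of a box with corners (tl_x, tl_y) and (br_x, br_y) as an n-by-n shrink of that box. The paper chooses n according to box size. Here n is simply a parameter with an assumed default of 3. A box is kept if a detected center keypoint of the same class lies inside its central region. The helper names are hypothetical.

```python
def central_region(box, n):
    """Central region (x0, y0, x1, y1) of box (tlx, tly, brx, bry) for an odd scale factor n."""
    tlx, tly, brx, bry = box
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry

def has_center_support(box, centers, n=3):
    """True if any center keypoint (x, y) of the same class lies inside the box's central region."""
    x0, y0, x1, y1 = central_region(box, n)
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in centers)
```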
CenterNet: Keypoint Triplets
[Figure: corner pooling + associative embedding + grouping, plus a center-pooling branch]
FCOS
Object as a point + 4D vector (distances l, t, r, b to the four box sides)
● Balance between positive & negative samples
● Ambiguous samples: ~1.4% in COCO
● Hint for center
FCOS
Backbone + FPN + Head (classical architecture)
FCOS: Centerness
Important Feature
● Center-ness suppresses low-quality predictions from locations far from the object center
● Class score is multiplied by the center-ness score before NMS
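The FCOS quantities mentioned so far fit in a few lines. For a location (x, y) inside a box, the regression target is (l, t, r, b). Center-ness is sqrt(min(l, r)/max(l, r) · min(t, b)/max(t, b)). In FCOS, center-ness is predicted by its own branch at test time. This sketch only computes its training target from (l, t, r, b) and shows the class × center-ness ranking score. The function names are illustrative.

```python
import numpy as np

def fcos_targets(x, y, box):
    """(l, t, r, b): distances from location (x, y) to the sides of box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return np.array([x - x0, y - y0, x1 - x, y1 - y], dtype=float)

def is_positive(ltrb):
    # A location is a positive sample only if it falls inside the box.
    return bool(np.all(ltrb > 0))

def centerness(ltrb):
    """Center-ness target: 1.0 at the box center, decaying toward the edges."""
    l, t, r, b = ltrb
    return float(np.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))))

def ranking_score(cls_score, ltrb):
    # Score used to rank boxes before NMS: classification score times center-ness.
    return cls_score * centerness(ltrb)
```

With a classification score of 0.9, a location near the left edge of the box ends up ranked well below a centered one. This ordering is what lets NMS discard the off-center boxes.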
FCOS: Improvements
● 1x and 2x mean the model is trained for 90K and 180K iterations, respectively.
● "center" means center sampling is used in training.
● "liou" means the model uses the linear IoU loss (1 - IoU).
● "giou" means the model uses the GIoU loss (1 - GIoU).
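The difference between "liou" and "giou" shows up for non-overlapping boxes. There, 1 - IoU is stuck at 1 regardless of distance. GIoU subtracts the empty fraction of the smallest enclosing box, so it still changes as the boxes move apart. The following is a plain-Python sketch for (x0, y0, x1, y1) boxes, with illustrative function names.

```python
def iou_and_giou(a, b):
    """IoU and GIoU of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both; GIoU penalizes its empty part.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (enclose - union) / enclose
    return iou, giou

def linear_iou_loss(pred, target):
    return 1.0 - iou_and_giou(pred, target)[0]

def giou_loss(pred, target):
    return 1.0 - iou_and_giou(pred, target)[1]
```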
Objects as Points (point + 2 values)
● Simple method
○ One feature map that represents all scales
○ No bounding box matching
○ No non-maximum suppression
● Better speed-accuracy trade-off
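"No non-maximum suppression" works because Objects as Points reads detections directly off the heatmap. A 3×3 max-pooling keeps only local peaks, and the top 100 peaks become detections. The sketch below applies this idea to a single-class NumPy heatmap. The 3×3 max filter is built from shifted views rather than a framework pooling op, and `heatmap_peaks` is a hypothetical name.

```python
import numpy as np

def heatmap_peaks(heat, k=100):
    """Keep 3x3 local maxima of a float (H, W) heatmap and return the top-k as (score, y, x)."""
    H, W = heat.shape
    padded = np.pad(heat, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max filter: elementwise max over the nine shifted views of the padded map.
    windows = [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)]
    local_max = np.max(np.stack(windows), axis=0)
    # A cell survives only if it equals the max of its neighbourhood (acts as NMS).
    peaks = np.where(heat == local_max, heat, 0.0)
    flat = np.argsort(-peaks, axis=None)[:k]
    ys, xs = np.unravel_index(flat, heat.shape)
    return [(float(peaks[y, x]), int(y), int(x)) for y, x in zip(ys, xs)]
```

Each surviving peak would then be paired with the predicted width/height and offset at that location to form a box.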
Objects as Points: “The true CenterNet”
Hourglass
● Uses DCNv2 instead of plain Conv
● Heatmap formulation supports 2D detection, 3D detection, and pose estimation
Objects as Points: “The true CenterNet”
● Pixel-wise regression with focal loss
● Scale map is not normalized
● Size regression loss weight: 0.1
● L1 loss (rather than Smooth L1) for the offset
● Training longer performs better (140 → 230 epochs)
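The bullets above correspond to the training objective written in Objects as Points, reproduced below as a reading aid. Here L_k is the focal-loss heatmap term. L_size is an L1 loss on the raw, unnormalized object size s_k at each center p_k. L_off is an L1 loss on the sub-pixel offset of each center p relative to its low-resolution location p̃ = ⌊p/R⌋, where R is the output stride.

```latex
L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off},
\qquad \lambda_{size} = 0.1, \quad \lambda_{off} = 1

L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|,
\qquad
L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|
```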
CSP: Center & Scale Prediction
Prediction
● Center (Heatmap)
● Scale (Height)
Fixed aspect ratio (width/height) of 0.41
(taken from the dataset)
Object as a point + 1 scalar
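"Point + 1 scalar" means the width is never predicted. It is recovered from the height through the fixed 0.41 ratio. The sketch below assumes the height has already been decoded to pixels. Any log-space encoding or center-offset refinement the real model applies is deliberately left out, and `csp_box` is a hypothetical name.

```python
def csp_box(cx, cy, height, aspect_ratio=0.41):
    """Decode a pedestrian box (x0, y0, x1, y1) from a center and a predicted height.

    Width is derived as aspect_ratio * height rather than predicted.
    """
    width = aspect_ratio * height
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```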
Why Choose Height?
Why Predict Center?
CSID: Center, Scale, Identity and Density aware
ID-Map learns two measures simultaneously
● Density of predicted center
● Identity of predicted center
CSID: Center, Scale, Identity and Density aware
ID-NMS
Algo Relations
How are points grouped?
● Pooling
● Associative embeddings
How is the center located?
● Centerness regression
● Center target
● Domain constraints
Comparison
Algorithm | CornerNet | Triplet | FCOS | CenterNet | CSP | CSID
# of points | 2 | 3 | 1 | 1 | 1 | 1, 1
Scale | Backbone | Backbone | FPN | Backbone | FPN | Backbone
Grouping method | Corner Pool, Loss | Center Pool, Corner Pool, Loss | - | - | - | ID Loss, Density loss
Key feature | Pool, Embedding | Pool | Centerness | Simple | Const. aspect ratio | ID Map
Post-processing | NMS | Soft-NMS | NMS | - | NMS | ID-NMS
Benchmarks: COCO
Algorithm | Backbone | AP | AP@0.50 | AP@0.75 | APs | APm | APl | Inference speed
YOLOv3 | DarkNet-53 | 33 | 57 | 34.4 | 18.3 | 25.4 | 41.9 | 20 fps
RetinaNet | ResNeXt-101-FPN | 40.8 | 61.1 | 44.1 | 24.1 | 44.2 | 51.2 | 5.4 fps
CornerNet | Hourglass-104 | 40.5 | 56.5 | 43.1 | 19.4 | 42.7 | 53.9 | 4.1 fps
FCOS | ResNet-101-FPN | 41.5 | 60.7 | 45 | 24.4 | 44.8 | 51.6 | -
FCOS + improvements | ResNeXt-64x4d-101-FPN | 44.7 | 64.1 | 48.4 | 27.6 | 47.5 | 55.6 | -
CenterNet | DLA-34 | 39.2 | 57.1 | 42.8 | 19.9 | 43 | 51.4 | 28 fps
CenterNet | Hourglass-104 | 42.1 | 61.1 | 45.9 | 24.1 | 45.5 | 52.8 | 7.8 fps
CenterNet-Triplet | Hourglass-52 | 41.6 | 59.4 | 44.2 | 22.5 | 43.1 | 54.1 | 3.7 fps
CenterNet-Triplet | Hourglass-104 | 44.9 | 62.4 | 48.1 | 25.6 | 47.4 | 57.4 | 2.9 fps
Benchmarks: CityPerson (log-average miss rate MR^-2 in %, lower is better)
Algorithm | Backbone | Reasonable | Heavy | Partial | Bare | Inference speed
FRCNN | VGG-16 | 15.4 | - | - | - | -
OR-CNN | VGG-16 | 12.8 | 55.7 | 15.3 | 6.7 | -
RepLoss | ResNet-50 | 13.2 | 56.9 | 16.8 | 7.6 | -
CSP | ResNet-50 | 11 | 49.3 | 10.4 | 7.3 | 3 fps
Adaptive-NMS | ResNet-50 | 10.8 | 54 | 11.4 | 6.2 | -
CSID | DLA-34 | 8.8 | 46.6 | 8.3 | 5.8 | 6.25 fps
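CityPersons results are reported as the Caltech-style log-average miss rate (MR^-2). The miss rate is sampled at nine false-positives-per-image (FPPI) reference points, evenly log-spaced in [10^-2, 10^0], and averaged in log space. The sketch below approximates that evaluation and assumes its sampling rule. Each reference takes the recall at the largest FPPI not exceeding it. If no such point exists, the reference counts as a full miss. `log_average_miss_rate` is a hypothetical name, not an official evaluation API.

```python
import numpy as np

def log_average_miss_rate(fppi, recall, num_points=9):
    """MR^-2 from a detection curve sorted by increasing FPPI (one point per score threshold)."""
    fppi = np.asarray(fppi, dtype=float)
    recall = np.asarray(recall, dtype=float)
    refs = np.logspace(-2, 0, num_points)
    sampled = np.zeros(num_points)  # recall 0 (miss rate 1) if the curve never reaches a reference
    for i, ref in enumerate(refs):
        idx = np.where(fppi <= ref)[0]
        if idx.size > 0:
            sampled[i] = recall[idx[-1]]
    # Geometric mean of the miss rates, guarding against log(0).
    miss = np.maximum(1e-10, 1 - sampled)
    return float(np.exp(np.mean(np.log(miss))))
```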
Training Frameworks
● TensorFlow Object Detection API
● mmdetection (CUHK)
● simpledet (TuSimple)
● Detectron, Detectron2
Conclusions
● Crowdedness is the major obstacle in person detection
● Anchor-free detectors appear flexible and extensible to other object-level tasks
● Center-based method + post-processing + specialized loss
○ CSID
○ CenterNet + A-NMS + RepLoss
● Trade-off between backbone & scaling level
○ ConvNet + FPN
○ DLA
● Still a challenging topic
Paper Lists: Person Detection
● CityPersons: A Diverse Dataset for Pedestrian Detection
● WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild
● CrowdHuman: A Benchmark for Detecting Human in a Crowd
● CenterNet: Keypoint Triplets for Object Detection
● Objects as Points
● FoveaBox: Beyond Anchor-based Object Detector
● Feature Selective Anchor-Free Module for Single-Shot Object Detection
● FCOS: Fully Convolutional One-Stage Object Detection
● Center and Scale Prediction: A Box-free Approach for Object Detection
● Bottom-up Object Detection by Grouping Extreme and Center Points
● CSID: Center, Scale, Identity and Density-aware Pedestrian Detection in a Crowd
● Repulsion Loss: Detecting Pedestrians in a Crowd
● Adaptive NMS: Refining Pedestrian Detection in a Crowd
● Discriminative Feature Transformation for Occluded Pedestrian Detection
● PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes
● Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd
● Double Anchor R-CNN for Human Detection in a Crowd

Uploaded by Kai-Wen Zhao
