"A Fast Object Detector for ADAS using Deep Learning," a Presentation from Panasonic

1
Minyoung Kim
May 1st, 2017
A Fast Object Detector for ADAS Using Deep Learning

2
Panasonic Silicon Valley Laboratory (PSVL)
Cupertino, California

3
•  Pros
•  High performance
•  Beat state-of-the-art records in many tasks, including image classification and detection
•  Cons
•  Requires a large training dataset
•  Requires high computational power
•  Deep Neural Networks with millions of parameters
•  Slower running time than most conventional algorithms
Object Detection with Deep Learning

4
Tradeoffs
Speed vs. Accuracy

5
Object Detection System
Building Object Detection System
•  Training Deep Neural Network for Classification
•  Pedestrian detection: Binary classification
•  Object Proposal Generation at different scales
•  Generate box proposals (1000 ~ 2000 boxes)
•  Selective Search*, Edge Boxes**
•  Merge largely overlapping boxes
•  Non Maximum Suppression
* J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders, IJCV 2013
** C. Lawrence Zitnick and Piotr Dollár, Microsoft Research
[Diagram: Proposal Generation → Recognition Network (run recognizer) → Classification (Pedestrian / Background) → Merge boxes]
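The box-merging step listed above (non-maximum suppression) can be sketched as follows. This is a minimal greedy version, not the presenters' implementation; the function names and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, overlap_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the
    threshold (0.5 is an assumed value). Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= overlap_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    # The second box overlaps the first heavily and is suppressed.
    print(non_max_suppression(boxes, scores))
```

With 1000 to 2000 proposals per image, this pairwise check is cheap compared with running the recognizer on every proposal.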

6
Time Consuming!
Proposal Generation & Scaling
•  Region proposal
•  Selective Search: 2 seconds per image (CPU)
•  An order of magnitude slower than Edge Boxes
•  Edge Boxes: 0.2 seconds per image
•  Scaling
•  Multiple forward propagations
•  Bottleneck
•  A forward propagation of an image
•  Less than 0.1 seconds (GPU)
Object Detection System
[Diagram labels: Proposal Generation, Scaling]

7
PSVL Pedestrian Detection System
Our Pedestrian Detection System
[Diagram: INPUT → PSVL Neural Detector (a single forward propagation) → OUTPUT]

8
Recognition Network
Our Pedestrian Detection System
Add Regression Layer and Finetune
Fully Convolutional Network as Detector
Detection by a single forward propagation

9
Train DNN for recognition
•  GPU & Framework
•  NVIDIA Titan X, NVIDIA Tesla K80
•  Caffe*
•  Network Architectures
•  Modified GoogLeNet**
•  25~30 Convolutional layers
•  Input: Pedestrian and Backgrounds (80x32)
•  Output: Sigmoid or Softmax
•  Dataset
•  Caltech Pedestrian Detection Benchmark***
•  10 hours of 640(w) x 480(h) 30Hz video
•  About 250,000 frames with a total of 350,000 bounding boxes
Recognition Network
* http://caffe.berkeleyvision.org/
** C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich (2014)
*** http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

10
Convert recognition network to a fully convolutional network
Fully Convolutional Network
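[Diagram: the base network with fully connected layers accepts only a limited input size; with those layers converted to convolutional layers, the kernel slides over the input and the input size is no longer limited]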
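The conversion on this slide rests on the fact that a fully connected layer over a fixed-size window is the same computation as a convolution whose kernel covers that whole window. The sketch below illustrates this with a single output unit in plain Python; the function names and the toy weights are hypothetical and are not taken from the presenters' network.

```python
def fully_connected(window, weights, bias):
    """One dense output unit over a flattened 2-D window (fixed input size)."""
    flat = [v for row in window for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def as_convolution(image, weights, bias, kh, kw, stride=1):
    """Reuse the same weights as a kh x kw kernel slid over an image of
    any size that is at least kh x kw. Returns a 2-D map of responses."""
    kernel = [weights[r * kw:(r + 1) * kw] for r in range(kh)]
    out = []
    for y in range(0, len(image) - kh + 1, stride):
        row = []
        for x in range(0, len(image[0]) - kw + 1, stride):
            acc = bias
            for r in range(kh):
                for c in range(kw):
                    acc += kernel[r][c] * image[y + r][x + c]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    weights, bias = [1.0, 0.0, -1.0, 2.0], 0.5
    window = [[1.0, 2.0], [3.0, 4.0]]
    # Same result on a window-sized input ...
    print(fully_connected(window, weights, bias))
    print(as_convolution(window, weights, bias, 2, 2))
    # ... and a whole map of responses on a larger input.
    larger = [[1.0, 2.0, 0.0, 1.0], [3.0, 4.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0]]
    print(as_convolution(larger, weights, bias, 2, 2))
```

On the larger input, each position of the output map corresponds to one placement of the original classification window, which is what allows detection in a single forward propagation.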

11
Regression Layer
•  Regress bounding boxes on useful features
•  Nx4 box coordinates data
•  N: Feature Map resolution (NX x NY)
•  Original GT Box: B = [x1, y1, x2, y2]
•  New GT Box: B’ = rel(B) / m (m: multiplier of Window Size)
Fully Convolutional Network
[Diagram: output feature map of resolution NX x NY with 4 box-coordinate channels; example values 240 and 120 with m = 2]
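The target transform B' = rel(B) / m can be sketched as below. The slide does not define rel(B) precisely, so this sketch assumes it means the box corners expressed as offsets from the center of the window that a feature-map cell covers, and it treats m as a plain scalar divisor (2 in the slide's example). Both readings, and the function names, are assumptions.

```python
def encode_box(box, window_center, m):
    """B' = rel(B) / m, with rel(B) assumed to be corner offsets
    from the window center (cx, cy)."""
    cx, cy = window_center
    x1, y1, x2, y2 = box
    return [(x1 - cx) / m, (y1 - cy) / m, (x2 - cx) / m, (y2 - cy) / m]


def decode_box(target, window_center, m):
    """Inverse of encode_box: recover image coordinates from B'."""
    cx, cy = window_center
    t1, t2, t3, t4 = target
    return [t1 * m + cx, t2 * m + cy, t3 * m + cx, t4 * m + cy]


if __name__ == "__main__":
    gt_box = [100, 50, 140, 150]      # original GT box [x1, y1, x2, y2]
    center = (120, 100)               # hypothetical window center
    target = encode_box(gt_box, center, m=2)
    print(target)
    print(decode_box(target, center, m=2))
```

Scaling the offsets down keeps the regression targets in a small numeric range, which is a common reason for this kind of normalization.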

12
Training detector network
•  Network Architectures
•  Custom loss functions
•  Feature Map: Cross Entropy Loss with Boosting
•  Boosting
•  Ped: Correct Results (TPs) + Ground Truths (FNs)
•  True Positive if IoU > 0.5
•  False Negative if a Ground Truth is not detected
•  NonPed: FPs
•  False Positive if IoU < 0.5
•  Regression: Euclidean Loss with Feature Map Data incorporated
PSVL Neural Detector
[Diagram: 640x480 original images → Fully Convolutional Network + Regression Layer → Feature Map and Box Coordinates]
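The boosting rule on this slide (Ped = true positives plus missed ground truths, NonPed = false positives) can be sketched as a simple split of one round of detections. This is an illustrative reading of the bullets, not the presenters' training code; the function names and the best-overlap matching are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def split_for_boosting(detections, ground_truths, iou_threshold=0.5):
    """Return (ped_samples, nonped_samples) for the next training round.

    A detection is a true positive if its best IoU with any ground truth
    exceeds the threshold, otherwise a false positive. Ground truths that
    no detection matched are false negatives and are added to the Ped set.
    """
    true_pos, false_pos, matched = [], [], set()
    for det in detections:
        overlaps = [iou(det, gt) for gt in ground_truths]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] > iou_threshold:
            true_pos.append(det)
            matched.add(best)
        else:
            false_pos.append(det)
    false_neg = [gt for i, gt in enumerate(ground_truths) if i not in matched]
    return true_pos + false_neg, false_pos


if __name__ == "__main__":
    dets = [(0, 0, 10, 10), (100, 100, 110, 110)]
    gts = [(1, 1, 11, 11), (50, 50, 60, 60)]
    ped, nonped = split_for_boosting(dets, gts)
    print(ped)     # one matched detection plus one missed ground truth
    print(nonped)  # the detection that matched nothing
```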

13
More Data

14
Even fewer box predictions with Center-Height features

15
Performance – Very Fast with Competitive Accuracy
•  From the DeepCascade paper 1)
•  DeepCascade: NVIDIA K20
•  15 fps
•  Ours: NVIDIA GTX770
•  34 fps
•  Speed Adjustment
•  34 * 0.9699 ≈ 33 fps 2)
•  Ours: NVIDIA Titan X
•  51.422 fps w/o cuDNN
•  85.565 fps with cuDNN4
(*): Left-hand side is for methods with unknown fps or less than 0.2 fps
(**): DeepCascade without extra data
(***): SpatialPooling+/Katamari methods use additional motion information
1) A. Angelova, A. Krizhevsky, V. Vanhoucke, A. Ogale, D. Ferguson (2015)
2) http://caffe.berkeleyvision.org/performance_hardware.html
[Figure: Performance of Pedestrian Detection Methods (Accuracy vs. Speed); axis directions: Faster, More accurate; PSVL ND is marked, with annotations (*), (**), and (***)]
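The speed adjustment on this slide is a one-line normalization so that frame rates measured on different GPUs can be compared. The sketch below reproduces the arithmetic from the slide's figures. Reading 0.9699 as the relative speed of the K20 against the GTX770, taken from the Caffe hardware page cited on the slide, is an assumption, as are the function and variable names.

```python
def adjusted_fps(measured_fps, relative_speed):
    """Scale a measured frame rate by a relative hardware speed factor."""
    return measured_fps * relative_speed


if __name__ == "__main__":
    ours_gtx770 = 34.0        # fps measured on the GTX770 (from the slide)
    factor = 0.9699           # assumed K20-vs-GTX770 speed ratio
    deepcascade_k20 = 15.0    # fps reported for DeepCascade on the K20

    ours_k20_equivalent = adjusted_fps(ours_gtx770, factor)
    print(round(ours_k20_equivalent, 2))                    # 32.98, shown as 33 fps
    print(round(ours_k20_equivalent / deepcascade_k20, 1))  # 2.2 times faster

    # Effect of cuDNN on the Titan X, from the slide's two measurements.
    print(round(85.565 / 51.422, 2))                        # 1.66
```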

16
Deploy PSVL ND on Google Nexus 9
•  Processor
•  NVIDIA Tegra K1
•  GPU: NVIDIA Kepler with 192 CUDA cores
•  Speed (without any optimization)
•  Base resolution (600x390): 5 fps
•  Lower resolution (280x240): 16 fps
ND on Portable Device

17
Threshold Information
Probability and NMS
Threshold Bar
Detection box with Probability
Toggle for Threshold Bar
ND Application

18
ND Application Demo (Cluster with Titan X)

19
ND Application Demo at ITSWC 2015
GTX980m Tegra K1

20
More Approaches
•  Faster-RCNN (2015)*
•  Region Proposal Network
•  +10 ms
•  Anchor boxes
•  Predicts offsets & confidences
Object Detection from others
* S. Ren, K. He, R. Girshick, J. Sun (NIPS 2015)
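To make "anchor boxes" and "predicts offsets" concrete, the sketch below decodes one predicted offset vector against one anchor using the parameterization from the Faster R-CNN paper (center shifts scaled by anchor size, log-scale width and height). The function name and the example numbers are illustrative assumptions.

```python
import math


def decode_anchor(anchor, offsets):
    """Turn predicted offsets into a box.

    anchor:  (xa, ya, wa, ha) as center x, center y, width, height
    offsets: (tx, ty, tw, th) as predicted by the region proposal network
    Returns the decoded box as corners (x1, y1, x2, y2).
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = offsets
    x = tx * wa + xa          # shift center by a fraction of the anchor width
    y = ty * ha + ya          # shift center by a fraction of the anchor height
    w = wa * math.exp(tw)     # scale width in log space
    h = ha * math.exp(th)     # scale height in log space
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


if __name__ == "__main__":
    anchor = (100.0, 100.0, 40.0, 80.0)
    print(decode_anchor(anchor, (0.0, 0.0, 0.0, 0.0)))     # the anchor itself
    print(decode_anchor(anchor, (0.5, -0.25, 0.0, 0.0)))   # shifted right and up
```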

21
Object Detection from others
Similar Approaches
•  YOLO (2016)*
•  Fully Convolutional Network
•  + fc layer + regression
•  YOLO9000 (2017)**
•  Improved localization/recall
•  Removes the fully connected layer
* J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (CVPR 2016)
** J. Redmon and A. Farhadi (CVPR 2017)

22
PSVL Multiple-Object Detection System
•  Fire modules*
•  Model size of only 13 MB
•  16.5 fps at the maximum scale (600x2200)
* F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer (arXiv:1602.07360)
[Figure: Performance (Speed, Accuracy), with the PSVL system marked OURS]
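A rough weight count shows why Fire modules shrink a model. A Fire module, as defined in the cited SqueezeNet paper, squeezes the input with 1x1 filters and then expands with a mix of 1x1 and 3x3 filters. The sketch below compares it against a plain 3x3 convolution with the same input and output channels. The channel sizes are the fire2 configuration from the SqueezeNet paper, used only as an example; the slide does not give the PSVL network's layer sizes.

```python
def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Weight count of a Fire module (biases ignored)."""
    squeeze_w = in_ch * squeeze                 # 1x1 squeeze layer
    expand1_w = squeeze * expand1x1             # 1x1 expand filters
    expand3_w = 9 * squeeze * expand3x3         # 3x3 expand filters
    return squeeze_w + expand1_w + expand3_w


def plain_conv3x3_params(in_ch, out_ch):
    """Weight count of an ordinary 3x3 convolution (biases ignored)."""
    return 9 * in_ch * out_ch


if __name__ == "__main__":
    fire = fire_params(in_ch=96, squeeze=16, expand1x1=64, expand3x3=64)
    plain = plain_conv3x3_params(in_ch=96, out_ch=128)
    print(fire)                      # 11776
    print(plain)                     # 110592
    print(round(plain / fire, 1))    # 9.4 times fewer weights
```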

23
PSVL Multiple-Object Detection System
•  Only real-time demo at ITSWC 2016
•  30+ fps (2 views per GPU)
Demo at ITSWC 2016

25
PSVL Neural Tracker
•  Critical Risk Management by tracking nearby objects (pedestrians, cars, cyclists)
•  arXiv:1609.09156
•  State-of-the-art on KITTI MOT
What’s Next?
[Diagram: tracker network with weight sharing between paired branches; layer labels: pairdata, datap, featp, feat, ContrastiveLoss, NB, DIoU, DArat, deconvI, reluA, reluI, deconvA, concat, concatp]
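The tracker diagram includes a ContrastiveLoss layer on top of weight-shared branches. As a reference point, the standard contrastive loss for one pair of feature vectors can be sketched as below: matching pairs are pulled together, and non-matching pairs are pushed apart until they are at least a margin away. Whether the tracker uses exactly this form is an assumption; the function name, margin value, and toy features are illustrative.

```python
import math


def contrastive_loss(feat_a, feat_b, same, margin=1.0):
    """Standard contrastive loss for one pair.

    same=True  (same object):       0.5 * d^2
    same=False (different objects): 0.5 * max(margin - d, 0)^2
    where d is the Euclidean distance between the two feature vectors.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2


if __name__ == "__main__":
    # Same object, far apart in feature space: large penalty.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True))    # 12.5
    # Different objects, already beyond the margin: no penalty.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))   # 0.0
    # Different objects, closer than the margin: pushed apart.
    print(contrastive_loss([0.0, 0.0], [0.0, 0.5], same=False))   # 0.125
```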

27
Conclusion
Speed & Accuracy
No Separate Region Proposal
Network Size Optimization