An Object Detector based on Multiscale 
Sliding Window Search using a Fully Pipelined 
Binarized CNN on an FPGA
Hiroki Nakahara, Haruyoshi Yonekawa, Shimpei Sato
Tokyo Institute of Technology, Japan
FPT2017
@Melbourne
Outline
• Background
• Object detector algorithm
• Fully pipelined Binarized CNN
• Experimental results
• Conclusion
Introduction
Convolutional Neural Network (CNN)
• Convolutional + fully connected + pooling layers
• State-of-the-art performance in image recognition tasks
• Widely applicable
Source: https://www.mathworks.com/discovery/convolutional-neural-network.html
Image Recognition Tasks
• Classification
• Answer the "category" of the object in an image
• Object Detection
• classification + localization
• Semantic Segmentation
• Object area at the pixel level
[Figure: the tasks range from easy (classification) to hard (semantic segmentation)]
Requirements in Embedded Systems
Cloud                              Embedded
Many classes (1000s)               Few classes (<10)
Large workloads                    Frame rates (15-30 FPS)
High efficiency (Performance/W)    Low cost & low power (1W-5W)
Server form factor                 Custom form factor
Source: J. Freeman (Intel), "FPGA Acceleration in the era of high level design", 2017
Object Detection Problem
• Detecting and classifying multiple objects at the same time
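• Evaluation criteria (from Pascal VOC):
[Figure: ground-truth annotation vs. detection results, e.g., "Person (50%)"]
Detection results:
• >50% overlap of the bounding box with the ground truth
• One BBox for each object
• A confidence value for each object
Precision = #true positives / #detections, Recall = #true positives / #ground-truth objects
Average Precision (AP):
$$\mathrm{AP} = \frac{1}{11}\sum_{r \in \{0,\,0.1,\,\ldots,\,1\}} p_{\mathrm{interp}}(r), \qquad p_{\mathrm{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$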
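The evaluation criteria can be made concrete with a minimal Python sketch. It assumes boxes are given as (x1, y1, x2, y2) corners and that each detection has already been matched to at most one ground-truth box and labeled as a true or false positive; `iou`, `is_correct`, and `eleven_point_ap` are illustrative names, not code from the slides.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners, x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union (overlap ratio) of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(det: Box, gt: Box) -> bool:
    """Pascal VOC rule: a detection counts if it overlaps the ground truth by more than 50%."""
    return iou(det, gt) > 0.5


def eleven_point_ap(scores: List[float], is_tp: List[bool], num_gt: int) -> float:
    """11-point interpolated AP over detections already labeled as true/false positives."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # highest confidence first
    tp = fp = 0
    curve = []  # (recall, precision) after each detection
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / num_gt, tp / (tp + fp)))
    ap = 0.0
    for k in range(11):
        r = k / 10
        # interpolated precision: best precision at any recall >= r (0 if r is never reached)
        ap += max((p for rec, p in curve if rec >= r), default=0.0) / 11
    return ap
```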
Proposed Object Detector
• Sliding window + Multi‐scaling + Fully pipelined BCNNs
[Figure: processing flow: multi-scale images → image windows cut out by the sliding window → classification by a fully pipelined binarized CNN → final detections by non-maximum suppression]
Sliding Window
• A rectangular region of fixed width and height that "slides" across an image
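A minimal Python sketch of the sliding window, assuming a row-major image stored as nested lists and a square q×q window moved by the same stride in both directions; `sliding_windows` and `crop` are hypothetical helper names.

```python
from typing import Iterator, List, Tuple


def sliding_windows(height: int, width: int, q: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (y, x) corner of every q x q window that fits inside the image."""
    for y in range(0, height - q + 1, stride):
        for x in range(0, width - q + 1, stride):
            yield y, x


def crop(image: List[List[int]], y: int, x: int, q: int) -> List[List[int]]:
    """Cut out the q x q patch whose top-left corner is (y, x)."""
    return [row[x:x + q] for row in image[y:y + q]]
```

For a 96×96 image, a 32×32 window with stride 16 visits 5 positions per side, i.e. 25 windows.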
Multi-Scaling (Image Pyramid)
• Find objects in images at different scales
• Combined with a sliding window, it finds objects of various sizes at various locations using the same window size
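The image pyramid can be sketched as repeated downscaling until the window no longer fits. The scale factor of 0.5 and the nearest-neighbour resampling are assumptions for illustration; the slides do not state which scales or interpolation the detector uses, and `pyramid_sizes` and `resize_nearest` are hypothetical names.

```python
from typing import List


def pyramid_sizes(p: int, q: int, scale: float = 0.5) -> List[int]:
    """Side lengths of the scaled images, shrinking by `scale` until a q x q window no longer fits."""
    assert 0 < scale < 1 and q > 0
    sizes = []
    size = p
    while size >= q:
        sizes.append(size)
        size = int(size * scale)  # strictly smaller each step, so the loop terminates
    return sizes


def resize_nearest(image: List[List[int]], new_h: int, new_w: int) -> List[List[int]]:
    """Nearest-neighbour resize: each output pixel copies the closest source pixel."""
    h, w = len(image), len(image[0])
    return [[image[y * h // new_h][x * w // new_w] for x in range(new_w)] for y in range(new_h)]
```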
Non‐Maximum Suppression
• Input: all scored bounding boxes in an image
• Rejects any bounding box whose overlap with a higher-scoring box exceeds a threshold
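Greedy non-maximum suppression as described above can be sketched in Python. Measuring overlap with IoU and the default threshold of 0.5 are assumptions (the slide only says "a threshold"); `nms` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep: List[int] = []
    for i in order:
        # keep box i only if it does not overlap any already-kept (higher-scoring) box too much
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep
```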
Quantification of Iterations
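[Figure: a q×q window slides with stride Δx over the 1st, 2nd, ..., i-th scaled image]
Window positions (CNN iterations) on a p×p image, summed over all scaled images:
$$\left(\left\lfloor \frac{p - q}{\Delta x} \right\rfloor + 1\right)^{2}$$
• Trade-off: time (iterations), AP, and HW
p: Image size (given)
q: Window size
Δx: Stride
→ Find a good q and Δx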
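The iteration count can be checked with a few lines of Python. This sketch assumes square images and counts only window positions that fit entirely inside the image; `windows_per_image`, `total_iterations`, and the pyramid sizes used in the checks are hypothetical.

```python
from typing import List


def windows_per_image(p: int, q: int, dx: int) -> int:
    """Number of q x q window positions on a p x p image with stride dx."""
    if p < q:
        return 0
    per_side = (p - q) // dx + 1
    return per_side * per_side


def total_iterations(sizes: List[int], q: int, dx: int) -> int:
    """CNN invocations summed over all scaled images of the pyramid."""
    return sum(windows_per_image(p, q, dx) for p in sizes)
```

Because the count grows with the square of the positions per side, halving the stride roughly quadruples the work. This is in line with the 96x96 rows of the results table (11.10 FPS at stride 24 vs. 45.30 FPS at stride 48).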
Binarized CNN
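[Figure: binarized neuron with inputs x1, ..., xn, weights w1, ..., wn, and bias w0; Y = w0 + Σ wi·xi, output z = fsgn(Y)]
Product Y = x1·x2 in the {-1, +1} domain:
x1   x2   Y
-1   -1   +1
-1   +1   -1
+1   -1   -1
+1   +1   +1
The same function in the {0, 1} encoding is XNOR:
x1   x2   Y
0    0    1
0    1    0
1    0    0
1    1    1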
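The truth tables are why a binarized multiply-accumulate reduces to XNOR plus a bit count. A minimal sketch, assuming +1 is encoded as bit 1 and -1 as bit 0, and that fsgn(0) = +1 (the slides do not specify the tie case); the function names are hypothetical.

```python
from typing import List


def to_bits(v: List[int]) -> int:
    """Pack a {-1,+1} vector into an integer: +1 -> bit 1, -1 -> bit 0 (element i -> bit i)."""
    bits = 0
    for i, s in enumerate(v):
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors of length n from their bit encodings.

    XNOR marks positions where the signs agree (product +1); each disagreement
    contributes -1, so dot = matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(x_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


def neuron(x: List[int], w: List[int], w0: int) -> int:
    """Binarized neuron z = fsgn(w0 + sum_i w_i * x_i), taking fsgn(0) = +1."""
    y = w0 + binary_dot(to_bits(x), to_bits(w), len(x))
    return 1 if y >= 0 else -1
```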
Optimization Techniques
• Binary CNN
• Multiple fully pipelined architecture
Batch normalization free (BNF) [RAW17]
Replacement of internal FC layers with binary average pooling [FPL17]
[FPL17] H. Nakahara, T. Fujii, and S. Sato, "A fully connected layer elimination for a binarized convolutional neural network on an FPGA," FPL 2017, pp. 1-4.
[RAW17] H. Yonekawa and H. Nakahara, "On-chip memory based binarized convolutional deep neural network applying batch normalization free technique on an FPGA," IPDPS Workshops 2017, pp. 98-105.
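The slides do not explain how BNF works. One way batch normalization can be removed from a binarized layer, assumed here purely for illustration and not confirmed as the [RAW17] method, is to fold it into the integer bias that feeds the sign function: with a positive scale γ, only the point where the normalized value crosses zero matters. `fold_bn_into_bias` and `bn_sign` are hypothetical names.

```python
import math


def fold_bn_into_bias(mu: float, var: float, gamma: float, beta: float, eps: float = 1e-5) -> int:
    """Integer bias b such that sign(x + b) == sign(BN(x)) for every integer pre-activation x.

    BN(x) = gamma * (x - mu) / sqrt(var + eps) + beta. With gamma > 0,
    BN(x) >= 0  <=>  x >= t, where t = mu - beta * sqrt(var + eps) / gamma.
    Because x is an integer, x >= t  <=>  x >= ceil(t), so b = -ceil(t).
    """
    if gamma <= 0:
        # assumption of this sketch; a negative gamma could be handled by flipping the channel's weights
        raise ValueError("sketch assumes gamma > 0")
    t = mu - beta * math.sqrt(var + eps) / gamma
    return -math.ceil(t)


def bn_sign(x: int, mu: float, var: float, gamma: float, beta: float, eps: float = 1e-5) -> int:
    """Reference: apply batch normalization, then fsgn (with fsgn(0) = +1)."""
    y = gamma * (x - mu) / math.sqrt(var + eps) + beta
    return 1 if y >= 0 else -1
```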
Dataflow for a 2D Convolutional Operation
[Figure: the m input feature maps pass through a shift register, are combined with the binarized weights in an adder together with an integer bias, and the sign of the sum forms the output maps]
Pipelined Conv2D Circuit
x00 x01 x02 x03 x04
x10 x11 x12 x13 x14
x20 x21 x22 x23 x24
x30 x31 x32 x33 x34
x40 x41 x42 x43 x44
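Shift register contents: x22 x21 x20 x14 x13 x12 x11 x10 x04 x03 x02 x01 x00
[Figure: binarized feature maps (L=5, K=3) stream into a shift register of 2L+K bits; 9 binarized MACs (EXNORs + adder tree) combine the 3×3 window with the binarized weight memory, the integer bias memory is added, and the sign bit is output. Also shown: write control logic and a counter. M feature maps are read at a time.]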
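The shift-register trick can be modeled in software: after pixels are streamed row by row into a register of (K-1)L+K entries (2L+K for K=3), the K×K window always sits at fixed tap positions i·L+j. This sketch assumes a single square input map, stride 1, no padding, and fsgn(0) = +1; it omits the circuit's M-map parallelism, and `conv2d_shift_register` is a hypothetical name.

```python
from collections import deque
from typing import List


def conv2d_shift_register(fmap: List[List[int]], weights: List[List[int]], bias: int) -> List[List[int]]:
    """Binarized KxK 'valid' convolution computed from a streaming shift register.

    fmap and weights hold values in {-1, +1}; each output is fsgn(window sum + bias).
    """
    L = len(fmap)  # square L x L feature map
    K = len(weights)
    depth = (K - 1) * L + K  # equals 2L + K when K = 3
    reg = deque(maxlen=depth)  # reg[0] is the oldest pixel; appending drops it when full
    out = [[0] * (L - K + 1) for _ in range(L - K + 1)]
    for r in range(L):
        for c in range(L):
            reg.append(fmap[r][c])  # models one pixel entering per clock
            if r >= K - 1 and c >= K - 1:  # a complete window ends at (r, c)
                acc = bias
                for i in range(K):
                    for j in range(K):
                        # fixed tap at register position i*L + j (an XNOR in hardware)
                        acc += reg[i * L + j] * weights[i][j]
                out[r - K + 1][c - K + 1] = 1 if acc >= 0 else -1
    return out
```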
CNN Model Used
VGG11              Our VGG
Integer Conv2D     Integer Conv2D
Binary Conv2D      Binary Conv2D
Max Pooling        Max Pooling
Binary Conv2D      Binary Conv2D
Binary Conv2D      Binary Conv2D
Binary Conv2D      Binary Conv2D
Max Pooling        Max Pooling
Binary Conv2D      Binary Conv2D
Binary Conv2D      Binary Conv2D
Binary Conv2D      Binary Conv2D
Max Pooling        Average Pooling
Fully Connected    Fully Connected
Fully Connected
Fully Connected
• Based on the VGG11 model
• 3x3 convolution kernels
• Replaces the memory-intensive bottleneck layers (the FC layers) with an average pooling layer
Overall Architecture
• Weight sharing
[Figure: P pipelined BCNNs (BCNN 1, BCNN 2, ..., BCNN P), each with a FIFO and all sharing one weight memory, connected through an AXI4 bus to the ARM processor, DDR memory, and GPIO; a camera provides the input image]
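As a rough software analogy of the multi-pipeline architecture, Python threads can stand in for the P pipelined BCNNs and `queue.Queue` objects for their FIFOs, with every worker reading one shared weight list. Round-robin dispatch of windows is an assumption, `run_pipelines` is a hypothetical name, and the sketch does not model the AXI4 bus, DDR memory, or hardware timing.

```python
import queue
import threading
from typing import Callable, Dict, List, Sequence


def run_pipelines(windows: Sequence[List[int]], weights: List[int], num_pipes: int,
                  classify: Callable[[List[int], List[int]], int]) -> List[int]:
    """Distribute windows round-robin over num_pipes FIFOs; all workers use the same weights."""
    fifos = [queue.Queue() for _ in range(num_pipes)]
    results: Dict[int, int] = {}
    lock = threading.Lock()

    def worker(fifo: queue.Queue) -> None:
        while True:
            item = fifo.get()
            if item is None:  # end-of-stream marker
                break
            idx, win = item
            score = classify(win, weights)  # shared weight memory: no per-pipeline copy
            with lock:
                results[idx] = score

    threads = [threading.Thread(target=worker, args=(f,)) for f in fifos]
    for t in threads:
        t.start()
    for idx, win in enumerate(windows):
        fifos[idx % num_pipes].put((idx, win))  # round-robin dispatch
    for f in fifos:
        f.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(windows))]  # restore input order
```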
Implementation Setup
• Board: Xilinx Inc. Zynq UltraScale+ MPSoC zcu102 evaluation board
• Zynq UltraScale+ MPSoC FPGA (ZU9EG, 68,250 slices, 269,200 FFs, 1,824 BRAMs, 2,520 DSP48Es)
• FPGA design tools: Vivado HLS 2017.2 and Vivado 2017.2
• Timing constraint: 200 MHz
• Deep learning framework: Chainer 1.24.0
• Dataset: KITTI car detection (moderate) scenario
Variation of Fully Pipelined CNNs
Columns: CNN parameters (window size q, stride Δx), hardware resources, accuracy (mAP), and speed (FPS)
q       Δx   #18Kb BRAMs  #FFs     #LUTs    #DSPs  mAP (%)  FPS
96x96   24   240          194,930  114,870  0      74.36    11.10
96x96   48   240          194,930  114,870  0      71.29    45.30
64x64   16   232          189,820  169,500  0      84.80    8.70
64x64   32   232          189,820  169,500  0      82.20    34.95
48x48   12   232          171,850  172,100  0      70.50    7.80
48x48   24   232          171,850  172,100  0      64.20    31.65
32x32   8    232          169,930  178,220  0      56.32    8.55
32x32   16   232          169,930  178,220  0      52.30    34.20
Comparison with GPU-based Detectors
[Figure: mAP (%) vs. detection speed (FPS, log scale from 0.01 to 100) for Ours, YOLOv2, MV3D (LIDAR), SDP+RPN, Deep MANTA, and RRC; the plot also marks 29.97 FPS]
Method          FPS     Acc (%)
RRC             0.27    90.22
Deep MANTA      1.42    90.03
SDP+RPN         2.50    89.90
MV3D (LIDAR)    4.16    79.76
YOLOv2          50.00   28.37
Proposed        34.95   82.20
YOLOv2 (GPU):    250.0 W
Proposed (FPGA): 2.5 W
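The power figures make the efficiency gap easy to quantify. A short Python calculation of frames per second per watt, using only the FPS and power numbers listed on this slide:

```python
# FPS and power as listed in the comparison slide
detectors = {
    "YOLOv2 (GPU)": {"fps": 50.00, "watts": 250.0},
    "Proposed (FPGA)": {"fps": 34.95, "watts": 2.5},
}

for name, d in detectors.items():
    d["fps_per_watt"] = d["fps"] / d["watts"]  # throughput per unit of power
    print(f"{name}: {d['fps_per_watt']:.2f} FPS/W")

ratio = detectors["Proposed (FPGA)"]["fps_per_watt"] / detectors["YOLOv2 (GPU)"]["fps_per_watt"]
print(f"efficiency ratio: {ratio:.1f}x")
```

This gives about 14 FPS/W for the FPGA versus 0.2 FPS/W for YOLOv2 on the GPU, roughly a 70× difference, while the FPGA detector is also more accurate on this task (82.20% vs. 28.37%).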
Conclusion
• Applied a pipelined binarized CNN to an object detector
• Multiple pipeline architecture
• Weight sharing
• Found good parameters (window size and stride) for KITTI car detection
• Better speed/accuracy balance than GPU-based detectors at far lower power (2.5 W vs. 250 W for YOLOv2 on a GPU)
• Future work
• Preprocessing to reduce HW and time
• Selective search, BING, etc.
• Post-processing to adjust bounding boxes
• SVR, another CNN, etc.
https://github.com/HirokiNakahara/GUINNESS
