CONDOR
AN AUTOMATED FRAMEWORK TO ACCELERATE
CONVOLUTIONAL NEURAL NETWORKS ON FPGA
Soda-430-438 Woz Lounge, Berkeley, CA
May 23rd, 2018
Niccolò Raspa, Marco Bacis, Giuseppe Natale, Marco D. Santambrogio
Convolutional Neural Networks
Deep Convolutional Neural Networks
CNN on silicon
GPU, ASIC, FPGA
LeNet - Training
PROTOTXT

CAFFEMODEL
LeNet - Deployment
PROTOTXT

CAFFEMODEL
?
Manual Design
• Extract the parameters and the weights
• Write the code
• Synthesis
• Evaluate design
• Package IP
• Iterate
Automatic Design
CONDOR
PROTOTXT

CAFFEMODEL
Framework Architecture
FRONTEND: Parse structure of the CNN
CORE LOGIC: Creation of HW Accelerator
BACKEND: Deployment
Create DAG computation
PROTOTXT

CAFFEMODEL
Input Data
Convolution
Pooling
Fully Connected
Convolution
Pooling
Fully Connected
Input Data: Input Dimension (28, 28, 1)
Convolution: Input Dimension (28, 28, 1), Output Dimension (24, 24, 20), Kernel 5, Padding 0, Stride 1
Pooling: Input Dimension (24, 24, 20), Output Dimension (12, 12, 20), Kernel 2, Padding 0, Stride 2
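The dimensions annotated on the first LeNet layers follow from the usual output-size rule, out = (in + 2 * padding - kernel) / stride + 1. The sketch below is only an illustration of that rule: the `Layer` class and `output_shape` function are hypothetical names, not part of CONDOR or Caffe, and floor rounding is assumed (it does not matter for these layers because the divisions are exact).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Layer:
    """Hypothetical description of one node of the layer DAG."""
    name: str
    kernel: int
    padding: int
    stride: int
    out_channels: Optional[int] = None  # None means "keep the input channels" (pooling)


def output_shape(in_shape: Tuple[int, int, int], layer: Layer) -> Tuple[int, int, int]:
    """Apply out = (in + 2*padding - kernel) // stride + 1 to height and width."""
    h, w, c = in_shape
    out_h = (h + 2 * layer.padding - layer.kernel) // layer.stride + 1
    out_w = (w + 2 * layer.padding - layer.kernel) // layer.stride + 1
    out_c = c if layer.out_channels is None else layer.out_channels
    return (out_h, out_w, out_c)


# The two layers whose parameters appear on the slide.
conv1 = Layer("Convolution", kernel=5, padding=0, stride=1, out_channels=20)
pool1 = Layer("Pooling", kernel=2, padding=0, stride=2)

shape = (28, 28, 1)
for layer in (conv1, pool1):
    shape = output_shape(shape, layer)
    print(layer.name, shape)
```

Chaining the function over the layers in order reproduces the (24, 24, 20) and (12, 12, 20) shapes listed above.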
Map computation in hardware
[Figure: Convolution, Pooling and Fully Connected blocks placed on the FPGA area]
Increase parallelism
Area
Integration with SDAccel
CONDOR
What if I don’t have an FPGA?
CONDOR
Features
• Cloud integration via Amazon F1 instances
• Automatic creation of a hardware accelerator for FPGA
• Tune the tradeoff between performance and power consumption
• Support for the main deep learning libraries
Roadmap
Automated Framework
Methodology for Acceleration of CNN
Integration with Caffe
Cloud Integration
M. Bacis, G. Natale, E. Del Sozzo, and M. D. Santambrogio.
“A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA”
In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
G. Natale, M. Bacis, and M. D. Santambrogio.
“On how to design dataflow FPGA-based accelerators for Convolutional Neural Networks”
In: 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
*
2017
Support the new standard ONNX
2018
Open Source Release
2019?
Extend Methodology
XOHW Competition
*
CONDOR
ARCHITECTURAL CHOICES FOR THE FPGA-BASED
ACCELERATION OF CNNs
Framework
FRONTEND: Parse structure of the CNN
CORE LOGIC: Creation of HW Accelerator
BACKEND: Deployment
FPGAs and CNNs
Dataflow Computation
Data reusability
Distributed Architecture
Our first approach
[1] Giuseppe Natale, Marco Bacis, Marco D. Santambrogio
“On how to design dataflow FPGA-based accelerators for Convolutional Neural Networks”, ISVLSI 2017
[Figure: DMA input (in) streaming through CONV, POOL, CONV and LINEAR blocks with w/b inputs; POOL blocks are replicated]
Bigger networks, bigger FPGAs… or not?
• Weights don’t fit in the on-chip BRAMs
• Unrolling leads to an explosion in DSP (multiplier) usage
Methodology improvements
• No complete unrolling: partial accumulations
• Generic set of “one size fits all” blocks
• Semi-dataflow architecture
• More complex data movement
Customizable data flow
[Figure: Datamover/Control block driving Conv, Pooling and ReLU blocks, with in, w/b and out ports]
Customizable data flow
[Figure: a network of Conv, ReLU and Pool layers mapped onto four replicated units, each a Datamover/Control block with Conv, Pooling and ReLU blocks and in, w/b and out ports; inset: a MAC unit taking weights and input and producing a result]
Dataflow Blocks
• Convolution, Pooling, ReLU, etc.
• Non-uniform memory partitioning
• Streaming pattern
• Optimal full buffering
• Concurrent accesses
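One common way to realize the streaming and full-buffering bullets is a line buffer: pixels arrive once, in row-major order, and the block keeps only the last (k - 1) * width + k of them, which is enough to expose a complete k x k window. The sketch below is a software model written under that assumption; `sliding_windows` is a hypothetical name and the slides do not confirm that CONDOR sizes its buffers exactly this way.

```python
from collections import deque


def sliding_windows(stream, width, k):
    """Yield k x k windows from a row-major pixel stream.

    Only the most recent (k - 1) * width + k pixels are kept, so every
    input pixel is read from the stream exactly once.
    """
    buf = deque(maxlen=(k - 1) * width + k)
    for idx, pixel in enumerate(stream):
        buf.append(pixel)
        row, col = divmod(idx, width)
        # A full window ends at (row, col) once k rows and k columns are available.
        if row >= k - 1 and col >= k - 1:
            # buf[0] is the top-left pixel of the window; rows are `width` apart.
            yield [[buf[r * width + c] for c in range(k)] for r in range(k)]


# 4 x 4 image with pixel values 0..15, 3 x 3 window.
windows = list(sliding_windows(range(16), width=4, k=3))
print(len(windows))
print(windows[0])
```

In hardware the k * k reads of one window happen in the same cycle, which is where memory partitioning and concurrent accesses come in; the Python model only shows which pixels need to be resident.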
Partial accumulations approach
• Custom level of parallelism
• Compute a subset of both input and output feature maps
• Accumulation is done through a FIFO and/or from DDR
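The partial-accumulation idea can be sketched in plain Python as a tiled loop: only `tin` input maps and `tout` output maps are processed per pass, and the partial sums are added into an accumulator that stands in for the FIFO or DDR storage. The function names, the tile parameters and the list-of-lists layout are illustrative assumptions, not CONDOR's implementation.

```python
def conv2d_single(x, w):
    """Valid 2D correlation of one input map x with one k x k kernel w."""
    k = len(w)
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(k) for b in range(k))
             for j in range(ow)] for i in range(oh)]


def conv_layer_tiled(inputs, weights, tin, tout):
    """inputs: list of C_in maps; weights[o][i]: kernel for output o, input i.

    tin / tout are the hypothetical parallelism knobs: how many input and
    output feature maps are handled in one pass.
    """
    c_in, c_out = len(inputs), len(weights)
    k = len(weights[0][0])
    oh, ow = len(inputs[0]) - k + 1, len(inputs[0][0]) - k + 1
    # Accumulators model the partial sums kept in a FIFO or written to DDR.
    acc = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for o0 in range(0, c_out, tout):          # tile over output maps
        for i0 in range(0, c_in, tin):        # tile over input maps
            for o in range(o0, min(o0 + tout, c_out)):
                for i in range(i0, min(i0 + tin, c_in)):
                    part = conv2d_single(inputs[i], weights[o][i])
                    for r in range(oh):
                        for c in range(ow):
                            acc[o][r][c] += part[r][c]
    return acc


# 3 input maps of 4 x 4, 2 output maps, 2 x 2 kernels, all ones.
ones_in = [[[1] * 4 for _ in range(4)] for _ in range(3)]
ones_w = [[[[1, 1], [1, 1]] for _ in range(3)] for _ in range(2)]
full = conv_layer_tiled(ones_in, ones_w, tin=3, tout=2)
tiled = conv_layer_tiled(ones_in, ones_w, tin=1, tout=1)
print(full[0][0][0], tiled == full)
```

The result is independent of the tile sizes; what changes is how many multipliers are needed at once and how often partial sums have to be stored and reloaded.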
Memory control and data buffering
• Memory mapped to streaming and vice versa
• Exploit the maximum transaction size and bursts
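As a rough illustration of exploiting the maximum transaction size, the sketch below splits one large memory-mapped transfer into the fewest bursts allowed by an AXI4-style interface. The slides do not name the bus protocol, so the limits used here are assumptions: 64-byte beats (a 512-bit port), at most 256 beats per burst, and no burst crossing a 4 KiB boundary. `split_bursts` is a hypothetical helper.

```python
def split_bursts(addr, nbytes, beat_bytes=64, max_beats=256, boundary=4096):
    """Split a transfer into (start_address, beats) bursts.

    Assumed limits (AXI4-style, not taken from the slides): each burst has
    at most `max_beats` beats and never crosses a `boundary`-byte boundary.
    """
    if addr % beat_bytes or nbytes % beat_bytes:
        raise ValueError("this sketch assumes beat-aligned transfers")
    bursts = []
    end = addr + nbytes
    while addr < end:
        to_boundary = boundary - (addr % boundary)
        length = min(end - addr, to_boundary, max_beats * beat_bytes)
        bursts.append((addr, length // beat_bytes))
        addr += length
    return bursts


# 8 KiB starting at address 0: two full 4 KiB bursts of 64 beats each.
print(split_bursts(0, 8192))
# A transfer that straddles a 4 KiB boundary is cut at the boundary.
print(split_bursts(4032, 128))
```

With 64-byte beats the 4 KiB boundary, not the 256-beat limit, is the binding constraint under these assumptions, so the longest usable burst is 64 beats.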
Weights Double Buffering
• Masks weight-loading latency
• Avoids flushing the MAC pipeline on each iteration
[Figure: ping and pong weight buffers]
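A minimal software model of the ping/pong scheme, assuming weights are loaded tile by tile: while the MACs consume one buffer, the next tile is written into the other, and the roles swap on every iteration. Both functions and the cycle counts in the usage lines are hypothetical and only meant to show why the load latency is hidden.

```python
def run_double_buffered(weight_tiles, compute):
    """Process tiles with two buffers: compute on one while the other is loaded."""
    if not weight_tiles:
        return []
    buffers = [None, None]
    buffers[0] = list(weight_tiles[0])      # prefetch the first tile into "ping"
    results = []
    for t in range(len(weight_tiles)):
        cur, nxt = t % 2, (t + 1) % 2       # ping/pong roles swap each iteration
        if t + 1 < len(weight_tiles):
            # In hardware this load runs concurrently with the compute below.
            buffers[nxt] = list(weight_tiles[t + 1])
        results.append(compute(buffers[cur]))
    return results


def total_cycles(n_tiles, load, compute, double_buffered):
    """Simple latency model for n_tiles tiles (cycle counts are made-up inputs)."""
    if n_tiles == 0:
        return 0
    if not double_buffered:
        return n_tiles * (load + compute)
    # First load is exposed; afterwards each step costs the slower of the two.
    return load + (n_tiles - 1) * max(load, compute) + compute


print(run_double_buffered([[1, 2], [3, 4], [5, 6]], sum))
print(total_cycles(4, load=10, compute=30, double_buffered=False))
print(total_cycles(4, load=10, compute=30, double_buffered=True))
```

When loading is faster than computing, only the very first load remains visible in the total, which is the sense in which the latency is masked.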
Input Caching
• Reduces memory accesses
• Stores the entire input for a layer
• Used for small layers (avoids lots of small transactions)
[Figure: Input Cache placed on the in port of the Datamover/Control block]
Architecture Evaluation
Setup: Alphadata Virtex-7
• 5 MB BRAM
• 2880 DSPs (27x15 bits mult)
• 1 DDR port (512 bits wide)
• 115.2 GFLOPs max (100 MHz)
Results: VGG16 Network
• 30.6 GOPs, 56 MB parameters
• 4 input, 4 output ports
• 27.2 GFLOPs estimated
• 14.4 GFLOPs reached
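The slide does not say how the 115.2 GFLOPs peak was obtained. One reading that reproduces the number, shown here purely as an assumption, is that each floating-point multiply-accumulate occupies 5 DSP slices and counts as 2 operations per cycle. The remaining lines are plain arithmetic on the reported figures, treating GOPs and GFLOPs as comparable units.

```python
DSPS = 2880
CLOCK_MHZ = 100
DSPS_PER_FP_MAC = 5   # ASSUMPTION: not stated on the slide, chosen because it matches 115.2
OPS_PER_MAC = 2       # one multiply plus one add

macs = DSPS // DSPS_PER_FP_MAC
peak_gflops = macs * OPS_PER_MAC * CLOCK_MHZ / 1000

estimated_gflops = 27.2   # from the slide
reached_gflops = 14.4     # from the slide
vgg16_gops = 30.6         # work per forward pass, from the slide

print(f"parallel MACs under the assumption: {macs}")
print(f"peak: {peak_gflops:.1f} GFLOPs")
print(f"reached / estimated: {reached_gflops / estimated_gflops:.1%}")
print(f"reached / peak: {reached_gflops / peak_gflops:.1%}")
print(f"time per VGG16 pass at the reached rate: {vgg16_gops / reached_gflops:.3f} s")
```

Under these assumptions the design reaches about half of its own estimate and one eighth of the device peak, which is consistent with the off-chip memory concerns raised in the lessons learned.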
Lessons Learned
• Floating point is dead, long live the fixed!
• Off-chip memory vs on-chip memory
• Old hardware vs new hardware
Next Steps
• Possibility to use URAMs as on-chip storage (33.75 MB)
• Higher number of DSPs (~2.3X)
• Efficient multiplication (8-bit fixed point -> 2 mul/DSP)
• Higher memory BW (4 DDR ports)
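The "2 mul/DSP" bullet refers to a well-known packing trick: two 8-bit operands that share a common multiplicand are placed in one wide operand, separated by enough guard bits, so that a single wide multiplication returns both products. The sketch below models the arithmetic with Python integers; the 18-bit offset and the function name are illustrative assumptions, and the slides do not describe the exact bit layout planned for CONDOR.

```python
SHIFT = 18  # ASSUMPTION: offset between the two packed operands (guard bits included)


def packed_dual_multiply(a, b, c):
    """Return (a * c, b * c) for signed 8-bit a, b, c using one wide multiply."""
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")
    packed = (a << SHIFT) + b           # a in the upper field, b in the lower field
    product = packed * c                # the single wide multiplication
    low = product & ((1 << SHIFT) - 1)  # lower field holds b * c modulo 2**SHIFT
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT               # sign-extend the lower field
    high = (product - low) >> SHIFT     # remove the lower product, then shift down
    return high, low


print(packed_dual_multiply(-7, 12, 5))
```

The subtraction before the shift is the correction step: when `b * c` is negative it borrows from the upper field, so the upper product cannot be read by shifting alone.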
Next Steps
[Figure: block diagram with MAC/Window, FSM, Acc/ReLU and Pooling stages plus Weights and I/O Buffer; port labels: 512 in, 1024 out, 2-64 out]
“A Framework with Cloud Integration for CNN Acceleration on FPGA Devices”
Marco Bacis
marco.bacis@mail.polimi.it
Niccolò Raspa
niccolo.raspa@mail.polimi.it
Giuseppe Natale
giuseppe.natale@polimi.it
Marco D. Santambrogio
marco.santambrogio@polimi.it
twitter.com/CondorAtNECST
facebook.com/CondorAtNECST
