1
DIPARTIMENTO DI ELETTRONICA,
INFORMAZIONE E BIOINGEGNERIA
OXiGen
From C to FPGA dataflow kernels
Francesco Peverelli: francesco1.peverelli@mail.polimi.it
Marco Rabozzi: marco.rabozzi@polimi.it
Emanuele Del Sozzo: emanuele.delsozzo@polimi.it
Marco Domenico Santambrogio: marco.santambrogio@polimi.it
May 17th - 30th 2019
NGCX, San Francisco
2
FPGA design methods
[Figure: Performance versus Design Time for x86, GPU, DSP, FPGA with HLS, and FPGA with RTL, each shown at its first working version and its optimized version, with the software project design time limit marked.]
3
FPGA design methods
[Figure: Performance versus Design Time for x86, GPU, DSP, FPGA with HLS, FPGA with RTL, and FPGA with OXiGen, each shown at its first working version and its optimized version, with the software project design time limit marked.]
4
The dataflow model
[Figure: seven dataflow cores connected to memory. Labels: Dataflow core, Memory, Data.]
5
Contributions
PRODUCTIVITY
AUTOMATED TRANSLATION FROM C TO DATAFLOW
DESIGN-SPACE EXPLORATION
AUTOMATED TESTING
PERFORMANCE
6
Target architecture
• FPGA resources are logically divided into two portions:
– Manager: responsible for the communication with the host
– Kernel: performs the actual dataflow computation
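The manager/kernel split can be pictured with a small software model. This is only a sketch under stated assumptions: `kernel_fn`, `example_kernel`, and `manager_run` are hypothetical names chosen for illustration and are not part of OXiGen or of any vendor API. The kernel computes on one stream element at a time, while the manager only moves data between host buffers and the kernel.

```c
#include <stddef.h>

/* Kernel: the dataflow computation applied to one stream element.
 * The signature is a hypothetical simplification (one input, one output). */
typedef float (*kernel_fn)(float);

/* Hypothetical example kernel: a multiply followed by an add. */
float example_kernel(float x)
{
    return 3.0f * x + 1.0f;
}

/* Manager: responsible only for communication with the host.
 * It streams n elements from host_in through the kernel into host_out
 * and returns the number of elements processed. */
size_t manager_run(kernel_fn kernel, const float *host_in,
                   float *host_out, size_t n)
{
    size_t processed = 0;
    for (size_t i = 0; i < n; i++) {
        host_out[i] = kernel(host_in[i]);
        processed++;
    }
    return processed;
}
```

Keeping the kernel free of any host-facing logic mirrors the division on the slide: the computation can be changed without touching the data movement.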
7
OXiGen overview
[Figure: OXiGen toolchain.
Stages: Frontend flow, Function optimization flow, Backend flow.
Components: LLVM, Function analysis, Dataflow translator, DSE module, Backend translator, Backend synthesis tool.
Artifacts: LLVM IR, DFG IR, TECHNOLOGY LIBRARY, OPT. CONFIG., SYNTHESIS-READY CODE, FPGA BITSTREAM.]
8
OXiGen code example
void f(int in_1, float* in_2, …, float* out_n) {
    for (int i = 0; i < N; i++) {
        for (int j … ) {
            … statements …
            for (int k … ) { … }
        }
        … statements …
        int a[N][M] = { … };
        float s = 0;
        for (int j = 0; j < M; j++) {
            … statements …
            s += a[i][j] * … ;
        }
    }
}
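The skeleton above leaves its statements elided. A concrete function of the same shape might look like the sketch below. Everything specific here is an assumption made for illustration: the values of `N` and `M`, the contents of `a`, the single output stream `out`, and the choice of multiplying by `in_2[i]`. The table `a` is also hoisted to function scope as a constant, whereas the slide declares it inside the outer loop.

```c
#define N 4 /* stream length (hypothetical) */
#define M 3 /* inner-loop trip count (hypothetical) */

/* Each outer iteration consumes one element of the input stream in_2
 * and produces one element of the output stream out. */
void f(int in_1, const float *in_2, float *out)
{
    /* Constant coefficient table (values chosen for illustration). */
    static const int a[N][M] = {
        {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}
    };

    for (int i = 0; i < N; i++) {
        float s = 0;
        /* Accumulation loop with a compile-time trip count. */
        for (int j = 0; j < M; j++) {
            s += a[i][j] * in_2[i];
        }
        out[i] = s + in_1;
    }
}
```

The outer loop plays the role of the stream index, and the inner loop has a fixed bound, which is the pattern the slide's example shows.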
9
Optimization strategies
REROLLING:
• Nested loops are unrolled by default
[Figure: Throughput versus HW resources for the unoptimized design, the resources-driven design, and the throughput-driven design. Optimizations shown: VECTORIZATION, REROLLING, CYCLIC DFG, DATA INTERLEAVING.]
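The trade-off behind rerolling can be sketched with a functional model in C. The slides do not define the rerolling factor precisely, so this sketch assumes that a factor of R splits the fully unrolled loop body into R sequential passes, each using 1/R of the operators. The name `mac_rerolled`, the trip count `M`, and the `units` and `ticks` counters are hypothetical.

```c
#include <assert.h>

#define M 8 /* inner-loop trip count (hypothetical) */

/*
 * Functional model of an M-term multiply-accumulate loop.
 * factor == 1 models the default, fully unrolled design: M multiply-add
 * units and one result per tick. A larger factor models rerolling under
 * the assumption stated above: M / factor units, reused for `factor` ticks.
 */
float mac_rerolled(const float *a, const float *x, int factor,
                   int *units, int *ticks)
{
    assert(factor > 0 && M % factor == 0);
    *units = M / factor;
    *ticks = 0;
    float acc = 0.0f;
    for (int pass = 0; pass < factor; pass++) {
        /* These iterations would run in parallel in hardware. */
        for (int u = 0; u < *units; u++) {
            int j = pass * (*units) + u;
            acc += a[j] * x[j];
        }
        (*ticks)++;
    }
    return acc;
}
```

With `factor` set to 1 the model uses 8 units for 1 tick, and with `factor` set to 4 it uses 2 units for 4 ticks: fewer hardware resources at the cost of lower throughput, while the computed value stays the same.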
10
Design space exploration
Free variables:
• θ: implementation-specific
• v_i: optimization-specific, ∀ optimization
Inputs: Dataflow IR, Technology Library
Resource model, Performance model → Mixed Integer Linear Programming (MILP) model → θ̂, v̂: optimal values
11
Design space exploration
Resource model → Mixed Integer Linear Programming (MILP) model → θ̂, v̂: optimal values
Takes into account:
• BRAM use estimation
• Memory partitioning
• Technological implementation (DSP Push)
• Rerolling factor
• Vectorization factor
• …
12
Design space exploration
Performance model → Mixed Integer Linear Programming (MILP) model → θ̂, v̂: optimal values
Inputs: Dataflow IR, Technology Library
Takes into account:
• Operator latency (pipeline Push)
• Upper bound on synthesis frequency
• Rerolling factor
• Vectorization factor
• …
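To make the interplay of the two models concrete, the sketch below searches a tiny design space for the highest-throughput point that fits a resource budget. It is not the MILP formulation from the slides: exhaustive enumeration is swapped in for the solver, and both models are hypothetical simplifications. `dsp_usage` assumes operators are shared across rerolled passes and replicated per vector lane, and `throughput` assumes results per cycle scale as the vectorization factor divided by the rerolling factor. DSP push, pipeline push, BRAM use, and synthesis frequency are not modelled.

```c
typedef struct {
    int reroll;        /* rerolling factor */
    int vec;           /* vectorization factor */
    double throughput; /* estimated results per cycle */
} design_point;

/* Hypothetical resource model: `ops` operators shared across `reroll`
 * passes (rounded up), replicated once per vector lane. */
static int dsp_usage(int ops, int reroll, int vec)
{
    return vec * ((ops + reroll - 1) / reroll);
}

/* Hypothetical performance model: results per cycle. */
static double throughput(int reroll, int vec)
{
    return (double)vec / (double)reroll;
}

/* Enumerate all (reroll, vec) pairs and keep the first feasible point
 * with the highest estimated throughput. Returns {0, 0, 0.0} when no
 * point fits within the budget. */
design_point explore(int ops, int dsp_budget, int max_reroll, int max_vec)
{
    design_point best = {0, 0, 0.0};
    for (int r = 1; r <= max_reroll; r++) {
        for (int v = 1; v <= max_vec; v++) {
            if (dsp_usage(ops, r, v) > dsp_budget) {
                continue; /* violates the resource constraint */
            }
            double t = throughput(r, v);
            if (t > best.throughput) {
                best.reroll = r;
                best.vec = v;
                best.throughput = t;
            }
        }
    }
    return best;
}
```

For example, `explore(64, 32, 8, 4)` cannot afford the fully unrolled design (64 operators) and settles on a rerolling factor of 2 with no vectorization. A MILP solver reaches this kind of answer without enumerating every point, which matters once the number of free variables grows.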
13
Experimental evaluations
EXPERIMENTAL SETUP
• MaxCompiler
• Galava MAX4 board
• Stratix V FPGA
APPLICATIONS
• Asian option pricing 30 avg. points
• Asian option pricing 780 avg. points
• Quantum Monte Carlo
14
Results
Algorithm | Reroll. factor | Cyclic dataflow | Data interl. | DSP push | Pipel. push | Freq. | Speedup w.r.t. SOA | Speedup w.r.t. CPU
AOP 30    | 4              | yes             | yes          | 0.1      | 0.3         | 210   | 1.34x w.r.t. [1]   | 118.4x
AOP 30    | 4              | yes             | yes          | 0.1      | 0.3         | 210   | 1.23x w.r.t. [2]   | 118.4x
AOP 780   | 98             | yes             | yes          | 0.1      | 0.3         | 215   | 0.5x w.r.t. [2]    | 101.6x
VMC       | 128            | yes             | yes          | 0.1      | 0.3         | 210   | 0.93x w.r.t. [3]   | 26x
The CPU baseline is a single-threaded implementation compiled with gcc 4.4.7 at -O3 optimization and run on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz.
[1] F. Peverelli, M. Rabozzi, E. Del Sozzo, and M. D. Santambrogio, “OXiGen: A tool for automatic acceleration of C functions into dataflow FPGA-based kernels,” in 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2018, pp. 91–98.
[2] A. M. Nestorov, E. Reggiani, H. Palikareva, P. Burovskiy, T. Becker, and M. D. Santambrogio, “A scalable dataflow implementation of Curran’s approximation algorithm,” in 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2017, pp. 150–157.
[3] S. Cardamone, J. R. Kimmitt, H. G. Burton, and A. J. Thom, “Field programmable gate arrays and quantum Monte Carlo: Power efficient co-processing for scalable high-performance computing,” arXiv preprint arXiv:1808.02402, 2018.
15
Thank you!
https://www.slideshare.net/necstlab https://necst.it/


OXiGen: Automated FPGA design flow from C applications to dataflow kernels - pitch version
