Image processing on FPGA
Eugene Khvedchenya
https://ua.linkedin.com/in/cvtalks
What is an FPGA, and who needs it?
General implementation
OpenCL
Cache tuning
Multithreading
SIMD (SSE, NEON)
FPGA
Optimization pyramid
What’s inside?
LUT
Flip-Flop
ALU
BRAM
IO pads
FPGA
Development efforts
CPU vs FPGA
High Level Synthesis
Converts C++ code to a hardware design
HLS compiler optimizes your code for the FPGA
Automatically optimizes RTL and timing
Provides #pragmas for fine tuning
C++ API for arbitrary-precision math
C++ API for stream data processing
Supports C++11
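A minimal sketch of what HLS-style C++ looks like: an ordinary function plus a Vivado HLS tuning pragma. The function name `sum8` is hypothetical, and the hardware effect described in the comments is the usual intent of the pragma, not a guarantee for every tool version.

```cpp
#include <cstdint>

// Hypothetical example of HLS-style C++: plain C++ that an HLS compiler can
// turn into hardware, plus a Vivado HLS #pragma as a tuning hint.
// UNROLL asks the tool to replicate the loop body, so the eight additions
// become separate adders instead of one adder reused over eight cycles.
// A regular C++ compiler ignores the pragma (possibly with a warning).
uint16_t sum8(const uint8_t values[8]) {
    uint16_t total = 0;  // 8 * 255 = 2040 fits in 16 bits
    for (int i = 0; i < 8; ++i) {
#pragma HLS UNROLL
        total = static_cast<uint16_t>(total + values[i]);
    }
    return total;
}
```

Because the pragma is invisible to a normal compiler, the same source can be unit-tested on a PC before synthesis.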
Things to remember
No branching penalty
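To illustrate the point, here is a small function with ordinary branches. The comments describe how an HLS tool typically maps it (comparators feeding a multiplexer, both outcomes available in parallel); `clamp_to_u8` is a hypothetical helper, and the mapping is an assumption about typical tool behaviour.

```cpp
#include <cstdint>

// Sketch (assumption about typical HLS output, not a guarantee): inside a
// pipelined datapath, the outcomes of a small if/else chain are computed in
// parallel and a multiplexer selects one, so there is no branch
// misprediction or pipeline flush as there can be on a CPU.
uint8_t clamp_to_u8(int value) {
    if (value < 0) return 0;      // comparator drives one mux select
    if (value > 255) return 255;  // comparator drives another mux select
    return static_cast<uint8_t>(value);
}
```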
Things to remember
No dynamic memory allocation
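In practice this means every buffer is sized for the worst case at compile time. A minimal sketch, assuming a hypothetical maximum line width of 1920 pixels and a hypothetical helper `reverse_row`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile-time limit. HLS tools need every buffer size at
// synthesis time, because each array becomes fixed on-chip hardware
// (BRAM or registers); new/malloc are generally not synthesizable.
constexpr int MAX_WIDTH = 1920;

// Reverses one image row through a fixed-size local buffer.
void reverse_row(const uint8_t* in, uint8_t* out, int width) {
    assert(width > 0 && width <= MAX_WIDTH);  // buffer cannot grow at run time
    uint8_t row[MAX_WIDTH];                   // sized for the worst case
    for (int x = 0; x < width; ++x) row[x] = in[x];
    for (int x = 0; x < width; ++x) out[x] = row[width - 1 - x];
}
```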
Things to remember
Instantaneous BRAM access
Registers (flip-flops): ~0.5M bits of storage
BRAM bandwidth: 23 Tbit/s
Figures above are for the Xilinx Kintex®-7 410T device
Things to remember
Single producer - single consumer
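A software model of the single-producer/single-consumer rule. `Fifo` is a hypothetical stand-in for an on-chip FIFO channel, loosely modeled on the `read()`/`write()` style of Vivado HLS's `hls::stream`; `produce` and `invert_pixels` are illustrative names. Exactly one function writes the channel and exactly one reads it, in order.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical software stand-in for a hardware FIFO channel.
template <typename T>
class Fifo {
public:
    void write(const T& value) { q_.push(value); }
    T read() {
        // A blocking hardware read would wait here; in this model it is an error.
        assert(!q_.empty());
        T value = q_.front();
        q_.pop();
        return value;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// The single producer: the only writer of `out`.
void produce(const uint8_t* pixels, int n, Fifo<uint8_t>& out) {
    for (int i = 0; i < n; ++i) out.write(pixels[i]);
}

// The single consumer: the only reader of `in`, consuming in FIFO order.
void invert_pixels(Fifo<uint8_t>& in, int n, uint8_t* result) {
    for (int i = 0; i < n; ++i) result[i] = static_cast<uint8_t>(255 - in.read());
}
```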
Things to remember
Pipelining
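A minimal illustration of loop pipelining, assuming Vivado HLS pragma syntax; `abs_diff_row` is a hypothetical function. With an initiation interval (II) of 1, a new iteration starts every clock while earlier iterations are still in their later stages.

```cpp
#include <cstdint>

// Element-wise |a - b| over one row. PIPELINE II=1 asks the tool to start a
// new iteration every clock, overlapping the read, subtract and write stages
// of successive iterations. A normal C++ compiler ignores the pragma.
void abs_diff_row(const uint8_t* a, const uint8_t* b, uint8_t* out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        int d = static_cast<int>(a[i]) - static_cast<int>(b[i]);
        out[i] = static_cast<uint8_t>(d < 0 ? -d : d);
    }
}
```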
Things to remember
● No branching penalty
● No cache penalty
● No dynamic memory allocation
● Instantaneous BRAM access
● Single producer - single consumer
● Pipelining
● Task-centric approach
HLS Development cycle
1. Get a baseline version
2. Write a simulation test
3. Run HLS synthesis
4. Simulate
5. Validate
6. Measure
7. Optimize
8. Go to step 3
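Step 2 can be as simple as a bit-exact comparison against a trusted software reference. A sketch under stated assumptions: the `ImageFn` signature and `outputs_match` helper are hypothetical, and the remark that Vivado HLS can reuse a C++ testbench for C/RTL co-simulation describes typical usage, not this deck's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical signature shared by the reference and the design under test.
using ImageFn = void (*)(const uint8_t* in, uint8_t* out, int width, int height);

// Feeds the same pseudo-random image to both functions and requires
// bit-exact outputs. The same C++ testbench can typically be reused
// for C/RTL co-simulation after synthesis.
bool outputs_match(ImageFn under_test, ImageFn reference,
                   int width, int height, unsigned seed) {
    const std::size_t n =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<uint8_t> input(n), got(n), want(n);
    std::srand(seed);  // deterministic input, so failures are reproducible
    for (auto& p : input) p = static_cast<uint8_t>(std::rand() & 0xFF);
    under_test(input.data(), got.data(), width, height);
    reference(input.data(), want.data(), width, height);
    return got == want;
}
```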
Sobel Edge Detection
Goal: process 1920x1080 images at 60 Hz
Sobel Edge Detection
Baseline implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
● Compute their absolute sum
● Write to corresponding output pixel
The FPGA clock frequency in this example is 150 MHz
To meet the 1920x1080 @ 60 Hz goal, we must process data at 1 cycle/pixel or faster
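The cycle budget behind that requirement, counting only the active 1920x1080 pixels (blanking intervals are ignored here, an assumption):

```latex
\begin{aligned}
1920 \times 1080 \times 60\,\text{Hz} &= 124{,}416{,}000\ \text{pixels/s} \\
\frac{150 \times 10^{6}\ \text{cycles/s}}{124.416 \times 10^{6}\ \text{pixels/s}} &\approx 1.21\ \text{cycles/pixel}
\end{aligned}
```

With only about 1.2 cycles available per pixel, 1 cycle/pixel is the practical target.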
Sobel Edge Detection
Baseline implementation
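The baseline code itself is not reproduced in the text, so the following is a reconstruction sketch of the loops described above, not the author's code. Assumptions: 8-bit grayscale, row-major layout, output = |Gx| + |Gy| clamped to 255, border pixels written as 0; `MAX_W`, `MAX_H` and `sobel_baseline` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int MAX_W = 1920;  // hypothetical limits
constexpr int MAX_H = 1080;

// Sobel kernels for horizontal (Gx) and vertical (Gy) gradients.
constexpr int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
constexpr int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

// CPU-style loops: every output pixel reads its 3x3 neighbourhood
// directly from the input array (nine reads per pixel).
void sobel_baseline(const uint8_t* in, uint8_t* out, int width, int height) {
    assert(width <= MAX_W && height <= MAX_H);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (y == 0 || x == 0 || y == height - 1 || x == width - 1) {
                out[y * width + x] = 0;  // no full 3x3 window at the border
                continue;
            }
            int gx = 0, gy = 0;
            for (int ky = 0; ky < 3; ++ky) {
                for (int kx = 0; kx < 3; ++kx) {
                    int p = in[(y + ky - 1) * width + (x + kx - 1)];
                    gx += GX[ky][kx] * p;
                    gy += GY[ky][kx] * p;
                }
            }
            int mag = std::abs(gx) + std::abs(gy);  // |Gx| + |Gy|
            out[y * width + x] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
        }
    }
}
```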
40 cycles/pixel on FPGA
Timing violation
Sobel Edge Detection
Tuning FPGA implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
Pipeline: compute one element of the 3x3 filter window per clock cycle.
● Compute Gx and Gy absolute sum
● Write to corresponding output pixel
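A sketch of this first tuning step for a single output pixel. Where the author placed the pragma is an assumption: here the 3x3 window is walked as one 9-tap loop, and that loop is pipelined so one tap is processed per clock. Nine taps plus pipeline fill is one plausible reading of the roughly 10 cycles/pixel result. `sobel_pixel_tap_pipelined` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Computes one interior output pixel; (x, y) must have a full 3x3 window.
uint8_t sobel_pixel_tap_pipelined(const uint8_t* in, int width, int x, int y) {
    assert(x >= 1 && y >= 1);
    static const int GX[9] = {-1, 0, 1, -2, 0, 2, -1, 0, 1};  // row-major 3x3
    static const int GY[9] = {-1, -2, -1, 0, 0, 0, 1, 2, 1};
    int gx = 0, gy = 0;
    for (int t = 0; t < 9; ++t) {
#pragma HLS PIPELINE II=1
        // One window element (tap) per clock: row t/3, column t%3.
        int p = in[(y + t / 3 - 1) * width + (x + t % 3 - 1)];
        gx += GX[t] * p;
        gy += GY[t] * p;
    }
    int mag = std::abs(gx) + std::abs(gy);
    return static_cast<uint8_t>(mag > 255 ? 255 : mag);
}
```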
Sobel Edge Detection
Tuning FPGA implementation
10 cycles/pixel on FPGA
Timing violation
Sobel Edge Detection
Tuning FPGA implementation
Iterate over image
● Pipeline: Apply pipeline to the inner loop (columns)
● Convolve 3x3 window with Gx and Gy kernels
○ The loop is fully unrolled and computed in 1 cycle
● Compute Gx and Gy absolute sum
○ Also computed in parallel
● Write to corresponding output pixel
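A sketch of the second tuning step (a reconstruction, not the author's code): the PIPELINE pragma moves to the column loop. In Vivado HLS, pipelining a loop fully unrolls the loops nested inside it, which matches the note that the 3x3 loop is unrolled and computed in one cycle. The sketch writes interior pixels only; `sobel_row_pipelined` is a hypothetical name. Note that each iteration still performs nine reads of `in`, which is the memory-access problem listed under Issues.

```cpp
#include <cstdint>
#include <cstdlib>

void sobel_row_pipelined(const uint8_t* in, uint8_t* out, int width, int height) {
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
#pragma HLS PIPELINE II=1
            int gx = 0, gy = 0;
            for (int ky = 0; ky < 3; ++ky) {      // unrolled by the pipeline
                for (int kx = 0; kx < 3; ++kx) {  // unrolled by the pipeline
                    int p = in[(y + ky - 1) * width + (x + kx - 1)];  // 9 reads/pixel
                    gx += GX[ky][kx] * p;
                    gy += GY[ky][kx] * p;
                }
            }
            int mag = std::abs(gx) + std::abs(gy);  // both sums built in parallel
            out[y * width + x] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
        }
    }
}
```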
Sobel Edge Detection
Tuning FPGA implementation
1 cycle/pixel on FPGA
Memory-access violation
Sobel Edge Detection
Tuning FPGA implementation
Issues
● Nine concurrent memory accesses per pixel
● More hardware blocks required
● The HLS module can connect only a single memory port, capable of one transaction per clock
Sobel Edge Detection
Tuning FPGA implementation
● Use BRAM to store an intermediate line buffer
● Read data from external memory into the line buffer
● Fill the memory window (flip-flop elements)
● Convolve 3x3 window with Gx and Gy kernels
○ The loop is fully unrolled and computed in 1 cycle
● Compute their absolute sum
○ Also computed in parallel
● Write to corresponding output pixel
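A sketch of the line-buffer structure described above, under stated assumptions: input arrives as a raster-order array (a real design would likely use a stream), `MAX_W` and `sobel_line_buffer` are hypothetical, and only interior output pixels are written. Each input pixel is read once per clock; two previous rows sit in `line_buf` (BRAM), and the 3x3 window sits in registers. Partitioning `line_buf` along its first dimension is intended to give each row its own memory so both can be read in the same cycle.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int MAX_W = 1920;  // hypothetical maximum line width

void sobel_line_buffer(const uint8_t* in, uint8_t* out, int width, int height) {
    assert(width <= MAX_W);
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    uint8_t line_buf[2][MAX_W] = {};  // [0]: row y-2, [1]: row y-1 (BRAM)
#pragma HLS ARRAY_PARTITION variable=line_buf complete dim=1
    uint8_t win[3][3] = {};           // rows y-2..y, columns x-2..x (registers)
#pragma HLS ARRAY_PARTITION variable=win complete dim=0

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
#pragma HLS PIPELINE II=1
            const uint8_t pix = in[y * width + x];  // the only external read
            for (int r = 0; r < 3; ++r) {           // shift window left
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            win[0][2] = line_buf[0][x];       // column x of row y-2
            win[1][2] = line_buf[1][x];       // column x of row y-1
            win[2][2] = pix;                  // column x of row y
            line_buf[0][x] = line_buf[1][x];  // roll the line buffer down
            line_buf[1][x] = pix;

            if (y >= 2 && x >= 2) {  // window is centred on (y-1, x-1)
                int gx = 0, gy = 0;
                for (int ky = 0; ky < 3; ++ky) {
                    for (int kx = 0; kx < 3; ++kx) {
                        gx += GX[ky][kx] * win[ky][kx];
                        gy += GY[ky][kx] * win[ky][kx];
                    }
                }
                const int mag = std::abs(gx) + std::abs(gy);
                out[(y - 1) * width + (x - 1)] =
                    static_cast<uint8_t>(mag > 255 ? 255 : mag);
            }
        }
    }
}
```

The design choice is to trade a small amount of on-chip memory (two lines) for a single external read per pixel, which is what lets the pipeline sustain one pixel per clock.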
Sobel Edge Detection
Tuning FPGA implementation
1 cycle/pixel on FPGA
Achievement unlocked
The dark side
of FPGA development
● The tools aren’t great
● "It works in the simulator!"
● Learning curve
● Debugging timing violations
Quick start
● FPGA development boards: Altera, Xilinx
● IDE and samples: Vivado
● OpenCV support
● HLS for OpenCL
Questions?
ekhvedchenya@gmail.com
@cvtalks

Eugene Khvedchenia - Image processing using FPGAs