Managing High Performance Data Pipeline Execution with an FPGA Processor
Presenter: Ben Hor (Xilinx, Inc.)
Authors: Glenn Steiner, Dan Isaacs (Xilinx, Inc.); David Pellerin (Impulse Accelerated Technologies)
Agenda
• What Is Control Plane / Data Plane Processing, and Why Might I Need It?
• FPGAs Enable Balancing Computation Between a Processor and Application-Specific Logic
• Implementation of a Control/Data Plane System Is Straightforward
• Case Study: An HD Video Recognition System
• Connecting the Embedded Processor to the FPGA with Linux
• Summary
What Is Control Plane / Data Plane Processing, and Why Might I Need It?
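Challenge Example: HD Video Streaming
• 720p: 74.25 MHz pixel rate
• 222.75 MB/s data rate (74.25 Mpixel/s × 3 bytes per 24-bit pixel)
• Hypothetical dual-core processor at 2.5 GHz, dual issue (2 instructions per clock per core)
  ◦ 10 billion instructions per second in total
  ◦ About 44.9 instructions per byte of one 222.75 MB/s stream, or about 22.4 per byte if the input and output streams are both counted
• And that is before:
  ◦ OS overhead
  ◦ Task-switching time
  ◦ Interrupt latency
  ◦ Bus bandwidth eaten up by video data
• Conclusion: can’t do it with a standard processor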
Coprocessing: An Effective Way of Accelerating Software
• Distributes the load
• Moves computational load to where it belongs
• Dedicated processing element(s) can provide dramatic acceleration
A Look at Coprocessing Architectures
• Fully decoupled: common, but not relevant to this topic
• Single- / multi-instruction accelerator (e.g., an FPU)
• Loosely coupled, with separated functions
  ◦ Message / control passing
  ◦ Typically used for control plane / data plane processing
What Is Control Plane / Data Plane?
[Figure: a control plane processor running an OS, with a user interface, connects over the processor bus or dedicated control channel(s) to a chain of coprocessors. The coprocessors form the data plane: data flows in at one end and out at the other.]
Control / Data Plane Example
• Control plane: controls the state of network elements
  ◦ Route selection
  ◦ RSVP, capability signaling, etc.
  ◦ Exception handling
• Data plane: manages data packets
  ◦ Packet forwarding
  ◦ Packet differentiation
  ◦ Buffering, link scheduling
Adapted from: “Active correlation between the control and data plane,” Z. Morley Mao
FPGAs Enable Computation Balancing Between a Processor and Application-Specific Logic
FPGAs: Ideal for Coprocessing
• Tight integration between FPGA and processor
  ◦ Reduced latency
  ◦ Matched clock rates
• Configure the system to meet requirements
  ◦ Configure the processors
  ◦ Configure the coprocessors
• Flexible logic enables experimentation
External Processor Challenges
• Latency for control signals sent to the coprocessor
• Pin challenges
  ◦ Many pins reduce latency, but at higher power and part cost
  ◦ High-speed serial (PCIe) minimizes pins at the cost of latency and power
• May not be the lowest-cost solution
FPGA embedded processors address these challenges and enable performance balancing.
Implementation of a Control Plane / Data Plane System Is Straightforward
Building the Control Plane / Data Plane System
• Assemble the control plane processor
• Assemble the data pipeline
  ◦ Combine IP generated by multiple tools
  ◦ C-to-HDL tools may be an effective option
• Control the pipe with the processor and OS
Assemble the Control Plane Processor
Assemble and Connect the Data Plane
Multiple languages, tools, and flows can create coprocessors:
• Low level, hand-crafted: RTL (VHDL/Verilog)
• High level
  ◦ MATLAB / Simulink
  ◦ ‘C’ to FPGA (HDL)
  ◦ ‘C’ variants
CASE STUDY: HD VIDEO RECOGNITION SYSTEM
The Case Study Problem
• 720p HD video stream
• DVI input and DVI output
• Locate the clownfish in the video
• Highlight the clownfish
• Continuously track the fish
• Adjust the spotlight size based on the likelihood of a match
The Architected Solution
• How the control plane processor was created
• How the data processing pipeline was created
Base Processor Reference Design
• Xilinx MicroBlaze processor running Linux
• Block RAM
• System ACE CompactFlash
• IIC
• GPIO: LEDs, DIP switches, push buttons
• Debug module
• UART
• Multiport memory controller with DDR2 memory
• Clock generator
• Reset module
DVI Pass-through Reference Design
Basic “real-time” video processing
[Figure: DVI input → image processing → DVI output]
DVI Pass-through Reference Design
Basic “real-time” video processing: DVI input → image processing → DVI output
• Streaming pixel processing of streaming video data
• MicroBlaze controls the filter coefficients in “real time”
• Simple design example for integrating customer IP
• Custom video accelerator pcore built with System Generator
Integrated Control/Data Plane System
The processor is used to configure the filters dynamically.
[Figure: the Xilinx MicroBlaze processor system on the Processor Local Bus (PLB) provides filter control (UART). The DVI pipeline runs from DVI In through Gamma In to Gamma Out and DVI Out, with a 2D FIR filter and the new pipeline element, Object Detection, between the gamma stages.]
HD Object Detection & Highlighting
Connecting the Embedded Processor to the FPGA with Linux
Control the Pipe with Linux
• Linux is now the #1 OS for embedded FPGA systems
• The newest generation is more “real-time”
• Large public code base
• Mostly free
• FPGA I/O drivers available
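Configure Linux for the I/O Device
    // Load the custom driver into the Linux kernel; xll_example_init runs at load time
    module_init(xll_example_init);
    // Inside the init function: reserve device number 253 (major), 0 (minor)
    dev_t devno = MKDEV(253, 0);
    err = register_chrdev_region(devno, 1, "custom_io_example");
    // Attach the driver's file operations to that number (cdev/fops names are illustrative)
    cdev_init(&custom_io_cdev, &custom_io_fops);
    err = cdev_add(&custom_io_cdev, devno, 1);
    bash# mknod /dev/custom_io_example0 c 253 0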
Controlling the Data Pipe from a Linux Application
The application uses the standard open(), read(), and write() calls on the device node; the driver services them through its file-operation handlers:
    // Called when the application opens the custom I/O device
    int custom_io_ex_open(struct inode *inode, struct file *filp);
    // Called for read() / write(): transfer data to and from the custom peripheral
    ssize_t custom_io_ex_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos);
    ssize_t custom_io_ex_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos);
Summary
• FPGAs enable computational balancing between an FPGA-based processor and a data processing pipeline, reducing development risk
• Offloading streaming data processing tasks to an FPGA data-plane pipeline can help meet performance objectives
• A single-chip, FPGA-based control-plane and data-plane solution can reduce cost and development time
• Offloading frees the processor to handle a multitude of other tasks
Thank You
Glenn Steiner, Dan Isaacs (Xilinx, Inc.); David Pellerin (Impulse Accelerated Technologies)

Editor's Notes

  • #23: An example of a “real-time” streaming processing solution that does not use a frame buffer. Several products require specialized streaming processing, and this example gives developers a quick, easy way to adapt the existing design to their own algorithm. The fully integrated HW co-simulation (HW-CoSim) environment, with its hardware-in-the-loop functionality, enables a faster validation cycle.
  • #24: DE Gen: Data Enable Generator. Like #23, this is a “real-time” streaming design without a frame buffer, validated faster through hardware-in-the-loop HW co-simulation.