FPGAs on The Cloud
Ioannis Tsagatakis
Ioannis Stefanis
MSc in Informatics & Multimedia
Department of Informatics Engineering, TEI of Crete
Embedded Systems
2
Accelerated Computing: GPUs and FPGAs
3
Massive Parallelism
● GPU
– SIMD
– Instruction Set
– Fixed Word Sizes
– Simple control logic
● FPGA
– MIMD
– No instruction set
– Any data width
– Complex control logic (FSMs)
4
AWS F1 FPGA Instances
● Cloud based FPGA
– No need to buy hardware
● Cloud based IDE
– Ready-to-use AMI
– HDL: Verilog, VHDL
– SDAccel: C/C++, OpenCL
– AFI tools
● Marketplace
– A new market for IPs
– Secure encrypted AFIs
● f1.2xlarge
– 1 VU9P UltraScale+
● 2.5M logic elements
● 6,800 DSP
– 8 vCPU Cores
– 122GB RAM
– PCIe x16
– About $1.6 per hour
● f1.16xlarge
– 8 FPGAs / 64 vCPUs
● Run simulation and design on a C4 instance to save money
5
The SDAccel Development Environment
● Cloud IDE
or
● Local Install
● Virtual JTAG interface
6
AWS F1 Platform Model
7
The AWS F1 Shell
● Amazon AFI image
● Predefined interface
● Secured, encrypted
● Dynamically (re)loaded
● User can't see the bits
8
AFI Creation Flow
9
Kernel Creation: The 2 workflows
● Custom IP must be packaged as an SDAccel kernel
● Strict interface requirements
● Design for performance
● SDAccel provides a Kernel Wizard
● Kernel container file (XO file)
– XML metadata, Vivado project
– RTL files
● Or generate the kernel from OpenCL
● Advanced optimizations
– Memory partitioning
– Loop unrolling
– DSP block inference
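A rough software model of how memory partitioning and loop unrolling work together is sketched below. It is only an illustration in Python, not SDAccel code: the helper names `cyclic_partition` and `unrolled_sum` are hypothetical, and the factor of 4 is an arbitrary choice. The idea is that a cyclic partition spreads an array over several banks, so an unrolled loop can feed one independent accumulator per bank.

```python
def cyclic_partition(data, factor):
    """Split data into `factor` banks; element i goes to bank i % factor."""
    return [data[k::factor] for k in range(factor)]


def unrolled_sum(data, factor=4):
    """Sum `data` using `factor` independent accumulators, one per bank."""
    banks = cyclic_partition(data, factor)
    # In hardware these partial sums could be computed side by side,
    # because each accumulator reads from its own memory bank.
    partial = [sum(bank) for bank in banks]
    return sum(partial)


values = list(range(10))
print(cyclic_partition(values, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
print(unrolled_sum(values))         # 45
```

Without the partitioning step, all unrolled iterations would compete for the ports of a single memory, which is why the two optimizations are usually applied together.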
10
An OpenCL Kernel
● Language support
– Embedded profile (1.0)
– Pipes (2.0)
– Image Objects (2.0)
● N-dimensional ranges (NDRange)
● SIMD vector types
● Math library functions
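The N-dimensional range idea can be mimicked in plain Python, as a minimal sketch: a kernel body is called once per work-item, and each work-item finds its data through its global id. The names `run_ndrange`, `vadd` and `scale2d` are hypothetical helpers for this illustration and are not part of OpenCL or SDAccel. The loop runs sequentially here, whereas a device may execute work-items in parallel.

```python
from itertools import product


def run_ndrange(kernel, global_size, *args):
    """Call `kernel` once per work-item in an N-dimensional index space."""
    for gid in product(*(range(n) for n in global_size)):
        kernel(gid, *args)


def vadd(gid, a, b, c):
    """1-D work-item body: add one pair of elements."""
    i = gid[0]
    c[i] = a[i] + b[i]


def scale2d(gid, src, dst, k):
    """2-D work-item body: scale one matrix element."""
    row, col = gid
    dst[row][col] = k * src[row][col]


a, b, c = [1, 2, 3, 4], [10, 20, 30, 40], [0] * 4
run_ndrange(vadd, (4,), a, b, c)
print(c)  # [11, 22, 33, 44]

src = [[1, 2], [3, 4]]
dst = [[0, 0], [0, 0]]
run_ndrange(scale2d, (2, 2), src, dst, 3)
print(dst)  # [[3, 6], [9, 12]]
```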
11
Compiling the Platform
12
Creating the Amazon FPGA Image
● Created by an Amazon service
● Securely stored and encrypted
● Developers have no access to the RTL IP
● The distributable awsxclbin contains only the AFI id
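As a hedged sketch of what the request to the Amazon service can look like, the snippet below only assembles an AWS CLI `aws ec2 create-fpga-image` command line and prints it; nothing is executed. The AFI name, bucket name and S3 keys are hypothetical placeholders, and the helper `create_fpga_image_command` is an assumption made for this illustration, not a tool from the slides.

```python
import shlex


def create_fpga_image_command(name, bucket, dcp_key, logs_key, description=""):
    """Build (but do not run) an `aws ec2 create-fpga-image` command."""
    args = [
        "aws", "ec2", "create-fpga-image",
        "--name", name,
        # S3 location of the uploaded design tarball (hypothetical key).
        "--input-storage-location", f"Bucket={bucket},Key={dcp_key}",
        # S3 location where the service writes its build logs.
        "--logs-storage-location", f"Bucket={bucket},Key={logs_key}",
    ]
    if description:
        args += ["--description", description]
    return args


cmd = create_fpga_image_command(
    "vadd-afi", "my-fpga-bucket", "dcp/vadd.tar", "logs"
)
print(shlex.join(cmd))
```

The service answers with identifiers for the new image, which is consistent with the point above that the distributable file carries only an id rather than the design itself.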
13
SDAccel Testing and Execution Modes
14
OpenCL Memory Model
15
OpenCL vs CUDA
● CUDA
– SIMD
– Easier programming model
– Restricted memory access patterns
– Faster development
– Vendor lock-in
– Easy deployment
● F1 FPGA
– MIMD
– More complexity
– Harder programming
– Deep pipelining
– Slow development
– Vendor lock-in
– Cloud deployment
16
Smith–Waterman algorithm (sw_emu)
------FPGA Accelerator Summary --------
Number of SmithWaterman instances on FPGA:16
Total processing elements:512
Length of reference string:256
Length of read(query) string:128
Read-Ref pair block size(HOST to FPGA):1024
Verify Mode is:0
---------------------------------------
Generating read-ref samples
Processing 16384 Samples
HW Block Size: 16384
Total Number of blocks: 1
INFO: [smithwaterman.cpp:654] TIME: [Wed Feb 21 22:37:07 2018] nruns = 1
INFO: [smithwaterman.cpp:655] TIME: [Wed Feb 21 22:37:07 2018] total [ms] = 43326.373
INFO: [smithwaterman.cpp:656] TIME: [Wed Feb 21 22:37:07 2018] Host write [ms] = 0.768
INFO: [smithwaterman.cpp:657] TIME: [Wed Feb 21 22:37:07 2018] Krnl exec [ms] = 43317.977
INFO: [smithwaterman.cpp:658] TIME: [Wed Feb 21 22:37:07 2018] Host read [ms] = 1.029
GCups(based on kernel execution time):0.0115426
GCups(based on total execution time):0.0115403
INFO: [smithwaterman.cpp:679] TIME: [Wed Feb 21 22:37:07 2018] Host2Device rate [mbps] = 15616.602
INFO: [smithwaterman.cpp:691] TIME: [Wed Feb 21 22:37:07 2018] Device2Host rate [mbps] = 1457.154
INFO: [main.cpp:172] TIME: [Wed Feb 21 22:37:07 2018] finished
~/aws-fpga/SDAccel/examples/xilinx/acceleration/smithwaterman
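The GCups figures in this log can be reproduced with a little arithmetic. The sketch below assumes that one cell update is one entry of the reference-by-read scoring matrix, and that "G" means 2^30. Both are inferences from the numbers in the log, not something confirmed from the example's source code. The function name `gcups` is hypothetical.

```python
def gcups(samples, ref_len, read_len, elapsed_ms, giga=2**30):
    """Giga cell updates per second for a batch of alignments."""
    cells = samples * ref_len * read_len  # one matrix cell per ref/read pair
    return cells / giga / (elapsed_ms / 1000.0)


# Values copied from the sw_emu log above.
kernel_rate = gcups(16384, 256, 128, 43317.977)  # kernel execution time
total_rate = gcups(16384, 256, 128, 43326.373)   # total execution time

print(f"{kernel_rate:.7f}")  # 0.0115426
print(f"{total_rate:.7f}")   # 0.0115403
```

Under these assumptions the batch is exactly 0.5 G cells, and the software emulation needs about 43 seconds to process it, so the rate says more about the emulator than about the FPGA.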
17
Smith–Waterman algorithm (hw_emu)
~/aws-fpga/SDAccel/examples/xilinx/acceleration/smithwaterman
xsimk
Generating read-ref samples
Processing 16384 Samples
HW Block Size: 16384
Total Number of blocks: 1
INFO: [SDx-EM 22] [Wall clock time: 23:05, Emulation time: 0.275298 ms] Data transfer between kernel(s) and global memory(s)
BANK0 RD = 64.316 KB WR = 7.875 KB
BANK1 RD = 0.000 KB WR = 0.000 KB
BANK2 RD = 0.000 KB WR = 0.000 KB
BANK3 RD = 0.000 KB WR = 0.000 KB
…. after many hours …
INFO: [SDx-EM 22] [Wall clock time: 00:27, Emulation time: 4.77014 ms] Data transfer between kernel(s) and global memory(s)
BANK0 RD = 1110.004 KB WR = 138.562 KB
BANK1 RD = 0.000 KB WR = 0.000 KB
BANK2 RD = 0.000 KB WR = 0.000 KB
BANK3 RD = 0.000 KB WR = 0.000 KB
….
18
Building Times
For the helloworld example
INFO: [XOCC 60-629] Linking for hardware target
INFO: [XOCC 60-895] Target platform: /home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xpfm
INFO: [XOCC 60-423] Target device: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0
INFO: [XOCC 60-251] Hardware accelerator integration...
Creating Vivado project and starting FPGA synthesis.
................................................................................................................................
Finished 1st of 5 tasks (FPGA synthesis). Elapsed time: 00h 34m 54s.
.....
Finished 2nd of 5 tasks (FPGA logic optimization). Elapsed time: 00h 05m 37s.
...............................
Finished 3rd of 5 tasks (FPGA logic placement). Elapsed time: 00h 43m 50s.
................................
Finished 4th of 5 tasks (FPGA routing). Elapsed time: 00h 56m 33s.
INFO: [XOCC 60-586] Created xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin
INFO: [XOCC 60-791] Total elapsed time: 2h 31m 50s
And then you have to build the AFI ...
Give up building the
Good luck building the Smith-Waterman example
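To see where the build time goes, the per-task times from the log can be added up and compared with the reported total. This is a small sketch; the helper names are hypothetical. The cost line assumes the build ran on an f1.2xlarge at the 1.6 $/hour quoted earlier, which the slides do not state, and a cheaper C4 instance would lower it.

```python
def to_seconds(hours, minutes, seconds):
    return hours * 3600 + minutes * 60 + seconds


def fmt(seconds):
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m:02d}m {s:02d}s"


# Per-task times copied from the XOCC log for the helloworld example.
tasks = {
    "synthesis": to_seconds(0, 34, 54),
    "logic optimization": to_seconds(0, 5, 37),
    "placement": to_seconds(0, 43, 50),
    "routing": to_seconds(0, 56, 33),
}
total = to_seconds(2, 31, 50)   # "Total elapsed time" from the log
logged = sum(tasks.values())
unlogged = total - logged       # the 5th task and overhead, not itemized

for name, secs in tasks.items():
    print(f"{name:20s} {fmt(secs)} {100 * secs / total:5.1f}%")
print(f"{'not itemized':20s} {fmt(unlogged)}")

HOURLY_RATE_USD = 1.6  # assumption: f1.2xlarge price quoted in the slides
print(f"approx. cost: ${total / 3600 * HOURLY_RATE_USD:.2f}")
```

The four itemized tasks add up to 2h 20m 54s, so about 11 minutes of the total are not itemized in the log excerpt.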
20
Conclusions
● Moderate* costs
● Easy setup with minor issues
● Cloud-based IDE (RDP) or SSH
● Slow development
● Harder to learn than CUDA
● Good documentation and examples
● Marketplace is still small but promising
● No third-party examples
* Moderate cost, for comparison:
$3,500 Xilinx Virtex-7 FPGA VC707 Evaluation Kit
$13,000 Xilinx Virtex-7 FPGA VC7222 Characterization Kit
$1,500 Intel Xeon Phi 7120P Coprocessor
$1,400 Nvidia GeForce Titan X Pascal
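One way to read "moderate" is to ask how many F1 hours each board price would buy. The sketch below uses only the prices listed above and the 1.6 $/hour rate from the instance slide. It ignores the host machine, tool licences, power and storage, so it is a rough comparison rather than a full cost model. The function name `break_even_hours` is hypothetical.

```python
HOURLY_RATE_USD = 1.6  # f1.2xlarge price quoted in the slides


def break_even_hours(board_price_usd, hourly_rate=HOURLY_RATE_USD):
    """Hours of cloud use that cost as much as buying the board."""
    return board_price_usd / hourly_rate


boards = {
    "VC707 Evaluation Kit": 3500,
    "VC7222 Characterization Kit": 13000,
    "Xeon Phi 7120P": 1500,
    "GeForce Titan X Pascal": 1400,
}

for name, price in boards.items():
    hours = break_even_hours(price)
    print(f"{name:28s} ${price:>6,d}  ~{hours:7.1f} h  (~{hours / 24:5.1f} days)")
```

For the VC707 kit this comes to roughly 2,190 hours, or about three months of continuous use.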
21
Future work
22
FPGA vs GPU
"Accelerating Compute-Intensive Applications with GPUs and FPGAs",
S. Che, J. Li, J. W. Sheaffer, K. Skadron and J. Lach,
2008 Symposium on Application Specific Processors
● CUDA and the GeForce 8800 GTX GPU
● VHDL and the Xilinx Virtex-II Pro FPGA
23
FPGAs vs GPU
24
FPGA vs GPU
25
Are FPGAs and reconfigurable computing the future?
Video on the cloud? Deep learning?
26
Questions?
