Field-Programmable Gate Arrays
       as tracking devices

          Roberto Rodríguez Osorio
            Javier Díaz Bruguera

        Group of Computer Architecture
   Dept. of Electronics and Computer Science
     University of Santiago de Compostela
Outline

Application-specific computing machines
ASIC vs FPGA
FPGA technology basics
Hard cores in FPGAs
Performance
Design effort
Choices
Applications




Application-specific computing machines

[Figure: block diagrams of a microprocessor and an application-specific integrated circuit (ASIC), each split into a control section and a datapath section.]

Microprocessor
     Code memory and data memory
     Control section: PC, IR, control logic
     Datapath section: register file, functional units
     Performance: 10 cycles @ 3 GHz
     Dissipated power: ~35 W
Application-Specific Integrated Circuit
     Control section: control logic
     Datapath section: MAC
     Performance: 1 cycle @ 1 GHz
     Dissipated power: ~mW

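The performance figures above can be turned into time and energy per operation. The sketch below is only an illustration: the cycle counts, clock rates and the 35 W figure come from the slide, while `ASSUMED_ASIC_POWER_W` is a hypothetical placeholder, since the slide only gives the order of magnitude ("~mW").

```python
def time_per_operation_ns(cycles: int, clock_ghz: float) -> float:
    """Time for one operation in ns (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


def energy_per_operation_nj(power_w: float, time_ns: float) -> float:
    """Energy for one operation in nJ (watts times nanoseconds = nanojoules)."""
    return power_w * time_ns


# Figures taken from the slide.
cpu_ns = time_per_operation_ns(cycles=10, clock_ghz=3.0)
asic_ns = time_per_operation_ns(cycles=1, clock_ghz=1.0)
cpu_nj = energy_per_operation_nj(35.0, cpu_ns)

# Assumption: the slide only says "~mW"; 1 mW is an illustrative placeholder.
ASSUMED_ASIC_POWER_W = 1e-3
asic_nj = energy_per_operation_nj(ASSUMED_ASIC_POWER_W, asic_ns)

print(f"microprocessor: {cpu_ns:.2f} ns, {cpu_nj:.1f} nJ per operation")
print(f"ASIC:           {asic_ns:.2f} ns, {asic_nj:.3f} nJ per operation (assumed power)")
print(f"ASIC speed-up:  {cpu_ns / asic_ns:.2f}x")
```

Under these numbers the gain in speed is modest (about 3x), and the larger difference is in energy per operation, which depends directly on the assumed ASIC power.
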
ASIC vs FPGA

[Figure: ASIC non-recurring engineering (NRE) cost, on a vertical axis from $1M to $4M, against technology node from 0.35 to 0.05 micrometers.]

ASIC vs FPGA

[Figure: computational efficiency (Mops/W, logarithmic axis from 10^0 to 10^6) against technology (2, 1, 0.5, 0.25, 0.13, 0.07 µm) and year (1986 to 2006). One curve is labelled "Maximum efficiency (ASIC)"; the other labels are FPGA, ASSP, MPPA, GPGPU, VLIW, ASIP, ManyCore, ...]

          Source: Theo A.C.M. Claasen, ISSCC 99

FPGA technology basics – Computing

[Figure: full adder (FA) with inputs a, b and c in and outputs s and c out, shown as a block and as a gate-level circuit.]

 carry input   a   b   s   carry output
      0        0   0   0        0
      0        0   1   1        0
      0        1   0   1        0
      0        1   1   0        1
      1        0   0   1        0
      1        0   1   0        1
      1        1   0   0        1
      1        1   1   1        1

FPGA technology basics – Do not compute

[Figure: logic blocks. Inputs a, b and cin address two 8x1-bit SRAM memories; one memory outputs s, the other outputs cout.]

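The idea of replacing logic with a lookup can be sketched in a few lines. This is a software illustration only, not a description of any vendor's logic block: the two lists stand in for the two 8x1-bit SRAM memories, and the names `SUM_LUT`, `CARRY_LUT` and `full_adder` are chosen here for the example.

```python
from itertools import product


def build_lut(func):
    """Precompute an 8x1-bit table for a 3-input Boolean function.

    Entries are ordered by address cin*4 + a*2 + b.
    """
    return [func(cin, a, b) for cin, a, b in product((0, 1), repeat=3)]


# The "configuration" step: the truth table is computed once and stored.
SUM_LUT = build_lut(lambda cin, a, b: cin ^ a ^ b)
CARRY_LUT = build_lut(lambda cin, a, b: (a & b) | (cin & (a ^ b)))


def full_adder(cin: int, a: int, b: int):
    """Return (s, cout) by reading the tables; no logic is evaluated here."""
    address = (cin << 2) | (a << 1) | b
    return SUM_LUT[address], CARRY_LUT[address]


print(SUM_LUT)    # the s column of the truth table
print(CARRY_LUT)  # the carry output column of the truth table
print(full_adder(1, 1, 0))
```

Storing a different table in the same memory yields a different 3-input function, which is what makes the fabric reconfigurable.
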
FPGA technology basics – Interconnect

[Figure: regular grid of logic blocks; the interconnect illustrations on the following two slides are not recoverable.]

FPGA technology basics – Interconnect + memory

FPGA fabric consists of a huge number of simple memory
elements connected by means of a reconfigurable network
Design software must break every computing task into
1-bit operations with no more than 4, 5 or 6 variables
Operations are spatially distributed according to proximity
criteria
Routing may be troublesome
   Long paths are slow
   Routing through logic blocks increases area

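As a rough illustration of how a wider operation is broken into 1-bit operations, the sketch below adds two integers using nothing but 3-input table lookups chained through the carry. It is a simplified model: the table contents are the full adder truth table, the function name `ripple_add` and the lookup counter are introduced here for the example, and real design software and devices (which typically provide dedicated carry logic) work differently in detail.

```python
# 8x1-bit tables for a full adder, addressed by cin*4 + a*2 + b.
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]
CARRY_LUT = [0, 0, 0, 1, 0, 1, 1, 1]


def ripple_add(x: int, y: int, width: int):
    """Add two unsigned integers one bit at a time using only table lookups.

    Returns (sum modulo 2**width, carry out, number of table lookups).
    """
    carry = 0
    result = 0
    lookups = 0
    for bit in range(width):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        address = (carry << 2) | (a << 1) | b
        result |= SUM_LUT[address] << bit
        carry = CARRY_LUT[address]  # feeds the next bit position
        lookups += 2                # one lookup for s, one for the carry
    return result, carry, lookups


total, carry_out, lookups = ripple_add(200, 100, width=8)
print(total, carry_out, lookups)
```

Each bit position depends on the carry from the previous one, so the carry chain is an example of a long path whose delay grows with the operand width.
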
Hard cores in FPGAs

Memory blocks
Multipliers
DSP blocks
Microprocessors
Floating point units?

Memory blocks

Hundreds or thousands of small memory blocks
     Dual-port blocks
     18 Kbit each for Xilinx
     Flexible configurations
       Many short words or fewer long words
Independent access
     Huge aggregate bandwidth

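The trade-off between word count and word width, and the aggregate bandwidth claim, can be made concrete with a little arithmetic. In the sketch below only the 18 Kbit block size and the dual-port property come from the slide; the block count, port width and clock frequency passed to `aggregate_bandwidth_bits_per_s` are hypothetical values chosen for illustration.

```python
BLOCK_BITS = 18 * 1024  # one 18 Kbit block, as stated on the slide


def depth_for_width(width_bits: int) -> int:
    """Number of words a block holds when organised with the given word width."""
    return BLOCK_BITS // width_bits


def aggregate_bandwidth_bits_per_s(blocks, ports, width_bits, clock_hz):
    """Bits per second if every port of every block transfers one word per cycle."""
    return blocks * ports * width_bits * clock_hz


# Same capacity, different shapes: many short words or fewer long words.
for width in (9, 18, 36):
    print(f"{depth_for_width(width):5d} words x {width:2d} bits")

# Hypothetical device: 100 dual-port blocks, 36-bit ports, 200 MHz clock.
bw = aggregate_bandwidth_bits_per_s(blocks=100, ports=2, width_bits=36, clock_hz=200e6)
print(f"{bw / 8e9:.0f} GB/s aggregate")
```

The bandwidth is only reachable when the blocks are accessed independently and in parallel, which is the point of the "independent access" item.
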
Multipliers and DSP blocks

As FPGAs were becoming larger, some people tried to
  implement DSP algorithms on them
     However: Multipliers take too much area
     Therefore: Hardwired multipliers were introduced
DSP algorithms are often based on
     multiply & add
     multiply & accumulate
DSP blocks in modern FPGAs implement hardwired:
     multiply, multiply & add, multiply & accumulate
     optional addition before multiplying
     three-input add
     1 large, 2 medium or 4 small operations on the same hardware
     shifting, comparisons, bit-wise operations, …
Up to 2000 DSP blocks in current FPGAs for massive
  parallelism

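A multiply & accumulate step, and the way a DSP algorithm is built from it, can be sketched as follows. This is a sequential software model with hypothetical filter coefficients and samples; on an FPGA each step could be mapped to its own DSP block so that the steps run in parallel or in a pipeline.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply & accumulate step: acc + a * b."""
    return acc + a * b


def fir_output(coeffs, samples):
    """One FIR filter output computed as a chain of MAC steps."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc = mac(acc, c, x)
    return acc


coeffs = [1, 2, 3, 2, 1]     # hypothetical integer filter taps
samples = [4, 0, -1, 5, 2]   # hypothetical window of input samples
print(fir_output(coeffs, samples))
print(f"{len(coeffs)} MAC steps per output sample")
```

With one DSP block per tap, a filter with N taps needs N blocks and can accept a new sample every cycle, which is where a large number of DSP blocks pays off.
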
Microprocessors

Xilinx:
   IBM PowerPC processors
    Virtex-II Pro
    Virtex-4 FX
    Virtex-5 FX
   MicroBlaze soft processor

Altera:
   ARM RISC processors
   Nios soft processor

Floating point units

Not implemented so far
• Suggested as a way to accelerate scientific computing
• For engineering, fixed point arithmetic is usually enough


Will it happen?
☺ It happened with multipliers, transceivers, DSP blocks, …
  GPUs already have a strong position in this field

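To show what "fixed point arithmetic is usually enough" means in practice, here is a minimal sketch of fixed-point multiplication using only integer operations, the kind of arithmetic that maps onto hardwired multipliers. The Q15 format (15 fractional bits) and the helper names are assumptions made for this example, not something the slides specify.

```python
FRAC_BITS = 15  # assumed Q15 format: values are integers scaled by 2**15


def to_fixed(x: float) -> int:
    """Quantise a real number to the nearest Q15 integer."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a Q15 integer back to a real number."""
    return q / (1 << FRAC_BITS)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The raw product has 30 fractional bits, so it is shifted back by 15.
    """
    return (a * b) >> FRAC_BITS


exact_case = to_float(fixed_mul(to_fixed(0.75), to_fixed(-0.5)))
rounded_case = to_float(fixed_mul(to_fixed(0.1), to_fixed(0.2)))
print(exact_case, 0.75 * -0.5)
print(rounded_case, 0.1 * 0.2)
```

Values that are exact binary fractions multiply without error, while others carry a small quantisation error set by the number of fractional bits; whether that error is acceptable depends on the application.
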
Performance

Compared to an ASIC
    10 times slower, larger and more power-hungry


Compared to a microprocessor
    Fast, depending on:
     Potential parallelism
     Required bandwidth
    Small and simple, even standalone
    Reduced power consumption (< 1 W); they may run on batteries

Design effort

Several scenarios:

Pure VHDL or Verilog coding
     Higher flexibility, efficiency and performance
     Long design time
     Costly debugging
Use macros combined with VHDL or Verilog
     Libraries of IP blocks ease the design process
     The required functionality is not guaranteed to be available
High-level languages (DSP logic (Matlab), Impulse-C,
  Handel-C, …)
     Efficient and simple implementation for simple algorithms
     Lack of expressiveness for complex algorithms

Choices

Xilinx
         Virtex
         Spartan
Altera
         Stratix
         Cyclone
Others
         Actel
         Lattice Semiconductor
         …




Choices - Xilinx

                        Spartan 3             Spartan 6        Virtex 6

Logic cells             1728 – 74880          3840 – 147443    74496 – 566784

Block RAM (Kbits)       12 – 1872             216 – 4824       5616 – 32832

Multipliers / DSP       4 – 104 / 84 – 126    8 – 180          288 – 2016

Evaluation board cost   < $200                $300 – $1000     $2000 – $2500

In the context of these applications

Device choice
• Logic-bound
    •   Standard logic
    •   Multipliers
• I/O-bound
Parallel acquisition
• Switching memory blocks between acquisition and computation
High computing speed
• Via pipelining
Results storage
• Internal or external memory
Power consumption
Configuration

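Two of the techniques listed above, switching memory blocks between acquisition and computation and pipelining, can be sketched in software. This is a sequential model under stated assumptions: in hardware the two buffers would be separate memory blocks and acquisition and computation would overlap in time, and the cycle counts assume each pipeline stage takes exactly one cycle. The function names are introduced here for the example.

```python
def process_stream(samples, block_size, compute):
    """Ping-pong buffering: one buffer fills while the other is processed.

    A trailing partial block is left unprocessed in this sketch.
    """
    buffers = [[], []]
    fill = 0                       # index of the buffer currently acquiring
    results = []
    for sample in samples:
        buffers[fill].append(sample)
        if len(buffers[fill]) == block_size:
            full = fill
            fill = 1 - fill        # swap roles of the two buffers
            buffers[fill] = []     # the new acquisition buffer starts empty
            results.append(compute(buffers[full]))
    return results


def unpipelined_cycles(items: int, stages: int) -> int:
    """Each item passes through all stages before the next one starts."""
    return items * stages


def pipelined_cycles(items: int, stages: int) -> int:
    """After the pipeline fills, one item completes per cycle."""
    return stages + items - 1


print(process_stream(range(8), block_size=4, compute=sum))
print(unpipelined_cycles(1000, 5), pipelined_cycles(1000, 5))
```

For long streams the pipelined cycle count approaches one cycle per item regardless of the number of stages, which is why pipelining is the usual route to high throughput.
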
