FPGA Design and Implementation
ASIC & VLSI
• Time-to-market: some large ASICs can take a year or more to design.
• Design issues: mapping, placement, routing, and timing closure take a lot of time.
• The FPGA design flow removes much of the complex and time-consuming work of floorplanning, place and route, and timing analysis.
[Figure: conceptual FPGA showing logic blocks, I/O cells, and interconnect resources]
FPGA
• Speed: on-chip memory (block RAM and distributed RAM).
• RAM is volatile (data is lost at power-down); size and cost also matter.
• Floating-point versus fixed-point arithmetic issues.
• Flexibility.
08/28/09
Design Flow
[Process diagram: Design Entry → Technology Mapping → Placement → Routing → Programming Unit → Configured FPGA]
Why HDL?
• To allow the designer to implement and verify complex hardware functionality at a high level, without needing to know the details of the low-level design implementation.
• Advantages:
• FPGAs have lower prototyping costs
• FPGAs have shorter production times
• Synthesis: the process that translates VHDL code into a complete circuit of logic elements (gates, flip-flops, etc.).
Maximum Throughput Designs
• Dataflow
• Unrolling
• Pipelining
• Merging
Loop Unrolling
• Arrays a[i], b[i], and c[i] are mapped to RAMs.
• Rolled loop: this implementation takes four clock cycles and needs only one multiplier; each RAM can be single-port.
• Unrolled loop: the entire loop operation can be performed in a single clock cycle. It requires four multipliers and the ability to perform 4 reads and 4 writes in the same clock cycle, which may require the arrays to be implemented as register arrays rather than RAMs.
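The two forms can be sketched in C as follows. This is a minimal illustration, not the code from the slides: the loop body a[i] = b[i] * c[i], the function names, and the constant N are assumptions chosen to match the "four cycles, one multiplier" versus "one cycle, four multipliers" description.

```c
#include <assert.h>
#include <stddef.h>

#define N 4 /* trip count implied by the "four clock cycles" on the slide */

/* Rolled form: one multiply per iteration. Hardware can reuse a single
 * multiplier and needs only one access per array in each cycle. */
void mul_rolled(int a[N], const int b[N], const int c[N])
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < N; i++) {
        a[i] = b[i] * c[i]; /* assumed loop body */
    }
}

/* Manually unrolled form: four independent multiplies, no loop control.
 * All four can be scheduled in the same cycle only if the storage can
 * deliver every operand at once, which is why register arrays may be
 * needed instead of RAM. An HLS tool would normally derive this form
 * from a directive (for example "#pragma HLS unroll" in Vivado HLS,
 * assuming that is the tool in use) rather than from hand-written code. */
void mul_unrolled(int a[N], const int b[N], const int c[N])
{
    assert(a != NULL && b != NULL && c != NULL);
    a[0] = b[0] * c[0];
    a[1] = b[1] * c[1];
    a[2] = b[2] * c[2];
    a[3] = b[3] * c[3];
}
```

Both functions compute the same result; they differ only in how much hardware and memory bandwidth an implementation needs to finish in the stated number of cycles.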
Loop Merging
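As a minimal sketch of loop merging in general (the loop bodies, function names, and LEN below are hypothetical and not taken from the slides): two consecutive loops over the same range are fused into one, so the hardware needs a single loop counter, reads each input element once, and can perform both operations in the same iteration.

```c
#include <assert.h>
#include <stddef.h>

#define LEN 8 /* hypothetical array length */

/* Two separate loops: they run one after the other, and in[] is
 * traversed twice. */
void separate_loops(const int in[LEN], int plus[LEN], int minus[LEN])
{
    assert(in != NULL && plus != NULL && minus != NULL);
    for (size_t i = 0; i < LEN; i++) {
        plus[i] = in[i] + 1;
    }
    for (size_t i = 0; i < LEN; i++) {
        minus[i] = in[i] - 1;
    }
}

/* Merged loop: one traversal, one read of in[i], both results
 * produced in the same iteration. */
void merged_loop(const int in[LEN], int plus[LEN], int minus[LEN])
{
    assert(in != NULL && plus != NULL && minus != NULL);
    for (size_t i = 0; i < LEN; i++) {
        int x = in[i];
        plus[i] = x + 1;
        minus[i] = x - 1;
    }
}
```

Merging is only valid when the second loop does not depend on the first loop having finished completely, as is the case here.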
Pipelining
• Pipelining allows operations to happen concurrently: a new operation can start before the previous one has finished.
Pipelining
• Function pipelining is only possible if no resource contention or data dependency prevents it. In this example, the input array "m[2]" is implemented with a single-port RAM, so the function cannot be pipelined: the two read operations on "m[2]" ("op_Read_m[0]" and "op_Read_m[1]") cannot be performed in the same clock cycle.
• Solution: the resource contention can be resolved either by using a dual-port RAM for array "m[2]", which allows both reads in the same clock cycle, or by increasing the initiation interval of the pipeline.
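The port-contention argument can be made concrete with a small model. The sketch below is an assumption-laden illustration: the body of sum_pair is invented (the slides only say the function reads m[0] and m[1]), and min_initiation_interval is a hypothetical helper that gives the lower bound imposed by one memory's ports, ignoring data dependencies.

```c
#include <assert.h>

/* Hypothetical stand-in for the function on the slide: it needs two
 * reads of m[] for every invocation. */
int sum_pair(const int m[2])
{
    return m[0] + m[1];
}

/* Lower bound on the initiation interval (in clock cycles) caused by a
 * single memory: accesses per invocation divided by the number of ports,
 * rounded up. Data dependencies may force a larger value. */
unsigned min_initiation_interval(unsigned accesses, unsigned ports)
{
    assert(ports > 0);
    if (accesses == 0) {
        return 1;
    }
    return (accesses + ports - 1) / ports;
}
```

With two reads, a single-port RAM gives a bound of 2 cycles and a dual-port RAM gives 1, which matches the two solutions listed above: add a port, or accept a larger interval.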
Array Optimizations
• Mapping: when there are many small arrays, mapping them into a single larger array reduces the storage overhead.
• Without mapping: if each small array gets a separate memory, a lot of memory space is potentially wasted, and the design becomes larger and consumes more power.
• Horizontal mapping: creates a new array by concatenating the original arrays. Physically, this is implemented as a single array with more elements.
• Vertical mapping: creates a new array by concatenating the original words of the arrays. Physically, this is implemented as a single array with a larger bit-width.
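The two mappings can be written out explicitly in C. This is a sketch under stated assumptions: the sizes N1 and N2, the 8-bit element width, the equal depth used for the vertical case, and the choice to place array1 in the low bits are all illustrative, and an HLS tool would normally perform the mapping from a directive rather than from hand-written code.

```c
#include <assert.h>
#include <stdint.h>

#define N1 4 /* hypothetical depth of array1 */
#define N2 4 /* hypothetical depth of array2 */

/* Horizontal mapping: one array with more elements.
 * array1 occupies indices 0..N1-1, array2 occupies N1..N1+N2-1. */
void map_horizontal(const uint8_t array1[N1], const uint8_t array2[N2],
                    uint8_t merged[N1 + N2])
{
    for (unsigned i = 0; i < N1; i++) {
        merged[i] = array1[i];
    }
    for (unsigned i = 0; i < N2; i++) {
        merged[N1 + i] = array2[i];
    }
}

/* Reading array2[i] after horizontal mapping means adding an offset. */
uint8_t read_array2_horizontal(const uint8_t merged[N1 + N2], unsigned i)
{
    assert(i < N2);
    return merged[N1 + i];
}

/* Vertical mapping: one array with wider words.
 * Word i holds array1[i] in bits 7..0 and array2[i] in bits 15..8
 * (assumes both arrays have the same depth N1). */
void map_vertical(const uint8_t array1[N1], const uint8_t array2[N1],
                  uint16_t merged[N1])
{
    for (unsigned i = 0; i < N1; i++) {
        merged[i] = (uint16_t)(((uint16_t)array2[i] << 8) | array1[i]);
    }
}
```

In the vertical form a single read returns one element from each original array, whereas in the horizontal form the two elements live at different addresses of the same memory.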
Horizontal mapping
• Although horizontal mapping can result in fewer RAM components and hence a smaller area, it can reduce throughput and performance.
• In the previous example, the accesses to "array1" and "array2" can both be performed in the same clock cycle.
• If both arrays are mapped to the same RAM, each read operation now requires a separate access, and therefore a separate clock cycle.
Vertical mapping
Array Partitioning
• Arrays can also be partitioned into smaller arrays. A memory has a limited number of read ports and write ports, which can limit the throughput of a load/store-intensive algorithm.
• The bandwidth can sometimes be improved by splitting the original array (a single memory resource) into multiple smaller arrays (multiple memories), effectively increasing the number of ports.
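A minimal sketch of one possible partitioning scheme follows. It assumes a cyclic split, where element i goes to bank i % BANKS; the constants DEPTH and BANKS and all function names are hypothetical, and other schemes (such as splitting into contiguous blocks) are equally valid.

```c
#include <assert.h>

#define DEPTH 8 /* hypothetical size of the original array */
#define BANKS 2 /* hypothetical number of smaller memories */

/* Which bank holds element i, and at which offset inside that bank. */
unsigned bank_of(unsigned i)
{
    assert(i < DEPTH);
    return i % BANKS;
}

unsigned offset_of(unsigned i)
{
    assert(i < DEPTH);
    return i / BANKS;
}

/* Split one array into BANKS smaller arrays. Neighbouring elements land
 * in different banks, so they no longer compete for the same ports. */
void partition_cyclic(const int src[DEPTH], int banks[BANKS][DEPTH / BANKS])
{
    for (unsigned i = 0; i < DEPTH; i++) {
        banks[bank_of(i)][offset_of(i)] = src[i];
    }
}

/* Sum all elements. For each offset, the inner loop touches one element
 * per bank; in hardware those reads could be issued in the same cycle
 * because every bank has its own ports. */
int sum_partitioned(int banks[BANKS][DEPTH / BANKS])
{
    int total = 0;
    for (unsigned off = 0; off < DEPTH / BANKS; off++) {
        for (unsigned b = 0; b < BANKS; b++) {
            total += banks[b][off];
        }
    }
    return total;
}
```

With BANKS memories, up to BANKS times as many elements can be accessed per cycle, at the cost of more memory instances.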
Array Partitioning
• If the elements of an array are accessed one at a time, an efficient hardware implementation is to keep them grouped together and mapped into a RAM.
• If multiple elements of an array are required simultaneously, it may be better for performance to implement them as individual registers, allowing parallel access to the data.
• Implementing an array of storage elements as individual registers may help performance, but it consumes a large area and increases power consumption.
Device: xa7a100tfgg484-2i
2-D for size N = 128*128

Input Array    Dual port    Independent Registers
LUT            1642         10778
FF             835          9548
Power          246          2031



