Computre_Engineering_Introduction_FPGA.ppt

FPGA Design and Implementation

ASIC & VLSI
• Time-to-market: Some large ASICs can take a year or
more to design.
• Design issues: A lot of time goes into handling
mapping, routing, placement, and timing.
• The FPGA design flow eliminates the complex and time-consuming
floorplanning, place and route, and timing analysis steps.

[Figure: Conceptual FPGA, showing logic blocks, I/O cells, and interconnect resources]

FPGA
• Speed (memory: BRAM and distributed RAM)
• RAM loses its data (volatile); size and cost
• Floating-point vs. fixed-point issues
• Flexibility
08/28/09

Design Flow (process diagram)
Design Entry → Technology Mapping → Placement → Routing → Programming Unit → Configured FPGA

Why HDL?
• HDLs allow the designer to implement and verify complex
hardware functionality at a high level, without having to
know the details of the low-level design implementation.
• Advantages:
• FPGAs have lower prototyping costs
• FPGAs have shorter production times
• Synthesis: The process that translates VHDL code
into a complete circuit of logic elements (gates, flip-flops,
etc.).

Maximum Throughput Designs
• Dataflow
• Unrolling
• Pipelining
• Merging

Loop Unrolling
• Arrays a[i], b[i] and c[i] are mapped to RAMs.
• Rolled loop: This implementation takes four clock cycles and one multiplier, and each RAM can be
single-port.
• Unrolled loop: The entire loop operation can be performed in a single clock cycle. This requires four
multipliers and the ability to perform 4 reads and 4 writes in the same clock cycle, so the arrays may
need to be implemented as register arrays rather than RAMs.
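
The rolled/unrolled trade-off can be sketched in C, the usual input language for high-level synthesis. The function names and the trip count of four are assumptions chosen to match the cycle and multiplier counts on this slide; the original slide's code is not reproduced here, and how a tool actually schedules either version depends on its settings.

```c
#include <stdint.h>

#define N 4  /* assumed trip count, matching the four cycles / four multipliers */

/* Rolled: one multiply per iteration, so a single multiplier can be reused
   and each array is accessed only once per iteration. */
void mult_rolled(int32_t a[N], const int32_t b[N], const int32_t c[N]) {
    for (int i = 0; i < N; i++) {
        a[i] = b[i] * c[i];
    }
}

/* Unrolled by hand: four independent multiplies and no loop control.
   A tool may schedule all four together, but only if it can read
   b[0..3] and c[0..3] and write a[0..3] at the same time. */
void mult_unrolled(int32_t a[N], const int32_t b[N], const int32_t c[N]) {
    a[0] = b[0] * c[0];
    a[1] = b[1] * c[1];
    a[2] = b[2] * c[2];
    a[3] = b[3] * c[3];
}
```

Both functions compute the same result; they differ only in how much parallelism they expose and how many memory accesses they demand at once.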


Pipelining
• Pipelining allows operations to happen
concurrently.
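
The benefit of overlapping operations can be estimated with a simple cycle count. This is a minimal sketch under an idealized model with no stalls; the function names, and the parameters `latency` (cycles for one input to pass through) and `ii` (cycles between accepting successive inputs), are assumptions introduced for illustration.

```c
/* Cycles to process n inputs when each input must finish
   before the next one starts (no pipelining). */
unsigned cycles_sequential(unsigned n, unsigned latency) {
    return n * latency;
}

/* Cycles to process n inputs when a new input is accepted every ii cycles:
   the first result appears after `latency` cycles, and each further
   input adds only ii cycles. */
unsigned cycles_pipelined(unsigned n, unsigned latency, unsigned ii) {
    if (n == 0) {
        return 0;
    }
    return latency + (n - 1) * ii;
}
```

For example, four inputs through a three-cycle operation take 12 cycles sequentially but 6 cycles when pipelined with `ii` equal to 1.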

Pipelining
• Function pipelining is only possible if there is no resource contention or data dependency that
prevents it. Here the input array "m[2]" is implemented with a single-port RAM, so the function
cannot be pipelined: the two read operations on input "m[2]" ("op_Read_m[0]" and
"op_Read_m[1]") cannot be performed in the same clock cycle.
• Solution: The resource contention can be resolved either by using a dual-port RAM for array
"m[2]", which allows both reads in the same clock cycle, or by increasing the initiation interval
of the pipeline.
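
Both solutions follow from one piece of arithmetic: a memory with a given number of ports can serve only that many accesses per cycle. The helper below is a hypothetical sketch (the name `min_interval` is not from any tool) that gives a lower bound on the interval imposed by a single memory; other constraints may push the real value higher.

```c
/* Lower bound on the pipeline interval imposed by one memory:
   ceil(accesses / ports). Returns 0 if ports is 0 (no valid schedule). */
unsigned min_interval(unsigned accesses, unsigned ports) {
    if (ports == 0) {
        return 0;
    }
    return (accesses + ports - 1) / ports;
}
```

With the two reads of "m[2]", a single-port RAM gives `min_interval(2, 1)`, which is 2, while a dual-port RAM gives `min_interval(2, 2)`, which is 1.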

Array Optimizations
• Mapping: When there are many small arrays, mapping them into a single large
array reduces the storage overhead.
• Partitioning: If each small array gets a separate memory, a lot of memory
space is potentially wasted, and the design will be large and consequently
consume more power.
• Horizontal mapping: this corresponds to creating a new array by
concatenating the original arrays. Physically, this gets implemented as a
single array with more elements.
• Vertical mapping: this corresponds to creating a new array by
concatenating the original words in the array. Physically, this gets
implemented by a single array with a larger bit-width.
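
The two mapping styles can be sketched as address arithmetic in C. The array sizes, the 8-bit element width, and all function names are assumptions made for illustration; an HLS tool would normally perform this transformation itself from a directive rather than from hand-written accessors.

```c
#include <stdint.h>

#define N1 4  /* hypothetical size of array1 */
#define N2 4  /* hypothetical size of array2 */

/* Horizontal mapping: one array with N1 + N2 elements.
   array2 is stored after array1, at an offset of N1. */
static uint8_t merged_h[N1 + N2];

void h_write1(unsigned i, uint8_t v) { merged_h[i] = v; }
void h_write2(unsigned i, uint8_t v) { merged_h[N1 + i] = v; }
uint8_t h_read1(unsigned i) { return merged_h[i]; }
uint8_t h_read2(unsigned i) { return merged_h[N1 + i]; }

/* Vertical mapping: the element count stays the same, but each word is
   wider. array1 uses the low byte and array2 the high byte of each
   16-bit word (this sketch assumes N1 == N2). */
static uint16_t merged_v[N1];

void v_write(unsigned i, uint8_t v1, uint8_t v2) {
    merged_v[i] = (uint16_t)(((uint16_t)v2 << 8) | v1);
}
uint8_t v_read1(unsigned i) { return (uint8_t)(merged_v[i] & 0xFF); }
uint8_t v_read2(unsigned i) { return (uint8_t)(merged_v[i] >> 8); }
```

In the vertical case one access to `merged_v[i]` returns the i-th element of both original arrays, whereas the horizontal case needs two separate accesses to get the same pair.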

Horizontal mapping
• Although horizontal mapping can result in using fewer RAM
components and hence improve area, it can have an impact on
throughput and performance.
• In the previous example both the accesses to "array1" and "array2"
can be performed in the same clock cycle.
• If both arrays are mapped to the same RAM this will now require a
separate access, and clock cycle, for each read operation.

Array Partitioning
• Arrays can also be partitioned into smaller arrays. A memory has a limited
number of read ports and write ports, which can limit the throughput of a
load/store-intensive algorithm.
• The bandwidth can sometimes be improved by splitting up the original
array (a single memory resource) into multiple smaller arrays (multiple
memories), effectively increasing the number of ports.
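
One way to split an array is to interleave its elements across banks, so that neighbouring elements land in different memories. The sketch below assumes this interleaved scheme, an 8-element array, and two banks; the names are hypothetical and other partitioning schemes are equally possible.

```c
#define SIZE 8   /* hypothetical size of the original array */
#define BANKS 2  /* hypothetical number of smaller arrays */

/* Each bank stands for a separate memory with its own ports. */
static int bank[BANKS][SIZE / BANKS];

/* Element i of the original array lives in bank (i % BANKS)
   at position (i / BANKS). */
void part_write(unsigned i, int v) { bank[i % BANKS][i / BANKS] = v; }
int part_read(unsigned i) { return bank[i % BANKS][i / BANKS]; }

/* Adds original elements 2k and 2k+1. They sit in different banks,
   so the two reads do not compete for the same memory port. */
int sum_pair(unsigned k) { return bank[0][k] + bank[1][k]; }
```

With a single unpartitioned memory, the two reads in `sum_pair` would share one set of ports; after the split each read goes to its own memory.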

Array Partitioning
• If the elements of an array are accessed one at a time, an efficient
implementation in hardware is to keep them grouped together and
mapped into a RAM.
• If multiple elements of an array are required simultaneously, it may
be more advantageous for performance to implement them as
individual registers: allowing parallel access to the data.
• Implementing an array of storage elements as individual registers
may help performance, but it consumes a large area and increases
power consumption.

Device: xa7a100tfgg484-2i
2-D, size N = 128*128

Input Array    Dual port    Independent Registers
LUT            1642         10778
FF             835          9548
Power          246          2031
