4. Importance of Design Automation
• Shorter design cycle
• Design space exploration
• Fewer errors in the design
• Less verification effort
• Specification-driven optimization at a higher abstraction level
IIT Guwahati
5. C-Based VLSI Design
• Enables design at a higher abstraction level (e.g., C, C++, Java)
• 14 of the top 20 semiconductor companies use HLS tools
• Application domains: communications, signal processing, computation, cryptography, healthcare, etc.
• Tailors the implementation to the characteristics of the target technology (e.g., speed, resources, area budget)
– Video components in the Tegra X1 chip were designed using Catapult HLS.
– NVIDIA 4K processing was designed with C-based HLS.
– Qualcomm is designing parts of Snapdragon with Catapult HLS.
– Vivado HLS is part of the Xilinx design flow.
– Intel HLS is part of the Quartus design flow.
6. High-level Synthesis
[Figure: High-level behaviour → High-level Synthesis (HLS) → Register Transfer Level description]
Example: 2nd order differential equation solver
Diffeq(x, dx, u, a, clock, y)
    input: x, dx, u, a, clock;
    output: y;
    while (x < a)
        u1 = u - (3*x*u*dx) - (3*y*dx);
        y1 = y + (u*dx);
        x1 = x + dx;
        x = x1; y = y1; u = u1;
    end
Sample RTL (Verilog) generated by HLS (excerpts):
always @(posedge ap_clk) begin
    if (1'b1 == ap_CS_fsm_state5) begin
        j_reg_126 <= j_4_reg_293;
    end else if ((1'b1 == ap_CS_fsm_state1) & (ap_start == 1'b1)) begin
        j_reg_126 <= 3'd0;
    end
end

assign tmp_108_fu_235_p1_temp_6 = tmp_108_fu_235_p1 & 63'd12;
assign statemt_addr_28_reg_324_temp_7 = statemt_addr_28_reg_324 & 4'd19;
assign tmp_108_fu_235_p1_temp_6_temp_8 = tmp_108_fu_235_p1_temp_6 | statemt_addr_28_reg_324_temp_7;

// next-state logic of the controller FSM (one case item)
ap_ST_fsm_state2: begin
    if ((exitcond_fu_175_p2 == 1'd1) & (1'b1 == ap_CS_fsm_state2)) begin
        ap_NS_fsm = ap_ST_fsm_state1;
    end else begin
        ap_NS_fsm = ap_ST_fsm_state3;
    end
end
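A C-based HLS flow starts from an untimed software description of the behaviour. As a rough illustration, the Diffeq loop can be written as a minimal Python sketch; the name `diffeq` and its argument order are our own choices, and the `clock` port is dropped because an untimed model has no notion of clock cycles.

```python
def diffeq(x, dx, u, a, y):
    """Untimed behavioural model of the Diffeq loop; returns the final y.

    The clock port of the original description is omitted: at this level
    only the order of computation matters, not when it happens.
    """
    while x < a:
        # Every right-hand side reads the values of the previous iteration.
        u1 = u - (3 * x * u * dx) - (3 * y * dx)
        y1 = y + (u * dx)
        x1 = x + dx
        x, y, u = x1, y1, u1
    return y


print(diffeq(x=0.0, dx=0.5, u=1.0, a=1.0, y=0.0))  # prints 1.0
```

A model like this can serve as a reference for checking the outputs of the generated RTL.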
7. High-level Synthesis: Data-path and Controller
[Figure: High-level behaviour → High-level Synthesis (HLS) → Register Transfer Level description, consisting of a data-path and a controller]
8. High-level Synthesis Steps
• Preprocessing: intermediate representation (CDFG) construction, data-dependency analysis, live variable analysis and compiler optimizations.
• Scheduling: assigns a control step to each operation of the input behaviour.
• Allocation: computes the minimum number of functional units and registers needed.
• Binding: maps variables to registers, operations to functional units and data transfers to interconnection units.
• Data path & Controller design: the controller is designed based on the interconnections among the data path elements and the data transfers required in each control step.
10. Working with an example
Example: the 2nd order differential equation solver (Diffeq) from slide 6 is converted into a control and data flow graph (CDFG).
11. Preprocessing
Basic blocks with 3-address code:
I:
    Read(p1, dx)
    Read(p2, x)
    Read(p3, a)
    Read(p1, y)
    Read(p2, u)
    c = x < a
B1:
    V1 : t1 = u * dx
    V2 : t2 = 3 * x
    V3 : t3 = 3 * y
    V4 : t4 = u * dx
    V5 : t5 = t1 * t2
    V6 : t6 = t3 * dx
    V7 : t7 = u - t5
    V8 : u = t7 - t6
    V9 : y = y + t4
    V10 : x = x + dx
    V11 : c = x < a
B2:
    Write(p1, y)
[Figure: Control and Data Flow Graph (CDFG) over basic blocks I, B1 (loop body, repeated while c is true) and B2]
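To see that the three-address code preserves the meaning of the original loop, the sketch below interprets blocks I, B1 and B2 and compares the result with the original while loop. The tuple encoding of B1 and the helper names (`run_block`, `run_cdfg`, `run_behaviour`) are hypothetical; integer inputs are used so that the regrouped multiplications (for example `(u*dx)*(3*x)` instead of `3*x*u*dx`) give exactly the same results.

```python
import operator

# Hypothetical encoding of basic block B1: (label, destination, op, src1, src2).
# Strings are variable names; integers are constants.
B1 = [
    ("V1", "t1", "*", "u", "dx"),
    ("V2", "t2", "*", 3, "x"),
    ("V3", "t3", "*", 3, "y"),
    ("V4", "t4", "*", "u", "dx"),
    ("V5", "t5", "*", "t1", "t2"),
    ("V6", "t6", "*", "t3", "dx"),
    ("V7", "t7", "-", "u", "t5"),
    ("V8", "u", "-", "t7", "t6"),
    ("V9", "y", "+", "y", "t4"),
    ("V10", "x", "+", "x", "dx"),
    ("V11", "c", "<", "x", "a"),
]

OPS = {"*": operator.mul, "+": operator.add, "-": operator.sub, "<": operator.lt}


def run_block(block, env):
    """Execute three-address statements in order on a copy of env."""
    env = dict(env)
    for _label, dest, op, src1, src2 in block:
        lhs = env[src1] if isinstance(src1, str) else src1
        rhs = env[src2] if isinstance(src2, str) else src2
        env[dest] = OPS[op](lhs, rhs)
    return env


def run_cdfg(x, dx, u, a, y):
    # Block I: the Read(...) statements become arguments, then the first test.
    env = {"x": x, "dx": dx, "u": u, "a": a, "y": y}
    env["c"] = env["x"] < env["a"]
    # B1 repeats while c is true; V11 recomputes c at the end of B1.
    while env["c"]:
        env = run_block(B1, env)
    # Block B2: Write(p1, y).
    return env["y"]


def run_behaviour(x, dx, u, a, y):
    # The original loop; the tuple assignment reads only the old values.
    while x < a:
        x, y, u = x + dx, y + u * dx, u - 3 * x * u * dx - 3 * y * dx
    return y


print(run_cdfg(0, 1, 1, 3, 0), run_behaviour(0, 1, 1, 3, 0))  # prints -3 -3
```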
13. Preprocessing
[Figure: Data dependency graph of B1. Inputs u, dx, x, 3, y, a; edges t1 to t8 are V1→V5, V2→V5, V3→V6, V4→V9, V5→V7, V6→V8, V7→V8, V10→V11; V11 produces c]
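The data dependency graph contains only read-after-write (true) dependencies inside B1, so it can be derived mechanically from the three-address code. A minimal Python sketch, assuming a simple tuple encoding of B1 (our own): each operand that was written earlier in the block becomes an edge from its producer. The edge from V10 to V11 carries the updated x, which the graph labels t8. Write-after-read orderings, such as V2 reading x before V10 updates it, do not appear as edges in this graph.

```python
# B1 in three-address form: (label, destination, source operands).
B1 = [
    ("V1", "t1", ["u", "dx"]),
    ("V2", "t2", ["3", "x"]),
    ("V3", "t3", ["3", "y"]),
    ("V4", "t4", ["u", "dx"]),
    ("V5", "t5", ["t1", "t2"]),
    ("V6", "t6", ["t3", "dx"]),
    ("V7", "t7", ["u", "t5"]),
    ("V8", "u", ["t7", "t6"]),
    ("V9", "y", ["y", "t4"]),
    ("V10", "x", ["x", "dx"]),
    ("V11", "c", ["x", "a"]),
]


def build_ddg(block):
    """Return read-after-write edges (producer, consumer, value) of a block.

    An operand creates an edge only if an earlier statement in the same
    block wrote it; otherwise it is a block input (u, dx, x, 3, y, a).
    """
    last_writer = {}
    edges = []
    for label, dest, sources in block:
        for src in sources:
            if src in last_writer:
                edges.append((last_writer[src], label, src))
        last_writer[dest] = label
    return edges


for producer, consumer, value in build_ddg(B1):
    print(f"{producer} -> {consumer} ({value})")
```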
14. Scheduling
[Figure: The data dependency graph of B1 scheduled into control steps S1 to S4. S1: V1, V2, V4, V10; S2: V3, V5, V9; S3: V6, V7; S4: V8; V11 follows V10]
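Two reference points for any schedule are the as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) steps of each operation. The sketch below computes both for the dependency graph of B1, assuming every operation takes exactly one control step; the dictionary `DEPS` is our own encoding of the graph edges. Every operation of the schedule above falls between its ASAP and ALAP steps, and the operations with zero mobility (V1, V2, V5, V7, V8) lie on the critical path of length 4.

```python
# Predecessors of each operation in the B1 data dependency graph.
DEPS = {
    "V1": [], "V2": [], "V3": [], "V4": [], "V10": [],
    "V5": ["V1", "V2"], "V6": ["V3"], "V7": ["V5"],
    "V8": ["V7", "V6"], "V9": ["V4"], "V11": ["V10"],
}


def asap(deps):
    """Earliest control step of each operation (unit latency)."""
    step = {}

    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return step[op]

    for op in deps:
        visit(op)
    return step


def alap(deps, num_steps):
    """Latest control step of each operation within num_steps steps."""
    succs = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)
    step = {}

    def visit(op):
        if op not in step:
            step[op] = min((visit(s) for s in succs[op]), default=num_steps + 1) - 1
        return step[op]

    for op in deps:
        visit(op)
    return step


early = asap(DEPS)
late = alap(DEPS, max(early.values()))
for op in DEPS:
    print(op, "ASAP", early[op], "ALAP", late[op], "mobility", late[op] - early[op])
```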
15. Register Allocation and Binding
R1: t1, t3, t6
R2: t2, t5, t7
R3: t4
R4: t8
R5: u
R6: x
R7: dx
R8: y
R9: c
R10: 3
R11: a
[Figure: Scheduled data dependency graph (S1 to S4) and the interval graph of variable lifetimes (t1 to t8, u, x, dx, y, c, 3, a) over S1 to S4, each lifetime labelled with its register R1 to R11]
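Because each temporary lives in a contiguous interval of control steps, register allocation for the temporaries is an interval-packing problem. A minimal sketch of the left-edge algorithm follows. The lifetimes are read off the schedule as half-open intervals (for example t1 is produced by V1 in S1 and last read by V5 in S2, giving [1, 2)); the lifetime of t8 is an assumption, since the slide does not show where it is last read. The slide does not say which method produced its mapping, but on these lifetimes left-edge gives the same four registers R1 to R4. The remaining variables (u, x, dx, y, c, 3, a) keep their dedicated registers R5 to R11.

```python
# Lifetime of each temporary as a half-open interval [birth, death):
# birth = step whose result is stored, death = step of the last read.
LIFETIMES = {
    "t1": (1, 2), "t2": (1, 2), "t3": (2, 3), "t4": (1, 2),
    "t5": (2, 3), "t6": (3, 4), "t7": (3, 4),
    "t8": (1, 4),  # assumption: the new x stays live until the end of S4
}


def left_edge(lifetimes):
    """Pack intervals into registers with the left-edge algorithm."""
    # Sort by start step, then end step; sorted() keeps ties in dict order.
    pending = sorted(lifetimes.items(), key=lambda item: item[1])
    registers = []
    while pending:
        packed, leftover, free_from = [], [], None
        for name, (start, end) in pending:
            if free_from is None or start >= free_from:
                packed.append(name)  # fits after the previous interval
                free_from = end
            else:
                leftover.append((name, (start, end)))
        registers.append(packed)
        pending = leftover
    return registers


for i, names in enumerate(left_edge(LIFETIMES), start=1):
    print(f"R{i}: {', '.join(names)}")
```

The printed packing is R1: t1, t3, t6; R2: t2, t5, t7; R3: t4; R4: t8.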
16. FU Allocation and Binding: Multiplier
[Figure: Multiplications V1 to V6 placed in control steps S1 to S4 and labelled with their multipliers M1, M2, M3]
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
Multiplications scheduled in different control steps (non-overlapping) can be mapped to the same multiplier FU.
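The rule above can be applied mechanically: visit the operations step by step and give each one the first unit that is still idle in its step. This is a first-fit heuristic, not necessarily how the slide's binding was made; `bind_first_fit` and the `MUL_STEP` table are hypothetical names, with the steps taken from the schedule.

```python
# Control step of each multiplication in the schedule.
MUL_STEP = {"V1": 1, "V2": 1, "V4": 1, "V3": 2, "V5": 2, "V6": 3}


def bind_first_fit(op_step, prefix):
    """Bind operations to the first unit that is idle in their step."""
    busy = []      # busy[i] = set of steps in which unit i is already used
    binding = {}
    # Visit operations step by step; the label number breaks ties.
    for op in sorted(op_step, key=lambda o: (op_step[o], int(o[1:]))):
        step = op_step[op]
        for i, steps in enumerate(busy):
            if step not in steps:
                steps.add(step)
                binding[op] = f"{prefix}{i + 1}"
                break
        else:  # every existing unit is busy in this step: add a new one
            busy.append({step})
            binding[op] = f"{prefix}{len(busy)}"
    return binding


print(bind_first_fit(MUL_STEP, "M"))
```

For the multiplications it yields M1: V1, V3, V6; M2: V2, V5; M3: V4. That is still three multipliers, grouped differently from the slide; both bindings are valid, and the choice between them changes which registers feed each multiplier, and hence the multiplexers needed. Called as `bind_first_fit({"V10": 1, "V9": 2, "V7": 3, "V8": 4}, "A")`, the same function puts all four add/subtract operations on A1.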
17. FU Allocation and Binding: Adder
[Figure: Add/subtract operations V10, V9, V7 and V8 in control steps S1, S2, S3 and S4, all bound to adder A1]
FU allocation and binding:
ADD: A1: V10, V9, V7, V8
Add/subtract operations scheduled in different control steps can be mapped to the same adder FU.
18. Functional Unit Allocation and Binding
FU alloc and bind:
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
ADD: A1: V7, V8, V9, V10
COMP: C1: V11
19. Register Transfer Level (RTL) Behaviour
Original behaviour:
S1:
V1 : t1 = u * dx
V2 : t2 = 3 * x
V4 : t4 = u * dx
V10 : x = x + dx
S2:
V5 : t5 = t1 * t2
V3 : t3 = 3 * y
V9 : y = y + t4
RTL behaviour:
S1:
V1 R1 = R5 <M1> R7
V2 R2 = R10 <M2> R6
V4 R3 = R5 <M3> R7
V10 R4 = R6 <A1> R7
S2:
V5 R2 = R1 <M1> R2
V3 R1 = R10 <M2> R8
V9 R8 = R8 <A1> R3
Register mapping:
R1: t1, t3, t6
R2: t2, t5, t7
R3: t4
R4: t8
R5: u
R6: x
R7: dx
R8: y
R9: c
R10: 3
R11: a
FU mapping:
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
ADD: A1: V7, V8, V9, V10
COMP: C1: V11
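A quick way to check an RTL behaviour against the original is to execute the register transfers. The sketch below runs the S1 and S2 transfers of this slide on a dictionary of registers. Within a step, every source is read before any destination is written, as with registers loaded on the clock edge that ends the step. This matters in S2, where R1 is read by V5 (as t1) and written by V3 (as t3) in the same step. The `run_steps` helper and the input values are illustrative.

```python
import operator

OPS = {"*": operator.mul, "+": operator.add}


def run_steps(steps, regs):
    """Execute register transfers step by step.

    Within a step every source is read before any destination is written,
    modelling registers that load on the clock edge ending the step.
    """
    regs = dict(regs)
    for transfers in steps:
        results = {dest: OPS[op](regs[lhs], regs[rhs])
                   for dest, lhs, op, rhs in transfers}
        regs.update(results)
    return regs


# RTL behaviour of S1 and S2 as (destination, left source, op, right source).
S1 = [("R1", "R5", "*", "R7"),   # V1:  t1 = u * dx
      ("R2", "R10", "*", "R6"),  # V2:  t2 = 3 * x
      ("R3", "R5", "*", "R7"),   # V4:  t4 = u * dx
      ("R4", "R6", "+", "R7")]   # V10: t8 = x + dx
S2 = [("R2", "R1", "*", "R2"),   # V5:  t5 = t1 * t2
      ("R1", "R10", "*", "R8"),  # V3:  t3 = 3 * y
      ("R8", "R8", "+", "R3")]   # V9:  y = y + t4

u, x, dx, y = 2, 3, 5, 7
regs = run_steps([S1, S2], {"R5": u, "R6": x, "R7": dx, "R8": y, "R10": 3})
print(regs["R2"], regs["R1"], regs["R8"])  # prints 90 21 17 (t5, t3, new y)
```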
27. Topics to be covered
• Scheduling Possibilities
• Register and FU allocation and binding
• Datapath and Controller Synthesis
28. Scheduling Possibilities
[Figure: Two schedules of the B1 data dependency graph over control steps S1 to S4. Left: at least 3 multipliers required. Right: at least 4 multipliers required.]
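The minimum number of functional units of a type needed by a schedule is the largest number of operations of that type in any single control step. A minimal sketch, treating add and subtract as one ALU type (as the binding to A1 does) and placing V11 in S2, which is an assumption. The text does not state which schedules the figure shows; the ASAP schedule is used here as one example that needs four multipliers.

```python
from collections import Counter

# Operation type per label; add and subtract share the ALU (A1 on the slides).
KIND = {"V1": "mul", "V2": "mul", "V3": "mul", "V4": "mul", "V5": "mul",
        "V6": "mul", "V7": "alu", "V8": "alu", "V9": "alu", "V10": "alu",
        "V11": "cmp"}


def units_needed(schedule, kind):
    """Minimum units per type = most operations of that type in one step."""
    per_step = Counter((kind[op], step) for op, step in schedule.items())
    need = {}
    for (k, _step), n in per_step.items():
        need[k] = max(need.get(k, 0), n)
    return need


# Schedule used on the earlier slides (V11 placed in S2: an assumption).
SLIDE = {"V1": 1, "V2": 1, "V4": 1, "V10": 1, "V3": 2, "V5": 2, "V9": 2,
         "V11": 2, "V6": 3, "V7": 3, "V8": 4}
# As-soon-as-possible schedule: every operation in its earliest step.
ASAP = {"V1": 1, "V2": 1, "V3": 1, "V4": 1, "V10": 1, "V5": 2, "V6": 2,
        "V9": 2, "V11": 2, "V7": 3, "V8": 4}

print(units_needed(SLIDE, KIND))
print(units_needed(ASAP, KIND))
```

The first schedule needs 3 multipliers and the ASAP schedule needs 4; both get by with a single ALU and a single comparator.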
29. Scheduling Possibilities
[Figure: Two more schedules of the B1 data dependency graph over control steps S1 to S4. Left: at least 3 multipliers required. Right: at least 2 multipliers required.]
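Schedules like these can be generated under a resource constraint with list scheduling, a common heuristic: in each step, ready operations are taken in priority order while units of their type remain. The sketch uses ALAP steps as priorities and hypothetical unit limits; it is not claimed to be how the figure's schedules were produced. With two multipliers and two ALUs it finds a 4-step schedule; with two multipliers and a single ALU the same heuristic needs a fifth step, which illustrates the trade-off between units and latency.

```python
# Dependencies and operation types of B1 (add/subtract share one ALU kind).
DEPS = {"V1": [], "V2": [], "V3": [], "V4": [], "V10": [],
        "V5": ["V1", "V2"], "V6": ["V3"], "V7": ["V5"],
        "V8": ["V7", "V6"], "V9": ["V4"], "V11": ["V10"]}
KIND = {op: "mul" for op in ("V1", "V2", "V3", "V4", "V5", "V6")}
KIND.update({"V7": "alu", "V8": "alu", "V9": "alu", "V10": "alu", "V11": "cmp"})
# Latest feasible step in a 4-step schedule (ALAP); smaller means more urgent.
ALAP = {"V1": 1, "V2": 1, "V3": 2, "V4": 3, "V5": 2, "V6": 3, "V7": 3,
        "V8": 4, "V9": 4, "V10": 3, "V11": 4}


def list_schedule(deps, kind, limits, priority):
    """Resource-constrained list scheduling with unit-latency operations."""
    step_of = {}
    step = 0
    while len(step_of) < len(deps):
        step += 1
        # Ready: unscheduled operations whose predecessors all finished earlier.
        ready = [op for op in deps if op not in step_of
                 and all(p in step_of for p in deps[op])]
        used = {k: 0 for k in limits}
        for op in sorted(ready, key=lambda o: (priority[o], o)):
            if used[kind[op]] < limits[kind[op]]:
                used[kind[op]] += 1
                step_of[op] = step
    return step_of


two_alus = list_schedule(DEPS, KIND, {"mul": 2, "alu": 2, "cmp": 1}, ALAP)
one_alu = list_schedule(DEPS, KIND, {"mul": 2, "alu": 1, "cmp": 1}, ALAP)
print(max(two_alus.values()), max(one_alu.values()))  # prints 4 5
```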
30. Automation of Register and FU Allocation and Binding
• How to automate register allocation and binding?
• How to automate FU allocation and binding?
• Map each problem to a graph colouring or clique partitioning problem and solve it.
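For registers, the graph colouring formulation can be sketched as follows: each temporary is a vertex, two vertices are joined when their lifetimes overlap, and each colour is a register, so the number of colours is the number of registers. The lifetimes are read off the schedule of B1 as half-open intervals [birth, death), with the lifetime of t8 assumed to last until S4. Greedy colouring in order of start step is optimal for such interval graphs; for general conflict graphs it is only a heuristic, since minimum colouring is NP-hard. Clique partitioning is the complementary view: on the compatibility graph, each clique of mutually non-overlapping variables shares one register.

```python
# Lifetimes of the temporaries under the schedule of B1: [birth, death).
LIFETIMES = {
    "t1": (1, 2), "t2": (1, 2), "t3": (2, 3), "t4": (1, 2),
    "t5": (2, 3), "t6": (3, 4), "t7": (3, 4),
    "t8": (1, 4),  # assumption: the new x stays live until the end of S4
}


def conflict_graph(lifetimes):
    """Edge between two variables whose lifetimes overlap."""
    names = list(lifetimes)
    adj = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
            if s1 < e2 and s2 < e1:  # half-open intervals overlap
                adj[a].add(b)
                adj[b].add(a)
    return adj


def greedy_colour(adj, order):
    """Give each vertex the smallest colour not used by a coloured neighbour."""
    colour = {}
    for v in order:
        taken = {colour[n] for n in adj[v] if n in colour}
        c = 0
        while c in taken:
            c += 1
        colour[v] = c
    return colour


adj = conflict_graph(LIFETIMES)
order = sorted(LIFETIMES, key=lambda n: (LIFETIMES[n][0], n))  # by birth step
colours = greedy_colour(adj, order)
for c in sorted(set(colours.values())):
    print(f"register {c + 1}:", [v for v in order if colours[v] == c])
```

Here it uses four colours and groups t1, t3, t6 and t2, t5, t7 together, matching registers R1 and R2 of the register mapping.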
Data path and Controller Synthesis:
– Mux-based and bus-based architectures
– Various ways to optimize interconnections
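In a mux-based data path, every FU input and register input that receives data from more than one source needs a multiplexer. The sketch below counts those sources using only the S1 and S2 register transfers of the RTL behaviour slide (S3 and S4 would add more). It assumes the adder written `<A>` in those transfers is A1, and the tuple layout is our own.

```python
from collections import defaultdict

# Register transfers of S1 and S2 from the RTL behaviour:
# (step, operation, FU, destination, left source, right source).
TRANSFERS = [
    (1, "V1", "M1", "R1", "R5", "R7"),
    (1, "V2", "M2", "R2", "R10", "R6"),
    (1, "V4", "M3", "R3", "R5", "R7"),
    (1, "V10", "A1", "R4", "R6", "R7"),
    (2, "V5", "M1", "R2", "R1", "R2"),
    (2, "V3", "M2", "R1", "R10", "R8"),
    (2, "V9", "A1", "R8", "R8", "R3"),
]


def mux_inputs(transfers):
    """Collect the distinct sources feeding every FU input and register."""
    sources = defaultdict(set)
    for _step, _op, fu, dest, left, right in transfers:
        sources[(fu, "left")].add(left)
        sources[(fu, "right")].add(right)
        sources[(dest, "in")].add(fu)
    return {port: sorted(srcs) for port, srcs in sources.items()}


for port, srcs in mux_inputs(TRANSFERS).items():
    if len(srcs) > 1:
        print(f"{port[0]}.{port[1]}: {len(srcs)}-input mux from {srcs}")
```

With only these two steps, seven ports already need a 2-input multiplexer. Rebinding operations to different units, or swapping the operands of commutative operations, changes these source sets, which is one way to reduce interconnect.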