C-Based VLSI Design: An Overview
Dr. Chandan Karfa
Department of Computer Science and Engineering
IIT Guwahati
VLSI Design Flow
System Specification
Architectural Design
High-level Synthesis
Logic Synthesis
Physical Design
Fabrication
Packaging & Testing
High-Level Synthesis (HLS)/C-Based VLSI Design
• C to gates
• Design time 
• Design complexity  (10x)
• Verification effort 
• Hardware/software co-design 
Importance of Design Automation
• Shorter design cycle
• Design space exploration
• Fewer errors in the design
• Less verification effort
• Specification driven optimization at the higher abstraction level
C-Based VLSI Design
• Enables designs at higher abstraction level (e.g., C, C++, Java)
• 14 out of the top-20 semiconductor companies use HLS tools
• Communications, signal processing, computation, crypto, healthcare,
etc.
• Tailor implementation to match characteristics of target technology (e.g.,
speed, resources, area budget)
– Video components in Tegra X1 chip designed using Catapult HLS.
– NVIDIA 4K processing was designed with C-based HLS.
– Qualcomm designing parts of Snapdragon with Catapult HLS.
– Vivado HLS is part of the Xilinx design flow.
– Intel HLS is part of the Quartus design flow.
HLS
High-level Behaviour → High-level Synthesis → Register Transfer Level Description
Example: 2nd order differential equation solver
Diffeq: (x, dx, u, a, clock, y)
input: x, dx, u, a, clock;
output: y
while(x < a)
u1 = u-(3*x*u*dx)-(3*y*dx);
y1 = y+(u*dx);
x1 = x+dx;
x = x1, y = y1, u = u1;
end
always @(posedge ap_clk) begin
    if (1'b1 == ap_CS_fsm_state5) begin
        j_reg_126 <= j_4_reg_293;
    end else if ((1'b1 == ap_CS_fsm_state1) & (ap_start == 1'b1)) begin
        j_reg_126 <= 3'd0;
    end
end
assign tmp_108_fu_235_p1_temp_6 = tmp_108_fu_235_p1 & 63'd12;
assign statemt_addr_28_reg_324_temp_7 = statemt_addr_28_reg_324 & 4'd19;
assign tmp_108_fu_235_p1_temp_6_temp_8 = tmp_108_fu_235_p1_temp_6 | statemt_addr_28_reg_324_temp_7;
ap_ST_fsm_state2: begin
    if ((exitcond_fu_175_p2 == 1'd1) & (1'b1 == ap_CS_fsm_state2)) begin
        ap_NS_fsm = ap_ST_fsm_state1;
    end else begin
        ap_NS_fsm = ap_ST_fsm_state3;
    end
end
HLS
The Register Transfer Level Description produced from the High-level Behaviour consists of a Data-path and a Controller.
High-level Synthesis Steps
• Preprocessing: Intermediate representation (CDFG) construction, data-dependency analysis, live variable analysis, compiler optimization.
• Scheduling: Assigns a control step to each operation of the input behaviour.
• Allocation: Computes the minimum number of functional units and registers.
• Binding: Variables are mapped to registers, operations to functional units, and data transfers to the interconnection units.
• Data path & Controller design: The controller is designed based on the interconnections among the data path elements and the data transfers required in the different control steps.
High-level Synthesis Steps
Input behaviour → pre-processing → scheduling → allocation & binding → data-path generation → controller generation → RTL behaviour
RTL behaviour: Data-path and Controller, connected by control signals and status signals
Working with an example
Example: 2nd order differential equation solver
Diffeq: (x, dx, u, a, clock, y)
input: x, dx, u, a, clock;
output: y
while(x < a)
u1 = u-(3*x*u*dx)-(3*y*dx);
y1 = y+(u*dx);
x1 = x+dx;
x = x1, y = y1, u = u1;
end
CDFG
Preprocessing
Basic blocks with 3-address code (Control and Dataflow Graph, CDFG)
I:
Read(p1, dx)
Read(p2, x)
Read(p3, a)
Read(p1, y)
Read(p2, u)
c = x < a
B1:
V1 : t1 = u * dx
V2 : t2 = 3 * x
V3 : t3 = 3 * y
V4 : t4 = u * dx
V5 : t5 = t1 * t2
V6 : t6 = t3 * dx
V7 : t7 = u - t5
V8 : u = t7 - t6
V9 : y = y + t4
V10 : x = x + dx
V11 : c = x < a
B2:
Write(p1, y)
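As a quick sanity check of the preprocessing step, the sketch below runs the original while loop and its 3-address form (blocks I, B1, B2) side by side. The function names are illustrative, the clock input is dropped, and Fraction inputs are an assumption made only to keep the comparison exact.

```python
from fractions import Fraction


def diffeq_reference(x, dx, u, a, y):
    """Direct transcription of the while loop in the Diffeq behaviour."""
    while x < a:
        u1 = u - (3 * x * u * dx) - (3 * y * dx)
        y1 = y + (u * dx)
        x1 = x + dx
        x, y, u = x1, y1, u1
    return y


def diffeq_three_address(x, dx, u, a, y):
    """Same loop written as block B1 (operations V1 to V11)."""
    c = x < a                # block I: initial loop test
    while c:
        t1 = u * dx          # V1
        t2 = 3 * x           # V2
        t3 = 3 * y           # V3
        t4 = u * dx          # V4
        t5 = t1 * t2         # V5
        t6 = t3 * dx         # V6
        t7 = u - t5          # V7
        u = t7 - t6          # V8
        y = y + t4           # V9
        x = x + dx           # V10
        c = x < a            # V11
    return y                 # block B2: Write(p1, y)


# Illustrative inputs (not from the slides): x, dx, u, a, y
ARGS = (Fraction(0), Fraction(1, 4), Fraction(1), Fraction(1), Fraction(2))
SAME = diffeq_reference(*ARGS) == diffeq_three_address(*ARGS)
```

With exact arithmetic the two forms agree; with floating point the reordered multiplications in V5 could round differently.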
Data dependency graph of block B1: operation nodes V1 to V11 over the inputs u, dx, x, y, a and the constant 3; edges carry the temporaries t1 to t8, and V11 produces c.
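The data dependency graph can be derived mechanically from block B1. The sketch below tracks the last writer of each name and records only read-after-write edges; the tuple layout and function name are assumptions for illustration. Write-after-read hazards (for example V10 overwriting x after V2 reads it) are ignored here, which matches the slides writing the result of V10 to a fresh temporary t8.

```python
# Block B1 as (operation, destination, operator, source1, source2).
B1 = [
    ("V1", "t1", "*", "u", "dx"),
    ("V2", "t2", "*", "3", "x"),
    ("V3", "t3", "*", "3", "y"),
    ("V4", "t4", "*", "u", "dx"),
    ("V5", "t5", "*", "t1", "t2"),
    ("V6", "t6", "*", "t3", "dx"),
    ("V7", "t7", "-", "u", "t5"),
    ("V8", "u", "-", "t7", "t6"),
    ("V9", "y", "+", "y", "t4"),
    ("V10", "x", "+", "x", "dx"),
    ("V11", "c", "<", "x", "a"),
]


def data_dependencies(block):
    """Return {op: set of ops whose result it reads} (read-after-write edges)."""
    last_writer = {}
    deps = {}
    for op, dest, _operator, src1, src2 in block:
        deps[op] = {last_writer[s] for s in (src1, src2) if s in last_writer}
        last_writer[dest] = op
    return deps


DEPS = data_dependencies(B1)
```

Operations with an empty dependency set (V1, V2, V3, V4, V10) read only inputs and can start in the first control step.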
Scheduling
Schedule of the data dependency graph over control steps S1 to S4 (figure). As listed on the RTL behaviour slide, S1 holds V1, V2, V4, V10 and S2 holds V5, V3, V9.
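A schedule is legal only if every operation runs after the operations it depends on. The sketch below checks this for the deck's schedule, assuming every operation takes one control step. The steps for S1 and S2 are the ones listed on the RTL behaviour slide; V6, V7 in S3 and V8 in S4 follow from the dependencies in a four-step schedule. V11 is left out because its step is not recoverable from the text.

```python
# Read-after-write dependencies of block B1 (operation -> predecessors).
DEPS = {
    "V5": ["V1", "V2"],
    "V6": ["V3"],
    "V7": ["V5"],
    "V8": ["V6", "V7"],
    "V9": ["V4"],
}

# Control step of each operation (V11 omitted: step unknown).
SCHEDULE = {
    "V1": 1, "V2": 1, "V4": 1, "V10": 1,
    "V5": 2, "V3": 2, "V9": 2,
    "V6": 3, "V7": 3,
    "V8": 4,
}


def is_valid_schedule(schedule, deps):
    """Each operation must be in a later step than all of its predecessors."""
    return all(
        schedule[pred] < schedule[op]
        for op in schedule
        for pred in deps.get(op, ())
        if pred in schedule
    )


VALID = is_valid_schedule(SCHEDULE, DEPS)
# Moving V5 into S1 breaks the V1 -> V5 dependency.
BROKEN = is_valid_schedule(dict(SCHEDULE, V5=1), DEPS)
```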
Register Allocation and Binding
R1: t1, t3, t6
R2: t2, t5, t7
R3: t4
R4: t8
R5: u
R6: x
R7: dx
R8: y
R9: c
R10: 3
R11: a
Interval graph: lifetimes of t1 to t8, u, x, dx, y, c, 3 and a over control steps S1 to S4, each labelled with its register (R1 to R11).
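The sharing of R1, R2 and R3 among the temporaries can be reproduced with a left-edge style first-fit pass over the lifetimes. This is a sketch under stated assumptions: each lifetime is (step that produces the value, step of its last use), read off the four-step schedule, and a register may be rewritten in the same step in which its old value is read for the last time. Only t1 to t7 are modelled; the slides give t8 and the program variables their own registers.

```python
# Lifetime of each temporary: (producing step, step of last use).
LIFETIMES = {
    "t1": (1, 2), "t2": (1, 2), "t4": (1, 2),
    "t3": (2, 3), "t5": (2, 3),
    "t6": (3, 4), "t7": (3, 4),
}


def left_edge(lifetimes):
    """First-fit over lifetimes sorted by start; returns {variable: register index}."""
    free_at = []   # free_at[r] = step from which register r+1 can be rewritten
    binding = {}
    ordered = sorted(lifetimes.items(), key=lambda kv: (kv[1][0], kv[0]))
    for var, (start, end) in ordered:
        for r, free in enumerate(free_at):
            if free <= start:          # old value is read for the last time by 'start'
                free_at[r] = end
                binding[var] = r + 1
                break
        else:                          # no reusable register: allocate a new one
            free_at.append(end)
            binding[var] = len(free_at)
    return binding


BINDING = left_edge(LIFETIMES)
REGISTERS = {
    f"R{r}": sorted(v for v, reg in BINDING.items() if reg == r)
    for r in sorted(set(BINDING.values()))
}
```

Under these assumptions the result groups t1, t3, t6 in R1, t2, t5, t7 in R2 and t4 in R3, the same grouping as the register list above.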
FU Allocation and Binding: Multiplier
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
Mult operations with non-overlapping schedule
can be mapped to the same Multiplier FU
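The rule above can be checked in a few lines. The sketch below computes a lower bound on the number of multipliers (the busiest control step) and verifies that the M1/M2/M3 binding never uses one multiplier twice in a step. The step numbers come from the deck's four-step schedule; the helper names are illustrative.

```python
from collections import Counter

# Control step of each multiplication in the four-step schedule.
MULT_STEP = {"V1": 1, "V2": 1, "V4": 1, "V5": 2, "V3": 2, "V6": 3}

# Binding from the slide.
MULT_BINDING = {"M1": ["V1", "V5"], "M2": ["V2", "V3"], "M3": ["V4", "V6"]}


def min_units(op_step):
    """Lower bound on units of one type: the number of ops in the busiest step."""
    return max(Counter(op_step.values()).values())


def is_valid_binding(binding, op_step):
    """No unit may execute two operations in the same control step."""
    for ops in binding.values():
        steps = [op_step[op] for op in ops]
        if len(steps) != len(set(steps)):
            return False
    return True


NEEDED = min_units(MULT_STEP)
OK = is_valid_binding(MULT_BINDING, MULT_STEP)
```

Three multiplications share S1, so this schedule cannot use fewer than three multipliers.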
FU Allocation and Binding: Adder
A1: V10, V9, V7, V8
Add/Sub operations with non-overlapping
schedule can be mapped to the same adder FU
Functional Unit Allocation and Binding
FU alloc and bind:
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
ADD: A1: V7, V8, V9, V10
COMP: C1: V11
Register Transfer Level (RTL) Behaviour
Original behaviour
S1:
V1 : t1 = u * dx
V2 : t2 = 3 * x
V4 : t4 = u * dx
V10 : x = x + dx
S2:
V5 : t5 = t1 * t2
V3 : t3 = 3 * y
V9 : y = y + t4
Register mapping
R1: t1, t3, t6
R2: t2, t5, t7
R3: t4
R4: t8
R5: u
R6: x
R7: dx
R8: y
R9: c
R10: 3
R11: a
FU mapping
MULT: M1: V1, V5
MULT: M2: V2, V3
MULT: M3: V4, V6
ADD: A1: V7, V8, V9, V10
COMP: C1: V11
RTL behaviour
S1:
V1 R1 = R5 <M1> R7
V2 R2 = R10 <M2> R6
V4 R3 = R5 <M3> R7
V10 R4 = R6 <A> R7
S2:
V5 R2 = R1 <M1> R2
V3 R1 = R10 <M2> R8
V9 R8 = R8 <A> R3
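The RTL behaviour is the scheduled behaviour with every variable replaced by its register and every operation tagged with its functional unit. A minimal sketch of that substitution follows; the data layout and the function name are assumptions. V10 is written to t8, as in the slide's RTL line for V10, and the adder is printed as A1 where the slide abbreviates it to A.

```python
REG = {
    "t1": "R1", "t3": "R1", "t6": "R1",
    "t2": "R2", "t5": "R2", "t7": "R2",
    "t4": "R3", "t8": "R4", "u": "R5", "x": "R6", "dx": "R7",
    "y": "R8", "c": "R9", "3": "R10", "a": "R11",
}
FU = {
    "V1": "M1", "V5": "M1", "V2": "M2", "V3": "M2", "V4": "M3", "V6": "M3",
    "V7": "A1", "V8": "A1", "V9": "A1", "V10": "A1", "V11": "C1",
}
# (operation, destination, source1, source2) per control step.
SCHEDULED = {
    "S1": [("V1", "t1", "u", "dx"), ("V2", "t2", "3", "x"),
           ("V4", "t4", "u", "dx"), ("V10", "t8", "x", "dx")],
    "S2": [("V5", "t5", "t1", "t2"), ("V3", "t3", "3", "y"),
           ("V9", "y", "y", "t4")],
}


def to_rtl(scheduled, reg, fu):
    """Rewrite each scheduled operation as a register transfer string."""
    return {
        step: [f"{op} {reg[d]} = {reg[a]} <{fu[op]}> {reg[b]}" for op, d, a, b in ops]
        for step, ops in scheduled.items()
    }


RTL = to_rtl(SCHEDULED, REG, FU)
```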
Datapath Synthesis
FU
R1, R2, R1, R5, R4 R6, R1, R5, R6, R2
R1, R1, R2, R7, R4
Data path Generation
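For a mux-based data path, the interconnect follows from the register transfers: every FU input port and every register input that sees more than one source over the control steps needs a multiplexer. The sketch below collects those sources for the S1 and S2 transfers of the example. The tuple layout and names are illustrative, and S3 and S4 are not included because their transfers are not listed in the deck.

```python
# (step, destination register, left source, functional unit, right source)
TRANSFERS = [
    ("S1", "R1", "R5", "M1", "R7"),
    ("S1", "R2", "R10", "M2", "R6"),
    ("S1", "R3", "R5", "M3", "R7"),
    ("S1", "R4", "R6", "A1", "R7"),
    ("S2", "R2", "R1", "M1", "R2"),
    ("S2", "R1", "R10", "M2", "R8"),
    ("S2", "R8", "R8", "A1", "R3"),
]


def mux_inputs(transfers):
    """Collect the distinct sources feeding each FU port and each register input."""
    ports = {}
    for _step, dest, left, unit, right in transfers:
        ports.setdefault((unit, "left"), set()).add(left)
        ports.setdefault((unit, "right"), set()).add(right)
        ports.setdefault((dest, "in"), set()).add(unit)
    return ports


PORTS = mux_inputs(TRANSFERS)
# Ports with a single source can be wired directly; the rest need a mux.
NEED_MUX = sorted(port for port, sources in PORTS.items() if len(sources) > 1)
```

Reducing the number and width of these multiplexers is one of the interconnection optimizations the deck mentions at the end.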
Controller Synthesis
DATA-PATH: REGISTERS (r1, r2, u, y, x, dx, 3, a), multiplier (*), ALU
CONTROL-UNIT signals: register enable, Mux control, ALU control (+, -, <), c
Control Signals
Control Assertion Pattern: <FU, FU_MUX_in, Reg_en, Reg_MUX_in>
FU: 1 bit
FU_MUX_in: 7 bits
Reg_en: 11 bits
Reg_MUX_in: 2 bits
Total: 21 bits
S1: <1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
S2: <1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0>
S3: <…..>
S4: <…..>
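The 21-bit control assertion pattern can be split into its four fields and cross-checked against the register transfers. The sketch below assumes the fields appear in the order given in the pattern and that bit i of Reg_en (counted from the left) enables register R(i+1); neither assumption is stated explicitly on the slide, but both agree with the registers written in S1 and S2. The meaning of the individual FU_MUX_in and Reg_MUX_in bits is not given, so they are only extracted.

```python
FIELDS = [("FU", 1), ("FU_MUX_in", 7), ("Reg_en", 11), ("Reg_MUX_in", 2)]

S1 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
S2 = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]


def split_fields(word):
    """Cut a control word into named fields, left to right."""
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = word[pos:pos + width]
        pos += width
    if pos != len(word):
        raise ValueError("control word must be 21 bits")
    return out


def enabled_registers(word):
    """Assumes Reg_en bit i (from the left) enables register R(i+1)."""
    reg_en = split_fields(word)["Reg_en"]
    return [f"R{i + 1}" for i, bit in enumerate(reg_en) if bit]


S1_WRITES = enabled_registers(S1)   # destinations of V1, V2, V4, V10
S2_WRITES = enabled_registers(S2)   # destinations of V3, V5, V9
```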
Final RTL
<1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
Topics to be covered
• Scheduling Possibilities
• Register and FU allocation and binding
• Datapath and Controller Synthesis
Scheduling Possibilities
Four alternative schedules of the same data dependency graph over control steps S1 to S4 (figures), with the resulting multiplier requirement:
• At least 3 Multipliers required
• At least 4 Multipliers required
• At least 3 Multipliers required
• At least 2 Multipliers required
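To see how the schedule drives the multiplier count, the sketch below computes two standard schedules, ASAP (as soon as possible) and ALAP (as late as possible), which the slides do not name. It assumes unit-latency operations and a four-step limit. Under these assumptions ASAP needs 4 multipliers and ALAP needs 2, the two extremes quoted above; whether the figures show exactly these schedules is not recoverable from the text.

```python
from collections import Counter

OPS = ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11"]
DEPS = {"V5": ["V1", "V2"], "V6": ["V3"], "V7": ["V5"],
        "V8": ["V6", "V7"], "V9": ["V4"], "V11": ["V10"]}
MULTS = {"V1", "V2", "V3", "V4", "V5", "V6"}


def asap(ops, deps):
    """Earliest step for each op; ops must be listed in dependency order."""
    step = {}
    for op in ops:
        step[op] = 1 + max((step[p] for p in deps.get(op, [])), default=0)
    return step


def alap(ops, deps, last_step):
    """Latest step for each op that still finishes by last_step."""
    succs = {}
    for op, preds in deps.items():
        for p in preds:
            succs.setdefault(p, []).append(op)
    step = {}
    for op in reversed(ops):
        step[op] = min((step[s] for s in succs.get(op, [])), default=last_step + 1) - 1
    return step


def multipliers_needed(step):
    """Number of multiplications in the busiest control step."""
    return max(Counter(step[op] for op in MULTS).values())


ASAP_MULTS = multipliers_needed(asap(OPS, DEPS))
ALAP_MULTS = multipliers_needed(alap(OPS, DEPS, 4))
```

ASAP crowds V1 to V4 into S1, while ALAP delays V3 to S2 and V4 to S3, spreading the six multiplications over three steps.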
Automation of register and FU allocation and binding
• How to automate register allocation and binding?
• How to automate FU allocation and binding?
• Map the problem to a graph colouring problem or a clique partitioning problem and solve it.
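As an illustration of the graph colouring view, the sketch below builds a conflict graph over the temporaries (an edge means two lifetimes overlap, so the two values cannot share a register) and colours it greedily; each colour stands for one register. The lifetimes are read off the four-step schedule of the example and, like the function names, are assumptions. Greedy colouring is only a heuristic on general graphs, although for interval lifetimes visited in order of start time it uses the minimum number of colours.

```python
import itertools

# Lifetime of each temporary: (producing step, step of last use).
LIFETIMES = {
    "t1": (1, 2), "t2": (1, 2), "t4": (1, 2),
    "t3": (2, 3), "t5": (2, 3),
    "t6": (3, 4), "t7": (3, 4),
}


def conflict_graph(lifetimes):
    """Edge between two variables whose lifetimes overlap."""
    names = sorted(lifetimes)
    edges = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (sa, ea), (sb, eb) = lifetimes[a], lifetimes[b]
            if sa < eb and sb < ea:
                edges[a].add(b)
                edges[b].add(a)
    return edges


def greedy_colouring(edges, order):
    """Give each node the smallest colour not used by an already coloured neighbour."""
    colour = {}
    for node in order:
        used = {colour[n] for n in edges[node] if n in colour}
        colour[node] = next(c for c in itertools.count(1) if c not in used)
    return colour


EDGES = conflict_graph(LIFETIMES)
ORDER = sorted(LIFETIMES, key=lambda v: LIFETIMES[v])   # by start step
COLOUR = greedy_colouring(EDGES, ORDER)
```

The same formulation applies to FU binding: operations of one type are the nodes, and two operations conflict when they are scheduled in the same control step.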
Data path and Controller Synthesis:
- Mux-based and bus-based architectures
- Various ways to optimize interconnections
Thank You