2. ASIC Timing: Role of CAD Tools
ASIC timing has deep interactions with logic and layout
synthesis.
[Figure: High-level description + Timing Specifications → Logic Synthesis → connected cells with delay constraints on signal paths → Layout Synthesis → placed cells with real locations, real connecting wires]
3. ASIC Timing: Role of CAD Tools
Requirements on timing analysis
Logic-side tools must estimate delays through
unplaced/unrouted logic.
Layout tools must estimate delays through placed/routed
logic.
[Figure: Logic Synthesis → Layout Synthesis]
4. Our Topics for ASIC Timing
Logic-side: Static Timing Analysis
How do we estimate the worst-case timing through a logic network?
It turns out to be a longest-path problem on a graph that properly models the gates and wires.
Layout-side: Interconnect Delay Analysis
We place the gates and route the wires. Then, how do we estimate wire delays?
The problem builds on an electrical circuit model. We will show key results.
5. Timing Analysis at the Logic Level
Goal: Verify the timing behavior of our logic design
Input:
A gate-level netlist.
Timing models of the gates and/or wires.
Output:
Signal arrival time at various points in the network.
Longest delays through gate network.
Does the netlist satisfy the timing requirement? If not, where
are key problems?
This is surprisingly complicated in the real world...
6. Analyzing Design Performance
Assume design is synchronous.
All storage is in explicit sequential elements, e.g., flip-flops.
Consequence: we can just focus on delays through combinational gates.
[Figure: launch flip-flops → combinational logic (no feedback loops) → capture flip-flops, all driven by the Clock]
7. Question: Can’t We Just Simulate Logic?
What does logic simulation do?
Determines how a system will behave by simulating the logical
function.
Gives the most accurate answer with good simulation models.
… but it is (practically) impossible to give a complete answer –
especially timing.
Requires examination of an exponential number of cases.
All possible input vectors …
With all possible relative timings …
Under all possible manufacturing variations …
We need a different, faster solution...
8. Timing Analysis: Basic Model
Assume we know the clock cycle
E.g., 1 GHz clock, cycle = 1 ns.
For logic to work correctly, the longest delay through the network must be shorter than the clock cycle.
[Figure: flip-flops → combinational logic → flip-flops; clock period 1 ns; longest delay < clock cycle]
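As a minimal sketch of this basic requirement, assuming delays in nanoseconds and the clock frequency in GHz (the helper names `cycle_time_ns` and `meets_timing` are hypothetical):

```python
def cycle_time_ns(freq_ghz: float) -> float:
    """Clock period in ns for a clock frequency in GHz (1 GHz -> 1 ns)."""
    return 1.0 / freq_ghz


def meets_timing(longest_delay_ns: float, freq_ghz: float) -> bool:
    """Basic model: the longest combinational delay must be shorter than the clock cycle."""
    return longest_delay_ns < cycle_time_ns(freq_ghz)


print(cycle_time_ns(1.0))      # 1.0
print(meets_timing(0.8, 1.0))  # True: 0.8 ns fits in the 1 ns cycle
print(meets_timing(1.2, 1.0))  # False: 1.2 ns exceeds the 1 ns cycle
```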
9. Timing Analysis: Gate Delay Models
First: we need a model of delay through each logic gate.
Delay of a single gate: ∆
What’s gate delay ∆?
[Figure: gate with input X and output Y; a change on X appears on Y after delay ∆]
14. In Reality: Gate Delay is Very Complex
Gate type affects delay
Waveform shape affects delay
Gate loading affects delay
Transition direction affects delay
[Figure: for each factor, two cases of a gate that differ only in that factor and have different delays (∆ ≠ ∆)]
15. In Reality: Gate Delay is Very Complex
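Gate input pin affects delay
Why? At the transistor level, inputs are not symmetric.
At nanoscale, delays are even statistical
Why? They depend on process, voltage, and temperature (PVT) variations.
[Figure: different delays from different input pins of one gate; probability density function (PDF) of ∆, with axis ticks at 200, 240, 280]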
16. Our Model: Pin-to-Pin Delay
In our lecture, we keep it simple: a fixed, pin-to-pin delay model
No slopes, transition directions, or distributions. Loading effects are “pushed” into the gate delay itself.
Per-pin delays are essential, but we will use just 1 value per gate, for simplicity.
It turns out this is enough to see all the interesting algorithmic ideas.
[Figure: gates annotated with fixed delays ∆=3 and ∆=5]
17. Do We Consider Logical Function?
Does logic function matter?
Try an example, where we “erase” gates.
In this example: PI = Primary Input, PO = Primary Output
What is the longest delay? 20
[Figure: network of “erased” gates with delays 8, 2, 1, 8, 1, 2, 1, driven by three PIs and feeding one PO]
18. Now, Suppose We Know Logic Gates
[Figure: the same network with the logic gates revealed and logic values 0/1 marked along the longest path]
Can we indeed have the longest path? No!
We cannot sensitize this path: no logic change at this input can propagate down this path to change this output.
19. Topological vs. Logical Timing Analysis
When we ignore logic, this is called Topological Analysis.
We work only with the graph and delays; we don’t consider logic.
We can get wrong answers: the path we found is called a false path.
Going forward: we ignore logic (too tough to deal with).
Assume that all paths are statically sensitizable.
This means: we can find a constant pattern of inputs on the other PIs that makes some output sensitive to some input.
Reminder: this is exactly the Boolean Difference concept of sensitivity.
This timing analysis has a name: Static Timing Analysis (STA).
20. STA Representation: Delay Graph
From gate-level network, we build a delay graph.
Vertices: Wires in gate network, one per gate output, also one
for each PI and PO.
Edges: Input pin to output pin of gate in network (one edge
per input pin). Put gate delays on edges.
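[Figure: gate network with PIs a, b, c, internal wire d, PO e, and gate delays ∆=4 and ∆=3; beside it, its delay graph on vertices a–e with edge weights 4, 4, 3, 3]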
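A minimal Python sketch of this construction follows. The `Gate` record and `build_delay_graph` are hypothetical names, and the netlist assumes one reading of the figure: a gate with ∆=4 driving d from a and b, and a gate with ∆=3 driving e from d and c.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    inputs: list   # names of the wires on this gate's input pins
    output: str    # name of the wire this gate drives
    delay: int     # one fixed delay used for every input pin (simplified model)


def build_delay_graph(gates):
    """Return delay-graph edges (u, v, delay): one edge per gate input pin,
    from the wire on that pin to the gate's output wire."""
    edges = []
    for g in gates:
        for w in g.inputs:
            edges.append((w, g.output, g.delay))
    return edges


# Assumed reading of the slide's network: a, b -> (delay 4) -> d;  d, c -> (delay 3) -> e
netlist = [Gate(["a", "b"], "d", 4), Gate(["d", "c"], "e", 3)]
print(build_delay_graph(netlist))
# [('a', 'd', 4), ('b', 'd', 4), ('d', 'e', 3), ('c', 'e', 3)]
```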
21. Delay Graph
Common convention: add source/sink nodes
Add one “source” (src) node that has a 0-weight edge to each PI.
Add one “sink” (snk) node that has a 0-weight edge from each PO.
Why do this?
Now the network has exactly 1 “entry” node and 1 “exit” node.
All the longest (or shortest) path questions have the same start/end nodes.
[Figure: the delay graph on a–e (edge weights 4, 4, 3, 3) with src and snk attached by four 0-weight edges]
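A sketch of the convention, using a plain list of (from, to, delay) edges; `add_source_sink` is a hypothetical helper, and the example graph is the a–e delay graph under the assumed reading a, b → d with delay 4 and d, c → e with delay 3.

```python
def add_source_sink(edges, pis, pos, src="src", snk="snk"):
    """Add one source with a 0-weight edge to each PI and one sink with a
    0-weight edge from each PO, so every path question shares start/end nodes."""
    return ([(src, pi, 0) for pi in pis]
            + list(edges)
            + [(po, snk, 0) for po in pos])


edges = [("a", "d", 4), ("b", "d", 4), ("d", "e", 3), ("c", "e", 3)]
graph = add_source_sink(edges, pis=["a", "b", "c"], pos=["e"])
print(graph[:3], graph[-1])
# [('src', 'a', 0), ('src', 'b', 0), ('src', 'c', 0)] ('e', 'snk', 0)
```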
22. Representation: Delay Graph
What about interconnect delay?
We can still use the delay graph: model each wire as a “special” gate that just has a delay.
[Figure: the same network with its wires modeled as delay elements w, x, y, z, q (delays 1 or 2), and the resulting delay graph from src to snk]
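One way to sketch this, assuming edges stored as (from, to, delay) tuples: the wire feeding a gate pin becomes its own vertex, reached with the wire’s delay before the gate’s delay is added. The helper `insert_wire`, the wire name `w`, and its delay of 2 are hypothetical, not read from the figure.

```python
def insert_wire(edges, u, v, wire, wire_delay):
    """Model the wire carrying net u into a gate pin as a 'special gate' that only
    has a delay: replace the gate edge u->v (delay d) by u->wire (wire_delay)
    followed by wire->v (d)."""
    result = []
    for x, y, d in edges:
        if (x, y) == (u, v):
            result.append((x, wire, wire_delay))  # interconnect delay first
            result.append((wire, y, d))           # then the original gate delay
        else:
            result.append((x, y, d))
    return result


edges = [("a", "d", 4), ("b", "d", 4), ("d", "e", 3), ("c", "e", 3)]
print(insert_wire(edges, "d", "e", "w", 2))
# [('a', 'd', 4), ('b', 'd', 4), ('d', 'w', 2), ('w', 'e', 3), ('c', 'e', 3)]
```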
23. Operations on Delay Graph
So how do we use delay graph to do timing analysis?
What we don’t do: try to enumerate all the source-to-sink paths.
Why not? The number of paths explodes exponentially, even for small graphs.
There’s a smarter answer: Node-oriented timing analysis
Find, for each node in delay graph, worst delay to the node
along any path.
[Figure: chain of nodes 0, 1, 2, …, n with two alternative routes between each consecutive pair]
How many paths from 0 to n? 2^n
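To see why the node-oriented approach wins, here is a small sketch that counts paths rather than enumerating them. It assumes the figure’s structure is a chain with two alternative edges between each consecutive pair of nodes (modeled as parallel edges); `count_paths` is a hypothetical helper. The count comes out as 2^n while the work is a single pass over the edges.

```python
def count_paths(num_nodes, edges):
    """Count paths from node 0 to node num_nodes-1 in a DAG whose nodes are
    numbered in topological order (every edge u -> v has u < v).
    Parallel edges are allowed. One pass over the edges, no path enumeration."""
    count = [0] * num_nodes
    count[0] = 1
    # Sorting by source node means all edges into u are handled before edges out of u.
    for u, v in sorted(edges):
        count[v] += count[u]
    return count[num_nodes - 1]


n = 20
ladder = [(k, k + 1) for k in range(n)] * 2  # two alternative edges per stage
print(len(ladder), count_paths(n + 1, ladder))  # 40 1048576
```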
24. Define Values on Nodes in Delay Graph
Arrival Time at a node (AT)
AT(n) = latest time the signal can become stable at node n
Think: longest path from source
Required Arrival Time at a node (RAT)
RAT(n) = latest time the signal is allowed to become stable at node n
Think: CycleTime minus the longest path from n to sink
[Figure: src → … → n → … → snk, with AT(n) set by paths from src into n and RAT(n) by paths from n to snk]
25. Define Values on Nodes in Delay Graph
Slack at node n: Slack(n) = RAT(n) − AT(n)
Amount of timing “margin” for the signal: positive is good, negative is bad.
Determined by the longest path through the node.
Amount by which the signal can be delayed at the node without increasing the longest path through the network.
With positive slack, we can increase the delay at the node (e.g., to reduce power or circuit area) without degrading overall performance.
[Figure: src → … → n → … → snk, with AT(n), RAT(n), and Slack(n) = RAT(n) − AT(n) marked]
26. Slack is Hugely Important in Timing Analysis
About slacks
Defined so that negative slack is always bad: it indicates a timing problem.
Measures “sensitivity” of network to this node’s delay.
Positive slack
Good: can change something at this node, and not hurt network’s
overall timing.
Example: make this node slower, maybe save some power, not hurt
timing.
Negative slack
Bad: there is a problem at this node; the more negative the slack, the bigger the problem.
Looking for a node to “fix” to help timing? These nodes are where to look first. These affect the critical paths the most.
27. How To Compute ATs? Recursively
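AT(n) = maximum delay to n:
$$AT(n) = \begin{cases} 0, & \text{if } n \text{ is the source} \\ \max_{p \in \mathrm{pred}(n)} \{AT(p) + \Delta(p,n)\}, & \text{otherwise} \end{cases}$$
[Figure: src → … → p → n → s → … → snk, with predecessor paths into n, successor paths out of n, and edge delay ∆(p,n)]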
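The recursion translates almost directly into code. This is a memoized sketch, assuming edges given as (from, to, delay) tuples; `arrival_times` is a hypothetical name, and the graph is the six-node example (source a, sink f) used on the topological sorting and path-reporting slides. Deep graphs would exceed Python’s recursion limit, which is one reason to prefer the topological-order version in practice.

```python
from collections import defaultdict


def arrival_times(edges, src):
    """AT(n) = 0 if n is the source, else max over predecessors p of AT(p) + delay(p, n).
    Recursive with memoization. Assumes a DAG in which every node except src has
    at least one predecessor (true once src/snk nodes are added)."""
    preds = defaultdict(list)
    nodes = {src}
    for u, v, d in edges:
        preds[v].append((u, d))
        nodes.update((u, v))
    memo = {}

    def at(n):
        if n not in memo:
            memo[n] = 0 if n == src else max(at(p) + d for p, d in preds[n])
        return memo[n]

    return {n: at(n) for n in sorted(nodes)}


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
print(arrival_times(edges, "a"))
# {'a': 0, 'b': 3, 'c': 4, 'd': 8, 'e': 14, 'f': 29}
```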
28. How To Compute ATs?
Big idea
If we know the longest path to each predecessor of n, it’s a
simple “Maximum” operation to compute the longest path to
n itself.
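[Figure: predecessors x, y, z of n, with AT(x) = 5, AT(y) = 10, AT(z) = 5 and edge delays ∆(x,n) = 7, ∆(y,n) = 1, ∆(z,n) = 5]
$$AT(n) = \max_{p \in \{x,y,z\}} \{AT(p) + \Delta(p,n)\} = \max\{5+7,\ 10+1,\ 5+5\} = 12$$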
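The same maximum, written out in a few lines of Python with the numbers from the slide (the dictionary names are illustrative):

```python
# ATs of the predecessors x, y, z and their edge delays into n, from the slide
at = {"x": 5, "y": 10, "z": 5}
delay_into_n = {"x": 7, "y": 1, "z": 5}

candidates = {p: at[p] + delay_into_n[p] for p in at}  # one candidate per predecessor
at_n = max(candidates.values())
print(candidates, at_n)  # {'x': 12, 'y': 11, 'z': 10} 12
```

Note that y, the predecessor with the latest arrival, does not set AT(n): its edge into n is too short.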
29. How To Compute RATs?
RAT(n): Latest time in cycle where n could change and signal
would still propagate to sink before end of cycle.
First, what is RAT(snk)?
How about internal node n?
[Figure: src → … → p → n → s → … → snk, with predecessor paths into n, successor paths out of n, and edge delay ∆(n,s)]
RAT(snk) = CycleTime
For an internal node n:
$$RAT(n) = \min_{s \in \mathrm{succ}(n)} \{RAT(s) - \Delta(n,s)\}$$
30. How To Compute RATs? Recursively
$$RAT(n) = \begin{cases} \text{CycleTime}, & \text{if } n \text{ is the sink} \\ \min_{s \in \mathrm{succ}(n)} \{RAT(s) - \Delta(n,s)\}, & \text{otherwise} \end{cases}$$
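Mirroring the AT recursion, here is a memoized sketch of the RAT recursion, assuming (from, to, delay) edge tuples and the six-node example graph with sink f and CycleTime = 29 (the value the later slack example uses); `required_arrival_times` is a hypothetical name.

```python
from collections import defaultdict


def required_arrival_times(edges, snk, cycle_time):
    """RAT(n) = CycleTime if n is the sink, else min over successors s of RAT(s) - delay(n, s).
    Recursive with memoization; assumes a DAG in which every node except snk
    has at least one successor."""
    succs = defaultdict(list)
    nodes = {snk}
    for u, v, d in edges:
        succs[u].append((v, d))
        nodes.update((u, v))
    memo = {}

    def rat(n):
        if n not in memo:
            memo[n] = cycle_time if n == snk else min(rat(s) - d for s, d in succs[n])
        return memo[n]

    return {n: rat(n) for n in sorted(nodes)}


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
print(required_arrival_times(edges, "f", cycle_time=29))
# {'a': 0, 'b': 3, 'c': 5, 'd': 23, 'e': 14, 'f': 29}
```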
31. ATs versus RATs: Look at Clock Cycle
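Why do the AT and RAT definitions differ?
$$AT(n) = \begin{cases} 0, & \text{if } n \text{ is the source} \\ \max_{p \in \mathrm{pred}(n)} \{AT(p) + \Delta(p,n)\}, & \text{otherwise} \end{cases}$$
$$RAT(n) = \begin{cases} \text{CycleTime}, & \text{if } n \text{ is the sink} \\ \min_{s \in \mathrm{succ}(n)} \{RAT(s) - \Delta(n,s)\}, & \text{otherwise} \end{cases}$$
[Figure: one clock cycle (CycleTime) from the launch edge to the capture edge, with AT(n) and RAT(n) marked]
AT: longest logic delay after the launch edge of the clock.
RAT: the capture edge of the clock minus the longest logic delay from n to the capture flip-flops.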
32. Negative Slack is BAD!
[Figure: one clock cycle from the launch edge to the capture edge, with AT(n) falling after RAT(n)]
Slack = RAT − AT is negative!
The signal arrives too late for the delay that remains from the node to the output, so it does not reach the flip-flop input before the capture edge of the clock.
33. Example
Suppose clock cycle is 12.
AT=longest path from source TO node.
RAT=(cycle time 12) – (longest path FROM node to sink).
Slack = RAT – AT
[Figure: example delay graph from src through nodes a–k to snk, with edge delays]
35. Compute RATs
Clock cycle is 12.
Compute RATs from snk back to src.
[Figure: the example delay graph annotated with the RAT of each node]
36. Compute Slack
Slack = RAT - AT
[Figure: the example delay graph annotated with the AT, RAT, and slack of each node]
37. Analyzing the Example
Worst (most negative) slack is -3.
Big results:
Your timing violation at sink = the worst slack value.
The worst slack appears along this entire worst path.
[Figure: the annotated example delay graph from slide 36]
38. Analyzing the Example
Look at those slacks
A negative slack at an output (PO) means a failed timing
requirement.
A negative slack on internal node n means there is a path from n
to some problem PO.
So, slacks are hugely useful!
Beyond just identifying the worst path, slacks tell us the problem gates on this path.
39. The Most Typical STA Problem
Answer this problem: What are all the too-slow paths that violate timing?
Most useful report:
Report paths in order, from slowest to fastest.
In other words: Enumerate these paths, in delay order.
[Figure: flip-flops → logic → flip-flops, driven by the clock]
40. What Do We Need?
Calculate all the ATs.
Calculate all the RATs.
Calculate all the Slacks.
… do all of this very efficiently: Delay graphs are huge!
… and enumerate the violating paths, in worst-delay order.
41. Computational Strategy
Topological sorting (“Topsorting”) the delay graph.
Sort the vertices in the delay graph into one single ordered list.
Essential property: if there is an edge from p to s, then p appears before s in sorted order.
Compute ATs by going forward through the sorted list.
Compute RATs by going backward through the sorted list.
[Figure: example delay graph with edges a→b (3), a→c (4), b→d (5), b→e (11), c→e (9), d→f (6), e→f (15)]
Legal topsorting orders:
a, b, c, d, e, f
a, b, d, c, e, f
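A sketch of one standard way to produce such an order, Kahn’s algorithm (repeatedly emit a node whose predecessors have all been emitted). The slides do not say which topological sort to use, and `topsort` is a hypothetical name; the edges are the example graph’s, as read from the figure.

```python
from collections import deque


def topsort(edges):
    """Kahn's algorithm: repeatedly emit a node with no remaining incoming edges.
    Guarantees that for every edge p -> s, p appears before s.
    Raises ValueError if the graph has a cycle (no legal order exists)."""
    succs, indeg = {}, {}
    for u, v, _ in edges:
        succs.setdefault(u, []).append(v)
        succs.setdefault(v, [])
        indeg.setdefault(u, 0)
        indeg[v] = indeg.get(v, 0) + 1
    ready = deque(n for n in indeg if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1          # one fewer unprocessed predecessor
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(indeg):
        raise ValueError("delay graph has a cycle")
    return order


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
print(topsort(edges))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

A cycle check comes for free: in a synchronous design the combinational delay graph should have no feedback loops, so a cycle signals a modeling error.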
42. Assume Have Topsort: Compute ATs
computeATs() {
  AT(src) = 0;
  foreach ( n in topsort order, skipping src ) {
    AT(n) = -∞;
    foreach ( node p in pred(n) )
      AT(n) = max( AT(n), AT(p) + ∆(p,n) );
  }
}
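A runnable Python sketch of computeATs() on the example graph from the topological sorting slide. It uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) for the sort; `compute_ats` is a hypothetical name.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+ standard library


def compute_ats(edges, src):
    """Forward pass of computeATs(): in topological order every predecessor's AT
    is already final when node n is processed."""
    preds = {}
    for u, v, d in edges:
        preds.setdefault(v, []).append((u, d))
        preds.setdefault(u, [])
    deps = {n: [p for p, _ in ps] for n, ps in preds.items()}
    at = {}
    for n in TopologicalSorter(deps).static_order():
        if n == src:
            at[n] = 0  # AT(src) = 0; must not be overwritten with -inf
            continue
        at[n] = -math.inf
        for p, d in preds[n]:
            at[n] = max(at[n], at[p] + d)
    return at


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
print(sorted(compute_ats(edges, "a").items()))
# [('a', 0), ('b', 3), ('c', 4), ('d', 8), ('e', 14), ('f', 29)]
```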
43. Compute RATs
Trick: pretend all edges are reversed, so they point from snk to src, and walk the graph backwards.
computeRATs() {
  RAT(snk) = CycleTime;
  foreach ( n in reverse topsort order, skipping snk ) {
    RAT(n) = +∞;
    foreach ( successor s in succ(n) )
      RAT(n) = min( RAT(n), RAT(s) - ∆(n,s) );
  }
}
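The backward pass as a Python sketch, again leaning on `graphlib.TopologicalSorter` (Python 3.9+) and assuming CycleTime = 29 for the example graph; `compute_rats` is a hypothetical name.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+ standard library


def compute_rats(edges, snk, cycle_time):
    """Backward pass of computeRATs(): walking the topological order in reverse
    means every successor's RAT is final when node n is processed."""
    preds, succs = {}, {}
    for u, v, d in edges:
        preds.setdefault(v, []).append(u)
        preds.setdefault(u, [])
        succs.setdefault(u, []).append((v, d))
        succs.setdefault(v, [])
    order = list(TopologicalSorter(preds).static_order())
    rat = {}
    for n in reversed(order):
        if n == snk:
            rat[n] = cycle_time  # RAT(snk) = CycleTime; must not be overwritten with +inf
            continue
        rat[n] = math.inf
        for s, d in succs[n]:
            rat[n] = min(rat[n], rat[s] - d)
    return rat


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
print(sorted(compute_rats(edges, "f", 29).items()))
# [('a', 0), ('b', 3), ('c', 5), ('d', 23), ('e', 14), ('f', 29)]
```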
44. Using Slack For Path Reporting
Useful slack property: all nodes on the longest path have the same, worst slack value.
Surprising result: slack lets us find the N worst paths, even though we did not trace them all.
Assume clock cycle = 29. Example graph edges: a→b 3, a→c 4, b→d 5, b→e 11, c→e 9, d→f 6, e→f 15.
Node   AT   RAT   Slack
a       0     0      0
b       3     3      0
c       4     5      1
d       8    23     15
e      14    14      0
f      29    29      0
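The table can be reproduced, and the slack property checked, with a short sketch that runs both passes and then delays node d by its slack. The helper `timing` is hypothetical, and delaying a node is modeled here as adding delay to every edge entering it.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library


def timing(edges, src, snk, cycle_time):
    """Return (AT, RAT, Slack) dicts for a DAG with one source and one sink."""
    preds, succs = {}, {}
    for u, v, d in edges:
        preds.setdefault(v, []).append((u, d))
        preds.setdefault(u, [])
        succs.setdefault(u, []).append((v, d))
        succs.setdefault(v, [])
    deps = {n: [p for p, _ in ps] for n, ps in preds.items()}
    order = list(TopologicalSorter(deps).static_order())
    at, rat = {}, {}
    for n in order:  # forward pass
        at[n] = 0 if n == src else max(at[p] + d for p, d in preds[n])
    for n in reversed(order):  # backward pass
        rat[n] = cycle_time if n == snk else min(rat[s] - d for s, d in succs[n])
    slack = {n: rat[n] - at[n] for n in order}
    return at, rat, slack


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
at, rat, slack = timing(edges, "a", "f", cycle_time=29)
print(sorted(slack.items()))
# [('a', 0), ('b', 0), ('c', 1), ('d', 15), ('e', 0), ('f', 0)]

# Slack(d) = 15: delaying the signal at d by 15 leaves the longest path at 29 ...
print(timing([(u, v, w + 15 if v == "d" else w) for u, v, w in edges], "a", "f", 29)[0]["f"])  # 29
# ... but one more unit makes it 30
print(timing([(u, v, w + 16 if v == "d" else w) for u, v, w in edges], "a", "f", 29)[0]["f"])  # 30
```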
45. N-Worst Path Reporting
We evolve partial paths; each partial path stores 3 things:
(Path itself, Delay of this path, Slack of the final node on path)
We store the partial paths in a min heap, which is indexed on
the Slack value.
Initially this heap contains only the source node.
Algorithm is quite simple (and just like maze routing!).
Expand: Pop partial path off the heap – it has the smallest (most
negative) slack.
Reach target? If its end node is the sink, print out the path.
Reach: Else add each successor node to make new partial paths,
push them back onto the heap, each with
(Path, Delay, Slack) labeled.
Repeat until N paths are reported – go pop next partial path.
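A minimal sketch of this best-first enumeration in Python with `heapq`. One assumption departs slightly from the slide: the heap key is RAT(last node) minus the partial path’s delay, i.e., the slack of the slowest possible completion of that partial path. This equals Slack(last node) when the partial path is the longest path to that node, and it keeps the reported order exact even when a shorter partial path reaches a zero-slack node. `n_worst_paths` is a hypothetical name.

```python
import heapq
import itertools
from graphlib import TopologicalSorter  # Python 3.9+ standard library


def n_worst_paths(edges, src, snk, cycle_time, n_paths):
    """Best-first enumeration of src-to-snk paths, worst (smallest slack) first.
    Heap entries: (key, tie-breaker, partial path, path delay), with the assumed
    key RAT(last node) - path delay. Every node except snk needs a successor."""
    succs, preds = {}, {}
    for u, v, d in edges:
        succs.setdefault(u, []).append((v, d))
        succs.setdefault(v, [])
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    rat = {}
    for n in reversed(list(TopologicalSorter(preds).static_order())):
        rat[n] = cycle_time if n == snk else min(rat[s] - d for s, d in succs[n])

    tie = itertools.count()  # keeps heapq from ever comparing two paths
    heap = [(rat[src], next(tie), [src], 0)]
    reported = []
    while heap and len(reported) < n_paths:
        key, _, path, delay = heapq.heappop(heap)  # smallest (most negative) key first
        last = path[-1]
        if last == snk:  # reached the sink: report this path
            reported.append((path, delay, key))
            continue
        for s, d in succs[last]:  # extend by every successor, push back
            new_delay = delay + d
            heapq.heappush(heap, (rat[s] - new_delay, next(tie), path + [s], new_delay))
    return reported


edges = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
for path, delay, slack in n_worst_paths(edges, "a", "f", cycle_time=29, n_paths=3):
    print("-".join(path), delay, slack)
# a-b-e-f 29 0
# a-c-e-f 28 1
# a-b-d-f 14 15
```

On the example graph this reports the same three paths, in the same order, as the walkthrough. The only visible difference is that the partial path a-c-e carries key 1 here instead of Slack(e) = 0. Reporting happens when a complete path is popped, which is what guarantees the delay order.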
46. Worst Case Path Reporting: Example
Min heap entries have the form (Path, Delay, Slack). Initially, the heap contains only the source node.
Min heap: (a,0,0)
Expand path a, reach b & c.
Min heap: (a-b,3,0), (a-c,4,1)
[Figure: example delay graph from source a to sink f, annotated with node slacks a = 0, b = 0, c = 1, d = 15, e = 0, f = 0]
47. Worst Case Path Reporting: Example
Expand path a-b, reach d & e.
Min heap before: (a-b,3,0), (a-c,4,1)
Min heap after: (a-b-e,14,0), (a-c,4,1), (a-b-d,8,15)
48. Worst Case Path Reporting: Example
Expand path a-b-e, reach f.
Min heap before: (a-b-e,14,0), (a-c,4,1), (a-b-d,8,15)
Min heap after: (a-c,4,1), (a-b-d,8,15)
f is the sink! Report the 1st worst path a-b-e-f, with delay = 29.
49. Worst Case Path Reporting: Example
Expand path a-c, reach e.
Min heap before: (a-c,4,1), (a-b-d,8,15)
Min heap after: (a-c-e,13,0), (a-b-d,8,15)
50. Worst Case Path Reporting: Example
Expand path a-c-e, reach f.
Min heap before: (a-c-e,13,0), (a-b-d,8,15)
Min heap after: (a-b-d,8,15)
f is the sink! Report the 2nd worst path a-c-e-f, with delay = 28.
51. Worst Case Path Reporting: Example
Expand path a-b-d, reach f.
Min heap before: (a-b-d,8,15)
Min heap after: (empty)
f is the sink! Report the 3rd worst path a-b-d-f, with delay = 14.
Done!
52. Worst Case Path Reporting: Example
We find three paths:
a-b-e-f, delay = 29
a-c-e-f, delay = 28
a-b-d-f, delay = 14
Note: there are only 3 possible paths from source to sink in this graph, so we found all of them, correctly in delay order!
53. Static Timing Analysis: Summary
STA is a very important step in design of complex ASICs.
It’s a critical “sign off” step, which means: you don’t get to
fabricate unless you pass.
Several big ideas
Gate level delay models matter, and can be pretty complex in
real world.
Logical ≠ topological path analysis (i.e., STA).
Build delay graph, calculate ATs, RATs, slacks recursively.
Concept of slack is big: lets us locate worst paths, and problem
gates on path.
A similar idea to maze routing lets us find worst paths in delay
order.
54. Static Timing Analysis: Aside
STA is a huge topic – several things we did not cover.
STA for sequential elements
How do we model flip flops and latches, so we can verify, e.g., that setup and
hold times are met? More tricks with delay graph.
Early mode versus late mode timing
Our development was only so-called late mode timing, where we care about
longest path. Early mode focuses on shortest paths, and is critical for more
advanced timing, e.g., with transparent latches.
Incremental STA
In practice, if you change 10,000 gates out of 1,000,000, you don’t want to redo the whole STA. Advanced methods can update the analysis incrementally.