Static Timing Analysis: Part 2
Amr Adel Mohammady
/amradelm
Introduction
• In part 1 we went through the basic principles needed to understand all VLSI timing checks. In this part we will go through some of the checks in detail
• The timing checks covered in this part are:
o Setup timing
o Hold timing
Setup Timing Analysis
Setup Time
1. At time $T = T_{launch\_edge}$, data A is launched from FF1 to FF2. The data needs to make it to FF2 before the next clock edge arrives at FF2 at time $T_{capture\_edge}$. The next clock edge will arrive after one clock period.
   $Launch = T_{launch\_edge}$
   $Capture = T_{capture\_edge}$
2. The clock takes some time to reach FF1 due to the buffers. The launch won't happen exactly at $T = T_{launch\_edge}$ but after the delay/latency of the clock buffers.
   $Launch = T_{launch\_edge} + T_{launch\_latency}$
   $Capture = T_{capture\_edge}$
3. As we saw in part 1, once the clock reaches the FF it takes some time to push the data out to the Q pin. We called this time $T_{cq}$. This is the 1st delay data A encounters on its way to FF2.
   $Launch = T_{launch\_edge} + T_{launch\_latency}$
   $Delay = T_{cq}$
   $Capture = T_{capture\_edge}$
[Figure: timing diagrams for steps 1 to 3, marking $T_{launch\_edge}$, $T_{launch\_latency}$, $T_{cq}$ and $T_{capture\_edge}$]
4. Data A will propagate through the combinational path to reach FF2. This is the 2nd delay it encounters.
   $Launch = T_{launch\_edge} + T_{launch\_latency}$
   $Delay = T_{cq} + T_{comb}$
   $Capture = T_{capture\_edge}$
5. As we saw in part 1, the FF requires the data to arrive some time before the clock edge in order to avoid metastability. We called this time $T_{setup}$. Hence, we shouldn't capture data at $T_{capture\_edge}$ but at $T_{capture\_edge} - T_{setup}$.
   $Launch = T_{launch\_edge} + T_{launch\_latency}$
   $Delay = T_{cq} + T_{comb}$
   $Capture = T_{capture\_edge} - T_{setup}$
6. The clock takes some time to reach FF2 due to the buffers. The capture won't happen exactly at $T_{capture\_edge} - T_{setup}$ but after the delay/latency of the clock buffers.
   $Launch = T_{launch\_edge} + T_{launch\_latency}$
   $Delay = T_{cq} + T_{comb}$
   $Capture = T_{capture\_edge} - T_{setup} + T_{capture\_latency}$
[Figure: timing diagrams for steps 4 to 6, marking $T_{launch\_edge}$, $T_{launch\_latency}$, $T_{cq}$, $T_{comb}$, $T_{setup}$, $T_{capture\_latency}$ and $T_{capture\_edge}$]
7. To make sure a setup violation doesn't happen, data A must arrive at FF2 before the required capture time:
   $Launch + Delay \le Capture$
   $Arrival \le Required$
   $T_{launch\_edge} + T_{launch\_latency} + T_{cq} + T_{comb} \le T_{capture\_edge} - T_{setup} + T_{capture\_latency}$
   The left-hand side is the time at which data arrives at FF2. The right-hand side is the time before which data is required to arrive at FF2.
The difference between the required and arrival times is called the slack ($Slack = Required - Arrival$). If the slack is positive we pass setup and if it is negative we fail.
The launch FF is called the startpoint of the timing path and the capture FF is called the endpoint.
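The arrival/required/slack bookkeeping above can be sketched in a few lines of Python. This is only an illustration: the class name `SetupPath` and all the picosecond values are made up, and a real STA tool adds many effects (derating, uncertainty, etc.) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SetupPath:
    """One register-to-register path. All times are in ps (hypothetical values)."""
    t_launch_edge: int
    t_launch_latency: int
    t_cq: int
    t_comb: int
    t_capture_edge: int
    t_capture_latency: int
    t_setup: int

    def arrival(self) -> int:
        # Launch + Delay
        return self.t_launch_edge + self.t_launch_latency + self.t_cq + self.t_comb

    def required(self) -> int:
        # Capture
        return self.t_capture_edge - self.t_setup + self.t_capture_latency

    def slack(self) -> int:
        # Positive slack passes setup, negative slack fails
        return self.required() - self.arrival()


passing = SetupPath(0, 200, 100, 600, 1000, 250, 50)
failing = SetupPath(0, 200, 100, 1000, 1000, 250, 50)
print(passing.slack())  # 300
print(failing.slack())  # -100
```

The only difference between the two paths is the combinational delay, which is the term most of the fixes discussed later try to reduce.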
Example Timing Report
[Figure: example PrimeTime setup timing report, annotated with $T_{launch\_edge}$, $T_{launch\_latency}$, $T_{cq}$ and $T_{comb}$ on the launch (delay) side and $T_{capture\_edge}$, $T_{capture\_latency}$ and $T_{setup}$ on the capture side]
Reference : AN 554: How to Read HardCopy PrimeTime Timing Reports, by Intel
Setup Time
• The example we have shown is for a full cycle path where the 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 comes one clock cycle after 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒.
• This is not always the case. The capture edge could come half a cycle later, multiple cycles later, or from another clock.
o Half cycle paths occur when the launch and capture FFs use different clock edges
o Multi cycle paths occur when the first capture edge is masked by a control circuit and a later edge is used.
o Multi clock paths occur when the launch and capture FFs use different clocks. The diagram shows that there can be more than one launch/capture edge combination. The STA tools will consider the worst case (the purple one) [1]
• Everything we learned still applies. We just plug different values for the clock edges into the setup equation.
• We will now discuss how to fix a setup violation
[Figure: three panels, Half Cycle Path, Multi Cycle Path (the first capture edge is masked with control logic) and Multi Clock Path, each marking $T_{launch\_edge}$ and $T_{capture\_edge}$]
[1] : The phase difference between the two clocks must be known in order to know exactly where the launch and capture edges are. If not, we can't run STA on such paths and we have to resort to clock domain crossing techniques.
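As a rough sketch of how the capture edge changes between these cases, the Python below computes the capture edge for full, half and multi cycle paths, and the tightest launch-to-capture separation between two clocks. The function names are made up for illustration, and the code assumes a 50% duty cycle for the half cycle case and two clocks that both have a rising edge at time 0 (i.e. a known phase relationship).

```python
from math import gcd


def capture_edge(t_launch_edge, period, cycles=1):
    """Capture edge for a path that is `cycles` clock periods long."""
    return t_launch_edge + cycles * period


def worst_setup_relation(launch_period, capture_period):
    """Smallest launch-to-capture separation over one common period.

    Assumes integer periods and that both clocks rise together at time 0.
    """
    common = launch_period * capture_period // gcd(launch_period, capture_period)
    separations = []
    for launch in range(0, common, launch_period):
        # First capture edge strictly after this launch edge
        capture = (launch // capture_period + 1) * capture_period
        separations.append(capture - launch)
    return min(separations)


PERIOD = 1000  # ps, hypothetical
print(capture_edge(0, PERIOD))       # full cycle: 1000
print(capture_edge(0, PERIOD, 0.5))  # half cycle: 500.0
print(capture_edge(0, PERIOD, 2))    # two-cycle path: 2000
print(worst_setup_relation(4, 6))    # 2
```

In the last call the launch edges fall at 0, 4 and 8 and the capture edges at 6 and 12, so the tightest combination is launch at 4 and capture at 6.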
Overview of The Digital VLSI Flow
• Before we discuss how to fix a setup violation we need a quick overview of the digital design flow [1].
• Specifications: The design process starts with the requirements to build the system (functionality, performance, power consumption, cost, etc.)
• Architecture : Based on the required specs, the architecture team will start building the system. They will answer questions and make
decisions such as: What blocks are needed in the system to perform the functionality? How to implement these blocks as a digital circuit?
Do we need memory or not? What is its size? What operating voltage do we use? What clock frequency do we need to meet the
performance specs? What is the expected area of the chip and fabrication cost?
• RTL Design : The RTL team will start writing RTL code to implement the architecture and blocks of the system
• Simulation : The implemented design is tested through simulations to make sure it does the required function correctly
• Synthesis : The RTL code is translated into actual logic gates and digital blocks
• PNR : The place and route step involves several sub steps
o Floorplan : Involves allocating space on the chip for various blocks and modules, including the placement of macros, and I/O ports
o Power Grid : Creating the metal structure that delivers the power supply to the standard cells and blocks inside the chip
o Placement : Placing the cells inside the chip
o Clock Tree Synthesis : Creating the clock networks to deliver the clocks from the ports to the registers in the chip
o Routing : Routing the metal interconnects (wires) between the cells
o Timing Closure : Running STA on the chip to make sure it meets the timing requirements
o DRC/LVS/EMIR : DRC ensures the final layout complies with the manufacturing rules. LVS ensures the final layout performs the same function as the schematic/logic description. EMIR ensures all cells get the required voltage without excessive drop and that the current flowing through the wires is within the required limits
[1] : This is a very simplified view of the digital flow. There are more steps involved but we don't mention them because they won't affect STA
How to Fix a Setup Violation – Overview of The Digital VLSI Flow
• PNR :
o The PNR engineer starts the flow trying to meet the requirements with the help of automation tools
o The goal is to reach a good startpoint with good results before the manual work starts
o Once the manual work starts, the startpoint is saved and frozen. The PNR flow is said to be in ECO mode (Engineering Change Order)
o The manual work involves things like moving cells, changing their threshold voltage, manually routing wires, etc.
• Fixing timing:
o Each of the above flow steps involves several optimizations to enhance the timing and fix the violations
o The earlier steps solve larger timing violations that are difficult and sometimes impossible to fix in later stages
o As we go through the flow, the ability to fix large violations decreases and we focus more on fixing small but tricky violations that involve lots of manual work.
• We will now go through some of the ways to fix a setup violation. We will start with the solutions done in the early stages and go down till
we see what can be done in later and final stages.
How to Fix a Setup Violation – Sol. 1
Reducing the Clock Frequency
• The easiest and simplest solution is to reduce the frequency (increase the period) of the clock, which moves the capture edge later
• Doing this degrades the performance (data rate / CPU speed / operations per second / etc.)
• The decision to reduce the clock frequency is left to the architecture team and can't be made individually by RTL or PNR engineers
• Sometimes this solution is not acceptable because the product standard requires a specific data rate that needs to be met
How to Fix a Setup Violation – Sol. 2
Going To a Smaller Technology Node
• In part 1 we showed how the transistor length (tech node) affects the gate delay. A shorter length gives a smaller delay
• Going to a smaller tech node means higher fabrication cost and a longer design cycle: on-chip variation (OCV) and the physical design rules (DRC) are more challenging to handle at smaller nodes, and preparing the design files (standard cell libraries, etc.) for the new tech node takes time.
• Because of this, the target tech node is decided very early in the design process by running experiments with the tech node to see whether the target frequency is feasible
• These experiments could be :
o Quick hand calculations : By considering the average cell delay in the tech node and the average combinational path length. For example, $T_{avg} = 5\,ns$ and the average number of cells in a timing path is 20. So, on average, the combinational delay is $5 \times 20 = 100\,ns$, meaning a maximum clock frequency of $\frac{1}{100 \times 10^{-9}} = 10$ MHz.
This is, of course, a very rough estimate as it doesn't include the effects of wire delay, clock latencies, etc. But the more effort you put into these calculations the more accurate they get
o Doing a quick project : By synthesizing a small block or a previous project to get an estimate of the maximum clock frequency you can achieve on this
tech node
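The quick hand calculation above can be written as a small Python helper. This is a back-of-the-envelope sketch only: the function name is made up, and like the hand calculation it ignores wire delay, clock latencies, $T_{cq}$ and $T_{setup}$.

```python
def estimate_fmax_hz(avg_cell_delay_s, cells_per_path):
    """Very rough maximum clock frequency from the average cell delay and path depth."""
    t_comb = avg_cell_delay_s * cells_per_path
    return 1.0 / t_comb


# Numbers from the example above: 5 ns average cell delay, 20 cells per path
f_max = estimate_fmax_hz(5e-9, 20)
print(f"{f_max / 1e6:.1f} MHz")  # roughly 10 MHz
```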
How to Fix a Setup Violation – Sol. 3
Increasing the Supply Voltage
• In part 1 we showed how the supply voltage affects the gate delay. A higher voltage gives a smaller delay. However, the dynamic power consumption increases quadratically with the supply voltage.
• The higher voltage can be applied only to the parts of the chip that need high performance, leaving the other parts at the lower voltage to avoid higher power consumption. However, this adds several difficulties to the ASIC design process
$$t_{prop} = \frac{0.69\, V_{DD}\, C_L}{\frac{W}{2L}\, \mu C_{ox}\, (V_{DD} - V_{th})^2}$$
$$Power_{dynamic} = \alpha f C_L V_{DD}^2$$
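To get a feel for the trade-off in the two formulas above, the sketch below compares the relative delay and relative dynamic power at two supply voltages. Only the $V_{DD}$-dependent terms are kept (everything else cancels in the ratio), and the voltages 0.8 V, 1.0 V and the threshold 0.3 V are made-up illustrative values, not data from any real library.

```python
def relative_delay(vdd, vth):
    # t_prop is proportional to V_DD / (V_DD - V_th)^2 when everything else is fixed
    return vdd / (vdd - vth) ** 2


def relative_dynamic_power(vdd):
    # Dynamic power is proportional to V_DD^2 when alpha, f and C_L are fixed
    return vdd ** 2


VTH = 0.3                # volts, hypothetical
LOW, HIGH = 0.8, 1.0     # volts, hypothetical

delay_ratio = relative_delay(HIGH, VTH) / relative_delay(LOW, VTH)
power_ratio = relative_dynamic_power(HIGH) / relative_dynamic_power(LOW)
print(f"delay scales by {delay_ratio:.2f}, dynamic power by {power_ratio:.2f}")
```

With these numbers the delay drops to roughly two thirds of its original value while the dynamic power grows by more than half, which is why raising the voltage is usually limited to the parts of the chip that need it.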
How to Fix a Setup Violation – Sol. 4
Changing the Architecture
• Digital blocks have a tradeoff between speed on one side and power and area on the other. The designer might choose an implementation that consumes more power or has a larger area but achieves higher speed.
• For example, there are different ways to implement binary adders. One implementation is the ripple adder which has small area and power consumption but has
high 𝑇𝑐𝑜𝑚𝑏, while a carry-look-ahead (CLA) adder has smaller 𝑇𝑐𝑜𝑚𝑏 but takes larger area.
[Figure: ripple adder with $T_{comb} = 700\,ps$ and $Area = 75\,\mu m^2$ vs carry-look-ahead adder with $T_{comb} = 400\,ps$ and $Area = 130\,\mu m^2$]
Reference : Kamanga, Isaack. Design Optimization of the 64-Bit Carry Look-Ahead Adder Based on FPGA and Verilog HDL
How to Fix a Setup Violation – Sol. 5
Optimizing the RTL Code
• The way the RTL is written affects the structure of the logic gates
• The example below shows 2 circuits that perform the same function. The one on the left connects the adders in a chain, resulting in a delay of 3 adders in series, while the one on the right arranges them as a parallel tree and only has a delay of 2 adders in series
[Figure: two adder structures where each adder has $T_{comb} = 100\,ps$; the chain has a total $T_{comb} = 300\,ps$ and the tree has a total $T_{comb} = 200\,ps$]
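The chain versus tree comparison generalizes beyond four operands. The Python sketch below is illustrative only: it counts adders on the longest path for each structure, assuming every adder has the same 100 ps delay as in the figure and ignoring wire delay. The function names are made up.

```python
from math import ceil, log2

T_ADD_PS = 100  # delay of one adder, as in the example above


def chain_depth(n_operands):
    """Adders in series when operands are accumulated one after another."""
    return n_operands - 1


def tree_depth(n_operands):
    """Adders in series when operands are summed pairwise in a balanced tree."""
    return ceil(log2(n_operands))


# Both structures compute the same result
a, b, c, d = 1, 2, 3, 4
assert ((a + b) + c) + d == (a + b) + (c + d)

for n in (4, 8):
    print(n, chain_depth(n) * T_ADD_PS, tree_depth(n) * T_ADD_PS)
# 4 operands: 300 ps vs 200 ps
# 8 operands: 700 ps vs 300 ps
```

In RTL the same idea usually comes down to how the expression is parenthesized, although synthesis tools may restructure the arithmetic on their own.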
How to Fix a Setup Violation – Sol. 6
Pipelining
• The most common way to fix setup in RTL design is to add pipeline registers.
• The idea of pipelining is to split a large 𝑇𝑐𝑜𝑚𝑏 into multiple clock cycles.
• For example, to implement the equation 𝐴 + 𝐵 ∗ 𝐶, one can do all the operations in one cycle or do the multiplication in one cycle then the addition in the next
cycle as shown in the diagram
• The disadvantages of pipelining are:
o More area due to the pipeline registers
o More latency. Instead of finishing the operation in one cycle we finish it in multiple cycles.
o Synchronization. Since the data is delayed by the pipeline registers, the downstream logic that receives the data has to account for this delay. Notice also how we needed to add a pipeline register on A as well to synchronize 𝐴1 with 𝐵1 ∗ 𝐶1, otherwise we would have added 𝐴2 from the next sample to 𝐵1 ∗ 𝐶1
[Figure: Without Pipelining, the path delay is $T_{add} + T_{mul} = 100 + 300 = 400$; With Pipelining, the multiplier stage has $T_{mul} = 300$ and the adder stage has $T_{add} = 100$]
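The synchronization issue can be seen in a small cycle-by-cycle model. The Python below is a behavioral sketch, not RTL: `pipelined_mac` and its `delay_a` flag are made-up names, and the model has one pipeline register after the multiplier and, optionally, one on A.

```python
def pipelined_mac(samples, delay_a=True):
    """Cycle-by-cycle model of A + B*C with a pipeline register after the multiplier.

    If delay_a is False, A is NOT registered, so the adder sees the A of the
    next sample (the synchronization bug described above).
    """
    prod_reg = None  # holds B*C computed in the previous cycle
    a_reg = None     # holds A captured in the previous cycle
    outputs = []
    # One extra cycle with no input to flush the pipeline
    for sample in list(samples) + [None]:
        # 2nd stage: the adder uses what the registers captured last cycle
        if prod_reg is not None:
            if delay_a:
                a_val = a_reg
            else:
                a_val = sample[0] if sample is not None else None
            outputs.append(None if a_val is None else a_val + prod_reg)
        # 1st stage: the clock edge updates the pipeline registers
        if sample is not None:
            a, b, c = sample
            prod_reg, a_reg = b * c, a
        else:
            prod_reg, a_reg = None, None
    return outputs


samples = [(1, 2, 3), (10, 4, 5)]             # (A, B, C) per cycle
print(pipelined_mac(samples))                 # [7, 30]
print(pipelined_mac(samples, delay_a=False))  # [16, None]
```

Without the register on A, the first result is A2 + B1*C1 = 10 + 6 = 16 instead of 7, and the last product has no valid A left to be added to.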
How to Fix a Setup Violation – Sol. 7
Multi Cycle Path (MCP)
• This method is similar to pipelining in that we let the combinational path finish in multiple cycles.
• The difference is that we won't add pipeline registers. Instead, we will capture the data at a later capture clock edge
• This can be done in 2 ways [1]:
o Use a control circuit to mask the 1st capture edge and allow another one.
o Use a divided clock for the capture FF as shown in the diagram below
[Figure: Single Cycle vs Multi Cycle Path, showing the launch clock and the capture clock; in the multi cycle case the first capture edge is masked with control logic]
[1] : You need to inform the STA tool that you will mask the 1st edge, since the tool has no knowledge of the functionality of the circuit. This is done using the “set_multicycle_path” command: https://docs.amd.com/r/2021.2-English/ug903-vivado-using-constraints/Multicycle-Paths
How to Fix a Setup Violation – Sol. 7
Multi Cycle Path vs Pipelining
• At first it might appear that multi cycle path and pipelining are the same. But a closer look shows a big difference
• In the case of pipelining:
o In the 1st cycle A,B,C enter the 1st stage of the pipeline. In the 2nd cycle A,B,C enter the 2nd stage while a new sample enters the 1st stage of the pipeline
o We receive an output every clock cycle and the added latency due to the pipeline registers affects us at the beginning only
• In the case of MCP:
o In the 1st cycle A,B,C enter the circuit. In the 2nd cycle, the circuit is still busy and we can't insert a new sample until it finishes.
o We receive an output every 2 clock cycles
• This shows that pipelining fixes setup while keeping a high processing speed, whereas MCP slows down the processing speed
• You can think of MCP as reducing the clock frequency but selectively in parts of the circuit and not on the entire circuit
[Figure: Pipelining vs Multi Cycle Path, showing where the samples are during the 1st, 2nd and 3rd cycles]
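The throughput difference can be put into numbers with a short sketch. The two helper functions below are made up for illustration and assume a 2-stage pipeline and a 2-cycle multicycle path, as in the example above.

```python
def cycles_pipelined(n_samples, stages=2):
    """Cycles to get n results out of a pipeline: fill latency once, then 1 per cycle."""
    return stages + (n_samples - 1)


def cycles_multicycle(n_samples, cycles_per_op=2):
    """Cycles to get n results when each sample occupies the circuit for several cycles."""
    return cycles_per_op * n_samples


for n in (1, 100):
    print(n, cycles_pipelined(n), cycles_multicycle(n))
# 1 sample:    2 cycles vs 2 cycles
# 100 samples: 101 cycles vs 200 cycles
```

For a single sample both approaches take the same time. The pipeline only pays its latency once, so the gap grows with the number of samples.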
How to Fix a Setup Violation – Sol. 8
Retiming
• In this method, if 𝑇𝑐𝑜𝑚𝑏 is too large to fit in the clock cycle, we split the logic and move part of it to another cycle.
• Consider the example below:
o The red and green logic combined make a 𝑇𝑐𝑜𝑚𝑏 of 700 ps, which causes a setup violation.
o We move the green logic to the next clock cycle to be combined with the blue logic.
o This reduces 𝑇𝑐𝑜𝑚𝑏 between FF1 and FF2 from 700 ps to 500 ps, which passes setup.
o It also increases 𝑇𝑐𝑜𝑚𝑏 between FF2 and FF3 from 100 ps to 300 ps, but this is okay because that path still passes setup. If the blue logic were large, this method wouldn't work
[Figure: before retiming, FF1 to FF2 contains the red (500 ps) and green (200 ps) logic for a total of 700 ps, and FF2 to FF3 contains the blue logic (100 ps); after retiming, FF1 to FF2 is 500 ps and FF2 to FF3 is 200 ps + 100 ps = 300 ps]
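The retiming example can be checked with a few lines of Python. The 600 ps budget is an assumption chosen for illustration (the slides only tell us that 700 ps fails and 500 ps passes), and `worst_stage` is a made-up helper name.

```python
BUDGET_PS = 600  # hypothetical time available for T_comb in one cycle


def worst_stage(stages):
    """Largest combinational delay among the register-to-register stages."""
    return max(sum(blocks) for blocks in stages)


red, green, blue = 500, 200, 100

before = [[red, green], [blue]]   # FF1->FF2, FF2->FF3
after = [[red], [green, blue]]    # green logic moved past FF2

print(worst_stage(before), worst_stage(before) <= BUDGET_PS)  # 700 False
print(worst_stage(after), worst_stage(after) <= BUDGET_PS)    # 500 True
```

The total amount of logic is unchanged. Retiming only helps when the neighboring stage has enough margin to absorb the logic that is moved into it.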
How to Fix a Setup Violation – Sol. 8
Retiming
• Retiming can be done manually by the RTL designer or automatically by the synthesis tools
o In the example below, the purple logic takes A and B as inputs. If we move the green logic to the next cycle, B arrives one cycle later than expected. By then, 𝑨𝟏 is gone and a new 𝑨𝟐 has arrived, which gets computed with sample 𝑩𝟏. This breaks the functionality of the circuit
o Synthesis tools will avoid any retiming that breaks the functionality the way this example does.
o The RTL designer has full control over the code, so they can fix this issue by, for example, adding a pipeline register before the purple logic to delay A by one cycle, and then handling any new issues that appear due to this added register
o Hence, the RTL designer can do more aggressive retiming than the synthesis tools, but with extra effort.
[Figure: in the 1st cycle the inputs are 𝑨𝟏 and 𝑩𝟏; in the 2nd cycle they are 𝑨𝟐 and 𝑩𝟏]
How to Fix a Setup Violation – Sol. 8
Retiming + Pipelining
• The previous example shows how retiming can be combined with pipelining.
• Let's consider the same example of 𝑨 + 𝑩 ∗ 𝑪
o We can move the adder to the next clock cycle if there is margin there.
o However, we get the same issue as in the previous slide: A is not synchronized with B*C. So we add a pipeline register on A.
o This way we fix the setup violation and save the area of the 𝐵 ∗ 𝐶 pipeline registers
[Figure: Pipelining vs Pipelining + Retiming]
How to Fix a Setup Violation – Sol. 9
Optimizing Synthesis
• Synthesis tools have lots of features and switches that the engineer can use to enhance the timing and control the trade-offs between the PPA metrics.
• This topic is very large and needs a tutorial on its own, so we will demonstrate just a few of what can be done.
o Increase the timing effort : Most synthesis tools have switches that control the effort the tool puts into a certain PPA metric or a certain optimization. Higher effort leads to better optimization but longer runtime, while lower effort leads to less optimization but shorter runtime.
o Decrease or disable area and power efforts : Area and power optimizations usually degrade the timing of the circuit. Reducing the effort of these optimizations or disabling them altogether may enhance the timing but worsen the area and power of your chip
o Enable Flattening : The RTL code consists of several modules connected to each other. By default synthesis tools will synthesize each module separately and then connect them together in the top module, thus preserving the hierarchy and boundaries between the modules. Another approach is to remove the module boundaries and put all cells in one hierarchy. This is called flattening and generally produces better timing results [1]
[Figure: No Flattening vs With Flattening]
[1] : Flattening makes verification more difficult because the module boundaries are removed, which makes tracing signals and referencing cells more difficult.
How to Fix a Setup Violation – Sol. 10
Applying False Paths in the Constraints
• False paths are timing paths that can’t possibly occur due to the logic of the circuit
• Consider the example below:
• Both muxes have the same select signal. This means we have 2 possible timing paths: the one going through both red logics (200 + 300 = 500𝑝𝑠) and the one going through both blue logics (100 + 500 = 600𝑝𝑠)
• The paths going through a red logic then a blue logic (200 + 500 = 700𝑝𝑠) or a blue logic then a red logic (100 + 300 = 400𝑝𝑠) can never happen.
• Unless we instruct the tool to ignore these false paths, they will be considered for timing analysis, and the large 𝑇𝑐𝑜𝑚𝑏 of the red to blue path will be reported as a setup violation.
[Figure: two muxes driven by the same select signal sel; the first stage logic is 200 ps (red) or 100 ps (blue) and the second stage logic is 300 ps (red) or 500 ps (blue); the possible paths are highlighted]
• If we don't apply correct constraints on these paths, not only do we get fake setup violations, but we also hinder the ability of the synthesis and PnR tools to optimize the real violating timing paths. The tools apply their most aggressive optimizations to the worst, most critical paths and only consider the less critical paths once the most critical ones are solved.
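The difference between what the tool sees structurally and what can really happen can be enumerated in a few lines. This is an illustrative sketch: the delays come from the example above, while the mapping of `sel = 0` to the red logic and `sel = 1` to the blue logic is an assumption made only for the code.

```python
from itertools import product

# Delay in ps of the logic selected by each mux input (sel value -> delay)
STAGE1 = {0: 200, 1: 100}  # red, blue
STAGE2 = {0: 300, 1: 500}  # red, blue

# What a tool sees without constraints: every mux input combination
structural = {
    (s1, s2): STAGE1[s1] + STAGE2[s2] for s1, s2 in product((0, 1), repeat=2)
}

# What can really happen: both muxes share the same select signal
real = {sels: delay for sels, delay in structural.items() if sels[0] == sels[1]}

print(max(structural.values()))  # 700 (red then blue, a false path)
print(max(real.values()))        # 600 (blue then blue)
```

Without a false path constraint the path would be timed against 700 ps even though the worst path that can actually occur is 600 ps.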
How to Fix a Setup Violation – Sol. 11
Optimizing the Floorplan
• Floorplanning is the 1st step in the PNR flow and involves things like defining the chip size and boundaries, manually placing the major blocks (analog, SRAM, etc) in the chip, and placing the chip ports
• Here are some of the things that affect setup in the circuit
o A small chip area brings the cells closer to each other and closer to the ports, which in turn reduces the wire delays. However, if the size is too small several issues will appear such as big voltage drop, cell congestion, routing detours, crosstalk, etc [1].
o The placement of the major blocks in the chip affects the timing. The example on the left shows how placing the SRAMs near the IO ports might block the standard cells from being placed near their relevant ports. Not only that, but the SRAMs will also block the routing, resulting in longer wires that have to go around them.
o The placement of the ports also affects the timing. The example on the right shows how a bad placement of the ports can lead to long wire delays and buffering, which will worsen 𝑇𝑐𝑜𝑚𝑏
[Figure: Block Placement (left) and Port Placement (right)]
[1] : We won't discuss these issues because they are out of the scope of this document. You are advised to research these topics to get a better understanding of the slides
How to Fix a Setup Violation – Sol. 12
Optimizing the Wire Delay
• In part 1 we showed how a signal propagating through an RC circuit will have a delay proportional to the resistance and the capacitance. Hence, to reduce this delay we need to reduce the resistance and capacitance of the wire.
• This will also decrease the load cap of the cell that drives the wire, which will speed up the cell too.
• Reducing the resistance $R = \frac{\rho L}{A}$ :
1. Reducing the length $L$ of the wire will reduce the delay. We showed some examples of how to reduce it using a better floorplan.
2. Increasing the width will decrease the delay. Higher metal layers have a larger default width and also a larger thickness, hence a larger cross-section area $A$. PNR tools will use these higher layers for long and critical nets to reduce their delay. The PNR engineer can manually move the wires to higher layers during ECO or apply non-default routing rules (NDR) on these nets to make the router route them in higher layers
• Reducing the capacitance $C = \frac{\epsilon A}{d}$ :
1. Increasing the spacing $d$ by moving the two wires away from each other will reduce the capacitance between them. We can apply NDR on specific nets to tell the router that no other nets should be routed very close to these nets
2. Reducing the common distance. When two wires run alongside each other for a long distance, the common area $A$ will be large, leading to a larger capacitance. We can move one of the two wires to another layer to reduce the delay
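The two formulas can be tried out numerically. The sketch below uses the simple parallel-plate approximation from the text, with the cross-section taken as width times thickness for the resistance and the facing sidewall area taken as common length times thickness for the coupling capacitance. The dimensions and the relative permittivity of 3 are made-up illustrative values; real extraction tools use far more detailed models.

```python
from math import isclose

RHO_CU = 1.7e-8           # ohm*m, approximate resistivity of copper
EPS = 3.0 * 8.854e-12     # F/m, hypothetical dielectric with relative permittivity 3


def wire_resistance(rho, length, width, thickness):
    """R = rho * L / A, with A the wire cross-section (width * thickness)."""
    return rho * length / (width * thickness)


def coupling_capacitance(eps, common_length, thickness, spacing):
    """C = eps * A / d, with A the facing sidewall area and d the spacing."""
    return eps * common_length * thickness / spacing


# Hypothetical 1 mm wire, 100 nm wide, 200 nm thick, 100 nm from its neighbor
r_base = wire_resistance(RHO_CU, 1e-3, 100e-9, 200e-9)
r_wide = wire_resistance(RHO_CU, 1e-3, 200e-9, 200e-9)
c_base = coupling_capacitance(EPS, 1e-3, 200e-9, 100e-9)
c_far = coupling_capacitance(EPS, 1e-3, 200e-9, 200e-9)

print(isclose(r_wide / r_base, 0.5))  # doubling the width halves R
print(isclose(c_far / c_base, 0.5))   # doubling the spacing halves C
```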
How to Fix a Setup Violation – Sol. 13
Relaxing the Power Grid
• The power grid is the metal connection that delivers the power from higher metal layers down to the standard cells
• We showed how the wire delay is affected by things like spacing and width. A wide and dense power grid will leave few routing resources for the signal nets, leaving no room to increase their spacing or width.
• However, relaxing the power grid will increase the resistance of the power network, causing a bigger voltage drop. So the PNR designer has to trade off between enhancing timing and fixing voltage drop.
[Figure: Compact PG vs Relaxed PG]
How to Fix a Setup Violation – Sol. 14
Upsizing
• We showed in part 1 how the MOSFET size affects the propagation delay of the cell. So to fix setup we can use larger cells that have less propagation delay
• There are several considerations when using this method:
o Bigger cells mean more area and power consumption
o Bigger cells have a larger gate capacitance. This will slow down the cell that drives them because it now sees a larger load capacitance. The gain from upsizing the cell should outweigh the slowdown of the driving cell.
o Since big cells consume more power they are likely to cause a big voltage drop on the cells around them.
o During the ECO flow there might not be enough area to accommodate the bigger cell, which requires you to move the cells around it and then reroute the nets to their pins. Moving the cells and rerouting could worsen the timing of these cells
[Figure: Before Upsizing vs After Upsizing, and Effect of Upsizing on the Driver and Load, with the delay of each cell annotated. Note from the figure: the big gate cap not only increased the delay of the driver but also caused a large output transition time. The large transition time led to a larger delay for the 2nd buffer]
How to Fix a Setup Violation – Sol. 15
MTCMOS
• Similarly, the threshold voltage 𝑉𝑡ℎ of the MOSFET affects the propagation delay of the cell. So to fix setup we can use low 𝑉𝑡ℎ cells that have less propagation delay, but this will increase the leakage power consumption.
• Synthesis and PNR tools allow you to apply a limit on the percentage of low 𝑉𝑡ℎ cells in your chip. Relaxing this limit will lead to better overall timing [1].
• The gain from changing the flavor (threshold) is usually less than that of upsizing the cell. However, changing the cell flavor won't increase the cell area, hence no moving of cells or rerouting is required. This is why changing the flavor is the first go-to method for PNR engineers during ECO.
[Figure: Before Changing Flavor vs After Changing Flavor]
[1] : You need to be careful when relaxing the limit because the tool might resort to the easy solution of using low 𝑉𝑡ℎ cells and ignore other optimizations in the logic and wire delay, leaving you with big leakage power consumption.
How to Fix a Setup Violation – Sol. 16
Increasing the Driving Strength
• When we discussed upsizing we showed that when a cell drives a large load capacitance its output transition time gets slower, which in turn slows down the load cells. Increasing the driving strength will improve the transition time, which in turn will improve the delay of the load cells
• There are several ways to enhance the driving strength
o Upsizing the driver cell : Bigger cells produce a larger current and hence charge the load capacitance faster. This method combines the benefit of speeding up the driver by upsizing and the benefit of speeding up the load cells because they see a better input transition time.
o Downsizing the load cells : This will decrease the load capacitance of the driver, which will improve its propagation and transition times, which in turn will speed up the load cells. However, smaller cells have a larger delay, so for this method to work the gain from enhancing the driving strength should outweigh the increase in delay due to downsizing
o Fanout splitting : Instead of one cell driving all the fanout we can duplicate the driver and split the fanout among the copies as shown in the diagram. But note that the driver of the driver now sees double the load cap, which increases its delay. So you have to balance things to make the overall gain outweigh the increase in delay
o Side load isolation : Add a small buffer that isolates a large load from the driver. In the example shown, the driver now sees the small cap of the buffer instead of the large cap of the large NAND. This will fix the green paths but will worsen the red path because the small buffer adds a delay that increases the overall delay of the red path. For this method to work, the red path should be passing the setup check with a good margin to accommodate the increase in delay
[Figure: Original, Upsizing the driver, Downsizing the load, Fanout splitting, Side Load Isolation]
How to Fix a Setup Violation – Sol. 17
Breaking up Long Nets
• When a cell drives a very long wire with a big capacitance it will have bad propagation and transition times. By breaking the long wire with buffers, the overall improvement could outweigh the delay of the added buffers
• If the wire is very long we can split it with an inverter pair instead of a buffer. This is better because the delay of an inverter is less than that of a buffer of the same size [1]. This way we get more cuts in the wire (less load cap for each cell) with roughly the same delay as the added buffer
[Figure: three versions of the same path with annotated delays. Original long net: 150 ps + 400 ps + 100 ps, total $T_{comb} = 650\,ps$. Net broken with a buffer: 100 ps + 250 ps + 50 ps + 50 ps + 120 ps, total $T_{comb} = 570\,ps$. Net broken with an inverter pair: 80 ps + 230 ps + 30 ps + 35 ps + 35 ps + 70 ps + 60 ps, total $T_{comb} = 540\,ps$]
[1] : Buffers are basically 2 inverters connected in series
How to Fix a Setup Violation – Sol. 18
Register Duplication
• By duplicating registers, the timing paths can be shortened, reducing the wire
and cell propagation delays.
• Consider the example on the right :
o By duplicating the green register we managed to move each copy near one of the blue registers
o This reduces the wire length between the green and blue registers and also allows us to remove the buffers and inverter pairs on the nets. Both reduce the total combinational delay
o This shows that this method becomes more useful when the capture registers (the blue ones) are placed far away from each other in the chip.
o However, FF1 now drives double the fanout, so the delay of the timing path between FF1 and FF2 is increased. We need to make sure this increase doesn't cause the path to violate setup timing.
• Duplication can be done manually in the RTL or automatically by the synthesis
and PnR tools.
[Figure: Before Duplication vs After Duplication]
More details : https://community.intel.com/t5/FPGA-Wiki/Register-Duplication-for-Timing-Closure/ta-p/735917
How to Fix a Setup Violation – Sol. 19
Reducing Crosstalk
• When we discussed wire delays we showed that there is a capacitance between any two wires that run close
to each other. This capacitance is called the coupling capacitance.
• When one of the two wires switches from 0->1 or 1->0, the coupling capacitance pushes the other wire in the
same direction. We call the first wire the aggressor and the second the victim.
• If the victim is switching at the same time as the aggressor and with the same polarity,
the aggressor speeds up the victim's transition (the input transition of the cell it drives). This reduces the
propagation delay of the victim.
• If the victim is switching with the opposite polarity, the aggressor slows down the victim's
transition, which increases the propagation delay of the victim and therefore increases 𝑇𝑐𝑜𝑚𝑏.
• To decrease the effect of crosstalk and reduce the delay:
o Reduce the coupling capacitance by increasing the spacing between the wires. This combines the
effect of wire delay optimizations and reducing crosstalk.
o Shield the wires of the victim net with VSS wires to block the crosstalk.
o Downsize the aggressor cell to reduce its effect on the victim.
o Upsize the driver of the victim so that it overcomes the aggressor effect.
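The effect of the aggressor on the victim delay can be sketched with a first-order model. This sketch is not taken from the slides: it assumes the common Miller coupling factor approximation (the coupling capacitance counts 0x when both wires switch in the same direction, 1x when the aggressor is quiet, and 2x when they switch in opposite directions) and a simple 0.69·R·C delay. All function names and numbers are illustrative.

```python
# First-order crosstalk sketch (assumed model, illustrative values only).
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def victim_delay_ps(r_driver_ohm, c_ground_ff, c_couple_ff, aggressor):
    """Approximate 50% delay of the victim net in picoseconds."""
    c_eff_ff = c_ground_ff + MILLER_FACTOR[aggressor] * c_couple_ff
    return 0.69 * r_driver_ohm * c_eff_ff * 1e-3  # ohm * fF = 1e-3 ps


def crosstalk_penalty_ps(r_driver_ohm, c_ground_ff, c_couple_ff):
    """Extra delay caused by an opposite-switching aggressor vs a quiet one."""
    return (victim_delay_ps(r_driver_ohm, c_ground_ff, c_couple_ff, "opposite")
            - victim_delay_ps(r_driver_ohm, c_ground_ff, c_couple_ff, "quiet"))


if __name__ == "__main__":
    for case in ("same", "quiet", "opposite"):
        print(case, round(victim_delay_ps(1000, 10, 5, case), 2), "ps")
    # Wider spacing -> smaller coupling capacitance.
    # Upsized victim driver -> smaller driver resistance.
    print("penalty, baseline      :", round(crosstalk_penalty_ps(1000, 10, 5), 3))
    print("penalty, wider spacing :", round(crosstalk_penalty_ps(1000, 10, 2), 3))
    print("penalty, upsized driver:", round(crosstalk_penalty_ps(500, 10, 5), 3))
```

In this model, both wider spacing and a stronger victim driver shrink the delay penalty, which matches the direction of the fixes listed above. Real tools use far more detailed noise and delay models.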
Figure: aggressor, victim and driver of the victim, with rising and falling victim transitions drawn without and with the crosstalk effect.
How to Fix a Setup Violation – Sol. 19
Reducing Crosstalk
• The image below shows an aggressor switching from 0->1 and the resulting transition on the victim.
• We can see that the stronger the driver of the victim, the smaller the effect of the crosstalk.
Reference: CMOS VLSI Design - https://pages.hmc.edu/harris/cmosvlsi/4e/index.html
How to Fix a Setup Violation – Sol. 20
Local Skew
• So far we have been discussing methods that reduce 𝑇𝑐𝑜𝑚𝑏. Now we will consider the launch and capture latencies, 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 and 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦.
• From the setup equation we can see that decreasing 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 or increasing 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 improves the setup slack. The difference
𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 − 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 is called the skew, so to fix a setup violation we can increase the skew.
• To decrease the launch latency we can apply any of the methods we discussed, such as upsizing or changing flavor, to the cells of the launch clock path.
• To increase the capture latency we can apply the opposite methods to the capture clock path, such as downsizing or changing flavor to high 𝑉𝑡ℎ, or we can
add buffers.
• Changing the skew to fix a timing path will affect the previous and next paths:
o The launch FF of the current timing path is the capture FF of the previous one. So if you decrease the launch latency to fix the current path, you
also decrease the capture latency of the previous path, which might cause it to violate setup. The same applies to the next path: its launch FF is the capture FF of the current path, so increasing the current capture latency increases the next path's launch latency.
o In other words, you are borrowing some of the positive slack from the previous and next paths.
o That’s why, before changing the skew, you have to check whether the previous and next paths pass timing with a good margin.
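The slack borrowing described above can be sketched numerically. This is a minimal sketch, assuming a chain of four FFs (A, B, C, D) on one clock with full cycle paths; the period, setup time, latencies and combinational delays are made-up values in picoseconds, and the function names are hypothetical.

```python
PERIOD_PS = 1000   # illustrative clock period
T_SETUP_PS = 50    # illustrative setup time


def setup_slack(launch_latency, t_comb, capture_latency):
    """Setup slack = required - arrival, with the launch edge at t = 0."""
    arrival = 0 + launch_latency + t_comb
    required = PERIOD_PS - T_SETUP_PS + capture_latency
    return required - arrival


def chain_slacks(latency, t_comb):
    """Slack of every path; t_comb maps (launch FF, capture FF) -> delay in ps."""
    return {(src, dst): setup_slack(latency[src], delay, latency[dst])
            for (src, dst), delay in t_comb.items()}


# Previous path A -> B, current (failing) path B -> C, next path C -> D.
T_COMB = {("A", "B"): 700, ("B", "C"): 1000, ("C", "D"): 600}

before = chain_slacks({"A": 200, "B": 200, "C": 200, "D": 200}, T_COMB)
# Add 100 ps of clock latency at FF C, the capture FF of the failing path.
after = chain_slacks({"A": 200, "B": 200, "C": 300, "D": 200}, T_COMB)

if __name__ == "__main__":
    print("before:", before)
    print("after :", after)
```

With these numbers the current path B -> C goes from -50 ps to +50 ps, while the next path C -> D drops from 350 ps to 250 ps: the 100 ps was borrowed from the next path, which still passes because it had enough margin.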
𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
Figure: three consecutive timing paths (previous path, current timing path, next path). The launch FF of the current path is the capture FF of the previous path, and the capture FF of the current path is the launch FF of the next path.
How to Fix a Setup Violation – Sol. 20
Local Skew
• In general, increasing a delay is a lot easier than decreasing it because we can simply add buffers. That’s why ASIC engineers and PNR tools tend to focus
on increasing the capture latency instead of decreasing the launch latency.
• Another reason why increasing the capture latency is favored:
o When the PnR tool builds the clock tree network, usually multiple FFs are driven by the same clock buffer. If we modify the launch clock network to
fix one timing path, we will affect the other timing paths that use the same clock buffer [1].
o This is not the case for the capture clock network, because we can add a buffer just in front of the clock pin of the capture FF without affecting the rest of the FFs.
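The difference between touching a shared clock buffer and adding a buffer at one clock pin can be sketched as follows. This is a toy model under stated assumptions: the clock tree is a plain dictionary, latency is just the sum of buffer delays from the root to the FF, and the buffer names and delays are hypothetical.

```python
def clock_latencies(tree, delay_ps, root="clk"):
    """Latency at each FF clock pin = sum of buffer delays from the root to it."""
    result = {}
    stack = [(root, 0)]
    while stack:
        node, latency = stack.pop()
        children = tree.get(node, [])
        if not children:                 # a leaf is a flip-flop clock pin
            result[node] = latency
        for child in children:
            stack.append((child, latency + delay_ps.get(child, 0)))
    return result


# Three FFs driven by the same clock buffer.
TREE = {"clk": ["buf_shared"], "buf_shared": ["FF1", "FF2", "FF3"]}
original = clock_latencies(TREE, {"buf_shared": 100})

# Option 1: speed up the shared buffer. Every FF under it moves.
faster_shared = clock_latencies(TREE, {"buf_shared": 80})

# Option 2: add a buffer right in front of FF1's clock pin. Only FF1 moves.
TREE_LEAF = {"clk": ["buf_shared"],
             "buf_shared": ["buf_leaf", "FF2", "FF3"],
             "buf_leaf": ["FF1"]}
leaf_buffer = clock_latencies(TREE_LEAF, {"buf_shared": 100, "buf_leaf": 30})

if __name__ == "__main__":
    print("original     :", original)
    print("faster shared:", faster_shared)
    print("leaf buffer  :", leaf_buffer)
```

Option 1 changes the latency of all three FFs, while option 2 changes FF1 only, which is why the local fix at the capture clock pin is usually preferred.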
Figure: the original clock tree; decreasing the launch latency (all blue FFs are affected); increasing the capture latency (only the 1st blue FF is affected).
[1]: We don’t want to affect the latencies of other timing paths because this may cause them to violate hold. More on this when we discuss hold.
Hold Timing Analysis
Hold Time
The waveform below shows the timing of 2 consecutive samples (A and B) going through the FFs.
To avoid metastability, we want A to get captured and then remain stable at FF2 for some time after the capture edge. We called this time the hold time 𝑇ℎ𝑜𝑙𝑑.
This means B must arrive at FF2 only after A has been captured and its hold time has elapsed.
𝐿𝑎𝑢𝑛𝑐ℎ + 𝐷𝑒𝑙𝑎𝑦 ≥ 𝐶𝑎𝑝𝑡𝑢𝑟𝑒
𝐴𝑟𝑟𝑖𝑣𝑎𝑙 ≥ 𝑅𝑒𝑞𝑢𝑖𝑟𝑒𝑑
𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 ≥ 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇ℎ𝑜𝑙𝑑
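The hold equation can be turned into a small slack calculation. This is a minimal sketch with made-up values in picoseconds; the function name is hypothetical, and `t_comb` is assumed to be the whole data path delay as it appears in the equation above.

```python
def hold_slack(launch_edge, launch_latency, t_comb,
               capture_edge, capture_latency, t_hold):
    """Hold slack = arrival - required; negative means a hold violation."""
    arrival = launch_edge + launch_latency + t_comb
    required = capture_edge + capture_latency + t_hold
    return arrival - required


# Full cycle path: B is launched by the same clock edge that captures A.
EDGE_PS = 1000

short_path = hold_slack(EDGE_PS, 200, 80, EDGE_PS, 250, 40)
longer_path = hold_slack(EDGE_PS, 200, 120, EDGE_PS, 250, 40)

if __name__ == "__main__":
    print("short data path :", short_path, "ps")
    print("longer data path:", longer_path, "ps")
```

Here the 80 ps data path gives a slack of -10 ps (B overwrites A too early), while the 120 ps path gives +30 ps.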
Figure: waveform of A and B going from FF1 to FF2. Data A arrives at FF2, gets captured at the capture edge, and is required to be stable at FF2 until 𝑇ℎ𝑜𝑙𝑑 after that edge. Data B arrives at FF2 after 𝑇𝑐𝑞 and 𝑇𝑐𝑜𝑚𝑏.
Hold Time
The example below violates this requirement because B arrives before A has remained stable for the required hold time.
A quick solution is to insert buffers in the combinational path to increase 𝑇𝑐𝑜𝑚𝑏 so that B arrives after the required hold time.
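How many buffers such a fix needs can be estimated from the hold slack. This is a rough sketch, assuming every inserted buffer adds the same fixed delay and that the buffers sit on a path that is also checked for setup; the function name and all numbers are hypothetical.

```python
def fix_hold_with_buffers(hold_slack_ps, setup_slack_ps, buffer_delay_ps):
    """Return (buffers added, new hold slack, new setup slack), all in ps."""
    if hold_slack_ps >= 0:
        return 0, hold_slack_ps, setup_slack_ps
    # Integer ceiling of (-hold_slack / buffer_delay).
    count = -(hold_slack_ps // buffer_delay_ps)
    added = count * buffer_delay_ps
    # The added delay helps hold but is taken away from the setup slack.
    return count, hold_slack_ps + added, setup_slack_ps - added


if __name__ == "__main__":
    print(fix_hold_with_buffers(-50, 300, 25))  # enough setup margin
    print(fix_hold_with_buffers(-10, 5, 25))    # hold fixed, setup now negative
```

The second call shows why the setup slack of the same path has to be checked before adding delay: the hold violation is fixed, but the path then fails setup.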
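Figure: FF1 to FF2 waveforms for both cases. Violation: data B arrives at FF2 before the time until which data A is required to be stable. Pass: the delay added by the buffers moves the arrival of B past 𝑇ℎ𝑜𝑙𝑑.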
Hold Time
We also don’t want A to be captured by an earlier edge (the capture edge before the intended one, labeled 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 1 in the waveform), as this would break the functionality. For this check, the capture edge in the hold equation is that earlier edge.
𝐿𝑎𝑢𝑛𝑐ℎ + 𝐷𝑒𝑙𝑎𝑦 ≥ 𝐶𝑎𝑝𝑡𝑢𝑟𝑒
𝐴𝑟𝑟𝑖𝑣𝑎𝑙 ≥ 𝑅𝑒𝑞𝑢𝑖𝑟𝑒𝑑
𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 ≥ 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇ℎ𝑜𝑙𝑑
Figure: waveform of A going from FF1 to FF2. Data A should get captured at 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒, so it is required to arrive at FF2 after the earlier edge plus 𝑇ℎ𝑜𝑙𝑑 [1].
[1]: Not only does A need to come after the earlier edge, it also needs to come after the hold time of that edge or it will cause metastability.
Example Timing Report
Figure: example hold timing report, annotated with the terms 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒, 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦, 𝑇𝑐𝑞, 𝑇𝑐𝑜𝑚𝑏, 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒, 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 and 𝑇ℎ𝑜𝑙𝑑, grouped into 𝐿𝑎𝑢𝑛𝑐ℎ, 𝐷𝑒𝑙𝑎𝑦 and 𝐶𝑎𝑝𝑡𝑢𝑟𝑒.
Reference: Advanced HDL Synthesis and SOC Prototyping: RTL Design Using Verilog | SpringerLink
Hold Time
• Like setup, the hold timing path can be full cycle, half cycle, multi cycle or multi clock.
• We consider the edge where A is captured and the edge where B (the next data) is launched, because B is what will overwrite A.
The red arrows in the waveforms show these launch-capture edge pairs.
• If there are more launch-capture combinations, as in the multi clock case, the STA tool will consider
the worst of them.
• As with setup, we just plug different values for the clock edges into the hold equation, and the concepts
remain unchanged [1].
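Plugging different edges into the hold equation can be sketched as below. The sketch assumes an ideal clock with rising edges at multiples of the period, a half cycle path that launches on the rising edge and captures on the falling edge, and made-up latencies and delays in picoseconds. The function names are hypothetical.

```python
def hold_slack(launch_edge_b, capture_edge_a, launch_latency=200,
               capture_latency=200, t_comb=30, t_hold=40):
    """Hold slack between the launch of B and the capture of A (ps)."""
    arrival = launch_edge_b + launch_latency + t_comb
    required = capture_edge_a + capture_latency + t_hold
    return arrival - required


def full_cycle(period):
    # A is captured by the same rising edge that launches B.
    return hold_slack(launch_edge_b=period, capture_edge_a=period)


def half_cycle(period):
    # A is captured on the falling edge at period / 2,
    # B is launched on the next rising edge at one full period.
    return hold_slack(launch_edge_b=period, capture_edge_a=period // 2)


if __name__ == "__main__":
    for period in (1000, 2000):
        print(period, "ps period: full cycle", full_cycle(period),
              "ps, half cycle", half_cycle(period), "ps")
```

In the full cycle case the two edges cancel and the slack is the same for both periods. In the half cycle case the slack grows by half a period, so here hold does depend on the clock period.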
Figure: waveforms for a full cycle path, a half cycle path, a multi cycle path and a multi clock path. Each marks the launch of A, the capture of A and the launch of B; where more than one combination exists, an alternative ("OR") with another capture of A is shown.
[1]: A common mistake is to say hold is not affected by the clock period. This is only true for full and multi cycle paths, where the launch and capture edges of the hold check occur at the same time. But since full cycle paths are the most common type of path and also the most susceptible to violation, engineers generalize and say hold is not affected by the clock period.
Hold Time
• We also don’t want A to be captured by an earlier edge.
• So we should also check hold between the launch of A and the capture edge that comes before A’s intended
capture edge.
• Now we know all the launch-capture combinations, and the tool will consider the worst of them [1].
Figure: the full cycle, half cycle, multi cycle and multi clock waveforms again, marking the launch of A, the capture of A and the launch of B, with an alternative ("OR") showing another capture of A.
[1]: Timing Analyzer Example: Clock Analysis Equations | Intel.
How to Fix a Hold Violation
• By comparing the setup equation with the hold equation, we find that fixing hold violations requires the opposite of the methods we discussed for setup.
• Instead of decreasing 𝑇𝑐𝑜𝑚𝑏 we try to increase it by adding buffers, increasing wire delay, downsizing, etc. And instead of increasing the capture latency
or decreasing the launch latency we do the opposite.
• This shows that hold works against setup: fixing hold may worsen setup.
• We showed earlier that increasing delay is generally easier than decreasing it. This means that fixing hold is generally easier than fixing setup.
• This is why setup takes priority over hold. Hold is only considered in the PNR step, and fixing hold violations starts once all setup violations are fixed [1].
Setup: 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
Hold: 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 ≥ 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇ℎ𝑜𝑙𝑑
[1]: Hold is still monitored across the PNR stages, and while we focus more on setup we make sure hold is solvable and under control.
How to Fix a Hold Violation
• Consider the example below:
o The STA engineer sees two violations, setup and hold, both with the same startpoint and endpoint. The engineer adds buffers in front of FF2 to fix
hold, but setup gets worse; then tries to fix setup by changing flavor, but hold gets worse. It seems we have reached a dead end.
o If we investigate the violations in depth, we see there are two paths: the upper long one, which violates setup, and the lower short one (blue), which violates
hold.
o So, to fix the setup violation we can change the flavor of the cells in the upper path only, and to fix hold we can add buffers along the lower blue path only.
• This example shows that some hold violations can be tricky and need a deep look into the timing path.
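The dead end and its resolution can be reproduced with a small calculation. This is a sketch under several assumptions: a full cycle path with equal launch and capture latencies (so the clock terms cancel), setup limited by the longest path and hold by the shortest, and made-up delays in picoseconds. The names `upper` and `lower` stand for the two paths of the example; everything else is hypothetical.

```python
PERIOD_PS, T_SETUP_PS, T_HOLD_PS = 1000, 50, 120  # illustrative values


def setup_and_hold_slack(path_delays):
    """Slacks for all paths that share one startpoint and one endpoint."""
    setup = (PERIOD_PS - T_SETUP_PS) - max(path_delays.values())  # longest path
    hold = min(path_delays.values()) - T_HOLD_PS                  # shortest path
    return setup, hold


def adjust(path_delays, **delta_ps):
    """Return a copy of the delays with a per-path change applied."""
    return {name: delay + delta_ps.get(name, 0)
            for name, delay in path_delays.items()}


original = {"upper": 1000, "lower": 100}

# Buffers in front of FF2 sit on both paths: hold is fixed, setup gets worse.
buffers_at_ff2 = adjust(original, upper=30, lower=30)
# Faster cells on a segment shared by both paths: setup is fixed, hold gets worse.
faster_shared = adjust(original, upper=-60, lower=-60)
# Faster cells on the upper path only, buffers on the lower path only.
targeted = adjust(original, upper=-60, lower=30)

if __name__ == "__main__":
    for name, delays in [("original", original),
                         ("buffers at FF2", buffers_at_ff2),
                         ("faster shared cells", faster_shared),
                         ("targeted fix", targeted)]:
        print(name, setup_and_hold_slack(delays))
```

Only the targeted fix leaves both slacks positive, because each change touches just the path that causes its violation.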
Thank You!
VLSI Static Timing Analysis Setup And Hold Part 2

  • 1. Static Timing Analysis Part 2 Amr Adel Mohammady /amradelm /amradelm
  • 2. /amradelm /amradelm Introduction • In part 1 we went through the basic principles that are needed to understand all VLSI timing checks. In this parts we will go through some of the checks in details • The timing checks covered in this part are: o Setup timing o Hold timing 2
  • 4. /amradelm /amradelm Setup Time 4 At time T=𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒, Data A is launched from FF1 to FF2. The data needs to make it to FF2 before the next clock edge arrives at FF2 at time 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒. The next clock edge will arrive after a clock period 1 The clock takes some time to reach FF1 due to the buffers. The launch won’t happen exactly at T=𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 but after the delay/latency of the clock buffers 2 As we saw in part 1, once the clock reaches the FF it takes some time to push the data out to the Q pin. We called this time 𝑇𝑐𝑞. This is the 1st delay data A encounters to reach FF2 3 A 𝐿𝑎𝑢𝑛𝑐ℎ = 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 = 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝐿𝑎𝑢𝑛𝑐ℎ = 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑻𝒍𝒂𝒖𝒏𝒄𝒉_𝒍𝒂𝒕𝒆𝒏𝒄𝒚 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 = 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 A 𝐿𝑎𝑢𝑛𝑐ℎ = 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝐷 𝑒 𝑙 𝑎 𝑦 = 𝑻𝒄𝒒 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 = 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 A 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑐𝑞 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒
  • 5. /amradelm /amradelm Setup Time 5 Data A will propagate through the combinational path to reach FF2. This is the 2nd delay it encounters 4 As we saw in part 1, the FF requires the data to arrive some time before the clock edge in order to avoid metastability. We called this time 𝑇𝑠𝑒𝑡𝑢𝑝. Hence, we shouldn’t capture data at 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 but at 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 5 The clock takes some time to reach FF2 due to the buffers. The capture won’t happen exactly at 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 but after the delay/latency of the clock buffers 6 A 𝐿𝑎𝑢𝑛𝑐ℎ = 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝐷 𝑒 𝑙 𝑎 𝑦 = 𝑇𝑐𝑞 + 𝑇𝑐𝑜𝑚𝑏 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 = 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑻𝒄𝒂𝒑𝒕𝒖𝒓𝒆_𝒍𝒂𝒕𝒆𝒏𝒄𝒚 A 𝐿𝑎𝑢𝑛𝑐ℎ = 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝐷 𝑒 𝑙 𝑎 𝑦 = 𝑇𝑐𝑞 + 𝑇𝑐𝑜𝑚𝑏 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 = 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑻𝒔𝒆𝒕𝒖𝒑 𝐿𝑎𝑢𝑛𝑐ℎ = 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝐷 𝑒 𝑙 𝑎 𝑦 = 𝑇𝑐𝑞 + 𝑻𝒄𝒐𝒎𝒃 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 = 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑐𝑞 𝑇𝑐𝑜𝑚𝑏 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑐𝑞 𝑇𝑐𝑜𝑚𝑏 𝑇𝑠𝑒𝑡𝑢𝑝 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑐𝑞 𝑇𝑐𝑜𝑚𝑏 𝑇𝑠𝑒𝑡𝑢𝑝 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
  • 6. /amradelm /amradelm Setup Time 6 7 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑐𝑞 𝑇𝑐𝑜𝑚𝑏 𝑇𝑠𝑒𝑡𝑢𝑝 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 To make sure a setup violation doesn’t happen, we need to make sure data A arrives at FF2 before the required capture time The difference between the required and arrival time is called the slack. If the slack is positive we pass setup and if negative we fail. The launch FF is called the startpoint of the timing path and the capture FF is called the endpoint 𝐿𝑎𝑢𝑛𝑐ℎ + 𝐷𝑒𝑙𝑎𝑦 ≤ 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 𝐴𝑟𝑟𝑖𝑣𝑎𝑙 ≤ 𝑅𝑒𝑞𝑢𝑖𝑟𝑒𝑑 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 Data arrived at FF2 at this point Data is required to arrive at FF2 before this point
  • 7. /amradelm /amradelm Example Timing Report 7 𝐷 𝑒 𝑙 𝑎 𝑦 𝐶𝑎𝑝𝑡𝑢𝑟𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 → 𝑇𝑐𝑞 → 𝑇𝑠𝑒𝑡𝑢𝑝 → 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 → 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 → 𝑇𝑐𝑜𝑚𝑏{ 𝐿𝑎𝑢𝑛𝑐ℎ An 554: How to Read HardCopy PrimeTime Timing Reports By Intel Reference : 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 → 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
  • 8. /amradelm /amradelm Setup Time • The example we have shown is for a full cycle path where the 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 comes one clock cycle after 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒. • This is not always the case. The capture edge could come half cycle later, multiple cycles later or from another clock. o Half cycle paths occur when the launch and capture FFs use different clock edges o Multi cycle paths occur when the first capture edge is masked by a control circuit and another edge is used. o Multi clock paths occur when the launch and capture FFs use different clocks from each other. The diagram shows that there could be more than one launch/capture edges combination. The STA tools will consider the worst case (The purple one)1 • All what we learned still apply and nothing changes. We will just plug different values for the clock edges into the setup equation. • We will now discuss how to fix a setup violation 8 𝑻𝒍𝒂𝒖𝒏𝒄𝒉_𝒆𝒅𝒈𝒆 + 𝑻𝒍𝒂𝒖𝒏𝒄𝒉_𝒍𝒂𝒕𝒆𝒏𝒄𝒚 + 𝑻𝒄𝒐𝒎𝒃 < 𝑻𝒄𝒂𝒑𝒕𝒖𝒓𝒆_𝒆𝒅𝒈𝒆 − 𝑻𝒔𝒆𝒕𝒖𝒑 + 𝑻𝒄𝒂𝒑𝒕𝒖𝒓𝒆_𝒍𝒂𝒕𝒆𝒏𝒄𝒚 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 Mask this edge with control logic Half Cycle Path Multi Cycle Path Multi Clock Path The phase difference between the two clocks should be known in order to know exactly where the launch and capture edge are. If not, we can’t run STA on such paths and we have to resort to clock domain crossing techniques. [1] :
  • 9. /amradelm /amradelm Overview of The Digital VLSI Flow • Before we discuss how to fix a setup violation we need to have a quick overview of the digital design flow1. • Specifications: The design process starts with the requirements to build the system (Functionality, Performance, Power consumption, Cost etc) • Architecture : Based on the required specs, the architecture team will start building the system. They will answer questions and make decisions such as: What blocks are needed in the system to perform the functionality? How to implement these blocks as a digital circuit? Do we need memory or not? What is its size? What operating voltage do we use? What clock frequency do we need to meet the performance specs? What is the expected area of the chip and fabrication cost? • RTL Design : The RTL team will start writing RTL code to implement the architecture and blocks of the system • Simulation : The implemented design is tested through simulations to make sure it does the required function correctly • Synthesis : The RTL code is translated into actual logic gates and digital blocks • PNR : The place and route step involves several sub steps o Floorplan : Involves allocating space on the chip for various blocks and modules, including the placement of macros, and I/O ports o Power Grid : Creating the metal structure that delivers the power supply to the standard cells and blocks inside the chip o Placements : Placing the cells inside the chip o Clock Tree Synthesis : Creating the clock networks to deliver the clocks from the ports to the registers in the chip o Routing : Routing the metal interconnects (wires) between the cells o Timing Closure : Running STA on the chip to make sure it meets the timing requirements o DRC/LVS/EMIR : DRC ensures the final layout is compliant with the manufacturing rules. LVS ensures the final layout perform the same function of the schematic/logic description. 
EMIR ensures all cells get the required voltage without drop and the current flowing through the wire is within the required limits 9 This is a very simplified view of the digital flow. There are more steps involved but we don’t mention them because they won’t affect STA [1] :
  • 10. /amradelm /amradelm How to Fix a Setup Violation – Overview of The Digital VLSI Flow • PNR : o The PNR engineer starts the flow trying to meet the requirements with the help of automation tools o The goal is to reach a good startpoint with good results before the manual work starts o Once the manual work starts, the startpoint is saved and frozen. The PNR flow is said to be in ECO mode (Engineering Change Order) o The manual work involves things like moving cells, changing their threshold voltage, manually routing wires, etc. • Fixing timing: o Each of the above flow steps involves several optimizations to enhance the timing and fix the violations o The earlier steps solve larger timing violations that are difficult and sometimes impossible to fix in later stages o As we go through the flow, the ability to fix large violations decrease and we are more focused in fixing small but tricky violations that involves lots of manual work. • We will now go through some of the ways to fix a setup violation. We will start with the solutions done in the early stages and go down till we see what can be done in later and final stages. 10
  • 11. /amradelm /amradelm How to Fix a Setup Violation – Sol. 1 Reducing the Clock Frequency • The easiest and simplest solution is to reduce the frequency (increase the period) of the clock to add time to the capture time • Doing this degrade the performance (Data rate / CPU speed / Operations per second / etc) • The decision to reduce the clock frequency is left to the architecture team and can’t be modified individually by RTL or PNR engineers • Sometimes this solution is not acceptable because the product standard requires specific data rate that needs to be met 11 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
  • 12. /amradelm /amradelm How to Fix a Setup Violation – Sol. 2 Going To a Smaller Technology Node • In part 1 we showed how the transistor length (tech node) affects the gate delay. A shorter length has smaller delay • Going for a smaller tech node means higher fabrication cost and a longer design cycle because smaller tech nodes are more challenging to handle the on chip variations (OCV) and the physical design rule constraints (DRC) and preparing the design files (standard cell libraries, etc) for the new tech node will take time. • Because of this, the target tech node is decided very early in the design process by doing experiments with the tech node to see if the target frequency will be feasible or not • These experiments could be : o Quick hand calculations : By considering the average cell delay in the tech node and the average combinational path length. For example, 𝑇𝑎𝑣𝑔 = 5𝑛𝑠 and the average number of cells in a timing path = 20. So, on average, the combinational delay = 5 ∗ 20 = 100𝑛𝑠 meaning a maximum clock frequency of 1 100𝑒−9 = 10 MHz. This is, of course, a very rough estimation as it doesn’t include the effects of wire delay, clock latencies, etc. But the more effort you put in these calculations the more accurate they get o Doing a quick project : By synthesizing a small block or a previous project to get an estimate of the maximum clock frequency you can achieve on this tech node 12 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
  • 13. /amradelm /amradelm How to Fix a Setup Violation – Sol. 3 Increasing the Supply Voltage • In part 1 we showed how the supply voltage affects the gate delay. A higher voltage has smaller delay. However, the power consumption increase quadratically. • The higher voltage could be applied to certain parts of the chip that needs high performance while leaving other parts with the lower voltage to avoid higher power consumption. However, this adds several difficulties in the ASIC design process 13 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑡𝑝𝑟𝑜𝑝 = 0.69 𝑉𝐷𝐷 . 𝐶𝐿 𝑊 2𝐿 𝜇𝐶𝑜𝑥 𝑉𝐷𝐷 − 𝑉𝑡ℎ 2 𝑃𝑜𝑤𝑒𝑟𝑑𝑦𝑛𝑎𝑚𝑖𝑐 = 𝛼𝑓𝐶𝐿𝑉𝐷𝐷 2
  • 14. /amradelm /amradelm How to Fix a Setup Violation – Sol. 4 Changing the Architecture • Digital blocks have a tradeoff between speed vs power and area. The designer might choose an implementation that consume more power or has larger area but higher speed. • For example, there are different ways to implement binary adders. One implementation is the ripple adder which has small area and power consumption but has high 𝑇𝑐𝑜𝑚𝑏, while a carry-look-ahead (CLA) adder has smaller 𝑇𝑐𝑜𝑚𝑏 but takes larger area. 14 𝑇𝑐𝑜𝑚𝑏 = 700𝑝𝑠 𝐴𝑟𝑒𝑎 = 75𝜇𝑚2 𝑇𝑐𝑜𝑚𝑏 = 400𝑝𝑠 𝐴𝑟𝑒𝑎 = 130𝜇𝑚2 Kamanga, Isaack. Design Optimization of the 64-Bit Carry Look-Ahead Adder Based on FPGA and Verilog HDL Reference : 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦
  • 15. /amradelm /amradelm How to Fix a Setup Violation – Sol. 5 Optimizing the RTL Code • The way the RTL is written affects the structure of the logic gates • The example below shows 2 circuits that perform the same functionality however the on the right creates the adder in a chain fashion resulting in a delay of 3 adders in series while the one on the right is made in a parallel tree fashion and only has a delay of 2 series adders 15 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑇𝑐𝑜𝑚𝑏 = 100𝑝𝑠 100𝑝𝑠 100𝑝𝑠 100𝑝𝑠 100𝑝𝑠 𝑇𝑜𝑡𝑎𝑙 𝑇𝑐𝑜𝑚𝑏 = 200𝑝𝑠 𝑇𝑜𝑡𝑎𝑙 𝑇𝑐𝑜𝑚𝑏 = 300𝑝𝑠
  • 16. /amradelm /amradelm How to Fix a Setup Violation – Sol. 6 Pipelining • The most common way to fix setup in RTL design is to add pipeline registers. • The idea of pipelining is to split a large 𝑇𝑐𝑜𝑚𝑏 into multiple clock cycles. • For example, to implement the equation 𝐴 + 𝐵 ∗ 𝐶, one can do all the operations in one cycle or do the multiplication in one cycle then the addition in the next cycle as shown in the diagram • The disadvantages of pipelining is: o More area due to the pipeline registers o More latency. Instead of finishing the operation in one cycle we finish it in multiple cycles. o Synchronization. Since the data is delayed by the pipeline registers, the downstream logic that will receive the data have to account for this delay. Notice also how we needed to add pipeline on A as well to synchronize 𝐴1 with 𝐵1 ∗ 𝐶1 otherwise we would have added 𝐴2 from next sample to 𝐵1 ∗ 𝐶1 16 𝑇𝑎𝑑𝑑 + 𝑇𝑚𝑢𝑙 = 100 + 300 = 400 𝑇𝑎𝑑𝑑 = 100 𝑇𝑚𝑢𝑙 = 300 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 Without Pipelining With Pipelining
  • 17. /amradelm /amradelm How to Fix a Setup Violation – Sol. 7 Multi Cycle Path (MCP) • This method has some similarity to pipelining. Similarly, we will let the combinational path finish in multiple cycles. • The difference is we won’t add pipeline registers. Instead, we will capture the data at another capture clock edge • This can be done in 2 ways1: o Use a control circuit to mask the 1st capture edge and allow another one. o Use a divided clock for the capture FF as shown in the diagram below 17 You need to inform the STA tool that you will mask the 1st edge since the tool has no knowledge about the functionality of the circuit. This is done using the “set_multicycle_path” command https://docs.amd.com/r/2021.2-English/ug903-vivado-using-constraints/Multicycle-Paths [1] : 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 Single Cycle Multi Cycle Path Launch clock Capture clock Mask this edge with control logic
  • 18. /amradelm /amradelm How to Fix a Setup Violation – Sol. 7 Multi Cycle Path vs Pipelining • At first it might appear that multi cycle path and pipelining are the same. But a deep look shows the big difference • In the case of pipelining: o In the 1st cycle A,B,C enters the 1st stage of the pipeline. In the 2nd cycle A,B,C enters the 2nd stage while a new sample enters 1st stage of the pipeline o We receive an output every clock cycle and the added latency due to the pipeline registers affects us at the beginning only • In the case of MCP: o In the 1st cycle A,B,C enters the circuit. In the 2nd cycle, the circuit is still busy and we can’t insert a new sample until it finishes. o We receive an output every 2 clock cycles • This shows that pipelining fix setup and have high processing speed while MCP slows down the processing speed • You can think of MCP as reducing the clock frequency but selectively in parts of the circuit and not on the entire circuit 18 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 Pipelining Multi Cycle Path 1st cycle 2nd cycle 3rd cycle 1st cycle 2nd cycle 3rd cycle
  • 19. /amradelm /amradelm How to Fix a Setup Violation – Sol. 8 Retiming • In this method if 𝑇𝑐𝑜𝑚𝑏 is large to fit in the clock cycle, we split the logic and move part of it to another cycle. • Consider the example below: o The red and green logic combined make a 𝑇𝑐𝑜𝑚𝑏 = 𝟕𝟎𝟎𝑝𝑠 which causes a setup violation. o We move the green logic to the next clock cycle to be combined with the blue logic. o This reduces 𝑇𝑐𝑜𝑚𝑏 between FF1 and FF2 to 𝟓𝟎𝟎𝑝𝑠 instead of 𝟕𝟎𝟎𝑝𝑠 which passes setup. o But increases 𝑇𝑐𝑜𝑚𝑏 between FF2 and FF3 to 𝟑𝟎𝟎𝑝𝑠 instead of 𝟏𝟎𝟎𝑝𝑠 but this is okay because it also passes setup. If the blue logic was big this method won’t work 19 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 500𝑝𝑠 200𝑝𝑠 100𝑝𝑠 200𝑝𝑠 100𝑝𝑠 𝟕𝟎𝟎𝒑𝒔 𝟓𝟎𝟎𝒑𝒔 𝟑𝟎𝟎𝒑𝒔
  • 20. /amradelm /amradelm How to Fix a Setup Violation – Sol. 8 Retiming • Retiming can be done manually by the RTL designer or automatically by the synthesis tools o In the example below, the purple logic takes as input A and B. If we move the green logic to the next cycle, we get B one cycle later than what was expected. When we wait for this one cycle, 𝑨𝟏 will be gone and a new 𝑨𝟐 will arrive which will get computed with sample 𝑩𝟏. This will break the functionality of the circuit o Synthesis tools will avoid any retiming that breaks the functionality as this example did. o The RTL designer has full control over the code so he can fix this issue by, for example, adding a pipeline register before the purple logic to delay it one cycle and handle any new issues that will appear due to this added register o Hence, the RTL designer can do more aggressive retiming compared to the synthesis tools but with extra effort. 20 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑒𝑑𝑔𝑒 + 𝑇𝑙𝑎𝑢𝑛𝑐ℎ_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 + 𝑇𝑐𝑜𝑚𝑏 < 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑒𝑑𝑔𝑒 − 𝑇𝑠𝑒𝑡𝑢𝑝 + 𝑇𝑐𝑎𝑝𝑡𝑢𝑟𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦 𝑨𝟏 𝑩𝟏 𝑨𝟐 𝑩𝟏 1st Cycle 2nd Cycle
  • 21. How to Fix a Setup Violation – Sol. 8: Retiming + Pipelining
    • The previous example shows how retiming can be combined with pipelining.
    • Let’s consider the same example of $A + B * C$:
      o We can move the adder to the next clock cycle if there is margin there.
      o However, we get the same issue as in the previous slide: A is no longer synchronized with $B * C$. So we add a pipeline register on A.
      o This way we fix the setup violation and save the area of the $B * C$ pipeline registers.
    [Figure: Pipelining vs Pipelining + Retiming]
  • 22. How to Fix a Setup Violation – Sol. 9: Optimizing Synthesis
    • Synthesis tools have many features and switches that the engineer can use to improve timing and control the trade-offs between the PPA metrics.
    • This topic is very large and needs a tutorial of its own, so we will demonstrate just a few of the things that can be done.
      o Increase the timing effort: Most synthesis tools have switches that control the effort the tool puts into fixing a certain PPA metric or doing a certain optimization. Higher effort leads to better optimization but longer runtime, while lower effort leads to less optimization but shorter runtime.
      o Decrease or disable area and power efforts: Area and power optimizations usually degrade the timing of the circuit. Reducing the effort of these optimizations or disabling them altogether may improve the timing but worsen the area and power of your chip.
      o Enable flattening: The RTL code consists of several modules connected to each other. By default, synthesis tools synthesize each module separately and then connect them together in the top module, preserving the hierarchy and the boundaries between the modules. Another approach is to remove the module boundaries and put all cells in one hierarchy. This is called flattening and generally produces better timing results [1].
    [Figure: No Flattening vs With Flattening]
    [1]: Flattening makes verification more difficult because the module boundaries are removed, which makes tracing signals and referencing cells harder.
  • 23. How to Fix a Setup Violation – Sol. 10: Applying False Paths in the Constraints
    • False paths are timing paths that can’t possibly occur due to the logic of the circuit.
    • Consider the example below:
      o Both muxes have the same select signal. This means we have 2 possible timing paths: the one going through both red logics (200 + 300 = 500 ps) and the one going through both blue logics (100 + 500 = 600 ps).
      o The paths going through a red logic then a blue logic (200 + 500 = 700 ps), or a blue logic then a red logic (100 + 300 = 400 ps), cannot occur.
      o Unless we instruct the tool to ignore these false paths, they will be considered for timing analysis, and the large $T_{comb}$ of the red-to-blue path will be reported as a setup violation.
    • If we don’t apply correct constraints on these paths, we not only get fake setup violations but also hinder the ability of the synthesis and PnR tools to optimize the real violating paths. The tools apply their most aggressive optimizations only to the worst, most critical paths, and will not consider the less critical paths for these optimizations until the most critical ones are solved.
    [Figure: two muxes sharing the select signal sel, with inputs 0 and 1; first-stage logic delays 200 ps and 100 ps, second-stage logic delays 300 ps and 500 ps; the possible paths are highlighted]
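The difference between the paths a tool sees structurally and the paths that can really occur can be enumerated in a few lines. This is a sketch under assumptions: which colour sits on mux input 0 or 1 is a guess made for illustration, and the names `STAGE1`, `STAGE2`, and `paths` are hypothetical. Only the delays come from the slide.

```python
from itertools import product

# Delay (ps) of the logic on each mux input, keyed by the select value.
# The mapping of colours to inputs 0/1 is an assumption for illustration.
STAGE1 = {0: ("red", 200), 1: ("blue", 100)}
STAGE2 = {0: ("red", 300), 1: ("blue", 500)}


def paths(shared_select: bool):
    """List (name, total delay in ps) for every path through both muxes."""
    result = []
    for sel1, sel2 in product(STAGE1, STAGE2):
        if shared_select and sel1 != sel2:
            continue  # both muxes see the same sel, so this path is false
        (colour1, delay1), (colour2, delay2) = STAGE1[sel1], STAGE2[sel2]
        result.append((f"{colour1}->{colour2}", delay1 + delay2))
    return result


structural = paths(shared_select=False)  # what the tool sees without constraints
real = paths(shared_select=True)         # what the circuit can actually exercise
worst_structural = max(delay for _, delay in structural)
worst_real = max(delay for _, delay in real)
print(structural)
print(real)
print(worst_structural, worst_real)
```

Without a false-path constraint the tool optimizes against the 700 ps red-to-blue path, while the worst path that can actually occur is the 600 ps blue-to-blue one.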
  • 24. How to Fix a Setup Violation – Sol. 11: Optimizing the Floorplan
    • Floorplanning is the 1st step in the PNR flow and involves things like defining the chip size and boundaries, manually placing the major blocks (analog, SRAM, etc.) in the chip, and placing the chip ports.
    • Here are some of the things that affect setup in the circuit:
      o A small chip area brings the cells closer to each other and closer to the ports, which in turn reduces the wire delays. However, if the size is too small several issues appear, such as large voltage drop, cell congestion, routing detours, crosstalk, etc. [1]
      o The placement of the major blocks in the chip affects the timing. The example on the left shows how placing the SRAMs near the IO ports might block the standard cells from being placed near their relevant ports. The SRAMs also block the routing, resulting in longer wires (and longer wire delays) to go around them.
      o The placement of the ports also affects the timing. The example on the right shows how a bad placement of the ports can lead to long wire delays and buffering, which worsens $T_{comb}$.
    [Figure: Block Placement (left) and Port Placement (right)]
    [1]: We won’t discuss these issues because they are out of the scope of this document. You are advised to research these topics to get a better understanding of the slides.
  • 25. How to Fix a Setup Violation – Sol. 12: Optimizing the Wire Delay
    • In part 1 we showed how a signal propagating through an RC circuit has a delay proportional to the resistance and the capacitance. Hence, to reduce this delay we need to reduce the resistance and capacitance of the wire.
    • This also decreases the load cap of the cell that drives the wire, which speeds up the cell too.
    • Reducing the resistance, $R = \frac{\rho L}{A}$:
      1. Reducing the length $L$ of the wire reduces the delay. We showed some examples of how to reduce it using a better floorplan.
      2. Increasing the width decreases the delay. Higher metal layers have a larger default width and also a larger thickness, hence a larger cross-sectional area $A$. PNR tools use these higher layers for long and critical nets to reduce their delay. The PNR engineer can manually move the wires to higher layers during ECO, or apply non-default routing rules (NDR) on these nets to make the router route them in higher layers.
    • Reducing the capacitance, $C = \frac{\epsilon A}{d}$:
      1. Increasing the spacing $d$ by moving the two wires away from each other reduces the capacitance between them. We can apply an NDR on specific nets to tell the router not to route other nets very close to them.
      2. Reducing the common distance. When two wires run alongside each other for a long distance, the common area $A$ is large, leading to a larger capacitance. We can move one of the two wires to another layer to reduce the capacitance and hence the delay.
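The two formulas on this slide can be turned into a quick what-if calculation. This is a rough sketch, assuming a simple parallel-plate model for the coupling between two neighbouring wires; the dimensions and material constants below are illustrative placeholders rather than values from any real process, and the function names are hypothetical.

```python
def wire_resistance(rho, length, width, thickness):
    """R = rho * L / A, where A is the wire cross-section (width * thickness)."""
    return rho * length / (width * thickness)


def coupling_capacitance(eps, common_length, thickness, spacing):
    """C = eps * A / d, where A is the facing sidewall area
    (common_length * thickness) and d is the spacing between the wires."""
    return eps * common_length * thickness / spacing


# Illustrative placeholder values in SI units (not from a real technology).
RHO = 1.7e-8           # ohm * m
EPS = 3.9 * 8.854e-12  # F / m
LENGTH, WIDTH, THICKNESS, SPACING = 100e-6, 50e-9, 100e-9, 50e-9

base_r = wire_resistance(RHO, LENGTH, WIDTH, THICKNESS)
wide_r = wire_resistance(RHO, LENGTH, 2 * WIDTH, THICKNESS)
short_r = wire_resistance(RHO, LENGTH / 2, WIDTH, THICKNESS)

base_c = coupling_capacitance(EPS, LENGTH, THICKNESS, SPACING)
spaced_c = coupling_capacitance(EPS, LENGTH, THICKNESS, 2 * SPACING)
less_common_c = coupling_capacitance(EPS, LENGTH / 2, THICKNESS, SPACING)

print("R ratio with double width:", wide_r / base_r)
print("R ratio with half length:", short_r / base_r)
print("C ratio with double spacing:", spaced_c / base_c)
print("C ratio with half common length:", less_common_c / base_c)
```

In this simplified model each knob scales R or C linearly, which is why the router options on the slide (higher layers, NDR spacing, layer changes) map directly onto the terms of the two formulas.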
  • 26. How to Fix a Setup Violation – Sol. 13: Relaxing the Power Grid
    • The power grid is the metal network that delivers power from the higher metal layers down to the standard cells.
    • We showed how the wire delay is affected by things like spacing and width. A wide and compact power grid leaves few routing resources for the signal nets, leaving no room to increase their spacing or width.
    • However, relaxing the power grid increases the resistance of the power network, causing a larger voltage drop. So the PNR designer has to trade off between improving timing and fixing voltage drop.
    [Figure: Compact PG vs Relaxed PG]
  • 27. How to Fix a Setup Violation – Sol. 14: Upsizing
    • We showed in part 1 how the MOSFET size affects the propagation delay of the cell. So to fix setup we can use larger cells that have less propagation delay.
    • There are several considerations when using this method:
      o Bigger cells mean more area and power consumption.
      o Bigger cells have larger gate capacitance. This slows down the cell that drives them because it now sees a larger load capacitance. The gain from upsizing the cell should outweigh the slowdown of the driving cell.
      o Since big cells consume more power, they are likely to cause a large voltage drop on the cells around them.
      o During the ECO flow there might not be enough area to accommodate the bigger cell, which requires you to move the cells around it and then reroute the nets to their pins. Moving the cells and rerouting could worsen the timing of these cells.
    [Figure: Effect of Upsizing on the Driver and Load, Before Upsizing vs After Upsizing. The big gate cap not only increased the delay of the driver but also caused a large output transition time. The large transition time led to a larger delay for the 2nd buffer.]
  • 28. How to Fix a Setup Violation – Sol. 15: MTCMOS
    • Similarly, the threshold voltage $V_{th}$ of the MOSFET affects the propagation delay of the cell. So to fix setup we can use low-$V_{th}$ cells, which have less propagation delay, but this increases the leakage power consumption.
    • Synthesis and PNR tools allow you to apply a limit on the percentage of low-$V_{th}$ cells in your chip. Relaxing this limit leads to better overall timing [1].
    • The gain from changing the flavor (threshold) is usually less than that of upsizing the cell. However, changing the cell flavor doesn’t increase the cell area, so no moving of cells or rerouting is required. This is why changing the flavor is the first go-to method for PNR engineers during ECO.
    [Figure: Before Changing Flavor vs After Changing Flavor]
    [1]: You need to be careful when relaxing the limit because the tool might resort to the easy solution of using low-$V_{th}$ cells and ignore other optimizations in the logic and wire delay, leaving you with large leakage power consumption.
  • 29. How to Fix a Setup Violation – Sol. 16: Increasing the Driving Strength
    • When we discussed upsizing we showed that when a cell drives a large load capacitance its output transition time gets slower, which in turn slows down the load cells. Increasing the driver strength improves the transition time, which in turn improves the delay of the load cells.
    • There are several ways to increase the driving strength:
      o Upsizing the driver cell: Bigger cells produce a larger current and hence charge the load capacitance faster. This method combines the benefit of speeding up the driver by upsizing with the benefit of speeding up the load cells, because they see a better input transition time.
      o Downsizing the load cells: This decreases the load capacitance seen by the driver, which improves its propagation and transition times and in turn speeds up the load cells. However, smaller cells have larger delay, so for this method to work the gain from the improved drive should outweigh the increase in delay due to downsizing.
      o Fanout splitting: Instead of one cell driving all the fanout, we can duplicate the driver and split the fanout among the copies, as shown in the diagram. But note that the driver of the driver now sees double the load cap, which increases its delay. So you have to balance things so that the overall gain outweighs the increase in delay.
      o Side load isolation: Add a small buffer that isolates a large load from the driver. In the example shown, the driver now sees the small cap of the buffer instead of the large cap of the large NAND. This fixes the green paths but worsens the red path, because the small buffer adds a delay that increases the overall delay of the red path. For this method to work, the red path should be passing the setup check with a good margin to accommodate the increase in delay.
    [Figure: Original, Upsizing the driver, Downsizing the load, Fanout splitting, Side Load Isolation]
  • 30. How to Fix a Setup Violation – Sol. 17: Breaking up Long Nets
    • When a cell drives a very long wire with a large capacitance, it has bad propagation and transition times. If we break the long wire with buffers, the overall improvement can outweigh the delay of the added buffers.
    • If the wire is very long we can split it with an inverter pair instead of a buffer. This is better because the delay of an inverter is less than that of a buffer of the same size [1]. This way we get more cuts in the wire (less load cap for each cell) with roughly the same delay as the added buffer.
    [Figure: three versions of the same path with annotated delays.
      150 ps + 400 ps + 100 ps, total $T_{comb}$ = 650 ps;
      100 ps + 250 ps + 50 ps + 50 ps + 120 ps, total $T_{comb}$ = 570 ps;
      80 ps + 230 ps + 30 ps + 35 ps + 35 ps + 70 ps + 60 ps, total $T_{comb}$ = 540 ps]
    [1]: Buffers are basically 2 inverters connected in series.
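The totals annotated in the figure can be reproduced by summing the per-segment delays. In this sketch the labels "original", "one buffer", and "inverter pair" are our reading of which diagram is which (an assumption); only the delay values come from the slide.

```python
# Per-segment delays (ps) read from the figure. The case labels are assumed.
SEGMENTS_PS = {
    "original": [150, 400, 100],
    "one buffer": [100, 250, 50, 50, 120],
    "inverter pair": [80, 230, 30, 35, 35, 70, 60],
}

totals = {name: sum(delays) for name, delays in SEGMENTS_PS.items()}
for name, total in totals.items():
    saved = totals["original"] - total
    print(f"{name}: T_comb = {total} ps (saved {saved} ps)")
```

Each added cell contributes its own delay, so the fix only pays off because the shorter wire segments and lighter loads shrink the other terms by more than the added cells cost.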
  • 31. How to Fix a Setup Violation – Sol. 18: Register Duplication
    • By duplicating registers, the timing paths can be shortened, reducing the wire and cell propagation delays.
    • Consider the example on the right:
      o By duplicating the green register we can move each copy near one of the blue registers.
      o This first reduces the wire length between the green and blue registers, and second allows us to remove the buffers and inverter pairs on the nets. Both reduce the total combinational delay.
      o This shows that the method becomes more useful when the capture registers (the blue ones) are placed far away from each other in the chip.
      o However, FF1 now drives double the fanout, so the delay of the timing path between FF1 and FF2 is increased. We need to make sure this increase doesn’t cause that path to violate setup timing.
    • Duplication can be done manually in the RTL or automatically by the synthesis and PnR tools.
    [Figure: Before Duplication vs After Duplication]
    More details: https://community.intel.com/t5/FPGA-Wiki/Register-Duplication-for-Timing-Closure/ta-p/735917
  • 32. How to Fix a Setup Violation – Sol. 19: Reducing Crosstalk
    • When we discussed wire delays we showed that there is a capacitance between any two wires close to each other. This capacitance is called the coupling capacitance.
    • When one of the two wires switches from 0->1 or 1->0, the coupling capacitance pushes the other wire in the same direction. We call the first wire the aggressor and the second the victim.
    • If the aggressor switches at the same time as the victim and with the same polarity, the aggressor speeds up the transition of the victim. This reduces the propagation delay of the victim.
    • If the victim switches with the opposite polarity to the aggressor, its transition is slowed down, which increases the propagation delay of the victim and therefore increases $T_{comb}$.
    • To decrease the effect of crosstalk and speed up the cell delay:
      o Reduce the coupling capacitance by increasing the spacing between the wires. This combines the effect of wire delay optimization and crosstalk reduction.
      o Shield the wires of the victim net with VSS wires to block the crosstalk.
      o Downsize the aggressor cell to reduce its effect on the victim.
      o Upsize the driver of the victim so that it overcomes the aggressor’s effect.
    [Figure: Aggressor, Victim, and Driver of Victim, with rising and falling victim transitions shown with and without the crosstalk effect]
  • 33. How to Fix a Setup Violation – Sol. 19: Reducing Crosstalk
    • The image below shows an aggressor switching from 0->1 vs the victim transition.
    • We can see that the stronger the driver, the smaller the effect of the crosstalk.
    Reference: CMOS VLSI Design - https://pages.hmc.edu/harris/cmosvlsi/4e/index.html
  • 34. How to Fix a Setup Violation – Sol. 20: Local Skew
    • So far we have been discussing methods that reduce $T_{comb}$. Now we will consider the launch and capture latencies, $T_{launch\_latency}$ and $T_{capture\_latency}$.
    • The setup equation is $T_{launch\_edge} + T_{launch\_latency} + T_{comb} < T_{capture\_edge} - T_{setup} + T_{capture\_latency}$. From it we can see that decreasing $T_{launch\_latency}$ or increasing $T_{capture\_latency}$ improves setup. The difference $T_{capture\_latency} - T_{launch\_latency}$ is called the skew, and to fix a setup violation we can increase the skew.
    • To decrease the launch latency we can use any of the methods we discussed, such as upsizing, changing flavor, etc.
    • To increase the capture latency we can use the opposite of the methods we discussed, such as downsizing or changing flavor to high $V_{th}$, or we can add buffers.
    • Changing the skew to fix a timing path affects the previous and next paths:
      o The launch FF of the current timing path is the capture FF of the previous one. So if you decrease the launch latency to fix the current path, you also decrease the capture latency of the previous path, which might cause it to violate setup. The same applies to the next path, whose launch FF is the capture FF of the current one.
      o In other words, you are borrowing some of the positive slack from the previous and next paths.
      o That’s why, before changing the skew, you have to check whether the previous and next paths pass timing with a good margin.
    [Figure: Previous Path, Current Timing Path, Next Path. One FF is the launch for the current path and the capture for the previous path; the next FF is the capture for the current path and the launch for the next path.]
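The idea of borrowing slack from the neighbouring paths can be made concrete with the setup equation. This is a minimal sketch for a chain of four FFs with full-cycle paths; all numbers (period, setup time, latencies, $T_{comb}$ values) are hypothetical, and `setup_slack` and `chain_slacks` are illustrative helper names.

```python
def setup_slack(period, t_setup, launch_latency, capture_latency, t_comb):
    """Slack of the setup check
        T_launch_edge + T_launch_latency + T_comb
            < T_capture_edge - T_setup + T_capture_latency
    for a full-cycle path (launch edge at 0, capture edge at `period`)."""
    required = period - t_setup + capture_latency
    arrival = 0 + launch_latency + t_comb
    return required - arrival


def chain_slacks(latency, t_comb, period=1000, t_setup=50):
    """Setup slack (ps) of every FF[i] -> FF[i+1] path in a chain.

    latency[i] is the clock latency of FF[i]; t_comb[i] is the data delay
    between FF[i] and FF[i+1]."""
    return [
        setup_slack(period, t_setup, latency[i], latency[i + 1], t_comb[i])
        for i in range(len(t_comb))
    ]


T_COMB = [600, 1000, 600]  # previous, current, next path (hypothetical, ps)

balanced = chain_slacks([200, 200, 200, 200], T_COMB)
# Make the clock reach the launch FF of the current path 60 ps earlier
# and the capture FF of the current path 60 ps later.
skewed = chain_slacks([200, 140, 260, 200], T_COMB)

print("balanced:", balanced)
print("skewed:  ", skewed)
```

With these numbers the current path gains 120 ps of slack while the previous and next paths each lose 60 ps, so the fix is only safe because both neighbours started with a comfortable margin.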
  • 35. How to Fix a Setup Violation – Sol. 20: Local Skew
    • In general, increasing a delay is a lot easier than decreasing it because we can simply add buffers. That’s why ASIC engineers and PNR tools tend to focus on increasing the capture latency instead of decreasing the launch latency.
    • Another reason why increasing the capture latency is favored:
      o When the PnR tool builds the clock tree network, multiple FFs are usually driven by the same clock buffer. If we modify the launch clock network to fix one timing path, we also affect the other timing paths that use the same clock buffer [1].
      o This is not the case for the capture clock network, because we can add a buffer right in front of the clock pin of the FF without affecting the rest of the FFs.
    [Figure: Original; Decreasing Launch Latency (all blue FFs are affected); Increasing Capture Latency (only the 1st blue FF is affected)]
    [1]: We don’t want to affect the latencies of other timing paths because this may cause them to violate hold. More on this when we discuss hold.
  • 37. Hold Time (1)
    • The waveform below shows the timing of 2 consecutive samples (A and B) going through the FFs.
    • In order to avoid metastability, we want A to get captured and then remain stable at FF2 for an amount of time. We called this time the hold time $T_{hold}$.
    • This means we want B to arrive after the capture and hold time of A:
      Launch + Delay ≥ Capture
      Arrival ≥ Required
      $T_{launch\_edge} + T_{launch\_latency} + T_{comb} \geq T_{capture\_edge} + T_{capture\_latency} + T_{hold}$
    [Figure: waveform for FF1 and FF2 with samples A and B, annotated with $T_{launch\_edge}$, $T_{launch\_latency}$, $T_{cq}$, $T_{comb}$, $T_{capture\_edge}$, $T_{capture\_latency}$, and $T_{hold}$. Marked points: Data A arrived at FF2; Data A is getting captured; Data A is required to be stable at FF2 until the end of $T_{hold}$; Data B arrived at FF2.]
  • 38. Hold Time (2)
    • The example below violates this requirement because B arrives before A has been held stable for the necessary hold time.
    • A quick solution is to insert buffers in the combinational path to increase $T_{comb}$ and make B arrive after the required hold time.
    [Figure: two waveforms for FF1 and FF2. Violation: Data B arrives at FF2 before the point until which Data A is required to be stable ($T_{hold}$ after capture). Pass: the delay added by the buffers pushes the arrival of B past that point.]
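The hold check and the buffer-insertion fix can be sketched directly from the hold equation. The numbers below are hypothetical, `buffer_delay` assumes every inserted buffer adds the same fixed delay, and `t_comb` stands for the whole data-path delay exactly as in the slide's equation.

```python
import math


def hold_slack(launch_edge, launch_latency, t_comb,
               capture_edge, capture_latency, t_hold):
    """Slack of the hold check
        T_launch_edge + T_launch_latency + T_comb
            >= T_capture_edge + T_capture_latency + T_hold
    A negative result is a violation."""
    arrival = launch_edge + launch_latency + t_comb
    required = capture_edge + capture_latency + t_hold
    return arrival - required


def buffers_needed(slack, buffer_delay):
    """Smallest number of identical buffers that brings the slack to >= 0."""
    if slack >= 0:
        return 0
    return math.ceil(-slack / buffer_delay)


# Hypothetical full-cycle path (ps): B is launched on the edge that captures A.
T_COMB, BUFFER_DELAY = 80, 20
before = hold_slack(0, 200, T_COMB, 0, 300, 30)
count = buffers_needed(before, BUFFER_DELAY)
after = hold_slack(0, 200, T_COMB + count * BUFFER_DELAY, 0, 300, 30)
print(before, count, after)
```

The same added delay also counts against the setup check of this path, so in practice the buffer count is kept as small as the hold requirement allows.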
  • 39. Hold Time (3)
    • We also don’t want A to be captured by an earlier edge, as this would break the functionality.
      Launch + Delay ≥ Capture
      Arrival ≥ Required
      $T_{launch\_edge} + T_{launch\_latency} + T_{comb} \geq T_{capture\_edge} + T_{capture\_latency} + T_{hold}$
    [Figure: waveform for FF1 and FF2 with sample A, annotated with $T_{launch\_edge}$, $T_{launch\_latency}$, $T_{cq}$, $T_{comb}$, $T_{capture\_latency}$, $T_{hold}$, the intended capture edge, and the capture edge before it. Marked points: Data A arrived at FF2; Data A should get captured at the intended edge; Data A is required to arrive after $T_{hold}$ of the earlier edge [1].]
    [1]: Not only does A need to come after the earlier edge, it also needs to come after the hold time of that edge or it will cause metastability.
  • 40. Example Timing Report
    $T_{launch\_edge} + T_{launch\_latency} + T_{comb} \geq T_{capture\_edge} + T_{capture\_latency} + T_{hold}$
    [Figure: example hold timing report with its rows annotated. Launch: $T_{launch\_edge}$, $T_{launch\_latency}$. Delay: $T_{cq}$, $T_{comb}$. Capture: $T_{capture\_edge}$, $T_{capture\_latency}$, $T_{hold}$.]
    Reference: Advanced HDL Synthesis and SOC Prototyping: RTL Design Using Verilog | SpringerLink
  • 41. Hold Time
    • Like setup, the hold timing path can be full cycle, half cycle, multi cycle, or multi clock.
    • We consider the edge where A is captured and the edge where B (the next data) is launched, because B is what will overwrite A. The red arrows in the waveforms show the launch-capture edge pairs.
    • If there are more launch-capture combinations, as in the case of a multi clock path, the STA tool considers the worst of them.
    • As with setup, we just plug different values for the clock edges into the hold equation and the concepts remain unchanged [1].
    [Figure: waveforms for Full Cycle Path, Half Cycle Path, Multi Cycle Path, and Multi Clock Path, each marking Launch of A, Capture of A, and Launch of B; the Multi Clock Path shows two alternatives, one of which has another capture of A]
    [1]: A common mistake is to say that hold is not affected by the clock period. This is only true for full and multi cycle paths, where the launch and capture edges occur at the same time. But since full cycle paths are the most common type of path and also the most susceptible to violation, engineers generalize and say hold is not affected by the clock period.
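The footnote about the clock period can be illustrated by plugging edges into the hold equation. This is a sketch under assumptions: the half cycle path is taken to launch on the rising edge and capture on the falling edge of a 50% duty-cycle clock, clock latencies are zero, and the delay values are hypothetical.

```python
def hold_slack(launch_edge, capture_edge, t_comb, t_hold,
               launch_latency=0, capture_latency=0):
    """Arrival minus required time for the hold check (negative = violation)."""
    arrival = launch_edge + launch_latency + t_comb
    required = capture_edge + capture_latency + t_hold
    return arrival - required


def full_cycle_hold(period, t_comb, t_hold):
    # A is captured at `period`; B is launched by that same edge.
    return hold_slack(period, period, t_comb, t_hold)


def half_cycle_hold(period, t_comb, t_hold):
    # A is launched at 0 and captured at period / 2 (assumed falling edge);
    # B is launched at the next rising edge, at `period`.
    return hold_slack(period, period / 2, t_comb, t_hold)


T_COMB, T_HOLD = 20, 50  # hypothetical values in ps
for period in (1000, 40):
    print(period,
          full_cycle_hold(period, T_COMB, T_HOLD),
          half_cycle_hold(period, T_COMB, T_HOLD))
```

The full cycle result is the same for both periods because the launch and capture edges coincide, while the half cycle result carries an extra half period of margin that shrinks as the clock gets faster.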
  • 42. Hold Time
    • We also don’t want A to be captured by an earlier edge.
    • So we should also check hold between the launch of A and the capture edge that comes before A’s intended capture edge.
    • Now we know all the launch-capture combinations, and the tool will consider the worst of them [1].
    [Figure: waveforms for Full Cycle Path, Half Cycle Path, Multi Cycle Path, and Multi Clock Path, each marking Launch of A, Capture of A, and Launch of B; the Multi Clock Path shows two alternatives, one of which has another capture of A]
    [1]: Timing Analyzer Example: Clock Analysis Equations | Intel.
  • 43. How to Fix a Hold Violation
    Setup: $T_{launch\_edge} + T_{launch\_latency} + T_{comb} < T_{capture\_edge} - T_{setup} + T_{capture\_latency}$
    Hold: $T_{launch\_edge} + T_{launch\_latency} + T_{comb} \geq T_{capture\_edge} + T_{capture\_latency} + T_{hold}$
    • By comparing the setup equation with the hold equation, we find that fixing hold violations requires the opposite of the methods we discussed for setup.
    • Instead of decreasing $T_{comb}$ we try to increase it by adding buffers, increasing wire delay, downsizing, etc. And instead of increasing the capture latency or decreasing the launch latency, we do the opposite.
    • This shows that hold conflicts with setup, and fixing hold may worsen setup.
    • We showed earlier that increasing a delay is easier than decreasing it. This means that fixing hold is generally easier than fixing setup.
    • This is why setup has priority over hold. Hold is only considered in the PNR step, and fixing hold violations starts when all setup violations are fixed [1].
    [1]: Hold is still monitored across the PNR stages, and while we focus more on setup we make sure hold is solvable and under control.
  • 44. How to Fix a Hold Violation
    • Consider the example below:
      o The STA engineer sees two violations, setup and hold, both with the same startpoint and endpoint. The engineer tries adding buffers in front of FF2 to fix hold, but setup gets worse; then tries to fix setup by changing flavor, but hold gets worse. It seems we have reached a dead end.
      o If we investigate the violations in depth, we can see there are two paths between the two FFs: the upper long one, which violates setup, and the lower short one (blue), which violates hold.
      o So, to fix the setup violation we can change the flavor of the cells in the upper path only, and to fix hold we can add buffers along the lower blue path only.
    • This example shows that some hold violations can be tricky and need a deeper look into the timing path.
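The dead end in this example comes from the fact that setup is limited by the slowest path between the two FFs while hold is limited by the fastest one. The sketch below assumes a full-cycle check with zero clock latencies; the path delays, the period, and the sizes of the fixes are all hypothetical.

```python
def check(paths_ps, period, t_setup, t_hold):
    """Return (setup_ok, hold_ok) for a startpoint/endpoint pair connected by
    several paths. Setup is checked on the slowest path and hold on the
    fastest one (full-cycle check, zero clock latencies assumed)."""
    setup_ok = max(paths_ps.values()) < period - t_setup
    hold_ok = min(paths_ps.values()) >= t_hold
    return setup_ok, hold_ok


PERIOD, T_SETUP, T_HOLD = 1000, 100, 80  # hypothetical values in ps
original = {"upper": 950, "lower": 40}

# Buffers in front of FF2 sit on both paths: hold is fixed, setup gets worse.
shared_buffers = {name: delay + 50 for name, delay in original.items()}

# Targeted fix: faster flavor on the upper path, buffers on the lower path.
targeted = {"upper": original["upper"] - 100, "lower": original["lower"] + 50}

for label, paths in (("original", original),
                     ("shared buffers", shared_buffers),
                     ("targeted", targeted)):
    print(label, paths, check(paths, PERIOD, T_SETUP, T_HOLD))
```

Any change applied to cells shared by both paths moves the slowest and fastest delays together, which is why only fixes placed on the non-shared part of each path can close both checks.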