Implementation: CTS/CCD/MSCTS
Version W-2024.09
Fusion Compiler / IC Compiler II Update
Training
© Synopsys, Inc. All Rights Reserved
Synopsys Confidential Information © Synopsys, Inc.
Confidential Information
CONFIDENTIAL INFORMATION
The information contained in this presentation is the confidential and proprietary
information of Synopsys. You are not permitted to disseminate or use any of
the information provided to you in this presentation outside of Synopsys without
prior written authorization.
IMPORTANT NOTICE
In the event information in this presentation reflects Synopsys’ future plans, such
plans are as of the date of this presentation and are subject to change. Synopsys is
not obligated to update this presentation or develop the products with the features
and functionality discussed in this presentation. Additionally, Synopsys’ services and
products may only be offered and purchased pursuant to an authorized quote and
purchase order or a mutually agreed upon written contract with Synopsys.
Agenda
Clock Tree Synthesis (CTS) Enhancements
• Pre-CTS Latency Bottleneck Reporting (GA)
• Streamlined CTS Phase 2 (GA)
• CTS Debuggability and Log file Enhancements (GA)
• Early CTS Flow (GA)
• Clock Power Recovery (GA)
Pre-CTS Latency Bottleneck Reporting
• Overview
• Solution Description
• User Interface
Pre-CTS Latency Bottleneck Reporting
• Efficient latency debugging is a capability that customers frequently request.
• Previously, the longest critical paths could be identified only after CTS, using the report_clock_qor command, or during CTS, using the cts.compile.report_latency_bottleneck application option.
• With that option, the latency bottleneck analysis is printed in the CTS log file for the longest path per clock, in a separate CTS step.
• Starting with the W-2024.09 release, you can identify the longest paths before running CTS by using the report_estimated_clock_latency command. It is a standalone command that can be used independently of the synthesize_clock_trees command.
• The report does not reflect the ICG relocation that the synthesize_clock_trees command performs to improve latency.
• The report format is similar to that of the existing latency bottleneck analysis step.
Overview
Pre-CTS Latency Bottleneck Reporting
• Reporting pre-CTS latency bottleneck analysis:
  o By default, the analysis reports the single longest path per clock for the primary corner.
  o The command also supports the following options:
    -clocks → reports the longest paths for the selected clocks only
    -corner → reports the longest path in the selected corner
    -longest n → reports the n longest paths
    -through <driver> → reports the longest path through a particular driver
    -unique_gate_levels k → reports the longest path with k unique driver levels
Solution Description
Pre-CTS Latency Bottleneck Reporting
Solution Description
fc_shell> report_estimated_clock_latency
****************************************
Report : Estimated Clock Latency
****************************************
Latency Bottleneck Paths for clock clk in mode func at root clk:
Longest path 1:
(0) clk [Location: (0.17, 15.86)] [Fanout: 3] [Delay: 0.000000]
(1) U1/GCP [Location: (7.15, 17.75)] [Fanout: 2] [Delay: 0.001116]
(2) U3/GCP [Location: (11.44, 15.15)] [Fanout: 2] [Delay: 0.026665]
(3) U7/CP [Location: (10.99, 13.44)] [SINK PIN] [Delay: 0.051563]
fc_shell> report_estimated_clock_latency -longest 2
****************************************
Report : Estimated Clock Latency
-longest 2
****************************************
Latency Bottleneck Paths for clock clk in mode func at root clk:
Longest path 1:
(0) clk [Location: (0.17, 15.86)] [Fanout: 3] [Delay: 0.000000]
(1) U1/GCP [Location: (7.15, 17.75)] [Fanout: 2] [Delay: 0.001116]
(2) U3/GCP [Location: (11.44, 15.15)] [Fanout: 2] [Delay: 0.026665]
(3) U7/CP [Location: (10.99, 13.44)] [SINK PIN] [Delay: 0.051563]
Longest path 2:
(0) clk [Location: (0.17, 15.86)] [Fanout: 3] [Delay: 0.000000]
(1) U1/GCP [Location: (7.15, 17.75)] [Fanout: 2] [Delay: 0.001116]
(2) U3/GCP [Location: (11.44, 15.15)] [Fanout: 2] [Delay: 0.026665]
(3) U8/CP [Location: (13.08, 15.03)] [SINK PIN] [Delay: 0.051507]
Pre-CTS Latency Bottleneck Reporting
Solution Description
fc_shell> report_estimated_clock_latency -corner C1
****************************************
Report : Estimated Clock Latency
-corner C1
****************************************
Latency Bottleneck Paths for clock clk in mode func at root clk:
Longest path 1:
(0) clk [Location: (0.17, 15.86)] [Fanout: 3] [Delay: 0.000000]
(1) U1/GCP [Location: (7.15, 17.75)] [Fanout: 2] [Delay: 0.001116]
(2) U3/GCP [Location: (11.44, 15.15)] [Fanout: 2] [Delay: 0.026665]
(3) U7/CP [Location: (10.99, 13.44)] [SINK PIN] [Delay: 0.051563]
fc_shell> report_estimated_clock_latency -longest 1 -unique_gate_levels 1
****************************************
Report : Estimated Clock Latency
-longest 1
-unique_gate_levels 1
****************************************
Latency Bottleneck Paths for clock clk in mode func at root clk:
Longest path 1:
(0) clk [Location: (0.17, 15.86)] [Fanout: 3] [Delay: 0.000000]
(1) U1/GCP [Location: (7.15, 17.75)] [Fanout: 2] [Delay: 0.001116]
(2) U3/GCP [Location: (11.44, 15.15)] [Fanout: 2] [Delay: 0.026665]
(3) U7/CP [Location: (10.99, 13.44)] [SINK PIN] [Delay: 0.051563]
Pre-CTS Latency Bottleneck Reporting
User Interface
fc_shell> report_estimated_clock_latency -help
Usage: report_estimated_clock_latency    # Report estimated clock latency
    [-clocks object_list]    (List of clocks)
    [-longest longest]    (Number of longest paths)
    [-corner corner]    (Corner for reporting)
    [-through through_pin]    (Through pin)
    [-unique_gate_levels unique_gate_levels]    (Number of unique gate levels)
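As a sketch of how the options in the usage above might be combined: the clock name clk, corner C1, and pin U1/GCP are borrowed from the sample reports in this section and are illustrative only. It is an assumption that these options can be combined freely and that -through accepts a pin collection.

```tcl
# Two longest paths of clock clk in corner C1 (names are illustrative)
report_estimated_clock_latency -clocks [get_clocks clk] -corner C1 -longest 2

# Longest path through one specific driver pin (assumed to accept a pin collection)
report_estimated_clock_latency -through [get_pins U1/GCP]
```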
Streamlined CTS Phase 2
• Overview
• Solution Description
• User Interface
Streamlined CTS Phase 2
• Phase 1 was delivered in the V-2023.12-SP3 release: the tool uses fast cells only for the critical sinks associated with the latency-critical driver and adds slow cells for noncritical sinks. Area and power are recovered during the Compile CTS stage using path-based criticality buffering. Using fast cells for critical sinks also improves latency.
• The W-2024.09 release adds two enhancements as part of phase 2.
• Previously, cell selection for clustering differed from cell selection for buffering. In the W-2024.09 release, cell selection is matched for on-route buffering (ORB) and buffering. The improved correlation between CTS clustering and buffering reduces clustering imbalance and also improves area, power, and logical DRC.
• The balanced topology generator for clock tree synthesis uses a new linear-programming-based solution that replaces the previous algorithm. Additional postprocessing of the balanced topology further improves wire length and skew.
Overview & Solution Description
Streamlined CTS Phase 2
User Interface
• To enable these features in the W-2024.09 release, set the following application options:
set_app_options -name cts.compile.align_clustering_and_buffering -value true
set_app_options -name cts.compile.balanced_topology_enhancements -value true
CTS Debuggability and Log file Enhancements
• Overview
• Solution Description
• User Interface
CTS Debuggability and Log file Enhancements
• The W-2024.09 release adds several enhancements to the CTS reporting commands and the CTS log file:
  o report_clock_qor prints a total skew metric
  o check_clock_trees flags auto exception generation points
  o Summary tables are printed before and after each step in MTCTO
  o get_clock_tree_pins reports all the pins above the MSCTS driver using the is_mscts_global_driver_object attribute
  o A new attribute, is_on_boundary_register, returns all CCD boundary pins
Overview
CTS Debuggability and Log file Enhancements
• Feature #1: report_clock_qor prints the total skew metric
  o Previously, there was no way to report the total skew value through the report_clock_qor command
  o This metric is particularly needed when using the total skew optimization enhancement during MTCTO
  o Starting with the W-2024.09 release, the report_clock_qor -type latency command has a new column called Total Skew
  o The skew is calculated as (median latency of the clock - path delay), and the total skew is the sum of the skews to the endpoints, provided it is greater than the target skew that is set
  o Expectation: a total skew section appears after the clock QoR summary is reported
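One possible reading of the total skew definition above, written out as a formula. The symbols are introduced here for illustration and do not come from the tool: $\tilde{L}$ is the median latency of the clock, $d_i$ is the path delay to endpoint $i$, and $T$ is the target skew. The source does not state whether magnitudes are used or whether the threshold applies per endpoint or to the sum, so both choices below are assumptions:

```latex
s_i = \tilde{L} - d_i,
\qquad
S_{\text{total}} = \sum_{i \,:\, |s_i| > T} |s_i|
```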
Solution Description
CTS Debuggability and Log file Enhancements
• Feature #2: check_clock_trees flags auto exception generation points
  o Auto exceptions are handled in CTS during the Clock Tree Initialization step, and a file named clock_auto_exceptions*.tcl is dumped in the current working directory
  o To make the auto exceptions derived during CTS easier to understand, the check_clock_trees command is enhanced to report them beforehand
  o The command dumps a file named check_clock_trees_clock_auto_exceptions*.tcl in the current working directory
  o Running the command does not change the current database
  o The command reports the following exceptions: internal pins, loop-breaking pins, conflict pins, and split pins
  o Expectation: a check_clock_trees_clock_auto_exceptions*.tcl file appears in the current working directory and lists the cases above, if any
Solution Description
#conflict pins
set_clock_tree_balance_point -consider_for_balancing false -clock [get_clocks $clock -mode $mode] -balance_points $balance_point
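A minimal sketch of how the dumped file might be located after running the check. Only check_clock_trees and the file naming convention come from the description above; the loop uses plain Tcl glob, and printing the file names is an illustrative choice.

```tcl
# Run the check; per the description above, this does not change the database
check_clock_trees

# List the auto-exception files dumped in the current working directory
foreach f [glob -nocomplain check_clock_trees_clock_auto_exceptions*.tcl] {
    puts "Auto exceptions file: $f"
}
```

Reviewing the listed file before CTS shows which pins would receive auto exceptions without committing anything to the design.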
CTS Debuggability and Log file Enhancements
• Feature #3: Print summary tables before and after each step in MTCTO
  o Currently, CTS-037 messages are printed for each step in MTCTO, showing the QoR before and after each stage
  o Starting with the W-2024.09 release, summary tables are printed before and after each step in MTCTO, giving users a clearer view of clock QoR
  o The tables are printed for each clock, corner, and mode
  o Expectation: QoR summary tables are printed before and after each step in MTCTO
Solution Description
[Figure callouts: Summary table; Total scenarios]
CTS Debuggability and Log file Enhancements
• Feature #4: get_clock_tree_pins reports all the pins above the MSCTS driver through an attribute
  o All MSCTS subtree drivers can already be obtained with the is_mscts_subtree_driver_object attribute
  o However, to get the H-tree drivers, the user had to use a double filter: get_clock_tree_pins -to [get_clock_tree_pins -filter is_mscts_subtree_driver_object]
  o Starting with the W-2024.09 release, the is_mscts_global_driver_object attribute reports all the H-tree drivers
  o This feature also works for multitap driver setups
  o Expectation: the tap drivers are reported with the attribute
Solution Description
fc_shell> get_clock_tree_pins -clock $clock -filter is_mscts_global_driver_object
{MSCTS_htree_0_0/Y MSCTS_htree_0_1/Y MSCTS_htree_0_2/Y}
CTS Debuggability and Log file Enhancements
• Feature #5: is_on_boundary_register attribute to get all boundary pins
  o Previously, users had to identify the CCD boundary pins manually, which is not always feasible
  o Starting with the W-2024.09 release, the is_on_boundary_register attribute reports all the CCD boundary pins in the design
  o Expectation: the CCD boundary register pins are reported when the attribute is used as a filter
Solution Description
fc_shell> get_clock_tree_pins -filter is_on_boundary_register
{reg_0/CLK reg_34/CLK reg_40/CLK}
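The attribute filters from Features #4 and #5 could be used in a script along these lines. This is a sketch: the variable names are hypothetical, $clock is assumed to be set by the user, and the collection utilities (sizeof_collection, foreach_in_collection, get_object_name) are the standard shell collection commands.

```tcl
# H-tree (global) driver pins of one clock; $clock is assumed to be defined
set htree_pins [get_clock_tree_pins -clock $clock -filter is_mscts_global_driver_object]
puts "H-tree driver pins: [sizeof_collection $htree_pins]"

# CCD boundary register pins, printed one per line
set boundary_pins [get_clock_tree_pins -filter is_on_boundary_register]
foreach_in_collection p $boundary_pins {
    puts [get_object_name $p]
}
```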
CTS Debuggability and Log file Enhancements
User Interface
Feature | UI
report_clock_qor prints the total skew value | set_app_options -name cts.report.report_clock_qor_total_skew -value true
check_clock_trees flags auto exception generation points | check_clock_trees
Print summary tables before and after each step in MTCTO | set_app_options -name cts.optimize.print_qor_summary_table -value true
Balance point check | check_clock_trees
get_clock_tree_pins reports all the pins above the MSCTS driver with an attribute | get_clock_tree_pins -filter is_mscts_global_driver_object
Attribute "is_on_boundary_register" to get all boundary pins | get_clock_tree_pins -filter is_on_boundary_register
Early CTS Flow
• Introduction
• Solution Description
• Expectation
• User Interface
• Setup Instructions
Early CTS Flow
• The regular clock tree synthesis (CTS) flow builds and optimizes clock trees only during the clock_opt build_clock stage (after compile_fusion or place_opt).
• In the conventional flow, delay optimization and CCD in the compile stage do not see actual clock tree effects; they optimize based only on the ideal clock tree.
  o Actual clock tree skews are visible only after the real clock trees are built and propagated
  o OCV analysis takes effect only when clock trees are built
• This leads to miscorrelation in design timing QoR between the stages before and after the clock tree is built.
• To overcome this miscorrelation, users tend to apply workarounds:
  o Workaround: apply a higher clock uncertainty at the ideal clock stage to model the clock skews, and add some margin to model OCV effects at that stage
  o Caveat: this poses two difficulties: (a) the user does not know what exact uncertainty values to apply at the pre-CTS stage, and (b) the design can be over-constrained, leading to power and area degradation
Introduction
Early CTS Flow
• To overcome these issues, starting with the W-2024.09 release, the tool can perform one round of clock tree building during the compile/place stage itself. This is known as the Early CTS flow.
• Early CTS makes the actual clock tree effects (skew and OCV) visible to data-path optimization and CCD during the compile stage.
• Essentially, this makes timing optimization in the compile stage more clock tree aware.
Introduction
Early CTS Flow
• With Early CTS, clock tree synthesis happens twice in the flow. The first pass is in compile_fusion/place_opt final_place:
o CTS requires the placement to be finalized. Early CTS in compile_fusion/place_opt happens at the end of final_place, after the direct timing-driven placement (DTDP).
o Initial clock trees are built and propagated, followed by MTCTO-based global latency and skew optimization.
o Clock trees are also global routed.
o Any long data-path nets created by CTS clock cell relocation are addressed during the logical DRC (LDRC) fixing steps in compile_fusion/place_opt final_opto.
o With the propagated clock tree built, global routed, and optimized for skew, compile_fusion/place_opt final_opto sees the propagated timing picture.
o As a result, delay optimization and offset derivation by CUS in compile_fusion/place_opt final_opto are more realistic, leading to better QoR convergence down the flow.
Solution Description
Early CTS Flow
• CCD during compile_fusion and place_opt final_opto annotates useful skew offsets and adjustments as set_clock_latency offsets (with the recommended options), even in propagated mode, so that timing and optimization understand the CCD offsets.
• With Early CTS, clock tree synthesis happens twice in the flow. The second pass is in clock_opt build_clock:
  o It completely removes the clock trees built by early CTS and rebuilds the clock tree, guided by the balance points derived by CCD during compile_fusion final_opto
  o CCD in clock_opt build_clock incrementally optimizes the timing QoR
Solution Description
Early CTS Flow
• compile_fusion/place_opt:
  o Timing looks worse after early CTS, but this only reflects clock realism being brought earlier into the flow
  o Similarly, area and power appear worse because of the added clock tree buffers
  o The compile_fusion/place_opt runtime increases because of early CTS
• clock_opt or end of flow:
  o With more accurate timing driving compile_fusion/place_opt final_opto, timing QoR (WNS/TNS) should converge better as the flow progresses from CTS to clock_opt final_opto
  o Skews should be less hurtful and more helpful, because CUS in compile_fusion/place_opt final_opto sees a better timing context
  o Because the propagated timing picture is visible up front, better power can also be expected by the end of the flow for designs with neutral timing QoR
  o The runtime overhead of trial CTS should be partially offset by faster timing convergence through clock_opt
Expectation
Early CTS Flow
User Interface
• In the W-2024.09 release, the Early CTS flow can be enabled during the compile_fusion/place_opt stage by using the following application options:
For place_opt flow:
set_app_options -list {place_opt.flow.trial_clock_tree true}
For compile_fusion flow:
set_app_options -list {compile.flow.trial_clock_tree true}
Early CTS Flow
• Apply all the necessary clock tree setup and settings before the compile_fusion/place_opt final_place stage so that trial CTS correlates with build_clock:
  o CTS and CTO application options and settings, including skew and latency targets, cell spacing rules, skew group settings, and clock balance group settings
  o Clock NDRs, constraints such as max_transition/capacitance/fanout/net_length, and the clock lib-cell reference list
  o Any custom user proc that runs before CTS and could affect its execution
• Keep the active scenarios the same from the compile_fusion/place_opt final_place stage through clock_opt final_opto.
• Reduce the clock uncertainty after compile_fusion/place_opt final_place to account for the existence of the trial clock tree.
  o We recommend using the same uncertainty as post-CTS
Setup Instructions (1/3)
Early CTS Flow
• Set these application options in compile_fusion/place_opt:
set_app_options -list {place_opt.flow.trial_clock_tree true} (or)
set_app_options -list {compile.flow.trial_clock_tree true}
set_app_options -list {npo.enable_ects_flow true}; #OBD since D20240830
set_app_options -list {cts.common.disable_ccs_rcv_cap_in_trial false}
set_app_options -list {time.enable_offset_latency_computation_in_propagated_clocks true}
• Set these in both compile_fusion/place_opt and clock_opt, for both baseline and early CTS flows:
set_app_options -as_user_default -name cts.optimize.improvement_mode_version -value EIM_20240330
set_app_options -name cts.optimize.enable_improvement_mode -value skew
Setup Instructions (2/3)
Early CTS Flow
• Set this for clock_opt build_clock in the early CTS flow:
set_app_options -list {cts.compile.enable_cell_relocation none}
• Override this setting only during compile_fusion/place_opt final_place, and restore the previous value after early CTS is done:
set_app_options -list {cts.compile.power_opt_mode none}
• Additionally, the following settings are recommended for CCD in clock_opt, for both baseline and early CTS flows:
set_app_options -list {ccd.max_prepone_postpone_consider_skew_latency true}
set_app_options -list {ccd.max_prepone_postpone_consider_corner_scaling true}
set_app_options -list {ccd.enable_hyper_ccd true}
• Reset the CCD max prepone/postpone limits from CTS so that CCD can use the best possible scope in both baseline and early CTS flows
Setup Instructions (3/3)
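The setup instructions above might be assembled into a compile_fusion-based script along the following lines. This is a sketch under stated assumptions, not a reference flow: the application options are the ones listed in the setup instructions, while the way compile_fusion is split with -from/-to, the use of get_app_option_value to save the previous value, and the variable name saved_mode are assumptions made for illustration.

```tcl
# Enable trial (early) CTS and its companion settings (from Setup 2/3)
set_app_options -list {compile.flow.trial_clock_tree true}
set_app_options -list {cts.common.disable_ccs_rcv_cap_in_trial false}
set_app_options -list {time.enable_offset_latency_computation_in_propagated_clocks true}

# Settings used for both baseline and early CTS flows (from Setup 2/3)
set_app_options -as_user_default -name cts.optimize.improvement_mode_version -value EIM_20240330
set_app_options -name cts.optimize.enable_improvement_mode -value skew

# Assumed stage split: run up to the stage before final_place
compile_fusion -to initial_opto

# Override power_opt_mode only for final_place, then restore it (Setup 3/3)
set saved_mode [get_app_option_value -name cts.compile.power_opt_mode]
set_app_options -list {cts.compile.power_opt_mode none}
compile_fusion -from final_place -to final_place
set_app_options -name cts.compile.power_opt_mode -value $saved_mode

# Reduce clock uncertainty here to the post-CTS value (Setup 1/3), then continue
compile_fusion -from final_opto

# Before clock_opt build_clock in the early CTS flow (Setup 3/3)
set_app_options -list {cts.compile.enable_cell_relocation none}
```

Saving and restoring the option value, rather than hard-coding the restore value, keeps the script correct when the design uses a non-default power_opt_mode.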
Clock Power Recovery
• Overview
• User Interface
Clock Power Recovery
• The clock tree optimization (CTO) engine currently uses logical DRC, latency, skew, and area as cost functions to improve clock QoR
• Requests to reduce power at the final CTO step have been increasing
• A new step, CTS STEP: Power optimization, is introduced after the Area Recovery step and before the Final DRC Fixing step
  o It introduces power as a cost function for optimization at this step
  o It performs buffer removal and sizing to improve total power over area
• Expectation
  o Clock power should improve
  o There must be no degradation in skew, latency, or logical DRC
Overview
Clock Power Recovery
User Interface
• Use the following application option to enable the clock power recovery feature:
set_app_options -list {cts.optimize.enable_cto_power_optimization true}
Agenda
Concurrent Clock and Data Optimization (CCD) Enhancements
• Data Path Aware CUS for Timing (GA)
• Data Path Aware CUS for Power (GA)
• Switching Power-Aware Pre-CTS CCD (GA)
• Fast CUS (GA)
• Clock DRC Improvements (GA)
Data Path Aware CUS for Timing
• Overview
• Solution Description
• User Interface
Data Path Aware CUS for Timing
• The pre-CTS CCD engine currently in the flow makes sequential calls to CCD and data optimization to achieve better timing QoR.
• It derives offsets by considering the available prepone and postpone scope and the scope of implementability. However, the offsets derived by CCD in the compile stage are not data path aware.
• To improve the concurrency of the engine, this feature makes both the derived CCD offsets and the accepted solutions data path aware.
• In the current flow, once a timing path is optimized to zero slack, it is not disturbed further to lend slack to its fanin and fanout paths, even if those paths are timing critical. The extra timing potential of the zero-slack path is left unused, which limits the design frequency.
• This feature improves the design frequency by over-optimizing paths that are easily closed or have positive slack, which provides extra borrowable slack for the frequency-limiting paths.
Overview and Benefits
Data Path Aware CUS for Timing
• Utilizing the unused data path potential:
  o A slack margin is applied on endpoints with zero or positive slack, making the slack more positive
  o One round of useful skew computation is run, which borrows slack using the extra positive slack
  o The slack margins applied on the endpoints are removed, which makes the paths that lent slack violate timing
  o Data path optimization is called to recover timing on these violating endpoints
  o An extra call of useful skew computation is done at the end to recover clock tree power (cus-dpae2)
• This feature affects the final calls of CUS during the place_opt/compile final_opto stage.
Solution Description
[Figure: a three-flop pipeline (FF1, FF2, FF3) shown at successive steps: path margin applied, CUS, and data optimization (Data Opto)]
Data Path Aware CUS for Timing
• Delay potential and path margin computation:
  o The path margin is applied only to endpoints whose paths have available delay potential, not to all endpoints
  o For every endpoint, the initial slack is stored across scenarios. Based on the delays and the sizeability of the logic cones, incremental arc delays are annotated on the timer, and a timer update is performed
  o If the new slack degrades, zero delay potential is assumed
  o Otherwise, the difference is applied as the predicted delay potential
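The last two rules above can be summarized in one expression. The symbols are introduced here for illustration: $s_e^{\text{init}}$ is the stored initial slack of endpoint $e$, $s_e^{\text{new}}$ is its slack after the incremental arc delays are annotated and the timer is updated, and $\Delta_e$ is the predicted delay potential. How values are combined across scenarios is not stated in the source and is left out of this sketch:

```latex
\Delta_e = \max\bigl(0,\; s_e^{\text{new}} - s_e^{\text{init}}\bigr)
```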
Solution Description
Data Path Aware CUS for Timing
User Interface
• Data Path Aware CUS for Timing, with dynamic delay potential computation, can be enabled in the place_opt and compile_fusion flows using the following new public application options:
  o For compile_fusion flow:
set_app_options -list {compile.flow.enable_dpa_ccd_timing_with_delay_potential true}
  o For place_opt flow:
set_app_options -list {place_opt.flow.enable_dpa_ccd_timing_with_delay_potential true}
Data Path Aware CUS for Power
• Overview
• Solution Description
• User Interface
Data Path Aware CUS for Power
• The current CCD-based place_opt/compile_fusion flow makes sequential calls to CCD and data optimization, where CCD derives offsets by considering the available prepone/postpone scope and the scope of implementability.
• However, the offsets derived by CCD in the compile stage are not data path aware.
• In the current flow, endpoints that have zero or slightly positive slack after data power optimization are blocked from further power reduction to avoid creating new timing criticality.
• In addition, positive slack on paths with no further power potential is left unused throughout the flow.
• This feature uses that timing potential on neighboring paths to recover power in the parts of the design that have power potential, making the flow more concurrent.
Overview & Benefits
Data Path Aware CUS for Power
• Data Path Aware CUS flow for power:
  o Power recovery is done by borrowing slack from pipeline stages where the power potential is zero and giving it to stages that have power potential
  o This borrowing applies only to logic cones with positive power potential
  o The slack transfer is achieved by tuning the clock latencies (preponing and postponing) that control the launch and capture arrival times
  o Once the slack is transferred, an extra data path optimization pass recovers power in the path with power potential
  o Finally, offsets that do not help power optimization are pruned
• This feature affects the final calls of CUS in the place_opt final_opto flow.
Solution Description
[Figure: a three-flop pipeline (FF1, FF2, FF3). One path has power potential but no timing scope; the neighboring path has no power potential but has timing scope. CUS postponing transfers the slack, and data power optimization then uses the power potential.]
Data Path Aware CUS for Power
User Interface
• Data Path Aware CUS for Power can be enabled in the W-2024.09 release using the following application options:
  o For compile_fusion flow:
set_app_options -list {compile.flow.enable_dpa_ccd_power true}
  o For place_opt flow:
set_app_options -list {place_opt.flow.enable_dpa_ccd_power true}
Switching Power-Aware Pre-CTS CCD
• Overview
• Solution Description
• User Interface
Switching Power-Aware Pre-CTS CCD
• Overview
  o Currently, CUS uses a clock tree model that cannot fully capture more complex design characteristics, power being one of them.
  o In the W-2024.09 release, this feature introduces a new cost model that optimizes clock tree power during useful skew optimization by leveraging the switching activity of the clock tree.
  o The new clock tree cost formulation adds a power-aware cost function: it derives higher offset values on loads placed on nets with higher switching activity.
• Benefits
  o Improved clock tree dynamic power from placement through clock tree synthesis and beyond, without any runtime degradation.
  o Improved clock power and timing QoR.
Overview & Benefits
Switching Power-Aware Pre-CTS CCD
• A new total clock tree power estimation is introduced in CUS to balance the leakage and dynamic power costs.
• The new estimation approximates power in two phases:
  o The first phase estimates clock gate power by accumulating the leakage and dynamic power of every clock gate; the second phase estimates the power of repeaters.
  o For the second phase, a representative repeater power is chosen as the average across all repeater library cells. The repeater power contribution is the representative dynamic and leakage power multiplied by the estimated repeater count per net, which is calculated from the maximum fanout constraint.
  o Once dynamic and leakage power are estimated, the clock tree cost is scaled by the ratio of the estimated dynamic and leakage power.
• For this feature to be effective, at least one power scenario (leakage or dynamic) is required.
• Power awareness is considered with the same weight in every call of CUS in the compile stage.
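The two-phase estimate described above can be written out as follows. All symbols are introduced here for illustration: $G$ is the set of clock gates, $R$ the set of repeater library cells, and $N_n$ the estimated repeater count on net $n$. The source states only that $N_n$ is derived from the maximum fanout constraint, so no formula for it is assumed here:

```latex
\begin{aligned}
P_{\text{gates}} &= \sum_{g \in G} \bigl(P^{\text{leak}}_g + P^{\text{dyn}}_g\bigr) \\
\bar{P}^{\text{leak}}_{\text{rep}} &= \frac{1}{|R|} \sum_{r \in R} P^{\text{leak}}_r,
\qquad
\bar{P}^{\text{dyn}}_{\text{rep}} = \frac{1}{|R|} \sum_{r \in R} P^{\text{dyn}}_r \\
P_{\text{rep}} &= \sum_{n} N_n \bigl(\bar{P}^{\text{leak}}_{\text{rep}} + \bar{P}^{\text{dyn}}_{\text{rep}}\bigr)
\end{aligned}
```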
Solution Description
Switching Power-Aware Pre-CTS CCD
User Interface
• Starting with the W-2024.09 release, the clock power-aware CUS feature is on by default (OBD) and is enabled through the OBD mega option:
set_app_options -list {flow.common.effort 12345}
Fast CUS
• Overview
• Solution Description
• User Interface
IC Compiler II
Fast CUS
 Overview
 Currently, the IC Compiler II place_opt flow makes four main calls to the global-solver-based CUS engine (at the place_opt initial_opto and at the place_opt final_opto). Each call attempts to adjust clock latencies to improve the timing QoR.
 These calls take a significant share (10-25%) of the runtime of the place_opt flow.
 In the W-2024.09 release, the tool improves the runtime of CUS by reducing the number of calls to the global-solver-based CUS engine and introducing calls to incremental CUS, which picks the target endpoints that are critical for the design and works only on them.
 Fast CUS affects the initial_opto stage of place_opt. compile_fusion is not affected by Fast CUS.
 Benefits
 Improves runtime in the place_opt flow while preserving QoR.
 For an average design, a 20-25% speedup for CUS and a 2% speedup for the place_opt flow are expected.
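As a rough consistency check of the figures above, assume that "speedup" means a fractional runtime reduction and that the rest of place_opt is unchanged (both are assumptions, not statements from the tool documentation). If CUS takes a fraction f of the place_opt runtime and the CUS runtime shrinks by a fraction s, the flow runtime shrinks by the product of the two:

```latex
\frac{\Delta T_{\text{place\_opt}}}{T_{\text{place\_opt}}} = f \cdot s,
\qquad
f = 0.10,\; s = 0.20 \;\Rightarrow\; f \cdot s = 0.02 = 2\%,
\qquad
f = 0.25,\; s = 0.25 \;\Rightarrow\; f \cdot s = 0.0625 \approx 6\%
```

Under these assumptions, the quoted 2% flow speedup corresponds to the lower end of both ranges.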
Overview & Benefits
Fast CUS
User Interface
 Starting from the W-2024.09 release, the Fast CUS feature is part of the on-by-default (OBD) set in the place_opt flow and is enabled with the OBD mega option:
set_app_options -list {flow.common.effort 12345}
Clock DRC Improvements
• Overview
• Solution Description
• User Interface
Clock DRC Improvements
 Overview
 In some designs, max_transition violations on the clock tree remain unfixed after the clock nets are routed and must be handled by CCD DRC fixing in the clock_opt final_opto flow.
 In some designs, maximum transition violations on the clock also remain after the concurrent clock and data (CCD) and multi-objective optimization CCD (MOO-CCD) calls.
 Such violations occur because of commit failures caused by legalization failures. As a workaround, over-constraints are applied so that the DRC violations become more visible for CCD to work on.
 However, because over-constraining is only a workaround, it limits the OOTB DRC fixing and is less effective, as it does not target the root cause. With over-constraining, clock power is also expected to degrade.
 This feature provides enhancements so that the leftover maximum transition violations can be effectively reduced with CCD in the clock_opt final_opto flow.
Overview & Benefits
Clock DRC Improvements
 The proposed solution consists of two parts:
 Enhance the maximum transition fixing ability of the legacy CCD engine for DRC fixing (DRC-CCD) and of the multi-objective CCD engine for DRC fixing.
 Enhance the protection of maximum transition in the other CCD calls down the flow.
 The feature includes solutions for maximum transition violations caused by:
 Commit failures due to legalization failures
 No RC node being found to insert buffers
 LEQs having via ladder (VL) candidates
 Inaccurate maximum transition values in the sub-graph
 Multi-vector CCD sizing down a cell from a library cell with a VL constraint to one without a VL constraint
Solution Description
Clock DRC Improvements
User Interface
 The clock transition DRC closure feature can be enabled with the following application option in the W-2024.09 release:
set_app_options -name ccd.enable_clock_drc_improvements -value true; #Default false
Agenda
Multisource Clock Tree Synthesis (MSCTS) Enhancements
• H-tree Improvements (GA)
• Tap Assignment Improvements (GA)
• Automated H-tree Synthesis Enhancements (GA)
• Dynamic Clock Power Improvement for SMSCTS (GA)
• Structural MSCTS Clock QoR Improvements (GA)
• Irregular Global Tree Synthesis (GA)
H-tree Improvements
• Overview
• Solution Description
• User Interface
H-tree Improvements
 During global tree synthesis, the tool builds a centrally symmetric H-tree structure from the input pins of the tap drivers to the output pin of the H-tree root driver.
 A few customers need a single-repeater solution at H-tree junctions, instead of two repeaters, to avoid routing and pin congestion issues.
 This feature aims to:
 Enhance the existing H-tree synthesis algorithm to improve H-tree QoR (especially latency)
 Resolve the routing and pin congestion issues caused by multiple H-tree repeater cells at junctions
 Improve stem path skew for multiple H-trees
Overview
H-tree Improvements
The enhancements covered as part of this feature are:
 Support for best reference cell selection for latency improvement
 Every repeater in the H-tree library cell collection is evaluated at each node, and a solution is generated if that repeater meets the DRC constraints
 All possible solutions (one per repeater) are made available at the top
 At the top level, the solution with the best delay is picked
 Preference for the single-buffer solution
 A higher preference is given to the single-buffer solution at each node. The best single-repeater solution is selected for implementation based on latency
 This helps resolve the routing and pin congestion issues caused by two-repeater solutions at H-tree junctions
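The selection logic described above can be sketched as follows. This is a simplified, hypothetical model: the field names (lib_cell, repeaters_at_junction, delay, meets_drc), the library cell names, and the delay values are invented for illustration, and the real engine evaluates solutions per node and propagates them up the tree.

```python
def select_repeater_solution(solutions, prefer_single_repeater=False):
    """Pick the lowest-delay solution among those that meet the DRC constraints.

    When prefer_single_repeater is set, single-repeater solutions win over
    two-repeater ones whenever at least one legal single-repeater solution exists.
    """
    legal = [s for s in solutions if s["meets_drc"]]
    if not legal:
        return None
    if prefer_single_repeater:
        single = [s for s in legal if s["repeaters_at_junction"] == 1]
        if single:
            legal = single
    return min(legal, key=lambda s: s["delay"])

# Hypothetical candidate solutions gathered at the top of the H-tree.
solutions = [
    {"lib_cell": "BUF_A", "repeaters_at_junction": 2, "delay": 0.080, "meets_drc": True},
    {"lib_cell": "BUF_B", "repeaters_at_junction": 1, "delay": 0.085, "meets_drc": True},
    {"lib_cell": "BUF_C", "repeaters_at_junction": 1, "delay": 0.070, "meets_drc": False},
    {"lib_cell": "BUF_D", "repeaters_at_junction": 1, "delay": 0.095, "meets_drc": True},
]

best_delay = select_repeater_solution(solutions)
best_single = select_repeater_solution(solutions, prefer_single_repeater=True)
```

In this example, the pure delay-based choice is the two-repeater BUF_A solution, while the single-repeater preference selects BUF_B and accepts a slightly larger delay in exchange for fewer cells at the junction.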
Solution Description
H-tree Improvements
 Multiple H-tree improvements
 The ZBUF engine is tuned to improve global stem path skew
 The routing topology built after buffering is enhanced for better net lengths
Solution Description
H-tree Improvements
User Interface
 Preference for the single-buffer solution at junctions can be enabled with the following application option:
set_app_options -list {cts.multisource.htree_single_repeater_at_node true}
 Support for best reference cell selection for latency can be enabled with the following application option:
set_app_options -list {cts.multisource.htree_explore_all_repeater_solutions true}
Tap Assignment Improvements
• Overview
• Solution Description
• User Interface
Tap Assignment Improvements
 During tap assignment, the tool distributes the clock network below the tap drivers (the local tree) between the tap driver cells to create smaller subtrees for the clock_opt build_clock step to synthesize.
 This feature aims to:
 Reduce the cluster size during initial clustering
 Prevent tap assignment for ICGs that drive only ignore pins
 Add multi-voltage awareness to tap assignment, which enables cloning of cells across power domains
Overview
Tap Assignment Improvements
The enhancements covered as part of this feature are:
 Reduce the cluster size during initial clustering, which prevents sinks from connecting to faraway tap drivers.
 In a few cases, macro sinks were assigned to faraway taps
 This happened because the initial clusters created during tap assignment were too big, so macro sinks were clustered with other sinks close to the boundary of two tap regions
 This issue is resolved by reducing the initial cluster size
 Prevent tap assignment for ICGs that drive only ignore pins.
 A check is added to the tap assignment flow to prevent the creation of single-fanout CGs that drive only ignore sinks, because this could break CUS timing closure
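The effect of the initial cluster size on macro sinks, described in the first enhancement above, can be illustrated with a one-dimensional toy model. It assumes that each cluster is assigned as a whole to the tap nearest its centroid. That assignment rule, the coordinates, and the function name are assumptions made for this sketch and are not taken from the tool.

```python
def assign_clusters_to_taps(clusters, taps):
    """Assign every sink of a cluster to the tap nearest the cluster centroid.

    clusters: list of clusters, each a list of sink x-coordinates.
    taps: list of tap driver x-coordinates.
    Returns a dict mapping sink coordinate -> chosen tap coordinate.
    """
    assignment = {}
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        tap = min(taps, key=lambda t: abs(t - centroid))
        for sink in cluster:
            assignment[sink] = tap
    return assignment

taps = [0.0, 100.0]   # two tap drivers; the region boundary is at x = 50
macro_sink = 60.0     # macro sink just across the boundary

# One large cluster: the macro sink is pulled along with its neighbours.
large = assign_clusters_to_taps([[40.0, 45.0, macro_sink]], taps)
# Smaller clusters: the macro sink is assigned on its own.
small = assign_clusters_to_taps([[40.0, 45.0], [macro_sink]], taps)
```

With the large cluster, the macro sink at x = 60 ends up on the tap at x = 0 although the tap at x = 100 is closer. With the smaller clusters, it is assigned to the closer tap.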
Solution Description
Tap Assignment Improvements
 Add MV awareness to tap assignment, which enables cloning of cells
across power domains.
Solution Description
Fig 1: Without MV-aware TA (switch domain and AON domain). Fig 2: With MV-aware TA (switch domain and AON domain).
Tap Assignment Improvements
User Interface
 Use the following application option to control the reduced cluster sizing feature:
set_app_options -list {cts.multisource.limit_cluster_size true}
 Use the following application option to control the feature that prevents tap assignment for ICGs that drive only ignore pins:
set_app_options -list {cts.multisource.tap_assignment_reassign_ignore_sinks true}
 MV awareness, which enables cloning of cells across power domains, can be enabled using the following application option:
set_app_options -list {cts.multisource.enable_mv_aware_tap_assignment true}
Enhancements to Automated H-tree Synthesis
• Overview
• User Interface
Enhancements to Automated H-tree Synthesis
 Starting from the W-2024.09 release, the automated H-tree synthesis command, synthesize_regular_multisource_clock_trees, is easier to use because pin connection using Zroute is now a separate step.
 For this, an additional sub-step named route_pin_connections is added to the -from/-to steps of synthesize_regular_multisource_clock_trees.
 This new step allows users to return to the shell after H-tree trunk routing with the galaxy custom router (GCR) and to run the pin connection as a separate step.
Overview
Enhancements to Automated H-tree Synthesis
User Interface
The new route_pin_connections sub-step can be used with the -from and -to switches to stop before pin connection and to run pin connection as an atomic step.
set_app_options -list {cts.multisource.tap_assignment_reassign_ignore_sinks true}
set_regular_multisource_clock_tree_options -clock $clk \
    -topology htree_only … -skip_pin_connections
synthesize_regular_multisource_clock_trees -to tap_synthesis
synthesize_regular_multisource_clock_trees \
    -from htree_synthesis -to htree_synthesis
synthesize_regular_multisource_clock_trees -from route_pin_connections
Enhancements to Automated H-tree Synthesis
 Set the -skip_pin_connections switch with the set_regular_multisource_clock_tree_options command when an atomic pin routing step is desired.
 When the H-tree options are defined with -skip_pin_connections, the htree_synthesis step returns before "CTS STEP: Zroute for pin connections", and -from route_pin_connections can be used to run this step atomically
 If the switch is not defined, htree_synthesis also runs the pin connection step and returns to the shell only after all the steps of H-tree synthesis
 If the -skip_pin_connections option is not used, the new pin connection step cannot be performed, and an error message is generated
 There is no need to alter the set_regular_multisource_clock_tree_options definition before running -from route_pin_connections.
Things to note
Dynamic Clock Power Improvement for SMSCTS
• Overview
• Solution Description
• User Interface
Dynamic Clock Power Improvement for SMSCTS
 Clock dynamic power increases linearly with the switching activity of nets.
 This feature aims to reduce the wire length of high-switching nets so that overall clock dynamic power improves.
 With proper SAIF information, the tool accurately identifies the high-switching nets. The cell or cluster of cells with a high input toggle rate is relocated closer to its driver.
 In the V-2023.12 release, the tool relocated all cells whose toggle ratio (computed from the output TR and input TR) was less than a threshold value. The relocation percentage also depended on the toggle ratio value.
 Some gaps were identified in this initial implementation. The relocation of a few candidate nodes resulted in increased overall wire length and degraded power.
 Starting from the W-2024.09 release, this feature is enhanced to address these gaps.
Overview
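For reference, the standard CMOS switching-power relation makes the dependence on activity and wire length explicit. The symbols are introduced here for illustration only and do not come from the tool: TR_i is the toggle rate of net i (transitions per unit time), C_i its total capacitance, L_i its wire length, and c_wire the wire capacitance per unit length.

```latex
P_{\text{sw}} = \tfrac{1}{2}\, V_{DD}^{2} \sum_{i \in \text{nets}} TR_i \, C_i,
\qquad
C_i \approx c_{\text{wire}}\, L_i + C_{\text{pins},i}
```

Each net contributes in proportion to the product of its toggle rate and its capacitance, so shortening a high-toggle net saves more power than shortening a low-toggle net of the same length.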
Dynamic Clock Power Improvement for SMSCTS
 Starting from the W-2024.09 release, this feature is enhanced with an accept-and-reject mechanism after each power-aware relocation, so that the tool accepts only moves that improve dynamic power.
 The tool no longer relocates all nodes below a threshold toggle ratio. Instead, it first identifies all high-switching nets and their drivers. For each such driver, the tool relocates all the loads incrementally toward it. After each relocation, the tool calculates the power and wire length and accepts the move only if there is an improvement. Any relocation that degrades power is reverted, and the tool moves on to the next driver.
 The power-aware relocation steps run after major SMSCTS steps, such as DRC fixing and latency optimization, and are supported in both standalone and integrated flows.
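The accept-and-reject loop described above can be sketched in one dimension as follows. This is a simplified, hypothetical model: power is approximated as toggle rate times wire length on the input and output nets of a load, and the step size, move limit, field names, and sample values are assumptions made for this example.

```python
def net_power_cost(pos, driver_pos, fanout_pos, in_tr, out_tr):
    """Power proxy: toggle rate times wire length, summed over input and output nets."""
    return in_tr * abs(pos - driver_pos) + out_tr * abs(pos - fanout_pos)

def relocate_toward_driver(load, driver_pos, step=0.25, max_moves=4):
    """Move a load toward its driver in increments; keep a move only if it lowers cost."""
    pos = load["pos"]
    cost = net_power_cost(pos, driver_pos, load["fanout_pos"],
                          load["in_tr"], load["out_tr"])
    for _ in range(max_moves):
        trial = pos + step * (driver_pos - pos)
        trial_cost = net_power_cost(trial, driver_pos, load["fanout_pos"],
                                    load["in_tr"], load["out_tr"])
        if trial_cost < cost:
            pos, cost = trial, trial_cost   # accept the move
        else:
            break                           # reject: keep the last accepted position
    return pos, cost

driver_pos = 0.0
# A gate whose output toggles far less than its input: moving it helps.
gated = {"pos": 100.0, "fanout_pos": 100.0, "in_tr": 1.0, "out_tr": 0.1}
# A cell whose output toggles as much as its input: moving it does not help.
ungated = {"pos": 100.0, "fanout_pos": 100.0, "in_tr": 1.0, "out_tr": 1.0}

pos_a, cost_a = relocate_toward_driver(gated, driver_pos)
pos_b, cost_b = relocate_toward_driver(ungated, driver_pos)
```

In this example, the gated cell is moved in four accepted steps because each step shortens the high-activity input net by more than it costs on the low-activity output net. The first trial move of the other cell is rejected, so it stays in place.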
Solution Description
Dynamic Clock Power Improvement for SMSCTS
User Interface
 This feature can be enabled in both standalone and integrated SMSCTS flows
using the following application option:
set_app_options -name cts.multisource.enable_activity_aware_relocation -value true
Structural MSCTS Clock QoR Improvements (GA)
• Overview
• Solution Description
• User Interface
Structural MSCTS Clock QoR Improvements
 Multiple enhancements are introduced to improve the overall SMSCTS clock QoR.
 Enhancement to reduce long path violations to improve latency (on-by-default from W-2024.09)
 Enhancement to dynamic sink reassignment for reducing clock wire length
 Enhancement to first-level cell assignment for better clock implementation (on-by-default from W-2024.09)
Overview
Structural MSCTS Clock QoR Improvements
Enhancement to reduce long path violations to improve latency.
 This enhancement addresses long path violations during latency optimization.
 SMSCTS downsizes cells for area recovery after DRC fixing. With the new enhancement, the tool does not downsize a cell that is part of a long path.
 Previously, the SMSCTS flow did not split cells for latency optimization if the driver had a transition violation. The synthesize_multisource_clock_subtrees command is enhanced to split intermediate cells for latency even if they have transition violations. This helps reduce long path violations.
Solution Description
Structural MSCTS Clock QoR Improvements
Enhancement to dynamic sink reassignment for reducing clock wire length.
 This feature targets the SMSCTS clock wire length.
 With this feature, loads are reassigned to nearby equivalent drivers so that routing overlaps are reduced and the total clock wire length improves.
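The reassignment idea above can be sketched as follows. This is a minimal illustration under stated assumptions: the drivers are treated as logically equivalent, wire length is approximated by driver-to-load Manhattan distance (a star model that does not capture routing overlaps directly), and driver load limits and DRC are ignored. All names and coordinates are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def total_wirelength(assignment, drivers, loads):
    """Sum of driver-to-load distances for a load -> driver assignment."""
    return sum(manhattan(drivers[d], loads[l]) for l, d in assignment.items())

def reassign_to_nearest(drivers, loads):
    """Attach every load to the nearest of the equivalent drivers."""
    return {
        load: min(drivers, key=lambda d: manhattan(drivers[d], pos))
        for load, pos in loads.items()
    }

drivers = {"inv_a": (0, 0), "inv_b": (10, 0)}
loads = {"icg_1": (1, 1), "icg_2": (9, 1), "icg_3": (2, 0)}

# Initial assignment with crossing connections.
before = {"icg_1": "inv_b", "icg_2": "inv_a", "icg_3": "inv_a"}
after = reassign_to_nearest(drivers, loads)

wl_before = total_wirelength(before, drivers, loads)
wl_after = total_wirelength(after, drivers, loads)
```

In this example, the crossing connections of icg_1 and icg_2 are swapped to the nearer drivers, and the wire length estimate drops from 22 to 6 units.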
Solution Description
Figure: clock routing without re-assignment and with re-assignment, with overlaps circled. Legend: purple = L2 inverter, yellow = L2 output net, red = L3 ICG.
Structural MSCTS Clock QoR Improvements
Enhancement to first-level cell assignment for better clock implementation.
 Each first-level cell is meant to be assigned to the subtree driver that is closest to its fanout centroid.
 An issue was observed where, at intermediate stages of SMSCTS, first-level cells were not assigned to the nearest subtree driver.
 Clock optimizations were performed on this suboptimal first-level assignment, and the tool later reassigned each first-level cell to its closest SMSCTS subtree driver or SMSCTS clock tap. This led to miscorrelation.
 This feature addresses the problem.
Solution Description
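The intended first-level assignment rule can be sketched as follows. This is a hypothetical illustration: the fanout centroid is taken as the arithmetic mean of the sink locations, and closeness is measured with Manhattan distance. Both choices, as well as the names and coordinates, are assumptions made for this example.

```python
def fanout_centroid(sink_positions):
    """Arithmetic mean of the (x, y) locations of the sinks driven by a cell."""
    n = len(sink_positions)
    cx = sum(x for x, _ in sink_positions) / n
    cy = sum(y for _, y in sink_positions) / n
    return (cx, cy)

def nearest_subtree_driver(cell_fanout, subtree_drivers):
    """Name of the subtree driver closest (Manhattan) to the fanout centroid."""
    cx, cy = fanout_centroid(cell_fanout)
    return min(
        subtree_drivers,
        key=lambda name: abs(subtree_drivers[name][0] - cx)
        + abs(subtree_drivers[name][1] - cy),
    )

subtree_drivers = {"tap_0": (0.0, 0.0), "tap_1": (100.0, 0.0)}
fanout = [(70.0, 10.0), (80.0, 0.0), (90.0, 20.0)]

centroid = fanout_centroid(fanout)
chosen = nearest_subtree_driver(fanout, subtree_drivers)
```

Here the centroid is (80, 10), so the cell belongs to tap_1. Running the optimizations against any other assignment and correcting it later is the source of the miscorrelation described above.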
Structural MSCTS Clock QoR Improvements
User Interface
Set the following application option to enable the enhancement to dynamic sink reassignment for reducing clock wire length:
set_app_options -name cts.multisource.subtree_dynamic_sinks_reclustering -value true
Structural MSCTS Clock QoR Improvements
User Interface
The following two features are on by default starting from the W-2024.09 release. For testing purposes in older versions, use the application options below:
 Enhancement to reduce long path violations to improve latency.
set_app_options -name cts.multisource.reduce_long_path_violations -value true
 Enhancement to first-level cell assignment for better clock implementation.
set_app_options -name cts.multisource.first_level_cell_tree_assignment_fix -value true
Irregular Global Tree Synthesis
• Overview
• User Interface
Irregular Global Tree Synthesis
 Prior to the W-2024.09 release, automated global trees could be created only to symmetrically inserted tap drivers that were distributed evenly across the layout.
 This caused floorplan and global tree complexities that required manual global tree insertion
 It also left room for a different style of global tree methodology for designs without tight on-chip variation (OCV) issues or common path requirements
 Starting from the W-2024.09 release, a new global tree synthesis methodology, called irregular global tree synthesis, automatically constructs multilevel, DRC-clean global trees to non-symmetric, unevenly inserted tap drivers.
 In irregular global tree synthesis, global trees are created using multilevel buffer trees in a non-H-tree style, ensuring minimum latency for all tap drivers
 This global tree style can be used to construct a global tree for irregularly placed tap drivers at the block level, or to build a global distribution of the clock to block inputs in top-level channels
Overview
Irregular Global Tree Synthesis
User Interface (1/2)
 A new set of commands is introduced for irregular global tree synthesis.
#Define settings related to global tree
set_irregular_multisource_clock_tree_options
    -clock <clock>
    [-net <net_name>]        Net for global tree creation. If not specified, the global tree is built on the net connected to the clock source. This net must drive all of the inserted tap drivers.
    -lib_cell {list}         Library cells (inverter or buffer) to be used in the global tree.
    -routing_rule <rule>
    -routing_layers {list}
    -prefix                  Prefix used to name the inserted global tree cells.
#To synthesize global tree
synthesize_irregular_multisource_clock_trees
    [-clocks {list}]         Clocks to synthesize, if multiple irregular global tree options are defined.
Irregular Global Tree Synthesis
User Interface (2/2)
 Additionally, the irregular global tree options can be reported or removed
using the following new commands:
#Report settings related to irregular global tree
report_irregular_multisource_clock_tree_options
[-clock] <clock>
#Remove the settings related to irregular global tree
remove_irregular_multisource_clock_tree_options
[-clock] <clock>
Note: Irregular global tree synthesis constructs the global tree to tap drivers that are already inserted. Users must insert the irregular tap drivers themselves, as required.
  • 10. 10 Synopsys Confidential Information © Synopsys, Inc. Streamlined CTS Phase 2 • Overview • Solution Description • User Interface W-2024.09
  • 11. 11 Synopsys Confidential Information © Synopsys, Inc. Streamlined CTS Phase 2  There are enhancements that are already done as part of phase 1 in the V-2023.12- SP3 release where the tool used fast cells only for the critical sink associated with the latency critical driver and slow cells added for non-critical sinks. Area and power are recovered during the Compile CTS stage using path-based criticality buffering. It also helped to improve latency by using fast cells for critical sinks.  Starting from the W-2024.09 release, two different enhancements are added as part of phase 2.  Currently, the kind of cell selection for clustering is different from buffering. In the W- 2024.09 release, cell selection matching for On-route buffering (ORB) and buffering is done. Improving correlation between CTS clustering and buffering helps to improve clustering imbalance. It also helps to improve area, power, and logical DRC.  Balanced topology enhancement is also done, which uses a new linear programming- based solution for clock tree synthesis balanced topology generator to replace the previous algorithm. Additional postprocessing of balanced topology is done to further improve wire length and skew. Overview & Solution Description W-2024.09
  • 12. Streamlined CTS Phase 2: User Interface
       Use the following application options to enable the feature in the W-2024.09 release:
          set_app_options -name cts.compile.align_clustering_and_buffering -value true
          set_app_options -name cts.compile.balanced_topology_enhancements -value true
  • 13. CTS Debuggability and Log File Enhancements (section outline): Overview, Solution Description, User Interface
  • 14. CTS Debuggability and Log File Enhancements: Overview
       The W-2024.09 release adds several enhancements to the CTS reporting commands and the CTS log file:
          o report_clock_qor prints a total skew metric
          o check_clock_trees flags auto exception generation points
          o Summary tables are printed before and after each step in MTCTO
          o get_clock_tree_pins reports all the pins above the MSCTS driver using the is_mscts_global_driver_object attribute
          o A new attribute, is_on_boundary_register, returns all CCD boundary pins
  • 15. CTS Debuggability and Log File Enhancements: Solution Description
       Feature #1: report_clock_qor prints the total skew metric
          o Currently, there is no way to report the total skew value through the report_clock_qor command.
          o This feature is particularly needed when using the total skew optimization enhancement during MTCTO.
          o From the W-2024.09 release, the report_clock_qor -type latency command has a new column called Total Skew.
          o The skew is calculated as (median latency of the clock - path delay), and the total skew is the sum of the skews to the endpoints, provided it is greater than the target skew that is set.
       Expectation
          o A total skew section appears after the clock QoR summary is reported.
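The total skew definition above can be illustrated with a small worked calculation. This is only a sketch under stated assumptions, not the tool's implementation: it takes the median latency to be the median of the endpoint path delays, uses the absolute deviation from that median as the per-endpoint skew, and sums only the skews that are strictly greater than the target. The slide does not specify these details, and the function name `total_skew` is hypothetical.

```python
from statistics import median


def total_skew(path_delays, target_skew):
    """Sum of per-endpoint skews that exceed the target skew.

    Assumptions (not confirmed by the source):
      - median latency = median of the endpoint path delays
      - per-endpoint skew = |median latency - path delay|
      - only skews strictly greater than target_skew are summed
    """
    med = median(path_delays)
    skews = [abs(med - d) for d in path_delays]
    return sum(s for s in skews if s > target_skew)


# Endpoint path delays in picoseconds (illustrative values only)
delays_ps = [500, 520, 550, 600, 480]
result = total_skew(delays_ps, target_skew=30)
print(result)  # median is 520; skews are 20, 0, 30, 80, 40; 80 + 40 = 120
```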
  • 16. CTS Debuggability and Log File Enhancements: Solution Description
       Feature #2: check_clock_trees flags auto exception generation points
          o Currently, auto exceptions are handled in CTS during the Clock Tree Initialization step, and a file named clock_auto_exceptions*.tcl is dumped in the current working directory.
          o To make it easier to understand the auto exceptions that are derived during CTS, the check_clock_trees command is enhanced to report them beforehand.
          o A file with the naming convention check_clock_trees_clock_auto_exceptions*.tcl is dumped in the current working directory.
          o Note that this does not change the current database.
          o The command reports the following exceptions: internal pins, loop-breaking pins, conflict pins, and split pins.
       Expectation
          o A check_clock_trees_clock_auto_exceptions*.tcl file is dumped in the current working directory and reports the cases listed above, if any.
       Example entry from the dumped file:
          #conflict pins
          set_clock_tree_balance_point -consider_for_balancing false -clock [get_clocks $clock -mode $mode -balance_points $balance_point
  • 17. CTS Debuggability and Log File Enhancements: Solution Description
       Feature #3: Print summary tables before and after each step in MTCTO
          o Currently, CTS-037 messages are printed for each step in MTCTO, showing the QoR before and after each stage.
          o From the W-2024.09 release, summary tables are printed before and after each step in MTCTO, giving users a clearer view of clock QoR.
          o Note that the tables are printed for each clock, corner, and mode.
       Expectation
          o QoR summary tables are printed before and after each step in MTCTO.
      [Figure: log excerpt with callouts "Summary table" and "Total scenarios"]
  • 18. CTS Debuggability and Log File Enhancements: Solution Description
       Feature #4: get_clock_tree_pins reports all the pins above the MSCTS driver through an attribute
          o Currently, all the MSCTS subtree drivers can be obtained with the is_mscts_subtree_driver_object attribute.
          o However, to get the H-tree drivers, the user must use a double filter:
             get_clock_tree_pins -to [get_clock_tree_pins -filter is_mscts_subtree_driver_object]
          o From the W-2024.09 release onwards, the is_mscts_global_driver_object attribute reports all the H-tree drivers.
          o Note that this feature also works for a multitap driver setup.
       Expectation
          o The tap drivers are reported when the attribute is used:
             fc_shell> get_clock_tree_pins -clock $clock -filter is_mscts_global_driver_object
             {MSCTS_htree_0_0/Y MSCTS_htree_0_1/Y MSCTS_htree_0_2/Y}
  • 19. CTS Debuggability and Log File Enhancements: Solution Description
       Feature #5: The is_on_boundary_register attribute returns all boundary pins
          o Currently, users follow a manual method to get the CCD boundary pins, which is not always feasible.
          o Starting from the W-2024.09 release, the is_on_boundary_register attribute reports all the CCD boundary pins in the design.
       Expectation
          o The CCD boundary register pins are reported when the attribute is used:
             fc_shell> get_clock_tree_pins -filter is_on_boundary_register
             {reg_0/CLK reg_34/CLK reg_40/CLK}
  • 20. CTS Debuggability and Log File Enhancements: User Interface
      Feature and corresponding UI:
       report_clock_qor printing the total skew value
          UI: set_app_options -name cts.report.report_clock_qor_total_skew -value true
       check_clock_trees to flag auto exception generation points
          UI: check_clock_trees
       Print summary tables before and after each step in MTCTO
          UI: set_app_options -name cts.optimize.print_qor_summary_table -value true
       Balance point check
          UI: check_clock_trees
       get_clock_tree_pins to report all the pins above the MSCTS driver with an attribute
          UI: get_clock_tree_pins -filter is_mscts_global_driver_object
       Attribute is_on_boundary_register to get all boundary pins
          UI: get_clock_tree_pins -filter is_on_boundary_register
  • 21. Early CTS Flow (section outline): Introduction, Solution Description, Expectation, User Interface, Setup Instructions
  • 22. Early CTS Flow: Introduction
       The regular clock tree synthesis (CTS) flow builds and optimizes clock trees only during the clock_opt build_clock stage (after compile_fusion or place_opt).
       In this conventional flow, delay optimization and CCD in the compile stage do not see the actual clock tree effects and optimize based only on the ideal clock tree.
          o Actual clock tree skews are visible only after the real clock trees are built and propagated.
          o OCV analysis comes into effect only when clock trees are built.
       This leads to a miscorrelation in design timing QoR between the stages before and after the clock tree is built.
       To overcome this miscorrelation, users tend to rely on a workaround:
          o Workaround: Apply a higher clock uncertainty at the ideal clock stage to model the clock skews, and add some margin to model OCV effects.
          o Caveat: This poses two difficulties. a) The user does not know the exact uncertainty values to apply at the pre-CTS stage. b) The design can be over-constrained, leading to power and area degradation.
  • 23. Early CTS Flow: Introduction
       To overcome these issues, starting from the W-2024.09 release, the tool can perform one round of clock tree building during the compile/place stage itself. This is known as the Early CTS flow.
       As a result, the clock tree effects described above are seen by data-path optimization and CCD during the compile stage.
       Essentially, this makes timing optimization in the compile stage more clock tree aware.
  • 24. Early CTS Flow: Solution Description
       With Early CTS, clock tree synthesis happens twice in the flow. The first time is in compile_fusion/place_opt final_place:
          o CTS requires the placement to be finalized. Early CTS in compile_fusion/place_opt happens at the end of final_place, after direct timing-driven placement (DTDP).
          o Initial clock trees are built and propagated, followed by MTCTO-based global latency and skew optimization.
          o Clock trees are also global routed.
          o Any long data-path nets created by CTS clock cell relocation are addressed by the logical DRC (LDRC) fixing steps during compile_fusion/place_opt final_opto.
          o With the propagated clock tree built, global routed, and optimized for skew, compile_fusion/place_opt final_opto sees the propagated timing picture.
          o As a result, delay optimization and offset derivation by CUS in compile_fusion/place_opt final_opto are more realistic, leading to better QoR convergence down the flow.
  • 25. Early CTS Flow: Solution Description
       CCD during compile_fusion and place_opt final_opto annotates useful skew offsets and adjustments as set_clock_latency offsets (with the recommended options), even in propagated mode, so that timing and optimization understand the CCD offsets.
       With Early CTS, clock tree synthesis happens twice in the flow. The second time is in clock_opt build_clock:
          o It completely removes the clock trees built by early CTS and rebuilds the clock tree, guided by the balance points derived by CCD during compile_fusion final_opto.
          o CCD in clock_opt build_clock incrementally optimizes the timing QoR.
  • 26. Early CTS Flow: Expectation
       compile_fusion/place_opt:
          o Timing looks worse after early CTS, but this only reflects clock realism being brought earlier into the flow.
          o Similarly, area and power appear worse due to the addition of clock tree buffers.
          o The compile_fusion/place_opt runtime increases due to early CTS.
       clock_opt or end of flow:
          o With more accurate timing driving compile_fusion/place_opt final_opto, the flow should converge better in timing QoR (WNS/TNS) as it progresses from CTS to clock_opt final_opto.
          o Skews should be less hurtful and more helpful, given that CUS in compile_fusion/place_opt final_opto sees a better timing context.
          o Because the propagated timing QoR picture is visible up front, better power can also be expected by the end of the flow for designs with neutral timing QoR.
          o The runtime overhead from trial CTS should be partially offset by faster timing convergence through clock_opt.
  • 27. Early CTS Flow: User Interface
       In the W-2024.09 release, the Early CTS flow can be enabled during the compile_fusion/place_opt stage by using the following application options:
          o For the place_opt flow:
             set_app_options -list {place_opt.flow.trial_clock_tree true}
          o For the compile_fusion flow:
             set_app_options -list {compile.flow.trial_clock_tree true}
  • 28. Early CTS Flow: Setup Instructions (1/3)
       Make sure all the necessary clock tree setup and settings are applied before the compile_fusion/place_opt final_place stage so that trial CTS correlates with build_clock:
          o CTS and CTO application options and settings, including skew and latency targets, cell spacing rules, skew group settings, and clock balance group settings
          o Clock NDRs, constraints such as max_transition/capacitance/fanout/net_length, and the clock lib-cell reference list
          o Any custom user proc that runs before CTS and could impact its execution
       Keep the active scenarios the same from the compile_fusion/place_opt final_place stage through clock_opt final_opto.
       Reduce the clock uncertainty after compile_fusion/place_opt final_place to account for the existence of the trial clock tree.
          o We recommend using the same uncertainty as post-CTS.
  • 29. Early CTS Flow: Setup Instructions (2/3)
       Set these application options in compile_fusion/place_opt:
          set_app_options -list {place_opt.flow.trial_clock_tree true}
          (or)
          set_app_options -list {compile.flow.trial_clock_tree true}
          set_app_options -list {npo.enable_ects_flow true}; #OBD since D20240830
          set_app_options -list {cts.common.disable_ccs_rcv_cap_in_trial false}
          set_app_options -list {time.enable_offset_latency_computation_in_propagated_clocks true}
       Set these in both compile_fusion/place_opt and clock_opt, for both the baseline and early CTS flows:
          set_app_options -as_user_default -name cts.optimize.improvement_mode_version -value EIM_20240330
          set_app_options -name cts.optimize.enable_improvement_mode -value skew
  • 30. Early CTS Flow: Setup Instructions (3/3)
       Set this for clock_opt build_clock in the early CTS flow:
          set_app_options -list {cts.compile.enable_cell_relocation none}
       Override this setting only during compile_fusion/place_opt final_place, and restore the previous value after early CTS is done:
          set_app_options -list {cts.compile.power_opt_mode none}
       Additionally, the following settings are recommended for CCD in clock_opt, for both the baseline and early CTS flows:
          set_app_options -list {ccd.max_prepone_postpone_consider_skew_latency true}
          set_app_options -list {ccd.max_prepone_postpone_consider_corner_scaling true}
          set_app_options -list {ccd.enable_hyper_ccd true}
       Reset the CCD max_prepone/max_postpone limits from CTS so that CCD can use the best possible scope in both the baseline and early CTS flows.
  • 31. Clock Power Recovery (section outline): Overview, User Interface
  • 32. Clock Power Recovery: Overview
       The clock tree optimization (CTO) engine currently considers logical DRC, latency, skew, and area as cost functions to improve clock QoR.
       There have been increasing requests to reduce power at the final CTO step.
       A new step, "CTS STEP: Power optimization", is introduced after the Area Recovery step and before the Final DRC Fixing step.
          o It introduces power as a cost function for optimization at this step.
          o It performs buffer removal and sizing to improve total power over area.
       Expectation
          o Clock power should improve.
          o There must be no degradation in skew, latency, or logical DRC.
  • 33. Clock Power Recovery: User Interface
       Use the following application option to enable the clock power recovery feature:
          set_app_options -list {cts.optimize.enable_cto_power_optimization true}
  • 34. Agenda: Concurrent Clock and Data Optimization (CCD) Enhancements
       Data Path Aware CUS for Timing (GA)
       Data Path Aware CUS for Power (GA)
       Switching Power-Aware Pre-CTS CCD (GA)
       Fast CUS (GA)
       Clock DRC Improvements (GA)
  • 35. Data Path Aware CUS for Timing (section outline): Overview, Solution Description, User Interface
  • 36. Data Path Aware CUS for Timing: Overview and Benefits
       The pre-CTS CCD engine currently used in the flow makes sequential calls to CCD and data optimization to achieve better timing QoR.
       It derives offsets by considering the available prepone and postpone scope and the scope of implementability. However, the offsets derived by CCD in the compile stage are not data path aware.
       To improve concurrency in the engine, this feature makes the derived CCD offsets and the accepted solutions data path aware.
       In the current flow, once a timing path is optimized to zero slack, it is not disturbed further to lend slack to its fanin and fanout paths, even if those paths are timing critical. The extra timing potential of the zero-slack path is left unused, which limits the design frequency.
       This feature improves the design frequency by over-optimizing paths that are easily closed or have positive slack, which in turn provides extra borrowable slack for the frequency-limiting paths.
  • 37. Data Path Aware CUS for Timing: Solution Description
       Utilizing the unused data path potential:
          o A slack margin is applied on endpoints with zero or positive slack, making the slack more positive.
          o One round of useful skew computation is run, which borrows slack by considering the extra positive slack.
          o The slack margins applied on the endpoints are removed. This makes the paths that lent slack violate timing.
          o Data path optimization is called to recover timing on these violating endpoints.
          o An extra call of useful skew computation is made at the end to recover clock tree power (cus-dpae2).
       This feature impacts the final calls of CUS during the place_opt/compile final_opto stage.
      [Figure: three-register pipeline (FF1, FF2, FF3) shown at each step: path margin applied, CUS, data optimization]
  • 38. Data Path Aware CUS for Timing: Solution Description
       Delay potential and path margin computation:
          o The path margin is not applied to all endpoints, but only to endpoints that have paths with available delay potential.
          o For every endpoint, the initial slack is stored across scenarios. Based on the delays and the sizeability of the logic cones, incremental arc delays are annotated on the timer, and a timer update is performed.
          o If the new slack degrades, zero delay potential is assumed.
          o Otherwise, the difference is applied as the predicted delay potential.
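The delay potential rule above can be sketched in a few lines. This is an illustrative sketch, not the tool's algorithm: it assumes that the slack compared before and after the timer update is the worst slack across scenarios, which the slide does not state. The names `delay_potential` and `path_margins` and the endpoint data are hypothetical.

```python
def delay_potential(initial_slack, new_slack):
    """Predicted delay potential of one endpoint.

    initial_slack, new_slack: dict mapping scenario name to slack, taken
    before and after the incremental arc delays are annotated.
    Assumption: the worst slack across scenarios is the value compared.
    """
    before = min(initial_slack.values())
    after = min(new_slack[s] for s in initial_slack)
    if after < before:
        return 0          # slack degraded: assume zero delay potential
    return after - before  # otherwise the difference is the potential


def path_margins(endpoints):
    """Keep only endpoints with available delay potential."""
    margins = {}
    for name, (initial, new) in endpoints.items():
        potential = delay_potential(initial, new)
        if potential > 0:
            margins[name] = potential
    return margins


# Slack values in picoseconds (illustrative values only)
endpoints = {
    "FF2/D": ({"s1": 0, "s2": 10}, {"s1": 25, "s2": 30}),
    "FF3/D": ({"s1": 5}, {"s1": 2}),
}
margins = path_margins(endpoints)
print(margins)  # {'FF2/D': 25}
```

In this example FF3/D degrades after the update, so it receives no margin, while FF2/D receives a margin equal to its predicted potential.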
  • 39. Data Path Aware CUS for Timing: User Interface
       Data Path Aware CUS for Timing with dynamic delay potential computation can be enabled in the place_opt and compile_fusion flows by using the following new public application options:
          o For the compile_fusion flow:
             set_app_options -list {compile.flow.enable_dpa_ccd_timing_with_delay_potential true}
          o For the place_opt flow:
             set_app_options -list {place_opt.flow.enable_dpa_ccd_timing_with_delay_potential true}
  • 40. Data Path Aware CUS for Power (section outline): Overview, Solution Description, User Interface
  • 41. Data Path Aware CUS for Power: Overview and Benefits
       The current CCD-based place_opt/compile_fusion flow makes sequential calls to CCD and data optimization, where CCD derives offsets by considering the available prepone/postpone scope and the scope of implementability.
       However, the offsets derived by CCD in the compile stage are not data path aware.
       In the current flow, endpoints that have zero or slightly positive slack once data power optimization is completed are blocked from further power reduction to avoid any new timing criticality.
       Also, positive slack in a path with no further power potential is left unused throughout the flow.
       The feature uses this timing potential on neighboring paths to recover power in the parts of the design that have power potential, making the flow more concurrent.
  • 42. Data Path Aware CUS for Power: Solution Description
       Data Path Aware CUS flow for power:
          o Power recovery is done by borrowing slack from pipeline stages where the power potential is zero and giving it to the stages that have power potential.
          o Note that this borrowing is applicable only for logic cones with positive power potential.
          o The slack transfer is achieved by tuning the clock latencies (preponing and postponing) that control the launch and capture arrival times.
          o Once the slack is transferred, an extra data path optimization pass recovers power in the path with power potential.
          o Finally, the offsets that do not help power optimization are pruned.
       This feature impacts the final calls of CUS in the place_opt final_opto flow.
      [Figure: three-register pipeline (FF1, FF2, FF3) with one path that has power potential but no timing scope and one path that has no power potential but has timing scope. Step 1: CUS postponing → slack transfer. Step 2: data power optimization → power potential utilized]
  • 43. Data Path Aware CUS for Power: User Interface
       Data Path Aware CUS for Power can be enabled by using the following application options in the W-2024.09 release:
          o For the compile_fusion flow:
             set_app_options -list {compile.flow.enable_dpa_ccd_power true}
          o For the place_opt flow:
             set_app_options -list {place_opt.flow.enable_dpa_ccd_power true}
  • 44. Switching Power-Aware Pre-CTS CCD (section outline): Overview, Solution Description, User Interface
  • 45. Switching Power-Aware Pre-CTS CCD: Overview and Benefits
       Overview
          o Currently, CUS implements a clock tree model that cannot fully capture the more complex design characteristics, one of which is power.
          o In the W-2024.09 release, this feature introduces a new cost model that optimizes clock tree power during useful skew optimization by leveraging the switching activity of the clock tree.
          o The new clock tree cost formulation adds a power-aware cost function: it derives higher offset values on the loads that are placed on nets where the switching activity is higher.
       Benefits
          o Improved clock tree dynamic power from placement to clock tree synthesis and beyond, without any degradation in runtime.
          o Improved clock power and timing QoR.
  • 46. Switching Power-Aware Pre-CTS CCD: Solution Description
       A new total clock tree power estimation is introduced in CUS to balance the leakage and dynamic power cost.
       The new estimation approximates power in two phases.
          o The first phase estimates clock gate power by accumulating the leakage and dynamic power of every clock gate.
          o The second phase estimates the power of repeaters. A representative repeater power is chosen as the average across all repeater library cells. The power contribution of repeaters is then the representative dynamic and leakage power multiplied by the estimated repeater count per net, which is calculated based on the maximum fanout constraint.
       Once dynamic and leakage power are estimated, the clock tree cost is scaled according to the ratio of the estimated dynamic and leakage power.
       Note that for this feature to be effective, at least one power scenario (leakage or dynamic) is required.
       Power awareness is considered in every call of CUS in the compile stage, with the same weight.
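The two-phase estimation above can be illustrated with a small calculation. This is a sketch under stated assumptions, not the tool's cost model: the repeater count per net is assumed to be the net fanout divided by the maximum fanout constraint, rounded up, and the sketch only reports the dynamic-to-leakage ratio because the slide does not say how the cost is scaled by it. The function name `estimate_clock_tree_power` and all numeric values are hypothetical.

```python
import math


def estimate_clock_tree_power(clock_gates, repeater_cells, net_fanouts, max_fanout):
    """Two-phase clock tree power estimate (illustrative only).

    clock_gates / repeater_cells: lists of dicts with "leakage" and "dynamic".
    net_fanouts: fanout of each clock net.
    Assumption: repeaters per net = ceil(fanout / max_fanout).
    """
    # Phase 1: accumulate leakage and dynamic power of every clock gate
    gate_leak = sum(g["leakage"] for g in clock_gates)
    gate_dyn = sum(g["dynamic"] for g in clock_gates)

    # Phase 2: representative repeater = average over all repeater library cells
    rep_leak = sum(c["leakage"] for c in repeater_cells) / len(repeater_cells)
    rep_dyn = sum(c["dynamic"] for c in repeater_cells) / len(repeater_cells)
    repeater_count = sum(math.ceil(f / max_fanout) for f in net_fanouts)

    leakage = gate_leak + rep_leak * repeater_count
    dynamic = gate_dyn + rep_dyn * repeater_count
    return {
        "leakage": leakage,
        "dynamic": dynamic,
        "repeater_count": repeater_count,
        "dynamic_to_leakage": dynamic / leakage,
    }


estimate = estimate_clock_tree_power(
    clock_gates=[{"leakage": 1.0, "dynamic": 3.0}, {"leakage": 2.0, "dynamic": 5.0}],
    repeater_cells=[{"leakage": 0.5, "dynamic": 1.5}, {"leakage": 1.5, "dynamic": 2.5}],
    net_fanouts=[10, 33],
    max_fanout=16,
)
print(estimate["repeater_count"], estimate["leakage"], estimate["dynamic"])
```

Here the two nets need 1 and 3 repeaters, the representative repeater has leakage 1.0 and dynamic power 2.0, and the totals are 7.0 (leakage) and 16.0 (dynamic) in arbitrary units.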
  • 47. Switching Power-Aware Pre-CTS CCD: User Interface
       Starting from the W-2024.09 release, the Clock Power Aware CUS feature is on by default (OBD) and can be enabled with the OBD mega option:
          set_app_options -list {flow.common.effort 12345}
  • 48. Fast CUS (IC Compiler II) (section outline): Overview, Solution Description, User Interface
  • 49. Fast CUS (IC Compiler II): Overview and Benefits
       Overview
          o Currently, the IC Compiler II place_opt flow makes four main calls to the global-solver-based CUS engine (at place_opt initial_opto and at place_opt final_opto). Each call attempts to adjust clock latencies to improve the timing QoR.
          o This takes a significant share (10-25%) of the place_opt runtime.
          o In the W-2024.09 release, the tool improves CUS runtime by reducing the number of calls to the global-solver-based CUS engine and introducing calls to incremental CUS, which picks target endpoints that are critical for the design and works only on them.
          o Fast CUS impacts the initial_opto stage of place_opt; compile_fusion is not affected.
       Benefits
          o Improves runtime in the place_opt flow while preserving QoR.
          o For an average design, a 20-25% speedup for CUS and a 2% speedup for the place_opt flow are expected.
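As a rough consistency check on the figures above, the flow-level saving can be approximated as the CUS share of runtime multiplied by the CUS speedup. This is simple arithmetic under the assumption that "speedup" means the fraction of runtime removed, which the slide does not define; the function name `flow_speedup` is hypothetical.

```python
def flow_speedup(cus_share, cus_speedup):
    """Fraction of place_opt runtime saved.

    cus_share: fraction of place_opt runtime spent in CUS (0.10 to 0.25 per the slide)
    cus_speedup: fraction of CUS runtime removed (0.20 to 0.25 per the slide)
    Assumption: "speedup" is interpreted as fractional runtime reduction.
    """
    return cus_share * cus_speedup


low = flow_speedup(0.10, 0.20)   # smallest share, smallest speedup
high = flow_speedup(0.25, 0.25)  # largest share, largest speedup
print(f"{low * 100:.2f}% to {high * 100:.2f}%")  # 2.00% to 6.25%
```

The lower end of this range matches the 2% flow-level speedup quoted for an average design.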
  • 50. Fast CUS (IC Compiler II): User Interface
       Starting from the W-2024.09 release, the Fast CUS feature is on by default (OBD) in the place_opt flow and can be enabled with the OBD mega option:
          set_app_options -list {flow.common.effort 12345}
  • 51. Clock DRC Improvements (section outline): Overview, Solution Description, User Interface
  • 52. Clock DRC Improvements: Overview and Benefits
       In some designs, max_transition violations on the clock tree remain unfixed once the clock nets are routed and must be handled by CCD DRC fixing during the clock_opt final_opto flow.
       In some designs, maximum transition violations on the clock remain even after the concurrent clock and data (CCD) engine and multi-objective optimization CCD (MOO-CCD) calls.
       Such violations happen because of commit failures caused by legalization failures. As a workaround, over-constraints are applied so that the DRC violations become more visible for CCD to work on.
       However, over-constraining is only a workaround: it limits the out-of-the-box (OOTB) DRC fixing and is less effective because it does not target the root cause. Clock power is also expected to degrade with over-constraining.
       This feature provides enhancements so that the leftover maximum transition violations can be effectively reduced with CCD in the clock_opt final_opto flow.
  • 53. Clock DRC Improvements: Solution Description
       The proposed solution consists of two parts:
          o Enhance the maximum transition fixing ability of the legacy CCD engine for DRC fixing (DRC-CCD) and of the multi-objective CCD engine for DRC fixing.
          o Enhance the protection of maximum transition in the other CCD calls down the flow.
       The feature includes a solution to fix maximum transition violations caused by:
          o Commit failures due to legalization failures
          o No RC node found to insert buffers
          o LEQs having via ladder (VL) candidates
          o Inaccurate maximum transition values in the sub-graph
          o Multi-vector CCD sizing down a cell from a library cell with a VL constraint to one without a VL constraint
  • 54. Clock DRC Improvements: User Interface
       The clock transition DRC closure feature can be enabled by using the following application option in the W-2024.09 release:
          set_app_options -name ccd.enable_clock_drc_improvements -value true; #Default false
  • 55. Agenda: Multisource Clock Tree Synthesis (MSCTS) Enhancements
       H-tree Improvements (GA)
       Tap Assignment Improvements (GA)
       Automated H-tree Synthesis Enhancements (GA)
       Dynamic Clock Power Improvement for SMSCTS (GA)
       Structural MSCTS Clock QoR Improvements (GA)
       Irregular Global Tree Synthesis (GA)
  • 56. H-tree Improvements (section outline): Overview, Solution Description, User Interface
  • 57. H-tree Improvements: Overview
       During global tree synthesis, the tool builds a centrally symmetric H-tree structure from the input pins of the tap drivers to the output pin of the H-tree root driver.
       A few customers need a single-repeater solution at H-tree junctions, instead of two repeaters, to avoid routing and pin congestion issues.
       This feature aims to:
          o Enhance the existing H-tree synthesis algorithm to improve H-tree QoR (especially latency)
          o Resolve the routing and pin congestion issues caused by multiple H-tree repeater cells at junctions
          o Improve stem path skew across multiple H-trees
  • 58. H-tree Improvements: Solution Description
      The enhancements covered as part of this feature are:
       Support for best reference cell selection for latency improvement
          o All repeaters (from the H-tree library cell collection) are evaluated for each node, and a solution is generated if the repeater meets the DRC constraint.
          o All possible solutions (with all repeaters) are available at the top.
          o At the top level, the solution with the best delay is picked.
       Preference for a single-buffer solution
          o Higher preference is given to a single-buffer solution for each node. The best single-repeater solution is selected for implementation based on latency.
          o This helps resolve the routing and pin congestion issues caused by two-repeater solutions at H-tree junctions.
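The selection rules above can be sketched as a small filter-and-pick routine. This is an illustrative sketch, not the tool's H-tree algorithm: the candidate records, the cell names, and the function name `pick_solution` are hypothetical, and the sketch falls back to multi-repeater solutions when no legal single-repeater solution exists, which is an assumption.

```python
def pick_solution(candidates, prefer_single=True):
    """Pick a repeater solution for one H-tree node.

    candidates: list of dicts with "cell", "repeaters", "latency", "meets_drc".
    Only solutions that meet the DRC constraint are considered. With
    prefer_single, single-repeater solutions win over multi-repeater ones;
    among the remaining solutions the lowest latency is selected.
    """
    legal = [c for c in candidates if c["meets_drc"]]
    if not legal:
        return None
    if prefer_single:
        single = [c for c in legal if c["repeaters"] == 1]
        if single:  # assumption: fall back to all legal solutions otherwise
            legal = single
    return min(legal, key=lambda c: c["latency"])


# Hypothetical candidates for one junction (latency in picoseconds)
candidates = [
    {"cell": "BUF_A", "repeaters": 2, "latency": 95, "meets_drc": True},
    {"cell": "BUF_B", "repeaters": 1, "latency": 100, "meets_drc": True},
    {"cell": "BUF_C", "repeaters": 1, "latency": 120, "meets_drc": True},
    {"cell": "BUF_D", "repeaters": 1, "latency": 90, "meets_drc": False},
]

best_single = pick_solution(candidates, prefer_single=True)
best_any = pick_solution(candidates, prefer_single=False)
print(best_single["cell"], best_any["cell"])  # BUF_B BUF_A
```

The example shows the trade-off: preferring a single repeater accepts slightly higher latency (100 versus 95) in exchange for fewer cells at the junction.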
  • 59. H-tree Improvements: Solution Description
       Multiple H-tree improvements
          o The ZBUF engine is tuned for global stem path skew improvement.
          o The routing topology performed post buffering is enhanced for better net lengths.
  • 60. H-tree Improvements: User Interface
       Preference for a single-buffer solution at junctions can be enabled by using the following application option:
          set_app_options -list {cts.multisource.htree_single_repeater_at_node true}
       Support for best reference cell selection for latency can be enabled by using the following application option:
          set_app_options -list {cts.multisource.htree_explore_all_repeater_solutions true}
  • 61. Tap Assignment Improvements (section outline): Overview, Solution Description, User Interface
  • 62. Tap Assignment Improvements: Overview
       During tap assignment, the clock network below the tap drivers (the local tree) is distributed between the tap driver cells to create smaller subtrees for the clock_opt build_clock step to synthesize.
       This feature aims to:
          o Reduce the cluster size during initial clustering
          o Prevent tap assignment for ICGs that drive only ignore pins
          o Add multi-voltage awareness to tap assignment, which enables cloning of cells across power domains
  • 63. Tap Assignment Improvements: Solution Description
      The enhancements covered as part of this feature are:
       Reduce the cluster size during initial clustering, which prevents sinks from connecting to tap drivers that are far away.
          o In a few cases, macro sinks are assigned to faraway taps.
          o This happens because the initial clusters created during tap assignment were too big, causing the macro sinks to be clustered with other sinks close to the boundary of two tap regions.
          o This issue is resolved by reducing the initial cluster size.
       Prevent tap assignment for ICGs that drive only ignore pins.
          o A check in the tap assignment flow prevents the creation of single-fanout clock gates (CGs) that drive only ignore sinks, because such gates could break CUS timing closure.
  • 64. Tap Assignment Improvements: Solution Description
       Add multi-voltage (MV) awareness to tap assignment, which enables cloning of cells across power domains.
      [Fig 1: Without MV-aware tap assignment (switch domain and AON domain)]
      [Fig 2: With MV-aware tap assignment (switch domain and AON domain)]
  • 65. Tap Assignment Improvements: User Interface
      • Use the following application option to control the reduced cluster sizing feature:
          set_app_options -list {cts.multisource.limit_cluster_size true}
      • Use the following application option to control the feature that prevents tap assignment for ICGs that drive only ignore pins:
          set_app_options -list {cts.multisource.tap_assignment_reassign_ignore_sinks true}
      • Use the following application option to enable MV awareness, which allows cloning of cells across power domains:
          set_app_options -list {cts.multisource.enable_mv_aware_tap_assignment true}
  • 66. Enhancements to Automated H-tree Synthesis
      • Overview
      • User Interface
  • 67. Enhancements to Automated H-tree Synthesis: Overview
      • Starting from the W-2024.09 release, the automated H-tree synthesis command, synthesize_regular_multisource_clock_trees, is easier to use because pin connection using Zroute is available as a separate step.
      • To support this, a sub-step named route_pin_connections is added to the -from and -to steps of synthesize_regular_multisource_clock_trees.
      • This new step lets users return to the shell after H-tree trunk routing with the galaxy custom router (GCR) and run pin connection as a separate step.
  • 68. Enhancements to Automated H-tree Synthesis: Solution Description
      The enhancements covered as part of this feature are:
      • Pin connection using Zroute is available as a separate step of the synthesize_regular_multisource_clock_trees command, starting from the W-2024.09 release.
      • The step is exposed as a sub-step named route_pin_connections in the -from and -to steps of the command.
      • Users can return to the shell after H-tree trunk routing with the galaxy custom router (GCR) and run pin connection separately.
  • 69. Enhancements to Automated H-tree Synthesis: User Interface
      The new route_pin_connections sub-step can be used with the -from and -to switches to stop before pin connection and then run pin connection as an atomic step.
          set_app_options -list {cts.multisource.tap_assignment_reassign_ignore_sinks true}
          set_regular_multisource_clock_tree_options -clock $clk -topology htree_only … -skip_pin_connections
          synthesize_regular_multisource_clock_trees -to tap_synthesis
          synthesize_regular_multisource_clock_trees -from htree_synthesis -to htree_synthesis
          synthesize_regular_multisource_clock_trees -from route_pin_connections
  • 70. Enhancements to Automated H-tree Synthesis: Things to Note
      • Set the -skip_pin_connections switch with the set_regular_multisource_clock_tree_options command when an atomic pin routing step is desired.
          • When the H-tree options are defined with -skip_pin_connections, the htree_synthesis step returns before "CTS STEP: Zroute for pin connections", and -from route_pin_connections can be used to run this step atomically.
          • When -skip_pin_connections is not defined, htree_synthesis also runs the pin connection step and returns to the shell only after all the steps of H-tree synthesis are complete.
          • When -skip_pin_connections is not defined, the new pin connection step cannot be run separately, and an error message is issued.
      • There is no need to alter the set_regular_multisource_clock_tree_options definition before running -from route_pin_connections.
  • 71. Dynamic Clock Power Improvement for SMSCTS
      • Overview
      • Solution Description
      • User Interface
  • 72. Dynamic Clock Power Improvement for SMSCTS: Overview
      • Clock dynamic power increases linearly with the switching activity of nets.
      • This feature aims to reduce the wire length of high-switching nets so that overall clock dynamic power improves.
      • With proper SAIF information, the tool accurately estimates the high-switching nets. A cell or cluster of cells with a high input toggle rate is relocated closer to its driver.
      • In the V-2023.12 release, the tool relocated all cells whose toggle ratio (Output TR and Input TR) was less than a threshold value. The relocation percentage also depended on the toggle ratio value.
      • Some gaps were identified in this initial implementation: relocating a few of the candidate nodes increased the overall wire length and degraded power.
      • Starting from the W-2024.09 release, this feature is enhanced to address these gaps.
  • 73. Dynamic Clock Power Improvement for SMSCTS: Solution Description
      • Starting from the W-2024.09 release, this feature is enhanced with an accept-and-reject mechanism after each power-aware relocation, so that the tool accepts only moves that improve dynamic power.
      • The tool no longer relocates all the nodes below a threshold toggle ratio. Instead, it first identifies all high-switching nets and their drivers. For each such driver, it incrementally relocates the loads toward the driver. After each relocation, the tool calculates the power and wire length and accepts the move only if there is an improvement. Any relocation that degrades power is reverted, and the tool moves to the next driver.
      • The power-aware relocation steps run after major SMSCTS steps such as DRC fixing and latency optimization, and are supported in both standalone and integrated flows.
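The accept-and-reject loop described on this slide can be illustrated with a small standalone Tcl sketch. It is not the tool's implementation: the power proxy (toggle rate multiplied by star wire length), the fixed half-way move, and all names and coordinates are assumptions chosen only to show how a move is kept or reverted.

```tcl
# Manhattan distance between two {x y} points.
proc manhattan {a b} {
    lassign $a ax ay
    lassign $b bx by
    return [expr {abs($ax - $bx) + abs($ay - $by)}]
}

# Hypothetical power proxy: sum of (toggle rate * star wire length) over the
# driver's output net and each load's own fanout net.
# loads: dict  name -> {pos {x y} rate r sinks {{x y} ...}}
proc power_proxy {drv_pos drv_rate loads} {
    set power 0.0
    dict for {name load} $loads {
        set pos [dict get $load pos]
        set power [expr {$power + $drv_rate * [manhattan $drv_pos $pos]}]
        foreach sink [dict get $load sinks] {
            set power [expr {$power + [dict get $load rate] * [manhattan $pos $sink]}]
        }
    }
    return $power
}

# Try moving each load part of the way toward the driver. Keep the move only
# if the power proxy improves; otherwise the load stays where it was.
proc relocate_loads {drv_pos drv_rate loads {step 0.5}} {
    lassign $drv_pos dx dy
    set best [power_proxy $drv_pos $drv_rate $loads]
    foreach name [dict keys $loads] {
        lassign [dict get $loads $name pos] x y
        set trial $loads
        dict set trial $name pos [list \
            [expr {$x + $step * ($dx - $x)}] \
            [expr {$y + $step * ($dy - $y)}]]
        set cost [power_proxy $drv_pos $drv_rate $trial]
        if {$cost < $best} {
            set loads $trial   ;# accept the move
            set best $cost
        }
    }
    return $loads
}

# icg_a has a quiet output net, so shortening its busy input net pays off.
# icg_b toggles as often on its output as on its input, so moving it away
# from its sinks costs more than it saves.
set loads [dict create \
    icg_a {pos {10 0} rate 0.1 sinks {{12 0}}} \
    icg_b {pos {0 10} rate 1.0 sinks {{0 12} {4 10}}}]
set result [relocate_loads {0 0} 1.0 $loads]
puts "icg_a: [dict get $result icg_a pos]"   ;# moved toward the driver
puts "icg_b: [dict get $result icg_b pos]"   ;# move rejected, unchanged
```

In this toy example the move of icg_a is accepted and the move of icg_b is rejected, which mirrors the behavior described above: only relocations that lower the estimated power survive.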
  • 74. Dynamic Clock Power Improvement for SMSCTS: User Interface
      • This feature can be enabled in both standalone and integrated SMSCTS flows using the following application option:
          set_app_options -name cts.multisource.enable_activity_aware_relocation -value true
  • 75. Structural MSCTS Clock QoR Improvements (GA)
      • Overview
      • Solution Description
      • User Interface
  • 76. Structural MSCTS Clock QoR Improvements: Overview
      • Multiple enhancements are introduced to improve the overall SMSCTS clock QoR:
          • Enhancement to reduce long path violations to improve latency (on by default from W-2024.09)
          • Enhancement to dynamic sink reassignment to reduce clock wire length
          • Enhancement to first-level cell assignment for better clock implementation (on by default from W-2024.09)
  • 77. Structural MSCTS Clock QoR Improvements: Solution Description
      Enhancement to reduce long path violations to improve latency.
      • This enhancement addresses long path violations during latency optimization.
      • SMSCTS downsizes cells for area recovery after DRC fixing. With the new enhancement, the tool does not downsize a cell if it is part of a long path.
      • Previously, the SMSCTS flow did not split cells for latency optimization if the driver had a transition violation. The synthesize_multisource_clock_subtrees command is enhanced to split intermediate cells for latency even if they have transition violations. This helps reduce long path violations.
  • 78. Structural MSCTS Clock QoR Improvements: Solution Description
      Enhancement to dynamic sink reassignment for reducing clock wire length.
      • This feature is targeted at improving the SMSCTS clock wire length.
      • With this feature, loads are reassigned to nearby equivalent drivers so that routing overlaps are reduced and total clock wire length improves.
      [Figure: clock routing without reassignment and with reassignment. Legend: purple = L2 inverter, yellow = L2 output net, red = L3 ICG; overlaps are circled.]
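The effect of reassigning loads to nearby equivalent drivers can be sketched in plain Tcl. This is only an illustration under simplifying assumptions: every listed driver is treated as logically equivalent, wire length is approximated by Manhattan distance, and the driver names, load names, and coordinates are hypothetical. The real feature also has to respect constraints that are not modeled here.

```tcl
# Manhattan distance between two {x y} points.
proc manhattan {a b} {
    lassign $a ax ay
    lassign $b bx by
    return [expr {abs($ax - $bx) + abs($ay - $by)}]
}

# Return the name of the driver closest to a load position.
# drivers: dict  name -> {x y}
proc nearest_driver {drivers load_pos} {
    set best_name ""
    set best_dist 0
    dict for {name pos} $drivers {
        set d [manhattan $pos $load_pos]
        if {$best_name eq "" || $d < $best_dist} {
            set best_name $name
            set best_dist $d
        }
    }
    return $best_name
}

# Reassign every load to its nearest driver.
# loads: dict  name -> {x y};  returns dict  load -> driver
proc reassign_loads {drivers loads} {
    set assignment [dict create]
    dict for {load pos} $loads {
        dict set assignment $load [nearest_driver $drivers $pos]
    }
    return $assignment
}

# Total driver-to-load wire length for a given assignment.
proc total_wirelength {drivers loads assignment} {
    set total 0
    dict for {load drv} $assignment {
        set total [expr {$total + [manhattan [dict get $drivers $drv] [dict get $loads $load]]}]
    }
    return $total
}

set drivers [dict create inv_a {0 0} inv_b {10 0}]
set loads   [dict create icg_1 {1 1} icg_2 {9 1}]
set before  [dict create icg_1 inv_b icg_2 inv_a]   ;# crossed connections
set after   [reassign_loads $drivers $loads]

puts "before: [total_wirelength $drivers $loads $before]"   ;# 20
puts "after:  [total_wirelength $drivers $loads $after]"    ;# 4
```

The crossed starting assignment stands in for the circled routing overlaps in the figure: once each load connects to its nearest equivalent driver, the overlapping detours disappear and the total length drops.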
  • 79. Structural MSCTS Clock QoR Improvements: Solution Description
      Enhancement to first-level cell assignment for better clock implementation.
      • First-level cells are meant to be assigned to the subtree driver that is closest to their fanout centroid.
      • An issue was observed where, at intermediate stages of SMSCTS, first-level cells were not assigned to the nearest subtree driver.
      • Clock optimizations were then done based on this suboptimal first-level assignment, and the tool later reassigned the first-level cells to their closest SMSCTS subtree driver or SMSCTS clock tap. This led to miscorrelation.
      • This feature addresses the problem.
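The intended rule, assigning a first-level cell to the subtree driver closest to its fanout centroid, can be written as a short Tcl sketch. The centroid here is a simple arithmetic mean and the distance is Manhattan; both choices, and all names and coordinates, are assumptions for illustration and may differ from what the tool computes.

```tcl
# Manhattan distance between two {x y} points.
proc manhattan {a b} {
    lassign $a ax ay
    lassign $b bx by
    return [expr {abs($ax - $bx) + abs($ay - $by)}]
}

# Arithmetic-mean centroid of a non-empty list of {x y} points.
proc centroid {points} {
    set sx 0.0
    set sy 0.0
    foreach p $points {
        lassign $p x y
        set sx [expr {$sx + $x}]
        set sy [expr {$sy + $y}]
    }
    set n [llength $points]
    return [list [expr {$sx / $n}] [expr {$sy / $n}]]
}

# Pick the subtree driver closest to the centroid of a cell's fanout.
# drivers: dict  name -> {x y}
proc assign_first_level_cell {drivers fanout_points} {
    set c [centroid $fanout_points]
    set best_name ""
    set best_dist 0
    dict for {name pos} $drivers {
        set d [manhattan $pos $c]
        if {$best_name eq "" || $d < $best_dist} {
            set best_name $name
            set best_dist $d
        }
    }
    return $best_name
}

set drivers [dict create tap_0 {0 0} tap_1 {4 4}]
set fanout  {{2 2} {4 2} {3 5}}
puts [centroid $fanout]                           ;# 3.0 3.0
puts [assign_first_level_cell $drivers $fanout]   ;# tap_1
```

Applying the same rule at every stage is what avoids the miscorrelation described above: optimization and final assignment then see the same driver for each first-level cell.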
  • 80. Structural MSCTS Clock QoR Improvements: User Interface
      Set the following application option to enable the enhancement:
      • Enhancement to dynamic sink reassignment for reducing clock wire length:
          set_app_options -name cts.multisource.subtree_dynamic_sinks_reclustering -value true
  • 81. Structural MSCTS Clock QoR Improvements: User Interface
      The following two features are on by default starting from the W-2024.09 release. For testing purposes in older versions, use the following application options:
      • Enhancement to reduce long path violations to improve latency:
          set_app_options -name cts.multisource.reduce_long_path_violations -value true
      • Enhancement to first-level cell assignment for better clock implementation:
          set_app_options -name cts.multisource.first_level_cell_tree_assignment_fix -value true
  • 82. Irregular Global Tree Synthesis
      • Overview
      • User Interface
  • 83. Irregular Global Tree Synthesis: Overview
      • Prior to the W-2024.09 release, automated global trees could be created only to symmetrically inserted tap drivers distributed evenly across the layout.
          • This caused floorplan and global tree complexities that required manual global tree insertion.
          • It also left room for a different style of global tree methodology for designs without tight on-chip variation (OCV) issues or common path requirements.
      • Starting from the W-2024.09 release, a new global tree synthesis methodology, called irregular global tree synthesis, is introduced to automatically construct multilevel, DRC-clean global trees to nonsymmetric, unevenly inserted tap drivers.
          • In irregular global tree synthesis, global trees are created using multilevel buffer trees in a non-H-tree style, ensuring minimum latency for all tap drivers.
          • This global tree style can be used to construct a global tree for irregularly placed tap drivers at the block level, or to build a global distribution of the clock to block inputs in top-level channels.
  • 84. Irregular Global Tree Synthesis: User Interface (1/2)
      • A new set of commands is introduced for irregular global tree synthesis.
          # Define settings related to the global tree
          set_irregular_multisource_clock_tree_options
              -clock <clock>
              [-net <net_name>]        Net for global tree creation. If not specified, the global tree is built on the net connected to the clock source. This net must drive all of the inserted tap drivers.
              -lib_cell {list}         Library cells (inverters or buffers) to be used in the global tree.
              -routing_rule <rule>
              -routing_layers {list}
              -prefix                  Prefix used to name the inserted global tree cells.
          # Synthesize the global tree
          synthesize_irregular_multisource_clock_trees
              [-clocks {list}]         Clocks to synthesize, if multiple irregular global tree options are defined.
  • 85. Irregular Global Tree Synthesis: User Interface (2/2)
      • Additionally, the irregular global tree options can be reported or removed using the following new commands:
          # Report the settings related to the irregular global tree
          report_irregular_multisource_clock_tree_options [-clock] <clock>
          # Remove the settings related to the irregular global tree
          remove_irregular_multisource_clock_tree_options [-clock] <clock>
      Note: Irregular global tree synthesis performs global tree construction to already inserted tap drivers. Users must insert the irregular tap drivers themselves, as required.
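Putting the commands from the two User Interface slides together, a session might look like the following sketch. Only the command and option names come from the slides. The clock name, library cell, routing rule, layer names, and prefix are hypothetical placeholders, the argument formats are assumptions, and the tap drivers are assumed to be inserted already.

```tcl
# Placeholder values: replace with names from your own design and library.
set clk          clk                  ;# hypothetical clock name
set tree_cells   {my_lib/CKINV_X8}    ;# hypothetical inverter library cell
set route_rule   clk_ndr              ;# hypothetical routing rule name
set route_layers {M7 M8}              ;# hypothetical routing layer names

# Define the irregular global tree settings. -net is omitted, so the tree is
# built on the net connected to the clock source.
set_irregular_multisource_clock_tree_options \
    -clock $clk \
    -lib_cell $tree_cells \
    -routing_rule $route_rule \
    -routing_layers $route_layers \
    -prefix irgt

# Review the settings, then build the global tree to the existing tap drivers.
report_irregular_multisource_clock_tree_options -clock $clk
synthesize_irregular_multisource_clock_trees -clocks [list $clk]
```

If the settings need to be redefined, remove_irregular_multisource_clock_tree_options with the same -clock argument clears them before a new definition.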