Efficient Techniques for Per Clock Gating Domain
Contributor based Power Abstraction of IP Blocks
for Hierarchical Power Analysis
Arun Joseph, Nagu Dhanwada, Spandana Rachamalla, William Dungan,
Ricardo Nigaglioni
IBM Systems Group
Motivation
• IP blocks are becoming larger, with an increasing number of clock gating domains.
• Much more aggressive solutions are being adopted to improve the clock gating of these very large IP blocks.
• For several workloads, significant heterogeneity in both clock and data activity is seen across the multiple clock gating domains within an IP block.
Background: Contributor based Power Analysis Flow
[Figure: flow diagram with the following blocks. Cell Library; Library Characterization (Corner 1 ... Corner N); Contributor Power Model; Macro/IP Block; Macro Power Abstract Generation; Contributor based Macro Power Abstract; Chip; Chip Level Power Analysis (Corner 1 ... Corner N, Workload 1 ... Workload N); Input to Wafer Test, System Planning, Power Sorting and Binning.]
Background
• Accurate and efficient full chip power analysis is an important step in the design of power efficient microprocessor and SoC chips.
• A power model abstraction flow based on contributors is PVT-independent and enables efficient hierarchical chip level power analysis.
• The dynamic power and leakage power model abstraction, primarily targeted at full chip power analysis, was presented in [3]. This prior work parameterized the capacitance switching due to clock gating onto a single clock gate control for the entire macro.
• This approximation to a single macro wide clock gate control works fairly well for full chip dynamic power analysis, but improved dynamic power abstraction techniques are needed to enable more accurate power analysis.
Main Idea: Multi Clock Gating Domain Abstract
[Figure: an IP block containing clock gating domains d1, d2, d3 and d4, abstracted into an IP Block Power Abstract that feeds Chip Level Power Analysis. Shown for both abstraction styles below.]

Single Clock Gate Control Base Power Abstract

IP Block Power Abstract Generation:
1. Case Setup
2. Simulation
3. Power Contributor Element Generation and Contributor Accumulation

• Multiple clock gating domains in the IP block.
• Approximated to a single macro wide clock gate control as a part of the abstraction process.
• Weights and activity factors set during activity extraction in chip level power analysis.

Name          Weight               Activity factor(s)
AlwaysCeff
GatableCeff   (1 - clock_gating)
PiSfDepCeff                        input_switch_rate
LoSfDepCeff                        latch_output_switch_rate
PiLoXPCeff                         input_switch_rate, latch_output_switch_rate

Multiple Clock Gate Domain Power Abstract

IP Block Power Abstract Generation:
1. Marking and Domain Identification
2. Case Setup
3. Simulation
4. Power Contributor Element Generation and Accumulation

• Per clock gating domain activity extraction in chip level power analysis.

Name                Weight                  Activity factor(s)
AlwaysCeff
PiSfDepCeff                                 input_switch_rate
GatableCeff         (1 - clock_gating)
LoSfDepCeff                                 latch_output_switch_rate
PiLoXPCeff                                  input_switch_rate, latch_output_switch_rate
GatableCeff.d1      (1 - clock_gating.d1)
LoSfDepCeff.d1                              latch_output_switch_rate.d1
PiLoXPCeff.d1                               input_switch_rate, latch_output_switch_rate.d1
GatableCeff.d2      (1 - clock_gating.d2)
LoSfDepCeff.d2                              latch_output_switch_rate.d2
PiLoXPCeff.d2                               input_switch_rate, latch_output_switch_rate.d2
LoSfDepCeff.d1-d2                           latch_output_switch_rate.d1, latch_output_switch_rate.d2
PiLoXPCeff.d1-d2                            input_switch_rate, latch_output_switch_rate.d1, latch_output_switch_rate.d2
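
A minimal sketch of how a per-domain abstract with these contributor names might be evaluated during chip level analysis. It assumes that each contributor's Ceff is scaled by its weight and by the product of its activity factors, and that dynamic power is the summed switched capacitance times Vdd squared times frequency. That formula, the `Contributor` type, the reading of `clock_gating` as the fraction of time a domain is gated off, and every numeric value are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Contributor:
    """One row of a power abstract (hypothetical representation)."""
    name: str
    ceff: float                         # effective capacitance in farads (made-up values below)
    weight: Optional[str] = None        # name of a clock_gating factor, if the row has a weight
    activities: Tuple[str, ...] = ()    # names of the activity factors that scale this row


def switched_capacitance(rows: Iterable[Contributor], factors: Dict[str, float]) -> float:
    """Sum Ceff * weight * product(activity factors) over all rows (assumed form)."""
    total = 0.0
    for row in rows:
        # Assumption: clock_gating is the fraction of time the domain is gated off,
        # so the weight (1 - clock_gating) is the fraction of time it is clocked.
        weight = 1.0 - factors[row.weight] if row.weight else 1.0
        activity = prod(factors[name] for name in row.activities)  # empty product is 1
        total += row.ceff * weight * activity
    return total


def dynamic_power(rows, factors, vdd: float, freq_hz: float) -> float:
    """Assumed relation: P = C_switched * Vdd^2 * f, with constants folded into Ceff."""
    return switched_capacitance(rows, factors) * vdd ** 2 * freq_hz


# Hypothetical two-domain abstract and activity factors.
ABSTRACT = [
    Contributor("AlwaysCeff", 2.0e-12),
    Contributor("GatableCeff.d1", 4.0e-12, weight="clock_gating.d1"),
    Contributor("GatableCeff.d2", 6.0e-12, weight="clock_gating.d2"),
    Contributor("LoSfDepCeff.d1-d2", 1.0e-12,
                activities=("latch_output_switch_rate.d1",
                            "latch_output_switch_rate.d2")),
]
FACTORS = {
    "clock_gating.d1": 0.5,
    "clock_gating.d2": 1.0,   # d2 fully gated in this made-up workload
    "latch_output_switch_rate.d1": 0.5,
    "latch_output_switch_rate.d2": 0.5,
}

if __name__ == "__main__":
    print(dynamic_power(ABSTRACT, FACTORS, vdd=1.0, freq_hz=1.0e9))
```

With a single macro wide `clock_gating` factor, the fully gated domain d2 and the half gated domain d1 could not be told apart; keeping one weight per domain is what lets the example zero out d2's gatable capacitance.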
Multi Clock gate domain Abstraction: Three Variants
[Figure: abstraction flow. Steps: domain identification; marking of domains; domain combination list creation; per case simulations; per domain and single domain Ceff computation and accumulation; per domain IP power abstract creation; per domain bill of materials file generation. Side paths: no-sim based clock power only abstraction; domain collapsing. Outputs: domain parameterized clock power abstract; domain parameterized clock and data power abstract.]
• Quick tracing based clock power only abstraction
• Clock and data power abstraction based on domain merging using domain combination lists
• Domain collapsing for handling large, extensively gated designs
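
A minimal sketch of how marking and tracing could produce a domain combination list. The slides do not describe the algorithm, so everything here is an assumption: latches are marked with their clock gating domain, each combinational node inherits the union of the domains found in its fan-in cone, and the distinct non-empty unions form the combination list. The netlist encoding and the name `domain_combinations` are hypothetical.

```python
from typing import Dict, FrozenSet, List


def domain_combinations(fanin: Dict[str, List[str]],
                        latch_domain: Dict[str, str]) -> List[FrozenSet[str]]:
    """Return the distinct sets of domains that reach any node in `fanin`.

    fanin:        node -> list of driver nodes (assumed to be an acyclic
                  combinational netlist; primary inputs have no entry).
    latch_domain: latch output node -> clock gating domain label (the "marking").
    """
    cache: Dict[str, FrozenSet[str]] = {}

    def trace(node: str) -> FrozenSet[str]:
        if node in cache:
            return cache[node]
        if node in latch_domain:
            # Tracing stops at a marked latch output.
            result = frozenset([latch_domain[node]])
        else:
            # Union of the domains reaching each driver; primary inputs give the empty set.
            result = frozenset().union(*(trace(d) for d in fanin.get(node, [])))
        cache[node] = result
        return result

    combos = {trace(node) for node in fanin}
    combos.discard(frozenset())  # nodes driven only by primary inputs belong to no domain
    # Deterministic order: single domains first, then larger combinations.
    return sorted(combos, key=lambda s: (len(s), sorted(s)))


# Hypothetical netlist: g3 sees latches from both d1 and d2.
FANIN = {
    "g1": ["l1", "pi0"],
    "g2": ["l2"],
    "g3": ["g1", "g2"],
    "g4": ["l1", "l1b"],
}
LATCH_DOMAIN = {"l1": "d1", "l1b": "d1", "l2": "d2"}

if __name__ == "__main__":
    for combo in domain_combinations(FANIN, LATCH_DOMAIN):
        print("-".join(sorted(combo)))
```

In this toy netlist the result has three entries (d1, d2 and d1-d2), which mirrors the `.d1`, `.d2` and `.d1-d2` suffixes used for contributor names in the multi-domain abstract.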
Experimentation
[Figure: validation flow with two paths kept in sync. Workload based simulation: VHDL sim data, waveform file and activity files drive the gate level block (workload driven model) to give workload driven power. Abstraction based power analysis: SAIF-like activity files and the gate level block power abstract feed unit level activity extraction and power rollup to give abstraction based power. The two power results are compared.]
Comparison of workload driven power simulation with the power abstract based estimation for the three approaches.
Experimental Results
Designs compared:
• D1 has 640 latches, 9 clock gating domains, 44 domain combinations, ~13000 standard cell instances and nets.
• D2 has 520 latches, 3 clock gating domains, 3 domain combinations, ~10000 gates and nets.
• D3 has 1200 latches, 21 clock gating domains, 83 domain combinations.
• D4 has 87 latches, 2 clock gating domains, 4 domain combinations, ~12000 gates and nets.

[The four result tables follow in extraction order; the slide layout that paired each table with a design is not recoverable from the text.]

Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
Single domain                    (base)                -54.87              -7.26                2.02
No-sim clock                     1.17                  -54.87              -0.02                1.93
Domain combinations              2.03                  -16.30              -0.02                1.58
Domain combinations & collapse   1.13                  -17.41              -0.02                1.78

Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
Single domain                    (base)                -6.7                4.8                  1.82
No-sim clock                     1.03                  -6.7                0.3                  1.75
Domain combinations              1.12                  -0.7                0.3                  1.61
Domain combinations & collapse   1.12                  -0.7                0.3                  1.61

Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
Single domain                    (base)                -8.7                4.1                  2.35
No-sim clock                     1.06                  -8.7                0.3                  2.22
Domain combinations              1.98                  -2.3                0.3                  1.64
Domain combinations & collapse   1.08                  -4.4                0.3                  1.96

Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
Single domain                    (base)                -22.4               0.4                  1.51
No-sim clock                     1.10                  -22.4               0.4                  1.47
Domain combinations              1.25                  2.7                 0.4                  1.41
Domain combinations & collapse   1.25                  2.7                 0.4                  1.41
Conclusion
• We presented approaches for generating multiple clock gating domain parameterized, PVT-independent power abstracts for large IP blocks.
• The gating domain parameterization is accomplished by separating the attribution of switching due to each single domain through a marking and tracing process, which removes the need for separate domain-by-domain simulation.
• Experimental results on IP blocks of varying sizes from an industry-strength microprocessor design highlight the accuracy impact of the proposed approach while keeping the run time and model size increases in an acceptable range.
• As an extension, we are exploring approaches that preserve each of the domains independently, using formulations based on constructing clock gating domain conflict hypergraphs and coloring them to determine domain interactions.


Editor's Notes

  • #5: For instance, such analysis is required for power sort process, which is used for determining product shipping frequencies. The key enablers of this flow are the concept of contributor based power models [1, 2], an abstract definition and a method for generating such abstracts for complex IP blocks.
  • #6: Base (Single Clock Gate Control) Abstraction: The dynamic power abstraction introduced in [3] is expressed in terms of the dynamic power contributors. It characterizes power as a function of a clock_gating weight factor and the input_switch_rate and latch_output_switch_rate activity factors. The capacitance, weight and activity factors (computed during higher level power analysis) are obtained by approximating across the clock gating domains into a single macro wide clock gate control. Proposed Multi-Domain Abstraction: In the proposed abstraction, there are clock and data Ceff components that correspond to the individual clock gating domains. The per domain Ceff, along with weight and activity factors computed on a per clock gating domain basis during higher level analysis, is used for hierarchical per clock gating domain power analysis. This makes the analysis more efficient, accurate and usable, and can drive more aggressive use of clock gating in the logic design process.
  • #8: Workload driven gate level simulation (GLS) based power is compared with abstraction (ABS) based power to validate the proposed abstractions. GLS based power: A netlist for an IP block is simulated for several thousand cycles by applying switching patterns extracted from RTL simulations of higher level realistic workloads. The switching at every net is computed to get the average power dissipated for the simulated switching patterns. ABS based power: The same netlist is simulated to generate the different power abstracts. For the same switching patterns, the switching activity factors, including the clock gating factor and the switching factors at the primary inputs and latch outputs, are computed on a per gating domain basis. The computed activity factors are applied to the generated power abstract (base, and per clock gating domain) to calculate ABS based power. All experiments were run on a 24 core 2.6GHz Xeon machine running RHEL 5 with 256GB memory. Designs from both core and uncore units of the microprocessor were studied. A variant of a thermal design point workload is used for comparison.
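
A minimal sketch of per gating domain activity factor extraction from cycle-by-cycle traces. The note above does not define the factors numerically, so the definitions here are assumptions: `clock_gating` is taken as the fraction of cycles in which a domain's clock enable is low, and `latch_output_switch_rate` as the average number of toggles per latch output per cycle transition. The trace encoding and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def clock_gating_factor(enable_trace: Sequence[int]) -> float:
    """Fraction of cycles in which the domain clock is gated off (enable == 0)."""
    return sum(1 for e in enable_trace if e == 0) / len(enable_trace)


def switch_rate(traces: Sequence[Sequence[int]]) -> float:
    """Average toggles per signal per cycle transition over 0/1 traces."""
    toggles = sum(sum(a != b for a, b in zip(t, t[1:])) for t in traces)
    transitions = sum(len(t) - 1 for t in traces)
    return toggles / transitions


def per_domain_factors(enables: Dict[str, List[int]],
                       latch_outputs: Dict[str, List[List[int]]]) -> Dict[str, float]:
    """Build the per-domain factor names used by the multi-domain abstract."""
    factors: Dict[str, float] = {}
    for domain, enable_trace in enables.items():
        factors[f"clock_gating.{domain}"] = clock_gating_factor(enable_trace)
        factors[f"latch_output_switch_rate.{domain}"] = switch_rate(latch_outputs[domain])
    return factors


# Hypothetical four-cycle traces for two domains.
ENABLES = {"d1": [1, 1, 0, 0], "d2": [1, 1, 1, 1]}
LATCH_OUTPUTS = {
    "d1": [[0, 1, 1, 1], [0, 0, 0, 0]],   # two latches in d1
    "d2": [[0, 1, 0, 1]],                 # one latch in d2
}

if __name__ == "__main__":
    for name, value in sorted(per_domain_factors(ENABLES, LATCH_OUTPUTS).items()):
        print(f"{name} = {value:.3f}")
```

Averaging the same traces across both domains would report a single gating factor of 0.25, hiding the fact that d2 is never gated while d1 is gated half the time.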
  • #9: “TAT Benefit” is the improvement seen in runtime while using ABS based power computation, when compared against the runtime of GLS based power computation. “Model size increase” here refers to ratio of the size of per clock gating domain power abstract (in bytes when stored on the disk) to the size of the base power abstract. The domain collapse procedure is triggered when the size of domain combinations list is greater than a certain threshold (DT), and is collapsed to DX% of the size of the domain combinations list. Both DT and DX are programmable, and they were empirically chosen as DT=10 and DX=10%.
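
A minimal sketch of the collapse trigger described in the note above, using the stated values DT=10 and DX=10%. The note does not say how fractional sizes are rounded or which combinations survive, so rounding up and the function name `collapse_target` are assumptions, and the sketch only computes the target list size.

```python
import math
from typing import Optional


def collapse_target(num_combinations: int,
                    dt: int = 10,
                    dx_percent: float = 10.0) -> Optional[int]:
    """Return the collapsed list size, or None when collapse is not triggered.

    Collapse is triggered only when the list is larger than the threshold DT;
    the target is DX% of the original size (rounded up here, by assumption).
    """
    if num_combinations <= dt:
        return None
    return max(1, math.ceil(num_combinations * dx_percent / 100.0))


if __name__ == "__main__":
    # Domain combination counts quoted for the four designs on the results slide.
    for design, count in (("D1", 44), ("D2", 3), ("D3", 83), ("D4", 4)):
        print(design, count, collapse_target(count))
```

Under these assumptions the designs with 3 and 4 domain combinations never trigger collapse, which is consistent with two of the result tables reporting identical rows for "Domain combinations" and "Domain combinations & collapse".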