6/23/2014 © 2014 ANSYS, Inc. 1 
PowerArtist™: RTL Design-for-Power 
Design Automation Conference 2014

Early Power Decisions → High Impact
[Figure: achievable power reduction (0 to 100%) by design stage, from large impact at RTL Design through Logic Synthesis and Physical Design to small impact at Timing Closure]
RTL Design-for-Power
• Power-Performance-Area Trade-offs
• Voltage / Power Domain Planning
• Block-level Clock and Data Gating
• Eliminate Redundant Activity
Low Power Implementation
• Power Switch Sizing / Placement
• Clock Gater Cloning / Decloning
• Multi-Vt Optimization
• Power Integrity Verification

RTL Power ↔ Gate-level Power
Flow: Design Specification → RTL Design → Gate-Level Design → Layout
• Gate-level power: power-per-gate, turnaround ~20 hours
• RTL power: power-per-function (adder, register, mux), turnaround ~22 mins
• Quicker design iterations → effective design-for-power

PowerArtist: RTL Design-for-Power Platform
RTL Power Analysis
• Average, time-based
• Power-critical vector selection
• Regressions via TCL interface
RTL Power Reduction
• Clock, memory, logic
• Analysis-driven automation
• Interactive power debug
RTL Links with Physical
• PACE™: RTL power accuracy
• RPM™: RTL-driven physical power integrity
[Diagram: RTL power and physical power linked through PACE and RPM]

RTL Power: Ins and Outs
Inputs to RTL power analysis:
• RTL (VHDL, Verilog, SystemVerilog), e.g.:
    module PA (
      ...
      always @ (posedge clk) begin
        dout <= din1;
      end
      assign out = sel ? dout : din2;
      ...
    endmodule
• Power domains (UPF / CPF): e.g., Vdd1, Vdd2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)
[Figure: example logic (registers, an AND gate, and a mux) clocked by clk]
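As a rough illustration of how these inputs combine, the Python sketch below estimates one net's switching power from its activity (toggle count over the simulated time), a wire-load-style capacitance estimate, and the supply voltage, using the textbook relation P = 0.5 · C · Vdd² · (toggles / T). The wire-load table, the pin capacitance, and all numbers are hypothetical, and internal and leakage power from the Liberty models are ignored; this is a simplified model, not PowerArtist's actual estimation engine.

```python
from dataclasses import dataclass


@dataclass
class WireLoadModel:
    """Simplified wire-load model: fanout -> estimated wire length -> capacitance.

    The table and slope are hypothetical placeholders, not values from a real library.
    """
    fanout_length: dict  # {fanout: estimated wire length (arbitrary length units)}
    slope: float  # extra length per fanout beyond the largest tabulated fanout
    cap_per_length: float  # wire capacitance per length unit (farads)

    def wire_length(self, fanout: int) -> float:
        points = sorted(self.fanout_length.items())
        if fanout >= points[-1][0]:
            # Extrapolate linearly beyond the table using the slope
            last_fo, last_len = points[-1]
            return last_len + self.slope * (fanout - last_fo)
        if fanout <= points[0][0]:
            return points[0][1]
        # Interpolate linearly between the two surrounding table entries
        for (f0, l0), (f1, l1) in zip(points, points[1:]):
            if f0 <= fanout <= f1:
                return l0 + (l1 - l0) * (fanout - f0) / (f1 - f0)
        raise ValueError("fanout table is malformed")

    def net_cap(self, fanout: int) -> float:
        return self.wire_length(fanout) * self.cap_per_length


def switching_power(cap_f: float, vdd: float, toggles: int, sim_time_s: float) -> float:
    """Each 0->1 or 1->0 toggle dissipates 0.5 * C * Vdd^2 on average."""
    return 0.5 * cap_f * vdd ** 2 * toggles / sim_time_s


# Hypothetical net: fanout 3, 1000 toggles in 1 us, 0.8 V supply
wlm = WireLoadModel(fanout_length={1: 10.0, 2: 20.0, 4: 35.0}, slope=8.0, cap_per_length=0.1e-15)
pin_cap = 2e-15  # assumed input-pin capacitance of the driven loads (farads)
net_power = switching_power(wlm.net_cap(3) + pin_cap, vdd=0.8, toggles=1000, sim_time_s=1e-6)
print(f"{net_power * 1e6:.2f} uW")
```

Swapping the wire-load estimate for a layout-calibrated capacitance model is where a PACE-style flow would differ; the power formula itself stays the same.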

Low Power RTL Design Methodology
• Profile power vectors: [power waveform over TRANSMIT MODE and RECEIVE MODE; peak power = 391mW, average power = 239mW]
• Check power vs. budget
• Debug power hotspots: e.g., residual receive activity in transmit mode; enabled clock with inactive data
• Reduce power automatically
• Perform design trade-offs: [chart of power (W, 0 to 0.06) for Version 1 vs. Version 2, typical and idle vectors]
• Monitor power vs. budget: RTL power regression flow
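The "profile, then check against budget" steps can be sketched in a few lines of Python: reduce a per-cycle power waveform to its average and peak, then report any budget violation. The trace and budget values below are hypothetical (the trace merely reuses a 391mW peak for flavor), and the helper names are illustrative, not PowerArtist commands.

```python
def profile_power(samples_w):
    """Return (average, peak, peak_cycle) for per-cycle power samples in watts."""
    if not samples_w:
        raise ValueError("empty power waveform")
    average = sum(samples_w) / len(samples_w)
    peak_cycle = max(range(len(samples_w)), key=lambda t: samples_w[t])
    return average, samples_w[peak_cycle], peak_cycle


def check_budget(samples_w, avg_budget_w, peak_budget_w):
    """List human-readable budget violations (empty list means within budget)."""
    average, peak, peak_cycle = profile_power(samples_w)
    violations = []
    if average > avg_budget_w:
        violations.append(f"average {average * 1e3:.1f}mW > budget {avg_budget_w * 1e3:.1f}mW")
    if peak > peak_budget_w:
        violations.append(
            f"peak {peak * 1e3:.1f}mW at cycle {peak_cycle} > budget {peak_budget_w * 1e3:.1f}mW"
        )
    return violations


# Hypothetical per-cycle power trace (W) and budgets
trace = [0.200, 0.250, 0.391, 0.220, 0.150]
for msg in check_budget(trace, avg_budget_w=0.250, peak_budget_w=0.350):
    print(msg)
```

Checking peak and average separately matters because a design can meet its average (battery) budget while still exceeding its peak (supply and thermal) budget, as in this trace.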

RTL vs. Gates: Accuracy and Performance
Nvidia Case Study
• RTL power accuracy: within ~15% of gate-level power
• RTL power analysis: ~30X faster than gate-level

RTL Capacity: Large Designs / FSDBs
Samsung Case Study
FSDB captures only the power-critical signals identified by PowerArtist:
• FSDB size: 1/4
• Turnaround time (TAT): 4X faster
• Loss of accuracy: 2%

RTL Power Analysis

PowerArtist RTL Power Analysis
Activity Analysis
• Total Logic / Clock Activity per Hierarchical Instance
• Qualify Coverage per Power Mode
• Identify Power Bugs
Average Power Analysis
• Understand Power: Where? Why?
• Per Hierarchy, Category, Mode, Clock / Voltage Domains
• Qualify Power Efficiency with Multiple Metrics
Time-based Power Analysis
• Power Waveforms per Hierarchical Instance
• Waveforms per Category: Clock, Memory, Logic
• Identify Peak Power and Time
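Reporting power "per hierarchy" amounts to rolling leaf-instance power up into every ancestor instance. The Python sketch below shows one straightforward way to do that for dot-separated instance paths. The leaf values are hypothetical, and only the first path also appears elsewhere in this deck; this is an illustration of the idea, not PowerArtist's reporting code.

```python
from collections import defaultdict


def rollup(leaf_power_w):
    """Sum leaf-instance power into every ancestor of a dot-separated hierarchy path."""
    totals = defaultdict(float)
    for path, power in leaf_power_w.items():
        parts = path.split(".")
        for depth in range(1, len(parts) + 1):
            totals[".".join(parts[:depth])] += power
    return dict(totals)


# Hypothetical leaf powers (W)
leaves = {
    "top.core1.t1.dpmem.m1": 3e-3,
    "top.core1.alu": 2e-3,
    "top.core2.alu": 1e-3,
}
for inst, p in sorted(rollup(leaves).items()):
    print(f"{inst:25s} {p * 1e3:.2f} mW")
```

The same pattern extends to per-category breakdowns (clock, memory, logic) by keying the totals on (instance, category) pairs instead of instance paths alone.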
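
Clock Gating Efficiency: Temporal and Structural Metrics
Example (waveforms: clk, en, gclk, data):
• 16 of 20 bits are gated
• 5 of 10 cycles are gated
• 2 of 5 enabled cycles had data toggles
-----------------------------------------------------------------------------
Metric   Definition               Type of Metric             Value
-----------------------------------------------------------------------------
SCGE     % Gated Bits             Structural                 80% (16/20)
DCGE     % Gated Clock Cycles     Temporal (en, clk)         50% (5/10)
CGEE     % Ideally Gated Cycles   Temporal (data, en, clk)   40% (2/5)
-----------------------------------------------------------------------------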
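The three metrics can be computed directly from a per-cycle enable trace and a per-cycle "data toggled" flag. This Python sketch uses traces constructed to match the slide's counts, since the exact waveform is not recoverable. The CGEE formula (enabled cycles with a data toggle divided by enabled cycles) is inferred from the example's 2 of 5 → 40% and may differ from the tool's precise definition.

```python
def scge(gated_bits, total_bits):
    """Structural CGE: share of register bits that sit behind a clock gate."""
    return gated_bits / total_bits


def dcge(enable):
    """Dynamic CGE: share of clock cycles in which the gate blocked the clock (en = 0)."""
    return sum(1 for en in enable if not en) / len(enable)


def cgee(enable, data_toggled):
    """Share of enabled cycles in which the register data actually changed."""
    useful = [toggled for en, toggled in zip(enable, data_toggled) if en]
    if not useful:
        raise ValueError("no enabled cycles")
    return sum(useful) / len(useful)


# Constructed to match the slide's example: 20 bits, 10 cycles
en = [True, True, False, False, True, False, True, False, True, False]  # 5 of 10 gated
toggle = [True, False, False, False, True, False, False, False, False, False]  # 2 enabled toggles
print(f"SCGE={scge(16, 20):.0%} DCGE={dcge(en):.0%} CGEE={cgee(en, toggle):.0%}")
```

A low CGEE means the enable lets the clock through in many cycles where the register would have loaded the same value, so a tighter enable could save more clock power.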

Clock Gating Efficiency: Temporal and Structural Metrics
[Report: per-flop power and activity, static and dynamic CGE, CGEE and power impact]
• Highlighted flop: 100% static CGE but 0% dynamic CGE, i.e., it is structurally gated, yet its clock is never actually gated off

RTL Power Reduction

PowerArtist RTL Power Reduction
[Figure: Original RTL → Low-Power RTL]
Example TCL report script:
    openPDB powerartist.pdb
    set output_file ungated_regs.rpt   ;# placeholder report path
    set RPT [open $output_file "w"]
    # Registers with no clock gating
    set ungated_registers [getRegisters -cg none]
    foreach i $ungated_registers {
        set dyn_power [getPropVal $i Dynamic_Power "inst"]
        set bit_width [getInstWidth $i]
        set file      [getPropVal $i File_Name "inst"]
        set line_num  [getPropVal $i Line_Number "inst"]
        # One line per ungated register: instance, width, power, source location
        puts $RPT "$i $bit_width $dyn_power ${file}:${line_num}"
    }
    close $RPT
1. Interactive Power Debug
   • Block-level Power "Bugs"
   • Large Power Savings
2. Automated Power Reduction
   • Instance-level Power Reduction
   • 15 Analysis-driven Techniques
3. Customizable Power Reports
   • TCL Queries to OADB (OpenAccess database)
   • Automation Beyond PowerArtist Reports

Debug Power: Visualize-Analyze-Reduce
• Inactive data, active clock
• Identify block-level clock gating enable

Block-Level Power Reduction
• Clock active, data inactive → block-level clock gating
• Clock inactive, data active → block-level data gating

Block-level Activity Analysis: Clock and Data Ports
1.1 Clock Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode   Instance
Cycles     Cycles   Name   Name   Name
-------------------------------------------------------
200        201      CLKA   read   top.core1.t1.dpmem.m1
-------------------------------------------------------
1.2 Input and Redundant Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode   Instance
Toggles    Toggles  Name   Name   Name
-------------------------------------------------------
1          1        AB[8]  read   top.core1.t1.dpmem.m1
-------------------------------------------------------
Callouts: wasted activity per mode; clock activity per hierarchy; constant high activity (missed clock gating?); redundant activity in read mode
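The "redundant cycles" and "redundant toggles" columns can be approximated from a cycle-by-cycle trace of a block's ports. In this Python sketch, the definitions are assumptions chosen for illustration, since the slide does not say exactly how PowerArtist classifies activity as redundant. A clock cycle counts as redundant when no sampled input changed since the previous cycle, and a pin toggle counts as redundant when the block ignores that pin in that cycle. A constant 201-cycle trace gives 200 redundant cycles out of 201, the same pattern as the CLKA row, but it is a constructed example, not the tool's data.

```python
def redundant_clock_cycles(inputs_per_cycle):
    """Return (redundant, total) clock cycles for one block.

    A cycle is counted as redundant when none of the block's sampled inputs
    changed since the previous cycle, so the clock edge cannot change its state.
    The first cycle is never counted as redundant.
    """
    redundant = sum(
        1 for prev, cur in zip(inputs_per_cycle, inputs_per_cycle[1:]) if prev == cur
    )
    return redundant, len(inputs_per_cycle)


def redundant_toggles(values, ignored):
    """Return (redundant, total) toggles of one input pin.

    ignored[t] is True when the block does not use the pin in cycle t
    (for example, an address bit in a mode where no access happens).
    """
    total = redundant = 0
    for t in range(1, len(values)):
        if values[t] != values[t - 1]:
            total += 1
            if ignored[t]:
                redundant += 1
    return redundant, total


# Hypothetical memory trace: (address, write_enable, data_in) held constant for 201 cycles
idle_trace = [(0x1F, 0, 0xAA)] * 201
print(redundant_clock_cycles(idle_trace))
```

A block whose redundant-cycle count is close to its total cycle count in some mode is a natural candidate for a block-level clock gate in that mode.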

Instance-Level Power Reduction
Clock / Clock Gating
• Clock gating coverage
• Clock gating efficiency
• Sequential and combinational
Control Logic and Datapath
• Redundant activity
• Don’t care conditions
• Datapath operand isolation
Memory Subsystem
• Redundant read/write
• Splitting memories
• Exercising sleep modes
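Datapath operand isolation stops a unit's inputs from switching in cycles where its result is not used. The Python sketch below counts bit toggles at a unit's input with and without isolation. It assumes latch-style isolation, which holds the last value; AND-style isolation would force zeros instead. The operand stream and usage pattern are hypothetical.

```python
def hamming(a, b):
    """Number of bits that differ between two non-negative integers."""
    return bin(a ^ b).count("1")


def operand_toggles(operands, enable, isolate):
    """Count bit toggles seen at the input of a datapath unit (e.g., a multiplier).

    enable[t] is True when the unit's result is used in cycle t. With isolation,
    the unit's input holds its previous value while enable is False
    (latch-style isolation; AND-style isolation would force zeros instead).
    """
    toggles = 0
    seen = operands[0]
    for value, en in zip(operands[1:], enable[1:]):
        nxt = value if (en or not isolate) else seen
        toggles += hamming(seen, nxt)
        seen = nxt
    return toggles


# Hypothetical operand stream; the result is only needed in cycles 0 and 4
ops = [0b0000, 0b1111, 0b0000, 0b1111, 0b0001]
used = [True, False, False, False, True]
print(operand_toggles(ops, used, isolate=False), operand_toggles(ops, used, isolate=True))
```

Input toggles are only a proxy: the real saving depends on the unit's internal switching, and the isolation cells themselves add area and some power, which is why the technique pays off mainly for wide, rarely used datapath units.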

Analysis-Driven RTL Power Reduction
Wasted activity/power when sel is 0
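One way to see the waste in the module PA fragment used throughout this deck: dout is reloaded on every clock edge, but out only observes it when sel is 1. The Python sketch below is a cycle-level model under simplifying assumptions (reset value 0, din1 sampled on the edge that starts the next cycle). It compares the free-running register against one loaded only when the following cycle's sel will observe it. The observed outputs are identical while dout changes far less often. In hardware that enable would need to be pre-computed one cycle early, and this sketch does not check whether sel's fan-in logic permits that.

```python
def simulate_pa(din1, din2, sel, gated):
    """Cycle-level model of module PA: dout <= din1 on each clock; out = sel ? dout : din2.

    Returns (out per cycle, number of clock edges on which dout changed value).
    If gated is True, dout is only loaded in cycles whose sel will observe it;
    in hardware that enable would have to be pre-computed one cycle early.
    """
    dout = 0  # assumed reset value
    outs, dout_changes = [], 0
    for t in range(len(sel)):
        if t > 0:
            load = sel[t] if gated else True
            if load and dout != din1[t - 1]:
                dout = din1[t - 1]  # edge at start of cycle t samples din1 from cycle t-1
                dout_changes += 1
        outs.append(dout if sel[t] else din2[t])
    return outs, dout_changes


din1 = [1, 2, 3, 4, 5, 6]
din2 = [9] * 6
sel = [0, 0, 0, 1, 0, 0]  # dout is observed only in cycle 3
base_out, base_changes = simulate_pa(din1, din2, sel, gated=False)
gated_out, gated_changes = simulate_pa(din1, din2, sel, gated=True)
print(base_out == gated_out, base_changes, gated_changes)
```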

Analysis-Driven RTL Power Reduction
Pre-compute based new clock gate enables
Multi-cycle ODC (observability don't-care) sequential analysis

Analysis-Driven RTL Power Reduction
Pre-compute based new clock gate enables
Multi-cycle ODC sequential analysis
[Chart: predicted power savings (normalized, 0 to 1.0) vs. number of RTL changes (design effort, 1 to 291)]
• Top 5 RTL changes → 50% of the identified power savings
• Maximize power savings, minimize design impact
15 Power Reduction Techniques
• Clock, Memory, Logic
• Sequential, Combinational
• Vector-based, Vectorless
• Hierarchical, SoC capacity
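The savings-versus-effort curve implies a simple prioritization: rank candidate RTL changes by predicted saving and stop once the cumulative share reaches a target. The Python sketch below does this greedily. The change names and savings are hypothetical, and the sketch treats savings as independent and additive, which real changes may not be.

```python
def changes_for_target(savings, target_fraction):
    """Greedily pick the RTL changes with the largest predicted savings until
    their cumulative share of all identified savings reaches target_fraction."""
    total = sum(savings.values())
    picked, accumulated = [], 0.0
    for name, saving in sorted(savings.items(), key=lambda kv: kv[1], reverse=True):
        if accumulated >= target_fraction * total:
            break
        picked.append(name)
        accumulated += saving
    return picked


# Hypothetical predicted savings (mW) per candidate RTL change
candidates = {
    "gate_fifo_clock": 40.0,
    "isolate_multiplier": 25.0,
    "split_memory": 15.0,
    "odc_enable_reg": 10.0,
    "dont_care_mux": 5.0,
    "misc": 5.0,
}
print(changes_for_target(candidates, 0.5))
```

With a long-tailed savings distribution like the one in the chart, a handful of changes reaches the target, and the remaining candidates can be weighed against their design impact.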

Power Reduction Case Studies
[Figures: scan mux (inputs A, B; scan_enable = 0) with scan_clock and data_in; memory output M_OUT over Write, Write, Read cycles showing redundant data toggles]
MUX Reduction Technique:
• Scan clocks toggling in functional mode
• Redundant data activity in registers wasting power
GMC Technique:
• Redundant data toggles in read mode
• Cycle-based analysis reports % redundant cycles

Power Database Access with TCL API
Power Database 
(OpenAccess) 
Design Queries 
• getMemories/Flops/Combs 
• getFanout 
• getModulePorts 
• reportDesignStats 
Report Creation 
• reportCGEfficiency 
• diffPdbPower 
• reportPower 
• reportReductions 
Power Queries 
• getPropVal instance/net 
• getClockPower 
• getNetPower 
• getClockEnableExpr 
Design Navigation 
• dls 
• dpwd, dcd 
• dpushd, dpopd 
• show 
Customize and Automate Power Reduction, Reports, Regressions 
• Quick access to power and design properties 
• Accomplish custom tasks with a few lines of TCL

Custom Power Reports
50% Idle Power Reduction in Mobile SoC
-----------------------------------------------------------------------------------------------
Instance Name                 Enable Efficiency   Clock Power (W)   Clock      En Net
-----------------------------------------------------------------------------------------------
or1200_cpu.ckg12              0                   5.17E-03          clk        or1200_cpu.en_blk
or1200_cpu.or1200_ctrl.ckg5   0.1                 1.36E-03          gclk_blk   or1200_cpu.or1200_ctrl.n1
-----------------------------------------------------------------------------------------------
[Figure: block clock gate (en_blk, clk → gclk_blk) driving a register clock gate (en_reg, gclk_blk → gclk_reg)]
• Inefficient enables waste power
• Block-level clock gates control significant power
• Power efficiency = 0; a single clock gate controls >5mW
• PowerArtist clock gating report → identifies inefficient clock gates
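A custom report like this boils down to filtering clock gates that have a weak enable but still control a lot of clock power. The Python sketch below applies that filter to the two rows from the report. It assumes the clock-power column is in watts, consistent with the ">5mW" callout. The thresholds and the ClockGate record are illustrative, not part of PowerArtist's API.

```python
from dataclasses import dataclass


@dataclass
class ClockGate:
    instance: str
    enable_efficiency: float  # as reported; lower means a less effective enable
    clock_power_w: float  # clock power controlled by this gate (W, assumed)
    clock: str
    enable_net: str


def inefficient_gates(gates, max_efficiency=0.2, min_power_w=1e-3):
    """Gates with a weak enable that still control significant clock power,
    highest controlled power first. Thresholds are illustrative."""
    hits = [
        g for g in gates
        if g.enable_efficiency <= max_efficiency and g.clock_power_w >= min_power_w
    ]
    return sorted(hits, key=lambda g: g.clock_power_w, reverse=True)


# Rows taken from the report
report = [
    ClockGate("or1200_cpu.ckg12", 0.0, 5.17e-3, "clk", "or1200_cpu.en_blk"),
    ClockGate("or1200_cpu.or1200_ctrl.ckg5", 0.1, 1.36e-3, "gclk_blk", "or1200_cpu.or1200_ctrl.n1"),
]
for gate in inefficient_gates(report, min_power_w=5e-3):
    print(f"{gate.instance}: efficiency {gate.enable_efficiency}, {gate.clock_power_w * 1e3:.2f} mW")
```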

RTL Power Regressions
• 30+ blocks per typical SoC
• 2+ vectors per block
• Vectors written for power: idle, active
• Daily block-level and weekly chip-level regressions monitor power changes
• Power metrics track power efficiency
• PowerArtist identifies where power changed
Flow: RTL (Verilog, SV, VHDL) + Testbench → Simulator → FSDB → RTL Power Analysis, Reduction, Regression
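The core of a power regression check is comparing per-block results between two runs and flagging large relative changes. The Python sketch below shows that comparison. The block names, values, and 5% threshold are hypothetical, and the function is a stand-alone illustration, not the diffPdbPower command listed in the TCL API.

```python
def diff_power(baseline, current, rel_threshold=0.05):
    """Flag blocks whose power changed by more than rel_threshold between two runs.

    Returns (block, old, new, relative change) tuples, largest |change| first.
    Blocks missing from one run are treated as 0 power in that run.
    """
    flagged = []
    for block in sorted(baseline.keys() | current.keys()):
        old = baseline.get(block, 0.0)
        new = current.get(block, 0.0)
        if old == 0.0:
            rel = float("inf") if new else 0.0
        else:
            rel = (new - old) / old
        if abs(rel) > rel_threshold:
            flagged.append((block, old, new, rel))
    return sorted(flagged, key=lambda r: abs(r[3]), reverse=True)


# Hypothetical nightly results (mW per block) for an idle vector
baseline = {"cpu": 10.0, "dsp": 5.0, "usb": 1.0}
current = {"cpu": 10.2, "dsp": 6.0, "usb": 1.0, "new_ip": 0.5}
for block, old, new, rel in diff_power(baseline, current):
    print(f"{block:8s} {old:6.2f} -> {new:6.2f} mW ({rel:+.0%})")
```

Running this per vector (idle and active separately) keeps an idle-power regression from being masked by noise in the active numbers.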

RTL Links with Physical Design

PACE™: Physical-Aware RTL Power Budgeting
    module PA (
      ...
      always @ (posedge clk) begin
        dout <= din1;
      end
      assign out = sel ? dout : din2;
      ...
    endmodule
Physical effects:
• Clock Distribution
• Parasitics
• Multiple Vt
• Low-power Structures
• Optimization
[Diagram: Post-Layout Gate-level Power → PACE Models (Cap, Clock) → RTL Power]
PACE Bridges the RTL vs. Layout Gap → Predictable RTL Power Accuracy

RTL PACE vs. Gate-Power: Mobile SoC @14nm
Total Power Correlation (Gate-SPEF vs. RTL-PACE vs. RTL-WLM)
• RTL-PACE power within 20% of gate-level (SPEF-annotated) power
Clock Power Correlation (Gate-SPEF vs. RTL-PACE)
• RTL-PACE clock power within 20% of gate-level (SPEF-annotated) clock power
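A "within 20%" claim is a per-block relative-error check against the gate-level reference. The Python sketch below computes that check. The block names and power values are hypothetical, and it treats the gate-level (SPEF-annotated) result as ground truth, as the correlation plots do.

```python
def correlate(reference_w, estimate_w, tolerance=0.20):
    """Relative error of RTL estimates against gate-level reference power, per block,
    plus whether every block falls within the tolerance."""
    errors = {block: (estimate_w[block] - ref) / ref for block, ref in reference_w.items()}
    return errors, all(abs(err) <= tolerance for err in errors.values())


# Hypothetical gate-level (SPEF-annotated) reference vs. RTL estimates, in mW
gate_spef = {"gpu": 100.0, "cpu": 50.0, "modem": 20.0}
rtl_pace = {"gpu": 115.0, "cpu": 42.0, "modem": 21.0}
errors, ok = correlate(gate_spef, rtl_pace)
for block, err in errors.items():
    print(f"{block:6s} {err:+.0%}")
print("all within 20%:", ok)
```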

RTL Power-Driven Power Integrity
    module PA (
      ...
      always @ (posedge clk) begin
        dout <= din1;
      end
      assign out = sel ? dout : din2;
      ...
    endmodule
• Shrinking geometries → increasing di/dt
• Gate vectors too late
• Layout too late for changes
• Error-prone guesstimates
RPM Enables PDN (power delivery network) Planning → Early, Optimal, Robust
[Flow: RTL Power → RTL Power Model (RPM) → Physical Power Integrity]

RPM Case Studies
Peak and di/dt Cycle Selection on a GPU Core
• Peak = 6X average power
• Di/dt event not at the same time as the peak
---------------------------------------------------------------------------------------------
Frame         Start time   Finish time   Avg leakage (VDD)   Avg power (VDD)   Peak power (VDD)
---------------------------------------------------------------------------------------------
DIDT          0.0817704    0.0817706     0.00257393          0.185336          0.219776
CYCLE_POWER   0.0806005    0.0806007     0.002569            0.250168          0.266678
---------------------------------------------------------------------------------------------
[Charts: RPM vs. Gate; FSDB vs. Vectorless; CPM(Layout)+Pkg vs. CPM(RPM)+Pkg vs. Pkg only]
Panels: Early Voltage Drop Analysis; Early Package Resonance Analysis
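Why the di/dt frame differs from the peak-power frame: the worst current step between consecutive cycles need not coincide with the highest-power cycle. This Python sketch locates both in a per-cycle power trace. It approximates supply current as I = P / Vdd (constant supply voltage assumed) and di/dt as the cycle-to-cycle change in current divided by the cycle time. The trace, supply, and cycle time are hypothetical and unrelated to the GPU data above.

```python
def peak_and_didt(power_w, vdd, cycle_s):
    """Return (peak-power cycle, max-|di/dt| cycle, di/dt at that cycle in A/s).

    Supply current is approximated as I = P / Vdd (constant supply voltage assumed);
    di/dt is the change in current between consecutive cycles divided by cycle time.
    """
    current = [p / vdd for p in power_w]
    peak_cycle = max(range(len(power_w)), key=lambda t: power_w[t])
    steps = [(current[t] - current[t - 1]) / cycle_s for t in range(1, len(current))]
    worst = max(range(len(steps)), key=lambda t: abs(steps[t]))
    return peak_cycle, worst + 1, steps[worst]


# Hypothetical per-cycle power (W) at 1.0 V with a 1 ns cycle
gpu_trace = [0.05, 0.06, 0.25, 0.26, 0.27, 0.10]
peak_cycle, didt_cycle, didt = peak_and_didt(gpu_trace, vdd=1.0, cycle_s=1e-9)
print(peak_cycle, didt_cycle, f"{didt:.2e} A/s")
```

Here the peak falls in cycle 4 while the sharpest current ramp is at cycle 2, so a voltage-drop analysis driven only by the peak-power cycle would miss the di/dt event.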

Related Presentations @ DAC2014
• Power Analysis Using PowerArtist for WaveLogic3 ASIC – 100Gbs Coherent Metro Optical Modem
• Achieving RTL Power Efficiency and Automated Power Reduction
• Methods for Achieving RTL to Gate Power Consistency
