International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1021
Proposing a RTD-based block for on-chip GPU caches to reduce static
power leaks
Richa Sharma
Netaji Subhas University of Technology, New Delhi, 110078
----------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - GPUs dominate the industry in parallel processing. Their architecture supports parallel algorithms and processes images at much higher rates than CPUs. Each generation of GPUs increases the amount of on-chip cache and the number of processor cores embedded on the chip. At the same time, leakage energy consumption in CMOS technology grows significantly worse with each generation. This paper addresses the problems such energy leaks can cause for the efficiency of GPU computing, and it further delves into cache power management for GPUs. It proposes integrating tunneling diode technology into the embedded architecture of GPUs by means of an RTD switching block, and discusses the benefits of the proposed integration.
Key Words: GPU Caches, Cache Power Management, Static Leaks, Resonant Tunneling Diodes, Negative Differential Resistance, RTD Switching Block
1. INTRODUCTION
Graphics Processing Units (GPUs) are used to process images because their inherently parallel structure boosts the performance of parallel processing algorithms. In recent years, GPUs have also been used for many general-purpose computing applications, including cloud computing, machine learning, and cost-effective High-Performance Computing (HPC) clusters.[1] In this era of green computing, the focus has shifted from achieving the highest peak performance to delivering high performance in an energy-efficient way. Energy efficiency has become important in the design of the whole range of processors, from battery-driven portable devices through desktop and server processors to supercomputers.[2]
To achieve this dual goal of performance and energy efficiency, researchers have suggested architectural modifications across different components of the processor, such as the processor core, caches, and DRAM (dynamic random access memory). Techniques such as power gating and DTCMOS have been applied to reduce static power leaks at the circuit level, and cache decay and drowsy caches have been employed to reduce leakage in caches.[3] However, researchers face the challenge of scaling CMOS devices, since leakage currents increase significantly as the gate oxide becomes thinner. The increase in subthreshold current is explained by thermodynamics, more specifically by the Boltzmann distribution.[4] The power consumption problem of CMOS technology therefore gets worse with each generation of processors. Estimates from the International Technology Roadmap for Semiconductors (ITRS) indicate that leakage power consumption could become a major threat to the survival of CMOS technology.[5]
Competition in developing GPUs and harnessing their computing power has driven companies to increase the number of processing cores per processor, and this number will continue to rise. Furthermore, to bridge the gap between the speed of the processor and that of main memory, larger caches are being embedded on the chip.[6] Tables 1 and 2 list the cache memory sizes of current designs for different processor types and the fraction of total power consumed by caches.
In this paper, I focus on the static energy leaks in GPUs that are attributable to existing CMOS technology, and I propose to replace it with tunneling diode technology. I also elaborate on the benefits of this upgrade for cache power management in GPUs. GPUs are yet to realize their full computing potential. Since energy management is one of the biggest challenges facing the processor industry, the proposed solution offers a new approach to the problem of excessive energy consumption in processor chips.
Table-1: On-chip cache memory size in latest-generation processors.
Processor Type          Cache Memory Used
Desktop (CPU, GPU)      8 MB
Server Cores            24-32 MB
Mobile Processors       1 MB
Table-2: Power consumption of on-chip caches.
2. CACHE POWER MANAGEMENT IN GPUs
The power consumption of CMOS circuits falls into two main parts: dynamic power (also called active power) and leakage power (also called static power). Both can be described by simple mathematical models.
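In their standard first-order form, which matches the dependencies described in this section, the two components can be written as below. The symbols are notation assumed here rather than taken from the paper: α is the activity factor, C_eff the effective capacitance, V_dd the supply voltage, f the operating frequency, and I_leak the total leakage current.

```latex
P_{\text{dynamic}} = \alpha \, C_{\text{eff}} \, V_{dd}^{2} \, f
\qquad
P_{\text{leakage}} = V_{dd} \, I_{\text{leak}}
```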
Dynamic power depends on the activity factor (the number of cache accesses, or the number of bits accessed per cache access), the effective capacitance, the operating frequency, and the supply voltage. Static power depends on the supply voltage and the leakage currents, so it can be reduced by decreasing either of these factors. Modern processors incorporate a multi-level cache hierarchy with L1, L2, and L3 caches. L1 caches are classified as First Level Caches (FLCs), and L2, L3, etc. as Lower Level Caches (LLCs).
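The dependencies described above can be sketched numerically. The functions below use the common first-order models (dynamic power as activity factor times capacitance times voltage squared times frequency, static power as voltage times leakage current); every numeric value is a hypothetical placeholder chosen for illustration, not a measurement from this paper.

```python
def dynamic_power(activity, c_eff, v_dd, freq):
    """First-order dynamic (active) power in watts: activity * C_eff * V_dd^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq


def static_power(v_dd, i_leak):
    """First-order static (leakage) power in watts: V_dd * I_leak."""
    return v_dd * i_leak


# Hypothetical operating point for one cache array (illustrative values only).
ACTIVITY = 0.1      # fraction of the capacitance switched per cycle
C_EFF = 1e-9        # effective capacitance in farads
FREQ = 1e9          # operating frequency in hertz
V_DD = 1.0          # supply voltage in volts
I_LEAK = 0.05       # total leakage current in amperes

if __name__ == "__main__":
    print("dynamic:", dynamic_power(ACTIVITY, C_EFF, V_DD, FREQ), "W")
    print("static: ", static_power(V_DD, I_LEAK), "W")
    # Halving the leakage current halves static power; dynamic power is unchanged.
    print("static with half the leakage:", static_power(V_DD, I_LEAK / 2), "W")
```

Because static power is linear in the leakage current, any device that lowers the off-state current of a cache line lowers leakage power in the same proportion, which is the lever the rest of the paper targets.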
First Level Caches have higher dynamic power consumption and Lower Level Caches have higher leakage power consumption, since the designs of FLCs and LLCs treat cache latency differently. Techniques developed to improve the energy efficiency of caches include creating a high-impedance path so that a cache line can be turned “off” when it holds data that is not likely to be reused. The period during which a cache line goes unused is referred to as “dead time”. The technique used to predict dead time is cache decay, which uses a counter to count the cycles since the line was last used and compares it against a pre-set number of cycles.[7]
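The counter mechanism behind cache decay can be sketched as a small simulation. This is a minimal model under stated assumptions, not the implementation from the cited work: the `CacheLine` class, the `decay_interval` parameter, and the single-line `simulate` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    decay_interval: int       # idle cycles after which the line is switched off (hypothetical)
    idle_cycles: int = 0      # cycles since the last access
    powered: bool = True

    def access(self):
        """Access the line. Returns False if the line had decayed (its data was lost)."""
        was_powered = self.powered
        self.powered = True
        self.idle_cycles = 0
        return was_powered

    def tick(self):
        """Advance one cycle without an access and switch off once the interval is reached."""
        if not self.powered:
            return
        self.idle_cycles += 1
        if self.idle_cycles >= self.decay_interval:
            self.powered = False


def simulate(access_cycles, total_cycles, decay_interval):
    """Return the number of cycles a single line spends switched off."""
    line = CacheLine(decay_interval)
    accesses = set(access_cycles)
    off_cycles = 0
    for cycle in range(total_cycles):
        if cycle in accesses:
            line.access()
        else:
            line.tick()
        if not line.powered:
            off_cycles += 1
    return off_cycles


if __name__ == "__main__":
    # Accesses at cycles 0, 2 and 20 over a 30-cycle window, decay interval of 4 cycles.
    print(simulate([0, 2, 20], total_cycles=30, decay_interval=4))
```

A shorter decay interval switches lines off sooner and saves more leakage, but it raises the chance that a line is needed again after it has decayed, in which case the access becomes a miss.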
3. RESONANT TUNNELING DIODE AND INTEGRATION OF RTD SWITCHING BLOCK IN GPU
3.1 Resonant Tunneling Diode (RTD)
The Resonant Tunneling Diode (RTD) is a variant of the tunneling diode, which Esaki invented in 1957, and is one of the most promising quantum-effect devices that operate at room temperature. These diodes produce an enhanced quantum tunneling effect, which in turn yields a very fast current response that can be fine-tuned. Like a CMOS transistor, an RTD turns “on” and conducts current within a specific range of external bias voltage.
The inherent difference between the two devices is that current in an RTD tunnels through quasi-bound states within a double-barrier structure instead of flowing through a channel between drain and source. In other tunneling diodes, such as Esaki tunneling diodes (TDs), current tunnels through the depletion region. Fig. 1 shows the I-V characteristic curve of RTDs, that is, how the current varies with the applied voltage.
The peak current occurs at resonance, at a specific voltage. Beyond it the device exhibits Negative Differential Resistance (NDR): the current decreases as the voltage increases until it reaches a minimum “valley” current at a specific voltage, after which it starts to rise again. This minimum valley current, which can be classified as leakage current, is significantly lower than in TDs. The bistability of RTDs gives them an advantage over TDs, since TDs produce very high leakage currents when a reverse bias voltage is applied.
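The peak, NDR, and valley regions described above can be illustrated with a piecewise-linear I-V sketch. This is a deliberately crude model for illustration only: the peak and valley voltages and currents are hypothetical defaults, and the slope of the second rise is simply assumed equal to the initial slope.

```python
def rtd_current(v, v_peak=0.3, i_peak=1.0e-3, v_valley=0.6, i_valley=1.0e-4):
    """Piecewise-linear RTD current (amperes) for forward bias v >= 0 (volts).

    All default parameters are hypothetical illustration values.
    """
    if v < 0:
        raise ValueError("this sketch only models forward bias (v >= 0)")
    if v <= v_peak:
        # First positive-resistance region: current rises to the resonance peak.
        return i_peak * v / v_peak
    if v <= v_valley:
        # NDR region: current falls from the peak to the valley as voltage increases.
        slope = (i_valley - i_peak) / (v_valley - v_peak)
        return i_peak + slope * (v - v_peak)
    # Second positive-resistance region: current rises again beyond the valley.
    return i_valley + (i_peak / v_peak) * (v - v_valley)


def differential_conductance(v, dv=1e-6):
    """Central-difference estimate of dI/dV (siemens) at bias v."""
    return (rtd_current(v + dv) - rtd_current(v - dv)) / (2 * dv)


if __name__ == "__main__":
    for bias in (0.1, 0.45, 0.8):
        g = differential_conductance(bias)
        region = "NDR" if g < 0 else "positive resistance"
        print(f"V = {bias:.2f} V  I = {rtd_current(bias):.2e} A  dI/dV = {g:.2e} S  ({region})")
```

In this model the ratio of peak to valley current sets how small the off-state current is relative to the on-state current, which is why a low valley current matters for leakage.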
Table-2 data: share of total power consumed by on-chip caches.
Processor Name                  Percentage of Total Power Consumption
Alpha 21264                     16%
StrongARM                       30%
Niagara, Niagara-2 L2 Caches    24%
Figure-1: I-V characteristic of RTDs. Leakage currents are kept to a minimum due to the resonance and
bistability of the device, therefore providing advantage over TDs.
3.2 Applications of Negative Differential Resistance (NDR) in High-Switching Blocks
A major advantage of RTDs is the ease with which they integrate with other technologies such as complementary metal-oxide-semiconductor (CMOS) and high electron mobility transistors (HEMTs). The Negative Differential Resistance of RTDs can be used to create a switching block that switches off and on when specific voltages are applied. This switching block provides two operating points for switching the device on and off, as shown in Figure 2.
Blocks built from RTDs, a resistor, and CMOS transistors can provide a high-speed switching method for cache lines.[8] This allows smooth integration that works in tandem with the cache decay technique. The counter indicates when the cache line should be deactivated and sends a high-voltage signal to the RTD; the RTD then enters the NDR region and turns the cache line “off”, thereby implementing cache decay. With this method, the cache line can also be reactivated easily by switching back to the low-impedance path.
Figure-2: NDR property of RTD allows two stable operation points for switching.
The advantage of using NDR to switch cache lines lies in its low static power consumption, since the leakage currents in RTDs are minimal, and in switching times on the order of picoseconds.
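The two stable operating points can be found with a load-line calculation: an RTD in series with a resistor settles where the resistor's load line crosses the RTD's I-V curve. The sketch below assumes a piecewise-linear RTD curve with hypothetical breakpoints, a hypothetical supply voltage and load resistance, and the usual textbook reading that crossings on positive-slope branches are stable while the crossing in the NDR region is not.

```python
# Breakpoints (volts, amperes) of a piecewise-linear RTD I-V curve.
# All values are hypothetical: origin, peak, valley, and a point on the second rise.
RTD_CURVE = [(0.0, 0.0), (0.3, 1.0e-3), (0.6, 1.0e-4), (1.2, 2.1e-3)]


def operating_points(v_supply, r_load, curve=RTD_CURVE):
    """Intersect the load line I = (v_supply - V) / r_load with each curve segment.

    Returns a list of (voltage, current, stable) tuples in order of increasing voltage.
    A point is marked stable when it lies on a segment with positive slope.
    """
    points = []
    for (v0, i0), (v1, i1) in zip(curve, curve[1:]):
        g = (i1 - i0) / (v1 - v0)           # differential conductance of this segment
        denom = g + 1.0 / r_load
        if denom == 0:                      # load line parallel to the segment
            continue
        # Solve i0 + g * (v - v0) = (v_supply - v) / r_load for v.
        v = (v_supply / r_load - i0 + g * v0) / denom
        if v0 <= v < v1:
            points.append((v, (v_supply - v) / r_load, g > 0))
    return points


if __name__ == "__main__":
    # Hypothetical bias: 0.9 V supply through a 1 kilo-ohm load resistor.
    for v, i, stable in operating_points(0.9, 1000.0):
        label = "stable" if stable else "unstable (NDR)"
        print(f"V = {v:.3f} V  I = {i:.2e} A  {label}")
```

With these values the load line crosses the curve three times, giving one stable point on each positive-slope branch. A much smaller load resistance produces a single crossing, so the bistable behaviour depends on choosing the load to be larger than the magnitude of the NDR.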
3.3 Execution of Cache Decay by utilizing Negative Differential Resistance (NDR)
The cache line decay technique proposed by Kaxiras et al. [7] uses a pre-set counter to count the number of inactive cycles of a cache line and cuts off that particular line, thus dramatically reducing leakage power. In this implementation, the cut-off signal is sent to the switching block one cycle earlier than in the Kaxiras model.[7] The Kaxiras model uses a gated-Vdd CMOS transistor as the switch, as proposed by Powell et al., 2000.[8]
The RTD switching block provides a stable point at which to cut off the power supply to the cache line. Implemented as shown in Figure 3, the block acts as a switch that turns off at a certain input voltage and resumes operation according to the output voltage.
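The effect of issuing the cut-off signal one cycle early can be sketched with simple bookkeeping. This is an illustrative model only: the function names, the `lead_cycles` parameter, and the power and timing values are hypothetical, and the sketch assumes the line is switched off in the same cycle the signal is issued.

```python
def cutoff_signal_cycle(last_access, decay_interval, lead_cycles=0):
    """Cycle in which the decay counter issues the cut-off signal.

    lead_cycles=0 models signalling when the counter reaches the decay interval;
    lead_cycles=1 models signalling one cycle earlier.
    """
    if not 0 <= lead_cycles < decay_interval:
        raise ValueError("lead_cycles must be in [0, decay_interval)")
    return last_access + decay_interval - lead_cycles


def leakage_energy_saved(dead_cycles, decay_interval, lead_cycles,
                         p_leak_on, p_leak_off, cycle_time):
    """Leakage energy (joules) saved over one dead period lasting dead_cycles cycles."""
    powered_cycles = decay_interval - lead_cycles    # idle cycles spent powered before cut-off
    off_cycles = max(0, dead_cycles - powered_cycles)
    return off_cycles * (p_leak_on - p_leak_off) * cycle_time


if __name__ == "__main__":
    # Hypothetical line: last accessed at cycle 100, decay interval of 8 cycles.
    print(cutoff_signal_cycle(100, 8, lead_cycles=0))
    print(cutoff_signal_cycle(100, 8, lead_cycles=1))
    # Hypothetical leakage of 1 uW when powered, 10 nW when off, 1 ns cycle time.
    for lead in (0, 1):
        saved = leakage_energy_saved(1000, 8, lead, 1e-6, 1e-8, 1e-9)
        print(f"lead_cycles={lead}: {saved:.3e} J saved")
```

Under these assumptions the early signal gains one extra off cycle per dead period, so the benefit scales with how often lines decay rather than with the length of each dead period.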
Because of its Negative Differential Resistance property, the RTD block behaves like a capacitance, and the switching speed of the block depends on the load voltage, shown in Figure 3 as Vout. Vout therefore serves as a probe for controlling the switching speed of the RTD.
Figure-3: (a) RTD switching block with CMOS transistor and Vout as a probe to control switching speed. (b) Demonstration of the RTD block as a capacitance and a current source.
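Treating the block as a capacitance driven by a current source suggests a first-order estimate of switching time. The sketch below assumes a constant drive current, which is a simplification and not a model taken from the paper; the capacitance, voltage swing, and current values are hypothetical.

```python
def switching_time(capacitance, voltage_swing, drive_current):
    """Time (seconds) for a constant current to move a capacitance through a voltage swing.

    Follows from I = C * dV/dt with I held constant: t = C * dV / I.
    """
    if drive_current <= 0:
        raise ValueError("drive_current must be positive")
    return capacitance * voltage_swing / drive_current


if __name__ == "__main__":
    # Hypothetical values: 10 fF node, 0.3 V swing at Vout, 1 mA available current.
    t = switching_time(10e-15, 0.3, 1e-3)
    print(f"estimated switching time: {t * 1e12:.1f} ps")
```

In this simplified picture a smaller voltage swing at Vout shortens the switching time proportionally, which is one way to read the role of Vout as a control probe.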
4. CONCLUSION
Integrating an RTD-based structure into SRAMs will be beneficial on many counts. Using NDR to implement the cache decay technique reduces power while speeding up the switching operation, and therefore does not compromise the performance of GPUs. The leakage current can be kept at very low levels, which reduces leakage power substantially. I have therefore proposed the integration of RTDs with CMOS to improve the power efficiency of GPU caches, and discussed the benefits of using Negative Differential Resistance (NDR) to reduce the total power consumption of GPUs.
REFERENCES
[1] OLCF, http://www.olcf.ornl.gov/titan/, 2012.
[2] S. Murugesan, “Harnessing green IT: Principles and practices,” IT Professional, vol. 10, no. 1, pp. 24–33, 2008.
[3] D. J. Frank, “The Limits of CMOS Scaling from a Power-Constrained Technology Optimization Perspective,” Nanohub, 4 Oct 06.
[4] “International technology roadmap for semiconductors (ITRS),” http://www.itrs.net/Links/2011ITRS/2011Chapters/2011ExecSum.pdf, 2011.
[5] Sparsh Mittal, “A Survey of Architectural Techniques For Improving Cache Power Efficiency”.
[6] Stefanos Kaxiras, “Cache-line Decay: A Mechanism to Reduce Cache Leakage Power”.
[7] J. L. Huber, J. Chen, “An RTD/Transistor Switching Block and Its Possible Application in Binary and Ternary Adders”.
[8] M. D. Powell et al., “Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep Submicron Cache Memories,” in Proc. of the Int'l Symp. on Low Power Electronics and Design '00, 2000.
BIOGRAPHY
Richa Sharma received the Bachelor of Engineering from Netaji Subhas University of Technology, New Delhi, formerly Netaji Subhas Institute of Technology. Richa works in technology-intensive areas of physics research and on innovation in educational methods for these subjects. She currently works with the Society of Applied Microwave Electronics Engineering & Research.

More Related Content

PDF
Ju3417721777
PDF
CMOS LOW POWER CELL LIBRARY FOR DIGITAL DESIGN
PDF
PDF
Implementation and analysis of power reduction in 2 to 4 decoder design using...
PDF
Design and Analysis of Sequential Elements for Low Power Clocking System with...
PDF
IRJET- Energy Efficient One Bit Subtractor Circuits for Computing Application...
PDF
An Approach for Low Power CMOS Design
PDF
A 10-BIT 25 MS/S PIPELINED ADC USING 1.5-BIT SWITCHED CAPACITANCE BASED MDAC ...
Ju3417721777
CMOS LOW POWER CELL LIBRARY FOR DIGITAL DESIGN
Implementation and analysis of power reduction in 2 to 4 decoder design using...
Design and Analysis of Sequential Elements for Low Power Clocking System with...
IRJET- Energy Efficient One Bit Subtractor Circuits for Computing Application...
An Approach for Low Power CMOS Design
A 10-BIT 25 MS/S PIPELINED ADC USING 1.5-BIT SWITCHED CAPACITANCE BASED MDAC ...

What's hot (19)

PPT
Lecture20 asic back_end_design
PDF
Area Efficient Pulsed Clock Generator Using Pulsed Latch Shift Register
PDF
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
PPTX
ECE561_finalProject
PDF
Dual technique of reconfiguration and capacitor placement for distribution sy...
PDF
Design of Continuous Time Multibit Sigma Delta ADC for Next Generation Wirele...
PDF
Energy efficient and high speed domino logic circuits
PDF
Hx3313651367
PDF
Low power embedded system design
PDF
Distribution Automation_2012
PDF
Modified digital space vector pulse width modulation realization on low-cost ...
PDF
Low Power-Area Design of Full Adder Using Self Resetting Logic with GDI Techn...
PDF
IRJET- Single Switched Capacitor High Gain Boost Quasi-Z Source Converter
PDF
IRJET- Study Over Current Relay (MCGG53) Response using Matlab Model
PDF
A Survey on Low Power VLSI Designs
PDF
A Simulation Based Analysis of Lowering Dynamic Power in a CMOS Inverter
PDF
Hv2514131415
PDF
Fixed Width Replica Redundancy Block Multiplier
PDF
DUAL EDGE-TRIGGERED D-TYPE FLIP-FLOP WITH LOW POWER CONSUMPTION
Lecture20 asic back_end_design
Area Efficient Pulsed Clock Generator Using Pulsed Latch Shift Register
IRJET- Low Power Adder and Multiplier Circuits Design Optimization in VLSI
ECE561_finalProject
Dual technique of reconfiguration and capacitor placement for distribution sy...
Design of Continuous Time Multibit Sigma Delta ADC for Next Generation Wirele...
Energy efficient and high speed domino logic circuits
Hx3313651367
Low power embedded system design
Distribution Automation_2012
Modified digital space vector pulse width modulation realization on low-cost ...
Low Power-Area Design of Full Adder Using Self Resetting Logic with GDI Techn...
IRJET- Single Switched Capacitor High Gain Boost Quasi-Z Source Converter
IRJET- Study Over Current Relay (MCGG53) Response using Matlab Model
A Survey on Low Power VLSI Designs
A Simulation Based Analysis of Lowering Dynamic Power in a CMOS Inverter
Hv2514131415
Fixed Width Replica Redundancy Block Multiplier
DUAL EDGE-TRIGGERED D-TYPE FLIP-FLOP WITH LOW POWER CONSUMPTION
Ad

Similar to IRJET- Proposing a RTD-Based Block for On-Chip GPU Caches to Reduce Static Power Leaks (20)

PDF
Analysis of Power Dissipation & Low Power VLSI Chip Design
PDF
Analysis Of Power Dissipation Amp Low Power VLSI Chip Design
PDF
IRJET- Reduction of Dark Silicon through Efficient Power Reduction Designing ...
PDF
Analysis Of Optimization Techniques For Low Power VLSI Design
PDF
Low Power Design of Standard Digital Gate Design Using Novel Sleep Transisto...
PDF
IRJET - Low Power Design for Fast Full Adder
PDF
Optimized Design of an Alu Block Using Power Gating Technique
PDF
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
PDF
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
PDF
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
PDF
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
PDF
IRJET- Reduction of Power, Leakage and Area of a Standard Cell Asics using Th...
PDF
FORCED STACK SLEEP TRANSISTOR (FORTRAN): A NEW LEAKAGE CURRENT REDUCTION APPR...
PDF
A verilog based simulation methodology for estimating statistical test for th...
PDF
Adiabatic technique based low power synchronous counter design
PDF
IRJET- Fin FET Two Bit Comparator for Low Voltage, Low Power, High Speed and ...
PDF
Different Leakage Power Reduction Techniques in SRAM Circuits : A State-of-th...
PDF
Extremely Low Power FIR Filter for a Smart Dust Sensor Module
PPTX
High speed low power viterbi decoder design for TCM decoders
PDF
Optimal Body Biasing Technique for CMOS Tapered Buffer
Analysis of Power Dissipation & Low Power VLSI Chip Design
Analysis Of Power Dissipation Amp Low Power VLSI Chip Design
IRJET- Reduction of Dark Silicon through Efficient Power Reduction Designing ...
Analysis Of Optimization Techniques For Low Power VLSI Design
Low Power Design of Standard Digital Gate Design Using Novel Sleep Transisto...
IRJET - Low Power Design for Fast Full Adder
Optimized Design of an Alu Block Using Power Gating Technique
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
SURVEY ON POWER OPTIMIZATION TECHNIQUES FOR LOW POWER VLSI CIRCUIT IN DEEP SU...
IRJET- Reduction of Power, Leakage and Area of a Standard Cell Asics using Th...
FORCED STACK SLEEP TRANSISTOR (FORTRAN): A NEW LEAKAGE CURRENT REDUCTION APPR...
A verilog based simulation methodology for estimating statistical test for th...
Adiabatic technique based low power synchronous counter design
IRJET- Fin FET Two Bit Comparator for Low Voltage, Low Power, High Speed and ...
Different Leakage Power Reduction Techniques in SRAM Circuits : A State-of-th...
Extremely Low Power FIR Filter for a Smart Dust Sensor Module
High speed low power viterbi decoder design for TCM decoders
Optimal Body Biasing Technique for CMOS Tapered Buffer
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
composite construction of structures.pdf
PPTX
Lecture Notes Electrical Wiring System Components
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
Well-logging-methods_new................
PPTX
Construction Project Organization Group 2.pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
web development for engineering and engineering
PDF
Digital Logic Computer Design lecture notes
PDF
PPT on Performance Review to get promotions
PPTX
Sustainable Sites - Green Building Construction
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Geodesy 1.pptx...............................................
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
composite construction of structures.pdf
Lecture Notes Electrical Wiring System Components
Embodied AI: Ushering in the Next Era of Intelligent Systems
UNIT-1 - COAL BASED THERMAL POWER PLANTS
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
CH1 Production IntroductoryConcepts.pptx
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Well-logging-methods_new................
Construction Project Organization Group 2.pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
web development for engineering and engineering
Digital Logic Computer Design lecture notes
PPT on Performance Review to get promotions
Sustainable Sites - Green Building Construction
Foundation to blockchain - A guide to Blockchain Tech
Geodesy 1.pptx...............................................
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...

IRJET- Proposing a RTD-Based Block for On-Chip GPU Caches to Reduce Static Power Leaks

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1021 Proposing a RTD-based block for on-chip GPU caches to reduce static power leaks Richa Sharma Netaji Subhas University of Technology, New Delhi, 110078 ----------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - GPUs dominate the industry in the parallel processing. The architecture of GPUs supports parallel algorithms and process images at much higher rates than CPUs. Each generation of GPUs increases the use of on-chip caches and number of processor cores embedded on-chip. Also, with each generation CMOS technology gets significantly worse in its leakage energy consumption. This paper addresses the issues such energy leaks can cause in the efficiency of GPU computing. Itfurtherdelvesinto Cache Power Management for GPUs. In this paper, the integrationofTunnelingdiodetechnology insidetheembedded architecture of GPUs is proposed with a RTD switching block. Furthermore, the benefits of proposed integrations are discussed. Key Words: GPU Caches, Cache Power management, StaticLeaks,ResonantTunnelingDiodes,NegativeDifferential Resistance, RTD switching block 1. INTRODUCTION Graphical Processing Units are used to process imagessincetheirinherentparallel structurebooststheperformanceofparallel processing algorithms. In recent years, GPUs are being utilized for many general purpose computing applications of cloud computing, machine learning, and other cost-effective High-Performance Computing(HPC) clusters.[1] In this era of green computing, there has been a shift of focus in producing more high performance- energy efficient results than just achieving highest peak performance. 
Achieving energy efficiency has become important in the design of all range of processors, such as battery-driven portable devices, desktop or server processors to supercomputers.[2] The efforts to achieve this dual goal of performance and energy efficiency, researchers have suggested various architectural modifications across different components of the processor, such as processor core, caches, DRAM (dynamic random access memory) etc. Techniques like power grating, DTCMOS have been applied to reduce static power leaks and cache-decay, drowsy cache has been employedto reduce dynamic power leaks.[3] However, researchersfacethechallengeofscalingthe devices withCMOStechnologysincethe leakage currents increase significantly with the decrease in the thickness of the gate oxide. The increase of subthreshold currents is explained by thermodynamics, more specifically by Boltzmanndistribution.[4]Theproblemofpowerconsumption with CMOS technology gets worse with each generation of processors. Theestimates ofInternational Technology Roadmapfor Semiconductors (ITRS) indicates that leakage power consumption could become a major threat for the survival of CMOS technology.[5] The competition in the development of GPUs and harnessing its computing power has driven companies to increase the number of processing cores on the processors and their number will continue to rise. Furthermore, to bridge the gap between the speed of the processor and main memory, thecachesoflargersizesarebeingembeddedonthechip.[6]Tables 1 and 2 list the cache memory requirements of current designs for different processors and fraction of energy consumed in cache power out of total power consumption. In this paper, I focus on static energy leaksofGPUsthatiscontributedbyexisting CMOS technology and propose to replace it by Tunneling Diode technology. Furthermore, I try to elaborate on the benefits of this upgrade in cache power management of GPUs. GPUs are yet to realize their full computing potential. 
Since energy management is one of the biggest challenges to overcome in the processor industry, our solution provides a new approach to this problem of excessive energy consumption in processor chips. Table-1: On-chip cache memory size in the latest generations Processors. Processor Type Cache Memory Used Desktop (CPU, GPU) 8 MB Server Cores 24-32 MB Mobile Processors 1 MB
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1022 Table-2: Power consumption of on-chip caches. 2. CACHE POWER MANAGEMENT IN GPUs The power consumption of CMOS circuits is mainly classified in two parts, namely dynamic power (also called active power) and leakage power (also called static power). The mathematical modelling equations for both dynamic and leakage Power are given as the following Here, dynamic and leakage power can also be written as active and static power respectively. Dynamic Power is dependent on activity factor(number of cache accesses or number of bits accessed per cache access), effective capacitance, frequency of the operation and voltage supplied. Static Power is dependent upon supply voltage and leakage currents, thus its consumption can be reduced by decreasing either of these factors. Modern processors incorporate multi-level cache hierarchy with L1, L2, L3 caches. L1 is classified as First Level Caches (FLCs) and L2, L3 etc. are Lower Level Caches(LLCs). First Level caches have higher dynamic power consumption rates and Lower Level Caches have higher leakage power consumption rates. Since the design of FLCs and LLCs incorporates latency of caches differently in each of them. Techniques developed to improve energy efficiency with Cache includesdevelopingahigh-impedancepath,sothatitcanturn“off”thecache line when they hold data not likely to be reused. The period of non-usage of caches is referred to as “dead time”. The technique used to predict “dead time” is cache decay which incorporates a counter to keep the count of pre-set cycles since its last usage. [7] 3. 
RESONANT TUNNELING DIODE AND INTEGRATION OF RTD SWTICHING BLOCK IN GPU 3.1 Resonant Tunneling Diode(RTD) Resonant Tunneling Diode(RTD), a variant of Tunneling Diode, invented by Esaki in 1957, is one of the most promising quantum effect devices that are operational at room temperature. These diodes produce enhanced quantum tunneling effect, which in turn produces very high-speed currents, which can be fine-tuned. Resonant Tunneling Diodes similar to CMOS transistor turns “on”, conducts a current under a specific external bias voltage range. The inherent difference between these two devices is that current in RTDs tunnelsthroughquasi-boundstateswithina double barrier structure, instead of going through a channel betweendrainandsource.InotherTunnelingDiodeslikeEsakiTunneling Diodes current tunnels through depletion region. Shown in the Fig. 1 is the IV characteristic curve of RTDs. The figure represents the dynamic voltage vs current behavior. The peak current occurs at the resonance at a specific voltage, then the device exhibits Negative Differential Resistance(NDR) as the current continues to decrease with the increasing voltage and as it reaches a minimum “valley” current at a specific voltage it starts to rise again. This minimum “valley” current canbeclassifiedasleakagecurrentis significantlylessthaninTDs. The bistability of RTDs gives them an advantage over TDs, since TD’sproduceveryhigh leakagecurrentswhenthereverse bias voltage is applied. Processor Name Percentage of Total Power Consumption Alpha 21264 16% StrongARM 30% Niagra,Niagra-2 L-2 Caches 24%
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1023 Figure-1: I-V characteristic of RTDs. Leakage currents are kept to a minimum due to the resonance and bistability of the device, therefore providing advantage over TDs. 3.2 Applications of Negative Differential Resistance(NDR) in High-Switching Blocks A huge advantage of RTDs is the ease of their integration with other technologies like complementary metal-oxide- semiconductor (CMOS) and high electron mobility transistors (HEMTs). We can utilize the Negative Differential Resistanceof the RTDs and create a switching block that will switch off and on at the application of specific voltages. This switching block gives two operational points to switch the device on and off as shown in Figure. 2 These RTDs, resistor and CMOS blocks can provide high speed switching method for cache lines. [8] This provides smooth integration that can work in tandem with cache decay technique. The counter will indicate when the cache line should be deactivated, sending the high voltage signal to RTD, RTD will enter NDR region and turn “off” Cache line, hence implementing cache decay. But due to this method, we could easily reactivate and switch back to low-impedance path way for cache line. Figure-2: NDR property of RTD allows two stable operation points for switching. The advantage of using NDR to turn on Cache lines lies in its low static power consumption since the leakage currents in RTDs are minimal and the switching speeds achieved are in picoseconds. 3.2 Execution of Cache Decay by utilizing Negative Differential Resistance(NDR) Cache line decay technique as proposed by Kaxiras Et al. [7] utilizes a pre-set counter to count a number of inactive cycles of cache line and cuts-off that particular line thus dramatically saving leakage power. 
In this implementation, the cut-off signal will be sent to the switching block one cycle earlier than proposed in the Kaxiras model. [6] The Kaxiras model uses a gated-Vdd CMOS transistor as the switch, as proposed by Powell et al., 2000. [8] The RTD switching block will provide a stable point at which to cut off the power supply to the cache line. Implementing the block as shown in Figure 3 will provide a switch that turns off at a certain input voltage and resumes operation according to the output voltage.
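The two stable operating points of an NDR switching block (Figure 2) can be illustrated with a load-line calculation. The sketch below assumes the simplest possible configuration, an RTD in series with a resistor and a supply, with a capacitance across the RTD for the stability criterion; the actual block in Figure 3 also contains a CMOS transistor, which is not modeled here. The I-V model is a generic empirical N-shaped curve, and all values (V_SUPPLY, R_LOAD, and the device parameters) are hypothetical.

```python
import math

# Hypothetical values, chosen only so the load line crosses all three branches.
I_PEAK, V_PEAK = 1e-3, 0.3       # peak current (A) and peak voltage (V)
I_SAT, V_TH = 1e-9, 0.08         # diode-like branch parameters
V_SUPPLY, R_LOAD = 2.0, 2000.0   # supply voltage (V) and series resistor (ohm)


def rtd_current(v):
    """Generic empirical N-shaped I-V curve (not a fitted RTD model)."""
    return (I_PEAK * (v / V_PEAK) * math.exp(1.0 - v / V_PEAK)
            + I_SAT * (math.exp(v / V_TH) - 1.0))


def mismatch(v):
    """Device current minus resistor current; zero at an operating point."""
    return rtd_current(v) - (V_SUPPLY - v) / R_LOAD


def bisect(lo, hi, tol=1e-9):
    """Refine a root of mismatch() bracketed by a sign change on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(lo) * mismatch(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def operating_points(v_max=1.2, steps=240):
    """Scan 0..v_max for load-line crossings and classify each one."""
    points = []
    grid = [v_max * k / steps for k in range(steps + 1)]
    for lo, hi in zip(grid, grid[1:]):
        if mismatch(lo) * mismatch(hi) < 0.0:
            v = bisect(lo, hi)
            slope = (rtd_current(v + 1e-6) - rtd_current(v - 1e-6)) / 2e-6
            # With a capacitance across the device, a crossing is stable
            # when dI/dV > -1/R; the crossing inside the NDR region is not.
            points.append((v, rtd_current(v), slope > -1.0 / R_LOAD))
    return points


for v, i, stable in operating_points():
    label = "stable" if stable else "unstable"
    print(f"V = {v:.3f} V, I = {i:.3e} A ({label})")
```

Under these assumptions the circuit can rest at either of the two outer crossings, while the crossing inside the NDR region is unstable, which is the bistable behavior the switching block relies on.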
The Negative Differential Resistance property of the RTD acts in the manner of a capacitance, and the switching speed of the block depends on the load voltage, shown in Figure 3 as Vout. Thus, Vout provides us with a probe to control the switching speed of the RTD.
Figure 3: (a) RTD switching block with CMOS transistor and Vout as a probe to control switching speed. (b) Demonstration of RTD block as capacitance and current source.
4. CONCLUSION
The integration of an RTD-based structure in SRAMs will be beneficial on many counts. The proposed use of NDR for the cache decay technique to reduce power will speed up the operation and therefore not compromise the performance of GPUs. The leakage current can be kept at very low levels, reducing leakage power substantially. Hence, I have proposed the integration of RTDs with CMOS to improve the power efficiency of caches in GPUs and discussed the benefits of utilizing Negative Differential Resistance (NDR) in reducing the total power consumption of GPUs.
REFERENCES
[1] OLCF, http://www.olcf.ornl.gov/titan/, 2012.
[2] S. Murugesan, “Harnessing green IT: Principles and practices,” IT Professional, vol. 10, no. 1, pp. 24–33, 2008.
[3] D. J. Frank, “The Limits of CMOS Scaling from a Power-Constrained Technology Optimization Perspective,” Nanohub, 4 Oct 06.
[4] “International technology roadmap for semiconductors (ITRS),” http://www.itrs.net/Links/2011ITRS/2011Chapters/2011ExecSum.pdf, 2011.
[5] Sparsh Mittal, “A Survey of Architectural Techniques For Improving Cache Power Efficiency.”
[6] Stefanos Kaxiras, “Cache-line Decay: A Mechanism to Reduce Cache Leakage Power.”
[7] J. L. Huber, J. Chen, “An RTD/Transistor Switching Block and Its Possible Application in Binary and Ternary Adders.”
[8] Powell, M. D. et al. 2000. Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep Submicron Cache Memories. In Proc. of the Int'l Symp. on Low Power Electronics and Design '00 (2000).
BIOGRAPHY
Richa Sharma received the Bachelor of Engineering degree from Netaji Subhas University of Technology, New Delhi, formerly Netaji Subhas Institute of Technology. Richa is working on technology-intensive areas of physics research and on creating innovation in educational methods for these subjects. Currently she works with the Society of Applied Microwave Electronics Engineering & Research.