International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 01 | Jan 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 259
CLOCK GATING OF STREAMING APPLICATIONS FOR POWER
MINIMIZATION ON FPGAs
M. Venkata Raghudeep1, K. Anji Babu2
1Student, Dept of ECE, A.K.R.G College of Engineering and Technology, JNTU Kakinada, AP, India
2Assistant Professor, Dept of ECE, A.K.R.G College of Engineering and Technology, JNTU Kakinada, AP, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This project investigates the reduction of dynamic
power in streaming applications implemented as asynchronous
dataflow designs by means of clock gating. Streaming
applications constitute a very broad class of computing
algorithms in areas such as signal processing, digital media
coding, cryptography, video analytics, network routing and
packet processing. This project introduces a set of
techniques that exploit the dynamic streaming behavior of
algorithms to save power by selectively switching off parts
of the circuit while they are temporarily inactive. Because
the techniques are independent of the application semantics,
they can be applied to any application and integrated into
the synthesis stage of a high-level dataflow design flow.
Experimental results for full-size applications synthesized
on field-programmable gate array (FPGA) platforms
demonstrate achievable power reductions with no loss in
data throughput.
Key Words: Clock gating, Streaming applications, Power
saving, Field programmable gate arrays, Data flow.
1. INTRODUCTION
In the past, the major concerns of the VLSI designer were
area, performance, cost and reliability; power considerations
were mostly of secondary importance. In recent years,
however, this has begun to change and, increasingly, power
is being given weight comparable to area and speed. Several
factors have contributed to this trend. Perhaps the primary
driving factor has been the remarkable success and growth
of personal computing devices (portable desktops,
audio- and video-based multimedia products) and
wireless communication systems, which demand high-speed
computation and complex functionality with low power
consumption.
1.1 Streaming Applications
Many applications in embedded systems perform the
same or similar operations on regular sequences of data;
such applications can be categorized as streaming
applications. For example, a smartphone runs many
streaming applications that are essential to the functionality
of the system, such as wireless communication, high-definition
video/audio codecs and 3D graphics rendering. A streaming
application usually contains a set of independent processing
stages. An H.264 video codec is a typical streaming
application: its stages form a pipeline that is used to
encode or decode video frames, and each frame in an H.264
stream can be an I-frame, a P-frame or a B-frame.
The stream-processing structure of these applications
creates many optimization opportunities, such as pipelined
parallel execution of different stages and exploitation of
data-level parallelism within certain stages. More importantly,
streaming applications are among the most performance-demanding
applications in embedded systems, and their rapid introduction
has become an important driving force for the development of
new embedded technologies. This project therefore focuses on
developing energy-efficient architectures and compilation
techniques for streaming applications.
1.2 Clock Gating
Clock gating is a popular technique used in many
synchronous circuits to reduce dynamic power
dissipation. It saves power by adding logic to a circuit
to prune the clock tree. Pruning the clock disables
portions of the circuitry so that their flip-flops do not
have to switch states, and switching states consumes power.
While the clock of a gated portion is not switching, its
switching power consumption drops to zero and only leakage
currents are incurred. In short, clock gating turns off the
clock of a particular block when it is not needed, and most
SoC designs today use it as an effective technique to save
dynamic power.
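For reference, the switching (dynamic) power of CMOS logic is commonly modeled by the relation below, where alpha is the activity factor, C_L the switched capacitance, V_DD the supply voltage and f the clock frequency. Gating the clock drives alpha toward zero for the gated flip-flops and their local clock net while leaving the leakage term untouched, which is why clock gating is described as a dynamic-power technique.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```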
The simplest and most common form of clock gating uses a
logical AND function to selectively disable the clock to
individual blocks by means of a control signal, as
illustrated in Fig. 1. Alternatively, during synthesis the
tools identify groups of flip-flops (FFs) that share a common
enable control signal and use it to selectively switch off
the clocks to those groups. Both clock-gating methods
eventually introduce physical gates in the clock paths that
control their downstream clocks. These gates can introduce
clock skew and lead to setup- and hold-time violations when
mapped into the SoC. However, this is compensated for by the
clock-tree synthesis and layout tools at various stages of
the SoC back-end flow.
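To make the AND-gate scheme of Fig. 1 concrete, the Python sketch below models a gated clock at the level of sampled waveforms and counts the clock edges that actually reach a gated register bank. It is an idealised behavioural model under stated assumptions (two samples per clock cycle, an enable that changes only at the falling edge, a hypothetical enable pattern), not a timing-accurate model of any FPGA or ASIC clock gate; real designs rely on latch-based gates or dedicated clock buffers to obtain the glitch-free behaviour that is simply assumed here.

```python
# Sketch: a combinational AND clock gate, gclk = clk AND en, sampled twice per
# clock cycle (low half, then high half). The enable pattern is hypothetical.

def gated_clock(clk, en):
    """Gate the clock sample by sample: gclk[i] = clk[i] AND en[i]."""
    return [c & e for c, e in zip(clk, en)]


def rising_edges(wave):
    """Count 0 -> 1 transitions, i.e. the clock edges that would trigger FFs."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


def simulate(enable_per_cycle):
    """Return (clk, gclk) for a per-cycle enable sequence.

    The enable changes only at the falling edge and is stable while the clock
    is high, so this idealised AND gate produces no glitches.
    """
    clk, en = [], []
    for e in enable_per_cycle:
        clk += [0, 1]
        en += [e, e]
    return clk, gated_clock(clk, en)


if __name__ == "__main__":
    # Block busy for 2 cycles, idle for 3, busy again for 3.
    clk, gclk = simulate([1, 1, 0, 0, 0, 1, 1, 1])
    print("edges on free-running clock:", rising_edges(clk))
    print("edges reaching gated FFs:   ", rising_edges(gclk))
```

Each suppressed edge is a cycle in which the gated flip-flops and their local clock net do not toggle, which is where the dynamic-power saving comes from.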
Fig -1: Block Diagram of Clock Gating
2. PROPOSED SYSTEM
Current FPGA families support different clock-gating (CG)
strategies, and each manufacturer provides its own IP for
managing them. The methodology described here is based on
primitives specific to Xilinx FPGA architectures; however,
it can be modified to support other FPGA vendors' primitives.
The execution of a dataflow program consists of a
sequence of action firings. These firings can be related
to each other in a graph-based representation using an
approach called execution trace graphing (ETG). The graph is
a directed acyclic graph in which each node represents an
action firing and each directed arc represents either a data
or a control dependency between two different action firings.
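As an illustration of how an execution trace graph exposes idle periods, the sketch below builds a tiny hypothetical ETG, checks that it is acyclic with Python's standard `graphlib` module, and assigns each firing its earliest step under a unit-latency, unlimited-resource assumption. The firing names and the scheduling model are assumptions for illustration only, not the paper's tool flow. An actor that has no firing in a given step is idle there, which is the kind of inactivity clock gating targets.

```python
from graphlib import TopologicalSorter

# Hypothetical ETG for a three-actor chain src -> flt -> snk, two firings each.
# Each firing maps to the set of firings it depends on (data or control arcs).
ETG = {
    "src#0": set(),
    "src#1": {"src#0"},           # control: firings of one actor are sequential
    "flt#0": {"src#0"},           # data: flt#0 consumes the token from src#0
    "flt#1": {"src#1", "flt#0"},
    "snk#0": {"flt#0"},
    "snk#1": {"flt#1", "snk#0"},
}


def earliest_steps(etg):
    """Earliest step of each firing, assuming unit latency and no resource limits.

    TopologicalSorter raises graphlib.CycleError during iteration if the trace
    is not acyclic.
    """
    step = {}
    for firing in TopologicalSorter(etg).static_order():
        preds = etg.get(firing, set())
        step[firing] = 1 + max((step[p] for p in preds), default=-1)
    return step


def active_actors_per_step(step):
    """Group firings by step and report which actors fire in each step."""
    active = {}
    for firing, s in step.items():
        active.setdefault(s, set()).add(firing.split("#")[0])
    return dict(sorted(active.items()))


if __name__ == "__main__":
    for s, actors in active_actors_per_step(earliest_steps(ETG)).items():
        print(s, sorted(actors))
```

In this toy trace, `src` is idle from step 2 onward and `snk` is idle in steps 0 and 1; those windows are candidates for gating.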
3. IMPLEMENTATION
Algorithm:
1) Step 1: Initialize en = 1. When F = 0 and AF = 0, go to
Step 2.
2) Step 2: The queue has empty space. Wait with en = 1
until AF = 1; when AF = 1, go to the next state.
3) Step 3: The CG circuit is now ready to set en = 0.
If AF = 0 again, go to Step 2.
Else if AF = 1 and F = 0, remain in the same state.
Else (F = 1 and AF = 1), go to the next state.
4) Step 4: The queue is now full, so disable the clock input
to the actor by driving en = 0.
5) Step 5: Wait until F = 0.
If AF = 1 and F = 1, remain in the same state.
Else (F = 0 and AF = 1), go to the next state.
6) Step 6: The queue is now almost full; set en = 1.
If F = 0 and AF = 0, the queue has empty space, so go to
Step 2.
Else if F = 1 and AF = 1, go to Step 5 again.
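The steps above are driven by two queue status flags. The following Python sketch models a bounded FIFO that produces them. The class name `FlagFifo` is hypothetical, and the sketch assumes AF means "at most one free slot" (so AF stays asserted while F is asserted), which is consistent with the F = 1, AF = 1 combinations used in Steps 3 to 6.

```python
from collections import deque


class FlagFifo:
    """Bounded FIFO exposing the F (full) and AF (almost full) status flags.

    Assumption: AF is asserted whenever at most one slot is free, so it stays
    high while the queue is full, matching the F=1/AF=1 pairs in the steps.
    """

    def __init__(self, depth):
        if depth < 2:
            raise ValueError("depth must be >= 2 so AF can precede F")
        self.depth = depth
        self._items = deque()

    @property
    def full(self):            # F
        return len(self._items) == self.depth

    @property
    def almost_full(self):     # AF
        return self.depth - len(self._items) <= 1

    def push(self, token):
        if self.full:
            raise OverflowError("write to a full FIFO")
        self._items.append(token)

    def pop(self):
        if not self._items:
            raise IndexError("read from an empty FIFO")
        return self._items.popleft()


if __name__ == "__main__":
    q = FlagFifo(depth=4)
    for t in range(4):
        q.push(t)
        print(f"after push {t}: F={int(q.full)} AF={int(q.almost_full)}")
```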
The controller is implemented as a finite state machine
(FSM) with a clock, a reset, an input F (full), an input AF
(almost full) and an output EN (enable). The AF input becomes
active high when only one space is left in its FIFO
queue. The FSM has five states, S = {INIT, SPACE,
AFULL_DISABLE, FULL, AFULL_ENABLE}. The controller starts
in the INIT state and keeps the EN output active high until
F and AF both become active low. EN also remains active high
during the SPACE state.
As the queue becomes almost full, the state changes to
AFULL_DISABLE, in which the EN output goes active low. This
is a conservative approach: EN is deasserted while one slot
is still free because the BUFGCE disables the output clock
on the high-to-low edge. The clock-enable signal entering the
BUFGCE should be synchronized to the input clock. Once the
queue becomes full, the controller keeps EN active low.
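A behavioural Python sketch of the five-state controller follows. It is a reconstruction from the algorithm steps and state list in this section, not the authors' HDL. Two details are assumptions: leaving FULL whenever F drops (even if AF drops in the same cycle), and staying in AFULL_ENABLE while AF = 1 and F = 0. Each call to `step()` stands for one clock edge; in hardware, EN would feed the clock-enable input of the BUFGCE.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    SPACE = "SPACE"
    AFULL_DISABLE = "AFULL_DISABLE"
    FULL = "FULL"
    AFULL_ENABLE = "AFULL_ENABLE"


# Moore output EN per state: the actor clock is enabled except while the
# controller is disabling it (almost full) or the queue is full.
EN_OF = {
    State.INIT: 1,
    State.SPACE: 1,
    State.AFULL_DISABLE: 0,
    State.FULL: 0,
    State.AFULL_ENABLE: 1,
}


class ClockEnableController:
    def __init__(self):
        self.reset()

    def reset(self):
        self.state = State.INIT

    @property
    def en(self):
        return EN_OF[self.state]

    def step(self, full, almost_full):
        """Advance one clock cycle given the F and AF inputs; return the new EN."""
        s = self.state
        if s is State.INIT:
            if not full and not almost_full:
                s = State.SPACE                 # Step 1 -> Step 2
        elif s is State.SPACE:
            if almost_full:
                s = State.AFULL_DISABLE         # Step 2 -> Step 3
        elif s is State.AFULL_DISABLE:
            if not almost_full:
                s = State.SPACE                 # space opened up again
            elif full:
                s = State.FULL                  # Step 3 -> Steps 4/5
        elif s is State.FULL:
            if not full:
                s = State.AFULL_ENABLE          # Step 5 -> Step 6
        elif s is State.AFULL_ENABLE:
            if not full and not almost_full:
                s = State.SPACE                 # Step 6 -> Step 2
            elif full:
                s = State.FULL                  # Step 6 -> Step 5
        self.state = s
        return self.en


if __name__ == "__main__":
    ctrl = ClockEnableController()
    for f, af in [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]:
        en = ctrl.step(bool(f), bool(af))
        print(f"F={f} AF={af} -> {ctrl.state.name:13s} EN={en}")
```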
Fig -2: State machine of the clock enabling controller.
The power reduction gain of the aforementioned
methodology is evaluated by applying it to a video decoder
design. A variety of RVC-CAL dataflow applications are
available; one of them is the intra MPEG-4 Simple Profile
(SP) decoder. Due to restrictions on the number of clock
buffers in Xilinx FPGAs, the selected design was refactored
into 32 actors. The resulting intra MPEG-4 SP description is
a 4:2:0 decoder partitioned into eight processing blocks.
Fig -3: Streaming application element (de-blocking filter)
with clock gating.
4. RESULTS
Table -1: Comparison of the proposed and extension methods

Method    | Slice LUTs | Slice Registers | LUT-FFs | Delay
Proposed  | 368        | 171             | 97      | 8.775 ns
Extension | 268        | 155             | 81      | 8.775 ns
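Table 1 reports resources and delay only, with no power figures. As a small worked check, the Python snippet below restates the two rows as relative reductions of the extension method with respect to the proposed one; the numbers are copied from the table and nothing else is assumed.

```python
# Figures copied from Table 1.
PROPOSED = {"Slice LUTs": 368, "Slice Registers": 171, "LUT-FFs": 97, "Delay (ns)": 8.775}
EXTENSION = {"Slice LUTs": 268, "Slice Registers": 155, "LUT-FFs": 81, "Delay (ns)": 8.775}


def percent_reduction(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    return 100.0 * (before - after) / before


REDUCTIONS = {m: round(percent_reduction(PROPOSED[m], EXTENSION[m]), 1) for m in PROPOSED}

if __name__ == "__main__":
    for metric, pct in REDUCTIONS.items():
        print(f"{metric}: {pct:.1f}% lower in the extension method")
```

This gives about 27.2% fewer slice LUTs, 9.4% fewer slice registers and 16.5% fewer LUT-FFs for the extension method, with an identical delay of 8.775 ns.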
5. CONCLUSION
This project presents a CG methodology for dataflow designs
that can be automatically included in the synthesis stage of
a high-level synthesis (HLS) design flow. The power-saving
technique is independent of the semantics of the application
and does not require any additional step or effort during the
design of the application at the dataflow program level. The
CG logic is generated during the synthesis stage, together
with the computational kernels that are connected via FIFO
queues to form the dataflow network. Conceivably, these
techniques could be extended to other dataflow methods of
computation. The experimental results are very encouraging:
power savings were achieved with a slight increase in control
logic and without any reduction in throughput. Unsurprisingly,
CG is attractive in situations where the design is not used
to its full capacity. In these circumstances, CG is a simple,
automatic and effective way to recover power otherwise lost in
idle cycles. As a result, the technique is particularly
interesting for applications with dynamically varying
performance requirements, when designing to a particular
performance point is impossible, and when power consumption
is deemed costly. Further investigations into CG should
consider more aggressive control logic, in which control is
given to each individual actor, allowing greater flexibility
in exploiting actor inactivity.
Furthermore, it will be necessary to develop tools
that partition complex applications onto the limited
number of clock domains for more efficient
implementations. Lastly, additional considerations could
be given to controlling clock speed and, possibly, voltage
transitions.
