Green scheduling
Vincenzo De Maio

Outline
- Introduction
- Theoretical model
  - Computation model
  - Energy consumption model
  - Throttling model
- Simulator
- Green heuristics
- Results and future work
- References

What is green computing?
“The study and practice of designing, manufacturing, using, and disposing of computers, servers, and associated subsystems such as monitors, printers, storage devices, and networking and communications systems efficiently and effectively with minimal or no impact on the environment.” [1]
Professor Dr San Murugesan, Faculty of Management, Multimedia University, Cyberjaya, Malaysia

Why does green computing matter?
Some numbers:
- 2 Google searches = 14 g of CO2 (about as much as boiling a kettle!) (Alex Wissner-Gross, Harvard University physicist) [2][3]
- Windows 7 + Microsoft Office 2007 requires 70 times more RAM than Windows 98 + Office 2000 to write exactly the same text or send the same email [4]
- In 2010, servers were responsible for 2.5% of the total energy consumption of the USA. A further 2.5% was used for their cooling. [5]
- It was estimated that by 2020, servers would use more of the world's energy than air travel if current trends continued [5]

Further references
- Green500 (www.green500.com)
- GreenIT (www.greenit.fr)
- CO2Stats (www.co2stats.com)

Why green scheduling?
A green scheduler could provide:
- Energy-oriented task assignment
- Setting the correct power level for the current workload
- Improved use of power management
- Learning the power usage profile of job types
It could be part of the operating system's power management.

What do we want from a green scheduler?
- Efficiency
- Simplicity
- Time is money!

Computation model
- Tasks usually depend on each other
- DAGs: Directed Acyclic Graphs
- If there is a dependency between tasks u and v, we put an arc between nodes u and v

Computation model
SP-DAGs: series-parallel DAGs
- A DAG with 2 terminals (source and target) and an arc between them is an SP-DAG
- Larger SP-DAGs are made by parallel and series composition of other SP-DAGs

Why SP-DAGs?
- They describe several significant classes of computation (for instance, divide-and-conquer algorithms)
- They are the natural abstraction for several parallel programming languages (such as Cilk) [10]
- We can recognize whether a DAG is an SP-DAG in linear time
- We can easily transform an arbitrary DAG into an SP-DAG in linear time, using SP-ization

LEGO® DAGs
- "Assessing the computational benefits of AREA-Oriented DAG-Scheduling" (Gennaro Cordasco, Rosario De Chiara, Arnold L. Rosenberg), 2009
- SP-DAGs made from a repertoire of Connected Bipartite Building Block DAGs representing the various subcomputations

Further definitions on DAGs and SP-DAGs
A node v in the DAG can be:
- Ineligible: v has at least one non-executed parent
- Eligible: all of v's parents have been executed
- Assigned/executed: v has been scheduled for execution or executed
Schedule: a topological sort of the DAG, obtained by a rule for selecting which eligible node to execute at each step of the computation.

Critical path
- The longest path from the source to the sink
- Why is it so important?
- Clearly, we cannot finish our computation before executing every node on the critical path
- So the time needed to execute the critical path is a trivial lower bound on the makespan

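The critical path can be computed with a memoized depth-first traversal. The sketch below is only an illustration: the array-based DAG representation (`length`, `children`) and the class name `CriticalPathSketch` are assumptions and are not taken from the simulator. It returns, for every node, the longest weighted distance to the sink; the value at the source is the critical path length.

```java
import java.util.Arrays;

final class CriticalPathSketch {
    /** Longest weighted distance from each node to the sink, the node's own length included. */
    static double[] distanceToSink(double[] length, int[][] children) {
        double[] dist = new double[length.length];
        Arrays.fill(dist, -1.0); // -1 marks "not computed yet" (lengths are assumed non-negative)
        for (int v = 0; v < length.length; v++) {
            visit(v, length, children, dist);
        }
        return dist;
    }

    private static double visit(int v, double[] length, int[][] children, double[] dist) {
        if (dist[v] >= 0.0) {
            return dist[v]; // already computed
        }
        double longestBelow = 0.0;
        for (int child : children[v]) {
            longestBelow = Math.max(longestBelow, visit(child, length, children, dist));
        }
        dist[v] = length[v] + longestBelow;
        return dist[v];
    }

    public static void main(String[] args) {
        // Hypothetical diamond DAG: 0 -> {1, 2} -> 3
        double[] length = {1.0, 2.0, 3.0, 1.0};
        int[][] children = {{1, 2}, {3}, {3}, {}};
        double[] dist = distanceToSink(length, children);
        System.out.println("Critical path length: " + dist[0]);
    }
}
```

For the diamond in `main`, the longest path is 0, 2, 3, so no schedule can finish in less than 1 + 3 + 1 = 5 time units, however many cores are available.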
Further definitions on DAGs and SP-DAGs
- Yield of a node: the number of nodes that become eligible when the given node completes its execution
- $E_\Sigma(i)$: the number of eligible nodes at step $i$ of schedule $\Sigma$
- $\mathrm{AREA}(\Sigma) \triangleq \sum_{i=0}^{n} E_\Sigma(i)$

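As an illustration of the AREA definition, the following sketch counts the eligible nodes after each step of a given schedule and sums them. The DAG encoding (`children`) and the class name `AreaSketch` are assumptions made for the example; the schedule is assumed to list every node exactly once in a valid topological order.

```java
final class AreaSketch {
    /** Sums the number of eligible nodes over steps 0..n of the given schedule. */
    static int area(int[][] children, int[] schedule) {
        int n = children.length;
        int[] pendingParents = new int[n];
        for (int[] cs : children) {
            for (int c : cs) {
                pendingParents[c]++;
            }
        }
        int eligible = 0;
        for (int p : pendingParents) {
            if (p == 0) {
                eligible++;
            }
        }
        int area = eligible; // E(0): eligible nodes before anything is executed
        for (int v : schedule) {
            if (pendingParents[v] != 0) {
                throw new IllegalArgumentException("node " + v + " is not eligible");
            }
            pendingParents[v] = -1; // mark as executed
            eligible--;
            for (int c : children[v]) {
                if (--pendingParents[c] == 0) {
                    eligible++; // c contributes to the yield of v
                }
            }
            area += eligible; // E(i) after the i-th execution
        }
        return area;
    }

    public static void main(String[] args) {
        // Hypothetical DAG: 0 -> {1, 2}, 2 -> {3, 4}
        int[][] children = {{1, 2}, {}, {3, 4}, {}, {}};
        System.out.println(area(children, new int[]{0, 1, 2, 3, 4}));
        System.out.println(area(children, new int[]{0, 2, 1, 3, 4}));
    }
}
```

In this example, executing node 2 (yield 2) before node 1 (yield 0) keeps more nodes eligible at each step, so the second schedule has the larger AREA (9 against 7).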
Energy consumption model
- We need a realistic model for energy consumption
- We should look at:
  - Circuit dissipation
  - Throttling models

Energy consumption model
- CMOS circuit dissipation: $P = C V^2 f + I_{mean} V + V I_{leakage}$ (we won't consider short-circuit power and leakage)
- We assume a linear relationship between voltage and frequency: $f = kV$

Energy consumption model
Our model: $E = C \times T \times f^3$
Where:
- $T = \frac{\text{clock cycles}}{f}$
- $f$ = clock cycles per second
- $C$ encloses several constants, like capacitance, $k$ and the clock multiplier

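The step from the CMOS dissipation formula to the cubic model can be spelled out as follows. This is a sketch under stated assumptions: $C_{cap}$ is a symbol introduced here for the switched capacitance, $N$ for the number of clock cycles of a task, and the clock multiplier is ignored for simplicity.

```latex
\begin{align*}
P &= C_{cap} V^2 f
  && \text{(dynamic power only, short-circuit and leakage terms dropped)} \\
V &= \frac{f}{k}
  && \text{(from } f = kV \text{)} \\
P &= \frac{C_{cap}}{k^2} f^3 = C f^3
  && \text{(with } C = C_{cap}/k^2 \text{)} \\
E &= P \cdot T = C \times T \times f^3 \\
E &= C \cdot N \cdot f^2
  && \text{(substituting } T = N/f \text{)}
\end{align*}
```

Under these assumptions, running the same task at half the frequency doubles its duration but reduces its energy to one quarter, which is what makes throttling attractive for tasks off the critical path.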
CPU throttling models
- Which throttling model is commonly used by modern processors?
- ACPI: Advanced Configuration and Power Interface [6]
- A fully platform-independent standard that provides:
  - Monitoring
  - Configuration
  - Hardware discovery
  - Power management
- Defines power states for every device

Performance vs power states
Power states:
- C0: Operational power state
- C1: Halt state
- C2: Stop-clock
- C3: Sleep
Performance states:
- P0: Highest state
- P1: Less than P0, frequency/voltage scaled
- Pn: Less than Pn-1, frequency/voltage scaled
In our model, we implement only the C0 power state and the P0, P1, P2 performance states.

Our throttling model
We use a DFS (Dynamic Frequency Scaling) model, assuming that scaling doesn't add energy overhead:
- P0: 1.0 * f
- P1: 0.7 * f
- P2: 0.5 * f

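To see what the three performance states mean in terms of the model $E = C \times T \times f^3$ with $T$ = clock cycles / $f$, the sketch below evaluates duration and energy of one task at each level. The class name, method names and the normalized values (C = 1, f = 1, 1 clock cycle) are assumptions chosen for illustration only.

```java
final class ThrottlingEnergySketch {
    /** Frequency factors of the P0, P1 and P2 performance states. */
    static final double[] LEVEL_FACTORS = {1.0, 0.7, 0.5};

    /** T = clock cycles / f */
    static double duration(double cycles, double frequency) {
        return cycles / frequency;
    }

    /** E = C * T * f^3 */
    static double energy(double c, double cycles, double frequency) {
        return c * duration(cycles, frequency) * Math.pow(frequency, 3);
    }

    public static void main(String[] args) {
        double maxFrequency = 1.0; // normalized, hypothetical
        double cycles = 1.0;       // normalized task size, hypothetical
        double c = 1.0;            // normalized hardware constant, hypothetical
        for (int p = 0; p < LEVEL_FACTORS.length; p++) {
            double f = LEVEL_FACTORS[p] * maxFrequency;
            System.out.printf("P%d: time=%.3f energy=%.3f%n",
                    p, duration(cycles, f), energy(c, cycles, f));
        }
    }
}
```

With these normalized values, P1 takes about 1.43 times as long as P0 for 0.49 of the energy, and P2 takes twice as long for 0.25 of the energy.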
Further considerations
- In our model, an idle core consumes no energy
- We do not track the energy used to run the scheduling algorithm itself
- We do not track the energy dissipated by memory usage
- Energy is unbounded
- We assume that throttling can be set for each core individually

The simulator
We implemented this model in a DAG-scheduling simulator:
- Providing classes and methods to calculate energy consumption
- Implementing the energy model we discussed earlier
- Paying attention to extensibility

A typical simulation
- Loads a DAG
- Computes the graph's critical path
- Initializes the schedulers that need to be tested
- Executes the schedulers on the given graph for a given number of trials (usually 100, because randomness influences the schedulers)
- At the end of the iterations, it collects statistics about the executions, specifically:
  - Makespan (min, max, average)
  - Average energy consumption
- Repeats for each DAG

How we implemented the model
Our focus: extensibility
- We wanted our simulator to support multiple kinds of models
- Providing:
  - Core abstraction
  - Throttling level abstraction
  - Energy-aware scheduler abstraction, totally decoupled from core and throttling level
- Making it easier to add:
  - Different scheduling algorithms
  - Different core types
  - Different energy models

Core abstraction
A core can:
- Execute tasks
- Set its own throttling level
- Track its power consumption
Problem: different cores could implement different throttling strategies
Solution:
- Every core has its own array of throttling levels
- The throttling level is a nested class in the core implementation

Throttling level abstraction
A throttling level contains:
- Information about frequency and consumption
- Methods to calculate:
  - The due date of a task at a given level (the lower the level, the slower the task execution)
  - The power consumption at a given level

Energy package
- Core interface: we assume that every core can execute tasks and set its own throttling
- Abstract class ThrottlingLevel: implements a throttling level, with energy consumption info and frequency
- Class DummyCore: base Core implementation
- Class DefaultThrottlingLevel: DummyCore nested class, implements our performance states

Core interface

    /**
     * Executes a task on this core
     * @param node The node that models the task
     * @param length Task length if executed at max power
     * @return the real task length (this could differ from the input
     *         if the Core is set to a different throttling level)
     */
    public double executeTask(ICONode node, double length);

    /**
     * Sets the core's power consumption to the idle consumption
     * of its current throttling level
     */
    public void setIdle();

    /**
     * Sets the core to a greater power level
     */
    public void increaseThrottlingLevel();

    /**
     * Sets the core to a lesser power level
     */
    public void decreaseThrottlingLevel();

ThrottlingLevel

    /**
     * This method calculates the power consumption for a
     * given task length, according to the power consumption unit
     * and any other parameters chosen by the programmer who
     * implements it.
     *
     * @param length The task length
     * @return Power consumption for this task
     */
    abstract double getPowerConsumptionPerTask(double length);

    /**
     * This method calculates how the task length is modified
     * for the given throttling level
     *
     * @param length ideal length of the task
     * @return the real task length for the given throttling level
     */
    abstract double getRealLength(double length);

Throttling level initialization

    // Requires: import java.util.Arrays;
    public void initializeThrottlingLevels(double hardwareConstant, double maxFreq,
            double maxVoltage, int throttlingLevels) {
        this.levels = new ThrottlingLevel[throttlingLevels];
        // Scaled levels: fractions 1/2, 2/3, 3/4, ... of the maximum frequency and voltage
        for (int i = 0; i < throttlingLevels - 1; i++) {
            double fraction = (i + 1.0) / (i + 2.0);
            levels[i] = new DefaultThrottlingLevel("LEVEL" + i, hardwareConstant,
                    fraction * maxFreq, fraction * maxVoltage);
        }
        // The last level runs at full frequency and voltage
        this.levels[throttlingLevels - 1] = new DefaultThrottlingLevel(
                "LEVEL" + (throttlingLevels - 1), hardwareConstant, maxFreq, maxVoltage);
        // Necessary for correct use of increase and decrease
        // (Arrays.sort needs ThrottlingLevel to be Comparable)
        Arrays.sort(levels);
        // By default we set the maximum power level: the last one,
        // assuming levels sort in ascending order (index 2 when there are 3 levels)
        this.throttlingLevelIndex = throttlingLevels - 1;
        this.currentThrottlingLevel = levels[this.throttlingLevelIndex];
        this.dissipatedPower = 0.0;
    }

Energy-aware scheduler abstraction
An energy-aware scheduler has to:
- Work with different types of cores
- Track the makespan and the energy consumption
- Implement logic for:
  - Core selection
  - Eligible node selection
  - Choosing the right throttling level

Energy-aware scheduler package
- CoreSelector: implements the free-core selection strategy (in these tests we use the DefaultCoreSelector class)
- EnergyAwareScheduler: base class for every scheduler that tracks energy consumption

Inspecting the EnergyAwareScheduler class

    /**
     * Instantiates a new EnergyAwareScheduler
     * @param numCores number of cores
     * @param coreClass class that models the desired core type
     * @throws InstantiationException
     * @throws IllegalAccessException
     * @throws IllegalArgumentException if numCores <= 0
     */
    public EnergyAwareScheduler(int numCores, Class<? extends Core> coreClass)
            throws InstantiationException, IllegalAccessException, IllegalArgumentException

    /**
     * Calculates the task length on a given core
     * @param coreIndex index of the core in the corePool
     * @param eventLength ideal length of the task
     * @param node node to be executed
     * @return the task length if executed on the core at coreIndex
     */
    protected double getTimeOffsetForCore(int coreIndex, double eventLength, ICONode node)

Inspecting the EnergyAwareScheduler class

    /**
     * Sets the throttling for a core that is going to execute a task in this step
     * @param coreIndex the core id
     */
    protected void setBusyThrottling(int coreIndex)

    /**
     * Sets the throttling state for cores that will remain idle
     */
    protected void setIdleThrottling()

    public double getTotalPowerConsumption()

    private void calculateIdleConsumptions()

What about scheduling?
- Schedule steps are implemented using the TimeLine object
  - A priority queue containing two types of TimeEvent:
    - processorsArrives
    - clientFinishes
- At each scheduling step, the scheduler removes the first event from the TimeLine
- The scheduling logic is implemented in the runBatchedMakespan method
- Further initialization is done in the initBatchedMakespan method

runBatchedMakespan method

    while (executedNode != target)
        event := timeline.pollNextEvent()
        setOverallThrottlingLevel()
        switch (event)
            case processorsArrives:
                ne := min(availableCores, elegibleNodesNum)
                for i := 0 to ne - 1
                    nextNode := getNextElegibleNode()
                    coreIndex := coreSelector.getCoreIndex()
                    corePool[coreIndex].setBusy()
                    setBusyThrottling(coreIndex)
                    timeOffset := getTimeOffsetForCore(coreIndex, eventLength, nextNode)
                    timeline.add(new TimeEvent(event.getTime() + timeOffset, clientFinishes, nextNode))
            case clientFinishes:
                executedNode := event.getNode()
                execute(executedNode)
                corePool[event.getOwnerCore()].setFree()

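The event-driven idea behind this loop can be illustrated with a much smaller, self-contained sketch. It is not the simulator's code: all names are hypothetical, throttling is left out, every core runs at full speed, there is no separate processorsArrives event (free cores are reused immediately), and eligible nodes are taken in FIFO order.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.PriorityQueue;

final class EventLoopSketch {
    /** A "client finishes" event: node completes at the given time. */
    static final class FinishEvent {
        final double time;
        final int node;

        FinishEvent(double time, int node) {
            this.time = time;
            this.node = node;
        }
    }

    /** Makespan of a greedy schedule; assumes a valid DAG and numCores >= 1. */
    static double makespan(double[] length, int[][] children, int numCores) {
        int n = length.length;
        int[] pendingParents = new int[n];
        for (int[] cs : children) {
            for (int c : cs) {
                pendingParents[c]++;
            }
        }
        Deque<Integer> eligible = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (pendingParents[v] == 0) {
                eligible.add(v);
            }
        }
        // The timeline orders events by completion time
        PriorityQueue<FinishEvent> timeline =
                new PriorityQueue<>(Comparator.comparingDouble((FinishEvent e) -> e.time));
        int freeCores = numCores;
        double now = 0.0;
        int executed = 0;
        while (executed < n) {
            // Start as many eligible nodes as there are free cores
            int ne = Math.min(freeCores, eligible.size());
            for (int i = 0; i < ne; i++) {
                int node = eligible.poll();
                freeCores--;
                timeline.add(new FinishEvent(now + length[node], node));
            }
            // Advance to the next completion and release its core
            FinishEvent event = timeline.poll();
            now = event.time;
            freeCores++;
            executed++;
            for (int child : children[event.node]) {
                if (--pendingParents[child] == 0) {
                    eligible.add(child);
                }
            }
        }
        return now;
    }
}
```

On a diamond DAG with task lengths 1, 2, 3, 1, this sketch gives a makespan of 5 with two cores and 7 with a single core.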
Default strategies
- getNextElegibleCore() is abstract (every core has to implement it)
- setBusyThrottling(coreIndex) by default sets the maximum throttling level, as does setOverallThrottlingLevel()
- Further initialization is done in the initBatchedMakespan method

What about core selection?
- Core selection is implemented as a separate class implementing the CoreSelector interface
- CoreSelector provides the getCoreIndex method
- In our simulation we use only the DefaultCoreSelector, which simply takes the free core with the highest frequency

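The default strategy described above (take the free core with the highest frequency) could look roughly like the following. This is a hypothetical stand-alone sketch: the real CoreSelector interface works on the simulator's core objects, whereas the arrays and the method name `selectCore` here are assumptions for illustration.

```java
final class CoreSelectorSketch {
    /**
     * Returns the index of the free core with the highest frequency,
     * or -1 if every core is busy.
     */
    static int selectCore(double[] frequency, boolean[] busy) {
        int best = -1;
        for (int i = 0; i < frequency.length; i++) {
            if (!busy[i] && (best < 0 || frequency[i] > frequency[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] frequency = {1.0, 2.0, 3.0};
        boolean[] busy = {false, false, true};
        // Core 2 is the fastest but busy, so core 1 is selected
        System.out.println(selectCore(frequency, busy));
    }
}
```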
Green heuristics
- CPScheduler
- AOSPDScheduler
- TFIHeuristicScheduler
- MarathonHeuristic
Every heuristic has been implemented as an EnergyAwareScheduler subclass.

Critical path based scheduling
- Computes the graph's critical path
- Selects the free core with the highest energy
- Sets the core to maximum power
- Selects the node with the maximum distance from the sink
To implement this scheduler, only the method getNextElegibleCore() has been overridden.

AOSPD scheduling
- "On scheduling DAGs to maximize AREA" (Gennaro Cordasco, Arnold L. Rosenberg)
- An idea from the Internet computing scenario
  - It is practically impossible to determine when new processors become available for task execution
- So, what can we do? Solutions:
  - Maximize the AREA at each execution step: great, but not always possible [7]
  - Maximize the average AREA over the execution steps: good, and always possible

More on AOSPD scheduling
- At step 1, we have to choose B or C for execution
- To maximize AREA at step 1, we choose C
- What happens in step 2?
  - Choosing among the eligible nodes in step 2, we can't maximize AREA
  - To maximize AREA in step 2 we should have chosen B, which was not AREA-maximizing for step 1

Adding energy tracking to AOSPD scheduling
- We already had this algorithm implemented, without energy tracking
- How to plug AOSPD in?
- Solution:
  - Extending EnergyAwareScheduler
  - Refactoring the class so that we have the getNextElegibleNode() method

TFI heuristic
- The idea: if we have to wait for a task that requires much more time than the others, we could slow down the faster ones to save energy
- TFI: max due date for critical path value i

TFI heuristic
- Computes the graph's critical path
- Selects the free core with the highest frequency
- Sorts the eligible nodes by their critical path value and yield
- Finds the maximum due date

    TFINode := node with maximum critical path value and due date
    TFI := maximum task length
    ne := min(coreAvailable, elegibleNodesNum)
    for i := 1 to ne
        Node := elegibleNodes[i]
        if Node == TFINode
            execute Node at max power
        else if (elegibleNodes.size() < numCores)
            execute Node at the minimum throttling level that keeps its length less than TFI
        else
            execute Node at the default throttling level

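The central step of the heuristic, picking the minimum throttling level that still keeps a task shorter than TFI, can be sketched as below. The names are hypothetical, and the sketch assumes that a task of ideal length L takes L / factor at a level running at `factor` times the maximum frequency, which follows from T = clock cycles / f.

```java
final class TfiLevelSketch {
    /**
     * Returns the lowest frequency factor whose resulting task length stays
     * strictly below tfi; falls back to the highest factor if none does.
     * The factors must be sorted in ascending order.
     */
    static double slowestFactorWithin(double idealLength, double tfi, double[] factorsAscending) {
        for (double factor : factorsAscending) {
            if (idealLength / factor < tfi) {
                return factor;
            }
        }
        return factorsAscending[factorsAscending.length - 1];
    }

    public static void main(String[] args) {
        double[] factors = {0.5, 0.7, 1.0}; // P2, P1, P0
        double tfi = 1.8;                   // hypothetical longest task length
        System.out.println(slowestFactorWithin(1.0, tfi, factors));
        System.out.println(slowestFactorWithin(0.5, tfi, factors));
    }
}
```

With TFI = 1.8, a task of ideal length 1.0 would take 2.0 at P2, which is too long, so it runs at P1 (about 1.43); a task of length 0.5 can run at P2 and still finish after 1.0.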
Marathon heuristic
The idea: our problem resembles a marathon…
- We have to come first…
- … and possibly alive (with enough energy to get back home)
- By being lazier we'll save more energy
How should we run a marathon? According to my uncle:
- It's better to keep a steady average pace than to squander energy running faster for a short stretch
- When you can't overtake (the road is too narrow or you're too tired), it's better to slow down a little and wait for better conditions

Marathon heuristic
- Computes the graph's critical path
- Selects the free core with the highest frequency
- Sorts the eligible nodes by their critical path value and yield

    ne := min(availableCores, elegibleNodesNum)
    front := sum of the yields of the first ne nodes
    for i := 1 to ne
        Node := elegibleNodes[i]
        if front + n <= numCores - (numCores / DELTA)
            execute Node at minimum power
        else
            execute Node at average power

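A possible reading of the power-level test is sketched below. Several points are assumptions rather than facts from the slides: `n` is interpreted as `ne` (the number of nodes started in this step), the division by DELTA is done in floating point, "minimum power" is mapped to the 0.5 factor (P2) and "average power" to the 0.7 factor (P1). All names are hypothetical.

```java
final class MarathonSketch {
    static final double MINIMUM_POWER_FACTOR = 0.5; // assumed to be P2
    static final double AVERAGE_POWER_FACTOR = 0.7; // assumed to be P1

    /**
     * Chooses the frequency factor for the nodes started in this step.
     * sortedYields holds the yields of the eligible nodes, already sorted
     * by critical path value and yield.
     */
    static double chooseFactor(int[] sortedYields, int availableCores, int numCores, double delta) {
        int ne = Math.min(availableCores, sortedYields.length);
        int front = 0;
        for (int i = 0; i < ne; i++) {
            front += sortedYields[i]; // nodes that become eligible next
        }
        // If the upcoming front leaves enough spare cores, run lazily
        if (front + ne <= numCores - (numCores / delta)) {
            return MINIMUM_POWER_FACTOR;
        }
        return AVERAGE_POWER_FACTOR;
    }

    public static void main(String[] args) {
        // 16 cores and DELTA = 4 give a threshold of 12
        System.out.println(chooseFactor(new int[]{2, 1, 1, 3}, 3, 16, 4.0));
        System.out.println(chooseFactor(new int[]{6, 5, 4}, 3, 16, 4.0));
    }
}
```

In the first call the front is 4 and three nodes are started, 7 in total, which is under the threshold of 12, so the nodes run at minimum power; in the second call the total is 18, so they run at average power.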
Assessing results
- Remember “time is money”?
- Solution: $ET^2$
  - Remember area-time complexity in VLSI design? [8][9]
  - We use energy-time complexity to plot our schedulers' performance
  - The lower the $ET^2$ score, the better the scheduler

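The score is simple to compute once energy and makespan are known. The sketch below uses made-up numbers for two hypothetical schedulers, purely to show how squaring the time weights the makespan more heavily than the energy.

```java
final class Et2Sketch {
    /** ET^2 score: energy multiplied by the squared makespan (lower is better). */
    static double et2(double energy, double makespan) {
        return energy * makespan * makespan;
    }

    public static void main(String[] args) {
        // Hypothetical results, not measurements from the simulator
        double fastButHungry = et2(10.0, 5.0);
        double slowButFrugal = et2(6.0, 6.0);
        System.out.println("fast: " + fastButHungry + ", frugal: " + slowButFrugal);
    }
}
```

Here the frugal scheduler scores 216 against 250, so a 40% energy saving outweighs a 20% longer makespan; a saving of only 25% (energy 7.5, score 270) would not.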
Tests
Test parameters:
- Number of cores: 4, 8, 16
- Standard deviation: 1, 2, 4, 8
The standard deviation influences the task due dates, which are generated from a Gaussian distribution with mean 1.0 and a standard deviation taken from the given set.

[Result plots, one per configuration: 4 cores (stdev = 1, 2, 4, 8); 8 cores (stdev = 1, 2, 4, 8); 16 cores (stdev = 1, 2, 8)]

Conclusions
- We can't obtain a makespan better than that of critical path scheduling
- AREA and yield considerations don't seem to add much in terms of energy savings
  - At least in a multicore scenario…
  - We should probably focus only on the critical path
- Task due dates don't seem to influence the makespan too much

Future work
- Tracking scheduler efficiency
- Adding a model for idle cores' consumption
- Considering a “finite energy” model
- Extending it to a volunteer computing scenario
- We could consider a scenario with many cores on different dies
  - Adding an extra cost to switch them on
  - Adding thermal parameters

References
[1] San Murugesan, "Harnessing Green IT: Principles and Practices", 2009.
[2] "Research reveals environmental impact of Google searches". Fox News. 2009-01-12. http://www.foxnews.com/story/0,2933,479127,00.html. Retrieved 2009-01-15.
[3] "Powering a Google search". Official Google Blog. Google. http://googleblog.blogspot.com/2009/01/powering-google-search.html. Retrieved 2009-10-01.
[4] "Office suite require 70 times more memory than 10 years ago". GreenIT.fr. 2010-05-24. http://www.greenit.fr/article/logiciels/logiciel-la-cle-de-l-obsolescence-programmee-du-materiel-informatique-2748. Retrieved 2010-05-24.
[5] "ARM chief calls for low-drain wireless". The Inquirer. 29 June 2010. http://www.theinquirer.net/inquirer/news/1719749/arm-chief-calls-low-drain-wireless. Retrieved 30 June 2010.
[6] Advanced Configuration and Power Interface Specification, 2010 (www.acpi.info).
[7] G. Malewicz, A. L. Rosenberg, M. Yurkewych, "Toward a theory for scheduling dags in internet-based computing", 2006.
[8] Richard J. Lipton, Robert Sedgewick, "Lower bounds for VLSI", 1981.
[9] C. D. Thompson, "Area-time complexity for VLSI", 1979.
[10] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, Y. Zhou, "Cilk: an efficient multithreaded runtime system", 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '95).

That’s all, folks!
Thanks for your attention!

Green scheduling

  • 2. OutlineIntroductionTheoretical ModelComputation modelEnergy consumption modelThrottling modelSimulatorGreen HeuristicsResults and future worksReferences
  • 3. What is green computing?“The study and practice of designing, manufacturing, using, anddisposing of computers, servers, and associated subsystems suchas monitors, printers, storage devices, and networking andcommunications systems efficiently and effectively with minimal orno impact on the environment.”[1]Professor Dr San MurugesanFaculty of ManagementMultimedia UniversityCyberjaya, Malaysia,
  • 4. Why does green computing matters?Some numbers:2 google searches = 14CO2 grams (as boiling a kettle!) (Alex Wissner-Gross, Harvard University physicist) [2][3]Windows 7 + Microsoft office 2007 requires 70 times more RAM than Windows 98 + Office 2000 to write exactly the same text or send the same email[4]In 2010, servers were responsible of the 2.5% of the total energy consumption of the USA. A Further 2.5% were used for their cooling.[5]It was estimated that by 2020, servers would use more of the world's energy than air travel if current trends continued[5]
  • 5. Further references Green500 (www.green500.com)GreenIT (www.greenit.fr) CO2Stats (www.co2stats.com)
  • 6. Why green scheduling?A green scheduler could provideEnergy-oriented task assignmentSetting the correct power level for current workloadImproved use of the power management Learning power usage profile of job typesCould be a part of the Operating System power management
  • 7. What do we want from a green scheduler?EfficiencySimplicityTime is money! 
  • 8. OutlineIntroductionTheoretical ModelComputation modelEnergy consumption modelThrottling modelSimulatorGreen HeuristicsResults and future worksReferences
  • 9. Computation modelTasks usually depends on each otherDAGs: Directed Acyclic GraphsIf there’s a dependency between task u and v, we put an arc between nodes u and v
  • 10. Computation modelSP-DAGs: Serial parallel DAGsA DAG with 2 terminals (source and target) and an arc between them is a SP-DAGMade by parallel and series composition of other SP-DAGs
  • 11. Why SP-DAGs?They describe several significant class of computation (for instance divide and conquer algorithms)They are the natural abstraction for several parallel programming languages (such as CILK) [10]We can recognize if a DAG is an SP-DAG in linear timeWe can easily transform an arbitrary DAG in an SP-DAG in linear time, using SP-ization
  • 12. LEGO® DAGsAssessing the computational benefits of AREA-Oriented DAG-Scheduling (GennaroCordasco, Rosario De Chiara, Arnold L. Rosenberg) 2009SP-DAGs made by a repertoire of Connected Bipartite Building Blocks DAGs representing the various subcomputations
  • 13. Furtherdefinitions on DAGs and SP-DAGsA node in the DAG could beUnelegibleElegibleAssigned/executedSchedule: Topologicalsort of the DAG Obtained by a rule for selectingwhichelegiblenode to executeateachstep of computationv has been scheduled for execution or executedv has at least a non-executed parentAllv’sparenthave been executed
  • 14. Critical pathLongest path from the source to the sinkWhy is so important?It’s clear to see that we can’t finish our computation before executing each node on the critical pathSo, time critical path execution takes it’s a trivial lower bound.
  • 15. Further definitions on DAGs and SP-DAGsYield of a node: number of nodes that become elegible when the given node completes his execution.𝑬Σ(𝒊): Elegible nodes at step i in schedule Σ𝑨𝑹𝑬𝑨(Σ)≜𝑖=0𝑛𝐸Σ𝑖 
  • 16. OutlineIntroductionTheoretical ModelComputation modelEnergy consumption modelThrottling modelSimulatorGreen HeuristicsResults and projectedworksReferences
  • 17. Energy consumption modelWe need a realistic model for energy consumptionWe should checkCircuits dissipationThrottling models
  • 18. Energy consumption modelCMOS Circuit dissipation:𝑃=𝐶𝑉2𝑓+𝐼𝑚𝑒𝑎𝑛𝑉 +𝑉𝑙𝑒𝑎𝑘𝑎𝑔𝑒(we won’t consider short circuit power and leakage)We assume a linear relationship between voltage and frequency𝑓=𝑘𝑉 
  • 19. Energy consumption modelOur model:𝐸=𝐶 × 𝑇× 𝑓3Where:𝑇=𝑐𝑙𝑜𝑐𝑘 𝑐𝑦𝑐𝑙𝑒𝑠𝑓𝑓= clock cycles per secondC enclosesseveralconstantslikecapacitance, k and clock multiplier 
  • 20. OutlineIntroductionTheoretical ModelComputation modelEnergy consumption modelThrottling modelSimulatorGreen HeuristicsResults and projectedworksReferences
  • 21. CPU throttlingmodelsWhichis the common throttling model used by modern processors?ACPI: Advanced Configuration and Power-management Interface[6]A fullyplatform-independent standard thatprovides:MonitoringConfiguringHardware discoveringPower managementDefinespowerstates for everydevice
  • 22. Performance vs powerstatesPowerstates:C0: Operationalpower stateC1: Halt stateC2: Stop-clockC3: SleepPerformance states:P0: Higher stateP1: Lessthan P0, frequency / voltagescaledPn: Lessthan Pn-1, frequancy/voltagescaledIn our model, weimplementonly C0 power state and P0,P1,P2 Performance states.
  • 23. Ourthrottling modelWe use a DFS (DynamicFrequencyScaling) Model, assumingthatscalingdoesn’taddenergyoverheadP0: 1.0 ∗𝑓P1: 0.7 ∗𝑓P2: 0.5 ∗ 𝑓 
  • 24. Further considerationsIn our model, an idle core consumes 0We do not track the algorithm execution energyWe do not track energy dissipated by memory usingEnergy is unboundedWe’re assuming that you can set a single core throttling
  • 25. OutlineIntroductionTheoretical ModelComputation modelEnergy consumption modelThrottling modelSimulatorGreen HeuristicsResults and future worksReferences
  • 26. The simulatorWe implemented this model in a DAG-Scheduling simulator, Providing classes and methods to calculate energy consumptionImplementing the energy model we discussed earlierPaying attention to extensibility
  • 27. A typical simulationLoads a DAGComputesgraphcriticalpathInitializesschedulersthatneeds to be testedExecutesschedulers on the givengraphs for a givennumber of trials (usually 100, due to randomnessinfluencingschedulers)At the end of iterations, itcollectsstatisticsabout the executions, specificallyMakespan (min, max, average)Energy consumptionaverageRepeats on each DAG
  • 28. How we implemented the modelOur focus: ExtensibilityWe wanted our simulator to support multiple kind of modelsProvidingCore abstractionThrottling level abstractionEnergy aware scheduler abstractionTotally decoupled from core and throttling levelMaking easier to addDifferent scheduling algorithmsDifferent core types Different energy models
  • 29. Core abstractionA core canExecute tasksSet its own throttling levelTrack its power consumptionProblem: different cores could implement different throttling strategiesSolution: Every core has its own throttling levels arrayThrottling level is a nested class in the core implementation
  • 30. ThrottlinglevelabstractionA throttlinglevelcontainsInformationsaboutfrequency and consumptionMethods to calculateDue date of a task at a givenlevel (lesser the level, slower the task execution)Powerconsumptionat a givenlevel
  • 31. Energy packageCore interfaceWe assume thatevery core can execute task and set hisownthrottlingAbstractclassThrottlingLevelImplements a throttlinglevel, with energyconsumption info and frequency.Class DummyCoreCore base implementationClass DefaultThrottlingLevelDummyCorenestedclass, implementsour performance states
  • 32. Core interface/** * Execute a task on this core * @param node The node that models the task * @param length Task length if executed at max power * @return the real task length (this could differ from input * if Core is set to a different throttling level) */public double executeTask(ICONodenode, double length);/** * Sets a core power consumption to his current throttling level * idleconsumption */public voidsetIdle();/** * Sets the core to a greater power level */public voidincreaseThrottlingLevel();/** * Sets the core to a lesser power level */public voiddecreaseThrottlingLevel();
  • 33. ThrottlingLevel/** * This method calculates the power consumption for a * given task length, according to power consumption unit * and other parameters, according to programmer's will that * implementsit. * * @param length The task length * @return Power consumption for this task */abstract double getPowerConsumptionPerTask(double length);/** * This method calculates how task length is modified * for the given throttling level * * @param length ideal length of the task * @return the real task length for the given throttling level */abstract double getRealLength(double length);
  • 34. Throttling level initialization
    public void initializeThrottlingLevels(double hardwareConstant,
            double maxFreq, double maxVoltage, int throttlingLevels) {
        this.levels = new ThrottlingLevel[throttlingLevels];
        // Level i runs at (i + 1) / (i + 2) of the maximum frequency and voltage.
        for (int i = 0; i < throttlingLevels - 1; i++) {
            double fraction = (i + 1.0) / (i + 2.0);
            levels[i] = new DefaultThrottlingLevel("LEVEL" + i, hardwareConstant,
                    fraction * maxFreq, fraction * maxVoltage);
        }
        // The last level runs at full frequency and voltage.
        this.levels[throttlingLevels - 1] = new DefaultThrottlingLevel(
                "LEVEL" + (throttlingLevels - 1), hardwareConstant, maxFreq, maxVoltage);
        // Necessary for correct use of increase and decrease.
        Arrays.sort(levels);
        // By default we set the maximum power level (the last one after sorting).
        this.throttlingLevelIndex = throttlingLevels - 1;
        this.currentThrottlingLevel = levels[throttlingLevelIndex];
        this.dissipatedPower = 0.0;
    }
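The slides do not show the body of DefaultThrottlingLevel. As a minimal sketch of what a concrete level could compute, the following assumes the common CMOS dynamic-power approximation P = C·V²·f and a task length that scales inversely with frequency. The class SketchThrottlingLevel, its fields, and buildLevels are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch, not the DefaultThrottlingLevel from the slides.
// Assumptions: dynamic power P = C * V^2 * f, and execution time
// inversely proportional to the frequency of the level.
final class SketchThrottlingLevel implements Comparable<SketchThrottlingLevel> {
    final String name;
    final double hardwareConstant; // C in the power formula
    final double frequency;
    final double voltage;
    final double maxFrequency;     // frequency of the fastest level

    SketchThrottlingLevel(String name, double hardwareConstant,
                          double frequency, double voltage, double maxFrequency) {
        this.name = name;
        this.hardwareConstant = hardwareConstant;
        this.frequency = frequency;
        this.voltage = voltage;
        this.maxFrequency = maxFrequency;
    }

    // A task of ideal length L (at max frequency) takes L * fMax / f here.
    double getRealLength(double length) {
        return length * maxFrequency / frequency;
    }

    // Energy for the task: power at this level times the real task length.
    double getPowerConsumptionPerTask(double length) {
        double power = hardwareConstant * voltage * voltage * frequency;
        return power * getRealLength(length);
    }

    @Override
    public int compareTo(SketchThrottlingLevel other) {
        return Double.compare(frequency, other.frequency);
    }

    // Same level layout as the slide: level i at (i + 1) / (i + 2) of the maximum.
    static SketchThrottlingLevel[] buildLevels(double c, double maxFreq,
                                               double maxVoltage, int count) {
        SketchThrottlingLevel[] levels = new SketchThrottlingLevel[count];
        for (int i = 0; i < count - 1; i++) {
            double fraction = (i + 1.0) / (i + 2.0);
            levels[i] = new SketchThrottlingLevel("LEVEL" + i, c,
                    fraction * maxFreq, fraction * maxVoltage, maxFreq);
        }
        levels[count - 1] = new SketchThrottlingLevel("LEVEL" + (count - 1), c,
                maxFreq, maxVoltage, maxFreq);
        Arrays.sort(levels); // ascending by frequency
        return levels;
    }
}
```

Under these assumptions a slower level takes longer but draws less power, so the energy per task drops as the level is lowered; with a different consumption model the trade-off could look different.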
  • 35. Energy-aware scheduler abstraction. An energy-aware scheduler has to: work with different types of cores; track the makespan and the energy consumption; implement the logic for core selection, eligible node selection, and choosing the right throttling level.
  • 36. Energy-aware scheduler package. CoreSelector: implements the free-core selection strategy (in these tests we use the DefaultCoreSelector class). EnergyAwareScheduler: base class for every scheduler that tracks energy consumption.
  • 37. Inspecting the EnergyAwareScheduler class
    /**
     * Instantiates a new EnergyAwareScheduler.
     * @param numCores  number of cores
     * @param coreClass class that models the desired core type
     * @throws InstantiationException
     * @throws IllegalAccessException
     * @throws IllegalArgumentException if numCores <= 0
     */
    public EnergyAwareScheduler(int numCores, Class<? extends Core> coreClass)
            throws InstantiationException, IllegalAccessException, IllegalArgumentException

    /**
     * Calculates the task length on a given core.
     * @param coreIndex   index of the core in the corePool
     * @param eventLength ideal length of the task
     * @param node        node to be executed
     * @return the task length if executed on the core at coreIndex
     */
    protected double getTimeOffsetForCore(int coreIndex, double eventLength, ICONode node)
  • 38. Inspecting the EnergyAwareScheduler class
    /**
     * Sets the throttling for the core that is going to execute a task in this step.
     * @param coreIndex the core id
     */
    protected void setBusyThrottling(int coreIndex)

    /** Sets the throttling state for the cores that will remain idle. */
    protected void setIdleThrottling()

    public double getTotalPowerConsumption()

    private void calculateIdleConsumptions()
  • 39. What about scheduling? Schedule steps are implemented using the TimeLine object, a priority queue containing two types of TimeEvent: processorsArrives and clientFinishes. Each scheduling step removes the first event from the TimeLine. The scheduling logic is implemented in the runBatchedMakespan method; further initialization is done in the initBatchedMakespan method.
  • 40. runBatchedMakespan method
    While (executedNode != target)
        event := timeline.pollNextEvent();
        setOverallThrottlingLevel();
        Switch (event)
            Case (processorsArrives)
                ne := min(availableCores, elegibleNodesNum)
                For i := 0 to ne - 1
                    nextNode := getNextElegibleNode();
                    coreIndex := coreSelector.getCoreIndex();
                    corePool[coreIndex].setBusy();
                    setBusyThrottling(coreIndex);
                    timeOffset := getTimeOffsetForCore(coreIndex, eventLength, nextNode);
                    timeline.add(new TimeEvent(event.getTime() + timeOffset, ClientFinishes, nextNode))
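The event loop above can be sketched with a standard priority queue. This is a simplified illustration, not the simulator's code: it ignores task dependencies and throttling, and the handling of the finish event is an assumption, since the slides only show the processorsArrives case. TimelineSketch and its members are hypothetical names.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of a timeline-driven scheduling loop.
// Assumptions: tasks are independent, all cores run at the same speed,
// and a finishing task simply frees its core.
final class TimelineSketch {
    enum Type { PROCESSORS_ARRIVES, CLIENT_FINISHES }

    static final class TimeEvent implements Comparable<TimeEvent> {
        final double time;
        final Type type;

        TimeEvent(double time, Type type) {
            this.time = time;
            this.type = type;
        }

        @Override
        public int compareTo(TimeEvent other) {
            return Double.compare(time, other.time); // earliest event first
        }
    }

    // Returns the makespan of running the tasks in order on numCores cores.
    static double makespan(double[] lengths, int numCores) {
        PriorityQueue<TimeEvent> timeline = new PriorityQueue<>();
        timeline.add(new TimeEvent(0.0, Type.PROCESSORS_ARRIVES));
        int next = 0;              // index of the next task to start
        int freeCores = numCores;
        double now = 0.0;
        while (!timeline.isEmpty()) {
            TimeEvent event = timeline.poll();
            now = event.time;
            if (event.type == Type.CLIENT_FINISHES) {
                freeCores++;       // the core that ran the task is free again
            }
            // Start as many tasks as the free cores allow.
            int ne = Math.min(freeCores, lengths.length - next);
            for (int i = 0; i < ne; i++) {
                timeline.add(new TimeEvent(now + lengths[next], Type.CLIENT_FINISHES));
                next++;
                freeCores--;
            }
        }
        return now;
    }
}
```

For example, three tasks of lengths 3, 1 and 2 on two cores start the first two at time 0, start the third at time 1 when the short task finishes, and complete at time 3.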
  • 42. Default strategies. getNextElegibleCore() is abstract (every core has to implement it). setBusyThrottling(coreIndex) by default sets the maximum throttling level, as does setOverallThrottlingLevel(). Further initializations are made in the initBatchedMakespan method.
  • 43. What about core selection? Core selection is implemented as a separate class implementing the CoreSelector interface, which provides the getCoreIndex method. In our simulation we use only the DefaultCoreSelector, which simply takes the free core with the highest frequency.
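A minimal sketch of the "highest-frequency free core" rule follows. The real CoreSelector interface is not shown in the slides, so the signature below (arrays of frequencies and busy flags) and the -1 return value for "no free core" are assumptions made for illustration.

```java
// Hypothetical sketch of the selection rule described for DefaultCoreSelector.
final class CoreSelectorSketch {
    // Returns the index of the free core with the highest frequency,
    // or -1 if every core is busy (the -1 convention is an assumption).
    static int getCoreIndex(double[] frequencies, boolean[] busy) {
        int best = -1;
        for (int i = 0; i < frequencies.length; i++) {
            if (!busy[i] && (best == -1 || frequencies[i] > frequencies[best])) {
                best = i;
            }
        }
        return best;
    }
}
```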
  • 46. Critical-path-based scheduling. Compute the graph's critical path. Select the free core with the highest frequency. Set the core to maximum power. Select the node with the maximum distance from the sink. To implement this scheduler, only the method getNextElegibleCore() has been overridden.
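The slides do not show how the distance from the sink is computed. One common way, sketched here under the assumption that the distance of a node is the length of the longest path from it to the sink (its own length included), is a memoized depth-first traversal of the DAG. CriticalPathSketch and its parameters are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch: longest-path distance from every node to the sink.
final class CriticalPathSketch {
    // successors.get(u) lists the nodes that depend on u;
    // length[u] is the ideal length of task u.
    static double[] distanceToSink(List<List<Integer>> successors, double[] length) {
        int n = length.length;
        double[] dist = new double[n];
        boolean[] done = new boolean[n];
        for (int u = 0; u < n; u++) {
            visit(u, successors, length, dist, done);
        }
        return dist;
    }

    private static double visit(int u, List<List<Integer>> successors,
                                double[] length, double[] dist, boolean[] done) {
        if (done[u]) {
            return dist[u]; // already computed
        }
        double best = 0.0;
        for (int v : successors.get(u)) {
            best = Math.max(best, visit(v, successors, length, dist, done));
        }
        dist[u] = length[u] + best;
        done[u] = true;
        return dist[u];
    }
}
```

A scheduler following the slide would then pick, among the eligible nodes, the one with the largest value in the returned array.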
  • 47. AOSPD scheduling. From "On scheduling DAGs to maximize AREA" (Gennaro Cordasco, Arnold L. Rosenberg). An idea from the Internet computing scenario: it is practically impossible to determine when new processors will become available for task execution. So, what can we do? Solutions: maximize the AREA at each execution step (great, but not always possible [7]), or maximize the average AREA over the execution steps (good, and always possible).
  • 48. More on AOSPD scheduling. At step 1, we have to choose B or C for execution. To maximize AREA at step 1, we choose C. What happens in step 2? Whichever eligible nodes we choose in step 2, we can't maximize AREA: to maximize AREA in step 2 we should have chosen B, which was not AREA-maximizing for step 1.
  • 49. Adding energy tracking to AOSPD scheduling. We already had this algorithm implemented, without energy tracking. How do we plug AOSPD in? Solution: extend EnergyAwareScheduler; refactor the class so that we have getNextElegibleNode().
  • 50. TFI heuristic. The idea: if we have to wait for a task that requires much more time than the others, we can slow down the faster ones to save energy. TFI: maximum due date for critical path value i.
  • 51. TFI heuristic
    Compute the graph's critical path
    Select the free core with the highest frequency
    Sort eligible nodes by their critical path value and yield
    Find the maximum due date
    TFINode := node with maximum critical path value and due date
    TFI := maximum task length
    ne := min(coreAvailable, elegibleNodesNum)
    For i := 1 to ne
        Node := elegibleNodes[i]
        If Node == TFINode
            execute Node at max power
        Else if (elegibleNodes.size() < numCores)
            execute Node at the minimum throttling level that keeps its length less than TFI
        Else
            execute Node at the default throttling level
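The key step of the heuristic, choosing the slowest level that still finishes within TFI, can be sketched as below. The sketch assumes levels are given as frequencies sorted in ascending order, that task length scales inversely with frequency, and that finishing exactly at TFI is acceptable; none of these details are stated in the slides. TfiSketch and lowestLevelWithin are hypothetical names.

```java
// Hypothetical sketch of the level choice in the TFI heuristic.
final class TfiSketch {
    // frequencies must be sorted ascending; the last entry is the maximum.
    // Returns the index of the slowest level whose real task length does not
    // exceed tfi, or the fastest level if no level satisfies the bound.
    static int lowestLevelWithin(double[] frequencies, double idealLength, double tfi) {
        double maxFreq = frequencies[frequencies.length - 1];
        for (int i = 0; i < frequencies.length; i++) {
            double realLength = idealLength * maxFreq / frequencies[i];
            if (realLength <= tfi) {
                return i;
            }
        }
        return frequencies.length - 1;
    }
}
```

With frequencies 1.0, 1.5 and 2.0, a task of ideal length 1.0 takes 2.0, about 1.33 and 1.0 respectively, so a TFI of 1.5 selects the middle level.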
  • 52. Marathon heuristic. The idea: our problem resembles a marathon. We have to come first, and possibly alive (with enough energy to get back home). Being lazier, we'll save more energy. How should we run a marathon? According to my uncle: it's better to keep a steady average pace than to squander energy running faster for a short stretch; when you can't overtake (the road is too narrow or you're too tired), it's better to slow down a little and wait for better conditions.
  • 53. Marathon heuristic
    Compute the graph's critical path
    Select the free core with the highest frequency
    Sort eligible nodes by their critical path value and yield
    ne := min(availableCores, elegibleNodesNum)
    front := sum of yields of the first ne nodes
    For i := 1 to ne
        Node := elegibleNodes[i]
        If front + n <= numCores - (numCores / DELTA)
            execute Node at minimum power
        Else
            execute Node at average power
  • 55. Assessing results. Remember "time is money"? Solution: ET². Remember area-time complexity in VLSI design? [8][9] We use energy-time complexity to plot our schedulers' performance. The lower the ET² score, the better the scheduler.
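As a small worked illustration of the metric, the sketch below computes ET² as energy times the square of the makespan. The class name and the sample figures are made up for the example and are not results from the simulator.

```java
// Hypothetical helper: ET^2 score of a schedule (lower is better).
final class Et2Sketch {
    static double score(double energy, double makespan) {
        return energy * makespan * makespan;
    }
}
```

With these made-up numbers, a scheduler using 10 energy units with a makespan of 4 scores 160, while one using 8 units with a makespan of 5 scores 200: squaring the time means a slower schedule must save a lot of energy to come out ahead.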
  • 56. Tests. Test parameters: number of cores: 4, 8, 16; standard deviation: 1, 2, 4, 8. The standard deviation influences task due dates, which are generated from a Gaussian distribution with mean 1.0 and a standard deviation taken from the given set.
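Task lengths with this distribution could be drawn as sketched below using java.util.Random. The slides do not say how non-positive samples are treated, which matter with a standard deviation as large as 8; resampling until the value is positive is an assumption of this sketch, as are the names TaskLengthSketch and generate.

```java
import java.util.Random;

// Hypothetical sketch: Gaussian task lengths with mean 1.0.
final class TaskLengthSketch {
    static double[] generate(int count, double stdev, long seed) {
        Random random = new Random(seed); // fixed seed for reproducible tests
        double[] lengths = new double[count];
        for (int i = 0; i < count; i++) {
            double value;
            do {
                // nextGaussian() has mean 0 and standard deviation 1.
                value = 1.0 + stdev * random.nextGaussian();
            } while (value <= 0.0); // assumption: discard non-positive lengths
            lengths[i] = value;
        }
        return lengths;
    }
}
```

Note that discarding non-positive samples shifts the effective mean above 1.0, more so for larger standard deviations.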
  • 68. Conclusions. We can't obtain a makespan better than that of critical path scheduling. AREA and yield considerations don't seem to add much in terms of energy savings, at least in a multicore scenario; we should probably focus only on the critical path. Task due dates don't seem to influence the makespan much.
  • 69. Future work. Tracking scheduler efficiency. Adding a model for idle cores' consumption. Considering a "finite energy" model. Extending it to a volunteer computing scenario. We could consider a scenario with many cores on different dies: adding an extra cost to switch them on; adding thermal parameters.
  • 71. References. [1] San Murugesan, "Harnessing Green IT: Principles and Practices", 2009. [2] "Research reveals environmental impact of Google searches", Fox News, 2009-01-12. http://www.foxnews.com/story/0,2933,479127,00.html. Retrieved 2009-01-15. [3] "Powering a Google search", Official Google Blog, Google. http://googleblog.blogspot.com/2009/01/powering-google-search.html. Retrieved 2009-10-01. [4] "Office suite require 70 times more memory than 10 years ago", GreenIT.fr, 2010-05-24. http://www.greenit.fr/article/logiciels/logiciel-la-cle-de-l-obsolescence-programmee-du-materiel-informatique-2748. Retrieved 2010-05-24.
  • 72. References. [5] "ARM chief calls for low-drain wireless", The Inquirer, 29 June 2010. http://www.theinquirer.net/inquirer/news/1719749/arm-chief-calls-low-drain-wireless. Retrieved 30 June 2010. [6] Advanced Configuration and Power Interface Specification, 2010 (www.acpi.info). [7] G. Malewicz, A. L. Rosenberg, M. Yurkewych, "Toward a theory for scheduling dags in internet-based computing", 2006. [8] Richard J. Lipton, Robert Sedgewick, "Lower bounds for VLSI", 1981.
  • 73. References. [9] C. D. Thompson, "Area-time complexity for VLSI", 1979. [10] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, Y. Zhou, "Cilk: an efficient multithreaded runtime system", 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '95).
  • 74. That’s all, folks! Thanks for your attention!