Xen Power Improvements



Will Auld, Yang Z Zhang, Winston Wang
Intel Corporation
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO
LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL
PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS
AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER,
AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF
INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A
PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR
OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN
MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to
change without notice.
Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which
may cause the product to deviate from published specifications. Current characterized errata are available on
request.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the
United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2012 Intel Corporation.

Agenda

• Background
• Power saving in client
• Power saving in server
• Summary

Room to save POWER

• Ideal/standard: native OS power consumption
• Reality: hypervisor power consumption
• Large delta (~40% for the client at the start)

Client architecture



Client Xen configuration (figure):
  Linux (Dom0 VM)  |  DomU VM  |  Win7 (DomU VM)
  Xen Hypervisor
  Hardware

Goal
 • Native OS power efficiency
 • Close the power gap with native Win7

 Work cycle (figure): Identify gap → Root cause → Fix code → Code drop → repeat

Current results
• ~40% idle power gap 2 years ago
• ~5% idle power gap now

Figure: Idle power gap at project start vs. project end (y-axis 0% to 45%)


• More?
• Increasingly harder to extract

LCD brightness control
 • LCD display
  − ~20% idle power
  − Broken brightness controls

 (Figure panels: Win7, Dom0)

 Fix:
   − Added emulation of the ACPI video extension
        − Specifically, the brightness control methods _BCL, _BCM, and _BQC
        − Added to the VM guest's ACPI BIOS
        − Pass the control-knob output through to Dom0, which takes the platform action
   − Make sure LCD brightness control really works in Dom0
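A minimal Python sketch of the guest-side behavior described above, assuming a plain callback stands in for the path to Dom0. The method semantics follow the ACPI video extension (_BCL lists the supported levels, preceded by the default levels on AC and on battery; _BCM sets a level; _BQC reports the current one). The class, the callback, and the level values are hypothetical and are not the actual Xen guest ACPI code.

```python
from typing import Callable, List


class EmulatedBacklight:
    """Hypothetical guest-side model of the ACPI brightness methods.

    bcl/bcm/bqc mirror _BCL/_BCM/_BQC; everything else is illustrative.
    """

    def __init__(self, levels: List[int], ac_level: int, battery_level: int,
                 forward_to_dom0: Callable[[int], None]) -> None:
        self._levels = sorted(levels)
        self._ac_level = ac_level
        self._battery_level = battery_level
        self._current = ac_level
        self._forward_to_dom0 = forward_to_dom0  # stand-in for the path to Dom0

    def bcl(self) -> List[int]:
        """_BCL: [level on AC, level on battery, supported levels...]."""
        return [self._ac_level, self._battery_level] + self._levels

    def bcm(self, level: int) -> None:
        """_BCM: the guest cannot drive the panel, so forward the request to Dom0."""
        if level not in self._levels:
            raise ValueError(f"unsupported brightness level: {level}")
        self._forward_to_dom0(level)
        self._current = level

    def bqc(self) -> int:
        """_BQC: report the current brightness level."""
        return self._current
```

In this model a brightness hotkey raises an ACPI event in the guest, the guest OS calls `bcm()` with the next level, and only the forwarded request reaches the platform through Dom0.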

Runtime IO power management

Dysfunctional IO power management
• ~15% idle power
• First available in the 2.6.32 kernel, but:
  − not functioning correctly


Fix:
• Enable energy-saving states at run time and auto-suspend devices when idle
• Gap dropped from ~25% to 6.8% after the fix
  − Measured on an HP 8440p mobile platform based on a Nehalem processor
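On Linux, runtime power management is switched per device through the sysfs attribute `power/control`: "on" keeps the device fully powered, while "auto" lets the runtime PM core suspend it when idle. The sketch below flips every PCI device under a given sysfs root to "auto". It illustrates the knob only; the slide does not say this is exactly how the fix was applied, and on a real system it needs root privileges.

```python
from pathlib import Path
from typing import List


def enable_runtime_pm(devices_root: str = "/sys/bus/pci/devices") -> List[str]:
    """Set power/control to "auto" for every device that exposes it.

    Returns the names of the devices whose setting was changed.
    """
    switched = []
    for dev in sorted(Path(devices_root).iterdir()):
        control = dev / "power" / "control"
        if not control.is_file():
            continue  # device without a runtime PM control attribute
        if control.read_text().strip() != "auto":
            control.write_text("auto")  # allow autosuspend when idle
            switched.append(dev.name)
    return switched
```

Passing a different root makes the function easy to exercise against a fake directory tree instead of the live sysfs.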

ATA_link power


ATA_link static power setting
− ~6% idle power in max_performance
− But performance suffers with min_power
− Even worse:
  − All SCSI hosts are active, with or without attached devices

(Figure: link power setting moved at run time between Max_Perf and Min_Power)

Fix:
− Runtime update of the ATA_link power setting
  − Toggle between min_power and max_performance, as needed
− Disable clocks on ports with no attached device
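A sketch of the runtime toggle, assuming the Linux `link_power_management_policy` attribute under `/sys/class/scsi_host` (which accepts `min_power` and `max_performance`) and, following the editor's note for this slide, system load as the idleness signal. The 0.10 threshold is an arbitrary placeholder; a real daemon would run this periodically and add hysteresis to avoid flapping. Disabling clocks on empty ports is not covered here.

```python
import os
from pathlib import Path
from typing import List

# Assumed idleness threshold (1-minute load per CPU); tune per platform.
IDLE_LOAD_THRESHOLD = 0.10


def current_load() -> float:
    """1-minute load average normalized by CPU count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def choose_policy(load: float, threshold: float = IDLE_LOAD_THRESHOLD) -> str:
    """min_power when the system looks idle, max_performance otherwise."""
    return "min_power" if load < threshold else "max_performance"


def apply_policy(policy: str, hosts_root: str = "/sys/class/scsi_host") -> List[str]:
    """Write the policy to every SCSI host exposing the attribute.

    Returns the hosts whose setting actually changed.
    """
    changed = []
    for host in sorted(Path(hosts_root).iterdir()):
        attr = host / "link_power_management_policy"
        if attr.is_file() and attr.read_text().strip() != policy:
            attr.write_text(policy)
            changed.append(host.name)
    return changed
```

A periodic loop would call `apply_policy(choose_policy(current_load()))`; writing only when the value differs keeps the loop itself cheap.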

Network power
Wired and Wi-Fi
− ~16% idle power (650 mW)
− Many interrupts kick the CPU out of deep C-states during idle

(Figure panels: Win7, Dom0)

Fix:
− Enable Wi-Fi and E1000 power-saving modes in Dom0
− Add a Win7 power-management PV driver that passes control settings to Dom0
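Why frequent interrupts hurt: a CPU only benefits from a deep C-state if it stays idle longer than that state's target residency. This simplified sketch picks the deepest state whose target residency fits the mean gap between interrupts. The C-state table values are illustrative assumptions, not measurements from this platform, and real idle governors predict idle time from more than the mean interrupt rate.

```python
from typing import List, Tuple

# Illustrative C-state table: (name, target residency in microseconds),
# ordered shallow to deep. Real values are platform-specific.
C_STATES: List[Tuple[str, float]] = [("C1", 2.0), ("C3", 100.0), ("C6", 400.0)]


def deepest_useful_cstate(interrupt_rate_hz: float,
                          states: List[Tuple[str, float]] = C_STATES) -> str:
    """Deepest state whose target residency fits the mean gap between interrupts."""
    if interrupt_rate_hz <= 0:
        return states[-1][0]  # no interrupts: the deepest state is worthwhile
    mean_gap_us = 1e6 / interrupt_rate_hz
    best = "C0"  # nothing fits: the CPU effectively stays in C0
    for name, residency_us in states:
        if mean_gap_us >= residency_us:
            best = name
    return best
```

With these assumed numbers, 1,000 interrupts per second (a 1 ms mean gap) still allows C6, while 20,000 per second only allows C1.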

GFX power management

iGFX power management inactive
− ~16% idle power (650 mW)
− VT-d passthrough requires a device reset
  − The reset clears all registers, including the power-management registers enabled by the BIOS
       − Disables: RC6 (render standby), turbo, and GPMT (Graphics Power
         Modulation Technology)

(Figure, VT-d operation: after BIOS, PM is on; the device reset turns PM off; saving and restoring the PM registers around the reset turns PM back on)

Fix:
− Save/restore the PM registers around the FLR (function level reset)
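The save/restore fix can be sketched as follows, assuming a toy device model in which an FLR returns every register to zero. The `FakeGfxDevice` class and the register offsets are hypothetical; on real hardware the PM registers are device-specific MMIO registers, and the restore order may matter.

```python
from typing import Dict, Iterable


class FakeGfxDevice:
    """Toy stand-in for a passed-through iGFX function (hypothetical)."""

    def __init__(self, regs: Dict[int, int]) -> None:
        self.regs = dict(regs)

    def read(self, offset: int) -> int:
        return self.regs.get(offset, 0)

    def write(self, offset: int, value: int) -> None:
        self.regs[offset] = value

    def flr(self) -> None:
        # Function level reset: every register returns to its power-on value (0 here).
        self.regs = {}


def flr_preserving_pm(dev: FakeGfxDevice, pm_offsets: Iterable[int]) -> None:
    """Save the BIOS-programmed PM registers, reset the device, then restore them."""
    offsets = list(pm_offsets)
    saved = {off: dev.read(off) for off in offsets}
    dev.flr()
    for off in offsets:
        dev.write(off, saved[off])
```

Only the listed PM registers survive the reset; everything else starts from its power-on state, as the passthrough path expects.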

Client summary

• Started with a ~40% gap
• Ended with ~5% gap
• Greatly improved and got close to the goal

Server power savings: increasing idle time

• Timer alignment
• Power aware scheduling
• Reducing periodic tasks

Timer alignment

• Independent, frequent timer interrupts 
• Frequent wake-ups
• Reduced idle time, greater power consumption
Figure: timelines for Cpu0 and Cpu1, each alternating between idle and busy as its own timer interrupts arrive, with the resultant socket C-state underneath. Legend: timer intr, CPU idle, CPU busy.

Timer alignment

• Proposal
 •   Configurable timer consolidation window, such as 50 ns
 •   Compute when each timer interrupt is due
 •   Shift timer handling to the next consolidation point
• Benefit
 •   Fewer interrupts → longer idle time → power savings


• Challenges
 •   Impact on guest scheduling, and therefore on performance
 •   Cross-CPU timer synchronization
 •   IPI frequency and synchronization
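A minimal sketch of the consolidation step, assuming integer nanosecond timestamps: each expiry is rounded up to the next multiple of the window, so a timer can be deferred but never fired early. The window is a tunable (per the editor's note, the 50 ns on the slide is a guess); Linux's `round_jiffies()` applies a similar idea at a coarser granularity. This is an illustration, not the authors' KVM implementation.

```python
from typing import Iterable, List


def align_expiry(expiry_ns: int, window_ns: int) -> int:
    """Round a timer's expiry up to the next consolidation point."""
    if window_ns <= 0:
        raise ValueError("window_ns must be positive")
    return -(-expiry_ns // window_ns) * window_ns  # integer ceiling division


def wakeup_moments(expiries_ns: Iterable[int], window_ns: int = 0) -> List[int]:
    """Distinct moments at which the CPU must wake; window_ns=0 means no alignment."""
    if window_ns == 0:
        return sorted(set(expiries_ns))
    return sorted({align_expiry(t, window_ns) for t in expiries_ns})
```

For example, timers at 101, 120, 149, 151, and 190 ns need five wake-ups unaligned, but only two (at 150 and 200 ns) with a 50 ns window.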
Timer alignment
Figure: the same Cpu0/Cpu1 timelines, with Cpu1's interrupt moved ("New intr arrived") to coincide with Cpu0's; the resultant socket C-state row shows the gained C-state.


• Shifting CPU1’s interrupt to match CPU0’s → a nice gain in socket C-state
• Repeated over and over, the gains add up
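Why the shift helps: the socket can only enter a deep package C-state while every CPU in it is idle, so the socket's deep-idle time is the horizon minus the union of all CPUs' busy intervals. This sketch computes that quantity; the interval values in the example are made up, and it assumes each CPU's busy period moves together with its shifted interrupt.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # busy period [start, end), arbitrary time units


def socket_deep_idle_time(busy_per_cpu: List[List[Interval]], horizon: int) -> int:
    """Time in [0, horizon) during which every CPU of the socket is idle."""
    intervals = sorted(iv for cpu in busy_per_cpu for iv in cpu)
    busy = 0
    cur: Optional[Interval] = None  # current merged busy run
    for start, end in intervals:
        start, end = max(start, 0), min(end, horizon)
        if start >= end:
            continue  # clipped away entirely
        if cur is None or start > cur[1]:
            if cur is not None:
                busy += cur[1] - cur[0]
            cur = (start, end)
        else:
            cur = (cur[0], max(cur[1], end))  # overlaps: extend the run
    if cur is not None:
        busy += cur[1] - cur[0]
    return horizon - busy
```

With CPU0 busy at (10, 20) and (60, 70) and CPU1 busy at (35, 45) and (85, 95), the socket is fully idle for 60 of 100 units; shifting CPU1's work to coincide with CPU0's raises that to 80.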

Power aware scheduling

• ACPI modes:
 − Performance: power-hungry mode
 − Energy: power-saving mode
 − Balanced


• Mapping modes to scheduling
 − Performance
   − Schedule vCPUs one per physical core before pairing them on hyperthreads
 − Energy
   − Schedule vCPUs onto logical CPUs, filling both hyperthreads of a core first
     − power down more cores
     − power down more sockets

Power saving scheduler


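Figure: topology with two packages (pkg 0, pkg 1), two cores per package (core 0, core 1), and two hyperthreads (HT) per core, giving cpu 0 to cpu 7. Running tasks vcpu0, vcpu1, and vcpu2 appear under cpu 0, cpu 2, and cpu 4, alongside the power-aware scheduler. Legend: idle CPU in deep C-state; busy CPU; not in deep C-state.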
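The two placement policies can be sketched on the topology from this slide (two packages, two cores per package, two hyperthreads per core, numbered cpu 0 to cpu 7). The sort orders below are one simple way to express "spread" versus "pack"; they are assumptions for illustration, not the Xen scheduler's actual algorithm.

```python
from typing import List, NamedTuple


class LogicalCpu(NamedTuple):
    cpu: int      # logical CPU number
    package: int  # socket
    core: int     # core within the package
    thread: int   # hyperthread within the core


def topology(packages: int = 2, cores: int = 2, threads: int = 2) -> List[LogicalCpu]:
    """Build the slide's layout: cpu = (pkg * cores + core) * threads + thread."""
    return [LogicalCpu((p * cores + c) * threads + t, p, c, t)
            for p in range(packages) for c in range(cores) for t in range(threads)]


def place_vcpus(n_vcpus: int, cpus: List[LogicalCpu], mode: str) -> List[int]:
    """Return the logical CPUs chosen for n_vcpus under the given mode."""
    if mode == "performance":
        # One vCPU per physical core first; sibling threads only once cores run out.
        order = sorted(cpus, key=lambda c: (c.thread, c.package, c.core))
    elif mode == "energy":
        # Fill both threads of a core, then the next core, then the next package,
        # so whole cores and sockets stay idle and can reach deep C-states.
        order = sorted(cpus, key=lambda c: (c.package, c.core, c.thread))
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return [c.cpu for c in order[:n_vcpus]]
```

For three vCPUs, the performance policy picks cpu 0, 2, and 4 (three separate cores), while the energy policy picks cpu 0, 1, and 2, leaving package 1 entirely idle.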

Reduce periodic activity

• Power-unfriendly RTC emulation:
 − The VMM updates the RTC clock twice per second
 − Solution
   − Update the RTC clock only when it is read
   Sidebar: "If a clock ticks where no one can see it, does the time change?"
• Frequent wake-ups to check buffered I/O:
 − The device model wakes up multiple times a second (polling model)
 − Solution (push model)
   − Use an event channel to notify the device model when the buffered I/O status changes
   Sidebar: "No more polling"
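Update-on-read can be sketched by anchoring the guest's RTC value to a host monotonic clock and computing the current time only when the guest reads it, so no periodic host timer is needed to keep it current. The class and method names are hypothetical; a real CMOS RTC can also raise update, alarm, and periodic interrupts, and a guest that enables those still needs timers, which this sketch ignores.

```python
import time
from typing import Callable


class LazyRtc:
    """Emulated RTC that does no work until the guest reads it (hypothetical)."""

    def __init__(self, guest_epoch_s: float,
                 host_clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = host_clock
        self._base_guest = guest_epoch_s  # guest time at the reference point
        self._base_host = host_clock()    # host time at the reference point

    def read(self) -> int:
        # Derived on demand: elapsed host time is added to the reference value.
        return int(self._base_guest + (self._clock() - self._base_host))

    def write(self, guest_epoch_s: float) -> None:
        # The guest sets the clock: re-anchor the reference point.
        self._base_guest = guest_epoch_s
        self._base_host = self._clock()
```

Injecting `host_clock` keeps the sketch testable and makes the design point explicit: the only state is a reference pair, not a counter that must be ticked.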

Server summary

• Significant areas of work
• Need to quantify the impacts

Overall summary

• Every component counts – software and hardware
• Make sure the basics are working
• Still more to do

Questions?


Improving Xen idle power efficiency


Editor's Notes

  • #9 (LCD brightness control): LCD brightness is controlled through ACPI. When the user presses the laptop hotkey to decrease the LCD brightness, it triggers an ACPI event, and the event handler calls the control methods to take the corresponding action. The guest's ACPI table lacked these control methods, so we needed to add them to the guest ACPI. When the guest calls those control methods, the request is passed to Dom0, which does the actual work.
  • #11 (ATA_link power): Basically, SCSI hosts that have no attached device should be turned off to save power. The previous client didn't do this, which wasted power; our solution now brings down every SCSI host that has no attached device. Also, in the previous client the ATA link power policy could only be set statically, to either min_power or max_performance. We added a dynamic solution: check the system load at run time, and set min_power when the system is idle, max_performance otherwise.
  • #13 (GFX power management): FLR means function level reset. FLR resets the whole device to its initial state (as at power-on). The issue is that FLR is required when passing the GFX device through to a guest, and it clears all the power-saving PM register settings made by the BIOS. The solution is to save the PM registers before the FLR and restore them after it. RC6 (render standby) is a GPU technology that lets the GPU enter a very low power state when it is idle, analogous to C-states on the CPU. Turbo is Intel Turbo Boost; refer to en.wikipedia.org/wiki/Intel_Turbo_Boost for more details. GPMT (Graphics Power Modulation Technology) is a method for saving power in the graphics adapter while it continues to display and process data. It dynamically switches the render frequency and/or render voltage between the higher and lower power states supported on the platform, based on the render engine workload.
  • #15: A process is an OS-schedulable task/entity.
  • #17 (Timer alignment proposal): Actually, we don’t know the proper value for the expiration window; 50 ns is just a guess. Different guests and different workloads have different requirements, so it is hard to pick a fixed value for the window, and finding a good one would take many experiments. Unfortunately, we didn’t have time to do this, nor to implement timer alignment in Xen. We implemented it in KVM, but the idea is the same for Xen and KVM.
  • #21 (Reduce periodic activity): 1. This is not about the VM: the RTC is emulated by the hypervisor, and the problem is the emulation logic in the hypervisor, not how the RTC is used inside the VM. 2. An event channel is a mechanism for notifying events between the hypervisor and VMs. Previously, the device model polled the buffered I/O page several times a second, and most of the time no new data had arrived. Now, when the hypervisor writes data to the buffered I/O page, it issues an event to notify the device model that new data has arrived, and the device model wakes up to fetch it. This eliminates needless wake-ups of the device model to check the buffered I/O page.
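The polling-to-push change described in the last note can be sketched with a `threading.Event` standing in for the event channel: the producer (hypervisor side) appends to the buffer and signals, and the consumer (device model) blocks until signaled instead of waking on a timer. This is a Python analogy, not the Xen event channel API; the class and method names are hypothetical.

```python
import threading
from collections import deque
from typing import Deque, List


class BufferedIoPage:
    """Toy push model: the producer notifies, the consumer sleeps until told."""

    def __init__(self) -> None:
        self._slots: Deque[bytes] = deque()
        self._notify = threading.Event()  # stand-in for an event channel
        self._stopped = False
        self.wakeups = 0

    def write(self, data: bytes) -> None:
        # Hypervisor side: publish the request, then "send the event".
        self._slots.append(data)
        self._notify.set()

    def stop(self) -> None:
        self._stopped = True
        self._notify.set()

    def consume(self) -> List[bytes]:
        # Device-model side: block until notified; no timer-driven polling.
        received: List[bytes] = []
        while True:
            self._notify.wait()
            self._notify.clear()
            self.wakeups += 1
            stopping = self._stopped  # read before draining so no write is missed
            while self._slots:
                received.append(self._slots.popleft())
            if stopping:
                return received
```

Clearing the event before draining means a write that lands mid-drain sets the event again, so the consumer wakes once more instead of losing the notification; several quick writes may also be coalesced into a single wake-up.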