XeMPUPiL
A performance-aware power capping
orchestrator for the Xen hypervisor
Marco Arnaboldi, author
marco1.arnaboldi@mail.polimi.it
05 June 2017
2
Problem Definition
[Figure: datacenter tenants share the available power budget through the Xen virtualization layer]
6
Problem Definition
One problem, two points of view:
➡ minimize power consumption given a minimum performance requirement
➡ maximize performance given a maximum power consumption cap
8
Challenges
A performance-aware power capping orchestrator for the Xen hypervisor:
➡ instrumentation-free workload monitoring
➡ power management techniques: HW vs. SW
➡ an open-source virtualization layer adopted by many Fortune companies
12
State of the Art
SOFTWARE APPROACH: ✓ efficiency, ✖ timeliness
HARDWARE APPROACH: ✖ efficiency, ✓ timeliness
HYBRID APPROACH [5]: ✓ efficiency, ✓ timeliness
Techniques surveyed: model-based monitoring [3], thread migration [2], resource management, CPU quota, DVFS [4], RAPL [1]

[1] H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. RAPL: Memory power estimation and capping. In International Symposium on Low Power Electronics and Design (ISLPED), 2010.
[2] R. Cochran, C. Hankendi, A. K. Coskun, and S. Reda. Pack & Cap: Adaptive DVFS and thread packing under power caps. In International Symposium on Microarchitecture (MICRO), 2011.
[3] M. Ferroni, A. Cazzola, D. Matteo, A. A. Nacci, D. Sciuto, and M. D. Santambrogio. MPower: Gain back your Android battery life! In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, pages 171–174. ACM, 2013.
[4] T. Horvath, T. Abdelzaher, K. Skadron, and X. Liu. Dynamic voltage scaling in multitier web servers with end-to-end delay control. IEEE Transactions on Computers, 2007.
[5] H. Zhang and H. Hoffmann. Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
14
Proposed Solution
[Architecture figure: DomainU workloads run on top of the Xen hypervisor and the hardware; Dom0 hosts the orchestrator, which Observes through XeMPower hardware event counters, Decides through the PUPiL-derived logic, and Acts through the CLI tools, buffers, hypercall manager, RAPL interface, and the XL toolstack]
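To make the Observe-Decide-Act structure concrete, here is a minimal sketch of the Dom0 orchestration loop; every helper is a hypothetical stub standing in for xempowermon (observe), the PUPiL-derived decision logic (decide), and the xempower hypercalls plus the XL toolstack (act), not the actual XeMPUPiL code.

```c
/* Minimal sketch of the Dom0 orchestrator loop.
 * All helper functions are hypothetical stubs. */
#include <stdio.h>
#include <unistd.h>

static double observe_ir_rate(int domid, double window_s) {
    /* Stub: would read instructions-retired counters for the domain. */
    (void)domid; (void)window_s;
    return 0.0;
}

static int decide_cores(double ir_rate, int cur_cores, int max_cores) {
    /* Stub: would walk the binary-search tree over resource configurations. */
    (void)ir_rate;
    return cur_cores < max_cores ? cur_cores : max_cores;
}

static void apply_power_cap(double watts) {
    /* Stub: would invoke the xempower write hypercall on the RAPL MSRs. */
    printf("setting power cap to %.1f W\n", watts);
}

static void apply_core_allocation(int domid, int n_cores) {
    /* Stub: would resize the CPU pool and re-pin vCPUs through xl. */
    printf("domain %d -> %d pCPUs\n", domid, n_cores);
}

int main(void) {
    const int    domid     = 1;     /* monitored DomainU          */
    const double cap_watts = 30.0;  /* power cap to enforce       */
    const double window_s  = 2.0;   /* observation window         */
    int cores = 4;

    apply_power_cap(cap_watts);              /* ACT: enforce the cap once   */
    for (int step = 0; step < 10; step++) {  /* bounded loop for the sketch */
        double ir   = observe_ir_rate(domid, window_s);   /* OBSERVE */
        int    next = decide_cores(ir, cores, 4);          /* DECIDE  */
        if (next != cores) {                                /* ACT     */
            apply_core_allocation(domid, next);
            cores = next;
        }
        sleep((unsigned)window_s);   /* let the workload adapt to the new configuration */
    }
    return 0;
}
```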
15
Experimental Setup
u Server setup (aka Sandy)
u 2.8 GHz quad-core Intel Xeon E5-1410 processor, no HT enabled (4 physical cores)
u 32 GB of RAM
u Xen hypervisor version 4.4
u Paravirtualized instance of Ubuntu 14.04 as Dom0, pinned on the first pCPU and with 4 GB of RAM
u Benchmarking
u Embarrassingly Parallel (EP) [1]
u IOzone [3]
u cachebench [2]
u Block Tri-diagonal solver (BT) [1]

              EP    IOzone  cachebench  BT
CPU-bound     YES   NO      NO          YES
IO-bound      NO    YES     NO          YES
memory-bound  NO    NO      YES         YES

[1] NAS Parallel Benchmarks. http://www.nas.nasa.gov/publications/npb.html#url. Accessed: 2017-04-01.
[2] OpenBenchmarking.org. https://openbenchmarking.org/test/pts/cachebench. Accessed: 2017-04-01.
[3] IOzone filesystem benchmark. http://www.iozone.org. Accessed: 2017-04-01.
16
Experimental Results
Baseline Definition via RAPL
[Figure: normalized performance of EP, cachebench, IOzone, and BT with no RAPL cap and with RAPL caps of 40 W, 30 W, and 20 W]
17
Experimental Results
Baseline Definition via RAPL
[Same figure as above] CPU-intensive benchmarks suffer from the reduction of the processor frequency.
18
Experimental Results
Baseline Definition via RAPL
[Same figure as above] The other benchmarks suffer from the reduction of the processor voltage.
19
Experimental Results
XeMPUPiL results compared to the baseline
[Figure: normalized performance of EP, cachebench, IOzone, and BT under PUPiL vs. pure RAPL at 40 W, 30 W, and 20 W caps]
20
Experimental Results
XeMPUPiL results compared to the baseline
[Same figure as above] XeMPUPiL outperforms pure RAPL for IO-, memory-, and mix-bound benchmarks.
21
Experimental Results
XeMPUPiL results compared to the baseline
[Same figure as above] XeMPUPiL loses performance on purely CPU-bound benchmarks, due to developer-transparent optimizations inside Xen.
22
Conclusions
u Performance tuning through an ODA controller under a power cap improves performance
u Hybrid approaches like XeMPUPiL provide:
u Better efficiency than HW approaches
u Better timeliness than SW approaches

Paper: “Towards a performance-aware power capping orchestrator for the Xen hypervisor” @ EWiLi’16, October 6th, 2016, Pittsburgh, USA
23
Future Work
u (Integrating || Moving) the orchestrator logic into the scheduler
u Exploit the new RAPL version on the Haswell family
u Explore new policies regarding:
u Decision
u Resource assignment
24
Thank you!!!
XeMPUPiL
“Towards a performance-aware
power capping orchestrator for the
Xen hypervisor” @ EWiLi’16,
October 6th, 2016, Pittsburgh, USA
25
ODA Details
OBSERVE | DECIDE | ACT

Observe:
u Instructions retired (IR) per-domain metric
u Data gathered from xempowermon
u Use of hardware performance counters (HPCs) and the Xen scheduler in order to map the IR to the respective domain

Decide:
u Exploration of the space of all possible resource configurations, based on a binary search tree
u Policy to distribute the virtual resources over the physical ones

Act:
u Enforce the power cap via RAPL
u Define a CPU pool for the workload
u Launch the workload on the pool
u Change the number of resources in the pool according to the decision phase
u Pin the workload’s vCPUs over pCPUs according to the decided map
The decision phase is similar to the one implemented in PUPiL. The major changes are in how we evaluate the metrics gathered in the previous phase and in how we assign the physical resources to each virtual domain.
The evaluation criterion is based on the average IR rate over a given time window: this allows the workload to adapt to the current configuration before a new decision is taken.
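As a rough illustration of the exploration idea mentioned above (not the actual PUPiL/XeMPUPiL logic), a decide step could narrow a search interval over the number of assigned cores according to the observed IR rate:

```c
/* Sketch of a binary-search-style decide step over the number of cores,
 * guided by the instructions-retired (IR) rate; purely illustrative. */
#include <stdio.h>

struct search_state {
    int lo, hi;        /* current search interval over core counts */
    double best_rate;  /* best IR rate observed so far             */
    int best_cores;    /* configuration that produced it           */
};

/* One decide step: move the interval toward the side that performed better. */
static int decide_step(struct search_state *s, int tried_cores, double ir_rate) {
    if (ir_rate > s->best_rate) {
        s->best_rate  = ir_rate;
        s->best_cores = tried_cores;
        s->lo = tried_cores;          /* this many cores helped: search upward */
    } else {
        s->hi = tried_cores;          /* no improvement: search downward       */
    }
    return (s->lo + s->hi) / 2;       /* next configuration to try             */
}

int main(void) {
    struct search_state s = { .lo = 1, .hi = 4, .best_rate = 0.0, .best_cores = 1 };
    /* Example: pretend the observed IR rates for 2 and then 3 cores were these. */
    int next = decide_step(&s, 2, 1.8e9);
    next     = decide_step(&s, next, 2.1e9);
    printf("next configuration: %d cores (best so far: %d)\n", next, s.best_cores);
    return 0;
}
```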
Concerning the allocation of resources to each domain, we chose to work at core-level granularity: on the one hand, each domain owns a set of virtual CPUs (vCPUs); on the other hand, the machine provides a set of physical CPUs (pCPUs). Each vCPU is mapped onto a pCPU for a certain amount of time, and multiple vCPUs may be mapped onto the same pCPU.
We wanted our allocation policy to be as fair as possible, covering the whole set of pCPUs when possible; given a workload with M virtual resources and an assignment of N physical resources, to each pCPU_i we assign:

vCPUs(i) = \left\lceil \frac{M - \sum_{0 \le j < i} vCPUs(j)}{N - i} \right\rceil    (1)

where i is a number between 0 and N - 1, i.e., it spans the set of pCPUs.
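A small sketch of this allocation can serve as a sanity check of Eq. (1); the function name is illustrative, and the worked example (M = 6 vCPUs over N = 4 pCPUs, yielding 2, 2, 1, 1) is chosen arbitrarily.

```c
/* Sketch of the fair vCPU-to-pCPU allocation of Eq. (1): each remaining pCPU
 * receives the ceiling of (remaining vCPUs / remaining pCPUs). */
#include <stdio.h>

static void allocate_vcpus(int m_vcpus, int n_pcpus, int out[]) {
    int assigned = 0;
    for (int i = 0; i < n_pcpus; i++) {
        int remaining_vcpus = m_vcpus - assigned;   /* M - sum_{j<i} vCPUs(j) */
        int remaining_pcpus = n_pcpus - i;          /* N - i                  */
        out[i] = (remaining_vcpus + remaining_pcpus - 1) / remaining_pcpus; /* ceiling */
        assigned += out[i];
    }
}

int main(void) {
    int alloc[4];
    allocate_vcpus(6, 4, alloc);   /* M = 6 vCPUs, N = 4 pCPUs */
    for (int i = 0; i < 4; i++)
        printf("pCPU%d -> %d vCPUs\n", i, alloc[i]);   /* prints 2, 2, 1, 1 */
    return 0;
}
```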
C. Act
The act phase essentially consists of: 1) setting the chosen power cap and 2) actuating the selected resource configuration.
(Source code available at: https://bitbucket.org/necst/xempower)
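The actuation of the resource configuration (step 2) goes through the XL toolstack; the snippet below is a rough sketch of driving the standard xl cpupool and vcpu-pin commands from C, with the pool and domain names chosen arbitrarily for illustration, and is not XeMPUPiL's actual actuation code.

```c
/* Rough sketch of the resource-actuation step driven through the XL toolstack;
 * pool and domain names are arbitrary examples. */
#include <stdio.h>
#include <stdlib.h>

static void run(const char *cmd) {
    printf("+ %s\n", cmd);
    if (system(cmd) != 0)
        fprintf(stderr, "command failed: %s\n", cmd);
}

int main(void) {
    /* Give the workload pool one more pCPU, as chosen by the decide phase. */
    run("xl cpupool-cpu-add workload-pool 2");

    /* Re-pin the domain's vCPUs according to the decided map. */
    run("xl vcpu-pin workload-domU 0 1");
    run("xl vcpu-pin workload-domU 1 2");
    return 0;
}
```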
For the first step, RAPL exposes a set of MSRs that can be written to set a limit on the power consumption of the CPU socket. In a virtualized environment, these registers are not directly accessible by the virtual domains, not even by the tenant Dom0. However, this limitation can be overcome by invoking custom hypercalls that access the underlying hardware. To the best of our knowledge, the Xen hypervisor does not natively offer hypercalls to interact with the RAPL interface, so we implemented our own custom hypercalls. In order to be generic enough, we defined two of them, "xempower_rdmsr" and "xempower_wrmsr": the first one allows reading, while the second one allows writing, a specified MSR from Dom0.
Each hypercall needs to be registered within the hypervisor, which runs bare-metal: the kernel keeps track of the list of available hypercalls and the input parameters they accept. A handler function has to be declared and registered, to be invoked by the kernel at runtime: our implementation relies on the Xen built-in functions to safely access MSRs, i.e., wrmsr_safe and rdmsr_safe, which return an error if something goes wrong in accessing the register, avoiding critical problems at the hypervisor level.
We then implemented two Command Line Interface (CLI) tools to access these hypercalls from Dom0: xempower_RaplSetPower, to set the power cap, and xempower_RaplPowerMonitor, to monitor the power consumption of the socket. The value of the power cap and the power readings are passed through the whole stack.
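On the hypervisor side, a handler for the write hypercall might look roughly like the sketch below; the argument structure and handler name are invented for illustration, wrmsr_safe stands in for the Xen built-in mentioned above, and the real implementation is the one in the repository linked earlier.

```c
/* Illustrative sketch of a hypervisor-side handler for a custom
 * "write MSR" hypercall; names are invented for the sketch. */
#include <stdint.h>

/* Hypothetical argument layout passed from Dom0. */
struct xempower_wrmsr_args {
    uint32_t msr;   /* MSR index, e.g. the RAPL package power-limit register */
    uint64_t value; /* value to be written                                   */
};

/* Stand-in for Xen's wrmsr_safe(), which returns non-zero on fault. */
static int wrmsr_safe(uint32_t msr, uint64_t value) {
    (void)msr; (void)value;
    return 0;
}

/* Hypothetical handler registered with the hypercall table. A real handler
 * would also check that the caller is Dom0 and copy the arguments from
 * guest memory before using them. */
static long do_xempower_wrmsr(struct xempower_wrmsr_args *args) {
    if (wrmsr_safe(args->msr, args->value))
        return -1;   /* the MSR write faulted: report the error, do not crash */
    return 0;
}

int main(void) {
    /* Demo invocation; in reality the hypervisor dispatcher calls the handler. */
    struct xempower_wrmsr_args a = { .msr = 0x610, .value = 0x80F0 };
    return (int)do_xempower_wrmsr(&a);
}
```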
26
RAPL Details
[Figure components: MSR, Intel RAPL interface, hypercall manager, buffer, XeMPower CLI tool]
u Two CLI tools based on the native xc tools: XEMPOWER_RAPLSETPOWER and XEMPOWER_RAPLPOWERMONITOR
u Tools divided into two parts:
u FRONTEND: manages the user command, gathers information and privileges about the session, and passes the user parameters to the backend
u BACKEND: bakes the hypercall, declaring a hypercall structure and filling it with the user parameters, then invokes the just-defined hypercall
u A buffer is used to map user-space memory to kernel memory, in order to implement a “pass by reference”-like mechanism inside the hypercalls
u Declaration of two custom hypercalls: XEMPOWER_RDMSR and XEMPOWER_WRMSR
u Implementation of the routines that manage the two custom hypercalls
u The routines write to and read from the RAPL-specific MSR registers, in order to set the power cap and to retrieve metrics on the socket power consumption
u Three registers are accessed:
u RAPL_PWR_INFO
u RAPL_PK_POWER_LIMIT
u RAPL_PK_POWER_INFO
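As an illustration of what the write path ultimately produces, the sketch below encodes a power cap in watts into a package power-limit value, assuming the standard Intel RAPL layout (power units in the low four bits of the power-unit MSR, a 15-bit limit field plus an enable bit in the power-limit MSR). The register indexes and helper names are assumptions based on Intel's public documentation, not taken from the XeMPUPiL sources.

```c
/* Sketch of encoding a power cap (in watts) for the RAPL package power-limit
 * MSR, assuming the standard Intel layout: the power-unit MSR gives units of
 * 1/2^u W, the limit MSR holds a 15-bit limit plus an enable bit.
 * Constants and helper names are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define MSR_RAPL_POWER_UNIT  0x606   /* assumed register indexes */
#define MSR_PKG_POWER_LIMIT  0x610

static uint64_t encode_pkg_power_limit(double watts, uint64_t power_unit_msr) {
    unsigned unit_bits = (unsigned)(power_unit_msr & 0xF);    /* bits 3:0: power units */
    double   unit_w    = 1.0 / (double)(1u << unit_bits);     /* e.g. 1/8 W per LSB    */
    uint64_t limit     = (uint64_t)(watts / unit_w) & 0x7FFF; /* 15-bit limit field    */
    return limit | (1ull << 15);   /* bit 15: enable the limit */
}

int main(void) {
    uint64_t unit_msr = 0x3;   /* example read of the power-unit MSR: 1/8 W units */
    uint64_t value    = encode_pkg_power_limit(30.0, unit_msr);
    /* In XeMPUPiL this value would be handed to xempower_RaplSetPower, which
     * forwards it to the write hypercall targeting the power-limit register. */
    printf("write 0x%llx to MSR 0x%x\n", (unsigned long long)value, MSR_PKG_POWER_LIMIT);
    return 0;
}
```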
