SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
Zhenyun Zhuang, Haricharan Ramachandra, Badri Sridharan
2029 Stierlin Ct, Mountain View, CA 94043, USA
{zzhuang, hramachandra, bsridharan}@linkedin.com
Abstract—Modern cloud computing platforms (e.g., Linux on Intel CPUs) feature an ACPI-based (Advanced Configuration and Power Interface) mechanism that dynamically scales CPU frequencies and voltages based on workload intensity. With this feature, the CPU frequency is reduced when the workload is relatively light in order to save energy, and increased when the workload intensity is relatively high.
In business cloud computing environments, software products/services often need to “scale out” to multiple machines to form a cluster to achieve a pre-defined aggregated performance goal (e.g., SLA-devised throughput). To reduce business operation cost, minimizing the provisioned cluster size is critical.
However, as we show in this work, the behavior of ACPI in today's modern OS may result in more machines being provisioned, and hence higher business operation cost.
To deal with this problem, we propose an SLA-aware (Service Level Agreement) CPU scaling algorithm. The proposed design rationale and algorithm are a fundamental rethinking of how ACPI mechanisms should be implemented in business cloud computing environments. Contrary to the current forms of ACPI, which adapt CPU power levels based only on workload intensity, the proposed SLA-aware algorithm is primarily driven by the current application performance relative to the pre-defined SLA. Specifically, the algorithm targets achieving the pre-defined SLA as the top-level goal, while saving energy as the second-level goal.
Keywords-ACPI; Power saving; Service level agreements;
Performance
I. INTRODUCTION
Advanced Configuration and Power Interface (ACPI) [1] provides standards for power management by the OS. ACPI allows dynamic scaling of CPU power levels and frequencies. Modern CPUs typically support computation at multiple frequencies and voltages: operating at a higher frequency, a CPU is more powerful at processing computing tasks, but consumes more power per unit of time. With the support of the OS, the primary goal of ACPI is to save energy through the scaling of CPU frequencies and voltages.
ACPI is particularly important in cloud computing scenarios
where computing demand is elastic and energy saving is
critical. Specifically, a mechanism called CPUfreq [2] is implemented in the Linux kernel, which enables the operating system to scale the CPU frequency up or down in order to save power.
To help manage the power levels, certain pre-configured power schemes are implemented in the OS. These power schemes are referred to as governors [2]. Common governors are performance, ondemand, userspace, etc. Among them, the ondemand governor is enabled by default in Linux. The ondemand governor adjusts the CPU frequencies based on how heavy the workload is: the more intensive the detected workload, the higher the frequency it scales up to; conversely, if the workload is detected to be light, the CPU frequency scales down. The detection mechanism that measures the current workload is based on sampling at particular intervals (e.g., every 10ms). If, during the last interval, the CPU usage is above a scaling threshold (e.g., the CPU is 95% busy), the frequency is scaled up to the maximum frequency; otherwise, the frequency is scaled down, one level at a time. For instance, on a single-socket Sandy Bridge machine with 6 cores and 12 CPUs, where the CPU is an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, there are 10 levels in total, ranging between 1200MHz and 2001MHz. Under the heaviest workload the CPU frequency will be 2001MHz, while the lightest workload will result in 1200MHz.
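On Linux, the current governor and frequency can be inspected through the cpufreq sysfs interface. The short Python sketch below assumes the standard per-CPU layout under /sys/devices/system/cpu/; note that scaling_available_frequencies is exposed by some drivers (e.g., acpi-cpufreq) but not all:

# Inspect the cpufreq state of cpu0; paths follow the standard Linux
# cpufreq sysfs layout (availability varies by driver and kernel).
CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq"

def read(name):
    with open(f"{CPUFREQ}/{name}") as f:
        return f.read().strip()

print("governor:    ", read("scaling_governor"))       # e.g., ondemand
print("current kHz: ", read("scaling_cur_freq"))       # e.g., 1200000
print("levels (kHz):", read("scaling_available_frequencies"))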
In this work, we found that the current ACPI-based CPU scaling mechanism, particularly the default ondemand governor, is fundamentally inappropriate for business cloud production environments. For example, in the production computing environments of Internet companies, particularly cloud computing platforms, the primary goal is to meet the SLA (Service Level Agreement) with the minimum business operation cost, rather than blindly saving energy. The current design of the governors does NOT take the SLA into consideration. Specifically, when the current performance of an application is violating the SLA (e.g., in the form of response latencies), the CPU frequencies should be scaled up, irrespective of how busy the CPU actually is. In other words, lacking any notion of the SLA, current dynamic governors may unnecessarily violate SLAs.
Not only may current governors violate SLAs, the resulting operation cost may also be unnecessarily increased. To give an example, consider the following scenario. Assume a particular application (e.g., a web service) needs to be deployed on a cluster of machines to achieve a required aggregated throughput of 200K event/s. To keep it simple, we assume a machine can host only one application instance due to certain limitations. With the default ondemand governor, each application instance delivers 10K event/s of throughput, hence we need to deploy 20 machines in order to meet our aggregated SLA of 200K event/s. However, by manually scaling up the CPU frequency, each machine could achieve higher throughput. Assuming each application instance can deliver 20K event/s with the scaled-up CPU frequency, only 10 machines are needed - a 2X saving on the number of machines! We have verified this hypothesis in our lab, as discussed in a later section. CPU energy is only part of the total energy consumption (e.g., RAM, motherboard, disks), hence minimizing the cluster size can often lead to much larger energy savings. In other words, even though running with the default governor saves energy on each individual machine, thanks to the resulting lower CPU frequencies, once the number of machines deployed is considered, the total business cost of the cluster may well exceed the cost of a smaller cluster with manually scaled-up CPUs. Furthermore, current governors may unnecessarily increase CPU power consumption. Since SLAs are businesses' primary concern, when SLAs are met, the CPU frequencies should be scaled down, irrespective of how busy the CPUs are. Subject to meeting the SLA, scaling down CPU frequencies results in less CPU energy consumption. So we argue that as long as the performance does not violate SLAs, such power-saving actions (i.e., scaling down CPU frequencies) should be taken. Unfortunately, current governors are blind to SLAs, and they scale frequencies based only on how busy the CPU is, which is a departure from typical business requirements.
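To make the cluster-sizing arithmetic of the example above concrete, the following Python sketch computes the number of machines needed under each setting, using the hypothetical per-instance throughputs from the text (the function name is ours):

from math import ceil

def machines_needed(sla_events_per_s, per_instance_events_per_s,
                    instances_per_machine=1):
    # Round up: a fractional machine must be provisioned as a whole one.
    return ceil(sla_events_per_s /
                (per_instance_events_per_s * instances_per_machine))

print(machines_needed(200_000, 10_000))  # default ondemand governor: 20
print(machines_needed(200_000, 20_000))  # manually scaled-up CPU:    10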
In this work, motivated by the weaknesses of current governors on modern OS 1, we address this problem by proposing an entirely new SLA-aware paradigm to dynamically scale CPU frequencies. We also propose an example mechanism, which is a fundamental change to how ACPI-based CPU scaling works on modern OS today. Unlike traditional ACPI mechanisms, which do not consider the business SLA requirements of the applications running on a machine, the proposed mechanism targets achieving the SLA as the top-level goal, while treating saving energy as the second-level goal. In other words, the SLA-aware algorithm will achieve the following two goals, in order of importance: (1) meeting the SLA is the primary target; when SLAs are in jeopardy, the frequency will be scaled up to meet the SLA; (2) subject to meeting SLAs, the CPU frequencies will be scaled down whenever possible.
To achieve the above goals, the proposed algorithm continuously monitors the current application performance and scales CPU frequencies accordingly. It can also consider workload properties (e.g., traffic volume changes) to further ensure the above goals by finely tuning the scaling mechanism. The algorithm can be realized in three forms: (1) designing a new governor; (2) dynamically choosing among different governors; (3) dynamically tuning a particular governor (e.g., ondemand). We will detail the solution in later sections.
1Note that we do not argue that the design of ACPI is wrong; instead, we attempt to enhance the application of ACPI by making ACPI SLA-aware.
The remainder of this paper is organized as follows. After providing the necessary technical background in Section II, we define and motivate the problems being addressed in Section III. We present the design in Section IV, and in Section V the deployment model and how to use our algorithm. We build a prototype and use it to perform a performance evaluation in Section VI. We present related work in Section VII. Finally, in Section VIII we conclude the work.
II. BACKGROUND AND SCOPE
A. Background
CPU power consumption and ACPI. The CPU is one of the major components that consume power on computing platforms. Modern CPUs can operate at different power levels, which in turn depend on the voltage and the frequency the CPU operates at. The power consumption of a CPU is largely determined by the frequency at which it operates. CPU frequencies also roughly determine how powerful the CPU is: the higher the frequency, the better the CPU's computing performance.
The Advanced Configuration and Power Interface (ACPI) [1] specification provides an open standard for device configuration and power management by the operating system. ACPI allows dynamic scaling of CPU power levels and frequencies. For the particular Linux/Intel platform we used, the CPU has a total of 12 power levels, with a minimum of 1.2GHz and a maximum of 2.0GHz. Naturally, the higher the frequency, the more power is consumed.
Governors. To help manage the power levels, certain pre-configured power schemes are implemented in the OS kernel. These power schemes are referred to as governors. For instance, several governors are available with the CPUfreq subsystem: (1) the performance governor, which sets CPU frequencies to the highest possible for maximum performance; (2) the powersave governor, which sets CPU frequencies to the lowest possible - this can have a severe impact on performance, as the system will never rise above this frequency no matter how busy the processors are; (3) the ondemand governor, which dynamically adjusts the CPU frequency. The ondemand governor monitors the CPU utilization; as soon as it exceeds a certain threshold, the governor maximizes the CPU frequency, and if the utilization is below the threshold, the next lowest frequency is used.
Figure 1. Deploying multiple instances on the same machine (ACPI ON): (a) per-instance throughput; (b) average CPU frequency; (c) average CPU usage.

B. Scope
This work assumes the availability of different CPU power levels (e.g., frequencies) and the ability to adjust the CPU level. The proposed algorithm adjusts the CPU levels based on the comparison between the pre-defined performance SLA and the current application performance, hence it requires the SLA definition and the measurement of the current performance. To simplify the presentation, we assume the SLA is defined as a single absolute value such as “throughput higher than 200KBps” or “response time lower than 100ms”.
To allow for timely adaptation of CPU levels, coarsely defined SLAs need to be converted to specific performance requirements whenever necessary. For example, an SLA may be coarsely defined as “99% of the time in a day, throughput should be higher than 200KBps”; for this SLA, the converted SLA can simply be “throughput higher than 200KBps”. As another example, for the SLA “the 99.9th percentile of response time should be smaller than 100ms”, the converted SLA can be “response time lower than 100ms”. Similarly, the current application performance needs to be measured in a timely fashion, according to the power adaptation period, e.g., every 5 seconds.
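As an illustration of this conversion, a converted SLA reduces to a single metric, a threshold, and a direction; a minimal Python sketch (all names are ours):

from dataclasses import dataclass

@dataclass
class ConvertedSLA:
    metric: str             # "throughput" or "response_time"
    threshold: float        # the single operational value compared against
    higher_is_better: bool  # True for throughput, False for response time

# "99% of the time in a day ... higher than 200KBps"
throughput_sla = ConvertedSLA("throughput", 200.0, higher_is_better=True)
# "99.9 percentile of the response time ... smaller than 100ms"
latency_sla = ConvertedSLA("response_time", 100.0, higher_is_better=False)

def meets(sla, measured):
    # Compare one timely measurement against the converted SLA.
    return measured >= sla.threshold if sla.higher_is_better \
           else measured <= sla.threshold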
III. PROBLEM DEFINITION AND MOTIVATION
SCENARIOS
A. Problem
The problem we try to address in this work is to determine the computing capacity needed to achieve a certain performance goal (e.g., aggregated throughput). After a certain SLA is set (e.g., in the form of aggregated throughput of processed events), the de facto practice in most Internet companies such as Facebook and LinkedIn is to “scale out” the computing infrastructure by parallelizing multiple deployments of the same computing component. For this, capacity planning is conducted to determine the number of nodes (i.e., machines) needed, as well as the number of computing instances deployed on a single node.
Capacity planning in business cloud computing environments is closely tied to business cost. The goal is to allocate “just enough” nodes such that the pre-determined SLA is met. For instance, LinkedIn's products such as Databus [3] can be horizontally scaled by deploying multiple instances of the product. To reduce the number of machines used, we need to know how many instances we can co-locate on the same machine. Based on these results, we can then answer further questions including: (1) given a performance requirement (e.g., an SLA), how many machines are needed; (2) given a traffic volume, how many machines are needed; etc.
During our investigations and experiments with capacity planning for Databus, we made interesting findings regarding an issue with ACPI that results in a more-than-necessary number of machines being provisioned, and hence higher business operation cost. Based on these findings, we propose an algorithm to address the issue.
B. Production experiments
In the experiments, we would like to determine the minimum number of nodes needed for a pre-determined SLA. First, we need to know the maximum aggregated throughput that can be achieved by a single node. To maximize the utilization of computing resources, multiple homogeneous instances are typically deployed on the same node.
For ease of repeatability and configurability, we used a custom-built application which mimics our Databus product. The application mimics the major internal mechanisms of Databus, while removing the dependence on production data. It consists of a pair of Java components which communicate via a TCP/IP connection. Briefly, the sending component keeps sending events (i.e., a certain number of bytes of data) to the receiving component. Upon receiving an event, the receiving component processes the event and replies back to the sender.
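The actual test application is a pair of Java components; the Python sketch below only illustrates the send/process/reply pattern described above (the event size, port handling, and one-event-per-recv framing are simplifying assumptions):

import socket, threading

EVENT = b"x" * 1024  # one "event" of a fixed number of bytes (size is ours)

def sending_component(port, events=100_000):
    s = socket.create_connection(("127.0.0.1", port))
    for _ in range(events):
        s.sendall(EVENT)   # send one event...
        s.recv(3)          # ...and wait for the receiver's reply
    s.close()

srv = socket.create_server(("127.0.0.1", 0))       # receiving component
port = srv.getsockname()[1]
threading.Thread(target=sending_component, args=(port,), daemon=True).start()
conn, _ = srv.accept()
while conn.recv(len(EVENT)):   # "process" the event, then reply
    conn.sendall(b"ack")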
Both components are deployed on the same machine: a single-socket Sandy Bridge machine with 6 cores, 12 CPUs (i.e., hardware threads), and 64GB of RAM. The CPU is an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz. The OS is Red Hat Enterprise Linux with kernel version 2.6.32-358.6.2.el6.x86_64.
The experiment began by deploying a single instance, then adding more and more homogeneous instances. We would expect the per-instance throughput to keep dropping as we add more co-located instances, due to competition for computing resources. We also expect the aggregated throughput of all deployed instances to rise with more instances, up to a particular threshold; after the threshold, the aggregated throughput will also drop. Such expectations are normal for the usual scenarios where we run multiple applications on the same machine.
C. Using the default OS config (ACPI enabled)
We first conducted the experiments with the default OS configuration. Though the aggregated throughput faithfully complies with our expectation, the per-instance throughput defies it. We know that deploying more instances will result in more resource contention (e.g., memory, CPU), and hence lower per-instance throughput. However, what we observed is just the opposite: instead of decreasing, the per-instance throughput actually keeps increasing with more instances (up to a peak performance), which is quite counter-intuitive. Specifically, Figure 1(a) displays the per-instance throughput versus the number of instances. Though from 4 instances to 10 instances the throughput drops as expected, from 1 instance to 4 instances the per-instance throughput increases quite significantly (+26%).
We found that the above observations are caused by ACPI. Briefly, under ACPI the CPU dynamically scales its frequency based on the actual load for the purpose of energy saving. When the load is light, the CPU runs at a lower speed, and hence the application delivers less throughput. When the load is high, as in the case of deploying multiple instances concurrently, the CPU runs at a higher speed, and hence delivers higher per-instance application throughput.
In Figures 1(b) and (c), we display the average CPU frequency and the CPU usage under different numbers of instances, respectively. We can see that with more instances deployed, the CPU load is higher, which causes the frequencies to go up. With more powerful CPUs, it is no wonder the per-instance throughput increases for the first 3 scenarios (i.e., up to 4 instances)!
In these scenarios, it is still true that more instances cause more resource (including CPU) contention, but the benefits gained from the increased CPU frequencies apparently outweigh the performance losses associated with the resource contention. However, as we keep increasing the number of instances beyond 4, the performance losses outweigh the gains, hence the per-instance throughput begins to drop.

Figure 2. Deploying multiple instances on the same machine (ACPI OFF): (a) per-instance throughput; (b) average CPU usage.
D. ACPI disabled
To prevent the impact of ACPI, we disable the dynamic scaling of ACPI by maximizing the CPU frequency, so that the CPUs run at full speed. We show the results in Figure 2. We can see that the performance of each instance is much higher than with ACPI on when the number of instances is lower than 5. In particular, when 2 instances are deployed, each instance achieves about 90KBps of throughput, significantly higher than the 63KBps achieved when ACPI is on. The CPU frequency is kept at 2GHz due to the disabled ACPI, and the average CPU usage keeps increasing.
E. Summary
In business cloud computing environments, capacity planning for certain applications is often needed. We found that many questions that need answering in such a context can be affected by ACPI. Careless treatment of these questions can lead to incorrect answers, and hence to wrong conclusions regarding the application's capacity and overspending on operation cost.
To understand this, let us use an example. Assume our SLA requires handling 1000 KBps of total throughput. Further assume we can deploy at most 2 instances on a single machine due to other constraints, including memory footprint. Based on our previous results with ACPI on, we might conclude that we can only achieve up to 126 KBps (63 KBps per instance * 2 instances) per machine, so we need at least 8 machines. However, by disabling ACPI and maximizing the CPU, each machine can deliver 180 KBps (90 KBps per instance * 2 instances), hence we only need 6 machines - a significant operation cost saving.
On the other hand, ACPI is blind to our performance requirements and SLA. The deployed applications might deliver unnecessarily better performance than needed. Though better performance is desirable in most scenarios, it almost always comes with higher energy consumption because of the higher CPU power levels. Stepping back, we might ask: why should we spend more energy (and business operation cost) delivering more-than-necessary performance?
IV. DESIGN
We have seen in Section III that the dynamic scaling of CPU powerfulness (i.e., ACPI) on modern machines can result in undesirable outcomes due to its unawareness of business requirements (i.e., the performance SLA). To help address such issues, in this work we propose a design and algorithm for SLA-aware dynamic CPU scaling. The algorithm primarily aims at meeting the business-determined performance SLA, while secondarily saving CPU energy subject to the SLA.
Since we will later provide several different realizations of the algorithm, and the realizations may not be limited to adjusting CPU frequencies only, we use generic terms to refer to the different types of adjustment. Specifically, we use the term scaling up to denote the action of adjusting to more powerful CPU levels, and scaling down to denote adjusting to less powerful CPU levels. We also use the term maximizing CPU to denote the adjustment that maximizes CPU powerfulness.
A. Design goal
The goal of the algorithm is to provide just enough CPU powerfulness to the workload such that the SLA is not violated. In other words, subject to the SLA, as little CPU power as possible will be consumed. Specifically, the algorithm aims to achieve the following goals:
• Top-level goal: ensuring the SLA. When the expected performance is about to violate the SLA, scale up CPU levels.
• Second-level goal: reducing energy. When the expected performance far exceeds the SLA, scale down CPU levels to reduce energy consumption.
Another relevant design goal is that the algorithm should avoid thrashing; in other words, it should stay at the current CPU powerfulness level as long as possible to avoid frequent adjustments.
B. High level design
The heart of the algorithm is the CPU scaling engine, which determines whether to scale CPU levels up or down. The decision is based on a set of factors including the current application performance and the SLA specification. The current performance is measured by a separate component, which continuously reports how the application is performing. If the current performance is worse than the SLA, the engine may decide to scale up CPU levels; otherwise, it may decide to scale down CPU levels. Given the nature of this problem, the design fits well into a control-system [4] paradigm. Though many possible algorithms can be proposed, each with varying design tradeoffs among accuracy, speed, and complexity, in this paper we present a straightforward algorithm with the goal of demonstrating the workings of the solution.
One important factor is the workload itself. For instance, the workload may follow some time-series-based shape (e.g., between 8AM and 2PM the traffic volume is increasing). Such workload trend information can be used by the engine to make even smarter decisions regarding scaling CPU levels up or down.
C. Performance monitoring
The current performance needs to be monitored in a timely fashion. The monitoring can be done continuously or based on sampling. The current performance is expressed in a way consistent with the SLA. For instance, if the SLA is in the form of response time, then the performance monitoring reports the current performance in the form of response time; similarly, it can be in the form of throughput.
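A sampling-based monitor can simply accumulate the work done and report a rate once per adaptation period, in the same units as the SLA. A minimal sketch for a throughput SLA (class and method names are ours):

import time

class ThroughputMonitor:
    """Reports throughput (KB/s) over the last adaptation period."""
    def __init__(self):
        self.bytes_done = 0
        self.period_start = time.monotonic()

    def record(self, nbytes):
        # Called by the application whenever it completes some work.
        self.bytes_done += nbytes

    def report_kbps(self):
        # Called once per adaptation period (e.g., every second).
        now = time.monotonic()
        kbps = self.bytes_done / 1024.0 / (now - self.period_start)
        self.bytes_done, self.period_start = 0, now
        return kbps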
D. Engine
To determine how well the current performance compares to the SLA, the algorithm relies on two additional threshold values, A and B.2 These two thresholds guard the headroom of the current performance in relation to the SLA, with B farther from the SLA than A. The thresholds can be in absolute form (e.g., 20KB/s and 40KB/s) or in relative form (e.g., 110% and 120%). The values of A and B are also dynamically adapted based on historical performance. Specifically, they can start from fixed values such as 110% and 120%. If during the last time period the SLA was not completely met, the values of A and B are increased by a factor of 1.5; otherwise, they can be reduced by a factor of 1.1.
Performance metrics can take different forms; for ease of presentation, we assume a throughput-based performance metric. For other performance metrics such as response time, simple adaptations can be made. For throughput-based performance, larger is better, so B is larger than A. If the current performance is larger than B, the CPU scales down. If it is between the two thresholds, the CPU level is kept as is. If it is smaller than A, the CPU scales up. If the performance does not meet the SLA, the CPU is maximized to allow the CPU to do its best job. The decision flow is shown in Figure 3.
2Though we present the algorithm using only two guard thresholds A and B, the algorithm can easily adapt to using multiple guard thresholds.
Figure 3. Flow chart of the algorithm.
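The decision step of Figure 3, together with the threshold adaptation described above, can be sketched as follows. The performance scale p is the measured performance divided by the SLA (so p < 1.0 means the SLA is violated); the 1.5 and 1.1 adaptation factors follow the text, while the function names and the floor at 1.0 when shrinking the thresholds are our assumptions:

def decide(p, A, B):
    """Map the performance scale p to one of the four actions (Figure 3)."""
    if p < 1.0:
        return "maximize"    # SLA violated: let the CPU do its best
    if p < A:
        return "scale_up"    # meeting the SLA, but with little headroom
    if p <= B:
        return "stay"        # comfortable headroom: avoid thrashing
    return "scale_down"      # far above the SLA: save energy

def adapt_thresholds(A, B, sla_fully_met):
    """Widen the guard band after an SLA miss, narrow it otherwise."""
    if not sla_fully_met:
        return A * 1.5, B * 1.5
    return max(1.0, A / 1.1), max(1.0, B / 1.1)  # floor at 1.0 is ours

# Example with the relative thresholds used later: A = 1.1, B = 1.2.
print(decide(0.9, 1.1, 1.2))   # maximize
print(decide(1.15, 1.1, 1.2))  # stay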
E. Scaling up/down CPU
The term scaling up CPU means allowing the CPU to work more aggressively. It can be implemented in several different fashions: though it typically means a higher CPU frequency, it can also be implemented by switching to a more aggressive governor (e.g., the performance governor), or by tuning the parameters of a particular governor to make it more aggressive. Similarly, scaling down CPU can also be implemented in several ways. We discuss the different types of implementation in later sections.
F. Co-locating multiple heterogeneous applications
If multiple heterogeneous applications need to be co-located on the same machine, the workings of the algorithm need to be adjusted. Since different applications may have different SLAs and performance, it could be that one application is seeing better-than-SLA performance while another is not. Such scenarios can be addressed by slightly modifying the previously presented algorithm. Though the specific forms of the solution differ, we argue for a conservative solution, in which the CPU adjustment is based on the least-performing application. For instance, CPUs are scaled down only when all applications are outperforming their SLAs.
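Under this conservative policy, the engine can simply feed the decision function with the worst performance scale across the co-located applications, so the CPU scales down only when every application outperforms its SLA. A one-line sketch, reusing decide() from the earlier sketch (the example p values are illustrative):

def colocation_scale(perf_scales):
    # perf_scales: per-application p values, e.g., {"appA": 1.3, "appB": 0.95}
    # The least-performing application drives the shared CPU decision.
    return min(perf_scales.values())

action = decide(colocation_scale({"appA": 1.3, "appB": 0.95}), A=1.1, B=1.2)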
V. DEPLOYMENT AND USAGE
The proposed algorithm can be realized and deployed in several embodiments: (1) as a new governor that directly controls the adjustment of CPU frequencies; (2) as a new governor that aggregates existing governors and dynamically switches among them on the fly; (3) as an improved version of a current governor that dynamically tunes the governor's tunable parameters.
New governor directly controlling CPU frequencies. The most straightforward way is to add a new governor built from scratch. The new governor is exposed to the OS as one element of the governor set and has no dependency on existing governors. Inside the governor, the adaptation of CPU frequencies is based purely on the algorithm and the current performance compared to the SLA.
New governor aggregating existing governors. The algorithm can also be implemented as a new governor built on top of existing governors. The currently available governors have different levels of aggressiveness with regard to CPU power usage and performance: the performance governor is the most aggressive, the powersave governor is the least aggressive, and the other governors sit between these two extremes. Hence, these different governors can be aggregated by the new governor, which sets the system to a particular governor based on the algorithm. However, this deployment depends on the existence of the other utilized governors, hence cannot be deployed by itself.
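A sketch of this aggregating embodiment: map each action of the algorithm to one of the stock governors and switch with the cpupower utility (the specific action-to-governor mapping is our assumption, and cpupower must be installed):

import subprocess

ACTION_TO_GOVERNOR = {          # from least to most aggressive behavior
    "scale_down": "powersave",
    "stay":       "ondemand",
    "scale_up":   "performance",
    "maximize":   "performance",
}

def apply_by_governor(action):
    # Switch all CPUs to the chosen stock governor.
    gov = ACTION_TO_GOVERNOR[action]
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-g", gov],
                   check=True)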
Improved version of governors. The algorithm can also be implemented as an improved version of an existing governor that exposes tunable parameters, whose tuning knobs can be adjusted dynamically. For instance, the ondemand governor has the up-threshold knob, which controls when the CPU is scaled up, and the sampling-interval knob, which determines how frequently sampling is done. The algorithm can be embedded into the existing governor and serve as a new version of that particular governor.
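A sketch of this third embodiment: instead of replacing ondemand, nudge its knobs so it behaves more or less aggressively. On many kernels the tunables live under /sys/devices/system/cpu/cpufreq/ondemand/ (the exact path and valid ranges vary by kernel version, and the chosen threshold values are illustrative, not prescriptive):

ONDEMAND = "/sys/devices/system/cpu/cpufreq/ondemand"

def tune_ondemand(action):
    # A lower up_threshold makes ondemand scale frequencies up sooner.
    up_threshold = {"maximize": 20, "scale_up": 50,
                    "stay": 80, "scale_down": 95}[action]
    with open(f"{ONDEMAND}/up_threshold", "w") as f:
        f.write(str(up_threshold))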
VI. EVALUATION
A. Prototypes
As noted in Section V, the algorithm can be realized in several ways. We built a prototype and used it to verify the workings of our algorithm. The prototype is implemented as a new governor which directly controls the CPU frequencies; we refer to this prototype as new-governor.
The prototype is written in Python and utilizes the CPUfreq subsystem, which allows control of a set of parameters including the governor selection. It issues the command “cpupower -c all frequency-set -f freq”.
For this new-governor prototype, every 100MHz increment is treated as a new CPU power level. For instance, for a CPU with frequencies ranging from 1200MHz to 2000MHz, there are 10 levels in total. Scaling up the CPU means going to the next power level with a 100MHz-higher CPU frequency, scaling down means going 100MHz lower, and maximizing the CPU means going to 2001MHz.
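Putting the prototype's actuation together: each action moves the target frequency by one 100MHz level (or jumps to the maximum) and applies it with the cpupower command quoted above. A sketch under those assumptions (function names are ours):

import subprocess

MIN_MHZ, MAX_MHZ, STEP_MHZ = 1200, 2001, 100

def set_frequency(mhz):
    # "cpupower -c all frequency-set -f <freq>", as used by the prototype.
    subprocess.run(["cpupower", "-c", "all", "frequency-set",
                    "-f", f"{mhz}MHz"], check=True)

def apply(action, current_mhz):
    if action == "maximize":
        target = MAX_MHZ
    elif action == "scale_up":
        target = min(current_mhz + STEP_MHZ, MAX_MHZ)  # capped at 2001MHz
    elif action == "scale_down":
        target = max(current_mhz - STEP_MHZ, MIN_MHZ)
    else:  # "stay"
        target = current_mhz
    if target != current_mhz:
        set_frequency(target)
    return target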
The workload is a Java-based application which keeps allocating user-specified objects and removes the oldest objects once the number of objects reaches a threshold. It periodically outputs the actual object allocation throughput achieved during the last period. The throughput is compared to the pre-defined SLA level, and based on the comparison result the algorithm decides on one of 4 actions: scaling up, scaling down, maximizing CPU, or staying unchanged. These actions then map to specific steps in each prototype.
Figure 4. Baseline results (ondemand governor), time in seconds: (a) actual application throughput; (b) average CPU frequencies (MHz); (c) average CPU usage.
B. Experiment setup
To evaluate the capability of the prototype to adapt to different workloads, we vary the workload's traffic intensity. By intensity, we mean the CPU-intensiveness of the workload. Specifically, we split the entire experiment into 3 segments of equal duration. For the first segment, the workload is set to the regular intensity; for the second segment, the workload is set to 80% of the regular intensity; and for the last segment, the workload is set to 120% of the regular intensity.
We set the performance SLA to be 60KB/s of throughput. The performance monitoring component continuously obtains the current application performance. The current throughput is divided by the SLA to obtain a performance scale (denoted by p): p > 1.0 means the performance is better than the SLA, while p < 1.0 means the performance is below the SLA. We set A to 1.1 and B to 1.2. The algorithm is executed once per second.
C. Baseline results
We first show the baseline results using the default ondemand governor and its default parameters; specifically, the up-threshold is 95%. We show only a tiny snapshot of the experimented period for the purpose of a closer examination of the data points. The throughput is shown in Figure 4(a). We can see that the performance meets the SLA, with throughput higher than 60KB/s, only about 30% of the time. In other words, the SLA is met only in the segment where the workload is 80% of the regular intensity.
Figures 4(b) and (c) show the average CPU frequencies set by the ondemand governor and the corresponding CPU usage. The average CPU frequency is barely above the lowest frequency of 1200MHz, indicating that the CPUs are operating at the lowest powerfulness level, completely unsympathetic to the violated SLA. CPU usage is comparatively low (e.g., 34%), which showcases the irony that the SLA is not met while most of the CPU is idle.
Figure 5. New-governor results, time in seconds: (a) actual throughput of new-governor; (b) average CPU frequencies (MHz); (c) average CPU usage.
D. New-governor prototype results
For the implemented new-governor prototype, we show the actual throughput in Figure 5(a). We see that the throughput always exceeds the SLA, irrespective of the workload intensity.
Figure 5(b) shows the average CPU frequencies set by the new-governor. We see that when the second segment, with lower workload intensity, kicks in, the CPU frequency is automatically scaled down to save energy; when the third segment, with a heavier workload, comes, the CPU frequency is scaled up accordingly - all with the SLA being nicely met. Figure 5(c) displays the corresponding CPU usage. We see that in the second segment, because of the lowered CPU levels, the CPU usage increases.
VII. RELATED WORK
A. Power consumption studies
Many works have studied the power consumption of various computing platforms. In particular, [5] breaks down the power consumption of the different computing components of a laptop. Work [6] analyzes the energy-time tradeoff in high-performance computing environments. Work [7] studies the impact of dynamically adjusting voltages and frequencies on web servers and concludes that dynamic scaling significantly reduces power consumption for such servers. [8] addresses the challenge of power management in heterogeneous multi-tier web clusters and proposes algorithms to save more energy. Embedded real-time systems present unique challenges in the context of power saving, and work [9] explicitly takes system reliability into consideration. Smartphones increasingly use multi-core CPUs, where saving power is even more critical [10]. Moreover, energy saving problems in cloud environments and data centers are studied in [11], [12].
Our work does not disagree with these works. On the contrary, we strongly believe that ACPI (and dynamic scaling of CPU power levels) saves energy in many deployment scenarios. However, based on experience and observations from LinkedIn's production environments, we identify certain issues in business computing environments where SLAs are of higher priority than energy saving. We then present our SLA-aware algorithm specifically for such environments.
B. ACPI Implementations and Impact
The Advanced Configuration and Power Interface (ACPI) [1], [13] specification is an open standard, and several flavors have been implemented on different OSes. For instance, [14] implements ACPI on FreeBSD. Work [15] studies the energy-performance tradeoff of multi-threaded applications and proposes a set of models to estimate the performance slowdown of ACPI.
VIII. CONCLUSION
In this work, we demonstrate the weaknesses that conventionally implemented ACPI-based dynamic CPU scaling mechanisms on modern OS can expose in the face of business SLAs. We then propose an SLA-aware algorithm that prioritizes SLA requirements when adjusting CPU levels. The algorithm also saves energy when the SLA is met.
REFERENCES
[1] “Advanced Configuration and Power Interface (ACPI),” http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface.
[2] “CPU frequency scaling,” https://wiki.archlinux.org/index.php/CPU_frequency_scaling.
[3] S. Das, C. Botev, et al., “All aboard the Databus!: LinkedIn's scalable consistent change data capture platform,” ser. SoCC '12, New York, NY, USA, 2012.
[4] “Control system,” http://en.wikipedia.org/wiki/Control_system.
[5] A. Mahesri and V. Vardhan, “Power consumption breakdown on a modern laptop,” in Proceedings of the 4th International Conference on Power-Aware Computer Systems, ser. PACS'04, Berlin, Heidelberg, 2005.
[6] V. W. Freeh, D. K. Lowenthal, F. Pan, N. Kappiah, R. Springer, B. L. Rountree, and M. E. Femal, “Analyzing the energy-time trade-off in high-performance computing applications,” IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 6, pp. 835-848, Jun. 2007.
[7] P. Bohrer, E. N. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, C. McDowell, and R. Rajamony, “Power aware computing,” R. Graybill and R. Melhem, Eds. Norwell, MA, USA: Kluwer Academic Publishers, 2002, ch. The Case for Power Management in Web Servers.
[8] P. Wang, Y. Qi, and X. Liu, “Power-aware optimization for heterogeneous multi-tier clusters,” J. Parallel Distrib. Comput., vol. 74, no. 1, Jan. 2014.
[9] D. Zhu, “Reliability-aware dynamic energy management in dependable embedded real-time systems,” ACM Trans. Embed. Comput. Syst., vol. 10, no. 2, Jan. 2011.
[10] Y. Zhang, X. Wang, X. Liu, Y. Liu, L. Zhuang, and F. Zhao, “Towards better CPU power management on multicore smartphones,” in Proceedings of the Workshop on Power-Aware Computing and Systems, ser. HotPower '13. New York, NY, USA: ACM, 2013.
[11] J. Yu, Z. Hu, N. N. Xiong, H. Liu, and Z. Zhou, “An energy conservation replica placement strategy for Dynamo,” J. Supercomput., vol. 69, no. 3, Sep. 2014.
[12] Z. Guo, Z. Duan, Y. Xu, and H. J. Chao, “JET: Electricity cost-aware dynamic workload management in geographically distributed datacenters,” Comput. Commun., vol. 50, Sep. 2014.
[13] L. Duflot, O. Levillain, and B. Morin, “ACPI: Design principles and concerns,” in Proceedings of the 2nd International Conference on Trusted Computing, ser. Trust '09. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 14-28.
[14] T. Watanabe, “ACPI implementation on FreeBSD,” in Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, Berkeley, CA, USA, 2002, pp. 121-131.
[15] S. Park, W. Jiang, Y. Zhou, and S. Adve, “Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures,” SIGMETRICS Perform. Eval. Rev., vol. 35, no. 1, Jun. 2007.

More Related Content

DOC
Windows server power_efficiency___robben_and_worthington__final
PDF
Understanding software licensing with IBM Power Systems PowerVM virtualization
PDF
Reducing tco white paper rev5
PDF
An energy, memory, and performance analysis
PPTX
Presentation oracle on power power advantages and license optimization
PPT
IMSBufferpool Tuning concept AMS presentation v01
PDF
Open compute technology
 
PDF
IBM i Performance management and performance data collectors june 2012
Windows server power_efficiency___robben_and_worthington__final
Understanding software licensing with IBM Power Systems PowerVM virtualization
Reducing tco white paper rev5
An energy, memory, and performance analysis
Presentation oracle on power power advantages and license optimization
IMSBufferpool Tuning concept AMS presentation v01
Open compute technology
 
IBM i Performance management and performance data collectors june 2012

What's hot (15)

PDF
Intel speed-select-technology-base-frequency-enhancing-performance
PDF
Parallel Sysplex Performance Topics
PDF
AMD PowerTune Technology on Workstation Graphics
 
PPT
6dec2011 - Power Back-up_1
PDF
Capacity Planning for Virtualized Datacenters - Sun Network 2003
PDF
A Brief Survey of Current Power Limiting Strategies
PDF
Efficient Data Center Virtualization with QLogic 10GbE Solutions from HP
PDF
eNlight- Intelligent Cloud Computing Platform
PDF
Introduction to eNlight Cloud Computing Platform
PDF
Arsys at hp discovery emea 2011
PDF
Deview 2013 rise of the wimpy machines - john mao
PDF
over_provisioning_m600_for_data_center_apps_tech_brief
PPTX
Data center Technologies
 
PPTX
AMD Opteron 4000 Series Platform Press Presentation
 
PDF
Much Ado About CPU
Intel speed-select-technology-base-frequency-enhancing-performance
Parallel Sysplex Performance Topics
AMD PowerTune Technology on Workstation Graphics
 
6dec2011 - Power Back-up_1
Capacity Planning for Virtualized Datacenters - Sun Network 2003
A Brief Survey of Current Power Limiting Strategies
Efficient Data Center Virtualization with QLogic 10GbE Solutions from HP
eNlight- Intelligent Cloud Computing Platform
Introduction to eNlight Cloud Computing Platform
Arsys at hp discovery emea 2011
Deview 2013 rise of the wimpy machines - john mao
over_provisioning_m600_for_data_center_apps_tech_brief
Data center Technologies
 
AMD Opteron 4000 Series Platform Press Presentation
 
Much Ado About CPU
Ad

Viewers also liked (20)

PPTX
Accelerate Native Advertising Using Rich Media On Social
PDF
Building Cloud-ready Video Transcoding System for Content Delivery Networks (...
PDF
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
PDF
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
PDF
Mutual Exclusion in Wireless Sensor and Actor Networks
PDF
Libro 30-ideas
PDF
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
PDF
Leveraging Global Events to Reach Your Social Audience
PDF
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
PDF
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
PDF
Chon ram-may-tinh
PDF
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
PDF
Rich Social: Not Your Grandfather's Rich Media
PDF
A Distributed Approach to Solving Overlay Mismatching Problem
PDF
Optimizing Streaming Server Selection for CDN-delivered Live Streaming
PDF
Hazard avoidance in wireless sensor and actor networks
PDF
Real time social media marketing in action
PPTX
The State of Social Rich Media
PDF
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
PDF
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
Accelerate Native Advertising Using Rich Media On Social
Building Cloud-ready Video Transcoding System for Content Delivery Networks (...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
Mutual Exclusion in Wireless Sensor and Actor Networks
Libro 30-ideas
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
Leveraging Global Events to Reach Your Social Audience
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
Chon ram-may-tinh
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Rich Social: Not Your Grandfather's Rich Media
A Distributed Approach to Solving Overlay Mismatching Problem
Optimizing Streaming Server Selection for CDN-delivered Live Streaming
Hazard avoidance in wireless sensor and actor networks
Real time social media marketing in action
The State of Social Rich Media
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
Ad

Similar to SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments (20)

PDF
HKG15-107: ACPI Power Management on ARM64 Servers (v2)
PPTX
Energy Efficiency in Large Scale Systems
PDF
Linux Power Management Slideshare
PDF
Runtime Methods to Improve Energy Efficiency in HPC Applications
PPTX
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
PDF
CloudComputing_UNIT5.pdf
PDF
Energy efficient-resource-allocation-in-distributed-computing-systems
PDF
ACSD2016paper20_04013
PPTX
참여기관_발표자료-국민대학교 201301 정기회의
PDF
BKK16-317 How to generate power models for EAS and IPA
PDF
BKK16-TR08 How to generate power models for EAS and IPA
PDF
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
PPTX
Optimizing High Performance Computing Applications for Energy
PDF
Power management
PDF
E03403027030
PDF
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
PPTX
Computer Architecture and Organization
PDF
Parallel and Distributed Computing Chapter 9
PDF
BKK16-104 sched-freq
PDF
LCU14-410: How to build an Energy Model for your SoC
HKG15-107: ACPI Power Management on ARM64 Servers (v2)
Energy Efficiency in Large Scale Systems
Linux Power Management Slideshare
Runtime Methods to Improve Energy Efficiency in HPC Applications
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
CloudComputing_UNIT5.pdf
Energy efficient-resource-allocation-in-distributed-computing-systems
ACSD2016paper20_04013
참여기관_발표자료-국민대학교 201301 정기회의
BKK16-317 How to generate power models for EAS and IPA
BKK16-TR08 How to generate power models for EAS and IPA
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
Optimizing High Performance Computing Applications for Energy
Power management
E03403027030
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
Computer Architecture and Organization
Parallel and Distributed Computing Chapter 9
BKK16-104 sched-freq
LCU14-410: How to build an Energy Model for your SoC

More from Zhenyun Zhuang (15)

PDF
Designing SSD-friendly Applications for Better Application Performance and Hi...
PDF
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
PDF
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
PDF
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
PDF
WebAccel: Accelerating Web access for low-bandwidth hosts
PDF
Client-side web acceleration for low-bandwidth hosts
PDF
A3: application-aware acceleration for wireless data networks
PDF
Dynamic Layer Management in Super-Peer Architectures
PDF
Enhancing Intrusion Detection System with Proximity Information
PDF
Optimizing JMS Performance for Cloud-based Application Servers
PDF
Capacity Planning and Headroom Analysis for Taming Database Replication Latency
PDF
OS caused Large JVM pauses: Deep dive and solutions
PDF
Wireless memory: Eliminating communication redundancy in Wi-Fi networks
PDF
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
PDF
Improving energy efficiency of location sensing on smartphones
Designing SSD-friendly Applications for Better Application Performance and Hi...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
WebAccel: Accelerating Web access for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hosts
A3: application-aware acceleration for wireless data networks
Dynamic Layer Management in Super-Peer Architectures
Enhancing Intrusion Detection System with Proximity Information
Optimizing JMS Performance for Cloud-based Application Servers
Capacity Planning and Headroom Analysis for Taming Database Replication Latency
OS caused Large JVM pauses: Deep dive and solutions
Wireless memory: Eliminating communication redundancy in Wi-Fi networks
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
Improving energy efficiency of location sensing on smartphones

Recently uploaded (20)

PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
composite construction of structures.pdf
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
additive manufacturing of ss316l using mig welding
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
PPT on Performance Review to get promotions
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Geodesy 1.pptx...............................................
PPT
Project quality management in manufacturing
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Operating System & Kernel Study Guide-1 - converted.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Internet of Things (IOT) - A guide to understanding
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
CH1 Production IntroductoryConcepts.pptx
composite construction of structures.pdf
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
additive manufacturing of ss316l using mig welding
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPT on Performance Review to get promotions
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Geodesy 1.pptx...............................................
Project quality management in manufacturing

SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments

  • 1. SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments Zhenyun Zhuang, Haricharan Ramachandra, Badri Sridharan 2029 Stierlin Ct, Mountain View, CA 94043, USA {zzhuang, hramachandra, bsridharan}@linkedin.com Abstract—Modern cloud computing platforms (e.g. Linux on Intel CPUs) feature ACPI-based (Advanced Configuration and Power Interface) mechanism, which dynamically scales CPU frequencies/voltages to adjust the CPU frequencies based on the workload intensity. With this feature, CPU frequency is reduced when the workload is relatively light in order to save energy; while increased when the workload intensity is relatively high. In business cloud computing environments, software prod- ucts/services often need to “scale out” to multiple machines to form a cluster to achieve a pre-defined aggregated performance goal (e.g., SLA-devised throughput). To reduce business opera- tion cost, minimizing the provisioned cluster size is critical. However, as we show in this work, the working of ACPI in today’s modern OS may result in more machines being provisioned, hence higher business operation cost, To deal with this problem, we propose a SLA-aware CPU scaling algorithm based on business SLA (Service Level Agree- ment aware). The proposed design rational and algorithm are a fundamental rethinking of how ACPI mechanisms should be implemented in business cloud computing environments. Con- trary to the current forms of ACPI which simply adapt CPU power levels only based on workload intensity, the proposed SLA-aware algorithm is primarily based on current application performance relative to the pre-defined SLA. Specifically, the algorithm targets at achieving the pre-defined SLA as the top- level goal, while saving energy as the second-level goal. Keywords-ACPI; Power saving; Service level agreements; Performance I. INTRODUCTION Advanced Configuration and Power Interface (ACPI) [1] provides standards for power management by OS. ACPI allows dynamic scaling of CPU power levels and frequen- cies. Modern CPUs typically allow the computations under multiple CPU frequencies and voltages. Operating at higher frequency, a CPU is more powerful processing computing tasks, while more power will be consumed per unit of time. With the support of OS, the primary goal of ACPI is to save energy through the CPU scaling of frequencies and voltages. ACPI is particularly important in cloud computing scenarios where computing demand is elastic and energy saving is critical. Specifically, a mechanism called CPUfreq [2] is implemented in Linux kernel, which enables the operating system to scale the CPU frequency up or down in order to save power. To help managing the power levels, certain pre-configured power schemes are implemented in OS. These power schemes are referred to as governors [2]. Common governors are performance, ondemand, userspace, etc. Among them, the ondemand governor is enabled by default in Linux. Ondemand governor adjusts the CPU frequencies based on how heavy the workload is. The more intensive workload is detected, the higher frequency will be scaled up to. On the other hand, if the workload is detected to be light, the CPU frequency scales down. The detection mechanism that measures the current workload is based on sampling of par- ticular intervals (e.g., every 10ms). During the last interval, if the CPU usage is above a scaling threshold (e.g., CPU 95% busy), the frequency will be scaled up to maximum frequency; otherwise, the frequency will be scaled down, one level at a time. 
For instance, on a machine with Sandy Bridge single socket machine with 6 cores and 12 CPUs, and the CPU is Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, there are totally 10 levels, ranging between 1200MHz and 2001MHz. With the heaviest workload, the CPU frequency will be 2001MHz, while the lightest workload will result in 1200MHz. In this work, we found that current ACPI-based CPU scal- ing mechanism, particularly the default ondemand governor, is fundamentally inappropriate to business cloud production environments. For example, in the production computing environments of Internet companies, particularly the cloud computing platforms, the primary goal is to meet the SLA (Service Level Agreement) with the minimum business op- eration cost, rather than blindly saving energy. The current design of the governors does NOT take into considerations the SLA part. Specifically, when the current performance of an application is violating the SLA (e.g., in the form of response latencies), the CPU frequencies should be scaled up, irrespective how busy the CPU actually is. In other words, without the gene of SLA considerations, current dynamic governors may unnecessarily violate the SLAs. Not only may current governors violate SLAs, the resulted operation cost may also be unnecessarily increased. To give an example, let us consider the following scenario. Assuming a particular application (e.g., a web service) needs to be deployed on a cluster of machines to achieve a required aggregated throughput of 200K event/s. To make it simple, we assume a machine can only deploy one application instance due to certain limitations. With the default on- demand governor, each application instance delivers 10K event/s throughput, hence we need to deploy 20 machines in order to meet our aggregated SLA of 200K event/s.
  • 2. However, by manually scaling up the CPU frequency, the machine could achieve higher throughput. Assuming each application instance can delivery 20K event/s with scaled- up CPU frequency, only 10 machines are needed - a 2X saving on the number of machines! We have verified such a hypothesis in our lab, which will be discussed in later section. CPU consumed energy is only part of the entire energy consumption (e.g., RAM, motherboard, disks), hence minimizing the cluster size oftentimes can lead to much more energy saving. In other words, even though running with the default governor can save the energy on each individual machines, thanks to the lower CPU frequencies resulted, considering the number of machines deployed, the total business cost of the cluster may well exceeds the cost of a smaller cluster with manually scaled-up CPU. Furthermore, current governors may unnecessarily increase the CPU power consumption. Since SLAs are businesses’ primary concerns, when SLAs are met, the CPU frequen- cies should be scaled down, irrespective of how busy the CPUs are. Subjecting to meeting SLA, scaling down CPU frequencies will result in less CPU energy consumption. So we argue that as long as the performance does not violate SLAs, such power-saving actions (i.e., scaling down CPU frequencies) should be taken. Unfortunately, current governors are blind to SLAs, and they only scale frequencies based on how busy the CPU is, which is a runaway from typical business requirements. In this work, motivated by the weaknesses of current governors on modern OS 1, we address this problem by proposing an entirely new SLA-aware paradigm to dynami- cally scale CPU frequencies. We also proposed an example mechanism, which is a fundamental change of how ACPI- based CPU scaling works on modern OS today. Unlike traditional ACPI mechanisms which do not consider business SLA requirements of the applications running on a machine, the proposed mechanism targets at achieving the SLA as the top-level goal, while treating saving energy as the second- level goal. In other words, the SLA-aware algorithm will achieve the following two goals, in the order of importance of: (1) Meeting SLA is the primary target; when SLAs are in jeopardy, the frequency will be scaled up to meet SLA; (2) Subject to meeting SLAs, the CPU frequencies will be scaled down whenever possible. To achieve the above goals, the proposed algorithm con- tinuously monitors the current application performance, and scales CPU frequencies accordingly. It can also consider the workload properties (e.g., traffic volume changes) to further ensure the above goals by finely tuning the scaling mechanism. The algorithm can be realized in three forms: (1) designing a new governor; (2) dynamically choosing different governors; (3) dynamically tuning a particular 1Note that we do not argue that the design of ACPI is wrong, instead, we attempt to enhance the application of ACPI by making ACPI SLA-aware. governor (e.g. ondemand). We will detail the solution in later sections. For the remainder of the writing, after providing some necessary technical background in section II, we then define and motivate the problems being addressed in this writing in Section III. We present the designs in Section IV and present in Section V the deployment model and how to use our algorithm. We build a prototype and perform performance evaluation using the prototype in Section VI. We also present certain related works in Section VII. Finally in Section VIII we conclude the work. II. 
II. BACKGROUND AND SCOPE

A. Background

CPU power consumption and ACPI. The CPU is one of the major power-consuming components of a computing platform. Modern CPUs can operate at different power levels, which depend on the voltage and the frequency at which the CPU operates. The power consumption of a CPU is roughly linear in its operating frequency. CPU frequencies also roughly determine how powerful the CPU is: the higher the frequency, the better the CPU's computing performance.

The Advanced Configuration and Power Interface (ACPI) [1] specification provides an open standard for device configuration and power management by the operating system, and allows dynamic scaling of CPU power levels and frequencies. On the particular Linux/Intel platform we used, the CPU has 10 power levels in total, with a minimum of 1.2GHz and a maximum of 2.0GHz. Naturally, the higher the frequency, the more power is consumed.

Governors. To help manage the power levels, certain pre-configured power schemes are implemented in the OS kernel; these schemes are referred to as governors. For instance, several governors are available with the CPUfreq subsystem: (1) the performance governor, which sets CPU frequencies to the highest possible value for maximum performance; (2) the powersave governor, which sets CPU frequencies to the lowest possible value - this can severely impact performance, as the system never rises above this frequency no matter how busy the processors are; (3) the ondemand governor, which adjusts the CPU frequency dynamically: it monitors CPU utilization, and as soon as utilization exceeds a certain threshold the governor maximizes the CPU frequency; if utilization stays below the threshold, the next lowest frequency is used.
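A minimal Python sketch of inspecting the governor and frequency currently in effect, assuming the standard Linux CPUfreq sysfs layout (the /sys/devices/system/cpu paths below are the usual locations, though they may differ across kernels):

from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def read(name: str) -> str:
    # Each CPUfreq attribute is exposed as a small text file.
    return (CPUFREQ / name).read_text().strip()

print("available governors:", read("scaling_available_governors"))
print("active governor:", read("scaling_governor"))
print("current frequency (MHz):", int(read("scaling_cur_freq")) // 1000)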
[Figure 1. Deploying multiple instances on the same machine (ACPI ON): (a) per-instance throughput; (b) average CPU frequency; (c) average CPU usage.]

B. Scope

This work assumes the availability of multiple CPU power levels (e.g., frequencies) and the ability to adjust the level in effect. The proposed algorithm adjusts the CPU levels based on a comparison between the pre-defined performance SLA and the current application performance; hence it requires both an SLA definition and a measurement of the current performance. To simplify the presentation, we assume the SLA is defined as a single absolute value, such as "throughput higher than 200KBps" or "response time lower than 100ms". To allow for timely adaptation of CPU levels, coarsely defined SLAs need to be converted to specific performance requirements whenever necessary. For example, an SLA may be coarsely defined as "99% of the time in a day, throughput should be higher than 200KBps"; the converted SLA can simply be "throughput higher than 200KBps". As another example, for the SLA "the 99.9th percentile of response time should be smaller than 100ms", the converted SLA can be "response time lower than 100ms". Similarly, the current application performance needs to be measured in step with the power adaptation period, e.g., every 5 seconds.
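A minimal sketch of such a per-period check for a latency SLA; the helper names are ours, not from the paper, and the nearest-rank percentile is one simple choice among many:

def percentile(samples: list[float], pct: float) -> float:
    # Nearest-rank percentile; good enough for a monitoring check.
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(pct / 100.0 * len(ordered)))
    return ordered[idx]

def meets_latency_sla(latencies_ms: list[float],
                      limit_ms: float = 100.0, pct: float = 99.9) -> bool:
    """True if the pct-th percentile latency of the last period is within limit."""
    return percentile(latencies_ms, pct) <= limit_ms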
III. PROBLEM DEFINITION AND MOTIVATION SCENARIOS

A. Problem

The problem we address in this work is determining the computing capacity needed to achieve a given performance goal (e.g., an aggregated throughput). Once an SLA is set (e.g., in the form of an aggregated throughput of processed events), the de facto practice in most Internet companies, such as Facebook and LinkedIn, is to "scale out" the computing infrastructure by parallelizing multiple deployments of the same computing component. For this, capacity planning is conducted to determine the number of nodes (i.e., machines) needed, as well as the number of computing instances deployed on a single node.

Capacity planning in business cloud computing environments is closely tied to business cost. The goal is to allocate "just enough" nodes such that the pre-determined SLA is met. For instance, LinkedIn products such as Databus [3] can be horizontally scaled by deploying multiple instances of the product. To reduce the number of machines used, we need to know how many instances we can co-locate on the same machine. Based on these results, we can then answer further questions, including: (1) given a performance requirement (e.g., an SLA), how many machines are needed; (2) given a traffic volume, how many machines are needed; etc.

During our investigations and experiments with capacity planning for Databus, we made interesting findings regarding an issue with ACPI that results in more machines being provisioned than necessary, hence higher business operation cost. Based on these findings, we propose an algorithm to address the issue.

B. Production experiments

In the experiments, we want to determine the minimum number of nodes needed for a pre-determined SLA. First, we need to know the maximum aggregated throughput that can be achieved by a single node. To maximize the utilization of computing resources, multiple homogeneous instances are typically deployed on the same node.

For ease of repeatability and configurability, we used a custom-built application that mimics our Databus product. The application mimics the major internal mechanisms of Databus while removing the dependence on production data. It consists of a pair of Java components that communicate via a TCP/IP connection: the sending component keeps sending events (i.e., chunks of bytes) to the receiving component; upon receiving an event, the receiving component processes it and replies to the sender.
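A rough Python stand-in for this sender/receiver pair (the actual benchmark is Java); the event size, port, and ack protocol below are our illustrative assumptions, and each recv() is assumed to return one whole event, a simplification a real benchmark would not make:

import socket
import threading
import time

EVENT = b"x" * 1024                 # one "event": a fixed chunk of bytes
ADDR = ("127.0.0.1", 9999)          # the port is an arbitrary choice

def receiver() -> None:
    srv = socket.create_server(ADDR)
    conn, _ = srv.accept()
    # Simplification: assume each recv() returns exactly one whole event.
    while conn.recv(len(EVENT)):
        conn.sendall(b"ack")        # "process" the event, reply to the sender

def sender(n_events: int = 10_000) -> None:
    with socket.create_connection(ADDR) as c:
        for _ in range(n_events):
            c.sendall(EVENT)
            c.recv(3)               # wait for the reply before the next event

threading.Thread(target=receiver, daemon=True).start()
time.sleep(0.5)                     # give the receiver time to start listening
sender()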
Both components are deployed on the same machine. The machine is a single-socket Sandy Bridge machine with 6 cores, 12 CPUs (i.e., hardware threads), and 64GB of RAM. The CPU is Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz. The OS is Red Hat Enterprise Linux with kernel version 2.6.32-358.6.2.el6.x86_64.

The experiments began by deploying a single instance and then adding more and more homogeneous instances. We would expect the per-instance throughput to keep dropping as we add co-located instances, due to competition for computing resources. We would also expect the aggregated throughput of all deployed instances to rise with more instances, up to a particular threshold, after which the aggregated throughput drops as well. Such expectations are normal for scenarios where multiple applications run on the same machine.

C. Using the default OS configuration (ACPI enabled)

We first conducted the experiments with the default OS configuration. Though the aggregated throughput faithfully complies with our expectations, the per-instance throughput defies them. Deploying more instances should result in more resource contention (e.g., memory, CPU), hence lower per-instance throughput. What we observed is just the opposite: instead of decreasing, the per-instance throughput keeps increasing with more instances (up to a peak), which is quite counter-intuitive. Specifically, Figure 1(a) displays the per-instance throughput versus the number of instances. Though from 4 to 10 instances the throughput drops as expected, from 1 to 4 instances the per-instance throughput increases quite significantly (+26%).

We found that these observations are caused by ACPI. Under ACPI, the CPU dynamically scales its frequency based on the actual load for the purpose of energy saving. When the load is light, the CPU runs at a lower speed, and the application delivers less throughput. When the load is high, as when multiple instances are deployed concurrently, the CPU runs at a higher speed and delivers higher per-instance throughput. Figures 1(b) and (c) display the average CPU frequency and the CPU usage under different numbers of instances, respectively. With more instances deployed, the CPU load is higher, which drives the frequencies up. With more powerful CPUs, no wonder the per-instance throughput increases for the first three scenarios (i.e., up to 4 instances)! In these scenarios it is still true that more instances cause more resource (including CPU) contention, but the benefit of the increased CPU frequencies apparently outweighs the performance loss from the contention. However, as we increase the number of instances beyond 4, the losses outweigh the gains, and the per-instance throughput begins to drop.

[Figure 2. Deploying multiple instances on the same machine (ACPI OFF): (a) per-instance throughput; (b) average CPU usage.]

D. ACPI disabled

To eliminate the impact of ACPI, we disabled its dynamic scaling by pinning the CPU frequency at the maximum, so the CPUs run at full speed. The results are shown in Figure 2. The performance of each instance is much higher than with ACPI on when the number of instances is below 5. In particular, when 2 instances are deployed, each instance achieves about 90KBps of throughput, significantly higher than the 63KBps achieved when ACPI is on. The CPU frequency stays at 2GHz due to the disabled ACPI, and the average CPU usage keeps increasing.
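One way the pinning can be done, sketched in Python around the cpupower utility (the same command our prototype in Section VI issues); requires root, and 2001MHz is the top level reported on our testbed:

import subprocess

def pin_max_frequency(freq: str = "2001MHz") -> None:
    # Fix every CPU at the given frequency, disabling dynamic scaling.
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-f", freq],
                   check=True)

pin_max_frequency()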
E. Summary

In business cloud computing environments, capacity planning for applications is often needed. We found that many questions that need answering in this context can be affected by ACPI. Careless treatment of these questions leads to incorrect answers, hence wrong conclusions about an application's capacity, and ultimately over-spending on operation cost. To illustrate, consider an example. Assume our SLA requires handling 1000KBps of total throughput, and that we can deploy at most 2 instances on a single machine due to other constraints, including memory footprint. Based on our earlier ACPI-on results, we might conclude that we can achieve at most 126KBps (63KBps per instance * 2 instances) with
a machine, so we need at least 8 machines. However, by disabling ACPI and maximizing the CPU frequency, each machine can deliver 180KBps (90KBps per instance * 2 instances), hence we only need 6 machines - a significant operation cost saving.

On the other hand, ACPI is blind to our performance requirements and SLA. The deployed applications might deliver unnecessarily better performance than needed. Though better performance is desirable in most scenarios, it almost always comes with higher energy consumption because of the higher CPU power levels. Stepping back, we might ask: why should we spend more energy (and business operation cost) delivering more performance than necessary?

IV. DESIGN

We saw in Section III that the dynamic scaling of CPU power (i.e., ACPI) on modern machines can produce undesirable outcomes due to its unawareness of business requirements (i.e., the performance SLA). To address these issues, we propose a design and algorithm for SLA-aware dynamic CPU scaling. The algorithm primarily aims at meeting the business-determined performance SLA, and secondarily saves CPU energy subject to the SLA.

Since we will later provide several different realizations of the algorithm, and the realizations are not limited to adjusting CPU frequencies only, we use generic terms to refer to the different types of adjustment. Specifically, we use scaling up to denote adjusting to a more powerful CPU level, scaling down to denote adjusting to a less powerful CPU level, and maximizing CPU to denote adjusting to the most powerful level.

A. Design goals

The goal of the algorithm is to provide just enough CPU power to the workload that the SLA is not violated; in other words, subject to the SLA, as little CPU power as possible is consumed. Specifically, the algorithm aims to achieve the following goals:

• Top-level goal: ensuring the SLA. When the expected performance is about to violate the SLA, scale up the CPU level.

• Second-level goal: reducing energy. When the expected performance far exceeds the SLA, scale down the CPU level to reduce energy consumption.

Another relevant design goal is that the algorithm should avoid thrashing; that is, it should stay at the current CPU power level as long as possible to avoid frequent adjustments.

B. High-level design

The heart of the algorithm is the CPU scaling engine, which determines whether to scale CPU levels up or down. The decision is based on a set of factors, including the current application performance and the SLA specification. The current performance is measured by a separate component, which continuously reports how the application is performing. If the current performance is worse than the SLA, the engine may decide to scale up CPU levels; otherwise it may decide to scale down. Given the nature of this problem, the design fits well into a control-system [4] paradigm. Though many algorithms could be proposed, each with different tradeoffs in accuracy, speed, and complexity, in this paper we present a straightforward algorithm with the goal of demonstrating how the solution works.

One important factor is the workload itself. For instance, the workload may follow time-series-based shapes (e.g., between 8AM and 2PM the traffic volume is increasing).
This workload trend information can be used by the engine to make even smarter decisions about scaling CPU levels up or down.

C. Performance monitoring

The current performance needs to be monitored in a timely manner. The monitoring can be done continuously or by sampling. The current performance is expressed in a form consistent with the SLA: if the SLA is in the form of response time, then the monitoring reports the current performance as response time; similarly, it can be in the form of throughput.

D. Engine

To determine how the current performance compares to the SLA, the algorithm relies on two additional threshold values, A and B, which guard the headroom of the current performance relative to the SLA; B is farther from the SLA than A. (Though we present the algorithm using only the two guard thresholds A and B, it easily adapts to multiple guard thresholds.) The thresholds can be given in absolute form (e.g., 20KB/s and 40KB/s) or in relative form (e.g., 110% and 120%). The values of A and B are also dynamically adapted based on historical performance: they can start from fixed values such as 110% and 120%; if during the last time period the SLA was not completely met, the values of A and B are increased by 1.5x, otherwise they are reduced by 1.1x.

Performance metrics can take different forms; for ease of presentation, we assume a throughput-based metric, for which larger is better (so B is larger than A). Simple adaptations make the algorithm work for other metrics such as response time. If the current performance is larger than B, the CPU is scaled down. If it is between the two thresholds, the CPU level is kept as is. If it is smaller than A but still meets the SLA, the CPU is scaled up. If it does not meet the SLA, the CPU is maximized to let it do its best job. The decision flow is shown in Figure 3.
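A minimal Python sketch of this decision engine, expressing performance as the scale p = current throughput / SLA (as in Section VI); the flooring of the thresholds at 1.0 is our assumption, not stated in the text:

def decide(p: float, A: float, B: float) -> str:
    """Map the performance scale p (= current throughput / SLA) to an action."""
    if p < 1.0:
        return "maximize"    # SLA violated: jump straight to the highest level
    if p < A:
        return "scale_up"    # SLA met, but too little headroom: one level up
    if p <= B:
        return "stay"        # comfortable headroom: keep the level, no thrashing
    return "scale_down"      # far above SLA: one level down to save energy

def adapt_thresholds(A: float, B: float, sla_was_met: bool) -> tuple[float, float]:
    # Widen the guard band by 1.5x after an SLA miss; shrink it by 1.1x
    # otherwise, never letting the thresholds fall below the SLA itself.
    factor = 1.5 if not sla_was_met else 1 / 1.1
    return max(1.0, A * factor), max(1.0, B * factor)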
[Figure 3. Flow chart of the algorithm: if performance is worse than the SLA, scale the CPU up to the highest level; if it is only slightly better than the SLA (below A), scale the CPU up to a higher level; if it is between A and B, stay at the current level; if it is much better than the SLA (above B), scale the CPU down to a lower level.]

E. Scaling up/down the CPU

Scaling up the CPU means allowing the CPU to work more aggressively. It can be implemented in several fashions: a higher CPU frequency, switching to a more aggressive governor (e.g., the performance governor), or tuning the parameters of a particular governor to make it more aggressive. Similarly, scaling down the CPU can be implemented in several ways. We discuss the different implementations in later sections.

F. Co-locating multiple heterogeneous applications

If multiple heterogeneous applications need to be co-located on the same machine, the working of the algorithm needs to be adjusted. Since different applications may have different SLAs and performance, one application may be seeing better-than-SLA performance while another is not. Such scenarios can be addressed by slightly modifying the algorithm presented above. Though the specific forms of the solution differ, we argue for a conservative solution, in which the CPU adjustment is driven by the least-performing application; for instance, CPUs are scaled down only when all applications are outperforming their SLAs.

V. DEPLOYMENT AND USAGE

The proposed algorithm can be realized and deployed in several embodiments: (1) a new governor that directly controls the adjustment of CPU frequencies; (2) a new governor that aggregates existing governors and dynamically switches among them on the fly; (3) an improved version of a current governor that dynamically tunes the governor's tunable parameters.

New governor directly controlling CPU frequencies. The most straightforward way is to add a new governor built from scratch. The new governor is exposed to the OS as one element of the governor set and has no dependency on existing governors. Inside the governor, the adaptation of CPU frequencies is based purely on the algorithm and on the current performance compared to the SLA.

New governor aggregating existing governors. The algorithm can also be implemented as a new governor built on top of existing governors. The available governors differ in aggressiveness with regard to CPU power usage and performance: the performance governor is the most aggressive, the powersave governor the least, and the others sit between these two extremes. Hence, the different governors can be aggregated by the new governor, which sets the system to a particular governor based on the algorithm. This deployment depends on the existence of the other governors, so it cannot be deployed by itself.

Improved version of an existing governor. The algorithm can also be implemented as an improved version of an existing governor that exposes tunable parameters, whose tuning knobs are adjusted dynamically. For instance, the ondemand governor has an up-threshold knob that controls when the CPU is scaled up, and a sampling-interval knob that determines how frequently sampling is done. The algorithm can be embedded into the existing governor and serve as a new version of that governor.
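A minimal sketch of embodiment (3) in Python, adjusting the ondemand governor's up_threshold knob through its standard sysfs file (writing it requires root; 95 is the default our baseline in Section VI runs with):

from pathlib import Path

ONDEMAND = Path("/sys/devices/system/cpu/cpufreq/ondemand")

def set_up_threshold(percent: int) -> None:
    # Lower values make ondemand raise the frequency sooner (more aggressive).
    (ONDEMAND / "up_threshold").write_text(f"{percent}\n")

set_up_threshold(60)   # make the governor more aggressive
set_up_threshold(95)   # restore the default behavior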
VI. EVALUATION

A. Prototype

As noted in Section V, the algorithm can be realized in several ways. We built a prototype and used it to verify the working of our algorithm. The prototype is implemented as a new governor that directly controls the CPU frequencies; we refer to it as new-governor.

The prototype is written in Python and relies on the CPUfreq subsystem, which allows control of a set of parameters including the governor selection. It issues the command "cpupower -c all frequency-set -f freq". For this new-governor prototype, every 100MHz increment is treated as a new CPU power level: for a CPU with frequencies ranging from 1200MHz to 2000MHz, there are 10 levels in total. Scaling up means moving to the next power level with a 100MHz-higher frequency, scaling down means moving 100MHz lower, and maximizing means going to 2001MHz.

The workload is a Java-based application that keeps allocating user-specified objects and removes the oldest objects once the number of objects reaches a threshold. It periodically outputs the actual object-allocation throughput achieved during the last period. The throughput is compared to the pre-defined SLA level, and based on the comparison the algorithm decides on one of four actions: scale up, scale down, maximize the CPU, or stay unchanged. These actions then map to specific steps in the prototype.
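A sketch of how such a control loop could look, combining the decision rules of Section IV with cpupower actuation; this is our reconstruction under stated assumptions, not the prototype's actual source, and get_current_throughput is a stand-in for the monitoring component (simulated here so the sketch runs; the cpupower call needs root):

import random
import subprocess
import time

LEVELS_MHZ = list(range(1200, 2001, 100)) + [2001]   # the 10 power levels

def set_frequency(mhz: int) -> None:
    # Actuation via the same cpupower command the prototype uses.
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-f", f"{mhz}MHz"],
                   check=True)

def get_current_throughput() -> float:
    # Placeholder for the monitoring component, which in the prototype reads
    # the workload's reported per-period throughput; simulated here (B/s).
    return random.uniform(50_000, 80_000)

def run(sla: float = 60_000, A: float = 1.1, B: float = 1.2) -> None:
    level = len(LEVELS_MHZ) - 1          # start maximized
    while True:
        p = get_current_throughput() / sla
        if p < 1.0:
            level = len(LEVELS_MHZ) - 1  # SLA violated: maximize CPU
        elif p < A:
            level = min(level + 1, len(LEVELS_MHZ) - 1)  # scale up one level
        elif p > B:
            level = max(level - 1, 0)    # scale down one level
        # p between A and B: stay at the current level
        set_frequency(LEVELS_MHZ[level])
        time.sleep(1)                    # the algorithm runs once per second

if __name__ == "__main__":
    run()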
[Figure 4. Baseline results (ondemand governor), over time in seconds: (a) actual application throughput; (b) average CPU frequency (MHz); (c) average CPU usage.]

B. Experiment setup

To evaluate the prototype's ability to adapt to different workloads, we vary the workload's traffic intensity, by which we mean how CPU-intensive the workload is. Specifically, we split the experiment into three segments of equal duration. In the first segment the workload runs at the regular intensity; in the second segment at 80% of the regular intensity; and in the last segment at 120% of the regular intensity.

We set the performance SLA to 60KB/s of throughput. The performance-monitoring component continuously obtains the current application performance. The current throughput divided by the SLA gives a performance scale, denoted p: p > 1.0 means the performance is better than the SLA, while p < 1.0 means it is below the SLA. We set A to 1.1 and B to 1.2, respectively. The algorithm is executed once per second.

C. Baseline results

We first show the baseline results using the default ondemand governor with its default parameters; specifically, the up-threshold is 95%. We show only a small snapshot of the experimental period so that individual data points can be examined closely. The throughput is shown in Figure 4(a). Only about 30% of the time does the performance meet the SLA of 60KB/s; in other words, the SLA is met only during the segment where the workload is at 80% of the regular intensity. Figures 4(b) and (c) show the average CPU frequencies set by the ondemand governor and the corresponding CPU usage. The average CPU frequency is barely above the lowest frequency of 1200MHz, indicating that the CPUs operate at the lowest power level, completely unsympathetic to the violated SLA. CPU usage is comparatively low (e.g., 34%), which showcases the irony that the SLA is not met while most of the CPU sits idle.

[Figure 5. New-governor results, over time in seconds: (a) actual throughput; (b) average CPU frequency (MHz); (c) average CPU usage.]

D. New-governor prototype results

For the implemented new-governor prototype, the actual throughput is shown in Figure 5(a). The throughput exceeds the SLA at all times, irrespective of the workload intensity. Figure 5(b) shows the average CPU frequencies set by the new-governor.
When the second segment, with its lighter workload, kicks in, the CPU frequency is automatically scaled down to save energy; when the third segment, with its heavier workload, arrives, the CPU frequency is scaled up accordingly - all while the SLA is comfortably met. Figure 5(c) displays the corresponding CPU usage: in the second segment, because of the lowered CPU levels, the CPU usage increases.

VII. RELATED WORK

A. Power consumption studies

Many works have studied the power consumption of various computing platforms. In particular, [5] breaks down the power consumption of the different components of a laptop. [6] analyzes the energy-time tradeoff in high-performance computing environments. [7] studies the impact of dynamically adjusting voltage and frequency on web servers and concludes that dynamic scaling significantly reduces power consumption for such servers. [8] addresses the challenge of power management in heterogeneous multi-tier web clusters and proposes algorithms to save more energy. Embedded real-time systems present unique power-saving challenges, and [9] explicitly takes system reliability into consideration. Smartphones increasingly use multi-core CPUs, where saving power is even more critical [10]. Energy-saving problems in cloud environments and data centers are studied in [11], [12].

Our work does not disagree with these works. On the contrary, we strongly believe that ACPI (and dynamic scaling of CPU power levels) saves energy in many deployment scenarios. However, based on experience and observations in LinkedIn's production environments, we identify certain issues in business computing environments where SLAs have higher priority than energy saving, and we present our SLA-aware algorithm specifically for such environments.

B. ACPI implementations and impact

The Advanced Configuration and Power Interface (ACPI) [1], [13] specification is an open standard, and several flavors are implemented on different OSes. For instance, [14] implements ACPI on FreeBSD. [15] studies the energy-performance tradeoff of multi-threaded applications and proposes a set of models to estimate the performance slowdown under ACPI.

VIII. CONCLUSION

In this work, we demonstrated the weaknesses that conventionally implemented ACPI-based dynamic CPU scaling mechanisms on modern OSes can expose in the face of business SLAs. We then proposed an SLA-aware algorithm that prioritizes SLA requirements when adjusting CPU levels, while also saving energy when the SLA is met.

REFERENCES

[1] "Advanced Configuration and Power Interface (ACPI)," http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface.
[2] "CPU frequency scaling," https://wiki.archlinux.org/index.php/CPU_frequency_scaling.
[3] S. Das, C. Botev, et al., "All aboard the Databus!: LinkedIn's scalable consistent change data capture platform," in SoCC '12, New York, NY, USA, 2012.
[4] "Control system," http://en.wikipedia.org/wiki/Control_system.
[5] A. Mahesri and V. Vardhan, "Power consumption breakdown on a modern laptop," in Proceedings of the 4th International Conference on Power-Aware Computer Systems (PACS '04), Berlin, Heidelberg, 2005.
[6] V. W. Freeh, D. K. Lowenthal, F. Pan, N. Kappiah, R. Springer, B. L. Rountree, and M. E. Femal, "Analyzing the energy-time trade-off in high-performance computing applications," IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 6, pp. 835-848, Jun. 2007.
[7] P. Bohrer, E. N. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, C. McDowell, and R. Rajamony, "Power aware computing," R. Graybill and R. Melhem, Eds. Norwell, MA, USA: Kluwer Academic Publishers, 2002, ch. "The Case for Power Management in Web Servers."
[8] P. Wang, Y. Qi, and X. Liu, "Power-aware optimization for heterogeneous multi-tier clusters," J. Parallel Distrib. Comput., vol. 74, no. 1, Jan. 2014.
[9] D. Zhu, "Reliability-aware dynamic energy management in dependable embedded real-time systems," ACM Trans. Embed. Comput. Syst., vol. 10, no. 2, Jan. 2011.
[10] Y. Zhang, X. Wang, X. Liu, Y. Liu, L. Zhuang, and F. Zhao, "Towards better CPU power management on multicore smartphones," in Proceedings of the Workshop on Power-Aware Computing and Systems (HotPower '13), New York, NY, USA: ACM, 2013.
[11] J. Yu, Z. Hu, N. N. Xiong, H. Liu, and Z. Zhou, "An energy conservation replica placement strategy for Dynamo," J. Supercomput., vol. 69, no. 3, Sep. 2014.
[12] Z. Guo, Z. Duan, Y. Xu, and H. J. Chao, "JET: Electricity cost-aware dynamic workload management in geographically distributed datacenters," Comput. Commun., vol. 50, Sep. 2014.
[13] L. Duflot, O. Levillain, and B. Morin, "ACPI: Design principles and concerns," in Proceedings of the 2nd International Conference on Trusted Computing (Trust '09), Berlin, Heidelberg: Springer-Verlag, 2009, pp. 14-28.
[14] T. Watanabe, "ACPI implementation on FreeBSD," in Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, Berkeley, CA, USA, 2002, pp. 121-131.
[15] S. Park, W. Jiang, Y. Zhou, and S. Adve, "Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures," SIGMETRICS Perform. Eval. Rev., vol. 35, no. 1, Jun. 2007.