SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
Zhenyun Zhuang, Haricharan Ramachandra, Badri Sridharan
2029 Stierlin Ct, Mountain View, CA 94043, USA
{zzhuang, hramachandra, bsridharan}@linkedin.com
Abstract—Modern cloud computing platforms (e.g., Linux on Intel CPUs) feature an ACPI-based (Advanced Configuration and Power Interface) mechanism that dynamically scales CPU frequencies and voltages based on workload intensity. With this feature, the CPU frequency is reduced when the workload is relatively light in order to save energy, and increased when the workload intensity is relatively high.
In business cloud computing environments, software products/services often need to “scale out” to multiple machines to form a cluster to achieve a pre-defined aggregated performance goal (e.g., SLA-devised throughput). To reduce business operation cost, minimizing the provisioned cluster size is critical.
However, as we show in this work, the behavior of ACPI in today's modern OS may result in more machines being provisioned, and hence higher business operation cost.
To deal with this problem, we propose an SLA-aware (Service Level Agreement) CPU scaling algorithm. The proposed design rationale and algorithm are a fundamental rethinking of how ACPI mechanisms should be implemented in business cloud computing environments. Contrary to the current forms of ACPI, which adapt CPU power levels based only on workload intensity, the proposed SLA-aware algorithm is primarily driven by the current application performance relative to the pre-defined SLA. Specifically, the algorithm targets achieving the pre-defined SLA as the top-level goal, while saving energy as the second-level goal.
Keywords-ACPI; Power saving; Service level agreements;
Performance
I. INTRODUCTION
Advanced Configuration and Power Interface (ACPI) [1] provides standards for power management by the OS. ACPI allows dynamic scaling of CPU power levels and frequencies. Modern CPUs typically support computation at multiple frequencies and voltages: operating at a higher frequency, a CPU is more powerful at processing computing tasks, but consumes more power per unit of time. With the support of the OS, the primary goal of ACPI is to save energy through the scaling of CPU frequencies and voltages.
ACPI is particularly important in cloud computing scenarios
where computing demand is elastic and energy saving is
critical. Specifically, a mechanism called CPUfreq [2] is implemented in the Linux kernel, which enables the operating system to scale the CPU frequency up or down in order to save power.
To help manage the power levels, certain pre-configured power schemes are implemented in the OS. These power schemes are referred to as governors [2]. Common governors are performance, ondemand, userspace, etc. Among them, the ondemand governor is enabled by default in Linux. The ondemand governor adjusts the CPU frequencies based on how heavy the workload is: the more intensive the detected workload, the higher the frequency it scales up to; conversely, if the workload is detected to be light, the CPU frequency scales down. The detection mechanism that measures the current workload is based on sampling at particular intervals (e.g., every 10ms). If, during the last interval, the CPU usage is above a scaling threshold (e.g., the CPU is 95% busy), the frequency is scaled up to the maximum frequency; otherwise, the frequency is scaled down, one level at a time. For instance, on a single-socket Sandy Bridge machine with 6 cores and 12 CPUs, where the CPU is an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, there are 10 levels in total, ranging between 1200MHz and 2001MHz. Under the heaviest workload the CPU frequency will be 2001MHz, while the lightest workload will result in 1200MHz.
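On Linux, the current governor and frequency can be inspected through the cpufreq sysfs interface. The short Python sketch below assumes the standard per-CPU layout under /sys/devices/system/cpu/; note that scaling_available_frequencies is exposed by some drivers (e.g., acpi-cpufreq) but not all:

# Inspect the cpufreq state of cpu0; paths follow the standard Linux
# cpufreq sysfs layout (availability varies by driver and kernel).
CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq"

def read(name):
    with open(f"{CPUFREQ}/{name}") as f:
        return f.read().strip()

print("governor:    ", read("scaling_governor"))       # e.g., ondemand
print("current kHz: ", read("scaling_cur_freq"))       # e.g., 1200000
print("levels (kHz):", read("scaling_available_frequencies"))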
In this work, we found that the current ACPI-based CPU scaling mechanism, particularly the default ondemand governor, is fundamentally inappropriate for business cloud production environments. For example, in the production computing environments of Internet companies, particularly cloud computing platforms, the primary goal is to meet the SLA (Service Level Agreement) with the minimum business operation cost, rather than blindly saving energy. The current design of the governors does NOT take the SLA into consideration. Specifically, when the current performance of an application is violating the SLA (e.g., in the form of response latencies), the CPU frequencies should be scaled up, irrespective of how busy the CPU actually is. In other words, lacking any notion of the SLA, current dynamic governors may unnecessarily violate SLAs.
Not only may current governors violate SLAs, the resulting operation cost may also be unnecessarily increased. To give an example, consider the following scenario. Assume a particular application (e.g., a web service) needs to be deployed on a cluster of machines to achieve a required aggregated throughput of 200K event/s. To keep it simple, we assume a machine can host only one application instance due to certain limitations. With the default ondemand governor, each application instance delivers 10K event/s of throughput, hence we need to deploy 20 machines in order to meet our aggregated SLA of 200K event/s. However, by manually scaling up the CPU frequency, each machine could achieve higher throughput. Assuming each application instance can deliver 20K event/s with the scaled-up CPU frequency, only 10 machines are needed - a 2X saving on the number of machines! We have verified this hypothesis in our lab, as discussed in a later section. CPU energy is only part of the total energy consumption (e.g., RAM, motherboard, disks), hence minimizing the cluster size can often lead to much larger energy savings. In other words, even though running with the default governor saves energy on each individual machine, thanks to the resulting lower CPU frequencies, once the number of machines deployed is considered, the total business cost of the cluster may well exceed the cost of a smaller cluster with manually scaled-up CPUs. Furthermore, current governors may unnecessarily increase CPU power consumption. Since SLAs are businesses' primary concern, when SLAs are met, the CPU frequencies should be scaled down, irrespective of how busy the CPUs are. Subject to meeting the SLA, scaling down CPU frequencies results in less CPU energy consumption. So we argue that as long as the performance does not violate SLAs, such power-saving actions (i.e., scaling down CPU frequencies) should be taken. Unfortunately, current governors are blind to SLAs, and they scale frequencies based only on how busy the CPU is, which is a departure from typical business requirements.
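To make the cluster-sizing arithmetic of the example above concrete, the following Python sketch computes the number of machines needed under each setting, using the hypothetical per-instance throughputs from the text (the function name is ours):

from math import ceil

def machines_needed(sla_events_per_s, per_instance_events_per_s,
                    instances_per_machine=1):
    # Round up: a fractional machine must be provisioned as a whole one.
    return ceil(sla_events_per_s /
                (per_instance_events_per_s * instances_per_machine))

print(machines_needed(200_000, 10_000))  # default ondemand governor: 20
print(machines_needed(200_000, 20_000))  # manually scaled-up CPU:    10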
In this work, motivated by the weaknesses of current governors on modern OS 1, we address this problem by proposing an entirely new SLA-aware paradigm to dynamically scale CPU frequencies. We also propose an example mechanism, which is a fundamental change to how ACPI-based CPU scaling works on modern OS today. Unlike traditional ACPI mechanisms, which do not consider the business SLA requirements of the applications running on a machine, the proposed mechanism targets achieving the SLA as the top-level goal, while treating saving energy as the second-level goal. In other words, the SLA-aware algorithm will achieve the following two goals, in order of importance: (1) meeting the SLA is the primary target; when SLAs are in jeopardy, the frequency will be scaled up to meet the SLA; (2) subject to meeting SLAs, the CPU frequencies will be scaled down whenever possible.
To achieve the above goals, the proposed algorithm continuously monitors the current application performance and scales CPU frequencies accordingly. It can also consider workload properties (e.g., traffic volume changes) to further ensure the above goals by finely tuning the scaling mechanism. The algorithm can be realized in three forms: (1) designing a new governor; (2) dynamically choosing among different governors; (3) dynamically tuning a particular governor (e.g., ondemand). We will detail the solution in later sections.
1Note that we do not argue that the design of ACPI is wrong; instead, we attempt to enhance the application of ACPI by making ACPI SLA-aware.
The remainder of this paper is organized as follows. After providing the necessary technical background in Section II, we define and motivate the problems being addressed in Section III. We present the design in Section IV, and in Section V the deployment model and how to use our algorithm. We build a prototype and use it to perform a performance evaluation in Section VI. We present related work in Section VII. Finally, in Section VIII we conclude the work.
II. BACKGROUND AND SCOPE
A. Background
CPU power consumption and ACPI. The CPU is one of the major components that consume power on computing platforms. Modern CPUs can operate at different power levels, which in turn depend on the voltage and the frequency the CPU operates at. The power consumption of a CPU is largely determined by the frequency at which it operates. CPU frequencies also roughly determine how powerful the CPU is: the higher the frequency, the better the CPU's computing performance.
The Advanced Configuration and Power Interface (ACPI) [1] specification provides an open standard for device configuration and power management by the operating system. ACPI allows dynamic scaling of CPU power levels and frequencies. For the particular Linux/Intel platform we used, the CPU has a total of 12 power levels, with a minimum of 1.2GHz and a maximum of 2.0GHz. Naturally, the higher the frequency, the more power is consumed.
Governors. To help manage the power levels, certain pre-configured power schemes are implemented in the OS kernel. These power schemes are referred to as governors. For instance, several governors are available with the CPUfreq subsystem: (1) the performance governor, which sets CPU frequencies to the highest possible for maximum performance; (2) the powersave governor, which sets CPU frequencies to the lowest possible - this can have a severe impact on performance, as the system will never rise above this frequency no matter how busy the processors are; (3) the ondemand governor, which dynamically adjusts the CPU frequency. The ondemand governor monitors the CPU utilization; as soon as it exceeds a certain threshold, the governor maximizes the CPU frequency, and if the utilization is below the threshold, the next lowest frequency is used.
Figure 1. Deploying multiple instances on the same machine (ACPI ON): (a) per-instance throughput; (b) average CPU frequency; (c) average CPU usage.

B. Scope
This work assumes the availability of different CPU power levels (e.g., frequencies) and the ability to adjust the CPU level. The proposed algorithm adjusts the CPU levels based on the comparison between the pre-defined performance SLA and the current application performance, hence it requires the SLA definition and the measurement of the current performance. To simplify the presentation, we assume the SLA is defined as a single absolute value such as “throughput higher than 200KBps” or “response time lower than 100ms”.
To allow for timely adaptation of CPU levels, coarsely defined SLAs need to be converted to specific performance requirements whenever necessary. For example, an SLA may be coarsely defined as “99% of the time in a day, throughput should be higher than 200KBps”; for this SLA, the converted SLA can simply be “throughput higher than 200KBps”. As another example, for the SLA “the 99.9th percentile of response time should be smaller than 100ms”, the converted SLA can be “response time lower than 100ms”. Similarly, the current application performance needs to be measured in a timely fashion, according to the power adaptation period, e.g., every 5 seconds.
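As an illustration of this conversion, a converted SLA reduces to a single metric, a threshold, and a direction; a minimal Python sketch (all names are ours):

from dataclasses import dataclass

@dataclass
class ConvertedSLA:
    metric: str             # "throughput" or "response_time"
    threshold: float        # the single operational value compared against
    higher_is_better: bool  # True for throughput, False for response time

# "99% of the time in a day ... higher than 200KBps"
throughput_sla = ConvertedSLA("throughput", 200.0, higher_is_better=True)
# "99.9 percentile of the response time ... smaller than 100ms"
latency_sla = ConvertedSLA("response_time", 100.0, higher_is_better=False)

def meets(sla, measured):
    # Compare one timely measurement against the converted SLA.
    return measured >= sla.threshold if sla.higher_is_better \
           else measured <= sla.threshold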
III. PROBLEM DEFINITION AND MOTIVATION
SCENARIOS
A. Problem
The problem we try to address in this work is to determine the computing capacity needed to achieve a certain performance goal (e.g., aggregated throughput). After a certain SLA is set (e.g., in the form of aggregated throughput of processed events), the de facto practice in most Internet companies such as Facebook and LinkedIn is to “scale out” the computing infrastructure by parallelizing multiple deployments of the same computing component. For this, capacity planning is conducted to determine the number of nodes (i.e., machines) needed, as well as the number of computing instances deployed on a single node.
Capacity planning in business cloud computing environments is closely tied to business cost. The goal is to allocate “just enough” nodes such that the pre-determined SLA is met. For instance, LinkedIn's products such as Databus [3] can be horizontally scaled by deploying multiple instances of the product. To reduce the number of machines used, we need to know how many instances we can co-locate on the same machine. Based on these results, we can then answer further questions including: (1) given a performance requirement (e.g., an SLA), how many machines are needed; (2) given a traffic volume, how many machines are needed; etc.
During our investigations and experiments with capacity planning for Databus, we made interesting findings regarding an issue with ACPI that results in a more-than-necessary number of machines being provisioned, and hence higher business operation cost. Based on these findings, we propose an algorithm to address the issue.
B. Production experiments
In the experiments, we would like to determine the minimum number of nodes needed for a pre-determined SLA. First, we need to know the maximum aggregated throughput that can be achieved by a single node. To maximize the utilization of computing resources, multiple homogeneous instances are typically deployed on the same node.
For ease of repeatability and configurability, we used a custom-built application which mimics our Databus product. The application mimics the major internal mechanisms of Databus, while removing the dependence on production data. It consists of a pair of Java components which communicate via a TCP/IP connection. Briefly, the sending component keeps sending events (i.e., a certain number of bytes of data) to the receiving component. Upon receiving an event, the receiving component processes the event and replies back to the sender.
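The actual test application is a pair of Java components; the Python sketch below only illustrates the send/process/reply pattern described above (the event size, port handling, and one-event-per-recv framing are simplifying assumptions):

import socket, threading

EVENT = b"x" * 1024  # one "event" of a fixed number of bytes (size is ours)

def sending_component(port, events=100_000):
    s = socket.create_connection(("127.0.0.1", port))
    for _ in range(events):
        s.sendall(EVENT)   # send one event...
        s.recv(3)          # ...and wait for the receiver's reply
    s.close()

srv = socket.create_server(("127.0.0.1", 0))       # receiving component
port = srv.getsockname()[1]
threading.Thread(target=sending_component, args=(port,), daemon=True).start()
conn, _ = srv.accept()
while conn.recv(len(EVENT)):   # "process" the event, then reply
    conn.sendall(b"ack")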
Both components are deployed on the same machine: a single-socket Sandy Bridge machine with 6 cores, 12 CPUs (i.e., hardware threads), and 64GB of RAM. The CPU is an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz. The OS is Red Hat Enterprise Linux with kernel version 2.6.32-358.6.2.el6.x86_64.
The experiment began by deploying a single instance, then adding more and more homogeneous instances. We would expect the per-instance throughput to keep dropping as we add more co-located instances, due to competition for computing resources. We also expect the aggregated throughput of all deployed instances to rise with more instances, up to a particular threshold; after the threshold, the aggregated throughput will also drop. Such expectations are normal for the usual scenarios where we run multiple applications on the same machine.
C. Using the default OS config (ACPI enabled)
We first conducted the experiments with the default OS configuration. Though the aggregated throughput faithfully complies with our expectation, the per-instance throughput defies it. We know that deploying more instances will result in more resource contention (e.g., memory, CPU), and hence lower per-instance throughput. However, what we observed is just the opposite: instead of decreasing, the per-instance throughput actually keeps increasing with more instances (up to a peak performance), which is quite counter-intuitive. Specifically, Figure 1(a) displays the per-instance throughput versus the number of instances. Though from 4 instances to 10 instances the throughput drops as expected, from 1 instance to 4 instances the per-instance throughput increases quite significantly (+26%).
We found that the above observations are caused by ACPI. Briefly, under ACPI the CPU dynamically scales its frequency based on the actual load for the purpose of energy saving. When the load is light, the CPU runs at a lower speed, and hence the application delivers less throughput. When the load is high, as in the case of deploying multiple instances concurrently, the CPU runs at a higher speed, and hence delivers higher per-instance application throughput.
In Figures 1(b) and (c), we display the average CPU frequency and the CPU usage under different numbers of instances, respectively. We can see that with more instances deployed, the CPU load is higher, which causes the frequencies to go up. With more powerful CPUs, it is no wonder the per-instance throughput increases for the first 3 scenarios (i.e., up to 4 instances)!
In these scenarios, it is still true that more instances cause more resource (including CPU) contention, but the benefits gained from the increased CPU frequencies apparently outweigh the performance losses associated with the resource contention. However, as we keep increasing the number of instances beyond 4, the performance losses outweigh the gains, hence the per-instance throughput begins to drop.

Figure 2. Deploying multiple instances on the same machine (ACPI OFF): (a) per-instance throughput; (b) average CPU usage.
D. ACPI disabled
To prevent the impact of ACPI, we disable the dynamic scaling of ACPI by maximizing the CPU frequency, so that the CPUs run at full speed. We show the results in Figure 2. We can see that the performance of each instance is much higher than with ACPI on when the number of instances is lower than 5. In particular, when 2 instances are deployed, each instance achieves about 90KBps of throughput, significantly higher than the 63KBps achieved when ACPI is on. The CPU frequency is kept at 2GHz due to the disabled ACPI, and the average CPU usage keeps increasing.
E. Summary
In business cloud computing environments, capacity planning for certain applications is often needed. We found that many questions that need answering in such a context can be affected by ACPI. Careless treatment of these questions can lead to incorrect answers, and hence to wrong conclusions regarding the application's capacity and overspending on operation cost.
To understand this, let us use an example. Assume our SLA requires handling 1000 KBps of total throughput. Further assume we can deploy at most 2 instances on a single machine due to other constraints, including memory footprint. Based on our previous results with ACPI on, we might conclude that we can only achieve up to 126 KBps (63 KBps per instance * 2 instances) per machine, so we need at least 8 machines. However, by disabling ACPI and maximizing the CPU, each machine can deliver 180 KBps (90 KBps per instance * 2 instances), hence we only need 6 machines - a significant operation cost saving.
On the other hand, ACPI is blind to our performance requirements and SLA. The deployed applications might deliver unnecessarily better performance than needed. Though better performance is desirable in most scenarios, it almost always comes with higher energy consumption because of the higher CPU power levels. Stepping back, we might ask: why should we spend more energy (and business operation cost) delivering more-than-necessary performance?
IV. DESIGN
We have seen in Section III that the dynamic scaling of CPU powerfulness (i.e., ACPI) on modern machines can result in undesirable outcomes due to its unawareness of business requirements (i.e., the performance SLA). To help address such issues, in this work we propose a design and algorithm for SLA-aware dynamic CPU scaling. The algorithm primarily aims at meeting the business-determined performance SLA, while secondarily saving CPU energy subject to the SLA.
Since we will later provide several different realizations of the algorithm, and the realizations may not be limited to adjusting CPU frequencies only, we use generic terms to refer to the different types of adjustment. Specifically, we use the term scaling up to denote the action of adjusting to more powerful CPU levels, and scaling down to denote adjusting to less powerful CPU levels. We also use the term maximizing CPU to denote the adjustment that maximizes CPU powerfulness.
A. Design goal
The goal of the algorithm is to provide just enough CPU powerfulness to the workload such that the SLA is not violated. In other words, subject to the SLA, as little CPU power as possible will be consumed. Specifically, the algorithm aims to achieve the following goals:
• Top-level goal: ensuring the SLA. When the expected performance is about to violate the SLA, scale up CPU levels.
• Second-level goal: reducing energy. When the expected performance far exceeds the SLA, scale down CPU levels to reduce energy consumption.
Another relevant design goal is that the algorithm should avoid thrashing; in other words, it should stay at the current CPU powerfulness level as long as possible to avoid frequent adjustments.
B. High level design
The heart of the algorithm is the CPU scaling engine, which determines whether to scale CPU levels up or down. The decision is based on a set of factors including the current application performance and the SLA specification. The current performance is measured by a separate component, which continuously reports how the application is performing. If the current performance is worse than the SLA, the engine may decide to scale up CPU levels; otherwise, it may decide to scale down CPU levels. Given the nature of this problem, the design fits well into a control-system [4] paradigm. Though many possible algorithms can be proposed, each with varying design tradeoffs among accuracy, speed, and complexity, in this paper we present a straightforward algorithm with the goal of demonstrating the workings of the solution.
One important factor is the workload itself. For instance, the workload may follow some time-series-based shape (e.g., between 8AM and 2PM the traffic volume is increasing). Such workload trend information can be used by the engine to make even smarter decisions regarding scaling CPU levels up or down.
C. Performance monitoring
The current performance needs to be monitored in a timely fashion. The monitoring can be done continuously or based on sampling. The current performance is expressed in a way consistent with the SLA. For instance, if the SLA is in the form of response time, then the performance monitoring reports the current performance in the form of response time; similarly, it can be in the form of throughput.
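A sampling-based monitor can simply accumulate the work done and report a rate once per adaptation period, in the same units as the SLA. A minimal sketch for a throughput SLA (class and method names are ours):

import time

class ThroughputMonitor:
    """Reports throughput (KB/s) over the last adaptation period."""
    def __init__(self):
        self.bytes_done = 0
        self.period_start = time.monotonic()

    def record(self, nbytes):
        # Called by the application whenever it completes some work.
        self.bytes_done += nbytes

    def report_kbps(self):
        # Called once per adaptation period (e.g., every second).
        now = time.monotonic()
        kbps = self.bytes_done / 1024.0 / (now - self.period_start)
        self.bytes_done, self.period_start = 0, now
        return kbps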
D. Engine
To determine how well the current performance compares to the SLA, the algorithm relies on two additional threshold values, A and B.2 These two thresholds guard the headroom of the current performance in relation to the SLA, with B farther from the SLA than A. The thresholds can be in absolute form (e.g., 20KB/s and 40KB/s) or in relative form (e.g., 110% and 120%). The values of A and B are also dynamically adapted based on historical performance. Specifically, they can start from fixed values such as 110% and 120%. If during the last time period the SLA was not completely met, the values of A and B are increased by a factor of 1.5; otherwise, they can be reduced by a factor of 1.1.
Performance metrics can take different forms; for ease of presentation, we assume a throughput-based performance metric. For other performance metrics such as response time, simple adaptations can be made. For throughput-based performance, larger is better, so B is larger than A. If the current performance is larger than B, the CPU scales down. If it is between the two thresholds, the CPU level is kept as is. If it is smaller than A, the CPU scales up. If the performance does not meet the SLA, the CPU is maximized to allow the CPU to do its best job. The decision flow is shown in Figure 3.
2Though we present the algorithm using only two guard thresholds A and B, the algorithm can easily adapt to using multiple guard thresholds.
Figure 3. Flow chart of the algorithm.
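The decision step of Figure 3, together with the threshold adaptation described above, can be sketched as follows. The performance scale p is the measured performance divided by the SLA (so p < 1.0 means the SLA is violated); the 1.5 and 1.1 adaptation factors follow the text, while the function names and the floor at 1.0 when shrinking the thresholds are our assumptions:

def decide(p, A, B):
    """Map the performance scale p to one of the four actions (Figure 3)."""
    if p < 1.0:
        return "maximize"    # SLA violated: let the CPU do its best
    if p < A:
        return "scale_up"    # meeting the SLA, but with little headroom
    if p <= B:
        return "stay"        # comfortable headroom: avoid thrashing
    return "scale_down"      # far above the SLA: save energy

def adapt_thresholds(A, B, sla_fully_met):
    """Widen the guard band after an SLA miss, narrow it otherwise."""
    if not sla_fully_met:
        return A * 1.5, B * 1.5
    return max(1.0, A / 1.1), max(1.0, B / 1.1)  # floor at 1.0 is ours

# Example with the relative thresholds used later: A = 1.1, B = 1.2.
print(decide(0.9, 1.1, 1.2))   # maximize
print(decide(1.15, 1.1, 1.2))  # stay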
E. Scaling up/down CPU
The term scaling up CPU means allowing the CPU to work more aggressively. It can be implemented in several different fashions: though it typically means a higher CPU frequency, it can also be implemented by switching to a more aggressive governor (e.g., the performance governor), or by tuning the parameters of a particular governor to make it more aggressive. Similarly, scaling down CPU can also be implemented in several ways. We discuss the different types of implementation in later sections.
F. Co-locating multiple heterogeneous applications
If multiple heterogeneous applications need to be co-located on the same machine, the workings of the algorithm need to be adjusted. Since different applications may have different SLAs and performance, it could be that one application is seeing better-than-SLA performance while another is not. Such scenarios can be addressed by slightly modifying the previously presented algorithm. Though the specific forms of the solution differ, we argue for a conservative solution, in which the CPU adjustment is based on the least-performing application. For instance, CPUs are scaled down only when all applications are outperforming their SLAs.
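Under this conservative policy, the engine can simply feed the decision function with the worst performance scale across the co-located applications, so the CPU scales down only when every application outperforms its SLA. A one-line sketch, reusing decide() from the earlier sketch (the example p values are illustrative):

def colocation_scale(perf_scales):
    # perf_scales: per-application p values, e.g., {"appA": 1.3, "appB": 0.95}
    # The least-performing application drives the shared CPU decision.
    return min(perf_scales.values())

action = decide(colocation_scale({"appA": 1.3, "appB": 0.95}), A=1.1, B=1.2)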
V. DEPLOYMENT AND USAGE
The proposed algorithm can be realized and deployed in several embodiments: (1) as a new governor that directly controls the adjustment of CPU frequencies; (2) as a new governor that aggregates existing governors and dynamically switches among them on the fly; (3) as an improved version of a current governor that dynamically tunes the governor's tunable parameters.
New governor directly controlling CPU frequencies. The most straightforward way is to add a new governor built from scratch. The new governor is exposed to the OS as one element of the governor set and has no dependency on existing governors. Inside the governor, the adaptation of CPU frequencies is based purely on the algorithm and the current performance compared to the SLA.
New governor aggregating existing governors. The algorithm can also be implemented as a new governor built on top of existing governors. The currently available governors have different levels of aggressiveness with regard to CPU power usage and performance: the performance governor is the most aggressive, the powersave governor is the least aggressive, and the other governors sit between these two extremes. Hence, these different governors can be aggregated by the new governor, which sets the system to a particular governor based on the algorithm. However, this deployment depends on the existence of the other utilized governors, hence cannot be deployed by itself.
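A sketch of this aggregating embodiment: map each action of the algorithm to one of the stock governors and switch with the cpupower utility (the specific action-to-governor mapping is our assumption, and cpupower must be installed):

import subprocess

ACTION_TO_GOVERNOR = {          # from least to most aggressive behavior
    "scale_down": "powersave",
    "stay":       "ondemand",
    "scale_up":   "performance",
    "maximize":   "performance",
}

def apply_by_governor(action):
    # Switch all CPUs to the chosen stock governor.
    gov = ACTION_TO_GOVERNOR[action]
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-g", gov],
                   check=True)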
Improved version of governors. The algorithm can also be implemented as an improved version of an existing governor that exposes tunable parameters, whose tuning knobs can be adjusted dynamically. For instance, the ondemand governor has the up-threshold knob, which controls when the CPU is scaled up, and the sampling-interval knob, which determines how frequently sampling is done. The algorithm can be embedded into the existing governor and serve as a new version of that particular governor.
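A sketch of this third embodiment: instead of replacing ondemand, nudge its knobs so it behaves more or less aggressively. On many kernels the tunables live under /sys/devices/system/cpu/cpufreq/ondemand/ (the exact path and valid ranges vary by kernel version, and the chosen threshold values are illustrative, not prescriptive):

ONDEMAND = "/sys/devices/system/cpu/cpufreq/ondemand"

def tune_ondemand(action):
    # A lower up_threshold makes ondemand scale frequencies up sooner.
    up_threshold = {"maximize": 20, "scale_up": 50,
                    "stay": 80, "scale_down": 95}[action]
    with open(f"{ONDEMAND}/up_threshold", "w") as f:
        f.write(str(up_threshold))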
VI. EVALUATION
A. Prototypes
As noted in Section V, the algorithm can be realized in several ways. We built a prototype and used it to verify the workings of our algorithm. The prototype is implemented as a new governor which directly controls the CPU frequencies; we refer to this prototype as new-governor.
The prototype is written in Python and utilizes the CPUfreq subsystem, which allows control of a set of parameters including the governor selection. It issues the command “cpupower -c all frequency-set -f freq”.
For this new-governor prototype, every 100MHz increment is treated as a new CPU power level. For instance, for a CPU with frequencies ranging from 1200MHz to 2000MHz, there are 10 levels in total. Scaling up the CPU means going to the next power level with a 100MHz-higher CPU frequency, scaling down means going 100MHz lower, and maximizing the CPU means going to 2001MHz.
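Putting the prototype's actuation together: each action moves the target frequency by one 100MHz level (or jumps to the maximum) and applies it with the cpupower command quoted above. A sketch under those assumptions (function names are ours):

import subprocess

MIN_MHZ, MAX_MHZ, STEP_MHZ = 1200, 2001, 100

def set_frequency(mhz):
    # "cpupower -c all frequency-set -f <freq>", as used by the prototype.
    subprocess.run(["cpupower", "-c", "all", "frequency-set",
                    "-f", f"{mhz}MHz"], check=True)

def apply(action, current_mhz):
    if action == "maximize":
        target = MAX_MHZ
    elif action == "scale_up":
        target = min(current_mhz + STEP_MHZ, MAX_MHZ)  # capped at 2001MHz
    elif action == "scale_down":
        target = max(current_mhz - STEP_MHZ, MIN_MHZ)
    else:  # "stay"
        target = current_mhz
    if target != current_mhz:
        set_frequency(target)
    return target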
The workload is a Java-based application which keeps allocating user-specified objects and removes the oldest objects once the number of objects reaches a threshold. It periodically outputs the actual object allocation throughput achieved during the last period. The throughput is compared to the pre-defined SLA level, and based on the comparison result the algorithm decides on one of 4 actions: scaling up, scaling down, maximizing CPU, or staying unchanged. These actions then map to specific steps in each prototype.
Figure 4. Baseline results (ondemand governor), time in seconds: (a) actual application throughput; (b) average CPU frequencies (MHz); (c) average CPU usage.
B. Experiment setup
To evaluate the capability of the prototype to adapt to different workloads, we vary the workload's traffic intensity. By intensity, we mean the CPU-intensiveness of the workload. Specifically, we split the entire experiment into 3 segments of equal duration. For the first segment, the workload is set to the regular intensity; for the second segment, the workload is set to 80% of the regular intensity; and for the last segment, the workload is set to 120% of the regular intensity.
We set the performance SLA to be 60KB/s of throughput. The performance monitoring component continuously obtains the current application performance. The current throughput is divided by the SLA to obtain a performance scale (denoted by p): p > 1.0 means the performance is better than the SLA, while p < 1.0 means the performance is below the SLA. We set A to 1.1 and B to 1.2. The algorithm is executed once per second.
C. Baseline results
We first show the baseline results using the default ondemand governor and its default parameters; specifically, the up-threshold is 95%. We show only a tiny snapshot of the experimented period for the purpose of a closer examination of the data points. The throughput is shown in Figure 4(a). We can see that the performance meets the SLA, with throughput higher than 60KB/s, only about 30% of the time. In other words, the SLA is met only in the segment where the workload is 80% of the regular intensity.
Figures 4(b) and (c) show the average CPU frequencies set by the ondemand governor and the corresponding CPU usage. The average CPU frequency is barely above the lowest frequency of 1200MHz, indicating that the CPUs are operating at the lowest powerfulness level, completely unsympathetic to the violated SLA. CPU usage is comparatively low (e.g., 34%), which showcases the irony that the SLA is not met while most of the CPU is idle.
Figure 5. New-governor results, time in seconds: (a) actual throughput of new-governor; (b) average CPU frequencies (MHz); (c) average CPU usage.
D. New-governor prototype results
For the implemented new-governor prototype, we show the actual throughput in Figure 5(a). We see that the throughput always exceeds the SLA, irrespective of the workload intensity.
Figure 5(b) shows the average CPU frequencies set by the new-governor. We see that when the second segment, with lower workload intensity, kicks in, the CPU frequency is automatically scaled down to save energy; when the third segment, with a heavier workload, comes, the CPU frequency is scaled up accordingly - all with the SLA being nicely met. Figure 5(c) displays the corresponding CPU usage. We see that in the second segment, because of the lowered CPU levels, the CPU usage increases.
VII. RELATED WORK
A. Power consumption studies
Many works have studied the power consumption of various computing platforms. In particular, [5] breaks down the power consumption of the different computing components of a laptop. Work [6] analyzes the energy-time tradeoff in high-performance computing environments. Work [7] studies the impact of dynamically adjusting voltages and frequencies on web servers and concludes that dynamic scaling significantly reduces power consumption for such servers. [8] addresses the challenge of power management in heterogeneous multi-tier web clusters and proposes algorithms to save more energy. Embedded real-time systems present unique challenges in the context of power saving, and work [9] explicitly takes system reliability into consideration. Smartphones increasingly use multi-core CPUs, where saving power is even more critical [10]. Moreover, energy saving problems in cloud environments and data centers are studied in [11], [12].
Our work does not disagree with these works. On the contrary, we strongly believe that ACPI (and dynamic scaling of CPU power levels) saves energy in many deployment scenarios. However, based on experience and observations from LinkedIn's production environments, we identify certain issues in business computing environments where SLAs are of higher priority than energy saving. We then present our SLA-aware algorithm specifically for such environments.
B. ACPI Implementations and Impact
The Advanced Configuration and Power Interface (ACPI) [1], [13] specification is an open standard, and several flavors have been implemented on different OSes. For instance, [14] implements ACPI on FreeBSD. Work [15] studies the energy-performance tradeoff of multi-threaded applications and proposes a set of models to estimate the performance slowdown of ACPI.
VIII. CONCLUSION
In this work, we demonstrate the weaknesses that conventionally implemented ACPI-based dynamic CPU scaling mechanisms on modern OS can expose in the face of business SLAs. We then propose an SLA-aware algorithm that prioritizes SLA requirements when adjusting CPU levels. The algorithm also saves energy when the SLA is met.
REFERENCES
[1] “Advanced Configuration and Power Interface (ACPI),” http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface.
[2] “CPU frequency scaling,” https://wiki.archlinux.org/index.php/CPU_frequency_scaling.
[3] S. Das, C. Botev, et al., “All aboard the Databus!: LinkedIn's scalable consistent change data capture platform,” ser. SoCC '12, New York, NY, USA, 2012.
[4] “Control system,” http://en.wikipedia.org/wiki/Control_system.
[5] A. Mahesri and V. Vardhan, “Power consumption breakdown on a modern laptop,” in Proceedings of the 4th International Conference on Power-Aware Computer Systems, ser. PACS'04, Berlin, Heidelberg, 2005.
[6] V. W. Freeh, D. K. Lowenthal, F. Pan, N. Kappiah, R. Springer, B. L. Rountree, and M. E. Femal, “Analyzing the energy-time trade-off in high-performance computing applications,” IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 6, pp. 835-848, Jun. 2007.
[7] P. Bohrer, E. N. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, C. McDowell, and R. Rajamony, “Power aware computing,” R. Graybill and R. Melhem, Eds. Norwell, MA, USA: Kluwer Academic Publishers, 2002, ch. The Case for Power Management in Web Servers.
[8] P. Wang, Y. Qi, and X. Liu, “Power-aware optimization for heterogeneous multi-tier clusters,” J. Parallel Distrib. Comput., vol. 74, no. 1, Jan. 2014.
[9] D. Zhu, “Reliability-aware dynamic energy management in dependable embedded real-time systems,” ACM Trans. Embed. Comput. Syst., vol. 10, no. 2, Jan. 2011.
[10] Y. Zhang, X. Wang, X. Liu, Y. Liu, L. Zhuang, and F. Zhao, “Towards better CPU power management on multicore smartphones,” in Proceedings of the Workshop on Power-Aware Computing and Systems, ser. HotPower '13. New York, NY, USA: ACM, 2013.
[11] J. Yu, Z. Hu, N. N. Xiong, H. Liu, and Z. Zhou, “An energy conservation replica placement strategy for Dynamo,” J. Supercomput., vol. 69, no. 3, Sep. 2014.
[12] Z. Guo, Z. Duan, Y. Xu, and H. J. Chao, “JET: Electricity cost-aware dynamic workload management in geographically distributed datacenters,” Comput. Commun., vol. 50, Sep. 2014.
[13] L. Duflot, O. Levillain, and B. Morin, “ACPI: Design principles and concerns,” in Proceedings of the 2nd International Conference on Trusted Computing, ser. Trust '09. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 14-28.
[14] T. Watanabe, “ACPI implementation on FreeBSD,” in Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, Berkeley, CA, USA, 2002, pp. 121-131.
[15] S. Park, W. Jiang, Y. Zhou, and S. Adve, “Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures,” SIGMETRICS Perform. Eval. Rev., vol. 35, no. 1, Jun. 2007.

More Related Content

DOC
Windows server power_efficiency___robben_and_worthington__final
PDF
Understanding software licensing with IBM Power Systems PowerVM virtualization
PDF
Reducing tco white paper rev5
PDF
An energy, memory, and performance analysis
PPTX
Presentation oracle on power power advantages and license optimization
PPT
IMSBufferpool Tuning concept AMS presentation v01
PDF
Open compute technology
 
PDF
IBM i Performance management and performance data collectors june 2012
Windows server power_efficiency___robben_and_worthington__final
Understanding software licensing with IBM Power Systems PowerVM virtualization
Reducing tco white paper rev5
An energy, memory, and performance analysis
Presentation oracle on power power advantages and license optimization
IMSBufferpool Tuning concept AMS presentation v01
Open compute technology
 
IBM i Performance management and performance data collectors june 2012

What's hot (15)

PDF
Intel speed-select-technology-base-frequency-enhancing-performance
PDF
Parallel Sysplex Performance Topics
PDF
AMD PowerTune Technology on Workstation Graphics
 
PPT
6dec2011 - Power Back-up_1
PDF
Capacity Planning for Virtualized Datacenters - Sun Network 2003
PDF
A Brief Survey of Current Power Limiting Strategies
PDF
Efficient Data Center Virtualization with QLogic 10GbE Solutions from HP
PDF
eNlight- Intelligent Cloud Computing Platform
PDF
Introduction to eNlight Cloud Computing Platform
PDF
Arsys at hp discovery emea 2011
PDF
Deview 2013 rise of the wimpy machines - john mao
PDF
over_provisioning_m600_for_data_center_apps_tech_brief
PPTX
Data center Technologies
 
PPTX
AMD Opteron 4000 Series Platform Press Presentation
 
PDF
Much Ado About CPU
Intel speed-select-technology-base-frequency-enhancing-performance
Parallel Sysplex Performance Topics
AMD PowerTune Technology on Workstation Graphics
 
6dec2011 - Power Back-up_1
Capacity Planning for Virtualized Datacenters - Sun Network 2003
A Brief Survey of Current Power Limiting Strategies
Efficient Data Center Virtualization with QLogic 10GbE Solutions from HP
eNlight- Intelligent Cloud Computing Platform
Introduction to eNlight Cloud Computing Platform
Arsys at hp discovery emea 2011
Deview 2013 rise of the wimpy machines - john mao
over_provisioning_m600_for_data_center_apps_tech_brief
Data center Technologies
 
AMD Opteron 4000 Series Platform Press Presentation
 
Much Ado About CPU
Ad

Viewers also liked (20)

PPTX
Accelerate Native Advertising Using Rich Media On Social
PDF
Building Cloud-ready Video Transcoding System for Content Delivery Networks (...
PDF
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
PDF
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
PDF
Mutual Exclusion in Wireless Sensor and Actor Networks
PDF
Libro 30-ideas
PDF
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
PDF
Leveraging Global Events to Reach Your Social Audience
PDF
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
PDF
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
PDF
Chon ram-may-tinh
PDF
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
PDF
Rich Social: Not Your Grandfather's Rich Media
PDF
A Distributed Approach to Solving Overlay Mismatching Problem
PDF
Optimizing Streaming Server Selection for CDN-delivered Live Streaming
PDF
Hazard avoidance in wireless sensor and actor networks
PDF
Real time social media marketing in action
PPTX
The State of Social Rich Media
PDF
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
PDF
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
Accelerate Native Advertising Using Rich Media On Social
Building Cloud-ready Video Transcoding System for Content Delivery Networks (...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
Mutual Exclusion in Wireless Sensor and Actor Networks
Libro 30-ideas
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
Leveraging Global Events to Reach Your Social Audience
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
Chon ram-may-tinh
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Rich Social: Not Your Grandfather's Rich Media
A Distributed Approach to Solving Overlay Mismatching Problem
Optimizing Streaming Server Selection for CDN-delivered Live Streaming
Hazard avoidance in wireless sensor and actor networks
Real time social media marketing in action
The State of Social Rich Media
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
Ad

Similar to SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments (20)

PDF
HKG15-107: ACPI Power Management on ARM64 Servers (v2)
PPTX
Energy Efficiency in Large Scale Systems
PDF
Linux Power Management Slideshare
PDF
Runtime Methods to Improve Energy Efficiency in HPC Applications
PPTX
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
PDF
CloudComputing_UNIT5.pdf
PDF
Energy efficient-resource-allocation-in-distributed-computing-systems
PDF
ACSD2016paper20_04013
PPTX
참여기관_발표자료-국민대학교 201301 정기회의
PDF
BKK16-317 How to generate power models for EAS and IPA
PDF
BKK16-TR08 How to generate power models for EAS and IPA
PDF
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
PPTX
Optimizing High Performance Computing Applications for Energy
PDF
Power management
PDF
E03403027030
PDF
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
PPTX
Computer Architecture and Organization
PDF
Parallel and Distributed Computing Chapter 9
PDF
BKK16-104 sched-freq
PDF
LCU14-410: How to build an Energy Model for your SoC
HKG15-107: ACPI Power Management on ARM64 Servers (v2)
Energy Efficiency in Large Scale Systems
Linux Power Management Slideshare
Runtime Methods to Improve Energy Efficiency in HPC Applications
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
CloudComputing_UNIT5.pdf
Energy efficient-resource-allocation-in-distributed-computing-systems
ACSD2016paper20_04013
참여기관_발표자료-국민대학교 201301 정기회의
BKK16-317 How to generate power models for EAS and IPA
BKK16-TR08 How to generate power models for EAS and IPA
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
Optimizing High Performance Computing Applications for Energy
Power management
E03403027030
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
Computer Architecture and Organization
Parallel and Distributed Computing Chapter 9
BKK16-104 sched-freq
LCU14-410: How to build an Energy Model for your SoC

More from Zhenyun Zhuang (15)

PDF
Designing SSD-friendly Applications for Better Application Performance and Hi...
PDF
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
PDF
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
PDF
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
PDF
WebAccel: Accelerating Web access for low-bandwidth hosts
PDF
Client-side web acceleration for low-bandwidth hosts
PDF
A3: application-aware acceleration for wireless data networks
PDF
Dynamic Layer Management in Super-Peer Architectures
PDF
Enhancing Intrusion Detection System with Proximity Information
PDF
Optimizing JMS Performance for Cloud-based Application Servers
PDF
Capacity Planning and Headroom Analysis for Taming Database Replication Latency
PDF
OS caused Large JVM pauses: Deep dive and solutions
PDF
Wireless memory: Eliminating communication redundancy in Wi-Fi networks
PDF
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
PDF
Improving energy efficiency of location sensing on smartphones
Designing SSD-friendly Applications for Better Application Performance and Hi...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
WebAccel: Accelerating Web access for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hosts
A3: application-aware acceleration for wireless data networks
Dynamic Layer Management in Super-Peer Architectures
Enhancing Intrusion Detection System with Proximity Information
Optimizing JMS Performance for Cloud-based Application Servers
Capacity Planning and Headroom Analysis for Taming Database Replication Latency
OS caused Large JVM pauses: Deep dive and solutions
Wireless memory: Eliminating communication redundancy in Wi-Fi networks
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
Improving energy efficiency of location sensing on smartphones

Recently uploaded (20)

PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
composite construction of structures.pdf
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
additive manufacturing of ss316l using mig welding
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
PPT on Performance Review to get promotions
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Geodesy 1.pptx...............................................
PPT
Project quality management in manufacturing
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Operating System & Kernel Study Guide-1 - converted.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Internet of Things (IOT) - A guide to understanding
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
CH1 Production IntroductoryConcepts.pptx
composite construction of structures.pdf
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
additive manufacturing of ss316l using mig welding
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPT on Performance Review to get promotions
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Geodesy 1.pptx...............................................
Project quality management in manufacturing

SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments

  • 1. SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments Zhenyun Zhuang, Haricharan Ramachandra, Badri Sridharan 2029 Stierlin Ct, Mountain View, CA 94043, USA {zzhuang, hramachandra, bsridharan}@linkedin.com Abstract—Modern cloud computing platforms (e.g. Linux on Intel CPUs) feature ACPI-based (Advanced Configuration and Power Interface) mechanism, which dynamically scales CPU frequencies/voltages to adjust the CPU frequencies based on the workload intensity. With this feature, CPU frequency is reduced when the workload is relatively light in order to save energy; while increased when the workload intensity is relatively high. In business cloud computing environments, software prod- ucts/services often need to “scale out” to multiple machines to form a cluster to achieve a pre-defined aggregated performance goal (e.g., SLA-devised throughput). To reduce business opera- tion cost, minimizing the provisioned cluster size is critical. However, as we show in this work, the working of ACPI in today’s modern OS may result in more machines being provisioned, hence higher business operation cost, To deal with this problem, we propose a SLA-aware CPU scaling algorithm based on business SLA (Service Level Agree- ment aware). The proposed design rational and algorithm are a fundamental rethinking of how ACPI mechanisms should be implemented in business cloud computing environments. Con- trary to the current forms of ACPI which simply adapt CPU power levels only based on workload intensity, the proposed SLA-aware algorithm is primarily based on current application performance relative to the pre-defined SLA. Specifically, the algorithm targets at achieving the pre-defined SLA as the top- level goal, while saving energy as the second-level goal. Keywords-ACPI; Power saving; Service level agreements; Performance I. INTRODUCTION Advanced Configuration and Power Interface (ACPI) [1] provides standards for power management by OS. ACPI allows dynamic scaling of CPU power levels and frequen- cies. Modern CPUs typically allow the computations under multiple CPU frequencies and voltages. Operating at higher frequency, a CPU is more powerful processing computing tasks, while more power will be consumed per unit of time. With the support of OS, the primary goal of ACPI is to save energy through the CPU scaling of frequencies and voltages. ACPI is particularly important in cloud computing scenarios where computing demand is elastic and energy saving is critical. Specifically, a mechanism called CPUfreq [2] is implemented in Linux kernel, which enables the operating system to scale the CPU frequency up or down in order to save power. To help managing the power levels, certain pre-configured power schemes are implemented in OS. These power schemes are referred to as governors [2]. Common governors are performance, ondemand, userspace, etc. Among them, the ondemand governor is enabled by default in Linux. Ondemand governor adjusts the CPU frequencies based on how heavy the workload is. The more intensive workload is detected, the higher frequency will be scaled up to. On the other hand, if the workload is detected to be light, the CPU frequency scales down. The detection mechanism that measures the current workload is based on sampling of par- ticular intervals (e.g., every 10ms). During the last interval, if the CPU usage is above a scaling threshold (e.g., CPU 95% busy), the frequency will be scaled up to maximum frequency; otherwise, the frequency will be scaled down, one level at a time. 
For instance, on a machine with Sandy Bridge single socket machine with 6 cores and 12 CPUs, and the CPU is Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, there are totally 10 levels, ranging between 1200MHz and 2001MHz. With the heaviest workload, the CPU frequency will be 2001MHz, while the lightest workload will result in 1200MHz. In this work, we found that current ACPI-based CPU scal- ing mechanism, particularly the default ondemand governor, is fundamentally inappropriate to business cloud production environments. For example, in the production computing environments of Internet companies, particularly the cloud computing platforms, the primary goal is to meet the SLA (Service Level Agreement) with the minimum business op- eration cost, rather than blindly saving energy. The current design of the governors does NOT take into considerations the SLA part. Specifically, when the current performance of an application is violating the SLA (e.g., in the form of response latencies), the CPU frequencies should be scaled up, irrespective how busy the CPU actually is. In other words, without the gene of SLA considerations, current dynamic governors may unnecessarily violate the SLAs. Not only may current governors violate SLAs, the resulted operation cost may also be unnecessarily increased. To give an example, let us consider the following scenario. Assuming a particular application (e.g., a web service) needs to be deployed on a cluster of machines to achieve a required aggregated throughput of 200K event/s. To make it simple, we assume a machine can only deploy one application instance due to certain limitations. With the default on- demand governor, each application instance delivers 10K event/s throughput, hence we need to deploy 20 machines in order to meet our aggregated SLA of 200K event/s.
  • 2. However, by manually scaling up the CPU frequency, the machine could achieve higher throughput. Assuming each application instance can delivery 20K event/s with scaled- up CPU frequency, only 10 machines are needed - a 2X saving on the number of machines! We have verified such a hypothesis in our lab, which will be discussed in later section. CPU consumed energy is only part of the entire energy consumption (e.g., RAM, motherboard, disks), hence minimizing the cluster size oftentimes can lead to much more energy saving. In other words, even though running with the default governor can save the energy on each individual machines, thanks to the lower CPU frequencies resulted, considering the number of machines deployed, the total business cost of the cluster may well exceeds the cost of a smaller cluster with manually scaled-up CPU. Furthermore, current governors may unnecessarily increase the CPU power consumption. Since SLAs are businesses’ primary concerns, when SLAs are met, the CPU frequen- cies should be scaled down, irrespective of how busy the CPUs are. Subjecting to meeting SLA, scaling down CPU frequencies will result in less CPU energy consumption. So we argue that as long as the performance does not violate SLAs, such power-saving actions (i.e., scaling down CPU frequencies) should be taken. Unfortunately, current governors are blind to SLAs, and they only scale frequencies based on how busy the CPU is, which is a runaway from typical business requirements. In this work, motivated by the weaknesses of current governors on modern OS 1, we address this problem by proposing an entirely new SLA-aware paradigm to dynami- cally scale CPU frequencies. We also proposed an example mechanism, which is a fundamental change of how ACPI- based CPU scaling works on modern OS today. Unlike traditional ACPI mechanisms which do not consider business SLA requirements of the applications running on a machine, the proposed mechanism targets at achieving the SLA as the top-level goal, while treating saving energy as the second- level goal. In other words, the SLA-aware algorithm will achieve the following two goals, in the order of importance of: (1) Meeting SLA is the primary target; when SLAs are in jeopardy, the frequency will be scaled up to meet SLA; (2) Subject to meeting SLAs, the CPU frequencies will be scaled down whenever possible. To achieve the above goals, the proposed algorithm con- tinuously monitors the current application performance, and scales CPU frequencies accordingly. It can also consider the workload properties (e.g., traffic volume changes) to further ensure the above goals by finely tuning the scaling mechanism. The algorithm can be realized in three forms: (1) designing a new governor; (2) dynamically choosing different governors; (3) dynamically tuning a particular 1Note that we do not argue that the design of ACPI is wrong, instead, we attempt to enhance the application of ACPI by making ACPI SLA-aware. governor (e.g. ondemand). We will detail the solution in later sections. For the remainder of the writing, after providing some necessary technical background in section II, we then define and motivate the problems being addressed in this writing in Section III. We present the designs in Section IV and present in Section V the deployment model and how to use our algorithm. We build a prototype and perform performance evaluation using the prototype in Section VI. We also present certain related works in Section VII. Finally in Section VIII we conclude the work. II. 
II. BACKGROUND AND SCOPE

A. Background

CPU power consumption and ACPI. The CPU is one of the major power-consuming components of a computing platform. Modern CPUs can operate at different power levels, which depend on the voltage and the frequency at which the CPU operates. The power consumption of a CPU is roughly linear in its operating frequency. CPU frequencies also roughly determine how powerful the CPU is: the higher the frequency, the better the CPU's computing performance.

The Advanced Configuration and Power Interface (ACPI) [1] specification provides an open standard for device configuration and power management by the operating system, and allows dynamic scaling of CPU power levels and frequencies. On the particular Linux/Intel platform we used, the CPU has 10 power levels in total, with a minimum of 1.2GHz and a maximum of 2.0GHz. Naturally, the higher the frequency, the more power is consumed.

Governors. To help manage the power levels, certain pre-configured power schemes are implemented in the OS kernel; these schemes are referred to as governors. For instance, several governors are available with the CPUfreq subsystem: (1) the performance governor, which sets CPU frequencies to the highest possible value for maximum performance; (2) the powersave governor, which sets CPU frequencies to the lowest possible value - this can severely impact performance, as the system never rises above this frequency no matter how busy the processors are; (3) the ondemand governor, which adjusts the CPU frequency dynamically: it monitors CPU utilization, and as soon as utilization exceeds a certain threshold the governor maximizes the CPU frequency; if utilization stays below the threshold, the next lowest frequency is used.
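A minimal Python sketch of inspecting the governor and frequency currently in effect, assuming the standard Linux CPUfreq sysfs layout (the /sys/devices/system/cpu paths below are the usual locations, though they may differ across kernels):

from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def read(name: str) -> str:
    # Each CPUfreq attribute is exposed as a small text file.
    return (CPUFREQ / name).read_text().strip()

print("available governors:", read("scaling_available_governors"))
print("active governor:", read("scaling_governor"))
print("current frequency (MHz):", int(read("scaling_cur_freq")) // 1000)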
[Figure 1. Deploying multiple instances on the same machine (ACPI ON): (a) per-instance throughput; (b) average CPU frequency; (c) average CPU usage.]

B. Scope

This work assumes the availability of multiple CPU power levels (e.g., frequencies) and the ability to adjust the level in effect. The proposed algorithm adjusts the CPU levels based on a comparison between the pre-defined performance SLA and the current application performance; hence it requires both an SLA definition and a measurement of the current performance. To simplify the presentation, we assume the SLA is defined as a single absolute value, such as "throughput higher than 200KBps" or "response time lower than 100ms". To allow for timely adaptation of CPU levels, coarsely defined SLAs need to be converted to specific performance requirements whenever necessary. For example, an SLA may be coarsely defined as "99% of the time in a day, throughput should be higher than 200KBps"; the converted SLA can simply be "throughput higher than 200KBps". As another example, for the SLA "the 99.9th percentile of response time should be smaller than 100ms", the converted SLA can be "response time lower than 100ms". Similarly, the current application performance needs to be measured in step with the power adaptation period, e.g., every 5 seconds.
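A minimal sketch of such a per-period check for a latency SLA; the helper names are ours, not from the paper, and the nearest-rank percentile is one simple choice among many:

def percentile(samples: list[float], pct: float) -> float:
    # Nearest-rank percentile; good enough for a monitoring check.
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(pct / 100.0 * len(ordered)))
    return ordered[idx]

def meets_latency_sla(latencies_ms: list[float],
                      limit_ms: float = 100.0, pct: float = 99.9) -> bool:
    """True if the pct-th percentile latency of the last period is within limit."""
    return percentile(latencies_ms, pct) <= limit_ms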
III. PROBLEM DEFINITION AND MOTIVATION SCENARIOS

A. Problem

The problem we address in this work is determining the computing capacity needed to achieve a given performance goal (e.g., an aggregated throughput). Once an SLA is set (e.g., in the form of an aggregated throughput of processed events), the de facto practice in most Internet companies, such as Facebook and LinkedIn, is to "scale out" the computing infrastructure by parallelizing multiple deployments of the same computing component. For this, capacity planning is conducted to determine the number of nodes (i.e., machines) needed, as well as the number of computing instances deployed on a single node.

Capacity planning in business cloud computing environments is closely tied to business cost. The goal is to allocate "just enough" nodes such that the pre-determined SLA is met. For instance, LinkedIn products such as Databus [3] can be horizontally scaled by deploying multiple instances of the product. To reduce the number of machines used, we need to know how many instances we can co-locate on the same machine. Based on these results, we can then answer further questions, including: (1) given a performance requirement (e.g., an SLA), how many machines are needed; (2) given a traffic volume, how many machines are needed; etc.

During our investigations and experiments with capacity planning for Databus, we made interesting findings regarding an issue with ACPI that results in more machines being provisioned than necessary, hence higher business operation cost. Based on these findings, we propose an algorithm to address the issue.

B. Production experiments

In the experiments, we want to determine the minimum number of nodes needed for a pre-determined SLA. First, we need to know the maximum aggregated throughput that can be achieved by a single node. To maximize the utilization of computing resources, multiple homogeneous instances are typically deployed on the same node.

For ease of repeatability and configurability, we used a custom-built application that mimics our Databus product. The application mimics the major internal mechanisms of Databus while removing the dependence on production data. It consists of a pair of Java components that communicate via a TCP/IP connection: the sending component keeps sending events (i.e., chunks of bytes) to the receiving component; upon receiving an event, the receiving component processes it and replies to the sender.
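A rough Python stand-in for this sender/receiver pair (the actual benchmark is Java); the event size, port, and ack protocol below are our illustrative assumptions, and each recv() is assumed to return one whole event, a simplification a real benchmark would not make:

import socket
import threading
import time

EVENT = b"x" * 1024                 # one "event": a fixed chunk of bytes
ADDR = ("127.0.0.1", 9999)          # the port is an arbitrary choice

def receiver() -> None:
    srv = socket.create_server(ADDR)
    conn, _ = srv.accept()
    # Simplification: assume each recv() returns exactly one whole event.
    while conn.recv(len(EVENT)):
        conn.sendall(b"ack")        # "process" the event, reply to the sender

def sender(n_events: int = 10_000) -> None:
    with socket.create_connection(ADDR) as c:
        for _ in range(n_events):
            c.sendall(EVENT)
            c.recv(3)               # wait for the reply before the next event

threading.Thread(target=receiver, daemon=True).start()
time.sleep(0.5)                     # give the receiver time to start listening
sender()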
Both components are deployed on the same machine. The machine is a single-socket Sandy Bridge machine with 6 cores, 12 CPUs (i.e., hardware threads), and 64GB of RAM. The CPU is Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz. The OS is Red Hat Enterprise Linux with kernel version 2.6.32-358.6.2.el6.x86_64.

The experiments began by deploying a single instance and then adding more and more homogeneous instances. We would expect the per-instance throughput to keep dropping as we add co-located instances, due to competition for computing resources. We would also expect the aggregated throughput of all deployed instances to rise with more instances, up to a particular threshold, after which the aggregated throughput drops as well. Such expectations are normal for scenarios where multiple applications run on the same machine.

C. Using the default OS configuration (ACPI enabled)

We first conducted the experiments with the default OS configuration. Though the aggregated throughput faithfully complies with our expectations, the per-instance throughput defies them. Deploying more instances should result in more resource contention (e.g., memory, CPU), hence lower per-instance throughput. What we observed is just the opposite: instead of decreasing, the per-instance throughput keeps increasing with more instances (up to a peak), which is quite counter-intuitive. Specifically, Figure 1(a) displays the per-instance throughput versus the number of instances. Though from 4 to 10 instances the throughput drops as expected, from 1 to 4 instances the per-instance throughput increases quite significantly (+26%).

We found that these observations are caused by ACPI. Under ACPI, the CPU dynamically scales its frequency based on the actual load for the purpose of energy saving. When the load is light, the CPU runs at a lower speed, and the application delivers less throughput. When the load is high, as when multiple instances are deployed concurrently, the CPU runs at a higher speed and delivers higher per-instance throughput. Figures 1(b) and (c) display the average CPU frequency and the CPU usage under different numbers of instances, respectively. With more instances deployed, the CPU load is higher, which drives the frequencies up. With more powerful CPUs, no wonder the per-instance throughput increases for the first three scenarios (i.e., up to 4 instances)! In these scenarios it is still true that more instances cause more resource (including CPU) contention, but the benefit of the increased CPU frequencies apparently outweighs the performance loss from the contention. However, as we increase the number of instances beyond 4, the losses outweigh the gains, and the per-instance throughput begins to drop.

[Figure 2. Deploying multiple instances on the same machine (ACPI OFF): (a) per-instance throughput; (b) average CPU usage.]

D. ACPI disabled

To eliminate the impact of ACPI, we disabled its dynamic scaling by pinning the CPU frequency at the maximum, so the CPUs run at full speed. The results are shown in Figure 2. The performance of each instance is much higher than with ACPI on when the number of instances is below 5. In particular, when 2 instances are deployed, each instance achieves about 90KBps of throughput, significantly higher than the 63KBps achieved when ACPI is on. The CPU frequency stays at 2GHz due to the disabled ACPI, and the average CPU usage keeps increasing.
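One way the pinning can be done, sketched in Python around the cpupower utility (the same command our prototype in Section VI issues); requires root, and 2001MHz is the top level reported on our testbed:

import subprocess

def pin_max_frequency(freq: str = "2001MHz") -> None:
    # Fix every CPU at the given frequency, disabling dynamic scaling.
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-f", freq],
                   check=True)

pin_max_frequency()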
E. Summary

In business cloud computing environments, capacity planning for applications is often needed. We found that many questions that need answering in this context can be affected by ACPI. Careless treatment of these questions leads to incorrect answers, hence wrong conclusions about an application's capacity, and ultimately over-spending on operation cost. To illustrate, consider an example. Assume our SLA requires handling 1000KBps of total throughput, and that we can deploy at most 2 instances on a single machine due to other constraints, including memory footprint. Based on our earlier ACPI-on results, we might conclude that we can achieve at most 126KBps (63KBps per instance * 2 instances) with
a machine, so we need at least 8 machines. However, by disabling ACPI and maximizing the CPU frequency, each machine can deliver 180KBps (90KBps per instance * 2 instances), hence we only need 6 machines - a significant operation cost saving.

On the other hand, ACPI is blind to our performance requirements and SLA. The deployed applications might deliver unnecessarily better performance than needed. Though better performance is desirable in most scenarios, it almost always comes with higher energy consumption because of the higher CPU power levels. Stepping back, we might ask: why should we spend more energy (and business operation cost) delivering more performance than necessary?

IV. DESIGN

We saw in Section III that the dynamic scaling of CPU power (i.e., ACPI) on modern machines can produce undesirable outcomes due to its unawareness of business requirements (i.e., the performance SLA). To address these issues, we propose a design and algorithm for SLA-aware dynamic CPU scaling. The algorithm primarily aims at meeting the business-determined performance SLA, and secondarily saves CPU energy subject to the SLA.

Since we will later provide several different realizations of the algorithm, and the realizations are not limited to adjusting CPU frequencies only, we use generic terms to refer to the different types of adjustment. Specifically, we use scaling up to denote adjusting to a more powerful CPU level, scaling down to denote adjusting to a less powerful CPU level, and maximizing CPU to denote adjusting to the most powerful level.

A. Design goals

The goal of the algorithm is to provide just enough CPU power to the workload that the SLA is not violated; in other words, subject to the SLA, as little CPU power as possible is consumed. Specifically, the algorithm aims to achieve the following goals:

• Top-level goal: ensuring the SLA. When the expected performance is about to violate the SLA, scale up the CPU level.

• Second-level goal: reducing energy. When the expected performance far exceeds the SLA, scale down the CPU level to reduce energy consumption.

Another relevant design goal is that the algorithm should avoid thrashing; that is, it should stay at the current CPU power level as long as possible to avoid frequent adjustments.

B. High-level design

The heart of the algorithm is the CPU scaling engine, which determines whether to scale CPU levels up or down. The decision is based on a set of factors, including the current application performance and the SLA specification. The current performance is measured by a separate component, which continuously reports how the application is performing. If the current performance is worse than the SLA, the engine may decide to scale up CPU levels; otherwise it may decide to scale down. Given the nature of this problem, the design fits well into a control-system [4] paradigm. Though many algorithms could be proposed, each with different tradeoffs in accuracy, speed, and complexity, in this paper we present a straightforward algorithm with the goal of demonstrating how the solution works.

One important factor is the workload itself. For instance, the workload may follow time-series-based shapes (e.g., between 8AM and 2PM the traffic volume is increasing).
This workload trend information can be used by the engine to make even smarter decisions about scaling CPU levels up or down.

C. Performance monitoring

The current performance needs to be monitored in a timely manner. The monitoring can be done continuously or by sampling. The current performance is expressed in a form consistent with the SLA: if the SLA is in the form of response time, then the monitoring reports the current performance as response time; similarly, it can be in the form of throughput.

D. Engine

To determine how the current performance compares to the SLA, the algorithm relies on two additional threshold values, A and B, which guard the headroom of the current performance relative to the SLA; B is farther from the SLA than A. (Though we present the algorithm using only the two guard thresholds A and B, it easily adapts to multiple guard thresholds.) The thresholds can be given in absolute form (e.g., 20KB/s and 40KB/s) or in relative form (e.g., 110% and 120%). The values of A and B are also dynamically adapted based on historical performance: they can start from fixed values such as 110% and 120%; if during the last time period the SLA was not completely met, the values of A and B are increased by 1.5x, otherwise they are reduced by 1.1x.

Performance metrics can take different forms; for ease of presentation, we assume a throughput-based metric, for which larger is better (so B is larger than A). Simple adaptations make the algorithm work for other metrics such as response time. If the current performance is larger than B, the CPU is scaled down. If it is between the two thresholds, the CPU level is kept as is. If it is smaller than A but still meets the SLA, the CPU is scaled up. If it does not meet the SLA, the CPU is maximized to let it do its best job. The decision flow is shown in Figure 3.
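A minimal Python sketch of this decision engine, expressing performance as the scale p = current throughput / SLA (as in Section VI); the flooring of the thresholds at 1.0 is our assumption, not stated in the text:

def decide(p: float, A: float, B: float) -> str:
    """Map the performance scale p (= current throughput / SLA) to an action."""
    if p < 1.0:
        return "maximize"    # SLA violated: jump straight to the highest level
    if p < A:
        return "scale_up"    # SLA met, but too little headroom: one level up
    if p <= B:
        return "stay"        # comfortable headroom: keep the level, no thrashing
    return "scale_down"      # far above SLA: one level down to save energy

def adapt_thresholds(A: float, B: float, sla_was_met: bool) -> tuple[float, float]:
    # Widen the guard band by 1.5x after an SLA miss; shrink it by 1.1x
    # otherwise, never letting the thresholds fall below the SLA itself.
    factor = 1.5 if not sla_was_met else 1 / 1.1
    return max(1.0, A * factor), max(1.0, B * factor)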
[Figure 3. Flow chart of the algorithm: if performance is worse than the SLA, scale the CPU up to the highest level; if it is only slightly better than the SLA (below A), scale the CPU up to a higher level; if it is between A and B, stay at the current level; if it is much better than the SLA (above B), scale the CPU down to a lower level.]

E. Scaling up/down the CPU

Scaling up the CPU means allowing the CPU to work more aggressively. It can be implemented in several fashions: a higher CPU frequency, switching to a more aggressive governor (e.g., the performance governor), or tuning the parameters of a particular governor to make it more aggressive. Similarly, scaling down the CPU can be implemented in several ways. We discuss the different implementations in later sections.

F. Co-locating multiple heterogeneous applications

If multiple heterogeneous applications need to be co-located on the same machine, the working of the algorithm needs to be adjusted. Since different applications may have different SLAs and performance, one application may be seeing better-than-SLA performance while another is not. Such scenarios can be addressed by slightly modifying the algorithm presented above. Though the specific forms of the solution differ, we argue for a conservative solution, in which the CPU adjustment is driven by the least-performing application; for instance, CPUs are scaled down only when all applications are outperforming their SLAs.

V. DEPLOYMENT AND USAGE

The proposed algorithm can be realized and deployed in several embodiments: (1) a new governor that directly controls the adjustment of CPU frequencies; (2) a new governor that aggregates existing governors and dynamically switches among them on the fly; (3) an improved version of a current governor that dynamically tunes the governor's tunable parameters.

New governor directly controlling CPU frequencies. The most straightforward way is to add a new governor built from scratch. The new governor is exposed to the OS as one element of the governor set and has no dependency on existing governors. Inside the governor, the adaptation of CPU frequencies is based purely on the algorithm and on the current performance compared to the SLA.

New governor aggregating existing governors. The algorithm can also be implemented as a new governor built on top of existing governors. The available governors differ in aggressiveness with regard to CPU power usage and performance: the performance governor is the most aggressive, the powersave governor the least, and the others sit between these two extremes. Hence, the different governors can be aggregated by the new governor, which sets the system to a particular governor based on the algorithm. This deployment depends on the existence of the other governors, so it cannot be deployed by itself.

Improved version of an existing governor. The algorithm can also be implemented as an improved version of an existing governor that exposes tunable parameters, whose tuning knobs are adjusted dynamically. For instance, the ondemand governor has an up-threshold knob that controls when the CPU is scaled up, and a sampling-interval knob that determines how frequently sampling is done. The algorithm can be embedded into the existing governor and serve as a new version of that governor.
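A minimal sketch of embodiment (3) in Python, adjusting the ondemand governor's up_threshold knob through its standard sysfs file (writing it requires root; 95 is the default our baseline in Section VI runs with):

from pathlib import Path

ONDEMAND = Path("/sys/devices/system/cpu/cpufreq/ondemand")

def set_up_threshold(percent: int) -> None:
    # Lower values make ondemand raise the frequency sooner (more aggressive).
    (ONDEMAND / "up_threshold").write_text(f"{percent}\n")

set_up_threshold(60)   # make the governor more aggressive
set_up_threshold(95)   # restore the default behavior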
VI. EVALUATION

A. Prototype

As noted in Section V, the algorithm can be realized in several ways. We built a prototype and used it to verify the working of our algorithm. The prototype is implemented as a new governor that directly controls the CPU frequencies; we refer to it as new-governor.

The prototype is written in Python and relies on the CPUfreq subsystem, which allows control of a set of parameters including the governor selection. It issues the command "cpupower -c all frequency-set -f freq". For this new-governor prototype, every 100MHz increment is treated as a new CPU power level: for a CPU with frequencies ranging from 1200MHz to 2000MHz, there are 10 levels in total. Scaling up means moving to the next power level with a 100MHz-higher frequency, scaling down means moving 100MHz lower, and maximizing means going to 2001MHz.

The workload is a Java-based application that keeps allocating user-specified objects and removes the oldest objects once the number of objects reaches a threshold. It periodically outputs the actual object-allocation throughput achieved during the last period. The throughput is compared to the pre-defined SLA level, and based on the comparison the algorithm decides on one of four actions: scale up, scale down, maximize the CPU, or stay unchanged. These actions then map to specific steps in the prototype.
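A sketch of how such a control loop could look, combining the decision rules of Section IV with cpupower actuation; this is our reconstruction under stated assumptions, not the prototype's actual source, and get_current_throughput is a stand-in for the monitoring component (simulated here so the sketch runs; the cpupower call needs root):

import random
import subprocess
import time

LEVELS_MHZ = list(range(1200, 2001, 100)) + [2001]   # the 10 power levels

def set_frequency(mhz: int) -> None:
    # Actuation via the same cpupower command the prototype uses.
    subprocess.run(["cpupower", "-c", "all", "frequency-set", "-f", f"{mhz}MHz"],
                   check=True)

def get_current_throughput() -> float:
    # Placeholder for the monitoring component, which in the prototype reads
    # the workload's reported per-period throughput; simulated here (B/s).
    return random.uniform(50_000, 80_000)

def run(sla: float = 60_000, A: float = 1.1, B: float = 1.2) -> None:
    level = len(LEVELS_MHZ) - 1          # start maximized
    while True:
        p = get_current_throughput() / sla
        if p < 1.0:
            level = len(LEVELS_MHZ) - 1  # SLA violated: maximize CPU
        elif p < A:
            level = min(level + 1, len(LEVELS_MHZ) - 1)  # scale up one level
        elif p > B:
            level = max(level - 1, 0)    # scale down one level
        # p between A and B: stay at the current level
        set_frequency(LEVELS_MHZ[level])
        time.sleep(1)                    # the algorithm runs once per second

if __name__ == "__main__":
    run()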
[Figure 4. Baseline results (ondemand governor), over time in seconds: (a) actual application throughput; (b) average CPU frequency (MHz); (c) average CPU usage.]

B. Experiment setup

To evaluate the prototype's ability to adapt to different workloads, we vary the workload's traffic intensity, by which we mean how CPU-intensive the workload is. Specifically, we split the experiment into three segments of equal duration. In the first segment the workload runs at the regular intensity; in the second segment at 80% of the regular intensity; and in the last segment at 120% of the regular intensity.

We set the performance SLA to 60KB/s of throughput. The performance-monitoring component continuously obtains the current application performance. The current throughput divided by the SLA gives a performance scale, denoted p: p > 1.0 means the performance is better than the SLA, while p < 1.0 means it is below the SLA. We set A to 1.1 and B to 1.2, respectively. The algorithm is executed once per second.

C. Baseline results

We first show the baseline results using the default ondemand governor with its default parameters; specifically, the up-threshold is 95%. We show only a small snapshot of the experimental period so that individual data points can be examined closely. The throughput is shown in Figure 4(a). Only about 30% of the time does the performance meet the SLA of 60KB/s; in other words, the SLA is met only during the segment where the workload is at 80% of the regular intensity. Figures 4(b) and (c) show the average CPU frequencies set by the ondemand governor and the corresponding CPU usage. The average CPU frequency is barely above the lowest frequency of 1200MHz, indicating that the CPUs operate at the lowest power level, completely unsympathetic to the violated SLA. CPU usage is comparatively low (e.g., 34%), which showcases the irony that the SLA is not met while most of the CPU sits idle.

[Figure 5. New-governor results, over time in seconds: (a) actual throughput; (b) average CPU frequency (MHz); (c) average CPU usage.]

D. New-governor prototype results

For the implemented new-governor prototype, the actual throughput is shown in Figure 5(a). The throughput exceeds the SLA at all times, irrespective of the workload intensity. Figure 5(b) shows the average CPU frequencies set by the new-governor.
When the second segment, with its lighter workload, kicks in, the CPU frequency is automatically scaled down to save energy; when the third segment, with its heavier workload, arrives, the CPU frequency is scaled up accordingly - all while the SLA is comfortably met. Figure 5(c) displays the corresponding CPU usage: in the second segment, because of the lowered CPU levels, the CPU usage increases.

VII. RELATED WORK

A. Power consumption studies

Many works have studied the power consumption of various computing platforms. In particular, [5] breaks down the power consumption of the different components of a laptop. [6] analyzes the energy-time tradeoff in high-performance computing environments. [7] studies the impact of dynamically adjusting voltage and frequency on web servers and concludes that dynamic scaling significantly reduces power consumption for such servers. [8] addresses the challenge of power management in heterogeneous multi-tier web clusters and proposes algorithms to save more energy. Embedded real-time systems present unique power-saving challenges, and [9] explicitly takes system reliability into consideration. Smartphones increasingly use multi-core CPUs, where saving power is even more critical [10]. Energy-saving problems in cloud environments and data centers are studied in [11], [12].

Our work does not disagree with these works. On the contrary, we strongly believe that ACPI (and dynamic scaling of CPU power levels) saves energy in many deployment scenarios. However, based on experience and observations in LinkedIn's production environments, we identify certain issues in business computing environments where SLAs have higher priority than energy saving, and we present our SLA-aware algorithm specifically for such environments.

B. ACPI implementations and impact

The Advanced Configuration and Power Interface (ACPI) [1], [13] specification is an open standard, and several flavors are implemented on different OSes. For instance, [14] implements ACPI on FreeBSD. [15] studies the energy-performance tradeoff of multi-threaded applications and proposes a set of models to estimate the performance slowdown under ACPI.

VIII. CONCLUSION

In this work, we demonstrated the weaknesses that conventionally implemented ACPI-based dynamic CPU scaling mechanisms on modern OSes can expose in the face of business SLAs. We then proposed an SLA-aware algorithm that prioritizes SLA requirements when adjusting CPU levels, while also saving energy when the SLA is met.

REFERENCES

[1] "Advanced Configuration and Power Interface (ACPI)," http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface.
[2] "CPU frequency scaling," https://wiki.archlinux.org/index.php/CPU_frequency_scaling.
[3] S. Das, C. Botev, et al., "All aboard the Databus!: LinkedIn's scalable consistent change data capture platform," in SoCC '12, New York, NY, USA, 2012.
[4] "Control system," http://en.wikipedia.org/wiki/Control_system.
[5] A. Mahesri and V. Vardhan, "Power consumption breakdown on a modern laptop," in Proceedings of the 4th International Conference on Power-Aware Computer Systems (PACS '04), Berlin, Heidelberg, 2005.
[6] V. W. Freeh, D. K. Lowenthal, F. Pan, N. Kappiah, R. Springer, B. L. Rountree, and M. E. Femal, "Analyzing the energy-time trade-off in high-performance computing applications," IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 6, pp. 835-848, Jun. 2007.
[7] P. Bohrer, E. N. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, C. McDowell, and R. Rajamony, "Power aware computing," R. Graybill and R. Melhem, Eds. Norwell, MA, USA: Kluwer Academic Publishers, 2002, ch. "The Case for Power Management in Web Servers."
[8] P. Wang, Y. Qi, and X. Liu, "Power-aware optimization for heterogeneous multi-tier clusters," J. Parallel Distrib. Comput., vol. 74, no. 1, Jan. 2014.
[9] D. Zhu, "Reliability-aware dynamic energy management in dependable embedded real-time systems," ACM Trans. Embed. Comput. Syst., vol. 10, no. 2, Jan. 2011.
[10] Y. Zhang, X. Wang, X. Liu, Y. Liu, L. Zhuang, and F. Zhao, "Towards better CPU power management on multicore smartphones," in Proceedings of the Workshop on Power-Aware Computing and Systems (HotPower '13), New York, NY, USA: ACM, 2013.
[11] J. Yu, Z. Hu, N. N. Xiong, H. Liu, and Z. Zhou, "An energy conservation replica placement strategy for Dynamo," J. Supercomput., vol. 69, no. 3, Sep. 2014.
[12] Z. Guo, Z. Duan, Y. Xu, and H. J. Chao, "JET: Electricity cost-aware dynamic workload management in geographically distributed datacenters," Comput. Commun., vol. 50, Sep. 2014.
[13] L. Duflot, O. Levillain, and B. Morin, "ACPI: Design principles and concerns," in Proceedings of the 2nd International Conference on Trusted Computing (Trust '09), Berlin, Heidelberg: Springer-Verlag, 2009, pp. 14-28.
[14] T. Watanabe, "ACPI implementation on FreeBSD," in Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, Berkeley, CA, USA, 2002, pp. 121-131.
[15] S. Park, W. Jiang, Y. Zhou, and S. Adve, "Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures," SIGMETRICS Perform. Eval. Rev., vol. 35, no. 1, Jun. 2007.