Exposing the Cost of Performance
Hidden in the Cloud
Neil Gunther @DrQz
Performance Dynamics Consulting, Castro Valley, California
Mohit Chawla @a1cy
Independent Systems Engineer, Hamburg, Germany
Performance Dynamics Co.
CMG CLOUDXCHANGE Event
10am Pacific (5pm UTC), June 19, 2018
Video on CMG’s YouTube channel
Exposing the Cost of Performance
Hidden in the Cloud
Neil Gunther and Mohit Chawla
Abstract
Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also
reintroduces an old mainframe concept — chargeback [1] — which thereby
rejuvenates the need for traditional performance and capacity management in the
new cloud context. Combining production JMX data with an appropriate
performance model, we show how to assess fee-based Amazon AWS configurations
for a mobile-user application running on a Linux-hosted Tomcat cluster. The
performance model also facilitates ongoing cost-benefit analysis of various EC2
Auto Scaling policies.
[1] Chargeback underpins the cloud business model, especially for hot application development, e.g., “Microsoft wants every
developer to be an AI developer, which would help its already booming Azure Cloud business do better still: AI demands data,
which requires cloud processing power and generates bills.” —The Register, May 2018
Outline
1 AWS cloud environment
2 Performance data validation
3 Initial capacity model
4 Improved capacity model
5 Cost of Auto Scaling variants
6 Cloudy economics
AWS cloud environment
Application Cloud Platform
Entire application runs in the Amazon cloud:
[Diagram: mobile Internet users → ELB load balancer → Auto Scaling (A/S) group → AWS EC2 cluster]
Mobile users make requests to the Apache HTTP server [2] via ELB on EC2
Tomcat thread-server [3] on EC2 calls external services (belonging to 3rd parties)
Auto Scaling controls number of EC2 instances
based on incoming traffic and configured A/S
policies
ELB balances incoming traffic across all EC2
nodes in AWS cluster
[2] Apache HTTP Server versions 2.2 and 2.4
[3] Tomcat versions 7 and 8
Request Processing
On a single EC2 instance:
1 Incoming HTTP Request from mobile user processed by Apache + Tomcat
2 Tomcat then sends multiple requests to External Services based on original request
3 External services respond and Tomcat computes business logic based on all those
Responses
4 Tomcat sends the final Response back to originating mobile user
Performance Tools and Scripts
JMX (Java Management Extensions) data from JVM
jmxterm
VisualVM
Java Mission Control
Datadog dd-agent
Datadog — also integrates with AWS CloudWatch metrics
Collectd — Linux performance statistics collection
Graphite and statsd — application metrics collection & storage
Grafana — time-series data plotting
Custom data collection scripts
R statistical libs and RStudio IDE
PDQ performance modeling lib
Performance data validation
Production Data Collection
1 Raw performance metrics:
Performance data primarily collected by Datadog (dd-agent)
Mobile-user requests are analyzed as a single homogeneous workload
JMX provides the GlobalRequestProcessor MBean:
requestCount: total number of requests
processingTime: total processing time for all requests
2 Derived performance metrics:
Convert requestCount to a rate in the Datadog config to get average
throughput Xdat in requests/second
Average request processing time (seconds) is then derived as
Rdat = Δ(processingTime) / Δ(requestCount)
with both deltas taken over the same measurement interval, e.g., T = 300
seconds (a sketch of this derivation follows)
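A minimal sketch of this derivation in R, assuming two hypothetical successive samples of the MBean counters (the numbers are illustrative, chosen to be consistent with the production data shown later):

Tsec <- 300   # sampling interval (seconds)
# Hypothetical cumulative counter samples; Tomcat reports processingTime in ms
prev <- list(requestCount = 1.000e6, processingTime = 9.000e8)
curr <- list(requestCount = 1.150e6, processingTime = 9.510e8)
dReq  <- curr$requestCount   - prev$requestCount     # requests completed in Tsec
dProc <- curr$processingTime - prev$processingTime   # ms of processing accrued
Xdat <- dReq / Tsec              # average throughput: 500 req/s
Rdat <- (dProc / 1000) / dReq    # average processing time: 0.34 s per request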
Concurrency and Service Times
Apply Little’s law to derive additional performance metrics: concurrency (N)
and service time (S) from data
1 Little’s Law — macroscopic version
N = X ∗ R (gives concurrency)
Nest is the calculated or estimated number of concurrent requests in
Tomcat during each measurement interval
Verify correctness by comparing Nest with measured number of
threads Ndat in the service stage of Tomcat
We find Nest ≡ Ndat
2 Little’s Law — microscopic version
U = X ∗ S (gives service time)
Udat is the measured processor utilization reported by dd-agent
(as a decimal fraction, not %)
Already have throughput X reqs/sec from collected JMX data
Estimated service time metric is S = U/X
Reduced EC2 Instance Data
These few metrics will be used to parameterize our capacity model
Timestamp, Xdat, Nest, Sest, Rdat, Udat
1486771200000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
1486771500000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
1486771800000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
1486772100000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
1486772400000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
1486772700000, 528.587722, 201.187500, 0.000914, 0.366283, 0.483160
1486773000000, 533.439054, 202.600006, 0.000892, 0.378207, 0.476080
1486773300000, 531.708059, 208.187500, 0.000909, 0.392556, 0.483160
1486773600000, 532.693783, 203.266663, 0.000894, 0.379749, 0.476020
1486773900000, 519.748550, 200.937500, 0.000895, 0.381078, 0.465260
...
Unix timestamps (in milliseconds) advance by 300 seconds between rows
Little’s law gives relationships between above metrics:
1 Nest = Xdat ∗ Rdat is macroscopic LL
2 Udat = Xdat ∗ Sest is microscopic LL
3 Time-averaged over T = 300 sec sampling intervals (verified in the sketch below)
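As a quick consistency check, a sketch applying both forms of Little's law to the first data row above:

# Verify Little's law on the first sample row of the reduced data
Xdat <- 502.171674; Rdat <- 0.336740; Udat <- 0.458120
Nest <- Xdat * Rdat   # macroscopic LL: 169.1, close to the logged Nest of 170.27
Sest <- Udat / Xdat   # microscopic LL: 0.000912 s, matching the Sest column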
Initial capacity model
Time Series View
[Figure: July 2016 throughput time series: request rate X(t) vs UTC time (hours)]
Time-Independent View
[Figures: thread-limited throughput (X vs N) and thread-limited latency (R vs N)]
Queueing theory tells us what to expect:
Relationships between metrics, e.g., X and N
The number of in-flight requests is thread-limited, typically to N ≤ 500
Throughput X approaches a saturation ceiling as N → 500 (concave)
Response time R grows linearly, aka the “hockey stick handle” (convex); both shapes are sketched below
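A minimal sketch of these idealized shapes as asymptotic bounds (the Rmin and Nknee values are illustrative placeholders, not fitted parameters): below the knee each request effectively holds its own thread, so X grows linearly while R stays flat; above it, X is capped and Little's law forces R upward linearly.

# Asymptotic bounds for a thread-limited server (illustrative values)
Rmin  <- 0.4    # seconds, minimum response time
Nknee <- 500    # thread limit
Xbound <- function(N) pmin(N, Nknee) / Rmin          # throughput ceiling (req/s)
Rbound <- function(N) pmax(Rmin, N * Rmin / Nknee)   # hockey-stick latency (s)
N <- seq(1, 700, by = 10)
par(mfrow = c(1, 2))   # side-by-side panels
plot(N, Xbound(N), type = "l", xlab = "Concurrent users N", ylab = "X bound (req/s)")
plot(N, Rbound(N), type = "l", xlab = "Concurrent users N", ylab = "R bound (s)")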
Production X vs. N Data – July 2016
[Figure: Production Data July 2016: throughput (req/s) vs concurrent users]
Interpreting X vs. N Data
[Figure: PDQ Model of Production Data July 2016: throughput (req/s) vs concurrent users; Nopt = 174.5367, thrds = 250.00; curves: Data, PDQ]
Interpreting R vs. N Data
[Figure: PDQ Model of Production Data July 2016: response time (s) vs concurrent users; Nopt = 174.5367, thrds = 250.00; curves: Data, PDQ]
Outstanding Questions
PDQ July model looks good visually but ...
Requires ∼ 350 “dummy” queues internally to get correct Rmin
Service time assumed to be CPU time ∼ 1 ms (see later)
What do dummy queues represent in Tomcat server?
Successive polling to external services?
Some kind of hidden parallelism?
October 2016 data breaks the July PDQ model. Why?
Improved capacity model
Production X vs. N Data – October 2016
[Figure: Production data Oct 2016: throughput (req/s) vs concurrent users]
Too much data “clouded” the July 2016 analysis
Interpreting X vs. N Data
[Figure: PDQ Model of Oct 2016 Data: throughput (req/s) vs concurrent users]
Interpreting R vs. N Data
[Figure: PDQ Model of Oct 2016 Data: response time (s) vs concurrent users; curves: Data, PDQ]
Adjusted PDQ Model
library(pdq)

usrmax <- 500
nknee  <- 350
smean  <- 0.4444      # Rmin seconds
srate  <- 1 / smean   # service rate (per second)
arate  <- 2.1         # arrival rate per user
users  <- seq(100, usrmax, 50)
tp     <- NULL        # modeled throughput
rt     <- NULL        # modeled response time
pdqr   <- TRUE        # emit PDQ Report

for (i in 1:length(users)) {
  if (users[i] <= nknee) {
    Arate <- users[i] * arate                     # total arrival rate
    pdq::Init("Tomcat Submodel")
    pdq::CreateOpen("requests", Arate)            # open workload stream
    pdq::CreateMultiNode(users[i], "TCthreads")   # one thread per request
    pdq::SetDemand("TCthreads", "requests", smean)
    pdq::SetWUnit("Reqs")
    pdq::Solve(CANON)
    tp[i] <- pdq::GetThruput(TRANS, "requests")
    rt[i] <- pdq::GetResponse(TRANS, "requests")
  }
}
# ... (above-knee branch, reporting and plotting elided in the original)
Key differences:
Old service time was based on %CPU busy: S = 0.8 ms
Rmin is dominated by time spent inside the external services
New service time is based on Rmin: S = 444.4 ms
Tomcat threads are now parallel service centers in the PDQ model
Analogous to every supermarket customer getting their own checkout lane (a model-vs-data overlay is sketched below)
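To compare the model against the measurements, a sketch of the overlay plot; prod is a hypothetical data frame of production samples with columns N and X (that name is ours, not the deck's):

# Overlay the PDQ curve (users, tp from the loop above) on the measured data
plot(prod$N, prod$X, xlab = "Concurrent users", ylab = "Throughput (req/s)",
     xlim = c(0, usrmax), ylim = c(0, 1000))
lines(users, tp, col = "red")   # PDQ model curve
legend("bottomright", legend = c("Data", "PDQ"),
       pch = c(1, NA), lty = c(NA, 1), col = c("black", "red"))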
Adjusted 2016 PDQ Outputs
[Figures: PDQ Model of Oct 2016 Data: throughput (req/s) vs concurrent users (left) and response time (s) vs concurrent users (right); curves: Data, PDQ]
Auto Scaling knee and pseudo-saturation
[Figure: PDQ Model of Oct 2016 Data: throughput (req/s) vs concurrent users, with a vertical line at the A/S knee]
A/S policy triggered when instance CPU busy > 75%
Induces pseudo-saturation at Nknee = 300 threads (vertical line); see the consistency check below
No additional Tomcat threads invoked above Nknee in this instance
A/S spins up additional new EC2 instances (elastic capacity)
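These knee values are mutually consistent: if every in-flight request holds a thread for Rmin seconds, the per-instance throughput ceiling follows directly. A one-line check:

# Throughput ceiling implied by the Oct 2016 model parameters
Rmin  <- 0.4444         # seconds
Nknee <- 300            # threads at A/S pseudo-saturation
Xknee <- Nknee / Rmin   # = 675.07 req/s, the Oct 2016 Xmax reported later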
Cost of Auto Scaling variants
AWS Scheduled Scaling
A/S policy threshold CPU > 75%
Additional EC2 instances require up to 10 minutes to spin up
Based on the PDQ model, considered pre-emptive, clock-based scheduling of EC2 instances
Cheaper than A/S, but only about 10% savings
Use N service threads to size the number of EC2 instances required for incoming traffic (sizing sketched below)
Removes the expected spikes in latency and traffic (seen in the time series analysis)
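A sketch of the schedule-based sizing rule; the hourly concurrency forecast and headroom factor here are illustrative assumptions:

# Size the scheduled EC2 fleet from forecast concurrency N instead of CPU%
# Each instance pseudo-saturates at Nknee threads (see the PDQ model above)
Nknee    <- 300
headroom <- 0.80   # target at most 80% of the knee per instance
Nfc <- c(150, 120, 100, 90, 110, 180, 260, 350, 450, 520, 560, 540,   # 00-11h
         500, 480, 470, 460, 450, 430, 400, 380, 340, 300, 240, 190)  # 12-23h
instances <- ceiling(Nfc / (headroom * Nknee))   # hourly instance schedule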
AWS Spot Pricing
Spot instances available at up to a 90% discount over On-demand pricing
Challenging to diversify instance types and sizes across the same group, e.g.,
Default instance type is m4.10xlarge
Spot market only has the smaller m4.2xlarge type
Forces manual reconfiguration of the application
Thus CPU%, latency, and traffic are no longer useful metrics for the A/S policy
Instead, use concurrency N as the primary metric in the A/S policy (re-based as sketched below)
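Because the thread knee scales with instance size, a concurrency-based trigger can be re-based when the spot market forces a smaller type. The vCPU counts below are standard AWS figures; linear scaling of the knee with vCPUs is our assumption:

# Re-base the concurrency trigger across instance types, assuming the
# per-instance thread knee scales roughly with vCPU count
vcpu_m4_10xl <- 40    # m4.10xlarge
vcpu_m4_2xl  <- 8     # m4.2xlarge
Nknee_10xl   <- 300   # measured knee on the default type
Nknee_2xl    <- Nknee_10xl * vcpu_m4_2xl / vcpu_m4_10xl   # ~60 threads/instance
fleet_scale  <- vcpu_m4_10xl / vcpu_m4_2xl                # ~5x as many instances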
Cloudy economics
EC2 Instance Pricing
[Diagram: instance capacity lines [4]: reserved instances (lower-risk capex) at the base, on-demand instances above, spot instances (higher-risk capex) filling up to the max capacity line, with “missed revenue?” beyond it; instances vs time]
This is how AWS sees their own infrastructure capacity
[4] J.D. Mills, “Amazon Lambda and the Transition to Cloud 2.0”, SF Bay ACM meetup, May 16, 2018
Updated 2018 PDQ Outputs
[Figures: PDQ Model of Prod Data Mar 2018: throughput (req/sec) vs concurrent users (left) and response time (s) vs concurrent users (right); Rmin = 0.2236, Xknee = 1137.65, Nknee = 254.35]
Performance Evolution 2016 – 2018
[Figures: 2016 daily users and 2018 daily users: user requests N vs UTC time (hours)]
Typical numero uno traffic profile
Increasingly cost-effective performance
Date Rmin (ms) Xmax (RPS) Nknee
Jul 2016 394.1 761.23 350
Oct 2016 444.4 675.07 300
Mar 2018 223.6 1135.96 254
Name of the Game is Chargeback
Google Compute Engine also offers reserved-style (committed-use) and spot-style (preemptible) pricing
Table 1: Google VM per-hour pricing [5]
Machine        vCPUs   RAM (GB)   Price ($/hr)   Preemptible ($/hr)
n1-umem-40     40      938        6.3039         1.3311
n1-umem-80     80      1922       12.6078        2.6622
n1-umem-96     96      1433       10.6740        2.2600
n1-umem-160    160     3844       25.2156        5.3244
Similarly for Microsoft Azure (a worked monthly-cost example follows)
[5] TechCrunch, May 2018
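As a worked example from Table 1 (730 hours per month is a common billing approximation):

# Monthly cost of one n1-umem-40 at the Table 1 per-hour prices
hours    <- 730                      # approximate hours in a month
ondemand <- 6.3039 * hours           # ~$4,602 per month
preempt  <- 1.3311 * hours           # ~$972 per month
savings  <- 1 - preempt / ondemand   # ~79%, if preemption is tolerable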
Microsoft Acquires GitHub (cloud) for $7.5 BB [6]
GitHub Enterprise, on-site or as cloud instances on AWS, Azure, Google or IBM Cloud, costs $21 per user per month
From Twitter:
“Supporting the open source ecosystem is way more important to MS than anything else—the revenue they make from
hosting OSS-based apps on Azure in the future will dwarf their current devtools revenue.”
“[MS] isn’t the same company that [previously] hated on open source, mostly because it’s [now] symbiotic to their hosting
business. They didn’t start supporting open source from altruism!”
[6] NOTE: That’s Bs, as in billions, not Ms
Summary
Cloud services are more about economic benefit for
the hosting company than they are about technological
innovation for the consumer [7]
Old-fashioned mainframe chargeback is back! [8]
It’s incumbent on paying customers to minimize their
own cloud services costs
Meaningful cost-benefit decisions require ongoing
performance analysis and capacity planning
PDQ model presented here is a simple yet insightful
example of cloud sizing and performance tools [9]
Queueing model framework helps expose where
hidden performance costs actually reside
You only have the cloud capacity that you pay for
[7] Not just plug-and-play. More like pay-and-pay!
[8] Chargeback had disappeared with the advent of non-monolithic client-server architectures
[9] The PDQ Workshop is available at a discount to CMG members. Email classes@perfdynamics.com for details.
Questions?
www.perfdynamics.com
Castro Valley, California
Training — including the PDQ Workshop
Blog
Twitter
Facebook
info@perfdynamics.com — any outstanding questions
+1-510-537-5758