Experimental Methods on
Performance in Clouds, …
Calton Pu
CERCS and School of Computer Science
Georgia Institute of Technology
1
2
Ancestors of Clouds (Hardware)
 Data processing centers (~1960s)
 Supercomputers, Grids (~1970s)
 P2P, SETI@Home (~1999), botnets
 Utility computing and data centers
(~2000s)
Modern Clouds
 Amazon data centers in the early 2000s were running at only about 10%
capacity – introduction of Amazon Web Services (AWS)
in 2006
 2007 – Google & IBM join Cloud Computing research
(NSF), Microsoft joins in 2010, NSFCloud in 2014
3
Cloud & Big Data (company)
4
 Google Inc
 Third-largest market cap in the world ($390B in 2014)
 Probably more data than anyone else
 13 declared data centers around the world; drawing 260 MW in 2011 (2,259,998 MWh total).
Cloud & Big Data (government)
5
 NSA (maybe more than Google)
 Utah Data Center, drawing 65MW (about half of
Salt Lake City)
Different Cloud Services
6
Some Concrete Offerings
7
[Figure: cloud service stack – SaaS on top of PaaS, on top of IaaS, on top of Hardware; higher-level services sit toward the top of the stack]
Cloud Service Models
 Software as a Service (SaaS) [not covered]
 Use provider’s applications over a network
 Example: Salesforce.com
 Platform as a Service (PaaS) [not covered]
 Use system-level services (e.g., database) to
develop and deploy customer applications
 Example: Google App Engine, MS Azure
 Infrastructure as a Service (IaaS)
 Rent processing, storage, network
 Example: Amazon EC2, Emulab
8
Amazon EC2 (circa 2010)
 Elastic Block Store, CloudWatch,
Automated Scaling
9
Instance Type | Memory | Compute (1GHz virt) | Local Storage | Arch | Price (hour)
Std Small | 1.7 GB | 1 x 1 | 160 GB | 32-bit | $0.085
Std Large | 7.5 GB | 2 x 2 | 850 GB | 64-bit | $0.34
Std X-Large | 15.0 GB | 4 x 2 | 1.7 TB | 64-bit | $0.68
High Memory X-L | 17.1 GB | 2 x 3.25 | 420 GB | 64-bit | $0.50
High Memory DB X-L | 34.2 GB | 4 x 3.25 | 850 GB | 64-bit | $1.00
High Memory QD X-L | 68.4 GB | 8 x 3.25 | 1.7 TB | 64-bit | $2.00
High CPU Medium | 1.7 GB | 2 x 2.5 | 350 GB | 32-bit | $0.17
High CPU X-L | 7.0 GB | 8 x 2.5 | 1.7 TB | 64-bit | $0.68
Cluster Compute X-L | 23 GB | 33.5 | 1.7 TB | 64-bit | $1.60
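As a quick worked example of how the hourly prices above combine, the sketch below (Python) totals the cost of a hypothetical four-tier deployment; the instance choices and the 30-day month are illustrative assumptions, not a configuration from the deck.

```python
# Hourly prices from the EC2 table above (circa 2010, USD per instance-hour).
PRICE = {
    "std_small": 0.085,
    "std_large": 0.34,
    "high_cpu_medium": 0.17,
}

# Hypothetical 1-2-1-9-style deployment: (instance type, count) per tier.
# These choices are illustrative only.
deployment = {
    "web":   ("std_small", 1),
    "app":   ("high_cpu_medium", 2),
    "cjdbc": ("std_large", 1),
    "db":    ("std_large", 9),
}

hourly = sum(PRICE[itype] * count for itype, count in deployment.values())
print(f"hourly cost : ${hourly:.3f}")
print(f"monthly cost: ${hourly * 24 * 30:.2f}")   # assuming a 30-day month
```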
AWS Free Tier
 New AWS customers can get started with
Amazon EC2 for free. Each month for 1 year:
 750 hours of EC2 running Linux, RHEL, or SLES
t2.micro
 750 hours of EC2 running MS Windows Server t2.micro
 750 hours of Elastic Load Balancing plus 15 GB data
 30 GB of Amazon Elastic Block Storage in any
combination of General Purpose (SSD) or Magnetic,
plus 2 million I/Os and 1 GB of snapshot storage
 15 GB of bandwidth out
 1 GB of Regional Data Transfer
10
Resources Available for
Experiments
 Our own cluster (about 50 nodes)
 GT/CERCS cluster (about 800 nodes)
 Emulab (Utah), PROBE (CMU)
 A few hundred nodes, a few dozen available
 CloudLab replaces Emulab (May 2015)
 Other partner clusters in companies and
universities
11
Challenges in Cloud Adoption
 From user’s point of view
 Data security/privacy (in a public cloud)
 Performance concerns
 From provider’s point of view
 Up-front hardware costs high, rapid aging
 Hardware capacity generally under-utilized
 Low scalability of most enterprise applications
 Negotiating SLA contracts and price structures
12
Cloud Management Challenge
 High utilization brings higher ROI
 Achievable by predictable/stationary workloads
 Mission-critical applications need SLA
 Resource Utilization Paradox
 Good ROI requires high utilization (many
papers on consolidation claim >90% utilization)
 Consistent reports of 18% average utilization
 Cloud management is more challenging
than we hoped initially
13
Representative Cloud Workloads
 Cloud workload – amount of processing that a
cloud has to do at a given time
 Use workloads to test a particular type of
application
 Types of workloads:
 E-commerce
 OLTP
 Forum/Message board
 Web 2.0 application
 MapReduce
14
15
Example 1: RUBiS Benchmark
 E-commerce applications (eBay auctions)
 N-tier (3 or more tiers): web servers, application
servers, database servers
 26 web interactions, requiring sophisticated
models, e.g., Layered Queuing Network Models
16
Typical Execution Environment
[Figure: client browsers → Apache web servers (HTTP) → Tomcat servlet engines (AJP13) → MySQL DB servers (JDBC); each tier runs on hardware resources under a hypervisor (e.g., Xen with Dom0 and VM1-VM3), managed through a virtual management interface]
17
Meta-Model of RUBiS
 Layered Queuing Network Model of RUBiS
(3-tier): one sub-model per tier for each of the
26 interactions, i.e., 26 x 3 = 78 sub-models in total
18
Web Server Sub-Model
 3-tier: simplest implementation of RUBiS
 AboutMe (1 of 26), customized for 3-tier
Challenges in Modeling
 Layered Queuing Network Models become
very complex even for “simple” n-tier
applications
 Experiments are needed anyway
 Setting the values for various sub-models
 Need detailed experiments for a variety of
configurations
 Let’s try “pure” experiments
19
20
Example 2: RUBBoS Benchmark
 Another e-commerce workload
 Bulletin Board (Slashdot)
 DB server bottleneck, C-JDBC as load
balancer
 24 web interactions
 Configuration notation: 1-2-1-9
 1 web server, 2 app servers, 1 C-JDBC server,
9 DB servers (parsed in the sketch below)
 Emulab (a relatively modest testbed)
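A minimal sketch of the 1-2-1-9 notation as code; `parse_config` is a hypothetical helper for illustration, not part of any tool mentioned in the deck.

```python
def parse_config(notation: str) -> dict:
    """Parse an n-tier configuration string such as '1-2-1-9' into tier counts."""
    web, app, cjdbc, db = (int(x) for x in notation.split("-"))
    return {"web": web, "app": app, "cjdbc": cjdbc, "db": db}

print(parse_config("1-2-1-9"))
# {'web': 1, 'app': 2, 'cjdbc': 1, 'db': 9}
```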
21
Example Hardware (Emulab)
22
Example Software Configuration
23
Sample Configuration (1-2-1-3)
[Figure: 1 Apache web server → 2 Tomcat app servers → 1 C-JDBC node → 3 database servers (MySQL or PostgreSQL)]
24
Experiment Design
 Tiers: 1 web server, 1-3 app servers, 1 C-JDBC node, 1-9 DB servers
 DBMS: MySQL and PostgreSQL
 Workload mixes: browse-only; read/write with Wait-1st and Wait-All variants
 DB server hardware: low-end and normal
(The full design matrix is enumerated in the sketch below.)
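The design matrix above can be enumerated mechanically. The sketch below assumes the ranges shown on the slide (1 web server, 1-3 app servers, 1 C-JDBC node, 1-9 DB servers, two DBMSs, three workload mixes, two DB-server hardware classes); all names in it are illustrative.

```python
from itertools import product

APP_SERVERS  = range(1, 4)          # 1-3 application servers
DB_SERVERS   = range(1, 10)         # 1-9 database servers
DBMS         = ["MySQL", "PostgreSQL"]
WORKLOAD_MIX = ["browse-only", "read/write wait-1st", "read/write wait-all"]
DB_HARDWARE  = ["low-end", "normal"]

experiments = [
    {
        "config": f"1-{app}-1-{db}",   # 1 web, app servers, 1 C-JDBC, db servers
        "dbms": dbms,
        "mix": mix,
        "db_hw": hw,
    }
    for app, db, dbms, mix, hw in product(
        APP_SERVERS, DB_SERVERS, DBMS, WORKLOAD_MIX, DB_HARDWARE
    )
]

print(len(experiments))    # 3 * 9 * 2 * 3 * 2 = 324 candidate experiments
print(experiments[0])
```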
25
MySQL Throughput (Low-Cost)
Better scalability (different query processing strategies)
What’s Different about Clouds (1)
Traditional benchmarks
 Static configuration: HW,
SW, workload range
 Find the “best tuning” to
achieve highest
throughput
Cloud benchmarks
 Dynamic and many
configurations
 Find representative
throughput and response
time for each
configuration
(reproducible results by
other users)
26
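One way to produce the "representative, reproducible" numbers called for above is to repeat each configuration several times and report robust summary statistics. The sketch below and its input format are assumptions for illustration, not the deck's methodology.

```python
import statistics

def summarize(runs):
    """Summarize repeated runs of one configuration.

    `runs` is a list of (throughput_ops_per_s, response_times_ms) pairs,
    one per repetition; this input shape is an assumption for the sketch.
    """
    throughputs = [tp for tp, _ in runs]
    all_rts = sorted(rt for _, rts in runs for rt in rts)
    p95 = all_rts[int(0.95 * (len(all_rts) - 1))]
    return {
        "median_throughput": statistics.median(throughputs),
        "throughput_spread": max(throughputs) - min(throughputs),
        "p95_response_ms": p95,
    }

runs = [(512, [12, 15, 20, 2100]), (498, [11, 14, 18, 25]), (505, [13, 16, 19, 22])]
print(summarize(runs))
```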
27
MySQL Throughput for R/W
Mix (read one, write all)
[Figure: throughput (ops/s), 0-600, vs. workload for configurations 1-1-1-8ML, 1-2-1-4ML, 1-2-1-5ML, 1-2-1-6ML, 1-2-1-7ML, 1-2-1-9ML, and 1-3-1-9ML]
28
1-2-1-9ML Configuration Data
 Clear bottleneck indicated by leveled
performance (previous slide)
 All high workloads (more than 4000)
 Same leveling for other configurations
 Average resource consumption on DB
servers quite low (CPU and disk I/O)
29
Web Server CPU Utilization
(1-2-1-9ML)
30
Application Server CPU
Utilization (one of 1-2-1-9ML)
31
C-JDBC Server CPU Utilization
(1-2-1-9ML)
32
DB Server CPU Utilization
(one of 1-2-1-9ML)
33
DB Server Disk I/O Bandwidth
Utilization (one of 1-2-1-9ML)
34
Observations on 1-2-1-9ML
 No CPU bottlenecks anywhere
 Disk I/O bandwidth on the DB servers has
a slight peak at the high end of the value
spectrum
 An infrequent disk I/O bottleneck, which cannot
explain the observed low overall system
performance
36
Maximum of Disk I/O Bandwidth
Utilization (all of 1-2-1-9ML)
What’s Different about Clouds (2)
Traditional benchmarks
 Balanced configuration:
near-full utilization of all
resources for a stable
workload
 Single bottleneck, high
average utilization
Cloud benchmarks
 Almost always start with all
resources at low utilization
 No stable bottlenecks
 Often stay at low average
utilization across all resources,
yet performance remains low
 Found new phenomenon:
Multi-bottlenecks
37
Why Automation?
 Traditional benchmarks such as TPC and
SPEC answer the question
 For a given hardware/software configuration
and workload, what is the highest achievable
throughput?
 In the cloud this becomes very difficult due
to several dimensions (see the sketch after this list):
 Horizontal scalability
 Vertical Scalability
 Variety of software components
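To make the dimensions concrete, the sketch below multiplies out an assumed configuration space; the specific counts are illustrative, not taken from the deck.

```python
from math import prod

dimensions = {
    "horizontal scale (nodes per tier)": 3 * 9,   # e.g., 1-3 app x 1-9 DB servers
    "vertical scale (instance types)":   5,       # e.g., small ... cluster compute
    "software components (DBMS, mixes)": 2 * 3,   # e.g., 2 DBMSs x 3 workload mixes
}

total = prod(dimensions.values())
for name, n in dimensions.items():
    print(f"{name:40s} x{n}")
print(f"candidate configurations: {total}")       # 27 * 5 * 6 = 810
```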
38
Solution: Expertus
 A framework for large-scale benchmark
measurements through flexible automation
of experiments.
 It creates the scripts through a multi-stage
code generation process.
 Easy to plug in new benchmarks and clouds
 Enables cloud measurements at a scale
that is beyond manual management of
benchmarks
39
Experiment Summary
 Over 500 different hardware configurations
(i.e., varying node number and type)
 Over 10,000 different software
configurations (i.e., varying software and
software settings).
 Over 100,000 computing nodes in various
cloud environments:
 Emulab
 Amazon EC2
 Open Cirrus.
40
41
E-commerce Example
Experimental Challenges
 Many configuration variables
 Many applications, clear differences among them
 Many cloud offerings, non-obvious differences
 Different software/hardware configurations may
produce the same or different results
 Experimental setup challenges
 Dependencies among components
 Systematic search through the potentially large
configuration space
42
43
Elba: Automating Measurements
[Chart and table: analyzed resource utilization (%) per configuration]
Config | DB Server CPU | DB Server Memory | App Server CPU | App Server Memory
L/L | 99.8 | 97.9 | 11.2 | 78.3
H/L | 66.8 | 98.7 | 22.3 | 98.3
H/H | 65.3 | 98.2 | 5.3 | 87.2
2H/L | 36.4/46.6 | 97.3/98.2 | 21.3 | 98.2
2H/H | 46.2/36.2 | 98.2/97.2 | 15.2 | 79.5
Automated, Evolutionary Staging Cycle
[Figure: (0) configuration design from benchmark specs and an experiment
specification language (TBL) → (1) code generation / deployment (Mulini
generates deployment scripts, monitors, and workload drivers; the application
is staged and deployed) → (2) execution (workload drivers exercise the system
under test, with monitors attached) → (3) analysis (the analyzer evaluates the
monitoring results) → (4) reconfiguration (automated adaptation, weighing
adaptation cost)]
Benefits of Automation
 Automated Experiment Management
 Through extensible, flexible, and modular code
generation
 Extensibility
 Extending the framework to support specification changes,
new benchmarks, computing clouds, and software
packages
 Flexibility
 Modification to input configuration or output configuration
without changing the source code of the framework
 Modularity
 Consists of a number of components that may be mixed
and matched in a variety of configurations
44
Code Generation – Key Challenges
 Abstraction mapping
 External forces often drive changes
 Standards formulation/adoption
 Industry evolution
 Internal forces drive changes
 Goals, functionality refinement
 Interoperable heterogeneity
 Heterogeneous clouds and applications
 Flexible customization
 Experiment goals, API changes
45
Expertus Approach
 The code generator adopts a compiler approach
of multiple serial transformation stages.
 One type of transformation at any given stage
(e.g., cloud, operating system, application, etc.)
 The number of stages is determined by the
experiment, application, software stack, operating
system, and cloud.
 At each stage Expertus uses the intermediate
XML document created from the previous stage
as the input to the current stage (a toy sketch of
this staging follows).
46
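The staging idea can be illustrated with a toy pipeline: each stage consumes the intermediate XML document produced by the previous stage and adds one kind of detail. Expertus itself generates code from XSLT templates; the stage functions, element names, and attributes below are assumptions for illustration only.

```python
# Illustrative multi-stage transformation over an intermediate XML document.
# (Expertus uses XSLT templates; everything here is a stand-in for that.)
import xml.etree.ElementTree as ET

def cloud_stage(doc: ET.Element) -> ET.Element:
    """Resolve cloud-specific details (e.g., node allocation scripts)."""
    for node in doc.iter("node"):
        node.set("allocate", f"allocate-{node.get('cloud', 'emulab')}.sh")
    return doc

def os_stage(doc: ET.Element) -> ET.Element:
    """Resolve operating-system-specific details (e.g., package installers)."""
    for node in doc.iter("node"):
        node.set("install", "apt-get" if node.get("os") == "debian" else "yum")
    return doc

def application_stage(doc: ET.Element) -> ET.Element:
    """Attach application deployment steps to each tier."""
    for tier in doc.iter("tier"):
        ET.SubElement(tier, "deploy", {"script": f"deploy-{tier.get('name')}.sh"})
    return doc

# The output of one stage is the input of the next.
STAGES = [cloud_stage, os_stage, application_stage]

spec = ET.fromstring(
    "<experiment>"
    "<tier name='web'><node cloud='ec2' os='debian'/></tier>"
    "<tier name='db'><node cloud='emulab' os='rhel'/></tier>"
    "</experiment>"
)

doc = spec
for stage in STAGES:
    doc = stage(doc)        # intermediate XML after each stage

print(ET.tostring(doc, encoding="unicode"))
```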
Multi-Stage Code Generation
47
Steps Inside a Stage
48
An Example XSLT Template
49
An Intermediate XML
50
Generated Code
51
Experiment Automation Process
 Create the experiment specification with the
application, software packages, cloud, and
experiments.
 Use Expertus to generate the scripts.
 Platform configuration sets up the target cloud.
 Application deployment deploys the target
application on the configured cloud.
 Configure the application correctly.
 The main script runs the test plan, which
consists of multiple iterations.
 Upload the resource monitoring and performance
data to the data warehouse.
(A driver-script sketch of this process follows.)
52
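A sketch of the process above as a single driver script; every script name is a placeholder standing in for generated code, not Expertus's actual interface.

```python
import subprocess

DRY_RUN = True   # set to False to actually execute the (placeholder) scripts

def run(script: str, *args: str) -> None:
    """Run one generated script; all script names here are placeholders."""
    if DRY_RUN:
        print("would run:", script, *args)
    else:
        subprocess.run([script, *args], check=True)

def run_experiment(spec: str, iterations: int = 3) -> None:
    run("./generate_scripts.sh", spec)   # code generation from the experiment spec
    run("./configure_platform.sh")       # set up the target cloud
    run("./deploy_application.sh")       # deploy and configure the application
    for i in range(iterations):          # the test plan consists of multiple iterations
        run("./run_test_plan.sh", str(i))
    run("./upload_results.sh")           # push monitoring and performance data to the warehouse

run_experiment("rubbos-1-2-1-9.xml")
```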
Automation Process
53
54
Configuration levels
Evaluation Metrics
 Usability of the Tool.
 How quickly a user can change an existing specification
to run the same experiment with different settings
 Generated Script Types and Magnitude.
 Depends on the application, software packages,
deployment platform, number of experiments.
 Richness of the Tool.
 Magnitude of completed experiments
 Amount of different software packages, clouds, and
applications it supports
 Extensibility and Flexibility.
 Supporting new clouds
 Supporting new applications
55
Usability
[Charts: specification changes vs. code changes; number of nodes vs. generated code]
56
Evaluation Metrics
 Usability of the Tool.
 How quickly a user can change an existing specification
to run the same experiment with different settings
 Generated Script Types and Magnitude.
 Depends on the application, software packages,
deployment platform, number of experiments.
 Richness of the Tool.
 Magnitude of completed experiments
 Amount of different software packages, clouds, and
applications it supports
 Extensibility and Flexibility.
 Supporting new clouds
 Supporting new applications
57
Current Status
58
59
Complexity of the tool
Table 1: Number of Experiments
Table 2: Experiment size vs. NLOC
Evaluation Metrics
 Usability of the Tool.
 How quickly a user can change an existing specification
to run the same experiment with different settings
 Generated Script Types and Magnitude.
 Depends on the application, software packages,
deployment platform, number of experiments.
 Richness of the Tool.
 Magnitude of completed experiments
 Amount of different software packages, clouds, and
applications it supports
 Extensibility and Flexibility.
 Supporting new clouds
 Supporting new applications
60
Adding a new Cloud
Template Changes Changes in Generated Code
61
Adding a new Application
Template Changes Changes in Generated Code
62
Adding a new DBMS
Template Changes Changes in Generated Code
63
Magnitude of Code per Node
64
Usability
 Significant strides towards realizing flexible
and scalable application testing for today’s
complex cloud environments.
 Over 500 different hardware configurations.
 Over 10,000 software configurations.
 Five clouds (i.e., Emulab, EC2, Open Cirrus,
Georgia Tech cluster, and Wipro)
 Three representative applications (RUBBoS,
RUBiS, and CloudStone)
65
Flexibility and Extensibility
 Support new clouds, applications, and software
packages with only a few template line changes.
 Only 8.21% of template lines changed to support
Amazon EC2 once we had support for the Emulab
cloud.
 This caused a 25.35% change in the generated code
for an application scenario with 18 nodes.
 Switching from RUBBoS to RUBiS required
only a 5.66% template change.
66
Configuration Planning
 Provider profit model (simplified)
67
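The profit-model slides themselves are figures, so as a loudly labeled assumption, here is one common simplified form such a model can take (revenue from SLA-satisfying requests minus infrastructure cost); it is not necessarily the formula used in the deck.

```python
def profit_per_hour(throughput_ops_s, sla_satisfaction, revenue_per_op,
                    nodes, price_per_node_hour):
    """Simplified provider profit model (an illustrative assumption, not the
    deck's exact formula): revenue from SLA-satisfying requests minus the
    hourly infrastructure cost."""
    revenue = throughput_ops_s * 3600 * sla_satisfaction * revenue_per_op
    cost = nodes * price_per_node_hour
    return revenue - cost

# Example: 500 ops/s, 96% within SLA, $0.0001 per op, 13 nodes at $0.34/hour.
print(profit_per_hour(500, 0.96, 0.0001, 13, 0.34))
```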
Provider Revenue Model
68
Design of CloudXplor Tool
69
Data Refinement in CloudXplor
70
Cloud Evaluation - CloudXplor
71
Maximal Throughput of
RUBBoS (1-2-4 R/W)
72
Throughput RUBBoS
(1-2-4 and R/W)
73
Response Time Dist.
(1-1-2 MySQL Cluster)
74
Revenue and Cost
Analysis (R/W)
75
Profit and Cost Analysis
RUBBoS (R/W)
76
Optimal Profit Configurations
77
Example: RUBBoS Benchmark
 E-commerce applications (Slashdot
Bulletin Board)
 N-tier (3 or more tiers): web servers,
application servers, database servers
 24 web interactions, requiring sophisticated
models, e.g., Layered Queuing Network
Models
78
RUBBoS 4 Tier Deployment
79
Example: RUBBoS Benchmark
80
RUBBoS Software
Components
81
RUBBoS Software Settings
82
Cloud Evaluation - Overview
 Main idea
 How and where should you deploy your enterprise
system, and in which scenario?
 Automated empirical measurement and evaluation of
alternative platforms, configurations, and architectures
for n-tier apps in the cloud
 Hardware platforms (IaaS)
 Amazon EC2, Open Cirrus (HP), and Emulab
 System software configurations
 LAMP, MySQL Cluster (off-the-shelf RDBMS)
 Application software
 E-commerce application benchmarks (RUBBoS)
83
Cloud Evaluation - Mapping
84
Cloud Evaluation - Mapping
85
Cloud Evaluation - Mapping
86
Cloud Evaluation - Deployment
87
Cloud Evaluation - Deployment
88
Cloud Evaluation – Infrastruct.
89
Fast-Forward A Few Years
 Using the automated experiment
generation infrastructure, we ran many
thousands of experiments
 We found several interesting phenomena
 The best: Very Short Bottlenecks that cause
Very Long Response-Time Requests
90
Latency Long Tail Problem
 At moderate CPU utilization levels (about
60% at 9000 users), 4% of requests take
several seconds, instead of milliseconds
91
Latency Long Tail: A Serious
Research Challenge
 No system resource is near saturation
 Very Long Response Time (VLRT) requests
start to appear at moderate utilization levels
(often at 50% or lower)
 VLRT requests themselves are not bugs:
 They only take milliseconds when run by
themselves
 Each run presents different VLRT requests
 VLRT requests appear and disappear too
quickly for most monitoring tools
92
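Because VLRT requests appear and disappear within fractions of a second, spotting them requires analyzing per-request logs over very short windows. The sketch below assumes a simple (timestamp, latency) log format and is purely illustrative.

```python
from collections import defaultdict

def tail_fraction_per_window(requests, window_s=0.05, slow_ms=1000):
    """requests: iterable of (timestamp_s, latency_ms) pairs (assumed format).
    Returns {window_start: fraction of requests slower than slow_ms}."""
    totals, slow = defaultdict(int), defaultdict(int)
    for ts, latency in requests:
        w = int(ts / window_s) * window_s
        totals[w] += 1
        if latency >= slow_ms:
            slow[w] += 1
    return {w: slow[w] / totals[w] for w in sorted(totals)}

log = [(0.01, 12), (0.02, 3200), (0.03, 15), (0.06, 14), (0.07, 11)]
print(tail_fraction_per_window(log))
# A burst of slow requests concentrated in one 50 ms window flags a very short bottleneck.
```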
Big Data & Clouds Need
Automation
 Experimental approaches
 Often the only choice (modeling too complex)
 Abundant resource availability
 Many configurations mean many experiments
and measurements
 Automated experiment generation,
execution, monitoring, and analysis
 Very interesting phenomena found (VSB)
93
End of Session
 Any Questions?
 Calton Pu (calton@cc.gatech.edu)
94
