Simulating Heterogeneous Resources in CloudLightning

SELF-ORGANISING, SELF-MANAGING HETEROGENEOUS CLOUD
Dr. Christos Papadopoulos-Filelis

Overview of CloudLightning System
• Modern Cloud Computing infrastructures are gradually being equipped with different types of hardware:
  • Multicore processors
  • Accelerators (e.g. Intel Xeon Phi, GPUs) paired with CPUs
  • DFEs paired with CPUs
• The CloudLightning project aims at designing a system that can manage these heterogeneous resources efficiently.
• CloudLightning is a self-managing, self-organizing system.
• The system makes local decisions based on user input and needs, aiming at optimal execution and reduced power consumption.

Facts and Challenges
• Cloud environments allow the underlying resources to be expanded without substantial changes in software.
• However, over-provisioning of resources has been used as a method for assuring service availability, leading to underutilization and excessive power consumption (Barroso and Hölzle, 2007).
• Scaling performance through homogeneity (adding identical general-purpose cores) poses problems such as limits on power density, heat removal, etc. (Crago and Walters, 2015).
• Current delivery models (IaaS, PaaS, SaaS) are not designed for handling HPC applications.
• The integration of different types of hardware and the management of increasingly complex Cloud environments are at an early stage.

CloudLightning Perspective
• Clouds are complex environments. Alan Turing’s observation “global order comes from local interactions” suggests a way of managing and organizing them without the unnecessary overhead present in hierarchical control. Thus, self-management and self-organization are performed locally.
• Use and management of specialized high-performance hardware (e.g. GPUs, MICs, DFEs) with a low Watt/FLOPS ratio.
• Limiting over-provisioning by adaptively picking optimal sets (coalitions) of resources for each application.
• A specialized template-based system for characterizing applications.
• A new delivery model, namely HPC Resource-as-a-Cloud.

Architecture
• The life-cycle of a CloudLightning service begins with the Enterprise Application Developer (EAD).
• The EAD submits a Blueprint to the Gateway Service.
• A Blueprint is a graph representing the workflow of Services collectively composed to automate a particular task or business process.
• The Gateway Service is the front-end of the CL system, providing a unified access interface as well as a graphical interface for EADs.
• The Gateway receives the resource options capable of servicing the Blueprint and enables the EAD to select among them and deploy the Blueprint.
• The Gateway Service contacts the “Cell”, which contains the self-managing, self-organizing system.
• A CL system is typically composed of multiple Cells.

Architecture
• A Cell is associated with one geographical region.
• Each Cell is composed of groups of resources called vRacks.
• A vRack maintains information about servers with the same type of physical resources.
• A vRack Manager manages a logical group of these resources called a coalition.
• A vRack Group is composed of vRacks with the same type of resources.
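The Cell / vRack Group / vRack hierarchy can be pictured as nested containers. The following Python sketch is only an illustration under assumptions: the class and field names (`Server`, `VRack`, `Cell`, `hw_type`, `vrack_groups`), the region label and the server names are hypothetical, a vRack Group is modelled as the list of vRacks sharing a hardware type, and the vRack Manager's coalition formation is reduced to picking the first free servers.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    hw_type: str  # hypothetical label, e.g. "CPU", "GPU", "MIC", "DFE"


@dataclass
class VRack:
    """Holds servers that share the same type of physical resources."""
    hw_type: str
    servers: list = field(default_factory=list)

    def add(self, server: Server) -> None:
        # A vRack only accepts servers of its own hardware type.
        if server.hw_type != self.hw_type:
            raise ValueError(f"{server.name} is {server.hw_type}, vRack is {self.hw_type}")
        self.servers.append(server)

    def form_coalition(self, size: int) -> list:
        # Simplified vRack Manager role: pick a logical group of servers.
        if size > len(self.servers):
            raise ValueError("not enough servers in this vRack")
        return self.servers[:size]


@dataclass
class Cell:
    """One Cell per geographical region; vRack Groups keyed by hardware type."""
    region: str
    vrack_groups: dict = field(default_factory=dict)  # hw_type -> list of VRack

    def add_vrack(self, vrack: VRack) -> None:
        self.vrack_groups.setdefault(vrack.hw_type, []).append(vrack)


cell = Cell(region="region-1")  # hypothetical region name
rack = VRack(hw_type="GPU")
for i in range(4):
    rack.add(Server(name=f"gpu-{i}", hw_type="GPU"))
cell.add_vrack(rack)
coalition = rack.form_coalition(2)
print([s.name for s in coalition])  # ['gpu-0', 'gpu-1']
```

Keying vRack Groups by hardware type keeps the "same type of resources" rule explicit and makes lookup of candidate vRacks for a given task a single dictionary access.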

Use Cases
• Three HPC use cases have been chosen to validate the CL system:
  • Genomics
  • Oil and Gas exploration
  • Ray tracing
• General sparse and dense matrix computations should also be investigated.
• These use cases require large-scale computational infrastructures, which are costly to build and operate on-site.
• Cloud services can be an effective way of decreasing the cost of these applications.
• These applications are suitable for modern accelerator-type hardware, e.g.:
  • Genomics: local sequence alignment (the Smith-Waterman algorithm) can be performed on DFEs.
  • Oil and Gas exploration: Reverse Time Migration (RTM), as well as simulations based on the Open Porous Media (OPM) framework, can be efficiently accelerated on GPUs.
  • Ray Tracing: a variety of ray tracing engines exist for accelerators, such as Intel Embree, which supports MICs, and NVIDIA OptiX, which supports GPUs.

Resource Characterization
• Resource characterization involves defining the parameters that describe the execution of applications with respect to the underlying hardware.
• The characterization process becomes significantly more complex in the case of heterogeneous hardware.
• The performance of modern processors is affected by a large number of parameters. Thus, modelling such hardware poses significant difficulties.
• To limit the number of required parameters and obtain more accurate metrics for the large-scale simulation of the system, differential metrics with respect to baseline executions can be used.
• Moreover, the chosen tests should resemble the “type” of computational work of each use case.

• Examples of applications that can be used to characterize resources for each use case are the following:
  • Genomics:
    • For CPUs: MUMmer (Kurtz et al., 2004)
    • For GPUs: MUMmerGPU (Schatz et al., 2007)
    • For DFEs: Smith-Waterman (Maxeler, 2015)
  • Ray Tracing:
    • For CPUs: Embree (Embree, 2016)
    • For GPUs: NVIDIA OptiX (NVIDIA, 2016)
    • For MICs: Embree (Embree, 2016)
• Using CPU executions and acquiring metrics such as performance (in terms of execution time or GFLOPS) and energy consumption, a baseline set of results can be collected and used to produce performance indexes, for example for Ray Tracing.
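As a sketch of how such indexes might be formed, assume the index of a hardware type for an application is the ratio of the CPU baseline metric to the metric measured on that hardware, so that values above 1 indicate an improvement over the baseline. This ratio form, the function names and all numbers below are assumptions for illustration, not measurements or the exact formula used in CloudLightning.

```python
def performance_index(baseline_time: float, measured_time: float) -> float:
    """Dimensionless speedup of a hardware type relative to the CPU baseline.

    Assumption: index = T_baseline / T_measured, so 2.0 means twice as fast.
    """
    if baseline_time <= 0 or measured_time <= 0:
        raise ValueError("times must be positive")
    return baseline_time / measured_time


def energy_index(baseline_energy: float, measured_energy: float) -> float:
    """Same idea for energy: values above 1 mean less energy than the baseline."""
    if baseline_energy <= 0 or measured_energy <= 0:
        raise ValueError("energies must be positive")
    return baseline_energy / measured_energy


# Hypothetical Ray Tracing timings in seconds (placeholders, not measurements).
cpu_baseline = 120.0  # e.g. Embree on a high-end CPU
timings = {"GPU": 30.0, "MIC": 60.0}

indexes = {hw: performance_index(cpu_baseline, t) for hw, t in timings.items()}
print(indexes)  # {'GPU': 4.0, 'MIC': 2.0}
print(energy_index(500.0, 250.0))  # 2.0
```

Because both quantities are ratios against the same baseline, they are dimensionless and can be compared or combined across applications.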

Performance Index
• The chosen type of index simplifies the computation of accurate metrics for the chosen applications.
• Indexes can easily be extended with further characteristics such as energy consumption, scalability, etc.
• They are dimensionless numbers and can be combined with each other to produce collective indexes that characterize the performance of a system across various applications.
• They simplify simulating the expected completion time of a task based on the input data and the baseline metrics.
• Baseline metrics can be obtained using high-end CPUs, which makes the improvement gained by using the CL system for a given application easy to quantify.
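The way indexes simplify the simulation of a task's expected completion time can be sketched under one assumption: if the baseline (CPU) time of a task for a given input is known, its expected time on another hardware type is the baseline time divided by that hardware's index, rounded up to whole time units. The function name is hypothetical.

```python
import math


def expected_completion_tu(baseline_tu: float, index: float) -> int:
    """Expected completion time in whole time units (tu).

    Assumptions: baseline_tu is the CPU baseline time for this input size,
    and index = T_baseline / T_hardware (dimensionless, > 0).
    """
    if index <= 0:
        raise ValueError("index must be positive")
    # Round up: a partially used time unit still occupies the hardware.
    return math.ceil(baseline_tu / index)


# A task taking 1000 tu on the CPU baseline, run on hardware with index 3.0.
print(expected_completion_tu(1000, 3.0))  # 334
```

Rounding up keeps the result compatible with a simulator that advances in discrete time units.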
Oil and Gas Exploration and Sparse and Dense Matrix Computations
{
CPU Performance:
C1: SGEMM
C2: DGEMM
C3: Dense LU Factorization (SGESV, DGESV)
C4: Sparse LU Factorization
C5: SFFT
C6: DFFT
GPU Performance:
C1: SGEMM
C2: DGEMM
C5: SFFT
C6: DFFT
MIC Performance:
C1: SGEMM
C2: DGEMM
C5: SFFT
C6: DFFT
}
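One possible way to turn per-kernel indexes (C1 to C6) into a collective index per hardware type is a geometric mean over the kernels benchmarked on that hardware. The geometric mean is an assumption chosen here because it treats dimensionless ratios symmetrically; the combination rule is not specified above, and the per-kernel values below are placeholders.

```python
import math

# Kernels benchmarked per hardware type, following the C1-C6 lists above.
KERNELS = {
    "CPU": ["C1", "C2", "C3", "C4", "C5", "C6"],
    "GPU": ["C1", "C2", "C5", "C6"],
    "MIC": ["C1", "C2", "C5", "C6"],
}


def collective_index(kernel_indexes: dict, hw: str) -> float:
    """Geometric mean of the per-kernel indexes available for hardware `hw`."""
    values = [kernel_indexes[k] for k in KERNELS[hw]]
    if any(v <= 0 for v in values):
        raise ValueError("indexes must be positive")
    # exp(mean(log x)) equals the geometric mean and avoids overflow of long products.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Placeholder per-kernel indexes for a GPU, relative to the CPU baseline.
gpu = {"C1": 8.0, "C2": 2.0, "C5": 4.0, "C6": 2.0}
print(round(collective_index(gpu, "GPU"), 3))
```

With the CPU as baseline, every CPU kernel index is 1, so the CPU's collective index is 1 as well, which keeps the scale consistent.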

• Oil and Gas exploration is not computationally monolithic, since it involves dense and sparse matrix computations, which have different memory access patterns and parallel performance.
• Oil and Gas exploration involves general sparse and dense matrix computations, including dense and sparse matrix multiplication and the solution of large sparse and dense linear systems.
• Thus, metrics for this use case can be obtained from a set of tests involving general sparse and dense matrix computations, instead of performing oil and gas simulations with variable input data.
• Moreover, since general sparse and dense matrix computations are present in most large-scale simulations, these metrics can also describe fields such as Computational Fluid Dynamics, Computational Astrophysics or Computational Finance.

Simulation Tools
• There are two major categories of Cloud simulation tools:
  • Discrete Event Based (DES): these avoid building and processing small simulation objects (like packets); instead, the effect of object interactions is captured (better performance, lower accuracy).
  • Packet Level: whenever a data message has to be transmitted between simulator entities, a packet structure with its protocol headers is allocated in memory and all the associated protocol processing is performed.
• There is no “Holy Grail” for Cloud simulation.
• Various simulation tools can be used to simulate a cloud environment: CloudAnalyst (DES), CloudSched (DES), CloudSim (DES), DCSim (DES), GDCSim, GreenCloud (Packet Level), iCanCloud (DES), NetworkCloudSim.
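The discrete event approach can be illustrated with its core loop: instead of modelling every packet, the simulator keeps a priority queue of timestamped events and jumps directly from one event to the next. The sketch below is a minimal, single-threaded illustration with hypothetical task data and a FIFO policy on identical servers; it is not the parallel simulator the CL system would require.

```python
import heapq


def simulate(tasks, servers):
    """Run tasks on a pool of identical servers with a minimal event queue.

    tasks: list of (arrival_tu, task_id, duration_tu). Returns {task_id: finish_tu}.
    """
    events = []  # heap of (time, seq, kind, payload); seq breaks ties deterministically
    seq = 0
    for arrival, tid, duration in tasks:
        heapq.heappush(events, (arrival, seq, "arrive", (tid, duration)))
        seq += 1
    free, waiting, finished = servers, [], {}
    while events:
        now, _, kind, payload = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(payload)
        else:  # "finish": record completion and release the server
            finished[payload] = now
            free += 1
        # Start as many waiting tasks (FIFO) as there are free servers.
        while free and waiting:
            tid, duration = waiting.pop(0)
            free -= 1
            heapq.heappush(events, (now + duration, seq, "finish", tid))
            seq += 1
    return finished


print(simulate([(0, "t1", 5), (0, "t2", 3), (1, "t3", 2)], servers=2))
# {'t2': 3, 't1': 5, 't3': 5}
```

Simulated time only advances to the next event timestamp, which is where the performance advantage of DES over packet-level simulation comes from.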

Simulation Tools
Simulator | GUI | Virtualization | Scheduling | Network Models | Energy Models | Parallel Experiments
CloudAnalyst | ✓ | ✓ | ✓ | Limited (latency) | ✗ | ✗
CloudSim (original version) | ✗ | ✓ | ✓ | Limited (latency) | Limited | ✗
CloudSched | ✓ | ✓ | ✓ | Limited | ✓ | ✗
DCSim | ✗ | ✓ | ✓ | ✗ | ✓ | ✗
GDCSim | ✗ | ✓ | ✓ | ✗ | ✓ | ✗
GreenCloud | ✗ | ✗ | Limited | ✓ | ✓ | ✗
iCanCloud | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
NetworkCloudSim (integrated into CloudSim) | ✗ | ✓ | ✓ | ✓ (latency & bandwidth) | Limited | ✓

Elements of Simulation
• The CL system has various components and services not present in today's Cloud systems; thus, classical simulation environments cannot be used.
• These components and services include: the coalition formation mechanism, vRacks, vRack Managers, complex network communications, etc.
• Moreover, the strategy for coalition formation, i.e. static or dynamic, impacts the system and affects the complexity of the simulation.
• Simulating the CL system requires the simulator to be inherently parallel, since the increased complexity of the system results in increased computational work.
• The CL system targets three distinct use cases that can be modeled by the aforementioned dimensionless indexes.

• To accurately simulate a CL-based cloud environment, the time scale must be defined. A general approach is to discretize time in terms of a time unit (tu); for simplicity, let 1 tu = 1 second.
• A task can be simulated as follows:
Task
{
Number of Computational Units: W
Required Hardware: X
Application: Y
Time Units: Z
}
• Each task occupies W hardware instances of type X, running application Y for Z time units.
• Required Hardware (X) defines the type(s) of hardware required, and Application (Y) is the type of application.
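The Task record maps naturally onto a small data type. In this sketch the fields mirror W, X, Y and Z; the hardware and application labels and the helper method names are hypothetical, and time is kept as an integer number of time units with 1 tu = 1 s as assumed above.

```python
from dataclasses import dataclass

TU_SECONDS = 1  # 1 tu = 1 second, as assumed in the text


@dataclass(frozen=True)
class Task:
    units: int        # W: number of computational units (hardware instances)
    hardware: str     # X: required hardware type, e.g. "GPU" (hypothetical label)
    application: str  # Y: application type, e.g. "ray_tracing" (hypothetical label)
    time_units: int   # Z: time units the hardware is occupied

    def unit_time_product(self) -> int:
        """Total occupancy in instance-tu (W * Z), e.g. for utilization accounting."""
        return self.units * self.time_units

    def duration_seconds(self) -> int:
        return self.time_units * TU_SECONDS


task = Task(units=4, hardware="GPU", application="ray_tracing", time_units=600)
print(task.unit_time_product(), task.duration_seconds())  # 2400 600
```

Making the record frozen reflects that a submitted task's requirements do not change while it is being scheduled.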

• Similarly, a coalition of capable hardware can be described by the following parameters:
Coalition
{
Number of Computational Units: CU
Compute Capability: CC
Interconnection: I
Storage Interconnection: SI
Power Consumption: P
Cost: CO
Server Initialization: S
}
• Compute Capability is a dimensionless index computed with the procedure presented in the characterization of hardware. Power Consumption and Cost are dimensionless numbers measured with respect to a baseline.
• Interconnection and Storage Interconnection are measured in Gbps.
• Server Initialization is measured in time units (tu).
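A coalition can be described the same way. The sketch below mirrors the Coalition parameters and adds two hypothetical pieces not defined above: a `hardware` field and a `can_host` helper, which assumes a coalition can serve a Task when its hardware type matches and it has at least W computational units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coalition:
    hardware: str              # hardware type of the coalition (hypothetical field)
    units: int                 # CU: number of computational units
    compute_capability: float  # CC: dimensionless index vs. the baseline
    interconnect_gbps: float   # I: interconnection, Gbps
    storage_gbps: float        # SI: storage interconnection, Gbps
    power: float               # P: dimensionless, relative to a baseline
    cost: float                # CO: dimensionless, relative to a baseline
    init_tu: int               # S: server initialization time, tu

    def can_host(self, units_needed: int, hardware: str) -> bool:
        # Assumed matching rule: same hardware type and enough units.
        return self.hardware == hardware and self.units >= units_needed


gpu_coalition = Coalition(
    hardware="GPU", units=8, compute_capability=4.0,
    interconnect_gbps=10.0, storage_gbps=10.0,
    power=1.5, cost=2.0, init_tu=30,
)
print(gpu_coalition.can_host(4, "GPU"), gpu_coalition.can_host(4, "DFE"))  # True False
```

A richer matching rule could also weigh Compute Capability, Power Consumption and Cost when several coalitions qualify.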

• The hardware is engaged for the number of time units prescribed by a task (Z). An amount of time (S) is required for initializing the hardware if it was in a sleep state. Moreover, before the Task begins execution, time is required to bring the hardware to a functional state: transfer of software images, installation of appropriate libraries, mounting of storage, etc. Thus, the actual time a Task requires is:
  • Time of Execution (tu): $T = S + Z + \alpha + \beta$
  • where α is the time required for the hardware to become functional and β the time required for the hardware to return to idle, waiting for the next Task.
• α mostly depends on the speed of the network as well as on the number of users (N) accessing the same storage server simultaneously:
  • $\alpha\,(\mathrm{tu}) = \dfrac{\text{Image size (GB)}}{SI \cdot \mathrm{tu} / (8 \cdot N \cdot 10^{9})}$, with SI expressed in bit/s; equivalently, with SI in Gbps and 1 tu = 1 s, $\alpha = 8 \cdot N \cdot \text{Image size (GB)} / SI$.
• β is the time required to free the resource and return it to an idle state.
• Finally, a linear model is considered for the network:
  • $\text{delay} = \text{latency} + \text{size (GB)} / \text{BW (GB/s)}$
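The timing model can be checked with a few lines of arithmetic. The sketch assumes 1 tu = 1 s, image size in GB and SI in Gbps shared evenly among N users (so that α = 8·N·size/SI seconds), and takes β as a given input because it is not modelled further. Function names and all numbers are illustrative.

```python
def alpha_tu(image_gb: float, si_gbps: float, n_users: int) -> float:
    """Time to make hardware functional: image transfer over a shared storage link.

    The SI bandwidth is split evenly among N concurrent users; 8 converts GB to Gb.
    """
    return image_gb * 8 * n_users / si_gbps


def execution_tu(s: float, z: float, alpha: float, beta: float) -> float:
    """Time of Execution (tu) = S + Z + alpha + beta."""
    return s + z + alpha + beta


def network_delay(latency_s: float, size_gb: float, bw_gbytes_per_s: float) -> float:
    """Linear network model: delay = latency + size / BW (BW in GB/s)."""
    return latency_s + size_gb / bw_gbytes_per_s


# A 5 GB image over a 10 Gbps storage link shared by 4 users -> 16 tu to transfer.
a = alpha_tu(image_gb=5, si_gbps=10, n_users=4)
total = execution_tu(s=30, z=600, alpha=a, beta=10)
print(a, total)  # 16.0 656.0
print(network_delay(0.001, 2.0, 1.25))
```

The example shows that, for short tasks, the setup terms S and α can be a noticeable fraction of the total time, which is why they are kept explicit in the model.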

Elements of Simulation and Future Work
• Finally, a custom simulation framework will be used, since CL is a unique Cloud Computing environment.
• The presented simulation entities will be extended to describe different types of deployment, such as containers and bare-metal images.
• The simulator must be able to cope with both static and dynamic coalition choices, as well as update the list of static coalitions based on users’ choices.
• An auto-scaling model should be introduced in the simulation for the three use cases.
• This parallel hybrid DES/packet-level simulation scheme is expected to adequately describe the CL system.

THANK YOU
Dr. Christos Papadopoulos-Filelis
cpapad@ee.duth.gr
