SELF-ORGANISING, SELF-MANAGING
HETEROGENEOUS CLOUD
Simulating Heterogeneous
Resources in CloudLightning
Dr. Christos
Papadopoulos-Filelis
• Modern Cloud Computing infrastructures are increasingly equipped
with different types of hardware:
• Multicore processors
• Accelerators (e.g. Intel Xeon Phi, GPUs) paired with CPUs
• Dataflow Engines (DFEs) paired with CPUs
• The CloudLightning project aims to design a system that can
manage these heterogeneous resources efficiently.
• CloudLightning is a self-managing, self-organizing system.
• The system makes local decisions based on user input and needs
to ensure optimal execution and decreased power
consumption.
Overview of CloudLightning System
• Cloud environments allow for expansion of the underlying resources without
substantial changes in software.
• However, over-provisioning of resources has been used as the method for assuring
service availability, leading to underutilization and excessive power consumption
(Barroso and Hölzle, 2007).
• Scaling performance through homogeneity (adding identical general purpose
cores) poses problems such as limitations on power density, heat removal, etc. (Crago
and Walters, 2015).
• Current delivery models (IaaS, PaaS, SaaS) are not designed for handling HPC
applications.
• Integration of different types of hardware and management of increasingly more
complex Cloud environments is at an early stage.
Facts and Challenges
• Clouds are complex environments. Alan Turing’s observation
“global order comes from local interactions” reveals a way of
managing and organizing without the unnecessary overhead present
in hierarchical control. Thus, self-management and
self-organization are performed locally.
• Use and management of specialized high performance hardware
(e.g. GPUs, MICs, DFEs) with a low Watt/FLOPS ratio.
• Limit over-provisioning by adaptively picking optimal sets
(coalitions) of resources for each application.
• Specialized Template based system for characterizing
applications.
• New delivery model, namely HPC Resource-as-a-Cloud.
CloudLightning Perspective
• The life-cycle of the CloudLightning service begins with the Enterprise
Application Developer (EAD).
• The EAD submits a Blueprint to the Gateway Service.
• A Blueprint is a graph representing the workflow of Services collectively
composed to automate a particular task or business process.
• The Gateway Service is the front-end of the CL system, providing a unified
access interface as well as a graphical interface for EADs.
• The Gateway receives resource options capable of servicing the Blueprint.
It then enables the EAD to select one and deploy the Blueprint.
• The Gateway Service contacts the “Cell”, which contains the self-managing,
self-organizing system.
• The CL system is typically composed of multiple Cells.
Architecture
Architecture
• A Cell is associated with one geographical region.
• Each Cell is composed of groups of resources called vRacks.
• A vRack maintains information about servers with the same type of physical resources.
• A vRack Manager manages a logical group of these resources called a coalition.
• A vRack Group is composed of vRacks with the same type of resources.
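The hierarchy described on this slide can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `busy` flag, the region name and the first-fit coalition policy are hypothetical and are not taken from the CloudLightning implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    hardware_type: str  # e.g. "CPU", "GPU", "MIC", "DFE"
    busy: bool = False


@dataclass
class VRack:
    """Tracks servers that share one type of physical resource."""
    hardware_type: str
    servers: list = field(default_factory=list)

    def add(self, server: Server) -> None:
        if server.hardware_type != self.hardware_type:
            raise ValueError("a vRack holds a single hardware type")
        self.servers.append(server)

    def form_coalition(self, size: int) -> list:
        """Reserve `size` idle servers as a coalition (first-fit, hypothetical policy)."""
        idle = [s for s in self.servers if not s.busy]
        if len(idle) < size:
            return []  # not enough free servers in this vRack
        chosen = idle[:size]
        for s in chosen:
            s.busy = True
        return chosen


@dataclass
class Cell:
    """One geographical region; vRacks are grouped by hardware type (vRack Groups)."""
    region: str
    vrack_groups: dict = field(default_factory=lambda: defaultdict(list))

    def add_vrack(self, vrack: VRack) -> None:
        self.vrack_groups[vrack.hardware_type].append(vrack)


cell = Cell(region="region-1")  # hypothetical region name
gpu_rack = VRack("GPU")
for i in range(4):
    gpu_rack.add(Server(f"gpu-{i}", "GPU"))
cell.add_vrack(gpu_rack)

coalition = gpu_rack.form_coalition(2)
print([s.name for s in coalition])
```

Keeping the coalition logic inside the vRack mirrors the idea that decisions are taken locally rather than by a central controller.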
• Three HPC use cases have been chosen to validate CL system:
• Genomics
• Oil and Gas exploration
• Ray tracing
• General sparse and dense matrix computations should also be investigated.
• These use cases require large scale computational infrastructures, which are
costly to build and operate on-site.
• Cloud services are an effective choice for decreasing the cost of these
applications.
• These applications are suitable for use with modern accelerator type hardware,
for example:
• Genomics: The local sequence alignment can be performed on DFEs
(Smith-Waterman algorithm).
• Oil and Gas exploration: Reverse Time Migration (RTM) as well as
Open Porous Media framework based simulations can be efficiently
accelerated on GPUs.
• Ray-Tracing: A variety of ray tracing engines exist for accelerators, such
as Intel Embree, which supports MICs, and NVIDIA OptiX, which supports
GPUs.
Use Cases
• Resource characterization involves defining the parameters that describe
the execution of applications with respect to underlying hardware.
• The process of characterization becomes significantly more complex
in the case of heterogeneous hardware.
• The performance of modern processors is affected by an extensive
number of parameters. Thus, modelling such hardware poses significant
difficulties.
• In order to limit the number of parameters required and obtain more
accurate metrics for the large scale simulation of the system, differential
metrics with respect to baseline executions can be used.
• Moreover, the chosen tests should resemble the “type” of computational
work.
Resource Characterization
• Examples of applications that can be used for characterization of
resources with respect to use cases are the following:
• Genomics:
• For CPUs: MUMmer, (Kurtz et al., 2004)
• For GPUs: MUMmerGPU, (Schatz et al., 2007)
• For DFEs: Smith Waterman, (Maxeler, 2015)
• Ray-Tracing:
• For CPUs: Embree, (Embree, 2016).
• For GPUs: NVIDIA OptiX, (NVIDIA, 2016).
• For MICs: Embree, (Embree, 2016).
• Using CPU executions to acquire metrics such as performance (execution
time or GFLOPS) and energy consumption, a
baseline set of results can be collected and used to produce indexes of
performance. For example, for Ray Tracing:
[index formula shown as an image on the original slide; not recoverable from the text]
Resource Characterization
• The chosen type of indexes simplifies computation of accurate metrics
with respect to chosen applications.
• They can be easily extended with more characteristics such as energy
consumption, scalability, etc.
• They are dimensionless numbers and can be combined with each other to
produce collective indexes to characterize the performance of a system
based on various applications.
• They simplify the simulation process, since the
expected time of completion for a task can be estimated from the input data and
baseline metrics.
• Baseline metrics can be obtained using high end CPUs. This makes it
easy to see the improvement gained by using the CL system for a
given application.
Resource Characterization
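A dimensionless index of this kind can be sketched as a ratio against the baseline CPU run. The exact formula used in the slides is not available in the text, so the ratio form below (baseline divided by measured value) is an assumption, as are the function names and the sample timings.

```python
def performance_index(baseline_time_s: float, device_time_s: float) -> float:
    """Speed-up relative to the baseline CPU run (dimensionless, assumed form)."""
    if baseline_time_s <= 0 or device_time_s <= 0:
        raise ValueError("execution times must be positive")
    return baseline_time_s / device_time_s


def energy_index(baseline_energy_j: float, device_energy_j: float) -> float:
    """Energy saving relative to the baseline run (dimensionless, assumed form)."""
    if baseline_energy_j <= 0 or device_energy_j <= 0:
        raise ValueError("energy values must be positive")
    return baseline_energy_j / device_energy_j


def estimated_time(baseline_time_s: float, index: float) -> float:
    """Expected completion time on a device, given the baseline time and its index."""
    return baseline_time_s / index


# Hypothetical measurements: a ray tracing run takes 120 s on the baseline CPU
# and 30 s on a GPU.
gpu_index = performance_index(120.0, 30.0)
print(gpu_index)                         # 4.0
print(estimated_time(600.0, gpu_index))  # 150.0
```

With an index defined this way, a simulator only needs the baseline time of a task and the index of the target hardware to estimate a completion time.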
Performance Index
{
CPU Performance:
Oil and Gas Exploration and Sparse and Dense Matrix Computations
C1: SGEMM
C2: DGEMM
C3: Dense LU Factorization (SGESV, DGESV)
C4: Sparse LU Factorization
C5: SFFT
C6: DFFT
GPU Performance:
Oil and Gas Exploration and Sparse and Dense Matrix Computations
C1: SGEMM
C2: DGEMM
C3: Dense LU Factorization (SGESV, DGESV)
C4: Sparse LU Factorization
C5: SFFT
C6: DFFT
MIC Performance:
Oil and Gas Exploration and Sparse and Dense Matrix Computations
C1: SGEMM
C2: DGEMM
C3: Dense LU Factorization (SGESV, DGESV)
C4: Sparse LU Factorization
C5: SFFT
C6: DFFT
}
Resource Characterization
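The per-kernel entries C1 to C6 can be combined into one collective index per device type. The sketch below uses a geometric mean, which is a common choice for averaging ratios but is an assumption here; the slides do not state the combination rule, and all index values are hypothetical.

```python
import math

# Kernel labels follow the slide; the numeric indexes are made-up examples
# (ratio of baseline CPU time to device time for each kernel).
KERNELS = ("C1: SGEMM", "C2: DGEMM", "C3: Dense LU", "C4: Sparse LU", "C5: SFFT", "C6: DFFT")


def collective_index(kernel_indexes: dict) -> float:
    """Geometric mean of dimensionless per-kernel indexes (assumed rule)."""
    values = list(kernel_indexes.values())
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one positive index")
    return math.exp(sum(math.log(v) for v in values) / len(values))


gpu = dict(zip(KERNELS, (8.0, 4.0, 4.0, 1.0, 2.0, 2.0)))
cpu = dict(zip(KERNELS, (1.0,) * 6))  # the baseline is 1.0 by construction

print(round(collective_index(gpu), 3))
print(collective_index(cpu))
```

A geometric mean keeps the result dimensionless and prevents a single very fast kernel from dominating the collective index.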
• Oil and Gas exploration is not computationally monolithic since it involves
dense and sparse matrix computations which have different memory
access patterns and parallel performance.
• Oil and Gas exploration involves general sparse and dense matrix computations,
including dense and sparse matrix multiplication and the solution of large
sparse and dense linear systems.
• Thus, metrics for this use case can be obtained from a set of tests involving
general sparse and dense matrix computations, rather than by running oil and
gas simulations with variable data input.
• Moreover, these metrics are described by general sparse and dense
matrix computations that are present in most large scale simulations, including in
fields such as Computational Fluid Dynamics, Computational
Astrophysics or Computational Finance.
Resource Characterization
• Two major categories of Cloud Simulation tools:
• Discrete Event Based (DES): Avoids building and processing small
simulation objects (like packets). Instead, the effect of object
interaction is captured. (Better performance, less accuracy.)
• Packet Level: Whenever a data message has to be transmitted
between simulator entities, a packet structure with its protocol headers
is allocated in memory and all the associated protocol processing
is performed.
• There is no “Holy Grail” for Cloud Simulation.
• Various simulation tools can be used to simulate a cloud environment
(CloudAnalyst (DES), CloudSched (DES), CloudSim (DES), DCSim
(DES), GDCSim, GreenCloud (Packet Level), iCanCloud (DES),
NetworkCloudSim).
Simulation Tools
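The discrete event approach can be illustrated with a minimal event loop. This is a generic sketch and not the design of any of the tools listed; the `Simulator` class, the event labels and the transfer parameters are hypothetical. A whole data transfer is modelled as a single event whose duration is latency plus size over bandwidth, instead of being split into packets.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete event loop: events are processed in timestamp order."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps
        self.log = []

    def schedule(self, delay, label, action=None):
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), label, action))

    def run(self):
        while self._queue:
            self.now, _, label, action = heapq.heappop(self._queue)
            self.log.append((self.now, label))
            if action is not None:
                action(self)  # an event may schedule further events


def start_transfer(sim):
    # Hypothetical transfer: 0.5 s latency, 2 GB at 1 GB/s, captured as one event.
    sim.schedule(0.5 + 2.0 / 1.0, "transfer done")


sim = Simulator()
sim.schedule(5.0, "transfer start", start_transfer)
sim.schedule(1.0, "task submitted")
sim.run()
print(sim.log)
```

A packet level simulator would instead schedule one event per packet and per protocol step, which is more accurate but considerably more expensive.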
Simulation Tools
Simulator                                    GUI  Virtualization  Scheduling  Network Models             Energy Models  Parallel Experiments
CloudAnalyst                                 ✓    ✓               ✓           Limited (latency)          ✗              ✗
CloudSim (original version)                  ✗    ✓               ✓           Limited (latency)          Limited        ✗
CloudSched                                   ✓    ✓               ✓           Limited                    ✓              ✗
DCSim                                        ✗    ✓               ✓           ✗                          ✓              ✗
GDCSim                                       ✗    ✓               ✓           ✗                          ✓              ✗
GreenCloud                                   ✗    ✗               Limited     ✓                          ✓              ✗
iCanCloud                                    ✓    ✓               ✓           ✓                          ✗              ✓
NetworkCloudSim (integrated into CloudSim)   ✗    ✓               ✓           ✓ (latency & bandwidth)    Limited        ✓
• The CL system has various components and services not present in
today's Cloud systems, thus classical simulation environments cannot be
used.
• These components and services include: Coalition formation mechanism,
vRack, vRack Manager, complex network communications, etc.
• Moreover, the different strategies for coalition formation, i.e. static or
dynamic, impact the system and affect the complexity of the simulation.
• Simulating the CL system requires the simulator to be inherently parallel,
since the increased complexity of the system results in increased
computational work.
• The CL system targets three distinct use cases that can be modeled by the
aforementioned dimensionless indexes.
Elements of Simulation
• In order to accurately simulate a CL-based cloud environment, the time scale
should be defined. A general approach is to discretize time in terms
of a time unit (tu). For simplicity, let us consider 1 tu = 1 second.
• A task can be simulated as follows:
Task
{
Number of Computational Units: W
Required Hardware: X
Application: Y
Time Units: Z
}
• Each task occupies W hardware instances of computational capability X for an
application Y and for Z time units.
• Required Hardware defines what type(s) of hardware is required and Y is the
type of application.
Elements of Simulation
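The Task entity above maps naturally onto a small record type. The field names, the validation rules and the `unit_time` helper below are illustrative assumptions; only the four parameters W, X, Y and Z come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    units: int                    # W: number of computational units
    required_hardware: frozenset  # X: acceptable hardware type(s)
    application: str              # Y: type of application
    time_units: int               # Z: duration in tu (1 tu = 1 second)

    def __post_init__(self):
        if self.units < 1:
            raise ValueError("a task needs at least one computational unit")
        if self.time_units < 0:
            raise ValueError("time units cannot be negative")
        if not self.required_hardware:
            raise ValueError("at least one hardware type is required")

    def unit_time(self) -> int:
        """Total occupancy in unit-tu (W * Z); a derived helper, not on the slide."""
        return self.units * self.time_units


# Hypothetical task: ray tracing on 4 GPU instances for 600 tu.
task = Task(units=4, required_hardware=frozenset({"GPU"}),
            application="ray-tracing", time_units=600)
print(task.unit_time())  # 2400
```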
• Similarly a coalition of capable hardware can be described by the
following parameters:
Coalition
{
Number of Computational Units: CU
Compute Capability: CC
Interconnection: I
Storage Interconnection: SI
Power Consumption: P
Cost: CO
Server Initialization: S
}
• Compute Capability is a dimensionless index computed with the
procedure presented in the characterization of hardware. Power
Consumption and Cost are dimensionless numbers measured with
respect to a baseline.
• Interconnection and Storage Interconnection are measured in Gbps.
• Server Initialization is measured in tu.
Elements of Simulation
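The Coalition parameters can likewise be held in a record, and a simple selection rule can pick a coalition for a request. This is a sketch only: the `hardware_type` field is not part of the slide's parameter list, and the "cheapest feasible, then lowest power" rule is a hypothetical policy, not the CL coalition formation mechanism. All numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Coalition:
    units: int                  # CU: number of computational units
    compute_capability: float   # CC: dimensionless index vs. baseline
    interconnect_gbps: float    # I
    storage_gbps: float         # SI
    power: float                # P: dimensionless, relative to baseline
    cost: float                 # CO: dimensionless, relative to baseline
    init_tu: int                # S: server initialization time in tu
    hardware_type: str = "CPU"  # hypothetical field, added for matching


def select_coalition(candidates, units_needed: int, hardware_type: str) -> Optional[Coalition]:
    """Return the cheapest coalition that fits the request, or None."""
    feasible = [c for c in candidates
                if c.units >= units_needed and c.hardware_type == hardware_type]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.cost, c.power))


big_gpu = Coalition(8, 4.0, 40.0, 10.0, 2.5, 3.0, 30, "GPU")
small_gpu = Coalition(4, 4.0, 40.0, 10.0, 1.5, 1.8, 30, "GPU")
cpu = Coalition(16, 1.0, 10.0, 10.0, 1.0, 1.0, 10, "CPU")
pool = [big_gpu, small_gpu, cpu]

print(select_coalition(pool, 4, "GPU") is small_gpu)  # True
print(select_coalition(pool, 6, "GPU") is big_gpu)    # True
print(select_coalition(pool, 2, "DFE"))               # None
```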
• The hardware is engaged for the number of time units prescribed by a
task (Z). An amount of time (S) is required for initializing the hardware if
it was in a sleep state. Moreover, before the Task begins execution,
the hardware must be brought to a functional state: transfer of software
images, installation of appropriate libraries, mounting of storage, etc.
Thus, the real time a Task requires is:
• Time of Execution (tu) = S + Z + α + β
• where α is the time required for the hardware to become functional and β the
time required for the hardware to return to idle, waiting for the next Task.
• α mostly depends on the speed of the network as well as on the number
of users (N) occupying the same storage server simultaneously.
• α (tu) = (Size of the image (GB)) / (SI (Gbps) · tu / (N · 8 · 10^9))
• β is the time required to free the resource and return it to an idle state.
• Finally, for the network a linear model is considered:
• delay = latency + size (GB) / BW (GBps)
Elements of Simulation
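The timing model on this slide can be worked through numerically. The sketch assumes decimal units (1 GB = 8 Gbit), 1 tu = 1 second, and that the N users share the storage link equally, so that α reduces to 8 · N · size / SI; this is one dimensionally consistent reading of the slide's formula, not a confirmed one. Function names and sample values are hypothetical.

```python
def image_transfer_time(image_gb: float, si_gbps: float, users: int = 1) -> float:
    """alpha in tu: time to fetch the image when `users` share the storage link."""
    if si_gbps <= 0 or users < 1:
        raise ValueError("need positive bandwidth and at least one user")
    image_gbit = image_gb * 8           # GB -> Gbit
    per_user_gbps = si_gbps / users     # equal share of the link (assumption)
    return image_gbit / per_user_gbps


def network_delay(latency_s: float, size_gb: float, bw_gb_per_s: float) -> float:
    """Linear network model: delay = latency + size / bandwidth."""
    return latency_s + size_gb / bw_gb_per_s


def execution_time(s: float, z: float, alpha: float, beta: float) -> float:
    """Total time a Task occupies the hardware: S + Z + alpha + beta (in tu)."""
    return s + z + alpha + beta


# Hypothetical numbers: 5 GB image, 10 Gbps storage link shared by 4 users,
# 30 tu wake-up time, 600 tu of work, 5 tu to release the resource.
alpha = image_transfer_time(image_gb=5.0, si_gbps=10.0, users=4)
print(alpha)                                   # 16.0
print(execution_time(30.0, 600.0, alpha, 5.0)) # 651.0
print(network_delay(0.5, 2.0, 1.0))            # 2.5
```

In this example the image transfer adds 16 tu, which is small next to the 600 tu of work but grows linearly with the number of users sharing the storage server.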
• Finally, a custom simulation framework will be used since CL is a
unique Cloud Computing environment.
• The presented simulation entities will be enhanced in order to
describe different types of deployment, such as containers and
bare metal images.
• The simulator must be able to cope with static and dynamic
coalition choices as well as update the list of static coalitions
based on users’ choices.
• An auto-scaling model should be introduced in the simulation for the
three use cases.
• This parallel hybrid DES-packet level simulation scheme is
expected to adequately describe the CL system.
Elements of Simulation and Future Work
THANK YOU
Dr. Christos Papadopoulos-Filelis
cpapad@ee.duth.gr
