Budapest University of Technology and Economics
Department of Measurement and Information Systems
Risk assessment based cloudification
Szilárd Bozóki
Gábor Koronka
Supervisor: Prof. András Pataricza
Cloud computing around the globe
Source: Cisco Global Cloud Index: Forecast and Methodology 2013–2018
Mission critical cloud computing
▪ August 31, 2015
o Federal Aviation Administration
o $108 million now, $1 billion in 10 years
o Source: CSC news
▪ Network Functions Virtualization
o The "telco" cloud
o Source: NFV
Problem statement
▪ Moving to the cloud (cloudification):
"How large a pool of VMs is needed?"
o It depends…
• SLA?
• Application and platform characteristics
▪ Aim: mission-critical and high-value applications
▪ Approach:
o Modeling distinctive cloud features
o Revisiting decades-old modeling and fault tolerance
techniques by leveraging distinctive cloud features
From dependability to risk
▪ A definition of dependability
o "the ability of a system to avoid service failures that are
more frequent or more severe than is acceptable."
▪ Core idea:
Mission insurance <, >, or = mission incident probability × value of
asset?
e.g. $3 vs. 50% × $10
expected value of loss (risk): $5
as long as insurance < expected value of loss → good insurance
▪ insurance ↔ investment in redundancy
o diminishing returns
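
The insurance comparison above can be checked with a few lines of Python. This is only a minimal sketch of the slide's arithmetic; the function names `expected_loss` and `is_good_insurance` are illustrative and do not come from the slides.

```python
def expected_loss(incident_probability: float, asset_value: float) -> float:
    """Risk as the expected value of loss: probability times value of asset."""
    return incident_probability * asset_value


def is_good_insurance(premium: float, incident_probability: float,
                      asset_value: float) -> bool:
    """True when the premium is strictly below the expected loss it covers."""
    return premium < expected_loss(incident_probability, asset_value)


# Slide example: a $3 premium against a 50% chance of losing a $10 asset.
risk = expected_loss(0.5, 10.0)
print(risk)                               # 5.0
print(is_good_insurance(3.0, 0.5, 10.0))  # True
```

Read "premium" as the money spent on redundancy: the investment pays off only while it stays below the risk it removes, which is where diminishing returns set in.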
Risk interpretation
▪ Risk: the expected value of loss
due to potential service outages
o Failure frequency
• Cloud fault model
– Fault tree or reliability block diagram
o Severity
• Cost of application downtime
▪ Risk mitigation
o Design for resiliency, cloudification
o Reduction of failure frequency or severity
Critical services over ordinary clouds?
▪ Environment
o HW/SW stack
o Cloud service models
▪ Research objective
o Carrier grade IaaS
o Fast resiliency (SDN)
o Redundancy architectural pattern
• HW, VM, App, else
o Measures
• Availability (downtime)
• Cost
[Figure: HW/SW stack (Hardware; VMM (Hypervisor) + optional Host OS; VM; Guest OS; Container; Application) with IaaS, PaaS, and SaaS provider scopes; redundant instances labeled HW 1–3, VM 1–3, APP 1–3]
Physical cloud model
▪ A distinctive cloud feature to begin with.
[Figure: physical cloud model; labels: Client, Internet, Data Center, Availability Zone, HW cluster, HW server, VM]
Related work
▪ SLA requirements from the provider's view
o Provider manages and optimizes the SLA portfolio
▪ Concrete, application-specific cloud user view
o Concentrates on a queuing and scheduler based job execution service
▪ Deploying a scientific grid on a private cloud
o Profitability analysis
▪ Our focus: user view on an IaaS public cloud
Cost-resilience trade-off
▪ How many redundant virtual machines are needed
to minimize total cost?
▪ Risk
o Failure frequency
• Cloud fault model based on the physical layout of IaaS
o Severity
• Cost of application downtime (SLA)
▪ Risk mitigation → failure frequency reduction →
redundant VMs (hot-running)
▪ Cost overhead of VMs (VM price/hour)
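
The trade-off can be sketched numerically. Redundant VMs add a fixed rental cost and shrink the expected downtime cost. The sketch below assumes independent VM failures and a service that is down only when every VM is down; all names and figures (`vm_availability`, prices, downtime cost) are hypothetical and not taken from the slides.

```python
HOURS_PER_YEAR = 24 * 365


def annual_total_cost(n_vms: int, vm_availability: float,
                      vm_price_per_hour: float,
                      downtime_cost_per_hour: float) -> float:
    """Annual VM rental cost plus expected annual downtime cost (the risk)."""
    # Assumption: VMs fail independently; the service fails only if all do.
    unavailability = (1.0 - vm_availability) ** n_vms
    vm_cost = n_vms * vm_price_per_hour * HOURS_PER_YEAR
    risk = unavailability * HOURS_PER_YEAR * downtime_cost_per_hour
    return vm_cost + risk


def cheapest_pool(max_vms: int, **params: float) -> int:
    """Pool size in 1..max_vms with the lowest annual total cost."""
    return min(range(1, max_vms + 1),
               key=lambda n: annual_total_cost(n, **params))


# Hypothetical inputs: 99% available VMs at $0.10/h, downtime at $10,000/h.
params = dict(vm_availability=0.99, vm_price_per_hour=0.10,
              downtime_cost_per_hour=10_000.0)
for n in range(1, 8):
    print(n, round(annual_total_cost(n, **params), 2))
print("cheapest pool size:", cheapest_pool(7, **params))
```

With these made-up numbers the total cost falls steeply for the first replicas and then rises again once the VM rental cost dominates. This is the diminishing-returns shape that the trade-off question targets.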
Abstract cloud model
▪ Abstract cloud model based on the physical cloud
[Figure: four-level tree: Level 1 Client, Level 2 Region, Level 3 AV zone, Level 4 VM; node labels f_1(t)…f_4(t), g_1(t)…g_3(t), h_1(t), h_2(t)]
Top level fault tree - basic model
[Figure: top-level fault tree over the levels Client, Region, AV zone, VM; events: "Region down", "AV zone down", "VM down", "Path down", "All paths down", "System Down"]
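
The fault tree can be evaluated with two elementary gates. The slide text lists only the event names, so the gate structure here is an assumption: a path is down if its region, AV zone, or VM is down (OR gate), and the system is down only if all paths are down (AND gate). All events are treated as statistically independent, and the function names are illustrative.

```python
def or_gate(*probs: float) -> float:
    """Probability that at least one of several independent events occurs."""
    none_occurs = 1.0
    for p in probs:
        none_occurs *= 1.0 - p
    return 1.0 - none_occurs


def and_gate(*probs: float) -> float:
    """Probability that all of several independent events occur."""
    all_occur = 1.0
    for p in probs:
        all_occur *= p
    return all_occur


def path_down(p_region: float, p_zone: float, p_vm: float) -> float:
    """Assumed structure: region, AV zone, or VM failure breaks the path."""
    return or_gate(p_region, p_zone, p_vm)


def system_down(path_probs: list[float]) -> float:
    """Assumed structure: the system is down only when every path is down."""
    return and_gate(*path_probs)


# Hypothetical probabilities for two fully separate paths.
p = path_down(p_region=1e-4, p_zone=1e-3, p_vm=1e-2)
print(p, system_down([p, p]))
```

The AND gate over paths is valid only when the paths share no region or AV zone. Paths that share a component are not independent and need the nested tree of the replication model.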
N+7 replication-based redundancy - basic model
[Figure: fault tree for 2 regions × 2 AV zones × 2 VMs over the levels Client, Region, AV zone, VM. Per AV zone: "VM 1 fail" and "VM 2 fail" feed "All VMs fail", which together with "AVZone fail" feeds "AVzone path fail". Per region: "AVzone 1 path fail" and "AVzone 2 path fail" feed "Total AVzone path fail", which together with "Region fail" feeds "Region path fail". "Region 1 path fail" and "Region 2 path fail" feed "System failure" (all paths down).]
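
The nested tree can be evaluated bottom-up, which handles components shared by several VMs correctly. The sketch below is a reading of the event names, not the authors' tool. It assumes independent basic events, an AND gate for "All VMs fail", "Total AVzone path fail", and "System failure", and an OR gate wherever a zone or region failure joins the subtree below it. The probabilities are hypothetical.

```python
from math import prod


def zone_path_fail(p_zone: float, vm_fail_probs: list[float]) -> float:
    """AV zone path fails if the zone fails OR all of its VMs fail."""
    all_vms_fail = prod(vm_fail_probs)
    return 1.0 - (1.0 - p_zone) * (1.0 - all_vms_fail)


def region_path_fail(p_region: float, zone_path_probs: list[float]) -> float:
    """Region path fails if the region fails OR all its zone paths fail."""
    total_zone_path_fail = prod(zone_path_probs)
    return 1.0 - (1.0 - p_region) * (1.0 - total_zone_path_fail)


def system_fail(region_path_probs: list[float]) -> float:
    """System fails only when every region path fails."""
    return prod(region_path_probs)


def symmetric_system_fail(p_region: float, p_zone: float, p_vm: float,
                          regions: int = 2, zones: int = 2,
                          vms: int = 2) -> float:
    """Same probabilities everywhere; defaults give 2 x 2 x 2 = 8 VMs."""
    zone = zone_path_fail(p_zone, [p_vm] * vms)
    region = region_path_fail(p_region, [zone] * zones)
    return system_fail([region] * regions)


# Hypothetical failure probabilities for one region, AV zone, and VM.
print(symmetric_system_fail(p_region=1e-4, p_zone=1e-3, p_vm=1e-2))
```

Under these assumptions, adding VMs inside one AV zone cannot push the failure probability of that zone's path below the failure probability of the zone itself.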
Variance of resource quality
▪ A distinctive cloud feature with impact.
▪ If any resource underperforms → VM down
o Interference
o Noisy neighbors
[Figure: extended Level 4 VM fault tree: "VM down" caused by "Compute down" or "Network down"]
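
The extended VM model can be sketched as an OR gate over the two resources. How the basic-event probabilities are obtained is not stated in the slides. The sketch assumes, purely for illustration, that a resource counts as "down" whenever a measured sample exceeds a tolerance threshold, and that compute and network underperform independently. `underperformance_probability` and the sample data are hypothetical.

```python
def underperformance_probability(samples: list[float],
                                 threshold: float) -> float:
    """Fraction of measurements exceeding the tolerated threshold.

    Assumption: this empirical fraction stands in for the probability
    that the resource is effectively unusable.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s > threshold) / len(samples)


def vm_down_extended(p_compute_down: float, p_network_down: float) -> float:
    """VM is down if compute OR network underperforms (independent events)."""
    return 1.0 - (1.0 - p_compute_down) * (1.0 - p_network_down)


# Hypothetical response-time samples in milliseconds.
network_ms = [12.0, 15.0, 11.0, 240.0, 13.0, 14.0, 12.0, 16.0]
compute_ms = [50.0, 52.0, 49.0, 51.0, 300.0, 48.0, 50.0, 310.0]
p_net = underperformance_probability(network_ms, threshold=100.0)
p_cpu = underperformance_probability(compute_ms, threshold=100.0)
print(p_net, p_cpu, vm_down_extended(p_cpu, p_net))
```

Feeding this larger per-VM failure probability into the same tree is what separates an "extended" evaluation from a "basic" one in this sketch.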
Network resource quality variation
Source: Gorbenko, A., Kharchenko, V., Mamutov, S., Tarasyuk, O., Romanovsky, A.
“Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time.” 2012
Compute resource quality variation
Source: Gorbenko, A., Kharchenko, V., Mamutov, S., Tarasyuk, O., Romanovsky, A.
“Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time.” 2012
Results – availability (number of 9s)
[Figure: number of 9s (y-axis, 0 to 12) versus number of VMs (x-axis, 0 to 7); series: A basic, A extended, B basic, B extended, C basic, C extended]
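
The "number of 9s" metric is the negative base-10 logarithm of unavailability. The sketch below shows the conversion, plus the special case of n independent replicas with no shared components. That case is an assumption for illustration only and is not the model behind the plotted curves.

```python
import math


def number_of_nines(availability: float) -> float:
    """Convert availability (e.g. 0.999) to its number of 9s (about 3)."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - availability)


def nines_with_replicas(vm_availability: float, n_vms: int) -> float:
    """Number of 9s when the service is down only if all n VMs are down.

    Assumption: independent VM failures and no shared region or zone.
    """
    unavailability = (1.0 - vm_availability) ** n_vms
    return -math.log10(unavailability)


print(number_of_nines(0.999))
for n in range(1, 8):
    print(n, round(nines_with_replicas(0.99, n), 2))
```

In this idealized case each extra VM adds the same number of 9s. Shared AV zone or region failures, as in the fault tree, flatten the curve.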
Results – annual expected total cost
[Figure: annual total cost (y-axis, logarithmic, 1.00 to 100,000,000.00) versus number of VMs (x-axis, 0 to 7); series: A basic, A extended, B basic, B extended, C basic, C extended]
Concluding remarks
▪ Aim: mission-critical and high-value applications
▪ Modeling distinctive cloud features
▪ Risk-based interpretation of dependability
▪ Fault tree based on the physical cloud
▪ Revisiting decades-old modeling and fault tolerance techniques
by leveraging distinctive cloud features
▪ Redundant VMs on the cloud
▪ Numerical analysis
▪ Redundant VMs → N+(4–6 VMs) → extra availability →
reduction of risk → reduction of total cost
▪ Future work
o Model refinement
o Data acquisition, measurement
o Dynamic replication for critical mission phases