Risk Assessment Based Cloudification

Budapest University of Technology and Economics
Department of Measurement and Information Systems
Szilárd Bozóki
Gábor Koronka
Supervisor: Prof. András Pataricza
Department of Measurement and Information Systems

Cloud computing around the globe
Source: Cisco Global Cloud Index: Forecast and Methodology 2013–2018

Mission-critical cloud computing
▪ August 31, 2015
o Federal Aviation Administration
o $108 million now, $1 billion in 10 years
o Source: CSC news
▪ Network Functions Virtualization
o The "telco" cloud
o Source: NFV

Problem statement
▪ Moving to the cloud (cloudification):
"How large a pool of VMs is needed?"
o It depends…
• SLA?
• Application and platform characteristics
▪ Aim: mission-critical and high-value applications
▪ Approach:
o Modeling distinctive cloud features
o Revisiting decades-old modeling and fault tolerance techniques, leveraging distinctive cloud features

From dependability to risk
▪ A definition of dependability
o "the ability of a system to avoid service failures that are more frequent or more severe than is acceptable."
▪ Core idea: compare the price of mission insurance with the expected loss
mission insurance ? mission incident probability × value of asset (where "?" is one of <, =, >)
e.g. $3 ? 50% × $10
expected value of loss (risk): $5
while insurance < expected value of loss → good insurance
▪ insurance → investment in redundancy
o diminishing returns
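The comparison on this slide can be sketched in a few lines. This is a minimal illustration using the slide's own numbers (50% incident probability, $10 asset, $3 insurance); the function names are hypothetical and not taken from the presentation.

```python
def expected_loss(incident_probability: float, asset_value: float) -> float:
    """Risk as the expected value of loss: probability times value at stake."""
    return incident_probability * asset_value


def is_good_insurance(premium: float, incident_probability: float,
                      asset_value: float) -> bool:
    """Insurance is worthwhile while it costs less than the expected loss."""
    return premium < expected_loss(incident_probability, asset_value)


# Numbers from the slide: 50% incident probability, $10 asset, $3 insurance.
print(expected_loss(0.5, 10.0))           # 5.0
print(is_good_insurance(3.0, 0.5, 10.0))  # True
```

In the cloud setting the "premium" corresponds to the price of redundant resources, which is why diminishing returns matter: each additional unit of redundancy removes less expected loss than the previous one.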

Risk interpretation
▪ Risk: the expected value of loss due to potential service outages
o Failure frequency
• Cloud fault model
– Fault tree or reliability block diagram
o Severity
• Cost of application downtime
▪ Risk mitigation
o Design for resiliency, cloudification
o Reduction of failure frequency or severity

Critical services over ordinary clouds?
▪ Environment
o HW/SW stack
o Cloud service models
▪ Research objective
o Carrier-grade IaaS
o Fast resiliency (SDN)
o Redundancy architectural pattern
• HW, VM, App, other
o Measures
• Availability (downtime)
• Cost
[Figure: HW/SW stack (Hardware; VMM (hypervisor) + optional host OS; VM; guest OS; container; application) with the IaaS, PaaS and SaaS provider boundaries, and redundancy at the hardware (HW 1 to 3), VM (VM 1 to 3) and application (APP 1 to 3) levels]

Physical cloud model
▪ A distinctive cloud feature to begin with.
[Figure: physical cloud model; labels: Client, Internet, Data Center, Availability Zone, HW cluster, HW server, VM]

Related work
▪ SLA requirements from the provider's view
o Provider manages and optimizes the SLA portfolio
▪ Concrete, application-specific cloud user view
o Concentrates on queuing and scheduler-based job execution services
▪ Deploying a scientific grid on a private cloud
o Profitability analysis
▪ Our focus: the user's view of an IaaS public cloud

Cost-resilience trade-off
▪ How many redundant virtual machines are needed to minimize the total cost?
▪ Risk
o Failure frequency
• Cloud fault model based on the physical layout of IaaS
o Severity
• Cost of application downtime (SLA)
▪ Risk mitigation → failure frequency reduction → redundant VMs (hot-running)
▪ Cost overhead of VMs (VM price/hour)
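The trade-off on this slide can be sketched as a small search over the number of VMs. This is a simplified illustration, not the authors' model: it assumes independent VM failures and ignores the region and availability zone layers, and all names and numeric values (VM unavailability, VM price, downtime cost) are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def annual_total_cost(n_vms: int, vm_unavailability: float,
                      vm_price_per_hour: float,
                      downtime_cost_per_hour: float) -> float:
    """VM rental cost plus risk (expected annual cost of downtime)."""
    vm_cost = n_vms * vm_price_per_hour * HOURS_PER_YEAR
    # Assumption: the service is down only when all hot-running VMs are down,
    # and the VMs fail independently of each other.
    system_unavailability = vm_unavailability ** n_vms
    risk = system_unavailability * HOURS_PER_YEAR * downtime_cost_per_hour
    return vm_cost + risk


def cheapest_vm_count(max_vms: int, vm_unavailability: float,
                      vm_price_per_hour: float,
                      downtime_cost_per_hour: float) -> int:
    """Number of VMs (1..max_vms) with the lowest annual total cost."""
    return min(
        range(1, max_vms + 1),
        key=lambda n: annual_total_cost(
            n, vm_unavailability, vm_price_per_hour, downtime_cost_per_hour),
    )


# Hypothetical inputs: 1% VM unavailability, $0.10 per VM hour,
# $10,000 per hour of application downtime.
print(cheapest_vm_count(7, 0.01, 0.10, 10_000.0))  # 3
```

With these illustrative inputs the total cost falls steeply for the first few VMs and then rises again once the VM rental cost exceeds the remaining risk, which is the diminishing-returns effect mentioned earlier.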

Abstract cloud model
▪ Abstract cloud model based on the physical cloud
[Figure: four-level hierarchy (Level 1 Client, Level 2 Region, Level 3 AV zone, Level 4 VM); nodes are annotated with the functions $f_1(t), f_2(t), f_3(t), f_4(t)$; $g_1(t), g_2(t), g_3(t)$; and $h_1(t), h_2(t)$]

Top level fault tree - basic model
[Figure: top-level fault tree; events: System down, All paths down, Path down, Region down, AV zone down, VM down; levels: Level 1 Client, Level 2 Region, Level 3 AV zone, Level 4 VM]

N+7 replication based redundancy - basic model
[Figure: fault tree for two regions, each with two AV zones of two VMs. Events per AV zone: "AV zone fail", "VM 1 fail", "VM 2 fail", "All VMs fail", "AV zone path fail"; per region: "Region fail", "Total AV zone path fail", "Region path fail"; top event: "System failure" (all paths down). Levels: Level 1 Client, Level 2 Region, Level 3 AV zone, Level 4 VM]
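A fault tree of this shape can be evaluated bottom-up with AND and OR gates. The following is a minimal sketch, assuming statistically independent basic events and the gate structure read off the event names in the diagram (an AV zone path fails if the zone fails or all of its VMs fail; a region path fails if the region fails or all of its AV zone paths fail; the system fails if all region paths fail). The function names and the failure probabilities are hypothetical.

```python
from math import prod


def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


def system_failure_probability(p_region: float, p_zone: float, p_vm: float,
                               regions: int = 2, zones_per_region: int = 2,
                               vms_per_zone: int = 2) -> float:
    """Evaluate the symmetric fault tree from the VM level up to the client."""
    all_vms_fail = and_gate([p_vm] * vms_per_zone)
    zone_path_fail = or_gate([p_zone, all_vms_fail])
    total_zone_path_fail = and_gate([zone_path_fail] * zones_per_region)
    region_path_fail = or_gate([p_region, total_zone_path_fail])
    return and_gate([region_path_fail] * regions)


# Hypothetical failure probabilities for a region, an AV zone and a VM.
print(system_failure_probability(p_region=1e-4, p_zone=1e-3, p_vm=1e-2))
```

Changing `vms_per_zone` shows how the lower layers stop mattering once the region and AV zone failure probabilities dominate, which bounds the availability that extra VMs alone can buy.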

Variance of resource quality
▪ A distinctive cloud feature with impact.
▪ If any resource underperforms → VM down
o Interference
o Noisy neighbors
[Figure: Level 4 VM fault tree; "VM down" is caused by "Compute down" or "Network down"]
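The extended VM-level model can be sketched as an OR gate over the two resources. This is an illustration only: it assumes that compute and network underperformance are independent events, and the probabilities used are hypothetical.

```python
def vm_down_probability(p_compute_down: float, p_network_down: float) -> float:
    """A VM counts as down if compute OR network underperforms.

    Assumes the two events are independent.
    """
    return 1.0 - (1.0 - p_compute_down) * (1.0 - p_network_down)


# Hypothetical values: 1% compute and 2% network underperformance.
print(vm_down_probability(0.01, 0.02))
```

Because the gate is an OR, the VM failure probability in the extended model is never lower than that of its worst resource, so the extended model can only reduce the availability computed by the basic model.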

Network resource quality variation
Source: Gorbenko, A., Kharchenko, V., Mamutov, S., Tarasyuk, O., Romanovsky, A.
“Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time.” 2012

Compute resource quality variation
Source: Gorbenko, A., Kharchenko, V., Mamutov, S., Tarasyuk, O., Romanovsky, A.
“Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time.” 2012

Results – availability (number of 9s)
[Figure: availability as number of 9s (axis 0 to 12) versus number of VMs (0 to 7); series: A basic, A extended]
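The "number of 9s" scale used in these charts is commonly computed as the negative base-10 logarithm of unavailability. The sketch below assumes that convention (the slides do not state the formula), and the helper names are hypothetical.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def number_of_nines(availability: float) -> float:
    """Convert availability (e.g. 0.999) into its number of 9s (about 3)."""
    if not 0.0 <= availability < 1.0:
        raise ValueError("availability must be in the range [0, 1)")
    return -math.log10(1.0 - availability)


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for a in (0.9, 0.99, 0.999):
    print(f"{a}: {number_of_nines(a):.1f} nines, "
          f"{annual_downtime_minutes(a):.0f} min/year")
```

Each additional 9 divides the expected annual downtime by ten, which is why the availability charts use a linear axis for the number of 9s while the cost charts use a logarithmic one.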

Results – availability (number of 9s)
[Figure: availability as number of 9s (axis 0 to 12) versus number of VMs (0 to 7); series: A basic, A extended, B basic, B extended, C basic, C extended]

Results – annual expected total cost
[Figure: annual expected total cost (logarithmic axis, 1.00 to 100,000,000.00) versus number of VMs (0 to 7); series: A basic model, A extended]

Results – annual expected total cost
[Figure: annual expected total cost (logarithmic axis, 1.00 to 100,000,000.00) versus number of VMs (0 to 7); series: A basic model, A extended, B basic, B extended, C basic, C extended]

Concluding remarks
▪ Aim: mission-critical and high-value applications
▪ Modeling distinctive cloud features
o Risk-based interpretation of dependability
o Fault tree based on the physical cloud
▪ Revisiting decades-old modeling and fault tolerance techniques, leveraging distinctive cloud features
o Redundant VMs in the cloud
▪ Numerical analysis
o Redundant VMs → N+(4-6 VMs) → extra availability → reduction of risk → reduction of total cost
▪ Future work
o Model refinement
o Data acquisition, measurement
o Dynamic replication for critical mission phases
