Self-Learning Cloud Controllers
Pooyan Jamshidi, Amir Sharifloo, Claus Pahl,
Andreas Metzger, Giovani Estrada
IC4, Dublin City University, Ireland
University of Duisburg-Essen, Germany
Intel, Ireland
pooyan.jamshidi@computing.dcu.ie
Invited presentation at
UFC, Brazil, Fortaleza
~50% = wasted hardware
Actual
traffic
Typical weekly traffic to Web-based applications (e.g., Amazon.com)
Problem 1: ~75% wasted capacity
Actual
demand
Problem 2:
customer lost
Traffic during an unexpected burst of requests (e.g., end-of-year
traffic to Amazon.com)
Really like this??
Auto-scaling enables you to realize this ideal on-demand provisioning
[Figure axes: time vs. demand]
Enacting changes to cloud resources
is not real-time
Capacity we can provision
with Auto-Scaling
A realistic figure of dynamic provisioning
These quantitative
values are required to be
determined by the user
⇒  requires deep
knowledge of
application (CPU,
memory, thresholds)
⇒  requires performance
modeling expertise
(when and how to
scale)
⇒  A unified opinion of
user(s) is required
Amazon auto scaling
Microsoft Azure Watch
Microsoft Azure Auto-scaling Application Block
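To make the point concrete, here is a minimal sketch of a threshold-based scaling rule of the kind such services let users define. It is not the API of any of the products named above. `ThresholdPolicy`, `decide`, and every number in them are hypothetical; they stand for the quantitative values that the user would have to choose.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    # Every value here is hypothetical: the user has to pick each one.
    cpu_upper: float = 70.0  # scale out above this CPU utilisation (%)
    cpu_lower: float = 30.0  # scale in below this CPU utilisation (%)
    step: int = 1            # instances added or removed per decision
    min_vms: int = 1
    max_vms: int = 10


def decide(policy: ThresholdPolicy, cpu: float, vms: int) -> int:
    """Return the VM count to use after one control interval."""
    if cpu > policy.cpu_upper:
        return min(vms + policy.step, policy.max_vms)
    if cpu < policy.cpu_lower:
        return max(vms - policy.step, policy.min_vms)
    return vms


policy = ThresholdPolicy()
print(decide(policy, cpu=85.0, vms=3))  # above the upper threshold: scale out
print(decide(policy, cpu=10.0, vms=1))  # below the lower threshold, already at min_vms
```

Even this small rule exposes five numbers to tune, which is where the need for application knowledge and performance modeling expertise comes from.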
Naeem Esfahani and Sam Malek,
“Uncertainty in Self-Adaptive
Software Systems”
Pooyan Jamshidi, Aakash Ahmad,
Claus Pahl, Muhammad Ali Babar,
“Sources of Uncertainty in Dynamic
Management of Elastic Systems”,
under review
Uncertainty related to enactment latency:
The same scaling action (adding or removing
a VM of exactly the same size) took
a different amount of time to be enacted on the
cloud platform (here, Microsoft Azure)
at different points in time, and
the difference was significant
(up to a couple of minutes).
Enactment latency also differs
across cloud platforms.
Ø Offline benchmarking
Ø Trial-and-error
Ø Expert knowledge
Costly and
not systematic
A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang,
“Adaptive, Model-driven Autoscaling for Cloud Applications”, ICAC’14
[Figure: 95th-percentile response time (ms) against arrival rate (req/s); marked values: 400 ms and 60 req/s]
RobusT2Scale
Initial setting +
elasticity rules +
response-time SLA
environment
monitoring
application
monitoring
scaling
actions
Fuzzy Reasoning
Users
Prediction/
Smoothing
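The components above can be read as one control step: monitor, smooth, reason, act. The sketch below only illustrates that flow. `control_step` and its four callbacks are hypothetical names, the stub values are made up, and rounding the reasoner's output to a whole number of instances is an assumption.

```python
from typing import Callable, Tuple


def control_step(
    monitor: Callable[[], Tuple[float, float]],
    smooth: Callable[[float], float],
    reason: Callable[[float, float], float],
    act: Callable[[int], None],
) -> int:
    """Run one control interval and return the applied change in instances."""
    workload, response_time = monitor()        # environment and application monitoring
    predicted = smooth(workload)               # prediction/smoothing
    delta = reason(predicted, response_time)   # fuzzy reasoning (may be fractional)
    change = round(delta)                      # assumption: act on whole instances
    act(change)                                # scaling action
    return change


applied = []
change = control_step(
    monitor=lambda: (60.0, 400.0),   # stub readings: req/s and ms
    smooth=lambda w: w,              # stub: no smoothing
    reason=lambda w, rt: 1.4,        # stub: "add about 1.4 instances"
    act=applied.append,
)
print(change, applied)
```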
[Figure: possibility against performance index, showing a region of definite satisfaction, a region of uncertain satisfaction, and a region of definite dissatisfaction; a type-1 MF compared with a type-2 MF]
Words can mean different
things to different people
Different users often
recommend
different elasticity policies
[Figure: membership functions for workload and response time (membership grade u against input values from 0 to 100); labels: UMF, LMF, embedded MF, FOU, mean, sd]
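The relation between the UMF, the LMF, and the FOU can be sketched in a few lines. This is an illustration only, assuming a Gaussian primary membership function with a fixed mean and an uncertain standard deviation; the slides do not state which shape was used, and all parameter values are hypothetical.

```python
import math
from typing import Tuple


def gaussian(x: float, mean: float, sd: float) -> float:
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def interval_membership(
    x: float, mean: float, sd_low: float, sd_high: float
) -> Tuple[float, float]:
    """Return the (lower, upper) membership grades of x.

    The footprint of uncertainty (FOU) is the band between a narrow
    Gaussian (the LMF, sd_low) and a wide Gaussian (the UMF, sd_high)
    that share the same mean.
    """
    lower = gaussian(x, mean, sd_low)
    upper = gaussian(x, mean, sd_high)
    return lower, upper


lmf, umf = interval_membership(x=40.0, mean=50.0, sd_low=8.0, sd_high=12.0)
print(f"membership grade of 40 lies in [{lmf:.4f}, {umf:.4f}]")
```

A type-1 MF would return a single grade for the input; the type-2 MF returns an interval, which is how disagreement between users is carried into the reasoning.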
Antecedents: Workload, Response time. Consequent: number of expert responses per label, and c_avg^l.

Rule (l) | Workload | Response time | Normal (-2) | Effort (-1) | Medium Effort (0) | High Effort (+1) | Maximum Effort (+2) | c_avg^l
1 | Very low | Instantaneous | 7 | 2 | 1 | 0 | 0 | -1.6
2 | Very low | Fast | 5 | 4 | 1 | 0 | 0 | -1.4
3 | Very low | Medium | 0 | 2 | 6 | 2 | 0 | 0
4 | Very low | Slow | 0 | 0 | 4 | 6 | 0 | 0.6
5 | Very low | Very slow | 0 | 0 | 0 | 6 | 4 | 1.4
6 | Low | Instantaneous | 5 | 3 | 2 | 0 | 0 | -1.3
7 | Low | Fast | 2 | 7 | 1 | 0 | 0 | -1.1
8 | Low | Medium | 0 | 1 | 5 | 3 | 1 | 0.4
9 | Low | Slow | 0 | 0 | 1 | 8 | 1 | 1
10 | Low | Very slow | 0 | 0 | 0 | 4 | 6 | 1.6
11 | Medium | Instantaneous | 6 | 4 | 0 | 0 | 0 | -1.6
12 | Medium | Fast | 2 | 5 | 3 | 0 | 0 | -0.9
13 | Medium | Medium | 0 | 0 | 5 | 4 | 1 | 0.6
14 | Medium | Slow | 0 | 0 | 1 | 7 | 2 | 1.1
15 | Medium | Very slow | 0 | 0 | 1 | 3 | 6 | 1.5
16 | High | Instantaneous | 8 | 2 | 0 | 0 | 0 | -1.8
17 | High | Fast | 4 | 6 | 0 | 0 | 0 | -1.4
18 | High | Medium | 0 | 1 | 5 | 3 | 1 | 0.4
19 | High | Slow | 0 | 0 | 1 | 7 | 2 | 1.1
20 | High | Very slow | 0 | 0 | 0 | 6 | 4 | 1.4
21 | Very high | Instantaneous | 9 | 1 | 0 | 0 | 0 | -1.9
22 | Very high | Fast | 3 | 6 | 1 | 0 | 0 | -1.2
23 | Very high | Medium | 0 | 1 | 4 | 4 | 1 | 0.5
24 | Very high | Slow | 0 | 0 | 1 | 8 | 1 | 1
25 | Very high | Very slow | 0 | 0 | 0 | 4 | 6 | 1.6
Example, from 10 experts’ responses:

Rule (l) | Workload | Response time | -2 | -1 | 0 | +1 | +2 | c_avg^l
12 | Medium | Fast | 2 | 5 | 3 | 0 | 0 | -0.9

$R^l$: IF (the workload ($x_1$) is $F_{i_1}$ AND the response time ($x_2$) is $G_{i_2}$), THEN (add/remove $c_{avg}^{l}$ instances).
$c_{avg}^{l} = \frac{\sum_{u=1}^{N_l} w_u^l \times C}{\sum_{u=1}^{N_l} w_u^l}$
Goal: precompute the costly calculations so that
elasticity reasoning based on fuzzy inference
is efficient at runtime
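The consequent of a rule is the weighted average of the label values, weighted by how many experts chose each label. The sketch below recomputes it for rule 12 and rule 1 from the responses listed in the rule table; `LABEL_VALUES` and `consequent_centroid` are hypothetical names introduced for illustration.

```python
from typing import Sequence

# Numeric value of each consequent label, from -2 to +2.
LABEL_VALUES = [-2, -1, 0, 1, 2]


def consequent_centroid(votes: Sequence[int]) -> float:
    """Weighted average of the label values, weighted by expert votes."""
    total = sum(votes)
    return sum(w * c for w, c in zip(votes, LABEL_VALUES)) / total


rule_12 = [2, 5, 3, 0, 0]  # Medium workload, Fast response time
rule_1 = [7, 2, 1, 0, 0]   # Very low workload, Instantaneous response time

print(consequent_centroid(rule_12))  # (2*(-2) + 5*(-1) + 3*0) / 10
print(consequent_centroid(rule_1))   # (7*(-2) + 2*(-1) + 1*0) / 10
```

Because these averages depend only on the survey data, they can be computed once at design time and stored with the rule base.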
Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic
systems: theory and design. IEEE Transactions on Fuzzy Systems,
8(5), 535-550.
Scaling Actions
Monitoring Data
[Figure: monitoring data mapped onto the workload and response-time membership functions (membership grade u against input values from 0 to 100); marked membership grades: 0.5954 and 0.3797, 0.2212 and 0.0000]
[Figure: membership functions with marked membership grades: 0.5954 and 0.3797, 0.9568 and 0.9377]
Performance index
$y_l$, $y_r$
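The step from interval membership grades to the output interval $y_l$, $y_r$ can be sketched as follows. This is a minimal illustration, assuming a product t-norm and center-of-sets type reduction; the end points are found by trying every switch point, which gives the same end points that the Karnik-Mendel iterative procedure converges to and is affordable for a small rule base. The membership grades are the ones marked in the figures, but their pairing, the two consequent centroids, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def firing_interval(workload: Interval, response_time: Interval) -> Interval:
    """Product t-norm on the lower grades and on the upper grades."""
    return workload[0] * response_time[0], workload[1] * response_time[1]


def type_reduce(firings: List[Interval], centroids: List[float]) -> Interval:
    """Return (y_l, y_r); assumes at least one rule has a positive upper firing."""
    order = sorted(range(len(centroids)), key=lambda i: centroids[i])
    c = [centroids[i] for i in order]
    lo = [firings[i][0] for i in order]
    hi = [firings[i][1] for i in order]

    def weighted(weights):
        total = sum(weights)
        if total == 0:
            return None  # no rule fires under this weight assignment
        return sum(w * x for w, x in zip(weights, c)) / total

    n = len(c)
    # y_l: upper firing on small centroids, lower firing on large ones.
    left = [weighted(hi[:k] + lo[k:]) for k in range(n + 1)]
    # y_r: lower firing on small centroids, upper firing on large ones.
    right = [weighted(lo[:k] + hi[k:]) for k in range(n + 1)]
    y_l = min(v for v in left if v is not None)
    y_r = max(v for v in right if v is not None)
    return y_l, y_r


f1 = firing_interval((0.3797, 0.5954), (0.9377, 0.9568))
f2 = firing_interval((0.0000, 0.2212), (0.9377, 0.9568))
y_l, y_r = type_reduce([f1, f2], centroids=[-0.9, 0.6])
print(y_l, y_r, (y_l + y_r) / 2)  # the midpoint is the defuzzified output
```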
[Figure: number of hits against time (seconds); original data compared with three fitted series:
beta=0.10, gamma=0.94, rmse=308.1565, rrse=0.79703
beta=0.27, gamma=0.94, rmse=209.7852, rrse=0.54504
beta=0.80, gamma=0.94, rmse=272.6285, rrse=0.70858]
[Figure: root relative squared error for six workload patterns: big spike, dual phase, large variations, quickly varying, slowly varying, steep tri phase]
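The two error measures in the legend can be reproduced with a short sketch. The smoother below is double exponential smoothing, which is an assumption: the slides name only two parameters and do not state the exact method or how the parameters map onto it, so the arguments are called `level_factor` and `trend_factor` here. The `hits` series is made-up data.

```python
import math
from typing import List, Sequence


def double_exponential_smoothing(
    series: Sequence[float], level_factor: float, trend_factor: float
) -> List[float]:
    """One-step-ahead forecasts; forecasts[t] predicts series[t]."""
    level, trend = series[0], series[1] - series[0]
    forecasts = [series[0]]  # nothing to forecast from at t = 0
    for t in range(1, len(series)):
        forecasts.append(level + trend)
        new_level = level_factor * series[t] + (1 - level_factor) * (level + trend)
        trend = trend_factor * (new_level - level) + (1 - trend_factor) * trend
        level = new_level
    return forecasts


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rrse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root relative squared error: error relative to always predicting the mean."""
    mean = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean) ** 2 for a in actual)
    return math.sqrt(num / den)


hits = [120, 150, 900, 1400, 600, 300, 250, 200]  # hypothetical workload trace
for factor in (0.10, 0.27, 0.80):
    predicted = double_exponential_smoothing(hits, factor, 0.94)
    print(factor, round(rmse(hits, predicted), 2), round(rrse(hits, predicted), 3))
```

An RRSE below 1 means the forecast beats a constant prediction of the mean, which makes it comparable across workload patterns of different magnitude.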
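SUT | Criteria | Big spike | Dual phase | Large variations | Quickly varying | Slowly varying | Steep tri phase
RobusT2Scale | $rt_{95}$ | 973ms | 537ms | 509ms | 451ms | 423ms | 498ms
RobusT2Scale | VMs | 3.2 | 3.8 | 5.1 | 5.3 | 3.7 | 3.9
Over-provisioning | $rt_{95}$ | 354ms | 411ms | 395ms | 446ms | 371ms | 491ms
Over-provisioning | VMs | 6 | 6 | 6 | 6 | 6 | 6
Under-provisioning | $rt_{95}$ | 1465ms | 1832ms | 1789ms | 1594ms | 1898ms | 2194ms
Under-provisioning | VMs | 2 | 2 | 2 | 2 | 2 | 2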
SLA: $rt_{95} \le 600$ ms for every 10 s control interval
• RobusT2Scale is superior to under-provisioning in terms of
guaranteeing the SLA, and it does not require excessive resources
• RobusT2Scale is superior to over-provisioning in terms of
the resources it requires, while still guaranteeing the SLA
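The reported figures can be checked against the SLA with a few lines. The numbers below are copied from the results above; reading the millisecond row as the 95th-percentile response time and the second row as the number of VMs is an assumption, and the variable and function names are hypothetical.

```python
from typing import List

SLA_MS = 600  # rt_95 must not exceed 600 ms

patterns = ["Big spike", "Dual phase", "Large variations",
            "Quickly varying", "Slowly varying", "Steep tri phase"]

rt95_ms = {
    "RobusT2Scale": [973, 537, 509, 451, 423, 498],
    "Over-provisioning": [354, 411, 395, 446, 371, 491],
    "Under-provisioning": [1465, 1832, 1789, 1594, 1898, 2194],
}
vms = {
    "RobusT2Scale": [3.2, 3.8, 5.1, 5.3, 3.7, 3.9],
    "Over-provisioning": [6] * 6,
    "Under-provisioning": [2] * 6,
}


def sla_violations(values: List[float]) -> List[str]:
    """Names of the workload patterns whose response time exceeds the SLA."""
    return [p for p, v in zip(patterns, values) if v > SLA_MS]


for sut in rt95_ms:
    mean_vms = sum(vms[sut]) / len(vms[sut])
    print(sut, "violations:", sla_violations(rt95_ms[sut]), "mean VMs:", round(mean_vms, 2))
```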
[Figure: root mean square error for alpha=0.1, alpha=0.5, alpha=0.9, and alpha=1.0; noise level: 10%]
Self-Learning Controller
Current Solution
(my PhD Thesis
+ IC4 work on
auto-scaling)
Future Plan
Updating K in MAPE-K @ Runtime
Design-time Assistance
Multi-cloud
Fuzzifier
Inference
Engine
Defuzzifier
Rule
base
Fuzzy
Q-learning
Cloud Application
Monitoring
Actuator
Cloud Platform
Fuzzy Logic
Controller
Knowledge Learning
Autonomic Controller
rt
w
w, rt, th, vm
sa
system state
system goal
RobusT2Scale
Learned rules
FQL
Monitoring Actuator
Cloud Platform
.fis
L
W
W
ElasticBench
𝑤,   𝑟𝑡
𝑤,   𝑟𝑡,  
   𝑡ℎ,   𝑣𝑚
𝑠𝑎
Load
Generator
C
system state
WCF
REST
𝛾, 𝜂, 𝜀, 𝑟
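How FQL could produce the learned rules can be sketched with a generic fuzzy Q-learning update. This is not the authors' implementation: it follows the textbook form of the algorithm, and it reads γ as the discount factor, η as the learning rate, ε as the exploration rate, and r as the reward, which is an assumption based on common reinforcement learning notation. The class and method names are hypothetical, and firing strengths are assumed to be normalized so that they sum to 1.

```python
import random
from typing import List, Sequence, Tuple


class FuzzyQLearner:
    def __init__(self, n_rules: int, actions: Sequence[int],
                 gamma: float = 0.8, eta: float = 0.1,
                 epsilon: float = 0.2, seed: int = 0) -> None:
        self.actions = list(actions)  # candidate scaling actions per rule
        self.gamma, self.eta, self.epsilon = gamma, eta, epsilon
        self.q = [[0.0] * len(self.actions) for _ in range(n_rules)]  # q(rule, action)
        self.rng = random.Random(seed)

    def select(self, firing: Sequence[float]) -> Tuple[List[int], float]:
        """Pick one action index per rule (epsilon-greedy) and blend them."""
        chosen = []
        for row in self.q:
            if self.rng.random() < self.epsilon:
                chosen.append(self.rng.randrange(len(row)))  # explore
            else:
                chosen.append(max(range(len(row)), key=row.__getitem__))  # exploit
        action = sum(f * self.actions[a] for f, a in zip(firing, chosen))
        return chosen, action

    def update(self, firing: Sequence[float], chosen: Sequence[int],
               reward: float, next_firing: Sequence[float]) -> float:
        """Apply one temporal-difference update and return the TD error."""
        q_sa = sum(f * self.q[i][a] for i, (f, a) in enumerate(zip(firing, chosen)))
        v_next = sum(f * max(row) for f, row in zip(next_firing, self.q))
        delta = reward + self.gamma * v_next - q_sa
        for i, (f, a) in enumerate(zip(firing, chosen)):
            self.q[i][a] += self.eta * delta * f  # credit rules by firing strength
        return delta


learner = FuzzyQLearner(n_rules=2, actions=[-1, 0, 1], epsilon=0.0)
chosen, action = learner.select([1.0, 0.0])
delta = learner.update([1.0, 0.0], chosen, reward=1.0, next_firing=[1.0, 0.0])
print(chosen, action, delta, learner.q)
```

After enough updates, the best action in each row of `q` gives the consequent of the corresponding rule, which is one way to read "learned rules" in the diagram.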
Cloud Platform (PaaS)
On-Premise
P: Worker Role
L: Web Role
P: Worker Role
P: Worker Role
Cache
M: Worker Role
Results: Storage
Blackboard: Storage
LG: Console
Auto-scaling Logic (controller)
Policy Enforcer
LB: Load Balancer
Queue
Actuator
Monitoring
[Figure: series S1 to S5]
[Figure: q-values q(9,5), q(7,5), q(9,3), and q(3,3), each plotted over an x-axis from 0 to 350]
Challenge 1: ~75% wasted capacity
Actual demand
Challenge 2: customer lost
More details? => http://computing.dcu.ie/~pjamshidi/PDF/SEAMS2014.pdf
Slides? => http://www.slideshare.net/pooyanjamshidi/
Code? => https://github.com/pooyanjamshidi
Thank you!
Aakash Ahmad, Claus Pahl, Soodeh Farokhi, Amir Sharifloo, Armin Balalaie,
Hamed Jamshidi, Nabor Mendonca, Brian Carroll, Reza Teimourzadegan
