Improving Resource Utilization
in Data Centers using an LSTM
based Prediction Model
Kundjanasith Thonglek¹, Kohei Ichikawa¹, Keichi Takahashi¹, Chawanat Nakasan² and Hajimu Iida¹
¹ Nara Institute of Science and Technology, ² Kanazawa University
Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications
IEEE Cluster 2019 conference, Albuquerque, New Mexico, USA
1
Outline
Introduction
Methodology
Evaluation
Conclusion
2
Introduction
3
Introduction
● Data centers are centralized resources where computing and networking
equipment is concentrated.
● They handle large amounts of data and computation efficiently.
4
❖ High Availability
❖ Continuous Migration
❖ Energy Efficiency
❖ Resource Utilization
Problem statement
Users tend to request more resources than their applications actually need
● Resources left unused by applications are wasted
● Overall resource utilization in the data center degrades
5
wasted resource
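The waste described above can be made concrete with a small sketch (the helper names are illustrative, not from the presentation):

```python
def wasted(allocated, used):
    """Resources reserved for an application but never consumed."""
    return max(allocated - used, 0.0)

def utilization_rate(allocated, used):
    """Fraction of the allocated resource actually used (0..1)."""
    if allocated == 0:
        return 0.0
    return min(used, allocated) / allocated

# A job that requests 8 cores but uses only 2 wastes 6 cores
# and achieves 25% utilization of its allocation.
print(wasted(8.0, 2.0))            # 6.0
print(utilization_rate(8.0, 2.0))  # 0.25
```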
Objective
Improving resource utilization in data centers by predicting a suitable
resource allocation for each application.
6
[Diagram: a computing resource splits into allocated resource, which divides into used and wasted resource; shown before and after right-sizing the allocation.]
Improve scheduling algorithm
- Prioritize computing factors [1]
- The arrival time or the submission time
- The worst-case computational time
- The deadline
7
[1] Singh, Harvinder. (2014). Improve Resource Utilization by Task Scheduling in Cluster Computing. International Journal of Emerging
Research in Management & Technology.
[2] Tejaswini Choudhari, Melody Moh, and Teng-Sheng Moh. 2018. Prioritized task scheduling in fog computing. In Proceedings of the
ACMSE 2018 Conference (ACMSE '18). ACM, New York, NY, USA, Article 22, 8 pages.
- Cannot be customized to different data centers to handle different
workload characteristics [2]
- Static scheduling algorithm
Predict resource utilization
8
[3] Wang, Jina & Yan, Yongming & Guo, Jun. (2016). Research on the Prediction Model of CPU Utilization Based on ARIMA-BP Neural
Network. MATEC Web of Conferences. 65.
[4] Taraneh Taghavi, Maria Lupetini, and Yaron Kretchmer. 2016. Compute Job Memory Recommender System Using Machine Learning.
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, NY, USA,
- Builds a predictive model to estimate the suitable computing resource
allocation [3]
- Cannot improve CPU and memory resource utilization at the same
time [4]
- More dynamic resource allocation
Methodology
9
Approaches
Long Short-Term Memory (LSTM) is applied to predict the resources to
allocate, minimizing wasted resources and increasing the resource
utilization rate.
The memory cell's size is a significant parameter: it determines the
time interval that the LSTM's memory cell can recognize.
10
Our approach can handle both CPU and memory resources to increase
resource utilization.
Improving resource utilization in data centers using LSTM
- Google's Cluster Data: download and analyse Google's cluster usage data, a real workload trace from a data center.
- Long Short-Term Memory: design and implement an LSTM that improves the allocated resources to increase the resource utilization rate.
- Improve Resource Utilization: apply the LSTM model for inference on the real workload from Google's data center, varying the memory cell's size.
- Usage Simulation: simulate resource utilization in the data center using the allocated resources and the resource usage.
11
Google’s cluster usage data
Google’s cluster usage data is real workload data in Google’s data center.
12
Computing Resource | Requested Resource | Resource Usage
CPU                | Requested CPU      | CPU usage
Memory             | Requested memory   | Memory usage
Distribution of CPU requested by jobs
13
Distribution of memory requested by jobs
14
Classification of jobs in Google’s data center
15
[Pie chart of jobs:]
- Over-allocated CPU and over-allocated memory: 95.95 %
- Under-allocated CPU and over-allocated memory: 1.75 %
- Under-allocated CPU and under-allocated memory: 1.65 %
- Over-allocated CPU and under-allocated memory: 0.65 %
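The four job classes above follow directly from comparing requested and used resources. A minimal sketch (the function name is illustrative, and treating an exact match as "over-allocated" is an assumption):

```python
def classify_job(req_cpu, used_cpu, req_mem, used_mem):
    """Label a job by whether each resource was over- or under-allocated."""
    cpu = "over" if req_cpu >= used_cpu else "under"
    mem = "over" if req_mem >= used_mem else "under"
    return f"{cpu}-allocated CPU and {mem}-allocated memory"

print(classify_job(8, 2, 16, 4))  # over-allocated CPU and over-allocated memory
print(classify_job(2, 4, 16, 4))  # under-allocated CPU and over-allocated memory
```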
Recurrent Neural Network
16
● Each node represents
a layer of neurons at a
single time step
● During training,
gradients may explode
or vanish because of
the temporal depth.
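Why temporal depth causes this can be seen with a toy scalar model: backpropagation through time multiplies the gradient by the recurrent weight once per step, so the product shrinks or grows geometrically.

```python
def backprop_factor(weight, steps):
    """Scalar stand-in for the product of recurrent Jacobians over `steps`."""
    g = 1.0
    for _ in range(steps):
        g *= weight
    return g

print(backprop_factor(0.9, 50))  # ~0.005 -> the gradient vanishes
print(backprop_factor(1.1, 50))  # ~117.4 -> the gradient explodes
```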
Long Short-Term Memory
Long Short-Term Memory or LSTM introduces long-term memory into RNN.
17
● LSTM mitigates the vanishing gradient problem, in which the neural
network stops learning because the updates to its weights become
smaller and smaller.
● There are two states that are being transferred to the next cell; the cell
state and the hidden state.
● The memory cell replaces hidden neurons used in traditional RNNs to build
a hidden layer.
LSTM block legend: x = input, A = action, h = hidden state
18
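The two states mentioned above (cell state and hidden state) can be traced through one scalar LSTM step. The gate weights below are arbitrary placeholders, not values from the model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step; w maps gate name -> (input weight, recurrent weight, bias)."""
    def gate(name, act):
        wx, wh, b = w[name]
        return act(wx * x + wh * h_prev + b)
    f = gate("forget", sigmoid)       # how much old cell state to keep
    i = gate("input", sigmoid)        # how much new candidate to write
    g = gate("candidate", math.tanh)  # candidate value
    o = gate("output", sigmoid)       # how much cell state to expose
    c = f * c_prev + i * g            # cell state: the long-term memory
    h = o * math.tanh(c)              # hidden state: passed to the next cell
    return h, c

w = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=w)
```

Both states are bounded by the tanh/sigmoid activations, which is what keeps the gradients from compounding the way they do in a plain RNN.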
Our proposed neural network
Input Layer:
(1) Requested CPU resources
(2) Requested memory resources
(3) Used CPU resources
(4) Used memory resources
19
Output Layer:
(1) Predicted efficient CPU allocation
(2) Predicted efficient memory allocation
1st LSTM Layer: finds the correlation between CPU and memory
Fully Connected Layer: connects every neuron of one layer to the next layer
2nd LSTM Layer: finds the correlation between allocated and used resources
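One way to sanity-check such an architecture is to count its parameters with the standard LSTM formula (four gate blocks, each with input weights, recurrent weights and biases). The hidden size of 32 below is an assumption; the slide does not state layer widths:

```python
def lstm_params(input_dim, hidden):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (hidden * input_dim + hidden * hidden + hidden)

def dense_params(input_dim, units):
    return input_dim * units + units  # weight matrix + biases

H = 32  # assumed hidden size, not given on the slide
total = (lstm_params(4, H)      # 1st LSTM: 4 input features
         + dense_params(H, H)   # fully connected layer
         + lstm_params(H, H)    # 2nd LSTM
         + dense_params(H, 2))  # output: CPU and memory allocation
print(total)  # 14178
```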
Improve Resource Utilization
Improving resource utilization
by implementing a Long Short-Term
Memory model that uses requested
CPU, requested memory, CPU
usage and memory usage.
20
[Diagram: the model takes the allocated resource and resource usage (CPU %, memory %) as input and outputs the predicted allocation (CPU %, memory %).]
Memory cell's size:
➔ 20 minutes
➔ 40 minutes
➔ 60 minutes
The memory cell's size in the Long Short-Term Memory model determines how many steps of input-output value pairs are memorized in each sequence.
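The 20/40/60-minute cell sizes translate into input sequence lengths. A sketch of windowing a usage time series (the sampling period and data values are invented for illustration):

```python
def make_windows(series, window):
    """Split a time series into fixed-length overlapping input sequences."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]

# With 5-minute samples, a 20-minute memory cell corresponds to 4 steps.
usage = [0.20, 0.30, 0.25, 0.40, 0.35, 0.50]
print(make_windows(usage, 4))  # 3 windows of 4 samples each
```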
Usage Simulation
21
Google's cluster usage data (100%) is split into a training dataset (80%) and a testing dataset (20%), which feed the [LSTM/RNN] model to predict the allocated resources (CPU %, memory %).
Simulation: resource utilization in the data center is simulated from the allocated resources predicted by our time-series model, combined with the actual resource usage.
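The 80/20 split above can be sketched as follows; whether the authors split chronologically or at random is not stated, so a chronological split is shown as one plausible choice for time-series data:

```python
def train_test_split(records, train_fraction=0.8):
    """Chronological split into training and testing portions."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```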
Google’s cluster scheduler simulator
22
Github URL: https://github.com/google/cluster-scheduler-simulator
[1] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek and John Wilkes. Omega: flexible, scalable schedulers for large compute
clusters. In Proceedings of the 8th European Conference on Computer Systems (EuroSys 2013).
[2] Charles Reiss, Alexey Tumanov, Gregory Ganger, Randy Katz and Michael Kozuch. Heterogeneity and Dynamicity of Clouds at Scale:
Google Trace Analysis. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC 2012).
[Diagram: schedulers share the full cluster state (state 1, state 2, state 3).]
- Google's cluster grants each scheduler full access to the entire cluster, allowing schedulers to compete in a free-for-all manner.
- There is no central resource allocator; all resource-allocation decisions take place in the schedulers.
- There is no central policy-enforcement engine; individual schedulers take decisions in this variant of the two-level scheme.
- Independent scheduler implementations are supported, and the entire allocation state is exposed to the schedulers.
Evaluation
23
CPU utilization result in simulation
24
Memory utilization result in simulation
25
Decreased CPU resource wastage
26
[Bar chart values: 97.88%, 94.37%, 96.08%, 92.43%, 93.02%, 89.22%]
Decreased memory resource wastage
27
[Bar chart values: 88.63%, 73.39%, 77.20%, 67.04%, 65.52%, 52.83%]
Training time & Inference time
28
[Chart values: 408.93, 35.67, 130.82, 49.77, 35.13, 28.78]
Conclusion
29
Conclusion
❖ We studied how to improve resource utilization in data centers
using Long Short-Term Memory
➢ Discovered the impact of various memory cell sizes in the Long
Short-Term Memory model.
➢ Analyzed the real workload, including allocated and used
resources, in Google's data center.
➢ Predicted suitable resource allocations to increase
resource utilization.
30
❖ We would like to apply other time-series forecasting techniques to
further improve resource utilization.
Q & A
Email: thonglek.kundjanasith.ti7@is.naist.jp
Software Design & Analysis Laboratory, NAIST
31