Improving Resource Utilization
in Data Centers using an LSTM
based Prediction Model
Kundjanasith Thonglek¹, Kohei Ichikawa¹, Keichi Takahashi¹, Chawanat Nakasan² and Hajimu Iida¹
¹ Nara Institute of Science and Technology, ² Kanazawa University
Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications
IEEE Cluster 2019 conference, Albuquerque, New Mexico, USA
1
Outline
Introduction
Methodology
Evaluation
Conclusion
2
Introduction
3
Introduction
● Data centers are centralized resources where computing and networking
equipment is concentrated.
● They handle large amounts of data and computation efficiently.
4
❖ High Availability
❖ Continuous Migration
❖ Energy Efficiency
❖ Resource Utilization
Problem statement
Users tend to request more resources than their applications actually need
● Resources left unused by applications are wasted
● Overall resource utilization in the data center degrades
5
wasted resource
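The waste described above can be made concrete with a small sketch (the helper names are illustrative, not from the presentation):

```python
def wasted(allocated, used):
    """Resources reserved for an application but never consumed."""
    return max(allocated - used, 0.0)

def utilization_rate(allocated, used):
    """Fraction of the allocated resource actually used (0..1)."""
    if allocated == 0:
        return 0.0
    return min(used, allocated) / allocated

# A job that requests 8 cores but uses only 2 wastes 6 cores
# and achieves 25% utilization of its allocation.
print(wasted(8.0, 2.0))            # 6.0
print(utilization_rate(8.0, 2.0))  # 0.25
```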
Objective
Improving resource utilization in data centers by predicting a suitable
resource allocation for each application.
6
[Diagram: a computing resource splits into allocated resource, which divides into used and wasted resource; shown before and after right-sizing the allocation.]
Improve scheduling algorithm
- Prioritize computing factors [1]
- The arrival time or the submission time
- The worst-case computational time
- The deadline
7
[1] Singh, Harvinder. (2014). Improve Resource Utilization by Task Scheduling in Cluster Computing. International Journal of Emerging
Research in Management & Technology.
[2] Tejaswini Choudhari, Melody Moh, and Teng-Sheng Moh. 2018. Prioritized task scheduling in fog computing. In Proceedings of the
ACMSE 2018 Conference (ACMSE '18). ACM, New York, NY, USA, Article 22, 8 pages.
- Cannot be customized to different data centers to handle different
workload characteristics [2]
- Static scheduling algorithm
Predict resource utilization
8
[3] Wang, Jina & Yan, Yongming & Guo, Jun. (2016). Research on the Prediction Model of CPU Utilization Based on ARIMA-BP Neural
Network. MATEC Web of Conferences. 65.
[4] Taraneh Taghavi, Maria Lupetini, and Yaron Kretchmer. 2016. Compute Job Memory Recommender System Using Machine Learning.
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, NY, USA,
- Builds a predictive model to estimate the suitable computing resource
allocation [3]
- Cannot improve CPU and memory resource utilization at the same
time [4]
- More dynamic resource allocation
Methodology
9
Approaches
Long Short-Term Memory (LSTM) is applied to predict the resources to
allocate, minimizing wasted resources and increasing the resource
utilization rate.
The memory cell's size is a significant parameter: it determines the
time interval that the LSTM's memory cell can recognize.
10
Our approach can handle both CPU and memory resources to increase
resource utilization.
Improving resource utilization in data centers using LSTM
- Google's Cluster Data: download and analyse Google's cluster usage data, a real workload trace from a data center.
- Long Short-Term Memory: design and implement an LSTM that improves the allocated resources to increase the resource utilization rate.
- Improve Resource Utilization: apply the LSTM model for inference on the real workload from Google's data center, varying the memory cell's size.
- Usage Simulation: simulate resource utilization in the data center using the allocated resources and the resource usage.
11
Google’s cluster usage data
Google’s cluster usage data is real workload data in Google’s data center.
12
Computing Resource | Requested Resource | Resource Usage
CPU                | Requested CPU      | CPU usage
Memory             | Requested memory   | Memory usage
Distribution of CPU requested by jobs
13
Distribution of memory requested by jobs
14
Classification of jobs in Google’s data center
15
[Pie chart of jobs:]
- Over-allocated CPU and over-allocated memory: 95.95 %
- Under-allocated CPU and over-allocated memory: 1.75 %
- Under-allocated CPU and under-allocated memory: 1.65 %
- Over-allocated CPU and under-allocated memory: 0.65 %
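The four job classes above follow directly from comparing requested and used resources. A minimal sketch (the function name is illustrative, and treating an exact match as "over-allocated" is an assumption):

```python
def classify_job(req_cpu, used_cpu, req_mem, used_mem):
    """Label a job by whether each resource was over- or under-allocated."""
    cpu = "over" if req_cpu >= used_cpu else "under"
    mem = "over" if req_mem >= used_mem else "under"
    return f"{cpu}-allocated CPU and {mem}-allocated memory"

print(classify_job(8, 2, 16, 4))  # over-allocated CPU and over-allocated memory
print(classify_job(2, 4, 16, 4))  # under-allocated CPU and over-allocated memory
```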
Recurrent Neural Network
16
● Each node represents
a layer of neurons at a
single time step
● During training,
gradients may explode
or vanish because of
the temporal depth.
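Why temporal depth causes this can be seen with a toy scalar model: backpropagation through time multiplies the gradient by the recurrent weight once per step, so the product shrinks or grows geometrically.

```python
def backprop_factor(weight, steps):
    """Scalar stand-in for the product of recurrent Jacobians over `steps`."""
    g = 1.0
    for _ in range(steps):
        g *= weight
    return g

print(backprop_factor(0.9, 50))  # ~0.005 -> the gradient vanishes
print(backprop_factor(1.1, 50))  # ~117.4 -> the gradient explodes
```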
Long Short-Term Memory
Long Short-Term Memory or LSTM introduces long-term memory into RNN.
17
● LSTM mitigates the vanishing gradient problem, in which the neural
network stops learning because the updates to its weights become
smaller and smaller.
● There are two states that are being transferred to the next cell; the cell
state and the hidden state.
● The memory cell replaces hidden neurons used in traditional RNNs to build
a hidden layer.
LSTM block legend: x = input, A = action, h = hidden state
18
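The two states mentioned above (cell state and hidden state) can be traced through one scalar LSTM step. The gate weights below are arbitrary placeholders, not values from the model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step; w maps gate name -> (input weight, recurrent weight, bias)."""
    def gate(name, act):
        wx, wh, b = w[name]
        return act(wx * x + wh * h_prev + b)
    f = gate("forget", sigmoid)       # how much old cell state to keep
    i = gate("input", sigmoid)        # how much new candidate to write
    g = gate("candidate", math.tanh)  # candidate value
    o = gate("output", sigmoid)       # how much cell state to expose
    c = f * c_prev + i * g            # cell state: the long-term memory
    h = o * math.tanh(c)              # hidden state: passed to the next cell
    return h, c

w = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=w)
```

Both states are bounded by the tanh/sigmoid activations, which is what keeps the gradients from compounding the way they do in a plain RNN.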
Our proposed neural network
Input Layer:
(1) Requested CPU resources
(2) Requested memory resources
(3) Used CPU resources
(4) Used memory resources
19
Output Layer:
(1) Predicted efficient CPU allocation
(2) Predicted efficient memory allocation
1st LSTM Layer: finds the correlation between CPU and memory
Fully Connected Layer: connects every neuron of one layer to the next layer
2nd LSTM Layer: finds the correlation between allocated and used resources
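One way to sanity-check such an architecture is to count its parameters with the standard LSTM formula (four gate blocks, each with input weights, recurrent weights and biases). The hidden size of 32 below is an assumption; the slide does not state layer widths:

```python
def lstm_params(input_dim, hidden):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (hidden * input_dim + hidden * hidden + hidden)

def dense_params(input_dim, units):
    return input_dim * units + units  # weight matrix + biases

H = 32  # assumed hidden size, not given on the slide
total = (lstm_params(4, H)      # 1st LSTM: 4 input features
         + dense_params(H, H)   # fully connected layer
         + lstm_params(H, H)    # 2nd LSTM
         + dense_params(H, 2))  # output: CPU and memory allocation
print(total)  # 14178
```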
Improve Resource Utilization
Improving resource utilization
by implementing a Long Short-Term
Memory model that uses requested
CPU, requested memory, CPU
usage and memory usage.
20
[Diagram: the model takes the allocated resource and resource usage (CPU %, memory %) as input and outputs the predicted allocation (CPU %, memory %).]
Memory cell's size:
➔ 20 minutes
➔ 40 minutes
➔ 60 minutes
The memory cell's size in the Long Short-Term Memory model determines how many steps of input-output value pairs are memorized in each sequence.
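The 20/40/60-minute cell sizes translate into input sequence lengths. A sketch of windowing a usage time series (the sampling period and data values are invented for illustration):

```python
def make_windows(series, window):
    """Split a time series into fixed-length overlapping input sequences."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]

# With 5-minute samples, a 20-minute memory cell corresponds to 4 steps.
usage = [0.20, 0.30, 0.25, 0.40, 0.35, 0.50]
print(make_windows(usage, 4))  # 3 windows of 4 samples each
```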
Usage Simulation
21
Google's cluster usage data (100%) is split into a training dataset (80%) and a testing dataset (20%), which feed the [LSTM/RNN] model to predict the allocated resources (CPU %, memory %).
Simulation: resource utilization in the data center is simulated from the allocated resources predicted by our time-series model, combined with the actual resource usage.
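The 80/20 split above can be sketched as follows; whether the authors split chronologically or at random is not stated, so a chronological split is shown as one plausible choice for time-series data:

```python
def train_test_split(records, train_fraction=0.8):
    """Chronological split into training and testing portions."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```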
Google’s cluster scheduler simulator
22
Github URL: https://github.com/google/cluster-scheduler-simulator
[1] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek and John Wilkes. Omega: flexible, scalable schedulers for large compute
clusters. In Proceedings of the 8th European Conference on Computer Systems (EuroSys 2013).
[2] Charles Reiss, Alexey Tumanov, Gregory Ganger, Randy Katz and Michael Kozuch. Heterogeneity and Dynamicity of Clouds at Scale:
Google Trace Analysis. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC 2012).
[Diagram: schedulers share the full cluster state (state 1, state 2, state 3).]
- Google's cluster grants each scheduler full access to the entire cluster, allowing schedulers to compete in a free-for-all manner.
- There is no central resource allocator; all resource-allocation decisions take place in the schedulers.
- There is no central policy-enforcement engine; individual schedulers take decisions in this variant of the two-level scheme.
- Independent scheduler implementations are supported, and the entire allocation state is exposed to the schedulers.
Evaluation
23
CPU utilization result in simulation
24
Memory utilization result in simulation
25
Decreased CPU resource wastage
26
[Bar chart values: 97.88%, 94.37%, 96.08%, 92.43%, 93.02%, 89.22%]
Decreased memory resource wastage
27
[Bar chart values: 88.63%, 73.39%, 77.20%, 67.04%, 65.52%, 52.83%]
Training time & Inference time
28
[Chart values: 408.93, 35.67, 130.82, 49.77, 35.13, 28.78]
Conclusion
29
Conclusion
❖ We studied how to improve resource utilization in data centers
using Long Short-Term Memory
➢ Discovered the impact of various memory cell sizes in the Long
Short-Term Memory model.
➢ Analyzed the real workload, including allocated and used
resources, in Google's data center.
➢ Predicted suitable resource allocations to increase
resource utilization.
30
❖ We would like to apply other time-series forecasting techniques to
further improve resource utilization.
Q & A
Email: thonglek.kundjanasith.ti7@is.naist.jp
Software Design & Analysis Laboratory, NAIST
31