A Game-Theoretic Approach for Runtime
Capacity Allocation in MapReduce
Eugenio Gianniti*, Danilo Ardagna*, Michele Ciavotta*, Mauro Passacantando**
*Politecnico di Milano
**Università di Pisa
POLITECNICO DI MILANO
Slide 1
Goals
Cost-effective deployment configuration for MapReduce systems hosted on private Clouds
Efficient solution algorithm to support run-time cluster management
Distributed approach leveraging local knowledge of problem parameters
Slide 2
Reference System
Hadoop YARN
Slide 3
Preliminary Formulation
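\[
\begin{aligned}
\min_{r,\,h,\,s^M,\,s^R} \quad & \sum_{i=1}^{N} \bar{\rho}\, r_i + \sum_{i=1}^{N} P_i(h_i) \\
\text{subject to:} \quad & \sum_{i=1}^{N} r_i \le R \\
& H_i^{low} \le h_i \le H_i^{up}, \quad \forall i \in A \\
& \frac{A_i h_i}{s_i^M} + \frac{B_i h_i}{s_i^R} + E_i \le 0, \quad \forall i \in A \\
& \frac{s_i^M}{c_i^M} + \frac{s_i^R}{c_i^R} \le r_i, \quad \forall i \in A \\
& r_i \in \mathbb{N}_0, \quad \forall i \in A \\
& s_i^M \in \mathbb{N}_0, \quad \forall i \in A \\
& s_i^R \in \mathbb{N}_0, \quad \forall i \in A \\
& h_i \in \mathbb{N}_0, \quad \forall i \in A
\end{aligned}
\]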
Integer Nonlinear Problem
Non-convex constraints
DECISION VARIABLES
$h_i$: concurrency level
$r_i$: virtual machines
$s_i^M$: Map slots
$s_i^R$: Reduce slots
Continuous relaxation
$h_i \to (\psi_i)^{-1}$
Slide 4
Centralized Problem
Continuous Nonlinear Problem
Need to centralize information
Convex constraints
KKT conditions are necessary and sufficient for optimality
DECISION VARIABLES
$\psi_i$: reciprocal concurrency level
$r_i$: virtual machines
$s_i^M$: Map slots
$s_i^R$: Reduce slots
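\[
\begin{aligned}
\min_{r,\,\psi,\,s^M,\,s^R} \quad & \sum_{i=1}^{N} \bar{\rho}\, r_i + \sum_{i=1}^{N} (\alpha_i \psi_i - \beta_i) \\
\text{subject to:} \quad & \sum_{i=1}^{N} r_i \le R \\
& \psi_i^{low} \le \psi_i \le \psi_i^{up}, \quad \forall i \in A \\
& \frac{A_i}{s_i^M \psi_i} + \frac{B_i}{s_i^R \psi_i} + E_i \le 0, \quad \forall i \in A \\
& \frac{s_i^M}{c_i^M} + \frac{s_i^R}{c_i^R} \le r_i, \quad \forall i \in A \\
& r_i \in \mathbb{R}_+, \quad \forall i \in A \\
& s_i^M \in \mathbb{R}_+, \quad \forall i \in A \\
& s_i^R \in \mathbb{R}_+, \quad \forall i \in A \\
& \psi_i \in \mathbb{R}_+, \quad \forall i \in A
\end{aligned}
\]
Slide 5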
Optimal Configuration Formulae
In the optimal configuration it holds:
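\[
\begin{aligned}
s_i^M &= \frac{c_i^M}{1 + \sqrt{\dfrac{B_i}{A_i} \dfrac{c_i^M}{c_i^R}}}\; r_i, \quad \forall i \in A \\
s_i^R &= \frac{c_i^R}{1 + \sqrt{\dfrac{A_i}{B_i} \dfrac{c_i^R}{c_i^M}}}\; r_i, \quad \forall i \in A \\
\psi_i &= -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i}\; r_i^{-1}, \quad \forall i \in A
\end{aligned}
\]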
PROPOSITION
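\[
\begin{aligned}
r_i^{low} &= -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i\, \psi_i^{up}}, \quad \forall i \in A \\
r_i^{up} &= -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i\, \psi_i^{low}}, \quad \forall i \in A
\end{aligned}
\]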
Exact bounds on resource requirements
Full knowledge of the problem parameters
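As a rough illustration of the bounds in the proposition, the sketch below evaluates $r_i^{low}$ and $r_i^{up}$ for a single application. The function name and the sample parameter values are hypothetical, and $E_i$ is assumed to be negative, which the deadline constraint requires for a feasible positive $\psi_i$.

```python
import math


def resource_bounds(A, B, E, cM, cR, psi_low, psi_up):
    """Return (r_low, r_up) for one application.

    Assumption: E < 0, so that both bounds come out positive.
    """
    # Common numerator (sqrt(A / cM) + sqrt(B / cR))^2 of both bounds.
    K = (math.sqrt(A / cM) + math.sqrt(B / cR)) ** 2
    r_low = -K / (E * psi_up)   # fewest VMs: lowest admissible concurrency
    r_up = -K / (E * psi_low)   # most VMs: highest admissible concurrency
    return r_low, r_up


# Illustrative values only, not taken from the slides.
r_low, r_up = resource_bounds(A=4.0, B=9.0, E=-100.0, cM=1.0, cR=1.0,
                              psi_low=0.05, psi_up=0.25)
print(r_low, r_up)
```

Since $\psi_i$ is the reciprocal of the concurrency level, the larger bound on $\psi_i$ yields the smaller bound on $r_i$.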
Slide 6
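Application Masters Problems
\[
\begin{aligned}
\min_{\psi_i,\,\rho_i^a,\,s_i^M,\,s_i^R} \quad & \alpha_i \psi_i - \beta_i \\
\text{subject to:} \quad & \bar{\rho} \le \rho_i^a \le \rho_i^{up} \\
& \psi_i^{low} \le \psi_i \le \psi_i^{up} \\
& \frac{A_i}{s_i^M \psi_i} + \frac{B_i}{s_i^R \psi_i} + E_i \le 0 \\
& \frac{s_i^M}{c_i^M} + \frac{s_i^R}{c_i^R} \le r_i \\
& s_i^M \in \mathbb{R}_+ \\
& s_i^R \in \mathbb{R}_+ \\
& \psi_i \in \mathbb{R}_+ \\
& \rho_i^a \in \mathbb{R}_+
\end{aligned}
\]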
One instance per AM
Continuous Nonlinear Problem
Only application-specific
information
DECISION VARIABLES
$\psi_i$: reciprocal concurrency level
$\rho_i^a$: bid for VMs
$s_i^M$: Map slots
$s_i^R$: Reduce slots
Slide 7
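Resource Manager Problem
\[
\begin{aligned}
\max_{r,\,y,\,\rho} \quad & \sum_{i=1}^{N} (\rho - \bar{\rho})\, \tilde{r}_i - \sum_{i=1}^{N} p_i \left( r_i^{up} - r_i \right) \\
\text{subject to:} \quad & \sum_{i=1}^{N} r_i \le R \\
& r_i \ge r_i^{low}, \quad \forall i \in A \\
& r_i \le \left( r_i^{up} - r_i^{low} \right) y_i + r_i^{low}, \quad \forall i \in A \\
& \bar{\rho} \le \rho \le \tilde{\rho} \\
& \rho - \rho_i^a \le M (1 - y_i), \quad \forall i \in A \\
& \rho_i^a - \rho \le M y_i, \quad \forall i \in A \\
& r_i \in \mathbb{R}_+, \quad \forall i \in A \\
& y_i \in \{0, 1\}, \quad \forall i \in A \\
& \rho \in \mathbb{R}_+
\end{aligned}
\]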
DECISION VARIABLES
$\rho$: price of VMs
$r_i$: virtual machines
$y_i$: AM $i$ offers more than price
One instance for the whole cluster
Mixed Integer Nonlinear Problem
Takes care of resource management only
Two alternatives: $\tilde{r}_i = r_i$ and $\tilde{r}_i = r_i - r_i^{low}$
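The role of the binary variable $y_i$ can be checked numerically. The sketch below, with hypothetical function names and made-up numbers, lists the values of $y_i$ that satisfy the two big-M constraints for a given price and bid, and then the resulting upper limit on $r_i$. It only illustrates the constraint logic and does not solve the RM problem.

```python
def feasible_y(rho, rho_a, big_m):
    """Values of y in {0, 1} that satisfy both big-M constraints."""
    return [
        y for y in (0, 1)
        if rho - rho_a <= big_m * (1 - y)   # y = 1 forces rho <= rho_a
        and rho_a - rho <= big_m * y        # y = 0 forces rho_a <= rho
    ]


def vm_upper_limit(y, r_low, r_up):
    """Right-hand side of r_i <= (r_up - r_low) * y_i + r_low."""
    return (r_up - r_low) * y + r_low


# Bid above the price: only y = 1 is feasible, so up to r_up VMs are allowed.
print(feasible_y(rho=1.0, rho_a=2.0, big_m=100.0), vm_upper_limit(1, 1.0, 5.0))
# Bid below the price: only y = 0 is feasible, so r_i is pinned to r_low.
print(feasible_y(rho=2.0, rho_a=1.0, big_m=100.0), vm_upper_limit(0, 1.0, 5.0))
```

When the bid equals the price both values of $y_i$ are feasible, so the constraints leave that tie to the optimizer.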
Slide 8
Generalized Nash Equilibrium Problems
Not jointly convex
Lack of theoretical guarantees
Figure: problem classes NEP, JC-NEP, GNEP
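A set of players, $N$, each with utility function $\Theta_i$
Feasible set of player $i$: $X_i = X_i(x_{-i})$
Solution concept: $\bar{x} \in X$ is an equilibrium if and only if, for all $i \in N$ and all $x_i \in X_i(\bar{x}_{-i})$, the unilateral deviation $(x_i, \bar{x}_{-i})$ does not improve $\Theta_i$ with respect to $\Theta_i(\bar{x})$
Slide 9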
Optimal Configuration Formulae
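PROPOSITION
The optimal configuration for every AM, given an amount of resources $r_i$, is given by the following relations:
\[
\begin{aligned}
s_i^M &= \frac{c_i^M}{1 + \sqrt{\dfrac{B_i}{A_i} \dfrac{c_i^M}{c_i^R}}}\; r_i \\
s_i^R &= \frac{c_i^R}{1 + \sqrt{\dfrac{A_i}{B_i} \dfrac{c_i^R}{c_i^M}}}\; r_i \\
\psi_i &= -\frac{\left( \sqrt{\dfrac{A_i}{c_i^M}} + \sqrt{\dfrac{B_i}{c_i^R}} \right)^2}{E_i}\; r_i^{-1}
\end{aligned}
\]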
Each AM problem reduces to a quick algebraic update
Local knowledge of application-specific parameters
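The algebraic update can be sketched in a few lines. The function name and the numbers below are hypothetical, and $E_i$ is assumed negative so that $\psi_i$ is positive. With these sample values both the slot-capacity constraint and the deadline constraint hold with equality.

```python
import math


def am_closed_form(r, A, B, E, cM, cR):
    """Closed-form AM configuration for r assigned VMs (assumes E < 0)."""
    sM = cM / (1 + math.sqrt((B / A) * (cM / cR))) * r   # Map slots
    sR = cR / (1 + math.sqrt((A / B) * (cR / cM))) * r   # Reduce slots
    # Reciprocal concurrency level.
    psi = -((math.sqrt(A / cM) + math.sqrt(B / cR)) ** 2) / (E * r)
    return sM, sR, psi


# Illustrative values only, not taken from the slides.
sM, sR, psi = am_closed_form(r=5.0, A=4.0, B=9.0, E=-100.0, cM=1.0, cR=1.0)
print(sM, sR, psi)
# Both constraints are tight at the returned point:
print(sM / 1.0 + sR / 1.0)                        # equals r
print(4.0 / (sM * psi) + 9.0 / (sR * psi) - 100)  # approximately 0
```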
Slide 10
Best Reply Algorithm
Each iteration is performed in parallel by the RM and AMs
Continuous equilibrium configuration
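$r_i \leftarrow r_i^{low}, \ \forall i \in A$
$s_i^M \leftarrow s_i^{M,low}, \ \forall i \in A$
$s_i^R \leftarrow s_i^{R,low}, \ \forall i \in A$
$\psi_i \leftarrow \psi_i^{up}, \ \forall i \in A$
$\rho_i^a \leftarrow \bar{\rho}, \ \forall i \in A$
repeat
    $r_i^{old} \leftarrow r_i, \ \forall i \in A$
    solve RM problem
    for all $i \in A$ do
        solve AM $i$ problem
        if $\psi_i > \psi_i^{low}$ then
            $\rho_i^a \leftarrow \max\{\rho_i^a, \rho\} + \rho_i^{up}$
        end if
    end for
    $\varepsilon \leftarrow \sum_{i=1}^{N} \frac{|r_i - r_i^{old}|}{r_i^{old}}$
until $\varepsilon < \bar{\varepsilon}$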
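The stopping rule of the loop above can be sketched as follows. The `step` callable is a hypothetical stand-in for one round of the RM and AM solves, which are not implemented here, and `max_iter` is a safeguard added for this sketch that does not appear on the slide.

```python
def relative_increment(r, r_old):
    """Sum over applications of |r_i - r_i_old| / r_i_old."""
    return sum(abs(new - old) / old for new, old in zip(r, r_old))


def iterate_until_stable(step, r0, tol, max_iter=1000):
    """Repeat `step` until the relative increment drops below `tol`."""
    r = list(r0)
    for _ in range(max_iter):
        r_old = r
        r = step(r_old)   # placeholder for "solve RM problem" + AM updates
        if relative_increment(r, r_old) < tol:
            break
    return r


# Toy contraction used only to exercise the loop; it is not the RM problem.
result = iterate_until_stable(lambda r: [0.5 * x + 1.0 for x in r],
                              r0=[1.0, 4.0], tol=1e-6)
print(result)
```

A tighter tolerance costs more iterations, which is the trade-off explored in the stopping criterion tolerance analysis.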
Slide 11
Experimental Overview
PRELIMINARY ANALYSIS
Ensure the model behavior is consistent with intuition when applied to simple problems
SCALABILITY ANALYSIS
Verify the feasibility of solving problem instances at a realistic scale in production environments
STOPPING CRITERION TOLERANCE ANALYSIS
Check the sensitivity of the distributed algorithm with respect to the tolerance on the relative increment
VALIDATION WITH YARN SLS
Compare model solutions and timings measured on the official simulator
Slide 12
Preliminary Analysis
Figure: decreasing cluster capacity experiment, $\tilde{r}_i = r_i - r_i^{low}$ (panels: 100 AMs, 1,000 AMs)
Slide 13
Alternative Virtual Gain Terms
Figure: increasing concurrency level experiment, 10 AMs (panels: $\tilde{r}_i = r_i - r_i^{low}$, $\tilde{r}_i = r_i$)
Slide 14
Scalability Analysis
Figure: execution time [s] (panels: $\tilde{r}_i = r_i - r_i^{low}$, $\tilde{r}_i = r_i$)
Slide 15
Stopping Criterion Tolerance Analysis
Relative error with respect to centralized solutions
Slide 16
Validation with YARN SLS
Average of the absolute values of the relative errors: 16.999 %
Users R D_i [s] S [s] η [%]
(20, 10, 10) 256 3090 2487.54 24.2191
(20, 10, 10) 512 1694 1142.3 48.2977
(20, 10, 20) 256 3380 2948.78 14.6237
(20, 10, 20) 512 1835 1299.59 41.1987
(20, 15, 10) 256 3390 2849.85 18.9536
(20, 15, 10) 512 1845 1409.88 30.8625
(20, 20, 10) 256 3700 3339.28 10.8023
(20, 20, 10) 512 1995 1512 31.9444
(20, 20, 15) 256 3845 3694.03 4.08677
(20, 20, 15) 512 2063 1745.9 18.1626
(20, 20, 20) 256 3987 4041.44 -1.34704
(20, 20, 20) 512 2140 1877.4 13.9874
(25, 15, 25) 256 4300 3893.71 10.4346
(25, 15, 25) 512 2290 1773.53 29.1213
(25, 25, 25) 512 2597 2653.64 -2.13455
Users R D_i [s] S [s] η [%]
(10, 15, 20) 256 2740 2767.58 -0.996658
(10, 15, 20) 512 1508 1447.12 4.20698
(10, 20, 15) 256 2900 2708.3 7.07837
(10, 20, 15) 512 1589 1379.58 15.18
(15, 10, 10) 256 2575 2257.81 14.0487
(15, 10, 10) 512 1456 980.087 48.5583
(15, 15, 15) 256 3070 3183.02 -3.55061
(15, 15, 15) 512 1678 1456.93 15.174
(15, 15, 20) 256 3210 3088.88 3.92116
(15, 15, 20) 512 1748 1547.92 12.9255
(15, 20, 10) 256 3220 3070.87 4.85639
(15, 20, 10) 512 1755 1328.58 32.0956
(15, 20, 20) 256 3512 3951.58 -11.1242
(15, 20, 20) 512 1899 1574.1 20.6404
(15, 25, 10) 256 3520 3276.66 7.42646
(15, 25, 10) 512 1906 1524.83 24.9975
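The tabulated η values are numerically consistent with the relative difference $(D_i - S) / S$ expressed as a percentage. This relation is inferred from the numbers and is not stated on the slide. The sketch below, with a hypothetical function name, recomputes three rows under that assumption.

```python
def relative_error_percent(d, s):
    """Relative difference of d with respect to s, in percent (assumed definition)."""
    return (d - s) / s * 100.0


# (D_i [s], S [s], tabulated η [%]) copied from three rows of the table.
rows = [
    (3090, 2487.54, 24.2191),
    (1694, 1142.3, 48.2977),
    (3987, 4041.44, -1.34704),
]
for d, s, eta in rows:
    print(round(relative_error_percent(d, s), 4), eta)
```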
Slide 17
Conclusions and Future Work
The distributed approach yields accurate approximations of optimal solutions, even without theoretical guarantees
Extend the formulation to Apache Tez and Spark (based on DAGs)
Couple the proposed algorithm with a local search method based on Colored Petri Nets simulations
Slide 18
Thanks for your attention…
…any questions?
Slide 19
Solution Rounding Algorithm
sort $A$ according to increasing $\alpha_i$
$r_i \leftarrow \lceil \hat{r}_i \rceil, \ \forall i \in A$
for all $j \in A$ do
    if $\sum_{i=1}^{N} r_i > R$ then
        $r_j \leftarrow r_j - 1$
    end if
end for
$s_i^M \leftarrow \lceil \hat{s}_i^M \rceil, \ \forall i \in A$
$s_i^R \leftarrow \lceil \hat{s}_i^R \rceil, \ \forall i \in A$
for all $j \in A$ do
    while $s_j^M / c_j^M + s_j^R / c_j^R > r_j$ do
        $s_j^R \leftarrow s_j^R - 1$
        if $s_j^M / c_j^M + s_j^R / c_j^R > r_j$ then
            $s_j^M \leftarrow s_j^M - 1$
        end if
    end while
end for
The RM runs an O(N) loop
Each AM runs an O(1) loop, concurrently
Sorting is O(N log N), but can be done once and cached
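A minimal sequential sketch of the rounding procedure follows. It assumes that the first loop visits applications in the sorted order of increasing $\alpha_i$, which is one reading of the pseudocode, and that the continuous solution leaves every application with a non-negative number of VMs. The function name and the sample data are hypothetical.

```python
import math


def round_solution(r_hat, sM_hat, sR_hat, cM, cR, alpha, R):
    """Round a continuous allocation to integers within cluster capacity R."""
    n = len(r_hat)
    # Applications with the smallest alpha give up a VM first (assumed reading).
    order = sorted(range(n), key=lambda i: alpha[i])
    r = [math.ceil(x) for x in r_hat]
    for j in order:
        if sum(r) > R:
            r[j] -= 1
    sM = [math.ceil(x) for x in sM_hat]
    sR = [math.ceil(x) for x in sR_hat]
    # Per-application slot adjustment; each AM could run its own iteration.
    for j in range(n):
        while sM[j] / cM[j] + sR[j] / cR[j] > r[j]:
            sR[j] -= 1
            if sM[j] / cM[j] + sR[j] / cR[j] > r[j]:
                sM[j] -= 1
    return r, sM, sR


# Illustrative continuous solution for two applications and R = 4 VMs.
r, sM, sR = round_solution(r_hat=[2.3, 1.2], sM_hat=[3.2, 0.7], sR_hat=[1.3, 0.8],
                           cM=[2, 2], cR=[2, 2], alpha=[1.0, 5.0], R=4)
print(r, sM, sR)
```

In this example the ceilings request 5 VMs, so the application with the smaller $\alpha_i$ loses one VM and then sheds one Reduce slot and one Map slot to fit its reduced allocation.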
Slide 20
Preliminary Analysis
Figure: decreasing deadlines experiment, $\tilde{r}_i = r_i - r_i^{low}$ (panels: 100 AMs, 1,000 AMs)
Slide 21
Scalability Analysis
Figure: total cost and penalties (panels: $\tilde{r}_i = r_i - r_i^{low}$, $\tilde{r}_i = r_i$)
Slide 22
Obtained Concurrency Levels
Figure: decreasing cluster capacity experiment (panels: centralized model, closed form model)
Slide 23

