Predicting Optimal Parallelism for Data Analytics
Rathijit Sen, Vishal Rohra
Agenda
▪ Overview
▪ AutoDOP
▪ AutoToken
▪ TASQ (AutoToken_vNext)
▪ AutoExecutor
▪ Summary
Resource Provisioning in the Cloud
• Focus: Automatically predict Optimal Parallelism for jobs
• Allow flexibility in selecting optimal point for cost-efficient performance
• Enable optimal resource provisioning
• Users: dynamic, fine-grained provisioning for jobs
• Providers: provisioning of cluster capacities
How many resources does a job actually need?
General Approach
• Predict job run time or peak parallelism:
Peak Parallelism = f(query characteristics) [at lowest run time]
Run time = f(query characteristics, parallelism)
• Query characteristics: compile/optimization-time properties and estimates
• Learn f using Machine Learning models on past executions
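To make this concrete, here is a minimal sketch of learning f as a regression model over past executions; the feature names, the toy data, and the Random Forest choice are illustrative assumptions, not the actual production pipelines:

```python
# Illustrative sketch: learn run time = f(query characteristics, parallelism)
# from past executions. Features and values below are made up.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row: compile/optimization-time features plus the parallelism
# used for that past execution.
X_train = np.array([
    [1e6, 12, 8],    # [estimated_cardinality, num_operators, parallelism]
    [1e6, 12, 32],
    [5e7, 40, 8],
    [5e7, 40, 64],
])
y_train = np.array([120.0, 45.0, 900.0, 210.0])  # observed run times (s)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score a new query at several candidate parallelism levels and
# pick the one with the lowest predicted run time.
features = [2e6, 15]
candidates = [1, 2, 4, 8, 16, 32, 64]
preds = model.predict(np.array([features + [p] for p in candidates]))
best = candidates[int(np.argmin(preds))]
print(f"predicted-optimal parallelism: {best}")
```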
Case Studies
Performance Characteristic Curve (PCC): run time as a function of parallelism

Study                    Platform    Num Nodes  Prediction
AutoDOP                  SQL Server  Single     Run Time
AutoToken                Cosmos      Multiple   Peak Parallelism
AutoToken_vNext / TASQ   Cosmos      Multiple   Run Time, PCC (Strictly Monotonic)
AutoExecutor             Spark       Multiple   PCC (Monotonic)
AutoDOP
Zhiwei Fan, Rathijit Sen, Paris Koutris, Aws Albarghouthi, “Automated Tuning of Query Degree of Parallelism via Machine Learning”, aiDM@SIGMOD, 2020
Zhiwei Fan, Rathijit Sen, Paris Koutris, Aws Albarghouthi, “A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism”, arXiv, 2020
Context
• Platform: SQL Server, single node
• Degree Of Parallelism (DOP)
• Maximum number of threads that can be active at any time for query execution
• Per-query selection
• Impact of DOP for running a query:
• Query Performance and Cost
• Resource Utilization of Multicore Servers
• Resource Provisioning in Cloud-Computing Platforms
Dependence on query characteristics
[Charts: run time vs. DOP for TPC-DS1000 example queries; panels: well-parallelizable queries, other queries]
Dependence on data size (scale factor)
• The average and median shift towards larger DOP values as the scale factor/dataset size increases
• More variation in TPC-DS compared to TPC-H due to the larger variety of query templates in TPC-DS
• No workload has a single per-query optimal DOP value
Approach
• Goal: predict optimal DOP
• ML model type: Regression, not Classification
• More flexibility in choosing optimal point for cost vs performance tradeoffs
ML Model: Random Forest (among others)
• Inputs: query plan operators; number of tuples (cardinality) and other compile/optimization-time estimates; DOP
• Output: predicted run time
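The regression formulation makes the cost/performance tradeoff explicit: instead of a single predicted label, one can scan candidate DOPs and accept the cheapest one within a tolerance of the predicted best. A hypothetical helper along those lines (the DOP grid and the 10% slack are assumptions):

```python
# Illustrative: why a run-time regressor (rather than a classifier that
# outputs one "best DOP") allows cost vs. performance tradeoffs.
# `model` is a trained regressor as sketched earlier.
import numpy as np

def pick_dop(model, query_features, dops=(1, 2, 4, 8, 16, 32, 64, 80),
             slack=0.10):
    """Return the smallest DOP whose predicted run time is within
    `slack` (here 10%) of the best predicted run time."""
    preds = model.predict(np.array([list(query_features) + [d] for d in dops]))
    best = preds.min()
    for d, t in zip(dops, preds):
        if t <= best * (1.0 + slack):
            return d  # cheapest DOP with near-optimal predicted performance
```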
Example results
• AutoDOP is closer to optimal (oracle selection) than static DOP selection policies
• ML: each query at predicted-optimal DOP
given by ML model
• Query-Optimal: each query at Optimal DOP
(oracle selection)
• Workload-Optimal: all queries at optimal
DOP for overall workload (oracle selection)
• 40: each query at DOP 40
• 80: each query at DOP 80
• Speedup over DOP 64 (default DOP)
TPC-DS1000 Queries (subset)
[Chart: speedup (0–1.4) on Test 1 and Test 2 for the ML, Query-Optimal, Workload-Optimal, DOP-40, and DOP-80 policies]
Case Studies
Performance Characteristic Curve (PCC): run time as a function of parallelism

Study                    Platform    Num Nodes  Prediction
AutoDOP                  SQL Server  Single     Run Time
AutoToken                Cosmos      Multiple   Peak Parallelism
AutoToken_vNext / TASQ   Cosmos      Multiple   Run Time, PCC (Strictly Monotonic)
AutoExecutor             Spark       Multiple   PCC (Monotonic)
AutoToken
Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao, “AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft”, VLDB, 2020
Context
• Platform: Exabyte-scale Big Data analytics platform for SCOPE queries
• Token: unit of resource allocation
• Per-job allocation
• Guaranteed and spare tokens
• Impact of number of tokens for running a job:
• Query performance and cost
• Resource utilization and provisioning
Peak Parallelism / Peak Resource Provisioning
• How many guaranteed tokens to request for the job?
• Depends on peak parallelism
• More tokens: unnecessary wait time, unused guaranteed tokens
• Fewer tokens: loss of performance or predictability
• Possible options:
• Default value
• User guesstimate
• Default VC percentage
Approach
• Automatically eliminate over-allocations for recurring jobs
• Ideally, no performance impact
• Use ML models to learn peak tokens from past behavior
• Simple models per job group (signature); a minimal sketch follows the diagram
[Diagram: resource skyline of a job; AutoToken shrinks the gap between the default allocation and the ideal allocation, removing over-allocation]
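A minimal sketch of the idea, assuming recurring jobs are keyed by a signature; the conservative max-of-past-peaks predictor is an illustrative stand-in for AutoToken's actual per-group models:

```python
# Illustrative per-job-group peak-token prediction for recurring jobs.
from collections import defaultdict

history = defaultdict(list)  # signature -> observed peak tokens per run

def record_run(signature: str, peak_tokens: int) -> None:
    history[signature].append(peak_tokens)

def predict_peak(signature: str, default: int = 100) -> int:
    past = history[signature]
    if not past:
        return default  # unseen job group: fall back to default allocation
    # Conservative choice: never below any previously observed peak,
    # so ideally there is no performance impact.
    return max(past)
```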
Results
• Overall prediction accuracy:
• Median error: 0
• 90th percentile error: ≤ 50%
• Coverage: 10.7%–28.1%
• #Jobs:
• Total: approx. 8.8M
• 0.8–2.4M training
• 162–528K testing
[Chart: cumulative percentage of jobs vs. the ratio of requested tokens to actual peak]
Resource Allocation Policies
[Diagram: resource skylines under a peak allocation (AutoToken, only recurring jobs) vs. a tight allocation (TASQ)]
TASQ
Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundarajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen, “Optimal Resource Allocation for Serverless Queries”, [Under Submission]
Why Tight Allocation
• Cost savings
• With negligible change in performance:
• 50% of the jobs can request fewer tokens
• 20% require less than 50% of requested tokens
• With a 5% performance loss:
• 92% of the jobs can request fewer tokens
• 30% require less than 50% of requested tokens
• Reduced job wait times
• Wider resource availability
TASQ’s Approach
Given compile-time features of a job => predict a tight allocation
Observation
• Optimal allocation means different things for different users: it is a function of cost and time
• Predicting the relationship between tokens and runtime is more valuable than predicting a single tight allocation
• The relationship between tokens and runtime is an exponentially decaying curve, referred to as the performance characteristic curve (PCC)
• The model therefore outputs the parameters (a, b) of the PCC
Challenge: Limited Trend Data
• Historical workloads were executed with a single token count
• To predict the PCC, we need data for multiple token counts
Solution: Data Augmentation
• Area-Preserving Allocation Simulator (AREPAS)
• Based on past skylines, generate skylines for multiple token counts using the simulator (see the sketch after this list)
• Assumptions:
• Total computation stays constant
• Total token-seconds used stay constant
• Area under the skyline stays constant
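A simplified sketch in the spirit of AREPAS, assuming a per-second token-usage skyline: replaying the same total token-seconds of work under a new token cap yields a simulated run time at that cap. The actual simulator is more elaborate.

```python
# Simplified area-preserving simulation: same area under the skyline,
# different token cap.
def simulate_runtime(skyline, new_cap):
    """skyline: tokens used in each second of the original run.
    Replays the same total token-seconds under `new_cap` and
    returns the simulated run time in seconds."""
    carry = 0.0   # work (token-seconds) deferred by the tighter cap
    seconds = 0
    for demand in skyline:
        work = demand + carry
        used = min(work, new_cap)
        carry = work - used
        seconds += 1
    while carry > 1e-9:          # drain leftover work at the cap
        carry -= min(carry, new_cap)
        seconds += 1
    return seconds

original = [10, 50, 80, 80, 40, 10]     # peak 80 tokens, 270 token-seconds
print(simulate_runtime(original, 40))   # tighter cap -> longer simulated run
```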
Modeling the Runtime vs Token relationship
• Need for a monotonically non-increasing curve
• User expectation: more resources → faster runtime
• The ‘elbow’ region of the curve usually emerges before parallelism overhead sets in
• How do you enforce that in modeling?
• Expect a power-law curve:
Runtime t(n) = f(n: TokenAllocation) = b · n^(−a), where a, b > 0
• Predict: scalar parameters ‘a’ and ‘b’ (a fitting sketch follows)
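For illustration, (a, b) can be recovered from observed (token count, run time) samples with a log-log linear fit; TASQ itself predicts a and b directly from compile-time features. The sample values below are made up:

```python
# Illustrative recovery of (a, b) for t(n) = b * n**(-a) via a
# log-log linear fit over made-up samples.
import numpy as np

tokens   = np.array([10, 20, 40, 80, 160])
runtimes = np.array([950.0, 510.0, 280.0, 160.0, 95.0])

# log t = log b - a * log n, i.e. a line in log-log space
slope, intercept = np.polyfit(np.log(tokens), np.log(runtimes), 1)
a, b = -slope, float(np.exp(intercept))

def t(n):
    return b * n ** (-a)

print(f"a={a:.3f}, b={b:.1f}, predicted t(100)={t(100):.0f}s")
```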
Results
• XGBoost models are not designed to enforce monotonicity
• NN and GNN perform better in trend prediction
• NN has comparable performance with lower training time

Model       Pattern (Non-Increase)   MAE (Curve Params)   Median AE (Run-Time)
XGBoost SS  32%                      NA                   53%
XGBoost PL  93%                      0.202                52%
NN          100%                     0.163                39%
GNN         100%                     0.168                33%
User Interface
• Workflow
• Submit the job script
• Graph generated at compile time
• Two options
• Visualize the Runtime vs Token Predictions
• Get an optimal token count (one possible heuristic is sketched after this slide)
• Advantages
• Informed decision
• For all jobs
• Before job execution
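One way the "optimal token count" option could derive its answer from the predicted PCC is to stop where the marginal speedup per extra token falls below a threshold; this elbow heuristic and the 1% threshold are assumptions for illustration, not TASQ's actual policy:

```python
# Hypothetical elbow heuristic over the predicted PCC t(n) = b * n**(-a):
# grow the token count until one more token no longer buys a meaningful
# speedup (rel_gain = 1% is an arbitrary assumption).
def optimal_tokens(a, b, max_tokens=1000, rel_gain=0.01):
    n, t = 1, b
    while n < max_tokens:
        t_next = b * (n + 1) ** (-a)
        if (t - t_next) / t < rel_gain:   # diminishing returns: stop here
            return n
        n, t = n + 1, t_next
    return max_tokens

print(optimal_tokens(a=0.8, b=1000.0))   # ~80 tokens for this curve
```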
Case Studies
Performance Characteristic Curve (PCC): run time as a function of parallelism

Study                    Platform    Num Nodes  Prediction
AutoDOP                  SQL Server  Single     Run Time
AutoToken                Cosmos      Multiple   Peak Parallelism
AutoToken_vNext / TASQ   Cosmos      Multiple   Run Time, PCC (Strictly Monotonic)
AutoExecutor             Spark       Multiple   PCC (Monotonic)
AutoExecutor
Rathijit Sen, Abhishek Roy, Alekh Jindal, Rui Fang, Jeff Zheng, Xiaolei Liu, Ruiping Li, “AutoExecutor: Predictive Parallelism for Spark SQL Queries”, [Under Submission]
Context
• Platform: Spark, Azure Synapse
• Executors: processes on worker nodes
• Each executor can use a certain number of cores and amount of memory
• Impact of number of executors for running a query:
• Query performance and cost
• Resource utilization and provisioning
Modeling Approach
• Reuse and extend TASQ PCC model
• Power-law curve with a lower bound
• Run time t(n) with executor count n:
t(n) = max(b · n^(−a), m)
• a, b, m: parameters
ML Model: Random Forest (among others)
• Inputs: count of operators, input cardinality, avg. row length, …
• Output: PCC model parameters, i.e., the power-law segment t(n) = b · n^(−a) and the floor t(n) = m (a small sketch follows)
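A minimal sketch of this bounded power-law PCC, with made-up parameter values (in the real system a, b, m come from the trained model):

```python
# Bounded power-law PCC sketch: the curve follows b * n**(-a) until it
# hits the floor m, after which extra executors buy no further speedup.
def t(n, a=0.7, b=600.0, m=40.0):
    return max(b * n ** (-a), m)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(t(n), 1))   # flattens at m = 40.0 for large n
```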
Example predictions
• Sparklens: predict after one execution of the query
• AutoExecutor: predict before execution of the query
Error distributions (different templates, SF=100)
• Most prediction errors at small number of executors
S: Sparklens; AE: AutoExecutor
F1..F10: ten-fold cross-validation (80% of queries in the training set, 20% in the test set)
System Architecture
[Diagram: system architecture. A telemetry pipeline collects anonymized plans, metrics, and executor events into a workload table; feature extraction, workload analysis, and model training run over Peregrine events; the AutoExecutor extensions consume the trained PCC model]
Summary
Automatic selection of optimal parallelism
• Capability and Approach:
• Enable selection of optimal operating point with respect to optimization objective
• ML models to predict run time/peak parallelism using query characteristics
• Challenges:
• Modeling PCC characteristics
• AutoDOP: Point-wise
• TASQ: Point-wise, Power-law function
• AutoExecutor: Power-law + constant function
• Collecting training data
• TASQ: AREPAS
• AutoExecutor: Sparklens
Could we have other models for PCC?
How would you simulate for other parameter changes?
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.