Co-funded by the European Commission
Horizon 2020 - Grant #777154
Gray-Box Models for Performance Assessment of Spark Applications
atmosphere-eubrazil.eu @AtmosphereEUBR
M. Lattuada, E. Gianniti, M. Hosseini, D. Ardagna,
A. Maros, F. Murai, A. P. Couto da Silva and J.M. Almeida
Politecnico di Milano, Italy
Universidade Federal de Minas Gerais, Brazil
Contextualizing Performance Models
● Develop models to:
○ Identify the minimum-cost configuration given a priori deadlines
○ Allow the adaptive actuation mechanisms to predict whether the QoS objective will be reached
○ Assess (a posteriori) the main performance metrics in multi-tenancy environments
○ Evaluate whether an application run was affected by resource contention
● Approach:
○ Gray-box models based on Machine Learning (ML)
○ Open source ML library
○ Open source benchmarks
Adopting accurate models makes it possible to anticipate QoS violations and to increase the trustworthiness of cloud services by improving their performance.
ML Models overview
• Regression models: l1-regularized Linear Regression, Neural Network, Decision Tree, Random Forests, and Support Vector Regression
• Hyper-parameter optimization
ML Experiments Settings
• Workloads:
  • TPC-DS (Query 26), the industry benchmark for data warehouse systems
  • ML benchmarks (K-means) from the Sparkbench library
  • SparkDL, developed on top of Sparkbench
• Platforms:
  • Microsoft HDInsight on Azure (cloud computing service)
  • IBM Power8 cluster (dedicated cluster)
ML Experiments Settings
• Sampling scheme analysis
• Evaluation metric: Mean Absolute Percentage Error (MAPE)
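The evaluation metric named on this slide can be written out as a short function. This is a minimal sketch assuming the standard definition of MAPE (the mean of |actual - predicted| / |actual|, expressed as a percentage); the slide's own formula did not survive extraction, and the function name and the sample execution times below are illustrative only.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (standard definition, assumed)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured value,
    # so measured values must be non-zero.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured and predicted execution times, in seconds.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error = mape(measured, predicted)  # averages relative errors of 10%, 5% and 0%
```

Because each error is divided by the measured time, short and long runs contribute on the same relative scale.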
Comparison with State of the Art Solutions
• Ernest model by the Spark inventors
• Pure black-box approach:
  • Non-Negative Least Squares (NNLS) regression
• Features:
  • Ratio of data size to number of cores
  • Log of number of cores
  • Data size
  • Number of cores
  • Number of TensorFlow cores (SparkDL only)
S. Venkataraman, Z. Yang, M. Franklin, B. Recht, and I. Stoica. “Ernest: Efficient Performance
Prediction for Large-Scale Advanced Analytics”. In: 13th USENIX Symposium on Networked
Systems Design and Implementation. 2016, pp. 363–378.
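The black-box baseline above can be sketched as a feature vector plus a non-negative least squares fit. The code below is an illustration only: the function names, the run data, and the simple coordinate-descent solver are assumptions made for this sketch, not the authors' or Ernest's implementation. In practice a library solver such as scipy.optimize.nnls would normally be used instead of the hand-written loop.

```python
import math


def black_box_features(data_size, cores, tf_cores=None):
    """Feature vector built from the quantities listed on the slide."""
    features = [
        data_size / cores,  # ratio of data size to number of cores
        math.log(cores),    # log of number of cores
        data_size,          # data size
        cores,              # number of cores
    ]
    if tf_cores is not None:
        features.append(tf_cores)  # number of TensorFlow cores (SparkDL only)
    return features


def nnls_coordinate_descent(rows, targets, sweeps=200):
    """Approximately minimise ||A x - b||^2 subject to x >= 0 (illustrative solver)."""
    n_features = len(rows[0])
    coef = [0.0] * n_features
    residual = list(targets)  # b - A x, with x = 0 initially
    for _ in range(sweeps):
        for j in range(n_features):
            col = [row[j] for row in rows]
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue
            # Best unconstrained update for coordinate j, then clip at zero.
            step = sum(v * r for v, r in zip(col, residual)) / norm_sq
            new_value = max(0.0, coef[j] + step)
            delta = new_value - coef[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                coef[j] = new_value
    return coef


def predict(coef, features):
    """Predicted execution time as a non-negative weighted sum of the features."""
    return sum(c * f for c, f in zip(coef, features))


# Hypothetical profiling runs: (data size, cores, execution time in seconds).
runs = [(250, 4, 130.0), (250, 8, 70.0), (500, 4, 255.0),
        (500, 8, 135.0), (750, 16, 110.0)]
rows = [black_box_features(d, c) for d, c, _ in runs]
targets = [t for _, _, t in runs]
coef = nnls_coordinate_descent(rows, targets)
estimate = predict(coef, black_box_features(500, 16))
```

Constraining the coefficients to be non-negative means each feature can only add to the predicted execution time, never subtract from it.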
Gray-Box Model Features
DAG structure constant across different runs
Query 26 Results
[Figure panels: Interpolation Scenario, Extrapolation Scenario]
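The slides do not define the two scenarios. A common reading, assumed here, is that interpolation tests on core counts that fall inside the range covered by the training set, while extrapolation tests on core counts larger than any seen in training. Under that assumption, the split can be sketched as follows; the function name and the core counts are hypothetical.

```python
def split_by_scenario(train_cores, candidate_cores):
    """Classify test core counts relative to the training range (assumed definitions)."""
    low, high = min(train_cores), max(train_cores)
    seen = set(train_cores)
    # Interpolation: unseen core counts strictly inside the training range.
    interpolation = [c for c in candidate_cores if low < c < high and c not in seen]
    # Extrapolation: core counts above the largest training value.
    extrapolation = [c for c in candidate_cores if c > high]
    # Candidates below the training range fall in neither list in this sketch.
    return interpolation, extrapolation


# Hypothetical configurations.
train_cores = [8, 16, 24, 32]
candidate_cores = [12, 20, 28, 40, 48]
interp, extrap = split_by_scenario(train_cores, candidate_cores)
```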
K-Means & SparkDL Results
[Figure panels: Interpolation Scenario, Extrapolation Scenario]
ML Models Results Summary
• Gray-box models are effective at assessing performance to identify performance degradation (about 4-25% percentage error)
• They work better when the application data size is fixed
• Comparison with Ernest: in most cases, our best models improve on Ernest considerably, especially when few profiling configurations are available in the training set and when workloads are less regular
Conclusions & Future work
• No ML technique always outperforms the others, so different techniques have to be evaluated in each scenario to choose the best model
• Study the performance of Spark applications running on GPU-based clusters
• Validate the models in production environments
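The first conclusion suggests a per-scenario selection loop: fit every candidate technique, score each one on held-out runs, and keep the one with the lowest error. The sketch below assumes MAPE as the score and uses two deliberately simple stand-in models (a constant mean, and a time proportional to 1/cores) rather than the regressors evaluated in the study. All names and numbers are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def fit_mean(train):
    """Stand-in model: always predict the mean training time."""
    mean_time = sum(t for _, t in train) / len(train)
    return lambda cores: mean_time


def fit_inverse_cores(train):
    """Stand-in model: least-squares fit of time = a / cores."""
    a = sum(t / c for c, t in train) / sum(1.0 / (c * c) for c, _ in train)
    return lambda cores: a / cores


def select_best(candidates, train, validation):
    """Fit each candidate on train and return the name with the lowest validation MAPE."""
    actual = [t for _, t in validation]
    scores = {}
    for name, fit in candidates.items():
        model = fit(train)
        scores[name] = mape(actual, [model(c) for c, _ in validation])
    return min(scores, key=scores.get), scores


# Hypothetical (cores, execution time in seconds) pairs for one scenario.
train = [(2, 50.0), (4, 25.0), (8, 12.5)]
validation = [(16, 6.25)]
candidates = {"mean": fit_mean, "inverse_cores": fit_inverse_cores}
best, scores = select_best(candidates, train, validation)
```

Running the same loop separately for each workload and scenario allows a different technique to win in each case, which is the situation the conclusion describes.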
Thanks for your attention…
