Boosting Spark Performance:
An Overview of Techniques
Ahsan Javed Awan
Motivation
About me
● Erasmus Mundus Joint Doctoral Fellow at KTH Sweden and UPC Spain.
● Visiting Researcher at Barcelona Super Computing Center.
● Speaker at Spark Summit Europe 2016.
● Written Licentiate Thesis, “Performance Characterization of In-Memory Data Analytics
with Apache Spark”
● https://www.kth.se/profile/ajawan/
Motivation
Why should you listen?
● What's new in Apache Spark 2.0
● Phase 1: Memory Management and Cache-aware algorithms
● Phase 2: Whole-stage Codegen and Columnar In-Memory Support
● How to get better performance by
● Choosing and tuning the GC
● Using multiple executors, each with a heap size of no more than 32 GB
● Exploiting data locality on DRAM nodes
● Turning off hardware prefetchers
● Keeping Hyper-Threading on
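As a concrete illustration of the executor-sizing advice, a launch command might look like the sketch below. The machine size and flag values are assumptions for illustration, not the talk's exact settings; the point is several small executors, each with a heap well under 32 GB so the JVM can keep using compressed object pointers.

```shell
# Hypothetical sizing for a 48-core, 256 GB scale-up node:
# 6 executors x 8 cores x 24 GB heap, rather than one giant executor.
spark-submit \
  --master yarn \
  --num-executors 6 \
  --executor-cores 8 \
  --executor-memory 24g \
  my_job.py
```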
Motivation
Apache Spark Philosophy?
Motivation
Cont...
*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/
Scale-up frameworks: Phoenix++, Metis, Ostrich, etc.
Scale-out frameworks: Hadoop, Spark, Flink, etc.
Motivation
Cont..
*Source: SGI
● Exponential increase in core count.
● A mismatch between the characteristics of emerging big data workloads and the
underlying hardware.
● Newer promising technologies (Hybrid Memory Cubes, NVRAM etc)
● Clearing the clouds, ASPLOS '12
● Characterizing data analysis workloads, IISWC '13
● Understanding the behavior of in-memory computing workloads, IISWC '14
Substantially improve the memory and CPU efficiency of Spark
backend execution and push performance closer to the limits of
modern hardware.
Goals of Project Tungsten
Phase 1
Foundation
Memory Management
Code Generation
Cache-aware Algorithms
Phase 2
Order-of-magnitude Faster
Whole-stage Codegen
Vectorization
Cont..
Perform explicit memory management instead of relying on Java objects
• Reduce memory footprint
• Eliminate garbage collection overheads
• Use sun.misc.Unsafe-based rows and off-heap memory
Code generation for expression evaluation
• Reduce virtual function calls and interpretation overhead
Cache conscious sorting
• Reduce bad memory access patterns
Summary of Phase I
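The cache-conscious sorting idea can be sketched in plain Python: instead of comparing full records (which chases pointers all over the heap), sort a compact array of (key, index) pairs that stays resident in cache, then permute the records in one final pass. This is a simplified analogue of Tungsten's prefix-based sort, not Spark's actual implementation:

```python
# Sketch of cache-aware sorting: comparisons touch only small
# (key, index) tuples; full records are gathered once at the end.
def cache_aware_sort(records, key):
    # Compact sort array: one (sort key, original position) pair per record.
    prefixes = [(key(rec), i) for i, rec in enumerate(records)]
    prefixes.sort()                           # cache-friendly comparisons
    return [records[i] for _, i in prefixes]  # single gather pass

rows = [("sally", 6.4), ("john", 4.1), ("mike", 3.5)]
print(cache_aware_sort(rows, key=lambda r: r[1]))
# [('mike', 3.5), ('john', 4.1), ('sally', 6.4)]
```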
Which Benchmarks?
Our Hardware Configuration
Which Machine?
Intel's Ivy Bridge Server
Performance of Cache-aware Algorithms?
DataFrames exhibit 25% fewer back-end bound stalls, 64% fewer DRAM-bound stalled cycles, 25% less bandwidth consumption, and 10% less starvation of execution resources
Difficult to get order-of-magnitude performance speedups with
profiling techniques
• For a 10x improvement, you would need to find top hotspots that add up to
90% of the runtime and make them instantaneous
• For 100x, 99%
Instead, look bottom-up: how fast should it run?
Phase 2
Scan
Filter
Project
Aggregate
select count(*) from store_sales
where ss_item_sk = 1000
Cont..
Standard for 30 years:
almost all databases do it
Each operator is an
“iterator” that consumes
records from its input
operator
class Filter(
    child: Operator,
    predicate: (Row => Boolean))
  extends Operator {

  def next(): Row = {
    var current = child.next()
    // skip rows that fail the predicate
    while (current == null || !predicate(current)) {
      current = child.next()
    }
    return current
  }
}
Volcano Iterator Model
select count(*) from store_sales
where ss_item_sk = 1000
long count = 0;
for (ss_item_sk in store_sales) {
if (ss_item_sk == 1000) {
count += 1;
}
}
Hand Written Code
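The contrast between the two slides can be made concrete with a toy Python stand-in (real engines run on the JVM; this only illustrates the control flow): the Volcano version pays a virtual `next()` call per row through a chain of operator objects, while the fused loop is a single tight loop.

```python
# Toy comparison of both execution models on the same query:
#   select count(*) from store_sales where ss_item_sk = 1000

class Scan:
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)        # None marks end of input

class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def next(self):
        row = self.child.next()           # virtual call per row
        while row is not None and not self.predicate(row):
            row = self.child.next()
        return row

def volcano_count(rows):
    op = Filter(Scan(rows), lambda r: r == 1000)
    count = 0
    while op.next() is not None:
        count += 1
    return count

def fused_count(rows):                    # "hand-written" fused loop
    count = 0
    for r in rows:
        if r == 1000:
            count += 1
    return count

data = [1000, 7, 1000, 42, 1000]
assert volcano_count(data) == fused_count(data) == 3
```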
Volcano: 13.95 million rows/sec
College freshman (hand-written): 125 million rows/sec
Note: End-to-end, single thread, single column, and data originated in Parquet on disk
High throughput
Volcano Model vs Hand Written Code
Volcano Model
1. Too many virtual function calls
2. Intermediate data in memory (or
L1/L2/L3 cache)
3. Can’t take advantage of modern
CPU features -- no loop unrolling,
SIMD, pipelining, prefetching, branch
prediction etc.
Hand-written code
1. No virtual function calls
2. Data in CPU registers
3. Compiler loop unrolling, SIMD,
pipelining
Volcano vs Hand Written Code
Fusing operators together so the generated code looks like hand
optimized code:
- Identify chains of operators (“stages”)
- Compile each stage into a single function
- Functionality of a general-purpose execution engine;
performance as if the system were hand-built just to run your query
Whole-Stage Codegen
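The fusion step can be sketched in miniature: build the source of one function that inlines the whole operator chain, then compile it. Spark actually generates Java source and compiles it with Janino; this Python analogue (a hypothetical `codegen_count` helper) only illustrates the idea.

```python
# Minimal sketch of whole-stage code generation: fuse a
# Scan -> Filter -> Count pipeline into the source of ONE function.
def codegen_count(predicate_src):
    src = (
        "def generated(rows):\n"
        "    count = 0\n"
        "    for row in rows:\n"
        f"        if {predicate_src}:\n"   # filter inlined: no virtual calls
        "            count += 1\n"
        "    return count\n"
    )
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["generated"]

count_sk_1000 = codegen_count("row == 1000")
print(count_sk_1000([1000, 5, 1000]))  # 2
```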
In-memory Row Format:
1 john 4.1
2 mike 3.5
3 sally 6.4

In-memory Column Format:
1 2 3
john mike sally
4.1 3.5 6.4

Columnar In-Memory
1. More efficient: denser storage, regular data access, easier to
index into
2. More compatible: Most high-performance external systems
are already columnar (numpy, TensorFlow, Parquet); zero
serialization/copy to work with them
3. Easier to extend: process encoded data
Why Columnar?
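The density and regular-access points can be shown with the slide's own toy table, using only the standard library (`array` gives packed, contiguous storage; this is an illustration of the layout idea, not Spark's column format):

```python
from array import array

# Row layout: one tuple per record (id, name, score).
rows = [(1, "john", 4.1), (2, "mike", 3.5), (3, "sally", 6.4)]

# Column layout: one contiguous, densely packed array per attribute.
ids    = array("i", [1, 2, 3])
names  = ["john", "mike", "sally"]
scores = array("d", [4.1, 3.5, 6.4])

# A single-column scan touches only its packed array -- better cache
# locality, and the regular layout is what vectorized readers exploit.
row_total    = sum(r[2] for r in rows)   # drags whole tuples through cache
column_total = sum(scores)               # sequential scan of packed doubles
assert abs(row_total - column_total) < 1e-9
```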
Parquet: 11 million rows/sec
Parquet vectorized: 90 million rows/sec
Note: End-to-end, single thread, single column, and data originated in Parquet on disk
High throughput
Phase 1
Spark 1.4 - 1.6
Memory Management
Code Generation
Cache-aware Algorithms
Phase 2
Spark 2.0+
Whole-stage Code Generation
Columnar in Memory Support
Both whole stage codegen [SPARK-12795] and the vectorized
parquet reader [SPARK-12992] are enabled by default in Spark 2.0+
5-30x
Speedups
Operator Benchmarks: Cost/Row (ns)
1. SPARK-16026: Cost Based Optimizer
- Leverage table/column level statistics to optimize joins and aggregates
- Statistics Collection Framework (Spark 2.1)
- Cost Based Optimizer (Spark 2.2)
2. Boosting Spark’s Performance on Many-Core Machines
- Qifan’s Talk Today at 2:55pm (Research Track)
- In-memory/ single node shuffle
3. Improving quality of generated code and better integration
with the in-memory column format in Spark
Spark 2.1, 2.2 and beyond
Motivation
The choice of garbage collector impacts the data processing
capability of the system
Improvement in DPS ranges from 1.4x to 3.7x on average with
Parallel Scavenge as compared to G1
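Selecting the Parallel Scavenge collector for the executors might look like the fragment below (an illustrative sketch; the talk does not specify the exact launch command):

```shell
# Ask the executor and driver JVMs to use the Parallel (Scavenge)
# collector instead of the default, per the finding above.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC" \
  --conf "spark.driver.extraJavaOptions=-XX:+UseParallelGC" \
  my_job.py
```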
Our Approach
Multiple small executors instead of a single large executor
Multiple small executors can provide up to a 36% performance gain
Our Approach
NUMA Awareness
NUMA Awareness results in 10% speed up on average
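One common way to get NUMA-local accesses is to pin each worker's CPUs and memory to one socket with `numactl` (a standard Linux tool; wiring it into the Spark launch script this way is an assumption for illustration):

```shell
# Bind a Spark worker's threads and allocations to NUMA node 0 so its
# tasks read from local DRAM rather than across the QPI interconnect.
numactl --cpunodebind=0 --membind=0 ./sbin/start-slave.sh spark://master:7077
```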
Our Approach
Hyper Threading is effective
Hyper threading reduces the DRAM bound stalls by 50%
Our Approach
Disable next-line prefetchers
Disabling next-line prefetchers can improve performance by 15%
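On many Intel server CPUs the hardware prefetchers are controlled through MSR 0x1a4, where bit 1 disables the L2 adjacent ("next") cache line prefetcher. The exact bit layout varies by microarchitecture, so treat this as an illustrative sketch and check Intel's documentation for your part before applying it:

```shell
# Requires root and the msr-tools package; Intel-specific.
sudo modprobe msr
sudo wrmsr --all 0x1a4 0x2   # set bit 1: disable the next-line prefetcher
```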
Our Approach
Further Reading
● Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server, in 5th IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).
● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB 2015, Hawaii, USA.
● Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads, in 6th IEEE Conference on Big Data and Cloud Computing, 2016.
● Node Architecture Implications for In-Memory Data Analytics in Scale-in Clusters, in 3rd IEEE/ACM Conference on Big Data Computing, Applications and Technologies, 2016.
● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission).
THANK YOU.
Acknowledgements:
Sameer Agarwal for the Project Tungsten slides
