Ahsan J. Awan (KTH)
Near Data Computing Architectures:
Opportunities and Challenges for
Apache Spark
#EUres10
About me?
Timeline: 1988, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
● Born in Pakistan
● B.E. MTS, NUST, Pakistan
● EMECS, TUKL, Germany
● EMECS, UoS, UK
● Lecturer, NUST, Pakistan
● EMJD-DC, KTH/SICS, Sweden
● PhD Intern, Recore, Netherlands
● EMJD-DC, UPC/BSC, Spain
● PhD Intern, IBM Research, Japan
A Quick Recap from last year?
https://spark-summit.org/eu-2016/events/performance-characterization-of-apache-spark-on-scale-up-servers/
Problems Identified
● Work time inflation
● Poor multi-core scalability of data analytics with Spark
● Thread-level load imbalance
● Wait time on I/O
● GC overhead
● DRAM-bound latency
Solutions Proposed
● NUMA awareness
● Hyper-threaded cores
● No next-line prefetchers
● Lower DRAM speed
● Exploiting near-data processing
● Choice of GC algorithm
● Multiple small executors
Further Reading?
● Project Night-King: Improving the single-node performance of Apache Spark using Near Data Processing Architectures.
● Identifying the Potential of Near Data Computing Architectures for Apache Spark, in Memory Systems Symposium, 2017.
● Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters, in IEEE/ACM Conference on Big Data Computing, Applications and Technologies, 2016.
● Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads, in IEEE Conference on Big Data and Cloud Computing, 2016.
● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB 2015, Hawaii, USA.
● Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server, in IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).
Exploiting NDP/Moving compute closer to data?
1. Processing in Memory
2. In-Storage Processing
Improve performance by reducing costly data movement back and forth between the CPUs and memories.
Loh et al. A processing-in-memory taxonomy and a case for studying fixed-function PIM. In Workshop on Near-Data Processing (WoNDP), 2013.
Trends of Integrating NVM in the System Architecture?
Chang et al. A limits study of benefits from nanostore-based future data-centric system architectures. In Computing Frontiers, 2012.
Can Spark workloads benefit from near-data processing?
[Figure: host CPU, PIM device, ISP device]
Project: Night-King
The case for in-storage processing?
[Figure panels: Grep (Gp), K-means (Km), Windowed Word Count (Wwc)]
The case for 2D integrated PIM instead of 3D Stacked PIM?
M. Radulovic et al. Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?
A refined hypothesis based on workload characterization?
● Spark workloads that are not iterative and have a high ratio of I/O wait time to CPU time, such as join, aggregation, filter, word count and sort, are ideal candidates for ISP.
● Spark workloads that have a low ratio of I/O wait time to CPU time, such as stream processing and iterative graph processing, are bound by the latency of frequent accesses to DRAM and are ideal candidates for 2D integrated PIM.
● Spark workloads that are iterative and have a moderate ratio of I/O wait time to CPU time, such as K-means, have both I/O-bound and memory-bound phases and hence will benefit from hybrid 2D integrated PIM and ISP.
● To satisfy the varying compute demands of Spark workloads, we envision an NDC architecture with programmable-logic-based hybrid ISP and 2D integrated PIM.
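The hypothesis above can be read as a simple decision rule on a workload's ratio of I/O wait time to CPU time. A minimal sketch in Python, assuming hypothetical thresholds (`LOW_RATIO`, `HIGH_RATIO`) and a hypothetical helper `suggest_ndp_target`; the slides only say "low", "moderate" and "high", so the numeric cut-offs are placeholders:

```python
# Hypothetical thresholds: the talk gives no numeric cut-offs.
LOW_RATIO = 0.1
HIGH_RATIO = 1.0


def suggest_ndp_target(io_wait_s: float, cpu_s: float, iterative: bool) -> str:
    """Map a workload profile to the near-data option named in the hypothesis."""
    if cpu_s <= 0:
        raise ValueError("cpu_s must be positive")
    ratio = io_wait_s / cpu_s
    if ratio < LOW_RATIO:
        # Low I/O wait: bound by DRAM access latency.
        return "2D integrated PIM"
    if ratio >= HIGH_RATIO and not iterative:
        # High I/O wait, single pass over the data.
        return "ISP"
    if ratio < HIGH_RATIO and iterative:
        # Moderate I/O wait with both I/O-bound and memory-bound phases.
        return "hybrid 2D integrated PIM + ISP"
    return "not covered by the hypothesis"


if __name__ == "__main__":
    print(suggest_ndp_target(io_wait_s=50, cpu_s=10, iterative=False))
    print(suggest_ndp_target(io_wait_s=5, cpu_s=10, iterative=True))
```

Profiles that fall outside the three stated cases are reported as such rather than forced into a category.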
How to test the refined hypothesis?
● Simulation approach: very slow for big data applications :(
● Modeling approach: overestimated numbers :(
● Emulation approach: a lot of development :(
How about a combination of modeling and partial emulation?
Can existing tightly coupled servers be used as emulators?
Our System Design?
Which programming model?
Iterative MapReduce
Which workloads?
K-means and SGD
Mahajan et al. TABLA: A unified template-based framework for accelerating statistical machine learning
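To make the pairing of iterative MapReduce with K-means concrete, here is a minimal pure-Python sketch of K-means written as repeated map and reduce steps. It is an illustration only, not the accelerator design from the talk; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans`) and the sample data are hypothetical:

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_map(point, centroids):
    """Map: emit (cluster id, (point, 1)) for one input point."""
    return nearest(point, centroids), (point, 1)


def kmeans_reduce(pairs):
    """Reduce: sum points and counts per cluster, then average."""
    sums = {}
    for cid, (point, count) in pairs:
        acc, n = sums.get(cid, ([0.0] * len(point), 0))
        sums[cid] = ([a + p for a, p in zip(acc, point)], n + count)
    return {cid: [a / n for a in acc] for cid, (acc, n) in sums.items()}


def kmeans(points, centroids, iterations=10):
    """Iterate map + reduce, feeding the new centroids into the next round."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        new = kmeans_reduce(kmeans_map(p, centroids) for p in points)
        # Keep the old centroid if a cluster received no points.
        centroids = [new.get(i, c) for i, c in enumerate(centroids)]
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

The map step is independent per point and the reduce step is a per-cluster sum, which is the structure that makes this workload a natural fit for replicated mapper and reducer units.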
Our programmable accelerators?
Advantages of the design?
● Template-based design to support generality.
● The number of mappers and reducers instantiated can be chosen based on the FPGA card.
● The General Sequencer is a finite state machine whose states can be varied to meet a diverse set of workloads.
● Mappers and reducers can be programmed in C/C++ and synthesized using High-Level Synthesis.
● Supports hardware acceleration of a diverse set of workloads.
How about using a roof-line model?
[Figure annotations: "Used to estimate Arithmetic Intensities"; "Used to generate roof-line model"]
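For readers unfamiliar with the roof-line model: it bounds attainable performance by the smaller of peak compute throughput and memory bandwidth multiplied by arithmetic intensity (operations per byte moved). A minimal sketch, assuming a hypothetical machine with 500 GFLOP/s peak compute and 50 GB/s memory bandwidth; these numbers are illustrative and not measurements from the talk:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Operations performed per byte transferred to or from memory."""
    return flops / bytes_moved


def attainable_gflops(peak_gflops: float, peak_bw_gbs: float, intensity: float) -> float:
    """Roof-line bound: min(compute roof, bandwidth roof * intensity)."""
    return min(peak_gflops, peak_bw_gbs * intensity)


if __name__ == "__main__":
    PEAK_GFLOPS = 500.0  # hypothetical compute roof
    PEAK_BW_GBS = 50.0   # hypothetical memory bandwidth roof
    for ai in (2.0, 10.0, 20.0):
        bound = attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, ai)
        kind = "memory-bound" if bound < PEAK_GFLOPS else "compute-bound"
        print(f"AI={ai:5.1f} FLOP/byte -> {bound:6.1f} GFLOP/s ({kind})")
```

On this hypothetical machine the ridge point sits at 10 FLOP/byte: kernels below it are limited by bandwidth, kernels at or above it by compute.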
Let's show some numbers?
Poor multi-core scalability of Apache Spark
What are the opportunities?
K-means (Km)
What are the challenges?
● How to design the best hybrid CPU + FPGA ML workloads?
● How to attain peak performance on the CPU side?
● How to attain peak performance on the FPGA side?
● How to balance load between CPU and FPGA?
● How to hide communication between JVM and FPGA?
● How to attain peak CAPI bandwidth utilization?
● How to design clever ML workload accelerators using HLS tools?
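As a naive starting point for the load-balancing question, and not something taken from the talk, one can split a batch in proportion to each device's measured throughput so that both finish at roughly the same time. A minimal sketch, assuming a hypothetical helper `split_work` and made-up throughput figures; it ignores JVM-to-FPGA transfer cost, which the list above names as a separate challenge:

```python
from typing import Tuple


def split_work(n_items: int, cpu_rate: float, fpga_rate: float) -> Tuple[int, int]:
    """Return (cpu_items, fpga_items) proportional to items/second of each device."""
    if cpu_rate <= 0 or fpga_rate <= 0:
        raise ValueError("rates must be positive")
    fpga_items = round(n_items * fpga_rate / (cpu_rate + fpga_rate))
    return n_items - fpga_items, fpga_items


if __name__ == "__main__":
    # Hypothetical rates: the FPGA processes 4x as many items per second as the CPU.
    cpu_items, fpga_items = split_work(1000, cpu_rate=1.0, fpga_rate=4.0)
    print(cpu_items, fpga_items)
```

A static split like this only holds if the throughputs are stable; a real system would likely need to re-measure and adjust at run time.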
What High Level Synthesis (Xilinx SDSoC Tool Chain) can do?
[Figure annotations: 20x, 10x]
Things to remember from this talk?
● 3D-stacked memories are overkill for Apache Spark workloads.
● Project Night-King aims at improving the single-node performance of Apache Spark using programmable accelerators near DRAM and NVRAM.
● Conservatively, scale-up servers augmented with near-data accelerators can improve the performance of Spark MLlib by 5x.
● Never trust the 20x speed-up claims being made in the industry. Most of the time, the reference points are wrong!
● The Xilinx SDSoC Tool Chain needs to support pragmas for the map-reduce programming model.
● Performance Characterization and Optimization of In-Memory Data Analytics on a Scale-up Server, PhD thesis, Ahsan Javed Awan (ISBN: 978-91-7729-584-6)
That's all for now?
Email: ajawan@kth.se
Profile: www.kth.se/profile/ajawan/
https://se.linkedin.com/in/ahsanjavedawan
THANK YOU
