Gregory Essertel, Purdue University
Tiark Rompf, Purdue University
Flare and TensorFlare: Native Compilation for Spark and TensorFlow Pipelines
#Res5SAIS
#Res4SAIS
How Fast Is Spark?
Demo
Spark Architecture
Flare: a New Back-end for Spark
Results
Single-Core Running Time: TPC-H
Absolute running time in milliseconds (ms) for Postgres, Spark, HyPer, and Flare at scale factor 10 (SF10)
Apache Parquet Format
Query             Q1     Q2     Q3     Q4     Q5     Q6     Q7     Q8     Q9    Q10    Q11
Spark CSV      16762  12244  21730  19836  19316  12278  24484  17726  30050  29533   5224
Spark Parquet   3728  13520   9099   6083   8706    535  13555   5512  19413  21822   3926
Flare CSV        641    168    757    698    758    568    788    875   1417    854    128
Flare Parquet    187     17    125    127    151     99    183    160    698    309      9

Query            Q12    Q13    Q14    Q15    Q16    Q17    Q18    Q19    Q20    Q21    Q22
Spark CSV      21688   8554  12962  26721  12941  24690  27012  12409  19369  57330   7050
Spark Parquet   5570   7034    719   4506  21834   5176   6757   2681   8562  25089   5295
Flare CSV        701    388    573    551    150   1426   1229    605    792   1868    178
Flare Parquet    133    246     86     88     66    264    181    178    165    324     22
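One way to read the table is as per-query speedup ratios. The sketch below is only an illustration, not the authors' methodology: it copies the Spark Parquet and Flare Parquet rows from the table and divides them query by query. The names `speedups`, `SPARK_PARQUET`, and `FLARE_PARQUET` are hypothetical helpers introduced here.

```python
# Running times copied from the Spark Parquet and Flare Parquet rows of the
# table (Q1..Q22, left to right). Ratios are unit-free.
QUERIES = [f"Q{i}" for i in range(1, 23)]
SPARK_PARQUET = [3728, 13520, 9099, 6083, 8706, 535, 13555, 5512, 19413, 21822, 3926,
                 5570, 7034, 719, 4506, 21834, 5176, 6757, 2681, 8562, 25089, 5295]
FLARE_PARQUET = [187, 17, 125, 127, 151, 99, 183, 160, 698, 309, 9,
                 133, 246, 86, 88, 66, 264, 181, 178, 165, 324, 22]


def speedups(baseline, candidate):
    """Return {query: baseline / candidate}; values above 1 mean candidate is faster."""
    assert len(baseline) == len(candidate) == len(QUERIES)
    return {q: b / c for q, b, c in zip(QUERIES, baseline, candidate)}


parquet_speedup = speedups(SPARK_PARQUET, FLARE_PARQUET)

# Queries with the smallest and largest ratio in this particular table.
smallest = min(parquet_speedup, key=parquet_speedup.get)
largest = max(parquet_speedup, key=parquet_speedup.get)

for q in QUERIES:
    print(f"{q:>3}: {parquet_speedup[q]:7.1f}x")
print(f"smallest ratio: {smallest}, largest ratio: {largest}")
```

With the numbers in the table, the smallest ratio is Q6 (535 / 99, about 5.4x) and the largest is Q2 (13520 / 17, about 795x). The same function can be applied to the CSV rows.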
What about parallelism?
Parallel Scaling Experiment
Scaling up Flare and Spark SQL at SF20
Hardware: Single NUMA machine with 4 sockets, 18 Xeon E5-4657L cores per socket, and 256 GB RAM per socket (1 TB total).
NUMA Optimization
Scaling up Flare at SF100 with NUMA optimizations in different configurations: threads pinned to one, two, and four sockets
Hardware: Single NUMA machine with 4 sockets, 18 Xeon E5-4657L cores per socket, and 256 GB RAM per socket (1 TB total).
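To make "threads pinned to one, two and four sockets" concrete, here is a minimal sketch of how a pinning plan could be computed for a machine with 4 sockets and 18 cores per socket. It rests on loud assumptions that are not from the slides: CPU ids are numbered contiguously per socket (socket 0 owns ids 0 to 17, and so on), threads are spread round-robin across the chosen sockets, and the helper names `cpus_for_sockets` and `pin_plan` are hypothetical. It does not describe how Flare itself pins threads.

```python
SOCKETS = 4            # from the hardware description
CORES_PER_SOCKET = 18  # from the hardware description


def cpus_for_sockets(num_sockets, cores_per_socket=CORES_PER_SOCKET):
    """CPU ids of the first `num_sockets` sockets, one list per socket.

    ASSUMPTION: ids are contiguous per socket. Real machines may interleave
    ids, so a real tool would read the topology from the operating system.
    """
    if not 1 <= num_sockets <= SOCKETS:
        raise ValueError("num_sockets must be between 1 and 4")
    return [list(range(s * cores_per_socket, (s + 1) * cores_per_socket))
            for s in range(num_sockets)]


def pin_plan(num_threads, num_sockets):
    """Map thread index -> CPU id, spreading threads round-robin over sockets."""
    sockets = cpus_for_sockets(num_sockets)
    plan = {}
    for t in range(num_threads):
        socket = t % num_sockets                      # alternate between sockets
        slot = (t // num_sockets) % CORES_PER_SOCKET  # next free core on that socket
        plan[t] = sockets[socket][slot]
    return plan


# Four threads on one, two, and four sockets.
for n in (1, 2, 4):
    print(n, "socket(s):", pin_plan(4, n))
```

On Linux, a CPU set like this could be applied with Python's `os.sched_setaffinity`; the sketch stops at computing the plan so that it runs on any platform.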
Heterogeneous Workloads: UDFs and ML Kernels
TensorFlow -> TensorFlare
TensorFlare architecture
[Architecture diagram. Labels: Flare, SQL Engine, specialized data loading, HDD, TensorFlow Model, TensorFlow Runtime, XLA; one arrow is labeled "produces".]
Demo
flaredata.github.io
Thank You!
Web: flaredata.github.io
Twitter: @flaredata
