Flare and TensorFlare: Native Compilation for Spark and TensorFlow Pipelines with Gregory Essertel and Tiark Rompf

Gregory Essertel, Purdue University
Tiark Rompf, Purdue University
#Res5SAIS

How Fast Is Spark?

Spark Architecture

Flare: a New Back-end for Spark

Single-Core Running Time: TPC-H
Absolute running time in milliseconds (ms) for Postgres, Spark, HyPer, and Flare at scale factor 10 (SF10)

Apache Parquet Format
Query  Spark CSV  Spark Parquet  Flare CSV  Flare Parquet
Q1     16762      3728           641        187
Q2     12244      13520          168        17
Q3     21730      9099           757        125
Q4     19836      6083           698        127
Q5     19316      8706           758        151
Q6     12278      535            568        99
Q7     24484      13555          788        183
Q8     17726      5512           875        160
Q9     30050      19413          1417       698
Q10    29533      21822          854        309
Q11    5224       3926           128        9
Q12    21688      5570           701        133
Q13    8554       7034           388        246
Q14    12962      719            573        86
Q15    26721      4506           551        88
Q16    12941      21834          150        66
Q17    24690      5176           1426       264
Q18    27012      6757           1229       181
Q19    12409      2681           605        178
Q20    19369      8562           792        165
Q21    57330      25089          1868       324
Q22    7050       5295           178        22
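
The per-query ratios implied by the Parquet figures above can be worked out directly. The following is a minimal sketch in Python; the helper name `speedups` and the variable names are illustrative and not part of Flare or Spark. The numbers are copied from the Spark Parquet and Flare Parquet rows for Q1 through Q22.

```python
# Running times copied from the table above (Parquet rows, Q1..Q22).
spark_parquet = [3728, 13520, 9099, 6083, 8706, 535, 13555, 5512, 19413, 21822, 3926,
                 5570, 7034, 719, 4506, 21834, 5176, 6757, 2681, 8562, 25089, 5295]
flare_parquet = [187, 17, 125, 127, 151, 99, 183, 160, 698, 309, 9,
                 133, 246, 86, 88, 66, 264, 181, 178, 165, 324, 22]


def speedups(baseline, candidate):
    """Return {query label: baseline time / candidate time} for Q1..Qn.

    Hypothetical helper for illustration only.
    """
    if len(baseline) != len(candidate):
        raise ValueError("both series must cover the same queries")
    return {
        f"Q{i}": b / c
        for i, (b, c) in enumerate(zip(baseline, candidate), start=1)
    }


parquet_speedups = speedups(spark_parquet, flare_parquet)

# Print one ratio per query, e.g. how many times faster Flare ran Q1.
for query, ratio in parquet_speedups.items():
    print(f"{query}: {ratio:.1f}x")
```

A ratio above 1 means the Flare Parquet time was lower than the Spark Parquet time for that query. The same helper can be applied to the CSV rows.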

What about parallelism?

Parallel Scaling Experiment
Scaling-up Flare and Spark SQL in SF20
Hardware: Single NUMA machine with 4 sockets, 18 Xeon E5-4657L cores per socket, and 256GB RAM per socket (1 TB total).

NUMA Optimization
Scaling-up Flare for SF100 with NUMA optimizations on different configurations: threads pinned to one, two and four sockets
Hardware: Single NUMA machine with 4 sockets, 18 Xeon E5-4657L cores per socket, and 256GB RAM per socket (1 TB total).
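
The "threads pinned to one, two and four sockets" configurations can be illustrated with a short Python sketch. It is not Flare's implementation. It assumes Linux, and it assumes core IDs are numbered contiguously per socket (socket 0 holds cores 0 to 17, and so on), which is not guaranteed on real hardware; the actual mapping should be read from the system (for example with `lscpu`). The names `cores_for_sockets` and `pin_current_process` are hypothetical.

```python
import os

SOCKETS = 4            # from the hardware description above
CORES_PER_SOCKET = 18  # from the hardware description above


def cores_for_sockets(n_sockets, cores_per_socket=CORES_PER_SOCKET):
    """Core IDs covering the first n_sockets sockets.

    Assumption: cores are numbered contiguously, socket by socket.
    """
    if not 1 <= n_sockets <= SOCKETS:
        raise ValueError(f"n_sockets must be between 1 and {SOCKETS}")
    return set(range(n_sockets * cores_per_socket))


def pin_current_process(n_sockets):
    """Restrict the calling process to the chosen sockets' cores (Linux only)."""
    wanted = cores_for_sockets(n_sockets)
    # Only keep cores this process is currently allowed to run on.
    usable = wanted & os.sched_getaffinity(0)
    if not usable:
        raise RuntimeError("none of the requested cores are available")
    os.sched_setaffinity(0, usable)
    return usable
```

Calling `pin_current_process(1)`, `pin_current_process(2)`, or `pin_current_process(4)` would correspond to the one-, two-, and four-socket configurations. CPU affinity alone does not control where memory is allocated, so this covers only the thread-placement half of a NUMA optimization.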

Heterogeneous Workloads: UDFs and ML Kernels

TensorFlow -> TensorFlare

TensorFlare architecture
[Architecture diagram. Labels: Flare, TensorFlow Model, Specialized data loading, TensorFlow Runtime, XLA, HDD, SQL Engine; one edge labeled "produces".]

Thank You!
Web: flaredata.github.io
Twitter: @flaredata
