Lessons Learned
Optimizing NoSQL for
Apache Spark
John Musser @johnmusser / Basho @basho
Spark Summit Europe, 2016
NoSQL
Key-Value
Document
Columnar
Graph
Spark + NoSQL = ?
How do we turn this… into this?
We built this.
Here are our lessons…
Parallelize
Map smart
Optimize all the levels
Be flexible
Simplify
Riak
• Distributed, key-value NoSQL database
• Known for scalability, reliability, ops simplicity
• Launched 2009, used by 1/3 of Fortune 50
• Open source (Apache), on GitHub
https://github.com/basho/riak/
• Enterprise Edition, see multi-cluster replication
We started to see in our customer base…
More demand around:
time series,
IoT,
metrics
Riak TS
Spark-Riak Connector
Key-Value data: Riak KV
Time Series data: Riak TS (released in 2016)
User data
Session data
Profile data
Log data
IoT / Device data
Metrics data
Event data
Streaming data
Riak Core
Riak TS
DDL for tables (with data types)
SQL subset (with filters and aggregations)
Fast bulk writes
Efficient reads via "time slice" queries
Intellicore Sports Data Platform
• 1GB telemetry per driver
• 400 packets/second
• 1.2M packets/race
• Platform setup for 40,000 TPS
Spark Summit EU talk by John Musser
Spark-Riak Connector
• Version 1.0: published Sept. 2015
• Current version: 1.6, published Sept. 2016
• Scala / JVM based
• Support for Java, Scala, Python
• Supports Spark 1.6.x
• Open source (Apache), on GitHub
https://github.com/basho/spark-riak-connector/
READ: Enable SQL analytics over Riak
WRITE: Use Riak to store results generated by Spark
STREAM: Use Riak to store streaming data
(this in turn uses learnings from other connectors we've built...)
Parallelize
whenever possible
How to move lots of data quickly and efficiently?
Using Direct Key-based GETs
Lesson 1(a):
Too many GETs make Spark unhappy
Using Secondary Index (2i)
Lesson 1(b):
Too many 2i queries make Riak unhappy
Coverage Plan + Parallel Extract
Coverage Plan: locations of data across cluster
Parallelization
Coverage Plan + Parallel Extract = everybody's happy
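The coverage plan idea can be sketched in plain Python. This is only an illustration, not the connector's implementation: the `CLUSTER` dictionary, node names and keys are hypothetical stand-ins for a Riak cluster, and a thread pool plays the role of the Spark tasks that each read one slice of the data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a cluster: node -> partition -> key/value pairs.
CLUSTER = {
    "node-a": {0: {"k1": "v1", "k2": "v2"}, 1: {"k3": "v3"}},
    "node-b": {2: {"k4": "v4"}, 3: {"k5": "v5", "k6": "v6"}},
}

def coverage_plan(cluster):
    """List where the data lives: one (node, partition) entry per partition."""
    return [(node, part) for node in sorted(cluster) for part in sorted(cluster[node])]

def extract(cluster, entry):
    """Read every key/value pair from one partition on one node."""
    node, part = entry
    return sorted(cluster[node][part].items())

def parallel_extract(cluster, workers=4):
    """Run one extraction task per coverage entry, concurrently."""
    plan = coverage_plan(cluster)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda entry: extract(cluster, entry), plan)
        return [pair for chunk in chunks for pair in chunk]

records = parallel_extract(CLUSTER)
```

Each task talks to exactly one location, so no single query has to touch the whole cluster and no task issues one GET per key.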
Be smart about data mapping
From: Key-Value & Time Series data (plain text, XML, JSON, binary)
To: RDDs, DataFrames, DataSets
How to map the data as efficiently and seamlessly as possible?
How Riak Stores Data
Bucket Types
Properties: "r":"quorum"
Buckets
MiscBucket
Keys and Values
User123: 122883|dave|…
Item17Z: { "color":"blue", "size":"small", … }
LogoHD
Key-Value Data
Specify Bucket, Load Data, get RDDs
Code: Key/Value Query
import com.basho.riak.client.core.query.Namespace
import com.basho.riak.spark._
val kv_bucket = new Namespace("MiscBucket")
val riakRdd = sc.riakBucket[String](kv_bucket).queryAll()
Query by Keys
val rdd = sc.riakBucket[String](kv_bucket_name)
  .queryBucketKeys("Alice", "Bob", "Charlie")
Query by 2i Range
val rdd = sc.riakBucket[String](kv_bucket_name)
  .query2iRange("myIndex", 1L, 5000L)
Query by 2i Strings
val rdd = sc.riakBucket[String](kv_bucket_name)
  .query2iKeys("dailyDataIndx", "Jan", "Feb")
Which is "fine", but the data is still a bit opaque…
Often this data is stored as JSON
Key-Value Data
Specify Bucket, Map Schema, Load Data
We can tell Spark how to interpret the NoSQL values
Now we have full-fledged DataFrames
Code
Specify Bucket
val kv_bucket = new Namespace("MiscBucket")
Map Schema
case class UserData(user_id: String, name: String, age: Int)
Load Data
import sqlContext.implicits._ // needed for toDF()
val riakRdd = sc.riakBucket[UserData](kv_bucket).queryAll()
val df = riakRdd.toDF()
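What "map schema" buys you can be shown without Spark or Riak. The sketch below is an assumption-laden illustration: the two JSON strings are made-up sample values, and a Python dataclass stands in for the Scala case class. Once the opaque strings are mapped to typed records, they can be filtered by field.

```python
import json
from dataclasses import dataclass

@dataclass
class UserData:
    user_id: str
    name: str
    age: int

def map_value(raw: str) -> UserData:
    """Interpret one opaque JSON value as a typed UserData record."""
    doc = json.loads(raw)
    return UserData(user_id=str(doc["user_id"]), name=str(doc["name"]), age=int(doc["age"]))

# Hypothetical values as they might be stored under two keys in a bucket.
raw_values = [
    '{"user_id": "u1", "name": "dave", "age": 52}',
    '{"user_id": "u2", "name": "ann", "age": 31}',
]

users = [map_value(v) for v in raw_values]
over_50 = [u.name for u in users if u.age >= 50]
```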
Time Series Data
Specify Bucket, Map Schema, Load Data
But time series data already has a schema defined
So let's use automatic schema discovery instead
Specify Table, Load Data, get DataFrames
Time Series Code
Specify Table
val ts_table_name = "test-table"
Load Data
// `from` is a timestamp value defined elsewhere
val df = sqlContext.read
  .option("spark.riak.connection.hosts", "riak_host_ip:10017")
  .format("org.apache.spark.sql.riak")
  .load(ts_table_name)
  .select("time", "col1", "col2")
  .filter(s"time >= CAST($from AS TIMESTAMP)")
Use Data
df.where(df("age") >= 50).select("id", "name")
df.groupBy("age").count
Uses the Spark Data Source API
Optimize all the levels
Optimize all the layers
2 primary interfaces to Riak
HTTP: Flexibility
Protocol Buffers: Performance
Protocol Buffers
• For data serialization and interchange
• Originally developed by Google
• IDL + RPC
• Messages serialized to binary wire format
• Library support for 20+ languages
Note: In Riak, you typically don't have to know the details; the client SDKs take care of it for you
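Why a binary wire format tends to be cheaper than a text one can be seen with a small size comparison. This sketch does not use Protocol Buffers itself: Python's standard `struct` module stands in as a fixed binary layout, and the reading below is a made-up sample.

```python
import json
import struct

# Hypothetical time series reading: epoch milliseconds plus a temperature.
reading = {"time": 1451642700000, "temperature": 21.5}

# Text encoding: field names and digits are spelled out as characters.
as_text = json.dumps(reading).encode("utf-8")

# Binary encoding: one 8-byte integer and one 8-byte double, network byte order.
as_binary = struct.pack("!qd", reading["time"], reading["temperature"])

decoded = struct.unpack("!qd", as_binary)
```

The binary form here is 16 bytes regardless of the values, while the text form grows with field names and digit counts.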
HTTP vs. Protocol Buffers
How much faster?
150-300% faster
This interaction defaults to using Protocol Buffers to optimize performance
What if we can make this faster?
HTTP, Protocol Buffers, Optimized Binary
Spark-Riak Connector dynamically selects based on query type
Protocol Buffers and Optimized Binary
Bulk TS Operations
Other Operations
Fetch, Query, Store
30-50% increased throughput
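Selecting an encoding by operation type can be sketched as a simple dispatch. The mapping below is an assumption, not something the slides spell out: it supposes that bulk time series operations go over the optimized binary path and everything else over Protocol Buffers. The operation names are hypothetical labels, not connector API names.

```python
# Assumption: these labels are illustrative, not real connector identifiers.
BULK_TS_OPERATIONS = frozenset({"ts_fetch", "ts_query", "ts_store"})

def select_encoding(operation: str) -> str:
    """Pick a wire encoding for one operation (assumed mapping, see lead-in)."""
    if operation in BULK_TS_OPERATIONS:
        return "optimized-binary"
    return "protocol-buffers"

choices = {op: select_encoding(op) for op in ("ts_query", "kv_get")}
```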
2 use case-specific optimizations
Full Bucket Reads
Riak KV supports these as an optimization: "Give me all the data in this bucket, and I'll work with it over here in Spark"
Time-based Data Locality
Riak TS uses a time-based 'quanta' to intelligently partition data across the cluster based on user-specified time
Location, location, location
Key/Value cluster vnodes: PUT, GET
Time Series cluster vnodes: local grouping based on time quanta
Write to same vnode
Query direct to data
Riak Time Series SQL
Define table
CREATE TABLE WEATHER (
  region VARCHAR NOT NULL,
  city VARCHAR NOT NULL,
  time TIMESTAMP NOT NULL,
  temperature DOUBLE,
  PRIMARY KEY(
    (region, city, QUANTUM(time, 2, 'h')),
    region, city, time
  )
)
The quantum is the tunable key to performance
Query
SELECT * FROM WEATHER
WHERE city = 'Brussels' AND
  time >= '2016-01-01' AND
  time <= '2016-02-01 00:00:00'
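How a quantum groups writes can be sketched with a little arithmetic. This is an illustration under assumptions, not Riak TS code: it assumes quantum boundaries are aligned to the Unix epoch, and the region value "EU" is a made-up example. Timestamps that fall in the same 2-hour window produce the same partition key, so they land together.

```python
from datetime import datetime, timezone

QUANTUM_MS = 2 * 60 * 60 * 1000  # corresponds to QUANTUM(time, 2, 'h')

def to_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)

def quantum_start(ts_ms: int, quantum_ms: int = QUANTUM_MS) -> int:
    """Round a timestamp down to the start of its quantum (epoch-aligned)."""
    return (ts_ms // quantum_ms) * quantum_ms

def partition_key(region: str, city: str, ts_ms: int):
    """Mirror the partition key shape: (region, city, quantized time)."""
    return (region, city, quantum_start(ts_ms))

t1 = to_ms(datetime(2016, 1, 1, 10, 5, tzinfo=timezone.utc))
t2 = to_ms(datetime(2016, 1, 1, 11, 55, tzinfo=timezone.utc))
t3 = to_ms(datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc))

same_window = partition_key("EU", "Brussels", t1) == partition_key("EU", "Brussels", t2)
next_window = partition_key("EU", "Brussels", t3) != partition_key("EU", "Brussels", t1)
```

A smaller quantum spreads a time range over more partitions; a larger one keeps more rows together, which is the trade-off the quantum tunes.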
Be flexible
Be polyglot
Support multiple languages
Python
Setup
import pyspark
import pyspark_riak
conf = pyspark.SparkConf().setAppName("My Spark Riak App")
conf.set("spark.riak.connection.host", "127.0.0.1:8087")
sc = pyspark.SparkContext(conf=conf)
pyspark_riak.riak_context(sc)
Write
my_data = [{'key0': {'data': 0}}, {'key1': {'data': 1}}]
kv_write_rdd = sc.parallelize(my_data)
kv_write_rdd.saveToRiak('kv_sample_bucket')
Read
kv_read_rdd = sc.riakBucket('kv_sample_bucket').queryAll()
print(kv_read_rdd.collect())
Spark Streaming
Spark-Riak Streaming
Setup
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.basho.riak.spark.streaming._
val ssc = new StreamingContext(sparkConf, Seconds(1))
Stream
val lines = ssc.socketTextStream(serverIP, serverPort)
val errs = lines.filter(line => line.contains("ERROR"))
Save
errs.saveToRiak("test-bucket-4store")
Deployment
On premise
Cloud
Hybrid
Geo-distributed
https://github.com/basho-labs/riak-mesos
Simplify
How to reduce friction?
Connector hosted at Spark-Packages
https://spark-packages.org/package/basho/spark-riak-connector
Tutorial notebook on Databricks.com
https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html
Don't die
Failure Handling
If a Riak node dies during data retrieval, the Spark connector will request an alternative coverage plan
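The recovery step can be sketched as a retry loop. This is an assumption-based illustration, not the connector's code: the `REPLICAS` table, node names and `read_partition` helper are hypothetical, and a dead node is simulated by raising an exception.

```python
class NodeDown(Exception):
    """Raised when a read hits a node that is no longer reachable."""

# Hypothetical replica placement: partition -> nodes holding a copy.
REPLICAS = {0: ["node-a", "node-b"], 1: ["node-b", "node-c"], 2: ["node-c", "node-a"]}

def coverage_plan(replicas, excluded=frozenset()):
    """Pick one live node per partition, skipping nodes known to be down."""
    plan = {}
    for part, nodes in replicas.items():
        alive = [n for n in nodes if n not in excluded]
        if not alive:
            raise RuntimeError(f"no replica left for partition {part}")
        plan[part] = alive[0]
    return plan

def read_partition(node, part, dead_nodes):
    """Simulated read: fails if the chosen node is dead."""
    if node in dead_nodes:
        raise NodeDown(node)
    return f"data-{part}"

def extract_all(replicas, dead_nodes):
    """Read every partition, requesting an alternative plan after a failure."""
    excluded = set()
    plan = coverage_plan(replicas)
    results = {}
    pending = sorted(plan)
    while pending:
        part = pending[0]
        try:
            results[part] = read_partition(plan[part], part, dead_nodes)
            pending.pop(0)
        except NodeDown as err:
            excluded.add(err.args[0])
            plan = coverage_plan(replicas, excluded)  # alternative coverage plan
    return results, plan

results, final_plan = extract_all(REPLICAS, dead_nodes={"node-a"})
```

Only the partitions that were on the failed node are re-routed; reads that already succeeded are kept.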
Next steps for Riak-Spark?
Spark 2.0
DataSets
Structured Streaming
So, back to the question…
Spark + NoSQL = ?
Parallelize
Map smart
Optimize all the levels
Be flexible
Simplify
Thank You
@johnmusser
@basho
Photo Credits
Race car: Spacesuit Media
Intellicore application screenshots: Intellicore, http://www.intellicore.tv/
