Spark vs Storm
Trong-Ton PHAM
trongton@gmail.com

Batch vs Streaming
Spark
• Batch & micro-batch processing
Storm
• Micro-batch & real-time stream processing
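To make the difference between the two processing styles concrete, here is a minimal plain-Python sketch. It is not Spark or Storm code: the function names are hypothetical, and batches are cut by record count for simplicity, whereas a micro-batch engine such as Spark Streaming typically cuts them by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_per_record(records: Iterable[int], handle: Callable[[int], int]) -> Iterator[int]:
    """Record-at-a-time: each record is handled as soon as it arrives."""
    for record in records:
        yield handle(record)


def micro_batches(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Micro-batching: buffer records and release them in small groups."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


stream = range(1, 8)  # stand-in for an unbounded source (assumption)

# One output per input record, produced immediately.
per_record = list(process_per_record(stream, lambda x: x * 10))

# One output per batch, produced only once the batch is full.
batched = [sum(b) for b in micro_batches(stream, batch_size=3)]
```

The per-record path emits a result for every input, which is where the lower latency comes from. The batched path waits for a full group before emitting anything, trading latency for per-batch efficiency.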

Usability
                     Spark                           Storm
Production mode      Since 2013 (UC Berkeley)        Since 2011 (Twitter)
Implemented in       Scala (in-memory processing)    Clojure
API languages        Java, Scala, Python             Java, Scala, Clojure, others
Library components   Spark SQL                       Streams
                     Spark Streaming                 Spouts (read data streams)
                     MLlib (machine learning)        Bolts (filters, joins)
                     GraphX (graph processing)       Topologies
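The Storm components listed above (spouts, bolts, topologies) can be illustrated with a small plain-Python analogy. This is only a sketch and does not use the Storm API: every function name here is hypothetical, and the topology is a simple linear chain, whereas a Storm topology can be an arbitrary directed graph.

```python
from typing import Callable, Iterator, List

Bolt = Callable[[Iterator[str]], Iterator[str]]


def sentence_spout() -> Iterator[str]:
    """Spout: the source that reads a data stream (here, a fixed list)."""
    yield from ["spark streams", "", "storm topologies"]


def drop_empty_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Bolt acting as a filter: discard empty sentences."""
    return (s for s in stream if s)


def split_words_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Bolt acting as a transformation: emit one tuple per word."""
    for sentence in stream:
        yield from sentence.split()


def run_topology(spout: Callable[[], Iterator[str]], bolts: List[Bolt]) -> List[str]:
    """Topology: wire the spout's output through each bolt in order."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


words = run_topology(sentence_spout, [drop_empty_bolt, split_words_bolt])
```

Because each stage is a generator, a record flows through the whole chain before the next one is read, which loosely mirrors how tuples move from spout to bolts one at a time.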

Hadoop compatibility
                           Spark                                 Storm
Data sources               HDFS, HBase, Cassandra                HDFS, HBase, Kafka
Resource manager           YARN, Mesos                           Mesos
Latency                    Few seconds                           < 1 second
Fault tolerance            Exactly once                          At least once
(every record processed)
Reliability                Improved reliability (Spark + YARN)   Guarantees no data loss (Storm + Kafka)
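The practical difference between "at least once" and "exactly once" shows up when a record is replayed after a failure. The sketch below is a plain-Python illustration under stated assumptions: the record ids, the simulated lost acknowledgement, and the id-based deduplication are all hypothetical, and deduplication is only one way to obtain an exactly-once effect, not necessarily how either system implements it.

```python
from typing import List, Set, Tuple

Record = Tuple[int, int]  # (record_id, value)


def deliver_at_least_once(records: List[Record], fail_on_id: int) -> List[Record]:
    """Simulate a lost acknowledgement: the affected record is delivered twice."""
    delivered: List[Record] = []
    for record in records:
        delivered.append(record)
        if record[0] == fail_on_id:
            delivered.append(record)  # ack lost, so the source replays it
    return delivered


def total_naive(delivered: List[Record]) -> int:
    """At-least-once consumer: every delivery is counted, duplicates included."""
    return sum(value for _, value in delivered)


def total_deduplicated(delivered: List[Record]) -> int:
    """Exactly-once effect: skip record ids that were already applied."""
    seen: Set[int] = set()
    total = 0
    for record_id, value in delivered:
        if record_id in seen:
            continue
        seen.add(record_id)
        total += value
    return total


records = [(1, 10), (2, 20), (3, 30)]
delivered = deliver_at_least_once(records, fail_on_id=2)
naive = total_naive(delivered)                # record 2 is counted twice
deduplicated = total_deduplicated(delivered)  # record 2 is counted once
```

No record is lost in either case, which is the at-least-once guarantee. Only the deduplicating consumer produces a total that matches the original input.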

Supported distribution
N/A
Manual configuration needed Supported

Performance
• This is NOT an official benchmark of Spark and Storm performance
System                      Performance
Storm (Twitter)             10,000 records/s/node
Spark Streaming             400,000 records/s/node
Apache S4                   7,000 records/s/node
Other commercial systems    100,000 records/s/node
http://www.cs.duke.edu/~kmoses/cps516/dstream.html
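As a rough reading aid for the figures above, the following sketch computes each system's throughput relative to Storm and the number of nodes a given workload would need. The target rate of 1,000,000 records/s is a hypothetical value chosen for illustration, and the calculation assumes throughput scales linearly with node count, which the table does not claim.

```python
import math

# Figures copied from the table above (records per second per node).
throughput = {
    "Storm (Twitter)": 10_000,
    "Spark Streaming": 400_000,
    "Apache S4": 7_000,
    "Other commercial systems": 100_000,
}

TARGET_RATE = 1_000_000  # hypothetical workload, records per second


def nodes_needed(target_rate: int, per_node_rate: int) -> int:
    """Nodes required, assuming perfectly linear scaling (an assumption)."""
    return math.ceil(target_rate / per_node_rate)


storm_rate = throughput["Storm (Twitter)"]
ratio_vs_storm = {name: rate / storm_rate for name, rate in throughput.items()}
nodes = {name: nodes_needed(TARGET_RATE, rate) for name, rate in throughput.items()}

for name in throughput:
    print(f"{name}: {ratio_vs_storm[name]:.1f}x Storm, {nodes[name]} nodes")
```

With these numbers, Spark Streaming comes out at 40 times Storm's per-node rate. Since the slide itself stresses that this is not an official benchmark, the derived node counts should be read as an illustration only.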
