Storm and Cassandra
Cassandra NYC Meetup 11/5/2013
Jake Luciani (@tjake)
What is Storm?
• Distributed event processor
• Provides constructs to reliably process all events
• Simple conceptual model
• New to Apache Incubator: http://wiki.apache.org/incubator/StormProposal
Storm Concepts
Spout - Collects work and submits it to be processed.
Tracks success or failure of each tuple.

…

Tuple - A collection of data that is passed within Storm.

Bolt - Processes tuples and optionally emits more tuples.
Stream - Identifies outputs from a Spout/Bolt.
Forces tuples to have a declared structure.
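The four concepts above can be sketched in plain Python without any Storm libraries. This is only an illustration under assumptions: `Stream`, `WordSpout`, `LengthBolt`, and `run_words` are hypothetical names and do not mirror Storm's actual API.

```python
from collections import namedtuple


class Stream:
    """Declares the field names every tuple on this stream must carry."""

    def __init__(self, name, fields):
        self.name = name
        self.tuple_type = namedtuple(name, fields)

    def make_tuple(self, *values):
        # Raises TypeError if the values do not match the declared fields.
        return self.tuple_type(*values)


class WordSpout:
    """Collects work and records whether each tuple succeeded or failed."""

    def __init__(self, words, stream):
        self.words = list(words)
        self.stream = stream
        self.status = {}  # tuple id -> "pending" | "acked" | "failed"

    def emit_all(self):
        for tuple_id, word in enumerate(self.words):
            self.status[tuple_id] = "pending"
            yield tuple_id, self.stream.make_tuple(word)

    def ack(self, tuple_id):
        self.status[tuple_id] = "acked"

    def fail(self, tuple_id):
        self.status[tuple_id] = "failed"


class LengthBolt:
    """Processes one tuple and emits a new tuple on its own output stream."""

    def __init__(self, out_stream):
        self.out_stream = out_stream

    def process(self, tup):
        if not tup.word:
            raise ValueError("empty word")
        return self.out_stream.make_tuple(tup.word, len(tup.word))


def run_words(words):
    """Wire one spout to one bolt and report results plus per-tuple status."""
    spout = WordSpout(words, Stream("words", ["word"]))
    bolt = LengthBolt(Stream("lengths", ["word", "length"]))
    results = []
    for tuple_id, tup in spout.emit_all():
        try:
            results.append(bolt.process(tup))
            spout.ack(tuple_id)
        except ValueError:
            spout.fail(tuple_id)
    return results, spout.status
```

Calling `run_words(["storm", "", "cassandra"])` processes two tuples and marks the empty one as failed, which is the bookkeeping the spout is responsible for.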
Storm Topologies
A directed graph of spouts and bolts connected via streams

[Diagram labels: Firehose; A-F, G-P, Q-Z; Host A, Host B, Host C; Zookeeper; Cassandra (optional)]
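A topology as a directed graph can be modeled with a small adjacency list. The sketch below is an assumption-laden illustration: the `Topology` class is hypothetical, and reading the diagram as one "firehose" spout feeding three bolts named after key ranges is a guess at what the slide shows.

```python
from collections import defaultdict, deque


class Topology:
    """Spouts and bolts as nodes, stream subscriptions as directed edges."""

    def __init__(self):
        self.components = {}            # name -> "spout" or "bolt"
        self.edges = defaultdict(list)  # source name -> subscriber names

    def add_spout(self, name):
        self.components[name] = "spout"

    def add_bolt(self, name, subscribes_to):
        self.components[name] = "bolt"
        for source in subscribes_to:
            self.edges[source].append(name)

    def reachable_from_spouts(self):
        """Breadth-first walk along streams, starting at every spout."""
        queue = deque(
            name for name, kind in self.components.items() if kind == "spout"
        )
        seen = list(queue)
        while queue:
            current = queue.popleft()
            for subscriber in self.edges.get(current, []):
                if subscriber not in seen:
                    seen.append(subscriber)
                    queue.append(subscriber)
        return seen


def build_example():
    # Assumed reading of the diagram: one spout, three range-partitioned bolts.
    topology = Topology()
    topology.add_spout("firehose")
    for key_range in ("A-F", "G-P", "Q-Z"):
        topology.add_bolt(key_range, subscribes_to=["firehose"])
    return topology
```

A bolt that never shows up in `reachable_from_spouts()` will never receive a tuple, which makes this a cheap check before deploying.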
Example Topologies

• Track the top 10 most popular links being shared in the last N minutes.
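The counting logic behind such a topology can be sketched in a single process. This is a minimal sketch, not how a real topology would distribute the work across bolts: `TopLinks` is a hypothetical class, and it assumes events arrive with non-decreasing timestamps measured in seconds.

```python
from collections import Counter, deque


class TopLinks:
    """Counts links seen within a sliding time window and reports the top n."""

    def __init__(self, window_seconds, n=10):
        self.window_seconds = window_seconds
        self.n = n
        self.events = deque()    # (timestamp, url), oldest first
        self.counts = Counter()  # url -> occurrences still inside the window

    def record(self, url, timestamp):
        self.events.append((timestamp, url))
        self.counts[url] += 1

    def _expire(self, now):
        # Drop every event that is at least window_seconds old.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, url = self.events.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]

    def top(self, now):
        self._expire(now)
        return self.counts.most_common(self.n)
```

In a Storm topology the same idea is usually split up: tuples are grouped by URL so each counting bolt owns a slice of the keys, and a final bolt merges the partial rankings.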
Where does data end up?
• Storm supports built-in RPC, so client requests can effectively become a spout.
• Put the data into a database…
• Why Cassandra though?
Why Cassandra?

• Cassandra’s data model allows incremental modifications to rows.
• Different bolts can update different parts of a Cassandra row asynchronously.
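The point about asynchronous partial updates can be illustrated with an in-memory stand-in for a table. This is a sketch under assumptions: a Python dict plays the role of Cassandra, and the writer functions and sample values are hypothetical. Because each writer touches different columns, the final row is the same regardless of the order in which the writes land.

```python
from itertools import permutations


def upsert(table, key, **columns):
    """Merge the given columns into the row, leaving other columns untouched."""
    table.setdefault(key, {}).update(columns)


def write_html(table, url):
    upsert(table, url, html="<p>Hello</p>")


def write_text(table, url):
    upsert(table, url, text="Hello")


def write_links(table, url):
    upsert(table, url, outbound_links={"http://example.com/next"})


WRITERS = (write_html, write_text, write_links)


def final_rows(url="http://example.com/"):
    """Apply the writers in every possible order and collect the final row."""
    rows = []
    for order in permutations(WRITERS):
        table = {}
        for writer in order:
            writer(table, url)
        rows.append(table[url])
    return rows
```

In Cassandra itself, two writes to the same column are resolved by write timestamp; writers that own disjoint columns, as here, never conflict.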
Example
StormScraper!
A web crawling system built on Storm + Cassandra
http://github.com/tjake/stormscraper
StormScraper C* DataModel
CREATE TABLE scrape_list (
    url text PRIMARY KEY,
    last_update timestamp,
    depth int
);

CREATE TABLE pages (
    url text,
    scrape_date timestamp,
    title text,
    html text,
    text text,
    inbound_links set<text>,
    outbound_links set<text>,
    PRIMARY KEY (url, scrape_date)
);
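Against the `pages` table, each writer could issue its own partial `UPDATE` keyed by `(url, scrape_date)`. The statements below are an assumption about how such writes might look, not the statements StormScraper actually uses; the helper names are hypothetical, and no Cassandra driver is involved. The sketch only builds statement text with `?` bind markers plus the matching parameters.

```python
from datetime import datetime, timezone

KEY_CLAUSE = "WHERE url = ? AND scrape_date = ?"


def html_update(url, scrape_date, title, html):
    # Touches only the title and html columns of the row.
    return (
        f"UPDATE pages SET title = ?, html = ? {KEY_CLAUSE}",
        (title, html, url, scrape_date),
    )


def text_update(url, scrape_date, text):
    # Touches only the extracted-text column.
    return (
        f"UPDATE pages SET text = ? {KEY_CLAUSE}",
        (text, url, scrape_date),
    )


def links_update(url, scrape_date, links):
    # Adds to the set column instead of overwriting it.
    return (
        f"UPDATE pages SET outbound_links = outbound_links + ? {KEY_CLAUSE}",
        (set(links), url, scrape_date),
    )


def statements_for_page(url, scrape_date):
    """Three independent writes that together fill in one row."""
    return [
        html_update(url, scrape_date, "Example", "<p>Hello</p>"),
        text_update(url, scrape_date, "Hello"),
        links_update(url, scrape_date, ["http://example.com/next"]),
    ]
```

An `UPDATE` in CQL creates the row if it does not exist yet, so none of the writers has to wait for another one to insert it first.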
StormScraper Topology
[Diagram, built up over successive slides: Url Spout, Scraper Bolt, Html Writer, Link Writer, Text Extraction Bolt, Text Writer, and Cassandra. The final slides add a "Fail" label.]
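The "Fail" path in the diagram ties back to the spout tracking success or failure of each tuple. Below is a simplified stand-in for that behavior, not Storm's actual acking mechanism: `UrlSpout`, `drain`, `make_flaky_scraper`, and the `max_attempts` limit are all hypothetical and exist only for illustration.

```python
class UrlSpout:
    """Hands out URLs and re-queues any URL whose processing failed."""

    def __init__(self, urls, max_attempts=3):
        self.queue = list(urls)
        self.max_attempts = max_attempts  # assumed retry limit
        self.attempts = {}                # url -> number of times emitted
        self.done = []
        self.dropped = []

    def next_url(self):
        if not self.queue:
            return None
        url = self.queue.pop(0)
        self.attempts[url] = self.attempts.get(url, 0) + 1
        return url

    def ack(self, url):
        self.done.append(url)

    def fail(self, url):
        if self.attempts[url] < self.max_attempts:
            self.queue.append(url)  # replay later
        else:
            self.dropped.append(url)


def drain(spout, process):
    """Run every URL through process(), acking or failing each attempt."""
    while True:
        url = spout.next_url()
        if url is None:
            break
        try:
            process(url)
            spout.ack(url)
        except OSError:
            spout.fail(url)


def make_flaky_scraper(fail_once):
    """Build a fake scraper that fails the first attempt for selected URLs."""
    already_failed = set()

    def scrape(url):
        if url in fail_once and url not in already_failed:
            already_failed.add(url)
            raise OSError("simulated fetch failure")

    return scrape
```

With `UrlSpout(["a", "b", "c"])` and a scraper that fails `"b"` once, `"b"` is replayed after `"c"` and succeeds on its second attempt.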
Code Walkthrough
http://github.com/tjake/stormscraper
Storm Summary

• Powerful
• But easy to make mistakes
    • Wrong tuple expectation, names, types
    • Bad topology wiring
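Both kinds of mistake can be caught before deployment with a small consistency check. The following is a sketch under assumptions: `check_wiring` and the component and field names are hypothetical, and the check is not something Storm provides in this form.

```python
def check_wiring(declared_outputs, subscriptions):
    """Report bolts that read from unknown sources or undeclared fields.

    declared_outputs: component name -> list of field names it emits
    subscriptions: bolt name -> (source component, fields the bolt reads)
    """
    problems = []
    for bolt, (source, needed) in subscriptions.items():
        if source not in declared_outputs:
            problems.append(f"{bolt}: unknown source '{source}'")
            continue
        missing = [f for f in needed if f not in declared_outputs[source]]
        if missing:
            problems.append(f"{bolt}: '{source}' does not emit {missing}")
    return problems


# Example wiring with two deliberate mistakes: a misspelled source name
# and a field ("body") that the scraper never declares.
DECLARED = {
    "url_spout": ["url"],
    "scraper": ["url", "html"],
}
SUBSCRIPTIONS = {
    "scraper": ("url_spout", ["url"]),
    "html_writer": ("scraper", ["url", "body"]),
    "text_writer": ("scrapper", ["url"]),
}
```

Running `check_wiring(DECLARED, SUBSCRIPTIONS)` flags `html_writer` and `text_writer` and leaves the correctly wired `scraper` alone.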
Thank You!
Q&A?
