Mariusz Gil

Data streams processing with STORM
data expires fast. very fast
Streams processing with Storm
realtime processing?
Storm is a free and open source distributed realtime
computation system. Storm makes it easy to reliably
process unbounded streams of data, doing for realtime
processing what Hadoop did for batch processing.
Storm is fast: a benchmark clocked it at over a million
tuples processed per second per node. It is scalable,
fault-tolerant, guarantees your data will be processed,
and is easy to set up and operate.
concept architecture
tuple

(val1, val2)
(val3, val4)
(val5, val6)

Stream

unbounded sequence of tuples
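
One way to picture a stream is as a lazy iterator of tuples that never ends. The sketch below is plain Python, not Storm API; `sentence_stream` and its sample sentences are hypothetical names used only for illustration.

```python
import itertools
from typing import Iterator, Tuple


def sentence_stream() -> Iterator[Tuple[int, str]]:
    """Yield (sequence_number, sentence) tuples forever (hypothetical source)."""
    sentences = ["the cow jumped over the moon", "four score and seven years ago"]
    for seq in itertools.count():
        yield (seq, sentences[seq % len(sentences)])


# An unbounded stream can only be consumed a slice at a time.
first_three = list(itertools.islice(sentence_stream(), 3))
print(first_three)
```

Because the generator never terminates, a consumer has to take a bounded slice or process tuples one by one as they arrive.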

Spouts

source of streams

Reliable and unreliable Spouts
replay or forget about the tuple
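
The difference between the two kinds of spout can be sketched in plain Python. This is a simplified model under stated assumptions, not Storm's implementation: the class names, the in-memory `pending` map and the choice to give a replayed tuple a fresh message id are all hypothetical.

```python
from collections import deque


class ReliableSpout:
    """Remembers every emitted tuple until it is acked; replays it on failure."""

    def __init__(self, messages):
        self.queue = deque(messages)
        self.pending = {}   # message id -> tuple still waiting for an ack
        self.next_id = 0

    def next_tuple(self):
        if not self.queue:
            return None
        msg = self.queue.popleft()
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        # Fully processed: safe to forget.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing failed or timed out: queue the tuple for replay.
        msg = self.pending.pop(msg_id, None)
        if msg is not None:
            self.queue.append(msg)


class UnreliableSpout:
    """Emits each tuple once and forgets it immediately."""

    def __init__(self, messages):
        self.queue = deque(messages)

    def next_tuple(self):
        return self.queue.popleft() if self.queue else None


spout = ReliableSpout(["a", "b"])
id_a, _ = spout.next_tuple()
id_b, _ = spout.next_tuple()
spout.ack(id_a)    # "a" is done
spout.fail(id_b)   # "b" must be replayed
replayed = spout.next_tuple()
print(replayed)
```

The unreliable variant has nothing to replay because it keeps no record of what it emitted.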

Spouts

source of streams

Storm-Kafka
Storm-Kestrel
Storm-AMQP-Spout
Storm-JMS
Storm-PubSub*
Storm-Beanstalkd-Spout

Bolts

process input streams and produce new streams
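
A bolt can be thought of as a function from one stream to another. The following is a minimal plain-Python sketch, not Storm API; `min_length_bolt` and `exclaim_bolt` are hypothetical names, the latter loosely modelled on the ExclamationBolt from the sample code.

```python
from typing import Iterable, Iterator, Tuple

WordTuple = Tuple[str]


def min_length_bolt(stream: Iterable[WordTuple], min_len: int) -> Iterator[WordTuple]:
    """Filtering bolt: pass on only words with at least min_len characters."""
    for (word,) in stream:
        if len(word) >= min_len:
            yield (word,)


def exclaim_bolt(stream: Iterable[WordTuple]) -> Iterator[WordTuple]:
    """Transforming bolt: emit one new tuple for every input tuple."""
    for (word,) in stream:
        yield (word + "!!!",)


words = [("storm",), ("is",), ("fast",)]
result = list(exclaim_bolt(min_length_bolt(words, 4)))
print(result)
```

Chaining the two generators shows how the output stream of one bolt becomes the input stream of the next.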

Topologies

network of spouts and bolts

TextSpout -> [sentence] -> SplitSentenceBolt -> [word] -> WordCountBolt -> [word, count]

(larger example: two TextSpouts, two SplitSentenceBolts, a WordCountBolt and an xyzBolt)
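
The word-count topology from the diagram can be imitated in a few lines of plain Python. This is only a single-process sketch with hypothetical function names and sample sentences; a real topology runs each component as many parallel tasks across the cluster.

```python
from collections import Counter


def text_spout():
    """Source of [sentence] tuples (finite here so the example terminates)."""
    for sentence in ["the cow jumped over the moon", "the moon is bright"]:
        yield (sentence,)


def split_sentence_bolt(stream):
    """[sentence] -> [word]"""
    for (sentence,) in stream:
        for word in sentence.split(" "):
            yield (word,)


def word_count_bolt(stream):
    """[word] -> [word, count], emitting the running count after every word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Wire the components together: spout -> split -> count.
topology = word_count_bolt(split_sentence_bolt(text_spout()))
final_counts = dict(topology)  # later tuples overwrite earlier running counts
print(final_counts)
```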
servers architecture
Nimbus

process responsible for distributing processing across the cluster
Supervisors

worker process responsible for executing a subset of a topology
ZooKeeper

coordination layer between Nimbus and Supervisors
fail fast
CLUSTER STATE IS STORED
LOCALLY OR IN ZOOKEEPERS
sample code
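// Imports assume Storm 1.0 or later; earlier releases used the backtype.storm.* packages.
import java.util.Map;
import java.util.Random;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    Random _rand;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _rand = new Random();
    }

    // Called repeatedly by Storm; emits one random sentence about every 100 ms.
    @Override
    public void nextTuple() {
        Utils.sleep(100);
        String[] sentences = new String[] {
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away",
            "four score and seven years ago",
            "snow white and the seven dwarfs",
            "i am at two with nature"};
        String sentence = sentences[_rand.nextInt(sentences.length)];
        _collector.emit(new Values(sentence));
    }

    // Tuples are emitted without a message id, so Storm does not track them
    // and ack/fail stay empty (an unreliable spout).
    @Override
    public void ack(Object id) {
    }

    @Override
    public void fail(Object id) {
    }

    // The single output field is named "word" although each value is a whole sentence.
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

Spouts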
public static class WordCount extends BaseBasicBolt {
    // Running totals, kept in memory by each task of this bolt.
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        // Emit the updated count for this word.
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}

Bolts
public static class ExclamationBolt implements IRichBolt {
    OutputCollector _collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    public void execute(Tuple tuple) {
        // Anchor the new tuple to the input tuple, then ack the input.
        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
        _collector.ack(tuple);
    }

    public void cleanup() {
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    public Map getComponentConfiguration() {
        return null;
    }
}

Bolts
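// Imports assume Storm 1.0 or later; earlier releases used the backtype.storm.* packages.
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // The last argument of setSpout/setBolt is the parallelism hint.
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        // Sentences are distributed randomly across the "split" tasks.
        builder.setBolt("split", new SplitSentence(), 8)
               .shuffleGrouping("spout");
        // The same word always goes to the same "count" task.
        builder.setBolt("count", new WordCount(), 12)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
            // Run on a cluster under the name given on the command line.
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Run in-process for ten seconds, then shut down.
            conf.setMaxTaskParallelism(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}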

Topology
public static class SplitSentence extends ShellBolt implements IRichBolt {
    public SplitSentence() {
        // Delegate the processing to a Python script.
        super("python", "splitsentence.py");
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

# splitsentence.py
import storm

class SplitSentenceBolt(storm.BasicBolt):
    def process(self, tup):
        words = tup.values[0].split(" ")
        for word in words:
            storm.emit([word])

SplitSentenceBolt().run()

Bolts
github.com/nathanmarz/storm-starter
stream grouping
Grouping

shuffle, fields, all, global, none, direct, local-or-shuffle
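
A grouping decides which task of the consuming bolt receives each tuple. The sketch below models four of the groupings listed above in plain Python. It is an illustration under stated assumptions, not Storm's code: the function names are hypothetical and CRC32 stands in for whatever hash Storm applies to the grouping fields.

```python
import random
import zlib
from typing import List, Sequence


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    """Pick a random task so that load is spread evenly."""
    return rng.randrange(num_tasks)


def fields_grouping(tup: Sequence, field_index: int, num_tasks: int) -> int:
    """Hash the grouping field so that equal values always reach the same task."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


def all_grouping(num_tasks: int) -> List[int]:
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping() -> int:
    """Send the whole stream to a single task (the one with the lowest id)."""
    return 0


rng = random.Random(42)
shuffle_targets = [shuffle_grouping(8, rng) for _ in range(100)]
task_a = fields_grouping(("storm", 1), 0, 12)
task_b = fields_grouping(("storm", 7), 0, 12)
print(task_a == task_b)
```

The fields grouping is what makes the word count correct: every occurrence of a word reaches the task that holds that word's counter.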
distributed rpc
Distributed RPC: arguments -> [request-id, arguments] -> [request-id, results] -> results
public static class ExclaimBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Field 0 is the request id, field 1 holds the arguments.
        String input = tuple.getString(1);
        collector.emit(new Values(tuple.getValue(0), input + "!"));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "result"));
    }
}

public static void main(String[] args) throws Exception {
    LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
    builder.addBolt(new ExclaimBolt(), 3);

    // Default configuration for the local test cluster.
    Config conf = new Config();
    LocalDRPC drpc = new LocalDRPC();
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));

    System.out.println("Results for 'hello':" + drpc.execute("exclamation", "hello"));

    cluster.shutdown();
    drpc.shutdown();
}
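
The request flow can be mimicked without a cluster. The sketch below is plain Python with hypothetical names (`drpc_execute`, `exclaim_bolt`); it only shows how a request id travels with the arguments and comes back with the result, and assumes nothing about how Storm's DRPC server is implemented.

```python
import itertools
from typing import Callable, List, Tuple

Request = Tuple[int, str]
_request_ids = itertools.count(1)


def exclaim_bolt(tup: Request) -> Request:
    """[request-id, arguments] -> [request-id, results]"""
    request_id, argument = tup
    return (request_id, argument + "!")


def drpc_execute(argument: str, bolts: List[Callable[[Request], Request]]) -> str:
    """Tag the arguments with a fresh request id, run the bolts, return the result."""
    request_id = next(_request_ids)
    tup = (request_id, argument)
    for bolt in bolts:
        tup = bolt(tup)
    result_id, result = tup
    if result_id != request_id:
        raise RuntimeError("result does not belong to this request")
    return result


answer = drpc_execute("hello", [exclaim_bolt])
print("Results for 'hello':", answer)
```

Carrying the request id through every tuple is what lets the caller match a result to its request when many requests are in flight at once.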
realtime analytics
personalization
search
revenue optimization
monitoring
content search
realtime analytics
generating feeds
integrated with Elasticsearch, HBase, Hadoop and HDFS
realtime scoring
moments generation
integrated with Kafka queues and HDFS storage
Storm-YARN enables Storm applications to utilize the computational
resources in a Hadoop cluster along with accessing Hadoop storage
resources such as HBase and HDFS
thanks!
mail: mariusz@mariuszgil.pl
twitter: @mariuszgil
