Spark Streaming
Much easier than Storm
Replaces Storm spouts/bolts with Akka Actors
Better API (time is part of the API) and better integration
Hadoop 2.3/Spark 0.9.1
Sbt setup

Create a separate sbt project; sbt run

Includes the jars and sets the classpath
− Batch and Streaming,
http://spark.apache.org/docs/latest/quick-start.html
− Create a project directory
− Add dependencies; sbt is a "Scala-ized" Maven

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"

scalaVersion := "2.10.3"
Manage the sbt/Scala versions locally
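The dependency lines above can be collected into one build.sbt. The following is only a sketch for the versions named on the first slide (Spark 0.9.1, Hadoop 2.3.0); the project name and version are hypothetical, and the Kafka and Twitter modules are included on the assumption that the later demos need them.

```scala
// build.sbt: a sketch, not the author's actual build file
name := "spark-streaming-demo"   // hypothetical project name

version := "0.1"                 // hypothetical project version

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (_2.10) to the artifact name
  "org.apache.spark"  %% "spark-core"              % "0.9.1",
  "org.apache.spark"  %% "spark-streaming"         % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-kafka"   % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-twitter" % "0.9.1",
  "org.apache.hadoop" %  "hadoop-client"           % "2.3.0"
)
```

With this file in the project directory, sbt run resolves the jars and sets the classpath before starting the main class.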
Maven setup

Run the demo using Maven/Eclipse

Easier: use Maven Central to find jars/artifacts

Add the external libs to the local Maven repo
and run mvn package in the Spark source distribution

Eclipse: add the Scala Nature; import as a Maven project
Demo

Connect to a Twitter stream and process it
− Test the Twitter4j connection with Java first. Print out a
Twitter stream

Batch mode: end with sc.stop(). Real-time streaming: end with
stream.awaitTermination().

DStream/Scala lazy evaluation
− Create a stream using #::, like the recursive List
operator ::, e.g. ("#iphone",1) #:: ("#android",3) #:: ("#apple",10) #:: Stream.empty.
Unlike a List, head and tail behave differently: head is a
strict val, while tail is evaluated lazily.
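The head/tail difference can be seen with a plain Scala Stream, without Spark. This is a small sketch: the object name, the tag helper, and the evaluated counter are made up for illustration, and the hashtag data is taken from the slide. It targets the Scala 2.10 Stream type; Scala 2.13 deprecates Stream in favor of LazyList.

```scala
object LazyHashtagStream {
  // Counts how many elements have actually been computed so far
  var evaluated = 0

  // Hypothetical helper: builds one (hashtag, count) pair and records the evaluation
  def tag(name: String, count: Int): (String, Int) = {
    evaluated += 1
    (name, count)
  }

  // #:: evaluates its left operand (the head) immediately,
  // but takes the rest of the stream (the tail) by name
  val tags: Stream[(String, Int)] =
    tag("#iphone", 1) #:: tag("#android", 3) #:: tag("#apple", 10) #:: Stream.empty

  def main(args: Array[String]): Unit = {
    println("evaluated after construction: " + evaluated)
    println("second element: " + tags.tail.head)
    println("evaluated after tail.head: " + evaluated)
    println("all elements: " + tags.toList)
    println("evaluated after toList: " + evaluated)
  }
}
```

Only the first pair is computed when tags is defined; each further element is computed the first time it is reached and is then cached.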
Spark Streams

StreamingContext.start() starts the scheduler
− JobScheduler.scala: starts the JobGenerator and runs the
generated jobs in a thread pool
− JobGenerator.scala: starts the event actor and the checkpoint
writer, for each thread

Storage:
− The DStream appends to the BlockGenerator
− BlockGenerator.scala: Spark BlockGenerator with 2
threads. On termination, it waits for the block-push thread to
join.
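The two-thread pattern described for the BlockGenerator can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's BlockGenerator source: the class name ToyBlockGenerator, its parameters, and the pushBlock callback are all hypothetical. One thread cuts the buffer into blocks on a timer, a second thread pushes finished blocks, and stop() waits for the push thread to join.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import scala.collection.mutable.ArrayBuffer

// Toy model only: NOT Spark's BlockGenerator source. All names are hypothetical.
class ToyBlockGenerator[T](blockIntervalMs: Long, pushBlock: List[T] => Unit) {
  private var buffer = new ArrayBuffer[T]
  private val blocks = new LinkedBlockingQueue[List[T]]()
  @volatile private var timerStopped = false
  @volatile private var pushStopped = false

  // Thread 1: every blockIntervalMs, turn the current buffer into a block
  private val timerThread = new Thread(new Runnable {
    def run(): Unit = while (!timerStopped) {
      Thread.sleep(blockIntervalMs)
      cutBlock()
    }
  })

  // Thread 2: hand finished blocks to storage (here, the pushBlock callback)
  private val pushThread = new Thread(new Runnable {
    def run(): Unit = while (!pushStopped || !blocks.isEmpty) {
      val block = blocks.poll(10, TimeUnit.MILLISECONDS)
      if (block != null) pushBlock(block)
    }
  })

  // Called by the receiving side for every incoming record
  def append(item: T): Unit = synchronized { buffer += item }

  private def cutBlock(): Unit = synchronized {
    if (buffer.nonEmpty) {
      blocks.put(buffer.toList)
      buffer = new ArrayBuffer[T]
    }
  }

  def start(): Unit = { timerThread.start(); pushThread.start() }

  // Stop the timer, flush the last partial block, then wait for the push thread to join
  def stop(): Unit = {
    timerStopped = true
    timerThread.join()
    cutBlock()
    pushStopped = true
    pushThread.join()
  }
}
```

Joining the push thread last means every record appended before stop() is delivered to pushBlock, in order, before stop() returns.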
Kafka Streaming Demo

KafkaUtils/Consumer connection

I0Itec (ZkClient) connection lib

Need to add more features/testing for faults

Read the source to see how to fill out the params

Start ZooKeeper, start a producer, define a
topic, etc.
Send data from the producer
Demo Output showing console
producer to Spark Consumer
Producer/Executor
Match the broker-id in the server conf file with
groupID in the consumer call
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}
Producer
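Use awaitTermination() to keep the application running so you
can see what you enter into the producer; start
w/1 executor
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val stream = new StreamingContext("local[2]", "TestObject", Seconds(1))
// arguments: context, ZooKeeper quorum, consumer group ID, topic -> number of consumer threads
val kafkaMessages =
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
// create 5 input streams, one receiver each
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}
kafkaMessages.print()
stream.start()
stream.awaitTermination()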
