Stream Processing
in the Cloud
Rafał Leszko (@RafalLeszko)
Cloud Software Engineer at Hazelcast
Hands Up
Raise your hand if…
● ...you know what Stream Processing is?
● ...you have ever used Stream Processing?
● ...you have ever used Hazelcast Jet?
Agenda
● Part 1: Stream Processing Basics
○ What is Stream Processing and Hazelcast Jet?
○ Example: Word Count
● Part 2: Jet Under the Hood
○ How does it work?
○ Infinite Streams
○ Example: Twitter Cryptocurrency Analysis
● Part 3: Jet in the Cloud
○ Cloud (Kubernetes) integration
○ Example: Stock Trade Aggregator
● Part 4: Jet Features & Use Cases
○ Why would I need it?
○ Example: Web Crawler
Part 1: Stream Processing Basics
What is Hazelcast?
Products:
What is Hazelcast Jet?
DAG - Directed Acyclic Graph
Example 1: Word Count
Problem:
Count the number of occurrences of each word in the given text.
Sample Input:
Lorem ipsum dolor, dolor.
Sample Output:
lorem=1
ipsum=1
dolor=2
Example 1: Word Count
Pure Java
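// Assumes static imports of groupingBy, counting (Collectors) and identity (Function)
Pattern delimiter = Pattern.compile("\\W+");   // split on runs of non-word characters
return lines.entrySet().stream()
        .map(e -> e.getValue().toLowerCase())
        .flatMap(t -> Arrays.stream(delimiter.split(t)))
        .filter(word -> !word.isEmpty())
        .collect(groupingBy(identity(), counting()));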
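The snippet above is a fragment, so here is a minimal self-contained sketch of the same idea that can be run against the sample input. The class name WordCountSketch and the method countWords are made up for illustration, and the input is a plain list of lines rather than the map of lines used on the slide. A LinkedHashMap is used only so the output keeps first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the talk or of Hazelcast Jet.
final class WordCountSketch {
    // Runs of non-word characters (spaces, commas, dots) separate the words.
    private static final Pattern DELIMITER = Pattern.compile("\\W+");

    // Counts how often each lower-cased word occurs across all lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .map(String::toLowerCase)
                .flatMap(line -> Arrays.stream(DELIMITER.split(line)))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints lorem=1, ipsum=1, dolor=2 (one entry per line).
        countWords(List.of("Lorem ipsum dolor, dolor."))
                .forEach((word, count) -> System.out.println(word + "=" + count));
    }
}
```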
Example 1: Word Count
Hazelcast Jet
Pattern delimiter = Pattern.compile("\\W+");   // split on runs of non-word characters
Pipeline pipeline = Pipeline.create();
pipeline.drawFrom(Sources.<Long, String>map(LINES))
        .map(e -> e.getValue().toLowerCase())
        .flatMap(t -> traverseArray(delimiter.split(t)))
        .filter(word -> !word.isEmpty())
        .groupingKey(wholeItem())
        .aggregate(counting())
        .drainTo(Sinks.map(COUNTS));
return pipeline;
Example 1: Word Count
Demo:
https://github.com/hazelcast/hazelcast-jet-code-samples
Part 2: Jet Under the Hood
How does it work?
Under the Hood:
● Generate DAG representation from Pipeline
● Serialize DAG
● Send DAG to every Node
● Deserialize DAG
● Execute DAG on each Node
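The steps above can be sketched in plain Java. This is a toy model, not Jet's implementation: the Dag class, its vertex names, and the use of standard Java serialization are assumptions made for illustration, and the "nodes" are simulated by a loop in a single JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these types belong to Hazelcast Jet.
final class DagRoundTripSketch {
    // A DAG reduced to vertex names and directed edges between them.
    static final class Dag implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, List<String>> edges = new LinkedHashMap<>();

        Dag edge(String from, String to) {
            edges.putIfAbsent(from, new ArrayList<>());
            edges.putIfAbsent(to, new ArrayList<>());
            edges.get(from).add(to);
            return this;
        }

        // Topological order (Kahn's algorithm): a vertex runs after all its inputs.
        List<String> executionOrder() {
            Map<String, Integer> inDegree = new LinkedHashMap<>();
            edges.keySet().forEach(v -> inDegree.put(v, 0));
            edges.values().forEach(ts -> ts.forEach(t -> inDegree.merge(t, 1, Integer::sum)));
            Deque<String> ready = new ArrayDeque<>();
            inDegree.forEach((v, d) -> { if (d == 0) ready.add(v); });
            List<String> order = new ArrayList<>();
            while (!ready.isEmpty()) {
                String v = ready.remove();
                order.add(v);
                for (String t : edges.get(v)) {
                    if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
                }
            }
            if (order.size() != edges.size()) throw new IllegalStateException("graph has a cycle");
            return order;
        }
    }

    // Step 2: serialize the DAG into bytes that could be sent over the network.
    static byte[] serialize(Dag dag) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(dag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Step 4: each node rebuilds its own copy of the DAG from the bytes.
    static Dag deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Dag) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Step 1: a DAG shaped like the word count pipeline.
        Dag dag = new Dag().edge("source", "tokenize").edge("tokenize", "count").edge("count", "sink");
        byte[] wire = serialize(dag);
        // Steps 3 and 5: every (simulated) node receives the bytes and runs its copy.
        for (String node : List.of("node-1", "node-2")) {
            System.out.println(node + " runs " + deserialize(wire).executionOrder());
        }
    }
}
```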
Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019
Infinite Streams
Examples:
● Currency Exchange Rates
● Tweets from Twitter
● Events in some Event-Based system
● ...
Windowing
pipeline.drawFrom(...)
        .withNativeTimestamps(0)
        .window(sliding(30_000, 10_000))
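To make sliding(30_000, 10_000) concrete: with a 30-second window that slides every 10 seconds, each event falls into three overlapping windows. The sketch below computes those windows in plain Java. It assumes half-open windows [end - size, end) whose ends are aligned to multiples of the slide; the class and method names are made up for illustration and are not Jet API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hazelcast Jet.
final class SlidingWindowSketch {
    // Returns the end timestamp of every window [end - size, end) containing the event.
    static List<Long> windowEndsFor(long timestamp, long size, long slide) {
        List<Long> ends = new ArrayList<>();
        // Earliest window end that is strictly greater than the event timestamp.
        long firstEnd = Math.floorDiv(timestamp, slide) * slide + slide;
        // Keep sliding forward while the window start is still at or before the event.
        for (long end = firstEnd; end - size <= timestamp; end += slide) {
            ends.add(end);
        }
        return ends;
    }

    public static void main(String[] args) {
        // An event at 25 s belongs to the windows ending at 30 s, 40 s and 50 s.
        System.out.println(windowEndsFor(25_000, 30_000, 10_000)); // [30000, 40000, 50000]
    }
}
```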
Example 2: Twitter Cryptocurrency Analysis
Problem:
Present sentiment about cryptocurrencies in real time
Input:
Tweets streamed from Twitter and categorized by coin type (BTC, ETC, XRP, etc.)
Output:
Tweet sentiment (last 30 seconds, last minute, last 5 minutes)
Example 2: Twitter Cryptocurrency Analysis
Demo:
https://jet.hazelcast.org/demos/
Part 3: Jet in the Cloud
Jet in the Cloud: discovery plugins
Jet in the Cloud: deploying on k8s
$ helm install stable/hazelcast-jet
$ kubectl scale <name> --replicas=6
Example 3: Stock Trade Aggregator
Problem:
Present in real-time the aggregated trade price of stocks
Input:
Stock trades with name and price
Output:
Sum of prices per stock name
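The aggregation itself, a sum of prices grouped by stock name, can be sketched on a finite batch in plain Java. This is not the demo's code: the Trade class, its fields, and the choice of whole-number prices are assumptions made for illustration, and a real streaming job would keep updating the sums as trades arrive.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical types for illustration; not taken from the demo repository.
final class TradeAggregatorSketch {
    // A trade reduced to the two fields named in the input description.
    static final class Trade {
        private final String name;
        private final long price;

        Trade(String name, long price) {
            this.name = name;
            this.price = price;
        }

        String name() { return name; }
        long price() { return price; }
    }

    // Groups trades by stock name and sums the prices within each group.
    static Map<String, Long> sumByName(List<Trade> trades) {
        return trades.stream().collect(Collectors.groupingBy(
                Trade::name, TreeMap::new, Collectors.summingLong(Trade::price)));
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("AAPL", 100), new Trade("MSFT", 70), new Trade("AAPL", 50));
        System.out.println(sumByName(trades)); // {AAPL=150, MSFT=70}
    }
}
```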
Example 3: Stock Trade Aggregator
Demo:
https://github.com/hazelcast/hazelcast-jet-code-samples/tree/master/integration/kubernetes
Part 4: Jet Features & Use Cases
Jet Features
Categories of Features
● Easy to Use
● Performance
Jet Features: Performance
Jet Features: other features
Why would I need it?
● Big Data Projects
● Speed up Everything
Example 4: Web Crawler
Problem:
Parse all blog posts from the webpage
Input:
URL of Blog Trips
Output:
All the content from the Blog
Example 4: Web Crawler
Demo:
https://github.com/leszko/geodump
Thank You!
