BASEL | BERN | BRUGG | BUCHAREST | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I.BR. | GENEVA
HAMBURG | COPENHAGEN | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH
http://guidoschmutz.wordpress.com  @gschmutz
Streaming Visualization
DOAG Konferenz 2019
Guido Schmutz
Agenda
1. Motivation / Introduction
2. Stream Data Integration & Stream Analytics Ecosystem
3. Three Blueprints for Streaming Visualization
End-to-End Demo available here:
https://github.com/gschmutz/various-demos/tree/master/streaming-visualization
Guido
Working at Trivadis for more than 22 years
Consultant, Trainer, Platform Architect for Java, Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE Director
@gschmutz guidoschmutz.wordpress.com
175th
edition
Motivation / Introduction
Timely decisions require new data immediately
Keep the data in motion …
[Diagram: Data at Rest (Store, Visualize/Analyze, (Re)Act) vs. Data in Motion (Analyze, Visualize, Act, Store)]
Reference Architecture for Data Analytics Solutions
[Diagram, labels recovered from the slide: Bulk Source (DB Extract, File, DB) and Event Source (Location, IoT Data, Mobile Apps, Social, Telemetry); Data Flow, Change Data Capture, File Import / SQL Import; Event Hub with Event Streams; Big Data (Hadoop Cluster, Storage Raw/Refined, Parallel Processing, Results, SQL, Export); Stream Analytics (Stream Processor with State and API, Rules, Edge Node); Microservices (Microservice with State and API); Enterprise Apps (Logic, API); Search Service, Search / Explore, BI Tools, Enterprise Data Warehouse]
Two Types of Stream Processing (by Gartner)
Stream Data Integration
• focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
• filters and enriches the data
Stream Analytics
• targets analytics use cases
• calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business
Source: Gartner, Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
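The two categories can be made concrete with a minimal Python sketch. It is not taken from the talk: the event fields, the lookup table, and both function names are illustrative assumptions. `integrate` handles each event on its own (filter and enrich), while `count_per_window` derives an aggregate over tumbling time windows.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical lookup table used for enrichment (not from the talk).
LANG_NAMES = {"en": "English", "de": "German"}


def integrate(events: Iterable[dict]) -> Iterator[dict]:
    """Stream data integration: filter and enrich each event on its own."""
    for event in events:
        if event.get("lang") not in LANG_NAMES:
            continue  # filter: drop events in languages we do not handle
        yield {**event, "lang_name": LANG_NAMES[event["lang"]]}  # enrich


def count_per_window(events: Iterable[dict], window_s: int) -> Dict[int, int]:
    """Stream analytics: count events per tumbling window of window_s seconds."""
    counts: Counter = Counter()
    for event in events:
        window_start = event["ts"] - event["ts"] % window_s
        counts[window_start] += 1
    return dict(counts)


events = [
    {"ts": 0, "lang": "en", "text": "hello"},
    {"ts": 30, "lang": "fr", "text": "bonjour"},
    {"ts": 59, "lang": "de", "text": "hallo"},
    {"ts": 61, "lang": "en", "text": "again"},
]
clean = list(integrate(events))
per_minute = count_per_window(clean, window_s=60)
```

The integration step needs no memory of earlier events, whereas the analytics step has to keep state per window. That difference is why the two are usually treated as separate concerns.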
Stream Data Integration & Stream Analytics Ecosystem
[Diagram: landscape of open-source and closed-source products for Edge, Stream Data Integration, Event Hub, and Stream Analytics. Source: adapted from Tibco]
Apache Kafka – A Streaming Platform
[Diagram: Kafka Cluster (Broker 1, Broker 2, Broker 3), Zookeeper Ensemble (ZK 1, ZK 2, ZK 3), Schema Registry, Producers 1 to 3, Consumers 1 to 3, Service 1, kafkacat, Management (Control Center, Kafka Manager, KAdmin)]
Data Retention:
• Never
• Time (TTL) or Size-based
• Log-Compacted based
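Log compaction is the least obvious of the retention options, so here is a minimal sketch of its semantics in plain Python. This is an illustration under assumptions, not Kafka code: the `compact` function and the record layout are hypothetical, and the sketch drops delete markers immediately, whereas a real broker keeps them for a configurable time.

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a value of None marks a delete (a "tombstone").
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record per key, preserving log order."""
    last_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset  # later records overwrite earlier offsets
    return [
        record
        for offset, record in enumerate(log)
        if last_offset[record[0]] == offset and record[1] is not None
    ]


log = [("a", "1"), ("b", "1"), ("a", "2"), ("b", None), ("c", "1")]
compacted = compact(log)
```

After compaction the log still answers "what is the latest value for each key", which is what a visualization that shows current state needs.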
Apache Kafka – A Streaming Platform
[Diagram: Source Connector, Kafka Broker (trucking_driver), Sink Connector, Kafka Streams, KSQL Engine]
Demo using Kafka Stack for Stream Data Integration
[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Data Flow, Consumer (Streaming Visualization: ??)]
Filter: #doag2019, …
User: @gschmutz
Demo: Kafka Connect to retrieve Tweets
curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
     -H "Content-Type: application/json" \
     --data '{
  "name": "twitter-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
    "twitter.oauth.consumerKey": "xxxxx",
    "twitter.oauth.consumerSecret": "xxxxx",
    "twitter.oauth.accessToken": "xxxx",
    "twitter.oauth.accessTokenSecret": "xxxxx",
    "process.deletes": "false",
    "filter.keywords": "#doag2019",
    "filter.userIds": "15148494",
    "kafka.status.topic": "tweet-raw-v1",
    "tasks.max": "1"
  }
}'
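The same registration can be scripted. The sketch below builds the equivalent POST request with Python's standard library only. The helper name `build_connector_request` and the `http://localhost:8083` endpoint are assumptions for illustration, and the OAuth settings are left out for brevity.

```python
import json
import urllib.request


def build_connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_connector_request(
    "http://localhost:8083",  # assumption: where the Kafka Connect REST API listens
    "twitter-source",
    {
        "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
        "filter.keywords": "#doag2019",
        "kafka.status.topic": "tweet-raw-v1",
        "tasks.max": "1",
    },
)
# urllib.request.urlopen(request) would submit it to a running Connect worker.
```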
Demo: KSQL for Streaming ETL
-- Register the raw tweet topic as a stream
CREATE STREAM tweet_raw_s
  WITH (KAFKA_TOPIC='tweet-raw-v1', VALUE_FORMAT='AVRO');

-- Derive a slimmed-down stream, written to topic tweet-v1
CREATE STREAM tweet_s
  WITH (KAFKA_TOPIC='tweet-v1', VALUE_FORMAT='AVRO', PARTITIONS=8) AS
  SELECT id, createdAt, text, user->screenName
  FROM tweet_raw_s;

-- Words per tweet; removestopwords is not a built-in KSQL function (presumably a custom UDF)
SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
FROM tweet_raw_s
WHERE lang = 'en' OR lang = 'de';

-- First hashtag of each tweet, lower-cased
SELECT id, LCASE(hashtagentities[0]->text)
FROM tweet_raw_s
WHERE hashtagentities[0] IS NOT NULL;
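For readers less familiar with KSQL, the three queries can be read as per-record functions. The Python sketch below mirrors them on a plain dict. It is an illustration only: the stop-word list is made up (the talk does not show what `removestopwords` does), and the field names are assumed to follow the queries.

```python
from typing import List, Optional

# Hypothetical stop-word list; the real removestopwords function is not shown in the talk.
STOP_WORDS = {"the", "a", "is", "der", "die", "das"}


def project(tweet: dict) -> dict:
    """Mirror of the tweet_s projection: id, createdAt, text, screen name."""
    return {
        "id": tweet["id"],
        "createdAt": tweet["createdAt"],
        "text": tweet["text"],
        "screenName": tweet["user"]["screenName"],
    }


def words(tweet: dict) -> Optional[List[str]]:
    """Mirror of the word query: lower-case, split on spaces, drop stop words."""
    if tweet["lang"] not in ("en", "de"):
        return None  # filtered out by the WHERE clause
    return [w for w in tweet["text"].lower().split(" ") if w not in STOP_WORDS]


def first_hashtag(tweet: dict) -> Optional[str]:
    """Mirror of the hashtag query: first hashtag, lower-cased, if any."""
    entities = tweet.get("hashtagEntities") or []
    return entities[0]["text"].lower() if entities else None


tweet = {
    "id": 1,
    "createdAt": 1574150400000,
    "text": "Kafka is the Future",
    "lang": "en",
    "user": {"screenName": "gschmutz"},
    "hashtagEntities": [{"text": "DOAG2019"}],
}
slim = project(tweet)
tokens = words(tweet)
tag = first_hashtag(tweet)
```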
Demo using Kafka Stack for Stream Data Integration
[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Data Flow, Consumer (Streaming Visualization: ??)]
Filter: #voxxeddaysbanff, #java, #kafka, …
User: @VoxxedDaysBanff, @gschmutz
Visualization: many many options!
But do they all support Streaming Data?
Three Blueprints for Streaming Visualization
BP1: Fast datastore with regular polling from consumer
[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics, Storage (Data In Motion); Data Flow into Data Store (Data at Rest); API; Consumer (Streaming Visualization)]
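The polling blueprint can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: `FastStore` is a hypothetical in-memory replacement for the real data store, and `poll` plays the role of the dashboard that re-reads the current state at a fixed interval.

```python
import time
from typing import Callable, Dict, List


class FastStore:
    """Stand-in for a fast data store (an in-memory dict here)."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def upsert(self, key: str, value: int) -> None:
        self._data[key] = value

    def snapshot(self) -> Dict[str, int]:
        return dict(self._data)  # copy, so later writes do not alter old frames


def poll(store: FastStore, render: Callable[[Dict[str, int]], None],
         interval_s: float, rounds: int) -> None:
    """Consumer side: read the current state every interval_s seconds."""
    for _ in range(rounds):
        render(store.snapshot())
        time.sleep(interval_s)


store = FastStore()
store.upsert("#doag2019", 3)  # written by the stream-processing side
frames: List[Dict[str, int]] = []
poll(store, frames.append, interval_s=0.01, rounds=2)
```

A change in the store becomes visible only at the next poll, so the worst-case display delay is roughly the polling interval.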
BP1-1: Elasticsearch / Kibana
[Diagram: same architecture as BP1]
Alternatives:
SOLR & Banana
BP1-2: InfluxDB / Grafana or Chronograf
[Diagram: same architecture as BP1]
Alternatives:
Prometheus & Grafana
Druid & Superset
BP1-3: NoSQL & Custom Web
[Diagram: same architecture as BP1]
BP1-3: Demo Redis NoSQL & Custom Web
https://opensky-network.org/
BP1-4: Kafka Streams Interactive Query & Custom App
[Diagram: same architecture as BP1]
Alternatives:
Flink
…
BP2: Direct Streaming to the Consumer
[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics (Data In Motion); Data Flow; API with Channel/Protocol; Consumer (Streaming Visualization)]
BP2-1: Kafka Connect to Slack / WhatsApp
[Diagram: same architecture as BP2]
Alternatives:
Twitter
SMS
…
BP2-1: Demo Kafka Connect to Slack
curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
     -H "Content-Type: application/json" \
     --data '{
  "name": "slack-sink",
  "config": {
    "connector.class": "net..SlackSinkConnector",
    "tasks.max": "1",
    "topics": "slack-notify",
    "slack.token": "XXXX",
    "slack.channel": "general",
    "message.template": "tweet by ${USER_SCREENNAME} with ${TEXT}"
  }
}'
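The `message.template` setting suggests that fields of each record are substituted into `${...}` placeholders. How the connector does this internally is not shown in the talk. The sketch below only illustrates the idea with Python's `string.Template`, which happens to use the same placeholder syntax; `render_message` and the sample record are assumptions.

```python
from string import Template


def render_message(template: str, record: dict) -> str:
    """Fill ${FIELD} placeholders from the record; unknown ones are left as-is."""
    return Template(template).safe_substitute(record)


message = render_message(
    "tweet by ${USER_SCREENNAME} with ${TEXT}",
    {"USER_SCREENNAME": "gschmutz", "TEXT": "Streaming Visualization at #doag2019"},
)
```

The template can only reference fields that exist in the record, so the stream feeding the `slack-notify` topic has to carry exactly the fields the template needs.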
BP2-2: Kafka to Tipboard (Dashboard Solution)
[Diagram: same architecture as BP2]
Alternatives:
Dashing
Geckoboard
…
BP2-2: Demo Kafka to Tipboard (Dashboard Solution)
http://allegro.tech/tipboard/
import json
import requests

# c is a Kafka consumer created elsewhere (the calls below match confluent_kafka.Consumer);
# TILE_NAME, TILE_KEY and API_URL_PUSH are defined elsewhere as well.

def prepare_for_just_value(value):
    # Payload of a Tipboard "just value" tile, e.g.
    # {"title": "Number of Tweets:", "description": "(1 hour)", "just-value": "23"}
    return {'title': '# Tweets:', 'description': 'per hour', 'just-value': value}

c.subscribe(['DASH_TWEET_COUNT_BY_HOUR_T'])
while True:
    msg = c.poll(1.0)
    if msg is None or msg.error():
        continue  # nothing arrived within the timeout, or an error event
    data = json.loads(msg.value().decode('utf-8'))
    data_selected = data.get('NOF_TWEETS')
    data_prepared = prepare_for_just_value(data_selected)
    data_jsoned = json.dumps(data_prepared)
    data_to_push = {'tile': TILE_NAME, 'key': TILE_KEY, 'data': data_jsoned}
    resp = requests.post(API_URL_PUSH, data=data_to_push)
BP2-3: Web Sockets / SSE & Custom Modern Web App
[Diagram: same architecture as BP2, with Server-Sent Events (SSE) as the channel]
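Server-Sent Events use a simple line-based text format (`text/event-stream`), which is part of why they are easy to produce from a stream consumer. As a minimal sketch, the hypothetical helper `format_sse` below serializes one message. The event name and payload are made up, and a real application would write the result to an open HTTP response.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines by default, so one data line is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message


frame = format_sse({"NOF_TWEETS": 23}, event="tweet-count", event_id="42")
```

In the browser, an `EventSource` subscribed to the endpoint would receive this as a `tweet-count` event whose data is the JSON string.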
BP3: Streaming SQL Result to Consumer
[Diagram: Data Sources, Data Flow, Event Hub, Stream Data Integration & Stream Analytics (Data In Motion); API; Consumer (Streaming Visualization)]
BP3-1: KSQL and Arcadia Data
[Diagram: same architecture as BP3]
BP3-1: Demo KSQL and Arcadia Data
https://www.arcadiadata.com/
BP3-2: KSQL with REST API to Custom Web App
[Diagram: same architecture as BP3]
BP3-2: Demo KSQL with REST API
curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json' \
     -i http://analyticsplatform:8088/query --data '{
  "ksql": "SELECT text FROM tweet_raw_s;",
  "streamsProperties": { "ksql.streams.auto.offset.reset": "latest" }
}'

{"row":{"columns":["The latest The Naji Filali Daily! https://t.co/9E6GonrySE Thanks to @Xavier_Porter1 @ClouMedia #ai #bigdata"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["RT @Futurist_Invest: This robot can copy your face! Creepy \n\n#SaturdayThoughts #SaturdayMorning #creepy #bots #bot #AI #bigdata #robotics #…"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["She’s back telling us all about why datathons are exciting now :) Catch her while you can! @ARUKscientist @S_Bauermeister #bigdata #ARUKConf https://t.co/Br484db5ut"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},"errorMessage":null,"finalMessage":null}
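A custom web app has to turn this stream of JSON lines into rows it can display. The sketch below parses lines shaped like the output above. The function name `rows` is an assumption, and the handling of blank lines and of a final message is a cautious guess rather than something the talk specifies.

```python
import json
from typing import Iterable, Iterator, List


def rows(lines: Iterable[str]) -> Iterator[List[object]]:
    """Yield the column values of each streamed row, skipping non-row lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines may appear between rows
        message = json.loads(line)
        if message.get("errorMessage"):
            raise RuntimeError(message["errorMessage"])
        if message.get("row"):
            yield message["row"]["columns"]


sample = [
    '{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},"errorMessage":null,"finalMessage":null}',
    "",
    '{"row":null,"errorMessage":null,"finalMessage":"done"}',
]
parsed = list(rows(sample))
```

Because `rows` is a generator, it can be fed directly from a streaming HTTP response and hands each row to the visualization as soon as it arrives.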
BP3-3: Spark Streaming & Oracle Stream Analytics
[Diagram: same architecture as BP3]
BP3-3: Demo Spark Streaming & Oracle Stream Analytics
https://www.oracle.com/middleware/technologies/complex-event-processing.html
Summary
BP1: Fast Store & Polling
• “classic” pattern
• not end-to-end “data in motion”: the data is “at rest” before visualization
• slight delay might not be acceptable for a monitoring dashboard
• can use the full power of the data store(s) => NoSQL
• in-memory reduces overhead
BP2: Stream to Consumer
• minimal latency
• more difficult on the “client side”
• good if the stream directly holds what should be displayed
• more difficult if data in the stream needs to be analyzed before visualization
• no historical info available
BP3: Streaming SQL
• minimal latency
• power of the SQL query engine available for visualization
• possibility for “self-service” style visualization
• some analytics are more difficult on streaming data
• no historical info available