BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Streaming Visualization
Guido Schmutz
DOAG Big Data 2018 – 20.9.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Visualization in Big Data Reference Architecture
2. How to implement "Data-in-Motion"?
3. Blueprints for Streaming Visualization
4. Blueprints for Stream Visualization – Implementation
Visualization in Big Data Reference Architecture
Data Value Chain
Milliseconds
• Place Trade
• Serve ad
• Enrich Stream
• Approve Transaction
Hundredths of Seconds
• Calculate Risk
• Leaderboard
• Aggregate
• Count
Second(s)
• Retrieve Click Stream
• Show orders
Minutes
• Backtest algo
• BI
• Daily Reports
Hours
• Algo discovery
• Log analysis
• Fraud pattern match
Architectures of Big Data Applications
Traditional BI Infrastructures
[Diagram: bulk sources (DB, file) are extracted and loaded through ETL / stored procedures into an enterprise data warehouse, which serves BI tools, search / explore and enterprise apps (logic, API); high latency]
Big Data solves Volume and Variety – not Velocity
[Diagram: bulk sources (DB, file) are loaded via file import / SQL import into a Big Data platform (Hadoop cluster with raw and refined storage and parallel processing); its results are consumed by the enterprise data warehouse, BI tools (SQL), search / explore and enterprise apps; high latency]
Introduction to Stream Processing
[Diagram: the Big Data platform with its bulk sources, plus event sources (location, telemetry, IoT data, mobile apps, social) producing an event stream]
[Diagram: event sources publish their event streams to event hubs; the Big Data platform adds machine learning, graph algorithms and natural language processing to its parallel processing; results still arrive with high latency]
"Data at Rest" vs. "Data in Motion"
[Diagram: "Data at Rest" vs. "Data in Motion", each shown with the steps Store, Analyze and Act]
Stream Processing Architecture solves Velocity
[Diagram: event sources (location, telemetry, IoT data, mobile apps, social) publish event streams to an event hub; a stream analytics platform processes them using reference data / models and feeds dashboards, search, the enterprise data warehouse, BI tools and enterprise apps]
Low(est) latency, no history
Big Data for all historical data analysis
[Diagram: the stream analytics platform combined with a Big Data platform (raw and refined storage, parallel processing) that also receives the event streams from the event hub, and the bulk sources via file import / SQL import]
Integrate existing systems through CDC
[Diagram: a traditional silo-based system (user interface, logic, data store) is connected to the event hub through a CDC connector; consuming systems (state, logic) read the resulting event streams]
• Capture changes directly on the database
• Change Data Capture (CDC): think of it as a global database trigger
• Transform existing systems into event producers
Integrate existing systems with lower latency through CDC
[Diagram: the combined stream analytics and Big Data architecture, with existing databases additionally integrated into the event hub through CDC]
Unified Architecture for Modern Data Analytics Solutions
[Diagram: event sources and an edge node (rules, storage) publish event streams to an event hub; stream processors and microservices, each with their own state and API, consume and produce event streams alongside stream analytics; the Big Data platform (raw and refined storage, parallel processing), enterprise data warehouse, BI tools, search / explore and enterprise apps complete the picture]
Two Types of Stream Processing (from Gartner)
Stream Data Integration
• primarily focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
• filters and enriches the data
• optionally calculates time-windowed aggregations before storing the results in a database or file system
Stream Analytics
• targets analytics use cases
• calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
How to implement "Data-in-Motion"?
"Data-in-Motion" Ecosystem
[Diagram: product landscape grouped into Edge, Stream Data Integration, Event Hub and Stream Analytics, each split into open source and closed source offerings]
Source: adapted from Tibco
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
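The core idea of a distributed log can be sketched in memory: producers append to partitions, each consumer tracks its own offsets, and because reading does not delete anything, consumers can rewind and replay. This is a toy model under simplifying assumptions (single process, no replication, no retention limits), not Kafka's API; all class and method names are hypothetical.

```python
import zlib
from typing import List, Tuple


class Partition:
    """Append-only log: records are only ever appended, each at the next offset."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str]] = []

    def append(self, key: str, value: str) -> int:
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100) -> List[Tuple[int, str, str]]:
        batch = self.records[offset:offset + max_records]
        return [(offset + i, k, v) for i, (k, v) in enumerate(batch)]


class Topic:
    """A topic is split into partitions; records with the same key go to the same partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [Partition() for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        # crc32 is a deterministic stand-in for Kafka's own (murmur2-based) default partitioner
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return p, self.partitions[p].append(key, value)


class Consumer:
    """Each consumer tracks its own offsets; reading does not remove anything from the log."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self) -> List[Tuple[int, int, str, str]]:
        out: List[Tuple[int, int, str, str]] = []
        for p, partition in enumerate(self.topic.partitions):
            batch = partition.read(self.offsets[p])
            out.extend((p, o, k, v) for o, k, v in batch)
            if batch:
                self.offsets[p] = batch[-1][0] + 1
        return out

    def seek_to_beginning(self) -> None:
        """Replay: because the log keeps its records, a consumer can simply rewind."""
        self.offsets = [0] * len(self.offsets)
```

Partitioning is what allows the scale-out: partitions can live on different brokers and be read by different consumers in parallel, while ordering is preserved per key.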
Blueprints for Stream Visualization
1) Direct Streaming to the Consumer
[Diagram: data sources → integration → event hub → stream analytics (the "Data in Motion" part); results flow through a channel directly to the streaming visualization at the consumer]
2) Use a fast data store and poll it regularly from the consumer
[Diagram: data sources → integration → event hub → stream analytics ("Data in Motion"); results are written to a data store, which the streaming visualization at the consumer polls through an API]
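A minimal sketch of the polling blueprint, assuming a store that keeps the latest result per key together with a monotonically increasing version number. `ResultStore` stands in for whatever fast data store is used and `poll_dashboard` for the consumer; both names are hypothetical. Asking only for rows changed since the last seen version keeps each poll small.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class ResultStore:
    """Stand-in for a fast data store: the stream processor upserts the latest result per key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[str, Tuple[int, Any]] = {}
        self._version = 0  # monotonically increasing write counter

    def upsert(self, key: str, value: Any) -> None:
        with self._lock:
            self._version += 1
            self._rows[key] = (self._version, value)

    def changed_since(self, version: int) -> Tuple[Dict[str, Any], int]:
        """Return rows written after `version`, plus the current version to use next time."""
        with self._lock:
            changed = {k: v for k, (ver, v) in self._rows.items() if ver > version}
            return changed, self._version


def poll_dashboard(store: ResultStore, render: Callable[[Dict[str, Any]], None],
                   rounds: int, interval_s: float) -> None:
    """Consumer side: poll at a fixed interval and redraw only what changed."""
    last_seen = 0
    for _ in range(rounds):
        changed, last_seen = store.changed_since(last_seen)
        if changed:
            render(changed)
        time.sleep(interval_s)
```

The polling interval is the main trade-off of this blueprint: it bounds how stale the dashboard can be, and every consumer adds load on the store even when nothing has changed.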
3) Use stateful stream analytics and query its store directly
[Diagram: data sources → integration → event hub → stateful stream analytics ("Data in Motion"); the streaming visualization at the consumer queries the stream analytics state directly through an API]
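This third blueprint avoids the extra data store: the stream processor keeps its state locally and answers queries against it, similar in spirit to Kafka Streams' interactive queries. The sketch below assumes a page-view event of `(user, page)` and a hypothetical `GET /counts/<page>` route; `handle_get` returns a status code and JSON body that any HTTP framework could wrap.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


class PageViewCounter:
    """Stateful stream processor whose local state store can be queried directly."""

    def __init__(self) -> None:
        self._state: Counter = Counter()  # local state store: page -> view count

    def process(self, events: Iterable[Tuple[str, str]]) -> None:
        """Consume (user, page) events and update the running count per page."""
        for _user, page in events:
            self._state[page] += 1

    def handle_get(self, path: str) -> Tuple[int, str]:
        """Answer GET /counts or GET /counts/<page> from the current state."""
        parts = path.strip("/").split("/")
        if parts == ["counts"]:
            return 200, json.dumps(dict(self._state))
        if len(parts) == 2 and parts[0] == "counts":
            return 200, json.dumps({"page": parts[1], "count": self._state.get(parts[1], 0)})
        return 404, json.dumps({"error": "not found"})
```

In a distributed deployment the state is sharded across instances, so a query for a given key has to be routed to the instance that owns it; this sketch ignores that.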
Blueprints for Stream Visualization – Implementation
Visualization: many, many options! But do they support streaming data?
Oracle Stream Analytics
[Diagram: blueprint 1, direct streaming to the consumer]
Oracle Stream Analytics
• Stream Analytics and Visualization in one
• offers real-time, actionable business insight on streaming data
• automates action to drive today’s agile businesses (business user)
• Runs on top of Spark Streaming
• Cloud and on-premises
• Data Sources: Kafka, JMS, GoldenGate, File
WebSockets / SSE / Custom JavaScript Application
[Diagram: blueprint 1, direct streaming to the consumer, with Server-Sent Events (SSE) as the channel]
Slack / WhatsApp / Twitter / …
[Diagram: blueprint 1, direct streaming to the consumer, with a messaging service as the channel]
WebSockets vs. Server-Sent Events (SSE)
WebSockets
• provide a richer protocol for bi-directional, full-duplex communication
• require full-duplex connections and new WebSocket servers to handle the protocol
• a two-way channel is more attractive for things like games, messaging apps, and cases where you need near real-time updates in both directions
SSE
• sent over traditional HTTP
• do not require a special protocol or server implementation to work
• if only one direction is necessary, Server-Sent Events are the better fit: they have been designed from the ground up to be efficient for it
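To show how lightweight SSE is on the wire, here is a Python sketch that formats events in the `text/event-stream` format: optional `event:` and `id:` fields, one `data:` line per line of payload, and a blank line ending each event. The `reading` event name and the helper names are assumptions. Serving these frames with the response header `Content-Type: text/event-stream` lets a browser consume them with `new EventSource(url)`.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Event in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")  # client listens with addEventListener(event, ...)
    if event_id is not None:
        lines.append(f"id: {event_id}")  # sent back as Last-Event-ID on reconnect
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")  # multi-line payloads become several data: lines
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def sse_stream(readings: Iterable[dict]) -> Iterator[str]:
    """Turn an iterable of results (e.g. from a stream processor) into SSE frames."""
    for i, reading in enumerate(readings):
        yield format_sse(json.dumps(reading), event="reading", event_id=str(i))
```

Because the browser reconnects automatically and sends the last received `id` in the `Last-Event-ID` header, setting `id:` gives the server a way to resume the stream after a dropped connection.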
KSQL / REST API / Custom App
[Diagram: blueprint 3, the streaming visualization at the consumer queries the stateful stream analytics directly through an API]
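A custom app can read a KSQL query result over the KSQL server's REST API by POSTing the statement to the `/query` endpoint; the response is streamed back as results are produced. The sketch below uses only the Python standard library and assumes a server at `http://localhost:8088` and a hypothetical `pageviews` stream. The exact statement syntax (newer ksqlDB versions require `EMIT CHANGES` for push queries) and the format of the response lines vary by version, so the reader simply hands each raw line to a callback.

```python
import json
import urllib.request
from typing import Callable

KSQL_URL = "http://localhost:8088"  # assumed address of the KSQL server


def build_query_request(statement: str, base_url: str = KSQL_URL) -> urllib.request.Request:
    """Build a POST to the /query endpoint carrying the KSQL statement."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


def stream_results(request: urllib.request.Request, on_line: Callable[[str], None]) -> None:
    """Read the streamed response line by line and pass each non-empty line to the callback."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            line = raw.decode("utf-8").strip()
            if line:
                on_line(line)
```

For example, `stream_results(build_query_request("SELECT * FROM pageviews;"), print)` would print result rows as they arrive, with `pageviews` being a hypothetical stream; a web app would forward each line to the browser instead, for example as an SSE frame.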
KSQL & Arcadia Data
[Diagram: blueprint 3, the streaming visualization at the consumer queries the stateful stream analytics directly through an API]
Arcadia Data
• Combines batch and streaming visualization in one
• Streaming visualizations based on Confluent KSQL (Kafka)
• Arcadia Instant and Arcadia Enterprise
Druid & Superset / Imply
[Diagram: blueprint 2, stream analytics writes its results to a data store, which the streaming visualization at the consumer queries through an API]
What is Druid?
• Open source time series DB by Metamarkets
• Apache Incubator project
• Column-oriented storage
• Streaming and batch ingest
• Time-optimized partitioning
• SQL support
• Deep storage can be HDFS / S3
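Time-optimized partitioning means that rows are grouped into time chunks (Druid's `segmentGranularity`, for example hour or day) and each chunk is stored column by column, so a query for a time range only has to touch the chunks that overlap it. The Python sketch below models just that idea with in-memory lists; it is not Druid's storage format, the helper names are hypothetical, and all rows are assumed to share one schema. The `__time` key mirrors the name of Druid's timestamp column.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

GRANULARITY = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


def chunk_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its time chunk (in UTC)."""
    ts = ts.astimezone(timezone.utc)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def partition_rows(rows: List[Dict[str, Any]], granularity: str = "hour") -> Dict[datetime, Dict[str, List[Any]]]:
    """Group rows by time chunk and store each chunk column by column."""
    chunks: Dict[datetime, Dict[str, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        start = chunk_start(row["__time"], granularity)
        for column, value in row.items():
            chunks[start][column].append(value)
    return {start: dict(columns) for start, columns in chunks.items()}


def chunks_for_interval(chunks: Dict[datetime, Any], start: datetime, end: datetime,
                        granularity: str = "hour") -> List[datetime]:
    """Time pruning: only chunks overlapping [start, end) need to be scanned."""
    size = GRANULARITY[granularity]
    return sorted(c for c in chunks if c < end and c + size > start)
```

Columnar storage inside each chunk means an aggregation over one column, such as a sum of `value`, reads only that column's list.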
Imply
• Commercial offering of Druid
• Built around Apache Druid
• Analytics, search and intelligence for event-driven data
Superset
• Open source data visualization tool by Airbnb
• Apache incubator
• Superset supports 30 types of visualizations
• easy-to-use interface for exploring and visualizing data
• Create and share dashboards
• Deep integration with Druid
• Integration with most SQL-speaking RDBMS through SQLAlchemy
Elasticsearch / Kibana
[Diagram: blueprint 2, stream analytics writes its results to a data store, which the streaming visualization at the consumer queries through an API]
Elasticsearch / Kibana
Elasticsearch
• NoSQL store
• a distributed, RESTful search and analytics engine
• centrally stores your data so you can discover the expected and uncover the unexpected
• lets you perform and combine many types of searches: structured, unstructured, geo, metric
• aggregations let you zoom out to explore trends and patterns in your data
Kibana
• Window into Elasticsearch
• Enables visual exploration and analysis of data stored in Elasticsearch
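A dashboard panel that shows a trend typically sends an aggregation rather than fetching documents. The sketch below builds such a search request body (a `date_histogram` with an `avg` sub-aggregation over the last 15 minutes) and flattens the response into chart points. The field names `@timestamp` and `temperature` are assumptions, and `fixed_interval` is the parameter name from Elasticsearch 7.2 on; older versions, such as those current in 2018, used `interval`.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_trend_query(ts_field: str, value_field: str, minutes: int = 15, bucket: str = "1m") -> Dict[str, Any]:
    """Average of value_field per time bucket over the last `minutes` minutes."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {"range": {ts_field: {"gte": f"now-{minutes}m"}}},
        "aggs": {
            "per_bucket": {
                "date_histogram": {"field": ts_field, "fixed_interval": bucket},
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


def to_series(response: Dict[str, Any]) -> List[Tuple[int, Optional[float]]]:
    """Turn a search response into (epoch millis, average) points; empty buckets yield None."""
    buckets = response["aggregations"]["per_bucket"]["buckets"]
    return [(b["key"], b["avg_value"]["value"]) for b in buckets]
```

The body would be sent to the index's `_search` endpoint; re-running it every few seconds gives a near real-time chart, which is essentially what the polling blueprint does.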
InfluxDB / Grafana or Chronograf
[Diagram: blueprint 2, stream analytics writes its results to a data store, which the streaming visualization at the consumer queries through an API]
InfluxDB
InfluxDB
• Popular Time Series Database
• Open source as well as Commercial offering
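Data is typically written to InfluxDB in its line protocol: a measurement, optional comma-separated tags, one or more fields, and an optional timestamp (nanoseconds by default). The Python sketch below formats one point, including backslash escaping of commas, spaces and equals signs, and the `i` suffix for integers. The measurement, tag and field names are hypothetical, and sending the line is left out because the HTTP write endpoint differs between InfluxDB 1.x and 2.x.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character listed in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix; bare numbers are floats
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, FieldValue],
            timestamp_ns: Optional[int] = None) -> str:
    """Format one point as a line-protocol line."""
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # InfluxDB recommends sorting tags by key
        line += f",{_escape(key, ', =')}={_escape(str(tags[key]), ', =')}"
    line += " " + ",".join(f"{_escape(k, ', =')}={_format_field(v)}" for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Tags are indexed and suited to the dimensions a Grafana or Chronograf panel filters by (sensor, site), while fields hold the measured values.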
Chronograf
Grafana
Grafana allows you to query, visualize, alert on and understand metrics, independent of their storage
Supports various data sources:
• Elasticsearch
• InfluxDB
• Prometheus
• OpenTSDB
• MySQL
• …
Technology on its own won't help you.
You need to know how to use it properly.
