Time Series Analysis
Using an Event Streaming Platform
Virtual Meetup - September 8th, 2020
Dr. Mirko Kämpf - Solution Architect @ Confluent, Inc. - Team CEMEA
Abstract
Time Series Analysis using an Event Streaming Platform
Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful
and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to
reconstruction of correlation networks.
The first case is relevant for anomaly detection and for protecting safety.
Reconstructing graphs from time series data is a very useful technique for better understanding complex systems such as
supply chains, material flows in factories, and information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in
the cloud. Finally, we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent
Cloud.
This presentation is about linking:
- Time Series Analysis (TSA)
- Network or Graph Analysis
- Complex Event Processing (CEP)
Confluent Platform
Research work often ends with nice charts, scientific papers, and conference talks.
But many published results can’t be reproduced,
often because the setup is simply too complicated ...
Question:
How can we integrate data streams,
experiments, and decision making better?
Why not use batch processing?
Study anatomy …
● Batch processing is fine:
○ as long as your data doesn’t change.
○ in PoCs for method development in a lab.
○ for research in a fixed scope.
Why use Kafka?
Study the anatomy … Study and influence the living system ...
● Stream processing is better:
○ for real-time business in changing environments.
○ for iterative (research) projects.
○ for repeatable experiments on replayed data.
Content:
(1) Intro
Typical types of events
How to identify hidden events?
(2) The Challenge
(3) Approach
Time Series Analytics &
Network Analytics in Kafka
Create time series from events
Create graphs from time series pairs
(4) Demo
(5) Architecture & Building Blocks
(6) Unified Domain Model
Events - 1
Business events
- transaction records
- discrete observation
A Sale
A Trade
An Invoice
A Customer Experience
JUST SIMPLE OBSERVATION & DATA CAPTURING
How to handle events?
Events - 2
Well defined events
- in known context
Sometimes: SIMPLE
Sometimes: DATA ANALYSIS
How to identify events?
- directly observed
- “observed” during analysis of an episode
Univariate TSA: single episodes are processed
- Extreme values
- Patterns
- Distribution of values
- Fluctuation properties
- Long-term correlations (memory effects)
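The univariate measures listed above can be illustrated with a minimal Python sketch. It is not taken from the talk or from OpenTSx: the function names, the z-score rule for extreme values, and the sample episode are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def extreme_values(episode, z_threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The z-score rule is an assumed, simple definition of "extreme".
    """
    mu, sigma = mean(episode), pstdev(episode)
    if sigma == 0:
        return []
    return [(i, x) for i, x in enumerate(episode)
            if abs(x - mu) / sigma > z_threshold]


def autocorrelation(episode, lag=1):
    """Sample autocorrelation at a given lag, a basic indicator of memory effects."""
    n = len(episode)
    mu = mean(episode)
    variance_sum = sum((x - mu) ** 2 for x in episode)
    if variance_sum == 0 or lag >= n:
        return 0.0
    covariance_sum = sum((episode[i] - mu) * (episode[i + lag] - mu)
                         for i in range(n - lag))
    return covariance_sum / variance_sum


# Hypothetical episode: one sensor, ten observations, one obvious spike.
episode = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1, 9.8, 10.0, 10.2]
spikes = extreme_values(episode, z_threshold=2.5)
memory = autocorrelation(episode, lag=1)
```

Note that in a short episode a single spike inflates the standard deviation, so a threshold of 3.0 may never trigger; the threshold should be chosen with the episode length in mind.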
Events - 3
Extreme Events
- “outliers” in unknown context
ADVANCED DATA ANALYSIS (& ML)
How to handle?
Multivariate TSA: pairs / tuples of episodes are processed
- Comparison
- Similarity measures for link creation
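As a hedged illustration of link creation from a similarity measure, the following Python sketch uses the Pearson correlation between pairs of episodes and keeps a link when its absolute value reaches a threshold. The choice of Pearson correlation, the threshold, and the sensor names are assumptions; the talk does not prescribe a specific measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long episodes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def create_links(episodes, threshold=0.8):
    """Compare all pairs of episodes and keep strongly correlated pairs as links."""
    links = []
    for a, b in combinations(sorted(episodes), 2):
        r = pearson(episodes[a], episodes[b])
        if abs(r) >= threshold:
            links.append((a, b, r))
    return links


# Hypothetical episodes keyed by node name.
episodes = {
    "sensor_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sensor_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "sensor_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
links = create_links(episodes)
```

Each resulting tuple (node, node, weight) is one row of a link table, which is the input for the topology analysis discussed later in the deck.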
Reality is Complex:
We should simplify a bit!
Simplification in our method can lead to isolation:
- DATA SILOS
- OPERATIONAL SILOS
SOLUTION:
GRAPHS capture structure.
TIME SERIES capture properties over time (history).
Relations as graph or matrix:
Objects in groups:
WHAT?
WHY?
The Z value is a property of the network’s topology, which evolves over time.
Special procedures are established.
The events which make people & the market happy :-)
Complex Event Analysis
Event Driven Architecture
Recap:
IT Operations
- Server crash
- Cyber crime
Business Events
- Big deal won
- Technical issue solved
Transactions (in business)
- orders placed
- products shipped
- bills paid
Extreme Events:
- Service slowdown due to emerging bottlenecks
- Increased demand for a resource
What events are and how to process event data can be misunderstood or simply unclear.
It all depends on our view
and our goals!
The Challenge:
How can you combine unbounded data assets and scientific methods?
● You bring data to a place where it can be processed easily,
for example to a cloud system or into special-purpose systems, such as an event
processing platform.
● You integrate important algorithms as early as possible in your processing
pipeline to get results fast, using stream processing capabilities.
Things can become complicated:
Complex Event Analysis
Integration Across Domains
Extraction of Hidden Events
Complex Event Analysis
- time series analysis and ML reveal hidden events
- multi-stage processing is usually needed
METHODOLOGY
Integration Across Domains
- distributed event processing systems are used
- apps consume and produce events of different flavors
- event types and data structures may change over time
ORGANIZATION & OPERATIONS
Problems at the ORGANIZATION level:
Many legacy systems can’t be integrated without additional expensive servers.
Often, their data is unreachable from outside.
Business data is managed by different teams using different technologies.
Data scientists use data in the cloud, and they all really L❤VE notebooks.
But often, they don’t use any automation.
Extraction of Hidden Events
- requires Applied Data Analysis & Data Science
- embedding of Complex Algorithms in IT landscape
- integration of GPU/HPC and streaming data pipelines
TECHNOLOGY & SCIENCE
Kafka and its Ecosystem …
- are considered to be middleware, managed by IT people:
- researchers neither plan nor build their experiments around
this technology yet.
- don’t offer ML / AI components:
- many people think that a model has to be executed on an edge device or in the cloud.
Yes, Apache Kafka can support
agile (data) experiments!
- Kafka APIs give access to data (flows) in real time.
- Kafka allows replay of experiments at a later point in time.
- Kafka allows variation of analysis without redoing a simulation or ingestion,
simply by reusing persisted event streams.
- Kafka Streams and KSQL allow data processing in place:
- this allows faster iterations, for example because plausibility checks can be done in place
- the streaming API gives freedom for extension of core logic
- DSL and KSQL will save you a lot of implementation time
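The replay idea can be sketched without any Kafka dependency. The following Python example uses a hypothetical in-memory `EventLog` class as a stand-in for a persisted topic; it is not a Kafka client API, only an illustration of why a durable, offset-addressable log lets you vary the analysis without repeating ingestion.

```python
class EventLog:
    """Hypothetical in-memory stand-in for a persisted topic (not a Kafka API).

    Records are append-only and can be read again from any offset.
    """

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def replay(self, from_offset=0):
        yield from self._records[from_offset:]


def mean_value(records):
    values = [r["value"] for r in records]
    return sum(values) / len(values)


def max_value(records):
    return max(r["value"] for r in records)


# Ingest once ...
log = EventLog()
for value in [3, 5, 4, 12, 6]:
    log.append({"sensor": "s1", "value": value})

# ... then run two different analyses on the same persisted events.
first_experiment = mean_value(log.replay())
second_experiment = max_value(log.replay())
late_joiner = mean_value(log.replay(from_offset=3))
```

With a real cluster, the same effect is typically achieved by consuming a topic again from an earlier offset, for example with a new consumer group.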
Why not build something great using the right tools?
ADVANCED
TIME SERIES ANALYSIS &
NETWORK ANALYSIS
… how does it work?
Network Reconstruction & Topology Analysis:
The Approach
OpenTSx
Network Reconstruction & Topology Analysis:
A Standardized Event Processing Pipeline
ADVANCED
TIME SERIES ANALYSIS &
NETWORK ANALYSIS
… and how does this fit into Kafka?
The following slides will show how TSA concepts and Kafka concepts fit together.
Table Stream Duality
Create Time Series from Event Streams:
By Aggregation, Grouping, and Sorting
Events /
Observations
event series
time series
From Table of Events to Time Series
Table Stream Duality ⇒ Time Series and Graphs
A time series is a table of ordered observations in a fixed context.
A graph can be seen as a table of node and link properties, stored in two tables.
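To make the "table of ordered observations" concrete, here is a minimal Python sketch that turns an unordered list of events into one time series per key by grouping, aggregating into fixed time buckets, and sorting. The field names (`key`, `ts`, `value`), the 60-second bucket, and the mean as aggregate are assumptions for illustration; in Kafka Streams or KSQL the same pattern would be expressed as a grouped, windowed aggregation.

```python
from collections import defaultdict


def events_to_time_series(events, bucket_seconds=60):
    """Group events by key, aggregate per time bucket (mean), and sort by time."""
    buckets = defaultdict(list)
    for event in events:
        bucket_start = (event["ts"] // bucket_seconds) * bucket_seconds
        buckets[(event["key"], bucket_start)].append(event["value"])

    series = defaultdict(list)
    # Sorting by (key, bucket_start) yields time-ordered rows per key.
    for (key, bucket_start), values in sorted(buckets.items()):
        series[key].append((bucket_start, sum(values) / len(values)))
    return dict(series)


# Hypothetical events arriving out of order.
events = [
    {"key": "pump-1", "ts": 125, "value": 4.0},
    {"key": "pump-1", "ts": 10, "value": 1.0},
    {"key": "pump-2", "ts": 70, "value": 7.0},
    {"key": "pump-1", "ts": 50, "value": 3.0},
]
time_series = events_to_time_series(events)
```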
Create Networks from Event Streams:
By Aggregation, Grouping, and Sorting
Events /
Observations
node properties
link properties
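The same aggregation pattern can produce the two graph tables. The Python sketch below assumes, purely for illustration, that each event names a source and a target (for example a shipment or a message) and carries an amount; the field names and the chosen node and link properties are hypothetical.

```python
from collections import defaultdict


def events_to_graph_tables(events):
    """Aggregate relational events into a node table and a link table."""
    node_table = defaultdict(lambda: {"out_events": 0, "in_events": 0})
    link_table = defaultdict(lambda: {"count": 0, "total_amount": 0.0})
    for event in events:
        node_table[event["source"]]["out_events"] += 1
        node_table[event["target"]]["in_events"] += 1
        link = link_table[(event["source"], event["target"])]
        link["count"] += 1
        link["total_amount"] += event["amount"]
    # Sort both tables by their keys for a stable, readable result.
    return dict(sorted(node_table.items())), dict(sorted(link_table.items()))


# Hypothetical shipment events between three sites.
shipments = [
    {"source": "A", "target": "B", "amount": 10.0},
    {"source": "A", "target": "B", "amount": 5.0},
    {"source": "B", "target": "C", "amount": 2.0},
]
nodes, links = events_to_graph_tables(shipments)
```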
Multi Layer Stream Processing:
TSA to Reveal Hidden System Properties
Events / Observations
event series
time series
Node properties
Link properties
Static aspects of system topology
Dynamic aspects of system topology
Multivariate TSA
Univariate TSA
Dynamic & Static aspects of system topology
Complex Event Processing: For Complex Systems
System Events & Emerging Patterns
Node properties
Link properties
Topology Analysis
Use the Table-Network Analogy: Kafka Graphs
https://github.com/rayokota/kafka-graphs
Large durable graphs: persisted in Kafka topics
Sliding Windows: Define the Temporal Graphs
In some use cases, we don’t want to keep the node and link data in topics:
- nodes aren’t always linked
- changes are very fast
- focus on activation patterns,
rather than on underlying structure
It is fine to calculate the correlation links
and the topology metrics on the fly,
just for a given time window.
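A temporal graph computed on the fly can be sketched as follows. This Python example slides a fixed-size window over aligned time series and recomputes correlation links for each window; the window size, step, threshold, and the use of the absolute Pearson correlation are assumptions, not the method used in the demo.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def temporal_links(series, window, step, threshold=0.9):
    """Return [(window_start, links)], recomputing the links for every window."""
    names = sorted(series)
    length = min(len(series[name]) for name in names)
    result = []
    for start in range(0, length - window + 1, step):
        links = []
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                r = pearson(series[a][start:start + window],
                            series[b][start:start + window])
                if abs(r) >= threshold:
                    links.append((a, b))
        result.append((start, links))
    return result


# Hypothetical series: correlated in the first window, not in the second.
series = {
    "x": [1, 2, 3, 4, 4, 3, 2, 1],
    "y": [2, 4, 6, 8, 5, 1, 4, 2],
}
graph_per_window = temporal_links(series, window=4, step=4)
```

Nothing is persisted here: each window yields a throwaway link list, which matches the case where activation patterns matter more than a durable structure.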
Back to Streams
Demo ...
The project will provide reusable streaming apps so that developers implement just their specific logic.
Finally, we can compose a complex flow ...
Demo: OpenTSx
Generate some events and some episodes.
Apply some time series processing procedures on:
- a stream of episodes.
- a stream of events
Complex procedures are composed from a set of fundamental building blocks.
Visualize the flows and dependencies.
https://github.com/kamir/OpenTSx
Let’s Remember ...
We use 4 building blocks ...
Source Connectors (Kafka Connect) integrate sources: Legacy and Future Systems
Sink Connectors (Kafka Connect) integrate external targets: Special Purpose Systems
UDFs: Domain-specific logic is implemented in small reusable components (Domain Driven Design)
Data Assets: Data flows are no longer transient. The event log acts as the single source of truth (Paradigm Shift in Data Management)
Confluent Platform
Kafka Consumer API
Kafka Connect
KSQL & Kafka Streams application
KSQL UDFs
Streaming
Applications
Primary Data
Kafka Producer API
Kafka Connect
Derived Data
… for our Time Series Processing Platform
Summary:
Because Kafka is a scalable & extensible platform, it is well suited for
complex event processing in any industry, on premises and in the cloud.
The Kafka ecosystem provides extension points for any kind of domain-specific
or custom functionality, from advanced analytics to real-time data enrichment.
Complex solutions are composed from a few fundamental building blocks.
What to do next?
(A) Identify relevant main flows and processing patterns in your project.
(B) Identify or implement source / sink connectors and establish a first flow.
(C) Implement custom transformations as Kafka-independent components.
(D) Integrate the processing topology as a Kafka Streams application:
(a) Do you apply standard transformations and joins (for enrichment)?
(b) Is a special treatment required (advanced analysis)?
(c) Do you need special hardware / external services (AI/ML for classification)?
(E) Share your connectors and UDFs with the growing Kafka community.
(F) Iterate, add more flows and more topologies to your environment.
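Step (C) above can be sketched in Python as a pure function that knows nothing about Kafka, plus a thin adapter that applies it to any iterable of records. The function names, the record fields, and the Fahrenheit-to-Celsius example are hypothetical; the point is only that the transformation can be tested in isolation before it is wired into a Kafka Streams application or a UDF.

```python
from typing import Callable, Iterable, Iterator


def normalize_reading(record: dict) -> dict:
    """Kafka-independent transformation: convert a Fahrenheit reading to Celsius."""
    celsius = (record["temp_f"] - 32.0) * 5.0 / 9.0
    return {"sensor": record["sensor"], "temp_c": round(celsius, 2)}


def apply_to_stream(records: Iterable[dict],
                    transform: Callable[[dict], dict]) -> Iterator[dict]:
    """Thin adapter: the only place that would need to know about the transport."""
    for record in records:
        yield transform(record)


# Hypothetical input records; in production they would come from a topic.
raw = [
    {"sensor": "s1", "temp_f": 212.0},
    {"sensor": "s2", "temp_f": 32.0},
]
normalized = list(apply_to_stream(raw, normalize_reading))
```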
END
Thank you!
mirko@confluent.io
@semanpix
