The Streaming
Transformation
Ben Stopford
@benstopford
Build
Features
Build for
the Future
Evolution!
(Diagram labels: Kafka; Serving Layer (Cassandra etc); Kafka Streams / KSQL)
Streaming Platforms
Data is embedded in each engine
(Diagram labels: High Throughput Messaging; Clustered Java App)
Streaming Example
(Diagram: authorization_attempts -> possible_fraud)
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
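To make the semantics of the windowed query concrete, here is a minimal plain-Java sketch (no Kafka dependency) of the per-record work such a query implies, assuming millisecond event timestamps and in-memory state. The class and method names are hypothetical; KSQL itself keeps this state in a fault-tolerant state store and emits updated counts as they change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical incremental sketch of possible_fraud: count attempts per card in
// 5-minute tumbling windows and flag a card once its count exceeds 3.
class TumblingFraudCheck {
    static final long WINDOW_MS = 5 * 60 * 1000L; // WINDOW TUMBLING (SIZE 5 MINUTE)
    static final int THRESHOLD = 3;               // HAVING count(*) > 3

    // Running count per (card, window start); held in memory only in this sketch.
    private final Map<String, Integer> counts = new HashMap<>();

    // Tumbling windows do not overlap, so each timestamp maps to exactly one window.
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    // Processes one authorization attempt; returns true if this card's count
    // in the attempt's window now exceeds the threshold.
    boolean onAttempt(String cardNumber, long timestampMs) {
        String key = cardNumber + "@" + windowStart(timestampMs);
        int count = counts.merge(key, 1, Integer::sum);
        return count > THRESHOLD;
    }

    public static void main(String[] args) {
        TumblingFraudCheck check = new TumblingFraudCheck();
        long[] times = {1_000, 2_000, 3_000, 4_000, 400_000};
        for (long t : times) {
            System.out.println(t + " -> " + check.onAttempt("1111", t));
        }
    }
}
```

Feeding four attempts for card 1111 at 1, 2, 3 and 4 seconds flags the fourth; the attempt at 400,000 ms falls into the next 5-minute window and starts a fresh count.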
Streaming == Manipulating Data in Flight
• Join
• Aggregate
• Map
• Reduce
• Peek
• Transform
• (any arbitrary code)
• Window
• Transactions
Kafka: a Streaming Platform
(Diagram: The Log; Connectors; Producer / Consumer; Streaming Engine)
What is a Distributed Log?
(Diagram: producing services -> Kafka -> consuming services)
• Shard on the way in
• Each shard is a queue
• Share load / fault tolerant
• Retain datasets in the log: “Rewind & Replay”
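The four properties above can be modelled with a toy, in-memory log. This is a sketch under stated assumptions, not Kafka's implementation: the class name is hypothetical, and the partitioner uses `String.hashCode()` for simplicity, whereas Kafka's default partitioner hashes the serialized key with murmur2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a distributed log: keys are sharded on the way in,
// each shard is an append-only queue, and reads never delete records.
class ToyPartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    ToyPartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Shard on the way in: the same key always lands in the same partition,
    // which preserves per-key ordering. hashCode() is a simplifying assumption.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends to the key's partition and returns the new record's offset there.
    long append(String key, String value) {
        List<String> p = partitions.get(partitionFor(key));
        p.add(key + "=" + value);
        return p.size() - 1;
    }

    // Records are retained, so a consumer can read from any offset, including 0.
    List<String> read(int partition, long fromOffset) {
        List<String> p = partitions.get(partition);
        return new ArrayList<>(p.subList((int) fromOffset, p.size()));
    }

    public static void main(String[] args) {
        ToyPartitionedLog log = new ToyPartitionedLog(3);
        log.append("card-1", "attempt-a");
        log.append("card-1", "attempt-b");
        int p = log.partitionFor("card-1");
        System.out.println(log.read(p, 1)); // resume from a committed offset
        System.out.println(log.read(p, 0)); // rewind to the start and replay
    }
}
```

Because consumption only moves an offset, several consumers (or a restarted one) can replay the same retained data independently.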
Kafka Connect
(Diagram: Kafka Connect moving data into and out of Kafka)
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
KSQL is SQL over Kafka Streams
Kafka Streams is just an API
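import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

public static void main(String[] args) {
  StreamsBuilder builder = new StreamsBuilder();
  builder.stream("caterpillars")                    // consume the input topic
         .map((k, v) -> coolTransformation(k, v))   // coolTransformation must return a KeyValue
         .to("butterflies");                        // produce results to the output topic
  // props() is a helper returning the config (application.id, bootstrap.servers, serdes)
  new KafkaStreams(builder.build(), props()).start();
}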
Windows
Windows / Retention: Handle Late Events
(Diagram: events from Kafka, buffered for 10 mins)
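Retaining a window for a while after it ends is what lets late, out-of-order events still be counted. This plain-Java sketch uses hypothetical names and an assumed grace period: it tracks "stream time" as the highest event timestamp seen so far and drops an event only once stream time has passed its window's end plus the grace period. Kafka Streams exposes a similar idea as a window grace period, but this is not its implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tumbling-window counts that stay open for a grace
// period, so late events can still update the window they belong to.
class LateEventWindows {
    private final long windowMs;
    private final long graceMs;
    private long streamTimeMs = Long.MIN_VALUE; // highest event time seen so far
    private final Map<Long, Integer> countsByWindowStart = new HashMap<>();

    LateEventWindows(long windowMs, long graceMs) {
        this.windowMs = windowMs;
        this.graceMs = graceMs;
    }

    // Returns false if the event arrived after its window closed and was dropped.
    boolean onEvent(long eventTimeMs) {
        streamTimeMs = Math.max(streamTimeMs, eventTimeMs);
        long start = eventTimeMs - Math.floorMod(eventTimeMs, windowMs);
        long end = start + windowMs;
        if (streamTimeMs > end + graceMs) {
            return false; // retention for this window has expired
        }
        countsByWindowStart.merge(start, 1, Integer::sum);
        return true;
    }

    int count(long windowStart) {
        return countsByWindowStart.getOrDefault(windowStart, 0);
    }

    public static void main(String[] args) {
        LateEventWindows w = new LateEventWindows(60_000, 10_000); // 1-min windows, 10 s grace
        long[] events = {5_000, 65_000, 30_000, 80_000, 40_000};
        for (long t : events) {
            System.out.println(t + " accepted: " + w.onEvent(t));
        }
    }
}
```

In the usage above, the event at 30,000 ms arrives out of order but within the grace period and is counted; the event at 40,000 ms arrives after stream time has reached 80,000 ms, past the first window's end plus grace, and is dropped.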
Join by Key
(Diagram: events from Kafka, buffered for 5 mins, joined by key)
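KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
// Pair each order with the payment that has the same key, if they occur within 10 minutes
// of each other (MIN is a millisecond constant; newer Kafka Streams versions take a Duration).
orders.join(payments, KeyValue::new, JoinWindows.of(10 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));   // side effect for each joined pair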
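The join above works because both streams are buffered, so an order and its payment can meet even if they arrive minutes apart. A plain-Java sketch of that idea follows, assuming both streams are keyed by order id and using hypothetical names (Java 16+ for the record). Unlike a real engine it never expires buffered events; in practice, window retention bounds the buffers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a windowed stream-stream join: each side is buffered,
// and a new event is paired with buffered events from the other side that
// share its key and lie within the join window.
class WindowedJoin {
    record Event(String key, String value, long timestampMs) {}

    static final long WINDOW_MS = 10 * 60 * 1000L; // 10-minute join window
    private final List<Event> orders = new ArrayList<>();
    private final List<Event> payments = new ArrayList<>();

    // Each handler buffers its event and returns the "order+payment" pairs it produces.
    List<String> onOrder(Event order) {
        orders.add(order);
        return matches(order, payments, true);
    }

    List<String> onPayment(Event payment) {
        payments.add(payment);
        return matches(payment, orders, false);
    }

    private static List<String> matches(Event e, List<Event> other, boolean eIsOrder) {
        List<String> out = new ArrayList<>();
        for (Event o : other) {
            boolean sameKey = o.key().equals(e.key());
            boolean inWindow = Math.abs(o.timestampMs() - e.timestampMs()) <= WINDOW_MS;
            if (sameKey && inWindow) {
                // Always emit the order first, whichever side arrived last.
                out.add(eIsOrder ? e.value() + "+" + o.value() : o.value() + "+" + e.value());
            }
        }
        return out;
    }
}
```

A payment arriving 5 minutes after its order produces one joined pair; a payment for the same key 11 minutes after the order falls outside the window and produces nothing.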
Lookup tables
A KTable is just a stream with infinite retention
KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");   // materialize the Customers topic as a table
orders.join(payments, EmailTuple::new, JoinWindows.of(10 * MIN))
      // stream-table join: the stream must be keyed by customer id; setCust must return the tuple
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));
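A KTable can be pictured as the latest value per key of a changelog stream, held locally, with each stream event enriched by a lookup against it. This sketch assumes string keys and values and uses hypothetical names; a null value is treated as a delete, mirroring tombstones in compacted Kafka topics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a local table built from a changelog of customer
// updates, and an inner stream-table join that enriches orders with it.
class StreamTableJoin {
    private final Map<String, String> customers = new HashMap<>(); // latest value per key

    // Apply one changelog record: an update overwrites; a null value deletes.
    void onCustomerUpdate(String customerId, String customer) {
        if (customer == null) {
            customers.remove(customerId);
        } else {
            customers.put(customerId, customer);
        }
    }

    // Inner join: enrich the order with the customer's current state,
    // or produce nothing if the customer is not (yet) in the table.
    Optional<String> onOrder(String customerId, String order) {
        return Optional.ofNullable(customers.get(customerId))
                       .map(customer -> order + " for " + customer);
    }
}
```

The lookup is a local map read rather than a remote call, which is the sense in which the dataset moves to the client.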
(Diagram: Kafka; Join)
Materialize a table in two lines of code!
The dataset moves to the client.
Streaming is about
1. Processing data incrementally
2. Moving data to where it needs to be
processed (quickly and efficiently)
Business Applications
Ecosystems
Increasingly we build ecosystems
SOA / Microservices / EDA
(Diagram: Customer Service, Shipping Service)
The Problem is DATA
Most services share the same core facts.
(Diagram: Catalog; "Most services live in here")
Buying an iPad with REST
(Diagram: the Webserver submits an order to the Orders Service; the Orders, Shipping and Customer services interact through REST calls such as shipOrder() and getCustomer())
Buying an iPad with Events
(Diagram: the Webserver submits an order; the Orders, Shipping and Customer services exchange Order Created and Customer Updated events through a message broker (Kafka). Events provide notification, and data is replicated (incrementally).)
Events for Notification Only
(Diagram: the Webserver submits an order; an Order Created event travels through the message broker (Kafka) as a notification, while customer data is still fetched with a getCustomer() REST call. Services: Orders, Shipping, Customer.)
Events for Data Locality
(Diagram: the Webserver submits an order; Order Created and Customer Updated events flow through Kafka, so data is replicated (incrementally) to the Orders, Shipping and Customer services.)
Events have two hats
• Notification
• Data replication
Apply Stream Processing
Streaming is about manipulating data in flight
Kafka is a high throughput Messaging & Storage System
(Diagram: Orders, Shipping and Customer services and a Web App, connected through Kafka)
Kafka Streams API (or KSQL)
An embedded API for data in flight
Streaming platforms
optimize for moving
data to code!?!
Add a more data-intensive use case
A Scrollable Grid
(Diagram: Orders Service and Customer Service feeding a Scrollable Grid in the Web App)
Customer & Order in each row
Many rows
Add caching -> problems of its own
(Diagram: Orders Service, Customer Service, Web App with Scrollable Grid)
The Streaming Way
(Diagram: Orders Service and Customer Service connected through Kafka to the Web App's Scrollable Grid, which uses the KStreams API (with RocksDB) to run a query such as:)
Select * from
orders, customers
where…
Streams & Tables
(Diagram: Orders Service, Customer Service, Kafka, Web App with Scrollable Grid)
Orders provide Notification
Customers are replicated
Add Payments Service & Window
(Diagram: Orders, Customer and Payments services connected through Kafka to the Web App's Scrollable Grid, which buffers / windows the streams)
Orders provide Notification
Customers are replicated
Query Runs INSIDE the webserver
(Diagram: Orders, Customer and Payments services; Kafka; Web App with Scrollable Grid)
Events are stored in Kafka (e.g. Customers)
(Diagram: Orders, Customer and Payments services; Kafka; Web App with Scrollable Grid)
Streaming is about Data Movement
(Diagram: POST and GET requests arrive through a Load Balancer at the Orders Service, the C in CQRS. Order Created events flow through Kafka topics (ORDERS, OV TOPIC, INVENTORY) to the Fraud Service, Order Details Service and Inventory Service, which perform Order Validations (see previous figure) and produce Order Validated results. An Orders View, a Materialized View, is the Q in CQRS.)
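The CQRS split can be reduced to a small in-memory sketch: writes only append events to a log, and reads are served from a view folded from that log. The names and event types are hypothetical (Java 16+ for the record), and the view is brought up to date lazily on read purely to keep the example short; in the architecture described above, the view is maintained asynchronously from Kafka.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical CQRS sketch: commands append events; queries read a view
// that is incrementally folded from the event log.
class OrdersCqrs {
    record OrderEvent(String orderId, String type) {} // e.g. "CREATED", "VALIDATED"

    private final List<OrderEvent> log = new ArrayList<>();         // stands in for the ORDERS topic
    private final Map<String, String> ordersView = new HashMap<>(); // materialized view: orderId -> status
    private int applied = 0; // number of log entries already folded into the view

    // C in CQRS: a write is recorded as an event, never applied to the view directly.
    void post(String orderId) {
        log.add(new OrderEvent(orderId, "CREATED"));
    }

    // A validation service reports its outcome as another event.
    void validate(String orderId) {
        log.add(new OrderEvent(orderId, "VALIDATED"));
    }

    // Q in CQRS: fold any new events into the view, then answer from the view.
    String get(String orderId) {
        while (applied < log.size()) {
            OrderEvent e = log.get(applied++);
            ordersView.put(e.orderId(), e.type());
        }
        return ordersView.getOrDefault(orderId, "UNKNOWN");
    }
}
```

Because the view is derived only from the log, it can be discarded and rebuilt by replaying the events from the start.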
Services in the Micro: Orders Service
Find the code online!
(Diagram labels: Orders, Customers, Payments, Stock; Query Engine (KStreams/KSQL))
Larger Ecosystems
(Diagram: historical event streams held in Kafka)
Global / Disconnected Ecosystems
(Diagram: Kafka clusters in New York, Tokyo and London)
WIRED Principles
• Windowed: Use an API built for async events
• Immutable: Store events in an immutable log
• Repeatable: Compose from side-effect free functions
• Evolutionary: Be pluggable. Have data available in the log.
• Data-Enabled: Push data to services where necessary
Makes it easier to evolve!
References
• Confluent Microservices Series:
https://www.confluent.io/blog/tag/microservices
• Code examples:
https://github.com/confluentinc/kafka-streams-examples
Twitter: @benstopford

Big Data LDN 2017: The Streaming Transformation