A Global Source of Truth for the Microservices Generation
Ben Stopford
Office of the CTO
Confluent
@benstopford
Where does the data live?
In the Events
Trade Surveillance Project
• 9 months sourcing 16 data sets
• Different formats (including for historical extracts)
• Batch-based approach
(Figure: Select Organizational Events. Event streams such as Orders, Payments, Customers and Distinct Visits can be sent, optionally through Stream Processing, to a Destination: Elasticsearch, Postgres, AWS Lambda or Other Kafka.)
Stream Processing:
SELECT *
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTOMER_ID = C.ID  -- join key column names assumed
AND O.REGION = 'EU'
AND C.TYPE = 'Platinum'
(Columns: Msgs/Day and History (1w or All) per stream, with example rows for Customers and Orders routed via Stream Processing to Elastic and Lambda.)
Event-driven designs are (mostly) location independent
(Figure: Streaming Platform. Apps, Search, Monitoring, NoSQL, DWH and Hadoop systems each connect to a central streaming platform as producers and consumers.)
Event Storage: Kafka stores petabytes of data
+ Stream Processing: real-time processing over streams and tables
+ Scalability: clusters of hundreds of machines, global
+ Messaging + …
Stream Processing
Formula 1 – Race Telemetry
• 400 sensors on the car
• 70,000 derivative measures
• Events streamed back to base
• Analyzed in real time
• Tire modelling
• Racing line
• Aerodynamics
• Machine learning and physics models
• Replayed later for post-race analysis
Race Track → HQ
e.g. Tire modelling:
- Temperature
- Pressure
- Suspension compression
(Figure: Stream Processing, Post-race analysis and Analytics, all fed from the same Source of Truth.)
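The telemetry example depends on one property of a log: the same events can be processed as they arrive and replayed later from the start. The sketch below models this in plain Java, using an in-memory list as a stand-in for a topic. `TelemetryLog`, `Reading` and the offset handling are hypothetical and are not the Kafka consumer API.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a topic: an append-only list where every reader keeps
// its own offset. Hypothetical names; this is not the Kafka consumer API.
final class TelemetryLog {
    record Reading(String sensor, double value) {}

    private final List<Reading> events = new ArrayList<>();

    // Appends never modify earlier entries; the returned offset is the new entry's position.
    long append(Reading r) {
        events.add(r);
        return events.size() - 1;
    }

    // Everything from the given offset onward, like a consumer poll.
    List<Reading> readFrom(long offset) {
        return List.copyOf(events.subList((int) offset, events.size()));
    }

    static double average(List<Reading> readings) {
        return readings.stream().mapToDouble(Reading::value).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        TelemetryLog log = new TelemetryLog();
        log.append(new Reading("tire-temp", 95.0));
        log.append(new Reading("tire-temp", 97.0));

        // Real-time reader: processes what has arrived, then remembers its position.
        long liveOffset = 0;
        List<Reading> batch = log.readFrom(liveOffset);
        liveOffset += batch.size();

        log.append(new Reading("tire-temp", 99.0));
        System.out.println(log.readFrom(liveOffset).size()); // 1: only the new reading

        // Post-race reader: starts at offset 0 and replays the full history.
        System.out.println(average(log.readFrom(0)));        // 97.0
    }
}
```

The log is never rewritten, so the live reader and the post-race reader need no coordination. Each one only tracks its own offset.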
This is a form of Event Sourcing
We can apply this idea to any application
What is event sourcing?
In Event Sourcing, events are immutable, stateless and truthful.
A Shopping Cart as Events
Shopping Cart Events:
• 2 Trousers added
• 1 Jumper added
• 1 Trousers removed
• 1 Hat added
• Checkout
Shopping Cart (derived from the events above)
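The event list becomes the cart when its events are replayed in order. Below is a minimal plain-Java sketch with hypothetical event types (`Added`, `Removed`, `CheckedOut`). It folds the five events above into the current state: one pair of trousers, one jumper and one hat, checked out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Event-sourced cart: the events are the record of truth and the cart is derived
// by replaying them in order. Event and state types are hypothetical.
final class ShoppingCart {
    interface Event {}
    record Added(String item, int qty) implements Event {}
    record Removed(String item, int qty) implements Event {}
    record CheckedOut() implements Event {}

    record State(Map<String, Integer> items, boolean checkedOut) {}

    static State replay(List<Event> events) {
        Map<String, Integer> items = new LinkedHashMap<>();
        boolean checkedOut = false;
        for (Event e : events) {
            if (e instanceof Added a) {
                items.merge(a.item(), a.qty(), Integer::sum);
            } else if (e instanceof Removed r) {
                // Returning null from the remapping function drops the line at zero.
                items.computeIfPresent(r.item(), (item, qty) -> qty > r.qty() ? qty - r.qty() : null);
            } else if (e instanceof CheckedOut) {
                checkedOut = true;
            }
        }
        return new State(items, checkedOut);
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Added("Trousers", 2),
            new Added("Jumper", 1),
            new Removed("Trousers", 1),
            new Added("Hat", 1),
            new CheckedOut());
        State cart = replay(events);
        System.out.println(cart.items());      // {Trousers=1, Jumper=1, Hat=1}
        System.out.println(cart.checkedOut()); // true
    }
}
```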
Traditional Event Sourcing
(Store raw events in a database in time order)
(Figure: apps Save State Changes as Events into a Journal of every state change.)
Traditional Event Sourcing
(Derive current state from truthful events)
(Figure: apps Save State Changes as Events; to Query by customer Id, the app must Apply Projection over those events.)
- Projection applied on read
- Constantly re-derived from truthful events
- No schema migration
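Here is a sketch of the projection-on-read idea, using a simple in-memory journal of hypothetical `OrderEvent`s with `ORDER` and `REFUND` types. Only raw events are stored. Each query by customer id filters the journal and applies a projection function. Swapping in a new projection (here, an order count instead of net spend) changes the view without migrating any stored data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Traditional event sourcing: raw events are appended in time order and the
// current state is computed only when queried. Names and event types are hypothetical.
final class EventJournal {
    record OrderEvent(long seq, String customerId, String type, double amount) {}

    private final List<OrderEvent> journal = new ArrayList<>();

    // Append-only: events are never updated, only added with the next sequence number.
    void save(String customerId, String type, double amount) {
        journal.add(new OrderEvent(journal.size(), customerId, type, amount));
    }

    // Query by customer id: select that customer's events, then apply a projection on read.
    <V> V query(String customerId, Function<List<OrderEvent>, V> projection) {
        List<OrderEvent> forCustomer = journal.stream()
            .filter(e -> e.customerId().equals(customerId))
            .toList();
        return projection.apply(forCustomer);
    }

    // Projection v1: net spend (refunds subtract).
    static double netSpend(List<OrderEvent> events) {
        return events.stream()
            .mapToDouble(e -> e.type().equals("REFUND") ? -e.amount() : e.amount())
            .sum();
    }

    // Projection v2: a different view over the same stored events; nothing is migrated.
    static long orderCount(List<OrderEvent> events) {
        return events.stream().filter(e -> e.type().equals("ORDER")).count();
    }

    public static void main(String[] args) {
        EventJournal journal = new EventJournal();
        journal.save("c1", "ORDER", 40.0);
        journal.save("c2", "ORDER", 15.0);
        journal.save("c1", "ORDER", 25.0);
        journal.save("c1", "REFUND", 10.0);
        System.out.println(journal.query("c1", EventJournal::netSpend));   // 55.0
        System.out.println(journal.query("c1", EventJournal::orderCount)); // 2
    }
}
```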
Using Kafka: A Distributed Log
All events, stored indefinitely
Using Kafka: Log, but no query
(Figure: events for many different CustomerIds spread along the log.)
Can’t query by CustomerId
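A toy model shows the limitation. Appends and positional reads are cheap, but a lookup by key has no index to use. This plain-Java sketch uses hypothetical `PlainLog` and `Entry` types, not Kafka's API. It counts how many entries a by-customer query has to visit.

```java
import java.util.ArrayList;
import java.util.List;

// A log supports appends and reads by position, but keeps no index by key, so
// "all events for customer X" means scanning every entry. Hypothetical names.
final class PlainLog {
    record Entry(String customerId, String payload) {}

    private final List<Entry> entries = new ArrayList<>();
    private long scanned = 0; // how many entries queries have had to visit

    long append(Entry e) {
        entries.add(e);
        return entries.size() - 1;
    }

    Entry read(long offset) {                        // positional read: cheap
        return entries.get((int) offset);
    }

    List<Entry> findByCustomer(String customerId) {  // key lookup: full O(n) scan
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            scanned++;
            if (e.customerId().equals(customerId)) matches.add(e);
        }
        return matches;
    }

    long scanned() { return scanned; }

    public static void main(String[] args) {
        PlainLog log = new PlainLog();
        for (int i = 0; i < 1000; i++) log.append(new Entry("c" + (i % 10), "event-" + i));
        System.out.println(log.read(42).payload());          // event-42
        System.out.println(log.findByCustomer("c3").size()); // 100
        System.out.println(log.scanned());                   // 1000
    }
}
```

The query touches all 1,000 entries to return 100 of them. A keyed view built from the log avoids that cost.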
CQRS with Kafka
Using events to build a view (DB, Cache, Stream Processor)
(Figure: Events accumulate in the log; a Projection (Stream Processor) builds a View that can be queried by customer Id.)
- Event stream is the source of truth
- View can be a DB, Cache or Stateful Stream Processor
- View can be re-derived from the event stream
http://bit.ly/kafka-microservice-examples
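Here is a minimal sketch of the CQRS pattern, using an in-memory list as the event log and a map as the view. The view tracks its own offset, applies new events incrementally, can lag behind the log, and can be rebuilt from offset 0. `CustomerView` and `OrderPlaced` are hypothetical names, not taken from the linked examples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// CQRS sketch: the log is the write side and source of truth; a separate view,
// keyed by customer id, is updated as events are consumed. Hypothetical names.
final class CustomerView {
    record OrderPlaced(String customerId, double amount) {}

    private final Map<String, Double> totals = new HashMap<>();
    private long nextOffset = 0;

    // Apply any events not yet consumed, advancing this view's own offset.
    void catchUp(List<OrderPlaced> log) {
        for (; nextOffset < log.size(); nextOffset++) {
            OrderPlaced e = log.get((int) nextOffset);
            totals.merge(e.customerId(), e.amount(), Double::sum);
        }
    }

    // The query side: a keyed lookup, no log scan.
    double totalFor(String customerId) {
        return totals.getOrDefault(customerId, 0.0);
    }

    public static void main(String[] args) {
        List<OrderPlaced> log = new ArrayList<>();
        log.add(new OrderPlaced("c1", 10.0));
        log.add(new OrderPlaced("c2", 5.0));
        CustomerView view = new CustomerView();
        view.catchUp(log);

        log.add(new OrderPlaced("c1", 2.5));
        System.out.println(view.totalFor("c1"));    // 10.0: the view lags until it consumes
        view.catchUp(log);
        System.out.println(view.totalFor("c1"));    // 12.5

        // The view is disposable: a fresh one re-derived from offset 0 gives the same answer.
        CustomerView rebuilt = new CustomerView();
        rebuilt.catchUp(log);
        System.out.println(rebuilt.totalFor("c1")); // 12.5
    }
}
```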
Does anyone actually do this?
New York Times: Source of Truth
Every article since 1851
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
Normalized assets (images, articles, bylines, tags; all separate messages) are denormalized into a “Content View”
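Here is one way to picture the denormalization step. It assumes each asset type arrives as a separate message with an id, and the article refers to its byline and images by id. The schema is hypothetical, not the Times' actual one. The consumer keeps the latest version of every asset and renders a content view only once all referenced assets have arrived.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Normalized assets arrive as separate messages; a consumer keeps the latest
// version of each and assembles a denormalized content view. Hypothetical schema.
final class ContentView {
    record Article(String id, String headline, String bylineId, List<String> imageIds) {}
    record Byline(String id, String name) {}
    record Image(String id, String url) {}
    record Content(String headline, String author, List<String> imageUrls) {}

    private final Map<String, Article> articles = new HashMap<>();
    private final Map<String, Byline> bylines = new HashMap<>();
    private final Map<String, Image> images = new HashMap<>();

    // Each message replaces any earlier version of the same asset.
    void onMessage(Object asset) {
        if (asset instanceof Article a) {
            articles.put(a.id(), a);
        } else if (asset instanceof Byline b) {
            bylines.put(b.id(), b);
        } else if (asset instanceof Image i) {
            images.put(i.id(), i);
        }
    }

    // Empty until the article and every asset it references have been seen.
    Optional<Content> render(String articleId) {
        Article a = articles.get(articleId);
        if (a == null || !bylines.containsKey(a.bylineId())
                || !images.keySet().containsAll(a.imageIds())) {
            return Optional.empty();
        }
        List<String> urls = a.imageIds().stream().map(id -> images.get(id).url()).toList();
        return Optional.of(new Content(a.headline(), bylines.get(a.bylineId()).name(), urls));
    }

    public static void main(String[] args) {
        ContentView view = new ContentView();
        view.onMessage(new Article("a1", "Example headline", "b1", List.of("i1")));
        System.out.println(view.render("a1").isPresent());    // false: byline and image not yet seen
        view.onMessage(new Byline("b1", "A. Writer"));
        view.onMessage(new Image("i1", "https://example.com/i1.jpg"));
        System.out.println(view.render("a1").get().author()); // A. Writer
    }
}
```

An updated byline or image replaces the stored asset, so the next render picks up the change without the article being republished.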
What do I do if I already have a database?
Alternate Approach: “Write Through”
(Event model in DB, CDC Connector)
(Figure: apps Write to and Query a database; a CDC connector publishes the changes to the streaming platform.)
Every write becomes an event.
Note:
- The database is now the source of truth.
- Events are a “cache” available to others.
- Users can read their writes immediately (not true of CQRS).
COMMON IN PRACTICE
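The write-through pattern can be sketched in plain Java, with an in-memory map as the database and a list as the captured changelog. In a real deployment a CDC connector reads the database's own log instead. The `write` method here only models the outcome: each committed write yields one change event. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// "Write through": the database stays the source of truth and every committed
// write also yields a change event, the role a CDC connector plays. This
// in-memory model is an illustration only, not a connector API.
final class WriteThrough {
    record Change(String key, String value) {}

    private final Map<String, String> table = new HashMap<>(); // source of truth
    private final List<Change> changelog = new ArrayList<>();  // events for others

    // Table update and change capture happen under one lock, so they cannot diverge.
    synchronized void write(String key, String value) {
        table.put(key, value);
        changelog.add(new Change(key, value));
    }

    // Reads hit the table directly: a writer always sees its own write.
    synchronized String read(String key) {
        return table.get(key);
    }

    synchronized List<Change> changesFrom(int offset) {
        return List.copyOf(changelog.subList(offset, changelog.size()));
    }

    public static void main(String[] args) {
        WriteThrough db = new WriteThrough();
        Map<String, String> downstreamCache = new HashMap<>();
        db.write("order-1", "PLACED");
        System.out.println(db.read("order-1"));              // PLACED, immediately
        System.out.println(downstreamCache.get("order-1"));  // null until the consumer polls
        for (Change c : db.changesFrom(0)) downstreamCache.put(c.key(), c.value());
        System.out.println(downstreamCache.get("order-1"));  // PLACED
    }
}
```

Reads go to the table, so a writer sees its own update immediately. Downstream copies catch up only when they consume the changes.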
What about Microservices?
We can repurpose the event stream
(Figure: the streaming platform as the Source of Truth, feeding one View in a Shipping Service and another in Full-text Search.)
Join datasets from many different sources in real time
(Figure: the Orders Service, Payment Service and Customer Service write to the Event Log; the Fraud Service consumes them through a projection created in the Kafka Streams API.)
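Below is a plain-Java sketch of the kind of projection a fraud service might build, using hypothetical `Customer`, `Payment` and `Order` events. Customer and payment events maintain a per-customer profile, and each order is checked against that profile as it arrives. The slide builds this with the Kafka Streams API, which this model deliberately avoids. The fraud rule is an invented placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// A fraud service's local projection, built from other services' Customer and
// Payment events and consulted for each Order. Plain Java stand-in for a
// Kafka Streams topology; the rule below is a made-up placeholder.
final class FraudService {
    record Customer(String id, String tier) {}
    record Payment(String customerId, double amount) {}
    record Order(String orderId, String customerId, double amount) {}
    record Profile(String tier, double totalPaid) {}

    private final Map<String, Profile> profiles = new HashMap<>();

    // Consumes one event; returns true only for an Order that the rule flags.
    boolean onEvent(Object event) {
        if (event instanceof Customer c) {
            // Keep the latest tier while preserving payments already seen.
            profiles.merge(c.id(), new Profile(c.tier(), 0.0),
                (old, fresh) -> new Profile(fresh.tier(), old.totalPaid()));
        } else if (event instanceof Payment p) {
            profiles.merge(p.customerId(), new Profile("UNKNOWN", p.amount()),
                (old, fresh) -> new Profile(old.tier(), old.totalPaid() + fresh.totalPaid()));
        } else if (event instanceof Order o) {
            Profile profile = profiles.getOrDefault(o.customerId(), new Profile("UNKNOWN", 0.0));
            // Placeholder rule: a non-Platinum customer ordering more than they have ever paid.
            return !profile.tier().equals("Platinum") && o.amount() > profile.totalPaid();
        }
        return false;
    }

    public static void main(String[] args) {
        FraudService fraud = new FraudService();
        fraud.onEvent(new Customer("c1", "Gold"));
        fraud.onEvent(new Payment("c1", 50.0));
        System.out.println(fraud.onEvent(new Order("o1", "c1", 40.0))); // false
        System.out.println(fraud.onEvent(new Order("o2", "c1", 80.0))); // true
    }
}
```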
Create Aggregate Streams
(easier to consume, keep apps stateless)
(Figure: the Orders Service, Payment Service and Customer Service feed Aggregate Events that combine Order, Payment and Customer; apps, search, NoSQL and DWH systems consume the aggregates.)
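Here is a sketch of building an aggregate stream, using hypothetical `Order`, `Payment` and `Customer` events. The aggregator buffers partial state. Once all three parts are present, it publishes one combined `OrderAggregate` per order. Consumers such as billing or shipping can then read complete events without keeping state of their own. This is plain Java, not a Kafka Streams topology.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Aggregate-stream sketch: Order, Payment and Customer events from three services
// are combined into one OrderAggregate per order, so downstream consumers can stay
// stateless. Hypothetical types; not a Kafka API.
final class OrderAggregator {
    record Customer(String id, String email) {}
    record Order(String orderId, String customerId, double amount) {}
    record Payment(String orderId, String status) {}
    record OrderAggregate(Order order, Payment payment, Customer customer) {}

    private final Map<String, Customer> customers = new HashMap<>();
    private final Map<String, Order> orders = new HashMap<>();
    private final Map<String, Payment> payments = new HashMap<>();
    final List<OrderAggregate> aggregateStream = new ArrayList<>(); // the published output

    void onEvent(Object event) {
        if (event instanceof Customer c) {
            customers.put(c.id(), c);
            // A late-arriving customer may complete orders that were waiting on it.
            for (String id : new ArrayList<>(orders.keySet())) tryEmit(id);
        } else if (event instanceof Order o) {
            orders.put(o.orderId(), o);
            tryEmit(o.orderId());
        } else if (event instanceof Payment p) {
            payments.put(p.orderId(), p);
            tryEmit(p.orderId());
        }
    }

    private void tryEmit(String orderId) {
        Order o = orders.get(orderId);
        Payment p = payments.get(orderId);
        Customer c = o == null ? null : customers.get(o.customerId());
        if (o == null || p == null || c == null) return;
        aggregateStream.add(new OrderAggregate(o, p, c));
        orders.remove(orderId);   // buffered state is released once the aggregate is published
        payments.remove(orderId);
    }

    public static void main(String[] args) {
        OrderAggregator agg = new OrderAggregator();
        agg.onEvent(new Order("o1", "c1", 30.0));
        agg.onEvent(new Payment("o1", "PAID"));
        System.out.println(agg.aggregateStream.size()); // 0: customer not seen yet
        agg.onEvent(new Customer("c1", "c1@example.com"));
        System.out.println(agg.aggregateStream.size()); // 1
    }
}
```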
- Historical and real-time data are both self-service (pluggable)
- Source systems don’t need to republish
- Views are use-case-specific, decoupled and autonomous
- Encourages event-driven design
The Source of Truth
Services flex around a central source of truth
(Figure: Billing, Shipping, Fraud and Fulfilment services, alongside apps, monitoring, Hadoop, search, NoSQL and DWH systems, each hold one of many views derived from the log on the streaming platform.)
a.k.a. Forward Deployed Event Cache, The Database Inside Out
Event-driven services
All patterns involve trade-offs
Do I need to store events in a messaging system?
(Figure: value grows with investment & time across five stages.)
1. Single Team, Microservices / Streaming Analytics
2. Cached Datasets & Streaming Apps
3. Automated data provisioning
4. Multi-Team Cluster
5. Global Deployment
It’s a pattern: adopt it when you’re ready.
Stateful Stream Processing requires storage
(Figure: a KStreams application processes Transaction, Payments and Orders events, using a Customers Table (read only) and Intermediary State (read/write), backed by Event Storage.)
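The figure shows two kinds of state, and a plain-Java sketch can model both under stated assumptions. The first is a read-only customers table, mapping customer id to region. The second is read/write intermediary state: running order totals per region. Every update to the intermediary state is also appended to a changelog list. The changelog stands in for the event storage that lets a restarted instance rebuild its state. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stateful processing sketch: a read-only customers table plus read/write
// intermediary state (running totals). Every state update is also appended to a
// changelog, standing in for the event storage used to restore state.
final class StatefulProcessor {
    record Order(String customerId, double amount) {}
    record StateUpdate(String key, double value) {}

    private final Map<String, String> customerRegion;          // read-only table
    private final Map<String, Double> totals = new HashMap<>(); // read/write state
    private final List<StateUpdate> changelog;                  // durable storage

    StatefulProcessor(Map<String, String> customerRegion, List<StateUpdate> changelog) {
        this.customerRegion = Collections.unmodifiableMap(customerRegion);
        this.changelog = changelog;
        // On start-up, restore intermediary state by replaying the changelog (last write wins).
        for (StateUpdate u : changelog) totals.put(u.key(), u.value());
    }

    // Keeps a running order total per region; returns the new total.
    double process(Order o) {
        String region = customerRegion.getOrDefault(o.customerId(), "UNKNOWN");
        double updated = totals.getOrDefault(region, 0.0) + o.amount();
        totals.put(region, updated);
        changelog.add(new StateUpdate(region, updated));
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> customers = Map.of("c1", "EU", "c2", "US");
        List<StateUpdate> changelog = new ArrayList<>();
        StatefulProcessor p = new StatefulProcessor(customers, changelog);
        p.process(new Order("c1", 10.0));
        p.process(new Order("c1", 5.0));
        // Simulate a restart: a new instance restores its state from the changelog.
        StatefulProcessor restarted = new StatefulProcessor(customers, changelog);
        System.out.println(restarted.process(new Order("c1", 1.0))); // 16.0
    }
}
```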
Start with Dimensions
• Facts (Streams): Orders, Visits, Payments. Large, high velocity.
• Dimensions (Tables): Customers, Accounts, Products. Small, low velocity.
Dimensions are typically only useful as a whole dataset.
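The fact/dimension split maps onto streams and tables. The toy plain-Java sketch below reads the same keyed records both ways. Read as a stream, every record is kept. Read as a table, each key keeps only its latest value, which is why a dimension is usually consumed as the complete current dataset. The `KeyValue` type is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Facts vs dimensions: the same keyed records read two ways. As a stream every
// record is a separate fact; as a table each key keeps only its latest value.
final class StreamVsTable {
    record KeyValue(String key, String value) {}

    static List<KeyValue> asStream(List<KeyValue> records) {
        return new ArrayList<>(records); // every event is retained
    }

    static Map<String, String> asTable(List<KeyValue> records) {
        Map<String, String> table = new LinkedHashMap<>();
        for (KeyValue kv : records) table.put(kv.key(), kv.value()); // upsert: last write wins
        return table;
    }

    public static void main(String[] args) {
        List<KeyValue> customerUpdates = List.of(
            new KeyValue("c1", "Silver"),
            new KeyValue("c2", "Gold"),
            new KeyValue("c1", "Platinum"));
        System.out.println(asStream(customerUpdates).size()); // 3
        System.out.println(asTable(customerUpdates));         // {c1=Platinum, c2=Gold}
    }
}
```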
Stateful Stream Processing is Stateful.
Aren’t stateful applications bad?
Separate stateful and stateless operations
(Just like you do with a database)
(Figure: KSQL forms the Stateful Data Layer on top of the Source of Truth; the Stateless Application layer above it is where business logic goes.)
For the hip and trendy, use FaaS
(Figure: KSQL as the Stateful Data Layer, with many stateless FaaS functions above it that autoscale.)
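Here is a sketch of the stateless side. It assumes the stateful layer has already produced enriched orders (a hypothetical `EnrichedOrder` type). The business logic is a pure function, so the number of handlers can change freely. A fixed thread pool stands in for FaaS instances, and the routing rule is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Stateless application layer: business logic is a pure function of an event the
// stateful data layer has already enriched. With no local state, any instance can
// handle any event. A thread pool stands in for FaaS instances; names are hypothetical.
final class StatelessHandlers {
    record EnrichedOrder(String orderId, String customerType, double amount) {}

    // The business rule: reads no state, writes no state, output depends only on input.
    static final Function<EnrichedOrder, String> DECIDE = o ->
        o.orderId() + (o.customerType().equals("Platinum") ? ":priority" : ":standard");

    // Runs the same function on a pool of the requested size; results keep input order.
    static List<String> handle(List<EnrichedOrder> events, int instances) {
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (EnrichedOrder e : events) pending.add(pool.submit(() -> DECIDE.apply(e)));
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) results.add(f.get());
            return results;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<EnrichedOrder> events = List.of(
            new EnrichedOrder("o1", "Platinum", 120.0),
            new EnrichedOrder("o2", "Silver", 15.0));
        // Same answers whether one instance or four handle the load.
        System.out.println(handle(events, 1)); // [o1:priority, o2:standard]
        System.out.println(handle(events, 4)); // [o1:priority, o2:standard]
    }
}
```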
Won’t reloading events and applying projections be slow?
Writes are typically the limiting factor
Kafka Streams:
• RocksDB: capable of ~10M x 500-byte objects per minute on top-end hardware (roughly GbE speed)
Regular database:
• Postgres will bulk-load ~1M rows per minute.
(Kafka delivers data at ~network speed)
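Here is a back-of-the-envelope check of these rates, assuming objects of about 500 bytes. At that size, ~10M objects per minute is roughly 83 MB/s (about 0.67 Gbit/s), which matches the "roughly GbE speed" comparison. Objects of 500 KB would instead imply about 83 GB/s. The 100M-record view used below is a hypothetical size.

```java
// Back-of-the-envelope reload rates. Assumes ~500-byte objects, the size at which
// "10M objects per minute" is comparable with Gigabit Ethernet (~125 MB/s).
final class ReloadMath {
    static double bytesPerSecond(double objectsPerMinute, double bytesPerObject) {
        return objectsPerMinute * bytesPerObject / 60.0;
    }

    static double minutesToLoad(double records, double recordsPerMinute) {
        return records / recordsPerMinute;
    }

    public static void main(String[] args) {
        double rocksBytes = bytesPerSecond(10_000_000, 500);
        System.out.printf("RocksDB: %.1f MB/s = %.2f Gbit/s%n", rocksBytes / 1e6, rocksBytes * 8 / 1e9);
        System.out.printf("Postgres: %.0f rows/s%n", 1_000_000 / 60.0);
        // Time to rebuild a hypothetical 100M-record view at each quoted rate.
        System.out.printf("RocksDB: %.0f min, Postgres: %.0f min%n",
            minutesToLoad(100e6, 10e6), minutesToLoad(100e6, 1e6));
    }
}
```

At the quoted rates, the hypothetical view reloads in about 10 minutes into RocksDB versus about 100 minutes into Postgres. The write side, not Kafka's delivery, sets the pace.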
Lean Data – take only the data you need
(Figure: several groups of apps, search, NoSQL, monitoring, security, DWH and Hadoop systems, each taking only the data it needs from the streaming platform.)
If messaging remembers, databases don’t have to.
SELECT O.OrderId, C.Email
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTOMER_ID = C.ID  -- join key column names assumed
AND O.REGION = 'EU'
AND C.TYPE = 'Platinum'
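The lean query can be mirrored in plain Java to show what the service actually keeps: only order ids and emails for EU orders placed by Platinum customers. All other fields are discarded. The `customerId` join key and the extra fields (`shippingAddress`, `phone`) are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;

// Lean data: the service stores only the rows and columns its use case needs,
// relying on the log to keep everything else. Mirrors the EU/Platinum query;
// the customerId join key and extra fields are assumptions for illustration.
final class LeanView {
    record Order(String orderId, String customerId, String region, String shippingAddress) {}
    record Customer(String id, String type, String email, String phone) {}
    record OrderEmail(String orderId, String email) {} // the only shape that is kept

    static List<OrderEmail> build(List<Order> orders, Map<String, Customer> customers) {
        return orders.stream()
            .filter(o -> o.region().equals("EU"))
            .filter(o -> {
                Customer c = customers.get(o.customerId());
                return c != null && c.type().equals("Platinum");
            })
            .map(o -> new OrderEmail(o.orderId(), customers.get(o.customerId()).email()))
            .toList();
    }

    public static void main(String[] args) {
        Map<String, Customer> customers = Map.of(
            "c1", new Customer("c1", "Platinum", "ann@example.com", "555-0100"),
            "c2", new Customer("c2", "Gold", "bob@example.com", "555-0101"));
        List<Order> orders = List.of(
            new Order("o1", "c1", "EU", "1 High St"),
            new Order("o2", "c1", "US", "2 Main St"),
            new Order("o3", "c2", "EU", "3 Rue Neuve"));
        System.out.println(build(orders, customers)); // one row: o1 with ann@example.com
    }
}
```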
Is Kafka built for long-term storage?
It’s ok to Store Data in Kafka
• Largely built by a guy who built databases (DB2…)
• Log files are immutable once they roll
• (unless it’s a compacted topic)
• Log is O(1) read, O(1) write
• But care is required: writes can block behind historical scans
• Some users run dedicated clusters for reading old data
• ZFS has several page cache optimizations
• Tiered storage would help
What about GDPR?
Anonymize with a Stream Processor
(Figure: a stream processor produces Anonymized events and keeps Anonymization metadata separately.)
Delete messages by key with a compacted topic
https://www.confluent.io/blog/handling-gdpr-log-forget/
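The plain-Java model below combines both ideas under stated assumptions. Events carry a random token instead of an email. The token-to-email mapping lives in a separate keyed metadata log. Forgetting a user appends a tombstone (a null value) for that user's token. The `compact` method models the end state of log compaction; real Kafka compaction keeps tombstones for a configurable period before removing them. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Right-to-be-forgotten sketch. Events carry a pseudonymous token instead of PII;
// the token-to-PII mapping lives in a separate keyed metadata log. Forgetting a
// user writes a tombstone (null value) for their key. Plain Java, not a Kafka API.
final class ForgettableLog {
    record Entry(String key, String value) {} // value == null marks a tombstone

    // End state of compaction: latest value per key wins; tombstoned keys disappear.
    static Map<String, String> compact(List<Entry> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Entry e : log) {
            if (e.value() == null) latest.remove(e.key());
            else latest.put(e.key(), e.value());
        }
        return latest;
    }

    private final List<Entry> piiMetadata = new ArrayList<>();  // token -> email
    private final Map<String, String> tokens = new HashMap<>(); // email -> token (processor state)

    // Anonymize: the same email always maps to the same random token.
    String anonymize(String email) {
        return tokens.computeIfAbsent(email, address -> {
            String token = UUID.randomUUID().toString();
            piiMetadata.add(new Entry(token, address));
            return token;
        });
    }

    void forget(String email) {
        String token = tokens.remove(email);
        if (token != null) piiMetadata.add(new Entry(token, null)); // tombstone
    }

    // Resolve a token via the compacted metadata; null once the user is forgotten.
    String resolve(String token) {
        return compact(piiMetadata).get(token);
    }

    public static void main(String[] args) {
        ForgettableLog gdpr = new ForgettableLog();
        String token = gdpr.anonymize("alice@example.com"); // stored in events instead of the email
        System.out.println(gdpr.resolve(token));            // alice@example.com
        gdpr.forget("alice@example.com");
        System.out.println(gdpr.resolve(token));            // null: the mapping is gone
    }
}
```

Anonymized events can stay in the log indefinitely. Once the mapping is tombstoned and compacted away, the token in those events no longer leads back to the person.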
Evolving with events
Events are immutable, stateless and truthful.
Events as a Global Source of Truth
In summary
• Broadcast events.
• Cache shared datasets in the log and make them discoverable.
• Let users manipulate event streams directly.
• Drive simple microservices, or prepare use-case-specific views in a DB of your choice.
Self-service data, wherever you are, in whatever form you need, at whatever scale.
Thank you
@benstopford
Microservices blog with associated code
http://bit.ly/kafka-microservice-examples
Book:
https://www.confluent.io/designing-event-driven-systems
