KAFKA Summit EMEA 2022
Andrea Gioia
CTO at Quantyca
Co-Founder at Blindata
Matteo Cimini
Data Architect at Quantyca
Handling Eventual Consistency in a Transactional World
Who are we?
Not an easy question to answer, but keeping it simple...
Andrea Gioia
CTO & Partner
andrea.gioia@quantyca.it
Matteo Cimini
Data Architect
matteo.cimini@quantyca.it
Quantyca is a privately owned technology consulting firm based in Italy, specialized in data and metadata management.
quantyca.it
Digital Integration Hub
Where we left off
[Architecture diagram. Labels: System of Engagement, System of Insight, System of Records, Legacy Systems, Application Layer, Digital Integration Hub, API Gateway, Event-Based Integration Layer, High-Performance Data Store, Microservices, Metadata Management]
Data offloaded from legacy systems is aggregated into read models stored in dedicated low-latency, high-performance data stores accessible via APIs, events, or batch.
The data store synchronizes with the back ends via event-driven integration patterns.
Benefits
○ Responsive user experience
○ Offload legacy systems from the expensive workloads generated by front-end services
○ Support legacy refactoring
○ Align services to the business domain
○ Enable real-time analytics
○ Foster a data-centric approach to integration
Event Store
The Kafka option
[Comparison of four event store options: RDBMS, NoSQL DB, Streaming Platform, Distributed Cache]
PROS
+ Can handle very high throughput
CONS
- Not a good fit for complex event processing
- TCO may not be optimal for huge data volumes
PROS
+ Low latency
+ Can handle very high throughput
+ Simplified schema management
CONS
- Not a good fit for complex stateful transformations
- Can have some performance issues at very high throughput
PROS
+ Widely used by service developers, probably already present in the architecture
+ Simplified schema management
CONS
- Not a good fit for complex event processing
- Can have some performance issues at very high throughput
PROS
+ SQL compliant
+ Transactional (ACID)
+ TCO can be optimized by selecting the right storage strategy between RAM and disk
CONS
- Can have some performance issues at very high throughput
- Low latency is not guaranteed
Event store on Kafka
Main consistency challenges
TRANSACTIONAL CONSISTENCY: The read model must be consistent, from a transactional point of view, with the upstream aggregate offloaded from the source system. Kafka is not a transactional system.
ORDERING CONSISTENCY: The read model must only be updated in a forward direction: older states cannot replace newer ones. For complex source aggregates it is very easy to have events delivered out of order.
HISTORICAL CONSISTENCY: It must be easy to rebuild the read model from scratch at any time without information loss. Infinite retention in Kafka is possible, but it is not always a feasible option.
[Diagram labels: Source Aggregate, Event Store, Target Read Model]
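The ordering requirement above can be illustrated with a small guard on the read model. This is only a sketch: it assumes each event carries a monotonically increasing `version` per aggregate (for example a sequence number assigned by the source), which the slides do not specify, and the `ReadModel` class is a hypothetical in-memory stand-in for the real data store.

```python
from dataclasses import dataclass, field


@dataclass
class ReadModel:
    """Hypothetical in-memory read model keyed by aggregate id."""
    rows: dict = field(default_factory=dict)

    def apply(self, aggregate_id: str, version: int, state: dict) -> bool:
        """Store the state only if it is newer than the one already held."""
        current = self.rows.get(aggregate_id)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate event: never move backwards
        self.rows[aggregate_id] = (version, state)
        return True


model = ReadModel()
model.apply("order-1", 2, {"status": "SHIPPED"})
# Version 1 is delivered late, after version 2 has already been applied.
late_applied = model.apply("order-1", 1, {"status": "CREATED"})
```

With this guard the late event is dropped and the read model keeps the newer state. The guard does nothing for transactional consistency, which is what the patterns on the following slides address.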
Event store on Kafka
Handling consistency challenges in a DIH
[Diagram labels: Source System; Aggregate; Event Store with Technical Events (Speed & Fidelity), Domain Events (Trusted Views), and Business Events (Ease of consumption); High Performance Data Store; Read Model; Micro/Mini Services; Commands; READ; WRITE]
Handling consistency challenges at the source
Outbox pattern
[Diagram labels: Source System (INSERT, UPDATE, DELETE, INSERT into OUTBOX Table, COMMIT TRX); Streaming Platform with Technical Events (Speed & Fidelity) and Domain Events (Trusted Views)]
DESCRIPTION
The source system is modified so that it inserts messages/events into an outbox table as part of the local transaction.
The modification can be made at the code level or at the database level (e.g. triggers or materialized views).
The connector that offloads data to the streaming platform is triggered by the outbox table.
PROS
+ It has no overhead in terms of latency and throughput
+ It does not generate extra workload at the source
CONS
- It is not always possible to modify the source to implement this pattern
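A minimal sketch of the outbox idea, using an in-memory SQLite database as a stand-in for the source system. The `orders`, `order_lines`, and `outbox` tables, the `place_order` function, and the `OrderCreated` event type are all hypothetical names chosen for illustration. The point shown is only that the business rows and the outbox row are written in the same local transaction.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the source database
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id TEXT,
        event_type TEXT,
        payload TEXT
    );
""")


def place_order(order_id, lines):
    """Write the aggregate and its outbox event in one local transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "CREATED"))
        conn.executemany(
            "INSERT INTO order_lines VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in lines],
        )
        payload = json.dumps({
            "id": order_id,
            "status": "CREATED",
            "lines": [{"sku": sku, "qty": qty} for sku, qty in lines],
        })
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?)",
            (order_id, "OrderCreated", payload),
        )


place_order("order-1", [("sku-a", 2), ("sku-b", 1)])

# A failing transaction leaves nothing behind, in the tables or in the outbox.
try:
    place_order("order-1", [("sku-c", 5)])  # duplicate primary key
except sqlite3.IntegrityError:
    pass

outbox_rows = conn.execute(
    "SELECT aggregate_id, event_type FROM outbox ORDER BY seq"
).fetchall()
```

A connector would then read the `outbox` table in `seq` order and publish each row, so downstream consumers see one complete event per committed transaction.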
Handling consistency challenges at the source
Callback pattern
[Diagram labels: Source System; Streaming Platform with Technical Events (Speed & Fidelity) and Domain Events (Trusted Views)]
DESCRIPTION
All changes to tables that are part of the same aggregate are mapped to the same topic as technical events, which need to contain only the aggregate id and the transaction id as payload.
For every transaction id, a stream processor queries the legacy database, extracts the modified aggregate by filtering on its id, and publishes it as the payload of a new domain event.
To reduce the workload on the legacy system, the stream processor can query a read replica.
PROS
+ It does not require any modification at the source
CONS
- Even though the footprint at the source is smaller than that of a standard polling solution, it can still be non-trivial, especially for high-throughput transactional sources
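The callback flow can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: an in-memory SQLite database stands in for the read replica, the schema is hypothetical, and `load_aggregate` and `to_domain_events` are made-up names. Technical events carry only the aggregate id and the transaction id, and the processor calls back to the database once per transaction.

```python
import sqlite3

# Stand-in for a read replica of the legacy database (hypothetical schema).
replica = sqlite3.connect(":memory:")
replica.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES ('order-1', 'CREATED');
    INSERT INTO order_lines VALUES ('order-1', 'sku-a', 2);
    INSERT INTO order_lines VALUES ('order-1', 'sku-b', 1);
""")


def load_aggregate(aggregate_id):
    """Call back to the database and read the whole aggregate by id."""
    status = replica.execute(
        "SELECT status FROM orders WHERE id = ?", (aggregate_id,)
    ).fetchone()[0]
    lines = replica.execute(
        "SELECT sku, qty FROM order_lines WHERE order_id = ? ORDER BY sku",
        (aggregate_id,),
    ).fetchall()
    return {
        "id": aggregate_id,
        "status": status,
        "lines": [{"sku": sku, "qty": qty} for sku, qty in lines],
    }


def to_domain_events(technical_events):
    """Emit one domain event per (aggregate id, transaction id) pair."""
    seen = set()
    domain_events = []
    for event in technical_events:
        key = (event["aggregate_id"], event["tx_id"])
        if key in seen:
            continue  # this transaction already triggered a callback
        seen.add(key)
        domain_events.append(
            {"tx_id": event["tx_id"],
             "payload": load_aggregate(event["aggregate_id"])}
        )
    return domain_events


# Three row-level changes from the same transaction: one on orders, two on lines.
technical_events = [
    {"aggregate_id": "order-1", "tx_id": 42},
    {"aggregate_id": "order-1", "tx_id": 42},
    {"aggregate_id": "order-1", "tx_id": 42},
]
domain_events = to_domain_events(technical_events)
```

Note that the callback reads the state of the aggregate at query time, so if a later transaction has already committed, the domain event reflects that newer state.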
Handling consistency challenges in Kafka
In-flight handling pattern
[Diagram labels: Streaming Platform with Technical Events (Speed & Fidelity), Transactions Metadata, and Domain Events (Trusted Views); Kafka Streams Ecosystem: Buffering Events, Closing Transactions, Ordering Transactions]
DESCRIPTION
All events that are part of the same aggregate are buffered to the same changelog topic until the corresponding END transaction event has been captured.
Closed transactions are then ordered by a punctuation function and mapped to the corresponding domain event.
PROS
+ It has minimal overhead in terms of latency and throughput
+ It does not generate extra workload at the source
+ It does not require any modification at the source
CONS
- Need to write stateful applications
- Need to use Processor API capabilities (low-level state stores, punctuators, etc.)
- Transaction metadata must contain END transaction information
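The buffering, closing, and ordering steps can be sketched in plain Python. This is a toy model, not Kafka Streams code: the `InFlightBuffer` class is hypothetical, its dictionaries stand in for a state store, and `punctuate` stands in for a scheduled punctuator. It also assumes that transaction ids increase in commit order and that all events of a transaction arrive before its END marker, neither of which is stated in the slides.

```python
from collections import defaultdict


class InFlightBuffer:
    """Toy stand-in for a state store plus a punctuator (not Kafka Streams)."""

    def __init__(self):
        self.open = defaultdict(list)  # tx_id -> buffered technical events
        self.closed = {}               # tx_id -> events of a closed transaction

    def on_event(self, event):
        """Buffer a technical event under its transaction id."""
        self.open[event["tx_id"]].append(event)

    def on_end(self, tx_id):
        """Handle an END marker coming from the transaction metadata."""
        # Assumption: every event of the transaction arrived before END.
        self.closed[tx_id] = self.open.pop(tx_id, [])

    def punctuate(self):
        """Emit closed transactions in transaction-id order as domain events."""
        emitted = []
        for tx_id in sorted(self.closed):
            events = self.closed.pop(tx_id)
            emitted.append({
                "tx_id": tx_id,
                "aggregate_id": events[0]["aggregate_id"] if events else None,
                "changes": [(e["table"], e["op"]) for e in events],
            })
        return emitted


buf = InFlightBuffer()
# Events and END markers arrive interleaved and out of order.
buf.on_event({"tx_id": 2, "aggregate_id": "order-1", "table": "orders", "op": "u"})
buf.on_event({"tx_id": 1, "aggregate_id": "order-1", "table": "orders", "op": "c"})
buf.on_event({"tx_id": 1, "aggregate_id": "order-1", "table": "order_lines", "op": "c"})
buf.on_end(2)
buf.on_end(1)
buf.on_event({"tx_id": 3, "aggregate_id": "order-1", "table": "orders", "op": "u"})
domain_events = buf.punctuate()
```

Transaction 3 has no END marker yet, so it stays buffered and is not emitted. A real implementation would keep this state in a fault-tolerant state store, which is where the Processor API requirement in the CONS list comes from.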
Handling consistency challenges in the fast storage
Cross-docking pattern
[Diagram labels: Streaming Platform with Technical Events (Speed & Fidelity), Transactions Metadata, Domain Events (Trusted Views), and Business Events (Ease of consumption); Fast Storage: Buffering Layer, Closing & Ordering Transactions, Data Model]
DESCRIPTION
Events that are part of the same aggregate are buffered in the same buffering table until the corresponding END transaction event has been captured.
Closed transactions are then ordered by a micro-batch processor and mapped to the corresponding domain event.
PROS
+ It has medium overhead in terms of latency and throughput
+ Stateful solution on a stateful storage (typically strongly consistent)
+ Fast Storage is SQL compliant in most cases
+ Business Events can be enriched with external information
CONS
- Need to add a component (Fast Storage) to the overall architecture (DIH)
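The same close-then-order logic can be sketched on a SQL store. In-memory SQLite stands in for the fast storage here; the `buffer` and `tx_end` tables and the `ingest`, `end_transaction`, and `micro_batch` functions are hypothetical names. As in any such sketch, it assumes transaction ids increase in commit order and that the events of a transaction are buffered before its END marker is recorded.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the fast storage
db.executescript("""
    CREATE TABLE buffer (tx_id INTEGER, aggregate_id TEXT, tbl TEXT, op TEXT);
    CREATE TABLE tx_end (tx_id INTEGER PRIMARY KEY);
""")


def ingest(tx_id, aggregate_id, tbl, op):
    """Buffer one technical event in the buffering table."""
    with db:
        db.execute("INSERT INTO buffer VALUES (?, ?, ?, ?)",
                   (tx_id, aggregate_id, tbl, op))


def end_transaction(tx_id):
    """Record the END marker for a transaction."""
    with db:
        db.execute("INSERT INTO tx_end VALUES (?)", (tx_id,))


def micro_batch():
    """Drain closed transactions in tx order; open ones stay buffered."""
    with db:
        rows = db.execute("""
            SELECT b.tx_id, b.aggregate_id, COUNT(*)
            FROM buffer b JOIN tx_end e ON e.tx_id = b.tx_id
            GROUP BY b.tx_id, b.aggregate_id
            ORDER BY b.tx_id
        """).fetchall()
        db.execute("DELETE FROM buffer WHERE tx_id IN (SELECT tx_id FROM tx_end)")
        db.execute("DELETE FROM tx_end")
    return [{"tx_id": t, "aggregate_id": a, "changes": n} for t, a, n in rows]


ingest(2, "order-1", "orders", "u")
ingest(1, "order-1", "orders", "c")
ingest(1, "order-1", "order_lines", "c")
ingest(3, "order-1", "orders", "u")  # transaction 3 is still open
end_transaction(2)
end_transaction(1)
batch = micro_batch()
remaining = db.execute("SELECT COUNT(*) FROM buffer").fetchone()[0]
```

Because ordering is done by a periodic batch rather than per event, latency depends on the batch interval, which matches the "medium overhead" noted in the PROS list.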
Takeaways
In a complex and heterogeneous architecture, not all consumers can handle eventual consistency.
There are different solutions for enforcing consistency in an event-driven architecture like the digital integration hub.
There is no free lunch, though: every solution comes with pros and cons, and it is important to evaluate them in context.
The general rule of thumb is to
○ use the outbox pattern for every newly implemented custom source (and for all legacy sources you are allowed to modify)
○ choose between the in-flight handling and cross-docking patterns for existing sources, considering the latency trade-off and the skill set of your engineering team
What’s next…
We are working to define solution templates at the platform level, in order to provide consistency-preserving services to data product teams in a self-service way through a declarative interface. More on this next year ;-)
Questions?
Feel free to ask
matteo.cimini@quantyca.it
andrea.gioia@quantyca.it
Corso Milano, 45 / 20900 Monza (MB)
T. +39 039 9000 210 / F. +39 039 9000 211 / @ info@quantyca.it
www.quantyca.it
