Scaling a core banking engine
using Apache Kafka
Peter Dudbridge (aka. Dudders)
Engineering Director
Thought Machine
What will we cover?
The relationship between correctness and scale, through the lens of a core banking system
● To guide us through this subject we will tell a story starting with the monolith and ending with a
scalable distributed system
● We will encounter some patterns and anti patterns on the way
● We will see some tricky tradeoffs, but also show that with the right design, they need not hold us
back!
● Loosely based on real life experiences!
Note: I assume you know what Kafka and microservices are (if not you may be lost)
What is core banking and why do we care?
The most important system that you’ve never heard of (unless something goes
wrong!)
● We need both correctness and scale!
● You want 24/7 access to your money
● You don’t want to lose all your money!
● You also don’t want to pay for your current account…
Core banking, at its heart, is:
● Product engine
● Ledger (accounts + balances)
The monolith
Works great!
● Everything is one machine
● No network calls! Yay!
● Total ordering, single clock
● Need more resources?
> Mainframe: we’ve got you covered!
Waaait what year is it again?
● Glass ceiling
● Very expensive
● Core banking traffic is very spiky
● HA / DR
You know the answer to the problem: microservices!
● We need an architecture that can scale horizontally, on demand: that
means microservices
● Has anyone tried carving up a monolith?
● We quickly realize the things that worked well in the monolith fall flat on
their face when distributed
● We also quickly realize that we’re now playing a game of trade offs
○ Latency vs throughput
○ Consistency vs availability
○ Correctness vs everything!
● Usually the first thing we realize is we need to embrace asynchrony
Towards microservices
Why Kafka
Why am I speaking at this conference... we just need an async queue-type thing, right?
So why use Kafka?
● Kafka streams can be partitioned, which allows them to scale
● Kafka can be configured to be durable (good for banking!)
● Kafka is a commit log!
○ This has changed the game with regards to stream processing
○ Towards a general theory of table and stream relativity:
The aggregation of a stream of updates over time yields a table
The observation of changes to a table over time yields a stream
- Streaming Systems
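To make that duality concrete, here is a minimal Kafka Streams sketch (not from the slides): the topic names, serdes and the Long-valued "amount in minor units" are illustrative assumptions.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class BalanceTopology {
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        // a stream of postings keyed by account id, value = signed amount in minor units (assumed)
        KStream<String, Long> postings = builder.stream("postings",
                Consumed.with(Serdes.String(), Serdes.Long()));
        // aggregating the stream of updates over time yields a table of balances
        KTable<String, Long> balances = postings
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .reduce(Long::sum, Materialized.with(Serdes.String(), Serdes.Long()));
        // observing changes to that table over time yields a stream again
        balances.toStream().to("balance-updates", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}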
Some prerequisites
Here are some prerequisites for correctness that aren’t the focus of today’s talk
● We choose consistency over availability (when we are forced to make a choice)
● Idempotency and retryability are baked in
● This gives us effectively-once delivery
● Holistic consistency
● Dual writes - pick your favourite pattern (we use transactional outbox)
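For the dual-write point, a minimal transactional-outbox sketch in Java/JDBC - the JDBC URL, table names and JSON payload are illustrative assumptions, and the relay that ships outbox rows to Kafka (CDC or a poller) is left out:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OutboxWriteExample {
    public static void main(String[] args) throws Exception {
        // hypothetical connection string and schema, for illustration only
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/bank", "app", "secret")) {
            conn.setAutoCommit(false);
            try (PreparedStatement posting = conn.prepareStatement(
                         "INSERT INTO postings(id, account_id, amount_minor) VALUES (?, ?, ?)");
                 PreparedStatement outbox = conn.prepareStatement(
                         "INSERT INTO outbox(event_id, topic, payload) VALUES (?, 'postings', ?::jsonb)")) {
                posting.setString(1, "posting-123");
                posting.setString(2, "acc-42");
                posting.setLong(3, 1050);
                posting.executeUpdate();
                outbox.setString(1, "posting-123");          // same id doubles as the idempotency key
                outbox.setString(2, "{\"accountId\":\"acc-42\",\"amountMinor\":1050}");
                outbox.executeUpdate();
                conn.commit();                               // ledger row and event commit atomically
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
        // a separate relay reads the outbox table and produces the events to Kafka
    }
}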
Let’s solutionize!
So let’s start with the general problem we’re trying to solve (then immediately start solutionizing)
● We’re processing a payment. In core banking lingo this is a posting
● In Thought Machine we have a super flexible product engine that is driven by an abstraction we
refer to as Smart Contracts (not dogecoin, sorry)
● The trade off with having super flexibility is that we have to do super loads of work on the hot path
Correctness - our first battle
Since streams are data in motion and tables are data at rest, we decide to store our ledger (accounts +
balances) in a database
We build our prototype but quickly realize a problem
● When two users concurrently access one account, we see a race condition
● Fortunately there’s a common pattern for solving this!
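The common pattern here is a pessimistic, row-level lock. A minimal JDBC sketch, assuming a hypothetical balances table keyed by account_id:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PessimisticPostingApplier {
    /** Serialize concurrent writers to one account with a row-level lock held until commit. */
    void applyPosting(Connection conn, String accountId, long deltaMinor) throws Exception {
        conn.setAutoCommit(false);
        long current;
        try (PreparedStatement lock = conn.prepareStatement(
                "SELECT balance_minor FROM balances WHERE account_id = ? FOR UPDATE")) {
            lock.setString(1, accountId);
            try (ResultSet rs = lock.executeQuery()) {
                rs.next();
                current = rs.getLong(1);        // this row is now locked until commit/rollback
            }
        }
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE balances SET balance_minor = ? WHERE account_id = ?")) {
            update.setLong(1, current + deltaMinor);
            update.setString(2, accountId);
            update.executeUpdate();
        }
        conn.commit();                          // releases the lock; the other user's request proceeds
    }
}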
Scale - our first battle
So now our system is ‘correct’, however this has come at a price, let’s open up our observability dashboard
to see if we can’t offset this cost!
● We’re spending all of our time on the network
● We’re having to scale out our application to handle more requests, but our database is running out
of connections!
● We’re burning cash and we seem to be hitting the exact problem we moved away from the
monolith to solve
● Batching is the answer. Don’t worry - it’s a subset of streaming
Batching strategies
The big question is - what should the batch size be?
● No ‘one size fits all’ answer
● Consider what work we are batching:
○ Kafka consumer batches
○ Calls our consumer might be making to another microservice
○ Database queries / commands
○ Kafka producing batches
○ Kafka committing offsets
● Rule of thumb - try a few strategies out!
Batching strategies
#strategy 1 - one batch to rule them all
● Maintain whatever batch size we pull from Kafka in our consumers
● We can tweak our batch size via the Kafka consumer settings:
fetch.max.wait.ms | fetch.min.bytes | fetch.max.bytes
● Works great if our batch is homogeneous
● Easy to reason about architecture, single place to configure our batch sizes
● Assumes the batch size that works for Kafka also works for the DB, calls to other microservices, etc.
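A sketch of wiring those consumer settings up with the Java client - the broker address, group id and the actual values are assumptions; max.poll.records is also worth knowing, as it caps the records handed back per poll():

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PostingsConsumerConfig {
    public static KafkaConsumer<String, String> build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "postings-processor");        // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // wait up to 50 ms for at least ~64 KiB before the broker answers a fetch...
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 50);
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024);
        // ...but never hand back more than ~8 MiB in one fetch
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 8 * 1024 * 1024);
        // cap the number of records surfaced per poll(), i.e. the downstream batch size
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        return new KafkaConsumer<>(props);
    }
}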
Batching strategies
#strategy 2 - fan out
● If our batch is heterogeneous we might see that some elements of the batch process quicker than
others; in #strategy 1 we always pay the price of the slowest element
● If our batch contains different types of work (e.g. different messages need routing to different
places) fan out might be our only option - if this is the case first consider separate topics
● If we fan out we might get benefits with parallelizing work (always test this!)
● The long tail problem isn’t trivial to solve
● Consume a batch of 3 from Kafka
● Fan out the request
● Decide how long to wait, save the long tail to local storage
● Commit all offsets and pull next batch
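A rough sketch of that fan-out loop using the Java client and CompletableFuture - the pool size, the 2-second wait and the parkForRetry hook are all illustrative assumptions, and real long-tail handling needs more care than this:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FanOutBatchLoop {
    private final KafkaConsumer<String, String> consumer;              // configured elsewhere
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    FanOutBatchLoop(KafkaConsumer<String, String> consumer) { this.consumer = consumer; }

    void runOnce() throws InterruptedException, ExecutionException {
        List<ConsumerRecord<String, String>> records = new ArrayList<>();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
            records.add(r);
            // fan each element out to its own worker (could equally be a call to another service)
            inFlight.add(CompletableFuture.runAsync(() -> process(r), pool));
        }
        try {
            // decide how long we are willing to wait for the slowest element of the batch
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).get(2, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // the long tail: park unfinished work locally so we can still commit and move on
            for (int i = 0; i < inFlight.size(); i++) {
                if (!inFlight.get(i).isDone()) parkForRetry(records.get(i));
            }
        }
        consumer.commitSync();                                          // commit all offsets, pull next batch
    }

    void process(ConsumerRecord<String, String> record) { /* business logic */ }
    void parkForRetry(ConsumerRecord<String, String> record) { /* local storage / retry topic */ }
}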
Batching strategies
#strategy 3 - dynamic batch sizing
If our batches are super heterogeneous / unpredictable a batch size that works well today might not work
so well tomorrow (singles day anyone?)
● Monitor our batch response time / error rate over time
● If the response time gets quicker increase the batch size
● If the response time gets slower and we observe timeouts, decrease the batch size
● Use sensible bounds! Lots of things can affect response times, so it’s easy to scale on a false positive!
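One way to express this as code - a minimal additive-increase / multiplicative-decrease sizer; the step sizes, target latency and bounds are placeholder assumptions:

public class DynamicBatchSizer {
    private final int minBatch;
    private final int maxBatch;
    private int batchSize;

    public DynamicBatchSizer(int minBatch, int maxBatch, int initial) {
        this.minBatch = minBatch;
        this.maxBatch = maxBatch;
        this.batchSize = initial;
    }

    /** Call after each batch with its observed latency and whether it timed out / errored. */
    public synchronized void record(long latencyMillis, long targetMillis, boolean timedOut) {
        if (timedOut) {
            batchSize = Math.max(minBatch, batchSize / 2);        // back off hard on timeouts
        } else if (latencyMillis < targetMillis) {
            batchSize = Math.min(maxBatch, batchSize + 50);       // getting quicker: grow gently
        } else {
            batchSize = Math.max(minBatch, batchSize - 50);       // getting slower: shrink gently
        }
    }

    public synchronized int current() { return batchSize; }
}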
Correctness: we meet again...
We play around with batch sizes / strategies, and we find that sweet spot that gives us the best
throughput / latency profile, however we spot something odd when perusing the observability
dashboards:
Woah - so we introduced batching to mitigate the cost of correctness, but it looks like we hit a point
where it stops helping, and even makes things worse!
● The bigger the batch, the more likely we are to have a lock conflict with a batch being processed by
another consumer
Pro tip - know your bottlenecks!
Correctness: we meet again...
If only there was a way we could reduce the number of lock
conflicts then maybe we could further increase our batch size and
get more perf gainz?
… what if there was a way we could make sure that messages that might conflict always go to the same
consumer? That way we could resolve the conflict in the application, or evict it to the next batch?
Ordering
Ordering!
● A key feature of Kafka is the consistent hashing of user-defined partition keys, guaranteeing that
any message with a given partition key’s value will always go to the same consumer (until a
rebalance happens)
● If we partition on Account ID, we can capitalize on the affinity this gives us
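In the Java producer this is just a matter of keying the record on the account ID - the topic name, payload and broker address below are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PostingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String accountId = "acc-42";
            // keying on account id: the default partitioner hashes the key, so every posting for
            // acc-42 lands on the same partition (and thus the same consumer, until a rebalance)
            producer.send(new ProducerRecord<>("postings", accountId,
                    "{\"accountId\":\"acc-42\",\"amountMinor\":-1050}"));
        }
    }
}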
Ordering
Warning! Ordering is a double edged sword
You may recall that we had a total ordering in our monolith, which made things nice
● The only way to have a total ordering with Kafka is to limit our topic to 1 partition (and as a result a
single consumer) - very bad for scale!
Do we need a total ordering? Maybe not
1. Identify the causal relationships in your data model
2. Hopefully everything isn’t causally related. Banking is great as generally only stuff that happens
within an account is causally related. My banking activity doesn’t (usually) affect yours. If this is
the case, partition on that
3. $$$ profit from a partial ordering $$$
Warning! Be cautious about having a hard dependency on ordering. While Kafka guarantees an
ordering per partition, it can be difficult / restrictive to maintain that ordering when processing, e.g.:
○ Fan out
○ Rebalancing
○ Stream joins
Scale: turning the tide of the war
By capitalizing on bank account affinity within our consumers we mitigated lock contention
● We can use larger batch sizes
● We now hit expected issues, such as max message size errors, and poor latency profiles (we have to
wait for batches to fill up!). We choose a batch size that sits comfortably below the point we hit our
known bottlenecks
Does this mean we can get rid of our lock? That way our perf would be even better?
● Not wise, we still want our safety net (see warnings from last slide!)
● But maybe there’s a better way to do it…
Effective concurrency control
Sometimes the best lock, is no lock
● Our locking mechanism is quite primitive, and crucially, pessimistic
● Pessimistic locks work best when we expect collisions
● Optimistic ‘locking’ doesn’t use a lock so is cheaper in the happy case, but more expensive when we
have a collision
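A minimal optimistic-concurrency sketch using a version column as the logical timestamp (the balances schema is an assumption; a real implementation would wrap a retry loop around this):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OptimisticBalanceUpdater {
    /** No lock is taken; a concurrent writer makes the UPDATE match zero rows and we retry. */
    boolean applyPosting(Connection conn, String accountId, long deltaMinor) throws Exception {
        long balance;
        long version;
        try (PreparedStatement read = conn.prepareStatement(
                "SELECT balance_minor, version FROM balances WHERE account_id = ?")) {
            read.setString(1, accountId);
            try (ResultSet rs = read.executeQuery()) {
                rs.next();
                balance = rs.getLong(1);
                version = rs.getLong(2);
            }
        }
        try (PreparedStatement write = conn.prepareStatement(
                "UPDATE balances SET balance_minor = ?, version = version + 1 "
              + "WHERE account_id = ? AND version = ?")) {
            write.setLong(1, balance + deltaMinor);
            write.setString(2, accountId);
            write.setLong(3, version);
            return write.executeUpdate() == 1;   // false => collision: retry, or evict to the next batch
        }
    }
}

A failed compare-and-set is cheap - it is the signal to retry or evict the posting, rather than blocking every other writer up front.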
Effective concurrency control
Optimistic concurrency control is sometimes implemented using real timestamps. In a distributed system
it’s often better to use logical timestamps
● Let’s imagine we want to support rounding up to the nearest dollar, here are our requirements
○ Rounding MUST NOT happen on the hot path
○ Rounding MUST happen strictly after the original posting (i.e. no intermediate postings are allowed)
Scale - bringing it home
Our system is beginning to flow. We’ve got some smart batching
strategies and effective use of partitioning helps reduce transaction
contention in our database
We can now handle hundreds of concurrent users, however we hit a
wall as we try to scale out more: our application is stateless so can
scale horizontally, but our database is a single point of contention and can
only scale vertically
● Read replicas - synchronous replicas can serve consistent reads
keeping contention off the master
● NewSQL - emerging databases such as Spanner and
CockroachDB might help (not a silver bullet!)
● Shared nothing architecture
● AZ affinity (but don’t rely on it!)
I made it go 10x
faster without
sacrificing
correctness!
Client: it
needs to go
100x faster
Data locality
While these strategies help, they’re only really buying us time, and certainly not getting us the stonks
we need
We take a good hard look at our system; we still seem to be spending most of our time on the network going
to the database. What’s more, looking at these interactions, there seems to be a lot of commonality:
we keep asking the same questions and getting the same answers
If only there was a way we could avoid these redundant interactions??
Data locality!
● Bringing the data closer to the process. Or the other way around, but stored procs aren’t cool
anymore :(
● This can mean geographically closer, but can also mean computationally closer, e.g. caching the
result of a costly computation
Data locality
Scenario: our contract execution requires the current month’s postings, to check for reward eligibility.
Current state: every time we process a posting we’re fetching the same month’s worth of data (+1) from the
database
#strategy 1 - read through cache
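A minimal read-through cache sketch for that scenario - the key shape (account + month), the String payloads and the invalidation hook are all assumptions, shown in their simplest possible form:

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class ReadThroughPostingsCache {
    private final ConcurrentMap<String, List<String>> byAccountAndMonth = new ConcurrentHashMap<>();
    private final Function<String, List<String>> loadFromDb;   // e.g. "acc-42:2024-03" -> postings

    public ReadThroughPostingsCache(Function<String, List<String>> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    /** Serve from the cache; on a miss, read through to the database and remember the result. */
    public List<String> monthOfPostings(String accountId, String month) {
        return byAccountAndMonth.computeIfAbsent(accountId + ":" + month, loadFromDb);
    }

    /** Called when a new posting is processed so the cached month stays current. */
    public void invalidate(String accountId, String month) {
        byAccountAndMonth.remove(accountId + ":" + month);
    }
}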
Data locality
#strategy 2 - in memory
The intermediate cache works nicely, however we’re still having to transfer a lot of data over the wire,
and while the cache is more performant, it’s still a single point of contention
If only there was a nice way to partition the cached data - wait a minute…
Disclaimer: bootstrapping can be a difficult problem to solve!
Data locality
#strategy 3 - stateful services 😲
If we’re already storing our postings in memory, and we’re only ever fetching the most recent posting
from the DB… but isn’t our processor the thing that processed that most recent posting?
Data locality
… but this is blasphemy! Our services should be stateLESS ?!
For a long time now stateless services have been the royal road to scalability. Nearly every treatise on scalability declares
statelessness as the best practices approved method for building scalable systems. A stateless architecture is easy to scale
horizontally and only requires simple round-robin load balancing.
What’s not to love? Perhaps the increased latency from the roundtrips to the database. Or maybe the complexity of the caching
layer required to hide database latency problems. Or even the troublesome consistency issues.
- Todd Hoff
But what about the bootstrapping problem? …. Hello Kafka! Remember: The aggregation of a stream of
updates over time yields a table
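A rough sketch of that bootstrap, replaying the assigned partitions from the start to rebuild an in-memory balance table - in practice you would use a compacted changelog topic and checkpointing; the Long-valued records and the passed-in consumer are assumptions:

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class StateBootstrapper {
    /** Hydrate an in-memory balance table by replaying the assigned partitions from offset 0. */
    static Map<String, Long> bootstrap(KafkaConsumer<String, Long> consumer,
                                       List<TopicPartition> assigned) {
        Map<String, Long> balances = new HashMap<>();
        consumer.assign(assigned);
        consumer.seekToBeginning(assigned);
        Map<TopicPartition, Long> end = consumer.endOffsets(assigned);
        boolean caughtUp = false;
        while (!caughtUp) {
            ConsumerRecords<String, Long> records = consumer.poll(Duration.ofMillis(200));
            for (ConsumerRecord<String, Long> r : records) {
                balances.merge(r.key(), r.value(), Long::sum);   // aggregate the stream into a table
            }
            caughtUp = assigned.stream()
                    .allMatch(tp -> consumer.position(tp) >= end.get(tp));
        }
        return balances;
    }
}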
Core banking’s lust for correctness rears its head
● We’ve got by so far with a partial (per account) ordering
● We now realize that we have to obey the laws of accounting
○ We need to construct a trial balance at the end of the business day that will feed into the daily balance sheet
● This means we need to calculate a balance for every account - even internal accounts (double
entry)
● Do we need a total ordering after all?
Pro tip: don’t push your consistency
problems onto your clients (we want
holistic consistency)
Correctness: boss fight
Watermarks
Scenario: End of business day reconciliation
We need a way to calculate a balance for every account in the bank, including internal accounts
● We can’t calculate balances for internal accounts on the hot path as we’d get massive lock
contention
● If our cut off is 12pm, how do we know when we’ve received all postings before this time?
Watermarks!
A watermark is a monotonically increasing timestamp of the oldest work not yet completed
This means that if a watermark advances past some time T (e.g. 12pm) we are guaranteed that no
more processing will happen for events at or previous to T
● Balances are calculated from the timestamp set by the ledger, a.k.a. processing time (not
event time)
● Because our cut off is a fixed point, we’re dealing with fixed windows
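A toy watermark tracker to make the definition concrete - it assumes ingress timestamps are assigned roughly monotonically by the ledger, which is what keeps the watermark from going backwards:

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class WatermarkTracker {
    private final ConcurrentSkipListMap<Long, Integer> inFlight = new ConcurrentSkipListMap<>();
    private volatile long latestSeen = Long.MIN_VALUE;

    /** Record that work with this ingress timestamp has started processing. */
    public void started(long ingressTimestamp) {
        latestSeen = Math.max(latestSeen, ingressTimestamp);
        inFlight.merge(ingressTimestamp, 1, Integer::sum);
    }

    /** Record that work with this ingress timestamp has finished. */
    public void completed(long ingressTimestamp) {
        inFlight.compute(ingressTimestamp, (ts, n) -> n == null || n <= 1 ? null : n - 1);
    }

    /** Lower bound: no more processing will happen for events at or before this timestamp. */
    public long watermark() {
        Map.Entry<Long, Integer> oldest = inFlight.firstEntry();
        return oldest != null ? oldest.getKey() - 1 : latestSeen;
    }
}

Once watermark() advances past the cut-off T, the fixed window for that business day can be closed downstream.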
Watermarks
#strategy 1 - heuristic watermarks
Perhaps the easiest approach would be just to wait a bit of time after the cut off
● How long do we wait? The client needs this ASAP!
● How do we tell the difference between idleness and a network partition?
● We don’t want to smash the ledger to get the result - let’s use Kafka :)
[if idle] system clock gives us an indication
when the fixed window is closing
[if active] watch out for if we receive
postings with timestamp after T
Health check Kafka + processors
Publish to watermark stream (out of band)
for a downstream balance processor to
read
Watermarks
#strategy 2 - perfect watermarks
This strategy works great, although given it’s a heuristic approach we will inherently always have the
possibility of late data - what could go wrong!
● Since we are using processing timestamps, i.e. we have ingress timestamping, it is possible to have a
perfect watermark approach
[if active] detect if we write a posting past the close
timestamp of a window
[if idle] system clock indicates a window is closing, check for
open DB transactions
Publish the watermark in-band, explicitly to every partition
(shocking, I know). The downstream balance processor waits for a
watermark from all partitions
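A sketch of emitting that in-band watermark marker to every partition with the Java producer - the topic name and JSON marker format are assumptions:

import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;

public class WatermarkEmitter {
    /** Publish a watermark marker to every partition of the topic, so the downstream balance
     *  processor can close the window once it has seen a marker from all partitions. */
    static void emit(KafkaProducer<String, String> producer, String topic, long watermarkMillis) {
        List<PartitionInfo> partitions = producer.partitionsFor(topic);
        String payload = "{\"type\":\"WATERMARK\",\"timestamp\":" + watermarkMillis + "}";
        for (PartitionInfo p : partitions) {
            // explicit partition, null key: the marker goes everywhere regardless of account affinity
            producer.send(new ProducerRecord<>(topic, p.partition(), null, payload));
        }
        producer.flush();
    }
}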
Correctness - a final word
A quick note on high availability and disaster recovery
● Since we’ve optimized for batching, and we only hit the DB when we need to, we have
inadvertently optimized our system not to depend on tight DB commit latencies
● For banks, we need zero data loss when it comes to DR. We can achieve this with sync replication!
● Can Kafka do multi region replication like the database can?
○ Probably not synchronously! Tables are streams at rest! We can hydrate a new Kafka cluster from our journal
● Can we have multi region active-active?
Scale - a final word
To achieve massive throughput in your microservices architecture you need
to let your data flow
● Commit your offsets fast
○ Avoid service fan out (i.e. to different services)
○ Avoid chaining service calls
○ Avoid sync service calls for writes
○ Don’t await on async request / response
● Consider choreography over orchestration
○ Aim to avoid any sort of shared contention
○ Think carefully about saga state. Consider the Routing Slip pattern
● Don’t try and solve distributed 2PC
● … but beware the saga rollback
● Observability!
○ Tracing can tell you a lot
○ CPU over time can tell you if your services aren’t being utilized (e.g. your
data isn’t flowing)
Summary
● Building a system that is both correct, and moves at scale, is hard - but certainly not impossible!
● We can only get some of the way with patterns / anti-patterns
● The rest requires some creativity - architecture is the sum of its parts
○ Synergies between ordering and concurrency control
○ Choose a batching strategy that works best for the whole system
○ Your caching strategy should complement your system, not paper over the cracks
○ The fewer moving parts the better - building something simple is hard!
○ A bad design compounds problems - make a hard choice or pay a high cloud bill
○ If the end result is something the client can’t use or doesn’t want - we have failed
● Building on this, we can see trade offs as relationships. We don’t have to choose one or the other
○ Correctness AND scale
○ High throughput AND low latency
○ Consistency AND availability
Thanks for listening!
Feedback appreciated:
peter.dudbridge@gmail.com
linkedin.com/in/peterdudbridge
