Data Patterns in
Microservice Applications
Ryan Knight - CEO / CTO Grand Cloud
@knight_cloud
Ryan Knight
● CEO / CTO of Grand Cloud - Boutique consulting company working
at the intersection of Distributed Systems and Data Engineering
● Experience ranges across traditional software development and
architecture to sales engineering, consulting, solution architecture
and developer advocacy.
● Worked across a wide range of companies from small startups such as
Lightbend and DataStax to Large Corporations such as Starbucks
and Capital One.
● Consulting Experience spans over 50 companies and 10 Countries
● Currently Consulting at Brighthouse Financial
Distributed System Design
Heart of distributed system design is a requirement
for a consistent, performant, and reliable way of
managing data - Jonas Bonér
Cloud Native -> New Requirements
Users: 1 million+ Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request rate: Millions
Access: Web, mobile, IoT, devices
Scale: Up-down, Out-in
Economics: Pay for what you use
Developer access: No assembly required
Challenges with Data
Consistency
● RDBMS
● CAP Theorem
● Trade-off between
Consistency and Scale
● Rise of Eventual Consistency
● NoSQL Databases
EASY
COMPLEX
ACID TXN / Strong
Consistency
Eventual Consistency
(D)evolution of Consistency
Challenges
with Eventual
Consistency
Credit to this tweet
CAP Theorem
Challenges with Application Tier Consistency
● Consistency problems are far harder to solve in the application tier
● Increased Corner Case Bugs
○ Consistency is really hard to get right in the Application Tier!
○ Consistency is really hard to test and verify
● Increased Complexity
Business Impact of Consistency
● Travel Booking of Flight, Hotel, etc. - Inconsistencies could either
lead to double bookings or lost bookings.
● Rewards Program - Very difficult to prevent fraudulent redemptions.
Potential for monetary loss.
● Physical Allocation of Resources vs. Digital Realm
● Inventory / Limited Sales
Direct Business Value of “Strong Consistency”
● Increases accuracy of sales and reduces lost business revenue
● Cost Savings with reduced operational complexity and increased
visibility into business operations.
● Weak Consistency is a Security Concern - Possible financial loss
from inconsistent views of data.
● ACIDRain Attack - Todd Warszawski, Peter Bailis
○ 22 critical ACIDRain attacks that allow attackers to corrupt store
inventory, over-spend gift cards, and steal inventory.
○ Bankrupt popular Bitcoin exchange
Eventual Consistency
● Internet of Things
● Media
● Retail
● Real-time Analytics
● Time-Series
● Monitoring
● Customer 360
Strong Consistency
● Financial Transactions
● Rewards Programs
● Inventory Management
● Global Meta-Data
● Travel Reservations
● Gaming
● Billing / Payments
● Ad Tech
What is Data Consistency?
Challenges with Understanding Consistency
● Lots of Definitions of Consistency
● Consistency in ACID is about enforcing invariants
○ Data must be valid according to all defined rules
○ Not the consistency we are looking for
● "Strong consistency" - term used to differentiate full
consistency from weaker levels of consistency such as
causal or session consistency.
Consistency Challenges
Dirty Reads - Read Uncommitted Write
Read Skew / Non-Repeatable Reads
Read your own Writes
Lost Updates
Write Skew
Write Skew
Two concurrent
transactions each
determine what they are
writing based on reading
a data set which overlaps
what the other is writing
begriffs.com
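The write skew definition above can be made concrete with a small sketch. It is a toy model, not a real database: the `on_call` data, the `request_leave` helper, and the "at least one doctor on call" invariant are all hypothetical, and the conflict check only imitates the write-write check that snapshot isolation typically performs.

```python
# Toy model of write skew under snapshot isolation (hypothetical example data).
on_call = {"alice": True, "bob": True}  # invariant: at least one doctor on call


def request_leave(snapshot, doctor):
    """Decide on a write using only the transaction's private snapshot."""
    others_on_call = sum(1 for name, on in snapshot.items() if on and name != doctor)
    if others_on_call >= 1:
        return {doctor: False}  # write set touches only this doctor's row
    return {}


# Both transactions start concurrently and see the same snapshot.
snapshot_t1 = dict(on_call)
snapshot_t2 = dict(on_call)

writes_t1 = request_leave(snapshot_t1, "alice")
writes_t2 = request_leave(snapshot_t2, "bob")

# The write sets are disjoint, so a write-write conflict check aborts nothing.
write_conflict = bool(set(writes_t1) & set(writes_t2))
if not write_conflict:
    on_call.update(writes_t1)
    on_call.update(writes_t2)

invariant_holds = any(on_call.values())
print(on_call, "invariant holds:", invariant_holds)
```

Each transaction is correct on its own, yet together they break the invariant, because each one's read set overlaps the other's write set. A serializable isolation level would have to abort one of them.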
Consistency Models
Credit to Peter Bailis and Aphyr at jepsen.io
http://www.bailis.org/blog/linearizability-versus-serializability/
Linearizability
● Guarantees that the order of reads and writes to a single
register or row will always appear the same on all nodes.
● Appearance that there is only one copy of the data.
● It doesn’t group operations into transactions.
● Guarantees read-your-write behavior.
Linearizable Consistency in CAP
● CAP Theorem is about “atomic consistency”
● Atomic consistency refers only to a property of a single
request/response operation sequence.
● Strong Consistency in CAP is Linearizability
Serializable Consistency
● Transaction Isolation
● Database guarantees that two transactions have the same
effect as if they were run serially.
● multi-operation, multi-object, arbitrary total order
Strict Serializability
● Linearizability plus Serializability provides Strict
Serializability
● Highest level of Consistency
● Guarantee ordering and transaction isolation
Linearizable vs. Serializable Consistency
● Serializability - multi-operation, multi-object, arbitrary total order
● Linearizability - single-operation, single-object, real-time order
● Strict Serializability - Linearizability plus Serializability provides Strict
Serializability
Peter Bailis - Linearizability versus Serializability
No One Solution to Consistency
● Do you want your data right or right now? - Pat Helland
● PACELC Theorem -> More than CAP
○ In the absence of network partitions the trade-off is
between latency and consistency - Daniel Abadi
● Evaluate trade-offs in the differing approaches
Data Consistency in
Microservices and
Serverless
From Monolith to Microservices to Serverless
● Data Consistency was easy in a monolith application -
single source of truth w/ ACID transactions
● With the move to microservices, each service became a bounded
context that owns and manages its data.
● Data Consistency became very difficult w/ microservices
● Serverless increases the complexity even more
Consistency Challenges with Data in Microservices
● Traditional ACID transactions did not scale
● Data orchestration between multiple services - Number of
Microservices Increases Number of Interactions
● Stateful or Stateless
● Data rehydration for things like service failures and rolling
updates.
Popularity of Eventual Consistency
CAP Theorem
• Forces a choice between Global Scale and Strong Consistency
Eventual Consistency
• Sacrificed consistency for availability and partition tolerance.
• Really a Necessary Evil
• Write now and figure it out later
Pushed complexity of managing consistency to application tier
Rise of Managing
Consistency
in the Database
Value of Consistency in the Database
● Decrease Application Tier Complexity
● Reduce Cognitive Overhead
● Increased Developer Productivity
● Increased Focus on Business Value
● Most implementations also provide strong atomicity and isolation
● Push complexity of consistency back to the database
● Not a panacea for all data consistency challenges
Case Study - AdStage
● Recently migrated from Cassandra to Postgres
● Leverage Postgres DB Transactions
● Found Postgres to be extremely capable with advanced
data model and query capabilities
● Significant decrease in application and operational
complexity
● Significantly reduced operational costs
Leveraging DB Consistency
● Ledger Pattern with Compare and Swap Like Operation
● Application reads latest ledger id from DB
● Application makes an update with what it thinks is the latest
ledger id plus one
● DB transaction / stored procedure to read the last ledger id and
make the update if the ledger id is greater than the last entry
● If update fails DB returns correct Ledger ID
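The ledger steps above can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module rather than Postgres or a stored procedure; the `ledger` table, the `amount` column, and the helper names `open_ledger`, `latest_id`, and `append_entry` are assumptions made for the example.

```python
import sqlite3


def open_ledger():
    # isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
    return conn


def latest_id(conn):
    """Return the last ledger id, or 0 when the ledger is empty."""
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM ledger").fetchone()[0]


def append_entry(conn, proposed_id, amount):
    """Insert the entry only if proposed_id is greater than the last entry.

    Returns (True, proposed_id) on success, or (False, current_last_id) so the
    caller can retry with the correct ledger id.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    last = latest_id(conn)
    if proposed_id > last:
        conn.execute("INSERT INTO ledger (id, amount) VALUES (?, ?)", (proposed_id, amount))
        conn.execute("COMMIT")
        return True, proposed_id
    conn.execute("ROLLBACK")
    return False, last


conn = open_ledger()
seen_by_a = latest_id(conn)  # both clients read the same latest id
seen_by_b = latest_id(conn)
result_a = append_entry(conn, seen_by_a + 1, 100)
result_b = append_entry(conn, seen_by_b + 1, 250)  # stale id, rejected
retry_b = append_entry(conn, result_b[1] + 1, 250)  # retry with the returned id
print(result_a, result_b, retry_b)
```

The read and the conditional write happen inside one database transaction, so the application never has to resolve the race itself; it only retries with the id the database hands back.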
Traditional / Hybrid NoSQL DB’s
● Cloud Operated Relational DB’s are a re-emerging trend.
● Cloud SQL w/ Postgres or MySQL
● AWS Aurora - Amazon re-designed MySQL as a
cloud-native relational database
● AWS DynamoDB w/ Transactions - Multiple Object with limits
to single region
Next Generation Databases
● Google Spanner - Horizontally scalable, globally consistent, relational database
service. Relies on Proprietary Atomic Clocks and Low Latency Network.
● Cockroach & YugaByte - Open Source version of Spanner with 2 Phase
Commits and Hybrid-Logical Clocks
● Fauna - Single Phase Commit with no hard dependency on clocks
● FoundationDB - Serializable Optimistic MVCC concurrency. Loosely based on
Google Percolator
● TiDB - Hybrid Transactional and Analytical Processing (HTAP) workloads.
Features “horizontal scalability, strong consistency, and high availability.”
● Microsoft Azure Cosmos DB - Configurable consistency guarantees
Transactions are hard. Distributed transactions are
harder. Distributed transactions over the WAN are
final boss hardness. I'm all for new DBMSs but
people should tread carefully. - Andy Pavlo
New Generation / Global Transactional Databases
Not All Global Databases are the Same
● Differences in Transaction Protocol
● Global Ordering Done in a Single Phase vs. Multi-Phase
● Pre or Post Commit Transaction Resolution
● Different levels of consistency
● Maximum scope of a transaction - Single Record vs. Multiple
Records
● Geographic limits of transactions - Single Region vs. Global?
● Storage Layer is an entirely other discussion beyond the
transaction protocol. Large impact on performance and stability!
Weak Isolation Level
Scope of Transaction -
Single Row
Eventually Consistent
Strongest Isolation Level
Scope of Transaction -
Distributed Across
Partitions
Serializable Consistency
Consistency and the ACID Spectrum
Consistency Levels in Next Gen Databases - 1/2
● Google Spanner - External strong consistency across rows, regions, and
continents.
● Yugabyte - snapshot isolation, not serializability yet, writes must go to
partition leaders. Reliance on hybrid clocks makes it difficult to run in
virtualized environments.
● Cockroach - serializability but not strict serializability, reads and writes must
go to partition leaders, no replica reads allowed
Consistency Levels in Next Gen Databases - 2/2
● TiDB - read-committed within a datacenter, no serializability, timestamp oracle
must issue leases for all write transactions, replica reads unclear
● FoundationDB - Serializable Snapshot Isolation and strictly serializable within a
datacenter, timestamp oracle must issue leases for all serializable reads and
all writes, snapshot reads possible
● FaunaDB - Global pre-ordering of transactions provides strict serializable consistency
● Azure Cosmos DB - Five consistency models allow developer to choose between
latency and consistency. Highest Level of consistency is strong consistency with
linearizability guarantees. Doesn’t seem to be strict serializable?
Adventures in Application
Tier Consistency
Application Tier Consistency
Write now and figure it out later
Advantages of Application Tier Consistency
● Low Read / Write Latency
● High-Throughput
● Read your Writes - Same session only
● Requires application to enforce session stickiness
Disadvantages of Application Tier Consistency
● Consistency problems are far harder to solve in the
application tier
● Increased Complexity
● No Isolation and limited atomicity
● Corner Case Bugs - Consistency is really hard to test and
verify
● No magic pattern or technology that you can sprinkle on
data to make it consistent.
Options for Application Tier Consistency
● Serialization Points - e.g. Kafka Consumers pinned to session IDs.
● Akka Clustering - Stateful Services pinned to a client id.
● CRDT - Conflict Free Replicated Data Types, e.g. Associative
Counters. Data must be of a certain shape to work.
● Event Sourcing / Append Only Logging with Aggregates for running
totals. Hard to provide consistency guarantees across aggregates.
● Saga Pattern - Builds on Event Sourcing and uses a Central
Coordinator to manage complex transaction logic. Relies heavily on
idempotent services that can roll back transactions in the face of
failures.
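The serialization-point idea from the first option can be sketched without any Kafka client: route every event for a session to the same partition and let a single consumer apply that partition in order. The partition count, the CRC32 hash (a stand-in, not Kafka's own partitioner), and the helper names are assumptions for illustration only.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(session_id):
    # Stable hash so the same session always maps to the same partition.
    return zlib.crc32(session_id.encode("utf-8")) % NUM_PARTITIONS


def route(events):
    """Group (session_id, delta) events by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for session_id, delta in events:
        partitions[partition_for(session_id)].append((session_id, delta))
    return partitions


def consume(partition_events):
    """A single consumer applies one partition's events strictly in order."""
    balances = defaultdict(int)
    for session_id, delta in partition_events:
        balances[session_id] += delta
    return dict(balances)


events = [("s1", 10), ("s2", 5), ("s1", -3), ("s2", 1), ("s1", 2)]
partitions = route(events)
totals = {}
for partition_events in partitions.values():
    totals.update(consume(partition_events))
print(totals)
```

Ordering is guaranteed only within a session. Anything that spans sessions still needs one of the other options in the list.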
Patterns for Application Tier Consistency
● Kafka Consumer Serialization Points
● Akka Clustering w/ Cluster Singletons
● CRDT - Conflict Free Replicated Data Types
● Event Sourcing / Append Only Logging
with Aggregates
● CQRS
● Saga Pattern
● Custom Distributed Transactions
WIRED
TIRED
CRDT’s
● CRDT - Conflict Free Replicated Data Types
● Data types that guarantee convergence to the same value without any
synchronization mechanism
● Consistency without Consensus
● Avoid distributed locks, two-phase commit, etc. Data Structure that
tells how to build the value
● Sacrifice linearizability (guaranteed ordering) while remaining correct
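As one illustration of "consistency without consensus", here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and replica names are made up for the example; production CRDT libraries offer richer types.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # max is commutative, associative, and idempotent, so replicas converge
        # regardless of the order or number of merges.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(3)  # updates happen independently, with no coordination
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
print(a.value(), b.value())
```

The data has to fit this shape: each replica only ever raises its own slot, which is why the merge can never lose an update.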
Overview of Saga Pattern
● Central Coordinator
● Manages Complex Transaction Logic
● State managed in a distributed log
● Split work into idempotent executors / requests
● Requires compensating transactions for dealing with failures /
aborting transaction
● Effectively Once instead of Exactly Once
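A rough sketch of the coordinator described above is shown below. It assumes each step is a `(name, action, compensation)` triple and uses a plain Python list as a stand-in for the distributed log; the booking functions are hypothetical and chosen to be idempotent.

```python
def run_saga(steps, log):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        log.append(("start", name))
        try:
            action()
        except Exception as exc:
            log.append(("failed", name, str(exc)))
            # Compensate in reverse order of completion.
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(("compensated", done_name))
            return False
        log.append(("done", name))
        completed.append((name, compensate))
    return True


bookings = set()  # set add/discard are idempotent, so retries are safe


def book_flight():
    bookings.add("flight")


def cancel_flight():
    bookings.discard("flight")


def book_hotel():
    raise RuntimeError("no rooms available")


def cancel_hotel():
    bookings.discard("hotel")


saga_log = []
ok = run_saga(
    [("flight", book_flight, cancel_flight), ("hotel", book_hotel, cancel_hotel)],
    saga_log,
)
print(ok, bookings, saga_log)
```

Between the flight booking and its compensation, other readers can observe the partially committed state. The sketch does nothing to hide that, which is the weak isolation this pattern trades away.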
The Challenges with the Saga Pattern
● Consistency is reliant on the consistency of the distributed log
● Limited Consistency
● Weak Isolation
● No Guaranteed Atomicity - Unsafe partially committed states
● Complexity with versioning of Saga Logic
● Increased application complexity
● Rollback and recovery logic required in application tier
● Idempotency impossible for some services
● Effectively Once instead of Exactly Once
Addendum
Global Scale Next Gen
Databases
Spanner
● External consistency, an isolation level even stricter than strict serializability
● Relational Integrity Constraints
● 99.999% availability SLA
● Uses global commit timestamps to guarantee ordering of transactions via the
TrueTime API.
● Multiple Shards with 2PC
● Single Shard Avoids 2PC for Writes / Read-only Transactions also avoid 2 PC
● No Downtime upgrades - Maintenance done by moving data between nodes
● Downside is cost and some limitations to the SQL model and schema design
CockroachDB
● Open source Database Inspired by Spanner
● Hybrid Logical Clock similar to a vector clock for ordering of transactions
● Challenges with clock skew - waits up to 250 ms on reads
● Provides linearizability on single key and overlapping keys
● For transactions that span disjoint sets of keys, it provides only serializability and not
linearizability
● Some edge cases cause anomalies called “causal reverse” - Jepsen
● “Enterprise-only” features like row-level replication zones
● Supports migrating by supporting PostgreSQL syntax and drivers, however it does
not offer exact compatibility.
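For readers unfamiliar with hybrid logical clocks, the sketch below follows the general published HLC algorithm: a timestamp pairs the highest physical time seen with a logical counter that breaks ties. It is an illustration only and makes no claim about CockroachDB's actual implementation; the class name, method names, and the fixed integer clocks are assumptions.

```python
class HybridLogicalClock:
    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning an integer time
        self.l = 0  # highest physical time seen so far
        self.c = 0  # logical counter that breaks ties within the same l

    def now(self):
        """Timestamp a local or send event."""
        previous = self.l
        self.l = max(previous, self.physical_clock())
        self.c = self.c + 1 if self.l == previous else 0
        return (self.l, self.c)

    def update(self, remote):
        """Timestamp a receive event, given the sender's (l, c) timestamp."""
        remote_l, remote_c = remote
        previous = self.l
        self.l = max(previous, remote_l, self.physical_clock())
        if self.l == previous and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == previous:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


fast = HybridLogicalClock(lambda: 100)  # node whose wall clock runs ahead
slow = HybridLogicalClock(lambda: 90)   # node whose wall clock lags by 10 units

sent = fast.now()
received = slow.update(sent)
followup = slow.now()
print(sent, received, followup)
```

Even though the slow node's wall clock is behind, its timestamps for causally later events still compare greater than the sender's. What the clock cannot do is order events that never exchanged a message, which is where the skew-related waits and anomalies listed above come from.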
YugaByte
● Another Database Inspired by Spanner that relies on Hybrid Logical Clocks
● Currently only supports snapshot isolation
● Serializable isolation level work in progress
● Distributed Transactions to multiple partitions require a provisional record or
temporary table
FaunaDB - Consistency without Clocks
● Transaction resolution based on the Calvin protocol - pre-ordering of transactions
before commit
● Global transaction ordering provides serializable consistency
● Transactions can include multiple rows - not restricted to data in a single row or
shard
● Distributed log based algorithm scales throughput with cluster size by partitioning
the log
● Low Latency Snapshot Reads
● Proprietary Query Language with a high learning curve
● Optimistic concurrency model can cause a high number of failures with
high-contention workloads
References
● Bla-bla-microservices-bla-bla http://jonasboner.com/bla-bla-microservices-bla-bla/
● Aphyr Strong consistency models -
https://aphyr.com/posts/313-strong-consistency-models
● Achieving ACID Transactions in a Globally Distributed Database from FaunaDB
● Peter Bailis - Linearizability versus Serializability
● Calvin: fast distributed transactions for partitioned database systems


Data Patterns

  • 1. Data Patterns in Microservice Applications Ryan Knight - CEO / CTO Grand Cloud @knight_cloud
  • 2. Ryan Knight ● CEO / CTO of Grand Cloud - Boutique consulting company working at the intersection of Distributed Systems and Data Engineering ● Experience ranges across traditional software development and architecture to sales engineering, consulting, solution architecture and developer advocacy. ● Worked across wide range of companies from small startups such as Lightbend and DataStax to Large Corporations such as Starbucks and Capital One. ● Consulting Experience spans over 50 companies and 10 Countries ● Currently Consulting at Brighthouse Financial
  • 3. Distributed System Design Heart of distributed system design is a requirement for a consistent, performant, and reliable way of managing data - Jonas Bonér
  • 4. Cloud Native -> New Requirements Users: 1 million+ Data volume: TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request rate: Millions Access: Web, mobile, IoT, devices Scale: Up-down, Out-in Economics: Pay for what you use Developer access: No assembly required
  • 6. ● RDBMS ● CAP Theorem ● Trade-off between Consistency and Scale ● Rise of Eventual Consistency ● NoSQL Databases EASY COMPLEX ACID TXN / Strong Consistency Eventual Consistency (D)evolution of Consistency
  • 9. Challenges with Application Tier Consistency ● Consistency problems are far harder to solve in the application tier ● Increased Corner Case Bugs ○ Consistency is really hard to get right in the Application Tier! ○ Consistency is really hard to test and verify ● Increased Complexity
  • 10. Business Impact of Consistency ● Travel Booking of Flight, Hotel, etc. - Inconsistencies could either lead to double bookings or lost bookings. ● Rewards Program - Very difficult to prevent fraudulent redemptions. Potential for monetary loss. ● Physical Allocation of Resources vs. Digital Realm ● Inventory / Limited Sales
  • 11. Direct Business Value of “Strong Consistency” ● Increases accuracy of sales and reduces lost business revenue ● Cost Savings with reduced operational complexity and increased visibility into business operations. ● Weak Consistency is a Security Concern - Possible financial loss from inconsistent views of data. ● ACIDRain Attack - Todd Warszawski, Peter Bailis ○ 22 critical ACIDRain attacks that allow attackers to corrupt store inventory,over-spend gift cards, and steal inventory. ○ Bankrupt popular Bitcoin exchange
  • 12. Eventual Consistency ● Internet of Things ● Media ● Retail ● Real-time Analytics ● Time-Series ● Monitoring ● Customer 360 Strong Consistency ● Financial Transactions ● Rewards Programs ● Inventory Management ● Global Meta-Data ● Travel Reservations ● Gaming ● Billing / Payments ● Ad Tech
  • 13. What is Data Consistency?
  • 14. Challenges with Understanding Consistency ● Lots of Definitions of Consistency ● Consistency in ACID is about enforcing invariants ○ Data must be valid according to all defined rules ○ Not the consistency we are looking for ● "Strong consistency" - term used to differentiate full consistency from weaker levels of consistency such as casual or session consistency.
  • 15. Consistency Challenges Dirty Reads - Read Uncommitted Write Read Skew / Non-Repeatable Reads Read your own Writes Lost Updates Write Skew
  • 16. Write Skew Two concurrent transactions each determine what they are writing based on reading a data set which overlaps what the other is writing begriffs.com
  • 17. Consistency Models Credit to Peter Bailis and Aphyr at jepsen.io http://www.bailis.org/blog/linearizability-versus-serializability/
  • 18. Linearizability ● Guarantees that the order of reads and writes to a single register or row will always appear the same on all nodes. ● Appearance that there is only one copy of the data. ● It doesn’t group operations into transactions. ● Guarantees read-your-write behavior.
  • 19. Linearizable Consistency in CAP ● CAP Theorem is about “atomic consistency” ● Atomic consistency refers only to a property of a single request/response operation sequence. ● Strong Consistency in CAP is Linearizability
  • 20. Serializable Consistency ● Transaction Isolation ● Database guarantees that two transactions have the same effect as if they where run serially. ● multi-operation, multi-object, arbitrary total order
  • 21. Strict Serializability ● Linearizability plus Serializability provides Strict Serializability ● Highest level of Consistency ● Guarantee ordering and transaction isolation
  • 22. Linearizable vs. Serializable Consistency ● Serializability - multi-operation, multi-object, arbitrary total order ● Linearizability - single-operation, single-object, real-time order ● Strict Serializability - Linearizability plus Serializability provides Strict Serializability Peter Bailis - Linearizability versus Serializability
  • 23. No One Solution to Consistency ● Do you want your data right or right now? - Pat Helland ● PACELC Theorem -> More than CAP ○ In the absence of network partitions the trade-off is between latency and consistency - Daniel Abadi ● Evaluate trade-offs in the differing approaches
  • 25. From Monolith to Microservices to Serverless ● Data Consistency was easy in a monolith application - single source of truth w/ ACID transactions ● Move to microservices each service became a bounded context that owns and manages its data. ● Data Consistency became very difficult w/ microservices ● Serverless increases the complexity even more
  • 26. Consistency Challenges with Data in Microservices ● Traditional ACID transactions did not scale ● Data orchestration between multiple services - Number of Microservices Increases Number of Interactions ● Stateful or Stateless ● Data rehydration for things like service failures and rolling updates.
  • 27. Popularity of Eventual Consistency CAP Theorem • Force choice between Global Scale or Strong Consistency Eventual Consistency • Sacrificed consistency for availability and partition tolerance. • Really a Necessary Evil • Write now and figure it out later Pushed complexity of managing consistency to application tier
  • 29. Value of Consistency in the Database ● Decrease Application Tier Complexity ● Reduce Cognitive Overhead ● Increased Developer Productivity ● Increased Focus on Business Value ● Most implementations also provide strong atomicity and isolation ● Push complexity of consistency back to the database ● Not a panacea for all data consistency challenges
  • 30. Case Study - AdStage ● Recently migrated from Cassandra to Postgres ● Leverage Postgres DB Transactions ● Found Postgres to be extremely capable with advance data model and query capabilities ● Significant decrease in application and operational complexity ● Significantly reduced operational costs
  • 31. Leveraging DB Consistency ● Ledger Pattern with Compare and Swap Like Operation ● Application reads latest ledger id from DB ● Application makes an update with what it thinks is the latest ledger id plus one ● DB transaction / stored procedure to read the last ledger id and make the update if the ledger id is greater than the last entry ● If update fails DB returns correct Ledger ID
  • 32. Traditional / Hybrid NoSQL DB’s ● Cloud Operated Relation DB’s are a re-emerging trend. ● Cloud SQL w/ Postgres or MySQL ● AWS Aurora - Amazon re-designed MySQL as a cloud-native relational database ● AWS Dynamo w/ Transactions - Multiple Object with limits to single region
  • 33. Next Generation Databases ● Google Spanner - Horizontally scalable, globally consistent, relational database service. Relies on on Proprietary Atomic Clocks and Low Latency Network. ● Coackroach & YugaByte - Open Source version of Spanner with 2 Phase Commits and Hybrid-Logical Clocks ● Fauna - Single Phase Commit with no hard dependency on clocks ● FoundationDB - Serializable Optimistic MVCC concurrency. Loosely based on Google Percolator ● TiDB - Hybrid Transactional and Analytical Processing (HTAP) workloads. Features “horizontal scalability, strong consistency, and high availability.” ● Microsoft Azure Cosmos DB - Configurable consistency guarantees
  • 34. Transactions are hard. Distributed transactions are harder. Distributed transactions over the WAN are final boss hardness. I'm all for new DBMSs but people should tread carefully. - Andy Pavlo New Generation / Global Transactional Databases
  • 35. Not All Global Databases are the Same ● Differences in Transaction Protocol ● Global Ordering Done in a Single Phase vs. Multi-Phase ● Pre or Post Commit Transaction Resolution ● Different levels of consistency ● Maximum scope of a transaction - Single Record vs. Multiple Records ● Geographic limits of transactions - Single Region vs. Global? ● Storage Layer is an entirely other discussion beyond the transaction protocol. Large impact on performance and stability!
  • 36. Week Isolation Level Scope of Transaction - Single Row Eventually Consistent Strongest Isolation Level Scope of Transaction - Distributed Across Partitions Serializable Consistency Consistency and the ACID Spectrum
  • 37. Consistency Levels in Next Gen Databases - 1/2 ● Google Spanner - External strong consistency across rows, regions, and continents. ● Yugabyte - snapshot isolation, not serializability yet, writes must go to partition leaders. Reliance on hybrid clocks makes it difficult to run in virtualized environments. ● Cockroach - serializability but not strict serializability, reads and writes must go to partition leaders, no replica reads allowed
  • 38. Consistency Levels in Next Gen Databases - 2/2 ● TiDB - read-committed within a datacenter, no serializability, timestamp oracle must issue leases for all write transactions, replica reads unclear ● FoundationDb: Serializable Snapshot Isolation and strictly serializable within a datacenter, timestamp oracle must issue leases for all serializable reads and all writes, snapshot reads possible ● FaunaDB - Global pre-ordering of transactions provides strict serializable consistency ● Azure Cosmos DB - Five consistency models allow developer to choose between latency and consistency. Highest Level of consistency is strong consistency with linearizability guarantees. Doesn’t seem to be strict serializable?
  • 40. Application Tier Consistency Write now and figure it out later
  • 41. Advantages of Application Tier Consistency ● Low Read / Write Latency ● High-Throughput ● Read your Writes - Same session only ● Requires application to enforce session stickiness
  • 42. Disadvantages of Application Tier Consistency ● Consistency problems are far harder to solve in the application tier ● Increased Complexity ● No Isolation and limited atomicity ● Corner Case Bugs - Consistency is really hard to test and verify ● No magic pattern or technology that you can sprinkle on data to make it consistent.
  • 43. Options for Application Tier Consistency ● Serialization Points - e.g. Kafka Consumers pinned to session IDs. ● Akka Clustering - Stateful Services pinned to a client ID. ● CRDT - Conflict Free Replicated Data Types, e.g. Associative Counters. Data must be of a certain shape to work. ● Event Sourcing / Append Only Logging with Aggregates for running totals. Hard to provide consistency guarantees across aggregates. ● Saga Pattern - Builds on Event Sourcing and uses a Central Coordinator to manage complex transaction logic. Relies heavily on idempotent services that can roll back transactions in the face of failures.
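The event sourcing option above can be illustrated with a minimal sketch, assuming an in-memory list stands in for the append-only log. The names `PointsEvent`, `RewardsAccount`, and `rebuild` are hypothetical and chosen only for this example; the aggregate's running total is derived by replaying events rather than stored directly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PointsEvent:
    """Immutable fact appended to the log (hypothetical event type)."""
    account_id: str
    delta: int  # positive = points earned, negative = points redeemed


class RewardsAccount:
    """Aggregate whose balance is a running total over its events."""

    def __init__(self, account_id: str) -> None:
        self.account_id = account_id
        self.balance = 0

    def apply(self, event: PointsEvent) -> None:
        self.balance += event.delta


def rebuild(account_id: str, log: List[PointsEvent]) -> RewardsAccount:
    """Replay the log in order to reconstruct one aggregate's state."""
    account = RewardsAccount(account_id)
    for event in log:
        if event.account_id == account_id:
            account.apply(event)
    return account


# In-memory stand-in for an append-only log (assumption for this sketch).
event_log = [
    PointsEvent("alice", 100),
    PointsEvent("bob", 40),
    PointsEvent("alice", -30),
]
alice = rebuild("alice", event_log)
bob = rebuild("bob", event_log)
```

Each aggregate is consistent with its own events, but nothing here coordinates a change that spans `alice` and `bob`, which is the cross-aggregate limitation the slide mentions.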
  • 44. Patterns for Application Tier Consistency ● Kafka Consumer Serialization Points ● Akka Clustering w/ Cluster Singletons ● CRDT - Conflict Free Replicated Data Types ● Event Sourcing / Append Only Logging with Aggregates ● CQRS ● Saga Pattern ● Custom Distributed Transactions WIRED TIRED
  • 45. CRDTs ● CRDT - Conflict Free Replicated Data Types ● Data types that guarantee convergence to the same value without any synchronization mechanism ● Consistency without Consensus ● Avoid distributed locks, two-phase commit, etc. ● Data Structure that tells how to build the value ● Sacrifice linearizability (guaranteed ordering) while remaining correct
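As a concrete illustration of convergence without synchronization, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are assumptions for this example and do not come from any particular library. Each replica increments only its own slot, and merging takes the element-wise maximum, so merges can arrive in any order or more than once.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # max is commutative, associative, and idempotent, so replicas converge.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(3)   # update applied locally on replica a
b.increment(2)   # concurrent update on replica b
a.merge(b)
b.merge(a)
b.merge(a)       # a repeated merge changes nothing
```

The "data must be of a certain shape" caveat shows up here: the counter can only grow, and supporting decrements would need a different structure.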
  • 46. Overview of Saga Pattern ● Central Coordinator ● Manages Complex Transaction Logic ● State managed in a distributed log ● Split work into idempotent executors / requests ● Requires compensating transactions for dealing with failures / aborting transaction ● Effectively Once instead of Exactly Once
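The coordinator role described above can be sketched as follows. This is a simplified illustration under stated assumptions: `run_saga` is a hypothetical helper, a plain Python list stands in for the distributed log, and the steps and compensations are local functions rather than remote services.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are hypothetical for this sketch.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
        completed.append(step)
        log.append(f"done:{name}")
    return True


bookings = set()


def fail_hotel() -> None:
    raise RuntimeError("no rooms available")


saga_log: List[str] = []  # in-memory stand-in for the distributed log
ok = run_saga(
    [
        # set.add / set.discard are idempotent, so a retry would be harmless.
        ("flight", lambda: bookings.add("flight"), lambda: bookings.discard("flight")),
        ("hotel", fail_hotel, lambda: bookings.discard("hotel")),
    ],
    saga_log,
)
```

Between the flight step succeeding and its compensation running, other readers could observe the flight booking, which is the weak isolation that comes with this pattern.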
  • 47. The Challenges with the Saga Pattern ● Consistency is reliant on the consistency of the distributed log ● Limited Consistency ● Weak Isolation ● No Guaranteed Atomicity - Unsafe partially committed states ● Complexity with versioning of Saga Logic ● Increased application complexity ● Rollback and recovery logic required in application tier ● Idempotency impossible for some services ● Effectively Once instead of Exactly Once
  • 50. Global Scale Next Gen Databases
  • 51. Spanner ● External consistency, an isolation level even stricter than strict serializability ● Relational Integrity Constraints ● 99.999% availability SLA ● Uses global commit timestamps from the TrueTime API to guarantee ordering of transactions ● Multi-shard transactions use 2PC ● Single-shard writes avoid 2PC / Read-only transactions also avoid 2PC ● No-downtime upgrades - Maintenance done by moving data between nodes ● Downside is cost and some limitations to the SQL model and schema design
  • 52. CockroachDB ● Open source Database Inspired by Spanner ● Hybrid Logical Clock similar to a vector clock for ordering of transactions ● Challenges with clock skew - waits up to 250 ms on reads ● Provides linearizability on single key and overlapping keys ● For transactions that span disjoint sets of keys, it provides only serializability and not linearizability ● Some edge cases cause anomalies called “causal reverse” - Jepsen ● “Enterprise-only” features like row-level replication zones ● Supports migration by accepting PostgreSQL syntax and drivers; however, it does not offer exact compatibility.
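To make the hybrid logical clock idea more concrete, here is a minimal sketch of the HLC update rules as they are commonly described in the literature. It is not CockroachDB's implementation; the class names are hypothetical, and physical time is passed in as a plain integer so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Timestamp:
    wall: int     # physical component (e.g. milliseconds)
    logical: int  # tie-breaker when wall components are equal


class HybridLogicalClock:
    def __init__(self) -> None:
        self.wall = 0
        self.logical = 0

    def now(self, physical: int) -> Timestamp:
        """Timestamp a local or send event."""
        if physical > self.wall:
            self.wall, self.logical = physical, 0
        else:
            # Physical clock has not advanced: bump the logical counter.
            self.logical += 1
        return Timestamp(self.wall, self.logical)

    def update(self, remote: Timestamp, physical: int) -> Timestamp:
        """Timestamp a receive event, staying ahead of the sender."""
        new_wall = max(self.wall, remote.wall, physical)
        if new_wall == self.wall and new_wall == remote.wall:
            self.logical = max(self.logical, remote.logical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == remote.wall:
            self.logical = remote.logical + 1
        else:
            self.logical = 0
        self.wall = new_wall
        return Timestamp(self.wall, self.logical)


node_a = HybridLogicalClock()
node_b = HybridLogicalClock()
sent = node_a.now(physical=100)               # a's clock reads 100
received = node_b.update(sent, physical=90)   # b's clock is 10 behind
later = node_b.now(physical=91)
```

Even though node b's physical clock lags, its timestamps still order after the message it received, which is the causality property the clock is meant to preserve. Bounding how far the wall component can drift from real time is a separate concern that this sketch does not address.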
  • 53. YugaByte ● Another Database Inspired by Spanner that relies on Hybrid Logical Clocks ● Currently only supports snapshot isolation ● Serializable isolation level work in progress ● Distributed Transactions to multiple partitions require a provisional record or temporary table
  • 54. FaunaDB - Consistency without Clocks ● Transaction resolution based on the Calvin protocol - pre-ordering of transactions before commit ● Global transaction ordering provides serializable consistency ● Transactions can include multiple rows - not restricted to data in a single row or shard ● Distributed log based algorithm scales throughput with cluster size by partitioning the log ● Low Latency Snapshot Reads ● Proprietary Query Language with a steep learning curve ● Optimistic concurrency model can cause a high number of failures with highly contended workloads
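The idea of agreeing on transaction order before execution can be illustrated with a toy sketch. This is not FaunaDB's or Calvin's actual protocol: `transfer` and `apply_log` are hypothetical helpers, and a Python list stands in for the globally ordered log. The point is only that deterministic transactions applied in the same order yield the same state on every replica, with no clock comparison involved.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


def transfer(src: str, dst: str, amount: int) -> Txn:
    """Build a deterministic transaction that moves funds if available."""
    def txn(state: State) -> None:
        if state[src] >= amount:
            state[src] -= amount
            state[dst] += amount
        # Otherwise the transaction is a no-op on every replica alike.
    return txn


def apply_log(initial: State, log: List[Txn]) -> State:
    """Apply the pre-ordered log to a private copy of the initial state."""
    state = dict(initial)
    for txn in log:
        txn(state)
    return state


initial = {"a": 50, "b": 0, "c": 0}
# Order is fixed before execution; both transfers contend on account "a".
ordered_log = [transfer("a", "b", 40), transfer("a", "c", 40)]
replica_1 = apply_log(initial, ordered_log)
replica_2 = apply_log(initial, ordered_log)
```

Because the second transfer sees the result of the first on every replica, both replicas reject it identically instead of diverging.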
  • 55. References ● Bla-bla-microservices-bla-bla - http://jonasboner.com/bla-bla-microservices-bla-bla/ ● Aphyr - Strong consistency models - https://aphyr.com/posts/313-strong-consistency-models ● Achieving ACID Transactions in a Globally Distributed Database from FaunaDB ● Peter Bailis - Linearizability versus Serializability ● Calvin: fast distributed transactions for partitioned database systems