Michał Gryglicki
gryglicki.michal<at>gmail.com
LOST WITH DATA CONSISTENCY
WHAT ARE WE GOING TO TALK ABOUT?
fairy tales
&
„lies for children”
&
others
WHAT KIND OF CONSISTENCIES ARE WE
TALKING ABOUT?
• Consistency of Backup = Point-in-time consistency
• Consistency in the sense of ACID properties of database transactions
• Consistency in the sense of concurrent programming
• Consistency between multiple replicas of the same data in the distributed system
• because it’s something you just can’t neglect
• because software fails, hardware fails, and network partitions happen
• in an uncertain world, we want our software to maintain some sort of „correctness”
COMMON APPROACH TO CONSISTENCY
• Use defaults: e.g. „READ COMMITTED” in relational databases – common practice
• Change the configuration (consistency model) as a fix for an issue we are trying to solve
(usually found on Stack Overflow)
• Belief that your system/database provides a much stronger consistency model than it really
does
• especially in the RDBMS world
• Fear of any form of inconsistency in the system, even though most real-life businesses
have dealt with inconsistencies for ages
DEFINITION OF CONSISTENCY MODEL
• A set of rules (for the order and visibility of reads and updates) that a system providing
this consistency model must obey at every state in its history
• A consistency model is the set of all allowed histories of operations
• Deals with interleavings of operations
• We need to know the consistency model of our system in order to write predictable
programs
• Extreme examples:
• Every read always returns zero
• There are no rules at all – easy model satisfied trivially by every system
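One way to read "a consistency model is the set of all allowed histories" is as a predicate over histories. The sketch below is only an illustration: the history encoding (a list of `(operation, value)` pairs on one register) and all function names are assumptions, not part of the talk.

```python
# A history is a list of (operation, value) pairs on a single register.
# A consistency model is a predicate: does the model allow this history?

def reads_always_zero(history):
    """Extreme model 1: every read returns zero, whatever was written."""
    return all(value == 0 for op, value in history if op == "read")

def no_rules(history):
    """Extreme model 2: no rules at all, so every history is allowed."""
    return True

def single_copy_register(history):
    """A more useful model: each read returns the latest written value (initially 0)."""
    current = 0
    for op, value in history:
        if op == "write":
            current = value
        elif value != current:
            return False
    return True

history = [("write", 5), ("read", 5), ("read", 0)]
print(no_rules(history), reads_always_zero(history), single_copy_register(history))
```

The same history is accepted by one model and rejected by the others, which is why a program can only be predictable with respect to a stated model.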
TRANSACTIONS
• To provide reliable units of work that allow correct recovery from failures and keep a
database consistent even in cases of system failure
• To provide isolation between programs accessing a database concurrently
• ACID = Atomicity, Consistency, Isolation, Durability
• a set of properties of database transactions intended to guarantee validity even in
the event of errors
• Consistency – ensures that any transaction will bring the database from one valid
state to another (constraints, foreign keys, triggers) – Integrity?
• Isolation – ensures that the concurrent execution of transactions results in a system
state that would be obtained if transactions were executed sequentially, i.e., one
after the other – Serializable isolation level
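A minimal sketch of the "C" and "A" of ACID working together, using Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the amounts are hypothetical; the point is only that a constraint violation rolls the whole unit of work back, leaving the database in a valid state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically; returns False if a constraint rejects the transfer."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
print(ok, failed, balances(conn))
```

The second transfer credits bob before the CHECK constraint fails on alice, yet the credit is not visible afterwards because the whole transaction is rolled back.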
TOO MANY CONSISTENCY MODELS
• A lot of theoretical consistency models:
• depending on the problem area
• processor architecture
• concurrent programming
• distributed systems
• using different elementary definitions
• possible ordering of elements
• relation between reads and writes
• time constraints
• Each database implementation defines its own private consistency model, usually determined by
its configuration options
CONSISTENCY MODELS (ONLY A FEW)
https://aphyr.com/posts/313-strong-consistency-models
LINEARIZABILITY (ATOMIC CONSISTENCY)
• single-operation, single-object, real-time order
• It provides a real-time (wall-clock) guarantee on the behavior of a set of single operations
(reads and writes) on a single object (distributed register)
• Operations appear to be instantaneous (atomic)
• Once an operation is complete, everyone must see it, or some later state (prohibits stale
reads)
• Gold standard in distributed systems
• „C” in the CAP Theorem
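The "no stale reads" rule can be checked mechanically on a recorded history. The sketch below is not a full linearizability checker: it assumes writes to the register do not overlap each other and that every written value is unique. The `Op` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Op:
    kind: str     # "write" or "read"
    value: int
    start: float  # wall-clock time the operation was invoked
    end: float    # wall-clock time the operation completed

def has_stale_read(history):
    """True if some read returns a value older than a write that had
    already completed before the read was invoked."""
    writes = sorted((op for op in history if op.kind == "write"), key=lambda op: op.start)
    position = {w.value: i for i, w in enumerate(writes)}
    for read in (op for op in history if op.kind == "read"):
        completed = [position[w.value] for w in writes if w.end < read.start]
        # A value that was never written (the initial state) gets position -1.
        if completed and position.get(read.value, -1) < max(completed):
            return True
    return False

fresh = [Op("write", 1, 0, 1), Op("read", 1, 2, 3)]
stale = [Op("write", 1, 0, 1), Op("write", 2, 2, 3), Op("read", 1, 4, 5)]
overlapping = [Op("write", 1, 0, 1), Op("write", 2, 2, 5), Op("read", 1, 3, 4)]
print(has_stale_read(fresh), has_stale_read(stale), has_stale_read(overlapping))
```

In the `overlapping` history the read runs concurrently with the second write, so returning the older value is still allowed.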
SERIALIZABILITY
• multi-operation, multi-object, arbitrary total order
• It guarantees that the execution of a set of transactions (usually containing read and write
operations) over multiple items is equivalent to some serial execution (total ordering) of
the transactions.
• Gold standard of Transactions – „I” in the definition of ACID
• A mechanism for guaranteeing database correctness
• If users’ transactions each preserve application correctness (“C,” or consistency, in
ACID), a serializable execution also preserves correctness
• Does not impose any real-time constraints on the ordering of transactions
• (no deterministic order)
• Only requires that some equivalent serial execution exists.
SERIALIZABILITY
• Example:
Table: tab
class   value
1       10
1       20
2       100
2       200
• Transaction 1:
SELECT SUM(value) FROM tab WHERE class = 1
INSERT INTO tab VALUES (2, 30)
COMMIT
• Transaction 2:
SELECT SUM(value) FROM tab WHERE class = 2
INSERT INTO tab VALUES (1, 300)
COMMIT
• If both run concurrently, there is no serial order of execution consistent with the result,
so a serializable database must reject one of them (exception)
• if Transaction 1 had executed before Transaction 2, Transaction 2 would have computed the sum 330, not 300
• similarly for the other order: Transaction 1 would have computed 330, not 30
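The example can be replayed in a few lines. This is a simulation under assumptions, not database code: the table is a Python list, each transaction is reduced to "sum one class, insert the sum into the other", and the concurrent run assumes both transactions read from the same initial snapshot.

```python
INITIAL = [(1, 10), (1, 20), (2, 100), (2, 200)]  # (class, value) rows of tab

def run(table, read_class, write_class, snapshot=None):
    """Sum `read_class` (from the snapshot if given) and insert the sum as `write_class`."""
    source = table if snapshot is None else snapshot
    total = sum(value for cls, value in source if cls == read_class)
    table.append((write_class, total))

T1, T2 = (1, 2), (2, 1)  # (class to sum, class to insert into)

def serial(order):
    table = list(INITIAL)
    for read_class, write_class in order:
        run(table, read_class, write_class)
    return sorted(table)

def concurrent_on_snapshot():
    table, snapshot = list(INITIAL), list(INITIAL)
    run(table, *T1, snapshot=snapshot)
    run(table, *T2, snapshot=snapshot)
    return sorted(table)

result = concurrent_on_snapshot()
print(result)
print(result in (serial([T1, T2]), serial([T2, T1])))
```

The concurrent result inserts 30 and 300, while either serial order inserts a 330, so the concurrent outcome matches no serial execution.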
STRICT (STRONG) SERIALIZABILITY
• combines Serializability and Linearizability
• Transaction behavior is equivalent to some serial execution, and the serial order
corresponds to real time.
• Implicitly assumes the presence of a global clock
• Example:
• User1: begins and commits Transaction1, which writes to item X
• later
• User2: begins and commits Transaction2, which reads from X
• Strict Serializability – places T1 before T2 in the serial ordering, and T2 reads T1’s write
• Serializability – could place T2 before T1
SOME EASY EXAMPLES?
• Linearizability (Atomic consistency)
• multi-core processors don’t provide Linearizability by default – you have to use
special atomic operations: memory barriers, compareAndSwap operations
• programming languages don’t provide Linearizability by default – e.g. in Java you
have to use special constructs: volatile, synchronized, AtomicReference, some non-blocking
data structures (blocking is not required to achieve linearizability)
• distributed systems using strong consensus algorithms, e.g. Paxos or Raft, can provide it
• Serializability
• databases implemented using two-phase locking (long read and long write locks) provide
Strict Serializability at the highest isolation level
• most MVCC databases don’t provide Serializability at all, e.g. Oracle
• except PostgreSQL and some less common ones
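A rough sketch of the compareAndSwap retry pattern mentioned above. Python has no hardware CAS primitive in its standard library, so the `AtomicInt` class here is a simulation: its internal lock merely stands in for the atomic instruction, and the class and method names are assumptions.

```python
import threading

class AtomicInt:
    """Simulated atomic register; the lock stands in for a hardware CAS instruction."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

def increment(counter, times):
    for _ in range(times):
        while True:  # retry until no other thread changed the value in between
            seen = counter.get()
            if counter.compare_and_swap(seen, seen + 1):
                break

counter = AtomicInt()
threads = [threading.Thread(target=increment, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get())
```

Callers never block each other for the whole read-modify-write: a failed swap just means another thread got there first, and the loop retries with the fresh value.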
OVERLOADED TERMINOLOGY
• Linearizability comes from the distributed systems and concurrent programming communities
• Serializability comes from the database community
• Today, we’re mostly interested in distributed databases – which often leads to overloaded
terminology
RELEASING CONSTRAINTS
• Most real systems provide models that are cheaper to implement and harder to understand,
but these models usually provide higher Availability and require less overhead
(coordination)
EVENTUAL CONSISTENCY
• if no new updates are made to a particular piece of data, eventually all reads to that item
will return the last updated value.
• eventual consistency is purely a liveness guarantee
• does not make safety guarantees: an eventually consistent system can return any value
before it converges
• We are used to statements like:
1 + 1 = 2 (not eventually 2)
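A toy illustration of "any value before it converges, the last value afterwards". The replica class, the timestamp-based last-write-wins merge, and the `anti_entropy` exchange are all assumptions chosen for brevity; real systems use various convergence mechanisms.

```python
class Replica:
    def __init__(self):
        self.timestamp, self.value = 0, None

    def write(self, timestamp, value):
        # Last-write-wins merge: keep the value with the newest timestamp.
        if timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value

def anti_entropy(replicas):
    """One background exchange: every replica learns the newest known write."""
    newest = max(replicas, key=lambda r: r.timestamp)
    for replica in replicas:
        replica.write(newest.timestamp, newest.value)

replicas = [Replica() for _ in range(3)]
replicas[0].write(1, "a")  # this update has reached only replica 0 so far
replicas[1].write(2, "b")  # a later update has reached only replica 1

before = [r.value for r in replicas]  # what a read may return depends on the replica
anti_entropy(replicas)                # no new updates arrive; replicas exchange state
after = [r.value for r in replicas]
print(before, after)
```

Before the exchange a reader can get three different answers; after it, every replica returns the last updated value.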
EVENTUAL != HOPEFUL CONSISTENCY
• Eventually consistent systems are usually highly reliable; they just don’t give you any
guarantees
• „Eventual” usually doesn’t mean minutes or seconds, but milliseconds
• Always ask yourself: is consistency really that important?
• Reasoning for Eventual consistency:
• Pessimistic design for high consistency = punish your users 99.9% of the time
• Optimistic design:
• know your business
• have some consistency plan = a business compensating transaction if things go
wrong (e.g. discount, special offer)
SEQUENTIAL CONSISTENCY
• Assumes all operations are executed in some sequential order and each process issues
operations in program order
• each process preserves its program order
• any valid interleaving is allowed
• but all processes agree on the same interleaving
• A write to a variable does not have to be seen instantaneously
Figure: (a) a sequentially consistent data store; (b) a data store that is not sequentially consistent
http://csis.pace.edu/~marchese/CS865/Lectures/Chap7/Chapter7fin.htm
• Examples:
• Multi-core processors’ memory models are usually weaker by default
• ZooKeeper (often presented as a clear-cut case of choosing consistency over availability)
provides the Sequential consistency model for reads by default.
SEQUENTIAL CONSISTENCY IN
SYSTEMS INTEGRATION
• You want to keep multiple copies / derivatives of your data in multiple
systems in sync
• and eventual consistency is not enough, because you may end up in perpetual
inconsistency
• but Sequential Consistency is right for you
• because what you want is a totally ordered log of events that all systems need to follow
if they want to be in sync
• e.g. Kafka is an append-only replication log with total order (within a single partition)
https://cdn.infoq.com/statics_s1_20171017-0336/resource/presentations/event-streams-kafka/en/slides (by Martin Kleppmann)
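The idea of downstream systems following one totally ordered log can be sketched without Kafka. The `Log` and `Follower` classes below are hypothetical stand-ins for a single partition and its consumers; the key property is that every follower applies the same events in the same order, only possibly lagging behind.

```python
class Log:
    """Append-only, totally ordered event log (think: one partition)."""

    def __init__(self):
        self.entries = []

    def append(self, event):
        self.entries.append(event)
        return len(self.entries) - 1  # offset of the new entry

class Follower:
    """A derived system (cache, search index, ...) that replays the log in order."""

    def __init__(self):
        self.offset = 0
        self.state = {}

    def catch_up(self, log, limit=None):
        end = len(log.entries) if limit is None else min(limit, len(log.entries))
        for key, value in log.entries[self.offset:end]:
            self.state[key] = value
        self.offset = max(self.offset, end)

log = Log()
for event in [("x", 1), ("y", 2), ("x", 3)]:
    log.append(event)

cache, search = Follower(), Follower()
cache.catch_up(log, limit=2)  # the cache lags behind by one event
search.catch_up(log)
lagging = dict(cache.state)
cache.catch_up(log)           # once caught up, both agree
print(lagging, cache.state, search.state)
```

A lagging follower shows an older state, but never a state that did not exist in the log's order, and it cannot diverge permanently.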
TRANSACTION ISOLATION LEVELS
• Serializability is available on a single node, yet
• Isolation levels in transactional databases are there to weaken the consistency
guarantees and increase availability, even on a single database node
• A database’s transaction isolation level alone doesn’t tell you everything you need to know to understand the
consistency model it provides.
• You also have to consider other (sometimes) configurable properties:
• MVCC (non-blocking readers) vs locks (blocking readers)
• replication strategy (if used): synchronous vs asynchronous
REPLICATED DATA CONSISTENCY MODELS
• Defined not by the interleaving of operations, but by the visibility guarantees on each
replica of the data
• More Client-Centric consistency models = focus on how data is seen by each client
https://image.slidesharecdn.com/searchingcassandrakn-150805184146-lva1-app6891/95/solr-cassandra-searching-cassandra-with-datastax-enterprise-12-638.jpg?cb=1438800199
„REPLICATED DATA CONSISTENCY EXPLAINED
THROUGH BASEBALL”
• Different participants can use different consistency guarantees
http://cdn.walkthrough.vooxe.com/media/picture/3b712de48137572f3849aabd5666a4e3_640_435.jpg
CONSISTENCY GUARANTEES
Consistency guarantee    Set of previous writes whose results are visible to a read operation
Strong consistency       • See all previous writes. Linearizability.
Eventual consistency     • See subset (any) of previous writes.
Consistent prefix        • See initial sequence of writes.
                         • Reader sees a version of the data store that existed at the
                           master at some time in the past.
Bounded staleness        • See all „old enough” writes
Monotonic read           • See increasing subset of writes.
                         • Client can read stale data, but is guaranteed to see a data
                           store that is increasingly up-to-date over time.
Read My Writes           • See all writes performed by this reader
                         • or some more recent value written by a different client.
EXAMPLE GAME
• Let’s assume that the baseball game score is kept in a Key-Value store in two objects. A
score for „home” team and a score for „visitors” team.
• Datastore is replicated among a number of servers
• Example game – sequence of writes:
Write operation      Score („visitors” – „home”)
(initial state)      0 – 0
(„home”, 1)          0 – 1
(„visitors”, 1)      1 – 1
(„home”, 2)          1 – 2
(„home”, 3)          1 – 3
(„visitors”, 2)      2 – 3
(„home”, 4)          2 – 4
(„home”, 5)          2 – 5
POSSIBLE READ RESULTS FOR EACH
CONSISTENCY GUARANTEE
Consistency model       Possible read results
Strong consistency      2-5
Eventual consistency    0-0, 0-1, 0-2, 0-3, 0-4, 0-5, 1-0, 1-1,
                        1-2, 1-3, 1-4, 1-5, 2-0, 2-1,
                        2-2, 2-3, 2-4, 2-5
Consistent prefix       0-0, 0-1, 1-1, 1-2, 1-3, 2-3, 2-4, 2-5
Bounded staleness       some score in the staleness window, e.g. 2-4, 2-5
Monotonic read          after reading 1-3: 1-3, 1-4, 1-5, 2-3, 2-4, 2-5
Read my writes          for the writer: 2-5
                        for anyone else: same as for eventual consistency
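The read results in the table can be derived from the write sequence. The sketch below encodes the example game and computes the result sets for several guarantees; the function names and the `(visitors, home)` tuple encoding are assumptions, and the eventual-consistency set assumes each key can independently return any value ever written to it (or the initial 0).

```python
WRITES = [("home", 1), ("visitors", 1), ("home", 2), ("home", 3),
          ("visitors", 2), ("home", 4), ("home", 5)]

def prefixes(writes):
    """Consistent prefix: the score after each prefix of writes, as (visitors, home)."""
    score = {"visitors": 0, "home": 0}
    result = [(0, 0)]
    for team, runs in writes:
        score[team] = runs
        result.append((score["visitors"], score["home"]))
    return result

def eventual(writes):
    """Eventual consistency: any mix of values ever held by each key."""
    visitors = sorted({0} | {runs for team, runs in writes if team == "visitors"})
    home = sorted({0} | {runs for team, runs in writes if team == "home"})
    return [(v, h) for v in visitors for h in home]

def monotonic_after(seen, writes):
    """Monotonic read: after reading `seen`, neither key may go backwards."""
    return [(v, h) for v, h in eventual(writes) if v >= seen[0] and h >= seen[1]]

strong = prefixes(WRITES)[-1]  # strong consistency: only the latest score
print(strong)
print(len(eventual(WRITES)), "possible eventual reads")
print(prefixes(WRITES))
print(monotonic_after((1, 3), WRITES))
```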
EACH BASEBALL PARTICIPANT REQUIRES A
DIFFERENT CONSISTENCY GUARANTEE
Participant             Required consistency guarantee
Official scorekeeper    • Read My Writes
                        • requires Strong Consistency, but is the only writer (application-
                          specific knowledge)
Umpire                  • Strong Consistency
                        • most of the time doesn’t care about the score; only near the
                          end, when the game can be ended early if it is already won
Radio reporter          • Consistent Prefix & Monotonic Read
                        • doesn’t have to be completely up-to-date
Sportswriter            • Bounded Staleness
                        • writes the article after some time. Eventual Consistency would likely
                          return the correct score, but Bounded Staleness gives 100% certainty
Statistician            • Strong Consistency for the game score
                        • Read My Writes for statistics, because he writes those
Stat watcher            • Eventual Consistency
                        • checks statistics once a day
„REPLICATED DATA CONSISTENCY EXPLAINED
THROUGH BASEBALL”
• Lessons learned:
• Read Your Writes – is usually enough in most cases
• Eventual Consistency – probably most systems provide more than that
LINEARIZABILITY IN DISTRIBUTED SYSTEMS
• Linearizability can be provided correctly by a system that solves the Consensus problem
• the problem of getting a set of nodes in a distributed system to agree on something
• Achieving consensus allows a distributed system to act as a single entity, with every
individual node aware of and in agreement with the actions of the whole of the
network
• it has been proven (the FLP result) that in a fully asynchronous message-passing distributed
system in which even one process may have a halting failure, consensus cannot be guaranteed
(though it fails only in very unlikely edge-case scenarios)
LINEARIZABILITY IN DISTRIBUTED SYSTEMS
• 2-Phase-Commit
• simplest, most often used consensus algorithm
• quite efficient compared to others (N nodes exchange 3*N messages)
• can block on Coordinator failure in some phases
• it is very hard (in some cases impossible) to recover transaction state
• 3-Phase-Commit
• can fail in a network partition (split-brain) scenario – in some cases one part
can commit while another aborts
• satisfies liveness properties – will make progress in failure cases.
LINEARIZABILITY IN DISTRIBUTED SYSTEMS
• Paxos
• Provably correct in asynchronous networks that eventually become synchronous.
• Does not block if a majority of participants are available (so withstands fewer than n/2 faults)
• Sacrifices liveness for correctness – termination is not guaranteed while the network is
behaving asynchronously; it terminates only when synchrony returns.
• Raft
• Correctness and availability of the system remain guaranteed as long as a majority
of the servers remain up
• Easier to implement – they say…
• Zab (ZooKeeper)
• Coordinated actions are much (10+ times) slower than uncoordinated ones
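The "majority of participants" condition translates into simple arithmetic. A small sketch (function names are assumptions) showing how many crashed nodes a majority-based protocol such as Paxos or Raft can survive for a given cluster size:

```python
def majority(n):
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1

def tolerated_failures(n):
    """Nodes that may be down while a majority is still reachable."""
    return n - majority(n)

for n in (3, 4, 5, 7):
    print(n, "nodes: majority", majority(n), "tolerates", tolerated_failures(n), "failures")
```

Note that going from 3 to 4 nodes does not buy any extra fault tolerance, which is one reason odd cluster sizes are common.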
LINEARIZABILITY IN DISTRIBUTED SYSTEMS
• Quorum Read + Quorum Write = Linearizability ?
• „You sometimes see people claiming that quorum reads and writes guarantee
linearizability, but I think it would be unwise to rely on it – subtle combinations of features
such as sloppy quorums and read repair can lead to tricky edge cases in which deleted
data is resurrected, or the number of replicas of a value falls below the original W
(violating the quorum condition), or the number of replica nodes increases above the
original N (again violating the quorum condition). All of these lead to non-linearizable
outcomes.” (Martin Kleppmann)
EXAMPLES OF REAL SYSTEMS
https://media.licdn.com/mpr/mpr/shrinknp_800_800/AAEAAQAAAAAAAAhUAAAAJDMyNGE1YTA5LTA3MDgtNDY3MS04ODBlLWM3Yjg3MWFmZWM0MA.jpg
AZURE COSMOS DB
• Higher-level, more application-level consistency models
Consistency          Description
Strong               • Linearizability
                     • Guarantees that a write is only visible after it is committed durably by the
                       majority quorum of replicas
                     • A read is always acknowledged by the majority read quorum
Bounded Staleness    • Consistent Prefix.
                     • Reads may lag behind writes by at most k versions or t time-interval.
                     • 20% (usage share)
Session              • Scoped to client session
                     • Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes,
                       write-follows-reads
                     • 73% (usage share)
Consistent Prefix    • Updates returned are some prefix of all the updates, with no gaps.
                       Guarantees that reads never see out-of-order writes.
Eventual             • Out-of-order reads
They use a linearizability checker, which continuously operates over the service telemetry and openly reports any
consistency violations to customers.
CASSANDRA
• Configurable consistency level, defined by the number of replicas you require an ACK from
before returning to the client
Cassandra consistency level    Description
ALL                            • Highest consistency
QUORUM                         • A write must be written to the commit log and memtable
                                 on a quorum of replica nodes.
                               • Provides strong consistency if you can tolerate some
                                 level of failure.
ONE                            • One replica
THREE                          • Three replicas
LOCAL_QUORUM                   • Datacenter-aware version
CASSANDRA
• To provide strong consistency:
• R + W > N
• N – replication factor, R – read consistency level, W – write consistency level
• Conflict resolution: Last Write Wins (based on operation timestamp) – requires proper
relative time synchronization
• Lightweight transactions (compare-and-set transactions)
• INSERT INTO ... VALUES ... IF …
• Such queries require a read and a write, and they also need to reach consensus among
all the replicas – uses Paxos.
• The read in those transactions uses the SERIAL consistency level (similar to QUORUM)
• The write uses a tunable consistency level
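Why R + W > N matters can be checked by brute force: the condition holds exactly when every possible read set shares at least one replica with every possible write set. This is a plain combinatorial sketch with hypothetical function names; it does not model Cassandra features such as hinted handoff or read repair.

```python
from itertools import combinations

def overlaps(n, r, w):
    """The rule from the slide: read and write replica sets must overlap."""
    return r + w > n

def always_intersect(n, r, w):
    """Brute force: does every r-replica read set share a node with every w-replica write set?"""
    nodes = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))

N = 3
QUORUM = N // 2 + 1
print("QUORUM/QUORUM:", overlaps(N, QUORUM, QUORUM))  # read QUORUM, write QUORUM
print("ONE/ONE:", overlaps(N, 1, 1))                  # read ONE, write ONE
print("ONE/ALL:", overlaps(N, 1, N))                  # read ONE, write ALL
```

When the sets always intersect, at least one replica contacted by a read has seen the latest acknowledged write; with ONE/ONE a read can miss it entirely.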
POSTGRESQL AND OTHER RELATIONAL DBS
• MVCC (not configurable in PostgreSQL, but e.g. in MS SQL Server you can turn it on/off)
• configurable transaction isolation levels
• special operations like SELECT FOR UPDATE
• Replication: synchronous or asynchronous, but even with synchronous replication there is a time gap
between master and replica
• PostgreSQL: implementations of multi-master with asynchronous replication and various
ways of resolving conflicts exist, but with a weak global consistency model
• Galera Cluster (extended MySQL) provides multi-master based on synchronous
replication, and they say it
• can support transaction isolation levels up to REPEATABLE READ if proper locking
techniques (SELECT FOR UPDATE) are used
COUCHBASE
• Queries – Eventually consistent
• can enforce Read Your Writes with client help: you pass the mutation id that you get
after a write to the read operation, forcing the query node to be at
least up-to-date with this mutation id
• Document access – Strongly consistent
• Each key (hash) is assigned to a single primary node that handles all operations
• After a node failure rebalancing happens, but it is unclear (to me) what consistency is
guaranteed in some edge cases
• with XDCR you only get Eventual consistency.
ZOOKEEPER
• Often presented as a clear-cut case of choosing consistency over availability
• Writes = Linearizable
• Requires a majority quorum in order to reach consensus using custom Zab protocol
• Reads = provide more than Sequential consistency
• Each client is connected to one of the server nodes, and when you make a read, you
see only the data on that node, even if there are more up-to-date writes on another
node
• It is possible to make linearizable reads in ZooKeeper by preceding a read with a
sync command
ETCD / CONSUL
• Write = Linearizable
• go through the Raft consensus process
• Read = Sequential consistency
• can be configured for Linearizable consistency, in which case reads also go through Raft
• Raft allows, in some cases, a node to believe that it is still the leader (this assumes a stable
time window): it can't process writes by itself, but it can process reads
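Both Zab and Raft commit a write only once a majority of nodes has accepted it, which is why a node cut off from the majority cannot process writes by itself. The arithmetic is small enough to sketch; the function names below are invented for this illustration.

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many nodes can be lost while a majority still exists."""
    return cluster_size - quorum(cluster_size)


def can_commit(cluster_size, reachable_nodes):
    """True if this side of a partition can still commit writes."""
    return reachable_nodes >= quorum(cluster_size)
```

For example, a 5-node cluster needs 3 nodes to commit and survives 2 failures, while a 4-node cluster also needs 3 and survives only 1. That is why odd cluster sizes are the usual choice.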
TIME DEPENDENCY
• Linearizability – depends on global time
• Multiple databases implemented conflict resolution based on time
• But time synchronization is a real problem
• and NTP doesn’t solve those problems
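Why time-based conflict resolution is fragile can be shown with a last-write-wins register and two nodes whose clocks disagree. The register below is a generic sketch, not the implementation of any particular database; the 5-second skew is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    """Last-write-wins register: the write with the highest timestamp wins."""

    value: object = None
    timestamp: float = float("-inf")

    def write(self, value, timestamp):
        # Keep the write only if its timestamp is newer than the stored one.
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp
            return True
        return False


register = LWWRegister()

# Node A's clock runs 5 seconds fast; node B's clock is accurate.
skew_a, skew_b = 5, 0

# Real time 100: node A writes first.
accepted_first = register.write("first", 100 + skew_a)

# Real time 102: node B writes later, but carries a smaller timestamp.
accepted_second = register.write("second", 102 + skew_b)
```

The write that happened later in real time is silently dropped, and the register keeps "first".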
CLOUD SPANNER = NEW TIME API
• multi-version, globally-distributed and synchronously-replicated database, with support for
consistent distributed transactions
• External consistency – stronger than Serializability – transactions commit in an order that
is reflected in their commit timestamps, and these commit timestamps are "real time" so
you can compare them to your watch
• TrueTime = provides global time with a high degree of accuracy and an explicit uncertainty bound
• atomic clocks and GPS antennas in datacenters as time sources
• new API design and a protocol different from NTP
• TT.now() => [earliest, latest] interval with bounded time uncertainty
• transaction timestamps often allow the system to order transactions without any coordination
• Read-Only transactions – strong consistency (return latest copy of data) without locking
• Read-Write transactions – locking, orchestrated by Paxos leader
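The interval-based time API can be sketched as follows. `TT.now()`, `TT.after()` and `TT.before()` follow the shape described in the Spanner paper, but the classes here are toys: the fixed `epsilon`, the `FakeClock` and the `commit_wait` helper are assumptions made for illustration, and commit wait itself comes from the paper rather than from these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class FakeClock:
    """Deterministic local clock, so the sketch needs no real sleeping."""

    def __init__(self, start):
        self.t = start

    def read(self):
        return self.t

    def tick(self):
        self.t += 1


class ToyTrueTime:
    def __init__(self, clock, epsilon):
        self.clock = clock      # callable returning the local time
        self.epsilon = epsilon  # assumed bound on clock uncertainty

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True only if t has definitely not arrived yet."""
        return self.now().latest < t


def commit_wait(tt, advance):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = tt.now().latest
    ticks = 0
    while not tt.after(commit_ts):
        advance()
        ticks += 1
    return commit_ts, ticks


clock = FakeClock(100)
tt = ToyTrueTime(clock.read, epsilon=4)
commit_ts, ticks = commit_wait(tt, clock.tick)
```

With an uncertainty of 4 the transaction gets timestamp 104 and waits 9 ticks, a little over twice the uncertainty. A smaller uncertainty means a shorter wait, which is where the atomic clocks and GPS antennas pay off.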
CLOUD SPANNER
• In particular, Spanner assigns a timestamp to all reads and writes. A transaction at
timestamp T1 is guaranteed to reflect the results of all writes that happened before T1
• If a machine wants to satisfy a read at T2, it must ensure that its view of the data is
up-to-date through at least T2.
• A transaction can be assigned a timestamp at any time after all locks are acquired and
before any lock is released.
• Spanner assigns it the timestamp that Paxos assigns to the Paxos write that represents
the transaction commit.
• Spanner depends on the following monotonicity invariant: within each Paxos group,
Spanner assigns timestamps to Paxos writes in monotonically increasing order, even
across leaders.
• A single leader replica can trivially assign timestamps in monotonically increasing
order.
• This invariant is enforced across leaders by making use of the disjointness invariant:
a leader must only assign timestamps within the interval of its leader lease.
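The two invariants above can be sketched with leaders that refuse to assign timestamps outside their lease. `LeaseLeader` and `LeaseExpired` are hypothetical names, and clamping a new leader's first timestamp up to the start of its lease is a simplification chosen for this sketch, not a description of how Spanner behaves.

```python
class LeaseExpired(Exception):
    pass


class LeaseLeader:
    """Assigns timestamps only inside its own lease interval."""

    def __init__(self, name, lease_start, lease_end):
        if lease_start >= lease_end:
            raise ValueError("lease interval must not be empty")
        self.name = name
        self.lease_start = lease_start
        self.lease_end = lease_end
        self.last_assigned = lease_start - 1

    def assign_timestamp(self, local_now):
        # Never go backwards, and never go below the start of the lease
        # (clamping is a simplification made for this sketch).
        ts = max(local_now, self.last_assigned + 1, self.lease_start)
        if ts > self.lease_end:
            raise LeaseExpired(f"{self.name} cannot assign {ts}")
        self.last_assigned = ts
        return ts


# Disjoint leases: leader A holds [0, 99], leader B holds [100, 199].
leader_a = LeaseLeader("A", 0, 99)
leader_b = LeaseLeader("B", 100, 199)

timestamps = [
    leader_a.assign_timestamp(10),
    leader_a.assign_timestamp(10),   # same local time: bumped to 11
    leader_a.assign_timestamp(50),
    leader_b.assign_timestamp(40),   # slow local clock: clamped to 100
    leader_b.assign_timestamp(100),  # bumped to 101
]

try:
    leader_a.assign_timestamp(120)   # outside A's lease
    old_leader_rejected = False
except LeaseExpired:
    old_leader_rejected = True
```

Because the leases do not overlap, every timestamp from B is larger than every timestamp from A, so the order stays monotonic across the leader change.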
CLOUD SPANNER
• lock-free read-only transactions
• locks at row-and-column level
• Blind Writes (writing data without previously reading it in the same transaction)
• shared write locks
• conflict resolution based on timestamps
• Reads in a transaction see everything that has been committed before the transaction
commits, and writes are seen by everything that starts after the transaction is committed.
• Retries may be needed; client libraries mostly handle this
CONSISTENCY OF THE WHOLE
• You have probably thought about the consistency guarantees of your database…
• But have you ever considered the consistency model provided on different layers or by
your system as a whole?
https://www.confluent.io/wp-content/uploads/2016/09/Event-sourced-based-architecture.jpeg
CONSISTENCY OF THE WHOLE
• A database doesn’t have to provide consistency guarantees by itself
• Client libraries (provided by database vendors) can be very smart
• Consistency can be provided by both the database and proper behavior of the clients
• Clients can be topology aware and, e.g., by default try to read from the first (consistent)
replica, the one that gets updates the fastest
• Many scenarios are possible
https://www.confluent.io/wp-content/uploads/2016/09/Event-sourced-based-architecture.jpeg
CONSISTENCY OF THE WHOLE - CQRS EXAMPLE
• Does it fail the most obvious test: Reading what You just Wrote?
https://lostechies.com/jimmybogard/files/2012/08/image4.png
CONFUSION WITH ENTANGLEMENT
https://earth-chronicles.com/wp-content/uploads/2017/09/entanglement-650x459.jpg http://scienews.com/images/2017/09/ad8145a6780431ea3986b46f9e5cf79e.jpg http://wavewatching.net/wp-content/uploads/2015/05/Paris_Tuileries_Facepalm_statue.jpg
SUMMARY
• Understand the components of your system and their specific consistency guarantees
• Think about the consistency your system exposes to end users as the whole
• Consistency guarantee depends on the use-case, not the data you access
• The stronger the consistency, the more likely you have to be ready for retries
• approaches like optimistic locking optimize for the best-case scenario
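Being ready for retries usually looks like a read, compute, compare-and-set loop. The sketch below uses an in-memory versioned store. `VersionedStore`, `update_with_retry` and the `interfere` hook (which simulates a concurrent writer) are hypothetical names introduced for this example.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """Every value carries a version that is bumped on each successful write."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key, (0, None))  # (version, value)

    def compare_and_set(self, key, expected_version, value):
        version, _ = self.read(key)
        if version != expected_version:
            raise VersionConflict(key)
        self._data[key] = (version + 1, value)


def update_with_retry(store, key, fn, max_attempts=5, interfere=None):
    """Optimistic update: no lock is held, a conflict simply means try again."""
    for attempt in range(1, max_attempts + 1):
        version, value = store.read(key)
        new_value = fn(value)
        if interfere is not None:
            interfere(attempt)  # demo hook: lets a "concurrent" writer sneak in
        try:
            store.compare_and_set(key, version, new_value)
            return attempt
        except VersionConflict:
            continue
    raise VersionConflict(f"gave up on {key} after {max_attempts} attempts")


store = VersionedStore()
store.compare_and_set("counter", 0, 10)


def concurrent_writer(attempt):
    # Another client increments the counter during our first attempt only.
    if attempt == 1:
        version, value = store.read("counter")
        store.compare_and_set("counter", version, value + 1)


attempts = update_with_retry(store, "counter", lambda v: v + 5,
                             interfere=concurrent_writer)
final_version, final_value = store.read("counter")
```

The first attempt loses the race and is retried. The second one succeeds on top of the other client's write, so no update is lost and no lock was ever held.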
SUMMARY
• Use business knowledge to reason about consistency guarantees of the system:
• maybe it is possible to shard data into groups that need stronger consistency
guarantees within each group than between groups
• Is some data written only by a single client, or by a group of clients that need
coordination, rather than by all the clients?
• The consistency guarantee provided by the system is a shared decision of technical and
business people
• many big companies created weakly consistent databases because their business
valued availability higher than consistency
HOW TO READ SOME DATABASE DOCUMENTATION?
„Usually (not always),
with proper configuration,
for some operations,
we provide
(as long as we use the same definition as you, reader)
some consistency model”
WOULD YOU LIKE TO KNOW MORE?
https://qph.ec.quoracdn.net/main-qimg-ed23d9866da0882d2b994338a31dc8fa
SOURCES
• https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/ConsistencyAndBaseballReport.pdf
• http://the-paper-trail.org/blog/
• https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
• http://www.bailis.org/blog/
• https://aphyr.com/posts/313-strong-consistency-models
• https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
• https://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
• https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
• http://galeracluster.com/
• https://www.postgresql.org/docs/current/static/transaction-iso.html
• https://developer.couchbase.com/documentation/server/4.5/developer-guide/query-consistency.html
• https://zookeeper.apache.org/doc/r3.4.10/zookeeperInternals.html
• https://coreos.com/etcd/docs/latest/learning/api_guarantees.html
• https://www.consul.io/docs/internals/consensus.html
• https://cloud.google.com/spanner/docs/
• https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
CAP THEOREM
• CAP Theorem dictates that it is impossible to achieve “consistency” while remaining available in
the presence of network and system partitions.
• CAP uses very narrow definitions:
• Consistency = linearizability
• Availability
• every request received by a non-failing [database] node in the system must result in
a [non-error] response.
• It’s not sufficient for some node to be able to handle the request – any non-failing
node needs to be able to handle it
• Many so-called “highly available” (i.e. low downtime) systems actually do not meet
this definition of availability.
• Partition Tolerance
• means that you’re communicating over an asynchronous network that may delay or
drop messages
• you don’t really have any choice
CAP THEOREM
https://2.bp.blogspot.com/-MQsRTq0PLfk/V1bdpYE_lRI/AAAAAAAAAEM/jzxob8LRMfUm13PNCfV1lcc6sHgK-_eIgCLcB/s1600/trainagle.jpeg
CONSISTENCY VS AVAILABILITY
• Can’t be fully Available
• Sticky Availability: achievable if we relax our notion of availability –
client nodes must always talk to the same server
• Total Availability

Lost with data consistency

  • 2. WHAT ARE WE GOING TO TALK ABOUT? fairy tales & „lies for children” & others
  • 3. WHAT KIND OF CONSISTENCIES ARE WE TALKING ABOUT? • Consisteny of Backup = Point in time consistency
  • 4. WHAT KIND OF CONSISTENCIES ARE WE TALKING ABOUT? • Consisteny of Backup = Point in time consistency • Consistency in the sense of ACID properties of database transactions
  • 5. WHAT KIND OF CONSISTENCIES ARE WE TALKING ABOUT? • Consisteny of Backup = Point in time consistency • Consistency in the sense of ACID properties of database transactions • Consistency in the sense of concurrent programming
  • 6. WHAT KIND OF CONSISTENCIES ARE WE TALKING ABOUT? • Consisteny of Backup = Point in time consistency • Consistency in the sense of ACID properties of database transactions • Consistency in the sense of concurrent programming • Consistency between multiple replicas of the same data in the distributed system • because it’s something you just can’t neglect • because software fail, hardware fail, network partitions happen • in an uncertain world, we want our software to maintain some sort of „correctness”
  • 7. COMMON APPROACH TO CONSISTENCY • Use Defaults: eg. „READ COMMITED” in relational databases – common practice
  • 8. COMMON APPROACH TO CONSISTENCY • Use Defaults: eg. „READ COMMITED” in relational databases – common practice • Change configuration (consistency model) based as a solution to an issue we try to fix (usually found on stack overflow)
  • 9. COMMON APPROACH TO CONSISTENCY • Use Defaults: eg. „READ COMMITED” in relational databases – common practice • Change configuration (consistency model) based as a solution to an issue we try to fix (usually found on stack overflow) • Belief that your system/database provide much stronger consisteny model than it really does • especially in RDBMS world
  • 10. COMMON APPROACH TO CONSISTENCY • Use Defaults: eg. „READ COMMITED” in relational databases – common practice • Change configuration (consistency model) based as a solution to an issue we try to fix (usually found on stack overflow) • Belief that your system/database provide much stronger consisteny model than it really does • especially in RDBMS world • Fear of any form of inconsistencies in the system, despite most businesses in real life deals with inconsistencies for ages
  • 11. DEFINITION OF CONSISTENCY MODEL • A set of rules (for order and visibility of reads and updates) that system that wants to provide this consistency model must obey at each state in it's history • A consistency model is the set of all allowed histories of operations • Deals with interleavings of operations • We need to know the consistency model of our system in order to write predictible programs
  • 12. DEFINITION OF CONSISTENCY MODEL • A set of rules (for order and visibility of reads and updates) that system that wants to provide this consistency model must obey at each state in it's history • A consistency model is the set of all allowed histories of operations • Deals with interleavings of operations • We need to know the consistency model of our system in order to write predictible programs • Extreme examples: • Every read always returns zero • There are no rules at all – easy model satisfied trivially by every system
  • 13. TRANSACTIONS • To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure • To provide isolation between programs accessing a database concurrently
  • 14. TRANSACTIONS • To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure • To provide isolation between programs accessing a database concurrently • ACID = Atomicity, Consistency, Isolation, Durability • a set of properties of database transactions intended to guarantee validity even in the event of errors • Consistency – ensures that any transaction will bring the database from one valid state to another (constraints, foreign keys, triggers) – Integrity? • Isolation – ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed sequentially, i.e., one after the other – Serializable isolation level
  • 15. TOO MUCH CONSISTENCY MODELS • A lot of theoretical consistency models: • depending on the problem areas • processors architecture • concurrent programming • distributed systems
  • 16. TOO MUCH CONSISTENCY MODELS • A lot of theoretical consistency models: • depending on the problem areas • processors architecture • concurrent programming • distributed systems • using diferent elementary definitions • possible ordering of elements • relation between reads and writes • time constraints
  • 17. TOO MUCH CONSISTENCY MODELS • A lot of theoretical consistency models: • depending on the problem areas • processors architecture • concurrent programming • distributed systems • using diferent elementary definitions • possible ordering of elements • relation between reads and writes • time constraints • Each database implementation defines it’s private consistency model usually defined by its configuration options
  • 18. CONSISTENCY MODELS (ONLY A FEW) https://aphyr.com/posts/313-strong-consistency-models
  • 19. LINEARIZABILITY (ATOMIC CONSISTENCY) • single-operation, single-object, real-time order • It provides a real-time (wall-clock) guarantee on the behavior of a set of single operations (reads and writes) on a single object (distributed register) • Operations appear to be instantaneous (atomic) • Once an operation is complete, everyone must see it, or some later state (prohibits stale read) • Gold standard in distributed systems • „C” in the CAP Theorem
  • 20. SERIALIZABILITY • multi-operation, multi-object, arbitrary total order • It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions. • Gold standard of Transactions – „I” in the definition of ACID • A mechanism for guaranteeing database correctness • If users’ transactions each preserve application correctness (“C,” or consistency, in ACID), a serializable execution also preserves correctness
  • 21. SERIALIZABILITY • multi-operation, multi-object, arbitrary total order • It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions. • Gold standard of Transactions – „I” in the definition of ACID • A mechanism for guaranteeing database correctness • If users’ transactions each preserve application correctness (“C,” or consistency, in ACID), a serializable execution also preserves correctness • Does not impose any real-time constraints on the ordering of transactions • (no deterministic order) • Only require that some equivalent serial execution exist.
  • 22. SERIALIZABILITY • Example: • Transaction 1: SELECT SUM(value) FROM tab WHERE class = 1 INSERT INTO tab VALUES (2, 30) COMMIT • Transaction 2: SELECT SUM(value) FROM tab WHERE class = 2 INSERT INTO tab VALUES (1, 300) COMMIT • There’s no serial order of executions consistent with the result = Exception • if A had executed before B, B would have computed the sum 330, not 300 • similarly the other order class value 1 10 1 20 2 100 2 200 Table: tab
  • 23. STRICT (STRONG) SERIALIZABILITY • combines Serializability and Linearizability • Transaction behavior is equivalent to some serial execution, and the serial order corresponds to real time. • Implicitly assumes the presence of a global clock
  • 24. STRICT (STRONG) SERIALIZABILITY • combines Serializability and Linearizability • Transaction behavior is equivalent to some serial execution, and the serial order corresponds to real time. • Implicitly assumes the presence of a global clock • Example: • User1: begin and commit Transaction1, which writes to item X • later • User2: begin and commit Transaction2, which reads from X • Strict Serializability – places T1 before T2 in the serial ordering, and T2 reads T1’s write • Serializability – could place T2 before T1
  • 25. SOME EASY EXAMPLES ? • Linearizability (Atomic consistency) • multi-core Processors don’t provide Linearizability by default – you have to use special atomic operations are used: memory barriers, compareAndSwap operations
  • 26. SOME EASY EXAMPLES ? • Linearizability (Atomic consistency) • multi-core Processors don’t provide Linearizability by default – you have to use special atomic operations are used: memory barriers, compareAndSwap operations • programming languages don’t provide Linearizability by default – eg. in Java you have to use special constructs: volatile, synchronized, AtomicReference, some non- blocking data structures (blocking is not required to achieve linearizability)
  • 27. SOME EASY EXAMPLES ? • Linearizability (Atomic consistency) • multi-core Processors don’t provide Linearizability by default – you have to use special atomic operations are used: memory barriers, compareAndSwap operations • programming languages don’t provide Linearizability by default – eg. in Java you have to use special constructs: volatile, synchronized, AtomicReference, some non- blocking data structures (blocking is not required to achieve linearizability) • distribured systems using strong consensus algorithms eg. Paxos, Raft can provide it
  • 28. SOME EASY EXAMPLES ? • Linearizability (Atomic consistency) • multi-core Processors don’t provide Linearizability by default – you have to use special atomic operations are used: memory barriers, compareAndSwap operations • programming languages don’t provide Linearizability by default – eg. in Java you have to use special constructs: volatile, synchronized, AtomicReference, some non- blocking data structures (blocking is not required to achieve linearizability) • distribured systems using strong consensus algorithms eg. Paxos, Raft can provide it • Serializability • databases implemented using two-phase locking (long read and long write locks) in the highest isolation level provide Strict Serializability
  • 29. SOME EASY EXAMPLES ? • Linearizability (Atomic consistency) • multi-core Processors don’t provide Linearizability by default – you have to use special atomic operations are used: memory barriers, compareAndSwap operations • programming languages don’t provide Linearizability by default – eg. in Java you have to use special constructs: volatile, synchronized, AtomicReference, some non- blocking data structures (blocking is not required to achieve linearizability) • distribured systems using strong consensus algorithms eg. Paxos, Raft can provide it • Serializability • databases implemented using two-phase locking (long read and long write locks) in the highest isolation level provide Strict Serializability • most MVCC databases don’t provide Serializability at all eg. Oracle • except PostgrsSQL and some less common
  • 30. OVERLOADED TERMINOLOGY • Linearizability comes from distributed systems and concurrency programming community
  • 31. OVERLOADED TERMINOLOGY • Linearizability comes from distributed systems and concurrency programming community • Serializability comes from database community
  • 32. OVERLOADED TERMINOLOGY • Linearizability comes from distributed systems and concurrency programming community • Serializability comes from database community • Today, we’re mostly interested in distributed databases – which often leads to overloaded terminology
  • 33. RELEASING CONSTRAINTS • Most real real systems provide cheeper to implement and harder to understand models, but usually those model provide higher Availability and require less overhead (coordination)
  • 34. EVENTUAL CONSISTENCY • if no new updates are made to a particular piece of data, eventually all reads to that item will return the last updated value.
  • 35. EVENTUAL CONSISTENCY • if no new updates are made to a particular piece of data, eventually all reads to that item will return the last updated value. • eventual consistency is purely a liveness guarantee • does not make safety guarantees: an eventually consistent system can return any value before it converges
  • 36. EVENTUAL CONSISTENCY • if no new updates are made to a particular piece of data, eventually all reads to that item will return the last updated value. • eventual consistency is purely a liveness guarantee • does not make safety guarantees: an eventually consistent system can return any value before it converges • We’re get used to statements: 1 + 1 = 2 (not eventually 2)
  • 37. EVENTUAL != HOPEFUL CONSISTENCY • Eventual consistent systems are usually highly reliable, just don’t give you any guarantees • Eventual usually don’t mean minutes or seconds, but milliseconds
  • 38. EVENTUAL != HOPEFUL CONSISTENCY • Eventual consistent systems are usually highly reliable, just don’t give you any guarantees • Eventual usually don’t mean minutes or seconds, but milliseconds • Always ask yourself, is consistency really that important?
  • 39. EVENTUAL != HOPEFUL CONSISTENCY • Eventual consistent systems are usually highly reliable, just don’t give you any guarantees • Eventual usually don’t mean minutes or seconds, but milliseconds • Always ask yourself, is consistency really that important? • Reasoning for Eventual consistency: • Pessimistic design for high consistency = punish your users 99,9% of the time
  • 40. EVENTUAL != HOPEFUL CONSISTENCY • Eventual consistent systems are usually highly reliable, just don’t give you any guarantees • Eventual usually don’t mean minutes or seconds, but milliseconds • Always ask yourself, is consistency really that important? • Reasoning for Eventual consistency: • Pessimistic design for high consistency = punish your users 99,9% of the time • Optimistic design: • know your business • have some consistency plan = business Compensating transaction if things go wrong (eg. discount, special offer)
  • 41. SEQUENTIAL CONSISTENCY • Assumes all operations are executed in some sequential order and each process issues operations in program order • each process preserves its program order • any valid interleaving is allowed • but all processes agree on the same interleaving • A write to a variable does not have to be seen instanteneously
  • 42. SEQUENTIAL CONSISTENCY • Assumes all operations are executed in some sequential order and each process issues operations in program order • each process preserves its program order • any valid interleaving is allowed • but all processes agree on the same interleaving • A write to a variable does not have to be seen instanteneously A sequentially consistent data store A data store that is not sequentially consistent http://csis.pace.edu/~marchese/CS865/Lectures/Chap7/Chapter7fin.htm
  • 43. SEQUENTIAL CONSISTENCY • Assumes all operations are executed in some sequential order and each process issues operations in program order • each process preserves its program order • any valid interleaving is allowed • but all processes agree on the same interleaving • A write to a variable does not have to be seen instanteneously • Examples: • Multi-core processors memory models are usually weaker by default • Zookeeper (showed as clear-cut case of choosing consistency over availability) by default provides Sequential consistency model for reads.
  • 44. SEQUENTIAL CONSISTENCY IN SYSTEMS INTEGRATION • You want want to synchronize multiple copies / derivatives of your data in multiple systems in sync https://cdn.infoq.com/statics_s1_20171017-0336/resource/presentations/event-streams-kafka/en/slides (by Martin Kleppmann)
  • 45. SEQUENTIAL CONSISTENCY IN SYSTEMS INTEGRATION • and eventual consistency is not enough, because you may end up in perpetual inconsistency https://cdn.infoq.com/statics_s1_20171017-0336/resource/presentations/event-streams-kafka/en/slides (by Martin Kleppmann)
  • 46. SEQUENTIAL CONSISTENCY IN SYSTEMS INTEGRATION • but Sequential Consistency is right for you • because what you want is an totally ordered log of events that all systems need to follow if they want to be in sync • eg. Kafka is an append only replication log with total order (within a single partition) https://cdn.infoq.com/statics_s1_20171017-0336/resource/presentations/event-streams-kafka/en/slides (by Martin Kleppmann)
  • 47. TRANSACTION ISOLATION LEVELS • Serializability is available on a single node, yet • Isolation levels in transactional databases are there to weaken the consistency guarantees and increase availability, even on a single database node
  • 48. TRANSACTION ISOLATION LEVELS • Serializability is available on a single node, yet • Isolation levels in transactional databases are there to weaken the consistency guarantees and increase availability, even on a single database node • A database's transaction isolation level does not tell you everything you need to know to understand the consistency model it provides. • You also have to consider other (sometimes configurable) properties: • MVCC (non-blocking readers) vs locks (blocking readers) • replication strategy (if used): synchronous vs asynchronous
  • 49. REPLICATED DATA CONSISTENCY MODELS • Defined not by the interleaving of operations, but by the visibility guarantees on each replica of the data • More Client-Centric consistency models = focus on how data is seen by each client https://image.slidesharecdn.com/searchingcassandrakn-150805184146-lva1-app6891/95/solr-cassandra-searching-cassandra-with-datastax-enterprise-12-638.jpg?cb=1438800199
  • 50. „REPLICATED DATA CONSISTENCY EXPLAINED THROUGH BASEBALL” • Different participants can use different consistency guarantees http://cdn.walkthrough.vooxe.com/media/picture/3b712de48137572f3849aabd5666a4e3_640_435.jpg
  • 56. CONSISTENCY GUARANTEES • Each guarantee is defined by the set of previous writes whose results are visible to a read operation:
    • Strong consistency: see all previous writes. Linearizability.
    • Eventual consistency: see a subset (any) of previous writes.
    • Consistent prefix: see an initial sequence of writes. The reader sees a version of the data store that existed at the master at some time in the past.
    • Bounded staleness: see all „old enough” writes.
    • Monotonic read: see an increasing subset of writes. The client can read stale data, but is guaranteed to see a data store that is increasingly up-to-date over time.
    • Read My Writes: see all writes performed by this reader, or some more recent value written by a different client.
  • 58. EXAMPLE GAME • Let’s assume that the baseball game score is kept in a Key-Value store in two objects: a score for the „home” team and a score for the „visitors” team. • Datastore is replicated among a number of servers • Example game – sequence of writes and the resulting score („visitors” – „home”):
    • (start): 0 – 0
    • („home”, 1): 0 – 1
    • („visitors”, 1): 1 – 1
    • („home”, 2): 1 – 2
    • („home”, 3): 1 – 3
    • („visitors”, 2): 2 – 3
    • („home”, 4): 2 – 4
    • („home”, 5): 2 – 5
  • 64. POSSIBLE READ RESULTS FOR EACH CONSISTENCY GUARANTEE
    • Strong consistency: 2-5
    • Eventual consistency: 0-0, 0-1, 0-2, 0-3, 0-4, 0-5, 1-0, 1-1, 1-2, 1-3, 1-4, 1-5, 2-0, 2-1, 2-2, 2-3, 2-4, 2-5
    • Consistent prefix: 0-0, 0-1, 1-1, 1-2, 1-3, 2-3, 2-4, 2-5
    • Bounded staleness: some score in the staleness window: 2-4, 2-5
    • Monotonic read: after reading 1-3: 1-3, 1-4, 1-5, 2-3, 2-4, 2-5
    • Read my writes: for the writer: 2-5; for anyone else: same as eventual consistency
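The read results listed for this example can be derived from the sequence of writes. The sketch below does that for four of the guarantees. It assumes each key is read independently and that scores only grow, so "a later write" is the same as "a larger value". All names are illustrative.

```python
from itertools import product

# The example game: (key, new value) in the order the writes were issued.
writes = [("home", 1), ("visitors", 1), ("home", 2), ("home", 3),
          ("visitors", 2), ("home", 4), ("home", 5)]


def score_history(writes):
    """Every (visitors, home) score that existed at the master, in order."""
    score = {"visitors": 0, "home": 0}
    history = [(0, 0)]
    for team, runs in writes:
        score[team] = runs
        history.append((score["visitors"], score["home"]))
    return history


history = score_history(writes)

# Strong consistency: only the latest score.
strong = [history[-1]]

# Consistent prefix: any score that really existed at some point.
consistent_prefix = history

# Eventual consistency: each key may independently return any value
# ever written to it, so the result is every combination.
visitors_values = sorted({v for v, _ in history})
home_values = sorted({h for _, h in history})
eventual = list(product(visitors_values, home_values))


def monotonic_after(seen):
    """Monotonic read: neither key may go back behind what was seen."""
    return [(v, h) for v, h in eventual if v >= seen[0] and h >= seen[1]]
```

Eventual consistency yields 3 x 6 = 18 combinations, including scores such as 2-0 that never existed during the game.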
  • 65. EACH BASEBALL PARTICIPANT REQUIRES A DIFFERENT CONSISTENCY GUARANTEE
    • Official scorekeeper: Read My Writes. Requires Strong Consistency, but is the only writer (application-specific knowledge).
    • Umpire: Strong Consistency. Most of the time doesn’t care about the score; it matters only close to the end, when the game can be ended early if it is already won.
    • Radio reporter: Consistent Prefix & Monotonic Read. Doesn’t have to be completely up-to-date.
    • Sportswriter: Bounded Staleness. Writes the article after some time. Eventual Consistency will likely return the correct score, but Bounded Staleness gives 100% certainty.
    • Statistician: Strong Consistency for the game score; Read My Writes for statistics, because he writes those.
    • Stat watcher: Eventual Consistency. Checks statistics once a day.
  • 66. „REPLICATED DATA CONSISTENCY EXPLAINED THROUGH BASEBALL” • Lessons learned: • Read Your Writes is enough in most cases • Eventual Consistency: most systems probably provide more than that
  • 68. LINEARIZABILITY IN DISTRIBUTED SYSTEMS • Linearizability is correctly provided by a system that solves the Consensus problem • the problem of getting a set of nodes in a distributed system to agree on something • Achieving consensus allows a distributed system to act as a single entity, with every individual node aware of and in agreement with the actions of the whole of the network • it is proven that in a fully asynchronous message-passing distributed system in which even one process may have a halting failure, no algorithm can guarantee consensus (but the executions that defeat it are very unlikely edge cases)
  • 70. LINEARIZABILITY IN DISTRIBUTED SYSTEMS • 2-Phase-Commit • simplest, most often used consensus algorithm • quite efficient compared to others (N nodes exchange 3*N messages) • can block on Coordinator failure in some phases • it is very hard (in some cases impossible) to recover transaction state • 3-Phase-Commit • can fail in the network partition (split-brain) scenario – then in some cases one part can commit and another can abort • satisfies liveness properties – will make progress in failure cases.
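A failure-free sketch of 2-Phase-Commit that counts messages the way the slide does: one PREPARE, one vote and one decision per participant, with final acknowledgements left out (an assumption made here to match the 3*N figure). The class and function names are hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (and hold locks) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    messages = 0
    votes = []
    for p in participants:
        messages += 1              # coordinator -> participant: PREPARE
        votes.append(p.prepare())
        messages += 1              # participant -> coordinator: vote
    decision = "committed" if all(votes) else "aborted"
    # If the coordinator crashed here, every participant that voted yes
    # would stay "prepared" and blocked until the coordinator recovers.
    for p in participants:
        messages += 1              # coordinator -> participant: decision
        p.finish(decision)
    return decision, messages
```

With three participants that all vote yes, the call returns `("committed", 9)`. A single "no" vote aborts the transaction everywhere at the same message cost.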
  • 73. LINEARIZABILITY IN DISTRIBUTED SYSTEMS • Paxos • Provably correct in asynchronous networks that eventually become synchronous. • Does not block if a majority of participants are available (so it withstands fewer than n/2 faults) • Sacrifices liveness for correctness – termination is not guaranteed while the network is behaving asynchronously; it terminates only when synchronicity returns. • Raft • Correctness and availability of the system remain guaranteed as long as a majority of the servers remain up • Easier to implement – they say… • Zab (ZooKeeper) • Coordinated actions are much (10+ times) slower than uncoordinated ones
  • 74. LINEARIZABILITY IN DISTRIBUTED SYSTEMS • Quorum Read + Quorum Write = Linearizability? • „You sometimes see people claiming that quorum reads and writes guarantee linearizability, but I think it would be unwise to rely on it – subtle combinations of features such as sloppy quorums and read repair can lead to tricky edge cases in which deleted data is resurrected, or the number of replicas of a value falls below the original W (violating the quorum condition), or the number of replica nodes increases above the original N (again violating the quorum condition). All of these lead to non-linearizable outcomes.” (Martin Kleppmann)
  • 75. EXAMPLES OF REAL SYSTEMS https://media.licdn.com/mpr/mpr/shrinknp_800_800/AAEAAQAAAAAAAAhUAAAAJDMyNGE1YTA5LTA3MDgtNDY3MS04ODBlLWM3Yjg3MWFmZWM0MA.jpg
  • 76. AZURE COSMOS DB • Higher-level, more application-level consistency models:
    • Strong: linearizability. Guarantees that a write is only visible after it is committed durably by the majority quorum of replicas. A read is always acknowledged by the majority read quorum.
    • Bounded Staleness: consistent prefix; reads may lag behind writes by at most k versions or a time interval t. (Used by about 20% of tenants.)
    • Session: scoped to a client session; consistent prefix, monotonic reads, monotonic writes, read-your-writes, write-follows-reads. (Used by about 73% of tenants.)
    • Consistent Prefix: updates returned are some prefix of all the updates, with no gaps. Guarantees that reads never see out-of-order writes.
    • Eventual: out-of-order reads.
    • A linearizability checker continuously operates over the service telemetry and openly reports any consistency violations to customers.
  • 77. CASSANDRA • Configurable consistency level defined by the number of replicas you require an ACK from before returning to the client:
    • ALL: highest consistency
    • QUORUM: a write must be written to the commit log and memtable on a quorum of replica nodes. Provides strong consistency if you can tolerate some level of failure.
    • ONE: one replica
    • THREE: three replicas
    • LOCAL_QUORUM and similar: datacenter-aware versions
  • 78. CASSANDRA • To provide strong consistency: • R + W > N • N – replication factor, R – read consistency level, W – write consistency level • Conflict resolution: Last Write Wins (based on operation timestamp) – requires proper relative time synchronization • Lightweight transactions (compare-and-set transactions) • INSERT INTO ... VALUES ... IF … • Such queries require a read and a write, and they also need to reach consensus among all the replicas – uses Paxos. • Reads in those transactions use the SERIAL consistency level (similar to QUORUM) • Writes use a tunable consistency level
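The R + W > N rule is a pigeonhole argument: if the read set and the write set together are larger than the replica set, they must share at least one replica. The sketch below checks this by brute force over all replica subsets. It models only the overlap, not Cassandra itself, and the function names are made up for the example.

```python
from itertools import combinations


def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def every_read_overlaps_every_write(n, r, w):
    """True if any r replicas always share a node with any w replicas."""
    replicas = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))
```

For N = 3, QUORUM reads and writes (2 + 2 > 3) always overlap, ONE/ONE (1 + 1) does not, and ONE reads combined with ALL writes (1 + 3) do. As the Kleppmann quote on slide 74 warns, this overlap is necessary but not by itself a guarantee of linearizability.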
  • 80. POSTGRESQL AND OTHER RELATIONAL DBS • MVCC (not configurable in PostgreSQL, but e.g. in MS SQL Server you can turn it on/off) • configurable transaction isolation levels • special operations like SELECT FOR UPDATE • Replication: synchronous or asynchronous, but even in synchronous mode there is a time gap between master and replica • For PostgreSQL there are implementations of multi-master with asynchronous replication and various ways of resolving conflicts, but with a weak global consistency model • Galera Cluster (extended MySQL) provides multi-master based on synchronous replication, and they say it • can support transaction isolation levels up to REPEATABLE READ if proper locking techniques (SELECT FOR UPDATE) are used
  • 81. COUCHBASE • Queries – eventually consistent • can enforce Read Your Writes with the client's help: you get a mutation id after a write and pass it to the read operation, which forces the query node to be at least up-to-date with this mutation id • Document access – strongly consistent • Each key (hash) is assigned to a single primary node that handles all operations • After a node failure, rebalancing happens, and it is unclear what consistency is guaranteed in some edge cases • with XDCR you only get eventual consistency.
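The mutation-id technique can be sketched generically: a write returns a token, and a read that carries the token may not be answered from an index older than that token. This is a toy model under stated assumptions, not the Couchbase SDK; `Primary`, `QueryNode` and the `at_least` parameter are hypothetical, and a real query node would wait for asynchronous index updates instead of pulling them.

```python
class Primary:
    """Handles all writes for a key and keeps an ordered mutation list."""

    def __init__(self):
        self.mutations = []  # (key, value) in write order

    def write(self, key, value):
        self.mutations.append((key, value))
        return len(self.mutations)  # hypothetical mutation id (token)


class QueryNode:
    """Eventually consistent index that lags behind the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.applied = 0   # number of mutations already indexed
        self.index = {}

    def catch_up(self, up_to):
        while self.applied < up_to:
            key, value = self.primary.mutations[self.applied]
            self.index[key] = value
            self.applied += 1

    def query(self, key, at_least=0):
        # Read Your Writes: do not answer before reaching the token.
        self.catch_up(at_least)
        return self.index.get(key)
```

A query without the token may miss the caller's own write. The same query with `at_least=token` returns it, while other clients keep the cheaper eventually consistent behaviour.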
  • 82. ZOOKEEPER • Shown as a clear-cut case of choosing consistency over availability • Writes = linearizable • Require a majority quorum in order to reach consensus using the custom Zab protocol • Reads = provide sequential consistency (not linearizability) by default • Each client is connected to one of the server nodes, and when you make a read, you see only the data on that node, even if there are more up-to-date writes on another node • It is possible to make linearizable reads in ZooKeeper by preceding a read with a sync command
  • 83. ETCD / CONSUL • Writes = linearizable • go through the Raft consensus process • Reads = sequential consistency • can be configured to be linearizable, in which case reads also go through Raft • In some cases Raft allows a node to believe that it is still the leader (it assumes a stable time window): it can't process writes by itself, but it can process reads
  • 84. TIME DEPENDENCY • Linearizability – depends on global time • Multiple databases implement conflict resolution based on time • But time synchronization is a real problem • and NTP doesn’t solve those problems
  • 85. CLOUD SPANNER = NEW TIME API TT.now() => [earliest, latest] Time = interval with bounded time uncertainty
  • 86. CLOUD SPANNER = NEW TIME API • multi-version, globally distributed and synchronously replicated database, with support for consistent distributed transactions • External consistency – stronger than Serializability – transactions commit in an order that is reflected in their commit timestamps, and these commit timestamps are "real time" so you can compare them to your watch • TrueTime = provides global time with a high degree of accuracy • atomic clocks and GPS antennas in datacenters as time sources • new API design and a protocol different from NTP • TT.now() => [earliest, latest] interval with bounded time uncertainty • transaction timestamps often allow the system to order transactions without any coordination • Read-Only transactions – strong consistency (return the latest copy of data) without locking • Read-Write transactions – locking, orchestrated by the Paxos leader
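To illustrate why an interval-valued clock helps with ordering, here is a small sketch: two events can be ordered without any coordination only when their uncertainty intervals do not overlap. The `tt_now` helper, the millisecond units and the 4 ms uncertainty are assumptions for the example, not Spanner's real API or numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int  # milliseconds
    latest: int


def tt_now(local_clock_ms, uncertainty_ms):
    """Simulated TT.now(): true time lies somewhere inside the interval."""
    return TTInterval(local_clock_ms - uncertainty_ms,
                      local_clock_ms + uncertainty_ms)


def definitely_before(a, b):
    """True only if a ended before b could possibly have started."""
    return a.latest < b.earliest


def order(a, b):
    if definitely_before(a, b):
        return "a before b"
    if definitely_before(b, a):
        return "b before a"
    return "uncertain"  # overlapping intervals: timestamps alone cannot decide


a = tt_now(100_000, 4)
b = tt_now(100_010, 4)   # 10 ms later: intervals are disjoint
c = tt_now(100_005, 4)   # 5 ms later: intervals overlap
```

Here `order(a, b)` is "a before b" while `order(a, c)` is "uncertain". The smaller the clock uncertainty, the more often the first case applies.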
  • 87. CLOUD SPANNER • In particular, Spanner assigns a timestamp to all reads and writes. A transaction at timestamp T1 is guaranteed to reflect the results of all writes that happened before T1 • If a machine wants to satisfy a read at T2, it must ensure that its view of the data is up-to-date through at least T2. • A transaction can be assigned a timestamp at any time after all locks have been acquired and before any lock is released. • Spanner assigns it the timestamp that Paxos assigns to the Paxos write that represents the transaction commit. • Spanner depends on the following monotonicity invariant: within each Paxos group, Spanner assigns timestamps to Paxos writes in monotonically increasing order, even across leaders. • A single leader replica can trivially assign timestamps in monotonically increasing order. • This invariant is enforced across leaders by making use of the disjointness invariant: a leader must only assign timestamps within the interval of its leader lease.
  • 88. CLOUD SPANNER • lock-free read-only transactions • locks at row-and-column level • Blind Writes (writing data without previously reading it in the same transaction) • shared write locks • conflict resolution based on timestamps • Reads in a transaction see everything that has been committed before the transaction commits, and writes are seen by everything that starts after the transaction is committed. • Retries may be needed; client libraries mostly handle this
  • 89. CONSISTENCY OF THE WHOLE • You have probably thought about the consistency guarantees of your database… • But have you ever considered the consistency model provided on different layers or by your system as a whole? https://www.confluent.io/wp-content/uploads/2016/09/Event-sourced-based-architecture.jpeg
  • 90. CONSISTENCY OF THE WHOLE • A database doesn’t have to provide consistency guarantees by itself • Client libraries (provided by database vendors) can be very smart • Consistency can be provided by both the database and proper behavior of the clients • Clients can be topology-aware and, e.g., by default try to read from the first (consistent) replica that gets updates the fastest • Many scenarios are possible https://www.confluent.io/wp-content/uploads/2016/09/Event-sourced-based-architecture.jpeg
  • 91. CONSISTENCY OF THE WHOLE - CQRS EXAMPLE • Does it fail the most obvious test: Reading what You just Wrote? https://lostechies.com/jimmybogard/files/2012/08/image4.png
  • 92. CONFUSION WITH ENTANGLEMENT https://earth-chronicles.com/wp-content/uploads/2017/09/entanglement-650x459.jpg http://scienews.com/images/2017/09/ad8145a6780431ea3986b46f9e5cf79e.jpg http://wavewatching.net/wp-content/uploads/2015/05/Paris_Tuileries_Facepalm_statue.jpg
  • 96. SUMMARY • Understand the components of your system and their specific consistency guarantees • Think about the consistency your system exposes to end users as a whole • The required consistency guarantee depends on the use case, not on the data you access • The stronger the consistency, the more likely you have to be ready for retries • like optimistic locking, which optimizes for the best-case scenario
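What "being ready for retries" looks like with optimistic locking can be sketched as a compare-and-set loop: read a value with its version, compute the update, and write only if nobody else changed the version in between. The `VersionedStore` class and the `before_write` hook (used only to simulate a concurrent writer) are illustrative names, not any database's API.

```python
class VersionedStore:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        # Succeeds only if nobody wrote since the caller's read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def update_with_retries(store, update, max_attempts=5, before_write=None):
    """Apply update(value) optimistically; return the attempts needed."""
    for attempt in range(1, max_attempts + 1):
        value, version = store.read()
        if before_write is not None:
            before_write(attempt)  # hook simulating a concurrent writer
        if store.compare_and_set(version, update(value)):
            return attempt
    raise RuntimeError("too many conflicting writers")
```

In the best case the loop succeeds on the first attempt and takes no locks. Under contention the cost moves into retries, which the caller has to be prepared for.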
  • 98. SUMMARY • Use business knowledge to reason about consistency guarantees of the system: • maybe it is possible to shard data into groups that need stronger consistency guarantees within a group than between groups • Is some data written only by a single client? Or by a group of clients that need coordination, rather than by all clients? • The consistency guarantee provided by the system is a shared decision of technical and business people • many big companies created weakly consistent databases because their business valued availability higher than consistency
  • 99. HOW TO READ SOME DB DOCUMENTATION? „Usually (not always), with proper configuration, for some operations, we provide (as long as we use the same definition as you, the reader) some consistency model”
  • 100. WOULD YOU LIKE TO KNOW MORE? https://earth-chronicles.com/wp-content/uploads/2017/09/entanglement-650x459.jpg http://scienews.com/images/2017/09/ad8145a6780431ea3986b46f9e5cf79e.jpg http://wavewatching.net/wp-content/uploads/2015/05/Paris_Tuileries_Facepalm_statue.jpg https://qph.ec.quoracdn.net/main-qimg-ed23d9866da0882d2b994338a31dc8fa
  • 101. SOURCES
    • https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/ConsistencyAndBaseballReport.pdf
    • http://the-paper-trail.org/blog/
    • https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
    • http://www.bailis.org/blog/
    • https://aphyr.com/posts/313-strong-consistency-models
    • https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
    • https://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
    • https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
    • http://galeracluster.com/
    • https://www.postgresql.org/docs/current/static/transaction-iso.html
    • https://developer.couchbase.com/documentation/server/4.5/developer-guide/query-consistency.html
    • https://zookeeper.apache.org/doc/r3.4.10/zookeeperInternals.html
    • https://coreos.com/etcd/docs/latest/learning/api_guarantees.html
    • https://www.consul.io/docs/internals/consensus.html
    • https://cloud.google.com/spanner/docs/
    • https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels
  • 102. CAP THEOREM • CAP Theorem dictates that it is impossible to achieve “consistency” while remaining available in the presence of network and system partitions. • CAP uses very narrow definitions: • Consistency = linearizability • Availability • every request received by a non-failing [database] node in the system must result in a [non-error] response. • It’s not sufficient for some node to be able to handle the request – any non-failing node needs to be able to handle it • Many so-called “highly available” (i.e. low downtime) systems actually do not meet this definition of availability. • Partition Tolerance • means that you’re communicating over an asynchronous network that may delay or drop messages • you don’t really have any choice
  • 104. CONSISTENCY VS AVAILABILITY • Figure legend: • Can’t be fully Available • Sticky Availability: achievable if we relax our notion of availability (client nodes must always talk to the same server) • Total Availability

Editor's Notes

  • #3: We are going to talk about many things, because it is all connected. Data replication, the CAP theorem, availability, ... So bear with me. I think the topic is too long, and I am too weak a speaker, to present everything clearly in 30 minutes, but I hope to spark your curiosity so that you look into it yourselves. There is no chance of covering and explaining everything in such a short time, so I only intend to show a teaser. I do not intend to explain everything; besides, I am not good enough to explain some things simply enough to get them across. I will use some terms that I do not explain here, assuming you know what they mean; intuition is enough. If needed, I will try to explain them at the end, since this is my first time doing this and I do not know how much time it will take… My helpers are handing out the tomatoes... The topic is somewhat theoretical, but some of it inspired me personally to practical reflections, so it is worth a look.
  • #14, #15: What lets us change the set of possible histories (interleavings of operations) is the isolation level. That is what we control. You can see how unfortunate and ambiguous the names are…
  • #16–#18: Many of them I do not even mention.
  • #19: An example of a hierarchy that, on closer inspection, contains seemingly unrelated models. We will look at a few, focusing on the most practical ones. RR = Repeatable Read, SI = Snapshot Isolation, CS = Cursor Stability, RC = Read Committed, WFR = Writes Follow Reads, MR = Monotonic Reads, MW = Monotonic Writes, PRAM = FIFO Consistency, RYW = Read Your Writes. FIFO consistency: writes from a process are seen by others in the same order. Writes from different processes may be seen in a different order (even if causally related). Relaxes causal consistency. Simple implementation: tag each write with (process ID, sequence number).
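The "tag each write with (process ID, sequence number)" implementation mentioned in this note can be sketched as a replica that buffers writes arriving out of order and applies each process's writes strictly by sequence number. The `FifoReplica` class is a hypothetical illustration; sequence numbers are assumed to start at 0 for every process.

```python
class FifoReplica:
    def __init__(self):
        self.next_seq = {}   # process id -> next sequence number to apply
        self.pending = {}    # (process id, seq) -> (key, value), buffered
        self.applied = []    # order in which this replica applied writes
        self.store = {}

    def receive(self, proc_id, seq, key, value):
        """Accept a write in any arrival order; apply in per-process order."""
        self.pending[(proc_id, seq)] = (key, value)
        expected = self.next_seq.get(proc_id, 0)
        while (proc_id, expected) in self.pending:
            k, v = self.pending.pop((proc_id, expected))
            self.store[k] = v
            self.applied.append((proc_id, expected))
            expected += 1
        self.next_seq[proc_id] = expected
```

Writes from one process are never applied out of order, while writes from different processes may interleave differently on each replica, which is exactly what FIFO consistency permits.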
  • #21: Arbitrary: absolute, unconditional (vocabulary note)
  • #52–#57: The last four guarantees: none is stronger than another
  • #93, #101: A complete muddle (Polish idiom „pomieszanie z poplątaniem”, literally „confusion with entanglement”)
  • #94–#100, #102–#104: The Timeline example