REPLICATION
IN THE WILD
Ensar Basri Kahveci
Hello! Ensar Basri Kahveci
Distributed Systems Engineer @ Hazelcast
Ph.D. Candidate @ Bilkent CS
twitter: metanet | github: metanet | linkedin: basrikahveci
- IMDG: storage + computation + messaging
- Open source, distributed, highly available, elastic,
scalable
- Distributed Java collections, JCache, HD store
- Embedded or client-server deployment
- Clients: Java, Scala, C++, C#, Python, Node.js, ...
- Integration modules
- https://blog.hazelcast.com/announcing-hazelcast-imdg-3-8
- Rolling upgrades
- User code deployment
- Hot restart improvements
- WAN replication improvements
REPLICATION
- Keeping copies of a data set on
multiple nodes.
- Each replica has a full copy.
- A few reasons for replication:
- Performance
- Availability and fault tolerance
REPLICATION + PARTITIONING
- Mostly used with
partitioning.
- Two partitions: P1, P2
- Two replicas for each
partition.
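A minimal sketch of how partitioning and replication combine. The function name `assign_replicas`, the node names and the round-robin placement are assumptions made for illustration; treating the first node in each list as the primary is also an assumption, and this is not the placement logic of Hazelcast or any other product.

```python
def assign_replicas(partitions, nodes, replica_count):
    """Place replica_count copies of each partition on distinct nodes (round-robin)."""
    if replica_count > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, partition in enumerate(partitions):
        # Assumption: the first node in each list plays the primary role.
        placement[partition] = [
            nodes[(i + r) % len(nodes)] for r in range(replica_count)
        ]
    return placement


# Two partitions, two replicas each, as on the slide.
placement = assign_replicas(["P1", "P2"], ["node-a", "node-b"], replica_count=2)
print(placement)
```

With two nodes, each node ends up holding one copy of both P1 and P2, so either node can fail without losing a partition.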
NOTHING FOR FREE!
- Very easy to do when the data is immutable.
- Problems start when we have multiple copies
of the data and we want to update them.
- Two main difficulties:
- Handling updates,
- Handling failures.
CAP PRINCIPLE
- Proposed by Eric Brewer in 2000 [13],
- Proved by Gilbert and Lynch in 2002 [19].
- A shared-data system cannot achieve perfect
consistency and availability in the presence of
partitions: CP vs. AP.
- Widespread acceptance, and yet a lot of criticism
[15 - 21].
CONSISTENCY AND AVAILABILITY
- Degrees of consistency:
- Data centric, client centric
- Degrees of availability:
- High availability, sticky
availability, non-availability
- Replication is directly
related to C and A. [25]
THE DANGERS OF REPLICATION AND A SOLUTION
- Gray et al. [1] classify replication models by 2
parameters:
- Where to make updates: primary copy or update
anywhere
- When to make updates: eagerly or lazily
WHERE: PRIMARY COPY
- There is a single replica
managing the updates.
- Concurrency control is easy.
- No conflicts and no conflict-handling logic.
- Updates are executed on the primary and
secondaries in the same order.
- When the primary fails, a new primary is elected.
- Ensuring a single and good primary is hard.
WHERE: UPDATE ANYWHERE
- Each replica can initiate a
transaction to make an update.
- Complex concurrency control.
- Deadlocks or conflicts are
possible.
- In practice, there is also
multi-leader.
WHEN: EAGER REPLICATION
- Synchronously updates all
replicas as part of one atomic
transaction.
- Provides strong consistency.
- Degree of availability can
degrade on node failures.
- Consensus algorithms.
WHEN: LAZY REPLICATION
- Updates each replica with a
separate transaction.
- Updates can execute quite fast.
- Degree of availability is high.
- Eventual consistency.
- Data copies can diverge.
- Data loss or conflicts can occur.
WHERE? WHEN?
- EAGER, PRIMARY COPY: strong consistency, simple concurrency, slow, inflexible
- EAGER, UPDATE ANYWHERE: strong consistency, complex concurrency, slow, expensive, deadlocks
- LAZY, PRIMARY COPY: fast, sticky availability, eventual consistency, simple concurrency, inconsistency
- LAZY, UPDATE ANYWHERE: fast, flexible, high availability, eventual consistency, inconsistency, conflicts
WHERE? WHEN?
- EAGER, PRIMARY COPY: Multi Paxos [5], etcd and Consul (Raft) [6], ZooKeeper (Zab) [7], Kafka, VoltDB [24]
- EAGER, UPDATE ANYWHERE: Paxos [5], Hazelcast Cluster State Change [12], MySQL 5.7 Group Replication [23]
- LAZY, PRIMARY COPY: Hazelcast, MongoDB, ElasticSearch, Redis
- LAZY, UPDATE ANYWHERE: Dynamo [4], Cassandra, Riak, Hazelcast Active-Active WAN Replication [22]
PRIMARY COPY + EAGER REPLICATION
- When the primary fails, secondaries are
guaranteed to be up to date.
- Raft, Kafka etc.
- Majority approach can be used.
- In Kafka, in-sync-replica set [11] is maintained.
- Secondaries can be used for reads.
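A minimal sketch of the majority approach mentioned above. The names `Replica` and `replicate_eagerly` are hypothetical, and the sketch deliberately omits terms, leader election and rollback of unacknowledged entries, so it is neither Raft nor Kafka's in-sync-replica protocol.

```python
class Replica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def append(self, entry):
        # A replica that is down cannot store or acknowledge the entry.
        if not self.up:
            return False
        self.log.append(entry)
        return True


def replicate_eagerly(primary, secondaries, entry):
    """Return True if a majority of all replicas (primary included) stored the entry."""
    acks = 1 if primary.append(entry) else 0
    acks += sum(1 for s in secondaries if s.append(entry))
    cluster_size = 1 + len(secondaries)
    return acks > cluster_size // 2


primary = Replica("n1")
secondaries = [Replica("n2"), Replica("n3", up=False)]

# 2 of 3 replicas acknowledge, so the update counts as committed.
committed = replicate_eagerly(primary, secondaries, "x=1")

# With a second failure only the primary is left and the majority is lost.
secondaries[0].up = False
committed_after_failure = replicate_eagerly(primary, secondaries, "x=2")
print(committed, committed_after_failure)
```

The second call illustrates why the degree of availability degrades on node failures: without a majority, the update cannot be acknowledged.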
UPDATE ANYWHERE + EAGER REPLICATION
- Each replica can initiate a new transaction.
- Concurrent transactions can compete with
each other.
- Possibility of deadlocks.
- In the basic Paxos algorithm, there is no
designated leader.
PRIMARY COPY + LAZY REPLICATION
- The primary copy can execute updates fast.
- Secondaries can fall behind the primary. This is
called replication lag.
- It can lead to data loss during leader failover, but
still no conflicts.
- Implies sticky availability.
- Secondaries can be used for reads.
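A minimal sketch of replication lag and of the data loss it can cause on failover. `LazyPrimary` and `ship_one` are hypothetical names, and the secondary is modeled as a plain dictionary; no real system's replication stream is reproduced here.

```python
from collections import deque


class LazyPrimary:
    def __init__(self):
        self.data = {}
        self.pending = deque()  # updates acknowledged but not yet replicated

    def write(self, key, value):
        # The write is applied and acknowledged immediately;
        # replication to the secondary happens later.
        self.data[key] = value
        self.pending.append((key, value))

    def ship_one(self, secondary):
        # Apply the oldest pending update on the secondary, preserving order.
        if self.pending:
            key, value = self.pending.popleft()
            secondary[key] = value


primary = LazyPrimary()
secondary = {}
primary.write("a", 1)
primary.write("b", 2)
primary.ship_one(secondary)  # only the first update has been replicated

replication_lag = len(primary.pending)
# If the primary crashes now and the secondary is promoted, these keys are lost.
lost_keys = set(primary.data) - set(secondary)
print(replication_lag, lost_keys)
```

Because the secondary only ever applies the primary's updates in order, it can be stale but never in conflict with the primary.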
UPDATE ANYWHERE + LAZY REPLICATION
- Dynamo-style [4] highly available databases.
- Quorums.
- Concurrent updates are first-class citizens.
- Possibility of conflicts
- Avoiding, discarding, detecting & resolving conflicts
- Eventual convergence
- Write repair, read repair and anti-entropy
QUORUMS
- W + R > N
- W = 3, R = 1, N = 3
- W = 2, R = 2, N = 3
- If W or R is not met
- Sloppy quorums and
hinted handoff
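The condition W + R > N can be checked by brute force for small clusters. The sketch below uses hypothetical helper names and enumerates every possible write set and read set to confirm that they always share at least one replica when the condition holds.

```python
from itertools import combinations


def is_strict_quorum(n, w, r):
    """The quorum condition from the slide: W + R > N."""
    return w + r > n


def always_overlap(n, w, r):
    """Brute-force check that every write set shares a replica with every read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, w, r in [(3, 3, 1), (3, 2, 2), (3, 1, 1)]:
    print(n, w, r, is_strict_quorum(n, w, r), always_overlap(n, w, r))
```

The two configurations from the slide pass both checks. The extra configuration W = 1, R = 1, N = 3 fails both, which means a read may contact only replicas that missed the latest write.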
CONFLICT-FREE REPLICATED DATA TYPES (CRDTs)
- Special data types that achieve strong
eventual consistency and monotonicity [2]
- No conflicts
- Merge function has 3 properties:
- Commutative: A+B=B+A
- Associative: A+(B+C)=(A+B)+C
- Idempotent: A+A=A
- Riak Data Types [3]
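A grow-only counter is one of the simplest state-based CRDTs and makes the three merge properties concrete. The class below is an illustrative sketch with hypothetical names, not the Riak Data Types implementation: each replica increments only its own slot, and merge takes the element-wise maximum.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def merge(self, other):
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    def value(self):
        return sum(self.counts.values())

    def __eq__(self, other):
        return isinstance(other, GCounter) and self.counts == other.counts


a = GCounter()
a.increment("r1", 2)
b = GCounter()
b.increment("r2", 3)
c = GCounter()
c.increment("r1", 1)
c.increment("r3", 4)

commutative = a.merge(b) == b.merge(a)
associative = a.merge(b.merge(c)) == a.merge(b).merge(c)
idempotent = a.merge(a) == a
print(commutative, associative, idempotent, a.merge(b).value())
```

Because merge is commutative, associative and idempotent, replicas can exchange state in any order and any number of times and still converge to the same value.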
DISCARDING CONFLICTS: LAST WRITE WINS
- When 2 updates are concurrent, define an
arbitrary order among them.
- i.e., pretend that one of them is more recent.
- Attach a timestamp to each write.
- Cassandra uses physical timestamps [8], [9].
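A minimal sketch of last write wins, assuming each version is a `(timestamp_ms, replica_id, value)` tuple. The tuple layout and the use of the replica id as a tie-breaker are assumptions for illustration and do not describe how Cassandra resolves ties.

```python
def lww_merge(a, b):
    """Keep the version with the larger (timestamp, replica_id) pair."""
    return max(a, b, key=lambda version: (version[0], version[1]))


# Two concurrent updates to the same key, stamped by different replicas.
v1 = (1700000000120, "r1", "cart=[book]")
v2 = (1700000000115, "r2", "cart=[book, pen]")

winner = lww_merge(v1, v2)
print(winner)
```

The update carried by `v2` is silently discarded even though neither write saw the other, which is the cost of resolving conflicts by timestamp alone. Clock skew between replicas can make the choice of winner arbitrary.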
DETECTING CONFLICTS: VECTOR CLOCKS
- In the Dynamo paper [4], each update is done
against a particular version of a data entry.
- Multiple versions of a data entry can exist together.
- Vector clocks [10] are used to track causality.
- The system can determine the authoritative version:
syntactic reconciliation
- The system cannot reconcile multiple versions:
semantic reconciliation
VECTOR CLOCKS
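A minimal sketch of comparing two vector clocks, each represented as a dictionary from node name to counter. The function name `compare` and the node names are hypothetical; a missing entry is treated as zero.

```python
def compare(vc_a, vc_b):
    """Return 'before', 'after', 'equal' or 'concurrent' for two vector clocks."""
    nodes = vc_a.keys() | vc_b.keys()
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # vc_a causally precedes vc_b
    if b_le_a:
        return "after"   # vc_b causally precedes vc_a
    return "concurrent"  # neither version descends from the other


older = {"node-x": 1}
newer = {"node-x": 2}
left = {"node-x": 2, "node-y": 1}
right = {"node-x": 2, "node-z": 1}

relation_1 = compare(older, newer)
relation_2 = compare(left, right)
print(relation_1, relation_2)
```

When the result is "before" or "after", the system can pick the newer version on its own (syntactic reconciliation). When the result is "concurrent", both versions must be kept and handed to the application to merge (semantic reconciliation).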
RESOLVING CONFLICTS AND EVENTUAL CONVERGENCE
- Write repair
- Read repair
- Anti-entropy
- Merkle trees
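A minimal sketch of how Merkle trees support anti-entropy: two replicas compare a single root hash to learn whether their contents differ. The helper names are hypothetical, and the sketch assumes both replicas list the same key range in the same order. A fuller implementation would descend the tree to locate the differing ranges instead of comparing only the roots.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items):
    """Root hash over a list of (key, value) pairs given in a fixed key order."""
    level = [_h(f"{k}={v}".encode()) for k, v in items] or [_h(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_a = [("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
replica_b = [("k1", "v1"), ("k2", "stale"), ("k3", "v3")]

in_sync = merkle_root(replica_a) == merkle_root(list(replica_a))
needs_repair = merkle_root(replica_a) != merkle_root(replica_b)
print(in_sync, needs_repair)
```

Exchanging a fixed-size hash is much cheaper than exchanging the data itself, so replicas that are already in sync pay almost nothing for the check.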
RECAP
- We apply replication to make distributed
systems performant, available and fault
tolerant.
- Replication suffers from the core problems of distributed systems.
- Various replication protocols are built based
on when and where to make updates.
- No silver bullet. It is a world of trade-offs.
- We are hiring!
- Senior Java Developer
http://stackoverflow.com/jobs/129435/senior-java-developer-hazelcast
- Software Quality and Performance Wizard
http://stackoverflow.com/jobs/126077/software-quality-and-performance-wizard-hazelcast
- Solutions Architect
http://stackoverflow.com/jobs/131938/solutions-architect-hazelcast
REFERENCES
[1] Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182.
[2] Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011.
[3] http://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/
[4] DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
[5] Lamport, Leslie. "Paxos made simple." ACM SIGACT News 32.4 (2001): 18-25.
[6] Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014.
[7] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX annual technical conference. Vol. 8. 2010.
[8] http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
[9] https://aphyr.com/posts/299-the-trouble-with-timestamps
[10] Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56.
[11] http://kafka.apache.org/documentation.html#replication
[12] http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#managing-cluster-and-member-states
[13] E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10
[14] https://codahale.com/you-cant-sacrifice-partition-tolerance/
[15] http://blog.nahurst.com/visual-guide-to-nosql-systems
[16] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
[17] https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
[18] https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
[19] Gilbert, Seth, and Nancy Lynch. "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services." ACM SIGACT News 33.2 (2002): 51-59.
[20] https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
[21] https://henryr.github.io/cap-faq/
[22] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#wan-replication
[23] https://dev.mysql.com/doc/refman/5.7/en/group-replication.html
[24] https://www.voltdb.com/architecture
[25] Bailis, Peter, et al. "Highly available transactions: Virtues and limitations." Proceedings of the VLDB Endowment 7.3 (2013): 181-192.
THANKS! Any questions?

Replication in the Wild - Ankara Cloud Meetup - Feb 2017
