Double Sync Replication
Enhancing Data Durability
Lixun Peng @ Alibaba Cloud Compute
About me
• Name: Lixun Peng
• Location: Hangzhou, China
• Occupation: Staff Database Kernel Engineer @ Alibaba Cloud
• Interests: MySQL Replication & InnoDB
• Experience:
At first, I worked as a DBA.
Then I began to modify the MySQL code so that we could use it better.
Gradually I became a MySQL kernel engineer.
Agenda
• The problems with Async/Semi-Sync
• How to solve these problems
• How to implement Double-Sync
• How to use Double-Sync
• Several cases
Problems with Async Replication
• The master doesn’t wait for an ACK from the slave.
• The slave doesn’t know whether it has dumped the latest binary logs.
• When the master crashes, the slave can’t tell whether it has caught up with the master.
• The major problem is that the slave doesn’t know the master’s status.
Semi-Sync Replication
With Semi-Sync, the master waits for the ACK from the slave.
Problems with SemiSync
• The master has to wait for an ACK from the slave.
• Replication downgrades to async when a timeout happens.
• If the timeout is set too small, timeouts happen too often.
• If the timeout is set too large, the master blocks for a long time.
• After it recovers from a network failure, the slave dumps the binary logs generated during the timeout asynchronously.
• If the master crashes, the slave doesn’t know which mode replication was in (Async or SemiSync).
• In this case, the slave still doesn’t know whether it has dumped the latest binary logs.
• Conclusion: SemiSync doesn’t solve the major problem.
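The timeout trade-off can be made concrete with a toy model. This is a minimal sketch, not MySQL code: the `SemiSyncMaster` class, the `run` helper, and the delay values are hypothetical, and the model assumes the master stays in async mode after the first timeout instead of switching back later.

```python
from dataclasses import dataclass, field


@dataclass
class SemiSyncMaster:
    """Toy model (hypothetical, not MySQL code) of a master waiting for slave ACKs."""

    timeout_ms: int
    semi_sync_active: bool = True
    acked: list = field(default_factory=list)    # commits the slave confirmed
    unacked: list = field(default_factory=list)  # commits sent without confirmation

    def commit(self, txn: str, ack_delay_ms: int) -> int:
        """Commit one transaction and return how long the client was blocked (ms)."""
        if not self.semi_sync_active:
            # Already downgraded: behave like async and never wait.
            self.unacked.append(txn)
            return 0
        if ack_delay_ms <= self.timeout_ms:
            self.acked.append(txn)
            return ack_delay_ms
        # The ACK did not arrive in time: give up waiting and downgrade to async.
        self.semi_sync_active = False
        self.unacked.append(txn)
        return self.timeout_ms


def run(timeout_ms: int, ack_delays_ms: list) -> tuple:
    """Replay the same ACK delays against a master with the given timeout."""
    master = SemiSyncMaster(timeout_ms)
    blocked = [master.commit(f"t{i}", d) for i, d in enumerate(ack_delays_ms)]
    return master, blocked
```

Replaying the delays `[5, 8, 500, 6]` with `run(10, ...)` downgrades on the third commit, so the last two commits land in `unacked`. With `run(1000, ...)` nothing is downgraded, but the client is blocked for the full 500 ms. In the first run the slave has no way of telling that `unacked` is non-empty, which is the gap the slides describe.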
Problem of Async/SemiSync
Flow Chart (Async/Semi-Sync)
Background & Target
• Background
  • The SA team guarantees server availability: 99.999%
  • The Net Ops team guarantees network availability: 99.999%
  • Assume the master and the network don’t fail at the same time.
• Target
  • The slave knows whether it has caught up with the master.
  • The slave knows how much of the master’s data it doesn’t have.
• Key Point: Clarify the slave's status!
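A quick back-of-the-envelope check shows why assuming no simultaneous failure is reasonable. The sketch below takes the 99.999% figures from the slide and additionally assumes that server and network failures are statistically independent; that independence is an assumption made here, not something the slides state.

```python
from fractions import Fraction

AVAILABILITY = Fraction(99999, 100000)  # 99.999%, the figure quoted on the slide
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # non-leap year

unavailability = 1 - AVAILABILITY

# Expected downtime of one component (server or network) per year.
single_downtime_s = unavailability * SECONDS_PER_YEAR

# Expected overlap of both outages per year, ASSUMING independent failures.
joint_downtime_s = unavailability * unavailability * SECONDS_PER_YEAR

print(f"one component down: {float(single_downtime_s):.2f} s/year")
print(f"both down at once:  {float(joint_downtime_s):.4f} s/year")
```

Under these assumptions each component is down for a little over five minutes a year, while the expected overlap of the two outages is on the order of milliseconds a year.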
Solve the weak point of SemiSync
• Even after the network recovers from a failure, the slave still has to dump the binary logs generated during the timeout asynchronously.
• If a timeout happens and the slave gives up the binary logs generated during the timeout, what happens afterwards if the master sends only the latest position and logs?
• When the network is down, the slave always knows the latest position.
• The slave can tell whether its data is the same as the master’s.
• How to catch up on the data modifications made while the network was down?
• Async replication can still dump binary logs.
• So we can use Async replication to do a full log apply.
Combine the Async and SemiSync
• Async Replication (Async Channel)
  • Dumps continuous binary logs from the master.
  • Applies logs immediately after the slave receives them.
• SemiSync Replication (Sync Channel)
  • Dumps the latest binary logs and position.
  • Does not apply logs immediately. Expired logs are purged automatically.
• Analyzing Consistency
  • Compare the logs and positions from the two channels.
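The division of labour between the two channels can be sketched as two small data structures. This is an illustrative model only: the class names, the integer sequence numbers standing in for binlog positions, and the "keep the newest N events" purge policy are all assumptions, since the slides do not say how expiry is decided.

```python
from collections import deque


class AsyncChannel:
    """Receives a continuous log stream and applies each event on arrival."""

    def __init__(self) -> None:
        self.applied_upto = 0  # sequence number of the last applied event

    def receive(self, seq: int) -> None:
        if seq != self.applied_upto + 1:
            raise ValueError(f"async channel expects {self.applied_upto + 1}, got {seq}")
        self.applied_upto = seq  # applied immediately


class SyncChannel:
    """Keeps only the newest events; older ones are purged automatically."""

    def __init__(self, keep: int = 3) -> None:
        self.pending = deque(maxlen=keep)  # received but not applied

    def receive(self, seq: int) -> None:
        self.pending.append(seq)  # deque drops the oldest entry when full

    @property
    def latest(self) -> int:
        return self.pending[-1] if self.pending else 0


def events_behind(async_ch: AsyncChannel, sync_ch: SyncChannel) -> int:
    """How many events the sync channel has seen that are not yet applied."""
    return max(0, sync_ch.latest - async_ch.applied_upto)
```

Comparing `sync_ch.latest` with `async_ch.applied_upto` is the consistency analysis in miniature: the sync channel says where the master is, the async channel says how far the slave's data has actually got.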
Combine the Async and SemiSync
Flow Chart (Double Sync)
How to create two channels (1)
• Multi-Source replication enables N channels in one slave.
• Problem: when the master receives two dump requests from servers with the same server-id, it disconnects the earlier one.
• Solution: set up a special Server-ID (0xFFFFFF) for the Sync Channel.
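The server-id collision is easy to reproduce in a toy model of the master's connection table. The `Master` class and `request_dump` method below are hypothetical names for illustration; only the 0xFFFFFF value comes from the slide.

```python
SYNC_CHANNEL_SERVER_ID = 0xFFFFFF  # special Server-ID from the slide


class Master:
    """Toy model of the master's table of active binlog dump connections."""

    def __init__(self) -> None:
        self.dumps = {}  # server_id -> label of the connected channel

    def request_dump(self, server_id: int, channel: str):
        """Register a dump request; return the channel it displaced, if any."""
        displaced = self.dumps.get(server_id)
        self.dumps[server_id] = channel  # same server-id: the earlier one is dropped
        return displaced
```

If both channels of one slave connect with server-id 101, the second `request_dump(101, "sync")` returns `"async"`, meaning the async channel was kicked off. Connecting the sync channel with `SYNC_CHANNEL_SERVER_ID` instead leaves both entries in `dumps`.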
How to create two channels (2)
• Problem: one slave has both a SemiSync and a non-SemiSync Channel, but the SemiSync settings are global.
• Solution: move the SemiSyncSlave class into Master_info, so that each channel carries its own SemiSync state.
Analyzing consistency
• Using the GTID
• Using the Log_file_name and Log_file_pos
• The following pictures walk through the process.
Analyzing consistency
• No repair needed: just use it!
• Can’t repair: some data will be lost
• Can repair: use it after repair
CASE 1: Needn’t Fix
• The GTIDs on the Sync and Async Channels are the same.
CASE 2: Can’t Fix
• There is a gap between the Sync and Async Channels’ logs.
CASE 3: Can Repair
• Combine the two channels’ logs to make the logs continuous.
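The three cases reduce to a comparison of log ranges. The sketch below is one possible reading, under the assumption that events from a single master can be numbered with consecutive integers (a stand-in for GTID sequence numbers); the `Verdict` enum and `classify` function are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    NO_REPAIR_NEEDED = "case 1"
    CANNOT_REPAIR = "case 2"
    CAN_REPAIR = "case 3"


def classify(async_last: int, sync_first: int, sync_last: int) -> Verdict:
    """Compare the async channel's last event with the range held by the sync channel.

    async_last: last event received on the async channel (its log is continuous).
    sync_first, sync_last: oldest and newest event still held by the sync channel.
    """
    if sync_last <= async_last:
        # The async channel already has everything the sync channel knows about.
        return Verdict.NO_REPAIR_NEEDED
    if sync_first <= async_last + 1:
        # The ranges overlap or touch, so the two logs join without a hole.
        return Verdict.CAN_REPAIR
    # Events between async_last and sync_first exist on neither channel.
    return Verdict.CANNOT_REPAIR
```

For example, with `async_last=10`, a sync range of 9 to 13 can be repaired, while a sync range of 12 to 13 cannot, because event 11 is missing from both channels.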
How to Repair
• The slave waits for the Async Channel to apply all the logs it has received, then starts the SQL THREAD of the Sync Channel.
• GTID filters out the events that have already been applied by the Async Channel.
• A REPAIR SLAVE command is provided to do this automatically.
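The repair order can be sketched in a few lines. This is a simplified model and not the REPAIR SLAVE implementation, whose syntax and internals the slides do not show: the `repair` function is hypothetical, and the short strings such as "m:9" are made-up stand-ins for real GTIDs.

```python
def repair(applied: set, async_relay: list, sync_relay: list) -> list:
    """Apply pending events in repair order and return the GTIDs newly applied.

    applied: GTIDs already executed on the slave (updated in place).
    async_relay: events received by the Async Channel but not yet applied.
    sync_relay: events held by the Sync Channel.
    """
    newly_applied = []

    # Step 1: let the Async Channel finish applying everything it received.
    for gtid in async_relay:
        if gtid not in applied:
            applied.add(gtid)
            newly_applied.append(gtid)

    # Step 2: start the Sync Channel's applier; GTID filtering skips
    # every event the Async Channel has already applied.
    for gtid in sync_relay:
        if gtid in applied:
            continue
        applied.add(gtid)
        newly_applied.append(gtid)

    return newly_applied
```

With events 1 to 8 already applied, an async relay of `["m:9", "m:10"]` and a sync relay of `["m:10", "m:11", "m:12"]`, the function applies 9 through 12 exactly once each; "m:10" is skipped the second time it appears.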
FAQs (1)
• Q1: Will Alibaba release this feature?
• A1: Of course! Alibaba will release all the patches.
• Q2: When will Alibaba release the source code?
• A2: Check AliSQL’s roadmap.
• Q3: How can I access AliSQL’s source code?
• A3: https://github.com/alibaba/AliSQL The project is currently private. If you want access, please email me your GitHub account.
FAQs (2)
• Q4: What’s the difference between 2 Semi-Sync Slaves and double sync replication?
• A4: In fact they do the same job, and performance is pretty much the same too. But double sync replication needs one fewer slave than the 2 Semi-Sync Slaves architecture. As the number of MySQL servers grows, that saves a lot of money.
Any other Questions?
penglixun@gmail.com
