An issue where all slaves stop replication
Kentoku SHIBA
Abstract of the issue
- All slaves stop replication with “Got fatal error 1236 from
master when reading data from binary log: 'unknown error
reading log event on the master; the first event
'binlog.004215' at 8610479, the last event read from
'./binlog.004226' at 154, the last byte read from
'./binlog.004226' at 154.'”. This error can be seen in the output
of “SHOW SLAVE STATUS”.
- When you execute “START SLAVE”, the position advances slightly
and replication stops again with the same error.
- Replication recovers when “START SLAVE” is executed on the slaves
after the binary log on the master has been rotated.
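Monitoring can pick out this specific failure by matching the Last_IO_Error text from SHOW SLAVE STATUS against the message quoted above. The sketch below shows one way to do that. `parse_error_1236` is a hypothetical helper and is not part of any MySQL tool. It also assumes the message wording is exactly the one on this slide, and other versions may word it differently.

```python
import re
from typing import Optional

# Pattern built from the error text quoted on this slide (assumed format).
_ERROR_1236 = re.compile(
    r"Got fatal error 1236 from master when reading data from binary log: "
    r"'unknown error reading log event on the master; "
    r"the first event '(?P<first_file>[^']+)' at (?P<first_pos>\d+), "
    r"the last event read from '(?P<last_file>[^']+)' at (?P<last_pos>\d+), "
    r"the last byte read from '(?P<byte_file>[^']+)' at (?P<byte_pos>\d+)\."
)


def parse_error_1236(message: str) -> Optional[dict]:
    """Return binlog file names and positions from the error, or None."""
    match = _ERROR_1236.search(message)
    if match is None:
        return None
    info = match.groupdict()
    # Positions are captured as strings; convert them to integers.
    for key in ("first_pos", "last_pos", "byte_pos"):
        info[key] = int(info[key])
    return info


if __name__ == "__main__":
    sample = (
        "Got fatal error 1236 from master when reading data from binary log: "
        "'unknown error reading log event on the master; the first event "
        "'binlog.004215' at 8610479, the last event read from "
        "'./binlog.004226' at 154, the last byte read from "
        "'./binlog.004226' at 154.'"
    )
    print(parse_error_1236(sample))
```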
Abstract of the issue
Don’t panic.
Let’s look at the details.
When a slave requests the latest binary log
The master checks MYSQL_BIN_LOG::binlog_end_pos to decide up to
which position of the latest binary log it can send events to the slave.
[Diagram: the slave sends a binary log request to the master; the master sends binary log events up to MYSQL_BIN_LOG::binlog_end_pos]
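The check on this slide can be modeled in a few lines. This toy Python model is not the server code. The binlog is a list of (start, end) byte ranges, one per event. `send_events` is a hypothetical name; it sends every complete event that ends at or before binlog_end_pos. If binlog_end_pos promises data that no complete event provides, the model raises an error, which is the situation behind the error message on the first slide.

```python
from typing import List, Tuple

# Each event is modeled as a (start_offset, end_offset) pair in one binlog file.
Event = Tuple[int, int]


class ReadError(Exception):
    """Stands in for 'unknown error reading log event on the master'."""


def send_events(events: List[Event], read_pos: int,
                binlog_end_pos: int) -> Tuple[List[Event], int]:
    """Send complete events from read_pos up to binlog_end_pos.

    Returns the events sent and the new read position. Raises ReadError
    when binlog_end_pos lies beyond the last complete event available.
    """
    sent: List[Event] = []
    for start, end in events:
        if start < read_pos:
            continue  # already sent to this slave
        if end > binlog_end_pos:
            break  # not yet visible to slaves
        sent.append((start, end))
        read_pos = end
    if read_pos < binlog_end_pos:
        raise ReadError(f"nothing readable between {read_pos} and {binlog_end_pos}")
    return sent, read_pos
```

With a correct end position, the slave simply waits for more events. With an end position larger than the file contents, every read attempt fails at the same offset.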
Behavior of COMMIT
With SYNC_BINLOG=1, the end position of the binlog is held in
THD::m_trans_end_pos during the FLUSH stage, and is then copied to
MYSQL_BIN_LOG::binlog_end_pos during the SYNC stage.
[Diagram: on the master, FLUSH stage (THD::m_trans_end_pos) → SYNC stage (MYSQL_BIN_LOG::binlog_end_pos) → COMMIT stage]
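The hand-off described above can be shown as a simplified Python model. `Thd` and `BinLog` are hypothetical stand-ins for THD and MYSQL_BIN_LOG that keep only the two position fields named on the slide. The starting size of 154 bytes is an arbitrary assumption. With sync_binlog=1, the position recorded in the FLUSH stage becomes visible to slaves only when the SYNC stage copies it.

```python
from dataclasses import dataclass


@dataclass
class BinLog:
    """Simplified stand-in for MYSQL_BIN_LOG (hypothetical model)."""
    file_size: int = 154       # bytes written to the current binlog file
    binlog_end_pos: int = 154  # position up to which slaves may read


@dataclass
class Thd:
    """Simplified stand-in for THD, holding only m_trans_end_pos."""
    m_trans_end_pos: int = 0


def flush_stage(thd: Thd, log: BinLog, event_bytes: int) -> None:
    """FLUSH stage: write the events and remember where they end."""
    log.file_size += event_bytes
    thd.m_trans_end_pos = log.file_size


def sync_stage(thd: Thd, log: BinLog) -> None:
    """SYNC stage (sync_binlog=1): after the sync, publish the end position."""
    log.binlog_end_pos = thd.m_trans_end_pos


if __name__ == "__main__":
    log, thd = BinLog(), Thd()
    flush_stage(thd, log, 100)
    print("after FLUSH:", log.binlog_end_pos)  # still the old position
    sync_stage(thd, log)
    print("after SYNC:", log.binlog_end_pos)   # now includes the transaction
```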
Overview of the stages at COMMIT
- FLUSH stage
Write the events of transactions to the binlog.
(The events are not guaranteed to be physically written to disk at this stage.)
- SYNC stage
Physically write (sync) the binlog events to disk.
- COMMIT stage
Finalize the COMMIT in each storage engine.
(Transactions are in the PREPARED state before this stage.)
The stages work independently: while one transaction is in the
COMMIT stage, the next transaction can be in the SYNC stage and
another one in the FLUSH stage.
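The pipelining described above can be pictured as a small schedule. This Python sketch is only an illustration. It assumes each transaction spends exactly one tick in each stage, whereas real stage durations vary and the server processes groups of transactions per stage. `pipeline_schedule` is a hypothetical helper.

```python
from typing import Dict, List

STAGES = ["FLUSH", "SYNC", "COMMIT"]


def pipeline_schedule(transactions: List[str]) -> List[Dict[str, str]]:
    """Return, for each tick, which transaction occupies which stage."""
    ticks: List[Dict[str, str]] = []
    total = len(transactions) + len(STAGES) - 1
    for tick in range(total):
        occupancy: Dict[str, str] = {}
        for i, stage in enumerate(STAGES):
            trx_index = tick - i  # transaction that entered FLUSH i ticks ago
            if 0 <= trx_index < len(transactions):
                occupancy[stage] = transactions[trx_index]
        ticks.append(occupancy)
    return ticks


if __name__ == "__main__":
    for tick, occupancy in enumerate(pipeline_schedule(["T1", "T2", "T3"])):
        print(tick, occupancy)
```

At tick 2, T1 is in COMMIT, T2 in SYNC and T3 in FLUSH, which is the overlap described on this slide.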
The condition that causes this issue
This issue occurs when a binlog rotation happens before
MYSQL_BIN_LOG::binlog_end_pos is updated in the SYNC stage.
[Diagram: on the master, a binlog rotation occurs after the FLUSH stage (THD::m_trans_end_pos) and before the SYNC stage (MYSQL_BIN_LOG::binlog_end_pos), followed by the COMMIT stage]
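The race can be reproduced in a single-threaded toy model. The Python sketch below is a simplified illustration, not the server code, and it rests on three assumptions:
- an update only ever moves the end position forward;
- rotation resets the end position to the end of the new file's header;
- the header ends at byte 154, chosen to match the error message on the first slide.

In the model, one transaction is flushed to the old file, the binlog is rotated, and that transaction's SYNC stage then publishes its stale old-file position.

```python
from dataclasses import dataclass

HEADER_END = 154  # end of the header in a new binlog file (assumption)


@dataclass
class BinLog:
    """Toy model of the active binlog file and MYSQL_BIN_LOG::binlog_end_pos."""
    file_no: int = 1
    file_size: int = HEADER_END
    binlog_end_pos: int = HEADER_END

    def flush(self, size: int) -> int:
        """FLUSH stage: append events, return the value kept in THD::m_trans_end_pos."""
        self.file_size += size
        return self.file_size

    def sync(self, trans_end_pos: int) -> None:
        """SYNC stage: publish the position (assumed to only move forward)."""
        self.binlog_end_pos = max(self.binlog_end_pos, trans_end_pos)

    def rotate(self) -> None:
        """Open a new file and reset binlog_end_pos for it."""
        self.file_no += 1
        self.file_size = HEADER_END
        self.binlog_end_pos = HEADER_END

    def readable(self) -> bool:
        """Slaves can read up to binlog_end_pos only if those bytes exist."""
        return self.binlog_end_pos <= self.file_size


def race() -> BinLog:
    """Flush to the old file, rotate, then let the delayed SYNC stage run."""
    log = BinLog()
    for _ in range(50):          # normal traffic in the old file
        log.sync(log.flush(100))
    stale_pos = log.flush(300)   # FLUSH stage of the unlucky transaction
    log.rotate()                 # rotation before its SYNC stage
    log.sync(stale_pos)          # publishes an old-file position
    return log


if __name__ == "__main__":
    log = race()
    print(log.file_no, log.file_size, log.binlog_end_pos, log.readable())
```

After `race()`, a few more transactions in the new file still leave `readable()` returning False. A second `rotate()`, with nothing pending in the SYNC stage, clears the problem, in line with the next slide.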
Behavior after the issue occurs
- The binlog itself is not broken; only the readable end position is
wrong. So replication can be restarted by command: the master reads
and sends any binlog events written after the previous error position,
then stops again with the same error.
- When the binlog is rotated again,
MYSQL_BIN_LOG::binlog_end_pos is reset for the new
binlog file. If no transaction is in the SYNC stage at that time, the
problem goes away.
- If a transaction updates tables that support XA transactions,
MYSQL_BIN_LOG::m_prep_xids is incremented in the FLUSH
stage, and binlog rotation waits until it is decremented
in the COMMIT stage. So this case does not cause the issue.
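The protection given by MYSQL_BIN_LOG::m_prep_xids can be sketched in the same toy style, under these assumptions:
- `try_rotate` is a hypothetical method that refuses to rotate while prepared XIDs are pending; the server waits instead of refusing.
- Only transactions on XA-capable engines such as InnoDB increment the counter.
- Only the counter is modeled.

```python
from dataclasses import dataclass


@dataclass
class BinLog:
    """Toy model: only the counter that gates binlog rotation."""
    m_prep_xids: int = 0

    def try_rotate(self) -> bool:
        # Rotation is not allowed while any prepared XA transaction is pending.
        return self.m_prep_xids == 0


def commit(log: BinLog, xa_capable: bool) -> bool:
    """Run FLUSH, SYNC, COMMIT; return True if a rotation could happen
    between FLUSH and SYNC (the window in which the issue occurs)."""
    if xa_capable:
        log.m_prep_xids += 1          # FLUSH stage increments the counter
    rotation_in_window = log.try_rotate()
    # ... SYNC stage would publish binlog_end_pos here ...
    if xa_capable:
        log.m_prep_xids -= 1          # COMMIT stage decrements the counter
    return rotation_in_window
```

For an XA-capable transaction the window is closed. A MyISAM or MEMORY update leaves the counter untouched, so a rotation can still slip in.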
Detailed conditions that cause the issue
Base conditions
- MySQL 5.7 or 8.0 (including Percona Server etc.; Amazon RDS and
Aurora are unknown because their source code is not available;
MariaDB does not have this issue)
- SYNC_BINLOG=1
- Binary logging is enabled
The binlog is rotated at the same time as one of the following actions
(this includes rotation by a FLUSH command):
- Updating tables that do not support XA transactions, such as
MyISAM and MEMORY.
- Executing DDL. (Except atomic DDL in MySQL 8.0.)
(“CREATE TABLE … SELECT …” is not treated as atomic DDL.)
- Executing commands such as ANALYZE TABLE.
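The base conditions can be checked against a server's settings. The Python sketch below uses a hypothetical helper, `matches_base_conditions`. It takes values you would read yourself, for example from `SELECT @@version, @@sync_binlog, @@log_bin`, and encodes only the base conditions listed on this slide. A True result means the server matches those conditions, not that the issue will occur.

```python
def matches_base_conditions(version: str, sync_binlog: int, log_bin: bool) -> bool:
    """Return True if the server matches the base conditions on this slide."""
    if "MariaDB" in version:
        return False  # the slide states MariaDB does not have this issue
    # Compare only "major.minor"; suffixes like "-log" or "-13" are ignored.
    # Amazon RDS / Aurora are treated like MySQL here, though the slide
    # marks them as unknown.
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in ("5.7", "8.0") and sync_binlog == 1 and log_bin
```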
Statements that cause the issue (No.1)
MySQL 5.7 & MySQL 8.0
ANALYZE TABLE, REPAIR TABLE, OPTIMIZE TABLE, FLUSH TABLES,
FLUSH PRIVILEGES, FLUSH ENGINE LOGS, FLUSH ERROR LOGS,
FLUSH GENERAL LOGS, FLUSH HOSTS, FLUSH OPTIMIZER_COSTS,
FLUSH RELAY LOGS, FLUSH SLOW LOGS,
FLUSH STATUS, FLUSH USER_RESOURCES,
ALTER INSTANCE,
CREATE TABLE ... SELECT ...
MySQL 5.7 & MySQL 8.0 (tables that do not support atomic DDL)
CREATE TABLE, DROP TABLE, ALTER TABLE, RENAME TABLE,
TRUNCATE, CREATE INDEX, DROP INDEX
MySQL 5.7 & MySQL 8.0 (tables that do not support XA transactions)
INSERT, UPDATE, DELETE, REPLACE, LOAD DATA
Statements that cause the issue (No.2)
MySQL 5.7
CREATE USER, DROP USER, RENAME USER, ALTER USER,
SET PASSWORD, GRANT, REVOKE (ALL),
CREATE DATABASE, DROP DATABASE, ALTER DATABASE,
CREATE VIEW, DROP VIEW,
CREATE TABLESPACE, DROP TABLESPACE, ALTER TABLESPACE,
CREATE FUNCTION, DROP FUNCTION, ALTER FUNCTION,
CREATE PROCEDURE, DROP PROCEDURE, ALTER PROCEDURE,
CREATE TRIGGER, DROP TRIGGER,
CREATE EVENT, DROP EVENT, ALTER EVENT,
FLUSH DES_KEY_FILE, FLUSH QUERY CACHE
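Before maintenance, a planned statement can be checked against the two statement lists. The Python sketch below is a hypothetical matcher with these limits:
- It encodes only the statements listed on these slides, grouped by the condition each list carries.
- It cannot see table engines, so the atomic-DDL and XA conditions are returned as text for you to judge.
- It matches whole leading keywords, so CREATE TABLESPACE is not mistaken for CREATE TABLE.
- Comments and clauses such as DEFINER=... placed before the object type are not handled.

```python
import re
from typing import List, Optional, Tuple

# Statement prefixes copied from the two slides, grouped by their condition.
GROUPS: List[Tuple[str, List[str]]] = [
    ("5.7 & 8.0", [
        "ANALYZE TABLE", "REPAIR TABLE", "OPTIMIZE TABLE", "FLUSH TABLES",
        "FLUSH PRIVILEGES", "FLUSH ENGINE LOGS", "FLUSH ERROR LOGS",
        "FLUSH GENERAL LOGS", "FLUSH HOSTS", "FLUSH OPTIMIZER_COSTS",
        "FLUSH RELAY LOGS", "FLUSH SLOW LOGS", "FLUSH STATUS",
        "FLUSH USER_RESOURCES", "ALTER INSTANCE",
    ]),
    ("5.7 & 8.0, tables without atomic DDL", [
        "CREATE TABLE", "DROP TABLE", "ALTER TABLE", "RENAME TABLE",
        "TRUNCATE", "CREATE INDEX", "DROP INDEX",
    ]),
    ("5.7 & 8.0, tables without XA support", [
        "INSERT", "UPDATE", "DELETE", "REPLACE", "LOAD DATA",
    ]),
    ("5.7 only", [
        "CREATE USER", "DROP USER", "RENAME USER", "ALTER USER",
        "SET PASSWORD", "GRANT", "REVOKE",
        "CREATE DATABASE", "DROP DATABASE", "ALTER DATABASE",
        "CREATE VIEW", "DROP VIEW",
        "CREATE TABLESPACE", "DROP TABLESPACE", "ALTER TABLESPACE",
        "CREATE FUNCTION", "DROP FUNCTION", "ALTER FUNCTION",
        "CREATE PROCEDURE", "DROP PROCEDURE", "ALTER PROCEDURE",
        "CREATE TRIGGER", "DROP TRIGGER",
        "CREATE EVENT", "DROP EVENT", "ALTER EVENT",
        "FLUSH DES_KEY_FILE", "FLUSH QUERY CACHE",
    ]),
]


def classify(statement: str) -> Optional[str]:
    """Return the slide group a statement falls into, or None if unlisted."""
    words = re.findall(r"[A-Z_]+", statement.upper())
    # CREATE TABLE ... SELECT ... is listed without the atomic-DDL condition.
    if words[:2] == ["CREATE", "TABLE"] and "SELECT" in words:
        return "5.7 & 8.0"
    for group, prefixes in GROUPS:
        for prefix in prefixes:
            tokens = prefix.split()
            if words[:len(tokens)] == tokens:
                return group
    return None
```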
Approaches for recovering from or avoiding the issue
The following approaches can be used.
(Other approaches may also work in some cases.)
1. Execute “FLUSH BINARY LOGS” on the master to rotate the
binlog, then execute “START SLAVE” (or “START SLAVE
IO_THREAD”) on the slaves.
If you get the same error on a slave, execute “FLUSH
BINARY LOGS” and “START SLAVE” again.
2. Keep executing “START SLAVE” (or “START SLAVE IO_THREAD”)
until the error no longer occurs.
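Approaches 1 and 2 can be scripted as a retry loop. This Python sketch deliberately avoids any specific database driver and rests on these assumptions:
- `run_on_master`, `run_on_slave` and `slave_error` are hypothetical callables you would supply, for example thin wrappers around your MySQL client.
- `slave_error` returns the slave's Last_IO_Error text, which is empty when there is no error.
- In practice you would wait a moment after START SLAVE before reading the error.
- The attempt limit is an arbitrary default.

```python
from typing import Callable, Optional


def recover_slave(
    run_on_master: Callable[[str], None],
    run_on_slave: Callable[[str], None],
    slave_error: Callable[[], Optional[str]],
    rotate: bool = True,
    max_attempts: int = 5,
) -> bool:
    """Retry START SLAVE IO_THREAD until error 1236 is gone.

    rotate=True follows approach 1 (FLUSH BINARY LOGS on the master first);
    rotate=False follows approach 2 (only retry START SLAVE IO_THREAD).
    """
    for _ in range(max_attempts):
        if rotate:
            run_on_master("FLUSH BINARY LOGS")
        run_on_slave("START SLAVE IO_THREAD")
        error = slave_error()
        if not error:
            return True   # replication is running again
        if "1236" not in error:
            return False  # a different problem; retrying will not help
    return False          # still failing after max_attempts
```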
Approaches for recovering from or avoiding the issue
The following approaches can be used.
(Other approaches may also work in some cases.)
3. Change “SYNC_BINLOG” to a value other than 1. Note that this
requires dealing with possible data inconsistency if the master fails.
4. If you know in advance that the issue does not occur during
normal operation, execute “FLUSH BINARY LOGS” before
executing DDL etc. during maintenance. Starting the DDL on a freshly
rotated binary log, which is far from max_binlog_size, reduces the
possibility of a binary log rotation during DDL execution.
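Approach 4 amounts to rotating the binlog right before the maintenance statements. Below is a minimal Python sketch, assuming a hypothetical `run_on_master` callable that wraps your MySQL client. It does not remove the race. It only makes it less likely that the binlog reaches max_binlog_size and rotates while the DDL runs.

```python
from typing import Callable, Iterable, List


def run_maintenance(run_on_master: Callable[[str], None],
                    statements: Iterable[str]) -> List[str]:
    """Rotate the binlog, then run the maintenance statements in order.

    Returns the statements in the order they were sent to the master.
    """
    sent: List[str] = []
    # Begin with a freshly rotated (nearly empty) binlog file, so a
    # size-triggered rotation during the DDL is less likely.
    for stmt in ["FLUSH BINARY LOGS", *statements]:
        run_on_master(stmt)
        sent.append(stmt)
    return sent
```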