How to Repair MySQL Replication
About Me
Mughees Ahmed
4 years of experience as an Oracle and MySQL DBA
Currently working with Etisalcom Bahrain
Certified Oracle and MySQL Professional
Created Course on The Ultimate MySQL Replication Crash Course from Zero to Hero
https://mughees.gumroad.com/l/yGqVw
Course is coming on Udemy Soon.
My YouTube Channel https://www.youtube.com/user/mughees52
Blogs https://ittutorial.org/category/mysql/
Linkedin, Twitter @mughees52
Agenda
▪ Types of problems you can face
▪ SQL_SLAVE_SKIP_COUNTER
▪ How it works with ACID-compliant tables (InnoDB)
▪ How it works with non-ACID-compliant tables (MyISAM)
▪ pt-slave-restart
▪ Troubleshooting GTID
▪ Solving duplicate key errors, etc.
▪ Errant GTIDs
▪ Confirming whether there is an errant GTID
▪ Finding the exact errant GTID
▪ Solving the errant GTID
▪ Inserting empty transactions
▪ Removing it from the binlog
If You want to Learn more, You can Follow me on
https://app.gumroad.com/signup?referrer=mughees
Data Drift
▪ A statement was executed on the primary with SET SESSION sql_log_bin = OFF
▪ A statement was executed directly on the replica
▪ Can happen if the replica was not in super_read_only and a user with the SUPER privilege executed it
▪ Can happen if the replica was not in read_only
▪ A statement was executed on a replica, and the replica was later promoted to primary without GTID in place
▪ The primary server was not configured for full ACID compliance and it crashed
▪ At some point, the primary was not configured for row-based replication (even briefly)
▪ More exotic cases can involve bugs, engine differences, or version differences
▪ A few things that help prevent and fix this:
• All replicas run in super_read_only mode
• Verify ACID compliance and, if it is not in use, checksum after any crash or failover event
• Checksum regularly
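The "checksum regularly" advice can be illustrated with a small sketch. In practice a dedicated tool such as pt-table-checksum is normally used for this; the Python below is only a simplified model, and it assumes the rows of one table have already been fetched from the primary and from the replica as lists of tuples. The function names and the sample rows are hypothetical.

```python
import hashlib


def table_checksum(rows):
    """Return an order-independent SHA-256 digest of a table's rows."""
    digest = hashlib.sha256()
    # Sort first, because the two servers may return rows in a different order.
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def has_drifted(primary_rows, replica_rows):
    """True when the replica's copy of the table differs from the primary's."""
    return table_checksum(primary_rows) != table_checksum(replica_rows)


# Hypothetical data: the replica is missing the row with a = 2.
primary = [(1,), (2,), (3,)]
replica = [(1,), (3,)]
print(has_drifted(primary, replica))  # True
```

Hashing whole tables like this only scales to small tables; it is meant to show the idea of comparing digests rather than rows.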
How to Repair MySQL Replication
▪ If you have set up MySQL replication, you probably know this problem: sometimes an invalid MySQL query causes replication to stop working.
▪ Identifying the problem
▪ MySQL error log
▪ SHOW SLAVE STATUS;
Last_Errno: 1146
Last_Error: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'.
Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate
SET thread.views = thread.views + aggregate.views
WHERE thread.threadid = aggregate.threadid'
▪ Repair the MySQL replication
SAMPLE ERROR MESSAGES (FROM SHOW SLAVE STATUS OUTPUT):
▪ Last_SQL_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000304, end_log_pos 285
▪ Last_SQL_Error: Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000304, end_log_pos 492
▪ Last_SQL_Error: Could not execute Delete_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000304, end_log_pos 688
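When many replicas are monitored, it can help to pull the useful fields out of these messages automatically. The following is a minimal sketch, assuming the message has the same layout as the samples above; the regular expression and the function name `parse_last_sql_error` are illustrative and not part of any MySQL tool.

```python
import re

# Matches row-event errors shaped like the sample messages above.
ERROR_PATTERN = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+);"
    r".*?Error_code: (?P<code>\d+);"
    r".*?master log (?P<log>[\w.\-]+), end_log_pos (?P<pos>\d+)"
)


def parse_last_sql_error(message):
    """Return event type, table, error code, binlog file and position, or None."""
    match = ERROR_PATTERN.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["code"] = int(info["code"])
    info["pos"] = int(info["pos"])
    return info


sample = (
    "Could not execute Write_rows event on table test.t1; "
    "Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; the event's master log "
    "mysql-bin.000304, end_log_pos 285"
)
info = parse_last_sql_error(sample)
print(info["event"], info["table"], info["code"], info["log"], info["pos"])
# Write_rows test.t1 1062 mysql-bin.000304 285
```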
First Option
▪ SQL_SLAVE_SKIP_COUNTER
▪ SET GLOBAL sql_slave_skip_counter = N
▪ This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
▪ If you read the documentation closely, its last paragraph also says:
▪ When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.
Master, table z:
+----+
| a |
+----+
| 1 |
| 2 |
| 3 |
+----+
Slave, table z:
+----+
| a |
+----+
| 1 |
| 3 |
+----+
Transaction executed on the master:
BEGIN;
INSERT INTO z SELECT 4;
DELETE FROM z WHERE a = 2;
INSERT INTO z SELECT 5;
COMMIT;
The slave reports error 1032 because record 2 is not found. At this point, many DBAs choose to execute SET GLOBAL sql_slave_skip_counter=1.
However, doing so means the INSERT of record 5 is never executed: after the DELETE of record 2 is skipped, the transaction is not yet over, so the events that follow are skipped as well.
This is exactly what the documentation says: the slave continues to skip events until it reaches the end of the group. Interested readers can test this themselves to see the final result.
What should you do if you only want to skip a single event? Should you just set the parameter slave_exec_mode to IDEMPOTENT?
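The group-skipping rule can be modelled in a few lines. This is a simplified sketch, not MySQL's implementation: it assumes the event list starts at the group where replication stopped, represents each event as a plain string, and uses a hypothetical follow-up transaction (INSERT 6) to show where execution resumes.

```python
def replay(event_groups, skip_counter):
    """Return the events a replica would apply, given a skip counter.

    Simplified model: once any event of a group is skipped, the rest of
    that group is skipped too, even if the counter has reached zero.
    """
    applied = []
    for group in event_groups:
        if skip_counter > 0:
            # Use up the counter; the whole group is lost either way.
            skip_counter = max(0, skip_counter - len(group))
            continue
        applied.extend(group)
    return applied


failed_txn = ["BEGIN", "INSERT 4", "DELETE 2", "INSERT 5", "COMMIT"]
next_txn = ["BEGIN", "INSERT 6", "COMMIT"]  # hypothetical later transaction
print(replay([failed_txn, next_txn], skip_counter=1))
# ['BEGIN', 'INSERT 6', 'COMMIT']
```

With a counter of 1, none of the five events of the failed transaction is applied, which matches the behaviour described above.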
Test Data
On Master
▪ create table repl_innodb(id int primary key, name1 char(10), name2 char(10)) engine=innodb;
▪ create table repl_myisam(id int primary key, name1 char(10), name2 char(10)) engine=myisam;
On Slave:
▪ # Add data from the SLAVE to the test table, not recorded in binlog.
set sql_log_bin = 0;
insert into repl_innodb(id,name1,name2) values(1, ' s1062-1 ', 's1062-1 ');
insert into repl_myisam(id,name1,name2) values(1, ' s1062-1 ', 's1062-1 ');
set sql_log_bin = 1;
Current Data
Replica
mysql> select * from repl_innodb;
+----+----------+---------+
| id | name1 | name2 |
+----+----------+---------+
| 1 | s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
mysql> select * from repl_myisam;
+----+----------+---------+
| id | name1 | name2 |
+----+----------+---------+
| 1 | s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
MASTER
mysql> select * from repl_innodb;
Empty set (0.00 sec)
mysql> select * from repl_myisam;
Empty set (0.00 sec)
Transactional tables
▪ On master:
begin;
insert into repl_innodb(id,name1,name2) values (1, ' m1062-1 ', ' m1062-1 ');
insert into repl_innodb(id,name1,name2) values (2, ' m1062-2 ', ' m1062-2 ');
commit;
mysql> select * from repl_innodb;
+----+----------+----------+
| id | name1 | name2 |
+----+----------+----------+
| 1 | m1062-1 | m1062-1 |
| 2 | m1062-2 | m1062-2 |
+----+----------+----------+
2 rows in set (0.00 sec)
Transactional tables
▪ On replica:
select * from repl_innodb;
+----+----------+---------+
| id | name1 | name2 |
+----+----------+---------+
| 1 | s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
Master_Host: 192.168.70.10
Master_Log_File: binlog.000014
Read_Master_Log_Pos: 593
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 156
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000014, end_log_pos 436
mysql> select * from repl_innodb;
+----+----------+---------+
| id | name1 | name2 |
+----+----------+---------+
| 1 | s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
Transactional tables
mysql> show binary logs;
+---------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000013 | 156 | No |
| binlog.000014 | 179 | No |
| binlog.000015 | 1861 | No |
+---------------+-----------+-----------+
mysqlbinlog -v --base64-output=decode-rows /var/lib/mysql/binlog.000015
Looking into the slave's binlog file, you will not find any entry for this transaction.
Transactional tables
▪ set global sql_slave_skip_counter = 1;
▪ start slave sql_thread;
mysql> select * from repl_innodb;
+----+----------+---------+
| id | name1 | name2 |
+----+----------+---------+
| 1 | s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
The whole transaction was skipped, so the row with id = 2 never reaches the replica.
Non-transactional tables
▪ The master adds data to the non-transactional table:
begin;
insert into repl_myisam(id,name1,name2) values (1, ' m1062-1 ', ' m1062-1 ');
insert into repl_myisam(id,name1,name2) values (2, ' m1062-2 ', ' m1062-2 ');
commit;
mysql> select * from repl_myisam;
+----+----------+----------+
| id | name1 | name2 |
+----+----------+----------+
| 1 | m1062-1 | m1062-1 |
| 2 | m1062-2 | m1062-2 |
+----+----------+----------+
2 rows in set (0.00 sec)
Non-transactional tables
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows event on table test.repl_myisam; Duplicate entry '1' for key 'repl_myisam.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000014, end_log_pos 2113
▪ On slave:
mysql> select * from test.repl_myisam;
+----+----------+---------+
| id | name1 | name2 |
+----+----------+---------+
| 1 | s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
Non-transactional tables
▪ Let's solve the error by skipping the event:
set global sql_slave_skip_counter = 1;
start slave sql_thread;
mysql> select * from repl_myisam;
+----+----------+----------+
| id | name1 | name2 |
+----+----------+----------+
| 1 | s1062-1 | s1062-1 |
| 2 | m1062-2 | m1062-2 |
+----+----------+----------+
2 rows in set (0.00 sec)
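Notice that this time the second row (id = 2) is present on the replica. Why? MyISAM is non-transactional, so the master writes each INSERT to the binary log as its own BEGIN ... COMMIT group, as the binlog extracts that follow show. Skipping one group therefore skips only the first INSERT.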
Non-transactional tables (Master binlog)
BEGIN
/*!*/;
# at 1987
#210706 17:44:15 server id 1 end_log_pos 2055 CRC32 0xd5896c4c Table_map: `test`.`repl_myisam` mapped to number 111
# at 2055
#210706 17:44:15 server id 1 end_log_pos 2113 CRC32 0xadd8d9bd Write_rows: table id 111 flags: STMT_END_F
### INSERT INTO `test`.`repl_myisam`
### SET
### @1=1
### @2=' m1062-1'
### @3=' m1062-1'
# at 2113
#210706 17:44:15 server id 1 end_log_pos 2189 CRC32 0x31c1deaa Query thread_id=47 exec_time=0 error_code=0
SET TIMESTAMP=1625593455/*!*/;
COMMIT
Non-transactional tables (Master binlog)
SET TIMESTAMP=1625593455/*!*/;
BEGIN
/*!*/;
# at 2343
#210706 17:44:15 server id 1 end_log_pos 2411 CRC32 0x54db30b4 Table_map: `test`.`repl_myisam` mapped to number 111
# at 2411
#210706 17:44:15 server id 1 end_log_pos 2469 CRC32 0xe02b1e17 Write_rows: table id 111 flags: STMT_END_F
### INSERT INTO `test`.`repl_myisam`
### SET
### @1=2
### @2=' m1062-2'
### @3=' m1062-2'
# at 2469
#210706 17:44:15 server id 1 end_log_pos 2545 CRC32 0x168f221d Query thread_id=47 exec_time=0 error_code=0
SET TIMESTAMP=1625593455/*!*/;
COMMIT
Non-transactional tables (Slave binlog)
# at 1490
#210706 17:44:15 server id 1 end_log_pos 1560 CRC32 0xb789547b Query thread_id=47 exec_time=308 error_code=0
SET TIMESTAMP=1625593455/*!*/;
BEGIN
/*!*/;
# at 1560
#210706 17:44:15 server id 1 end_log_pos 1628 CRC32 0x3d6a2f01 Table_map: `test`.`repl_myisam` mapped to number 109
# at 1628
#210706 17:44:15 server id 1 end_log_pos 1686 CRC32 0x2872bae4 Write_rows: table id 109 flags: STMT_END_F
### INSERT INTO `test`.`repl_myisam`
### SET
### @1=2
### @2=' m1062-2'
### @3=' m1062-2'
# at 1686
#210706 17:44:15 server id 1 end_log_pos 1753 CRC32 0x83d5d026 Query thread_id=47 exec_time=308 error_code=0
SET TIMESTAMP=1625593455/*!*/;
SET @@session.sql_mode=1168113696/*!*/;
COMMIT
Option 2 pt-slave-restart
▪ Last_SQL_Errno: 1062
▪ Last_SQL_Error: Could not execute Write_rows event on table test1.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000007, end_log_pos 1007
[root@mysql-gtid2 ~]# pt-slave-restart
2021-07-11T19:09:16 mysql-gtid2-relay-bin.000004 934 1062
▪ pt-slave-restart watches one or more MySQL replication slaves and tries to skip statements that cause errors. It polls slaves intelligently with an exponentially varying sleep time.
▪ When using GTID, an empty transaction must be created in order to skip a transaction. If writes are coming from different nodes higher up in the replication tree, it is not possible to know which event from which UUID to skip.
▪ master1 -> slave1 -> slave2: when the transaction that fails on slave2 originated on master1, pass master1's UUID with --master-uuid
▪ pt-slave-restart --master-uuid
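The phrase "exponentially varying sleep time" can be pictured with a generic sketch: poll more often while errors keep appearing and back off while the replica is healthy. The code below is not pt-slave-restart's actual algorithm; the halving and doubling factor, the bounds, and the name `next_sleep` are all assumptions chosen for illustration.

```python
def next_sleep(current, found_error, min_sleep=0.015625, max_sleep=64.0):
    """Halve the sleep after an error was found, double it otherwise.

    The result is clamped to the [min_sleep, max_sleep] range.
    """
    candidate = current / 2 if found_error else current * 2
    return min(max_sleep, max(min_sleep, candidate))


sleep = 1.0
# Hypothetical sequence of polls: two healthy checks, then three errors.
for found_error in [False, False, True, True, True]:
    sleep = next_sleep(sleep, found_error)
    print(sleep)
# prints 2.0, 4.0, 2.0, 1.0, 0.5 on separate lines
```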
Option 3
▪ I am not going to demonstrate this one, as it takes a long time.
▪ This is the last resort: restore (reseed) the replica from a backup of the master.
▪ The procedure is almost the same as setting up a new replica; you only need to reset the slave first.
▪ In case of GTID, you also need to run RESET MASTER on the replica to clear the value of gtid_executed.
TROUBLESHOOTING GTID
show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test1.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000007, end_log_pos 1624
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-187
mysql> show master status;
+---------------+----------+--------------------------------------------+
| File | Position | Executed_Gtid_Set |
+---------------+----------+--------------------------------------------+
| binlog.000007 | 1782 | 02992584-de8e-11eb-98ad-080027b81a94:1-188 |
+---------------+----------+--------------------------------------------+
TROUBLESHOOTING GTID
The replica has executed transactions 1-187 and the failing transaction is 188, so commit an empty transaction with that GTID on the replica:
mysql> SET gtid_next='02992584-de8e-11eb-98ad-080027b81a94:188';
mysql> BEGIN;
mysql> COMMIT;
mysql> SET GTID_NEXT="AUTOMATIC";
mysql> start slave;
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Last_SQL_Errno: 0
Last_SQL_Error:
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188
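The same steps can be generated programmatically. The sketch below assumes the replica applies the master's transactions in order with no gaps, so the failing transaction is the one right after the first executed interval; both function names are hypothetical, and the result should always be confirmed against SHOW SLAVE STATUS before anything is skipped.

```python
def next_gtid_after(executed_gtid_set, source_uuid):
    """Return the GTID that follows the first executed interval of source_uuid."""
    for part in executed_gtid_set.replace("\n", "").split(","):
        uuid, *intervals = part.strip().split(":")
        if uuid == source_uuid and intervals:
            # "1-187" -> 187, and a single number such as "5" -> 5
            last_applied = int(intervals[0].split("-")[-1])
            return f"{uuid}:{last_applied + 1}"
    return f"{source_uuid}:1"


def empty_transaction_statements(gtid):
    """Return the statements that commit an empty transaction under gtid."""
    return [
        f"SET gtid_next='{gtid}';",
        "BEGIN;",
        "COMMIT;",
        "SET gtid_next='AUTOMATIC';",
    ]


master_uuid = "02992584-de8e-11eb-98ad-080027b81a94"
executed = "02992584-de8e-11eb-98ad-080027b81a94:1-187"
failing = next_gtid_after(executed, master_uuid)
print(failing)  # 02992584-de8e-11eb-98ad-080027b81a94:188
for statement in empty_transaction_statements(failing):
    print(statement)
```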
How to detect and solve errant transactions:
▪ An errant transaction is an un-replicated transaction that exists only on a replica
▪ Data is not the same on all nodes
▪ Cluster is no longer in a consistent state
▪ Errant GTID detection:
▪ Compare executed GTID sets between primary node and replica nodes
▪ Replica has more GTIDs than primary => errant GTID
Let's Find the Errant GTID
▪ We need to use two functions:
▪ GTID_SUBSET:
▪ Used to check whether the replica's GTID set is a subset of the master's.
▪ SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>');
▪ GTID_SUBTRACT:
▪ Used to find the exact errant GTID.
▪ SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>');
Current situation
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_UUID: 02992584-de8e-11eb-98ad-080027b81a94
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
GTID subset
mysql> SELECT GTID_SUBSET('02992584-de8e-11eb-98ad-080027b81a94:1-170','02992584-de8e-11eb-98ad-080027b81a94:1-188') AS is_subset;
+-----------+
| is_subset |
+-----------+
| 1 |
+-----------+
Replica GTID set is a subset of primary GTID set: OK. (This example only illustrates the function; in our case the replica is not a subset.)
Check whether the replica GTID set is a subset of the primary GTID set:
mysql> SELECT GTID_SUBSET('02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1','02992584-de8e-11eb-98ad-080027b81a94:1-188') AS is_subset;
+-----------+
| is_subset |
+-----------+
| 0 |
+-----------+
Replica GTID set is NOT a subset of primary GTID set => Errant GTID on replica
Find Errant GTID
SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>');
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
mysql> SELECT GTID_SUBTRACT('02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1','02992584-de8e-11eb-98ad-080027b81a94:1-188') AS errant_gtid;
+----------------------------------------+
| errant_gtid |
+----------------------------------------+
| 9ad0db84-e1b1-11eb-99b5-080027b81a94:1 |
+----------------------------------------+
▪ The result is the errant GTID
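To make the set logic behind the two functions concrete, here is a minimal Python model of them. It is only an illustration of subset and subtraction on GTID sets, not how the server implements GTID_SUBSET and GTID_SUBTRACT; it expands every interval into individual numbers, so it is suitable for small sets only. The function names are hypothetical.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: {1, 2, 3, 4, 5, 7}, uuid2: {3}}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(a, b):
    """Return the GTIDs that are in set a but not in set b."""
    a_set, b_set = parse_gtid_set(a), parse_gtid_set(b)
    diff = {}
    for uuid, numbers in a_set.items():
        remaining = numbers - b_set.get(uuid, set())
        if remaining:
            diff[uuid] = remaining
    return diff


def gtid_subset(a, b):
    """True when every GTID in set a is also in set b."""
    return not gtid_subtract(a, b)


primary = "02992584-de8e-11eb-98ad-080027b81a94:1-188"
replica = (
    "02992584-de8e-11eb-98ad-080027b81a94:1-188,"
    "9ad0db84-e1b1-11eb-99b5-080027b81a94:1"
)
print(gtid_subset(replica, primary))    # False
print(gtid_subtract(replica, primary))  # {'9ad0db84-e1b1-11eb-99b5-080027b81a94': {1}}
```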
[root@mysql-gtid2 ~]# mysqlbinlog --base64-output=DECODE-ROWS --verbose /var/lib/mysql/binlog.000001 | grep 9ad0db84-e1b1-11eb-99b5-080027b81a94:1 -A100
SET @@SESSION.GTID_NEXT= '9ad0db84-e1b1-11eb-99b5-080027b81a94:1'/*!*/;
# at 235
#210711 20:06:33 server id 2 end_log_pos 311 CRC32 0x45140e33 Query thread_id=14 exec_time=0 error_code=0
SET TIMESTAMP=1626033993/*!*/;
.
.
BEGIN
/*!*/;
# at 311
#210711 20:06:33 server id 2 end_log_pos 380 CRC32 0xb7b533ca Table_map: `test1`.`repl_innodb` mapped to number 97
# at 380
#210711 20:06:33 server id 2 end_log_pos 438 CRC32 0x5ef8bb59 Write_rows: table id 97 flags: STMT_END_F
### INSERT INTO `test1`.`repl_innodb`
### SET
### @1=2
### @2='m1062-2'
### @3='Ernt-tran'
# at 438
#210711 20:06:33 server id 2 end_log_pos 469 CRC32 0x696f6881 Xid = 135
COMMIT/*!*/;
Fix errant GTIDs
▪ Possible fixes:
• Insert empty transactions on the other nodes (including the primary)
• Remove the GTIDs from the replica's binlog
• Restore data from the primary or a backup
Insert empty transactions
▪ On all nodes (or only on the primary if replication still works):
▪ In our case replication is working fine, so we will insert the empty transaction on the master
mysql> SET gtid_next='9ad0db84-e1b1-11eb-99b5-080027b81a94:1';
mysql> BEGIN;
mysql> COMMIT;
mysql> SET gtid_next=automatic;
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
If you don't set gtid_next back to automatic:
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK.
Insert empty transactions
▪ If the master is down, insert the empty transaction on all slaves, then promote the most up-to-date slave to master and point the rest of the slaves to the new master.
STOP SLAVE;
SET gtid_next='9ad0db84-e1b1-11eb-99b5-080027b81a94:1';
BEGIN;
COMMIT;
SET gtid_next=automatic;
START SLAVE;
Errant GTID : Remove from binlog:
▪ Insert a new row on the slave to make it inconsistent and create an errant GTID:
insert into repl_innodb(id,name1,name2) values (6, ' m1062-2 ', 'ReGTIDbin');
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_UUID: 02992584-de8e-11eb-98ad-080027b81a94
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1-2
Errant GTID : Remove from binlog:
▪ On the primary:
mysql> SELECT @@GLOBAL.gtid_executed;
+--------------------------------------------------------------------+
| @@GLOBAL.gtid_executed |
+--------------------------------------------------------------------+
| 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1 |
+--------------------------------------------------------------------+
Errant GTID : Remove from binlog:
▪ On replica:
mysql> STOP SLAVE;
mysql> RESET MASTER;
# RESET MASTER purges the binlogs on the replica and resets gtid_executed to ''
mysql> SELECT @@GLOBAL.gtid_executed;
+------------------------+
| @@GLOBAL.gtid_executed |
+------------------------+
| |
+------------------------+
mysql> SET GLOBAL GTID_PURGED="02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1";
mysql> START SLAVE;
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Thank you
Q&A

Replication Troubleshooting in Classic VS GTID

  • 1. How to Repair MySQL Replication
  • 2. About Me Mughees Ahmed 4 years of experience as an Oracle and MySQL DBA Currently working with Etisalcom Bahrain Certified Oracle and MySQL Professional Created Course on The Ultimate MySQL Replication Crash Course from Zero to Hero https://mughees.gumroad.com/l/yGqVw Course is coming on Udemy Soon. My YouTube Channel https://www.youtube.com/user/mughees52 Blogs https://ittutorial.org/category/mysql/ Linkedin, Twitter @mughees52
  • 3. Agenda  Types of problem you can face.  SQL_SLAVE_SKIP_COUNTER  How it works in ACID compliance Table (Innodb)  How it works in NON-ACID compliance table (MyISAM)  pt-slave-restart  TROUBLESHOOTING GTID  Solving duplicate key error etc.  Errant GTID  Confirm if there is any Errant GTID  Finding the exact Errant GTID  Solving the Errant GTID  Insert empty transactions  Remove from Binlog If You want to Learn more, You can Follow me on https://app.gumroad.com/signup?referrer=mughees
  • 4. Data Drift  A statement is executed on a primary with: SET SESSION sql_log_bin = OFF  A statement was executed directly on the replica  Can happen if the replica was not in super_read_only and a Super user executed  Can happen if the replica was not in read_only  A statement was executed on a replica and the replica was later promoted to a primary without GTID in place  A primary server is not configured for full ACID compliance and it crashed  At some point, the primary was not configured for row-based replication (even briefly)  More exotic cases can involve bugs, engine differences, version differences  few things to prevent and fix this: • All replicas run in super_read_only mode • Verify ACID compliance and if not using it, checksum after any crash or failover event • Checksum regularly
  • 5. How to Repair MySQL Replication  If you have set up MySQL replication, you probably know this problem: sometimes there are invalid MySQL queries which cause the replication to not work anymore.  Identifying the Problem  MySQL Error Log  Show slave status; Last_Errno: 1146 Last_Error: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb’. Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate SET thread.views = thread.views + aggregate.views WHERE thread.threadid = aggregate.threadid’  Repair the MySQL Replication
  • 6. SAMPLE ERROR MESSAGES (FROM SHOW SLAVE STATUS OUTPUT):  Last_SQL_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000304, end_log_pos 285  Last_SQL_Error: Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql- bin.000304, end_log_pos 492  Last_SQL_Error: Could not execute Delete_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql- bin.000304, end_log_pos 688
  • 7. First Option  SQL_SLAVE_SKIP_COUNTER  SET GLOBAL sql_slave_skip_counter = N  This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.  If you look closely at the document, the last paragraph of the document also says:  When you use SET GLOBAL sql_slave_skip_counter to skip events and the results in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.
  • 8. Master and Slave: Table (Z)
      Master:
      +----+
      | a  |
      +----+
      |  1 |
      |  2 |
      |  3 |
      +----+
      Slave:
      +----+
      | a  |
      +----+
      |  1 |
      |  3 |
      +----+
      Transaction executed on the master:
      BEGIN;
      INSERT INTO z SELECT 4;
      DELETE FROM z WHERE a = 2;
      INSERT INTO z SELECT 5;
      COMMIT;
      The slave reports error 1032 because record 2 is not found. At this point, many DBAs choose to execute SET GLOBAL sql_slave_skip_counter=1. However, this also causes the INSERT of record 5 not to be executed: after the DELETE of record 2 is skipped, the transaction is not over, so the following events continue to be skipped. This is what the documentation means by "the slave continues to skip events until it reaches the end of the group". You can test this yourself to see the final result.
      What should I do if I just want to skip a single event? Should we just set the parameter slave_exec_mode to IDEMPOTENT?
  • 9. Test Data
      On Master:
      ▪ create table repl_innodb(id int primary key, name1 char(10), name2 char(10)) engine=innodb;
      ▪ create table repl_myisam(id int primary key, name1 char(10), name2 char(10)) engine=myisam;
      On Slave:
      ▪ # Add data on the SLAVE to the test tables, not recorded in the binlog.
        set sql_log_bin = 0;
        insert into repl_innodb(id,name1,name2) values (1, ' s1062-1 ', 's1062-1 ');
        insert into repl_myisam(id,name1,name2) values (1, ' s1062-1 ', 's1062-1 ');
        set sql_log_bin = 1;
  • 10. Current Data
      Replica:
      mysql> select * from repl_innodb;
      +----+----------+---------+
      | id | name1    | name2   |
      +----+----------+---------+
      |  1 |  s1062-1 | s1062-1 |
      +----+----------+---------+
      1 row in set (0.00 sec)
      mysql> select * from repl_myisam;
      +----+----------+---------+
      | id | name1    | name2   |
      +----+----------+---------+
      |  1 |  s1062-1 | s1062-1 |
      +----+----------+---------+
      1 row in set (0.00 sec)
      Master:
      mysql> select * from repl_innodb;
      Empty set (0.00 sec)
      mysql> select * from repl_myisam;
      Empty set (0.00 sec)
  • 11. Transactional tables
      ▪ On master:
      begin;
      insert into repl_innodb(id,name1,name2) values (1, ' m1062-1 ', ' m1062-1 ');
      insert into repl_innodb(id,name1,name2) values (2, ' m1062-2 ', ' m1062-2 ');
      commit;
      mysql> select * from repl_innodb;
      +----+----------+----------+
      | id | name1    | name2    |
      +----+----------+----------+
      |  1 |  m1062-1 |  m1062-1 |
      |  2 |  m1062-2 |  m1062-2 |
      +----+----------+----------+
      2 rows in set (0.00 sec)
  • 12. Transactional tables
      ▪ On Replica:
      mysql> select * from repl_innodb;
      +----+----------+---------+
      | id | name1    | name2   |
      +----+----------+---------+
      |  1 |  s1062-1 | s1062-1 |
      +----+----------+---------+
      1 row in set (0.00 sec)
      Replication status:
      Master_Host: 192.168.70.10
      Master_Log_File: binlog.000014
      Read_Master_Log_Pos: 593
      Slave_IO_Running: Yes
      Slave_SQL_Running: No
      Exec_Master_Log_Pos: 156
      Last_Errno: 1062
      Last_Error: Could not execute Write_rows event on table test.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000014, end_log_pos 436
  • 13. Transactional tables
      mysql> show binary logs;
      +---------------+-----------+-----------+
      | Log_name      | File_size | Encrypted |
      +---------------+-----------+-----------+
      | binlog.000013 |       156 | No        |
      | binlog.000014 |       179 | No        |
      | binlog.000015 |      1861 | No        |
      +---------------+-----------+-----------+
      mysqlbinlog -v --base64-output=decode-rows /var/lib/mysql/binlog.000015
      Looking into the slave's binlog file, you will not find any entry for this transaction.
  • 14. Transactional tables
      ▪ set global sql_slave_skip_counter = 1;
      ▪ start slave sql_thread;
      mysql> select * from repl_innodb;
      +----+----------+---------+
      | id | name1    | name2   |
      +----+----------+---------+
      |  1 |  s1062-1 | s1062-1 |
      +----+----------+---------+
      1 row in set (0.00 sec)
  • 15. Non-transactional tables
      ▪ The master adds data to the non-transactional table:
      begin;
      insert into repl_myisam(id,name1,name2) values (1, ' m1062-1 ', ' m1062-1 ');
      insert into repl_myisam(id,name1,name2) values (2, ' m1062-2 ', ' m1062-2 ');
      commit;
      mysql> select * from repl_myisam;
      +----+----------+----------+
      | id | name1    | name2    |
      +----+----------+----------+
      |  1 |  m1062-1 |  m1062-1 |
      |  2 |  m1062-2 |  m1062-2 |
      +----+----------+----------+
      2 rows in set (0.00 sec)
  • 16. Non-transactional tables
      mysql> show slave status\G
      Slave_IO_Running: Yes
      Slave_SQL_Running: No
      Last_SQL_Errno: 1062
      Last_SQL_Error: Could not execute Write_rows event on table test.repl_myisam; Duplicate entry '1' for key 'repl_myisam.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000014, end_log_pos 2113
      ▪ On slave:
      mysql> select * from test.repl_myisam;
      +----+----------+---------+
      | id | name1    | name2   |
      +----+----------+---------+
      |  1 |  s1062-1 | s1062-1 |
      +----+----------+---------+
      1 row in set (0.00 sec)
  • 17. Non-transactional tables
      ▪ Let's solve the error by skipping the event:
      set global sql_slave_skip_counter = 1;
      start slave sql_thread;
      mysql> select * from repl_myisam;
      +----+----------+----------+
      | id | name1    | name2    |
      +----+----------+----------+
      |  1 |  s1062-1 | s1062-1  |
      |  2 |  m1062-2 |  m1062-2 |
      +----+----------+----------+
      2 rows in set (0.00 sec)
      Notice that the second record is now present on the slave. Why?
  • 18. Non-transactional tables (Master binlog)
      BEGIN
      /*!*/;
      # at 1987
      #210706 17:44:15 server id 1 end_log_pos 2055 CRC32 0xd5896c4c Table_map: `test`.`repl_myisam` mapped to number 111
      # at 2055
      #210706 17:44:15 server id 1 end_log_pos 2113 CRC32 0xadd8d9bd Write_rows: table id 111 flags: STMT_END_F
      ### INSERT INTO `test`.`repl_myisam`
      ### SET
      ###   @1=1
      ###   @2=' m1062-1'
      ###   @3=' m1062-1'
      # at 2113
      #210706 17:44:15 server id 1 end_log_pos 2189 CRC32 0x31c1deaa Query thread_id=47 exec_time=0 error_code=0
      SET TIMESTAMP=1625593455/*!*/;
      COMMIT
  • 19. Non-transactional tables (Master binlog)
      SET TIMESTAMP=1625593455/*!*/;
      BEGIN
      /*!*/;
      # at 2343
      #210706 17:44:15 server id 1 end_log_pos 2411 CRC32 0x54db30b4 Table_map: `test`.`repl_myisam` mapped to number 111
      # at 2411
      #210706 17:44:15 server id 1 end_log_pos 2469 CRC32 0xe02b1e17 Write_rows: table id 111 flags: STMT_END_F
      ### INSERT INTO `test`.`repl_myisam`
      ### SET
      ###   @1=2
      ###   @2=' m1062-2'
      ###   @3=' m1062-2'
      # at 2469
      #210706 17:44:15 server id 1 end_log_pos 2545 CRC32 0x168f221d Query thread_id=47 exec_time=0 error_code=0
      SET TIMESTAMP=1625593455/*!*/;
      COMMIT
  • 20. Non-transactional tables (Slave binlog)
      # at 1490
      #210706 17:44:15 server id 1 end_log_pos 1560 CRC32 0xb789547b Query thread_id=47 exec_time=308 error_code=0
      SET TIMESTAMP=1625593455/*!*/;
      BEGIN
      /*!*/;
      # at 1560
      #210706 17:44:15 server id 1 end_log_pos 1628 CRC32 0x3d6a2f01 Table_map: `test`.`repl_myisam` mapped to number 109
      # at 1628
      #210706 17:44:15 server id 1 end_log_pos 1686 CRC32 0x2872bae4 Write_rows: table id 109 flags: STMT_END_F
      ### INSERT INTO `test`.`repl_myisam`
      ### SET
      ###   @1=2
      ###   @2=' m1062-2'
      ###   @3=' m1062-2'
      # at 1686
      #210706 17:44:15 server id 1 end_log_pos 1753 CRC32 0x83d5d026 Query thread_id=47 exec_time=308 error_code=0
      SET TIMESTAMP=1625593455/*!*/;
      SET @@session.sql_mode=1168113696/*!*/;
      COMMIT
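The different outcomes for InnoDB and MyISAM come down to how events are grouped in the master binlog: the InnoDB transaction is one event group, while each MyISAM insert is written as its own BEGIN ... COMMIT group, as the two master binlog excerpts show. The toy model below is only a sketch of the skip-counter rule, not MySQL code. It assumes the replica restarts at the beginning of the failed group, and it simplifies the event lists (Table_map events are left out); the function name events_applied_after_skip is hypothetical.

```python
def events_applied_after_skip(event_groups, skip_counter):
    """Return the events a replica applies after skipping `skip_counter` events.

    Toy model of the documented rule: the counter drops by one per skipped
    event, and if it reaches zero in the middle of a group, skipping still
    continues until the end of that group.
    """
    applied = []
    for group in event_groups:
        if skip_counter > 0:
            # The whole group is skipped, even if the counter runs out midway.
            skip_counter = max(0, skip_counter - len(group))
            continue
        applied.extend(group)
    return applied


# InnoDB: both inserts sit inside a single transaction (one event group).
innodb_binlog = [["BEGIN", "Write_rows id=1", "Write_rows id=2", "COMMIT"]]

# MyISAM: each insert is logged as its own event group.
myisam_binlog = [
    ["BEGIN", "Write_rows id=1", "COMMIT"],
    ["BEGIN", "Write_rows id=2", "COMMIT"],
]

if __name__ == "__main__":
    print(events_applied_after_skip(innodb_binlog, 1))
    print(events_applied_after_skip(myisam_binlog, 1))
```

In this model the InnoDB case applies nothing, which matches the replica still holding only its own row 1, and the MyISAM case applies only the second group, which matches row 2 appearing on the replica.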
  • 21. Option 2: pt-slave-restart
      ▪ Last_SQL_Errno: 1062
      ▪ Last_SQL_Error: Could not execute Write_rows event on table test1.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000007, end_log_pos 1007
      [root@mysql-gtid2 ~]# pt-slave-restart
      2021-07-11T19:09:16 mysql-gtid2-relay-bin.000004 934 1062
      ▪ pt-slave-restart watches one or more MySQL replication slaves and tries to skip statements that cause errors. It polls slaves intelligently with an exponentially varying sleep time.
      ▪ When using GTID, an empty transaction should be created in order to skip the failing one. If writes are coming from different nodes in the replication tree above, it is not possible to know which event from which UUID to skip.
      ▪ For a chain such as master1 -> slave1 -> slave2, pass the UUID of the originating server with pt-slave-restart --master-uuid.
  • 22. Option 3
      ▪ I am not going to show this one because it takes a long time.
      ▪ This is the last option: restore/reseed the replica from a backup of the master.
      ▪ The procedure is almost the same as the initial setup; you only need to reset the slave.
      ▪ In case of GTID, you need to run RESET MASTER as well to clear the value of gtid_executed.
  • 23. TROUBLESHOOTING GTID
      mysql> show slave status\G
      Slave_IO_Running: Yes
      Slave_SQL_Running: No
      Last_Errno: 1062
      Last_Error: Could not execute Write_rows event on table test1.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000007, end_log_pos 1624
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-187
      mysql> show master status;
      +---------------+----------+--------------------------------------------+
      | File          | Position | Executed_Gtid_Set                          |
      +---------------+----------+--------------------------------------------+
      | binlog.000007 |     1782 | 02992584-de8e-11eb-98ad-080027b81a94:1-188 |
      +---------------+----------+--------------------------------------------+
  • 24. TROUBLESHOOTING GTID
      mysql> SET gtid_next='02992584-de8e-11eb-98ad-080027b81a94:188';
      mysql> BEGIN;
      mysql> COMMIT;
      mysql> SET GTID_NEXT="AUTOMATIC";
      mysql> start slave;
      mysql> show slave status\G
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
      Last_SQL_Errno: 0
      Last_SQL_Error:
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188
  • 25. How to detect and solve errant transactions
      ▪ An errant transaction is an un-replicated transaction that exists only on a replica.
      ▪ Data is not the same on all nodes.
      ▪ The cluster is no longer in a consistent state.
      ▪ Errant GTID detection:
          • Compare the executed GTID sets of the primary node and the replica nodes.
          • Replica has more GTIDs than the primary => errant GTID.
  • 26. Let's Find the Errant GTID
      ▪ We need to use two functions:
      ▪ GTID_SUBSET:
          • Used to check whether the replica's GTID set is a subset of the master's.
          • SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>');
      ▪ GTID_SUBTRACT:
          • Used to find the exact errant GTID.
          • SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>');
  • 27. Current situation
      mysql> show slave status\G
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Master_UUID: 02992584-de8e-11eb-98ad-080027b81a94
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1
  • 28. GTID subset
      Example where the replica GTID set is a subset of the primary GTID set (shown only for illustration; this is not our case):
      mysql> SELECT GTID_SUBSET('02992584-de8e-11eb-98ad-080027b81a94:1-170','02992584-de8e-11eb-98ad-080027b81a94:1-188') AS is_subset;
      +-----------+
      | is_subset |
      +-----------+
      |         1 |
      +-----------+
      Replica GTID set is a subset of primary GTID set: OK
      Check whether our replica GTID set is a subset of the primary GTID set:
      mysql> SELECT GTID_SUBSET('02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1','02992584-de8e-11eb-98ad-080027b81a94:1-188') AS is_subset;
      +-----------+
      | is_subset |
      +-----------+
      |         0 |
      +-----------+
      Replica GTID set is NOT a subset of primary GTID set => errant GTID on replica
  • 29. Find Errant GTID
      SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>');
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1
      mysql> SELECT GTID_SUBTRACT('02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1','02992584-de8e-11eb-98ad-080027b81a94:1-188') AS errant_gtid;
      +----------------------------------------+
      | errant_gtid                            |
      +----------------------------------------+
      | 9ad0db84-e1b1-11eb-99b5-080027b81a94:1 |
      +----------------------------------------+
      ▪ The result is the errant GTID.
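To make the set logic behind GTID_SUBSET and GTID_SUBTRACT concrete, here is a small pure-Python model of the two checks. It is a sketch for understanding only, not how the server implements them: it expands every interval into individual numbers, so it is suitable for small sets like the ones on these slides. The function names are chosen for this illustration.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '7' is the interval 7-7 (inclusive).
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(set1, set2):
    """GTIDs that are in set1 but not in set2, as {uuid: set of numbers}."""
    first, second = parse_gtid_set(set1), parse_gtid_set(set2)
    diff = {uuid: nums - second.get(uuid, set()) for uuid, nums in first.items()}
    return {uuid: nums for uuid, nums in diff.items() if nums}


def gtid_subset(set1, set2):
    """True if every GTID in set1 is also in set2."""
    return not gtid_subtract(set1, set2)


def format_gtid_set(gtids):
    """Render {uuid: set of numbers} back into 'uuid:1-5:7' notation."""
    parts = []
    for uuid in sorted(gtids):
        nums = sorted(gtids[uuid])
        intervals, start, prev = [], nums[0], nums[0]
        for n in nums[1:]:
            if n != prev + 1:
                intervals.append((start, prev))
                start = n
            prev = n
        intervals.append((start, prev))
        text = ":".join(str(s) if s == e else f"{s}-{e}" for s, e in intervals)
        parts.append(f"{uuid}:{text}")
    return ",".join(parts)


if __name__ == "__main__":
    replica = ("02992584-de8e-11eb-98ad-080027b81a94:1-188,"
               "9ad0db84-e1b1-11eb-99b5-080027b81a94:1")
    primary = "02992584-de8e-11eb-98ad-080027b81a94:1-188"
    print(gtid_subset(replica, primary))
    print(format_gtid_set(gtid_subtract(replica, primary)))
```

With the replica and primary sets from the slides, the subset check is false and the difference is the single GTID from the replica's own UUID, the same answer the two SQL functions gave.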
  • 30. [root@mysql-gtid2 ~]# mysqlbinlog --base64-output=DECODE-ROWS --verbose /var/lib/mysql/binlog.000001 | grep 9ad0db84-e1b1-11eb-99b5-080027b81a94:1 -A100
      SET @@SESSION.GTID_NEXT= '9ad0db84-e1b1-11eb-99b5-080027b81a94:1'/*!*/;
      # at 235
      #210711 20:06:33 server id 2 end_log_pos 311 CRC32 0x45140e33 Query thread_id=14 exec_time=0 error_code=0
      SET TIMESTAMP=1626033993/*!*/;
      .
      .
      BEGIN
      /*!*/;
      # at 311
      #210711 20:06:33 server id 2 end_log_pos 380 CRC32 0xb7b533ca Table_map: `test1`.`repl_innodb` mapped to number 97
      # at 380
      #210711 20:06:33 server id 2 end_log_pos 438 CRC32 0x5ef8bb59 Write_rows: table id 97 flags: STMT_END_F
      ### INSERT INTO `test1`.`repl_innodb`
      ### SET
      ###   @1=2
      ###   @2='m1062-2'
      ###   @3='Ernt-tran'
      # at 438
      #210711 20:06:33 server id 2 end_log_pos 469 CRC32 0x696f6881 Xid = 135
      COMMIT/*!*/;
  • 31. Fix errant GTIDs
      ▪ Possible fixes:
          • Insert empty transactions on the other nodes (including the primary)
          • Remove the GTIDs from the replica binlog
          • Restore data from the primary/backup
  • 32. Insert empty transactions
      ▪ On all nodes (or only on the primary if replication still works).
      ▪ In our case replication is working fine, so we insert the empty transaction on the master:
      mysql> SET gtid_next='9ad0db84-e1b1-11eb-99b5-080027b81a94:1';
      mysql> BEGIN;
      mysql> COMMIT;
      mysql> SET gtid_next=automatic;
      Replica status afterwards:
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1
      If you don't set gtid_next back to automatic:
      ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK.
  • 33. Insert empty transactions
      ▪ What if the master is down? Then we insert the empty transaction on all slaves, promote the most up-to-date slave to master, and point the rest of the slaves to the new master.
      STOP SLAVE;
      SET gtid_next='9ad0db84-e1b1-11eb-99b5-080027b81a94:1';
      BEGIN;
      COMMIT;
      SET gtid_next=automatic;
      START SLAVE;
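When more than one errant GTID has to be neutralised, the same SET gtid_next / BEGIN / COMMIT sequence is repeated once per GTID, followed by a single reset to AUTOMATIC. The helper below is a minimal sketch that only builds the statement text for review; it does not connect to MySQL, and the function name empty_transaction_statements is invented for this example.

```python
def empty_transaction_statements(gtids):
    """Build the SQL statements that inject one empty transaction per GTID.

    `gtids` is a list of single GTIDs such as 'uuid:1' (not interval sets).
    """
    statements = []
    for gtid in gtids:
        statements += [f"SET gtid_next='{gtid}';", "BEGIN;", "COMMIT;"]
    if statements:
        # gtid_next must be set back, otherwise the session raises ERROR 1837.
        statements.append("SET gtid_next='AUTOMATIC';")
    return statements


if __name__ == "__main__":
    for line in empty_transaction_statements(
        ["9ad0db84-e1b1-11eb-99b5-080027b81a94:1"]
    ):
        print(line)
```

Printing the statements first and running them by hand keeps the operator in control of which node receives the empty transactions.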
  • 34. Errant GTID : Remove from binlog:
      ▪ Insert a new row on the slave to make it inconsistent and create an errant GTID:
      insert into repl_innodb(id,name1,name2) values (6, ' m1062-2 ', 'ReGTIDbin');
      mysql> show slave status\G
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Master_UUID: 02992584-de8e-11eb-98ad-080027b81a94
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1-2
  • 35. Errant GTID : Remove from binlog:
      ▪ On the primary:
      mysql> SELECT @@GLOBAL.gtid_executed;
      +--------------------------------------------------------------------+
      | @@GLOBAL.gtid_executed                                             |
      +--------------------------------------------------------------------+
      | 02992584-de8e-11eb-98ad-080027b81a94:1-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1 |
      +--------------------------------------------------------------------+
  • 36. Errant GTID : Remove from binlog:
      ▪ On Replica:
      mysql> STOP SLAVE;
      mysql> RESET MASTER;
      # RESET MASTER purges the binlogs on the replica and resets gtid_executed to ''
      mysql> SELECT @@GLOBAL.gtid_executed;
      +------------------------+
      | @@GLOBAL.gtid_executed |
      +------------------------+
      |                        |
      +------------------------+
      mysql> SET GLOBAL GTID_PURGED="02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1";
      mysql> START SLAVE;
      mysql> show slave status\G
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1
      Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
      9ad0db84-e1b1-11eb-99b5-080027b81a94:1