MySQL Group Replication Got fatal error 1236

2017年03月13日 14:30:03 ZzzCrazyPig 阅读数：2227

问题产生的场景

原先配置了MGR集群，从节点退出集群(stop group_replication)，隔了很长一段时间想要加进来，这段时间内，主节点进行很多操作，并且之前的binlog被purged掉

问题产生原因分析

MGR主节点的binary log被删除掉(purge或者定期删除掉，人为地在硬盘上执行rm删除不在此考虑范围), MGR从节点无法读取到这些binary log, 尝试多次recovery(找donor进行recovery)后失败，从replication group中退出，从节点的报错信息大致如下所示:

2017-03-13T10:00:26.358681+08:00 355 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2017-03-13T10:00:26.358720+08:00 355 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
...

 
         x 
        
2017-03-13T10:00:26.358681+08:00 355 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2017-03-13T10:00:26.358720+08:00 355 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
...

解决思路

从MGR主节点利用mysqldump dump出数据，包括gtid_purged，在从节点上应用

解决步骤

mysqldump --all-databases --set-gtid-purged=ON --single-transaction -uroot -S"/mysql_data/data/mgrtest1/mysql_mgrtest1.sock" -P24801 -p > /mysql_data/data/mgrtest1_alldb.sql

 
mysqldump --all-databases --set-gtid-purged=ON --single-transaction -uroot -S"/mysql_data/data/mgrtest1/mysql_mgrtest1.sock" -P24801 -p > /mysql_data/data/mgrtest1_alldb.sql

输入密码回车后，提示:

Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events. 
mysqldump: Couldn't execute 'SAVEPOINT sp': The MySQL server is running with the --transaction-write-set-extraction!=OFF option so it cannot execute this statement (1290)

 
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events. 
mysqldump: Couldn't execute 'SAVEPOINT sp': The MySQL server is running with the --transaction-write-set-extraction!=OFF option so it cannot execute this statement (1290)

关联到mysql bug: https://bugs.mysql.com/bug.php?id=81494

根据提示信息，当节点运行在group replication模式下，不支持savepoint，而mysqldump中需要dump出这条语句，然后dump出set @@gitd_purged=XXX 这个语句

根据mysql bug链接提供的解决步骤，在主节点上执行:

mysql> set global transaction_write_set_extraction=OFF;
ERROR 3093 (HY000): The write set algorithm cannot be changed when Group replication is running.

 
mysql> set global transaction_write_set_extraction=OFF;
ERROR 3093 (HY000): The write set algorithm cannot be changed when Group replication is running.

主节点需要关闭group_replication(此时如果有其他ONLINE从节点，选择其他ONLINE节点进行mysqldump应该是更好的选择)

若stop的是主节点，请确保其他ONLINE从节点是否需要提前从集群中解除，如果需要，先stop所有从节点，再stop主节点

mysql> stop group_replication;
Query OK, 0 rows affected (8.74 sec)

mysql> set global transaction_write_set_extraction=OFF;
Query OK, 0 rows affected (0.00 sec)

 
mysql> stop group_replication;
Query OK, 0 rows affected (8.74 sec)
mysql> set global transaction_write_set_extraction=OFF;
Query OK, 0 rows affected (0.00 sec)

再次执行mysqldump命令即可成功dump出数据

将dump出来的文件拷贝到从节点所在机器，登陆从节点，执行:

reset master;
source ${your_sql_file}

 
reset master;
source ${your_sql_file}

恢复主节点:

mysql> set global transaction_write_set_extraction=XXHASH64;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.06 sec)

mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

 
mysql> set global transaction_write_set_extraction=XXHASH64;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.01 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.06 sec)
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

最后，在从节点上执行:

start group_replication;

 
start group_replication;

到这里从节点应该能够正常加入了

参考

引用 http://lefred.be/content/mysql-group-replication-limitations-savepoints/，评论部分如下描述：

Savepoints are also used when executing mysqldump with –single-transaction option.
http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_single-transaction

That means if you want to do a consistent non blocking mysqldump on a node of the group you need to put the node out of the cluster first.

e.g.
mysql> — Put the node out of the Group Replication cluster
mysql> STOP group_replication;
mysql> SET GLOBAL transaction_write_set_extraction=OFF;

$ # Dump the entire node (instance)
$ mysqldump –all-databases –triggers –routines –events –single-transaction > /mysqldump/dump_YYYYMMDD.sql

mysql> — Bring back the node into the cluster
mysql> SET GLOBAL transaction_write_set_extraction=XXHASH64;
mysql> START group_replication;

 
     
   
 
         x 
        

             
         
Savepoints are also used when executing mysqldump with –single-transaction option.
http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_single-transaction
That means if you want to do a consistent non blocking mysqldump on a node of the group you need to put the node out of the cluster first.
e.g.
mysql> — Put the node out of the Group Replication cluster
mysql> STOP group_replication;
mysql> SET GLOBAL transaction_write_set_extraction=OFF;
$ # Dump the entire node (instance)
$ mysqldump –all-databases –triggers –routines –events –single-transaction > /mysqldump/dump_YYYYMMDD.sql
mysql> — Bring back the node into the cluster
mysql> SET GLOBAL transaction_write_set_extraction=XXHASH64;
mysql> START group_replication;

来自为知笔记(Wiz)

027_【重要#集群恢复步骤】MySQL Group Replication Got fatal error 1236 - CrazyPig的技术博客 - CSDN博客