只读实例(slave主从)延迟排查


实为吾之愚见,望诸君酌之!闻过则喜,与君共勉 


以下方法适用于实时查看只读延迟(主从延迟),即需要在延迟发生的时候查看才能确认问题,历史延迟不适用,以下环境已经开启并行复制。

第一章:如何判断无延迟

第一步:主实例信息获取

主实例(主节点)高权限账户执行:

1,show processlist

2451b6268c7a72110d1461870ca8be4ec8ecb3bc
2,show master status\G

7ede228c8ad41b85eb1d3d02291756d7c7264c0f

3,show variables like ‘%uuid%’;

4302cee140dacf133cd2c4b7c993c9ab8f37e3b8

扫描二维码关注公众号,回复: 5395568 查看本文章

第二步:只读实例信息获取

只读实例(slave从库)高权限账户执行:

1, show processlist

aafb4079462fe04a04c17a92c960db89aed7c20f

2, show slave status\G

mysql>show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.151.83.26

                  Master_User: replicator

                  Master_Port: 3106

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000379

          Read_Master_Log_Pos: 53064

               Relay_Log_File: slave-relay.000375

                Relay_Log_Pos: 53237

        Relay_Master_Log_File: mysql-bin.000379

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 53064

              Relay_Log_Space: 53527

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 999098802

                  Master_UUID: a23bf85f-f20b-11e8-8268-7cd30abda240

             Master_Info_File: mysql.slave_master_info

                    SQL_Delay: 0

          SQL_Remaining_Delay:

      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

           Master_Retry_Count: 86400

                  Master_Bind:

      Last_IO_Error_Timestamp:

     Last_SQL_Error_Timestamp:

          Last_SQL_Error_Gtid:

               Master_SSL_Crl:

           Master_SSL_Crlpath:

           Retrieved_Gtid_Set: a23bf85f-f20b-11e8-8268-7cd30abda240:391216-838127

            Executed_Gtid_Set: 937ca3af-f20b-11e8-9447-7cd30ab8a5d8:1-178841,

a23bf85f-f20b-11e8-8268-7cd30abda240:1-838127

                Auto_Position: 1

         Replicate_Rewrite_DB:

                 Channel_Name:

           Master_TLS_Version:

3,show variables like ‘%uuid%’;

e1992786156498a06ab6a65c9e56b639bcff3dcd

第三步:对比确认

1, 确认主实例的进程状态是否为Master has sent all binlog to slave; waiting for more updates(图片红框标识处),如果是该状态,说明master已经把所有的binlog发送给了slave,主实例正常

2, 确认只读实例(slave从库)的进程状态是否为:Waiting for master to send event以及

Slave has read all relay log; waiting for more updates,如果是改状态,说明slave以及应用了所有master发送的binlog,正在等待master继续发送

3, 对比只读实例执行show slave status\G得到的Executed_Gtid_Set与主实例执行show master status\G得到的Executed_Gtid_Set是否一致或相近,如果一致,则可以说明主实例执行的事务,只读实例也已经执行。

通过如上3步的对比,可以确认主实例与只读实例(slave从库)无延迟

第二章 延迟原因以及解决

通过第一章的结论,可以得出只读实例无延迟,当出现只读延迟的时候,应该如何确认原因以及解决,请参考下面的具体例子:

2.1:大批量更新

2.1.1复现方法:

       主实例执行大批量写入等更新操作: insert into MOCK_DATA3 select * from aaaaa.MOCK_DATA,其中MOCK_DATA表数量1000w行,导致延迟出现

2.1.2延迟出现:

show processlist结果:

21     system user                 Connect  41786     Waiting for master to send event      

22     system user                 Connect         Waiting for Slave Workers to free pending events  

23     system user                 Connect  236  Executing event   

24     system user                 Connect  571  Waiting for an event from Coordinator    

25     system user                 Connect  41786     Waiting for an event from Coordinator    

26     system user                 Connect  41786     Waiting for an event from Coordinator    

27     system user                 Connect  41786     Waiting for an event from Coordinator    

28     system user                 Connect  41786     Waiting for an event from Coordinator    

29     system user                 Connect  41786     Waiting for an event from Coordinator    

30     system user                 Connect  41786     Waiting for an event from Coordinator

 

show slave status\G结果:

              Master_SSL_Key :

              Seconds_Behind_Master : 0

              Master_SSL_Verify_Server_Cert : No

              Last_IO_Errno : 0

              Last_IO_Error :

              Last_SQL_Errno : 0

              Last_SQL_Error :

              Replicate_Ignore_Server_Ids :

              Master_Server_Id : 999098802

              Master_UUID : a23bf85f-f20b-11e8-8268-7cd30abda240

              Master_Info_File : mysql.slave_master_info

              SQL_Delay : 0

              SQL_Remaining_Delay : undefined

              Slave_SQL_Running_State : Waiting for Slave Workers to free pending events

              Master_Retry_Count : 86400

              Master_Bind :

              Last_IO_Error_Timestamp :

              Last_SQL_Error_Timestamp :

              Last_SQL_Error_Gtid :

              Master_SSL_Crl :

              Master_SSL_Crlpath :

              Retrieved_Gtid_Set : a23bf85f-f20b-11e8-8268-7cd30abda240:1-861466

               Executed_Gtid_Set : 937ca3af-f20b-11e8-9447-7cd30ab8a5d8:1-178841,a23bf85f-f20b-11e8-8268-7cd30abda240:1-861455

 

show engine innodb status\G结果:

 

  ---TRANSACTION 2819387, ACTIVE 65 sec inserting

  mysql tables in use 1, locked 1

  1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 5919707

  MySQL thread id 23, OS thread handle 140536255158016, query id 102150 Executing event

 

2.1.3 解决方法

  建议等待只读实例应用完主库的更新,会自动恢复。

 

2.2:无主键更新

2.2.1复现方法:

主实例执行较多数据的无主键更新操作:mysql>update MOCK_DATA set id=90 where id<10;,导致延迟出现

2.2.2 延迟出现:

show processlist结果:

1bc75c45d4b3177404e07c7c665b7dc10ae5ec4e

show slave status\G结果:

47dd4874f45cb6d85a6bf3f8544e3e5f5355159f

2.2.3 解决方法

       建议增加有效索引提高更新效率。

2.3:阻塞(ddl)

2.3.1 复现方法

只读实例开启显式事务,执行如下查询,不提交事务:

325bdaee522f36b407b8479dc2f00116e97c9af0

主实例执行truncate table mock_data,只读延迟产生

2.3.2延迟产生

show processlist 结果:

4cdb349d62eff3949d121475d3793c98fd9b26ad

show slave status\G结果:

0a8358893a45f09bc1d39b6b31a049735f67e46b

2.3.3 解决方法

       该类问题一般是其他事务导致ddl操作阻塞,建议kill掉阻塞的事务,参考: https://help.aliyun.com/knowledge_detail/41723.html?spm=5176.11065259.1996646101.searchclickresult.3c2e73bf9s0jmI

2.4:Waiting for Slave Worker to release partition

 参考: https://yq.aliyun.com/articles/188482?spm=a2c4e.11155435.0.0.120f63b5YRAxtQ

猜你喜欢

转载自yq.aliyun.com/articles/692087