A case of MySQL master-slave replication delay caused by an MDL: Waiting for dependent transaction to commit, Waiting for table metadata lock
Background
In production environments, for better performance, it is common to have the master node handle writes while the slave nodes handle only reads, that is, read-write separation. This article records an unusual master-slave replication delay encountered in such a setup.
1. Master-slave replication delay
We received a master-slave replication delay alert.
Alert content (sensitive information has been masked):
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 13754354
Relay_Log_Space: 14690374
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: xxxx
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 53153306
Master_UUID: xxxxxx-3c7f-11e8-969a-005056a16d70
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: xxxxxxx-3c7f-11e8-969a-005056a16d70:153908122-204096262
Executed_Gtid_Set: xxxxxxx-3c7f-11e8-969a-005056a16d70:1-204095398
After receiving the alert, I immediately logged in to the slave. The output of show slave status on the slave matched the alert.
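To get a feel for how far behind the slave is, the two GTID sets in the alert can be compared. The following is only a minimal sketch: the helper names parse_status and last_gtid_seq are my own, and it assumes each GTID set contains a single UUID with a single interval, as in the alert above (real GTID sets can contain several).

```python
# A few lines copied from the alert above (values are masked in the original).
ALERT = """\
Seconds_Behind_Master: xxxx
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Retrieved_Gtid_Set: xxxxxxx-3c7f-11e8-969a-005056a16d70:153908122-204096262
Executed_Gtid_Set: xxxxxxx-3c7f-11e8-969a-005056a16d70:1-204095398
"""


def parse_status(text):
    """Turn 'Key: value' lines from show slave status into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, because GTID values contain colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def last_gtid_seq(gtid_set):
    """Highest transaction number, assuming one UUID and one interval."""
    interval = gtid_set.rsplit(":", 1)[1]  # e.g. "1-204095398"
    return int(interval.split("-")[-1])


status = parse_status(ALERT)
backlog = (last_gtid_seq(status["Retrieved_Gtid_Set"])
           - last_gtid_seq(status["Executed_Gtid_Set"]))
print(backlog)  # 864
```

Under these assumptions, roughly 864 transactions had been received by the slave but not yet applied, which fits an IO thread that keeps up while the SQL thread is stuck.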
2. Analysis of the master-slave replication delay
The alert content clearly shows that master-slave replication was delayed. CPU usage on the slave server was also relatively high at the time, so I ran show processlist to see what the slave was executing.
The output showed many select statements that had been running for a long time (statements like these are a real trap). Suspecting that they were the cause, I looked again and noticed a detail.
One of the threads was the MySQL replication thread. Its state was plainly Waiting for table metadata lock, and its statement was a create index. From this it can be concluded that an MDL (metadata lock) wait had occurred: an index was created on the master node, and when that index creation was replayed on the slave node it was blocked by the select statements, which delayed master-slave replication. This is probably also why the alert shows Waiting for dependent transaction to commit: replication could not move on until the blocked index creation finished.
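The check described above can be sketched as a small filter over show processlist rows. This is only an illustration: the rows below are made-up sample data (the real output is not reproduced in this post), the table and index names are hypothetical, and the 600-second threshold is an arbitrary assumption.

```python
MDL_WAIT = "Waiting for table metadata lock"

# Hypothetical rows using the column names of show processlist.
rows = [
    {"Id": 12, "User": "system user", "Command": "Connect", "Time": 1800,
     "State": MDL_WAIT, "Info": "create index idx_demo on t_demo (col_a)"},
    {"Id": 101, "User": "app", "Command": "Query", "Time": 2400,
     "State": "Sending data", "Info": "select count(*) from t_demo where col_b = 1"},
    {"Id": 102, "User": "app", "Command": "Query", "Time": 1900,
     "State": "Sending data", "Info": "select * from t_demo order by col_c"},
    {"Id": 103, "User": "app", "Command": "Query", "Time": 2,
     "State": "Sending data", "Info": "select 1"},
    {"Id": 104, "User": "app", "Command": "Sleep", "Time": 50,
     "State": "", "Info": None},
]


def find_mdl_waiters(rows):
    """Threads that are blocked waiting for a metadata lock."""
    return [r for r in rows if r["State"] == MDL_WAIT]


def find_long_selects(rows, min_seconds):
    """Running select statements older than min_seconds (candidate blockers)."""
    return [
        r for r in rows
        if r["Command"] == "Query"
        and (r["Info"] or "").lstrip().lower().startswith("select")
        and r["Time"] >= min_seconds
    ]


waiters = find_mdl_waiters(rows)
suspects = find_long_selects(rows, min_seconds=600)
print([r["Id"] for r in waiters])   # [12]
print([r["Id"] for r in suspects])  # [101, 102]
```

Note that this only lists candidates by running time. It does not prove which session actually holds the lock; on MySQL 5.7 and later, the performance_schema.metadata_locks table can confirm that if its instrument is enabled.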
PS: What is an MDL (metadata lock)?
MySQL 5.5 introduced the MDL (metadata lock), which is used to ensure consistency between DDL operations and DML operations. For details, see a blog post I reposted:
https://blog.csdn.net/Tah_001/article/details/107931747
3. Solving the problem
After locating the cause, I checked with the people responsible for the business system, who confirmed that the slow select statements above could be killed directly. Once those select statements were killed, the MDL was released and master-slave replication returned to normal.
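When there are many slow select statements, typing the kill commands by hand is error-prone. A minimal sketch of generating them from show processlist rows follows; the sample rows and the 600-second threshold are assumptions for illustration, and the generated statements should still be reviewed with the business side before they are run.

```python
def build_kill_statements(rows, min_seconds):
    """Build KILL statements for long-running selects, skipping replication threads."""
    statements = []
    for row in rows:
        # Replication threads appear in the processlist as "system user".
        is_replication = row["User"] == "system user"
        is_select = (row["Info"] or "").lstrip().lower().startswith("select")
        if not is_replication and is_select and row["Time"] >= min_seconds:
            statements.append(f"KILL {row['Id']};")
    return statements


# Hypothetical sample rows, not the real processlist from this incident.
sample_rows = [
    {"Id": 12, "User": "system user", "Time": 1800,
     "Info": "create index idx_demo on t_demo (col_a)"},
    {"Id": 101, "User": "app", "Time": 2400,
     "Info": "select count(*) from t_demo where col_b = 1"},
    {"Id": 102, "User": "app", "Time": 1900,
     "Info": "select * from t_demo order by col_c"},
    {"Id": 103, "User": "app", "Time": 2, "Info": "select 1"},
    {"Id": 104, "User": "app", "Time": 50, "Info": None},
]

kill_statements = build_kill_statements(sample_rows, min_seconds=600)
for statement in kill_statements:
    print(statement)
# KILL 101;
# KILL 102;
```

The replication thread itself is deliberately excluded: killing it would not release the lock, since it is the waiter rather than the blocker.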
4. Summary
Possible causes of master-slave replication delay:
1. Network latency is high, leaving the slave's IO thread waiting.
2. The slave's IO hardware is worse than the master's, so its IO capacity is insufficient.
3. A large transaction was executed on the master, causing a spike in delay.
4. Multi-threaded replication is not enabled on the slave, so applying SQL becomes a bottleneck.
5. Sessions on the slave are causing metadata lock waits.
6. Updates are made to tables that have no primary key.
Improvements:
1. Identify the slow queries of the business system and require that they be fixed. Read-only nodes must not be used for testing statements.
2. Index creation is a DDL statement: use the pt tool (for example pt-online-schema-change from Percona Toolkit), or create the index during off-peak hours.
Corrections and better approaches are welcome.