MySQL master-slave replication delay: causes and troubleshooting ideas

Source: Public Account "The Shadow Corridor of the Oracle"

In an asynchronous or semi-synchronous replication setup, it is quite normal for the slave to lag behind the master.
Although lag is normal, whether it needs attention is generally decided by the business.
For example, if a read service on the slave requires fairly strong consistency and the delay must stay below a certain value, then the lag needs to be watched.

A brief overview of the replication logic:

1. The master records changes to the database instance in the binlog.
2. The master runs a binlog dump thread that watches the binlog in real time and pushes new events to the slave (when it has sent everything, its thread state is "Master has sent all binlog to slave; waiting for more updates").
3. The slave's IO Thread receives these events and writes them to the relay log.
4. The slave's SQL Thread reads the events from the relay log and applies (replays) them to the slave instance.
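To make the four roles concrete, here is a toy in-memory model in Python. It is only a sketch under simplifying assumptions, not how MySQL is implemented: the class ToyReplication and its methods are hypothetical, and the counters read_pos and exec_pos count events, whereas the real Read_Master_Log_Pos and Exec_Master_Log_Pos are byte offsets in the master's binlog. It shows how the IO Thread can be fully caught up while the single SQL Thread still has events waiting in the relay log.

```python
from collections import deque


class ToyReplication:
    """Toy in-memory model of steps 1-4 (hypothetical, not MySQL's implementation)."""

    def __init__(self):
        self.binlog = []          # step 1: events recorded by the master
        self.relay_log = deque()  # step 3: events received by the IO thread
        self.read_pos = 0         # binlog events received so far (cf. Read_Master_Log_Pos)
        self.exec_pos = 0         # events applied so far (cf. Exec_Master_Log_Pos)
        self.slave_data = {}      # the slave's copy of the data

    def master_write(self, key, value):
        """Step 1: the master commits a change and logs it."""
        self.binlog.append((key, value))

    def dump_to_io_thread(self):
        """Steps 2-3: push every new binlog event into the relay log."""
        while self.read_pos < len(self.binlog):
            self.relay_log.append(self.binlog[self.read_pos])
            self.read_pos += 1

    def sql_thread_apply(self, max_events):
        """Step 4: a single SQL thread replays at most max_events events."""
        for _ in range(max_events):
            if not self.relay_log:
                break
            key, value = self.relay_log.popleft()
            self.slave_data[key] = value
            self.exec_pos += 1


repl = ToyReplication()
for i in range(5):
    repl.master_write(f"k{i}", i)
repl.dump_to_io_thread()
repl.sql_thread_apply(max_events=2)
print(repl.read_pos, repl.exec_pos, len(repl.relay_log))  # 5 2 3
```

The gap between what the IO Thread has received and what the SQL Thread has applied is exactly the backlog that shows up as replication delay.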

The above is the default asynchronous replication logic. Semi-synchronous replication differs slightly and is not covered here.

Checking whether the slave is lagging is simple:
run SHOW SLAVE STATUS on the slave
and check the Seconds_Behind_Master value.
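A minimal Python sketch for pulling Seconds_Behind_Master out of the vertical output of SHOW SLAVE STATUS (for example, the text printed by mysql -e "SHOW SLAVE STATUS\G"). The function names and the sample text are hypothetical; a NULL value, which MySQL reports when the SQL thread or the IO thread is not running, is returned as None.

```python
def parse_slave_status(text):
    """Turn the one-field-per-line output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # the "*** 1. row ***" banner and blank lines have no colon
        status[key.strip()] = value.strip()
    return status


def seconds_behind_master(status):
    """Return Seconds_Behind_Master as an int, or None when MySQL reports NULL."""
    value = status.get("Seconds_Behind_Master", "NULL")
    return None if value in ("", "NULL") else int(value)


# Hypothetical sample, trimmed to a few fields.
SAMPLE = """*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
          Read_Master_Log_Pos: 1200
          Exec_Master_Log_Pos: 900
        Seconds_Behind_Master: 42
      Slave_SQL_Running_State: Reading event from the relay log
"""

status = parse_slave_status(SAMPLE)
print(seconds_behind_master(status))  # 42
```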

Causes of delay and solutions

〇Frequent DML on the master (high TPS)

That is, the master receives many write requests: a large number of concurrent INSERT, DELETE and UPDATE operations generate a large volume of binlog in a short time.

[Cause analysis]
The master writes data concurrently, while the slave's SQL Thread applies the log in a single thread, so the relay log easily piles up and the delay grows.

[Solution]
Shard the data and spread write requests by scaling out. Alternatively, consider upgrading to MySQL 5.7+ and enabling parallel replication based on logical clocks (slave_parallel_type=LOGICAL_CLOCK with slave_parallel_workers greater than 0).
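The arithmetic behind this cause can be illustrated with a toy Python model, assuming (hypothetically) that the master commits 3,000 transactions per second and one applier thread can replay 1,000 per second. With one worker the backlog grows every second; with four perfectly independent workers it stays at zero. Real logical-clock parallel replication is more limited, because roughly only transactions that were group-committed together on the master can be replayed in parallel.

```python
def simulate_backlog(commit_tps, apply_tps_per_worker, workers, seconds):
    """Return the relay-log backlog (in transactions) at the end of each second.

    Toy model: assumes every transaction is independent, so `workers`
    appliers scale linearly. Real parallel replication usually gains less.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += commit_tps  # the master commits concurrently
        backlog -= min(backlog, apply_tps_per_worker * workers)  # slave applies
        history.append(backlog)
    return history


print(simulate_backlog(3000, 1000, workers=1, seconds=5))  # [2000, 4000, 6000, 8000, 10000]
print(simulate_backlog(3000, 1000, workers=4, seconds=5))  # [0, 0, 0, 0, 0]
```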

〇The master executes large transactions

For example, bulk imports such as INSERT INTO $tb1 SELECT * FROM $tb2 or LOAD DATA INFILE, or an UPDATE or DELETE over a whole table.
Symptoms: Exec_Master_Log_Pos stays the same, and Slave_SQL_Running_State is Reading event from the relay log.
Parsing the master's binlog at that position (for example with mysqlbinlog) shows which transaction is being replayed.

[Cause analysis]
If the master takes 200 s to update a large table and the master and slave have similar specs, the slave also needs roughly 200 s to apply the same update. Meanwhile the slave's delay keeps accumulating, and subsequent events cannot be applied until this one finishes.

[Solution]
Split large transactions into smaller ones and commit them promptly.
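One common way to split a whole-table UPDATE or DELETE is to walk the primary key in fixed-size ranges and commit each range separately (with autocommit on). The Python sketch below only generates the statements; the table name t_order, the column id and the batch size are hypothetical, and because the names are pasted into SQL text, it should only be used with trusted identifiers.

```python
def batched_deletes(table, pk, min_id, max_id, batch_size):
    """Generate DELETE statements that each cover at most batch_size key values."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    statements = []
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        # Each statement is its own small transaction under autocommit.
        statements.append(f"DELETE FROM {table} WHERE {pk} BETWEEN {start} AND {end};")
        start = end + 1
    return statements


for stmt in batched_deletes("t_order", "id", 1, 25, 10):
    print(stmt)
```

Each small transaction reaches the slave and is applied on its own, so the slave never has to replay one 200-second event in a single go.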

〇The master executes DDL statements on large tables

The symptoms are similar to those of large transactions on the master.
If Exec_Master_Log_Pos does not move, the slave may be executing DDL.
Parse the master's binlog to see which transaction the master executed at that position.

[Cause analysis]
1. The DDL has not started because it is blocked: SHOW SLAVE STATUS shows Slave_SQL_Running_State as Waiting for table metadata lock, and Exec_Master_Log_Pos does not change.
2. The DDL is running, and because the SQL Thread applies it single-threaded, the delay keeps growing. Slave_SQL_Running_State is altering table, and Exec_Master_Log_Pos does not change.

[Solution]
Use the processlist or information_schema.innodb_trx to find the queries blocking the DDL and kill them, so the DDL can run normally on the slave.
The delay caused by the DDL itself is hard to avoid. Consider:
① Running the DDL after the business peak period.
② Running set sql_log_bin=0 and executing the DDL manually on both the master and the slave (for some DDL operations this can leave the data inconsistent, so test strictly first).
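As a starting point for finding the blocker, the Python sketch below applies a simple heuristic to rows from SHOW PROCESSLIST (represented here as dicts with the Id, User, Time and State columns): any non-replication session that has existed in its current state at least as long as the oldest session waiting for the metadata lock is a candidate. This is an assumption-laden shortcut, not a MySQL feature; Time reflects how long a session has been in its current state, so an idle session holding an open transaction can be missed or over-reported. Confirm with trx_started in information_schema.innodb_trx before killing anything.

```python
MDL_WAIT = "Waiting for table metadata lock"


def mdl_blocker_candidates(rows):
    """Heuristic: sessions at least as old as the oldest metadata-lock waiter."""
    waiters = [r for r in rows if r.get("State") == MDL_WAIT]
    if not waiters:
        return []
    oldest_wait = max(r["Time"] for r in waiters)
    candidates = [
        r for r in rows
        if r.get("State") != MDL_WAIT
        and r.get("User") != "system user"  # skip the replication IO/SQL threads
        and r["Time"] >= oldest_wait
    ]
    # Longest-running first: the most likely holders of the lock.
    return sorted(candidates, key=lambda r: r["Time"], reverse=True)


# Hypothetical processlist snapshot from a slave.
rows = [
    {"Id": 1, "User": "system user", "Time": 9000, "State": "Waiting for master to send event"},
    {"Id": 2, "User": "system user", "Time": 120, "State": MDL_WAIT},  # SQL thread with the DDL
    {"Id": 11, "User": "report", "Time": 300, "State": ""},  # idle, maybe holding a transaction
    {"Id": 12, "User": "report", "Time": 5, "State": "executing"},
]
print([r["Id"] for r in mdl_blocker_candidates(rows)])  # [11]
```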

〇The master and slave configurations are inconsistent

[Cause analysis]
Hardware: for example, the master server uses SSDs while the slave server uses ordinary SAS disks, or the CPU frequencies differ.
Configuration: for example, different RAID card write policies, different OS kernel parameter settings, or different MySQL flush-to-disk settings (such as innodb_flush_log_at_trx_commit and sync_binlog), etc.

[Solution]
Keep the configuration of DB servers (both hardware and option parameters) as uniform as possible.
For some OLAP workloads, the slave may even be given higher-spec hardware than the master.

〇The table lacks a primary key or unique index

With binlog_format=row, if a table lacks a primary key or unique index, UPDATE and DELETE statements on it can make the slave's delay spike.
In this case, Slave_SQL_Running_State is Reading event from the relay log,
the table keeps appearing in SHOW OPEN TABLES WHERE in_use=1,
Exec_Master_Log_Pos does not change,
and CPU usage of the mysqld process is close to 100% (when there is no read traffic), while I/O pressure is low.

[Cause analysis]
Suppose, in an extreme case, the master updates 200,000 rows in a 5,000,000-row table, and the UPDATE statement needs a full table scan.
In row format, the binlog records 200,000 separate row updates, so the SQL Thread replays them very slowly: each row update may require its own full table scan.
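A rough upper bound for this example, assuming every one of the 200,000 row events forces a full scan of the 5,000,000-row table:

```latex
N_{\text{rows read}} \approx
\underbrace{2 \times 10^{5}}_{\text{row events}} \times
\underbrace{5 \times 10^{6}}_{\text{rows per full scan}} = 10^{12}
```

At a hypothetical scan speed of 10^7 rows per second, that is about 10^5 seconds, i.e. more than a day, whereas with a primary key each row event is a single index lookup.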

[Solution]
Check table structures to ensure every table has an explicit auto-increment primary key, and create appropriate indexes.
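To find tables that would trigger this problem, the Python sketch below compares the list of base tables with index metadata. It assumes the caller has already fetched (TABLE_SCHEMA, TABLE_NAME) rows from information_schema.TABLES (WHERE TABLE_TYPE = 'BASE TABLE') and (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, NON_UNIQUE) rows from information_schema.STATISTICS; the function name and sample data are hypothetical. A table is reported if it has no index with NON_UNIQUE = 0, i.e. neither a PRIMARY KEY nor a UNIQUE index.

```python
def tables_without_unique_key(all_tables, statistics_rows):
    """Return (schema, table) pairs that have no PRIMARY KEY and no UNIQUE index.

    all_tables: (TABLE_SCHEMA, TABLE_NAME) rows for base tables.
    statistics_rows: (TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, NON_UNIQUE) rows.
    """
    has_unique = {
        (schema, table)
        for schema, table, _index_name, non_unique in statistics_rows
        if int(non_unique) == 0  # PRIMARY and UNIQUE indexes have NON_UNIQUE = 0
    }
    return sorted(set(all_tables) - has_unique)


tables = [("shop", "orders"), ("shop", "logs"), ("shop", "tmp_import")]
stats = [
    ("shop", "orders", "PRIMARY", 0),
    ("shop", "logs", "idx_created", 1),  # only a non-unique index
]
print(tables_without_unique_key(tables, stats))  # [('shop', 'logs'), ('shop', 'tmp_import')]
```

Tables with no index at all never appear in STATISTICS, which is why the full table list is needed as the starting set.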

〇The slave itself is under too much load

[Cause analysis]
The slave executes a large number of SELECT requests: most of the business's reads are routed to the slave instance, heavy OLAP workloads even run there, or the slave is being backed up.
CPU load and I/O utilization may then be too high, so the SQL Thread applies events too slowly.

[Solution]
Add more slaves to spread read requests and reduce the load on the existing slave instances.
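Spreading reads over more slaves is usually done in the application or a proxy. A minimal Python sketch of the idea, with hypothetical host names: writes go to the master and statements starting with SELECT rotate round-robin across the slaves. It is deliberately naive; for example, SELECT ... FOR UPDATE, or reads that must see the latest write, should also go to the master.

```python
import itertools


class ReadRouter:
    """Naive read/write splitting: writes to the master, SELECTs round-robin over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master  # writes, or reads when no slave is configured


router = ReadRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT * FROM t"))    # db-slave-1
print(router.route("select 1"))           # db-slave-2
print(router.route("UPDATE t SET a = 1"))  # db-master
```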

〇MyISAM storage engine

In this case, the slave's Slave_SQL_Running_State is Waiting for table level lock.

[Cause analysis]
MyISAM supports only table-level locks, so reads and writes cannot run concurrently.
With @@concurrent_insert set to an appropriate value, the master can execute INSERTs concurrently with SELECTs, but the slave's SQL Thread cannot replay them concurrently. Interested readers can look into how MyISAM implements this.

[Solution]
Of course, I chose to forgive it: if you chose MyISAM, you should be mentally prepared for this. (MyISAM is not recommended in replication setups, nor in many other scenarios.)
Change to InnoDB.
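A Python sketch for generating the conversion statements, assuming (TABLE_SCHEMA, TABLE_NAME, ENGINE) rows fetched from information_schema.TABLES; the function name and sample rows are hypothetical, and system schemas such as mysql are skipped. Keep in mind that each ALTER TABLE ... ENGINE=InnoDB rebuilds the table and replicates like any other DDL on a large table, so run it off-peak.

```python
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def innodb_conversion_statements(tables):
    """Build ALTER TABLE ... ENGINE=InnoDB for user tables still on MyISAM.

    tables: (TABLE_SCHEMA, TABLE_NAME, ENGINE) rows; ENGINE is None for views.
    """
    return [
        f"ALTER TABLE `{schema}`.`{table}` ENGINE=InnoDB;"
        for schema, table, engine in tables
        if schema not in SYSTEM_SCHEMAS and (engine or "").upper() == "MYISAM"
    ]


rows = [
    ("shop", "orders", "MyISAM"),
    ("shop", "users", "InnoDB"),
    ("mysql", "user", "MyISAM"),  # system table: left alone
    ("shop", "v_report", None),   # a view has no engine
]
print(innodb_conversion_statements(rows))  # ['ALTER TABLE `shop`.`orders` ENGINE=InnoDB;']
```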

Summary:

Use SHOW SLAVE STATUS and SHOW PROCESSLIST to check the slave's current state. (This also lets you rule out a backup running on the slave as the cause.)
If Exec_Master_Log_Pos stays the same, consider large transactions, DDL, and missing primary keys, and inspect the master's binlog at the corresponding position.
If Exec_Master_Log_Pos changes but the delay keeps growing, check the slave machine's load (I/O, CPU, etc.) and consider whether the master's write load or the slave's own load is too high.
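The decision rules above can be written down as a small Python function, assuming two SHOW SLAVE STATUS samples taken a few seconds apart and already converted to dicts (positions as int, a NULL Seconds_Behind_Master as None). The function name and the returned labels are hypothetical; the position is compared together with Relay_Master_Log_File, because Exec_Master_Log_Pos is an offset within that file.

```python
def classify_delay(before, after):
    """Apply the summary's rules to two SHOW SLAVE STATUS samples."""
    lag = after["Seconds_Behind_Master"]
    if lag is None:
        return "replication threads not running"
    if lag == 0:
        return "no delay"
    # Exec_Master_Log_Pos is an offset inside Relay_Master_Log_File.
    pos_before = (before["Relay_Master_Log_File"], before["Exec_Master_Log_Pos"])
    pos_after = (after["Relay_Master_Log_File"], after["Exec_Master_Log_Pos"])
    if pos_after == pos_before:
        return "position stuck: suspect a large transaction, DDL or a table without a primary key"
    previous_lag = before["Seconds_Behind_Master"]
    if previous_lag is not None and lag > previous_lag:
        return "position moving but delay growing: check slave load and master write pressure"
    return "position moving and delay not growing: likely catching up"


before = {"Relay_Master_Log_File": "mysql-bin.000012", "Exec_Master_Log_Pos": 900,
          "Seconds_Behind_Master": 30}
after = dict(before, Seconds_Behind_Master=45)
print(classify_delay(before, after))
```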

If none of the above applies, ask the senior DBAs.

Of course, Seconds_Behind_Master is not always accurate. In a small number of scenarios, Seconds_Behind_Master is 0 even though the master and slave data are inconsistent.
That will be the subject of another blog post.

The end.

Enjoy MySQL :)


Origin blog.csdn.net/n88Lpo/article/details/108722021