MySQL master-slave synchronization delay: principles and solutions

Overview

MySQL master-slave synchronization is a mature architecture. Its advantages:
① queries (the read workload) can run on the slave, reducing the load on the master;
② backups can be taken from the slave, so the master's service is not affected during a backup;
③ if the master fails, you can switch over to the slave.

You are probably familiar with these benefits and may already use this setup in your own projects. However, synchronization delay on the slave has long been a problem. Why does it occur, and how can it be solved?

  1. The principle behind MySQL database master-slave synchronization delay.
  2. How MySQL database master-slave synchronization delay arises.
  3. Solutions to MySQL database master-slave synchronization delay.

MySQL database master-slave synchronization delay principle

To understand master-slave synchronization delay, start with how MySQL master-slave replication works. Replication on the slave is single-threaded. The master writes all DDL and DML to the binlog, and because the binlog is written sequentially, this is efficient. The slave's I/O thread (reported as Slave_IO_Running) fetches the log from the master, which is also efficient. The problem comes next: the slave's SQL thread (reported as Slave_SQL_Running) replays the master's DDL and DML on the slave. The I/O for DML and DDL is random rather than sequential, so it costs much more, and the replay may also contend for locks with other queries running on the slave. Because the SQL thread is single-threaded, if one DDL statement stalls and takes 10 minutes to execute, every statement after it waits until it finishes, and that produces the delay. A friend asked: "The same DDL also takes 10 minutes on the master, so why does the slave fall behind?" The answer is that the master can execute statements concurrently, while the slave's SQL thread cannot.
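The queuing effect described above can be sketched with a toy model. This is only an illustration under simple assumptions: each event has a commit time on the master and a fixed cost to apply on the slave, and the slave applies events strictly one after another. The function name `replay_lag` and all numbers are made up for the example and are not MySQL internals.

```python
def replay_lag(events):
    """events: list of (commit_time_on_master, apply_cost) in binlog order.

    Returns, for each event, how long after its commit on the master
    it finishes on the slave when a single SQL thread replays the log.
    """
    clock = 0
    lags = []
    for commit_time, cost in events:
        # The single SQL thread cannot start an event before the previous
        # one is done, nor before the event was committed on the master.
        clock = max(clock, commit_time) + cost
        lags.append(clock - commit_time)
    return lags


# A DDL that needs 600 s to apply, committed at t=600 on the master,
# followed by three 1-second statements committed right after it.
events = [(600, 600), (601, 1), (602, 1), (603, 1)]
lags = replay_lag(events)
print(lags)
```

In this model the three small statements each inherit roughly the full 10 minutes of delay, even though each of them takes only a second to apply.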

How MySQL database master-slave synchronization delay arises

When the master's TPS is high and the amount of DDL it generates exceeds what the slave's single SQL thread can handle, delay arises. Delay can also come from lock waits caused by large queries running on the slave.

A server opens N connections for its clients, so there can be many concurrent update operations on the master. On the slave, however, only one thread reads and applies the binlog. When a statement takes a little longer to execute on the slave, or a statement locks a table, a large backlog of the master's SQL builds up that has not yet been synchronized to the slave. This leads to master-slave inconsistency, that is, master-slave delay.

MySQL database master-slave synchronization delay solutions

The simplest way to reduce synchronization delay on the slave is to optimize the architecture so that DDL executes as quickly as possible on the master. In addition, the master handles writes and needs strong data safety, with settings such as sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1. The slave does not need that level of data safety, so you can set sync_binlog to 0 or disable the binlog, and set innodb_flush_log_at_trx_commit to 0 as well, to make SQL execution more efficient. Another option is to give the slave better hardware than the master.

In fact, there is no silver bullet for master-slave synchronization delay, because every SQL statement must be executed again on the slave. If the master keeps receiving a steady stream of writes, then once delay appears it becomes more and more likely to grow. Of course, we can take some mitigating measures.

  • a. The master is responsible for updates, so it has higher safety requirements than the slave, and settings such as sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 belong there. The slave does not need that level of data safety, so you can set sync_binlog to 0 or disable the binlog, and set innodb_flush_log_at_trx_commit to 0, which can greatly improve SQL execution efficiency. Another option is to give the slave better hardware than the master.
  • b. Use one slave purely as a backup and do not serve queries from it. With its load reduced, it naturally executes the SQL in the relay log more efficiently.
  • c. Add more slaves. The aim is again to spread the read load and so reduce the load on each server.

There are usually two ways to determine master-slave delay:

Seconds_Behind_Master and mk-heartbeat. How each one works, and how they differ, is described below.

Seconds_Behind_Master

You can monitor the Seconds_Behind_Master value in the output of the show slave status command to determine whether the slave is lagging behind the master.
Its possible values are:
NULL - io_thread or sql_thread has failed, that is, the thread's Running status is No rather than Yes.
0 - the value we most want to see: replication is healthy and lag is considered not to exist.
Positive - the slave is lagging; the larger the number, the further the slave is behind the master.
Negative - rarely seen; some senior DBAs say they have come across it. It is in fact a bug: the parameter is not meant to take negative values, so it should not appear.
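The four cases listed above could be turned into a small monitoring helper along these lines. This is a sketch, assuming the value has already been read from the show slave status output by some client and that NULL arrives as Python's `None`; the function name and the messages are illustrative.

```python
def describe_seconds_behind_master(value):
    """Map a Seconds_Behind_Master value to the cases listed above."""
    if value is None:
        # NULL: io_thread or sql_thread is not running
        return "replication thread stopped or failed"
    if value == 0:
        return "no lag reported"
    if value > 0:
        return f"slave is {value} s behind the master"
    # Negative values are not supposed to occur
    return "negative value: should not occur"


for sample in (None, 0, 42, -3):
    print(sample, "->", describe_seconds_behind_master(sample))
```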

Seconds_Behind_Master is obtained by comparing the timestamp (abbreviated ts) of the event that sql_thread is executing with the timestamp of the event that io_thread has copied, and taking the difference. The relay log on the slave has exactly the same content as the binlog on the master, and each SQL statement is recorded together with its ts, so the reference values for the comparison come from the binlog. There is therefore no need for NTP synchronization between master and slave, that is, their clocks do not have to agree.

Note that the comparison really takes place between io_thread and sql_thread, and it is io_thread that is actually connected to the master. This is where the problem lies: when the master's I/O load is heavy or the network is congested, io_thread cannot copy the binlog promptly (it is not interrupted and keeps copying), while sql_thread keeps pace with io_thread. Seconds_Behind_Master is then 0, which we would read as no delay, although in reality the slave is behind. This is why the parameter is criticized as an inaccurate way to monitor replication delay. It is not always inaccurate, though: when the network between io_thread and the master is good, the value is still a useful reference.

Earlier we mentioned that Seconds_Behind_Master can be negative. The value is the difference between the ts of the newest event copied by io_thread and the ts of the event most recently executed by sql_thread, and the former should never be smaller than the latter. The only explanation is that an event's ts is wrong and smaller than that of an earlier event; when that happens, a negative result is possible.
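The blind spot can be shown with a few numbers. The sketch below follows the simplified model in the paragraph above (difference between the timestamp last copied by io_thread and the timestamp last executed by sql_thread); it is not MySQL's exact internal formula, and the function name and timestamps are invented for illustration.

```python
def seconds_behind_master(ts_copied_by_io, ts_executed_by_sql):
    """Simplified model: lag as seen from the slave's two threads only."""
    return ts_copied_by_io - ts_executed_by_sql


# Hypothetical timestamps, in seconds.
master_latest_ts = 1000   # newest event in the master's binlog
io_copied_ts = 940        # io_thread is 60 s behind (slow network)
sql_executed_ts = 940     # sql_thread has caught up with io_thread

reported = seconds_behind_master(io_copied_ts, sql_executed_ts)
actual = master_latest_ts - sql_executed_ts
print("reported:", reported, "actual:", actual)
```

Here the reported value is 0 while the slave is in fact 60 seconds behind the master, because the comparison never looks at what the master has written but not yet shipped.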

mk-heartbeat

mk-heartbeat, a tool from the general-purpose Maatkit toolkit, is considered an accurate way to measure replication delay.
mk-heartbeat also works by comparing timestamps. It first requires that the master and slave clocks agree, by synchronizing both with the same NTP server. A heartbeat table must be created on the master with at least two columns, id and ts: id is the server_id and ts is the current timestamp, now(). The table structure is replicated to the slave. Once the table exists, a background process on the master periodically runs an update that writes the current timestamp into the table, by default once per second. A monitoring command also runs in the background on the slave, with the same period as on the master, and compares the replicated ts value with the master's ts value for the same record. A difference of 0 means no delay; the larger the difference, the more seconds of delay. Because replication is asynchronous, the ts values will not match exactly, so the tool allows a gap of half a second; a difference within that range is ignored and treated as no delay. The tool measures delay through real replication, cleverly borrowing a timestamp to check for lag, which is a neat approach.
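The comparison step can be sketched as follows. This is not mk-heartbeat's code, only a minimal model of the idea described above: the master's current heartbeat timestamp is compared with the copy that has reached the slave, and differences within half a second count as no delay. The function name, the parameter names, and the sample values are assumptions made for the example.

```python
def heartbeat_lag(replicated_ts, master_ts, tolerance=0.5):
    """Lag in seconds between the master's heartbeat and the slave's copy.

    Both timestamps are seconds since the epoch and assume NTP-synchronized
    clocks. Differences within `tolerance` are treated as no delay.
    """
    diff = master_ts - replicated_ts
    return 0.0 if diff <= tolerance else diff


print(heartbeat_lag(replicated_ts=100.0, master_ts=100.3))  # within tolerance
print(heartbeat_lag(replicated_ts=100.0, master_ts=112.0))  # clearly lagging
```

Unlike the Seconds_Behind_Master model, this value grows whenever the heartbeat row has not reached the slave, whichever thread is the bottleneck.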

Additional notes:

sync_binlog configuration notes:
sync_binlog: this parameter is crucial for MySQL. It affects not only the performance cost that the binlog imposes on MySQL but also the integrity of MySQL's data. The settings of sync_binlog are as follows:
sync_binlog = 0: when a transaction commits, MySQL does not issue fsync or any similar disk synchronization call to flush binlog_cache to disk. It lets the filesystem decide when to synchronize, or synchronizes to disk only once the cache is full.
sync_binlog = n: after every n transaction commits, MySQL issues fsync or a similar synchronization call to force the data in binlog_cache to disk.

In MySQL versions before 5.7.7 the default is sync_binlog = 0 (later versions default to 1), that is, no forced disk flush at all. Performance is best in this mode, but so is the risk: if the system crashes, all binlog information still in binlog_cache is lost. Setting it to 1 is the safest option but also the one with the largest performance cost. With 1, even if the system crashes, at most one unfinished transaction in binlog_cache is lost, with no substantial effect on the actual data.
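To get a feel for why the setting matters for write throughput, the rule above can be reduced to a count of forced disk syncs. This is a back-of-the-envelope sketch, assuming one forced sync per n commits exactly as described; the function name is illustrative and real behavior also depends on group commit and the storage stack.

```python
def forced_syncs(commits, sync_binlog):
    """Number of forced binlog syncs for `commits` transactions."""
    if sync_binlog == 0:
        # No forced sync; flushing is left to the filesystem.
        return 0
    # One forced sync after every `sync_binlog` commits.
    return commits // sync_binlog


for setting in (0, 1, 100):
    print(f"sync_binlog={setting}: {forced_syncs(1000, setting)} forced syncs per 1000 commits")
```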

Based on past experience and related tests, on a system with many concurrent transactions the write performance gap between sync_binlog = 0 and sync_binlog = 1 can be fivefold or even more.

innodb_flush_log_at_trx_commit configuration notes:
The default value of 1 means that every transaction commit (or every statement outside a transaction) must write (flush) the log to disk, which is time-consuming, especially when there is no battery-backed cache. A value of 2 is fine for many applications, especially those migrated from MyISAM tables; it means the log is written to the operating system cache rather than flushed to disk. The log is still flushed to disk every second, so you generally lose no more than 1-2 seconds of updates. A value of 0 is a little faster but less safe: transactions can be lost even if only the MySQL server crashes. With a value of 2, data is lost only if the entire operating system crashes.

MySQL 5.6.3 added support for multi-threaded master-slave replication. The principle is similar to Dinc's approach: Dinc's parallelizes by table, while Oracle's parallelizes by database (schema), so different databases can use different replication threads.
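The benefit of applying different schemas on different threads can be estimated with a small model. The sketch below assumes one worker per schema, no cross-schema dependencies, and a fixed apply cost per event; the schema names, costs, and function names are made up and say nothing about how MySQL actually schedules its workers.

```python
from collections import defaultdict


def serial_apply_time(events):
    """Total time when one SQL thread applies every event in turn."""
    return sum(cost for _schema, cost in events)


def per_schema_apply_time(events):
    """Total time when each schema has its own worker thread."""
    per_schema = defaultdict(int)
    for schema, cost in events:
        per_schema[schema] += cost
    # Workers run in parallel, so the slowest schema sets the total time.
    return max(per_schema.values(), default=0)


# Hypothetical relay-log events: (schema, apply cost in seconds)
events = [("shop", 5), ("shop", 5), ("logs", 8), ("crm", 2)]
print(serial_apply_time(events), per_schema_apply_time(events))
```

The model also shows the limit of this scheme: if all writes go to a single schema, the two numbers are the same and nothing is gained.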

With a master/slave setup on a LAN, replication under normal conditions already meets the 'real-time' backup requirement. If the delay is relatively large, check the following factors:

  1. Network latency
  2. Master load
  3. Slave load

The usual practice is to use multiple slaves to share the read requests and to set aside one of the slaves as a dedicated backup server that performs no other work. This comes as close as possible to the 'real-time' requirement.
slave_net_timeout: in seconds, default 3600
Definition: when the slave fails to read log data from the master, how long it waits before re-establishing the connection and fetching the data
master-connect-retry: in seconds, default 60
Definition: when re-establishing the master-slave connection, how long to wait before retrying if the connection attempt fails

Tuning the two parameters above can usually reduce the master-slave synchronization delay caused by network problems.


Origin blog.csdn.net/u013474436/article/details/104821971