MySQL master-slave synchronization: how it works and how to trace a synchronization delay problem

Foreword

As a DBA, you will often run into MySQL master-slave synchronization delays or slow synchronization. There are many possible causes: a network problem between master and slave, insufficient network bandwidth, large transactions, or the lag inherent in single-threaded replication.

Today I ran into a case where MySQL kept raising alarms that the master-slave synchronization delay was too large or that synchronization had failed. This article shares how master-slave synchronization works and the line of reasoning used to track the problem down.

Symptoms

The most direct symptom is the following:

mysql> show slave status\G
  // State 1
  Seconds_Behind_Master: NULL
  // State 2
  Seconds_Behind_Master: 0
  // State 3
  Seconds_Behind_Master: 79

Querying repeatedly, the value is 0 most of the time, but it occasionally shows a delay such as 79, or NULL. This is what triggered the alarm from the monitoring that watches the master-slave synchronization delay.

Cause and solution

Several standby machines (slaves) were configured with the same server-id, so they could not all stay connected to the master at the same time and therefore could not synchronize properly.

After changing the server-id and restarting the database, synchronization recovered.

Master-slave synchronization mechanisms

MySQL master-slave synchronization, also called replication, is a built-in, high-performance, high-availability clustering solution. Its main uses are:

  • Data distribution: synchronization does not need much bandwidth, so data can be replicated across multiple data centers.
  • Read load balancing: with a cluster of servers, DNS round robin, Linux LVS, or GSLB (global server load balancing) can spread reads and reduce the read pressure on the master.
  • Database backup: replication is one part of a backup strategy, but not a substitute for backups. It still needs to be combined with snapshots.
  • High availability and failover: a slave can quickly be switched to master, reducing downtime and recovery time.

Master-slave synchronization is divided into three steps:

  1. The master records data changes in its binary log (binlog).
  2. The slave copies the master's binary log into its own relay log.
  3. The slave replays the events in the relay log, applying the changes to its own database to achieve data consistency.
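
The three steps above can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the `Master` and `Slave` classes, the key/value "events", and the `max_events` parameter are all hypothetical and are not how MySQL implements replication internally.

```python
# Toy model of the three synchronization steps (not MySQL's real implementation).

class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []              # step 1: every change is appended here

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


class Slave:
    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.read_pos = 0             # how far the binlog has been copied
        self.exec_pos = 0             # how far the relay log has been replayed

    def io_step(self, master):
        # step 2: copy new binlog events into the local relay log
        new_events = master.binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos += len(new_events)

    def sql_step(self, max_events=None):
        # step 3: replay relay log events against the local data
        end = len(self.relay_log)
        if max_events is not None:
            end = min(end, self.exec_pos + max_events)
        for key, value in self.relay_log[self.exec_pos:end]:
            self.data[key] = value
        self.exec_pos = end


master, slave = Master(), Slave()
master.write("a", 1)
master.write("b", 2)
slave.io_step(master)                 # both events are now in the relay log
slave.sql_step(max_events=1)          # only one has been applied: the slave lags
lagging = slave.data != master.data
slave.sql_step()                      # replay the rest: data is consistent again
```

The gap between what has been copied and what has been replayed is the delay discussed in the rest of this article.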

Master-slave synchronization is asynchronous and near real time: changes are transmitted in real time, but there is a delay in applying them, and the delay grows when the master is under heavy load.

As the chart of this process shows, three threads are needed in total:

  1. The log dump thread on the master: sends binary log increments to the standby machines.
  2. The I/O thread on the slave: reads the master's binary log and saves it as the relay log.
  3. The SQL thread on the slave: executes the events in the relay log.

Viewing MySQL threads

We can use the show full processlist; command to view the state of the MySQL threads:

State on the master:

State on a standby machine:

As you can see, my cluster has 1 master and 4 standby machines, so the master has four synchronization threads (all binlog data has been sent to the standby machine; waiting for the binlog to be updated) plus one thread for the viewing command (show full processlist). The standby machine likewise has one thread for the viewing command, one I/O thread (waiting for the master to send synchronization events), and one SQL thread (all relay logs have been read; waiting for the I/O thread to update them).

Viewing the synchronization status

Because master-slave synchronization is asynchronous, there will always be some delay. We can run show slave status; on the standby machine to view the synchronization delay:

The fields that deserve attention (marked in red) are:

  • Slave_IO_State: the current state of the I/O thread
  • Master_Log_File: the master binary log file the I/O thread is currently reading
  • Read_Master_Log_Pos: the offset, in bytes, up to which the I/O thread has read the master's binary log; in the example shown, about 13 MB has been synchronized (13630580/1024/1024)
  • Relay_Master_Log_File: the master binary log file that the events currently being executed from the relay log came from
  • Slave_IO_Running: whether the I/O thread on the slave is running; Yes means it is operating normally
  • Slave_SQL_Running: whether the SQL thread on the slave is running; Yes means it is operating normally
  • Exec_Master_Log_Pos: the offset in the master's binary log up to which synchronization has been completed (executed by the SQL thread)
  • Seconds_Behind_Master: how many seconds the data on the slave lags behind the master
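
To make the position fields concrete, here is a minimal sketch that parses the vertical output of show slave status into a dictionary and estimates how many bytes have been read from the master but not yet executed. The helper names `parse_slave_status` and `unapplied_bytes` are hypothetical, and the sample text is illustrative: it reuses the Read_Master_Log_Pos value mentioned above and borrows an Exec_Master_Log_Pos value from the error log shown later in this article purely as an example.

```python
def parse_slave_status(text):
    # Turn "Key: value" lines of the vertical output into a dict of strings.
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key:
            status[key.strip()] = value.strip()
    return status


def unapplied_bytes(status):
    # The two offsets are only comparable when they refer to the same binlog file.
    if status["Master_Log_File"] != status["Relay_Master_Log_File"]:
        return None
    return int(status["Read_Master_Log_Pos"]) - int(status["Exec_Master_Log_Pos"])


# Illustrative sample, not real captured output.
sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
              Master_Log_File: mysql-bin.005682
          Read_Master_Log_Pos: 13630580
        Relay_Master_Log_File: mysql-bin.005682
          Exec_Master_Log_Pos: 13628070
"""

status = parse_slave_status(sample)
backlog = unapplied_bytes(status)
read_mb = int(status["Read_Master_Log_Pos"]) / 1024 / 1024
```

Returning `None` when the file names differ is a deliberate choice in this sketch: a byte difference across two different binlog files would be meaningless.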

Similarly, the show master status; command shows the running state of the master:

When master-slave synchronization is running normally, the slave shows:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
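
The "normal" condition above can be turned into a small check. The following is a sketch only: the function name `replication_problems` and the `max_lag` threshold of 60 seconds are assumptions made for illustration, and the input is assumed to be a dictionary of field values already fetched from show slave status (with NULL represented as `None` or the string "NULL").

```python
def replication_problems(status, max_lag=60):
    # Return a list of human-readable problems; an empty list means healthy.
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")
    lag = status.get("Seconds_Behind_Master")
    if lag is None or lag == "NULL":
        problems.append("Seconds_Behind_Master is NULL")
    elif int(lag) > max_lag:
        problems.append("slave is %d seconds behind the master" % int(lag))
    return problems


healthy = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": "0",
}
disconnected = {
    "Slave_IO_Running": "No",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": "NULL",
}
delayed = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": "636",
}
```

Note that a single sample can look healthy even when synchronization is unstable, which is why the troubleshooting in this article relies on repeated queries.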

Troubleshooting

With the synchronization mechanism understood, let us look at the problem encountered today. Checking the state of the standby machine, we observe a few key fields cycling through three states:

mysql> show slave status\G
# State 1:
  Slave_IO_State: Reconnecting after a failed master event read
  Slave_IO_Running: No
  Slave_SQL_Running: Yes
  Seconds_Behind_Master: NULL
# State 2:
  Slave_IO_State: Waiting for master to send event
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
  Seconds_Behind_Master: 0
# State 3:
  Slave_IO_State: Queueing master event to the relay log
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
  Seconds_Behind_Master: 636

From the description of MySQL master-slave replication thread state transitions, we can see what the three states mean:

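# State 1
# The thread is trying to reconnect to the master. Once the connection is re-established, the state becomes Waiting for master to send event.
Reconnecting after a failed master event read
# State 2
# The thread has connected to the master and is waiting for binary log events to arrive. This can last a long time if the master is idle. If the wait lasts slave_net_timeout seconds, a timeout occurs; the thread then considers the connection broken and tries to reconnect.
Waiting for master to send event

# State 3
# The thread has read an event and is copying it to the relay log for the SQL thread to process.
Queueing master event to the relay log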

From this we can guess that, for some reason, the slave keeps getting disconnected from the master, tries to reconnect, and is disconnected again after reconnecting.
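
One way to confirm this kind of flapping is to sample Slave_IO_State repeatedly and count how often it enters the reconnecting state. The sketch below assumes the samples have already been collected into a list. The function name `count_reconnects` and the sample sequence are hypothetical; the state strings are the three shown above.

```python
RECONNECTING = "Reconnecting after a failed master event read"
WAITING = "Waiting for master to send event"
QUEUEING = "Queueing master event to the relay log"


def count_reconnects(io_states):
    # Count transitions into the reconnecting state, ignoring repeated samples
    # taken while the thread is still in that same state.
    count = 0
    previous = None
    for state in io_states:
        if state == RECONNECTING and previous != RECONNECTING:
            count += 1
        previous = state
    return count


# Illustrative sequence of sampled states, not real captured data.
samples = [WAITING, RECONNECTING, RECONNECTING, WAITING, QUEUEING, RECONNECTING]
reconnects = count_reconnects(samples)
```

A stable slave should spend almost all of its time in the waiting or queueing states, so a count that keeps growing over a short window points to a connection problem rather than to slow execution.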

Let us look at how the master is running:

The problem turned out to be on two machines, 10.144.63.* and 10.144.68.*. Here is the error log from one of them:

190214 11:33:20 [Note] Slave: received end packet from server, apparent master shutdown: 
190214 11:33:20 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.005682' at postion 13628070
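
When the error log is long, it can help to extract these notices programmatically. The following sketch scans log text for the two messages shown above. The regular expression and the function name `reconnect_positions` are assumptions based only on the two sample lines in this article, so other MySQL versions may word the message differently.

```python
import re

# "pos\w*" tolerates the "postion" spelling that appears in the log line above.
RECONNECT_RE = re.compile(
    r"Failed reading log event, reconnecting to retry, "
    r"log '([^']+)' at pos\w* (\d+)"
)


def reconnect_positions(log_text):
    # Return (binlog file, position) for every reconnect notice in the text.
    return [(name, int(pos)) for name, pos in RECONNECT_RE.findall(log_text)]


sample_log = (
    "190214 11:33:20 [Note] Slave: received end packet from server, "
    "apparent master shutdown:\n"
    "190214 11:33:20 [Note] Slave I/O thread: Failed reading log event, "
    "reconnecting to retry, log 'mysql-bin.005682' at postion 13628070\n"
)

positions = reconnect_positions(sample_log)
shutdown_notices = sample_log.count("apparent master shutdown")
```

Many such pairs appearing within a short time span, while the master itself has not restarted, match the disconnect and reconnect cycle guessed above.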

Searching Google for the keywords Slave: received end packet from server, apparent master shutdown: leads to the article Confusing MySQL Replication Error Message, which shows that the cause is two standby machines having the same server-id.

One day it happen to me, and took me almost an hour to find that out.
Moving foward I always use a base my.cnf to I copy to any other server and the first thing is to increase the server-id.
Could MySQL just use the servername intead of a numeric value?

Fixing the problem

Having located the problem, we next confirm whether the value really is duplicated, and find that this field is indeed the same on the two standby machines:

vim my.cnf

#replication
log-bin=mysql-bin
# The problem: this random number is the same on both machines
server-id=177230069
sync_binlog=1

Change it to a different number, save, and restart the MySQL process; the alarm then clears.
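
To catch this kind of misconfiguration before it causes alarms, the server-id values of all machines can be compared. The sketch below assumes the text of each machine's my.cnf has already been collected into a dictionary. The host labels, the second id value, and the function name `find_duplicate_server_ids` are hypothetical; only the value 177230069 comes from the configuration shown above.

```python
import re
from collections import defaultdict

# Accepts both the "server-id" and "server_id" spellings of the option.
SERVER_ID_RE = re.compile(r"^\s*server[-_]id\s*=\s*(\d+)\s*$", re.MULTILINE)


def find_duplicate_server_ids(configs):
    # configs maps a host label to the text of that host's my.cnf.
    hosts_by_id = defaultdict(list)
    for host, text in configs.items():
        match = SERVER_ID_RE.search(text)
        if match:
            hosts_by_id[int(match.group(1))].append(host)
    return {sid: hosts for sid, hosts in hosts_by_id.items() if len(hosts) > 1}


# Hypothetical host labels and config fragments for illustration.
configs = {
    "standby-a": "log-bin=mysql-bin\nserver-id=177230069\nsync_binlog=1\n",
    "standby-b": "log-bin=mysql-bin\nserver-id=177230069\nsync_binlog=1\n",
    "standby-c": "log-bin=mysql-bin\nserver-id=177230070\nsync_binlog=1\n",
}

duplicates = find_duplicate_server_ids(configs)
```

Running a check like this whenever a base my.cnf is copied to a new machine would flag the duplicate before the new slave starts fighting with an existing one for its connection to the master.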

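Summary

In the end, the fix for this problem was very simple, but the path from initial confusion to a clear line of reasoning is typical of troubleshooting. The main takeaway of this article is an understanding of how master-slave synchronization works and of how to trace a problem. Hopefully next time we can all resolve master-slave synchronization problems quickly.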

Origin www.cnblogs.com/HKROnline-SyncNavigator/p/10971471.html