Redis master-slave replication and the Sentinel mechanism

I. Redis master-slave replication principle

Redis master-slave synchronization strategy: when a slave first joins the cluster, it triggers a full resynchronization (full copy). After the full resynchronization completes, replication continues incrementally. A slave always prefers incremental (partial) synchronization; only when partial synchronization fails does it fall back to a full copy from the master node.
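As a concrete illustration, the following is a minimal sketch using the redis-py client (the addresses, ports and key names are placeholder assumptions) that attaches a local instance to a master and checks each node's role:

import redis

# Placeholder addresses: master on 6379, soon-to-be slave on 6380.
master = redis.Redis(host="127.0.0.1", port=6379)
replica = redis.Redis(host="127.0.0.1", port=6380)

# Equivalent to running "SLAVEOF 127.0.0.1 6379" on the second instance;
# it triggers a full resynchronization, after which replication is incremental.
replica.slaveof("127.0.0.1", 6379)

print(master.info("replication")["role"])   # master
print(replica.info("replication")["role"])  # slave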
Full resynchronization: the slave initialization phase

(Figure: full resynchronization flow.) The steps are as follows:

  1. When a slave connects to the master for the first time, it sends a SYNC command.
  2. On receiving SYNC, the master runs BGSAVE to generate an RDB (snapshot) file in the background, so normal reads and writes are not blocked; write requests received during this period are buffered on the master.
  3. Once the RDB file is generated, the master sends it to all the waiting slaves.
  4. On receiving the RDB file, a slave discards all of its old data and then loads the new RDB.
  5. After sending the RDB, the master forwards the write commands it buffered during the snapshot to the slave, and keeps forwarding subsequent write commands (the incremental part).
  6. Once the steps above are complete, master and slave operate normally.

It is worth noting that full resynchronization is non-blocking, and master-slave replication is asynchronous.
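To observe the replication progress from the outside, you can read INFO replication on the master. A rough sketch with redis-py (the address is a placeholder assumption); the gap between the master's offset and each slave's acknowledged offset is the data not yet replicated:

import redis

master = redis.Redis(host="127.0.0.1", port=6379)  # placeholder address

repl = master.info("replication")
print(repl["role"], repl["master_repl_offset"])  # the master's write offset

# Connected slaves are reported as slave0, slave1, ... together with their
# ip, port, state and acknowledged offset.
for key, value in repl.items():
    if key.startswith("slave"):
        print(key, value)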


Incremental replication: the master sends every write command it executes to its slaves, and each slave executes the received command to stay in sync.

Resumable transfer: if a network failure occurs while the master is transferring data to a slave (for example, during step 3), then after reconnecting, the master only copies the missing portion to the slave (tracked via the recorded offset).

The master maintains a replication backlog in memory. Both master and slave keep a replica offset and the master's run id, and the offset is stored in the backlog. If the network connection between master and slave is interrupted, the slave asks the master to continue replication from its last replica offset; if the corresponding offset can no longer be found in the backlog, a full resynchronization is performed instead.
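A small sketch of inspecting the pieces that partial resynchronization relies on, again with redis-py (placeholder address; in recent Redis versions the run id used for replication is reported as master_replid):

import redis

master = redis.Redis(host="127.0.0.1", port=6379)  # placeholder address

repl = master.info("replication")
# Replication id and current offset; a reconnecting slave asks to resume
# from the offset it last acknowledged.
print(repl["master_replid"], repl["master_repl_offset"])

# Size of the in-memory backlog that serves partial resynchronization;
# if the requested offset is no longer covered, a full resync happens.
print(master.config_get("repl-backlog-size"))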

Eviction of expired keys under master-slave replication: a slave does not actively evict expired keys. The master handles expired keys and sends a DEL command to its slaves, so the eviction stays in sync.

Heartbeats: by default the master sends a heartbeat to its slaves every 10 seconds, and each slave sends an acknowledgement to the master every second.
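Both intervals can be observed. The master-to-slave ping period is the repl-ping-replica-period option (repl-ping-slave-period on older versions, 10 seconds by default), and the slave's once-per-second acknowledgement shows up as the per-slave lag field. A sketch with redis-py (placeholder address):

import redis

master = redis.Redis(host="127.0.0.1", port=6379)  # placeholder address

# Master -> slave heartbeat interval in seconds (default 10).
print(master.config_get("repl-ping-replica-period"))

# "lag" is the number of seconds since the slave last acknowledged;
# with healthy once-per-second ACKs it stays at 0 or 1.
for key, value in master.info("replication").items():
    if key.startswith("slave"):
        print(key, value.get("lag"))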

Note that in a master-slave architecture it is recommended to enable persistence on the master node, and not to rely on a slave node as the master's hot data backup. Otherwise, if persistence is turned off on the master, the master may come back up with an empty data set after a crash and restart, and once replication runs, the slave nodes' data is wiped as well.
In addition, the master needs its own backup strategy. If all local files are lost, select an RDB file from backup to restore the master, so that it starts up with data. Even with the high availability mechanism, where a slave node can automatically take over from the master, the sentinels may fail to detect the master failure before the master restarts automatically, which can still lead to all the slave nodes' data being cleared as described above.

II. The Sentinel mechanism

In a plain master-slave architecture without sentinels, each node's master or slave role has to be configured manually in its own configuration file. (Keeping this configuration in mind makes the later section on configuration propagation easier to follow.)
With such a plain master-slave setup, when the master node goes down, the failover has to be carried out manually by operations staff; manual failover is time consuming and demands a lot from the operators.
Hence Redis provides the Sentinel mechanism, a high availability architecture built on top of master-slave replication.

1. Sentinel

Sentinel means "sentry". It is a very important component of a Redis high availability setup, with the following main functions:

  • Cluster monitoring: responsible for checking whether the Redis master and slave processes are working properly.
  • Message notification: if a Redis instance fails, the sentinel is responsible for sending an alarm notification to the administrator.
  • Failover: if the master node goes down, the master role is automatically transferred to a slave node.
  • Configuration center: after a failover, the sentinels tell clients the address of the new master (see the client sketch below).
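For the configuration-center role in particular, a client normally asks the sentinels for the current master instead of hard-coding its address. A minimal sketch with redis-py's Sentinel helper, assuming a sentinel on port 26379 and a master group named mymaster (both placeholder assumptions):

from redis.sentinel import Sentinel

sentinel = Sentinel([("127.0.0.1", 26379)], socket_timeout=0.5)  # placeholder

# Ask the sentinels where the current master / a replica of "mymaster" is.
print(sentinel.discover_master("mymaster"))   # e.g. ('127.0.0.1', 6379)

master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("k", "v")     # writes go to whichever node is currently master
print(replica.get("k"))  # reads can be served by a replica

# After a failover the same Sentinel object resolves to the new master.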

Sentinels provide high availability for Redis, and the sentinel system itself is distributed: sentinels run as a cluster and cooperate with each other.

  • During failover, deciding that a master node is down requires the agreement of a majority of the sentinels, which involves a distributed election.
  • Even if some sentinel nodes go down, the sentinel cluster keeps working; if the failover system, itself a key part of the high availability mechanism, were a single point of failure, that would be a serious design flaw.

2. Core knowledge

  • At least three sentinel instances are required to keep the sentinel cluster itself robust.
  • A Sentinel + Redis master-slave deployment cannot guarantee zero data loss; it only guarantees high availability of the Redis cluster.
  • Because a Sentinel + Redis master-slave deployment is complex, test and rehearse it thoroughly in both test and production environments.

3. Failover

  • sdown is subjective downtime: a single sentinel, by itself, believes that a master is down.
  • odown is objective downtime: a quorum of sentinels believe that the master is down.

The condition for sdown is simple: if a sentinel's pings to a master go unanswered for more than the number of milliseconds specified by down-after-milliseconds, that sentinel subjectively considers the master down. If, within the specified time, a sentinel learns that a quorum of other sentinels also consider the master sdown, the master is considered odown.
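A sentinel's current view of a master can be queried directly. A rough sketch with redis-py (placeholder sentinel address and master name); when the sentinel considers the master subjectively or objectively down, s_down / o_down appear in the flags field reported by SENTINEL MASTER:

import redis

sentinel = redis.Redis(host="127.0.0.1", port=26379)  # one sentinel process

state = sentinel.sentinel_master("mymaster")     # SENTINEL MASTER mymaster
print(state.get("flags"))                        # e.g. "master" or "master,s_down,o_down"
print(state.get("down-after-milliseconds"))      # threshold for sdown
print(state.get("quorum"))                       # sentinels needed for odown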

III. slave -> master election algorithm

If a master is considered odown and a majority of the sentinels authorize the failover, one sentinel performs the master-slave switchover. The first step is to elect a slave, and the following information about each slave is taken into account:

  • How long the slave has been disconnected from the master
  • The slave's priority (slave-priority)
  • The slave's replication offset
  • The slave's run id

If a slave has been disconnected from the master for longer than ten times down-after-milliseconds plus the length of time the master has been down, the slave is considered unsuitable for election as the new master:
(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state

Next, the remaining slaves are sorted (an illustrative sketch follows this list):

  • Slaves are sorted by slave priority: the lower the slave-priority value, the higher the ranking.
  • If the slave priorities are equal, compare the replica offsets: the slave that has replicated more data (the larger offset) ranks higher.
  • If both of the above are identical, the slave with the smaller run id is chosen.
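For illustration only, here is a small plain-Python sketch of the disqualification rule and the three-level sort described above (the candidate data is made up; this is not Sentinel's actual source code):

# Hypothetical candidates as a sentinel might see them.
candidates = [
    {"run_id": "a1f3", "priority": 100, "offset": 5000, "disconnect_ms": 1000},
    {"run_id": "9b42", "priority": 100, "offset": 7000, "disconnect_ms": 2000},
    {"run_id": "c0de", "priority": 50,  "offset": 3000, "disconnect_ms": 90000},
]

down_after_ms = 5000     # sentinel's down-after-milliseconds
master_sdown_ms = 20000  # how long the master has been in SDOWN

# Disqualify slaves disconnected for longer than the limit above.
limit = down_after_ms * 10 + master_sdown_ms
eligible = [s for s in candidates if s["disconnect_ms"] <= limit]

# Sort: lower slave-priority first, then larger offset, then smaller run id.
eligible.sort(key=lambda s: (s["priority"], -s["offset"], s["run_id"]))

print(eligible[0]["run_id"])  # "9b42": same priority as "a1f3" but larger offset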

quorum and majority:


Each time a sentinel wants to perform a switchover, quorum sentinels must first consider the master odown; then one sentinel is elected to carry out the switchover, and that sentinel must also be authorized by a majority of the sentinels before the switchover is actually executed.
If quorum < majority, a majority of the sentinels must authorize the switchover. For example, with 5 sentinels the majority is 3; if quorum is set to 2, then 3 sentinels authorizing the switchover is enough.
If quorum >= majority, then quorum sentinels must authorize it. For example, with 5 sentinels and quorum set to 5, all 5 sentinels must agree before the switchover can be executed.
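Put as a formula, and assuming the rule stated above: with N sentinels, majority = N // 2 + 1, and the number of sentinels that must authorize a switchover is max(quorum, majority). A tiny worked example:

def sentinels_needed(total_sentinels: int, quorum: int) -> int:
    # Sentinels that must authorize a switchover, per the rule above.
    majority = total_sentinels // 2 + 1
    return max(quorum, majority)

print(sentinels_needed(5, 2))  # majority = 3, quorum = 2 -> 3 sentinels
print(sentinels_needed(5, 5))  # quorum = 5 >= majority   -> 5 sentinels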


Data loss caused by failover


There are two cases of data loss:

  • Data loss caused by asynchronous replication

Because master -> slave replication is asynchronous, some data may not yet have been copied to a slave when the master goes down, and that part of the data is lost.

  • Data loss caused by split brain

Split brain means that the machine a master runs on suddenly drops off the normal network and can no longer reach the slaves, while the master process itself is still running. The sentinels may then decide that the master is down, hold an election, and promote another slave to master. At that point the cluster has two masters, which is what "split brain" refers to.
Although a slave has been promoted to master, clients may not have switched over to the new master yet and keep writing data to the old master. When the old master recovers, it is attached to the new master as a slave, its own data is cleared, and it re-copies the data from the new master. The data that clients wrote to the old master in the meantime never reached the new master, so that part of the data is lost.


Solution:


The relevant configuration is as follows:

min-slaves-to-write 1
min-slaves-max-lag 10

These options mean: the master requires at least 1 slave whose replication lag is no more than 10 seconds.
If the replication lag of every slave exceeds 10 seconds, the master stops accepting write requests.
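These options can also be set at runtime, and a client has to be prepared for the resulting write rejections. A rough sketch with redis-py (placeholder address; recent Redis versions name the options min-replicas-to-write / min-replicas-max-lag and keep the old names as aliases):

import redis

master = redis.Redis(host="127.0.0.1", port=6379)  # placeholder address

# Equivalent to the two configuration lines above.
master.config_set("min-replicas-to-write", 1)
master.config_set("min-replicas-max-lag", 10)

try:
    master.set("k", "v")
except redis.exceptions.ResponseError as err:
    # When no sufficiently fresh replica is available, the master rejects
    # the write (a NOREPLICAS error); the application must decide what to
    # do with it: retry later, queue it somewhere, raise an alert, etc.
    print("write rejected:", err)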

  • Reducing data loss from asynchronous replication

With min-slaves-max-lag configured, once a slave's replication and acknowledgement lag grows too large, the master assumes that too much data would be lost if it went down at that moment, and it rejects write requests. This keeps the amount of data lost because it was not yet synchronized to a slave within a controllable range when the master goes down.

  • Reducing data loss from split brain

If a master suffers a split brain and loses its connection to the slaves, the two options above ensure that when the master cannot keep pushing data to the required number of slaves, and no slave has acknowledged for more than 10 seconds, it directly rejects clients' write requests. In a split-brain scenario, therefore, at most about 10 seconds of writes are lost.


IV. The Sentinel cluster's auto-discovery mechanism


Sentinels discover each other through Redis's pub/sub system: every sentinel publishes messages to the __sentinel__:hello channel, all the other sentinels consume these messages, and in this way each sentinel becomes aware of the others.
Every two seconds, each sentinel publishes a message to the __sentinel__:hello channel of every master + slaves group it monitors, announcing its own host, ip and run id together with its monitoring configuration for that master.
Each sentinel also subscribes to the __sentinel__:hello channel of every master + slaves group it monitors, and through these messages it learns about the other sentinels that monitor the same master + slaves.
Each sentinel additionally exchanges its configuration for a master with the other sentinels, so that their monitoring configurations stay synchronized.
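You can watch this discovery traffic yourself by subscribing to the channel on any monitored data node. A small sketch with redis-py (placeholder address); it simply prints the hello messages, which the sentinels publish roughly every two seconds:

import redis

node = redis.Redis(host="127.0.0.1", port=6379)  # a monitored master or slave
pubsub = node.pubsub()
pubsub.subscribe("__sentinel__:hello")

# Each message carries the announcing sentinel's ip, port and run id, plus
# its current configuration (and epoch) for the monitored master.
for message in pubsub.listen():
    if message["type"] == "message":
        print(message["data"])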


V. Configuration propagation


After completing a switchover, the sentinel that performed it updates the new master configuration locally and then synchronizes it to the other sentinels, again through the pub/sub messages described above.
The version number is important here: because all the messages are published and consumed on a single channel, after a sentinel completes a switchover, the new master configuration carries a new, higher version number, and the other sentinels update their own master configuration based on which version number is larger.


References:
  https://github.com/hello-shf/advanced-java
  https://www.cnblogs.com/daofaziran/p/10978628.html

  If anything here is wrong, please leave a comment so it can be corrected.
  Original writing is not easy; when reposting, please cite the original address: https://www.cnblogs.com/hello-shf/p/12059902.html
