Redis High Availability Advanced: The Principle of the Sentinel Mechanism

(1) Overview

In the previous article we covered Redis master-slave replication, which has an obvious shortcoming: when the master goes down, someone has to step in manually, for example by running SLAVEOF NO ONE on a slave. In other words, master-slave replication alone does not give us true high availability. High availability depends on standby machines and on the redundancy built into the system: when one machine fails, another standby machine can quickly take over and keep the service running. Clearly we need a mechanism that builds on master-slave replication and makes it genuinely highly available, and that is where Sentinel mode comes in.

Redis Sentinel is a distributed architecture made up of several Sentinel nodes and several Redis data nodes. Each Sentinel node monitors the data nodes and the other Sentinel nodes. When a Sentinel finds a node unreachable, it marks that node as offline. If the unreachable node is the master, the Sentinel also "negotiates" with the other Sentinels; once a majority of Sentinels agree that the master is unreachable, they elect one Sentinel node to carry out the automatic failover and notify the Redis clients of the change in real time. The whole process is fully automatic and needs no human intervention, so this solution effectively solves Redis's high-availability problem.

[Figure: Redis Sentinel architecture, with Sentinel nodes monitoring the master, the slaves, and each other]
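On the client side, connecting "through" the Sentinels instead of to a fixed master address is what makes the failover transparent. Below is a minimal sketch using the redis-py library; the Sentinel addresses (ports 26379-26381) and the monitored service name mymaster are assumptions for illustration, not values from this article.

```python
# Minimal redis-py sketch: the client asks the Sentinels where the current
# master is instead of hard-coding a master address.
from redis.sentinel import Sentinel

# Assumed Sentinel addresses and service name ("mymaster") for illustration.
sentinel = Sentinel(
    [("127.0.0.1", 26379), ("127.0.0.1", 26380), ("127.0.0.1", 26381)],
    socket_timeout=0.5,
)

print(sentinel.discover_master("mymaster"))   # (ip, port) of the current master
print(sentinel.discover_slaves("mymaster"))   # [(ip, port), ...] of the slaves

master = sentinel.master_for("mymaster", socket_timeout=0.5)    # for writes
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)    # for reads

master.set("greeting", "hello")
print(replica.get("greeting"))
```

After a failover, subsequent connections obtained through master_for should resolve to the newly promoted master, which is exactly the transparency Sentinel aims to provide.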

(2) Failover process

1. Master node failure

The master node goes down. At this point the two slave nodes lose their connection to it and master-slave replication is broken:

[Figure: the master node goes down and the two slave nodes lose their replication connection]
In a plain master-slave replication setup we would have to promote a slave node to master by hand. With Sentinel mode, this is done automatically, without any manual intervention.
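For comparison, here is roughly what the manual promotion that Sentinel automates would look like with redis-py; the addresses and the choice of which slave to promote are assumptions for illustration.

```python
# Manual failover sketch (the work Sentinel automates): promote one slave and
# point the remaining slave at it. Addresses are assumed for illustration.
import redis

new_master = redis.Redis(host="127.0.0.1", port=6380)
other_slave = redis.Redis(host="127.0.0.1", port=6381)

new_master.slaveof()                    # no arguments -> SLAVEOF NO ONE
other_slave.slaveof("127.0.0.1", 6380)  # replicate the newly promoted master
```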

2. Multi-sentinel monitoring

Each Sentinel node discovers the master node's failure through its periodic monitoring:

[Figure: each Sentinel node detects the master's failure through its periodic monitoring]
Every Sentinel sends a PING command once per second to the master, the slaves, and the other Sentinels it knows about. If an instance takes longer than the value of the down-after-milliseconds option to return a valid reply to PING, that Sentinel marks the instance as subjectively offline. If the master is marked as subjectively offline, all Sentinels monitoring it must confirm, once per second, that the master really has entered that state. When a sufficient number of Sentinels (at least the quorum value specified in the configuration file) agree within the configured time window that the master is subjectively offline, the master is marked as objectively offline.

In addition, each Sentinel sends an INFO command to every master and slave it knows about once every 10 seconds. Once a master has been marked as objectively offline, the Sentinels raise the frequency of the INFO command sent to that master's slaves from once every 10 seconds to once per second. If not enough Sentinels agree that the master is offline, its objectively-offline status is removed; and if the master starts returning valid replies to the Sentinels' PING commands again, its subjectively-offline status is removed.
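A Sentinel's current view of the master, including these offline flags and the configured thresholds, can be inspected from any client. A hedged sketch with redis-py, assuming a Sentinel on port 26379 and a service named mymaster:

```python
# Inspect one Sentinel's view of the monitored master. The Sentinel address
# and the service name "mymaster" are assumptions for illustration.
import redis

sentinel_node = redis.Redis(host="127.0.0.1", port=26379, decode_responses=True)

state = sentinel_node.sentinel_master("mymaster")   # SENTINEL MASTER mymaster
print(state["ip"], state["port"])           # address of the monitored master
print(state["flags"])                       # includes s_down / o_down when offline
print(state["down-after-milliseconds"])     # threshold for subjective offline
print(state["quorum"])                      # Sentinels needed for objective offline
print(state["num-other-sentinels"])         # other Sentinels watching this master
```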

3. Electing the leader Sentinel

Once multiple Sentinel nodes agree that the master node has failed, they elect one of themselves as the leader responsible for the failover operation:

[Figure: the Sentinel nodes elect a leader to carry out the failover]
When a Redis server is judged to be objectively offline, the Sentinels monitoring it negotiate and elect a leader Sentinel to carry out the failover for that server. The leader election follows these rules:

  • Every Sentinel is equally eligible to be elected as the leader.
  • In a given election round, each Sentinel has exactly one vote for some Sentinel as leader; once the vote is cast, it cannot be changed.
  • Votes are granted on a first-come, first-served basis: once a Sentinel has chosen a leader, any later requests from other Sentinels to become the leader are rejected.
  • Every Sentinel that detects the objective offline state asks the other Sentinels to set itself as the leader.
  • A Sentinel (the source Sentinel) asks another Sentinel (the target Sentinel) for its vote by sending the SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid> command; when the runid parameter is not * but the source Sentinel's own run ID, it means the source Sentinel is asking the target Sentinel to elect it as the leader (see the sketch after this list).
  • The source Sentinel then checks the target Sentinel's reply to this request: if the returned leader_runid and leader_epoch match the source Sentinel, the target Sentinel has agreed to make the source Sentinel the leader.
  • A Sentinel that is chosen as leader by more than half of the Sentinels becomes the leader.
  • If no leader Sentinel is elected within the time limit, a new round of election is held after a while.
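The is-master-down-by-addr command mentioned above can also be sent by an ordinary client in its query form (runid set to *), in which case it only asks about the down state instead of requesting a vote. A hedged redis-py sketch, with the Sentinel and master addresses assumed:

```python
# Ask one Sentinel whether it considers the master at the given address down.
# With runid "*" this is only a down-state query, not a vote request; during
# leader election a Sentinel puts its own run ID there instead.
# Addresses are assumptions for illustration.
import redis

sentinel_node = redis.Redis(host="127.0.0.1", port=26379, decode_responses=True)

down_state, leader_runid, leader_epoch = sentinel_node.execute_command(
    "SENTINEL", "IS-MASTER-DOWN-BY-ADDR", "127.0.0.1", "6379", 0, "*"
)
print(down_state)    # 1 if this Sentinel currently considers the master down
print(leader_runid)  # "*" for a plain query; a run ID when a vote was requested
print(leader_epoch)
```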

Why does Redis design this leader-election process at all? Simply put, the failover must be carried out by exactly one Sentinel node.

4. Automatic failover

The leader Sentinel then performs the failover operation. The process is essentially the same as a manual promotion, only it happens automatically:

[Figure: the leader Sentinel promotes a slave to be the new master]
Failover means selecting a suitable slave and promoting it to master when the master is down; Sentinel completes this automatically, with no manual steps on our part. A failover operation roughly consists of the following steps:

  • It is detected that the master server has entered the objectively offline state.
  • The Sentinel increments its current epoch and tries to be elected leader for that epoch.
  • If the election fails, it tries again after twice the configured failover timeout. If it succeeds, it performs the following steps:
  • Pick one slave server to be promoted to the new master.
  • Send the SLAVEOF NO ONE command to the selected slave to turn it into a master.
  • Propagate the updated configuration to all the other Sentinels via the publish/subscribe mechanism, so that they update their own configuration.
  • Send the SLAVEOF command to the remaining slaves of the offline master so that they replicate the new master.
  • When all of those slaves have started replicating the new master, the leader Sentinel ends the failover operation.
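The result of a failover can be observed from the client side. The sketch below, again assuming redis-py, a Sentinel on port 26379 and the service name mymaster, forces a failover with the SENTINEL FAILOVER command (which skips the agreement of the other Sentinels but exercises the same promotion steps) and then asks for the new master address:

```python
# Force a failover and watch the master address change.
# The Sentinel address and the service name "mymaster" are assumptions.
import time
import redis

sentinel_node = redis.Redis(host="127.0.0.1", port=26379, decode_responses=True)

print("before:", sentinel_node.sentinel_get_master_addr_by_name("mymaster"))

# SENTINEL FAILOVER triggers a failover without waiting for an objective offline.
sentinel_node.execute_command("SENTINEL", "FAILOVER", "mymaster")

time.sleep(5)  # give the Sentinels a moment to promote a slave
print("after:", sentinel_node.sentinel_get_master_addr_by_name("mymaster"))
```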

Whenever a Redis instance is reconfigured during a failover, whether it is promoted to master, reconfigured as a slave, or pointed at a different master, Sentinel sends a CONFIG REWRITE command to the reconfigured instance so that the new configuration is persisted to its configuration file on disk.
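The same persistence step can be issued by hand; in redis-py it is a single call, assuming an instance that was started with a configuration file Redis is allowed to rewrite:

```python
# Persist the running configuration back to the config file, which is what
# Sentinel achieves by sending CONFIG REWRITE to reconfigured instances.
import redis

node = redis.Redis(host="127.0.0.1", port=6379)  # assumed address
node.config_rewrite()                            # sends CONFIG REWRITE
```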

The steps above mention selecting a new master node. Let's look at the rules Sentinel uses to choose the new master server:

  • Among the slaves of the failed master, any that are marked as subjectively offline, are disconnected, or have not replied to a PING command within the last five seconds are eliminated.
  • Among the slaves of the failed master, any that have been disconnected from it for longer than ten times the down-after-milliseconds value are eliminated.
  • Of the slaves that survive these two rounds of elimination, the one with the largest replication offset becomes the new master; if the replication offsets are unavailable or equal, the slave with the smallest run ID becomes the new master (a simplified sketch of these rules follows below).
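These rules are easy to express in code. The function below is only a simplified illustration of the selection logic as described above, not Sentinel's actual implementation (which works on its internal state and also considers fields such as replica priority); the candidate records and field names are assumed.

```python
# Simplified illustration of the master-selection rules described above.
# The candidate dictionaries and their field names are assumptions.

def select_new_master(slaves, down_after_ms):
    # Round 1: drop slaves that are subjectively down, disconnected,
    # or have not answered a PING within the last five seconds.
    candidates = [
        s for s in slaves
        if not s["s_down"]
        and not s["disconnected"]
        and s["last_ping_reply_ms"] <= 5000
    ]
    # Round 2: drop slaves whose link to the old master has been down for
    # more than ten times down-after-milliseconds.
    candidates = [
        s for s in candidates
        if s["master_link_down_ms"] <= 10 * down_after_ms
    ]
    if not candidates:
        return None
    # Prefer the largest replication offset; break ties with the smallest run ID.
    return min(candidates, key=lambda s: (-s["repl_offset"], s["runid"]))


slaves = [
    {"runid": "runid-b", "s_down": False, "disconnected": False,
     "last_ping_reply_ms": 300, "master_link_down_ms": 12000, "repl_offset": 5120},
    {"runid": "runid-a", "s_down": False, "disconnected": False,
     "last_ping_reply_ms": 200, "master_link_down_ms": 9000, "repl_offset": 5120},
]
print(select_new_master(slaves, down_after_ms=30000))  # tie broken by run ID -> "runid-a"
```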

5. A new master node is elected

After the failover, the whole Redis Sentinel topology has a newly elected master node:

[Figure: the topology after failover, with the newly elected master node]

(3) Summary

Sentinel is what gives Redis high availability. The role of the Sentinel system is to monitor the Redis servers: it continuously tracks the state of the cluster, and when a master node goes down, a failover selects a new master node from among the slave nodes. That failover is driven entirely by Sentinel.

Don't think of Sentinel as something overly complicated: it is really just a Redis server running in a special mode. Just as the Redis data nodes are deployed as a group, the Sentinels themselves also need to be deployed as a group; with a single-point Sentinel deployment, the whole Redis setup effectively goes down as soon as that one Sentinel goes down.

September 18, 2020

Origin blog.csdn.net/weixin_43907422/article/details/108652508