Redis Sentinel (sentinel mode): principles and mechanism in detail

Three periodic tasks

Each sentinel runs three periodic tasks internally.

1. Every 10 seconds, each sentinel runs the INFO command against the master and the slaves

This task serves two purposes:

1. Discover the slave nodes

2. Confirm the master-slave relationship
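As a rough illustration of how slaves can be discovered from the INFO reply, the sketch below parses the replication section of a sample reply. The function name parse_info_replication and the sample addresses are hypothetical; only the role: and slaveN: line layout comes from Redis itself.

```python
def parse_info_replication(info_text):
    """Extract the node role and the (ip, port) of each slave from INFO text."""
    role = None
    slaves = []
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        if key == "role":
            role = value
        elif key.startswith("slave") and key[5:].isdigit():
            # e.g. slave0:ip=10.0.0.2,port=6379,state=online,offset=1234,lag=0
            fields = dict(item.split("=", 1) for item in value.split(","))
            slaves.append((fields["ip"], int(fields["port"])))
    return role, slaves


# Hypothetical sample reply from a master with two slaves.
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1234,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=1200,lag=1
"""

role, slaves = parse_info_replication(SAMPLE_INFO)
print(role, slaves)
```

Running INFO against each discovered slave in the same way is what lets a sentinel confirm the master-slave relationship from both sides.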

2. Every 2 seconds, each sentinel exchanges information with the other sentinels through a pub/sub channel on the master node

The master node has a publish-subscribe channel named __sentinel__:hello.

Each sentinel publishes its own information and its "view" of the monitored node on the __sentinel__:hello channel and reads what the other sentinels publish. This is how sentinels learn about one another and converge on a shared view.
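To make the exchange more concrete, here is a minimal sketch of how one sentinel might record its peers from hello messages. The eight-field, comma-separated payload layout is an assumption based on the Redis Sentinel documentation and may differ between versions; parse_hello, on_hello and the sample values are hypothetical.

```python
from collections import namedtuple

# Assumed payload layout: sentinel ip, port, run ID, current epoch,
# then master name, ip, port and config epoch.
Hello = namedtuple(
    "Hello",
    "sentinel_ip sentinel_port sentinel_runid current_epoch "
    "master_name master_ip master_port master_config_epoch",
)


def parse_hello(payload):
    """Split one hello payload into named fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("expected 8 comma-separated fields, got %d" % len(parts))
    ip, port, runid, epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


known_sentinels = {}  # run ID -> (ip, port) of every peer seen so far


def on_hello(payload):
    """Record the sender so that it can be pinged and queried later."""
    hello = parse_hello(payload)
    known_sentinels[hello.sentinel_runid] = (hello.sentinel_ip, hello.sentinel_port)
    return hello


sample = "10.0.0.11,26379,sentinel-runid-b,1,mymaster,10.0.0.1,6379,0"
hello = on_hello(sample)
print(hello.master_name, known_sentinels)
```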

3. Every second, each sentinel pings the other sentinels and the redis nodes (mutual monitoring)

This is a heartbeat check, and it is the basis for failure detection.

Subjective offline and objective offline

There are two relevant settings in the redis-sentinel conf file:

1. sentinel monitor <masterName> <ip> <port> <quorum>

The four parameters mean the following:

masterName is an identifier that distinguishes one master+slave group from another (a set of sentinels can monitor multiple master+slave groups).

ip and port are the IP address and port number of the master node.

quorum is the basis for objective offline: at least quorum sentinels must subjectively believe that the master is faulty before the master is marked objectively offline and a failover is started. A single sentinel may be unable to reach the master because of its own network problems while the master itself is healthy, so several sentinels have to agree that the master has a problem before the next step is taken. This prevents false positives and keeps the service highly available.

2. sentinel down-after-milliseconds <masterName> <timeout>

This setting is the basis for subjective offline. masterName is the same identifier as above, and timeout is a value in milliseconds. If a sentinel cannot reach a node (the master or a slave) for longer than timeout, it marks that node as subjectively offline. Slaves never need to be marked objectively offline, because they do not require a failover. A master is only actually taken offline after the objective offline judgment.
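The subjective offline check can be sketched as a simple comparison between the time of the last valid PING reply and the configured timeout. The function name is_subjectively_down and the 30000 ms value are illustrative assumptions, not taken from the Sentinel source.

```python
def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms):
    """True when no valid reply has arrived for longer than down_after_ms."""
    return (now_ms - last_valid_reply_ms) > down_after_ms


DOWN_AFTER_MS = 30000  # example value for down-after-milliseconds
last_reply = 0         # time (ms) of the last valid PING reply

# The node stops answering at t=0; the sentinel re-evaluates it over time.
for now in (1000, 15000, 30000, 31000):
    print(now, is_subjectively_down(last_reply, now, DOWN_AFTER_MS))
```

Only the last check, made more than 30000 ms after the last valid reply, reports the node as subjectively offline.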

So, how do multiple sentinels reach consensus?

This builds on the second periodic task described above, through which the sentinels come to know one another. A sentinel first marks the master as subjectively offline, then uses the sentinel is-master-down-by-addr command to ask the other sentinels whether they also consider the master at that address to be offline. When the number of sentinels that agree reaches the quorum value described above, the master is marked objectively offline and a failover is performed. quorum is generally set to half the number of sentinels plus 1 (rounding the half down); for example, with 3 sentinels it is set to 2.
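The quorum rule can be illustrated with a small sketch. recommended_quorum and is_objectively_down are hypothetical helper names; peer_replies stands for the answers the other sentinels give to sentinel is-master-down-by-addr.

```python
def recommended_quorum(sentinel_count):
    """Half the number of sentinels (rounded down) plus 1."""
    return sentinel_count // 2 + 1


def is_objectively_down(own_opinion_down, peer_replies, quorum):
    """Count this sentinel's own opinion plus every peer that agrees."""
    if not own_opinion_down:
        return False  # a sentinel only asks its peers after its own subjective offline
    agreeing = 1 + sum(1 for reply in peer_replies if reply)
    return agreeing >= quorum


quorum = recommended_quorum(3)
print(quorum)
# One of the two peers also sees the master as offline: quorum reached.
print(is_objectively_down(True, [True, False], quorum))
# No peer agrees: the master stays only subjectively offline.
print(is_objectively_down(True, [False, False], quorum))
```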

Leader election

Why elect a leader? Because only one sentinel node should carry out the failover.

The sentinel is-master-down-by-addr command has two functions: it confirms the offline judgment, and it carries out the leader election.

Election process:

1. Each sentinel node that has marked the master as subjectively offline sends the above command to the other sentinel nodes, asking to be made the leader.

2. If the sentinel node receiving the command has not yet agreed to a request from another sentinel (it has not voted yet), it agrees; otherwise it rejects the request.

3. If a sentinel node finds that it holds more than half of the votes and that its vote count also reaches the quorum value, it becomes the leader.

4. If the process does not produce a single leader, the sentinels wait for a period of time and hold the election again.
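The four steps above can be sketched as a simplified, single-round simulation. It assumes every sentinel receives the requests in the same order (candidate_order), which a real network does not guarantee; run_election is a hypothetical helper, not Sentinel's actual implementation.

```python
from collections import Counter


def run_election(sentinel_names, candidate_order, quorum):
    """Return the elected leader, or None if the round must be repeated."""
    votes = {}  # voter name -> candidate that received its single vote
    for candidate in candidate_order:
        votes[candidate] = candidate  # each candidate votes for itself
    for candidate in candidate_order:
        for voter in sentinel_names:
            # A voter that has not voted yet agrees; later requests are rejected.
            votes.setdefault(voter, candidate)
    tally = Counter(votes.values())
    majority = len(sentinel_names) // 2 + 1
    needed = max(majority, quorum)
    winners = [name for name, count in tally.items() if count >= needed]
    # needed is at least a majority, so at most one winner can exist.
    return winners[0] if winners else None


sentinels = ["s1", "s2", "s3"]
print(run_election(sentinels, ["s1"], quorum=2))              # s1 wins
print(run_election(sentinels, ["s1", "s2"], quorum=2))        # s1 wins 2 to 1
print(run_election(sentinels, ["s1", "s2", "s3"], quorum=2))  # None: re-elect
```

The last call shows the split-vote case from step 4: every sentinel votes for itself, nobody reaches the required count, and the election has to be repeated.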

Failover

The so-called failover is the operation of selecting a suitable slave to be promoted to the master when the master is down. Redis-sentinel will automatically complete this operation, and we do not need to do it manually.

So how is a suitable slave chosen? The order is as follows:

1. Select the slave node with the highest priority, which is set with slave-priority (by default all slaves have the same value). Note that a lower slave-priority number means a higher priority, and a value of 0 means the slave is never promoted. For example, if we have two slaves on two machines and one machine is better equipped, we can give that slave the lowest slave-priority number so that it is preferred when the master goes down. If one slave has the highest priority, it is chosen; otherwise continue.

2. Select the slave node with the largest replication offset (its replication is the most complete, so its data is the most consistent with the master). If one slave has the largest offset, it is chosen; otherwise continue.

3. If the two rules above do not single out a slave, select the one with the smallest runId (compared as a string).
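The selection order can be expressed as one sort key. This is a minimal sketch that ignores the health checks a real sentinel also applies; the Slave class and select_slave are hypothetical names. In Redis, a lower slave-priority number means a higher priority and 0 excludes the slave from promotion.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    run_id: str
    priority: int      # slave-priority: lower number is preferred, 0 = never promote
    repl_offset: int   # replication offset: larger means more complete data


def select_slave(slaves):
    """Pick by priority, then largest offset, then smallest run ID."""
    eligible = [s for s in slaves if s.priority != 0]
    if not eligible:
        return None
    return min(eligible, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


candidates = [
    Slave(run_id="c3", priority=100, repl_offset=900),
    Slave(run_id="a1", priority=100, repl_offset=1200),
    Slave(run_id="b2", priority=100, repl_offset=1200),
    Slave(run_id="d4", priority=0, repl_offset=5000),   # excluded by priority 0
]
chosen = select_slave(candidates)
print(chosen.run_id)
```

Here all eligible slaves share the same priority, two of them tie on the largest offset, and the run ID breaks the tie in favor of a1.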

Supplementary point: you can also trigger a manual failover by sending sentinel failover <masterName> to any sentinel, in which case the subjective and objective offline judgment and the election process described above are not needed.
