Redis Cluster: the shrinking process (principle)

Original URL: Redis Cluster--The Process of Shrinkage (Principle)_IT Blog - CSDN Blog

Introduction

This article explains how shrinking a Redis Cluster works and the principle behind it.

Process overview

Shrinking a cluster means reducing its scale by safely taking some nodes offline from the existing cluster. The process for safely taking a node offline is shown in the figure below.

Flow Description:

  1. First, determine whether the node being taken offline is responsible for any slots. If it is, those slots must be migrated to other nodes so that the cluster's slot-to-node mapping remains complete after the node goes offline.
  2. Once the offline node is no longer responsible for any slots, or if it is a slave node, the other nodes in the cluster can be told to forget it. After every node has forgotten the offline node, it can be shut down normally.
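The two steps above can be sketched as a small decision function. This is only an illustration on an in-memory model: the `Node` class and `offline_steps` function are hypothetical names, not part of Redis or redis-trib.rb.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory model of a cluster node (not a Redis API)."""
    port: int
    is_master: bool
    slots: set = field(default_factory=set)


def offline_steps(node: Node) -> list:
    """Return the ordered steps needed to take `node` offline safely."""
    steps = []
    # Only a master that still owns slots needs a migration first.
    if node.is_master and node.slots:
        steps.append("migrate slots")
    # Every remaining node must forget the offline node before it is shut down.
    steps.append("forget node")
    steps.append("shut down")
    return steps


print(offline_steps(Node(6381, True, set(range(12288, 16384)))))
print(offline_steps(Node(6384, False)))
```

A master that still holds slots gets the extra migration step; a slave goes straight to the forget step.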

1. Migrate Slots off the Offline Node

The node being taken offline must migrate the slots it is responsible for to other nodes. The principle is the same as slot migration during node expansion. For example, suppose we take nodes 6381 and 6384 offline; the node information is as follows:

6381 is a master node responsible for slots 12288-16383, and 6384 is its slave node, as shown in Figure 10-26. Before 6381 is taken offline, its slots must be migrated to other nodes.

Shrinking is the reverse of expansion migration: 6381 becomes the source node, and the other master nodes become target nodes. The source node needs to distribute its 4096 slots evenly across the other master nodes. Here the redis-trib.rb reshard command is used directly to migrate the slots. Because each run of reshard accepts only one target node, the command must be run three times, migrating 1365, 1365, and 1366 slots respectively, as shown below:
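The arithmetic behind the three reshard runs can be checked with a short sketch. The helper name `split_range` is an assumption for illustration; it only computes the plan and does not talk to Redis. The remainder slot is given to the last target to match the 1365, 1365, 1366 split used in this article.

```python
def split_range(first, last, targets):
    """Split the contiguous slot range [first, last] across `targets`.

    Returns a list of (target, start_slot, end_slot) tuples. Any remainder
    slots go to the last targets in the list.
    """
    total = last - first + 1
    base, extra = divmod(total, len(targets))
    plan = []
    start = first
    for i, target in enumerate(targets):
        count = base + (1 if i >= len(targets) - extra else 0)
        plan.append((target, start, start + count - 1))
        start += count
    return plan


# Node 6381 owns slots 12288-16383; the targets are the other three masters.
for target, lo, hi in split_range(12288, 16383, [6379, 6380, 6385]):
    print(f"{target}: {lo}-{hi} ({hi - lo + 1} slots)")
# 6379: 12288-13652 (1365 slots)
# 6380: 13653-15017 (1365 slots)
# 6385: 15018-16383 (1366 slots)
```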

After the first migration completes, node 6379 takes over the 1365 slots 12288-13652, as shown below:

Next, migrate 1365 slots to node 6380:

After this migration completes, node 6380 takes over the 1365 slots 13653-15017, as shown below:

127.0.0.1:6379> cluster nodes
40b8d09d44294d2e23c7c768efc8fcd153446746 127.0.0.1:6381 master - 0 1469896123295 2 connected 15018-16383
8e41673d59c9568aa9d29fb174ce733345b3e8f1 127.0.0.1:6380 master - 0 1469896125311 11 connected 6827-10922 13653-15017
...
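To check slot counts from `cluster nodes` output without counting by hand, a small parser is enough. This is a minimal sketch: `parse_cluster_nodes` is a hypothetical helper, and it assumes each node's record sits on a single line with the slot ranges starting at the ninth field, as in the output above.

```python
def parse_cluster_nodes(text):
    """Map each node address to the number of slots it currently serves."""
    counts = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank, elided, or truncated lines
        addr = fields[1]
        total = 0
        for item in fields[8:]:
            if item.startswith("["):
                continue  # a slot in migrating/importing state, not a range
            lo, _, hi = item.partition("-")
            total += int(hi or lo) - int(lo) + 1
        counts[addr] = total
    return counts


SAMPLE = """
40b8d09d44294d2e23c7c768efc8fcd153446746 127.0.0.1:6381 master - 0 1469896123295 2 connected 15018-16383
8e41673d59c9568aa9d29fb174ce733345b3e8f1 127.0.0.1:6380 master - 0 1469896125311 11 connected 6827-10922 13653-15017
"""

print(parse_cluster_nodes(SAMPLE))
# {'127.0.0.1:6381': 1366, '127.0.0.1:6380': 5461}
```

Node 6381 still holds 1366 slots at this point, which matches the amount left for the final migration.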

Migrate the last 1366 slots to node 6385, as follows:


At this point, all slots of node 6381 have been moved out, and 6381 is no longer responsible for any slots. Its status looks like this:


Once the offline node's slots have been migrated, the remaining step is to make the cluster forget the node.

2. Forget the Node

Because nodes in the cluster continuously exchange node state with each other through Gossip messages, a robust mechanism is needed to make every node in the cluster forget the offline node, that is, to stop the other nodes from exchanging Gossip messages with it. Redis provides the cluster forget {downNodeId} command for this purpose, as shown in the figure below.

The cluster forget operation must be executed on all nodes within the 60-second validity period. When a node receives the cluster forget {downNodeId} command, it adds the node specified by downNodeId to its ban list, and it no longer exchanges Gossip messages with nodes on that list. A ban list entry is valid for 60 seconds; after that, the banned node takes part in message exchange again. In other words, once the first forget command is issued, we have 60 seconds to make every node in the cluster forget the offline node.
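The 60-second behaviour of the ban list can be modelled with a dictionary of expiry times. This is a simplified sketch of the idea, not Redis's actual implementation; the `BanList` class and its method names are assumptions, and time is passed in explicitly so the example is deterministic.

```python
class BanList:
    """Toy model of the per-node ban list used by cluster forget."""

    TTL = 60  # seconds an entry stays valid, per the description above

    def __init__(self):
        self._expires = {}  # node id -> time at which the ban ends

    def forget(self, node_id, now):
        """Record that `node_id` was forgotten at time `now`."""
        self._expires[node_id] = now + self.TTL

    def is_banned(self, node_id, now):
        """Return True while gossip with `node_id` is still suppressed."""
        expiry = self._expires.get(node_id)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[node_id]  # entry expired, gossip resumes
            return False
        return True


ban = BanList()
ban.forget("40b8d09d44294d2e23c7c768efc8fcd153446746", now=0)
print(ban.is_banned("40b8d09d44294d2e23c7c768efc8fcd153446746", now=59))  # True
print(ban.is_banned("40b8d09d44294d2e23c7c768efc8fcd153446746", now=60))  # False
```

If any node in the cluster has not received the forget command by the time the first entry expires, the offline node can be reintroduced through Gossip, which is why the window matters.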

In production, taking a node offline by issuing cluster forget commands directly is not recommended: the command must be sent to a large number of nodes, which is tedious and makes it easy to miss one. The redis-trib.rb del-node {host:port} {downNodeId} command is recommended instead. The pseudocode of its internal implementation is as follows:
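A minimal Python sketch of this logic, run against an in-memory model rather than real Redis connections. It assumes del-node behaves as this section describes (refuse a node that still owns slots, re-point its slaves, send forget to every other node, then shut the node down). `SimNode`, `master_with_least_replicas`, and `del_node` are illustrative names, and choosing the master with the fewest slaves as the new master is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimNode:
    """Hypothetical stand-in for a cluster node; `log` records commands sent to it."""
    node_id: str
    slots: set = field(default_factory=set)
    master_id: Optional[str] = None  # set when the node is a slave
    log: list = field(default_factory=list)


def master_with_least_replicas(nodes, exclude_id):
    """Pick the remaining master that currently has the fewest slaves."""
    masters = [n for n in nodes if n.master_id is None and n.node_id != exclude_id]
    return min(masters, key=lambda m: sum(1 for n in nodes if n.master_id == m.node_id))


def del_node(nodes, down):
    # A node that still owns slots must not be removed.
    if down.slots:
        raise ValueError("node still owns slots; migrate them first")
    for n in nodes:
        if n.node_id == down.node_id:
            continue  # a node cannot forget itself
        # Slaves of the offline node are pointed at another master first.
        if n.master_id == down.node_id:
            new_master = master_with_least_replicas(nodes, down.node_id)
            n.master_id = new_master.node_id
            n.log.append(f"cluster replicate {new_master.node_id}")
        n.log.append(f"cluster forget {down.node_id}")
    down.log.append("shutdown")


a, b = SimNode("A"), SimNode("B")
down, replica = SimNode("D"), SimNode("R", master_id="D")
del_node([a, b, down, replica], down)
print(replica.log)  # ['cluster replicate A', 'cluster forget D']
print(down.log)     # ['shutdown']
```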

As the pseudocode shows, the del-node command handles the remaining steps of a safe offline for us. If the master node being taken offline has slave nodes, those slaves must be pointed at other master nodes. Therefore, when both a master and its slave are being taken offline, it is recommended to take the slave offline first and then the master, which avoids an unnecessary full replication. For nodes 6381 and 6384, the commands are as follows:


Confirm the node status after the node goes offline:


Nodes 6384 and 6381 no longer appear in the cluster node status. With that, the nodes have been safely taken offline. The new cluster structure is shown in the figure below.

This section covered the principle and operation of Redis Cluster scaling, the most important capability that clustering brings to Redis. Once you are comfortable with scaling a cluster, you can handle growth in online data volume and concurrency with confidence.


Origin blog.csdn.net/feiying0canglang/article/details/128920351