Redis Cluster: Expansion Process (Principles)

Introduction

Description

This article describes the process (and underlying principles) of Redis Cluster expansion.

Overview

Capacity expansion is the most common requirement for distributed storage. Redis cluster capacity expansion can be divided into the following steps:

  1. Prepare new nodes.
  2. Join the cluster.
  3. Migrate slots and data.

1. Prepare a new node

The new nodes need to be prepared in advance and run in cluster mode. It is recommended that their configuration be consistent with the nodes already in the cluster, which simplifies unified management. Once the configuration is ready, start the two new nodes with the following commands:

redis-server conf/redis-6385.conf
redis-server conf/redis-6386.conf
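
Each of these configuration files is an ordinary Redis configuration with cluster mode switched on. As a minimal sketch (the values below are illustrative assumptions; cluster-enabled yes is the directive that makes the node run in cluster mode), conf/redis-6385.conf might look like:

port 6385
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file "nodes-6385.conf"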

After startup, the new nodes run as orphan nodes; no other node communicates with them. The cluster structure is shown in the figure below.

2. Join the cluster

New nodes join an existing cluster through the cluster meet command. Execute cluster meet on any node in the cluster to bring nodes 6385 and 6386 in. The commands are as follows:

127.0.0.1:6379> cluster meet 127.0.0.1 6385
127.0.0.1:6379> cluster meet 127.0.0.1 6386

After the new node joins, the cluster structure is shown in the figure below:

After a period of ping/pong message exchange between the old and new nodes, all nodes in the cluster discover the new nodes and save their state locally. For example, we can see the new node information by executing the cluster nodes command on node 6380, as follows:

127.0.0.1:6380>cluster nodes
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 1469347800759
    7 connected
475528b1bcf8e74d227104a6cf1bf70f00c24aae 127.0.0.1:6386 master - 0 1469347798743
    8 connected
    ...

A new node starts out in the master state, but because it is not yet responsible for any slots, it cannot accept any read or write operations. For a new node, there are generally two subsequent options:

  • Migrate slots and data to it, expanding the cluster's capacity.
  • Make it a slave of another master so it can take part in failover.

The redis-trib.rb tool also provides a command for adding a new node to an existing cluster, including support for adding it directly as a slave node. The command is as follows:

redis-trib.rb add-node new_host:new_port existing_host:existing_port --slave
--master-id <arg>

Internally it also uses the cluster meet command to join the cluster. The join performed earlier could equally have been done with the following commands:

redis-trib.rb add-node 127.0.0.1:6385 127.0.0.1:6379
redis-trib.rb add-node 127.0.0.1:6386 127.0.0.1:6379

Operation and maintenance tips

In a production environment, it is recommended to add new nodes with the redis-trib.rb add-node command. This command checks the status of the new node internally; if the node has already joined another cluster or already contains data, the join is abandoned and the following message is printed:

[ERR] Node 127.0.0.1:6385 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

If we instead manually run cluster meet against a node that already belongs to another cluster, that node's cluster will be merged into the current cluster, causing data loss and confusion. The consequences are severe, so be cautious when operating online.

3. Migrate slots and data

After the new node joins the cluster, slots and the related data need to be migrated to it. During migration, the cluster continues to serve reads and writes normally. Slot migration is the core step of cluster expansion and is explained in detail below.

1. Slot Migration Plan

Slots are the basic unit by which a Redis cluster manages data. First, a slot migration plan must be drawn up for the new node, determining which slots move from the existing nodes to it. The plan should leave each node responsible for a similar number of slots so that data remains evenly distributed across nodes. For example, node 6385 is added to the cluster as shown in the figure below. After 6385 joins, the number of slots each original node is responsible for drops from about 5461 to 4096 (16384 slots ÷ 3 ≈ 5461; 16384 ÷ 4 = 4096).

After the slot migration plan is determined, the data in the slots will be migrated from the source node to the target node one by one, as shown in the figure below.

2. Migrating data

The data migration process is performed slot by slot, and the flow of data migration for each slot is shown in the figure below.

Flow Description:

  • 1) Send the cluster setslot {slot} importing {sourceNodeId} command to the target node so that it prepares to import the slot's data.
  • 2) Send the cluster setslot {slot} migrating {targetNodeId} command to the source node so that it prepares to migrate the slot's data out.
  • 3) On the source node, execute cluster getkeysinslot {slot} {count} in a loop to obtain up to count keys belonging to slot {slot}.
  • 4) On the source node, execute migrate {targetIp} {targetPort} "" 0 {timeout} keys {keys...} to move the obtained keys to the target node in one batch over a pipelined connection. The batch form of migrate (with the keys option) is available from Redis 3.0.6 onward; earlier versions can migrate only a single key per call. When a slot holds many keys, batch migration greatly reduces the number of network round trips between nodes.
  • 5) Repeat steps 3) and 4) until all key-value data in the slot has been migrated to the target node.
  • 6) Send the cluster setslot {slot} node {targetNodeId} command to every master node in the cluster, announcing that the slot has been assigned to the target node. To ensure the change in the slot-to-node mapping propagates promptly, the command must be sent to all master nodes so that they update the migrated slot to point to the new node.

The migration process can be simulated with pseudocode as follows:
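
Below is a minimal sketch of this loop in Python using the redis-py client. The function name move_slot, the batch size, and the timeout are illustrative assumptions; the addresses and node ids match the manual example that follows.

import redis

def move_slot(source, source_id, target_host, target_port, target_id,
              masters, slot, batch=100, timeout=5000):
    # Migrate one slot from the source node to the node at target_host:target_port.
    target = redis.Redis(host=target_host, port=target_port)
    # 1) target node: prepare to import the slot
    target.execute_command('CLUSTER', 'SETSLOT', slot, 'IMPORTING', source_id)
    # 2) source node: prepare to migrate the slot out
    source.execute_command('CLUSTER', 'SETSLOT', slot, 'MIGRATING', target_id)
    while True:
        # 3) fetch up to `batch` keys still stored in this slot on the source node
        keys = source.execute_command('CLUSTER', 'GETKEYSINSLOT', slot, batch)
        if not keys:
            break  # 5) every key in the slot has been moved
        # 4) move the batch to the target node (MIGRATE ... KEYS needs Redis 3.0.6+)
        source.execute_command('MIGRATE', target_host, target_port, '', 0,
                               timeout, 'KEYS', *keys)
    # 6) tell every master node that the slot now belongs to the target node
    for node in masters:
        node.execute_command('CLUSTER', 'SETSLOT', slot, 'NODE', target_id)

For example, migrating slot 4096 from node 6379 to node 6385 (the same operation performed manually below) would be:

source = redis.Redis(host='127.0.0.1', port=6379)
masters = [redis.Redis(host='127.0.0.1', port=p) for p in (6379, 6380, 6381, 6385)]
move_slot(source, 'cfb28ef1deee4e0fa78da86abe5d24566744411e',
          '127.0.0.1', 6385, '1a205dd8b2819a00dd1e8b6be40a8e2abe77b756',
          masters, 4096)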

Following the process above, we manually migrate slot 4096, currently owned by source node 6379, to target node 6385. The steps are as follows:

1) The target node is ready to import slot 4096 data:

127.0.0.1:6385>cluster setslot 4096 importing cfb28ef1deee4e0fa78da86abe5d24566744411e
OK

Confirm that slot 4096 import status is enabled:

127.0.0.1:6385>cluster nodes
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 myself,master - 0 0 7 connected
    [4096-<-cfb28ef1deee4e0fa78da86abe5d24566744411e]
...

2) The source node prepares to export slot 4096 data:

127.0.0.1:6379>cluster setslot 4096 migrating 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756
OK

Confirm that slot 4096 export status is enabled:

127.0.0.1:6379>cluster nodes
cfb28ef1deee4e0fa78da86abe5d24566744411e 127.0.0.1:6379 myself,master - 0 0 0 connected
    0-5461 [4096->-1a205dd8b2819a00dd1e8b6be40a8e2abe77b756]
...

3) Fetch the keys belonging to slot 4096 in batches; here the slot holds 3 keys:

127.0.0.1:6379> cluster getkeysinslot 4096 100
1) "key:test:5028"
2) "key:test:68253"
3) "key:test:79212"

Confirm that these three keys exist on the source node:

127.0.0.1:6379>mget key:test:5028 key:test:68253 key:test:79212
1) "value:5028"
2) "value:68253"
3) "value:79212"

Migrate these 3 keys in a single batch; the migrate command guarantees that the migration of each key is atomic:

127.0.0.1:6379>migrate 127.0.0.1 6385 "" 0 5000 keys key:test:5028 key:test:68253
key:test:79212

For demonstration purposes, we query the three keys again and find that they are no longer on the source node; Redis returns an ASK redirection error. ASK redirection guides the client to the node where the data now resides; the details are covered later in Section 10.5, "Request Routing".

127.0.0.1:6379> mget key:test:5028 key:test:68253 key:test:79212
(error) ASK 4096 127.0.0.1:6385
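
A cluster-aware client follows this redirection automatically. Done by hand it looks roughly like the redis-py sketch below (the single_connection_client option keeps ASKING and the retried command on the same connection; the address and key come from the ASK reply and the example above):

import redis

# Connect to the importing node named in the ASK reply.
target = redis.Redis(host='127.0.0.1', port=6385, single_connection_client=True)
# ASKING permits exactly one following command against the still-importing slot.
target.execute_command('ASKING')
print(target.get('key:test:5028'))  # now answered by node 6385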

Notify all master nodes that slot 4096 is assigned to target node 6385:

127.0.0.1:6379>cluster setslot 4096 node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756
127.0.0.1:6380>cluster setslot 4096 node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756
127.0.0.1:6381>cluster setslot 4096 node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756
127.0.0.1:6385>cluster setslot 4096 node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756

Confirm that source node 6379 is no longer responsible for slot 4096, which is now handled by target node 6385:

127.0.0.1:6379> cluster nodes
cfb28ef1deee4e0fa78da86abe5d24566744411e 127.0.0.1:6379 myself,master - 0 0 0 connected
    0-4095 4097-5461
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 1469718011079 7
    connected 4096
...

The commands above were executed manually to demonstrate slot migration so readers can better understand the process. A real migration involves a large number of slots, each containing many keys, so redis-trib provides a slot resharding function. The command is as follows:

redis-trib.rb reshard host:port --from <arg> --to <arg> --slots <arg> --yes --timeout
    <arg> --pipeline <arg>

Parameter Description:

  • host:port: required; the address of any node in the cluster, used to obtain information about the whole cluster.
  • --from: the id of the source node(s); separate multiple ids with commas, or use all to take every master node in the cluster as a source. If omitted, the user is prompted for it during the migration.
  • --to: the id of the target node to migrate to; only one target node can be specified. If omitted, the user is prompted for it during the migration.
  • --slots: the total number of slots to migrate. If omitted, the user is prompted for it during the migration.
  • --yes: whether the user must type yes to confirm the printed reshard plan before resharding proceeds.
  • --timeout: the timeout of each migrate operation; the default is 60000 milliseconds.
  • --pipeline: the number of keys migrated per batch; the default is 10.
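
Supplying these parameters up front lets reshard run non-interactively. A sketch following the syntax above (the node ids are the ones used throughout this article; verify the exact flag behaviour against your redis-trib.rb version):

redis-trib.rb reshard 127.0.0.1:6379 --slots 4096 --yes \
    --to 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 \
    --from cfb28ef1deee4e0fa78da86abe5d24566744411e,8e41673d59c9568aa9d29fb174ce733345b3e8f1,40b8d09d44294d2e23c7c768efc8fcd153446746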

The reshard command simplifies the work of data migration; internally, each slot is migrated using the same process described earlier. We have already migrated slot 4096 to the new node 6385 by hand; the remaining slot migration is done with redis-trib.rb, as follows:

#redis-trib.rb reshard 127.0.0.1:6379
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: cfb28ef1deee4e0fa78da86abe5d24566744411e 127.0.0.1:6379
slots:0-4095,4097-5461 (5461 slots) master
1 additional replica(s)
M: 40b8d09d44294d2e23c7c768efc8fcd153446746 127.0.0.1:6381
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 8e41673d59c9568aa9d29fb174ce733345b3e8f1 127.0.0.1:6380
slots:5462-10922 (5461 slots) master
1 additional replica(s)
M: 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385
slots:4096 (1 slots) master
0 additional replica(s)
// ...
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

After printing the information for each node in the cluster, the reshard command asks how many slots to migrate; here we enter 4096:

How many slots do you want to move (from 1 to 16384)4096

Enter the node ID of 6385 as the target node; only one target node can be specified:

What is the receiving node ID 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756

Then enter the source node IDs. Here we enter the IDs of nodes 6379, 6380, and 6381 in turn, and finish with done:

Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:cfb28ef1deee4e0fa78da86abe5d24566744411e
Source node #2:8e41673d59c9568aa9d29fb174ce733345b3e8f1
Source node #3:40b8d09d44294d2e23c7c768efc8fcd153446746
Source node #4:done

Before migrating any data, reshard prints the plan of all slots to be moved from the source nodes to the target node. After confirming that the plan is correct, enter yes to start the migration:

Moving slot 0 from cfb28ef1deee4e0fa78da86abe5d24566744411e
....
Moving slot 1365 from cfb28ef1deee4e0fa78da86abe5d24566744411e
Moving slot 5462 from 8e41673d59c9568aa9d29fb174ce733345b3e8f1
...
Moving slot 6826 from 8e41673d59c9568aa9d29fb174ce733345b3e8f1
Moving slot 10923 from 40b8d09d44294d2e23c7c768efc8fcd153446746
...
Moving slot 12287 from 40b8d09d44294d2e23c7c768efc8fcd153446746
Do you want to proceed with the proposed reshard plan (yes/no) yes

The redis-trib tool will print out the progress of each slot migration, as follows:

Moving slot 0 from 127.0.0.1:6379 to 127.0.0.1:6385 ....
....
Moving slot 1365 from 127.0.0.1:6379 to 127.0.0.1:6385 ..
Moving slot 5462 from 127.0.0.1:6380 to 127.0.0.1:6385: ....
....
Moving slot 6826 from 127.0.0.1:6380 to 127.0.0.1:6385 ..
Moving slot 10923 from 127.0.0.1:6381 to 127.0.0.1:6385 ..
...
Moving slot 12287 from 127.0.0.1:6381 to 127.0.0.1:6385 ..

When all slots have been migrated, the reshard command exits automatically. Execute the cluster nodes command to check how the node-to-slot mapping has changed:

127.0.0.1:6379>cluster nodes
40622f9e7adc8ebd77fca0de9edfe691cb8a74fb 127.0.0.1:6382 slave cfb28ef1deee4e0fa
78da86abe5d24566744411e 0 1469779084518 3 connected
40b8d09d44294d2e23c7c768efc8fcd153446746 127.0.0.1:6381 master - 0
1469779085528 2 connected 12288-16383
4fa7eac4080f0b667ffeab9b87841da49b84a6e4 127.0.0.1:6384 slave 40b8d09d44294d2e2
3c7c768efc8fcd153446746 0 1469779087544 5 connected
be9485a6a729fc98c5151374bc30277e89a461d8 127.0.0.1:6383 slave 8e41673d59c9568aa
9d29fb174ce733345b3e8f1 0 1469779088552 4 connected
cfb28ef1deee4e0fa78da86abe5d24566744411e 127.0.0.1:6379 myself,master - 0 0 0
connected 1366-4095 4097-5461
475528b1bcf8e74d227104a6cf1bf70f00c24aae 127.0.0.1:6386 master - 0
1469779086536 8 connected
8e41673d59c9568aa9d29fb174ce733345b3e8f1 127.0.0.1:6380 master - 0
1469779085528 1 connected 6827-10922
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0
1469779083513 9 connected 0-1365 4096 5462-6826 10923-12287

The slots node 6385 is responsible for become: 0-1365 4096 5462-6826 10923-12287. Because slot numbers carry no meaning for the hash operation, there is no need for a node to own a contiguous range of slots. After migration, it is recommended to run the redis-trib.rb rebalance command to check how evenly slots are distributed among nodes. The command is as follows:

# redis-trib.rb rebalance 127.0.0.1:6380
>>> Performing Cluster Check (using node 127.0.0.1:6380)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** No rebalancing needed! All nodes are within the 2.0% threshold.

As can be seen, the number of slots each master node is responsible for after migration differs by less than 2%, so data is distributed fairly evenly across the cluster and no adjustment is needed.

3. Add a slave node

At the start of the expansion we added nodes 6385 and 6386 to the cluster. Node 6385 has taken on some slots and data as a master, but unlike the other master nodes it currently has no slave, so it cannot fail over.

Node 6386 should now become the slave of 6385 to keep the whole cluster highly available. Use the cluster replicate {masterNodeId} command, executed on the intended slave, to attach it to its master. Note that in cluster mode the slaveof command is no longer supported for adding slave nodes. As follows:

127.0.0.1:6386>cluster replicate 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756

Inside the slave node, this command both initiates a full replication from the master and updates the local node's cluster state. Check node 6386 to confirm that it has become a slave of node 6385:

127.0.0.1:6386>cluster nodes
475528b1bcf8e74d227104a6cf1bf70f00c24aae 127.0.0.1:6386 myself,slave 1a205dd8b2
819a00dd1e8b6be40a8e2abe77b756 0 0 8 connected
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 1469779083513 9
connected 0-1365 4096 5462-6826 10923-12287
...

At this point, the expansion of the entire cluster is completed, and the cluster relationship structure is shown in the figure below.


Origin blog.csdn.net/feiying0canglang/article/details/128917994