Redis Cluster data sharding mechanism

Redis Cluster data sharding mechanism that advanced developers have to understand

Introduction to Redis Cluster

Redis Cluster is Redis's distributed solution. It was officially released in version 3.0 and addresses Redis's need for a distributed deployment.

A Redis Cluster is generally composed of multiple nodes. A complete high-availability cluster needs at least 6 nodes: three master nodes and three slave nodes. The three master nodes are assigned slots and handle client command requests, while a slave node can replace its master node after that master fails.

 

 

As shown in the above figure, the cluster contains 6 Redis nodes, 3 masters and 3 slaves, namely M1, M2, M3, S1, S2, and S3. In addition to data replication between master and slave nodes, all Redis nodes communicate over the Gossip protocol to exchange and maintain node metadata.

Generally speaking, master nodes handle both read and write operations from clients, while slave nodes handle only read operations (in Redis Cluster, a slave node serves reads only after the client has sent READONLY on that connection).

Data sharding strategy

The most important point in a distributed data storage solution is data sharding, also known as partitioning.

To let the cluster scale horizontally, the first problem to solve is how to distribute the entire data set across multiple nodes according to certain rules. Common data sharding methods include range sharding, hash sharding, consistent hashing, and virtual hash slots.

Range sharding assumes that the data set is ordered and stores adjacent data together, which supports traversal (range scan) operations well. Its disadvantage is that sequential writes create hot spots. For example, log entries are generally ordered by time, and time increases monotonically, so the write hot spot is always in the last shard.
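The hot spot can be seen in a minimal sketch of range sharding. The shard boundaries and the integer timestamps here are made up for illustration and are not taken from any real system.

```python
from bisect import bisect_right

# Hypothetical upper bounds: shard 0 holds keys below 1000, shard 1 holds
# [1000, 2000), shard 2 holds [2000, 3000), and shard 3 holds everything above.
BOUNDS = [1000, 2000, 3000]


def range_shard(key: int) -> int:
    """Return the index of the shard whose key range contains `key`."""
    return bisect_right(BOUNDS, key)


# Monotonically increasing timestamps (a log-style workload) all land in the
# last shard, which is the write hot spot described above.
recent_timestamps = range(3500, 3510)
shards_hit = {range_shard(t) for t in recent_timestamps}
```

Adjacent keys stay in the same shard, which is what makes range scans cheap, but every new write goes to shard 3.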

 

 

Relational databases frequently need table scans or index scans, so they generally use a range sharding strategy.

Redis Cluster uses virtual hash slot partitioning. Every key is mapped by a hash function to an integer slot in the range 0 to 16383. The formula is slot = CRC16(key) & 16383. Each node is responsible for a subset of the slots and for the key-value data mapped to those slots.
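The slot formula can be sketched in a few lines of Python. The article does not say which CRC16 variant is meant. The Redis Cluster specification uses the XMODEM variant (polynomial 0x1021, initial value 0), which this sketch assumes. The function names are illustrative, and hash tags are ignored here.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """slot = CRC16(key) & 16383, so the result is always in 0..16383."""
    return crc16_xmodem(key.encode("utf-8")) & 16383
```

Because 16384 is a power of two, the bitwise AND with 16383 is equivalent to taking CRC16(key) modulo 16384.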

Features of Redis virtual slot partition:

  • Decoupling data from nodes makes node expansion and contraction easier.
  • Nodes maintain the slot mapping themselves, so neither the client nor a proxy service has to maintain slot partition metadata.
  • Mapping queries between nodes, slots, and keys are supported, which is used for data routing, online cluster scaling, and other scenarios.

 

 

Redis Cluster provides flexible solutions for adding and removing nodes. Without affecting the service the cluster provides, you can add nodes to expand capacity or take nodes offline to reduce it. The slot is the basic unit in which Redis Cluster manages data, and cluster scaling is the movement of slots and their data between nodes.

Let us first look at the principle of Redis cluster scaling, and then at how the cluster stays available while data is being migrated between nodes or while a fault is being recovered.

Expanding the cluster

To help readers better understand how capacity is expanded while the cluster is online, we use Redis Cluster commands to walk through the entire process.

 

 

When a new Redis node is started and joins an existing cluster, we need to migrate slots and data to it. First, draw up a migration plan for the new node so that every node is responsible for a similar number of slots after the migration, which keeps the data evenly distributed across the nodes.
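One way to draw up such a plan is sketched below. It assumes the existing masters are already roughly balanced and that exactly one master is being added. The function name and the node labels are illustrative, not part of Redis.

```python
from typing import Dict


def plan_expansion(slot_counts: Dict[str, int]) -> Dict[str, int]:
    """Return how many slots each existing master hands to one new master.

    `slot_counts` maps each existing master to the number of slots it owns.
    """
    nodes_after = len(slot_counts) + 1
    base, extra = divmod(sum(slot_counts.values()), nodes_after)
    plan = {}
    # After the move, the `extra` largest masters keep base + 1 slots; all
    # other masters, and the new node, end up with `base` slots.
    largest_first = sorted(slot_counts, key=slot_counts.get, reverse=True)
    for rank, node in enumerate(largest_first):
        keep = base + 1 if rank < extra else base
        plan[node] = max(0, slot_counts[node] - keep)
    return plan


# Three masters that share the 16384 slots almost evenly.
plan = plan_expansion({"M1": 5461, "M2": 5462, "M3": 5461})
```

For this input every node, including the new one, ends up with 4096 slots. Which specific slots move is a separate decision that the sketch leaves open.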

1) First start a Redis node, denoted as M4.
2) Use the cluster meet command to add the new Redis node to the cluster. At first the new node is a master node, but because it is not responsible for any slot, it cannot accept any read or write operations. We will migrate slots and data to it in the following steps.
3) Send the cluster setslot {slot} importing {sourceNodeId} command to the M4 node so that the target node is ready to import the slot's data.
4) Send the cluster setslot {slot} migrating {targetNodeId} command to the source node (M1, M2, or M3) so that the source node is ready to move the slot's data out.
5) On the source node, run the cluster getkeysinslot {slot} {count} command to obtain count keys belonging to slot {slot}, and then perform step 6 to migrate the key-value data.
6) On the source node, run the migrate {targetNodeIp} {targetNodePort} "" 0 {timeout} keys {key ...} command to migrate the obtained keys to the target node in batches through the pipeline mechanism. The batch form of the migrate command is available in Redis 3.0.6 and above.
7) Repeat steps 5 and 6 until all key-value data in the slot has been migrated to the target node.
8) Send the cluster setslot {slot} node {targetNodeId} command to all master nodes in the cluster to announce that the slot is now assigned to the target node. To make sure the new slot-to-node mapping propagates promptly, send the command to every master node so that each one updates the migrated slot to point to the new node.
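The order of steps 3 to 8 for a single slot can be sketched as a generator that only builds the command strings and sends nothing to a server. The `key_batches` argument stands in for the keys that cluster getkeysinslot would return. The short node labels and the 5000 ms timeout are assumptions for illustration (real node IDs are 40-character hex strings).

```python
from typing import Iterable, Iterator, List, Tuple

Command = Tuple[str, str]  # (node that receives the command, command text)


def migrate_slot_commands(
    slot: int,
    source_id: str,
    target_id: str,
    target_host: str,
    target_port: int,
    key_batches: Iterable[List[str]],
    master_ids: Iterable[str],
    timeout_ms: int = 5000,
) -> Iterator[Command]:
    # Steps 3 and 4: mark the slot as importing on the target and as
    # migrating on the source.
    yield (target_id, f"CLUSTER SETSLOT {slot} IMPORTING {source_id}")
    yield (source_id, f"CLUSTER SETSLOT {slot} MIGRATING {target_id}")
    # Steps 5 to 7: fetch a batch of keys, move it, repeat until none remain.
    for keys in key_batches:
        yield (source_id, f"CLUSTER GETKEYSINSLOT {slot} {len(keys)}")
        yield (
            source_id,
            f'MIGRATE {target_host} {target_port} "" 0 {timeout_ms} KEYS '
            + " ".join(keys),
        )
    # Step 8: tell every master that the slot now belongs to the target.
    for master_id in master_ids:
        yield (master_id, f"CLUSTER SETSLOT {slot} NODE {target_id}")


commands = list(
    migrate_slot_commands(
        42, "M1", "M4", "127.0.0.1", 6385,
        key_batches=[["k1", "k2"], ["k3"]],
        master_ids=["M1", "M2", "M3", "M4"],
    )
)
```

A real tool would take each batch from the reply to cluster getkeysinslot and stop when that reply is empty, rather than receiving the batches up front.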

Shrinking the cluster

Shrinking the cluster means taking a Redis node offline. The process is as follows.

1) First, check whether the node going offline is responsible for any slots. If it is, move those slots to other nodes so that the slot-to-node mapping of the whole cluster stays complete after the node goes offline.
2) Once the node going offline is no longer responsible for any slot, or if it is a slave node, tell the other nodes in the cluster to forget it. After all nodes have forgotten the node, it can be shut down normally.

A node going offline must migrate its slots to other nodes. The principle is the same as the slot migration process used for expansion.

 

 

After migrating the slots, you also need to tell every node in the cluster to forget the node going offline, so that the other nodes no longer exchange Gossip messages with it.

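Redis Cluster uses the cluster forget {downNodeId} command to add the specified node to a ban list. Other nodes no longer exchange Gossip messages with a node on the ban list. A ban list entry expires after 60 seconds, so the command must reach every node in the cluster within that window.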
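The two-step rule for taking a node offline can be sketched as a small helper that refuses to proceed while the node still owns slots and otherwise builds one cluster forget command per remaining node. The function name and node labels are hypothetical, and nothing is sent to a server.

```python
from typing import Iterable, List, Tuple


def forget_commands(
    down_node_id: str,
    owned_slots: Iterable[int],
    other_node_ids: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (receiver, command) pairs that make the cluster forget a node."""
    if list(owned_slots):
        # Step 1 is not finished: the slots must be migrated away first.
        raise ValueError("node still owns slots; migrate them before forgetting it")
    # Step 2: every remaining node must be told to forget the node.
    return [(node_id, f"CLUSTER FORGET {down_node_id}") for node_id in other_node_ids]


commands = forget_commands("M4", owned_slots=[], other_node_ids=["M1", "M2", "M3"])
```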

Client routing

In cluster mode, when a Redis node receives any key-related command, it first calculates the slot for the key and then finds the node responsible for that slot. If that node is itself, it processes the command. Otherwise, it returns a MOVED redirection error that tells the client which node to send the request to. This process is called MOVED redirection.

Note that Redis does not always hash the whole key when calculating the slot. When the key contains braces, only the content inside the braces is hashed. For example, when the key is user:{10000}:books, only 10000 is used to calculate the hash value.
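The brace rule can be sketched as a function that returns the part of the key that gets hashed. The details beyond the article's example follow the Redis Cluster specification as an assumption: only the first "{" and the first "}" after it count, and empty braces mean the whole key is hashed. The function name is illustrative.

```python
def hashed_part(key: str) -> str:
    """Return the substring of `key` that is fed to the slot hash function."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Use the tag only if a closing brace exists and the tag is not empty.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key
```

Keys that share the same tag, such as user:{10000}:books and user:{10000}:orders, therefore map to the same slot and can be used together in multi-key commands.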

In the MOVED error example, the hash slot of key x is 3999, and the node responsible for this slot is at IP and port 127.0.0.1:6381. The client must resend the GET command to the node at that IP and port.

GET x
(error) MOVED 3999 127.0.0.1:6381

Because request redirection adds IO overhead, relying on it is not an efficient way to use a Redis cluster. A Smart cluster client should be used instead. By maintaining the mapping between slots and Redis nodes internally, the Smart client can resolve a key to its node locally, which maximizes IO efficiency. MOVED redirection then only serves to help the client update that mapping.
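The local mapping a Smart client keeps can be sketched as a flat table with one entry per slot. The class name, method names, and node addresses are hypothetical. A real client would fill the table from the cluster's own slot information at startup.

```python
from typing import List, Optional, Tuple

Address = Tuple[str, int]  # (host, port)


class SlotCache:
    """Client-side table from slot number to the node that serves it."""

    SLOT_COUNT = 16384

    def __init__(self) -> None:
        self._owners: List[Optional[Address]] = [None] * self.SLOT_COUNT

    def assign_range(self, first: int, last: int, node: Address) -> None:
        """Record that slots first..last (inclusive) are served by `node`."""
        for slot in range(first, last + 1):
            self._owners[slot] = node

    def node_for(self, slot: int) -> Optional[Address]:
        """Look up the node for a slot locally, with no network round trip."""
        return self._owners[slot]


cache = SlotCache()
cache.assign_range(0, 5460, ("10.0.0.1", 6379))
cache.assign_range(5461, 10922, ("10.0.0.2", 6379))
cache.assign_range(10923, 16383, ("10.0.0.3", 6379))
```

With the table filled, the client computes the slot of a key and sends the command straight to node_for(slot), so a redirection is needed only when the table is stale.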

Redis Cluster supports migrating slots and data online to scale horizontally. While the data of a slot is being migrated from the source node to the target node, the client must handle redirection intelligently so that key commands still execute normally. For example, during the migration some of the slot's data may be on the source node and the rest on the target node.

 

 

Therefore, based on the above situation, the client command execution process is as follows:

  • The client sends the command to a node according to its local slot cache. If that node holds the key, it executes the command directly and returns the result to the client.
  • If the node returns a MOVED error, the client updates its local slot-to-node mapping and then re-sends the request.
  • If the slot is being migrated and the key is no longer on the source node, the node replies with an ASK redirection error in the following format: (error) ASK {slot} {targetIP}:{targetPort}

The client extracts the target node information from the ASK redirection error, sends an ASKING command to the target node to set a one-time flag on the client connection, and then re-sends the key command to that node.

Although ASK and MOVED both control the redirection of the client, they are essentially different. ASK redirection indicates that the cluster is undergoing slot data migration. The client cannot know when the migration is complete, so it can only be a temporary redirection. The client does not update the mapping cache from slot to Redis node. However, the MOVED redirection indicates that the slot corresponding to the key has been explicitly assigned to the new node, so the mapping cache from slot to Redis node needs to be updated.
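The difference between the two redirections can be sketched as a client-side helper. It assumes the error text arrives without the "(error)" prefix, for example "MOVED 3999 127.0.0.1:6381", and it uses a plain dict as the slot cache. The function names are illustrative.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (host, port)


def parse_redirect(error: str) -> Tuple[str, int, Address]:
    """Split 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'."""
    kind, slot, location = error.split()
    if kind not in ("MOVED", "ASK"):
        raise ValueError(f"not a redirection error: {error!r}")
    host, _, port = location.rpartition(":")
    return kind, int(slot), (host, int(port))


def follow_redirect(
    error: str, command: str, slot_cache: Dict[int, Address]
) -> Tuple[Address, List[str]]:
    """Return the node to contact next and the commands to send to it."""
    kind, slot, target = parse_redirect(error)
    if kind == "MOVED":
        # The slot has a new permanent owner, so remember it.
        slot_cache[slot] = target
        return target, [command]
    # ASK is temporary: leave the cache alone and prefix the retry with ASKING.
    return target, ["ASKING", command]
```

After a MOVED reply, later commands for that slot go directly to the new node. After an ASK reply, only this one retry goes to the target node.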

Failover

When a small number of nodes in the Redis cluster fail, automatic failover is used to ensure that the cluster can provide services to the outside world normally.

When a Redis master node is judged to be objectively offline, the Redis cluster chooses one of its slave nodes to replace it, which keeps the cluster highly available. This is not the focus of this article, so interested readers can study it on their own.

One thing to note, however: by default, if any of the cluster's 16384 slots is not assigned to a node, the entire cluster is unavailable, and any key command returns the error (error) CLUSTERDOWN Hash slot not served. When a master node that holds slots goes offline, the entire cluster is therefore unavailable from the moment the fault is detected until automatic failover completes. Most businesses cannot tolerate this, so it is recommended to set the parameter cluster-require-full-coverage to no. Then, when a master node fails, only commands for the slots it is responsible for are affected, and the other master nodes remain available.
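The effect of cluster-require-full-coverage can be sketched as a simplified model. This is not how Redis implements it internally. The function name and the dict of live slot owners are assumptions made for illustration.

```python
from typing import Dict

SLOT_COUNT = 16384


def can_serve(slot: int, live_owner: Dict[int, str], require_full_coverage: bool = True) -> bool:
    """Model whether a key command for `slot` is served.

    `live_owner` maps each slot that currently has a reachable master to
    that master's label; slots of a failed master are missing from it.
    """
    if require_full_coverage and len(live_owner) < SLOT_COUNT:
        # Default behaviour: one uncovered slot makes every key command fail.
        return False
    # With the option set to no, only commands for uncovered slots fail.
    return slot in live_owner


# Slot 100 has lost its master; every other slot is still covered.
owners = {slot: "M1" for slot in range(SLOT_COUNT) if slot != 100}
```

In this model a command for slot 5 fails under the default setting but succeeds once the option is set to no, while a command for slot 100 fails either way until failover completes.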


Origin www.cnblogs.com/xiaozengzeng/p/12682496.html