Introduction to Redis Cluster

Redis is an open source, in-memory data structure store that is widely used for its performance and rich data structures. Under large data volumes and highly concurrent requests, however, a single Redis instance may no longer be enough, and this is where Redis cluster mode comes in. Cluster mode improves the availability and reliability of the data as well as the performance and scalability of the system. This article introduces the basic concepts of Redis cluster and then covers how it works, how it fails over, and how it is expanded and reduced.



1. Introduction to Redis cluster mode

1.1. Overview of Redis cluster mode

Redis cluster mode is the distributed solution provided by Redis. Sentinel addresses high availability only, whereas cluster mode addresses both high availability and data distribution. In cluster mode, data is spread across multiple Redis nodes, and each node stores only part of the whole data set. This approach is called data sharding.

Redis Cluster does not use consistent hashing; instead it introduces the concept of hash slots. A Redis cluster has 16384 hash slots. When a key-value pair is stored in the cluster, Redis computes the CRC16 checksum of the key and takes the result modulo 16384. The result is the number of the hash slot in which the key is placed.

Each Redis node is responsible for a subset of the hash slots. For example, in a Redis cluster with 3 nodes, node A may be responsible for hash slots 0-5500, node B for hash slots 5501-11000, and node C for hash slots 11001-16383.

In this way, when a key needs to be accessed (read, write), the Redis cluster will calculate the hash slot number based on the key name, and then find the node responsible for the hash slot.
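
The lookup described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: it hashes the whole key (real Redis Cluster additionally honors hash tags written as {...}, which are left out here), it relies on Python's binascii.crc_hqx with an initial value of 0 as the CRC16 variant, and SLOT_RANGES simply copies the three-node example from the text. The names key_slot and node_for_key are made up for this sketch.

```python
import binascii

NUM_SLOTS = 16384

# Slot ranges from the three-node example above (inclusive bounds).
SLOT_RANGES = {
    "A": (0, 5500),
    "B": (5501, 11000),
    "C": (11001, 16383),
}


def key_slot(key: str) -> int:
    """Return CRC16(key) mod 16384, hashing the whole key (no hash tags)."""
    # crc_hqx with initial value 0 is CRC16 with polynomial 0x1021 (XMODEM).
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS


def node_for_key(key: str) -> str:
    """Find the node whose slot range contains the key's slot."""
    slot = key_slot(key)
    for node, (low, high) in SLOT_RANGES.items():
        if low <= slot <= high:
            return node
    raise ValueError(f"slot {slot} is not assigned to any node")


if __name__ == "__main__":
    for key in ("foo", "hello"):
        print(key, key_slot(key), node_for_key(key))
```

On a real cluster, CLUSTER KEYSLOT <key> returns the slot number that the server itself computes, which is a convenient way to cross-check a sketch like this.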

Redis cluster also supports master-slave replication. Each master node can have zero or more slave nodes, and data is replicated from the master node to its slaves. When a master node goes down, one of its slave nodes can be promoted to master and continue to provide service.

Redis cluster provides high availability and data distribution, but it adds complexity for clients and operators: transactions and Lua scripts that touch keys on different nodes need special handling, and hash slots have to be redistributed whenever nodes are added or removed.

1.2. Virtual slot partition of Redis cluster

In distributed storage, data sets must be mapped to multiple nodes according to partitioning rules. There are three common data partitioning rules: node remainder partitioning, consistent hash partitioning, and virtual slot partitioning.

Node remainder partitioning (Modulo Partitioning): This method maps data to nodes by taking a remainder. For example, we can take the user ID modulo the number of nodes and store the data on the node with that index. The advantage of this method is that it is simple to implement and the data distribution is relatively uniform. However, when the number of nodes changes, most of the data has to be reassigned, which results in a large amount of data migration.
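
To get a feel for how much data is reassigned, here is a small sketch that counts how many integer keys change node when a cluster grows from 3 to 4 nodes. The function name moved_fraction and the key range are assumptions chosen for illustration, with consecutive integers standing in for user IDs.

```python
def moved_fraction(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Fraction of integer keys whose node index changes with the node count."""
    moved = sum(
        1 for key in range(num_keys) if key % old_nodes != key % new_nodes
    )
    return moved / num_keys


if __name__ == "__main__":
    # Going from 3 to 4 nodes: only keys with key % 12 in {0, 1, 2} stay put.
    print(moved_fraction(12_000, 3, 4))
```

For this input, three quarters of the keys end up on a different node, which is the migration cost that the other two partitioning schemes try to avoid.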

Consistent Hashing Partitioning: This method maps data to different nodes through a consistent hash algorithm. The advantage of the consistent hash algorithm is that when the number of nodes changes, only a small part of the data on the hash ring needs to be migrated, which greatly reduces the cost of data migration. At the same time, the consistent hashing algorithm can also ensure relatively uniform data distribution.

For example, in the picture below, Key1 and Key2 will fall into Node1, Key3 and Key4 will fall into Node2, Key5 will fall into Node3, and Key6 will fall into Node4.

[Figure: consistent hash ring with Node1 to Node4 and Key1 to Key6]

However, consistent hashing still has problems. When the cache nodes are unevenly distributed on the ring, some of them come under much greater pressure than others. And when a node fails, all of the requests it was handling move to the next node on the ring, which puts sudden extra load on that node.
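
A hash ring can be sketched as a sorted list of node positions that is searched with bisect. The following is a minimal illustration, not a production implementation: it places a single point per node (no virtual nodes), uses SHA-256 as an arbitrary hash, and the names ring_position, HashRing and moved_keys are invented for the example.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on a 2**32 ring (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs; one point per node.
        self._points = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node at or after the key, wrapping around.
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[index % len(self._points)][1]


def moved_keys(keys, before: HashRing, after: HashRing):
    """Keys whose owner differs between the two rings."""
    return [key for key in keys if before.node_for(key) != after.node_for(key)]


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]
    before = HashRing(["Node1", "Node2", "Node3"])
    after = HashRing(["Node1", "Node2", "Node3", "Node4"])
    moved = moved_keys(keys, before, after)
    print(len(moved), {after.node_for(key) for key in moved})
```

Every key that moves ends up on the newly added node, while all other keys keep their owner. How many keys move depends on where the new node lands on the ring, which is exactly the unevenness described above.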

Virtual Slot Partitioning: This method divides the key space into a fixed number of virtual slots and then maps those slots to nodes. Redis Cluster uses this method and divides the whole key space into 16384 slots. Because the key-to-slot mapping never changes, adding or removing a node only requires reassigning some slots, and only the keys in those slots are moved, which keeps the cost of data migration low. Virtual slot partitioning also keeps the data distribution relatively even.
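
The idea can be illustrated with a plain slot-to-node table. The sketch below starts from the three-node layout used earlier (A 0-5500, B 5501-11000, C 11001-16383) and hands slots to a hypothetical fourth node D until every node owns an equal share. The helper names and the rebalancing strategy are assumptions made for this example; they are not how redis-cli chooses slots.

```python
NUM_SLOTS = 16384


def initial_table() -> list:
    """Slot-to-node table: A owns 0-5500, B owns 5501-11000, C owns 11001-16383."""
    return ["A"] * 5501 + ["B"] * 5500 + ["C"] * 5383


def add_node(table: list, new_node: str) -> int:
    """Reassign slots to new_node until it owns an equal share; return how many moved."""
    counts = {node: table.count(node) for node in set(table)}
    target = NUM_SLOTS // (len(counts) + 1)
    moved = 0
    for slot, owner in enumerate(table):
        if moved == target:
            break
        # Only take slots from nodes that still own more than their share.
        if counts[owner] > target:
            table[slot] = new_node
            counts[owner] -= 1
            moved += 1
    return moved


if __name__ == "__main__":
    table = initial_table()
    moved = add_node(table, "D")
    print(moved, moved / NUM_SLOTS)
```

Only the table changes: a key still hashes to the same slot as before, and only the keys that live in the reassigned slots have to be transferred.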

1.3. Commonly used commands in Redis cluster

The following are some commonly used commands for Redis clusters:

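  1. CLUSTER ADDSLOTS <slot> [slot ...]: Assigns one or more slots to the current node.
  2. CLUSTER COUNT-FAILURE-REPORTS <node-id>: Returns the number of failure reports that other nodes have filed for the specified node.
  3. CLUSTER COUNTKEYSINSLOT <slot>: Returns the number of keys in the specified slot.
  4. CLUSTER DELSLOTS <slot> [slot ...]: Removes one or more slots from the current node.
  5. CLUSTER FAILOVER [FORCE|TAKEOVER]: Manually triggers a failover. With FORCE, the slave skips the handshake with its master; with TAKEOVER, it also does not wait for authorization from the other nodes.
  6. CLUSTER FLUSHSLOTS: Deletes all slot information from the current node.
  7. CLUSTER FORGET <node-id>: Removes the specified node from the node table of the node that receives the command.
  8. CLUSTER GETKEYSINSLOT <slot> <count>: Returns up to <count> keys from the specified slot.
  9. CLUSTER INFO: Returns information about the cluster.
  10. CLUSTER KEYSLOT <key>: Returns the slot to which the key maps.
  11. CLUSTER MEET <ip> <port>: Adds a new node to the cluster.
  12. CLUSTER NODES: Returns information about all nodes in the cluster.
  13. CLUSTER REPLICATE <node-id>: Makes the current node a slave of the specified node.
  14. CLUSTER RESET [HARD|SOFT]: Resets the current node.
  15. CLUSTER SAVECONFIG: Saves the node's cluster configuration to disk.
  16. CLUSTER SET-CONFIG-EPOCH <epoch>: Sets the node's configuration epoch.
  17. CLUSTER SETSLOT <slot> <subcommand> [node-id]: Sets the state of a slot.
  18. CLUSTER SLAVES <node-id>: Returns all slaves of the specified node.
  19. CLUSTER SLOTS: Returns the mapping of all slots in the cluster to nodes.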

2. How Redis cluster mode works

2.1. Cluster creation

Creating a Redis cluster involves the following steps:

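  1. Start the nodes: Start the Redis service on each planned node. A Redis cluster needs at least 3 nodes, all of them masters, because failure detection and failover are decided by a majority of the master nodes. With only 3 masters and no slaves, however, the failure of one master leaves its slots with no node to take them over, and by default the cluster then stops serving requests. To achieve high availability, at least 6 nodes are therefore recommended: 3 masters and 3 slaves.
  2. Create the cluster: Also called node handshake, this is the process by which the nodes of a Redis cluster get to know each other, communicating through the Gossip protocol. To add a new node, the CLUSTER MEET command is sent to a node of the cluster with the IP address and port of the new node. The node that receives the command handshakes with the new node, adds it to its node table, and spreads the information about it to the other nodes through the Gossip protocol. This completes the handshake, and the new node has joined the cluster.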

  3. Allocate slots: In a Redis cluster, all data is mapped to 16384 slots. Each node is responsible for a portion of the slots, and a node can process commands for a key only once it has been assigned the slot that the key maps to. When creating a cluster or adjusting its layout, you can use the CLUSTER ADDSLOTS command to allocate slots to nodes.
  4. Set the master-slave relationships: To get data backup and high availability, attach slave nodes to the masters, for example with the CLUSTER REPLICATE command.

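
As a rough sketch of the handshake and slot allocation steps, the helpers below split the 16384 slots into near-equal contiguous ranges and build the argument lists for CLUSTER MEET and CLUSTER ADDSLOTS. The function names and the master labels are hypothetical, and the exact ranges are just one possible split; a tool such as redis-cli may choose slightly different boundaries.

```python
NUM_SLOTS = 16384


def split_slots(masters: list) -> dict:
    """Divide slots 0..16383 into contiguous, near-equal ranges, one per master."""
    base, extra = divmod(NUM_SLOTS, len(masters))
    ranges = {}
    start = 0
    for index, master in enumerate(masters):
        # The first `extra` masters take one additional slot each.
        size = base + (1 if index < extra else 0)
        ranges[master] = range(start, start + size)
        start += size
    return ranges


def meet_command(ip: str, port: int) -> list:
    """Argument list for CLUSTER MEET <ip> <port>."""
    return ["CLUSTER", "MEET", ip, str(port)]


def addslots_command(slots) -> list:
    """Argument list for CLUSTER ADDSLOTS <slot> [slot ...]."""
    return ["CLUSTER", "ADDSLOTS", *map(str, slots)]


if __name__ == "__main__":
    for master, slots in split_slots(["m1", "m2", "m3"]).items():
        print(master, slots.start, slots.stop - 1, len(addslots_command(slots)) - 2)
```

Each argument list would be sent to the master that should own those slots; sending the commands is left out so that the sketch stays free of client library assumptions.
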
2.2. Fault discovery

In a Redis cluster, nodes communicate with each other by exchanging ping/pong messages, which is part of the gossip protocol. Each node regularly sends ping messages to other nodes. If it does not receive a pong from a node within cluster-node-timeout, it considers that node failed and marks it as subjectively offline (pfail).

When a node is marked as subjectively offline, this information will be propagated in the cluster through the gossip protocol. When more than half of the master nodes mark a node as subjectively offline, the node will be marked as objectively offline (fail), triggering the failover process.
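
The two states can be modeled with a small class that tracks the last pong time and the set of masters that have reported the node. This is a simplified sketch of the rule as described here, not of the Redis implementation: FailureView and its fields are invented names, and details such as the expiry of old failure reports are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class FailureView:
    """One node's view of a peer: pong timing plus failure reports from masters."""

    node_timeout_ms: int
    total_masters: int
    last_pong_ms: int = 0
    reporters: set = field(default_factory=set)

    def is_pfail(self, now_ms: int) -> bool:
        # Subjectively offline: no pong within cluster-node-timeout.
        return now_ms - self.last_pong_ms > self.node_timeout_ms

    def add_report(self, master_id: str) -> None:
        # A set ensures that each master is counted only once.
        self.reporters.add(master_id)

    def is_fail(self) -> bool:
        # Objectively offline: more than half of the masters report pfail.
        return len(self.reporters) > self.total_masters // 2


if __name__ == "__main__":
    view = FailureView(node_timeout_ms=15000, total_masters=3)
    print(view.is_pfail(now_ms=15001))
    view.add_report("master-1")
    view.add_report("master-2")
    print(view.is_fail())
```

With 3 masters, a single report is not enough; the second report crosses the majority threshold and the node becomes objectively offline.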

2.3. Failover

When a master node fails, other master nodes in the cluster will sense the failure through the gossip protocol. These master nodes will then elect one of the slave nodes of the failed master node to replace the failed master node. This process is called failover.

This is somewhat similar to failover in Redis sentinel mode. In sentinel mode, however, only the sentinel nodes take part in fault detection and failover, while in cluster mode all master nodes take part. This can make failover more efficient and shorten the recovery time.

The specific process is as follows:

  1. Eligibility check: All slave nodes of the failed master node will check the time of their last communication with the master node to determine whether they are qualified to replace the failed master node.

  2. Election delay: A slave node that is eligible for failover sets a time at which its election may start. It can initiate the election only after this time has been reached.

  3. Initiate an election: When the slave node's scheduled task detects that the election time (failover_auth_time) has arrived, it initiates the election.

  4. Election voting: The master node holding the slot will process the failure election message and vote. Each master node holding a slot can only vote for one slave node in each configuration epoch, thus ensuring that only one slave node can obtain more than half of the votes.

  5. Elect a new master node: The slave node that obtains more than half of the votes will be elected as the new master node.

  6. Notify the cluster: The new master node will send a message to other nodes in the cluster to notify them that it has been selected as the new master node.

  7. Update slot mapping: The new master node will take over all slots of the failed master node, and other nodes in the cluster will update their own slot mapping information after receiving messages from the new master node.

  8. Start providing services: The new master node starts providing services and handles client requests. This process ensures that when the master node fails, the cluster can fail over quickly and improves the availability of the cluster.
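
The voting rule in steps 4 and 5 can be sketched as follows. It is a deliberately simplified model: Master and run_election are invented names, the eligibility check and election delay are skipped, and messages are replaced by direct method calls.

```python
class Master:
    """A slot-holding master that grants at most one vote per configuration epoch."""

    def __init__(self, name: str):
        self.name = name
        self.last_vote_epoch = 0

    def request_vote(self, epoch: int) -> bool:
        # Refuse if this master has already voted in this epoch or a later one.
        if epoch <= self.last_vote_epoch:
            return False
        self.last_vote_epoch = epoch
        return True


def run_election(epoch: int, reachable_masters: list, total_masters: int) -> bool:
    """Return True if a slave collects votes from more than half of all masters."""
    votes = sum(1 for master in reachable_masters if master.request_vote(epoch))
    return votes > total_masters // 2


if __name__ == "__main__":
    # Three masters in total; one has failed, so two can still vote.
    survivors = [Master("m2"), Master("m3")]
    print(run_election(epoch=1, reachable_masters=survivors, total_masters=3))
    # A second slave asking in the same epoch gets no votes.
    print(run_election(epoch=1, reachable_masters=survivors, total_masters=3))
```

Because each master votes only once per epoch, two slaves of the same failed master cannot both reach a majority in that epoch; the loser has to try again in a later epoch.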

In a Redis cluster, failover requires the votes of more than half of the master nodes. If too few masters are reachable, for example because several masters were deployed on the same machine and that machine has failed, not enough votes can be collected and the failover fails.

Therefore, in order to avoid single points of failure, when we deploy a Redis cluster, we need to try to ensure that all master nodes are distributed on different physical machines. In this way, even if a certain physical machine fails, it will not affect more than half of the master nodes, ensuring the high availability of the cluster.
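
A deployment plan can be checked against this rule with a few lines of code. The function below is a sketch with an invented name and hypothetical host labels; it only looks at whether more than half of the masters survive the loss of any single machine.

```python
from collections import Counter


def survives_any_machine_failure(master_to_machine: dict) -> bool:
    """True if losing any single machine still leaves more than half of the masters."""
    total = len(master_to_machine)
    masters_per_machine = Counter(master_to_machine.values())
    return all(
        total - lost > total // 2 for lost in masters_per_machine.values()
    )


if __name__ == "__main__":
    crowded = {"m1": "host-a", "m2": "host-a", "m3": "host-b"}
    spread = {"m1": "host-a", "m2": "host-b", "m3": "host-c"}
    print(survives_any_machine_failure(crowded))
    print(survives_any_machine_failure(spread))
```

In the crowded layout, losing host-a takes out two of the three masters, so the remaining master cannot form a majority on its own.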

2.4. Cluster expansion

When the load on the Redis cluster is too high or storage space is running out, capacity can be expanded by adding new nodes. After a new node is added, some slots have to be migrated to it before it can start serving requests. Slots are moved with the CLUSTER SETSLOT and MIGRATE commands; CLUSTER ADDSLOTS is refused for slots that the receiving node already sees as assigned to another node.

During the expansion process, administrators or operation and maintenance personnel need to send commands to the cluster through the Redis command line tool (redis-cli) or other management tools to perform operations such as adding new nodes, allocating slots, and migrating slots.

The expansion process of Redis cluster mainly includes the following steps:

  1. Add a new node: First, start a Redis instance on the new server and add it to the existing Redis cluster. This can be done via the CLUSTER MEET command.
  2. Choose the slots: Then decide which slots the new node should take over from the existing nodes. In a running cluster every slot already has an owner, so the slots are handed over by migration rather than assigned with CLUSTER ADDSLOTS.
  3. Migrate the slots: Next, move the chosen slots and their data from the old nodes to the new node. For each slot, mark it with CLUSTER SETSLOT <slot> IMPORTING on the new node and with CLUSTER SETSLOT <slot> MIGRATING on the old node, then move its keys with CLUSTER GETKEYSINSLOT and MIGRATE. The slot stays available during this process, because requests for keys that have already moved are redirected to the new node.
  4. Update the slot mapping: Finally, send CLUSTER SETSLOT <slot> NODE to assign the slot to the new node; the other nodes learn the new allocation through gossip. CLUSTER NODES can be used to check the result.

The above is the expansion process of a Redis cluster. Note that it can affect cluster service: the slots stay available while they are migrated, but the migration adds load, and MIGRATE blocks the nodes involved while keys are being transferred. Capacity expansion is therefore best performed when the load is low.
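
For a single slot, the migration sequence documented for CLUSTER SETSLOT can be written down as an ordered list of commands. The sketch below only builds that list and sends nothing; migrate_slot_plan is an invented name, the node dictionaries with id, host and port fields are hypothetical, and keys stands for whatever CLUSTER GETKEYSINSLOT returned on the source node. In practice the key-moving step is repeated in batches until the slot is empty.

```python
def migrate_slot_plan(slot, source, target, keys, timeout_ms=5000):
    """Return ordered (node_id, command) pairs for moving one slot to the target."""
    plan = [
        # 1. The target prepares to import, then the source starts migrating.
        (target["id"], ["CLUSTER", "SETSLOT", str(slot), "IMPORTING", source["id"]]),
        (source["id"], ["CLUSTER", "SETSLOT", str(slot), "MIGRATING", target["id"]]),
    ]
    if keys:
        # 2. The source pushes the keys; "" plus KEYS is the multi-key MIGRATE form.
        plan.append((
            source["id"],
            ["MIGRATE", target["host"], str(target["port"]), "", "0",
             str(timeout_ms), "KEYS", *keys],
        ))
    # 3. Assign the slot to the target, on the target first and then on the source.
    for node in (target, source):
        plan.append(
            (node["id"], ["CLUSTER", "SETSLOT", str(slot), "NODE", target["id"]])
        )
    return plan


if __name__ == "__main__":
    old = {"id": "old-node-id", "host": "10.0.0.1", "port": 7000}
    new = {"id": "new-node-id", "host": "10.0.0.2", "port": 7001}
    for node_id, command in migrate_slot_plan(42, old, new, ["k1", "k2"]):
        print(node_id, " ".join(command))
```

Keeping the plan as data makes it easy to review the order of the commands before anything is sent to a live cluster.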

2.5. Cluster reduction

When the load on the Redis cluster is low or too many resources sit idle, capacity can be reduced by removing nodes. Before a node is removed, all of its slots must be migrated to other nodes; only then can the node be removed from the cluster with the CLUSTER FORGET command.

The reduction process of Redis cluster mainly includes the following steps:

  1. Migrate the data: First, migrate all slots and data on the node that is being removed to other nodes in the cluster, using CLUSTER SETSLOT and MIGRATE as when expanding. The slots remain available while they are migrated.
  2. Check the slots: Then make sure the node no longer owns any slots, for example with CLUSTER NODES. CLUSTER DELSLOTS only deletes slot assignments on the node that receives it and does not move any data, so it cannot replace the migration.
  3. Remove the node: Finally, remove the node from the cluster by sending CLUSTER FORGET <node-id> to every remaining node.
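
Planning a reduction can be sketched in the same spirit. The helper below decides how many of the leaving node's slots each remaining node should receive, always giving the next slot to the node that currently owns the fewest, and then lists the CLUSTER FORGET commands for the remaining nodes. The function names, node labels and the balancing strategy are assumptions made for illustration.

```python
def scale_down_plan(slot_counts: dict, leaving: str) -> dict:
    """Return how many of the leaving node's slots each remaining node receives."""
    remaining = {
        node: count for node, count in slot_counts.items() if node != leaving
    }
    if not remaining:
        raise ValueError("cannot remove the last node of the cluster")
    plan = {node: 0 for node in remaining}
    for _ in range(slot_counts[leaving]):
        # Give the next slot to the node that currently owns the fewest slots.
        receiver = min(remaining, key=remaining.get)
        remaining[receiver] += 1
        plan[receiver] += 1
    return plan


def forget_commands(remaining_nodes: list, leaving_id: str) -> list:
    """One CLUSTER FORGET <node-id> per remaining node, as (node, command) pairs."""
    return [(node, ["CLUSTER", "FORGET", leaving_id]) for node in remaining_nodes]


if __name__ == "__main__":
    counts = {"A": 4096, "B": 4096, "C": 4096, "D": 4096}
    print(scale_down_plan(counts, "D"))
    print(forget_commands(["A", "B", "C"], "id-of-D"))
```

The actual transfer of each slot would still go through the CLUSTER SETSLOT and MIGRATE sequence; this sketch only covers the bookkeeping around it.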

Origin blog.csdn.net/weixin_45187434/article/details/132837721