An easy interview topic: Redis Cluster mode, hash slots, high availability, consistency, the JedisPool client, and cluster expansion

1 The difference between Cluster mode and master-slave mode

  • In master-slave mode, the master and every slave each hold the full cached data set (so scaling hits a bottleneck once the data volume is large)
  • In Cluster mode, the full data set is spread across multiple nodes, and each node holds only part of it.

2 Division of node data in Cluster mode

Data is divided according to the hash slot.

A Redis cluster has 16384 hash slots. The slot for a key is computed as CRC16(key) mod 16384.

For example, if the cluster has 3 nodes, then:

  1. Node A contains hash slots 0 to 5500
  2. Node B contains hash slots 5501 to 11000
  3. Node C contains hash slots 11001 to 16383
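
The slot calculation can be sketched in plain Java. This is a minimal, dependency-free version, assuming the CRC16 variant that Redis Cluster specifies (CCITT/XMODEM: polynomial 0x1021, initial value 0). The class name HashSlotSketch and the ownerOf helper are illustrative only; ownerOf simply mirrors the 3-node example above, and hash tags are ignored here.

```java
import java.nio.charset.StandardCharsets;

final class HashSlotSketch {
    static final int SLOT_COUNT = 16384;

    // Bitwise CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    // slot = CRC16(key) mod 16384
    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    // Hypothetical slot layout taken from the 3-node example.
    static String ownerOf(int slot) {
        if (slot <= 5500) return "A";
        if (slot <= 11000) return "B";
        return "C";
    }

    public static void main(String[] args) {
        int slot = slotFor("foo");
        System.out.println("foo -> slot " + slot + " on node " + ownerOf(slot));
    }
}
```

With this layout, the key foo hashes to slot 12182, which falls in node C's range.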

3 Cluster mode requires at least three master nodes

  1. Cluster mode synchronizes cluster state with a Gossip protocol, so it is fully decentralized.
  2. Therefore, to keep the cluster state consistent, decisions follow the "majority" principle (for example, when a node suddenly becomes unreachable, it is only marked as failed once a majority of master nodes agree that it is unreachable).
  3. So at least 3 master nodes are required
    1. In a majority vote, a 3-node cluster can tolerate one node going down (with 2 nodes, the single survivor is not a majority)
    2. Redis has a setting cluster-require-full-coverage. With no, when a master goes down and some slots become uncovered, the remaining masters keep serving the slots they own; with yes (the default), the cluster stops accepting requests as soon as any slot is uncovered
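
The counting behind the 3-master minimum can be sketched as follows. This shows only the majority arithmetic, not Redis's actual failure detector; the class and method names are hypothetical.

```java
final class MajoritySketch {
    // Smallest number of masters that forms a strict majority.
    static int majority(int masters) {
        return masters / 2 + 1;
    }

    // How many masters can fail while the survivors still form a majority.
    static int tolerableFailures(int masters) {
        return masters - majority(masters);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " masters: majority=" + majority(n)
                    + ", tolerable failures=" + tolerableFailures(n));
        }
    }
}
```

With 2 masters the majority is 2, so no failure can be tolerated; 3 is the smallest count that survives the loss of one master.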

4 High availability in Cluster mode

To keep each node highly available, master-slave replication is usually used as well: every master node has at least one slave node.

Therefore, a highly available Cluster deployment needs at least 3 masters and 3 slaves.

5 Single key operation in Cluster mode

5.1 Simple client redis-cli operation

redis 127.0.0.1:7000> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:7002
OK
  • If the slot for the key is not served by the node the client is connected to, that node returns a MOVED error that names the correct target node
  • The client then resends the request to that target node (redis-cli follows the redirect automatically when started with the -c option)
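
A client has to pull the slot and the target address out of the MOVED reply before it can resend the request. Below is a minimal sketch of that parsing step, assuming the error text has the form "MOVED <slot> <host>:<port>" (as in the redis-cli output above) with the protocol's leading "-" already stripped. The class name MovedRedirectSketch is hypothetical and this is not how any particular client library implements it.

```java
final class MovedRedirectSketch {
    final int slot;
    final String host;
    final int port;

    MovedRedirectSketch(int slot, String host, int port) {
        this.slot = slot;
        this.host = host;
        this.port = port;
    }

    // Parses "MOVED <slot> <host>:<port>"; returns null for any other reply.
    static MovedRedirectSketch parse(String errorLine) {
        String[] parts = errorLine.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("MOVED")) {
            return null;
        }
        int colon = parts[2].lastIndexOf(':');
        if (colon < 0) {
            return null;
        }
        return new MovedRedirectSketch(
                Integer.parseInt(parts[1]),
                parts[2].substring(0, colon),
                Integer.parseInt(parts[2].substring(colon + 1)));
    }

    public static void main(String[] args) {
        MovedRedirectSketch r = parse("MOVED 12182 127.0.0.1:7002");
        System.out.println("resend to " + r.host + ":" + r.port + " for slot " + r.slot);
    }
}
```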

5.1.1 Why does the client follow the redirect instead of the server forwarding the operation to the correct node?

My guess:

If the server forwarded the operation to the correct node:

  • The server would block (client requests are processed on a single thread) while it forwards the operation to the other server and waits for the result
  • The whole process involves 4 network transmissions (the same number as with client redirection)

If the client follows the redirect:

  • The server does not block; after returning the redirect it goes on to process other client requests
  • The whole process also involves 4 network transmissions (the same number as with server forwarding)

With the same network cost and no blocking on the server, client redirection is the better deal.

5.2 Operations under Java JedisPool

The simple redis-cli approach has a problem: the client does not know in advance which node serves the slot for the current key, so it generates many redirects.

JedisPool is the connection pool of the Java Redis client Jedis. In cluster mode (the JedisCluster class), Jedis caches the <slot, node> mapping, with one connection pool per node. So for each operation it first computes the slot for the key and then looks up the node for that slot, sending the request to the right node directly.
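
The idea of a client-side <slot, node> cache can be sketched without any Redis library. This is an illustration of the technique only, not the Jedis implementation: the class SlotCacheSketch, its methods, and the use of "host:port" strings for nodes are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class SlotCacheSketch {
    // slot number -> "host:port" of the node believed to serve it
    private final Map<Integer, String> slotToNode = new HashMap<>();

    // Record that a contiguous slot range is served by one node.
    void assign(int fromSlot, int toSlot, String node) {
        for (int slot = fromSlot; slot <= toSlot; slot++) {
            slotToNode.put(slot, node);
        }
    }

    // Returns the cached node for a slot, or null if the slot is unknown.
    String nodeFor(int slot) {
        return slotToNode.get(slot);
    }

    // Called when a node answers MOVED: the cached entry was stale, so fix it.
    void onMoved(int slot, String newNode) {
        slotToNode.put(slot, newNode);
    }

    public static void main(String[] args) {
        SlotCacheSketch cache = new SlotCacheSketch();
        cache.assign(0, 5500, "127.0.0.1:7000");
        cache.assign(5501, 11000, "127.0.0.1:7001");
        cache.assign(11001, 16383, "127.0.0.1:7002");
        System.out.println("slot 12182 -> " + cache.nodeFor(12182));
    }
}
```

With a warm cache the first request already goes to the right node; a MOVED reply is only seen after the slot layout has changed, and it doubles as the signal to refresh the cache.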

6 Multi-key operations in Cluster mode

Because multiple keys may map to multiple slots, and those slots may live on different nodes, Cluster mode generally supports a multi-key operation only when all of its keys fall in the same slot.

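// Taking the Java Redis client Jedis as an example:
// For multiple keys, all keys must map to the same slot before the command can run
// (this is the stricter rule; a looser rule would allow several slots as long as they are all on one node)
// keys and keyCount come from the surrounding Jedis method, which is not shown here
if (keys.length > 1) {
  int slot = JedisClusterCRC16.getSlot(keys[0]);
  for (int i = 1; i < keyCount; i++) {
    int nextSlot = JedisClusterCRC16.getSlot(keys[i]); // compute the slot for this key
    if (slot != nextSlot) {
      // the slots differ, so throw an exception
      throw new JedisClusterException("No way to dispatch this command to Redis Cluster "
          + "because keys have different slots.");
    }
  }
}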

6.2 Using a hash tag to control which slot a key maps to

In some scenarios we want several keys to land on the same node. How can that be controlled?

  • Redis provides key hash tags
  • Format: the tag is wrapped in braces inside the key name, e.g. {user1000}.following
  • If a key contains a hash tag, only the text between the first { and the first } after it is used when computing CRC16, not the whole key (an empty tag {} is ignored and the whole key is hashed)
  • Therefore, to put several keys on the same node (in the same slot), give them the same hash tag
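
The hash tag rule can be sketched in plain Java. This is a minimal version under the same assumption as the slot formula (CRC16 CCITT/XMODEM, polynomial 0x1021, initial value 0); the class name HashTagSketch, the method hashedPart, and the user1000 key names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class HashTagSketch {
    // Returns the part of the key that is fed to CRC16.
    static String hashedPart(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) { // a closing brace exists and the tag is non-empty
                return key.substring(open + 1, close);
            }
        }
        return key; // no usable tag: hash the whole key
    }

    // Bitwise CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slotFor(String key) {
        return crc16(hashedPart(key).getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        System.out.println(slotFor("{user1000}.following") == slotFor("{user1000}.followers"));
    }
}
```

Both user1000 keys hash only the text user1000, so they always share a slot and can be used together in one multi-key command.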

7 Non-strong consistency

The reasons:

  1. Master-slave synchronization.
    1. Replication is asynchronous. A write may be acknowledged by the master but not yet replicated when the master goes down, and that write is then lost.
  2. Master-slave failover during a network partition.
    1. A network partition can occur in Cluster mode. If a master and its slave end up in different partitions and the slave is promoted to master, writes that the original master accepted during the partition may never reach the slave.

Suppose the cluster contains six nodes: A, B, C, A1, B1, and C1.
A, B, and C are masters; A1, B1, and C1 are their respective slaves.
There is also a client, Z1.

Suppose a network partition splits the cluster into two sides:
the majority side contains nodes A, C, A1, B1, and C1,
the minority side contains node B and client Z1.
Z1 can still write to master B.
If the partition heals quickly, the cluster carries on as normal.
If the partition lasts long enough for the majority side to elect B1 as the new master, the data Z1 wrote to B is lost.
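
The write-loss window can be illustrated with a toy model. This is a deliberately simplified sketch, not Redis replication: the class AsyncReplicationSketch and its two in-memory lists are assumptions made for the example, standing in for a master and one slave.

```java
import java.util.ArrayList;
import java.util.List;

final class AsyncReplicationSketch {
    final List<String> masterLog = new ArrayList<>();
    final List<String> slaveLog = new ArrayList<>();

    // The master acknowledges the write before any slave has received it.
    String write(String entry) {
        masterLog.add(entry);
        return "OK";
    }

    // Asynchronous replication: the slave catches up some time later.
    void replicate() {
        for (int i = slaveLog.size(); i < masterLog.size(); i++) {
            slaveLog.add(masterLog.get(i));
        }
    }

    // If the slave is promoted now, these acknowledged writes are gone.
    List<String> lostOnFailover() {
        return new ArrayList<>(masterLog.subList(slaveLog.size(), masterLog.size()));
    }

    public static void main(String[] args) {
        AsyncReplicationSketch pair = new AsyncReplicationSketch();
        pair.write("set a 1");
        pair.replicate();
        pair.write("set b 2"); // acknowledged, but not yet replicated
        System.out.println("lost on failover: " + pair.lostOnFailover());
    }
}
```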

8 Cluster expansion/resharding

Adding nodes, deleting nodes, and re-sharding are essentially one type of operation: slot migration.

For example, when a new node is added, some of the slots on the existing nodes are reassigned to it.

In Cluster mode, the basic unit of data migration is the slot.

8.1 Slot migration

[Figure: slot migration from node A to node B]

Slot migration steps:

  1. Mark the slot with a transitional state (as in the figure above: the slot is marked as migrating on source node A and as importing on target node B)
  2. Migrate the keys in the slot one by one; each key is migrated synchronously and blocks until done
    1. A sends one key from the slot to B
    2. B receives the data, stores it locally, and replies OK
    3. A receives the OK reply and deletes its local copy of the key
  3. If the migration is interrupted, it can be resumed
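
The migration steps can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not the Redis implementation: each Node object holds the keys of a single slot only, the network exchange is replaced by direct map operations, and all names (SlotMigrationSketch, Node, SlotState, migrate) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

final class SlotMigrationSketch {
    enum SlotState { STABLE, MIGRATING, IMPORTING }

    // Simplification: a Node models the keys of one slot only.
    static final class Node {
        final Map<String, String> keys = new TreeMap<>();
        SlotState state = SlotState.STABLE;
    }

    // Moves every key of the slot from source to target, one key at a time.
    static int migrate(Node source, Node target) {
        // Step 1: mark the transitional states on both sides.
        source.state = SlotState.MIGRATING;
        target.state = SlotState.IMPORTING;

        int moved = 0;
        // Step 2: copy the key set first so the source map can be modified in the loop.
        for (String key : new ArrayList<>(source.keys.keySet())) {
            target.keys.put(key, source.keys.get(key)); // target stores the key ("replies OK")
            source.keys.remove(key);                    // source deletes its copy only afterwards
            moved++;
        }

        // Migration finished: both sides return to the stable state.
        source.state = SlotState.STABLE;
        target.state = SlotState.STABLE;
        return moved;
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.keys.put("k1", "v1");
        a.keys.put("k2", "v2");
        System.out.println("moved " + migrate(a, b) + " keys, left on A: " + a.keys.size());
    }
}
```

Because a key is deleted from the source only after the target has stored it, stopping the loop at any point leaves every key on exactly one node, which is what makes the migration resumable.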

8.2 Client request handling during slot migration

During slot migration some of the slot's keys are on node A and some are on node B, so the way client requests are handled changes considerably.

  1. The client first contacts the old node that owns the slot
  2. If the key is still on the old node, the old node handles the request normally
  3. If the key is no longer on the old node, the old node returns an ASK redirection pointing to B
  4. The client first sends the ASKING command to B
  5. The client then sends the get command to B

Why can't the client simply send get to B after the redirect, instead of sending ASKING first?

Because the slot does not yet belong to node B (it still belongs to node A). If the client sent get directly, B would redirect it back to A with MOVED, and the client would bounce between the two nodes.
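
The routing decisions during a migration can be sketched as three small functions. This is a toy model under stated assumptions: the replies are plain strings rather than real protocol messages, node A is the old owner with the slot in the migrating state, node B is the target with the slot in the importing state, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class AskRedirectSketch {
    // Old owner A: serve the key if it is still here, otherwise point to B.
    static String sourceReply(Set<String> keysStillOnA, String key) {
        return keysStillOnA.contains(key) ? "SERVED by A" : "ASK B";
    }

    // Target B: it does not own the slot yet, so it only serves requests
    // that were preceded by ASKING; anything else is sent back to A.
    static String targetReply(boolean askingSent) {
        return askingSent ? "SERVED by B" : "MOVED A";
    }

    // Client logic: try A first; on ASK, send ASKING and then the command to B.
    static String get(Set<String> keysStillOnA, String key) {
        String reply = sourceReply(keysStillOnA, key);
        if (reply.equals("ASK B")) {
            return targetReply(true);
        }
        return reply;
    }

    public static void main(String[] args) {
        Set<String> keysStillOnA = new HashSet<>();
        keysStillOnA.add("k1"); // k2 has already been migrated to B
        System.out.println("k1: " + get(keysStillOnA, "k1"));
        System.out.println("k2: " + get(keysStillOnA, "k2"));
        System.out.println("k2 without ASKING: " + targetReply(false));
    }
}
```

Unlike MOVED, an ASK redirection applies to one request only, so a client should not update its <slot, node> cache when it receives one.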

Origin blog.csdn.net/hugo_lei/article/details/106390853