Consistent hashing and hash slots

1. Consistent Hash Algorithm

1. Setting the scene

Suppose we have three cache servers, node0, node1, and node2, and 30 million keys that we want cached evenly across the three machines. What solution comes to mind?

The first solution we may think of is the modulo algorithm hash(key) % N: hash the key, then take the result modulo N, the number of machines. With three machines the result must be 0, 1, or 2, corresponding to node0, node1, and node2, so you can go directly to the right server for the data. It is simple and crude, and completely solves the problem above.
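As a minimal sketch of this modulo scheme (using md5 as a stand-in, since the article's hash() function is unspecified):

```python
import hashlib

NODES = ["node0", "node1", "node2"]

def node_for(key: str) -> str:
    """Pick a server with the simple modulo scheme: hash(key) % N."""
    # md5 is used here only as a stable, well-distributed stand-in hash.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

print(node_for("user:1001"))  # one of node0 / node1 / node2
```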

2. The problem with modulo hashing

Although the modulo algorithm is simple to use, taking the modulus of the number of machines has real limitations when the cluster expands or shrinks, because in a production environment it is common to adjust the number of servers to match business volume. Once the number of servers N changes, the result of hash(key) % N changes with it.

For example, if a server node goes down, the formula changes from hash(key) % 3 to hash(key) % 2, and the results change with it. When you then access a key, its cache location has most likely moved, and the data previously cached for that key loses its usefulness.
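A quick experiment illustrates the scale of the problem. With the same md5 stand-in hash, shrinking from 3 servers to 2 relocates roughly two thirds of the keys:

```python
import hashlib

def h(key: str) -> int:
    # md5 as a stand-in for the unspecified hash() function.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"key:{i}" for i in range(10000)]
# A key keeps its server only if h % 3 and h % 2 happen to agree.
moved = sum(1 for k in keys if h(k) % 3 != h(k) % 2)
print(f"{moved / len(keys):.0%} of keys change servers")  # typically about 2/3
```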

When a large number of cache entries fail at the same time, the result is a cache avalanche, which in turn makes the entire cache system unavailable. That is basically unacceptable. The consistent hash algorithm was born to solve exactly this situation.

So, how does the consistent hashing algorithm solve the above problems?

3. Consistent hash

The consistent hash algorithm is also a modulo algorithm at heart, but instead of taking the modulus of the number of servers as above, consistent hashing takes the modulus of the fixed value 2^32.

An IPv4 address consists of four groups of 8-bit binary numbers, so using 2^32 guarantees that every IP address has a unique mapping.

4. hash ring

We can picture these 2^32 values as a ring (if a circle doesn't work for you, pick your own shape, as long as it helps you understand it). The point at the very top of the ring represents 0, and the values are arranged clockwise: 1, 2, 3, 4, 5, 6... up to 2^32 - 1. This ring made up of 2^32 points is collectively called the hash ring.

So what does this hash ring have to do with the consistent hash algorithm? Take the scenario above as an example: three cache servers node0, node1, node2, and 30 million keys.

5. The server is mapped to the hash ring

At this point the calculation formula changes from hash(key) % N to hash(server ip) % 2^32: take the server's IP address, hash it, and take the result modulo 2^32. The result must be an integer between 0 and 2^32 - 1, and the position of that integer on the hash ring represents a server. The three cache servers node0, node1, and node2 are mapped onto the hash ring in turn.
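Mapping servers onto the ring can be sketched as follows, again with md5 as a stand-in hash and with hypothetical IPs for the three nodes:

```python
import hashlib

RING_SIZE = 2 ** 32

def ring_pos(name: str) -> int:
    """hash(server ip) % 2^32: a position on the hash ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING_SIZE

# Hypothetical IP addresses for node0, node1, node2.
servers = {"10.0.0.1": "node0", "10.0.0.2": "node1", "10.0.0.3": "node2"}
ring = sorted((ring_pos(ip), name) for ip, name in servers.items())
for pos, name in ring:
    print(f"{name} at ring position {pos}")
```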

6. The object key is mapped to the hash ring

Then map each object key to be cached onto the hash ring with hash(key) % 2^32. Both the server nodes and the keys to be cached are now mapped onto the ring, so which server should a given object key be cached on?

7. The object key is mapped to the server

Starting from the position of the cached object key, the first server encountered clockwise is the server where the current object will be cached.

Because the hash values of the cached object and the servers are fixed, as long as the servers remain unchanged, an object key is always cached on the same server. Following the rule above, we get the mapping relationship in the figure below:

  • key-1 -> node-1

  • key-3 -> node-2

  • key-4 -> node-2

  • key-5 -> node-2

  • key-2 -> node-0
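The clockwise search can be sketched with a sorted list and bisect (md5 stand-in hash again; actual positions, and therefore the key-to-node assignments, will differ from the figure):

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32
NODES = ["node-0", "node-1", "node-2"]

def ring_pos(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

# Sorted ring positions, and a way back from position to node name.
ring = sorted(ring_pos(n) for n in NODES)
pos_to_node = {ring_pos(n): n for n in NODES}

def lookup(key: str) -> str:
    """First server clockwise from the key's position (wrapping past 2^32 - 1 to 0)."""
    i = bisect.bisect_right(ring, ring_pos(key)) % len(ring)
    return pos_to_node[ring[i]]

print(lookup("key-1"))
```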

8. Advantages of consistent hash

Now that we have a rough understanding of how consistent hashing works, how does it mitigate the large-scale cache unavailability that the plain modulo algorithm causes when nodes are added to or removed from the cluster?

Let's look at the expansion scenario first. Suppose business volume surges and the system needs to scale out by adding a server node-4, which happens to be mapped between node-1 and node-2 on the ring. Mapping objects clockwise, we find that key-4 and key-5, which were originally cached on node-2, are remapped to node-4. During the whole expansion, only the small portion of data between node-1 and node-4 is affected.

Conversely, if node-1 goes down, its objects are remapped clockwise to the next node, so key-1, previously cached on node-1, is remapped to node-4. The affected data is only the small portion between node-0 and node-1.

From these two cases we can see that when the number of servers in the cluster changes, consistent hashing affects only a small portion of the data, so the cache system as a whole can keep providing service.
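The expansion scenario can be checked with a small simulation (md5 stand-in hash, one position per node, no virtual nodes yet). Adding a node only reroutes the keys that now fall on the new node's arc:

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32

def ring_pos(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

def build_lookup(nodes):
    """Return a key -> node lookup for the given set of servers."""
    ring = sorted((ring_pos(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    def lookup(key):
        i = bisect.bisect_right(positions, ring_pos(key)) % len(ring)
        return ring[i][1]
    return lookup

keys = [f"key:{i}" for i in range(10000)]
before = build_lookup(["node-0", "node-1", "node-2"])
after = build_lookup(["node-0", "node-1", "node-2", "node-4"])
moved = sum(1 for k in keys if before(k) != after(k))
print(f"{moved / len(keys):.0%} of keys moved after adding node-4")
```

Every relocated key ends up on node-4; keys on the other arcs stay put, unlike the modulo scheme where most keys move.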

9. Data skew problem

When there are too few server nodes, their uneven distribution easily causes data skew. As shown in the figure below, most cached objects end up on node-4, wasting the resources of the other nodes and concentrating most of the system pressure on node-4. Such a cluster is very unhealthy.

The solution to data skew is also simple. We need to find a way to make the nodes relatively evenly distributed when they are mapped to the hash ring.

The consistent hash algorithm therefore introduces a virtual node mechanism: multiple hash values are computed for each server node, and all of them are mapped onto the hash ring. Object keys that map to these virtual nodes end up cached on the real node behind them.

The hash of a virtual node is usually computed by appending a numeric suffix to the node's IP address, e.g. hash(10.24.23.227#1). For example, node-1's IP is 10.24.23.227, so its hash is normally calculated as:

  • hash(10.24.23.227) % 2^32

Suppose we set up three virtual nodes for node-1: node-1#1, node-1#2, and node-1#3. Hash each of them and take the modulus:

  • hash(10.24.23.227#1)% 2^32

  • hash(10.24.23.227#2)% 2^32

  • hash(10.24.23.227#3)% 2^32

As the figure below shows, after virtual nodes are added, the original nodes are spread relatively evenly around the hash ring, and the load is shared across the remaining nodes.

Note that the more virtual nodes you allocate, the more uniform the mapping on the hash ring becomes; with too few, the effect is hard to see.

Introducing virtual nodes also brings new problems, such as maintaining the mapping between virtual nodes and real nodes, and the conversion object key -> virtual node -> real node.
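A sketch of that virtual-node bookkeeping, using the article's suffix scheme and md5 stand-in hash (node-1's IP comes from the article; node-2's IP is made up for the example):

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32
VNODES = 3  # virtual nodes per real server, as in the article's example

def ring_pos(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING_SIZE

# node-1's IP is from the article; node-2's is hypothetical.
servers = {"node-1": "10.24.23.227", "node-2": "10.24.23.228"}

# Map each virtual node's ring position back to its real node.
ring = {}
for name, ip in servers.items():
    for i in range(1, VNODES + 1):
        ring[ring_pos(f"{ip}#{i}")] = name  # hash(ip#i) % 2^32

positions = sorted(ring)

def lookup(key: str) -> str:
    """object key -> virtual node -> real node."""
    i = bisect.bisect_right(positions, ring_pos(key)) % len(positions)
    return ring[positions[i]]

print(lookup("key-1"))
```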

10. Application scenarios of consistent hash

Consistent hashing is arguably the go-to algorithm for load balancing in distributed systems, and its implementation is quite flexible: it can run on the client side or in middleware. Both memcached and redis make use of it.

The memcached cluster is quite special. Strictly speaking, it can only be regarded as a pseudo-cluster, because its servers cannot communicate with each other. Routing a request depends entirely on the client computing which server a cached object should land on, and that routing algorithm is consistent hashing.

Redis clusters also have the concept of hash slots. Although the implementations differ, the idea is the same, and having read about consistent hashing in this article you will find redis slots much easier to understand.

11. Summary

That was a brief explanation of consistent hashing; if anything is wrong, please leave a comment to correct it. No technique is perfect, and the consistent hash algorithm has some potential pitfalls of its own: if the number of nodes on the ring is very large or updated frequently, lookup performance drops, and the whole distributed cache needs a routing service for load balancing. Once that routing service goes down, the entire cache becomes unavailable, so high availability must also be considered.

2. Hash slot 

The Redis cluster does not use the consistent hashing described above, but the concept of the hash slot. The main reason is that, as mentioned, the consistent hash algorithm gives little control over data distribution and node placement.

First of all, the hash slot is really two concepts. The first is the hash algorithm: redis cluster's hash is not a simple hash() but the crc16 algorithm, a checksum algorithm. The other is the notion of slots and the rules for allocating the space. In essence, the hash slot is very similar to the consistent hash algorithm; the difference lies in how the hash space is defined. The space of consistent hashing is a ring, and node placement follows the ring, which makes the data distribution hard to control. The slot space of redis cluster is allocated by hand, similar to the concept of Windows disk partitions: each partition's size and location can be customized.

A redis cluster contains 16384 hash slots. Each key falls into a specific slot after being hashed, and which storage node a slot belongs to is defined and assigned by the user. For example, a machine with a small hard disk can be assigned fewer slots, and one with a larger disk more; if the nodes' disks are roughly equal, the slots can be distributed evenly. The concept of hash slots is thus a good fix for the disadvantages of consistent hashing.
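Redis's slot computation is small enough to sketch in full: CRC16-CCITT (XModem) of the key, modulo 16384. (The rule that only the {…} hash-tag substring is hashed, when present, is omitted from this sketch.)

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the checksum algorithm Redis Cluster uses for keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384."""
    return crc16(key.encode()) % 16384

print(key_slot("user:1001"))  # a slot number in 0..16383
```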

In terms of fault tolerance and scalability, hash slots look like consistent hashing on the surface: the affected data is transferred. With hash slots, it is essentially the slots that move: the slots owned by a failed node are transferred to other healthy nodes, and likewise, when a node is added, slots are transferred from other nodes to the new one.

But note that a redis cluster does not transfer and assign slots automatically; this requires manual configuration. The high availability of a redis cluster therefore depends on master-slave replication between nodes and automatic master-slave failover.

3. Why does Redis need a cluster?

First of all, a single Redis instance is mainly limited by being a single point, by capacity, and by an upper bound on traffic it can absorb. The Redis single point of failure can be handled by master-slave replication and the sentinel mechanism's automatic failover.

A single Redis Master instance provides both read and write service, but capacity and pressure problems remain, so data partitioning is needed.

To build multiple Master instances that serve reads and writes simultaneously, some mechanism is needed to guarantee data partitioning. This way, capacity can be spread across multiple machines, or the performance of a multi-core machine fully exploited, while data is never confused between master nodes. Ideally, online live migration of data is also supported.

Summary: If you use redis purely as a cache, it doesn't matter whether you use hash slots or not. But if it is used as a database, it must adopt the pre-allocated hash slot cluster mode.


Origin blog.csdn.net/summer_fish/article/details/119738856#comments_25709740