Explaining the Principle of Consistent Hashing in Simple Terms

1. Introduction

When solving the load-balancing problem in a distributed system, a hash algorithm can be used to send a fixed subset of requests to the same server, so that each server handles (and keeps the state for) a fixed share of the requests. This is how hashing provides load balancing.

However, ordinary remainder hashing, hash(user id) % number of servers, scales poorly: when a server is added or taken offline, the divisor changes and most user ids are remapped to a different server. Consistent hashing improves on this by using a hash ring.
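
To make the scalability problem concrete, here is a minimal Python sketch that counts how many users change server under remainder hashing when a fifth server joins four existing ones. The hash function (MD5 truncated to 32 bits) and the user ids `user0`, `user1`, ... are assumptions for illustration; the article does not specify either.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 32-bit unsigned integer (MD5 truncated to 4 bytes)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def modulo_server(user_id: str, n_servers: int) -> int:
    """Remainder hashing: hash(user id) % number of servers."""
    return stable_hash(user_id) % n_servers


users = [f"user{i}" for i in range(10_000)]
# Count users whose server index changes when a 5th server is added.
moved = sum(modulo_server(u, 4) != modulo_server(u, 5) for u in users)
fraction_moved = moved / len(users)
print(f"{fraction_moved:.1%} of users change server when going from 4 to 5 servers")
```

A user keeps its server only when h % 4 == h % 5, which holds for 4 out of every 20 hash values (h % 20 in 0..3), so roughly 80% of users are expected to move.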

2. Overview of Consistent Hashing

To build an intuition for how consistent hashing works, consider a simple example: suppose there are 4 servers with addresses ip1, ip2, ip3, and ip4.


  • Consistent hashing first computes the hash values of the four IP addresses: hash(ip1), hash(ip2), hash(ip3), and hash(ip4). Each hash value is an integer between 0 and the largest value the hash can produce (for a 32-bit hash, 2^32 - 1). The four values are placed on the consistent hash ring as shown in the following figure:
    [Figure: hash(ip1) to hash(ip4) placed on the hash ring]

  • The ring runs clockwise from 0 up to the largest value and then wraps back to 0. Each server's hash value falls at some point on this ring, so at this stage the four servers have been mapped onto the consistent hash ring.

  • When a user sends a request from the client, the router first computes hash(user id) to find where the request falls on the ring, then walks clockwise from that position and routes the request to the first server it meets.
    [Figure: user1 to user6 placed on the ring between the servers]

As shown in the figure above, the requests of user1 and user2 are processed by server ip2, the requests of user3 by ip3, the requests of user4 by ip4, and the requests of user5 and user6 by ip1.
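
The ring lookup described above can be sketched in Python as follows. This is a minimal illustration, not a production implementation: it assumes MD5 truncated to 32 bits as the hash (Python's built-in `hash()` is salted per process for strings, so a stable hash is needed), and the class name `HashRing` is hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a point on a 32-bit ring (MD5 truncated to 4 bytes)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class HashRing:
    """Each server sits at hash(ip); a key goes to the first server found
    clockwise from hash(key), wrapping past the maximum value back to 0."""

    def __init__(self, servers):
        self._ring = sorted((stable_hash(s), s) for s in servers)
        self._points = [p for p, _ in self._ring]

    def route(self, user_id: str) -> str:
        h = stable_hash(user_id)
        # Index of the first server point >= h; past the last point, wrap to 0.
        i = bisect.bisect_left(self._points, h) % len(self._points)
        return self._ring[i][1]


ring = HashRing(["ip1", "ip2", "ip3", "ip4"])
for user in ["user1", "user2", "user3"]:
    print(user, "->", ring.route(user))
```

Which server each user lands on depends entirely on the hash function, so the printed assignments will not match the user placement in the figure; the clockwise rule is what carries over.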

Now, what happens when server ip2 goes down?
When ip2 fails, the consistent hash ring looks roughly as follows:
[Figure: the hash ring after ip2 is removed]

Following the clockwise rule, the requests of user1 and user2 are now processed by ip3, while every other user keeps the same server. In other words, only the mappings of the users previously served by ip2 are broken, and those requests are handed over to the next node clockwise.
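
The claim that only ip2's users are affected can be checked with a small simulation. This sketch reuses the same assumptions as before (MD5 truncated to 32 bits, hypothetical user ids); `build_router` is an illustrative helper, not a library function.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 32-bit point on the ring (MD5 truncated to 4 bytes)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def build_router(servers):
    """Return a function that routes a user id to the first server clockwise."""
    ring = sorted((stable_hash(s), s) for s in servers)
    points = [p for p, _ in ring]

    def route(user_id: str) -> str:
        i = bisect.bisect_left(points, stable_hash(user_id)) % len(ring)
        return ring[i][1]

    return route


users = [f"user{i}" for i in range(1000)]
route_before = build_router(["ip1", "ip2", "ip3", "ip4"])
route_after = build_router(["ip1", "ip3", "ip4"])  # ip2 has gone down

before = {u: route_before(u) for u in users}
after = {u: route_after(u) for u in users}
moved = [u for u in users if before[u] != after[u]]

# Only users that ip2 used to serve are remapped; everyone else keeps their server.
print(len(moved), "users moved; all came from ip2:", all(before[u] == "ip2" for u in moved))
```

Removing a point from the ring cannot change the first server clockwise for any key whose server is still present, which is why the moved set is exactly ip2's former users.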

Now, what happens when a new machine is added?
When server ip5 is added, the consistent hash ring looks roughly as follows:
[Figure: the hash ring after ip5 is added]

Following the clockwise rule, the request of user5, previously processed by ip1, is now processed by the newly added ip5, while every other user keeps the same server. In other words, only some of the requests of the server nearest to ip5 in the clockwise direction (here ip1) are taken over by the new server.
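
The same kind of simulation illustrates adding a server. Again this is a sketch under the assumptions stated earlier (MD5 truncated to 32 bits, hypothetical user ids, illustrative `build_router` helper).

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 32-bit point on the ring (MD5 truncated to 4 bytes)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def build_router(servers):
    """Return a function that routes a user id to the first server clockwise."""
    ring = sorted((stable_hash(s), s) for s in servers)
    points = [p for p, _ in ring]

    def route(user_id: str) -> str:
        i = bisect.bisect_left(points, stable_hash(user_id)) % len(ring)
        return ring[i][1]

    return route


users = [f"user{i}" for i in range(1000)]
route_before = build_router(["ip1", "ip2", "ip3", "ip4"])
route_after = build_router(["ip1", "ip2", "ip3", "ip4", "ip5"])  # ip5 joins

before = {u: route_before(u) for u in users}
after = {u: route_after(u) for u in users}
moved = [u for u in users if before[u] != after[u]]

# Every remapped user goes to ip5, and all of them come from one server:
# the one that sits next after ip5 in the clockwise direction.
print(len(moved), "users moved, all to ip5:", all(after[u] == "ip5" for u in moved))
print("taken over from:", {before[u] for u in moved})
```

The users that move are exactly those whose hash falls on the arc just counter-clockwise of ip5; before the change, all of them walked past that arc to the same next server.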

3. Features of Consistent Hashing

  • Monotonicity: if some requests have already been dispatched to servers by hashing and a new server is then added to the system, each of those requests should either still map to its original server or move to the new server; it should never be remapped to a different existing server. The ip5 example above illustrates this: after ip5 is added, user6, originally processed by ip1, is still processed by ip1, while user5, originally processed by ip1, is now processed by the new ip5.

  • Dispersion (spread): in a distributed environment, a client may not know about all servers when it sends a request, only some of them. From that client's point of view, the servers it sees form a complete hash ring. If multiple clients each treat a different subset of the servers as the complete ring, requests from the same user may be routed to different servers. This should be avoided, because it breaks the guarantee that requests from the same user end up on the same server. Dispersion measures how severe this problem is, and a good hashing algorithm should keep it as low as possible. Consistent hashing has very low dispersion.

  • Balance: balance means load balancing, i.e. hashed client requests should be distributed across the different servers. Consistent hashing lets every server handle requests, but it cannot guarantee that each server handles roughly the same number of them, as shown in the following figure.

After hashing, servers ip1, ip2, and ip3 fall on the consistent hash ring. From the distribution of hash values in the figure, ip1 will handle about 80% of the requests, while ip2 and ip3 together handle only about 20%. Although all three machines are processing requests, the load is clearly unbalanced; this is called consistent hash skew. Virtual nodes were introduced to solve this problem.
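
To see skew directly, the following sketch routes a batch of hypothetical user ids through a three-server ring with one point per server and reports each server's share. As before, MD5 truncated to 32 bits is an assumed hash; the exact percentages depend on where the three hashes happen to land, so none are claimed here.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """Map a string to a 32-bit point on the ring (MD5 truncated to 4 bytes)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


servers = ["ip1", "ip2", "ip3"]
ring = sorted((stable_hash(s), s) for s in servers)  # one point per server
points = [p for p, _ in ring]


def route(user_id: str) -> str:
    """First server clockwise from hash(user_id), wrapping around at the end."""
    i = bisect.bisect_left(points, stable_hash(user_id)) % len(ring)
    return ring[i][1]


users = [f"user{i}" for i in range(30_000)]
load = Counter(route(u) for u in users)
for s in servers:
    # An even split would give each server about 33.3% of users.
    print(f"{s}: {load[s] / len(users):.1%} of users")
```

With only three points on the ring, each server's share equals the length of the arc ending at its point, and nothing pushes those three arcs toward equal length.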

5. Virtual Nodes

When there are few server nodes, the consistent hash skew described in the previous section appears. One remedy is to add more machines, but machines are costly, so virtual nodes are added instead. For the three machines above, the consistent hash ring after giving each machine 1 virtual node looks like this:
[Figure: the hash ring with ip1, ip2, ip3 and their virtual nodes ip1-1, ip2-1, ip3-1]

Here ip1-1 is the virtual node of ip1, ip2-1 is the virtual node of ip2, and ip3-1 is the virtual node of ip3.
In general, if there are M physical machines and each one appears on the ring N times (its original node plus N - 1 virtual nodes), the ring holds M*N nodes; in the figure, M = 3 and N = 2, giving 6 nodes. A request is routed to the physical machine that owns the nearest node clockwise: for example, when the hash value computed by the client falls between ip2 and ip3, or between ip2-1 and ip3-1, server ip3 handles the request.

6. Uniform Consistent Hashing

In the previous section, the ring looks more balanced after adding virtual nodes, but if the algorithm that places the virtual nodes is not good enough, the result may look like this:
[Figure: a ring where each server has 1 virtual node but the nodes are unevenly spaced]
After each server node is given 1 virtual node, the balance is better than before, but the load is still not even.

A well-balanced consistent hash ring should look like this:
[Figure: a ring with evenly spaced server and virtual nodes]

The goal of uniform consistent hashing is that with N servers and M client users (hashed by user id), each server should handle about M/N users, keeping the load on every server as balanced as possible.
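
One way to measure how close a ring comes to the M/N goal is to compute, for each physical server, the fraction of the ring it owns; with uniformly hashed user ids that fraction is its expected share of users. The sketch below does this exactly (no sampling) for increasing numbers of nodes per server. The MD5-based hash, the `ip1-0`, `ip1-1`, ... labels and the helper `ring_shares` are assumptions for illustration; the printed ratios depend on the hash, so no specific numbers are claimed, though the ratio typically moves toward 1.00 as nodes are added.

```python
import hashlib

RING_SIZE = 2**32  # the ring covers 0 .. 2**32 - 1


def stable_hash(key: str) -> int:
    """Map a string to a 32-bit point on the ring (MD5 truncated to 4 bytes)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def ring_shares(servers, nodes_per_server):
    """Fraction of the ring each physical server owns (assumes >= 2 nodes in total).

    Each node owns the arc from the previous node (counter-clockwise) up to itself.
    """
    points = sorted(
        (stable_hash(f"{s}-{k}"), s) for s in servers for k in range(nodes_per_server)
    )
    shares = dict.fromkeys(servers, 0.0)
    for i, (p, s) in enumerate(points):
        prev = points[i - 1][0]  # for i == 0 this wraps around to the last node
        shares[s] += ((p - prev) % RING_SIZE) / RING_SIZE
    return shares


servers = ["ip1", "ip2", "ip3"]
results = {}
for n in (1, 10, 100, 1000):
    shares = ring_shares(servers, n)
    # Busiest server's share relative to the ideal 1/N (1.00 = perfectly even).
    results[n] = max(shares.values()) * len(servers)
    print(f"{n:>4} nodes per server: busiest server gets {results[n]:.2f}x the ideal M/N")
```

Computing arc lengths rather than routing sample users keeps the measurement exact for a given ring layout.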

7. Summary

Consistent hashing plays an important role in distributed systems, whether in distributed caches or in the load-balancing strategy of distributed RPC frameworks. For more articles, follow Arigado on Jianshu.

Origin http://43.154.161.224:23101/article/api/json?id=324769740&siteId=291194637