Consistent hashing

A distributed hash table (DHT) is a common technique in distributed storage systems: it extends an ordinary hash table across many nodes. Consistent hashing is one of the techniques used to build one.

1. Problems with ordinary hash distribution (node = hash(key) mod N)

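(1) Adding or removing a node changes N, so almost every key maps to a different node and nearly all existing keys become invalid (in a cache, nearly every lookup misses at once)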

(2) Unbalanced distribution, i.e. data skew: most of the data ends up on one of the hash nodes
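To make problem (1) concrete, here is a small Python sketch. It assumes that "ordinary hash distribution" means node = hash(key) mod N and uses MD5 as a stand-in hash function; the helper names stable_hash, mod_node and fraction_moved are hypothetical. It counts how many keys change nodes when a cluster grows from 4 to 5 nodes.

```python
import hashlib


def stable_hash(key: str) -> int:
    # MD5-based hash: stable across runs, unlike Python's salted built-in hash().
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def mod_node(key: str, num_nodes: int) -> int:
    # Ordinary hash distribution: node index = hash(key) mod N.
    return stable_hash(key) % num_nodes


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    # Share of keys whose node index changes when the node count goes from old_n to new_n.
    moved = sum(1 for k in keys if mod_node(k, old_n) != mod_node(k, new_n))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
print(f"4 -> 5 nodes: {fraction_moved(keys, 4, 5):.0%} of keys move")
```

With a uniform hash, a key stays on the same node only when h mod 4 equals h mod 5, which holds for just 4 of the 20 possible values of h mod 20, so roughly 80% of the keys move.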

2. Consistent Hash Algorithm

Idea: Assign a random token to each node in the system; together these tokens form a hash ring. To store a key, first compute its hash value, then move clockwise around the ring and store the key on the node that owns the first token greater than or equal to that hash value (wrapping around to the smallest token if no such token exists).

Advantage: Adding or removing a node only affects the adjacent node on the ring (the keys in the arc that the changed node takes over or gives up); all other nodes are unaffected, so only a fraction of the keys become invalid.

For details, see https://blog.csdn.net/sparkliang/article/details/5279393
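The ring lookup and the add-a-node behavior described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes each node's token is derived by hashing the node name with MD5 rather than drawn at random, and the names HashRing, add_node, remove_node and get_node are hypothetical. The standard bisect module performs the "first token greater than or equal to the key's hash" search on a sorted list.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # 64-bit ring position from MD5; unlike the built-in hash(), it is stable across processes.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring: one token per node, no virtual nodes."""

    def __init__(self, nodes=()):
        self._tokens = []  # sorted token positions on the ring
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Assumption: the token is hash(node name) instead of a random number.
        token = stable_hash(node)
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def remove_node(self, node: str) -> None:
        token = stable_hash(node)
        self._tokens.remove(token)
        del self._owner[token]

    def get_node(self, key: str) -> str:
        # Walk clockwise: first token >= hash(key), wrapping to the smallest token.
        i = bisect.bisect_left(self._tokens, stable_hash(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-e")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
assert all(after[k] == "node-e" for k in moved)  # only keys taken over by node-e move
print(f"{len(moved) / len(keys):.0%} of keys moved, all to node-e")

ring.remove_node("node-e")
assert all(ring.get_node(k) == before[k] for k in keys)  # removal restores the old mapping
```

The first assertion checks the advantage claimed above: when node-e joins, only the keys in the arc it takes over change owner, and every one of them moves to node-e. Removing node-e hands those keys back to the same neighbor.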

3. Improvements

(1) Introduce virtual nodes:

   Reason: i. The position at which each machine node is mapped onto the ring is random, so with only a few nodes the arcs they own can differ greatly in size and the machine load becomes unbalanced.

           ii. In a large data center, the physical machines do not all have the same capacity. The basic consistent hashing algorithm above treats them equally, so machines with lower specifications may end up overloaded.

   Idea: Virtualize each physical node into several virtual nodes, which are mapped to different positions on the consistent hash ring. Many small arcs per machine even out the load, and a more powerful machine can be given more virtual nodes so that it receives a proportionally larger share of the keys.
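A sketch of virtual nodes in Python, under stated assumptions: the names WeightedHashRing and base_replicas and the "node#vnI" naming scheme for virtual nodes are hypothetical, and 100 virtual nodes per unit of weight is an illustrative value, not a recommendation. Each physical node is hashed to weight × base_replicas positions, which addresses both reasons: many small arcs smooth out random placement, and a higher weight gives a stronger machine a larger share.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    # 64-bit ring position from MD5 (stable across processes).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class WeightedHashRing:
    """Consistent-hash ring in which each physical node owns several virtual nodes."""

    def __init__(self, base_replicas: int = 100):
        self.base_replicas = base_replicas  # virtual nodes per unit of weight (illustrative)
        self._tokens = []                   # sorted virtual-node positions
        self._owner = {}                    # virtual-node position -> physical node

    def add_node(self, node: str, weight: int = 1) -> None:
        # A node of weight w is placed at w * base_replicas pseudo-random positions,
        # so it owns roughly w times as much of the ring as a weight-1 node.
        for i in range(weight * self.base_replicas):
            token = stable_hash(f"{node}#vn{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def get_node(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring is empty")
        # Clockwise rule applied to virtual-node positions, mapped back to the physical node.
        i = bisect.bisect_left(self._tokens, stable_hash(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


ring = WeightedHashRing()
ring.add_node("small-1", weight=1)
ring.add_node("small-2", weight=1)
ring.add_node("big", weight=2)  # e.g. a machine with twice the capacity
counts = Counter(ring.get_node(f"key-{i}") for i in range(20_000))
print(counts.most_common())
```

With these weights the big node should receive about half of the keys and each small node about a quarter; the exact counts depend on where the virtual nodes happen to land.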

(2) Store routing information on each node to speed up lookups

   There are three approaches (N is the number of servers):

   i. Each server records the location of the node just before it and the node just after it. The space complexity is O(1) and the time complexity is O(N), because a lookup may have to pass through every server.

   ii. Each server maintains a routing table with n entries (assuming the hash space is $0$ to $2^n - 1$). If p is the server's position on the hash ring, the i-th entry of its routing table records the successor node of position $(p + 2^{i-1}) \bmod 2^n$, that is, the first server at or after that position. The space complexity is O(log N) and the time complexity is O(log N).

   iii. Each server records the location of every node. The space complexity is O(N) and the time complexity is O(1). (This is the method generally used in practice.)
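The three routing approaches can be compared with a small simulation. This Python sketch assumes n = 16 (so the hash space is 0 to 2^16 - 1), places 200 servers at random positions, and counts how many times a query is forwarded before some server can name the key's owner; all names (Ring, lookup_linear, lookup_fingers, lookup_full_table) are hypothetical. Approach ii is implemented as the finger-table lookup used by the Chord DHT: at each step the query jumps to the farthest routing-table entry that still precedes the key.

```python
import bisect
import random

N_BITS = 16            # n in the text: the hash space is 0 .. 2**n - 1 (assumed value)
SPACE = 2 ** N_BITS


def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], going clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past 0 (a == b means the whole ring)


class Ring:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))  # server positions on the ring

    def successor(self, point: int) -> int:
        """First server at or after `point`, wrapping around the ring."""
        i = bisect.bisect_left(self.nodes, point % SPACE)
        return self.nodes[i % len(self.nodes)]

    def next_node(self, p: int) -> int:
        return self.successor(p + 1)

    def fingers(self, p: int) -> list:
        # Approach ii routing table: entry i (1-based) = successor((p + 2**(i-1)) mod 2**n).
        return [self.successor(p + 2 ** (i - 1)) for i in range(1, N_BITS + 1)]

    def lookup_linear(self, start: int, key: int):
        """Approach i: forward the query one server clockwise at a time."""
        p, hops = start, 0
        while not in_interval(key, p, self.next_node(p)):
            p = self.next_node(p)
            hops += 1
        return self.next_node(p), hops

    def lookup_fingers(self, start: int, key: int):
        """Approach ii: jump to the farthest routing-table entry that still precedes the key."""
        p, hops = start, 0
        while not in_interval(key, p, self.next_node(p)):
            p = next(f for f in reversed(self.fingers(p))
                     if f != key and in_interval(f, p, key))
            hops += 1
        return self.next_node(p), hops

    def lookup_full_table(self, start: int, key: int):
        """Approach iii: the starting server knows every server, so nothing is forwarded."""
        return self.successor(key), 0


random.seed(1)
ring = Ring(random.sample(range(SPACE), 200))       # N = 200 servers
keys = [random.randrange(SPACE) for _ in range(500)]
avg_hops = {}
for name, lookup in [("successor only", ring.lookup_linear),
                     ("finger table", ring.lookup_fingers),
                     ("full table", ring.lookup_full_table)]:
    hops = []
    for k in keys:
        owner, h = lookup(ring.nodes[0], k)
        assert owner == ring.successor(k)  # every approach finds the same owner
        hops.append(h)
    avg_hops[name] = sum(hops) / len(hops)
    print(f"{name:15s} average forwards = {avg_hops[name]:.1f}, max = {max(hops)}")
```

Expect the successor-only walk to average on the order of N/2 forwards, the finger table only a handful (each jump roughly halves the remaining distance to the key), and the full table zero. In the full-table version, the O(1) refers to routing hops; finding the owner in the local sorted table is still a binary search.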
