Introduction to Consistent Hashing

1. Consistent hashing

   What is consistent hashing, and how does it differ from an ordinary distributed hash table (DHT)? An ordinary DHT locates data with the formula position = Hash(object name) % N, where N is the number of nodes. Clearly, if we add or remove a node in the cluster, the positions of the objects must be recalculated, which causes large amounts of data migration. In this article an object means a file (in a distributed file system) or a data block (in P2P). Consistent hashing uses a different formula: position = Hash(object name) % M, where M is a constant, generally 2^31.
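To make the cost of the `% N` formula concrete, here is a minimal Python sketch that counts how many objects change position when N grows from 4 to 5. The MD5-based `h` function, the object names, and the helper names are assumptions made for illustration; they are not taken from the author's implementation.

```python
import hashlib

M = 2 ** 31  # the constant modulus from the text


def h(name):
    """Stable hash of a name. MD5 is an illustrative choice only."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


def modulo_position(name, n_nodes):
    """Ordinary DHT placement: node index = Hash(name) % N."""
    return h(name) % n_nodes


def ring_position(name):
    """Consistent-hashing position: Hash(name) % M, independent of N."""
    return h(name) % M


# Hypothetical object names, used only to measure the effect.
objects = [f"object-{i}" for i in range(10_000)]

# Count objects whose node index changes when N goes from 4 to 5.
moved = sum(modulo_position(o, 4) != modulo_position(o, 5) for o in objects)
moved_fraction = moved / len(objects)
print(f"moved under % N: {moved_fraction:.1%}")
```

For a uniform hash, an object keeps its index only when Hash % 20 is 0, 1, 2, or 3, so about 80% of the objects are expected to move. The value returned by `ring_position`, by contrast, does not depend on N at all.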

  How do we use consistent hashing to locate files?

  It generally takes two steps: 1. Compute the node positions. 2. Distribute the objects among the nodes. Let me go through them in detail:

  First, we apply an ordinary hash function to each node's name or IP address as the key, then take the result modulo M = 2^31 to obtain a new hash value. This value represents the position of the node. In the figure below, with the node names as keys, the hash values of the four nodes N1, N2, N3, N4 are 2000, 3000, 4000, 5000.

  Then we compute each object's hash value the same way: Hash = Hash(object name) % (2^31). Suppose the object O1 hashes to 2100; moving clockwise, we place O1 on node N2. Likewise O2, whose hash value is 3200, is placed on node N3.
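The clockwise lookup described above can be sketched in Python with a sorted list of node positions and a binary search. The positions are the example values from the text; the `locate` helper and the wrap-around from the last node back to the first are assumptions based on the usual ring convention, not details confirmed by the article.

```python
import bisect

# Node positions taken from the example in the text.
ring = {2000: "N1", 3000: "N2", 4000: "N3", 5000: "N4"}


def locate(ring, obj_pos):
    """Return the first node at or after obj_pos, moving clockwise."""
    positions = sorted(ring)
    i = bisect.bisect_left(positions, obj_pos)
    # Past the last node: wrap around to the first node on the ring.
    return ring[positions[i % len(positions)]]


print(locate(ring, 2100))  # O1 is placed on N2
print(locate(ring, 3200))  # O2 is placed on N3
print(locate(ring, 6000))  # beyond N4, wraps around to N1
```

Re-sorting the positions on every call keeps the sketch short; a real implementation would keep the sorted list cached and update it only when nodes join or leave.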

  So what is the benefit? Suppose we add a node X whose computed position is 3500. In an ordinary DHT we would need to recalculate the position of the data on every node. But (this is the key point) with consistent hashing we only need to migrate the object O2 from node N3 to the new node X. Only the objects whose hash values fall between N2 and X are affected (the portion marked in bold black in the figure), and all of them move from N3 to X. This greatly reduces the number of nodes affected.

    

  Now consider removing a node. If we delete node N3, we only need to migrate all the objects on N3 clockwise to its nearest node, N4.
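Both cases (adding X at 3500 and deleting N3) can be checked with a short Python sketch on the example ring. The objects O3 and O4 and their positions are hypothetical, added only so that some objects stay in place; the helper names are likewise assumptions for illustration.

```python
import bisect


def locate(ring, obj_pos):
    """First node at or after obj_pos, wrapping around the ring."""
    positions = sorted(ring)
    i = bisect.bisect_left(positions, obj_pos)
    return ring[positions[i % len(positions)]]


def placement(ring, objects):
    """Map every object name to the node that stores it."""
    return {name: locate(ring, pos) for name, pos in objects.items()}


def moved(before, after):
    """Objects whose node changed, as name -> (old node, new node)."""
    return {n: (before[n], after[n]) for n in before if before[n] != after[n]}


ring = {2000: "N1", 3000: "N2", 4000: "N3", 5000: "N4"}
# O1 and O2 come from the text; O3 and O4 are hypothetical.
objects = {"O1": 2100, "O2": 3200, "O3": 3800, "O4": 4500}
base = placement(ring, objects)

# Add node X at position 3500.
with_x = {**ring, 3500: "X"}
added = moved(base, placement(with_x, objects))
print(added)    # {'O2': ('N3', 'X')}

# Delete node N3 from the original ring.
without_n3 = {p: n for p, n in ring.items() if n != "N3"}
removed = moved(base, placement(without_n3, objects))
print(removed)  # {'O2': ('N3', 'N4'), 'O3': ('N3', 'N4')}
```

In both cases O1 and O4 stay where they are, which is the point of the scheme: only objects on the arc next to the changed node migrate.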

    

 

 

  As we can see, as a means of distributing data in a distributed environment, consistent hashing retains the load-balancing property of a DHT while also reducing the data migration overhead caused by scaling the nodes horizontally. So where is consistent hashing used? As far as I know, current uses include BT download and distributed file systems (IPFS, Ceph, OpenStack Swift).

 

2. Improved consistent hashing

   As we saw above, when we add or delete a node, the change in load falls entirely on its neighbor; deleting a node, for example, moves all of its load onto a single neighboring node. That is of course not what we want: ideally the load stays balanced across every node in the cluster. So how can we achieve this? We can solve the problem by introducing virtual nodes.

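  Each physical node is split into several virtual nodes. For each object, we first compute which virtual node it falls on, then look up the owner of that virtual node and store the object on the actual physical node.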

    

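  As the figure above shows, the whole structure has two layers: an outer layer of virtual nodes and an inner layer of physical nodes, and the virtual nodes in the outer layer are distributed randomly. This random ordering of the virtual nodes further improves the load balance across the physical nodes.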

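  Let us look at the data migration when node N4 is removed. When N4 is removed, the data on every virtual node belonging to N4 must migrate to the next virtual node in the clockwise direction; in this example those are N3#1, N2#2, N1#3, and N1#1. We can see that N4's data is redistributed evenly across all of the remaining nodes.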
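The virtual-node scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's Go implementation: the MD5-based hash, the choice of 100 virtual nodes per physical node (`VNODES`), the `node#i` key format (modeled on the N3#1 naming above), and the object names are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter

M = 2 ** 31
VNODES = 100  # virtual nodes per physical node; an assumed value


def h(key):
    """Position on the ring: Hash(key) % M, with MD5 as an assumed hash."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % M


def build_ring(nodes, vnodes=VNODES):
    """Outer layer: map each virtual node position to its physical owner."""
    return {h(f"{node}#{i}"): node
            for node in nodes for i in range(1, vnodes + 1)}


def placement(nodes, keys):
    """Locate each key's virtual node clockwise, then return its owner."""
    ring = build_ring(nodes)
    positions = sorted(ring)
    result = {}
    for key in keys:
        i = bisect.bisect_left(positions, h(key))
        result[key] = ring[positions[i % len(positions)]]
    return result


keys = [f"object-{i}" for i in range(10_000)]
before = placement(["N1", "N2", "N3", "N4"], keys)
after = placement(["N1", "N2", "N3"], keys)  # N4 removed

moved = [k for k in keys if before[k] != after[k]]
receivers = Counter(after[k] for k in moved)
print(receivers)  # how many of N4's objects each remaining node received
```

Only objects that were stored on N4 move, and because N4's virtual nodes are scattered around the ring, those objects are spread over N1, N2, and N3 rather than landing on a single neighbor.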

3. Implementation of consistent hashing

  The code is on GitHub, written in Go (golang). It is not finished yet, and corrections are welcome. https://github.com/DennisWong/ConsistentHash.git



Origin www.cnblogs.com/dennis-wong/p/11374737.html