Big Data Foundation - Ask and you shall receive: the consistent hash you want is here (Part 1) [with code]

Foreword

Recently people have kept asking me about consistent hashing, so let's talk about it today. In the first two articles we introduced two hash-based sharding methods: hash modulo and virtual buckets.

  • The hash modulo method makes the architecture inflexible. Nodes have to be scaled out or in by exactly a factor of two just to keep 50% of the mappings unchanged; otherwise the query hit rate drops. And when a node fails unexpectedly, it is a disaster.

  • Virtual-bucket sharding improves on hash modulo and fits the general three-layer routing/sharding model. The number of shards (buckets) is fixed, which avoids the modulus's high sensitivity to the node count; when the topology changes, only some of the buckets are migrated to the new node.

Although virtual buckets are a big improvement over hash modulo, a topology change still touches quite a few nodes. Is there a method with an even smaller impact? In this article we start to explain consistent hashing.

The principle and implementation of consistent hashing

1. Build an empty hash ring

  • Let's take it step by step. Previously we took the node's hash modulo the number of shards; in consistent hashing we take it modulo 2^32. Since we work with non-negative values, the result of the modulo must fall in the interval [0, 2^32 - 1]. To be picky for a moment: why 2 to the power of 32? After some searching, the usual explanation is that "the maximum value of int in Java is 2^31 - 1 and the minimum is -2^31, so 2^32 is exactly the range of an unsigned 32-bit integer". That explanation makes sense, but then why not short? I suspect the algorithm's designers simply felt that short's maximum of 32767 (2^15 - 1) is too small. So why not an even larger value? If you know the reason, please leave a comment below~

  • Now imagine this range as a ring, as shown below: the starting point is 0 and the values increase clockwise up to the maximum of 2^32 - 1. We call such a ring a hash ring.

Figure 1

2. Construct the node topology on the hash ring

  • So what exactly does consistent hashing have to do with the hash ring in the figure above?

Let's explain with a concrete example. Suppose we have three nodes: A, B and C. Each node has its own IP address and host name (within the same VPC or cluster, IP addresses and host names are generally unique). We first hash the node's IP address or host name, then take the result modulo 2^32:

hash(node IP address or host name) % 2^32
  • As above, the result of the modulo must fall in the interval [0, 2^32 - 1], which means it must land somewhere on the hash ring. We perform the same operation on nodes A, B and C, and each node lands on the hash ring according to its hash-modulo value; a quick sketch of this position calculation follows below.
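
To make the formula concrete, here is a minimal sketch (Python 3 syntax; it uses SHA-1 like the full code later in this article, and the node names are just placeholders) that computes the ring position of each node and sorts them clockwise:

import hashlib

RING_SIZE = 2 ** 32  # ring positions fall in [0, 2^32 - 1]

def ring_position(name):
    # hash(node IP address or host name) % 2^32
    digest = hashlib.sha1(name.encode('utf-8')).hexdigest()
    return int(digest, 16) % RING_SIZE

nodes = ['node-A', 'node-B', 'node-C']
for position, name in sorted((ring_position(n), n) for n in nodes):
    print(name, 'sits at ring position', position)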


3. Map the key to the hash ring

  • We process each key in exactly the same way: hash it, take the result modulo 2^32, and map it onto the hash ring.

  • In consistent hashing, each key then walks clockwise and is assigned to the nearest node it reaches, as shown in the figure below; we take 3 keys as an example (a minimal lookup sketch follows after this discussion).

At this point some readers may have doubts: this does not seem to match the general model mentioned before. Does it mean that consistent hashing has broken away from the general routing/sharding model?

In fact, consistent hashing still fits the general routing model. We previously described the data-shard-node layering, so where is the shard? Look back at the hash ring in step 2: the three physical nodes divide the ring into three arcs, namely node A-node B, node B-node C and node C-node A. We can regard each arc as a shard. key-01, for example, falls on the node C-node A arc, so that shard belongs to node A. Viewed this way, consistent hashing still conforms to our general routing/sharding model.
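
The clockwise lookup itself is easy to express with a sorted list of node positions plus a binary search. The following is only a minimal sketch (Python 3, standard-library bisect; the node and key names are placeholders); the full step-by-step implementation appears in the Code section below:

import bisect
import hashlib

RING_SIZE = 2 ** 32

def ring_position(name):
    # hash(...) % 2^32, using SHA-1 as in the full code below
    return int(hashlib.sha1(name.encode('utf-8')).hexdigest(), 16) % RING_SIZE

def lookup(key, sorted_positions, ring):
    """Return the first node clockwise from the key's position."""
    idx = bisect.bisect_right(sorted_positions, ring_position(key))
    if idx == len(sorted_positions):  # walked past the last node: wrap around to the first
        idx = 0
    return ring[sorted_positions[idx]]

nodes = ['node-A', 'node-B', 'node-C']
ring = {ring_position(n): n for n in nodes}
sorted_positions = sorted(ring)
for key in ['key-01', 'key-02', 'key-03']:
    print(key, '->', lookup(key, sorted_positions, ring))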


4. Simulating a node going offline

Next we simulate a node going offline (a machine crash or a scale-in). As shown in the figure below, when node C goes offline only node A and node B remain on the hash ring, which means there are only two shards left: node A-node B and node B-node A. key-03, which used to sit on node C (on the node B-node C shard), follows the same logic, searches clockwise for the next node and finds node A, i.e. it now sits on the node B-node A shard. Notice that when a node goes offline, only the keys between the offline node and the first node counterclockwise from it are affected, and only those keys need to be redistributed to a new node; all keys on the other shards of the ring are untouched. (The sketch after the next step quantifies this.)


5. Simulating a node coming online

Similarly, after simulating a node going offline, we now simulate a node coming online (scaling out, or a crashed node recovering). A new node D joins the hash ring, and its hash-modulo value happens to fall between node B and node C, so the node B-node C shard is split into two shards: node B-node D and node D-node C. Just as in the offline case, key-03 now finds node D as the first node clockwise and is redistributed from node C to node D. The scope of impact is again limited to the keys between the new node and the first node counterclockwise from it on the ring.
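
To see how local the impact is, here is a small self-contained simulation (a Python 3 sketch; it reuses the test IPs and key names from the full code below, which are made up for illustration). It maps a batch of keys, then brings a node online and takes one offline, and counts how many keys had to move in each case:

import bisect
import hashlib

RING_SIZE = 2 ** 32

def ring_position(name):
    return int(hashlib.sha1(name.encode('utf-8')).hexdigest(), 16) % RING_SIZE

def assign(keys, nodes):
    """Map every key to the first node clockwise on the ring."""
    ring = {ring_position(n): n for n in nodes}
    points = sorted(ring)
    return {k: ring[points[bisect.bisect_right(points, ring_position(k)) % len(points)]]
            for k in keys}

keys = ['testKey%d' % i for i in range(40)]
nodes = ['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4']

before = assign(keys, nodes)
after_online = assign(keys, nodes + ['192.168.1.5'])                    # a node comes online
after_offline = assign(keys, [n for n in nodes if n != '192.168.1.1'])  # a node goes offline

print('keys moved when a node came online:', sum(1 for k in keys if before[k] != after_online[k]))
print('keys moved when a node went offline:', sum(1 for k in keys if before[k] != after_offline[k]))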


6. Model analysis

That is the core principle and implementation of consistent hashing. Looking back at the problem of large-scale data remapping, consistent hashing is a big improvement over hash modulo.

So why exactly does consistent hashing do so much better when nodes change?

As usual, let's return to the general routing/sharding model. When a node changes, consistent hashing really only affects the data on one shard and one node; it does not touch multiple shards and multiple nodes the way virtual buckets and hash modulo do. That is why consistent hashing has the advantage in fault tolerance and scalability.
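
As a rough numerical check of that claim, the sketch below (Python 3; it assumes plain hash modulo assigns a key to nodes[hash(key) % number_of_nodes], as in the first article of this series) counts how many of 10,000 keys get remapped when a fifth node is added under each scheme:

import bisect
import hashlib

RING_SIZE = 2 ** 32

def h(name):
    return int(hashlib.sha1(name.encode('utf-8')).hexdigest(), 16)

def modulo_assign(keys, nodes):
    # plain hash modulo: key -> nodes[hash(key) % N]
    return {k: nodes[h(k) % len(nodes)] for k in keys}

def ring_assign(keys, nodes):
    # consistent hashing: key -> first node clockwise on the ring
    ring = {h(n) % RING_SIZE: n for n in nodes}
    points = sorted(ring)
    return {k: ring[points[bisect.bisect_right(points, h(k) % RING_SIZE) % len(points)]]
            for k in keys}

keys = ['key-%d' % i for i in range(10000)]
old_nodes = ['10.0.0.%d' % i for i in range(1, 5)]   # 4 nodes
new_nodes = old_nodes + ['10.0.0.5']                 # scale out to 5 nodes

for name, assigner in [('hash modulo', modulo_assign), ('consistent hash', ring_assign)]:
    before, after = assigner(keys, old_nodes), assigner(keys, new_nodes)
    moved = sum(1 for k in keys if before[k] != after[k])
    print('%s: %d of %d keys remapped' % (name, moved, len(keys)))

With this setup the modulo scheme should remap the large majority of the keys, while consistent hashing should only remap the keys handed to the new node.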

Code

The theory is dry, so let's go to the code. (Only the basic code is shown below, with comments on the important parts.)

#!/usr/bin/env python
# -*- coding:utf8 -*-
# author: lzj
import hashlib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

class ConsistentHash(object):
    def __init__(self, nodes=None):
        '''
           a. nodes is the initial list of nodes
           b. the dict ring holds the mapping from each node's hash-modulo value to the node, as described above
           c. the list sortedKeys simulates the clockwise ordering of values from 0 up to 2^32; think of it as the ring
           d. finally, the nodes are added to the cluster
        '''

        self.nodes = nodes
        self.ring = {}
        self.sortedKeys = []
        self.addNodes(nodes)

    # nodes is a list of nodes; iterate over it and add each one to the hash ring
    def addNodes(self, nodes):
        if nodes:
            for node in nodes:
                # As in the hash-modulo article, hash with SHA-1, then convert the hex digest to an int
                nodeHashResult = hashlib.sha1(node).hexdigest()
                intNodeHashResult = int(nodeHashResult, 16)
                modIntNodeHashResult = intNodeHashResult % (2 ** 32)
                self.ring[modIntNodeHashResult] = node
                self.sortedKeys.append(modIntNodeHashResult)
                self.sortedKeys.sort()

    def removeNodes(self, nodes):
        '''
        The mirror of addNodes: remove the node's hash-modulo value from the ring mapping
        and clean up the node's entry on the ring
        '''
        if nodes:
            for node in nodes:
                nodeHashResult = hashlib.sha1(node).hexdigest()
                intNodeHashResult = int(nodeHashResult, 16)
                modIntNodeHashResult = intNodeHashResult % (2 ** 32)
                self.ring.pop(modIntNodeHashResult)
                self.sortedKeys.remove(modIntNodeHashResult)

    def getNode(self, modKeyHashResult):
        '''
            Walk through sortedKeys in order; since the list is sorted, the first node value
            greater than the key's value is the node the key belongs to
        '''
        position = 0
        for _modIntNodeHashResult in self.sortedKeys:
            position += 1
            if modKeyHashResult < _modIntNodeHashResult:
                return self.ring[_modIntNodeHashResult]
            else:
                continue
        ''' 
            If every comparison fails, the key lies between the last node (the first one counterclockwise
            from 0) and the first node clockwise, so it wraps around and belongs to the first node
        '''
        if position == len(self.sortedKeys):
            return self.ring[self.sortedKeys[0]]

    def allocateKey(self,number):
        keyNodeMap = []
        # Simulate a number of keys and hash/modulo them exactly as we did for nodes
        for i in range(number):
            keyName = 'testKey' + str(i)
            keyHashResult = hashlib.sha1(keyName).hexdigest()
            intKeyHashResult = int(keyHashResult, 16)
            modKeyHashResult = intKeyHashResult % (2 ** 32)
            # Locate the key, i.e. decide which node (and therefore which shard) it belongs to
            _node = self.getNode(modKeyHashResult)
            print '%s is allocateKey to %s' %(keyName,_node)
            keyNodeMap.append(keyName + '_' + _node)
        return keyNodeMap

# Simulate a cluster with 4 nodes
iniServers = [
    '192.168.1.1',
    '192.168.1.2',
    '192.168.1.3',
    '192.168.1.4',
    ]

print 'Initialize a hash ring and build the node topology on it'
h = ConsistentHash(iniServers)
print 'Nodes on the ring: %s' % h.ring
print 'Sorted node hash values: %s' % h.sortedKeys

print 'Create a batch of keys and map them onto the hash ring'
number = 40
oldMap = h.allocateKey(number)

print 'Bring a new node online and check whether keys migrate to it correctly'
newNode = ['192.168.1.5']
h.addNodes(newNode)
print 'Nodes on the ring: %s' % h.ring
print 'Sorted node hash values: %s' % h.sortedKeys
addNodeMap = h.allocateKey(number)

print 'Take a node offline and observe again'
removeNode = ['192.168.1.1']
h.removeNodes(removeNode)
print 'Nodes on the ring: %s' % h.ring
print 'Sorted node hash values: %s' % h.sortedKeys
removeNodeMap = h.allocateKey(number)
How faithful is the code? Let's check it the other way round, from the run output, shown below.
Initialize a hash ring and build the node topology on it
Nodes on the ring: {560662416L: '192.168.1.1', 216828752L: '192.168.1.3', 2895068098L: '192.168.1.2', 1580996791L: '192.168.1.4'}
Sorted node hash values: [216828752L, 560662416L, 1580996791L, 2895068098L]
Create a batch of keys and map them onto the hash ring
testKey0 is allocateKey to 192.168.1.4
testKey1 is allocateKey to 192.168.1.1
testKey2 is allocateKey to 192.168.1.4
testKey3 is allocateKey to 192.168.1.4
testKey4 is allocateKey to 192.168.1.3
testKey5 is allocateKey to 192.168.1.3
testKey6 is allocateKey to 192.168.1.2
testKey7 is allocateKey to 192.168.1.2
testKey8 is allocateKey to 192.168.1.3
testKey9 is allocateKey to 192.168.1.2
testKey10 is allocateKey to 192.168.1.4
testKey11 is allocateKey to 192.168.1.1
testKey12 is allocateKey to 192.168.1.3
testKey13 is allocateKey to 192.168.1.4
testKey14 is allocateKey to 192.168.1.3
testKey15 is allocateKey to 192.168.1.2
testKey16 is allocateKey to 192.168.1.4
testKey17 is allocateKey to 192.168.1.4
testKey18 is allocateKey to 192.168.1.1
testKey19 is allocateKey to 192.168.1.1
testKey20 is allocateKey to 192.168.1.3
testKey21 is allocateKey to 192.168.1.2
testKey22 is allocateKey to 192.168.1.4
testKey23 is allocateKey to 192.168.1.2
testKey24 is allocateKey to 192.168.1.2
testKey25 is allocateKey to 192.168.1.3
testKey26 is allocateKey to 192.168.1.2
testKey27 is allocateKey to 192.168.1.3
testKey28 is allocateKey to 192.168.1.2
testKey29 is allocateKey to 192.168.1.2
testKey30 is allocateKey to 192.168.1.2
testKey31 is allocateKey to 192.168.1.1
testKey32 is allocateKey to 192.168.1.3
testKey33 is allocateKey to 192.168.1.2
testKey34 is allocateKey to 192.168.1.2
testKey35 is allocateKey to 192.168.1.3
testKey36 is allocateKey to 192.168.1.2
testKey37 is allocateKey to 192.168.1.2
testKey38 is allocateKey to 192.168.1.2
testKey39 is allocateKey to 192.168.1.2
Bring a new node online and check whether keys migrate to it correctly [note the keys whose node changed]
Nodes on the ring: {560662416L: '192.168.1.1', 216828752L: '192.168.1.3', 2895068098L: '192.168.1.2', 1785826697L: '192.168.1.5', 1580996791L: '192.168.1.4'}
Sorted node hash values: [216828752L, 560662416L, 1580996791L, 1785826697L, 2895068098L]
testKey0 is allocateKey to 192.168.1.4
testKey1 is allocateKey to 192.168.1.1
testKey2 is allocateKey to 192.168.1.4
testKey3 is allocateKey to 192.168.1.4
testKey4 is allocateKey to 192.168.1.3
testKey5 is allocateKey to 192.168.1.3
testKey6 is allocateKey to 192.168.1.2
testKey7 is allocateKey to 192.168.1.2
testKey8 is allocateKey to 192.168.1.3
testKey9 is allocateKey to 192.168.1.2
testKey10 is allocateKey to 192.168.1.4
testKey11 is allocateKey to 192.168.1.1
testKey12 is allocateKey to 192.168.1.3
testKey13 is allocateKey to 192.168.1.4
testKey14 is allocateKey to 192.168.1.3
testKey15 is allocateKey to 192.168.1.5
testKey16 is allocateKey to 192.168.1.4
testKey17 is allocateKey to 192.168.1.4
testKey18 is allocateKey to 192.168.1.1
testKey19 is allocateKey to 192.168.1.1
testKey20 is allocateKey to 192.168.1.3
testKey21 is allocateKey to 192.168.1.2
testKey22 is allocateKey to 192.168.1.4
testKey23 is allocateKey to 192.168.1.5
testKey24 is allocateKey to 192.168.1.2
testKey25 is allocateKey to 192.168.1.3
testKey26 is allocateKey to 192.168.1.2
testKey27 is allocateKey to 192.168.1.3
testKey28 is allocateKey to 192.168.1.2
testKey29 is allocateKey to 192.168.1.2
testKey30 is allocateKey to 192.168.1.2
testKey31 is allocateKey to 192.168.1.1
testKey32 is allocateKey to 192.168.1.3
testKey33 is allocateKey to 192.168.1.2
testKey34 is allocateKey to 192.168.1.2
testKey35 is allocateKey to 192.168.1.3
testKey36 is allocateKey to 192.168.1.5
testKey37 is allocateKey to 192.168.1.2
testKey38 is allocateKey to 192.168.1.2
testKey39 is allocateKey to 192.168.1.2
Take a node offline and observe again [note the keys whose node changed]
Nodes on the ring: {216828752L: '192.168.1.3', 2895068098L: '192.168.1.2', 1785826697L: '192.168.1.5', 1580996791L: '192.168.1.4'}
Sorted node hash values: [216828752L, 1580996791L, 1785826697L, 2895068098L]
testKey0 is allocateKey to 192.168.1.4
testKey1 is allocateKey to 192.168.1.4
testKey2 is allocateKey to 192.168.1.4
testKey3 is allocateKey to 192.168.1.4
testKey4 is allocateKey to 192.168.1.3
testKey5 is allocateKey to 192.168.1.3
testKey6 is allocateKey to 192.168.1.2
testKey7 is allocateKey to 192.168.1.2
testKey8 is allocateKey to 192.168.1.3
testKey9 is allocateKey to 192.168.1.2
testKey10 is allocateKey to 192.168.1.4
testKey11 is allocateKey to 192.168.1.4
testKey12 is allocateKey to 192.168.1.3
testKey13 is allocateKey to 192.168.1.4
testKey14 is allocateKey to 192.168.1.3
testKey15 is allocateKey to 192.168.1.5
testKey16 is allocateKey to 192.168.1.4
testKey17 is allocateKey to 192.168.1.4
testKey18 is allocateKey to 192.168.1.4
testKey19 is allocateKey to 192.168.1.4
testKey20 is allocateKey to 192.168.1.3
testKey21 is allocateKey to 192.168.1.2
testKey22 is allocateKey to 192.168.1.4
testKey23 is allocateKey to 192.168.1.5
testKey24 is allocateKey to 192.168.1.2
testKey25 is allocateKey to 192.168.1.3
testKey26 is allocateKey to 192.168.1.2
testKey27 is allocateKey to 192.168.1.3
testKey28 is allocateKey to 192.168.1.2
testKey29 is allocateKey to 192.168.1.2
testKey30 is allocateKey to 192.168.1.2
testKey31 is allocateKey to 192.168.1.4
testKey32 is allocateKey to 192.168.1.3
testKey33 is allocateKey to 192.168.1.2
testKey34 is allocateKey to 192.168.1.2
testKey35 is allocateKey to 192.168.1.3
testKey36 is allocateKey to 192.168.1.5
testKey37 is allocateKey to 192.168.1.2
testKey38 is allocateKey to 192.168.1.2
testKey39 is allocateKey to 192.168.1.2       

Endnote

As you can see, the code above reproduces consistent hashing quite faithfully, and from our analysis its behavior is quite good. **So does that mean consistent hashing has no flaws? In the next article we will analyze the pain points of consistent hashing.** This piece already runs to several thousand words. I thought about splitting it into more parts, but to keep the material continuous I settled on just two articles; consistent hashing really is worth analyzing and studying.

Original writing is not easy. If you feel you have gained something, please give this post a like or share it; your support is what keeps me writing.
Let's grow and make progress together.

Origin: blog.csdn.net/weixin_47158466/article/details/108232056