Kademlia algorithm, an implementation of a distributed hash table (DHT)

Introduction to DHT

A hash table can be thought of as a key-value (kv) store, and a distributed hash table spreads the kv pairs across different machines. In this article, a machine that stores part of the kv pairs is called a node.

A DHT must provide at least two functions: 1. store a kv pair; 2. given a key, return the corresponding value. The core of the second function is to determine, given a key, which nodes hold the corresponding value, so that the client can connect to those nodes and fetch it. This could be done with a centralized index, but this article discusses a decentralized implementation, in which the answer must be found by querying several nodes. To make the second function possible, the first function must choose the nodes that store a kv pair according to fixed rules, so that the same rules can be applied at query time to find those nodes. Beyond these two functions, a DHT must also handle nodes joining and nodes failing.

Kademlia algorithm

The Kademlia algorithm is one implementation of a DHT and is used in IPFS, eMule, BitTorrent, and others. Below we introduce Kademlia following the DHT functions described above.

How to decide which nodes store a kv pair

A simple idea is to map nodes into the same value space as keys. Each node computes a hash value by some hashing rule, called its node id, whose range is the same as the range of keys. We then stipulate that each kv pair is stored on the node whose node id equals the key, so that a query can derive the node id directly from the key and then look up the node information for that node id. This approach has two problems. First, the nodes generally do not cover the entire key range, so a node with node id = key may not exist. Second, storing a value on only one node is unsafe: if that node fails, the value is gone. So we make a small improvement and introduce the concept of distance: each value is sent to the k nodes whose node ids are closest to the key, which solves both problems. This is what the Kademlia algorithm does.
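The placement rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `make_id` and `k_closest` are hypothetical, SHA-1 truncated to 16 bits is an arbitrary choice of hash, the node addresses are made up, and the distance is assumed to be the XOR metric that the article introduces later.

```python
import hashlib

ID_BITS = 16  # assumed small ID space so the numbers stay readable


def make_id(name: str) -> int:
    """Hash a string (node address or key) into the shared ID space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()  # 160-bit digest
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def k_closest(node_ids, key, k):
    """Return the k node ids closest to key (XOR distance is an assumption here)."""
    return sorted(node_ids, key=lambda n: n ^ key)[:k]


# Hypothetical nodes and key; both are hashed into the same ID space.
nodes = [make_id(f"10.0.0.{i}:4000") for i in range(1, 9)]
key = make_id("some-file-name")
replicas = k_closest(nodes, key, k=3)  # the nodes that should store the value
```

Because nodes and keys share one ID space, the same `k_closest` call can be used when storing a value and when looking it up.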

Given a key, how to get the value

Following the discussion above, the remaining problem is: given a key, how do we find the k nodes whose node ids are closest to the key, together with their node information (ip, port)? Since this is done by querying several nodes, a node must store, in addition to some values, the node information of some other nodes. Kademlia does not require a node to store node information for the whole network; storing part of it is enough.

Once nodes store node information, how is it used? We introduce a condition.

Condition 1: For any node a and any key j, if there exists a node b whose distance to j is less than the distance between a and j, then there must exist a node c such that a stores the node information of c and the distance between c and j is less than the distance between a and j.

If distances are natural numbers and all nodes satisfy Condition 1, then when a node a receives a key j it can proceed as follows. Take the k nodes closest to j among those whose node information a stores. Pick α of them and ask each for the k nodes it stores that are closest to j and closer to j than it is itself. Use the replies to update the list of the k closest nodes found so far, pick α nodes from the list that have not yet been asked, and ask them in turn. Repeat until all of the k closest nodes found so far have been asked. Because of Condition 1 the algorithm terminates and the result is guaranteed to be globally optimal; the proof is omitted. This is what the Kademlia algorithm does.
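The lookup loop described above can be simulated in memory. The sketch below is not Kademlia's wire protocol: the "network" is a plain dictionary from node id to the ids that node stores, a request is a function call, the α requests of a round are issued one after another, and the names `closest`, `build_network`, and `lookup` are hypothetical. It assumes the XOR distance and the per-range storage rule that the article introduces later.

```python
import random


def closest(candidates, key, k):
    """Return up to k ids from candidates, nearest to key first (XOR distance assumed)."""
    return sorted(candidates, key=lambda n: n ^ key)[:k]


def build_network(ids, k):
    """Give every node up to k contacts per distance range [2**i, 2**(i+1))."""
    network = {}
    for a in ids:
        ranges = {}
        for b in ids:
            if b != a:
                ranges.setdefault((a ^ b).bit_length() - 1, []).append(b)
        network[a] = [n for members in ranges.values() for n in members[:k]]
    return network


def lookup(network, start, key, k=3, alpha=2):
    """Ask known nodes until the k closest found so far have all been asked."""
    shortlist = closest(network[start], key, k)
    asked = {start}
    while True:
        pending = [n for n in shortlist if n not in asked][:alpha]
        if not pending:
            return shortlist
        for n in pending:
            asked.add(n)
            # Simulated request: n answers with the contacts it stores that are
            # closer to the key than n itself, at most k of them.
            closer = [c for c in network[n] if (c ^ key) < (n ^ key)]
            reply = closest(closer, key, k)
            shortlist = closest(set(shortlist) | set(reply), key, k)


rng = random.Random(0)                 # fixed seed so the run is repeatable
ids = rng.sample(range(256), 40)       # 40 hypothetical nodes with 8-bit ids
network = build_network(ids, k=3)
found = lookup(network, start=ids[0], key=77)
```

In this toy setting, `found` can be compared against `closest(ids, 77, 3)`, which uses global knowledge that no real node has.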

So how can Condition 1 be satisfied? The Kademlia algorithm takes an interesting approach. The keys used by Kademlia are integers from a finite range. Kademlia uses a very interesting distance function, $\oplus$, the bitwise exclusive OR (XOR): the distance between two keys $a$ and $b$ is $a \oplus b$. We can divide the range of distances by powers of 2: $\{0\}, \{2^0\}, [2^1,2^2),\dots, [2^i,2^{i+1}), \dots$ Kademlia requires every node to store node information for nodes in each of these distance ranges: k nodes per range, or as many as exist if there are fewer than k. This satisfies Condition 1. Why? Because XOR has a special property.

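Property 1: Suppose the distance between node a and key j falls in the interval $[2^i,2^{i+1})$. Then for any node c whose distance to a falls in the interval $[2^i,2^{i+1})$, the distance between c and j is less than the distance between a and j.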

Proof: For convenience, let a, j, and c denote the hash values of the corresponding nodes or key, and let $a_i$ denote the i-th bit of a written in binary (the lowest bit is $a_0$), that is, $a_i \equiv \lfloor a/2^i \rfloor \bmod 2$. Suppose $a\oplus j \in [2^i,2^{i+1})$ and $a\oplus c \in [2^i,2^{i+1})$. Then for any $l>i$ we have $a_l\oplus j_l=a_l\oplus c_l=0$, and $a_i\oplus j_i=a_i\oplus c_i=1$. By the XOR property $x\oplus y = x\oplus z \Rightarrow y\oplus z=0$, we get $c_l\oplus j_l=0$ for every $l\geq i$, that is, $c\oplus j<2^i\leq a\oplus j$. This completes the proof.
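Property 1 can also be checked by brute force on a tiny ID space. The sketch below assumes 6-bit ids so that every triple (a, j, c) can be enumerated; `bucket_index` and `property_1_holds` are hypothetical helper names. The interval index i is obtained from Python's `int.bit_length`.

```python
BITS = 6  # assumed tiny ID space so that all triples can be enumerated


def bucket_index(distance: int) -> int:
    """Return i such that distance lies in [2**i, 2**(i+1)); requires distance > 0."""
    return distance.bit_length() - 1


def property_1_holds(bits: int) -> bool:
    """Check c^j < 2**i <= a^j for every a, j, c with a^j and a^c in [2**i, 2**(i+1))."""
    ids = range(2 ** bits)
    for a in ids:
        for j in ids:
            if a == j:
                continue  # distance 0 belongs to the range {0}, not to any [2**i, 2**(i+1))
            i = bucket_index(a ^ j)
            for c in ids:
                if c != a and bucket_index(a ^ c) == i:
                    if not (c ^ j) < 2 ** i <= (a ^ j):
                        return False
    return True


# One concrete instance: a = 101100, j = 100101, c = 100010 (binary).
a, j, c = 0b101100, 0b100101, 0b100010
example = (a ^ j, a ^ c, c ^ j)  # (9, 14, 7): 9 and 14 lie in [8, 16), 7 is below 8
all_ok = property_1_holds(BITS)
```

In the concrete instance, a differs from both j and c at bit 3 and agrees with them above it, so c and j agree at bit 3 and above, which is exactly the step used in the proof.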

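Besides proving Condition 1, Property 1 also gives the time complexity of the algorithm. Since $c\oplus j<2^i$, $c\oplus j$ lies at most in the interval $[2^{i-1},2^{i})$, which is at least one interval to the left of $a\oplus j \in [2^i,2^{i+1})$. Therefore, when α and k are both 1, one request needs to ask at most log N nodes, where N is the maximum value of a key.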
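The bound can be observed in a small simulation with k = α = 1. The sketch below makes several assumptions that are not in the article: ids are 10 bits, each node keeps one randomly chosen contact per non-empty distance range, and the key being looked up is itself the id of an existing node, so the range needed at each step is never empty. The names `build_tables` and `hops_to` are hypothetical.

```python
import random

BITS = 10  # assumed ID width; ids lie in [0, 2**BITS)


def bucket_index(distance):
    """Return i such that distance lies in [2**i, 2**(i+1)); requires distance > 0."""
    return distance.bit_length() - 1


def build_tables(ids, rng):
    """Each node keeps one randomly chosen contact per non-empty distance range (k = 1)."""
    tables = {}
    for a in ids:
        ranges = {}
        for b in ids:
            if b != a:
                ranges.setdefault(bucket_index(a ^ b), []).append(b)
        tables[a] = {i: rng.choice(members) for i, members in ranges.items()}
    return tables


def hops_to(tables, start, target):
    """Count the nodes visited while walking from start to the node whose id is target."""
    current, hops = start, 0
    while current != target:
        # The contact in the range containing current^target is strictly
        # closer to the target (Property 1), so the range index drops each hop.
        current = tables[current][bucket_index(current ^ target)]
        hops += 1
    return hops


rng = random.Random(1)                       # fixed seed for repeatability
ids = rng.sample(range(2 ** BITS), 200)      # 200 hypothetical nodes
tables = build_tables(ids, rng)
worst = max(hops_to(tables, s, t) for s in ids for t in ids)
```

Since the range index strictly decreases on every hop, `worst` cannot exceed `BITS`, which matches the log N bound.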

How a new node joins the DHT

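When a new node a wants to join a DHT, it first obtains, by some external means, the node information of a node b that is already in the DHT. It then computes its own node id and asks b for the k nodes closest to a. Kademlia specifies that whenever a node receives a message from another node, it tries to store that node's information. If the distance range containing the sender currently holds fewer than k nodes, the sender must be stored. If it already holds k or more, the node pings the node in that range that it has not seen for the longest time: if the ping fails, that node is replaced by the sender; if the ping succeeds, the sender is not stored. So this query by a both collects some node information and announces a's existence to some nodes. A single query will probably not reach k nodes in every range. For each range that falls short, a generates a random value in that range and asks for the k nodes closest to that value, until the requirement is met.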
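The storage rule for node information, and the random value used to fill a range, can be sketched as follows. This is an illustration under assumptions: `RoutingTable`, `observe`, and `random_id_in_range` are hypothetical names, `ping` is a caller-supplied function that returns whether a node answers, and K is set to 3 only to keep the example small. Moving a node to the end of its list when it is seen again is how this sketch tracks "not seen for the longest time".

```python
import random

K = 3  # assumed bucket size for the example


class RoutingTable:
    def __init__(self, own_id, ping):
        self.own_id = own_id
        self.ping = ping      # hypothetical callable: node_id -> True if the node answers
        self.buckets = {}     # range index i -> node ids, least recently seen first

    def observe(self, node_id):
        """Apply the storage rule when a message from node_id is received."""
        if node_id == self.own_id:
            return
        i = (self.own_id ^ node_id).bit_length() - 1
        bucket = self.buckets.setdefault(i, [])
        if node_id in bucket:
            bucket.remove(node_id)
            bucket.append(node_id)        # seen again: now the most recently seen
        elif len(bucket) < K:
            bucket.append(node_id)        # room left: must be stored
        else:
            oldest = bucket.pop(0)
            if self.ping(oldest):
                bucket.append(oldest)     # still alive: keep it, drop the newcomer
            else:
                bucket.append(node_id)    # dead: replace it with the newcomer


def random_id_in_range(own_id, i, rng=random):
    """Pick a random id whose distance to own_id lies in [2**i, 2**(i+1))."""
    return own_id ^ rng.randrange(2 ** i, 2 ** (i + 1))


# Node 0 hears from 8, 9, 10, 11, 12 (all in range [8, 16)); node 8 is down.
table = RoutingTable(own_id=0, ping=lambda node_id: node_id != 8)
for sender in (8, 9, 10, 11, 12):
    table.observe(sender)
```

After this run the range [8, 16) holds 10, 11, and 9: node 11 replaced the dead node 8, while node 12 was dropped because node 9 answered the ping.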

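We call the requirement above, "a node must store nodes in every distance range, k per range, or as many as exist if there are fewer than k", Condition 2. Can the process above maintain Condition 2? That is, does the following Proposition 1 hold?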

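Proposition 1: Suppose that before a new node a joins the DHT, every node in the DHT satisfies Condition 2, and that a joins the DHT by the method above. Then for any other node b in the DHT, if the distance between a and b is in $[2^i,2^{i+1})$ and, before a joined, b stored fewer than k nodes in this range, then after a joins, b must store a.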

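The answer is no. It can happen that node b never learns of node a. However, the authors say that choosing a larger k and distributing node ids uniformly makes the probability of this very small.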

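A newly joined node holds no values. To maintain the kv placement rule, Kademlia specifies that every node periodically sends the kv pairs it stores to the k nodes closest to each key and asks them to store them. In theory, the new node a ends up storing every kv pair for which a is among the k nodes closest to the key. The copy held by the node that used to be the k-th closest to the key becomes redundant. The arrival of a only affects the placement of some kv pairs.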
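The effect of one republish round after a node joins can be sketched as below. To keep the example short it assumes global knowledge of all node ids in place of a real lookup, uses the XOR distance, and takes k = 2; `republish` and `k_closest` are hypothetical names, and `stores` maps each node id to the kv pairs it holds.

```python
def k_closest(node_ids, key, k):
    """Return the k node ids closest to key under XOR distance."""
    return sorted(node_ids, key=lambda n: n ^ key)[:k]


def republish(stores, k):
    """One round: every node sends each pair it holds to the k nodes closest to the key."""
    # Work on a snapshot so that pairs received in this round are not re-sent in it.
    snapshot = {node: dict(pairs) for node, pairs in stores.items()}
    for pairs in snapshot.values():
        for key, value in pairs.items():
            # Assumption: node ids are known globally instead of found by lookup.
            for target in k_closest(list(stores), key, k):
                stores[target][key] = value


# Key 6 with k = 2: among nodes 1, 4, 12 the closest are 4 (distance 2) and 1 (distance 7).
stores = {1: {6: "v"}, 4: {6: "v"}, 12: {}}
stores[7] = {}          # new node 7 joins; its distance to key 6 is 1
republish(stores, k=2)
```

After the round, node 7 holds the pair because it is now among the 2 closest nodes to key 6, node 4 still holds it, and the copy on node 1, formerly the second closest, is the redundant one mentioned above.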

How to handle node failures

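As described above, a node removes the node information of a failed node only when it receives a message from some other node, so the removal may not be timely. Likewise, a new node may not yet have received the kv pairs it is supposed to store, so there is no guarantee that each of the k nodes closest to a key holds the kv pair. However, in Kademlia a lookup succeeds as long as one of the k nodes holds the value, so k provides a degree of fault tolerance.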

References

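This article gives only a brief introduction to the Kademlia algorithm and leaves out many details. Interested readers can consult the paper or other material.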

Petar Maymounkov, David Mazières: Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. IPTPS 2002: 53-65
en.wikipedia.org/wiki/Kademl…
www.yeolar.com/note/2010/0…

Origin juejin.im/post/7119009232006938631