Skip list (jump table)

What is a skip list?

  Skip lists are widely used in LevelDB, Redis, and Lucene as an efficient ordered data structure. Because the code is simple and the underlying principle is easy to follow, they are also easy to adopt. Let us first look at the definition of a skip list: why is it also called a jump table?

  “Skip lists are data structures that use probabilistic balancing rather than strictly enforced balancing. As a result, the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.”

  In other words: a skip list relies on probabilistic balancing instead of strictly enforced balancing, so inserting and deleting nodes is simpler and faster than with the equivalent balanced-tree algorithms.

Understanding skip lists

 Searching a sorted list: consider an ordered linked list.

      

 

  Searching this ordered list for the elements <23, 43, 59> takes <2, 4, 6> comparisons respectively, 2 + 4 + 6 = 12 comparisons in total. Can this be optimized? The list is sorted, but because it is a linked list, binary search cannot be used. Borrowing the idea of a binary search tree, we pull some of the nodes out to serve as an index, which gives the following structure:

         

 

  With these nodes extracted as an index, a search needs fewer comparisons.
 We can then extract some elements from that index again to build a second-level index, a third-level index, and so on.

           

 There are too few elements here to show the benefit; with a sufficiently large number of elements, this index structure shows its advantage.

Skip list

 The following structure is a skip list,
 where -1 stands for INT_MIN, the minimum value of the list, and 1 stands for INT_MAX, the maximum value of the list.

               

 A skip list has the following properties:
  (1) It consists of several levels.
  (2) Each level is an ordered linked list.
  (3) The lowest level (Level 1) is a linked list that contains all the elements.
  (4) If an element appears in the list at Level i, it also appears in every list below Level i.
  (5) Each node contains two pointers: one to the next element in the same list, and one to the same element in the list one level below.
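
 The properties above suggest a node layout along the following lines. This is only a minimal sketch: the names Node, node_new and level_new are assumptions made for illustration, and allocation failure simply aborts through assert to keep the example short.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* One node per (element, level) pair, as in property (5). */
typedef struct Node {
    int key;
    struct Node *next;  /* next element on the same level */
    struct Node *down;  /* same element one level below; NULL on Level 1 */
} Node;

/* Allocate and initialise a node (aborts if malloc fails). */
Node *node_new(int key, Node *next, Node *down)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->next = next;
    n->down = down;
    return n;
}

/* Create an empty level: an INT_MIN head followed by an INT_MAX tail.
 * `below` is the head of the level underneath, or NULL for Level 1. */
Node *level_new(Node *below)
{
    Node *tail = node_new(INT_MAX, NULL, NULL);
    return node_new(INT_MIN, tail, below);
}
```

 An empty skip list is then just level_new(NULL); further levels are stacked on top by passing the current top head as `below`.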

Searching a skip list

              

 

 Example: find element 117
  (1) Compare with 21: 117 is larger, so keep moving right.
  (2) Compare with 37: 117 is larger than 37 but smaller than the list's maximum value, so continue from the level below 37.
  (3) Compare with 71: 117 is larger than 71 but smaller than the list's maximum value, so continue from the level below 71.
  (4) Compare with 85: 117 is larger, so keep moving right.
  (5) Compare with 117: equal, so the node is found.

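The search algorithm is as follows:

#include <stddef.h>

typedef struct Node {
    int key;
    struct Node *next;   /* next node on the same level */
    struct Node *down;   /* same key one level below; NULL on Level 1 */
} Node;

/* If x is present, return the node that holds x;
 * otherwise return the successor node of x. */
Node *find(Node *top, int x)
{
    Node *p = top;                   /* head of the top level */
    while (1) {
        while (p->next->key < x)     /* move right within this level */
            p = p->next;
        if (p->down == NULL)         /* already on Level 1 */
            return p->next;
        p = p->down;                 /* drop down one level */
    }
}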
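
 To see the search run, the sketch below builds a small two-level list by hand and searches it. The keys, the helper names node_new and build_demo, and the choice of which keys are promoted to Level 2 are all made up for illustration; they are not the list from the figures.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *next;  /* next node on the same level */
    struct Node *down;  /* same key one level below; NULL on Level 1 */
} Node;

Node *node_new(int key, Node *next, Node *down)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);  /* sketch only: abort on allocation failure */
    n->key = key;
    n->next = next;
    n->down = down;
    return n;
}

/* Return the Level 1 node holding x, or the successor of x if absent. */
Node *find(Node *top, int x)
{
    Node *p = top;
    for (;;) {
        while (p->next->key < x)
            p = p->next;
        if (p->down == NULL)
            return p->next;
        p = p->down;
    }
}

/* Hand-built example (hypothetical data):
 *   Level 2: MIN -------> 37 -------> 85 ---------> MAX
 *   Level 1: MIN -> 21 -> 37 -> 71 -> 85 -> 117 -> MAX  */
Node *build_demo(void)
{
    Node *t1   = node_new(INT_MAX, NULL, NULL);
    Node *n117 = node_new(117, t1, NULL);
    Node *n85  = node_new(85, n117, NULL);
    Node *n71  = node_new(71, n85, NULL);
    Node *n37  = node_new(37, n71, NULL);
    Node *n21  = node_new(21, n37, NULL);
    Node *h1   = node_new(INT_MIN, n21, NULL);

    Node *t2  = node_new(INT_MAX, NULL, t1);
    Node *m85 = node_new(85, t2, n85);
    Node *m37 = node_new(37, m85, n37);
    return node_new(INT_MIN, m37, h1);  /* head of the top level */
}
```

 Searching for 117 here visits 37 and 85 on Level 2, drops to Level 1 at 85, and stops at 117. Searching for an absent key such as 50 returns its successor, 71.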

Inserting into a skip list

 First determine K, the number of levels the element will occupy (by flipping a coin, so K is completely random),
 then insert the element into the lists of Level 1 ... Level K.
 Example: insert 119, K = 2

             

 

 If K is greater than the current number of levels in the list, new levels must be added.
 Example: insert 119, K = 4
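
 The two cases above can be sketched as follows. Here K is passed in as a parameter so the behaviour is reproducible. The function names are assumptions for illustration, and the sketch assumes x is not already in the list and lies strictly between INT_MIN and INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *next;  /* next node on the same level */
    struct Node *down;  /* same key one level below; NULL on Level 1 */
} Node;

Node *node_new(int key, Node *next, Node *down)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);  /* sketch only: abort on allocation failure */
    n->key = key;
    n->next = next;
    n->down = down;
    return n;
}

/* Empty level (INT_MIN head -> INT_MAX tail) stacked on top of `below`. */
Node *level_new(Node *below)
{
    return node_new(INT_MIN, node_new(INT_MAX, NULL, NULL), below);
}

/* Number of levels, counted from the top head down to Level 1. */
int height(const Node *top)
{
    int h = 0;
    for (; top != NULL; top = top->down)
        h++;
    return h;
}

/* Insert x on Levels 1..K and return the (possibly new) top head. */
Node *insert(Node *top, int x, int K)
{
    int h = height(top);
    while (h < K) {             /* K exceeds the height: add levels */
        top = level_new(top);
        h++;
    }

    Node *p = top;
    Node *above = NULL;         /* copy of x created one level up */
    for (int level = h; level >= 1; level--) {
        while (p->next->key < x)
            p = p->next;
        if (level <= K) {       /* splice x in right after p */
            Node *n = node_new(x, p->next, NULL);
            p->next = n;
            if (above != NULL)
                above->down = n;
            above = n;
        }
        p = p->down;
    }
    return top;
}
```

 Starting from level_new(NULL), calling insert(top, 119, 4) on a two-level list first stacks two new levels and then links 119 into all four.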

             

 

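 Flipping a coin to decide K
 When an element is inserted, the number of levels it occupies is completely random and is produced by the following randomized algorithm: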

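#include <stdlib.h>

/* Flip a fair coin until it comes up tails; the number of flips is K.
 * rand() % 2 stands in for the original random(0, 1). */
int random_level(void)
{
    int K = 1;

    while (rand() % 2)
        K++;

    return K;
}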

 

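  This is equivalent to a coin-flipping experiment: on heads, flip again; on tails, stop.
  The number of flips K in the experiment is the number of levels the element occupies. Clearly the random variable K follows a geometric distribution with parameter p = 1/2,
  so its expected value is E[K] = 1/p = 2. That is, each element occupies 2 levels in expectation.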
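
  The expectation can be checked directly. With a fair coin, K = k means k - 1 heads followed by one tail, so:

```latex
P(K = k) = \left(\tfrac{1}{2}\right)^{k-1} \cdot \tfrac{1}{2} = 2^{-k},
\qquad k = 1, 2, \dots

E[K] = \sum_{k=1}^{\infty} k \, 2^{-k}
     = \frac{1/2}{\left(1 - 1/2\right)^{2}}
     = 2
```

  The middle step uses the standard identity \sum_{k \ge 1} k x^k = x / (1 - x)^2 for |x| < 1, evaluated at x = 1/2.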

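  Height of the skip list
    In a skip list of n elements, each insertion runs one such experiment to decide the number of levels K the element occupies, so the height of the skip list equals the largest K produced in these n experiments. To be continued...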
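
    The source leaves this analysis unfinished. As a sketch of how it is usually completed (a standard union-bound argument, not taken from the original post), write H for the height and K_1, ..., K_n for the level counts of the n elements:

```latex
P(K_i \ge k) = 2^{-(k-1)}
\quad\text{(the first } k-1 \text{ flips are all heads)}

P(H \ge k) = P\left(\exists\, i : K_i \ge k\right) \le n \cdot 2^{-(k-1)}

P\left(H \ge 1 + c \log_2 n\right) \le n \cdot n^{-c} = n^{1-c}
```

    So for any constant c > 1 the height exceeds 1 + c log2 n only with probability at most n^(1-c), which is why the height is O(log n) with high probability.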
  

  Space complexity of the skip list
    From the analysis above, each element has an expected height of 2, so a skip list of size n has an expected 2n nodes.

Deleting from a skip list
 Find the node containing x in every level, and remove it with the standard linked-list deletion.
 Example: delete 71
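
 A minimal sketch of this deletion, under the same node layout as the search routine. The names erase, insert and level_new are assumptions for illustration; keys are assumed to lie strictly between the INT_MIN and INT_MAX sentinels, and levels that become empty are left in place rather than removed.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *next;  /* next node on the same level */
    struct Node *down;  /* same key one level below; NULL on Level 1 */
} Node;

Node *node_new(int key, Node *next, Node *down)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);  /* sketch only: abort on allocation failure */
    n->key = key;
    n->next = next;
    n->down = down;
    return n;
}

Node *level_new(Node *below)
{
    return node_new(INT_MIN, node_new(INT_MAX, NULL, NULL), below);
}

/* Insert x on Levels 1..K (helper used only to build test data). */
Node *insert(Node *top, int x, int K)
{
    int h = 0;
    for (Node *q = top; q != NULL; q = q->down)
        h++;
    for (; h < K; h++)
        top = level_new(top);

    Node *p = top, *above = NULL;
    for (int level = h; level >= 1; level--, p = p->down) {
        while (p->next->key < x)
            p = p->next;
        if (level <= K) {
            Node *n = node_new(x, p->next, NULL);
            p->next = n;
            if (above != NULL)
                above->down = n;
            above = n;
        }
    }
    return top;
}

/* Unlink x from every level that contains it; returns 1 if x was found. */
int erase(Node *top, int x)
{
    int found = 0;
    for (Node *p = top; p != NULL; p = p->down) {
        while (p->next->key < x)   /* p ends as the predecessor of x */
            p = p->next;
        if (p->next->key == x) {   /* standard linked-list removal */
            Node *victim = p->next;
            p->next = victim->next;
            free(victim);
            found = 1;
        }
    }
    return found;
}
```

 Because the walk always stops at the predecessor of x, following its down pointer lands on a node that is still in the list, so each level can be unlinked in a single top-to-bottom pass.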


               

 

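Skip lists compared with balanced trees and hash tables

  • Elements in a skip list and in the various balanced trees (AVL trees, red-black trees, and so on) are kept in sorted order, while a hash table is unordered. A hash table therefore only supports single-key lookups and is not suited to range queries. A range query finds all nodes whose values lie between two given bounds.
  • Range queries are more complicated on a balanced tree than on a skip list. On a balanced tree, after finding the lower bound we still have to continue in in-order traversal order to find the other nodes that do not exceed the upper bound. Unless the tree is modified in some way, this in-order traversal is not easy to implement. On a skip list a range query is very simple: after finding the lower bound, walk the Level 1 list for a few steps.
  • Insertion and deletion in a balanced tree may trigger subtree adjustments, and the logic is complex. Insertion and deletion in a skip list only change the pointers of adjacent nodes, which is simple and fast.
  • In terms of memory, a skip list is more flexible than a balanced tree. A balanced-tree node generally holds 2 pointers (to the left and right subtrees), while a skip-list node holds 1/(1-p) pointers on average, where p is the probability of promoting a node one more level (a node reaches level i+1 with probability p^i, and 1 + p + p^2 + ... = 1/(1-p)). With p = 1/4, as in the Redis implementation, each node holds 1.33 pointers on average, which beats a balanced tree.
  • For single-key lookups, a skip list and a balanced tree are both O(log n) and roughly equal. A hash table, provided the collision rate is kept low, looks keys up in close to O(1), which is faster. That is why most of the Map or dictionary structures we use day to day are built on hash tables.
  • In terms of implementation difficulty, a skip list is much simpler than a balanced tree.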
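
  The range query described above can be sketched as follows: locate the lower bound with the usual search, then walk Level 1 until the upper bound is passed. The function range, its output-buffer interface, and the hand-built test list are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *next;  /* next node on the same level */
    struct Node *down;  /* same key one level below; NULL on Level 1 */
} Node;

Node *node_new(int key, Node *next, Node *down)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);  /* sketch only: abort on allocation failure */
    n->key = key;
    n->next = next;
    n->down = down;
    return n;
}

/* Return the Level 1 node holding x, or the successor of x if absent. */
Node *find(Node *top, int x)
{
    Node *p = top;
    for (;;) {
        while (p->next->key < x)
            p = p->next;
        if (p->down == NULL)
            return p->next;
        p = p->down;
    }
}

/* Copy every key in [lo, hi] into out (at most cap entries), in order.
 * Returns the number of keys written. */
size_t range(Node *top, int lo, int hi, int *out, size_t cap)
{
    size_t n = 0;
    /* p->next == NULL identifies the INT_MAX tail sentinel. */
    for (Node *p = find(top, lo);
         p->next != NULL && p->key <= hi && n < cap;
         p = p->next)
        out[n++] = p->key;
    return n;
}

/* Hand-built example (hypothetical data):
 *   Level 2: MIN -------> 37 -------> 85 ---------> MAX
 *   Level 1: MIN -> 21 -> 37 -> 71 -> 85 -> 117 -> MAX  */
Node *build_demo(void)
{
    Node *t1   = node_new(INT_MAX, NULL, NULL);
    Node *n117 = node_new(117, t1, NULL);
    Node *n85  = node_new(85, n117, NULL);
    Node *n71  = node_new(71, n85, NULL);
    Node *n37  = node_new(37, n71, NULL);
    Node *n21  = node_new(21, n37, NULL);
    Node *h1   = node_new(INT_MIN, n21, NULL);

    Node *t2  = node_new(INT_MAX, NULL, t1);
    Node *m85 = node_new(85, t2, n85);
    Node *m37 = node_new(37, m85, n37);
    return node_new(INT_MIN, m37, h1);
}
```

  The cost is one O(log n) search plus one step per returned key, with no tree traversal state to maintain.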

 

Why does Redis use a skip list instead of a balanced tree?

There are a few reasons:

1) They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels 
will make then less memory intensive than btrees.

2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache 
locality of skip lists is at least as good as with other kind of balanced trees.

3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) 
with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

  These reasons fall under three headings: memory usage, support for range queries, and ease of implementation.

 

 

Source: https://blog.csdn.net/u014427196/article/details/52454462

 

Origin www.cnblogs.com/myseries/p/11442335.html