[Deterministic Skip Lists] 1-2 Deterministic Skip Lists

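The skip list was first proposed by William Pugh. It decides the height of each element by something like a coin toss, so the resulting structure is random. There is a probability (very small, but not zero) of hitting the worst case, in which search degrades to O(n), whereas the expected cost is O(lg n). The randomized approach has this weakness, but it is simple and needs no explicit rebalancing. To rule out the worst case, the original design was later modified so that the skip list is kept reasonably balanced by explicit rules in every situation. That is where Deterministic Skip Lists come from. The Chinese name I used for them came straight from translation software, so I am not sure it is the standard term; every skip list article I found online covers only the original randomized version. I came across deterministic skip lists in a document linked from Wikipedia. This post mainly translates the relevant introduction, and I use it to work through how deterministic skip lists operate. The translation is not professional: it combines my own understanding with dictionary lookups, and it does not render every word.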

Chapter 4
Deterministic Skip Lists

The original text is long and covers more than deterministic skip lists; this is its fourth chapter.

The good average case performance of the PSL discussed in the previous chapter,
although independent of the input, does depend on the random number generator
behaving “as expected”.
The good average-case performance of the probabilistic skip list (PSL) does not depend on the input, but it does depend on the random number generator behaving as expected.

Should this not be the case at a particular instance (if, for example, the random number generator creates elements of equal heights, or of decreasing heights left-to-right), the PSL may degenerate into a structure worse than a linear linked list.
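If that is not the case at some point (for example, the generator gives every element the same height, or heights that decrease from left to right), the PSL may degenerate into a structure that performs worse than a plain linked list.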

It is therefore natural to try to explore techniques based on the skip list notion, that will lead to good worst case performance.
It is therefore natural to look for techniques, still based on the skip list idea, that give good worst-case performance.

As discussed in Section 1.1, there exist several balanced search tree schemes (AVL, 2-3, red-black) which have guaranteed logarithmic worst case search and update costs.
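As Section 1.1 mentioned, several balanced search tree schemes (AVL trees, 2-3 trees, red-black trees) guarantee logarithmic search and update costs in the worst case.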

A general problem though with these schemes is that they are above the threshold of difficulty for most programmers to implement in virtually any case, except of course in B-tree packages.
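A common problem with these schemes, though, is that implementing them is beyond most programmers in practically every case, B-tree packages being the exception.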

The solutions that we propose in this chapter are competitive in terms of space and time with balanced search trees, and, we feel, inherently simpler when taken from first principles.

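The solutions proposed in this chapter compete with balanced search trees in both space and time and, the authors feel, are inherently simpler when built up from first principles.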

They also give rise to a hybrid data structure, halfway between the standard search tree and the original skip list.
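They also lead to a hybrid data structure that sits halfway between the standard search tree and the original skip list. (As I read it, the deterministic skip list combines strengths of both search trees and skip lists.)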

Figure 4.1: a deterministic skip list and its corresponding 2-3 tree. Image taken from the original text.

4.1 The Starting Point

This section explains how a deterministic skip list is obtained and walks through its main operations.

A natural starting point is the perfectly balanced skip list, a skip list in which every kth element of height at least h is also of height at least h+1 (for some fixed k).
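A natural starting point is the perfectly balanced skip list: a skip list in which, for some fixed k, every kth element of height at least h also has height at least h+1. [Note: "fixed" here means a constant chosen once, not "repaired". Among the elements that reach level h, every kth one also reaches level h+1, so each level holds about 1/k of the elements of the level below it.]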

Although the search cost in balanced skip lists is logarithmic, the insert and delete costs are prohibitive.
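Search in a balanced skip list takes logarithmic time, but insertion and deletion are prohibitively expensive. (Inserting or deleting an element breaks the balance, so the list has to be repaired explicitly, and that repair takes a lot of work. This is why the probabilistic skip list is simpler: leaving heights to chance is much easier than controlling them by hand.)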
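To make the "every kth element" rule and the cost of keeping it concrete, here is a minimal Python sketch. It is my own illustration, not code from the original text: `balanced_heights` and `changed_after_front_insert` are hypothetical helper names, and positions are assumed to be 1-based.

```python
def balanced_heights(n, k):
    """Heights of a perfectly balanced skip list with n elements.

    Assumption of this sketch: the element at 1-based position i has
    height 1 + (number of times k divides i), so every kth element of
    height >= h also has height >= h + 1. Requires k >= 2.
    """
    heights = []
    for i in range(1, n + 1):
        h = 1
        while i % k == 0:
            i //= k
            h += 1
        heights.append(h)
    return heights


def changed_after_front_insert(n, k):
    """Count existing elements whose height must change when a new
    smallest key is inserted and perfect balance is restored."""
    old = balanced_heights(n, k)
    new = balanced_heights(n + 1, k)
    # The element that used to sit at position j + 1 now sits at j + 2.
    return sum(1 for j in range(n) if old[j] != new[j + 1])


print(balanced_heights(8, 2))            # [1, 2, 1, 3, 1, 2, 1, 4]
print(changed_after_front_insert(8, 2))  # 8
```

With k = 2 and eight elements, inserting one key at the front forces all eight existing elements to change height, which illustrates why the strict rule is too expensive to maintain.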

We should therefore examine the consequences of relaxing a bit the strict requirement “every kth element”.
We should therefore examine what happens when the strict "every kth element" requirement is relaxed a little.

Assuming that a skip list of n keys has a 0th and a (n+1)st element of height equal to the height of the skip list, we will say that two elements are linked when there exists (at least) one pointer going from one to the other.
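Assume that a skip list of n keys has two extra elements, a 0th and an (n+1)st, each as tall as the skip list itself. We say that two elements are linked when at least one pointer goes from one to the other.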

Given two linked elements, one of height exactly h (h > 1) and another of height h or higher, their gap size will be the number of elements of height h−1 that exist between them.
Given two linked elements, one of height exactly h (h > 1) and the other of height h or more, their gap size is the number of elements of height h−1 that lie between them.

For example, in the skip list of Figure 3.1 the gap size between 19 and 30 is 0, and in the skip list of Figure 4.1(a) the gap size between 10 and 25 is 2, whereas the gap size between −∞ and 3 is 1.
For example, in Figure 3.1 the gap size between 19 and 30 is 0; in Figure 4.1(a) the gap size between 10 and 25 is 2, while the gap size between −∞ and 3 is 1.

Figure 3.1, taken from the original text. Figure 4.1 is shown above.

A skip list with the property that every gap is of size either 1 or 2 will be called a 1-2 Deterministic Skip List (DSL).
A skip list in which every gap has size 1 or 2 is called a 1-2 Deterministic Skip List (DSL).
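The gap definition and the 1-2 property can be checked mechanically. The following Python sketch is my own illustration under stated assumptions: the skip list is reduced to a list of heights in key order, the first and last entries stand for the 0th and (n+1)st elements, and `gap_sizes` and `is_one_two_dsl` are hypothetical names. The sample heights are made up and do not reproduce any figure from the original text.

```python
def gap_sizes(heights):
    """Return (left, right, level, gap) for every pair of linked elements.

    Assumption: heights[0] and heights[-1] are the two end elements and
    are at least as tall as every element between them.
    """
    gaps = []
    for h in range(2, max(heights) + 1):
        # Indices of the elements that are present on level h.
        tall = [i for i, height in enumerate(heights) if height >= h]
        for left, right in zip(tall, tall[1:]):
            # The gap is measured on the level where one of the two
            # linked elements has height exactly h.
            if min(heights[left], heights[right]) != h:
                continue
            gap = sum(1 for i in range(left + 1, right)
                      if heights[i] == h - 1)
            gaps.append((left, right, h, gap))
    return gaps


def is_one_two_dsl(heights):
    """True when every gap has size 1 or 2."""
    return all(gap in (1, 2) for _, _, _, gap in gap_sizes(heights))


sample = [3, 1, 2, 1, 1, 3, 1, 2, 1, 3]  # made-up heights, end elements included
print(gap_sizes(sample))
print(is_one_two_dsl(sample))            # True
print(is_one_two_dsl([2, 1, 1, 1, 2]))   # False: a gap of size 3
```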

As we see from Figure 4.1, there exists a one-to-one correspondence between 1-2 DSLs and 2-3 trees.
As Figure 4.1 shows, there is a one-to-one correspondence between 1-2 DSLs and 2-3 trees.

A search for a key in a 1-2 DSL is performed in the same manner as in Pugh’s PSL. One may of course observe that after 2 key comparisons at the same level, the only next legal step is to drop down a level. Therefore, it is possible to throw in an extra line of code and save up to 1 key comparison per level.
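Searching for a key in a 1-2 DSL works the same way as in Pugh's PSL. Notice, though, that after 2 key comparisons on the same level the only legal next step is to drop down a level. So one extra line of code can save up to 1 key comparison per level.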
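The search and the comparison-saving shortcut can be sketched as follows. This is a minimal Python illustration of my own, not the original authors' code: `Node`, `build`, and `search` are hypothetical names, `None` plays the role of the right end of the list, and the shortcut is applied only below the header's top level, where the 1-2 gap bound is known to hold. The keys and heights are made up and form a valid 1-2 DSL.

```python
class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i]: next node on level i + 1


def build(keys, heights):
    """Build a skip list from sorted keys and their heights."""
    header = Node(None, max(heights))
    last = [header] * len(header.forward)  # rightmost node seen per level
    for key, height in zip(keys, heights):
        node = Node(key, height)
        for level in range(height):
            last[level].forward[level] = node
            last[level] = node
    return header


def search(header, key, shortcut=True):
    """Return (found, number of key comparisons while descending)."""
    comparisons = 0
    node = header
    top = len(header.forward) - 1
    for level in range(top, -1, -1):
        steps = 0
        while node.forward[level] is not None:
            # In a 1-2 DSL, after two moves on this level the next node is
            # the one that already stopped us on the level above.
            if shortcut and level < top and steps == 2:
                break
            comparisons += 1
            if node.forward[level].key >= key:
                break
            node = node.forward[level]
            steps += 1
    candidate = node.forward[0]
    return (candidate is not None and candidate.key == key), comparisons


header = build([2, 4, 6, 8, 10, 12, 14, 16], [1, 2, 1, 1, 3, 1, 2, 1])
print(search(header, 10))                  # (True, 5)
print(search(header, 10, shortcut=False))  # (True, 6)
```

The shortcut is only safe when the list really satisfies the 1-2 property; on an arbitrary skip list it could stop too early.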

An insertion of a key in a 1-2 DSL is made by initially searching for the key to be inserted, and by subsequently inserting the key as an element of height 1 in the appropriate spot. This may, of course, trigger the invalid configuration of 3 elements of height 1 “in a row”.
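To insert a key, first search for it to find the right spot, then insert it there as an element of height 1. This may produce the invalid configuration of 3 elements of height 1 "in a row". ("In a row" means consecutive among the elements of that height, with no taller element between them; it does not have to mean physically adjacent.)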

This is easily rectified by letting the middle of these 3 elements grow to height 2. If this now results in 3 elements of height 2 in a row, we let the middle of these 3 elements grow to height 3, etc.
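This is easily fixed by letting the middle one of the 3 elements grow to height 2. If that in turn leaves 3 elements of height 2 "in a row" (again, not necessarily physically adjacent), the middle one of those grows to height 3, and so on.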

For example, the insertion of 15 in the skip list of Figure 4.1(a) will cause 13 to grow from height 1 to height 2, that will cause 17 to grow from 2 to 3, and that will cause 17 again to grow from 3 to 4.
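For example, inserting 15 into the skip list of Figure 4.1(a) makes 13 grow from height 1 to height 2, which makes 17 grow from 2 to 3, which in turn makes 17 grow again from 3 to 4.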
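The grow-the-middle rule can be simulated on plain lists. The Python sketch below is my own simplification, not the pointer-based structure from the original text: keys and heights are kept in two parallel lists, the two end elements are implicit and assumed taller than everything else, and `insert` is a hypothetical name. Because it scans and shifts lists, it shows the rule rather than the logarithmic cost.

```python
import bisect


def insert(keys, heights, key):
    """Insert key into a 1-2 DSL stored as parallel sorted lists."""
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return  # key already present
    keys.insert(pos, key)
    heights.insert(pos, 1)
    h = 1
    while True:
        # Widen to the stretch around pos that is bounded by taller elements.
        lo = pos
        while lo > 0 and heights[lo - 1] <= h:
            lo -= 1
        hi = pos
        while hi < len(keys) - 1 and heights[hi + 1] <= h:
            hi += 1
        run = [i for i in range(lo, hi + 1) if heights[i] == h]
        if len(run) < 3:
            return  # at most 2 elements of height h in a row: valid
        pos = run[1]       # the middle of the 3 elements
        heights[pos] += 1  # grows by one level
        h += 1             # which may overflow the level above


keys, heights = [], []
for k in range(1, 8):
    insert(keys, heights, k)
print(keys)     # [1, 2, 3, 4, 5, 6, 7]
print(heights)  # [1, 2, 1, 3, 1, 2, 1]
```

Inserting 7 in this run shows the cascade: 6 grows to height 2, which leaves three elements of height 2 in a row, so 4 grows to height 3.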

A deletion of a key from a 1-2 DSL is done in a way completely analogous to the way it is done in the corresponding 2-3 tree. The deletion of 5 for example from the 1-2 DSL of Figure 4.1(a), will cause 3 to shrink, and this will cause 10 to shrink and 17 to grow.
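Deleting a key from a 1-2 DSL is completely analogous to deletion in the corresponding 2-3 tree. For example, deleting 5 from the 1-2 DSL of Figure 4.1(a) makes 3 shrink, which in turn makes 10 shrink and 17 grow.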

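That is roughly how 1-2 deterministic skip lists work. The author goes on to analyze time and space complexity, to compare storing an element's vertical structure as an array versus a linked list, and to describe variants of the skip list. I will stop translating here and may finish the rest when I feel like it. :)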

References
Skip Lists and Probabilistic Analysis of Algorithms (the original PDF can be downloaded at the bottom of that page)
