Common index structures: the skip list

Original author: JeemyJohn

Original address: The principle and realization of skip table

Recommended reading: Manga Algorithm: What is a Jump Table?

1. The principle of the skip list

      Anyone who has studied data structures knows that querying an element in a singly linked list takes O(n) time. Even if the singly linked list is ordered, we cannot use binary search to reduce the time complexity, because a linked list gives no direct access to its middle element.

[Figure: an ordered singly linked list beginning at -INF]

      As shown in the figure above, to query the node with element 55 we must start from the head node and walk all the way to the last node: excluding -INF (negative infinity), that is 8 queries in total. So how can we reach 55 with fewer visits? The most intuitive answer is to add a shortcut that leads to 55.

[Figure: the same list with a shortcut layer L2 above L1]

      As shown in the figure above, to query the node with element 55 we now need only 4 searches in the L2 layer. In this structure the most expensive lookup is element 46, which costs 5 queries. We first search for 46 in L2 and reach element 55 after 4 queries. Because the linked list is ordered, 46 must lie to the left of 55, so L2 does not contain 46. We then go back to element 37 and continue the search one layer down, in L1. There a single further query finds 46, for a total of 5 queries.

So how can we find 55 even faster? Following the same idea, we simply add another shortcut layer.

 

[Figure: the list with a further shortcut layer L3 above L2]



      As shown in the figure above, we now need only 2 searches to find 55. Element 46 is still the most expensive lookup, requiring 5 queries: 2 in the L3 layer, then 2 in the L2 layer, and finally 1 in the L1 layer. This idea is clearly very similar to binary search, so our final structure should look like the figure below.


 

[Figure: the ideal skip list with layers L1 to L4]

 

       We can see that the most expensive lookup, element 46, requires 6 queries: L4 visits 55; L3 visits 21 and 55; L2 visits 37 and 55; and L1 visits 46. Intuitively, this structure makes querying an ordered linked list faster. So what is the complexity of the algorithm?

       If there are n elements, then because each layer halves the one below it, as in binary search, there are log n shortcut layers (all logarithms in this article are base 2) plus the list's own layer. Taking the figure above as an example: with 4 elements (as in layer L2) the layers above are L3 and L4, plus L2 itself, for 3 layers in total; with 8 elements there are 3+1 layers. The most expensive query has to pass through every layer and takes about log n + log n = 2 log n steps. Why twice log n? Take 46 in the figure above: the query visits every layer, and in each layer it looks at no more than 2 elements, the middle element and the last element. So the time complexity is O(log n).
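The count in the paragraph above can be restated as a short bound. This is only the same arithmetic written out, assuming n is a power of two and an ideal list in which every layer keeps every second node of the layer below:

```latex
\begin{aligned}
\text{number of layers} &= \log_2 n + 1,\\
\text{visits per layer} &\le 2,\\
\text{total visits} &\le 2\,(\log_2 n + 1) = O(\log n).
\end{aligned}
```

In the figure, n = 8 and the lookup of 46 visits 1 + 2 + 2 + 1 = 6 nodes, which equals 2 log 8 and sits within this bound.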

       So far we have described the ideal skip list. But what if we want to insert or delete an element in the figure above? Suppose we insert elements 22, 23, 24, ...: in the L1 layer they naturally go after element 21, but what about the L2 and L3 layers? Do we have to work out how to adjust the links after every insertion in order to preserve the ideal structure? We know that rebalancing a balanced binary tree is a headache, with its left rotations, right rotations, left-right rotations, and so on. Fortunately, we do not need complicated operations to maintain a perfect skip list. There is a probabilistic insertion algorithm that also achieves O(log n) query time in expectation. This is the kind of skip list we actually want to implement.

2. Analysis of the implementation steps of the skip list

       Let's discuss insertion first. Look at the ideal skip list: the L2 layer has 1/2 as many elements as the L1 layer, the L3 layer has 1/2 as many as the L2 layer, and so on. So as long as each insertion tries to keep the number of elements in an upper layer at 1/2 of the layer below it, our skip list will approximate the ideal one. How can we ensure that ratio when inserting? It's easy: just flip a coin! Suppose element X is to be inserted. X must obviously go into the L1 layer. Should X also go into the L2 layer? We want the upper layer to hold 1/2 as many elements as the lower one, so X should enter L2 with probability 1/2: flip a coin, insert on heads and stop on tails. Should X then go into L3? Relative to L2 we again want probability 1/2, so we flip again. Continuing in this way, the probability that X is inserted into the nth layer is (1/2)^(n-1). This is how we insert an element into the skip list.
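The coin-toss rule can be sketched in a few lines of Java. This is a minimal illustration, not code from the original post: the class name `CoinFlipLevel`, the method `randomLevel`, and the cap `maxLevel` are names assumed here, and `Random.nextBoolean()` stands in for the fair coin.

```java
import java.util.Random;

class CoinFlipLevel {
    // Returns the highest layer (1-based, L1 is the bottom) for a new element.
    // Keeps flipping while the coin shows heads and the cap is not reached.
    static int randomLevel(Random coin, int maxLevel) {
        int level = 1;                       // every element goes into L1
        while (level < maxLevel && coin.nextBoolean()) {
            level++;                         // heads: promote one layer up
        }
        return level;                        // tails, or cap reached: stop
    }

    public static void main(String[] args) {
        Random coin = new Random();
        int[] perLayer = new int[5];         // perLayer[k] counts elements present in Lk
        for (int i = 0; i < 100_000; i++) {
            int top = randomLevel(coin, 4);
            for (int k = 1; k <= top; k++) {
                perLayer[k]++;
            }
        }
        for (int k = 1; k <= 4; k++) {
            System.out.println("L" + k + ": " + perLayer[k]);
        }
    }
}
```

Running `main` prints how many of the 100,000 simulated elements land in each layer; each count should come out at roughly half of the one before it, although the exact numbers vary from run to run.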

Here is an example. The initial state of the skip list is shown below; there are no elements in it:

 

[Figure: the empty skip list]

 

If we want to insert element 2, first insert element 2 at the bottom, as shown below: 

 

[Figure: element 2 inserted into L1]

 

Then we toss a coin. The result is heads, so we need to insert 2 into the L2 layer, as shown below:

 

[Figure: element 2 inserted into L2]



Continue to toss the coin. The result is tails, so the insertion of element 2 stops, and the resulting structure is the one shown in the figure above. Next we insert element 33. As with element 2, we first insert 33 into the L1 layer, as shown below:

 

[Figure: element 33 inserted into L1]

 

Then toss a coin. The result is tails, so the insertion of element 33 is finished, and the resulting structure is the one shown in the figure above. Next we insert element 55. We first insert 55 into L1, as shown below:

 

[Figure: element 55 inserted into L1]

 

Then toss a coin. The result is heads, so 55 needs to be inserted into the L2 layer, as shown below:

 

[Figure: element 55 inserted into L2]

 

Continue to toss the coin. The result is heads again, so 55 needs to be inserted into the L3 layer, as shown below:

 

[Figure: element 55 inserted into L3]

 

       We insert the remaining elements in the same way. Because the example is small, the result may not be an ideal skip list. But when the number of elements n is large, probability theory tells us that the final structure will, with high probability, be very close to the ideal skip list.
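To make "very close to the ideal" a little more concrete, the expected layer sizes under fair coin tosses can be written down directly. This is a standard expectation calculation rather than something taken from the original post; |L_k| is notation assumed here for the number of elements in layer L_k, counting L_1 as the bottom layer and placing no cap on the number of layers:

```latex
\begin{aligned}
\Pr[X \in L_k] &= \left(\tfrac{1}{2}\right)^{k-1}, \qquad k = 1, 2, 3, \dots\\
\mathbb{E}\,|L_k| &= \frac{n}{2^{k-1}},\\
\mathbb{E}\sum_{k \ge 1} |L_k| &= n \sum_{k \ge 1} \frac{1}{2^{k-1}} = 2n.
\end{aligned}
```

So on average each layer holds half as many elements as the one below it, and the whole structure uses about twice as many nodes as the plain list.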

       This analysis is intuitive, but the formal proof of the time complexity is quite involved, so I won't go into it here. If you are interested, you can read the paper on the skip list. Now for deletion, which needs little explanation: find the element, remove it from every layer in which it appears, and fix up the pointers, exactly as in an ordinary linked list deletion. As for time complexity, the cost of inserting or deleting is the cost of locating the element's position, so both are O(log n).
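Deletion can be sketched as follows. This is a separate, simplified illustration and not the listing posted in Section 3: it assumes `int` keys, a fixed cap `MAX_LEVEL` of 4, and a sentinel `head` node in the role of -INF, and the names `MiniSkipList`, `findPredecessors`, and `delete` are all hypothetical.

```java
import java.util.Random;

class MiniSkipList {
    static final int MAX_LEVEL = 4;          // assumed cap on the number of layers

    static final class Node {
        final int key;
        final Node[] next;                   // next[i] is the successor on layer i
        Node(int key, int height) {
            this.key = key;
            this.next = new Node[height];
        }
    }

    // Sentinel in the role of -INF; its key is never compared.
    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private final Random coin = new Random();

    // update[i] is the last node on layer i whose key is smaller than key.
    private Node[] findPredecessors(int key) {
        Node[] update = new Node[MAX_LEVEL];
        Node curr = head;
        for (int i = MAX_LEVEL - 1; i >= 0; i--) {
            while (curr.next[i] != null && curr.next[i].key < key) {
                curr = curr.next[i];
            }
            update[i] = curr;
        }
        return update;
    }

    boolean contains(int key) {
        Node candidate = findPredecessors(key)[0].next[0];
        return candidate != null && candidate.key == key;
    }

    boolean insert(int key) {
        Node[] update = findPredecessors(key);
        Node candidate = update[0].next[0];
        if (candidate != null && candidate.key == key) {
            return false;                    // no duplicates
        }
        int height = 1;
        while (height < MAX_LEVEL && coin.nextBoolean()) {
            height++;                        // coin toss: heads promotes
        }
        Node node = new Node(key, height);
        for (int i = 0; i < height; i++) {
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
        return true;
    }

    // Locate the predecessors, then unlink the node on every layer it occupies.
    boolean delete(int key) {
        Node[] update = findPredecessors(key);
        Node target = update[0].next[0];
        if (target == null || target.key != key) {
            return false;                    // key not present
        }
        for (int i = 0; i < target.next.length; i++) {
            update[i].next[i] = target.next[i];  // ordinary linked-list unlink
        }
        return true;
    }
}
```

The work in `delete` is dominated by `findPredecessors`, which is the same walk a query performs; the unlinking loop touches only the layers the node occupies. That is why deletion costs the same as a search.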

3. Code implementation

In Section 2 we used a coin toss to decide the highest layer a new element is inserted into, which a program cannot do literally. In code, we use a random number generator instead. We first estimate the scale of n and then define the maximum number of levels, maxLevel, of the skip list. The bottom layer, layer 0, receives every element with probability 1; the highest layer, layer maxLevel, receives an element with probability 1/2^maxLevel.

We first generate a random integer r in the range 0 to 2^maxLevel-1. The probability that r is less than 2^(maxLevel-1) is 1/2, the probability that r is less than 2^(maxLevel-2) is 1/4, ..., the probability that r is less than 2 is 1/2^(maxLevel-1), and the probability that r is less than 1 is 1/2^maxLevel.

For example, suppose maxLevel is 4. Then r ranges over 0-15; the probability that r is less than 8 is 1/2, that r is less than 4 is 1/4, that r is less than 2 is 1/8, and that r is less than 1 is 1/16. 1/16 is exactly the probability of inserting an element into the maxLevel layer, 1/8 is exactly the probability of inserting an element into the maxLevel-1 layer, and so on.

Following this analysis, we first compare r with 1: if r<1, the element is inserted into the maxLevel layer and every layer below it. Otherwise we compare r with 2: if r<2, the element is inserted into the maxLevel-1 layer and below. Then we compare r with 4: if r<4, the element is inserted into the maxLevel-2 layer and below, and so on. If r>=2^(maxLevel-1), the element is inserted only into the bottom layer.

The analysis above is the key to the random-number algorithm. The algorithm itself is independent of implementation and language, but Java programmers will find a Java implementation easier to follow, so I post someone else's Java implementation below. Note that the listing numbers its levels from 0 to maxLevel-1 and draws r from 1 to 2^maxLevel-1, so its probabilities are close to, but not exactly, the powers of 1/2 described above.
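The comparison chain described above can be sketched directly. This follows the text's description (layers numbered 0 to maxLevel, r drawn from 0 to 2^maxLevel-1) rather than the `powers` table used in the posted listing; the class name `LevelFromRandom` and the methods `levelFor` and `chooseLevel` are names assumed for this illustration.

```java
import java.util.Random;

class LevelFromRandom {
    // Maps r in [0, 2^maxLevel - 1] to the highest layer the element enters
    // (0 = bottom layer, maxLevel = top layer).
    static int levelFor(int r, int maxLevel) {
        int threshold = 1;                   // compare r with 1, 2, 4, ...
        for (int level = maxLevel; level > 0; level--) {
            if (r < threshold) {
                return level;
            }
            threshold <<= 1;
        }
        return 0;                            // r >= 2^(maxLevel-1): bottom layer only
    }

    // Draws r uniformly and converts it to a layer; assumes maxLevel < 31
    // so that 1 << maxLevel fits in an int.
    static int chooseLevel(Random rd, int maxLevel) {
        int r = rd.nextInt(1 << maxLevel);   // uniform in [0, 2^maxLevel - 1]
        return levelFor(r, maxLevel);
    }
}
```

With maxLevel = 4, r = 0 maps to layer 4, r = 1 to layer 3, r = 2 or 3 to layer 2, r = 4 to 7 to layer 1, and r = 8 to 15 to layer 0, which reproduces the probabilities 1/16, 1/8, 1/4, and 1/2 of reaching layers 4, 3, 2, and 1.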

/***************************  SkipList.java  *********************/
 
import java.util.Random;
 
public class SkipList<T extends Comparable<? super T>> {
    private int maxLevel;
    private SkipListNode<T>[] root;
    private int[] powers;
    private Random rd = new Random();
    SkipList() {
        this(4);
    }
    SkipList(int i) {
        maxLevel = i;
        root = new SkipListNode[maxLevel];
        powers = new int[maxLevel];
        for (int j = 0; j < maxLevel; j++)
            root[j] = null;
        choosePowers();
    }
    public boolean isEmpty() {
        return root[0] == null;
    }
    public void choosePowers() {
        powers[maxLevel-1] = (2 << (maxLevel-1)) - 1;    // 2^maxLevel - 1
        for (int i = maxLevel - 2, j = 0; i >= 0; i--, j++)
           powers[i] = powers[i+1] - (2 << j);           // 2^(j+1)
    }
    public int chooseLevel() {
        // r is uniform in 1..powers[maxLevel-1]; nextInt(bound) avoids the
        // negative value Math.abs() returns for Integer.MIN_VALUE
        int i, r = rd.nextInt(powers[maxLevel-1]) + 1;
        for (i = 1; i < maxLevel; i++)
            if (r < powers[i])
                return i-1; // return a level < the highest level;
        return i-1;         // return the highest level;
    }
    // returns the stored key, or null if the key is absent or the list is empty;
    public T search(T key) {
        if (isEmpty())                         // nothing to search;
            return null;
        int lvl;
        SkipListNode<T> prev, curr;            // find the highest nonnull
        for (lvl = maxLevel-1; lvl >= 0 && root[lvl] == null; lvl--); // level;
        prev = curr = root[lvl];
        while (true) {
            if (key.equals(curr.key))          // success if equal;
                 return curr.key;
            else if (key.compareTo(curr.key) < 0) { // if smaller, go down,
                 if (lvl == 0)                 // if possible
                      return null;      
                 else if (curr == root[lvl])   // by one level
                      curr = root[--lvl];      // starting from the
                 else curr = prev.next[--lvl]; // predecessor which
            }                                  // can be the root;
            else {                             // if greater,
                 prev = curr;                  // go to the next
                 if (curr.next[lvl] != null)   // non-null node
                      curr = curr.next[lvl];   // on the same level
                 else {                        // or to a list on a lower level;
                      for (lvl--; lvl >= 0 && curr.next[lvl] == null; lvl--);
                      if (lvl >= 0)
                           curr = curr.next[lvl];
                      else return null;
                 }
            }
        }
    }
    public void insert(T key) {
        SkipListNode<T>[] curr = new SkipListNode[maxLevel];
        SkipListNode<T>[] prev = new SkipListNode[maxLevel];
        SkipListNode<T> newNode;
        int lvl, i;
        curr[maxLevel-1] = root[maxLevel-1];
        prev[maxLevel-1] = null;
        for (lvl = maxLevel - 1; lvl >= 0; lvl--) {
            while (curr[lvl] != null && curr[lvl].key.compareTo(key) < 0) { 
                prev[lvl] = curr[lvl];           // go to the next
                curr[lvl] = curr[lvl].next[lvl]; // if smaller;
            }
            if (curr[lvl] != null && key.equals(curr[lvl].key)) // don't 
                return;                          // include duplicates;
            if (lvl > 0) {                       // go one level down
                if (prev[lvl] == null) {         // if not the lowest
                      curr[lvl-1] = root[lvl-1]; // level, using a link
                      prev[lvl-1] = null;        // either from the root
                }
                else {                           // or from the predecessor;
                     curr[lvl-1] = prev[lvl].next[lvl-1];
                     prev[lvl-1] = prev[lvl];
                }
            }
        }
        lvl = chooseLevel();                // generate randomly level 
        newNode = new SkipListNode<T>(key,lvl+1); // for newNode;
        for (i = 0; i <= lvl; i++) {        // initialize next fields of
            newNode.next[i] = curr[i];      // newNode and reset to newNode
            if (prev[i] == null)            // either fields of the root
                 root[i] = newNode;         // or next fields of newNode's
            else prev[i].next[i] = newNode; // predecessors;
        }
    }
}
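The listing relies on a `SkipListNode` class that is not shown in this post. A minimal sketch of what it would need to look like, inferred only from how the listing uses it (a `key` field, a `next` array, and a constructor taking the key and the number of levels), might be:

```java
class SkipListNode<T> {
    T key;                       // the stored element
    SkipListNode<T>[] next;      // next[i] is the successor on level i

    @SuppressWarnings("unchecked")
    SkipListNode(T key, int height) {
        this.key = key;
        this.next = new SkipListNode[height];   // all links start as null
    }
}
```

This is an assumption about the missing class, not the original author's code. Placed in the same package as the listing, it should be enough for a call sequence such as `SkipList<Integer> list = new SkipList<>(4); list.insert(55); list.search(55);` to compile.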

Original address: https://blog.csdn.net/u013709270/article/details/53470428

 


Origin blog.csdn.net/sanmi8276/article/details/112987067