Skip list (illustrated)


1.1 Skip list structure

A skip list is a linked-list-based data structure that achieves an effect similar to binary search: it supports fast insert, delete, update, and lookup operations.

Let's first look at how lookup works in a plain singly linked list:

(figure: lookup in a sorted singly linked list)

Everyone is familiar with linked lists, and maintaining a sorted linked list is straightforward. But even in a sorted linked list, looking up a given element still takes O(n) time, because we can only walk the list node by node.

How can we speed up lookups? By adding an index level on top of the singly linked list.

(figure: singly linked list with one index level added)

As shown in the figure, we promote one node out of every two to an upper layer. This upper layer is called an index level, and each index node carries a down pointer to the corresponding node in the original list.

To find element 11, the plain singly linked list needs 6 comparisons, while the two-level list with the index needs only 4. As the amount of data grows, the savings become significant.

If we add a few more index levels, lookups get faster still. This structure of a linked list plus multiple index levels is called a skip list.
(figure: skip list with multiple index levels)

The lookup time complexity of a skip list is O(log n).
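To make the structure concrete, here is a minimal sketch of a skip list node in Java. The names (SkipListNode, forward) are just for this illustration, and this is not Redis's actual zskiplistNode; instead of an explicit down pointer on each index node, a common layout gives every node an array of forward pointers, one slot per level it participates in:

```java
/** Minimal skip list node sketch (illustration only, not Redis's zskiplistNode). */
class SkipListNode {
    final int value;
    // forward[i] points to the next node at level i; forward[0] is the base linked list.
    final SkipListNode[] forward;

    SkipListNode(int value, int levels) {
        this.value = value;
        this.forward = new SkipListNode[levels];
    }
}
```

Redis's own zskiplistNode is similar in spirit, but each node additionally stores the member, its score, a backward pointer, and a per-level span used to compute ranks.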

1.2 Why Redis's sorted set (SortedSet) is implemented with a skip list

A skip list, then, is a data structure in which lookups skip over batches of nodes, which is what speeds them up. How does a skip list compare with a red-black tree? Since the two have similar algorithmic complexity, why does Redis implement its sorted set (SortedSet) with a skip list instead of a red-black tree?

Sorted sets in Redis are implemented with skip lists; strictly speaking, a hash table is used alongside the skip list.

If you check the Redis documentation, you will find that the core operations supported by sorted sets mainly include the following:

  • Insert an element;

  • Delete an element;

  • Look up an element;

  • Look up elements by score range (for example, find elements whose score falls in [100, 356]);

  • Iterate over all elements in order.

Insertion, deletion, single-element lookup, and ordered iteration can all be done by a red-black tree as well, with the same time complexity as a skip list. Where the red-black tree falls short is range queries. For a range query, a skip list can locate the start of the range in O(log n) time and then simply walk forward through the bottom-level linked list, which is very efficient.
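To make this concrete, here is roughly what those operations look like from Java. The sketch assumes the Jedis client, a Redis server running on localhost, and the key name "scores"; these are example choices, not anything mandated by Redis:

```java
import redis.clients.jedis.Jedis;

public class SortedSetDemo {
    public static void main(String[] args) {
        // Assumes a local Redis server and Jedis on the classpath.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Insert elements (member + score).
            jedis.zadd("scores", 135, "alice");
            jedis.zadd("scores", 289, "bob");
            jedis.zadd("scores", 401, "carol");

            // Look up a single member's score.
            System.out.println(jedis.zscore("scores", "bob"));   // 289.0

            // Range query: members whose score falls in [100, 356].
            for (String member : jedis.zrangeByScore("scores", 100, 356)) {
                System.out.println(member);                       // alice, bob
            }

            // Iterate the whole set in score order.
            for (String member : jedis.zrange("scores", 0, -1)) {
                System.out.println(member);
            }

            // Delete an element.
            jedis.zrem("scores", "alice");
        }
    }
}
```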

Of course, there are other reasons why Redis uses skip lists for sorted sets. For one, skip lists are easier to implement in code: although a skip list is not trivial, it is easier to understand and write than a red-black tree, and simplicity means better readability and fewer bugs. A skip list is also more flexible: by changing the index construction strategy, it can trade off execution efficiency against memory consumption.

1.3 Skip list lookup

Suppose we want to look up 11. We start at the top level: the next node is 5, and the one after that is 13, which is already greater than 11, so we drop down a level. There the next node is 9; the node after that is again 13, so we drop down once more. Finally, at the bottom level, we find 11.

(figure: search path for 11 in the skip list)

Simple, isn't it? The search can be summarized as a ternary expression: is the next node greater than the target? drop down a level : move forward. Understanding the lookup is the key to understanding skip lists, so try tracing a lookup for a few other values. Once the lookup is clear, the remaining two operations are easy.
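As a sketch, building on the SkipListNode class above and assuming a dummy header node whose forward array covers every level, the lookup could be written like this:

```java
class SkipListSearch {
    /** Returns true if target is present; head is a dummy header node. */
    static boolean contains(SkipListNode head, int target) {
        SkipListNode current = head;
        // Start at the top index level and work downward.
        for (int level = head.forward.length - 1; level >= 0; level--) {
            // Next node still smaller than the target? Move forward; otherwise drop down.
            while (current.forward[level] != null && current.forward[level].value < target) {
                current = current.forward[level];
            }
        }
        // current is now the last node smaller than target on the bottom level.
        SkipListNode next = current.forward[0];
        return next != null && next.value == target;
    }
}
```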

1.4 Skip list insertion

Next, let's look at insertion. Consider the ideal skip list structure: L2 holds 1/2 as many elements as L1, L3 holds 1/2 as many as L2, and so on. So if, on every insertion, we can keep the number of elements in each level roughly 1/2 of the number in the level below, the skip list stays close to this ideal shape. How do we achieve that at insertion time? Simple: flip a coin!

Suppose we insert element X into the skip list. X must obviously go into L1. Should X also go into L2? We want the upper level to hold about 1/2 of the elements of the level below it, so X should go into L2 with probability 1/2: flip a coin, insert on heads, skip on tails. Should X then go into L3? Relative to L2 we again want probability 1/2, so flip the coin again, and so on. By this rule, the probability that X reaches level n is (1/2)^(n-1). This is how an element is inserted into a skip list.

The initial state of the skip list is shown in the figure below; the list contains no elements:

(figure: empty skip list)

To insert element 2, we first insert it at the bottom level (L1), as shown below:

(figure: 2 inserted at L1)

Then we flip a coin. It comes up heads, so 2 is also inserted into L2, as shown below:

(figure: 2 promoted to L2)

We flip again; this time it comes up tails, so the insertion of element 2 stops here, leaving the structure shown above. Next we insert element 33 in the same way, starting by inserting 33 at L1, as shown below:

(figure: 33 inserted at L1)

We flip a coin; it comes up tails, so the insertion of element 33 is finished, leaving the structure shown above. Next we insert element 55, first inserting it at L1, as shown below:

(figure: 55 inserted at L1)

We flip a coin; it comes up heads, so 55 is also inserted into L2, as shown below:

(figure: 55 promoted to L2)

We flip again; heads once more, so 55 is inserted into L3 as well, as shown below:

(figure: 55 promoted to L3)

Another flip, another heads, so 55 is inserted into L4, as shown below:

(figure: 55 promoted to L4)

Finally the coin comes up tails, so the insertion of 55 ends, leaving the structure shown above.

Of course, the number of levels cannot grow without bound. Besides the level chosen by the random coin flips, implementations impose a maximum level MAX_LEVEL, and the randomly generated level is never allowed to exceed it.
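A coin-flip level generator with such a cap might look like the following sketch. MAX_LEVEL = 32 is just an example value; Redis caps its skip list levels with its own compile-time constant and actually promotes with probability 1/4 rather than 1/2:

```java
import java.util.concurrent.ThreadLocalRandom;

class SkipListLevels {
    static final int MAX_LEVEL = 32; // example cap, not Redis's constant

    /** Flip a fair coin; every "heads" promotes the new element one level higher. */
    static int randomLevel() {
        int level = 1; // every element always appears at L1
        while (level < MAX_LEVEL && ThreadLocalRandom.current().nextBoolean()) {
            level++;
        }
        return level;
    }
}
```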

Continuing in the same way, we insert the remaining elements. With so few elements the result may not be an ideal skip list, but when the number of elements n is large, basic probability tells us the resulting structure will be very close to the ideal one.

This intuition is easy to accept, but the formal time-complexity proof is more involved and is not covered here; if you are interested, see the original skip list paper.
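Putting the walkthrough together, here is a minimal insertion sketch. It reuses the SkipListNode and randomLevel sketches above and assumes the header node's forward array is MAX_LEVEL slots tall; it is an illustration, not Redis's zslInsert:

```java
class SkipListInsert {
    /** Inserts value, linking the new node into every level chosen by the coin flips. */
    static void insert(SkipListNode head, int value) {
        // update[i] = last node at level i that precedes the insertion point.
        SkipListNode[] update = new SkipListNode[head.forward.length];
        SkipListNode current = head;
        for (int level = head.forward.length - 1; level >= 0; level--) {
            while (current.forward[level] != null && current.forward[level].value < value) {
                current = current.forward[level];
            }
            update[level] = current;
        }

        int newLevel = SkipListLevels.randomLevel(); // coin flips, capped at MAX_LEVEL
        SkipListNode node = new SkipListNode(value, newLevel);

        // Splice the new node in at every level it occupies.
        for (int level = 0; level < newLevel; level++) {
            node.forward[level] = update[level].forward[level];
            update[level].forward[level] = node;
        }
    }
}
```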

1.5 Skip list deletion

Finally, deletion. There is not much to it: locate the element, unlink it, and fix up the surrounding pointers, just as in an ordinary linked list. The only difference is that the node must be unlinked from every index level in which it appears.
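A matching deletion sketch, under the same assumptions as the code above: the descent is identical to the lookup, and the node is simply bypassed at every level where it appears.

```java
class SkipListDelete {
    /** Unlinks value (if present) from every level in which it appears. */
    static void delete(SkipListNode head, int value) {
        SkipListNode current = head;
        for (int level = head.forward.length - 1; level >= 0; level--) {
            while (current.forward[level] != null && current.forward[level].value < value) {
                current = current.forward[level];
            }
            // The next node at this level is the target: bypass it.
            if (current.forward[level] != null && current.forward[level].value == value) {
                current.forward[level] = current.forward[level].forward[level];
            }
        }
    }
}
```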

The time complexity of insertion and deletion is dominated by the cost of locating the element's position, so both are O(log n).



Source: blog.csdn.net/crazymakercircle/article/details/109488587