Doubly linked list binary search

Tip: This article is reproduced from Meteor 007. Do you know how an ordinary linked list can achieve the efficiency of binary search?


Foreword

Copyright statement: this article is an original article by the CSDN blogger "Meteor 007" and follows the CC 4.0 BY-SA copyright agreement. When reprinting, please attach the original source link and this statement.
Original link: https://blog.csdn.net/qq_33220089/article/details/114641975


1. Arrays and Linked Lists

1. Array

In computer science, an array data structure, or simply an array, is a data structure made up of a collection of elements of the same type stored in one contiguous block of memory. The storage address of any element can be computed directly from its index.

Advantages
a. Fast random access (by index/subscript).
b. Simple to implement and easy to use.
c. Memory addresses are contiguous, which is very friendly to the CPU cache. For example, the high-performance Disruptor queue exploits the CPU cache together with the contiguous addresses of an array to greatly improve performance.

Disadvantages
a. Memory contiguity can be an advantage or a disadvantage: when memory is tight, arrays are severely constrained.
b. Insertion and deletion require moving elements (copying data), which is slow.
c. The size of an array is fixed, which limits the number of elements it can hold and is unfriendly to dynamic data.

2. Linked list

A linked list is a common basic data structure. It is a linear list, but it does not store its data contiguously in order; instead, each node stores a pointer to the next node. Because the nodes do not have to be stored in order, insertion into a linked list can be done in O(1) time, which is much faster than in the other kind of linear list, the sequential list (array); however, finding a node, or accessing the node at a specific position, takes O(n) time.

Advantages
a. A linked list can make full use of scattered memory and allows flexible, dynamic memory management.
b. Deletion and insertion do not require moving other elements.
c. It is not limited to a fixed size and can grow as needed.

Disadvantages
a. The random-access advantage of the array is lost, and the extra pointer field in every node gives the linked list a relatively large space overhead.
b. Random access is less efficient than with an array.

3. Time complexity analysis

Understanding time and space complexity:
Random access
a. Array: O(1)
b. Linked list: random access is not supported (a traversal costs O(n))
Random insertion and deletion
a. Array: ordered array -> O(n), because elements must be moved to keep the order; unordered array -> O(1)
b. Linked list: O(1)
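
A minimal Java sketch of this difference (the class and helper names below are made up for illustration, not from the original article):

```java
// ComplexityDemo.java -- illustrative only.
public class ComplexityDemo {
    static class ListNode {
        int value;
        ListNode next;
        ListNode(int value) { this.value = value; }
    }

    // Linked list: reaching position i means walking i pointers -> O(n).
    static int get(ListNode head, int i) {
        ListNode cur = head;
        for (int k = 0; k < i; k++) cur = cur.next;
        return cur.value;
    }

    public static void main(String[] args) {
        int[] arr = {1, 4, 7, 15, 20, 35, 50, 58};
        System.out.println(arr[5]);          // array: one address calculation -> O(1)

        ListNode head = new ListNode(1);     // build a tiny list 1 -> 4 -> 7
        head.next = new ListNode(4);
        head.next.next = new ListNode(7);
        System.out.println(get(head, 2));    // linked list: walk the pointers -> O(n)
    }
}
```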

2. Skip list

1. What is a skip list

The linked list was introduced above. Its query time complexity is O(n): even for an ordered linked list, a search has to traverse from the first element, so the cost is high. If you were asked to optimize this, what would you do? This is where the skip list comes in.

A skip list (also called a jump list) is a data structure in computer science. It brings the average time complexity of both search and insertion on an ordered sequence containing n elements down to O(log n).

For those who have forgotten what log means, you can first review the article "What is logarithmic log?" to recall it.

Let's draw an ordinary linked list structure:
[Figure: an ordinary sorted singly linked list: 1 -> 4 -> 7 -> 15 -> 20 -> 35 -> 50 -> 58 -> 66]
This is an ordinary singly linked list. If we want to find node 58, we have to visit 1 -> 4 -> 7 -> 15 -> 20 -> 35 -> 50 -> 58, a total of 8 nodes, before we reach the result we need; the time complexity is O(n), and the efficiency is easy to imagine. Let's try to optimize it.
MySQL's query efficiency drops sharply when the amount of data is large. How is that optimized? With an index. Here we can also use something like an "index" to optimize this ordinary singly linked list.
[Figure: the same list with one index level built from every other node: 1 -> 7 -> 20 -> 50 -> 66 above the original list]
We copy every other node of the original linked list into an index level above it, to make data lookup easier.

If we search for 58 again, we only need to visit 1 -> 7 -> 20 -> 50 -> 58. We found the node we wanted after only 5 visits; the efficiency has already improved a little.
Let me explain the picture. Since we have built one index level, a query searches that index level first. While scanning the index level we find that 50 and all nodes before it are smaller than 58; when we reach 50, its next index node, 66, is larger than the node we are looking for, so at this point we follow the down pointer of index node 50 (which points to node 50 in the original list), and after moving one more step along the original list we find 58.

The addition of "index" only saves three queries, and the optimization is not obvious. Is this optimization necessary? We have established a layer of "index" here, how many more layers of "index" should we build? What will it be like?

[Figure: the linked list with several index levels stacked on top of it]
This time we find that it only takes 4 visits to reach node 58. 4 visits is half of the original 8. There is not much data in this example; when there is enough data, the effect is far more obvious.
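
As a rough sketch of how such a multi-level search can be written (the figures above use right/down pointers; the code below uses the equivalent and more common representation in which every node keeps one forward pointer per level — all class and method names here are made up for illustration):

```java
import java.util.Random;

// SkipListSketch.java -- an illustrative skip list, not the article's or Redis's actual code.
public class SkipListSketch {
    static final int MAX_LEVEL = 16;          // assumed upper bound on index levels

    static class Node {
        int value;
        Node[] forward = new Node[MAX_LEVEL]; // forward[i] = next node on level i
        Node(int value) { this.value = value; }
    }

    Node head = new Node(Integer.MIN_VALUE);  // sentinel head that exists on every level
    int level = 1;                            // number of levels currently in use
    Random random = new Random();

    // Search: start at the top level, move right while the next value is still
    // smaller than the target, then drop one level; finish on level 0.
    public boolean contains(int target) {
        Node cur = head;
        for (int i = level - 1; i >= 0; i--) {
            while (cur.forward[i] != null && cur.forward[i].value < target) {
                cur = cur.forward[i];         // move right on this level
            }
            // cur.forward[i] is null or >= target: go down one level
        }
        Node candidate = cur.forward[0];
        return candidate != null && candidate.value == target;
    }
}
```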

We call this data structure, built from level after level of indexes, a skip list (跳表). By now you should have a better sense of what a skip list is, but
you may have another question: building the index levels clearly costs extra space. This is a classic case of trading space for time, so when the amount of data is large, will the space become a problem?

Let's analyze the time and space complexity of the skip list.

2. Skip list analysis

Assume the original linked list has n nodes. We promote one node out of every two into the index level above, so the first index level has half as many nodes as the original list, i.e. n/2; the second index level has half as many nodes as the first, i.e. n/4; the third index level has half as many as the second, which is 1/4 of the first index level and 1/8 of the original list; and so on, the k-th index level has half as many nodes as the (k-1)-th. The number of nodes at the k-th index level is therefore:
number of nodes at the k-th index level = n / 2^k

If the highest index level is left with 2 nodes, then n / 2^k = 2 gives k = log₂n − 1, so the skip list has about log₂n levels in total (including the original list). If a query visits at most x nodes on each level, its total cost is x · log₂n, i.e. O(x · log n).

Why has the base 2 disappeared from the logarithm? In time complexity analysis we only describe the growth trend, not an exact value, so constant factors such as the base of the logarithm can be ignored. I will not go into the details here; if you are interested you can look it up, it is not complicated. What we still need to figure out is: how large is this x?

In the skip list structure drawn above we promote one index node out of every two nodes, so when we look up a node we traverse the index levels from top to bottom, and on each level we visit at most 3 nodes. Why 3, and not 4, 5 or 6? Let's draw a simple diagram to illustrate.

[Figure: searching for node 12 through two index levels, passing nodes 9, 11 and 13]
Suppose we need to find node 12. When we reach node 9 we see that 12 > 9, while the next node on the same level is already larger than 12, so we follow 9's down pointer to the level below; there 9's successor is 11 (11 < 12) and 11's successor is 13 (13 > 12), so we go down again. Between two adjacent index nodes of the level above there are never more than three nodes, so even when we descend to the original linked list we visit at most three nodes on each level. That is how the maximum number of nodes traversed per level is obtained: x is the constant 3, it can be dropped, and the final query time complexity of the skip list is O(log n).

3. Memory usage of the skip list

Although query efficiency is greatly improved, there is another very important issue: the skip list takes extra memory, because it needs many levels of index. Does that make the data structure not worth recommending? Redis already uses it officially, so what are you afraid of?

Let's analyze the space complexity of the skip list and see whether it really consumes a lot of memory.

Assume the length of the original linked list is n. Then the first index level has n/2 nodes, the second index level has n/4, the third n/8, and the last (highest) index level has 2. This is obviously a geometric sequence:
n/2 + n/4 + n/8 + … + 8 + 4 + 2 = n − 2
So the index levels occupy n − 2 nodes in total; adding the original linked list gives n − 2 + n = 2n − 2, and the space complexity of the skip list is therefore O(n) (the constant can be dropped). That is the cost of promoting one index node out of every two nodes. What if we promote one out of every 3, or every 5? The calculation is the same as above, and the resulting space complexity is still O(n), but the space actually used is indeed much smaller.
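
For example, promoting one index node out of every three nodes (a worked sum following the same reasoning as above, not from the original article):

n/3 + n/9 + n/27 + … = n · (1/3) / (1 − 1/3) = n/2

so the index levels now cost only about n/2 extra nodes instead of n − 2, yet the order of growth is still O(n).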

Now that queries are covered, what about insertion and deletion in the skip list? Do they also perform well? Let's take a look.

Since our linked list is ordered, an insertion must first find the insertion position, and the query time complexity of the skip list is O(log n). Once the position is found, the insertion itself is an O(1) singly-linked-list operation (I won't prove that here), so the whole insertion is: find the position + insert = O(log n) + O(1) = O(log n).

In fact, the time complexity of deletion is also O(log n). Why?
It is basically the same as insertion, except for one extra step: finding the predecessor node, because deleting a node changes the pointers of the nodes before and after it. The successor can be reached from the node being deleted, but the predecessor cannot, and finding it is also an O(log n) query. So deletion costs O(log n) + O(log n), which is still O(log n). One thing to note: if the node being deleted also exists in the index levels, the corresponding index nodes must be deleted at the same time. The predecessor problem can also be avoided by using a doubly linked list.
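
Continuing the illustrative SkipListSketch above, a deletion can record the predecessor on every level while searching down, and then unlink the target from every level it appears in (a sketch only, not the article's code):

```java
// Part of the hypothetical SkipListSketch class from above.
public boolean delete(int target) {
    Node[] update = new Node[MAX_LEVEL];   // predecessor of the target on every level
    Node cur = head;
    for (int i = level - 1; i >= 0; i--) {
        while (cur.forward[i] != null && cur.forward[i].value < target) {
            cur = cur.forward[i];
        }
        update[i] = cur;                   // last node smaller than target on level i
    }
    Node victim = cur.forward[0];
    if (victim == null || victim.value != target) return false;
    for (int i = 0; i < level; i++) {
        if (update[i].forward[i] == victim) {
            update[i].forward[i] = victim.forward[i];  // also unlink from the index levels
        }
    }
    while (level > 1 && head.forward[level - 1] == null) {
        level--;                           // shrink top levels that became empty
    }
    return true;
}
```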

4. Index update

Query, deletion and insertion in the skip list all perform very well, but can it stay that way? Since the skip list is used to store data, it will inevitably see frequent insertions and deletions. If a lot of data is inserted between a pair of adjacent index nodes, the data becomes unevenly distributed and query efficiency in that interval drops; in the worst case, with all the data crowded into one interval, the structure degenerates into a singly linked list. So we update and maintain the index levels in real time as data is inserted, to make sure the skip list does not degenerate too far. How, then, do we maintain the index levels?

It is actually not difficult. The skip list uses a random function to keep the data balanced, that is, relatively evenly distributed. When we add a value to the skip list, the random function generates a random number; this number is the number of index levels for the new node, and the node is then linked into each of those index levels. Suppose the value we need to insert is 7 and the random function returns 2; the insertion then looks like the following figure:

[Figure: inserting the value 7 with a random level of 2]
The requirements on this random function are quite high: it must keep the skip list well balanced so that its performance does not degrade. It plays the same role that left and right rotations play for a red-black tree.
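
Continuing the same illustrative SkipListSketch, the random function is often a simple "coin flip": each extra level is granted with some fixed probability (Redis uses 1/4; the sketch below uses 1/2), and the new node is then spliced into every level up to that height:

```java
// Part of the hypothetical SkipListSketch class from above.
// Each extra level is granted with probability 1/2, capped at MAX_LEVEL.
private int randomLevel() {
    int lvl = 1;
    while (random.nextInt(2) == 0 && lvl < MAX_LEVEL) {
        lvl++;
    }
    return lvl;
}

public void insert(int value) {
    Node[] update = new Node[MAX_LEVEL];     // predecessor of the insertion point on every level
    Node cur = head;
    for (int i = level - 1; i >= 0; i--) {
        while (cur.forward[i] != null && cur.forward[i].value < value) {
            cur = cur.forward[i];
        }
        update[i] = cur;
    }

    int lvl = randomLevel();                 // how many levels the new node joins
    if (lvl > level) {
        for (int i = level; i < lvl; i++) {
            update[i] = head;                // new top levels start from the head sentinel
        }
        level = lvl;
    }

    Node node = new Node(value);
    for (int i = 0; i < lvl; i++) {          // splice the node into each of its levels
        node.forward[i] = update[i].forward[i];
        update[i].forward[i] = node;
    }
}
```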

Summary

1. What is a skip list

A skip list (also called a jump list) is a data structure in computer science. It makes the average time complexity of both search and insertion on an ordered sequence containing n elements O(log n).

2. Time/space complexity of the skip list

Time complexity: O(log n); space complexity: O(n). The extra space decreases as the interval between index nodes increases, but the trend is still O(n). Because the index levels store much less information than the original linked list, even O(n) space is acceptable.

3. Applications of the skip list

  1. Redis's sorted set.
  2. Google's open-source key/value storage engine LevelDB (in its MemTable).
  3. The internal storage structure of HBase's MemStore.

4. Dynamic update of the skip list

Since the skip list consists of multiple index levels, frequent insertion can pile too much data between a pair of adjacent index nodes; in the worst case all the data ends up between two adjacent index nodes, the skip list degenerates into an ordinary linked list and the time complexity degrades to O(n). So the index nodes of the skip list need to be updated dynamically as data is inserted, which is currently done with a random function.

5. Why does Redis use a skip list to implement its sorted set

  1. A skip list is a data structure built from a linked list plus multiple levels of index. Through the design idea of trading space for time, it realizes a "binary search" over a linked list, so search, deletion and insertion all take O(log n) time.
  2. Compared with B-trees, red-black trees, AVL trees and so on, a skip list is much simpler to implement: keeping those trees balanced is quite troublesome, while dynamically updating the skip list's index is comparatively simple.
  3. Range (interval) search in a skip list is better than in a red-black tree (see the sketch below).
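
A sketch of point 3, continuing the hypothetical SkipListSketch above: a range query locates the lower bound with an ordinary O(log n) search and then simply walks the bottom-level forward pointers.

```java
// Part of the hypothetical SkipListSketch class from above.
// Collect all values in [lo, hi]: locate lo via the index levels, then walk level 0.
public java.util.List<Integer> range(int lo, int hi) {
    java.util.List<Integer> result = new java.util.ArrayList<>();
    Node cur = head;
    for (int i = level - 1; i >= 0; i--) {
        while (cur.forward[i] != null && cur.forward[i].value < lo) {
            cur = cur.forward[i];           // O(log n) descent to just before lo
        }
    }
    cur = cur.forward[0];
    while (cur != null && cur.value <= hi) {
        result.add(cur.value);              // linear walk over the bottom level
        cur = cur.forward[0];
    }
    return result;
}
```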

