Talking about MySQL indexes and the Redis skip list: the Redis sorted set (zset) is built on a skip list, with O(log n) time complexity (Alibaba)

Why Redis uses a skip list rather than a B+ tree: Redis is an in-memory database, whereas the B+ tree is designed purely for disk I/O in databases such as MySQL. Each node of a B+ tree is the size of one MySQL page (Alibaba interview).

Companion articles:

For an introduction to the B+ tree and how MySQL indexes work, see: A step-by-step analysis of why the B+ tree is suited to indexing (Alibaba interview)

Reference: How Kafka achieves high-concurrency storage, and how it finds the data a consumer needs (Alibaba)

Reference: Binary search: time and space complexity of the various sorting algorithms (Alibaba)

For an introduction to MySQL storage engines, including their default index types, see: MySQL's multi-storage-engine architecture and the differences between the default InnoDB engine and MyISAM (Alibaba)

Key points:

At most three nodes are traversed on each level, and the skip list has height h, so finding a node requires traversing about 3 × h nodes. Since h is about log n, ignoring low-order terms and coefficients gives a time complexity of O(log n). The space complexity is O(n).

Data structure | Principle | Key queries | Lookup efficiency | Storage size | Insert / delete efficiency
Hash | Hash table | Single key only | Close to O(1) | Small; no extra storage beyond the data | O(1)
B+ tree | Evolved from the balanced binary tree | Single key, range, pagination | O(log n) | Besides the data, left and right pointers plus leaf-node pointers | O(log n); the tree structure must be adjusted, so the algorithm is more complex
Skip list | Extension of the ordered linked list | Single key, pagination | O(log n) | Besides the data, extra pointers, but fewer than 2 per node on average, so smaller than a B+ tree | O(log n); only linked-list manipulation, so the algorithm is relatively simple

If you are interested in the LSM structure, see: cassandra vs mongo (1) Storage Engine

Questions

If any of the following questions confuse you, or you know little about them, please read on. I believe this article will help.

  • How are MySQL indexes implemented?
  • What is the difference between MySQL's B+ tree index and its hash index, and which scenarios does each suit?
  • Can a database index be implemented in other ways?
  • How is the Redis skip list implemented?
  • What are the differences among the skip list, the B+ tree and the LSM tree?

Analysis

First, why discuss MySQL indexes and the Redis skip list together? Because they solve the same problem: lookup in a data collection, that is, quickly finding the position (or the corresponding value) of a specified key.

Once you think about the problem from this angle, would you still be unsure about the difference between a B+ tree index and a hash index?

The data collection lookup problem

Now let us mark out the boundaries of the problem domain: solving lookup in a data collection. What needs to be considered here?

  1. Which lookup modes must be supported: single key, multiple keys, range search
  2. Insert / delete efficiency
  3. Lookup efficiency (i.e. time complexity)
  4. Storage size (space complexity)

Let us look at several commonly used lookup structures.

hash


A hash stores data in key-value form. A hash function maps the key to its storage location, so the value can be found quickly from the key.
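As a rough illustration of why a hash suits single-key lookup but not range search, here is a minimal sketch that uses Python's built-in dict as the hash table. The data and the function names `lookup` and `range_lookup` are made up for the example.

```python
# Hypothetical data: user id -> name, stored in Python's built-in hash table.
users = {1003: "carol", 1001: "alice", 1002: "bob"}

def lookup(key):
    """Single-key lookup: one hash computation, expected O(1)."""
    return users.get(key)

def range_lookup(low, high):
    """A hash table keeps no key order, so a range query has to
    examine every entry: O(n)."""
    return sorted(k for k in users if low <= k <= high)

print(lookup(1002))
print(range_lookup(1001, 1002))
```

The range query works, but only by scanning all keys, which is the gap that ordered structures such as the B+ tree and the skip list fill.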

On hashing, I have written several blog posts at the depth expected in an Alibaba P3 interview. The last one, on resize and the state of a HashMap before and after resizing, is especially relevant:

      Reference: How HashMap works: the hash linked list

      Reference: Hashtable's data storage structure and traversal rules, and why hash-type lookups are O(1): a source code analysis

      Reference: Time complexity of HashMap, Hashtable, HashSet and TreeMap

      Reference: HashMap's underlying implementation, and the differences among Hashtable, HashMap and HashSet

      Reference: ConcurrentHashMap principles (1.7 and 1.8): put and get hash twice to reach the target HashEntry

On resize, see: HashMap multithreading problem analysis: normal and abnormal rehash (Alibaba)

B + tree:

Note that this is only a summary of the B+ tree. If this level of understanding is not enough,

see the detailed treatment of B+ tree principles: A step-by-step analysis of why the B+ tree is suited to indexing (Alibaba interview)

In a B+ tree the data is stored in the leaf nodes, and the non-leaf nodes store the index.

 


 

The B+ tree evolved from the balanced binary tree. Why did we not learn structures like the B+ tree and the skip list in algorithms class? Because both come from engineering practice: they are compromises made on top of the theory.

The B+ tree is first of all an ordered structure. To keep the tree from growing too tall and hurting search efficiency, a leaf node stores a whole page of data rather than a single record. To better support range queries, the leaf nodes redundantly hold the keys that also appear in non-leaf nodes, so the leaves contain all the data. To support paging through results, the leaf nodes are connected to each other by pointers.
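The role of the linked leaf pages in a range query can be sketched as follows. This is a simplified model under stated assumptions, not InnoDB's implementation: the non-leaf levels are replaced by a binary search over each leaf's largest key, and `Leaf`, `build_leaves` and `range_query` are names invented for the example.

```python
from bisect import bisect_left

class Leaf:
    """One leaf page: a sorted run of keys plus a pointer to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def build_leaves(sorted_keys, page_size):
    """Split sorted keys into fixed-size pages and chain them together."""
    leaves = [Leaf(sorted_keys[i:i + page_size])
              for i in range(0, len(sorted_keys), page_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves

def range_query(leaves, low, high):
    """Locate the first leaf that may hold `low`, then follow next pointers."""
    # Stand-in for descending the non-leaf index levels.
    max_keys = [leaf.keys[-1] for leaf in leaves]
    idx = bisect_left(max_keys, low)
    leaf = leaves[idx] if idx < len(leaves) else None
    result = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

leaves = build_leaves(list(range(1, 21)), page_size=4)
print(range_query(leaves, 6, 13))
```

Only the start of the range needs an index lookup. Everything after that is a sequential walk along the leaf chain.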

Skip list

 

Skip list: why does Redis use a skip list to implement sorted sets?

Binary search relies on the random access that arrays provide, so it can only be implemented on arrays. If the data is stored in a linked list, is there really no way to use a binary-search-like algorithm?
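To make the dependence on random access concrete, here is a small sketch: a standard binary search over a Python list, next to a helper showing that merely reaching the middle of a linked list already costs a linear walk. `ListNode` and `middle_of_list` are illustrative names, not taken from any library.

```python
def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:      # arr[mid] is an O(1) random access
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

class ListNode:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def middle_of_list(head, length):
    """Reaching the middle of a linked list means following length // 2 links."""
    node = head
    for _ in range(length // 2):
        node = node.next
    return node

print(binary_search([1, 3, 4, 5, 7, 8, 9, 10, 13, 16, 17, 18], 16))
```

Each probe of binary search is O(1) on an array but O(n) on a linked list, which is why a plain linked list cannot benefit from it directly.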

This is where the skip list comes in. A skip list is in fact a linked list with modifications built on top of it.

The skip list is a dynamic data structure with excellent all-round performance: it supports fast insertion, deletion and lookup, is not complicated to write, and can even replace the red-black tree.

Redis has five basic data structures:

 

1. String (string)
Redis is a highly efficient key-value store, and a string can be used directly as a counter, for example to count online users. Strings are also binary-safe, so they can even be used to store images or video.

2. Hash (hash)
Stores field-value pairs, generally used to hold the basic attributes of an object, such as user or product information. In addition, when a hash is smaller than the configured thresholds it is stored as a ziplist, which saves memory. So for large volumes of data you can shard the data across many small hashes to compress it and save memory, for example when mapping a huge number of product codes to image addresses. For example: a product code is a fixed 10 digits, so take the first 7 digits as the hash key and the last 3 as the field, with the image address as the value. Each hash then holds at most 1,000 fields (000 to 999), and setting hash-max-ziplist-entries to 1024 in redis.conf is enough.

3. List (list)
A list can be used to implement a message queue, and its range command can also be used for paginated queries.

 

4. Set (set)
A collection of unique members; an ordered list of integers can be stored directly as a set. It can be used for deduplication, for example to ensure that user names are not repeated. Sets also support intersection and union operations, to find what certain elements have in common.

 

5. Sorted set (zset)
An ordered collection that can be used for range lookups, leaderboards or top-N features.
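The hash sharding scheme described under item 2 can be sketched as follows. A plain dict of dicts stands in for Redis hashes here, and the function names and the image path are hypothetical. With a real Redis client the same key and field split would be passed to the hash commands.

```python
def shard(product_code):
    """Split a 10-digit product code into (hash key, field)."""
    if len(product_code) != 10 or not product_code.isdigit():
        raise ValueError("expected a 10-digit product code")
    return product_code[:7], product_code[7:]

# Stand-in for Redis: outer key -> small hash of field -> value.
store = {}

def put(product_code, image_url):
    key, field = shard(product_code)
    store.setdefault(key, {})[field] = image_url

def get(product_code):
    key, field = shard(product_code)
    return store.get(key, {}).get(field)

put("1234567890", "img/1234567890.jpg")   # hypothetical image path
print(shard("1234567890"), get("1234567890"))
```

Because the field is always three digits, no small hash can grow beyond 1,000 entries, which is what keeps each one under the ziplist threshold.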

Of these, the fifth, the sorted set (zset), is implemented with a skip list. Why did Redis choose a skip list to implement sorted sets?

First, how should we understand the skip list?

In a singly linked list, finding a given item means traversing the list from the head, so the time complexity is O(n).

 

Singly linked list

So how can lookups in a singly linked list be made faster? As the figure shows, build an index over the list: out of every two nodes, extract one node up to the next level. The extracted level is called the index, or index layer.

 

The first level index 

This resembles a pattern often used in development, a HashMap whose values are lists: think of each index node as a HashMap key, and the two nodes it covers as the list stored under that key.

So to find 16, we no longer need to traverse every node before 16. We only traverse the index: on reaching 13, we see that the next index node is 17, so 16 must lie in [13, 17]. We then drop down from 13 to the original linked list and find 16. With one index level added, fewer nodes are traversed to find a node, which means search efficiency improves.

What if we add another index level?
In the same way as before, on top of the first-level index, extract one node out of every two to form the second-level index. Now finding 16 requires traversing only 6 nodes, so the number of nodes traversed drops again.
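The multi-level lookup can be sketched with plain Python lists. This is a simplified static model, not a real skip list: each level is a list holding every second key of the level below, and position i on one level corresponds to position 2i on the level beneath it. The key set and the names `build_levels` and `search` are assumptions made for the example, so the figures above may use different data.

```python
def build_levels(sorted_keys):
    """levels[0] is the original list; each higher level keeps every
    second node of the level below, until at most two nodes remain."""
    levels = [list(sorted_keys)]
    while len(levels[-1]) > 2:
        levels.append(levels[-1][::2])
    return levels

def search(levels, target):
    """Return (found, visited). On each level, walk right while the next
    key is <= target, then drop down one level."""
    if not levels[0]:
        return False, 0
    pos, visited = 0, 0
    for depth in range(len(levels) - 1, -1, -1):
        level = levels[depth]
        visited += 1                      # the node we stand on at this level
        while pos + 1 < len(level) and level[pos + 1] <= target:
            pos += 1
            visited += 1
        if depth > 0:
            pos *= 2                      # same key, one level down
    return levels[0][pos] == target, visited

levels = build_levels([1, 3, 4, 5, 7, 8, 9, 10, 13, 16, 17, 18])
print(search(levels, 16))
```

For this particular key set the search for 16 visits 6 nodes across four levels, where a plain scan of the bottom list would visit 10.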

 

The second level of the index 

When the number of nodes is large, the gain in query efficiency from adding index levels becomes very obvious.

 

 

This structure, a linked list plus multiple index levels, is the skip list.

Second, how fast is a skip list query?

In a singly linked list, a query takes O(n) time. What is the time complexity of a query in a skip list with a multi-level index?

Following the example above, one index node is extracted for every two nodes, and one second-level index node for every two first-level index nodes. The first-level index then has about n/2 nodes, the second-level index about n/4, and the k-th level index about n/2^k.

Suppose h index levels are built in total and the top level has two nodes (with only one node, the top level could not bound a range and would be of little use). Then n/2^h = 2, which gives h = log2(n) - 1. Counting the original linked list as a level too, the height of the whole skip list is about log2(n).

 

Analysis of time complexity 

 

Number of nodes traversed on each level

As the figure shows, at most three nodes are traversed on each level, and the skip list has height h, so finding a node requires traversing about 3 × h nodes. Since h is about log2(n), ignoring low-order terms and coefficients gives a time complexity of O(log n).

In effect, this is binary search implemented on top of a singly linked list. But the gain in query efficiency comes from building many index levels. Doesn't that waste memory?

Third, does the skip list waste memory?

Let us analyze the skip list's space complexity. It is O(n).

 

Number of index nodes on each level

 

Space complexity 
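The sum behind the O(n) bound can be reconstructed from the level sizes given earlier: about n/2^k nodes on level k, and two nodes on the top level h. This is a worked derivation under those assumptions, not a copy of the original figure.

```latex
\[
\sum_{k=1}^{h} \frac{n}{2^{k}}
  = \frac{n}{2} + \frac{n}{4} + \cdots + 4 + 2
  = n - \frac{n}{2^{h}}
  = n - 2
  \qquad \text{where } \frac{n}{2^{h}} = 2 .
\]
```

The index levels together therefore hold slightly fewer than n nodes.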

So converting a singly linked list of n nodes into a skip list requires extra storage for close to n more nodes. How can the memory taken by the index be reduced?

Earlier we extracted one node to the upper-level index out of every two. If we instead extract one out of every three nodes, or every five, wouldn't there be far fewer index nodes?

 

Extracting one upper-level index node out of every three nodes

By the same calculation as before, the space complexity is still O(n). But big-O notation ignores low-order terms and coefficients. In practice they do have an effect, and we ignore them only to focus on the higher-order terms.
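The same calculation for one index node per three nodes can be written out as follows, assuming for simplicity that n is a power of three and that the top level holds a single node.

```latex
\[
\frac{n}{3} + \frac{n}{9} + \frac{n}{27} + \cdots + 9 + 3 + 1
  = \sum_{k=1}^{m} \frac{n}{3^{k}}
  = \frac{n - 1}{2}
  \approx \frac{n}{2}
  \qquad \text{where } n = 3^{m} .
\]
```

So the index needs about n/2 extra nodes, roughly half of what the one-in-two scheme needs, even though both are O(n).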

 

Space complexity 

In practice, we need not worry too much about the extra space taken by the index. When studying data structures and algorithms we habitually treat the data as integers, but in real development the objects stored in the original list may be large, while an index node only needs to store the key (the value used for comparison) and a few pointers (used to reach the lower-level index), not the whole object. When the objects are much larger than the index nodes, the extra space taken by the index is negligible.

Fourth, efficient dynamic insertion and deletion

As a dynamic data structure, the skip list supports not only lookup but also dynamic insertion and deletion, both in O(log n) time.

In a plain singly linked list, finding the insertion position requires traversing the nodes one by one. In a skip list, finding a node takes O(log n), so locating the position where new data should be inserted also takes O(log n).

 

Insertion

And what about deletion?

 

Deletion 
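Insertion and deletion can be sketched as below. This is a minimal illustration, not Redis's implementation: the caller passes the node's level explicitly so the example stays deterministic, `MAX_LEVEL = 16` is an assumed cap, and all class and method names are invented for the example.

```python
class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # forward[i] = next node on level i

class SkipList:
    MAX_LEVEL = 16                      # assumed cap on the number of levels

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)

    def _predecessors(self, key):
        """For every level, the last node whose key is smaller than key."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.MAX_LEVEL - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def insert(self, key, level):
        """Splice key into levels 0 .. level-1 (the caller picks the level)."""
        update = self._predecessors(key)
        node = Node(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key):
        """Unlink key from every level it appears on."""
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):
            update[i].forward[i] = target.forward[i]
        return True

    def keys(self):
        node, out = self.head.forward[0], []
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out

sl = SkipList()
for key, level in [(1, 3), (4, 1), (7, 2), (13, 1), (16, 2), (17, 1)]:
    sl.insert(key, level)
sl.delete(7)
print(sl.keys())
```

Both operations reuse the same O(log n) search for predecessors. The pointer updates afterwards touch only the levels the node occupies.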

Fifth, dynamic updating of the skip list index

If we keep inserting data into the skip list without updating the index, a great many nodes may pile up between two index nodes. In the extreme case, the skip list degenerates into a singly linked list.

 

As a dynamic data structure, the skip list needs some way to keep the index size in balance with the size of the original list: as the list gains nodes, index nodes should be added accordingly, to avoid degraded complexity and degraded search, insert and delete performance.

The skip list maintains this balance by means of a random function.

When inserting data into the skip list, we can choose to add it to some of the index levels as well. For example, if the random function generates the value K, we add the node to the index levels from the first through the K-th.

 

The random function keeps the index size in balance with the data size, so performance does not degrade excessively.
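One common way to write such a random function is repeated coin flipping, sketched below. The promotion probability `p=0.5` mirrors the one-in-two index from the earlier sections, and `max_level=16` is an assumed cap. Real implementations choose their own values.

```python
import random
from collections import Counter

def random_level(p=0.5, max_level=16, rng=random):
    """Start at level 1 and keep promoting with probability p."""
    level = 1
    while level < max_level and rng.random() < p:
        level += 1
    return level

# With p = 0.5, about half of the nodes stay on level 1, a quarter reach
# level 2, an eighth reach level 3, and so on.
rng = random.Random(42)
counts = Counter(random_level(rng=rng) for _ in range(10000))
for level in sorted(counts):
    print(level, counts[level])
```

On average this reproduces the halving of nodes from one level to the next without ever rebuilding the index.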

Implementing a skip list is somewhat involved, and the implementation is not the focus here. The main goal is to learn the idea.

Sixth, the answer

Redis sorted sets are implemented with a skip list. Strictly speaking, a hash table is used as well. If you look at the Redis documentation, you will find that the core operations a sorted set supports are the following:

  • Insert an item
  • Delete an item
  • Look up an item
  • Find items by interval (for example, items in [100, 356])
  • Iterate over the items in sorted order
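How a hash table and an ordered structure can share the work for the operations listed above is sketched below. This is an illustration only: a sorted Python list maintained with `bisect` stands in for the skip list, and `TinyZSet` and its methods are hypothetical names, not the Redis API.

```python
from bisect import bisect_left, insort

class TinyZSet:
    """A dict for member -> score, plus (score, member) pairs kept in order.
    The sorted list is only a stand-in for the skip list."""

    def __init__(self):
        self.scores = {}
        self.ordered = []

    def add(self, member, score):
        if member in self.scores:
            self.remove(member)          # re-adding a member updates its score
        self.scores[member] = score
        insort(self.ordered, (score, member))

    def remove(self, member):
        if member not in self.scores:
            return False
        score = self.scores.pop(member)
        del self.ordered[bisect_left(self.ordered, (score, member))]
        return True

    def score(self, member):
        """Single-member lookup goes through the hash table: expected O(1)."""
        return self.scores.get(member)

    def range_by_score(self, low, high):
        """Locate the first score >= low, then walk forward in order."""
        out = []
        for i in range(bisect_left(self.ordered, (low,)), len(self.ordered)):
            score, member = self.ordered[i]
            if score > high:
                break
            out.append(member)
        return out

    def __iter__(self):
        return (member for _, member in self.ordered)

z = TinyZSet()
for member, score in [("alice", 120), ("bob", 90), ("carol", 356), ("dave", 400)]:
    z.add(member, score)
print(z.range_by_score(100, 356), list(z))
```

The hash table answers "what is this member's score" directly, while the ordered structure answers range and iteration queries.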

Of these, insertion, lookup, deletion and ordered iteration can all be done by a red-black tree with the same time complexity as a skip list. But for finding data by interval, the red-black tree is less efficient than the skip list.

For an interval lookup, the skip list can locate the start of the interval in O(log n) time and then simply walk forward along the original linked list. This is very efficient.
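The interval lookup can be sketched on a small skip list as follows. This is a minimal sketch with assumed parameters (`max_level=8`, promotion probability 0.5, a fixed seed), and `build_skip_list` and `range_query` are names made up for the example.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # forward[i] = next node on level i

def build_skip_list(keys, max_level=8, seed=0):
    """Insert each key with a randomly chosen level; returns the head node."""
    rng = random.Random(seed)
    head = Node(None, max_level)
    for key in keys:
        level = 1
        while level < max_level and rng.random() < 0.5:
            level += 1
        node, new = head, Node(key, level)
        for i in range(max_level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            if i < level:               # splice in on the levels the node owns
                new.forward[i] = node.forward[i]
                node.forward[i] = new
    return head

def range_query(head, low, high):
    """Descend through the index to the last key < low, then walk level 0."""
    node = head
    for i in range(len(head.forward) - 1, -1, -1):
        while node.forward[i] is not None and node.forward[i].key < low:
            node = node.forward[i]
    node = node.forward[0]
    out = []
    while node is not None and node.key <= high:
        out.append(node.key)
        node = node.forward[0]
    return out

head = build_skip_list(range(0, 1000, 7))
print(range_query(head, 100, 356))
```

The index is used once, to find the start of the interval. The rest is a sequential walk whose cost is proportional to the number of items returned.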

Of course, there are other reasons. For example, skip list code is easier to implement, more readable and less error-prone. The skip list is also more flexible: by changing the index-building strategy, it can trade off execution efficiency against memory consumption.

But the skip list cannot completely replace the red-black tree. The red-black tree appeared earlier, and the Map types in many programming languages are implemented with red-black trees, so when writing business code you can use them directly. There is often no ready-made skip list implementation, so a developer who wants one has to implement it.

Reference: Redis in detail (4): Redis's underlying data structures

Reference: Talking about MySQL indexes and the Redis skip list

Reference: An analysis of the principles behind Redis's five data structures


Origin blog.csdn.net/mrlin6688/article/details/104741046