MySQL - Why does the index use a B+ tree instead of a B tree

The location of the index on the computer

Generally speaking, an index is too large to be held entirely in memory, so it is stored on disk as a file, and index retrieval requires disk I/O operations. Whether a data structure makes a good index therefore depends mainly on the asymptotic complexity of its disk I/O during a query, that is, on how the number of disk reads grows with the amount of data. A good index keeps the number of disk I/O operations as small as possible.

Why use B+ tree

1. A B tree is mainly suited to random (point) retrieval, while a B+ tree supports both random retrieval and sequential retrieval;
2. B+ tree has higher space utilization

Because the internal nodes of the B+ tree (non-leaf nodes, also called index nodes) store only index values and no data, one B+ tree node can hold more index values than a B tree node of the same size. This makes the whole B+ tree shorter, so a lookup needs fewer I/Os and the cost of disk reads and writes is lower. The number of I/O reads and writes is the biggest factor affecting the efficiency of index retrieval;
3. B+ tree query efficiency is more stable

The advantage of the B+ tree is most obvious in sequential retrieval, but it also shows in random retrieval. Since all the data fields of the B+ tree (the part of a node that stores the data elements) are in the leaf nodes, any keyword search must follow a path from the root node to a leaf node. The search paths of all keywords have the same length, so the query efficiency of every keyword is basically the same, and the time complexity is fixed at O(log n). A B tree search, by contrast, may end at a non-leaf node: the closer a record is to the root node, the shorter its search time. Its performance is equivalent to doing a binary search over the complete set of keywords, but the query time is not fixed. It depends on the position of the key in the tree, and the best case is O(1);
4. B+ tree range query performance is better

Because the leaf nodes of the B+ tree are connected by pointers in ascending key order (a linked list), range access becomes much easier: traversing the leaf nodes alone visits every record in the tree. In a B tree, the leaf nodes are independent of each other and each node stores its keys (index values) together with the data, so a range query cannot simply walk the leaves. It needs an in-order traversal that moves up and down between levels;

[According to the principle of spatial locality: if a location in memory is accessed, locations near it are likely to be accessed soon as well]

If the node with key 50 is accessed, the nodes with keys 55, 60, and 62 may also be accessed soon, and this data can be read into memory in advance using the disk read-ahead (prefetch) principle, reducing the number of disk I/Os. For the same reason the B+ tree handles range queries well, for example a query for the nodes whose key values are between 50 and 70.

5. B+ tree is more efficient when inserting and deleting records (nodes)

Because the leaf nodes of the B+ tree contain all keywords and are stored in an ordered linked list structure.

Note: How many rows of data can an InnoDB B+ tree store? About 20 million, which is the commonly quoted estimate for a tree that is three levels high.
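The figure of about 20 million can be reproduced with a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: InnoDB's default 16 KB page, an 8-byte BIGINT primary key, a 6-byte child-page pointer, and an average row size of 1 KB. It ignores page headers and fill factor, so real tables will differ. The function name `estimate_rows` is made up for this illustration.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a pointer to a child page
ROW_BYTES = 1024        # assumption: average row size of 1 KB


def estimate_rows(levels: int) -> int:
    """Rough row capacity of a B+ tree with the given number of levels."""
    # Each internal page holds (key, pointer) pairs only.
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
    # Each leaf page holds full rows.
    rows_per_leaf = PAGE_SIZE // ROW_BYTES
    # levels - 1 internal levels, then one leaf level.
    return fanout ** (levels - 1) * rows_per_leaf


print(estimate_rows(2))  # 18720
print(estimate_rows(3))  # 21902400
```

With these assumptions each internal page points to 1170 children and each leaf holds 16 rows, so three levels give 1170 * 1170 * 16 rows, roughly 20 million.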

B tree [the "B" is commonly read as "balanced": a multi-way balanced search tree in which all leaf nodes are at the same depth]

B-tree has the following characteristics:

1. Each node stores both keys and data, all nodes together form the tree, and the child pointers of leaf nodes are null;

2. Every keyword appears in exactly one node;

3. The search may end at a non-leaf node (in the best case the data is found in O(1));

4. A search over the complete collection of keywords performs close to a binary search.
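The third characteristic, a search that can stop before reaching a leaf, can be illustrated with a small sketch. This is a simplified in-memory model, not how a storage engine lays out pages; `BTreeNode`, `btree_search`, and the sample keys are all hypothetical names and values chosen for the example. The function also returns how many nodes were visited, to show that the cost depends on where the key sits.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, values, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.values = values              # values[i] is the data for keys[i]
        self.children = children or []    # empty list for a leaf node


def btree_search(node, key, visited=1):
    """Return (value, nodes_visited); value is None if the key is absent."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], visited    # may stop at a non-leaf node
    if not node.children:
        return None, visited              # reached a leaf without a match
    return btree_search(node.children[i], key, visited + 1)


btree_root = BTreeNode([50], ["row50"], [
    BTreeNode([20, 30], ["row20", "row30"]),
    BTreeNode([60, 70], ["row60", "row70"]),
])

print(btree_search(btree_root, 50))  # ('row50', 1)
print(btree_search(btree_root, 60))  # ('row60', 2)
print(btree_search(btree_root, 40))  # (None, 2)
```

Key 50 is found in the root after visiting one node, while key 60 needs two, which is the variable query cost described above.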

B+ tree [an improved version of the B tree: internal (non-leaf) nodes are used only as an index, leaf nodes contain all key values of the tree, and leaf nodes store no child pointers]

B+ tree has the following characteristics:

1. Only leaf nodes store data, and together they contain all key values of the tree; leaf nodes store no child pointers. (Non-leaf nodes store only index values, not the actual data);

2. Sequential access pointers are added, that is, each leaf node gets a pointer to the adjacent leaf node. This tree, the B+ tree, is the preferred data structure for implementing indexes in database systems.
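The two characteristics above can be sketched in a few lines. This is a minimal in-memory model under assumptions of my own: the class names `Leaf` and `Internal`, the helper names, and the convention that a separator key k sends keys greater than or equal to k to the right child are all illustrative, not taken from InnoDB. The sample keys reuse the 50, 55, 60, 62 example from earlier in the article.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, rows):
        self.keys = keys     # sorted keys
        self.rows = rows     # rows[i] is the data for keys[i]
        self.next = None     # sequential access pointer to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys only, no row data
        self.children = children    # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Every lookup descends all the way from the root to a leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Collect (key, row) pairs with low <= key <= high via the leaf list."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k, row in zip(leaf.keys, leaf.rows):
            if k > high:
                return result
            if k >= low:
                result.append((k, row))
        leaf = leaf.next            # follow the link, no return to the root
    return result


leaves = [Leaf(ks, [f"row{k}" for k in ks])
          for ks in ([10, 20], [50, 55], [60, 62], [70, 80])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
bplus_root = Internal([50, 60, 70], leaves)

print([k for k, _ in range_scan(bplus_root, 50, 70)])  # [50, 55, 60, 62, 70]
```

The range scan descends the tree once to locate the first leaf and then only follows `next` pointers, which is why a range query costs one root-to-leaf lookup plus a sequential walk.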

The difference

1) The data of the B+ tree is stored only in the leaf nodes, while every node of the B tree stores both keys and data

The non-leaf nodes of the B+ tree do not store data, so a node can hold more index values, which makes the tree shorter (smaller in height), so fewer I/O operations are needed.

2) All leaf nodes of the B+ tree form an ordered linked list, so all records can be traversed in key order

Since the data is arranged in order and linked together, range searches and sequential scans are convenient. The B tree requires an in-order traversal that moves between levels, and adjacent elements may not be adjacent in storage, so its cache hit rate is not as good as that of the B+ tree.
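To make the height argument in difference 1) concrete, the following sketch compares how many levels are needed for the same number of keys at two different fan-outs. It uses a deliberately rough model in which a tree of h levels reaches about fanout ** h keys. The fan-out values are hypothetical: 1170 stands for a 16 KB node holding only 14-byte key and pointer pairs, and 16 stands for a 16 KB node whose entries each carry about 1 KB of row data. `levels_needed` is a name invented for this example.

```python
def levels_needed(total_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout ** h >= total_keys."""
    levels, capacity = 1, fanout
    while capacity < total_keys:
        capacity *= fanout
        levels += 1
    return levels


TOTAL_KEYS = 20_000_000

# Nodes that store only keys and pointers (B+ tree internal nodes).
print(levels_needed(TOTAL_KEYS, 1170))  # 3

# Nodes that store the row data next to every key (B tree nodes).
print(levels_needed(TOTAL_KEYS, 16))    # 7
```

Under this model each level costs roughly one disk read, so the higher fan-out cuts a lookup from about seven reads to about three.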


Origin blog.csdn.net/qq_23375733/article/details/128100863