The use of B+ trees in MySQL database indexes

One: A B-tree is a balanced multiway search tree that is widely used in file systems. 
Definition: A B-tree of order m is either an empty tree or an m-ary tree satisfying the following properties: 
(1) Each node in the tree has at most m subtrees. 
(2) If the root node is not a leaf node, there are at least two subtrees. 
(3) All non-leaf nodes except the root node have at least ⌈m/2⌉ subtrees. 
(4) All non-terminal nodes contain the following information: (n, A0, K1, A1, K2, ..., Kn, An) 
where n is the number of keys, Ki (i = 1, 2, ..., n) are the keys with Ki < Ki+1, and Ai (i = 0, 1, ..., n) is a pointer to the root node of a subtree. All keys in the subtree pointed to by Ai-1 are less than Ki, and all keys in the subtree pointed to by Ai are greater than Ki. 
(5) All leaf nodes appear at the same level and carry no information. They can be regarded as external nodes that do not actually exist (the pointers to them are null), so all leaf nodes have the same depth, which is equal to the tree height. 
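As a quick arithmetic check of properties (1), (3) and (4), the short Python sketch below computes how many subtrees and keys a non-root, non-leaf node may have for a given order m. The function name `node_bounds` is made up for this illustration.

```python
import math

def node_bounds(m):
    """Subtree and key limits for a non-root, non-leaf node of an order-m B-tree."""
    min_subtrees = math.ceil(m / 2)   # property (3)
    max_subtrees = m                  # property (1)
    # Property (4): a node with n keys has n + 1 subtrees (A0 ... An),
    # so the key count is always the subtree count minus one.
    return {
        "subtrees": (min_subtrees, max_subtrees),
        "keys": (min_subtrees - 1, max_subtrees - 1),
    }

print(node_bounds(4))
```

For m = 4 this gives between 2 and 4 subtrees, that is, between 1 and 3 keys per node.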
For example, the following figure shows a B-tree of order 4 with a depth of 4: 
 
 
Searching a B-tree is similar to searching a binary search tree. The difference is that each node of a B-tree holds an ordered list of several keys. On reaching a node, the search first looks for the key in that ordered list. If the key is found, the search succeeds; otherwise it continues in the subtree pointed to by the corresponding pointer. Reaching a leaf node means that the key is not in the tree. 
 
The search for the key 47 in the B-tree in the above figure proceeds as follows: 
1) Start at the root and follow the root pointer to node *a. Node *a contains only one key, 35, and 47 > 35, so if 47 exists it must be in the subtree pointed to by the pointer A1. 
2) Follow that pointer to node *c. This node has two keys (43 and 78), and 43 < 47 < 78, so if 47 exists it is in the subtree pointed to by the pointer A1. 
3) Likewise, follow that pointer to node *g. The key 47 is found in this node, and the search succeeds. 
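
The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not code from any database: the `Node` class and the `btree_search` function are hypothetical names, and only the keys 35, 43, 78 and 47 come from the walkthrough. The remaining keys are invented to complete a small tree.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # K1 < K2 < ... < Kn
        self.children = children or []    # A0 ... An; empty means all pointers are null

def btree_search(root, target):
    """Return (found, keys of every node visited on the way)."""
    path = []
    node = root
    while node is not None:
        path.append(node.keys)
        i = bisect_left(node.keys, target)        # search the ordered key list
        if i < len(node.keys) and node.keys[i] == target:
            return True, path
        # Not in this node: follow pointer Ai, or stop at a null pointer.
        node = node.children[i] if node.children else None
    return False, path

# Keys other than 35, 43, 78 and 47 are invented for illustration.
g = Node([47, 53, 64])
c = Node([43, 78], [Node([39]), g, Node([99])])
b = Node([18], [Node([11]), Node([27])])
a = Node([35], [b, c])

print(btree_search(a, 47))   # visits *a, then *c, then *g
print(btree_search(a, 50))   # same path, but ends at a null pointer
```

The null child pointers of the lowest nodes play the role of the leaf nodes in property (5): reaching one means the key is absent.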
 
Two: The B+ tree is a variant of the B-tree that was developed to meet the needs of file systems. 
The difference between a B+ tree of order m and a B-tree of order m is: 
(1) A node with n subtrees contains n keys (in a B-tree, a node with n subtrees contains only n - 1 keys). 
(2) The leaf nodes together contain all of the keys, along with pointers to the records holding those keys, and the leaf nodes are linked into a sequential list in key order. 
(3) Random search, insertion and deletion on a B+ tree work much as they do on a B-tree. The one difference in searching is that when a key in a non-leaf node equals the given value, the search does not stop but continues down to a leaf node. As a result, every search in a B+ tree, successful or not, follows a path from the root node to a leaf node. 
 
 
 
 
The figure shows a B+ tree of order 3. A B+ tree usually has two head pointers: one points to the root node, and the other points to the leaf node holding the smallest key. Two kinds of search are therefore possible: a sequential search that starts from the smallest key, and a random search that starts from the root node. 
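
The two kinds of search can be sketched in Python as follows. This is a simplified illustration, not a full B+ tree (there is no insertion or deletion), and it assumes the variant described above in which a node with n subtrees holds n keys, taking each key to be the largest key of the corresponding subtree. The class names, function names and all key values are invented.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted keys
        self.records = records    # one record (or record pointer) per key
        self.next = None          # link to the next leaf in key order

class Internal:
    def __init__(self, children):
        self.children = children
        # n subtrees, n keys: each key is the largest key in that subtree
        self.keys = [child.keys[-1] for child in children]

def find(root, key):
    """Random search: always walks from the root down to a leaf."""
    node = root
    while isinstance(node, Internal):
        # First subtree whose largest key is >= key (clamped to the last one).
        i = min(bisect_left(node.keys, key), len(node.keys) - 1)
        node = node.children[i]   # descend even when node.keys[i] == key
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None

def scan(first_leaf):
    """Sequential search: follow the leaf links from the smallest key."""
    leaf = first_leaf
    while leaf is not None:
        yield from zip(leaf.keys, leaf.records)
        leaf = leaf.next

# Build a small example tree by hand (invented keys).
leaves = [Leaf(ks, [f"row-{k}" for k in ks])
          for ks in ([10, 15], [21, 37], [44, 51], [59, 63, 72])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([Internal(leaves[:2]), Internal(leaves[2:])])
first_leaf = leaves[0]            # the second head pointer

print(find(root, 37))
print([k for k, _ in scan(first_leaf)])
```

Here `root` and `first_leaf` correspond to the two head pointers: `find` uses the first, `scan` uses the second.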
 
Three: A typical application of the B+ tree is the database index. 
Database indexes are often implemented with B+ trees. They fall into two types: clustered indexes and non-clustered indexes. 
The InnoDB engine uses a clustered index. In a clustered index, the data field of each leaf node of the B+ tree stores the complete data record. 
With a clustered index, indexes are further divided into the primary key index and auxiliary indexes (also called secondary indexes) according to the key used in the B+ tree. An index that uses the primary key attribute as the key of the B+ tree is the primary key index, and an index that uses a non-primary-key attribute as the key is an auxiliary index. As shown below: 
 
 
The above picture is the InnoDB primary key index 
 
 
The above picture is the InnoDB auxiliary index 
 
The clustered index makes lookups by primary key very efficient, but a secondary index lookup has to search two indexes: first the secondary index is searched to obtain the primary key, and then the primary key is used to fetch the record from the primary key index. The data structure above also shows why shorter index keys perform better: more keys fit into each node, and because every auxiliary index leaf stores a copy of the primary key, a long primary key enlarges every auxiliary index. 
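
The two-step lookup can be illustrated with a small Python sketch. Plain dictionaries stand in for the two B+ trees here, so this only models what the leaf nodes store, not how InnoDB is implemented. The table contents, the column names and the function names are invented, and the auxiliary key is assumed to be unique to keep the example short.

```python
# Primary key index: the leaf data field holds the complete row.
innodb_primary = {
    15: {"id": 15, "name": "Bob", "age": 34},
    18: {"id": 18, "name": "Alice", "age": 77},
    20: {"id": 20, "name": "Jim", "age": 5},
}

# Auxiliary index on "name": the leaf data field holds only the primary key.
innodb_name_idx = {"Alice": 18, "Bob": 15, "Jim": 20}

def find_by_id(pk):
    """One search: the primary key index already contains the row."""
    return innodb_primary.get(pk)

def find_by_name(name):
    """Two searches: auxiliary index first, then the primary key index."""
    pk = innodb_name_idx.get(name)      # search 1: name -> primary key
    if pk is None:
        return None
    return innodb_primary[pk]           # search 2: primary key -> row

print(find_by_id(18))
print(find_by_name("Bob"))
```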
 
 
The MyISAM engine uses non-clustered indexes. In a non-clustered index, the data field of each leaf node of the B+ tree stores the address of the data record. Depending on the key used in the B+ tree, MyISAM indexes are likewise divided into the primary key index and auxiliary indexes, as the figures show: 
 
 
The above picture is the MyISAM primary key index 
 
 
The above picture is the MyISAM auxiliary index 
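
The same idea for MyISAM can be sketched in Python. Again, dictionaries stand in for the B+ trees and a list stands in for the separately stored rows, with the list position playing the role of the record address. All names and values are invented for illustration.

```python
# Rows are stored on their own, outside any index.
data_file = [
    {"id": 20, "name": "Jim", "age": 5},      # address 0
    {"id": 15, "name": "Bob", "age": 34},     # address 1
    {"id": 18, "name": "Alice", "age": 77},   # address 2
]

# Both kinds of index map a key to the address of the row.
myisam_primary = {row["id"]: addr for addr, row in enumerate(data_file)}
myisam_name_idx = {row["name"]: addr for addr, row in enumerate(data_file)}

def fetch(index, key):
    """One index search, then a read of the row at the returned address."""
    addr = index.get(key)
    return None if addr is None else data_file[addr]

print(fetch(myisam_primary, 18))
print(fetch(myisam_name_idx, "Bob"))
```

In this model the primary key index and the auxiliary index have the same shape, which is why neither needs a second index search.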
 
Differences between InnoDB indexes and MyISAM indexes: 
1) InnoDB uses a clustered index: the data field of each leaf node of the B+ tree stores the complete data record. MyISAM uses non-clustered indexes: the data field of each leaf node of the B+ tree stores the address of the data record. 
2) In an InnoDB auxiliary index, the data field of each leaf node stores the corresponding primary key, so a lookup through an InnoDB auxiliary index requires two index searches. In MyISAM, both the primary key index and the auxiliary indexes need only one index search. 
3) With a clustered index, the complete row is stored in the data field of the primary key index and is not stored anywhere else, which is why the data field of an auxiliary index stores the primary key value. With a non-clustered index, the data fields of both the primary key index and the auxiliary indexes store the address of the row, so separate space is required to store the rows themselves. As shown below: 
 
 
 
B-Tree indexes are suitable for full key-value, key-range and key-prefix lookups, where a key-prefix lookup works only on the leftmost prefix of the index. The indexes described earlier are useful for the following types of queries: 
1. Full value match: match all columns in the index. 
2. Leftmost prefix match: use only the leftmost column (or columns) of the index. 
3. Column prefix match: match only the beginning of the value in the leftmost column of the index. 
4. Range match: match a range of values on the leftmost column of the index. 
5. Exact match on the first few columns and a range match on the next column. 
 
To summarize, B-Tree indexes have the following limitations: 
1. The index cannot be used if the search does not start from the leftmost column of the index. 
2. Columns in the index cannot be skipped in a query. 
3. If the query has a range condition on a column, the index cannot be used to optimize lookups on any column to the right of it. 
The order of the indexed columns is therefore very important. When optimizing performance, it may be necessary to create indexes on the same columns in different orders to meet different needs. 
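
The three limitations can be expressed as one small rule, sketched here in Python. This is a deliberately simplified model of the leftmost-prefix rule as stated above, not a description of what the MySQL optimizer actually does. The function name `usable_columns` and the column names are invented for illustration.

```python
def usable_columns(index_columns, conditions):
    """Return the index columns a lookup can use.

    index_columns: column names in index order.
    conditions: dict mapping a column name to "eq" (exact match) or "range".
    """
    used = []
    for col in index_columns:
        kind = conditions.get(col)
        if kind is None:
            break                 # limitations 1 and 2: no gap allowed, starting from the left
        used.append(col)
        if kind == "range":
            break                 # limitation 3: nothing to the right of a range is used
    return used

index = ["last_name", "first_name", "birth_date"]

print(usable_columns(index, {"last_name": "eq", "first_name": "eq", "birth_date": "eq"}))
print(usable_columns(index, {"first_name": "eq"}))
print(usable_columns(index, {"last_name": "eq", "birth_date": "eq"}))
print(usable_columns(index, {"last_name": "range", "first_name": "eq"}))
```

Under this model, only the first query uses all three columns. The second uses none, and the third and fourth use only `last_name`, which is why an index on the same columns in a different order may serve those queries better.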

