MySQL (InnoDB analysis): InnoDB index overview, data structure and algorithm overview (binary search, binary search tree, balanced binary tree, B+ tree)

One, Index overview

  • Indexing is an important aspect of application design and development. With too many indexes, the performance of the application as a whole may suffer, because every index has to be maintained on writes; with too few indexes, query performance suffers. The goal is to find a balance between the two.

Two, InnoDB storage engine index overview

  • InnoDB supports the following common indexes:
    • B+ tree index
    • Full-text index
    • Hash index
  • As mentioned earlier, the hash index supported by InnoDB is adaptive: the InnoDB storage engine automatically builds hash indexes for a table based on how the table is used. You cannot manually control whether a hash index is built for a particular table.
  • The B+ tree index is the index in the traditional sense and is currently the most common and effective index for searching in relational databases. Its structure is similar to a binary tree, and it can quickly find data by key value.

  • Note: a B+ tree index cannot find the specific row for a given key value by itself. It can only find the page that contains the row. The database then reads that page into memory, searches within the page, and finally obtains the row it is looking for.

Three, binary search method

  • Binary search, also called half-interval search, is used to find a record in an ordered array of records. The records must first be sorted (in increasing or decreasing order). The search then jumps instead of scanning: it compares the target with the element at the midpoint of the current sequence. If they are equal, the record is found; if the target is smaller than the midpoint element, the search continues in the left half; otherwise it continues in the right half. Each comparison halves the search interval.

Example

  • Suppose we have 10 sorted numbers, 5, 10, 19, 21, 31, 37, 42, 48, 50, 52, and need to find the record 48 among them. The search process is as follows:

  • As the figure above shows, binary search finds 48 in 3 comparisons, whereas a sequential search needs 8. To find the record 5, on the other hand, a sequential search needs only 1 comparison, while binary search needs 4
  • For these 10 numbers, the average number of comparisons for sequential search is (1+2+3+4+5+6+7+8+9+10)/10 = 5.5, and for binary search it is (4+3+2+4+3+1+4+3+2+3)/10 = 2.9. In the worst case, sequential search needs 10 comparisons and binary search needs 4
  • Binary search is widely used. As mentioned earlier, the slots in each page's Page Directory are stored in primary key order, so a specific record within a page is located by a binary search over the Page Directory
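The comparison counts above can be checked with a short Python sketch. Each element examined counts as one comparison. Using the *upper* midpoint `(low + high + 1) // 2` is an assumption made so that the first probe is 37 and the per-element counts match the ones listed above; a lower midpoint is equally valid but visits elements in a different order.

```python
def binary_search(records, target):
    """Search a sorted list; return (index or -1, number of comparisons)."""
    low, high = 0, len(records) - 1
    probes = 0
    while low <= high:
        mid = (low + high + 1) // 2    # upper midpoint, matching the counts above
        probes += 1
        if records[mid] == target:
            return mid, probes
        if target < records[mid]:
            high = mid - 1             # keep only the left half
        else:
            low = mid + 1              # keep only the right half
    return -1, probes


def sequential_search(records, target):
    """Scan from the front; return (index or -1, number of comparisons)."""
    for i, value in enumerate(records):
        if value == target:
            return i, i + 1
    return -1, len(records)


records = [5, 10, 19, 21, 31, 37, 42, 48, 50, 52]
print(binary_search(records, 48))       # (7, 3)
print(sequential_search(records, 48))   # (7, 8)
print(binary_search(records, 5))        # (0, 4)
print(sum(binary_search(records, r)[1] for r in records) / len(records))      # 2.9
print(sum(sequential_search(records, r)[1] for r in records) / len(records))  # 5.5
```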

 

Four, binary search tree

  • Definition:
    • A binary search tree is a binary tree
    • Every key in a node's left subtree is less than the key of that node
    • Every key in a node's right subtree is greater than the key of that node
  • For example, the following is a binary search tree

Search cost

  • Finding node 5:
    • Start at the root 6, move to the left subtree (3), then to its right subtree (5), where the node is found: 3 comparisons
    • Scanning the keys in in-order traversal order (2, 3, 5, 6, 7, 8) also takes 3 comparisons
  • Finding node 8:
    • Visit 6, then 7, then 8, where the node is found: 3 comparisons
    • An in-order scan takes 6 comparisons
  • To sum up:
    • The average number of comparisons in the binary search tree is (3+3+3+2+2+1)/6 ≈ 2.33
    • The average number of comparisons for an in-order scan is (1+2+3+4+5+6)/6 = 3.5
    • So, on average, searching the binary search tree is faster
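The averages above can be reproduced with a minimal Python binary search tree. Inserting 6, 3, 7, 2, 5, 8 in that order is one way (an assumption, since the figure is not reproduced) to build the tree the walk-through describes: root 6, left child 3 with children 2 and 5, right child 7 with right child 8.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into a binary search tree and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)    # smaller keys go to the left subtree
    else:
        root.right = insert(root.right, key)  # larger keys go to the right subtree
    return root


def search_cost(root, key):
    """Number of nodes compared before key is found (0 if it is absent)."""
    cost, node = 0, root
    while node is not None:
        cost += 1
        if key == node.key:
            return cost
        node = node.left if key < node.key else node.right
    return 0


def inorder(node):
    """Keys in sorted order: left subtree, node, right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


root = None
for k in [6, 3, 7, 2, 5, 8]:
    root = insert(root, k)

keys = inorder(root)                                # [2, 3, 5, 6, 7, 8]
tree_costs = [search_cost(root, k) for k in keys]   # [3, 2, 3, 1, 2, 3]
scan_costs = [keys.index(k) + 1 for k in keys]      # [1, 2, 3, 4, 5, 6]
print(sum(tree_costs) / len(keys))   # about 2.33
print(sum(scan_costs) / len(keys))   # 3.5
```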

Five, balanced binary tree

  • The balanced binary tree is an improvement on the binary search tree. Take the nodes 2, 3, 5, 6, 7, 8 again: a binary search tree built from them can also have the shape shown in the figure below, where the average number of comparisons is (1+2+3+4+5+5)/6 ≈ 3.33. Its search efficiency is therefore relatively low

  • Definition of a balanced binary tree:
    • It is a binary search tree
    • In addition, for every node, the heights of its left and right subtrees differ by at most 1. If the difference exceeds 1, the tree is out of balance
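The definition can be checked directly in Python. The sketch builds two binary search trees from the same keys: inserting 2, 3, 5, 7, 6, 8 is one order (assumed, since the figure is not reproduced) that gives the node depths 1, 2, 3, 4, 5, 5 used in the average above, and 6, 3, 7, 2, 5, 8 gives the tree from section Four. The total search cost is the sum of node depths.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain binary search tree insert (no rebalancing)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if, at every node, the subtree heights differ by at most 1."""
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_balanced(node.left) and is_balanced(node.right))


def total_search_cost(node, depth=1):
    """Comparisons needed to find every key once = sum of node depths."""
    if node is None:
        return 0
    return (depth + total_search_cost(node.left, depth + 1)
            + total_search_cost(node.right, depth + 1))


skewed = build([2, 3, 5, 7, 6, 8])
balanced = build([6, 3, 7, 2, 5, 8])
print(is_balanced(skewed), total_search_cost(skewed) / 6)      # False, about 3.33
print(is_balanced(balanced), total_search_cost(balanced) / 6)  # True, about 2.33
```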

Single rotation

  • Inserting node 9 into the balanced binary search tree shown in the figure above breaks the balance, because the heights of the left and right subtrees of node 7 now differ by 2

  • A single rotation is therefore needed to restore balance

Double rotation

  • Inserting node 3 into the balanced binary search tree shown in the figure above breaks the balance, because the heights of the left and right subtrees of node 2 now differ by 2

  • A double rotation is therefore needed to restore balance
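Both repairs can be sketched as a standard AVL insertion in Python. The example trees are assumptions, because the figures are not reproduced: the first reuses the section Four tree (6, 3, 7, 2, 5, 8) and inserts 9, which unbalances node 7 as described; the second (2, 5, 3) is a made-up minimal case where inserting 3 unbalances node 2 and needs a double rotation.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1               # height of the subtree rooted here


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    """Left subtree height minus right subtree height."""
    return height(node.left) - height(node.right)


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x       # y's left subtree moves under x
    update(x)
    update(y)
    return y


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y      # x's right subtree moves under y
    update(y)
    update(x)
    return x


def insert(node, key):
    """AVL insert: plain BST insert, then rebalance on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    b = balance(node)
    if b > 1:                                  # left side is 2 levels taller
        if balance(node.left) < 0:             # left-right shape: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)              # left-left shape: single rotation
    if b < -1:                                 # right side is 2 levels taller
        if balance(node.right) > 0:            # right-left shape: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)               # right-right shape: single rotation
    return node


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


single = build([6, 3, 7, 2, 5, 8, 9])   # 9 unbalances node 7 -> one left rotation
double = build([2, 5, 3])               # 3 unbalances node 2 -> right, then left rotation
print(single.right.key, single.right.left.key, single.right.right.key)  # 8 7 9
print(double.key, double.left.key, double.right.key)                    # 3 2 5
```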

Six, B+ tree

  • Like the binary tree and the balanced binary tree, the B+ tree is a classic data structure. It evolved from the B tree and the indexed sequential access method (ISAM, the data structure MyISAM was originally based on), but in practice the plain B tree is hardly used any more
  • A brief description of the B+ tree: it is a balanced search tree designed for disks or other direct-access secondary storage devices, and its leaf nodes are linked together by pointers. Consider a B+ tree of height 2 in which each page holds 4 records and the fan-out is 5, as shown in the figure below
  • As the figure shows, all records are stored in the leaf nodes, in sorted order. Traversing the leaf nodes sequentially, starting from the leftmost one, yields all key values in order: 5, 10, 15, 20, 25, 30, 50, 55, 60, 65, 75, 80, 85, 90
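The structure can be sketched in Python. The page layout (an Index Page holding 25, 50, 75 above four Leaf Pages) is an assumption reconstructed from the keys mentioned in this section, since the figure is not reproduced, and all names are hypothetical. `find_leaf` shows that the tree only leads to a page; the record is then found by searching inside that page, as described in section Two.

```python
import bisect


class Page:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys stored in this page
        self.children = children    # None for a Leaf Page, child pages for an Index Page
        self.next = None            # forward pointer to the next Leaf Page


def find_leaf(root, key):
    """Walk from the root down to the Leaf Page whose key range covers key."""
    page = root
    while page.children is not None:
        page = page.children[bisect.bisect_right(page.keys, key)]
    return page


def contains(root, key):
    leaf = find_leaf(root, key)                # the tree only locates the page...
    i = bisect.bisect_left(leaf.keys, key)     # ...then the page is searched in memory
    return i < len(leaf.keys) and leaf.keys[i] == key


def scan(root):
    """Return every key by following the leaf pointers from the leftmost leaf."""
    page = root
    while page.children is not None:
        page = page.children[0]
    keys = []
    while page is not None:
        keys.extend(page.keys)
        page = page.next
    return keys


leaves = [Page([5, 10, 15, 20]), Page([25, 30]),
          Page([50, 55, 60, 65]), Page([75, 80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right               # link the Leaf Pages in key order
root = Page([25, 50, 75], leaves)   # height 2: one Index Page above the leaves

print(scan(root))                               # 5, 10, ..., 85, 90
print(contains(root, 60), contains(root, 61))   # True False
```

Only forward leaf pointers are kept here for brevity; the text notes that the real leaf level is a doubly linked list.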

Insert operation of B+ tree

  • An insertion into a B+ tree must keep the records in the leaf nodes sorted. Three cases have to be considered, and each may lead to a different insertion algorithm, as shown below:

Inserts where Leaf Page is not full and Index Page is not full

  • In the B+ tree above, if the user inserts the key value 28, neither the Leaf Page nor the Index Page is full, so the record can be inserted directly, giving the result shown in the figure below.

Inserts where Leaf Page is full and Index Page is not full

  • Continuing from the figure above, we insert the key value 70. This time the Leaf Page is full but the Index Page is not. After the insert, the Leaf Page would hold 50, 55, 60, 65, 70; its middle record is 60, so the leaf page is split at 60
  • The insert then follows these rules:
    • Split the Leaf Page
    • Put the middle record (60) into the Index Page
    • Records smaller than the middle record (50, 55) go to the left page
    • Records greater than or equal to the middle record (60, 65, 70) go to the right page
  • The final result is shown in the figure below (note: the figure does not draw the doubly linked list pointers between the leaf nodes, but, as in the figure above, they do exist):
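The split of a single Leaf Page can be sketched in Python. This is a simplified model where a page is just a sorted list of keys; the function name `split_leaf` is hypothetical.

```python
import bisect


def split_leaf(records, key):
    """Insert key into a full Leaf Page and split it at the middle record.

    Returns (left_page, separator, right_page). The separator is the middle
    record: it is copied into the Index Page and also stays in the right leaf.
    """
    page = list(records)
    bisect.insort(page, key)        # keep the page sorted
    mid = len(page) // 2            # 5 records -> the 3rd one is the middle
    left, right = page[:mid], page[mid:]
    return left, right[0], right


print(split_leaf([50, 55, 60, 65], 70))   # ([50, 55], 60, [60, 65, 70])
```

Unlike a split in an Index Page, the middle record is not removed from the leaf level, because in a B+ tree every record must remain in a leaf.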

Inserts where Leaf Page is full and Index Page is full

  • Continuing from the figure above, we insert the key value 95. Now both the Leaf Page and the Index Page are full, so two splits are required. The steps are as follows:
    • Split the Leaf Page: records smaller than the middle record go to the left page, records greater than or equal to it go to the right page, and the middle record is inserted into the Index Page (here 75, 80, 85, 90, 95 splits at 85 into 75, 80 and 85, 90, 95)
    • Split the Index Page: keys smaller than the middle key go to the left page, and keys larger than it go to the right page (here 25, 50, 60, 75, 85 splits at 60 into 25, 50 and 75, 85)
    • Put the middle key of the Index Page (60) into the index page of the level above
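The three insertion cases can be combined into one recursive insert. This is a simplified in-memory sketch (no disk pages and no rotation), with `MAX_KEYS = 4` to match the example's 4 records per page and fan-out 5; the starting layout is reconstructed from the keys mentioned in this section, and all names are hypothetical. Inserting 28, 70 and 95 in turn exercises the three cases.

```python
import bisect

MAX_KEYS = 4   # at most 4 records (or separator keys) per page, i.e. fan-out 5


class Page:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys in this page
        self.children = children    # None for a Leaf Page, child pages for an Index Page
        self.next = None            # pointer to the next Leaf Page


def _insert(page, key):
    """Insert key below page; return (separator, new_right_page) if page split."""
    if page.children is None:                       # Leaf Page
        bisect.insort(page.keys, key)
        if len(page.keys) <= MAX_KEYS:
            return None                             # the Leaf Page had room
        mid = len(page.keys) // 2
        right = Page(page.keys[mid:])               # records >= middle go right
        page.keys = page.keys[:mid]                 # records < middle stay left
        right.next, page.next = page.next, right    # keep the leaf chain linked
        return right.keys[0], right                 # middle record goes to the Index Page
    i = bisect.bisect_right(page.keys, key)         # child whose range covers key
    split = _insert(page.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    page.keys.insert(i, sep)
    page.children.insert(i + 1, new_child)
    if len(page.keys) <= MAX_KEYS:
        return None                                 # the Index Page had room
    mid = len(page.keys) // 2
    up = page.keys[mid]                             # middle key moves up a level
    right = Page(page.keys[mid + 1:], page.children[mid + 1:])  # keys > middle
    page.keys = page.keys[:mid]                     # keys < middle
    page.children = page.children[:mid + 1]
    return up, right


def insert(root, key):
    """Insert key and return the root, which is new if the old root split."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Page([sep], [root, right])               # the tree grows one level


def scan(root):
    """All keys in order, read by following the Leaf Page pointers."""
    page = root
    while page.children is not None:
        page = page.children[0]
    keys = []
    while page is not None:
        keys.extend(page.keys)
        page = page.next
    return keys


leaves = [Page([5, 10, 15, 20]), Page([25, 30]),
          Page([50, 55, 60, 65]), Page([75, 80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Page([25, 50, 75], leaves)

root = insert(root, 28)   # case 1: Leaf Page not full
root = insert(root, 70)   # case 2: Leaf Page full, Index Page not full
root = insert(root, 95)   # case 3: both full, two splits
print(root.keys)                          # [60]
print([c.keys for c in root.children])    # [[25, 50], [75, 85]]
print(scan(root))
```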

Rotation operation

  • As the examples above show, however the B+ tree changes, it always stays balanced, because it keeps splitting pages. However, B+ trees are mainly used on disk, and a page split means disk operations, so page splits should be avoided whenever possible. For this reason, the B+ tree also provides a rotation operation similar to that of balanced binary trees.
  • Principle:

    • Rotation happens when a Leaf Page is full but its left or right sibling is not
    • In that case the B+ tree does not split the page right away; instead, it moves records into a sibling page
    • Normally the left sibling is checked first
  • Take the tree from the "Leaf Page not full, Index Page not full" insert case above, shown in the following figure:

  • If we now insert the key value 70, the B+ tree does not split the leaf node right away; it performs a rotation instead, with the result shown in the figure below.
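A rotation on a Leaf Page can be sketched as follows. The sketch assumes the left sibling holds 25, 28, 30 (as after inserting 28 above) and the full page holds 50, 55, 60, 65; the function name and the choice to move exactly one record are illustrative. In this sketch, 50 moves to the left sibling, and the Index Page key for the right page changes from 50 to 55.

```python
import bisect

CAPACITY = 4   # records per page, as in the example tree


def insert_with_rotation(left, page, key, capacity=CAPACITY):
    """Insert key into the full Leaf Page `page` without splitting it.

    The smallest record of `page` is moved into its non-full left sibling
    `left`. Both lists are modified in place; the return value is the new
    separator key the Index Page should store for `page`.
    """
    if len(page) < capacity:
        raise ValueError("page is not full; insert directly")
    if len(left) >= capacity:
        raise ValueError("left sibling is full; the page must be split")
    bisect.insort(page, key)
    left.append(page.pop(0))   # every key in `left` is smaller, so order is kept
    return page[0]


left, page = [25, 28, 30], [50, 55, 60, 65]
print(insert_with_rotation(left, page, 70))   # 55
print(left, page)                             # [25, 28, 30, 50] [55, 60, 65, 70]
```

No new page is allocated: only the two sibling pages and one Index Page key change, which is why rotation is cheaper than a split.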

Delete operation of B+ tree

  • The B+ tree uses a fill factor to control how the tree changes on deletion; 50% is the minimum value the fill factor can be set to
  • Deletion must also keep the records in the leaf nodes sorted
  • As with insertion, deletion has three cases to consider. Unlike insertion, the cases are distinguished by how the fill factor changes
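One common way to state the three deletion cases (an assumption, since they are not spelled out above) is with two yes/no questions: does the Leaf Page fall below the fill factor, and if it is merged, does the Index Page fall below it too? The sketch below encodes that decision; the function names and parameters are hypothetical.

```python
FILL_FACTOR = 0.5   # 50%, the minimum value mentioned above


def below_fill_factor(num_records, capacity, fill_factor=FILL_FACTOR):
    return num_records / capacity < fill_factor


def deletion_case(leaf_after, index_after_merge, capacity):
    """Classify a deletion by how the fill factor changes.

    leaf_after:        records left in the Leaf Page after the delete
    index_after_merge: keys the Index Page would keep if the leaf were merged
    """
    if not below_fill_factor(leaf_after, capacity):
        return 1   # just delete from the leaf; patch the Index Page key if it matches
    if not below_fill_factor(index_after_merge, capacity):
        return 2   # merge the leaf with a sibling and update the Index Page
    return 3       # merge the leaves, update the Index Page, then merge Index Pages


print(deletion_case(2, 4, capacity=4))   # 1: the leaf is still exactly 50% full
print(deletion_case(1, 3, capacity=4))   # 2
print(deletion_case(1, 1, capacity=4))   # 3
```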

The first deletion case (the deleted record appears only in a leaf node):

  • For example, starting from the tree in the figure above, we delete the record 70. Because 70 appears only in a leaf node, it can simply be deleted from that leaf
  • The final result is as follows

The first deletion case (the deleted record also appears in an Index Page):

  • Continuing from the figure above, we delete the record with key value 25. This value also appears in the Index Page, so after deleting it from the leaf we must replace it in the Index Page with the key to its right, 28. The final result is as follows
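This first case, including the Index Page patch, can be sketched with plain sorted lists. The example values (a leaf holding 25, 28, 30 under an Index Page holding 25, 50) are assumed from the insertion examples above, and `delete_key` is a hypothetical name.

```python
def delete_key(leaf, index_keys, key):
    """Delete key from a Leaf Page when no merge is needed.

    If the same key is also used as a separator in the Index Page (it is
    then the first record of the leaf), it is replaced by the key to its
    right, i.e. the new first record of the leaf. Lists change in place.
    """
    leaf.remove(key)
    if key in index_keys:
        # in this case the leaf stays at or above the fill factor, so it is not empty
        index_keys[index_keys.index(key)] = leaf[0]


leaf, index_keys = [25, 28, 30], [25, 50]
delete_key(leaf, index_keys, 25)
print(leaf, index_keys)   # [28, 30] [28, 50]
```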

The third deletion case (both the Leaf Page and the Index Page fall below the fill factor):

  • Continuing from the figure above, we delete the record 60
  • After the record with key value 60 is deleted from the Leaf Page, the page's fill factor drops below 50%, so a merge is required. Likewise, after the related key is deleted from the Index Page, the Index Page must be merged as well. The final result is as follows:
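A leaf merge can be sketched as below. This is a simplified single-parent model: the Index Page contents (28, 50, 60) and the leaves are assumed for illustration and are not the exact tree in the figure. After the merge, this toy Index Page still holds 2 of 4 keys (50%), so no Index Page merge follows here; when an Index Page does fall below the fill factor, the same check is repeated one level up.

```python
CAPACITY = 4   # records per page, as in the running example


def merge_with_left(keys, children, i, capacity=CAPACITY):
    """Merge child page i into its left sibling i - 1 inside one Index Page.

    `keys` are the Index Page's separator keys and `children` its child
    pages (sorted lists). The separator between the two pages is removed.
    Returns False and changes nothing if the merged page would overflow;
    a fuller implementation would redistribute records in that case.
    """
    left, right = children[i - 1], children[i]
    if len(left) + len(right) > capacity:
        return False
    left.extend(right)    # every key in `right` is larger, so `left` stays sorted
    del children[i]
    del keys[i - 1]       # the separator that pointed at `right` disappears
    return True


keys = [28, 50, 60]
children = [[5, 10, 15, 20], [28, 30], [50, 55], [65]]   # 60 was just deleted from [60, 65]
print(merge_with_left(keys, children, 3))   # True
print(keys, children)   # [28, 50] [[5, 10, 15, 20], [28, 30], [50, 55, 65]]
```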


Origin blog.csdn.net/m0_46405589/article/details/113779318