Talking about the B+ tree behind MySQL indexes

index definition

An index is a data structure that helps us quickly locate the data we want within a large amount of data. The most vivid analogy for an index is the table of contents of a book.
Indexes in the MySQL database fall into three main categories:

  • B+ tree index
  • Hash index
  • Full-text index

The B+ tree index used by the InnoDB storage engine is the one we meet most often at work. To introduce the B+ tree index, we first need to introduce three other data structures: the binary search tree, the balanced binary tree, and the B-tree. The B+ tree evolved from these three.

binary search tree

[Figure: a binary search tree built on the id column of the user table]
Each circle in the figure is a node of the binary search tree, and each node stores a key and data. The key corresponds to the id in the user table, and the data corresponds to the row data in the user table.

The defining property of the binary search tree is that, for any node, the key of its left child is smaller than its own key and the key of its right child is larger; the same holds for every key in its left and right subtrees. The node at the top is called the root node, and nodes without children are called leaf nodes.

If we need to find the user information with id=12 using the binary search tree index we created, the search proceeds as follows:

  • Take the root node as the current node and compare 12 with its key, 10. Since 12 is greater than 10, the right child of the current node becomes the current node.
  • Compare 12 with the key of the current node, 13. Since 12 is smaller than 13, the left child of the current node becomes the current node.
  • Compare 12 with the key of the current node, 12. The keys are equal, so the condition is met and we take the data out of the current node: id=12, name=xm.

With the binary search tree we need only 3 comparisons to find the matching data. If we scanned the table row by row, we would need 6.
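
The lookup above can be sketched in a few lines of Python. This is only an illustration: the keys 10, 13 and 12 and the row id=12, name=xm come from the walkthrough, while the other ids, the placeholder names, and the row order in the table are assumptions chosen so that id=12 is the sixth row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int                          # the id column
    data: str                         # stands in for the rest of the row
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, data):
    """Plain binary search tree insert (no rebalancing); returns the root."""
    if root is None:
        return Node(key, data)
    if key < root.key:
        root.left = insert(root.left, key, data)
    elif key > root.key:
        root.right = insert(root.right, key, data)
    else:
        root.data = data              # same key: overwrite the row
    return root


def search(root, key):
    """Return (data, nodes_visited); data is None if the key is absent."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.data, visited
        node = node.left if key < node.key else node.right
    return None, visited


# Hypothetical user table in storage order; only (12, "xm") is from the text.
rows = [(10, "a"), (7, "b"), (13, "c"), (5, "d"), (9, "e"), (12, "xm")]

root = None
for row_id, name in rows:
    root = insert(root, row_id, name)

data, visited = search(root, 12)
scan_steps = [row_id for row_id, _ in rows].index(12) + 1   # row-by-row scan

print(data, visited, scan_steps)      # xm 3 6
```

The tree visits the nodes 10, 13 and 12, while the row-by-row scan has to look at all six rows before it reaches id=12.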

balanced binary tree

Suppose the binary search tree above were instead structured as follows:
[Figure: a binary search tree in which every node has only a right child]
We can see that the binary search tree has degenerated into a linked list. To find the user information with id=17 we need 7 comparisons, which is equivalent to scanning the whole table.
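
A minimal sketch of how this degenerate shape can arise, assuming the ids are inserted in ascending order into a binary search tree that never rebalances. Only id=17 and the count of 7 come from the text; the other six ids are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Iterative binary search tree insert with no rebalancing."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                break
            node = node.left
        else:
            if node.right is None:
                node.right = new
                break
            node = node.right
    return root


def height(node):
    """Number of levels in the tree rooted at node."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def comparisons(root, key):
    """How many nodes a lookup for key has to visit."""
    count, node = 0, root
    while node is not None:
        count += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return count


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_tree = build([5, 7, 9, 10, 12, 13, 17])   # ascending insert order
mixed_tree = build([10, 7, 13, 5, 9, 12, 17])    # same keys, different order

print(height(sorted_tree), comparisons(sorted_tree, 17))   # 7 7
print(height(mixed_tree), comparisons(mixed_tree, 17))     # 3 3
```

The same seven keys give a tree of height 7 or height 3 depending only on the insert order, which is why search efficiency is unstable without rebalancing.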

The reason for this is that the binary search tree has become unbalanced: it is far taller than it needs to be, which makes search efficiency unstable.
To solve this problem, we need to make sure that the binary search tree always stays balanced, so we use a balanced binary tree.
The balanced binary tree described here is also called an AVL tree. In addition to satisfying the properties of a binary search tree, it requires that the heights of the left and right subtrees of every node differ by at most 1.

The following is a comparison between a balanced binary tree and an unbalanced binary tree:
[Figure: a balanced binary tree next to an unbalanced binary tree]
A balanced binary tree guarantees that the structure of the tree stays balanced. When an insert or delete would break the balance condition, the tree adjusts its nodes to restore balance.
Compared with the plain binary search tree, the balanced binary tree has more stable search efficiency and a faster overall search speed.
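
The text does not say how the nodes are adjusted. In an AVL tree this is conventionally done with rotations, so the following is a minimal insert-only sketch of that standard technique, not something taken from the article. The seven ids are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1               # a single node has height 1


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(y):
    """Lift y's left child x above y; returns the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    """Lift x's right child y above x; returns the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node, key):
    """AVL insert: ordinary BST insert, then rebalance on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                   # duplicate key: nothing to do
    update(node)
    balance = h(node.left) - h(node.right)
    if balance > 1:                   # left subtree too tall
        if key > node.left.key:       # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                  # right subtree too tall
        if key < node.right.key:      # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for k in [5, 7, 9, 10, 12, 13, 17]:   # ascending order, the worst case for a plain BST
    root = insert(root, k)

print(root.key, root.height)          # 10 3
```

Even though the keys arrive in ascending order, the rotations keep the tree at 3 levels instead of 7.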

B-tree

Memory is volatile, so under normal circumstances we store the data and indexes of the user table on persistent devices such as disks. Compared with memory, however, reading from disk is hundreds, thousands, or even tens of thousands of times slower, so we should minimize the number of disk reads. In addition, data is read from disk one disk block at a time, not one record at a time. If we can pack as much data as possible into each disk block, every read brings in more data and the time we spend searching drops sharply.

If we use a tree as the index data structure, every step of a search reads one node from disk, and that node is the disk block we just mentioned. A balanced binary tree stores only one key and its data per node, which means each disk block holds only one key and its data. So what happens when we want to store massive amounts of data?
It is easy to see that the binary tree will have a huge number of nodes and will be extremely tall. A search will then perform many disk IOs, and lookups will be very slow.
To overcome this shortcoming of the balanced binary tree, we should look for a balanced tree in which a single node can store multiple keys and their data. That is the B-tree we discuss next.
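
To put a rough number on "extremely tall", the sketch below computes the smallest possible height of a binary tree for a given number of keys. It assumes the worst case implied above: one node per disk block and nothing cached, so one disk read per level. The row counts are illustrative.

```python
def min_binary_tree_height(n_keys: int) -> int:
    """Fewest levels a binary tree needs to hold n_keys nodes (one key per node).

    A binary tree with h levels holds at most 2**h - 1 nodes, and the smallest
    h with 2**h - 1 >= n_keys is exactly n_keys.bit_length().
    """
    return n_keys.bit_length()


for n in (7, 1_000_000, 1_000_000_000):
    # Under the one-node-per-block assumption, levels == disk reads per lookup.
    print(n, min_binary_tree_height(n))
# 7 3
# 1000000 20
# 1000000000 30
```

Even a perfectly balanced binary tree therefore needs about 30 node reads to reach one key among a billion.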

The B in B-tree is usually read as "balanced", so a B-tree is a balanced tree. The following figure shows a B-tree:
[Figure: a B-tree whose nodes are pages holding several keys, data items, and child pointers p1, p2, p3]
The p entries in the figure are pointers to child nodes. Binary search trees and balanced binary trees have such pointers too; they were omitted from the earlier figures to keep them tidy.
Each node in the figure is called a page, and a page is the disk block we mentioned above. The basic unit of data reading in MySQL is the page, so calling a node a page here matches the underlying data structure of MySQL indexes more closely.

As the figure shows, compared with the balanced binary tree, each node of the B-tree stores more keys and data and has more child nodes. The number of child nodes is generally called the order; the B-tree in the figure is a B-tree of order 3. Because each node fans out more, the tree is much shorter.
As a result, a B-tree search reads the disk far fewer times, and lookups are much more efficient than in a balanced binary tree.

If we want to find the user information with id=28, the search in the B-tree in the figure above proceeds as follows:

  • Start at the root node, page 1. Since 28 lies between the keys 17 and 35, we follow pointer p2 in page 1 to page 3.
  • Compare 28 with the keys in page 3. Since 28 lies between 26 and 30, we follow pointer p2 in page 3 to page 8.
  • Compare 28 with the keys in page 8. There is a matching key, 28, and the user information stored with it is (28, bv).
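
The three steps above can be sketched as follows. Only the pages on the search path are modelled: the keys 17, 35, 26, 30 and the row (28, bv) come from the walkthrough, while the key 29, the `row-…` placeholders, and the `None` entries standing in for pages not described in the text are assumptions.

```python
from bisect import bisect_left


class Page:
    """One B-tree node: sorted keys, the data stored with each key, child pointers."""

    def __init__(self, name, keys, data, children=None):
        self.name = name
        self.keys = keys                  # sorted keys in this page
        self.data = data                  # data[i] is stored with keys[i]
        self.children = children or []    # len(keys) + 1 pointers; empty for a leaf


def search(page, key):
    """Return (data, pages_read); data is None if the key is not found."""
    pages_read = []
    while page is not None:
        pages_read.append(page.name)      # each page visited is one disk read
        i = bisect_left(page.keys, key)   # first slot whose key is >= key
        if i < len(page.keys) and page.keys[i] == key:
            return page.data[i], pages_read
        if not page.children:             # leaf reached without a match
            return None, pages_read
        page = page.children[i]           # children[1] plays the role of p2
    return None, pages_read               # pointer led to a page omitted here


# None marks child pages that exist in the figure but are not modelled.
page8 = Page("page 8", [28, 29], ["bv", "row-29"])
page3 = Page("page 3", [26, 30], ["row-26", "row-30"], [None, page8, None])
page1 = Page("page 1", [17, 35], ["row-17", "row-35"], [None, page3, None])

print(search(page1, 28))   # ('bv', ['page 1', 'page 3', 'page 8'])
print(search(page1, 26))   # ('row-26', ['page 1', 'page 3'])
```

The second call shows a property specific to the B-tree: because every node stores data, a lookup can finish before it reaches a leaf.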

B+ tree

The B+ tree is a further optimization of the B-tree. Its structure is as follows:
[Figure: a B+ tree whose non-leaf pages hold only keys and whose leaf pages hold the data and are linked together]
Based on the figure above, let's look at the differences between the B+ tree and the B-tree:

  1. The non-leaf nodes of the B+ tree store only keys, not data, while every node of the B-tree stores both keys and data.
    The reason is that the page size in a database is fixed; the default page size in InnoDB is 16KB.
    If a page stores no data, it can store more keys, so the order of the tree (the number of child nodes per node) is larger and the tree becomes shorter and fatter. A search then needs even fewer disk IOs, and queries become faster.
    In addition, the order of the B+ tree is equal to the number of keys in a node. If one node of our B+ tree can store 1000 keys, then a 3-level B+ tree can store 1000×1000×1000 = 1 billion records.
    The root node is generally resident in memory, so we usually need only 2 disk IOs to find a record among 1 billion.
  2. All the data of the B+ tree index is stored in the leaf nodes, and it is stored in key order.
    This makes range queries, sorting, grouping, and deduplication very simple in a B+ tree. In a B-tree the data is scattered across all the nodes, so these operations are much harder to implement.
    Attentive readers may also notice that the pages of the B+ tree in the figure above are connected by a doubly linked list, and the records inside each leaf node are connected by a one-way linked list.
    In fact, we could add linked lists to the nodes of the B-tree above as well. So the linked lists are not an inherent difference between the two structures; they appear in the figure because that is how indexes are stored in MySQL's InnoDB storage engine.
    In other words, the B+ tree index in the figure above is how the B+ tree index is really implemented in InnoDB. To be precise, it is a clustered index.
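
The capacity arithmetic in point 1 can be checked with a short sketch. The 16KB page size, the 1000 keys per node, the 3 levels, and the memory-resident root come from the text. The 8-byte key and 8-byte child pointer are illustrative assumptions, not InnoDB's actual record layout, and page headers are ignored.

```python
PAGE_SIZE = 16 * 1024   # InnoDB's default page size in bytes


def keys_per_page(key_bytes: int, pointer_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Rough upper bound on (key, child pointer) entries in one non-leaf page.

    Ignores page headers and per-record overhead, so a real page holds fewer.
    """
    return page_size // (key_bytes + pointer_bytes)


def capacity(keys_per_node: int, levels: int) -> int:
    """Record count under the article's simplification: keys_per_node ** levels."""
    return keys_per_node ** levels


def disk_reads(levels: int, root_cached: bool = True) -> int:
    """Page reads for one lookup when every level outside memory costs one read."""
    return levels - 1 if root_cached else levels


print(keys_per_page(8, 8))   # 1024
print(capacity(1000, 3))     # 1000000000
print(disk_reads(3))         # 2
```

With these assumed sizes a 16KB page holds on the order of a thousand keys, which is consistent with the round figure of 1000 used above.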

As the figure above shows, in InnoDB we can reach all the data in the table by following the doubly linked list that connects the data pages and the one-way linked list that connects the records inside each leaf node.
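
A minimal sketch of why linked leaf pages make range queries and full scans simple. It models a two-level B+ tree in which the non-leaf page stores only separator keys and the leaf pages store the rows. All keys and rows are hypothetical, and a Python list inside each leaf stands in for the one-way record list.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys                          # sorted keys in this leaf page
        self.rows = [f"row-{k}" for k in keys]    # row data lives only in leaves
        self.prev = None                          # doubly linked list of leaf pages
        self.next = None


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys only, no row data
        self.children = children    # len(keys) + 1 child pages


def link(leaves):
    """Connect neighbouring leaf pages in both directions."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a


def find_leaf(node, key):
    """Descend from the root to the leaf page that could contain key."""
    while isinstance(node, Internal):
        # children[i + 1] holds keys >= keys[i], so an equal key goes right.
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, lo, hi):
    """Return all (key, row) pairs with lo <= key <= hi, in key order."""
    leaf = find_leaf(root, lo)      # one descent to find the starting page
    out = []
    while leaf is not None:
        for k, row in zip(leaf.keys, leaf.rows):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, row))
        leaf = leaf.next            # follow the page link, no new descent
    return out


leaves = [Leaf([1, 5, 9]), Leaf([12, 17, 20]), Leaf([26, 28, 30]), Leaf([35, 40])]
link(leaves)
root = Internal([12, 26, 35], leaves)

print([k for k, _ in range_scan(root, 17, 28)])   # [17, 20, 26, 28]
print(len(range_scan(root, float("-inf"), float("inf"))))   # 11
```

The range query descends the tree once and then only follows `next` pointers; scanning from the leftmost leaf returns every row in the table in key order.
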
The B+ tree index implementation in MyISAM is slightly different from the one in InnoDB. In MyISAM, the leaf nodes of the B+ tree index do not store the row data itself; they store the address of the row in the data file.
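
The difference can be illustrated with a toy model. This is not either engine's real storage format: a dictionary stands in for the leaf level of the index, an in-memory byte buffer stands in for the data file, and the comma-separated row encoding is made up for the example. The two rows are the ones used in the walkthroughs above.

```python
import io

rows = [(12, "xm"), (28, "bv")]

# InnoDB-style clustered index: the leaf entry for a key holds the row itself.
clustered_leaf = {row_id: (row_id, name) for row_id, name in rows}

# MyISAM-style index: rows live in a separate data file, and the leaf entry
# for a key holds only the byte offset of the row in that file.
data_file = io.BytesIO()
address_leaf = {}
for row_id, name in rows:
    address_leaf[row_id] = data_file.tell()          # where this row starts
    data_file.write(f"{row_id},{name}\n".encode())


def fetch_clustered(key):
    """The index lookup already yields the row."""
    return clustered_leaf[key]


def fetch_by_address(key):
    """The index lookup yields an address; a second step reads the row."""
    data_file.seek(address_leaf[key])
    row_id, name = data_file.readline().decode().rstrip("\n").split(",")
    return int(row_id), name


print(fetch_clustered(28))     # (28, 'bv')
print(address_leaf)            # {12: 0, 28: 6}
print(fetch_by_address(28))    # (28, 'bv')
```

Both lookups return the same row, but the address-based one needs an extra read of the data file after the index has been searched.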

Origin blog.csdn.net/weixin_45340300/article/details/127689918