Why does MySQL use a B+ tree instead of a B-tree for its index structure?

Introduction

Why MySQL uses a B+ tree rather than a B-tree (also written "B tree") for its index structure is a question that often comes up in interviews, so I am summarizing it here in a blog post. I hope that anyone who reads this post and meets the question in an interview will be able to answer it well.

The definition of a B-tree

For a B-tree of order m, the definition is as follows:

  • Each node has at most m child nodes.
  • Except for the root node, each non-leaf node has at least ⌈m/2⌉ (m/2 rounded up) child nodes.
  • A root node that is not a leaf node has at least two child nodes.
  • A non-leaf node with k subtrees has k-1 keys, and the keys are arranged in increasing order.
  • The leaf nodes are all in the same layer. 
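The limits in the list above can be written down as a small helper. This is only a sketch under the common textbook convention that a non-root internal node has at least ⌈m/2⌉ children; the function name `btree_bounds` is made up for illustration.

```python
import math


def btree_bounds(m: int) -> dict:
    """Child and key limits for a B-tree of order m (textbook convention)."""
    min_children = math.ceil(m / 2)  # lower bound for non-root internal nodes
    return {
        "max_children": m,
        "min_children_internal": min_children,
        "min_children_root": 2,          # only if the root is not a leaf
        "max_keys": m - 1,               # k children -> k - 1 keys
        "min_keys_internal": min_children - 1,
    }


for order in (3, 4, 5):
    print(order, btree_bounds(order))
```

For example, under this convention a B-tree of order 5 allows between 3 and 5 children (2 to 4 keys) in each non-root internal node.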

Definition of a B+ tree

For a B+ tree of order m, the definition is as follows:

  • Each node has at most m child nodes.
  • Except for the root node, each non-leaf node has at least ⌈m/2⌉ (m/2 rounded up) child nodes.
  • A root node that is not a leaf node has at least two child nodes.
  • A non-leaf node with k subtrees has k keys, and the keys are arranged in increasing order.
  • The leaf nodes are all in the same layer and are connected by a linked list (singly or doubly linked).
  • Non-leaf nodes store no data, only keys used for indexing; all data is stored in the leaf nodes.
  • Every key in a non-leaf node also appears in the corresponding child node, where it is that child's largest (or smallest) key.
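To make the definition above more concrete, here is a minimal sketch of the two node types and a lookup. It follows the variant described in the list (k keys for k children, each key being the largest key of its child). The class and function names are hypothetical, and the tiny tree is built by hand rather than by a real insertion algorithm.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    values: List[Any]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


@dataclass
class Internal:
    keys: List[int]      # keys[i] is the largest key stored under children[i]
    children: List[Any]  # Internal or Leaf nodes


def search(node: Any, key: int) -> Tuple[Optional[Any], int]:
    """Return (value, nodes_visited); value is None when the key is absent."""
    visited = 1
    while isinstance(node, Internal):
        # Choose the first child whose largest key is >= the search key.
        idx = bisect_left(node.keys, key)
        if idx == len(node.keys):
            return None, visited  # key is larger than every key in the tree
        node = node.children[idx]
        visited += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], visited
    return None, visited


# Hand-built two-level tree: one internal root over two linked leaves.
leaf1 = Leaf([1, 3, 5], ["a", "b", "c"])
leaf2 = Leaf([7, 9, 11], ["d", "e", "f"])
leaf1.next = leaf2
root = Internal([5, 11], [leaf1, leaf2])

print(search(root, 7))
print(search(root, 5))
```

Note that every successful lookup in this sketch ends in a leaf, so it visits the same number of nodes regardless of the key.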

Reasons for using a B+ tree

1. A node can hold many keys, which keeps the tree short, reduces the number of disk I/O operations per query, and improves efficiency. (B-trees and B+ trees share this advantage.)

2. The non-leaf nodes of a B+ tree store no data, only index keys, which greatly reduces the space those nodes occupy. When querying, some or all of these index nodes can therefore be loaded into memory instead of being read from disk, turning many disk I/O operations into memory accesses. Because reading from memory is orders of magnitude faster than reading from disk, this greatly improves access speed and the overall efficiency of the database.

 For example, with 1 billion rows and a 4-byte int as the index key, the keys themselves take only about 4 GB. With a B-tree, where every node stores the row data next to its keys, it is difficult to load the index for those 1 billion rows into memory at once.
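The arithmetic behind points 1 and 2 can be sketched roughly as follows. Apart from the 4-byte key and the 1 billion rows, every number here is an assumption chosen for illustration: a 16 KiB page (InnoDB's default page size), a 6-byte child pointer, and a 1 KiB row. Real page layouts have headers and other overhead that this estimate ignores.

```python
PAGE_SIZE = 16 * 1024     # bytes; assumed page size
KEY_SIZE = 4              # bytes; a 4-byte int key, as in the text
POINTER_SIZE = 6          # bytes; hypothetical child-pointer size
ROW_SIZE = 1024           # bytes; hypothetical row payload
N_ROWS = 1_000_000_000    # 1 billion rows, as in the text


def levels_needed(n_items: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= n_items."""
    levels, capacity = 0, 1
    while capacity < n_items:
        capacity *= fanout
        levels += 1
    return levels


# Size of the keys alone: 1 billion * 4 bytes.
key_bytes = N_ROWS * KEY_SIZE

# B+ tree: non-leaf pages hold only (key, pointer) pairs.
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
rows_per_leaf = PAGE_SIZE // (KEY_SIZE + ROW_SIZE)
leaf_pages = (N_ROWS + rows_per_leaf - 1) // rows_per_leaf  # ceiling division
bplus_levels = levels_needed(leaf_pages, bplus_fanout) + 1  # +1 for the leaf level

# B-tree: every entry in every page carries key, pointer and row.
btree_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + ROW_SIZE)
btree_levels = levels_needed(N_ROWS, btree_fanout)

print(f"keys alone: {key_bytes / 1e9:.1f} GB")
print(f"B+ tree: fanout {bplus_fanout}, about {bplus_levels} levels")
print(f"B-tree:  fanout {btree_fanout}, about {btree_levels} levels")
```

Under these assumptions the B+ tree needs about 4 levels and the B-tree about 8, because storing rows in every node cuts the fanout from roughly 1638 to 15. The exact figures change with the assumed sizes, but the direction of the effect does not.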

3. Range queries are a common operation in MySQL. All data in a B+ tree is stored in the leaf nodes, and in MySQL the leaf nodes are connected by a doubly linked list, so a range query only needs to locate the leaf nodes of the two endpoints and then traverse the linked list between them. This makes range queries on a B+ tree very efficient. A B-tree, by contrast, has to perform an in-order traversal that keeps moving up and down the tree, which is much less efficient.

4. For lookups, since all data in a B+ tree is stored in the leaf nodes, every lookup takes the same number of I/O operations. A B-tree is less predictable, because the depth of the node that holds the data is not fixed. (Do not underestimate this stability; for a program, predictable performance is crucial.)

5. Because the leaf nodes of a B+ tree hold all the data, full scans are cheaper: a B+ tree only needs to scan its leaf nodes, whereas a B-tree has to traverse the entire tree.
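Points 3 and 5 both rely on the linked leaf level. The following is a minimal sketch of a range scan and a full scan over linked leaves. The names `Leaf`, `link`, `range_scan` and `full_scan` are hypothetical, the leaves are singly linked for brevity, and the scan starts from a leaf that is assumed to be already known (a real tree would first descend from the root to find it).

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


def link(leaves: List[Leaf]) -> Leaf:
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start: Leaf, lo: int, hi: int) -> Iterator[Tuple[int, str]]:
    """Yield (key, value) pairs with lo <= key <= hi by following leaf links."""
    leaf: Optional[Leaf] = start
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return  # keys are sorted, so nothing further can match
            if k >= lo:
                yield k, v
        leaf = leaf.next


def full_scan(first: Leaf) -> Iterator[Tuple[int, str]]:
    """Visit every record by walking the leaf level only."""
    leaf: Optional[Leaf] = first
    while leaf is not None:
        yield from zip(leaf.keys, leaf.values)
        leaf = leaf.next


key_groups = [[1, 3, 5], [7, 9, 11], [13, 15]]
first = link([Leaf(ks, [f"v{k}" for k in ks]) for ks in key_groups])

print(list(range_scan(first, 4, 9)))
print(sum(1 for _ in full_scan(first)))
```

Neither function ever touches a non-leaf node, which is the property the two points above depend on.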

Summary

In general, a technical solution is chosen to solve a specific problem in a specific scenario. A B+ tree is not necessarily the best choice for every database: non-relational databases often use a B-tree as their storage structure, while relational databases such as MySQL more often use a B+ tree.

If this blog post was helpful to you, please like and bookmark it. Thank you very much!

Origin blog.csdn.net/m0_70322372/article/details/129786462