Is a wider B+ tree always better? My answer to a question I had never really understood

This question comes from the way MySQL stores data. MySQL stores data in a B+ tree, mainly because B+ tree lookups have stable performance: the maximum number of disk I/Os for a lookup equals the height of the tree, and a B+ tree can store more data than a B-tree of the same height. That raises the question: when the height of a B+ tree is fixed, is a wider tree always better? To answer it, we first need to understand what increasing the width of a B+ tree actually does.

First, increasing the width of the tree affects how much data it can hold: a wider tree has more subtrees, so it stores more data. Does a larger amount of data hurt query efficiency? No. The non-leaf nodes of a B+ tree store only keys and pointers to the next-level nodes. A query first reads the root node, compares the search key with the keys stored there to determine the address of the next-level node, reads that node, and repeats the comparison level by level until it reaches a leaf node. For example, suppose we want to find the record with key 59:

 

1) On the first disk access, we read the first level (the root), which holds the primary-key values 59 and 97: 59 is the largest key in the left subtree and 97 is the largest key in the right subtree. Since the search key 59 is less than or equal to 59, we follow the left pointer down to the second level.

2) On the second access, the node holds the keys 15, 44, and 59. The search key 59 is greater than 44 and less than or equal to 59 (a binary search is used here), so we follow the pointer to the third child node.

3) On the third access we reach a leaf node, which holds the keys 51 and 59. A sequential search inside the leaf finds key 59, and with it the row data stored for that key.
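The three steps can be sketched in Python as a minimal in-memory model. The `Leaf`/`Internal` classes and the `search` function are hypothetical, each internal node stores the largest key under each child (the convention the walkthrough uses), and each visited node stands for one disk page read. Only the keys named in the walkthrough are real; the other leaves and the right-hand subtree are empty placeholders.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, rows):
        self.rows = rows  # sorted list of (key, row_data) pairs

class Internal:
    def __init__(self, max_keys, children):
        self.max_keys = max_keys  # largest key under each child, ascending
        self.children = children

def search(root, key):
    """Descend from the root; return (row_data or None, nodes_read)."""
    node, reads = root, 0
    while isinstance(node, Internal):
        reads += 1  # one page read per internal level
        i = bisect_left(node.max_keys, key)  # binary search: first child whose max >= key
        if i == len(node.max_keys):
            return None, reads  # key is larger than every key in this subtree
        node = node.children[i]
    reads += 1  # read the leaf page
    for k, data in node.rows:  # sequential search inside the leaf
        if k == key:
            return data, reads
    return None, reads

# Keys 59/97, 15/44/59 and 51/59 come from the walkthrough; EMPTY and the
# right-hand subtree are hypothetical placeholders for parts it does not show.
EMPTY = Leaf([])
leaf = Leaf([(51, "row 51"), (59, "row 59")])
level2_left = Internal([15, 44, 59], [EMPTY, EMPTY, leaf])
level2_right = Internal([97], [EMPTY])
root = Internal([59, 97], [level2_left, level2_right])

print(search(root, 59))  # ('row 59', 3)
```

The lookup for 59 touches exactly three nodes, one per level, which is the "maximum number of I/Os equals the height" property in action.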

Therefore, as long as the height stays the same, neither the amount of data nor the width of the tree changes how many nodes (and thus disk I/Os) a query has to read.
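To check this claim, the sketch below builds completely full trees of height 3 with different fanouts and counts how many nodes a lookup visits. The `build` and `count_reads` helpers are hypothetical; leaves are plain Python lists, internal nodes are `(max_keys, children)` tuples, and each visited node stands in for one page read. Real InnoDB pages are not modeled.

```python
from bisect import bisect_left

# Hypothetical model: a leaf is a sorted list of keys; an internal node is a
# tuple (max_keys, children), where max_keys[i] is the largest key under children[i].

def max_key(node):
    return node[-1] if isinstance(node, list) else node[0][-1]

def build(keys, fanout):
    """Build a completely full B+ tree bottom-up; return (root, height)."""
    level = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]  # leaf level
    height = 1
    while len(level) > 1:
        level = [
            ([max_key(child) for child in level[i:i + fanout]], level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
        height += 1
    return level[0], height

def count_reads(root, key):
    """Number of nodes (pages) visited to reach the leaf that holds key."""
    node, reads = root, 0
    while isinstance(node, tuple):
        reads += 1
        max_keys, children = node
        node = children[bisect_left(max_keys, key)]  # first child whose max >= key
    reads += 1  # the leaf itself
    return reads

for fanout in (3, 10, 50):
    keys = list(range(fanout ** 3))  # exactly fills a three-level tree
    root, height = build(keys, fanout)
    reads = {count_reads(root, k) for k in (0, len(keys) // 2, len(keys) - 1)}
    print(fanout, len(keys), height, reads)
```

Every line reports height 3 and the same 3 reads, while the number of keys grows from 27 to 125,000: width changes capacity, not lookup cost.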

If that is the case, couldn't we make the database capacity unlimited by making the tree infinitely wide? Of course not. So what is the maximum width? The answer starts with the size of a single node. Reading a node means loading it from disk into memory, and data is loaded page by page. A page is 16KB by default, and by default one B+ tree node occupies exactly one page. One page is enough for most needs; if a node spanned two pages, every node read would take two I/Os, which would both slow down queries and waste more memory. So the size of a B+ tree node equals the page size, 16KB.

Combining this number with the size of each entry lets us roughly estimate how many entries a node can hold. For a BIGINT key, the key takes 8 bytes and the pointer to the child node takes 6 bytes, 14 bytes per entry. Ignoring page overhead, the root node can hold at most 16*1024/14 ≈ 1170 entries. So the second level of the B+ tree can have at most 1170 nodes, and the third level at most 1170*1170 = 1,368,900 nodes.
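The arithmetic is easy to reproduce. This is a small sketch using the same simplified entry layout as the text (8-byte key plus 6-byte pointer, no page header or record overhead); `max_fanout` is a hypothetical helper, not a MySQL API.

```python
PAGE_SIZE = 16 * 1024  # default page size: 16KB
KEY_SIZE = 8           # BIGINT key
POINTER_SIZE = 6       # pointer to a child node, as in the estimate above

def max_fanout(page_size=PAGE_SIZE, key_size=KEY_SIZE, pointer_size=POINTER_SIZE):
    # Same simplification as the text: page headers and per-record
    # overhead are ignored, so this is an upper-bound estimate.
    return page_size // (key_size + pointer_size)

fanout = max_fanout()
print(fanout)  # 1170
for level in (1, 2, 3):
    # a completely full tree has fanout ** (level - 1) nodes on each level
    print(f"level {level}: at most {fanout ** (level - 1):,} node(s)")
```

Changing `key_size` shows why the key type matters: a smaller key leaves room for more entries per page, so the tree can be wider at the same height.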

Conclusion: when the height of a B+ tree is fixed, a wider tree stores more data, makes fuller use of each disk page with less waste, and does not hurt query efficiency. The width, however, has an upper limit set by the page size and the size of each entry; once that limit is reached, the tree cannot grow any wider.
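One way to see what the upper limit means is to ask how many levels a tree needs for a given number of leaf pages. The sketch below is hypothetical (`min_height` does not come from the source) and assumes every internal node is filled to the 1170-entry maximum estimated earlier, which real trees rarely are.

```python
def min_height(leaf_pages, fanout=1170):
    """Smallest number of levels (leaf level included) that can address leaf_pages leaves,
    assuming every internal node holds the maximum `fanout` child pointers."""
    height, capacity = 1, 1  # one level = a single leaf page
    while capacity < leaf_pages:
        capacity *= fanout  # each extra level multiplies the reachable leaves by fanout
        height += 1
    return height

for pages in (1170, 1171, 1170 ** 2, 1170 ** 2 + 1):
    print(pages, min_height(pages))
```

With 1170-way nodes, 1,368,900 leaf pages still fit in three levels, but one more leaf page forces a fourth level. The loop uses integer multiplication rather than a floating-point logarithm so the boundary cases come out exact.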

Origin blog.csdn.net/weixin_45087884/article/details/131082433