B-Tree

B-Tree is what we usually just call a B-tree. The hyphen is not a minus sign, so do not read it as "B-minus tree"; that would be embarrassing. B-trees are often used to implement database indexes because a lookup needs only a few disk reads.

Disk IO and read ahead

Reading from a mechanical disk relies on physical motion, and the time splits into three parts: seek time, rotational delay, and transfer time. Their sum is the cost of one disk IO, about 9 ms, which is roughly 100,000 times the cost of a memory access. Because disk IO is so expensive, the operating system optimizes it with read-ahead: each IO loads not only the data at the requested disk address but also the adjacent data into a memory buffer. This works because of the principle of locality: when the data at one address is accessed, nearby data is likely to be accessed soon. The unit of data read by one disk IO is called a page. The page size depends on the operating system and is typically 4 KB or 8 KB. So reading any data that lies within a single page costs just one disk IO.
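As a rough illustration of why locality matters, the sketch below counts how many distinct pages a set of byte offsets falls into. The 4 KB page size, the offsets, and the helper name `pages_touched` are assumptions made up for this example; each distinct page stands for one disk IO.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (4 KB)

def pages_touched(offsets, page_size=PAGE_SIZE):
    """Return the set of page numbers that contain the given byte offsets."""
    return {offset // page_size for offset in offsets}

# Three nearby offsets fall in the same page: one disk IO serves all of them.
nearby = pages_touched([0, 100, 4000])
# Three scattered offsets fall in three different pages: three disk IOs.
scattered = pages_touched([0, 5000, 9000])
print(len(nearby), len(scattered))  # prints: 1 3
```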

Comparison of B-Tree and Binary Search Tree

We know that a query on a balanced binary search tree takes O(logN) time and needs very few comparisons. If its performance is already this good, why are indexes implemented with B-Trees instead of binary search trees? The key factor is the number of disk IOs.

A database index is stored on disk. When a table holds a lot of data, its index grows too and can reach several gigabytes or more. A query that uses the index cannot load the whole index into memory; it has to load disk pages one at a time, and each disk page corresponds to one node of the index tree.

1. Binary tree

Let's first count the disk IOs in a binary tree search. Take a binary tree of height 4 and search for the value 10:

The first disk IO:

The second disk IO:

The third disk IO:

The fourth disk IO:

In this search, the height of the tree and the number of disk IOs are both 4. In the worst case, the number of disk IOs equals the height of the tree.

It follows that to reduce the number of disk IOs we must reduce the height of the tree, turning a tall, thin tree into a short, wide one. The B-Tree was designed to do exactly this.
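To get a feel for how much height can be saved, the sketch below computes the smallest number of levels a tree needs to hold a given number of keys, first with 2 children per node and then with many. The key count (1,000,000) and the fanout (256) are illustrative assumptions, not figures from this article; the 9 ms per IO is the estimate quoted earlier.

```python
def min_height(num_keys: int, max_children: int) -> int:
    """Fewest levels needed to store num_keys keys when every node
    has up to max_children children and max_children - 1 keys."""
    keys_per_node = max_children - 1
    height = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_on_level = 1    # the root level has one node
    while capacity < num_keys:
        capacity += nodes_on_level * keys_per_node
        nodes_on_level *= max_children
        height += 1
    return height

IO_MS = 9  # rough cost of one disk IO, in milliseconds

for fanout in (2, 256):
    levels = min_height(1_000_000, fanout)
    print(f"fanout {fanout}: {levels} levels, about {levels * IO_MS} ms of disk IO")
```

Under these assumptions the binary tree needs 20 levels and the wide tree needs 3, so a worst-case lookup drops from about 180 ms of disk IO to about 27 ms.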

2. B-Tree

A B-Tree of order m satisfies the following conditions:

1. Each node has at most m subtrees

2. The root node has at least 2 subtrees, unless it is a leaf

3. Each branch node (every node other than the root and the leaves) has at least ⌈m/2⌉ subtrees, that is, m/2 rounded up

4. All leaf nodes are on the same level; each node holds at most m-1 keys, arranged in ascending order
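The conditions above can be sketched as a small checker. The `Node` class and the `leaf_depth` function are hypothetical names for this example, and the checker covers only the conditions listed here (it does not, for instance, verify that keys are ordered across different subtrees).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf

def leaf_depth(node, m, is_root=True):
    """Return the depth of the leaves under node, or raise ValueError
    if one of the listed order-m conditions is violated."""
    if len(node.keys) > m - 1:
        raise ValueError("a node holds more than m-1 keys")
    if node.keys != sorted(node.keys):
        raise ValueError("keys are not in ascending order")
    if not node.children:
        return 1  # a leaf sits one level below its parent
    if len(node.children) != len(node.keys) + 1:
        raise ValueError("an internal node needs one more child than keys")
    min_children = 2 if is_root else math.ceil(m / 2)
    if len(node.children) < min_children:
        raise ValueError("a node has too few subtrees")
    depths = {leaf_depth(child, m, is_root=False) for child in node.children}
    if len(depths) != 1:
        raise ValueError("leaves are not all on the same level")
    return depths.pop() + 1

# A small order-3 tree with made-up keys: a root and two leaves.
tree = Node([9], [Node([3]), Node([12, 21])])
print(leaf_depth(tree, m=3))  # prints: 2
```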

Consider the following B-tree of order 3 and follow the search for element 21:

The first disk IO:

The second disk IO:

This is followed by comparisons in memory, against 3 and 12 in turn.

The third disk IO:

This is followed by comparisons in memory, against 14 and 21 in turn.

Judging from this search, the B-tree's numbers of comparisons and disk IOs are not very different from the binary tree's, so at first glance it seems to have no advantage.

Look closer, though: the comparisons within a node happen in memory and involve no disk IO, so their cost is negligible. In addition, a single B-tree node can store many keys (the limit is set by the order of the tree).

For the same number of keys, a B-tree therefore has far fewer nodes, and far fewer levels, than a binary tree, and fewer levels means fewer disk IOs. Once the data set is large enough, the performance difference becomes obvious.
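The search can be sketched as follows, counting one disk read per node visited while the comparisons inside a node run in memory. The `Node` class, the `search` function, and the keys of the example tree are hypothetical; the tree is not the one from the figure, only a valid order-3 tree of height 3.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf

def search(root, target):
    """Return (found, disk_reads); each visited node counts as one page read."""
    node, reads = root, 0
    while node is not None:
        reads += 1                         # loading this node = one disk IO
        i = bisect_left(node.keys, target) # in-memory comparison within the node
        if i < len(node.keys) and node.keys[i] == target:
            return True, reads
        # Descend into the subtree between keys[i-1] and keys[i], if any.
        node = node.children[i] if node.children else None
    return False, reads

# Hypothetical order-3 tree: every node has at most 2 keys and 3 children.
root = Node([10], [
    Node([4], [Node([1, 2]), Node([6, 8])]),
    Node([15, 19], [Node([12, 14]), Node([17]), Node([21, 25])]),
])

print(search(root, 21))  # prints: (True, 3)
print(search(root, 7))   # prints: (False, 3)
```

Even a failed lookup reads at most one node per level, which is why the height of the tree bounds the number of disk IOs.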

3. Insertion into a B-tree

Starting from the tree above, add element 4, which belongs between 3 and 9.
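One common way to insert is to add the key to a leaf and, whenever a node ends up with more than m-1 keys, split it around its median key and push the median up into the parent. The sketch below assumes that approach; the `Node`, `insert`, and `keys_in_order` names and the key sequence are hypothetical, chosen so that 4 lands between 3 and 9.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf

def _insert(node, key, m):
    """Insert key under node; return (median, right_node) if node split, else None."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                      # key already present: nothing to do
    if not node.children:
        node.keys.insert(i, key)         # leaf: place the key in sorted position
    else:
        split = _insert(node.children[i], key, m)
        if split is not None:            # child split: absorb its median here
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None                      # node still fits within order m
    mid = len(node.keys) // 2            # overflow: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key, m):
    """Insert key into the order-m tree and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])  # the tree grows one level at the top

def keys_in_order(node):
    """Return all keys in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += keys_in_order(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

root = Node()
for key in [3, 9, 12, 4]:
    root = insert(root, key, m=3)
print(root.keys, [child.keys for child in root.children])  # prints: [9] [[3, 4], [12]]
```

With order 3, inserting 3, 9, and 12 overflows the root, which splits around 9. The later insert of 4 then descends into the left leaf and sits next to 3 without any further split. Because splits only propagate upward, all leaves stay on the same level.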

                                 

                                     

                                     

 

4. Deletion from a B-tree

Remove element 9:

                                  

 

                                    

5. Summary

Inserting or deleting an element can make nodes split or merge, which is sometimes troublesome to handle. But this is exactly what keeps the B-tree balanced across all its branches at all times, and self-balancing is one of its strengths. B-trees are mainly used in file systems and in some database indexes, such as MongoDB's, while most relational databases implement their indexes with B+ trees.
