B-Tree

B-Tree is the structure we usually call a B-tree. The hyphen is not a minus sign, so reading it as "B-minus tree" is a mistake worth avoiding. B-trees are often used to implement database indexes because lookups on them are efficient.

Disk IO and read ahead

Reading from a mechanical disk relies on physical motion, and the time it takes has three parts: seek time, rotational delay, and transfer time. Added together, they make up the cost of one disk IO, about 9 ms. That is roughly 100,000 times the cost of a memory access. Because disk IO is so expensive, the operating system optimizes it with read-ahead: on each IO it loads not only the data at the requested disk address but also the adjacent data into a memory buffer. This works because of the principle of locality: when the data at one address is accessed, nearby data is likely to be accessed soon after. The unit of data read by one disk IO is called a page. The page size depends on the operating system and is generally 4 KB or 8 KB. It follows that reading any data within a single page costs just one disk IO.
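As a rough illustration of why pages matter, the sketch below maps byte offsets to page numbers and counts how many distinct pages a set of reads touches. The 4 KB page size and the names page_of and pages_touched are assumptions made for this example only. It also assumes nothing is cached yet, so each distinct page costs one disk IO.

```python
PAGE_SIZE = 4 * 1024  # assumed page size in bytes (4 KB)


def page_of(offset: int) -> int:
    """Return the index of the page that contains the given byte offset."""
    return offset // PAGE_SIZE


def pages_touched(offsets) -> int:
    """Count the distinct pages needed to read all offsets.

    With an empty cache, this is also the number of disk IOs.
    """
    return len({page_of(o) for o in offsets})


nearby = [100, 900, 4000]      # all inside page 0
scattered = [100, 5000, 9000]  # pages 0, 1 and 2

print(pages_touched(nearby))     # 1
print(pages_touched(scattered))  # 3
```

Reads that sit close together share a page and therefore share a disk IO, which is the effect read-ahead relies on.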

Comparison of B-Tree and Binary Search Tree

We know that a lookup in a balanced binary search tree takes O(log N) time, which keeps the number of comparisons very low. If its performance is already this good, why are indexes implemented with a B-Tree instead of a binary search tree? The key factor is the number of disk IOs.

A database index is stored on disk. When a table holds a lot of data, its index grows too, to several gigabytes or more. When a query uses the index, the whole index cannot be loaded into memory at once. Disk pages have to be loaded one at a time, and each disk page corresponds to one node of the index tree.

1. Binary tree

Let's first look at the disk IOs during a binary tree search. Take a binary tree of height 4 and search for the value 10:

                                                            

 

The first disk IO:

The second disk IO:

The third disk IO:

The fourth disk IO:

In this search, the height of the tree and the number of disk IOs are both 4. In the worst case, then, the number of disk IOs equals the height of the tree.
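The walk described above can be sketched in a few lines. The tree below is a hypothetical one of height 4 (the original figure is not reproduced here), and counting one disk IO per visited node follows the one-page-per-node model used in this article.

```python
class Node:
    """A binary search tree node; in this model one node is one disk page."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search(root, target):
    """Return (found, nodes_visited); each visited node models one disk IO."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if target == node.key:
            return True, visited
        node = node.left if target < node.key else node.right
    return False, visited


# Hypothetical tree of height 4 with the key 10 on the deepest level.
root = Node(
    9,
    Node(5, Node(2), Node(6)),
    Node(13, Node(11, Node(10), Node(12)), Node(15)),
)

print(search(root, 10))  # (True, 4)
```

Finding 10 visits the nodes 9, 13, 11 and 10, so it costs four disk IOs, one per level.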

It follows that reducing the number of disk IOs means reducing the height of the tree, turning a tall, thin tree into a short, wide one. The B-Tree was designed to do exactly that.
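To get a feel for how much wider nodes flatten a tree, here is a small calculation. It assumes a completely full tree in which every node has fanout children and fanout - 1 keys. The function name min_levels and the sample numbers (one million keys, fanout 256) are chosen for illustration only.

```python
def min_levels(n_keys: int, fanout: int) -> int:
    """Fewest levels a tree needs to hold n_keys keys.

    Assumes each node has up to `fanout` children and `fanout - 1` keys,
    so a full tree with L levels holds fanout**L - 1 keys.
    """
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        capacity = fanout ** levels - 1
    return levels


print(min_levels(1_000_000, 2))    # 20 levels for a binary tree
print(min_levels(1_000_000, 256))  # 3 levels with 256-way nodes
```

Under these assumptions the worst-case lookup drops from about 20 disk IOs to 3.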

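2. B-Tree

A B-Tree of order m satisfies the following conditions:

1. Every node has at most m subtrees.

2. The root node has at least 2 subtrees (unless it is itself a leaf).

3. Every branch node has at least m/2 subtrees, rounded up (a branch node is any node other than the root and the leaves).

4. All leaf nodes are on the same level, and every node holds at most m-1 keys, arranged in ascending order.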
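The four conditions can be turned into a small checker. This is a sketch only: the class BNode, the function is_valid_btree and the sample tree are hypothetical, and the check that a branch node has exactly one more child than keys is the usual B-tree property rather than something listed above.

```python
import math


class BNode:
    """A B-tree node: sorted keys plus child nodes (no children for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def is_valid_btree(root, m):
    """Check the order-m conditions listed above."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        # Condition 4: at most m-1 keys, in ascending order.
        if len(node.keys) > m - 1 or node.keys != sorted(node.keys):
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        # Condition 1: at most m subtrees.
        if len(node.children) > m:
            return False
        # Conditions 2 and 3: minimum number of subtrees.
        min_children = 2 if is_root else math.ceil(m / 2)
        if len(node.children) < min_children:
            return False
        # Usual B-tree property (an assumption, not listed in the text).
        if len(node.children) != len(node.keys) + 1:
            return False
        return all(visit(child, depth + 1, False) for child in node.children)

    # Condition 4: all leaves on the same level.
    return visit(root, 0, True) and len(leaf_depths) == 1


# Hypothetical order-3 tree.
tree = BNode([30], [
    BNode([3, 12], [BNode([1]), BNode([9]), BNode([14, 21])]),
    BNode([40], [BNode([35]), BNode([50])]),
])
print(is_valid_btree(tree, 3))  # True

# Leaves on different levels break condition 4.
lopsided = BNode([30], [BNode([3]), BNode([40], [BNode([35]), BNode([50])])])
print(is_valid_btree(lopsided, 3))  # False
```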

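Below is a B-tree of order 3. Watch how element 21 is found:

The first disk IO:

The second disk IO:

This step involves an in-memory comparison, against 3 and then 12.

The third disk IO:

This step involves an in-memory comparison, against 14 and then 21.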

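Judging from this search, a B-tree needs about as many comparisons and disk IOs as a binary tree, so it does not seem to offer any advantage.

On closer inspection, though, the comparisons happen in memory and involve no disk IO, so their cost is negligible. In addition, a single B-tree node can hold many keys (how many is determined by the order of the tree).

For the same number of keys, a B-tree has far fewer nodes than a binary tree, and since each node visited costs one disk IO, fewer nodes means fewer disk IOs. Once the data grows large enough, the performance difference becomes apparent.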
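The lookup of 21 can be sketched as follows. The tree is a hypothetical order-3 tree built to match the comparisons described above (3 and 12, then 14 and 21). The original figure is not available, so the other keys are assumptions. Each node loaded counts as one disk IO, and the search inside a node happens purely in memory.

```python
from bisect import bisect_left


class BNode:
    """A B-tree node: sorted keys plus child nodes (no children for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(root, target):
    """Return (found, disk_ios), counting one disk IO per node loaded."""
    ios = 0
    node = root
    while True:
        ios += 1  # loading this node models one disk IO
        i = bisect_left(node.keys, target)  # in-memory comparisons only
        if i < len(node.keys) and node.keys[i] == target:
            return True, ios
        if not node.children:  # reached a leaf without finding the key
            return False, ios
        node = node.children[i]


# Hypothetical order-3 tree.
tree = BNode([30], [
    BNode([3, 12], [BNode([1]), BNode([9]), BNode([14, 21])]),
    BNode([40], [BNode([35]), BNode([50])]),
])

print(search(tree, 21))  # (True, 3)
```

The number of disk IOs is bounded by the height of the tree, however many keys each node holds.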

3. Insertion into a B-tree

Building on the tree above, insert element 4. It belongs between 3 and 9.
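A minimal sketch of insertion with node splitting follows. It uses one common strategy: insert into the leaf, then split any node that ends up with m keys and push its median key up to the parent. The starting tree, the names BNode and insert, and the assumption that keys are distinct are all illustrative and not taken from the original figures.

```python
from bisect import bisect_left


class BNode:
    """A B-tree node: sorted keys plus child nodes (no children for a leaf)."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def _insert(node, key, m):
    """Insert key below node; return (median, right_node) if node split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)  # leaf: place the key in sorted order
    else:
        split = _insert(node.children[i], key, m)
        if split:
            median, right = split
            node.keys.insert(i, median)  # median moves up from the child
            node.children.insert(i + 1, right)
    if len(node.keys) < m:  # at most m-1 keys: no overflow
        return None
    mid = len(node.keys) // 2
    median = node.keys[mid]
    right = BNode(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, key, m):
    """Insert key into the order-m tree and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split:
        median, right = split
        return BNode([median], [root, right])  # the tree grows one level
    return root


# Hypothetical order-3 tree whose left leaf [3, 9] is already full.
root = BNode([12], [BNode([3, 9]), BNode([14, 21])])
root = insert(root, 4, 3)

print(root.keys)                        # [4, 12]
print([c.keys for c in root.children])  # [[3], [9], [14, 21]]
```

The leaf overflows to [3, 4, 9], splits into [3] and [9], and the median 4 moves up into the parent, so all leaves stay on the same level.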

                                 

                                     

                                     

 

4. Deletion from a B-tree

Delete element 9:

                                  

 

                                    

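5. Summary

Inserting or deleting an element can force nodes to split or otherwise be restructured, which is sometimes tedious to handle, but it is exactly this that lets a B-tree stay balanced across all its branches at all times. Self-balancing is one of the B-tree's inherent strengths. B-trees are mainly used in file systems and in some database indexes, such as MongoDB's. Most relational databases implement their indexes with B+ trees instead.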
