B-tree and B+ tree

B-tree


1. Basic concepts of the B-tree
The maximum number of children over all nodes in the tree is called the order of the B-tree, usually denoted m. For the sake of search efficiency, m >= 3 is generally required. A B-tree of order m is either an empty tree or an m-way tree that satisfies the following conditions.
1) Each node has at most m branches (subtrees). The minimum number of branches depends on whether the node is the root: a root that is not a leaf has at least two branches, and every non-root, non-leaf node has at least ceil(m/2) branches, where ceil denotes rounding up.
2) A node with n-1 keys has n branches. The n-1 keys are arranged in ascending order.
3) Each node has the following structure:
    n | p0 | k1 | p1 | k2 | p2 | ... | kn | pn

Here n is the number of keys in the node. The keys satisfy ki < ki+1. The pointer pi points to the child node whose keys are all greater than ki and less than ki+1; all keys in the node that p0 points to are less than k1, and all keys in the node that pn points to are greater than kn.

4) The keys within a node are all distinct and arranged in ascending order.
5) All leaf nodes are on the same level. They can be represented by null pointers and mark the positions reached when a search fails.
NOTE: A balanced m-way search tree is an m-way search tree in which, for every key, the heights of the left and right subtrees differ by at most 1; its node structure is the same as the B-tree node structure described above. A B-tree is therefore a balanced m-way search tree with a stricter restriction: all leaf nodes must be on the same level.
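The five conditions can be turned into a small checker. The following Python sketch is only an illustration: the names Node and check are hypothetical, and the null-pointer leaf level is modelled as an empty children list, so the height returned counts only the levels of nodes that hold keys. The sample tree is a hand-built order-5 tree, not the one in the figure.

```python
import math

class Node:
    """A B-tree node: sorted keys and, unless it is on the lowest level, len(keys) + 1 children."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []

def check(node, m, lo=None, hi=None, is_root=True):
    """Return the height of the subtree; raise AssertionError if a condition is violated."""
    n = len(node.keys)
    min_keys = 1 if is_root else math.ceil(m / 2) - 1   # condition 1, expressed in keys
    assert min_keys <= n <= m - 1, "key count out of range"
    assert all(a < b for a, b in zip(node.keys, node.keys[1:])), "keys not ascending"
    # every key must lie strictly inside the interval inherited from the parent
    assert all((lo is None or lo < k) and (hi is None or k < hi) for k in node.keys)
    if not node.children:                                # lowest level: children would be null
        return 1
    assert len(node.children) == n + 1, "branches must equal keys + 1"
    bounds = [lo] + node.keys + [hi]
    heights = {check(c, m, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(node.children)}
    assert len(heights) == 1, "leaves are not all on the same level"
    return 1 + heights.pop()

root = Node([6, 10], [Node([1, 2, 4]), Node([7, 8]), Node([11, 13])])
print(check(root, 5))  # 2
```

A tree such as Node([6], [Node([1, 2]), Node([7])]) fails the check for m = 5, because the right child holds fewer than ceil(5/2) - 1 = 2 keys.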
The definition above may not be easy to grasp on its own, so let's work through an example.
                 

 

 




The figure above shows a B-tree; the leaf nodes at the bottom are not drawn. Let's go through the five properties above one by one:
1) The number of branches of a node equals its number of keys plus 1, and the maximum number of branches is the order of the B-tree, so a node in a B-tree of order m has at most m branches. From this we can see that the tree above is a B-tree of order 5.
2) Because the tree above is a B-tree of order 5, every non-root, non-leaf node must have at least ceil(5/2) = 3 branches. The root does not have to satisfy this condition; in the figure the root has two branches.
3) If the root has no keys, it has no branches and the B-tree is an empty tree. If the root has keys, its number of branches is at least 2, because the number of branches equals the number of keys plus 1.
4) Every node in the figure other than the root has at least 2 keys, because it has at least 3 branches and the number of branches is one more than the number of keys. The keys within each node are ordered, and within one level all keys in a node to the left are smaller than the keys in a node to its right. For example, of the two nodes on the second level, the left node holds the keys 15 and 26, both smaller than the keys 39 and 45 in the right node.
Another very important property of the B-tree: the keys in a lower-level node always fall inside one of the intervals defined by the keys of the node above it, and the pointer leading to the node shows which interval. For example, the keys 15 and 26 in the leftmost node on the second level define three intervals: less than 15, between 15 and 26, and greater than 26. Accordingly, the keys in the left node below it are all less than 15, those in the middle node lie between 15 and 26, and those in the right node are greater than 26.
5) The leaf nodes of the tree above are on the fourth level and represent the positions where a search fails.
2. B-tree lookup
Searching a B-tree is simple. It extends the search in a binary sort tree: a binary sort tree search is a two-way search, while a B-tree search is a multi-way search. Because the keys inside a B-tree node are ordered, the search within a node can be sequential, or it can use binary search to improve efficiency. The steps for finding a key (call it key) are as follows:
1) Compare key with the keys of the root node. If key equals k[i] (k[] is the array of keys in the node), the search succeeds.
2) If key < k[1], continue the search in the subtree pointed to by p[0] (p[] is the array of pointers in the node; keep the internal structure of a B-tree node in mind here).
3) If key > k[n], continue the search in the subtree pointed to by p[n].
4) If k[i] < key < k[i+1], continue the search in the subtree pointed to by p[i].
5) If a null pointer is finally reached, the search has failed.
Taking the B-tree above as an example, suppose we want to find the key 42; the bold part of the figure below shows the search path:
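The five lookup steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: Node and search are hypothetical names, indices are 0-based (so keys[i] corresponds to k[i+1] in the text), a missing child plays the role of the null pointer, and the sample tree is hand-built rather than taken from the figure.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                       # ascending keys
        self.children = list(children) if children else []

def search(node, key):
    """Return (node, index) where key is stored, or None if the search fails."""
    while node is not None:
        i = bisect_left(node.keys, key)              # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node, i                           # step 1: found
        # steps 2-4: key falls in the gap before keys[i], so follow pointer i
        node = node.children[i] if node.children else None
    return None                                      # step 5: reached a null pointer

left = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8, 9])])
right = Node([13, 16], [Node([11, 12]), Node([14, 15]), Node([17, 18, 19, 20])])
root = Node([10], [left, right])

hit = search(root, 15)
print(hit[0].keys, hit[1])   # [14, 15] 1
print(search(root, 42))      # None
```

Using bisect_left keeps the work inside a node logarithmic in the number of keys, which is the binary-search option the text mentions; a plain loop over node.keys would implement the sequential option.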
              

3. B-tree insertion

As with a binary sort tree, a B-tree is built by inserting keys into the tree one at a time.
Before inserting, determine the allowed number of keys per node: if the order of the B-tree is m, each node holds between ceil(m/2)-1 and m-1 keys (the root is exempt from the lower bound).
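As a quick check of that range, the bounds can be computed directly. The helper name key_bounds is hypothetical and only restates the formula above.

```python
import math

def key_bounds(m):
    """Minimum and maximum number of keys in a non-root node of an order-m B-tree."""
    return math.ceil(m / 2) - 1, m - 1

print(key_bounds(5))  # (2, 4)
print(key_bounds(3))  # (1, 2)
```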
To insert a key, first find the insertion position. During a B-tree search, reaching a null pointer means the search has failed, and it also identifies the insertion position: the position is in the lowest-level non-leaf node, as determined by that null pointer. For convenience we call the lowest-level non-leaf nodes terminal nodes, so an inserted key always lands in a terminal node. Insertion may violate the B-tree properties: if the new key makes the number of keys in a node exceed the limit, the node must be split.
Next, we build a B-tree of order 5 from the key sequence {1,2,6,7,11,4,8,13,10,5,17,9,16,20,3,12,14,18,19,15} to see the insertion process in detail.
(1) Determine the allowed number of keys per node.
Because the problem asks for a B-tree of order 5, each node holds 2 to 4 keys.
(2) The root node can hold up to four keys. After the keys 1, 2, 6, 7 are inserted in order, the B-tree is as shown below:

(3) When the key 11 is inserted, the node would hold 5 keys, which is out of range, so the node must be split (each split moves one key up into the parent). The key in the middle position, k[3] = 6, is taken out as a separate node and becomes the new root; the keys to the left and right of 6 form two nodes that become the two branches of the new root. The tree now looks like this:

 

(4) New keys are always inserted into terminal nodes. After the keys 4, 8 and 13 are inserted, the tree is:

 


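(5) The key 10 has to be inserted between 8 and 11, which again pushes the number of keys out of range, so the node must be split. The middle key 10 moves up into the root, and the keys to the left and right of 10 form two new nodes attached to the root. The B-tree after inserting 10 and splitting is shown below: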

   

 

 



(6) The B-tree after inserting the keys 5, 17, 9 and 16 is shown below:

 

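(7) The key 20 is inserted after 17, which pushes the node's key count out of range, so the node is split in the same way as above. The tree becomes: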


 

 



(8) After the keys 3, 12, 14, 18 and 19 are inserted in the same way, the B-tree is as shown below:

 



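(9) Finally the key 15 is inserted. It belongs after 14, which pushes the node's key count out of range, so the node is split and 13 moves up into the root. With 13 added, the root's key count is out of range as well, so the root is split too: 10 becomes the new root, and the keys to the left and right of 10 form two new nodes attached to the new root's pointers. A single insertion that triggers several splits like this is called a chain reaction. The final B-tree is shown below: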
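The nine steps can be replayed with a short Python sketch. It is an illustration under stated assumptions, not a definitive implementation: Node, insert and _insert are hypothetical names, keys are assumed to be distinct, and a node is split after it overflows, with the middle key moved up, as described in the walkthrough.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty for terminal nodes

def _insert(node, key, m):
    """Insert key below node; return (middle_key, right_node) if node was split, else None."""
    if not node.children:                   # terminal node: the key is placed here
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)     # choose the branch, as in lookup
        promoted = _insert(node.children[i], key, m)
        if promoted:                        # the child split: absorb its middle key
            mid_key, right = promoted
            node.keys.insert(i, mid_key)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None
    mid = len(node.keys) // 2               # overflow: split around the middle key
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return mid_key, right

def insert(root, key, m):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted:                            # the root itself split: the tree grows one level
        mid_key, right = promoted
        return Node([mid_key], [root, right])
    return root

root = Node()
for k in [1, 2, 6, 7, 11, 4, 8, 13, 10, 5, 17, 9, 16, 20, 3, 12, 14, 18, 19, 15]:
    root = insert(root, k, 5)

print(root.keys)                                   # [10]
print([c.keys for c in root.children])             # [[3, 6], [13, 16]]
print([g.keys for c in root.children for g in c.children])
# [[1, 2], [4, 5], [7, 8, 9], [11, 12], [14, 15], [17, 18, 19, 20]]
```

The last insertion (15) is the chain reaction from step (9): the terminal node splits and promotes 13, which overflows the root, which then splits and promotes 10.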

 

 

 

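4. B-tree deletion
To delete a key from a B-tree, first find the key. Removing it from its node may violate the B-tree properties: the node may be left with fewer keys than the minimum. In that case the node may need to borrow a key from a sibling, exchange a key with one of its child nodes, or be merged with another node. Exchanging a key with a child of the current node ensures that the actual removal always happens in a terminal node.
We use the B-tree just built as the example and delete the four keys 8, 16, 15 and 4 in turn.
(1) Delete the keys 8 and 16. The key 8 is in a terminal node, and after it is removed the node still has at least 2 keys, so it can be deleted directly. The key 16 is not in a terminal node, but it can be overwritten with 17, after which the original 17 is deleted; this is the key exchange with a child node mentioned above. The exchange cannot use 15 instead, because that would leave the node containing 15 with fewer than 2 keys. The B-tree after deleting 8 and 16 is shown below: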

 

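(2) Delete the key 15. It is in a terminal node but cannot be deleted directly, because the node would be left with fewer than 2 keys. The node has to borrow a key from a sibling, and clearly from its right sibling, because the left sibling already holds the minimum of 2 keys. The key 18 cannot simply be moved into the node that held 15, because that node would then contain a key larger than 17. The correct way to borrow is to overwrite 15 with 17, then overwrite the original 17 with 18, and finally delete the original 18. The B-tree after deleting 15 is shown below: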

 

 



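(3) Delete the key 4. It is in a terminal node, but that node already holds the minimum number of keys and would need to borrow one; however, neither its left nor its right sibling has a key to spare, so nodes must be merged. After deleting 4, either merge the keys 5, 6, 7, 9 into one node attached to the pointer to the right of key 3, or merge the keys 1, 2, 3, 5 into one node attached to the pointer to the left of key 6, as shown below: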

 

 



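Clearly neither result satisfies the B-tree rules, because a non-root node with only two branches has appeared, so merging has to continue. The B-tree after this merge is shown below: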
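The two repair operations used in steps (2) and (3), borrowing through the parent and merging, can be sketched in Python. This is not a complete deletion routine: the function names are hypothetical, the key to delete is assumed to have been removed from its terminal node already, and only the right-hand variants are shown (the left-hand ones are symmetric).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []

def borrow_from_right(parent, i):
    """Child i is one key short: pull the separator down and push the sibling's smallest key up."""
    child, right = parent.children[i], parent.children[i + 1]
    child.keys.append(parent.keys[i])
    parent.keys[i] = right.keys.pop(0)
    if right.children:                       # for non-terminal nodes a subtree moves as well
        child.children.append(right.children.pop(0))

def merge_with_right(parent, i):
    """Merge child i, the separator key and child i + 1 into a single node."""
    child, right = parent.children[i], parent.children[i + 1]
    child.keys += [parent.keys.pop(i)] + right.keys
    child.children += right.children
    parent.children.pop(i + 1)

# Step (2): 15 has been removed from [14, 15], leaving [14] one key short.
parent = Node([13, 17], [Node([11, 12]), Node([14]), Node([18, 19, 20])])
borrow_from_right(parent, 1)
step2 = (list(parent.keys), [c.keys for c in parent.children])
print(step2)   # ([13, 18], [[11, 12], [14, 17], [19, 20]])

# Step (3): 4 has been removed from [4, 5]; both siblings are at the minimum.
left = Node([3, 6], [Node([1, 2]), Node([5]), Node([7, 9])])
root = Node([10], [left, parent])
merge_with_right(left, 1)     # left now has keys [3] and only two branches
merge_with_right(root, 0)     # so it is merged with its sibling through the root
if not root.keys:             # the old root is empty: the tree loses one level
    root = root.children[0]
print(root.keys)                          # [3, 10, 13, 18]
print([c.keys for c in root.children])    # [[1, 2], [5, 6, 7, 9], [11, 12], [14, 17], [19, 20]]
```

This follows the first of the two alternatives in step (3), where 5, 6, 7, 9 are merged; choosing the other alternative would leave 6 instead of 3 in the final root.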

 

 

 

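Sometimes the key to delete is not in a terminal node. We first move the deletion to a terminal node and then proceed according to the cases above. To describe how, we introduce the notion of an adjacent key: for a key a that is not in a terminal node, its adjacent keys are the largest key in its left subtree and the smallest key in its right subtree. To find the left adjacent key of a, follow a's left pointer to the root of that subtree, then follow the pointer to the right of that node's rightmost key, and continue in the same way down to a terminal node; the rightmost key of that terminal node is the adjacent key of a. (This finds the adjacent key to the left of a; the one to the right is found symmetrically.) In the figure below, the adjacent keys of a are d and e; to delete a, replace a with d and then delete d from its terminal node using the cases above.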
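The walk to the adjacent key is short enough to sketch in Python. The function names are hypothetical, and the sample tree is the order-5 tree produced by the insertion example, rebuilt by hand.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []

def left_adjacent(node, i):
    """Largest key in the subtree to the left of node.keys[i]."""
    cur = node.children[i]
    while cur.children:            # keep following the rightmost pointer
        cur = cur.children[-1]
    return cur.keys[-1]

def right_adjacent(node, i):
    """Smallest key in the subtree to the right of node.keys[i]."""
    cur = node.children[i + 1]
    while cur.children:            # keep following the leftmost pointer
        cur = cur.children[0]
    return cur.keys[0]

left = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8, 9])])
right = Node([13, 16], [Node([11, 12]), Node([14, 15]), Node([17, 18, 19, 20])])
root = Node([10], [left, right])

print(left_adjacent(root, 0), right_adjacent(root, 0))     # 9 11
print(left_adjacent(right, 1), right_adjacent(right, 1))   # 15 17
```

The second line matches step (1) of the deletion example: the adjacent keys of 16 are 15 and 17, and 17 is the one chosen because its node can spare a key.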
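5. Applications of the B-tree
The B-tree, a balanced multi-way search tree, was proposed for storing large database files on disk with as few disk accesses as possible. Performance analysis shows that its retrieval efficiency is quite high. To improve on it further, many B-tree variants have been developed, such as the B+ tree.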

 

 

 

B+ tree

  

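The B+ tree is a variant of the B-tree and is also a multi-way search tree:

1. Its definition is essentially the same as that of the B-tree, except for the points below.

2. A non-leaf node has as many subtree pointers as keys.

3. The subtree pointer P[i] of a non-leaf node points to the subtree whose keys lie in [K[i], K[i+1]) (in the B-tree the interval is open).

4. A link pointer is added to every leaf node.

5. Every key appears in a leaf node.

For example (M=3):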

             

 

 



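Searching a B+ tree is essentially the same as searching a B-tree. The difference is that a B+ tree search only hits when it reaches a leaf node (a B-tree search can hit in a non-leaf node). Its performance is likewise equivalent to one binary search over the full set of keys.

Properties of the B+ tree:

1. All keys appear in the linked list of leaf nodes (a dense index), and the keys in the list are in sorted order.

2. A search can never hit in a non-leaf node.

3. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (key) data.

4. It is better suited to file index systems.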
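These properties can be illustrated with a small hand-built B+ tree in Python. The sketch follows the variant defined above (one key per child in a non-leaf node, with P[i] covering [K[i], K[i+1])); the class and function names and the key values are hypothetical, and insertion is left out.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = list(keys)
        self.next = None                   # link pointer to the next leaf

class Internal:
    def __init__(self, children):
        self.children = children
        # one key per child: the smallest key reachable under that child
        self.keys = [c.keys[0] for c in children]

def find_leaf(node, key):
    """Descend to the leaf whose range would contain key; a hit is only possible there."""
    while isinstance(node, Internal):
        i = bisect_right(node.keys, key) - 1     # last child whose first key is <= key
        node = node.children[max(i, 0)]
    return node

def contains(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return i < len(leaf.keys) and leaf.keys[i] == key

def range_scan(root, lo, hi):
    """All keys in [lo, hi], found by one descent followed by a walk along the leaf list."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

leaves = [Leaf([5, 8]), Leaf([10, 15]), Leaf([20, 26]), Leaf([28, 30])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([Internal(leaves[:2]), Internal(leaves[2:])])

print(root.keys)                  # [5, 20]
print(contains(root, 15))         # True
print(contains(root, 9))          # False
print(range_scan(root, 8, 26))    # [8, 10, 15, 20, 26]
```

The range scan touches the index only once, to locate the starting leaf, and then relies on the sorted leaf list, which is the property that makes the structure attractive for file indexes.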

  

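Which one suits MySQL better: B-Tree or B+Tree?

Before discussing how a database chooses between these two data structures, we need one more piece of background: the operating system reads data from disk into memory in units of disk blocks. All data in the same disk block is read at once, rather than only what is needed. Even if only one byte is needed, the disk starts at that position and reads a certain length of data sequentially into memory. The theoretical basis is the well-known principle of locality in computer science: when a piece of data is used, the data near it is usually used soon afterwards.

The read-ahead length is generally an integer multiple of a page. A page is the logical block in which the computer manages storage: hardware and operating systems usually divide main memory and disk storage into consecutive blocks of equal size, and each block is called a page (in many operating systems the page size is typically 4 KB).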

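How should we choose between B-Tree and B+Tree? What are the pros and cons of each?
1. Because the non-leaf nodes of a B-Tree also store the actual data, a lookup can return as soon as the key is found. In a B+Tree all data is in the leaf nodes, so every lookup must go down to a leaf. For a B-Tree and a B+Tree of the same height, looking up a single key is therefore more efficient in the B-Tree.
2. Because all data in a B+Tree is in the leaf nodes and the leaves are linked by pointers, finding the data greater than or less than some key only requires locating that key and then walking the linked list. A B-Tree has to go back up through the nodes above the key's node to continue the search.
3. Each node of a B-Tree (a node can be thought of as a data page here) stores the primary key plus the actual data, whereas the non-leaf nodes of a B+Tree store only keys. Since the size of a page is limited, a B-Tree page holds fewer entries than a B+Tree page. For the same total amount of data the B-Tree is therefore deeper, which increases the number of disk I/O operations per query and hurts query performance.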
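Point 3 can be made concrete with a back-of-the-envelope calculation. Everything in the Python sketch below except the 4 KB page size is an assumption chosen for illustration (8-byte keys, 8-byte pointers, 200 bytes of row data per key, completely full pages), so the results only indicate the trend, not the behaviour of any real database.

```python
import math

PAGE = 4096   # bytes per page, the 4 KB figure mentioned in the text
KEY = 8       # assumed key size in bytes
PTR = 8       # assumed child-pointer size in bytes
ROW = 200     # assumed size of the row data stored with each key

def levels(pages, fanout):
    """Smallest number of index levels whose combined fan-out reaches `pages` pages."""
    h, reach = 0, 1
    while reach < pages:
        reach *= fanout
        h += 1
    return h

def bplus_height(n):
    rows_per_leaf = PAGE // (KEY + ROW)      # rows live only in the leaves
    fanout = PAGE // (KEY + PTR)             # index pages hold keys and pointers only
    return 1 + levels(math.ceil(n / rows_per_leaf), fanout)

def btree_height(n):
    entries = PAGE // (KEY + PTR + ROW)      # every node stores key, pointer and row
    h = 0
    while (entries + 1) ** h - 1 < n:        # a full tree of height h holds (entries + 1)**h - 1 keys
        h += 1
    return h

print(PAGE // (KEY + PTR), PAGE // (KEY + PTR + ROW))          # 256 18
print(bplus_height(10_000_000), btree_height(10_000_000))      # 4 6
```

Under these assumptions an index page of the B+Tree addresses 256 children while a B-Tree page holds only 18 entries, so ten million rows need about 4 page reads per lookup in the B+Tree and about 6 in the B-Tree.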
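Given the comparison above, commonly used relational databases choose the B+Tree as the data structure for storing data. Below we use MySQL's InnoDB storage engine as the example; the principles in other systems such as SQL Server and Oracle are similar.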

This article is excerpted from: https://blog.csdn.net/qq_28584889/article/details/88777393

 

 

 

 


Origin www.cnblogs.com/ljl150/p/11996044.html