B-tree in Detail (Part 2)

introduction

In B-tree in Detail (Part 1), we started from database indexes and compared the search efficiency of a binary search tree with that of a B-tree: the B-tree improves search efficiency by reducing disk IO. With the definition and the overall picture of the B-tree covered there, this part focuses on how search, insertion, and deletion work in a B-tree.

definition

A B-tree is a balanced multiway search tree. A B-tree of order m is either an empty tree or satisfies all of the following conditions:

  • If the root is not a leaf node, it has at least two child nodes (otherwise the tree would degenerate into a single branch); the root holds [1, m-1] elements
  • Each internal node (neither the root nor a leaf) holds [math.ceil(m/2)-1, m-1] elements and has [math.ceil(m/2), m] child nodes
  • Each leaf node holds [math.ceil(m/2)-1, m-1] elements
  • Each node has at most m child nodes.
  • All leaf nodes are on the same layer (the same height).
  • The elements in each node are sorted in ascending order and separate the ranges of the children: every element in the kth child is greater than the (k-1)th element of the node and less than its kth element.

Note:
math.ceil(x) rounds x up to the nearest integer. For example, math.ceil(1.2) returns 2.
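
As a quick check of the conditions above, the bounds can be computed for any order m. This is only a sketch: the function name `btree_bounds` and the dictionary keys are made up for illustration and do not come from any library.

```python
import math

def btree_bounds(m):
    """Return (min, max) element and child counts for an order-m B-tree."""
    half = math.ceil(m / 2)
    return {
        "root_elements": (1, m - 1),
        "root_children": (2, m),               # applies only if the root is not a leaf
        "non_root_elements": (half - 1, m - 1),
        "internal_children": (half, m),
    }

for name, (low, high) in btree_bounds(5).items():
    print(f"{low} <= {name} <= {high}")
```

For m = 5 this gives between 2 and 4 elements in every non-root node and between 3 and 5 children in every internal node.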

Find

Take a simple 3rd-order B-tree as an example and search for the value 5.
[Figure]
First disk IO: read the root node.
[Figure]
First positioning in memory: compare 5 with 9. Since 5 is less than 9, continue in the left child.
[Figure]
Second disk IO: read that child node.
Second positioning in memory: compare 5 with 2 and 6. Since 5 lies between 2 and 6, continue in the middle child.
[Figure]
Third disk IO: read that child node.
[Figure]
Third positioning in memory: compare 5 with 3 and 5. The value is found.
[Figure]
summary:

  • A B-tree does not need fewer comparisons than a binary search tree during a search; when a single node holds many elements, it may even need more. But as the first part noted, the time of one disk IO is enough to execute about 400,000 instructions, so a slight increase in the number of comparisons has a negligible effect.
  • A B-tree needs fewer disk IOs during the search (3 instead of 4, which saves about 9 ms). This is one advantage of the B-tree; another is that it is self-balancing.

supplement:

  • A B-tree uses binary search inside each node, and so does the B+ tree discussed later.
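
The search can be sketched in a few lines of Python. The `Node` class and the `search` function are illustrative names, each node visit stands in for one disk IO, and only the nodes holding 9, then 2 and 6, then 3 and 5 come from the example above; the remaining nodes of the tree are made up so that the tree is complete.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                    # sorted elements of this node
        self.children = children or []      # empty list for a leaf

def search(root, target):
    """Return (found, nodes_read); each node read stands for one disk IO."""
    node, nodes_read = root, 0
    while node is not None:
        nodes_read += 1
        i = bisect_left(node.keys, target)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == target:
            return True, nodes_read
        # Not in this node: descend into the ith child, or stop at a leaf.
        node = node.children[i] if node.children else None
    return False, nodes_read

# Hypothetical 3rd-order tree; only the path 9 -> (2, 6) -> (3, 5) is from the text.
root = Node([9], [
    Node([2, 6], [Node([1]), Node([3, 5]), Node([8])]),
    Node([12], [Node([11]), Node([13, 15])]),
])
print(search(root, 5))   # (True, 3)
print(search(root, 7))   # (False, 3)
```

`bisect_left` returns the position of the first key that is not smaller than the target, which is both the place to check for a match and the index of the child to descend into.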

insert

For a B-tree of order m and height h, inserting an element starts with a search for it. If the element is not already in the tree, the search ends at a leaf node, and the new element is inserted into that leaf.

  • If the node holds fewer than m-1 elements, insert directly;
  • If the node already holds m-1 elements, it must be split: the middle element is moved up into the parent node (with an even count, either of the two middle elements may be chosen), and the elements on each side of it form two separate nodes;
  • Repeat the above until every node satisfies the rules of the B-tree; in the worst case the splits propagate up to the root, a new root node is created, and the height increases by 1;
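
The first two rules can be sketched for a single leaf as follows. The function name `insert_into_leaf` and its return shape are assumptions made for this illustration, and the sketch follows the "insert first, then split" presentation rather than splitting before the insert.

```python
from bisect import insort

def insert_into_leaf(keys, key, m):
    """Insert key into a leaf's sorted key list for an order-m B-tree.

    Returns (keys, None, None) when no split is needed,
    otherwise (left, middle, right), where middle moves up to the parent.
    """
    keys = list(keys)              # work on a copy
    insort(keys, key)              # keep the keys sorted
    if len(keys) <= m - 1:
        return keys, None, None
    mid = len(keys) // 2           # with an even count this picks the upper middle
    return keys[:mid], keys[mid], keys[mid + 1:]

print(insert_into_leaf([8, 14], 11, 5))        # ([8, 11, 14], None, None)
print(insert_into_leaf([1, 3, 7, 14], 8, 5))   # ([1, 3], 7, [8, 14])
```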

Take the 5th order B-tree as an example:

Key points of the 5th order B tree:

  • 2<=Number of root node child nodes<=5
  • 3<=Number of child nodes of inner node<=5
  • 1<=Number of root node elements<=4
  • 2<=Number of non-root node elements<=4

[Figure]
Inserting element [8] into the tree in figure (1) gives figure (2). The root node now holds 5 elements, which violates 1<=number of root node elements<=4, so it is split. (A real implementation splits first and then inserts; here, for illustration, the element is inserted first. The same applies to the steps below and is not repeated.) The middle element [7] moves up into the parent node, and the elements to its left and right form two nodes, as shown in figure (3).
[Figure]
Inserting elements [5], [11], and [17] next requires no split, as shown in figure (4).

[Figure]
Insert element [13].
[Figure]
The node now holds more than the maximum number of elements, so it is split: the middle element [13] is extracted and inserted into the parent node, as shown in figure (6).
[Figure]
Then insert elements [6], [12], [20], and [23].
[Figure]
When [26] is inserted, as shown in figure (7), the rightmost leaf node is already full and must be split. The middle element [20] moves up into the parent node. Note that after the middle element moves up the tree remains balanced, and each of the two nodes produced by the split holds two elements.
[Figure]
When [4] is inserted, the leftmost leaf node is split; [4] happens to be the middle element and moves up into the parent node. Elements [16], [18], [24], and [25] are then inserted one after another without any split.
[Figure]
Finally, when [19] is inserted, the node containing [14], [16], [17], [18] must be split, and the middle element [17] moves up into the parent node. This time the parent node is already full, so it must be split as well: its middle element [13] moves up into a newly created root node, which completes the insertion.
[Figure]
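
The whole walkthrough can be replayed with a small in-memory sketch. The class and function names are illustrative only. The starting keys 1, 3, 7, 14 are an assumption: the text does not list the contents of figure (1), but these four keys are consistent with every split described above. Like the walkthrough, the sketch inserts first and splits afterwards.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty for a leaf

def _insert(node, key, m):
    """Insert key below node; return (middle, right_node) if node was split."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                        # already present, nothing to do
    if not node.children:                  # leaf: insert directly
        node.keys.insert(i, key)
    else:
        promoted = _insert(node.children[i], key, m)
        if promoted is None:
            return None
        middle, right = promoted           # child was split: absorb its middle element
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None
    mid = len(node.keys) // 2              # overfull: split around the middle element
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right

def insert(root, key, m):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    middle, right = promoted
    return Node([middle], [root, right])   # root was split: height grows by 1

def levels(root):
    """Return the keys of every node, level by level, top to bottom."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out

root = Node()
# Assumed starting keys (1, 3, 7, 14), then the insertions from the walkthrough.
for k in [1, 3, 7, 14, 8, 5, 11, 17, 13, 6, 12, 20, 23, 26, 4, 16, 18, 24, 25, 19]:
    root = insert(root, k, 5)
for row in levels(root):
    print(row)
```

Under that assumption the final tree has [13] in the root, [4, 7] and [17, 20] on the second level, and six leaves, which matches the last step of the walkthrough where [13] moves up into a new root.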

delete

First find the element to be deleted in the B-tree. If the element is in a leaf node, delete it directly. If it is in an internal node (that is, it has left and right children), move an adjacent element up from a leaf to take its place, either the rightmost element of the left subtree or the leftmost element of the right subtree, and then handle the leaf that gave up the element. After the removal, check the affected node:

  • If the node now holds fewer than math.ceil(m/2)-1 elements, check whether an adjacent sibling node has elements to spare;
  • If the sibling has elements to spare (it holds more than math.ceil(m/2)-1), borrow through the parent node: the separating element in the parent moves down into the node, and the nearest element of the sibling moves up to replace it;
  • If the adjacent siblings have nothing to spare, that is, each holds exactly math.ceil(m/2)-1 elements, the node and an adjacent sibling are "merged" into one node, together with the separating element from the parent;

Next, we take a 5th-order B-tree as an example to explain deletion in detail.

  • The key point: a node with fewer than 2 elements (math.ceil(5/2)-1) must borrow from a sibling or be merged, and a node with more than 4 elements (m-1) must be split.

Delete [8], [20], [18], and [5] in sequence, as shown in the figure.
[Figure]
First delete element [8]. We first locate it: [8] is in a leaf node, and after the deletion that leaf still holds 2 elements, which satisfies the B-tree rules. The operation is simple: move [11] into the original position of [8] and [12] into the position of [11] (that is, the elements after the deleted one shift forward).
[Figure]
Next, delete [20]. [20] is not in a leaf node but in an internal node, so we find its successor [23] (the next element in ascending order), move [23] up into the position of [20], and then delete [23] from the child node. After this the child node still holds more than 2 elements, so no merge is needed.
[Figure]
The next step is to delete [18]. [18] is in a leaf node that holds only 2 elements, so the deletion leaves 1 element, which is below the minimum of 2. As described earlier, if an adjacent sibling node has elements to spare (more than ceil(5/2)-1=2), the node can borrow through the parent: the separating element in the parent moves down, and the last or first element of that sibling moves up into the parent. In this example the right sibling has elements to spare (3 is greater than 2), so [23] is first moved down from the parent node into the leaf, where it follows [19], which has shifted forward into the position of [18]. Then [24] moves up from the right sibling into the parent node, [24] is removed from the right sibling, and the elements after it shift forward.
[Figure]
The last step is to delete [5], which causes more trouble: the node containing [5] holds exactly the minimum number of elements (ceil(5/2)-1=2), and so do its adjacent siblings. After the deletion the node no longer satisfies the rules and no sibling can lend an element, so the node must be merged with an adjacent sibling. First the element in the parent node that lies between the two nodes to be merged moves down, and then the two nodes are merged into one. In this example, the element [4] in the parent node moves down into the node that now holds only [6], and the node containing [4] and [6] is merged with the adjacent sibling containing [1] and [3].
[Figure]
The delete operation may seem finished, but it is not. In the figure above, the parent node now contains only one element, [7], which is below the minimum (the number of elements K in any non-root node, leaves included, must satisfy 2<=K<=4, and here K=1). If the adjacent sibling of this node had elements to spare, it could borrow through its parent. Here the sibling holds exactly 2 elements, so the two nodes can only be merged: the only element in the root node, [13], moves down into the merged node, and the height of the tree decreases by one level.
[Figure]
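
The borrow-or-merge decision from this walkthrough can be sketched for a parent whose children are leaves. This is a simplified illustration, not a full deletion routine: the function name `fix_underflow`, the list-based representation, and the choice to try the left sibling before the right one are all assumptions.

```python
import math

def fix_underflow(parent_keys, leaves, i, m):
    """Repair leaf i after a deletion left it with too few elements.

    parent_keys holds the parent's sorted elements and leaves holds its leaf
    children as sorted lists; both are modified in place.
    Returns "ok", "borrow-left", "borrow-right" or "merge".
    """
    min_keys = math.ceil(m / 2) - 1
    leaf = leaves[i]
    if len(leaf) >= min_keys:
        return "ok"
    left = leaves[i - 1] if i > 0 else None
    right = leaves[i + 1] if i + 1 < len(leaves) else None
    if left is not None and len(left) > min_keys:
        leaf.insert(0, parent_keys[i - 1])   # separating element moves down
        parent_keys[i - 1] = left.pop()      # sibling's last element moves up
        return "borrow-left"
    if right is not None and len(right) > min_keys:
        leaf.append(parent_keys[i])          # separating element moves down
        parent_keys[i] = right.pop(0)        # sibling's first element moves up
        return "borrow-right"
    # No sibling can lend: merge two leaves around their separating element.
    j = i - 1 if left is not None else i     # index of the left leaf of the pair
    leaves[j] = leaves[j] + [parent_keys.pop(j)] + leaves[j + 1]
    del leaves[j + 1]
    return "merge"

# Deleting [18]: the leaf is left with [19] and the right sibling can lend.
keys, kids = [17, 23], [[14, 16], [19], [24, 25, 26]]
print(fix_underflow(keys, kids, 1, 5), keys, kids)

# Deleting [5]: the leaf is left with [6] and no sibling can lend.
keys, kids = [4, 7], [[1, 3], [6], [11, 12]]
print(fix_underflow(keys, kids, 1, 5), keys, kids)
```

After a merge the parent has lost an element, so a complete implementation would repeat the same check one level up, which is what happens with [7] and [13] in the last step above.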

application

B-trees are mainly used in file systems and in some database indexes, such as the well-known non-relational database MongoDB.

Most relational databases, such as MySQL, use B+ trees as indexes, which will be introduced later.

Reference:
https://zhuanlan.zhihu.com/p/54084335
https://www.cnblogs.com/lianzhilei/p/11250589.html

Origin blog.csdn.net/csdniter/article/details/111614696