B tree (BTree) and B+ tree (B+Tree)

What is a B-tree?

A B-tree is a multi-way balanced search tree.
Balanced means that all subtrees of a node have the same height (all leaf nodes are on the same layer), so the balance factor of every node is 0.
Multi-way means that every node except the root has at least m/2 (rounded up) and at most m forks, where m is the order of the tree, that is, the maximum number of forks (subtrees) a node may have. The root is exempt from the lower bound because a tree that holds only 1 keyword has a root with just 2 forks.

(1) The order of a B-tree is the maximum number of forks (children) that any of its nodes may have.
For example, if a node may have at most 3 forks, the tree is called a 3rd-order B-tree.
(2) The data elements stored in a node are called "keywords". A node with k subtrees holds exactly k-1 keywords.
For example, a node with 3 forks holds exactly 3-1=2 keywords, one in each gap between adjacent forks.
(3) If the root node is not a terminal node, it has at least 2 subtrees: a single keyword is enough to occupy a node, and one keyword produces two forks. (A terminal node is a bottom-level node that actually stores keywords. The "leaf nodes" one layer below it do not physically exist; they represent failed searches and carry no keyword information.)
(4) As in a binary search tree, smaller keywords go to the left and larger ones to the right. Because one node holds several keywords, those keywords divide the key space into several ranges, one per fork. For example, a node holding 6, 9, 10 has four forks: less than 6, between 6 and 9, between 9 and 10, and greater than 10.

The search process of the B tree:

Suppose the root node holds 12, its left child holds 6, 9, 10, and the search target is X=8.
1: Start at the root and compare X with 12. A smaller target goes to the left subtree, a larger one to the right subtree. Here 8<12, so go left.
2: At the root of the left subtree, compare X with 6, 9, 10 in order (binary search also works) and find 6<X<9. Follow the fork between 6 and 9 downward. The pointer is null, meaning this range has no subtree, so the B-tree does not contain the keyword and the search fails.
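
The walk-through above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the `Node` class and `search` function are hypothetical names, and the keywords in the right child (15, 20) are made up, since the example only specifies the root and its left child.

```python
from bisect import bisect_left

class Node:
    """Hypothetical B-tree node: sorted keywords plus one child per fork."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means a terminal node

def search(node, x):
    """Return True if keyword x is stored in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, x)    # binary search inside the node
        if i < len(node.keys) and node.keys[i] == x:
            return True                  # keyword found in this node
        if not node.children:
            return False                 # fork pointer is null: search fails
        node = node.children[i]          # follow the fork for this range

# Root holds 12; left child holds 6, 9, 10; right child is an assumption.
root = Node([12], [Node([6, 9, 10]), Node([15, 20])])
print(search(root, 8))   # 8 falls between 6 and 9, where no subtree exists
print(search(root, 9))
```

Using `bisect_left` inside each node corresponds to the remark that binary search can replace the in-order comparison.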

(5) The data in the node is ordered from left to right.

(6) Except for the root node, every non-leaf node has at least m/2 (rounded up) forks (subtrees), and therefore at least m/2 (rounded up) - 1 keywords.
Why at least m/2 (rounded up) forks? The answer follows from how nodes are split during insertion.
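
As a quick check of these limits, the small helper below computes them for a given order. The function name `btree_bounds` is hypothetical and the values apply to non-root, non-leaf nodes only.

```python
def btree_bounds(m):
    """Fork and keyword limits for a non-root node of an order-m B-tree."""
    min_forks = (m + 1) // 2          # m/2 rounded up, using integer arithmetic
    return {
        "min_forks": min_forks,
        "max_forks": m,
        "min_keys": min_forks - 1,
        "max_keys": m - 1,
    }

print(btree_bounds(5))
# {'min_forks': 3, 'max_forks': 5, 'min_keys': 2, 'max_keys': 4}
```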

B-tree generation and insertion:

Assume the order of a B-tree is m=5, so each node holds at most 4 keywords. To insert a keyword, first use the search procedure above to locate the terminal node where the new keyword belongs, then insert it there. If the node holds at most m-1 keywords after the insertion, the insertion is complete. If it now holds more than m-1 keywords (for example, a terminal node that already had 4 keywords receives a fifth), it no longer satisfies the definition of a 5th-order B-tree and must be split: the middle keyword moves up into the parent node, and the remaining keywords are divided between two nodes.
This explains the lower bound. Every non-root node is produced by a split, and a split happens only when a node has overflowed to m keywords. After the middle keyword moves up, each of the two resulting nodes holds at least m/2 (rounded up) - 1 keywords and therefore has at least m/2 (rounded up) forks.
⚠️: A split does not always make the tree taller. The tree grows by one level only when the split propagates up to the root and the root itself splits. Any split that leaves the root intact does not change the height.
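
A minimal sketch of insertion with splitting, assuming the "insert first, split on overflow" strategy described above. The `Node` and `BTree` classes, the `height` counter, and the choice of the middle index as the split point are all illustrative assumptions, and duplicates are simply ignored.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty for terminal nodes

class BTree:
    def __init__(self, m):
        self.m = m                        # order: maximum forks per node
        self.root = Node()
        self.height = 1

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:             # the root itself overflowed
            mid, right = split
            self.root = Node([mid], [self.root, right])
            self.height += 1              # only a root split adds a level

    def _insert(self, node, key):
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return None                   # keyword already present
        if not node.children:
            node.keys.insert(i, key)      # terminal node: insert in place
        else:
            split = self._insert(node.children[i], key)
            if split is not None:         # child split: absorb its middle key
                mid, right = split
                node.keys.insert(i, mid)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.m - 1:
            return None                   # no overflow, nothing to propagate
        # Overflow: the node now has m keywords, so split around the middle.
        mid_idx = len(node.keys) // 2
        mid = node.keys[mid_idx]
        right = Node(node.keys[mid_idx + 1:], node.children[mid_idx + 1:])
        node.keys = node.keys[:mid_idx]
        node.children = node.children[:mid_idx + 1]
        return mid, right

tree = BTree(5)
for k in range(1, 9):                     # insert 1..8
    tree.insert(k)
print(tree.root.keys, tree.height)        # [3, 6] 2
print([c.keys for c in tree.root.children])   # [[1, 2], [4, 5], [7, 8]]
```

In this run the fifth insertion splits the root and raises the height to 2, while the later split of the right-hand terminal node only adds a keyword to the root and leaves the height unchanged.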

(7) Every node in a B-tree stores, alongside each keyword, the storage address of the corresponding record (the keywords are typically fields such as ID or UserId). So if the target keyword is matched at any layer during a search, the search stops there and the record is read from disk at that address.

(8) Just as insertion can trigger a split, deletion can trigger a merge. The properties of the B-tree must still hold after a deletion, so the procedure depends on where the deleted keyword is located.

B-tree deletion:

Still using the 5th-order B-tree above as an example, suppose the keyword to be deleted is X.
Type 1: X is in a terminal node that holds at least m/2 (rounded up) keywords before the deletion. Delete X directly.

Type 2: X is in a terminal node that holds exactly m/2 (rounded up) - 1 keywords, the minimum allowed, so deleting X would violate the definition of the B-tree and the vacancy must be filled. There are two methods: father-son transposition and pulling the father into the water.

Father-son transposition: if the left or right sibling of the underfull node has a spare keyword (more than the minimum), borrow through the parent. The sibling's keyword adjacent to the node moves up into the parent, and the parent's keyword that separates the two nodes moves down into the underfull node.

Pulling the father into the water: if both siblings hold only the minimum number of keywords and have none to spare, merge the underfull node with one sibling and pull the separating keyword down from the parent into the merged node. If the parent then falls below the minimum, repeat the repair one level up.

Type 3: X is in a non-terminal node. Replace X with its direct predecessor or successor keyword, which always lies in a terminal node, and then delete that predecessor or successor from its terminal node. This reduces the problem to Type 1 or Type 2.
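
The Type 1 and Type 2 cases can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `delete_in_terminal` is hypothetical, terminal nodes are modeled as plain sorted lists, only one parent and its terminal children are considered, and an underflow of the parent after a merge is not handled.

```python
def delete_in_terminal(parent_keys, children, i, x, m):
    """Remove x from terminal node children[i] and repair any underflow.

    parent_keys: keywords of the parent node
    children:    the parent's terminal children, each a sorted list
    Returns the name of the case that was applied.
    """
    min_keys = (m + 1) // 2 - 1               # m/2 rounded up, minus 1
    node = children[i]
    node.remove(x)
    if len(node) >= min_keys:
        return "direct"                       # Type 1: nothing to repair

    # Type 2, father-son transposition: borrow through the parent.
    if i > 0 and len(children[i - 1]) > min_keys:
        left = children[i - 1]
        node.insert(0, parent_keys[i - 1])    # separator moves down
        parent_keys[i - 1] = left.pop()       # left sibling's largest moves up
        return "borrow-left"
    if i + 1 < len(children) and len(children[i + 1]) > min_keys:
        right = children[i + 1]
        node.append(parent_keys[i])           # separator moves down
        parent_keys[i] = right.pop(0)         # right sibling's smallest moves up
        return "borrow-right"

    # Type 2, pulling the father into the water: merge with a sibling.
    j = i - 1 if i > 0 else i                 # merge children[j] and children[j+1]
    merged = children[j] + [parent_keys.pop(j)] + children[j + 1]
    children[j:j + 2] = [merged]
    return "merge"

parent = [10, 20]
kids = [[1, 2, 3], [12, 15], [25, 30]]
print(delete_in_terminal(parent, kids, 1, 15, 5))   # borrow-left
print(parent, kids)   # [3, 20] [[1, 2], [10, 12], [25, 30]]
```

With m=5 the minimum is 2 keywords, so deleting 15 leaves its node underfull, and the left sibling [1, 2, 3] can lend one keyword through the parent.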

Advantages and disadvantages

Advantages: During a query, access to external storage (disk I/O) is the slowest step. A tree that holds a large amount of data usually uses linked storage, which scatters the nodes across the disk, so a lookup needs one seek and read per node visited. A B-tree puts many keywords in each node, which greatly reduces the depth of the tree and therefore the number of I/O operations. Each node also stores its keywords contiguously as an array, so once a node has been read into memory it can be searched quickly.
Disadvantages: Keywords are spread across both non-terminal and terminal nodes, so a range query is inefficient: it requires an in-order traversal that moves up and down the tree. In addition, every node stores the record addresses alongside its keywords, which takes up space and limits how many keywords fit in one node, so disk I/O is not reduced as far as it could be.
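
To get a feel for how much the depth shrinks, the sketch below computes the smallest number of levels an m-way search tree needs for a given number of keywords. The function name `min_height` is hypothetical, and the figures of 1,000,000 keywords and order 100 are made-up values chosen only for illustration.

```python
def min_height(n, m):
    """Smallest number of levels an m-way search tree needs for n keywords."""
    height, capacity = 0, 0
    while capacity < n:
        height += 1
        capacity = m ** height - 1    # a full tree of this height holds m^h - 1 keywords
    return height

print(min_height(1_000_000, 2))     # 20 levels for a binary tree
print(min_height(1_000_000, 100))   # 4 levels for an order-100 tree
```

If each level costs one disk read, the best case drops from 20 reads to 4 in this example.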

B+ tree

The B+ tree is a variant of the B-tree and a very important data structure underlying databases.
(1) Non-leaf nodes do not contain the storage addresses of records.
(2) Each keyword corresponds to one subtree, that is, every keyword has its own fork. So a node with n keywords has n subtrees.

In a B-tree, by contrast, the forks sit between the keywords. This is the most visible difference when deciding whether a tree shown in an exercise is a B-tree or a B+ tree: look at where the forks are.

(3) ⚠️ The root node must have at least two subtrees (two forks), and every branch node must have at least m/2 (rounded up) subtrees.

The root of a B-tree may hold a single keyword, whereas the root of a B+ tree holds at least two.

(4) All keywords, including those that appear in non-leaf nodes, also appear in the leaf nodes. The leaf nodes are sorted by keyword, and adjacent leaf nodes are connected to each other as a linked list, so a B+ tree supports sequential range lookups. A B+ tree also has two head pointers: one points to the root node and the other points to the leaf node with the smallest keyword.
(5) Because the records of a B+ tree can only be reached from the leaf nodes, every search descends to the bottom layer, whereas a B-tree search returns as soon as it finds the target keyword in an intermediate node.
(6) Because the non-leaf nodes of a B+ tree do not store record addresses, one contiguous block of disk space can hold more keywords. The order is therefore larger, the tree is shorter, and the number of disk I/O operations is further reduced.
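
The sequential range lookup from point (4) can be sketched with the leaf layer alone. This is a minimal illustration under stated assumptions: the `Leaf` class and the `link` and `range_query` helpers are hypothetical, and the scan starts from the head pointer of the smallest leaf rather than first descending from the root to the leaf that contains the lower bound, as a real lookup would.

```python
class Leaf:
    """Hypothetical B+ tree leaf: sorted keywords plus a link to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def link(leaves):
    """Chain the leaves into a linked list and return the head (smallest) leaf."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]

def range_query(head, lo, hi):
    """Collect every keyword k with lo <= k <= hi by walking the leaf list."""
    result = []
    leaf = head
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result        # keywords are sorted, so stop early
            if k >= lo:
                result.append(k)
        leaf = leaf.next             # follow the sibling link, no tree traversal
    return result

head = link([Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 11])])
print(range_query(head, 4, 9))       # [5, 7, 9]
```

Once the scan is on the leaf layer it never has to return to the upper levels, which is what makes range queries cheap compared with a B-tree.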

Origin blog.csdn.net/whiteBearClimb/article/details/128131370