B-tree

1. B-tree

1.1 Definition of B-tree

A B-tree is a balanced multi-way search tree. To describe a B-tree we must specify its order, which is the maximum number of child nodes a node may have. The order is usually denoted by the letter m. When m = 2, each node has at most two children, as in the familiar binary search tree.

A B-tree of order m is defined as follows:

1) Each node has at most m-1 keys.

2) The root node has at least one key.

3) Non-root nodes have at least Math.ceil(m/2)-1 keys.

4) The keys in each node are arranged in ascending order. All keys in the left subtree of a key are less than it, and all keys in its right subtree are greater than it.

5) All leaf nodes are on the same level, that is, the path from the root node to every leaf node has the same length.
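The five rules can be turned into a small checker. The following is a minimal sketch, not a reference implementation: the `Node` class and the `check_btree` function are names invented here for illustration, and a leaf is assumed to be a node with an empty `children` list.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means "leaf"


def check_btree(root, m):
    """Return True if the tree under `root` satisfies rules 1-5 for order m."""
    min_keys = ceil(m / 2) - 1
    leaf_depths = set()

    def visit(node, depth, lo, hi, is_root):
        n = len(node.keys)
        if n > m - 1:                              # rule 1
            return False
        if n < (1 if is_root else min_keys):       # rules 2 and 3
            return False
        # Rule 4: keys ascend and stay inside the (lo, hi) window
        # inherited from the ancestors; None means "no bound".
        bounds = [lo] + node.keys + [hi]
        for a, b in zip(bounds, bounds[1:]):
            if a is not None and b is not None and a >= b:
                return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != n + 1:
            return False
        return all(
            visit(child, depth + 1, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    # Rule 5: every leaf was reached at the same depth.
    return visit(root, 0, None, None, True) and len(leaf_depths) == 1


tree = Node([41], [Node([22, 39]), Node([53, 97])])
print(check_btree(tree, 5))  # True
```

The window `(lo, hi)` passed down the recursion is what enforces rule 4 across levels, not just inside a single node.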

The picture above is a B-tree of order 4. In practice the order m of a B-tree is very large (usually greater than 100), so the height of the tree stays small even when it stores a large amount of data. Each node stores keys, the data associated with each key, and pointers to its child nodes. A key together with its data is called a record. For convenience, unless stated otherwise, the text below uses "key" to stand for the whole (key, value) pair. Databases use the B-tree (and the B+ tree) as an index structure to speed up queries. In that setting the key in the B-tree is the index key, and the data is the logical address on disk of the entry that the key identifies.
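To make the claim about height concrete, here is a rough sketch that bounds the number of levels from the minimum-occupancy rules. It relies on the fact that the sparsest tree with h levels has one key in the root and Math.ceil(m/2)-1 keys in every other node, which gives at least 2 * t^(h-1) - 1 keys with t = Math.ceil(m/2). The function name `max_levels` is made up for this example.

```python
from math import ceil


def max_levels(n_keys, m):
    """Largest number of levels an order-m B-tree with n_keys keys can have."""
    if m < 3 or n_keys < 1:
        raise ValueError("this sketch assumes m >= 3 and at least one key")
    t = ceil(m / 2)  # minimum number of children of a non-root internal node
    levels = 1
    # A tree with levels + 1 levels needs at least 2 * t**levels - 1 keys.
    while 2 * t ** levels - 1 <= n_keys:
        levels += 1
    return levels


print(max_levels(10**9, 101))  # 6
```

Under these assumptions, a billion keys in a B-tree of order 101 never need more than 6 levels.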

1.2 Insertion operation of B-tree

The insert operation inserts a record, that is, a (key, value) pair. If the key already exists in the B-tree, the old value is replaced with the new one. If the key does not exist in the B-tree, the insertion always takes place in a leaf node.

1) Using the key to be inserted, find the leaf node where it belongs and insert it there.

2) Check whether the number of keys in the current node is less than or equal to m-1. If so, the insertion ends; otherwise go to step 3.

3) Split the node into a left and a right part around its middle key and insert the middle key into the parent node. The left child pointer of this key points to the left part of the split, and its right child pointer points to the right part. Then make the parent node the current node and return to step 2.
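The three steps above can be sketched in Python as follows. This is an illustrative sketch under a few assumptions rather than a definitive implementation: the `Node` and `BTree` classes are invented for the example, a node is allowed to hold m keys for a moment before it is split, every node keeps a reference to its parent so the loop can walk upward without recursion, and the split key is the one at index `len(keys) // 2`.

```python
from bisect import bisect_left


class Node:
    def __init__(self, parent=None):
        self.keys = []        # sorted keys; may briefly hold m keys before a split
        self.values = []      # values[i] belongs to keys[i]
        self.children = []    # empty for a leaf, otherwise len(keys) + 1 nodes
        self.parent = parent


class BTree:
    def __init__(self, m):
        self.m = m
        self.root = Node()

    def insert(self, key, value=None):
        node = self.root
        while True:                                  # step 1: walk down to a leaf
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value               # existing key: replace the value
                return
            if not node.children:
                break
            node = node.children[i]
        node.keys.insert(i, key)
        node.values.insert(i, value)
        while len(node.keys) > self.m - 1:           # step 2: overflow check
            node = self._split(node)                 # step 3: split, continue at parent

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(node.parent)
        right.keys, right.values = node.keys[mid + 1:], node.values[mid + 1:]
        right.children = node.children[mid + 1:]
        for child in right.children:
            child.parent = right
        up_key, up_value = node.keys[mid], node.values[mid]
        node.keys, node.values = node.keys[:mid], node.values[:mid]
        node.children = node.children[:mid + 1]
        parent = node.parent
        if parent is None:                           # splitting the root adds a level
            parent = Node()
            parent.children = [node]
            node.parent = right.parent = parent
            self.root = parent
        i = parent.children.index(node)
        parent.keys.insert(i, up_key)
        parent.values.insert(i, up_value)
        parent.children.insert(i + 1, right)
        return parent


tree = BTree(5)
for k in (39, 22, 97, 41, 53):
    tree.insert(k)
print(tree.root.keys)                        # [41]
print([c.keys for c in tree.root.children])  # [[22, 39], [53, 97]]
```

Returning the parent from `_split` lets `insert` repeat the overflow check one level higher, which is the loop between steps 2 and 3.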

The following uses a B-tree of order 5 as an example to walk through the insertion operation. In a B-tree of order 5, a node has at most 4 keys and, unless it is the root, at least 2 keys.

a) Insert 39 in the empty tree

At this point the root node holds one key, and the root node is also a leaf node.

b) Continue to insert 22, 97 and 41

The root node has 4 keys at this time

c) continue to insert 53

After the insertion the node holds 5 keys, which exceeds the maximum of 4, so it is split around the key 41. The result is shown in the following figure. After the split, the current-node pointer points to the parent node, which satisfies the B-tree conditions, so the insertion ends. When the order m is even, an overfull node has no single middle key, so we can split around either of the two keys nearest the middle.

d) Inserting 13, 21, and 40 in turn will also cause splitting, and the result is shown in the figure below.

e) Insert 30, 27, 33; 36, 35, 34; 24, 29 in turn, and the result is shown in the figure below.

f) Insert the record with the key value of 26, and the result after insertion is shown in the following figure.

The current node must be split around 27, and 27 is carried up into the parent node. The current-node pointer then points to the parent node, and the result is shown in the following figure.

After the carry, the current node (that is, the root node) also needs to be split, and the result of the split is shown in the following figure.

After the split, the current node points to the new root, and no adjustment is required at this time.

g) Finally, insert records with keys 17, 28, 29, 31, and 32 in sequence, and the result is shown in the following figure.

When implementing a B-tree, it is convenient to give the record array of each node a length of m instead of m-1. Then, when a split in a lower node pushes a record up, the upper node has a spare slot to hold it before it is split in turn. Each node can also store a reference to its parent node, so the algorithm does not have to be written recursively.

In general, for a given m and a given record type, the node size is fixed no matter how many records the node actually stores. Allocating fixed-size nodes can waste space. Take the node that holds the keys 28 and 29: two of its key slots are unused, yet no further value can be inserted into it, because its predecessor key is 27, its successor key is 30, and every integer in between is already used. Similarly, if the records are first sorted by key and then inserted into the B-tree, node utilization is very low, only 50% in the worst case.
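The 50% figure can be reproduced with a small simulation. The sketch below tracks only the leaf level and assumes the keys arrive in strictly ascending order, so every insert lands in the rightmost leaf and a leaf left behind by a split is never touched again. The function name `sorted_insert_leaf_fill` is hypothetical.

```python
def sorted_insert_leaf_fill(n_keys, m):
    """Leaf utilization after inserting n_keys ascending keys into an order-m B-tree."""
    finished = []   # key counts of leaves that will never receive another key
    current = 0     # key count of the rightmost leaf
    for _ in range(n_keys):
        current += 1
        if current > m - 1:              # overflow: split around the middle key
            mid = current // 2
            finished.append(mid)         # left part stays behind
            current = current - mid - 1  # middle key moves up, the rest stays right
    leaves = finished + [current]
    return sum(leaves) / (len(leaves) * (m - 1))


print(round(sorted_insert_leaf_fill(1000, 5), 3))  # 0.502
```

For order 5, every leaf except the last ends up with 2 of its 4 slots filled, which is where the roughly 50% utilization comes from.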

1.3 Delete operation of B-tree

The delete operation deletes a record by its key. If the B-tree contains no record with that key, the deletion fails.

1) If the key to be deleted is in a non-leaf node, overwrite it with its successor key (meaning the successor record), and then delete the successor key from the subtree that contains it. The successor key is always in a leaf node, so this is similar to deleting a node from a binary search tree. After deleting the record, go to step 2.

2) If the number of keys in the node is greater than or equal to Math.ceil(m/2)-1, the deletion ends; otherwise go to step 3.

3) If a sibling node has more than Math.ceil(m/2)-1 keys, move the key in the parent node down into this node and move a key from the sibling node up into the parent, and the deletion ends.

Otherwise, move the key in the parent node down and merge it with the keys of the current node and its sibling to form a new node. The two child pointers of that key in the parent node become a single child pointer to the new node. Then make the parent node the current node and repeat step 2.

Some nodes have both a left sibling and a right sibling; in that case either sibling may be chosen.
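Step 3 is the part that is easiest to get wrong, so here is a minimal sketch of it for the leaf level only. It makes several simplifying assumptions: the parent is represented by a plain list of keys `parent_keys`, each child is a plain sorted list of keys, and `fix_underflow` is a name invented for this example. For internal nodes the child pointers would have to move together with the keys, which this sketch does not model.

```python
def fix_underflow(parent_keys, children, i, min_keys):
    """Repair children[i], a leaf that has dropped below min_keys keys.

    Returns "borrow-left", "borrow-right" or "merge" to show what happened.
    """
    node = children[i]
    if i > 0 and len(children[i - 1]) > min_keys:
        left = children[i - 1]
        node.insert(0, parent_keys[i - 1])   # parent key moves down
        parent_keys[i - 1] = left.pop()      # sibling's largest key moves up
        return "borrow-left"
    if i + 1 < len(children) and len(children[i + 1]) > min_keys:
        right = children[i + 1]
        node.append(parent_keys[i])          # parent key moves down
        parent_keys[i] = right.pop(0)        # sibling's smallest key moves up
        return "borrow-right"
    j = i - 1 if i > 0 else i                # merge children[j] and children[j + 1]
    merged = children[j] + [parent_keys.pop(j)] + children[j + 1]
    children[j:j + 2] = [merged]
    return "merge"


parent_keys = [28]
children = [[22, 24, 26], [29]]
print(fix_underflow(parent_keys, children, 1, 2))  # borrow-left
print(parent_keys, children)                       # [26] [[22, 24], [28, 29]]
```

After a merge the parent has lost a key, so the caller still has to run the check of step 2 on the parent.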

The following uses a B-tree of order 5 as an example to walk through the deletion operation. In a B-tree of order 5, a node has at most 4 keys and, unless it is the root, at least 2 keys.

a) original state

b) Delete 21 from the B-tree above. After the deletion, the node still has at least 2 keys, so the deletion ends.

c) Next, delete 27. As the figure above shows, 27 is in a non-leaf node, so it is replaced by its successor. The successor of 27 is 28, so we replace 27 with 28 and then delete 28 from the right child node of 28 (formerly 27). The result after deletion is shown in the figure below.

After the deletion, the current leaf node has fewer than 2 records, while its left sibling has 3 records, so we can borrow a key from that sibling. (The current node also has a right sibling, and choosing it would lead to a merge instead. Either choice is valid, but the final shape of the B-tree differs.) So 28 in the parent node moves down, 26 in the sibling node moves up, and the deletion ends. The result is shown in the figure below.

d) Next, delete 32. The result is shown below.

After the deletion, the current node has only one key, and its sibling has only 2 keys. Therefore the only option is to move 30 in the parent node down and merge it with the keys of the two child nodes into a new node. The current-node pointer then points to the parent node. The result is shown in the figure below.

The number of keys of the current node satisfies the condition, so the deletion ends.

e) In the above case, we then delete the record whose key is 40, and the result after deletion is shown in the following figure.

As before, the current node has fewer than 2 records and its sibling has no key to spare, so the key in the parent node is moved down and merged with the current node and the sibling (here we choose the left sibling; the right sibling would also work). After the merge, the current-node pointer points to the parent node.

Likewise, the current node can only be merged again, and the final result is as follows.

After the merge, the current node satisfies the conditions, and the deletion ends.

2. B+ tree

2.1 Definition of B+ tree

Different sources define the B+ tree in different ways. One definition gives a node as many keys as it has child nodes. Here we adopt the definition used on Wikipedia, in which the number of keys is one less than the number of child nodes, the same as in the B-tree. The picture above is a B+ tree of order 4.

In addition, the B+ tree has the following requirements.

1) The B+ tree contains two types of nodes: internal nodes (also called index nodes) and leaf nodes. The root node can be either an internal node or a leaf node, and it may have as few as one key.

2) The biggest difference between the B+ tree and the B-tree is that internal nodes do not store data. They are used only for indexing, and all data (records) are stored in the leaf nodes.

3) In a B+ tree of order m, an internal node has at most m-1 keys (that is, at most m subtrees), and the order m also limits a leaf node to at most m-1 records.

4) The keys in an internal node are arranged in ascending order. For any key in an internal node, all keys in its left subtree are less than it, and all keys in its right subtree are greater than or equal to it. The records in a leaf node are also arranged in key order.

5) Each leaf node has a pointer to the adjacent leaf node, so the leaf nodes themselves are linked in ascending key order.
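Requirement 5 is what makes range queries cheap: once the first leaf is found, the scan just follows the leaf links. The sketch below illustrates this under simplifying assumptions. The `Leaf` class and the `link` and `range_scan` helpers are hypothetical names, and the scan starts from a leaf handed in by the caller instead of descending through index nodes.

```python
class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys
        self.values = values  # values[i] belongs to keys[i]
        self.next = None      # right neighbor in the leaf chain


def link(leaves):
    """Chain the leaves from left to right and return the first one."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]


def range_scan(leaf, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi, starting at `leaf`."""
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return        # keys only grow from here on
            if k >= lo:
                yield k, v
        leaf = leaf.next


first = link([
    Leaf([5, 8], ["a", "b"]),
    Leaf([10, 15], ["c", "d"]),
    Leaf([16, 17, 18], ["e", "f", "g"]),
])
print(list(range_scan(first, 8, 16)))  # [(8, 'b'), (10, 'c'), (15, 'd'), (16, 'e')]
```

No index node is visited during the scan itself, which a B-tree without leaf links cannot offer.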

2.2 Insertion operation of B+ tree

1) If it is an empty tree, create a leaf node, and then insert the record into it. At this time, the leaf node is also the root node, and the insertion operation ends.

2) For leaf nodes: use the key to find the leaf node and insert the record into it. If the leaf then has at most m-1 keys, the insertion ends. Otherwise, split the leaf into a left and a right leaf node: the left leaf keeps the first m/2 records (integer division), the right leaf keeps the remaining records, and the key of record m/2+1, which is the first record of the right leaf, is carried up into the parent node (the parent must be an index node). The left child pointer of the carried key points to the left leaf, and its right child pointer points to the right leaf. Make the parent node the current node and go to step 3.

3) For index nodes: if the current node has at most m-1 keys, the insertion ends. Otherwise, split the index node into two index nodes: the left node keeps the first (m-1)/2 keys, the key that follows them is carried up into the parent node, and the right node keeps the remaining m-(m-1)/2-1 keys (integer division throughout). The left child pointer of the carried key points to the left node, and its right child pointer points to the right node. Make the parent node the current node and repeat step 3.
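The two kinds of split differ in one detail: a leaf split copies a key up while the record stays in the leaf, whereas an index split moves a key up and keeps it in neither part. A minimal sketch of both, assuming the overfull node holds exactly m entries and using the hypothetical helper names `split_leaf` and `split_index`:

```python
def split_leaf(records, m):
    """Split an overfull leaf that holds m (key, value) records."""
    mid = m // 2
    left, right = records[:mid], records[mid:]
    separator = right[0][0]   # key copied up; the record itself stays in the leaf
    return left, separator, right


def split_index(keys, children, m):
    """Split an overfull index node that holds m keys and m + 1 children."""
    mid = (m - 1) // 2
    up = keys[mid]            # key moved up; it is kept in neither part
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up, right


records = [(k, f"row{k}") for k in (5, 8, 10, 15, 16)]
left, separator, right = split_leaf(records, 5)
print([k for k, _ in left], separator, [k for k, _ in right])
# [5, 8] 10 [10, 15, 16]

# The index keys and child names below are made-up illustration data.
print(split_index([7, 10, 16, 20, 30], ["c0", "c1", "c2", "c3", "c4", "c5"], 5))
# (([7, 10], ['c0', 'c1', 'c2']), 16, ([20, 30], ['c3', 'c4', 'c5']))
```

Because the separator of a leaf split is also the first key of the right leaf, the rule "keys in the right subtree are greater than or equal to the key" holds immediately after the split.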

The following shows the insertion process for a B+ tree of order 5. A node of a B+ tree of order 5 has at least 2 keys and at most 4 keys.

a) Insert 5 in empty tree

b) Insert 8, 10, 15 in sequence

c) Insert 16

After 16 is inserted, the number of keys exceeds the limit, so the node must be split. When the leaf node is split, the left node gets 2 records and the right node gets 3 records, the middle key, 10, becomes the key in the index node, and the current-node pointer points to the parent node (the root) after the split. The result is shown in the figure below.

Of course, we have another way of splitting, giving 3 records to the left node and 2 records to the right node. At this time, the key in the index node becomes 15.

d) Insert 17

e) Insert 18, as shown below after inserting

The current node now holds 5 keys, which exceeds the maximum of 4, so it is split into two nodes: the left node has 2 records and the right node has 3 records. The key 16 is carried up into the parent node (an index node), and the current-node pointer points to the parent node.

The number of keywords in the current node satisfies the condition, and the insertion ends.

f) After inserting some data

g) Insert 7 in the picture above, the result is shown in the picture below

The number of keys in the current node exceeds 4, so it must be split. The left node gets 2 records and the right node gets 3 records. After the split, the key 7 enters the parent node, and the current-node pointer points to the parent node. The result is shown in the following figure.

The number of keys in the current node again exceeds 4, so the splitting continues. The left node gets 2 keys, the right node gets 2 keys, and the key 16 enters the parent node, which becomes the current node. The result is shown in the following figure.

The number of keywords in the current node satisfies the condition, and the insertion ends.

2.3 Delete operation of B+ tree

If the leaf node does not contain the key, the deletion fails. Otherwise perform the following steps.

1) Delete the key from the leaf node. If the node then has at least Math.ceil(m/2)-1 keys, the deletion ends; otherwise go to step 2.

2) If a sibling node has keys to spare (more than Math.ceil(m/2)-1), borrow a record from the sibling and replace the key in the parent node (the common parent of the current node and the sibling) with the borrowed key. The deletion ends. Otherwise go to step 3.

3) If no sibling node has a key to spare, merge the current node and a sibling node into a new leaf node and delete the key between them from the parent node (the two child pointers on either side of that key become a single pointer to the new leaf node). Make the parent node, which must be an index node, the current node and go to step 4. From step 4 on, the procedure is the same as for the B-tree and serves mainly to update the index nodes.

4) If the index node has at least Math.ceil(m/2)-1 keys, the deletion ends. Otherwise go to step 5.

5) If a sibling node has keys to spare, move the key in the parent node down and a key from the sibling node up, and the deletion ends. Otherwise go to step 6.

6) Merge the current node, the sibling node, and the key moved down from the parent node into a new node. Make the parent node the current node and repeat step 4.
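Steps 2 and 3 differ from their B-tree counterparts in that no key comes down from the parent: a borrow only rewrites the separator, and a merge simply drops it. The following sketch shows this for leaves under simplifying assumptions: leaves are plain sorted lists of keys, `parent_keys` is the key list of their common parent, the links between leaves are not modeled, and `fix_leaf_underflow` is a made-up name. When borrowing from the right sibling, the sketch sets the separator to the sibling's new first key, which keeps the "greater than or equal to" rule for right subtrees.

```python
def fix_leaf_underflow(parent_keys, leaves, i, min_keys):
    """Repair leaves[i], a B+ tree leaf that has dropped below min_keys keys."""
    node = leaves[i]
    if i > 0 and len(leaves[i - 1]) > min_keys:
        node.insert(0, leaves[i - 1].pop())    # borrow the left sibling's largest key
        parent_keys[i - 1] = node[0]           # separator becomes the borrowed key
        return "borrow-left"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:
        node.append(leaves[i + 1].pop(0))      # borrow the right sibling's smallest key
        parent_keys[i] = leaves[i + 1][0]      # separator becomes the sibling's new first key
        return "borrow-right"
    j = i - 1 if i > 0 else i                  # merge leaves[j] and leaves[j + 1]
    leaves[j:j + 2] = [leaves[j] + leaves[j + 1]]
    del parent_keys[j]                         # the separator is dropped, not moved down
    return "merge"


parent_keys = [10]
leaves = [[7, 8, 9], [10]]
print(fix_leaf_underflow(parent_keys, leaves, 1, 2))  # borrow-left
print(parent_keys, leaves)                            # [9] [[7, 8], [9, 10]]
```

After a merge the parent index node has one key fewer, so the caller continues with step 4 on the parent.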

Note that after the deletion operation of the B+ tree, the key that exists in the index node does not necessarily have a corresponding record in the leaf node.
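A consequence of this note is that a lookup can never stop at an index node, even when the search key equals an index key: it must always descend to a leaf. A small sketch of such a lookup, using a hypothetical `BPlusNode` class and made-up data in which the record with key 7 has been deleted while 7 remains in the index node:

```python
from bisect import bisect_right


class BPlusNode:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys
        self.children = children  # list of nodes for an index node, None for a leaf
        self.values = values      # list of values for a leaf, None for an index node


def search(node, key):
    """Return the value stored for `key`, or None if no leaf holds it."""
    while node.children is not None:
        # Keys equal to an index key live in the right subtree,
        # so bisect_right selects the correct child.
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None


root = BPlusNode([7], children=[
    BPlusNode([5, 6], values=["five", "six"]),
    BPlusNode([8, 9], values=["eight", "nine"]),
])
print(search(root, 7))  # None
print(search(root, 8))  # eight
```

The index key 7 still routes searches correctly even though no record with that key exists.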

The following shows the deletion process for a B+ tree of order 5. A node of a B+ tree of order 5 has at least 2 keys and at most 4 keys.

a) initial state

b) Delete 22, the result after deletion is as shown below

After deletion, the number of keys in the leaf node is greater than or equal to 2, and the deletion ends

c) Delete 15, the result after deletion is shown in the following figure

After deletion, the current node has only one key, which does not meet the conditions, while the sibling node has three keys. You can borrow a record with a key of 9 from the sibling node, and at the same time update the key in the parent node from 10 to 9. The deletion ends.

d) Delete 7, the result after deletion is shown in the following figure

The current node has fewer than 2 keys, and its left sibling has no key to spare (the current node also has a right sibling, and either one may be chosen; here we choose the left one). So the current node and the sibling are merged, the key in the parent node is deleted, and the current-node pointer points to the parent node.

The current node now has fewer than 2 keys, and its sibling has no key to spare, so the key in the parent node is moved down and merged with the two child nodes. The result is shown in the following figure.
