B-Tree and B+Tree

At present, most database systems and file systems use a B-Tree or its variant, the B+Tree, as their index structure. The next section of this article draws on the principles of memory and of computer storage access to explain why B-Tree and B+Tree are so widely used for indexes; this section describes them purely from a data structure perspective.

B-Tree

To describe a B-Tree, first define a data record as a two-tuple [key, data]: key is the key value of the record, and different data records have different keys; data is the rest of the record apart from the key. A B-Tree is then a data structure that satisfies the following conditions:

d is a positive integer greater than 1, called the degree of B-Tree.

h is a positive integer called the height of the B-Tree.

Each non-leaf node consists of n-1 keys and n pointers, where d<=n<=2d.

Each leaf node contains at least one key and two pointers, and at most 2d-1 keys and 2d pointers, and the pointers of the leaf nodes are all null.

All leaf nodes have the same depth, equal to the tree height h.

Keys and pointers alternate within a node, and both ends of the node are pointers.

The keys in a node are arranged non-decreasingly from left to right.

All nodes form a tree structure.

Each pointer is either null or points to another node.

If a pointer is the leftmost pointer of a node and is not null, then all keys in the node it points to are less than v(key1), where v(key1) is the value of the first key of the node.

If a pointer is the rightmost pointer of a node and is not null, then all keys in the node it points to are greater than v(keym), where v(keym) is the value of the last key of the node.

If a pointer lies between the adjacent keys keyi and keyi+1 of a node and is not null, then all keys in the node it points to are greater than v(keyi) and less than v(keyi+1).
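The conditions above can be made concrete with a small sketch. The class name BTreeNode, its field names, and the helper check_node are assumptions made for illustration and do not come from the article. The helper checks only the per-node shape and the key ordering rules; it does not check the lower bound d on pointers in non-leaf nodes or that all leaves have the same depth.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class BTreeNode:
    # Hypothetical layout: keys[i] pairs with data[i]; children holds
    # len(keys) + 1 pointers, all None in a leaf.
    keys: List[int]
    data: List[Any]
    children: List[Optional["BTreeNode"]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.children:
            self.children = [None] * (len(self.keys) + 1)


def check_node(node: BTreeNode, d: int,
               lo: Optional[int] = None, hi: Optional[int] = None) -> bool:
    """Return True if the subtree under `node` obeys the ordering rules."""
    n_ptr = len(node.children)
    if n_ptr != len(node.keys) + 1 or len(node.data) != len(node.keys):
        return False
    if not 2 <= n_ptr <= 2 * d:  # at least one key, at most 2d pointers
        return False
    # Keys in a node are non-decreasing from left to right.
    if any(a > b for a, b in zip(node.keys, node.keys[1:])):
        return False
    # Keys must lie strictly inside the interval set by the parent.
    if lo is not None and node.keys[0] <= lo:
        return False
    if hi is not None and node.keys[-1] >= hi:
        return False
    # Each child is bounded by the keys on either side of its pointer.
    bounds = [lo] + node.keys + [hi]
    for i, child in enumerate(node.children):
        if child is not None and not check_node(child, d, bounds[i], bounds[i + 1]):
            return False
    return True


root = BTreeNode([15], ["r"], [BTreeNode([5, 10], ["a", "b"]), BTreeNode([20], ["c"])])
print(check_node(root, 2))  # True
```

Moving the key 20 into the left child would make the check fail, because the leftmost pointer may only lead to keys smaller than 15.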

Figure 2 is a schematic diagram of a B-Tree with d=2.

Figure 2

Because of these properties, the algorithm for retrieving data by key in a B-Tree is very intuitive: starting from the root node, perform a binary search within the node; if the key is found, return the data of the corresponding record; otherwise, recursively search the node pointed to by the pointer for the corresponding interval, until the key is found or a null pointer is reached. The former succeeds and the latter fails. The pseudocode of the search algorithm on a B-Tree is as follows:

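BTree_Search(node, key) {
    if(node == null) return null;
    // Scan the keys of this node from left to right
    // (a linear scan is shown for brevity; a binary search works as well).
    for(i = 0; i < node.key_count; i++)
    {
        if(node.key[i] == key) return node.data[i];
        // key falls in the interval left of key[i]: descend into that child.
        if(node.key[i] > key) return BTree_Search(node.point[i], key);
    }
    // key is greater than every key in this node: descend into the last child.
    return BTree_Search(node.point[node.key_count], key);
}
data = BTree_Search(root, my_key);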
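For readers who want something runnable, here is a minimal Python sketch of the same search. It uses the standard bisect module for the binary search inside each node. The Node class and its field names are assumptions made for this example, not part of the article.

```python
from bisect import bisect_left
from typing import Any, List, Optional


class Node:
    # Hypothetical node: keys[i] pairs with data[i]; children is empty in
    # a leaf, otherwise it holds len(keys) + 1 child nodes.
    def __init__(self, keys: List[int], data: List[Any],
                 children: Optional[List["Node"]] = None) -> None:
        self.keys = keys
        self.data = data
        self.children = children or []


def btree_search(node: Optional[Node], key: int) -> Any:
    """Return the data stored under `key`, or None if the key is absent."""
    while node is not None:
        # Binary search: index of the first key that is >= the target.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.data[i]
        # Not in this node: descend into the child covering the interval,
        # or stop if this node is a leaf.
        node = node.children[i] if node.children else None
    return None


root = Node([15], ["d15"], [Node([5, 10], ["d5", "d10"]),
                            Node([20, 30], ["d20", "d30"])])
print(btree_search(root, 20))  # d20
print(btree_search(root, 7))   # None
```

The loop replaces the recursion of the pseudocode, which is a common choice since the search only ever follows one path from the root downward.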

B-Tree has a number of interesting properties. For example, for a B-Tree of degree d that indexes N keys, the upper limit of the tree height h is log_d((N+1)/2), and the asymptotic complexity of retrieving a key, measured in the number of nodes visited, is O(log_d N). This shows that the B-Tree is a very efficient index data structure.
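To get a feel for how slowly the height grows, the bound can be evaluated directly. The snippet below simply computes the formula as stated in the text; the function name height_bound and the sample numbers are chosen for illustration only.

```python
import math


def height_bound(n_keys: int, d: int) -> float:
    """Evaluate log_d((N + 1) / 2), the height bound quoted in the text."""
    return math.log((n_keys + 1) / 2, d)


# With degree d = 100 and N = 1,999,999 keys, (N + 1) / 2 is 10**6,
# and log base 100 of 10**6 is 3.
print(round(height_bound(1_999_999, 100), 6))  # 3.0
```

In other words, under this bound an index of roughly two million keys with d = 100 stays only a few levels deep.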

In addition, because inserting and deleting data records can break the properties of the B-Tree, insertion and deletion must perform operations such as splitting, merging, and transferring on the tree to maintain those properties. This article does not intend to discuss B-Tree in full: there is already plenty of material describing the mathematical properties of B-Tree and the insertion and deletion algorithms in detail, and interested readers can find it in the References section at the end of this article.

B+Tree

There are many variants of B-Tree, the most common of which is B+Tree. For example, MySQL generally uses B+Tree to implement its index structure.

Compared with B-Tree, B+Tree has the following differences:

The upper bound on the number of pointers per node is 2d rather than 2d+1.

Inner nodes do not store data, only keys; leaf nodes do not store pointers.

Figure 3 is a simple B+Tree schematic.

Figure 3

Since not all nodes have the same fields, leaf nodes and inner nodes in a B+Tree are generally of different sizes. This differs from B-Tree: although different nodes in a B-Tree may store different numbers of keys and pointers, every node has the same fields and the same upper limits, so implementations of B-Tree often allocate the same amount of space for each node.
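The difference in fields can be sketched with two separate node types. The names LeafNode, InnerNode, and bplus_search are hypothetical, and the rule for routing a key that equals a separator key (it goes to the right child here) is an assumption, since the article does not spell it out.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class LeafNode:
    keys: List[int]
    data: List[Any]  # leaves carry the records and have no child pointers


@dataclass
class InnerNode:
    keys: List[int]
    children: List[Union["InnerNode", LeafNode]]  # len(keys) + 1 entries, no data


def bplus_search(node: Union[InnerNode, LeafNode], key: int) -> Any:
    """Walk down to a leaf, then look the key up there."""
    # Assumed convention: children[i] holds keys < keys[i],
    # children[i + 1] holds keys >= keys[i].
    while isinstance(node, InnerNode):
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data[i]
    return None


tree = InnerNode([15], [LeafNode([5, 10], ["a", "b"]),
                        LeafNode([15, 20], ["c", "d"])])
print(bplus_search(tree, 15))  # c
print(bplus_search(tree, 12))  # None
```

Unlike the B-Tree search, a lookup here never stops at an inner node: the key 15 appears in the inner node only as a separator, and its data is found in a leaf.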

Generally speaking, B+Tree is better suited than B-Tree to implementing index structures on external storage. The specific reasons have to do with the principles of external storage and of computer storage access, which are discussed below.

B+Tree with sequential access pointers

The B+Tree structures generally used in database systems or file systems are optimized on the basis of classic B+Trees, adding sequential access pointers.

Figure 4

As shown in Figure 4, adding to each leaf node of a B+Tree a pointer to the adjacent leaf node forms a B+Tree with sequential access pointers. The purpose of this optimization is to improve the performance of range queries. For example, in Figure 4, to query all data records with keys from 18 to 49, once 18 has been found you only need to follow the leaf nodes and pointers in order to reach all the data nodes in a single pass, which greatly improves the efficiency of range queries.
