Basic concepts of B-tree

Concept

  • A balanced search tree designed for disks or other storage devices

  • Similar to a red-black tree, but better at reducing the number of disk I/O operations, because each node can have many children

  • Each node of a red-black tree holds 1 key and has at most 2 children. A non-leaf B-tree node that holds n keys has n+1 children, and the n keys act as separators between the key ranges of those n+1 children.

  • "B tree" and "B-tree" are two spellings of the same name

  • There are other variants of the B-tree, such as the B+ tree

Properties

The properties of a B-tree, stated in terms of a single node x, are as follows:

  1. Attributes of node x

    • x.n: the number of keys stored in node x

    • x.key: the keys stored in node x, written x.key1, x.key2, x.key3, ..., x.keyn and kept in non-decreasing order: x.key1 <= x.key2 <= ... <= x.keyn

    • x.leaf: a boolean that is true if node x is a leaf node and false otherwise

    • In addition, each key is stored together with its value, or with a pointer to the value

  2. A non-leaf node contains x.n + 1 pointers to its children. A leaf node has no children, so its child pointers carry no meaning.

  3. The keys x.key in a node are the dividing lines between the key ranges of its children. For example, if node x holds the keys k1 <= k2 <= k3 and has 4 children c1, c2, c3, c4, then keys in c1 <= k1 <= keys in c2 <= k2 <= keys in c3 <= k3 <= keys in c4.

  4. All leaf nodes have the same depth, which is the height of the tree

  5. The number of keys in each node has a lower and an upper bound. Both are expressed through a fixed integer t >= 2 called the minimum degree.

    • If the tree is not empty, the root node has at least one key

    • Every node other than the root has at least t-1 keys (lower bound), so every internal node other than the root has at least t children

    • Every node has at most 2t-1 keys (upper bound), so an internal node has at most 2t children; a node that reaches the upper bound is called a full node

[Figure: a B-tree with t = 3 and a height of only 2, in which the numbers are keys. Deeper levels follow the same pattern: in every node, n keys separate n+1 children.]
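The properties above can be sketched as a small Python check. This is only an illustration and not part of the original article: the names `BTreeNode` and `check` are hypothetical, keys are assumed to be integers, values are left out, and `leaf` is derived from the child list instead of being stored as a separate flag.

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys or [])          # x.key1..x.keyn; x.n is len(self.keys)
        self.children = list(children or [])  # x.n + 1 children, empty for a leaf

    @property
    def leaf(self):
        # Assumption of this sketch: a node without children is a leaf.
        return not self.children


def check(node, t, lo=float("-inf"), hi=float("inf"), is_root=True):
    """Return the depth of the leaves below node; raise AssertionError on a violation."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "more than 2t-1 keys"
    assert n >= (1 if is_root else t - 1), "too few keys"
    assert all(a <= b for a, b in zip(node.keys, node.keys[1:])), "keys out of order"
    assert all(lo <= k <= hi for k in node.keys), "key outside the range set by the parent"
    if node.leaf:
        return 0
    assert len(node.children) == n + 1, "a node with n keys needs n+1 children"
    bounds = [lo] + node.keys + [hi]
    depths = {
        check(child, t, bounds[i], bounds[i + 1], is_root=False)
        for i, child in enumerate(node.children)
    }
    assert len(depths) == 1, "leaves at different depths"
    return depths.pop() + 1


# A hand-built tree with t = 3: the 2 keys of the root separate 3 children.
root = BTreeNode([10, 20], [BTreeNode([1, 2, 3]),
                            BTreeNode([11, 12]),
                            BTreeNode([21, 22, 23, 24])])
print(check(root, t=3))
```

The check treats the root as non-empty; an empty tree (a root with no keys) would have to be handled separately.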

Basic operations

Note: all operations below are assumed to take place in memory; disk I/O is not considered.

Create an empty tree

  1. Create a root node

  2. Set the number of keywords to 0

  3. Set it as a leaf node
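As a rough Python sketch of these three steps (the class names `BTreeNode` and `BTree` are assumptions made for this illustration, not names from the original article):

```python
class BTreeNode:
    def __init__(self, leaf):
        self.keys = []      # an empty list means the number of keys is 0
        self.children = []  # child pointers, unused while leaf is True
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        if t < 2:
            raise ValueError("the minimum degree t must be at least 2")
        self.t = t
        # Steps 1-3: create the root, with 0 keys, marked as a leaf.
        self.root = BTreeNode(leaf=True)


tree = BTree(t=3)
print(len(tree.root.keys), tree.root.leaf)
```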

Insert

Start from the root node:

  1. If the node is a leaf, find the insertion position by key order: scan backward from the last key, shifting every key larger than the new key one position to the right, until the correct position is found. Insert the key there and increase the number of keys by 1.

  2. If the node is not a leaf, descend recursively from the current node into the appropriate child until a suitable leaf node is found, then proceed as in step 1.

    During the descent, every full node (2t-1 keys) that is encountered is split before moving on. The splitting process:

    • The full node is the root node: create an empty node as the new root, split the original root at its middle (the t-th key) into two nodes that become the left and right subtrees of the new root, and move the t-th key up to become the key of the new root. The height of the tree increases by 1.

    • The full node x is a non-root node:

      1. Split x at its middle key (the t-th key, x.keyt) into two nodes, y and z. y is the original node x and keeps the first t-1 keys; z is a new node that receives the last t-1 keys.

      2. Move the key x.keyt up into the parent node, where it becomes the key that separates y and z

    The steps above leave out the assignment of attributes. For example, when the new node z is created, its leaf attribute is the same as that of the original node x; if x is not a leaf, its last t children move to z as well; and after the split, the parent node needs an additional pointer to the newly created right node z.
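The insertion steps above can be sketched in Python as follows. This is a minimal single-pass version under stated assumptions: the names `BTreeNode`, `split_child`, `insert_nonfull` and `insert` are hypothetical, keys are plain comparable values without attached data, and `leaf` is derived from the child list, so z automatically gets the same leaf status as the node it was split from.

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys or [])
        self.children = list(children or [])

    @property
    def leaf(self):
        return not self.children


def split_child(parent, i, t):
    """Split the full child parent.children[i] around its t-th key."""
    y = parent.children[i]
    z = BTreeNode(y.keys[t:], y.children[t:])   # last t-1 keys (and last t children) go to z
    median = y.keys[t - 1]                      # the t-th key, counting from 1
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    parent.keys.insert(i, median)               # the median moves up into the parent
    parent.children.insert(i + 1, z)            # new pointer to the right-hand node


def insert_nonfull(x, k, t):
    """Insert k below x, assuming x itself is not full."""
    if x.leaf:
        x.keys.append(k)                        # grow by one slot
        i = len(x.keys) - 1
        while i > 0 and x.keys[i - 1] > k:      # scan backward from the last key
            x.keys[i] = x.keys[i - 1]           # shift the larger key to the right
            i -= 1
        x.keys[i] = k
        return
    i = len(x.keys)
    while i > 0 and k < x.keys[i - 1]:
        i -= 1
    if len(x.children[i].keys) == 2 * t - 1:    # full child: split before descending
        split_child(x, i, t)
        if k > x.keys[i]:
            i += 1
    insert_nonfull(x.children[i], k, t)


def insert(root, k, t):
    """Insert k and return the root, which changes when the old root was full."""
    if len(root.keys) == 2 * t - 1:
        root = BTreeNode(children=[root])       # the tree grows in height by 1
        split_child(root, 0, t)
    insert_nonfull(root, k, t)
    return root


root = BTreeNode()
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    root = insert(root, key, t=2)
```

Splitting full nodes on the way down means the parent always has room for the key that moves up, so the insertion never has to walk back toward the root.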

Delete

Insertion must ensure that no node ever holds more than 2t-1 keys, so full nodes are split. Deletion works the other way round: before the procedure descends into any node other than the root, that node must hold at least t keys (the minimum degree, one more than the lower bound of t-1); otherwise a key must be borrowed from a sibling or two nodes must be merged.

Delete the key k from the subtree rooted at node x, starting with x as the root node and going down:

  1. x is a leaf node and k is a key of x: delete k

  2. x is a non-leaf node and k is a key of x:

    • If the child to the left of k in x has at least t keys, find the predecessor kp of k in the subtree rooted at that child, replace k in x with kp (so k is gone from x), and recursively delete kp from that child

    • Otherwise, if the child to the right of k in x has at least t keys, find the successor kn of k in the subtree rooted at that child, replace k in x with kn (so k is gone from x), and recursively delete kn from that child

    • If the children to the left and right of k both have only t-1 keys, merge k and all keys of the right child into the left child, remove k and the pointer to the right child from x, and free the right child; then recursively delete k from the left child

  3. k is not a key of x, and k lies in some subtree x.ci of x:

    • x.ci has only t-1 keys and an adjacent sibling has at least t keys: move the key of x that separates x.ci from that sibling down into x.ci, and move the sibling's nearest key (together with the matching child pointer) up into x, so that x.ci has t keys

    • x.ci and its adjacent siblings all have only t-1 keys: merge x.ci with one of those siblings, and move the separating key of x down to become the middle key of the merged node. If x is the root node and is now empty, remove x and use the merged node as the new root node

Note that whenever the root node is left with no keys, its only child becomes the new root node and the height of the tree decreases by 1.

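Search

Similar to searching a binary search tree, except that a node with n keys offers an (n+1)-way choice instead of a two-way choice: locate the position of the key among the node's n keys, then either stop there or descend into the matching one of the n+1 children.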
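A short Python sketch of this multi-way search, assuming a hypothetical `BTreeNode` class whose `leaf` status is derived from its child list; the function name `search` and the example tree are made up for illustration.

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys or [])
        self.children = list(children or [])

    @property
    def leaf(self):
        return not self.children


def search(x, k):
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:   # find the first key that is >= k
        i += 1
    if i < len(x.keys) and x.keys[i] == k:
        return x, i
    if x.leaf:
        return None
    return search(x.children[i], k)            # descend into one of the n+1 children


root = BTreeNode([10, 20], [BTreeNode([1, 2, 3]),
                            BTreeNode([11, 12]),
                            BTreeNode([21, 22, 23, 24])])
hit = search(root, 22)
miss = search(root, 15)
```

The scan inside a node is linear here; for large nodes it could be replaced by a binary search, for example with `bisect.bisect_left` from the standard library.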

B+ tree

A variant of the B-tree; in my experience it is mostly used for database indexes.

The difference from the B-tree is that the non-leaf nodes (called internal nodes) of a B+ tree store neither values nor pointers to values. In a B-tree every node carries values, whereas in a B+ tree only the leaf nodes contain values or pointers to where the values are stored.

Why the B+ tree is better suited to indexing than the B-tree: a full explanation involves memory, disks, and I/O costs, for which more specialized material is recommended. In short, query efficiency is determined by the number of I/O operations (how many times the disk has to be read), because the data is ultimately persisted on disk. When data is queried, it is loaded from disk into memory in units of pages. If internal nodes hold only keys and no values, more keys fit into each page, the index is more compact, and fewer I/O operations are needed. This is my personal understanding and is for reference only.
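The effect of page size on fan-out can be made concrete with a back-of-the-envelope calculation in Python. Every number here is an assumption chosen for illustration (4096-byte pages, 8-byte keys and pointers, 120-byte values, 100 million keys, completely full nodes), and the helper names `fanout` and `levels` are hypothetical; real storage engines add headers and rarely fill pages completely.

```python
PAGE = 4096    # bytes per page (assumed)
KEY = 8        # bytes per key (assumed)
PTR = 8        # bytes per child pointer (assumed)
VALUE = 120    # bytes per stored value (assumed)


def fanout(page, key, ptr, value=0):
    """Children of an internal node that fills one page: n keys (+ n values), n+1 pointers."""
    n = (page - ptr) // (key + value + ptr)
    return n + 1


def levels(keys, fan):
    """Smallest number of levels h such that fan**h >= keys (rough estimate of page reads)."""
    h, reach = 0, 1
    while reach < keys:
        reach *= fan
        h += 1
    return h


bplus = fanout(PAGE, KEY, PTR)           # internal nodes hold keys only
btree = fanout(PAGE, KEY, PTR, VALUE)    # internal nodes hold keys and values
print(bplus, btree)
print(levels(10**8, bplus), levels(10**8, btree))
```

Under these assumptions the keys-only layout reaches a fan-out of 256 against 31, which translates into roughly 4 levels instead of 6 for 100 million keys. The estimate ignores that B-tree internal nodes also hold some of the keys themselves, so it only indicates the direction of the effect.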

Reference

"Introduction to Algorithms Third Edition"

Origin blog.csdn.net/x763795151/article/details/107143951