"Tree" of data structure - binary tree, red-black tree, B tree, B+ tree, B* tree

This article briefly summarizes the basic structure and principles of binary trees, red-black trees, B-trees, B+ trees, and B* trees.

One, binary tree

A binary tree is a tree in which every node has degree at most 2, that is, at most two child nodes.
A binary tree is an ordered tree: the left child and the right child are distinguished, so swapping the left and right subtrees produces a different binary tree.
A binary tree can be traversed in three orders, preorder, inorder, and postorder, named after the position at which the root node is visited:

Preorder: root node, left subtree, right subtree
Inorder: left subtree, root node, right subtree
Postorder: left subtree, right subtree, root node
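
The three traversal orders can be sketched recursively as follows. This is a minimal illustration, not a reference implementation; the `Node` class and the example tree are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node: Optional[Node]) -> List[int]:
    """Root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node: Optional[Node]) -> List[int]:
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node: Optional[Node]) -> List[int]:
    """Left subtree, then right subtree, then root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


# Example tree (made up):
#       1
#      / \
#     2   3
#    / \
#   4   5
root = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(preorder(root))   # [1, 2, 4, 5, 3]
print(inorder(root))    # [4, 2, 5, 1, 3]
print(postorder(root))  # [4, 5, 2, 3, 1]
```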

1. Full binary tree

Every layer is completely filled: each non-leaf node has two child nodes, and all leaf nodes are on the same layer.

2. Complete binary tree

Every layer except the last is completely filled, and the nodes of the last layer are packed to the left with no gaps. A complete binary tree can be obtained from a full binary tree by removing nodes from the right end of the last layer.
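
As a rough sketch of the two definitions, the checks below test whether a tree is full (every layer filled) or complete (only the last layer may be partially filled, from the left). The `Node` class and the function names are assumptions for illustration only.

```python
from collections import deque


class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(node):
    """Number of layers in the tree rooted at node (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def count(node):
    """Number of nodes in the tree rooted at node."""
    if node is None:
        return 0
    return 1 + count(node.left) + count(node.right)


def is_full(root):
    """A tree with h layers is full exactly when it has 2**h - 1 nodes."""
    return count(root) == 2 ** height(root) - 1


def is_complete(root):
    """Level-order scan: after the first missing node, no node may follow."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        elif seen_gap:
            return False  # a node appears to the right of a gap
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True


full_tree = Node(1, Node(2), Node(3))
complete_tree = Node(1, Node(2, Node(4)), Node(3))
gap_tree = Node(1, Node(2, None, Node(5)), Node(3))
print(is_full(full_tree), is_complete(full_tree))          # True True
print(is_full(complete_tree), is_complete(complete_tree))  # False True
print(is_full(gap_tree), is_complete(gap_tree))            # False False
```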

3. Binary search tree (binary sort tree)

A plain binary tree imposes no order on its values, so finding a value may require visiting every node. Sorting the values into the tree gives a binary search tree, also called a binary sort tree.
Binary search tree: the values of all nodes in the left subtree are smaller than the root node, and the values of all nodes in the right subtree are greater than the root node. The left and right subtrees are themselves binary search trees.
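
A minimal sketch of insertion and lookup under this ordering rule is shown below. The `Node` class, the function names, and the sample keys are assumptions for illustration; duplicate keys are simply ignored here, which is one common choice and not something the text specifies.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys are ignored in this sketch


def contains(root, key):
    """Walk down from the root, going left or right by comparison."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def inorder(root):
    """Inorder traversal of a binary search tree yields sorted keys."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = insert(root, k)

print(contains(root, 6))  # True
print(contains(root, 7))  # False
print(inorder(root))      # [1, 3, 6, 8, 10, 14]
```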

4. Balanced binary tree (AVL tree)

A binary search tree in which the heights of the left and right subtrees of every node differ by at most 1 is called a balanced binary tree. The height difference between the left and right subtrees of a node is called its balance factor. The left and right subtrees of every node must themselves be balanced binary trees.
Because the tree stays balanced, its height grows only logarithmically with the number of nodes. A lookup discards roughly half of the remaining data at every step, as in binary search, which greatly improves retrieval speed over an unbalanced tree.
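
The balance condition can be sketched as a small check. The `Node` class and the function names are assumptions for illustration. The check covers only the height condition and does not verify the search-tree ordering; it also recomputes heights repeatedly for clarity, whereas a real AVL implementation would normally cache the height in each node.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height of the subtree rooted at node (0 for an empty subtree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Height of the left subtree minus height of the right subtree."""
    return height(node.left) - height(node.right)


def is_balanced(node):
    """True if every node has a balance factor of -1, 0, or 1."""
    if node is None:
        return True
    return (
        abs(balance_factor(node)) <= 1
        and is_balanced(node.left)
        and is_balanced(node.right)
    )


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))  # leans entirely to the right
print(is_balanced(balanced))   # True
print(balance_factor(chain))   # -2
print(is_balanced(chain))      # False
```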

5. Left and right rotation of binary trees

Left rotation: rotate counterclockwise around the node's right branch. The right child moves up to take the node's place, the node becomes its left child, and the right child's former left subtree becomes the node's right subtree.
Right rotation: rotate clockwise around the node's left branch. The left child moves up to take the node's place, the node becomes its right child, and the left child's former right subtree becomes the node's left subtree.
A balanced binary tree is rotated to restore balance. Once an insertion or deletion makes the height difference between the left and right subtrees of a node greater than 1, some nodes must be rotated to adjust the heights until the tree is balanced again. This process is called rotation rebalancing.
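
The two rotations can be sketched as pointer updates. The `Node` class and the function names are assumptions for illustration; each function returns the new root of the rotated subtree, which the caller would attach to the parent.

```python
class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(x):
    """Counterclockwise rotation: x's right child y moves up."""
    y = x.right
    x.right = y.left   # y's former left subtree becomes x's right subtree
    y.left = x         # x becomes y's left child
    return y


def rotate_right(y):
    """Clockwise rotation: y's left child x moves up."""
    x = y.left
    y.left = x.right   # x's former right subtree becomes y's left subtree
    x.right = y        # y becomes x's right child
    return x


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# A right-leaning chain 1 -> 2 -> 3 becomes balanced after one left rotation.
chain = Node(1, None, Node(2, None, Node(3)))
root = rotate_left(chain)
print(root.key, root.left.key, root.right.key)  # 2 1 3
print(inorder(root))                            # [1, 2, 3]
```

Rotation changes the shape of the tree but not its inorder sequence, so the search-tree ordering is preserved.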

Two, red-black tree

The red-black tree is also a special binary search tree, but it relaxes the balance condition so that the tree does not have to rotate and rebalance as often. In short: the longest path from the root to a leaf must be no more than twice as long as the shortest one, and recoloring nodes is used alongside rotation to maintain this balance.
Specific requirements are as follows:

① Each node is either black or red.
② The root node is black.
③ Each leaf node (NIL) is black. [Note: a leaf node here means an empty (NIL or NULL) node.]
④ If a node is red, its child nodes must be black.
⑤ Every path from a node down to its descendant leaf nodes contains the same number of black nodes. [The paths here are those ending at NIL leaf nodes.]
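
As a rough sketch, the five rules can be checked mechanically. The `Node` class, the color constants, and the function names are assumptions for illustration; `None` stands in for the black NIL leaves, and the search-tree ordering of the keys is not checked.

```python
RED, BLACK = "red", "black"


class Node:
    # Hypothetical node type used only for this illustration.
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Black nodes on each path from node down to NIL, or -1 on a violation."""
    if node is None:
        return 1  # rule 3: NIL leaves count as black
    if node.color not in (RED, BLACK):
        return -1  # rule 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1  # rule 4: a red node must have black children
    left = black_height(node.left)
    right = black_height(node.right)
    if left == -1 or right == -1 or left != right:
        return -1  # rule 5: all paths need the same number of black nodes
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    if root is None:
        return True
    if root.color != BLACK:
        return False  # rule 2: the root is black
    return black_height(root) != -1


valid = Node(10, BLACK, Node(5, RED), Node(15, RED))
red_root = Node(10, RED)
red_red = Node(10, BLACK, Node(5, RED, Node(1, RED)), Node(15, BLACK))
uneven = Node(10, BLACK, Node(5, BLACK), None)
print(is_red_black(valid))     # True
print(is_red_black(red_root))  # False
print(is_red_black(red_red))   # False
print(is_red_black(uneven))    # False
```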

The insertion operation of the red-black tree is as follows: first insert the node as in an ordinary binary search tree; then, taking the newly inserted node as the current node N, perform the balancing operation (recoloring and rotation) starting from N.

Three, B tree

The difference between a B-tree and a balanced binary tree is that the B-tree is a multi-way tree, also known as a balanced multi-way search tree (each node has more than two search paths). The B-tree and B+ tree data structures are widely used in database indexing.
A B-tree of order m has the following properties:

Each node has at most m child nodes.
Each non-leaf node (except the root node) has at least ⌈m/2⌉ child nodes, where ⌈m/2⌉ means m/2 rounded up.
If the root node is not a leaf, it has at least two child nodes.
A non-leaf node with k child nodes has k − 1 keys.
All leaf nodes are on the same level.

Let's look at a database index. This is a typical B-tree. Taking a search for 22 as an example, the query process is as follows: 1. Query the root node, reading it into memory (the first disk I/O); ...; 4. Find 22 in Disk 7.

This process works, but two disadvantages are apparent:
1. Every node stores data, and the size of a disk block is limited. The larger each piece of data, the fewer entries fit in one node.
2. When the amount of data is large, the depth of the tree increases, and with it the number of disk I/Os.
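
The lookup can be sketched as a loop that reads one node per level, which is why each extra level costs another disk I/O. The `BTreeNode` class, the `search` function, and the keys in the example tree are assumptions made up for this sketch; they do not reproduce the article's figure, and counting visited nodes only stands in for counting disk reads.

```python
from bisect import bisect_left


class BTreeNode:
    # Hypothetical in-memory stand-in for one disk block.
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.children = children or []  # empty for a leaf, else len(keys) + 1


def search(root, key):
    """Return (found, nodes_read); each node read models one disk I/O."""
    node = root
    nodes_read = 0
    while node is not None:
        nodes_read += 1
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, nodes_read
        if not node.children:
            return False, nodes_read     # reached a leaf without finding key
        node = node.children[i]          # descend into the i-th child
    return False, nodes_read


# A made-up three-level B-tree of order 3.
root = BTreeNode([30], [
    BTreeNode([10, 20], [
        BTreeNode([3, 5]), BTreeNode([12, 15]), BTreeNode([22, 25]),
    ]),
    BTreeNode([40, 50], [
        BTreeNode([33, 36]), BTreeNode([42, 45]), BTreeNode([55, 60]),
    ]),
])

print(search(root, 22))  # (True, 3)
print(search(root, 20))  # (True, 2)
print(search(root, 41))  # (False, 3)
```

In this sketch a key stored in an upper node (20) is found with fewer reads than one stored in a leaf (22), because every B-tree node holds data.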

The InnoDB storage engine reads one 16 KB page at a time by default. Reading three disk blocks therefore reads 48 KB of data in total. Suppose the p pointers and the key values take up no extra space and one piece of data occupies 1 KB: the current node can then store at most 16 pieces of data, and so can each disk block on the second and third levels. The total is 16×16×16 = 4096 pieces of data, which is far too small. In a production environment a MySQL table commonly holds millions of rows, not a few thousand. So we need to consider what is wrong with the B-tree.

Four, B+ tree

The B+ tree was introduced to solve these problems.
In a B+ tree the data is stored only in the leaf nodes, so the non-leaf nodes hold just keys and pointers and are no longer limited by the space the data takes up.

Each read is still 16 KB per level. Assuming a key value plus a p pointer occupies 10 bytes in total, and rounding 16 KB to 16×1000 bytes, one non-leaf node holds 16×1000/10 = 1600 entries. The first layer has a fan-out of 1600, the second layer also 1600, and each leaf node on the third layer still holds 16 pieces of data, so the total is 1600×1600×16 = 40,960,000 pieces of data, on the order of tens of millions. The B-tree estimate above was only 4096, four orders of magnitude fewer. A B+ tree of three to four layers can therefore support tens of millions of rows.
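
The two capacity estimates can be reproduced with a few lines of arithmetic. The sizes below are the article's own simplifying assumptions (16 KB rounded to 16×1000 bytes, 1 KB per piece of data, 10 bytes per key plus pointer), not measured InnoDB values, and the variable names are made up for this sketch.

```python
# Simplified sizes taken from the article's back-of-the-envelope estimate.
PAGE_BYTES = 16 * 1000   # one 16 KB page, rounded as in the text
RECORD_BYTES = 1000      # one piece of data, about 1 KB
ENTRY_BYTES = 10         # one key value plus one p pointer

records_per_page = PAGE_BYTES // RECORD_BYTES  # data entries per block
fanout = PAGE_BYTES // ENTRY_BYTES             # key/pointer entries per block

# B-tree: every level stores data, so each of the three levels holds 16 entries.
b_tree_capacity = records_per_page ** 3

# B+ tree: two levels of keys and pointers, then leaf blocks holding the data.
b_plus_capacity = fanout * fanout * records_per_page

print(records_per_page)  # 16
print(fanout)            # 1600
print(b_tree_capacity)   # 4096
print(b_plus_capacity)   # 40960000
```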

MySQL uses B+ trees.

Five, B* tree

The B* tree is a variant of the B+ tree that adds pointers to sibling nodes in the non-root, non-leaf nodes of the B+ tree.

The B* tree requires each non-leaf node to hold at least (2/3)*M keys, that is, the minimum usage rate of a block is 2/3 (instead of the 1/2 of the B+ tree).

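B+ tree split: when a node is full, allocate a new node, copy 1/2 of the original node's data to the new node, and finally add a pointer to the new node in the parent node. A B+ tree split affects only the original node and the parent node, not the sibling nodes, so it needs no pointers to siblings.
B* tree split: when a node is full, if its next sibling node is not full, move part of the data to the sibling node, insert the key into the original node, and finally update the sibling node's key in the parent node (because the sibling's key range has changed). If the sibling is also full, add a new node between the original node and the sibling node, copy 1/3 of the data from each of them to the new node, and finally add a pointer to the new node in the parent node.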
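
The difference between the two split strategies can be sketched on plain sorted lists of keys. This is a simplified illustration under several assumptions: the function names are made up, nodes are modeled as Python lists, the capacity is assumed to be a multiple of 3, exactly one key is moved when the sibling has room (the text only says "part of the data"), and the insertion of the new key and the parent-node updates are left out.

```python
def split_b_plus(node):
    """B+ style: move the upper half of a full node into a new node."""
    mid = len(node) // 2
    return [node[:mid], node[mid:]]


def split_b_star(node, sibling, capacity):
    """B* style: node is full; sibling is its next (right) sibling."""
    if len(sibling) < capacity:
        # The sibling has room: shift the largest key over instead of splitting.
        return [node[:-1], node[-1:] + sibling]
    # Both are full: create a new node between them from 1/3 of each.
    take = capacity // 3
    cut = len(node) - take
    new_node = node[cut:] + sibling[:take]
    return [node[:cut], new_node, sibling[take:]]


print(split_b_plus([1, 2, 3, 4]))
# [[1, 2], [3, 4]]
print(split_b_star([1, 2, 3, 4, 5, 6], [7, 8, 9], capacity=6))
# [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
print(split_b_star([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], capacity=6))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

In the last case two full nodes become three nodes that are each 2/3 full, which matches the minimum usage rate of the B* tree, while the B+ style split leaves two nodes that are each half full.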

Origin blog.csdn.net/liwangcuihua/article/details/130615803