Repost: comparison of search trees [binary search tree, balanced binary tree, red-black tree, B-/B+ tree]

Reposted from: https://blog.csdn.net/z702143700/article/details/49079107

This copy is for personal backup only; please read the original post.

 

Table of Contents

1. Binary Search Tree

concept

Analysis of the operating cost of BST:

Summary of BST efficiency:

2. Balanced Binary Search Tree

AVL operation cost analysis:

AVL efficiency summary:

3. Red-Black Tree

Operation cost analysis of RBT:

RBT efficiency summary:

4. B-Tree/B+Tree

Analysis of operation cost of B-Tree:

Summary of B-Tree efficiency:

Comparison of dynamic search tree structure:

(1) Balanced binary tree and red-black tree [AVL PK RBT]

(2) B-tree and B+tree [B-Tree PK B+Tree]


Preface: BST, AVL, RBT, and B-tree are all search trees, and their search time is generally on the order of O(logN). A detailed comparison follows.

1. Binary Search Tree

concept

A binary search tree (BST), also called a binary sort tree, has the following properties:
1. The values of all nodes in the left subtree are less than the value of the root node.
2. The values of all nodes in the right subtree are greater than the value of the root node.
3. The left and right subtrees of every node are themselves binary search trees.
4. An in-order traversal of a binary search tree yields the nodes in ascending order.
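
Property 4 can be illustrated with a short C sketch. The type BstNode and the function bst_inorder are names assumed here for illustration (they are not from the original post), and keys are assumed to be integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration: integer keys, no duplicates. */
typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* In-order traversal: left subtree, then the node, then the right subtree.
   Writes the keys into out starting at index pos and returns the next free
   index. The caller must make sure out is large enough. */
size_t bst_inorder(const BstNode *n, int *out, size_t pos) {
    assert(out != NULL);
    if (n == NULL) return pos;
    pos = bst_inorder(n->left, out, pos);
    out[pos++] = n->key;
    return bst_inorder(n->right, out, pos);
}
```

For a tree with root 4, children 2 and 6, and leaves 1 and 3 under 2, the output array is filled with 1, 2, 3, 4, 6.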

Analysis of the operating cost of BST:

(1) Search cost:
Every search starts at the root node and follows a single path down toward a leaf, so the number of comparisons depends closely on the shape of the tree.
When the left and right subtrees of every node have roughly the same height, the tree height is about logN. The average search length is proportional to logN, so the average search time is O(logN).
When keys are inserted in sorted order, the BST degenerates into a single-branch tree (effectively a linked list) of height N. The average search length is then (N+1)/2, and the search time is O(N).
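
The effect of tree shape on search cost can be seen in a minimal C sketch. BstNode, bst_insert, and bst_search_steps are illustrative names assumed here, not part of the original post; the insertion is the plain unbalanced kind.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for illustration: integer keys, no duplicates. */
typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* Plain (unbalanced) insertion: walk down and attach a new leaf. */
BstNode *bst_insert(BstNode *root, int key) {
    if (root == NULL) {
        BstNode *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;                       /* duplicates are ignored */
}

/* Returns the number of nodes visited while searching for key. */
int bst_search_steps(const BstNode *root, int key) {
    int steps = 0;
    while (root != NULL) {
        steps++;
        if (key == root->key) break;
        root = (key < root->key) ? root->left : root->right;
    }
    return steps;
}
```

Inserting 1 to 7 in sorted order produces a chain, and finding 7 visits 7 nodes. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 produces a balanced tree, and finding 7 visits only 3 nodes.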

(2) Insertion cost:
A new node is always attached as a leaf, so the existing structure of the tree does not change. Inserting a node therefore costs exactly the same as an unsuccessful search for a key that is not in the tree.

(3) Deletion cost:
To delete a node P, first locate P, which incurs the search cost, and then adjust the shape of the tree slightly. If P has at most one subtree, the adjustment costs only O(1): the child, if any, takes P's place. If P has both a left and a right subtree, replace P with the rightmost node of its left subtree (its in-order predecessor) or the leftmost node of its right subtree (its in-order successor), and relink the affected child pointers. Finding the replacement walks at most one more path down the tree, so deletion costs O(h) for a tree of height h, which is O(logN) when the tree is balanced.
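
A minimal C sketch of BST deletion, assuming integer keys and heap-allocated nodes. The names BstNode, bst_new, and bst_delete are hypothetical, and the two-subtree case here uses the in-order predecessor (the rightmost node of the left subtree); using the successor would work equally well.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* Allocates a leaf node; helper used only to build example trees. */
BstNode *bst_new(int key) {
    BstNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->left = n->right = NULL;
    return n;
}

/* Deletes key from the tree rooted at root and returns the new root. */
BstNode *bst_delete(BstNode *root, int key) {
    if (root == NULL) return NULL;             /* key not present */
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* At most one subtree: splice the child into P's place, O(1). */
        BstNode *child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Two subtrees: copy the in-order predecessor into P,
           then delete the predecessor from the left subtree. */
        BstNode *pred = root->left;
        while (pred->right != NULL) pred = pred->right;
        root->key = pred->key;
        root->left = bst_delete(root->left, pred->key);
    }
    return root;
}
```

Deleting the root 4 from a tree with children 2 and 6 and leaves 1 and 3 under 2 moves key 3 into the root and removes the old leaf 3.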

Summary of BST efficiency:

Search takes O(logN) in the best case and O(N) in the worst case.
Insertion and deletion use simple algorithms, and their time complexity is about the same as that of search.

2. Balanced Binary Search Tree

In the worst case, a binary search tree is no more efficient than sequential search, which is unacceptable. Practice also shows that when the stored data set is large, the shape of the tree strongly affects how quickly certain keys can be found. The main cause is that the BST is unbalanced (the height difference between left and right subtrees is too large). An algorithm is therefore needed to turn an unbalanced tree into a balanced one, and this is how the AVL tree came about.

AVL operation cost analysis:

(1) Search cost:
AVL is a strictly balanced BST (the absolute value of every node's balance factor is at most 1). The search procedure is the same as for a BST, except that the BST's worst case (a single-branch tree) cannot occur in an AVL tree. Search is therefore O(logN) in both the best and the worst case.

(2) Insertion cost:
AVL must maintain strict balance (|bf|<=1), so whenever an insertion pushes the balance factor of some node beyond 1 in absolute value, a rotation must be performed. In fact, each AVL insertion needs at most one rotation (either a single rotation or a double rotation). The overall cost of insertion therefore remains O(logN), dominated by finding the insertion position.
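
The rebalancing step can be sketched in C as follows. This is a minimal illustration under stated assumptions: integer keys, a stored height field per node (a leaf has height 1), and the hypothetical names AvlNode, avl_rotate_left, avl_rotate_right, and avl_insert. A double rotation is expressed as two single rotations.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct AvlNode {
    int key, height;                    /* height of a leaf is 1 */
    struct AvlNode *left, *right;
} AvlNode;

int avl_height(const AvlNode *n) { return n ? n->height : 0; }

void avl_update(AvlNode *n) {
    int hl = avl_height(n->left), hr = avl_height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Right rotation: the left child x becomes the new subtree root. */
AvlNode *avl_rotate_right(AvlNode *y) {
    AvlNode *x = y->left;
    y->left = x->right;
    x->right = y;
    avl_update(y);
    avl_update(x);
    return x;
}

/* Left rotation: the right child y becomes the new subtree root. */
AvlNode *avl_rotate_left(AvlNode *x) {
    AvlNode *y = x->right;
    x->right = y->left;
    y->left = x;
    avl_update(x);
    avl_update(y);
    return y;
}

AvlNode *avl_insert(AvlNode *n, int key) {
    if (n == NULL) {
        AvlNode *leaf = malloc(sizeof *leaf);
        assert(leaf != NULL);
        leaf->key = key;
        leaf->height = 1;
        leaf->left = leaf->right = NULL;
        return leaf;
    }
    if (key < n->key) n->left = avl_insert(n->left, key);
    else if (key > n->key) n->right = avl_insert(n->right, key);
    else return n;                              /* duplicate: no change */

    avl_update(n);
    int bf = avl_height(n->left) - avl_height(n->right);
    if (bf > 1) {                               /* left side too tall */
        if (key > n->left->key)                 /* left-right case */
            n->left = avl_rotate_left(n->left);
        return avl_rotate_right(n);
    }
    if (bf < -1) {                              /* right side too tall */
        if (key < n->right->key)                /* right-left case */
            n->right = avl_rotate_right(n->right);
        return avl_rotate_left(n);
    }
    return n;
}
```

Inserting the keys 1 to 7 in sorted order, which would degenerate a plain BST into a chain, yields a tree of height 3 with root 4 here.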

(3) Delete cost:
AVL deletion follows the BST deletion algorithm, but after the node is removed, the balance factor of every node on the path from the deleted node up to the root must be checked. Deletion is therefore slightly more expensive: a single deletion may need up to O(logN) rotations. The total cost is O(logN) for the search plus O(logN) for rebalancing, roughly 2logN steps, which is still O(logN).

AVL efficiency summary:

Search time complexity stays at O(logN), even in the worst case.
An AVL tree needs at most one rotation (single or double) per insertion, and the time complexity of insertion is about O(logN).
Deletion is slightly more expensive: each deletion may take about 2logN steps (search plus rebalancing), which is still O(logN).

3. Red-Black Tree

The strict balancing strategy of the AVL tree pays a higher price for building the search structure (insertions and deletions) in exchange for a stable O(logN) search time. But is it worth it?
Is there a compromise that keeps search stable and efficient without paying too much to build the structure? The answer is the red-black tree.

Operation cost analysis of RBT:

(1) Search cost:
The red-black tree guarantees that the longest root-to-leaf path is no more than twice the length of the shortest one. So although a red-black tree is not strictly balanced like AVL, it is still far better balanced than a plain BST. Search cost stays at about O(logN), but in the worst case (when the longest path is nearly twice the shortest) it is slightly worse than AVL.
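
The path-length property can be checked with a short C sketch. It assumes the standard red-black rules (no red node has a red child, and every path from a node down to a NULL leaf contains the same number of black nodes); the names RbNode, rb_black_height, rb_min_depth, rb_max_depth, and rb_path_bound_holds are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RED, BLACK } Color;

typedef struct RbNode {
    int key;
    Color color;
    struct RbNode *left, *right;
} RbNode;

/* Returns the black height of the subtree, or -1 if a rule is broken:
   a red node with a red child, or unequal black counts on two paths. */
int rb_black_height(const RbNode *n) {
    if (n == NULL) return 1;                  /* NULL leaves count as black */
    if (n->color == RED &&
        ((n->left && n->left->color == RED) ||
         (n->right && n->right->color == RED)))
        return -1;
    int hl = rb_black_height(n->left);
    int hr = rb_black_height(n->right);
    if (hl < 0 || hr < 0 || hl != hr) return -1;
    return hl + (n->color == BLACK ? 1 : 0);
}

/* Number of nodes on the shortest path from n down to a NULL leaf. */
int rb_min_depth(const RbNode *n) {
    if (n == NULL) return 0;
    int l = rb_min_depth(n->left), r = rb_min_depth(n->right);
    return 1 + (l < r ? l : r);
}

/* Number of nodes on the longest path from n down to a NULL leaf. */
int rb_max_depth(const RbNode *n) {
    if (n == NULL) return 0;
    int l = rb_max_depth(n->left), r = rb_max_depth(n->right);
    return 1 + (l > r ? l : r);
}

/* For a valid red-black tree, the longest path is at most twice the shortest. */
int rb_path_bound_holds(const RbNode *root) {
    assert(rb_black_height(root) > 0);        /* precondition: valid tree */
    return rb_max_depth(root) <= 2 * rb_min_depth(root);
}
```

The bound follows because the shortest possible path consists of black nodes only, while the longest alternates black and red nodes with the same black count.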

(2) Insertion cost:
Inserting a node into an RBT may require rotations and recoloring. Because the RBT only has to stay approximately balanced, an insertion needs at most 2 rotations, the same as an AVL insertion. Recoloring may take O(logN) steps, but each recoloring step is very simple and cheap.

(3) Delete cost:
Deletion in an RBT is much cheaper than in an AVL tree: deleting a node needs at most 3 rotations.

RBT efficiency summary:

Search is O(logN) in the best case. In the worst case it is slightly worse than AVL, but still far better than BST.
Insertions and deletions upset the balance of an RBT much less often than they do in an AVL tree, because the RBT is not required to be highly balanced. Rotations are therefore needed less often, and when they are needed, an insertion takes at most 2 rotations and a deletion at most 3 (fewer than an AVL deletion may need). Recoloring has O(logN) time complexity, but in practice it costs very little because each step is so simple.

4. B-Tree/B+Tree

For an in-memory search structure, the red-black tree is already very efficient (in fact, many practical applications further optimize the RBT). But what about searching a very large amount of data? Loading all of it into memory and organizing it as an RBT is clearly impractical. Structures such as the file directory storage of an operating system or the index files of a database cannot be held entirely in memory; they must be built on disk. In that setting, is the RBT still a good choice?
When the search structure lives on disk, moving from one node to another may require a disk read to bring the data into memory before it can be compared. Frequent disk I/O is very inefficient, because mechanical movement is vastly slower than electronic operation. All binary search tree structures are therefore inefficient on disk, and the B-tree solves this problem well.

Analysis of operation cost of B-Tree:

(1) Search cost:
A B-tree is a balanced multi-way (m-way) search tree. Searching a B-tree involves two kinds of work. The first is moving from one node to another, which means locating a disk address and reading the node; this is extremely expensive. The second is searching the ordered key sequence of a node once it has been loaded into memory (binary search can be used); this is extremely cheap compared with the first. Because the height of a B-tree is very small, a B-tree in this setting is much more efficient than any binary search tree. The B+ tree, a variant of the B-tree, searches even more efficiently.
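
Both kinds of work can be sketched in C. The function node_search is an assumed illustration of the in-memory binary search over one node's keys, and min_levels estimates how many levels (and so how many node reads) a completely full tree needs for a given fan-out. Neither name comes from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Binary search inside one node's sorted key array.
   Returns the index of key if it is present; otherwise returns
   -(child + 1), where child is the index of the subtree to descend into. */
int node_search(const int *keys, int keynum, int key) {
    assert(keys != NULL && keynum >= 0);
    int lo = 0, hi = keynum;                 /* search window is [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] == key) return mid;
        if (keys[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return -(lo + 1);
}

/* Minimum number of levels for n keys when every node has `fanout`
   children and fanout - 1 keys, so a full tree of h levels holds
   fanout^h - 1 keys. */
int min_levels(unsigned long long n, unsigned long long fanout) {
    assert(fanout >= 2);
    int h = 0;
    unsigned long long power = 1;            /* fanout^h */
    while (power - 1 < n) {
        power *= fanout;
        h++;
    }
    return h;
}
```

Under this simplified model, 1,000,000 keys need 20 levels in a binary tree (fan-out 2) but only 4 levels with a fan-out of 100, which is why the number of disk reads per search drops so sharply.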

(2) Insertion cost:
Insertion into a B-tree can cause nodes to split. When an insertion splits s nodes, the number of disk accesses is h (reading the nodes on the search path) + 2s (writing back the two new nodes produced by each split) + 1 (writing back either the new root or the node where the insertion caused no split). The total is h+2s+1, which is at most 3h+1 when every node on the path splits. Insertion is therefore expensive.

(3) Deletion cost:
Deletion from a B-tree can cause nodes to merge. In the worst case, the number of disk accesses is 3h = (h reads to find the element to delete) + (h-1 reads to fetch the nearest sibling at levels 2 through h) + (h-2 writes for the merges at levels 3 through h) + (3 writes for the modified root node and the two nodes at level 2).
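
The disk-access counts for insertion and deletion given above can be written out as two small C functions. The function names are hypothetical; h is the tree height and s is the number of nodes split by an insertion.

```c
#include <assert.h>

/* Disk accesses for an insertion that splits s nodes in a tree of height h:
   h reads along the search path, 2 writes per split, and 1 final write. */
int btree_insert_accesses(int h, int s) {
    assert(h >= 1 && s >= 0 && s <= h);
    return h + 2 * s + 1;
}

/* Worst-case disk accesses for a deletion in a tree of height h >= 2. */
int btree_delete_worst_accesses(int h) {
    assert(h >= 2);
    int find_reads    = h;      /* locate the element to delete */
    int sibling_reads = h - 1;  /* nearest sibling at levels 2..h */
    int merge_writes  = h - 2;  /* merges at levels 3..h */
    int top_writes    = 3;      /* root plus the two level-2 nodes */
    return find_reads + sibling_reads + merge_writes + top_writes;
}
```

For h = 3, an insertion that splits every node on the path costs 3 + 6 + 1 = 10 accesses (3h+1), and a worst-case deletion costs 3 + 2 + 1 + 3 = 9 accesses (3h).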

(4) Definition:
Consider a B-tree of order m (m>=3, where the order bounds the number of keys and child nodes a node may contain). Taking a B-tree of order 3 as an example:
1. The root node has at most 3 subtrees.
2. Node definition:

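#define m 3                 /* order of the B-tree */
typedef struct Node {
    int keynum;             /* number of keys in the node, i.e. the node size */
    int key[m];             /* array of keys stored in the node */
    struct Node *parent;    /* pointer to the parent node */
    struct Node *son[m];    /* array of pointers to the child nodes */
} Node;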

Summary of B-Tree efficiency:

Because the B-tree is designed around the disk storage structure, searching, deleting, and inserting cost far less than in any binary tree structure, since the number of disk reads and writes is reduced.

Comparison of dynamic search tree structure:

(1) Balanced binary tree and red-black tree [AVL PK RBT]

Both AVL and RBT are optimizations of the binary search tree, and both perform much better than a plain BST. Each has its own advantages, and they suit different applications.
Structural comparison: AVL is highly balanced, while RBT is only approximately balanced. In terms of balance, AVL > RBT.
Search comparison: AVL search is O(logN) in both the best and the worst case. RBT search is O(logN) in the best case and slightly worse than AVL in the worst case.
Comparison of insertion and deletion:
1. Insertions and deletions in an AVL tree easily unbalance the tree, while an RBT has a looser balance requirement. Under heavy insertion, an RBT therefore needs to rebalance through rotation and recoloring less often than an AVL tree.
2. When rebalancing is needed, an RBT has one extra step compared with AVL, namely recoloring, whose time complexity is O(logN). Because the operation is so simple, recoloring is still very fast in practice.
3. When an insertion unbalances the tree, both AVL and RBT need at most 2 rotations. When a deletion unbalances the tree, AVL needs up to logN rotations, while RBT needs at most 3. Insertion therefore costs about the same in both, but deletion is cheaper in an RBT.
4. Most of the insertion and deletion cost in both AVL and RBT goes into finding the node to operate on, so the time complexity is essentially O(logN).
Overall evaluation: extensive practical data indicates that the overall statistical performance of RBT is better than that of the balanced binary tree (AVL).

(2) B-tree and B+tree [B-Tree PK B+Tree]

The B+ tree is a variant of the B-tree. Among on-disk search structures, the B+ tree is better suited to the disk storage layout of a file system.
Structure comparison:
A B-tree is a balanced multi-way search tree in which every node holds the actual information for its keys (such as a file's disk pointer). A node with n keys has n+1 pointers to other nodes.

Characteristics of the B+ tree compared with the B-tree:
1. Data appears only in the leaf nodes, whereas every node of a B-tree contains data;
2. The leaf nodes are linked to one another by pointers;
3. The height of a B+ tree is about 3 on average;
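
The linked leaves (characteristic 2) make range scans straightforward, as the following C sketch suggests. BPlusLeaf, LEAF_CAP, and bplus_range_scan are hypothetical names, the leaf capacity of 4 is arbitrary, and the sketch assumes the caller has already descended to the leaf where the lower bound would be found.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAP 4                   /* hypothetical leaf capacity */

typedef struct BPlusLeaf {
    int keynum;                      /* number of keys stored in this leaf */
    int key[LEAF_CAP];               /* keys in ascending order */
    struct BPlusLeaf *next;          /* pointer to the right sibling leaf */
} BPlusLeaf;

/* Copies the keys in [lo, hi] into out, starting at `leaf` and following
   the sibling pointers. Stops at the first key above hi or when max_out
   keys have been written. Returns the number of keys copied. */
size_t bplus_range_scan(const BPlusLeaf *leaf, int lo, int hi,
                        int *out, size_t max_out) {
    assert(out != NULL || max_out == 0);
    size_t n = 0;
    for (; leaf != NULL; leaf = leaf->next) {
        for (int i = 0; i < leaf->keynum; i++) {
            if (leaf->key[i] < lo) continue;
            if (leaf->key[i] > hi || n == max_out) return n;
            out[n++] = leaf->key[i];
        }
    }
    return n;
}
```

No interior node is revisited during the scan; after the first leaf is located, the remaining keys are read by walking the leaf chain.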

Search comparison:
1. For the same amount of data, a B+ tree search needs fewer disk I/O operations than an ordinary B-tree search: because interior nodes hold only keys and no data, each node fits more keys and the tree is shallower. In the disk storage setting, B+ tree search therefore outperforms B-tree search.
2. B+ tree search is more stable. All leaf nodes are on the same level, and every search must travel the whole way from the root to a leaf, so within one B+ tree every key takes the same number of node visits. This is not necessarily true of a B-tree, where a search may end at a non-leaf node.
Comparison of insertion and deletion: B+ trees and B-trees are about equally efficient at insertion and deletion.
Overall evaluation: in practical applications, especially file structure storage, the B+ tree is used more widely and is more efficient than the B-tree.

Origin blog.csdn.net/chushoufengli/article/details/114598566