An in-depth look at an advanced data structure: the red-black tree

Table of contents

1. Why is there a red-black tree?

2. What is a "balanced binary search tree"?

3. Definition of red-black tree

4. Why is it said that the red-black tree is "approximately balanced"?

5. Why does the red-black tree have good overall performance?

6. Implementing the red-black tree

1. Balance adjustment of insertion operation

2. Balance adjustment of delete operation 

1. Preliminary adjustments for deleting nodes

2. Make secondary adjustments to focus nodes

3. Summary 

7. Application scenarios of the red-black tree

Where red-black trees are used in practice

 

1. Why is there a red-black tree?

The binary search tree is the most commonly used kind of binary tree. It supports fast insertion, deletion, and search, and the time complexity of each operation is proportional to the height of the tree. In the ideal case, that is O(log n).

However, why do we need to introduce a red-black tree when we already have a binary search tree with good performance?

The reason is that, as a binary search tree is updated frequently, its height may grow far beyond log2(n), which makes every operation slower. In the extreme case the tree degenerates into a linked list and the time complexity degrades to O(n).
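To see the degeneration concretely, here is a minimal sketch of a plain, unbalanced binary search tree. The names Node, bst_insert, height, and build are made up for this illustration. Inserting keys in sorted order produces a chain, while a friendlier insertion order of the same keys gives a short tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key into an unbalanced BST (iteratively) and return the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def height(root):
    """Number of nodes on the longest root-to-leaf path, counted level by level."""
    h, level = 0, [root] if root else []
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h

def build(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return root

# Sorted input: every key goes to the right, so the tree is a chain.
print(height(build(range(1, 16))))                                         # 15
# Same 15 keys, inserted level by level: a perfectly balanced tree.
print(height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])))  # 4
```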

Moreover, we would like consecutive versions of the tree to stay closely related. Between adjacent versions, for example after the first insertion and after the second insertion, the structure of the tree should not change too much, and the change should be completed with O(1) rotations. For an AVL tree, insertion satisfies this condition, but deletion does not.

To solve these two problems, we need to design a balanced binary search tree, which is the red-black tree we will talk about today.

2. What is a "balanced binary search tree"?

The strict definition of a balanced binary tree is this: for any node in the binary tree, the heights of its left and right subtrees differ by at most 1. By this definition, the complete binary trees and full binary trees we talked about in the previous section are balanced binary trees, but a binary tree that is not complete may also be balanced.
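The strict definition translates almost directly into code. The following is a small sketch (the names Node, check, and is_height_balanced are chosen for this example only) that returns the height of each subtree, or -1 as soon as some node has subtrees whose heights differ by more than 1.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def check(node):
    """Return the height of the subtree, or -1 if some node in it is unbalanced."""
    if node is None:
        return 0
    lh = check(node.left)
    rh = check(node.right)
    if lh < 0 or rh < 0 or abs(lh - rh) > 1:
        return -1
    return 1 + max(lh, rh)

def is_height_balanced(root):
    return check(root) >= 0

balanced = Node(2, Node(1), Node(4, Node(3)))   # not a complete tree, still balanced
chain = Node(1, None, Node(2, None, Node(3)))   # degenerate chain
print(is_height_balanced(balanced), is_height_balanced(chain))  # True False
```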


A balanced binary search tree not only satisfies the above definition of a balanced binary tree, but also has the properties of a binary search tree. The first balanced binary search tree to be invented was the AVL tree. It strictly conforms to the definition I just mentioned, that is, the heights of the left and right subtrees of any node differ by at most 1, so it is a height-balanced binary search tree.

However, many balanced binary search trees do not strictly conform to the above definition (the heights of the left and right subtrees of any node differ by at most 1). The red-black tree we will talk about below is one example: the longest path from the root node to a leaf node may be up to twice as long as the shortest one.

We learn data structures and algorithms in order to apply them in real development, so I don't think it is necessary to get hung up on strict definitions. For the concept of a balanced binary search tree, I think we should understand what "balance" means by looking at why this data structure was invented.

The original intention of inventing a data structure such as a balanced binary search tree is to solve the problem of time complexity degradation of ordinary binary search trees in the case of frequent insertions, deletions, and other dynamic updates.

Therefore, "balance" in a balanced binary search tree means making the whole tree look more "symmetrical" and "even" from left to right, so that we do not end up with a very tall left subtree and a very short right subtree. In this way, the height of the whole tree stays relatively low, and operations such as insertion, deletion, and search are correspondingly more efficient.

So, if we design a new balanced binary search tree now, as long as the height of the tree is not much larger than log2(n) (for example, the height is still logarithmic), we can say it is a qualified balanced binary search tree, even though it does not meet the strict definition we talked about earlier.

3. Definition of red-black tree

The red-black tree is called "Red-Black Tree" in English, or RB Tree for short. It is a loosely balanced binary search tree. As I said earlier, it does not strictly conform to the definition of a balanced binary search tree. So how is the red-black tree defined?

As the name implies, some nodes in a red-black tree are marked black and the others are marked red. In addition, a red-black tree must meet the following requirements:

  1. Each node is either red or black.
  2. The root node is black.
  3. Each leaf node is a black empty node (NIL), that is, leaf nodes do not store data.
  4. If a node is red, then both of its children are black.
  5. For any node, every path from that node down to a NIL leaf contains the same number of black nodes.
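The five requirements can be checked mechanically. Below is a minimal sketch of such a checker, under some assumptions of this example: colors are stored as the strings "red" and "black", None stands in for the black NIL leaves, and the names Node, black_height, and is_red_black are invented here. It only checks the color rules, not the ordering of the keys.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right   # None plays the role of a black NIL leaf

def black_height(node):
    """Number of black nodes on every path from node down to NIL (the NIL included).
    Raises ValueError if requirement 1, 4, or 5 is violated below node."""
    if node is None:
        return 1                               # requirement 3: NIL leaves are black
    if node.color not in (RED, BLACK):
        raise ValueError("requirement 1: unknown color")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("requirement 4: red node with a red child")
    lh, rh = black_height(node.left), black_height(node.right)
    if lh != rh:
        raise ValueError("requirement 5: unequal black counts")
    return lh + (1 if node.color == BLACK else 0)

def is_red_black(root):
    if root is not None and root.color != BLACK:
        return False                           # requirement 2: the root is black
    try:
        black_height(root)
        return True
    except ValueError:
        return False

ok = Node(2, BLACK, Node(1, RED), Node(3, RED))
red_red = Node(2, BLACK, Node(1, RED, Node(0, RED)), Node(3, RED))
lopsided = Node(2, BLACK, Node(1, BLACK), None)
print(is_red_black(ok), is_red_black(red_red), is_red_black(lopsided))  # True False False
```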


4. Why is it said that the red-black tree is "approximately balanced"?

The original intention of a balanced binary search tree is to solve the performance degradation caused by dynamic updates to a binary search tree. So "balanced" can be read as "performance does not degrade", and "approximately balanced" as "performance does not degrade too badly".

The performance of many operations on a binary search tree is proportional to the height of the tree. The height of a perfectly balanced binary tree (a full binary tree or a complete binary tree) is about log2(n), so to show that the red-black tree is approximately balanced, we only need to analyze whether its height stays reliably close to log2(n).
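One standard way to make this precise is the following textbook argument, which is not spelled out in the text above. Assume a tree with n data nodes, let bh be the number of black data nodes on any path from the root down to a NIL leaf (requirement 5 says this number is the same for every path), and let h be the number of data nodes on the longest such path.

```latex
\begin{aligned}
n &\ge 2^{bh} - 1 && \text{(by induction on the height of the subtree)}\\
h &\le 2\,bh && \text{(requirement 4: every red node on a path is followed by a black one)}\\
\Longrightarrow\quad h &\le 2\log_2(n+1)
\end{aligned}
```

So the height of a red-black tree is at most about twice that of a perfectly balanced tree with the same number of nodes.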

Red-black trees are similar to AVL trees, but they give faster, bounded worst-case insertion and deletion (rebalancing needs at most two rotations for an insertion and three for a deletion), at the cost of slightly slower lookups, which are still O(log n).

Red-black trees, like AVL trees, provide the best possible worst-case guarantees for insertion time, deletion time, and lookup time. This not only makes them valuable in time-sensitive applications such as real-time applications, but also makes them valuable as building blocks in other data structures that provide worst-case guarantees. For example, many data structures used in computational geometry can be implemented on top of red-black trees.

Compared with the AVL tree, the red-black tree gives up some balance in exchange for needing only a small number of rotations on insertion and deletion, so when updates are frequent its overall performance is better than that of the AVL tree.

5. Why does the red-black tree have good overall performance?

"Algorithms (4th Edition)" says that a red-black tree is equivalent to a 2-3 tree. In other words, for each 2-3 tree, there is at least one red-black tree whose data elements are in the same order. Insertion and deletion operations on 2-3 trees also correspond to color flips and rotations in red-black trees. This makes 2-3 trees an important tool for understanding the logic behind red-black trees, which is why many algorithm textbooks introduce 2-3 trees before red-black trees, even though 2-3 trees are not often used in practice.

In a 2-3 tree, a 2-node is equivalent to a node of an ordinary balanced binary tree, and a 3-node essentially acts as a buffer for imbalance. When rebalancing is needed during insertion and deletion, conversions between 2-nodes and 3-nodes absorb the imbalance, reduce the number of rotations, and let the rebalancing finish sooner. Overall, when insertions and deletions are roughly equal in number and the data is fairly random, this buffering effect of the 3-node is more pronounced. That is why the overall performance of the red-black tree is better.

Tracing this back to its source, the performance advantage of the red-black tree essentially comes from trading space for time.

6. Implementing the red-black tree

The balancing process of the red-black tree works roughly like this: for each arrangement of nodes we encounter, there is a corresponding adjustment. As long as we follow these fixed adjustment rules, an unbalanced red-black tree can be adjusted back into a balanced one.

A qualified red-black tree needs to meet the following requirements:

  1. Each node is either red or black.
  2. The root node is black.
  3. Each leaf node is a black empty node (NIL), that is, leaf nodes do not store data.
  4. If a node is red, then both of its children are black.
  5. For any node, every path from that node down to a NIL leaf contains the same number of black nodes.

While inserting and deleting nodes, the fourth and fifth requirements may be broken, and the "balance adjustment" we are going to talk about today is really about restoring these two requirements.

Before we officially start, I will introduce two very important operations: left rotation and right rotation. The full name of a left rotation is "left rotation around a certain node", and, as you have probably guessed, the full name of a right rotation is "right rotation around a certain node".
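The two rotations can be sketched in a few lines. This is a simplified illustration without parent pointers (the names Node, rotate_left, rotate_right, and inorder are assumptions of this example): each function returns the new root of the rotated subtree, and the caller is expected to re-attach it. Note that a rotation changes the shape of the tree but not the in-order sequence of keys.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(x):
    """Left rotation around x: x's right child y moves up, x becomes y's left child."""
    y = x.right
    x.right = y.left      # y's left subtree becomes x's right subtree
    y.left = x
    return y              # new root of this subtree

def rotate_right(y):
    """Right rotation around y: y's left child x moves up, y becomes x's right child."""
    x = y.left
    y.left = x.right      # x's right subtree becomes y's left subtree
    x.right = y
    return x

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

t = Node(1, None, Node(3, Node(2), Node(4)))
r = rotate_left(t)
print(r.key, inorder(r))       # 3 [1, 2, 3, 4]
print(rotate_right(r).key)     # 1 (the right rotation undoes the left rotation)
```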


As I said before, insertion and deletion can violate the definition of the red-black tree, or more specifically, break its balance. The question is how to adjust the tree afterwards so that it is a qualified red-black tree again.

1. Balance adjustment of insertion operation

First, let's look at the insert operation.

The red-black tree stipulates that a newly inserted node must be red. Moreover, in a binary search tree a new node is always inserted at a leaf position. For the balance adjustment of the insert operation there are therefore two special cases, both very easy to handle.

  • If the parent node of the inserted node is black, then we do nothing, it still satisfies the definition of a red-black tree.

  • If the inserted node is the root node, then we can directly change its color and turn it into black.

All other situations violate the definition of the red-black tree, so we need to adjust. The adjustment uses two basic operations: rotation (left or right) and recoloring.

The balance adjustment of a red-black tree is an iterative process. We call the node currently being processed the focus node. The focus node keeps changing as the iteration proceeds. The initial focus node is the newly inserted node.

After a new node is inserted, if the balance of the red-black tree is broken, there are generally the following three situations. We only need to keep adjusting according to the characteristics of each situation, so that the red-black tree can continue to meet the definition, that is, continue to maintain balance.

Let's look at the adjustment process for each case in turn. Just to remind you, in order to simplify the description, I call the sibling node of the parent node the uncle node, and the parent node of the parent node is called the grandparent node.

CASE 1: If the focus node is a and its uncle node d is red, we perform the following operations in sequence:

  • Set the colors of the parent node b and the uncle node d of the focus node a to black;

  • Set the color of the grandparent node c of the focus node a to red;

  • The focus node becomes the grandparent node c of a;

  • Continue with whichever rule matches the new focus node: CASE 1 again, CASE 2, or CASE 3. If the new focus node is the root or its parent is black, one of the two special cases above applies.


CASE 2: If the focus node is a, its uncle node d is black, and a is the right child of its parent node b, we perform the following operations in sequence:

  • The focus node becomes the parent node b of node a;

  • Rotate left around the new focus node b;

  • Go to CASE 3.


CASE 3: If the focus node is a, its uncle node d is black, and a is the left child of its parent node b, we perform the following operations in sequence:

  • Rotate right around the grandparent node c of the focus node a;

  • Swap the colors of node b and node c (after the rotation, b is the parent of a and c is the sibling of a);

  • The adjustment is complete.
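The three cases can be put together in a short sketch. This is one possible implementation under stated assumptions, not a reference one: None stands for the black NIL leaves, each node keeps a parent pointer, and the names RBTree, _rotate, and _fix_insert are invented for this example. The variables a, b, c, and d follow the text (focus node, parent, grandparent, uncle). The text describes the cases for a parent b that is a left child; the sketch also handles the mirrored situation by swapping left and right.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key):
        self.key, self.color = key, RED               # new nodes start out red
        self.left = self.right = self.parent = None   # None stands for a black NIL

class RBTree:
    def __init__(self):
        self.root = None

    def _rotate(self, x, left):
        """Rotate around x: left=True is a left rotation, left=False a right rotation."""
        y = x.right if left else x.left               # the child that moves up
        mid = y.left if left else y.right             # the subtree that changes parent
        if left:
            x.right, y.left = mid, x
        else:
            x.left, y.right = mid, x
        if mid is not None:
            mid.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x.parent.left is x:
            x.parent.left = y
        else:
            x.parent.right = y
        x.parent = y

    def insert(self, key):
        parent, cur = None, self.root
        while cur is not None:                        # ordinary BST descent
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        node = Node(key)
        node.parent = parent
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        self._fix_insert(node)

    def _fix_insert(self, a):                         # a is the focus node
        while a.parent is not None and a.parent.color == RED:
            b, c = a.parent, a.parent.parent          # parent and grandparent
            parent_is_left = c.left is b
            d = c.right if parent_is_left else c.left # uncle
            if d is not None and d.color == RED:      # CASE 1: red uncle
                b.color = d.color = BLACK
                c.color = RED
                a = c
                continue
            if (b.right is a) == parent_is_left:      # CASE 2: a is an "inner" child
                a = b
                self._rotate(a, left=parent_is_left)
                b = a.parent
            self._rotate(c, left=not parent_is_left)  # CASE 3: a is an "outer" child
            b.color, c.color = c.color, b.color
        self.root.color = BLACK                       # special case: red root

t = RBTree()
for k in range(1, 8):                                 # sorted input would ruin a plain BST
    t.insert(k)
print(t.root.key, t.root.color)                       # 2 black
```

Inserting the keys 1 to 7 in order exercises CASE 1 and CASE 3 in their mirrored form; inserting keys that zig-zag (for example 3, 1, 2) exercises CASE 2.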


2. Balance adjustment of delete operation 

The balance adjustment for insertion is not difficult, but the balance adjustment for deletion is considerably harder. The principle is similar, though: we still only need to adjust by fixed rules, according to how the focus node and its surrounding nodes are arranged.

The balance adjustment for deletion has two steps. The first step is a preliminary adjustment for the deleted node. It only ensures that, after a node is deleted, the whole red-black tree still meets the last requirement of the definition, that is, for every node, all paths from that node to its reachable leaf nodes contain the same number of black nodes. The second step is a secondary adjustment for the focus node, so that the tree meets the fourth requirement of the definition, that is, no two red nodes are adjacent.

1. Preliminary adjustments for deleting nodes

Note that, by definition, a red-black tree "contains only red nodes and black nodes". After the preliminary adjustment, however, in order to keep the last requirement of the definition satisfied, some nodes are temporarily marked with two colors: "red-black" or "black-black". If a node is marked "black-black", it counts as two black nodes when counting black nodes.

In the following explanation, if a node can be either red or black, I draw it half red and half black. If a node is "red-black" or "black-black", I use a small black dot in its upper left corner to indicate the extra black.

CASE 1: If the node to be deleted is a and it has only one child node b, we perform the following operations in sequence:

  • Delete node a and move node b into the position of node a. This part is the same as deletion in an ordinary binary search tree;

  • Node a can only be black and node b can only be red; any other combination would not conform to the definition of the red-black tree. In this case, we change node b to black;

  • The adjustment is complete and no secondary adjustment is required.


CASE 2: If the node a to be deleted has two non-empty child nodes and its successor is its right child node c, we perform the following operations in sequence:

  • If the successor of node a is its right child c, then c cannot have a left subtree. We delete node a and move node c into the position of node a. This part is no different from deletion in an ordinary binary search tree;

  • Then set the color of node c to the color node a had;

  • If node c was black before this recoloring, then, in order not to violate the last requirement of the definition, we add one black to the right child node d of node c. Node d now becomes "red-black" or "black-black";

  • The focus node is now node d, and the second-step adjustment is carried out on the focus node.


CASE 3: If the node a to be deleted has two non-empty child nodes and its successor is not its right child, we perform the following operations in sequence:

  • Find the successor node d and remove it from its position. For how to remove the successor node d, refer to CASE 1;

  • Replace node a with the successor node d;

  • Set the color of node d to the color node a had;

  • If node d was black before this recoloring, then, in order not to violate the last requirement of the definition, we add one black to node c, the right child that node d had. Node c now becomes "red-black" or "black-black";

  • The focus node is now node c, and the second-step adjustment is carried out on the focus node.


2. Make secondary adjustments to focus nodes

After the preliminary adjustment, the focus node is a "red-black" or "black-black" node. For this focus node, the secondary adjustment distinguishes four situations. Its purpose is to make sure no adjacent red nodes remain in the red-black tree.

CASE 1: If the focus node is a and its sibling node c is red, we perform the following operations in sequence:

  • Rotate left around the parent node b of the focus node a;

  • Swap the colors of node b and node c (after the rotation, b is still the parent of a and c is its grandparent);

  • The focus node stays the same;

  • Continue by choosing the matching rule among the four situations.


CASE 2: If the focus node is a, its sibling node c is black, and the left and right child nodes d and e of node c are both black, we perform the following operations in sequence:

  • Change the color of the sibling node c of the focus node a to red;

  • Remove one black from the focus node a, so that node a becomes plain red or plain black;

  • Add one black to the parent node b of the focus node a, so that node b becomes "red-black" or "black-black";

  • The focus node changes from a to its parent node b;

  • Continue by choosing the matching rule among the four situations.


CASE 3: If the focus node is a, its sibling node c is black, the left child node d of c is red, and the right child node e of c is black, we perform the following operations in sequence:

  • Rotate right around the sibling node c of the focus node a;

  • Swap the colors of node c and node d;

  • The focus node stays the same;

  • Go to CASE 4 and continue adjusting.


CASE 4: If the sibling node c of the focus node a is black and the right child node e of c is red, we perform the following operations in sequence:

  • Rotate left around the parent node b of the focus node a;

  • Set the color of node c to the color that the parent node b of the focus node a had;

  • Set the color of the parent node b of the focus node a to black;

  • Remove one black from the focus node a, so that node a becomes plain red or plain black;

  • Set node e, which is now the uncle of the focus node a, to black;

  • The adjustment is complete.
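The two deletion steps can be sketched as follows. This sketch follows the common textbook formulation rather than the drawings described above, and several things in it are assumptions of the example: a single shared NIL sentinel replaces the empty leaves, children are stored in a two-element list ch so that the mirrored cases need no duplicate code, the extra black is not stored but is implied by the focus node a itself, and all names (RBTree, _side, _replace, _rotate, _fix_delete, perfect_black_tree) are invented here. The variables a, b, and c are the focus node, its parent, and its sibling, as in the text.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, nil=None):
        self.key, self.color = key, color
        self.ch = [nil, nil]                  # ch[0] = left child, ch[1] = right child
        self.parent = nil

class RBTree:
    def __init__(self):
        self.nil = Node(None, BLACK)          # shared black NIL sentinel
        self.root = self.nil

    def _side(self, n):                       # 0 if n is a left child, 1 if a right child
        return 1 if n is n.parent.ch[1] else 0

    def _replace(self, u, v):                 # put v where u was (v may be NIL)
        if u.parent is self.nil:
            self.root = v
        else:
            u.parent.ch[self._side(u)] = v
        v.parent = u.parent

    def _rotate(self, x, d):                  # d=0: left rotation, d=1: right rotation
        y = x.ch[1 - d]
        x.ch[1 - d] = y.ch[d]
        if y.ch[d] is not self.nil:
            y.ch[d].parent = x
        self._replace(x, y)
        y.ch[d], x.parent = x, y

    def find(self, key):
        n = self.root
        while n is not self.nil and n.key != key:
            n = n.ch[1 if key > n.key else 0]
        return n

    def delete(self, key):                    # step 1: preliminary adjustment
        z = self.find(key)
        if z is self.nil:
            return
        removed_color = z.color
        if z.ch[0] is self.nil or z.ch[1] is self.nil:   # at most one real child
            x = z.ch[1] if z.ch[0] is self.nil else z.ch[0]
            self._replace(z, x)
        else:                                            # two children: use successor y
            y = z.ch[1]
            while y.ch[0] is not self.nil:
                y = y.ch[0]
            removed_color, x = y.color, y.ch[1]
            if y.parent is z:
                x.parent = y
            else:
                self._replace(y, x)
                y.ch[1] = z.ch[1]
                y.ch[1].parent = y
            self._replace(z, y)
            y.ch[0] = z.ch[0]
            y.ch[0].parent = y
            y.color = z.color
        if removed_color == BLACK:                       # a black is missing below x
            self._fix_delete(x)

    def _fix_delete(self, a):                 # step 2: a carries the extra black
        while a is not self.root and a.color == BLACK:
            d, b = self._side(a), a.parent
            c = b.ch[1 - d]                                        # sibling of a
            if c.color == RED:                                     # CASE 1
                c.color, b.color = BLACK, RED
                self._rotate(b, d)
                c = b.ch[1 - d]
            if c.ch[0].color == BLACK and c.ch[1].color == BLACK:  # CASE 2
                c.color = RED
                a = b
                continue
            if c.ch[1 - d].color == BLACK:                         # CASE 3
                c.ch[d].color, c.color = BLACK, RED
                self._rotate(c, 1 - d)
                c = b.ch[1 - d]
            c.color, b.color = b.color, BLACK                      # CASE 4
            c.ch[1 - d].color = BLACK
            self._rotate(b, d)
            a = self.root
        a.color = BLACK

def perfect_black_tree(keys):                 # demo helper: 2**k - 1 sorted keys, all black
    t = RBTree()
    def build(lo, hi, parent):
        if lo >= hi:
            return t.nil
        n = Node(keys[(lo + hi) // 2], BLACK, t.nil)
        n.parent = parent
        n.ch = [build(lo, (lo + hi) // 2, n), build((lo + hi) // 2 + 1, hi, n)]
        return n
    t.root = build(0, len(keys), t.nil)
    return t
```

To try it without an insert routine, perfect_black_tree(list(range(1, 16))) builds a perfect 15-node tree with every node black, which is a valid red-black tree; calling delete on its keys one after another then walks through the cases above. A "red-black" focus node simply ends the loop and is painted black, and a "black-black" one keeps the loop going.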


3. Summary 

First, think of red-black tree balance adjustment as solving a Rubik's cube, and don't dig too deeply into why the algorithm is correct. You only need to understand that, as long as you follow the fixed steps while inserting and deleting, the definition of the balanced tree will not be broken.

The second point is to identify the focus node and not to lose it or pick the wrong one. Every rule is stated relative to the focus node, so only when the focus node is right can you match the right rule. During the iterative adjustment the focus node keeps changing, so we must follow it carefully.

The third point is that the balance adjustment for insertion is relatively simple, while the one for deletion is more complicated. Deletion needs two adjustments. The first is a preliminary adjustment for the node to be deleted, so that the adjusted red-black tree continues to meet the fifth requirement, "every path from a node to its reachable leaf nodes contains the same number of black nodes". At this point, however, the fourth requirement may not be satisfied, and two red nodes may be adjacent. The second adjustment solves this problem, so that no adjacent red nodes remain in the red-black tree.

7. Application scenarios of the red-black tree

1. If insertions and deletions are frequent and lookups must also be fast, a red-black tree is a good fit. (If insertions and deletions are infrequent but lookup performance matters a lot, an AVL tree is still the better choice.)

2. Red-black trees are usually used to store ordered data in memory, where insertions and deletions are fast and no I/O is involved. (B/B+ trees are better suited to data that sits behind I/O, such as data on disk, because they reduce the number of I/O operations. They are mainly used for indexes in file systems and databases, for example the B-Tree index in MySQL.)

3. Some scenarios now use skip lists instead of red-black trees. When there are requirements on concurrency and performance, how do we choose between a skip list and a red-black tree?

     3.1 A skip list has the same (expected) time complexity as a red-black tree and is simpler to implement.

     3.2 A skip list has another advantage under concurrency. A red-black tree may need to rebalance on insertion and deletion, and that rebalancing may touch other parts of the tree, whereas skip list operations are clearly more localized. Fewer nodes need to be locked, so performance is better in this situation.

     3.3 Range lookups are more efficient on a skip list, and range-related functions are simple to implement: once the start and the end of the range are located, the elements in between can be walked directly through predecessor and successor links.

So the conclusion here is: if you want a simple implementation (a red-black tree is more complicated to implement than a skip list, since balance adjustment has to handle 7 cases, as shown in the figure below), decent performance, concurrency friendliness, and range lookups, a skip list may be the better choice.

4. Red-black trees are also particularly useful in functional programming, where they are one of the most commonly used persistent data structures. They are used to build associative arrays and sets that can retain their previous versions after each insertion and deletion. Besides O(log n) time, the persistent version of the red-black tree needs only O(log n) extra space for each insertion or deletion.

In fact, ordinary developers do not need to implement red-black trees themselves. They can rely on mature implementations and ignore the complexity of the implementation, which is why red-black trees are still fairly common in production environments.

Where red-black trees are used in practice

1. C++

Red-black trees are widely used in the C++ STL. For example, map and set are typically implemented with red-black trees;

2. Java

Java's collections framework (HashMap, TreeMap, TreeSet). TreeMap and TreeSet are built on a red-black tree. In HashMap, since JDK 1.8, to deal with the long linked lists caused by excessive hash collisions, a bucket's linked list is converted into a red-black tree once its length exceeds a certain threshold;

3. Linux operating system

The CFS process scheduler keeps runnable tasks in a red-black tree ordered by vruntime and picks the node with the smallest vruntime to run next.

The packet CD/DVD driver uses a red-black tree to track requests.

The high-resolution timer code uses an rbtree to organize outstanding timer requests.

The ext3 filesystem tracks directory entries in a red-black tree.

Virtual memory areas (VMAs) are tracked with a red-black tree.

The core data structure of epoll, the I/O multiplexing mechanism, is also a red-black tree plus a doubly linked list. Cryptographic keys and network packets are tracked by red-black trees as well.

4. Linux applications

Nginx uses red-black trees to manage timers, etc.


Origin blog.csdn.net/weixin_52967653/article/details/126799761