Truly understanding the red-black tree (a data structure widely used in the Linux kernel)

As data structures go, the red-black tree is hardly plain, largely because the way it is usually presented makes it seem mysterious. The many articles about red-black trees collected across the Internet all follow the same formula: introduce the concepts, analyze the performance, paste some code, and then drop an ominous line about what its worst case is...

We know what the worst case of a plain binary tree is: it degenerates into a linked list, so that a search becomes a traversal. The question is, how could a balanced binary tree fall back toward a linked list? How does it stay balanced? Can that be explained simply? Sure! Most material on red-black trees simply states two rules, that every path contains the same number of black nodes and that red nodes are never adjacent, as constraints that are strict enough, but not too strict, to keep the tree balanced. In fact, there is a simpler way to understand it.

 

1. Search: height matters, not width

For a search, if the height of a binary tree is N, the search finishes within at most N steps. This hardly needs explaining. It follows that the height of the tree should be as small as possible and, considering the average case, that the distances from the leaf nodes to the root should not differ too much.
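To make the point concrete, here is a minimal sketch (not taken from any kernel code; the names `Node`, `insert`, `height`, `search_steps` and `build` are made up for illustration) that builds two plain binary search trees from the same seven keys and counts how many nodes a search visits in each.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain binary search tree insertion, with no rebalancing at all."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def height(root):
    """Number of levels in the tree (0 for an empty tree)."""
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

def search_steps(root, key):
    """Return how many nodes are visited while looking for key."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return steps

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

bushy = build([4, 2, 6, 1, 3, 5, 7])   # insertion order that happens to balance
chain = build([1, 2, 3, 4, 5, 6, 7])   # sorted order: every node goes right

print(height(bushy), search_steps(bushy, 7))
print(height(chain), search_steps(chain, 7))
```

With the same keys, the first tree has height 3 and finds 7 in 3 steps, while the second has height 7 and needs 7 steps: the cost follows the height, not the number of keys.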

2. The root cause of binary tree imbalance

A tree becomes unbalanced for lookups because the heights of its subtrees differ widely.

Why does a binary tree become unbalanced so easily? The reason is simple: it has only two branches, so, roughly speaking, each insertion goes left or right with 50% probability. The probability that N inserted nodes all fall on the same given side is therefore 50% to the Nth power. For an octree this probability is 12.5% to the Nth power. You can work out for yourself which probability is greater.
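The comparison can be worked out in a few lines. This is only a sketch of the rough model used above, which assumes that every branch is equally likely and that insertions are independent; the function name `chain_probability` is made up for illustration.

```python
def chain_probability(branches, n):
    """Probability that n independent insertions all pick one given branch,
    assuming each of the `branches` choices is equally likely."""
    return (1.0 / branches) ** n

for n in (4, 8, 16):
    print(n, chain_probability(2, n), chain_probability(8, n))
```

For any N greater than zero the two-way figure is the larger one, which is the sense in which a binary tree tilts more easily than a wider tree.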

3. Multiway trees: trading width for height

From Sections 1 and 2 we know that the wider the tree, the smaller its height, and so the faster the lookup. Don't Cisco routers use 256-way or even 1024-way trees? But is that really so good? When the nodes are sparse, this wastes a great deal of memory.

Think of the CPU's MMU: the difference between a two-level page table and a three-level page table lies in how well each handles a sparse address space.
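The memory cost of width can be estimated with a toy model. The sketch below is a hypothetical fixed-stride trie over 32-bit keys, not the layout of any real router or page table; `trie_cost` and the three sample keys are assumptions made for illustration. Every allocated node reserves one slot per possible branch, whether or not that slot is used.

```python
def trie_cost(keys, stride_bits, key_bits=32):
    """Build a fixed-stride trie and return (levels, allocated child slots).

    Each node consumes `stride_bits` bits of the key, so it has
    2 ** stride_bits slots, all reserved as soon as the node exists.
    """
    fanout = 1 << stride_bits
    levels = key_bits // stride_bits
    root = {}
    node_count = 1
    for key in keys:
        node = root
        # Walk from the most significant chunk down to the second-to-last one.
        for level in range(levels - 1, 0, -1):
            index = (key >> (level * stride_bits)) & (fanout - 1)
            if index not in node:
                node[index] = {}
                node_count += 1
            node = node[index]
        node[key & (fanout - 1)] = True   # the last chunk selects the entry
    return levels, node_count * fanout

sparse_keys = [0x0A000001, 0xC0A80101, 0xAC100001]   # three scattered keys
print(trie_cost(sparse_keys, stride_bits=1))   # two-way nodes
print(trie_cost(sparse_keys, stride_bits=8))   # 256-way nodes
```

For these three keys the 256-way version needs only 4 levels instead of 32, but it reserves many times more slots, almost all of them empty. That is the price of width when the data is sparse.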

4. A compromise: the 2-3 tree

The Tao gives birth to one, and one gives birth to two: the binary tree is a perfect starting point. But it tilts very easily, and once tilted it is no longer any good. Nor can we jump straight to a 256-way tree. With a massive number of nodes even that cannot cope, so blindly trading width for height does not scale. We need a dynamic mechanism that lets a tree adjust itself and stay balanced.

To find this mechanism more easily, and to make it easier to illustrate, we temporarily increase the width of the tree. If no solution appears with a 3-way tree, we can go on to a 4-way tree, and so on. Note that an N-way tree does not mean every node must have N children; it means a node has at most N children.

Up to this point, everything was just my own metaphysical speculation. A few years ago my thinking stopped here: I was quite depressed at the time and was looking for some technical metaphysics to dwell on, but then I suddenly felt better, so I did not continue. Fortunately, I have now found that such a scheme really exists, and that the red-black tree is a step back from the three-way tree.

To my delight, my train of thought did not go astray.

5. Balancing transformations of the 2-3 tree

In a binary tree, when you insert a node you have at most one chance to keep the height of the subtree unchanged; in a three-way tree you have two. So from now on we add one branch to the binary tree and turn it into a three-way tree.

A node in a binary tree has two branches; a node in a three-way tree has three. One point divides an interval into two parts, and dividing an interval into three parts takes two points. A three-way node therefore stores two values instead of one, as shown in the figure below:
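The same idea can be sketched in code. This is one possible representation, not the only one; the class name `Node23` and the function `search` are made up for illustration. A node holds one or two sorted values and, unless it is a leaf, one more subtree than it has values.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                 # 1 value (two-way) or 2 values (three-way)
        self.children = list(children) if children else []   # empty for a leaf

def search(node, key):
    """Look for key, choosing among two or three intervals at each node."""
    while node is not None:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                             # skip the values smaller than key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False                       # reached a leaf without a match
        node = node.children[i]                # descend into the i-th interval
    return False

# A three-way root: values below 10, between 10 and 20, and above 20.
root = Node23([10, 20], [Node23([5]), Node23([15]), Node23([25, 30])])
print(search(root, 15), search(root, 30), search(root, 12))
```

The extra comparison per node is the "more complicated comparison" that a three-way node costs compared with a two-way node.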

Now consider how this 2-3 tree stays balanced when a new node is inserted. It is quite simple. We know that the insertion position must be a leaf. Assuming the current tree is balanced, there are two cases:

1). The parent of the newly inserted leaf is a two-way node

This is the simplest case: just turn the two-way node into a three-way node, as shown in the following figure:

2). The parent of the newly inserted leaf is a three-way node

This case is more complicated. The tree has to grow taller eventually, and the way to stay balanced is for every part of it to grow taller at the same time. That seems impossible, since inserting a node can only make the subtree containing that node taller. However, if the growth can be passed up to the root so that the tree grows taller at the root, then "growing taller at the same time" is achieved!
Following the same idea as before, we keep increasing the number of branches, this time to four! Inserting a new node is shown in the figure below:

Unfortunately the job is not yet done: we are left with two problems. Once these two problems are solved, everything is solved.

Solving them inevitably involves the parent of node P and the nodes above it. There are two possibilities:


Possibility 1: P's parent node PP is a two-way node.
This is the easy case. We can lift P, together with its subtrees, directly into PP, much like the scenario of inserting B, as shown in the following figure:

Problem 2 solved.

Possibility 2: P's parent node PP is a three-way node.
This is harder to handle, but there is a finishing move! In any case, first lift P and its child nodes into PP so that balance is kept at the bottom and the problem can be solved recursively. We now face, once again, the problem of inserting into a three-way node. To avoid increasing the height of the tree, the only option is to expand it into a four-way node, trading width for height, as shown below:

In the end, the recursion terminates in one of two ways. Either some ancestor P...P is a two-way node, in which case, following the solution to Problem 2, the current node's value is lifted into that ancestor; its subtree loses one level of height, which cancels the level that was added, balance is maintained, and the recursion ends. Or the recursion reaches the root, where a single split operation finishes the job perfectly!
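The whole procedure fits in a short sketch. This is one way to write 2-3 insertion under the description above, not a reference implementation; `Node23`, `_insert`, `insert`, `leaf_depths` and `keys_in_order` are names chosen for illustration. A node that temporarily holds three values plays the role of the four-way node: it is split, and its middle value is lifted into the parent.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []

    def is_leaf(self):
        return not self.children

def _insert(node, key):
    """Insert key below node. Return None if the growth was absorbed, or
    (left, middle_value, right) if node overflowed and had to split."""
    if key in node.keys:
        return None                               # ignore duplicates
    if node.is_leaf():
        node.keys.append(key)
        node.keys.sort()
    else:
        i = sum(1 for k in node.keys if key > k)  # which subtree to descend into
        split = _insert(node.children[i], key)
        if split is None:
            return None
        left, middle, right = split
        node.keys.insert(i, middle)               # lift the middle value into this node
        node.children[i:i + 1] = [left, right]
    if len(node.keys) <= 2:
        return None                               # width absorbed the extra height
    # Temporary four-way node (three values): split it and pass the middle up.
    left = Node23(node.keys[:1], node.children[:2])
    right = Node23(node.keys[2:], node.children[2:])
    return left, node.keys[1], right

def insert(root, key):
    if root is None:
        return Node23([key])
    split = _insert(root, key)
    if split is None:
        return root
    left, middle, right = split
    return Node23([middle], [left, right])        # the only place the tree grows taller

def leaf_depths(node, depth=1):
    """Set of depths at which leaves occur; a balanced tree yields one value."""
    if node.is_leaf():
        return {depth}
    result = set()
    for child in node.children:
        result |= leaf_depths(child, depth + 1)
    return result

def keys_in_order(node):
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += keys_in_order(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

root = None
for key in range(1, 8):                           # sorted input, the worst case for a plain BST
    root = insert(root, key)
print(root.keys, leaf_depths(root))
```

Even with sorted input, which would turn a plain binary tree into a chain, the root ends up holding 4 and every leaf sits at depth 3. The tree only ever grows at the root, so all leaves grow "at the same time".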

6. Evolution to red-black tree

Through the description above we appear to have found a way to keep the tree balanced, and perfectly balanced at that! The core is the trade between width and height: one unit of width can always absorb one level of height. The whole process is one or more additions and one subtraction, and the net result is still 0!
However, this is no longer a binary tree. Some nodes are now three-way nodes that store two values and divide the interval into three parts. Such a tree is less convenient to use than a binary tree, and its comparison logic is more complicated. In fact, by turning each three-way node back into two-way nodes, the tree becomes a red-black tree! How? It's easy! As shown below:

See: each red node is split off from a three-way node of the 2-3 tree. To keep the tree binary rather than a 2-3 tree, every three-way node has to be turned into two-way nodes. This undoes the width-for-height trade, converting width back into height, and the cost, of course, is that the tree is no longer perfectly balanced.
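The transformation can be written down as a small sketch. It assumes one particular convention, in which the smaller value of a three-way node becomes a red left child of the larger one; the mirrored choice works equally well. The names `Node23`, `RBNode`, `to_red_black`, `black_heights` and `has_red_red` are made up for illustration.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []

class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def to_red_black(node):
    """Convert a 2-3 tree into a binary tree with red and black nodes."""
    kids = [to_red_black(c) for c in node.children]
    if not kids:
        kids = [None] * (len(node.keys) + 1)      # a leaf has no subtrees
    if len(node.keys) == 1:                       # two-way node: one black node
        return RBNode(node.keys[0], "black", kids[0], kids[1])
    small, large = node.keys                      # three-way node: black plus red child
    red = RBNode(small, "red", kids[0], kids[1])
    return RBNode(large, "black", red, kids[2])

def black_heights(node):
    """Set of black-node counts over all paths from node down to an empty link."""
    if node is None:
        return {0}
    own = 1 if node.color == "black" else 0
    return {own + h for h in black_heights(node.left) | black_heights(node.right)}

def has_red_red(node, parent_red=False):
    """True if some red node has a red parent."""
    if node is None:
        return False
    red = node.color == "red"
    return (red and parent_red) or has_red_red(node.left, red) or has_red_red(node.right, red)

tree23 = Node23([10, 20], [Node23([5]), Node23([15]), Node23([25, 30])])
rb = to_red_black(tree23)
print(rb.key, rb.left.key, rb.left.color, black_heights(rb), has_red_red(rb))
```

Because every 2-3 node contributes exactly one black node, the "same number of black nodes on every path" rule is just the perfect balance of the 2-3 tree in disguise, and because a red node is only ever created directly under the black node of its own three-way node, two red nodes can never be adjacent.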

Following the transformation above, try it yourself: can you ever produce two consecutive red nodes? No! Are you still struggling with the properties of the red-black tree? Having watched it evolve, you will find that many of its complex concepts and confusing performance claims come naturally. Let's look at what its worst case is.

Staying with the 2-3 tree: suppose every node on the leftmost path of a 2-3 tree is a three-way node and every node on the rightmost path is a two-way node. After transforming it into a binary red-black tree, the leftmost path alternates red and black nodes while the rightmost path is all black, so the longer path is close to twice the length of the shorter one. It is unfortunate when this happens, but the probability is extremely low.
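The factor of two can be made precise with a short calculation. This is a worked sketch; the symbols h, r and n are introduced here for illustration and do not appear in the original. Let h be the height of the 2-3 tree counted in levels, n the number of stored values, and r the number of three-way nodes on a given root-to-leaf path, each of which contributes one extra red node after the transformation:

```latex
\begin{aligned}
\text{path length} &= h + r, \qquad 0 \le r \le h
  &&\Rightarrow\quad h \le \text{path length} \le 2h,\\
n &\ge 2^{h} - 1
  &&\Rightarrow\quad h \le \log_2(n + 1),\\
\text{red-black height} &\le 2h \le 2\log_2(n + 1).
\end{aligned}
```

So the longest path is never more than twice the shortest, and even in the worst case a lookup stays logarithmic in the number of values.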

All operations on a red-black tree, including rotations, can be mapped onto the 2-3 tree, and by now we understand 2-3 trees and the trade between height and width well enough. Go back to the red-black tree and look again at its properties and concepts, along with left and right rotations. Does it read differently now?
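As one example of that mapping, here is a minimal sketch of a left rotation. It follows the common textbook formulation rather than the Linux kernel's own code, and `RBNode`, `rotate_left` and `in_order` are illustrative names. Seen through the 2-3 tree, the rotation does not change which three-way node is being represented; it only changes which of its two values sits on top.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def rotate_left(h):
    """Turn a red right child into the parent; the old parent becomes its red left child."""
    x = h.right
    h.right = x.left          # the middle subtree changes owner
    x.left = h
    x.color = h.color         # x takes over h's place and color
    h.color = "red"           # h is now the red half of the same three-way node
    return x

def in_order(node):
    return [] if node is None else in_order(node.left) + [node.key] + in_order(node.right)

# The three-way node {10, 20} drawn with 10 on top and 20 as its red right child.
leaning_right = RBNode(10, "black",
                       RBNode(5, "black"),
                       RBNode(20, "red", RBNode(15, "black"), RBNode(25, "black")))
leaning_left = rotate_left(leaning_right)
print(leaning_left.key, leaning_left.left.key, leaning_left.left.color, in_order(leaning_left))
```

After the rotation 20 is on top with 10 as its red left child, and the in-order sequence 5, 10, 15, 20, 25 is unchanged: the same 2-3 node, drawn the other way round.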

Original Author: Geek Reborn

 

 


Origin blog.csdn.net/youzhangjing_/article/details/131917013