Understanding the origin and the essence of red-black trees

Foreword

> This article is part of the series at http://dwz.win/HjK . Click through to unlock more on data structures and algorithms.

Hello, my name is Tong.

In the previous two sections, we went through the theory of skip lists together and wrote two completely different implementations by hand. Here is a picture as a brief review:

[Figure 15: a skip list with several index layers]

The key to implementing a skip list is to add several layers of indexes on top of an ordered linked list. With these indexes, elements can be inserted, deleted, and found in O(log n) time on average.

Speaking of skip lists, we have to mention another classic data structure: the red-black tree. Both offer O(log n) time complexity, but the red-black tree is used in a wider range of scenarios. The Linux kernel has carried a red-black tree implementation since its early versions, and the efficient I/O multiplexer epoll uses one as well.

Therefore, the red-black tree is something every programmer has to know. Some particularly demanding interviewers will even ask you to write parts of one by hand, such as left rotation, right rotation, rebalancing after insertion, and rebalancing after deletion. These topics are complex, and they are hard to master by rote memorization.

Brother Tong has long been looking for a way to remember red-black trees, and I finally found a pretty good one: start from the origin of red-black trees, understand their essence, and then work from that essence until you can write one by hand without rote memorization.

Starting with this section, I will pass this method on to you. I will explain red-black trees in three subsections:

  • From the origin of the red-black tree to its essence
  • From the essence of the red-black tree to a method that needs no rote memorization
  • Writing a red-black tree by hand without rote memorization

Alright, let's move on to the first section.

Origin of the red-black tree

Binary tree

When it comes to trees, the most famous one is the binary tree. What is a binary tree?

A binary tree is a tree in which each node has at most two children.

[Figure 1: a binary tree]

Of course, a plain binary tree is not very useful by itself. When we talk about a binary tree, we usually mean a binary search tree, also known as an ordered binary tree or a sorted binary tree.

Binary search tree

A binary search tree (BST) adds ordering to a binary tree: every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger. The ordering is usually the natural ordering of the keys. With it, we can quickly find, insert, and delete elements.

[Figure 2: a binary search tree]

For example, in the binary search tree above, the average time complexity of finding an element is O(log n).
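To make the idea of "order" concrete, here is a minimal Python sketch of inserting into and searching a binary search tree. The names `Node`, `insert`, and `contains` are made up for this illustration, and the keys B, A, C are just sample data, not the tree in the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: str) -> Node:
    """Insert key into the subtree rooted at root and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # an equal key is already present, so nothing changes


def contains(root: Optional[Node], key: str) -> bool:
    """Walk down from the root, going left or right at each node."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


# Sample data (an assumption for this sketch): insert B first, then A and C.
root = None
for k in ["B", "A", "C"]:
    root = insert(root, k)
```

Each comparison discards one whole subtree, which is where the O(log n) average comes from when the tree is reasonably bushy.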

However, the binary search tree has a very serious problem. Just imagine: what if the three elements A, B, and C are inserted in exactly that order?

[Figure 3: the tree after inserting A, B, and C in order]

What is this? A singly linked list? Yes. When elements are inserted in their natural order, the binary search tree degenerates into a singly linked list. And what is the time complexity of inserting, deleting, and finding elements in a singly linked list? O(n).

So, in the worst case, the time complexity of a binary search tree is poor.
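The degeneration is easy to observe by measuring the height of the tree. The sketch below is only an illustration: the helper names are invented, and the seven keys A to G extend the A, B, C example for this demo.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) BST insert, written iteratively."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # key already present


def height(node):
    """Number of nodes on the longest path from node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Same seven keys, two insertion orders (sample data for this sketch).
sorted_height = height(build(["A", "B", "C", "D", "E", "F", "G"]))
mixed_height = height(build(["D", "B", "F", "A", "C", "E", "G"]))
```

With sorted input every node only ever gets a right child, so `sorted_height` is 7, the same as the number of keys. With the mixed order `mixed_height` is 3.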

Since the performance of a binary search tree may deteriorate after elements are inserted, can we add some mechanism that keeps it performing well?

The answer is yes. The mechanism is called balancing, and a tree that balances itself this way is called a balanced tree.

Balanced tree

A self-balancing (or height-balanced) binary search tree is a binary search tree that rebalances itself after elements are inserted or deleted, so that the time complexity of its operations stays at O(log n).

For example, after A, B, and C are inserted in order into the tree above, a single rotation turns it into a tree whose search time complexity is O(log n).

[Figure 4: the tree from the previous figure after one rotation]
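A rotation is just a small rearrangement of pointers. Here is a rough sketch of a left rotation applied to the A, B, C chain; the `Node` class and the `rotate_left` name are assumptions made for this illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(x):
    """Rotate the subtree rooted at x to the left and return its new root."""
    y = x.right        # y is the node that moves up
    x.right = y.left   # y's old left subtree becomes x's right subtree
    y.left = x         # x becomes y's left child
    return y


# The degenerate chain A -> B -> C, as produced by inserting A, B, C in order.
chain = Node("A", right=Node("B", right=Node("C")))
root = rotate_left(chain)
```

After the rotation B is the root, with A on its left and C on its right. The ordering A < B < C is preserved, only the shape changes.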

However, the balanced tree remained just a concept until 1962, when two Soviet scientists invented the first balanced tree: the AVL tree.

> Strictly speaking, a balanced tree is a binary search tree that can balance itself. There are three keywords: self-balancing, binary, and search (ordered).

AVL tree

An AVL tree (named after its inventors, Adelson-Velsky and Landis) is a balanced tree in which the heights of the two subtrees of any node differ by at most 1.

[Figure 5: an AVL tree]

For example, the tree above is an AVL tree. If you don't believe me, count for yourself and check that the two subtrees of every node differ in height by at most 1.

Was it hard to see that it really is an AVL tree? Yes, and that is the first shortcoming of the AVL tree: it is not intuitive, especially when there are many nodes.
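Counting by eye is tedious, but the height rule itself is easy to check mechanically. The sketch below uses hypothetical names (`Node`, `check`, `is_avl`) and two tiny sample trees rather than the tree in the figure. It only checks the height rule and assumes the keys are already in search-tree order.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def check(node):
    """Return (height, ok), where ok means every node below obeys the AVL height rule."""
    if node is None:
        return 0, True
    left_height, left_ok = check(node.left)
    right_height, right_ok = check(node.right)
    ok = left_ok and right_ok and abs(left_height - right_height) <= 1
    return 1 + max(left_height, right_height), ok


def is_avl(root):
    return check(root)[1]


# Two sample trees (assumptions for this sketch).
balanced = Node("B", Node("A"), Node("C"))
chain = Node("A", right=Node("B", right=Node("C")))
```

Here `is_avl(balanced)` is true, while `is_avl(chain)` is false, because the two subtrees of A have heights 0 and 2.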

The second shortcoming is that the self-balancing process when inserting and deleting elements is very complicated. For example, insert a node T into the tree above:

[Figure 6: the AVL tree after inserting node T]

We look upward from T. Its parent is U, and the two subtrees of U differ in height by 1, which satisfies the AVL rule. Further up, the two subtrees of S differ in height by 1, which also satisfies the rule. Further up again, the two subtrees of V differ in height by 2, which breaks the rule, so the tree has to rebalance itself. How?

I give the diagram below. Try to work through it:

[Figure 7: rebalancing the tree with two rotations]

The red nodes mark the pivots of the rotations.

After two rotations, the tree is an AVL tree again. And this is only one of the insertion scenarios: in practice, different rotations are needed depending on where the node is inserted. You can insert a few more nodes and try balancing the tree yourself.

Likewise, the code for an AVL tree is not easy to write. To be honest, Brother Tong still hasn't fully figured out all the rules of the AVL tree.

Because of these shortcomings, all sorts of magical balanced trees were developed later.

2-3 tree

A 2-3 tree is a self-balancing tree in which every node with children (an internal node) has either two children and one data element, or three children and two data elements, and all leaf nodes are at the same height.

Put simply, every non-leaf node of a 2-3 tree has either two or three branches, so it is easiest to think of it as a "two-branch, three-branch" tree.

A node with two children and one data element is called a 2-node, and a node with three children and two data elements is called a 3-node, which is why the whole tree is called a 2-3 tree.

[Figure 8: a 2-3 tree]
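Searching a 2-3 tree works much like searching a binary search tree, except that a node may hold two elements and therefore offer three directions. A minimal sketch follows; the `Node` layout, the `contains` name, and the small sample tree are assumptions for illustration, not the tree in the figure.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # one key (2-node) or two sorted keys (3-node)
        self.children = children or []  # empty for a leaf, else len(keys) + 1 subtrees


def contains(node, key):
    """Search for key, choosing one of up to three branches at each node."""
    while node is not None:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                      # skip the keys that are smaller than key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False                # reached a leaf without finding key
        node = node.children[i]
    return False


# A small sample 2-3 tree: a 3-node root with three leaves at the same depth.
tree = Node(["D", "H"], [Node(["A", "B"]), Node(["F"]), Node(["J", "K"])])
```

For example, `contains(tree, "F")` follows the middle branch of the root and finds F, while `contains(tree, "G")` follows the same branch and comes back empty-handed.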

In a 2-3 tree, self-balancing after an insertion is much simpler than in an AVL tree. For example, to insert an element K into the tree above, we first find the node I J and insert K, forming a temporary node I J K. That node breaks the rules of the 2-3 tree, so it splits and J moves up. The node F H then becomes F H J, which also breaks the rules, so H moves up in turn and the root becomes D H. As elements move up, the child nodes are split accordingly. The process is roughly as follows:

[Figure 9: inserting K into the 2-3 tree, step by step]

> Drawing these diagrams is hard work, so please give a follow: "Tongge Read Source Code".

As you can see, a node with four children and three data elements appears during the self-balancing process above. Such a node can be called a 4-node. If 4-nodes are allowed to exist, we get another kind of tree: the 2-3-4 tree.

2-3-4 tree

In a 2-3-4 tree, every non-leaf node is a 2-node, a 3-node, or a 4-node, and the tree is self-balancing, which is why it is called a 2-3-4 tree.

The 2-node, 3-node, and 4-node were all defined above. To recap:

2-node: has two children and one data element;

3-node: has three children and two data elements;

4-node: has four children and three data elements.

[Figure 10: a 2-3-4 tree]

Inserting elements into a 2-3-4 tree is also easy to understand. For example, to insert an element M into the tree above, we find the node K L and insert M to form a 4-node. That satisfies the rules, so no self-balancing is needed:

[Figure 11: the 2-3-4 tree after inserting M]

What about inserting an element N? The process is the same as in the 2-3 tree: the node splits and an element moves up. This time the overflowing node has two middle elements, and either of them may move up. Here we take the left one of the two as an example. The process is roughly as follows:

[Figure 12: inserting N into the 2-3-4 tree, step by step]

Isn't that simple? At least it is much simpler than the left and right rotations of the AVL tree.

Likewise, a temporary 5-node appears while the 2-3-4 tree rebalances itself. So what if 5-nodes are allowed to exist too?

Well, the 2-3-4-5 tree is born!

In the same way, there are 2-3-4-5-6 trees, 2-3-4-5-6-7 trees, and so on, generation after generation, without end~

So this whole family of trees was given a new name: the B-tree.

B-tree

A B-tree is a kind of tree that allows a node to have more than two children, is self-balancing, and keeps all of its leaf nodes at the same height.

To tell which kind of tree a particular B-tree is, we give it an additional attribute: the degree.

A B-tree of degree 3 is one in which a node has at most three children, which is exactly the definition of a 2-3 tree.

A B-tree of degree 4 is one in which a node has at most four children, which is exactly the definition of a 2-3-4 tree.

[Figure 13: a B-tree]
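The "insert, overflow, split, move the middle element up" routine described for the 2-3 tree and the 2-3-4 tree can be written once and parameterized by the degree. The following is only a sketch under assumptions: the names `Node`, `insert`, `build`, and `max_children` are invented here, the keys are plain integers chosen for the demo, and when there are two middle elements the left one moves up, as in the walkthrough above.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted data elements
        self.children = children or []  # empty for a leaf


def _insert(node, key, max_keys):
    """Insert key below node. Return None, or (middle_key, right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                     # ignore duplicates
    if not node.children:
        node.keys.insert(i, key)        # leaf: put the key in place
    else:
        promoted = _insert(node.children[i], key, max_keys)
        if promoted is None:
            return None
        mid_key, right = promoted       # a child split: absorb its middle key
        node.keys.insert(i, mid_key)
        node.children.insert(i + 1, right)
    if len(node.keys) <= max_keys:
        return None
    # Overflow: split around the middle key (the left one if there are two).
    mid = (len(node.keys) - 1) // 2
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right


def insert(root, key, max_children=3):
    """Insert key and return the root. max_children=3 is a 2-3 tree, 4 a 2-3-4 tree."""
    promoted = _insert(root, key, max_children - 1)
    if promoted is None:
        return root
    mid_key, right = promoted
    return Node([mid_key], [root, right])  # the tree grows taller at the root


def build(keys, max_children):
    root = Node()
    for k in keys:
        root = insert(root, k, max_children)
    return root


two_three = build([1, 2, 3, 4, 5, 6, 7], max_children=3)
two_three_four = build([1, 2, 3, 4], max_children=4)
```

Inserting 1 to 7 with `max_children=3` ends with the root holding 4 and its two children holding 2 and 6. Inserting 1 to 4 with `max_children=4` splits the temporary 5-node so that 2 moves up, leaving leaves [1] and [3, 4]. Note that the tree only ever grows at the root, which is why all leaves stay at the same height.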

Because a B-tree node can store multiple elements, it is well suited to caching data from disk. Its overall time complexity is O(log n), and the principle is relatively simple. For these reasons it is often used for database indexes; early versions of MySQL, for example, used a B-tree as the index.

However, the B-tree has a big flaw. Suppose I want to find elements by range. Taking the 2-3-4 tree above as an example, I want all elements greater than B and less than K. How can that be done?

It is awkward: the elements are spread across every level of the tree, so an in-order traversal has to keep moving up and down between parent and child nodes. That is why there is a replacement for the B-tree: the B+ tree.

Of course, the B+ tree is not the focus of this section. The focus of this section is the red-black tree.

Nani, where is the red-black tree? I have written more than 3000 words and still haven't shown even its shadow. How embarrassing~

Come, come, the interesting red-black tree is on its way~~

Red-black tree

First, a picture. Please study it carefully:

[Figure 14: a red-black tree and a 2-3-4 tree]

Do you see it? What is a red-black tree? A red-black tree is a 2-3-4 tree!!!
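For readers without the picture, the usual way to state this correspondence is: a 2-node becomes one black node, a 3-node becomes a black node with one red child, and a 4-node becomes a black node with two red children. The sketch below encodes that mapping. It assumes this is what the picture shows; the names `Node234`, `RBNode`, and `to_red_black` are invented, and for a 3-node it arbitrarily makes the larger element black (the mirrored choice is equally valid).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node234:
    keys: List[str]                                   # 1, 2, or 3 sorted elements
    children: List["Node234"] = field(default_factory=list)


@dataclass
class RBNode:
    key: str
    color: str                                        # "red" or "black"
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def to_red_black(node):
    """Convert a 2-3-4 (sub)tree into an equivalent red-black (sub)tree."""
    if node is None:
        return None
    # Convert the children first; a leaf simply has no subtrees to hang below.
    kids = [to_red_black(c) for c in node.children] or [None] * (len(node.keys) + 1)
    if len(node.keys) == 1:                           # 2-node: one black node
        return RBNode(node.keys[0], "black", kids[0], kids[1])
    if len(node.keys) == 2:                           # 3-node: black plus one red child
        a, b = node.keys
        return RBNode(b, "black", RBNode(a, "red", kids[0], kids[1]), kids[2])
    a, b, c = node.keys                               # 4-node: black plus two red children
    return RBNode(b, "black",
                  RBNode(a, "red", kids[0], kids[1]),
                  RBNode(c, "red", kids[2], kids[3]))


# A small sample 2-3-4 tree (an assumption for this sketch).
tree = Node234(["D"], [Node234(["A", "B"]), Node234(["F", "G", "H"])])
rb = to_red_black(tree)
```

In the result, D, B, and G are black while A, F, and H are red. Every 2-3-4 node contributes exactly one black node, so the equal leaf height of the 2-3-4 tree shows up as an equal number of black nodes on every path from the root down.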

OK, that's the end of this section.

Postscript

In this section, we started from the binary tree and went through the binary search tree, the balanced tree, the AVL tree, the 2-3 tree, the 2-3-4 tree, and the B-tree, and finally arrived at the essence of the red-black tree: a red-black tree is a 2-3-4 tree in a different skin.

So why invent the red-black tree at all? Isn't it good enough to use the 2-3-4 tree directly?

We will answer that in the next section. In the next section we will also start from the essence of the red-black tree and thoroughly work through insertion, deletion, search, left rotation, and right rotation. So why not follow me? ^^

> Follow the official account "Tongge Read Source Code" to unlock more knowledge about source code, fundamentals, and architecture.
