Hello "Mr. Tree"

When your talent can't support your ambition, you should calm down and learn.

Preface

First, consider the question: why do we have trees? Because trees photosynthesize... bah! Because managing data hierarchically (insertion, deletion, modification, and lookup) is, in many scenarios, more efficient!
Note that every data structure, whether a sequential list, a linked list, a queue, or one of the various complex trees, was proposed because it solves a problem that other data structures cannot solve, or cannot solve well enough.


1. Binary tree

The basic concepts of binary trees are not repeated here. Below we focus on two of them:

  • Binary search tree
  • Balanced binary tree

2. Binary search tree

In a binary search tree, every key in the left subtree of a node is smaller than the node's key, and every key in the right subtree is larger. This ordering makes searching straightforward: starting from the root, compare the target key with the current node's key and move into the left or right subtree accordingly, repeating until the keys are equal (found) or an empty subtree is reached (not found).
However, the binary search tree has a problem when data is inserted and deleted dynamically. Search efficiency depends directly on the height of the tree: if repeated insertions and deletions make the tree "thin and tall", it degenerates into a linked list. Moreover, inserting the same nodes in different orders produces trees of different depths. Comparing their ASL (Average Search Length) shows that the most "balanced" tree is the most efficient. All of this leads us to the balanced binary tree.
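To make the height/ASL point concrete, here is a minimal Python sketch. The names `Node`, `insert`, `search`, `height`, `build` and `asl` are my own, not from any library. It inserts the same seven keys in two different orders and compares the resulting heights and ASL, counting one comparison per node visited on a successful search.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the BST rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored


def search(root: Optional[Node], key: int) -> int:
    """Return the number of nodes compared before finding key, or 0 if it is absent."""
    depth = 0
    while root is not None:
        depth += 1
        if key == root.key:
            return depth
        root = root.left if key < root.key else root.right
    return 0


def height(root: Optional[Node]) -> int:
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


def build(keys: List[int]) -> Optional[Node]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def asl(keys: List[int]) -> float:
    """Average Search Length over all keys, for the tree built in this insertion order."""
    root = build(keys)
    return sum(search(root, k) for k in keys) / len(keys)


for order in (list(range(1, 8)), [4, 2, 6, 1, 3, 5, 7]):
    print(order, height(build(order)), round(asl(order), 2))
# [1, 2, 3, 4, 5, 6, 7] 7 4.0
# [4, 2, 6, 1, 3, 5, 7] 3 2.43
```

Sorted insertion produces a chain (the "linked list" case) with ASL (1 + 2 + ... + 7) / 7 = 4, while the balanced order gives (1 + 2·2 + 4·3) / 7 = 17/7.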

Balanced binary tree (AVL tree): for any node, the absolute value of the height difference between its left and right subtrees does not exceed 1. Its height is $h = O(\log N)$.
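Why is the height logarithmic? A standard argument, sketched here in LaTeX with the convention that a single node has height 1, looks at the sparsest possible AVL tree of each height: its root has one subtree of height h-1 and one of height h-2, since the balance rule forbids a larger gap.

```latex
% n(h): minimum number of nodes in an AVL tree of height h (single node: h = 1)
n(0) = 0, \qquad n(1) = 1, \qquad n(h) = n(h-1) + n(h-2) + 1 \quad (h \ge 2)
% Let a(h) = n(h) + 1; then a(h) = a(h-1) + a(h-2) with a(0) = 1, a(1) = 2,
% so a(h) = F_{h+2} (Fibonacci, F_1 = F_2 = 1), and F_k \ge \varphi^{k-2}, \varphi = (1+\sqrt{5})/2.
N + 1 \ge n(h) + 1 = F_{h+2} \ge \varphi^{h}
\quad\Longrightarrow\quad
h \le \log_{\varphi}(N+1) \approx 1.44 \log_2(N+1) = O(\log N)
```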

Many readers don't know what AVL stands for: it is made up of the initials of the scientists who invented this data structure, Adelson-Velsky and Landis.

How do we fix the imbalance that a binary search tree runs into? If the tree leans to one side, the natural idea is to straighten it out. How? By rotating parts of the tree to the left or to the right.

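RR rotation: the node that broke the balance was inserted into the right subtree of the unbalanced node's right child; fix it with a single left rotation (the right child is lifted up).
LL rotation: the new node was inserted into the left subtree of the left child; fix it with a single right rotation (the left child is lifted up).
LR rotation: the new node was inserted into the right subtree of the left child; first rotate the left child to the left, then rotate the unbalanced node to the right.
RL rotation: the new node was inserted into the left subtree of the right child; first rotate the right child to the right, then rotate the unbalanced node to the left.
To decide which one applies, look at who broke whose balance: find the nearest node whose balance factor (the height difference between its subtrees) has just exceeded 1 in absolute value, see where the new node landed relative to it, and apply the corresponding rotation.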
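Here is a minimal recursive AVL insertion sketch in Python (the names are hypothetical and deletion is omitted). Each node stores its height; on the way back up from the insertion, the balance factor is recomputed and one of the four rotations above is applied depending on where the new key landed relative to the unbalanced node.

```python
from typing import Optional


class AVLNode:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["AVLNode"] = None
        self.right: Optional["AVLNode"] = None
        self.height = 1  # a single node has height 1


def h(node: Optional[AVLNode]) -> int:
    return node.height if node else 0


def update(node: AVLNode) -> None:
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(a: AVLNode) -> AVLNode:
    """Single right rotation (fixes the LL case); returns the new subtree root."""
    b = a.left
    a.left = b.right
    b.right = a
    update(a)
    update(b)
    return b


def rotate_left(a: AVLNode) -> AVLNode:
    """Single left rotation (fixes the RR case); returns the new subtree root."""
    b = a.right
    a.right = b.left
    b.left = a
    update(a)
    update(b)
    return b


def insert(node: Optional[AVLNode], key: int) -> AVLNode:
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # ignore duplicates
    update(node)
    balance = h(node.left) - h(node.right)  # balance factor
    if balance > 1:  # left side too tall
        if key > node.left.key:  # LR: new key is in the left child's right subtree
            node.left = rotate_left(node.left)
        return rotate_right(node)  # LL, or the second half of LR
    if balance < -1:  # right side too tall
        if key < node.right.key:  # RL: new key is in the right child's left subtree
            node.right = rotate_right(node.right)
        return rotate_left(node)  # RR, or the second half of RL
    return node


root = None
for k in range(1, 8):  # sorted input: the worst case for a plain BST
    root = insert(root, k)
print(root.key, root.height)  # 4 3
```

Only the nodes on the insertion path are touched, so one insertion costs O(log N), and the sorted keys 1..7 end up in a perfectly balanced tree instead of a chain.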

A quick aside on heaps: a tree can also be used to implement a heap (priority queue).
Use a binary search tree? NOOOO! If we keep deleting the maximum, which always sits at the far right of the tree, the tree becomes lopsided and degenerates toward a linked list. Is the idea dead, then? No. The problem can be avoided with an adjustment: keep the maximum (or minimum) at the root of a complete binary tree, so that deleting it does not wreck the tree's balance. But how do we then rearrange the remaining nodes into a valid tree again?
Those are just some quick thoughts. In practice, a heap is a complete binary tree stored in an array, in which every node holds the maximum (or minimum) of all the nodes in its subtree. After the root is removed, the last element is moved to the root and swapped downward with its larger (or smaller) child until this property holds again.
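A minimal array-based max-heap sketch follows (the `MaxHeap` class is hypothetical; in everyday Python the standard `heapq` module already provides a min-heap). `push` appends at the end and sifts up; `pop` takes the root, moves the last element to the root, and sifts it down, which answers the question of how to repair the tree after deleting the maximum.

```python
from typing import List


class MaxHeap:
    """Max-heap in a Python list: the children of index i are 2i+1 and 2i+2."""

    def __init__(self) -> None:
        self.a: List[int] = []

    def push(self, x: int) -> None:
        a = self.a
        a.append(x)  # keep the tree complete: add at the end
        i = len(a) - 1
        while i > 0 and a[(i - 1) // 2] < a[i]:  # sift up while larger than parent
            p = (i - 1) // 2
            a[i], a[p] = a[p], a[i]
            i = p

    def pop(self) -> int:
        """Remove and return the maximum (raises IndexError if the heap is empty)."""
        a = self.a
        top = a[0]  # the maximum always sits at the root
        last = a.pop()  # remove the last leaf so the tree stays complete
        if a:
            a[0] = last  # move it to the root, then sift it down
            i, n = 0, len(a)
            while True:
                big = i
                for c in (2 * i + 1, 2 * i + 2):
                    if c < n and a[c] > a[big]:
                        big = c
                if big == i:
                    break
                a[i], a[big] = a[big], a[i]
                i = big
        return top


heap = MaxHeap()
for x in [5, 3, 8, 1, 9, 2]:
    heap.push(x)
print([heap.pop() for _ in range(6)])  # [9, 8, 5, 3, 2, 1]
```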

Huffman Tree

The problem the Huffman tree solves is: how can we build a more efficient search tree when different nodes are searched with different frequencies?

For example, a search tree can be built according to how student scores are distributed, so that the most frequent score ranges are reached with the fewest comparisons.
First, a concept: the weighted path length (WPL). If leaf $k$ has weight $w_k$ and path length (depth) $l_k$, then $WPL = \sum_k w_k l_k$. The weight here can be understood as the query frequency.
Constructing a Huffman tree is simple: each time, merge the two trees with the smallest weights. For example, with the weights 1, 2, 3, 4, 5:
First 1 and 2 merge into 3, then 3 and 3 into 6, then 4 and 5 into 9, and finally 6 and 9 into 15, which becomes the root of the resulting Huffman tree.
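The merge procedure maps directly onto a priority queue. Below is a minimal sketch using Python's standard `heapq` module; the `huffman` function and its return format are my own. It also reports each leaf's depth so that WPL = Σ weight × depth can be checked by hand. For the weights 1, 2, 3, 4, 5 it performs the merges 3, 6, 9, 15, and WPL comes out as 33, which also equals the sum of those internal-node weights.

```python
import heapq
from itertools import count
from typing import Dict, List, Tuple


def huffman(weights: List[int]) -> Tuple[int, Dict[int, int]]:
    """Build a Huffman tree by repeatedly merging the two lightest trees.

    Assumes at least one weight. Returns (wpl, depth) where depth maps leaf index -> path length.
    """
    tie = count()  # tie-breaker so heapq never has to compare subtree payloads
    # Heap entry: (weight, tie, subtree); a leaf is its index, an internal node is a pair.
    heap = [(w, next(tie), i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    _, _, root = heap[0]

    depth: Dict[int, int] = {}
    stack = [(root, 0)]
    while stack:  # walk the tree to find each leaf's depth
        node, d = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], d + 1))
            stack.append((node[1], d + 1))
        else:
            depth[node] = d
    wpl = sum(weights[i] * d for i, d in depth.items())
    return wpl, depth


wpl, depth = huffman([1, 2, 3, 4, 5])
print(wpl)  # 33
```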

3. B tree

We already have the AVL tree: its search performance is stable, and search, insertion, and deletion all run in $O(\log N)$ time. Isn't that good enough? Why do we need a B-tree?

  1. The AVL tree has to maintain strict balance, so it frequently rotates nodes during insertion and deletion.
  2. Moreover, each AVL node stores only one element and has at most two children, so the tree is tall. The data lives on disk, and each query loads one page of data from disk into memory; if each level of the tree is stored in its own page, every level visited costs one disk I/O, so a single query needs many disk I/Os.
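A rough back-of-the-envelope check of point 2, assuming every node is completely full (the best case) and that each level visited costs one disk I/O. The fan-out of 128 is a hypothetical figure for how many children fit in one page, not a measured value, and `min_levels` is a made-up helper name.

```python
def min_levels(n_keys: int, m: int) -> int:
    """Fewest levels a search tree needs for n_keys keys when every node holds
    at most m - 1 keys and has at most m children (m = 2 is a binary tree).
    A completely full tree with L levels holds m**L - 1 keys."""
    levels = 0
    while m ** levels - 1 < n_keys:
        levels += 1
    return levels


print(min_levels(1_000_000, 2))    # 20: binary tree, about 20 disk I/Os per lookup
print(min_levels(1_000_000, 128))  # 3: multi-way tree, about 3 disk I/Os per lookup
```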

In order to solve the above-mentioned problems of the AVL tree, the B tree appeared.

A B-tree is also a balanced search tree, but a multi-way one. Its structural rules are as follows:

(1) Ordering: the keys in each node are sorted in increasing order, following the rule "smaller on the left, larger on the right";

(2) Number of children: unless the tree is empty, every non-leaf node has more than 1 and at most M children, where M ≥ 2 (note: the order M is the maximum number of search paths out of a node, i.e. an M-way tree; M = 2 gives a binary tree, M = 3 a ternary tree);

(3) Number of keys: every branch node other than the root holds at least ceil(M/2) - 1 and at most M - 1 keys;

(4) All leaf nodes are on the same level. Besides the keys and the pointers to the key records, leaf nodes also have child pointers, but these are all null.

For B-tree search, insertion, deletion and other operations, there is a very vivid article that walks through them step by step; after reading it you will understand these operations thoroughly.

Finished reading? Welcome back. Let me summarize:

  1. Search: compare the key with the keys in the current node. Those keys divide the value range into intervals, and the interval the key falls into determines which child pointer to follow. Repeat until the key is found or a null pointer is reached.
  2. Insertion: insert the key into the appropriate leaf. When the number of keys in a node reaches the split condition, the node is split and its middle key moves up into the parent (which may split in turn).
  3. Deletion: when the number of keys in a node drops below the minimum, the node must be merged with a sibling (or borrow a key from one).
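The three points above can be sketched in Python. This is a minimal illustration under my own assumptions: the class names `BNode`/`BTree` are hypothetical, the order `m` is the maximum number of children, duplicates are ignored, and only search and insertion with bottom-up splitting are implemented (deletion with borrowing/merging is left out).

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class BNode:
    def __init__(self, keys: Optional[List[int]] = None,
                 children: Optional[List["BNode"]] = None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # empty list = leaf

    @property
    def is_leaf(self) -> bool:
        return not self.children


class BTree:
    def __init__(self, m: int = 3):
        self.m = m  # order M: at most m children, hence at most m - 1 keys per node
        self.root = BNode()

    def search(self, key: int) -> bool:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)  # keys split the value range into intervals
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.is_leaf:  # would follow a null child pointer: not found
                return False
            node = node.children[i]

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:  # the root split: grow a new root on top
            mid, right = split
            self.root = BNode([mid], [self.root, right])

    def _insert(self, node: BNode, key: int) -> Optional[Tuple[int, BNode]]:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return None  # duplicate keys are ignored in this sketch
        if node.is_leaf:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is not None:  # the child split: take its middle key in
                mid, right = split
                node.keys.insert(i, mid)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.m:
            return None
        # Overflow (m keys): the middle key goes up, the right half becomes a new sibling.
        h = len(node.keys) // 2
        mid = node.keys[h]
        right = BNode(node.keys[h + 1:], node.children[h + 1:])
        node.keys, node.children = node.keys[:h], node.children[:h + 1]
        return mid, right


t = BTree(m=3)
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    t.insert(k)
print(t.root.keys)                  # [10]
print(t.search(17), t.search(8))    # True False
```

The tree only grows taller when the root itself splits and a new root is added on top, which is why all leaves stay on the same level.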

Note:
The difference between the B-tree and the balanced binary tree is that each B-tree node holds more keys. This pays off especially when B-trees are used in databases, which take full advantage of how disks store data in blocks (disk data is stored block by block, each block typically 4K in size, and one I/O reads a whole block at once). The node size is limited to one disk block and makes full use of it. With more keys per node, the tree has fewer levels than a binary tree, which reduces the number and cost of lookups.


Disadvantages:
Buuuut the B-tree has its own problems. First, search cost is unstable: in the best case the key is found at the root, in the worst case only at a leaf. Second, traversal is cumbersome: an in-order traversal has to move back and forth between levels, which causes extra disk I/O.
B+ trees were introduced to solve these problems.

4. B+ tree

Rules (quoted from another article):

(1) Unlike the B-tree, the non-leaf nodes of a B+ tree do not store pointers to the key records; they are used only for indexing, which greatly increases the number of keys each non-leaf node can hold;

(2) The leaf nodes store the keys from the parent nodes together with pointers to all the key records, so every data address must be obtained from a leaf. Consequently, every query descends the same number of levels;

(3) The keys in the leaf nodes are sorted in increasing order, and the last entry of each leaf stores a pointer to the first entry of the leaf on its right.

Features:

1. Fewer levels: compared with a B-tree, each non-leaf node of a B+ tree stores more keys, so the tree has fewer levels and queries are faster;

2. More stable query speed: all record addresses are stored in the leaf nodes, so every search visits the same number of levels, which makes query time more stable than in a B-tree;

3. Built-in ordering: the leaf nodes together form a sorted linked list, which makes range queries convenient; the data is stored compactly, so the cache hit rate is higher than for a B-tree;

4. Faster full traversal: traversing a B+ tree only requires walking the leaf nodes, instead of visiting every level as in a B-tree, which benefits full-table scans in databases.
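To illustrate features 3 and 4, here is a minimal Python sketch of a B+ tree's linked leaf level. It is deliberately simplified: the tree is bulk-loaded from sorted data and has only one index level (`separators`) above the leaves, whereas a real B+ tree has several index levels and supports insertion and deletion. `Leaf`, `bulk_load` and `range_query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    values: List[Any]
    next: Optional["Leaf"] = None  # pointer to the leaf on the right


def bulk_load(items: List[Tuple[int, Any]], leaf_size: int = 4):
    """Build leaves from (key, value) pairs sorted by key; separators[i] is the first key of leaves[i + 1]."""
    leaves = [
        Leaf([k for k, _ in items[i:i + leaf_size]], [v for _, v in items[i:i + leaf_size]])
        for i in range(0, len(items), leaf_size)
    ]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right  # link the leaves into a sorted list
    separators = [leaf.keys[0] for leaf in leaves[1:]]
    return separators, leaves


def range_query(separators: List[int], leaves: List[Leaf], lo: int, hi: int) -> List[Any]:
    """Return the values whose keys lie in [lo, hi]."""
    if not leaves:
        return []
    leaf = leaves[bisect_right(separators, lo)]  # index level: find the starting leaf
    out = []
    while leaf is not None:  # then just follow the leaf links
        start = bisect_left(leaf.keys, lo)
        for k, v in zip(leaf.keys[start:], leaf.values[start:]):
            if k > hi:
                return out
            out.append(v)
        leaf = leaf.next
    return out


seps, leaves = bulk_load([(k, k * 10) for k in range(1, 21)])
print(range_query(seps, leaves, 6, 13))  # [60, 70, 80, 90, 100, 110, 120, 130]
```

A range query touches the index once and then only walks neighbouring leaves, and a full scan is simply `range_query` over the whole key range.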

The B-tree's advantage over the B+ tree: because the non-leaf nodes of a B-tree store the key together with its data address, frequently accessed data that sits close to the root can be retrieved faster than in a B+ tree.

5. Red-Black Tree

The red-black tree (also called an RB tree or RB-Tree) is a self-balancing binary search tree whose nodes are colored red or black. Unlike the AVL tree, it does not strictly require the difference in height (or in node count) between the left and right subtrees to be at most 1. It is another data structure that addresses the extreme cases of binary search trees.

The red-black tree stipulates:

1. The node is red or black.

2. The root node is black.

3. Each leaf node is a black empty node (NIL node).

4. Both children of every red node are black. In other words, no path from a leaf to the root may contain two consecutive red nodes.

5. All paths from any node to each of its leaves contain the same number of black nodes.
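The five rules translate directly into a checker. Below is a minimal Python sketch with hypothetical names; `None` children play the role of the black NIL leaves, so rule 3 holds by construction, and BST key ordering is not checked here.

```python
from typing import Optional

RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left: Optional["RBNode"] = left  # None stands for a black NIL leaf
        self.right: Optional["RBNode"] = right


def black_height(node: Optional[RBNode]) -> int:
    """Check rules 1, 4 and 5 below node; return its black height, or -1 on a violation."""
    if node is None:
        return 1  # rule 3: NIL leaves are black
    if node.color not in (RED, BLACK):  # rule 1
        return -1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:  # rule 4
                return -1
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:  # rule 5
        return -1
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional[RBNode]) -> bool:
    if root is not None and root.color != BLACK:  # rule 2
        return False
    return black_height(root) != -1


valid = RBNode(2, BLACK, RBNode(1, RED), RBNode(3, RED))
print(is_red_black(valid))  # True
```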

The specific red-black tree operations are demonstrated in detail elsewhere and are fairly complicated, but in summary:

In terms of search, the red-black tree behaves almost the same as the AVL tree. The difference lies in insertion and deletion: the AVL tree performs a lot of rebalancing work on every insertion and deletion, whereas the red-black tree gives up the stricter condition of exact height balance and only requires the balance rules to be partially satisfied. Combined with recoloring, this reduces the number of rotations needed and thus improves performance. Search, insertion, and deletion in a red-black tree all take O(log n) time, and by design any imbalance is resolved within three rotations.
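The classic insertion and deletion fix-ups have many cases. As a compact stand-in, here is a sketch of insertion in a *left-leaning* red-black tree (Sedgewick's LLRB variant). Note that this is a different, simplified flavor of red-black tree, not the classic algorithm described above, but it uses the same two tools: recoloring (`flip_colors`) and rotations. All names are hypothetical.

```python
from typing import Optional

RED, BLACK = True, False


class Node:
    def __init__(self, key):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.color = RED  # new nodes are always inserted red


def is_red(n: Optional[Node]) -> bool:
    return n is not None and n.color == RED


def rotate_left(h: Node) -> Node:
    x = h.right
    h.right = x.left
    x.left = h
    x.color, h.color = h.color, RED
    return x


def rotate_right(h: Node) -> Node:
    x = h.left
    h.left = x.right
    x.right = h
    x.color, h.color = h.color, RED
    return x


def flip_colors(h: Node) -> None:
    """Recoloring: pass a red link up to the parent instead of rotating."""
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK


def _insert(h: Optional[Node], key) -> Node:
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = _insert(h.left, key)
    elif key > h.key:
        h.right = _insert(h.right, key)
    # Local fix-up on the way back up: at most two rotations and one flip per level.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def insert(root: Optional[Node], key) -> Node:
    root = _insert(root, key)
    root.color = BLACK  # rule 2: the root is black
    return root
```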

Compared with a plain BST: because a red-black tree guarantees that its longest root-to-leaf path is at most twice as long as its shortest, its search performance has a guaranteed worst case of O(log N). A plain binary search tree, by contrast, can degrade to O(N) in the worst case.
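Why the factor of two, and why O(log N)? A short sketch of the standard textbook argument, where $bh(x)$ is the number of black nodes on any path from $x$ down to a NIL leaf, not counting $x$ itself, and $h$ is the height of the tree:

```latex
% Rule 4: no two reds in a row, so below the root at least half the nodes on every path are black.
bh(\text{root}) \ge h/2
% Rule 5 (induction on subtrees): a subtree rooted at x has at least 2^{bh(x)} - 1 internal nodes.
n \ge 2^{bh(\text{root})} - 1 \ge 2^{h/2} - 1
\;\Longrightarrow\;
h \le 2\log_2(n+1) = O(\log n)
% Every path below the root has exactly bh(root) black nodes and at most as many red ones,
% so the shortest path has length >= bh(root) and the longest <= 2 bh(root).
```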

To sum up

1. Viewed as a whole, balanced binary trees, B-trees, B+ trees, and B* trees share the same implementation idea: they all use binary-search-style partitioning and balancing strategies to speed up data lookup;

2. The difference is that they evolved step by step around the way data is read from disk via I/O. Each step makes better use of the space in a node, so that the tree has fewer levels and data can be found faster.


Source: blog.csdn.net/weixin_41896265/article/details/108427213