Talking about the tree of data structure, you will never be afraid of the interviewer asking!

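This article only covers the concepts of trees and how to approach them in interviews. It does not go into implementing or analyzing specific structures or algorithms; those will be covered in later updates.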

1. Tree

In computer science, a tree is an abstract data type, or a data structure implementing that abstract data type, used to model a collection of data with a hierarchical tree structure. It is a hierarchical collection of n (n > 0) nodes. It is called a "tree" because it looks like an upside-down tree: the root is at the top and the leaves are at the bottom. It has the following features:

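① Each node has zero or more child nodes;
② The node that has no parent is called the root node;
③ Every non-root node has exactly one parent node;
④ Apart from the root, the nodes can be divided into multiple disjoint subtrees.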

Next, you need to know the basic tree terminology: degree, leaf node, root node, parent node, child node, depth, and height.
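To make these terms concrete, here is a minimal Java sketch over a general tree, where a node may have any number of children. The class `TreeTerms` and its methods are hypothetical names chosen for illustration, and the sketch assumes the textbook convention that the root sits on level 1 (some books count from 0, which shifts depth and height by one).

```java
import java.util.ArrayList;
import java.util.List;

public class TreeTerms {
    // A general (not necessarily binary) tree node with any number of children.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Adds a child and returns this node, so small trees can be built in one expression.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Degree of a node: the number of children it has.
    static int degree(Node n) { return n.children.size(); }

    // A leaf node is a node of degree 0.
    static boolean isLeaf(Node n) { return n.children.isEmpty(); }

    // Height: number of levels from this node down to its deepest leaf (a lone leaf has height 1).
    static int height(Node n) {
        int max = 0;
        for (Node c : n.children) max = Math.max(max, height(c));
        return max + 1;
    }

    // Depth of target: number of levels from root down to target (root has depth 1); -1 if absent.
    static int depth(Node root, Node target) {
        if (root == target) return 1;
        for (Node c : root.children) {
            int d = depth(c, target);
            if (d > 0) return d + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A is the root; B and C are its children (A is their parent); D is a child of B.
        Node d = new Node("D");
        Node b = new Node("B").add(d);
        Node root = new Node("A").add(b).add(new Node("C"));
        System.out.println(degree(root));   // 2
        System.out.println(isLeaf(d));      // true
        System.out.println(height(root));   // 3
        System.out.println(depth(root, d)); // 3
    }
}
```

Degree and "leaf or not" are local properties of a single node, while height and depth require walking the tree.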

Binary tree

Binary tree: a tree in which every node has at most two subtrees is called a binary tree. (Most trees in textbook exercises are binary trees, but that does not mean all trees are binary trees.)

From the binary tree, the concepts of the full binary tree and the complete binary tree are derived.

Full binary tree: every node on every level except the last has two child nodes. Put another way, every non-leaf node has two children and all leaf nodes lie on the same level, so the number of nodes reaches its maximum (2^h - 1 for depth h). (Many English texts call this a perfect binary tree.)

Complete binary tree: if the depth of the binary tree is h, then every level from 1 to h-1 holds the maximum number of nodes, and all nodes on level h are packed contiguously from the left.
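Both definitions can be checked mechanically. The following is a minimal sketch; `TreeShapes`, `isFull` and `isComplete` are hypothetical names. It treats "full" in this article's sense (the node count equals 2^h - 1) and checks "complete" with a level-order scan: once an empty child slot has been seen, no further node may appear.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class TreeShapes {
    static class TreeNode {
        int val;
        TreeNode left, right;

        TreeNode(int val, TreeNode left, TreeNode right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }

        TreeNode(int val) { this(val, null, null); }
    }

    // Height counted in levels: an empty tree has height 0, a single node has height 1.
    static int height(TreeNode n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    static int count(TreeNode n) {
        return n == null ? 0 : 1 + count(n.left) + count(n.right);
    }

    // "Full" in this article's sense: a tree of height h holds the maximum 2^h - 1 nodes.
    static boolean isFull(TreeNode root) {
        return count(root) == (1 << height(root)) - 1;
    }

    // "Complete": scanning child slots in level order, no node may appear after an empty slot.
    static boolean isComplete(TreeNode root) {
        if (root == null) return true;
        Queue<TreeNode> queue = new ArrayDeque<>(); // ArrayDeque rejects nulls, so gaps are tracked by a flag
        queue.add(root);
        boolean seenGap = false;
        while (!queue.isEmpty()) {
            TreeNode n = queue.poll();
            for (TreeNode child : new TreeNode[] { n.left, n.right }) {
                if (child == null) {
                    seenGap = true;   // first missing slot seen in level order
                } else if (seenGap) {
                    return false;     // a node after a gap: the last level is not packed to the left
                } else {
                    queue.add(child);
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TreeNode perfect = new TreeNode(1,
                new TreeNode(2, new TreeNode(4), new TreeNode(5)),
                new TreeNode(3, new TreeNode(6), new TreeNode(7)));
        TreeNode complete = new TreeNode(1, new TreeNode(2, new TreeNode(4), null), new TreeNode(3));
        System.out.println(isFull(perfect) + " " + isComplete(perfect));   // true true
        System.out.println(isFull(complete) + " " + isComplete(complete)); // false true
    }
}
```

Every full binary tree is also complete, but not the other way round.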

Algorithm implementation (laugh)

Binary tree:

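// Binary tree node, written as a nested class: a value plus at most two child links.
private static class TreeNode {
    int val;         // the value stored in this node
    TreeNode left;   // left child, or null if absent
    TreeNode right;  // right child, or null if absent
    TreeNode(int x) { val = x; }
}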

Ways to traverse a binary tree

Pre-order traversal: root node -> traverse the left subtree -> traverse the right subtree

In-order traversal: traverse the left subtree -> root node -> traverse the right subtree

Post-order traversal: traverse the left subtree -> traverse the right subtree -> root node
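The three orders differ only in where the root is visited relative to the two recursive calls. A minimal recursive sketch; the class `Traversals` and its sample tree are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class Traversals {
    static class TreeNode {
        int val;
        TreeNode left, right;

        TreeNode(int val, TreeNode left, TreeNode right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }

        TreeNode(int val) { this(val, null, null); }
    }

    // Pre-order: root -> left subtree -> right subtree.
    static void preorder(TreeNode n, List<Integer> out) {
        if (n == null) return;
        out.add(n.val);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // In-order: left subtree -> root -> right subtree.
    static void inorder(TreeNode n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.val);
        inorder(n.right, out);
    }

    // Post-order: left subtree -> right subtree -> root.
    static void postorder(TreeNode n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.val);
    }

    //       1
    //      / \
    //     2   3
    //    / \
    //   4   5
    static TreeNode sample() {
        return new TreeNode(1, new TreeNode(2, new TreeNode(4), new TreeNode(5)), new TreeNode(3));
    }

    public static void main(String[] args) {
        List<Integer> pre = new ArrayList<>(), mid = new ArrayList<>(), post = new ArrayList<>();
        preorder(sample(), pre);
        inorder(sample(), mid);
        postorder(sample(), post);
        System.out.println(pre);  // [1, 2, 4, 5, 3]
        System.out.println(mid);  // [4, 2, 5, 1, 3]
        System.out.println(post); // [4, 5, 2, 3, 1]
    }
}
```

Applied to a binary search tree (section 2.1), in-order traversal yields the keys in ascending order.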

Depth First Search (DFS) and Breadth First Search (BFS)

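Implementation: BFS uses a queue. Nodes are enqueued and dequeued in first-in, first-out order, so the search expands level by level across all paths at once. DFS uses a stack (an explicit one, or the call stack when written recursively). Nodes are pushed and popped in last-in, first-out order, so one path is followed as deep as possible before backtracking. (from Zhihu)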

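Relationship: problems solved with DFS, such as traversal or reachability, can generally also be solved with BFS. DFS is easy to write (recursively) and only needs memory proportional to the depth of the current path, but deep recursion can overflow the call stack; BFS keeps its frontier in an explicit queue whose length you can control.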
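To see the queue-versus-stack difference side by side, here is a minimal sketch on a small binary tree; `SearchOrders`, `bfs` and `dfs` are hypothetical names. The DFS version uses an explicit `ArrayDeque` as a stack instead of recursion, which is one common way to avoid the stack-overflow problem mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SearchOrders {
    static class TreeNode {
        int val;
        TreeNode left, right;

        TreeNode(int val, TreeNode left, TreeNode right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }

        TreeNode(int val) { this(val, null, null); }
    }

    // BFS: a FIFO queue, so nodes come out level by level.
    static List<Integer> bfs(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.offer(root);
        while (!queue.isEmpty()) {
            TreeNode n = queue.poll();
            out.add(n.val);
            if (n.left != null) queue.offer(n.left);
            if (n.right != null) queue.offer(n.right);
        }
        return out;
    }

    // DFS with an explicit LIFO stack (heap-allocated, so depth is not limited by the call stack).
    static List<Integer> dfs(TreeNode root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeNode n = stack.pop();
            out.add(n.val);
            // Push right first so the left subtree is popped, and therefore explored, first.
            if (n.right != null) stack.push(n.right);
            if (n.left != null) stack.push(n.left);
        }
        return out;
    }

    // Same shape as before: 1 has children 2 and 3; 2 has children 4 and 5.
    static TreeNode sample() {
        return new TreeNode(1, new TreeNode(2, new TreeNode(4), new TreeNode(5)), new TreeNode(3));
    }

    public static void main(String[] args) {
        System.out.println(bfs(sample())); // [1, 2, 3, 4, 5]
        System.out.println(dfs(sample())); // [1, 2, 4, 5, 3]
    }
}
```

BFS here produces level order, while this DFS produces exactly the pre-order sequence.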

2. Dynamic search tree

2.1 Binary search tree

The binary search tree is a concept derived from the binary tree:

A binary search tree (BST), also known as an ordered binary tree or sorted binary tree, is either an empty tree or a binary tree with the following properties:
1. If the left subtree of any node is not empty, the values of all nodes in the left subtree are less than the value of that node;
2. If the right subtree of any node is not empty, the values of all nodes in the right subtree are greater than the value of that node;
3. The left and right subtrees of any node are themselves binary search trees;
4. No two nodes have the same key.

Compared with other data structures, the advantage of the binary search tree is that search and insertion take O(log n) time on average. The binary search tree is a basic building block for more abstract data structures such as sets, multisets, and associative arrays.
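The properties above translate directly into insertion and search. A minimal sketch, assuming integer keys; the class name `BinarySearchTree` is illustrative, not a library type. The second tree in `main` is built from sorted input, which previews the degeneration problem discussed in the next section.

```java
public class BinarySearchTree {
    static class Node {
        int key;
        Node left, right;

        Node(int key) { this.key = key; }
    }

    private Node root;

    // Smaller keys go left, larger keys go right; equal keys are ignored (property 4).
    void insert(int key) {
        root = insert(root, key);
    }

    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Each comparison discards one subtree, so a search follows one root-to-leaf path: O(height).
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    int height() {
        return height(root);
    }

    private static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        BinarySearchTree mixed = new BinarySearchTree();
        for (int k : new int[] {4, 2, 6, 1, 3, 5, 7}) mixed.insert(k);
        BinarySearchTree sorted = new BinarySearchTree();
        for (int k = 1; k <= 7; k++) sorted.insert(k); // sorted input: every node gets only a right child
        System.out.println(mixed.contains(5) + " " + mixed.contains(8)); // true false
        System.out.println(mixed.height() + " " + sorted.height());     // 3 7
    }
}
```

The same seven keys give height 3 in one insertion order and height 7 (a chain) in the other, so the O(log n) bound only holds when the tree stays reasonably balanced.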

2.2 Balanced Binary Tree (AVL Tree)

Balanced binary tree: a binary tree in which, for every node, the heights of its two subtrees differ by at most 1.
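The definition can be checked in a single post-order pass that computes subtree heights bottom-up and stops as soon as a height difference exceeds 1. A minimal sketch; `BalanceCheck` and `checkedHeight` are hypothetical names.

```java
public class BalanceCheck {
    static class TreeNode {
        int val;
        TreeNode left, right;

        TreeNode(int val, TreeNode left, TreeNode right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }

        TreeNode(int val) { this(val, null, null); }
    }

    // Returns the height of n, or -1 as soon as any node in this subtree breaks the rule.
    private static int checkedHeight(TreeNode n) {
        if (n == null) return 0;
        int lh = checkedHeight(n.left);
        if (lh < 0) return -1;
        int rh = checkedHeight(n.right);
        if (rh < 0) return -1;
        if (Math.abs(lh - rh) > 1) return -1; // subtree heights differ by more than 1
        return 1 + Math.max(lh, rh);
    }

    static boolean isBalanced(TreeNode root) {
        return checkedHeight(root) >= 0;
    }

    public static void main(String[] args) {
        TreeNode ok = new TreeNode(1, new TreeNode(2, new TreeNode(4), null), new TreeNode(3));
        TreeNode chain = new TreeNode(1, null, new TreeNode(2, null, new TreeNode(3)));
        System.out.println(isBalanced(ok));    // true
        System.out.println(isBalanced(chain)); // false
    }
}
```

Each node is visited once, so the check runs in O(n) time.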

The AVL tree was the first self-balancing binary search tree to be invented, and it is the most basic and typical balanced binary tree.

The balanced binary tree is an improvement on the binary search tree. In some extreme cases (for example, when keys are inserted in sorted order), a binary search tree degenerates into something close to, or exactly, a linked list, and the time complexity of its operations degrades to linear, O(n). A balanced binary tree therefore uses self-balancing operations (rotations) to keep the height difference between the two subtrees of every node at no more than 1.
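Here is a minimal sketch of AVL insertion, assuming integer keys and no duplicates (deletion is omitted). After each insertion the heights along the path are updated, and one of the four classic cases (LL, RR, LR, RL) is fixed with a single or double rotation. Names such as `AvlTree` and `rotateLeft` are illustrative.

```java
public class AvlTree {
    static class Node {
        int key;
        int height = 1; // a new node is a leaf of height 1
        Node left, right;

        Node(int key) { this.key = key; }
    }

    Node root;

    private static int h(Node n) { return n == null ? 0 : n.height; }

    private static void update(Node n) { n.height = 1 + Math.max(h(n.left), h(n.right)); }

    // Balance factor: left height minus right height; AVL keeps it in {-1, 0, 1}.
    private static int balance(Node n) { return h(n.left) - h(n.right); }

    // Right rotation: the left child y becomes the new root of this subtree.
    private static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        update(x);
        update(y);
        return y;
    }

    // Left rotation: mirror image of rotateRight.
    private static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    void insert(int key) { root = insert(root, key); }

    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n; // duplicate key: nothing to do
        update(n);
        int bf = balance(n);
        if (bf > 1) {                                                   // left side too tall
            if (balance(n.left) < 0) n.left = rotateLeft(n.left);       // LR: turn it into LL first
            return rotateRight(n);                                      // LL
        }
        if (bf < -1) {                                                  // right side too tall
            if (balance(n.right) > 0) n.right = rotateRight(n.right);   // RL: turn it into RR first
            return rotateLeft(n);                                       // RR
        }
        return n;
    }

    int height() { return h(root); }

    public static void main(String[] args) {
        AvlTree t = new AvlTree();
        for (int k = 1; k <= 7; k++) t.insert(k);
        System.out.println(t.height()); // 3
        System.out.println(t.root.key); // 4
    }
}
```

Feeding it the sorted sequence 1 to 7, which would turn a plain binary search tree into a chain of height 7, yields a tree of height 3 rooted at 4.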

For details, see the 1962 paper "An algorithm for the organization of information" by G. M. Adelson-Velsky and E. M. Landis. (I will fill in this gap later.)

2.3 Red-black tree

The red-black tree is another self-balancing binary search tree. It satisfies the following properties:
1. Every node is either red or black. (red or black)
2. The root node is black. (black root)
3. Every leaf node (here "leaf" means the NIL or NULL pointer at the end of the tree) is black. (black leaves)
4. If a node is red, both of its children are black. (red parent, black children)
5. For any node, every path from that node down to a NIL leaf contains the same number of black nodes. (same black count on every path)
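These properties are easy to verify mechanically, which is a handy way to test a red-black tree implementation. A minimal checker sketch with hypothetical names (`RedBlackCheck`, `blackHeight`); it represents the NIL leaves as `null` and checks only the color rules, not the binary-search-tree key order.

```java
public class RedBlackCheck {
    static final boolean RED = true;
    static final boolean BLACK = false;

    static class Node {
        int key;
        boolean color;     // property 1: a boolean can only be RED or BLACK
        Node left, right;  // null plays the role of the black NIL leaves (property 3)

        Node(int key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    private static boolean isRed(Node n) {
        return n != null && n.color == RED;
    }

    // Returns the number of black nodes on every path from n down to a NIL leaf (NIL included),
    // or -1 if property 4 or 5 is violated somewhere in this subtree.
    private static int blackHeight(Node n) {
        if (n == null) return 1;                                        // NIL leaf counts as black
        if (isRed(n) && (isRed(n.left) || isRed(n.right))) return -1;   // property 4
        int lh = blackHeight(n.left);
        int rh = blackHeight(n.right);
        if (lh < 0 || rh < 0 || lh != rh) return -1;                    // property 5
        return lh + (n.color == BLACK ? 1 : 0);
    }

    // Property 2 (black root) plus properties 4 and 5; key order is not checked here.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }

    public static void main(String[] args) {
        Node good = new Node(2, BLACK, new Node(1, RED, null, null), new Node(3, RED, null, null));
        Node redRed = new Node(3, BLACK, new Node(2, RED, new Node(1, RED, null, null), null), null);
        System.out.println(isValid(good));   // true
        System.out.println(isValid(redRed)); // false
    }
}
```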

The picture shows a typical red-black tree. The purpose of the red-black tree's self-balancing adjustments is to keep these basic properties intact after every insertion or deletion.

The self-balancing adjustments of a red-black tree are rotations and recoloring.

The red-black tree is widely used: for example, it underlies TreeSet and TreeMap in the Java collection classes, set and map in the C++ STL, and virtual memory management in Linux.

2.4 Huffman Tree

For a given set of leaf weights, the Huffman tree is the binary tree with the minimum weighted path length; it is also called the optimal binary tree.

Generally, it can be constructed as follows:

1. Treat each weight as a separate tree consisting of a single root node whose left and right subtrees are empty; together these trees form a forest.
2. From the forest, select the two trees whose roots have the smallest weights and make them the left and right subtrees of a new tree, setting the weight of the new root to the sum of the weights of their roots. By convention, the subtree with the smaller weight goes on the left.
3. Remove these two trees from the forest and add the new tree to it.
4. Repeat steps 2 and 3 until only one tree remains in the forest. That tree is the Huffman tree.
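The four steps map naturally onto a priority queue (min-heap) holding the forest. A minimal sketch, assuming character symbols with integer weights; `Huffman` and its methods are hypothetical names, and the example weights are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class Huffman {
    static class Node {
        final int weight;
        final Character symbol; // null for internal (merged) nodes
        final Node left, right;

        Node(int weight, Character symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    // Steps 1-4: one single-node tree per symbol, then repeatedly merge the two lightest roots.
    static Node build(Map<Character, Integer> weights) {
        PriorityQueue<Node> forest = new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        for (Map.Entry<Character, Integer> e : weights.entrySet()) {
            forest.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (forest.size() > 1) {
            Node lighter = forest.poll(); // the smaller weight becomes the left subtree
            Node heavier = forest.poll();
            forest.add(new Node(lighter.weight + heavier.weight, null, lighter, heavier));
        }
        return forest.poll();
    }

    // Weighted path length: sum over leaves of weight * (number of edges from the root).
    static int weightedPathLength(Node n, int depth) {
        if (n.symbol != null) return n.weight * depth;
        return weightedPathLength(n.left, depth + 1) + weightedPathLength(n.right, depth + 1);
    }

    // Huffman codes: '0' for each left edge, '1' for each right edge.
    static void codes(Node n, String prefix, Map<Character, String> out) {
        if (n.symbol != null) {
            out.put(n.symbol, prefix.isEmpty() ? "0" : prefix); // a lone symbol still needs one bit
            return;
        }
        codes(n.left, prefix + "0", out);
        codes(n.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Integer> weights = new LinkedHashMap<>();
        weights.put('a', 5);
        weights.put('b', 9);
        weights.put('c', 12);
        weights.put('d', 13);
        weights.put('e', 16);
        weights.put('f', 45);
        Node root = build(weights);
        Map<Character, String> table = new TreeMap<>();
        codes(root, "", table);
        System.out.println(table);                       // {a=1100, b=1101, c=100, d=101, e=111, f=0}
        System.out.println(weightedPathLength(root, 0)); // 224
    }
}
```

Heavier symbols get shorter codes ('f', with weight 45, gets a single bit), and because every symbol sits at a leaf, no code is a prefix of another, which is what makes decoding unambiguous.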

You may have heard more about Huffman coding, which is an application of the Huffman tree: it encodes the characters of a text so that the total encoded length is as short as possible while decoding remains unambiguous.

3. Multi-way search tree

In large-scale data storage, for example when implementing index lookups, the number of elements that can usefully be stored in a tree node is limited (if a node held a very large number of elements, searching would degenerate into a linear scan inside the node). With so few elements per node, a binary search tree over a large data set becomes very deep; each level of a lookup can require a disk read, and the resulting frequent disk I/O makes queries slow. Multi-way search trees address this by storing many elements per node, which keeps the tree shallow.

3.1 B tree

A B-tree is a self-balancing tree that keeps data sorted and supports search, sequential access, insertion, and deletion in logarithmic time. Broadly speaking, a B-tree generalizes the binary search tree: a node can have more than two children. Unlike self-balancing binary search trees, B-trees are well suited to storage systems that read and write relatively large blocks of data, such as disks. A B-tree of order m (each node has at most m children) has the following properties:
1. The root node has at least two children (unless it is itself a leaf).

2. Each internal (non-root, non-leaf) node contains k-1 elements and k children, where ⌈m/2⌉ <= k <= m.

3. Each leaf node contains k-1 elements, where ⌈m/2⌉ <= k <= m.

4. All leaf nodes are on the same level.

5. The elements in each node are sorted in ascending order, and the k-1 elements of a node split the key range into exactly the k ranges covered by its k children.

As shown in the figure, this is a B-tree that satisfies the definition. Because time spent in memory is negligible compared with disk I/O, query performance improves as long as the tree is low enough that each query needs only a few I/Os.
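To illustrate why tree height matters, here is a minimal search sketch that counts node visits as a stand-in for disk reads. It assumes integer keys and hand-builds a tiny order-3 B-tree; `BTreeSearch` and `sample` are hypothetical names, and insertion with node splitting is not shown.

```java
import java.util.Arrays;

public class BTreeSearch {
    // A B-tree node: sorted keys and, for internal nodes, keys.length + 1 children.
    static class Node {
        final int[] keys;
        final Node[] children; // null for a leaf

        Node(int[] keys, Node... children) {
            this.keys = keys;
            this.children = children.length == 0 ? null : children;
        }
    }

    // Returns how many nodes were read to find key (a stand-in for disk I/Os), or -1 if absent.
    static int search(Node root, int key) {
        int reads = 0;
        for (Node n = root; n != null; ) {
            reads++;
            int i = Arrays.binarySearch(n.keys, key); // search inside the node
            if (i >= 0) return reads;                 // key is in this node
            if (n.children == null) return -1;        // reached a leaf without finding it
            n = n.children[-i - 1];                   // insertion point = child whose range holds key
        }
        return -1;
    }

    // An order-3 example: [10 | 20] on top of three leaves.
    static Node sample() {
        return new Node(new int[] {10, 20},
                new Node(new int[] {3, 7}),
                new Node(new int[] {13, 17}),
                new Node(new int[] {25, 30}));
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(search(root, 20)); // 1
        System.out.println(search(root, 17)); // 2
        System.out.println(search(root, 8));  // -1
    }
}
```

The number of reads is bounded by the tree height, which is why packing more keys into each node (and so lowering the height) pays off on disk.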

Insertion and deletion in a B-tree also preserve these self-balancing properties, by splitting and merging nodes and by shifting (rotating) elements between sibling nodes.

B-trees are used in file systems and in the indexes of some non-relational databases.

3.2 B+ tree

A B+ tree is a tree data structure commonly used in relational databases (such as MySQL) and in operating-system file systems. It keeps data stable and ordered, and insertion and modification have a fairly stable logarithmic time complexity. A B+ tree grows from the bottom up: elements are inserted into the leaves, and an overflowing node splits and pushes a key up to its parent, so the tree only gets taller when the root splits. This is the opposite of a binary search tree, which grows downward by adding new leaves.

Building on the B-tree, the B+ tree adds linked-list pointers between the leaf nodes (B-tree + ordered linked list of leaves). All keys appear in the leaf nodes, and the non-leaf nodes serve as an index over the leaves, so every successful search in a B+ tree ends at a leaf node.
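The leaf linked list is what makes range queries cheap: after one descent to the first relevant leaf, the scan just follows `next` pointers. A minimal sketch of only the leaf level, assuming integer keys; the index levels above are not modeled, and `BPlusLeaves` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class BPlusLeaves {
    // Leaf level of a B+ tree: each leaf holds sorted keys and a pointer to the next leaf.
    static class Leaf {
        final int[] keys;
        Leaf next;

        Leaf(int... keys) { this.keys = keys; }
    }

    // Range query [lo, hi]. Assumes the descent through the index (non-leaf) levels,
    // which is not modeled here, has already found the leaf where lo belongs.
    static List<Integer> range(Leaf start, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k > hi) return out; // keys are sorted across the whole chain: stop early
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    static Leaf sampleChain() {
        Leaf a = new Leaf(1, 3), b = new Leaf(5, 7), c = new Leaf(9, 11);
        a.next = b;
        b.next = c;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(range(sampleChain(), 3, 9)); // [3, 5, 7, 9]
    }
}
```

A plain B-tree would have to move back up and down the tree to produce the same range, since its keys are spread over internal and leaf nodes.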

The non-leaf nodes of a B+ tree store no data, only the separator keys (the maximum or minimum key) of their subtrees. Therefore, for nodes of the same size, a B+ tree can have more branches than a B-tree, which makes the tree shorter and wider and reduces the number of I/O operations per query.
This matters most when most nodes live in secondary storage such as hard drives. Maximizing the number of children of each internal node reduces the height of the tree, so balancing operations happen less often and efficiency improves.

3.3 B* tree

The B* tree is a variant of the B+ tree that adds pointers to sibling nodes in the non-root, non-leaf nodes.

In other words, on top of the B+ tree, linked-list pointers are also added between non-leaf nodes, which raises the minimum node utilization from 1/2 to 2/3.

3.4 R tree

The R-tree is a tree data structure for storing spatial data, for example to build indexes over geographic locations, rectangles, and polygons.

The core idea of the R-tree is to group nearby nodes and represent each group, at the next level up, by the minimum bounding rectangle (MBR) that encloses them; this MBR becomes a node of the upper level. Because every node lies inside its MBR, a query region that does not intersect a given rectangle cannot intersect any node inside that rectangle, so the whole subtree can be skipped. Each rectangle at the leaf level represents one object, and each higher node aggregates the objects below it, with higher levels aggregating more objects. Each level can also be seen as an approximation of the data set: the leaf level is the finest-grained approximation, matching the data set 100%, and the higher the level, the coarser the approximation.
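The pruning rule can be sketched with axis-aligned rectangles. This is a minimal, hand-built example with no insertion or node-splitting heuristics (which real R-trees need); `RTreeSketch`, `Rect` and `Node.group` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class RTreeSketch {
    // Axis-aligned rectangle [minX, maxX] x [minY, maxY].
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }

        // Smallest rectangle that covers both this and o.
        Rect union(Rect o) {
            return new Rect(Math.min(minX, o.minX), Math.min(minY, o.minY),
                    Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
        }
    }

    // A leaf entry is one object; an internal node's rectangle is the MBR of its children.
    static final class Node {
        final String name; // object id for leaf entries, null for internal nodes
        final Rect mbr;
        final List<Node> children = new ArrayList<>();

        Node(String name, Rect mbr) {
            this.name = name;
            this.mbr = mbr;
        }

        // Builds a parent whose MBR encloses all the given children.
        static Node group(Node... kids) {
            Rect r = kids[0].mbr;
            for (Node k : kids) r = r.union(k.mbr);
            Node parent = new Node(null, r);
            for (Node k : kids) parent.children.add(k);
            return parent;
        }
    }

    // Returns the objects whose rectangles intersect the query.
    static List<String> search(Node root, Rect query) {
        List<String> hits = new ArrayList<>();
        collect(root, query, hits);
        return hits;
    }

    private static void collect(Node n, Rect query, List<String> hits) {
        if (!n.mbr.intersects(query)) return; // nothing inside this MBR can match: skip the subtree
        if (n.children.isEmpty()) {
            hits.add(n.name);                  // leaf entry whose rectangle intersects the query
            return;
        }
        for (Node c : n.children) collect(c, query, hits);
    }

    static Node sample() {
        Node near = Node.group(new Node("A", new Rect(0, 0, 1, 1)), new Node("B", new Rect(2, 2, 3, 3)));
        Node far = Node.group(new Node("C", new Rect(10, 10, 11, 11)), new Node("D", new Rect(12, 12, 13, 13)));
        return Node.group(near, far);
    }

    public static void main(String[] args) {
        System.out.println(search(sample(), new Rect(2.5, 2.5, 4, 4))); // [B]
        System.out.println(search(sample(), new Rect(0, 0, 20, 20)));   // [A, B, C, D]
    }
}
```

In the first query the whole "far" group is skipped after a single rectangle test, without looking at C or D.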
