Data structures and algorithms: the pre-sort traversal tree algorithm (MPTT)

1. Multi-level classification problem

In real-world development, multi-level classification comes up often: navigation bars, menus, product categories, multi-level linked selectors, dictionary tables, and so on. A common approach is to add a pid field that links each row to its parent, which in essence turns the data into a tree. A tree handles sub-category queries for multi-level classification very well.


But this method has a fatal problem: query efficiency is too low!

When the program looks up a child node, it has to query recursively, starting from the root node. The time complexity is O(n).
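As a rough illustration of that recursive lookup, here is a minimal sketch. The rows and the helper names (`ROWS`, `children_of`, `descendants_of`) are made up for this example, and `children_of` merely stands in for one database query per call.

```python
# Hypothetical adjacency-list rows: (id, pid); pid is None for the root.
ROWS = [
    (1, None), (2, 1), (3, 2), (4, 1), (5, 4), (6, 4),
    (7, 1), (8, 7), (9, 8), (10, 7), (11, 10),
]


def children_of(node_id):
    """Stand-in for one query: SELECT id FROM t WHERE pid = :node_id."""
    return [row_id for row_id, pid in ROWS if pid == node_id]


def descendants_of(node_id):
    """Collect every descendant by recursing one level at a time."""
    found = []
    for child in children_of(node_id):
        found.append(child)
        found.extend(descendants_of(child))  # one more lookup per node visited
    return found


print(descendants_of(4))  # [5, 6]
print(descendants_of(7))  # [8, 9, 10, 11]
```

Every node that is visited costs another lookup, so the number of queries grows with the size of the subtree.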

So is there a way to improve the query efficiency of the tree? The answer is yes! Many variants improve on the standard tree, such as binary trees, red-black trees, heaps, and so on. But that is not the point here; what I want to share today is the pre-sort traversal tree algorithm (MPTT).

MPTT exists precisely to solve the query-efficiency problem of multi-level relational data. Its query cost can be as low as a constant, that is, O(1): one non-recursive query, regardless of how deep the tree is. Isn't that incredible? Let's learn the pre-sort traversal tree algorithm together and see how it is implemented.

2. Pre-sort traversal tree

The full name of the pre-sort traversal tree algorithm is Modified Preorder Tree Traversal, abbreviated MPTT.

1. ORM mapping

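from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
# Assumption: BaseNestedSets is the mixin from the sqlalchemy_mptt package.
from sqlalchemy_mptt.mixins import BaseNestedSets

Base = declarative_base()


class Tree(Base, BaseNestedSets):
    __tablename__ = 'tree'

    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(8), nullable=True, default=None)

    def __repr__(self):
        return f'<Tree(id={self.id}, name={self.name})>'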

2. MPTT analysis

In the code above, only two fields are defined: id and name. But the database table ends up with 5 extra fields, namely: lft, rgt, level, tree_id, parent_id.

These extra fields define the structure and hierarchy of the tree. Let's look at the role of each field.

  • tree_id: Tree id, used to distinguish a certain tree among the many trees in the database.

  • level: A standard tree will have height, depth, and level. The level of the root node is 1, and the level of child nodes is the level of the parent node plus 1.

  • parent_id: The id of the node's parent. The root node has no parent, so the value is NULL.

  • lft: The left value of the node.

  • rgt: The right value of the node.

The left and right values of a node are the core of the MPTT algorithm and its particularly clever part: they reduce the cost of traversing the tree to O(1). Next, we focus on how the left and right values are used to traverse the tree.

A standard tree structure:

[Figure: tree diagram]

Correspondence of database data:

[Figure: database rows for the tree]

Data hierarchy:

- 【1】
- - 【2】
- - - 【3】
- - 【4】
- - - 【5】
- - - 【6】
- - 【7】
- - - 【8】
- - - - 【9】
- - - 【10】
- - - - 【11】
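The left and right values for this hierarchy can be reproduced with one depth-first walk. The sketch below is only an illustration: `CHILDREN` and `number_tree` are names invented here, and the child order is assumed to be the order in which the nodes are listed above.

```python
# The hierarchy listed above, written as parent -> ordered children.
CHILDREN = {
    1: [2, 4, 7], 2: [3], 3: [], 4: [5, 6], 5: [], 6: [],
    7: [8, 10], 8: [9], 9: [], 10: [11], 11: [],
}


def number_tree(root, children):
    """Return {node: (lft, rgt, level)} from one depth-first walk."""
    values = {}
    counter = 1

    def visit(node, level):
        nonlocal counter
        lft = counter  # assigned on the way down
        counter += 1
        for child in children[node]:
            visit(child, level + 1)
        values[node] = (lft, counter, level)  # rgt assigned on the way back up
        counter += 1

    visit(root, 1)
    return values


NUMBERED = number_tree(1, CHILDREN)
for node in sorted(NUMBERED):
    print(node, NUMBERED[node])
```

With this numbering, node 4 gets (lft, rgt) = (6, 11) and node 9 gets (14, 15), which are the values used in the queries that follow.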

Traverse the whole tree

To traverse the entire tree, just select the rows where tree_id equals 1.

Find all descendants of a node

To find all descendants of node 4, use node 4 as the reference point. Every node whose left value is greater than 6 and whose right value is less than 11 is a descendant of node 4.
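A minimal sketch of that range query, using Python's built-in sqlite3 module. The table layout and the function `descendant_ids` are assumptions for illustration; the left and right values come from numbering the hierarchy above depth-first, and they agree with the values quoted for node 4.

```python
import sqlite3

# (id, lft, rgt, level, tree_id, parent_id) for the example tree.
ROWS = [
    (1, 1, 22, 1, 1, None), (2, 2, 5, 2, 1, 1), (3, 3, 4, 3, 1, 2),
    (4, 6, 11, 2, 1, 1), (5, 7, 8, 3, 1, 4), (6, 9, 10, 3, 1, 4),
    (7, 12, 21, 2, 1, 1), (8, 13, 16, 3, 1, 7), (9, 14, 15, 4, 1, 8),
    (10, 17, 20, 3, 1, 7), (11, 18, 19, 4, 1, 10),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER,"
    " level INTEGER, tree_id INTEGER, parent_id INTEGER)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def descendant_ids(node_id):
    """Look up the node's own interval, then run one non-recursive range query."""
    lft, rgt, tree_id = conn.execute(
        "SELECT lft, rgt, tree_id FROM tree WHERE id = ?", (node_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT id FROM tree"
        " WHERE tree_id = ? AND lft > ? AND rgt < ? ORDER BY lft",
        (tree_id, lft, rgt),
    )
    return [row[0] for row in rows]


print(descendant_ids(4))  # [5, 6]
```

The number of queries stays the same however deep the subtree is, which is the point of storing the two extra values.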

Find all child nodes under a node

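To find all child nodes of node 1, use node 1 as the reference point: select the rows where tree_id equals 1 and level equals 2. This works because node 1 is the root; for any other node, filtering on parent_id gives its direct children.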

Find the path of a node

To find the path to node 9, use node 9 as the reference point. Every node whose left value is less than 14 and whose right value is greater than 15 is an ancestor of node 9; ordered by left value and followed by node 9 itself, they form its path. The result is: 1 -> 7 -> 8 -> 9.
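The same lookup can be sketched in plain Python. `INTERVALS` and `path_to` are illustrative names, the values are the depth-first numbering of the hierarchy above, and tree_id is left out on the assumption that there is only one tree.

```python
# node id -> (lft, rgt), numbered depth-first from the hierarchy above.
INTERVALS = {
    1: (1, 22), 2: (2, 5), 3: (3, 4), 4: (6, 11), 5: (7, 8), 6: (9, 10),
    7: (12, 21), 8: (13, 16), 9: (14, 15), 10: (17, 20), 11: (18, 19),
}


def path_to(node_id, intervals):
    """Ancestors enclose the node's interval; sorting by lft puts the root first."""
    lft, rgt = intervals[node_id]
    ancestors = [
        other for other, (o_lft, o_rgt) in intervals.items()
        if o_lft < lft and o_rgt > rgt
    ]
    ancestors.sort(key=lambda other: intervals[other][0])
    return ancestors + [node_id]


print(" -> ".join(str(n) for n in path_to(9, INTERVALS)))  # 1 -> 7 -> 8 -> 9
```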

3. MPTT balance algorithm

MPTT is fast at traversal, but other operations become very slow, so when using MPTT, try to avoid operations other than queries.


So why are operations other than queries so slow?

This is because inserting, updating (moving), and deleting nodes all disrupt the balance of the tree. After any of these operations, the left and right values of the affected nodes have to be renumbered to reach a new balance.

Add

Taking the addition of a new node as an example, the algorithm breaks down into the following steps:

  • If the new node does not belong to any existing tree, a new tree must be created. The node then has no parent: its parent_id is NULL, its level is 1, and its tree_id is the largest existing tree_id plus 1.

  • If the new node is added to an existing tree, its parent_id is the id of the parent node, its level is the parent's level plus 1, and its tree_id is the same as the parent's.

  • Repair the left values of the other nodes whose balance is broken: for every node whose left value is greater than the right value of the parent node, add 2 to its left value.

  • Repair the right values of the other nodes whose balance is broken: for every node whose right value is greater than or equal to the right value of the parent node, add 2 to its right value.
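The steps above can be sketched on an in-memory table as follows. This is an illustration, not the code of any particular library: `add_child`, `ROWS`, and `NODES` are hypothetical names, the new node is assumed to become the last child of its parent (taking the parent's old right value as its left value), and a new root is assumed to start at (1, 2).

```python
def add_child(nodes, new_id, parent_id):
    """Insert new_id as the last child of parent_id, renumbering in place.

    nodes maps id -> dict with keys lft, rgt, level, tree_id, parent_id.
    """
    if parent_id is None:
        # No parent: start a new tree (assumed to be numbered 1..2).
        next_tree = max((n["tree_id"] for n in nodes.values()), default=0) + 1
        nodes[new_id] = {"lft": 1, "rgt": 2, "level": 1,
                         "tree_id": next_tree, "parent_id": None}
        return

    parent = nodes[parent_id]
    boundary = parent["rgt"]  # the new node takes over this position
    for node in nodes.values():
        if node["tree_id"] != parent["tree_id"]:
            continue
        if node["lft"] > boundary:
            node["lft"] += 2  # repair left values
        if node["rgt"] >= boundary:
            node["rgt"] += 2  # repair right values (includes the parent)
    nodes[new_id] = {"lft": boundary, "rgt": boundary + 1,
                     "level": parent["level"] + 1,
                     "tree_id": parent["tree_id"], "parent_id": parent_id}


# (id, lft, rgt, level, parent_id) for the example tree, all in tree_id 1.
ROWS = [
    (1, 1, 22, 1, None), (2, 2, 5, 2, 1), (3, 3, 4, 3, 2),
    (4, 6, 11, 2, 1), (5, 7, 8, 3, 4), (6, 9, 10, 3, 4),
    (7, 12, 21, 2, 1), (8, 13, 16, 3, 7), (9, 14, 15, 4, 8),
    (10, 17, 20, 3, 7), (11, 18, 19, 4, 10),
]
NODES = {
    i: {"lft": lft, "rgt": rgt, "level": lvl, "tree_id": 1, "parent_id": pid}
    for i, lft, rgt, lvl, pid in ROWS
}

add_child(NODES, 12, 4)  # hypothetical new node 12 under node 4
print(NODES[12]["lft"], NODES[12]["rgt"])  # 11 12
print(NODES[4]["lft"], NODES[4]["rgt"])    # 6 13
```

Adding a single leaf under node 4 rewrites nodes 1, 4, 7, 8, 9, 10, and 11, which shows why writes get more expensive as the table grows.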

Delete

Deleting is similar to adding, except that after a node is deleted the adjustment runs in the opposite direction: the affected left and right values are decreased by 2 instead of increased.
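A minimal sketch of the reverse adjustment, limited to leaf nodes, where the gap to close is exactly 2. `delete_leaf` and `NODES` are hypothetical names; removing a whole subtree would leave a wider gap (rgt - lft + 1), which this sketch does not handle.

```python
def delete_leaf(nodes, node_id):
    """Remove a leaf and close the gap of width 2 it leaves behind.

    nodes maps id -> dict with keys lft, rgt, tree_id.
    """
    target = nodes[node_id]
    if target["rgt"] - target["lft"] != 1:
        raise ValueError("this sketch only handles leaf nodes")
    del nodes[node_id]
    for node in nodes.values():
        if node["tree_id"] != target["tree_id"]:
            continue
        if node["lft"] > target["rgt"]:
            node["lft"] -= 2  # nodes to the right shift left
        if node["rgt"] > target["rgt"]:
            node["rgt"] -= 2  # ancestors and nodes to the right shrink


# id -> (lft, rgt) for the example tree, all in tree_id 1.
NODES = {
    i: {"lft": lft, "rgt": rgt, "tree_id": 1}
    for i, (lft, rgt) in {
        1: (1, 22), 2: (2, 5), 3: (3, 4), 4: (6, 11), 5: (7, 8), 6: (9, 10),
        7: (12, 21), 8: (13, 16), 9: (14, 15), 10: (17, 20), 11: (18, 19),
    }.items()
}

delete_leaf(NODES, 6)
print(NODES[4]["lft"], NODES[4]["rgt"])  # 6 9
print(NODES[1]["lft"], NODES[1]["rgt"])  # 1 20
```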

Update (move)

Updating (moving) a node is actually deleting the old node and adding a new one. Refer to the steps above for the specific algorithm.

3. Comparison of the pros and cons of standard trees and pre-sorted traversal trees

  • Standard tree: Suitable for scenarios with many additions and deletions, since only one row needs to be modified each time. For queries, as the number of classification levels grows, the recursive queries on the adjacency list become gradually less efficient.

  • Pre-sort traversal tree: Suitable for scenarios dominated by queries. Query efficiency is not affected by the number of classification levels. However, as the data grows, every addition or deletion has to update many affected rows at the same time, so write efficiency gradually decreases.

There is no perfect algorithm. Which storage structure and algorithm to use should be decided by the specific application scenario.


Origin blog.csdn.net/yilovexing/article/details/107066591