1. Multi-level classification problem
In real-world development you will often run into multi-level classification: navigation bars, menus, product categories, multi-level linked selects, dictionary tables, and so on. The usual approach is to add a pid field that points each row at its parent, which in essence makes the data a tree. A tree handles the sub-category queries of a multi-level classification very well.
But this method has a fatal problem: the query efficiency is too low!
To find the nodes under a category, the program has to query recursively, level by level, starting from the root node. The number of queries grows with the size of the tree, so the time complexity is O(n).
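To make the cost concrete, here is a minimal sketch of the pid approach. The in-memory `ROWS` table and the `children` and `descendants` helpers are made up for illustration; in a real application every call to `children` would be one more database query.

```python
# Hypothetical adjacency-list data: (id, pid). pid is None for the root.
ROWS = [(1, None), (2, 1), (3, 2), (4, 1), (5, 4), (6, 4)]


def children(pid):
    """Return the ids whose parent is `pid` (stands in for one query)."""
    return [node_id for node_id, parent in ROWS if parent == pid]


def descendants(node_id):
    """Collect all descendants by recursing one level at a time."""
    found = []
    for child in children(node_id):
        found.append(child)
        found.extend(descendants(child))
    return found


print(descendants(1))  # [2, 3, 4, 5, 6]
print(descendants(4))  # [5, 6]
```

Every node that is visited triggers another lookup, which is where the linear cost comes from.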
So is there a way to improve the query efficiency of the tree? The answer is yes! Many improved structures have been built on top of the standard tree, such as binary trees, red-black trees, heaps, and so on. But those are not the point here; what I want to share today is the Modified Preorder Tree Traversal (MPTT) algorithm.
MPTT exists precisely to solve the query-efficiency problem of multi-level relational data: a subtree or a path can be fetched with a constant number of queries, that is O(1), however deep the tree is. Isn't it incredible? Let's learn the pre-sorted traversal tree algorithm together and see how it is implemented.
2. Pre-sorted traversal tree
The full name of the pre-sorted traversal tree algorithm is Modified Preorder Tree Traversal, abbreviated MPTT.
1. ORM mapping
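from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
from sqlalchemy_mptt.mixins import BaseNestedSets  # assumed source of BaseNestedSets

Base = declarative_base()


class Tree(Base, BaseNestedSets):
    __tablename__ = 'tree'

    # Only id and name are declared here; the mixin contributes the tree columns.
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(8), nullable=True, default=None)

    def __repr__(self):
        return f'<Tree(id={self.id}, name={self.name})>'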
2. MPTT analysis
In the above code, only two fields are defined: id and name. But the database table ends up with 5 extra fields, namely: lft, rgt, level, tree_id, and parent_id.
These extra fields define the structure and hierarchy of the tree. Let's look at the role of each field.
- tree_id: the id of the tree, used to distinguish one tree from the many trees in the database.
- level: a standard tree has a height, a depth, and levels. The level of the root node is 1, and the level of a child node is the level of its parent plus 1.
- parent_id: the id of the node's parent. The root node has no parent, so the value is NULL.
- lft: the left value of the node.
- rgt: the right value of the node.
The left and right values of a node are the core of the MPTT algorithm and its cleverest part: with them, traversing the tree no longer needs recursion and takes a constant number of queries, O(1). Next, we will focus on how the left and right values are used to traverse the tree.
A standard tree structure:
The corresponding database rows:
Data hierarchy:
- 【1】
- - 【2】
- - - 【3】
- - 【4】
- - - 【5】
- - - 【6】
- - 【7】
- - - 【8】
- - - - 【9】
- - - 【10】
- - - - 【11】
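How the left and right values come about can be sketched in a few lines: walk the tree in preorder, hand each node the next number as its left value when you enter it, and the next number as its right value when you leave it. The `CHILDREN` mapping and the `number_tree` helper below are illustrative names, not part of any library. The result agrees with the values quoted in the queries that follow (node 4 gets 6 and 11, node 9 gets 14 and 15).

```python
# The hierarchy listed above, as parent id -> child ids.
CHILDREN = {1: [2, 4, 7], 2: [3], 4: [5, 6], 7: [8, 10], 8: [9], 10: [11]}


def number_tree(root):
    """Assign (lft, rgt, level) to every node with one preorder walk."""
    values = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        lft = counter  # left value: numbered on the way down
        for child in CHILDREN.get(node, []):
            visit(child, level + 1)
        counter += 1  # right value: numbered on the way back up
        values[node] = (lft, counter, level)

    visit(root, 1)
    return values


values = number_tree(1)
for node in sorted(values):
    print(node, values[node])
```

A node's left and right values enclose those of all its descendants, which is the property every query in this section relies on.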
Traverse the whole tree
To traverse the entire tree, just select the rows whose tree_id equals 1.
Find all descendants of a node
To find all descendants of node 4, use node 4 as the reference point. All nodes whose left value is greater than 6 and whose right value is less than 11 are descendants of node 4.
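As a rough sketch, the descendant lookup can be written as a single range query. The example below uses an in-memory SQLite table instead of the ORM model; the table layout, the `ROWS` data (left and right values derived from the hierarchy above), and the `descendants` helper are assumptions for illustration. Only one tree is stored, so the tree_id condition is left out.

```python
import sqlite3

# (id, lft, rgt) for the example tree.
ROWS = [(1, 1, 22), (2, 2, 5), (3, 3, 4), (4, 6, 11), (5, 7, 8), (6, 9, 10),
        (7, 12, 21), (8, 13, 16), (9, 14, 15), (10, 17, 20), (11, 18, 19)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", ROWS)


def descendants(node_id):
    """Return the descendant ids of `node_id` using one range query."""
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM tree WHERE id = ?", (node_id,)).fetchone()
    rows = conn.execute(
        "SELECT id FROM tree WHERE lft > ? AND rgt < ? ORDER BY lft",
        (lft, rgt))
    return [row[0] for row in rows]


print(descendants(4))  # [5, 6]
print(descendants(7))  # [8, 9, 10, 11]
```

No matter how deep the subtree is, the lookup stays at two statements: one to read the reference node and one to read its range.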
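Find all child nodes under a node
To find all child nodes of node 1, use node 1 as the reference point: select the rows whose tree_id equals 1 and whose level equals 2. This is enough because node 1 is the root; for any other parent, the left and right values must also fall inside the parent's range.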
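A possible way to express the child lookup in plain Python is shown below. The `Node` tuple, the `NODES` data, and the `direct_children` helper are hypothetical. The range check on lft and rgt is an addition of this sketch so that the same helper also works for parents other than the root.

```python
from collections import namedtuple

Node = namedtuple("Node", "id lft rgt level tree_id")

# Example tree; lft, rgt and level are derived from the hierarchy shown earlier.
NODES = [Node(*row, 1) for row in [
    (1, 1, 22, 1), (2, 2, 5, 2), (3, 3, 4, 3), (4, 6, 11, 2), (5, 7, 8, 3),
    (6, 9, 10, 3), (7, 12, 21, 2), (8, 13, 16, 3), (9, 14, 15, 4),
    (10, 17, 20, 3), (11, 18, 19, 4)]]


def direct_children(parent_id):
    """Same tree, exactly one level deeper, and inside the parent's range."""
    parent = next(n for n in NODES if n.id == parent_id)
    return [n.id for n in NODES
            if n.tree_id == parent.tree_id
            and n.level == parent.level + 1
            and parent.lft < n.lft and n.rgt < parent.rgt]


print(direct_children(1))  # [2, 4, 7]
print(direct_children(7))  # [8, 10]
```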
Find the path of a node
To find the full path from the root to node 9, use node 9 as the reference point. All nodes whose left value is less than 14 and whose right value is greater than 15 are ancestors of node 9. Sorted by left value and followed by node 9 itself, they give the path 1 -> 7 -> 8 -> 9.
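The path lookup is the mirror image of the descendant lookup: instead of the nodes inside the reference range, take the nodes that enclose it. A small sketch, with `ROWS` and `path_to` as illustrative names and the left and right values derived from the hierarchy above:

```python
# (id, lft, rgt) for the example tree.
ROWS = [(1, 1, 22), (2, 2, 5), (3, 3, 4), (4, 6, 11), (5, 7, 8), (6, 9, 10),
        (7, 12, 21), (8, 13, 16), (9, 14, 15), (10, 17, 20), (11, 18, 19)]


def path_to(node_id):
    """Return the ids from the root down to `node_id`."""
    by_id = {row[0]: row for row in ROWS}
    _, lft, rgt = by_id[node_id]
    # Ancestors are exactly the nodes whose range encloses the target's range.
    ancestors = [row for row in ROWS if row[1] < lft and row[2] > rgt]
    ancestors.sort(key=lambda row: row[1])  # smaller lft = closer to the root
    return [row[0] for row in ancestors] + [node_id]


print(" -> ".join(str(i) for i in path_to(9)))  # 1 -> 7 -> 8 -> 9
```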
3. MPTT balance algorithm
MPTT makes traversal fast, but other operations become very slow, so when using MPTT, try to avoid operations other than queries.
So why are all operations other than queries so slow?
This is because inserting, updating (moving), and deleting nodes disrupts the balance of the tree, that is, the consistency of its left and right values. After each of these operations, the values of the affected nodes have to be renumbered to reach a new balance.
Add
Taking node insertion as an example, the algorithm can be broken down into the following steps:
- If the new node goes into a tree that does not exist yet, a new tree must be created. The node has no parent, so its parent_id is NULL, its level is 1, and its tree_id is the largest existing tree_id plus 1.
- If the new node goes into an existing tree, its parent_id is the id of the parent node, its level is the parent's level plus 1, and its tree_id is the same as the parent's.
- Repair the left values of the other nodes whose balance is broken: every node whose left value is greater than the parent's right value gets its left value increased by 2.
- Repair the right values of the other nodes whose balance is broken: every node whose right value is greater than or equal to the parent's right value gets its right value increased by 2.
Delete
Deleting is similar to adding, except that the left and right values are adjusted in the opposite direction: after a leaf node is deleted, the affected values are reduced by 2.
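A minimal sketch of the deletion case, limited to leaf nodes and a single tree. The `rows` layout and the `delete_leaf` helper are assumptions for illustration only.

```python
def delete_leaf(rows, node_id):
    """Remove a leaf and close the gap of width 2 it leaves behind.

    `rows` maps id -> {"lft": ..., "rgt": ...} for a single tree.
    """
    node = rows[node_id]
    if node["rgt"] != node["lft"] + 1:
        raise ValueError("this sketch only handles leaf nodes")
    del rows[node_id]
    for row in rows.values():
        if row["lft"] > node["rgt"]:
            row["lft"] -= 2
        if row["rgt"] > node["rgt"]:
            row["rgt"] -= 2


rows = {1: {"lft": 1, "rgt": 8}, 2: {"lft": 2, "rgt": 5},
        4: {"lft": 3, "rgt": 4}, 3: {"lft": 6, "rgt": 7}}
delete_leaf(rows, 4)
print(rows)
```

Removing a node together with its descendants works the same way, except that the gap to close is the width of the whole removed range (rgt - lft + 1) rather than 2.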
Update (move)
Updating (moving) a node is in effect deleting the old node and adding a new one. Refer to the steps above for the specific algorithm.
3. Comparison of the pros and cons of standard trees and pre-sorted traversal trees
- Standard tree: suitable for scenarios with many additions and deletions, since only one row needs to be modified each time. For queries, the efficiency of the recursive adjacency-list query gradually decreases as the number of classification levels grows.
- Pre-sorted traversal tree: suitable for scenarios dominated by queries. Query efficiency is not affected by the number of classification levels. However, as the data grows, every insertion or deletion has to update many affected rows at the same time, so the efficiency of writes gradually decreases.
There is no perfect algorithm. Which storage structure and algorithm to use in actual development depends on the specific application scenario.