A sub-category query optimization scheme for multi-level classification

Background

In projects we often need to query the sub-categories of a multi-level classification, such as an organizational structure, a multi-level menu, or commodity types. A common data structure looks like this:

picture.png

Problems with this design:

  • Sub-category queries on a multi-level classification have to be implemented through recursion; the time complexity is O(n), and efficiency is relatively low.

  • Recursion that goes too deep can easily cause a stack overflow.

So is there a better solution for sub-category queries on a multi-level classification? The Modified Preorder Tree Traversal (MPTT) algorithm solves this problem.
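For contrast, the recursive approach on a parent-pointer table can be sketched as follows. This is only an illustration: the rows, the `children_of` helper, and the `descendants` function are hypothetical, and each call to `children_of` stands in for one database lookup.

```python
# Hypothetical adjacency-list rows: (id, name, parent_id); -1 marks the root.
ROWS = [
    (1, "Food", -1),
    (2, "Fruit", 1),
    (3, "Red", 2),
    (4, "Yellow", 2),
    (5, "Banana", 4),
]


def children_of(parent_id):
    """Simulates one 'SELECT ... WHERE parent_id = ?' lookup."""
    return [row for row in ROWS if row[2] == parent_id]


def descendants(node_id):
    """Collects all descendant names recursively, one lookup per visited node."""
    found = []
    for child in children_of(node_id):
        found.append(child[1])
        # The recursion depth grows with the depth of the tree.
        found.extend(descendants(child[0]))
    return found


print(descendants(2))  # names of every node below Fruit
```

The number of lookups grows with the number of nodes visited, and the call stack grows with the depth of the tree, which is the cost the rest of this article tries to avoid.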

Introduction

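MPTT (Modified Preorder Tree Traversal) is an algorithm mainly used for storing and traversing hierarchical relationships.

Principle

The principle of MPTT: the hierarchy of the tree is represented by a left value and a right value stored on each node. The values are assigned while walking the tree in preorder: a node receives its left value when it is first visited and its right value after all of its descendants have been visited, so a node's interval always encloses the intervals of its descendants.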

As shown below:

picture.png

The tree structure is converted into the structure data of the database as follows:

picture.png

Description: the MPTT data structure stores, for each node, its left and right values, its level, and its parent node.

  • treeid: the id of the tree, used to identify a tree in the database.
  • level: the level of the node in the tree; the root node has level 1, and a child node's level is its parent's level plus 1.
  • parentId: the id of the parent node; since the root node has no parent, its value is -1.
  • lftNode: the node's left value.
  • rgtNode: the node's right value.
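How the left and right values come about can be sketched with a small numbering routine. This is a minimal illustration, not the article's implementation: the `TREE` literal and the `number_tree` function are hypothetical, and node names other than Food, Fruit, Red, Yellow and Banana are assumed for the example.

```python
# Hypothetical nested tree: (name, [children]); some names are illustrative.
TREE = ("Food", [
    ("Fruit", [
        ("Red", [("Cherry", [])]),
        ("Yellow", [("Banana", [])]),
    ]),
    ("Meat", [("Beef", []), ("Pork", [])]),
])


def number_tree(node, rows, counter=1, level=1):
    """Assigns lft when entering a node and rgt when leaving it.

    Appends one dict per node to rows and returns the next free counter.
    """
    name, children = node
    row = {"name": name, "level": level, "lft": counter}
    rows.append(row)
    counter += 1
    for child in children:
        counter = number_tree(child, rows, counter, level + 1)
    row["rgt"] = counter
    return counter + 1


rows = []
number_tree(TREE, rows)
for row in rows:
    print(row["name"], row["lft"], row["rgt"], row["level"])
```

With this input, Fruit receives the interval (2, 11) and Banana (8, 9), which matches the values used in the queries of this article.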

Basic application

Traverse the whole tree

To traverse the entire tree, you only need to filter on treeid; ordering the result by lftNode returns the nodes in preorder.
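As a minimal sketch of this traversal, the snippet below filters by tree id and sorts by left value, then indents each name by its level to print an outline. The sample rows are assumptions reconstructed from the values quoted in this article; names such as Cherry, Meat, Beef and Pork, the second tree, and the `traverse` helper are illustrative only.

```python
# Assumed sample rows: (name, level, lftNode, rgtNode, treeid).
ROWS = [
    ("Banana", 4, 8, 9, 1),
    ("Beef", 3, 13, 14, 1),
    ("Cherry", 4, 4, 5, 1),
    ("Food", 1, 1, 18, 1),
    ("Fruit", 2, 2, 11, 1),
    ("Meat", 2, 12, 17, 1),
    ("Pork", 3, 15, 16, 1),
    ("Red", 3, 3, 6, 1),
    ("Yellow", 3, 7, 10, 1),
    ("Other root", 1, 1, 2, 2),  # a second, unrelated tree
]


def traverse(treeid):
    """Returns the nodes of one tree in preorder (sorted by left value)."""
    nodes = [row for row in ROWS if row[4] == treeid]
    return sorted(nodes, key=lambda row: row[2])


# Indent each node by two spaces per level below the root.
outline = ["  " * (level - 1) + name for name, level, _, _, _ in traverse(1)]
print("\n".join(outline))
```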

Find all descendant nodes under a node

To find all descendant nodes under Fruit, whose left value is 2 and right value is 11, you only need to select the nodes whose left value is greater than 2 and whose right value is less than 11.

SELECT * FROM t_tree  t WHERE t.`lftNode`>2 AND t.`rgtNode`<11;

Search result

picture.png

Because the whole subtree is fetched with a single range query instead of one query per level, this is much more efficient than a recursive query.
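The hard-coded bounds can also be looked up in the same statement with a self-join. The sketch below runs against an in-memory SQLite database rather than MySQL, with assumed sample data (node names such as Cherry, Meat, Beef and Pork are illustrative) and a hypothetical `descendants` helper; it also assumes each tree is numbered independently, hence the treeid condition.

```python
import sqlite3

# Assumed sample data: (oid, name, parentOid, level, lftNode, rgtNode, treeid).
ROWS = [
    (1, "Food", -1, 1, 1, 18, 1),
    (2, "Fruit", 1, 2, 2, 11, 1),
    (3, "Red", 2, 3, 3, 6, 1),
    (4, "Cherry", 3, 4, 4, 5, 1),
    (5, "Yellow", 2, 3, 7, 10, 1),
    (6, "Banana", 5, 4, 8, 9, 1),
    (7, "Meat", 1, 2, 12, 17, 1),
    (8, "Beef", 7, 3, 13, 14, 1),
    (9, "Pork", 7, 3, 15, 16, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_tree (oid INTEGER PRIMARY KEY, name TEXT, parentOid INTEGER,"
    " level INTEGER, lftNode INTEGER, rgtNode INTEGER, treeid INTEGER)"
)
conn.executemany("INSERT INTO t_tree VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)


def descendants(name):
    """Returns the names of all nodes strictly inside the named node's interval."""
    sql = """
        SELECT child.name
        FROM t_tree AS child
        JOIN t_tree AS parent
          ON child.treeid = parent.treeid
         AND child.lftNode > parent.lftNode
         AND child.rgtNode < parent.rgtNode
        WHERE parent.name = ?
        ORDER BY child.lftNode
    """
    return [row[0] for row in conn.execute(sql, (name,))]


print(descendants("Fruit"))
```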

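Find the direct child nodes of a node

To find the direct children of Fruit, restrict the descendant query to the level directly below it. Fruit is at level 2, so its children are the descendants at level 3.

SELECT * FROM t_tree  t  WHERE t.`lftNode`>2 AND t.`rgtNode`<11 AND t.`level`=3;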

Search result

picture.png
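A direct child is a descendant exactly one level below its parent. A minimal sketch of that rule, using assumed sample rows (some node names are illustrative) and a hypothetical `children` helper:

```python
# Assumed sample rows: (name, level, lftNode, rgtNode).
ROWS = [
    ("Food", 1, 1, 18),
    ("Fruit", 2, 2, 11),
    ("Red", 3, 3, 6),
    ("Cherry", 4, 4, 5),
    ("Yellow", 3, 7, 10),
    ("Banana", 4, 8, 9),
    ("Meat", 2, 12, 17),
    ("Beef", 3, 13, 14),
    ("Pork", 3, 15, 16),
]


def children(name):
    """Returns descendants of the named node that sit one level below it."""
    _, level, lft, rgt = next(row for row in ROWS if row[0] == name)
    return [
        row[0]
        for row in sorted(ROWS, key=lambda row: row[2])
        if row[2] > lft and row[3] < rgt and row[1] == level + 1
    ]


print(children("Fruit"))
```

Filtering on the parent id column would give the same result; the level-based form is shown here because it reuses the interval condition from the descendant query.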

Find the path of a node

To find the path to the Banana node, whose left value is 8 and right value is 9, you only need to select the nodes whose left value is less than 8 and whose right value is greater than 9.

SELECT * FROM t_tree  t WHERE t.`lftNode`<8 AND t.`rgtNode`>9;

Search result

picture.png

According to the query result, the path to the Banana node is: Food->Fruit->Yellow
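The ancestors come back in root-to-leaf order when sorted by left value. A minimal sketch, using assumed sample rows (some node names are illustrative) and a hypothetical `path_to` helper:

```python
# Assumed sample rows: (name, level, lftNode, rgtNode).
ROWS = [
    ("Food", 1, 1, 18),
    ("Fruit", 2, 2, 11),
    ("Red", 3, 3, 6),
    ("Cherry", 4, 4, 5),
    ("Yellow", 3, 7, 10),
    ("Banana", 4, 8, 9),
    ("Meat", 2, 12, 17),
    ("Beef", 3, 13, 14),
    ("Pork", 3, 15, 16),
]


def path_to(name):
    """Returns the ancestors of the named node, ordered from the root down."""
    _, _, lft, rgt = next(row for row in ROWS if row[0] == name)
    # An ancestor's interval strictly encloses the node's interval.
    ancestors = [row for row in ROWS if row[2] < lft and row[3] > rgt]
    ancestors.sort(key=lambda row: row[2])
    return [row[0] for row in ancestors]


print("->".join(path_to("Banana")))  # Food->Fruit->Yellow
```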

How many descendants does a node have?

To find the number of descendant nodes under Fruit, no query over the descendants is needed. The formula is:

 Number of descendants = (right value - left value - 1) / 2

Fruit has a left value of 2 and a right value of 11, so its total number of descendant nodes is (11-2-1)/2=4
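The formula works because every descendant contributes exactly two numbers (its own left and right value) to the range strictly between the node's left and right values. A small sketch, with a hypothetical `descendant_count` function:

```python
def descendant_count(lft, rgt):
    """Number of descendants of a node with the given left and right values."""
    return (rgt - lft - 1) // 2


print(descendant_count(2, 11))  # Fruit
print(descendant_count(8, 9))   # Banana, a leaf node
```

A leaf node always has a right value equal to its left value plus 1, so the formula yields 0 for it.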

Add child node

Although MPTT is very efficient for queries, adding and deleting nodes is expensive: every insertion or deletion requires the left and right values of other nodes to be recalculated and shifted, so these operations are slower.

Implementation logic

  • If the tree does not exist yet, the new node becomes the root of a new tree: parentId is -1, level is 1, and treeId is the maximum existing treeId plus 1.

  • If the node is added to an existing tree, parentId is the id of the parent node, level is the parent's level plus 1, and treeId is the same as the parent's.

  • Repair the left values that the insertion invalidates: add 2 to the left value of every node whose left value is greater than the parent's right value.

  • Repair the right values that the insertion invalidates: add 2 to the right value of every node whose right value is greater than the parent's right value, and to the parent's own right value.
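The steps above for adding to an existing tree can be sketched against an in-memory SQLite database. This is an illustration under assumptions, not the article's stored procedure: the sample data covers only part of the tree, `add_child` is a hypothetical helper, the treeid condition assumes each tree is numbered independently, and `>=` is used so that the parent's own right value is shifted in the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_tree (oid INTEGER PRIMARY KEY, name TEXT, parentOid INTEGER,"
    " level INTEGER, lftNode INTEGER, rgtNode INTEGER, treeid INTEGER)"
)
# Assumed sample data; only part of the tree is needed for the demonstration.
conn.executemany(
    "INSERT INTO t_tree VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Food", -1, 1, 1, 8, 1),
        (2, "Fruit", 1, 2, 2, 7, 1),
        (3, "Red", 2, 3, 3, 6, 1),
        (4, "Cherry", 3, 4, 4, 5, 1),
    ],
)


def add_child(conn, oid, name, parent_oid):
    """Inserts a node as the last child of parent_oid (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception is raised
        parent = conn.execute(
            "SELECT rgtNode, level, treeid FROM t_tree WHERE oid = ?",
            (parent_oid,),
        ).fetchone()
        if parent is None:
            raise ValueError(f"parent {parent_oid} not found")
        right, level, treeid = parent
        # Open a gap of width 2 at the parent's right edge. Using >= also
        # widens the parent itself together with all of its ancestors.
        conn.execute(
            "UPDATE t_tree SET rgtNode = rgtNode + 2"
            " WHERE treeid = ? AND rgtNode >= ?",
            (treeid, right),
        )
        conn.execute(
            "UPDATE t_tree SET lftNode = lftNode + 2"
            " WHERE treeid = ? AND lftNode > ?",
            (treeid, right),
        )
        # The new node occupies the gap at the parent's old right value.
        conn.execute(
            "INSERT INTO t_tree VALUES (?, ?, ?, ?, ?, ?, ?)",
            (oid, name, parent_oid, level + 1, right, right + 1, treeid),
        )


add_child(conn, 11, "apple", 3)
for row in conn.execute("SELECT name, lftNode, rgtNode FROM t_tree ORDER BY lftNode"):
    print(row)
```

Wrapping the three statements in one transaction matters: if the insert fails after the updates have run, the left and right values would otherwise be left inconsistent.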

Implementation

DELIMITER $$

DROP PROCEDURE IF EXISTS `SP_TREE_ADD`$$

CREATE PROCEDURE `SP_TREE_ADD`(
IN p_oid INT (11),
IN p_name VARCHAR (30),
IN p_parentOid INT (11)
)
BEGIN
        # Fetch the parent node's right value, oid, level and tree id
        SELECT @myRight :=rgtNode,@oid :=oid,@level :=LEVEL,@treeid :=treeid FROM t_tree t WHERE oid = p_parentOid;
        # Add 2 to the right value of every node whose right value is greater than the parent's right value
        UPDATE t_tree SET rgtNode = rgtNode + 2 WHERE rgtNode > @myRight;
        # Add 2 to the left value of every node whose left value is greater than the parent's right value
        UPDATE t_tree SET lftNode = lftNode + 2 WHERE lftNode > @myRight;
        # Widen the parent itself so that it encloses the new node
        UPDATE t_tree SET rgtNode=rgtNode+2 WHERE oid=@oid;
        # Insert the new node into the gap at the parent's old right value
        INSERT INTO t_tree(oid,NAME,parentOid,LEVEL,lftNode,rgtNode,treeid)
        VALUES(p_oid, p_name, p_parentOid,@level+1,@myRight, @myRight +1,@treeid);
END$$

DELIMITER ;

Test

Add an apple child node under the Red node by executing the stored procedure above.

CALL SP_TREE_ADD(11,'apple',3);

After successful insertion, the result is as follows

picture.png

To verify the nodes under Red again, query for the nodes whose left value is greater than 3 and whose right value is less than 8.

SELECT * FROM t_tree  t WHERE t.`lftNode`>3 AND t.`rgtNode`<8;

picture.png

Delete child node

Implementation logic

  • Delete the node together with its descendants, that is, every node whose left value lies between the left and right values of the node being deleted.
  • Repair the left values that the deletion invalidates: subtract the width of the deleted node (right value - left value + 1) from the left value of every node whose left value is greater than the deleted node's right value.
  • Repair the right values that the deletion invalidates: subtract the same width from the right value of every node whose right value is greater than the deleted node's right value.
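The deletion steps can be sketched in the same way against an in-memory SQLite database. This is an illustration under assumptions: the sample data, the `delete_subtree` helper and its return value are hypothetical, and the treeid condition assumes each tree is numbered independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_tree (oid INTEGER PRIMARY KEY, name TEXT, parentOid INTEGER,"
    " level INTEGER, lftNode INTEGER, rgtNode INTEGER, treeid INTEGER)"
)
# Assumed sample data; only part of the tree is needed for the demonstration.
conn.executemany(
    "INSERT INTO t_tree VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Food", -1, 1, 1, 10, 1),
        (2, "Fruit", 1, 2, 2, 9, 1),
        (3, "Red", 2, 3, 3, 8, 1),
        (4, "Cherry", 3, 4, 4, 5, 1),
        (11, "apple", 3, 4, 6, 7, 1),
    ],
)


def delete_subtree(conn, oid):
    """Deletes a node with its descendants; returns how many rows were removed."""
    with conn:  # commits on success, rolls back if an exception is raised
        node = conn.execute(
            "SELECT lftNode, rgtNode, treeid FROM t_tree WHERE oid = ?", (oid,)
        ).fetchone()
        if node is None:
            return 0
        left, right, treeid = node
        width = right - left + 1  # count of left/right numbers being freed
        conn.execute(
            "DELETE FROM t_tree WHERE treeid = ? AND lftNode BETWEEN ? AND ?",
            (treeid, left, right),
        )
        # Close the gap: shift everything to the right of the removed interval.
        conn.execute(
            "UPDATE t_tree SET rgtNode = rgtNode - ?"
            " WHERE treeid = ? AND rgtNode > ?",
            (width, treeid, right),
        )
        conn.execute(
            "UPDATE t_tree SET lftNode = lftNode - ?"
            " WHERE treeid = ? AND lftNode > ?",
            (width, treeid, right),
        )
        return width // 2


removed = delete_subtree(conn, 3)  # removes Red, Cherry and apple
print(removed)
```

Each node owns two numbers, so the width divided by 2 is the number of deleted rows, provided the numbering was consistent beforehand.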

Implementation

DELIMITER $$
DROP PROCEDURE IF EXISTS `SP_TREE_DELETE`$$

CREATE PROCEDURE `SP_TREE_DELETE`(
       IN p_Oid INT (11)
    )
BEGIN
        # Fetch the node's left and right values and the width of its interval
        SELECT
            @myleft := lftNode,
            @myright :=rgtNode,
            @oid :=oid,
            @mywidth := rgtNode-lftNode+1
        FROM t_tree WHERE oid = p_Oid;

        # Delete the node together with all of its descendants
        DELETE FROM t_tree WHERE lftNode BETWEEN @myleft AND @myright;
        # Close the gap by shifting the values to the right of the deleted interval
        UPDATE t_tree SET rgtNode = rgtNode - @mywidth WHERE rgtNode > @myright;
        UPDATE t_tree SET lftNode = lftNode - @mywidth WHERE lftNode > @myright;
    END$$

DELIMITER ;

Test

Delete the apple child node under the Red node by executing the stored procedure above.

CALL SP_TREE_DELETE(11);

The result after deletion is as follows

picture.png

To verify the nodes under Red again, query for the nodes whose left value is greater than 3 and whose right value is less than 6.

SELECT * FROM t_tree  t WHERE t.`lftNode`>3 AND t.`rgtNode`<6;

picture.png

Advantages and disadvantages

Advantages

MPTT queries are efficient: the cost of a query does not grow with the depth of the classification, which makes the scheme suitable for scenarios dominated by query operations.

Disadvantages

Adding and deleting nodes requires recalculating the left and right values of other nodes, so write operations are slow.

Summary

This article explained the MPTT algorithm. No solution is perfect: we need to understand its advantages and disadvantages and, in real projects, choose the appropriate solution for the specific business scenario.


Origin juejin.im/post/7119407970491826190