Trees: comparison and use of database table designs for hierarchical data

I have recently been working on a project-management system with hierarchical relationships across multiple modules. The usual tree-style table design did not feel adequate, so I looked into the alternatives. This article summarizes them very well.

Tree-structured or hierarchical data is very common in enterprise applications, such as the company's organizational structure, the directory structure of the document library, the location organization of the warehouse, and the classification of objects, etc.

A tree is a common data structure. It is called a "tree" because it looks like an upside-down tree: the root is at the top and the leaves are at the bottom.

It has the following characteristics: each node has zero or more child nodes; a node without a parent node is called the root node; each non-root node has one and only one parent node; and apart from the root node, the remaining nodes can be partitioned into multiple disjoint subtrees.

The tree structure is a non-linear storage structure that stores a collection of data elements with a "one-to-many" relationship.

This article introduces and compares several common designs for tree/hierarchical data in computer database table models:

  • Adjacency list model
  • Path enumeration model
  • Closure table model
  • Nested set model

Adjacency list model

When designing a tree-structured database table, most developers will subconsciously choose the adjacency list model. For example:

CREATE TABLE category(
        category_id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(20) NOT NULL,
        parent INT DEFAULT NULL
);

INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
        (4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
        (9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);

SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name                 | parent |
+-------------+----------------------+--------+
|           1 | ELECTRONICS          |   NULL |
|           2 | TELEVISIONS          |      1 |
|           3 | TUBE                 |      2 |
|           4 | LCD                  |      2 |
|           5 | PLASMA               |      2 |
|           6 | PORTABLE ELECTRONICS |      1 |
|           7 | MP3 PLAYERS          |      6 |
|           8 | FLASH                |      7 |
|           9 | CD PLAYERS           |      6 |
|          10 | 2 WAY RADIOS         |      6 |
+-------------+----------------------+--------+

In the category table above, we have defined three fields: category ID (category_id), category name (name), and parent category ID (parent).

The truth is that the simple adjacency list model above is not a normalized structure. The short definition of normalization is that all data redundancy has been removed and no data anomalies can occur. In a normalized data model, data should be "one simple fact, in one place, one time"; that is, each fact is recorded exactly once, in exactly one place.

The first characteristic of the normalized table is that it only records one thing, while the previous approach is to record both the name of the classification and the hierarchical relationship of the classification in one table, which is a mixed object. The correct approach is to use two tables, one to record the various attributes of the classification, and the other to record the affiliation between the classifications.

The second characteristic of normalized tables is that each fact appears "in one place" (i.e., it belongs to one row of a table), but the subtree of each node in the adjacency list can be spread over multiple rows. The third characteristic of normalized tables is that each fact appears "once" in the schema (i.e., data redundancy is avoided). If these conditions are violated, anomalies can occur.

Here are some non-normalized behaviors produced by the adjacency list model.

For example, when creating a new category or changing a category, if you accidentally write the wrong parent, it is easy to construct a circular dependency relationship:

INSERT INTO category VALUES(11,'TV123',3);
UPDATE category SET parent=11 WHERE name='TUBE';

Here TV123 and TUBE are each other's parent categories.

or:

INSERT INTO category VALUES(12,'TV456',12);

Here TV456 is its own parent category.

In addition, the simple adjacency list model does not support inheritance of subordination. Deleting a row splits the tree into several smaller trees, for example:

DELETE FROM category WHERE name='PORTABLE ELECTRONICS';

Finally, we need to preserve the tree structure in the table. We need to ensure that only one row has a NULL parent (the root), and the simple adjacency list model cannot prevent multiple NULLs or cyclic dependencies. The problem is that an adjacency list can actually represent any graph, and a tree is only a special case of a graph, so we need to put constraints on the adjacency list model to ensure that it holds exactly one tree.
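As a rough illustration of the constraints just described, here is a minimal client-side sketch in Python. The function name is_single_tree and the (node_id, parent_id) row format are assumptions made for this example; they are not part of the original article. It checks that there is exactly one root and that every node reaches it without looping.

```python
def is_single_tree(rows):
    """rows: iterable of (node_id, parent_id); parent_id is None for a root.

    Returns True only if there is exactly one root and every node
    reaches that root by following parent links without looping.
    """
    parent = dict(rows)
    roots = [n for n, p in parent.items() if p is None]
    if len(roots) != 1:
        return False
    for node in parent:
        seen = set()
        current = node
        while current is not None:
            if current in seen or current not in parent:
                return False  # a cycle, or a parent id that does not exist
            seen.add(current)
            current = parent[current]
    return True


# The category table from above, reduced to (category_id, parent).
category = [(1, None), (2, 1), (3, 2), (4, 2), (5, 2),
            (6, 1), (7, 6), (8, 7), (9, 6), (10, 6)]

valid = is_single_tree(category)
# TUBE (3) and TV123 (11) as each other's parent, as in the example above.
with_cycle = is_single_tree([(1, None), (2, 1), (3, 11), (11, 3)])
# TV456 (12) as its own parent.
self_parent = is_single_tree(category + [(12, 12)])
print(valid, with_cycle, self_parent)
```

A check like this could run in application code before a write is committed; whether to enforce it there or in a trigger or stored procedure is a design choice the article leaves open.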

Get the entire adjacency list tree

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';

+-------------+----------------------+--------------+-------+
| lev1        | lev2                 | lev3         | lev4  |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
| ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
| ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)

The SQL above retrieves the whole tree through self-joins. The drawback is obvious: one self-join is needed for every level of the tree. In older versions of MySQL this is the only option, because they do not support recursive queries.

In Oracle, recursive queries can be written with the CONNECT BY syntax. In MySQL 8, recursion can be implemented with the SQL-99 standard CTE (common table expression) syntax, WITH RECURSIVE:

WITH RECURSIVE T1(category_id,name,parent) AS (
SELECT * FROM category T0 WHERE
    T0.parent IS NULL    -- ANCHOR MEMBER
UNION ALL
SELECT T2.category_id,T2.name,T2.parent FROM category T2, T1  -- RECURSIVE MEMBER
  WHERE T2.parent = T1.category_id
)
SELECT * FROM T1;
+-------------+----------------------+--------+
| category_id | name                 | parent |
+-------------+----------------------+--------+
|           1 | ELECTRONICS          |   NULL |
|           2 | TELEVISIONS          |      1 |
|           6 | PORTABLE ELECTRONICS |      1 |
|           3 | TUBE                 |      2 |
|           4 | LCD                  |      2 |
|           5 | PLASMA               |      2 |
|           7 | MP3 PLAYERS          |      6 |
|           9 | CD PLAYERS           |      6 |
|          10 | 2 WAY RADIOS         |      6 |
|           8 | FLASH                |      7 |
+-------------+----------------------+--------+

The recursive execution process is as follows:

  1. Find the first-level category, the row where parent IS NULL; this gives ELECTRONICS;
  2. Find the second-level categories, whose parent is ELECTRONICS; this gives TELEVISIONS and PORTABLE ELECTRONICS;
  3. Find the rows whose parent is TELEVISIONS or PORTABLE ELECTRONICS; this gives the third-level categories TUBE, LCD, PLASMA, MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS;
  4. Find the rows whose parent is one of the third-level categories; this gives FLASH;
  5. No further rows match, so the recursion ends.
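The level-by-level process in the steps above can be made visible by carrying a depth counter through the recursion. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL 8 (SQLite also supports WITH RECURSIVE), and the depth column is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category ("
             "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
])

# The anchor member starts at depth 1; each recursive step adds one level.
rows = conn.execute("""
    WITH RECURSIVE t1(category_id, name, depth) AS (
        SELECT category_id, name, 1 FROM category WHERE parent IS NULL
        UNION ALL
        SELECT c.category_id, c.name, t1.depth + 1
        FROM category AS c JOIN t1 ON c.parent = t1.category_id
    )
    SELECT depth, name FROM t1 ORDER BY depth, category_id
""").fetchall()

for depth, name in rows:
    print(depth, name)
```

Each distinct depth value corresponds to one of the search rounds listed above.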

Recursive queries tend to be relatively slow. As the amount of data and the number of levels grow, the number of recursion steps grows too, so these queries will need to be optimized later.

Get adjacency list subtree

WITH RECURSIVE T1 AS (
SELECT * FROM category T0 WHERE
    T0.name = 'TELEVISIONS'    -- ANCHOR MEMBER
UNION ALL
SELECT T2.category_id,T2.name,T2.parent FROM category T2, T1  -- RECURSIVE MEMBER
  WHERE T2.parent = T1.category_id
)
SELECT * FROM T1;
+-------------+-------------+--------+
| category_id | name        | parent |
+-------------+-------------+--------+
|           2 | TELEVISIONS |      1 |
|           3 | TUBE        |      2 |
|           4 | LCD         |      2 |
|           5 | PLASMA      |      2 |
+-------------+-------------+--------+

This is again a recursive CTE query, which requires MySQL 8. The self-join approach is not repeated here; it is similar to the one used for the whole tree.

Get adjacency list leaf nodes

SELECT t1.name FROM category AS t1 
LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
+--------------+
| name         |
+--------------+
| TUBE         |
| LCD          |
| PLASMA       |
| FLASH        |
| CD PLAYERS   |
| 2 WAY RADIOS |
+--------------+

The query above finds the leaf nodes by selecting the rows that have no child rows.

Get the complete single path of the adjacency list

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t4.name = 'FLASH';

+-------------+----------------------+-------------+-------+
| lev1        | lev2                 | lev3        | lev4  |
+-------------+----------------------+-------------+-------+
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
+-------------+----------------------+-------------+-------+

The query above obtains a complete path through self-joins.

The main limitation of this approach is that each level in the hierarchy requires a self-join, and each level added degrades performance as the join complexity increases.

In MySQL 8 and later, a recursive CTE can also be used to find the path:

WITH RECURSIVE T1(category_id,name,parent) AS (
SELECT * FROM category T0 WHERE
    T0.name = 'FLASH'   -- ANCHOR MEMBER
UNION ALL
SELECT T2.category_id,T2.name,T2.parent FROM category T2, T1  -- RECURSIVE MEMBER
  WHERE T2.category_id = T1.parent
)
SELECT * FROM T1;
+-------------+----------------------+--------+
| category_id | name                 | parent |
+-------------+----------------------+--------+
|           8 | FLASH                |      7 |
|           7 | MP3 PLAYERS          |      6 |
|           6 | PORTABLE ELECTRONICS |      1 |
|           1 | ELECTRONICS          |   NULL |
+-------------+----------------------+--------+

Add node

Adding a node to the adjacency list is straightforward: just insert a row. The only thing to watch is that the parent is set correctly.

Delete node

Deleting leaf nodes, that is, nodes without child nodes, is very simple: just delete them directly. But if we want to delete an intermediate node, then to avoid leaving orphaned subtrees we need to decide what happens to its child nodes afterwards:

  • One option is to make the deleted node's original parent the new parent of its children, the so-called "grandfather adopts the grandchildren";
  • Another is to promote one of the children (the so-called eldest son) to take the deleted node's place and point the other children at it, the so-called "son inherits from the father";
  • The third is to delete the entire subtree under the intermediate node, which amounts to wiping out the whole family...
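The first option, where the grandparent adopts the children, can be sketched as follows. This is an illustration only: it runs on Python's built-in sqlite3 module rather than MySQL, and the helper name delete_and_reparent is hypothetical. It assumes the node exists and is not the root; deleting the root this way would leave several rows with a NULL parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category ("
             "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
])


def delete_and_reparent(conn, node_id):
    """Delete one node and attach its children to the node's former parent."""
    with conn:  # one transaction, so the tree is never left half-updated
        (grandparent,) = conn.execute(
            "SELECT parent FROM category WHERE category_id = ?",
            (node_id,)).fetchone()
        conn.execute("UPDATE category SET parent = ? WHERE parent = ?",
                     (grandparent, node_id))
        conn.execute("DELETE FROM category WHERE category_id = ?", (node_id,))


delete_and_reparent(conn, 6)  # remove PORTABLE ELECTRONICS
children_of_root = [r[0] for r in conn.execute(
    "SELECT name FROM category WHERE parent = 1 ORDER BY category_id")]
print(children_of_root)
```

After the call, the former children of PORTABLE ELECTRONICS hang directly under ELECTRONICS, and FLASH stays under MP3 PLAYERS.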

Delete subtree

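To delete a subtree from the adjacency list, you usually start from the root of the subtree, recursively find all of its descendants, and delete them together. This can also be done automatically with an ON DELETE CASCADE foreign key constraint, provided that parent is declared as a foreign key referencing category_id (the table definition above does not declare one).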

WITH RECURSIVE T1 AS (
SELECT * FROM category T0 WHERE
    T0.name = 'TELEVISIONS'    -- ANCHOR MEMBER
UNION ALL
SELECT T2.category_id,T2.name,T2.parent FROM category T2, T1  -- RECURSIVE MEMBER
  WHERE T2.parent = T1.category_id
)
DELETE FROM category WHERE category_id IN (SELECT category_id FROM T1);
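For comparison, the ON DELETE CASCADE alternative can be sketched like this. It is an assumption-laden illustration: it uses Python's sqlite3 module instead of MySQL, and it adds a self-referencing foreign key that the original category table does not have. In MySQL the same idea would need an InnoDB foreign key, and MySQL's documentation describes a limit on how deeply cascaded operations may nest, so very deep trees may still need the recursive delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent INTEGER REFERENCES category(category_id) ON DELETE CASCADE
    )
""")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
])

# Deleting the subtree root removes its descendants through the cascade.
conn.execute("DELETE FROM category WHERE name = 'TELEVISIONS'")
remaining = [r[0] for r in conn.execute(
    "SELECT name FROM category ORDER BY category_id")]
print(remaining)
```

A side effect of declaring the foreign key is that a row can no longer point at a parent id that does not exist.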

Summary

Using the adjacency list model in pure SQL is relatively intuitive, but it comes with some difficulties. We need to add constraints to the adjacency list to prevent the problems described earlier. Some of these constraints can be enforced in client code or stored procedures. Since recursion is needed to query child nodes, queries will be relatively slow for trees with a large amount of data.

Path enumeration model

The Path Enumeration model uses a string field on each node to store the list of all nodes on the path from the root node to that node.

Create a table; for convenience, the person information and the path information are kept together.

CREATE TABLE Personnel_OrgChart(
    emp_name CHAR(10) NOT NULL,
    emp_id CHAR(1) NOT NULL PRIMARY KEY,
    path_string VARCHAR(500) NOT NULL
);


INSERT INTO Personnel_OrgChart 
VALUES('Albert','A','A'),('Bert','B','AB'),
    ('Chuck','C','AC'),('Donna','D','ACD'),
    ('Eddie','E','ACE'),('Fred','F','ACF');

SELECT * FROM Personnel_OrgChart ORDER BY emp_id;
+----------+--------+-------------+
| emp_name | emp_id | path_string |
+----------+--------+-------------+
| Albert   | A      | A           |
| Bert     | B      | AB          |
| Chuck    | C      | AC          |
| Donna    | D      | ACD         |
| Eddie    | E      | ACE         |
| Fred     | F      | ACF         |
+----------+--------+-------------+

The path enumeration model is characterized by uniting all ancestor information into a string and saving it as an attribute of each node. Strings are constructed according to preference or need. The above path_string can also be written as 'A/C/E' or 'A_C_E'. emp_id can also be a number, such as '1/3/5'.

The path enumeration model has problems similar to those of the adjacency list. Without constraints, cyclic paths such as 'ACEA' can occur. Deleting an intermediate node may leave an orphaned subtree, and inserting an intermediate node requires the paths of multiple nodes to be modified.

Get path enumeration table subtree

Generally, developers will directly use the following statement to obtain a certain subtree:

SELECT * FROM Personnel_OrgChart WHERE path_string LIKE '%C%';
+----------+--------+-------------+
| emp_name | emp_id | path_string |
+----------+--------+-------------+
| Chuck    | C      | AC          |
| Donna    | D      | ACD         |
| Eddie    | E      | ACE         |
| Fred     | F      | ACF         |
+----------+--------+-------------+

The problem here is that a pattern with a leading % wildcard forces a scan of the entire table, which is very slow for tables with a lot of data.
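One common way around the leading wildcard is to look up the node's own path first and then search with an anchored prefix such as 'AC%', a form of LIKE that B-tree indexes can generally serve. The sketch below is illustrative only: it runs on Python's sqlite3 module, the helper name subtree is hypothetical, and no claim is made here about the actual query plan on any particular server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Personnel_OrgChart ("
             "emp_name TEXT NOT NULL, emp_id TEXT PRIMARY KEY, "
             "path_string TEXT NOT NULL)")
conn.executemany("INSERT INTO Personnel_OrgChart VALUES (?, ?, ?)", [
    ("Albert", "A", "A"), ("Bert", "B", "AB"), ("Chuck", "C", "AC"),
    ("Donna", "D", "ACD"), ("Eddie", "E", "ACE"), ("Fred", "F", "ACF"),
])


def subtree(conn, emp_id):
    """Return the names in the subtree rooted at emp_id, using a prefix match."""
    (path,) = conn.execute(
        "SELECT path_string FROM Personnel_OrgChart WHERE emp_id = ?",
        (emp_id,)).fetchone()
    # Anchored pattern: only the tail of the path is left open.
    # Paths containing % or _ would need escaping; these sample ids do not.
    return [r[0] for r in conn.execute(
        "SELECT emp_name FROM Personnel_OrgChart "
        "WHERE path_string LIKE ? ORDER BY path_string", (path + "%",))]


chuck_subtree = subtree(conn, "C")
print(chuck_subtree)
```

With delimited paths such as '1/3/5', the prefix would normally include the trailing delimiter so that '1/3' does not also match '1/30'.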

Get the ancestors of a node in the path enumeration table

SELECT P2.*
FROM Personnel_OrgChart AS P1,
Personnel_OrgChart AS P2
WHERE P1.emp_id = 'F'
AND POSITION(P2.path_string IN P1.path_string)= 1;
+----------+--------+-------------+
| emp_name | emp_id | path_string |
+----------+--------+-------------+
| Albert   | A      | A           |
| Chuck    | C      | AC          |
| Fred     | F      | ACF         |
+----------+--------+-------------+

Add node

Adding leaf nodes to the path enumeration table is relatively simple, just insert a piece of data directly:

INSERT INTO Personnel_OrgChart VALUES('Gary','G','ABG');

But if you want to insert it before a certain node, the paths of the inserted node and its child nodes need to be modified. For example, insert Gary as the parent node of Chuck:

INSERT INTO Personnel_OrgChart VALUES('Gary','G','AG');
UPDATE Personnel_OrgChart SET path_string = REPLACE(path_string, 'AC', 'AGC') WHERE path_string LIKE 'AC%';

SELECT * FROM Personnel_OrgChart;
+----------+--------+-------------+
| emp_name | emp_id | path_string |
+----------+--------+-------------+
| Albert   | A      | A           |
| Bert     | B      | AB          |
| Chuck    | C      | AGC         |
| Donna    | D      | AGCD        |
| Eddie    | E      | AGCE        |
| Fred     | F      | AGCF        |
| Gary     | G      | AG          |
+----------+--------+-------------+

Delete node

Similar to adding nodes, deleting leaf nodes is relatively simple. Just delete the record directly:

DELETE FROM Personnel_OrgChart WHERE path_string = 'AGCD';

But if we want to delete an intermediate node, such as Chuck, then we need to determine how to deal with the original child nodes of Chuck, just like the adjacency list.

  • One option is to make the deleted node's original parent the new parent of its children, the so-called "grandfather adopts the grandchildren";
  • Another is to promote one of the children (the so-called eldest son) to take the deleted node's place and point the other children at it, the so-called "son inherits from the father";
  • The third is to delete the entire subtree under the intermediate node, which amounts to wiping out the whole family...

The following SQL is processed in the first way:

DELETE FROM Personnel_OrgChart WHERE emp_id = 'C';
UPDATE Personnel_OrgChart SET path_string = REPLACE(path_string, 'C', '') WHERE path_string LIKE '%C%';

SELECT * FROM Personnel_OrgChart;
+----------+--------+-------------+
| emp_name | emp_id | path_string |
+----------+--------+-------------+
| Albert   | A      | A           |
| Bert     | B      | AB          |
| Donna    | D      | AGD         |
| Eddie    | E      | AGE         |
| Fred     | F      | AGF         |
| Gary     | G      | AG          |
+----------+--------+-------------+

Delete subtree

Knowing a node, deleting its subtree is similar to obtaining the subtree, just use the % wildcard:

DELETE FROM Personnel_OrgChart WHERE path_string LIKE '%G%';

Convert existing adjacency list to path enumeration list

Our actual OA database contains a reporting-relationship table built on the adjacency list model. Its main fields are as follows (for convenience, a temporary table, APPROVAL_GROUP_TEMP, is used here):

CREATE TABLE `APPROVAL_GROUP_TEMP` (
  `ID` decimal(8,0) NOT NULL,
  `FATHERID` decimal(8,0) DEFAULT NULL, -- parent ID
  `APPROVALGROUPNAME` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -- name
  `SHOWFLAG` decimal(1,0) DEFAULT '1', -- status (1: enabled, 0: disabled, 2: deleted)
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

We build a path enumeration table:

CREATE TABLE `AG_PathEnum` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
  `node` varchar(100) NOT NULL,  -- name
  `nodeid` INT(10) COMMENT 'node ID',
  `path_string` VARCHAR(500) NOT NULL COMMENT 'path from the root to this node',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

Then use the following CTE SQL to convert the adjacency list relationships into the path enumeration table (only enabled rows, SHOWFLAG = 1, are converted):

INSERT INTO AG_PathEnum(node,nodeid,path_string)
WITH RECURSIVE T1(node,nodeid,path_string) AS
(
    SELECT
        T0.APPROVALGROUPNAME AS node,
        T0.ID AS nodeid,
        CAST(T0.ID AS char(500)) AS path_string
    FROM APPROVAL_GROUP_TEMP AS T0 WHERE T0.SHOWFLAG = 1

    UNION ALL

    SELECT
        C.APPROVALGROUPNAME AS node,
        C.ID          AS nodeid,
        CONCAT(T1.path_string,'/',C.ID) AS path_string
    FROM APPROVAL_GROUP_TEMP C, T1
        WHERE C.FATHERID = T1.nodeid AND C.SHOWFLAG = 1
)
SELECT * FROM T1 WHERE T1.path_string LIKE '16060%' GROUP BY T1.nodeid,T1.node,T1.path_string ORDER BY T1.nodeid;

After running:

SELECT * FROM AG_PathEnum;
+-----+---------------------------------+--------+-------------------------------------------------+
| id  | node                            | nodeid | path_string                                     |
+-----+---------------------------------+--------+-------------------------------------------------+
|   1 | 公司总部                        |  16060 | 16060                                           |
|   2 | 研发中心                        |  16062 | 16060/16062                                     |
|   3 | 发行中心                        |  16064 | 16060/16064                                     |
|   4 | 管理中心                        |  16066 | 16060/16066                                     |
|   5 | 人力资源部                      |  16700 | 16060/16066/16883/16700                         |
|   6 | 法务部                          |  16701 | 16060/16066/16883/16701                         |
|   7 | 财务部                          |  16702 | 16060/16066/16883/16702                         |
|   8 | 总裁办                          |  16705 | 16060/16066/16705                               |
|   9 | 发行技术部                      |  16711 | 16060/16064/16711                               |
|  10 | 创新中心                        |  16721 | 16060/16721                                     |
|  11 | 原创IP部                        |  16789 | 16060/16721/16789                               |
|  12 | BU财务管理部                    |  16871 | 16060/16066/16883/16702/16871                   |
|  13 | 直属员工                        |  16880 | 16060/16880                                     |
|  14 | 直属员工                        |  16881 | 16060/16062/16881                               |
|  15 | 某某某直属员工                  |  16883 | 16060/16066/16883                               |
|  16 | 直属员工                        |  16885 | 16060/16066/16883/16701/16885                   |
|  17 | 直属员工                        |  16886 | 16060/16066/16883/16702/16886                   |
|  18 | 直属员工                        |  16889 | 16060/16066/16705/16889                         |
|  19 | 直属员工                        |  16895 | 16060/16064/16711/16895                         |
|  20 | 直属员工                        |  16904 | 16060/16721/16904                               |
|  21 | 证券部                          |  17100 | 16060/16066/16883/17100                         |
|  22 | 直属员工                        |  17101 | 16060/16066/16883/17100/17101                   |
|  23 | 商务部                          |  17180 | 16060/16064/17180                               |
|  24 | 直属员工                        |  17181 | 16060/16064/17180/17181                         |
|  25 | 公共关系与政府事务部            |  17400 | 16060/16066/16883/17400                         |
|  26 | 采购部                          |  17540 | 16060/16066/16883/17540                         |
|  27 | 直属员工                        |  17541 | 16060/16066/16883/17540/17541                   |
|  28 | 行政部                          |  17728 | 16060/16066/16883/16700/17728                   |
|  29 | 人力信息部                      |  17750 | 16060/16066/16883/16700/17750                   |
|  30 | 直属员工                        |  17751 | 16060/16066/16883/16700/17750/17751             |
|  31 | 薪酬福利部                      |  17752 | 16060/16066/16883/16700/17752                   |
|  32 | 直属员工                        |  17753 | 16060/16066/16883/16700/17752/17753             |
|  33 | 培训发展部                      |  17756 | 16060/16066/16883/16700/17756                   |
|  34 | 直属员工                        |  17757 | 16060/16066/16883/16700/17756/17757             |
|  35 | 企业文化部                      |  18566 | 16060/16066/16883/16700/18566                   |
|  36 | 直属员工                        |  18567 | 16060/16066/16883/16700/18566/18567             |
|  37 | 渠道部                          |  18660 | 16060/16064/36071/20640/18660                   |
|  38 | 公益部                          |  18780 | 16060/16066/16883/18780                         |
|  39 | 直属员工                        |  18781 | 16060/16066/16883/18780/18781                   |
|  40 | 蓝图                            |  18840 | 16060/18840                                     |
|  41 | 直属员工                        |  18841 | 16060/18840/18841                               |
|  42 | 事业支援中心                    |  18842 | 16060/18840/38840/18842                         |
|  43 | 直属员工                        |  18843 | 16060/18840/38840/18842/18843                   |
|  44 | 某某神奇项目组                  |  18854 | 16060/16062/21322/18854                         |
|  45 | 直属员工                        |  18855 | 16060/16062/21322/18854/18855                   |
|  46 | 程序组                          |  18902 | 16060/16062/21322/18854/38854/18902             |
|  47 | 直属员工                        |  18903 | 16060/16062/21322/18854/38854/18902/38906/18903 |
|  48 | 策划组                          |  18904 | 16060/16062/21322/18854/38856/18904             |
|  49 | 直属员工                        |  18905 | 16060/16062/21322/18854/38856/18904/18905       |
|  50 | 美术组                          |  18906 | 16060/16062/21322/18854/38856/58855/18906       |
|  51 | 直属员工                        |  18907 | 16060/16062/21322/18854/38856/58855/18906/18907 |
-- ... (remaining rows omitted)
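The same conversion can be tried on the small category table from the adjacency list section. The sketch below uses Python's sqlite3 module instead of MySQL, so the SQL dialect differs slightly (|| instead of CONCAT). It also makes one assumption that the query above does not: the anchor member starts only from the root row (parent IS NULL), so no filtering or grouping is needed afterwards. Whether the root of APPROVAL_GROUP_TEMP has a NULL FATHERID is not stated in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category ("
             "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
])

# Each recursive step appends the child's id to its parent's path.
paths = dict(conn.execute("""
    WITH RECURSIVE t1(category_id, path_string) AS (
        SELECT category_id, CAST(category_id AS TEXT)
        FROM category WHERE parent IS NULL
        UNION ALL
        SELECT c.category_id, t1.path_string || '/' || c.category_id
        FROM category AS c JOIN t1 ON c.parent = t1.category_id
    )
    SELECT category_id, path_string FROM t1
"""))

for category_id in sorted(paths):
    print(category_id, paths[category_id])
```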

Summary

The path enumeration design makes it easy to sort nodes by their position in the hierarchy. Because adjacent nodes in a path are always exactly one level apart, the depth of a node can be read from the length of its path string (the number of elements in it). But it also has the following disadvantages:

1. It cannot ensure that the format of the path is always correct or that the nodes in the path actually exist (no foreign key constraint protects the path if an intermediate node is deleted);

2. It relies on application code to maintain the path strings, and verifying that the strings are correct is very expensive;

3. The length of the path VARCHAR is difficult to choose. However large the VARCHAR is, it cannot grow without limit, so sufficiently deep trees will not fit.
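Two points from this summary, reading the depth from the path and the cost of verifying paths in application code, can be sketched in a few lines of Python. The function names depth and missing_nodes are hypothetical, and the sample dictionary is only a hand-picked subset of the AG_PathEnum rows shown earlier, so the "missing" node below is missing from this subset, not necessarily from the real table.

```python
def depth(path_string, sep="/"):
    """Depth of a node: the number of elements in its path."""
    return path_string.count(sep) + 1


def missing_nodes(paths, sep="/"):
    """paths maps node id (as text) to path_string.

    Returns the ids that appear inside some path but have no row of
    their own, which is the check a foreign key cannot do for us here.
    """
    known = set(paths)
    missing = set()
    for path in paths.values():
        missing.update(part for part in path.split(sep) if part not in known)
    return missing


sample = {
    "16060": "16060",
    "16062": "16060/16062",
    "16881": "16060/16062/16881",
    "18854": "16060/16062/21322/18854",
}

print(depth(sample["16881"]))
print(missing_nodes(sample))
```

Note that the check has to split and inspect every path in the table, which is why the article calls this kind of verification expensive.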

Closure table model

Closure Table is a model that trades space for time. It uses a dedicated relationship table (which is in fact the normalized approach recommended earlier) to record the ancestor-descendant relationships between nodes in the tree, together with the distance between them.

CREATE TABLE `NodeInfo` (
    `node_id` INT NOT NULL AUTO_INCREMENT,
    `node_name` VARCHAR (255),
    PRIMARY KEY (`node_id`)
) DEFAULT CHARSET = utf8mb4;

CREATE TABLE `NodeRelation` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
    `ancestor` INT(10) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'ancestor node',
    `descendant` INT(10) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'descendant node',
    `distance` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'levels apart (0 for the self-referencing row)',
    PRIMARY KEY (`id`),
    UNIQUE KEY `uniq_anc_desc` (`ancestor`,`descendant`),
    KEY `idx_desc` (`descendant`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4 COMMENT = 'node relationship table';

In order to prevent errors when inserting data, we need a stored procedure:

CREATE DEFINER = `root`@`localhost` PROCEDURE `AddNode`(`_parent_name` varchar(255),`_node_name` varchar(255))
BEGIN
    DECLARE _descendant INT(10) UNSIGNED;
    DECLARE _parent INT(10) UNSIGNED;
    -- Node names act as keys here, so a name that already exists is skipped
    IF NOT EXISTS(SELECT node_id FROM NodeInfo WHERE node_name = _node_name)
    THEN
        INSERT INTO NodeInfo (node_name) VALUES(_node_name);
        SET _descendant = (SELECT node_id FROM NodeInfo WHERE node_name = _node_name);
        -- Every node is its own ancestor at distance 0
        INSERT INTO NodeRelation (ancestor,descendant,distance) VALUES(_descendant,_descendant,0);

        -- If the parent exists, copy each of its ancestor rows (including its self row) one level further away
        IF EXISTS (SELECT node_id FROM NodeInfo WHERE node_name = _parent_name)
        THEN
            SET _parent = (SELECT node_id FROM NodeInfo WHERE node_name = _parent_name);
            INSERT INTO NodeRelation (ancestor,descendant,distance) SELECT ancestor,_descendant,distance+1 FROM NodeRelation WHERE descendant = _parent;
        END IF;
    END IF;
END;
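To make the logic of the stored procedure easier to experiment with, here is a rough equivalent in Python on top of sqlite3. It is a sketch, not a port: the function name add_node, the simplified column types, and the English sample names are assumptions made for the example. Like the procedure, it treats node names as unique keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NodeInfo (node_id INTEGER PRIMARY KEY, node_name TEXT UNIQUE);
    CREATE TABLE NodeRelation (
        ancestor INTEGER NOT NULL,
        descendant INTEGER NOT NULL,
        distance INTEGER NOT NULL,
        UNIQUE (ancestor, descendant)
    );
""")


def add_node(conn, parent_name, node_name):
    """Insert a node under parent_name (None for a root), closure-table style."""
    with conn:
        exists = conn.execute("SELECT 1 FROM NodeInfo WHERE node_name = ?",
                              (node_name,)).fetchone()
        if exists:
            return  # same rule as the procedure: duplicate names are skipped
        node_id = conn.execute("INSERT INTO NodeInfo (node_name) VALUES (?)",
                               (node_name,)).lastrowid
        # Self-referencing row at distance 0.
        conn.execute("INSERT INTO NodeRelation VALUES (?, ?, 0)",
                     (node_id, node_id))
        # Copy every ancestor row of the parent, one level further away.
        # With parent_name = None the comparison matches nothing.
        conn.execute("""
            INSERT INTO NodeRelation (ancestor, descendant, distance)
            SELECT r.ancestor, ?, r.distance + 1
            FROM NodeRelation AS r
            JOIN NodeInfo AS p ON p.node_id = r.descendant
            WHERE p.node_name = ?
        """, (node_id, parent_name))


add_node(conn, None, "main post")
add_node(conn, "main post", "reply 1")
add_node(conn, "reply 1", "reply to reply 1")

rows_for_3 = conn.execute(
    "SELECT ancestor, descendant, distance FROM NodeRelation "
    "WHERE descendant = 3 ORDER BY ancestor").fetchall()
print(rows_for_3)
```

The third node ends up with one row per ancestor plus its own self row, which is the pattern visible in the NodeRelation listing further down.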

Then we insert some data. Here is an example of posting a reply in the forum:

CALL OrgAndUser.AddNode(NULL,'这是主贴');
CALL OrgAndUser.AddNode('这是主贴','回复主贴1');
CALL OrgAndUser.AddNode('回复主贴1','回复:回复主贴1');
CALL OrgAndUser.AddNode('这是主贴','回复:这是主贴,啥意思');
CALL OrgAndUser.AddNode('这是主贴','回复:挺有意思');
CALL OrgAndUser.AddNode('回复:挺有意思','Reply:回复:挺有意思');
CALL OrgAndUser.AddNode('回复:这是主贴,啥意思','第3层?');
CALL OrgAndUser.AddNode('第3层?','不对,我才是第3层');

SELECT * FROM NodeInfo;
+---------+-----------------------------------+
| node_id | node_name                         |
+---------+-----------------------------------+
|       1 | 这是主贴                           |
|       2 | 回复主贴1                          |
|       3 | 回复:回复主贴1                     |
|       4 | 回复:这是主贴,啥意思                |
|       5 | 回复:挺有意思                      |
|       6 | Reply:回复:挺有意思                |
|       7 | 第3层?                            |
|       8 | 不对,我才是第3层                    |
+---------+-----------------------------------+

For each post, the stored procedure above inserts into the relationship table the relationship and distance between the post and itself, and between the post and each of its ancestors:

SELECT * FROM NodeRelation;
+----+----------+------------+----------+
| id | ancestor | descendant | distance |
+----+----------+------------+----------+
|  1 |        1 |          1 |        0 |
|  2 |        2 |          2 |        0 |
|  3 |        1 |          2 |        1 |
|  4 |        3 |          3 |        0 |
|  5 |        2 |          3 |        1 |
|  6 |        1 |          3 |        2 |
|  8 |        4 |          4 |        0 |
|  9 |        1 |          4 |        1 |
| 10 |        5 |          5 |        0 |
| 11 |        1 |          5 |        1 |
| 12 |        6 |          6 |        0 |
| 13 |        5 |          6 |        1 |
| 14 |        1 |          6 |        2 |
| 16 |        7 |          7 |        0 |
| 17 |        4 |          7 |        1 |
| 18 |        1 |          7 |        2 |
| 20 |        8 |          8 |        0 |
| 21 |        7 |          8 |        1 |
| 22 |        4 |          8 |        2 |
| 23 |        1 |          8 |        3 |
+----+----------+------------+----------+

Get closure table full tree or subtree

SELECT n3.node_name FROM NodeInfo n1
INNER JOIN NodeRelation n2 ON n1.node_id = n2.ancestor
INNER JOIN NodeInfo n3 ON n2.descendant = n3.node_id
WHERE n1.node_id = 1 AND n2.distance != 0;
+-----------------------------------+
| node_name                         |
+-----------------------------------+
| 回复主贴1                          |
| 回复:回复主贴1                     |
| 回复:这是主贴,啥意思                |
| 回复:挺有意思                      |
| Reply:回复:挺有意思                |
| 第3层?                            |
| 不对,我才是第3层                    |
+-----------------------------------+

SELECT n3.node_name FROM NodeInfo n1
INNER JOIN NodeRelation n2 ON n1.node_id = n2.ancestor
INNER JOIN NodeInfo n3 ON n2.descendant = n3.node_id
WHERE n1.node_name = '回复:这是主贴,啥意思' AND n2.distance != 0;
+---------------------------+
| node_name                 |
+---------------------------+
| 第3层?                    |
| 不对,我才是第3层            |
+---------------------------+

These queries go through the relationship table with inner joins and exclude the self-referencing rows (distance 0), which yields all descendants of the given node.

Get closure table leaf node

SELECT n1.node_id, n1.node_name FROM NodeInfo n1
INNER JOIN NodeRelation n2 ON n1.node_id = n2.ancestor
GROUP BY n1.node_id, n1.node_name
HAVING COUNT(n2.ancestor) = 1;
+---------+-----------------------------+
| node_id | node_name                   |
+---------+-----------------------------+
|       3 | 回复:回复主贴1                |
|       6 | Reply:回复:挺有意思           |
|       8 | 不对,我才是第3层              |
+---------+-----------------------------+

A leaf node has no children, so its ID appears exactly once in the ancestor column of the relation table: in its own self-referencing row.

Get the closure table parent node

SELECT n1.* FROM NodeInfo AS n1
    INNER JOIN NodeRelation n2 on n1.node_id = n2.ancestor
    WHERE n2.descendant = 8;
+---------+-----------------------------------+
| node_id | node_name                         |
+---------+-----------------------------------+
|       8 | 不对,我才是第3层                    |
|       7 | 第3层?                            |
|       4 | 回复:这是主贴,啥意思                |
|       1 | 这是主贴                           |
+---------+-----------------------------------+

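Query the relation table in the reverse direction: because it records the relationship between each node and every one of its ancestors, filtering on descendant returns the whole chain up to the root. The node itself is included through its self-referencing row; adding n2.distance = 1 to the condition returns only the direct parent.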

Add node

Refer to the previous stored procedure AddNode(_parent_name, _node_name).

Delete node

Deleting a leaf node is relatively simple: besides deleting its record from the NodeInfo table, delete every record in the relation table NodeRelation whose descendant value is the leaf's node_id.

DELETE FROM NodeInfo WHERE node_id = 8;
DELETE FROM NodeRelation WHERE descendant = 8;

But to delete an intermediate node, as with the models discussed earlier, you must first decide what happens to its children or subtree.

  • One option is to attach the children of the deleted node to its original parent, the so-called grandfather adopting the grandsons;
  • Another is to promote one child (the so-called eldest son) to take the deleted node's place and re-parent the remaining children to it, the so-called son inheriting from the father;
  • The last is to delete the whole subtree below the intermediate node, wiping out the entire family.
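The first option (the grandfather adopts the grandsons) can be sketched against the closure table as follows. This is an illustrative sketch in Python with the standard sqlite3 module, not the article's MySQL code: the PARENTS data and the function names are assumptions, and only the relation table is handled (the matching NodeInfo row would be deleted separately). In MySQL the UPDATE would need its subqueries wrapped in derived tables to avoid error 1093.

```python
import sqlite3

# Hypothetical parent links (node_id -> parent), shaped like the example tree.
PARENTS = {1: None, 2: 1, 3: 2, 4: 1, 5: 1, 6: 5, 7: 4, 8: 7}


def make_closure(parents):
    """Build an in-memory NodeRelation table from parent links."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE NodeRelation (ancestor INTEGER, descendant INTEGER,"
        " distance INTEGER, PRIMARY KEY (ancestor, descendant))"
    )
    for node in parents:
        ancestor, distance = node, 0
        while ancestor is not None:
            conn.execute(
                "INSERT INTO NodeRelation VALUES (?, ?, ?)",
                (ancestor, node, distance),
            )
            ancestor, distance = parents[ancestor], distance + 1
    conn.commit()
    return conn


def remove_and_promote_children(conn, node_id):
    """Delete node_id; its children become children of node_id's parent."""
    with conn:  # one transaction: commit on success, roll back on error
        # Every path from a strict ancestor of node_id to a strict descendant
        # ran through node_id, so each of those paths gets one level shorter.
        conn.execute(
            "UPDATE NodeRelation SET distance = distance - 1"
            " WHERE ancestor IN (SELECT ancestor FROM NodeRelation"
            "                    WHERE descendant = ? AND distance > 0)"
            "   AND descendant IN (SELECT descendant FROM NodeRelation"
            "                      WHERE ancestor = ? AND distance > 0)",
            (node_id, node_id),
        )
        # Then drop every row that mentions the deleted node.
        conn.execute(
            "DELETE FROM NodeRelation WHERE ancestor = ? OR descendant = ?",
            (node_id, node_id),
        )


conn = make_closure(PARENTS)
remove_and_promote_children(conn, 4)
below_root = conn.execute(
    "SELECT descendant, distance FROM NodeRelation"
    " WHERE ancestor = 1 ORDER BY descendant"
).fetchall()
print(below_root)
```

After the call, node 7 sits directly under the root and node 8 one level below it; no row refers to node 4 any more.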

Delete subtree

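-- Remove every node of the subtree (4 and its descendants) while NodeRelation still lists them.
DELETE FROM NodeInfo WHERE node_id IN (SELECT descendant FROM NodeRelation WHERE ancestor = 4);
DELETE FROM NodeRelation AS n1 WHERE n1.descendant IN (SELECT a.descendant FROM (SELECT n2.descendant FROM NodeRelation AS n2 WHERE n2.ancestor = 4) AS a);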

SELECT * FROM NodeRelation;
+----+----------+------------+----------+
| id | ancestor | descendant | distance |
+----+----------+------------+----------+
|  1 |        1 |          1 |        0 |
|  2 |        2 |          2 |        0 |
|  3 |        1 |          2 |        1 |
|  4 |        3 |          3 |        0 |
|  5 |        2 |          3 |        1 |
|  6 |        1 |          3 |        2 |
| 10 |        5 |          5 |        0 |
| 11 |        1 |          5 |        1 |
| 12 |        6 |          6 |        0 |
| 13 |        5 |          6 |        1 |
| 14 |        1 |          6 |        2 |
+----+----------+------------+----------+

Note that the DELETE on NodeRelation cannot be written directly as DELETE FROM NodeRelation AS n1 WHERE n1.descendant IN (SELECT n2.descendant FROM NodeRelation AS n2 WHERE n2.ancestor = 4); MySQL reports ERROR 1093 (HY000): You can't specify target table 'n1' for update in FROM clause. MySQL does not allow a statement to delete from or update a table while a subquery of the same statement selects from that table. Wrapping the subquery in an aliased derived table, as in the statement above, has MySQL evaluate it as a temporary table first, which avoids the error.

Convert existing adjacency list to closure table

There is a reporting-relationship table in our real OA database that uses the adjacency list model. Its main fields are as follows (for convenience I work on a temporary copy, APPROVAL_GROUP_TEMP):

CREATE TABLE `APPROVAL_GROUP_TEMP` (
  `ID` decimal(8,0) NOT NULL,
  `FATHERID` decimal(8,0) DEFAULT NULL, -- parent ID
  `APPROVALGROUPNAME` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -- name
  `SHOWFLAG` decimal(1,0) DEFAULT '1', -- status (1: enabled, 0: disabled, 2: deleted)
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

We build a closure table:

CREATE TABLE `AG_Closure` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
  `node` varchar(100) NOT NULL,  -- name
  `ancestor` INT(10) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'ancestor node',
  `descendant` INT(10) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'descendant node',
  `distance` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'levels between ancestor and descendant (0 = the node itself)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_anc_desc` (`ancestor`,`descendant`),
  KEY `idx_desc` (`descendant`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

Then use the following recursive CTE to convert the adjacency list into the closure table (only enabled rows, SHOWFLAG=1, are converted):

INSERT INTO AG_Closure(node,ancestor,descendant,distance)
WITH RECURSIVE T1(node,ancestor,descendant,distance) AS
(
    SELECT
        APPROVALGROUPNAME AS node,
        ID AS ancestor,
        ID AS descendant,
        0  AS distance
    FROM APPROVAL_GROUP_TEMP WHERE SHOWFLAG=1

    UNION ALL

    SELECT
        C.APPROVALGROUPNAME AS node,
        T1.ancestor  AS ancestor,
        C.ID          AS descendant,
        T1.distance + 1 AS distance
    FROM APPROVAL_GROUP_TEMP C, T1
        WHERE C.FATHERID = T1.descendant AND C.SHOWFLAG = 1
)
SELECT * FROM T1 ORDER BY T1.descendant;
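To see what this recursion produces on a small input, here is a hedged sketch that runs an equivalent recursive CTE in SQLite through Python's standard sqlite3 module. The table adj and its five rows are invented for illustration; they are not taken from the OA database. The sample deliberately includes an enabled node whose parent is disabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE adj (id INTEGER PRIMARY KEY, fatherid INTEGER,"
    " name TEXT, showflag INTEGER)"
)
# Hypothetical sample rows: (id, fatherid, name, showflag); -1 marks the root.
conn.executemany(
    "INSERT INTO adj VALUES (?, ?, ?, ?)",
    [
        (1, -1, "root", 1),
        (2, 1, "a", 1),
        (3, 2, "b", 1),
        (4, 1, "disabled", 0),
        (5, 4, "under-disabled", 1),
    ],
)

CLOSURE_SQL = """
WITH RECURSIVE t1(ancestor, descendant, distance) AS (
    -- Anchor: every enabled node is its own ancestor at distance 0.
    SELECT id, id, 0 FROM adj WHERE showflag = 1
    UNION ALL
    -- Step: extend each known path by one enabled child.
    SELECT t1.ancestor, c.id, t1.distance + 1
    FROM adj AS c, t1
    WHERE c.fatherid = t1.descendant AND c.showflag = 1
)
SELECT ancestor, descendant, distance FROM t1
ORDER BY descendant, distance
"""

rows = conn.execute(CLOSURE_SQL).fetchall()
for row in rows:
    print(row)
```

In this sample, node 5 ends up with only its self-referencing row: its parent is disabled and never enters the recursion, so node 5 is detached from the root in the resulting closure table. If the real data contains enabled rows under disabled parents, the same will happen there.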

After running:

SELECT * FROM AG_Closure;
+------+---------------------------------+----------+------------+----------+
| id   | node                            | ancestor | descendant | distance |
+------+---------------------------------+----------+------------+----------+
|    1 | 公司总部                        |    16060 |      16060 |        0 |
|    2 | 研发中心                        |    16062 |      16062 |        0 |
|    3 | 研发中心                        |    16060 |      16062 |        1 |
|    4 | 发行中心                        |    16064 |      16064 |        0 |
|    5 | 发行中心                        |    16060 |      16064 |        1 |
|    6 | 管理中心                        |    16066 |      16066 |        0 |
|    7 | 管理中心                        |    16060 |      16066 |        1 |
|    8 | 人力资源部                      |    16700 |      16700 |        0 |
|    9 | 人力资源部                      |    16883 |      16700 |        1 |
|   10 | 人力资源部                      |    16066 |      16700 |        2 |
|   11 | 人力资源部                      |    16060 |      16700 |        3 |
|   12 | 法务部                          |    16701 |      16701 |        0 |
|   13 | 法务部                          |    16883 |      16701 |        1 |
|   14 | 法务部                          |    16066 |      16701 |        2 |
|   15 | 法务部                          |    16060 |      16701 |        3 |
|   16 | 财务部                          |    16702 |      16702 |        0 |
|   17 | 财务部                          |    16883 |      16702 |        1 |
|   18 | 财务部                          |    16066 |      16702 |        2 |
|   19 | 财务部                          |    16060 |      16702 |        3 |
|   20 | 总裁办                          |    16705 |      16705 |        0 |
|   21 | 总裁办                          |    16066 |      16705 |        1 |
|   22 | 总裁办                          |    16060 |      16705 |        2 |
|   23 | 发行技术部                      |    16711 |      16711 |        0 |
|   24 | 发行技术部                      |    16064 |      16711 |        1 |
|   25 | 发行技术部                      |    16060 |      16711 |        2 |
|   26 | 创新中心                        |    16721 |      16721 |        0 |
|   27 | 创新中心                        |    16060 |      16721 |        1 |
|   28 | 原创IP部                        |    16789 |      16789 |        0 |
|   29 | 原创IP部                        |    16721 |      16789 |        1 |
|   30 | 原创IP部                        |    16060 |      16789 |        2 |
|   31 | BU财务管理部                    |    16871 |      16871 |        0 |
|   32 | BU财务管理部                    |    16702 |      16871 |        1 |
|   33 | BU财务管理部                    |    16883 |      16871 |        2 |
|   34 | BU财务管理部                    |    16066 |      16871 |        3 |
|   35 | BU财务管理部                    |    16060 |      16871 |        4 |
|   36 | 直属员工                        |    16880 |      16880 |        0 |
|   37 | 直属员工                        |    16060 |      16880 |        1 |
|   38 | 直属员工                        |    16881 |      16881 |        0 |
|   39 | 直属员工                        |    16062 |      16881 |        1 |
|   40 | 直属员工                        |    16060 |      16881 |        2 |
|   41 | 某某某直属员工                  |    16883 |      16883 |        0 |
|   42 | 某某某直属员工                  |    16066 |      16883 |        1 |
|   43 | 某某某直属员工                  |    16060 |      16883 |        2 |
-- ... (remaining rows omitted)

Summary

The closure table model stores every ancestor-descendant path of the tree in a relation table, so ancestors and descendants can be queried quickly without recursion. The price is storage: each node needs one row per ancestor plus one for itself, so the table grows with the total depth of the tree, and in the worst case (a single long chain) quadratically with the number of nodes. Adding or moving nodes is also more involved, because the rows linking the node to all of its ancestors, with the correct distances, have to be generated.
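A closure table holds one row per node for each of its ancestors plus the self-referencing row, so its size depends on the shape of the tree. The short Python sketch below counts the rows for two hypothetical trees of 1023 nodes each, a single chain and a complete binary tree; both trees and the function name are assumptions made for illustration.

```python
def closure_rows(parents):
    """Count closure-table rows: each node contributes depth + 1 rows."""
    total = 0
    for node in parents:
        while node is not None:  # one row per ancestor, plus the self row
            total += 1
            node = parents[node]
    return total


n = 1023
# Chain: node i hangs under node i - 1.
chain = {i: (i - 1 if i > 1 else None) for i in range(1, n + 1)}
# Complete binary tree in heap numbering: node i hangs under node i // 2.
binary = {i: (i // 2 if i > 1 else None) for i in range(1, n + 1)}

chain_rows = closure_rows(chain)
binary_rows = closure_rows(binary)
print(chain_rows, binary_rows)
```

The chain needs n(n + 1)/2 rows, while the balanced tree needs only a few rows per node, which is why closure tables tend to stay manageable for shallow hierarchies such as organization charts.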

nested set model

The algorithm behind the nested set model is also known as MPTT (Modified Preorder Tree Traversal).

In the nested set table each row carries two fields, lft and rgt. They record the value of a running counter at the moment a traversal of the tree enters a node (its left side) and leaves it (its right side), counting from 1 at the left side of the root.

CREATE TABLE nested_category (
        category_id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(20) NOT NULL,
        lft INT NOT NULL,
        rgt INT NOT NULL
);

INSERT INTO nested_category VALUES
    (1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),
    (3,'TUBE',3,4),(4,'LCD',5,6),(5,'PLASMA',7,8),
    (6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),
    (8,'FLASH',12,13),(9,'CD PLAYERS',15,16),
    (10,'2 WAY RADIOS',17,18);

SELECT * FROM nested_category ORDER BY category_id;
+-------------+----------------------+-----+-----+
| category_id | name                 | lft | rgt |
+-------------+----------------------+-----+-----+
|           1 | ELECTRONICS          |   1 |  20 |
|           2 | TELEVISIONS          |   2 |   9 |
|           3 | TUBE                 |   3 |   4 |
|           4 | LCD                  |   5 |   6 |
|           5 | PLASMA               |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |  10 |  19 |
|           7 | MP3 PLAYERS          |  11 |  14 |
|           8 | FLASH                |  12 |  13 |
|           9 | CD PLAYERS           |  15 |  16 |
|          10 | 2 WAY RADIOS         |  17 |  18 |
+-------------+----------------------+-----+-----+

Picture the tree as nested sets: the left edge of the root set is numbered 1, the subsets are then drawn from left to right, and the counter increases by 1 each time it crosses the edge of a set. This yields the left and right numbers of every subset (that is, every node).

In the tree view, start at the left side of the root and walk down the branches. Each time you arrive at a child node, increase the counter by 1 and assign it to that node's left side. When a node has no unvisited children left (for a leaf node, immediately), increase the counter again, assign it to the node's right side, and continue up and to the right. Once the whole tree has been traversed, every node has its left and right values.
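The traversal described above can be sketched in a few lines of Python. This is an illustrative sketch, not the article's code: the function name is an assumption, and the input rows are the (category_id, name, parent) rows of the adjacency-list category table from the beginning of the article.

```python
# Rows of the adjacency-list `category` table: (category_id, name, parent).
CATEGORY = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def number_tree(rows):
    """Return {node_id: (lft, rgt)} from a modified preorder traversal."""
    children = {}
    root = None
    for node_id, _name, parent in rows:
        if parent is None:
            root = node_id
        else:
            children.setdefault(parent, []).append(node_id)

    bounds = {}
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1              # entering the node: its left value
        lft = counter
        for child in children.get(node_id, []):
            visit(child)
        counter += 1              # leaving the node: its right value
        bounds[node_id] = (lft, counter)

    visit(root)
    return bounds


bounds = number_tree(CATEGORY)
for node_id, name, _parent in CATEGORY:
    print(node_id, name, *bounds[node_id])
```

The recursion is fine for small trees like this one; for very deep hierarchies an explicit stack would avoid Python's recursion limit.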

Get nested set full tree or subtree

SELECT node.name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name                 |
+----------------------+
| ELECTRONICS          |
| TELEVISIONS          |
| TUBE                 |
| LCD                  |
| PLASMA               |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS          |
| FLASH                |
| CD PLAYERS           |
| 2 WAY RADIOS         |
+----------------------+

-- Query a subtree
SELECT node.name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.name = 'PORTABLE ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name                 |
+----------------------+
| PORTABLE ELECTRONICS |
| MP3 PLAYERS          |
| FLASH                |
| CD PLAYERS           |
| 2 WAY RADIOS         |
+----------------------+

You only need to select the records whose lft value lies between the lft and rgt values of the parent node.

Get the parent node of the nested set

SELECT parent.name FROM
    nested_category AS node,
    nested_category AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
    AND node.name = 'LCD'
ORDER BY parent.lft DESC
LIMIT 1;
+-------------+
| name        |
+-------------+
| TELEVISIONS |
+-------------+

The left and right values of a node always lie between the left and right values of each of its ancestors. Ordering by parent.lft in descending order with LIMIT 1 returns only the direct parent; without LIMIT 1, the query returns all ancestors of the node.

Get nested set leaf nodes

SELECT name
FROM nested_category
WHERE rgt = lft + 1;
+--------------+
| name         |
+--------------+
| TUBE         |
| LCD          |
| PLASMA       |
| FLASH        |
| CD PLAYERS   |
| 2 WAY RADIOS |
+--------------+

A leaf node contains no other node, so its right value is exactly one step (here 1) greater than its left value.
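The same arithmetic generalizes: as long as the numbering uses a step of 1 and has no gaps, each descendant consumes two counter values inside its ancestor's interval, so a node has (rgt - lft - 1) / 2 descendants. The sketch below checks this on the sample data with Python's standard sqlite3 module; it assumes the gap-free numbering shown above and is not part of the original queries.

```python
import sqlite3

ROWS = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested_category (category_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", ROWS)

# Descendant count straight from the interval width, no join needed.
counts = dict(
    conn.execute("SELECT name, (rgt - lft - 1) / 2 FROM nested_category")
)
for name, count in counts.items():
    print(name, count)
```

Once nodes are deleted without renumbering, the intervals contain gaps and this shortcut overcounts, whereas the rgt = lft + 1 leaf test only misfires for nodes whose children were all removed.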

Get the complete single path of a nested set

SELECT parent.name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.name = 'FLASH'
ORDER BY parent.lft;
+----------------------+
| name                 |
+----------------------+
| ELECTRONICS          |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS          |
| FLASH                |
+----------------------+

Unlike the adjacency list, this needs no recursion: a single self-join returns the whole path.

Get nested set node depth

SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY depth;
+----------------------+-------+
| name                 | depth |
+----------------------+-------+
| ELECTRONICS          |     0 |
| PORTABLE ELECTRONICS |     1 |
| TELEVISIONS          |     1 |
| 2 WAY RADIOS         |     2 |
| CD PLAYERS           |     2 |
| MP3 PLAYERS          |     2 |
| PLASMA               |     2 |
| LCD                  |     2 |
| TUBE                 |     2 |
| FLASH                |     3 |
+----------------------+-------+

Joining each node to every node whose interval contains it (itself included), counting the matches per node and subtracting 1 gives the node's depth. Note that grouping by name assumes node names are unique. The same join can also display the hierarchy visually:

SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name                   | lft |
+------------------------+-----+
| ELECTRONICS            |   1 |
| --TELEVISIONS          |   2 |
| ----TUBE               |   3 |
| ----LCD                |   5 |
| ----PLASMA             |   7 |
| --PORTABLE ELECTRONICS |  10 |
| ----MP3 PLAYERS        |  11 |
| ------FLASH            |  12 |
| ----CD PLAYERS         |  15 |
| ----2 WAY RADIOS       |  17 |
+------------------------+-----+

Get subtree depth

SELECT node.name, (COUNT(parent.name) - (sub_tree.depth2 + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        (
                SELECT node2.name, (COUNT(parent2.name) - 1) AS depth2
                FROM nested_category AS node2,
                nested_category AS parent2
                WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt
                AND node2.name = 'PORTABLE ELECTRONICS'
                GROUP BY node2.name
                ORDER BY depth2
        ) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name
GROUP BY node.name, sub_tree.depth2
ORDER BY depth;
+----------------------+-------+
| name                 | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS |     0 |
| 2 WAY RADIOS         |     1 |
| CD PLAYERS           |     1 |
| MP3 PLAYERS          |     1 |
| FLASH                |     2 |
+----------------------+-------+

Two self-joins are used here so that depth is measured relative to any chosen node; the query also works for the root node.

Adding HAVING depth < 2 to the SQL above returns the node and its direct children while excluding deeper descendants:

SELECT node.name, (COUNT(parent.name) - (sub_tree.depth2 + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        (
                SELECT node2.name, (COUNT(parent2.name) - 1) AS depth2
                FROM nested_category AS node2,
                nested_category AS parent2
                WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt
                AND node2.name = 'PORTABLE ELECTRONICS'
                GROUP BY node2.name
                ORDER BY depth2
        ) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name
GROUP BY node.name, sub_tree.depth2
HAVING depth < 2
ORDER BY depth;
+----------------------+-------+
| name                 | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS |     0 |
| 2 WAY RADIOS         |     1 |
| CD PLAYERS           |     1 |
| MP3 PLAYERS          |     1 |
+----------------------+-------+

This feature is useful for expanding only the first level without expanding subsequent levels.

The above SQL can also be written using CTE syntax in MySQL 8:

WITH sub_tree AS (SELECT node2.name, (COUNT(parent2.name) - 1) AS depth2
        FROM nested_category AS node2,
        nested_category AS parent2
        WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt
        AND node2.name = 'PORTABLE ELECTRONICS'
        GROUP BY node2.name
        ORDER BY depth2)
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth2 + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name
GROUP BY node.name, sub_tree.depth2
HAVING depth < 2
ORDER BY depth;

Add node

Adding a node to a nested set is much more complicated than in an adjacency list, because the left and right values of every node to the right of the insertion point must be recalculated. The following stored procedure takes the parent node ID and the new node's name, shifts the affected left and right values, and inserts the new node as the last child of that parent.

CREATE DEFINER = `root`@`localhost` PROCEDURE `AddNestedSetNode`(`parent_id` INT,`node_name` VARCHAR(20))
BEGIN
    DECLARE _rgt INT;
    DECLARE step INT;
    SET step = 1;
    SET autocommit=0;

    IF EXISTS(SELECT category_id From nested_category WHERE category_id = parent_id)
    THEN
        START TRANSACTION;
        SET _rgt = (SELECT rgt FROM nested_category WHERE category_id = parent_id);
        UPDATE nested_category SET rgt = rgt + 2 * step WHERE rgt >= _rgt;
        UPDATE nested_category SET lft = lft + 2 * step WHERE lft >= _rgt;

        INSERT INTO nested_category(name, lft, rgt) values(node_name, _rgt, _rgt + step);
        COMMIT;
    END IF;
END;

Let's try adding a child node under the root node:

CALL OrgAndUser.AddNestedSetNode(1,'GAME CONSOLE');

SELECT * FROM nested_category;
+-------------+----------------------+-----+-----+
| category_id | name                 | lft | rgt |
+-------------+----------------------+-----+-----+
|           1 | ELECTRONICS          |   1 |  22 |
|           2 | TELEVISIONS          |   2 |   9 |
|           3 | TUBE                 |   3 |   4 |
|           4 | LCD                  |   5 |   6 |
|           5 | PLASMA               |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |  10 |  19 |
|           7 | MP3 PLAYERS          |  11 |  14 |
|           8 | FLASH                |  12 |  13 |
|           9 | CD PLAYERS           |  15 |  16 |
|          10 | 2 WAY RADIOS         |  17 |  18 |
|          11 | GAME CONSOLE         |  20 |  21 |
+-------------+----------------------+-----+-----+

Then add a child node under FLASH:

CALL OrgAndUser.AddNestedSetNode(8,'ABC FLASH');

SELECT * FROM nested_category;
+-------------+----------------------+-----+-----+
| category_id | name                 | lft | rgt |
+-------------+----------------------+-----+-----+
|           1 | ELECTRONICS          |   1 |  24 |
|           2 | TELEVISIONS          |   2 |   9 |
|           3 | TUBE                 |   3 |   4 |
|           4 | LCD                  |   5 |   6 |
|           5 | PLASMA               |   7 |   8 |
|           6 | PORTABLE ELECTRONICS |  10 |  21 |
|           7 | MP3 PLAYERS          |  11 |  16 |
|           8 | FLASH                |  12 |  15 |
|           9 | CD PLAYERS           |  17 |  18 |
|          10 | 2 WAY RADIOS         |  19 |  20 |
|          11 | GAME CONSOLE         |  22 |  23 |
|          12 | ABC FLASH            |  13 |  14 |
+-------------+----------------------+-----+-----+

SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name                   | lft |
+------------------------+-----+
| ELECTRONICS            |   1 |
| --TELEVISIONS          |   2 |
| ----TUBE               |   3 |
| ----LCD                |   5 |
| ----PLASMA             |   7 |
| --PORTABLE ELECTRONICS |  10 |
| ----MP3 PLAYERS        |  11 |
| ------FLASH            |  12 |
| --------ABC FLASH      |  13 |
| ----CD PLAYERS         |  17 |
| ----2 WAY RADIOS       |  19 |
| --GAME CONSOLE         |  22 |
+------------------------+-----+
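The gap-opening arithmetic used by AddNestedSetNode can also be replayed outside MySQL. The following is a minimal sketch with Python's standard sqlite3 module, assuming a step of 1; the function names are illustrative and not part of the original procedure.

```python
import sqlite3

ROWS = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]


def open_db():
    """Create an in-memory copy of the sample nested_category table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nested_category (category_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def add_child(conn, parent_id, name):
    """Insert name as the last child of parent_id; return the new id or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT rgt FROM nested_category WHERE category_id = ?",
            (parent_id,),
        ).fetchone()
        if row is None:
            return None
        parent_rgt = row[0]
        # Open a gap of width 2 at the parent's right edge.
        conn.execute(
            "UPDATE nested_category SET rgt = rgt + 2 WHERE rgt >= ?",
            (parent_rgt,),
        )
        conn.execute(
            "UPDATE nested_category SET lft = lft + 2 WHERE lft >= ?",
            (parent_rgt,),
        )
        cur = conn.execute(
            "INSERT INTO nested_category (name, lft, rgt) VALUES (?, ?, ?)",
            (name, parent_rgt, parent_rgt + 1),
        )
        return cur.lastrowid


def bounds(conn, name):
    """Return (lft, rgt) for the node with the given name."""
    return conn.execute(
        "SELECT lft, rgt FROM nested_category WHERE name = ?", (name,)
    ).fetchone()


conn = open_db()
add_child(conn, 1, "GAME CONSOLE")
add_child(conn, 8, "ABC FLASH")
for row in conn.execute("SELECT name, lft, rgt FROM nested_category ORDER BY lft"):
    print(row)
```

Both UPDATE statements touch every row to the right of the insertion point, which is the write cost the nested set model pays for its cheap reads.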

Delete node

Deleting nodes from a nested set differs slightly from the previous models. In our example, suppose the PORTABLE ELECTRONICS row is deleted and no other node is changed.

Judging by the lft and rgt values alone, the former children of PORTABLE ELECTRONICS simply become children of ELECTRONICS, and no orphaned subtree appears.
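This behavior is easy to check on the sample data. The sketch below uses Python's standard sqlite3 module rather than MySQL and reuses the direct-parent query from earlier in this section; the helper name direct_parent is an assumption made for illustration.

```python
import sqlite3

ROWS = [
    (1, "ELECTRONICS", 1, 20), (2, "TELEVISIONS", 2, 9), (3, "TUBE", 3, 4),
    (4, "LCD", 5, 6), (5, "PLASMA", 7, 8), (6, "PORTABLE ELECTRONICS", 10, 19),
    (7, "MP3 PLAYERS", 11, 14), (8, "FLASH", 12, 13), (9, "CD PLAYERS", 15, 16),
    (10, "2 WAY RADIOS", 17, 18),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested_category (category_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)", ROWS)

# Delete only the intermediate row; every other lft/rgt value stays untouched.
conn.execute("DELETE FROM nested_category WHERE name = 'PORTABLE ELECTRONICS'")


def direct_parent(conn, name):
    """Name of the closest enclosing node, or None for the root."""
    row = conn.execute(
        "SELECT parent.name"
        " FROM nested_category AS node, nested_category AS parent"
        " WHERE node.lft > parent.lft AND node.rgt < parent.rgt"
        "   AND node.name = ?"
        " ORDER BY parent.lft DESC LIMIT 1",
        (name,),
    ).fetchone()
    return row[0] if row else None


for child in ("MP3 PLAYERS", "CD PLAYERS", "2 WAY RADIOS", "FLASH"):
    print(child, "->", direct_parent(conn, child))
```

The three former children now report ELECTRONICS as their parent, while FLASH still sits under MP3 PLAYERS. The numbering keeps a gap at 10 and 19, which the containment queries tolerate.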

The following stored procedure deletes a leaf node from the nested set. It also recalculates the left and right values of the affected nodes to close the gap, although the containment queries shown above would still work without that step.

CREATE DEFINER = `root`@`localhost` PROCEDURE `DeleteNestedSetLeaf`(`node_id` INT)
BEGIN
    DECLARE _lft INT;
    DECLARE _rgt INT;
    DECLARE step INT;
    DECLARE width INT;
    SET step = 1;
    SET autocommit=0;

    IF EXISTS(SELECT category_id From nested_category WHERE category_id = node_id AND rgt = lft + step)
    THEN
        START TRANSACTION;
        -- Read the node's bounds into the local variables declared above.
        SELECT rgt,lft,(rgt-lft+step) INTO _rgt,_lft,width FROM nested_category WHERE category_id = node_id;

        DELETE FROM nested_category WHERE lft BETWEEN _lft AND _rgt;

        -- Close the gap left behind by the deleted node.
        UPDATE nested_category SET rgt = rgt - width WHERE rgt > _rgt;
        UPDATE nested_category SET lft = lft - width WHERE lft > _rgt;
        COMMIT;
    END IF;
END;

We delete the GAME CONSOLE (ID 11) added earlier:

CALL OrgAndUser.DeleteNestedSetLeaf(11);

SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name                   | lft |
+------------------------+-----+
| ELECTRONICS            |   1 |
| --TELEVISIONS          |   2 |
| ----TUBE               |   3 |
| ----LCD                |   5 |
| ----PLASMA             |   7 |
| --PORTABLE ELECTRONICS |  10 |
| ----MP3 PLAYERS        |  11 |
| ------FLASH            |  12 |
| --------ABC FLASH      |  13 |
| ----CD PLAYERS         |  17 |
| ----2 WAY RADIOS       |  19 |
+------------------------+-----+

Delete subtree

CREATE DEFINER = `root`@`localhost` PROCEDURE `DeleteNestedSetSubtree`(`node_id` INT)
BEGIN
    DECLARE _lft INT;
    DECLARE _rgt INT;
    DECLARE step INT;
    DECLARE width INT;
    SET step = 1;
    SET autocommit=0;

    IF EXISTS(SELECT category_id From nested_category WHERE category_id = node_id)
    THEN
        START TRANSACTION;
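        -- Read the subtree root's bounds into the local variables declared above.
        SELECT rgt,lft,(rgt-lft+step) INTO _rgt,_lft,width FROM nested_category WHERE category_id = node_id;

        -- Every node of the subtree has its lft inside the root's interval.
        DELETE FROM nested_category WHERE lft BETWEEN _lft AND _rgt;

        -- Close the gap left behind by the deleted subtree.
        UPDATE nested_category SET rgt = rgt - width WHERE rgt > _rgt;
        UPDATE nested_category SET lft = lft - width WHERE lft > _rgt;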
        COMMIT;
    END IF;
END;

We delete FLASH (ID 8) and its subtree:

CALL OrgAndUser.DeleteNestedSetSubtree(8);

SELECT CONCAT( REPEAT('--', COUNT(parent.name) - 1), node.name) AS name, node.lft
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft;
+------------------------+-----+
| name                   | lft |
+------------------------+-----+
| ELECTRONICS            |   1 |
| --TELEVISIONS          |   2 |
| ----TUBE               |   3 |
| ----LCD                |   5 |
| ----PLASMA             |   7 |
| --PORTABLE ELECTRONICS |  10 |
| ----MP3 PLAYERS        |  11 |
| ----CD PLAYERS         |  13 |
| ----2 WAY RADIOS       |  15 |
+------------------------+-----+

Convert existing adjacency list to nested set table

There is a reporting-relationship table in our real OA database that uses the adjacency list model. Its main fields are as follows (for convenience I work on a temporary copy, APPROVAL_GROUP_TEMP):

CREATE TABLE `APPROVAL_GROUP_TEMP` (
  `ID` decimal(8,0) NOT NULL,
  `FATHERID` decimal(8,0) DEFAULT NULL, -- parent ID
  `APPROVALGROUPNAME` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -- name
  `SHOWFLAG` decimal(1,0) DEFAULT '1', -- status (1: enabled, 0: disabled, 2: deleted)
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

We build a nested set table:

CREATE TABLE `AG_Stack` (
  `stack_top` int NOT NULL,
  `node` varchar(100) NOT NULL,  -- name
  `lft` int DEFAULT NULL,
  `rgt` int DEFAULT NULL,
  `nodeid` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

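Then use the following stored procedure to convert the adjacency list into the nested set table (only enabled rows, SHOWFLAG=1, are converted). Note that the procedure deletes rows from APPROVAL_GROUP_TEMP as it processes them, which is why it runs against a temporary copy: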

CREATE DEFINER=`root`@`localhost` PROCEDURE `AdjToNested`()
DETERMINISTIC
BEGIN
    DECLARE lft_rgt INTEGER;
    DECLARE max_lft_rgt INTEGER;
    DECLARE current_top INTEGER;
    DECLARE step INTEGER;

    SET step = 1;
    SET lft_rgt = 2;
    SET max_lft_rgt = 2 * (SELECT COUNT(*) FROM APPROVAL_GROUP_TEMP);
    SET current_top = 1;

    -- Clear Stack
    DELETE FROM AG_Stack;

    -- Insert 1st record, push 1 to stack
    INSERT INTO AG_Stack
        SELECT 1, APPROVALGROUPNAME, 1, max_lft_rgt, id
        FROM APPROVAL_GROUP_TEMP
        WHERE fatherid = -1;

    -- Remove the 1st record from Old table
    DELETE FROM APPROVAL_GROUP_TEMP WHERE fatherid = -1;

    -- If there are still records
    WHILE lft_rgt <= max_lft_rgt - 1 AND current_top > 0 DO 
        IF EXISTS (SELECT *
            FROM AG_Stack AS S1, APPROVAL_GROUP_TEMP AS T1
            WHERE S1.nodeid = T1.fatherid AND T1.SHOWFLAG = 1
            AND S1.stack_top = current_top)
        THEN BEGIN
            -- Each time process 1 record
            INSERT INTO AG_Stack SELECT (current_top + 1), T1.APPROVALGROUPNAME, lft_rgt, NULL, T1.id
                FROM AG_Stack AS S1, APPROVAL_GROUP_TEMP AS T1
                WHERE S1.nodeid = T1.fatherid AND T1.SHOWFLAG = 1
                AND S1.stack_top = current_top LIMIT 1;

            DELETE FROM APPROVAL_GROUP_TEMP
                WHERE id = (SELECT nodeid
                FROM AG_Stack
                WHERE stack_top = (current_top + 1) AND lft = lft_rgt);

            SET current_top = current_top + 1;
            SET lft_rgt = lft_rgt + step;

        END;
        ELSEIF current_top >= 0 THEN BEGIN
            UPDATE AG_Stack
                SET rgt = lft_rgt,
                stack_top = - stack_top
                WHERE stack_top = current_top;

            SET lft_rgt = lft_rgt + step;
            SET current_top = current_top - 1;
        END;
        END IF;
    END WHILE;
END;

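After running (the lft and rgt values below advance in increments of 50, so this run evidently used a larger step than the step = 1 in the listing above):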

CALL AdjToNested();

SELECT * FROM AG_Stack;
+-----------+---------------------------------+-------+-------+--------+
| stack_top | node                            | lft   | rgt   | nodeid |
+-----------+---------------------------------+-------+-------+--------+
|        -1 | 公司总部                        |     1 | 68202 |  16060 |
|        -2 | 研发中心                        |     2 | 41052 |  16062 |
|        -3 | 直属员工                        |    52 |   102 |  16881 |
|        -3 | 研发管理部                      |   152 |  1702 |  19340 |
|        -4 | 直属员工                        |   202 |   252 |  19341 |
|        -4 | 业务流程管理组                   |   302 |   452 |  19720 |
|        -5 | 直属员工                        |   352 |   402 |  19721 |
|        -4 | 业务标准管理组                   |   502 |   652 |  19722 |
|        -5 | 直属员工                        |   552 |   602 |  19723 |
|        -4 | 业务执行专家组                   |   702 |  1652 |  19724 |
|        -5 | 策划专家组                      |   752 |   902 |  19342 |
|        -6 | 直属员工                        |   802 |   852 |  19343 |
|        -5 | 程序专家组                      |   952 |  1102 |  19344 |
|        -6 | 直属员工                        |  1002 |  1052 |  19345 |
|        -5 | 美术专家组                      |  1152 |  1302 |  19346 |
|        -6 | 直属员工                        |  1202 |  1252 |  19347 |
|        -5 | 项目管理专家组                   |  1352 |  1502 |  19348 |
|        -6 | 直属员工                        |  1402 |  1452 |  19349 |
|        -5 | 直属员工                        |  1552 |  1602 |  19725 |
|        -3 | 某某工作室                       |  1752 |  3002 |  19496 |
|        -4 | XXX项目组                       |  1802 |  2952 |  19800 |
|        -5 | 策划组                          |  1852 |  2002 |  20740 |
-- ... (remaining rows omitted)

Performance comparison

Here we compare the performance of different models with similar data volumes.

Before optimization

Apart from the closure table, none of the tables involved has additional indexes at this point.

Get the whole tree:

SET @@profiling = 0;
SET @@profiling_history_size = 0;
SET @@profiling_history_size = 100; 
SET @@profiling = 1;

-- Adjacency list
WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (
SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE
    T0.FATHERID = -1 AND SHOWFLAG=1
UNION ALL
SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1
  WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1
)
SELECT * FROM T1;

-- Path enumeration table
SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%';

-- Closure table
SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0;

-- Nested set
SELECT node.node,node.nodeid
FROM AG_Stack AS node,
        AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
        AND parent.lft = 1
ORDER BY node.lft;

-- View the profiling results
SHOW PROFILES;
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.01553425 | WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS ( SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE T0.FATHERID = -1 AND SHOWFLAG=1 UNION ALL SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1 WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1 ) SELECT  |
|        2 | 0.00199475 | SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%' |
|        3 | 0.01929400 | SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0 |
|        4 | 0.00121350 | SELECT node.node,node.nodeid FROM AG_Stack AS node, AG_Stack AS parent WHERE node.lft > parent.lft AND node.rgt < parent.rgt AND parent.lft = 1 ORDER BY node.lft                                                                                                                            |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The nested set query is roughly an order of magnitude faster than the adjacency list: for a little over 600 rows it takes about 1.2 ms versus 15.5 ms, which is about 13 times faster. Compared with the closure table (about 19.3 ms), the nested set is about 16 times faster. Without optimization, the nested set and the path enumeration table (about 2.0 ms) are in the same range.
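The ratios can be recomputed directly from the Duration column of the SHOW PROFILES output. A small sketch in Python; the dictionary keys are just labels chosen here, and the numbers are copied from the output above:

```python
# Durations (seconds) copied from the SHOW PROFILES output above.
durations = {
    "adjacency list": 0.01553425,
    "path enumeration": 0.00199475,
    "closure table": 0.01929400,
    "nested set": 0.00121350,
}

baseline = durations["nested set"]

# How many times slower each model is than the nested set query.
slowdown = {model: seconds / baseline for model, seconds in durations.items()}

for model, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{model:<17} {durations[model] * 1000:6.2f} ms  {factor:5.1f}x")
```

These are single-run timings of a few milliseconds each, so the ratios are only a rough guide.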

Analyzing the four statements with EXPLAIN shows that the path enumeration query needs only 1 step, the nested set 2, the closure table 3, and the adjacency list 4. The adjacency list plan is also the most complex: a PRIMARY query over a derived (temporary) table, the DERIVED subquery, and 2 UNION rows for the recursive part, whereas the other models use only simple queries (SIMPLE). Because the adjacency list, path enumeration and nested set tables are not indexed, their access type is ALL, that is, a full table scan.

-- Adjacency list
EXPLAIN
WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (
SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE
    T0.FATHERID = -1 AND SHOWFLAG=1
UNION ALL
SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1
  WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1
)
SELECT * FROM T1;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  616 |   100.00 | NULL                                       |
|  2 | DERIVED     | T0         | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2467 |     1.00 | Using where                                |
|  3 | UNION       | T1         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   24 |   100.00 | Recursive                                  |
|  3 | UNION       | T2         | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2467 |     1.00 | Using where; Using join buffer (hash join) |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+

-- Path enumeration table
EXPLAIN
SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%';
+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | AG_PathEnum | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  683 |    11.11 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------------+

-- Closure table
EXPLAIN
SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0;
+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key           | key_len | ref                      | rows | filtered | Extra                              |
+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+
|  1 | SIMPLE      | n1    | NULL       | ALL    | PRIMARY       | NULL          | NULL    | NULL                     | 2467 |    10.00 | Using where                        |
|  1 | SIMPLE      | n2    | NULL       | ref    | uniq_anc_desc | uniq_anc_desc | 4       | OrgAndUser.n1.ID         |    5 |    90.00 | Using index condition; Using where |
|  1 | SIMPLE      | n3    | NULL       | eq_ref | PRIMARY       | PRIMARY       | 4       | OrgAndUser.n2.descendant |    1 |   100.00 | Using where                        |
+----+-------------+-------+------------+--------+---------------+---------------+---------+--------------------------+------+----------+------------------------------------+

-- Nested set
EXPLAIN
SELECT node.node,node.nodeid
FROM AG_Stack AS node,
        AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
        AND parent.lft = 1
ORDER BY node.lft;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                        |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | parent | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  683 |    10.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | node   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  683 |    11.11 | Using where; Using join buffer (hash join)   |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+

Optimized

Here, indexes are added on the searched columns of each model:

-- Adjacency list
CREATE UNIQUE INDEX APPROVAL_GROUP_TEMP_ID_IDX USING BTREE ON APPROVAL_GROUP_TEMP (ID);
CREATE INDEX APPROVAL_GROUP_TEMP_FATHERID_IDX USING BTREE ON APPROVAL_GROUP_TEMP (FATHERID);
CREATE INDEX APPROVAL_GROUP_TEMP_SHOWFLAG_IDX USING BTREE ON APPROVAL_GROUP_TEMP (SHOWFLAG);

-- Path enumeration
CREATE UNIQUE INDEX AG_PathEnum_path_string_IDX USING BTREE ON AG_PathEnum (path_string);

-- Closure table
CREATE INDEX AG_Closure_ancestor_IDX USING BTREE ON AG_Closure (ancestor);
CREATE INDEX AG_Closure_descendant_IDX USING BTREE ON AG_Closure (descendant);
CREATE INDEX AG_Closure_distance_IDX USING BTREE ON AG_Closure (distance);

-- Nested set
CREATE UNIQUE INDEX AG_Stack_lft_IDX USING BTREE ON AG_Stack (lft);
CREATE UNIQUE INDEX AG_Stack_rgt_IDX USING BTREE ON AG_Stack (rgt);
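
Whether an index is actually picked up can be checked by comparing the query plan before and after creating it. The sketch below illustrates the idea in Python with an in-memory SQLite database, used only as a stand-in: the article's measurements are on MySQL, SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, and the index name category_parent_idx and the helper plan() are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, "ELECTRONICS", None),
        (2, "TELEVISIONS", 1),
        (3, "TUBE", 2),
        (6, "PORTABLE ELECTRONICS", 1),
    ],
)


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


children_sql = "SELECT category_id FROM category WHERE parent = 1"

plan_before = plan(children_sql)  # no index on parent yet: full table scan
conn.execute("CREATE INDEX category_parent_idx ON category (parent)")
plan_after = plan(children_sql)   # the planner can now search the index

print(plan_before)
print(plan_after)
```

In MySQL the same check is the EXPLAIN statement used earlier: the type column should change from ALL to an index-based access type once a usable index exists.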

Re-execute the previous full-tree queries and run SHOW PROFILES again. The result is:

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|       12 | 0.00990250 | WITH RECURSIVE T1(ID,APPROVALGROUPNAME,FATHERID) AS (
SELECT T0.ID,T0.APPROVALGROUPNAME,T0.FATHERID FROM APPROVAL_GROUP T0 WHERE
    T0.FATHERID = -1 AND SHOWFLAG=1
UNION ALL
SELECT T2.ID,T2.APPROVALGROUPNAME,T2.FATHERID FROM APPROVAL_GROUP T2, T1
  WHERE T2.FATHERID = T1.ID AND SHOWFLAG=1
)
SELECT  |
|       13 | 0.00184200 | SELECT nodeid,node FROM AG_PathEnum WHERE path_string LIKE '16060/%'     |
|       14 | 0.00384525 | SELECT n3.ID,n3.APPROVALGROUPNAME FROM APPROVAL_GROUP n1
INNER JOIN AG_Closure n2 ON n1.ID = n2.ancestor
INNER JOIN APPROVAL_GROUP n3 ON n2.descendant = n3.ID
WHERE n1.FATHERID = -1 AND n2.distance != 0                                                                                                   |
|       15 | 0.00235000 | SELECT node.node,node.nodeid
FROM AG_Stack AS node,
        AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
        AND parent.lft = 1
ORDER BY node.lft                                                                                                                            |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

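After indexing, path enumeration (about 1.8 ms) and the nested set (about 2.4 ms) are still close, but the nested set query took roughly twice as long as before optimization (about 1.2 ms). This is probably related to database caching and index building; running the same SQL again is faster. The closure table benefits most from the indexes, dropping from about 19.3 ms to about 3.8 ms, while the adjacency list drops from about 15.5 ms to about 9.9 ms.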
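
Since a single run mixes in cache warm-up, one way to get steadier numbers is to repeat each query and take the median. A minimal sketch in Python, again using an in-memory SQLite table purely as a stand-in for the MySQL tables; median_query_time is a hypothetical helper written for this example, and the table t is made up:

```python
import sqlite3
import statistics
import time


def median_query_time(conn, sql, params=(), runs=7):
    """Run the query `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch all rows so the work is really done
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Tiny stand-in table, only to show the helper being used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, 2 * i - 1, 2 * i) for i in range(1, 1001)],
)

elapsed = median_query_time(conn, "SELECT id FROM t WHERE lft > ? AND rgt < ?", (100, 900))
print(f"median: {elapsed * 1000:.3f} ms")
```

The median is less sensitive than the mean to one slow first run, which is the effect described above.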

Next, compare the performance of finding all ancestors (parent nodes) of a given node:

-- Path enumeration
SELECT P2.* FROM AG_PathEnum AS P1, AG_PathEnum AS P2 
WHERE P1.nodeid = 18903 AND POSITION(P2.path_string IN P1.path_string)= 1 ORDER BY P2.nodeid;

-- Nested set
SELECT parent.node,parent.nodeid FROM
    AG_Stack AS node,
    AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
    AND node.nodeid = 18903
ORDER BY parent.lft DESC;

SHOW PROFILES;
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|       19 | 0.00192450 | SELECT P2.* FROM AG_PathEnum AS P1, AG_PathEnum AS P2 
WHERE P1.nodeid = 18903 AND POSITION(P2.path_string IN P1.path_string)= 1 ORDER BY P2.nodeid                                                                                                                                                           |
|       20 | 0.00111425 | SELECT parent.node,parent.nodeid FROM
AG_Stack AS node,
AG_Stack AS parent
WHERE node.lft > parent.lft AND node.rgt < parent.rgt
AND node.nodeid = 18903
ORDER BY parent.lft DESC                                                                                                                            |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The gap here is small as well: about 1.9 ms for path enumeration versus about 1.1 ms for the nested set.
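
To make the two ancestor lookups concrete, here is a small runnable sketch in Python with SQLite, using the category tree from the beginning of the article. Several things are assumptions made for the example: both models are stored in a single hypothetical table named tree (the article uses separate tables), every path ends with "/", the node itself is excluded from the path enumeration result, and SQLite's instr() stands in for MySQL's POSITION(... IN ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree ("
    "nodeid INTEGER PRIMARY KEY, node TEXT, path_string TEXT, lft INTEGER, rgt INTEGER)"
)
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ELECTRONICS", "1/", 1, 20),
        (2, "TELEVISIONS", "1/2/", 2, 9),
        (3, "TUBE", "1/2/3/", 3, 4),
        (4, "LCD", "1/2/4/", 5, 6),
        (5, "PLASMA", "1/2/5/", 7, 8),
        (6, "PORTABLE ELECTRONICS", "1/6/", 10, 19),
        (7, "MP3 PLAYERS", "1/6/7/", 11, 14),
        (8, "FLASH", "1/6/7/8/", 12, 13),
        (9, "CD PLAYERS", "1/6/9/", 15, 16),
        (10, "2 WAY RADIOS", "1/6/10/", 17, 18),
    ],
)

# Path enumeration: an ancestor's path is a prefix of the node's path.
PATH_SQL = """
    SELECT p2.node
    FROM tree AS p1, tree AS p2
    WHERE p1.nodeid = ?
      AND instr(p1.path_string, p2.path_string) = 1
      AND p2.nodeid <> p1.nodeid
    ORDER BY length(p2.path_string)
"""

# Nested set: an ancestor's (lft, rgt) interval encloses the node's interval.
NESTED_SQL = """
    SELECT p.node
    FROM tree AS n, tree AS p
    WHERE n.lft > p.lft AND n.rgt < p.rgt
      AND n.nodeid = ?
    ORDER BY p.lft
"""

ancestors_by_path = [row[0] for row in conn.execute(PATH_SQL, (8,))]
ancestors_by_nested_set = [row[0] for row in conn.execute(NESTED_SQL, (8,))]

print(ancestors_by_path)
print(ancestors_by_nested_set)
```

Both queries list the ancestors of FLASH from the root downwards. The trailing "/" matters for the prefix test: without a delimiter at the end, a path such as "1/2" would also match as a prefix of "1/23".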

Summary

The four models above are time-tested, commonly used designs for storing tree-like or hierarchical data in a database.

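+------------------+------------+-----------------+-----------------------+---------------+------------------------------+-------------+
| Model            | Normalized | Unlimited depth | Query needs recursion | Tables needed | Easy to insert/update/delete | Performance |
+------------------+------------+-----------------+-----------------------+---------------+------------------------------+-------------+
| Adjacency list   | No         | Yes             | Yes                   | 1             | Yes                          | Low         |
| Path enumeration | No         | No              | No                    | 1             | Average                      | High        |
| Closure table    | Yes        | Yes             | No                    | 2             | Average                      | Average     |
| Nested set       | Yes        | Yes             | No                    | 1             | Average                      | High        |
+------------------+------------+-----------------+-----------------------+---------------+------------------------------+-------------+
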
  • Normalized: The adjacency list and the path enumeration table can contain cyclic paths, so they are not normalized models; the application code has to enforce a valid tree.
  • Unlimited depth: The path string of the path enumeration table has a limited length, so it is not suitable for data models that need very deep trees.
  • Query needs recursion: The adjacency list can only search the whole tree through recursion; the other models do not need it.
  • Tables needed: The closure table requires an additional relationship table, that is, it trades space for time.
  • Easy to insert, update and delete: The adjacency list only needs to touch one record. The path enumeration table and the closure table both need to update the associated nodes. The nested set also has to recalculate the left and right values of the nodes that come after the modified node.
  • Performance: As tested above, the adjacency list performs worst because of recursion. Path enumeration and nested sets perform very well because their queries are simple and easy to optimize. The closure table query joins several tables, so its performance is average.
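
The space-for-time trade-off of the closure table can be illustrated on the small category tree from the beginning of the article. The sketch below, in Python with an in-memory SQLite database as a stand-in for MySQL, derives the closure rows from the adjacency list once with a recursive query and then reads the whole subtree without recursion. The table name closure and its column names follow the ancestor/descendant/distance layout used above, but the exact schema here is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    "category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
        (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
        (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
        (10, "2 WAY RADIOS", 6),
    ],
)
conn.execute(
    "CREATE TABLE closure ("
    "ancestor INTEGER, descendant INTEGER, distance INTEGER, "
    "PRIMARY KEY (ancestor, descendant))"
)

# Recursion is paid once, when the closure rows are built: start from each node
# paired with itself (distance 0) and walk up one parent per step.
closure_rows = conn.execute(
    """
    WITH RECURSIVE c(ancestor, descendant, distance) AS (
        SELECT category_id, category_id, 0 FROM category
        UNION ALL
        SELECT cat.parent, c.descendant, c.distance + 1
        FROM c JOIN category AS cat ON cat.category_id = c.ancestor
        WHERE cat.parent IS NOT NULL
    )
    SELECT ancestor, descendant, distance FROM c
    """
).fetchall()
conn.executemany("INSERT INTO closure VALUES (?, ?, ?)", closure_rows)

closure_count = conn.execute("SELECT COUNT(*) FROM closure").fetchone()[0]

# Reading the subtree under ELECTRONICS is now a plain join, no recursion.
descendants = [
    row[0]
    for row in conn.execute(
        """
        SELECT cat.name
        FROM closure AS cl JOIN category AS cat ON cat.category_id = cl.descendant
        WHERE cl.ancestor = 1 AND cl.distance <> 0
        ORDER BY cl.descendant
        """
    )
]

print(closure_count, descendants)
```

For this tree of 10 nodes the closure table holds 27 rows (one per ancestor and descendant pair, including each node paired with itself). That is the extra space paid for recursion-free reads, and it is also why inserts and moves have to touch several rows.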

Original address zhuanlan.zhihu.com

Origin blog.csdn.net/qq_38263083/article/details/135455948