A new scheme for storing tree structures in a database

While developing jSqlBox recently, I was studying how to convert between a tree stored in a database and a tree of VO objects, and I stumbled on a new scheme for storing tree structures in a database.
There are four common schemes for storing a tree in a database, and each has its problems:
1) Adjacency List: record the parent node. The advantage is simplicity; the disadvantage is that reading a subtree requires walking the tree, which issues many SQL statements and puts a lot of pressure on the database.
2) Path Enumerations: record the entire path as a string. The advantage is that queries are convenient; the disadvantage is that when a new record is inserted, all paths below that node must be changed manually, which is error-prone.
3) Closure Table: a dedicated table maintains the paths. The disadvantages are that it takes up a lot of space and the operations are not intuitive.
4) Nested Sets: record left and right values. The disadvantage is that it is complicated and difficult to operate.
All of these methods share one drawback: they are not intuitive, and the tree structure cannot be seen directly in the table, which hinders development and debugging.
I am temporarily calling the method introduced in this article the "simple and crude multi-column storage method". It is somewhat similar to Path Enumerations, but instead of a path string it uses many database columns, each storing a placeholder (1 or null). In the figure at https://github.com/drinkjava2/Multiple-Columns-Tree/blob/master/treemapping.jpg, the tree on the left is mapped to the database table on the right:
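Because the figure is not reproduced in this text, here is a minimal sketch of the layout, written in Python with the standard sqlite3 module only so that it runs without a database server. The sample tree is a reconstruction that matches the line numbers used in the SQL examples of this article; the extra root M and the helper names build_tb and render are assumptions, not part of the original figure.

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, depth).
# Reconstructed from the line numbers in the article's examples;
# the second root M is an assumption so that a row follows node I.
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3),
         ("M", 1)]

def build_tb(conn):
    """Create table tb with columns c1..c4 and one row per node."""
    conn.execute("create table tb (line integer, id text, "
                 "c1 integer, c2 integer, c3 integer, c4 integer)")
    for line, (node_id, depth) in enumerate(NODES, start=1):
        # Exactly one of c1..c4 holds the placeholder 1; the others stay NULL.
        conn.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)",
                     (line, node_id))

def render(conn):
    """Return the table as text lines, printing the id where the placeholder is."""
    out = []
    for line, node_id, *cols in conn.execute("select * from tb order by line"):
        cells = [node_id if c == 1 else "." for c in cols]
        out.append(f"{line:>2} " + " ".join(cells))
    return out

conn = sqlite3.connect(":memory:")
build_tb(conn)
table = render(conn)
print("\n".join(table))
```

Printed this way, each row shows its node in the column that matches its depth, which is the "see the tree in the table" effect the method is after.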



Various SQL operations are as follows:

1. Get (or delete) the specified node and all of its descendants. The line number of the node is "X" and its column is "cY":
select * (or delete) from tb where
  line>=X and line<(select min(line) from tb where line>X and (cY=1 or c(Y-1)=1 or c(Y-2)=1 ... or c1=1))
For example, to get node D and all of its descendants:
select * from tb where line>=7 and line<(select min(line) from tb where line>7 and (c2=1 or c1=1))
To delete node D and all of its descendants:
delete from tb where line>=7 and line<(select min(line) from tb where line>7 and (c2=1 or c1=1))

To get only the direct children of node D (the next level down):
select * from tb where line>=7 and c3=1 and line<(select min(line) from tb where line>7 and (c2=1 or c1=1))
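The OR chain in the subtree query grows with the depth of the node, so in practice it would probably be generated. The following is a minimal sketch of that idea, assuming Python with sqlite3 and a hypothetical sample table reconstructed from the line numbers above (the extra root M and the function name subtree_sql are assumptions).

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, depth).
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3),
         ("M", 1)]

def subtree_sql(x, y, children_only=False):
    """Build the subtree query for the node at line x whose column is c{y}."""
    # cY=1 or c(Y-1)=1 ... or c1=1: the next row at the same or a shallower depth
    stop = " or ".join(f"c{i}=1" for i in range(y, 0, -1))
    extra = f" and c{y + 1}=1" if children_only else ""
    return (f"select id from tb where line>={x}{extra} and line<"
            f"(select min(line) from tb where line>{x} and ({stop})) "
            "order by line")

conn = sqlite3.connect(":memory:")
conn.execute("create table tb (line integer, id text, "
             "c1 integer, c2 integer, c3 integer, c4 integer)")
for line, (node_id, depth) in enumerate(NODES, start=1):
    conn.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)",
                 (line, node_id))

# Node D sits at line 7 in column c2.
subtree = [r[0] for r in conn.execute(subtree_sql(7, 2))]
children = [r[0] for r in conn.execute(subtree_sql(7, 2, children_only=True))]
print(subtree, children)
```

One thing the sketch makes visible: when no later row sits at the same or a shallower depth, the inner select min(line) returns NULL and the outer query returns nothing. The assumed extra root M guards against that here.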

2. Query the root node of the specified node. The line number of the node is "X" and its column is "cY":
select * from tb where line=(select max(line) from tb where line<=X and c1=1)
For example, to find the root node of node I:
select * from tb where line=(select max(line) from tb where line<=12 and c1=1)

3. Query the parent node of the specified node. The line number of the node is "X" and its column is "cY":
select * from tb where line=(select max(line) from tb where line<X and c(Y-1)=1)
For example, to find the parent of node L:
select * from tb where line=(select max(line) from tb where line<11 and c3=1)

4. Query all ancestors of the specified node. The line number of the node is "X" and its column is "cY":
select * from tb where line=(select max(line) from tb where line<X and c(Y-1)=1)
union select * from tb where line=(select max(line) from tb where line<X and c(Y-2)=1)
...
union select * from tb where line=(select max(line) from tb where line<X and c1=1)
For example, to find all ancestors of node I:
select * from tb where line=(select max(line) from tb where line<12 and c2=1)
union select * from tb where line=(select max(line) from tb where line<12 and c1=1)
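The root, parent, and ancestor queries all rest on the same observation: the nearest earlier row whose placeholder sits in a shallower column is the ancestor at that depth. A minimal sketch, assuming Python with sqlite3 and the same hypothetical sample table as reconstructed from the examples (parent_sql and ancestors_sql are illustrative names):

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, depth).
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3),
         ("M", 1)]

def parent_sql(x, col):
    """Nearest row above line x whose placeholder sits in column c{col}."""
    return ("select id from tb where line="
            f"(select max(line) from tb where line<{x} and c{col}=1)")

def ancestors_sql(x, y):
    """One union branch per level above the node (requires y > 1)."""
    return " union ".join(parent_sql(x, col) for col in range(y - 1, 0, -1))

conn = sqlite3.connect(":memory:")
conn.execute("create table tb (line integer, id text, "
             "c1 integer, c2 integer, c3 integer, c4 integer)")
for line, (node_id, depth) in enumerate(NODES, start=1):
    conn.execute(f"insert into tb (line, id, c{depth}) values (?, ?, 1)",
                 (line, node_id))

# L is at line 11 in column c4, so its parent is looked up in c3.
parent_of_l = conn.execute(parent_sql(11, 3)).fetchone()[0]
# I is at line 12 in column c3, so its ancestors are in c2 and c1.
ancestors_of_i = sorted(r[0] for r in conn.execute(ancestors_sql(12, 3)))
root_of_i = conn.execute(
    "select id from tb where line="
    "(select max(line) from tb where line<=12 and c1=1)").fetchone()[0]
print(parent_of_l, ancestors_of_i, root_of_i)
```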

5. Insert a new node:
For example, to insert a new node T between J and K:
update tb set line=line+1 where line>=10;
insert into tb (line,id,c4) values (10,'T',1);
This is the biggest difference from Path Enumerations: insertion is very convenient. One SQL statement adds 1 to the line number of every row that follows; no effort goes into maintaining path strings, and mistakes are unlikely.
If the table is very large, update tb set line=line+1 can touch most of the table and hurt performance. To avoid that, consider adding a groupid column: all nodes under the same root node share one groupid, and every operation is confined to that group. Inserting a new node then becomes, for example:
update tb set line=line+1 where groupid=2 and line>=8;
insert into tb (groupid,line,id,c4) values (2,8,'T',1);
Because operations within one groupid do not affect other groups, for complex additions, deletions, and modifications you can even delete the entire group, complete the changes in memory, and then reinsert the new group in one go.
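To illustrate how a groupid column keeps an insertion local, here is a minimal sketch, assuming Python with sqlite3. Both groups are hypothetical data (group 1 is the sample tree reconstructed from the examples, group 2 is invented), and insert_node is an illustrative helper rather than anything from jSqlBox.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tb (groupid integer, line integer, id text, "
             "c1 integer, c2 integer, c3 integer, c4 integer)")

# Hypothetical data: each group numbers its own lines starting at 1.
GROUPS = {
    1: [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
        ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3)],
    2: [("M", 1), ("N", 2), ("O", 2)],
}
for gid, nodes in GROUPS.items():
    for line, (node_id, depth) in enumerate(nodes, start=1):
        conn.execute(
            f"insert into tb (groupid, line, id, c{depth}) values (?, ?, ?, 1)",
            (gid, line, node_id))

def insert_node(conn, gid, line, node_id, depth):
    """Shift the rows at or after `line` in one group, then insert the node."""
    with conn:  # both statements commit together, or roll back on error
        conn.execute(
            "update tb set line=line+1 where groupid=? and line>=?",
            (gid, line))
        conn.execute(
            f"insert into tb (groupid, line, id, c{depth}) values (?, ?, ?, 1)",
            (gid, line, node_id))

# Insert T between J (line 9) and K (line 10) of group 1, at depth 4.
insert_node(conn, 1, 10, "T", 4)
group1 = [r[0] for r in conn.execute(
    "select id from tb where groupid=1 order by line")]
group2 = conn.execute(
    "select id, line from tb where groupid=2 order by line").fetchall()
print(group1, group2)
```

Wrapping the two statements in one transaction is a design choice of this sketch: between the update and the insert, the group briefly has a gap in its line numbers.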

Summary:
The advantages of this method are:
1) It is intuitive, easy to understand, and easy to debug. Among the tree storage schemes discussed here, it is the only WYSIWYG one: the shape of the tree is directly visible in the table, and the null values make the structure clear at a glance.
2) It makes full use of SQL. Queries, deletions, and insertions are convenient, and no LIKE pattern matching is needed.
3) Only one table is required.
4) Compatible with all databases.
5) The placeholder sits exactly where the displayed content should appear, which makes it convenient to output to tabular display controls such as a Grid.

Disadvantages:
1) The tree does not have unlimited depth. Databases limit the number of columns in a table, usually to about 1000, so the depth of the tree cannot exceed that limit. Because too many columns also affect performance, a smaller depth limit such as 100 is recommended.
2) The SQL statements are relatively long. A query often carries one OR condition per level, such as c9=1 or c8=1 or c7=1 ... or c1=1.
3) Moving a whole subtree is troublesome, because the entire subtree must be shifted sideways or moved up and down. This scheme is not recommended when nodes need to move frequently. It is better suited to applications that only add or remove nodes and rarely move them, such as forum posts and comments.
4) With many columns, the table takes up a fair amount of space.

==== Supplement: a simpler, unlimited-depth tree scheme built on the method above ====
I suddenly realized that the method above is still too clumsy. If, instead of multiple columns, a single column stores the depth level, the tree is no longer limited by the number of database columns and becomes an unlimited-depth tree. It loses the what-you-see-is-what-you-get effect, but it is far superior in performance and simplicity to the "simple and crude multi-column storage method" above. I am temporarily naming it "Zhu's Depth Tree V2.0 Method" (note: if someone has already invented this method, just drop the "Zhu's"). The method is as follows:
In the figure at https://github.com/drinkjava2/Multiple-Columns-Tree/blob/master/treemappingv2.png, the tree on the left is mapped to the database table on the right. Note that this method groups rows by groupid, and the last row of each group must be an END marker whose level is set to 0:
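As the figure is not reproduced here, the following minimal sketch shows what such a table could look like, assuming Python with sqlite3. The sample tree is reconstructed from the line numbers used in the examples of this article, and the column types and the helper name build_tb2 are assumptions.

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, level).
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3)]

def build_tb2(conn, groupid=1):
    """Store one group: a row per node, then the END marker with level 0."""
    conn.execute("create table tb2 "
                 "(groupid integer, line integer, id text, level integer)")
    rows = [(groupid, line, node_id, level)
            for line, (node_id, level) in enumerate(NODES, start=1)]
    rows.append((groupid, len(NODES) + 1, "END", 0))
    conn.executemany("insert into tb2 values (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
build_tb2(conn)

# The level column is enough to redraw the tree as an indented outline.
outline = ["  " * (level - 1) + node_id
           for node_id, level in conn.execute(
               "select id, level from tb2 where level>0 order by line")]
print("\n".join(outline))

last_row = conn.execute(
    "select line, id, level from tb2 order by line desc limit 1").fetchone()
```

The END row gives every subtree query a row to stop at, even for the last subtree of the group.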



1. Get the specified node and all of its descendants. The line number of the node is X, its level is Y, and its groupID is Z:
select * from tb2 where groupID=Z and
  line>=X and line<(select min(line) from tb2 where line>X and level<=Y and groupID=Z)
For example, to get node D and all of its descendants:
select * from tb2 where groupID=1 and line>=7 and line<(select min(line) from tb2 where groupID=1 and line>7 and level<=2)
Deletion is analogous: just replace select * in the SQL with delete.

To get only the direct children of node D, add level=Y+1 to the query condition:
select * from tb2 where groupID=1 and line>=7 and level=3 and line<(select min(line) from tb2 where groupID=1 and line>7 and level<=2)
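With a single level column the subtree query no longer depends on the depth of the node, so it can be written once with bound parameters. A minimal sketch, assuming Python with sqlite3 and a hypothetical tb2 table reconstructed from the examples (the constant names SUBTREE and CHILDREN are illustrative):

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, level), plus END.
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3),
         ("END", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tb2 "
             "(groupid integer, line integer, id text, level integer)")
conn.executemany("insert into tb2 values (1, ?, ?, ?)",
                 [(line, node_id, level)
                  for line, (node_id, level) in enumerate(NODES, start=1)])

# x = line of the node, y = its level, z = its groupid
RANGE = ("groupid=:z and line>=:x and line<(select min(line) from tb2 "
         "where groupid=:z and line>:x and level<=:y)")
SUBTREE = f"select id from tb2 where {RANGE} order by line"
CHILDREN = f"select id from tb2 where level=:y + 1 and {RANGE} order by line"
DELETE = f"delete from tb2 where {RANGE}"

node_d = {"x": 7, "y": 2, "z": 1}
subtree = [r[0] for r in conn.execute(SUBTREE, node_d)]
children = [r[0] for r in conn.execute(CHILDREN, node_d)]

# Deleting B (line 2, level 2) removes B, E and F in one statement.
conn.execute(DELETE, {"x": 2, "y": 2, "z": 1})
remaining = [r[0] for r in conn.execute("select id from tb2 order by line")]
print(subtree, children, remaining)
```

After the delete the line numbers have a gap (2 to 4 are missing), which the queries tolerate because they only compare line numbers.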

2. Query the root node of any node in group Z:
select * from tb2 where groupID=Z and line=1 (or level=1)

3. Query the parent node of the specified node. The line number of the node is X, its level is Y, and its groupID is Z:
select * from tb2 where groupID=Z and line=(select max(line) from tb2 where groupID=Z and line<X and level=(Y-1))
For example, to find the parent of node L:
select * from tb2 where groupID=1 and line=(select max(line) from tb2 where groupID=1 and line<11 and level=3)

4. Query all ancestors of the specified node. The line number of the node is X, its level is Y, and its groupID is Z:
select * from tb2 where groupID=Z and line=(select max(line) from tb2 where groupID=Z and line<X and level=(Y-1))
union select * from tb2 where groupID=Z and line=(select max(line) from tb2 where groupID=Z and line<X and level=(Y-2))
...(Y-1 branches in total)
union select * from tb2 where groupID=Z and line=(select max(line) from tb2 where groupID=Z and line<X and level=1)
For example, to find all ancestors of node I:
select * from tb2 where groupID=1 and line=(select max(line) from tb2 where groupID=1 and line<12 and level=2)
union select * from tb2 where groupID=1 and line=(select max(line) from tb2 where groupID=1 and line<12 and level=1)
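The ancestor query has one union branch per level above the node, so the statement can be assembled from the node's level. A minimal sketch under the same assumptions as before (Python with sqlite3, a hypothetical tb2 table reconstructed from the examples, ancestors_sql as an illustrative name):

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, level), plus END.
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3),
         ("END", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tb2 "
             "(groupid integer, line integer, id text, level integer)")
conn.executemany("insert into tb2 values (1, ?, ?, ?)",
                 [(line, node_id, level)
                  for line, (node_id, level) in enumerate(NODES, start=1)])

def ancestors_sql(y):
    """Union of Y-1 branches: levels Y-1 down to 1 (requires y > 1)."""
    branch = ("select id, level from tb2 where groupid=:z and line="
              "(select max(line) from tb2 "
              "where groupid=:z and line<:x and level={lvl})")
    branches = [branch.format(lvl=lvl) for lvl in range(y - 1, 0, -1)]
    return " union ".join(branches) + " order by level"

# I is at line 12, level 3; L is at line 11, level 4.
ancestors_of_i = [r[0] for r in conn.execute(ancestors_sql(3), {"x": 12, "z": 1})]
ancestors_of_l = [r[0] for r in conn.execute(ancestors_sql(4), {"x": 11, "z": 1})]
print(ancestors_of_i, ancestors_of_l)
```

Only the level numbers are formatted into the SQL text; the line number and groupid are passed as bound parameters.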

5. Insert a new node. For example, to insert a new node T between J and K:
update tb2 set line=line+1 where groupID=1 and line>=10;
insert into tb2 (groupid,line,id,level) values (1,10,'T',4);

Summary:
The advantages of this method are:
1) The tree has unlimited depth.
2) Although it lacks the WYSIWYG effect of the first scheme, it is still intuitive, easy to understand, and easy to debug.
3) It makes full use of SQL. Queries, deletions, and insertions are convenient, the SQL is much simpler than in the first scheme, and no LIKE pattern matching is needed.
4) Only one table is required.
5) Compatible with all databases.
6) Small storage footprint.

The disadvantages are:
1) Moving a whole subtree is somewhat troublesome, so the method suits cases where nodes are mostly added or removed, such as forum posts and comments. When complex node moves really are necessary, one solution is to load the whole tree into memory, perform the moves and reordering there, then delete the entire old group and insert the new group into the database in a single batch.

Added on January 22, 2017:
Moving nodes is somewhat troublesome compared with querying, deleting, and inserting, but that does not mean it is difficult. For example, in MySQL, moving the entire subtree rooted at B under node H, between J and K, takes the following statements:
update tb2 set tempno=line*1000000 where groupid=1;  
set @nextNodeLine=(select min(line) from tb2 where groupid=1 and line>2 and level<=2);  
update tb2 set tempno=9*1000000+line, level=level+2 where groupID=1 and line>=2 and line< @nextNodeLine;  
set @mycnt=0;  
update tb2 set line=(@mycnt := @mycnt + 1) where groupid=1 order by tempno;

The example above requires an extra integer column named tempno in the table. This is a lazy algorithm: it is simple and clear, but it renumbers the entire tree, so it is not efficient. Where nodes must be moved frequently, the Adjacency List scheme may be more appropriate.
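For readers without a MySQL instance at hand, the same move can be sketched with SQLite. This is a port under stated assumptions, not the original script: SQLite has no @mycnt user variables, so the final renumbering is done from Python, and the sample table, move_subtree, and renumber are hypothetical names and data reconstructed from the examples.

```python
import sqlite3

# Hypothetical sample tree in depth-first order, as (id, level), plus END.
NODES = [("A", 1), ("B", 2), ("E", 3), ("F", 3), ("C", 2), ("G", 3),
         ("D", 2), ("H", 3), ("J", 4), ("K", 4), ("L", 4), ("I", 3),
         ("END", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table tb2 (groupid integer, line integer, id text, "
             "level integer, tempno integer)")
conn.executemany("insert into tb2 values (1, ?, ?, ?, null)",
                 [(line, node_id, level)
                  for line, (node_id, level) in enumerate(NODES, start=1)])

def renumber(conn, gid):
    """Stand-in for the @mycnt counter: line = 1, 2, 3, ... in tempno order."""
    rowids = [r[0] for r in conn.execute(
        "select rowid from tb2 where groupid=? order by tempno", (gid,))]
    conn.executemany("update tb2 set line=? where rowid=?",
                     [(n, rid) for n, rid in enumerate(rowids, start=1)])

def move_subtree(conn, gid, src_line, src_level, after_line, level_shift):
    """Move the subtree starting at src_line so it follows the row at after_line."""
    # Spread the sort keys out so the moved rows fit between two existing rows.
    conn.execute("update tb2 set tempno=line*1000000 where groupid=?", (gid,))
    next_line = conn.execute(
        "select min(line) from tb2 where groupid=? and line>? and level<=?",
        (gid, src_line, src_level)).fetchone()[0]
    conn.execute(
        "update tb2 set tempno=?*1000000+line, level=level+? "
        "where groupid=? and line>=? and line<?",
        (after_line, level_shift, gid, src_line, next_line))
    renumber(conn, gid)

# Move B (line 2, level 2) under H, right after J (line 9): B becomes level 4.
move_subtree(conn, 1, src_line=2, src_level=2, after_line=9, level_shift=2)
result = conn.execute(
    "select id, line, level from tb2 where groupid=1 order by line").fetchall()
print(result)
```

The sketch places the subtree directly after the row at after_line, which works here because J is a leaf; placing it after a node that has children would need the end of that node's subtree instead.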

Further addition on January 22:
If you need to move nodes frequently but want to keep the efficient queries of scheme 2, another option is to add a parent-node field pid plus two auxiliary sorting fields, tempno and temporder. I am temporarily calling this the "Depth Tree V3.0 method"; it is equivalent to merging the V2.0 method with the Adjacency List mode. The advantage is that moving a node only requires changing its pid, with no complex algorithm. You can move, add, and delete any number of nodes in one go, and then run the algorithm below once to renumber the rows. The following example demonstrates the complete conversion of an Adjacency List table into V2.0 form, which amounts to building a query index:



create table tb3 (
id varchar(10),
comments varchar(55),
pid varchar(10),
line integer,
level integer,
tempno bigint,
temporder integer
);

insert into tb3 (id,comments,Pid) values('A','found a bug',null);
insert into tb3 (id,comments,Pid) values('B','is a worm','A');
insert into tb3 (id,comments,Pid) values('C','no','A');
insert into tb3 (id,comments,Pid) values('D','is a bug','A');
insert into tb3 (id,comments,Pid) values('E','oh, a bug','B');
insert into tb3 (id,comments,Pid) values('F','solve it','B');
insert into tb3 (id,comments,Pid) values('G','careful it bites','C');
insert into tb3 (id,comments,Pid) values('H','it does not bite','D');
insert into tb3 (id,comments,Pid) values('I','found the reason','D');
insert into tb3 (id,comments,Pid) values('J','solved','H');
insert into tb3 (id,comments,Pid) values('K','uploaded','H');
insert into tb3 (id,comments,Pid) values('L','well done!','H');

set @mycnt=0;
update tb3 set  line=0,level=0, tempno=0, temporder=(@mycnt := @mycnt + 1) order by id;
update tb3 set level=1, line=1 where pid is null;

update tb3 set tempno=line*10000000 where line>0;
update tb3 a, tb3 b set a.level=2, a.tempno=b.tempno+a.temporder where a.level=0 and a.pid=b.id and b.level=1;
set @mycnt=0;
update tb3 set line=(@mycnt := @mycnt + 1) where level>0 order by tempno;

update tb3 set tempno=line*10000000 where line>0;
update tb3 a, tb3 b set a.level=3, a.tempno=b.tempno+a.temporder where a.level=0 and a.pid=b.id and b.level=2;
set @mycnt=0;
update tb3 set line=(@mycnt := @mycnt + 1) where level>0 order by tempno;

update tb3 set tempno=line*10000000 where line>0;
update tb3 a, tb3 b set a.level=4, a.tempno=b.tempno+a.temporder where a.level=0 and a.pid=b.id and b.level=3;
set @mycnt=0;
update tb3 set line=(@mycnt := @mycnt + 1) where level>0 order by tempno;


The algorithm above exploits SQL's set-based operations to turn what would otherwise be a recursive series of queries into a limited number of SQL passes, bounded by the maximum depth of the tree. To keep the focus on the algorithm, the example assumes a single root node and leaves out the groupid column and the END marker; these details need to be filled in for real use. The order by id can also be changed to order by some other field. For lack of time I am not giving the reverse algorithm, from V2.0 form back to the Adjacency List mode (that is, filling in an empty pid from the line and level values of a V2.0 table). That algorithm is not important anyway, because each row in a V3.0 table normally always keeps its pid.
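The three repeated passes above can be folded into a loop that stops when a pass places no new rows. Below is a minimal sketch of that idea, assuming Python with sqlite3 rather than MySQL: the multi-table update is rewritten as a correlated subquery, the @mycnt counter is replaced by numbering from Python, and the comments column is left out. The names rebuild_index and renumber are hypothetical.

```python
import sqlite3

# Same parent links as the tb3 inserts above, as (id, pid).
EDGES = [("A", None), ("B", "A"), ("C", "A"), ("D", "A"), ("E", "B"),
         ("F", "B"), ("G", "C"), ("H", "D"), ("I", "D"), ("J", "H"),
         ("K", "H"), ("L", "H")]

conn = sqlite3.connect(":memory:")
conn.execute("create table tb3 (id text, pid text, line integer, "
             "level integer, tempno integer, temporder integer)")
conn.executemany("insert into tb3 (id, pid) values (?, ?)", EDGES)

def renumber(conn):
    """Assign line = 1, 2, 3, ... to the placed rows in tempno order."""
    rowids = [r[0] for r in conn.execute(
        "select rowid from tb3 where level>0 order by tempno")]
    conn.executemany("update tb3 set line=? where rowid=?",
                     [(n, rid) for n, rid in enumerate(rowids, start=1)])

def rebuild_index(conn):
    """Recompute line and level from pid; returns the depth of the tree."""
    rowids = [r[0] for r in conn.execute("select rowid from tb3 order by id")]
    # temporder fixes the order of siblings (assumes fewer than 10000000 rows)
    conn.executemany(
        "update tb3 set line=0, level=0, tempno=0, temporder=? where rowid=?",
        [(n, rid) for n, rid in enumerate(rowids, start=1)])
    conn.execute("update tb3 set level=1, line=1 where pid is null")
    depth = 1
    while True:
        conn.execute("update tb3 set tempno=line*10000000 where line>0")
        # Place every unplaced row whose parent was placed in the previous pass.
        cur = conn.execute(
            "update tb3 set level=?, "
            "tempno=(select p.tempno from tb3 p where p.id=tb3.pid)+temporder "
            "where level=0 and pid in (select id from tb3 where level=?)",
            (depth + 1, depth))
        if cur.rowcount == 0:
            return depth
        renumber(conn)
        depth += 1

depth = rebuild_index(conn)
rows = conn.execute("select id, line, level from tb3 order by line").fetchall()

# Moving the subtree of B under H only needs a pid change and a rebuild.
conn.execute("update tb3 set pid='H' where id='B'")
moved_depth = rebuild_index(conn)
moved = [r[0] for r in conn.execute("select id from tb3 order by line")]
print(rows, moved)
```

In the second run siblings are ordered by id, so B lands before J under H; ordering by another field would change that.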
To sum up:
Adjacency List mode: moving, adding, and deleting nodes is easy; querying is inconvenient.
Depth Tree V2.0 mode: querying is easy, and adding and deleting nodes is easy though not very efficient; moving nodes is inconvenient.
Depth Tree V3.0 mode: moving, adding, and deleting nodes is convenient, and so is querying. It combines the two modes above and can switch between them (modification mode and query mode) at any time as needed. The V3.0 method is equivalent to designing a query index for the Adjacency List mode.
