Use ltree to process hierarchical data in PostgreSQL

Author: Alexei Vasiliev (Alexey Vasiliev)

Translator: Lei Yanliang, working at Henco Foundation Software Co., Ltd., PostgreSQL database technology enthusiast, PostgreSQL ACE, PGCM, 10g & 11g OCM, OGG certified expert.

Original address: https://leopard.in.ua/2013/09/02/postgresql-ltree#.YEhtc2gzaUk

In this article, we will learn how to use PostgreSQL's ltree module, which allows data to be stored in a hierarchical tree structure.

What is ltree?

Ltree is a PostgreSQL module. It implements a data type ltree, which is used to represent tags of data stored in a hierarchical tree structure. Provides extensive tools for searching the tag tree.

Why choose ltree?

  • ltree implements a materialized path, which is very fast for INSERT / UPDATE / DELETE, but faster for SELECT operations
  • Generally, it is faster than using recursive CTEs or recursive functions that often require branch recalculation
  • Such as built-in query syntax and operators dedicated to query and navigation tree
  • index! ! !

Initial data

First, you should enable extensions in the database. You can do this with the following command:

CREATE EXTENSION ltree;

Let's create the table and add some data to it:

CREATE TABLE comments (user_id integer, description text, path ltree);
INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001.0001.0001.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 5, md5(random()::text), '0001.0001.0001.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 8, md5(random()::text), '0001.0003.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 11, md5(random()::text), '0001.0003.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0003.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 5, md5(random()::text), '0001.0003.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 7, md5(random()::text), '0001.0003.0002.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 20, md5(random()::text), '0001.0003.0002.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 31, md5(random()::text), '0001.0003.0002.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 22, md5(random()::text), '0001.0003.0002.0002.0004');
INSERT INTO comments (user_id, description, path) VALUES ( 34, md5(random()::text), '0001.0003.0002.0002.0005');
INSERT INTO comments (user_id, description, path) VALUES ( 22, md5(random()::text), '0001.0003.0002.0002.0006');

In addition, we should add some indexes:

CREATE INDEX path_gist_comments_idx ON comments USING GIST(path);
CREATE INDEX path_comments_idx ON comments USING btree(path);

As you can see, I created the comments table with a path field, which contains all the paths of the tree in the table. As you can see, for the tree separator, I use 4 numbers and dots.

Let us find the record whose path starts with '0001.0003' in the commenets table:

$ SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)

Let's check this SQL through the EXPLAIN command:

$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..1.24 rows=2 width=38) (actual time=0.013..0.017 rows=12 loops=1)
   Filter: (path <@ '0001.0003'::ltree)
   Rows Removed by Filter: 7
 Total runtime: 0.038 ms
(4 rows)

Let's disable seq scan for testing:

$ SET enable_seqscan=false;
SET
$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using path_gist_comments_idx on comments  (cost=0.00..8.29 rows=2 width=38) (actual time=0.023..0.034 rows=12 loops=1)
   Index Cond: (path <@ '0001.0003'::ltree)
 Total runtime: 0.076 ms
(3 rows)

Now SQL is slow, but you can see how SQL uses index.
The first SQL statement uses sequence scan because there is not much data in the table.

We can change the implementation method of select "path <@ '0001.0003'":

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)

You should not forget the order of the data, as in the following example:

$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0001');
$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0002');
$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0003');
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
(15 rows)

Now to sort:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)

You can add several modifiers to the end of the non-asterisk tag of lquery to make it more matching than an exact match:
"@"-case-insensitive matching, for example a @match A
"*"-match anything with this prefix Tag, for example foo* matches foobar
"%"-matches words starting with an underscore

$ SELECT user_id, path FROM comments WHERE path ~ '0001.*{1,2}.0001|0002.*' ORDER by path;
 user_id |           path
---------+--------------------------
       2 | 0001.0001.0001
       2 | 0001.0001.0001.0001
       1 | 0001.0001.0001.0002
       5 | 0001.0001.0001.0003
       6 | 0001.0002.0001
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(19 rows)

Let's find all direct children for parent '0001.0003', see below:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*{1}' ORDER by path;
 user_id |      path
---------+----------------
       8 | 0001.0003.0001
       9 | 0001.0003.0002
(2 rows)

Find all children for parent '0001.0003', see below:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)

Find the parent for children '0001.0003.0002.0002.0005':

$ SELECT user_id, path FROM comments WHERE path = subpath('0001.0003.0002.0002.0005', 0, -1) ORDER by path;
 user_id |        path
---------+---------------------
       2 | 0001.0003.0002.0002
(1 row)

If your path is not unique, you will get multiple records.

Overview

It can be seen that the materialization path using ltree is very simple. In this article, I did not list all possible uses of ltree. It is not regarded as a full text search problem with ltxtquery. But you can find it in the official PostgreSQL documentation ( http://www.postgresql.org/docs/current/static/ltree.html ).

To learn more about PostgreSQL hotspot information, news trends, and exciting activities, please visit the official website of PostgreSQL in China: www.postgresqlchina.com

To solve more PostgreSQL-related knowledge, technology, and work problems, please visit the official PostgreSQL Q&A community in China: www.pgfans.cn

To download more PostgreSQL-related information, tools, and plug-in issues, please visit the official PostgreSQL download site in China: www.postgreshub.cn

Guess you like

Origin blog.csdn.net/weixin_46199817/article/details/114819040