PostgreSQL performance optimization: combining ORDER BY and LIMIT

1 Constructing test data

create table tbl(id int, num int, arr int[]); 
create index idx_tbl_arr on tbl using gin (arr); 
create or replace function gen_rand_arr() returns int[] as $$  
  select array(select (1000*random())::int from generate_series(1,64));  
$$ language sql strict;


insert into tbl select generate_series(1,3000000),(10000*random())::int, gen_rand_arr();

insert into tbl select generate_series(1,500), (10000*random())::int, array[350,514,213,219,528,753,270,321,413,424,524,435,546,765,234,345,131,345,351];

2 Querying through the GIN index

GIN index queries are normally very fast. In production, however, a query can still be slow even when it uses the GIN index. The characteristic symptom is that Bitmap Heap Scan takes up most of the time in the execution plan, and most of the blocks marked by Bitmap Index Scan end up being filtered out.

This situation is common. With an ordinary btree index you can use CLUSTER to reorganize the table data, but CLUSTER does not support GIN indexes, and the columns indexed with GIN are typically array types. When the matching rows are scattered across the table, the bitmap index scan marks a large number of blocks and the subsequent recheck is expensive, which makes the GIN index query slow.
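One way to check for this symptom is to add the BUFFERS option to EXPLAIN ANALYZE. The following is a minimal sketch against the test table above. In the Bitmap Heap Scan node, the lines to look at are Heap Blocks and, when rows are discarded, Rows Removed by Index Recheck.

```sql
-- Sketch: BUFFERS adds shared-buffer hit/read counts to each plan node,
-- which shows how many pages the Bitmap Heap Scan had to touch.
explain (analyze, buffers)
select * from tbl
where arr @> array[350,514,213,219,528,753,270]
order by num desc
limit 20;
```

If the number of heap blocks visited is large compared with the number of rows finally returned, the query is likely paying mostly for heap access and recheck rather than for the index lookup itself.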

Consider this example:

explain analyze select * from tbl where arr @> array[350,514,213,219,528,753,270] order by num desc limit 20;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2152.02..2152.03 rows=1 width=40) (actual time=57.665..57.668 rows=20 loops=1)
   ->  Sort  (cost=2152.02..2152.03 rows=1 width=40) (actual time=57.664..57.665 rows=20 loops=1)
         Sort Key: num
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Bitmap Heap Scan on tbl  (cost=2148.00..2152.01 rows=1 width=40) (actual time=57.308..57.581 rows=505 loops=1)
               Recheck Cond: (arr @> '{350,514,213,219,528,753,270}'::integer[])
               Heap Blocks: exact=493
               ->  Bitmap Index Scan on idx_tbl_arr  (cost=0.00..2148.00 rows=1 width=0) (actual time=57.248..57.248 rows=505 loops=1)
                     Index Cond: (arr @> '{350,514,213,219,528,753,270}'::integer[])
 Planning time: 0.050 ms
 Execution time: 57.710 ms

The current execution plan relies on the GIN index scan. If the GIN index has performance problems, how can we optimize the query?

3 Optimizing the ORDER BY + LIMIT combination

SQL that combines sorting with LIMIT is a classic candidate for index optimization. Btree index entries are stored in sorted order, so traversing the btree index yields sorted results directly. Combined with LIMIT, only part of the btree needs to be traversed, with the remaining conditions applied as a filter on each row visited.

Let's look at the optimization:

create index idx_tbl_num on tbl(num);
analyze tbl;

set enable_seqscan = off;
set enable_bitmapscan = off;


postgres=# explain analyze select * from tbl where arr @> array[350,514,213,219,528,753,270] order by num desc limit 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..571469.93 rows=1 width=287) (actual time=6.300..173.949 rows=10 loops=1)
   ->  Index Scan Backward using idx_tbl_num on tbl  (cost=0.43..571469.93 rows=1 width=287) (actual time=6.299..173.943 rows=10 loops=1)
         Filter: (arr @> '{350,514,213,219,528,753,270}'::integer[])
         Rows Removed by Filter: 38399
 Planning time: 0.125 ms
 Execution time: 173.972 ms
(6 rows)

Time: 174.615 ms
postgres=# cluster tbl using idx_tbl_num;
CLUSTER
Time: 124340.276 ms
postgres=# explain analyze select * from tbl where arr @> array[350,514,213,219,528,753,270] order by num desc limit 10;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..563539.77 rows=1 width=287) (actual time=1.145..34.602 rows=10 loops=1)
   ->  Index Scan Backward using idx_tbl_num on tbl  (cost=0.43..563539.77 rows=1 width=287) (actual time=1.144..34.601 rows=10 loops=1)
         Filter: (arr @> '{350,514,213,219,528,753,270}'::integer[])
         Rows Removed by Filter: 38399
 Planning time: 0.206 ms
 Execution time: 34.627 ms
(6 rows)

The test scenario constructed here may not show the problem at its worst, but you can see that after clustering the table on the btree index, the query runs stably at about 34 ms.

When GIN performance is a problem, for LIMIT + ORDER BY statements like this one it is worth forcing the planner (for example with pg_hint_plan) to use the btree index. The results may be surprisingly good.
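As a rough sketch of what forcing the plan with pg_hint_plan could look like, assuming the extension is installed on the server (it is not part of core PostgreSQL), the hint goes in a block comment at the head of the statement:

```sql
-- Assumption: the pg_hint_plan library is installed and loadable in this session.
load 'pg_hint_plan';

-- IndexScan(table index) asks the planner to scan tbl through idx_tbl_num.
/*+ IndexScan(tbl idx_tbl_num) */
explain analyze
select * from tbl
where arr @> array[350,514,213,219,528,753,270]
order by num desc
limit 10;
```

Compared with setting enable_bitmapscan = off for the whole session, a hint only affects the statement that carries it.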

4 GIN index query performance degradation under high-concurrency writes

GIN is PostgreSQL's inverted index for multi-valued types. A single row may involve many GIN index keys, so if the index were merged in real time on every write, IO would increase sharply and write response time (RT) would rise. To increase write throughput, PostgreSQL lets the user enable delayed merging for a GIN index. Once it is enabled, data is first written to a pending list instead of directly to the index pages. When the pending list reaches a certain size, or when autovacuum processes the table, the pending list is merged into the index.
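The delayed merge described above is controlled by GIN storage parameters. A minimal sketch for the test index, assuming PostgreSQL 9.5 or later (where gin_pending_list_limit exists); the limit value is illustrative only, not a recommendation:

```sql
-- fastupdate turns the pending list on or off for this index.
alter index idx_tbl_arr set (fastupdate = on);

-- Per-index cap on the pending list size, in kilobytes (illustrative value).
alter index idx_tbl_arr set (gin_pending_list_limit = 4096);

-- Default used by GIN indexes that have no per-index setting.
show gin_pending_list_limit;
```

A smaller limit keeps the pending list short at the cost of more frequent merges during writes. Setting fastupdate = off removes the pending list for new writes entirely.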

At query time, if the pending list contains entries that have not yet been merged into the index, the query must scan the pending list in addition to the index.

If the write volume is high, the pending list can grow very large. When merging (done by the autovacuum worker) cannot keep pace, queries that use the GIN index slow down.

create extension pageinspect ; 
SELECT * FROM gin_metapage_info(get_raw_page('idx_tbl_arr', 0));  

-- If many records are in the pending list, query performance drops noticeably.
-- Vacuum the table to force a merge of the pending list.
vacuum tbl; 
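To focus on the pending list only, the two relevant columns of the metapage can be selected directly. As a hedged alternative to a full vacuum, and assuming PostgreSQL 9.6 or later, the pending list of a single index can also be flushed with gin_clean_pending_list:

```sql
-- Requires the pageinspect extension created above.
select n_pending_pages, n_pending_tuples
from gin_metapage_info(get_raw_page('idx_tbl_arr', 0));

-- Merge the pending list of this one index without vacuuming the whole table.
select gin_clean_pending_list('idx_tbl_arr'::regclass);
```

Running the first query again after the cleanup should show whether the pending list was actually drained.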

Reference for section 4: https://github.com/digoal/blog/blob/master/201809/20180919_02.md

Origin blog.csdn.net/jackgo73/article/details/89683098