Why Is a PostgreSQL Query Still Slow Even Though It Uses a GIN Index?

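In PostgreSQL, a GIN index is typically used to search for individual values inside a multi-valued type, such as an array or a full-text search document.
Yet even on a multi-valued column, a query that goes through a GIN index can sometimes turn out to be slow.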

Example:
Create a table and build a GIN index on an array column.

bill=# create table t1(id int, info int[]);  
CREATE TABLE
bill=# insert into t1 select generate_series(1,10000),array[1,2,3,4,5]; 
INSERT 0 10000
bill=# create index idx_t1 on t1 using gin(info); 
CREATE INDEX

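Next, run a matching query on the array column so that the index is used (every row matches here, so seqscan has to be disabled to make the planner choose the index).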

bill=# set enable_seqscan TO off;
SET
bill=# explain analyze select * from t1 where info  && array [1] ; 
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=84.00..303.00 rows=10000 width=45) (actual time=0.903..2.027 rows=10000 loops=1)
   Recheck Cond: (info && '{1}'::integer[])
   Heap Blocks: exact=94
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..81.50 rows=10000 width=0) (actual time=0.884..0.884 rows=10000 loops=1)
         Index Cond: (info && '{1}'::integer[])
 Planning Time: 0.166 ms
 Execution Time: 2.509 ms
(7 rows)

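The plan uses a Bitmap Index Scan. Because the array being matched corresponds to 10,000 rows, the TIDs (row locations) of all 10,000 rows are first collected into a bitmap, which orders them by heap page, and only then is the heap scanned to fetch the rows.
This is why a GIN lookup can be slow: GIN currently supports only bitmap index scans, so a query always has to pull out every matching TID and put them in page order before it reads anything from the heap.

Even with LIMIT 1, where only one row is returned, that step cannot be skipped, so the index-side cost is not small. In the plan below, the Bitmap Index Scan still reports rows=10000 and takes about 0.89 ms of the 0.94 ms total execution time.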

bill=# explain analyze select * from t1 where info  && array [1] limit 1;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=84.00..84.02 rows=1 width=45) (actual time=0.913..0.914 rows=1 loops=1)
   ->  Bitmap Heap Scan on t1  (cost=84.00..303.00 rows=10000 width=45) (actual time=0.912..0.912 rows=1 loops=1)
         Recheck Cond: (info && '{1}'::integer[])
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_t1  (cost=0.00..81.50 rows=10000 width=0) (actual time=0.891..0.891 rows=10000 loops=1)
               Index Cond: (info && '{1}'::integer[])
 Planning Time: 0.115 ms
 Execution Time: 0.941 ms
(8 rows)
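
One knob that relates directly to this behavior is the `gin_fuzzy_search_limit` setting, which places a soft upper limit on the size of the set a GIN index scan returns. The following is only a sketch, assuming the `t1` table and `idx_t1` index created above; the value 100 is an arbitrary choice for illustration. Note that with this setting the scan returns a random subset of the matches, so it is only suitable when an incomplete result is acceptable.

```sql
-- Sketch only: cap the number of TIDs a GIN scan collects (soft limit).
-- The value 100 is an assumption chosen for illustration.
set gin_fuzzy_search_limit = 100;

-- Re-run the LIMIT 1 query and compare the rows reported by the
-- Bitmap Index Scan node with the earlier plan.
explain analyze select * from t1 where info && array[1] limit 1;

-- Restore the default (0 means no limit).
reset gin_fuzzy_search_limit;
```

Whether this helps depends on the workload; for queries that need every matching row it should be left at its default.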

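That is why the GIN index is slow in this case.
This does not mean GIN is a bad index. It is built for exactly this kind of multi-valued type, and multi-valued data such as arrays tends to contain a large number of repeated values.
For example, suppose I need to find three elements, 1, 2 and 3, in a table of 100,000 rows. The ctids for 1, 2 and 3 are likely to fall on many of the same heap pages, so a bitmap index scan, which visits each heap page only once, greatly reduces scattered reads.
So a GIN index works remarkably well when a query has to fetch a large amount of heap data that is stored in scattered locations.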
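
The multi-element case described above can be tried out with a sketch like the following. The table `t2`, the index `idx_t2`, the value range 0..1000 and the three-element arrays are all assumptions made up for illustration; they are not from the original test.

```sql
-- Hypothetical table: 100,000 rows, each with a 3-element array
-- of random values in the range 0..1000.
create table t2(id int, info int[]);

insert into t2
select g,
       array[(random() * 1000)::int,
             (random() * 1000)::int,
             (random() * 1000)::int]
from generate_series(1, 100000) g;

create index idx_t2 on t2 using gin(info);
analyze t2;

-- Rows containing 1, 2 or 3 are spread across the heap; in the plan,
-- "Heap Blocks" shows how many distinct pages were actually visited.
explain (analyze, buffers)
select * from t2 where info && array[1, 2, 3];
```

Comparing the row count with the "Heap Blocks" figure in the output gives a rough idea of how many matches share a page and are therefore fetched with a single page visit.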

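Summary:
When a GIN index fits:
GIN suits cases where the condition on the indexed column narrows the result down to a very small set. If it cannot, or if a btree index can already narrow the result to a small range, a btree is enough, because a btree supports both index scans and bitmap index scans.
So if only a few rows are fetched and the database's shared buffers are large enough, there is no need for a bitmap index scan at all; it brings little benefit.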
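
To see the btree side of this comparison, a minimal sketch is shown here, assuming the `t1` table from the example above. The index name `idx_t1_id` and the lookup value 42 are hypothetical.

```sql
-- Hypothetical btree index on the scalar id column of t1.
create index idx_t1_id on t1 using btree(id);
analyze t1;

-- A selective lookup: the planner is free to use a plain Index Scan,
-- which can return the first matching row without building a bitmap.
explain analyze select * from t1 where id = 42 limit 1;
```

The plan the planner actually picks depends on statistics and settings, so the node type and timings should be checked in the output rather than assumed.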

foucus、
