PostgreSQL Multicolumn Indexes

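Multicolumn indexes are a very common feature in real applications. For example, if columns c1, c2 and c3 of a table are frequently used together in queries, anyone with a little experience will create a composite index on those three columns to speed the queries up. But when using multicolumn indexes, have you ever thought about the following questions?
Which index types support multicolumn indexes? How should the columns of a multicolumn index be chosen? Does the same multicolumn index perform equally well in every scenario?
Let's first look at which index types in PostgreSQL support multicolumn indexes. Currently the B-tree, GiST, GIN and BRIN index methods do. A multicolumn index can have at most 32 columns, although this limit (INDEX_MAX_KEYS) can be changed in pg_config_manual.h, after which PostgreSQL has to be recompiled.
Since B-tree, GiST, GIN and BRIN all support multicolumn indexes but have different internal structures, do their multicolumn indexes also differ in structure and in efficiency?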
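
The list of supporting access methods can also be checked on a running server. The query below is a minimal sketch that assumes PostgreSQL 9.6 or later, where pg_indexam_has_property() and the can_multi_col property are available. On a stock installation it should report true for btree, gist, gin and brin.

```sql
-- List the index access methods and whether each one supports multicolumn indexes.
SELECT amname,
       pg_indexam_has_property(oid, 'can_multi_col') AS can_multi_col
FROM pg_am
WHERE amtype = 'i'   -- index access methods only
ORDER BY amname;
```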

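B-tree:
Many people, even those not very familiar with PostgreSQL, know that although a multicolumn B-tree index can be used with query conditions on any subset of the index's columns, it is most efficient when there are conditions on the leading (leftmost) column. But why is that? The answer lies in which part of a multicolumn B-tree index a query has to scan. Let's look at a few examples first: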

bill=# create table t1(c1 int ,c2 int,c3 int);
CREATE TABLE
bill=# create index idx_t1 on t1 using btree(c1,c2,c3);
CREATE INDEX
bill=# insert into t1 select random()*100,random()*100,random()*100 from generate_series(1,1000000);
INSERT 0 1000000

1. Query condition c1=10 and c2>=40 and c3 < 80:

bill=# explain (analyze ,buffers) select * from t1 where c1=10 and c2>=40 and c3 < 80; 
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=110.15..3588.30 rows=4497 width=12) (actual time=1.192..4.998 rows=4777 loops=1)
   Recheck Cond: ((c1 = 10) AND (c2 >= 40) AND (c3 < 80))
   Heap Blocks: exact=3170
   Buffers: shared hit=3201
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..109.03 rows=4497 width=0) (actual time=0.773..0.773 rows=4777 loops=1)
         Index Cond: ((c1 = 10) AND (c2 >= 40) AND (c3 < 80))
         Buffers: shared hit=31
 Planning Time: 0.190 ms
 Execution Time: 5.254 ms
(9 rows)

2. Query condition c2>=40 and c3 < 80:

bill=# explain (analyze ,buffers) select * from t1 where  c2>=40 and c3 < 80;          
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..20406.00 rows=481775 width=12) (actual time=0.017..125.647 rows=480822 loops=1)
   Filter: ((c2 >= 40) AND (c3 < 80))
   Rows Removed by Filter: 519178
   Buffers: shared hit=5406
 Planning Time: 0.129 ms
 Execution Time: 148.011 ms
(6 rows)

3. Query condition c1=10 and c3 < 80:

bill=# explain (analyze ,buffers) select * from t1 where c1=10 and c3 < 80;            
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=156.70..4790.57 rows=7399 width=12) (actual time=1.930..8.622 rows=7991 loops=1)
   Recheck Cond: ((c1 = 10) AND (c3 < 80))
   Heap Blocks: exact=4162
   Buffers: shared hit=4215
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..154.86 rows=7399 width=0) (actual time=1.317..1.317 rows=7991 loops=1)
         Index Cond: ((c1 = 10) AND (c3 < 80))
         Buffers: shared hit=53
 Planning Time: 0.115 ms
 Execution Time: 9.078 ms
(9 rows)

4. Query condition c1>=10 and c2>=40 and c3 < 80:

bill=# explain (analyze ,buffers) select * from t1 where c1>=10 and c2>=40 and c3 < 80; 
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..22906.00 rows=436552 width=12) (actual time=0.014..131.837 rows=434814 loops=1)
   Filter: ((c1 >= 10) AND (c2 >= 40) AND (c3 < 80))
   Rows Removed by Filter: 565186
   Buffers: shared hit=5406
 Planning Time: 0.190 ms
 Execution Time: 151.981 ms
(6 rows)

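Careful readers may have noticed that the number of blocks read differs with the query conditions, and that queries with a condition on c1 (the leading column) seem to read fewer blocks: 3201 and 4215 shared buffers for queries 1 and 3, against 5406 for the sequential scans, with the index scan itself touching only 31 and 53 buffers. That is indeed the case. On closer inspection, though, the last query behaves differently: it has a condition on c1 but still ends up as a Seq Scan. Why?
The official documentation explains it as follows: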
The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.

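In summary:
1. No condition on the leading column: the query has to scan all index entries;
2. Condition on the leading column: going through the index columns in order, the equality conditions on the leading columns, plus the inequality condition on the first column that has no equality condition, limit the part of the index that is scanned. Conditions on columns to the right of that one are still checked in the index, which saves visits to the table, but they do not reduce the part of the index that has to be scanned.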
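
To see the rule at work, here is a small sketch against the t1 table and idx_t1 index created above (the constant 40 in the first query is only an illustrative pick). With equality on both c1 and c2, the range condition on c3 becomes the "first column that does not have an equality constraint", so all three conditions should bound the scanned part of the index. Comparing the Buffers line of the index scan node between the two plans should show the difference, though the exact numbers depend on your data.

```sql
-- Equality on c1 and c2, inequality on c3:
-- every condition narrows the range of the index that is scanned.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c1 = 10 AND c2 = 40 AND c3 < 80;

-- Equality on c1 only, inequality on c2 (query 1 from above):
-- c3 < 80 is checked in the index but does not reduce the scanned portion.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c1 = 10 AND c2 >= 40 AND c3 < 80;
```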

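That means, for the four queries above:
1. Query condition c1=10 and c2>=40 and c3 < 80: the index entries from c1=10, c2=40 up to the last entry with c1=10 are scanned; c3 < 80 is checked in the index but does not narrow the scanned range;
2. Query condition c2>=40 and c3 < 80: all index entries would have to be scanned, which is why the planner chose a Seq Scan instead;
3. Query condition c1=10 and c3 < 80: all index entries with c1=10 are scanned;
4. Query condition c1>=10 and c2>=40 and c3 < 80: all index entries from c1=10 to the end of the index would have to be scanned. Since c1 is spread over 0 to 100, that is roughly 90% of the index, so the planner chose a Seq Scan here as well.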
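
Because queries 2 and 4 ran as Seq Scans, the plans above never show what the index would have cost. The following is a sketch for observing that, assuming a throwaway session on the same t1 table. SET LOCAL enable_seqscan = off only discourages sequential scans for the current transaction; the planner is still free to pick whichever index plan it considers cheapest, so the resulting plan shape is not guaranteed.

```sql
BEGIN;
-- Discourage Seq Scan for this transaction only; reverted by ROLLBACK.
SET LOCAL enable_seqscan = off;

-- Query 2: no condition on the leading column c1.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c2 >= 40 AND c3 < 80;

-- Query 4: only an inequality on the leading column c1.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c1 >= 10 AND c2 >= 40 AND c3 < 80;

ROLLBACK;
```

If the index is used, the Buffers figure of the index scan node can be compared with the 31 and 53 buffers reported for queries 1 and 3.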

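GiST:
A multicolumn GiST index can be used with query conditions on any subset of the index's columns. Unlike B-tree, it is the condition on the leading column that mainly determines how many index entries have to be scanned; conditions on the other columns restrict the entries returned by the index but do little to reduce the amount scanned (whereas in a B-tree, the columns after the leading one can also narrow the scanned range). A GiST index is therefore relatively ineffective if its first column has only a few distinct values.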
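
A sketch for trying this on the t1 table from the B-tree section. It assumes the btree_gist contrib module is installed, since plain integer columns have no built-in GiST operator class, and idx_t1_gist is a hypothetical index name. The planner may still prefer the existing B-tree index idx_t1 for some of these queries.

```sql
-- Assumption: the btree_gist contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical index name; same column order as idx_t1.
CREATE INDEX idx_t1_gist ON t1 USING gist (c1, c2, c3);

-- Condition on the leading column c1.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c1 = 10 AND c2 >= 40;

-- No condition on the leading column.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c2 = 10 AND c3 >= 40;
```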

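GIN:
A multicolumn GIN index can be used with query conditions on any subset of the index's columns. Unlike B-tree or GiST, search effectiveness is the same regardless of which index column(s) the query conditions use.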
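
A similar sketch for GIN on the t1 table from the B-tree section. It assumes the btree_gin contrib module is installed, because GIN has no built-in operator class for plain integers, and idx_t1_gin is a hypothetical index name. GIN indexes are only used through bitmap scans, so look for a Bitmap Index Scan node in the plans.

```sql
-- Assumption: the btree_gin contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Hypothetical index name.
CREATE INDEX idx_t1_gin ON t1 USING gin (c1, c2, c3);

-- Condition on the first index column only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c1 = 10;

-- Condition on the last index column only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c3 = 10;
```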

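BRIN:
A multicolumn BRIN index can be used with query conditions on any subset of the index's columns. Like GIN, and unlike B-tree or GiST, search effectiveness is the same regardless of which index column(s) the query conditions use.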
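
A sketch for BRIN on the t1 table from the B-tree section, where idx_t1_brin is a hypothetical index name and pages_per_range = 32 is an arbitrary illustrative value (the default is 128). Integer columns work with the built-in minmax operator class, so no extension is needed. Note that t1 was filled with random values, so each block range is likely to cover almost the whole value domain and the index will probably exclude very few blocks; BRIN pays off when column values correlate with the physical row order.

```sql
-- Hypothetical index name; pages_per_range chosen only for illustration.
CREATE INDEX idx_t1_brin ON t1 USING brin (c1, c2, c3)
  WITH (pages_per_range = 32);

-- Condition on the first index column only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c1 = 10;

-- Condition on the last index column only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE c3 = 10;
```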
