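Performance comparison of PostgreSQL btree and bloom indexes when the query condition is on a single column

1. Experiment environment

Hardware: one ordinary PC with 8 GB of memory and 200 GB of total disk space.

Operating system: CentOS 6.9

Software: PostgreSQL 10, default configuration.

2. Preparation

Log in to PostgreSQL and first create the table testindex, which has four columns: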

postgres=# create table testindex

postgres-# (

postgres(#     col_int int,

postgres(#     col_float float,

postgres(#     col_varchar_1 varchar(64),

postgres(#     col_varchar_2 varchar(64)

postgres(# );

CREATE TABLE

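Next, insert 100,000 random rows into this table; the columns col_varchar_1 and col_varchar_2 hold UUIDs. Before inserting the data, create two extensions: pgcrypto provides the UUID generation function, and bloom is needed for the bloom indexes created below:

postgres=# create extension if not exists pgcrypto;

CREATE EXTENSION

postgres=# create extension if not exists bloom;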

CREATE EXTENSION

postgres=# insert into testindex(col_int, col_float, col_varchar_1, col_varchar_2)

postgres-# select (100000 * random())::int, 100000 * random(), gen_random_uuid(), gen_random_uuid() from generate_series(1,100000);

INSERT 0 100000

Check how unique the data in col_varchar_1 and col_varchar_2 is by counting the distinct values:

postgres=# select count(distinct col_varchar_1),count(distinct col_varchar_2) from testindex;

The result is:

count | count

--------+--------

100000 | 100000

(1 row)

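There are no duplicate values in col_varchar_1 or col_varchar_2.

Now we come to the main topic.

Write an equality query (1) whose condition column is col_varchar_1, and examine it with explain analyze: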

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';

The result is as follows:

                                            QUERY PLAN

---------------------------------------------------------------------------------------------------

Seq Scan on testindex (cost=0.00..2789.00 rows=1 width=86) (actual time=6.902..13.586 rows=1 loops=1)

   Filter: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)

   Rows Removed by Filter: 99999

Planning time: 0.059 ms

Execution time: 13.610 ms

(5 rows)
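The estimated cost 2789.00 of this sequential scan can be related to the planner's cost parameters. The query below is only a sketch: it assumes the default cost settings and that the table statistics are current (hence the analyze), and it recomputes the basic sequential-scan formula from pg_class. With 100000 rows and the defaults, the per-row part is 100000 × (0.01 + 0.0025) = 1250, so a total of 2789.00 would correspond to 1539 table pages. That page count is inferred from the plan, not measured.

```sql
-- Refresh relpages/reltuples so pg_class reflects the freshly loaded table.
analyze testindex;

-- Sequential scan estimate: one seq_page_cost per page, plus per-row CPU cost
-- for fetching the tuple and evaluating the single equality operator.
select relpages,
       reltuples,
       relpages * current_setting('seq_page_cost')::float8
       + reltuples * ( current_setting('cpu_tuple_cost')::float8
                     + current_setting('cpu_operator_cost')::float8 ) as seqscan_cost_estimate
from pg_class
where relname = 'testindex';
```

If seqscan_cost_estimate differs from the cost shown by explain, the cost parameters or the statistics on that machine differ from the assumptions above.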

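3. Experiment steps

3.1. The query condition is a single value on a single column

Now we create three indexes, one at a time: a b-tree index on col_varchar_1, a bloom index on col_varchar_1, and a combined bloom index on col_varchar_1 and col_varchar_2. Each time we query with the same col_varchar_1 value and look at the explain analyze output.

Create the b-tree index on col_varchar_1: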

postgres=# create index idx_col_varchar_1_btree on testindex using btree (col_varchar_1);

CREATE INDEX

Run explain analyze on query (1); the result is as follows:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

Index Scan using idx_col_varchar_1_btree on testindex (cost=0.42..8.44 rows=1 width=86) (actual time=0.039..0.039 rows=1 loops=1)

Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)

Planning time: 0.298 ms

Execution time: 0.064 ms

(4 rows)

Drop idx_col_varchar_1_btree and create a bloom index on col_varchar_1:

postgres=# drop index if exists idx_col_varchar_1_btree;

DROP INDEX

postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);

CREATE INDEX

Run explain analyze on query (1); the result is as follows:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';

                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on testindex (cost=1542.00..1546.01 rows=1 width=86) (actual time=0.831..0.846 rows=1 loops=1)

   Recheck Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)

   Rows Removed by Index Recheck: 31

   Heap Blocks: exact=32

   -> Bitmap Index Scan on idx_col_varchar_1_bloom (cost=0.00..1542.00 rows=1 width=0) (actual time=0.803..0.803 rows=32 loops=1)

         Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)

Planning time: 0.142 ms

Execution time: 0.877 ms

(8 rows)

Drop idx_col_varchar_1_bloom and create a combined bloom index on col_varchar_1 and col_varchar_2:

postgres=# drop index if exists idx_col_varchar_1_bloom;

DROP INDEX

postgres=# create index idx_col_varchar_1_col_varchar_2_bloom on testindex using bloom (col_varchar_1, col_varchar_2);

CREATE INDEX

Run explain analyze on query (1); the result is as follows:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';

                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on testindex (cost=1542.00..1546.01 rows=1 width=86) (actual time=0.924..0.998 rows=1 loops=1)

   Recheck Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)

   Rows Removed by Index Recheck: 182

   Heap Blocks: exact=171

   -> Bitmap Index Scan on idx_col_varchar_1_col_varchar_2_bloom (cost=0.00..1542.00 rows=1 width=0) (actual time=0.815..0.815 rows=183 loops=1)

         Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)

Planning time: 0.123 ms

Execution time: 1.027 ms

(8 rows)
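The "Rows Removed by Index Recheck" counts in the two bloom plans (31 and 182) can be roughly reproduced with a back-of-the-envelope estimate. The sketch below assumes the bloom extension's documented defaults (an 80-bit signature and 2 bits per indexed column) and, as an idealization that the measurements do not state, that bit positions are independent and uniformly distributed. Under these assumptions a row is a false positive when both bits of the queried value are set in its signature, and the symbols m, n, N and P_n are introduced here only for illustration.

```latex
% Assumptions: signature length m = 80 bits, n bits set per row,
% N = 100000 rows, two distinct query bits, uniform independent positions.
% Inclusion-exclusion over the two query bits:
P_n \approx 1 - 2\left(1 - \frac{1}{m}\right)^{n} + \left(1 - \frac{2}{m}\right)^{n}

% Single-column index: n = 2 bits per row.
P_2 = \frac{2}{80^2} = \frac{1}{3200},
\qquad N \cdot P_2 \approx \frac{100000}{3200} \approx 31

% Two-column index: n = 4 bits per row (2 per column).
P_4 = 1 - 2\left(\frac{79}{80}\right)^{4} + \left(\frac{78}{80}\right)^{4} \approx 0.00183,
\qquad N \cdot P_4 \approx 183
```

Both estimates are close to the observed 31 and 182 rechecked rows, which suggests that the extra heap fetches of the two-column index come from its denser signatures rather than from the data itself.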

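The estimated costs of the query above with each kind of index are:

Index type	No index	Single-column btree index	Single-column bloom index	Multi-column bloom index
Estimated query cost	2789.00	8.44	1546.01	1546.01

Judging from the planner's estimates, for an equality query on a single varchar column the performance order (best first) is: single-column b-tree index > single-column bloom index = multi-column bloom index > sequential scan. The measured execution times (0.064 ms, 0.877 ms, 1.027 ms and 13.610 ms) follow the same order, with the multi-column bloom index slightly slower than the single-column one.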
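The bloom extension allows the signature size to be tuned at index creation through the length and col1 ... col32 storage parameters documented for the module. The statements below are an untested sketch: the index name idx_col_varchar_1_bloom_wide and the chosen values are assumptions made for illustration, and whether a longer signature actually helps this workload would have to be measured.

```sql
-- Hypothetical variant: a longer signature with more bits per column should
-- lower the false-positive rate at the price of a larger index.
create index idx_col_varchar_1_bloom_wide on testindex
    using bloom (col_varchar_1)
    with (length = 160, col1 = 4);

-- Compare the on-disk size of the table with that of the index.
select pg_size_pretty(pg_relation_size('testindex'))                    as table_size,
       pg_size_pretty(pg_relation_size('idx_col_varchar_1_bloom_wide')) as index_size;

-- Clean up so that the following steps are not affected by this extra index.
drop index if exists idx_col_varchar_1_bloom_wide;
```

Rerunning query (1) with explain analyze before the final drop would show whether "Rows Removed by Index Recheck" goes down.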

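3.2. The query condition is a known set of values on a single column

What if the query condition is a set of values on a single column?

Drop the index created earlier: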

postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;

DROP INDEX

Analyze the following query (2), whose condition is a set of 10 values on a single column:

postgres=# explain analyze

postgres-# select * from testindex where col_varchar_1 in

postgres-# (

postgres(# '8c8b0314-23e4-48da-abdf-997260db183a',

postgres(# 'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',

postgres(# 'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',

postgres(# '130c971d-25f7-432e-a649-51ca6d3277f7',

postgres(# '02aded54-5b45-4919-9162-490fd56ee240',

postgres(# '5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',

postgres(# '3ad5ce4d-7a9f-415f-981e-0d8892522e0a',

postgres(# '4174bcca-4f0d-407a-abc2-31a556773560',

postgres(# 'abe57875-8a3c-4273-9b46-8a5e18ce491c',

postgres(# '13c87623-86c7-44d8-8a77-39144a154238'

postgres(# );

                                                                                      QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Seq Scan on testindex (cost=0.00..3789.00 rows=10 width=86) (actual time=28.207..56.852 rows=10 loops=1)

   Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

   Rows Removed by Filter: 99990

Planning time: 0.254 ms

Execution time: 56.891 ms

(5 rows)
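The jump of the sequential-scan estimate from 2789.00 for query (1) to 3789.00 here can be checked by hand. The arithmetic below is a sketch under several assumptions: default cost parameters (seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025), a table of 1539 pages (the value implied by 2789.00 = pages + 100000 × 0.0125 for query (1), not a measured one), and the planner charging an = ANY (...) test as if about half of the 10 array elements were compared per row.

```latex
% Assumed: 1539 pages, 100000 rows, 10-element array, half of it compared per row.
\text{cost} \approx 1539 \cdot 1.0
  + 100000 \cdot 0.01
  + 100000 \cdot 0.0025 \cdot (0.5 \cdot 10)
  = 1539 + 1000 + 1250 = 3789
```

Under these assumptions the extra 1000 cost units come entirely from the larger number of comparisons per row; the number of pages read is unchanged.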

Create the b-tree index on col_varchar_1 and run explain analyze on query (2); the result is as follows:

postgres=# create index idx_col_varchar_1_btree on testindex using btree (col_varchar_1);

CREATE INDEX

postgres=# explain analyze

select * from testindex where col_varchar_1 in

(

'8c8b0314-23e4-48da-abdf-997260db183a',

'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',

'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',

'130c971d-25f7-432e-a649-51ca6d3277f7',

'02aded54-5b45-4919-9162-490fd56ee240',

'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',

'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',

'4174bcca-4f0d-407a-abc2-31a556773560',

'abe57875-8a3c-4273-9b46-8a5e18ce491c',

'13c87623-86c7-44d8-8a77-39144a154238'

);



                                                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on testindex (cost=44.25..82.06 rows=10 width=86) (actual time=0.147..0.148 rows=10 loops=1)

   Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

   Heap Blocks: exact=1

   -> Bitmap Index Scan on idx_col_varchar_1_btree (cost=0.00..44.25 rows=10 width=0) (actual time=0.143..0.143 rows=10 loops=1)

         Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

Planning time: 0.182 ms

Execution time: 0.276 ms

(7 rows)

Drop the index above, create a bloom index on col_varchar_1, and run explain analyze on query (2); the result is as follows:

postgres=# drop index if exists idx_col_varchar_1_btree;

DROP INDEX

postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);

CREATE INDEX

postgres=# explain analyze

postgres-# select * from testindex where col_varchar_1 in

postgres-# (

postgres(# '8c8b0314-23e4-48da-abdf-997260db183a',

postgres(# 'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',

postgres(# 'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',

postgres(# '130c971d-25f7-432e-a649-51ca6d3277f7',

postgres(# '02aded54-5b45-4919-9162-490fd56ee240',

postgres(# '5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',

postgres(# '3ad5ce4d-7a9f-415f-981e-0d8892522e0a',

postgres(# '4174bcca-4f0d-407a-abc2-31a556773560',

postgres(# 'abe57875-8a3c-4273-9b46-8a5e18ce491c',

postgres(# '13c87623-86c7-44d8-8a77-39144a154238'

postgres(# );



                                                                                      QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Seq Scan on testindex (cost=0.00..3789.00 rows=10 width=86) (actual time=28.207..56.852 rows=10 loops=1)

   Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

   Rows Removed by Filter: 99990

Planning time: 0.201 ms

Execution time: 55.487 ms

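Something odd happens: the planner does not choose an index scan but a sequential scan instead.

After dropping idx_col_varchar_1_bloom and creating a combined bloom index on col_varchar_1 and col_varchar_2, rerunning (2) shows that the query again chooses a sequential scan.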

postgres=# drop index if exists idx_col_varchar_1_bloom;

DROP INDEX

postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;

DROP INDEX

postgres=# create index idx_col_varchar_1_col_varchar_2_bloom on testindex using bloom (col_varchar_1, col_varchar_2);

CREATE INDEX

postgres=# explain analyze

select * from testindex where col_varchar_1 in

(

'8c8b0314-23e4-48da-abdf-997260db183a',

'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',

'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',

'130c971d-25f7-432e-a649-51ca6d3277f7',

'02aded54-5b45-4919-9162-490fd56ee240',

'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',

'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',

'4174bcca-4f0d-407a-abc2-31a556773560',

'abe57875-8a3c-4273-9b46-8a5e18ce491c',

'13c87623-86c7-44d8-8a77-39144a154238'

);



                                                                                      QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Seq Scan on testindex (cost=0.00..3789.00 rows=10 width=86) (actual time=26.604..53.837 rows=10 loops=1)

   Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

   Rows Removed by Filter: 99990

Planning time: 0.125 ms

Execution time: 53.855 ms

(5 rows)

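The conclusion is that when the table holds many rows with very few duplicates and the query condition is a small set of values on a single varchar column, a b-tree index scan is faster than a sequential scan, whereas the planner prefers a sequential scan over a bloom index scan.

What if we force the query to use an index scan?

Run the following command: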

postgres=# set enable_seqscan = off;

SET
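Because set enable_seqscan = off stays in effect for the rest of the session, one way to limit its scope is set local inside a transaction. The statements below are a sketch that uses only two of the UUIDs from query (2) for brevity. Note also that this setting does not forbid sequential scans; it only makes the planner treat them as very expensive.

```sql
begin;

-- set local is reverted automatically at commit or rollback.
set local enable_seqscan = off;

explain analyze
select * from testindex
where col_varchar_1 in ('8c8b0314-23e4-48da-abdf-997260db183a',
                        'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd');

rollback;

-- When the setting was changed at session level instead, restore the default with:
reset enable_seqscan;
```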

This is the query plan using the combined bloom index:

postgres=# explain analyze

select * from testindex where col_varchar_1 in

(

'8c8b0314-23e4-48da-abdf-997260db183a',

'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',

'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',

'130c971d-25f7-432e-a649-51ca6d3277f7',

'02aded54-5b45-4919-9162-490fd56ee240',

'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',

'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',

'4174bcca-4f0d-407a-abc2-31a556773560',

'abe57875-8a3c-4273-9b46-8a5e18ce491c',

'13c87623-86c7-44d8-8a77-39144a154238'

);



                                                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on testindex (cost=8292.00..8329.81 rows=10 width=86) (actual time=6.944..7.953 rows=10 loops=1)

   Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

   Rows Removed by Index Recheck: 1776

   Heap Blocks: exact=1075

   -> Bitmap Index Scan on idx_col_varchar_1_col_varchar_2_bloom (cost=0.00..8292.00 rows=10 width=0) (actual time=5.794..5.794 rows=1798 loops=1)

         Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

Planning time: 0.203 ms

Execution time: 7.984 ms

This is the query plan using the single-column bloom index:

postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;

DROP INDEX

postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);

CREATE INDEX

postgres=# explain analyze

select * from testindex where col_varchar_1 in

(

'8c8b0314-23e4-48da-abdf-997260db183a',

'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',

'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',

'130c971d-25f7-432e-a649-51ca6d3277f7',

'02aded54-5b45-4919-9162-490fd56ee240',

'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',

'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',

'4174bcca-4f0d-407a-abc2-31a556773560',

'abe57875-8a3c-4273-9b46-8a5e18ce491c',

'13c87623-86c7-44d8-8a77-39144a154238'

);



                                                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on testindex (cost=8292.00..8329.81 rows=10 width=86) (actual time=6.250..6.441 rows=10 loops=1)

   Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

   Rows Removed by Index Recheck: 316

   Heap Blocks: exact=284

   -> Bitmap Index Scan on idx_col_varchar_1_bloom (cost=0.00..8292.00 rows=10 width=0) (actual time=5.921..5.921 rows=326 loops=1)

         Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))

Planning time: 0.209 ms

Execution time: 6.467 ms

(8 rows)

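The estimated costs of the query above with each kind of index are:

Index type	No index	Single-column btree index	Single-column bloom index	Multi-column bloom index
Estimated query cost	3789.00	82.06	8329.81	8329.81

As can be seen, when the table holds many rows with very few duplicates and the query condition is a set of values on a single varchar column, the planner estimates a sequential scan to be cheaper than a bloom index scan, even when the set is small. Note, however, that in the runs above the forced bloom index scans actually finished sooner (6.467 ms and 7.984 ms) than the sequential scans (about 54 to 57 ms), so this ordering holds for the estimated cost, not for the measured execution time.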