Performance comparison of PostgreSQL btree and bloom indexes for queries that filter on a single column

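1. Experimental environment

Hardware: an ordinary PC with 8 GB of RAM and 200 GB of total disk space.

Operating system: CentOS 6.9

Software: PostgreSQL 10 with the default configuration.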

2. Preparation

Log in to PostgreSQL and first create the table testindex, which has four columns:

postgres=# create table testindex
postgres-# (
postgres(#     col_int int,
postgres(#     col_float float,
postgres(#     col_varchar_1 varchar(64),
postgres(#     col_varchar_2 varchar(64)
postgres(# );
CREATE TABLE

Now insert 100,000 rows of random data into the table. Columns col_varchar_1 and col_varchar_2 hold UUIDs. Before inserting, two extensions are needed: pgcrypto provides the UUID generator gen_random_uuid(), and bloom provides the bloom index access method we will use later:

postgres=# create extension if not exists pgcrypto;
CREATE EXTENSION
postgres=# create extension if not exists bloom;
CREATE EXTENSION

postgres=# insert into testindex(col_int, col_float, col_varchar_1, col_varchar_2)
postgres-# select (100000 * random())::int, 100000 * random(), gen_random_uuid(), gen_random_uuid() from generate_series(1,100000);
INSERT 0 100000

 

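Let's check how unique the values in col_varchar_1 and col_varchar_2 are:

postgres=# select count(col_varchar_1),count(col_varchar_2) from testindex;

The result is:

 count  | count
--------+--------
 100000 | 100000
(1 row)

Both columns hold 100,000 values. Strictly speaking, count(col) only counts non-NULL values, so this query alone does not prove uniqueness. count(distinct col_varchar_1) would prove it. However, gen_random_uuid() generates random version-4 UUIDs, so duplicates among 100,000 of them are practically impossible. We can therefore treat col_varchar_1 and col_varchar_2 as having no duplicate values.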
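The claim that the UUID columns contain no duplicates can be sanity-checked with the birthday bound. A version-4 UUID, the kind gen_random_uuid() returns, has 122 random bits. The sketch below applies the standard approximation for the chance of at least one duplicate among n such values. It is a back-of-the-envelope check only; the original experiment did not run it.

```python
import math

# A version-4 UUID (what gen_random_uuid() returns) has 122 random bits;
# the remaining 6 bits encode the version and variant.
UUID4_RANDOM_BITS = 122


def duplicate_probability(n: int, bits: int = UUID4_RANDOM_BITS) -> float:
    """Birthday-bound approximation of P(at least one duplicate among n values):
    1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n * (n - 1) / 2.0 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-exponent)


p = duplicate_probability(100_000)
print(f"P(duplicate among 100,000 UUIDs) ~ {p:.1e}")  # prints ... ~ 9.4e-28
```

A probability on the order of 10^-27 is why the uniqueness of these columns can be taken for granted.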

 

Now for the main topic.

Write an equality query (1) whose condition is on col_varchar_1, and analyze it with EXPLAIN ANALYZE:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';

The result is as follows:

                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on testindex  (cost=0.00..2789.00 rows=1 width=86) (actual time=6.902..13.586 rows=1 loops=1)
   Filter: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
   Rows Removed by Filter: 99999
 Planning time: 0.059 ms
 Execution time: 13.610 ms
(5 rows)

3. Experimental steps

3.1. The query condition is a single value on a single column

We now create three indexes, one at a time:

- a b-tree index on col_varchar_1
- a bloom index on col_varchar_1
- a multi-column bloom index on col_varchar_1 and col_varchar_2

Each time, we query with the same col_varchar_1 value and look at the plan the planner produces.

Create a b-tree index on col_varchar_1:

postgres=# create index idx_col_varchar_1_btree on testindex using btree (col_varchar_1);
CREATE INDEX

Run EXPLAIN ANALYZE on query (1) again. The result is:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';

                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_col_varchar_1_btree on testindex  (cost=0.42..8.44 rows=1 width=86) (actual time=0.039..0.039 rows=1 loops=1)
   Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
 Planning time: 0.298 ms
 Execution time: 0.064 ms
(4 rows)

 

Drop idx_col_varchar_1_btree and create a bloom index on col_varchar_1:

postgres=# drop index if exists idx_col_varchar_1_btree;
DROP INDEX

postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);
CREATE INDEX

Run query (1). The result is:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';
                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testindex  (cost=1542.00..1546.01 rows=1 width=86) (actual time=0.831..0.846 rows=1 loops=1)
   Recheck Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
   Rows Removed by Index Recheck: 31
   Heap Blocks: exact=32
   ->  Bitmap Index Scan on idx_col_varchar_1_bloom  (cost=0.00..1542.00 rows=1 width=0) (actual time=0.803..0.803 rows=32 loops=1)
         Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
 Planning time: 0.142 ms
 Execution time: 0.877 ms
(8 rows)

 

Drop idx_col_varchar_1_bloom and create a multi-column bloom index on col_varchar_1 and col_varchar_2:

postgres=# drop index if exists idx_col_varchar_1_bloom;
DROP INDEX
postgres=# create index idx_col_varchar_1_col_varchar_2_bloom on testindex using bloom (col_varchar_1, col_varchar_2);
CREATE INDEX

Run query (1). The result is:

postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testindex  (cost=1542.00..1546.01 rows=1 width=86) (actual time=0.924..0.998 rows=1 loops=1)
   Recheck Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
   Rows Removed by Index Recheck: 182
   Heap Blocks: exact=171
   ->  Bitmap Index Scan on idx_col_varchar_1_col_varchar_2_bloom  (cost=0.00..1542.00 rows=1 width=0) (actual time=0.815..0.815 rows=183 loops=1)
         Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
 Planning time: 0.123 ms
 Execution time: 1.027 ms
(8 rows)

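Here are the planner's estimated costs for the query above with each kind of index:

Index type                   | Estimated query cost
-----------------------------+----------------------
No index (sequential scan)   | 2789.00
Single-column btree index    | 8.44
Single-column bloom index    | 1546.01
Multi-column bloom index     | 1546.01

Judging by the planner's estimates, the ranking for an equality query on a single varchar column is: single-column b-tree index > single-column bloom index = multi-column bloom index > sequential scan. The measured execution times agree:

- b-tree: 0.064 ms
- single-column bloom: 0.877 ms
- multi-column bloom: 1.027 ms
- sequential scan: 13.610 ms

The two bloom indexes are not quite equal in practice, though. The multi-column one returned more false positives that had to be rechecked against the table (182 rows versus 31), and it ran slightly slower.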
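Why does the multi-column bloom index remove more rows on recheck (182 versus 31)? A bloom index stores one fixed-size bit signature per row, and each indexed column's value sets a few bits in it. A lookup has to test every signature and keep the rows whose signature contains all the bits of the search value. Those candidates are then rechecked against the table.

The Python sketch below imitates this idea using the bloom extension's default settings: 80-bit signatures and 2 bits per column. Some parts are hypothetical stand-ins, not bloom's actual implementation:

- the hash function (blake2b over column number, bit index and value)
- the random hex strings used in place of UUIDs

```python
import hashlib
import random

SIGNATURE_BITS = 80   # bloom's default signature length
BITS_PER_COLUMN = 2   # bloom's default bits per indexed column


def column_bits(col_no: int, value: str) -> set:
    """Hypothetical hashing: derive BITS_PER_COLUMN bit positions from
    (column number, bit index, value). Positions may collide."""
    bits = set()
    for j in range(BITS_PER_COLUMN):
        digest = hashlib.blake2b(f"{col_no}/{j}/{value}".encode(), digest_size=8).digest()
        bits.add(int.from_bytes(digest, "big") % SIGNATURE_BITS)
    return bits


def to_mask(bits) -> int:
    """Pack bit positions into an integer signature."""
    mask = 0
    for b in bits:
        mask |= 1 << b
    return mask


def build_index(rows, n_cols: int) -> list:
    """One signature per row, covering the row's first n_cols columns."""
    return [to_mask(set().union(*(column_bits(c, row[c]) for c in range(n_cols))))
            for row in rows]


def lookup(rows, index, value: str):
    """Equality on column 0: test every signature, then recheck the
    candidates against the real row data (the 'heap')."""
    query = to_mask(column_bits(0, value))
    candidates = [i for i, sig in enumerate(index) if (sig & query) == query]
    hits = [i for i in candidates if rows[i][0] == value]
    return len(candidates), len(hits)


rng = random.Random(0)
rows = [(f"{rng.getrandbits(128):032x}", f"{rng.getrandbits(128):032x}")
        for _ in range(100_000)]
target = rows[12_345][0]

for n_cols in (1, 2):
    candidates, hits = lookup(rows, build_index(rows, n_cols), target)
    print(f"{n_cols}-column signature: {candidates} candidates, "
          f"{candidates - hits} removed by recheck")
```

Each row's two-column signature is a superset of its one-column signature. So for a lookup on col_varchar_1, the multi-column index can only return the same number of candidates or more. That matches the larger recheck count observed above.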

 

3.2. The query condition is a known set of values on a single column

What if the query condition is a set of values on a single column?

Drop the index created earlier:

postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;
DROP INDEX

Analyze the following query (2), whose condition is a set of 10 values on a single column:

postgres=# explain analyze
postgres-# select * from testindex where col_varchar_1 in
postgres-#  (
postgres(# '8c8b0314-23e4-48da-abdf-997260db183a',
postgres(# 'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
postgres(# 'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
postgres(# '130c971d-25f7-432e-a649-51ca6d3277f7',
postgres(# '02aded54-5b45-4919-9162-490fd56ee240',
postgres(# '5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
postgres(# '3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
postgres(# '4174bcca-4f0d-407a-abc2-31a556773560',
postgres(# 'abe57875-8a3c-4273-9b46-8a5e18ce491c',
postgres(# '13c87623-86c7-44d8-8a77-39144a154238'
postgres(# );
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on testindex  (cost=0.00..3789.00 rows=10 width=86) (actual time=28.207..56.852 rows=10 loops=1)
   Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
   Rows Removed by Filter: 99990
 Planning time: 0.254 ms
 Execution time: 56.891 ms
(5 rows)
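Why did the sequential scan's estimate rise from 2789.00 for query (1) to 3789.00 for query (2)? Below is a rough reconstruction. It assumes the default cost settings: seq_page_cost = 1.0, cpu_tuple_cost = 0.01 and cpu_operator_cost = 0.0025.

A sequential scan costs one unit per heap page plus, for each row, cpu_tuple_cost and the cost of the filter. PostgreSQL turns the IN list into = ANY(array). As I understand its costing code, it charges the comparison operator for about half of the array elements per row.

The heap page count is not shown in the article. It is inferred here from the first plan, so treat it as an assumption.

```python
SEQ_PAGE_COST = 1.0         # PostgreSQL default
CPU_TUPLE_COST = 0.01       # PostgreSQL default
CPU_OPERATOR_COST = 0.0025  # PostgreSQL default
ROWS = 100_000


def seqscan_cost(pages: float, rows: int, operator_calls_per_row: float) -> float:
    """Total cost of a sequential scan whose filter is evaluated on every row."""
    per_row = CPU_TUPLE_COST + operator_calls_per_row * CPU_OPERATOR_COST
    return pages * SEQ_PAGE_COST + rows * per_row


# Query (1): one '=' per row with estimated total cost 2789.00 -> infer heap pages.
pages = (2789.00 - ROWS * (CPU_TUPLE_COST + CPU_OPERATOR_COST)) / SEQ_PAGE_COST

# Query (2): '= ANY(10-element array)', charged as ~10 * 0.5 operator calls per row.
in_list_cost = seqscan_cost(pages, ROWS, 10 * 0.5)

print(round(pages, 2), round(in_list_cost, 2))  # prints 1539.0 3789.0
```

The page cost stays the same; only the per-row filter cost grows. That is the extra 1000 cost units seen in the plan for query (2).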

Create a b-tree index on col_varchar_1 and analyze query (2). The result is:

postgres=# create index idx_col_varchar_1_btree on testindex using btree (col_varchar_1);
CREATE INDEX

postgres=# explain analyze
select * from testindex where col_varchar_1 in
 (
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testindex  (cost=44.25..82.06 rows=10 width=86) (actual time=0.147..0.148 rows=10 loops=1)
   Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_col_varchar_1_btree  (cost=0.00..44.25 rows=10 width=0) (actual time=0.143..0.143 rows=10 loops=1)
         Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
 Planning time: 0.182 ms
 Execution time: 0.276 ms
(7 rows)

Drop the index above, create a bloom index on col_varchar_1, and analyze query (2). The result is:

postgres=# drop index if exists idx_col_varchar_1_btree;
DROP INDEX
postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);
CREATE INDEX

postgres=# explain analyze
postgres-# select * from testindex where col_varchar_1 in
postgres-#  (
postgres(# '8c8b0314-23e4-48da-abdf-997260db183a',
postgres(# 'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
postgres(# 'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
postgres(# '130c971d-25f7-432e-a649-51ca6d3277f7',
postgres(# '02aded54-5b45-4919-9162-490fd56ee240',
postgres(# '5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
postgres(# '3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
postgres(# '4174bcca-4f0d-407a-abc2-31a556773560',
postgres(# 'abe57875-8a3c-4273-9b46-8a5e18ce491c',
postgres(# '13c87623-86c7-44d8-8a77-39144a154238'
postgres(# );
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on testindex  (cost=0.00..3789.00 rows=10 width=86) (actual time=28.207..56.852 rows=10 loops=1)
   Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
   Rows Removed by Filter: 99990
 Planning time: 0.201 ms
 Execution time: 55.487 ms

 

Something strange happened: the planner did not use the bloom index at all and chose a sequential scan instead.

Likewise, drop idx_col_varchar_1_bloom, create the multi-column bloom index on col_varchar_1 and col_varchar_2, and rerun query (2). The planner again chooses a sequential scan:

postgres=# drop index if exists idx_col_varchar_1_bloom;
DROP INDEX
postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;
DROP INDEX
postgres=# create index idx_col_varchar_1_col_varchar_2_bloom on testindex using bloom (col_varchar_1, col_varchar_2);
CREATE INDEX
postgres=# explain analyze
select * from testindex where col_varchar_1 in
 (
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on testindex  (cost=0.00..3789.00 rows=10 width=86) (actual time=26.604..53.837 rows=10 loops=1)
   Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
   Rows Removed by Filter: 99990
 Planning time: 0.125 ms
 Execution time: 53.855 ms
(5 rows)

 

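This suggests a rule for tables with many rows and very few duplicates, when the condition is a small set of values on a single varchar column. A b-tree index (used here through a bitmap scan) is much faster than a sequential scan: 0.276 ms versus 56.891 ms. When only a bloom index is available, the planner prefers a sequential scan.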

 

What if we force the query to use an index?

Run the following command:

postgres=# set enable_seqscan = off;
SET

Strictly speaking, enable_seqscan = off does not forbid sequential scans. It makes the planner treat them as extremely expensive, so any usable index path is chosen instead.

Here is the plan using the multi-column (composite) bloom index:

postgres=# explain analyze
select * from testindex where col_varchar_1 in
 (
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testindex  (cost=8292.00..8329.81 rows=10 width=86) (actual time=6.944..7.953 rows=10 loops=1)
   Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
   Rows Removed by Index Recheck: 1776
   Heap Blocks: exact=1075
   ->  Bitmap Index Scan on idx_col_varchar_1_col_varchar_2_bloom  (cost=0.00..8292.00 rows=10 width=0) (actual time=5.794..5.794 rows=1798 loops=1)
         Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
 Planning time: 0.203 ms
 Execution time: 7.984 ms

 

Here is the plan using the single-column bloom index:

postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;
DROP INDEX
postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);
CREATE INDEX

postgres=# explain analyze
select * from testindex where col_varchar_1 in
 (
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testindex  (cost=8292.00..8329.81 rows=10 width=86) (actual time=6.250..6.441 rows=10 loops=1)
   Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
   Rows Removed by Index Recheck: 316
   Heap Blocks: exact=284
   ->  Bitmap Index Scan on idx_col_varchar_1_bloom  (cost=0.00..8292.00 rows=10 width=0) (actual time=5.921..5.921 rows=326 loops=1)
         Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
 Planning time: 0.209 ms
 Execution time: 6.467 ms
(8 rows)
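The recheck counts in these bloom plans can be estimated with a simple model. The model is my own approximation, not a formula from the bloom documentation. It assumes bloom's defaults (80-bit signatures, 2 bits per column) and, as a simplification, that the bits set for a row never collide.

A non-matching row is returned by the index when both bits of a search value happen to be among the bits set in its signature. For an IN list, each value gets its own chance to let the row through.

```python
from math import comb

SIGNATURE_BITS = 80   # bloom default signature length
BITS_PER_COLUMN = 2   # bloom default bits per column


def expected_false_positives(non_matching_rows: int, indexed_columns: int,
                             search_values: int = 1) -> float:
    """Expected number of non-matching rows a bloom index returns for an
    equality / IN condition on ONE indexed column (assumes no bit collisions)."""
    bits_per_row = BITS_PER_COLUMN * indexed_columns
    # probability that both bits of one search value are set in a row's signature
    p_one = comb(bits_per_row, BITS_PER_COLUMN) / comb(SIGNATURE_BITS, BITS_PER_COLUMN)
    # probability that at least one of the search values passes the signature test
    p_any = 1 - (1 - p_one) ** search_values
    return non_matching_rows * p_any


for cols in (1, 2):
    single = expected_false_positives(99_999, cols)
    ten = expected_false_positives(99_990, cols, 10)
    print(f"{cols}-column bloom: ~{single:.0f} for one value, ~{ten:.0f} for ten values")
```

For the single-column index, the model gives about 32 and 316. That is close to the 31 and 316 rows removed by recheck in the plans.

For the two-column index, it gives about 190 and 1882 against the observed 182 and 1776, somewhat high. A plausible reason is that the four bits a row sets sometimes collide. Either way, the model shows why a bloom index that carries an extra column lets more false positives through for a lookup on only one of its columns.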

 

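Here are the planner's estimated costs for query (2) with each kind of index (the two bloom figures were obtained with enable_seqscan = off):

Index type                   | Estimated query cost
-----------------------------+----------------------
No index (sequential scan)   | 3789.00
Single-column btree index    | 82.06
Single-column bloom index    | 8329.81
Multi-column bloom index     | 8329.81

So consider a table with many rows and very few duplicates, queried with a set of values on a single varchar column. Even for a small set, the planner estimates a bloom index scan to be much more expensive than a sequential scan, and therefore picks the sequential scan.

The measured times in this experiment point the other way, however. With enable_seqscan = off, the bloom plans ran in 7.984 ms (multi-column) and 6.467 ms (single-column). The sequential scans took about 54 to 57 ms. So here the planner's cost estimate for the bloom index was pessimistic.

The b-tree index remained the best choice by a wide margin, both in estimated cost (82.06) and in execution time (0.276 ms).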
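Where does the bloom index scan estimate of 8292.00 come from? The sketch below is a reconstruction, not something shown in the article. It assumes that:

- bloom hands PostgreSQL's generic index cost estimate the whole index as the number of tuples to visit;
- the settings are the defaults: random_page_cost = 4.0, cpu_index_tuple_cost = 0.005 and cpu_operator_cost = 0.0025;
- for a 10-value IN list, the CPU work is charged once per value, while repeated reads of the small index are treated as cached, so its pages are paid for only once.

The index size of 198 pages is inferred from the single-value estimate of 1542.00, not measured.

```python
RANDOM_PAGE_COST = 4.0         # PostgreSQL default
CPU_INDEX_TUPLE_COST = 0.005   # PostgreSQL default
CPU_OPERATOR_COST = 0.0025     # PostgreSQL default
INDEX_TUPLES = 100_000         # one bloom signature per table row

# CPU charge for visiting one index tuple and evaluating the single index qual
PER_TUPLE = CPU_INDEX_TUPLE_COST + CPU_OPERATOR_COST

# Assumption: the single-value estimate 1542.00 = pages * 4.0 + 100,000 * 0.0075
index_pages = (1542.00 - INDEX_TUPLES * PER_TUPLE) / RANDOM_PAGE_COST


def bloom_index_scan_cost(num_values: int) -> float:
    """Assumed model: every search value rescans all index tuples (CPU cost
    scales with num_values), but the index pages are charged only once."""
    return index_pages * RANDOM_PAGE_COST + num_values * INDEX_TUPLES * PER_TUPLE


print(round(index_pages), round(bloom_index_scan_cost(1), 2),
      round(bloom_index_scan_cost(10), 2))  # prints 198 1542.0 8292.0
```

Under these assumptions, the index scan estimate grows roughly tenfold for a 10-value IN list, from 1542.00 to 8292.00. That puts it well above the sequential scan's 3789.00, which is consistent with the planner's choice. The much lower measured times of the forced bloom plans suggest the real per-value work is cheaper than this model charges.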

 

 


Reposted from blog.csdn.net/international24/article/details/84984163