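1. Experimental environment
Hardware: one ordinary PC with 8 GB of memory and 200 GB of total disk space.
Operating system: CentOS 6.9
Software: PostgreSQL 10 with the default configuration.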
2. Preparation
Log in to PostgreSQL and create the table testindex, which has four columns:
postgres=# create table testindex
postgres-# (
postgres(# col_int int,
postgres(# col_float float,
postgres(# col_varchar_1 varchar(64),
postgres(# col_varchar_2 varchar(64)
postgres(# );
CREATE TABLE
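Next, insert 100,000 rows of random data into this table, with UUIDs in col_varchar_1 and col_varchar_2. Two extension modules must be created before inserting: pgcrypto provides the UUID generation function, and bloom is needed for the bloom indexes created below:
postgres=# create extension if not exists pgcrypto;
CREATE EXTENSION
postgres=# create extension if not exists bloom;
CREATE EXTENSION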
postgres=# insert into testindex(col_int, col_float, col_varchar_1, col_varchar_2)
postgres-# select (100000 * random())::int, 100000 * random(), gen_random_uuid(), gen_random_uuid() from generate_series(1,100000);
INSERT 0 100000
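Check how unique the values in col_varchar_1 and col_varchar_2 are (count(distinct ...) is used because a plain count() would only count non-null values):
postgres=# select count(distinct col_varchar_1),count(distinct col_varchar_2) from testindex;
The result is: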
count | count
--------+--------
100000 | 100000
(1 row)
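Neither col_varchar_1 nor col_varchar_2 contains duplicate values.
Now we come to the main topic.
Write an equality query (1) whose condition column is col_varchar_1, and analyze it with EXPLAIN ANALYZE: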
postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';
The analysis result is as follows:
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on testindex (cost=0.00..2789.00 rows=1 width=86) (actual time=6.902..13.586 rows=1 loops=1)
Filter: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
Rows Removed by Filter: 99999
Planning time: 0.059 ms
Execution time: 13.610 ms
(5 rows)
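The total cost of 2789.00 reported for the sequential scan can be reproduced by hand, which helps when reading the plans that follow. The sketch below is written in Python rather than SQL so that it runs without a database. It assumes the default planner constants (seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and a table size of 1539 pages. The page count does not appear in the session above; it is inferred from the reported cost and should be treated as an assumption (on a live system it could be read from pg_class.relpages). The function name is purely illustrative.

```python
# Default planner cost constants in PostgreSQL.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages: int, rows: int, operators_per_row: float = 1.0) -> float:
    """Estimated total cost of a sequential scan that applies a filter."""
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + operators_per_row * CPU_OPERATOR_COST)
    return io_cost + cpu_cost


# ASSUMPTION: 1539 heap pages, inferred from the reported cost, not measured.
ASSUMED_PAGES = 1539
cost = seq_scan_cost(ASSUMED_PAGES, 100_000)
print(f"{cost:.2f}")  # compare with the planner's 2789.00
```

Every row is read and compared once, so the cost grows linearly with the table, which is why any index that narrows the search looks attractive for this query.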
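3. Experimental steps
3.1. The query condition is a single value on a single column
Now we create three indexes, one at a time: a b-tree index on col_varchar_1, a bloom index on col_varchar_1, and a combined bloom index on col_varchar_1 and col_varchar_2. Each time, the same col_varchar_1 value is used as the query condition, and we observe what EXPLAIN ANALYZE reports for the query.
Create the b-tree index on col_varchar_1: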
postgres=# create index idx_col_varchar_1_btree on testindex using btree (col_varchar_1);
CREATE INDEX
Run EXPLAIN ANALYZE on query (1); the result is as follows:
postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Index Scan using idx_col_varchar_1_btree on testindex (cost=0.42..8.44 rows=1 width=86) (actual time=0.039..0.039 rows=1 loops=1)
Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
Planning time: 0.298 ms
Execution time: 0.064 ms
(4 rows)
Drop idx_col_varchar_1_btree and create a bloom index on col_varchar_1:
postgres=# drop index if exists idx_col_varchar_1_btree;
DROP INDEX
postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);
CREATE INDEX
Run EXPLAIN ANALYZE on query (1); the result is as follows:
postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testindex (cost=1542.00..1546.01 rows=1 width=86) (actual time=0.831..0.846 rows=1 loops=1)
Recheck Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
Rows Removed by Index Recheck: 31
Heap Blocks: exact=32
-> Bitmap Index Scan on idx_col_varchar_1_bloom (cost=0.00..1542.00 rows=1 width=0) (actual time=0.803..0.803 rows=32 loops=1)
Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
Planning time: 0.142 ms
Execution time: 0.877 ms
(8 rows)
Drop idx_col_varchar_1_bloom and create a combined bloom index on col_varchar_1 and col_varchar_2:
postgres=# drop index if exists idx_col_varchar_1_bloom;
DROP INDEX
postgres=# create index idx_col_varchar_1_col_varchar_2_bloom on testindex using bloom (col_varchar_1, col_varchar_2);
CREATE INDEX
Run EXPLAIN ANALYZE on query (1); the result is as follows:
postgres=# explain analyze select * from testindex where col_varchar_1 = '8c8b0314-23e4-48da-abdf-997260db183a';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testindex (cost=1542.00..1546.01 rows=1 width=86) (actual time=0.924..0.998 rows=1 loops=1)
Recheck Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
Rows Removed by Index Recheck: 182
Heap Blocks: exact=171
-> Bitmap Index Scan on idx_col_varchar_1_col_varchar_2_bloom (cost=0.00..1542.00 rows=1 width=0) (actual time=0.815..0.815 rows=183 loops=1)
Index Cond: ((col_varchar_1)::text = '8c8b0314-23e4-48da-abdf-997260db183a'::text)
Planning time: 0.123 ms
Execution time: 1.027 ms
(8 rows)
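The "Rows Removed by Index Recheck" figures (31 for the single-column index, 182 for the two-column one) are false positives of the bloom signatures. A rough Python estimate lands close to both numbers. It assumes the documented defaults of the bloom module (80-bit signatures, 2 bits per indexed column) and idealized bit positions that are independent and uniformly distributed. The model and the function name are illustrative assumptions, not the module's actual hashing code.

```python
def false_positive_rate(signature_bits: int, bits_per_row: int) -> float:
    """Chance that two given signature bits are both set in an unrelated row.

    Simplified model: each of the bits_per_row positions set by a row is
    drawn independently and uniformly, and the probed value maps to two
    distinct bit positions.
    """
    m, k = signature_bits, bits_per_row
    # Inclusion-exclusion over "first bit missing" and "second bit missing".
    return 1 - 2 * (1 - 1 / m) ** k + (1 - 2 / m) ** k


ROWS = 100_000
SIGNATURE_BITS = 80   # documented default of the bloom "length" option
BITS_PER_COLUMN = 2   # documented default of the col1..col32 options

expected = {}
for columns in (1, 2):
    rate = false_positive_rate(SIGNATURE_BITS, BITS_PER_COLUMN * columns)
    expected[columns] = ROWS * rate
    print(columns, "indexed column(s):", round(expected[columns]), "false positives")
```

Under these assumptions the expected counts are about 31 and about 183 rows per 100,000. This suggests why a query on one column rechecks more rows when the index covers two columns: every row sets more bits in its signature, so unrelated rows match the probe more often.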
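The estimated total costs of the above query with each kind of index are:
Index type           | No index | Single-column b-tree index | Single-column bloom index | Multi-column bloom index
---------------------+----------+----------------------------+---------------------------+--------------------------
Estimated query cost | 2789.00  | 8.44                       | 1546.01                   | 1546.01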
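From the planner's estimates we can see that, for an equality query on a single varchar column, the ranking by performance is: single-column b-tree index > single-column bloom index = multi-column bloom index > sequential scan. The measured execution times follow the same order, although the multi-column bloom index (1.027 ms) was slightly slower than the single-column one (0.877 ms) because it had more candidate rows to recheck (182 removed versus 31).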
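3.2. The query condition is a known set of values on a single column
What if the query condition is a set of values on a single column?
Drop the index created earlier: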
postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;
DROP INDEX
Analyze the following query (2), whose condition is a set of 10 values on a single column:
postgres=# explain analyze
postgres-# select * from testindex where col_varchar_1 in
postgres-# (
postgres(# '8c8b0314-23e4-48da-abdf-997260db183a',
postgres(# 'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
postgres(# 'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
postgres(# '130c971d-25f7-432e-a649-51ca6d3277f7',
postgres(# '02aded54-5b45-4919-9162-490fd56ee240',
postgres(# '5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
postgres(# '3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
postgres(# '4174bcca-4f0d-407a-abc2-31a556773560',
postgres(# 'abe57875-8a3c-4273-9b46-8a5e18ce491c',
postgres(# '13c87623-86c7-44d8-8a77-39144a154238'
postgres(# );
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on testindex (cost=0.00..3789.00 rows=10 width=86) (actual time=28.207..56.852 rows=10 loops=1)
Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Rows Removed by Filter: 99990
Planning time: 0.254 ms
Execution time: 56.891 ms
(5 rows)
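The estimated cost of this sequential scan, 3789.00, is higher than for the single-value query because of the comparisons against the list. As a hedged Python sketch: the planner charges one operator evaluation per list element and assumes that, on average, about half of the elements are examined for each row. The constants are PostgreSQL defaults; the 1539-page table size is an assumption inferred from the reported cost rather than a measured value, and the function name is hypothetical.

```python
# Default planner cost constants in PostgreSQL.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_in_list_cost(pages: int, rows: int, list_length: int) -> float:
    """Estimated cost of a sequential scan filtering on col IN (list)."""
    # About half of the list is assumed to be compared before a decision.
    comparisons_per_row = 0.5 * list_length
    per_row = CPU_TUPLE_COST + comparisons_per_row * CPU_OPERATOR_COST
    return pages * SEQ_PAGE_COST + rows * per_row


# ASSUMPTION: 1539 heap pages, inferred from the reported cost, not measured.
ASSUMED_PAGES = 1539
in_list_cost = seq_scan_in_list_cost(ASSUMED_PAGES, 100_000, list_length=10)
print(f"{in_list_cost:.2f}")  # compare with the planner's 3789.00
```

The I/O part of the estimate is unchanged; only the per-row CPU charge grows with the length of the list.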
Create a b-tree index on col_varchar_1 and run EXPLAIN ANALYZE on query (2); the result is as follows:
postgres=# create index idx_col_varchar_1_btree on testindex using btree (col_varchar_1);
CREATE INDEX
postgres=# explain analyze
select * from testindex where col_varchar_1 in
(
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testindex (cost=44.25..82.06 rows=10 width=86) (actual time=0.147..0.148 rows=10 loops=1)
Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_col_varchar_1_btree (cost=0.00..44.25 rows=10 width=0) (actual time=0.143..0.143 rows=10 loops=1)
Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Planning time: 0.182 ms
Execution time: 0.276 ms
(7 rows)
Drop the index above, create a bloom index on col_varchar_1, and analyze query (2); the result is as follows:
postgres=# drop index if exists idx_col_varchar_1_btree;
DROP INDEX
postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);
CREATE INDEX
postgres=# explain analyze
postgres-# select * from testindex where col_varchar_1 in
postgres-# (
postgres(# '8c8b0314-23e4-48da-abdf-997260db183a',
postgres(# 'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
postgres(# 'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
postgres(# '130c971d-25f7-432e-a649-51ca6d3277f7',
postgres(# '02aded54-5b45-4919-9162-490fd56ee240',
postgres(# '5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
postgres(# '3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
postgres(# '4174bcca-4f0d-407a-abc2-31a556773560',
postgres(# 'abe57875-8a3c-4273-9b46-8a5e18ce491c',
postgres(# '13c87623-86c7-44d8-8a77-39144a154238'
postgres(# );
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on testindex (cost=0.00..3789.00 rows=10 width=86) (actual time=28.207..56.852 rows=10 loops=1)
Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Rows Removed by Filter: 99990
Planning time: 0.201 ms
Execution time: 55.487 ms
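Something odd happens: the planner does not choose the index scan and picks a sequential scan instead.
After dropping idx_col_varchar_1_bloom and creating a combined bloom index on col_varchar_1 and col_varchar_2, rerunning (2) shows that the query again uses a sequential scan.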
postgres=# drop index if exists idx_col_varchar_1_bloom;
DROP INDEX
postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;
DROP INDEX
postgres=# create index idx_col_varchar_1_col_varchar_2_bloom on testindex using bloom (col_varchar_1, col_varchar_2);
CREATE INDEX
postgres=# explain analyze
select * from testindex where col_varchar_1 in
(
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on testindex (cost=0.00..3789.00 rows=10 width=86) (actual time=26.604..53.837 rows=10 loops=1)
Filter: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Rows Removed by Filter: 99990
Planning time: 0.125 ms
Execution time: 53.855 ms
(5 rows)
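The conclusion so far is that when a table holds many rows with almost no duplicates and the query condition is a small set of values on a single varchar column, a b-tree index scan is faster than a sequential scan, whereas the planner prefers a sequential scan to a bloom index scan.
What if we force the query to use an index scan? Setting enable_seqscan to off does not forbid sequential scans outright, but it makes the planner avoid them whenever another plan is available.
Run the following command: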
postgres=# set enable_seqscan = off;
SET
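To see why this setting changes the chosen plan, it helps to think of it as a penalty rather than a prohibition. The toy Python model below is a sketch, not the planner's code: in PostgreSQL 10 a disabled plan type has a very large constant (1.0e10) added to its cost, so it loses to any alternative. The plan names and costs are the ones EXPLAIN reports for query (2) in this article; the function and dictionary names are hypothetical.

```python
DISABLE_COST = 1.0e10  # penalty added to a disabled plan type in PostgreSQL 10


def pick_plan(plans: dict, enable_seqscan: bool = True) -> str:
    """Return the name of the cheapest plan after applying the penalty."""
    def effective_cost(item):
        name, cost = item
        if name == "Seq Scan" and not enable_seqscan:
            return cost + DISABLE_COST
        return cost

    return min(plans.items(), key=effective_cost)[0]


# Total costs reported by EXPLAIN for query (2).
plans = {"Seq Scan": 3789.00, "Bitmap Heap Scan (bloom)": 8329.81}

print(pick_plan(plans))                        # planner's normal choice
print(pick_plan(plans, enable_seqscan=False))  # choice after the SET command
```

Because the sequential scan is only penalized, it would still be used if the table had no usable index at all.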
This is the query plan when the combined bloom index is used:
postgres=# explain analyze
select * from testindex where col_varchar_1 in
(
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testindex (cost=8292.00..8329.81 rows=10 width=86) (actual time=6.944..7.953 rows=10 loops=1)
Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Rows Removed by Index Recheck: 1776
Heap Blocks: exact=1075
-> Bitmap Index Scan on idx_col_varchar_1_col_varchar_2_bloom (cost=0.00..8292.00 rows=10 width=0) (actual time=5.794..5.794 rows=1798 loops=1)
Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Planning time: 0.203 ms
Execution time: 7.984 ms
This is the query plan when the single-column bloom index is used:
postgres=# drop index if exists idx_col_varchar_1_col_varchar_2_bloom;
DROP INDEX
postgres=# create index idx_col_varchar_1_bloom on testindex using bloom (col_varchar_1);
CREATE INDEX
postgres=# explain analyze
select * from testindex where col_varchar_1 in
(
'8c8b0314-23e4-48da-abdf-997260db183a',
'b7be6295-5e3d-4bd3-93fb-8f3f27939dbd',
'b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed',
'130c971d-25f7-432e-a649-51ca6d3277f7',
'02aded54-5b45-4919-9162-490fd56ee240',
'5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2',
'3ad5ce4d-7a9f-415f-981e-0d8892522e0a',
'4174bcca-4f0d-407a-abc2-31a556773560',
'abe57875-8a3c-4273-9b46-8a5e18ce491c',
'13c87623-86c7-44d8-8a77-39144a154238'
);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testindex (cost=8292.00..8329.81 rows=10 width=86) (actual time=6.250..6.441 rows=10 loops=1)
Recheck Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Rows Removed by Index Recheck: 316
Heap Blocks: exact=284
-> Bitmap Index Scan on idx_col_varchar_1_bloom (cost=0.00..8292.00 rows=10 width=0) (actual time=5.921..5.921 rows=326 loops=1)
Index Cond: ((col_varchar_1)::text = ANY ('{8c8b0314-23e4-48da-abdf-997260db183a,b7be6295-5e3d-4bd3-93fb-8f3f27939dbd,b048ff4e-2cd9-491e-9e7b-dde7ff03d6ed,130c971d-25f7-432e-a649-51ca6d3277f7,02aded54-5b45-4919-9162-490fd56ee240,5a08a5f9-1c7f-48c1-9dbf-a11e3a7edcd2,3ad5ce4d-7a9f-415f-981e-0d8892522e0a,4174bcca-4f0d-407a-abc2-31a556773560,abe57875-8a3c-4273-9b46-8a5e18ce491c,13c87623-86c7-44d8-8a77-39144a154238}'::text[]))
Planning time: 0.209 ms
Execution time: 6.467 ms
(8 rows)
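The "Heap Blocks" figures in the two bloom plans (1075 for the combined index, 284 for the single-column one) follow largely from the number of false positives. As a rough Python sketch, if the falsely matched rows are assumed to be scattered uniformly over the table, the expected number of distinct pages visited can be estimated as below. The table size of 1539 pages is an assumption inferred from the sequential scan cost, not a measured value, and the function name is illustrative. The ten genuine matches are ignored here because the b-tree plan shows them all in a single block.

```python
def expected_pages_touched(total_pages: int, candidate_rows: int) -> float:
    """Expected number of distinct pages hit by uniformly scattered rows."""
    chance_page_is_missed = (1 - 1 / total_pages) ** candidate_rows
    return total_pages * (1 - chance_page_is_missed)


# ASSUMPTION: 1539 heap pages, inferred from the seq scan cost, not measured.
ASSUMED_PAGES = 1539

# "Rows Removed by Index Recheck" from the two plans above.
single_column = expected_pages_touched(ASSUMED_PAGES, 316)
two_column = expected_pages_touched(ASSUMED_PAGES, 1776)

print(round(single_column), "pages expected, 284 reported")
print(round(two_column), "pages expected, 1075 reported")
```

Under these assumptions the combined index makes the scan visit roughly two thirds of the table, which is consistent with its longer heap scan time compared with the single-column index.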
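The estimated total costs of the above query with each kind of index are as follows (the bloom figures were obtained with enable_seqscan = off):
Index type           | No index | Single-column b-tree index | Single-column bloom index | Multi-column bloom index
---------------------+----------+----------------------------+---------------------------+--------------------------
Estimated query cost | 3789.00  | 82.06                      | 8329.81                   | 8329.81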
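As the table shows, when a table holds many rows with almost no duplicates and the query condition is a set of values on a single varchar column, the planner estimates a sequential scan to be cheaper than a bloom index scan, even when the set is small. The measured execution times do not bear this out, however: once the bloom indexes were actually used, the query took 6.467 ms (single-column) and 7.984 ms (multi-column), against roughly 54 to 57 ms for the sequential scan.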
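Putting the estimated costs next to the measured execution times for query (2) makes the comparison easier to read. The short Python sketch below only sorts numbers copied from the EXPLAIN ANALYZE output in this section; the timings come from single runs, so they are indicative rather than a benchmark, and the variable names are illustrative.

```python
# (estimated total cost, measured execution time in ms) for query (2),
# copied from the EXPLAIN ANALYZE output in section 3.2.
results = {
    "seq scan": (3789.00, 56.891),
    "b-tree": (82.06, 0.276),
    "single-column bloom": (8329.81, 6.467),
    "two-column bloom": (8329.81, 7.984),
}

by_cost = sorted(results, key=lambda name: results[name][0])
by_time = sorted(results, key=lambda name: results[name][1])

print("cheapest first (estimate):", by_cost)
print("fastest first (measured): ", by_time)
```

The b-tree index comes first in both orderings, while the sequential scan and the bloom indexes swap places between the estimate and the measurement.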