PG in & not in: comparison of approaches

in VS join VS any VS exists

Model A

postgres=# create table tbl_a (a integer primary key, b char(128));

CREATE TABLE

Time: 67.026 ms

postgres=# create table tbl_b (a integer primary key, b char(128));

CREATE TABLE

Time: 60.716 ms

postgres=# insert into tbl_a values (generate_series(0,2000000),'a'||generate_series(0,2000000));

INSERT 0 2000001

Time: 4218.271 ms (00:04.218)

postgres=# insert into tbl_b values (generate_series(100000,1100000),'a'||generate_series(100000,1100000));

INSERT 0 1000001

Time: 2135.322 ms (00:02.135)

postgres=#

postgres=# select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);

count

---------

1000001

(1 row)

Time: 629.656 ms

postgres=# select count(*) from tbl_a where a in (select a from tbl_b);

count

---------

1000001

(1 row)

Time: 613.041 ms

postgres=# select count(*) from tbl_a where a = any (array (select a from tbl_b));

count

---------

1000001

(1 row)

Time: 1391.568 ms (00:01.392)

postgres=#

postgres=# select count(*) from tbl_a where exists (select a from tbl_b where tbl_b.a=tbl_a.a);

count

---------

1000001

(1 row)

Time: 556.529 ms

postgres=#

With this data model the ranking, fastest first, is exists > in > join > any.

The corresponding execution plans follow. The first one is for in:

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a where a in (select a from tbl_b);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=99469.19..99469.20 rows=1 width=8) (actual time=881.258..881.258 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=48602

-> Merge Join (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=29.684..811.034 rows=1000001 loops=1)

Inner Unique: true

Merge Cond: (tbl_a.a = tbl_b.a)

Buffers: shared hit=48602

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.011..260.072 rows=1100002 loops=1)

Output: tbl_a.a

Heap Fetches: 1100002

Buffers: shared hit=25458

-> Index Only Scan using tbl_b_pkey on public.tbl_b (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.005..234.057 rows=1000001 loops=1)

Output: tbl_b.a

Heap Fetches: 1000001

Buffers: shared hit=23144

Planning Time: 0.181 ms

Execution Time: 881.288 ms

(17 rows)

Time: 881.762 ms

postgres=#
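
Both Index Only Scans above report Heap Fetches equal to the number of rows they return, so every index entry still needed a visit to the heap. That is what one would expect from freshly loaded tables whose visibility map has not been set yet. A minimal sketch to check this, assuming the tables are not modified in between and that running vacuum on them is acceptable (the post itself does not show a vacuum step):

```sql
-- Set the visibility map and refresh planner statistics for both tables.
vacuum (analyze) tbl_a;
vacuum (analyze) tbl_b;

-- Re-run the plan and compare the "Heap Fetches" lines with the ones above.
-- If the visibility map is set, they should drop to (near) zero.
explain (analyze, buffers)
select count(*) from tbl_a where a in (select a from tbl_b);
```

The planner may also pick a different plan after this, so the timings will not be directly comparable with the ones shown in this post.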

join

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=99469.19..99469.20 rows=1 width=8) (actual time=882.490..882.490 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=48602

-> Merge Join (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=29.831..812.149 rows=1000001 loops=1)

Inner Unique: true

Merge Cond: (tbl_a.a = tbl_b.a)

Buffers: shared hit=48602

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.017..260.735 rows=1100002 loops=1)

Output: tbl_a.a

Heap Fetches: 1100002

Buffers: shared hit=25458

-> Index Only Scan using tbl_b_pkey on public.tbl_b (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.009..234.505 rows=1000001 loops=1)

Output: tbl_b.a

Heap Fetches: 1000001

Buffers: shared hit=23144

Planning Time: 0.170 ms

Execution Time: 882.524 ms

(17 rows)

Time: 883.040 ms

postgres=#

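The join is executed as a merge join. One might expect a hash join plan to be faster than in, so the next step tests that.

With merge join disabled the planner is forced onto a hash join, and the query turns out to be slower. (The in form switches to a hash join in the same way and slows down too.)

The merge join read both tables with an Index Only Scan; the forced hash join uses a Seq Scan on both tables instead.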

postgres=# set enable_mergejoin = off;

SET

Time: 0.153 ms

postgres=# show enable_mergejoin;

enable_mergejoin

------------------

off

(1 row)

Time: 0.123 ms

postgres=#

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=134915.90..134915.91 rows=1 width=8) (actual time=1339.586..1339.586 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=61226, temp read=8254 written=8254

-> Hash Join (cost=46816.02..132415.89 rows=1000001 width=0) (actual time=351.648..1269.892 rows=1000001 loops=1)

Inner Unique: true

Hash Cond: (tbl_a.a = tbl_b.a)

Buffers: shared hit=61226, temp read=8254 written=8254

-> Seq Scan on public.tbl_a (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.010..313.553 rows=2000001 loops=1)

Output: tbl_a.a

Buffers: shared hit=40817

-> Hash (cost=30409.01..30409.01 rows=1000001 width=4) (actual time=319.297..319.297 rows=1000001 loops=1)

Output: tbl_b.a

Buckets: 131072 Batches: 16 Memory Usage: 3225kB

Buffers: shared hit=20409, temp written=2738

-> Seq Scan on public.tbl_b (cost=0.00..30409.01 rows=1000001 width=4) (actual time=0.005..157.206 rows=1000001 loops=1)

Output: tbl_b.a

Buffers: shared hit=20409

Planning Time: 0.118 ms

Execution Time: 1339.627 ms

(19 rows)

Time: 1340.074 ms (00:01.340)

postgres=#
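
The Hash node above reports Batches: 16, and the plan shows temp read/written buffers, so the hash table did not fit in memory and spilled to temporary files. The sketch below can show how much of the slowdown comes from that spill. It assumes one session can afford more work_mem; the value 128MB is an arbitrary illustration, not something taken from the original test. It also resets enable_mergejoin so that later tests run with the default planner settings.

```sql
-- Hypothetical value, session-local only.
set work_mem = '128MB';

-- enable_mergejoin is still off here, so this should again be a hash join.
-- Compare "Batches" and the temp read/written counters with the plan above.
explain (analyze, buffers)
select count(*) from tbl_a inner join tbl_b on (tbl_a.a = tbl_b.a);

-- Restore the defaults before running the remaining tests.
reset work_mem;
reset enable_mergejoin;
```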

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a where a = any (array(select a from tbl_b));

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=30427.79..30427.80 rows=1 width=8) (actual time=1559.216..1559.216 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=3046285

InitPlan 1 (returns $0)

-> Seq Scan on public.tbl_b (cost=0.00..30409.01 rows=1000001 width=4) (actual time=0.010..146.601 rows=1000001 loops=1)

Output: tbl_b.a

Buffers: shared hit=20409

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..18.75 rows=10 width=0) (actual time=225.485..1492.757 rows=1000001 loops=1)

Output: tbl_a.a

Index Cond: (tbl_a.a = ANY ($0))

Heap Fetches: 1000001

Buffers: shared hit=3046285

Planning Time: 0.097 ms

Execution Time: 1559.249 ms

(14 rows)

Time: 1559.665 ms (00:01.560)

postgres=#
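
In the plan above, array(select ...) is evaluated once as an InitPlan into an array of about one million elements, and tbl_a_pkey is then probed for each element, which is where the roughly three million buffer hits come from. If the array constructor is not actually needed, = any can take the subquery directly. PostgreSQL documents in as equivalent to = any for subqueries, so the planner is free to treat this form as a join. A sketch, not measured in the original test:

```sql
-- Same predicate without the array() wrapper.
explain (analyze, buffers)
select count(*) from tbl_a where a = any (select a from tbl_b);
```

One would expect the resulting plan to look like the in plan rather than the InitPlan shown above.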

exists

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where exists (select a from tbl_b where tbl_b.a=tbl_a.a);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=99469.19..99469.20 rows=1 width=8) (actual time=816.748..816.749 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=48602

-> Merge Join (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=26.624..749.396 rows=1000001 loops=1)

Inner Unique: true

Merge Cond: (tbl_a.a = tbl_b.a)

Buffers: shared hit=48602

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.010..224.811 rows=1100002 loops=1)

Output: tbl_a.a

Heap Fetches: 1100002

Buffers: shared hit=25458

-> Index Only Scan using tbl_b_pkey on public.tbl_b (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.006..205.844 rows=1000001 loops=1)

Output: tbl_b.a

Heap Fetches: 1000001

Buffers: shared hit=23144

Planning Time: 0.153 ms

Execution Time: 816.782 ms

(17 rows)

Time: 817.252 ms

Model B

postgres=# create table tbl_c (a integer primary key , c char(128));

CREATE TABLE

postgres=# insert into tbl_c values (generate_series(10000,10010),'');

INSERT 0 11

postgres=# select count(*) from tbl_a where a in (select a from tbl_c);

count

-------

11

(1 row)

Time: 0.189 ms

postgres=# select count(*) from tbl_a where a =any(array (select a from tbl_c));

count

-------

11

(1 row)

Time: 0.160 ms

postgres=# select count(*) from tbl_a inner join tbl_c using (a);

count

-------

11

(1 row)

Time: 0.173 ms

postgres=# select count(*) from tbl_a where exists (select a from tbl_c where tbl_c.a=tbl_a.a);

count

-------

11

(1 row)

Time: 0.181 ms

postgres=#

The differences are negligible here (all four queries run in under 0.2 ms): any > join > exists > in.

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a in (select a from tbl_c);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=45

-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.011..0.029 rows=11 loops=1)

Inner Unique: true

Buffers: shared hit=45

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.004..0.005 rows=11 loops=1)

Output: tbl_c.a, tbl_c.c

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

Output: tbl_a.a

Index Cond: (tbl_a.a = tbl_c.a)

Heap Fetches: 11

Buffers: shared hit=44

Planning Time: 0.078 ms

Execution Time: 0.050 ms

(16 rows)

Time: 0.265 ms

postgres=#
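
The planner estimates rows=140 for the Seq Scan on tbl_c, while the scan actually returns 11 rows. That suggests tbl_c had no statistics when the query ran (an assumption; the post does not show whether analyze was run after the insert). A minimal sketch to refresh the statistics and compare the estimate:

```sql
-- Collect statistics for the small table.
analyze tbl_c;

-- The row estimate for the Seq Scan on tbl_c should now match the 11 rows it holds.
explain (analyze, buffers)
select count(*) from tbl_a where a in (select a from tbl_c);
```

With only 11 rows the plan shape is unlikely to change, but accurate estimates matter once such a table takes part in larger joins.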

join

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a inner join tbl_c using (a);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=45

-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.028 rows=11 loops=1)

Inner Unique: true

Buffers: shared hit=45

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

Output: tbl_c.a, tbl_c.c

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

Output: tbl_a.a

Index Cond: (tbl_a.a = tbl_c.a)

Heap Fetches: 11

Buffers: shared hit=44

Planning Time: 0.066 ms

Execution Time: 0.049 ms

(16 rows)

Time: 0.248 ms

postgres=#

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a =any(array (select a from tbl_c));

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=30.18..30.19 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=35

InitPlan 1 (returns $0)

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.005 rows=11 loops=1)

Output: tbl_c.a

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..18.75 rows=10 width=0) (actual time=0.014..0.027 rows=11 loops=1)

Output: tbl_a.a

Index Cond: (tbl_a.a = ANY ($0))

Heap Fetches: 11

Buffers: shared hit=35

Planning Time: 0.042 ms

Execution Time: 0.046 ms

(14 rows)

Time: 0.211 ms

postgres=#

exists

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where exists (select a from tbl_c where tbl_c.a=tbl_a.a);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=45

-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.028 rows=11 loops=1)

Inner Unique: true

Buffers: shared hit=45

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

Output: tbl_c.a, tbl_c.c

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

Output: tbl_a.a

Index Cond: (tbl_a.a = tbl_c.a)

Heap Fetches: 11

Buffers: shared hit=44

Planning Time: 0.069 ms

Execution Time: 0.047 ms

(16 rows)

Time: 0.248 ms

postgres=#

Model C

postgres=# select count(*) from tbl_c where a in (select a from tbl_a);

count

-------

11

(1 row)

Time: 0.209 ms

postgres=#

postgres=# select count(*) from tbl_c inner join tbl_a using (a);

count

-------

11

(1 row)

Time: 0.173 ms

postgres=#

postgres=# select count(*) from tbl_c where a = any (array (select a from tbl_a));

count

-------

11

(1 row)

Time: 871.603 ms

postgres=#

postgres=# select count(*) from tbl_c where exists (select null from tbl_a where tbl_a.a=tbl_c.a);

count

-------

11

(1 row)

Time: 0.182 ms

postgres=#

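In this model: join > exists > in > any. The any form is far slower than the rest because array(select a from tbl_a) first builds an array of two million elements and the index on tbl_c is then probed with each of them, as the plan below shows.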

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where a in (select a from tbl_a);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.055..0.055 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=45

-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.030..0.050 rows=11 loops=1)

Inner Unique: true

Buffers: shared hit=45

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.009..0.010 rows=11 loops=1)

Output: tbl_c.a, tbl_c.c

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=11)

Output: tbl_a.a

Index Cond: (tbl_a.a = tbl_c.a)

Heap Fetches: 11

Buffers: shared hit=44

Planning Time: 0.136 ms

Execution Time: 0.089 ms

(16 rows)

Time: 0.525 ms

join

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c inner join tbl_a using (a);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=45

-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.011..0.029 rows=11 loops=1)

Inner Unique: true

Buffers: shared hit=45

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

Output: tbl_c.a, tbl_c.c

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

Output: tbl_a.a

Index Cond: (tbl_a.a = tbl_c.a)

Heap Fetches: 11

Buffers: shared hit=44

Planning Time: 0.067 ms

Execution Time: 0.049 ms

(16 rows)

Time: 0.255 ms

postgres=#

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where a = any (array (select a from tbl_a));

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=60828.61..60828.62 rows=1 width=8) (actual time=1029.756..1029.756 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=2040819

InitPlan 1 (returns $0)

-> Seq Scan on public.tbl_a (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.006..259.035 rows=2000001 loops=1)

Output: tbl_a.a

Buffers: shared hit=40817

-> Bitmap Heap Scan on public.tbl_c (cost=4.13..11.70 rows=10 width=0) (actual time=1029.747..1029.749 rows=11 loops=1)

Recheck Cond: (tbl_c.a = ANY ($0))

Heap Blocks: exact=1

Buffers: shared hit=2040819

-> Bitmap Index Scan on tbl_c_pkey (cost=0.00..4.12 rows=10 width=0) (actual time=1029.739..1029.739 rows=11 loops=1)

Index Cond: (tbl_c.a = ANY ($0))

Buffers: shared hit=2040818

Planning Time: 0.082 ms

Execution Time: 1030.852 ms

(16 rows)

Time: 1031.234 ms (00:01.031)

postgres=#

exists

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where exists (select null from tbl_a where tbl_a.a=tbl_c.a);

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=45

-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.027 rows=11 loops=1)

Inner Unique: true

Buffers: shared hit=45

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

Output: tbl_c.a, tbl_c.c

Buffers: shared hit=1

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

Output: tbl_a.a

Index Cond: (tbl_a.a = tbl_c.a)

Heap Fetches: 11

Buffers: shared hit=44

Planning Time: 0.067 ms

Execution Time: 0.047 ms

(16 rows)

Time: 0.246 ms

postgres=#

not in VS except VS join VS not exists

Model A

postgres=# create table tbl_a (a integer primary key, b char(128));

CREATE TABLE

Time: 67.026 ms

postgres=# create table tbl_b (a integer primary key, b char(128));

CREATE TABLE

Time: 60.716 ms

postgres=# insert into tbl_a values (generate_series(0,2000000),'a'||generate_series(0,2000000));

INSERT 0 2000001

Time: 4218.271 ms (00:04.218)

postgres=# insert into tbl_b values (generate_series(100000,1100000),'a'||generate_series(100000,1100000));

INSERT 0 1000001

Time: 2135.322 ms (00:02.135)

postgres=#

postgres=# select count(*) from (select a from tbl_a except select a from tbl_b) as t;

count

---------

1000000

(1 row)

Time: 1727.102 ms (00:01.727)

postgres=#

postgres=# select count(*) from tbl_a left join tbl_b using (a) where tbl_b.b is null;

count

---------

1000000

(1 row)

Time: 737.321 ms

postgres=#

postgres=# select count(*) from tbl_a where not exists (select null from tbl_b where tbl_a.a=tbl_b.a);

count

---------

1000000

(1 row)

Time: 701.591 ms

postgres=#

In this model: not exists > join > except > not in. No timing for the not in query is shown above.

Execution plans for not in, except, join and not exists are not shown for this model.

Model B: small right-hand table

postgres=# create table tbl_c (a integer primary key , c char(128));

CREATE TABLE

postgres=# insert into tbl_c values (generate_series(10000,10010),'');

INSERT 0 11

postgres=# select count(*) from tbl_a where a not in (select a from tbl_c);

count

---------

1999990

(1 row)

Time: 341.151 ms

postgres=#

postgres=# select count(*) from tbl_a left join tbl_c using (a) where tbl_c.a is null;

count

---------

1999990

(1 row)

Time: 401.454 ms

postgres=#

postgres=# select count(*) from (select a from tbl_a except select a from tbl_c) as t;

count

---------

1999990

(1 row)

Time: 1146.650 ms (00:01.147)

postgres=#

postgres=# select count(*) from tbl_a where not exists (select null from tbl_c where tbl_c.a=tbl_a.a);

count

---------

1999990

(1 row)

Time: 402.370 ms

postgres=#

In this model: not in > join > not exists > except.

not in

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a not in (select a from tbl_c);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=68328.60..68328.61 rows=1 width=8) (actual time=512.604..512.605 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=40818

-> Seq Scan on public.tbl_a (cost=11.75..65828.61 rows=999994 width=0) (actual time=0.023..379.856 rows=1999990 loops=1)

Output: tbl_a.a, tbl_a.b

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 11

Buffers: shared hit=40818

SubPlan 1

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

Output: tbl_c.a

Buffers: shared hit=1

Planning Time: 0.091 ms

Execution Time: 512.643 ms

(14 rows)

Time: 513.050 ms

postgres=#
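
not in is planned here as a filter with a hashed SubPlan, not as an anti join like the left join and not exists forms further down. The underlying reason is its NULL semantics: if the subquery returns even one NULL, not in can never evaluate to true, so it is not interchangeable with not exists in general. A small self-contained illustration with inline values (the aliases l and r are made up for this example):

```sql
-- r contains a NULL, so "a not in (...)" is never true: the count is 0.
select count(*)
from (values (1), (2)) as l(a)
where a not in (select b from (values (1), (null::int)) as r(b));

-- not exists only looks for matching rows, so a = 2 is kept: the count is 1.
select count(*)
from (values (1), (2)) as l(a)
where not exists (
    select 1
    from (values (1), (null::int)) as r(b)
    where r.b = l.a
);
```

In the tests in this post the column a is a primary key and therefore never NULL, which is why both forms return the same counts.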

except

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from (select a from tbl_a except select a from tbl_c) as t;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=368761.05..368761.06 rows=1 width=8) (actual time=2139.793..2139.794 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=40818, temp read=7229 written=7264

-> Subquery Scan on t (cost=333760.54..363761.08 rows=1999989 width=0) (actual time=1205.992..2006.728 rows=1999990 loops=1)

Output: t.a

Buffers: shared hit=40818, temp read=7229 written=7264

-> SetOp Except (cost=333760.54..343761.19 rows=1999989 width=8) (actual time=1205.991..1817.584 rows=1999990 loops=1)

Output: "*SELECT* 1".a, (0)

Buffers: shared hit=40818, temp read=7229 written=7264

-> Sort (cost=333760.54..338760.86 rows=2000129 width=8) (actual time=1205.984..1447.060 rows=2000012 loops=1)

Output: "*SELECT* 1".a, (0)

Sort Key: "*SELECT* 1".a

Sort Method: external merge Disk: 35280kB

Buffers: shared hit=40818, temp read=7229 written=7264

-> Append (cost=0.00..90830.23 rows=2000129 width=8) (actual time=0.012..666.137 rows=2000012 loops=1)

Buffers: shared hit=40818

-> Subquery Scan on "*SELECT* 1" (cost=0.00..80816.78 rows=1999989 width=8) (actual time=0.011..500.831 rows=2000001 loops=1)

Output: "*SELECT* 1".a, 0

Buffers: shared hit=40817

-> Seq Scan on public.tbl_a (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.009..290.951 rows=2000001 loops=1)

Output: tbl_a.a

Buffers: shared hit=40817

-> Subquery Scan on "*SELECT* 2" (cost=0.00..12.80 rows=140 width=8) (actual time=0.010..0.012 rows=11 loops=1)

Output: "*SELECT* 2".a, 1

Buffers: shared hit=1

-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.007..0.009 rows=11 loops=1)

Output: tbl_c.a

Buffers: shared hit=1

Planning Time: 0.108 ms

Execution Time: 2147.694 ms

(30 rows)

Time: 2148.186 ms (00:02.148)

postgres=#
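
Part of the cost of except comes from its set semantics: it must also remove duplicates from the left input, so the plan above sorts all rows of both inputs before the SetOp node can compare them (and that sort spills to disk, see Sort Method: external merge). An anti join has no such obligation. A small sketch of the semantic difference, using made-up inline data:

```sql
with l(a) as (values (1), (1), (2)),
     r(a) as (values (2))
select
    -- except returns the distinct value 1 once: except_count = 1
    (select count(*)
       from (select a from l except select a from r) as t) as except_count,
    -- not exists keeps both rows with a = 1: anti_count = 2
    (select count(*)
       from l
      where not exists (select 1 from r where r.a = l.a)) as anti_count;
```

With a primary key on the left side, as in this post, the two forms return the same rows, but the planner still has to do the extra work for except.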

join

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a left join tbl_c using (a) where tbl_c.a is null;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=87968.55..87968.56 rows=1 width=8) (actual time=751.825..751.826 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=46286

-> Merge Anti Join (cost=0.57..82968.93 rows=1999849 width=0) (actual time=0.024..616.804 rows=1999990 loops=1)

Merge Cond: (tbl_a.a = tbl_c.a)

Buffers: shared hit=46286

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.018..409.008 rows=2000001 loops=1)

Output: tbl_a.a

Heap Fetches: 2000001

Buffers: shared hit=46284

-> Index Only Scan using tbl_c_pkey on public.tbl_c (cost=0.14..17.84 rows=140 width=4) (actual time=0.003..0.006 rows=11 loops=1)

Output: tbl_c.a

Heap Fetches: 11

Buffers: shared hit=2

Planning Time: 0.107 ms

Execution Time: 751.875 ms

(16 rows)

Time: 752.358 ms

postgres=#

not exists

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where not exists (select null from tbl_c where tbl_c.a=tbl_a.a);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=87968.55..87968.56 rows=1 width=8) (actual time=740.657..740.657 rows=1 loops=1)

Output: count(*)

Buffers: shared hit=46286

-> Merge Anti Join (cost=0.57..82968.93 rows=1999849 width=0) (actual time=0.019..607.766 rows=1999990 loops=1)

Merge Cond: (tbl_a.a = tbl_c.a)

Buffers: shared hit=46286

-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.014..403.054 rows=2000001 loops=1)

Output: tbl_a.a

Heap Fetches: 2000001

Buffers: shared hit=46284

-> Index Only Scan using tbl_c_pkey on public.tbl_c (cost=0.14..17.84 rows=140 width=4) (actual time=0.003..0.006 rows=11 loops=1)

Output: tbl_c.a

Heap Fetches: 11

Buffers: shared hit=2

Planning Time: 0.104 ms

Execution Time: 740.690 ms

(16 rows)

Time: 741.111 ms

postgres=#

Model C: small left-hand table

postgres=# select count(*) from tbl_c where a not in (select a from tbl_a);

count

-------

0

(1 row)

Time: 7.407 ms

postgres=#

postgres=# select count(*) from (select a from tbl_c except select a from tbl_a) as t;

count

-------

0

(1 row)

Time: 339.787 ms

postgres=#

postgres=# select count(*) from tbl_c left join tbl_a using (a) where tbl_a.a is null;

count

-------

0

(1 row)

Time: 0.169 ms

postgres=#

postgres=# select count(*) from tbl_c where not exists (select null from tbl_a where tbl_a.a=tbl_c.a);

count

-------

0

(1 row)

Time: 0.184 ms

postgres=#

In this model: join > not exists > not in > except.

Execution plans for not in, except, join and not exists are not shown for this model.

Summary

Model (rankings are fastest first)	in VS join VS any VS exists	not in VS except VS join VS not exists
Model A: left table 2M rows, right table 1M rows	exists > in > join > any	not exists > join > except > not in
Model B: left table 2M rows, right table 11 rows	any > join > exists > in	not in > join > not exists > except
Model C: left table 11 rows, right table 2M rows	join > exists > in > any	join > not exists > not in > except

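The above is only a simple check and should not be taken as evidence of inherent performance differences between these forms; real differences have to be judged from the actual execution plans. Where the plans are identical, as for in, join and exists in Model A, the timing gaps cannot be attributed to the query form.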

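Note also that in and the join forms are not fully equivalent; it depends on the semantics. in implicitly de-duplicates the matches coming from the subquery, whereas a join returns one row per matching pair.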
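
The de-duplication difference can be seen with a few inline rows. This is a minimal sketch with made-up data (the names l and r are hypothetical), where the right side contains the value 1 twice:

```sql
with l(a) as (values (1), (2)),
     r(a) as (values (1), (1), (3))
select
    -- in: the row a = 1 qualifies once, however many matches exist: in_count = 1
    (select count(*) from l where a in (select a from r)) as in_count,
    -- join: a = 1 pairs with both matching rows in r: join_count = 2
    (select count(*) from l join r using (a)) as join_count;
```

In the tests above the join column is a primary key on both sides, so no duplicates can occur and the two forms return the same counts.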
