PG in & not in: comparing the alternatives

  • in VS join VS any VS exists
  1. Model A (left table tbl_a: 2,000,001 rows; right table tbl_b: 1,000,001 rows)

postgres=# create table tbl_a (a integer primary key, b char(128));

CREATE TABLE

Time: 67.026 ms

postgres=# create table tbl_b (a integer primary key, b char(128));

CREATE TABLE

Time: 60.716 ms

postgres=# insert into tbl_a values (generate_series(0,2000000),'a'||generate_series(0,2000000));

INSERT 0 2000001

Time: 4218.271 ms (00:04.218)

postgres=# insert into tbl_b values (generate_series(100000,1100000),'a'||generate_series(100000,1100000));

INSERT 0 1000001

Time: 2135.322 ms (00:02.135)

postgres=#

postgres=# select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);

  count 

---------

 1000001

(1 row)

 

Time: 629.656 ms

postgres=# select count(*) from tbl_a where a in (select a from tbl_b);

  count 

---------

 1000001

(1 row)

 

Time: 613.041 ms

postgres=# select count(*) from tbl_a where a = any (array (select a from tbl_b));

  count 

---------

 1000001

(1 row)

 

Time: 1391.568 ms (00:01.392)

postgres=#

 

postgres=# select count(*) from tbl_a where exists (select a from tbl_b where tbl_b.a=tbl_a.a);

  count 

---------

 1000001

(1 row)

 

Time: 556.529 ms

postgres=#

 

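In this data model the ranking (fastest first) is exists > in > join > any. in, join and exists all get the same Merge Join plan (see the plans below), so the few tens of milliseconds between them are run-to-run noise; only any is genuinely slower.

The corresponding execution plans: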

 

    1. in

 

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a where a in (select a from tbl_b);

                                                                         QUERY PLAN                                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=99469.19..99469.20 rows=1 width=8) (actual time=881.258..881.258 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=48602

   ->  Merge Join  (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=29.684..811.034 rows=1000001 loops=1)

         Inner Unique: true

         Merge Cond: (tbl_a.a = tbl_b.a)

         Buffers: shared hit=48602

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.011..260.072 rows=1100002 loops=1)

               Output: tbl_a.a

               Heap Fetches: 1100002

               Buffers: shared hit=25458

         ->  Index Only Scan using tbl_b_pkey on public.tbl_b  (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.005..234.057 rows=1000001 loops=1)

               Output: tbl_b.a

               Heap Fetches: 1000001

               Buffers: shared hit=23144

 Planning Time: 0.181 ms

 Execution Time: 881.288 ms

(17 rows)

 

Time: 881.762 ms

postgres=#

 

    2. join

 

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);

                                                                         QUERY PLAN                                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=99469.19..99469.20 rows=1 width=8) (actual time=882.490..882.490 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=48602

   ->  Merge Join  (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=29.831..812.149 rows=1000001 loops=1)

         Inner Unique: true

         Merge Cond: (tbl_a.a = tbl_b.a)

         Buffers: shared hit=48602

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.017..260.735 rows=1100002 loops=1)

               Output: tbl_a.a

               Heap Fetches: 1100002

               Buffers: shared hit=25458

         ->  Index Only Scan using tbl_b_pkey on public.tbl_b  (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.009..234.505 rows=1000001 loops=1)

               Output: tbl_b.a

               Heap Fetches: 1000001

               Buffers: shared hit=23144

 Planning Time: 0.170 ms

 Execution Time: 882.524 ms

(17 rows)

 

Time: 883.040 ms

postgres=#

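The join is executed as a merge join. My guess was that a hash join plan, had the planner chosen one, might be faster than in.

Disabling merge join to force a hash join turned out to be slower instead (with this setting, in also switches to a hash join and slows down the same way).

With the merge join, both tables are read through an Index Only Scan; with the forced hash join they are read with a Seq Scan, and the hash table on tbl_b spills to disk in 16 batches (temp read=8254 written=8254).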

postgres=# set enable_mergejoin = off;

SET

Time: 0.153 ms

postgres=# show enable_mergejoin;

 enable_mergejoin

------------------

 off

(1 row)

 

Time: 0.123 ms

postgres=#

 

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);

                                                                QUERY PLAN                                                               

------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=134915.90..134915.91 rows=1 width=8) (actual time=1339.586..1339.586 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=61226, temp read=8254 written=8254

   ->  Hash Join  (cost=46816.02..132415.89 rows=1000001 width=0) (actual time=351.648..1269.892 rows=1000001 loops=1)

         Inner Unique: true

         Hash Cond: (tbl_a.a = tbl_b.a)

         Buffers: shared hit=61226, temp read=8254 written=8254

         ->  Seq Scan on public.tbl_a  (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.010..313.553 rows=2000001 loops=1)

               Output: tbl_a.a

               Buffers: shared hit=40817

         ->  Hash  (cost=30409.01..30409.01 rows=1000001 width=4) (actual time=319.297..319.297 rows=1000001 loops=1)

               Output: tbl_b.a

               Buckets: 131072  Batches: 16  Memory Usage: 3225kB

               Buffers: shared hit=20409, temp written=2738

               ->  Seq Scan on public.tbl_b  (cost=0.00..30409.01 rows=1000001 width=4) (actual time=0.005..157.206 rows=1000001 loops=1)

                     Output: tbl_b.a

                     Buffers: shared hit=20409

 Planning Time: 0.118 ms

 Execution Time: 1339.627 ms

(19 rows)

 

Time: 1340.074 ms (00:01.340)

postgres=#
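The two join strategies compared above can be sketched in a few lines of Python. This is a simplified model, not PostgreSQL's executor: the function names are hypothetical, and the data is Model A scaled down by 1000 (outer keys 0..2000, inner keys 100..1100). The merge version walks two sorted key lists once and stops as soon as either side runs out, which is why the Merge Join plan reads only 1,100,002 of tbl_a's 2,000,001 index entries; the hash version must build a table on the inner side and then probe it with every outer row, like the Seq Scan over all of tbl_a.

```python
from typing import Iterable, List, Tuple


def merge_semi_join(outer: List[int], inner: List[int]) -> Tuple[int, int]:
    """Merge-join style match count over two ascending key lists.

    Returns (matches, outer_rows_read). Each outer key is counted at most once,
    and the walk stops as soon as either input runs out.
    """
    i = j = 0
    matches = 0
    while i < len(outer) and j < len(inner):
        if outer[i] < inner[j]:
            i += 1          # outer key too small: advance the outer index scan
        elif outer[i] > inner[j]:
            j += 1          # inner key too small: advance the inner index scan
        else:
            matches += 1    # keys equal: count this outer row once
            i += 1
    # If we stopped because the inner side ran out, outer[i] was already fetched.
    return matches, min(i + 1, len(outer))


def hash_semi_join(outer: Iterable[int], inner: Iterable[int]) -> int:
    """Hash-join style match count: build on inner, probe with every outer key."""
    table = set(inner)                              # build phase (the "Hash" node)
    return sum(1 for key in outer if key in table)  # probe phase reads all of outer


# Model A scaled down by 1000: outer keys 0..2000, inner keys 100..1100.
outer_keys = list(range(0, 2001))
inner_keys = list(range(100, 1101))
print(merge_semi_join(outer_keys, inner_keys))  # (1001, 1102)
print(hash_semi_join(outer_keys, inner_keys))   # 1001
```

The 1102 outer rows read mirrors the plan's rows=1100002 on tbl_a_pkey: once tbl_b's largest key has been passed, the merge can stop.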

 

 

 

    3. any

 

postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a where a = any (array(select a from tbl_b));

                                                                   QUERY PLAN                                                                   

-------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=30427.79..30427.80 rows=1 width=8) (actual time=1559.216..1559.216 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=3046285

   InitPlan 1 (returns $0)

     ->  Seq Scan on public.tbl_b  (cost=0.00..30409.01 rows=1000001 width=4) (actual time=0.010..146.601 rows=1000001 loops=1)

           Output: tbl_b.a

           Buffers: shared hit=20409

   ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..18.75 rows=10 width=0) (actual time=225.485..1492.757 rows=1000001 loops=1)

         Output: tbl_a.a

         Index Cond: (tbl_a.a = ANY ($0))

         Heap Fetches: 1000001

         Buffers: shared hit=3046285

 Planning Time: 0.097 ms

 Execution Time: 1559.249 ms

(14 rows)

 

Time: 1559.665 ms (00:01.560)

postgres=#
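The any plan first materializes the subquery into an array (InitPlan 1) and then uses `tbl_a.a = ANY ($0)` as an index condition. A rough model of that index condition, assuming the array is sorted and de-duplicated and each element then costs one search of the index, is sketched below with a hypothetical function name and scaled-down data. The buffer counts above are consistent with this: about 3,046,285 shared hits for 1,000,001 array elements, roughly three per element. The cost grows with the array length rather than with the table size, which is why the same query shape is cheap when the array is small.

```python
import bisect
from typing import List, Sequence, Tuple


def any_array_index_scan(index_keys: Sequence[int], array: List[int]) -> Tuple[int, int]:
    """Model `key = ANY(array)` against a sorted, unique index.

    Returns (matches, probes). The array is sorted and de-duplicated first,
    then each element costs one binary-search probe into the index.
    """
    matches = 0
    probes = 0
    for value in sorted(set(array)):
        probes += 1
        pos = bisect.bisect_left(index_keys, value)  # one "descent" per element
        if pos < len(index_keys) and index_keys[pos] == value:
            matches += 1
    return matches, probes


big_index = list(range(0, 2001))   # scaled-down tbl_a_pkey
small_index = list(range(10, 21))  # 11 keys, like tbl_c_pkey

print(any_array_index_scan(big_index, list(range(100, 1101))))  # (1001, 1001)
print(any_array_index_scan(big_index, small_index))             # (11, 11)
print(any_array_index_scan(small_index, big_index))             # (11, 2001)
```

The last line shows the unfavorable shape: a huge array probed into a tiny index still pays one probe per array element.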

 

    4. exists

postgres=# explain (analyze,buffers,costs,timing,verbose)  select count(*) from tbl_a where exists (select a from tbl_b where tbl_b.a=tbl_a.a);

                                                                         QUERY PLAN                                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=99469.19..99469.20 rows=1 width=8) (actual time=816.748..816.749 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=48602

   ->  Merge Join  (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=26.624..749.396 rows=1000001 loops=1)

         Inner Unique: true

         Merge Cond: (tbl_a.a = tbl_b.a)

         Buffers: shared hit=48602

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.010..224.811 rows=1100002 loops=1)

               Output: tbl_a.a

               Heap Fetches: 1100002

               Buffers: shared hit=25458

         ->  Index Only Scan using tbl_b_pkey on public.tbl_b  (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.006..205.844 rows=1000001 loops=1)

               Output: tbl_b.a

               Heap Fetches: 1000001

               Buffers: shared hit=23144

 Planning Time: 0.153 ms

 Execution Time: 816.782 ms

(17 rows)

 

Time: 817.252 ms

 

  2. Model B (left table tbl_a: 2,000,001 rows; right table tbl_c: 11 rows)

postgres=# create table tbl_c (a integer primary key , c char(128));

CREATE TABLE

postgres=# insert into tbl_c values (generate_series(10000,10010),'');

INSERT 0 11

 

postgres=# select count(*) from tbl_a where a in (select a from tbl_c);

 count

-------

    11

(1 row)

 

Time: 0.189 ms

 

postgres=# select count(*) from tbl_a where a =any(array (select a from tbl_c));

 count

-------

    11

(1 row)

 

Time: 0.160 ms

 

postgres=# select count(*) from tbl_a inner join tbl_c using (a);

 count

-------

    11

(1 row)

 

Time: 0.173 ms

 

postgres=# select count(*) from tbl_a where exists (select a from tbl_c where tbl_c.a=tbl_a.a);

 count

-------

    11

(1 row)

 

Time: 0.181 ms

postgres=#

 

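The differences are small: any > join > exists > in. All four queries finish in under 0.2 ms, so this ordering is within measurement noise; in, join and exists again share one plan (a Nested Loop that probes tbl_a_pkey 11 times).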

 

    1. in

 

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a in (select a from tbl_c);

                                                                QUERY PLAN                                                                

-------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=436.75..436.76 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=45

   ->  Nested Loop  (cost=0.43..436.40 rows=140 width=0) (actual time=0.011..0.029 rows=11 loops=1)

         Inner Unique: true

         Buffers: shared hit=45

         ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.004..0.005 rows=11 loops=1)

               Output: tbl_c.a, tbl_c.c

               Buffers: shared hit=1

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

               Output: tbl_a.a

               Index Cond: (tbl_a.a = tbl_c.a)

               Heap Fetches: 11

               Buffers: shared hit=44

 Planning Time: 0.078 ms

 Execution Time: 0.050 ms

(16 rows)

 

Time: 0.265 ms

postgres=#

 

    2. join

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a inner join tbl_c using (a);

                                                                QUERY PLAN                                                                

-------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=436.75..436.76 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=45

   ->  Nested Loop  (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.028 rows=11 loops=1)

         Inner Unique: true

         Buffers: shared hit=45

         ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

               Output: tbl_c.a, tbl_c.c

               Buffers: shared hit=1

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

               Output: tbl_a.a

               Index Cond: (tbl_a.a = tbl_c.a)

               Heap Fetches: 11

               Buffers: shared hit=44

 Planning Time: 0.066 ms

 Execution Time: 0.049 ms

(16 rows)

 

Time: 0.248 ms

postgres=#

    3. any

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a =any(array (select a from tbl_c));

                                                              QUERY PLAN                                                              

---------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=30.18..30.19 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=35

   InitPlan 1 (returns $0)

     ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.005 rows=11 loops=1)

           Output: tbl_c.a

           Buffers: shared hit=1

   ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..18.75 rows=10 width=0) (actual time=0.014..0.027 rows=11 loops=1)

         Output: tbl_a.a

         Index Cond: (tbl_a.a = ANY ($0))

         Heap Fetches: 11

         Buffers: shared hit=35

 Planning Time: 0.042 ms

 Execution Time: 0.046 ms

(14 rows)

 

Time: 0.211 ms

postgres=#

    4. exists

postgres=# explain (analyze,buffers,costs,timing,verbose)  select count(*) from tbl_a where exists (select a from tbl_c where tbl_c.a=tbl_a.a);

                                                                QUERY PLAN                                                                 

-------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=436.75..436.76 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=45

   ->  Nested Loop  (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.028 rows=11 loops=1)

         Inner Unique: true

         Buffers: shared hit=45

         ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

               Output: tbl_c.a, tbl_c.c

               Buffers: shared hit=1

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

               Output: tbl_a.a

               Index Cond: (tbl_a.a = tbl_c.a)

               Heap Fetches: 11

               Buffers: shared hit=44

 Planning Time: 0.069 ms

 Execution Time: 0.047 ms

(16 rows)

 

Time: 0.248 ms

postgres=#

 

  3. Model C (left table tbl_c: 11 rows; right table tbl_a: 2,000,001 rows)

postgres=# select count(*) from tbl_c where a in (select a from tbl_a);

 count

-------

    11

(1 row)

 

Time: 0.209 ms

postgres=#

 

postgres=# select count(*) from tbl_c inner join tbl_a using (a);

 count

-------

    11

(1 row)

 

Time: 0.173 ms

postgres=#

 

postgres=# select count(*) from tbl_c where a = any (array (select a from tbl_a));

 count

-------

    11

(1 row)

 

Time: 871.603 ms

postgres=#

 

postgres=# select count(*) from tbl_c where exists (select null from tbl_a where tbl_a.a=tbl_c.a);

 count

-------

    11

(1 row)

 

Time: 0.182 ms

postgres=#

 

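In this model: join > exists > in > any. in, join and exists share the same Nested Loop plan, while any is several thousand times slower: it first builds an array of all 2,000,001 tbl_a keys and then probes tbl_c_pkey once per element (about 2 million buffer hits).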

 

    1. in

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where a in (select a from tbl_a);

                                                                QUERY PLAN                                                                

-------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=436.75..436.76 rows=1 width=8) (actual time=0.055..0.055 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=45

   ->  Nested Loop  (cost=0.43..436.40 rows=140 width=0) (actual time=0.030..0.050 rows=11 loops=1)

         Inner Unique: true

         Buffers: shared hit=45

         ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.009..0.010 rows=11 loops=1)

               Output: tbl_c.a, tbl_c.c

               Buffers: shared hit=1

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..3.04 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=11)

               Output: tbl_a.a

               Index Cond: (tbl_a.a = tbl_c.a)

               Heap Fetches: 11

               Buffers: shared hit=44

 Planning Time: 0.136 ms

 Execution Time: 0.089 ms

(16 rows)

 

Time: 0.525 ms

 

    2. join

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c inner join tbl_a using (a);

                                                                QUERY PLAN                                                                

-------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=436.75..436.76 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=45

   ->  Nested Loop  (cost=0.43..436.40 rows=140 width=0) (actual time=0.011..0.029 rows=11 loops=1)

         Inner Unique: true

         Buffers: shared hit=45

         ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

               Output: tbl_c.a, tbl_c.c

               Buffers: shared hit=1

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

               Output: tbl_a.a

               Index Cond: (tbl_a.a = tbl_c.a)

               Heap Fetches: 11

               Buffers: shared hit=44

 Planning Time: 0.067 ms

 Execution Time: 0.049 ms

(16 rows)

 

Time: 0.255 ms

postgres=#

    3. any

 

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where a = any (array (select a from tbl_a));

                                                           QUERY PLAN                                                           

---------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=60828.61..60828.62 rows=1 width=8) (actual time=1029.756..1029.756 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=2040819

   InitPlan 1 (returns $0)

     ->  Seq Scan on public.tbl_a  (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.006..259.035 rows=2000001 loops=1)

           Output: tbl_a.a

           Buffers: shared hit=40817

   ->  Bitmap Heap Scan on public.tbl_c  (cost=4.13..11.70 rows=10 width=0) (actual time=1029.747..1029.749 rows=11 loops=1)

         Recheck Cond: (tbl_c.a = ANY ($0))

         Heap Blocks: exact=1

         Buffers: shared hit=2040819

         ->  Bitmap Index Scan on tbl_c_pkey  (cost=0.00..4.12 rows=10 width=0) (actual time=1029.739..1029.739 rows=11 loops=1)

               Index Cond: (tbl_c.a = ANY ($0))

               Buffers: shared hit=2040818

 Planning Time: 0.082 ms

 Execution Time: 1030.852 ms

(16 rows)

 

Time: 1031.234 ms (00:01.031)

postgres=#

    4. exists

 

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where exists (select null from tbl_a where tbl_a.a=tbl_c.a);

                                                                QUERY PLAN                                                                

-------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=436.75..436.76 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=45

   ->  Nested Loop  (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.027 rows=11 loops=1)

         Inner Unique: true

         Buffers: shared hit=45

         ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

               Output: tbl_c.a, tbl_c.c

               Buffers: shared hit=1

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)

               Output: tbl_a.a

               Index Cond: (tbl_a.a = tbl_c.a)

               Heap Fetches: 11

               Buffers: shared hit=44

 Planning Time: 0.067 ms

 Execution Time: 0.047 ms

(16 rows)

 

Time: 0.246 ms

postgres=#

  • not in VS except VS join VS not exists
  1. Model A (the same tbl_a and tbl_b as in the previous section: 2,000,001 vs 1,000,001 rows)


 

 

postgres=# select count(*) from (select a from tbl_a except select a from tbl_b) as t;

  count 

---------

 1000000

(1 row)

 

Time: 1727.102 ms (00:01.727)

postgres=#

 

postgres=# select count(*) from tbl_a left join tbl_b using (a) where tbl_b.b is null;

  count 

---------

 1000000

(1 row)

 

Time: 737.321 ms

postgres=#

 

postgres=# select count(*) from tbl_a where not exists (select null from tbl_b where tbl_a.a=tbl_b.a);

  count 

---------

 1000000

(1 row)

 

Time: 701.591 ms

postgres=#

 

In this model: not exists > join > except > not in (fastest first). No not in run is recorded above for this model.

 

    1. not in

 

 

    2. except

 

 

    3. join

 

    4. not exists

 

  2. Model B: small right-hand table (tbl_a: 2,000,001 rows; tbl_c, as created above: 11 rows)


 

postgres=# select count(*) from tbl_a where a not in (select a from tbl_c);

  count 

---------

 1999990

(1 row)

 

Time: 341.151 ms

postgres=#

 

postgres=# select count(*) from tbl_a left join tbl_c using (a) where tbl_c.a is null;

  count 

---------

 1999990

(1 row)

 

Time: 401.454 ms

postgres=#

 

postgres=# select count(*) from (select a from tbl_a except select a from tbl_c) as t;

  count 

---------

 1999990

(1 row)

 

Time: 1146.650 ms (00:01.147)

postgres=#

 

postgres=# select count(*) from tbl_a where not exists (select null from tbl_c where tbl_c.a=tbl_a.a);

  count 

---------

 1999990

(1 row)

 

Time: 402.370 ms

postgres=#

 

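In this model: not in > join > not exists > except. join and not exists produce the same Merge Anti Join plan, so the 1 ms gap between them is noise.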

 

    1. not in

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a not in (select a from tbl_c);

                                                          QUERY PLAN                                                         

------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=68328.60..68328.61 rows=1 width=8) (actual time=512.604..512.605 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=40818

   ->  Seq Scan on public.tbl_a  (cost=11.75..65828.61 rows=999994 width=0) (actual time=0.023..379.856 rows=1999990 loops=1)

         Output: tbl_a.a, tbl_a.b

         Filter: (NOT (hashed SubPlan 1))

         Rows Removed by Filter: 11

         Buffers: shared hit=40818

         SubPlan 1

           ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)

                 Output: tbl_c.a

                 Buffers: shared hit=1

 Planning Time: 0.091 ms

 Execution Time: 512.643 ms

(14 rows)

 

Time: 513.050 ms

postgres=#
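The `Filter: (NOT (hashed SubPlan 1))` line means the tbl_c subquery runs once, its results are loaded into an in-memory hash table, and every tbl_a row is checked against it. As far as I know, PostgreSQL only picks this hashed form when the subquery result is expected to fit in work_mem; otherwise it falls back to a plain SubPlan that is re-evaluated per outer row. NOT IN also has to follow SQL's three-valued logic: if the subquery can return NULL, a non-matching value yields NULL rather than true, so the row is filtered out. That is a semantic difference from NOT EXISTS and is, to my understanding, why the planner does not turn NOT IN into the Merge Anti Join it uses for the other two forms. A sketch under those assumptions, with hypothetical function names and scaled-down data (2001 outer keys, 11 subquery keys):

```python
from typing import Iterable, Optional, Set


def not_in_result(value: Optional[int], table: Set[int], has_null: bool,
                  subquery_empty: bool) -> Optional[bool]:
    """Three-valued result of `value NOT IN (subquery)`; None stands for SQL NULL."""
    if subquery_empty:
        return True                     # NOT IN over an empty set is true, even for NULL
    if value is None:
        return None                     # NULL compared with any row is unknown
    if value in table:
        return False                    # a match exists
    return None if has_null else True   # no match, but a NULL row makes it unknown


def hashed_not_in_count(outer: Iterable[Optional[int]],
                        subquery: Iterable[Optional[int]]) -> int:
    """Count outer rows kept by `WHERE x NOT IN (subquery)` using a hash set."""
    rows = list(subquery)                       # run the subquery once
    table = {v for v in rows if v is not None}  # the in-memory hash table
    has_null = any(v is None for v in rows)
    # WHERE keeps a row only when the condition is exactly true.
    return sum(1 for x in outer
               if not_in_result(x, table, has_null, not rows) is True)


print(hashed_not_in_count(range(0, 2001), range(1000, 1011)))  # 1990
print(hashed_not_in_count(range(0, 2001), [1000, None]))       # 0
```

With a primary-key subquery column, as with tbl_c.a here, NULLs cannot occur and NOT IN returns the same rows as NOT EXISTS.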

 

    2. except

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from (select a from tbl_a except select a from tbl_c) as t;

                                                                         QUERY PLAN                                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=368761.05..368761.06 rows=1 width=8) (actual time=2139.793..2139.794 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=40818, temp read=7229 written=7264

   ->  Subquery Scan on t  (cost=333760.54..363761.08 rows=1999989 width=0) (actual time=1205.992..2006.728 rows=1999990 loops=1)

         Output: t.a

         Buffers: shared hit=40818, temp read=7229 written=7264

         ->  SetOp Except  (cost=333760.54..343761.19 rows=1999989 width=8) (actual time=1205.991..1817.584 rows=1999990 loops=1)

               Output: "*SELECT* 1".a, (0)

               Buffers: shared hit=40818, temp read=7229 written=7264

               ->  Sort  (cost=333760.54..338760.86 rows=2000129 width=8) (actual time=1205.984..1447.060 rows=2000012 loops=1)

                     Output: "*SELECT* 1".a, (0)

                     Sort Key: "*SELECT* 1".a

                     Sort Method: external merge  Disk: 35280kB

                     Buffers: shared hit=40818, temp read=7229 written=7264

                     ->  Append  (cost=0.00..90830.23 rows=2000129 width=8) (actual time=0.012..666.137 rows=2000012 loops=1)

                           Buffers: shared hit=40818

                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..80816.78 rows=1999989 width=8) (actual time=0.011..500.831 rows=2000001 loops=1)

                                 Output: "*SELECT* 1".a, 0

                                 Buffers: shared hit=40817

                                 ->  Seq Scan on public.tbl_a  (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.009..290.951 rows=2000001 loops=1)

                                       Output: tbl_a.a

                                       Buffers: shared hit=40817

                           ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..12.80 rows=140 width=8) (actual time=0.010..0.012 rows=11 loops=1)

                                 Output: "*SELECT* 2".a, 1

                                 Buffers: shared hit=1

                                 ->  Seq Scan on public.tbl_c  (cost=0.00..11.40 rows=140 width=4) (actual time=0.007..0.009 rows=11 loops=1)

                                       Output: tbl_c.a

                                       Buffers: shared hit=1

 Planning Time: 0.108 ms

 Execution Time: 2147.694 ms

(30 rows)

 

Time: 2148.186 ms (00:02.148)

postgres=#
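The except plan above uses the sort-based SetOp strategy: both inputs are appended with a flag column (0 for the left branch, 1 for the right), the combined 2,000,012 rows are sorted (spilling about 35 MB to disk), and each group of equal values is emitted once if it appeared only on the left. The sketch below models that strategy under simplifying assumptions (in-memory sort, hypothetical function name); it also shows that EXCEPT without ALL removes duplicates, which the other three forms do not do.

```python
from itertools import groupby
from typing import Iterable, List


def sorted_setop_except(left: Iterable[int], right: Iterable[int]) -> List[int]:
    """`left EXCEPT right` (distinct) using a sort-based SetOp strategy."""
    # Append: tag every row with the branch it came from (0 = left, 1 = right).
    tagged = [(v, 0) for v in left] + [(v, 1) for v in right]
    # Sort: bring equal values together (the step that spilled to disk above).
    tagged.sort()
    result = []
    for value, group in groupby(tagged, key=lambda row: row[0]):
        flags = {flag for _, flag in group}
        # Keep the value once if it appears on the left and never on the right.
        if 0 in flags and 1 not in flags:
            result.append(value)
    return result


print(len(sorted_setop_except(range(0, 2001), range(1000, 1011))))  # 1990
print(sorted_setop_except([3, 1, 2, 2, 5, 5], [2, 4]))              # [1, 3, 5]
```

Because every row of the large table must pass through the sort, this approach stays expensive even when the other side has only 11 rows.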

 

    3. join

postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a left join tbl_c using (a) where tbl_c.a is null;

                                                                         QUERY PLAN                                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=87968.55..87968.56 rows=1 width=8) (actual time=751.825..751.826 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=46286

   ->  Merge Anti Join  (cost=0.57..82968.93 rows=1999849 width=0) (actual time=0.024..616.804 rows=1999990 loops=1)

         Merge Cond: (tbl_a.a = tbl_c.a)

         Buffers: shared hit=46286

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.018..409.008 rows=2000001 loops=1)

               Output: tbl_a.a

               Heap Fetches: 2000001

               Buffers: shared hit=46284

         ->  Index Only Scan using tbl_c_pkey on public.tbl_c  (cost=0.14..17.84 rows=140 width=4) (actual time=0.003..0.006 rows=11 loops=1)

               Output: tbl_c.a

               Heap Fetches: 11

               Buffers: shared hit=2

 Planning Time: 0.107 ms

 Execution Time: 751.875 ms

(16 rows)

 

Time: 752.358 ms

postgres=#

 

    4. not exists

postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where not exists (select null from tbl_c where tbl_c.a=tbl_a.a);

                                                                         QUERY PLAN                                                                        

------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=87968.55..87968.56 rows=1 width=8) (actual time=740.657..740.657 rows=1 loops=1)

   Output: count(*)

   Buffers: shared hit=46286

   ->  Merge Anti Join  (cost=0.57..82968.93 rows=1999849 width=0) (actual time=0.019..607.766 rows=1999990 loops=1)

         Merge Cond: (tbl_a.a = tbl_c.a)

         Buffers: shared hit=46286

         ->  Index Only Scan using tbl_a_pkey on public.tbl_a  (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.014..403.054 rows=2000001 loops=1)

               Output: tbl_a.a

               Heap Fetches: 2000001

               Buffers: shared hit=46284

         ->  Index Only Scan using tbl_c_pkey on public.tbl_c  (cost=0.14..17.84 rows=140 width=4) (actual time=0.003..0.006 rows=11 loops=1)

               Output: tbl_c.a

               Heap Fetches: 11

               Buffers: shared hit=2

 Planning Time: 0.104 ms

 Execution Time: 740.690 ms

(16 rows)

 

Time: 741.111 ms

postgres=#
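Both the left join ... is null form and not exists are planned as the same Merge Anti Join. A simplified model of that node, assuming both inputs arrive sorted (as the two Index Only Scans deliver them) and using a hypothetical function name: the walk emits an outer row whenever no equal inner key can still appear. Unlike the semi-join case, an anti join cannot stop early once the inner side is exhausted, because all remaining outer rows must still be returned; this is why the plan reads all 2,000,001 tbl_a index entries.

```python
from typing import List


def merge_anti_join(outer: List[int], inner: List[int]) -> int:
    """Count outer keys with no equal inner key; both lists ascending."""
    i = j = 0
    kept = 0
    while i < len(outer):
        if j >= len(inner) or outer[i] < inner[j]:
            kept += 1   # no inner key can equal this one: emit the outer row
            i += 1
        elif outer[i] > inner[j]:
            j += 1      # skip inner keys that are too small
        else:
            i += 1      # equal keys: the anti join drops this outer row
    return kept


print(merge_anti_join(list(range(0, 2001)), list(range(1000, 1011))))  # 1990
print(merge_anti_join(list(range(10, 21)), list(range(0, 2001))))      # 0
```

The first call mirrors Model B (large outer, 11 inner keys); the second mirrors Model C, where every one of the 11 outer keys finds a match.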

 

 

  3. Model C: small left-hand table (tbl_c: 11 rows; tbl_a: 2,000,001 rows)

 

postgres=# select count(*) from tbl_c where a not in (select a from tbl_a);

 count

-------

     0

(1 row)

 

Time: 7.407 ms

postgres=#

 

postgres=# select count(*) from (select a from tbl_c except select a from tbl_a) as t;

 count

-------

     0

(1 row)

 

Time: 339.787 ms

postgres=#

 

postgres=# select count(*) from tbl_c left join tbl_a using (a) where tbl_a.a is null;

 count

-------

     0

(1 row)

 

Time: 0.169 ms

postgres=#

 

postgres=# select count(*) from tbl_c where not exists (select null from tbl_a where tbl_a.a=tbl_c.a);

 count

-------

     0

(1 row)

 

Time: 0.184 ms

postgres=#

 

In this model: join > not exists > not in > except.

 

    1. not in

select count(*) from tbl_c where a not in (select a from tbl_a);

    2. except

 

    3. join

 

    4. not exists

 

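  • Summary

Rankings, fastest first, for in VS join VS any VS exists and for not in VS except VS join VS not exists:

Model A: left table 2M rows, right table 1M rows
  in / join / any / exists: exists > in > join > any
  not in / except / join / not exists: not exists > join > except > not in

Model B: left table 2M rows, right table 11 rows
  in / join / any / exists: any > join > exists > in
  not in / except / join / not exists: not in > join > not exists > except

Model C: left table 11 rows, right table 2M rows
  in / join / any / exists: join > exists > in > any
  not in / except / join / not exists: join > not exists > not in > except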

 

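These are only quick, single-run checks and should not be taken as proof of performance differences; any real difference has to be analyzed against the actual execution plans.

Also note that in and join are not fully equivalent; which one is correct depends on the query's semantics. in behaves as a semi-join and returns each outer row at most once even if the subquery yields duplicates (an implicit de-duplication), whereas a join returns one row per matching pair. Here the right-hand keys are primary keys (the plans show "Inner Unique: true"), which is why both forms return the same counts.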
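The semantic difference between in and join can be demonstrated with a tiny example. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (the semi-join versus join behavior shown is standard SQL, not PostgreSQL-specific); the tables t_outer and t_inner are hypothetical, and t_inner deliberately has no primary key so that key 1 can appear twice.

```python
import sqlite3
from typing import Tuple


def compare_in_exists_join() -> Tuple[int, int, int]:
    """Return (IN count, EXISTS count, JOIN count) when the inner side has duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t_outer (a INTEGER PRIMARY KEY);
        CREATE TABLE t_inner (a INTEGER);           -- no key: duplicates allowed
        INSERT INTO t_outer VALUES (1), (2), (3);
        INSERT INTO t_inner VALUES (1), (1), (2);   -- key 1 appears twice
        """
    )
    queries = (
        "SELECT count(*) FROM t_outer WHERE a IN (SELECT a FROM t_inner)",
        "SELECT count(*) FROM t_outer o WHERE EXISTS "
        "(SELECT 1 FROM t_inner i WHERE i.a = o.a)",
        "SELECT count(*) FROM t_outer JOIN t_inner USING (a)",
    )
    in_count, exists_count, join_count = (conn.execute(q).fetchone()[0] for q in queries)
    conn.close()
    return in_count, exists_count, join_count


print(compare_in_exists_join())  # (2, 2, 3)
```

in and exists each count outer rows 1 and 2 once, while the join counts row 1 twice because it matches two inner rows. Rewriting an in as a join is therefore only safe when the inner key is unique, or when the duplicates are removed explicitly.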

 

Reposted from blog.csdn.net/weixin_42767321/article/details/85601904