Citus, Part 3: reference tables

Copyright notice: this is an original article by the blogger. Please credit the source when reposting. https://blog.csdn.net/ctypyb2002/article/details/83993485

os: ubuntu 16.04
postgresql: 9.6.8
citus: postgresql-9.6-citus 8.0.0

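With installation finished, this post covers how to create tables.
Citus has two kinds of tables:

  1. distributed table: a sharded table whose rows are spread across the worker nodes. Mainly used for large fact tables.
  2. reference table: a broadcast table; every worker node stores an identical copy of the data. Mainly used for dimension tables.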
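The difference between the two table types can be sketched as a toy placement model. This is a simplified illustration, not Citus code: the helpers `place_reference` and `place_distributed` are hypothetical, `zlib.crc32` is only a stand-in for the hash function Citus actually applies to the distribution column, and the shard count of 4 is arbitrary.

```python
import zlib

WORKERS = ["pgsql2", "pgsql3"]  # the two worker nodes used in this post


def place_reference(rows, workers):
    """Reference table: every worker stores a full copy of every row."""
    return {w: list(rows) for w in workers}


def place_distributed(rows, workers, shard_count=4):
    """Distributed table (toy model): hash the first column to pick a shard,
    then spread shards over workers round-robin. Each row lands on one worker."""
    placement = {w: [] for w in workers}
    for row in rows:
        # crc32 is only a stand-in for Citus's real hash function
        shard = zlib.crc32(row[0].encode()) % shard_count
        placement[workers[shard % len(workers)]].append(row)
    return placement


rows = [(f"key{i}", f"val{i}") for i in range(1000)]
ref = place_reference(rows, WORKERS)
dist = place_distributed(rows, WORKERS)
print({w: len(r) for w, r in ref.items()})   # each worker holds all 1000 rows
print({w: len(r) for w, r in dist.items()})  # the 1000 rows are split between workers
```

The reference placement multiplies storage by the number of workers but lets any worker answer any query alone; the distributed placement stores each row once and spreads the work.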

Log in to the coordinator and create the reference tables

$ psql -h 192.168.0.92 -U cituser citusdb
citusdb=# create table ref_t0(c0 varchar(100),c1 varchar(100));
CREATE TABLE
citusdb=# create table ref_t1(c0 varchar(100),c1 varchar(100));
CREATE TABLE

citusdb=# select create_reference_table('ref_t0');
 create_reference_table 
------------------------
 
(1 row)

Time: 664.340 ms
citusdb=# select create_reference_table('ref_t1');
 create_reference_table 
------------------------
 
(1 row)

Time: 211.499 ms

citusdb=# \d+
                     List of relations
 Schema |  Name  | Type  |  Owner  |  Size   | Description 
--------+--------+-------+---------+---------+-------------
 public | ref_t0 | table | cituser | 0 bytes | 
 public | ref_t1 | table | cituser | 0 bytes |

Check on node pgsql2

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 0 bytes | 
 public | ref_t1_102073 | table | cituser | 0 bytes | 

Check on node pgsql3

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 0 bytes | 
 public | ref_t1_102073 | table | cituser | 0 bytes | 
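On the workers, each shard placement shows up as an ordinary table named after the logical table plus the shard ID (`ref_t0_102072`, `ref_t1_102073`). A reference table has exactly one shard, so both workers list the same name. The small sketch below only reproduces that naming pattern from the listings above; `shard_relation_name` and `reference_placements` are hypothetical helpers, and the authoritative mapping lives in Citus metadata tables such as `pg_dist_shard` on the coordinator.

```python
from __future__ import annotations


def shard_relation_name(table: str, shard_id: int) -> str:
    """Build the worker-side table name: <logical table>_<shard id>."""
    return f"{table}_{shard_id}"


def reference_placements(table: str, shard_id: int, workers: list[str]) -> dict[str, str]:
    """A reference table has a single shard, placed on every worker under the same name."""
    return {w: shard_relation_name(table, shard_id) for w in workers}


print(reference_placements("ref_t0", 102072, ["pgsql2", "pgsql3"]))
# {'pgsql2': 'ref_t0_102072', 'pgsql3': 'ref_t0_102072'}
```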

Insert data

citusdb=# insert into ref_t0(c0,c1) 
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,2000000) as id;

INSERT 0 2000000

citusdb=# insert into ref_t1(c0,c1) 
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,1000000) as id;

Check on node pgsql2

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 193 MB  | 
 public | ref_t1_102073 | table | cituser | 97 MB   |

Check on node pgsql3

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | ref_t0_102072 | table | cituser | 193 MB  | 
 public | ref_t1_102073 | table | cituser | 97 MB   |

As you can see, the data on worker nodes pgsql2 and pgsql3 is identical.
The coordinator node pgsql1 does not store any of the data itself, as shown below:

citusdb=# \d+
                     List of relations
 Schema |  Name  | Type  |  Owner  |  Size   | Description 
--------+--------+-------+---------+---------+-------------
 public | ref_t0 | table | cituser | 0 bytes | 
 public | ref_t1 | table | cituser | 0 bytes |
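Because every worker keeps a full copy, a reference table costs its size once per worker. A quick back-of-the-envelope check uses the sizes reported by `\d+` above, which are rounded and in 1024-based units. It shows that both tables use about 101 bytes per row, so 193 MB versus 97 MB simply reflects the 2:1 row count. The variable names are illustrative only.

```python
MB = 1024 * 1024  # psql's \d+ reports sizes in 1024-based units

sizes_mb = {"ref_t0": 193, "ref_t1": 97}  # per-worker sizes shown on pgsql2 / pgsql3
row_counts = {"ref_t0": 2_000_000, "ref_t1": 1_000_000}
workers = 2  # pgsql2 and pgsql3

per_worker_mb = sum(sizes_mb.values())  # space used on each worker
cluster_mb = per_worker_mb * workers    # every worker holds its own full copy
bytes_per_row = {t: sizes_mb[t] * MB / row_counts[t] for t in sizes_mb}

print(per_worker_mb, cluster_mb)  # 290 580
print({t: round(b, 1) for t, b in bytes_per_row.items()})  # {'ref_t0': 101.2, 'ref_t1': 101.7}
```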

Execution plan

citusdb=# explain verbose select count(1) from ref_t0;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Router)  (cost=0.00..0.00 rows=0 width=0)
   Output: remote_scan.count
   Task Count: 1
   Tasks Shown: All
   ->  Task
         Node: host=192.168.0.90 port=5432 dbname=citusdb
         ->  Aggregate  (cost=49692.00..49692.01 rows=1 width=8)
               Output: count(1)
               ->  Seq Scan on public.ref_t0_102072 ref_t0  (cost=0.00..44692.00 rows=2000000 width=0)
                     Output: c0, c1
(10 rows)

Time: 15.737 ms
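The plan contains a single task (`Task Count: 1`) sent to one worker: since every placement of a reference table holds all rows, one copy is enough to answer the query. Writes are the opposite case and must reach every placement, or the copies would diverge. The toy class below models only that asymmetry. It is a hypothetical illustration and simply reads from the first placement. How Citus actually chooses a placement is not covered in this post.

```python
class ToyReferenceTable:
    """Toy model of one reference table replicated on several workers."""

    def __init__(self, workers):
        self.placements = {w: [] for w in workers}

    def insert(self, row):
        # A write must be applied to every placement to keep the copies identical.
        for rows in self.placements.values():
            rows.append(row)

    def count(self):
        # A read needs only one placement: one task on one worker.
        worker = next(iter(self.placements))
        return worker, len(self.placements[worker])


t = ToyReferenceTable(["pgsql2", "pgsql3"])
for i in range(5):
    t.insert((f"c0-{i}", f"c1-{i}"))
print(t.count())  # ('pgsql2', 5)
```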

Multi-table join

citusdb=# 
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)
citusdb=# explain verbose select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Router)  (cost=0.00..0.00 rows=0 width=0)
   Output: remote_scan.count
   Task Count: 1
   Tasks Shown: All
   ->  Task
         Node: host=192.168.0.90 port=5432 dbname=citusdb
         ->  Aggregate  (cost=146414.00..146414.01 rows=1 width=8)
               Output: count(1)
               ->  Hash Join  (cost=42659.00..143914.00 rows=1000000 width=0)
                     Hash Cond: ((t0.c0)::text = (t1.c0)::text)
                     ->  Seq Scan on public.ref_t0_102072 t0  (cost=0.00..44692.00 rows=2000000 width=33)
                           Output: t0.c0
                     ->  Hash  (cost=22346.00..22346.00 rows=1000000 width=33)
                           Output: t1.c0
                           ->  Seq Scan on public.ref_t1_102073 t1  (cost=0.00..22346.00 rows=1000000 width=33)
                                 Output: t1.c0
(16 rows)

Time: 393.115 ms
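The join also runs as a single task, because both tables exist in full on the chosen worker. There PostgreSQL executes an ordinary Hash Join, building a hash table on the smaller `ref_t1` and probing it with a sequential scan of `ref_t0`. The sketch below reproduces that build/probe pattern in Python on scaled-down data: 2,000 and 1,000 rows instead of 2,000,000 and 1,000,000, sizes chosen here only for speed. It also shows why the count equals the size of `ref_t1`. Ids 1..1,000 are a subset of ids 1..2,000, so, assuming no MD5 collisions, each `ref_t1.c0` matches exactly one `ref_t0.c0`.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def c0_values(n: int) -> list[str]:
    """c0 column as produced by the INSERT: md5(md5(id::varchar)) for id = 1..n."""
    return [md5_hex(md5_hex(str(i))) for i in range(1, n + 1)]


def hash_join_count(probe: list[str], build: list[str]) -> int:
    """count(1) of probe JOIN build ON probe.c0 = build.c0, via a hash join."""
    buckets = defaultdict(int)  # Hash node: key -> number of build-side rows
    for key in build:
        buckets[key] += 1
    matches = 0
    for key in probe:  # Seq Scan on the probe side
        matches += buckets.get(key, 0)  # Hash Cond: t0.c0 = t1.c0
    return matches


t0 = c0_values(2000)  # stands in for ref_t0 (2,000,000 rows in the post)
t1 = c0_values(1000)  # stands in for ref_t1 (1,000,000 rows in the post)
print(hash_join_count(probe=t0, build=t1))  # 1000, the size of the smaller table
```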

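I ran a simple comparison between distributed tables and reference tables (tmp_t0 and tmp_t1 below are the distributed tables; their creation is not shown in this post). Once the data volume gets even moderately large, the advantage of sharded tables becomes apparent: across four runs, the distributed join took 1.3 to 2.4 s, versus 3.5 to 6.0 s for the reference tables.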
Run 1

citusdb=# \timing
Timing is on. 
citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 3478.988 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 2362.888 ms

Run 2

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 5947.913 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 1783.115 ms

Run 3

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 5951.641 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 1274.679 ms

Run 4

citusdb=# select count(1) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 4662.890 ms
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)

Time: 1347.655 ms
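Averaging the four runs above puts numbers on the comparison. The reference-table join averaged about 5.0 s and the distributed-table join about 1.7 s, roughly 3x faster. This is only arithmetic on the timings shown. With four runs and visible variance (3.5 s to 6.0 s for the reference tables), it should be read as indicative. A plausible but unverified explanation is that the reference-table join runs as one task on a single worker, while the distributed tables let the work be split across shards on both workers.

```python
from statistics import mean

# Timings (ms) copied from runs 1-4 above
reference_ms = [3478.988, 5947.913, 5951.641, 4662.890]    # ref_t0 JOIN ref_t1
distributed_ms = [2362.888, 1783.115, 1274.679, 1347.655]  # tmp_t0 JOIN tmp_t1

ref_avg = mean(reference_ms)
dist_avg = mean(distributed_ms)
speedup = ref_avg / dist_avg

print(f"reference avg:   {ref_avg:.1f} ms")   # 5010.4 ms
print(f"distributed avg: {dist_avg:.1f} ms")  # 1692.1 ms
print(f"speedup:         {speedup:.2f}x")     # 2.96x
```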

