Citus, part 2: distributed tables

Copyright notice: this is the author's original article; please credit the source when reposting. https://blog.csdn.net/ctypyb2002/article/details/83992999

os: ubuntu 16.04
postgresql: 9.6.8
citus: postgresql-9.6-citus 8.0.0

The IP layout is as follows:

192.168.0.92 pgsql1 -- coordinator node

192.168.0.90 pgsql2 -- worker node
192.168.0.88 pgsql3 -- worker node

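Citus has two kinds of tables:

  1. distributed table: a sharded table whose rows are spread across the worker nodes. Mainly used for large fact tables.
  2. reference table: a broadcast table; every worker node keeps an identical full copy. Mainly used for dimension tables.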
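
The walkthrough below only creates distributed tables. For contrast, here is a minimal sketch of creating a reference table with create_reference_table(). The table name dim_region and its columns are hypothetical and are not part of the original setup.

```sql
-- Hypothetical dimension table; the name and columns are illustrative only.
create table dim_region(region_id int primary key, region_name varchar(100));

-- Ask Citus to keep a full copy of the table on every worker node.
select create_reference_table('dim_region');
```

Because every worker holds all rows of a reference table, it can be joined locally with any shard of a distributed table.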

Log in to the coordinator and create a distributed table

$ psql -h 192.168.0.92 -U cituser citusdb
citusdb=# create table tmp_t0(c0 varchar(100),c1 varchar(100));
CREATE TABLE
citusdb=# select create_distributed_table('tmp_t0','c0');
 create_distributed_table 
--------------------------
 
(1 row)

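A brief note: create_distributed_table() does in one call what the older two-step API did:
-- declare the table as hash-distributed on c0
select master_create_distributed_table('tmp_t0','c0','hash');
-- set the shard count and the number of replicas per shard (here 2 and 2).
-- create_distributed_table() reads these from citus.shard_count (default 32) and citus.shard_replication_factor (default 1)
select master_create_worker_shards('tmp_t0',2,2);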
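
With create_distributed_table(), the shard count and replication factor come from the two settings named above, so they have to be changed before the call. A minimal sketch, assuming a session on the coordinator; tmp_t2 is a hypothetical table used only for this illustration.

```sql
-- Session-level settings that create_distributed_table() reads.
set citus.shard_count = 2;
set citus.shard_replication_factor = 2;

-- tmp_t2 is a hypothetical table, not part of the original walkthrough.
create table tmp_t2(c0 varchar(100), c1 varchar(100));
select create_distributed_table('tmp_t2', 'c0');

-- Check the values that are in effect for this session.
show citus.shard_count;
show citus.shard_replication_factor;
```

The settings only affect tables distributed after they are set; tmp_t0 above keeps its 32 shards.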

citusdb=# \d
         List of relations
 Schema |  Name  | Type  |  Owner  
--------+--------+-------+---------
 public | tmp_t0 | table | cituser
(1 row)

Check on node pgsql2

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | tmp_t0_102009 | table | cituser | 0 bytes | 
 public | tmp_t0_102011 | table | cituser | 0 bytes | 
 public | tmp_t0_102013 | table | cituser | 0 bytes | 
 public | tmp_t0_102015 | table | cituser | 0 bytes | 
 public | tmp_t0_102017 | table | cituser | 0 bytes | 
 public | tmp_t0_102019 | table | cituser | 0 bytes | 
 public | tmp_t0_102021 | table | cituser | 0 bytes | 
 public | tmp_t0_102023 | table | cituser | 0 bytes | 
 public | tmp_t0_102025 | table | cituser | 0 bytes | 
 public | tmp_t0_102027 | table | cituser | 0 bytes | 
 public | tmp_t0_102029 | table | cituser | 0 bytes | 
 public | tmp_t0_102031 | table | cituser | 0 bytes | 
 public | tmp_t0_102033 | table | cituser | 0 bytes | 
 public | tmp_t0_102035 | table | cituser | 0 bytes | 
 public | tmp_t0_102037 | table | cituser | 0 bytes | 
 public | tmp_t0_102039 | table | cituser | 0 bytes | 
(16 rows)

Check on node pgsql3

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | tmp_t0_102008 | table | cituser | 0 bytes | 
 public | tmp_t0_102010 | table | cituser | 0 bytes | 
 public | tmp_t0_102012 | table | cituser | 0 bytes | 
 public | tmp_t0_102014 | table | cituser | 0 bytes | 
 public | tmp_t0_102016 | table | cituser | 0 bytes | 
 public | tmp_t0_102018 | table | cituser | 0 bytes | 
 public | tmp_t0_102020 | table | cituser | 0 bytes | 
 public | tmp_t0_102022 | table | cituser | 0 bytes | 
 public | tmp_t0_102024 | table | cituser | 0 bytes | 
 public | tmp_t0_102026 | table | cituser | 0 bytes | 
 public | tmp_t0_102028 | table | cituser | 0 bytes | 
 public | tmp_t0_102030 | table | cituser | 0 bytes | 
 public | tmp_t0_102032 | table | cituser | 0 bytes | 
 public | tmp_t0_102034 | table | cituser | 0 bytes | 
 public | tmp_t0_102036 | table | cituser | 0 bytes | 
 public | tmp_t0_102038 | table | cituser | 0 bytes | 
(16 rows)
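
Instead of logging in to each worker, the same shard-to-node mapping can be read from the Citus metadata on the coordinator. A minimal sketch, assuming the pg_dist_shard table and the pg_dist_shard_placement view as they exist in this Citus version:

```sql
-- Run on the coordinator: one row per shard placement of tmp_t0,
-- with the hash range each shard covers and the worker that stores it.
select s.shardid,
       s.shardminvalue,
       s.shardmaxvalue,
       p.nodename,
       p.nodeport
from pg_dist_shard s
join pg_dist_shard_placement p on p.shardid = s.shardid
where s.logicalrelid = 'tmp_t0'::regclass
order by s.shardid;
```

Each shardid corresponds to the suffix of a worker table name, for example shard 102008 is stored as tmp_t0_102008.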

Insert data

citusdb=# insert into tmp_t0(c0,c1) 
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,2000000) as id;

INSERT 0 2000000
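
To see which shard a particular distribution-column value is routed to, Citus provides get_shard_id_for_distribution_column(). A minimal sketch, assuming that function is available in the installed version; the value used is simply the c0 generated for id = 1 by the insert above.

```sql
-- Run on the coordinator: returns the id of the shard that holds
-- rows whose c0 equals md5(md5('1')).
select get_shard_id_for_distribution_column('tmp_t0', md5(md5('1')));
```

The returned id matches the suffix of one of the tmp_t0_* tables on a worker.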

Check on node pgsql2

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | tmp_t0_102009 | table | cituser | 6216 kB | 
 public | tmp_t0_102011 | table | cituser | 6216 kB | 
 public | tmp_t0_102013 | table | cituser | 6200 kB | 
 public | tmp_t0_102015 | table | cituser | 6200 kB | 
 public | tmp_t0_102017 | table | cituser | 6240 kB | 
 public | tmp_t0_102019 | table | cituser | 6208 kB | 
 public | tmp_t0_102021 | table | cituser | 6160 kB | 
 public | tmp_t0_102023 | table | cituser | 6240 kB | 
 public | tmp_t0_102025 | table | cituser | 6184 kB | 
 public | tmp_t0_102027 | table | cituser | 6160 kB | 
 public | tmp_t0_102029 | table | cituser | 6224 kB | 
 public | tmp_t0_102031 | table | cituser | 6192 kB | 
 public | tmp_t0_102033 | table | cituser | 6176 kB | 
 public | tmp_t0_102035 | table | cituser | 6184 kB | 
 public | tmp_t0_102037 | table | cituser | 6256 kB | 
 public | tmp_t0_102039 | table | cituser | 6216 kB | 
(16 rows)

Check on node pgsql3

citusdb=# \d+
                        List of relations
 Schema |     Name      | Type  |  Owner  |  Size   | Description 
--------+---------------+-------+---------+---------+-------------
 public | tmp_t0_102008 | table | cituser | 6200 kB | 
 public | tmp_t0_102010 | table | cituser | 6184 kB | 
 public | tmp_t0_102012 | table | cituser | 6232 kB | 
 public | tmp_t0_102014 | table | cituser | 6200 kB | 
 public | tmp_t0_102016 | table | cituser | 6168 kB | 
 public | tmp_t0_102018 | table | cituser | 6264 kB | 
 public | tmp_t0_102020 | table | cituser | 6184 kB | 
 public | tmp_t0_102022 | table | cituser | 6200 kB | 
 public | tmp_t0_102024 | table | cituser | 6208 kB | 
 public | tmp_t0_102026 | table | cituser | 6216 kB | 
 public | tmp_t0_102028 | table | cituser | 6176 kB | 
 public | tmp_t0_102030 | table | cituser | 6152 kB | 
 public | tmp_t0_102032 | table | cituser | 6208 kB | 
 public | tmp_t0_102034 | table | cituser | 6184 kB | 
 public | tmp_t0_102036 | table | cituser | 6192 kB | 
 public | tmp_t0_102038 | table | cituser | 6184 kB | 
(16 rows)

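As shown, the data is spread almost evenly across the worker nodes pgsql2 and pgsql3: each worker holds 16 shards of roughly 6.2 MB each.
The coordinator node pgsql1 stores none of the table data, as shown below: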

citusdb=# \d+
                     List of relations
 Schema |  Name  | Type  |  Owner  |  Size   | Description 
--------+--------+-------+---------+---------+-------------
 public | tmp_t0 | table | cituser | 0 bytes | 
(1 row)
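
The per-shard figures can also be collected from the coordinator in one query. A minimal sketch, assuming the helper functions run_command_on_shards() and citus_table_size() are available in the installed Citus version:

```sql
-- Row count of every shard of tmp_t0; %s is replaced by the shard table name.
select shardid, success, result as row_count
from run_command_on_shards('tmp_t0', $cmd$ select count(*) from %s $cmd$)
order by shardid;

-- Total size of the distributed table, summed over all shards.
select pg_size_pretty(citus_table_size('tmp_t0'));
```

Roughly equal row counts across the 32 shards would confirm the even distribution seen in the \d+ output.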

Execution plan

citusdb=# explain verbose select count(1) from tmp_t0;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..0.00 rows=0 width=0)
   Output: COALESCE((pg_catalog.sum(remote_scan.count))::bigint, '0'::bigint)
   ->  Custom Scan (Citus Real-Time)  (cost=0.00..0.00 rows=0 width=0)
         Output: remote_scan.count
         Task Count: 32
         Tasks Shown: One of 32
         ->  Task
               Node: host=192.168.0.88 port=5432 dbname=citusdb
               ->  Aggregate  (cost=1552.94..1552.95 rows=1 width=8)
                     Output: count(1)
                     ->  Seq Scan on public.tmp_t0_102008 tmp_t0  (cost=0.00..1396.75 rows=62475 width=0)
                           Output: c0, c1
(12 rows)

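I don't fully understand why only Node: host=192.168.0.88 appears. I'm new to Citus, so I'm noting the question here for now. The likely reason is the line "Tasks Shown: One of 32": EXPLAIN prints only one of the 32 tasks, so only the node of that one task is listed.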
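
To list all 32 tasks together with their nodes, Citus has the citus.explain_all_tasks setting. A minimal sketch, run on the coordinator in the same session:

```sql
-- Print every task of the distributed plan instead of a single representative.
set citus.explain_all_tasks = on;

explain select count(1) from tmp_t0;

-- Restore the default, compact output.
reset citus.explain_all_tasks;
```

With the setting on, the plan should contain one Task entry per shard, with Node lines for both 192.168.0.90 and 192.168.0.88.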

Multi-table join

Create tmp_t1 on the coordinator node

citusdb=# create table tmp_t1(c0 varchar(100),c1 varchar(100));
CREATE TABLE
citusdb=# select create_distributed_table('tmp_t1','c0');
 create_distributed_table 
--------------------------
 
(1 row)

citusdb=# insert into tmp_t1(c0,c1) 
select md5(md5((id)::varchar)),md5((id)::varchar) from generate_series(1,1000000) as id;
citusdb=# 
citusdb=# select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
  count  
---------
 1000000
(1 row)
citusdb=# explain select count(1) from tmp_t0 t0,tmp_t1 t1 where t0.c0=t1.c0;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..0.00 rows=0 width=0)
   ->  Custom Scan (Citus Real-Time)  (cost=0.00..0.00 rows=0 width=0)
         Task Count: 32
         Tasks Shown: One of 32
         ->  Task
               Node: host=192.168.0.88 port=5432 dbname=citusdb
               ->  Aggregate  (cost=3114.55..3114.56 rows=1 width=8)
                     ->  Hash Join  (cost=1091.90..3036.22 rows=31329 width=0)
                           Hash Cond: ((t0.c0)::text = (t1.c0)::text)
                           ->  Seq Scan on tmp_t0_102008 t0  (cost=0.00..1396.75 rows=62475 width=33)
                           ->  Hash  (cost=700.29..700.29 rows=31329 width=33)
                                 ->  Seq Scan on tmp_t1_102040 t1  (cost=0.00..700.29 rows=31329 width=33)
(12 rows)

citusdb=# 

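Looks pretty good: the plan shows the join running on the workers as a Hash Join between matching shards (tmp_t0_102008 and tmp_t1_102040), again as 32 tasks.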
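
A join on the distribution column can run shard-by-shard like this when the two tables are co-located. One way to check is to compare their colocationid in the Citus metadata. A minimal sketch, assuming the pg_dist_partition catalog of this Citus version:

```sql
-- Run on the coordinator: both tables should report partmethod 'h' (hash).
-- If they also share the same colocationid, their shards are co-located.
select logicalrelid, partmethod, colocationid
from pg_dist_partition
where logicalrelid in ('tmp_t0'::regclass, 'tmp_t1'::regclass);
```

Here both tables were distributed on a varchar(100) column with the same shard count and replication factor, which is the condition under which Citus places them in the same co-location group by default.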

References:
https://www.citusdata.com/
https://docs.citusdata.com/en/v8.0/
https://docs.citusdata.com/en/stable/index.html
