Greenplum (5): Greenplum development notes, table creation conventions

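Background:

The underlying data warehouse of a telecom operator's business analysis system. In this offline analysis system, the physical model tables are mostly subject to batch operations: bulk inserts and updates, truncate, full-table group-by analysis, and so on.

1 Non-partitioned table, sample CREATE TABLE statement: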

--drop table dwctr.tc_term_xxx;
create table dwctr.tc_term_xxx(
    acyc_id              integer       not null
    ,bcyc_id             varchar(6)    not null
    ,user_id             varchar(14)
    ,open_date           date
    ,update_time         timestamp
    ,total_cost          decimal(10,2)
)
with (appendonly=true      --① append-only
      ,orientation=column  --② column-oriented storage
      ,compresstype=zlib   --③ compression algorithm
      ,compresslevel=5     --④ compression level
      ,oids=false)         --⑤ object identifiers
distributed by (user_id);

comment on table  dwctr.tc_term_xxx is '';
comment on column dwctr.tc_term_xxx.acyc_id      is 'Statistics billing cycle';
comment on column dwctr.tc_term_xxx.bcyc_id      is 'Statistics month';
comment on column dwctr.tc_term_xxx.user_id      is 'User identifier';
comment on column dwctr.tc_term_xxx.open_date    is 'Network activation date';
comment on column dwctr.tc_term_xxx.update_time  is 'Update time';
comment on column dwctr.tc_term_xxx.total_cost   is 'Cost';

/**
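① Append-only: originally this meant insert only, with no update or delete. Current versions allow update and delete by using a visimap (visibility map) to mark whether a row is visible or has been deleted. Truncate is allowed in either case.
Because update and delete are discouraged in routine development (they are slow), data tables are generally required to be appendonly tables.
② Column-oriented storage: if not enabled, the table defaults to row-oriented storage. Column orientation makes queries or aggregates over a single column very efficient. Note: column-oriented storage requires an appendonly table.
③ Compression algorithm: several algorithms are available. The current convention is to use zlib only. Note: compression requires an appendonly table.
④ Compression level: tied to the compression algorithm; for zlib the range is 1-9. The current convention fixes it at 5.
In short, both compression and column-oriented storage require the table to be appendonly.
⑤ Object identifiers: roughly equivalent to a row number. Generally not used; the default is false.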
**/
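
The preference for truncate plus bulk insert over row-level update and delete (note ① above) can be sketched as a full reload of the sample table. This is a minimal sketch, not part of the original conventions: the staging table dwctr.tmp_tc_term_xxx is a hypothetical name and is assumed to have the same columns as dwctr.tc_term_xxx.

```sql
-- Assumption: dwctr.tmp_tc_term_xxx is a hypothetical staging table
-- with the same column layout as the target table.
begin;

-- Clear the target in one step instead of deleting row by row.
truncate table dwctr.tc_term_xxx;

-- Reload in bulk from the staging table.
insert into dwctr.tc_term_xxx
      (acyc_id, bcyc_id, user_id, open_date, update_time, total_cost)
select acyc_id, bcyc_id, user_id, open_date, update_time, total_cost
from dwctr.tmp_tc_term_xxx;

commit;

-- Refresh planner statistics after the bulk load.
analyze dwctr.tc_term_xxx;
```

Wrapping the truncate and the insert in one transaction keeps readers from seeing an empty table if the load fails halfway.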

--Data tables use column-oriented AO tables with zlib compression at level 5
--WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=5)

--Parameter tables and dimension tables use row-oriented non-AO (heap) tables, uncompressed
--WITH (APPENDONLY=false)
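
For contrast with the AO sample, a parameter or dimension table under this rule might look like the sketch below. The table name dwctr.td_term_type_xxx and its columns are illustrative assumptions, not taken from the original system.

```sql
-- Hypothetical dimension table; all names here are assumptions for illustration.
create table dwctr.td_term_type_xxx(
    term_type_id         varchar(6)    not null
    ,term_type_name      varchar(64)
    ,update_time         timestamp
)
with (appendonly=false)          -- heap table: row-oriented, non-AO, uncompressed
distributed by (term_type_id);

comment on table  dwctr.td_term_type_xxx is 'Terminal type dimension (example)';
```

Small lookup tables like this are typically read whole and occasionally updated row by row, which is the access pattern a heap table handles well.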

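--Recommendations
--(1) Place the temporary table's CREATE TABLE statement below the target table's CREATE TABLE statement

--(2) Also place the INSERT statement for the history table's retention period below the CREATE TABLE statement of the corresponding target table's history table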

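Summary: this layout suits business scenarios where a single table holds fewer than a hundred million rows, is loaded in batches, and is frequently aggregated over a subset of its columns.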
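
A typical query of the kind this summary has in mind, written against the sample table above, might look as follows. It is only an illustrative sketch; the output column aliases are assumptions.

```sql
-- Aggregate over a subset of columns: with orientation=column,
-- only bcyc_id, user_id and total_cost need to be read from disk.
select bcyc_id
      ,count(distinct user_id) as user_cnt
      ,sum(total_cost)         as total_cost
from dwctr.tc_term_xxx
group by bcyc_id
order by bcyc_id;
```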

2 Partitioned table, sample CREATE TABLE statement:

--drop table dwctr.tch_term_xxx;
create table dwctr.tch_term_xxx(
    acyc_id              integer       not null
    ,bcyc_id             varchar(6)    not null
    ,user_id             varchar(14)
    ,open_date           date
    ,update_time         timestamp
    ,total_cost          decimal(10,2)
)
with (appendonly=true      --① append-only
      ,orientation=column  --② column-oriented storage
      ,compresstype=zlib   --③ compression algorithm
      ,compresslevel=5     --④ compression level
      ,oids=false)         --⑤ object identifiers
distributed by (user_id)
PARTITION BY LIST(bcyc_id)
(
PARTITION p190001 VALUES('190001') WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5),  --⑥ initial ("default") partition
PARTITION p201303 VALUES('201303') WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5)   --⑦ data partition
);

comment on table  dwctr.tch_term_xxx is 'XXXXXXXXX';
comment on column dwctr.tch_term_xxx.acyc_id      is 'Statistics billing cycle';
comment on column dwctr.tch_term_xxx.bcyc_id      is 'Statistics month';
comment on column dwctr.tch_term_xxx.user_id      is 'User identifier';
comment on column dwctr.tch_term_xxx.open_date    is 'Network activation date';
comment on column dwctr.tch_term_xxx.update_time  is 'Update time';
comment on column dwctr.tch_term_xxx.total_cost   is 'Total cost';

/**

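Other parameters are as described above.

⑥ Initial ("default") partition: every partitioned table should be created with one initialization partition, so that the table always has at least one partition to start from. Note that this is an ordinary LIST partition for the placeholder value '190001', not a Greenplum DEFAULT PARTITION, so rows with any other unmatched value are still rejected.
⑦ Data partition: the partition VALUES must match the data type of the table's partition key. If no matching partition exists, the insert fails. There is no need to define every partition up front; a partitioned-table stored procedure can add the required partitions automatically (see the partitioned-table stored procedure).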
**/
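
The stored procedure mentioned in ⑦ is not shown in these notes. As a rough sketch of the kind of statements it would presumably issue, the following adds one more monthly partition by hand and reloads an existing one. The month value '201304' is chosen for illustration, and the pg_partitions lookup assumes Greenplum 6 or earlier, where that catalog view exists.

```sql
-- Add the partition for a new month with the same storage options
-- as the existing partitions ('201304' is an illustrative value).
alter table dwctr.tch_term_xxx
    add partition p201304 values('201304')
    with (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5);

-- Reload a single month by truncating only its partition
-- instead of deleting rows from the whole table.
alter table dwctr.tch_term_xxx truncate partition p201303;

-- List the partitions that currently exist
-- (assumption: pg_partitions is available, i.e. Greenplum 6 or earlier).
select partitionname, partitionlistvalues
from pg_partitions
where schemaname = 'dwctr'
  and tablename  = 'tch_term_xxx'
order by partitionname;
```

Checking pg_partitions before loading a new month is one way to avoid the insert failure described in ⑦.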

--Data tables use column-oriented AO tables with zlib compression at level 5
--WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=5)

--Parameter tables and dimension tables use row-oriented non-AO (heap) tables, uncompressed
--WITH (APPENDONLY=false)

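--Recommendations
--(1) Place the temporary table's CREATE TABLE statement below the target table's CREATE TABLE statement
--(2) Also place the INSERT statement for the history table's retention period below the CREATE TABLE statement of the corresponding target table's history table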