I. Data import performance test
File name | File size | Row count | Import time | Table size on disk |
---|---|---|---|---|
customer.tbl | 317M | 3 million | 27s | 114M |
lineorder.tbl | 66G | 600 million | 1h25m16s | 16.7G |
part.tbl | 135M | 1.4 million | 7s | 24M |
supplier.tbl | 19M | 200 thousand | 1s | 7.5M |
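The source does not say how the row counts and on-disk sizes were collected. One common way, shown here only as a sketch, is to aggregate the active parts in `system.parts`; the column aliases are illustrative, and the `tutorial` database name comes from the DDL in this document.

```sql
-- Sketch: per-table row count plus uncompressed and on-disk size
-- for every MergeTree table in the `tutorial` database.
SELECT
    table,
    sum(rows) AS total_rows,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
WHERE database = 'tutorial'
  AND active            -- ignore parts already replaced by merges
GROUP BY table
ORDER BY table;
```

Comparing `uncompressed` with `on_disk` gives the compression ratio per table, which is what makes a 66G source file fit into 16.7G of storage.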
Table creation statements:
1) Main (fact) table
CREATE TABLE tutorial.lineorder
(
LO_ORDERKEY UInt32,
LO_LINENUMBER UInt8,
LO_CUSTKEY UInt32,
LO_PARTKEY UInt32,
LO_SUPPKEY UInt32,
LO_ORDERDATE Date,
LO_ORDERPRIORITY LowCardinality(String),
LO_SHIPPRIORITY UInt8,
LO_QUANTITY UInt8,
LO_EXTENDEDPRICE UInt32,
LO_ORDTOTALPRICE UInt32,
LO_DISCOUNT UInt8,
LO_REVENUE UInt32,
LO_SUPPLYCOST UInt32,
LO_TAX UInt8,
LO_COMMITDATE Date,
LO_SHIPMODE LowCardinality(String)
)
ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
2) Dimension table
CREATE TABLE tutorial.customer
(
C_CUSTKEY UInt32,
C_NAME String,
C_ADDRESS String,
C_CITY LowCardinality(String),
C_NATION LowCardinality(String),
C_REGION LowCardinality(String),
C_PHONE String,
C_MKTSEGMENT LowCardinality(String)
)
ENGINE = MergeTree ORDER BY (C_CUSTKEY);
3) Dimension table
CREATE TABLE tutorial.part
(
P_PARTKEY UInt32,
P_NAME String,
P_MFGR LowCardinality(String),
P_CATEGORY LowCardinality(String),
P_BRAND LowCardinality(String),
P_COLOR LowCardinality(String),
P_TYPE LowCardinality(String),
P_SIZE UInt8,
P_CONTAINER LowCardinality(String)
)
ENGINE = MergeTree ORDER BY P_PARTKEY;
4) Dimension table
CREATE TABLE tutorial.supplier
(
S_SUPPKEY UInt32,
S_NAME String,
S_ADDRESS String,
S_CITY LowCardinality(String),
S_NATION LowCardinality(String),
S_REGION LowCardinality(String),
S_PHONE String
)
ENGINE = MergeTree ORDER BY S_SUPPKEY;
II. Query tests
1. Basic tests
1) Simple aggregation + conditional filter
No. | SQL executed | First-run time | Average time |
---|---|---|---|
1 | SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue | 715ms | 390ms |
2 | SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue | 84ms | 64ms |
3 | SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue | 52ms | 40ms |
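The tables in this section report a first-run time and an average time, but the source does not describe how they were measured. As a sketch, assuming the server has `system.query_log` enabled, per-query timings can be pulled from the log after running each statement several times. The `ILIKE` filter on `tutorial.lineorder` is an illustrative choice, and "first run of the day" is only a proxy for a cold-cache run.

```sql
-- Make sure buffered log entries are written before reading them.
SYSTEM FLUSH LOGS;

-- Sketch: first and average duration per distinct query shape, today only.
SELECT
    normalized_query_hash,
    any(query) AS sample_query,
    count() AS runs,
    argMin(query_duration_ms, event_time) AS first_run_ms,  -- duration of the earliest run
    round(avg(query_duration_ms)) AS avg_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date = today()
  AND query ILIKE '%tutorial.lineorder%'
  AND query NOT ILIKE '%system.query_log%'   -- skip this monitoring query itself
GROUP BY normalized_query_hash
ORDER BY first_run_ms DESC;
```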
2) Simple aggregation + conditional filter + group by + order by
No. | SQL executed | First-run time | Average time |
---|---|---|---|
1 | SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year | 3.48s | 1.706s |
2 | SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year,LO_ORDERPRIORITY | 5.59s | 4.740s |
3 | SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT),sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year,LO_ORDERPRIORITY | 6.834s | 5.943s |
3) Multi-table join + simple aggregation + conditional filter
No. | SQL executed | First-run time | Average time |
---|---|---|---|
1 | SELECT sum(l.LO_EXTENDEDPRICE * l.LO_DISCOUNT) AS revenue FROM tutorial.lineorder l left join tutorial.part p on l.LO_PARTKEY = p.P_PARTKEY WHERE toYear(l.LO_ORDERDATE) = 1993 AND l.LO_DISCOUNT BETWEEN 1 AND 3 AND l.LO_QUANTITY < 25 AND p.P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228'; | 2.943s | 2.595s |
2 | SELECT sum(l.LO_EXTENDEDPRICE * l.LO_DISCOUNT) AS revenue FROM tutorial.lineorder l left join tutorial.customer c on c.C_CUSTKEY = l.LO_CUSTKEY WHERE toYYYYMM(l.LO_ORDERDATE) = 199401 AND l.LO_DISCOUNT BETWEEN 4 AND 6 AND l.LO_QUANTITY BETWEEN 26 AND 35 AND c.C_REGION = 'AMERICA'; | 1.80s | 868ms |
3 | SELECT sum(l.LO_EXTENDEDPRICE * l.LO_DISCOUNT) AS revenue FROM tutorial.lineorder l left join tutorial.supplier s on l.LO_SUPPKEY = s.S_SUPPKEY left join tutorial.part p on l.LO_PARTKEY = p.P_PARTKEY left join tutorial.customer c on l.LO_CUSTKEY = c.C_CUSTKEY WHERE toISOWeek(l.LO_ORDERDATE) = 6 AND toYear(l.LO_ORDERDATE) = 1994 AND l.LO_DISCOUNT BETWEEN 5 AND 7 AND l.LO_QUANTITY BETWEEN 26 AND 35 AND p.P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND c.C_REGION = 'AMERICA' and s.S_NATION = 'UNITED KINGDOM'; | 1.48s | 890ms |
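In the join queries above, the dimension tables are used only to filter rows of `lineorder`: the `WHERE` clause on the right-hand table discards unmatched rows, so the `left join` behaves like an inner join. Under the assumption that `P_PARTKEY` is unique in `tutorial.part` (not stated in the source), query 1 can be sketched with an `IN` subquery instead. This is an alternative formulation for comparison, not a query that was timed in this test.

```sql
-- Sketch: same filter as join query 1, expressed as a semi-join via IN.
-- Assumes P_PARTKEY is unique in tutorial.part, so each lineorder row
-- matches at most one part row and the sum is unchanged.
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM tutorial.lineorder
WHERE toYear(LO_ORDERDATE) = 1993
  AND LO_DISCOUNT BETWEEN 1 AND 3
  AND LO_QUANTITY < 25
  AND LO_PARTKEY IN
  (
      SELECT P_PARTKEY
      FROM tutorial.part
      WHERE P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228'
  );
```

Whether this form runs faster than the join depends on the ClickHouse version and settings, so it should be timed under the same conditions as the queries in the table.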
4) Multi-table join + simple aggregation + conditional filter + group by + order by
No. | SQL executed | First-run time | Average time |
---|---|---|---|
1 | SELECT sum(l.LO_REVENUE), toYear(l.LO_ORDERDATE) AS year, p.P_BRAND FROM tutorial.lineorder l left join tutorial.supplier s on l.LO_SUPPKEY = s.S_SUPPKEY left join tutorial.part p on l.LO_PARTKEY = p.P_PARTKEY WHERE p.P_CATEGORY = 'MFGR#12' AND s.S_REGION = 'AMERICA' GROUP BY toYear(l.LO_ORDERDATE), p.P_BRAND ORDER BY toYear(l.LO_ORDERDATE), p.P_BRAND; | 56.34s | 56s |
2 | SELECT sum(l.LO_REVENUE), toYear(l.LO_ORDERDATE) AS year, p.P_BRAND FROM tutorial.lineorder l left join tutorial.supplier s on l.LO_SUPPKEY = s.S_SUPPKEY left join tutorial.part p on l.LO_PARTKEY = p.P_PARTKEY WHERE p.P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND s.S_REGION = 'ASIA' GROUP BY toYear(l.LO_ORDERDATE), p.P_BRAND ORDER BY toYear(l.LO_ORDERDATE), p.P_BRAND; | 42.637s | 41.95s |
3 | SELECT sum(l.LO_REVENUE), toYear(l.LO_ORDERDATE) AS year, p.P_BRAND | 41.7s | 41.645s |
2. BitMap tests
Bitmap table structure:
No. | Field name | Description | Field type | Example |
---|---|---|---|---|
1 | LO_ORDERPRIORITY | Order priority | LowCardinality(String) | |
2 | LO_ORDERDATE | Order date | date | 1993-01-01 |
3 | ORDERKEY_BMP | Order key bitmap | bitmap | |
Table creation statement:
CREATE TABLE if not exists tutorial.lineorder_bmp(
LO_ORDERPRIORITY LowCardinality(String),
LO_ORDERDATE date,
ORDERKEY_BMP AggregateFunction(groupBitmap,UInt64)
)
ENGINE=AggregatingMergeTree()
partition by LO_ORDERPRIORITY
order by LO_ORDERDATE;
Loading the data:
INSERT INTO tutorial.lineorder_bmp
SELECT
LO_ORDERPRIORITY,
LO_ORDERDATE,
groupBitmapState(toUInt64(ORDERKEY_BMP))
from (
select l.LO_ORDERPRIORITY,l.LO_ORDERDATE, l.LO_ORDERKEY as ORDERKEY_BMP
from tutorial.lineorder l
)
group by LO_ORDERPRIORITY,LO_ORDERDATE;
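After the load, it can be worth checking that the bitmap table agrees with the source table. The sketch below compares exact distinct order counts from `tutorial.lineorder` with the bitmap-derived counts for a single day; the date `1993-01-01` is taken from the example column of the table design, and the column aliases are illustrative.

```sql
-- Exact distinct order keys per priority, computed from the fact table.
SELECT LO_ORDERPRIORITY, uniqExact(LO_ORDERKEY) AS orders_exact
FROM tutorial.lineorder
WHERE LO_ORDERDATE = '1993-01-01'
GROUP BY LO_ORDERPRIORITY
ORDER BY LO_ORDERPRIORITY;

-- The same counts from the pre-aggregated bitmap table.
-- groupBitmapMerge unions the stored bitmap states and returns the cardinality.
SELECT LO_ORDERPRIORITY, groupBitmapMerge(ORDERKEY_BMP) AS orders_bitmap
FROM tutorial.lineorder_bmp
WHERE LO_ORDERDATE = '1993-01-01'
GROUP BY LO_ORDERPRIORITY
ORDER BY LO_ORDERPRIORITY;
```

If the load worked as intended, the two result sets should match row for row.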
Test case 1: count the orders per order priority for each day
No. | SQL executed | First-run time | Average time |
---|---|---|---|
1 | select lb.LO_ORDERDATE ,lb.LO_ORDERPRIORITY ,bitmapCardinality(lb.ORDERKEY_BMP) | 2.9s | 2.9s |
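The statement in the table is cut off after its select list, so the full query is not known. The following is a sketch of one way to express the same report, not a reconstruction of the original. It uses `groupBitmapMerge` with an explicit `GROUP BY`, because an `AggregatingMergeTree` table can temporarily hold several rows for the same key until background merges combine them.

```sql
-- Sketch: distinct orders per day and per order priority from the bitmap table.
-- Grouping and merging at query time gives one row per (date, priority)
-- even if some parts have not been merged yet.
SELECT
    LO_ORDERDATE,
    LO_ORDERPRIORITY,
    groupBitmapMerge(ORDERKEY_BMP) AS order_cnt
FROM tutorial.lineorder_bmp
GROUP BY LO_ORDERDATE, LO_ORDERPRIORITY
ORDER BY LO_ORDERDATE, LO_ORDERPRIORITY;
```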
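Test case 2: compute the growth percentage of the item count for the highest order priority between 1993-01-01 and 1993-01-02
No. | SQL executed | First-run time | Average time |
---|---|---|---|
1 | select (c1-c0)/c1 | 3.328s | 2.7s |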
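Only the first line of the test case 2 statement survives, so the sketch below is an illustration of the idea rather than the query that was timed. It assumes that the highest priority is stored as `'1-URGENT'` (the usual Star Schema Benchmark value, not confirmed by the source) and keeps the source's expression `(c1 - c0) / c1`.

```sql
-- Sketch: growth between two days for one order priority.
-- '1-URGENT' is an assumed priority value; adjust to the actual data.
WITH
    (
        SELECT groupBitmapMerge(ORDERKEY_BMP)
        FROM tutorial.lineorder_bmp
        WHERE LO_ORDERDATE = '1993-01-01' AND LO_ORDERPRIORITY = '1-URGENT'
    ) AS c0,   -- distinct orders on the first day
    (
        SELECT groupBitmapMerge(ORDERKEY_BMP)
        FROM tutorial.lineorder_bmp
        WHERE LO_ORDERDATE = '1993-01-02' AND LO_ORDERPRIORITY = '1-URGENT'
    ) AS c1    -- distinct orders on the second day
SELECT (c1 - c0) / c1 AS growth;
```

Growth relative to the first day is more commonly defined as `(c1 - c0) / c0`; which denominator the original test intended cannot be determined from the truncated statement.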