A Quick Guide to ClickHouse Table Engines

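Table engines play a central role in ClickHouse. They determine how data is stored and read, whether concurrent reads and writes are supported, whether indexes are supported, which query types are available, whether replication is supported, and so on.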

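ClickHouse provides four families of table engines, each aimed at different scenarios. For example, the Log family is for analyzing small tables, the MergeTree family is for analyzing large tables, and the Integration family is for data integration. With this many options, new users can find the engines hard to understand and hard to choose between, so this article organizes them by family to build a clearer picture. I hope it helps you too. The Replicated engines and the Distributed engine are more complex and will be covered in a future post.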

Overview of ClickHouse Table Engines

The figure below shows all table engines listed in the official documentation:

Figure 1

Log Family Engines

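The Log family engines are relatively simple. They are mainly used to quickly write small amounts of data (up to about 1 million rows) and later read it all back. The Log engines share several characteristics: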

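Data is appended to disk sequentially
Updates and deletes are not supported
Indexes are not supported
Writes are not atomic
Inserts block queries on the table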

They differ as follows:

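TinyLog: does not support concurrent reads of data files, so query performance is low; the format is simple, which makes it suitable for temporarily storing intermediate data.
StripeLog: supports concurrent reads of data files, giving better query performance than TinyLog; stores all columns in a single file, so it creates fewer files than TinyLog.
Log: supports concurrent reads of data files, giving better query performance than TinyLog; stores each column in a separate file.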
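The file-layout difference can be illustrated with a small Python sketch. It assumes a deliberately simplified model: one file per column for the TinyLog/Log layout, and a single file holding consecutive per-block stripes of all columns for the StripeLog layout. The file names (`<column>.bin`, `data.bin`) and the text encoding are illustrative assumptions, and mark/index files are ignored; this is not ClickHouse's actual on-disk format.

```python
import os
import tempfile

def write_per_column(dirpath, columns):
    """Log/TinyLog-style layout (simplified): one file per column."""
    for name, values in columns.items():
        with open(os.path.join(dirpath, f"{name}.bin"), "w") as f:
            f.write("\n".join(map(str, values)))

def write_striped(dirpath, columns, block_size=2):
    """StripeLog-style layout (simplified): each block appends all of its
    columns, one stripe after another, to a single shared file."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    with open(os.path.join(dirpath, "data.bin"), "w") as f:
        for start in range(0, n_rows, block_size):
            for name in names:
                stripe = columns[name][start:start + block_size]
                f.write(f"{name}:{','.join(map(str, stripe))}\n")

columns = {
    "id": [0, 1, 2],
    "create_time": ["2023-03-12", "2023-03-13", "2023-03-14"],
    "comment": ["a", "b", "c"],
}

with tempfile.TemporaryDirectory() as per_col, tempfile.TemporaryDirectory() as striped:
    write_per_column(per_col, columns)
    write_striped(striped, columns)
    per_col_files = sorted(os.listdir(per_col))
    striped_files = sorted(os.listdir(striped))

print(per_col_files)  # ['comment.bin', 'create_time.bin', 'id.bin']
print(striped_files)  # ['data.bin']
```

With many columns, the per-column layout produces many files, which is why StripeLog keeps the file count low.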


Integration Family Engines

These engines are mainly used to import external data into ClickHouse or to operate on external data sources directly from ClickHouse:

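Kafka: imports data from Kafka topics directly into ClickHouse
MySQL: uses a MySQL table as the storage backend, so you can operate on MySQL data from ClickHouse
JDBC/ODBC: reads data from the data source specified by a JDBC or ODBC connection string
HDFS: reads data files of a specified format directly from HDFS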

Special Engines

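Memory: stores data in memory, so data is lost on restart. Query performance is excellent; suitable for small tables (under about 1 million rows) that do not need persistence. ClickHouse also uses it internally for temporary tables.
Buffer: buffers writes to a target table in memory; when the buffer meets certain conditions, the data is flushed to the target table.
File: stores data directly in a local file.
Null: written data is discarded and reads return nothing. Typically used together with materialized views as a pipeline.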
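The Buffer flush condition can be sketched as follows. According to the ClickHouse documentation, the Buffer engine takes `min_time`/`max_time`, `min_rows`/`max_rows` and `min_bytes`/`max_bytes` parameters and flushes to the target table when all `min_*` conditions or at least one `max_*` condition is met. The Python below only models that rule; the threshold values are made up, and whether each comparison is inclusive at the boundary is an assumption.

```python
from dataclasses import dataclass

@dataclass
class BufferThresholds:
    # Names mirror the Buffer engine parameters; the values used below are made up.
    min_time: float  # seconds since the first write to the buffer
    max_time: float
    min_rows: int
    max_rows: int
    min_bytes: int
    max_bytes: int

def should_flush(t: BufferThresholds, age_s: float, rows: int, nbytes: int) -> bool:
    """True when ALL min_* conditions hold or ANY max_* condition holds."""
    all_min = age_s >= t.min_time and rows >= t.min_rows and nbytes >= t.min_bytes
    any_max = age_s >= t.max_time or rows >= t.max_rows or nbytes >= t.max_bytes
    return all_min or any_max

t = BufferThresholds(10, 100, 10_000, 1_000_000, 10_000_000, 100_000_000)
print(should_flush(t, 5, 20_000, 20_000_000))   # False: min_time not reached yet
print(should_flush(t, 15, 20_000, 20_000_000))  # True: every min_* condition is met
print(should_flush(t, 1, 2_000_000, 1_000))     # True: max_rows exceeded
```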

MergeTree Family Engines

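The engines above serve specific purposes and have limited use cases. The MergeTree family is the main storage engine family provided by ClickHouse and supports all of ClickHouse's core features.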
The rest of this article focuses on the MergeTree, ReplacingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, SummingMergeTree, and AggregatingMergeTree engines.

MergeTree

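The MergeTree engine is mainly used for analyzing massive datasets and supports partitioning, sorted storage, a primary key index (a sparse index), data TTL, and more. MergeTree supports all ClickHouse SQL syntax, but some features behave differently from MySQL. For example, the primary key does not guarantee uniqueness.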

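The following example creates a MergeTree table test_tbl with primary key (id, create_time). Rows are stored sorted by (id, create_time), partitioned by create_time, and kept only for one month after their create_time (TTL).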

CREATE TABLE test_tbl (
  id UInt16,
  create_time Date,
  comment Nullable(String)
) ENGINE = MergeTree()
   PARTITION BY create_time
     ORDER BY  (id, create_time)
     PRIMARY KEY (id, create_time)
     TTL create_time + INTERVAL 1 MONTH
     SETTINGS index_granularity=8192;
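A minimal Python sketch of what this table definition implies, assuming a simplified model: each distinct `create_time` becomes its own partition, rows inside a partition are sorted by `(id, create_time)`, and a row is dropped once `create_time + INTERVAL 1 MONTH` has been reached. Clamping to the month's last day and the inclusive boundary are assumptions of the sketch. Note also that the sample rows used in this example are dated March 2023, so if you reproduce it much later, expired rows may be removed when parts are merged (for example by `OPTIMIZE ... FINAL`).

```python
import calendar
from collections import defaultdict
from datetime import date

def add_months(d: date, months: int) -> date:
    """d + INTERVAL n MONTH; clamping to the month's last day is an assumption."""
    y, m = divmod(d.month - 1 + months, 12)
    year, month = d.year + y, m + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def layout(rows, today: date):
    """Drop TTL-expired rows, group by PARTITION BY create_time,
    and sort each partition by ORDER BY (id, create_time)."""
    partitions = defaultdict(list)
    for row_id, create_time, comment in rows:
        if add_months(create_time, 1) <= today:  # TTL create_time + INTERVAL 1 MONTH
            continue
        partitions[create_time].append((row_id, create_time, comment))
    return {p: sorted(r, key=lambda x: (x[0], x[1])) for p, r in partitions.items()}

rows = [(2, date(2023, 3, 14), None), (0, date(2023, 3, 12), None),
        (1, date(2023, 3, 13), None), (0, date(2023, 3, 12), None)]
print(len(layout(rows, today=date(2023, 3, 20))))  # 3 partitions, nothing expired
print(layout(rows, today=date(2024, 1, 1)))         # {} (every row is past its TTL)
```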

Next, insert some sample data; note that it contains rows with duplicate primary keys:

insert into test_tbl values(0, '2023-03-12', null);
insert into test_tbl values(0, '2023-03-12', null);
insert into test_tbl values(1, '2023-03-13', null);
insert into test_tbl values(1, '2023-03-13', null);
insert into test_tbl values(2, '2023-03-14', null);

Query the data: although there are only 3 distinct primary keys, the query returns 5 rows.

SELECT count(*) FROM test_tbl

┌─count()─┐
│       5 │
└─────────┘

SELECT * FROM test_tbl

┌─id─┬─create_time─┬─comment─┐
│  2 │  2023-03-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2023-03-13 │ ᴺᵁᴸᴸ    │
│  1 │  2023-03-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

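Because MergeTree uses an LSM-tree-like structure, much of the storage-layer processing only happens during merges (compaction). So after forcing a merge, querying again still returns 5 rows. The only change is that, because the table is partitioned, the parts within each partition have been merged, so the two rows for id 0 now appear together.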

optimize table test_tbl final;

SELECT * FROM test_tbl

┌─id─┬─create_time─┬─comment─┐
│  2 │  2023-03-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2023-03-13 │ ᴺᵁᴸᴸ    │
│  1 │  2023-03-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

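As the example shows, although MergeTree has a primary key, the key is mainly used to speed up queries; unlike in MySQL, it does not guarantee record uniqueness. Even after a merge, the rows with the same primary key are still there, just stored next to each other.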
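The LSM-like behaviour can be sketched in Python: a plain MergeTree merge is essentially a k-way merge of parts that are each already sorted by the sorting key, so rows are re-ordered and combined into one part but never removed. This is a simplified model (real merges also rebuild the primary index, marks and so on), and `heapq.merge` merely stands in for ClickHouse's merge step.

```python
import heapq

def merge_parts(*parts):
    """Merge parts that are each sorted by (id, create_time); duplicates are kept."""
    return list(heapq.merge(*parts, key=lambda row: (row[0], row[1])))

# Two parts of the 2023-03-12 partition, one per INSERT statement.
part_a = [(0, "2023-03-12", None)]
part_b = [(0, "2023-03-12", None)]
print(merge_parts(part_a, part_b))
# [(0, '2023-03-12', None), (0, '2023-03-12', None)]
```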

ReplacingMergeTree

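To deal with duplicate primary keys in MergeTree, ClickHouse provides the ReplacingMergeTree engine, which deduplicates rows that have the same sorting key (here identical to the primary key). Example: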

-- Create the table
CREATE TABLE test_tbl_replacing (
  id UInt16,
  create_time Date,
  comment Nullable(String)
) ENGINE = ReplacingMergeTree()
   PARTITION BY create_time
     ORDER BY  (id, create_time)
     PRIMARY KEY (id, create_time)
     TTL create_time + INTERVAL 1 MONTH
     SETTINGS index_granularity=8192;

-- Insert rows with duplicate primary keys
insert into test_tbl_replacing values(0, '2023-03-12', null);
insert into test_tbl_replacing values(0, '2023-03-12', null);
insert into test_tbl_replacing values(1, '2023-03-13', null);
insert into test_tbl_replacing values(1, '2023-03-13', null);
insert into test_tbl_replacing values(2, '2023-03-14', null);


SELECT *
FROM test_tbl_replacing

Query id: 8b96a5d3-5089-4721-9b88-9ec86ae4816a

┌─id─┬─create_time─┬─comment─┐
│  2 │  2023-03-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2023-03-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

-- Force background compaction: 
optimize table test_tbl_replacing final;

SELECT *
FROM test_tbl_replacing

Query id: 33efdfe6-c8f1-4428-8307-352ee4c1d71b

┌─id─┬─create_time─┬─comment─┐
│  2 │  2023-03-14 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  1 │  2023-03-13 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘
┌─id─┬─create_time─┬─comment─┐
│  0 │  2023-03-12 │ ᴺᵁᴸᴸ    │
└────┴─────────────┴─────────┘

Although ReplacingMergeTree provides deduplication, it still has limitations:

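Until the data is fully merged, deduplication is not guaranteed: some rows may already be deduplicated while others are not yet;
In a distributed deployment, rows with the same primary key may sit in shards on different nodes, and rows in different shards cannot be deduplicated;
Background merges run at unpredictable times;
Forcing a merge manually on large datasets takes a long time and cannot meet real-time business requirements.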

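Therefore, ReplacingMergeTree is most useful when data only needs to be deduplicated eventually; it does not guarantee that query results are deduplicated by primary key.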

CollapsingMergeTree

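ClickHouse provides the CollapsingMergeTree engine to remove the limitations of ReplacingMergeTree. This engine requires a sign column, Sign, specified when the table is created. During background merges, rows with the same primary key and opposite Sign are collapsed, that is, deleted.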

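CollapsingMergeTree divides rows into two types by Sign: a row with Sign=1 is a state row, and a row with Sign=-1 is a cancel row. Writing data requires a new state row; deleting data requires a cancel row.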

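During background merges, state rows and cancel rows are collapsed (deleted) automatically. Before a merge has happened, both kinds of rows coexist. Therefore, to have rows with the same primary key collapse, the application layer needs supporting logic: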

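A delete needs a cancel row containing exactly the same data as the original state row (apart from the Sign column), so the application must remember the values of the original state row, or query the database for them before deleting.
Because merge timing is unpredictable, state and cancel rows may not have been collapsed yet when a query runs. ClickHouse also does not guarantee that rows with the same primary key live on the same node, and rows on different nodes cannot be collapsed. As a result, count() and sum(col) over-count. To get correct results, the application layer must rewrite its SQL: count() becomes sum(Sign), and sum(col) becomes sum(col * Sign). The following example illustrates this: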


-- Create the table
CREATE TABLE UAct
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

-- Insert a state row; note that Sign is 1
INSERT INTO UAct VALUES (4324182021466249494, 5, 146, 1);

-- Insert a cancel row to offset the state row above; note that Sign is -1 and all other values match the state row.
-- Also insert a new state row with the same primary key that updates PageViews from 5 to 6 and Duration from 146 to 185.
INSERT INTO UAct VALUES (4324182021466249494, 5, 146, -1), (4324182021466249494, 6, 185, 1);


SELECT * FROM UAct

Query id: c4ca984b-ac9d-46df-bdba-b2cf4e98dc1f

┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │   -1 │
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘


-- To get the correct sums, use the following SQL:


SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration
FROM UAct
GROUP BY UserID
HAVING sum(Sign) > 0

Query id: 829e4c7f-11af-47fc-b8d9-3f3a3105d491

┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘


-- Force background Compaction
optimize table UAct final;

SELECT * FROM UAct

┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
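Why the rewritten query is correct whether or not the merge has happened can be checked with a small Python emulation of `sum(col * Sign)` with `HAVING sum(Sign) > 0`. The emulation only mirrors the query above; it is not how ClickHouse executes it, and the function name is hypothetical.

```python
from collections import defaultdict

def sign_aggregate(rows):
    """Emulates:
        SELECT UserID, sum(PageViews * Sign), sum(Duration * Sign)
        FROM UAct GROUP BY UserID HAVING sum(Sign) > 0
    rows: (user_id, page_views, duration, sign) tuples, merged or not."""
    acc = defaultdict(lambda: [0, 0, 0])  # [sum(Sign), sum(PV * Sign), sum(Duration * Sign)]
    for user_id, page_views, duration, sign in rows:
        a = acc[user_id]
        a[0] += sign
        a[1] += page_views * sign
        a[2] += duration * sign
    return {uid: (pv, dur) for uid, (s, pv, dur) in acc.items() if s > 0}

uid = 4324182021466249494
before_merge = [(uid, 5, 146, 1), (uid, 5, 146, -1), (uid, 6, 185, 1)]
after_merge = [(uid, 6, 185, 1)]
print(sign_aggregate(before_merge))  # {4324182021466249494: (6, 185)}
print(sign_aggregate(after_merge))   # {4324182021466249494: (6, 185)}

naive = sum(pv for _, pv, _, _ in before_merge)
print(naive)  # 16: a plain sum(PageViews) over-counts before the merge
```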

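Although CollapsingMergeTree solves the problem of promptly deleting data with the same primary key, with continuous state changes and multi-threaded parallel writes, state rows and cancel rows may arrive out of order, and then they cannot be collapsed properly.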

The following example shows out-of-order rows that cannot be collapsed:

-- Create the table
CREATE TABLE UAct_order
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

-- Insert the cancel row first
INSERT INTO UAct_order VALUES (4324182021466249495, 5, 146, -1);
-- Then insert the state row
INSERT INTO UAct_order VALUES (4324182021466249495, 5, 146, 1);

-- Force a merge
optimize table UAct_order final;

-- Even after the merge, the rows for this primary key are not collapsed: both rows still exist.
select * from UAct_order;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249495 │         5 │      146 │   -1 │
│ 4324182021466249495 │         5 │      146 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
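The two outcomes above follow from how CollapsingMergeTree merges rows that share a sorting key. The ClickHouse documentation describes the rule roughly as: keep the last state row if there are more state rows than cancel rows; keep the first cancel row if there are more cancel rows; if the counts are equal, keep the first cancel row and the last state row when the last row is a state row, otherwise keep nothing. Below is a simplified Python model of that rule, assuming the rows are given in merge order (older parts first); the function name is hypothetical.

```python
def collapse_key_group(rows):
    """Simplified CollapsingMergeTree merge for rows sharing one sorting key.

    The last element of each tuple is Sign (1 = state row, -1 = cancel row)."""
    states = [r for r in rows if r[-1] == 1]
    cancels = [r for r in rows if r[-1] == -1]
    if len(states) > len(cancels):
        return [states[-1]]              # keep only the last state row
    if len(cancels) > len(states):
        return [cancels[0]]              # keep only the first cancel row
    if rows and rows[-1][-1] == 1:
        return [cancels[0], states[-1]]  # equal counts, last row is a state row
    return []                            # equal counts otherwise: nothing survives

uid = 4324182021466249494
in_order = [(uid, 5, 146, 1), (uid, 5, 146, -1), (uid, 6, 185, 1)]  # UAct example
uid2 = 4324182021466249495
out_of_order = [(uid2, 5, 146, -1), (uid2, 5, 146, 1)]              # UAct_order example
normal = [(uid, 5, 146, 1), (uid, 5, 146, -1)]

print(collapse_key_group(in_order))      # [(4324182021466249494, 6, 185, 1)]
print(collapse_key_group(out_of_order))  # both rows survive
print(collapse_key_group(normal))        # []
```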

VersionedCollapsingMergeTree

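To solve CollapsingMergeTree's out-of-order problem, the VersionedCollapsingMergeTree engine adds a version column to the table definition, which records which state row a cancel row corresponds to even when they are written out of order. During merges, rows with the same primary key, the same version, and opposite Sign are deleted.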

As with CollapsingMergeTree, the application layer still needs to rewrite its SQL to get correct results, changing count() to sum(Sign) and sum(col) to sum(col * Sign). Example:

-- Create the table
CREATE TABLE UAct_version
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8,
    Version UInt8
)
ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY UserID;


-- Insert the cancel row first; note Sign=-1, Version=1
INSERT INTO UAct_version VALUES (4324182021466249494, 5, 146, -1, 1);

-- Then insert the state row; note Sign=1, Version=1;
-- and a new state row with Sign=1, Version=2 that updates PageViews from 5 to 6 and Duration from 146 to 185.
INSERT INTO UAct_version VALUES (4324182021466249494, 5, 146, 1, 1),(4324182021466249494, 6, 185, 1, 2);


-- Querying before the merge shows all rows.
SELECT * FROM UAct_version;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │   -1 │
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘


-- To get the correct result, rewrite the SQL as follows:
-- sum(PageViews) => sum(PageViews * Sign), 
-- sum(Duration) => sum(Duration * Sign)
SELECT
    UserID,
    sum(PageViews * Sign) AS PageViews,
    sum(Duration * Sign) AS Duration
FROM UAct_version
GROUP BY UserID
HAVING sum(Sign) > 0;
┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘


-- Force a merge
optimize table UAct_version final;


-- After the merge, the result is correct even though the rows were written out of order.
select * from UAct_version;
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┬─Version─┐
│ 4324182021466249494 │         6 │      185 │    1 │       2 │
└─────────────────────┴───────────┴──────────┴──────┴─────────┘
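A simplified Python model of the VersionedCollapsingMergeTree rule described above: every pair of rows with the same key, the same version and opposite Sign cancels out, regardless of the order in which they arrive. The tuple layout `(UserID, PageViews, Duration, Sign, Version)` follows the `UAct_version` table; the function itself is a hypothetical illustration, not ClickHouse's implementation.

```python
def versioned_collapse(rows):
    """rows: (user_id, page_views, duration, sign, version) tuples in any order."""
    pending = {}  # (user_id, version) -> rows still waiting for an opposite-Sign partner
    for row in rows:
        user_id, _, _, sign, version = row
        group = pending.setdefault((user_id, version), [])
        partner = next((r for r in group if r[3] == -sign), None)
        if partner is not None:
            group.remove(partner)  # a matching pair is deleted
        else:
            group.append(row)
    return [r for group in pending.values() for r in group]

uid = 4324182021466249494
rows = [(uid, 5, 146, -1, 1), (uid, 5, 146, 1, 1), (uid, 6, 185, 1, 2)]
print(versioned_collapse(rows))                  # [(4324182021466249494, 6, 185, 1, 2)]
print(versioned_collapse(list(reversed(rows))))  # same result: order does not matter
```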

SummingMergeTree

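ClickHouse's SummingMergeTree engine pre-aggregates data by summing over the primary key. During background merges, multiple rows with the same primary key are summed and replaced by a single row, which both reduces storage and speeds up aggregate queries. Keep the following three points in mind: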

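ClickHouse sums rows by primary key only during background merges, whose timing is unpredictable, so some data may not have been summed yet while other data already has. Aggregation queries therefore still need a GROUP BY clause.
When pre-aggregating, ClickHouse aggregates all columns except the primary key columns. Summable columns (such as numeric types) are summed directly; for non-summable columns (such as strings), an arbitrary value is kept.
It is generally recommended to combine SummingMergeTree with MergeTree: the MergeTree table stores the detailed data, and the SummingMergeTree table stores pre-aggregated results to speed up queries.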
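The first point above can be illustrated with a simplified Python model of a SummingMergeTree merge: rows with the same key are collapsed into one row whose numeric columns are summed. It ignores non-numeric columns and other details of the real engine, and the function name is hypothetical. Because a later INSERT creates a new part that has not been merged yet, a final GROUP BY-style summation is still needed to get correct totals.

```python
from collections import defaultdict

def summing_merge(rows, key_cols, sum_cols):
    """Collapse rows with equal key_cols into one row, summing sum_cols."""
    acc = defaultdict(lambda: dict.fromkeys(sum_cols, 0))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for c in sum_cols:
            acc[key][c] += row[c]
    return [dict(zip(key_cols, k), **sums) for k, sums in sorted(acc.items())]

rows = [{"key": 1, "value": 1}, {"key": 1, "value": 2}, {"key": 2, "value": 1}]
merged = summing_merge(rows, ["key"], ["value"])
print(merged)  # [{'key': 1, 'value': 3}, {'key': 2, 'value': 1}]

# A later INSERT lands in a new part that has not been merged yet,
# so SELECT * would see key 1 twice; summing again (GROUP BY) fixes it.
visible = merged + [{"key": 1, "value": 10}]
print(summing_merge(visible, ["key"], ["value"]))  # [{'key': 1, 'value': 13}, {'key': 2, 'value': 1}]
```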

Example:

-- Create the table
CREATE TABLE summtt
(
    key UInt32,
    value UInt32
)
ENGINE = SummingMergeTree()
ORDER BY key;

-- Insert data
INSERT INTO summtt Values(1,1),(1,2),(2,1);

-- Before the merge, rows with the same primary key coexist
select * from summtt;
┌─key─┬─value─┐
│   1 │     1 │
│   1 │     2 │
│   2 │     1 │
└─────┴───────┘

-- Aggregate with GROUP BY
SELECT key, sum(value) FROM summtt GROUP BY key
┌─key─┬─sum(value)─┐
│   2 │          1 │
│   1 │          3 │
└─────┴────────────┘

-- Force a merge
optimize table summtt final;

-- After the merge, rows with the same primary key have been summed
select * from summtt;
┌─key─┬─value─┐
│   1 │     3 │
│   2 │     1 │
└─────┴───────┘


-- In practice, you still need GROUP BY
SELECT key, sum(value) FROM summtt GROUP BY key
┌─key─┬─sum(value)─┐
│   2 │          1 │
│   1 │          3 │
└─────┴────────────┘

AggregatingMergeTree

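AggregatingMergeTree is another pre-aggregation engine for speeding up aggregate queries. The difference from SummingMergeTree is that SummingMergeTree always sums the non-primary-key columns, whereas AggregatingMergeTree lets you specify a different aggregate function for each column.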

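AggregatingMergeTree's syntax is a little more complex: it is used together with materialized views or the AggregateFunction column type. Writes use the -State form of an aggregate function (e.g., sumState), and queries use the -Merge form (e.g., sumMerge).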
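The reason for the -State/-Merge pair can be shown with a Python stand-in: each part stores a partial aggregation state rather than a final number, and states from different parts are combined before the result is finalized. Here a plain set models the `uniq` state; this is an assumption for illustration only, since ClickHouse's `uniq` actually uses an approximate algorithm with its own state format. The function names merely echo `uniqState`/`uniqMerge` and are hypothetical.

```python
def uniq_state(values):
    """Stand-in for uniqState: the partial state is the set of distinct values."""
    return set(values)

def uniq_merge(states):
    """Stand-in for uniqMerge: combine the partial states, then finalize."""
    combined = set()
    for s in states:
        combined |= s
    return len(combined)

# UserIDs seen in two parts that have not been merged yet.
part_a = [1, 2, 3]
part_b = [3, 4]
states = [uniq_state(part_a), uniq_state(part_b)]

print(uniq_merge(states))  # 4 distinct users
naive = sum(len(uniq_state(p)) for p in (part_a, part_b))
print(naive)  # 5: adding finalized per-part counts double-counts user 3
```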

Example 1, using a materialized view:

-- Detail table
CREATE TABLE visits
(
    UserID UInt64,
    CounterID UInt8,
    StartDate Date,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;

-- Create a materialized view over the detail table that pre-aggregates it
-- Note: the pre-aggregation functions are sumState and uniqState, i.e. the <agg>State form used for writing.
CREATE MATERIALIZED VIEW visits_agg_view
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate)
AS SELECT
    CounterID,
    StartDate,
    sumState(Sign)    AS Visits,
    uniqState(UserID) AS Users
FROM visits
GROUP BY CounterID, StartDate;

-- Insert detail data
INSERT INTO visits VALUES(0, 0, '2019-11-11', 1);
INSERT INTO visits VALUES(1, 1, '2019-11-12', 1);

-- Final aggregation over the materialized view
-- Note: the aggregate functions used are sumMerge and uniqMerge, i.e. the <agg>Merge form used for querying.
SELECT
    StartDate,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM visits_agg_view
GROUP BY StartDate
ORDER BY StartDate;

-- The plain functions sum and uniq cannot be used on these columns;
-- the following SQL fails with: Illegal type AggregateFunction(sum, Int8) of argument
SELECT
    StartDate,
    sum(Visits),
    uniq(Users)
FROM visits_agg_view
GROUP BY StartDate
ORDER BY StartDate;

Example 2:

-- Detail table
CREATE TABLE detail_table
(   CounterID UInt8,
    StartDate Date,
    UserID UInt64
) ENGINE = MergeTree() 
PARTITION BY toYYYYMM(StartDate) 
ORDER BY (CounterID, StartDate);

-- Insert detail data
INSERT INTO detail_table VALUES(0, '2019-11-11', 1);
INSERT INTO detail_table VALUES(1, '2019-11-12', 1);

-- Create the pre-aggregation table
-- Note: the UserID column has type AggregateFunction(uniq, UInt64)
CREATE TABLE agg_table
(   CounterID UInt8,
    StartDate Date,
    UserID AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() 
PARTITION BY toYYYYMM(StartDate) 
ORDER BY (CounterID, StartDate);

-- Read from the detail table and insert into the aggregate table.
-- Note: the query uses uniqState, the <agg>State form used for writing
INSERT INTO agg_table
select CounterID, StartDate, uniqState(UserID)
from detail_table
group by CounterID, StartDate;

-- A plain INSERT ... VALUES cannot write raw values into the AggregateFunction column of an AggregatingMergeTree table.
-- This SQL fails with: Cannot convert UInt64 to AggregateFunction(uniq, UInt64)
INSERT INTO agg_table VALUES(1, '2019-11-12', 1);

-- Query the aggregate table.
-- Note: the SELECT uses uniqMerge, the <agg>Merge form used for querying
SELECT uniqMerge(UserID) AS state 
FROM agg_table 
GROUP BY CounterID, StartDate;

Summary

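This article gave an overview of ClickHouse table engines, focusing on the MergeTree family and verifying each engine's behaviour with examples. I hope it helps. References: https://programmer.help/blogs/how-to-choose-clickhouse-table-engine.html; official documentation: https://clickhouse.com/docs/en/engines/table-engines#mergetree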