ClickHouse's New Feature: the Write-Ahead Log (WAL)

Newer versions of ClickHouse add support for a write-ahead log (WAL).
Before introducing WAL, let's review the most basic merge process of MergeTree.

What is the problem with high-frequency writes to MergeTree?

CREATE TABLE test
(
    id UInt32,
    name String,
    age UInt8,
    shijian Date
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(shijian)
ORDER BY id;

insert into test values(1001,'张三',18,today());
insert into test values(1002,'李四',28,today());
insert into test values(1003,'王五',38,today());

At this point, MergeTree has created three part directories, one for each INSERT:

[Figure: the three part directories on disk]

[Figure: part directories after a manual merge]

With the ClickHouse MergeTree engine, data is always written to disk as data parts, and a part cannot be modified once it has been written. Every INSERT batch creates at least one new part directory on disk, split according to the partitioning rules. To keep the number of parts from growing without bound, parts that belong to the same partition are merged in the background at some later point into a brand-new part. This repeated merging of parts is where the name "merge tree" comes from. If many clients each write small batches at high frequency, new parts are created faster than merges can combine them, and eventually the following error is raised:

Too many parts (N). Merges are processing significantly slower than inserts.
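To see why small, frequent INSERTs lead to this error, here is a toy model of a single partition. It is a simplification, not ClickHouse's merge scheduler: every INSERT adds one part, each background merge folds up to `merge_batch` parts into one, and inserts fail once the active part count reaches `parts_to_throw_insert` (a real MergeTree setting, historically 300 by default). The earlier slowdown governed by `parts_to_delay_insert` is not modelled, and `simulate`, `merge_batch` and the tick-based timing are hypothetical.

```python
# Toy model of ONE MergeTree partition receiving frequent small INSERTs.
# Assumptions (not ClickHouse internals): each INSERT creates exactly one part,
# each background merge folds up to `merge_batch` parts into one new part, and
# time advances in discrete ticks. `parts_to_throw_insert` mirrors the real
# MergeTree setting of that name; `parts_to_delay_insert` is not modelled.


class TooManyParts(Exception):
    pass


def simulate(inserts_per_tick: int, merges_per_tick: int, ticks: int,
             merge_batch: int = 10, parts_to_throw_insert: int = 300) -> int:
    """Return the active part count after `ticks` ticks, or raise TooManyParts."""
    active_parts = 0
    for _ in range(ticks):
        for _ in range(inserts_per_tick):
            if active_parts >= parts_to_throw_insert:
                raise TooManyParts(
                    f"Too many parts ({active_parts}). "
                    "Merges are processing significantly slower than inserts."
                )
            active_parts += 1  # every INSERT writes a new immutable part
        for _ in range(merges_per_tick):
            if active_parts < 2:
                break  # nothing left to merge
            batch = min(merge_batch, active_parts)
            active_parts -= batch - 1  # `batch` parts become one merged part
    return active_parts
```

With 5 inserts and one merge per tick, the partition settles at a single part. With 50 inserts per tick, each merge removes only 9 parts, so the count climbs by 41 per tick and the error is raised during the eighth tick.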

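The write-ahead log is intended to solve this problem and improve write performance. Combined with in-memory parts, it allows small inserts to be buffered in memory rather than each one creating a part directory on disk. To support this, newer versions of ClickHouse add several MergeTree settings: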

	M(SettingUInt64, min_bytes_for_wide_part, 0, xxxxxxxx, 0) \
	M(SettingUInt64, min_rows_for_wide_part, 0, xxxxxxxxx, 0) \
	M(SettingUInt64, min_bytes_for_compact_part, 0, xxxxx, 0) \
	M(SettingUInt64, min_rows_for_compact_part, 0, xxxxxx, 0) \
	M(SettingBool, in_memory_parts_enable_wal, true, xxxx, 0) \
	M(SettingUInt64, write_ahead_log_max_bytes, 1024 * 1024 * 1024, xxxx, 0) \

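Of these, in_memory_parts_enable_wal defaults to true, so the write-ahead log is enabled by default, and write_ahead_log_max_bytes limits the size of the log (1024 * 1024 * 1024 bytes, i.e. 1 GiB, by default). Because both min_*_for_compact_part settings default to 0, in-memory parts (and therefore WAL writes) only appear after one of those thresholds is raised, as in the table below.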
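The four size thresholds fit together roughly as follows. This Python sketch reflects our reading of how a new part's layout is chosen (we believe it corresponds to `MergeTreeData::choosePartType` in the ClickHouse source, but treat that as an assumption). A part below either compact threshold becomes an in-memory part, a part below either wide threshold becomes Compact, and anything larger is Wide. `PartTypeSettings` and `choose_part_type` are illustrative names, not ClickHouse APIs.

```python
from dataclasses import dataclass


@dataclass
class PartTypeSettings:
    # Field names mirror the MergeTree settings above; 0 disables a threshold.
    min_bytes_for_compact_part: int = 0
    min_rows_for_compact_part: int = 0
    min_bytes_for_wide_part: int = 0
    min_rows_for_wide_part: int = 0


def choose_part_type(bytes_uncompressed: int, rows: int,
                     s: PartTypeSettings) -> str:
    """Assumed rule: below either compact threshold -> InMemory,
    below either wide threshold -> Compact, otherwise Wide."""
    if (bytes_uncompressed < s.min_bytes_for_compact_part
            or rows < s.min_rows_for_compact_part):
        return "InMemory"
    if (bytes_uncompressed < s.min_bytes_for_wide_part
            or rows < s.min_rows_for_wide_part):
        return "Compact"
    return "Wide"


# Settings of the test1 and test2 tables below (byte thresholds left at 0).
test1 = PartTypeSettings(min_rows_for_compact_part=2)
test2 = PartTypeSettings(min_rows_for_compact_part=2, min_rows_for_wide_part=10)
```

With these settings, a one-row insert into either table yields an in-memory part. The three-row part produced by a merge is Wide for test1 and Compact for test2, which is what the article observes for the two tables.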

CREATE TABLE default.test1
(
    id UInt32,
    name String,
    age UInt8,
    shijian Date
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(shijian)
ORDER BY id
SETTINGS min_rows_for_compact_part = 2, index_granularity = 8192;

With min_rows_for_compact_part = 2, any new part with fewer than 2 rows (such as each of the single-row inserts below) is kept in memory as an in-memory part and recorded in the WAL. When a merge produces a part with 2 or more rows, that merged part is written directly to disk.
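The durability idea behind "write to memory and the WAL first" can be illustrated with a generic append-only log. This is not ClickHouse's on-disk WAL format: the JSON payloads, the 4-byte length prefix, the `ToyWal` class and the `demo_restart` helper are all assumptions made for illustration. Each insert is appended and fsynced before its rows become visible as an in-memory part. After a restart, the in-memory parts are rebuilt by replaying the log.

```python
from __future__ import annotations

import json
import os
import struct
import tempfile

# Generic write-ahead-log sketch. NOT ClickHouse's on-disk WAL format: the
# JSON payloads, the 4-byte length prefix and the ToyWal class are assumptions.
HEADER = struct.Struct("<I")  # little-endian uint32 record length


class ToyWal:
    def __init__(self, path: str):
        self.path = path
        self.in_memory_parts: list[list[dict]] = []

    def insert(self, rows: list[dict]) -> None:
        """Log the part durably first, then expose it as an in-memory part."""
        payload = json.dumps(rows).encode("utf-8")
        with open(self.path, "ab") as f:
            f.write(HEADER.pack(len(payload)) + payload)
            f.flush()
            os.fsync(f.fileno())
        self.in_memory_parts.append(rows)

    @classmethod
    def recover(cls, path: str) -> ToyWal:
        """Rebuild the in-memory parts after a restart by replaying the log."""
        wal = cls(path)
        if not os.path.exists(path):
            return wal
        with open(path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break  # clean end of log, or a torn header
                (length,) = HEADER.unpack(header)
                payload = f.read(length)
                if len(payload) < length:
                    break  # torn last record from a crash mid-write: drop it
                wal.in_memory_parts.append(json.loads(payload))
        return wal


def demo_restart() -> list[list[dict]]:
    """Insert three one-row parts, 'restart', and return the replayed parts."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "toy_wal.log")
        wal = ToyWal(path)
        for row in ({"id": 1001, "age": 18}, {"id": 1002, "age": 28},
                    {"id": 1003, "age": 38}):
            wal.insert([row])
        return ToyWal.recover(path).in_memory_parts  # fresh object = restart
```

Writing the log before updating memory is what makes this a write-ahead log. If the process dies after the fsync, replay restores the part. If it dies partway through a write, the torn final record is skipped.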

insert into test1 values(1001,'张三',18,today());
insert into test1 values(1002,'李四',28,today());
insert into test1 values(1003,'王五',38,today());

After the inserts, and before any merge has run, the table directory on disk looks like this:

[Figure: table directory of test1 after the three inserts, before any merge]

Next, trigger a merge manually from clickhouse-client (the -m flag enables multiline queries):

clickhouse-client -m

	optimize table test1;

[Figure: table directory of test1 after OPTIMIZE TABLE]
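At its core, a merge such as the manual OPTIMIZE above is a k-way merge. Each part is already sorted by the ORDER BY key, so the merged part can be produced by streaming all the parts together in key order. Below is a minimal Python sketch using the standard library's `heapq.merge`, with the three single-row parts from test1 represented as `(id, name, age)` tuples. It leaves out the shijian column and everything a real merge does beyond ordering rows. `merge_parts`, `by_id` and the other names are hypothetical.

```python
import heapq


def merge_parts(parts, sort_key):
    """Merge parts that are each sorted by sort_key into one sorted part."""
    return list(heapq.merge(*parts, key=sort_key))


def by_id(row):
    return row[0]  # the table's ORDER BY id


# The three single-row in-memory parts created by the INSERTs into test1.
test1_parts = [
    [(1001, "张三", 18)],
    [(1002, "李四", 28)],
    [(1003, "王五", 38)],
]
merged_part = merge_parts(test1_parts, by_id)

# Value from the test1 SETTINGS clause: a part with fewer rows stays in memory.
MIN_ROWS_FOR_COMPACT_PART = 2
goes_to_disk = len(merged_part) >= MIN_ROWS_FOR_COMPACT_PART
```

The merged part has 3 rows, which is not fewer than min_rows_for_compact_part = 2, so by the rule described above it is written to disk.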

Originally, MergeTree parts had only one layout, Wide, in which every column has its own set of files:

[Figure: Wide part layout, one set of files per column]

With the addition of the WAL, MergeTree's part layouts have also been extended: besides Wide, parts can now be Compact or In-memory. During insertion, data first goes into memory (backed by the WAL). Once the thresholds are met, the in-memory data is flushed to disk.

CREATE TABLE default.test2
(
    id UInt32,
    name String,
    age UInt8,
    shijian Date
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(shijian)
ORDER BY id
SETTINGS min_rows_for_compact_part = 2, min_rows_for_wide_part = 10, index_granularity = 8192;

insert into test2 values(1001,'张三',18,today());
insert into test2 values(1002,'李四',28,today());
insert into test2 values(1003,'王五',38,today());

optimize table test2;

[Figure: Compact part of test2 after OPTIMIZE TABLE]

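Here the merged part has 3 rows, which is at least min_rows_for_compact_part but below min_rows_for_wide_part, so it is stored in the Compact layout. All columns are written into a single data.bin file, and all of their marks go into a single data.mrk3 file. When a table has many columns but each part holds little data, consider storing parts in this layout, because it keeps the number of files per part small.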
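To show the difference between the two on-disk layouts concretely, the helper below lists the column data and mark files you would expect in a part directory for a set of columns. Assumptions: adaptive index granularity is enabled (the default), so Wide parts use `.mrk2` mark files. Only per-column data and mark files are listed; metadata such as `checksums.txt`, `columns.txt` and `primary.idx` is omitted. Columns are assumed to be plain types like the ones here (Nullable, Array or LowCardinality columns add extra files). `column_files` is a hypothetical helper, not a ClickHouse API.

```python
from __future__ import annotations


def column_files(part_type: str, columns: list[str]) -> list[str]:
    """Expected column data/mark files for an on-disk part (assumed layout)."""
    if part_type == "Wide":
        files: list[str] = []
        for col in columns:
            files += [f"{col}.bin", f"{col}.mrk2"]  # one pair per column
        return files
    if part_type == "Compact":
        return ["data.bin", "data.mrk3"]  # all columns share these two files
    raise ValueError(f"unsupported on-disk part type: {part_type}")


TEST2_COLUMNS = ["id", "name", "age", "shijian"]
```

For the four columns of test2, a Wide part would contain eight column files, whereas the Compact part contains two. This is why the Compact layout suits tables with many columns but small parts.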



Origin blog.csdn.net/weixin_45320660/article/details/114450799