ClickHouse Beginner to Master: How ClickHouse Works

Why is ClickHouse so fast at analytical queries?

Because ClickHouse relies on the following techniques:

  1. ClickHouse data partitioning
  2. ClickHouse columnar storage
  3. ClickHouse primary index (primary key index)
  4. ClickHouse secondary index (skip index)
  5. ClickHouse data compression
  6. ClickHouse data marks

 We will analyze these techniques against two scenarios:

  • Data insertion
  • Data query

1. ClickHouse data partitioning

On the table partition directory structure: the physical layout of a MergeTree table's partition directories.

Suppose there is a partitioned table with four fields a, b, date, and name, where date is the partition field (partitioned by month), as in the sketch below.
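A minimal DDL sketch of such a table; the table name partition_demo and the column types are assumptions, chosen so that the partition values match the directory names used later in this section:

CREATE TABLE partition_demo
(
    a    UInt32,
    b    UInt32,
    date String,   -- month string such as '2021-06', to match the directory names shown later
    name String
)
ENGINE = MergeTree
PARTITION BY date  -- partition by month
ORDER BY name;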

Two questions about partitioning then arise:

  • What are the naming rules for partition directories, and what are the merge rules?
  • What is the purpose of each file inside a partition directory?

 1.1 Naming rules & merging rules

PartitionID_MinBlockNum_MaxBlockNum_level, example: 20230519_1_1_0

PartitionID: the value of the partition expression

MinBlockNum: the minimum block number in the part

MaxBlockNum: the maximum block number in the part

Level: the number of times the part has taken part in a merge

BlockNum is a counter that accumulates globally within the table; it is incremented by 1 every time a new part directory is created.

File directory change process

-- Suppose two insert statements are executed (against the illustrative partition_demo table above)
-- A:
insert into partition_demo (date, name) values ('2021-06', 'a'), ('2021-07', 'b');
-- B:
insert into partition_demo (date, name) values ('2021-06', 'c'), ('2021-07', 'd');

After these statements are executed, the following part directories are created:

# The result reflects the table-wide accumulation of BlockNum
Statement A creates directories: 2021-06_1_1_0, 2021-07_2_2_0
Statement B creates directories: 2021-06_3_3_0, 2021-07_4_4_0

A question arises here: both statements write data to the same partitions, so doesn't creating two directories per partition produce unnecessary directories and a lot of small files?
This is exactly why ClickHouse is not suited to high-frequency writes of single rows or small batches: ClickHouse consolidates these directories by merging partition parts in the background, and when there are too many directories to merge, the system's IO resources are heavily consumed.

# Result of merging the part directories
2021-06_1_1_0 + 2021-06_3_3_0 = 2021-06_1_3_1
2021-07_2_2_0 + 2021-07_4_4_0 = 2021-07_2_4_1
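On a live instance, this can be observed through system.parts, and a merge can be triggered explicitly; a sketch against the hypothetical partition_demo table above:

-- List the part directories (active and inactive) of the demo table
SELECT partition, name, active
FROM system.parts
WHERE table = 'partition_demo'
ORDER BY name;

-- Force a merge so that, for example, 2021-06_1_1_0 and 2021-06_3_3_0
-- become 2021-06_1_3_1; the merged source parts stay visible as inactive until cleanup
OPTIMIZE TABLE partition_demo FINAL;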

 Let's review the directory naming rule: PartitionID_MinBlockNum_MaxBlockNum_level.
As the result shows, when multiple directories in the same partition are merged, the new directory takes the smallest MinBlockNum and the largest MaxBlockNum of the merged parts, and uses max(level) + 1 to record the number of merges.

The meaning of each file

# Files in a part directory
## The following are the column data files: the .bin files hold the column data,
## and the .mrk files are mark files that, together with the index, locate positions in the data
a.bin
a.mrk2
b.bin
b.mrk2
date.bin
date.mrk2
name.bin
name.mrk2

## The following are system files
checksums.txt # integrity check; stores (in binary) the sizes and size hashes of files such as primary.idx and count.txt
columns.txt # column information
primary.idx # primary (primary key) index
minmax_date.idx # secondary index: min/max index on the partition field
default_compression_codec.txt # compression codec
count.txt # number of rows in this part

(Figure: file directory sequence diagram)

 2. Column storage

Virtually all OLAP systems use columnar storage, which has the following advantages:

  • Analytical scenarios often read a large number of rows but only a few columns. In row-oriented storage, all columns of a row are stored in the same block, so columns not involved in the computation are still read during IO, wasting IO. In columnar storage, only the columns involved in the computation are read, which greatly reduces IO and speeds up retrieval.
  • Values in the same column share the same data type, which improves the compression ratio and saves a large amount of storage space. A higher compression ratio means less data to read and therefore shorter IO time.
  • The compression algorithm can be chosen freely: different columns can use different compression codecs according to their data type (see the sketch after this list).
  • A high compression ratio also reduces memory consumption, so the same amount of memory can cache more data.
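For example, codecs can be declared per column when creating a table; the codec choices below are only illustrative, and the table and column names are assumptions:

CREATE TABLE codec_demo
(
    ts    DateTime CODEC(DoubleDelta, LZ4),  -- monotonically increasing timestamps suit delta-style codecs
    value Float64  CODEC(Gorilla, ZSTD),     -- slowly changing floating-point series
    tag   String   CODEC(ZSTD(3))            -- generic text column
)
ENGINE = MergeTree
ORDER BY ts;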

Note: columnar storage is generally not used in scenarios with frequent row deletions; that is something it is not good at.

 3. Primary index (primary key index, sparse index)

In ClickHouse, the primary index records one sparse entry for every 8192 rows by default.

 About the primary index: the primary key of a MergeTree table is defined with PRIMARY KEY (or, if omitted, the ORDER BY expression is used). Once it is defined, MergeTree generates one primary index entry every index_granularity rows (default 8192) and saves the index in the primary.idx file, as in the sketch below.
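A sketch of how the primary key and index_granularity are declared; the table pk_demo and its columns are assumptions:

-- One primary.idx entry is written for every index_granularity rows (default 8192)
CREATE TABLE pk_demo
(
    id   UInt64,
    date Date,
    name String
)
ENGINE = MergeTree
ORDER BY (id, date)                  -- also serves as the primary key when PRIMARY KEY is omitted
SETTINGS index_granularity = 8192;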

 4. Secondary index (skip index)

The secondary index in ClickHouse is built on top of the primary index and has one important parameter:

granularity = 3 means: one skip-index entry is built over 3 consecutive primary-index granules (see the sketch after the type list below).

 Types supported by secondary indexes

  • minmax: stores, per index_granularity unit, the min and max values of the specified expression; for equality and range queries it helps quickly skip blocks that cannot match, reducing IO
  • set(max_rows): stores, per index_granularity unit, the set of distinct values of the specified expression; used to quickly decide whether an equality query can hit a block, reducing IO
  • ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed): splits strings into n-grams and builds a Bloom filter; can speed up conditions such as equality, LIKE, and IN
  • tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed): similar to ngrambf_v1, except that instead of n-gram splitting it tokenizes on punctuation
  • bloom_filter([false_positive]): builds a Bloom filter on the specified column, used to speed up query conditions such as equality and IN
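A sketch combining several of the index types above in one table definition; the names and parameter values are illustrative:

CREATE TABLE skip_demo
(
    id   UInt64,
    code UInt32,
    msg  String,
    -- one minmax / set entry covers GRANULARITY 3 primary-index granules (3 * 8192 rows by default)
    INDEX idx_code_minmax code TYPE minmax GRANULARITY 3,
    INDEX idx_code_set    code TYPE set(100) GRANULARITY 3,
    -- Bloom-filter based indexes for string conditions (equality, LIKE, IN)
    INDEX idx_msg_ngram   msg TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4,
    INDEX idx_msg_token   msg TYPE tokenbf_v1(256, 2, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY id;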

 5. Data compression

About data compression: ClickHouse's column data file column.bin stores the data of one column. Because all values in a column share the same data type, they can be compressed efficiently. A compressed data block consists of two parts: header information and compressed data. The header is a fixed 9 bytes, made up of 1 UInt8 (1 byte) and 2 UInt32 (4 bytes each) integers, which represent, respectively, the compression codec, the size of the compressed data, and the size of the data before compression. The size of each compressed data block (before compression) is governed by the max_compress_block_size parameter (default 1MB).

Principle: every 8192 rows corresponds to one primary index entry (one granule), and granules are compressed into data blocks according to the rules below.

 Specific compression rules:

1. Single batch size < 64KB: if a single batch of data is smaller than 64KB, the next batch is fetched and accumulated, and a compressed data block is generated only once the accumulated size is >= 64KB. This happens when the average row is smaller than 8 bytes (8192 rows x 8 bytes = 64KB), so multiple batches are compressed into one data block.

2. Single batch size between 64KB and 1MB: if a single batch is between 64KB and 1MB, a compressed data block is generated directly from it.

3. Single batch size > 1MB: if a single batch exceeds 1MB, it is first truncated at 1MB to generate a compressed data block, and the remaining data continues to follow the rules above. In this case one batch may produce multiple compressed data blocks; this happens when the average row exceeds 128 bytes (8192 rows x 128 bytes = 1MB).

 Summary: in a xxx.bin column data file, one compressed block does not necessarily correspond to one primary index entry; rather, one primary index entry is built for every 8192 rows.

Summary: a column.bin file is made up of compressed data blocks, each between 64KB and 1MB in size before compression.
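The 64KB and 1MB thresholds correspond to ClickHouse's min_compress_block_size and max_compress_block_size settings, and the effect of compression can be checked per column through system.columns. A sketch, assuming the hypothetical pk_demo table from section 3:

-- Compare compressed vs. uncompressed column sizes
SELECT
    name AS column_name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'pk_demo';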

(Figure: composition of a column.bin data file)

 6. Data marks

About data marks: the mark files correspond one-to-one with the xxx.bin files and record the mapping between the primary index and the compressed data blocks.

Each column's xxx.bin data file has a corresponding xxx.mrk mark file; the number of marks per part can be inspected as shown below.
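A sketch, again assuming the hypothetical pk_demo table from section 3; with the default index_granularity, rows is roughly marks * 8192:

-- Each mark records where a granule starts: the offset of its compressed block in the .bin file
-- and the offset inside the decompressed block
SELECT name, rows, marks, marks_bytes
FROM system.parts
WHERE table = 'pk_demo' AND active;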

Source: blog.csdn.net/wangguoqing_it/article/details/130767891