TTL in the MergeTree family of table engines

1. TTL

TTL stands for Time To Live: how long data is kept.

In MergeTree, you can set a TTL on individual columns or on the whole table. When a column-level TTL expires, the values in that column are reset to the default value of the column's data type; when a table-level TTL expires, the expired rows are deleted; if both column-level and table-level TTLs are set, whichever expires first takes effect first.

Both column-level and table-level TTL depend on a DateTime or Date column. The expiration time is computed by applying an INTERVAL operation to that column:

Example:

TTL time_column + INTERVAL 3 DAY

The data expires 3 days after the time in time_column.

TTL time_column + INTERVAL 1 MONTH

The data expires 1 month after the time in time_column.

INTERVAL supports the units SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER and YEAR.
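The arithmetic behind a TTL expression can be sketched in Python. This is only an illustration of how an expiry time is derived, not ClickHouse code: the helper names ttl_expiry and is_expired and the sample timestamp are assumptions made for the example, and only fixed-length units are covered because MONTH, QUARTER and YEAR need calendar arithmetic.

```python
from datetime import datetime, timedelta

# Fixed-length INTERVAL units mapped to timedelta keyword arguments.
# MONTH, QUARTER and YEAR are left out: their length depends on the calendar.
_UNITS = {
    "SECOND": "seconds",
    "MINUTE": "minutes",
    "HOUR": "hours",
    "DAY": "days",
    "WEEK": "weeks",
}


def ttl_expiry(time_value: datetime, amount: int, unit: str) -> datetime:
    """Return time_value + INTERVAL amount unit (hypothetical helper)."""
    return time_value + timedelta(**{_UNITS[unit.upper()]: amount})


def is_expired(time_value: datetime, amount: int, unit: str, now: datetime) -> bool:
    """Assumption: a value counts as expired once now reaches the expiry time."""
    return now >= ttl_expiry(time_value, amount, unit)


created = datetime(2021, 3, 11, 9, 12, 4)
print(ttl_expiry(created, 3, "DAY"))  # 2021-03-14 09:12:04
```

The same helper with "SECOND" and 10 mirrors the create_time + toIntervalSecond(10) expressions used in the table definitions of this article.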

1.1 Column-level TTL

To set a column-level TTL, declare the TTL expression on the column when you define the table. Primary key columns cannot have a TTL.

Example:

CREATE TABLE t_column_ttl
(
    id UInt64 COMMENT 'Primary key',
    create_time DateTime,
    product_desc String TTL create_time + toIntervalSecond(10),
    product_type UInt8 TTL create_time + toIntervalSecond(10)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(create_time)
ORDER BY id;

insert into table t_column_ttl values(1,now(),'Huawei',1),(2,now()+interval 1 minute,'Apple',2);

select * from t_column_ttl;

┌─id─┬─────────create_time─┬─product_desc─┬─product_type─┐
│  1 │ 2021-03-11 09:12:04 │ Huawei       │            1 │
│  2 │ 2021-03-11 09:13:04 │ Apple        │            2 │
└────┴─────────────────────┴──────────────┴──────────────┘

select sleep(10);
select * from t_column_ttl;
┌─id─┬─────────create_time─┬─product_desc─┬─product_type─┐
│  1 │ 2021-03-11 09:12:04 │              │            0 │
│  2 │ 2021-03-11 09:13:04 │ Apple        │            2 │
└────┴─────────────────────┴──────────────┴──────────────┘


optimize table t_column_ttl final;
select * from t_column_ttl;
┌─id─┬─────────create_time─┬─product_desc─┬─product_type─┐
│  1 │ 2021-03-11 09:12:04 │              │            0 │
│  2 │ 2021-03-11 09:13:04 │              │            0 │
└────┴─────────────────────┴──────────────┴──────────────┘

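# Running the optimize command forces TTL cleanup. Querying again shows that once the TTL condition is met, the columns that declare a TTL are reset to the default value of their data type.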
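The effect of a column-level TTL can be modelled in a few lines of Python. This is a sketch of the observable behaviour only (expired values replaced by type defaults), not of how ClickHouse implements it; the function apply_column_ttl and the fixed timestamps are assumptions for the example.

```python
from datetime import datetime, timedelta

# Type defaults used when a column value expires (String -> '', UInt8 -> 0).
DEFAULTS = {"product_desc": "", "product_type": 0}
TTL = timedelta(seconds=10)  # models create_time + toIntervalSecond(10)


def apply_column_ttl(rows, now):
    """Return copies of rows with expired TTL columns reset to their defaults."""
    result = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's data
        if now >= row["create_time"] + TTL:
            row.update(DEFAULTS)
        result.append(row)
    return result


t0 = datetime(2021, 3, 11, 9, 12, 4)
rows = [
    {"id": 1, "create_time": t0, "product_desc": "Huawei", "product_type": 1},
    {"id": 2, "create_time": t0 + timedelta(minutes=1), "product_desc": "Apple", "product_type": 2},
]

after = apply_column_ttl(rows, now=t0 + timedelta(seconds=10))
print([(r["id"], r["product_desc"], r["product_type"]) for r in after])
# [(1, '', 0), (2, 'Apple', 2)]
```

The id and create_time columns are untouched because they declare no TTL; only the rows whose create_time plus 10 seconds has passed lose their values.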

# Add a TTL to a column, or modify the TTL of a column that already has one:
alter table t_column_ttl MODIFY COLUMN product_desc String TTL create_time + INTERVAL 2 DAY;
# Add a new column with a TTL:
alter table t_column_ttl add column product_name String comment 'Product name' ttl create_time + interval 3 month;

# View the TTL information:
desc t_column_ttl\G

Row 1:
──────
name:               id
type:               UInt64
default_type:       
default_expression: 
comment:            Primary key
codec_expression:   
ttl_expression:     

Row 2:
──────
name:               create_time
type:               DateTime
default_type:       
default_expression: 
comment:            
codec_expression:   
ttl_expression:     

Row 3:
──────
name:               product_desc
type:               String
default_type:       
default_expression: 
comment:            
codec_expression:   
ttl_expression:     create_time + toIntervalSecond(10)

Row 4:
──────
name:               product_type
type:               UInt8
default_type:       
default_expression: 
comment:            
codec_expression:   
ttl_expression:     create_time + toIntervalSecond(10)

1.2 Table-level TTL

To set a TTL for the whole table, add a TTL expression to the table definition of the MergeTree table.

# Table definition:
CREATE TABLE t_table_ttl
(
    `id` UInt64 COMMENT 'Primary key',
    `create_time` DateTime COMMENT 'Creation time',
    `product_desc` String COMMENT 'Product description',
    `product_type` UInt8 COMMENT 'Product number'
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(create_time)
ORDER BY create_time
TTL create_time + toIntervalSecond(10);

insert into table t_table_ttl values(1,now(),'Huawei',1),(2,now()+interval 1 minute,'Apple',2);

select * from t_table_ttl;
┌─id─┬─────────create_time─┬─product_desc─┬─product_type─┐
│  1 │ 2021-03-11 09:29:30 │ Huawei       │            1 │
│  2 │ 2021-03-11 09:30:30 │ Apple        │            2 │
└────┴─────────────────────┴──────────────┴──────────────┘

optimize table t_table_ttl final;

select * from t_table_ttl;

Ok.

0 rows in set. Elapsed: 0.009 sec. 

Modify the table-level TTL:
alter table t_table_ttl modify ttl create_time + interval 2 month;
alter table t_table_ttl modify ttl create_time + toIntervalMonth(2);


View the table information:
SELECT 
    database,
    name,
    engine,
    data_paths,
    metadata_path,
    metadata_modification_time,
    partition_key,
    sorting_key
FROM system.tables
WHERE name = 't_table_ttl'

Row 1:
──────
database:                   default
name:                       t_table_ttl
engine:                     MergeTree
data_paths:                 ['/var/lib/clickhouse/data/default/t_table_ttl/']
metadata_path:              /var/lib/clickhouse/metadata/default/t_table_ttl.sql
metadata_modification_time: 2021-03-11 09:40:05
partition_key:              toYYYYMM(create_time)
sorting_key:                create_time


# View the table structure:
DESCRIBE TABLE t_table_ttl

Row 1:
──────
name:               id
type:               UInt64
default_type:       
default_expression: 
comment:            Primary key
codec_expression:   
ttl_expression:     

Row 2:
──────
name:               create_time
type:               DateTime
default_type:       
default_expression: 
comment:            Creation time
codec_expression:   
ttl_expression:     

Row 3:
──────
name:               product_desc
type:               String
default_type:       
default_expression: 
comment:            Product description
codec_expression:   
ttl_expression:     create_time + toIntervalMinute(10)

Row 4:
──────
name:               product_type
type:               UInt8
default_type:       
default_expression: 
comment:            Product number
codec_expression:   
ttl_expression:   

Note: in the ClickHouse version used for this article, a column-level or table-level TTL cannot be removed once it has been declared. (Newer releases add ALTER TABLE ... REMOVE TTL and ALTER TABLE ... MODIFY COLUMN ... REMOVE TTL.)

1.3 The operating mechanism of TTL

If a TTL is set on a MergeTree table, a ttl.txt file is generated in each partition directory when data is written, with the data partition as the unit.

Write data:

CREATE TABLE default.t_table_ttl
(
    `id` UInt64 COMMENT 'Primary key',
    `create_time` DateTime COMMENT 'Creation time',
    `product_desc` String COMMENT 'Product description',
    `product_type` UInt8 COMMENT 'Product number'
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(create_time)
ORDER BY create_time
TTL create_time + toIntervalMinute(10)
SETTINGS index_granularity = 8192;

insert into t_table_ttl (id, create_time, product_desc, product_type) values (10, now(), 'Huawei', 1), (20, now() + interval 10 minute, 'Apple', 2);

┌─id─┬─────────create_time─┬─product_desc─┬─product_type─┐
│ 10 │ 2021-03-11 10:04:21 │ Huawei       │            1 │
│ 20 │ 2021-03-11 10:14:21 │ Apple        │            2 │
└────┴─────────────────────┴──────────────┴──────────────┘
ll /var/lib/clickhouse/data/default/t_table_ttl/
total 4
drwxr-x--- 2 clickhouse clickhouse 322 Mar 11 09:51 202103_1_1_0
drwxr-x--- 2 clickhouse clickhouse   6 Mar 11 09:51 detached
-rw-r----- 1 clickhouse clickhouse   1 Mar 11 09:51 format_version.txt

 ll /var/lib/clickhouse/data/default/t_table_ttl/202103_1_1_0/
total 60
-rw-r----- 1 clickhouse clickhouse 464 Mar 11 09:51 checksums.txt
-rw-r----- 1 clickhouse clickhouse 115 Mar 11 09:51 columns.txt
-rw-r----- 1 clickhouse clickhouse   1 Mar 11 09:51 count.txt
-rw-r----- 1 clickhouse clickhouse  34 Mar 11 09:51 create_time.bin
-rw-r----- 1 clickhouse clickhouse  48 Mar 11 09:51 create_time.mrk2
-rw-r----- 1 clickhouse clickhouse  39 Mar 11 09:51 id.bin
-rw-r----- 1 clickhouse clickhouse  48 Mar 11 09:51 id.mrk2
-rw-r----- 1 clickhouse clickhouse   8 Mar 11 09:51 minmax_create_time.idx
-rw-r----- 1 clickhouse clickhouse   4 Mar 11 09:51 partition.dat
-rw-r----- 1 clickhouse clickhouse   8 Mar 11 09:51 primary.idx
-rw-r----- 1 clickhouse clickhouse  39 Mar 11 09:51 product_desc.bin
-rw-r----- 1 clickhouse clickhouse  48 Mar 11 09:51 product_desc.mrk2
-rw-r----- 1 clickhouse clickhouse  28 Mar 11 09:51 product_type.bin
-rw-r----- 1 clickhouse clickhouse  48 Mar 11 09:51 product_type.mrk2
-rw-r----- 1 clickhouse clickhouse  67 Mar 11 09:51 ttl.txt

cat /var/lib/clickhouse/data/default/t_table_ttl/202103_1_1_0/ttl.txt 
ttl format version: 1
{"table":{"min":1615428861,"max":1615429461}}

You can see that MergeTree stores the TTL information as a JSON string.
The columns key holds column-level TTL information.
The table key holds table-level TTL information.
min and max hold the timestamps obtained by applying the INTERVAL expression to the minimum and maximum values of the TTL date column in the current data partition.

Convert the Unix timestamps in the JSON to local time and compare them with create_time
(expire_min = ttl_min - min(create_time),
expire_max = ttl_max - max(create_time)):

select toDateTime('1615428861') ttl_min,toDateTime('1615429461') ttl_max,ttl_min - min(create_time) expire_min,ttl_max - max(create_time) expire_max from t_table_ttl;

┌─────────────ttl_min─┬─────────────ttl_max─┬─expire_min─┬─expire_max─┐
│ 2021-03-11 10:14:21 │ 2021-03-11 10:24:21 │        600 │        600 │
└─────────────────────┴─────────────────────┴────────────┴────────────┘

The minimum and maximum recorded in ttl.txt are exactly the minimum and maximum of create_time in the current data partition plus 10 minutes (600 seconds), which matches the TTL expression (TTL create_time + toIntervalMinute(10)).
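The same check can be reproduced outside ClickHouse. The Python sketch below parses the ttl.txt content shown above and recomputes the two differences. The UTC+8 time zone is inferred from the query output in this article, and the helper name parse_ttl_txt is an assumption for the example.

```python
import json
from datetime import datetime, timedelta, timezone

# Content of ttl.txt as shown above: a version header line followed by JSON.
TTL_TXT = 'ttl format version: 1\n{"table":{"min":1615428861,"max":1615429461}}'
UTC8 = timezone(timedelta(hours=8))  # assumed server time zone


def parse_ttl_txt(text):
    """Split off the version header and decode the JSON body."""
    header, body = text.split("\n", 1)
    return header, json.loads(body)


_, info = parse_ttl_txt(TTL_TXT)
ttl_min = datetime.fromtimestamp(info["table"]["min"], UTC8)
ttl_max = datetime.fromtimestamp(info["table"]["max"], UTC8)

# create_time values of the two inserted rows, taken from the query result.
create_min = datetime(2021, 3, 11, 10, 4, 21, tzinfo=UTC8)
create_max = datetime(2021, 3, 11, 10, 14, 21, tzinfo=UTC8)

print(ttl_min.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-03-11 10:14:21
print(ttl_max.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-03-11 10:24:21
print((ttl_min - create_min).total_seconds())  # 600.0
print((ttl_max - create_max).total_seconds())  # 600.0
```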

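The general processing logic can be inferred from the way TTL information is recorded:

1. MergeTree records expiration times per partition directory in ttl.txt and uses them as the criterion for expiry.

2. Each time a batch of data is written, a ttl.txt file is generated for that partition from the result of the INTERVAL expression.

3. The logic that removes TTL-expired data is triggered only when MergeTree merges partitions.

4. When choosing which partition to delete, a greedy algorithm is used: it tries to find the partition that will expire earliest and that is also the oldest.

5. If every value of a column in a partition is removed because its TTL has expired, the new partition directory produced by the merge will not contain data files (.bin and .mrk) for that column.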
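The selection logic in the list above can be sketched roughly as follows. This is a simplified model under stated assumptions, not ClickHouse's actual merge selector: the Part tuple, the ttl_action and pick_next functions, and the second part name are all hypothetical, and ties are broken by directory name as a stand-in for "oldest".

```python
from typing import List, NamedTuple, Optional


class Part(NamedTuple):
    name: str     # partition directory name
    ttl_min: int  # "min" from ttl.txt (Unix timestamp)
    ttl_max: int  # "max" from ttl.txt (Unix timestamp)


def ttl_action(part: Part, now: int) -> str:
    """Decide what a TTL merge would need to do with one part."""
    if now >= part.ttl_max:
        return "drop"     # every row has expired
    if now >= part.ttl_min:
        return "rewrite"  # some rows have expired, the rest must be kept
    return "keep"         # nothing has expired yet


def pick_next(parts: List[Part], now: int) -> Optional[Part]:
    """Greedy choice: the part with expired data whose TTL expires earliest."""
    expired = [p for p in parts if now >= p.ttl_min]
    return min(expired, key=lambda p: (p.ttl_min, p.name), default=None)


parts = [
    Part("202103_1_1_0", 1615428861, 1615429461),
    Part("202103_2_2_0", 1615428000, 1615428100),  # hypothetical second part
]
now = 1615429000
chosen = pick_next(parts, now)
print(chosen.name, ttl_action(chosen, now))  # 202103_2_2_0 drop
print(ttl_action(parts[0], now))             # rewrite
```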

Notes:

1. How often TTL merges run is controlled by the MergeTree setting merge_with_ttl_timeout, which defaults to 86400 seconds (one day) in the version described here. TTL merges are kept in a dedicated task queue, separate from MergeTree's regular merge tasks. Setting this value too low may hurt performance.

With this setting, a TTL delete runs on only one partition every 24 hours, or whenever a background merge occurs. In the worst case, therefore, ClickHouse deletes at most one partition that matches the TTL delete expression every 24 hours.

This may not be ideal. If you want TTL deletes to happen sooner, you can lower the table's merge_with_ttl_timeout setting, for example to one hour:

alter table t_table_ttl MODIFY SETTING merge_with_ttl_timeout = 3600;

2. Besides waiting for a TTL merge to be triggered, you can force a merge with the optimize command.

Trigger a merge of one partition: optimize table t;

Trigger a merge of all partitions: optimize table t final;

3. The version described here has no way to remove a TTL declaration, but it does provide commands to stop and start TTL merge tasks globally:

SYSTEM STOP TTL MERGES;
SYSTEM START TTL MERGES;


Origin blog.csdn.net/weixin_45320660/article/details/114655052