ClickHouse: Stopping and Avoiding Mutations

Problem Description

We used the DELETE and UPDATE operations of ALTER queries to delete and update data in ClickHouse. When these operations are issued in large numbers, clients can no longer connect to the ClickHouse service. The log shows a large number of mutation operations, which consume much of ClickHouse's resources.
The log is as follows:

2022.10.14 15:43:01.896327 [ 1978 ] {} <Debug> ads.dw_wxt_means_download_statistics_day (ee73b1ab-dbe7-4ba2-b30f-027803650f2a): Loading mutation: mutation_285350.txt entry, commands size: 1
2022.10.14 15:43:01.905333 [ 1978 ] {} <Debug> ads.dw_wxt_means_download_statistics_day (ee73b1ab-dbe7-4ba2-b30f-027803650f2a): Loading mutation: mutation_309364.txt entry, commands size: 1
......

Further research showed that ALTER TABLE ... DELETE and ALTER TABLE ... UPDATE are implemented as mutations.
For MergeTree tables, mutations are executed by rewriting whole data parts. Each mutation is therefore a heavy operation that can consume a large share of server resources. This is especially true of ALTER TABLE ... DELETE and ALTER TABLE ... UPDATE.
Therefore, to keep the ClickHouse service reliable and available, operations that rely on mutations should be avoided as much as possible.

Problem Solving

When a large number of mutations are running, clients may be unable to connect to ClickHouse. In that case, first stop the unfinished mutations to restore service availability.
The normal way to stop them is KILL MUTATION:

-- Cancel and remove all mutations of the single table:
KILL MUTATION WHERE database = 'default' AND table = 'table'

-- Cancel the specific mutation:
KILL MUTATION WHERE database = 'default' AND table = 'table' AND mutation_id = 'mutation_3.txt'
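Before killing anything, it helps to see which mutations are still pending and which mutation_id to pass to KILL MUTATION. The sketch below wraps a query on the system.mutations table in a shell function. The function name show_pending_mutations and the CH_CLIENT variable are hypothetical. The sketch assumes the database and table names are trusted, because they are pasted into the SQL unescaped.

```shell
# Client command used to reach the server. CH_CLIENT is a hypothetical
# variable for this sketch; it may include options (e.g. "clickhouse-client --host h1").
# Set CH_CLIENT=echo to print the query instead of running it.
: "${CH_CLIENT:=clickhouse-client}"

# Hypothetical helper: list unfinished mutations of one table, oldest first.
# $1 = database, $2 = table (trusted identifiers, not escaped).
show_pending_mutations() {
  $CH_CLIENT --query "
    SELECT mutation_id, command, create_time, parts_to_do, latest_fail_reason
    FROM system.mutations
    WHERE database = '$1' AND table = '$2' AND is_done = 0
    ORDER BY create_time"
}

# Example: show_pending_mutations default table
```

A non-empty latest_fail_reason usually points to a mutation that keeps failing and retrying. Such a mutation is a typical candidate for KILL MUTATION.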

If there are so many mutations that you cannot connect to ClickHouse to run the SQL above, you can stop the unfinished mutations by deleting their mutation files manually.
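Mutation files are stored in the table's data directory and are named like mutation_3.txt, with varying numbers. This applies to non-replicated MergeTree tables. For Replicated*MergeTree tables, mutation entries are kept in ZooKeeper instead, so this file-based method does not apply.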
The ClickHouse data directory is set by the <path> setting in the server configuration file (/var/lib/clickhouse/ by default). Below, it is written as $CLICKHOUSE_PATH.
Suppose table t1 in database bi has mutation files. They can be removed as follows:

It is recommended to back up the mutation files first so that they can be restored if something goes wrong.

systemctl stop clickhouse-server
cd $CLICKHOUSE_PATH/data/bi/t1
rm -rf mutation*
systemctl start clickhouse-server
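The recommended backup can be folded into the removal step. This is a minimal sketch with a hypothetical helper, backup_and_remove_mutations, and an arbitrary backup location. It assumes the server has already been stopped. Each mutation_*.txt file is copied aside before it is deleted, so no file is removed without a copy. Other files in the table directory are left alone.

```shell
# Hypothetical helper: move the mutation files of one table into a backup dir.
# $1 = table data directory (e.g. $CLICKHOUSE_PATH/data/bi/t1)
# $2 = backup directory (created if missing)
# Run only while clickhouse-server is stopped.
backup_and_remove_mutations() {
  src="$1"
  dst="$2"
  mkdir -p "$dst" || return 1
  for f in "$src"/mutation_*.txt; do
    [ -e "$f" ] || continue          # the glob matched nothing
    cp -p "$f" "$dst"/ || return 1   # keep a copy (mode, owner when root) first
    rm -f "$f"                       # then delete the original
  done
}

# Example:
# systemctl stop clickhouse-server
# backup_and_remove_mutations "$CLICKHOUSE_PATH/data/bi/t1" /root/mutation_backup/bi.t1
# systemctl start clickhouse-server
```

To undo the removal, stop the server again and copy the saved files back into the table directory. They are loaded at the next startup, as the "Loading mutation" log lines show.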

Deleting and Updating Data Gracefully

In daily use, however, deleting and updating large amounts of data is unavoidable, so the question is how to do it gracefully.
After looking into ClickHouse's documentation and related material, we found that ReplacingMergeTree can be used.
ReplacingMergeTree is similar to HBase in that it supports a version column. During a merge, it keeps only one row out of all rows with the same sorting key (the ORDER BY key):

  • If no version column is set, the last inserted row is kept by default.
  • If a version column is set, the row with the largest version is kept. If several rows share the largest version, the rule for tables without a version column applies, i.e. the last inserted row is kept.
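The two rules above can be made concrete with a toy model. This is not ClickHouse code. It is a small awk filter wrapped in a hypothetical function, replacing_merge. It reads rows as key,version,value in insertion order and keeps one row per key: the largest version wins, and on a tie the later row wins. Real merges are more limited than this model. They run in the background at unspecified times and only deduplicate rows within the same partition.

```shell
# Toy model of the ReplacingMergeTree merge rule (not ClickHouse itself).
# Input: CSV rows "key,version,value" on stdin, in insertion order.
# Output: one surviving row per key, sorted by key.
replacing_merge() {
  awk -F',' '
    {
      key = $1
      ver = $2 + 0
      # ">=" lets a later row with an equal version replace the earlier one,
      # i.e. the last inserted row wins on a tie.
      if (!(key in best) || ver >= best[key]) {
        best[key] = ver
        row[key] = $0
      }
    }
    END { for (k in row) print row[k] }
  ' | sort
}
```

Two examples:

- `printf '1,1,a\n1,2,b\n2,5,x\n1,2,c\n2,3,y\n' | replacing_merge` keeps `1,2,c` (a tie at version 2, so the later row wins) and `2,5,x` (the largest version).
- Giving every row the same version models a table without a version column: only the last inserted row per key survives.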

With ReplacingMergeTree, both new and updated data are simply written with INSERT INTO, and the old rows are removed automatically during merges. Merges run in the background at unpredictable times, so queries must use FINAL to be sure they see only the latest data. Adding FINAL to every query is inconvenient and easy to forget. Instead, you can create a view that queries the physical table with FINAL, and run all queries against the view.
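The following is a minimal sketch of this pattern. It creates a ReplacingMergeTree table with a version column and a view that applies FINAL, and it writes each update as a plain INSERT. The table, column and view names (bi.user_stats, ver, bi.user_stats_latest), the helper functions and the CH_CLIENT variable are all hypothetical. Statements are sent one per --query call, so setting CH_CLIENT=echo prints them instead of running them. Values passed to the insert helper are assumed to be trusted numbers.

```shell
# Client command; CH_CLIENT is a hypothetical variable for this sketch.
# Set CH_CLIENT=echo to print the statements instead of executing them.
: "${CH_CLIENT:=clickhouse-client}"

# Hypothetical helper: create the versioned table and a FINAL view over it.
setup_latest_view() {
  # Rows with the same ORDER BY key are collapsed during merges;
  # the row with the largest ver survives.
  $CH_CLIENT --query "
    CREATE TABLE IF NOT EXISTS bi.user_stats
    (
        user_id UInt64,
        downloads UInt32,
        ver UInt64
    )
    ENGINE = ReplacingMergeTree(ver)
    ORDER BY user_id" || return 1

  # Readers query the view, which always applies FINAL.
  $CH_CLIENT --query "
    CREATE VIEW IF NOT EXISTS bi.user_stats_latest AS
    SELECT * FROM bi.user_stats FINAL"
}

# Hypothetical helper: an update is just a new row with a higher version,
# so no ALTER TABLE ... UPDATE (and no mutation) is needed.
# $1 = user_id, $2 = downloads, $3 = version (trusted numeric input)
upsert_user_stats() {
  $CH_CLIENT --query "INSERT INTO bi.user_stats VALUES ($1, $2, $3)"
}
```

Consider `upsert_user_stats 42 10 1` followed by `upsert_user_stats 42 15 2`. Both rows may stay on disk until a background merge runs. However, `SELECT * FROM bi.user_stats_latest WHERE user_id = 42` returns only the row with downloads = 15. The trade-off is that FINAL makes ClickHouse collapse rows at query time, so reads through the view are slower than reads on the raw table.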

https://clickhouse.com/docs/en/sql-reference/statements/alter/#mutations
https://clickhouse.com/docs/en/operations/system-tables/mutations/#system_tables-mutations
https://clickhouse.com/docs/en/sql-reference/statements/kill/#kill-mutation
https://clickhouse.com/docs/en/manage/tuning-for-cloud-cost-efficiency/#avoid-mutations
https://clickhouse.com/docs/zh/engines/table-engines/mergetree-family/replacingmergetree
https://blog.csdn.net/yunqiinsight/article/details/106532398


Origin blog.csdn.net/xwd127429/article/details/127419673