ClickHouse field auto-increment solution

Implemented with rowNumberInAllBlocks()

Pitfall of auto-incrementing a field with rowNumberInAllBlocks()

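In tables that use the ReplacingMergeTree table engine, rowNumberInAllBlocks() is used to auto-increment the id during insertion. In a distributed setup, the id is incremented independently on each server of the cluster, so the same id values are generated on every server. If the merge is then based on id, rows that share an id are collapsed into one and data is lost: with two servers, the amount of data is roughly halved.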

The diagram is as follows:

If you insert the data into a new table and use rowNumberInAllBlocks() to fill a new auto-increment column auto_id, you will find that each auto_id value is repeated as many times as there are servers in the cluster. For example, if there are three servers in the cluster, there will be three rows with an auto_id of 1.
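
The effect can be reproduced outside ClickHouse with a small Python toy model. This is only a sketch under stated assumptions: row_numbers() stands in for rowNumberInAllBlocks() (which numbers rows from 0), replacing_merge() is a simplified stand-in for a ReplacingMergeTree merge keyed on id, and all function names are hypothetical. It also assumes the rows from every server end up in the same table.

```python
from collections import Counter


def row_numbers(n_rows):
    # Toy stand-in for rowNumberInAllBlocks(): numbers the rows of one
    # server's insert starting from 0.
    return list(range(n_rows))


def simulate(n_servers, rows_per_server):
    # Every server numbers its own rows independently, so the ids
    # restart from 0 on each server.
    all_rows = []
    for server in range(n_servers):
        for i in row_numbers(rows_per_server):
            all_rows.append({"id": i, "server": server})
    return all_rows


def replacing_merge(rows):
    # Simplified model of a merge keyed on id: one row survives per id.
    survivors = {}
    for row in rows:
        survivors[row["id"]] = row
    return list(survivors.values())


rows = simulate(n_servers=3, rows_per_server=4)
id_counts = Counter(row["id"] for row in rows)
merged = replacing_merge(rows)

print("rows carrying id 1:", id_counts[1])
print("rows before merge:", len(rows), "after merge:", len(merged))
```

In this model each id occurs once per server, and a merge on id keeps only one row per id, which matches the data loss described above.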

As the picture shows:

When the cluster has only a few servers, the problem can be solved in the following ways:

Method 1: Switch to a different table engine such as MergeTree. The id values still repeat as described above, and because id cannot be updated, add a separate id_auto field and fill it on each local table using rowNumberInAllBlocks(). One thing to note is that this must be done on the servers one at a time: before each run, look up the maximum value of id_auto left by the previous run and continue incrementing from that value.
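
The bookkeeping of Method 1 can be sketched as a Python toy model. It is not ClickHouse code: each inner list plays the role of one server's local table, enumerate() stands in for rowNumberInAllBlocks(), and backfill_id_auto is a hypothetical helper name. Starting the numbering at 1 is also an assumption made for the illustration.

```python
def backfill_id_auto(servers):
    # servers: one list of rows per server, processed strictly in sequence.
    current_max = 0  # highest id_auto handed out so far
    for rows in servers:
        # row_number plays the role of rowNumberInAllBlocks() (0-based).
        for row_number, row in enumerate(rows):
            row["id_auto"] = current_max + 1 + row_number
        if rows:
            # Look up the maximum after this run; the next server starts above it.
            current_max = rows[-1]["id_auto"]
    return current_max


servers = [
    [{"id": 0}, {"id": 1}, {"id": 2}],
    [{"id": 0}, {"id": 1}],
    [{"id": 0}, {"id": 1}, {"id": 2}],
]
last_id_auto = backfill_id_auto(servers)
all_ids = [row["id_auto"] for rows in servers for row in rows]
print(all_ids, last_id_auto)
```

Because each server starts from the maximum left by the previous one, id_auto stays unique across the cluster even though id still repeats. Running two servers at the same time would break this, which is why the runs have to be sequential.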

Method 2: Keep the ReplacingMergeTree table engine instead of changing it. The solution is to insert into the local tables directly. After each insertion, determine the maximum id now present in the target table, and on the next insertion add that maximum to the value returned by rowNumberInAllBlocks().
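
One way to put Method 2 into practice is to generate the INSERT statement from the maximum id read beforehand. The Python helper below is a minimal sketch, not a definitive implementation: build_insert, the table names db.events_local and db.events_staging, and the columns event_time and payload are all hypothetical, and the "+ 1" assumes rowNumberInAllBlocks() numbers rows from 0.

```python
def build_insert(target_table, source_table, columns, current_max_id):
    # Shift the 0-based row number past the ids already in the target table.
    id_expr = f"{int(current_max_id)} + 1 + rowNumberInAllBlocks() AS id"
    select_list = ", ".join([id_expr] + list(columns))
    column_list = ", ".join(["id"] + list(columns))
    return (
        f"INSERT INTO {target_table} ({column_list}) "
        f"SELECT {select_list} FROM {source_table}"
    )


# Hypothetical query to run first on the target table to obtain the offset.
max_id_query = "SELECT max(id) FROM db.events_local"

# Suppose the query above returned 41.
sql = build_insert(
    "db.events_local",
    "db.events_staging",
    ["event_time", "payload"],
    41,
)
print(sql)
```

The maximum has to be read again before every insertion, and insertions must not overlap; otherwise two runs could start from the same offset and produce duplicate ids again.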


Origin blog.csdn.net/qq_41110377/article/details/128634788