Migrating Large Amounts of Data in ClickHouse

The ClickHouse documentation describes several backup methods. Different business needs call for different approaches, and there is no single solution that covers every backup-and-restore scenario. This article walks through the main ClickHouse migration methods; which one to use depends on your requirements.

1. Text file import and export

Exporting the data to a text file in a specific format and then importing it again is straightforward and easy to understand, but it is only practical when the amount of data is small. With a large dataset, this method becomes painfully slow.

Export:

clickhouse-client --password 12345678 --query="select * from inuser.t_record FORMAT CSV" > record.csv

Import (note the capitalization of the format name after FORMAT):

cat record.csv | clickhouse-client --port 9008 --password 12345678 --query="INSERT INTO inuser.t_record FORMAT CSV"
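For datasets too large to push through a single pipe comfortably, one workaround is to stream the CSV in fixed-size batches so that each INSERT stays small. The sketch below is illustrative, not part of the original article: the table name and connection flags are carried over from the commands above, and the chunk size is an arbitrary assumption.

```python
import subprocess
from itertools import islice

def iter_chunks(lines, chunk_rows):
    """Yield lists of at most chunk_rows lines from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_rows))
        if not chunk:
            return
        yield chunk

def import_csv(csv_path, table, chunk_rows=500_000, runner=None):
    """Feed a CSV file to ClickHouse in batches.

    runner is injectable for testing; by default each batch is piped
    into clickhouse-client (flags mirror the import command above).
    """
    if runner is None:
        def runner(batch):
            subprocess.run(
                ["clickhouse-client", "--port", "9008",
                 "--password", "12345678",
                 "--query", f"INSERT INTO {table} FORMAT CSV"],
                input="".join(batch).encode(), check=True)
    with open(csv_path) as f:
        for batch in iter_chunks(f, chunk_rows):
            runner(batch)
```

Plain line-based chunking is safe only if no CSV field contains an embedded newline; for such data, split on complete records instead.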

2. Copy the data directory

Cold-copy recovery: copy the ClickHouse data files directly to another machine, adjust the relevant configuration, and start the server. Look carefully at ClickHouse's directory structure on the file system (the location is set by <path> in the configuration file /etc/clickhouse-server/config.xml); for ease of viewing, only the data and metadata directories are shown here.

Based on this layout, you can migrate data by copying the data and metadata directories (excluding the system database) to the new cluster.

Steps:

1. Stop the source ClickHouse server and package the data and metadata directories of the databases or tables to be migrated.
2. Copy them into the corresponding directory of the target ClickHouse server, e.g. /var/lib/clickhouse.
3. Grant ownership to the clickhouse user:
        chown -Rf clickhouse:clickhouse /var/lib/clickhouse/*
        chown -Rf clickhouse:clickhouse /var/lib/clickhouse
4. Restart the target ClickHouse server.
5. Verify the data:
        select count(1) from inuser.t_record;
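The steps above can be sketched as a small helper that assembles the shell commands to run on the source and target hosts. This is an illustration rather than the article's own script: the database name, target host, archive path, and the use of systemctl/scp are all assumptions; executing the commands is left to the operator.

```python
def cold_copy_commands(db, target_host,
                       data_root="/var/lib/clickhouse"):
    """Return (source_cmds, target_cmds) for a cold-copy migration
    of one database. db, target_host, and data_root are illustrative
    parameters; run source_cmds on the old server, target_cmds on
    the new one."""
    archive = f"/tmp/{db}_backup.tar.gz"
    source = [
        "systemctl stop clickhouse-server",
        # package both the data and metadata directories of the database
        f"tar -C {data_root} -czf {archive} data/{db} metadata/{db}",
        f"scp {archive} {target_host}:{archive}",
    ]
    target = [
        f"tar -C {data_root} -xzf {archive}",
        # ClickHouse must own everything under its data root
        f"chown -Rf clickhouse:clickhouse {data_root}/*",
        f"chown -Rf clickhouse:clickhouse {data_root}",
        "systemctl restart clickhouse-server",
    ]
    return source, target
```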

3. Use a third-party tool: clickhouse-backup

clickhouse-backup is a community open-source ClickHouse backup tool that can also be used for data migration. It works by first creating a backup and then importing data from that backup, similar to MySQL's mysqldump + SOURCE. The tool is well suited as a routine off-site cold-backup solution.

Usage restrictions:

  • Supports ClickHouse 1.1.54390 and above
  • Supports only table engines from the MergeTree family
  • Does not support backing up tiered storage or storage_policy
  • Maximum backup size on cloud storage is 5 TB
  • The maximum number of parts on AWS S3 is 10,000
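Given these restrictions, it helps to check in advance which tables clickhouse-backup can actually cover. The filter below is a hypothetical helper, not part of the tool: it takes (database, table, engine) tuples, such as you might collect by querying system.tables, and keeps only MergeTree-family engines while skipping the system database.

```python
MERGETREE_FAMILY = (
    "MergeTree", "ReplacingMergeTree", "SummingMergeTree",
    "AggregatingMergeTree", "CollapsingMergeTree",
    "VersionedCollapsingMergeTree", "GraphiteMergeTree",
)

def backupable_tables(tables):
    """Filter (database, table, engine) tuples down to what
    clickhouse-backup supports: MergeTree-family engines only,
    excluding the system database."""
    eligible = []
    for database, table, engine in tables:
        if database == "system":
            continue
        # Replicated* engines wrap a MergeTree-family engine
        if engine.startswith("Replicated"):
            engine = engine[len("Replicated"):]
        if engine in MERGETREE_FAMILY:
            eligible.append((database, table))
    return eligible
```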

(1) Download the clickhouse-backup software package

Official releases are provided both as a standalone binary and as an RPM package.

GitHub repository: https://github.com/AlexAkulov/clickhouse-backup

Download address: https://github.com/AlexAkulov/clickhouse-backup/releases/download/v1.0.0/clickhouse-backup.tar.gz

(2) Modify the clickhouse-backup configuration file config.yml

Modify this configuration file to match your ClickHouse deployment: the data directory, database password, listen address and port, and so on.

Official configuration instructions:

In addition to backing up to the local machine, clickhouse-backup supports remote backups to S3 (object storage), FTP, and SFTP, and it can also be driven through an API interface.
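As a rough illustration, a minimal config.yml for backing up to S3 might look like the sketch below. The key names follow the clickhouse-backup project's documented layout, but verify them against `clickhouse-backup default-config` for your installed version; all values shown are placeholders.

```yaml
general:
  remote_storage: s3          # "none" for local-only backups
clickhouse:
  username: default
  password: "12345678"
  host: localhost
  port: 9000
s3:
  access_key: <your-access-key>
  secret_key: <your-secret-key>
  bucket: clickhouse-backups
  region: us-east-1
  path: backup/
```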

(3) View clickhouse-backup related commands

1. View all default configuration items

 clickhouse-backup default-config

2. View the tables that can be backed up (all tables in the system and default databases have been filtered out in the configuration file)

 [root@localhost clickhouse-backup]# clickhouse-backup tables
 datasets.hits_v1  1.50GiB  default

3. Create a backup

# Full database backup

clickhouse-backup create

The backup is stored in $data_path/backup. The backup name defaults to a timestamp, but you can also specify one manually:

 clickhouse-backup create <backup_name>

The backup contains two directories:

  • metadata directory: contains the DDL SQL needed to re-create the tables
  • shadow directory: contains the data parts produced by ALTER TABLE ... FREEZE

Single-table backup:

 clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>

Back up the table datasets.hits_v1:

 clickhouse-backup create  -t datasets.hits_v1

Back up multiple tables, datasets.hits_v1 and datasets.hits_v2:

 clickhouse-backup create  -t datasets.hits_v1,datasets.hits_v2

4. View backup records

[root@localhost datasets]# clickhouse-backup list

5. Delete the backup file

 [root@localhost datasets]# clickhouse-backup delete local 2021-09-06T14-03-23
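Because old backups accumulate under $data_path/backup, a retention policy is useful. The helper below is a sketch, not part of clickhouse-backup itself: given backup names that sort chronologically (the default timestamp names, like 2021-09-06T14-03-23, do), it returns the names to prune while keeping the newest N; each returned name would then be passed to `clickhouse-backup delete local <name>`.

```python
def backups_to_delete(names, keep=7):
    """Return the backup names to prune, keeping the newest `keep`.

    Relies on timestamp-style names sorting chronologically; the
    default `keep` of 7 is an arbitrary illustrative choice.
    """
    ordered = sorted(names)          # oldest first
    if keep <= 0:
        return ordered
    return ordered[:-keep] if len(ordered) > keep else []
```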

(4) Data recovery

Syntax:

 clickhouse-backup restore <backup_name>

4. Use clickhouse-backup to back up and restore data

4.1 Local backup and recovery

4.2 Remote backup and recovery on a different machine

5. Use scripts to perform regular remote backups to a different machine
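A cron-driven script for this kind of scheduled run would typically chain backup creation, upload, and local cleanup. The sketch below only assembles the command sequence; the subcommand names follow clickhouse-backup v1.x (`create_remote` uploads to the remote storage configured in config.yml), but treat them as assumptions to verify against your installed version, and wiring this into cron is left to the operator.

```python
from datetime import datetime

def nightly_backup_commands(now=None):
    """Build the command list for one scheduled remote-backup run.

    `now` is injectable for testing; the backup name mirrors the
    default timestamp naming shown earlier in this article.
    """
    now = now or datetime.now()
    name = now.strftime("%Y-%m-%dT%H-%M-%S")
    return [
        # create the backup and upload it to the configured remote storage
        ["clickhouse-backup", "create_remote", name],
        # drop the local copy once it has been uploaded
        ["clickhouse-backup", "delete", "local", name],
    ]
```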

6. Frequently asked questions

1. Symptom: when restoring data with clickhouse-backup, a UUID-related error is reported


Source: blog.csdn.net/inthirties/article/details/128478751