Overview of the ClickHouse expansion plan

2. Expansion

2.1 Replica expansion

To expand replicas: when a new replica node is added to the ClickHouse cluster, ZooKeeper coordinates replication so that the data of the original replica is automatically synchronized to the newly added replica node.

1. General steps for replica expansion

  • Modify the configuration on the new replica node, adding it to the cluster configuration
  • Start the new replica node and create the relevant replicated tables (at this point the new node can route queries to all replica nodes normally, but the original replica node's configuration file has not been refreshed yet, so it still routes only to itself)
  • Modify the configuration file of the original replica node and add the new replica node to the cluster configuration (a verification query is sketched below)
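
As a quick sanity check after each configuration change, the cluster view of any node can be inspected; a minimal sketch, assuming the cluster name shard1_repl1 used in the case test below:

-- How many replicas does this node currently see for the cluster?
SELECT cluster, shard_num, replica_num, host_name, is_local
FROM system.clusters
WHERE cluster = 'shard1_repl1';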

2. Case test

1) Configuration before expansion

-- Configuration file
<clickhouse_remote_servers>
    <!-- Test only shard config for testing distributed storage -->
    <shard1_repl1>
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>true</internal_replication>
            <replica>
                <host>sdw1</host>
                <port>9000</port>
            </replica>
        </shard>
    </shard1_repl1>
</clickhouse_remote_servers>


-- Cluster information on sdw1
sdw1 :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard1_repl1 │         1 │            1 │           1 │ sdw1      │ 172.16.104.12 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

1 rows in set. Elapsed: 0.005 sec.

-- Replicated table on sdw1
sdw1 :) show tables;

SHOW TABLES

┌─name─┐
│ tt1  │
└──────┘

1 rows in set. Elapsed: 0.007 sec.

sdw1 :) select * from tt1 order by id;

SELECT *
FROM tt1
ORDER BY id ASC

┌─id─┬─name─┬─create_date─┐
│  4 │ ww   │  2020-01-02 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  6 │ dsk  │  2020-07-20 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│ 19 │ bw   │  2021-02-18 │
└────┴──────┴─────────────┘

3 rows in set. Elapsed: 0.012 sec.

2) Modify the configuration file

<clickhouse_remote_servers>
    <!-- Test only shard config for testing distributed storage -->
    <shard1_repl1>
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>true</internal_replication>
            <replica>
                <host>sdw1</host>
                <port>9000</port>
            </replica>
<!-- Add the new replica node to the cluster configuration -->
            <replica>
                <host>sdw2</host>
                <port>9000</port>
            </replica>

        </shard>
    </shard1_repl1>
</clickhouse_remote_servers>

<!-- Fill in the macros for the new replica node following the existing pattern -->
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-2</replica>
</macros>
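
Whether the macros were picked up can be verified on the new node; a minimal check using the standard system.macros table:

-- On sdw2: show the macros the server actually loaded
sdw2 :) select * from system.macros;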

3) After modifying the configuration on the sdw2 node, start the ClickHouse service on the new replica node and manually create the related table structures. At this point, from sdw2's perspective the cluster replica information is complete and queries can be routed to any node; from sdw1's perspective, because its configuration file has not been refreshed yet, the cluster still contains only sdw1.

-- Start the ClickHouse service on sdw2
# systemctl  restart clickhouse-server


-- Manually create the table structures on sdw2
sdw2 :) create table db1.tt1 (`id` Int32,`name` String,`create_date` Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/tt1', '{replica}') PARTITION BY toYYYYMM(create_date) ORDER BY id SETTINGS index_granularity = 8192;
sdw2 :) create table db1.tt2 on cluster shard1_repl1 (`id` Int32,`name` String) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/tt2', '{replica}') ORDER BY id SETTINGS index_granularity = 8192;
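
Because the ON CLUSTER DDL for tt2 is issued from sdw2, whose cluster definition already contains both replicas, it should also be created on sdw1. A quick way to confirm this (not part of the original transcript) would be:

-- On sdw1: confirm that the ON CLUSTER DDL issued from sdw2 reached this node
sdw1 :) show create table db1.tt2;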

-- The cluster seen by sdw1 still has only one node
sdw1 :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard1_repl1 │         1 │            1 │           1 │ sdw1      │ 172.16.104.12 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

1 rows in set. Elapsed: 0.006 sec.

-- The cluster information on sdw2 already reflects the post-expansion state
sdw2 :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard1_repl1 │         1 │            1 │           1 │ sdw1      │ 172.16.104.12 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ shard1_repl1 │         1 │            1 │           2 │ sdw2      │ 172.16.104.13 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.011 sec.

4) Check that the existing data replicates to the sdw2 node correctly

sdw2 :) select * from tt1 order by id;

SELECT *
FROM tt1
ORDER BY id ASC

┌─id─┬─name─┬─create_date─┐
│  4 │ ww   │  2020-01-02 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  6 │ dsk  │  2020-07-20 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│ 19 │ bw   │  2021-02-18 │
└────┴──────┴─────────────┘

3 rows in set. Elapsed: 0.011 sec.
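
Besides comparing the rows themselves, the replication state can be inspected directly on sdw2; a minimal sketch using the standard system.replicas table:

-- On sdw2: the replica should be registered under the same ZooKeeper path
-- as sdw1, with an empty replication queue once it has caught up
sdw2 :) select database, table, replica_name, zookeeper_path, queue_size, absolute_delay from system.replicas where table = 'tt1';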

5) Modify the sdw1 node configuration file and check that the configuration takes effect

-- Check the cluster information on sdw1; it has been refreshed to the post-expansion state
sdw1 :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard1_repl1 │         1 │            1 │           1 │ sdw1      │ 172.16.104.12 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ shard1_repl1 │         1 │            1 │           2 │ sdw2      │ 172.16.104.13 │ 9000 │        0 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.010 sec.


6) Write data on sdw2 and verify that it replicates to sdw1

-- Data written on sdw2 (new row id = 1) replicates normally
sdw1 :) select * from tt1 order by id;

SELECT *
FROM tt1
ORDER BY id ASC

┌─id─┬─name─┬─create_date─┐
│  1 │ aa   │  2020-01-04 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  4 │ ww   │  2020-01-02 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  6 │ dsk  │  2020-07-20 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│ 19 │ bw   │  2021-02-18 │
└────┴──────┴─────────────┘

4 rows in set. Elapsed: 0.015 sec.

-- Check the data on sdw1
sdw1 :) select * from tt1 order by id;

SELECT *
FROM tt1
ORDER BY id ASC

┌─id─┬─name─┬─create_date─┐
│  1 │ aa   │  2020-01-04 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  4 │ ww   │  2020-01-02 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  6 │ dsk  │  2020-07-20 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│ 19 │ bw   │  2021-02-18 │
└────┴──────┴─────────────┘

4 rows in set. Elapsed: 0.015 sec.


2.2 Shard expansion

1. General steps for shard expansion

Option one (redistribute historical data):

  • Add a new cluster definition, containing all nodes after the expansion, to both the original shard node and the newly added shard node.
  • On the original shard node, create a new table table_bak with the same structure as the historical table; pay attention to the choice of engine and of cluster.
  • Migrate the data of the original cluster's distributed table into the corresponding distributed table of the new cluster via this snapshot table (INSERT ... SELECT); this step automatically routes the data across the shards.
  • Rename the new cluster's local table table_bak to table and create a new distributed table, completing the redistribution of the data (a consistency check is sketched after this list).
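
Before swapping the tables with RENAME, it is worth verifying that the new distributed table holds the same number of rows as the old one; a minimal sketch, using the table names from the case test below:

-- Row counts of the old and new distributed tables should match
mdw :) select (select count() from db1.t2) as old_cnt, (select count() from db1.t2_new) as new_cnt;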

Option two (do not migrate historical data):

  • On both the new shard node and the original shard node, add the new shard directly to the original cluster definition
  • Manually create the local table and distributed table on the newly added shard node, and recreate the distributed table on the original shard node
  • Historical data remains on the original shard node; new data is written normally across all shards of the expanded cluster

When historical data is not migrated, a TTL should be set on the tables wherever possible, so that the data volume does not become heavily skewed toward one node and overload it.
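
For example, a local table could be defined with a time-based TTL; a hypothetical sketch, assuming the table has an event_date column (not part of the case test below):

-- Hypothetical example: parts older than 90 days are dropped automatically,
-- so the original shard does not keep accumulating historical data
CREATE TABLE db1.events_local ON CLUSTER shard2_repl0
(
    `id` Int32,
    `name` String,
    `event_date` Date
)
ENGINE = MergeTree
ORDER BY id
TTL event_date + INTERVAL 90 DAY;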

2. Case test (Option one: redistribute historical data)

1) Environment check before expansion

-- Configuration file
<clickhouse_remote_servers>
    <!-- Test only shard config for testing distributed storage -->
    <shard2_repl0>
	<shard>
	    <replica>
		<host>mdw</host>
		<port>9000</port>
	    </replica>
	</shard>
    </shard2_repl0>
</clickhouse_remote_servers>

-- Cluster information
mdw :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl0 │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2) New cluster configuration

<clickhouse_remote_servers>
<!-- The newly added shard node does not need this original cluster definition -->
    <shard2_repl0>
	<shard>
	    <replica>
		<host>mdw</host>
		<port>9000</port>
	    </replica>
	</shard>
    </shard2_repl0>
    
<!-- New cluster configuration -->
    <shard2_repl0_new>
	<shard>
	    <replica>
		<host>mdw</host>
		<port>9000</port>
	    </replica>
	</shard>
	<shard>
	    <replica>
		<host>sdw3</host>
		<port>9000</port>
	    </replica>
	</shard>
    </shard2_repl0_new>
</clickhouse_remote_servers>

3) After modifying the configuration files on the mdw and sdw3 nodes, check that the configuration takes effect

-- Cluster information on mdw
mdw :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl0     │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ shard2_repl0_new │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ shard2_repl0_new │         2 │            1 │           1 │ sdw3      │ 172.16.104.14 │ 9000 │        0 │ default │                  │            0 │                       0 │
└──────────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

3 rows in set. Elapsed: 0.011 sec.


sdw3 :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl0_new │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ shard2_repl0_new │         2 │            1 │           1 │ sdw3      │ 172.16.104.14 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.006 sec.

4) Create the tables under the new cluster via snapshot tables and migrate the data

mdw :) create table db1.t2_new_local on cluster shard2_repl0_new as db1.t2_local ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/t2_new_local', '{replica}') ORDER BY id;
mdw :) create table db1.t8_new_local on cluster shard2_repl0_new as db1.t8_local ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/t8_new_local', '{replica}') ORDER BY id;

mdw :) create table db1.t2_new on cluster shard2_repl0_new as db1.t2 ENGINE = Distributed('shard2_repl0_new', 'db1', 't2_new_local', rand())
mdw :) create table db1.t8_new on cluster shard2_repl0_new as db1.t8 ENGINE = Distributed('shard2_repl0_new', 'db1', 't8_new_local', rand())

mdw :) insert into t2_new select * from t2;
mdw :) insert into t8_new select * from t8;

-- Check the data distribution
mdw :) select * from t2_new

SELECT *
FROM t2_new

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
└────┴──────┘
┌─id─┬─name─┐
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

5 rows in set. Elapsed: 0.013 sec.

mdw :) select * from t2_new_local;

SELECT *
FROM t2_new_local

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
└────┴──────┘

sdw3 :) select * from t2_new_local;

SELECT *
FROM t2_new_local

┌─id─┬─name─┐
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

3 rows in set. Elapsed: 0.006 sec.

5) Use RENAME to swap the local tables and create the new distributed tables

mdw :) rename table db1.t2_local to db1.t2_bak_local,db1.t2_new_local to db1.t2_local;
mdw :) rename table db1.t2 to db1.t2_bak;
mdw :) rename table db1.t8_local to db1.t8_bak_local,db1.t8_new_local to db1.t8_local;
mdw :) rename table db1.t8 to db1.t8_bak;


sdw3 :) rename table db1.t2_local to db1.t2_bak_local,db1.t2_new_local to db1.t2_local;
sdw3 :) rename table db1.t2 to db1.t2_bak;
sdw3 :) rename table db1.t8_local to db1.t8_bak_local,db1.t8_new_local to db1.t8_local;
sdw3 :) rename table db1.t8 to db1.t8_bak;

mdw :) create table db1.t2(`id` Int32,`name` String) ENGINE = Distributed('shard2_repl0_new', 'db1', 't2_local', rand())
mdw :) create table db1.t8 on cluster shard2_repl0_new (`id` Int32,`name` String) ENGINE = Distributed('shard2_repl0_new', 'db1', 't8_local', rand())

mdw :) select * from t2;

SELECT *
FROM t2

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
└────┴──────┘
┌─id─┬─name─┐
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

5 rows in set. Elapsed: 0.032 sec.

mdw :) select * from t2_local;

SELECT *
FROM t2_local

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
└────┴──────┘

2 rows in set. Elapsed: 0.006 sec.

sdw3 :) select * from t2_local

SELECT *
FROM t2_local

┌─id─┬─name─┐
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

3 rows in set. Elapsed: 0.020 sec.

6) Drop the obsolete backup tables

mdw :) drop table t2_bak_local on cluster shard2_repl0;
mdw :) drop table t2_bak on cluster shard2_repl0;
mdw :) drop table t8_bak_local on cluster shard2_repl0;
mdw :) drop table t8_bak on cluster shard2_repl0;

3. Case test (Option two: no historical data migration)

1) Environment check before expansion

<clickhouse_remote_servers>
    <shard2_repl0>
	<shard>
	    <replica>
		<host>mdw</host>
		<port>9000</port>
	    </replica>
	</shard>
    </shard2_repl0>
</clickhouse_remote_servers>

mdw :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl0 │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

1 rows in set. Elapsed: 0.008 sec.

2) Modify the configuration of all shard nodes, adding the new shard node to the original cluster definition

<clickhouse_remote_servers>
    <shard2_repl0>
	<shard>
	    <replica>
		<host>mdw</host>
		<port>9000</port>
	    </replica>
	</shard>
	<shard>
	    <replica>
		<host>sdw3</host>
		<port>9000</port>
	    </replica>
	</shard>
    </shard2_repl0>
</clickhouse_remote_servers>    
    
mdw :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl0 │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        1 │ default │                  │            0 │                       0 │
│ shard2_repl0 │         2 │            1 │           1 │ sdw3      │ 172.16.104.14 │ 9000 │        0 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.006 sec.


sdw3 :) select * from system.clusters;

SELECT *
FROM system.clusters

┌─cluster──────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ shard2_repl0 │         1 │            1 │           1 │ mdw       │ 172.16.104.11 │ 9000 │        0 │ default │                  │            0 │                       0 │
│ shard2_repl0 │         2 │            1 │           1 │ sdw3      │ 172.16.104.14 │ 9000 │        1 │ default │                  │            0 │                       0 │
└──────────────┴───────────┴──────────────┴─────────────┴───────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

2 rows in set. Elapsed: 0.006 sec.

3) Manually create the local and distributed tables on the newly added shard node

sdw3 :) create table db1.t2_aa_local on cluster shard2_repl0 (`id` Int32,`name` String) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/t2_aa_local', '{replica}') ORDER BY id;
sdw3 :) create table t2_aa(`id` Int32,`name` String) ENGINE = Distributed('shard2_repl0', 'db1', 't2_aa_local', rand())
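
The transcript does not show it, but per the step list the distributed table presumably also has to be (re)created on the original shard node mdw so that it targets the expanded cluster; a hypothetical sketch:

-- Hypothetical step on mdw (not in the original transcript): (re)create the
-- distributed table on the original shard node against the expanded cluster
mdw :) create table t2_aa (`id` Int32, `name` String) ENGINE = Distributed('shard2_repl0', 'db1', 't2_aa_local', rand());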

mdw :) select * from t2_aa

SELECT *
FROM t2_aa

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

5 rows in set. Elapsed: 0.016 sec.

mdw :) select * from t2_aa_local

SELECT *
FROM t2_aa_local

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

5 rows in set. Elapsed: 0.005 sec.

sdw3 :) select * from t2_aa

SELECT *
FROM t2_aa

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘

5 rows in set. Elapsed: 0.016 sec.

sdw3 :) select * from t2_aa_local

SELECT *
FROM t2_aa_local

Ok.

0 rows in set. Elapsed: 0.004 sec.

4) Write new data

sdw3 :) insert into t2_aa values(6,'ff'),(7,'gg');

INSERT INTO t2_aa VALUES

Ok.

2 rows in set. Elapsed: 0.050 sec.

sdw3 :) select * from t2_aa_local

SELECT *
FROM t2_aa_local

┌─id─┬─name─┐
│  6 │ ff   │
└────┴──────┘

mdw :) select * from t2_aa_local

SELECT *
FROM t2_aa_local

┌─id─┬─name─┐
│  1 │ aa   │
│  2 │ bb   │
│  3 │ cc   │
│  4 │ dd   │
│  5 │ ee   │
└────┴──────┘
┌─id─┬─name─┐
│  7 │ gg   │
└────┴──────┘

6 rows in set. Elapsed: 0.008 sec.

Origin blog.csdn.net/weixin_37692493/article/details/114003547