Kafka data migration in practice

This article covers two common types of Kafka data migration:

1. Migrating partition data between different data disks inside a broker;

2. Migrating partition data between different brokers.

1. Migrating partition data between data disks inside a broker

1.1 Background

The topic partition data stored on a Kafka broker was unevenly distributed across its data disks: some disks were 100% full while others were only about 40% used.

Analysis showed that the partition data of some topics was concentrated on a few disks, the /data5 data disk being one example.

Given the characteristics of a distributed system, the natural approach is data migration: moving partition data between the different data disks inside a broker. Before migrating data on a live production cluster, the procedure must first be verified on a test cluster to ensure data integrity and safety.

1.2 Testing partition data migration between data disks inside a broker

1.2.1 Create a test topic and verify that production and consumption work normally

We set up a test Kafka cluster with three brokers, whose hostnames are tbds-172-16-16-11, tbds-172-16-16-12 and tbds-172-16-16-16. Each broker is configured with two data disks, and partition data is cached under /data/kafka-logs/ and /data1/kafka-logs/.
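In Kafka terms, two data disks on a broker normally correspond to two entries in the broker's log.dirs setting. A minimal sketch of the relevant server.properties line, assuming that is how these test brokers are configured:

# server.properties (sketch): register both data disks as Kafka log directories
log.dirs=/data/kafka-logs,/data1/kafka-logs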

First, create a test topic:

./kafka-topics.sh --create --zookeeper tbds-172-16-16-11:2181 --replication-factor 2 --partitions 3 --topic test_topic

Then produce 500 messages to the topic while consuming them at the same time, and check the topic's partition offsets:

GROUP    TOPIC      PARTITION   CURRENT-OFFSET   LOG-END-OFFSET   LAG   OWNER
groupid1 test_topic 0       172      172      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3
groupid1 test_topic 1       156      156      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3
groupid1 test_topic 2       172      172      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3

  

This confirms that production and consumption for test_topic work normally.
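For reference, the offset table above is the kind of report the consumer-group tool prints. An equivalent quick check with the scripts bundled with Kafka might look like the sketch below; note that the article's actual producer/consumer is kafka-python (as the OWNER column shows), the group name groupid1 is taken from the output above, port 9092 is an assumption, the perf-test flags are those of newer Kafka versions, and on newer versions --bootstrap-server replaces --zookeeper:

# Produce 500 small test messages with the bundled perf-test producer
./kafka-producer-perf-test.sh --topic test_topic --num-records 500 --record-size 100 \
  --throughput -1 --producer-props bootstrap.servers=tbds-172-16-16-11:9092

# Describe the consumer group to see per-partition offsets and lag
./kafka-consumer-groups.sh --zookeeper tbds-172-16-16-11:2181 --describe --group groupid1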

1.2.2 Migrate partition data between disks

Now, on the broker node tbds-172-16-16-12, move the partition data directory /data1/kafka-logs/test_topic-0/ to /data/kafka-logs/:

mv /data1/kafka-logs/test_topic-0/ /data/kafka-logs/

Checking the /data/kafka-logs/ directory confirms that the test_topic-0 partition data is now there.
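A quick way to confirm the move on tbds-172-16-16-12 (a sketch):

# The segment files of partition 0 should now sit on the target disk...
ls -lh /data/kafka-logs/test_topic-0/
# ...and no test_topic-0 directory should remain on the source disk
ls /data1/kafka-logs/ | grep test_topic-0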

1.2.3 Produce and consume data for the test topic again

Send another 500 messages while consuming at the same time, then check the offsets again:

GROUP    TOPIC      PARTITION   CURRENT-OFFSET   LOG-END-OFFSET   LAG   OWNER
groupid1 test_topic 0       337      337      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3
groupid1 test_topic 1       304      304      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3
groupid1 test_topic 2       359      359      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3

Check the /data/kafka-logs/test_topic-0/ partition directory on broker tbds-172-16-16-12 again.

The partition 0 data that was moved from /data1/kafka-logs/ into /data/kafka-logs/ has not grown at all.

Since each partition of test_topic has two replicas, the other replica of partition 0 lives on broker tbds-172-16-16-16. Log in to tbds-172-16-16-16 and check the data directory of partition 0 there.

The data in the test_topic-0 partition directory on tbds-172-16-16-16 has grown; in other words, that replica has cached the newly produced messages.

So after the partition 0 data directory was moved on broker tbds-172-16-16-12, it no longer receives any new data, while the partition 0 directory on broker tbds-172-16-16-16, which was not moved, keeps caching the newly sent messages.

Does that mean partition data cannot be moved between disks inside a broker?

1.2.4 The restart cure-all: restart Kafka

After restarting the Kafka cluster, the partition 0 data directory on broker tbds-172-16-16-12 has also grown, back to the expected level.

This shows that after the restart, the migration of data between different disks on the broker has taken effect.
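For completeness, on a plain Kafka installation each broker can be restarted with the bundled scripts, one broker at a time. This is only a sketch: the paths are assumptions, and a managed cluster (such as a TBDS deployment) may restart brokers through its own console instead.

# Run on each broker in turn
bin/kafka-server-stop.sh
bin/kafka-server-start.sh -daemon config/server.properties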

1.2.5 Verify that the inter-disk partition data migration has taken effect

Send another 500 messages to test_topic while consuming at the same time, then check the offsets:

GROUP    TOPIC      PARTITION   CURRENT-OFFSET   LOG-END-OFFSET   LAG   OWNER
groupid1 test_topic 0       521      521      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3
groupid1 test_topic 1       468      468      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3
groupid1 test_topic 2       511      511      0      kafka-python-1.3.1_tbds-172-16-16-3/172.16.16.3

Check the test_topic-0 partition data directories on brokers tbds-172-16-16-12 and tbds-172-16-16-16.

The two replicas turn out to be completely identical.

1.3 Conclusion

Partition data directories can be moved freely between the different data disks inside a Kafka broker. Once the move is done, restarting Kafka makes it take effect.

2. Migrating partition data between brokers

After a Kafka cluster is scaled out, the newly added brokers hold no cached data, which easily leads to an uneven data distribution across the cluster. The partition data on the original brokers therefore needs to be migrated to the newly added broker nodes.

Moving partition data between brokers can be done with the kafka-reassign-partitions.sh script that ships with Kafka.

We add one more broker to the three existing brokers of the Kafka test cluster.

2.1 Get the partition distribution of test_topic

Run the command:

./kafka-topics.sh --zookeeper 172.16.16.11:2181 --topic test_topic --describe

  

This returns the distribution of the 3 partitions of test_topic (each with 2 replicas) across the three broker nodes:

Topic:test_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: test_topic Partition: 0 Leader: 1002 Replicas: 1002,1001 Isr: 1002,1001
Topic: test_topic Partition: 1 Leader: 1003 Replicas: 1003,1002 Isr: 1003,1002
Topic: test_topic Partition: 2 Leader: 1001 Replicas: 1001,1003 Isr: 1001,1003

  

2.2 Generate the reassignment configuration for the topic

Write the topics-to-move file move_kafka_topic.json with the following content:

{"topics": [{"topic":"test_topic"}], "version": 1}

Run the command that generates the reassignment plan:

./kafka-reassign-partitions.sh --zookeeper tbds-172-16-16-11:2181 --topics-to-move-json-file /tmp/move_kafka_topic.json --broker-list "1001,1002,1003,1004" --generate

The --broker-list option takes the ids of the 4 brokers in the Kafka cluster. Broker ids differ from cluster to cluster depending on how the cluster was deployed; in our test cluster they are 1001, 1002, 1003 and 1004. Fill in the broker ids configured in your own Kafka cluster.
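For context, a broker id is either set explicitly through broker.id in server.properties or auto-generated; ids in the 1001+ range, as seen here, are typical of auto-generation, which hands out ids above reserved.broker.max.id (default 1000). A sketch of the explicit variant for the newly added broker, where the value 1004 is an assumption:

# server.properties on the new broker (sketch)
broker.id=1004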

Running the command produces the following result:

Current partition replica assignment    # the current replica assignment
{"version":1,"partitions":[{"topic":"test_topic","partition":0,"replicas":[1002,1001]},{"topic":"test_topic","partition":2,"replicas":[1001,1003]},{"topic":"test_topic","partition":1,"replicas":[1003,1002]}]}
Proposed partition reassignment configuration    # the proposed reassignment plan
{"version":1,"partitions":[{"topic":"test_topic","partition":0,"replicas":[1001,1002]},{"topic":"test_topic","partition":2,"replicas":[1003,1004]},{"topic":"test_topic","partition":1,"replicas":[1002,1003]}]}

  

The JSON after "Proposed partition reassignment configuration" is the reassignment plan generated from the broker list given on the command line. Copy the proposed configuration and save it to a file named move_kafka_topic_result.json:

{"version":1,"partitions":[{"topic":"test_topic","partition":0,"replicas":[1001,1002]},{"topic":"test_topic","partition":2,"replicas":[1003,1004]},{"topic":"test_topic","partition":1,"replicas":[1002,1003]}]}

2.3 Redistribute the topic's partition data

Run the reassignment command:

./kafka-reassign-partitions.sh --zookeeper tbds-172-16-16-11:2181 --reassignment-json-file /tmp/move_kafka_topic_result.json --execute

  

This gives the following result:

Current partition replica assignment
{"version":1,"partitions":[{"topic":"test_topic","partition":0,"replicas":[1002,1001]},{"topic":"test_topic","partition":2,"replicas":[1001,1003]},{"topic":"test_topic","partition":1,"replicas":[1003,1002]}]}
Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions {"version":1,"partitions":[{"topic":"test_topic","partition":0,"replicas":[1001,1002]},{"topic":"test_topic","partition":2,"replicas":[1003,1004]},{"topic":"test_topic","partition":1,"replicas":[1002,1003]}]}

  

The output shows that the partition reassignment task has been started successfully.
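Two practical notes for production use, shown as a hedged sketch rather than as part of the original procedure: newer Kafka versions accept a --throttle option (bytes per second) on the same --execute command to limit replication traffic during the move, and the "Current partition replica assignment" JSON printed above can be saved for a rollback.

# Rate-limit the reassignment to roughly 50 MB/s (the value is only an example)
./kafka-reassign-partitions.sh --zookeeper tbds-172-16-16-11:2181 \
  --reassignment-json-file /tmp/move_kafka_topic_result.json --execute --throttle 50000000
# Running --verify afterwards (section 2.4) also clears the throttle once the move completes.
# To roll back, re-run --execute with the saved "Current partition replica assignment" JSON.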

2.4 Check the progress of the partition reassignment

Check the status of the reassignment with:

./kafka-reassign-partitions.sh --zookeeper tbds-172-16-16-11:2181 --reassignment-json-file /tmp/move_kafka_topic_result.json --verify

  

Result:

Status of partition reassignment:
Reassignment of partition [test_topic,0] completed successfully
Reassignment of partition [test_topic,2] completed successfully
Reassignment of partition [test_topic,1] completed successfully

  

This shows that the partition reassignment task has completed.

2.5 Get the partition distribution of test_topic again

Check the distribution of each partition once more:

./kafka-topics.sh --zookeeper 172.16.16.11:2181 --topic test_topic --describe

  

The command returns:

Topic:test_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: test_topic Partition: 0 Leader: 1002 Replicas: 1001,1002 Isr: 1002,1001
Topic: test_topic Partition: 1 Leader: 1003 Replicas: 1002,1003 Isr: 1003,1002
Topic: test_topic Partition: 2 Leader: 1003 Replicas: 1003,1004 Isr: 1003,1004

  

The result shows that the partition data of test_topic has been redistributed from the original 3 brokers onto 4 brokers.
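One detail worth noticing in the output above: for partition 0 the leader is still 1002 even though the preferred (first-listed) replica is now 1001. If leaders should be rebalanced onto the preferred replicas after a reassignment, older Kafka versions ship a script for that (a sketch; newer versions provide kafka-leader-election.sh instead):

./kafka-preferred-replica-election.sh --zookeeper tbds-172-16-16-11:2181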

3. Test conclusions

- Partition data directories can be moved freely between the different data disks inside a Kafka broker; after the move, a Kafka restart makes it take effect.

- Partition data can be migrated between different Kafka brokers, using the kafka-reassign-partitions.sh script that ships with Kafka.

4. Fixing the customer's Kafka cluster

Using the approach tested above, we migrated data between the disks inside the broker nodes of the customer's Kafka cluster. Data was migrated for multiple topics, and in the end the cached data was distributed evenly across the disks.

We then also scaled out the customer's Kafka cluster and, using the inter-broker migration method described above, migrated data for multiple topics, so that the newly added nodes also hold cached data and the storage pressure on the original broker nodes is reduced.

 

https://juejin.im/post/5a65b2df518825732a6d9ff1

 
