Notes on Scaling Out a Production Kafka Cluster

Some time ago we received feedback from a customer that sending messages to one of our production Kafka clusters had become very slow. It took a while to track down the cause, and in the end we scaled the cluster out. Because some topics held a very large amount of data, migrating them took a long time, but the whole process went smoothly: before the migration we did enough homework on every aspect, including how partition rebalancing affects clients and how the migration affects the performance of the cluster as a whole. This post summarizes that process, and also serves as a warm-up for Double Eleven.

Troubleshooting and analysis

After receiving the user's feedback, I reran my test script against the cluster and compared the results with another, healthy Kafka cluster: send latency on the problem cluster was indeed much higher.

Further investigation showed that clients were frequently disconnecting from the cluster nodes; the broker logs kept printing the following message:

Attempting to send response via channel for which there is no open connection, connection id xxx (kafka.network.Processor)

The message comes from this spot in the source code:

kafka.network.Processor#sendResponse:

According to the comments in the source, this means the remote connection was closed or had been idle for too long. We tracked down the owner of these clients and learned that they belonged to the big data team's Spark cluster.

The logs also showed that one Spark consumer group, OrderDeliveryTypeCnt, had gone through nearly 40,000 rebalances, which is clearly abnormal. A Kafka consumer group rebalances under the following conditions:

  1. The group membership changes: a new consumer joins, an existing consumer leaves, or a consumer crashes;
  2. The number of topics the group subscribes to changes;
  3. The number of partitions of a subscribed topic changes.

Points 2 and 3 clearly had not happened, so the working assumption was that the Spark nodes' frequent disconnects and reconnects were changing the group membership, which in turn kept triggering rebalances. The group's state and membership can be watched from the command line, as shown below.
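A minimal way to observe the group while reproducing the issue, assuming the group uses the broker-side coordinator; the broker address below is a placeholder:

```
# Describe the consumer group: shows its members, their assigned partitions and lag.
# Repeated runs during the incident should show the membership changing constantly.
bin/kafka-consumer-groups.sh --bootstrap-server xxx.xxx.xx.xxx:9092 \
  --describe --group OrderDeliveryTypeCnt
```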

But why did the Spark cluster keep disconnecting and reconnecting in the first place?

Checking versions, the Spark cluster was consuming with a 0.10.1.1 Kafka client, while the cluster itself runs Kafka 2.2.1, so at first we suspected a compatibility problem. Colleagues from the data intelligence team then had a Spark cluster using the 0.11.1.1 Kafka client consume from the same Kafka cluster with eight Spark tasks, and the disconnects still occurred. This indicated that the problem came from the way Spark itself drives its Kafka consumers and had little to do with the Kafka client version.

After a round of discussions with the big data team, the frequent rebalancing appeared to be caused by an internal mechanism of Spark 2.3; Spark 2.4 reportedly does not have this problem.

Since the frequent disconnects and reconnects were not caused by our own services, and with Double Eleven approaching we did not want to rashly upgrade or modify the Spark jobs, the best option for now was to scale the cluster out horizontally to increase its capacity, and to reassign the partitions of the heavily loaded topics.

Analyzing the partition reassignment plan

The cluster currently has six nodes. Taking a 50% expansion as the target, we asked operations to prepare three machines to add to the cluster; the next step was to prepare the partition reassignment policy files for the affected topics.
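Before generating any reassignment plan, it is worth confirming that the new brokers have actually registered with the cluster. A minimal check against ZooKeeper, with placeholder addresses; the ids 6, 7 and 8 are an assumption consistent with the --broker-list "0,1,2,3,4,5,6,7,8" used later:

```
# List the broker ids currently registered in ZooKeeper;
# after the expansion the three new ids (e.g. 6, 7, 8) should appear here.
echo "ls /brokers/ids" | bin/zookeeper-shell.sh xxx.xxx.xx.xxx:2181
```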

Executing a partition reassignment has two main impacts on the cluster:

  1. Partition reassignment migrates topic data between brokers, and therefore consumes cluster bandwidth;
  2. Partition reassignment can move a partition's leader to a different broker, which affects clients.

For the first point, the migration can be run at night (painfully; I remember one topic whose data migration took nearly five hours). For the second point, I considered two options:

  1. Split the whole reassignment into two steps: 1) generate the assignment by hand, keeping each partition's leader where it is and reassigning only the follower replicas; 2) once the data migration is complete, manually switch the partition leaders to balance them.
  2. Use the reassignment plan generated directly by Kafka's tooling and execute it as-is.

The first option has, in theory, the smallest impact on clients: splitting the work into two steps separates the bandwidth impact on the cluster from the impact on clients, so the process is more controllable. The problem is that some topics in the cluster have 64 partitions with a replication factor of 3, i.e. 192 replicas. Manually balancing the remaining replicas while keeping every partition's current leader in place is far too error-prone; the slightest slip produces an unbalanced replica distribution. A hypothetical fragment of such a hand-written plan is sketched below.
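To make the difficulty concrete, a hand-written plan for option 1 would have to look something like the fragment below, where the first replica of each partition stays pinned to its current leader and only the followers are moved. The broker ids and partition entries here are made up for illustration:

```
# Hypothetical fragment of a manually written reassignment plan (option 1):
# replicas[0] stays on the current leader, only the follower replicas move.
cat > manual_reassignment_sketch.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"sjzn_spark_binlog_order_topic","partition":0,"replicas":[0,6,7]},
  {"topic":"sjzn_spark_binlog_order_topic","partition":1,"replicas":[1,7,8]}
]}
EOF
```

Keeping 192 such entries balanced across nine brokers by hand is exactly the part that is too easy to get wrong.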

So I went through the partition reassignment source code and analyzed the process further. It turns out that during reassignment, the newly assigned replicas are merged with the original ones to form the partition's combined replica set; the new replicas then work to catch up with the leader's offset, and once they have caught up they join the ISR. Only after all of the newly assigned replicas are in the ISR does a partition leader election take place, and the original replicas are deleted afterwards. I will cover the details in a separate article.

Following these steps, the reassignment should not block clients at all, because the leader does not change during the data migration; a leader election only happens after the migration completes, and its impact is small. To check that impact I ran a quick test with my script:

The test showed that if the leader changes while messages are being sent, the producer promptly refreshes its metadata and resends the affected messages.
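For reference, a simple way to run this kind of test without writing any code is Kafka's bundled producer perf tool: keep it sending while a leader election happens and watch for latency spikes or errors. The topic name, record counts and broker address below are placeholders, not the actual test setup:

```
# Keep a steady stream of messages flowing while the leader changes;
# the tool reports throughput and latency, so a stall or error burst is easy to spot.
bin/kafka-producer-perf-test.sh \
  --topic test_leader_change_topic \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput 5000 \
  --producer-props bootstrap.servers=xxx.xxx.xx.xxx:9092 acks=1
```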

Based on the analysis and testing above, we decided to go with the second option, following these steps:

  1. Generate the reassignment plan for each topic during working hours (10:00-17:00), review it, save the proposed plan to a topic1_partition_reassignment.json file, and save the current assignment to a topic1_partition_reassignment_rollback.json file in case a rollback is needed later;
  2. Execute the plans during a quiet window (00:30-02:30): run the reassignment for each prepared topic1_partition_reassignment.json file, then verify it and inspect the resulting replica distribution; after each plan, watch the ISR expand/shrink behavior and the message traffic, and only move on to the next plan once everything looks fine;
  3. Since the brokers are configured with auto.leader.rebalance.enable=true, preferred leader elections run automatically at a default interval of 300 seconds, so keep an eye on the preferred leader elections during that window.
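As a concrete aid for step 3, the broker-side setting can be confirmed on each node, and a preferred leader election can also be triggered by hand instead of waiting for the automatic one. The config path and addresses below are placeholders; kafka-preferred-replica-election.sh is the tool shipped with Kafka 2.2:

```
# Confirm the broker-side setting that drives the automatic preferred leader election.
grep auto.leader.rebalance.enable config/server.properties

# Optionally trigger a preferred replica (leader) election manually instead of
# waiting for the periodic check (default interval: 300 seconds).
bin/kafka-preferred-replica-election.sh \
  --zookeeper xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181
```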

Partition reassignment

Kafka does not automatically rebalance existing topics onto newly added brokers; partitions of existing topics will not be assigned to the new brokers on their own. We can, however, reassign a topic's partitions using the tooling Kafka provides, as follows:

  1. Create a JSON file listing the topics whose partitions need to be reassigned:
echo '{"version":1,"topics":[{"topic":"sjzn_spark_binlog_order_topic"}]}' > sjzn_spark_binlog_order_topic.json

  2. Generate the reassignment plan for the topic:
bin/kafka-reassign-partitions.sh --zookeeper xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181 --topics-to-move-json-file sjzn_spark_binlog_order_topic.json --broker-list "0,1,2,3,4,5,6,7,8" --generate

Because the topic has 64 partitions with three replicas each, the generated assignment is quite large, so I will not paste it here.

  3. Save the proposed assignment to a JSON file:
echo '{"version":1,"partitions":[{"topic":"sjzn_spark_binlog_order_topic","partition":59,"replicas":[4,8,0],"log_dirs":["any","any","any"]} ......' > sjzn_spark_binlog_order_topic_reassignment.json

  4. Execute the partition reassignment (a replication throttle can be added here; see the note after this list):
 bin/kafka-reassign-partitions.sh --zookeeper xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181 --reassignment-json-file sjzn_spark_binlog_order_topic_reassignment.json --execute

  5. Verify whether the partition reassignment has completed successfully:
bin/kafka-reassign-partitions.sh --zookeeper xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181 --reassignment-json-file sjzn_spark_order_unique_topic_resign.json --verify
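On the bandwidth point from earlier, kafka-reassign-partitions.sh also accepts a --throttle option that caps the inter-broker replication rate used by the migration, and --verify clears the throttle once the reassignment has finished. A sketch with a made-up limit of roughly 50 MB/s:

```
# Execute the reassignment with replication traffic capped at ~50 MB/s per broker.
bin/kafka-reassign-partitions.sh \
  --zookeeper xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181,xxx.xxx.xx.xxx:2181 \
  --reassignment-json-file sjzn_spark_binlog_order_topic_reassignment.json \
  --execute --throttle 50000000

# Running --verify afterwards both checks progress and removes the throttle when done.
```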

Because the topic holds a particularly large amount of data, the whole reassignment took several hours:

During the data migration I preferred to watch the partitions' changes from the kafka-manager console:

As the console shows, the replica count of essentially every partition has grown, which confirms that the partition's current replica set is the union of the original replicas and the newly assigned ones; the new replicas have not yet caught up with the leader's offset, so they have not joined the ISR list.
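The same thing can be observed from the command line if no kafka-manager console is at hand; a minimal sketch using the standard topic tool, with placeholder addresses:

```
# Show each partition's leader, full replica list and current ISR;
# during the migration the replica list is longer than the ISR.
bin/kafka-topics.sh --zookeeper xxx.xxx.xx.xxx:2181 \
  --describe --topic sjzn_spark_binlog_order_topic

# List only the partitions whose ISR is still smaller than the replica list.
bin/kafka-topics.sh --zookeeper xxx.xxx.xx.xxx:2181 \
  --describe --under-replicated-partitions
```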

Notice also that at this point the partition's leader is not the preferred leader; once the newly assigned replicas catch up and join the ISR, a new round of preferred leader elections will take place. I will analyze the election details in a separate article, so stay tuned.

After a while, I saw that the partitions' offsets were still advancing:

This also confirms that during the reassignment, as long as the leader does not change, clients can keep sending messages to the partition leader without interruption.

As the figure shows, once a newly assigned replica catches up with the leader's offset, it is added to the ISR list.

Now let's look at the cluster's bandwidth load:

As the figure above shows, during the migration the newly assigned replicas keep pulling data from their leaders, which consumes cluster bandwidth.

The topic's replica distribution after the partition reassignment completed:

As the figure shows, all of the newly assigned replicas are now in the ISR list, the old replicas have been deleted, and after the preferred leader elections the preferred leader among the newly assigned replicas has become each partition's leader.
