One article to understand how to solve Kafka message backlog

Usually, enterprises produce data to a Kafka cluster through the Kafka producer using a round-robin or random partitioning strategy, so that data is distributed as evenly as possible across the Kafka partitions.

If you are not familiar with Kafka, you can first read the blog "Quick Understanding of Kafka in One Article".

Solutions to message backlog

Strengthening monitoring and alerting and improving the task-restart mechanism also help, but they are not covered again here.

1. Backlog caused by a crashed real-time consumption task

When the backlog is small and the impact is limited, simply restart the consumption task and investigate the cause of the crash.

If the consumption task has been down for a long time and the backlog is large, then in addition to restarting the task and troubleshooting the root cause, you also need to work off the accumulated messages.

The following methods can be used to clear the backlog.

  1. After the task restarts, consume only the latest messages, and use an offline program to backfill the lagging historical data.
  2. As shown in the figure below: create a new topic with a larger number of partitions, change the consumer of the backlogged topic so that it simply forwards each message into the new topic, and move the real processing logic into the consumers of the new topic.
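The forwarding idea in step 2 can be sketched without a real cluster. The snippet below simulates topics as plain dicts of partition lists and fans a backlogged 2-partition topic out to 8 partitions; `zlib.crc32` stands in for Kafka's default (murmur2) key partitioner, and all names here are illustrative assumptions, not Kafka APIs.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a key; crc32 approximates Kafka's hash partitioner."""
    return zlib.crc32(key) % num_partitions

def forward_backlog(backlog_topic: dict, new_topic_partitions: int) -> dict:
    """Drain a backlogged topic and spread its messages over a new topic
    with more partitions, so more consumers can process them in parallel."""
    new_topic = {p: [] for p in range(new_topic_partitions)}
    for old_partition in backlog_topic.values():
        for key, value in old_partition:
            new_topic[partition_for(key, new_topic_partitions)].append((key, value))
    return new_topic

# Usage: a 2-partition backlog is fanned out to 8 partitions.
backlog = {0: [(b"user-1", "a"), (b"user-2", "b")],
           1: [(b"user-3", "c"), (b"user-4", "d")]}
bigger = forward_backlog(backlog, 8)
```

Because the forwarding consumer does no real work per message, it drains the old topic quickly, and the heavy processing is parallelized across the new topic's larger partition count.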

If you also need to preserve the local (per-key) order of message consumption, change the consumer's thread pool to multiple queues, with each queue processed by a single thread. For more information, see the blog "One article on understanding how Kafka ensures message ordering".

2. Unreasonable Kafka partition count, or insufficient consumer capacity

The partition is the smallest unit of Kafka parallelism tuning. If the number of partitions is set too low, the throughput of the Kafka consumers is capped.

If the data volume is large and consumption cannot keep up, consider increasing the number of topic partitions together with the number of consumers in the consumer group.
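The reason partitions and consumers must grow together is that Kafka assigns each partition to exactly one consumer in a group, so consumers beyond the partition count sit idle. The round-robin assignment below is a simplified stand-in for Kafka's group-assignment strategies, not the actual rebalance protocol.

```python
def assign(partitions: list, consumers: list) -> dict:
    """Round-robin partitions over group members; extra members get nothing."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions, 6 consumers: two consumers are guaranteed to be idle.
idle_demo = assign([0, 1, 2, 3], ["c0", "c1", "c2", "c3", "c4", "c5"])
# 4 partitions, 2 consumers: each consumer handles two partitions.
busy_demo = assign([0, 1, 2, 3], ["c0", "c1"])
```

This is why raising only the consumer count past the partition count buys no extra throughput: the partition count must be raised first.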

3. Optimizing the Kafka message key

When producing messages with the Kafka producer, you can specify a key per message, but the keys must be distributed evenly; otherwise data will be unbalanced across the Kafka partitions.

Therefore, adjust the key-generation rules on the producer side according to the business to resolve the data skew.
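The effect of key choice on skew is easy to demonstrate. Below, `zlib.crc32` again approximates Kafka's default key partitioner (an assumption, not the real murmur2): a single hot key piles every message onto one partition, while a per-entity key spreads the load.

```python
import zlib
from collections import Counter

def partition_counts(keys, num_partitions: int = 6) -> Counter:
    """Count how many messages each partition would receive for these keys."""
    return Counter(zlib.crc32(k) % num_partitions for k in keys)

hot = [b"order"] * 1000                                 # one shared key: total skew
spread = [f"order-{i}".encode() for i in range(1000)]   # hypothetical per-order key
```

With the hot key, one partition (and therefore one consumer) carries the entire load; with per-entity keys, the 1000 messages land roughly evenly across all six partitions.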

Reprinted from: "One article to understand how to solve the problem of Kafka message backlog", Cloud + Community, Tencent Cloud (tencent.com)


Origin: blog.csdn.net/yangbindxj/article/details/123571646