Summary of a Kafka topic consumption backlog problem

 

Fault description:

On the afternoon of December 6, the operations team reported that one partition of a certain topic had a consumption backlog. Because this topic is very important and users had already complained, the operations team was under a lot of pressure. They urgently captured the thread stacks and a heap dump, then restarted the machine.

 

Failure Analysis 1:

The cluster that consumes this topic has fairly simple business logic: it reads from certain topics, applies some logical checks and DB operations, and writes the results to other topics. The operations team spotted the backlog on the Kafka monitoring platform. After seeing that one partition of the topic had a backlog of tens of thousands of messages, they identified the application node that consumes this partition.
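As a rough illustration of what "backlog on a partition" means, consumer lag is usually computed as the partition's log end offset minus the consumer group's committed offset. The sketch below only does that arithmetic on plain maps. The class name PartitionLag, its methods, the sample offsets, and the 10,000-message threshold are all hypothetical and are not taken from the monitoring platform used in this incident.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: class, method names, and the threshold are hypothetical.
class PartitionLag {

    // Lag per partition = log end offset - committed offset (never below 0).
    // Partitions with no committed offset are skipped because their lag is unknown.
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> endOffsets,
                                             Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            Long committed = committedOffsets.get(e.getKey());
            if (committed == null) {
                continue;
            }
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    // Returns the partitions whose lag is strictly greater than the threshold.
    static List<Integer> backlogged(Map<Integer, Long> lag, long threshold) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lag.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up offsets: partition 1 is far behind, partition 0 is nearly caught up.
        Map<Integer, Long> end = new TreeMap<>();
        end.put(0, 1_000L);
        end.put(1, 52_000L);
        Map<Integer, Long> committed = new TreeMap<>();
        committed.put(0, 990L);
        committed.put(1, 12_000L);

        Map<Integer, Long> lag = lagByPartition(end, committed);
        System.out.println("lag per partition: " + lag);
        System.out.println("backlogged partitions: " + backlogged(lag, 10_000L));
    }
}
```

In a real setup the two offset maps would come from the Kafka tooling or the monitoring platform. Once the lagging partition is known, the consumer group's partition assignment shows which application node owns it.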

 

Failure Analysis 2:

Analysis of the node's thread stack information turned up no problem. The heap dump file was then opened in MAT (Eclipse Memory Analyzer), and all threads related to this topic were located by searching for the backlogged topic name. One of these consumer threads turned out to be in (stuck in?) the network return phase of a MySQL query. The SQL statement was visible in the thread context information. It queries a single table, and both query fields are indexed, so there is no obvious reason for this statement to block. Is this SQL statement a slow query?
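One possible reason the stack information looked clean is that a JVM thread waiting on a socket read is normally reported as RUNNABLE rather than BLOCKED, so it is easy to overlook. The sketch below scans the text of a jstack-style thread dump for threads whose name contains a given fragment and whose stack contains a given frame. Several things here are assumptions: the class name StuckThreadScan, the sample thread names, the fragment "orders", and the frame name socketRead0, which is what older JDKs show for a blocking socket read (newer JDKs show different frames).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and the sample dump are hypothetical.
class StuckThreadScan {

    // Returns the names of threads whose name contains nameFragment and whose
    // stack text contains frameFragment. Thread entries in a jstack-style dump
    // start with a quoted thread name and are separated by blank lines.
    static List<String> threadsInFrame(String dump, String nameFragment, String frameFragment) {
        List<String> hits = new ArrayList<>();
        for (String block : dump.split("\\n\\s*\\n")) {
            String entry = block.trim();
            if (!entry.startsWith("\"")) {
                continue; // header or footer text, not a thread entry
            }
            int closingQuote = entry.indexOf('"', 1);
            if (closingQuote < 0) {
                continue;
            }
            String name = entry.substring(1, closingQuote);
            if (name.contains(nameFragment) && entry.contains(frameFragment)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up dump: the first thread is waiting on a socket read but shows as RUNNABLE.
        String dump =
              "\"consumer-orders-3\" #42 prio=5 runnable\n"
            + "   java.lang.Thread.State: RUNNABLE\n"
            + "\tat java.net.SocketInputStream.socketRead0(Native Method)\n"
            + "\n"
            + "\"consumer-orders-4\" #43 prio=5 waiting on condition\n"
            + "   java.lang.Thread.State: WAITING (parking)\n"
            + "\tat sun.misc.Unsafe.park(Native Method)\n";

        System.out.println(threadsInFrame(dump, "orders", "socketRead0"));
    }
}
```

A single dump cannot tell a thread that is stuck from one that happened to be mid-query when the dump was taken. Comparing several dumps taken a few seconds apart gives a stronger signal.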

 

Note:

The operations team has been asked to have a DBA confirm whether any slow SQL queries occurred during that time period. The relevant information will be summarized next week.

 

 

 

 

 
