RocketMQ backlog troubleshooting

I. Introduction

        A service that consumes order messages from MQ for real-time data analysis and aggregation had built up a message backlog. Clearly, consumption was not keeping up with the rate at which orders were placed, so the investigation focused on the factors that affect a node's processing speed. The steps I took are described below.

II. Investigation

1. QPS

        First I looked at the nodes' QPS. Order QPS had been consistently high, with no significant change, so load alone did not explain the backlog.

2. MySQL

        I checked the database in the Alibaba Cloud console. There were no slow SQL queries, so the database was not slowing down processing.

3. JVM

        If the JVM runs full GC frequently, the stop-the-world pauses slow down message processing on the node, but monitoring showed nothing abnormal.
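        If no GC dashboard is available, the same check can be made from inside the process with the standard java.lang.management API. The following is only a minimal sketch: the class name GcSnapshot is hypothetical, and the original service's monitoring setup is not known.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original service.
class GcSnapshot {
    // Returns collector name -> {collection count, accumulated collection time in ms}.
    // A value of -1 means the JVM does not report that figure for the collector.
    static Map<String, long[]> read() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        read().forEach((name, s) ->
            System.out.printf("%s: count=%d, time=%dms%n", name, s[0], s[1]));
    }
}
```

        Taking two snapshots a few minutes apart and comparing the old-generation collector's count and time gives a rough idea of whether full GC is frequent enough to matter.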

4. Node

        Extending the time range of the node monitoring revealed the problem.

The node normally consumes an MQ message in about 34ms.

In the early morning, the processing time rose to 500~600ms.

After a while, the average processing time settled at around 200~300ms, still far above normal.

 

After two nodes were added to the cluster, the other three nodes were under no pressure, but the processing time on 172.17.74.80 remained around 200~300ms.

 

After the node was restarted, its processing time returned to normal.
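The per-node comparison above can be expressed as a simple threshold check against the normal consume time. This is only an illustrative sketch: the class SlowNodeFinder, the factor of 3, and the entries named node-b and node-c are assumptions made up for the example; only the 34ms baseline and the 200~300ms range for 172.17.74.80 come from the monitoring described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for comparing nodes; not part of the original service.
class SlowNodeFinder {
    // Returns the nodes whose average consume time exceeds baselineMs * factor.
    static List<String> findSlow(Map<String, Double> avgConsumeMs, double baselineMs, double factor) {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Double> e : avgConsumeMs.entrySet()) {
            if (e.getValue() > baselineMs * factor) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        Map<String, Double> avg = new LinkedHashMap<>();
        avg.put("172.17.74.80", 250.0); // within the 200~300ms range observed on this node
        avg.put("node-b", 35.0);        // placeholder value, not taken from the monitoring data
        avg.put("node-c", 33.0);        // placeholder value, not taken from the monitoring data
        System.out.println(findSlow(avg, 34.0, 3.0));
    }
}
```

A check like this is useful mainly because it makes a single outlier stand out: when one node is slow and its peers are not, shared dependencies such as the database are less likely to be the cause than something local to that node.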

III. Analysis

        It was strange that only this one node slowed down. The fact that the slowdown began in the early morning pointed the way: I needed to find out what had changed at that time. I eventually asked the DBA, who said that Redis had been upgraded in the early morning. After that, the node's processing was as slow as an old ox.

        The service uses a Redis distributed lock when processing order messages, which suggests that the upgrade made this node's connections to Redis slow. But why would an upgrade slow down client connections, and why on only one node? I could not find any relevant information online, and the situation could not be reproduced, so there was no way to run tests to confirm the cause.
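        One way to gather evidence for a hypothesis like this is to time the lock step separately from the rest of the message handling, so that a slow Redis round trip shows up directly in the logs. The sketch below uses only the JDK. StepTimer and its Timed result type are hypothetical names, and the lambda in main merely sleeps in place of the real lock call, since the original service's lock implementation is not known.

```java
import java.util.function.Supplier;

// Hypothetical timing helper; not part of the original service.
class StepTimer {
    // Holds a step's return value together with how long the step took.
    static final class Timed<T> {
        final T value;
        final long elapsedMs;

        Timed(T value, long elapsedMs) {
            this.value = value;
            this.elapsedMs = elapsedMs;
        }
    }

    // Runs the step once and measures its wall-clock duration in milliseconds.
    static <T> Timed<T> time(Supplier<T> step) {
        long start = System.nanoTime();
        T value = step.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMs);
    }

    public static void main(String[] args) {
        // Stand-in for the real lock acquisition (for example a Redis SET with NX and an expiry).
        // It only sleeps here so that the sketch runs without a Redis server.
        Timed<Boolean> lock = time(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return true;
        });
        System.out.println("lock acquired=" + lock.value + " in " + lock.elapsedMs + "ms");
    }
}
```

        Logging this figure on every node would show whether the extra 200~300ms is really spent waiting on Redis, or somewhere else in the handler.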

IV. Summary

        If a node stops behaving normally after Redis, MySQL, MongoDB, or a similar dependency has been upgraded, try restarting the node to see whether that resolves the problem.

        Troubleshooting is really a process of hypothesis and trial and error. As long as it does not affect the business, it is worth trying things out.


Origin blog.csdn.net/m0_69270256/article/details/125632642