Peeling Back the Layers | Unpacking an Alibaba Interview Question: What Should You Do When the MQ Consumer Side Hits a Bottleneck?

1. Interview scenarios and interview skills

During the spring hiring season (the "golden March, silver April" peak), a reader of mine recently ran into this question in a second-round interview at Ant Financial: what should you do when MQ consumption hits a bottleneck?

"Scale out horizontally." I suspect many readers, like my friend, would blurt this out. The interviewer was clearly not satisfied with that answer and followed up: scaling out just means piling on machines; is there any other way?

During an interview, I suggest pausing to think after you hear a question. Don't rush out a blunt answer; instead, discuss the problem with the interviewer, which also gives you time to organize your thoughts.

A bottleneck on the consumer side is a result, but what caused it? Talking about optimizations and solutions without understanding the cause is unconvincing.

In an interview like this, how should we steer the discussion? My approach is as follows:

  • Discuss with the interviewer how to determine that the consumer side has hit a bottleneck
  • Work out how to find the root cause
  • Propose a solution

> Reminder: For the sake of rigor, this article uses RocketMQ as the example for its analysis.

2. How to judge the bottleneck on the consumer side

In RocketMQ, two indicators are usually used to judge whether the consumer side has hit a bottleneck:

  • Message backlog (the Delay value)
  • lastConsumeTime

In the open-source console, rocketmq-console, you can check both indicators for a consumer, as shown in the following figure:

[Figure: rocketmq-console consumer view showing the two indicators]

  • Delay: The message backlog, that is, how many messages are still waiting to be processed on the consumer side. The larger the value, the more severe the bottleneck on the consumer side.
  • LastConsumeTime: The store time of the last message that was successfully consumed. The further this lags behind the current time, the more likely it is that the consumer has hit a bottleneck.
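
As a rough illustration of what these two indicators measure, the sketch below computes them from raw numbers. It assumes that the backlog of a queue is the broker's maximum offset minus the consumer group's committed offset; the class and method names are hypothetical helpers, not RocketMQ APIs.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not part of RocketMQ) that models the two console indicators. */
final class ConsumerLag {

    /** Backlog of one queue: messages stored on the broker but not yet consumed. */
    static long backlog(long brokerOffset, long consumerOffset) {
        return Math.max(0, brokerOffset - consumerOffset);
    }

    /** Total backlog of a consumer group, summed over all of its queues. */
    static long totalBacklog(long[] brokerOffsets, long[] consumerOffsets) {
        if (brokerOffsets.length != consumerOffsets.length) {
            throw new IllegalArgumentException("offset arrays must have the same length");
        }
        long total = 0;
        for (int i = 0; i < brokerOffsets.length; i++) {
            total += backlog(brokerOffsets[i], consumerOffsets[i]);
        }
        return total;
    }

    /** How far the store time of the last consumed message lags behind "now". */
    static Duration consumeLag(Instant lastConsumeTime, Instant now) {
        return Duration.between(lastConsumeTime, now);
    }
}
```

A backlog that keeps growing together with a lag that keeps widening is the pattern to look for; a single large reading on its own says less.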

So why is there a backlog? Where is the bottleneck on the consumer side?

3. How to locate the problem

When the consumer side hits a bottleneck, how do you tell whether it is a client-side problem or a server-side problem? One of the easiest ways is to check whether other consumer groups in the cluster also have a backlog, especially other consumer groups that subscribe to the same topic as the group in question. In my experience, a message backlog is usually a client-side problem, which can be confirmed by searching rocketmq_client.log:

grep "flow" rocketmq_client.log

[Figure: output of the grep command]

If log entries such as "so do flow control" appear, consumption flow control has been triggered. The direct cause is a backlog on the consumer side: the consumer cannot keep up with the messages it has already pulled. To keep memory from filling up, RocketMQ stops pulling more messages from the server until the messages already pulled have been processed, and prints the log above.
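
The flow-control behavior described above can be modeled roughly as follows. This is a simplified sketch, not RocketMQ's actual code: the class is hypothetical, and the idea of a per-queue cap on locally cached messages is assumed to correspond to the push consumer's pullThresholdForQueue setting.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of client-side flow control (not RocketMQ source code). */
final class FlowControlledPuller {
    private final int maxCachedMessages;              // assumed analogue of pullThresholdForQueue
    private final Queue<String> localCache = new ArrayDeque<>();
    private int flowControlCount = 0;

    FlowControlledPuller(int maxCachedMessages) {
        this.maxCachedMessages = maxCachedMessages;
    }

    /** Tries one pull; skips it and returns false when the local cache is already full. */
    boolean tryPull(List<String> batchFromBroker) {
        if (localCache.size() >= maxCachedMessages) {
            flowControlCount++;                       // this is where a "flow control" log would be printed
            return false;
        }
        localCache.addAll(batchFromBroker);
        return true;
    }

    /** Called when business code finishes one message; frees a slot in the cache. */
    String consumeOne() {
        return localCache.poll();
    }

    int cached() {
        return localCache.size();
    }

    int flowControlCount() {
        return flowControlCount;
    }
}
```

The point of the model is that pulling only resumes once the business code drains the cache, so slow message handling shows up as repeated flow-control hits rather than as unbounded memory growth.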

So how do you find where the consumer is slow? Which line of code is it stuck on?

The usual troubleshooting method is to inspect the thread stacks, that is, to use the jstack command to see what each thread is doing.

Commonly used commands are as follows:

ps -ef | grep java
jstack pid > j1.log

For comparison, I usually take five dumps in a row and check whether the state of the same consumer thread changes across the five files. If it does not change, the thread is stuck, and that is where we need to focus.
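
The comparison across dumps can also be automated. The sketch below is a hypothetical helper that assumes the usual jstack text layout (a quoted thread name on the header line, a java.lang.Thread.State line, then "at ..." frames); it reports the threads whose state and top stack frame are identical in every dump.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds threads that look identical across several jstack dumps. */
final class StuckThreadFinder {

    /** Maps thread name -> "STATE @ top frame" for threads whose name starts with the prefix. */
    static Map<String, String> snapshot(String dump, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        String current = null;   // thread being parsed, or null if it does not match the prefix
        String state = null;
        for (String line : dump.split("\\R")) {
            String t = line.trim();
            if (t.startsWith("\"")) {                       // header line: "thread name" #id ...
                int end = t.indexOf('"', 1);
                String name = end > 0 ? t.substring(1, end) : "";
                current = name.startsWith(prefix) ? name : null;
                state = null;
            } else if (current != null && t.startsWith("java.lang.Thread.State:")) {
                state = t.substring("java.lang.Thread.State:".length()).trim();
            } else if (current != null && state != null && t.startsWith("at ")
                    && !result.containsKey(current)) {
                result.put(current, state + " @ " + t.substring(3));   // keep only the top frame
            }
        }
        return result;
    }

    /** Names of threads whose state and top frame are the same in every dump. */
    static Set<String> unchanged(List<String> dumps, String prefix) {
        Set<String> stuck = new TreeSet<>();
        if (dumps.isEmpty()) {
            return stuck;
        }
        Map<String, String> base = snapshot(dumps.get(0), prefix);
        stuck.addAll(base.keySet());
        for (String dump : dumps.subList(1, dumps.size())) {
            Map<String, String> snap = snapshot(dump, prefix);
            stuck.removeIf(name -> !base.get(name).equals(snap.get(name)));
        }
        return stuck;
    }
}
```

Note that idle pool threads parked while waiting for work will also look unchanged, so the reported frames still need a human look: a thread that sits in business code or a socket read in every dump is the suspicious one.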

In RocketMQ, consumer thread names start with ConsumeMessageThread_. Looking through those threads, a stack like the following is exactly what we hope to find:

[Figure: jstack output for a ConsumeMessageThread_ thread]

The consumer thread is in the RUNNABLE state, but its stack is identical in all five files, so we can basically conclude that the thread is stuck at that specific piece of code. In this sample, it is stuck calling an external HTTP interface. The fix follows directly: for external calls, and HTTP calls in particular, set a timeout to avoid waiting for a long time.
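
Here is a minimal sketch of setting such timeouts. The article does not say which HTTP client the consumer used, so this assumes the JDK's built-in java.net.http.HttpClient (Java 11 or later); the class name, URL, and timeout values are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical factory for HTTP calls that cannot block a consumer thread forever. */
final class TimeoutHttp {

    /** Client that gives up if the TCP connection is not established in time. */
    static HttpClient newClient(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .connectTimeout(connectTimeout)
                .build();
    }

    /** Request that fails with a timeout exception if no response arrives in time. */
    static HttpRequest newRequest(String url, Duration requestTimeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(requestTimeout)
                .GET()
                .build();
    }
}
```

With these in place, a call such as client.send(request, HttpResponse.BodyHandlers.ofString()) throws java.net.http.HttpTimeoutException when the deadline passes, and the message listener can catch it and decide whether to retry the message later instead of holding the consumer thread.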

4. Solutions

In fact, the third step will most likely reveal which part is slow and where the performance bottleneck lies. Usually the bottleneck turns out to be in a third-party service being called, the database, or a similar dependency, and you can then apply the right remedy. Performance tuning for databases and the like is outside the scope of this article, so we stop here. Of course, the interviewer may go on to discuss database optimization and related topics. That kind of back-and-forth keeps the technical discussion friendly and greatly improves your chances of passing the interview.

Finally, I would like to discuss one more question with you: does a message backlog necessarily mean there is a consumption bottleneck, and does it always have to be dealt with?

Not necessarily. Recall why we use MQ in the first place: for asynchronous decoupling and for peak shaving and valley filling. During Double Eleven, for example, a large burst of traffic pours in and is likely to cause a message backlog. That is by design: MQ absorbs the burst while the back-end applications consume at their own pace, which keeps the consumer side stable. In this situation, look at the consumer's TPS. If TPS is normal, the backlog is not a big problem, and the usual response is to scale out horizontally to shrink the backlog and reduce business latency as much as possible.
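
To make "TPS is normal" a little more concrete, the sketch below estimates how long a backlog would take to drain given production and consumption rates. It is back-of-the-envelope arithmetic under the assumption that both rates stay constant; the class name and the example numbers are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical estimate of how long a backlog takes to clear at steady rates. */
final class BacklogEstimate {

    /**
     * Returns the estimated drain time, or empty when consumption does not
     * outpace production (the backlog would never shrink at these rates).
     */
    static Optional<Duration> drainTime(long backlog, double produceTps, double consumeTps) {
        if (backlog <= 0) {
            return Optional.of(Duration.ZERO);
        }
        double netTps = consumeTps - produceTps;   // messages removed from the backlog per second
        if (netTps <= 0) {
            return Optional.empty();
        }
        return Optional.of(Duration.ofSeconds((long) Math.ceil(backlog / netTps)));
    }
}
```

For example, a backlog of 600,000 messages with 1,000 messages per second coming in and 3,000 per second going out clears in about 300 seconds. Scaling out raises the consumption rate and shortens that time, while an empty result means the consumers are falling further behind and the earlier troubleshooting steps apply.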

That is all for this article. If you are interested in RocketMQ, you can download my e-book, which covers practical experience operating and maintaining production clusters that handle hundreds of billions of messages. Follow the official account [Middleware Interest Circle] and reply RMQPDF to get it.
