Solutions for message backlog in a message queue

1.1 Overview

At its core, this scenario means something has gone wrong on the consumer side: either the consumers have stopped consuming altogether, or they are consuming extremely slowly. When that happens, one of the following three painful situations can occur:

1. The disks of your message queue cluster are almost full and nothing is consuming. What should you do?
2. The whole backlog has been building up for several hours. What should you do?
3. The backlog has sat for so long that messages are lost, for example because RabbitMQ is configured with a message expiration time. What should you do?

This is actually quite common in production, and it is a serious problem when it happens. A typical case: the consumer has to write to MySQL after consuming each message, MySQL hangs, and the consumer stalls; or something else goes wrong in the consumer and consumption becomes extremely slow.

Let's sort this out one by one.

1.2 Three major problem scenarios and solutions

1.2.1 A large number of messages have been backlogged in MQ for several hours and have not been cleared

1.2.1.1 Scenario

Tens of millions of messages have been backlogged in MQ for seven or eight hours, from 4:00 pm until after 10:00 or 11:00 at night. This is a production incident. One option is to fix the consumer so that it recovers its consumption speed, and then simply sit and wait several hours for it to work through the backlog. That definitely won't do. Suppose one consumer handles 1,000 messages per second: three consumers handle 3,000 per second, which is 180,000 per minute, or a little over 10 million per hour.

So if you have a backlog of several million to more than ten million messages, even after the consumer recovers it will take about an hour to catch up.
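The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `drain_time_seconds` is made up for illustration, and it assumes no new messages arrive while the backlog is being drained.

```python
def drain_time_seconds(backlog, consumers, rate_per_consumer):
    """Seconds needed to clear `backlog` messages, assuming no new messages arrive."""
    return backlog / (consumers * rate_per_consumer)


# Figures from the text: 3 consumers at 1,000 messages per second each.
per_minute = 3 * 1000 * 60
per_hour = per_minute * 60
print(per_minute)  # 180000
print(per_hour)    # 10800000

# Minutes needed to drain a backlog of 10 million messages.
print(round(drain_time_seconds(10_000_000, 3, 1000) / 60, 1))  # 55.6
```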

1.2.1.2 Solutions

At this point the only option is a temporary emergency scale-out so the data can be consumed faster. The steps and reasoning are as follows:

1. Fix the consumer problem first so that it can resume its normal consumption speed, then stop all existing consumers.

2. Temporarily create 10 or 20 times the original number of queues (create a new topic whose partition count is 10 times the original).

3. Then write a temporary consumer program that redistributes messages. Deploy it to consume the backlog; after consuming a message it does no time-consuming processing and simply writes it, round-robin, into the temporarily created queues (10 times the original number).

4. Next, requisition 10 times as many machines to deploy consumers, with each batch of consumers consuming from one temporary queue.

5. This approach amounts to temporarily scaling queue resources and consumer resources by 10 times, so messages are consumed at 10 times the normal speed.

6. Once the backlog has been drained, restore the original deployment architecture and go back to consuming messages with the original consumer machines.
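The redistribution program in step 3 can be sketched as follows. This is a minimal illustration, not a real MQ client: in-memory `deque` objects stand in for the original queue and the temporary queues, and the name `redistribute` is hypothetical.

```python
from collections import deque
from itertools import cycle


def redistribute(backlog, temp_queues):
    """Move every message from `backlog` into `temp_queues` in round-robin order.

    No business processing happens here; the only job is to fan messages out.
    """
    if not temp_queues:
        raise ValueError("at least one temporary queue is required")
    targets = cycle(temp_queues)
    moved = 0
    while backlog:
        next(targets).append(backlog.popleft())
        moved += 1
    return moved


# Hypothetical setup: one backlogged queue of 1,000 messages, 10 temporary queues.
backlog = deque(range(1000))
temp_queues = [deque() for _ in range(10)]
moved = redistribute(backlog, temp_queues)
print(moved, [len(q) for q in temp_queues])
```

Because the redistributor does nothing but move messages, it can run far faster than the real consumers, and each temporary queue ends up with an even share of the backlog for its own batch of consumers.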


1.2.2 Messages have an expiration time set. What should I do if they are lost when they expire?

1.2.2.1 Scenario

Suppose you are using RabbitMQ. RabbitMQ lets you set an expiration time, that is, a TTL. If messages sit in the queue longer than that period, RabbitMQ cleans them up and the data is gone. This is the second pitfall. It does not mean that a large amount of data piles up in MQ; it means that a large amount of data is lost outright.

1.2.2.2 Solutions

In this case there is not really a message backlog; rather, a large number of messages have been lost. So the first solution, adding consumers, definitely does not apply.

This situation can be handled with a "batch re-send" approach.
During off-peak hours (for example, in the dead of night), write a program to look up the lost data manually and re-send those messages to MQ to make up for what was lost.
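A batch re-send job might look roughly like the sketch below. Everything here is an assumption made for illustration: `resend_in_batches`, the `publish_batch` callback, and the batch size of 500 are hypothetical, and a plain list stands in for both the database query result and the MQ producer.

```python
from itertools import islice


def resend_in_batches(lost_records, publish_batch, batch_size=500):
    """Re-publish lost records in fixed-size batches; return how many were sent."""
    records = iter(lost_records)
    sent = 0
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return sent
        publish_batch(batch)  # in a real job this would call the MQ producer
        sent += len(batch)


# Stand-ins for the real database query and the real MQ producer.
lost = [{"order_id": i} for i in range(1200)]
published = []
sent = resend_in_batches(lost, published.append, batch_size=500)
print(sent, [len(b) for b in published])  # 1200 [500, 500, 200]
```

Sending in batches keeps the re-send from flooding MQ all at once, which matters even during off-peak hours.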

1.2.3 What should I do if the backlog goes unprocessed for so long that MQ can no longer hold it?

1.2.3.1 Scenario

Suppose the messages are piling up in MQ rather than expiring, and you have not dealt with them for a long time, so MQ is now almost full. What should you do? Is there any other way out?

1.2.3.2 Solutions

There is no good way out. It must be that the first solution was carried out too slowly. In this case you have to fall back on "discard + batch re-send".

First, quickly write a temporary program that connects to MQ and consumes data, discarding each message as soon as it is received. This drains the backlog quickly and relieves the pressure on MQ. Then follow the second solution: in the dead of night, manually look up the discarded data and re-send it.
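The discard step can be sketched like this, again with an in-memory `deque` standing in for the real queue. The function name `drain_and_discard` is hypothetical, and the optional `on_discard` hook is an assumption added for illustration (for example, to note message ids for the later re-send); the approach described above simply throws the messages away.

```python
from collections import deque


def drain_and_discard(queue, on_discard=None):
    """Pop every message off `queue` without processing it; return the count dropped."""
    dropped = 0
    while queue:
        message = queue.popleft()
        if on_discard is not None:
            on_discard(message)  # assumption: optionally record what was dropped
        dropped += 1
    return dropped


# Hypothetical backlog of five messages.
backlog = deque({"id": i} for i in range(5))
seen_ids = []
dropped = drain_and_discard(backlog, lambda m: seen_ids.append(m["id"]))
print(dropped, len(backlog), seen_ids)  # 5 0 [0, 1, 2, 3, 4]
```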


Origin blog.csdn.net/weixin_42039228/article/details/123528619