1.1 Overview
In essence, this scenario means something has gone wrong on the consumer side: it has either stopped consuming altogether or is consuming extremely slowly. When that happens, you can land in one of three painful situations:
1. The disks of your message queue cluster are almost full and nothing is consuming. What do you do?
2. The backlog has been building for several hours. What do you do?
3. The backlog has sat for so long that messages start to expire, for example because RabbitMQ has a message expiration time set. What do you do?
This is actually quite common in production, and it is a big problem when it happens. A typical case: the consumer writes to MySQL after consuming each message, MySQL goes down, and the consumer hangs there without making progress. Or something else goes wrong in the consumer and consumption becomes extremely slow.
Let's sort this out one by one.
1.2 Three major problem scenarios and solutions
1.2.1 A large number of messages have been backlogged in MQ for several hours and the problem is still unresolved
1.2.1.1 Scenario
Tens of millions of messages have been backlogged in MQ for seven or eight hours, from a little after 4:00 pm until past 10:00 or 11:00 at night. This is a production incident. One option is to fix the consumer so that it gets back to its normal consumption speed, and then simply wait a few hours for it to work through the backlog. That definitely won't do. Say one consumer handles 1,000 messages per second. Three consumers handle 3,000 per second, which is 180,000 per minute, and the backlog is over 10 million.
So with a backlog of millions to tens of millions of messages, even after the consumers recover it takes about an hour or more to catch up (10 million messages at 180,000 per minute is roughly 56 minutes).
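The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `drain_minutes` is made up for illustration, and it assumes no new messages arrive while the backlog is being drained.

```python
def drain_minutes(backlog: int, per_consumer_rate: int, consumers: int) -> float:
    """Minutes needed to drain `backlog` messages.

    Assumes every consumer sustains `per_consumer_rate` messages per second
    and that no new messages arrive in the meantime.
    """
    per_minute = per_consumer_rate * consumers * 60
    return backlog / per_minute


backlog = 10_000_000

# 3 consumers at 1,000 msg/s each: 180,000 per minute, a bit under an hour.
print(round(drain_minutes(backlog, 1000, 3), 1))

# 10 times the consumers: the same backlog drains in a few minutes.
print(round(drain_minutes(backlog, 1000, 30), 1))
```

In practice producers keep writing during recovery, so the real drain time is longer than this estimate.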
1.2.1.2 Solutions
At this point the only option is a temporary emergency scale-out so that the data is consumed faster. The steps and reasoning are as follows:
1. First fix the consumer problem so that it can consume at normal speed again, then stop all existing consumers.
2. Temporarily create 10 or 20 times the original number of queues (for example, create a new topic with 10 times the original number of partitions).
3. Write a temporary consumer program whose only job is to redistribute messages. Deploy it to consume the backlog. It does no time-consuming processing and simply writes each message, round-robin, into the 10x temporary queues.
4. Then requisition 10 times as many machines to deploy consumers, with each batch of consumers consuming one temporary queue.
5. This effectively multiplies both queue resources and consumer resources by 10 for a while, so messages are consumed at 10 times the normal speed.
6. Once the backlog has been consumed, restore the original deployment architecture and go back to consuming with the original consumer machines.
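The redistribution program from step 3 can be sketched as follows. This is a minimal in-memory illustration, not a real broker client: Python's `queue.Queue` stands in for both the backlogged queue and the temporary queues, and the function name `redistribute` is hypothetical. A real version would use your MQ's own consumer and producer APIs.

```python
import queue
from itertools import cycle


def redistribute(backlog: queue.Queue, temp_queues: list) -> int:
    """Move every message from `backlog` into `temp_queues`, round-robin.

    No business processing happens here; the message is forwarded as-is.
    Returns the number of messages moved.
    """
    targets = cycle(temp_queues)  # endless round-robin over the temporary queues
    moved = 0
    while True:
        try:
            msg = backlog.get_nowait()
        except queue.Empty:
            break  # backlog is drained
        next(targets).put(msg)
        moved += 1
    return moved


# Simulate a backlog of 100 messages spread over 10 temporary queues.
backlog = queue.Queue()
for i in range(100):
    backlog.put(i)

temp_queues = [queue.Queue() for _ in range(10)]
moved = redistribute(backlog, temp_queues)
sizes = [q.qsize() for q in temp_queues]
print(moved, sizes)
```

Because the forwarder does nothing except move messages, it can run far faster than the real consumers, which is what makes the 10x fan-out worthwhile.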
1.2.2 Messages have an expiration time set. What should I do if they are lost when they expire?
1.2.2.1 Scenario
Suppose you are using RabbitMQ. RabbitMQ lets you set an expiration time, that is, a TTL. If messages sit in a queue longer than that time, RabbitMQ cleans them up and the data is gone. This is the second pitfall. Here the problem is not that a large amount of data piles up in MQ, but that a large amount of data is lost outright.
1.2.2.2 Solutions
In this case there is no real backlog of messages. Instead, a lot of messages have been lost. So the first solution, adding consumers, definitely does not apply.
This situation can be handled with a "batch resend" approach.
Once the traffic peak has passed (for example, late at night), write a program that queries for the lost data and resends those messages to MQ to make up for what was lost.
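A batch resend job might look roughly like the sketch below. It rests on an assumption the text does not spell out: that the upstream system keeps a record of what it produced (here `produced_ids`) and that you know what was actually processed (`processed_ids`), so the lost messages are the difference. The names `batched`, `resend_lost`, and the `publish` callback are all hypothetical; `publish` stands in for your real MQ producer call.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield lists of at most `size` items, so the resend runs in chunks."""
    batch: List[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def resend_lost(
    produced_ids: Iterable[int],
    processed_ids: Iterable[int],
    publish: Callable[[int], None],
    batch_size: int = 500,
) -> int:
    """Republish every produced ID that was never processed. Returns the count."""
    lost = sorted(set(produced_ids) - set(processed_ids))
    sent = 0
    for batch in batched(lost, batch_size):
        for msg_id in batch:
            publish(msg_id)  # assumption: stand-in for the real producer call
        sent += len(batch)
    return sent


# IDs 1..10 were produced, only 1, 2, 3 and 7 were processed before expiry.
republished: List[int] = []
count = resend_lost(range(1, 11), [1, 2, 3, 7], republished.append, batch_size=4)
print(count, republished)
```

Sending in batches makes it easy to pause between chunks or stop partway if the resend itself starts to put pressure on MQ.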
1.2.3 What should I do if the backlog has gone unprocessed for so long that MQ is almost out of space?
1.2.3.1 Scenario
Suppose messages are backlogged in MQ and have not been dealt with for a long time, so that MQ is almost full. What do you do then? Is there any other way out?
1.2.3.2 Solutions
There is no other way. It means the first solution was executed too slowly. In this case you have to fall back on "discard + batch resend".
First, quickly write a temporary program that connects to MQ and consumes the data, discarding each message as soon as it is received. This drains the backlog quickly and relieves the pressure on MQ. Then apply the second solution: late at night, query for the discarded data and resend it.
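The discard step can be sketched as a consumer that reads and drops everything. As before, this is an in-memory stand-in: `queue.Queue` plays the role of MQ and `drain_and_discard` is a made-up name. Returning the number of dropped messages is an addition for illustration, useful as a sanity check against the later resend.

```python
import queue


def drain_and_discard(q: queue.Queue) -> int:
    """Consume every message in `q` and throw it away. Returns how many were dropped."""
    dropped = 0
    while True:
        try:
            q.get_nowait()  # receive the message and do nothing with it
        except queue.Empty:
            return dropped
        dropped += 1


mq = queue.Queue()
for i in range(1000):
    mq.put(i)

dropped = drain_and_discard(mq)
print(dropped, mq.qsize())
```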