How do you handle a message queue backlog of millions of messages that lasts for several hours? Three pits we stepped in

This scenario usually means something has gone wrong on the consumer side: either it has stopped consuming altogether, or it is consuming extremely slowly. This kind of thing is actually fairly common in production; it doesn't happen often, but when it does, it's a major incident.

There are many possible causes. For example, the consumer writes to MySQL after handling each message, MySQL goes down, and the consumer just hangs there doing nothing; or something in the consumer logic breaks and the consumption rate drops to a crawl.

How do you deal with messages expiring in the queue? What do you do when the message queue fills up? And how do you handle a backlog of millions of messages that has persisted for several hours?

Scenario analysis:

Let's work through these one at a time. Assume the following scenario: the consumer side has failed, a large number of messages have piled up in MQ, and now there's an incident on your hands and everyone is panicking.

The first pit: a large backlog of messages has been sitting in MQ for several hours and still hasn't been cleared

Tens of millions of messages sat backlogged in MQ for seven or eight hours, from around 4 p.m. until late at night, 10 or 11 o'clock.

This is a real production fault we ran into. One option is simply to fix the consumer's problem, let its consumption rate recover, and then wait dumbly for several hours while it chews through the backlog. That is obviously not the answer you want to give in an interview.

One consumer handles 1,000 messages per second, so three consumers handle 3,000 per second, which is 180,000 per minute, or over 10 million per hour.

So with a backlog of millions to tens of millions of messages, even after the consumer is fixed, it still takes on the order of an hour to drain.
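
As a back-of-the-envelope check on those numbers (illustrative throughput figures, not measurements), the drain time works out to roughly an hour:

```java
// Rough drain-time estimate for a backlog, using the illustrative numbers above.
public class BacklogEstimate {
    public static void main(String[] args) {
        long backlog = 10_000_000L;     // messages piled up in MQ
        int consumers = 3;              // consumer instances
        int ratePerConsumer = 1_000;    // messages per second per consumer

        long totalRate = (long) consumers * ratePerConsumer;   // 3,000 msg/s = 180,000 msg/min
        long seconds = backlog / totalRate;                     // ~3,333 s
        System.out.printf("Estimated drain time: ~%d minutes%n", seconds / 60); // ~55 minutes
    }
}
```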

At this point the only real option is an emergency temporary scale-out. The concrete steps and ideas are as follows:

  • First fix the consumer's bug to make sure its consumption rate can recover, then stop all of the existing consumers;
  • Create a new topic with 10 times the original number of partitions, and temporarily set up 10 or 20 times the original number of queues;
  • Then write a temporary dispatcher program that consumes the backlogged data; instead of doing the time-consuming business processing, it simply polls the messages and writes them evenly, round-robin, into the 10x set of temporary queues (a sketch follows this list);
  • Then temporarily requisition 10 times the usual number of machines to deploy consumers, with each batch of consumers reading from one of the temporary queues;
  • This approach temporarily scales both the queue resources and the consumer resources up by a factor of 10, so the data is consumed at 10 times the normal speed;
  • Once the backlog has been drained, restore the original deployment architecture and go back to consuming messages with the original consumer machines.
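
As a rough illustration of the dispatcher step, here is a minimal sketch using the RabbitMQ Java client; the host, queue names, fan-out factor, and prefetch value are all illustrative assumptions, not details from the original article.

```java
// Temporary dispatcher: drains the jammed queue and round-robins the messages
// into N temporary queues, with no business processing at all.
import com.rabbitmq.client.*;

import java.util.concurrent.atomic.AtomicLong;

public class BacklogDispatcher {
    private static final String BACKLOG_QUEUE = "order.queue";       // the jammed queue (assumed name)
    private static final int TEMP_QUEUE_COUNT = 10;                  // 10x temporary queues
    private static final String TEMP_QUEUE_PREFIX = "order.temp.";   // order.temp.0 ... order.temp.9

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq-host");                                   // placeholder host
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Declare the temporary queues up front.
        for (int i = 0; i < TEMP_QUEUE_COUNT; i++) {
            channel.queueDeclare(TEMP_QUEUE_PREFIX + i, true, false, false, null);
        }

        AtomicLong counter = new AtomicLong();
        DeliverCallback onMessage = (tag, delivery) -> {
            // No business logic here: just round-robin the message into a temp queue.
            int idx = (int) (counter.getAndIncrement() % TEMP_QUEUE_COUNT);
            channel.basicPublish("", TEMP_QUEUE_PREFIX + idx, delivery.getProperties(), delivery.getBody());
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        channel.basicQos(1000);  // large prefetch to keep throughput high
        channel.basicConsume(BACKLOG_QUEUE, false, onMessage, tag -> { });
    }
}
```

The 10x consumer fleet then reads from order.temp.0 through order.temp.9 and runs the real business logic in parallel.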

The second pit: messages in the queue expiring

Suppose you are using RabbitMQ. RabbitMQ lets you set an expiration time on messages, i.e. a TTL: if a message sits in the queue longer than that period, RabbitMQ cleans it up and the data is gone. That's the second pit. In this case the data doesn't pile up in MQ at all; instead, a large amount of data is lost outright.
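
For context, this is roughly how the pit gets introduced: a per-queue message TTL declared via the RabbitMQ Java client. The host, queue name, and 60-minute TTL here are illustrative assumptions.

```java
// A queue declared with a per-queue message TTL: any message that sits in the
// queue longer than the TTL is discarded by the broker, not delivered late.
import com.rabbitmq.client.*;

import java.util.HashMap;
import java.util.Map;

public class TtlQueueExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq-host");  // placeholder host
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-message-ttl", 60 * 60 * 1000);  // 60-minute TTL (illustrative)
            channel.queueDeclare("order.queue", true, false, false, queueArgs);
        }
    }
}
```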

In this case the fix is not to add consumers to work through a backlog, because there is essentially no backlog; a large number of messages have simply been lost. The approach to take is a batch re-import, and we have actually handled a similar situation in production before. When the backlog got huge, the data was simply allowed to be dropped, and then we waited for the traffic peak to pass, say staying up over coffee until after midnight, when the users are asleep.

At that point we start writing a temporary program that queries the lost batch of data bit by bit and pours it back into MQ, making up for the data that was lost during the day. That is really the only option.

For example, suppose 10,000 orders were sitting in MQ unprocessed and 1,000 of them were lost. You can only write a program that manually queries those 1,000 orders and re-sends them to MQ to be processed again.
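
A minimal sketch of such a re-import program, assuming the lost orders can be identified in the database (for example, today's orders that never reached a processed state). The JDBC URL, table and column names, and queue name are illustrative assumptions, not details from the original article.

```java
// Batch re-import: query the lost orders from the database and republish them
// to MQ so the normal consumer can pick them up again.
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LostOrderReimporter {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq-host");  // placeholder host
        try (com.rabbitmq.client.Connection mqConn = factory.newConnection();
             Channel channel = mqConn.createChannel();
             java.sql.Connection db = DriverManager.getConnection(
                     "jdbc:mysql://db-host/orders", "user", "password")) {

            channel.queueDeclare("order.queue", true, false, false, null);

            // Orders created today that the consumer never marked as processed.
            String sql = "SELECT order_id, payload FROM t_order "
                       + "WHERE status = 'UNPROCESSED' AND create_time >= CURDATE()";
            try (Statement stmt = db.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    byte[] body = rs.getString("payload").getBytes(StandardCharsets.UTF_8);
                    // Republish the lost order as a persistent message.
                    channel.basicPublish("", "order.queue",
                            MessageProperties.PERSISTENT_TEXT_PLAIN, body);
                }
            }
        }
    }
}
```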

The third pit: the message queue is about to fill up

What if the messages have been piling up in MQ for so long that the MQ is nearly full? Is there another way out? Not really; that's the price of executing the first plan too slowly. You write a temporary program that hooks in as a consumer and discards each message as it is consumed, keeping none of them, so that all the messages are drained as quickly as possible. Then you fall back on the second plan and re-import the data at night.
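
A minimal sketch of that "consume and discard" emergency drain, again using the RabbitMQ Java client; the host and queue name are illustrative assumptions. Everything it throws away has to be re-imported later via the second plan.

```java
// Emergency drain: ack every message without any processing so the broker can
// delete it immediately and free up space.
import com.rabbitmq.client.*;

public class EmergencyDrainConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq-host");  // placeholder host
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Discard the payload entirely; the data is recovered later by re-import.
        DeliverCallback discard = (tag, delivery) ->
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);

        channel.basicQos(5000);  // large prefetch to drain as fast as possible
        channel.basicConsume("order.queue", false, discard, tag -> { });
    }
}
```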

 

"RabbitMQ analysis and Kafka message loss causes and solutions": https://blog.csdn.net/weixin_44259720/article/details/104844231

Respect for the original author; the original article is at: https://www.sohu.com/a/312377165_120148307

 

Young hero, please stay a moment ...ヾ(◍ ° ∇ ° ◍)ノ゙...
Likes, comments, and follows are welcome, so that more people can see this and learn from it.
For more, please follow my Toutiao account: Java cloud notes

