Message backlog: a general approach

How do you handle message delays and messages expiring in the queue? What do you do once the message queue is full?

Questions to think through

  1. What caused the backlog? Is it a bug in the consumer program, or are consumers simply consuming more slowly than messages are produced?
  2. How long has the backlog been building, and how many messages are backed up?
  3. What is the impact on the business?
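
One way to put numbers on the first two questions is to compare, per partition, the latest offset written by producers with the offset the consumer group has committed. The sketch below is only an illustration using plain dictionaries; the function names and sample offsets are hypothetical, and in a real system the offsets would come from the broker's own monitoring or admin tools.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return the number of unconsumed messages per partition.

    Both arguments map partition id -> offset. A partition without a
    committed offset is treated as fully unconsumed (assumes offsets start at 0).
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in end_offsets.items()}


def is_falling_behind(produce_rate: float, consume_rate: float) -> bool:
    """True when producers write faster than consumers read (messages per second)."""
    return produce_rate > consume_rate


# Hypothetical sample values for a topic with three partitions.
end_offsets = {0: 5_000_000, 1: 4_800_000, 2: 5_200_000}
committed = {0: 1_000_000, 1: 900_000, 2: 1_100_000}

lag = partition_lag(end_offsets, committed)
total_backlog = sum(lag.values())
print(lag, total_backlog)
```

If the total keeps growing while the consumers are healthy, the cause is more likely consumption speed than a consumer bug.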

Solutions

1. If consumers are merely consuming more slowly than messages are produced, consider scaling out the consumer group.
2. If the backlog is more serious, with millions or tens of millions of messages:
  1. Fix the problem in the existing consumers, then stop them.
  2. Create a new topic with more capacity, for example with 10 times the original number of partitions.
  3. Write a temporary consumer program that consumes the backlog from the original queue. It does no time-consuming processing and simply writes the messages evenly into the newly created queues.
  4. Deploy the fixed consumers on 10 times the original number of machines to consume the new queues.
  5. Once the backlog has been cleared, restore the original architecture.
3. If messages have already been lost:

Some message queues have an expiration mechanism, which can cause a large number of messages to be lost.
In this situation the lost batch of data can only be recovered by writing a temporary program that looks it up bit by bit and pours it back into the MQ.
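
A temporary re-pour program of this kind could look roughly like the sketch below. It is a minimal illustration, not a definitive implementation: `batches`, `replay`, the `publish` callable, and the record shape are all hypothetical stand-ins for whatever lookup source and MQ client are actually in use.

```python
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield the records in lists of at most `size` items."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def replay(records: Iterable[dict], publish: Callable[[dict], None], batch_size: int = 500) -> int:
    """Re-publish lost records batch by batch and return how many were sent."""
    sent = 0
    for batch in batches(records, batch_size):
        for record in batch:
            publish(record)  # hypothetical: hand the record to the MQ client
        sent += len(batch)
    return sent


# Stand-in for the MQ: collect what would have been published.
republished: List[dict] = []
lost_records = ({"id": i} for i in range(1200))  # hypothetical lookup result
count = replay(lost_records, republished.append)
print(count)
```

Working in small batches keeps the load on the lookup source and on the MQ bounded while the lost data is recovered.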


 
 

A large number of messages has been backed up in the MQ for several hours and the problem is still unresolved

  Suppose tens of millions of messages have been backed up in the MQ for seven or eight hours. The simplest approach is to restore the consumers' normal consumption rate and then wait a few hours for them to work through the backlog.

  One consumer handles 1,000 messages per second, so three consumers handle 3,000 per second, which is 180,000 per minute and a little over 10 million per hour. So with a backlog of several million to tens of millions of messages, even after the consumers recover it still takes about an hour to catch up.
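
  The same arithmetic can be written as a small helper. This is a sketch under the stated rates (1,000 messages per second per consumer); `drain_seconds` and the optional `produce_rate` parameter are hypothetical additions, the latter to account for new messages arriving while the backlog drains.

```python
def drain_seconds(backlog: int, consumers: int, per_consumer_rate: float,
                  produce_rate: float = 0.0) -> float:
    """Estimate the seconds needed to clear `backlog` messages.

    The net drain rate is total consumption minus ongoing production.
    """
    net_rate = consumers * per_consumer_rate - produce_rate
    if net_rate <= 0:
        raise ValueError("consumers cannot keep up; the backlog never drains")
    return backlog / net_rate


# 10 million backlogged messages, 3 consumers at 1,000 messages per second each.
normal = drain_seconds(10_000_000, consumers=3, per_consumer_rate=1000)
# The same backlog with 10 times the consumers.
scaled = drain_seconds(10_000_000, consumers=30, per_consumer_rate=1000)

print(round(normal / 60, 1), "minutes")  # 55.6 minutes
print(round(scaled / 60, 1), "minutes")  # 5.6 minutes
```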

  In this situation, usually the only option is an emergency temporary scale-out. The concrete steps and ideas are as follows:

    First fix the consumer problem so that consumption can run at its normal rate again, then stop all existing consumers.

    Create a new topic with 10 times the original number of partitions, temporarily setting up 10 or 20 times the original number of queues.

    Then write a temporary consumer program whose only job is to redistribute data. Deploy it to consume the backlog; it does no time-consuming processing and simply writes the messages, evenly and in round-robin fashion, into the 10x queues that were just created.

    Then temporarily requisition 10 times the machines to deploy consumers, with each batch of consumers consuming one of the temporary queues.

    This is equivalent to temporarily scaling both the queue resources and the consumer resources by 10 times, so the data is consumed at 10 times the normal speed.

    Once the backlog has been consumed, revert to the original deployment architecture and go back to consuming messages with the original consumer machines.
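
    The redistribution step can be sketched as follows. This is an in-memory illustration only: `forward_round_robin` is a hypothetical name, and the `deque` objects stand in for the original queue and the temporary queues, which in practice would be reached through an MQ client.

```python
from collections import deque
from itertools import cycle
from typing import Deque, List


def forward_round_robin(backlog: Deque[str], temp_queues: List[Deque[str]]) -> None:
    """Drain `backlog` and spread its messages evenly over `temp_queues`.

    No business processing happens here: each message is only moved,
    which is what keeps the temporary forwarder fast.
    """
    targets = cycle(temp_queues)
    while backlog:
        next(targets).append(backlog.popleft())


# 100 backlogged messages spread over 10 temporary queues.
backlog = deque(f"msg-{i}" for i in range(100))
temp_queues: List[Deque[str]] = [deque() for _ in range(10)]
forward_round_robin(backlog, temp_queues)

print([len(q) for q in temp_queues])
```

    Note that plain round-robin spreads messages without regard to their content, so it does not keep related messages together in one queue; if ordering between related messages matters, a different distribution rule would be needed.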

topic ---- kafka
Database ---- ES
 


Origin www.cnblogs.com/Allen-rg/p/11690233.html