Reproducing and Analyzing a Real RabbitMQ Production Failure

1.     How the problem arose

  The BI-collector-xx queue of one business service became blocked, which made the entire RabbitMQ cluster effectively unavailable and left the MQ producers of multiple applications in a suspended state; the failure touched a wide range of systems and had a serious business impact. At the time, in order to handle the emergency and restore service, operations rather ruthlessly purged the backlog of blocked queue messages and then restarted the whole cluster.

While reviewing the whole incident afterwards, I was left with a number of doubts, at least the following:

  1. Why did the queue become blocked?
  2. Why did one blocked queue affect the operation of other queues (that is, why do multiple queues affect each other)?
  3. Why did a problem with one application's MQ queue make the application itself unavailable?

2.     Reproducing the queue blocking in a test environment

One weekend at home, I found a test environment, installed RabbitMQ, and tried to reproduce the process with a simulation test.

I wrote two demo applications (standing in for two project applications), each with its own producer and consumer, using queues testA and testB respectively.

To match the production setup as closely as possible, the first test put both applications on the same vhost; separate vhosts were configured in a later test.
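For the later test with separate vhosts, the isolation can be set up with `rabbitmqctl`. The vhost and user names below are illustrative assumptions, not taken from the original post:

```shell
# Create a dedicated vhost for the second application (names are assumptions).
rabbitmqctl add_vhost vhost_b
# Grant that application's user configure/write/read permissions on the new vhost.
rabbitmqctl set_permissions -p vhost_b app_b_user ".*" ".*" ".*"
```

Keeping each application on its own vhost limits the blast radius when one application's queues misbehave, which is exactly the cross-queue interference question raised above.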

Producer A, the sample code is as follows
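The producer code block did not survive in this copy of the post, so the following is a minimal sketch in Python using the pika client. The queue name `testA` comes from the text; the host, payload shape, and helper names are my own assumptions.

```python
# Sketch of Producer A (assumed reconstruction; original code was lost).
import json

QUEUE_A = "testA"  # queue name from the article

def build_message(payload: dict) -> bytes:
    """Serialize a payload dict to the JSON bytes we publish."""
    return json.dumps(payload).encode("utf-8")

def publish(host: str = "localhost", count: int = 10) -> None:
    """Connect to RabbitMQ and publish `count` persistent messages to testA."""
    import pika  # imported here so build_message() stays broker-free

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    # Durable queue so the backlog survives a broker restart.
    channel.queue_declare(queue=QUEUE_A, durable=True)
    for i in range(count):
        channel.basic_publish(
            exchange="",            # default exchange routes by queue name
            routing_key=QUEUE_A,
            body=build_message({"seq": i}),
            properties=pika.BasicProperties(delivery_mode=2),  # persistent
        )
    connection.close()
```

Usage would be simply `publish(count=1000)` to push a sizable backlog into `testA` for the blocking test.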

 

Consumer A
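The consumer code block is likewise missing, so here is a matching pika sketch. The deliberately slow handler is my own assumption, added as one simple way to let messages pile up in `testA` for the blocking experiment; the original post's consumer may have worked differently.

```python
# Sketch of Consumer A (assumed reconstruction; original code was lost).
import json
import time

QUEUE_A = "testA"  # queue name from the article

def decode_message(body: bytes) -> dict:
    """Deserialize the JSON bytes published by Producer A."""
    return json.loads(body.decode("utf-8"))

def consume(host: str = "localhost") -> None:
    """Consume from testA with a slow handler so a backlog can build up."""
    import pika  # imported here so decode_message() stays broker-free

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_A, durable=True)
    channel.basic_qos(prefetch_count=1)  # at most one unacked message in flight

    def on_message(ch, method, properties, body):
        msg = decode_message(body)
        time.sleep(5)  # deliberately slow processing (assumption, for the test)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE_A, on_message_callback=on_message)
    channel.start_consuming()  # blocks; stop with Ctrl+C or connection close
```

With the producer pushing quickly and this consumer draining one message every five seconds, the `testA` backlog grows steadily, which is the precondition for observing the blocking behavior described above.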

 


Origin: blog.csdn.net/yetaodiao/article/details/131326303