The easiest way to monitor RabbitMQ

This is the 8th installment in the OpenStack Implementation Experience Sharing series.


Let's take a look at the picture first:

blob.png

This is the architecture diagram of Nova. Two components sit at the center of the architecture: the database and the queue. The database holds state information, and almost all nova-* services rely directly on the queue to communicate with and invoke one another.

OpenStack usually uses RabbitMQ as its message queue, and almost every OpenStack module depends on it. If RabbitMQ hangs, OpenStack is paralyzed, so it is arguably the most important component.

In this installment, we discuss how to monitor the status of RabbitMQ and introduce a very simple and efficient method.
 

Enable the RabbitMQ management plugin

 

In a default installation, the only way to monitor RabbitMQ is the rabbitmqctl command and its subcommands, such as rabbitmqctl list_queues and rabbitmqctl list_exchanges. This approach is neither intuitive nor efficient.
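As a rough illustration of the command-line approach, the sketch below runs rabbitmqctl list_queues with the name, messages_ready, and messages_unacknowledged columns and parses the tab-separated output in Python. The helper names and the sample queue names are made up for illustration. The banner and header lines that rabbitmqctl prints vary by version, so the parser simply skips any line that is not a name followed by two integers.

```python
import subprocess


def parse_list_queues(output):
    """Parse tab-separated `name ready unacked` rows, skipping banner and header lines."""
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def list_queues():
    """Run rabbitmqctl on the broker node (usually needs root or the rabbitmq user)."""
    command = ["rabbitmqctl", "list_queues", "name",
               "messages_ready", "messages_unacknowledged"]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_list_queues(result.stdout)


# Hypothetical output, used here only to exercise the parser.
sample_output = (
    "Listing queues for vhost / ...\n"
    "name\tmessages_ready\tmessages_unacknowledged\n"
    "compute\t0\t0\n"
    "notifications.info\t120\t35\n"
)
print(parse_list_queues(sample_output))
```

Calling list_queues() only works on a node where rabbitmqctl is installed and can reach the broker, which is part of why this approach is awkward for day-to-day monitoring.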

Fortunately, RabbitMQ ships with a management plugin that provides a graphical management interface. Enable it by running the following command on the node running RabbitMQ (usually the controller node).

rabbitmq-plugins enable rabbitmq_management


Next, create a user for logging in to the management console.
 

rabbitmqctl add_user user_admin passwd_admin

rabbitmqctl set_user_tags user_admin administrator

rabbitmqctl set_permissions -p / user_admin ".*" ".*" ".*"


You can then log in as user_admin (with the password passwd_admin) at the following address:

http://server-name:15672/
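The management plugin also serves an HTTP API on the same port, protected by HTTP basic authentication, so the new account can be checked without a browser. Below is a minimal Python sketch, assuming the placeholder host and credentials used above; the helper names build_request and fetch_json are hypothetical.

```python
import base64
import json
import urllib.request


def build_request(base_url, user, password, path="/api/overview"):
    """Build a GET request for the management HTTP API using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_json(request, timeout=5):
    """Send the request and decode the JSON body; raises on HTTP or network errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (needs a reachable broker; host and credentials are placeholders):
# overview = fetch_json(
#     build_request("http://server-name:15672", "user_admin", "passwd_admin"))
# print(overview["rabbitmq_version"])
```

An HTTP 401 error from fetch_json would suggest the user name, password, or tags were not set as intended.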


blob.png

The easiest and most efficient way to monitor

 

The web console displays a lot of information about RabbitMQ, but one figure matters most: Unacked Message. It is shown on the Overview tab right after you log in, so you can see it at a glance.


blob.png


An Unacked Message is a message that has been delivered to a consumer but not yet acknowledged, that is, not yet confirmed as processed. Normally, this value should stay at or near 0. If it is not 0 and keeps growing, pay attention: something is wrong, messages are piling up in the queues faster than they are being processed, and that is a serious warning sign.
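The same figure is available from the management HTTP API: the body of GET /api/overview carries a queue_totals object whose messages_unacknowledged field corresponds to this counter. Below is a minimal sketch of reading it, assuming the JSON has already been fetched and decoded into a dict. The function name and the sample numbers are illustrative only.

```python
def queue_totals(overview):
    """Extract the broker-wide message counters from a decoded /api/overview body."""
    # queue_totals may be empty on a broker that has no queue statistics yet.
    totals = overview.get("queue_totals") or {}
    return {
        "ready": int(totals.get("messages_ready", 0)),
        "unacked": int(totals.get("messages_unacknowledged", 0)),
        "total": int(totals.get("messages", 0)),
    }


# Made-up response fragment, trimmed to the fields used here.
sample_overview = {
    "queue_totals": {
        "messages": 12,
        "messages_ready": 5,
        "messages_unacknowledged": 7,
    }
}
print(queue_totals(sample_overview)["unacked"])
```

Defaulting missing fields to 0 keeps the check from failing on a freshly started broker, at the cost of hiding a malformed response.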

What's next?

At this point, click the tabs next to Overview to see which connections, channels, exchanges, and queues the messages are accumulating in, then analyze the root cause of the problem and fix it.
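For the queue part of this drill-down, GET /api/queues on the management HTTP API returns one object per queue, including its vhost, name, and messages_unacknowledged count. The sketch below ranks queues by that count, assuming the response has already been decoded into a list of dicts. The function name and the sample queue names are hypothetical.

```python
def top_unacked_queues(queues, limit=5):
    """Return (vhost, name, unacked) for the queues holding the most unacked messages."""
    rows = [
        (q.get("vhost", ""), q.get("name", ""), int(q.get("messages_unacknowledged", 0)))
        for q in queues
    ]
    # Keep only queues that actually hold unacked messages, largest backlog first.
    backlog = [row for row in rows if row[2] > 0]
    return sorted(backlog, key=lambda row: row[2], reverse=True)[:limit]


# Made-up queue list, trimmed to the fields used here.
sample_queues = [
    {"vhost": "/", "name": "compute", "messages_unacknowledged": 0},
    {"vhost": "/", "name": "notifications.info", "messages_unacknowledged": 350},
    {"vhost": "/", "name": "metering.sample", "messages_unacknowledged": 1200},
]
for vhost, name, unacked in top_unacked_queues(sample_queues):
    print(vhost, name, unacked)
```

Queue names usually hint at the owning service, which narrows down where to look next.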

blob.png

blob.png
 

A real case

 

1. The customer's OpenStack suddenly hung after a month of normal operation.

2. Log analysis showed that nova, neutron, and other modules were reporting that the queues they needed could not be found. Because the logs of multiple modules pointed to RabbitMQ, it was the prime suspect.

3. The RabbitMQ log was continuously filling with errors, but the messages were too generic to help. By then RabbitMQ was unusable, and the only option was to restart it.

4. After RabbitMQ restarted, OpenStack recovered automatically.

5. We opened the RabbitMQ web console and found Unacked Message > 0.

6. We observed for a while and found that Unacked Message kept growing at a fixed rate.

7. We traced the growth and found that the messages all came from Ceilometer-related queues.

8. We checked Ceilometer and found a configuration error that left the data Ceilometer sent to the queue unprocessed.

9. After we fixed the configuration and restarted Ceilometer, Unacked Message began to drop and eventually settled at 0.
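Step 6, watching whether the count keeps growing, can be approximated by sampling the value periodically and comparing the readings. The sketch below assumes the samples are (timestamp in seconds, unacked count) pairs collected by some polling loop that is not shown. The function names, the three-sample minimum, and the numbers are illustrative choices, not part of RabbitMQ.

```python
def growth_rate(samples):
    """Average growth in messages per second between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) / (t1 - t0)


def is_steadily_growing(samples):
    """True if there are at least 3 samples and each count exceeds the previous one."""
    values = [value for _, value in samples]
    return len(values) >= 3 and all(b > a for a, b in zip(values, values[1:]))


# Made-up readings taken one minute apart.
readings = [(0, 10), (60, 40), (120, 70)]
if is_steadily_growing(readings):
    print(f"unacked growing at {growth_rate(readings):.2f} msg/s")
```

Requiring several consecutive increases avoids raising an alarm on a brief spike that consumers clear on their own.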

This problem is like a memory leak: unacked messages gradually accumulate and eventually bring down the entire OpenStack deployment.
