Monitoring of RabbitMQ

Preface

A RabbitMQ message queue not only decouples services but also raises the load a single service can handle, and it has become an essential tool in software development. Introducing it, however, adds another point of failure: if the MQ goes down, every service that depends on it is directly affected. Whether we can use it smoothly depends on how gracefully we handle this weakness, which is why a combined monitoring and alerting setup is a must.

How to choose a weapon?

The official documentation recommends Prometheus & Grafana. The former is a powerful monitoring toolkit (mainly responsible for storing metrics and rule-based alerting), and the latter is an elegant metric visualization system. It's truly a match made in heaven.

Their advantages over the built-in management UI are as follows:

  1. The monitoring system is decoupled from the system being monitored
  2. Lower overhead on the monitored service
  3. Long-term metric storage
  4. Easy correlation, aggregation, and comparison of related metrics
  5. A more powerful and customizable user interface
  6. Metric data that is easy to share
  7. More robust access control
  8. More flexible per-node data collection

PS: Prometheus & Grafana are officially supported starting with RabbitMQ 3.8 and are strongly recommended for production environments.
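
To get a feel for what Prometheus actually scrapes, here is a minimal Python sketch that reads the text-format metrics exposed by the rabbitmq_prometheus plugin and turns them into a dictionary. It assumes the plugin is enabled and listening on its default port 15692; the host name and the sample values are placeholders of mine, not something from the original setup, and the parser is deliberately simplified.

```python
import urllib.request

# Assumption: rabbitmq_prometheus is enabled on a local node and uses its
# default port 15692. Adjust the host for a real deployment.
METRICS_URL = "http://localhost:15692/metrics"


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the raw Prometheus text exposition from a RabbitMQ node."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse 'series value [timestamp]' lines into {series: value}.

    Simplification: the metric name plus its label set is kept as one
    string key, and optional timestamps are ignored.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Split after the closing brace so spaces inside label
            # values do not break the parse.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])
    return samples


# Made-up sample in the exposition format, so the parser can be tried
# without a running broker.
SAMPLE = """\
# TYPE rabbitmq_connections gauge
rabbitmq_connections 3
# TYPE rabbitmq_queue_messages_ready gauge
rabbitmq_queue_messages_ready 12
"""

parsed = parse_metrics(SAMPLE)
print(parsed["rabbitmq_queue_messages_ready"])
```

Against a live node, `parse_metrics(fetch_metrics())` would return the same kind of dictionary. In practice Prometheus does this scraping itself; the sketch only shows the shape of the data.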

Prometheus learning resources
Grafana learning resources
Grafana implementation examples

But before using these tools, we need to understand the concept of monitoring.

What makes good monitoring? [1]

  1. It provides early warning, so problems can be solved or capacity expansion planned before they affect the business;
  2. It supports fault localization and troubleshooting, which makes root cause analysis easier when problems do occur.
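
The early-warning idea in the first point can be sketched in Python as a simple trend extrapolation: fit a straight line to recent queue-depth samples and estimate how long until a limit is reached. This is only an illustration under my own assumptions (the function name, the sample data, and the threshold are all hypothetical); a real setup would normally express this as an alerting rule in Prometheus rather than in application code.

```python
def seconds_until_threshold(samples, threshold):
    """Estimate seconds until a rising metric reaches `threshold`.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Returns 0.0 if the latest value is already at or above the threshold,
    None if the fitted trend is flat or falling, otherwise the estimated
    number of seconds measured from the latest sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")

    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t

    last_t, last_v = samples[-1]
    if last_v >= threshold:
        return 0.0
    if slope <= 0:
        return None
    fitted_last = intercept + slope * last_t
    return max(0.0, (threshold - fitted_last) / slope)


# Hypothetical queue depth sampled every 30 seconds.
history = [(0, 100), (30, 200), (60, 300)]
eta = seconds_until_threshold(history, threshold=1000)
print(eta)
```

An alert could then fire when the estimate drops below some lead time, which leaves room to add consumers or capacity before the backlog reaches the business.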

To achieve these two goals, pay attention to infrastructure and kernel metrics, business-specific metrics, health checks, and the polling interval. The official recommendation is 15s for development environments and 30s or even 60s for production, and never less than 5s: each monitoring poll may open and close a TCP connection and triggers RabbitMQ's channel and queue checks, so polling too often increases CPU consumption.
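
The interval guidance above can be sketched as a small Python poller that refuses to run faster than the 5s floor. The health check uses the management plugin's `/api/health/checks/alarms` endpoint; the base URL, credentials, and helper names are assumptions for illustration, and the management plugin is assumed to be enabled.

```python
import base64
import time
import urllib.error
import urllib.request

MIN_INTERVAL_S = 5  # lower bound quoted in the text above


def clamp_interval(requested_s):
    """Never poll more often than the 5-second floor."""
    return max(requested_s, MIN_INTERVAL_S)


def alarms_clear(base_url, user, password, timeout=5.0):
    """Return True if the node reports no resource alarms in effect.

    Assumption: base_url points at the management API, for example
    "http://localhost:15672", and the user may access it.
    """
    req = urllib.request.Request(base_url + "/api/health/checks/alarms")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        # Covers a failing check (HTTP error status) and an
        # unreachable node; both count as unhealthy here.
        return False


def poll(check, interval_s, rounds, sleep=time.sleep):
    """Run `check` a fixed number of times, pausing between runs."""
    interval_s = clamp_interval(interval_s)
    results = []
    for i in range(rounds):
        results.append(check())
        if i < rounds - 1:
            sleep(interval_s)
    return results
```

For example, `poll(lambda: alarms_clear("http://localhost:15672", "monitor", "secret"), 30, 10)` would run ten checks at the production-style 30s interval; the user name and password here are placeholders.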

Official API reference: consult it for the specific metrics that matter to your business.

Building the monitoring [2]

Grafana can be combined with ready-made RabbitMQ dashboard templates.


  1. Official Guide - Monitoring

  2. Official Guide - Building Prometheus & Grafana

Origin blog.csdn.net/weixin_43832080/article/details/125147083