When monitoring RocketMQ with Zabbix, the following monitoring items and triggers are worth considering:
Monitoring items
- Overall cluster health
- Number of producer and consumer connections
- Broker status
- Message production and consumption rates
- Queue depth (i.e., the number of messages waiting in the queue)
- Disk space usage
- Memory usage
- CPU usage
- Network traffic
- Latency, including production latency and consumption latency
- Number of Topics
- Number of accumulated (backlogged) messages
- Number of message consumption failures
- Presence of dead-letter messages (messages that still cannot be consumed after the maximum number of retries)
- Operating system performance metrics (such as I/O)
- Exceptions reported in RocketMQ logs
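Several of the items above (queue depth, accumulated messages, production and consumption rates) can be derived from per-queue offsets. The sketch below assumes the broker offset and consumer offset of each queue have already been collected, for example from the output of the `mqadmin consumerProgress` command; the record type and function names are hypothetical, and the numbers are illustrative only. A script along these lines could be wired into Zabbix as an agent `UserParameter` or an external check that prints one value per item.

```python
from dataclasses import dataclass


@dataclass
class QueueOffset:
    """Offsets for one message queue (hypothetical record shape)."""
    topic: str
    queue_id: int
    broker_offset: int    # max offset written to the queue by producers
    consumer_offset: int  # offset committed by the consumer group


def total_lag(queues):
    """Backlog: sum of (broker offset - consumer offset), negatives clamped to 0."""
    return sum(max(q.broker_offset - q.consumer_offset, 0) for q in queues)


def rate_per_second(prev_total, curr_total, interval_s):
    """Messages per second between two samples of a cumulative offset."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (curr_total - prev_total) / interval_s


if __name__ == "__main__":
    # Two samples taken 60 s apart (illustrative numbers, not real data).
    before = [QueueOffset("orders", 0, 1000, 950), QueueOffset("orders", 1, 800, 800)]
    after = [QueueOffset("orders", 0, 1600, 1300), QueueOffset("orders", 1, 1100, 1050)]

    produced = rate_per_second(sum(q.broker_offset for q in before),
                               sum(q.broker_offset for q in after), 60)
    consumed = rate_per_second(sum(q.consumer_offset for q in before),
                               sum(q.consumer_offset for q in after), 60)
    # A real UserParameter would print a single value per item key.
    print(total_lag(after), produced, consumed)  # 350 15.0 10.0
```

Comparing the production rate with the consumption rate over time shows whether the backlog is growing (consumers falling behind) or draining.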
Triggers
- Abnormal cluster status
- Abnormal number of connections (for example, a sudden drop in the number of producer or consumer connections)
- Abnormal broker status
- Abnormal message production or consumption rate
- Queue depth too high or too low
- Insufficient disk space
- High memory usage
- High CPU usage
- Abnormal network traffic
- Excessive production or consumption latency
- Abnormal number of topics
- Too many accumulated messages
- Rising number of message consumption failures
- Dead-letter messages present
- Operating system performance metrics outside the normal range (for example, excessive I/O)
- Exceptions detected in RocketMQ logs
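In Zabbix itself these conditions are written as trigger expressions using functions such as `last()` and `avg()`. The Python sketch below only illustrates the logic of two of the triggers above, a static backlog threshold and a "sudden drop" in connections relative to a recent average; the threshold values and function names are hypothetical and should be tuned to the cluster's own baseline.

```python
from statistics import mean

# Hypothetical thresholds; tune them to your own baseline.
MAX_BACKLOG = 100_000
DROP_RATIO = 0.5  # fire if connections fall below 50% of the recent average


def backlog_too_high(lag, limit=MAX_BACKLOG):
    """Static threshold on the number of accumulated messages."""
    return lag > limit


def connections_dropped(history, current, ratio=DROP_RATIO):
    """True if `current` is below `ratio` times the mean of recent samples."""
    if not history:
        return False  # no baseline yet, so do not fire
    return current < ratio * mean(history)


def evaluate(lag, conn_history, conn_now):
    """Return the names of the triggers that would fire for these values."""
    fired = []
    if backlog_too_high(lag):
        fired.append("Too many accumulated messages")
    if connections_dropped(conn_history, conn_now):
        fired.append("Sudden drop in consumer connections")
    return fired


if __name__ == "__main__":
    print(evaluate(150_000, [40, 42, 41, 39], 12))
```

Comparing against a moving average rather than a fixed number lets the connection trigger adapt to clusters of different sizes, while a fixed limit is usually sufficient for backlog.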
Adjust or extend these monitoring items and triggers to fit your actual needs. The goal is to detect and resolve problems promptly so that the RocketMQ cluster keeps running stably.