Use Prometheus to monitor the running status of eKuiper rules

Prometheus is an open source system monitoring and alerting toolkit hosted by CNCF. Many companies and organizations have adopted Prometheus as a monitoring and alerting tool.

A rule in eKuiper is a continuously running stream-computing task that processes unbounded data streams. Under normal circumstances, once a rule is started it keeps running and continuously produces running status data until it is stopped manually or an unrecoverable error occurs. Each rule exposes a status API for obtaining its running indicators. eKuiper also integrates with Prometheus, which makes it convenient to monitor these status indicators.

This tutorial is aimed at users who have a preliminary understanding of eKuiper. It will introduce rule status indicators and how to monitor specific indicators through Prometheus.

Rule status indicators

After creating a rule in eKuiper and running it successfully, users can view the rule's running status indicators through the CLI, the REST API, or the management console. For example, for an existing rule named rule1, you can run curl -X GET "http://127.0.0.1:9081/rules/rule1/status" to get its status indicators, as follows:

{
  "status": "running",
  "source_demo_0_records_in_total": 265,
  "source_demo_0_records_out_total": 265,
  "source_demo_0_process_latency_us": 0,
  "source_demo_0_buffer_length": 0,
  "source_demo_0_last_invocation": "2022-08-22T17:19:10.979128",
  "source_demo_0_exceptions_total": 0,
  "source_demo_0_last_exception": "",
  "source_demo_0_last_exception_time": 0,
  "op_2_project_0_records_in_total": 265,
  "op_2_project_0_records_out_total": 265,
  "op_2_project_0_process_latency_us": 0,
  "op_2_project_0_buffer_length": 0,
  "op_2_project_0_last_invocation": "2022-08-22T17:19:10.979128",
  "op_2_project_0_exceptions_total": 0,
  "op_2_project_0_last_exception": "",
  "op_2_project_0_last_exception_time": 0,
  "sink_mqtt_0_0_records_in_total": 265,
  "sink_mqtt_0_0_records_out_total": 265,
  "sink_mqtt_0_0_process_latency_us": 0,
  "sink_mqtt_0_0_buffer_length": 0,
  "sink_mqtt_0_0_last_invocation": "2022-08-22T17:19:10.979128",
  "sink_mqtt_0_0_exceptions_total": 0,
  "sink_mqtt_0_0_last_exception": "",
  "sink_mqtt_0_0_last_exception_time": 0
}

The running indicators consist of two parts. The first is status, which indicates whether the rule is running normally; its value may be running, stopped manually, etc. The second is the set of operating indicators for each operator of the rule. Operators are generated from the rule's SQL, so they can differ from rule to rule. In this example, the rule uses the simplest SQL, SELECT * FROM demo, and its action is MQTT, so the generated operators are [source_demo, op_project, sink_mqtt]. Every operator has the same set of operating indicators, and each full indicator name is the operator name followed by the indicator name. For example, the records_in_total indicator (the number of input records) of operator source_demo_0 is source_demo_0_records_in_total.
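
The naming scheme above can be made concrete with a small Python sketch that splits each key of the status JSON back into an operator name and an indicator name. The function name group_by_operator and the shortened sample JSON are illustrative only; the indicator names are the ones that appear in the status output of this article.

```python
import json
from collections import defaultdict

# Indicator names as they appear in the status output of this article.
# Sorted longest first so a longer suffix is never shadowed by a shorter one.
INDICATORS = sorted(
    [
        "records_in_total",
        "records_out_total",
        "process_latency_us",
        "buffer_length",
        "last_invocation",
        "exceptions_total",
        "last_exception",
        "last_exception_time",
    ],
    key=len,
    reverse=True,
)


def group_by_operator(status):
    """Split keys such as 'source_demo_0_records_in_total' into the
    operator ('source_demo_0') and the indicator ('records_in_total')."""
    grouped = defaultdict(dict)
    for key, value in status.items():
        for indicator in INDICATORS:
            suffix = "_" + indicator
            if key.endswith(suffix):
                grouped[key[: -len(suffix)]][indicator] = value
                break
    return dict(grouped)


# Shortened copy of the status JSON shown above.
sample = json.loads("""
{
  "status": "running",
  "source_demo_0_records_in_total": 265,
  "source_demo_0_last_exception_time": 0,
  "op_2_project_0_records_out_total": 265,
  "sink_mqtt_0_0_exceptions_total": 0
}
""")

grouped = group_by_operator(sample)
for operator, indicators in grouped.items():
    print(operator, indicators)
```

The top-level status key has no operator prefix, so it is skipped by the grouping and can be read directly from the original dictionary.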

Operating indicators

The operating indicators of each operator are the same, mainly in the following categories:

  • records_in_total: The total number of messages read in, indicating how many messages have been processed after the rule starts.
  • records_out_total: The total number of output messages, indicating the number of messages correctly processed by the operator.
  • process_latency_us: The latency of the most recent processing, in microseconds. This is an instantaneous value that gives a sense of the operator's processing performance. The latency of the whole rule is generally determined by the operator with the largest latency.
  • buffer_length: The operator buffer length. Due to the difference in calculation speed between operators, there are buffer queues between operators. A larger buffer length indicates that the operator processing is slower and cannot keep up with the upstream processing speed.
  • last_invocation: The last running time of the operator.
  • exceptions_total: The total number of exceptions. Recoverable errors raised while the operator is running, such as connection interruptions or data format errors, are counted as exceptions and do not interrupt the rule.

After version 1.6.1, we added two exception-related indicators to facilitate debugging and handling of exceptions.

  • last_exception: The error message of the last exception.
  • last_exception_time: The time when the last exception occurred.
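
As a rough illustration of how these indicators can be read together, the following Python sketch scans a rule status dictionary for operators whose buffer is backing up or that have recorded exceptions. The function name find_suspects, the threshold of 100, and the sample values are all made up for this example; they are not eKuiper defaults or real measurements.

```python
def find_suspects(status, max_buffer=100):
    """Return (backlogged, failing) operator names from a rule status dict.

    max_buffer is an arbitrary illustrative threshold, not an eKuiper default.
    """
    backlogged, failing = [], []
    for key, value in status.items():
        if key.endswith("_buffer_length") and value > max_buffer:
            backlogged.append(key[: -len("_buffer_length")])
        elif key.endswith("_exceptions_total") and value > 0:
            failing.append(key[: -len("_exceptions_total")])
    return backlogged, failing


# Invented sample values, chosen only to trigger both checks.
status = {
    "status": "running",
    "source_demo_0_buffer_length": 0,
    "source_demo_0_exceptions_total": 0,
    "op_2_project_0_buffer_length": 512,
    "op_2_project_0_exceptions_total": 0,
    "sink_mqtt_0_0_buffer_length": 0,
    "sink_mqtt_0_0_exceptions_total": 3,
}

backlogged, failing = find_suspects(status)
print("backlogged:", backlogged)
print("failing:", failing)
```

For an operator reported as failing, the last_exception and last_exception_time indicators of the same operator are the natural next place to look.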

Numerical indicators in these operating indicators can be monitored using Prometheus. In the next section we will describe how to configure the Prometheus service in eKuiper.

Configure eKuiper's Prometheus service

eKuiper has a built-in Prometheus service, but it is disabled by default. To enable it, modify the configuration in etc/kuiper.yaml: prometheus is a Boolean value that turns the service on when set to true, and prometheusPort configures the access port of the service.

basic:
  prometheus: true
  prometheusPort: 20499

If you use Docker to start eKuiper, you can also enable the service by configuring environment variables.

docker run -p 9081:9081 -d --name ekuiper -e MQTT_SOURCE__DEFAULT__SERVER="$MQTT_BROKER_ADDRESS" -e KUIPER__BASIC__PROMETHEUS=true lfedge/ekuiper:$tag

In the startup log, you can see related information about service startup, for example:

time="2022-08-22 17:16:50" level=info msg="Serving prometheus metrics on port http://localhost:20499/metrics" file="server/prome_init.go:60"
Serving prometheus metrics on port http://localhost:20499/metrics

Open the address shown in the log, http://localhost:20499/metrics, to view the raw indicator data that eKuiper exposes for Prometheus. Once eKuiper has rules running normally, you can search the page for indicators such as kuiper_sink_records_in_total. Users can configure Prometheus to access eKuiper for a richer display.
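
Instead of searching the page by hand, the same check can be scripted. The sketch below is a minimal Python example that reads the Prometheus text format and keeps only the lines whose names start with kuiper_. It assumes the service is reachable at the address from the log above. The label rule="rule1" in the sample text is hypothetical and only illustrates the line format; the labels eKuiper actually emits may differ.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text, prefix="kuiper_"):
    """Return (name, labels, value) tuples for metrics starting with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if match and match.group(1).startswith(prefix):
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


def fetch_metrics(url="http://localhost:20499/metrics"):
    """Download the raw metrics page; needs a running eKuiper instance."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


# Hand-written sample text, not captured from a real eKuiper instance.
example = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
kuiper_sink_records_in_total{rule="rule1"} 265
"""

for sample in parse_metrics(example):
    print(sample)
```

Against a live instance, parse_metrics(fetch_metrics()) would return the eKuiper indicators that are currently exposed.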

Use Prometheus to view the status

Above, we enabled eKuiper to output its status as Prometheus indicators. Next, we configure Prometheus to access these indicators and set up preliminary monitoring.

Install and configure

Go to the Prometheus official website, download the release for your system, and unpack it.

Modify the configuration file so that it monitors eKuiper. Open prometheus.yml and modify the scrape_configs section as follows:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: ekuiper
    static_configs:
      - targets: ['localhost:20499']

The monitoring job is named ekuiper here, and targets points to the address of the service started in the previous section. After the configuration is complete, start Prometheus.

./prometheus --config.file=prometheus.yml

After a successful startup, open http://localhost:9090/ to enter the management console.
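
To confirm from a script that Prometheus is actually scraping eKuiper, one option is the targets endpoint of the Prometheus HTTP API. The Python sketch below assumes Prometheus listens on localhost:9090 and that the job is named ekuiper as in the configuration above. The sample payload is hand-written to mirror the shape of the API response and is not captured output.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9090"):
    """Call the Prometheus HTTP API; needs a running Prometheus server."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def job_health(payload, job="ekuiper"):
    """Map each scrape URL of the given job to its reported health."""
    return {
        target["scrapeUrl"]: target["health"]
        for target in payload["data"]["activeTargets"]
        if target["labels"].get("job") == job
    }


# Hand-written sample shaped like the API response, not captured output.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "ekuiper", "instance": "localhost:20499"},
                "scrapeUrl": "http://localhost:20499/metrics",
                "health": "up",
            }
        ]
    },
}

print(job_health(sample))
```

With both services running, job_health(fetch_targets()) should report the eKuiper target as up once the first scrape has succeeded.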

Simple monitoring

As a simple example, monitor how the number of messages received by the sinks of all rules changes. Enter the name of the indicator to monitor, for example kuiper_sink_records_in_total, in the search box and click Execute to generate the monitoring table. Select Graph to switch to other display methods such as a line graph.

Click Add Panel to monitor more indicators through the same configuration method.
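
The same kind of query can also be issued outside the console through the Prometheus HTTP API. The Python sketch below builds an instant query for the per-second rate of kuiper_sink_records_in_total over the last minute. It assumes Prometheus is at localhost:9090 and that the indicator behaves as a monotonically increasing counter; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def extract_values(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def run_query(expr, base_url="http://localhost:9090"):
    """Execute the query; needs a running Prometheus server."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        return extract_values(json.load(resp))


# Per-second rate of messages received by sinks over the last minute.
EXPR = "rate(kuiper_sink_records_in_total[1m])"
print(build_query_url(EXPR))
```

The expression in EXPR can equally be pasted into the search box of the console, which is a convenient way to try it out before scripting it.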
