Prometheus+Grafana monitoring system

Table of contents

1. Prometheus

2. Grafana

3. Environment preparation

4. Install Prometheus and Grafana

5. Configure Prometheus and Grafana

   1. Configure prometheus

   2. Configure grafana

6. Monitor machine hardware resources

7. Monitoring basic services

   1. Monitor NGINX

   2. Monitor MYSQL

   3. Monitor REDIS

8. Monitoring application

9. Monitoring business interface data

10. Alarm settings

    1. Set the alarm mode

    2. Set alarm rules

       ① /data/prometheus/rules/node.yml

       ② /data/prometheus/rules/redis.yml

       ③ /data/prometheus/rules/mysql.yml

       ④ /data/prometheus/rules/nginx.yml

This guide uses Prometheus + Grafana to build a monitoring system. The main monitoring targets are machine hardware resources, basic services, applications, and business interface data.

1. Prometheus

Prometheus - Monitoring system & time series database

Prometheus is an open-source service monitoring system and time series database. The Prometheus ecosystem consists of multiple components: the Prometheus Server, which collects and stores data and provides the PromQL query language; client SDKs for multiple languages; the Pushgateway, an intermediate gateway that lets short-lived jobs actively push metrics; Exporters, data collection components that gather metrics from targets and convert them into a format Prometheus supports; and the Alertmanager, which provides alerting.

Unlike traditional data collection agents, a Prometheus exporter does not push data to a central server; it exposes the data and waits for the central server to scrape it. Prometheus provides many kinds of exporters for collecting the running status of various services.
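To make the scrape model concrete, here is a small, hypothetical sample of the text-based exposition format an exporter serves on its /metrics endpoint (the metric names mirror node_exporter metrics used later in this guide); Prometheus fetches this over plain HTTP on every scrape interval:

```shell
# Write a hypothetical /metrics payload to a file and count its samples.
cat <<'EOF' > /tmp/sample_metrics.txt
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 0.42
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 3.4359738368e+10
EOF
# Each non-comment line is one sample: metric name, optional labels, value.
grep -cv '^#' /tmp/sample_metrics.txt
```

Each scrape turns every such line into one point in the time series database.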

Ecosystem architecture diagram from the Prometheus official website:

 

2. Grafana

Grafana: The open observability platform | Grafana Labs

Grafana is a cross-platform, open-source metrics analysis and visualization tool that can pull data from multiple data sources (such as Prometheus) and display it visually.

3. Environment preparation

Hardware resources of the monitored server: 8-core CPU, 32 GB memory, 250 GB disk, network card.

Prometheus and other monitoring service deployment arrangements are as follows:

 

1. Check the operating system version

[root@node64 ~]# cat /etc/redhat-release

CentOS Linux release 7.1.1503 (Core)

[root@node64 ~]# getconf LONG_BIT

64

2. Check the network card IP and configuration

[root@node64 ~]# ip a

eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000

link/ether fa:16:3e:a3:89:25 brd ff:ff:ff:ff:ff:ff

inet 192.168.0.91/24 brd 192.168.0.255 scope global eth1

valid_lft forever preferred_lft forever

inet6 fe80::f816:3eff:fea3:8925/64 scope link

valid_lft forever preferred_lft forever

[root@node64 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1

NAME=eth1

TYPE=Ethernet

BOOTPROTO=dhcp

DEVICE=eth1

ONBOOT=yes

IPV4_ROUTE_METRIC=100

4. Install Prometheus and Grafana

1. Create the installation directory and a dedicated user

mkdir /data/prometheus

groupadd prometheus

useradd -g prometheus -s /sbin/nologin prometheus

chown -R prometheus:prometheus prometheus

2. Download the prometheus and grafana installation packages and the prometheus plug-in packages,

including node_exporter, mysqld_exporter, nginx-vts-exporter, redis_exporter, and alertmanager

The Prometheus installation package reference is as follows:

wget https://github.com/prometheus/prometheus/releases/download/v2.20.0/prometheus-2.20.0.linux-amd64.tar.gz

   The Grafana installation package reference is as follows:

wget https://dl.grafana.com/oss/release/grafana-7.1.1.linux-amd64.tar.gz

3. Unzip and install
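The unzip-and-install step is not spelled out above. A sketch of the unpack-and-rename, demonstrated on a stand-in archive so the commands are runnable anywhere; substitute the real tarballs downloaded to /data/prometheus:

```shell
# Stand-in for: extracting prometheus-2.20.0.linux-amd64.tar.gz in /data/prometheus
# and renaming the versioned directory to the plain path used by the unit files below.
demo=$(mktemp -d)
cd "$demo"
mkdir prometheus-2.20.0.linux-amd64                   # stand-in payload
tar -czf prometheus-2.20.0.linux-amd64.tar.gz prometheus-2.20.0.linux-amd64
rm -r prometheus-2.20.0.linux-amd64
tar -xzf prometheus-2.20.0.linux-amd64.tar.gz         # unzip
mv prometheus-2.20.0.linux-amd64 prometheus           # final path: .../prometheus
ls -d prometheus
```

The grafana tarball is handled the same way, ending up at /data/prometheus/grafana; finish with chown -R prometheus:prometheus /data/prometheus.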

5. Configure Prometheus and Grafana

1. Configure prometheus

a. Modify the configuration file prometheus.yml

vi /data/prometheus/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'prometheus'   # job name assumed; not shown in the original
    metrics_path: /prometheus/metrics
    static_configs:
      - targets: ['192.168.0.91:9090']
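For context, a minimal complete prometheus.yml around that fragment might look like the following sketch; the global intervals are assumptions, not taken from the original:

```yaml
global:
  scrape_interval: 15s      # assumed default; not specified in the original
  evaluation_interval: 15s  # assumed default

scrape_configs:
  - job_name: 'prometheus'  # hypothetical job name for Prometheus self-scraping
    metrics_path: /prometheus/metrics
    static_configs:
      - targets: ['192.168.0.91:9090']
```

Further jobs (node_exporter, mysql, nginx, redis) are appended under scrape_configs in the later sections.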

b. Check the configuration file

cd /data/prometheus/prometheus

[root@node64 prometheus]# ./promtool check config prometheus.yml

Note: to make a configuration change effective, restart (or reload) the running prometheus process; pgrep -fl prometheus shows whether it is running.

c. Register prometheus as a system service

[root@node64 prometheus]# cat /usr/lib/systemd/system/prometheus.service

[Unit]

Description=prometheus

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/prometheus/prometheus/prometheus --web.external-url=prometheus   --web.enable-admin-api --config.file=/data/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus/prometheus/data --storage.tsdb.retention=15d --log.level=info --web.enable-lifecycle

Restart=on-failure

[Install]

WantedBy=multi-user.target

d. Start and view the prometheus service

systemctl enable prometheus

systemctl start prometheus

systemctl status prometheus

   [root@node64 prometheus]# netstat -anp | grep 9090

 

e. nginx forwards the prometheus service

    Prometheus does not provide any authentication support out of the box. With Nginx as a reverse proxy in front of it, however, we can easily add HTTP Basic Auth.

yum -y install httpd-tools   # provides the htpasswd command

[root@node23 conf]# htpasswd -c .htpasswd_prometheus prometheus

(enter the password Iampwd twice when prompted)

location /prometheus/ {

     proxy_set_header X-Real-IP $remote_addr;

     proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

     proxy_set_header Host $http_host;

     proxy_set_header X-Nginx-Proxy true;

     proxy_pass http://192.168.0.91:9090;

     proxy_redirect off;

     proxy_buffering off;

     proxy_read_timeout 90;

     proxy_send_timeout 90;

     auth_basic "Prometheus";

     auth_basic_user_file ".htpasswd_prometheus";

}
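If htpasswd is not available, an equivalent line for the auth_basic_user_file can be generated with openssl; a sketch using the same user name and password as above (written to /tmp here only for illustration):

```shell
# Generate an htpasswd-style "user:hash" line using openssl's apr1 (MD5) scheme,
# which nginx's auth_basic understands, for user "prometheus" / password "Iampwd".
openssl passwd -apr1 'Iampwd' | awk '{ print "prometheus:" $0 }' > /tmp/.htpasswd_prometheus
# The file now holds one entry of the form prometheus:$apr1$<salt>$<hash>
grep -c '^prometheus:\$apr1\$' /tmp/.htpasswd_prometheus
```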

f. web access prometheus

https://***.com:9091/prometheus/

prometheus/Iampwd

 

2. Configure grafana

   a. Modify the configuration file defaults.ini

     vi /data/prometheus/grafana/conf/defaults.ini

     http_port = 3000

     root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana/

  

b. Register grafana as a system service

     [root@node64 conf]# cat /usr/lib/systemd/system/grafana-server.service

[Unit]

Description=Grafana

After=network.target

[Service]

Type=notify

ExecStart=/data/prometheus/grafana/bin/grafana-server -homepath /data/prometheus/grafana

Restart=on-failure

[Install]

WantedBy=multi-user.target

c. Start and view the grafana service

systemctl enable grafana-server

systemctl start grafana-server

systemctl status grafana-server

  

   d. nginx forwards the grafana service

     location /grafana/ {

          proxy_set_header X-Real-IP $remote_addr;

          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

          proxy_set_header Host $http_host;

          proxy_set_header X-Nginx-Proxy true;

          proxy_pass http://192.168.0.91:3000/;

          proxy_redirect off;

          proxy_buffering off;

          proxy_read_timeout 90;

          proxy_send_timeout 90;

       }

   e. web access grafana and configure data source

     https://***.com:9091/grafana/

admin/Iampwd

     Data source URL: http://192.168.0.91:9090/prometheus

 

 

6. Monitor machine hardware resources

Prometheus monitors machine hardware resources through the node_exporter plug-in: prometheus actively scrapes the required data from any network-reachable machine on which the node_exporter service is running.

1. Install node_exporter on the machine that needs to be monitored

2. Register node_exporter as a system service

[root@node64 conf]# cat /usr/lib/systemd/system/node_exporter.service

[Unit]

Description=node_exporter

Documentation=https://prometheus.io/

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/prometheus/node_exporter/node_exporter

Restart=on-failure

[Install]

WantedBy=multi-user.target

3. Start and view the node_exporter service

systemctl enable node_exporter

systemctl start node_exporter

systemctl status node_exporter

4. Modify prometheus.yml and restart the prometheus service

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.0.91:9100','192.168.0.92:9100'…]

   

5. Visit prometheus to view the monitoring status

6. Import grafana monitoring dashboards

Dashboards | Grafana Labs

node_exporter: 8919

mysql_exporter: 11323

mysql overview: 7362

nginx-vts: 2949

redis: 11835

 

7. Monitoring basic services

1. Monitor NGINX

Nginx exposes metrics through the nginx-module-vts module, and Prometheus collects them through the nginx-vts-exporter component.

a. Install the nginx-module-vts module on the nginx server

./configure --prefix=/data/nginx --with-http_gzip_static_module --with-http_stub_status_module --with-http_ssl_module --with-pcre --with-file-aio --with-http_realip_module --add-module=/data/nginx-module-vts

make && make install

  • Modify the nginx.conf file

http {
    vhost_traffic_status_zone;
    vhost_traffic_status_filter_by_host on;

    server {
        # ... existing server configuration ...

        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}

 

b. Download and install the nginx-vts-exporter plug-in on both the nginx server and the prometheus server

wget

https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz

c. Register nginx-vts-exporter as a system service on the nginx server (192.168.0.71)

cat /etc/systemd/system/nginx-vts-exporter.service

[Unit]

Description=nginx_exporter

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/nginx-vts-exporter/nginx-vts-exporter -nginx.scrape_uri=https://<public IP>:9091/status/format/json

Restart=on-failure

[Install]

WantedBy=multi-user.target

d. Register nginx-vts-exporter as a system service on the prometheus server ( 192.168.0.91 )

cat /etc/systemd/system/nginx-vts-exporter.service

[Unit]

Description=nginx_exporter

After=network.target

[Service]

Type=simple

User=root

ExecStart=/data/prometheus/nginx-vts-exporter/nginx-vts-exporter -nginx.scrape_uri=https://192.168.0.71:9091/status/format/json

Restart=on-failure

[Install]

WantedBy=multi-user.target

e. Start and view the nginx-vts-exporter service on the nginx server and prometheus server

systemctl enable nginx-vts-exporter

systemctl start nginx-vts-exporter

systemctl status nginx-vts-exporter

f. Modify the configuration file prometheus.yml on the prometheus server and restart the prometheus service

- job_name: 'nginx'

static_configs:

      - targets: ['192.168.0.91:9913']

  

g. View the nginx monitoring page

https://<public IP>:9091/status

Import nginx monitoring panel nginx-vts-exporter 2949 on grafana and view the panel

       https://***.com:9091/grafana/d/5-RKCVxGk/nginx-vts-stats?orgId=1

 

 

2. Monitor MYSQL

Prometheus collects data related to MySQL master and slave servers through the mysqld_exporter component.

1) After installing mysql with an automated script, register the mysql service as a system service and enable it to start at boot

[root@centos7-min4 nginx]# cp /opt/mysql57/support-files/mysql.server /etc/rc.d/init.d/mysqld

chmod +x /etc/init.d/mysqld

chkconfig --add mysqld

chkconfig --list

 # systemctl start mysqld

# systemctl status mysqld

[mysql@centos7-min4 nginx]$ mysql -uroot -p   # password: 123456

mysql> select version();

+------------+
| version()  |
+------------+
| 5.7.24-log |
+------------+

2) Install the mysqld_exporter component on the prometheus server

Prometheus monitors mysql master-slave server

    • Log in to mysql to create an account for the exporter and authorize it

create user 'exporter'@'192.168.0.%' identified by 'Abc123';

grant process,replication client,select on *.* to 'exporter'@'192.168.0.%';

flush privileges;

   

    • Install the mysqld_exporter service on the Prometheus server and monitor the mysql master-slave service at the same time

   ls -al /data/prometheus/mysqld_exporter/

   .my-master.cnf

   .my-slave.cnf

[root@node64 mysqld_exporter]# cat .my-master.cnf

[client]

user=exporter 

password=Abc123

host=192.168.0.92

port=3306

[root@node64 mysqld_exporter]# cat .my-slave.cnf

[client]

user=exporter

password=Abc123

host=192.168.0.93

port=3306

    • Start the mysqld_exporter service

Start one exporter instance for the master and one for the slave.

Start the exporter for the MySQL master:

/data/prometheus/mysqld_exporter/mysqld_exporter --web.listen-address=192.168.0.91:9104 --config.my-cnf=/data/prometheus/mysqld_exporter/.my-master.cnf --collect.auto_increment.columns --collect.binlog_size --collect.global_status --collect.engine_innodb_status --collect.global_variables --collect.info_schema.innodb_metrics --collect.info_schema.innodb_tablespaces --collect.info_schema.innodb_cmp --collect.info_schema.innodb_cmpmem --collect.info_schema.processlist --collect.info_schema.query_response_time --collect.info_schema.tables --collect.info_schema.tablestats --collect.info_schema.userstats --collect.perf_schema.eventswaits --collect.perf_schema.file_events --collect.perf_schema.indexiowaits --collect.perf_schema.tableiowaits --collect.perf_schema.tablelocks

Start the exporter for the MySQL slave:

/data/prometheus/mysqld_exporter/mysqld_exporter --web.listen-address=192.168.0.91:9105 --config.my-cnf=/data/prometheus/mysqld_exporter/.my-slave.cnf

(Note: keep the collector flags consistent with the master exporter command above.)
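The long command lines above can also be wrapped in systemd units like the other exporters in this guide; a hypothetical unit for the master instance (the slave unit would differ only in the listen port and the config file):

```ini
# /usr/lib/systemd/system/mysqld_exporter_master.service (hypothetical name)
[Unit]
Description=mysqld_exporter (master)
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/prometheus/mysqld_exporter/mysqld_exporter \
  --web.listen-address=192.168.0.91:9104 \
  --config.my-cnf=/data/prometheus/mysqld_exporter/.my-master.cnf
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

The extra --collect.* flags from the command above can be appended to ExecStart unchanged.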

3) Modify the prometheus configuration file information and restart prometheus

prometheus.yml

- job_name: 'mysql_exporter'
  static_configs:
    #  - targets: ['192.168.0.92:9104','192.168.0.93:9104']
    - targets:
        - 192.168.0.91:9104   # port exposed by the master's mysqld_exporter
      labels:
        instance: master:3306   # instance alias displayed by grafana
    - targets:
        - 192.168.0.91:9105   # port exposed by the slave's mysqld_exporter
      labels:
        instance: slave:3306   # instance alias displayed by grafana

4) View the mysql data of the prometheus and grafana panels, and import the mysql monitoring panel

mysql_exporter 11323

mysql overview 7362

 

 

 

 

PS: mysql synchronization fault handling: Slave_SQL_Running: No

Analysis: Causes of mysql data synchronization failure

  1. The program may have performed a write operation on the slave
  2. It may be caused by the transaction rollback after the slave machine is restarted

Solution: first stop the slave service; check the master's status on the master server; point the slave at the master's current File and Position values; then start the slave service and check the synchronization status.

On the master server:

mysql> show master status;

On the slave server:

mysql> stop slave;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql>

mysql> change master to master_host='192.168.0.92',

-> master_user='repl',

-> master_password='123456',

-> master_log_file='mysql-bin-T-prod-3306.000005',

-> master_log_pos=653020;

Query OK, 0 rows affected, 2 warnings (0.05 sec)

mysql> start slave;

 

3. Monitor REDIS

Use the redis_exporter component to monitor the redis cluster with three masters and three slaves.

     1) Use automated scripts to install redis three-master and three-slave clusters

     2) Download and install the redis_exporter service on the prometheus server

wget https://github.com/oliver006/redis_exporter/releases/download/v1.3.5/redis_exporter-v1.3.5.linux-amd64.tar.gz

3) Point redis_exporter at one node of the redis cluster to monitor the entire cluster

cd /data/prometheus/redis_exporter

./redis_exporter -redis.addr 192.168.0.93:7000 -redis.password 'zxcvb123' &

   4) Modify the prometheus configuration file information and restart prometheus

- job_name: 'redis_exporter_targets'

    static_configs:

      - targets:

        - redis://192.168.0.3:7000

        - redis://192.168.0.2:7003

        - redis://192.168.0.72:7002

        - redis://192.168.0.35:7001

        - redis://192.168.0.14:7004

        - redis://192.168.0.13:7005

    metrics_path: /scrape

    relabel_configs:

      - source_labels: [__address__]

        target_label: __param_target

      - source_labels: [__param_target]

        target_label: instance

      - target_label: __address__

        replacement: 192.168.0.91:9121

  - job_name: 'redis_exporter'

    static_configs:

      - targets:

        - 192.168.0.91:9121
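To see what the relabel_configs above do, trace one target through them by hand: the redis:// address is copied into the target URL parameter (and kept as the instance label), while the actual scrape address is replaced with the single redis_exporter. A sketch with the first target:

```shell
# Walk through the relabeling for one target:
target='redis://192.168.0.3:7000'   # original __address__, becomes the "instance" label
exporter='192.168.0.91:9121'        # replacement __address__ (the redis_exporter)
# Prometheus therefore scrapes this URL and attributes the metrics to $target:
echo "http://${exporter}/scrape?target=${target}"
```

This is why one exporter process at 192.168.0.91:9121 can cover all six cluster nodes.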

       5) View prometheus and grafana panel data, import panel redis 11835

 

 

8. Monitoring application

Prometheus monitors applications using the process-exporter component

     1) Install the process-exporter component on the application server that needs to be monitored

wget

https://github.com/ncabatoff/process-exporter/releases/download/v0.5.0/process-exporter-0.5.0.linux-amd64.tar.gz

    2) Configure application monitoring information

process-conf.yml
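The contents of process-conf.yml are not reproduced above (the original shows an image). A minimal hypothetical example in process-exporter's configuration format, grouping every process by its command name:

```yaml
# Hypothetical process-conf.yml for process-exporter
process_names:
  - name: "{{.Comm}}"   # template: use the process's command name as the group name
    cmdline:
    - '.+'              # regex: match every process
```

In practice the cmdline regexes would be narrowed to the application processes that need monitoring.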

 

      3) Start the application monitoring service and specify the configuration file

./process-exporter -config.path  process-conf.yml &

      4) Modify prometheus configuration information and restart prometheus

[root@node64 prometheus]# vi prometheus.yml

- job_name: process
  static_configs:
    - targets: ['192.168.0.35:9256','192.168.0.13:9256'…]

     5) View prometheus and grafana panel data  

    The dashboard corresponding to process-exporter is: Named processes | Grafana Labs

 

 

 

9. Monitoring business interface data

Configure grafana data display according to the business monitoring interface data provided by the development.

https://***.com:9092/api

  • Notice:

1. Grafana does not store Prometheus data; it only queries Prometheus and renders the UI. To remove data you therefore have to clear it in Prometheus itself.

Prometheus retains data for 15 days by default, which can be tuned with the --storage.tsdb.retention flag (already set in the service unit above).

2. Prometheus marks stale (expired) series as NaN, and the sum() function does not handle NaN, so the metric query needs to be adjusted:

sum(st_invoke_count{app_id=~'$appid',road_type='1'}>0)

3. Grafana global variables

4. Grafana menu cascade

10. Alarm settings

1. Set the alarm mode

> Prometheus

  Prometheus implements alerts through the component alertmanager. Alertmanager receives the alerts sent by prometheus and performs a series of processing on the alerts and sends them to specified users.

prometheus--->trigger threshold--->exceeding duration--->alertmanager--->group|suppress|silent--->media type--->mail, DingTalk, WeChat, etc.

      1) Install alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.16.2/alertmanager-0.16.2.linux-amd64.tar.gz

      2) Modify the alertmanager configuration file information alertmanager.yml
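The alertmanager.yml contents are not shown above. A minimal hypothetical email configuration consistent with steps 3) and 4); the SMTP host, addresses, and credentials are placeholders:

```yaml
global:
  smtp_smarthost: 'smtp.example.com:25'   # placeholder SMTP server
  smtp_from: 'alert@example.com'          # placeholder sender
  smtp_auth_username: 'alert@example.com'
  smtp_auth_password: 'password'          # placeholder
  smtp_require_tls: false

templates:
  - '/opt/alertmanager-0.21.0/template/*.tmpl'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'ops@example.com'             # placeholder recipient
        html: '{{ template "test.html" . }}'
```

The html field references the test.html template defined in step 4) below.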

      3) Open the smtp service

      4) Configure the alarm notification template

[root@centos7-min4 alertmanager-0.21.0]# cat template/test.tmpl

{{ define "test.html" }}
<table border="5">
<tr>
<td>Alert name</td>
<td>Instance</td>
<td>Alert threshold</td>
<td>Start time</td>
</tr>
{{ range $i, $alert := .Alerts }}
<tr>
<td>{{ index $alert.Labels "alertname" }}</td>
<td>{{ index $alert.Labels "instance" }}</td>
<td>{{ index $alert.Annotations "value" }}</td>
<td>{{ $alert.StartsAt }}</td>
</tr>
{{ end }}
</table>
{{ end }}

      5) Start the alertmanager service

(1) Specify the configuration file to start

[root@centos7-min4 alertmanager-0.21.0]# ./alertmanager --config.file=alertmanager.yml &

(2) Configured as a system service startup

[root@centos7-min4 alertmanager-0.21.0]# cat /usr/lib/systemd/system/alertmanager.service

[Unit]

Description=https://prometheus.io

[Service]

Restart=on-failure

ExecStart=/opt/alertmanager-0.21.0/alertmanager --config.file=/opt/alertmanager-0.21.0/alertmanager.yml

[Install]

WantedBy=multi-user.target

systemctl enable alertmanager

systemctl start alertmanager

systemctl status alertmanager

      6) Modify the prometheus configuration file and restart prometheus

# Alertmanager configuration

alerting:

  alertmanagers:

  - static_configs:

    - targets:

       - 192.168.0.91:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.

rule_files:

    - "/data/prometheus/rules/node.yml"

    - "/data/prometheus/rules/redis.yml"

    - "/data/prometheus/rules/mysql.yml"

    - "/data/prometheus/rules/nginx.yml"

    - "/data/prometheus/rules/service-api.yml"

> Grafana

2. Set alarm rules

① /data/prometheus/rules/node.yml

groups:
- name: NodeProcess
  rules:
  - alert: NodeStatus
    expr: up == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: the server is down"
      description: "{{ $labels.instance }}: the scrape target has been unreachable for more than 1 minute"
  - alert: NodeFilesystemUsage
    expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: {{ $labels.mountpoint }} partition usage is too high"
      description: "{{ $labels.instance }}: {{ $labels.mountpoint }} partition usage is greater than 80% (current value: {{ $value }})"
  - alert: NodeMemoryUsage
    expr: 100 - (node_memory_MemFree_bytes + node_memory_Cached_bytes + node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: memory usage is too high"
      description: "{{ $labels.instance }}: memory usage is greater than 80% (current value: {{ $value }})"
  - alert: NodeCPUUsage
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: CPU usage is too high"
      description: "{{ $labels.instance }}: CPU usage is greater than 80% (current value: {{ $value }})"
  - alert: LoadCPU
    expr: node_load5 > 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: load is too high"
      description: "{{ $labels.instance }}: the 5-minute load average exceeds 5 (current value: {{ $value }})"
  - alert: DiskIORead
    expr: irate(node_disk_read_bytes_total{device="sda"}[1m]) > 30000000
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: I/O read load is too high"
      description: "{{ $labels.instance }}: I/O read rate has exceeded 30 MB/s (current value: {{ $value }})"
  - alert: DiskIOWrite
    expr: irate(node_disk_written_bytes_total{device="sda"}[1m]) > 30000000
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: I/O write load is too high"
      description: "{{ $labels.instance }}: I/O write rate has exceeded 30 MB/s (current value: {{ $value }})"
  - alert: IncomingNetworkBandwidth
    expr: ((sum(rate(node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*'}[5m])) by (instance)) / 100) > 18432
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: incoming network bandwidth is too high!"
      description: "{{ $labels.instance }}: incoming bandwidth has been above 18M for 5 minutes (RX usage: {{ $value }})"
  - alert: OutgoingNetworkBandwidth
    expr: ((sum(rate(node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*'}[5m])) by (instance)) / 100) > 18432
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: outgoing network bandwidth is too high!"
      description: "{{ $labels.instance }}: outgoing bandwidth has been above 18M for 5 minutes (TX usage: {{ $value }})"
  - alert: TcpConnections
    expr: node_sockstat_TCP_inuse > 240
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: the number of TCP connections is too high!"
      description: "{{ $labels.instance }}: current connection count: {{ $value }}"
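As a sanity check on the NodeMemoryUsage expression, plugging in sample values (free 2 GiB, cached 4 GiB, buffers 1 GiB, total 32 GiB, all hypothetical) shows how the percentage falls out:

```shell
# Evaluate 100 - (MemFree + Cached + Buffers) / MemTotal * 100 with sample GiB values.
free=2; cached=4; buffers=1; total=32
awk -v f="$free" -v c="$cached" -v b="$buffers" -v t="$total" \
    'BEGIN { printf "%.3f\n", 100 - (f + c + b) / t * 100 }'
# 78.125 is below the 80 threshold, so these values would not fire the alert.
```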

② /data/prometheus/rules/redis.yml

groups:
- name: Redis
  rules:
    - alert: RedisDown
      expr: redis_up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Redis down (instance {{ $labels.instance }})"
        description: "Redis cluster node failure\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: OutOfMemory
      expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Out of memory (instance {{ $labels.instance }})"
        description: "Redis is running out of memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: ReplicationBroken
      expr: delta(redis_connected_slaves[1m]) < 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Replication broken (instance {{ $labels.instance }})"
        description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: TooManyConnections
      expr: redis_connected_clients > 1000
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Too many connections (instance {{ $labels.instance }})"
        description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: RejectedConnections
      expr: increase(redis_rejected_connections_total[1m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Rejected connections (instance {{ $labels.instance }})"
        description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: AofSaveStatus
      expr: redis_aof_last_bgrewrite_status < 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "AOF rewrite failed (instance {{ $labels.instance }})"
        description: "Redis AOF persistence failed\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

③ /data/prometheus/rules/mysql.yml

groups:
- name: MySQL
  rules:
  - alert: MySQLStatus
    expr: mysql_up == 0
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: MySQL has stopped!"
      description: "Detects the running status of the MySQL database"
  - alert: MySQLSlaveIOThreadStatus
    expr: mysql_slave_status_slave_io_running != 1
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: the MySQL slave IO thread has stopped!"
      description: "Detects the running status of the MySQL replication IO thread"
  - alert: MySQLSlaveSQLThreadStatus
    expr: mysql_slave_status_slave_sql_running != 1
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: the MySQL slave SQL thread has stopped!"
      description: "Detects the running status of the MySQL replication SQL thread"
  - alert: MySQLSlaveDelayStatus
    expr: mysql_slave_status_sql_delay > 30
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: MySQL slave delay exceeds 30s!"
      description: "Detects MySQL master-slave replication delay"
  - alert: Mysql_Too_Many_Connections
    expr: rate(mysql_global_status_threads_connected[5m]) > 200
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: too many connections"
      description: "{{ $labels.instance }}: too many connections, please investigate (current value: {{ $value }})"

④ /data/prometheus/rules/nginx.yml

groups:
- name: nginx
  rules:
  - alert: NginxStatus
    expr: up{instance="192.168.0.91:9913",job="nginx"} == 0
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: Nginx has stopped!"
      description: "Detects abnormal running status of Nginx"


Origin blog.csdn.net/Wemesun/article/details/126455053