Integrating Prometheus with AlertManager for alerting

Prometheus Server configuration
  • Write an alert rule file in YAML format
groups:
- name: account-center
  rules:
  # Alert on the metric's status
  - alert: AccountCenterMetricStatusAlert
    expr: ssl_expire_days == 0
    for: 0s
    labels:
      severity: 1
    annotations:
      instance: "Account center instance {{$labels.instance}} metric alert"
      description: "Account center instance {{$labels.instance}}: domain certificate remaining value: {{$value}}"

This YAML file defines the alert rules that Prometheus evaluates.

  • Edit the prometheus.yml configuration file to set the AlertManager address and the alert rule file
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "accountcenter.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: one job for the node exporter
# and one for the Spring Boot application.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "nodeExporter"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.240.130:9100"] # port monitored on this host
  - job_name: "springboot"
    scrape_interval: 3s                   # how often to scrape
    scrape_timeout: 3s                    # timeout for each scrape
    metrics_path: '/actuator/prometheus'  # path to scrape
    static_configs:                       # address of the service to scrape; set it to the server running the Spring Boot application
      - targets: ["192.168.1.103:8188"]

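  • Reload the Prometheus configuration file (the reload endpoint is only available when Prometheus is started with the --web.enable-lifecycle flag)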

curl -X POST http://127.0.0.1:9090/-/reload
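
The same reload can also be triggered from application code. The following is a minimal sketch, assuming Java 11+ and the Prometheus address used in the curl call; the class name PrometheusReload and its methods are hypothetical helpers, not part of Prometheus or Spring. It treats an HTTP 200 response as a successful reload.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; the class and method names are illustrative only.
class PrometheusReload {

    // Builds the same request as: curl -X POST <baseUrl>/-/reload
    static HttpRequest buildReloadRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/-/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Sends the request and reports whether Prometheus answered with HTTP 200.
    static boolean reload(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildReloadRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200;
    }
}
```

Calling PrometheusReload.reload("http://127.0.0.1:9090") after editing the rule file would then have the same effect as the curl command.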

SpringBoot - WebHook

Write a Spring Boot controller that AlertManager calls back when an alert is triggered

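import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/alertmanager")
public class AlertManagerWebHooks {

    // AlertManager sends each notification to this endpoint as an HTTP POST with a JSON body.
    @PostMapping("/hook")
    public Object hook(@RequestBody String body) {
        System.out.println("Received alert: " + body);
        System.out.println("Saving the alert to the database...");
        return "success";
    }

}
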
AlertManager configuration
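  • Edit the alertmanager.yml configuration file and add the webhook configuration
# Global configuration, including the timeout after which an alert is treated as resolved, SMTP settings, API addresses for the notification channels, and so on.
global:
  # Time after which an alert is declared resolved if it has not been updated
  resolve_timeout: 5m
# Routing configuration: defines how alerts are dispatched. It is a tree that is matched depth-first, from left to right.
route:
  # Labels used to group incoming alerts together.
  # Alerts that share the same values for the labels listed in group_by are merged into a single notification to the receiver.
  group_by: ['alertname']
  # How long to wait before sending the first notification for a new group
  group_wait: 1s
  # How long to wait before sending a notification about new alerts added to a group
  group_interval: 1s
  # How long to wait before re-sending a notification that has already been sent, typically about 3 hours or more
  repeat_interval: 5s
  # Receiver name
  receiver: 'web.hook'
# Receivers for alert notifications, for example email, wechat, slack, or webhook
receivers:
  # Receiver name
  - name: 'web.hook'
    # webhook URL
    webhook_configs:
      - url: 'http://192.168.1.103:8188/alertmanager/hook'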

The receivers section configures where alert notifications are delivered. You can configure email, enterprise WeChat, and custom webhooks. A webhook is an HTTP endpoint: when AlertManager sends a notification, it automatically calls the configured URL.
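
The body that AlertManager posts to a webhook is a JSON object with, among other fields, a top-level status and an alerts array whose entries carry their own status, labels, and annotations. The following is a minimal sketch of how the controller could turn that body into readable lines, assuming the endpoint parameter is declared as @RequestBody Map<String, Object> so that Spring hands over the parsed JSON. The class name AlertPayloadSummary, the summarize method, and the sample values in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of AlertManager or Spring.
class AlertPayloadSummary {

    // Produces one line per alert: "<status> <alertname> on <instance>".
    @SuppressWarnings("unchecked")
    static List<String> summarize(Map<String, Object> payload) {
        List<String> lines = new ArrayList<>();
        Object alerts = payload.get("alerts");
        if (!(alerts instanceof List)) {
            return lines; // no alerts array: nothing to report
        }
        for (Object item : (List<Object>) alerts) {
            Map<String, Object> alert = (Map<String, Object>) item;
            Map<String, Object> labels =
                    (Map<String, Object>) alert.getOrDefault("labels", Map.of());
            lines.add(alert.get("status") + " "
                    + labels.get("alertname") + " on " + labels.get("instance"));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hand-built sample in the shape of a webhook payload (values are made up).
        Map<String, Object> payload = Map.of(
                "status", "firing",
                "alerts", List.of(Map.of(
                        "status", "firing",
                        "labels", Map.of("alertname", "DemoAlert",
                                         "instance", "192.168.1.103:8188"))));
        summarize(payload).forEach(System.out::println);
    }
}
```

Working on a plain map keeps the sketch free of extra classes; a real service would more likely bind the body to typed objects before storing it.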

  • Alert flow
  1. Prometheus periodically evaluates the configured alert rules. When a rule's PromQL expression matches, the alert waits for the duration given by that rule's for setting. If the condition still holds after that duration, the alert fires and Prometheus sends it to AlertManager.
  2. After receiving alerts, AlertManager groups, suppresses, and deduplicates them.
  3. AlertManager then calls the configured receiver, such as a webhook, enterprise WeChat, or DingTalk.
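
Step 1 can be illustrated with a small simulation. This is a simplified model written for this article, not Prometheus source code: the class AlertLifecycle, the simulate method, and the sample inputs are assumptions. It tracks a single alert through a series of evaluations and shows how the for duration delays the move from pending to firing.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of one alert rule; not Prometheus source code.
class AlertLifecycle {
    enum State { INACTIVE, PENDING, FIRING }

    // conditionTrue[i] is the result of the rule's expr at evaluation i.
    // Evaluations are intervalSeconds apart; forSeconds mirrors the rule's `for`.
    static List<State> simulate(boolean[] conditionTrue, long intervalSeconds, long forSeconds) {
        List<State> states = new ArrayList<>();
        long activeSince = -1; // time the condition first became true, -1 if inactive
        for (int i = 0; i < conditionTrue.length; i++) {
            long now = i * intervalSeconds;
            if (!conditionTrue[i]) {
                activeSince = -1;          // condition cleared: reset the timer
                states.add(State.INACTIVE);
                continue;
            }
            if (activeSince < 0) {
                activeSince = now;         // first evaluation where expr matched
            }
            // The alert fires once the condition has held for at least `for`.
            states.add(now - activeSince >= forSeconds ? State.FIRING : State.PENDING);
        }
        return states;
    }

    public static void main(String[] args) {
        boolean[] samples = {false, true, true, true, false};
        System.out.println(simulate(samples, 15, 30)); // for: 30s
        System.out.println(simulate(samples, 15, 0));  // for: 0s, as in the rule file
    }
}
```

With for: 0s, as in the rule file in this article, the alert fires on the first evaluation where the expression matches, so there is no pending phase.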

Origin blog.csdn.net/qq_43750656/article/details/133271399