Prometheus: Alerting with Alertmanager (Part 7)

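1. Deploying the Alertmanager Service

1.1 About Alertmanager

In the Prometheus ecosystem, alerts are handled by a separate component, Alertmanager. The usual workflow has three steps: tell Prometheus where Alertmanager is running, define alerting rules in the Prometheus configuration, and configure Alertmanager to process the alerts and deliver them to receivers (email, webhook, Slack, and so on).

Download links: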

https://prometheus.io/download/
https://github.com/prometheus/alertmanager/releases

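1.2 Deploying Alertmanager

Alertmanager does not have to run on the same machine as Prometheus; the two services only need to be able to reach each other over the network.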

# wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz

# tar xf alertmanager-0.20.0.linux-amd64.tar.gz -C /usr/local/
# ln -s /usr/local/alertmanager-0.20.0.linux-amd64/ /usr/local/alertmanager

# Edit the alertmanager configuration file
# cat alertmanager.yml                                                       
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'password'
  smtp_require_tls: false

route:
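  group_by: ['alertname']  # labels used to group alerts into a single notification
  group_wait: 10s          # wait before the first notification for a new group, so alerts arriving within 10s are sent together
  group_interval: 10s      # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 1h      # wait before re-sending a notification for alerts that are still firing
  receiver: 'mail'         # default receiver
receivers:                 # receiver definitions: where notifications are sent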
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
- name: 'mail'
  email_configs:
  - to: '[email protected]'

# Check the configuration file
# ./amtool check-config alertmanager.yml 

# Manage the alertmanager service with systemd
# cat /usr/lib/systemd/system/alertmanager.service    
[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager  --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

# Start the alertmanager service and enable it at boot
# systemctl daemon-reload 
# systemctl start alertmanager 
# systemctl enable alertmanager 
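
Once the unit is started, it can help to confirm that Alertmanager is actually answering requests. The following is a minimal sketch, assuming Alertmanager listens on its default port 9093 on the local host and that curl is installed; the AM_URL variable, the RUN_PROBE switch, and the function names are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Sketch: probe Alertmanager's management endpoints after startup.
# AM_URL is an assumed base address; 9093 is Alertmanager's default port.
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

# Build the URL of a management endpoint ("healthy" or "ready"),
# dropping a trailing slash from the base URL if there is one.
am_endpoint() {
  printf '%s/-/%s\n' "${1%/}" "$2"
}

# Return 0 if the endpoint answers with a successful HTTP status within 5 seconds.
am_probe() {
  curl --silent --fail --max-time 5 --output /dev/null "$(am_endpoint "$1" "$2")"
}

# Run the probes only when explicitly requested: RUN_PROBE=1 sh probe.sh
if [ "${RUN_PROBE:-0}" = "1" ]; then
  for ep in healthy ready; do
    if am_probe "$AM_URL" "$ep"; then
      echo "$ep: ok"
    else
      echo "$ep: FAILED"
    fi
  done
fi
```

If either probe fails, `systemctl status alertmanager` and `journalctl -u alertmanager` are the usual places to look for the reason.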

2. Integrating Prometheus with Alertmanager

For details on writing alerting rules, see: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

# vim /usr/local/prometheus/prometheus.yml
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 127.0.0.1:9093    # address where Alertmanager can be reached

rule_files:
  - "rules/*.yml"          # alerting rule files that Prometheus loads

# mkdir /usr/local/prometheus/rules
# cat /usr/local/prometheus/rules/general.yml
groups:
- name: example
  rules:
  - alert: InstanceDown
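    expr: up == 0   # up is 1 when the target is healthy, 0 when the scrape fails
    for: 3m         # the condition must hold for 3 minutes before the alert fires
    labels:
      severity: page     # alert severity
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 3 minutes."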
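
The settings used in this article add up to a noticeable delay between a target going down and the email being sent. The sketch below is a rough upper-bound estimate only: it sums scrape_interval, evaluation_interval, the rule's `for` duration, and Alertmanager's group_wait, and ignores scrape timeouts, network latency, and mail delivery time. The helper names are hypothetical, and the parser handles only single-unit values such as 15s, 3m, or 1h.

```shell
#!/bin/sh
# Convert a simple duration (e.g. 15s, 3m, 1h) to seconds.
# Only single-unit s/m/h values are supported in this sketch.
to_seconds() {
  case "$1" in
    *s) echo "${1%s}" ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *)  echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Sum any number of durations and print the total in seconds.
estimate_delay() {
  _total=0
  for _d in "$@"; do
    _s=$(to_seconds "$_d") || return 1
    _total=$(( _total + _s ))
  done
  echo "$_total"
}

# scrape_interval + evaluation_interval + for + group_wait
estimate_delay 15s 15s 3m 10s   # prints 220
```

With these values the notification should go out within roughly 220 seconds of the failure; lowering `for` shortens that at the cost of more alerts from brief outages.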

# /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml 
Checking /usr/local/prometheus/prometheus.yml
  SUCCESS: 1 rule files found

Checking /usr/local/prometheus/rules/general.yml
  SUCCESS: 1 rules found

# systemctl restart prometheus.service 

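After the restart, the new rule appears in the Prometheus web UI:

Stop one of the monitored services to verify the alert:

The alert first enters the PENDING state and then moves to FIRING once the condition has held for the `for` duration.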
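
The same state transition can be observed from the command line through the Prometheus HTTP API. This is a minimal sketch, assuming Prometheus listens on its default port 9090 on the local host; PROM_URL and alert_states are illustrative names. The parsing relies on the compact JSON that the API returns, so it is a convenience for quick checks rather than a robust JSON parser.

```shell
#!/bin/sh
# Assumed Prometheus base address; 9090 is the default port.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Read the JSON body of /api/v1/alerts on stdin and print
# one line per alert state in the form "<state> <count>".
alert_states() {
  grep -o '"state":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (requires curl and a running Prometheus):
#   curl -s "$PROM_URL/api/v1/alerts" | alert_states
```

Running the usage line every 15 seconds or so after stopping a target should show the alert counted under `pending` first and under `firing` about three minutes later.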

 

Check the mailbox; the alert email has arrived:
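
If no email arrives, the mail route can be exercised without waiting for Prometheus by posting a test alert directly to Alertmanager. The sketch below assumes Alertmanager is reachable at 127.0.0.1:9093 and uses its v2 HTTP API; the alert name TestAlert and the build_alert helper are made up for illustration. The helper does no JSON escaping, so its arguments must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Assumed Alertmanager base address; 9093 is the default port.
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

# Print a one-element JSON array describing a test alert.
# Arguments: alertname, severity, summary (no quotes or backslashes).
build_alert() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"%s"}}]\n' "$1" "$2" "$3"
}

# Usage (requires curl and a running Alertmanager):
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -d "$(build_alert TestAlert page 'manual test alert')" \
#        "$AM_URL/api/v2/alerts"
```

Because the route in this article groups by alertname and waits 10 seconds (group_wait), the test notification should reach the `mail` receiver shortly after the request is accepted.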

 


Reposted from www.cnblogs.com/cyleon/p/12951681.html