Alertmanager and Prometheus rules

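Prometheus and Alertmanager work together to provide monitoring and alerting. Prometheus scrapes metrics, evaluates the alerting rules, and sends firing alerts to Alertmanager. Alertmanager groups the alerts, removes duplicates, and routes them as notifications to receivers such as email. The architecture is shown below: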
[Figure: Prometheus + Alertmanager architecture]
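Alerting with Prometheus and Alertmanager involves four configuration files. Prometheus needs two of them: the main configuration file prometheus.yml and the rule file rules.yml. Alertmanager also needs two: the main configuration file alertmanager.yml and the alert template file test.tmpl.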

The four configuration files relate to each other as follows:
[Figure: how the four configuration files relate to each other]
The configuration is as follows:

1. Prometheus configuration:

1) more prometheus.yml

global:
  scrape_interval:     15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: test
    static_configs:
      - targets: ['10.13.82.244:8000']
        labels:
          instance: proxy
  - job_name: node
    static_configs:
      - targets: ['10.13.82.244:9100','10.13.82.196:9100']
alerting:   # Alertmanager instances that Prometheus sends alerts to
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]

rule_files:  # alerting rule files
  - rules.yml

2) more rules.yml

groups:
  - name: test-rules
    rules:
      - alert: InstanceDown  # alert name
        expr: up == 0        # alert condition, a PromQL expression
        for: 2m              # how long the condition must keep holding before the alert is sent
        labels:              # extra labels attached to the alert
          team: node
        annotations:         # human-readable details about the alert
          summary: "{{$labels.instance}}: has been down"
          description: "{{$labels.instance}}: job {{$labels.job}} has been down"
          value: "{{$value}}"

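The annotations are Go text/template strings in which `$labels` holds the alert's labels and `$value` holds the value of `expr` for that series. A rough sketch of how such a string expands, using Go's standard text/template package: it binds the two variables by prepending `{{$labels := .Labels}}{{$value := .Value}}` to the text, and uses `missingkey=zero` so a missing label renders as an empty string. `expand` is a hypothetical helper; Prometheus's own templating also provides extra functions (such as `humanize`) that are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// annotationData is what the template's dot (.) refers to.
type annotationData struct {
	Labels map[string]string
	Value  float64
}

// expand renders an annotation string with $labels and $value defined by
// prepending variable declarations to the template text.
func expand(text string, labels map[string]string, value float64) (string, error) {
	defs := "{{$labels := .Labels}}{{$value := .Value}}"
	t, err := template.New("annotation").Option("missingkey=zero").Parse(defs + text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, annotationData{Labels: labels, Value: value}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Labels of the "test" job target, where prometheus.yml sets instance: proxy.
	labels := map[string]string{"instance": "proxy", "job": "test"}
	for _, text := range []string{
		"{{$labels.instance}}: has been down",
		"{{$labels.instance}}: job {{$labels.job}} has been down",
		"{{$value}}",
	} {
		out, err := expand(text, labels, 0)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

For the `test` job the description expands to `proxy: job test has been down`, and `value` becomes `0` because `up == 0` returns the value of `up`.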
2. Alertmanager configuration

1) more alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: smtp.qq.com:465          # SMTP server of the mail provider
  smtp_from: 'xxxxxxxxx@qq.com'            # sender address (placeholder)
  smtp_auth_username: 'xxxxxxxxx@qq.com'   # SMTP login, usually the sender address (placeholder)
  smtp_auth_password: 'xxxxxxxxx'          # mailbox password or authorization code
  smtp_require_tls: false
  #wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'  # WeChat Work API URL

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'
templates:  # alert template files
  - test.tmpl
receivers:
  #- name: 'web.hook'
  #  webhook_configs:
  #  - url: 'http://127.0.0.1:5001/'
  - name: 'email'
    email_configs:                             # email settings
    - to: 'receiver@example.com'               # recipient address (placeholder)
      html: '{{ template "test.html" . }}'     # render the body with the test.html template
      headers: { Subject: "Alert email" }      # email subject
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
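In the `route` section, `group_by: ['alertname']` puts all alerts with the same `alertname` into one group, and each group is sent as a single notification. `group_wait` (10s) is how long Alertmanager waits before sending the first notification for a new group, `group_interval` (10s) how long it waits before notifying about new alerts added to that group, and `repeat_interval` (1h) how long it waits before re-sending a notification that was already sent. The sketch below models only the grouping step; the `alert` type, the `groupKey` format and the `HighLoad` alert are hypothetical, and the timers are not simulated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a hypothetical, minimal stand-in for an Alertmanager alert.
type alert struct {
	Labels map[string]string
}

// groupKey builds a key from the values of the group_by labels.
func groupKey(a alert, groupBy []string) string {
	parts := make([]string, len(groupBy))
	for i, name := range groupBy {
		parts[i] = name + "=" + a.Labels[name]
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// groupAlerts splits alerts into groups; each group becomes one notification.
func groupAlerts(alerts []alert, groupBy []string) map[string][]alert {
	groups := make(map[string][]alert)
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	alerts := []alert{
		{Labels: map[string]string{"alertname": "InstanceDown", "instance": "10.13.82.244:9100"}},
		{Labels: map[string]string{"alertname": "InstanceDown", "instance": "10.13.82.196:9100"}},
		// Hypothetical second alert name, only to show a separate group.
		{Labels: map[string]string{"alertname": "HighLoad", "instance": "10.13.82.244:9100"}},
	}
	groups := groupAlerts(alerts, []string{"alertname"})
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d alert(s)\n", k, len(groups[k]))
	}
}
```

So when both node exporters go down, the two InstanceDown alerts arrive in one email rather than two.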

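The `inhibit_rules` entry mutes any alert labelled `severity: warning` while an alert labelled `severity: critical` is firing with the same values for `alertname`, `dev` and `instance` (a label missing from both alerts counts as equal). Note that the InstanceDown rule in rules.yml sets no `severity` label, so as written this inhibit rule does not affect it. A simplified sketch of the matching logic, with hypothetical names and exact-value matchers only (no regex matchers):

```go
package main

import "fmt"

// inhibitRule mirrors the source_match / target_match / equal fields.
type inhibitRule struct {
	SourceMatch map[string]string
	TargetMatch map[string]string
	Equal       []string
}

// matches reports whether labels contain every key/value pair in want.
func matches(labels, want map[string]string) bool {
	for k, v := range want {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// inhibited reports whether target is muted by any firing source alert.
func inhibited(target map[string]string, firing []map[string]string, r inhibitRule) bool {
	if !matches(target, r.TargetMatch) {
		return false
	}
	for _, src := range firing {
		if !matches(src, r.SourceMatch) {
			continue
		}
		equal := true
		for _, name := range r.Equal {
			// A missing label reads as "", so a label absent on both sides counts as equal.
			if src[name] != target[name] {
				equal = false
				break
			}
		}
		if equal {
			return true
		}
	}
	return false
}

func main() {
	rule := inhibitRule{
		SourceMatch: map[string]string{"severity": "critical"},
		TargetMatch: map[string]string{"severity": "warning"},
		Equal:       []string{"alertname", "dev", "instance"},
	}
	critical := map[string]string{"alertname": "InstanceDown", "instance": "proxy", "severity": "critical"}
	warning := map[string]string{"alertname": "InstanceDown", "instance": "proxy", "severity": "warning"}
	noSeverity := map[string]string{"alertname": "InstanceDown", "instance": "proxy"}
	firing := []map[string]string{critical}
	fmt.Println("warning inhibited:", inhibited(warning, firing, rule))
	fmt.Println("no-severity inhibited:", inhibited(noSeverity, firing, rule))
}
```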
2) more test.tmpl

{{ define "test.html" }}
<table border="1">
        <tr>
                <td>Alert</td>
                <td>Instance</td>
                <td>Value</td>
                <td>Start time</td>
                <td>Description</td>
        </tr>
        {{ range $i, $alert := .Alerts }}
                <tr>
                        <td>{{ index $alert.Labels "alertname" }}</td>
                        <td>{{ index $alert.Labels "instance" }}</td>
                        <td>{{ index $alert.Annotations "value" }}</td>
                        <td>{{ $alert.StartsAt }}</td>
                        <td>{{ index $alert.Annotations "description" }}</td>
                </tr>
        {{ end }}
</table>
{{ end }}
Reposted from blog.csdn.net/weixin_44723434/article/details/104498416