Prometheus: WeChat Work Alerting and Inhibit Rules (Part 2)

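Create a WeChat Work application

Register for WeChat Work: go to https://work.weixin.qq.com/ and register an enterprise. The registration details can be filled in arbitrarily; no verification is required.
Create an application.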


Create the alert rule

vim /usr/local/prometheus-2.1/rule2.yml
groups:
- name: cluster
  rules:
  - alert: HIGHCPU
    expr: (1-irate(node_cpu_seconds_total{mode="idle",job="export_test2"}[1m]))*100 > 10
    for: 5s
    labels:
      for: 'highcpu'
    annotations:
      description: CPU MORE THAN 10%
      summary: 'cpu more than 10%'
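
To make the rule's expression easier to read, here is a minimal Python sketch of what it computes for a single series. It assumes irate() is simply the per-second increase between the last two samples of the idle counter and ignores counter resets, which Prometheus itself handles; the sample values and the helper names are hypothetical.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, value) counter samples.

    Simplified: counter resets are not handled here.
    """
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_busy_percent(idle_samples):
    """Mirror of (1 - irate(idle counter)) * 100 for one CPU core."""
    return (1 - irate(idle_samples)) * 100


# Hypothetical idle-seconds samples for one core, scraped 15 s apart.
samples = [(0.0, 1000.0), (15.0, 1012.0), (30.0, 1024.0)]

busy = cpu_busy_percent(samples)  # the core was idle 12 s out of the last 15 s
fires = busy > 10                 # same threshold as the HIGHCPU rule
print(round(busy, 2), fires)
```

Because node_cpu_seconds_total has one series per core, the expression is evaluated per core, which is why the notifications later in this post carry a cpu label.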

Add the rule file above to the Prometheus configuration

vim /usr/local/prometheus-2.1/prometheus.yml
rule_files:
  - "/usr/local/prometheus-2.1/rule.yml"
  - "/usr/local/prometheus-2.1/rule2.yml"   # add this rule file

Create the alerting policy

vim /usr/local/alertmanager-0.15.2/alertmanager.yml
global:
  wechat_api_corp_id: 'ww0cf7ad485760f5b5'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'K4jH8eZp5elr5BStjpHUk6Jw__KMqH2YuL4_4Xj-lvQ'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'weixin'
  routes:
  - receiver: 'weixin'
    match:
      severity: 'critical'
  - receiver: 'weixin'
    match:
      for: 'highcpu'
receivers:
- name: 'weixin'
  wechat_configs:
  - send_resolved: true # also send a notification when the alert is resolved
    to_party: '1'
    agent_id: '1000003'
    corp_id: 'ww0cf7ad485760f5b5'
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
    api_secret: 'K4jH8eZp5elr5BStjpHUk6Jw__KMqH2YuL4_4Xj-lvQ'
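
The route section above can be read as a small decision tree. The following Python sketch illustrates the selection logic under simplifying assumptions: only exact match entries are considered (no match_re), the continue option is ignored, and the first matching child route wins. The function names are hypothetical, and this is not Alertmanager's actual code.

```python
def matches(match, labels):
    """True if every key/value pair in `match` is present in the alert labels."""
    return all(labels.get(key) == value for key, value in match.items())


def pick_receiver(route, labels):
    """Return (receiver, matcher of the route that was selected)."""
    for child in route.get("routes", []):
        if matches(child.get("match", {}), labels):
            return pick_receiver(child, labels)
    # No child matched: the alert stays on the current route.
    return route["receiver"], route.get("match")


# The routing tree from the configuration above.
route = {
    "receiver": "weixin",
    "routes": [
        {"receiver": "weixin", "match": {"severity": "critical"}},
        {"receiver": "weixin", "match": {"for": "highcpu"}},
    ],
}

highcpu = {"alertname": "HIGHCPU", "for": "highcpu", "instance": "node2"}
print(pick_receiver(route, highcpu))
print(pick_receiver(route, {"alertname": "other"}))
```

In this configuration every route points at the same weixin receiver, so an alert that matches no child route is still delivered through the top-level default.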

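corp_id: in WeChat Work, go to My Enterprise --> Enterprise Info --> Enterprise ID
agent_id and api_secret: open the application you created (named Prometheus here); its page shows the AgentId and Secret
to_party: the ID of the department that should receive the messages
api_url: the base URL of the WeChat Work API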
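
As a rough illustration of where these fields end up, the sketch below builds the two WeChat Work API requests involved: fetching an access token with the enterprise ID and secret, then posting a text message to a department. It only constructs URLs and the JSON payload and sends nothing. The helper names and the placeholder values CORP_ID and SECRET are hypothetical, and this is not Alertmanager's own implementation.

```python
import json
from urllib.parse import urlencode

API_URL = "https://qyapi.weixin.qq.com/cgi-bin/"


def token_url(corp_id, api_secret, api_url=API_URL):
    """URL for GET gettoken; corp_id and api_secret map to corpid/corpsecret."""
    return api_url + "gettoken?" + urlencode(
        {"corpid": corp_id, "corpsecret": api_secret}
    )


def send_url(access_token, api_url=API_URL):
    """URL for POST message/send, using the token returned by gettoken."""
    return api_url + "message/send?" + urlencode({"access_token": access_token})


def text_message(agent_id, to_party, content):
    """JSON body for a plain text message sent to one department."""
    return {
        "toparty": to_party,       # department ID (to_party)
        "agentid": int(agent_id),  # application AgentId (agent_id)
        "msgtype": "text",
        "text": {"content": content},
    }


# Placeholder credentials; substitute the values from your own enterprise.
print(token_url("CORP_ID", "SECRET"))
print(json.dumps(text_message("1000003", "1", "[FIRING:1] HIGHCPU")))
```

Building these requests by hand and sending them with an HTTP client is a quick way to check that the enterprise ID, secret, AgentId, and department ID are valid before involving Alertmanager.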

Restart the prometheus and alertmanager services

Test

Drive up CPU usage on the monitored host

cat /dev/urandom | md5sum

WeChat Work receives the alert notification

[FIRING:1] HIGHCPU (0 highcpu node2 export_test2 idle)
CPU MORE THAN 10% cpu more than 10%
Alerts Firing:
Labels:
 - alertname = HIGHCPU
 - cpu = 0
 - for = highcpu
 - instance = node2
 - job = export_test2
 - mode = idle
Annotations:
 - description = CPU MORE THAN 10%
 - summary = cpu more than 10%
Source: http://centos1.com:9090/graph?g0.expr=%281+-+irate%28node_cpu_seconds_total%7Bjob%3D%22export_test2%22%2Cmode%3D%22idle%22%7D%5B1m%5D%29%29+%2A+100+%3E+10&g0.tab=1

AlertmanagerUrl:
http://centos1.com:9093/#/alerts?receiver=weixin

After CPU usage recovers, a resolved notification is received

[RESOLVED] HIGHCPU (0 highcpu node2 export_test2 idle)
CPU MORE THAN 10% cpu more than 10%

Alerts Resolved:
Labels:
 - alertname = HIGHCPU
 - cpu = 0
 - for = highcpu
 - instance = node2
 - job = export_test2
 - mode = idle
Annotations:
 - description = CPU MORE THAN 10%
 - summary = cpu more than 10%
Source: http://centos1.com:9090/graph?g0.expr=%281+-+irate%28node_cpu_seconds_total%7Bjob%3D%22export_test2%22%2Cmode%3D%22idle%22%7D%5B1m%5D%29%29+%2A+100+%3E+10&g0.tab=1

AlertmanagerUrl:
http://centos1.com:9093/#/alerts?receiver=weixin

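Trying out inhibit rules

Note: the two alerts used for inhibition in this post have no real logical relationship; they serve purely to test the inhibit feature

Add a new alert rule

vim /usr/local/prometheus-2.1/rule2.yml  # append the following to the end of the file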
- name: test
  rules:
  - alert: go_goroutines
    expr: go_goroutines{instance="node2",job="export_test2"} > 5
    for: 10s
    labels:
      severity: 'warning'
    annotations:
      description: go_goroutines > 5

Add the notification route and the inhibit configuration for the rule above

vim /usr/local/alertmanager-0.15.2/alertmanager.yml
global:
  wechat_api_corp_id: 'ww0cf7ad485760f5b5'
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'K4jH8eZp5elr5BStjpHUk6Jw__KMqH2YuL4_4Xj-lvQ'

#templates:
#  - '/alertmanager/template/wechat.tmpl'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'weixin'
  routes:
  - receiver: 'weixin'
    match:
      severity: 'critical'
  - receiver: 'weixin'
    match:
      for: 'highcpu'
  - receiver: 'weixin'              # newly added route (three lines)
    match:
      severity: 'warning'
receivers:
- name: 'weixin'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '1000003'
    corp_id: 'ww0cf7ad485760f5b5'
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
    api_secret: 'K4jH8eZp5elr5BStjpHUk6Jw__KMqH2YuL4_4Xj-lvQ'

inhibit_rules:                   # newly added inhibit rule
  - source_match:
      for: 'highcpu'
    target_match:
      severity: 'warning'
    equal: ['instance','job']

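Resulting behavior: when both the CPU alert and the go_goroutines alert meet their conditions, the CPU notification is sent and the go_goroutines notification is inhibited.
How it works: an inhibit rule mutes alerts that match target_match (or target_match_re) for as long as an alert matching source_match (or source_match_re) is firing, provided the source and target alerts have identical values for every label listed in equal. Here the source is any alert labeled for: 'highcpu', the target is any alert labeled severity: 'warning', and the two must share the same instance and job.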
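
The inhibit decision described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: only exact source_match and target_match entries are handled (no regex variants), and a label missing from an alert is treated as an empty value when comparing the equal labels. The function names are hypothetical.

```python
def matches(match, labels):
    """True if every key/value pair in `match` is present in the alert labels."""
    return all(labels.get(key) == value for key, value in match.items())


def is_inhibited(target, firing_alerts, rule):
    """True if `target` should be muted by any currently firing source alert."""
    if not matches(rule["target_match"], target):
        return False
    for source in firing_alerts:
        same_equal_labels = all(
            source.get(name, "") == target.get(name, "") for name in rule["equal"]
        )
        if matches(rule["source_match"], source) and same_equal_labels:
            return True
    return False


# The inhibit rule from the configuration above.
rule = {
    "source_match": {"for": "highcpu"},
    "target_match": {"severity": "warning"},
    "equal": ["instance", "job"],
}

cpu_alert = {"alertname": "HIGHCPU", "for": "highcpu",
             "instance": "node2", "job": "export_test2"}
go_alert = {"alertname": "go_goroutines", "severity": "warning",
            "instance": "node2", "job": "export_test2"}

print(is_inhibited(go_alert, [cpu_alert], rule))  # CPU alert is firing
print(is_inhibited(go_alert, [], rule))           # CPU alert is not firing
```

Changing the instance or job of either alert makes the equal check fail, so the go_goroutines notification would be sent again.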

Reposted from www.cnblogs.com/huandada/p/10371169.html