One, Alertmanager overview
Prometheus separates responsibilities: the Prometheus server collects and stores metrics, while alerting is handled by Alertmanager, a separate component of the monitoring environment. Alert rules are defined on the Prometheus server; when a rule fires, the resulting alert is sent to Alertmanager. Alertmanager then decides how to handle each alert, for example by deduplicating repeated alerts, and which channel to use for the notification: instant messaging, e-mail, or tools such as DingTalk and WeChat.
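To make the idea of a rule defined on the Prometheus server concrete, here is a minimal sketch of a rule file. The file name, group name, alert name, threshold and annotation text are all illustrative assumptions, not taken from the original setup; only the overall structure (groups, rules, alert, expr, for, labels, annotations) is the standard Prometheus rule format.

```yaml
# rules/node_rules.yml (hypothetical file name)
groups:
  - name: node_rules              # hypothetical group name
    rules:
      - alert: InstanceDown       # becomes the value of the alertname label
        expr: up == 0             # true for any scrape target that is currently down
        for: 1m                   # the condition must hold this long before the alert fires
        labels:
          severity: critical      # extra label that Alertmanager can route or inhibit on
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 1 minute."
```

A file like this can be checked with `promtool check rules <file>` before Prometheus is reloaded. While the expression stays true for less than the `for` duration the alert is only pending; it is sent to Alertmanager once it starts firing.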
Two, Alertmanager deployment
Alertmanager listens on port 9093 by default; the cluster port is 9094.
# Download
[root@prometheus ~]# wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0-rc.0/alertmanager-0.20.0-rc.0.linux-amd64.tar.gz
# Extract
[root@prometheus ~]# tar -zxf alertmanager-0.20.0-rc.0.linux-amd64.tar.gz -C /usr/local/
[root@prometheus ~]# mv /usr/local/alertmanager-0.20.0-rc.0.linux-amd64 /usr/local/alertmanager-0.20.0
[root@prometheus ~]# ln -sv /usr/local/alertmanager-0.20.0 /usr/local/alertmanager
# Run
[root@prometheus ~]# ln -sv /usr/local/alertmanager/alertmanager /usr/local/bin/
[root@prometheus ~]# alertmanager &
[root@prometheus ~]# netstat -tulnp |grep alert
tcp6 0 0 :::9093 :::* LISTEN 41194/alertmanager
tcp6 0 0 :::9094 :::* LISTEN 41194/alertmanager
udp6 0 0 :::9094 :::* 41194/alertmanager
Visit http: //
Three, Alertmanager configuration
Alertmanager is configured in two places. One is on the Prometheus server, whose configuration lists the Alertmanager nodes, specifies the path pattern of the alert rule files, and adds a scrape job that monitors Alertmanager itself. The other is Alertmanager's own configuration, in alertmanager.yml.
[root@prometheus alertmanager]# cat /usr/local/prometheus/prometheus.yml
...
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.0.143:9093        # list of Alertmanager nodes
rule_files:
  - "rules/*_rules.yml"           # rule files to load
  # - "rules/*_alert.yml"
scrape_configs:
  ......
  - job_name: 'alertmanager'      # scrape job that monitors Alertmanager itself
    static_configs:
    - targets: ['192.168.0.143:9093']
Once this is added, Alertmanager appears in the target list in the Prometheus server's web UI.
With prometheus.yml configured, look at the default sample alertmanager.yml:
[root@prometheus alertmanager]# cat alertmanager.yml
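global:
  resolve_timeout: 5m             # time after which an alert is treated as resolved if it has not been updated; default 5m
route:
  group_by: ['alertname']         # labels used to group alerts
  group_wait: 10s                 # how long to wait before sending the first notification for a new group
  group_interval: 10s             # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h             # how often to resend a notification that is still firing; for email, do not set this too low, or the SMTP server may reject the mail for being too frequent
  receiver: 'web.hook'            # name of the receiver to notify; must match a name under receivers below
receivers:
- name: 'web.hook'                # receiver name
  webhook_configs:                # webhook configuration
  - url: 'http://192.168.0.143:5001/'
# An inhibition rule mutes alerts matching one set of matchers (target) while an alert
# matching another set of matchers (source) exists. Both alerts must have the same
# values for the labels listed in equal.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']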
global: global configuration, including the timeout after which an alert is considered resolved, SMTP settings, and the API addresses of the various notification channels.
route: sets the alert routing policy. It is a tree structure, matched depth-first from left to right.
receivers: configures the receivers of alert notifications, such as email, wechat, slack, webhook and other notification methods.
inhibit_rules: inhibition rules. While an alert matching one set of matchers (the source) exists, the rule suppresses alerts matching another set of matchers (the target).
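The four sections above can be combined into a slightly larger configuration. The following is only a sketch under stated assumptions: the SMTP host, the sender and recipient addresses, and the 'ops-email' receiver are hypothetical, and SMTP authentication is left out. It uses the `match` syntax of the 0.20 release installed earlier; newer Alertmanager releases prefer `matchers`.

```yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'     # hypothetical SMTP server
  smtp_from: 'alertmanager@example.com'      # hypothetical sender address
route:
  receiver: 'web.hook'            # root of the tree: default receiver when no child route matches
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  routes:                         # child routes, tried in order
    - match:
        severity: 'critical'      # only alerts carrying this label value take this branch
      receiver: 'ops-email'       # hypothetical receiver for critical alerts
      # without "continue: true", matching stops at the first child route that matches
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://192.168.0.143:5001/'
  - name: 'ops-email'             # hypothetical receiver
    email_configs:
      - to: 'ops@example.com'     # hypothetical recipient
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```

With this sketch, an alert labelled severity=critical is mailed to 'ops-email', while every other alert falls back to the 'web.hook' receiver. If a critical and a warning alert share the same alertname, dev and instance values, the warning notification is suppressed for as long as the critical alert is firing. The file can be validated with `amtool check-config alertmanager.yml` before Alertmanager is restarted.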