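prometheus+alertmanager+webhook+DingTalk+email notifications
Preface:
- Alertmanager cannot send messages to DingTalk directly, so the prometheus-webhook-dingtalk plugin sits in between: Alertmanager posts its webhook notifications to the plugin, which converts them into DingTalk robot messages.
- This article covers two notification methods, email and DingTalk. Both ConfigMaps below are named alertmanager-config, so apply only one of them at a time.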
prometheus-webhook-dingtalk
dp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-webhook-dingtalk
  namespace: infra
  labels:
    app: prometheus-webhook-dingtalk
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-webhook-dingtalk
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-webhook-dingtalk
    spec:
      containers:
      - args:
        # No shell parses these args, so inner single quotes would be passed to the binary literally; quote the whole item at the YAML level instead
        - "--ding.profile=ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=xxxx"
        name: prometheus-webhook-dingtalk
        image: harbor.yutang.cn/infra/prometheus-webhook-dingtalk:v1.4.0
        ports:
        - containerPort: 8060
      imagePullSecrets:
      - name: harbor
svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: infra
  name: prometheus-webhook-dingtalk
  labels:
    app: prometheus-webhook-dingtalk
spec:
  selector:
    app: prometheus-webhook-dingtalk
  ports:
  - name: dingtalk-port
    port: 8060
    targetPort: 8060
    protocol: TCP
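As far as I know, the 1.x releases of prometheus-webhook-dingtalk read their DingTalk targets from a config file (passed with --config.file) rather than from the --ding.profile flag of the 0.x releases. If the v1.4.0 image rejects the flag used in dp.yaml above, a ConfigMap along these lines may be what it expects. This is only a sketch: the ConfigMap name and the mount path mentioned in the comment are hypothetical, and the token stays a placeholder.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Hypothetical name, not part of the original manifests
  name: prometheus-webhook-dingtalk-config
  namespace: infra
data:
  config.yml: |-
    targets:
      # The target name becomes part of the URL path: /dingtalk/ops_dingding/send
      ops_dingding:
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxx
# Assumed usage: mount this ConfigMap into the container (for example at
# /etc/prometheus-webhook-dingtalk) and replace the --ding.profile arg with
#   --config.file=/etc/prometheus-webhook-dingtalk/config.yml
```

The target name is kept as ops_dingding so that the webhook URL in the Alertmanager config stays unchanged.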
alertmanager
cm-dingding.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: infra
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
    route:
      receiver: webhook
      group_wait: 3s
      group_interval: 5m
      # 5s is shorter than group_interval, so firing alerts are re-sent on every group flush; use a longer value outside of testing
      repeat_interval: 5s
      group_by: ['alertname', 'cluster']
      routes:
      - receiver: webhook
        group_wait: 10s
        match:
          team: node
    receivers:
    - name: webhook
      webhook_configs:
      - url: "http://prometheus-webhook-dingtalk:8060/dingtalk/ops_dingding/send"
        send_resolved: true
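The sub-route above only matches alerts that carry the label team: node, and that label has to be set on the Prometheus side. A minimal sketch of an alerting rule that would take this route, assuming a scrape job called node-exporter; the group name, alert name, job name, and thresholds are all hypothetical and not taken from the original setup:

```yaml
groups:
- name: node-alerts            # hypothetical rule group
  rules:
  - alert: NodeExporterDown    # hypothetical alert name
    # `up` is 0 when Prometheus fails to scrape a target
    expr: up{job="node-exporter"} == 0
    for: 1m
    labels:
      team: node               # matches `match: team: node` in the route above
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} has been unreachable for 1 minute"
```

Alerts without the team label fall through to the root route, which in this config also points at the webhook receiver.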
cm-mail.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: infra
data:
  config.yml: |-
    global:
      # How long after an alert stops being reported before it is declared resolved
      resolve_timeout: 5m
      # Email delivery settings
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '[email protected]'
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'xxxxx'
      smtp_require_tls: false
    # Root route that every incoming alert enters; it defines how alerts are dispatched
    route:
      # Labels by which incoming alerts are regrouped. For example, many alerts carrying cluster=A and alertname=LatencyHigh are aggregated into a single group
      group_by: ['alertname', 'cluster']
      # After a new alert group is created, wait at least group_wait before sending the first notification, so that further alerts for the same group can arrive and be sent together
      group_wait: 30s
      # After the first notification has been sent, wait group_interval before notifying about new alerts added to that group
      group_interval: 5m
      # After a notification has been sent successfully, wait repeat_interval before sending it again
      repeat_interval: 5m
      # Default receiver: an alert that is not matched by any route is sent here
      receiver: default
    receivers:
    - name: 'default'
      email_configs:
      - to: '[email protected]'
        send_resolved: true
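Because cm-dingding.yaml and cm-mail.yaml define the same ConfigMap, only one of them is active at a time. If both channels are wanted together, the two configs can be merged into one. The following is a sketch rather than a tested config: it reuses the values from the two files above, sends everything to email by default, and sends team: node alerts to DingTalk. The email addresses are placeholders.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: infra
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'sender@example.com'           # placeholder address
      smtp_auth_username: 'sender@example.com'  # placeholder address
      smtp_auth_password: 'xxxxx'
      smtp_require_tls: false
    route:
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 5m
      # Anything not matched by a sub-route goes to email
      receiver: default
      routes:
      # Alerts labelled team=node go to DingTalk instead
      - receiver: webhook
        match:
          team: node
    receivers:
    - name: default
      email_configs:
      - to: 'ops@example.com'                   # placeholder address
        send_resolved: true
    - name: webhook
      webhook_configs:
      - url: "http://prometheus-webhook-dingtalk:8060/dingtalk/ops_dingding/send"
        send_resolved: true
```

To deliver team: node alerts to both channels, one option is to list email_configs and webhook_configs under a single receiver.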
dp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: infra
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: harbor.yutang.cn/infra/alertmanager:v0.19.0
        args:
        - "--config.file=/etc/alertmanager/config.yml"
        - "--storage.path=/alertmanager"
        - "--cluster.advertise-address=0.0.0.0:9093"
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: alertmanager-cm
          mountPath: /etc/alertmanager
      volumes:
      - name: alertmanager-cm
        configMap:
          name: alertmanager-config
      imagePullSecrets:
      - name: harbor
svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: infra
spec:
  selector:
    app: alertmanager
  ports:
  - port: 80
    targetPort: 9093
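Prometheus still has to be told where this Alertmanager lives. A minimal sketch of the relevant part of prometheus.yml, assuming Prometheus runs inside the same cluster and can resolve the Service above by its in-cluster DNS name; the rule file path is hypothetical:

```yaml
alerting:
  alertmanagers:
  - static_configs:
    # <service>.<namespace>:<service port>, taken from svc.yaml above
    - targets: ["alertmanager.infra:80"]
rule_files:
# Hypothetical location of the alerting rules
- /etc/prometheus/rules/*.yml
```

The Service exposes port 80 and forwards to container port 9093, so the target uses 80 here.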
ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alertmanager-web
  namespace: infra
spec:
  rules:
  - host: alertmanager.dayutang.cn
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager
          servicePort: 80
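The extensions/v1beta1 Ingress API was removed in Kubernetes 1.22. On newer clusters the same rule would need the networking.k8s.io/v1 form, roughly as sketched below; pathType: Prefix is an assumption about the intended matching, and depending on the cluster an ingressClassName may also be required.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager-web
  namespace: infra
spec:
  rules:
  - host: alertmanager.dayutang.cn
    http:
      paths:
      - path: /
        pathType: Prefix        # required in v1; Prefix is assumed here
        backend:
          service:
            name: alertmanager
            port:
              number: 80
```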