For work, our company uses the kube-prometheus package rather than installing Prometheus from binaries. This saves a lot of trouble, but I still ran into many pitfalls during testing, so I am recording them here.
1 Writing Prometheus rules
Most of what there is to know about the rules themselves can be found online, so here I only record how to write rules when using kube-prometheus.
Edit prometheus-rules.yaml (this file is in the manifests directory of the cloned kube-prometheus repository) and append rules at the bottom. For now I only detect the running status of pods and the status of nodes; other rules are written the same way. Add the following content:
    # Detect pod status
    - alert: pod-status
      annotations:
        message: pod-{{ $labels.pod }} failure
      expr: |
        kube_pod_container_status_running != 1
      for: 1m
      labels:
        severity: warning
    # Detect node status
    - alert: node-status
      annotations:
        message: node-{{ $labels.hostname }} failure
      expr: |
        kube_node_status_condition{status="unknown",condition="Ready"} == 1
      for: 1m
      labels:
        severity: warning
Finally, save and exit.
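Before relying on a new rule, it can help to check which series its expression currently matches. The following is a minimal sketch that runs an instant query against the Prometheus HTTP API (`/api/v1/query`). The address `http://prometheus-k8s.monitoring.svc:9090` and the helper names `build_query_url` and `preview_alert` are assumptions for illustration, not part of the original setup; adjust the address to wherever your Prometheus is reachable.

```python
import json
import urllib.parse
import urllib.request

# Assumed in-cluster address of the kube-prometheus Prometheus service; adjust as needed.
PROMETHEUS_URL = "http://prometheus-k8s.monitoring.svc:9090"


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def preview_alert(expr, base_url=PROMETHEUS_URL):
    """Return the label sets of the series that currently match the expression."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=10) as resp:
        body = json.load(resp)
    # A successful instant query returns {"data": {"result": [{"metric": {...}, ...}]}}
    return [item["metric"] for item in body["data"]["result"]]
```

Calling `preview_alert("kube_pod_container_status_running != 1")` from a machine that can reach Prometheus lists the containers that would put the pod-status alert into the pending state; the alert only fires once the condition has held for the `for: 1m` duration.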
Then edit the alertmanager-secret.yaml file, which mainly configures how alerts are sent: by email or by DingTalk. I use DingTalk alerts here. Email is configured as well but not used, since hardly anyone reads email these days, so alerts go straight to DingTalk. Replace the original content of the file with the following:
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 1m # Time after which an alert that is no longer updated is marked resolved
      smtp_smarthost: 'smtp.9icaishi.net:25' # SMTP server address
      smtp_from: '[email protected]' # Sender address
      smtp_auth_username: '[email protected]' # SMTP user name
      smtp_auth_password: 'Zabbix9icaishi2015' # Authorization password
      smtp_require_tls: false # Disable TLS (it is enabled by default)
    receivers:
    - name: 'webhook'
      webhook_configs:
      # DingTalk alert endpoint. This service is deployed separately below, because
      # DingTalk cannot parse the default alert content sent by Alertmanager and it
      # has to be converted first.
      - url: 'http://webhook-dingtalk/dingtalk/send/'
        send_resolved: true
    route:
      group_interval: 1m # How long to wait before notifying about new alerts in an existing group
      group_wait: 10s # How long to wait before sending the first notification for a new group
      receiver: webhook
      repeat_interval: 1m # Interval for resending alerts that are still firing
type: Opaque
Finally, save and exit.
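With the webhook receiver configured this way, Alertmanager POSTs a JSON document to the URL for every notification, and with send_resolved: true it does so again when the alert recovers. The sketch below shows a trimmed example of that body and how the fields used later can be read from it. The label and annotation values are made up, and `summarize_alerts` is a hypothetical helper, not part of the original script.

```python
import json

# Trimmed example of the JSON body Alertmanager sends to a webhook receiver.
# All values are made up for illustration.
SAMPLE_PAYLOAD = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "webhook",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "pod-status", "severity": "warning", "pod": "demo-pod"},
            "annotations": {"message": "pod-demo-pod failure"},
            "startsAt": "2020-01-01T08:00:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
        }
    ],
})


def summarize_alerts(raw_body):
    """Return one (status, alertname, severity, message) tuple per alert."""
    payload = json.loads(raw_body)
    summary = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        summary.append((
            alert.get("status", ""),
            labels.get("alertname", ""),
            labels.get("severity", ""),
            annotations.get("message", ""),
        ))
    return summary


print(summarize_alerts(SAMPLE_PAYLOAD))
```

Note that one notification can carry several alerts in the `alerts` list, because Alertmanager groups alerts before sending them.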
Now deploy a pod for DingTalk alerts
Thanks to the author of http://www.mamicode.com/info-detail-2845201.html. On the basis of that article I customized the alert script so that the alert content fits our needs.
[Figure: original alert message]
[Figure: modified alert message]
The modified script, app.py, is as follows:
#!/usr/bin/env python
import time, io, sys, arrow, os
# Force UTF-8 output so that Chinese text in the logs is not garbled
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')
from flask import Flask, Response
from flask import request
import requests
import logging
import json
import locale
#locale.setlocale(locale.LC_ALL,"en_US.UTF-8")

app = Flask(__name__)

# Log to the console so that the output shows up in the pod logs
console = logging.StreamHandler()
fmt = '%(asctime)s - %(filename)s:%(lineno)s - %(name)s - %(message)s'
formatter = logging.Formatter(fmt)
console.setFormatter(formatter)
log = logging.getLogger("flask_webhook_dingtalk")
log.addHandler(console)
log.setLevel(logging.DEBUG)

EXCLUDE_LIST = ['prometheus', 'endpoint']

@app.route('/')
def index():
    return 'Webhook Dingtalk by Billy https://blog.51cto.com/billy98'


@app.route('/dingtalk/send/', methods=['POST'])
def hander_session():
    # The DingTalk robot webhook URL is passed as the first command-line argument
    profile_url = sys.argv[1]
    post_data = request.get_data()
    post_data = json.loads(post_data.decode("utf-8"))['alerts']
    # Only the first alert of the notification is forwarded
    post_data = post_data[0]
    messa_list = []
    if post_data['status'].upper() == "FIRING":
        messa_list.append('### 报警名称: Prometheus-alert')  # alert name
        messa_list.append('**报警状态: 异常**')  # alert status: abnormal
        # alert time, converted to Asia/Shanghai
        messa_list.append('**报警时间: %s**' % arrow.get(post_data['startsAt']).to('Asia/Shanghai').format('YYYY-MM-DD HH:mm:ss ZZ'))
        messa_list.append('**报警级别: %s**' % post_data['labels']['severity'])  # severity
        messa_list.append('**报警类型: %s**' % post_data['labels']['alertname'])  # alert type
        messa_list.append('**报警详情: %s**' % post_data['annotations']['message'])  # details
        # Literal "\n" sequences: they become newline escapes in the hand-built JSON below
        messa = (' \\n\\n > '.join(messa_list))
    else:
        messa_list.append('### 报警名称: Prometheus-alert')  # alert name
        messa_list.append('**报警状态: 恢复**')  # alert status: recovered
        messa_list.append('**报警时间: %s**' % arrow.get(post_data['startsAt']).to('Asia/Shanghai').format('YYYY-MM-DD HH:mm:ss ZZ'))
        # recovery time
        messa_list.append('**恢复时间: %s**' % arrow.get(post_data['endsAt']).to('Asia/Shanghai').format('YYYY-MM-DD HH:mm:ss ZZ'))
        messa_list.append('**报警级别: %s**' % post_data['labels']['severity'])
        messa_list.append('**报警类型: %s**' % post_data['labels']['alertname'])
        messa_list.append('**报警详情: %s**' % post_data['annotations']['message'])
        messa = (' \\n\\n > '.join(messa_list))
    status = alert_data(messa, post_data['labels']['alertname'], profile_url)
    log.info(status)
    return status


def alert_data(data, title, profile_url):
    # Send a DingTalk markdown message and return the response body
    headers = {'Content-Type': 'application/json'}
    send_data = '{"msgtype": "markdown","markdown": {"title": \"%s\" ,"text": \"%s\" }}' % (title, data)  # type: str
    send_data = send_data.encode('utf-8')
    reps = requests.post(url=profile_url, data=send_data, headers=headers)
    return reps.text


if __name__ == '__main__':
    app.debug = False
    app.run(host='0.0.0.0', port=8080)
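The script builds the DingTalk request body with string formatting, which produces invalid JSON as soon as an alert message contains a double quote or a backslash. One possible alternative, shown here only as a sketch, is to let `json.dumps` do the escaping. `build_markdown_payload` is a hypothetical helper and not part of the original script; with this approach the lines are joined with real newlines instead of literal `\\n` sequences.

```python
import json


def build_markdown_payload(title, lines):
    """Serialize a DingTalk markdown message; json.dumps handles all escaping."""
    # Real newlines here; json.dumps turns them into \n escapes in the output
    text = " \n\n > ".join(lines)
    message = {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    # ensure_ascii=False keeps Chinese characters readable in the UTF-8 bytes
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


body = build_markdown_payload(
    "pod-status",
    ["### Prometheus-alert", 'detail: pod "demo" failure'],
)
print(body.decode("utf-8"))
```

The returned bytes can be passed as `data=` to `requests.post` in the same way as `send_data` in `alert_data`.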
Finally, rebuild the image. The content of the Dockerfile is as follows:
FROM centos:7 as build
MAINTAINER billy98 [email protected]
RUN mkdir /root/.pip
ADD pip.conf /root/.pip/pip.conf
RUN curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo && yum install -y python36 python36-pip && pip3.6 install flask requests werkzeug arrow
ADD app.py /usr/local/alert-dingtalk.py

FROM gcr.io/distroless/python3
COPY --from=build /usr/local/alert-dingtalk.py /usr/local/alert-dingtalk.py
COPY --from=build /usr/local/lib64/python3.6/site-packages /usr/local/lib64/python3.6/site-packages
COPY --from=build /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
ENV PYTHONPATH=/usr/local/lib/python3.6/site-packages:/usr/local/lib64/python3.6/site-packages
EXPOSE 8080
ENTRYPOINT ["python","/usr/local/alert-dingtalk.py"]
Finally, update your dingding.yaml (or whatever your manifest file is called) to use the new image and deploy it to k8s.
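Once the pod is running, one way to check the whole chain is to post a hand-made alert to the service and see whether it arrives in DingTalk. This is a minimal sketch that assumes it runs inside the cluster, where the service URL from the Alertmanager configuration resolves. The alert values are made up, and `build_test_request` and `send_test_alert` are hypothetical helper names.

```python
import json
import urllib.request

# Service URL from the Alertmanager configuration; it only resolves inside the cluster.
WEBHOOK_URL = "http://webhook-dingtalk/dingtalk/send/"


def build_test_request(url=WEBHOOK_URL):
    """Build a POST request carrying one made-up firing alert."""
    payload = {
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "labels": {"alertname": "smoke-test", "severity": "warning"},
            "annotations": {"message": "test alert, please ignore"},
            "startsAt": "2020-01-01T08:00:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
        }],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert(url=WEBHOOK_URL):
    """POST the test alert and return the text the webhook service answers with."""
    with urllib.request.urlopen(build_test_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

The payload contains exactly the fields that app.py reads: `status`, `labels.severity`, `labels.alertname`, `annotations.message`, and the two timestamps. The text returned by `send_test_alert()` is the response of the DingTalk robot, which the script passes through unchanged.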
For more on k8s and automated operations and maintenance, see www.wangshuying.cn.