Prometheus study notes 5: DingTalk alerts with signing

DingTalk alert example
The first thing you need is the unique webhook of a custom robot, which can be created through the DingTalk desktop client. Once the robot is created, record its webhook. The steps are as follows:
In the group settings, click to add a robot and select "Custom robot".
Set a custom name for the robot.
Complete the required security settings (choose at least one), tick "I have read and agree to the custom robot service and exemption clause", and click "Finish". There are currently 3 ways to set security; see the Security Settings section below for details.
After completing the security settings, copy the robot's Webhook address, which can be used to send messages to this group, in the following format:

https://oapi.dingtalk.com/robot/send?access_token=XXXXXX

Note: Keep this Webhook address safe and do not publish it on external websites, since anyone who has it can send messages to the group.

Security Settings
There are currently 3 ways of security settings:

Method 1: Custom keywords
You can set up to 10 keywords, and the message can only be sent successfully if you include at least one of them.

For example, if you add the custom keyword "monitoring alarm", every message sent through this robot must contain the words "monitoring alarm" in order to be sent successfully.
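The keyword rule can be checked on the client side before sending. The following is a minimal sketch, not part of the DingTalk API: the helper name `build_text_message` and the keyword list are assumptions for illustration, and the message body uses the same `msgtype`/`text`/`content` shape as the curl test later in this post. The authoritative check still happens on DingTalk's side.

```python
import json

# Keywords configured on the robot's security page (assumed for illustration).
KEYWORDS = ["monitoring alarm"]


def build_text_message(content, keywords=KEYWORDS):
    """Return the JSON body of a text message, or raise if no keyword is present."""
    if not any(keyword in content for keyword in keywords):
        raise ValueError("content must contain at least one configured keyword")
    return json.dumps({"msgtype": "text", "text": {"content": content}})


body = build_text_message("monitoring alarm: disk usage above 90% on node-1")
print(body)
```

Failing early like this avoids a round trip that would only come back with a keyword error.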

Method 2: Signing
The first step is to use timestamp + "\n" + secret as the string to sign, compute its signature with the HmacSHA256 algorithm (using the secret as the key), Base64-encode the result, and finally URL-encode it to get the final signature (the UTF-8 character set must be used).
Signature calculation code example (Java)

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64;
import java.net.URLEncoder;

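public class Test {

    public static void main(String[] args) throws Exception {
        // Current time in milliseconds; the same value is sent as the timestamp parameter.
        Long timestamp = System.currentTimeMillis();
        // Replace with your robot's secret.
        String secret = "this is secret";

        // The string to sign is timestamp + "\n" + secret; the secret is also the HMAC key.
        String stringToSign = timestamp + "\n" + secret;
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes("UTF-8"), "HmacSHA256"));
        byte[] signData = mac.doFinal(stringToSign.getBytes("UTF-8"));
        // Base64-encode the digest, then URL-encode it.
        String sign = URLEncoder.encode(new String(Base64.encodeBase64(signData)), "UTF-8");
        System.out.println(sign);
    }

}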

Signature calculation code example (Python)

# python 3.8
import time
import hmac
import hashlib
import base64
import urllib.parse

timestamp = str(round(time.time() * 1000))
secret = 'this is secret'
secret_enc = secret.encode('utf-8')
string_to_sign = '{}\n{}'.format(timestamp, secret)
string_to_sign_enc = string_to_sign.encode('utf-8')
hmac_code = hmac.new(secret_enc, string_to_sign_enc, digestmod=hashlib.sha256).digest()
sign = urllib.parse.quote_plus(base64.b64encode(hmac_code))
print(timestamp)
print(sign)

# python 2.7
import time
import hmac
import hashlib
import base64
import urllib

timestamp = long(round(time.time() * 1000))
secret = 'this is secret'
secret_enc = bytes(secret).encode('utf-8')
string_to_sign = '{}\n{}'.format(timestamp, secret)
string_to_sign_enc = bytes(string_to_sign).encode('utf-8')
hmac_code = hmac.new(secret_enc, string_to_sign_enc, digestmod=hashlib.sha256).digest()
sign = urllib.quote_plus(base64.b64encode(hmac_code))
print(timestamp)
print(sign)

The second step is to append the timestamp and the signature value obtained in the first step to the webhook URL:

https://oapi.dingtalk.com/robot/send?access_token=XXXXXX&timestamp=XXX&sign=XXX
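The two steps above can be combined into one helper. This is a sketch under the same assumptions as the Python 3 example above; the function names `sign` and `signed_url` are made up for illustration, and the token and secret are placeholders.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

WEBHOOK = "https://oapi.dingtalk.com/robot/send"


def sign(timestamp_ms, secret):
    """HmacSHA256 over 'timestamp\\nsecret', then Base64, then URL-encoding."""
    string_to_sign = "{}\n{}".format(timestamp_ms, secret).encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(access_token, secret, timestamp_ms=None):
    """Build the webhook URL with access_token, timestamp and sign parameters."""
    if timestamp_ms is None:
        timestamp_ms = int(round(time.time() * 1000))
    return "{}?access_token={}&timestamp={}&sign={}".format(
        WEBHOOK, access_token, timestamp_ms, sign(timestamp_ms, secret)
    )


print(signed_url("XXXXXX", "this is secret"))
```

The timestamp in the URL must be the same value that went into the signature, so both are produced in one call rather than computed separately.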

Method 3: IP address (range)
Once set, only requests from the configured IP address range are processed normally. Two forms are supported: a single IP and an IP range. IPv6 address whitelists are not currently supported. The format is as follows:
[Figure: IP address (range) format]
Note: At least one of the three security settings above must be configured. Messages that fail the security check are not sent, and the following errors are returned:

// The message content does not contain any keyword
{
  "errcode":310000,
  "errmsg":"keywords not in content"
}

// The timestamp is invalid
{
  "errcode":310000,
  "errmsg":"invalid timestamp"
}

// The signature does not match
{
  "errcode":310000,
  "errmsg":"sign not match"
}

// The IP address is not in the whitelist
{
  "errcode":310000,
  "errmsg":"ip X.X.X.X not in whitelist"
}
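A caller can tell these failures apart from a successful send by inspecting the response body. The sketch below assumes the body is JSON with lowercase `errcode` and `errmsg` fields and that success is reported as `errcode` 0; the success value is an assumption not shown in this post, and `check_response` is a hypothetical helper.

```python
import json

SECURITY_ERRCODE = 310000  # error code shown in the responses above


def check_response(raw_body):
    """Return True on success, raise RuntimeError with the server message otherwise."""
    reply = json.loads(raw_body)
    errcode = reply.get("errcode")
    if errcode == 0:  # assumption: the API reports success as errcode 0
        return True
    if errcode == SECURITY_ERRCODE:
        raise RuntimeError("security check failed: {}".format(reply.get("errmsg")))
    raise RuntimeError("send failed ({}): {}".format(errcode, reply.get("errmsg")))


print(check_response('{"errcode": 0, "errmsg": "ok"}'))
```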

Command-line test with the signature generated by Python

# If the message arrives in DingTalk, it was sent successfully!
curl 'https://oapi.dingtalk.com/robot/send?access_token=XXXXXX&timestamp=XXXXXX&sign=XXXXXX' -H 'Content-Type: application/json' -d '{"msgtype": "text","text": {"content": "I am who I am, a different kind of firework"}}'
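The same request can be built from Python with only the standard library. This is a sketch equivalent to the curl command above; `build_request` is a hypothetical helper, the URL parameters are placeholders, and the actual network call is left commented out so the block runs without a real token.

```python
import json
import urllib.request


def build_request(url, content):
    """Build a POST request carrying a DingTalk text message as JSON."""
    body = json.dumps({"msgtype": "text", "text": {"content": content}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


request = build_request(
    "https://oapi.dingtalk.com/robot/send?access_token=XXXXXX&timestamp=XXXXXX&sign=XXXXXX",
    "test message",
)
print(request.get_method(), request.full_url)
# To actually send it (requires a real token and signature):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```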

Run with docker

$ docker run -p 5000:5000 --name dingtalk-hook -e ROBOT_TOKEN=<DingTalk robot TOKEN> -e ROBOT_SECRET=<DingTalk robot SECRET> -e LOG_LEVEL=debug -e PROME_URL=prometheus.local -d cnych/alertmanager-dingtalk-hook:v0.3.5

Environment variable configuration:

ROBOT_TOKEN: the DingTalk robot TOKEN
PROME_URL: manually specifies the Prometheus address used for the jump link; by default it is the address of the Pod
LOG_LEVEL: the log level; when set to debug, you can see the data sent by the AlertManager WebHook, which is convenient for debugging
ROBOT_SECRET: the secret for the DingTalk robot's signing security setting; it is the string starting with SEC shown under the signing option on the robot's security settings page
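To make the role of each variable concrete, here is a sketch of how a hook service might read them. It is not the actual code of alertmanager-dingtalk-hook; `load_settings`, the returned keys, and the `info` default for the log level are all assumptions for illustration.

```python
import os


def load_settings(environ=os.environ):
    """Collect the four variables described above into a dict."""
    token = environ.get("ROBOT_TOKEN")
    if not token:
        raise KeyError("ROBOT_TOKEN is required")
    return {
        "token": token,
        "secret": environ.get("ROBOT_SECRET"),  # None means signing is not used
        "log_level": environ.get("LOG_LEVEL", "info"),  # default is an assumption
        "prome_url": environ.get("PROME_URL"),
    }


settings = load_settings({"ROBOT_TOKEN": "XXXXXX", "LOG_LEVEL": "debug"})
print(settings)
```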

Running in a Kubernetes cluster
The first step is to store the DingTalk robot TOKEN and SECRET in a Secret resource object:

$ kubectl create secret generic dingtalk-secret --from-literal=token=<DingTalk group robot TOKEN> --from-literal=secret=<DingTalk group robot SECRET> -n kube-ops
secret "dingtalk-secret" created

Then define the Deployment and Service resource objects: (dingtalk-hook.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk-hook
  namespace: kube-ops
spec:
  selector:
    matchLabels:
      app: dingtalk-hook
  template:
    metadata:
      labels:
        app: dingtalk-hook
    spec:
      containers:
      - name: dingtalk-hook
        image: cnych/alertmanager-dingtalk-hook:v0.3.6
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
          name: http
        env:
        - name: PROME_URL
          value: prometheus.local
        - name: LOG_LEVEL
          value: debug
        - name: ROBOT_TOKEN
          valueFrom:
            secretKeyRef:
              name: dingtalk-secret
              key: token
        - name: ROBOT_SECRET
          valueFrom:
            secretKeyRef:
              name: dingtalk-secret
              key: secret
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi

---
apiVersion: v1
kind: Service
metadata:
  name: dingtalk-hook
  namespace: kube-ops
spec:
  selector:
    app: dingtalk-hook
  ports:
  - name: hook
    port: 5000
    targetPort: http

Simply create the above resource object:

$ kubectl create -f dingtalk-hook.yaml
deployment.apps "dingtalk-hook" created
service "dingtalk-hook" created
$ kubectl get pods -n kube-ops
NAME                            READY     STATUS      RESTARTS   AGE
dingtalk-hook-c4fcd8cd6-6r2b6   1/1       Running     0          45m
......

Finally, set the webhook address in AlertManager; the hook service can be reached directly through its cluster DNS name:

receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://dingtalk-hook.kube-ops.svc.cluster.local:5000'
    send_resolved: true
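With this receiver in place, Alertmanager POSTs a JSON payload to the hook, which then has to turn it into a DingTalk message. The sketch below shows one possible conversion. It assumes the standard Alertmanager webhook payload fields (`status`, `alerts`, `labels`, `annotations`) and DingTalk's `markdown` message type; the function name and the output layout are made up and are not taken from alertmanager-dingtalk-hook.

```python
import json


def to_dingtalk_markdown(payload):
    """Turn one Alertmanager webhook payload (a dict) into a DingTalk markdown message."""
    alerts = payload.get("alerts", [])
    lines = []
    for alert in alerts:
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append("### [{}] {}".format(alert.get("status", "unknown"), labels.get("alertname", "")))
        lines.append("> severity: {}".format(labels.get("severity", "")))
        lines.append("> summary: {}".format(annotations.get("summary", "")))
    title = "{} alert(s), status {}".format(len(alerts), payload.get("status", "unknown"))
    return {"msgtype": "markdown", "markdown": {"title": title, "text": "\n\n".join(lines)}}


sample = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "NodeDown", "severity": "critical"},
            "annotations": {"summary": "node-1 is down"},
        }
    ],
}
message = to_dingtalk_markdown(sample)
print(json.dumps(message, indent=2))
```

Because `send_resolved: true` is set, the same payload shape also arrives with a resolved status, so the per-alert status is included in each heading.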

Detailed description of Alertmanager alert templates
By default, Alertmanager uses its built-in notification template. The template source code can be obtained from https://github.com/prometheus/alertmanager/blob/master/template/default.tmpl.
Alertmanager's notification templates are based on Go's template system. Alertmanager also lets users define and use their own templates. Generally speaking, there are two ways to do this.
**The first way is based on template strings.** Users can use template strings directly in the Alertmanager configuration file, for example:

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'

**The second way is to define a reusable template file.** For example, you can create a custom template file custom-template.tmpl as follows:

{{ define "slack.myorg.text" }}https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}{{ end }}

Specify the access path of the custom template by defining the templates configuration in the global settings of Alertmanager:

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
  [ - <filepath> ... ]

After setting the access path of the custom template, the user can directly use the template in the configuration:

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    text: '{{ template "slack.myorg.text" . }}'

templates:
- '/etc/alertmanager/templates/myorg.tmpl'
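To see what the "slack.myorg.text" template produces, the following Python stand-in performs the same substitution on a sample set of group labels. It only illustrates the expected output and is not how Alertmanager renders Go templates; the label values are made up.

```python
def slack_myorg_text(group_labels):
    """Python stand-in for the "slack.myorg.text" template defined above."""
    return "https://internal.myorg.net/wiki/alerts/{}/{}".format(
        group_labels.get("app", ""), group_labels.get("alertname", "")
    )


print(slack_myorg_text({"app": "web", "alertname": "HighLatency"}))
# https://internal.myorg.net/wiki/alerts/web/HighLatency
```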

Alertmanager suppression and optimization
Alertmanager provides ways to help users control the behavior of alert notifications, including predefined suppression (inhibition) rules and temporarily defined silence rules.
Suppression mechanism
Alertmanager's suppression mechanism prevents users from receiving a flood of follow-on alert notifications caused by a problem once an alert for that problem has already fired. For example, when the cluster is unavailable, the user may only want to receive one alert saying that the cluster has a problem, rather than a large number of notifications about abnormal applications and abnormal middleware services in the cluster.
In the Alertmanager configuration file, use inhibit_rules to define a set of alert suppression rules:

inhibit_rules:
  [ - <inhibit_rule> ... ]

The specific configuration of each suppression rule is as follows:

target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]

source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]

[ equal: '[' <labelname>, ... ']' ]

When an alert that matches source_match or source_match_re is already firing, any new alert that matches target_match and target_match_re, and whose values for the labels listed in equal are identical to those of the source alert, triggers the suppression mechanism and is not sent. PS: Put simply, the goal is to send fewer, more precise notifications.
For example, define the following suppression rules:

- source_match:
    alertname: NodeDown
    severity: critical
  target_match:
    severity: critical
  equal:
    - node

For example: when a host node in the cluster goes down abnormally, the NodeDown alert is triggered, and the alerting rule defines its level as severity=critical. Because the host is down, all services and middleware deployed on it become unavailable and trigger alerts of their own. According to the suppression rule, if a new alert has severity=critical and its node label has the same value as that of the NodeDown alert, the new alert is considered to be caused by NodeDown, so the suppression mechanism kicks in and no notification is sent to the receiver.
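The decision described in this example can be sketched in a few lines. This is a simplified model, not Alertmanager's implementation: it handles only exact matches (no source_match_re or target_match_re), treats a missing label as an empty value, and never lets an alert suppress itself. All alert data is made up.

```python
def matches(alert_labels, matchers):
    """True when every label in matchers has exactly that value on the alert."""
    return all(alert_labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing_alerts, rule):
    """True when another firing alert matching source_match shares the `equal` labels."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source_match"]):
            continue
        if all(source.get(name, "") == target.get(name, "") for name in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"alertname": "NodeDown", "severity": "critical"},
    "target_match": {"severity": "critical"},
    "equal": ["node"],
}
node_down = {"alertname": "NodeDown", "severity": "critical", "node": "node-1"}
mysql_down = {"alertname": "MySQLDown", "severity": "critical", "node": "node-1"}
other_node = {"alertname": "MySQLDown", "severity": "critical", "node": "node-2"}
firing = [node_down, mysql_down, other_node]

print(is_inhibited(mysql_down, firing, rule))  # True: same node as NodeDown
print(is_inhibited(other_node, firing, rule))  # False: different node
print(is_inhibited(node_down, firing, rule))   # False: an alert does not suppress itself here
```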

Temporary silences
In addition to the suppression mechanism, users or administrators can temporarily mute specific alert notifications directly through the Alertmanager UI. A silence rule is defined by label matchers (exact string or regular expression); if a new alert matches the silence rule, no notification is sent to the receiver.
In the Alertmanager UI, click "New Silence" to open the creation form. There the user can define the start time and duration of the new silence rule, and set multiple matchers (string matching or regex matching) in the Matchers section. After filling in the creator of the silence rule and the reason for creating it, click the "Create" button.

"Preview Alerts" shows the alerts matched by the current matchers. After the silence rule is created, Alertmanager loads it and sets its status to Pending. When the rule takes effect, it enters the Active state.
While a silence rule is in effect, the alerts it matches no longer appear on the Alerts page of Alertmanager.
For a rule that is already in effect, the user can click the "Expire" button to expire it manually.
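The fields filled in through the UI (matchers, start time, duration, creator, comment) can also be expressed as data. The sketch below builds such a silence as a dict. The field names follow the shape of Alertmanager's v2 silences API as an assumption, `build_silence` is a hypothetical helper, and nothing is sent to a server.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(matchers, duration_hours, created_by, comment, start=None):
    """Describe a silence: (name, value, is_regex) matchers plus a time window."""
    start = start or datetime.now(timezone.utc)
    end = start + timedelta(hours=duration_hours)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": is_regex}
            for name, value, is_regex in matchers
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


silence = build_silence(
    [("alertname", "NodeDown", False), ("node", "node-.*", True)],
    duration_hours=2,
    created_by="ops",
    comment="planned maintenance",
    start=datetime(2020, 9, 10, 8, 0, tzinfo=timezone.utc),
)
print(json.dumps(silence, indent=2))
```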

Use Recording Rules to optimize performance
PromQL can query, aggregate, and perform various other operations in real time on the sample data collected in Prometheus. When a PromQL expression is complex and computationally expensive, querying it directly may cause Prometheus to time out. In that case, a mechanism similar to background batch processing is needed to complete these complex calculations in the background, so that users only need to query the results. Prometheus supports this background calculation through Recording Rules, which can optimize the performance of complex queries and improve query efficiency.
Define Recording Rules
In the Prometheus configuration file, define the path of the recording rule files through rule_files.

rule_files:
  [ - <filepath_glob> ... ]

Each rule file is defined in the following format:

groups:
  [ - <rule_group> ]

A simple rule file might look like this:

groups:
  - name: example
    rules:
    - record: job:http_inprogress_requests:sum
      expr: sum(http_inprogress_requests) by (job)

The specific configuration items of rule_group are as follows:

# The name of the group. Must be unique within a file.
name: <string>

# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]

rules:
  [ - <rule> ... ]

As with alerting rules, a group can contain multiple rules. Each rule is configured as follows:

# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]

According to the definition in the rules, Prometheus will complete the calculation of the PromQL expression defined in expr in the background, and save the calculation results to a new time series record. At the same time, additional labels can be added to these samples through labels.
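What a recording rule does on each evaluation can be imitated on a handful of samples. The sketch below mimics the example rule `sum(http_inprogress_requests) by (job)` and stores the result under the `record` name with optional extra labels. It is a toy model of the idea, not Prometheus code; the sample values and the `team` label are made up.

```python
from collections import defaultdict


def sum_by_job(samples):
    """Equivalent of `sum(http_inprogress_requests) by (job)` over one set of samples."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels["job"]] += value
    return dict(totals)


def record(samples, extra_labels=None):
    """Produce the series stored under the name given by `record`."""
    extra_labels = extra_labels or {}
    return [
        ({"__name__": "job:http_inprogress_requests:sum", "job": job, **extra_labels}, total)
        for job, total in sorted(sum_by_job(samples).items())
    ]


samples = [
    ({"job": "api", "instance": "10.0.0.1:8080"}, 3.0),
    ({"job": "api", "instance": "10.0.0.2:8080"}, 5.0),
    ({"job": "web", "instance": "10.0.0.3:8080"}, 2.0),
]
for labels, value in record(samples, {"team": "ops"}):
    print(labels, value)
```

Queries then read the small precomputed series instead of aggregating every instance at query time.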

By default, the rules in these files are evaluated at the same frequency as alerting rules, which is defined by global.evaluation_interval:

global:
  [ evaluation_interval: <duration> | default = 1m ]

Reference:
https://github.com/cnych/alertmanager-dingtalk-hook/tree/116a3ea281a0ee35cd1065def202b52f22116a5e
https://open-doc.dingtalk.com/microapp/serverapi2/qf2nxq
https://www.qikqiak.com/k8s-book/docs/57.AlertManager%E7%9A%84%E4%BD%BF%E7%94%A8.html
Use Golang to create a webhook service:
https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/alert-manager-use-receiver/alert-manager-extension-with-webhook

Origin blog.csdn.net/ZhanBiaoChina/article/details/108523533