The Prometheus Alerting Model

Prometheus is arguably the most popular open source monitoring system today. Its ecosystem is vast: a wide variety of exporters cover traditional applications, it ships with a complete toolchain for secondary development, and its tight integration with Kubernetes and other mainstream platforms gives it strong service-discovery capabilities. With only a little configuration we can collect a large number of richly dimensioned metrics. On the one hand, this wealth of metrics greatly improves cluster observability; for example, Grafana dashboards let us see the real-time state of the cluster along every dimension. On the other hand, once observability is in place, real-time alerting based on the collected monitoring data becomes an inevitable requirement. The Prometheus community has a mature answer to this problem, and this article describes the Prometheus alerting model in detail.

1. Overview

If you have studied the Prometheus project, you will notice that one of its most important principles is to keep things simple: the design aims to cover the needs of most scenarios with the simplest possible implementation, while leaving the project open to extension for the remaining corner cases, where peripheral components from the Prometheus ecosystem can be combined to strengthen the built-in capabilities. Alerting follows the same philosophy; the overall architecture of the Prometheus-based alerting system is shown in the figure below:

(figure: overall architecture of the Prometheus alerting system)

The alerting system as a whole is decoupled into two parts:

  1. Prometheus Server reads a set of alerting rules and periodically evaluates them against the collected monitoring data; once a rule's trigger condition is met, a corresponding alert instance is generated and sent to AlertManager.
  2. AlertManager is an HTTP server that runs separately from Prometheus Server. It receives alert instances from clients, applies higher-level operations such as aggregation, silencing, and inhibition to them, and supports sending notifications through Email, Slack, and other notification platforms. AlertManager does not care whether an alert was issued by a Prometheus Server instance: as long as we construct alert instances that conform to its format and send them to AlertManager, they are all processed in the same way.

2. Alert Rules

Generally speaking, Prometheus alerting rules are saved as files on disk, and we need to point Prometheus Server at those rule files in its configuration so that it reads them at startup:

rule_files:
  - /etc/prometheus/rules/*.yaml

A typical rule file looks like this:

groups:
- name: example
  rules:
  - alert: HighRequestLoad
    expr: rate(http_request_total{pod="p1"}[5m]) > 1000
    for: 1m
    labels:
      severity: warning
    annotations:
      info: High Request Load

A rule file can define multiple groups, and each group can contain multiple alerting rules. Usually the rules in a group are logically related in some way, but even if they are completely unrelated it has no effect on the subsequent processing. The fields of an alerting rule and their meanings are:

  1. alert: the name of the alert.
  2. expr: the trigger condition of the alert, which is essentially a PromQL query expression. Prometheus Server evaluates the expression periodically (every 15s is typical); if the query returns any time series, the alert fires.
  3. for: how long the trigger condition must hold. Because the data may contain glitches, Prometheus does not generate an alert instance and send it to AlertManager the first time expr is satisfied. In the example above, the intent is to fire an alert when the Pod named "p1" receives more than 1000 HTTP requests per second for a full minute; if the rule is evaluated every 15s, an alert instance is actually generated only after the Pod's load exceeds 1000 QPS in four consecutive evaluations.
  4. labels: additional labels for the alert instance. Prometheus merges the labels declared here with the labels of the time series returned by expr, and the result becomes the label set of the alert instance. The label set uniquely identifies the instance. The alert name is also included in the instance's labels under the key "alertname".
  5. annotations: comparatively minor additional information attached to the alert instance. Prometheus attaches the annotations declared here to the instance; annotations are generally used for details such as a human-readable description (see the sketch after this list).
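To make the evaluation interval and the annotations concrete, here is a hedged sketch: the evaluation period is set by evaluation_interval in the Prometheus configuration, and annotations usually use Go templates to interpolate the labels and the value of the offending series. The metric and label names below are simply the placeholders carried over from the example above.

# prometheus.yml (excerpt): how often alerting rules are evaluated
global:
  evaluation_interval: 15s

# rules/example.yaml (sketch): templated annotations
groups:
- name: example
  rules:
  - alert: HighRequestLoad
    expr: rate(http_request_total{pod="p1"}[5m]) > 1000
    for: 1m
    labels:
      severity: warning
    annotations:
      # the template interpolates the series labels and the measured value
      summary: "Pod {{ $labels.pod }} is receiving {{ $value }} requests per second"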

Note that one alerting rule can generate more than one class of alert instance. For the example above, the following two time series might both satisfy the trigger condition, i.e. the Pods named p1 in both namespace n1 and namespace n2 have sustained more than 1000 QPS:

http_request_total{namespace="n1", pod="p1"}
http_request_total{namespace="n2", pod="p1"}

which ultimately produces two alert instances:

# only the labels of the instances are shown here
{alertname="HighRequestLoad", severity="warning", namespace="n1", pod="p1"}
{alertname="HighRequestLoad", severity="warning", namespace="n2", pod="p1"}

Thus, in a Kubernetes scenario for example, since Pods are volatile, we can use the expressive power of PromQL to define alerts at the Deployment level: as long as any Pod meets the trigger condition, a corresponding alert instance is generated (a sketch follows below).
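As a hedged illustration (the app label, its value "frontend", and the threshold are placeholders rather than values taken from a real workload), a Deployment-level rule simply selects by a label shared by all Pods of the Deployment instead of pinning a single Pod name:

groups:
- name: deployment-level
  rules:
  - alert: HighRequestLoad
    # every Pod carrying app="frontend" that exceeds the threshold produces
    # its own alert instance, no matter how often Pods are replaced
    expr: rate(http_request_total{app="frontend"}[5m]) > 1000
    for: 1m
    labels:
      severity: warning
    annotations:
      info: A Pod of the frontend Deployment is receiving too many requests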

3. Operating Alert Rules in Kubernetes

At first glance, Prometheus's approach of writing all alerting rules into files looks crude, but it does simplify the design and implementation of Prometheus itself. In a real production environment, however, especially when Prometheus Server is deployed as a Pod in a Kubernetes cluster, adding, deleting, and modifying alerting rules through raw files becomes very cumbersome. In a Kubernetes environment we would typically put a number of rule files into a ConfigMap and mount it into the Prometheus Pod at the configured directory; to perform any CRUD operation on the rules, the most straightforward way is then to load the whole ConfigMap, modify it, and write it back.

Fortunately, the community has prepared a complete solution for this. In the Kubernetes world, the most common way to manage a complex stateful application is to write a dedicated Operator for it. Prometheus Operator, one of the earliest Operators in the community, greatly simplifies the deployment and configuration of Prometheus by abstracting the related concepts into CRDs. This article mainly involves two of them: the Prometheus and PrometheusRule CRDs.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  ruleSelector:
    matchLabels:
      role: alert-rules
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: alert-rules
  name:
spec:
  groups:
  - name: example
    rules:
    - alert: HighRequestLoad
      expr: rate(http_request_total{pod="p1"}[5m]) > 1000
      for: 1m
      labels:
        severity: none
      annotations:
        info: High Request Load

Shown above are roughly the simplest possible Prometheus and PrometheusRule resource objects. When the YAML above is submitted to the Kubernetes APIServer, Prometheus Operator immediately generates a StatefulSet running a Prometheus Server instance according to the Prometheus object, and writes the corresponding settings into the server's configuration file. As for PrometheusRule, its content is essentially the same as the rule file shown earlier: Prometheus Operator generates a ConfigMap from the PrometheusRule content and mounts it as a Volume into the appropriate directory of the Pod where Prometheus Server runs. In the end, each PrometheusRule resource object corresponds to one alerting rule file in the mounted directory.

So how does the Operator associate Prometheus with PrometheusRule? Much like a Service selects Pods through its Selector field, a Prometheus object specifies a set of labels in its ruleSelector field, and the Operator merges every PrometheusRule carrying those labels into a ConfigMap (or several, if a single ConfigMap would exceed the size limit) and mounts them into the Pods of the StatefulSet corresponding to that Prometheus object. With the help of Prometheus Operator, CRUD operations on Prometheus alerting rules are therefore reduced to CRUD operations on Kubernetes resource objects, and the most tedious part of the workflow is fully automated by the Operator. In fact, advanced configuration of Prometheus Server and even the deployment of AlertManager can likewise be handled easily through the Prometheus Operator CRDs; since they are not relevant to this article, we will not go into them here.

Finally, although the Operator guarantees that CRUD operations on PrometheusRule objects are promptly reflected in the corresponding ConfigMap, and Kubernetes itself guarantees that changes to a ConfigMap eventually reach the files mounted into the corresponding Pod, Prometheus Server does not watch the rule files for changes. We therefore need to deploy a ConfigMap reloader as a sidecar in the Pod where Prometheus Server runs (see the sketch below). It watches the ConfigMap containing the alerting rules and, whenever it observes a change, calls the Reload interface exposed by Prometheus Server to make Prometheus reload its configuration.
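The following is only a rough sketch of such a sidecar; the reloader image name and its flag names differ between implementations and are placeholders here. The essential idea is that the sidecar watches the mounted rules directory and calls Prometheus's reload endpoint (which requires Prometheus to be started with --web.enable-lifecycle).

# excerpt of the Prometheus Pod spec (sketch; reloader image and flag names are placeholders)
containers:
- name: prometheus
  image: prom/prometheus
  args:
  - --config.file=/etc/prometheus/prometheus.yml
  - --web.enable-lifecycle               # exposes the HTTP reload endpoint
  volumeMounts:
  - name: rules
    mountPath: /etc/prometheus/rules
- name: rules-reloader                   # hypothetical reloader sidecar
  image: example/configmap-reloader      # placeholder image
  args:
  - --watched-dir=/etc/prometheus/rules  # placeholder flag names
  - --reload-url=http://127.0.0.1:9090/-/reload
  volumeMounts:
  - name: rules
    mountPath: /etc/prometheus/rules
    readOnly: true
volumes:
- name: rules
  configMap:
    name: prometheus-rules               # illustrative name for the generated ConfigMap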

4. The Structure of an Alert Instance

AlertManager is essentially an HTTP server that receives and processes alert instances from clients. The client is usually Prometheus Server, but any program can construct alert instances that conform to the spec and submit them to AlertManager with a POST request, and they will be processed in the same way. Therefore, in a production environment, alerting does not have to be limited to the time-series data collected by Prometheus; Kubernetes Events, for example, can also be transformed appropriately and sent to AlertManager for uniform handling. The structure of an alert instance is as follows:

[
  {
    "labels": {
      "alertname": "<requiredAlertName>", "<labelname>": "<labelvalue>", ... }, "annotations": { "<labelname>": "<labelvalue>", }, "startsAt": "<rfc3339>", "endsAt": "<rfc3339>", "generatorURL": "<generator_url>" }, ... ]

The labels and annotations fields were already discussed above: labels uniquely identify an alert, and AlertManager deduplicates and aggregates alert instances whose labels are identical, while annotations carry additional information such as alert details. Here we focus on the startsAt and endsAt fields, which indicate the start and end times of the alert; both are optional. When AlertManager receives an alert instance, it handles these two fields according to the following cases:

  1. Both are present: they are left unchanged.
  2. Neither is present: startsAt is set to the current time and endsAt to the current time plus the default alert duration (5 minutes by default; see the sketch after this list).
  3. Only startsAt is present: endsAt is set to the current time plus the default alert duration.
  4. Only endsAt is present: startsAt is set to the same value as endsAt.
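The "default alert duration" mentioned above corresponds, to the best of my understanding, to the resolve_timeout setting in AlertManager's global configuration; a minimal sketch:

global:
  # used as the implicit end time for alerts that arrive without endsAt
  resolve_timeout: 5m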

AlertManager generally determines the state of an alert instance by comparing its endsAt field with the current time:

  1. If the current time is before endsAt, the alert is still firing.
  2. If the current time is after endsAt, the alert is considered resolved.

Furthermore, while the trigger condition of an alerting rule keeps being satisfied, Prometheus Server resends an alert instance periodically (roughly every minute by default). These instances are identical except for their startsAt and endsAt fields (in fact Prometheus Server sets startsAt of all of them to the time the alert first fired). Eventually these instances are deduplicated and merged as shown below:

(figure: repeated alert instances with identical labels merged into one aggregated alert)

The three alerts with identical labels in the figure are merged into a single aggregated alert; when we query it, we get only one alert instance whose start time is t1 and whose end time is t4.

5. AlertManager Architecture Overview

(figure: AlertManager architecture)

At its core, AlertManager is a powerful alert dispatching and filtering engine. All alerts are stored in the Alert Provider, to which the Dispatcher subscribes. Whenever AlertManager receives a new alert instance, it first stores it in the Alert Provider and then immediately forwards it to the Dispatcher, which applies a set of routing rules to deliver the alert to its intended receivers. Before an alert is actually delivered it still has to pass through a series of processing steps, the Notification Pipeline in the figure: related alerts are aggregated along the time and space dimensions, alerts the user has silenced are dropped, the current alert is checked against already-sent alerts that may inhibit it, and in high-availability mode AlertManager even checks whether the alert has already been sent by another node in the cluster. The ultimate goal of all of this is to let receivers reliably get exactly the alerts they care about most, while avoiding redundant and repeated notifications.

6. Alert Provider

Every alert instance entering AlertManager is first stored in the Alert Provider, which is essentially an in-memory hash table holding all alert instances. Because the labels uniquely identify an alert, the key of the hash table is the hash of the instance's labels and the value is the instance itself. If a newly received alert instance already exists in the table and the two [startsAt, endsAt] intervals overlap, the two are merged before the table is updated. The Alert Provider also exposes a subscription interface: whenever a new alert instance is received, it is passed on to each subscriber after the hash table has been refreshed.

Note that the Alert Provider has a GC mechanism: by default, alerts that have been resolved (i.e. whose endsAt is earlier than the current time) are purged every 30 minutes. Clearly, AlertManager's implementation does not support persistent storage of alerts: resolved alerts are purged periodically, and since everything lives in memory, all alert data is lost if the process restarts. However, if you read the AlertManager code, you will find that the Alert Provider is well encapsulated, so it is entirely feasible to implement an Alert Provider backed by MySQL, Elasticsearch, or Kafka to persist alerts (although AlertManager offers no explicit plugin mechanism, so this can only be done by patching the code).

7. Alert Routing and Grouping

Sending every alert to everyone is obviously inappropriate. AlertManager therefore lets us define a set of receivers and a routing policy, in the form below, to dispatch alert instances to the right targets:

global:
  # (global settings omitted)
route:
  # the root route: every alert enters the routing tree here
  receiver: ops-mails
  group_by: ['cluster', 'alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5h
  routes:
  # child route 1
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-webhook
  # child route 2
  - match:
      service: database
    receiver: team-DB-pager
# the receivers
receivers:
- name: 'ops-mails'
  email_configs:
  - to: '[email protected], [email protected]'
- name: 'team-X-webhook'
  webhook_configs:
  - url: 'http://127.0.0.1:8080/webhooks'
- name: 'team-DB-pager'
  pagerduty_configs:
  - routing_key: <team-DB-key>

The AlertManager configuration above defines one routing tree and three receivers. AlertManager has built-in support for Email, Slack, WeChat, and several other notification channels; if users want to deliver alerts to a platform that is not built in, they can send them to a webhook server through the webhook receiver and have that server forward them. The routing table is a tree: every alert instance entering it is matched by a depth-first traversal, and once no deeper match is possible it is sent to the receiver of the last matching parent node.

Note that the root node of the routing tree matches all alert instances by default. In the example, the root receiver is ops-mails, meaning that alerts are sent to the operations team by default. Route matching is based on labels. For child route 1, if an alert has a label with key service and its value is foo1, foo2, or baz, the match succeeds and the alert is sent to team X. If an alert carries the label service=database, it is sent to the database team.

Sometimes, as receivers of alerts, we would like related alerts to be delivered together in a single email: this reduces duplication of similar alerts and also makes it easier to archive them. AlertManager supports this well through its grouping mechanism. Each routing node can configure the following four fields to group the alerts that belong to it (if the current node does not declare them explicitly, it inherits its parent's settings):

  1. group_by: specifies the label keys used to group alerts. The example uses cluster and alertname, so within the same cluster all alerts with the same name are notified together. If you do not want any grouping at all, set this field to '...'.
  2. group_wait: how long a newly created Group waits before sending its first notification, 30s by default. The point of this field is to wait a little so that as many alerts as possible can be batched into one notification. After each notification, alerts that have already been resolved are removed from the Group.
  3. group_interval: after its first notification, a Group periodically tries to send the alerts it contains, because new alert instances may have joined it; this field is the interval of that period.
  4. repeat_interval: the interval after which the notification is re-sent even though the Group has not changed.

To summarize, AlertManager's Dispatcher routes each newly subscribed alert instance according to its labels and either adds it to an existing Group or creates a new one. A newly created Group sends all of its alert instances together after the specified wait, then periodically checks whether new alerts have joined it (or whether alerts have been resolved, which requires explicit configuration) and sends another notification if so. In addition, every repeat_interval the notification is re-sent even if the Group has not changed.

8. Alert Notification Pipeline

A Group usually contains several alert instances, but not all of them are alerts the user wants to see. Moreover, since every Group periodically tries to send the alerts it contains, there is clearly no need to repeat a notification within a certain window if no new alert instances have joined; and if AlertManager is deployed in high-availability mode, the instances also have to coordinate with each other to avoid duplicate notifications. As shown in the architecture diagram above, whenever a Group tries to send a notification, it first passes through a Notification Pipeline, and only the alert instances that survive the filtering are finally sent out, for example by email. The filtering generally consists of three steps: inhibition, silencing, and deduplication. We analyze them one by one below.

8.1 Alert Inhibition

Alert inhibition means that once certain alerts have fired, other alerts that they inhibit are no longer sent. A typical use case: if an alert says a cluster is unavailable, then any other alert related to that cluster should no longer be delivered to the user, because all of them are caused by the cluster being unavailable, and sending them only makes it harder for the user to find the root cause. Inhibition rules are configured in AlertManager's global configuration file, as follows:

inhibit_rules:
- source_match:
    alertname: ClusterUnavailable
    severity: critical
  target_match:
    severity: critical
  equal:
  - cluster

This configuration means: if an alert instance A containing the labels {alertname="ClusterUnavailable", severity="critical"} appears, AlertManager records it. If a later alert instance contains the label {severity="critical"} and the value of its "cluster" label equals the value of A's "cluster" label, that instance is inhibited and no longer sent.

Every time a Group tries to send its alert instances, AlertManager first filters out those matched by the inhibition rules; only the remaining instances move on to the next step of the Notification Pipeline, silencing.

8.2 Alert Silencing

Silencing means that the user can choose not to receive certain alerts for a period of time. Unlike inhibition rules, silences can be configured dynamically by the user, and AlertManager even provides a graphical UI for it, shown below:

(figure: the silence creation UI)

Similar to the definition of an alert itself, a silence is active over a time interval with an explicitly specified start and end time, and it likewise matches the alert instances it applies to through a set of labels. In the example in the figure, any alert instance containing the labels {alertname="clusterUnavailable", severity="critical"} will no longer appear in notifications.

Silences thus filter out another portion of the alert instances; if the Group still has instances left at this point, they enter the next step of the pipeline, deduplication.

8.3 Alert Deduplication

Whenever a Group successfully sends a notification for the first time, AlertManager creates a Notification Log (nflog for short) entry for it, with the following structure:

e := &pb.MeshEntry{
	Entry: &pb.Entry{
		Receiver:       r,
		GroupKey:       []byte(gkey),
		Timestamp:      now,
		FiringAlerts:   firingAlerts,
		ResolvedAlerts: resolvedAlerts,
	},
	ExpiresAt: now.Add(l.retention),
}

As you can see, each Notification Log entry contains:

  1. The Group's key (the hash of the labels the Group uses to select alerts)
  2. The Receiver the Group corresponds to
  3. The time the Notification Log entry was created
  4. The hashes of the alert instances in the Group that are currently firing
  5. The hashes of the alert instances in the Group that have been resolved
  6. The time the Notification Log entry expires, 120 hours by default

When the Group periodically tries to push a notification again and alert instances still remain after the inhibition and silencing filters, the deduplication stage begins. AlertManager first looks up the Notification Log entry for the Group and sends a notification only if at least one of the following conditions holds:

  1. The firing alerts among the Group's remaining instances are not a subset of the FiringAlerts recorded in the Notification Log, i.e. new alerts have started firing.
  2. The number of FiringAlerts in the Notification Log is non-zero, but the number of currently firing instances in the Group is zero, i.e. all the alerts in the Group have been resolved.
  3. The resolved alerts in the Group are not a subset of the ResolvedAlerts in the Notification Log, meaning new alerts have been resolved, and the notification configuration asks for resolved alerts to be notified. For example, by default Email does not notify when individual alerts are resolved, while Webhook does (see the sketch after this list).
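Whether resolved alerts trigger a notification is controlled per receiver by the send_resolved flag. A minimal sketch (the address is a placeholder), reflecting the defaults just described:

receivers:
- name: 'ops-mails'
  email_configs:
  - to: 'ops@example.com'      # placeholder address
    send_resolved: true        # email does not notify on resolution by default
- name: 'team-X-webhook'
  webhook_configs:
  - url: 'http://127.0.0.1:8080/webhooks'
    send_resolved: false       # webhook notifies on resolution by default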

In summary, through inhibition, silencing, and deduplication, the Notification Pipeline ensures that users can focus on the alerts that really matter without being overwhelmed by irrelevant or repeated notifications.

9. High Availability

In a real production deployment, a single AlertManager instance obviously cannot meet availability requirements, so AlertManager natively supports a multi-instance deployment and uses a Gossip protocol to synchronize state between instances. AlertManager is not stateless; it has two key pieces of state that need to be synchronized:

  1. Silences: when there are multiple AlertManager instances, the user still sends requests to only one of them to add or remove silences, but silences obviously have to take effect globally, so each instance must broadcast its own silences until the whole cluster converges.
  2. Notification Log: to guarantee high availability, i.e. to make sure alert instances are not lost, and given that each AlertManager keeps its alerts in its own memory, Prometheus should clearly not load-balance across AlertManager instances but instead send every alert to all of them. However, the notification for a given Alert Group must be sent by only one AlertManager, so the Notification Log also has to be synchronized across the whole cluster.

When AlertManager runs in cluster mode, its command-line flags are configured as follows:

--cluster.listen-address=0.0.0.0:9094
--cluster.peer=192.168.1.1:9094
--cluster.peer=192.168.1.2:9094

When an AlertManager instance starts up, it first performs a push/pull with the peers specified by the cluster.peer flags: it sends its own state (all Silences and the Notification Log) to the peer, then pulls the peer's state and merges it with its own. For example, for silences pulled from the peer, a silence that does not exist locally is simply added, and a silence that already exists locally but was updated later on the peer overwrites the local one; the Notification Log is handled similarly. Eventually, every AlertManager in the cluster holds the same silences and the same Notification Log.

What happens if the user then adds a new silence on one of the AlertManager instances? Following the Gossip protocol, that instance picks a few instances from the cluster and sends them the new silence; when those instances receive the broadcast, they merge the new silence and broadcast it again in turn. In the end the whole cluster receives the newly added silence, achieving eventual consistency.

Synchronizing the Notification Log, however, is not as easy as synchronizing silences. Consider the following scenario: because of the high-availability requirement, Prometheus sends each alert instance to every AlertManager. If the alert instance does not belong to any existing Alert Group, each AlertManager creates a new Group and eventually a corresponding Notification Log entry. Since the Notification Log entry is only created after the notification has been sent, in this situation multiple notifications are sent for the same alert.

To prevent this, the community's solution is to stagger the times at which the AlertManager instances send notifications. As shown in the architecture diagram above, the Notification Pipeline actually has a Wait stage before deduplication. This stage pauses the processing of a notification for a period of time that differs from instance to instance depending on the instance's position in the cluster: the instances are ordered by name, and each position further down the order waits an additional 15 seconds by default.

Suppose the cluster has two AlertManager instances, A0 ranked first and A1 ranked second. The scenario above then plays out as follows:

  1. Both AlertManagers receive the same alert instance and reach the Wait stage of the Notification Pipeline at the same time. A0 does not have to wait at this stage, while A1 waits 15 seconds.
  2. A0 sends the notification directly, generates the corresponding Notification Log entry, and broadcasts it.
  3. After waiting 15 seconds, A1 enters the deduplication stage, but since it has already received A0's broadcast Notification Log, it no longer sends the notification.

As you can see, Gossip is in fact a weak-consistency protocol. The mechanism above ensures the high availability of an AlertManager cluster while, in most cases, avoiding the duplicate notifications caused by instances that have not yet synchronized with each other. Whether it holds up in demanding production environments still needs to be validated; fortunately, alert data is not that sensitive to strong consistency.

10. Summary

This article has analyzed the Prometheus-based alerting system in some detail: from configuring alerting rules in Prometheus Server, through the evaluation of those rules and the delivery of triggered alert instances to AlertManager, to AlertManager's overall architecture and how it processes the alerts it receives. We can see that all the basic links of the chain are in place. Still, compared with the Prometheus monitoring model, which has become a de facto standard, the generality and practicality of the Prometheus alerting model remain somewhat questionable; in my experience, applying it in a real production environment requires a fair amount of adaptation and enhancement.

 
