Building a Prometheus-Based Monitoring System in Practice

Monitoring is part of the underlying infrastructure and is indispensable for keeping production services stable. The life cycle of an online problem runs from discovery to diagnosis to resolution; monitoring and alerting effectively cover "discovery" and "diagnosis", and with mechanisms such as automatic fault remediation they can even cover "resolution". With good monitoring in place, developers and operations engineers can notice abnormal service behavior promptly and troubleshoot problems far more efficiently.

1. Introduction to Prometheus

Typical monitoring (for example, white-box monitoring) usually focuses on the internal state of the target service, for example:

  • Number of requests received per unit time
  • Request success rate/failure rate per unit time
  • Average processing time of requests

White-box monitoring describes the internal state of the system well, but it misses what the system looks like from the outside. For example, white-box monitoring only sees the requests that were actually received; it cannot see requests that never arrived because of a DNS failure. Black-box monitoring is a useful complement here: a probe checks whether the target service returns successfully, giving a better picture of the system's current state as seen by its callers.

Suppose one day you need to build a monitoring system for your services that collects the metrics instrumented in the applications. Prometheus is a good choice for this kind of business monitoring because it has the following advantages:

① Supports PromQL, a query language that can flexibly aggregate metric data
② Simple to deploy: a single binary is enough to run it, with no dependency on distributed storage
③ Written in Go, so its components integrate easily into projects that are also written in Go
④ Ships with a native web UI that renders time series onto panels via PromQL
⑤ Rich ecosystem of components: Alertmanager, Pushgateway, Exporters, and more

The Prometheus architecture is as follows:
[Figure: Prometheus architecture]
In the workflow above, Prometheus determines the targets to scrape through service discovery or its configuration file, issues HTTP requests to a specific endpoint (the metrics path) on each target (application containers and Pushgateway), and persists the scraped metrics into its own TSDB, which is memory-mapped for queries. In addition, Prometheus periodically evaluates the configured alerting rules with PromQL to decide whether to fire an alert to Alertmanager; the latter, after receiving the alert, is responsible for delivering the notification to email or an internal group chat.
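
A minimal configuration that ties these pieces together might look like the sketch below; the target addresses, rule-file path, and job names are placeholders rather than part of the original setup:

# prometheus.yml -- a minimal sketch; addresses and file paths are placeholders
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alerting and recording rules are evaluated

rule_files:
  - /etc/prometheus/rules/*.yml     # alerting rules evaluated with PromQL

scrape_configs:
  - job_name: my-app                # application containers exposing a metrics path
    metrics_path: /metrics
    static_configs:
      - targets: ['app-1:8080', 'app-2:8080']
  - job_name: pushgateway
    honor_labels: true              # keep the job/instance labels pushed by clients
    static_configs:
      - targets: ['pushgateway:9091']

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
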
Prometheus metric names may only contain ASCII letters, digits, underscores, and colons, and there is a set of naming conventions:

① Use base units (e.g. seconds rather than milliseconds)
② Prefix the metric name with an application namespace, e.g.:
    process_cpu_seconds_total
    http_request_duration_seconds
③ Use a suffix to describe the unit, e.g.:
    http_request_duration_seconds
    node_memory_usage_bytes
    http_requests_total
    process_cpu_seconds_total
    foobar_build_info

Prometheus provides the following basic metric types:

  • Counter: a monotonically increasing metric; its value only goes up and never decreases. Typically used to count things such as the number of requests served or the number of errors.
  • Gauge: a metric whose value can change arbitrarily, going up or down. Typically used for values such as a service's CPU usage or memory usage.
  • Histogram and Summary: used to sample observations over a period of time and report distribution and quantile statistics. Typically used for request latency or response size.

Prometheus storage is based on time series. First, what is a time series? A time series is a sequence of (timestamp, value) pairs, i.e. each point in time has a corresponding value. A familiar everyday example is a weather forecast: [(14:00, 27°C), (15:00, 28°C), (16:00, 26°C)] is a one-dimensional time series. A sequence stored as timestamp/value pairs like this is also called a vector.
To give another example: suppose there is a metric http_requests whose job is to count the total number of requests in each time period. By itself it is the one-dimensional series just described. If we add a dimension to the metric, say the host name, the metric now counts the number of requests per host in each time period, and the data becomes a matrix of time series with one column vector per host name. If we then attach multiple labels (key=value pairs) to the series, the matrix correspondingly becomes a multi-dimensional matrix.

Each unique set of labels corresponds to a unique vector, which can also be called a time series. Looked at a single point in time, it is an instant vector: the series has exactly one timestamp and one value, for example the server's CPU load at 12:05:30 today. Looked at over a period of time, it is a range vector: a set of time series data over an interval, for example the server's CPU load from 11:00 to 12:00 today.

Accordingly, you can query the matching time series by metric name and label set:

http_requests{host="host1",service="web",code="200",env="test"}

The query result will be an instant vector:

http_requests{host="host1",service="web",code="200",env="test"} 10
http_requests{host="host2",service="web",code="200",env="test"} 0
http_requests{host="host3",service="web",code="200",env="test"} 12

If you add a time range to this selector, you query the time series over a period of time:

http_requests{host="host1",service="web",code="200",env="test"}[:5m]

The result will be a range vector:

http_requests{host="host1",service="web",code="200",env="test"} 0 4 6 8 10
http_requests{host="host2",service="web",code="200",env="test"} 0 0 0 0 0
http_requests{host="host3",service="web",code="200",env="test"} 0 2 5 9 12

With range vectors, can we perform aggregation over these time series? That is exactly what PromQL does. For example, to calculate the per-second request rate over the last 5 minutes, we can apply the rate() function to the range vector above:

rate(http_requests{host="host1",service="web",code="200",env="test"}[:5m])

Similarly, to get the total increase in requests over the last 5 minutes, you can use the following PromQL:

increase(http_requests{host="host1",service="web",code="200",env="test"}[:5m])

To calculate the 90th percentile in the past 10 minutes:

histogram_quantile(0.9, rate(employee_age_bucket_bucket[10m]))
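
Queries like these quantiles are often precomputed with recording rules so that dashboards stay fast. A sketch of such a rule file, reusing the example metric names above (the rule names themselves are illustrative):

# rules.yml -- a sketch of recording rules; the rule names are illustrative
groups:
  - name: example-latency
    interval: 30s
    rules:
      # per-second request rate over the last 5 minutes (Counter + rate)
      - record: job:http_requests:rate5m
        expr: rate(http_requests[5m])
      # 90th percentile over the last 10 minutes (Histogram + histogram_quantile)
      - record: job:employee_age:p90_10m
        expr: histogram_quantile(0.9, rate(employee_age_bucket_bucket[10m]))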

In Prometheus, a metric (that is, a metric name with a unique label set) together with a (timestamp, value) pair forms a sample. Prometheus keeps the collected samples in memory and by default compresses them into a block and persists it to disk every 2 hours; the more samples there are, the more memory Prometheus uses. In practice it is therefore generally not recommended to use labels with high cardinality, such as user IP, ID, or URL, otherwise the number of time series multiplies with the number of label values. Besides keeping the number and size of samples reasonable, you can also lower storage.tsdb.min-block-duration to flush data to disk sooner, and increase the scrape interval, to keep Prometheus's memory usage under control.
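
A practical way to keep cardinality in check is to drop or rewrite offending labels at scrape time with metric_relabel_configs. A sketch, where the job name and label names are purely illustrative:

scrape_configs:
  - job_name: my-app                 # illustrative job name
    static_configs:
      - targets: ['app-1:8080']
    metric_relabel_configs:
      # drop a hypothetical high-cardinality label such as a per-user id
      - regex: user_id
        action: labeldrop
      # or drop whole series whose url label explodes cardinality
      - source_labels: [url]
        regex: /api/v1/users/.+
        action: drop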

At runtime, Prometheus determines which targets to scrape from the scrape_configs declared in its configuration file. Each target instance needs to expose an endpoint that Prometheus can poll. A standalone program that implements such an endpoint to provide monitoring samples for Prometheus is generally called an Exporter; for example, Node Exporter collects hardware and operating-system metrics from the host and exposes them for Prometheus to scrape.
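
For instance, a scrape job for a few Node Exporter instances could look like the sketch below (the host addresses are placeholders; Node Exporter listens on port 9100 by default):

scrape_configs:
  - job_name: node
    scrape_interval: 30s
    static_configs:
      - targets: ['10.0.0.1:9100', '10.0.0.2:9100']   # placeholder host addresses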

In a development environment, a single Prometheus instance is usually enough to collect hundreds of thousands of metrics. In a production environment with large numbers of application and service instances, however, one Prometheus instance is usually not enough. A better approach is to deploy several Prometheus instances and have each one scrape only a partition of the targets. For example, with the hashmod action in Prometheus's relabel configuration, you can hash each target's address and keep only the targets whose hash modulo matches the instance's ID:

relabel_configs:
- source_labels: [__address__]
  modulus:       3
  target_label:  __tmp_hash
  action:        hashmod
- source_labels: [__tmp_hash]
  regex:         $(PROM_ID)
  action:        keep

Alternatively, if we want each Prometheus instance to scrape the metrics of one particular cluster, relabeling can do that as well:

relabel_configs:
- source_labels:  ["__meta_consul_dc"]
  regex: "dc1"
  action: keep

2. Prometheus high availability

Now that each Prometheus instance holds its own data, how do we correlate them and build a global view? The official answer is federation: Prometheus servers are arranged in a tree, and a Prometheus closer to the root scrapes the Prometheus instances at the leaf nodes and aggregates their metrics upwards.
[Figure: federated Prometheus hierarchy]
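
A parent Prometheus scrapes its children through the /federate endpoint. A sketch of such a scrape job, where the leaf addresses and the match[] selector are placeholders:

scrape_configs:
  - job_name: federate
    honor_labels: true              # keep the original job/instance labels
    metrics_path: /federate
    params:
      'match[]':
        - '{job="my-app"}'          # only pull series belonging to this job
    static_configs:
      - targets: ['prometheus-leaf-1:9090', 'prometheus-leaf-2:9090']
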
However, federation clearly does not solve everything. The single point of failure remains: if the root node goes down, queries become unavailable, and configuring multiple parent nodes leads to duplicated data and, because scrape timing differs, to inconsistent data. When there are too many leaf targets, the pressure on the parent node grows and can even bring it down completely. On top of that, managing rule configuration across the tree is a considerable hassle.

Fortunately, a Prometheus clustering solution has emerged in the community: Thanos. It provides a global query view that can query and aggregate data from multiple Prometheus servers, and all of that data can be obtained from a single endpoint.
[Figure: Thanos architecture]
  1. When Querier receives a request, it fans the request out to the relevant Sidecars and fetches time series data from their Prometheus servers.
  2. It aggregates the responses and evaluates the PromQL query over them. It can merge disjoint data and deduplicate data from a Prometheus high-availability pair.

With Thanos, horizontal scaling of Prometheus becomes much easier. Not only that: Thanos also provides a reliable storage option that backs up local Prometheus data to remote object storage. And because Thanos gives the Prometheus cluster a global view, global recording rules are no longer a problem either: the Ruler component provided by Thanos evaluates the rules and fires alerts against Thanos Querier.

3. Prometheus storage

Speaking of storage: query high availability in Prometheus can be solved with horizontal scaling plus a unified query view, but how do we make storage highly available? By design, Prometheus persists its data to local storage. Local persistence is convenient, but it brings problems of its own: if a node goes down, or Prometheus is rescheduled to another node, the monitoring data that lived on the original node disappears from the query interface, and local storage also prevents Prometheus from scaling flexibly. For this reason Prometheus provides the Remote Read and Remote Write features, which let Prometheus write its time series to remote storage and read data back from that storage at query time.
[Figure: Prometheus remote read/write]
Take M3DB as an example. M3DB is a distributed time series database that provides a remote read/write interface for Prometheus. When a time series is written into an M3DB cluster, the data is replicated to other nodes in the cluster according to the shard and replication-factor settings, which makes the storage highly available. Besides M3DB, Prometheus currently also supports InfluxDB, OpenTSDB, and others as remote-write endpoints.
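
In prometheus.yml this amounts to a pair of endpoints. A sketch assuming an M3 coordinator reachable at m3coordinator:7201; the address and URL paths are assumptions and should be taken from your own remote-storage deployment:

remote_write:
  - url: http://m3coordinator:7201/api/v1/prom/remote/write   # placeholder endpoint
remote_read:
  - url: http://m3coordinator:7201/api/v1/prom/remote/read    # placeholder endpoint
    read_recent: true       # also consult remote storage for recent data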

4. How Prometheus collects data

1. Pull mode

Having covered Prometheus high availability, let's look at how Prometheus manages the targets it scrapes. When the number of monitored nodes is small, the list of target hosts can simply be written into the scrape configuration with static_config. But once the number of targets grows, managing them this way becomes a real problem, and in a production environment the IPs of service instances are usually not fixed, so static configuration cannot keep track of the targets. Prometheus's service discovery solves the problem of changing target state: in this mode, Prometheus watches and queries a registry for the node list and periodically scrapes metrics from those nodes. If you need more flexibility, Prometheus also supports file-based service discovery: you can fetch node lists from several registries yourself, filter them according to your own requirements, and write the result to a file; when Prometheus detects that the file has changed, it dynamically replaces the monitored targets and scrapes the new ones.
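
A sketch of file-based service discovery; the file path, job name, and refresh interval are placeholders:

scrape_configs:
  - job_name: my-app
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/my-app-*.json   # placeholder path, watched for changes
        refresh_interval: 1m
# each JSON file contains entries such as:
# [{"targets": ["10.0.0.3:8080"], "labels": {"env": "prod"}}]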
2. Push mode (Pushgateway)

We saw above that in pull mode Prometheus scrapes the target nodes periodically. If a short-lived task finishes before it can be scraped, its monitoring data is lost. To handle this, Prometheus provides a tool, Pushgateway, that receives metrics actively pushed by services. It is suited to short-lived batch jobs, which push their metrics to Pushgateway for temporary buffering; Prometheus then scrapes Pushgateway itself, so the metrics are not lost when a job exits before Prometheus gets to it. Pushgateway is also useful when Prometheus and the application nodes run on heterogeneous networks or are separated by a firewall, so that Prometheus cannot scrape the nodes directly: the application nodes push their metrics to a Pushgateway instance via its domain name, and Prometheus scrapes the Pushgateway nodes on its own network. One thing to watch out for when scraping Pushgateway: Prometheus attaches job and instance labels to every metric, and when it scrapes Pushgateway these may end up being the Pushgateway job and the Pushgateway host's IP. If the metrics pushed to Pushgateway already carry job and instance labels, Prometheus renames the conflicting labels to exported_job and exported_instance. If you want the pushed labels to take precedence instead, set honor_labels: true in the Prometheus scrape configuration.
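
To make the label handling above concrete, a sketch of the corresponding scrape job (the Pushgateway address is a placeholder):

scrape_configs:
  - job_name: pushgateway
    honor_labels: true      # keep the job/instance labels pushed by the batch jobs;
                            # without this they become exported_job/exported_instance
    static_configs:
      - targets: ['pushgateway.example.com:9091']   # placeholder address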

Pushgateway can be used in place of the pull model as a metric-collection scheme, but this model brings several drawbacks:

Pushgateway is designed as a cache for monitoring metrics, which means it does not actively expire the metrics pushed to it. This is not a problem while the service is running, but when the service is rescheduled or destroyed, Pushgateway still keeps the metrics reported by the old node. Moreover, if several Pushgateway instances run behind a load balancer, the same metric may appear on multiple Pushgateway instances, producing duplicated data; consistent-hash routing has to be added at the proxy layer to avoid this.

In pull mode, Prometheus can easily observe the health of the monitored target instances and quickly locate faults; in push mode, since it does not actively probe the clients, it knows nothing about the health of the target instances.

5. Prometheus Alertmanager

Alertmanager is the alerting component, separate from Prometheus itself. Its main job is to receive the alert events sent by Prometheus and then deduplicate, group, inhibit, and route them before sending notifications. In practice it can be combined with webhooks to deliver alert notifications to WeChat Work or DingTalk. The architecture is as follows:
[Figure: Alertmanager architecture]
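
A sketch of an Alertmanager configuration that groups alerts and forwards them to a webhook bridge; the URL and the grouping/interval values are placeholders to adjust to your environment:

# config.yml for Alertmanager -- a sketch; the webhook URL is a placeholder
route:
  receiver: default
  group_by: ['alertname', 'job']    # alerts sharing these labels are sent together
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: default
    webhook_configs:
      - url: http://wechat-bridge.default.svc:8080/alert   # placeholder webhook bridge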

6. Building the Prometheus monitoring system on Kubernetes

Although Prometheus already has an official Operator, here we write the YAML files by hand in order to learn the whole process; even so, a handful of instances is enough to collect and monitor the metrics of 200+ services running on thousands of instances.

To deploy the Prometheus instances, declare a StatefulSet for Prometheus. Each Pod contains three containers: Prometheus, its bound Thanos Sidecar, and finally a watch container that monitors changes to the Prometheus configuration file and automatically calls the Prometheus reload API to load the configuration whenever the ConfigMap is modified. Following the data-partitioning approach described earlier, an environment variable PROM_ID is set before Prometheus starts and used as the hashmod identifier during relabeling, and POD_NAME is used by the Thanos Sidecar to set Prometheus's external_labels.replica:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  serviceName: "prometheus"
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-data
        hostPath:
          path: /data/prometheus
      - name: prometheus-config-shared
        emptyDir: {}
      containers:
      - name: prometheus
        image: prom/prometheus:v2.11.1
        args:
          - --config.file=/etc/prometheus-shared/prometheus.yml
          - --web.enable-lifecycle
          - --storage.tsdb.path=/data/prometheus
          - --storage.tsdb.retention=2w
          - --storage.tsdb.min-block-duration=2h
          - --storage.tsdb.max-block-duration=2h
          - --web.enable-admin-api
        ports:
          - name: http
            containerPort: 9090
        volumeMounts:
          - name: prometheus-config-shared
            mountPath: /etc/prometheus-shared
          - name: prometheus-data
            mountPath: /data/prometheus
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
      - name: watch
        image: watch
        args: ["-v", "-t", "-p=/etc/prometheus-shared", "curl", "-X", "POST", "--fail", "-o", "-", "-sS", "http://localhost:9090/-/reload"]
        volumeMounts:
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared
      - name: thanos
        image: improbable/thanos:v0.6.0
        command: ["/bin/sh", "-c"]
        args:
          - PROM_ID=`echo $POD_NAME| rev | cut -d '-' -f1` /bin/thanos sidecar
            --prometheus.url=http://localhost:9090
            --reloader.config-file=/etc/prometheus/prometheus.yml.tmpl
            --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yml
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
        ports:
          - name: http-sidecar
            containerPort: 10902
          - name: grpc
            containerPort: 10901
        volumeMounts:
          - name: prometheus-config
            mountPath: /etc/prometheus
          - name: prometheus-config-shared
            mountPath: /etc/prometheus-shared

Because Prometheus cannot access Kubernetes cluster resources by default, it needs to be granted RBAC permissions:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
  namespace: default
  labels:
    app: prometheus
rules:
- apiGroups: [""]
  resources: ["services", "pods", "nodes", "nodes/proxy", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["prometheus-config"]
  verbs: ["get", "update", "delete"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
  namespace: default
  labels:
    app: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: ""

The deployment of Thanos Querier is relatively simple; you only need to set the --store flag to dnssrv+thanos-store-gateway.default.svc at startup so that it can discover the Sidecars:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: thanos-query
  name: thanos-query
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - args:
        - query
        - --log.level=debug
        - --query.timeout=2m
        - --query.max-concurrent=20
        - --query.replica-label=replica
        - --query.auto-downsampling
        - --store=dnssrv+thanos-store-gateway.default.svc
        - --store.sd-dns-interval=30s
        image: improbable/thanos:v0.6.0
        name: thanos-query
        ports:
        - containerPort: 10902
          name: http
        - containerPort: 10901
          name: grpc
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-query
  name: thanos-query
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 10901
    targetPort: http
  selector:
    app: thanos-query
---
apiVersion: v1
kind: Service
metadata:
  labels:
    thanos-store-api: "true"
  name: thanos-store-gateway
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  selector:
    thanos-store-api: "true"

Deploy Thanos Ruler:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: thanos-rule
  name: thanos-rule
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-rule
  template:
    metadata:
      labels:
        app: thanos-rule
    spec:
      containers:
      - name: thanos-rule
        image: improbable/thanos:v0.6.0
        args:
        - rule
        - --web.route-prefix=/rule
        - --web.external-prefix=/rule
        - --log.level=debug
        - --eval-interval=15s
        - --rule-file=/etc/rules/thanos-rule.yml
        - --query=dnssrv+thanos-query.default.svc
        - --alertmanagers.url=dns+http://alertmanager.default
        ports:
        - containerPort: 10902
          name: http
        volumeMounts:
          - name: thanos-rule-config
            mountPath: /etc/rules
      volumes:
      - name: thanos-rule-config
        configMap:
          name: thanos-rule-config

Deploy Pushgateway:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pushgateway
  name: pushgateway
spec:
  replicas: 15
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      containers:
      - image: prom/pushgateway:v1.0.0
        name: pushgateway
        ports:
        - containerPort: 9091
          name: http
        resources:
          limits:
            memory: 1Gi
          requests:
            memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: pushgateway
  name: pushgateway
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 9091
    targetPort: http
  selector:
    app: pushgateway

Deploy Alertmanager:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:latest
        args:
          - --web.route-prefix=/alertmanager
          - --config.file=/etc/alertmanager/config.yml
          - --storage.path=/alertmanager
          - --cluster.listen-address=0.0.0.0:8001
          - --cluster.peer=alertmanager-peers.default:8001
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: alertmanager-config
          mountPath: /etc/alertmanager
        - name: alertmanager
          mountPath: /alertmanager
      volumes:
      - name: alertmanager-config
        configMap:
          name: alertmanager-config
      - name: alertmanager
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: alertmanager-peers
  name: alertmanager-peers
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: alertmanager
  ports:
  - name: alertmanager
    protocol: TCP
    port: 9093
    targetPort: 9093

Finally, deploy ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: pushgateway-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: $(DOMAIN)
    http:
      paths:
      - backend:
          serviceName: pushgateway
          servicePort: 9091
        path: /metrics
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: $(DOMAIN)
    http:
      paths:
      - backend:
          serviceName: thanos-query
          servicePort: 10901
        path: /
      - backend:
          serviceName: alertmanager
          servicePort: 9093
        path: /alertmanager
      - backend:
          serviceName: thanos-rule
          servicePort: 10902
        path: /rule
      - backend:
          serviceName: grafana
          servicePort: 3000
        path: /grafana

Visiting the Prometheus address shows that the monitored targets are healthy:
[Figure: Prometheus targets page showing healthy targets]

Source: blog.csdn.net/m0_37886429/article/details/109486145