Building a Prometheus monitoring platform in a k8s cluster

Basic architecture

Prometheus, originally developed at SoundCloud and written in Go, is an open-source monitoring and alerting toolkit with a built-in time series database.

The basic principle of Prometheus is to periodically scrape the status of monitored components over HTTP. Any component can be brought under monitoring as long as it provides a suitable HTTP interface; no SDK or other integration process is required. This makes it a good fit for monitoring virtualized environments such as VMs, Docker and Kubernetes.

The main component functions of Prometheus are as follows:

  • Prometheus Server: periodically pulls data from statically configured targets or from targets found via service discovery (mainly DNS, Consul, Kubernetes, Mesos, etc.).
  • Exporter: exposes metrics for the Prometheus server to collect. Different targets are covered by different exporters, e.g. node_exporter for host monitoring and mysqld_exporter for MySQL.
  • Pushgateway: besides pulling from exporters, Prometheus also supports a push path: a service pushes its metrics to the Pushgateway first, and the server then pulls them from the Pushgateway.
  • Alertmanager: implements the alerting side of Prometheus.
  • Web UI: dashboards and visualization are usually provided by Grafana.

Our basic workflow in practice is:
each service exposes monitoring data through its corresponding collector (such as the Exporters mentioned below) --> Prometheus Server scrapes and stores the data on a schedule --> Grafana is configured to display the data, and alerting rules are configured to raise alerts.

Deploying the Prometheus platform with Helm

Use helm to deploy kube-prometheus-stack.
Helm chart: prometheus-community/kube-prometheus-stack (repository https://prometheus-community.github.io/helm-charts)
GitHub: the prometheus-community/helm-charts repository

First, install the helm tool on the server (installation itself is not covered here; there are plenty of tutorials online). Installing prometheus with helm then boils down to:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/kube-prometheus-stack
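
For example, a minimal sketch that uses the release name prometheus and the namespace monitoring, the names assumed by the commands later in this article:

$ helm install prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace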

Exporter

To collect monitoring data from a target, a collection component, called an Exporter, must first be installed on or alongside the target. Many such exporters are listed on the prometheus.io website, including the official exporter list.

How does the data reach Prometheus after collection?

An Exporter exposes an HTTP interface, and Prometheus pulls data from it in Pull mode, periodically scraping the monitored component over HTTP.
Prometheus also supports a Push mode: data can be pushed to a Pushgateway, and Prometheus then pulls it from the Pushgateway.
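
As a minimal sketch of the Push mode in Go (the Pushgateway address, job name and metric below are made-up examples, not part of this article's setup):

package main

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

func main() {
	// A short-lived job (e.g. a cron task) records how many records it processed.
	processed := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "batch_job_records_processed_total",
		Help: "Records processed by the batch job.",
	})
	processed.Add(42)

	// Push the metric to the Pushgateway; Prometheus then pulls it from there.
	if err := push.New("http://pushgateway:9091", "batch_job").
		Collector(processed).
		Grouping("instance", "worker-1").
		Push(); err != nil {
		log.Fatal(err)
	}
}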

Integrating the collection components into Go applications

Kratos framework

An example of connecting the Prometheus collection component to the microservice framework Kratos (see the Kratos official tutorial):

package main

import (
	"context"
	"fmt"
	"log"

	prom "github.com/go-kratos/kratos/contrib/metrics/prometheus/v2"
	"github.com/go-kratos/kratos/v2/middleware/metrics"
	"github.com/prometheus/client_golang/prometheus/promhttp"

	"github.com/go-kratos/examples/helloworld/helloworld"
	"github.com/go-kratos/kratos/v2"
	"github.com/go-kratos/kratos/v2/transport/grpc"
	"github.com/go-kratos/kratos/v2/transport/http"
	"github.com/prometheus/client_golang/prometheus"
)

// go build -ldflags "-X main.Version=x.y.z"
var (
	// Name is the name of the compiled software.
	Name = "metrics"
	// Version is the version of the compiled software.
	// Version = "v1.0.0"

	_metricSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Namespace: "server",
		Subsystem: "requests",
		Name:      "duration_sec",
		Help:      "server requests duration(sec).",
		Buckets:   []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.250, 0.5, 1},
	}, []string{"kind", "operation"})

	_metricRequests = prometheus.NewCounterVec(prometheus.CounterOpts{
		Namespace: "client",
		Subsystem: "requests",
		Name:      "code_total",
		Help:      "The total number of processed requests",
	}, []string{"kind", "operation", "code", "reason"})
)

// server is used to implement helloworld.GreeterServer.
type server struct {
	helloworld.UnimplementedGreeterServer
}

// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(ctx context.Context, in *helloworld.HelloRequest) (*helloworld.HelloReply, error) {
	return &helloworld.HelloReply{Message: fmt.Sprintf("Hello %+v", in.Name)}, nil
}

func init() {
	prometheus.MustRegister(_metricSeconds, _metricRequests)
}

func main() {
	grpcSrv := grpc.NewServer(
		grpc.Address(":9000"),
		grpc.Middleware(
			metrics.Server(
				metrics.WithSeconds(prom.NewHistogram(_metricSeconds)),
				metrics.WithRequests(prom.NewCounter(_metricRequests)),
			),
		),
	)
	httpSrv := http.NewServer(
		http.Address(":8000"),
		http.Middleware(
			metrics.Server(
				metrics.WithSeconds(prom.NewHistogram(_metricSeconds)),
				metrics.WithRequests(prom.NewCounter(_metricRequests)),
			),
		),
	)
	httpSrv.Handle("/metrics", promhttp.Handler())

	s := &server{}
	helloworld.RegisterGreeterServer(grpcSrv, s)
	helloworld.RegisterGreeterHTTPServer(httpSrv, s)

	app := kratos.New(
		kratos.Name(Name),
		kratos.Server(
			httpSrv,
			grpcSrv,
		),
	)

	if err := app.Run(); err != nil {
		log.Fatal(err)
	}
}

Finally, an HTTP endpoint is exposed at http://127.0.0.1:8000/metrics, from which Prometheus can pull monitoring data.
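
Once Prometheus scrapes this endpoint, the metrics defined above can be queried with PromQL, for example (sketches based on the metric names in the code, server_requests_duration_sec and client_requests_code_total):

# request rate per operation over the last 5 minutes
sum(rate(client_requests_code_total[5m])) by (operation)

# 99th percentile request latency per operation
histogram_quantile(0.99, sum(rate(server_requests_duration_sec_bucket[5m])) by (operation, le))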

Gin framework

Example of connecting the Prometheus collection component to the lightweight HTTP framework Gin:

package main

import (
	"strconv"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	handler = promhttp.Handler()

	// _metricSeconds records the duration of server requests.
	_metricSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Namespace: "server",
		Subsystem: "requests",
		Name:      "duration_sec",
		Help:      "server requests duration(sec).",
		Buckets:   []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.250, 0.5, 1},
	}, []string{"method", "path"})

	// _metricRequests counts processed requests by status code.
	_metricRequests = prometheus.NewCounterVec(prometheus.CounterOpts{
		Namespace: "client",
		Subsystem: "requests",
		Name:      "code_total",
		Help:      "The total number of processed requests",
	}, []string{"method", "path", "code"})
)

func init() {
	prometheus.MustRegister(_metricSeconds, _metricRequests)
}

// HandlerMetrics exposes the Prometheus metrics endpoint as a gin handler.
func HandlerMetrics() func(c *gin.Context) {
	return func(c *gin.Context) {
		handler.ServeHTTP(c.Writer, c.Request)
	}
}

// WithProm is a gin middleware that records request duration and counts requests by status code.
func WithProm() gin.HandlerFunc {
	return func(c *gin.Context) {
		var (
			method string
			path   string
			code   int
		)
		startTime := time.Now()

		method = c.Request.Method
		path = c.Request.URL.Path

		c.Next()

		code = c.Writer.Status()

		_metricSeconds.WithLabelValues(method, path).Observe(time.Since(startTime).Seconds())
		_metricRequests.WithLabelValues(method, path, strconv.Itoa(code)).Inc()
	}
}

func main() {
	r := gin.Default()
	r.Use(WithProm())
	r.GET("/ping", func(c *gin.Context) {
		c.JSON(200, gin.H{
			"message": "pong",
		})
	})
	r.GET("/metrics", HandlerMetrics())
	r.Run() // listen and serve on 0.0.0.0:8080
}

Finally, an HTTP endpoint is exposed at http://127.0.0.1:8080/metrics, from which Prometheus can pull monitoring data.
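
Before wiring the service into Prometheus, the endpoint can be checked locally with curl (the metric names follow the definitions in the code above):

$ curl -s http://127.0.0.1:8080/ping
$ curl -s http://127.0.0.1:8080/metrics | grep -E 'server_requests_duration_sec|client_requests_code_total'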

Scraping data sources outside the cluster

Background: kube-prometheus-stack was deployed with helm in an existing k8s cluster to monitor servers and services. Nodes, pods and the other in-cluster components are already connected to Prometheus; now application services deployed outside the k8s cluster also need to be connected to Prometheus.

When Prometheus scrapes data from outside the k8s cluster, there are the following options:

  • ServiceMonitor
  • Additional Scrape Configuration

ServiceMonitor

ServiceMonitor is a CRD that defines the service endpoints Prometheus should scrape and the scrape interval.
To monitor services outside the cluster through a ServiceMonitor, you need to configure a Service, an Endpoints object and a ServiceMonitor.

Suppose a backend service is already deployed at 192.168.1.100:8000 and exposes its monitoring metrics at /metrics. To connect it to Prometheus, the steps are as follows:

Enter at the command line

$ touch external-application.yaml

$ vim external-application.yaml

Then copy the contents of the following yaml file into it

---
apiVersion: v1
kind: Service
metadata:
  name: external-application-exporter
  namespace: monitoring
  labels:
    app: external-application-exporter
    app.kubernetes.io/name: application-exporter
spec:
  type: ClusterIP
  ports:
  - name: metrics
    port: 9101
    protocol: TCP
    targetPort: 9101
---
apiVersion: v1
kind: Endpoints
metadata:
    name: external-application-exporter
    namespace: monitoring
    labels:
      app: external-application-exporter
      app.kubernetes.io/name: application-exporter
subsets:
- addresses:
  - ip: 192.168.1.100  # external resource list
  ports:
  - name: metrics
    port: 8000
- addresses:
  - ip: 192.168.1.100  # external resource list 2
  ports:
  - name: metrics
    port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: external-application-exporter
  namespace: monitoring
  labels:
    app: external-application-exporter
    release: prometheus
spec:
  selector:
    matchLabels:            # Service label selector
      app: external-application-exporter
  namespaceSelector:        # Namespace selector
    matchNames:
    - monitoring
  endpoints:
  - port: metrics           # port to scrape (port name defined in the Service)
    interval: 10s           # scrape interval, configure according to actual needs
    path: /metrics          # metrics path, defaults to /metrics

After saving the file, run the command:

kubectl apply -f external-application.yaml

Then open the Prometheus console and go to the Targets page; the new external-application-exporter target is displayed.
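
If the new target does not appear, it usually helps to confirm that all three objects exist and that the ServiceMonitor label selector matches the Service labels (with kube-prometheus-stack's default settings, the release: prometheus label on the ServiceMonitor is also typically required):

$ kubectl get svc,endpoints,servicemonitor -n monitoring | grep external-application-exporter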


Additional Scrape Configuration

Besides the HTTP services exposed by IP and port, there are also HTTPS services deployed on other servers that are reached through domain names. Now these should be connected in a similar way.

The first attempt was to modify the Endpoints object, but the official k8s documentation shows that Endpoints only supports IP addresses; there is no way to configure an HTTPS scheme or a domain name there.
So let's try another approach.

The first method

First, check the official Prometheus documentation for the section on scrape configuration; the relevant keyword is scrape_config.
Our Prometheus was deployed with helm from the kube-prometheus-stack chart, so check the chart's values.yaml to see whether this can be configured there.

Enter the command:

$ cat values.yaml  | grep -C 20  scrape_config

As the comments in values.yaml explain, kube-prometheus-stack accepts extra scrape configuration through additionalScrapeConfigs.

So I wrote a configuration file to update the release of prometheus that helm had deployed.

$ touch prometheus.yaml

$ vim prometheus.yaml

Write the following content:

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: external-application-exporter-https
        scrape_interval: 10s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: https
        tls_config:
          insecure_skip_verify: true
        static_configs:
          - targets: ["www.baidu.com:443"]

Finally, update the release:

$ helm upgrade -nmonitoring -f prometheus.yaml prometheus kube-prometheus-stack-40.0.0.tgz

The release is updated with prometheus.yaml; kube-prometheus-stack-40.0.0.tgz is the chart file that was pulled locally with helm when Prometheus was first deployed.

The newly added data source now appears on the Targets page of the Prometheus console.

This could end here, but one drawback is that every time a new domain name is added for monitoring, the helm release has to be upgraded again, which is not particularly convenient.

The second method

Looking through the prometheus-operator source repository, its documentation describes how to hot-reload additional scrape configuration. In short: the extra scrape targets are kept in a secret, and when the secret's content is modified, the Prometheus scrape configuration is hot-reloaded.


Step 1: create the prometheus-additional.yaml file
$ touch prometheus-additional.yaml

$ vim prometheus-additional.yaml

prometheus-additional.yaml content:

- job_name: external-application-exporter-https
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets: ["www.baidu.com:443"]

Step 2: generate the secret

Generate the configuration file used to create the secret:

$ kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run=client -oyaml > additional-scrape-configs.yaml

$ cat additional-scrape-configs.yaml

You can see that the generated additional-scrape-configs.yaml content is as follows:

apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogZXh0ZXJuYWwtYXBwbGljYXRpb24tZXhwb3J0ZXItaHR0cHMKICBzY3JhcGVfaW50ZXJ2YWw6IDEwcwogIHNjcmFwZV90aW1lb3V0OiAxMHMKICBtZXRyaWNzX3BhdGg6IC9tZXRyaWNzCiAgc2NoZW1lOiBodHRwcwogIHRsc19jb25maWc6CiAgICBpbnNlY3VyZV9za2lwX3ZlcmlmeTogdHJ1ZQogIHN0YXRpY19jb25maWdzOgogICAgLSB0YXJnZXRzOiBbImNpYW10ZXN0LnNtb2EuY2M6NDQzIl0K
kind: Secret
metadata:
  creationTimestamp: null
  name: additional-scrape-configs

Decode the base64 string to double-check the content:

$ echo "LSBqb2JfbmFtZTogZXh0ZXJuYWwtYXBwbGljYXRpb24tZXhwb3J0ZXItaHR0cHMKICBzY3JhcGVfaW50ZXJ2YWw6IDEwcwogIHNjcmFwZV90aW1lb3V0OiAxMHMKICBtZXRyaWNzX3BhdGg6IC9tZXRyaWNzCiAgc2NoZW1lOiBodHRwcwogIHRsc19jb25maWc6CiAgICBpbnNlY3VyZV9za2lwX3ZlcmlmeTogdHJ1ZQogIHN0YXRpY19jb25maWdzOgogICAgLSB0YXJnZXRzOiBbImNpYW10ZXN0LnNtb2EuY2M6NDQzIl0K" | base64 -d

which gives:

- job_name: external-application-exporter-https
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets: ["www.baidu.com:443"]

You can confirm that the configuration file is generated correctly, and then generate the secret:

$ kubectl apply -f additional-scrape-configs.yaml -n monitoring

monitoring is the namespace where Prometheus is deployed; the secret must live in the same namespace as Prometheus.

Confirm that the secret is generated:

$ kubectl get secret -n monitoring

The additional-scrape-configs secret should appear in the output.

Finally, modify the CRD

Finally, reference this additional configuration from the Prometheus custom resource.

Following the official documentation, the Prometheus object can be modified as follows.
First find the Prometheus custom resource:

$ kubectl get prometheus -n monitoring
NAME                                    VERSION   REPLICAS   AGE
prometheus-kube-prometheus-prometheus   v2.38.0   1          2d18h

Then edit it:

$ kubectl edit prometheus prometheus-kube-prometheus-prometheus -n monitoring
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    prometheus: prometheus
spec:
  ...
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
  ...

Finally, check the effect in the prometheus console:
the domain-name-based service is now being monitored. If other domains need to be monitored later, only the secret has to be modified. Great!
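
To double-check from the command line that the hot-reloaded scrape configuration was actually picked up, the Prometheus HTTP API can be queried through a port-forward (a sketch; the Service name below follows the prometheus release name used in this article):

$ kubectl -n monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 &
$ curl -s http://localhost:9090/api/v1/targets | grep external-application-exporter-https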

Alerting

For alerting we use the prometheus + alertmanager solution. The overall flow is: Prometheus evaluates alerting rules against the collected metrics, fires alerts to Alertmanager, and Alertmanager routes the alert messages to the receivers.

Our business requirement is to be notified when a service goes down so that it can be handled promptly. The alerting rule configured here therefore watches application liveness: when an instance is detected as not alive, the alert enters the pending state; once it has stayed pending past a time threshold, it changes to firing and triggers an alert. The alert is submitted to Alertmanager, which then, according to its routing rules, sends the message to the configured receivers (WeCom, DingTalk, email, etc.).

The specific methods are as follows:

Step 1: prometheus alert trigger

Reference: kube-prometheus-stack alarm configuration

Since kube-prometheus-stack was deployed with helm, to keep versions consistent the chart kube-prometheus-stack-40.0.0.tgz was downloaded locally in advance (helm pull prometheus-community/kube-prometheus-stack --version=40.0.0). After unpacking the chart, the following PrometheusRule-related entries can be found in its values.yaml:

## Deprecated way to provide custom recording or alerting rules to be deployed into the cluster.
##
# additionalPrometheusRules: []
#  - name: my-rule-file
#    groups:
#      - name: my_group
#        rules:
#        - record: my_record
#          expr: 100 * my_record

## Provide custom recording or alerting rules to be deployed into the cluster.
##
#additionalPrometheusRulesMap: {}
#  rule-name:
#    groups:
#    - name: my_group
#      rules:
#      - record: my_record
#        expr: 100 * my_record

Modify values.yaml:

## Deprecated way to provide custom recording or alerting rules to be deployed into the cluster.
##
# additionalPrometheusRules: []
#  - name: my-rule-file
#    groups:
#      - name: my_group
#        rules:
#        - record: my_record
#          expr: 100 * my_record

## Provide custom recording or alerting rules to be deployed into the cluster.
##
additionalPrometheusRulesMap: 
  rule-name:
    groups:
    - name: Instance
      rules:
        # Alert for any instance that is unreachable for >5 minutes.
        - alert: InstanceDown
          expr: up == 0
          for: 5m
          labels:
            severity: page
          annotations:
            summary: "Instance {
    
    { $labels.instance }} down"
            description: "{
    
    { $labels.instance }} of job {
    
    { $labels.job }} has been down for more than 5 minutes."

Then update the helm release:

helm upgrade -nmonitoring prometheus --values=values.yaml  ../kube-prometheus-stack-40.0.0.tgz

After the update completes, check the result on the prometheus console:
the alert rule has been configured successfully. According to this rule, whenever an instance's up metric equals 0 (the instance is not up), the alert state changes to pending; if the instance has not recovered after 5 minutes, the state changes to firing and an alert is sent.
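
The state of this rule can also be inspected with PromQL, since Prometheus exposes active alerts through the built-in ALERTS series:

# instances currently pending or firing for the InstanceDown rule
ALERTS{alertname="InstanceDown"}

# the expression behind the rule: targets whose scrape is failing
up == 0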

Step 2: alertmanager alert notification

Reference: kube-prometheus-stack Configuring AlertManager

After Prometheus fires an alert, it is sent to Alertmanager for unified handling. Alertmanager distributes alert messages to different receivers according to configured rules. In kube-prometheus-stack's values.yaml the relevant entry is alertmanager.config, which lets you customize the Alertmanager receivers. The original configuration is as follows:

## Configuration for alertmanager
## ref: https://prometheus.io/docs/alerting/alertmanager/
##
alertmanager:
...
  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
    inhibit_rules:
      - source_matchers:
          - 'severity = critical'
        target_matchers:
          - 'severity =~ warning|info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'severity = warning'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'alertname = InfoInhibitor'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
    route:
      group_by: ['namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
      - receiver: 'null'
        matchers:
          - alertname =~ "InfoInhibitor|Watchdog"
    receivers:
    - name: 'null'
    templates:
    - '/etc/alertmanager/config/*.tmpl'

We modify it to:

## Configuration for alertmanager
## ref: https://prometheus.io/docs/alerting/alertmanager/
##
alertmanager:
...
  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
    inhibit_rules:
      - source_matchers:
          - 'severity = critical'
        target_matchers:
          - 'severity =~ warning|info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'severity = warning'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
          - 'alertname'
      - source_matchers:
          - 'alertname = InfoInhibitor'
        target_matchers:
          - 'severity = info'
        equal:
          - 'namespace'
    route:
      group_by: ['instance']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'wx-webhook'
      routes:
    receivers:
    - name: 'wx-webhook'
      webhook_configs: 
      - url: "http://wx-webhook:80/adapter/wx"
        send_resolved: true
    templates:
    - '/etc/alertmanager/config/*.tmpl'

The address in webhook_configs[0].url, "http://wx-webhook:80/adapter/wx", is the webhook adapter for the WeCom (enterprise WeChat) group robot that receives the alert messages. Setting up this group robot webhook is explained in detail next.

Then update the helm release:

helm upgrade -nmonitoring prometheus --values=values.yaml  ../kube-prometheus-stack-40.0.0.tgz

After the configuration is complete, shut down a service and check the result in the enterprise WeChat group.


Step 3: Build an enterprise WeChat group robot webhook

Reference: prometheus alarms through enterprise WeChat robots

Create a Qiwei (WeCom) group robot

In the group settings, open the group robot function, add a group robot, and copy the Webhook address of the newly added robot.

Write the Deployment configuration file wx-webhook-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wx-webhook
  labels:
    app: wx-webhook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wx-webhook
  template:
    metadata:
      labels:
        app: wx-webhook
    spec:
      containers:
      - name: wx-webhook
        image: guyongquan/webhook-adapter:latest
        imagePullPolicy: IfNotPresent
        args: ["--adapter=/app/prometheusalert/wx.js=/wx=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxxxxxxxxxxxxxxxx"]
        ports:
        - containerPort: 80

---
apiVersion: v1
kind: Service
metadata:
  name: wx-webhook
  labels:
    app: wx-webhook
spec:
  selector:
    app: wx-webhook
  ports:
    - name: wx-webhook
      port: 80
      protocol: TCP
      targetPort: 80
      nodePort: 30904
  type: NodePort

The key=xxxxxxxxxxxxxxxxxxxxxx part of args is the Webhook address of the Qiwei robot created in the previous step. Then run the commands:

$ kubectl apply -f wx-webhook-deployment.yaml -nmonitoring
$ kubectl get pod -n monitoring | grep wx-webhook
wx-webhook-78d4dc95fc-9nsjn                              1/1     Running   0                26d
$ kubectl get service -n monitoring | grep wx-webhook
wx-webhook          NodePort    10.106.111.183   <none>        80:30904/TCP                 27d

In this way, the establishment of the enterprise WeChat group robot webhook is completed.
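
The robot webhook itself can also be smoke-tested independently of Alertmanager by posting a plain text message directly to the WeCom webhook URL (replace the key with your own):

$ curl -XPOST 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxxxxxxxxxxxxxxxxx' \
    -H 'Content-Type: application/json' \
    -d '{"msgtype": "text", "text": {"content": "alertmanager webhook test"}}'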

Here I use enterprise WeChat as the receiver of the alert messages; alertmanager also supports other receivers. See: Detailed explanation of kube-prometheus monitoring alarms (email, DingTalk, WeChat, Qiwei Robot, self-research platform)

Problems encountered

  1. After updating the scrape configuration secret, no effect is visible on the prometheus console.
    Trying to restart the pod prometheus-prometheus-kube-prometheus-prometheus-0 gives the error:

ts=2023-07-29T09:30:54.188Z caller=main.go:454 level=error msg="Error loading config (--config.file=/etc/prometheus/config_out/prometheus.env.yaml)" file=/etc/prometheus/config_out/prometheus.env.yaml err="parsing YAML file /etc/prometheus/config_out/prometheus.env.yaml: scrape timeout greater than scrape interval for scrape config with job name \"external-application-exporter-https\""

The cause is a misconfigured custom scrape job: scrape_timeout was greater than scrape_interval, which makes Prometheus fail to load its configuration.

- job_name: external-application-exporter-https
  scrape_interval: 10s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets: ["www.baidu.com:443"]

It needs to be changed to:

- job_name: external-application-exporter-https
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets: ["www.baidu.com:443"]
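
To catch this kind of error before it reaches the cluster, the snippet can be validated locally by wrapping it in a minimal Prometheus configuration and running promtool (a sketch, assuming promtool is installed locally; it reports the same "scrape timeout greater than scrape interval" error):

$ cat > /tmp/check-scrape.yml <<'EOF'
scrape_configs:
- job_name: external-application-exporter-https
  scrape_interval: 10s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets: ["www.baidu.com:443"]
EOF
$ promtool check config /tmp/check-scrape.yml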

References

  1. Getting started with Grafana & prometheus
  2. Prometheus monitoring + Grafana + Alertmanager alarm installation and use (detailed picture and text explanation)
  3. Prometheus official tutorial
  4. Helm repository
  5. Github address of kube-prometheus project
  6. kratos official tutorial
  7. K8s official documentation
  8. Source code of prometheus-operator
  9. kube-prometheus-stack alarm configuration
  10. kube-prometheus-stack configure AlertManager
  11. prometheus alarms through enterprise WeChat robot
  12. Detailed explanation of kube-prometheus monitoring alarms (email, DingTalk, WeChat, Qiwei Robot, self-research platform)

Origin: blog.csdn.net/qq_26356861/article/details/131997852