Prometheus Monitoring of Kubernetes, Part 1: Architecture Survey

Background

With the rapid growth of containerization and microservices, Kubernetes has become the de facto standard for container management. Once we manage containers with Kubernetes, comprehensive monitoring of Kubernetes becomes the first issue we need to explore. We need to monitor Kubernetes resources such as Ingress, Service, Deployment, and Pod so that we can know the internal state of the Kubernetes cluster at any time.

This article is the first in the Prometheus monitoring series, and its purpose is very clear: to find a framework that can be used to monitor a Kubernetes cluster.

Survey of Kubernetes monitoring solutions

  • 1. cAdvisor + InfluxDB + Grafana

  • 2. Heapster + InfluxDB + Grafana

  • 3. Prometheus + kube-state-metrics + Grafana

  • Grafana :
    An open-source dashboard whose backend supports a variety of data sources, such as InfluxDB and Prometheus. It has many plug-ins and powerful features, making it very well suited for visualization.

  • InfluxDB :
    A high-performance open-source time-series database.

  • cAdvisor :
    A container monitoring tool from Google, which is also the container resource collector built into the kubelet. It automatically collects CPU, memory, network, and file-system usage of the containers on the local node and exposes a native cAdvisor API. In older Kubernetes versions it could be exposed separately via the kubelet's --cadvisor-port flag (default 4194; the flag has since been removed).

  • Heapster :
    Because cAdvisor only collects resource usage on a single node, Heapster provides resource monitoring for the entire cluster (before Kubernetes 1.11, the HPA obtained its data from Heapster) and supports persisting the data to InfluxDB.


  • Prometheus :
    Provides powerful data collection, storage, visualization, alerting, and more. It was practically born to support Kubernetes: it is the second project to join the CNCF, the first being Kubernetes. Many of the ideas in Prometheus come from Google's internal monitoring system Borgmon, so it could be called Google's godson.


  • kube-state-metrics is used here as a Prometheus exporter, providing monitoring data for Deployments, DaemonSets, CronJobs, and other resources. It is officially provided by Kubernetes and integrates closely with Prometheus.
    More information about kube-state-metrics: https://github.com/kubernetes/kube-state-metrics

Advantages of Prometheus

Prometheus and Kubernetes are made for each other

A "godson" of Google, maintained and backed by major companies, and, most importantly, it has first-class support for Kubernetes.

Well-defined standards

Prometheus defines a clear standard for application-level monitoring: the application only needs to expose an HTTP endpoint from which its metrics can be scraped.
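As a minimal sketch, pointing Prometheus at such an endpoint only takes a small scrape configuration (the job name and target address below are purely illustrative, not from the original article):

# Hypothetical scrape job for an application exposing /metrics
scrape_configs:
  - job_name: 'my-app'            # illustrative job name
    metrics_path: '/metrics'      # the endpoint the application exposes
    static_configs:
      - targets: ['my-app.example.svc:8080']   # assumed address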

Prometheus can cover monitoring at every level, as listed below (a sample scrape configuration for these layers is sketched after the list):

  • Infrastructure layer: monitor the resources of each host (both Kubernetes nodes and non-Kubernetes nodes), such as CPU, memory, network throughput and bandwidth usage, disk I/O, and disk usage.
  • Middleware layer: monitor middleware deployed independently outside the Kubernetes cluster, such as MySQL, Redis, RabbitMQ, ElasticSearch, Nginx, etc.
  • Kubernetes cluster: monitor the key metrics of the Kubernetes cluster itself.
  • Applications on Kubernetes: monitor the applications deployed on the Kubernetes clusters.
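A rough sketch of how these layers could map to Prometheus jobs, assuming the usual exporters (node_exporter on port 9100 for hosts, mysqld_exporter on port 9104 for MySQL); all addresses are placeholders:

scrape_configs:
  # Infrastructure layer: node_exporter on every host
  - job_name: 'node'
    static_configs:
      - targets: ['node1.example:9100', 'node2.example:9100']
  # Middleware layer: e.g. mysqld_exporter in front of MySQL
  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql.example:9104']
  # Kubernetes cluster and in-cluster applications are discovered
  # via kubernetes_sd_configs (see the cAdvisor sketch later in this article)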

Based on the advantages above, I finally chose Prometheus to monitor the Kubernetes cluster.

Kubernetes cluster monitoring architecture

Before discussing the Prometheus monitoring architecture in detail, let's look at a few practical questions:

  1. What if there are multiple Kubernetes clusters?

  2. How should the monitoring data from multiple Kubernetes clusters be handled?

  3. How should alerts be centralized and deduplicated?

Fortunately, none of these problems is difficult for Prometheus. In the end we adopted the Prometheus + kube-state-metrics + Alertmanager + Grafana architecture for Kubernetes cluster monitoring. The overall architecture of the monitoring system is shown below.

[Architecture diagram: a Prometheus instance inside each business cluster, federated into a central Prometheus, with Alertmanager, Grafana, and a message-sending module]

With this architecture, the three questions above are no longer a problem.

Architecture details

K8s clusters:

k8s cluster-1/-2/-3 are the business clusters that need to be monitored. A Prometheus instance is deployed inside each cluster, consisting mainly of two components: prometheus-server + kube-state-metrics.

prometheus-server: uses a service account with the necessary RBAC permissions to collect the monitoring information already available in the cluster (in fact obtained from cAdvisor) as well as node information.
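One common way to do this, following the style of the official Prometheus Kubernetes example configuration (it assumes the in-cluster service-account token and CA paths), is to discover the nodes and scrape cAdvisor through the API server proxy:

scrape_configs:
  - job_name: 'kubernetes-cadvisor'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Keep the node labels as Prometheus labels
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Scrape through the API server proxy instead of each kubelet directly
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor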

kube-state-metrics: used here as a Prometheus exporter, because Prometheus by itself cannot obtain state metrics for Deployments, Jobs, CronJobs, and other objects in the cluster.
When deploying kube-state-metrics, its Service must carry the annotation prometheus.io/scrape: 'true' (this is very important).
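A minimal Service sketch showing that annotation (the namespace and port are assumptions; note that prometheus.io/scrape is a convention honored by the usual kubernetes-service-endpoints relabeling rules, not something Prometheus understands on its own):

apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system          # assumed namespace
  annotations:
    prometheus.io/scrape: 'true'  # tells the endpoint scrape job to pick this Service up
spec:
  selector:
    app: kube-state-metrics
  ports:
    - name: http-metrics
      port: 8080                  # default kube-state-metrics metrics port
      targetPort: 8080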

Monitoring aggregation

The aggregation layer is itself a Prometheus server, used to collect and manage the monitoring data scattered across the individual clusters.

The core idea is to use Prometheus' federation mechanism to pull data from the Prometheus in each cluster. This way, the per-cluster Prometheus instances only need to retain data for a short period, while the aggregated data is kept long term; alert evaluation and dashboards can also be handled centrally.

The official Prometheus federation example:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s

    # Keep the labels exposed by the source Prometheus servers
    honor_labels: true
    metrics_path: '/federate'

    # Only pull the series selected by these matchers
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"prometheus_job:.*"}'

    static_configs:
      - targets:
        - 'source-prometheus-1:9090'
        - 'source-prometheus-2:9090'
        - 'source-prometheus-3:9090'

The Prometheus that owns this configuration pulls monitoring data from the /federate endpoint of the three Prometheus servers source-prometheus-1 ~ 3. The match[] parameter specifies that only series carrying the label job="prometheus", or whose metric name starts with prometheus_job:, are pulled.
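In a multi-cluster setup like the one described above, one possible refinement (an assumption on my part, not from the original article) is to widen the match[] selector and attach a cluster label to each source, so that data from k8s cluster-1/-2/-3 stays distinguishable after aggregation:

scrape_configs:
  - job_name: 'federate-k8s-cluster-1'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'            # pull everything from the child Prometheus
    static_configs:
      - targets: ['prometheus.k8s-cluster-1.example:9090']   # placeholder address
        labels:
          cluster: 'k8s-cluster-1' # mark which cluster the data came from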

Display panel

The display panel is Grafana, which supports Prometheus as a data source for graphical display.
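For instance, the aggregation Prometheus can be wired into Grafana through Grafana's data source provisioning mechanism; a small sketch (the URL is a placeholder):

# Grafana provisioning file, e.g. /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-aggregator.monitoring.svc:9090   # assumed address of the central Prometheus
    isDefault: true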

Alert handling

Alerting uses the Alertmanager component officially provided by Prometheus. Alertmanager receives alerts from Prometheus-Server and then groups, silences, and routes them. Alertmanager supports a rich set of receivers such as email, WeChat, webhook, and Slack, but here we use our self-developed Send_msg module instead.
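A sketch of how Alertmanager might hand alerts off to such a module through its webhook receiver (the module's URL and the grouping labels are assumptions):

route:
  receiver: 'send-msg'
  group_by: ['alertname', 'cluster']   # group alerts by name and source cluster
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'send-msg'
    webhook_configs:
      - url: 'http://send-msg.monitoring.svc:8080/alert'   # hypothetical Send_msg endpoint
        send_resolved: true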

Message sending

The self-developed message-sending module integrates email, WeChat, DingTalk, SMS, and other channels. In fact, it is used not only for alert notifications but in several other places as well.

Now that the monitoring architecture is clear, the next step is to implement it. For the specific implementation steps, see the second article of the Prometheus series.

End

This article is also the first in the series "Perfectly monitor a Kubernetes cluster using Prometheus". If anything in the article is unclear, feel free to leave a comment.


Origin: blog.csdn.net/Free_time_/article/details/108595101