Installing the Prometheus & Grafana cluster monitoring solution on Kubernetes with a Helm Chart

Install Prometheus & Grafana via Helm Chart

If you just want the installation steps, skip ahead to the "kube-prometheus-stack v46.6.0 installation" chapter.

Prometheus monitoring solution introduction

Prometheus is an open-source systems monitoring and alerting toolkit originally developed at SoundCloud. It is designed for reliable service discovery in dynamic environments and uses a multidimensional data model. Prometheus is a Cloud Native Computing Foundation (CNCF) project and fits naturally with Kubernetes. It provides a full set of capabilities for service discovery, data collection, data analysis, data display, alerting, and custom extension interfaces, and it is currently the de facto standard among open-source monitoring solutions for Kubernetes, offering a large number of tools for monitoring and understanding systems in a Kubernetes environment. The following summarizes the role and functions of Prometheus in Kubernetes monitoring:

  • Data model and query language: Prometheus uses a multidimensional data model and the powerful PromQL query language to perform complex queries and analysis on time series data.
  • Service discovery: Prometheus automatically discovers new services and containers, which makes it a good fit for dynamic environments such as Kubernetes, where it can discover new nodes, services, containers, and more.
  • Many exporters and data formats: Prometheus supports a wide range of exporters and exposition formats, allowing metrics to be collected from different systems and services.
  • Powerful alerting system: Prometheus can send alerts to users based on defined rules, delivered in a variety of ways, including email, Slack, PagerDuty, and more.
  • Grafana integration: Prometheus integrates well with Grafana, which provides a powerful tool for visualizing monitoring data.
  • Efficient performance: Prometheus uses an efficient built-in storage engine that can handle large amounts of time series data on a single server.
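To give a taste of the data model and PromQL mentioned above, here are two common queries. The metric names are standard ones exposed by node-exporter and kube-state-metrics (both part of this stack); they are shown for illustration only:

```promql
# Per-node CPU usage ratio over the last 5 minutes
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Container restarts in the last hour, grouped by namespace
sum by (namespace) (increase(kube_pod_container_status_restarts_total[1h]))
```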

Prometheus Architecture

(architecture diagram)

  • Prometheus Server: the core of the monitoring and alerting platform; it scrapes metrics from targets, generates aggregated data, and stores the time series data
  • exporter: runs on or beside the monitored object and exposes its metrics over an HTTP API for Prometheus to scrape, e.g. node-exporter, blackbox-exporter, redis-exporter, and custom exporters
  • pushgateway: provides a gateway address to which external jobs can push their data; Prometheus then pulls the data from the gateway
  • Alertmanager: receives the alerts fired by Prometheus and, after a series of processing steps, sends them to the configured targets
  • Grafana: configures data sources and displays the data in charts and dashboards

I will focus on the installation based on the Helm Chart. For more functional details, I suggest watching the following videos:
Introduction Video 1
Introduction Video 2

1. kube-prometheus-stack v46.6.0 installation

Official website reference
The installation described below takes actual production use into consideration; the reason a StorageClass is needed is to ensure data persistence.

1.1 Prerequisites

a. Kubernetes version >= 1.16

b. Helm version >=v3.2.0, for the installation of Helm, please refer to: Helm Install

c. A default StorageClass is required. For the specific preparation process, please refer to: Install StorageClass on Kubernetes
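A StorageClass becomes the cluster default via an annotation on the object. A minimal sketch is shown below; the name `nfs-client` matches the one used later in values.yaml, while the provisioner name is illustrative and depends on the provisioner you installed:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: cluster.local/nfs-subdir-external-provisioner   # illustrative provisioner name
```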

1.2 Installation process

a. Create the installation directory

# Switch to the current user's home directory and create a monitor folder
cd ~ && mkdir monitor

cd monitor

b. Create a monitor namespace; a dedicated namespace helps with resource management

kubectl create ns monitor

c. Add the official prometheus-community Helm repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

d. Download the kube-prometheus-stack v46.6.0 chart. Keeping the chart files locally helps us manage the project; the values.yaml file will be modified before installation

helm pull prometheus-community/kube-prometheus-stack --version 46.6.0

tar -xvf kube-prometheus-stack-46.6.0.tgz

cd kube-prometheus-stack

e. Modify the configuration in values.yaml. The configurable content of the kube-prometheus-stack values.yaml runs to more than 3,000 lines, so it is strongly recommended to open values.yaml in a graphical text editor (for example, VS Code). Several commonly needed modifications follow:

# Change the service exposure type to NodePort
Search globally for the keyword "type: ClusterIP" and replace it with "type: NodePort"
# Most of these locations also have nearby Ingress settings; if you want to expose the services through an Ingress instead, configure it there.
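The effect of that search-and-replace, expressed as a values.yaml excerpt, looks roughly like the fragment below. The key paths follow the kube-prometheus-stack values.yaml layout; the nodePort numbers match the ports that appear in the verification output later:

```yaml
prometheus:
  service:
    type: NodePort
    nodePort: 30090
alertmanager:
  service:
    type: NodePort
    nodePort: 30903
```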

# Change the storage type to a StorageClass
Search globally for the keyword "storageClass"; there are 3 occurrences

1. The one used by Alertmanager; change it to the following configuration. Note that storageClassName must be the actual name of a StorageClass in the cluster
    ## Storage is the definition of how storage will be used by the Alertmanager instances.
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
    ##
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ['ReadWriteOnce']
          resources:
            requests:
              storage: 50Gi

2. Prometheus persistent data; change it to the following configuration. Note that storageClassName must be the actual name of a StorageClass in the cluster
    ## Prometheus StorageSpec for persistent data
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
    ##
    storageSpec:
      ## Using PersistentVolumeClaim
      ##
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ['ReadWriteOnce']
          resources:
            requests:
              storage: 50Gi

# Change image addresses
Search globally for the keyword "image:"
# Some images come from quay.io, which is hard to download from mainland China; change them to a reachable address (a private registry, or configure an Alibaba Cloud image mirror)
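For the operator image, for example, the change looks roughly like the fragment below (key paths follow the kube-prometheus-stack values.yaml layout; the registry address is an illustrative mirror, not a recommendation):

```yaml
prometheusOperator:
  image:
    registry: registry.example.com   # illustrative mirror of quay.io
    repository: prometheus-operator/prometheus-operator
```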

The values.yaml file is too long to cover in full here; if you are interested, you can view the other parameter settings in the uploaded values.yaml file and the official documentation.

f. Use the Helm Chart to perform the installation

helm install monitor ~/monitor/kube-prometheus-stack -n monitor

2. Functional verification

2.1 View related resources

kubectl get po,svc,pvc,secret -n monitor
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/alertmanager-monitor-kube-prometheus-st-alertmanager-0   2/2     Running   0          58m
pod/monitor-grafana-5c955b9765-2h7xd                         3/3     Running   0          58m
pod/monitor-kube-prometheus-st-operator-5bcdfff95c-mtdl2     1/1     Running   0          58m
pod/monitor-kube-state-metrics-6fd4d95d46-7tqfh              1/1     Running   0          58m
pod/monitor-prometheus-node-exporter-7hk2q                   1/1     Running   0          58m
pod/monitor-prometheus-node-exporter-h89ll                   1/1     Running   0          58m
pod/monitor-prometheus-node-exporter-jp9cp                   1/1     Running   0          58m
pod/monitor-prometheus-node-exporter-r7pd6                   1/1     Running   0          58m
pod/prometheus-monitor-kube-prometheus-st-prometheus-0       2/2     Running   0          58m

NAME                                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   58m
service/monitor-grafana                           NodePort    10.233.47.220   <none>        80:31721/TCP                 58m
service/monitor-kube-prometheus-st-alertmanager   NodePort    10.233.245.52   <none>        9093:30903/TCP               58m
service/monitor-kube-prometheus-st-operator       NodePort    10.233.99.28    <none>        443:30443/TCP                58m
service/monitor-kube-prometheus-st-prometheus     NodePort    10.233.61.216   <none>        9090:30090/TCP               58m
service/monitor-kube-state-metrics                ClusterIP   10.233.211.59   <none>        8080/TCP                     58m
service/monitor-prometheus-node-exporter          ClusterIP   10.233.82.51    <none>        9100/TCP                     58m
service/prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     58m

NAME                                                                                                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/alertmanager-monitor-kube-prometheus-st-alertmanager-db-alertmanager-monitor-kube-prometheus-st-alertmanager-0   Bound    pvc-35ffe0d7-bdf9-4bfb-8a1d-a66f45e68b35   50Gi       RWO            nfs-client     58m
persistentvolumeclaim/prometheus-monitor-kube-prometheus-st-prometheus-db-prometheus-monitor-kube-prometheus-st-prometheus-0           Bound    pvc-9d4a2ab7-44b0-40e4-8248-03e26ff16f49   50Gi       RWO            nfs-client     58m

NAME                                                                       TYPE                                  DATA   AGE
secret/alertmanager-monitor-kube-prometheus-st-alertmanager                Opaque                                1      58m
secret/alertmanager-monitor-kube-prometheus-st-alertmanager-generated      Opaque                                1      58m
secret/alertmanager-monitor-kube-prometheus-st-alertmanager-tls-assets-0   Opaque                                0      58m
secret/alertmanager-monitor-kube-prometheus-st-alertmanager-web-config     Opaque                                1      58m
secret/default-token-vzn75                                                 kubernetes.io/service-account-token   3      112m
secret/monitor-grafana                                                     Opaque                                3      58m
secret/monitor-grafana-token-ttrxz                                         kubernetes.io/service-account-token   3      58m
secret/monitor-kube-prometheus-st-admission                                Opaque                                3      75m
secret/monitor-kube-prometheus-st-alertmanager-token-ff894                 kubernetes.io/service-account-token   3      58m
secret/monitor-kube-prometheus-st-operator-token-l5fxx                     kubernetes.io/service-account-token   3      58m
secret/monitor-kube-prometheus-st-prometheus-token-twvtr                   kubernetes.io/service-account-token   3      58m
secret/monitor-kube-state-metrics-token-rckng                              kubernetes.io/service-account-token   3      58m
secret/monitor-prometheus-node-exporter-token-t4xjq                        kubernetes.io/service-account-token   3      58m
secret/prometheus-monitor-kube-prometheus-st-prometheus                    Opaque                                1      58m
secret/prometheus-monitor-kube-prometheus-st-prometheus-tls-assets-0       Opaque                                1      58m
secret/prometheus-monitor-kube-prometheus-st-prometheus-web-config         Opaque                                1      58m
secret/sh.helm.release.v1.monitor.v1                                       helm.sh/release.v1                    1      59m

2.2 Modify the exposure type of the monitor-grafana service

The default exposure type of monitor-grafana is ClusterIP (its type is not covered by the values.yaml changes above)

kubectl edit svc monitor-grafana  -n monitor

# Change spec.type to NodePort

2.3 Get grafana interface username and password

The username and password are stored in secret/monitor-grafana

# Get the password
kubectl -n monitor get secret monitor-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

# Get the username
kubectl -n monitor get secret monitor-grafana -o jsonpath="{.data.admin-user}" | base64 --decode ; echo
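The pipe through base64 --decode is needed because Kubernetes stores secret values base64-encoded in the .data field. A quick local illustration of that decode step (the value here is a stand-in, not the real password):

```shell
# Simulate what is stored in the secret's .data field: base64 of the raw value
encoded=$(printf 'example-password' | base64)

# Decoding recovers the original value, exactly as in the commands above
printf '%s' "$encoded" | base64 --decode; echo
```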

2.4 Log in to the Prometheus interface

Open the NodePort of the monitor-kube-prometheus-st-prometheus service (30090 above) and go to the Targets page:
(Targets page screenshot)
You can see that key components such as api-server, coredns, kubelet, and node-exporter have been added.

2.5 Log in to the Grafana interface

Use the NodePort set in step 2.2 and the username and password obtained in step 2.3.

a. Commonly used plugins have already been added on the Plugins page
(Plugins page screenshot)
b. Commonly used dashboards have already been added on the Dashboards page
(Dashboards page screenshot)
c. Check the resource status of the pods in the cluster
(pod resource dashboard screenshot)

Origin blog.csdn.net/weixin_46660849/article/details/131048678