K8s cluster monitoring: cAdvisor + Prometheus + Grafana deployment

Table of contents

1. Create a new namespace monitor

2. Deployment

2.1 Deploy cAdvisor

2.2 Deploy node-exporter

2.3 Deploy Prometheus

2.4 Deploy RBAC permissions

2.5 Deploy kube-state-metrics

2.6 Deploy Grafana

3. Test the monitoring setup


Reference article:

"k8s cluster deployment cadvisor+node-exporter+prometheus+grafana monitoring system" - cyh00001 - 博客园 (cnblogs)

Preparation:

Cluster node overview:

master: 192.168.136.21 (all of the following steps are performed on this node)

worker: 192.168.136.22

worker: 192.168.136.23

## Tip: if vim mangles indentation when pasting, run :set paste before pasting and :set nopaste afterwards to return to the default. ##

1. Create a new namespace monitor

kubectl create ns monitor

Pull the cAdvisor image. The official image is hosted on Google's registry, which cannot be reached from mainland China, so a third-party mirror is used here; note that the image name is lagoudocker/cadvisor:v0.37.0.

docker pull lagoudocker/cadvisor:v0.37.0 
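The DaemonSet below runs on all three nodes, so the image must be available on each of them. With imagePullPolicy: IfNotPresent every node pulls it on first use anyway, but pre-pulling avoids the initial delay; a sketch, assuming root ssh access from the master to the workers:

# pre-pull the image on the workers (the master already pulled it above)
for h in 192.168.136.22 192.168.136.23; do
  ssh root@$h docker pull lagoudocker/cadvisor:v0.37.0
done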

2. Deployment

Create a new directory /opt/cadvisor_prome_gra; the deployment involves quite a few manifests, so keep them together in a dedicated directory, as shown below.
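All later vim and kubectl commands assume this working directory:

mkdir -p /opt/cadvisor_prome_gra
cd /opt/cadvisor_prome_gra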

2.1 Deploy cAdvisor

Deploy cAdvisor as a DaemonSet. A DaemonSet ensures that every node in the cluster runs one copy of the pod, and nodes added later automatically receive one as well.

vim case1-daemonset-deploy-cadvisor.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations:    # tolerate the master taint (NoSchedule) so the pod also runs on master
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always   # restart policy
      containers:
      - name: cadvisor
        image: lagoudocker/cadvisor:v0.37.0
        imagePullPolicy: IfNotPresent  # image pull policy
        ports:
        - containerPort: 8080
        volumeMounts:
          - name: root
            mountPath: /rootfs
          - name: run
            mountPath: /var/run
          - name: sys
            mountPath: /sys
          - name: docker
            mountPath: /var/lib/containerd
      volumes:
      - name: root
        hostPath:
          path: /
      - name: run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/containerd

kubectl apply -f case1-daemonset-deploy-cadvisor.yaml

Check the pods:

kubectl get pod -n monitor -o wide

Because there are three nodes, there will be three pods. If a worker node is added later, the DaemonSet will automatically schedule a pod on it.

Test cAdvisor at <masterIP>:8080 (the DaemonSet uses hostNetwork, so every node serves cAdvisor on its own port 8080).
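The same check works from the shell (any node's IP works, since the DaemonSet covers them all):

# cAdvisor serves a Prometheus /metrics endpoint on port 8080
curl -s http://192.168.136.21:8080/metrics | head -n 5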

2.2 Deploy node-exporter

Deploy node-exporter as a DaemonSet together with a Service.

vim case2-daemonset-deploy-node-exporter.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
      - image: prom/node-exporter:v1.3.1 
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitor
spec:
  type: NodePort
  ports:
  - name: http
    port: 9100
    nodePort: 39100
    protocol: TCP
  selector:
    k8s-app: node-exporter

kubectl apply -f case2-daemonset-deploy-node-exporter.yaml

kubectl get pod -n monitor

Note: nodePort 39100 lies outside the default NodePort range (30000-32767), so the apiserver will reject the Service unless its --service-node-port-range has been extended; the verification below goes through hostPort 9100, which works either way.

Verify the node-exporter data on port 9100: <nodeIP>:9100
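A quick shell check against any node; node_load1 is one of node-exporter's standard metrics:

curl -s http://192.168.136.22:9100/metrics | grep '^node_load1 '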

2.3 Deploy Prometheus

The Prometheus deployment consists of a ConfigMap, a Deployment, and a Service.

vim case3-1-prometheus-cfg.yaml

---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor 
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name
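Don't forget to actually apply the ConfigMap (the original flow jumps straight to case3-2). A quick way to confirm the embedded prometheus.yml landed intact:

kubectl apply -f case3-1-prometheus-cfg.yaml

# print the first lines of the embedded config back out of the cluster
kubectl get configmap prometheus-config -n monitor -o jsonpath='{.data.prometheus\.yml}' | head -n 10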

Note: in the case3-2 manifest, nodeName must stay as the node's registered name (k8s-master); it cannot be replaced with the host IP (see the note after the manifest for what happens).

Prometheus data is stored on the 192.168.136.21 (k8s-master) node under the hostPath /data/prometheusdata; because the volume uses type: Directory, the directory must exist before the pod starts.
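A minimal preparation sketch, run on k8s-master; the chown assumes the default non-root user of the prom/prometheus image (nobody, uid 65534):

# hostPath volumes with type: Directory are not auto-created
mkdir -p /data/prometheusdata
# let the container's "nobody" user write to it
chown 65534:65534 /data/prometheusdata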

vim case3-2-prometheus-deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: k8s-master        # must match the node name shown by "kubectl get nodes"
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.31.2
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --storage.tsdb.retention=720h   # deprecated alias of --storage.tsdb.retention.time, still accepted
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
            items:
              - key: prometheus.yml
                path: prometheus.yml
                mode: 0644
        - name: prometheus-storage-volume
          hostPath:
           path: /data/prometheusdata
           type: Directory

Create the ServiceAccount and ClusterRoleBinding referenced by the Deployment (serviceAccountName: monitor):

kubectl create serviceaccount monitor -n monitor

kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor --clusterrole=cluster-admin --serviceaccount=monitor:monitor

kubectl apply -f case3-2-prometheus-deployment.yaml

Note on case3-2: this step hides a big trap. nodeName: k8s-master works, but nodeName: 192.168.136.21 does not: the Deployment's pod never comes up, and the pod events report that host "192.168.136.21" cannot be found. The reason is that nodeName must match the node's registered name exactly (the name shown by kubectl get nodes), and this cluster registered its master as "k8s-master", not by IP. In the original attempt, even switching back to "k8s-master" did not recover immediately; it only worked again a few days later, after the machine had been shut down in between.

 

vim case3-3-prometheus-svc.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
      protocol: TCP
  selector:
    app: prometheus
    component: server

kubectl apply -f case3-3-prometheus-svc.yaml
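The web UI is now reachable at <nodeIP>:30090. The health of the discovered scrape targets can also be spot-checked from the shell via the HTTP API:

# count targets by health ("up" is what we want)
curl -s http://192.168.136.21:30090/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c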

2.4 Deploy RBAC permissions

This includes a Secret, a ServiceAccount, a ClusterRole, and a ClusterRoleBinding: the ServiceAccount is the service identity, the ClusterRole defines the permission rules, and the ClusterRoleBinding binds the ServiceAccount to the ClusterRole.

The credentials a pod uses to authenticate to the apiserver are defined in a Secret. Because this information is sensitive, it is stored in a Secret resource and mounted into the pod as a volume; the application running in the pod can then connect to the apiserver and authenticate with it.

RBAC is Kubernetes' authorization system, and the above is only a brief summary; for a deeper dive, see the CSDN article on Kubernetes API server RBAC authorization.

vim case4-prom-rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor

---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitor-token
  namespace: monitor
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
#apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor

kubectl apply -f case4-prom-rbac.yaml
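A quick way to confirm the binding took effect is kubectl auth can-i with ServiceAccount impersonation; both commands should print "yes":

kubectl auth can-i list nodes --as=system:serviceaccount:monitor:prometheus
kubectl auth can-i get nodes/metrics --as=system:serviceaccount:monitor:prometheus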

2.5 Deploy kube-state-metrics

This includes a Deployment, a Service, a ServiceAccount, a ClusterRole, and a ClusterRoleBinding.

Note that it is deployed in the kube-system namespace!

vim case5-kube-state-metrics-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/kube-state-metrics:v2.6.0 
        ports:
        - containerPort: 8080

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: NodePort
  ports:
  - name: kube-state-metrics
    port: 8080
    targetPort: 8080
    nodePort: 31666
    protocol: TCP
  selector:
    app: kube-state-metrics

kubectl apply -f case5-kube-state-metrics-deploy.yaml
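To verify, check the pod in kube-system and pull a metric through the NodePort (31666 as defined above); kube_deployment_status_replicas is one of kube-state-metrics' standard metrics:

kubectl get pod -n kube-system -l app=kube-state-metrics
curl -s http://192.168.136.21:31666/metrics | grep '^kube_deployment_status_replicas{' | head -n 5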

2.6 Deploy Grafana

Grafana provides the graphical front end and uses Prometheus as its data source; the manifest includes a Deployment and a Service.

vim grafana-enterprise.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-enterprise
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-enterprise
  template:
    metadata:
      labels:
        app: grafana-enterprise
    spec:
      containers:
      - image: grafana/grafana
        imagePullPolicy: Always
        #command:
        #  - "tail"
        #  - "-f"
        #  - "/dev/null"
        securityContext:
          allowPrivilegeEscalation: false
          runAsUser: 0
        name: grafana
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: "/var/lib/grafana"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 31000
  selector:
    app: grafana-enterprise

kubectl apply -f grafana-enterprise.yaml

Log in with the default account admin and password admin.

Add a data source of type Prometheus, named prometheus; note that the URL must use NodePort 30090 (e.g. http://192.168.136.21:30090).

Import template 13332; other templates such as 14981, 13824, and 14518 also work.

Click the "+" sign on the left and select "import" to import the template.


The cAdvisor dashboard template is 14282. One bug remains unresolved: the dashboard shows the performance data of all containers in the cluster, but selecting a single container displays no data (this should be fixable).

By default the dashboard shows pod IDs, which is inconvenient for the administrator to browse. To show pod names instead, open the dashboard's "Settings" (gear icon) on the right, select "Variables", select the second variable, and change "name" to "pod".

Each panel of the dashboard needs the same change: click the panel title, select "Edit", and change "name" to "pod".

3. Test the monitoring setup

Create a new Deployment named nginx01 to test the monitoring.

vim nginx01.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx01
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx01
  template:
    metadata:
      labels:
        app: nginx01
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9

kubectl apply -f nginx01.yaml

Two nginx01 pods appear because replicas is set to 2.
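The new pods should appear in Prometheus within one scrape interval. For example, their CPU usage can be queried through the HTTP API (the pod label assumes a reasonably recent kubelet/cAdvisor; very old versions used pod_name instead):

# per-pod CPU usage of the nginx01 replicas
curl -s 'http://192.168.136.21:30090/api/v1/query' \
  --data-urlencode 'query=sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"nginx01.*"}[5m]))'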

With that, the cAdvisor + Prometheus + Grafana cluster monitoring deployment is complete.
