K8s in Practice: Deploying a Prometheus + Grafana Visual Monitoring and Alerting Platform

Foreword

Before deploying a web site, the architecture diagram always includes a monitoring component. Building an effective monitoring platform is very important for operations and maintenance; only then can we keep our servers and services running efficiently and stably. There are several common open source monitoring tools, such as Zabbix, Nagios, Open-Falcon, and Prometheus, each with its own strengths and weaknesses, which interested readers can research on their own. For monitoring a k8s cluster, however, Prometheus is the friendliest choice, so today we will look at how to deploy Prometheus to comprehensively monitor K8S.

Main contents

  • 1. Prometheus architecture

  • 2. K8S monitoring metrics and implementation approach

  • 3. Deploying Prometheus on the K8S platform

  • 4. Configuring service discovery based on K8S

  • 5. Deploying Grafana on the K8S platform

  • 6. Monitoring K8S cluster Pods, Nodes, and resource objects

  • 7. Visualizing Prometheus monitoring data with Grafana

  • 8. Alert rules and alert notifications

1 Prometheus architecture

What is Prometheus

Prometheus is a monitoring system originally built at SoundCloud. It became an open source community project in 2012 and has a very active developer and user community. To emphasize its open source nature and independent governance, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016, becoming its second hosted project after Kubernetes.
Official website address:
https://prometheus.io
https://github.com/prometheus

Prometheus components and architecture

  • Prometheus Server: collects metrics, stores the time series data, and provides a query interface
  • Client Library: client libraries for instrumenting application code
  • Push Gateway: short-lived storage for metric data, mainly used for short-lived jobs
  • Exporters: collect metrics from existing third-party services and expose them to Prometheus
  • Alertmanager: alerting
  • Web UI: a simple web console

    Data Model

    Prometheus stores all data as time series; samples with the same metric name and the same set of labels belong to the same time series.
    Each time series is uniquely identified by its metric name and a set of key-value pairs (also known as labels).
    Time series format:

<metric name>{<label name>=<label value>, ...}
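For example (an illustrative sample, not taken from this cluster), an HTTP request counter with two labels would be written as:

api_http_requests_total{method="POST", handler="/messages"}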

Instances and jobs

Instance: an individual endpoint that can be scraped is called an instance.
Job: a collection of instances with the same purpose is called a job.

scrape_configs:
- job_name: 'prometheus'
  static_configs:
  - targets: ['localhost:9090']
- job_name: 'node'
  static_configs:
  - targets: ['192.168.1.10:9090']

2 K8S Monitoring Metrics and Implementation Approach

K8S monitoring metrics

Monitoring Kubernetes itself

  • Node resource utilization
  • Number of Nodes
  • Number of Pods (per Node)
  • Resource object status

Pod monitoring

  • Number of Pods (per project)
  • Container resource utilization
  • Application metrics

    Prometheus monitoring K8S architecture

Monitoring target       Implementation        Examples
Pod performance         cAdvisor              container CPU usage
Node performance        node-exporter         node CPU, memory usage
K8S resource objects    kube-state-metrics    Pod/Deployment/Service

Service Discovery:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

3 Deploying Prometheus on the K8S Platform

3.1 Cluster environment

IP address        Role         Remark
192.168.73.136    nfs
192.168.73.138    k8s-master
192.168.73.139    k8s-node01
192.168.73.140    k8s-node02
192.168.73.135    k8s-node03

3.2 Project Address:

[root@k8s-master src]# git clone https://github.com/zhangdongdong7/k8s-prometheus.git
Cloning into 'k8s-prometheus'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
[root@k8s-master src]# cd k8s-prometheus/
[root@k8s-master k8s-prometheus]# ls
alertmanager-configmap.yaml         kube-state-metrics-rbac.yaml             prometheus-rbac.yaml
alertmanager-deployment.yaml        kube-state-metrics-service.yaml          prometheus-rules.yaml
alertmanager-pvc.yaml               node_exporter-0.17.0.linux-amd64.tar.gz  prometheus-service.yaml
alertmanager-service.yaml           node_exporter.sh                         prometheus-statefulset-static-pv.yaml
grafana.yaml                        OWNERS                                   prometheus-statefulset.yaml
kube-state-metrics-deployment.yaml  prometheus-configmap.yaml                README.md

3.3 Authorization

RBAC (Role-Based Access Control) is responsible for the authorization work.
Write the authorization YAML:

[root@k8s-master prometheus-k8s]# vim prometheus-rbac.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system

Create:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rbac.yaml
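If you want to confirm the authorization objects were created, an optional check (not part of the original steps) is:

[root@k8s-master prometheus-k8s]# kubectl get serviceaccount prometheus -n kube-system
[root@k8s-master prometheus-k8s]# kubectl get clusterrole,clusterrolebinding prometheus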

3.4 Configuration Management

Use a ConfigMap to store configuration that does not need to be encrypted.
The node IP addresses in it need to be modified to match your own environment.

[root@k8s-master prometheus-k8s]# vim prometheus-configmap.yaml 
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    rule_files:
    - /etc/config/rules/*.rules

    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: kubernetes-nodes
      scrape_interval: 30s
      static_configs:
      - targets:
        - 192.168.73.135:9100
        - 192.168.73.138:9100
        - 192.168.73.139:9100
        - 192.168.73.140:9100

    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    alerting:
      alertmanagers:
      - static_configs:
          - targets: ["alertmanager:80"]

Create:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-configmap.yaml
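The prometheus-server-configmap-reload sidecar in the StatefulSet deployed in the next step watches /etc/config and triggers a live reload, so later edits to this ConfigMap normally take effect without restarting the pod. An optional quick check that the ConfigMap exists:

[root@k8s-master prometheus-k8s]# kubectl get configmap prometheus-config -n kube-system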

3.5 Deploy Prometheus with a StatefulSet

Here we use StorageClass dynamic provisioning to persist the Prometheus data. For the concrete setup, see the earlier article "NFS dynamic storage provisioning in k8s". Alternatively, you can use static provisioning with prometheus-statefulset-static-pv.yaml for persistence.

[root@k8s-master prometheus-k8s]# vim prometheus-statefulset.yaml 
apiVersion: apps/v1
kind: StatefulSet 
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.2.1
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
   type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9090/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi

        - name: prometheus-server
          image: "prom/prometheus:v2.2.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          # based on 10 running nodes with 30 pods each
          resources:
            limits:
              cpu: 200m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 1000Mi

          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
            - name: prometheus-rules
              mountPath: /etc/config/rules

      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-rules
          configMap:
            name: prometheus-rules

  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      storageClassName: managed-nfs-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "16Gi"

Create:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-statefulset.yaml

Check Status

[root@k8s-master prometheus-k8s]# kubectl get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
alertmanager-5d75d5688f-fmlq6           2/2     Running   0          8d
coredns-5bd5f9dbd9-wv45t                1/1     Running   1          8d
grafana-0                               1/1     Running   2          14d
kube-state-metrics-7c76bdbf68-kqqgd     2/2     Running   6          13d
kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          14d
prometheus-0                            2/2     Running   6          14d

You can see the pod prometheus-0, which was just deployed by the StatefulSet controller. A status of Running is normal; if it is not Running, use kubectl describe pod prometheus-0 -n kube-system to view the error details.
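If the pod is not Running, the container logs are also worth checking (an optional troubleshooting step; the container names come from the StatefulSet above):

[root@k8s-master prometheus-k8s]# kubectl logs prometheus-0 -c prometheus-server -n kube-system
[root@k8s-master prometheus-k8s]# kubectl logs prometheus-0 -c prometheus-server-configmap-reload -n kube-system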

3.6 Create a Service to expose the access port

Here a fixed NodePort is used, so the access port is easy to remember.

[root@k8s-master prometheus-k8s]# vim prometheus-service.yaml 
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  type: NodePort
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
      nodePort: 30090
  selector:
    k8s-app: prometheus

Create:

[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-service.yaml

Check:

[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system                   
NAME                                        READY   STATUS    RESTARTS   AGE
pod/coredns-5bd5f9dbd9-wv45t                1/1     Running   1          8d
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          14d
pod/prometheus-0                            2/2     Running   6          14d

NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP       13d
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP       16d
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP      13d

3.7 Web access

Access it using any NodeIP plus the NodePort: http://NodeIP:Port, in this case http://192.168.73.139:30090.
A successful access shows the interface in the figure below:
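To confirm from the command line that Prometheus is answering queries, an optional check against its HTTP API (using the NodeIP:NodePort from this example) is:

[root@k8s-master prometheus-k8s]# curl -G 'http://192.168.73.139:30090/api/v1/query' --data-urlencode 'query=up'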

4 Deploying Grafana on the K8S Platform

From the web access above, we can see that Prometheus's own UI offers limited functionality and its visualization features are not rich enough for day-to-day monitoring, so we usually combine Prometheus with Grafana to visualize the data.
Official addresses:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus
https://grafana.com/grafana/download
The project cloned earlier already contains a ready-made Grafana YAML file; modify it according to your own environment.

4.1 Deploy Grafana with a StatefulSet

[root@k8s-master prometheus-k8s]# vim grafana.yaml 
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana
  namespace: kube-system
spec:
  serviceName: "grafana"
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
          - containerPort: 3000
            protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
          - name: grafana-data
            mountPath: /var/lib/grafana
            subPath: grafana
      securityContext:
        fsGroup: 472
        runAsUser: 472
  volumeClaimTemplates:
  - metadata:
      name: grafana-data
    spec:
      storageClassName: managed-nfs-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "1Gi"

---

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 30091
  selector:
    app: grafana
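The original text does not show the apply step for this file; assuming the same workflow as the other manifests, it would be:

[root@k8s-master prometheus-k8s]# kubectl apply -f grafana.yaml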

4.2 Grafana web access

Access it using any NodeIP plus the NodePort: http://NodeIP:Port, in this case http://192.168.73.139:30091.
On first access you will be asked for a username and password; the defaults are admin/admin, and you will be prompted to change the password after logging in.

After logging in, the interface looks like this:

The first step is to add a data source: click the "Add your first data source" icon and fill it in as shown in the figure.

The second step, after filling in the settings, is to click the green Save & Test button at the bottom; if it succeeds you will see "Data source is working", which means the data source was added successfully.
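When filling in the Prometheus data source, the URL can point either at the NodePort exposed in section 3.6 (http://192.168.73.139:30090) or, since Grafana runs in the same cluster and namespace, at the in-cluster Service address. The values below are an assumption based on the Service created earlier:

URL: http://prometheus.kube-system:9090
Access: Server (default)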

4.3 Monitoring K8S cluster Pods, Nodes, and resource objects

Pod
The kubelet's built-in cAdvisor provides an interface on each node that exposes performance metrics for all Pods and containers on that node.
Exposed interface addresses:
http://NodeIP:10255/metrics/cadvisor
https://NodeIP:10250/metrics/cadvisor
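An optional way to confirm the secure kubelet endpoint responds, assuming the cluster still auto-creates a token Secret for the prometheus ServiceAccount (as older clusters like this one do):

[root@k8s-master prometheus-k8s]# TOKEN=$(kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep prometheus-token | awk '{print $1}') -o jsonpath='{.data.token}' | base64 -d)
[root@k8s-master prometheus-k8s]# curl -sk -H "Authorization: Bearer $TOKEN" https://192.168.73.139:10250/metrics/cadvisor | head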

Node
node_exporter is used to collect node resource utilization.
https://github.com/prometheus/node_exporter
Usage documentation: https://prometheus.io/docs/guides/node-exporter/
Use the node_exporter.sh script to deploy the node_exporter collector on each of the servers; it can be run directly without modification.

[root@k8s-master prometheus-k8s]# cat node_exporter.sh 
#!/bin/bash

wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz

tar zxf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter

cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
[root@k8s-master prometheus-k8s]# ./node_exporter.sh
  • Check that the node_exporter process is running:
[root@k8s-master prometheus-k8s]# ps -ef|grep node_exporter
root       6227      1  0 Oct08 ?        00:06:43 /usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service
root     118269 117584  0 23:27 pts/0    00:00:00 grep --color=auto node_exporter
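You can also verify that the exporter is serving metrics on its default port 9100 (an optional check; substitute the IP of the node you are on):

[root@k8s-master prometheus-k8s]# curl -s http://192.168.73.138:9100/metrics | head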

Resource objects
kube-state-metrics collects status information about the various k8s resource objects; it only needs to be deployed on the master node.
https://github.com/kubernetes/kube-state-metrics

  1. Write the RBAC YAML to authorize kube-state-metrics.
[root@k8s-master prometheus-k8s]# vim kube-state-metrics-rbac.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-rbac.yaml
  2. Write the Deployment and ConfigMap YAML to deploy the kube-state-metrics pod; no modification is required.
[root@k8s-master prometheus-k8s]# cat kube-state-metrics-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.3.0
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v1.3.0
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v1.3.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: lizhenliang/kube-state-metrics:v1.3.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: lizhenliang/addon-resizer:1.8.3
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
      volumes:
        - name: config-volume
          configMap:
            name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-deployment.yaml
  3. Write the Service YAML to expose the kube-state-metrics ports.
[root@k8s-master prometheus-k8s]# cat kube-state-metrics-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-service.yaml
  4. Check the pod and svc status; you can see pod/kube-state-metrics-7c76bdbf68-kqqgd running normally with ports 8080 and 8081 exposed.
[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system 
NAME                                        READY   STATUS    RESTARTS   AGE
pod/alertmanager-5d75d5688f-fmlq6           2/2     Running   0          9d
pod/coredns-5bd5f9dbd9-wv45t                1/1     Running   1          9d
pod/grafana-0                               1/1     Running   2          15d
pod/kube-state-metrics-7c76bdbf68-kqqgd     2/2     Running   6          14d
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          16d
pod/prometheus-0                            2/2     Running   6          15d

NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/alertmanager           ClusterIP   10.0.0.207   <none>        80/TCP              13d
service/grafana                NodePort    10.0.0.74    <none>        80:30091/TCP        15d
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP       14d
service/kube-state-metrics     ClusterIP   10.0.0.194   <none>        8080/TCP,8081/TCP   14d
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP       17d
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP      14d
[root@k8s-master prometheus-k8s]# 
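As an optional check, the kube-state-metrics Service can be queried directly from a node; the ClusterIP below is the one shown in the output above:

[root@k8s-master prometheus-k8s]# curl -s http://10.0.0.194:8080/metrics | head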

5 Using Grafana to Visualize Prometheus Monitoring Data

When using Prometheus to collect data, we want to monitor the K8S cluster's Pods, Nodes, and resource objects, so we need the corresponding collectors and the APIs they expose for data acquisition. We already configured these in section 4.3, and we can check the state of each collector in the Prometheus UI under the Status > Targets menu, as shown:

Only when every Target is in the UP state can we query the data for the corresponding monitored items from the UI, as shown:

From the figure above we can see that the Prometheus UI's visualization is fairly limited and cannot meet our needs, so we combine Prometheus with Grafana to visualize the monitoring data. In the previous chapter we successfully deployed Grafana, so when needed we could design our own dashboards and add panels for the monitoring items; in practice, however, the Grafana community already provides many mature templates that we can use directly, adjusting each panel's query to fetch data from our own environment.
https://grafana.com/grafana/dashboards

Recommended templates:

  • Cluster resource monitoring: 3119

When a panel in an imported template shows no data, click Edit on that panel, inspect its PromQL query, and debug the expression in the Prometheus UI to confirm it returns a value; the adjusted dashboard is shown in the figure below.

  • Resource status monitoring: 6417
    Similarly, import this template to monitor the status of resource objects; after adjustment the dashboard (shown in the figure) displays the status of the various k8s resources.

  • Node monitoring: 9276
    Similarly, import this template; after adjustment the dashboard (shown in the figure) displays basic information about each node.

6 Deploying Alertmanager in K8S

6.1 Deploy Alertmanager

6.2 Configure alerting

We send alert notifications by email.

  1. First prepare a sender mailbox and enable SMTP sending.
  2. Use a ConfigMap to store the alert rules. Write the alert rules YAML file; rules can be modified or added according to your actual situation. Prometheus is more troublesome than Zabbix here, since all alert rules have to be defined by yourself.
[root@k8s-master prometheus-k8s]# vim prometheus-rules.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: kube-system
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: "Instance {{ $labels.instance }} 停止工作"
          description: "{{ $labels.instance }} job {{ $labels.job }} 已经停止5分钟以上."
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
          description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})"

      - alert: NodeMemoryUsage
        expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} 内存使用率过高"
          description: "{{ $labels.instance }}内存使用大于80% (当前值: {{ $value }})"

      - alert: NodeCPUUsage
        expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} CPU使用率过高"
          description: "{{ $labels.instance }}CPU使用大于60% (当前值: {{ $value }})"
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rules.yaml 
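Before relying on a rule, it helps to run its expression by hand against Prometheus and confirm it returns sensible values, for example (an optional check using the NodePort from this article):

[root@k8s-master prometheus-k8s]# curl -G 'http://192.168.73.139:30090/api/v1/query' --data-urlencode 'query=100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)'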
  3. Write the Alertmanager ConfigMap YAML, adding the Alertmanager configuration and the email delivery settings.
[root@k8s-master prometheus-k8s]# vim alertmanager-configmap.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'mail.goldwind.com.cn:587'  # log in to the mail system to verify
      smtp_from: '[email protected]' # configure according to your own sender mailbox
      smtp_auth_username: '[email protected]'
      smtp_auth_password: 'Dbadmin@123'

    receivers:
    - name: default-receiver
      email_configs:
      - to: "[email protected]"

    route:
      group_interval: 1m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 1m
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-configmap.yaml
  4. Create a PVC for data persistence. My YAML uses the StorageClass set up when Prometheus was installed for automatic provisioning; modify it according to your own environment.
[root@k8s-master prometheus-k8s]# vim alertmanager-pvc.yaml    
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-pvc.yaml
  5. Write the Deployment YAML to deploy the Alertmanager pod.
[root@k8s-master prometheus-k8s]# vim alertmanager-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.14.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.14.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.14.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.14.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/alertmanager.yml
            - --storage.path=/data
            - --web.external-url=/
          ports:
            - containerPort: 9093
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 30
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 10m
              memory: 50Mi
            requests:
              cpu: 10m
              memory: 50Mi
        - name: prometheus-alertmanager-configmap-reload
          image: "jimmidyson/configmap-reload:v0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9093/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
        - name: storage-volume
          persistentVolumeClaim:
            claimName: alertmanager
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-deployment.yaml
  6. Create the Alertmanager Service to expose its port.
[root@k8s-master prometheus-k8s]# vim alertmanager-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Alertmanager"
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 9093
  selector:
    k8s-app: alertmanager
  type: "ClusterIP"
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-service.yaml
  7. Check the deployment status; pod/alertmanager-5d75d5688f-fmlq6 and service/alertmanager are running normally.
[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system -o wide
NAME                                        READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
pod/alertmanager-5d75d5688f-fmlq6           2/2     Running   4          10d   172.17.15.2   192.168.73.140   <none>           <none>
pod/coredns-5bd5f9dbd9-qxvmz                1/1     Running   0          42m   172.17.33.2   192.168.73.138   <none>           <none>
pod/grafana-0                               1/1     Running   3          16d   172.17.31.2   192.168.73.139   <none>           <none>
pod/kube-state-metrics-7c76bdbf68-hv56m     2/2     Running   0          23h   172.17.15.3   192.168.73.140   <none>           <none>
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   6          17d   172.17.31.4   192.168.73.139   <none>           <none>
pod/prometheus-0                            2/2     Running   8          16d   172.17.83.2   192.168.73.135   <none>           <none>

NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE   SELECTOR
service/alertmanager           ClusterIP   10.0.0.207   <none>        80/TCP              14d   k8s-app=alertmanager
service/grafana                NodePort    10.0.0.74    <none>        80:30091/TCP        16d   app=grafana
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP       42m   k8s-app=kube-dns
service/kube-state-metrics     ClusterIP   10.0.0.194   <none>        8080/TCP,8081/TCP   15d   k8s-app=kube-state-metrics
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP       18d   k8s-app=kubernetes-dashboard
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP      15d   k8s-app=prometheus
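The Alertmanager Service is ClusterIP only, so if you want to look at its web UI, an optional way is a port-forward from the master:

[root@k8s-master prometheus-k8s]# kubectl -n kube-system port-forward svc/alertmanager 9093:80
# then browse http://localhost:9093 (e.g. through an SSH tunnel)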

6.3 Test that alerts are sent
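A simple end-to-end test (a suggested approach, not shown in the original text) is to temporarily stop node_exporter on one node, for example k8s-node01, so that the InstanceDown rule fires, then confirm the alert appears on the Prometheus Alerts page and the notification email arrives:

[root@k8s-node01 ~]# systemctl stop node_exporter
# wait a minute or two for the 'for: 1m' condition and the route timings, check the mailbox, then restore the service
[root@k8s-node01 ~]# systemctl start node_exporter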

Source: www.cnblogs.com/double-dong/p/11910699.html