Kubernetes Cloud-Native Monitoring: Node Performance Monitoring
Overview
Prometheus was originally designed as an open-source monitoring and alerting tool for cloud-native applications. Before we analyze the Kubernetes service-discovery protocols, let's first walk through how Prometheus hooks into a cloud-native environment to monitor a Kubernetes cluster.
Monitoring a Kubernetes cluster mainly involves three categories of metrics: node (physical host) metrics, pod & container resource metrics, and Kubernetes cluster resource metrics. Mature solutions exist for all three; see the figure below:
Environment
My Kubernetes cluster environment is shown in the figure below; everything that follows is demonstrated on this cluster:
node-exporter deployment
Physical node performance is generally collected with node_exporter, the official Prometheus exporter for server-level runtime metrics; it covers almost all common monitoring points. On a Kubernetes cluster we can deploy it through a DaemonSet controller, so that every node in the cluster automatically runs one such Pod, and the set scales automatically as nodes are added to or removed from the cluster.
1. Create the DaemonSet manifest node-exporter-daemonset.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      # Share the host's PID/IPC/network namespaces so the exporter
      # reports real node-level metrics rather than container-level ones.
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      # Tolerate the master taint so the exporter also runs on master nodes.
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
2. Create the Pods through the DaemonSet controller:
kubectl create -f node-exporter-daemonset.yaml
3. Check that the Pods are running:
[root@master k8s-demo]# kubectl get pod -n kube-system -owide
NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6c89d944d5-hg47n   1/1     Running   0          15d     10.100.219.68    master   <none>           <none>
calico-node-247w2                          1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
calico-node-pt848                          1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
calico-node-z65m2                          1/1     Running   0          15d     192.168.52.153   node2    <none>           <none>
coredns-59c898cd69-f9858                   1/1     Running   0          15d     10.100.219.65    master   <none>           <none>
coredns-59c898cd69-ghbdg                   1/1     Running   0          15d     10.100.219.66    master   <none>           <none>
etcd-master                                1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
kube-apiserver-master                      1/1     Running   1          15d     192.168.52.151   master   <none>           <none>
kube-controller-manager-master             1/1     Running   10         15d     192.168.52.151   master   <none>           <none>
kube-proxy-5thg7                           1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
kube-proxy-659zl                           1/1     Running   0          15d     192.168.52.153   node2    <none>           <none>
kube-proxy-p2vvz                           1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
kube-scheduler-master                      1/1     Running   9          15d     192.168.52.151   master   <none>           <none>
kube-state-metrics-5f84848c58-v7v9z        1/1     Running   0          15d     10.100.166.135   node1    <none>           <none>
kuboard-74c645f5df-zzwnm                   1/1     Running   0          15d     10.100.104.2     node2    <none>           <none>
metrics-server-7dbf6c4558-qhjw4            1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
node-exporter-57djg                        1/1     Running   0          3m13s   192.168.52.152   node1    <none>           <none>
node-exporter-5kcnx                        1/1     Running   0          3m13s   192.168.52.151   master   <none>           <none>
node-exporter-cz45t                        1/1     Running   0          3m13s   192.168.52.153   node2    <none>           <none>
As shown above, three node-exporter-xxx Pods were created, one on each of the cluster's three nodes, all in the Running state. Each node's performance metrics can now be fetched from:
curl http://192.168.52.151:9100/metrics
curl http://192.168.52.152:9100/metrics
curl http://192.168.52.153:9100/metrics
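To spot-check that the exporter is exposing real node data, we can filter the output for a well-known metric. A minimal sketch against the master node (any of the three IPs works):
# node_memory_MemTotal_bytes should report the node's total physical memory
curl -s http://192.168.52.151:9100/metrics | grep node_memory_MemTotal_bytes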
Token creation
With node-exporter deployed on the Kubernetes cluster, the next step is to plug it into Prometheus through the Kubernetes service-discovery mechanism, and interacting with Kubernetes requires token authentication.
1. Define a ServiceAccount in p8s_sa.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
# NOTE: rbac.authorization.k8s.io/v1beta1 is deprecated in v1.17+ and
# removed in v1.22+; on newer clusters use rbac.authorization.k8s.io/v1.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
2. Create the ServiceAccount:
[root@master k8s-demo]# kubectl apply -f p8s_sa.yaml
serviceaccount/prometheus created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/prometheus created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
3. Inspect the ServiceAccount:
[root@master k8s-demo]# kubectl get sa prometheus -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"prometheus","namespace":"kube-system"}}
  creationTimestamp: "2021-07-21T04:47:10Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:secrets:
        .: {}
        k:{"name":"prometheus-token-6hln9"}:
          .: {}
          f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-07-21T04:47:10Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-21T04:47:10Z"
  name: prometheus
  namespace: kube-system
  resourceVersion: "113843"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/prometheus
  uid: cbfe8330-de8f-40fd-a9b3-5aa312bb9104
secrets:
- name: prometheus-token-6hln9
4. Fetch the secret identified by secrets.name:
[root@master k8s-demo]# kubectl describe secret prometheus-token-6hln9 -n kube-system
Name:         prometheus-token-6hln9
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: cbfe8330-de8f-40fd-a9b3-5aa312bb9104

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1066 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx6VHBOSXRwSmFCNmc2aXppS2tFeXFSTjlNVzJMNHhGX05fT3dLcXppSDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLTZobG45Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjYmZlODMzMC1kZThmLTQwZmQtYTliMy01YWEzMTJiYjkxMDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.my_sEOjhx4hxApeGRhZmpFwK7snRKuYDyjlToYzXZSytdefPugMiHP1lA0bkDvxPiS0Pces2_hSJlB0pRDacqAgipE2_hqIx2GUO6t35mfbTthB7k4wbf9rQT4lag9XUzjdInOEV3SF4nfCG1DcbSM8a9COSXJUXkshXfollPYj1AGvAmTVYSSmK_b898z64WsDk9JNMjyM7VrI-kj20fKVgc0Ngi4kV3XKqRkCuKIZXKudmuUaqthbeVhaOKWhXzfBW2wDaVsNzsHMLqzwp8vVRIfZbudQ9gVGVZoskgRYiyNoNJcLjbphdxRN1hhWoBTITKHHFyQhwZGzTBo_f6g
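Rather than copying the token out of the describe output by hand, it can also be extracted directly from the Secret; a small sketch using kubectl's jsonpath output (the token is stored base64-encoded in the Secret object):
# Extract and decode the ServiceAccount token straight into token.k8s
kubectl get secret prometheus-token-6hln9 -n kube-system \
  -o jsonpath='{.data.token}' | base64 -d > token.k8s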
5. Save the token into a file named token.k8s:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx6VHBOSXRwSmFCNmc2aXppS2tFeXFSTjlNVzJMNHhGX05fT3dLcXppSDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLTZobG45Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjYmZlODMzMC1kZThmLTQwZmQtYTliMy01YWEzMTJiYjkxMDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.my_sEOjhx4hxApeGRhZmpFwK7snRKuYDyjlToYzXZSytdefPugMiHP1lA0bkDvxPiS0Pces2_hSJlB0pRDacqAgipE2_hqIx2GUO6t35mfbTthB7k4wbf9rQT4lag9XUzjdInOEV3SF4nfCG1DcbSM8a9COSXJUXkshXfollPYj1AGvAmTVYSSmK_b898z64WsDk9JNMjyM7VrI-kj20fKVgc0Ngi4kV3XKqRkCuKIZXKudmuUaqthbeVhaOKWhXzfBW2wDaVsNzsHMLqzwp8vVRIfZbudQ9gVGVZoskgRYiyNoNJcLjbphdxRN1hhWoBTITKHHFyQhwZGzTBo_f6g
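Before wiring the token into Prometheus, it is worth verifying that it actually grants API access. A minimal check, assuming token.k8s is in the current directory and using the master's address from the environment above (-k skips TLS verification, matching the insecure_skip_verify setting used later):
# Listing nodes exercises the 'nodes: list' permission granted to the ClusterRole
curl -sk -H "Authorization: Bearer $(cat token.k8s)" https://192.168.52.151:6443/api/v1/nodes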
Prometheus integration
Now we use the Kubernetes service-discovery mechanism to hook the node-exporter deployed above into Prometheus and scrape its performance metrics.
1. Add a scrape job to the prometheus.yml configuration:
- job_name: kubernetes-nodes
  kubernetes_sd_configs:
  - role: node
    api_server: https://apiserver.simon:6443
    # Token/TLS settings here authenticate the service-discovery
    # calls against the Kubernetes API server.
    bearer_token_file: /tools/token.k8s
    tls_config:
      insecure_skip_verify: true
  # Token/TLS settings at the job level are used when scraping the targets.
  bearer_token_file: /tools/token.k8s
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  # Discovered node addresses point at the kubelet port 10250;
  # rewrite them to the node_exporter port 9100.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
    action: replace
  # Map all Kubernetes node labels onto the target's labels.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
Notes:
a. bearer_token_file points to the token file we prepared in the previous step.
b. The api_server address can be found in the /root/.kube/config file.
c. Also add the hostname mapping to /etc/hosts:
192.168.52.151 apiserver.simon
2. Verify the integration:
In the Prometheus UI, the three cluster nodes now appear as targets, and querying a node memory metric shows that the performance metrics of all three nodes are indeed being scraped by Prometheus:
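The same check can be done without the UI by querying the Prometheus HTTP API. A minimal sketch, assuming Prometheus is listening on its default port 9090 on localhost (adjust the address to your deployment); the query should return one sample per node:
# Query a node memory metric through the Prometheus HTTP API
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=node_memory_MemTotal_bytes'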
Grafana deployment
node-exporter is deployed across the cluster and hooked into Prometheus, which is successfully scraping the node performance metrics. Now for the last step: deploy Grafana and use PromQL to pull the performance data out of Prometheus for display.
1. Before deploying Grafana, we need to set up NFS for the PV and PVC. On the master:
# Install the NFS server on the master
[root@master ~]# yum install nfs-utils -y
# Create a shared directory
[root@master ~]# mkdir /data/k8s -pv
# Export the shared directory read-write to all hosts on the network
[root@master ~]# vi /etc/exports
[root@master ~]# more /etc/exports
/data/k8s *(rw,no_root_squash,no_all_squash,sync)
# Start the NFS service
[root@master ~]# systemctl start nfs
Next, install nfs-utils on every node in the cluster as well, so the nodes are able to mount NFS volumes:
# Install the NFS utilities on each node; the service does not need to be started
[root@master ~]# yum install nfs-utils -y
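To confirm the export is visible from a node, showmount (shipped with nfs-utils) can be pointed at the master's IP from the environment above:
# List the NFS exports offered by the master
showmount -e 192.168.52.151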
2. Create the PV and PVC manifest grafana-volume.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
spec:
  capacity:
    storage: 512Mi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.52.151
    path: /data/k8s
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 512Mi
3. Check the PV and PVC:
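A sketch of the commands involved; once the claim matches the volume, both should report a Bound status:
# Create the volume objects and verify their binding status
kubectl apply -f grafana-volume.yaml
kubectl get pv grafana
kubectl get pvc grafana -n kube-system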
4. Create the Grafana Pod with a Deployment controller:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 0
        runAsUser: 0
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
The two important environment variables here, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD, configure Grafana's admin user and password. Grafana stores dashboards, plugins, and other data under /var/lib/grafana, so if we want that data persisted, this is the directory we must declare a volume mount for; everything else is the same as our earlier Deployments.
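Assuming the Deployment above is saved as grafana-deploy.yaml (the filename is not given here, so adjust it to whatever you used), it is created like any other manifest:
# Create the Grafana Deployment and watch for the Pod to become Ready
kubectl apply -f grafana-deploy.yaml
kubectl get pod -n kube-system -l app=grafana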
5. Create the Service:
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
6. Check the Service:
[root@master k8s-demo]# kubectl get svc -n kube-system -owide
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
grafana              NodePort    10.96.80.215   <none>        3000:30441/TCP           2m32s   app=grafana
kube-dns             ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   16d     k8s-app=kube-dns
kube-state-metrics   ClusterIP   None           <none>        8080/TCP,8081/TCP        15d     app.kubernetes.io/name=kube-state-metrics
kuboard              NodePort    10.96.52.49    <none>        80:32567/TCP             15d     k8s.kuboard.cn/layer=monitor,k8s.kuboard.cn/name=kuboard
metrics-server       ClusterIP   10.96.162.56   <none>        443/TCP                  15d     k8s-app=metrics-server
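Grafana is now exposed on NodePort 30441 on every node. As a quick check, its health endpoint (the same one the probes use) can be hit directly, here with the master's IP from the environment above:
# A healthy instance answers with JSON that includes "database": "ok"
curl -s http://192.168.52.151:30441/api/health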
Dashboard configuration
1. With Grafana deployed, open any cluster node's IP at port 30441 in a browser to reach the Grafana UI, and log in with admin/admin:
2. Create the Prometheus data source:
3. Import dashboard 8919, and the node performance metrics of the Kubernetes cluster are rendered on the template, as shown below:
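Dashboards like 8919 are built from PromQL expressions over node_exporter metrics. As one illustrative example (not necessarily the exact expression the dashboard uses), per-node CPU utilisation is commonly derived from the idle CPU counter:
# Per-node CPU usage in percent, queried through the Prometheus HTTP API
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'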