【prometheus】-05 Kubernetes Cloud-Native Monitoring: Node Performance Monitoring


Overview

Prometheus was designed from the start as an open-source monitoring and alerting tool for cloud-native applications. Before analyzing the Kubernetes service discovery protocol, let's first walk through how Prometheus plugs into a cloud-native environment and monitors a Kubernetes cluster.

Monitoring a cloud-native Kubernetes cluster mainly involves three categories of metrics: node (physical host) metrics, pod & container resource metrics, and Kubernetes cluster resource metrics. Mature solutions exist for each category, as shown in the figure below:

Environment

My Kubernetes cluster environment is shown in the figure below; all subsequent demonstrations run against this cluster:
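The original post shows the environment as a screenshot. The node layout can be listed with a standard kubectl command (output omitted here); the three nodes used throughout are master (192.168.52.151), node1 (192.168.52.152) and node2 (192.168.52.153):

# List the cluster nodes together with their internal IPs
kubectl get nodes -owide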

Deploying node-exporter

Node performance metrics are usually collected with node_exporter, an exporter provided by the official Prometheus project for gathering a server's runtime metrics; node_exporter currently covers almost all common monitoring points.

On a Kubernetes cluster we can deploy it with a DaemonSet controller, so that every node in the cluster automatically runs one such Pod; when nodes are added to or removed from the cluster, the Pods scale with them.

1. Create the DaemonSet manifest node-exporter-daemonset.yaml:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /

2. Create the Pods via the DaemonSet controller:

kubectl create -f node-exporter-daemonset.yaml

3. Verify that the Pods are running:

[root@master k8s-demo]# kubectl get pod -n kube-system -owide
NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6c89d944d5-hg47n   1/1     Running   0          15d     10.100.219.68    master   <none>           <none>
calico-node-247w2                          1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
calico-node-pt848                          1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
calico-node-z65m2                          1/1     Running   0          15d     192.168.52.153   node2    <none>           <none>
coredns-59c898cd69-f9858                   1/1     Running   0          15d     10.100.219.65    master   <none>           <none>
coredns-59c898cd69-ghbdg                   1/1     Running   0          15d     10.100.219.66    master   <none>           <none>
etcd-master                                1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
kube-apiserver-master                      1/1     Running   1          15d     192.168.52.151   master   <none>           <none>
kube-controller-manager-master             1/1     Running   10         15d     192.168.52.151   master   <none>           <none>
kube-proxy-5thg7                           1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
kube-proxy-659zl                           1/1     Running   0          15d     192.168.52.153   node2    <none>           <none>
kube-proxy-p2vvz                           1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
kube-scheduler-master                      1/1     Running   9          15d     192.168.52.151   master   <none>           <none>
kube-state-metrics-5f84848c58-v7v9z        1/1     Running   0          15d     10.100.166.135   node1    <none>           <none>
kuboard-74c645f5df-zzwnm                   1/1     Running   0          15d     10.100.104.2     node2    <none>           <none>
metrics-server-7dbf6c4558-qhjw4            1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
node-exporter-57djg                        1/1     Running   0          3m13s   192.168.52.152   node1    <none>           <none>
node-exporter-5kcnx                        1/1     Running   0          3m13s   192.168.52.151   master   <none>           <none>
node-exporter-cz45t                        1/1     Running   0          3m13s   192.168.52.153   node2    <none>           <none>

As shown above, three Pods named node-exporter-xxx were created, one on each of the three cluster nodes, and all are in the Running state. We can now fetch each node's performance metrics via the following URLs:

curl http://192.168.52.151:9100/metrics

curl http://192.168.52.152:9100/metrics

curl http://192.168.52.153:9100/metrics
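As a quick sanity check, one endpoint can be filtered for a single metric family; for example (node_memory_MemAvailable_bytes is a standard node_exporter memory gauge, used here purely for illustration):

# Keep only the available-memory gauge from the master's metrics
curl -s http://192.168.52.151:9100/metrics | grep '^node_memory_MemAvailable_bytes'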

Creating a token

With node-exporter deployed on the Kubernetes cluster, the next step is to use the Kubernetes service discovery mechanism to hook it into Prometheus, and talking to Kubernetes requires token authentication.

1. Define the ServiceAccount in p8s_sa.yaml as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system 

2. Create the ServiceAccount:

[root@master k8s-demo]# kubectl apply -f p8s_sa.yaml 
serviceaccount/prometheus created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/prometheus created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/prometheus created

3. Inspect the ServiceAccount:

[root@master k8s-demo]# kubectl get sa prometheus -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"prometheus","namespace":"kube-system"}}
  creationTimestamp: "2021-07-21T04:47:10Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:secrets:
        .: {}
        k:{"name":"prometheus-token-6hln9"}:
          .: {}
          f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-07-21T04:47:10Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-21T04:47:10Z"
  name: prometheus
  namespace: kube-system
  resourceVersion: "113843"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/prometheus
  uid: cbfe8330-de8f-40fd-a9b3-5aa312bb9104
secrets:
- name: prometheus-token-6hln9

4. Retrieve the token from the secret referenced by secrets.name:

[root@master k8s-demo]# kubectl describe secret prometheus-token-6hln9 -n kube-system
Name:         prometheus-token-6hln9
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: cbfe8330-de8f-40fd-a9b3-5aa312bb9104

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1066 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx6VHBOSXRwSmFCNmc2aXppS2tFeXFSTjlNVzJMNHhGX05fT3dLcXppSDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLTZobG45Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjYmZlODMzMC1kZThmLTQwZmQtYTliMy01YWEzMTJiYjkxMDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.my_sEOjhx4hxApeGRhZmpFwK7snRKuYDyjlToYzXZSytdefPugMiHP1lA0bkDvxPiS0Pces2_hSJlB0pRDacqAgipE2_hqIx2GUO6t35mfbTthB7k4wbf9rQT4lag9XUzjdInOEV3SF4nfCG1DcbSM8a9COSXJUXkshXfollPYj1AGvAmTVYSSmK_b898z64WsDk9JNMjyM7VrI-kj20fKVgc0Ngi4kV3XKqRkCuKIZXKudmuUaqthbeVhaOKWhXzfBW2wDaVsNzsHMLqzwp8vVRIfZbudQ9gVGVZoskgRYiyNoNJcLjbphdxRN1hhWoBTITKHHFyQhwZGzTBo_f6g

5. Save the token to a file named token.k8s:

eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx6VHBOSXRwSmFCNmc2aXppS2tFeXFSTjlNVzJMNHhGX05fT3dLcXppSDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLTZobG45Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjYmZlODMzMC1kZThmLTQwZmQtYTliMy01YWEzMTJiYjkxMDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.my_sEOjhx4hxApeGRhZmpFwK7snRKuYDyjlToYzXZSytdefPugMiHP1lA0bkDvxPiS0Pces2_hSJlB0pRDacqAgipE2_hqIx2GUO6t35mfbTthB7k4wbf9rQT4lag9XUzjdInOEV3SF4nfCG1DcbSM8a9COSXJUXkshXfollPYj1AGvAmTVYSSmK_b898z64WsDk9JNMjyM7VrI-kj20fKVgc0Ngi4kV3XKqRkCuKIZXKudmuUaqthbeVhaOKWhXzfBW2wDaVsNzsHMLqzwp8vVRIfZbudQ9gVGVZoskgRYiyNoNJcLjbphdxRN1hhWoBTITKHHFyQhwZGzTBo_f6g
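Before wiring this token into Prometheus, it can be checked directly against the API server (a sketch: apiserver.simon:6443 assumes the /etc/hosts entry added in the next section, and -k skips certificate verification just like insecure_skip_verify does below):

# A JSON NodeList in the response means the ClusterRole above grants what we need
curl -k -H "Authorization: Bearer $(cat token.k8s)" https://apiserver.simon:6443/api/v1/nodes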

Integrating with Prometheus

Next we use the Kubernetes service discovery mechanism to connect the node-exporter deployed above to Prometheus and scrape its performance metrics.

1. Add a scrape job to the prometheus.yml configuration:

  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
    - role: node
      api_server: https://apiserver.simon:6443
      bearer_token_file: /tools/token.k8s 
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /tools/token.k8s
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

Notes:

a. bearer_token_file points to the token file saved in the previous step;

b. The api_server address can be taken from the /root/.kube/config file:


c. Also add the corresponding host entry to /etc/hosts:

192.168.52.151    apiserver.simon
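After editing prometheus.yml, the configuration can be validated and Prometheus reloaded without a restart; a sketch assuming promtool sits next to the Prometheus binary and that Prometheus was started with --web.enable-lifecycle (otherwise simply restart the process):

# Validate the configuration syntax
promtool check config prometheus.yml

# Hot-reload the running Prometheus instance
curl -X POST http://localhost:9090/-/reload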

2. Verify that the targets were discovered successfully:


The Prometheus UI now shows the three cluster nodes as targets, and querying a node memory metric confirms that the performance metrics of all three nodes are indeed being scraped by Prometheus:
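The same check can be run from the command line against the Prometheus HTTP API (a sketch: localhost:9090 assumes Prometheus runs locally, and node_memory_MemTotal_bytes is a standard node_exporter metric):

# One sample per discovered node should come back
curl -s 'http://localhost:9090/api/v1/query?query=node_memory_MemTotal_bytes'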


Deploying Grafana

node-exporter is now deployed on the cluster and its node performance metrics are being scraped by Prometheus, so the last step is to deploy Grafana and use PromQL to query the metrics from Prometheus and visualize them.

1. Before setting up Grafana, we need to install NFS to back the PV and PVC.

Run the following on the master:

# Install the NFS service on the master
[root@master ~]# yum install nfs-utils -y

# Prepare a shared directory
[root@master ~]# mkdir /data/k8s -pv

# Export the shared directory with read/write access to all hosts on the subnet
[root@master ~]# vi /etc/exports
[root@master ~]# more /etc/exports
/data/k8s *(rw,no_root_squash,no_all_squash,sync)

# Start the NFS service
[root@master ~]# systemctl start nfs

Next, install nfs-utils on every node of the cluster so that the nodes can mount the NFS share:

# Install nfs-utils on each node; the service does not need to be started there
[root@node1 ~]# yum install nfs-utils -y
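Each node can then verify that it sees the master's export before we move on (showmount ships with nfs-utils):

# Confirm the share exported by the master is visible from the node
showmount -e 192.168.52.151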

2. Create the PV and PVC manifest grafana-volume.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
spec:
  capacity:
    storage: 512Mi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.52.151
    path: /data/k8s
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 512Mi

3. Check the PV and PVC:
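The original post shows the result as a screenshot; the equivalent commands, including the apply step implied by the manifest from step 2, look like this:

# Create the PV and PVC, then confirm the claim ends up Bound
kubectl apply -f grafana-volume.yaml
kubectl get pv grafana
kubectl get pvc grafana -n kube-system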


4. Create the Grafana Pod with a Deployment controller:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 0
        runAsUser: 0
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana

The two important environment variables GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD configure Grafana's admin user and password. Grafana keeps its dashboards, plugins and other data under /var/lib/grafana, so to persist that data we declare a volume mount for this directory; everything else is the same as an ordinary Deployment.
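To roll it out, apply the manifest and wait for the Pod to pass its readiness probe (the file name grafana-deploy.yaml is an assumption; the original does not name the file):

kubectl apply -f grafana-deploy.yaml
kubectl rollout status deployment/grafana -n kube-system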

5. Create the Service:

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
    - port: 3000
  selector:
    app: grafana

6. Check the Service:

[root@master k8s-demo]# kubectl get svc -n kube-system -owide
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
grafana              NodePort    10.96.80.215   <none>        3000:30441/TCP           2m32s   app=grafana
kube-dns             ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   16d     k8s-app=kube-dns
kube-state-metrics   ClusterIP   None           <none>        8080/TCP,8081/TCP        15d     app.kubernetes.io/name=kube-state-metrics
kuboard              NodePort    10.96.52.49    <none>        80:32567/TCP             15d     k8s.kuboard.cn/layer=monitor,k8s.kuboard.cn/name=kuboard
metrics-server       ClusterIP   10.96.162.56   <none>        443/TCP                  15d     k8s-app=metrics-server
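Before opening the UI, the NodePort can be probed from any host, reusing the /api/health endpoint already referenced by the readiness probe above (30441 is the NodePort shown in the Service listing):

# Grafana answers with a small JSON health document once it is up
curl http://192.168.52.151:30441/api/health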

Configuring the dashboard

1. With Grafana deployed, open a browser at any cluster node's IP on port 30441 to reach the Grafana UI and log in with admin/admin:

2. Create the Prometheus data source:
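The data source can also be created through Grafana's HTTP API instead of clicking through the UI (a sketch: the Prometheus address http://192.168.52.151:9090 is an assumption about where Prometheus is running):

# Register a Prometheus data source using the admin credentials configured above
curl -X POST http://admin:admin@192.168.52.151:30441/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://192.168.52.151:9090","access":"proxy"}'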


3. Import dashboard 8919, and the Kubernetes cluster node performance metrics will be displayed on the template, as shown below:



Reposted from blog.csdn.net/god_86/article/details/120008904