Kubernetes Cloud-Native Monitoring: Node Performance Monitoring
Overview
Prometheus was originally designed as an open-source monitoring and alerting tool for cloud-native applications. Before we analyze the Kubernetes service-discovery protocols, let's first walk through how Prometheus hooks into a cloud-native environment to monitor a Kubernetes cluster.
Monitoring a Kubernetes cluster mainly involves three categories of metrics: node (physical host) metrics, pod & container resource metrics, and Kubernetes cluster resource metrics. Mature solutions exist for all three; see the figure below:
Environment
My Kubernetes cluster environment is shown in the figure below; everything that follows is demonstrated on this cluster:
node-exporter deployment
Physical node performance is generally collected with node_exporter, the official Prometheus exporter for server-level runtime metrics; it covers almost all common monitoring points. On a Kubernetes cluster we can deploy it through a DaemonSet controller, so that every node in the cluster automatically runs one such Pod, and the set scales automatically as nodes are added to or removed from the cluster.
1. Create the DaemonSet manifest node-exporter-daemonset.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      # Share the host's PID/IPC/network namespaces so the exporter
      # reports real node-level metrics rather than container-level ones.
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      # Tolerate the master taint so the exporter also runs on master nodes.
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
2. Create the Pods through the DaemonSet controller:
kubectl create -f node-exporter-daemonset.yaml
3. Check that the Pods are running:
[root@master k8s-demo]# kubectl get pod -n kube-system -owide
NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6c89d944d5-hg47n   1/1     Running   0          15d     10.100.219.68    master   <none>           <none>
calico-node-247w2                          1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
calico-node-pt848                          1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
calico-node-z65m2                          1/1     Running   0          15d     192.168.52.153   node2    <none>           <none>
coredns-59c898cd69-f9858                   1/1     Running   0          15d     10.100.219.65    master   <none>           <none>
coredns-59c898cd69-ghbdg                   1/1     Running   0          15d     10.100.219.66    master   <none>           <none>
etcd-master                                1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
kube-apiserver-master                      1/1     Running   1          15d     192.168.52.151   master   <none>           <none>
kube-controller-manager-master             1/1     Running   10         15d     192.168.52.151   master   <none>           <none>
kube-proxy-5thg7                           1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
kube-proxy-659zl                           1/1     Running   0          15d     192.168.52.153   node2    <none>           <none>
kube-proxy-p2vvz                           1/1     Running   0          15d     192.168.52.151   master   <none>           <none>
kube-scheduler-master                      1/1     Running   9          15d     192.168.52.151   master   <none>           <none>
kube-state-metrics-5f84848c58-v7v9z        1/1     Running   0          15d     10.100.166.135   node1    <none>           <none>
kuboard-74c645f5df-zzwnm                   1/1     Running   0          15d     10.100.104.2     node2    <none>           <none>
metrics-server-7dbf6c4558-qhjw4            1/1     Running   0          15d     192.168.52.152   node1    <none>           <none>
node-exporter-57djg                        1/1     Running   0          3m13s   192.168.52.152   node1    <none>           <none>
node-exporter-5kcnx                        1/1     Running   0          3m13s   192.168.52.151   master   <none>           <none>
node-exporter-cz45t                        1/1     Running   0          3m13s   192.168.52.153   node2    <none>           <none>
As shown above, three node-exporter-xxx Pods were created, one on each of the cluster's three nodes, all in the Running state. Each node's performance metrics can now be fetched from:
curl http://192.168.52.151:9100/metrics
curl http://192.168.52.152:9100/metrics
curl http://192.168.52.153:9100/metrics
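To spot-check that the exporter is exposing real node data, we can filter the output for a well-known metric. A minimal sketch against the master node (any of the three IPs works):
# node_memory_MemTotal_bytes should report the node's total physical memory
curl -s http://192.168.52.151:9100/metrics | grep node_memory_MemTotal_bytes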
Token creation
With node-exporter deployed on the Kubernetes cluster, the next step is to plug it into Prometheus through the Kubernetes service-discovery mechanism, and interacting with Kubernetes requires token authentication.
1. Define a ServiceAccount in p8s_sa.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
# NOTE: rbac.authorization.k8s.io/v1beta1 is deprecated in v1.17+ and
# removed in v1.22+; on newer clusters use rbac.authorization.k8s.io/v1.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
2. Create the ServiceAccount:
[root@master k8s-demo]# kubectl apply -f p8s_sa.yaml
serviceaccount/prometheus created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/prometheus created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
3. Inspect the ServiceAccount:
[root@master k8s-demo]# kubectl get sa prometheus -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"prometheus","namespace":"kube-system"}}
  creationTimestamp: "2021-07-21T04:47:10Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:secrets:
        .: {}
        k:{"name":"prometheus-token-6hln9"}:
          .: {}
          f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-07-21T04:47:10Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-21T04:47:10Z"
  name: prometheus
  namespace: kube-system
  resourceVersion: "113843"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/prometheus
  uid: cbfe8330-de8f-40fd-a9b3-5aa312bb9104
secrets:
- name: prometheus-token-6hln9
4. Fetch the secret identified by secrets.name:
[root@master k8s-demo]# kubectl describe secret prometheus-token-6hln9 -n kube-system
Name:         prometheus-token-6hln9
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: cbfe8330-de8f-40fd-a9b3-5aa312bb9104

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1066 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx6VHBOSXRwSmFCNmc2aXppS2tFeXFSTjlNVzJMNHhGX05fT3dLcXppSDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLTZobG45Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjYmZlODMzMC1kZThmLTQwZmQtYTliMy01YWEzMTJiYjkxMDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.my_sEOjhx4hxApeGRhZmpFwK7snRKuYDyjlToYzXZSytdefPugMiHP1lA0bkDvxPiS0Pces2_hSJlB0pRDacqAgipE2_hqIx2GUO6t35mfbTthB7k4wbf9rQT4lag9XUzjdInOEV3SF4nfCG1DcbSM8a9COSXJUXkshXfollPYj1AGvAmTVYSSmK_b898z64WsDk9JNMjyM7VrI-kj20fKVgc0Ngi4kV3XKqRkCuKIZXKudmuUaqthbeVhaOKWhXzfBW2wDaVsNzsHMLqzwp8vVRIfZbudQ9gVGVZoskgRYiyNoNJcLjbphdxRN1hhWoBTITKHHFyQhwZGzTBo_f6g
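Rather than copying the token out of the describe output by hand, it can also be extracted directly from the Secret; a small sketch using kubectl's jsonpath output (the token is stored base64-encoded in the Secret object):
# Extract and decode the ServiceAccount token straight into token.k8s
kubectl get secret prometheus-token-6hln9 -n kube-system \
  -o jsonpath='{.data.token}' | base64 -d > token.k8s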
5. Save the token into a file named token.k8s:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ikx6VHBOSXRwSmFCNmc2aXppS2tFeXFSTjlNVzJMNHhGX05fT3dLcXppSDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLTZobG45Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjYmZlODMzMC1kZThmLTQwZmQtYTliMy01YWEzMTJiYjkxMDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.my_sEOjhx4hxApeGRhZmpFwK7snRKuYDyjlToYzXZSytdefPugMiHP1lA0bkDvxPiS0Pces2_hSJlB0pRDacqAgipE2_hqIx2GUO6t35mfbTthB7k4wbf9rQT4lag9XUzjdInOEV3SF4nfCG1DcbSM8a9COSXJUXkshXfollPYj1AGvAmTVYSSmK_b898z64WsDk9JNMjyM7VrI-kj20fKVgc0Ngi4kV3XKqRkCuKIZXKudmuUaqthbeVhaOKWhXzfBW2wDaVsNzsHMLqzwp8vVRIfZbudQ9gVGVZoskgRYiyNoNJcLjbphdxRN1hhWoBTITKHHFyQhwZGzTBo_f6g
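Before wiring the token into Prometheus, it is worth verifying that it actually grants API access. A minimal check, assuming token.k8s is in the current directory and using the master's address from the environment above (-k skips TLS verification, matching the insecure_skip_verify setting used later):
# Listing nodes exercises the 'nodes: list' permission granted to the ClusterRole
curl -sk -H "Authorization: Bearer $(cat token.k8s)" https://192.168.52.151:6443/api/v1/nodes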
Prometheus integration
Now we use the Kubernetes service-discovery mechanism to hook the node-exporter deployed above into Prometheus and scrape its performance metrics.
1. Add a scrape job to the prometheus.yml configuration:
- job_name: kubernetes-nodes
  kubernetes_sd_configs:
  - role: node
    api_server: https://apiserver.simon:6443
    # Token/TLS settings here authenticate the service-discovery
    # calls against the Kubernetes API server.
    bearer_token_file: /tools/token.k8s
    tls_config:
      insecure_skip_verify: true
  # Token/TLS settings at the job level are used when scraping the targets.
  bearer_token_file: /tools/token.k8s
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  # Discovered node addresses point at the kubelet port 10250;
  # rewrite them to the node_exporter port 9100.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
    action: replace
  # Map all Kubernetes node labels onto the target's labels.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
Notes:
a. bearer_token_file points to the token file we prepared in the previous step.
b. The api_server address can be found in the /root/.kube/config file.
c. Also add the hostname mapping to /etc/hosts:
192.168.52.151 apiserver.simon
2. Verify the integration:
In the Prometheus UI, the three cluster nodes now appear as targets, and querying a node memory metric shows that the performance metrics of all three nodes are indeed being scraped by Prometheus:
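The same check can be done without the UI by querying the Prometheus HTTP API. A minimal sketch, assuming Prometheus is listening on its default port 9090 on localhost (adjust the address to your deployment); the query should return one sample per node:
# Query a node memory metric through the Prometheus HTTP API
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=node_memory_MemTotal_bytes'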
Grafana deployment
node-exporter is deployed across the cluster and hooked into Prometheus, which is successfully scraping the node performance metrics. Now for the last step: deploy Grafana and use PromQL to pull the performance data out of Prometheus for display.
1. Before deploying Grafana, we need to set up NFS for the PV and PVC. On the master:
# Install the NFS server on the master
[root@master ~]# yum install nfs-utils -y
# Create a shared directory
[root@master ~]# mkdir /data/k8s -pv
# Export the shared directory read-write to all hosts on the network
[root@master ~]# vi /etc/exports
[root@master ~]# more /etc/exports
/data/k8s *(rw,no_root_squash,no_all_squash,sync)
# Start the NFS service
[root@master ~]# systemctl start nfs
Next, install nfs-utils on every node in the cluster as well, so the nodes are able to mount NFS volumes:
# Install the NFS utilities on each node; the service does not need to be started
[root@master ~]# yum install nfs-utils -y
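To confirm the export is visible from a node, showmount (shipped with nfs-utils) can be pointed at the master's IP from the environment above:
# List the NFS exports offered by the master
showmount -e 192.168.52.151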
2. Create the PV and PVC manifest grafana-volume.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
spec:
  capacity:
    storage: 512Mi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.52.151
    path: /data/k8s
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 512Mi
3. Check the PV and PVC:
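A sketch of the commands involved; once the claim matches the volume, both should report a Bound status:
# Create the volume objects and verify their binding status
kubectl apply -f grafana-volume.yaml
kubectl get pv grafana
kubectl get pvc grafana -n kube-system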
4. Create the Grafana Pod with a Deployment controller:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 0
        runAsUser: 0
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
The two important environment variables here, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD, configure Grafana's admin user and password. Grafana stores dashboards, plugins, and other data under /var/lib/grafana, so if we want that data persisted, this is the directory we must declare a volume mount for; everything else is the same as our earlier Deployments.
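Assuming the Deployment above is saved as grafana-deploy.yaml (the filename is not given here, so adjust it to whatever you used), it is created like any other manifest:
# Create the Grafana Deployment and watch for the Pod to become Ready
kubectl apply -f grafana-deploy.yaml
kubectl get pod -n kube-system -l app=grafana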
5. Create the Service:
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
6. Check the Service:
[root@master k8s-demo]# kubectl get svc -n kube-system -owide
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
grafana              NodePort    10.96.80.215   <none>        3000:30441/TCP           2m32s   app=grafana
kube-dns             ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   16d     k8s-app=kube-dns
kube-state-metrics   ClusterIP   None           <none>        8080/TCP,8081/TCP        15d     app.kubernetes.io/name=kube-state-metrics
kuboard              NodePort    10.96.52.49    <none>        80:32567/TCP             15d     k8s.kuboard.cn/layer=monitor,k8s.kuboard.cn/name=kuboard
metrics-server       ClusterIP   10.96.162.56   <none>        443/TCP                  15d     k8s-app=metrics-server
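Grafana is now exposed on NodePort 30441 on every node. As a quick check, its health endpoint (the same one the probes use) can be hit directly, here with the master's IP from the environment above:
# A healthy instance answers with JSON that includes "database": "ok"
curl -s http://192.168.52.151:30441/api/health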
Dashboard configuration
1. With Grafana deployed, open any cluster node's IP at port 30441 in a browser to reach the Grafana UI, and log in with admin/admin:
2. Create the Prometheus data source:
3. Import dashboard 8919, and the node performance metrics of the Kubernetes cluster are rendered on the template, as shown below:
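Dashboards like 8919 are built from PromQL expressions over node_exporter metrics. As one illustrative example (not necessarily the exact expression the dashboard uses), per-node CPU utilisation is commonly derived from the idle CPU counter:
# Per-node CPU usage in percent, queried through the Prometheus HTTP API
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'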