Table of contents
1. Create a new namespace monitor
2. Deployment
2.1 Deploy cadvisor
2.2 Deploy node-exporter
2.3 Deploy prometheus
2.4 Deploy RBAC permissions
2.5 Deploy kube-state-metrics
2.6 Deploy grafana
3. Test monitoring effect
Reference article:
k8s cluster deployment cadvisor+node-exporter+prometheus+grafana monitoring system - cyh00001 - 博客园
Preparation:
Cluster node introduction:
master: 192.168.136.21 (the following steps are all performed on this node)
worker: 192.168.136.22
worker: 192.168.136.23
## Tip: if vim mangles indentation when pasting, use :set paste to enter paste mode and :set nopaste to return to the default. ##
1. Create a new namespace monitor
kubectl create ns monitor
Pull the cadvisor image. The official image is hosted on Google's registry, which cannot be accessed from China, so a mirrored image is used here; pull it directly and note the image name: lagoudocker/cadvisor:v0.37.0.
docker pull lagoudocker/cadvisor:v0.37.0
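The DaemonSet below uses imagePullPolicy: IfNotPresent, so the image must be present (or pullable) on every node, not only the master. A minimal sketch for pre-pulling it on all three nodes, assuming root ssh access and the node IPs above:
for ip in 192.168.136.21 192.168.136.22 192.168.136.23; do ssh root@$ip docker pull lagoudocker/cadvisor:v0.37.0; done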
2. Deployment
Since there are quite a few configuration files, create a dedicated directory /opt/cadvisor_prome_gra and work from there.
2.1 Deploy cadvisor
Deploy the DaemonSet resource for cadvisor. A DaemonSet ensures that every node in the cluster runs a copy of the same pod, and newly added nodes automatically get a pod as well.
vim case1-daemonset-deploy-cadvisor.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations:            # tolerate the master's NoSchedule taint so the pod runs there too
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always   # restart policy
      containers:
        - name: cadvisor
          image: lagoudocker/cadvisor:v0.37.0
          imagePullPolicy: IfNotPresent   # image pull policy
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: root
              mountPath: /rootfs
            - name: run
              mountPath: /var/run
            - name: sys
              mountPath: /sys
            - name: docker
              mountPath: /var/lib/containerd
      volumes:
        - name: root
          hostPath:
            path: /
        - name: run
          hostPath:
            path: /var/run
        - name: sys
          hostPath:
            path: /sys
        - name: docker
          hostPath:
            path: /var/lib/containerd
kubectl apply -f case1-daemonset-deploy-cadvisor.yaml
Check with: kubectl get pod -n monitor -o wide
Because there are three nodes, there will be three pods; if a worker node is added later, the DaemonSet will automatically create a pod on it.
Test cAdvisor in a browser at <masterIP>:8080 (the pods use hostNetwork, so every node serves metrics on its own port 8080).
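A quick command-line check that cAdvisor is serving metrics (a minimal sketch, assuming the master IP above; any node IP works):
curl -s http://192.168.136.21:8080/metrics | head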
2.2 Deploy node-exporter
Deploy the DaemonSet resource and Service resource of node-exporter.
vim case2-daemonset-deploy-node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
        - image: prom/node-exporter:v1.3.1
          imagePullPolicy: IfNotPresent
          name: prometheus-node-exporter
          ports:
            - containerPort: 9100
              hostPort: 9100
              protocol: TCP
              name: metrics
          volumeMounts:
            - mountPath: /host/proc
              name: proc
            - mountPath: /host/sys
              name: sys
            - mountPath: /host
              name: rootfs
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitor
spec:
  type: NodePort
  ports:
    - name: http
      port: 9100
      nodePort: 39100
      protocol: TCP
  selector:
    k8s-app: node-exporter
kubectl apply -f case2-daemonset-deploy-node-exporter.yaml
kubectl get pod -n monitor
Verify the node-exporter data at <nodeIP>:9100/metrics (note the hostPort 9100).
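A minimal command-line check, assuming the worker IP above (any node works):
curl -s http://192.168.136.22:9100/metrics | grep ^node_load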
2.3 Deploy prometheus
The Prometheus resources include a ConfigMap, a Deployment, and a Service.
vim case3-1-prometheus-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
      - job_name: 'kubernetes-node'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-node-cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      - job_name: 'kubernetes-apiserver'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
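The ConfigMap must exist before the Deployment below can mount it, so apply it first:
kubectl apply -f case3-1-prometheus-cfg.yaml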
Note: in the case3-2 manifest below, nodeName must stay as the node's registered name, k8s-master; it cannot be changed to the host IP. nodeName is matched against the names shown by kubectl get nodes, and the nodes here are registered by hostname, so 192.168.136.21 matches nothing.
Prometheus is pinned to the 192.168.136.21 (k8s-master) node, and its data is stored there under the hostPath /data/prometheusdata.
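Because the hostPath volume below uses type: Directory, the directory must already exist on k8s-master before the pod starts, and Prometheus must be able to write to it (the prom/prometheus image runs as user nobody, uid 65534). A minimal preparation sketch on the master node:
mkdir -p /data/prometheusdata
chown -R 65534:65534 /data/prometheusdata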
vim case3-2-prometheus-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: k8s-master
      serviceAccountName: monitor
      containers:
        - name: prometheus
          image: prom/prometheus:v2.31.2
          imagePullPolicy: IfNotPresent
          command:
            - prometheus
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention=720h
          ports:
            - containerPort: 9090
              protocol: TCP
          volumeMounts:
            - mountPath: /etc/prometheus/prometheus.yml
              name: prometheus-config
              subPath: prometheus.yml
            - mountPath: /prometheus/
              name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
            items:
              - key: prometheus.yml
                path: prometheus.yml
                mode: 0644
        - name: prometheus-storage-volume
          hostPath:
            path: /data/prometheusdata
            type: Directory
Create the ServiceAccount and ClusterRoleBinding:
kubectl create serviceaccount monitor -n monitor
kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor --clusterrole=cluster-admin --serviceaccount=monitor:monitor
kubectl apply -f case3-2-prometheus-deployment.yaml
A pitfall in case3-2: "k8s-master" works as nodeName, but "192.168.136.21" does not. With the IP, the Deployment's pod never came up, and the pod events reported that the host "192.168.136.21" could not be found. As noted above, nodeName must match a node name from kubectl get nodes. (In this deployment, even "k8s-master" did not work at first; after a few days, with the machines shut down in between, it came up fine. The reason for that was never found.)
vim case3-3-prometheus-svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
      protocol: TCP
  selector:
    app: prometheus
    component: server
kubectl apply -f case3-3-prometheus-svc.yaml
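Prometheus should now answer at <masterIP>:30090; under Status > Targets, the kubernetes-node, kubernetes-node-cadvisor, and kubernetes-apiserver jobs should show as UP. A quick check against the HTTP API (a sketch, assuming the master IP above):
curl -s http://192.168.136.21:30090/api/v1/targets | grep -o '"health":"up"' | wc -l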
2.4 Deploy RBAC permissions
This includes Secret, ServiceAccount, ClusterRole, and ClusterRoleBinding resources: the ServiceAccount is a service account, the ClusterRole defines the permission rules, and the ClusterRoleBinding binds the ServiceAccount to the ClusterRole.
The authentication information between a pod and the apiserver is defined by a Secret. Because it is sensitive, it is stored in a Secret resource and mounted into the Pod as a volume; the application running in the Pod can then use it to connect to the apiserver and authenticate.
RBAC is Kubernetes' authorization system, and the above is only a brief sketch. For a deeper treatment, see: rbac authorization of k8s APIserver security mechanism (Stupid child@GF, Knowledge and action blog, CSDN).
vim case4-prom-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitor-token
  namespace: monitor
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
#apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
kubectl apply -f case4-prom-rbac.yaml
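kubectl auth can-i can impersonate the new service account to confirm the binding took effect (a quick check, not part of the original article):
kubectl auth can-i list nodes --as=system:serviceaccount:monitor:prometheus
# expected output: yes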
2.5 Deploy kube-state-metrics
This includes Deployment, Service, ServiceAccount, ClusterRole, and ClusterRoleBinding resources. kube-state-metrics listens to the apiserver and exposes metrics about the state of cluster objects (deployments, pods, and so on).
Note that it is deployed in kube-system!
vim case5-kube-state-metrics-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
        - name: kube-state-metrics
          image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/kube-state-metrics:v2.6.0
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["daemonsets", "deployments", "replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs", "jobs"]
    verbs: ["list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: NodePort
  ports:
    - name: kube-state-metrics
      port: 8080
      targetPort: 8080
      nodePort: 31666
      protocol: TCP
  selector:
    app: kube-state-metrics
kubectl apply -f case5-kube-state-metrics-deploy.yaml
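The Service carries the prometheus.io/scrape: 'true' annotation, so the kubernetes-service-endpoints job should pick it up automatically. A direct check against the NodePort (a sketch, assuming the master IP above):
curl -s http://192.168.136.21:31666/metrics | grep ^kube_deployment_created | head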
2.6 Deploy grafana
Grafana provides the graphical interface on top of the prometheus data source; this includes a Deployment resource and a Service resource.
vim grafana-enterprise.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-enterprise
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-enterprise
  template:
    metadata:
      labels:
        app: grafana-enterprise
    spec:
      containers:
        - image: grafana/grafana
          imagePullPolicy: Always
          #command:
          #  - "tail"
          #  - "-f"
          #  - "/dev/null"
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 0
          name: grafana
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: "/var/lib/grafana"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 2500Mi
      volumes:
        - name: data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31000
  selector:
    app: grafana-enterprise
kubectl apply -f grafana-enterprise.yaml
Open <masterIP>:31000 and log in with the default account admin, password admin.
Add a data source under "Data sources": name it prometheus and point it at the Prometheus NodePort, e.g. http://192.168.136.21:30090 (note the port number 30090).
To import a dashboard, click the "+" sign on the left and select "Import".
Add template 13332; other templates such as 14981, 13824, and 14518 also work.
(Screenshot: dashboard imported from template 13332.)
The cAdvisor dashboard template number is 14282. There is an unresolved bug here: it can monitor the performance of all containers in the cluster, but when a single container is selected, no data is displayed (this should be fixable).
By default the dashboard shows pod IDs, which is inconvenient for the administrator to browse. To display pod names instead, open the dashboard's settings (gear icon on the right), select "Variables", select the second variable, and change "name" to "pod".
Each panel of the dashboard also needs the change: click the panel title, select "Edit", and change "name" to "pod" in the query, as in the sketch below.
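After the rename, the panel expressions end up filtering on the pod label along these lines (a sketch only; the exact queries depend on the template):
sum(rate(container_cpu_usage_seconds_total{pod=~"$pod"}[5m])) by (pod)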
3. Test monitoring effect
Create a new Deployment named nginx01 to test the monitoring.
vim nginx01.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx01
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx01
  template:
    metadata:
      labels:
        app: nginx01
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.9
kubectl apply -f nginx01.yaml
Two nginx01 pods appear because replicas is set to 2.
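To confirm the new pods are actually being scraped, a query like the following in the Prometheus UI (a sketch; the pod label comes from the cAdvisor job above) should return series for both replicas:
container_memory_working_set_bytes{pod=~"nginx01.*"}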
This completes the deployment of cadvisor + node-exporter + prometheus + grafana cluster monitoring.