Before deploying a web site, the architecture diagram usually includes a monitoring layer. Building an effective monitoring platform is essential for operations and maintenance; it is the only way to keep our servers and services running efficiently and stably. There are several common open source monitoring tools, such as Zabbix, Nagios, Open-Falcon, and Prometheus, each with its own strengths and weaknesses, which interested readers can research on their own. For monitoring a Kubernetes cluster, however, Prometheus is by far the friendliest option, so today we will look at how to deploy Prometheus to comprehensively monitor K8S.
Main content
1. Prometheus architecture
2. K8S monitoring metrics and implementation approach
3. Deploying Prometheus on the K8S platform
4. Configuring K8S-based service discovery
5. Deploying Grafana on the K8S platform
6. Monitoring K8S cluster Pods, Nodes, and resource objects
7. Visualizing Prometheus monitoring data with Grafana
8. Alerting rules and alert notifications
1 Prometheus architecture
What is Prometheus
Prometheus is a monitoring system originally built at SoundCloud. It became an open source community project in 2012 and has a very active developer and user community. To emphasize its open source nature and independent maintenance, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016, becoming its second hosted project after Kubernetes.
Official website address:
https://prometheus.io
https://github.com/prometheus
Prometheus components and architecture
- Prometheus Server: collects metrics, stores time series data, and provides a query interface
- Client Library: client libraries for instrumenting application code
- Push Gateway: short-term storage of metric data, mainly used for ephemeral jobs
- Exporters: collect metrics from existing third-party services and expose them to Prometheus
- Alertmanager: handles alerts
- Web UI: a simple web console
Data Model
Prometheus stores all data as time series; series with the same metric name and the same set of labels belong to the same metric.
Each time series is uniquely identified by its metric name and a set of key-value pairs (also known as labels).
Time series format:
<metric name>{<label name>=<label value>, ...}
Jobs and instances
- Instance: an individual scrape target is called an instance
- Job: a collection of instances with the same purpose is called a job
scrape_configs:
  - job_name: 'Prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'Node'
    static_configs:
      - targets: ['192.168.1.10:9090']
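In PromQL, any series can then be selected by metric name plus label matchers. Two illustrative queries against the jobs above (a sketch; the instance values are the configured targets):
up{job="Node"}                      # 1 while the last scrape succeeded, 0 otherwise
node_cpu_seconds_total{mode="idle", instance="192.168.1.10:9090"}   # one idle-time counter per CPU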
2 K8S monitoring metrics and implementation approach
K8S monitoring metrics
Kubernetes itself:
- Node resource utilization
- Number of Nodes
- Number of Pods per Node
- Resource object status
Pods:
- Number of Pods per project
- Container resource utilization
Applications:
Prometheus monitoring architecture for K8S
Monitoring target | Implementation | Examples |
---|---|---|
Pod performance | cAdvisor | container CPU, memory usage |
Node performance | node-exporter | node CPU, memory usage |
K8S resource objects | kube-state-metrics | Pod/Deployment/Service status |
Service Discovery:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
3 Deploying Prometheus on the K8S platform
3.1 Cluster environment
IP address | Role | Remarks |
---|---|---|
192.168.73.136 | nfs | |
192.168.73.138 | k8s-master | |
192.168.73.139 | k8s-node01 | |
192.168.73.140 | k8s-node02 | |
192.168.73.135 | k8s-node03 | |
3.2 Project Address:
[root@k8s-master src]# git clone https://github.com/zhangdongdong7/k8s-prometheus.git
Cloning into 'k8s-prometheus'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
[root@k8s-master src]# cd k8s-prometheus/
[root@k8s-master k8s-prometheus]# ls
alertmanager-configmap.yaml kube-state-metrics-rbac.yaml prometheus-rbac.yaml
alertmanager-deployment.yaml kube-state-metrics-service.yaml prometheus-rules.yaml
alertmanager-pvc.yaml node_exporter-0.17.0.linux-amd64.tar.gz prometheus-service.yaml
alertmanager-service.yaml node_exporter.sh prometheus-statefulset-static-pv.yaml
grafana.yaml OWNERS prometheus-statefulset.yaml
kube-state-metrics-deployment.yaml prometheus-configmap.yaml README.md
3.3 Authorization
RBAC (Role-Based Access Control) is responsible for authorization.
Write the authorization YAML:
[root@k8s-master prometheus-k8s]# vim prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
- ""
resources:
- nodes
- nodes/metrics
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- nonResourceURLs:
- "/metrics"
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
"prometheus-rbac.yaml" 55L, 1080C 1,1 Top
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
1,1 Top
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
- ""
resources:
- nodes
- nodes/metrics
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- nonResourceURLs:
- "/metrics"
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: kube-system
Create it:
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rbac.yaml
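As an optional sanity check (not part of the original procedure), you can verify the objects exist and test the ServiceAccount's permissions; kubectl auth can-i should print yes once the ClusterRoleBinding is in place:
[root@k8s-master prometheus-k8s]# kubectl get serviceaccount prometheus -n kube-system
[root@k8s-master prometheus-k8s]# kubectl get clusterrole prometheus
[root@k8s-master prometheus-k8s]# kubectl auth can-i list pods --as=system:serviceaccount:kube-system:prometheus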
3.4 Configuration Management
A ConfigMap is used to store configuration that does not need to be encrypted.
Modify the node IP addresses in it to match your own environment.
[root@k8s-master prometheus-k8s]# vim prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
prometheus.yml: |
rule_files:
- /etc/config/rules/*.rules
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
- localhost:9090
- job_name: kubernetes-nodes
scrape_interval: 30s
static_configs:
- targets:
- 192.168.73.135:9100
- 192.168.73.138:9100
- 192.168.73.139:9100
- 192.168.73.140:9100
- job_name: kubernetes-apiservers
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: default;kubernetes;https
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_service_name
- __meta_kubernetes_endpoint_port_name
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
- job_name: kubernetes-nodes-kubelet
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
- job_name: kubernetes-nodes-cadvisor
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __metrics_path__
replacement: /metrics/cadvisor
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
- job_name: kubernetes-service-endpoints
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_service_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name
- job_name: kubernetes-services
kubernetes_sd_configs:
- role: service
metrics_path: /probe
params:
module:
- http_2xx
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_probe
- source_labels:
- __address__
target_label: __param_target
- replacement: blackbox
target_label: __address__
- source_labels:
- __param_target
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
alerting:
alertmanagers:
- static_configs:
- targets: ["alertmanager:80"]
Create it:
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-configmap.yaml
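Optionally, confirm the ConfigMap was created and contains the prometheus.yml key:
[root@k8s-master prometheus-k8s]# kubectl describe configmap prometheus-config -n kube-system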
3.5 Deploy Prometheus as a StatefulSet
Dynamic provisioning via a StorageClass is used here to persist the Prometheus data; for the concrete setup, see the earlier article "NFS dynamic storage provisioning in k8s". Alternatively, static provisioning via prometheus-statefulset-static-pv.yaml can be used for persistence. A sketch of the assumed StorageClass follows.
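For reference, a minimal sketch of the StorageClass that prometheus-statefulset.yaml assumes; the managed-nfs-storage name comes from that article, and the provisioner value must match whatever your nfs-client-provisioner was deployed with (fuseim.pri/ifs is only a common default):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: fuseim.pri/ifs   # must match the PROVISIONER_NAME env of the nfs-client-provisioner
parameters:
  archiveOnDelete: "false"    # do not keep data when a PVC is deleted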
[root@k8s-master prometheus-k8s]# vim prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
namespace: kube-system
labels:
k8s-app: prometheus
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v2.2.1
spec:
serviceName: "prometheus"
replicas: 1
podManagementPolicy: "Parallel"
updateStrategy:
type: "RollingUpdate"
selector:
matchLabels:
k8s-app: prometheus
template:
metadata:
labels:
k8s-app: prometheus
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
serviceAccountName: prometheus
initContainers:
- name: "init-chown-data"
image: "busybox:latest"
imagePullPolicy: "IfNotPresent"
command: ["chown", "-R", "65534:65534", "/data"]
volumeMounts:
- name: prometheus-data
mountPath: /data
subPath: ""
containers:
- name: prometheus-server-configmap-reload
image: "jimmidyson/configmap-reload:v0.1"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9090/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
- name: prometheus-server
image: "prom/prometheus:v2.2.1"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
ports:
- containerPort: 9090
readinessProbe:
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
# based on 10 running nodes with 30 pods each
resources:
limits:
cpu: 200m
memory: 1000Mi
requests:
cpu: 200m
memory: 1000Mi
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: prometheus-data
mountPath: /data
subPath: ""
- name: prometheus-rules
mountPath: /etc/config/rules
terminationGracePeriodSeconds: 300
volumes:
- name: config-volume
configMap:
name: prometheus-config
- name: prometheus-rules
configMap:
name: prometheus-rules
volumeClaimTemplates:
- metadata:
name: prometheus-data
spec:
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "16Gi"
Create it:
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-statefulset.yaml
Check the status:
[root@k8s-master prometheus-k8s]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
alertmanager-5d75d5688f-fmlq6 2/2 Running 0 8d
coredns-5bd5f9dbd9-wv45t 1/1 Running 1 8d
grafana-0 1/1 Running 2 14d
kube-state-metrics-7c76bdbf68-kqqgd 2/2 Running 6 13d
kubernetes-dashboard-7d77666777-d5ng4 1/1 Running 5 14d
prometheus-0 2/2 Running 6 14d
You can see the pod prometheus-0, deployed by the StatefulSet controller we just applied. A status of Running is normal; if it is not Running, use kubectl describe pod prometheus-0 -n kube-system to view the error details.
3.6 Create a Service to expose the access port
A fixed NodePort is used here so that the access port is easy to remember.
[root@k8s-master prometheus-k8s]# vim prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
name: prometheus
namespace: kube-system
labels:
kubernetes.io/name: "Prometheus"
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
type: NodePort
ports:
- name: http
port: 9090
protocol: TCP
targetPort: 9090
nodePort: 30090
selector:
k8s-app: prometheus
Create it:
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-service.yaml
Check:
[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-5bd5f9dbd9-wv45t 1/1 Running 1 8d
pod/kubernetes-dashboard-7d77666777-d5ng4 1/1 Running 5 14d
pod/prometheus-0 2/2 Running 6 14d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.0.0.2 <none> 53/UDP,53/TCP 13d
service/kubernetes-dashboard NodePort 10.0.0.127 <none> 443:30001/TCP 16d
service/prometheus NodePort 10.0.0.33 <none> 9090:30090/TCP 13d
3.7 Web access
Access the UI using any node IP plus the NodePort (http://NodeIP:Port); in this case: http://192.168.73.139:30090
If the interface shown in the figure loads, access succeeded:
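The health endpoint can also be probed from the command line; /-/healthy is the same path the liveness probe uses:
[root@k8s-master prometheus-k8s]# curl -s http://192.168.73.139:30090/-/healthy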
4 Deploying Grafana on the K8S platform
From the web access above we can see that Prometheus's built-in UI is limited and its visualization features cannot meet daily monitoring needs, so Prometheus is usually combined with Grafana to visualize the data.
Official addresses:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus
https://grafana.com/grafana/download
The project cloned earlier already contains a written Grafana YAML; modify it for your own environment.
4.1 Deploy Grafana with a StatefulSet
[root@k8s-master prometheus-k8s]# vim grafana.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: grafana
namespace: kube-system
spec:
serviceName: "grafana"
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana
ports:
- containerPort: 3000
protocol: TCP
resources:
limits:
cpu: 100m
memory: 256Mi
requests:
cpu: 100m
memory: 256Mi
volumeMounts:
- name: grafana-data
mountPath: /var/lib/grafana
subPath: grafana
securityContext:
fsGroup: 472
runAsUser: 472
volumeClaimTemplates:
- metadata:
name: grafana-data
spec:
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
name: grafana
namespace: kube-system
spec:
type: NodePort
ports:
- port: 80
targetPort: 3000
nodePort: 30091
selector:
app: grafana
4.2 Grafana web access
Access Grafana using any node IP plus the NodePort (http://NodeIP:Port); in this case: http://192.168.73.139:30091
On success you will see the login screen; the default username and password are both admin, and you will be asked to change the password after first login.
After logging in, the interface looks as follows.
Step 1: add a data source. Click the "create your first data source" database icon and fill in the fields as shown in the figure.
Step 2: click the green Save & Test button at the bottom; the prompt "Data source is working" means the data source was added successfully.
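Note that since Grafana and Prometheus both run in the kube-system namespace, the data source URL can point at the in-cluster Service instead of a NodePort, for example http://prometheus:9090 (or the fully qualified http://prometheus.kube-system.svc.cluster.local:9090), assuming cluster DNS is working.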
4.3 Monitoring K8S cluster Pods, Nodes, and resource objects
Pods
The kubelet's built-in cAdvisor provides interfaces exposing performance metrics for all Pods and containers on a node.
Exposed interface addresses:
https://NodeIP:10255/metrics/cadvisor
https://NodeIP:10250/metrics/cadvisor
Node
node_exporter is used to collect node resource utilization.
https://github.com/prometheus/node_exporter
Usage documentation: https://prometheus.io/docs/guides/node-exporter/
Use the node_exporter.sh script to deploy the node_exporter collector on each server; the script can be run directly without modification.
[root@k8s-master prometheus-k8s]# cat node_exporter.sh
#!/bin/bash
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
tar zxf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter
cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
[root@k8s-master prometheus-k8s]# ./node_exporter.sh
- Verify that the node_exporter process is running:
[root@k8s-master prometheus-k8s]# ps -ef|grep node_exporter
root 6227 1 0 Oct08 ? 00:06:43 /usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service
root 118269 117584 0 23:27 pts/0 00:00:00 grep --color=auto node_exporter
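You can also confirm the metrics endpoint answers on the default port 9100, the same port the kubernetes-nodes job scrapes:
[root@k8s-master prometheus-k8s]# curl -s localhost:9100/metrics | head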
Resource objects
kube-state-metrics collects status information about the various K8S resource objects; deploying it on the master node alone is sufficient.
https://github.com/kubernetes/kube-state-metrics
- Create the RBAC YAML to authorize kube-state-metrics:
[root@k8s-master prometheus-k8s]# vim kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-state-metrics
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
resources:
- configmaps
- secrets
- nodes
- pods
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: kube-state-metrics-resizer
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
resources:
- pods
verbs: ["get"]
- apiGroups: ["extensions"]
resources:
- deployments
resourceNames: ["kube-state-metrics"]
verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-state-metrics
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-rbac.yaml
- Write the Deployment and ConfigMap YAML that deploy the kube-state-metrics pod; no modification is required:
[root@k8s-master prometheus-k8s]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
k8s-app: kube-state-metrics
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v1.3.0
spec:
selector:
matchLabels:
k8s-app: kube-state-metrics
version: v1.3.0
replicas: 1
template:
metadata:
labels:
k8s-app: kube-state-metrics
version: v1.3.0
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: lizhenliang/kube-state-metrics:v1.3.0
ports:
- name: http-metrics
containerPort: 8080
- name: telemetry
containerPort: 8081
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
timeoutSeconds: 5
- name: addon-resizer
image: lizhenliang/addon-resizer:1.8.3
resources:
limits:
cpu: 100m
memory: 30Mi
requests:
cpu: 100m
memory: 30Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: config-volume
mountPath: /etc/config
command:
- /pod_nanny
- --config-dir=/etc/config
- --container=kube-state-metrics
- --cpu=100m
- --extra-cpu=1m
- --memory=100Mi
- --extra-memory=2Mi
- --threshold=5
- --deployment=kube-state-metrics
volumes:
- name: config-volume
configMap:
name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-state-metrics-config
namespace: kube-system
labels:
k8s-app: kube-state-metrics
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
data:
NannyConfiguration: |-
apiVersion: nannyconfig/v1alpha1
kind: NannyConfiguration
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-deployment.yaml
- Write the Service YAML that exposes the kube-state-metrics ports:
[root@k8s-master prometheus-k8s]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "kube-state-metrics"
annotations:
prometheus.io/scrape: 'true'
spec:
ports:
- name: http-metrics
port: 8080
targetPort: http-metrics
protocol: TCP
- name: telemetry
port: 8081
targetPort: telemetry
protocol: TCP
selector:
k8s-app: kube-state-metrics
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-service.yaml
- Check the pod and svc status; pod/kube-state-metrics-7c76bdbf68-kqqgd is running normally and ports 8080 and 8081 are exposed:
[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system
NAME READY STATUS RESTARTS AGE
pod/alertmanager-5d75d5688f-fmlq6 2/2 Running 0 9d
pod/coredns-5bd5f9dbd9-wv45t 1/1 Running 1 9d
pod/grafana-0 1/1 Running 2 15d
pod/kube-state-metrics-7c76bdbf68-kqqgd 2/2 Running 6 14d
pod/kubernetes-dashboard-7d77666777-d5ng4 1/1 Running 5 16d
pod/prometheus-0 2/2 Running 6 15d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager ClusterIP 10.0.0.207 <none> 80/TCP 13d
service/grafana NodePort 10.0.0.74 <none> 80:30091/TCP 15d
service/kube-dns ClusterIP 10.0.0.2 <none> 53/UDP,53/TCP 14d
service/kube-state-metrics ClusterIP 10.0.0.194 <none> 8080/TCP,8081/TCP 14d
service/kubernetes-dashboard NodePort 10.0.0.127 <none> 443:30001/TCP 17d
service/prometheus NodePort 10.0.0.33 <none> 9090:30090/TCP 14d
[root@k8s-master prometheus-k8s]#
5 Visualizing Prometheus monitoring data with Grafana
When using Prometheus to collect data we usually want to monitor the cluster's Pods, Nodes, and resource objects, which requires the corresponding collectors to be installed and to expose APIs for scraping; these were configured in 4.3. The state of each collector can be checked in the Prometheus UI under the Status > Targets menu, as shown:
Only when every Target is in the UP state can Prometheus's own interface query the data for a monitored item, as shown:
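A few example expressions to try in the query box; the CPU expression is the same one the alert rules use later, while the kube-state-metrics and cAdvisor metric and label names can differ slightly between versions:
up
100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)
sum(kube_pod_status_phase) by (phase)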
Even so, the Prometheus UI's visualization is fairly basic and cannot meet day-to-day needs, so we combine Prometheus with Grafana to visualize the monitoring data. Grafana was already deployed in the previous chapter, so in principle we could design dashboards and add panels for the items to be monitored; in practice, however, the Grafana community offers many mature templates that can be used directly, after which each panel's query only needs adjusting to fetch data from your own environment.
https://grafana.com/grafana/dashboards
Recommended templates:
- Cluster resource monitoring: 3119
When a panel in an imported template shows no data, click Edit on that panel to inspect its PromQL query, then debug the expression in Prometheus's own UI to confirm it returns a value; the adjusted monitoring interface is shown in the figure.
- Cluster resource status monitoring: 6417
Similarly, import the resource status template; after adjustment, the monitoring interface shown in the figure displays the status of the various K8S resources.
- Node monitoring: 9276
Similarly, import the node template; after adjustment, the monitoring interface shown in the figure displays basic information about each node.
6 Deploying Alertmanager in K8S
6.1 Deploy Alertmanager
6.2 Deploy alerting
Alert notifications will be delivered by email.
- First, prepare a sender mailbox and enable SMTP sending.
- Use a ConfigMap to store the alert rules. Write the alert rules YAML file; rules can be modified or added according to your actual situation. Here Prometheus is more troublesome than Zabbix: every alert rule has to be defined by yourself.
[root@k8s-master prometheus-k8s]# vim prometheus-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: kube-system
data:
general.rules: |
groups:
- name: general.rules
rules:
- alert: InstanceDown
expr: up == 0
for: 1m
labels:
severity: error
annotations:
summary: "Instance {{ $labels.instance }} 停止工作"
description: "{{ $labels.instance }} job {{ $labels.job }} 已经停止5分钟以上."
node.rules: |
groups:
- name: node.rules
rules:
- alert: NodeFilesystemUsage
expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})"
- alert: NodeMemoryUsage
expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} 内存使用率过高"
description: "{{ $labels.instance }}内存使用大于80% (当前值: {{ $value }})"
- alert: NodeCPUUsage
expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} CPU使用率过高"
description: "{{ $labels.instance }}CPU使用大于60% (当前值: {{ $value }})"
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rules.yaml
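If you want to lint the rules before applying them, promtool (shipped in the Prometheus release tarball) can check a local copy; a sketch, assuming each data key is saved to a file of the same name:
[root@k8s-master prometheus-k8s]# promtool check rules general.rules node.rules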
- Write the Alertmanager ConfigMap YAML, adding the alerting configuration and the email delivery settings:
[root@k8s-master prometheus-k8s]# vim alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
alertmanager.yml: |
global:
resolve_timeout: 5m
smtp_smarthost: 'mail.goldwind.com.cn:587' # log in to the mailbox to verify
smtp_from: '[email protected]' # configure with your own sender mailbox
smtp_auth_username: '[email protected]'
smtp_auth_password: 'Dbadmin@123'
receivers:
- name: default-receiver
email_configs:
- to: "[email protected]"
route:
group_interval: 1m
group_wait: 10s
receiver: default-receiver
repeat_interval: 1m
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-configmap.yaml
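Optionally, the Alertmanager configuration syntax can be linted before applying with amtool (shipped with newer Alertmanager releases), assuming the alertmanager.yml block is saved to a local file:
[root@k8s-master prometheus-k8s]# amtool check-config alertmanager.yml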
- Create a PVC for data persistence. My YAML file uses the same StorageClass as the Prometheus installation for dynamic provisioning; modify it according to your actual situation:
[root@k8s-master prometheus-k8s]# vim alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-pvc.yaml
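Verify that the claim was bound by the StorageClass (STATUS should show Bound):
[root@k8s-master prometheus-k8s]# kubectl get pvc alertmanager -n kube-system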
- Write the Deployment YAML to deploy the Alertmanager pod:
[root@k8s-master prometheus-k8s]# vim alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager
namespace: kube-system
labels:
k8s-app: alertmanager
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v0.14.0
spec:
replicas: 1
selector:
matchLabels:
k8s-app: alertmanager
version: v0.14.0
template:
metadata:
labels:
k8s-app: alertmanager
version: v0.14.0
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
containers:
- name: prometheus-alertmanager
image: "prom/alertmanager:v0.14.0"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/alertmanager.yml
- --storage.path=/data
- --web.external-url=/
ports:
- containerPort: 9093
readinessProbe:
httpGet:
path: /#/status
port: 9093
initialDelaySeconds: 30
timeoutSeconds: 30
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: storage-volume
mountPath: "/data"
subPath: ""
resources:
limits:
cpu: 10m
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
- name: prometheus-alertmanager-configmap-reload
image: "jimmidyson/configmap-reload:v0.1"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9093/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
volumes:
- name: config-volume
configMap:
name: alertmanager-config
- name: storage-volume
persistentVolumeClaim:
claimName: alertmanager
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-deployment.yaml
- Create the Alertmanager Service to expose its port:
[root@k8s-master prometheus-k8s]# vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "Alertmanager"
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 9093
selector:
k8s-app: alertmanager
type: "ClusterIP"
[root@k8s-master prometheus-k8s]# kubectl apply -f alertmanager-service.yaml
- Check the deployment status; pod/alertmanager-5d75d5688f-fmlq6 and service/alertmanager are running normally:
[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/alertmanager-5d75d5688f-fmlq6 2/2 Running 4 10d 172.17.15.2 192.168.73.140 <none> <none>
pod/coredns-5bd5f9dbd9-qxvmz 1/1 Running 0 42m 172.17.33.2 192.168.73.138 <none> <none>
pod/grafana-0 1/1 Running 3 16d 172.17.31.2 192.168.73.139 <none> <none>
pod/kube-state-metrics-7c76bdbf68-hv56m 2/2 Running 0 23h 172.17.15.3 192.168.73.140 <none> <none>
pod/kubernetes-dashboard-7d77666777-d5ng4 1/1 Running 6 17d 172.17.31.4 192.168.73.139 <none> <none>
pod/prometheus-0 2/2 Running 8 16d 172.17.83.2 192.168.73.135 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/alertmanager ClusterIP 10.0.0.207 <none> 80/TCP 14d k8s-app=alertmanager
service/grafana NodePort 10.0.0.74 <none> 80:30091/TCP 16d app=grafana
service/kube-dns ClusterIP 10.0.0.2 <none> 53/UDP,53/TCP 42m k8s-app=kube-dns
service/kube-state-metrics ClusterIP 10.0.0.194 <none> 8080/TCP,8081/TCP 15d k8s-app=kube-state-metrics
service/kubernetes-dashboard NodePort 10.0.0.127 <none> 443:30001/TCP 18d k8s-app=kubernetes-dashboard
service/prometheus NodePort 10.0.0.33 <none> 9090:30090/TCP 15d k8s-app=prometheus
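Finally, to test the alerting chain end to end (a sketch, not part of the original procedure): stop node_exporter on any monitored node so the InstanceDown rule fires after its 1m hold period, then watch the alert arrive in Alertmanager through a port-forward; remember to start node_exporter again afterwards.
[root@k8s-node01 ~]# systemctl stop node_exporter
[root@k8s-master prometheus-k8s]# kubectl port-forward -n kube-system svc/alertmanager 9093:80
[root@k8s-master prometheus-k8s]# curl -s http://localhost:9093/api/v1/alerts
[root@k8s-node01 ~]# systemctl start node_exporter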