Monitoring Kubernetes with Prometheus Operator
1. Prometheus basic architecture
Prometheus is a complete open source monitoring solution that covers the whole monitoring pipeline: data collection, querying, alerting, and visualization. The Prometheus architecture diagram is shown below:
Official documentation: https://prometheus.io/docs/introduction/overview/
2. Component Description
The Prometheus ecosystem consists of multiple components, many of which are optional.
- Prometheus server
Required. It is essentially a time series database, responsible for pulling, storing, and analyzing data, and it provides the PromQL query language.
- Push Gateway
Optional. An intermediate gateway that lets short-lived jobs actively push their metrics.
- exporters
Agents deployed on the monitored side, such as node_exporter and mysqld_exporter.
A component that exposes monitoring information over an HTTP interface is called an exporter. Exporters already exist for the components most commonly used at internet companies and can be used directly, for example Varnish, HAProxy, Nginx, MySQL, and Linux system information (disk, memory, CPU, network, and so on). For a list, see: https://prometheus.io/docs/instrumenting/exporters/
- alertmanager
Handles alerting. The Prometheus server evaluates its rules and sends firing alerts to the alertmanager component, which then sends notifications (e-mail or webhook, for example) according to its own rules.
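To make the exporter idea above concrete, here is a minimal sketch of an HTTP endpoint that serves metrics in the Prometheus text format, using only the Python standard library. The metric name `demo_queue_length`, the port, and the helper names are hypothetical and chosen only for illustration; a real exporter would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples: dict) -> str:
    """Render name -> value pairs as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(samples.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical sample data; a real exporter would read it from the monitored system.
    samples = {"demo_queue_length": 3}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.samples).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) an HTTP server exposing /metrics."""
    return HTTPServer(("", port), MetricsHandler)
```

Calling `make_server(8000).serve_forever()` would start the endpoint, and Prometheus could then be pointed at `/metrics` on that port.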
3. Prometheus-Operator
Prometheus-Operator architecture diagram:
The figure above is the official Prometheus-Operator architecture diagram. The Operator is the core part: acting as a controller, it creates four CRD resource types (Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule) and then continuously watches and maintains the state of these four kinds of resource objects.
The Prometheus resource object represents a Prometheus server, while a ServiceMonitor is an abstraction over exporters. As described earlier, an exporter is a tool dedicated to exposing a metrics interface, and Prometheus pulls data from the metrics endpoints that a ServiceMonitor describes. Likewise, the Alertmanager resource is the abstraction of an alertmanager instance, and a PrometheusRule holds the alerting rule files used by Prometheus instances.
As a result, deciding what to monitor in the cluster becomes a matter of operating directly on Kubernetes resource objects, which is much more convenient. In the figure above, Service and ServiceMonitor are both Kubernetes resources: a ServiceMonitor uses a labelSelector to match a class of Services, and Prometheus can likewise use a labelSelector to match multiple ServiceMonitors.
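The labelSelector matching described above can be sketched in a few lines of Python. This is only an illustration of equality-based `matchLabels` semantics, not the Operator's actual implementation. The Service names and labels below are hypothetical.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """Return True when every key/value in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical Services, reduced to a name and a label set.
services = [
    {"name": "orders", "labels": {"app": "orders", "team": "shop"}},
    {"name": "payments", "labels": {"app": "payments", "team": "shop"}},
    {"name": "blog", "labels": {"app": "blog", "team": "web"}},
]

# A ServiceMonitor-style selector that picks a whole class of Services.
selector = {"team": "shop"}
selected = [svc["name"] for svc in services if matches(selector, svc["labels"])]
```

The same check applies one level up: a Prometheus object selects ServiceMonitors by comparing its selector with each ServiceMonitor's own labels.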
4. Prometheus-Operator deployment
Official chart address: https://github.com/helm/charts/tree/master/stable/prometheus-operator
Search for the latest chart and download it locally
# Search
helm search prometheus-operator
NAME                        CHART VERSION  APP VERSION  DESCRIPTION
stable/prometheus-operator  6.4.0          0.31.0       Provides easy monitoring definitions for Kubernetes servi...
# Pull to local and unpack into ./prometheus-operator
helm fetch stable/prometheus-operator --untar
Installation
# Create the monitoring namespace
kubectl create ns monitoring
# Install
helm install -f ./prometheus-operator/values.yaml --name prometheus-operator --namespace=monitoring ./prometheus-operator
# Update
helm upgrade -f prometheus-operator/values.yaml prometheus-operator ./prometheus-operator
Uninstall prometheus-operator
helm delete prometheus-operator --purge
# Delete the CRDs
kubectl delete customresourcedefinitions prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com
kubectl delete customresourcedefinitions alertmanagers.monitoring.coreos.com
kubectl delete customresourcedefinitions podmonitors.monitoring.coreos.com
Modify the configuration file values.yaml
4.1. E-mail alerts
config:
  global:
    resolve_timeout: 5m
    smtp_smarthost: 'smtp.qq.com:465'
    smtp_from: '[email protected]'
    smtp_auth_username: '[email protected]'
    smtp_auth_password: 'xreqcqffrxtnieff'
    smtp_hello: '163.com'
    smtp_require_tls: false
  route:
    group_by: ['job','severity']
    group_wait: 30s
    group_interval: 1m
    repeat_interval: 12h
    receiver: default
    routes:
    - receiver: webhook
      match:
        alertname: TargetDown
  receivers:
  - name: default
    email_configs:
    - to: '[email protected]'
      send_resolved: true
  - name: webhook
    email_configs:
    - to: '[email protected]'
      send_resolved: true
There is a pitfall here; see: https://www.cnblogs.com/Dev0ps/p/11320177.html
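The route section of the configuration above decides which receiver handles an alert. The following is a simplified Python sketch of that decision, assuming equality matching only: it ignores `match_re`, `continue`, grouping, and timing, so it is an illustration rather than Alertmanager's real routing code. The `ROUTE` dictionary mirrors the route tree from the configuration, and the alert labels used in the example are hypothetical.

```python
# Route tree copied from the configuration above (timing fields omitted).
ROUTE = {
    "receiver": "default",
    "routes": [
        {"receiver": "webhook", "match": {"alertname": "TargetDown"}},
    ],
}


def pick_receiver(labels: dict, route: dict = ROUTE) -> str:
    """Return the receiver of the first matching child route, else the parent's."""
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(key) == value for key, value in wanted.items()):
            # Descend: a matching child may have more specific children of its own.
            return pick_receiver(labels, child)
    return route["receiver"]


target_down = pick_receiver({"alertname": "TargetDown", "severity": "critical"})
other_alert = pick_receiver({"alertname": "HighLoad", "severity": "warning"})
```

With this tree, a TargetDown alert goes to the `webhook` receiver and everything else falls back to `default`.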
4.2. Prometheus persistent storage
storage:
  volumeClaimTemplate:
    spec:
      storageClassName: nfs-client
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
4.3. Grafana persistence
Path: prometheus-operator/charts/grafana/values.yaml
persistence:
  enabled: true
  storageClassName: "nfs-client"
  accessModes:
  - ReadWriteOnce
  size: 10Gi
4.4. Automatic service discovery
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
- job_name: 'kubernetes-pod'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: istio-mesh
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-telemetry;prometheus
    replacement: $1
    action: keep
- job_name: envoy-stats
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /stats/prometheus
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: pod
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: .*-envoy-prom
    replacement: $1
    action: keep
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:15090
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod_name
    replacement: $1
    action: replace
  metric_relabel_configs:
  - source_labels: [cluster_name]
    separator: ;
    regex: (outbound|inbound|prometheus_stats).*
    replacement: $1
    action: drop
  - source_labels: [tcp_prefix]
    separator: ;
    regex: (outbound|inbound|prometheus_stats).*
    replacement: $1
    action: drop
  - source_labels: [listener_address]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  - source_labels: [http_conn_manager_listener_prefix]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  - source_labels: [http_conn_manager_prefix]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_tls.*
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_tcp_downstream.*
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_http_(stats|admin).*
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_cluster_(lb|retry|bind|internal|max|original).*
    replacement: $1
    action: drop
- job_name: istio-policy
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-policy;http-monitoring
    replacement: $1
    action: keep
- job_name: istio-telemetry
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-telemetry;http-monitoring
    replacement: $1
    action: keep
- job_name: pilot
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-pilot;http-monitoring
    replacement: $1
    action: keep
- job_name: galley
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-galley;http-monitoring
    replacement: $1
    action: keep
- job_name: citadel
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-citadel;http-monitoring
    replacement: $1
    action: keep
- job_name: kubernetes-pods-istio-secure
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: pod
    namespaces:
      names: []
  tls_config:
    ca_file: /etc/istio-certs/root-cert.pem
    cert_file: /etc/istio-certs/cert-chain.pem
    key_file: /etc/istio-certs/key.pem
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status, __meta_kubernetes_pod_annotation_istio_mtls]
    separator: ;
    regex: (([^;]+);([^;]*))|(([^;]*);(true))
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    separator: ;
    regex: (http)
    replacement: $1
    action: drop
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: ([^:]+):(\d+)
    replacement: $1
    action: keep
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod_name
    replacement: $1
    action: replace
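Several jobs above rewrite `__address__` with the regex `([^:]+)(?::\d+)?;(\d+)` and the replacement `$1:$2`. The Python sketch below mimics that single relabel step so the effect is easier to see. It assumes the source label values are joined with `;` and that the regex must match the whole joined string, which is how Prometheus relabeling is documented to behave. The function name and the sample addresses are hypothetical.

```python
import re

# Same pattern as in the relabel rules above.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Replace the port in address with the prometheus.io/port annotation value."""
    joined = f"{address};{port_annotation}"  # source_labels joined by the separator
    match = ADDRESS_RE.fullmatch(joined)
    if match is None:
        # No usable annotation: the replace action leaves the target label unchanged.
        return address
    return f"{match.group(1)}:{match.group(2)}"


with_port = rewrite_address("10.0.0.5:8080", "9102")
without_port = rewrite_address("10.0.0.5", "9102")
no_annotation = rewrite_address("10.0.0.5:8080", "")
```

In other words, the annotated port wins when it is present, and the discovered address is kept as-is when the annotation is missing.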
4.5. etcd
etcd clusters generally enable HTTPS certificate authentication for security, so for Prometheus to access the etcd cluster's monitoring data we need to provide the appropriate certificates.
Since this demonstration environment is a cluster built with kubeadm, we can use kubectl to look up the certificate paths that etcd was started with:
[root@cn-hongkong ~]# kubectl get pod etcd-cn-hongkong.i-j6caps6av1mtyxyofmrw -n kube-system -o yaml
We can see that the certificates etcd uses are under /etc/kubernetes/pki/etcd on the corresponding node, so we first need to store the required certificates in the cluster as a Secret object (run this on the etcd node):
1) Manually fetch the etcd metrics
curl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key https://172.31.182.152:2379/metrics
2) Create the Secret that Prometheus will use for scraping
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
3) Add the kubeEtcd configuration to values.yaml
## Component scraping etcd
##
kubeEtcd:
  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  ##
  service:
    port: 2379
    targetPort: 2379
    selector:
      component: etcd

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and
  ## specifying security configuration below. For example, with a secret named etcd-client-cert
  ##
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    serverName: localhost
    caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
4) Add the etcd-certs Secret created above to the Prometheus configuration (especially important)
## Secrets is a list of Secrets in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
## The Secrets are mounted into /etc/prometheus/secrets/. Secrets changes after initial creation of a Prometheus object are not
## reflected in the running Pods. To change the secrets mounted into the Prometheus Pods, the object must be deleted and recreated
## with the new list of secrets.
##
secrets:
- etcd-certs
After installation, the certificates appear inside the Prometheus pod under /etc/prometheus/secrets/etcd-certs/.
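The caFile, certFile, and keyFile values used in step 3 follow directly from the mount rule quoted in step 4: each Secret is mounted under /etc/prometheus/secrets/ in a directory named after the Secret, with one file per key. The short Python sketch below derives those paths, assuming the keys are the file base names passed to `--from-file` in step 2. The helper name is hypothetical.

```python
import posixpath

# Mount root stated in the chart comment above.
SECRETS_ROOT = "/etc/prometheus/secrets"


def secret_file_path(secret_name: str, key: str) -> str:
    """Return where a given key of a mounted Secret shows up inside the pod."""
    return posixpath.join(SECRETS_ROOT, secret_name, key)


etcd_files = ["ca.crt", "healthcheck-client.crt", "healthcheck-client.key"]
mounted = [secret_file_path("etcd-certs", name) for name in etcd_files]
```

If the Secret name or a file name changes, the three paths in the kubeEtcd serviceMonitor section have to change with it.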
4.6. Scraping a custom service
We need to create a ServiceMonitor. Setting any: true under namespaceSelector matches Services that carry the label app=sscp-transaction in all namespaces.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: sscp-transaction
    release: prometheus-operator
  name: springboot
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    path: /actuator/prometheus
    port: health
    scheme: http
  namespaceSelector:
    any: true
    # matchNames:
    # - sscp-dev
  selector:
    matchLabels:
      app: sscp-transaction
      # release: sscp
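The namespaceSelector in the manifest above has three possible states: `any: true`, an explicit `matchNames` list (commented out here), or nothing at all. The Python sketch below illustrates how these could be evaluated. It assumes that an empty selector limits discovery to the ServiceMonitor's own namespace, which is my understanding of the Operator's documented default, not something stated in this article. The function name is hypothetical.

```python
def namespace_selected(ns_selector: dict, monitor_namespace: str, service_namespace: str) -> bool:
    """Decide whether a Service's namespace is in scope for a ServiceMonitor."""
    if ns_selector.get("any"):
        return True  # any: true matches every namespace
    names = ns_selector.get("matchNames")
    if names:
        return service_namespace in names
    # Assumed default: only the ServiceMonitor's own namespace.
    return service_namespace == monitor_namespace


all_namespaces = namespace_selected({"any": True}, "monitoring", "sscp-dev")
listed_only = namespace_selected({"matchNames": ["sscp-dev"]}, "monitoring", "default")
own_namespace = namespace_selected({}, "monitoring", "monitoring")
```

Switching from `any: true` to the commented `matchNames` entry would therefore narrow scraping to Services in sscp-dev that still carry the app=sscp-transaction label.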
Result: