Installing Prometheus Operator on k8s with Helm

Prerequisites:
You should already have basic working knowledge of prometheus/alertmanager/grafana and roughly understand what their configuration files do; otherwise some of the concepts may be hard to follow and you may struggle with the web UIs. I will not explain those basics in detail here.
 
 
 
1. System environment
  • OS: CentOS 7.6
  • docker: Client 18.09.7, Server 18.09.7
  • k8s: v1.16.2
  • helm: Client v2.13.1, Server v2.13.1
 
Confirm the helm repositories and update them
[root@ops1 test]# helm repo add stable http://mirror.azure.cn/kubernetes/charts/
[root@ops1 test]# helm repo list
NAME            URL
local           http://127.0.0.1:8879/charts 
stable          http://mirror.azure.cn/kubernetes/charts/
incubator       http://mirror.azure.cn/kubernetes/charts-incubator/
[root@ops1 test]# helm repo update
 
2. Install Prometheus Operator
Search for and fetch the prometheus-operator chart archive; if you are curious, have a look at its contents
[root@ops1 test]# helm search prometheus
stable/prometheus-operator              8.12.0 0.37.0 Provides easy monitoring definitions for Kubernetes servi...
[root@ops1 test]# helm fetch stable/prometheus-operator --version 8.12.0
[root@ops1 test]# tar -zxf prometheus-operator-8.12.0.tgz 
tar: prometheus-operator/Chart.yaml: implausibly old time stamp 1970-01-01 08:00:00
[root@ops1 test]# ls prometheus-operator
charts  Chart.yaml  CONTRIBUTING.md crds README.md requirements.lock requirements.yaml templates values.yaml
 
 
Install prometheus-operator with helm; all of its resources go into the monitoring namespace
[root@ops1 test]# cat <<EOF > prometheus-operator-values.yaml
 
alertmanager:
  service:  # expose alertmanager via NodePort for external test access
    nodePort: 30091
    type: NodePort  
 
  
  alertmanagerSpec:
    storage:  # I use persistent storage; for a quick test this block can be omitted
      volumeClaimTemplate:
        spec:
          storageClassName: prometheus-k8s
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
 
grafana:
  service:  # expose grafana via NodePort for external test access
    type: NodePort
    nodePort: 30092
 
prometheus:
  service:  # expose prometheus via NodePort for external test access
    nodePort: 30090
    type: NodePort  
  prometheusSpec:
    storageSpec: # I use persistent storage; for a quick test this block can be omitted
       volumeClaimTemplate:
         spec:
           storageClassName: prometheus-k8s
           accessModes: ["ReadWriteOnce"]
           resources:
             requests:
               storage: 20Gi
kubeEtcd:
  service:  # in k8s v1.16.2 etcd's metrics port is 2381
    port: 2381
    targetPort: 2381
EOF
[root@ops1 test]# helm install --name prometheus-operator --version=8.12.0 -f prometheus-operator-values.yaml \
    --namespace=monitoring stable/prometheus-operator
 
NAME:   prometheus-operator
......  .......
NOTES:
The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus-operator"
 
Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
 
[root@ops1 test]# kubectl get crd | grep monitoring
alertmanagers.monitoring.coreos.com 2020-04-08T02:59:54Z
podmonitors.monitoring.coreos.com 2020-04-08T02:59:57Z
prometheuses.monitoring.coreos.com 2020-04-08T02:59:57Z
prometheusrules.monitoring.coreos.com 2020-04-08T03:00:00Z
servicemonitors.monitoring.coreos.com 2020-04-08T03:00:02Z
thanosrulers.monitoring.coreos.com 2020-04-08T03:00:05Z
 
 
[root@ops1 prometheus-operator]# kubectl get svc -n monitoring
 
 
 
1. prometheus UI: http://192.168.70.122:30090/graph#/alerts

2. alertmanager alert UI: http://192.168.70.122:30091/#/alerts

3. grafana UI (default credentials admin/prom-operator): http://192.168.70.122:30092/dashboards
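
If the grafana admin password has been customised through chart values, you can read it back from the Secret the grafana subchart creates. A small sketch, assuming the default Secret name prometheus-operator-grafana for this release:

kubectl get secret prometheus-operator-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo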
 
3. Configure prometheus monitoring and alerting rules
 
Let's inspect the prometheus StatefulSet:
[root@ops1 test]# kubectl get sts prometheus-prometheus-operator-prometheus -n monitoring -o yaml
      - args:
        - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
 
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: config-out
          readOnly: true
 
       - emptyDir: {}
        name: config-out
 
We can see that prometheus's yaml configuration file is local to the pod; it does not use a ConfigMap we supplied. This brings us to the key concept: prometheus's configuration is generated and controlled by the Operator, as shown in the architecture diagram.
The diagram above is the official Prometheus-Operator architecture diagram. The Operator is the core component: acting as a controller, it creates the Prometheus, ServiceMonitor, AlertManager and PrometheusRule CRD resource objects, and then continuously watches and reconciles the state of these four resources.
A prometheus resource object corresponds to a Prometheus Server instance, while a ServiceMonitor is an abstraction over exporters. As we learned earlier, an exporter is a tool that exposes a metrics endpoint; Prometheus pulls data from the metrics endpoints described by ServiceMonitors. Likewise, an alertmanager resource object is the abstraction of an AlertManager instance, and a PrometheusRule holds the alerting rules consumed by the Prometheus instances.
So deciding what to monitor in the cluster now simply means creating and editing Kubernetes resource objects, which is much more convenient. In the diagram, both Service and ServiceMonitor are Kubernetes resources: a ServiceMonitor matches a class of Services via a labelSelector, and Prometheus in turn matches multiple ServiceMonitors via a labelSelector.
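
To see this in practice, list the ServiceMonitor objects the chart itself has already created; each of them carries the release=prometheus-operator label that the Prometheus object's serviceMonitorSelector (shown below) matches. The exact list depends on your values, so treat this as a sketch:

kubectl get servicemonitors -n monitoring -l release=prometheus-operator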
 
 
[root@ops1 test]# kubectl get prometheus -n monitoring
NAME                             VERSION   REPLICAS AGE
prometheus-operator-prometheus   v2.15.2 1 21m
[root@ops1 test]# kubectl get prometheus prometheus-operator-prometheus -n monitoring -o yaml
 
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: prometheus-operator-alertmanager
      namespace: monitoring
      pathPrefix: /
      port: web
  baseImage: quay.io/prometheus/prometheus
  enableAdminAPI: false
  externalUrl: http://prometheus-operator-prometheus.monitoring:9090
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:  # podmonitor CRD objects carrying this label will be selected
    matchLabels:
      release: prometheus-operator
  portName: web
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:  # prometheusrule CRD objects carrying both of these labels will be selected
    matchLabels:
      app: prometheus-operator
      release: prometheus-operator
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-operator-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
  storage:  # the storage we defined earlier in the values file; empty if not defined
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: prometheus-k8s
  version: v2.15.2
 
Let's configure service monitoring first.
We start by deploying two tomcat instances that expose a metrics endpoint; any other service exposing metrics works just as well.
 
[root@ops1 test]# kubectl create ns tomcat
namespace/tomcat created
[root@ops1 test]# cat <<EOF > tomcat-test1.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-test1
  namespace: tomcat
  labels:
    k8s.eip.work/layer: svc
    k8s.eip.work/name: tomcat-test1
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s.eip.work/layer: svc
      k8s.eip.work/name: tomcat-test1
  template:
    metadata:
      labels:
        k8s.eip.work/layer: svc
        k8s.eip.work/name: tomcat-test1
    spec:
      containers:
        - name: tomcat-test1
          image: 'registry.cn-beijing.aliyuncs.com/wangzt/k8s/tomcat:v1.3'
 
---
apiVersion: v1
kind: Service
metadata:
  name: tomcat-test1
  namespace: tomcat
  labels:
    k8s.eip.work/layer: svc
    k8s.eip.work/name: tomcat-test1
spec:
  selector:
    k8s.eip.work/layer: svc
    k8s.eip.work/name: tomcat-test1
  type: NodePort
  ports:
    - name: tomcat-web
      port: 80
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090
EOF
[root@ops1 test]# kubectl apply -f tomcat-test1.yaml
[root@ops1 test]# cp tomcat-test1.yaml tomcat-test2.yaml && sed -i 's&tomcat-test1&tomcat-test2&' tomcat-test2.yaml && \
     sed -i 's&v1.3&v0.8&' tomcat-test2.yaml && kubectl apply -f tomcat-test2.yaml
 
At this point tomcat-test1 is healthy while tomcat-test2 is broken, which makes for a convenient comparison
[root@ops1 test]# curl http://10.100.33.236:9090/metrics
# HELP tomcat_bytesreceived_total Tomcat global bytesReceived
# TYPE tomcat_bytesreceived_total counter
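
As a quick cross-check (a sketch; exact pod names and states will differ), the broken tomcat-test2 image shows up as a pod that never becomes Ready, and its Service ends up with no ready endpoints:

kubectl get pods -n tomcat
kubectl get endpoints -n tomcat   # tomcat-test2 should list no ready addresses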
 
 
 
Monitor the services in the tomcat namespace that carry the label k8s.eip.work/layer: svc
[root@ops1 test]# cat <<EOF > prometheus-serviceMonitorTomcatTest.yaml 
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor  # this object is handled by the ServiceMonitor CRD
metadata:
  labels:
    app: prometheus-operator-tomcat-test
    chart: prometheus-operator-8.12.3
    release: prometheus-operator    # the Prometheus serviceMonitorSelector matches this label
  name: prometheus-operator-tomcat-test
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s # scrape every 30s
    path: /metrics  # metrics path on the service
    port: metrics  # the service's port name
  jobLabel: k8s.eip.work/layer
  namespaceSelector:   # match services in the listed namespaces; use any: true to match across all namespaces
    matchNames:
    - tomcat  
  selector:  # labels of the Services to match; with matchLabels a Service must carry all labels below, matchExpressions allows set-based expressions instead
    matchLabels:
      k8s.eip.work/layer: svc # match Services carrying this label
EOF
[root@ops1 test]# kubectl apply -f prometheus-serviceMonitorTomcatTest.yaml 
servicemonitor.monitoring.coreos.com/prometheus-operator-tomcat-test created
 
 
Now, if we open the prometheus targets page, it shows one endpoint up and one down.
 
4. Configure alert trigger rules

Now let's configure an alerting rule: fire an alert when more than half of the tomcat services are down (a worked check of the expression follows after the rule is applied below).

The rule carries the label alertManagerRule: node; alertmanager will later route the alert to a receiver based on this label.
[root@ops1 test]# cat  <<EOF > prometheus-operator-tomcat-rules.yaml 
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    chart: prometheus-operator-8.12.3
    heritage: Tiller
    app: prometheus-operator
    release: prometheus-operator
  name: prometheus-operator-tomcat-test.rules
  namespace: monitoring
spec:
  groups:
  - name: tomcat-test.rules
    rules:
    - alert: tomcat-down
      expr:  count( up{namespace="tomcat"} == 0 )by (job) > ( count(up{namespace="tomcat"})by (job) / 2 - 1)
      for: 2m
      labels:
        alertManagerRule: node  # note this line: alertmanager routes alerts based on this label
      annotations:
        description: "{{$labels.instance}}: Tomcat Service Is Down"
EOF
[root@ops1 test]# kubectl apply -f prometheus-operator-tomcat-rules.yaml
prometheusrule.monitoring.coreos.com/prometheus-operator-tomcat-test.rules created
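
A quick sanity check of the rule expression (illustrative arithmetic; it assumes both tomcat services end up under the same job label via jobLabel: k8s.eip.work/layer):

count(up{namespace="tomcat"} == 0) by (job)            # number of down targets per job
  > (count(up{namespace="tomcat"}) by (job) / 2 - 1)   # half of all targets, minus one

With 2 targets and 1 of them down, the left side is 1 and the right side is 2/2 - 1 = 0, so the alert fires; when every target is up, the left side returns no series and nothing fires.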
 
 
We can exec into the container and see that the rule has been added
[root@ops1 test]# kubectl exec -it prometheus-prometheus-operator-prometheus-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-prometheus-operator-prometheus-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-0/
 
Then we can see it under Pending at http://192.168.70.122:30090/alerts.
 
The page shows the alert rule we just defined, along with its current state. During its lifecycle an alert can be in one of three states:
  • inactive: the alert is neither pending nor firing
  • pending: the condition is true but has not yet held for the configured "for" duration
  • firing: the condition has held longer than the configured "for" duration
 
Once the "for" duration elapses (2 minutes here, as set by for: 2m), the state changes to Firing, the alert is triggered, and it is sent to alertmanager.
 
At this point we can open alertmanager and see that the alert has been triggered.
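
You can also confirm this from the command line by querying the alertmanager v2 API through the NodePort used earlier (a sketch; adjust the address to your node):

curl -s http://192.168.70.122:30091/api/v2/alerts | head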
OK, next let's configure email and DingTalk notifications.
 
5. alertmanager notifications
 
Let's first look at how alertmanager is configured; it turns out its configuration file comes from a Secret.
[root@ops1 test]# kubectl get sts alertmanager-prometheus-operator-alertmanager -n monitoring -o yaml
 
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
 
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
 
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-prometheus-operator-alertmanager # the Secret that holds the config file
 
  
[root@ops1 test]# kubectl get secret alertmanager-prometheus-operator-alertmanager -n monitoring -o yaml > alertmanager-cm-old.yaml
apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICByZWNlaXZlcjogIm51bGwiCg==
 
[root@ops1 test]# echo "Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IFdhdGNoZG9nCiAgICByZWNlaXZlcjogIm51bGwiCg==" | base64 -d
global:
  resolve_timeout: 5m
receivers:
- name: "null"
route:
  group_by:
  - job
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 12h
  routes:
  - match:
      alertname: Watchdog
    receiver: "null"
 
 
Add email alerting in alertmanager.yaml
As we saw above, the default alertmanager configuration contains very little, so we rewrite it. Here we configure two kinds of notifications: email and DingTalk.
[root@ops1 test]# cat <<EOF > alertmanager.yaml 
global:
  # how long to wait before an alert is declared resolved once it stops firing
  resolve_timeout: 5m
  # email (SMTP) settings
  smtp_smarthost: 'smtp.exmail.qq.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: "${mima}"
  smtp_hello: '[email protected]'
  smtp_require_tls: false
# root route that every incoming alert enters; defines how alerts are dispatched
route:
  # labels used to regroup incoming alerts; e.g. many alerts carrying cluster=A and alertname=LatencyHigh will be aggregated into a single group
  group_by: ['alertname', 'cluster']
  # after a new alert group is created, wait at least group_wait before sending the first notification, so multiple alerts for the same group can be batched and sent together
  group_wait: 30s
 
  # after the first notification, wait group_interval before sending a notification about new alerts added to the group
  group_interval: 30s
 
  # if an alert has already been sent successfully, wait repeat_interval before resending it
  repeat_interval: 2m
 
  # default receiver: alerts not matched by any sub-route are sent here
  receiver: default
 
  # all attributes above are inherited by the sub-routes and can be overridden per sub-route
  routes:
  - receiver: email
    group_wait: 10s
    match:
      alertManagerRule: node  # route alerts carrying this label to this receiver
  # - receiver: webhook
  #   match:
  #     alertManagerRule: node  # route alerts carrying this label to this receiver
 
receivers:
- name: 'default'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
- name: 'email'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
  webhook_configs:
  - url: 'http://dingtalk-hook:5000'
    send_resolved: true
- name: 'webhook'
  webhook_configs:
  - url: 'http://dingtalk-hook:5000'
    send_resolved: true
EOF
[root@ops1 test]# kubectl delete secret alertmanager-prometheus-operator-alertmanager -n monitoring
secret "alertmanager-prometheus-operator-alertmanager" deleted
[root@ops1 test]# kubectl create secret generic alertmanager-prometheus-operator-alertmanager --from-file=alertmanager.yaml -n monitoring
secret/alertmanager-prometheus-operator-alertmanager created 
 
Wait a minute or so for the configuration to take effect, then open the alertmanager status page at http://192.168.70.122:30091/#/status
and confirm that the new configuration is active.
Now we can wait and check whether the alert email arrives.
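
If the new configuration does not show up on the status page, first confirm that the Secret really contains it (the backslash escapes the dot in the key name):

kubectl get secret alertmanager-prometheus-operator-alertmanager -n monitoring \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d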
 
Add DingTalk alerting
You may have noticed another receiver in the config, name: 'webhook', which sends alerts to an HTTP URL. Here we point it at DingTalk; let's test the robot first:

curl 'https://oapi.dingtalk.com/robot/send?access_token='$token'' \
    -H 'Content-Type: application/json' \
    -d '{"msgtype": "text", "text": { "content": "我就是我, 是不一样的烟火2"}}'
 
[root@ops1 test]# kubectl create secret generic dingtalk-secret --from-literal=token=$token -n monitoring
secret/dingtalk-secret created   # store the robot token in a secret
[root@ops1 test]# cat <<EOF > dingtalk-hook.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  replicas: 1
  selector:
   matchLabels:
     app: dingtalk-hook
  template:
    metadata:
      labels:
        app: dingtalk-hook
    spec:
      containers:
      - name: dingtalk-hook
        image: registry.cn-beijing.aliyuncs.com/wangzt/k8s/dingtalk-hook:0.1
        # based on cnych/alertmanager-dingtalk-hook:v0.2; this image is a modified version with the json removed
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
          name: http
        env:
        - name: ROBOT_TOKEN
          valueFrom:
            secretKeyRef:
              name: dingtalk-secret
              key: token
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  selector:
    app: dingtalk-hook
  ports:
  - name: hook
    port: 5000
    targetPort: http
EOF
[root@ops1 test]# kubectl apply -f dingtalk-hook.yaml 
deployment.apps/dingtalk-hook created
service/dingtalk-hook created
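
Before relying on it, it is worth checking that the hook pod is Running and watching its logs while a test alert comes in (a sketch; the label comes from the Deployment above):

kubectl get pods -n monitoring -l app=dingtalk-hook
kubectl logs -n monitoring -l app=dingtalk-hook -f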
 
 
Now we can see the alert arrive in DingTalk.
 
Add etcd certificates to prometheus
[root@ops1 prometheus]#  kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
secret/etcd-certs created
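
The certificates show up under /etc/prometheus/secrets/ in the prometheus pod (as shown below) because the chart mounts every Secret listed in prometheusSpec.secrets. A hedged sketch of the relevant additions to the prometheus-operator-values.yaml used earlier (verify the option names against your chart version's values.yaml):

prometheus:
  prometheusSpec:
    secrets:
    - etcd-certs   # mounted at /etc/prometheus/secrets/etcd-certs/
# optionally, have the kubeEtcd ServiceMonitor scrape etcd over https using these files
kubeEtcd:
  serviceMonitor:
    scheme: https
    caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key

Apply the change with a helm upgrade, for example:
helm upgrade prometheus-operator stable/prometheus-operator --version 8.12.0 -f prometheus-operator-values.yaml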
 
 
[root@ops1 test]# kubectl exec -it prometheus-prometheus-operator-prometheus-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-prometheus-operator-prometheus-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.crt                  healthcheck-client.crt healthcheck-client.key
 
 
 
6. Collecting Java metrics with prometheus
 
[root@dev3_worker bin]# cat <<EOF > config.yaml 
---
lowercaseOutputLabelNames: true
lowercaseOutputName: true
rules:
- pattern: 'Catalina<type=GlobalRequestProcessor, name=\"(\w+-\w+)-(\d+)\"><>(\w+):'
  name: tomcat_$3_total
  labels:
    port: "$2"
    protocol: "$1"
  help: Tomcat global $3
  type: COUNTER
- pattern: 'Catalina<j2eeType=Servlet, WebModule=//([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), name=([-a-zA-Z0-9+/$%~_-|!.]*), J2EEApplication=none, J2EEServer=none><>(requestCount|maxTime|processingTime|errorCount):'
  name: tomcat_servlet_$3_total
  labels:
    module: "$1"
    servlet: "$2"
  help: Tomcat servlet $3 total
  type: COUNTER
- pattern: 'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>(currentThreadCount|currentThreadsBusy|keepAliveCount|pollerThreadCount|connectionCount):'
  name: tomcat_threadpool_$3
  labels:
    port: "$2"
    protocol: "$1"
  help: Tomcat threadpool $3
  type: GAUGE
- pattern: 'Catalina<type=Manager, host=([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), context=([-a-zA-Z0-9+/$%~_-|!.]*)><>(processingTime|sessionCounter|rejectedSessions|expiredSessions):'
  name: tomcat_session_$3_total
  labels:
    context: "$2"
    host: "$1"
  help: Tomcat session $3 total
  type: COUNTER
EOF
 
 
 
Collecting tomcat data
Jar applications
Follow the instructions on the jmx_exporter GitHub page and start the application like this:
java -javaagent:./jmx_prometheus_javaagent-0.12.0.jar=8080:config.yaml -jar yourJar.jar
 
Tomcat war applications
Go to the bin directory $TOMCAT_HOME/bin and copy the jmx_prometheus_javaagent jar and config.yaml there. Then edit the catalina.sh script, find JAVA_OPTS, and add the javaagent option below.
If you run several tomcats, it is better to put jmx_prometheus_javaagent and config.yaml in a fixed directory and reference them by absolute path in $TOMCAT_HOME/bin/catalina.sh.
# edit bin/catalina.sh and add: JAVA_OPTS="-javaagent:bin/jmx_prometheus_javaagent-0.12.0.jar=39081:bin/config.yaml"
 
For a war application, also add:
-Djava.util.logging.config.file=/path/to/logging.properties
 
7. Fixing targets prometheus cannot monitor by default
Both the prometheus-operator-kube-etcd and prometheus-operator-kube-proxy targets show as down.
 
1. prometheus-operator-kube-etcd: the etcd manifest shows that the metrics endpoint listens only on 127.0.0.1:2381 by default
curl http://127.0.0.1:2381/metrics | head
Edit /etc/kubernetes/manifests/etcd.yaml and change
- --listen-metrics-urls=http://127.0.0.1:2381
so that it listens on all addresses:
- --listen-metrics-urls=http://0.0.0.0:2381
Then run kubectl edit svc prometheus-operator-kube-etcd -n kube-system and make sure the service port and targetPort are 2381 (see the check below).
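
After the etcd static pod restarts, verify from another machine that the metrics port now answers on the node address (a sketch; 192.168.70.122 is the node IP used earlier in this article):

curl -s http://192.168.70.122:2381/metrics | head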
 
2. prometheus-operator-kube-proxy is down
Checking the target shows that kube-proxy's metrics endpoint also listens only on 127.0.0.1
[root@ops1 manifests]# kubectl get svc prometheus-operator-kube-proxy -o yaml -n kube-system

[root@ops1 manifests]# kubectl get ds kube-proxy -o yaml -n kube-system
kubectl edit cm kube-proxy -n kube-system
and change the metrics bind address so that it listens on all addresses (see the sketch below).
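
Inside the kube-proxy ConfigMap the relevant field is metricsBindAddress (part of KubeProxyConfiguration); a minimal sketch of the change, after which the kube-proxy pods must be restarted to pick it up:

# excerpt of data.config.conf in the kube-proxy ConfigMap
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:10249   # was 127.0.0.1:10249 (or empty, which defaults to 127.0.0.1)

# restart kube-proxy so the new config takes effect
kubectl -n kube-system delete pod -l k8s-app=kube-proxy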
 
8. Removing the prometheus deployment
If the CRDs are no longer needed, delete them together with the release
helm del --purge prometheus-operator
# remove the CRDs
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
kubectl get crd | grep monitoring


Reposted from www.cnblogs.com/wangzhangtao/p/12708667.html