Monitoring control-plane components with Prometheus: scheduler, controller-manager, kube-proxy

How to access the KubeSphere Prometheus console

The KubeSphere monitoring engine is backed by Prometheus. For debugging purposes, you may want to reach the built-in Prometheus service through a NodePort. Run the following command and change the service type to NodePort:

kubectl edit svc -n kubesphere-monitoring-system prometheus-k8s


  ports:
  - name: web
    nodePort: 30066   # add this line
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: NodePort    # change ClusterIP to NodePort
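As a side note, the `nodePort` value must fall inside the API server's `--service-node-port-range`, which defaults to 30000-32767. A trivial sanity check for the port chosen above:

```python
# Default kube-apiserver --service-node-port-range is 30000-32767.
NODE_PORT_RANGE = range(30000, 32768)

chosen = 30066          # the nodePort set in the Service above
in_range = chosen in NODE_PORT_RANGE
print(in_range)  # → True
```

After the change, the console is reachable at http://<any-node-ip>:30066.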

- job_name: kubesphere-monitoring-system/kube-scheduler/0
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kube-scheduler
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [__name__]
    separator: ;
    regex: scheduler_(e2e_scheduling_latency_microseconds|scheduling_algorithm_predicate_evaluation|scheduling_algorithm_priority_evaluation|scheduling_algorithm_preemption_evaluation|scheduling_algorithm_latency_microseconds|binding_latency_microseconds|scheduling_latency_seconds)
    replacement: $1
    action: drop
  kubernetes_sd_configs:
  - role: endpoints
    follow_redirects: true
    namespaces:
      names:
      - kube-system
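The two `keep` rules in `relabel_configs` above act as a filter over every endpoint Prometheus discovers in kube-system: a target survives only if its Service carries the label `k8s-app: kube-scheduler` and the endpoint port is named `https-metrics`. A minimal Python sketch of that filtering and of the metric `drop` rule (the target list and the abridged regex are illustrative, not from a real cluster):

```python
import re

# Hypothetical discovered endpoints with (a subset of) their
# __meta_kubernetes_* labels -- not taken from a real cluster.
targets = [
    {"__meta_kubernetes_service_label_k8s_app": "kube-scheduler",
     "__meta_kubernetes_endpoint_port_name": "https-metrics",
     "__address__": "192.168.100.5:10259"},
    {"__meta_kubernetes_service_label_k8s_app": "kube-scheduler",
     "__meta_kubernetes_endpoint_port_name": "http-metrics",   # wrong port name
     "__address__": "192.168.100.5:10251"},
    {"__meta_kubernetes_service_label_k8s_app": "coredns",      # wrong service
     "__meta_kubernetes_endpoint_port_name": "https-metrics",
     "__address__": "10.233.0.3:9153"},
]

# Each `keep` rule joins its source_labels with the separator and keeps the
# target only if the (fully anchored) regex matches the joined value.
keep_rules = [
    (["__meta_kubernetes_service_label_k8s_app"], "kube-scheduler"),
    (["__meta_kubernetes_endpoint_port_name"], "https-metrics"),
]

def survives(target):
    return all(
        re.fullmatch(regex, ";".join(target.get(l, "") for l in labels))
        for labels, regex in keep_rules
    )

kept = [t["__address__"] for t in targets if survives(t)]
print(kept)  # → ['192.168.100.5:10259']

# The metric_relabel_configs `drop` rule then discards deprecated latency
# metrics by name (regex abridged to two alternatives here):
drop = re.compile(r"scheduler_(e2e_scheduling_latency_microseconds"
                  r"|scheduling_latency_seconds)")
print(bool(drop.fullmatch("scheduler_scheduling_latency_seconds")))  # → True (dropped)
print(bool(drop.fullmatch("scheduler_schedule_attempts_total")))     # → False (kept)
```

Prometheus fully anchors relabel regexes, which is why `re.fullmatch` is the right analogue here.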
[root@ks-master ~]# kubectl get svc -n kube-system
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
kube-scheduler-svc            ClusterIP   None         <none>        10259/TCP                      43h
[root@master manifests]# vim kube-scheduler.yaml 
[root@master manifests]# pwd
/etc/kubernetes/manifests
[root@master manifests]# ls
kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

[root@master ~]# kubectl get pod -n kube-system -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
kube-scheduler-master                     1/1     Running   7          16d   192.168.100.5   master   <none>           <none>
[root@master ~]# kubectl get pod kube-scheduler-master -n kube-system --show-labels
NAME                    READY   STATUS    RESTARTS   AGE   LABELS
kube-scheduler-master   1/1     Running   7          16d   component=kube-scheduler,tier=control-plane

What did you expect to see?

I expected the recommended Services to find the endpoints for kube-scheduler and kube-controller-manager. From those docs, here is the kube-scheduler discovery Service:

spec:
  clusterIP: None
  clusterIPs:
  - None
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
    targetPort: 10259
  selector:
    component: kube-scheduler
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

A first attempt at the discovery Service, as a plain ClusterIP Service with an http-metrics port name:
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler-prometheus-discovery
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  ports:
  - name: http-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
[root@master ~]# kubectl get svc -n kube-system
NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
kube-scheduler-prometheus-discovery   ClusterIP   10.233.31.176   <none>        10259/TCP                4s


[root@master ~]# kubectl get ep -n kube-system
NAME                                  ENDPOINTS                                                   AGE
kube-scheduler-prometheus-discovery   192.168.100.5:10259                                         31s

 

After deploying Prometheus, it reported "x509: certificate is valid for apiserver, not kubernetes.default.svc" (see prometheus/prometheus issue #2088: https://github.com/prometheus/prometheus/issues/2088).

[root@master ~]# cat kube-scheduler-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler-prometheus-discovery
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP


And the matching scrape job in the Prometheus configuration:
    - job_name: 'kubernetes-scheduler'
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true   # works around the x509 certificate-name mismatch above
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        action: keep
        regex: kube-system;kube-scheduler-prometheus-discovery

Exposing the scheduler's port 10251


This applies to Kubernetes 1.19+. On 1.19 you can see these health checks being refused, because the ports are not exposed, so these components cannot be monitored. How do we expose them?

[root@master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused   
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused   
etcd-0               Healthy     {"health":"true"} 
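The `connection refused` in the output above simply means nothing is listening on ports 10251/10252 on localhost yet. The same failure mode can be reproduced locally with plain sockets (no cluster needed):

```python
import socket

# Grab a free port, then close it so nothing is listening there.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
_, closed_port = probe.getsockname()
probe.close()

try:
    socket.create_connection(("127.0.0.1", closed_port), timeout=1).close()
    result = "connected"
except ConnectionRefusedError:
    result = "connection refused"    # same error the component status reports

print(result)  # → connection refused
```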
[root@master ~]# cd /etc/kubernetes/manifests/
[root@master manifests]# ls
kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml



Edit kube-scheduler.yaml as follows:
 - --port=0   # remove this line
 - --bind-address=192.168.100.5  # change 127.0.0.1 to the node's own IP
[root@master ~]# netstat -tpln | grep 10251
tcp6       0      0 :::10251                :::*                    LISTEN      28249/kube-schedule 
[root@master ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused   
scheduler            Healthy     ok                                                                                            
etcd-0               Healthy     {"health":"true"}  

 

Exposing the controller-manager's port 10252


[root@master manifests]# vim  kube-controller-manager.yaml 


Edit kube-controller-manager.yaml as follows:
 - --port=0   # remove this line
 - --bind-address=192.168.100.5  # change 127.0.0.1 to the node's own IP

To apply all of the changes above, restart kubelet (the static pods are recreated from the updated manifests):

systemctl restart kubelet

 

 

kube-proxy


You can see that kube-proxy is still bound to 127.0.0.1 on the loopback interface. How do we bind it to the node's own interface?

[root@master ~]# netstat -tpln | grep 10249
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      3122/kube-proxy     


[root@node1 ~]# netstat -tpln | grep 10249
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      3296/kube-proxy  


[root@node2 ~]# netstat -tpln | grep 10249
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      3897/kube-proxy  

Here we cannot change the address to a specific IP, because kube-proxy runs on every node; instead, bind it to all interfaces:

[root@master prometheus]# kubectl edit configmap kube-proxy -n kube-system

  metricsBindAddress: "0.0.0.0:10249"

After saving the change, delete the kube-proxy pods so the DaemonSet recreates them with the new configuration:

[root@master manifests]# kubectl delete pod kube-proxy-5zxsr -n kube-system
pod "kube-proxy-5zxsr" deleted
[root@master manifests]# kubectl delete pod kube-proxy-bwm6f -n kube-system
pod "kube-proxy-bwm6f" deleted
[root@master manifests]# kubectl delete pod kube-proxy-x7mb4  -n kube-system
pod "kube-proxy-x7mb4" deleted
Or delete them all in one go:

kubectl get pods -n kube-system | grep kube-proxy | awk '{print $1}' | xargs kubectl delete pods -n kube-system
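The `grep kube-proxy | awk '{print $1}'` part of the one-liner just extracts the pod names from the listing before handing them to `kubectl delete`; sketched in Python over a hypothetical pod listing:

```python
# Hypothetical `kubectl get pods -n kube-system` output (NAME is column 1).
listing = """\
coredns-558bd4d5db-abcde   1/1   Running   0   16d
kube-proxy-5zxsr           1/1   Running   0   16d
kube-proxy-bwm6f           1/1   Running   0   16d
kube-scheduler-master      1/1   Running   7   16d
"""

# Equivalent of `grep kube-proxy | awk '{print $1}'`.
names = [line.split()[0] for line in listing.splitlines() if "kube-proxy" in line]
print(names)  # → ['kube-proxy-5zxsr', 'kube-proxy-bwm6f']
```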
[root@master ~]# netstat -tpln | grep 10249
tcp6       0      0 :::10249                :::*                    LISTEN      740/kube-proxy 

[root@node1 ~]# netstat -tpln | grep 10249
tcp6       0      0 :::10249                :::*                    LISTEN      7910/kube-proxy  


[root@node2 ~]# netstat -tpln | grep 10249
tcp6       0      0 :::10249                :::*                    LISTEN      17547/kube-proxy  
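The before/after difference is entirely about the bind address: a socket bound to `127.0.0.1` accepts only loopback connections, while `0.0.0.0` (shown as `:::` for the dual-stack listener above) accepts connections on every interface. A small Python sketch of the distinction, using OS-chosen ephemeral ports rather than 10249:

```python
import socket

def bind(addr):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, 0))   # port 0: let the OS pick a free ephemeral port
    s.listen(1)
    return s

loopback_only = bind("127.0.0.1")   # kube-proxy's old metrics default
all_interfaces = bind("0.0.0.0")    # effect of metricsBindAddress: "0.0.0.0:10249"

lo_addr = loopback_only.getsockname()[0]
any_addr = all_interfaces.getsockname()[0]
print(lo_addr)   # → 127.0.0.1 (unreachable from other hosts, e.g. Prometheus)
print(any_addr)  # → 0.0.0.0  (reachable via the node's real IP as well)

loopback_only.close()
all_interfaces.close()
```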

So after installing Kubernetes, if you want to monitor the scheduler, controller-manager, and kube-proxy, it is best to follow the steps above: after making the changes, restart kubelet and delete the kube-proxy pods. (Do not perform these operations while production workloads are running!)


Origin: blog.csdn.net/qq_34556414/article/details/121451439