Domestic Database Series | Configuring Monitoring for a PolarDB-X Database in a K8s Cluster

1. Preface

This article describes how to configure monitoring for a PolarDB-X database in a K8s cluster.

2. Installing PolarDB-X Monitor

PolarDB-X uses Prometheus and Grafana to monitor PolarDB-X clusters. PolarDB-X Monitor integrates the kube-prometheus component stack, so installing PolarDB-X Monitor deploys all the resources and components needed for monitoring in one step.

1. Prerequisites

  • A running K8s cluster, with cluster version >= 1.18.0
  • Helm 3 installed
  • PolarDB-X Operator 1.2.0 or later installed
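
A quick sketch to verify these prerequisites (the last command assumes the operator's Helm release name contains polardbx-operator; adjust it to your installation):

kubectl version                          # the server version must be >= 1.18.0
helm version                             # must report Helm 3.x
helm list -A | grep polardbx-operator    # the chart version must be >= 1.2.0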

2. Installing the Helm Package

First, create a namespace named polardbx-monitor:

kubectl create namespace polardbx-monitor

Install the PolarDBXMonitor CRD:

Note: if your PolarDB-X Operator 1.2.0 was installed directly via helm install, the PolarDBXMonitor CRD is installed by default and you can skip this step. If your PolarDB-X Operator was upgraded via helm upgrade from an older version (1.1.0 or below), run the following command to install the CRD manually:

kubectl apply -f https://raw.githubusercontent.com/ApsaraDB/galaxykube/v1.1.0/charts/polardbx-operator/crds/polardbx.aliyun.com_polardbxmonitors.yaml
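
To confirm the CRD is registered, query it by name (the resource name follows from the manifest file name above):

kubectl get crd polardbxmonitors.polardbx.aliyun.com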

Run the following command to install PolarDB-X Monitor:

helm install --namespace polardbx-monitor polardbx-monitor polardbx-monitor-1.2.0.tgz

You can also install it from the PolarDB-X Helm Chart repository:

helm repo add polardbx https://polardbx-charts.oss-cn-beijing.aliyuncs.com
helm install --namespace polardbx-monitor polardbx-monitor polardbx/polardbx-monitor
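
If the repository was added earlier, refresh the local chart index first so you get the latest chart version:

helm repo update
helm search repo polardbx    # optional: confirm the chart is visible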

Note: installing this way uses the default configurations for Prometheus and Grafana, which is convenient for a quick trial. For a production cluster, see Section 5, Configuring Prometheus and Grafana.

Note: if you install PolarDB-X Monitor on minikube, components may fail to be created due to insufficient resources. See Section 5, Configuring Prometheus and Grafana, for how to adjust their resource requirements.

You should see output like the following:

polardbx-operator monitor plugin is installed. Please check the status of components:

    kubectl get pods --namespace {{ .Release.Namespace }}

Now start to monitor your polardbx cluster.

After PolarDB-X Monitor is installed, components such as Prometheus and Grafana are created in the polardbx-monitor namespace of your K8s cluster to monitor PolarDB-X inside K8s. Use the following command to check whether these components are healthy, and confirm that all pods are in the Running state:

kubectl get pods -n polardbx-monitor
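
Instead of polling manually, you can block until all pods are Ready (the timeout here is illustrative):

kubectl wait --for=condition=Ready pod --all -n polardbx-monitor --timeout=300s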

3. Creating a PolarDBXMonitor Object

Metric collection for a PolarDB-X cluster is disabled by default. To enable it, create a PolarDBXMonitor object for the PolarDBXCluster you want to monitor:

kubectl apply -f polardbx-monitor.yaml

The polardbx-monitor.yaml file is as follows:

apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXMonitor
metadata:
  name: quick-start-monitor
spec:
  clusterName: quick-start
  monitorInterval: 30s
  scrapeTimeout: 10s

  • spec.clusterName: the name of the PolarDB-X cluster to monitor
  • spec.monitorInterval: the metric collection interval, default 30s
  • spec.scrapeTimeout: the metric collection timeout, default 10s. Note: the value of scrapeTimeout must be less than monitorInterval
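
To check that the object was accepted, query it directly. The second command assumes PolarDB-X Monitor wires scraping up through kube-prometheus ServiceMonitor objects; that is an inference from the stack it integrates, not something confirmed here:

kubectl get polardbxmonitor quick-start-monitor
kubectl get servicemonitors -A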

4. Accessing the Grafana Dashboard

By default, run the following command to forward the Grafana port to your local machine:

kubectl port-forward svc/grafana -n polardbx-monitor 3000

Open http://localhost:3000 in your browser to access the PolarDB-X dashboard. The default username and password are both admin.

Note: because Grafana's configuration is stored in a ConfigMap, passwords you change or dashboards you add in Grafana are not persisted. Once the Grafana pod is recreated, these changes are lost, so be sure to save them in advance.
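
If you want a rough backup before a pod rebuild, you can snapshot the ConfigMaps backing Grafana. The exact ConfigMap names vary by chart version, so list them first; <name> below is a placeholder:

kubectl get configmap -n polardbx-monitor | grep -i grafana
kubectl get configmap <name> -n polardbx-monitor -o yaml > grafana-backup.yaml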

If your K8s cluster supports LoadBalancer, you can configure a LoadBalancer for the Grafana Service and access it that way; see the LoadBalancer configuration in Section 5 below.
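
As a one-off alternative to the Helm values approach, you can switch the existing Service type in place (note that a later helm upgrade may revert this):

kubectl patch svc grafana -n polardbx-monitor -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc grafana -n polardbx-monitor    # read the assigned external address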

If there are multiple PolarDB-X clusters in your K8s cluster, you can switch namespaces and PolarDB-X clusters through the drop-down boxes at the top of the Grafana page.

Accessing Prometheus:

By default, run the following command to forward the Prometheus port to your local machine:

kubectl port-forward svc/prometheus-k8s -n polardbx-monitor 9090

Open http://localhost:9090 in your browser to access the Prometheus page.
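
With the port-forward active, you can also sanity-check the scrape targets through Prometheus's HTTP API; up is a built-in metric that every Prometheus exposes for its targets:

curl 'http://localhost:9090/api/v1/query?query=up'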

5. Configuring Prometheus and Grafana

The PolarDB-X Monitor Helm chart ships with default Prometheus and Grafana configurations. If you want to change them, install or upgrade PolarDB-X Monitor with the following commands and override the defaults through values.yaml:

helm install --namespace polardbx-monitor polardbx-monitor polardbx-monitor-1.2.0.tgz -f values.yaml

# or:

helm upgrade --namespace polardbx-monitor polardbx-monitor polardbx-monitor-1.2.0.tgz -f values.yaml

The values.yaml file contains the configuration options for Prometheus and Grafana. Configuration examples for several common scenarios are given below; the full list of options is as follows:

# monitor related components
monitors:
  # AlertManager related config.
  # AlertManager will not be installed by default
  alertManager:
    # AlertManager docker repo and image. It can be set to your private repo.
    repo: quay.io/prometheus
    image: alertmanager
    # AlertManager version
    version: v0.21.0
    # AlertManager replicas
    replicas: 0
    # AlertManager port, default: 9093
    port: 9093
    # The node selector labels, AlertManager pod will deploy only on the nodes with such labels
    nodeSelector:
      kubernetes.io/os: linux
    # ServiceAccount configuration of AlertManager. If ServiceAccount exists, set create to false
    serviceAccount:
      create: true

    # AlertManager service type, default: ClusterIP. Support NodePort, LoadBalancer, ClusterIP
    serviceType: ClusterIP

    # AlertManager resources, default is empty
    resources: {}

  # Grafana related configurations
  grafana:
    # Grafana docker repo and image. It can be set to your private repo.
    repo: grafana
    image: grafana
    version: 7.5.3
    # Grafana replicas, you can scale out/in grafana by this value.
    replicas: 1
    # Grafana port, default: 3000
    port: 3000
    # Grafana service type, default: ClusterIP. Support NodePort, LoadBalancer, ClusterIP
    serviceType: ClusterIP

    # Grafana resources configuration
    resources:
      requests:
        cpu: 2000m
        memory: 4Gi
      limits:
        cpu: 4000m
        memory: 8Gi
    # The node selector labels, Grafana pod will deploy only on the nodes with such labels
    nodeSelector:
      kubernetes.io/os: linux

    # ServiceAccount configuration of Grafana. If ServiceAccount exists, set create to false
    serviceAccount:
      create: true

  # KubeStateMetrics related configuration.
  # kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics
  # about the state of the objects.
  kubeStateMetrics:
    repo: quay.io/coreos
    image: kube-state-metrics
    version: v1.9.7
    replicas: 1

    # The node selector labels, KubeStateMetrics pod will deploy only on the nodes with such labels
    nodeSelector:
      kubernetes.io/os: linux

    # RBAC and ServiceAccount of KubeStateMetrics
    rbac:
      create: true
    serviceAccount:
      create: true

    resources: {}

  # KubeRBACProxy is a small HTTP proxy for a single upstream, that can perform RBAC authorization
  # against the Kubernetes API using SubjectAccessReview.
  kubeRBACProxy:
    repo: quay.io/brancz
    image: kube-rbac-proxy
    version: v0.8.0

    resources: {}

  # NodeExporter is Prometheus exporter for hardware and OS metrics exposed by *NIX kernels,
  # written in Go with pluggable metric collectors.
  # It is a DaemonSet deployed in K8s
  nodeExporter:
    # NodeExporter docker repo and image
    repo: quay.io/prometheus
    image: node-exporter
    version: v1.0.1
    # Node exporter port; update this value if there is a port conflict
    port: 9100

    # Node exporter RBAC and ServiceAccount
    rbac:
      create: true
    serviceAccount:
      create: true

    # Node exporter resources, you can scale up/down
    resources:
      limits:
        cpu: 250m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi

  # PrometheusAdapter is an implementation of the custom.metrics.k8s.io API using Prometheus
  prometheusAdapter:
    # PrometheusAdapter docker repo and image
    repo: directxman12
    image: k8s-prometheus-adapter
    version: v0.8.2
    # PrometheusAdapter replicas
    replicas: 1

    # The node selector labels, PrometheusAdapter pod will deploy only on the nodes with such labels
    nodeSelector:
      kubernetes.io/os: linux

    # RBAC and ServiceAccount of PrometheusAdapter
    rbac:
      create: true
    serviceAccount:
      create: true

    resources: {}

  # Prometheus configuration
  prometheus:
    # Docker repo and image of Prometheus
    repo: quay.io/prometheus
    image: prometheus
    version: v2.22.1
    # Prometheus node replicas, you can scale out/in prometheus by this value
    replicas: 1
    # Prometheus service type, default: ClusterIP. Support NodePort, LoadBalancer, ClusterIP
    serviceType: ClusterIP

    # Prometheus node selector. The Prometheus pod will be deployed only on the nodes with such labels
    nodeSelector:
      kubernetes.io/os: linux

    # Monitor data retention time
    retention: 1w
    # Monitor data retention size
    retentionSize: 200GB
    # Prometheus resources
    resources:
      requests:
        cpu: 4000m
        memory: 8Gi
      limits:
        cpu: 8000m
        memory: 16Gi

    # Prometheus RBAC and ServiceAccount
    rbac:
      create: true
    serviceAccount:
      create: true

    # Monitoring data persistence configuration, default is false
    persist: false
    # If persist is true, the monitor data storage class
    storageClassName: ssd
    # If persist is true, the monitor data storage size request
    storageRequest: 40G

  # The Prometheus Operator provides Kubernetes native deployment and management of Prometheus and related monitoring
  # components. The purpose of this component is to simplify and automate the configuration of a Prometheus based
  # monitoring stack for Kubernetes clusters.
  prometheusOperator:
    repo: quay.io/prometheus-operator
    image: prometheus-operator
    version: v0.44.1
    replicas: 1

    # The node selector labels, PrometheusOperator pod will deploy only on the nodes with such labels
    nodeSelector:
      kubernetes.io/os: linux

    # PrometheusOperator RBAC and ServiceAccount
    rbac:
      create: true
    serviceAccount:
      create: true

    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi

  # PrometheusOperator related component
  prometheusConfigReloader:
    repo: quay.io/prometheus-operator
    image: prometheus-config-reloader
    version: v0.44.1

1. Configuring a LoadBalancer

If your K8s cluster supports LoadBalancer, you can specify the following configuration with the -f flag when installing or upgrading PolarDB-X Monitor:

monitors:
  grafana:
    serviceType: LoadBalancer
  prometheus:
    serviceType: LoadBalancer
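
After applying this via helm install or helm upgrade, the cloud provider assigns external addresses, which you can read from the Services (the Service names here are the ones used by the port-forward commands earlier):

kubectl get svc grafana prometheus-k8s -n polardbx-monitor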

2. Persisting Monitoring Data

With the default configuration, the Prometheus cluster's monitoring data is not persisted, so there is a risk of data loss. You can specify persistent storage through the following values.yaml:

monitors:
  prometheus:
    persist: true
    # A storage class available in your K8s cluster
    storageClassName: ssd
    # Size of the storage request
    storageRequest: 100G
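
After the upgrade, you can verify that the storage class exists and that the PersistentVolumeClaim was bound:

kubectl get storageclass ssd
kubectl get pvc -n polardbx-monitor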

3. Configuring Prometheus and Grafana Resources

In the default configuration, the Prometheus cluster has one node limited to 8 CPU cores and 16 GB of memory, and Grafana has one node limited to 4 cores and 8 GB. You can change the resources and node counts of the Prometheus and Grafana clusters with the following options:

monitors:
  grafana:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 8Gi
  prometheus:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 8Gi
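
Apply the overrides, then, if metrics-server is installed in your cluster, compare actual usage against the new limits:

helm upgrade --namespace polardbx-monitor polardbx-monitor polardbx/polardbx-monitor -f values.yaml
kubectl top pods -n polardbx-monitor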

Reposted from blog.csdn.net/flq18210105507/article/details/127257474