Integrating Istio with prometheus-operator and Alertmanager

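Contents

istio-tools

Download the installation package

Modify the official setup_istio.sh script

Create the certificates

Generate the certificates

Create the certificate secret

Integrate the certificates into Prometheus

Prometheus is not collecting Istio metrics

Check the namespace labels

istio-injection was disabled, so enable it

Integrate Alertmanager with prometheus-operator and take over alerting rules

About Envoy proxy

Some alerting expressions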




istio-tools

https://github.com/istio/tools/tree/master/perf/istio-install

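The official documentation first creates a cluster on GCP, then installs Istio and prometheus-operator.

Here the installation is run directly on an existing cluster instead.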

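Download the installation package

Downloading with the commands in the official setup_istio.sh is slow, because the script fetches from storage.googleapis.com.

Download the Istio package first: istio-1.6.1-linux-amd64.tar.gz

# Download the latest Istio release (the default)
curl -L https://istio.io/downloadIstio | sh -

# Download Istio 1.6.1
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.6.1 sh -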
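The modified script further down unpacks a local archive rather than downloading one, so it can help to confirm the archive is present before running it. The following is a minimal sketch, assuming the archive follows the istio-<version>-<os>-<arch>.tar.gz naming seen above; both function names are hypothetical and not part of the official script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the official setup_istio.sh.

# Build the archive name from version, OS and architecture,
# following the pattern of istio-1.6.1-linux-amd64.tar.gz.
istio_tarball_name() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'istio-%s-%s-%s.tar.gz\n' "$version" "$os" "$arch"
}

# Return 0 if the archive exists in the given directory (default: current one).
have_istio_tarball() {
  local version="$1" dir="${2:-.}"
  [ -f "${dir}/$(istio_tarball_name "$version")" ]
}

# Example usage:
#   have_istio_tarball 1.6.1 || echo "istio 1.6.1 archive not found" >&2
```

Failing early with a clear message is easier to diagnose than a tar error in the middle of a script running under set -ex.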

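Modify the official setup_istio.sh script

GO111MODULE=on was already in the script and I left it in place; you can try removing it to see whether it makes any difference.

DNS_DOMAIN="istio-test.local" is my own addition. Without it, the later installation of istio-gateway.yaml fails.

The script installs the Istio components as well as prometheus-operator.

I installed prometheus-operator into the istio-system namespace.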

#!/usr/bin/bash

# Copyright Istio Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -ex

WD=$PWD
DIRNAME="${WD}"
mkdir -p "${DIRNAME}"
export GO111MODULE=on
export DNS_DOMAIN="istio-test.local"

# Extract Istio into the current directory
tar xvf istio-1.6.1-linux-amd64.tar.gz

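function install_istioctl() {
  # $1 is the extracted release directory; its manifests/ subdirectory holds the charts
  local release_dir="${1:-./istio-1.6.1}"
  istioctl manifest apply --skip-confirmation -d "${release_dir}/manifests"
}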

function install_extras() {
  local domain=${DNS_DOMAIN:-"DNS_DOMAIN like v104.qualistio.org"}
  kubectl create namespace istio-system|| true
  # Deploy the gateways and prometheus operator.
  # We install the prometheus operator first, then deploy the CR, to wait for the CRDs to get created
  helm template --set domain="${domain}" --set prometheus.deploy=false "${WD}/base" | kubectl apply -f -
  # Check CRD
  CMDs_ARR=('kubectl get crds/prometheuses.monitoring.coreos.com' 'kubectl get crds/alertmanagers.monitoring.coreos.com'
  'kubectl get crds/podmonitors.monitoring.coreos.com' 'kubectl get crds/prometheusrules.monitoring.coreos.com'
  'kubectl get crds/servicemonitors.monitoring.coreos.com')
  for CMD in "${CMDs_ARR[@]}"
  do
    MAXRETRIES=0
    until $CMD || [ $MAXRETRIES -eq 60 ]
    do
      MAXRETRIES=$((MAXRETRIES + 1))
      sleep 5
    done
    if [[ $MAXRETRIES -eq 60 ]]; then
      echo "crds were not created successfully"
      exit 1
    fi
  done
  # Redeploy, this time with the Prometheus resource created
  helm template --set domain="${domain}" "${WD}/base" | kubectl apply -f -
  # Also deploy relevant ServiceMonitors
  "istioctl" manifest generate --set profile=empty --set addonComponents.prometheusOperator.enabled=true -d ./istio-1.6.1/manifests | kubectl apply -f -
}

#download_release
install_istioctl "${DIRNAME}/istio-1.6.1" 

if [[ -z "${SKIP_EXTRAS:-}" ]]; then
  install_extras
fi
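The CRD wait loop inside install_extras is a generic "retry until it works" pattern. As a sketch only, it could be factored into a small helper like the one below; the name retry and its argument order are assumptions for illustration, not something the official script provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage, roughly mirroring the 60 retries with 5 second sleeps above:
#   retry 60 5 kubectl get crds/prometheuses.monitoring.coreos.com
```

Because the script only calls install_extras when SKIP_EXTRAS is empty, running it as SKIP_EXTRAS=1 ./setup_istio.sh installs Istio alone and skips the gateways and prometheus-operator.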

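Create the certificates

Work from the directory that contains the extracted istio-1.6.1 directory; the command below refers to ./istio-1.6.1.

Generate the certificates

NAME must be istio.prometheus, because Prometheus expects a secret with that name.

NAMESPACE is istio-system, the namespace used here.

make -f ./istio-1.6.1/tools/certs/Makefile NAME="istio.prometheus" NAMESPACE="istio-system" prometheus-certs-wl

Create the certificate secret

Reference: https://istio.io/latest/docs/tasks/security/cert-management/plugin-ca-cert/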

kubectl create secret generic istio.prometheus -n istio-system \
    --from-file=prometheus/ca-cert.pem \
    --from-file=prometheus/ca-key.pem \
    --from-file=prometheus/root-cert.pem \
    --from-file=prometheus/cert-chain.pem \
    --from-file=prometheus/key.pem \
    --from-file=prometheus/workload-cert-chain.pem

Integrate the certificates into Prometheus

The Prometheus log shows this error:

level=error ts=2020-07-08T21:20:20.023Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to use specified client cert (/etc/prometheus/secrets/istio.prometheus/cert-chain.pem) & key (/etc/prometheus/secrets/istio.prometheus/key.pem): tls: private key does not match public key" scrape_pool=istio-system/kubernetes-services-secure-monitor/0

Cause:

workload-cert-chain.pem was missing.
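A missing file like this can be caught before the secret is created. The sketch below checks a directory for the six files named in the kubectl create secret command above; the function and variable names are hypothetical, and the check only covers presence, not whether the key actually matches the certificate.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; the file names are taken from the
# kubectl create secret command above.
REQUIRED_CERT_FILES="ca-cert.pem ca-key.pem root-cert.pem cert-chain.pem key.pem workload-cert-chain.pem"

# Print the name of every required file missing from the directory;
# return 1 if anything is missing, 0 otherwise.
missing_cert_files() {
  local dir="${1:-prometheus}" missing=0 f
  for f in $REQUIRED_CERT_FILES; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   missing_cert_files prometheus || echo "generate the files listed above first" >&2
```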

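Prometheus is not collecting Istio metrics

After the integration was finished, no Istio metrics were being collected.

Check the namespace labels

kubectl get ns --show-labels

istio-injection was disabled, so enable it

kubectl label ns istio-system istio-injection=enabled --overwrite

Once injection is enabled, the Istio metrics show up in Prometheus. Note that sidecar injection happens at pod creation, so pods that already existed must be recreated before they get a sidecar.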
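To check the label from a script rather than by eye, the label string can be parsed directly. This is a small sketch, assuming the comma-separated key=value format that kubectl prints in its LABELS column; injection_state is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the istio-injection value from a
# comma-separated label string such as the LABELS column printed by
# `kubectl get ns --show-labels`. Prints "unset" if the label is absent.
injection_state() {
  local labels="$1" pair
  local IFS=','
  for pair in $labels; do
    case "$pair" in
      istio-injection=*)
        echo "${pair#istio-injection=}"
        return 0
        ;;
    esac
  done
  echo "unset"
}

# Example usage (assumes kubectl access to the cluster):
#   labels="$(kubectl get ns istio-system --show-labels --no-headers | awk '{print $NF}')"
#   injection_state "$labels"
```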

Integrate Alertmanager with prometheus-operator and take over alerting rules

1. Locate https://github.com/istio/tools/blob/master/perf/istio-install/base/templates/prometheus-install.yaml

2. Edit prometheus-install.yaml to integrate Alertmanager

      spec:
        alerting:
          alertmanagers:
          - name: alertmanager-main
            namespace: istio-system
            port: web
        ...

3. Add a ruleSelector

        ruleSelector:
          matchLabels:
            prometheus: k8s
            role: alert-rules
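With this selector, Prometheus only loads PrometheusRule objects that carry both labels. The sketch below writes one such object to a file. The resource name, alert name, expression and threshold are placeholders chosen for illustration and do not come from the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical example: write a PrometheusRule whose labels match the
# ruleSelector above. The rule name, alert name, expression and threshold
# are placeholders, not values from the original setup.
write_example_rule() {
  local out="${1:-istio-example-rule.yaml}"
  cat > "$out" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: istio-example-rules
  namespace: istio-system
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: istio-example
    rules:
    - alert: IstioHighRequestRate
      expr: sum(rate(istio_requests_total[5m])) by (destination_app) > 100
      for: 5m
      labels:
        severity: warning
EOF
}

# Example usage (assumes kubectl access to the cluster):
#   write_example_rule rule.yaml && kubectl apply -f rule.yaml
```

A rule object without the prometheus: k8s and role: alert-rules labels would be ignored by this Prometheus instance, which is a common reason for rules not appearing.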

About Envoy proxy

https://blog.getambassador.io/understanding-envoy-proxy-and-ambassador-http-access-logs-fee7802a2ec5

Some alerting expressions

https://banzaicloud.com/blog/istio-telemetry/

Replace the following to fit your environment: namespace, service, response_code, response_flags, reporter, span

rule:
  # Request rate
  request: sum(rate(istio_requests_total{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica) {{.operation}} {{.threshold}}

  # Average response latency
  latency-avg: (avg(rate(istio_request_duration_milliseconds_sum{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica)) / (avg(rate(istio_request_duration_milliseconds_count{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica)) {{.operation}} {{.threshold}}

  # 50th percentile response latency
  latency-50: histogram_quantile(0.5, sum(rate(istio_request_duration_milliseconds_bucket{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, le, prometheus_replica)) {{.operation}} {{.threshold}}

  # 90th percentile response latency
  latency-90: histogram_quantile(0.9, sum(rate(istio_request_duration_milliseconds_bucket{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, le, prometheus_replica)) {{.operation}} {{.threshold}}

  # 99th percentile response latency
  latency-99: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, le, prometheus_replica)) {{.operation}} {{.threshold}}

  # Ratio of error responses (4xx, 5xx)
  response-code: sum(rate(istio_request_duration_milliseconds_count{namespace = "{{.namespace}}", destination_app = "{{.svcName}}", response_code=~"[45].."}[{{.span}}m])) by (namespace, destination_app, prometheus_replica) / sum(rate(istio_request_duration_milliseconds_count{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica) {{.operation}} {{.threshold}}
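The {{.name}} markers in these expressions look like template placeholders that are filled in before a rule is created. As a rough sketch of that substitution step, assuming plain text replacement is sufficient, a sed-based helper could look like this; render_expr and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the {{.name}} placeholders of a rule template
# read from stdin. Values must not contain "|", "&" or "\" (the sed
# delimiter and replacement metacharacters used here).
render_expr() {
  local namespace="$1" svc="$2" span="$3" operation="$4" threshold="$5"
  sed -e "s|{{\.namespace}}|${namespace}|g" \
      -e "s|{{\.svcName}}|${svc}|g" \
      -e "s|{{\.span}}|${span}|g" \
      -e "s|{{\.operation}}|${operation}|g" \
      -e "s|{{\.threshold}}|${threshold}|g"
}

# Example usage:
#   echo 'sum(rate(istio_requests_total{namespace = "{{.namespace}}"}[{{.span}}m])) {{.operation}} {{.threshold}}' \
#     | render_expr default reviews 5 '>' 100
```

The rendered expression can then be pasted into the expr field of a rule or tried out in the Prometheus expression browser before alerting on it.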

Reposted from blog.csdn.net/u010918487/article/details/107221329