Customizing Prometheus-Operator Configuration

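Why Prometheus-Operator:

Prometheus actively pulls (scrapes) metrics from its targets, and in Kubernetes a pod's IP changes whenever the pod is rescheduled, so maintaining the target list by hand is not feasible. DNS-based service discovery exists, but adding new targets with it is still somewhat cumbersome.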

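At its core, Prometheus-Operator is a set of user-defined CRD resources plus a Controller implementation. The Prometheus Operator controller is granted RBAC permissions to watch these custom resources for changes and, based on their definitions, automates management tasks such as running Prometheus Server itself and maintaining its configuration.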

The prometheus-operator GitHub repository: https://github.com/coreos/prometheus-operator.git

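After deploying prometheus-operator by following the GitHub instructions, you will find that the stock configuration does not cover your own needs. For example, I want email alerting, plus additional Grafana dashboards and datasources. If you add these directly in the UI, they are gone as soon as the container restarts or is redeployed: in prometheus-operator, Grafana's data is stored in a secret and mounted into the container, and the secret is re-read on every restart, so anything configured through the UI is lost. We therefore need a custom configuration that rewrites the secret.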

Custom configuration requires jsonnet and jb (jsonnet-bundler).

Installing jb:

go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb

Installing jsonnet:

go get github.com/google/go-jsonnet/cmd/jsonnet

go get github.com/brancz/gojsontoyaml

$ mkdir my-kube-prometheus; cd my-kube-prometheus

$ jb init # Creates the initial/empty `jsonnetfile.json`

# Install the kube-prometheus dependency

$ jb install github.com/coreos/kube-prometheus/jsonnet/[email protected]

# Creates `vendor/` & `jsonnetfile.lock.json`, and fills in `jsonnetfile.json`

The my-kube-prometheus directory then contains: jsonnetfile.json jsonnetfile.lock.json vendor

Next, clone the repository: git clone https://github.com/coreos/kube-prometheus.git

Go into /root/prometheus-operator/contrib/kube-prometheus and copy build.sh and example.jsonnet into the my-kube-prometheus directory.

Then modify the copied example.jsonnet.
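As a rough sketch of what the modified example.jsonnet might look like: it assumes the vendored kube-prometheus exposes a `grafanaDashboards` field, as its README describes, and that es-dashboard.json sits next to example.jsonnet. The namespace value and the shortened output expression are illustrative assumptions; keep the output expressions from the example.jsonnet you copied.

```jsonnet
// Sketch only: field names follow the kube-prometheus README and may
// differ in the version vendored under vendor/.
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    namespace: 'monitoring',  // assumption: use the namespace of your deployment
  },
  // Register an extra dashboard; the key becomes the dashboard file name.
  grafanaDashboards+:: {
    'es-dashboard.json': (import 'es-dashboard.json'),
  },
};

// Only the Grafana manifests are emitted here for brevity; the copied
// example.jsonnet also emits Prometheus, Alertmanager, and the rest.
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```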

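In it, es-dashboard.json holds the JSON for the Grafana dashboard. We only need to bring Grafana's JSON in via import, so create a new JSON file named es-dashboard.json in this directory.

To get the JSON, build the dashboard temporarily in Grafana and copy out its JSON.

Paste that JSON into es-dashboard.json, and the dashboard data is done.

Next, add a new datasource to back es-dashboard.json.

Edit grafana.libsonnet: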

vim vendor/grafana/grafana.libsonnet

Add the Elasticsearch datasource:

    {
      name: 'es',
      type: 'elasticsearch',
      url: 'http://elasticsearch-api.kube-system.svc.cluster.local:9200',
      access: 'proxy',
      database: '[java-]YYYY.MM.DD',
      jsonData: {
        esVersion: '56',
        interval: 'Daily',
        maxConcurrentShardRequests: '2560',
        timeField: '@timestamp',
      },
    },
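Changes made under vendor/ may be lost when the dependencies are reinstalled with jb. As an alternative sketch, the same datasource could be appended from example.jsonnet instead. This assumes the vendored grafana.libsonnet reads its datasources from a list at `_config.grafana.datasources`; that field path is an assumption about the vendored version, so check it against the file opened above.

```jsonnet
// Sketch only: assumes `_config.grafana.datasources` is a list in the
// vendored grafana.libsonnet, so `+:` appends to the default entries.
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    grafana+:: {
      datasources+: [
        {
          name: 'es',
          type: 'elasticsearch',
          url: 'http://elasticsearch-api.kube-system.svc.cluster.local:9200',
          access: 'proxy',
          database: '[java-]YYYY.MM.DD',
          jsonData: {
            esVersion: '56',
            interval: 'Daily',
            maxConcurrentShardRequests: '2560',
            timeField: '@timestamp',
          },
        },
      ],
    },
  },
};

// Emit only the Grafana manifests, for brevity.
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```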

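Then build with ./build.sh example.jsonnet. If the build succeeds, it generates a manifests directory containing multiple YAML files.

If you have already deployed by following the official instructions, you only need to replace grafana-dashboardDatasources.yaml and grafana-dashboardDefinitions.yaml and restart the pod.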

After this, the configuration is no longer lost, no matter how often the pod restarts.

Email alerting works the same way; I'll just paste my configuration here.

The email alerting configuration lives in: /root/my-kube-prometheus/vendor/kube-prometheus/alertmanager

Edit alertmanager.libsonnet:

    _config+:: {
      namespace: 'default',

      versions+:: {
        alertmanager: 'v0.16.1',
      },

      imageRepos+:: {
        alertmanager: 'quay.io/prometheus/alertmanager',
      },

      alertmanager+:: {
        name: $._config.alertmanager.name,
        config: {
          global: {
            resolve_timeout: '5m',
            smtp_smarthost: 'smtp.mxhichina.com:465',
            smtp_from: '*****************',
            smtp_auth_username: '******************',
            smtp_auth_password: '**********************',
            smtp_require_tls: false,
          },
          route: {
            group_by: ['job'],
            group_wait: '30s',
            group_interval: '5m',
            repeat_interval: '12h',
            receiver: 'mail',
            routes: [
              {
                receiver: 'mail',
                match: {
                  alertname: 'DeadMansSwitch',
                },
              },
            ],
          },
          // A receiver named 'mail' must also be defined under `receivers`.
        },
      },
    },
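The same settings could also be layered on from example.jsonnet rather than edited inside vendor/. The following is a minimal sketch, assuming the vendored version keeps the Alertmanager configuration as an object under `_config.alertmanager.config`, as the excerpt above suggests. The recipient address is a hypothetical placeholder, and the masked SMTP credentials are left out.

```jsonnet
// Sketch only: assumes `_config.alertmanager.config` is an object with a
// `receivers` list, so the `+:` fields merge into the vendored defaults.
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    versions+:: {
      alertmanager: 'v0.16.1',
    },
    alertmanager+:: {
      config+: {
        global+: {
          smtp_smarthost: 'smtp.mxhichina.com:465',
          smtp_require_tls: false,
          // smtp_from, smtp_auth_username and smtp_auth_password go here too.
        },
        route+: {
          receiver: 'mail',
        },
        receivers+: [
          {
            name: 'mail',
            // Hypothetical recipient; replace with a real address.
            email_configs: [{ to: 'ops@example.com' }],
          },
        ],
      },
    },
  },
};

// Emit only the Alertmanager manifests, for brevity.
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) }
```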

Pitfalls:

Alertmanager reported errors with the email configuration:

time="2019-03-03T08:46:47Z" level=error msg="Error on notify: require_tls: true (default), but "smtp.icoremail.net:465" does not advertise the STARTTLS extension" source="notify.go:283"

time="2019-03-03T08:46:47Z" level=error msg="Notify for 1 alerts failed: require_tls: true (default), but "smtp.icoremail.net:465" does not advertise the STARTTLS extension" source="dispatch.go:262"

After setting smtp_require_tls to false, a different error appeared:

time="2019-03-03T08:29:52Z" level=error msg="Error on notify: *smtp.plainAuth failed: wrong host name" source="notify.go:283"

time="2019-03-03T08:29:52Z" level=error msg="Notify for 1 alerts failed: *smtp.plainAuth failed: wrong host name" source="dispatch.go:262"

In the end, keeping it set to false and upgrading Alertmanager from v0.15.2 to v0.16.1 resolved the problem.
