Automating Prometheus Management with confd + etcd


Reference links
confd
https://github.com/kelseyhightower/confd/releases
https://github.com/kelseyhightower/confd/blob/master/docs/templates.md

etcd
https://github.com/coreos/etcd/

Alertmanager
https://github.com/prometheus/alertmanager

Prometheus
https://github.com/prometheus

Grafana
https://github.com/grafana/grafana

node_exporter
https://github.com/prometheus/node_exporter

All software is installed under the /root directory.
Starting each component
/root/alertmanager/alertmanager --config.file=/root/alertmanager/alertmanager.yml
/root/etcd-v3.4.15/etcd
/root/prometheus/prometheus --web.enable-lifecycle  --config.file=/root/prometheus/prometheus.yml --storage.tsdb.path=/root/prometheus/data
/root/node_exporter/node_exporter
/root/grafana-7.0.3/bin/grafana-server -homepath /root/grafana-7.0.3

The Prometheus startup flag --web.enable-lifecycle enables the HTTP API that reloads the Prometheus configuration file.
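With --web.enable-lifecycle set, Prometheus accepts an HTTP POST to /-/reload and re-reads its configuration without a restart. A minimal sketch, assuming the server address used later in this post (192.168.73.100:9090); the PROM_URL variable and the reload_prometheus function name are illustrative and not part of Prometheus:

```shell
# Assumed address; override PROM_URL for your environment.
PROM_URL="${PROM_URL:-http://192.168.73.100:9090}"

# Ask Prometheus to re-read its configuration file.
# -f makes curl return a non-zero status on HTTP errors, for example when
# the lifecycle API is disabled or the new configuration is rejected.
reload_prometheus() {
  curl -fsS -X POST "${PROM_URL}/-/reload"
}

# Usage:
#   reload_prometheus && echo "reloaded"
```

This is the same request that the confd reload_cmd entries below issue after each render.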
The etcd configuration file is as follows
[root@bogon alertmanager]# cat /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/"
ETCD_LISTEN_CLIENT_URLS="http://192.168.73.101:2379"
ETCD_NAME="default"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.73.101:2379"
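The startup command above runs the etcd binary without flags, and etcd does not read /etc/etcd/etcd.conf by itself. etcd does accept its settings as ETCD_* environment variables, so one way to apply this file when starting the binary by hand is to export its contents first. A sketch under that assumption; run_etcd_with_env is a hypothetical helper name:

```shell
# Export every assignment in an env-style file, then start etcd with it.
#   $1: path to the env file, e.g. /etc/etcd/etcd.conf
#   $2: path to the etcd binary, e.g. /root/etcd-v3.4.15/etcd
run_etcd_with_env() {
  conf_file="$1"
  etcd_bin="$2"
  (
    set -a            # auto-export every variable assigned while sourcing
    . "$conf_file"
    set +a
    exec "$etcd_bin"
  )
}

# Usage:
#   run_etcd_with_env /etc/etcd/etcd.conf /root/etcd-v3.4.15/etcd
```

The subshell keeps the exported variables from leaking into the calling shell.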

Automating Prometheus management
The confd conf.d and templates files are as follows
[root@bogon conf.d]# cat /etc/confd/conf.d/prometheus.conf.toml
[template]
#prefix = "/prometheus"
src = "prometheus.yml.tmpl"
dest = "/root/prometheus/prometheus.yml"
mode = "0755"
keys = [
"/job/",
]
reload_cmd = "curl -XPOST 'http://192.168.73.100:9090/-/reload'"

[root@bogon templates]# cat /etc/confd/templates/prometheus.yml.tmpl
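# Global configuration
global:
  scrape_interval:     15s # Scrape (pull) interval; the default is 1m
  evaluation_interval: 15s # Rule evaluation interval; the default is 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration; not used yet, left at the default
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

# Load rules and evaluate them periodically at the configured interval; not used yet, left at the default
rule_files:
  - /root/prometheus/alert.rules
  - /root/prometheus/prometheus.rules
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape (pull) configuration, i.e. the monitored targets
# By default only the host itself is monitored
scrape_configs:
  # The job name is added as the label job=<job_name> to every time series scraped from this config. A job here is a metrics endpoint rather than a specific host; one host can expose several targets.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # Overrides the global scrape interval, from 15 seconds to 5 seconds.
    scrape_interval: 5s

    # Statically configured targets; no service discovery mechanism is used here
    static_configs:
    - targets: ['192.168.73.100:9090']
        # (optional) add another label that identifies the target's host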
  - job_name: 'server'
    static_configs:
      - targets: ['192.168.73.100:9100']


  {{range $job_name := gets "/job/*"}}
  {{$jobJson := json $job_name.Value}}
  - job_name: '{{$jobJson.name}}'
    scheme: '{{$jobJson.scheme}}'
    metrics_path: '{{$jobJson.metrics}}'
    static_configs:
  {{$target := printf "%s/*" $job_name.Key}}{{range $ins_name := gets $target}}
  {{$insJson := json $ins_name.Value}}
    - targets: ['{{$insJson.instance}}']
      labels:
        name: '{{$insJson.name}}'
        ip: '{{$insJson.ip}}'
  {{end}}
  {{end}}


Start confd
confd -watch -backend etcdv3 -node http://192.168.73.100:2379  &
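Before leaving confd running in watch mode, it can help to render the templates once without touching any files. A minimal sketch, assuming the same etcd endpoint as the command above; ETCD_ENDPOINT and confd_dry_run are illustrative names:

```shell
# Assumed etcd endpoint, matching the commands in this post.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://192.168.73.100:2379}"

# Process all template resources once and exit, reporting pending changes
# instead of writing the destination files or running reload_cmd.
#   -onetime: run once and exit
#   -noop:    only show pending changes
confd_dry_run() {
  confd -onetime -noop -backend etcdv3 -node "$ETCD_ENDPOINT"
}

# Usage:
#   confd_dry_run && confd -watch -backend etcdv3 -node "$ETCD_ENDPOINT" &
```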

Write data to etcd
etcdctl --endpoints="http://192.168.73.100:2379" put /job/test '{"scheme":"http","metrics":"/metrics","name":"test"}'
etcdctl --endpoints="http://192.168.73.100:2379" put /job/test/test2  '{"name":"test2","instance":"2.2.2.2:9093","ip":"2.2.2.2"}'
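The key layout is /job/<job> for a job's settings and /job/<job>/<name> for each target of that job. The two commands above can be wrapped as small helpers; this is only a sketch, the function names are hypothetical, and it assumes the values contain no double quotes or backslashes, since the JSON is built with plain printf:

```shell
# Assumed etcd endpoint, matching the commands in this post.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://192.168.73.100:2379}"

# Create or update a scrape job stored at /job/<job>.
put_job() {
  job="$1"; scheme="$2"; metrics="$3"
  etcdctl --endpoints="$ETCD_ENDPOINT" put "/job/$job" \
    "$(printf '{"scheme":"%s","metrics":"%s","name":"%s"}' "$scheme" "$metrics" "$job")"
}

# Create or update one target of a job, stored at /job/<job>/<name>.
put_instance() {
  job="$1"; name="$2"; instance="$3"; ip="$4"
  etcdctl --endpoints="$ETCD_ENDPOINT" put "/job/$job/$name" \
    "$(printf '{"name":"%s","instance":"%s","ip":"%s"}' "$name" "$instance" "$ip")"
}

# Remove a job and all of its targets. The trailing slash on the prefix
# keeps jobs with a similar name (e.g. "test" vs "test2") untouched.
del_job() {
  job="$1"
  etcdctl --endpoints="$ETCD_ENDPOINT" del "/job/$job" &&
    etcdctl --endpoints="$ETCD_ENDPOINT" del --prefix "/job/$job/"
}

# Usage:
#   put_job test http /metrics
#   put_instance test test2 2.2.2.2:9093 2.2.2.2
```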


View the generated Prometheus configuration file
[root@localhost ~]# cat /root/prometheus/prometheus.yml
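# Global configuration
global:
  scrape_interval:     15s # Scrape (pull) interval; the default is 1m
  evaluation_interval: 15s # Rule evaluation interval; the default is 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration; not used yet, left at the default
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

# Load rules and evaluate them periodically at the configured interval; not used yet, left at the default
rule_files:
  - /root/prometheus/alert.rules
  - /root/prometheus/prometheus.rules
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape (pull) configuration, i.e. the monitored targets
# By default only the host itself is monitored
scrape_configs:
  # The job name is added as the label job=<job_name> to every time series scraped from this config. A job here is a metrics endpoint rather than a specific host; one host can expose several targets.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # Overrides the global scrape interval, from 15 seconds to 5 seconds.
    scrape_interval: 5s

    # Statically configured targets; no service discovery mechanism is used here
    static_configs:
    - targets: ['192.168.73.100:9090']
        # (optional) add another label that identifies the target's host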
  - job_name: 'server'
    static_configs:
      - targets: ['192.168.73.100:9100']
  
    - targets: ['2.2.2.2:9093']
      labels:
        name: 'test2'
        ip: '2.2.2.2'
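A rendered file can also be validated offline with promtool, which ships in the Prometheus release tarball. A small sketch; the promtool location under /root/prometheus is an assumption based on the install path used in this post, and check_prometheus_config is a hypothetical name:

```shell
# Assumed promtool location; override PROMTOOL if it lives elsewhere.
PROMTOOL="${PROMTOOL:-/root/prometheus/promtool}"

# Validate a Prometheus configuration file.
# Exits non-zero when the configuration is invalid.
check_prometheus_config() {
  "$PROMTOOL" check config "${1:-/root/prometheus/prometheus.yml}"
}

# Usage:
#   check_prometheus_config
#   check_prometheus_config /path/to/other.yml
```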

Check the Prometheus web UI


Automating alert rule management
Alert rule definitions can be written to different files according to their key.

confd resource files
[root@localhost conf.d]# cat /etc/confd/conf.d/HCYrules.conf.toml 
[template]
src = "HCYrules.yml.tmpl"
dest = "/tmp/HCYrules.yml"
mode = "0755"
keys = [
  "/rules/alert/HCY/",
  "/rules/alert/HCY/alert",
]
reload_cmd = "curl -XPOST 'http://192.168.73.100:9090/-/reload'"

[root@localhost conf.d]# 
[root@localhost conf.d]# cat /etc/confd/conf.d/HLYrules.conf.toml 
[template]
src = "HLYrules.yml.tmpl"
dest = "/tmp/HLYrules.yml"
mode = "0755"
keys = [
  "/rules/alert/HLY/",
  "/rules/alert/HLY/alert",
]
reload_cmd = "curl -XPOST 'http://192.168.73.100:9090/-/reload'"
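The two resource files differ only in the group name (HCY or HLY), so they could be generated from one function. A sketch that reproduces the files above for a given name; gen_rules_resource is a hypothetical helper, and the group name is assumed to be a plain identifier:

```shell
# Print a confd template resource for one rule group, e.g. HCY or HLY.
# The layout mirrors the HCYrules.conf.toml / HLYrules.conf.toml files.
gen_rules_resource() {
  group="$1"
  cat <<EOF
[template]
src = "${group}rules.yml.tmpl"
dest = "/tmp/${group}rules.yml"
mode = "0755"
keys = [
  "/rules/alert/${group}/",
  "/rules/alert/${group}/alert",
]
reload_cmd = "curl -XPOST 'http://192.168.73.100:9090/-/reload'"
EOF
}

# Usage:
#   gen_rules_resource HLY > /etc/confd/conf.d/HLYrules.conf.toml
```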

Template files
[root@localhost templates]# cat /etc/confd/templates/HCYrules.yml.tmpl 
groups:
  {{range $alert_name := gets "/rules/alert/HCY/*"}}
  {{$alertJson := json $alert_name.Value}}
  - name: {{$alertJson.name}}
    rules :
  {{$alert := printf "%s/*" $alert_name.Key}}{{range $ins_name := gets $alert}}
  {{$insJson := json $ins_name.Value}}
    - alert: {{$insJson.alert}}
      expr: {{$insJson.expr}}
      for: {{$insJson.for}}
      labels:
        severity:   {{$insJson.labels.serverity}}
      annotations :
        summary: {{$insJson.annotations.summary}}
        description: {{$insJson.annotations.description}}
{{end}}
{{end}}

[root@localhost templates]# 
[root@localhost templates]# cat /etc/confd/templates/HLYrules.yml.tmpl 
groups:
  {{range $alert_name := gets "/rules/alert/HLY/*"}}
  {{$alertJson := json $alert_name.Value}}
  - name: {{$alertJson.name}}
    rules :
  {{$alert := printf "%s/*" $alert_name.Key}}{{range $ins_name := gets $alert}}
  {{$insJson := json $ins_name.Value}}
    - alert: {{$insJson.alert}}
      expr: {{$insJson.expr}}
      for: {{$insJson.for}}
      labels:
        severity:   {{$insJson.labels.serverity}}
      annotations :
        summary: {{$insJson.annotations.summary}}
        description: {{$insJson.annotations.description}}
{{end}}
{{end}}
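The two templates are identical apart from the etcd key prefix, so a template for another group could be derived from the HCY one with sed. A sketch only; derive_rules_template is a hypothetical name, and it assumes the group name appears nowhere except in the /rules/alert/HCY/ prefix and contains no characters special to sed:

```shell
# Print a copy of the HCY rule template with the key prefix rewritten
# for another group.
#   $1: new group name, e.g. HLY
#   $2: source template (defaults to the HCY template path from this post)
derive_rules_template() {
  new_group="$1"
  src="${2:-/etc/confd/templates/HCYrules.yml.tmpl}"
  sed "s#/rules/alert/HCY/#/rules/alert/${new_group}/#g" "$src"
}

# Usage:
#   derive_rules_template HLY > /etc/confd/templates/HLYrules.yml.tmpl
```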


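Update the Prometheus configuration file, and make the same change in the confd template prometheus.yml.tmpl, since confd regenerates prometheus.yml from that template.
Add /tmp/HCYrules.yml and /tmp/HLYrules.yml to rule_files:

# Load rules and evaluate them periodically at the configured interval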
rule_files:
  - /root/prometheus/alert.rules
  - /root/prometheus/prometheus.rules
  - /tmp/HCYrules.yml
  - /tmp/HLYrules.yml
  # - "first_rules.yml"
  # - "second_rules.yml"


Start confd
confd -watch -backend etcdv3 -node http://192.168.73.100:2379  &


Write data to etcd
etcdctl --endpoints="http://192.168.73.100:2379" put /rules/alert/HCY/alerrule '{"name":"HCY"}'
etcdctl --endpoints="http://192.168.73.100:2379" put /rules/alert/HCY/alerrule/alerrule '{"alert":"HCY","expr":"up == 0","for":"1m","labels":{"serverity":"page"},"annotations":{"summary":"hcy","description":"hcy"}}'
etcdctl --endpoints="http://192.168.73.100:2379" put /rules/alert/HCY/alerrule/alerrules '{"alert":"HCY","expr":"up == 0","for":"1m","labels":{"serverity":"page"},"annotations":{"summary":"hcy","description":"hcy"}}'

etcdctl --endpoints="http://192.168.73.100:2379" put /rules/alert/HLY/hello '{"name":"HLY"}'
etcdctl --endpoints="http://192.168.73.100:2379" put /rules/alert/HLY/hello/helloworld '{"alert":"HLY","expr":"up == 0","for":"1m","labels":{"serverity":"page"},"annotations":{"summary":"hcy","description":"hcy"}}'
etcdctl --endpoints="http://192.168.73.100:2379" put /rules/alert/HLY/hello/helloworlds '{"alert":"HLY","expr":"up == 0","for":"1m","labels":{"serverity":"page"},"annotations":{"summary":"hcy","description":"hcy"}}'
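Here /rules/alert/<team>/<group> holds the rule group name and /rules/alert/<team>/<group>/<rule> holds one alerting rule. The commands above can be wrapped as helpers; this is a sketch with hypothetical function names, and it assumes no value contains double quotes or backslashes because the JSON is assembled with printf. The label key is spelled "serverity" on purpose, since that is the key the templates read:

```shell
# Assumed etcd endpoint, matching the commands in this post.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://192.168.73.100:2379}"

# Create or update a rule group stored at /rules/alert/<team>/<group>.
put_rule_group() {
  team="$1"; group="$2"; display_name="$3"
  etcdctl --endpoints="$ETCD_ENDPOINT" put "/rules/alert/$team/$group" \
    "$(printf '{"name":"%s"}' "$display_name")"
}

# Create or update one alerting rule stored at
# /rules/alert/<team>/<group>/<rule>.
# Arguments: team group rule alert expr for severity summary description
put_alert_rule() {
  team="$1"; group="$2"; rule="$3"
  alert="$4"; expr="$5"; duration="$6"
  severity="$7"; summary="$8"; description="$9"
  etcdctl --endpoints="$ETCD_ENDPOINT" put "/rules/alert/$team/$group/$rule" \
    "$(printf '{"alert":"%s","expr":"%s","for":"%s","labels":{"serverity":"%s"},"annotations":{"summary":"%s","description":"%s"}}' \
      "$alert" "$expr" "$duration" "$severity" "$summary" "$description")"
}

# Usage:
#   put_rule_group HCY alerrule HCY
#   put_alert_rule HCY alerrule alerrule HCY "up == 0" 1m page hcy hcy
```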

View the newly generated alert rule files
[root@localhost conf.d]# cat /tmp/HCYrules.yml 
groups:
  
  
  - name: HCY
    rules :
  
  
    - alert: HCY
      expr: up == 0
      for: 1m
      labels:
        severity:   page
      annotations :
        summary: hcy
        description: hcy

  
    - alert: HCY
      expr: up == 0
      for: 1m
      labels:
        severity:   page
      annotations :
        summary: hcy
        description: hcy

[root@localhost conf.d]# 
[root@localhost conf.d]# cat /tmp/HLYrules.yml 
groups:
  
  
  - name: HLY
    rules :
  
  
    - alert: HLY
      expr: up == 0
      for: 1m
      labels:
        severity:   page
      annotations :
        summary: hcy
        description: hcy

  
    - alert: HLY
      expr: up == 0
      for: 1m
      labels:
        severity:   page
      annotations :
        summary: hcy
        description: hcy

Check the Prometheus web UI to confirm that the alert rules were loaded
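The same check can be done from the command line: promtool can validate a generated rule file, and the Prometheus HTTP API lists the rule groups that are currently loaded. A sketch, assuming the server address from this post and a promtool binary under /root/prometheus; the variable and function names are illustrative:

```shell
# Assumed locations; override for your environment.
PROM_URL="${PROM_URL:-http://192.168.73.100:9090}"
PROMTOOL="${PROMTOOL:-/root/prometheus/promtool}"

# Validate one generated rule file; exits non-zero if it is invalid.
check_rules_file() {
  "$PROMTOOL" check rules "$1"
}

# Print the rule groups Prometheus has loaded, as JSON.
list_loaded_rules() {
  curl -fsS "${PROM_URL}/api/v1/rules"
}

# Usage:
#   check_rules_file /tmp/HCYrules.yml
#   list_loaded_rules
```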



Reposted from blog.csdn.net/liao__ran/article/details/116007421