Kubernetes Service监控(Blackbox_exporter)

Blackbox_exporter 主动监测主机与服务状态

Prometheus 官方提供的 exporter 之一,可以提供 http、dns、tcp、icmp 的监控数据采集
官方github: https://github.com/prometheus/blackbox_exporter
部署Blackbox_exporter

[sss@prometheus01 ]$ cd /usr/local/blackbox_exporter/
[sss@prometheus01 ]$ wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.12.0/blackbox_exporter-0.12.0.linux-amd64.tar.gz 
[sss@prometheus01 ]$ tar zxvf blackbox_exporter-0.12.0.linux-amd64.tar.gz
[sss@prometheus01 blackbox_exporter-0.12.0.linux-amd64]$ cd blackbox_exporter-0.12.0.linux-amd64
[sss@prometheus01 blackbox_exporter-0.12.0.linux-amd64]$ ll
total 15720
-rwxr-xr-x. 1 1000 1000 16074005 Feb 27  2018 blackbox_exporter
-rw-rw-r--. 1 1000 1000      932 Nov 21 16:05 blackbox.yml
-rw-rw-r--. 1 1000 1000    11357 Feb 27  2018 LICENSE
-rw-rw-r--. 1 1000 1000       94 Feb 27  2018 NOTICE
[sss@prometheus01 blackbox_exporter-0.12.0.linux-amd64]$cp -r blackbox_exporter /usr/local/bin
[sss@prometheus01 blackbox_exporter-0.12.0.linux-amd64]$ cat /etc/supervisord.conf|grep blackbox -A 20
[program:blackbox_exporter]
command=/usr/local/bin/blackbox_exporter   --config.file=/usr/local/prometheus/blackbox_exporter/blackbox_exporter-0.12.0.linux-amd64/blackbox.yml 
stdout_logfile=/tmp/prometheus/blackbox_exporter.log
autostart=true
autorestart=true
startsecs=5
priority=1
user=root
stopasgroup=true
killasgroup=true
[sss@prometheus01 blackbox_exporter-0.12.0.linux-amd64]$ supervisorctl  status |grep blackbox
blackbox_exporter                RUNNING   pid 25343, uptime 0:19:25
blackbox.yml 文件
  • 通过 blackbox.yml 定义模块详细信息
  • 在 Prometheus 配置文件中引用该模块以及配置被监控目标主机
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          preferred_ip_protocol: "ip4" ##如果http监测是使用ipv4 就要写上,目前国内使用ipv6很少。
      http_post_2xx_query: ##用于post请求使用的模块)由于每个接口传参不同 可以定义多个module 用于不同接口(例如此命名为http_post_2xx_query 用于监测query.action接口 
        prober: http
        timeout: 15s
        http:
          preferred_ip_protocol: "ip4" ##使用ipv4
          method: POST
          headers:
            Content-Type: application/json ##header头
          body: '{"hmac":"","params":{"publicFundsKeyWords":"xxx"}}' ##传参
      tcp_connect:
        prober: tcp
      pop3s_banner:
        prober: tcp
        tcp:
          query_response:
          - expect: "^+OK"
          tls: true
          tls_config:
            insecure_skip_verify: false
      ssh_banner:
        prober: tcp
        tcp:
          query_response:
          - expect: "^SSH-2.0-"
      irc_banner:
        prober: tcp
        tcp:
          query_response:
          - send: "NICK prober"
          - send: "USER prober prober prober :prober"
          - expect: "PING :([^ ]+)"
            send: "PONG ${1}"
          - expect: "^:[^ ]+ 001"
      icmp:
        prober: icmp
        timeout: 5s
        icmp:

    Blackbox_exporter 应用场景

    HTTP 测试
    定义 Request Header 信息
    判断 Http status / Http Respones Header / Http Body 内容
    TCP 测试
    业务组件端口状态监听
    应用层协议定义与监听
    ICMP 测试
    主机探活机制
    POST 测试
    接口联通性
    SSL 证书过期时间

    HTTP 测试

    相关代码块添加到 Prometheus 文件内
    对应 blackbox.yml文件的 http_2xx 模块

    - job_name: 'blackbox_http_2xx'
      scrape_interval: 45s
      metrics_path: /probe
      params:
        module: [http_2xx]  # Look for a HTTP 200 response.
      static_configs:
          - targets:
            - https://www.baidu.com/
            - 172.0.0.1:9090
      relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 10.XXX.XX.XX:9115  # The blackbox exporter's real hostname:port.

    http截图

  • TCP 测试
    • 监听 业务端口地址,用来判断服务是否在线,我觉的和telnet 差不多
    • 相关代码块添加到 Prometheus 文件内
    • 对应 blackbox.yml文件的 tcp_connect 模块
    • - job_name: "blackbox_telnet_port]"
        scrape_interval: 5s
        metrics_path: /probe
        params:
          module: [tcp_connect]
        static_configs:
            - targets: [ '1x3.x1.xx.xx4:443' ]
              labels:
                group: 'xxxidc机房ip监控'
            - targets: ['10.xx.xx.xxx:443']
              labels:
                group: 'Process status of nginx(main) server'
        relabel_configs:
            - source_labels: [__address__]
              target_label: __param_target
            - source_labels: [__param_target]
              target_label: instance
            - target_label: __address__
              replacement: 10.xxx.xx.xx:9115

      tcp_connect截图

    • ICMP 测试
      • 相关代码块添加到 Prometheus 配置文件内
      • 对应 blackbox.yml文件的 icmp 模块
      • - job_name: 'blackbox00_ping_idc_ip'
          scrape_interval: 10s
          metrics_path: /probe
          params:
            module: [icmp]  #ping
          static_configs:
              - targets: [ '1x.xx.xx.xx' ]
                labels:
                  group: 'xxnginx 虚拟IP'
          relabel_configs:
              - source_labels: [__address__]
                regex: (.*)(:80)?
                target_label: __param_target
                replacement: ${1}
              - source_labels: [__param_target]
                regex: (.*)
                target_label: ping
                replacement: ${1}
              - source_labels: []
                regex: .*
                target_label: __address__
                replacement: 1x.xxx.xx.xx:9115

        icmp截图

      • POST 测试
        • 监听业务接口地址,用来判断接口是否在线
        • 相关代码块添加到 Prometheus 文件内
        • 对应 blackbox.yml文件的 http_post_2xx_query 模块(监听query.action这个接口)
        • - job_name: 'blackbox_http_2xx_post'
            scrape_interval: 10s
            metrics_path: /probe
            params:
              module: [http_post_2xx_query]
            static_configs:
                - targets:
                  - https://xx.xxx.com/api/xx/xx/fund/query.action
                  labels:
                    group: 'Interface monitoring'
            relabel_configs:
                - source_labels: [__address__]
                  target_label: __param_target
                - source_labels: [__param_target]
                  target_label: instance
                - target_label: __address__
                  replacement: 1x.xx.xx.xx:9115  # The blackbox exporter's real hostname:port.

          POST截图

          查看监听过程

          类似于

        • curl http://172.16.10.65:9115/probe?target=prometheus.io&module=http_2xx&debug=true

          告警应用测试

          icmp、tcp、http、post 监测是否正常可以观察probe_success 这一指标
          probe_success == 0 ##联通性异常
          probe_success == 1 ##联通性正常
          告警也是判断这个指标是否等于0,如等于0 则触发异常报警

        • [sss@prometheus01 prometheus]$ cat rules/blackbox-alert.rules 
          groups:
          - name: blackbox_network_stats
            rules:
            - alert: blackbox_network_stats
              expr: probe_success == 0
              for: 1m
              labels:
                severity: critical
              annotations:
                summary: "Instance {{ $labels.instance }}  is down"
                description: "This requires immediate action!"

          SSL 证书过期时间监测

          cat << 'EOF' > prometheus.yml
          rule_files:
            - ssl_expiry.rules
          scrape_configs:
            - job_name: 'blackbox'
              metrics_path: /probe
              params:
                module: [http_2xx]  # Look for a HTTP 200 response.
              static_configs:
                - targets:
                  - example.com  # Target to probe
              relabel_configs:
                - source_labels: [__address__]
                  target_label: __param_target
                - source_labels: [__param_target]
                  target_label: instance
                - target_label: __address__
                  replacement: 127.0.0.1:9115  # Blackbox exporter.
                  EOF 
          cat << 'EOF' > ssl_expiry.rules 
          groups: 
            - name: ssl_expiry.rules 
              rules: 
                - alert: SSLCertExpiringSoon 
                  expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30 
                  for: 10m
          EOF

猜你喜欢

转载自www.cnblogs.com/qinghe123/p/11881024.html