Prometheus + Grafana Visual Monitoring: Deployment Notes

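Contents

  1. Environment preparation
    1.1 Host preparation
      1.1.1. Disable SELinux
      1.1.2. Deployment plan
      1.1.3. System time, time zone, and locale
  2. Installing Go
  3. Installing Prometheus
  4. Installing Grafana
  5. Monitoring Linux servers with node_exporter
  6. Configuring WeChat alerts
  7. Installing Alertmanager
  8. Configuring prometheus.yml
  9. Installing blackbox_exporter (URL monitoring on Linux)
  10. Installing blackbox_exporter (URL monitoring on Windows)
  11. Monitoring service ports
  12. Installing the Oracle exporter
  13. Installing the MySQL exporter (Linux)
  14. Installing the MySQL exporter (Windows)
  15. Conclusion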

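Note: adjust the paths, ports, IP addresses, and similar details in this article to match your own environment, and test them before use.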

1. Environment preparation

1.1 Host preparation

1.1.1. Disable SELinux

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0

1.1.2. Deployment plan

(The deployment plan was shown as figures, which are not preserved in this copy.)

1.1.3. System time, time zone, and locale

Perform the steps in this section only as your environment requires.
Set the time zone:

ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

Set the system locale:

echo 'LANG="en_US.UTF-8"' >> /etc/profile && source /etc/profile

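Configure NTP time synchronization (the servers are added to /etc/ntp.conf before ntpd starts, so that it picks them up):

yum -y install ntp
echo 'server ntp1.aliyun.com' >> /etc/ntp.conf
echo 'server ntp2.aliyun.com' >> /etc/ntp.conf
systemctl enable ntpd && systemctl start ntpd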

2. Installing Go

Download the Go release archive and extract it into /usr/local:

tar -xvf go1.11.5.linux-amd64.tar.gz -C /usr/local/

Add Go to the PATH environment variable:

cat >>/etc/profile<<EOF
export PATH=\$PATH:/usr/local/go/bin
EOF
source /etc/profile
go version

3. Installing Prometheus

Download the Prometheus release archive and extract it into /usr/local:

tar -xvf prometheus-2.18.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
mv prometheus-2.18.1.linux-amd64 prometheus

Create a start script:

cat >start.prometheus.sh<<EOF
/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml &
EOF
chmod +x start.prometheus.sh
./start.prometheus.sh

Once started, Prometheus listens on port 9090 by default; open http://127.0.0.1:9090/graph in a browser.
On CentOS 7 and later, configure start on boot with systemd:

cat >/etc/systemd/system/prometheus.service<<'EOF'
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/

[Service]
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --storage.tsdb.path=/usr/local/prometheus/storage \
  --web.console.templates=/usr/local/prometheus/consoles \
  --web.console.libraries=/usr/local/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d --web.listen-address=:9090

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
ps -ef|grep prometheus

4. Installing Grafana

Download the Grafana package from https://grafana.com/grafana/download and install it:

wget https://dl.grafana.com/oss/release/grafana-7.3.7-1.x86_64.rpm
yum install -y grafana-7.3.7-1.x86_64.rpm
# Alternative when dependencies cannot be resolved (use one of the two, not both):
# rpm -ivh --nodeps grafana-7.3.7-1.x86_64.rpm
systemctl daemon-reload 
systemctl enable grafana-server
systemctl start grafana-server
ps -ef|grep grafana-server

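Open Grafana in a browser. The default port is 3000, and the default username and password are both admin.
After logging in you are required to change the password. If the instance will be exposed to the internet, use a strong password: more than 8 characters, mixing upper- and lower-case letters, digits, and special characters.
Add the data source:
Click "Add data source" on the home page.
Select Prometheus.
On the Dashboards tab, select "Prometheus 2.0 Stats".
On the Settings tab, enter the Prometheus address and save.
Switch to the "Prometheus 2.0 Stats" dashboard you just added to see the full monitoring page. At this point it shows data for the local machine only.
That completes the Prometheus + Grafana installation. The remaining sections configure monitoring for servers, middleware, and other targets.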

5. Monitoring Linux servers with node_exporter

Download node_exporter-0.18.1.linux-amd64.tar.gz, upload it to the server to be monitored, and install node_exporter there:

tar -xvf node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
mv node_exporter-0.18.1.linux-amd64 node_exporter
cat >start.node-exporter.sh<<EOF
/usr/local/node_exporter/node_exporter &
EOF
chmod 755 start.node-exporter.sh
./start.node-exporter.sh

Once started, node_exporter listens on port 9100 by default. To change the port, add a parameter to the start script:

cat >start.node-exporter.sh<<EOF
/usr/local/node_exporter/node_exporter --web.listen-address=':9001' &
EOF

Here 9001 is the port I chose, so node_exporter listens on 9001.

On systems that use SysV init scripts, configure start on boot as follows:

vi /etc/init.d/node
#!/bin/sh
#chkconfig: 2345 80 90
#
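# Simple node_exporter init.d script for Linux systems.
# It finds the running process with ps, which relies on the /proc filesystem.

NODE_HOME=/usr/local/node_exporter
# PIDs of the running node_exporter processes, and how many there are
PIDNUM=$(ps -ef | grep node_exporter | grep -v grep | awk '{print $2}')
PID=$(ps -ef | grep node_exporter | grep -v grep | awk '{print $2}' | wc -l)

case "$1" in
    start)
        # -ge rather than -eq: any number of running processes counts as "already running"
        if [ ${PID} -ge 1 ]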
        then
                echo "node_exporter exists, process is already running or crashed"
        else
                echo "Starting node_exporter server..."
                su -l tomcat -c "nohup /usr/local/node_exporter/node_exporter --web.listen-address=:9100 >/dev/null 2>&1 &"
        fi
        ;;
    stop)
        if [ ${PID} -eq 0 ]
        then
                echo "node_exporter does not exist, process is not running"
        else
                kill -9 $PIDNUM
                echo "Stopping ..."
                echo "node_exporter stopped"
        fi
        ;;
    *)
        echo "Please use start or stop as first argument"
        ;;
esac

chmod +x /etc/init.d/node
chkconfig --add node && chkconfig node on
chkconfig --list node
service node start
service node stop
ps -ef|grep node_exporter

On CentOS 7 and later, configure start on boot with systemd:

cat >/etc/systemd/system/node.service <<'EOF'
[Unit]
Description=Prometheus node_exporter
Documentation=https://prometheus.io/
Requires=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter \
  --web.listen-address=:9100

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable node
systemctl start node
ps -ef|grep node

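Once node_exporter is running on the monitored server, go to /usr/local/prometheus on the monitoring server and edit the configuration file, prometheus.yml, to add the monitored server.
Open the file and add the monitored server under scrape_configs. This entry monitors a server on the same network segment:

  - job_name: 'web-41'
    static_configs:
    - targets: ['192.168.205.41:9100']
      labels:
        instance: 'web-41'

This entry monitors an internal-network server through a network isolation gateway (网闸); 192.168.220.21 is the gateway's IP and 10.176.0.75 is the monitored host's own IP:

  - job_name: '10.176.0.75'
    static_configs:
    - targets: ['192.168.220.21:9001']
      labels:
        instance: '10.176.0.75'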

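6. Configuring WeChat alerts

(1) First, set up a WeChat Work (企业微信) application.
Register a WeChat Work account, click "My Company" (我的企业), and note down the Company ID (企业ID).

(2) Next, add the users who should receive alert pushes, and organize the monitored projects by department.

Add each person to the appropriate department and note down the department ID.

Click "App Management" (应用管理) and choose to create an app.
Choose an app logo, enter the app name, and select the corresponding department.

Once the app is created, open it and note down its AgentId and Secret.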

7. Installing Alertmanager

(1) Download alertmanager-0.20.0.linux-amd64.tar.gz, upload it to the monitoring server, extract it into /usr/local, and rename the directory to alertmanager. Then edit alertmanager.yml:

tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
mv alertmanager-0.20.0.linux-amd64 alertmanager
vi alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
templates:
- '/usr/local/alertmanager/wechat.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
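  - corp_id: 'ww732'          # Company ID noted in section 6
    to_party: '4'             # department ID
    agent_id: '100'           # AgentId of the application
    api_secret: 'RXpJffF_l0'  # Secret of the application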
    send_resolved: true
inhibit_rules:
- equal: ['alertname', 'cluster', 'service']
  source_match:
    severity: 'high'
  target_match:
    severity: 'warning'

(2) Upload the WeChat alert template wechat.tmpl to /usr/local/alertmanager. The path must match the one listed under templates in alertmanager.yml.

{{ define "wechat.default.message" }}
{{ range $i, $alert :=.Alerts }}
=== XX Monitoring Alert ===
Severity: {{ $alert.Labels.severity }}
Alert name: {{ $alert.Labels.alertname }}
Host: {{ $alert.Labels.instance }}
Details: {{ $alert.Annotations.description }}
Started at: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
{{ end }}
{{ end }}

(3) Configure service startup
Create a start script:

cat >start.alertmanager.sh<<EOF
/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --storage.path=/usr/local/alertmanager/data &
EOF
chmod 755 start.alertmanager.sh

On CentOS 7 and later, configure a systemd service that starts on boot:

cat >/etc/systemd/system/alertmanager.service<<EOF
[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple

ExecStart=/usr/local/alertmanager/alertmanager \
--config.file=/usr/local/alertmanager/alertmanager.yml \
--storage.path=/usr/local/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager
ps -ef|grep alertmanager
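
To confirm that the WeChat route works end to end, one option is to push a hand-made alert into Alertmanager. The sketch below only builds the JSON payload. The function name, the alert name TestAlert, and the example values are made up for illustration; the label and annotation keys are chosen to match the fields that the wechat.tmpl template in step (2) reads. It assumes Alertmanager is reachable on 127.0.0.1:9093.

```shell
#!/bin/bash
# Build a one-alert JSON payload for Alertmanager's HTTP API.
# Arguments (all illustrative): alert name, severity, instance, description.
# Values are inserted as-is, so they must not contain double quotes or backslashes.
test_alert_json() {
    local name="$1" severity="$2" instance="$3" description="$4"
    printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},' \
        "$name" "$severity" "$instance"
    printf '"annotations":{"description":"%s"}}]\n' "$description"
}

# Example (hypothetical values; sends the alert to the v2 API):
# test_alert_json TestAlert warning web-41 "manual test" |
#   curl -s -XPOST -H 'Content-Type: application/json' -d @- \
#     http://127.0.0.1:9093/api/v2/alerts
```

The amtool binary shipped in the same archive can check the configuration file beforehand with "amtool check-config /usr/local/alertmanager/alertmanager.yml".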

8. Configuring prometheus.yml

vim /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  query_log_file: /usr/local/prometheus/log/prometheus.log

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 10.255.2.38:9093
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/rules/*.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

Create the log and rules directories:

mkdir -p /usr/local/prometheus/log
mkdir -p /usr/local/prometheus/rules

Upload the node.yml rule file to /usr/local/prometheus/rules, then restart Prometheus:

systemctl restart prometheus
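
The node.yml rule file referred to above is not reproduced in this article. As a minimal sketch of what such a file could contain, the function below writes a single rule that fires when a scrape target has been unreachable for one minute. The function name, group name, alert name, the 1m duration, and the severity value are illustrative assumptions; the severity label and the description annotation are the fields that the wechat.tmpl template reads.

```shell
#!/bin/bash
# Write a minimal alerting-rule file to the path given as $1.
# The quoted 'EOF' keeps {{ ... }} and $labels literal, so that
# Prometheus (not the shell) expands them.
write_node_rules() {
    cat >"$1" <<'EOF'
groups:
- name: node
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      description: "{{ $labels.instance }} (job {{ $labels.job }}) has been unreachable for more than 1 minute"
EOF
}

# Example: write the file, then validate it before restarting Prometheus.
# write_node_rules /usr/local/prometheus/rules/node.yml
# /usr/local/prometheus/promtool check rules /usr/local/prometheus/rules/node.yml
```

Running promtool (shipped in the Prometheus archive) before the restart catches syntax errors that would otherwise keep Prometheus from starting.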

9. Installing blackbox_exporter (URL monitoring on Linux)

(1) Download the exporter package blackbox_exporter-0.16.0.linux-amd64.tar.gz and extract it:

tar -zxvf blackbox_exporter-0.16.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
mv blackbox_exporter-0.16.0.linux-amd64 blackbox_exporter
cd blackbox_exporter
cp blackbox.yml blackbox.yml.bak

Create a start script:

cat >start.blackbox_exporter.sh<<EOF
/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml &
EOF
chmod 755 start.blackbox_exporter.sh

On CentOS 7 and later, configure a systemd service that starts on boot:

cat >/usr/lib/systemd/system/blackbox_exporter.service<<EOF
[Unit]
Description=blackbox_exporter
After=network.target
[Service]

ExecStart=/usr/local/blackbox_exporter/blackbox_exporter  \
         --config.file=/usr/local/blackbox_exporter/blackbox.yml
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable blackbox_exporter.service
systemctl start blackbox_exporter.service
ps -ef|grep blackbox_exporter

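(2) Define the probe module parameters. This step applies to POST requests only; skip it for GET requests. For a POST request, ask the developers for the request body and for the status code that a successful response contains, which fail_if_body_not_matches_regexp will match. Then edit blackbox.yml and add the POST configuration under modules (corresponds to post-blackbox.yml):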

vi /usr/local/blackbox_exporter/blackbox.yml
modules:
  monitor:   # custom module name; referenced from prometheus.yml
    prober: http
    timeout: 15s
    http:
      preferred_ip_protocol: "ip4"
      method: POST
      headers:
        Content-Type: application/json;charset=UTF-8
      body:  '{"app_id":"1BQA48ETK00082","biz_content":"193D2752D9F1C6F50998617FCD0E8471331D79","enc_type":"AES","method":"ehc.ehealthcard.queryInfo","sign":"5D55D1D06B3EB630F","sign_type":"MD5","term_id":"301","timestamp":"1540614","version":"X.M.0.1"}'
      fail_if_body_not_matches_regexp:
        - "0000"

(3) For GET requests, add the job directly to prometheus.yml and list the URLs to monitor in blackbox-dis.yml (corresponds to get-blackbox-dis.yml; /usr/local/blackbox_exporter/blackbox.yml stays unchanged):

vi /usr/local/prometheus/prometheus.yml
  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/usr/local/blackbox_exporter/blackbox-dis.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: 10.255.2.38:9115

vi /usr/local/blackbox_exporter/blackbox-dis.yml
  - targets: ['http://192.168.0.41:8080/web']
    labels:
      instance: 'http://192.168.0.41:8080/web'
      tags: 'xx service'
      product: 'web service'
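
When many URLs need monitoring, writing blackbox-dis.yml by hand gets repetitive. The following is a small sketch that prints one target entry per URL in the same layout as the file above. The function name and the idea of passing one shared tags value and one shared product value are illustrative assumptions, not part of the original setup.

```shell
#!/bin/bash
# Print file_sd entries for blackbox-dis.yml.
# Usage: blackbox_targets TAGS PRODUCT URL [URL...]
# URLs and label values must not contain single quotes.
blackbox_targets() {
    local tags="$1" product="$2" url
    shift 2
    for url in "$@"; do
        printf "  - targets: ['%s']\n" "$url"
        printf "    labels:\n"
        printf "      instance: '%s'\n" "$url"
        printf "      tags: '%s'\n" "$tags"
        printf "      product: '%s'\n" "$product"
    done
}

# Example (hypothetical label values):
# blackbox_targets 'xx service' 'web service' http://192.168.0.41:8080/web \
#   > /usr/local/blackbox_exporter/blackbox-dis.yml
```

Because the job uses file_sd_configs with refresh_interval: 1m, Prometheus rereads the generated file on its own without a restart.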

(4) Upload the blackbox.yml rule file (corresponds to rules-blackbox.yml) to /usr/local/prometheus/rules/, then restart both services:

systemctl restart blackbox_exporter.service
systemctl restart prometheus

(5) Upload the dashboard JSON file "http status overview-1591064596650.json" to Grafana to monitor URLs.

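10. Installing blackbox_exporter (URL monitoring on Windows)

1. Download blackbox_exporter-0.16.0.windows-amd64.tar.gz.
Extract the archive, open the extracted folder, and double-click blackbox_exporter.exe.
2. To define HTTP POST monitoring parameters, see step (2) of the Linux installation.
3. To define HTTP GET monitoring parameters, see step (3) of the Linux installation.
4. Upload blackbox.yml to the rules folder.
5. Start blackbox_exporter and reload Prometheus.
To restart blackbox_exporter and Prometheus, close the .exe programs and double-click them again.
6. Upload the dashboard JSON file "http status overview-1591064596650.json" to Grafana to monitor URLs.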

11. Monitoring service ports

(1) Edit prometheus.yml and add the port-monitoring job:

vi /usr/local/prometheus/prometheus.yml
  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
    - targets: ['10.16.84.27:1344']
      labels:
        instance: '10.16.84.27'
        tags: 'ehcServer27'
        port: 1344
    - targets: ['10.16.84.28:1344']
      labels:
        instance: '10.16.84.28'
        tags: 'ehcServer28'
        port: 1344
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.255.2.38:9115
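
Before relying on the Prometheus job, it can help to query the exporter's /probe endpoint by hand, which is the same request Prometheus issues after relabeling. The helper below is a sketch that reads the exporter's text response on standard input and reports whether probe_success is 1. The function name is an assumption; 10.255.2.38:9115 is the exporter address used in the configuration above.

```shell
#!/bin/bash
# Read blackbox_exporter /probe output on stdin.
# Exit status 0 if the probe_success sample is 1, non-zero otherwise
# (including when no probe_success line is present).
probe_ok() {
    awk '$1 == "probe_success" { ok = ($2 == 1) } END { exit (ok ? 0 : 1) }'
}

# Example:
# curl -s 'http://10.255.2.38:9115/probe?module=tcp_connect&target=10.16.84.27:1344' | probe_ok \
#   && echo "port reachable" || echo "port unreachable or probe failed"
```

Appending &debug=true to the probe URL makes the exporter return its probe log as well, which is useful when a target unexpectedly fails.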

(2) Upload port.yml to /usr/local/prometheus/rules, and edit the blackbox.yml rule file in the same directory so that its URL alert excludes the port_status job:

vi /usr/local/prometheus/rules/blackbox.yml
    #expr: probe_success == 0
    expr: probe_success{job != 'port_status'} == 0

(3) Upload the dashboard JSON files to Grafana: "服务监控展示面板-1600237804697.json" for URL performance monitoring, and "linux服务器模板.json" for server resource monitoring.

12. Installing the Oracle exporter

Installing it under a separate, dedicated user is recommended.
1. Create the collection user:

create temporary tablespace prometheus_tmp tempfile '/data/oradata/orcl/prometheus_tmp.dbf' size 64m autoextend on next 64m maxsize unlimited extent management local;
create tablespace prometheus_data logging datafile '/data/oradata/orcl/prometheus_data.dbf' size 64m autoextend on next 64m maxsize unlimited extent management local;
create user prometheus identified by your_password default tablespace prometheus_data temporary tablespace prometheus_tmp;
grant connect,resource,dba to prometheus;
grant unlimited tablespace to prometheus;
create or replace directory dir_dump as '/data/backup';
grant read,write on directory dir_dump to prometheus;
ALTER PROFILE DEFAULT LIMIT PASSWORD_LIFE_TIME UNLIMITED;
alter system set processes=500 scope=spfile;
alter system set sessions=555 scope=spfile;

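2. Download the exporter and the client
Client download link: https://www.oracle.com/database/technologies/instant-client/downloads.html (the version must be 18 or later)
Exporter package: oracledb_exporter.0.2.8-ora18.5.linux-amd64.tar.gz
Download the two client packages, basic and sqlplus, and install them: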

rpm -ivh oracle-instantclient18.5-basic-18.5.0.0.0-3.x86_64.rpm
rpm -ivh oracle-instantclient18.5-sqlplus-18.5.0.0.0-3.x86_64.rpm

3. Add tnsnames.ora:

vim /usr/lib/oracle/18.5/client64/tnsnames.ora
prometheus =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.7.18)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )

4. Add environment variables:

vim ~/.bash_profile
export ORACLE_HOME=/usr/lib/oracle/18.5/client64
#export TNS_ADMIN=$ORACLE_HOME/network
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$ORACLE_HOME/bin:$PATH
export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
export DATA_SOURCE_NAME="prometheus/your_password@192.168.44.90:1521/orcl"
source ~/.bash_profile

5. Verify that you can log in:

sqlplus prometheus/your_password@192.168.44.90/orcl

6. Install the exporter:

tar -zxvf oracledb_exporter.0.2.8-ora18.5.linux-amd64.tar.gz -C /usr/local
mv /usr/local/oracledb_exporter.0.2.8-ora18.5.linux /usr/local/oracledb_exporter

Start it:

vim start_oracledb_exporter.sh
cd /usr/local/oracledb_exporter
./oracledb_exporter -query.timeout=50

Create a start script:

cat >start.oracledb_exporter.sh<<EOF
/usr/local/oracledb_exporter/oracledb_exporter -query.timeout=50 &
EOF
chmod 755 start.oracledb_exporter.sh

7. Configure the job in Prometheus and add the rule file:

  - job_name: 'oracledb'
    scrape_interval: 50s
    scrape_timeout: 50s
    static_configs:
    - targets: ['192.168.44.90:9161']
      labels:
        instance: '192.168.44.90'
        tags: 'database'

8. Upload oracle.yml to /usr/local/prometheus/rules.
9. Upload the dashboard JSON file "Oracledb overview-1591064578925.json" to Grafana to monitor the Oracle service.

13. Installing the MySQL exporter (Linux)

Installation on Linux
1. Create the collection user:

create user mysqld_exporter IDENTIFIED BY 'your_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'%';

2. Download and configure the exporter
The MySQL exporter is mysqld_exporter; the package is mysqld_exporter-0.12.0.linux-amd64.tar.gz.

tar -zxvf mysqld_exporter-0.12.0.linux-amd64.tar.gz -C /usr/local
cd /usr/local
mv mysqld_exporter-0.12.0.linux-amd64 mysqld_exporter
cd mysqld_exporter
vim .my.cnf
[client]
user=mysqld_exporter
password=your_password

Configure start on boot:

vim /usr/lib/systemd/system/mysql_exporter.service
[Unit]
Description=mysqld_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
         --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf  
Restart=on-failure
[Install]
WantedBy=multi-user.target

chown -R prometheus:prometheus /usr/local/mysqld_exporter/
systemctl daemon-reload
systemctl enable mysql_exporter.service
systemctl start mysql_exporter.service

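Troubleshooting:
If the service fails to start and reports a password error, set the data source through an environment variable instead:

vim /etc/profile
export DATA_SOURCE_NAME='mysqld_exporter:your_password@tcp(127.0.0.1:3306)/'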
source /etc/profile
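
A variable exported in /etc/profile reaches processes started from a login shell, but systemd does not read that file when it starts a service. For the systemd unit above, the variable can alternatively be supplied through a unit drop-in. The sketch below prints such a drop-in; the function name is an assumption, and the drop-in path in the example follows the usual <unit>.d/ convention for the unit name used above.

```shell
#!/bin/bash
# Print a systemd drop-in that sets DATA_SOURCE_NAME for mysqld_exporter.
# Usage: dsn_dropin USER PASSWORD HOST:PORT
# A literal % must be written as %% in a unit file, so it is doubled here.
# Passwords containing double quotes or backslashes are not handled.
dsn_dropin() {
    local user="$1" pass addr="$3"
    pass=$(printf '%s' "$2" | sed 's/%/%%/g')
    printf '[Service]\n'
    printf 'Environment="DATA_SOURCE_NAME=%s:%s@tcp(%s)/"\n' "$user" "$pass" "$addr"
}

# Example:
# mkdir -p /etc/systemd/system/mysql_exporter.service.d
# dsn_dropin mysqld_exporter your_password 127.0.0.1:3306 \
#   > /etc/systemd/system/mysql_exporter.service.d/override.conf
# systemctl daemon-reload && systemctl restart mysql_exporter.service
```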

3. Specify the job in Prometheus and add the rule file:

  - job_name: 'mysql'
    static_configs:
    - targets: ['192.168.44.90:9104']
      labels:
        instance: '192.168.44.90'
        tags: 'database'

4. Upload mysql.yml to /usr/local/prometheus/rules.
5. Upload the dashboard JSON file "MySQL Overview-1591064554994.json" to Grafana to monitor the MySQL service.

14. Installing the MySQL exporter (Windows)

1. Create the collection user:

create user mysqld_exporter IDENTIFIED BY 'your_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqld_exporter'@'%';

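2. Download and configure the exporter
The MySQL exporter is mysqld_exporter. Download the mysqld_exporter-0.12.1.windows-amd64.tar.gz package and extract it.
Add an environment variable:
This PC -> Properties -> Advanced system settings -> Environment Variables -> New

Variable name: DATA_SOURCE_NAME
Variable value: mysqld_exporter:your_password@tcp(127.0.0.1:3306)/

3. Run mysqld_exporter
Double-click the extracted mysqld_exporter.exe.

4. Specify the job in Prometheus and add the rule file:

  - job_name: 'mysql'
    static_configs:
    - targets: ['192.168.44.90:9104']
      labels:
        instance: '192.168.44.90'
        tags: 'database'

5. Upload mysql.yml to /usr/local/prometheus/rules.
6. Upload the dashboard JSON file "MySQL Overview-1591064554994.json" to Grafana to monitor the MySQL service.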

15. Conclusion

Reposted from blog.51cto.com/8355320/2646629