Monitoring processes and alerting with process exporter

This article assumes Prometheus and Grafana are deployed on Kubernetes (k8s).


  • Introduction to process-exporter:

In a Prometheus setup, process-exporter can be used to check whether selected processes are alive.

Usage:

process-exporter [options] -config.path filename.yml

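To select the processes to monitor and group them, either pass command-line arguments or use a YAML configuration file. Specifying a configuration file with -config.path is recommended.

The general format of the -config.path YAML file is a top-level process_names section containing a list of name matchers: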

process_names:
  - matcher1
  - matcher2
  ...
  - matcherN

The default configuration shipped with the deb/rpm packages is:

process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'

A process can belong to only one group: even if it matches several entries, it is assigned to the first matching groupname.
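The first-match rule can be illustrated with a small shell sketch. This is not process-exporter's own code: first_group is a hypothetical helper, and it uses grep's POSIX extended regular expressions, which are assumed to behave like the exporter's Go regular expressions for simple patterns such as these.

```shell
#!/bin/sh
# first_group CMDLINE PATTERN...
# Hypothetical helper: prints the first pattern that matches CMDLINE,
# mimicking "a process joins only the first matching group".
first_group() {
    cmdline=$1
    shift
    for pattern in "$@"; do
        if printf '%s\n' "$cmdline" | grep -Eq -- "$pattern"; then
            printf '%s\n' "$pattern"
            return 0
        fi
    done
    return 1   # no pattern matched
}

# Both patterns match this command line, but only the first one is reported.
first_group '/usr/bin/redis-server *:6379' 'redis' 'redis-server'
```

The practical consequence is that the order of entries under process_names matters: a broad pattern placed first will capture processes that a narrower pattern further down was meant to claim.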

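Each item under process_names provides a way to identify and name processes. The optional name field defines a template used to name matching processes; if it is not specified, name defaults to {{.ExeBase}}.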

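Available template variables:

{{.Comm}}        the basename of the original executable, as reported in /proc/<pid>/stat
{{.ExeBase}}     the basename of the executable
{{.ExeFull}}     the fully qualified path of the executable
{{.Username}}    the username of the effective user
{{.Matches}}     a map of all matches produced by applying the cmdline regular expressions
{{.PID}}         the PID of the process. Note that using PID means the group will contain only a single process
{{.StartTime}}   the start time of the process. This can be useful in combination with PID, since PIDs are reused over time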

Using PID or StartTime is discouraged: it rarely gives the result you want, and it is likely to cause trouble for Prometheus because of high metric cardinality.

For the process_exporter configuration reference, see the process-exporter project: https://github.com/ncabatoff/process-exporter

  • Installing process-exporter:
vim process.sh
#!/bin/bash
# Installs process_exporter

PROCESS_VER=0.7.5
PROCESS_DIR=/usr/local/process-exporter

[ ! -d /software/ ] && mkdir /software

install_process() {
    cd /software || exit 1
    yum install -y wget

    if [ $? -eq 0 ]
    then
        echo -e "\033[36mDependencies installed successfully with yum\033[0m"
    else
        echo -e "\033[31mFailed to install dependencies with yum, please check\033[0m"
        exit 1
    fi
    
    [ ! -f process-exporter-$PROCESS_VER.linux-amd64.tar.gz ] && wget https://github.com/ncabatoff/process-exporter/releases/download/v$PROCESS_VER/process-exporter-$PROCESS_VER.linux-amd64.tar.gz
    [ ! -d process-exporter-$PROCESS_VER.linux-amd64 ] && tar xf process-exporter-$PROCESS_VER.linux-amd64.tar.gz
    [ ! -d $PROCESS_DIR ] && mv process-exporter-$PROCESS_VER.linux-amd64 $PROCESS_DIR
    
    cat > $PROCESS_DIR/process-exporter.yaml << EOF
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'redis-server'

  - name: "{{.Matches}}"
    cmdline:
    - 'mysqld'

  - name: "{{.Matches}}"
    cmdline:
    - 'org.apache.zookeeper.server.quorum.QuorumPeerMain'

  - name: "{{.Matches}}"
    cmdline:
    - 'org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer'

  - name: "{{.Matches}}"
    cmdline:
    - 'org.apache.hadoop.hdfs.qjournal.server.JournalNode'
EOF
    id prometheus || useradd -M -s /sbin/nologin prometheus
    chown -R prometheus:prometheus $PROCESS_DIR
    
    cat > /usr/lib/systemd/system/process_exporter.service << EOF
[Unit]
Description=process_exporter
Documentation=https://github.com/ncabatoff/process-exporter
After=network.target
 
[Service]
Type=simple
User=prometheus
Group=prometheus
WorkingDirectory=$PROCESS_DIR
ExecStart=$PROCESS_DIR/process-exporter -config.path=$PROCESS_DIR/process-exporter.yaml
Restart=always
 
[Install]
WantedBy=multi-user.target
EOF
    systemctl daemon-reload && systemctl enable process_exporter
    
    systemctl start process_exporter
    if [ $? -eq 0 ]
    then
        echo -e "\033[36mprocess_exporter installed successfully\033[0m"
    else
        echo -e "\033[31mprocess_exporter installation failed\033[0m"
        exit 1
    fi
}

install_process
bash process.sh
  • Modifying the configuration:

Customize the configuration file according to the names of the processes to monitor.

vim /usr/local/process-exporter/process-exporter.yaml
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'sys#abut-exec.jar'

  - name: "{{.Matches}}"
    cmdline:
    - 'sys#open-exec.jar'

  - name: "{{.Matches}}"
    cmdline:
    - 'sys#activity-exec.jar'
systemctl restart process_exporter
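
After the restart, it can help to confirm that each group is actually reported before wiring up alerts. The sketch below assumes the exporter serves the Prometheus text format at /metrics on port 9256 (the port used in the scrape target later in this article). num_procs is a hypothetical helper name, and the sample lines are made-up values for illustration only.

```shell
#!/bin/sh
# num_procs GROUPNAME
# Reads Prometheus text-format metrics on stdin and prints the value of
# namedprocess_namegroup_num_procs for the given group.
# Exits non-zero when the group is not present at all.
num_procs() {
    awk -v want="namedprocess_namegroup_num_procs{groupname=\"$1\"}" '
        index($0, want) == 1 { print $NF; found = 1 }
        END { exit !found }
    '
}

# Illustrative input (made-up values) in the shape the exporter emits.
num_procs 'map[:sys#abut-exec.jar]' << 'EOF'
# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="map[:sys#abut-exec.jar]"} 1
namedprocess_namegroup_num_procs{groupname="map[:sys#open-exec.jar]"} 0
EOF
```

Against a live exporter, the same helper could be fed with: curl -s http://localhost:9256/metrics | num_procs 'map[:sys#abut-exec.jar]'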
  • Adding the monitoring job to Prometheus:
vim prometheus/config.yaml              # add
      - job_name: 'yty-process'              # process monitoring
        static_configs:
        - targets: ['xxx.xxx.xxx.xxx:9256']
vim prometheus/rules.yaml               # add
    - name: process
      rules:
      - alert: ProcessAbutDown
        expr: (namedprocess_namegroup_num_procs{groupname="map[:sys#abut-exec.jar]"}) == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Process Abut-exec Down"
          description: "{{ $labels.instance }}: Process Abut-exec has been down for more than 1m"
          value: "{{ $value }}"

      - alert: ProcessOpenDown
        expr: (namedprocess_namegroup_num_procs{groupname="map[:sys#open-exec.jar]"}) == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Process Open-exec Down"
          description: "{{ $labels.instance }}: Process Open-exec has been down for more than 1m"
          value: "{{ $value }}"

      - alert: ProcessActivityDown
        expr: (namedprocess_namegroup_num_procs{groupname="map[:sys#activity-exec.jar]"}) == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: Process Activity-exec Down"
          description: "{{ $labels.instance }}: Process Activity-exec has been down for more than 1m"
          value: "{{ $value }}"
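
The groupname values in these rules follow from the name template. With name set to {{.Matches}} and a cmdline pattern without named capture groups, the label appears to be the printed form of a map whose single empty key holds the matched text, which is why it reads map[:...]. The sketch below generates the expr for a given jar name. down_expr is a hypothetical helper, and it assumes the pattern matches its own literal text, as the jar names here do.

```shell
#!/bin/sh
# down_expr MATCHED_TEXT
# Hypothetical helper: prints a PromQL expression that is true when the
# group named by the {{.Matches}} template has no running processes.
down_expr() {
    printf '(namedprocess_namegroup_num_procs{groupname="map[:%s]"}) == 0\n' "$1"
}

# Print one expression per monitored jar from the configuration above.
for jar in 'sys#abut-exec.jar' 'sys#open-exec.jar' 'sys#activity-exec.jar'; do
    down_expr "$jar"
done
```

If the name template or the cmdline pattern changes, the groupname label changes with it, so the selectors in rules.yaml need to be updated at the same time.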
kubectl apply -f prometheus/

kubectl delete pod -n monitoring prometheus-b58f6d4c7-v8m7x

  • Testing the alert:

Kill any one of the monitored processes:

ps aux |grep abut |grep java |awk '{print $2}' | xargs kill

Wait 1m, and a DingTalk alert arrives.

Restart the process, and a recovery notification arrives.

At this point, process monitoring and alerting with process exporter is fully configured.


Reposted from blog.csdn.net/miss1181248983/article/details/113571803