Construction of Alarm Monitoring System

Development of alarm monitoring system

1. Demand analysis

  • [ ] Requirement:

Shell makes it easy to write all kinds of custom alarm tools, but they need to be managed in a unified, standardized way.

  • [ ] Idea:

Define a script package that contains the main program, the subprograms, the configuration file, the mail engine, and the output logs.

  • [ ] Main program:

As the entry point of the whole package, it is the lifeblood of the system.

  • [ ] Configuration file:

It is the control center: it switches each subprogram on or off and specifies the log files each one uses.

  • [ ] Subprograms:

These are the actual monitoring scripts, each of which monitors one indicator.

  • [ ] Mail engine:

It is implemented as a Python program in which you define the mail server, the sender, and the sender's password.

  • [ ] Output log:

The entire monitoring system must have log output.
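
To make the division of labour concrete, here is a minimal sketch of how a main program could ask the configuration file which subprograms are switched on. The function name enabled_checks is hypothetical; the to_mon_<name>=1 convention is the one used by the mon.conf file in section 3.

```shell
# Hypothetical helper: print the name of every check whose switch is 1.
# Assumes switches are written as to_mon_<name>=<0|1> at the start of a line,
# optionally followed by spaces and a "##" comment.
enabled_checks() {
    awk -F'[= \t#]+' '/^to_mon_/ && $2 == 1 {
        sub(/^to_mon_/, "", $1)   # keep only the check name
        print $1
    }' "$1"
}

# Usage: enabled_checks ../conf/mon.conf
```

With the sample configuration from section 3, only to_mon_502 is set to 1, so the function would print 502.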

2. The main script of the alarm system

As a good practice, put all scripts in the /usr/local/sbin directory and create corresponding subdirectories.

2.1 Main script

2.1.1 Create the corresponding directory

[root@xavi ~]# cd /usr/local/sbin
[root@xavi sbin]# mkdir mon
[root@xavi sbin]# cd mon
[root@xavi mon]# mkdir mail bin shares conf log
[root@xavi mon]# ls
bin  conf  log  mail  shares
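
The same layout can also be created in one step, which may be handy when the package has to be rolled out to many machines. This is only a sketch; the function name make_mon_layout is hypothetical.

```shell
# Hypothetical helper: create the monitoring package layout under ROOT.
# mkdir -p creates missing parent directories and does not fail
# when a directory already exists, so the function can be re-run safely.
make_mon_layout() {
    root=$1
    for d in mail bin shares conf log; do
        mkdir -p "$root/$d" || return 1
    done
}

# Usage: make_mon_layout /usr/local/sbin/mon
```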
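
2.1.2 Write the main script in the bin directory

#!/bin/bash

# Switch for sending mail. In maintenance mode set it to 0: monitoring continues, but no mail is sent.
# The variable send is exported so that it is available in all subscripts.
export send=1

# Extract the IP address so that an alert shows which machine raised it (there is no server side; every machine runs independently).
# The monitored network interface can be changed, or the hostname can be used instead.
export addr=`/sbin/ifconfig |grep -A1 "ens33: "|awk '/inet/ {print $2}'`
# Directory the main script is run from
dir=`pwd`
# Only the last component of the directory name is needed
last_dir=`echo $dir|awk -F'/' '{print $NF}'`
# The check below makes sure the script is executed from the bin directory; otherwise the monitoring scripts, mail scripts, and logs will probably not be found (almost all paths in the scripts are relative).
if [ "$last_dir" == "bin" ] || [ "$last_dir" == "bin/" ]; then
    conf_file="../conf/mon.conf"
else
    echo "you should cd to the bin dir"
    exit 1
fi
# Define the normal output log and the error log
exec 1>>../log/mon.log 2>>../log/err.log
echo "`date +"%F %T"` load average"
/bin/bash ../shares/load.sh
# Check first whether the configuration file asks for 502 monitoring
if grep -q 'to_mon_502=1' $conf_file; then
    export log=`grep 'logfile=' $conf_file |awk -F '=' '{print $2}' |sed 's/ //g'`
    /bin/bash ../shares/502.sh
fi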
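
The main script above only works when it is started from inside bin, which is easy to get wrong when it is launched from cron. One possible alternative, sketched here with the hypothetical helpers script_dir and is_bin_dir, is to derive the directory from the script path and change into it before the relative paths are used.

```shell
# Hypothetical helper: print the absolute directory that contains FILE.
# The subshell keeps the caller's working directory unchanged.
script_dir() {
    ( cd "$(dirname "$1")" && pwd )
}

# Hypothetical helper: succeed only if the last component of DIR is "bin".
# basename also strips a trailing slash, so "bin/" is accepted as well.
is_bin_dir() {
    [ "$(basename "$1")" = "bin" ]
}

# Possible use at the top of the main script:
#   cd "$(script_dir "$0")" || exit 1
#   is_bin_dir "$(pwd)" || { echo "main script must live in bin"; exit 1; }
```

The trade-off is that the script no longer depends on the caller's working directory, at the cost of two small helper functions.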

3. Alarm system configuration file

[root@xavi bin]# cd ../conf
[root@xavi conf]# vim mon.conf

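## Options that switch each monitoring item on or off
## Define the MySQL server address, port, user, and password (the cdb below is my current database; monitoring it is optional)
to_mon_cdb=0  ##0 or 1, default 0; 0 = do not monitor, 1 = monitor
db_ip=10.20.3.13
db_port=3315
db_user=username
db_pass=passwd
## httpd: 1 = monitor, 0 = do not monitor
to_mon_httpd=0
## php: 1 = monitor, 0 = do not monitor
to_mon_php_socket=0
## http_code_502: the access log path must be defined (together with the main script above, to_mon_502=1 turns on the monitoring alarm)
to_mon_502=1
logfile=/data/log/xxx.xxx.com/access.log
## request_count: define the log path and the domain name (monitors the number of requests; 0 or 1 switches it off or on, as above)
to_mon_request_count=0
req_log=/data/log/www.xxx.com/access.log
domainname=www.xxx.com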

Note: Why are the log paths defined here?

When there are many servers, the scripts must be generic and easy to modify. With the paths in the configuration file, moving to a different environment only requires adjusting the configuration file.
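
As an illustration of keeping the scripts generic, the following sketch reads a single value from a KEY=VALUE file such as mon.conf. The function name conf_get is hypothetical, and the sketch assumes that values never contain a # character, because everything from the first # onward is treated as a comment.

```shell
# Hypothetical helper: print the value of KEY from a KEY=VALUE file.
# A trailing "## comment" and surrounding spaces are removed;
# only the first matching line is used.
conf_get() {
    key=$1
    file=$2
    awk -F'=' -v k="$key" '
        $1 == k {
            sub(/^[^=]*=/, "")            # drop the key and the equals sign
            sub(/[ \t]*#.*$/, "")         # drop a trailing comment
            gsub(/^[ \t]+|[ \t]+$/, "")   # trim leading and trailing blanks
            print
            exit
        }' "$file"
}

# Usage: log=$(conf_get logfile ../conf/mon.conf)
```

A subscript could then look up its own settings instead of relying on variables exported by the main script.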

4. Monitoring items

4.1 Alarm system load.sh

4.1.1 Switch to the shares directory and create the script

[root@xavi conf]# cd ../shares
[root@xavi shares]# vim load.sh

#! /bin/bash

load=`uptime |awk -F 'average:' '{print $2}'|cut -d',' -f1|sed 's/ //g' |cut -d. -f1`
if [ "$load" -gt 10 ] && [ "$send" -eq "1" ]
then
    echo "$addr `date +%T` load is $load" >../log/load.tmp
    # load.tmp is used when sending the mail; mail.sh in turn calls mail.py
    # Replace [email protected] with the address that should receive the alert
    /bin/bash ../mail/mail.sh [email protected] "${addr}_load:$load" "`cat ../log/load.tmp`"
fi
# Record the load in the log
echo "`date +%T` load is $load"

4.1.2 Script Analysis

  • [ ] load: Run the pipeline on the command line first to check that it returns the expected value (the integer part of the 1-minute load average).

  • [ ] Alarm premise: Then compare the load with a threshold (choose the threshold according to the hardware of your real server).

  • [ ] Email alert: An email is sent only when the load exceeds 10 and the system is not in maintenance mode (send=1, defined in the main script).

  • [ ] Action: Write the temporary file and execute the script that sends the email (described later).

  • [ ] Summary: The script raises an alarm when the load is above the threshold and records the load in the log in either case.
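
The two decisions in this analysis can be tried out separately from the script. The sketch below wraps the same pipeline in a function that reads uptime output from standard input, and isolates the alert condition. The names load_int and should_alert are hypothetical, and the sketch assumes the Linux uptime format with a period as the decimal separator.

```shell
# Hypothetical helper: read one line of uptime output on stdin and print
# the integer part of the 1-minute load average.
load_int() {
    awk -F 'average:' '{print $2}' | cut -d',' -f1 | sed 's/ //g' | cut -d. -f1
}

# Hypothetical helper: succeed when an alert should be sent, that is, when
# VALUE is above THRESHOLD and the send switch is 1 (not maintenance mode).
should_alert() {
    value=$1
    threshold=$2
    send_flag=$3
    [ "$value" -gt "$threshold" ] && [ "$send_flag" -eq 1 ]
}

# Usage:
#   load=$(uptime | load_int)
#   should_alert "$load" 10 "$send" && echo "would send mail"
```

Passing the threshold as an argument makes it easy to tune the value per server, as the analysis recommends.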

4.2 Alarm system 502.sh

4.2.1 Create a monitoring script directly in the current directory:

[root@xavi shares]# pwd
/usr/local/sbin/mon/shares
[root@xavi shares]# vim 502.sh

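#! /bin/bash
# Minute to inspect, one minute ago, in HH:MM format
d=`date -d "-1 min" +%H:%M`
# Count the 502 responses logged in that minute ($log is exported by the main script)
c_502=`grep ":$d:" "$log" |grep ' 502 '|wc -l`
if [ "$c_502" -gt 10 ] && [ "$send" == 1 ]; then
    echo "$addr $d 502 count is $c_502">../log/502.tmp
    /bin/bash ../mail/mail.sh ${addr}_502 $c_502 ../log/502.tmp
fi
# Always record the count in the log
echo "`date +%T` 502 $c_502"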

4.2.2 Script Analysis

The script takes the log entries from one minute ago and counts the 502 responses. If there are more than 10 and the system is not in maintenance mode (send is defined in the main script), it calls mail.sh to send an email. In either case it records the count in the log.
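
The counting step can be checked against a sample log before the script goes live. This is a minimal sketch with a hypothetical function count_502; it assumes an access log whose timestamps contain :HH:MM: and whose status code is surrounded by spaces, as the grep patterns in 502.sh do.

```shell
# Hypothetical helper: count the lines in LOGFILE that were logged during
# MINUTE (format HH:MM) and contain the status code 502.
count_502() {
    minute=$1
    logfile=$2
    # grep -c prints 0 and returns a non-zero status when nothing matches,
    # so "|| true" keeps the function from reporting a failure in that case.
    grep ":$minute:" "$logfile" | grep -c ' 502 ' || true
}

# Usage: count_502 "$(date -d '-1 min' +%H:%M)" "$log"
```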

4.3 Alarm system disk.sh

4.3.1 Create a monitoring script (it assumes English command output; if the system language is not English, set LANG=en in the script):

[root@xavi shares]# vim disk.sh

#! /bin/bash

LANG=en
rm -f ../log/disk.tmp
## Split on spaces or %, and pick out the disk usage percentage.
for r in `df -h |awk -F '[ %]+' '{print $5}'|grep -v Use`
do
    if [ "$r" -gt 90 ] && [ "$send" -eq "1" ]
    then
        echo "$addr `date +%T` disk usage is $r" >>../log/disk.tmp
        # Remember a value that exceeded the threshold for the mail below
        over=$r
    fi
done
if [ -f ../log/disk.tmp ]
then
    df -h >> ../log/disk.tmp
    /bin/bash ../mail/mail.sh ${addr}_disk $over ../log/disk.tmp
    echo "`date +%T` disk usage is not ok"
else
    echo "`date +%T` disk usage is ok"
fi

Script analysis:

  • [ ] Monitor all disk partitions
  • [ ] Read the used percentage of each partition
  • [ ] Set the alarm threshold for partition usage (90 here)
  • [ ] Write each partition that exceeds it to a temporary file
  • [ ] Finally, check whether the temporary file exists; if it does, send an email alarm and record the problem in the log
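
The filtering step in this list can also be written so that the alert names the partition and not only the percentage. The sketch below is an alternative to the loop in disk.sh, not the article's method. The function name over_threshold is hypothetical, and it assumes the one-line-per-file-system layout of df -P with mount points that contain no spaces.

```shell
# Hypothetical helper: read df -P output on stdin and print
# "<mount point> <used percent>" for every file system above THRESHOLD.
over_threshold() {
    awk -v t="$1" 'NR > 1 {              # skip the header line
        use = $5
        sub(/%/, "", use)                # "95%" becomes "95"
        if (use + 0 > t + 0) print $6, use
    }'
}

# Usage: LANG=C df -P | over_threshold 90
```

Reading from standard input keeps the function easy to test with saved df output.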
