Case II: monitoring an Nginx server for 502 status codes

On a website built on Nginx + php-fpm + MySQL, a 502 error can have many causes; the most common is that php-fpm has run out of resources. The server in this case usually runs well, but under high traffic it starts returning 502 status codes. When that happens we need to analyze promptly why php-fpm ran out of resources, so we write a monitoring script that alerts us as soon as 502 status codes appear.

Requirements are as follows:

1) The script runs once a minute.

2) Detect 502 responses by analyzing the site's access log. Sending an HTTP request with curl and reading the status code is also possible, but access log analysis is recommended. The access log path is /data/logs/access.log, and a log entry looks like this:

54.36.149.38 - [16/Sep/2018:18:21:10 +0800] www.lishiming.net "thread-5360-1-1.html" 301 "GET /thread-5360-1-1.html HTTP/1.1" "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)"

3) Raise an alert if 502 occurs more than 50 times within one minute.

4) Send the alert by e-mail to [email protected]
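
Requirement 2 mentions curl as an alternative to log analysis. A minimal sketch of that approach follows, assuming curl is installed. The function names http_status and is_502 are illustrative and do not come from the original case.

```shell
# Hypothetical helper: print only the HTTP status code of a URL.
# -s silences progress output, -o /dev/null discards the body,
# -m 5 gives up after 5 seconds, -w '%{http_code}' prints the status code.
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Hypothetical helper: succeed only when the given status code is 502.
is_502() {
    [ "$1" = "502" ]
}
```

It could be called as code=$(http_status http://www.lishiming.net/) followed by is_502 "$code" && echo "got 502". A single request only samples the site once, so it cannot count how many 502 responses real visitors received in a minute.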


Knowledge Point one: the cron task scheduler

The script must run once a minute, so we use cron. Run the command:

crontab -e

and add the following line:

* * * * * /bin/bash /usr/local/sbin/mon_502.sh

Example 1:

Run the script /usr/local/sbin/123.sh at 4:20 every Monday, Wednesday, and Friday

20 4 * * 1,3,5 /bin/bash /usr/local/sbin/123.sh

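Example 2:

Empty the file /data/log/tmp.log every 3 days. The minute and hour fields are fixed at 0 so that the job runs once on those days rather than every minute.

0 0 */3 * * true > /data/log/tmp.log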

Example 3:

Run the script /usr/local/sbin/xxx.sh every hour on the hour from 9:00 to 12:00 and from 14:00 to 16:00 each day

0 9-12,14-16 * * * /bin/bash /usr/local/sbin/xxx.sh
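
Entries like these can also be added without opening an editor. The sketch below is one possible way to do it safely; the function name with_cron_entry is hypothetical. It reads crontab text on standard input and prints it back, appending the entry only when that exact line is not already present, so running it twice does not duplicate the job.

```shell
# Hypothetical helper: print the crontab text read from stdin,
# adding the entry ($1) only if that exact line is not already there.
with_cron_entry() {
    entry=$1
    current=$(cat)
    if printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        # -x matches whole lines, -F treats the entry as a fixed string
        printf '%s\n' "$current"
    elif [ -n "$current" ]; then
        printf '%s\n%s\n' "$current" "$entry"
    else
        printf '%s\n' "$entry"
    fi
}
```

On cron implementations that accept - as "read from standard input", it might be used as crontab -l 2>/dev/null | with_cron_entry '* * * * * /bin/bash /usr/local/sbin/mon_502.sh' | crontab -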



Knowledge Point two: filtering keywords

In shell scripts, keywords are filtered with the grep command. In this case we need to pick out the log lines that contain 502; the command is:

grep '502' /data/logs/access.log

1) Print the lines that begin with a letter:

grep '^[a-zA-Z]' 1.txt

2) Remove all blank lines:

grep -v '^$' 1.txt

3) Print the lines that contain at least two consecutive digits:

grep -E '[0-9][0-9]+' 1.txt
or
grep -E '[0-9]{2,}' 1.txt

4) Count the lines that contain aming or linux:

grep -Ec 'aming|linux' 1.txt
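
These patterns can be tried on a small throwaway file. The sketch below uses made-up sample lines and the -c option to show how many lines each pattern selects; the variable names are illustrative.

```shell
# Create a temporary sample file (contents are made up for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
hello world
123abc

aming likes linux
x9
linux 42
EOF

letters=$(grep -c '^[a-zA-Z]' "$sample")    # lines beginning with a letter
nonblank=$(grep -vc '^$' "$sample")         # lines that are not empty
digits=$(grep -Ec '[0-9]{2,}' "$sample")    # lines with two or more consecutive digits
names=$(grep -Ec 'aming|linux' "$sample")   # lines containing aming or linux

echo "letters=$letters nonblank=$nonblank digits=$digits names=$names"
rm -f "$sample"
```

With this sample the script prints letters=4 nonblank=5 digits=2 names=2. The line x9 is not counted under digits because it has only one digit.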



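Back to this case: the check runs once a minute, so the keyword to filter on is the previous minute. Judging from the log format, the previous minute can be written as:

LC_ALL=C date -d "-1 min" "+%d/%b/%Y:%H:%M:[0-5][0-9]"

%b prints the abbreviated month name (Sep), which is what the timestamps in the log use, and LC_ALL=C keeps that abbreviation in English whatever the system locale is. [0-5][0-9] matches any second from 00 to 59.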
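
To see what this keyword looks like for an arbitrary moment, the date call can be wrapped in a small function. This is a sketch that assumes GNU date (the -d option already used in this case); the function name minute_pattern is hypothetical.

```shell
# Hypothetical helper: build the grep pattern for the minute that
# contains the given epoch time. Assumes GNU date (-d "@SECONDS");
# LC_ALL=C forces English month abbreviations such as Sep.
minute_pattern() {
    LC_ALL=C date -d "@$1" '+%d/%b/%Y:%H:%M:[0-5][0-9]'
}

now=$(date +%s)
pattern=$(minute_pattern "$((now - 60))")
echo "pattern for the previous minute: $pattern"
```

Because the pattern is an ordinary regular expression, it can be passed straight to grep, for example grep -c "$pattern" /data/logs/access.log.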


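Knowledge Point three: comparisons in shell

Numbers are compared with operators such as -gt (greater than). If the values being compared are strings rather than numbers, you can write:

if [ "$str" == "aminglinux" ]

The condition is true when the value of the variable str is aminglinux. The spaces after [ and before ] are required.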
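
A short sketch of both kinds of comparison follows. The function names check_name and over_threshold are hypothetical; the threshold of 50 mirrors the requirement in this case.

```shell
# String comparison: quote the variable so an empty value does not break the test.
# = is the portable operator; bash also accepts == inside [ ].
check_name() {
    if [ "$1" = "aminglinux" ]; then
        echo "match"
    else
        echo "no match"
    fi
}

# Numeric comparison uses -gt, -ge, -lt, -le, -eq, -ne instead.
over_threshold() {
    [ "$1" -gt "$2" ]
}

check_name "aminglinux"
check_name ""
over_threshold 51 50 && echo "51 is over the threshold of 50"
```

The second call shows why the quotes matter: with an empty, unquoted variable the test would become [ = "aminglinux" ] and fail with a syntax error instead of simply being false.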


Knowledge Point four: e-mail from the command line

We send the alert with a Python script that logs in to a third-party mail service, here a 163.com mailbox. You can install the mail client on your phone so that alerts are not missed. The recipient can be yourself; sending the mail to your own address avoids trouble with spam filtering.

Mail Python script:

mail.py

#!/usr/bin/env python
#coding:utf-8
import smtplib
from email.mime.text import MIMEText
import sys
mail_host = 'smtp.163.com'
mail_user = '[email protected]'
mail_pass = 'your_mail_password'
mail_postfix = '163.com'
def send_mail(to_list, subject, content):
    # Append the domain only if mail_user is not already a full address
    sender = mail_user if '@' in mail_user else mail_user + "@" + mail_postfix
    me = "Zabbix alert platform" + " <" + sender + ">"
    msg = MIMEText(content, 'plain', 'utf-8')
    msg['subject'] = subject
    msg['from'] = me
    msg['to'] = to_list
    try:
        s = smtplib.SMTP()
        s.connect(mail_host)
        s.login(mail_user, mail_pass)
        s.sendmail(me, to_list, msg.as_string())
        s.close()
        return True
    except Exception as e:
        # Print the error so it shows up in the caller's error log
        print(str(e))
        return False
if __name__ == "__main__":
    # Arguments: recipient, subject, body
    send_mail(sys.argv[1], sys.argv[2], sys.argv[3])

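Notes:

The script uses a third-party mail account, so mail_host, mail_user, and mail_pass must be filled in correctly. The script is named mail.py, and the command to send mail is:

python mail.py [email protected] "mail subject" "mail body"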


Reference script for this case:

vim /usr/local/sbin/mon_502.sh

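#!/bin/bash
## This script monitors the website for 502 errors
## Author:
## Date:
## Version: v0.1
# [0-5][0-9] matches any second from 00 to 59, so $t matches every log entry from the previous minute.
# %b prints the month abbreviation used in the log (Sep); LC_ALL=C keeps it in English.
t=$(LC_ALL=C date -d "-1 min" "+%d/%b/%Y:%H:%M:[0-5][0-9]")
log="/data/logs/access.log"
# Assume mail.py has been written and placed in /usr/local/sbin/
mail_script="/usr/local/sbin/mail.py"
mail_user="[email protected]"
n=$(grep "$t" "$log" | grep -c "502")
if [ "$n" -gt 50 ]
then
    python "$mail_script" "$mail_user" "Site has 502 errors" "502 occurred $n times within one minute"
fi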
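
Matching the text 502 anywhere on the line can also count requests whose URL happens to contain 502. A possible stricter variant is sketched below. It assumes the log format shown in the requirements, where the status code is the 7th whitespace-separated field and the request URI contains no spaces. The function name count_502 and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: count log lines that match the time pattern ($2)
# and whose 7th whitespace-separated field (the status code in the
# sample log format) is exactly 502.
count_502() {
    grep -- "$2" "$1" | awk '$7 == 502 { n++ } END { print n + 0 }'
}

# Made-up sample log in the format shown in the requirements.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
192.0.2.1 - [16/Sep/2018:18:21:10 +0800] www.lishiming.net "index.php" 502 "GET /index.php HTTP/1.1" "-" "curl/7.29.0"
192.0.2.1 - [16/Sep/2018:18:21:15 +0800] www.lishiming.net "thread-502-1-1.html" 301 "GET /thread-502-1-1.html HTTP/1.1" "-" "curl/7.29.0"
192.0.2.1 - [16/Sep/2018:18:21:59 +0800] www.lishiming.net "index.php" 502 "GET /index.php HTTP/1.1" "-" "curl/7.29.0"
192.0.2.1 - [16/Sep/2018:18:22:01 +0800] www.lishiming.net "index.php" 502 "GET /index.php HTTP/1.1" "-" "curl/7.29.0"
EOF

demo_count=$(count_502 "$demo_log" '16/Sep/2018:18:21:[0-5][0-9]')
echo "502 responses in that minute: $demo_count"
rm -f "$demo_log"
```

Here the result is 2: the 301 response for thread-502-1-1.html is ignored, and the entry at 18:22:01 falls outside the minute being checked.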

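Add the cron job:

* * * * * /bin/bash /usr/local/sbin/mon_502.sh 2>/tmp/mon_502.err

Notes:

The cron entry ends with a redirect for error output. If the script reports errors while running, we can read the error messages in the file /tmp/mon_502.err.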




Origin blog.51cto.com/13576245/2421158