Case 8: A foolproof operations and maintenance tool

Much of operations and maintenance work is repetitive: the same handful of tasks come up again and again, and only special requirements call for special handling. So let's write a foolproof operations and maintenance script.

Requirements are as follows:

1) When the script runs, it first prints several system status values: system time, load, CPU usage, memory usage, disk usage, and network card traffic (averaged over 5 seconds, listing every network card).

2) It then prints a menu of commands; the user only has to enter the corresponding number to run one.

3) The menu provides the following functions:

a) View the last 100 lines of the website access log (assume the server hosts only one site and the access log path is /data/logs/www.log)

b) View the last 50 lines of the MySQL slow query log (log path /data/mysql/slow.log)

c) View the last 50 lines of the php-fpm slow execution log (log path /usr/local/php/logs/slow.log)

d) Restart the php-fpm service

e) Restart the Nginx service

f) View the MySQL queue

g) Exit script

4) The script keeps running until the user presses Ctrl+C or enters q/Q to exit.


Knowledge point one: conditional logic with case

The earlier cases often used if for conditional logic. When there are many conditions to test, case is recommended instead. A usage example:

case $name in
    zhangsan)
        echo "Hello,zhangsan."
        ;;
    lisi)
        echo "Hello,lisi."
        ;;
    wangwu)
        echo "Hello,wangwu."
        ;;
    *)
        echo "Sorry,there is no such person."
        ;;
esac

Note: the statement begins with case and ends with esac. The values handled for $name are zhangsan, lisi, and wangwu; * matches any other value.
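A single case branch can also list several patterns separated by |, and patterns may contain shell wildcards. The following is a minimal sketch of both features; the function name greet is made up for illustration and is not part of the original script.

```shell
#!/bin/bash
# greet is a hypothetical helper used only to illustrate case patterns.
greet() {
    case "$1" in
        zhangsan|Zhangsan)      # several patterns can share one branch
            echo "Hello,zhangsan."
            ;;
        li*)                    # wildcard pattern: matches lisi, lily, ...
            echo "Hello,$1."
            ;;
        *)                      # default branch for everything else
            echo "Sorry,there is no such person."
            ;;
    esac
}

greet Zhangsan
greet lisi
greet nobody
```

Branches are tested from top to bottom and only the first matching one runs, so the catch-all * should come last.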


Knowledge point two: using select

The menu in this case's script is convenient to implement with select. First, a simple example:

select name in zhangsan lisi wangwu
do
    case $name in
        zhangsan)
            echo "Hello,zhangsan."
            ;;
        lisi)
            echo "Hello,lisi."
            ;;
        wangwu)
            echo "Hello,wangwu."
            ;;
        *)
            echo "Sorry,there is no such person."
            ;;
    esac
done

Used together with case, select automatically prints a numbered list of the options and then keeps prompting for a choice. Unless the script ends the loop explicitly, it loops forever. Running the script looks like this:

# sh select.sh
1) zhangsan
2) lisi
3) wangwu
#? 1
Hello,zhangsan.
#? 2
Hello,lisi.
#? 3
Hello,wangwu.
#? 4
Sorry,there is no such person.
#? 
1) zhangsan
2) lisi
3) wangwu
#?

Note: the #? prompt can be changed by defining PS3. Modify the script above as follows:

PS3="Please select a number:"
select name in zhangsan lisi wangwu
do
    case $name in
        zhangsan)
            echo "Hello,zhangsan."
            ;;
        lisi)
            echo "Hello,lisi."
            ;;
        wangwu)
            echo "Hello,wangwu."
            ;;
        *)
            echo "Sorry,there is no such person."
            ;;
    esac
done

Running the script now gives the following result:

# sh select.sh 
1) zhangsan
2) lisi
3) wangwu
Please select a number:1
Hello,zhangsan.
Please select a number:2
Hello,lisi.


Knowledge point three: the w command

The w command shows the current load on the system:

# w
 18:59:07 up 21 min,  2 users,  load average: 0.00, 0.04, 0.11
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1                      18:38   20:27   0.39s  0.39s -bash
root     pts/0    192.168.93.1     18:39    3.00s  0.10s  0.01s w

w is probably the command Linux administrators use most often. It displays the following information.

The first line shows, from left to right: the current time, the system uptime, the number of logged-in users, and the load averages.

The second line onward shows which users are currently logged in, where they logged in from, and so on.

The part to pay attention to is the three load average values at the end of the first line.

The first value is the system's average load over the last 1 minute;

The second value is the system's average load over the last 5 minutes;

The third value is the system's average load over the last 15 minutes.

The value is the average number of processes that were active on the CPU (running or waiting to run) per unit of time; the larger the value, the greater the pressure on the server. As a rule of thumb, there is no cause for concern as long as the value does not exceed the number of CPUs in the server.
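The rule of thumb above (load average compared with the number of CPUs) can be sketched as a small check. The helper name load_status and its "ok"/"high" output are assumptions chosen for illustration; awk does the comparison because plain shell arithmetic cannot handle decimals.

```shell
#!/bin/bash
# load_status LOAD NCPU: hypothetical helper that prints "high" when the
# load average is greater than the number of CPUs, and "ok" otherwise.
load_status() {
    awk -v load="$1" -v ncpu="$2" \
        'BEGIN { if (load + 0 > ncpu + 0) print "high"; else print "ok" }'
}

# Values taken from the w output above: 1-minute load 0.00 on a 2-CPU machine.
load_status 0.00 2
# A made-up overloaded case for comparison.
load_status 3.50 2
```

On a live Linux system the first argument could come from the first field of /proc/loadavg and the second from counting the processor entries in /proc/cpuinfo.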

How do you check how many CPUs the server has?

# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
stepping	: 9
microcode	: 0x8e
cpu MHz		: 2501.000
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec arat
bogomips	: 5002.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 43 bits physical, 48 bits virtual
power management:

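The /proc/cpuinfo file records detailed information about the CPUs. If a server has two 4-core CPUs, Linux sees 8 CPUs, and the file contains 8 similar blocks of information. So, to find out how many CPUs the system has, use the command grep -c 'processor' /proc/cpuinfo. To find out how many physical CPUs there are, look at the physical id field.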
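The two counts can be sketched as small helpers. The names count_logical and count_physical are made up for illustration, and the sample file is a trimmed, invented excerpt rather than real output. The pattern is anchored with ^ so that only lines starting with the field name are counted.

```shell
#!/bin/bash
# count_logical and count_physical are hypothetical helper names.
# Both take the path of a cpuinfo-style file as their only argument.

count_logical() {
    # One "processor" line per logical CPU.
    grep -c '^processor' "$1"
}

count_physical() {
    # Distinct "physical id" values correspond to physical CPU packages.
    grep '^physical id' "$1" | sort -u | wc -l | tr -d ' '
}

# Invented sample: 4 logical CPUs spread over 2 physical packages.
sample=$(mktemp)
cat > "$sample" <<'EOF'
processor   : 0
physical id : 0
processor   : 1
physical id : 0
processor   : 2
physical id : 1
processor   : 3
physical id : 1
EOF

echo "logical CPUs:  $(count_logical "$sample")"
echo "physical CPUs: $(count_physical "$sample")"
rm -f "$sample"
```

On a real server, pass /proc/cpuinfo instead of the sample file.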


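Knowledge point four: the sar command

The sar command is very powerful: it can monitor the state of nearly all system resources, such as load average, network card traffic, disk status, and memory usage.

What sets it apart from other system monitoring tools is that it can print historical data, showing the system state from midnight up to the current time.

Its data files are stored in the /var/log/sa directory and are kept for one month by default. Because the command is quite complex, only a few uses are covered here:

1) View network card traffic: sar -n DEV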

# sar -n DEV 1 2
Linux 3.10.0-693.el7.x86_64 (wbs) 	07/22/2019 	_x86_64_	(2 CPU)

19:33:29        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
19:33:30        ens37      0.00      0.00      0.00      0.00      0.00      0.00      0.00
19:33:30           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
19:33:30        ens33      0.00      0.00      0.00      0.00      0.00      0.00      0.00

19:33:30        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
19:33:31        ens37      0.00      0.00      0.00      0.00      0.00      0.00      0.00
19:33:31           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
19:33:31        ens33      1.00      1.00      0.06      0.49      0.00      0.00      0.00

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:        ens37      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        ens33      0.50      0.50      0.03      0.25      0.00      0.00      0.00

Note: the trailing 1 2 means sample once every 1 second, 2 times in total.
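The summary rows of sar -n DEV can be turned into per-interface rates with a short awk filter. This is a sketch under two assumptions: sar runs under an English or C locale, so the summary rows begin with Average:, and the column order matches the header shown above (rxkB/s in column 5, txkB/s in column 6). The helper name net_avg_mbps is made up for illustration.

```shell
#!/bin/bash
# net_avg_mbps is a hypothetical helper: it reads `sar -n DEV` output on stdin
# and prints the average receive/transmit rate of every interface in Mb/s.
net_avg_mbps() {
    awk '$1 == "Average:" && $2 != "IFACE" {
        # $5 is rxkB/s and $6 is txkB/s; kB/s * 8 / 1000 gives roughly Mb/s
        printf "%s rx=%.5f tx=%.5f\n", $2, $5 * 8 / 1000, $6 * 8 / 1000
    }'
}

# Summary rows in the English-locale layout, fed in as sample input.
net_avg_mbps <<'EOF'
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:        ens37      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:        ens33      0.50      0.50      0.03      0.25      0.00      0.00      0.00
EOF
```

With live data the filter would sit at the end of a pipeline such as LANG=C sar -n DEV 1 5 | net_avg_mbps.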

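You can also view the network card traffic history for a particular day by using the -f option followed by a file name. On Red Hat or CentOS systems, sar's data files are located in the /var/log/sa directory.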

# sar -n DEV -f /var/log/sa/sa21
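The data file name ends in the two-digit day of the month, so the path for a given day can be built with date. A minimal sketch, assuming GNU date (for the -d option) and the /var/log/sa location mentioned above; the helper name sa_file_for is made up for illustration.

```shell
#!/bin/bash
# sa_file_for DATE: hypothetical helper that prints the sar data file path
# for DATE, where DATE is any string GNU date accepts ("yesterday", "2019-07-21").
sa_file_for() {
    echo "/var/log/sa/sa$(date -d "$1" +%d)"
}

sa_file_for "2019-07-21"
sa_file_for yesterday
```

The result can be passed straight to sar, for example sar -n DEV -f "$(sa_file_for yesterday)".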

2) View historical load

# sar -q
Linux 3.10.0-693.el7.x86_64 (wbs) 	07/22/2019 	_x86_64_	(2 CPU)

18:38:04          LINUX RESTART

18:40:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
18:50:01            4       208      0.00      0.12      0.16         0
19:00:01            6       207      0.00      0.03      0.11         0
19:10:01            3       208      0.07      0.07      0.07         0
19:20:02            5       207      0.00      0.03      0.05         0
19:30:01            1       207      0.00      0.01      0.05         0
Average:            4       207      0.01      0.05      0.09         0

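This command helps us check what the server's load was at some point in the past.

runq-sz: number of processes in the run state (run queue length);

plist-sz: number of processes;

ldavg-1: average load over the last 1 minute;

ldavg-5: average load over the last 5 minutes;

ldavg-15: average load over the last 15 minutes;

blocked: number of blocked processes.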
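To pick out the busy periods from a long sar -q listing, the rows can be filtered on ldavg-1. The sketch below assumes the column order shown in the header above and a timestamp in the first column; it locates ldavg-1 by counting from the end of the row, so an extra AM/PM column would not shift it. The helper name busy_samples is made up for illustration.

```shell
#!/bin/bash
# busy_samples THRESHOLD: hypothetical helper; reads `sar -q` output on stdin and
# prints the timestamp and ldavg-1 of every sample whose 1-minute load
# exceeds THRESHOLD.
busy_samples() {
    awk -v limit="$1" '
        # blocked is the last column, so ldavg-1 is the fourth column from the end.
        NF >= 7 && $1 != "Average:" && $(NF-3) ~ /^[0-9.]+$/ && $(NF-3) + 0 > limit + 0 {
            print $1, $(NF-3)
        }'
}

# Sample rows in the English-locale layout, fed in as input.
busy_samples 0.05 <<'EOF'
18:40:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
18:50:01            4       208      0.00      0.12      0.16         0
19:00:01            6       207      0.00      0.03      0.11         0
19:10:01            3       208      0.07      0.07      0.07         0
Average:            4       207      0.01      0.05      0.09         0
EOF
```

Header rows and the LINUX RESTART marker are skipped because their fourth-from-last field is not a number.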


3) Without options, sar shows CPU usage

# sar
Linux 3.10.0-693.el7.x86_64 (wbs) 	07/22/2019 	_x86_64_	(2 CPU)

18:38:04          LINUX RESTART

18:40:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
18:50:01        all      0.10      0.00      0.48      0.01      0.00     99.41
19:00:01        all      0.08      0.00      0.24      0.01      0.00     99.68
19:10:01        all      0.09      0.01      0.31      0.06      0.00     99.53
19:20:02        all      0.10      0.00      0.25      0.00      0.00     99.65
19:30:01        all      0.14      0.00      0.32      0.03      0.00     99.51
19:40:01        all      0.13      0.00      0.30      0.01      0.00     99.56
Average:        all      0.11      0.00      0.32      0.02      0.00     99.56
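Since %idle is the last column, overall CPU usage can be estimated as 100 minus the idle value of the summary row. A minimal sketch, assuming sar runs under an English or C locale so that the summary row begins with Average:; the helper name cpu_use_from_sar is made up for illustration.

```shell
#!/bin/bash
# cpu_use_from_sar: hypothetical helper; reads plain `sar` output on stdin
# and prints 100 - %idle for the summary row, with two decimals.
cpu_use_from_sar() {
    awk '$1 == "Average:" { printf "%.2f\n", 100 - $NF }'
}

# Header and summary row in the English-locale layout, fed in as sample input.
cpu_use_from_sar <<'EOF'
18:40:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.11      0.00      0.32      0.02      0.00     99.56
EOF
```

For a fresh 5-second measurement, the input could come from LANG=C sar 1 5.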


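Knowledge point five: the MySQL slow query log

The MySQL slow query log records statements whose response time exceeds a threshold: any SQL statement that runs longer than the value of long_query_time is written to the slow query log. By default, MySQL does not enable the slow query log, and long_query_time defaults to 10 (that is, 10 seconds; setting it to 1 second is recommended), so only statements that run for more than 10 seconds count as slow queries.


Analyzing the slow query log lets us find the SQL statements that are slow, so that we can then optimize them. In most cases, the cause of a slow query is that the table has no index for it.

The following my.cnf parameters configure the slow query log on MySQL 5.7: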

slow_query_log = on
long_query_time = 1
slow_query_log_file = /data/mysql/slow.log
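Rather than hard-coding the log path in a script, the value could be read back from the configuration file. The sketch below is a deliberately simple parser for "key = value" lines; it ignores section headers and comments, and the helper name cnf_value is made up for illustration.

```shell
#!/bin/bash
# cnf_value FILE KEY: hypothetical helper that prints the value assigned to KEY
# in a simple "key = value" configuration file.
cnf_value() {
    awk -v key="$2" '
        index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
            if (k == key) print v
        }' "$1"
}

# The three settings shown above, written to a temporary file for the demo.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
slow_query_log = on
long_query_time = 1
slow_query_log_file = /data/mysql/slow.log
EOF

cnf_value "$cnf" slow_query_log_file
cnf_value "$cnf" long_query_time
rm -f "$cnf"
```

The key is compared as a whole, so asking for slow_query_log does not also match slow_query_log_file.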


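Knowledge point six: the php-fpm slow execution log

For a website built on the LNMP stack, the first thing to check when pages respond sluggishly is the php-fpm slow execution log. Like the MySQL slow query log, it records PHP code that is slow to execute, which makes it a valuable tool for tracking down website performance problems. Enabling it takes just two lines in php-fpm.conf: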

request_slowlog_timeout = 1
slowlog = /usr/local/php/logs/slow.log


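Knowledge point seven: displaying colors in shell scripts

To make the output of a shell script easier to tell apart while it runs, we can deliberately add color to it. First, an example:

# echo -e "\033[31m red text \033[0m"
 red text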

This prints the text in red. The most commonly used colors are summarized below:
echo -e "\033[30m black text \033[0m"
echo -e "\033[31m red text \033[0m"
echo -e "\033[32m green text \033[0m"
echo -e "\033[33m yellow text \033[0m"
echo -e "\033[34m blue text \033[0m"
echo -e "\033[35m purple text \033[0m"
echo -e "\033[36m cyan text \033[0m"
echo -e "\033[37m white text \033[0m"
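Repeating the escape sequences on every line gets noisy, so they can be wrapped in a small function. A minimal sketch; the function name color_echo is made up for illustration, and printf is used instead of echo -e because its handling of \033 is consistent across shells.

```shell
#!/bin/bash
# color_echo CODE TEXT...: hypothetical helper that prints TEXT in the color
# given by CODE (30 black, 31 red, 32 green, 33 yellow, 34 blue, 35 purple,
# 36 cyan, 37 white) and then resets the color.
color_echo() {
    code=$1
    shift
    printf '\033[%sm%s\033[0m\n' "$code" "$*"
}

color_echo 31 "red text"
color_echo 32 "green text"
```

The trailing \033[0m matters: without it, everything printed afterwards stays in the chosen color.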


Reference script for this case

#!/bin/bash
# A foolproof operations script: enter the number shown in the menu to run the matching function
# Author:
# Date:
# Version: v0.1

# Use an English locale so that sar prints "Average:" summary rows
LANG=en
sar 1 5 > /tmp/cpu.log &
sar -n DEV 1 5 |grep '^Average:' > /tmp/net.log &
echo -n "Collecting data"
for i in `seq 1 5`
do
  echo -n "."
  sleep 1
done
echo
# Make sure both background sar jobs have finished writing their logs
wait

t=`date +"%F %T"`
# Integer part of the 1-minute load average
load=`uptime |awk -F 'load averages?:' '{print $2}'|cut -d '.' -f1`
cpu_idle=`tail -1 /tmp/cpu.log|awk '{print $NF}'`
cpu_use=`echo "scale=2;100-$cpu_idle"|bc`
mem_tot=`free -m |grep '^Mem:'|awk '{print $2}'`
mem_ava=`free -m |grep '^Mem:'|awk '{print $NF}'`
mysql_p="dR6wB1jzp"
echo -e "\033[32mCurrent time: $t \033[0m"
echo "######"
echo -e "\033[31mCurrent load: $load \033[0m"
echo "######"
echo -e "\033[33mCPU usage: $cpu_use% \033[0m"
echo "######"
echo -e "\033[34mTotal memory: ${mem_tot}MB, available memory: ${mem_ava}MB \033[0m"
echo "######"
echo -e "\033[35mDisk space usage: \033[0m"
df -h
echo "######"
echo -e "\033[36mDisk inode usage: \033[0m"
df -i
echo "######"
# Drop the header row, then convert rxkB/s and txkB/s to Mb/s
sed '1d' /tmp/net.log |awk '{print "NIC "$2": inbound "$5/1000*8" Mbps, outbound "$6/1000*8" Mbps"}'
echo "######"

get_acc_log()
{
    tail -100 /data/logs/www.log
}

get_mysql_slow_log()
{
    tail -50 /data/mysql/slow.log
}

get_php_slow_log()
{
    tail -50 /usr/local/php/logs/slow.log
}

restart_php()
{
    /etc/init.d/php-fpm restart
}

restart_nginx()
{
    /etc/init.d/nginx restart
}

get_mysql_process()
{
    mysql -uroot -p$mysql_p -e "show processlist"
}


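PS3="Please choose an operation: "

select c in "View access log" "View MySQL slow query log" "View php-fpm slow execution log" "Restart php-fpm service" "Restart nginx service" "View MySQL queue" "Exit script"
do
    case $c in
        "View access log")
            get_acc_log
            ;;
        "View MySQL slow query log")
            get_mysql_slow_log
            ;;
        "View php-fpm slow execution log")
            get_php_slow_log
            ;;
        "Restart php-fpm service")
            restart_php
            ;;
        "Restart nginx service")
            restart_nginx
            ;;
        "View MySQL queue")
            get_mysql_process
            ;;
        "Exit script")
            exit 0
            ;;
    esac
done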
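Requirement 4 also asks for q/Q to exit, which the menu above does not handle. In bash, select stores the raw text the user typed in the REPLY variable, even when it does not match a menu number, so a catch-all branch can inspect it. The following is a sketch of one way to do that; the helper name is_quit_key is made up for illustration.

```shell
#!/bin/bash
# is_quit_key INPUT: hypothetical helper that succeeds (exit status 0)
# when INPUT is q or Q, and fails otherwise.
is_quit_key() {
    case "$1" in
        q|Q) return 0 ;;
        *)   return 1 ;;
    esac
}

# Inside the select loop, a catch-all branch could use it like this:
#     *)
#         is_quit_key "$REPLY" && exit 0
#         ;;

# Quick demonstration with a few sample inputs.
for key in q Q 3
do
    if is_quit_key "$key"; then
        echo "$key: quit"
    else
        echo "$key: not a quit key"
    fi
done
```

Ctrl+C needs no extra code, since the default handling of the interrupt signal already terminates the script.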


Origin blog.51cto.com/13576245/2423620