I Wrote a Script to Monitor ElasticSearch Process Anomalies!

Author: JackTian
Source: Public Account "Jake's IT Journey"
ID: Jake_Internet

Preparing the servers for passwordless (key-based) SSH login:

Before configuring passwordless login, map each target host name to its IP address in the server's hosts file.

vim /etc/hosts
IP1 hostname1
IP2 hostname2
......
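
Before moving on, it can help to confirm that each host name really maps to an IP in the hosts file. The following is a minimal sketch, not part of the author's scripts; the function name lookup_ip is hypothetical.

```shell
#!/bin/bash
# lookup_ip HOSTS_FILE HOSTNAME
# Hypothetical helper: print the first IP mapped to HOSTNAME in HOSTS_FILE,
# skipping commented-out lines. Prints nothing if the name is not found.
lookup_ip() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }   # ignore comment lines
        {
            # field 1 is the IP, the remaining fields are host names and aliases
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; exit }
            }
        }
    ' "$1"
}
```

For example, `lookup_ip /etc/hosts hostname2` should print the address entered for hostname2; empty output suggests the entry is missing or commented out.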

Change to the /usr/local/jiaoben directory and unzip the mianmiyaojiaoben.zip package there:

cd /usr/local/jiaoben
unzip mianmiyaojiaoben.zip

Edit the mianmiyao_config configuration file and add the target host names and their passwords. The passwordless-login script reads these values.

vim mianmiyao_config

AllHosts=hostname1,hostname2
Passwd='test23!\@Test^&*','test23!\@Test^&*'

Notes on the configuration file:

  • AllHosts: the host names of the target hosts that the current host should reach without a password. The current host itself may be included, and there is no limit on the number of hosts. Separate multiple hosts with commas.

  • Passwd: the password for each host, listed in the same order as the hosts in AllHosts.

  • Special characters in a password can be escaped with a backslash (\). For example, the original password test23!@Test^&* is written as test23!\@Test^&* above.
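
Because hosts and passwords are matched purely by position, a quick count check can catch a mismatched list before the script runs. This is a rough sketch under the assumption that no password contains a comma; check_host_password_pairs is a hypothetical name and is not part of mianmiyao.sh.

```shell
#!/bin/bash
# check_host_password_pairs HOSTS_CSV PASSWD_CSV
# Hypothetical helper: succeed only if both comma-separated lists
# have the same number of entries.
check_host_password_pairs() {
    local hosts_csv=$1 passwd_csv=$2
    local -a hosts passwds
    # read -r keeps backslashes intact, and splitting via IFS avoids glob
    # expansion of characters such as '*' inside a password
    IFS=',' read -r -a hosts <<< "$hosts_csv"
    IFS=',' read -r -a passwds <<< "$passwd_csv"
    if [ "${#hosts[@]}" -ne "${#passwds[@]}" ]; then
        echo "mismatch: ${#hosts[@]} hosts, ${#passwds[@]} passwords"
        return 1
    fi
    echo "ok: ${#hosts[@]} host/password pairs"
}
```

After `source mianmiyao_config`, it could be called as `check_host_password_pairs "$AllHosts" "$Passwd"`.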

Contents of mianmiyao.sh script file:

vim mianmiyao.sh

#!/bin/bash -x
source mianmiyao_config
yum -y install expect expect-devel
#rm -rf /root/.ssh/*
/usr/bin/expect -d <<-EOF
set timeout 100
spawn ssh-keygen -t rsa
expect {
"*id_rsa):" { send "\r"; exp_continue }
"*(y/n)?" { send "y\r"; exp_continue }
"*passphrase)*" { send "\r"; exp_continue }
"*again:" { send "\r"; exp_continue }
"*-------+" { send "\r"}
}
expect eof
EOF

# Split the comma-separated AllHosts and Passwd values from mianmiyao_config into arrays
hostsarr=(${AllHosts//,/ })
passwdarr=(${Passwd//,/ })
num=${#hostsarr[@]}  
for((i=0;i<num;i++));  
do  
    /usr/bin/expect <<-EOF
    set timeout 100
    spawn ssh-copy-id ${hostsarr[i]}
    expect {
    "*(yes/no)?" { send "yes\r"; exp_continue }
    "*password:" { send "${passwdarr[i]}\r"; exp_continue }
    "*authorized_keys*" { send "\r"}
    }
    expect eof
    exit
EOF
done 
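
As an aside, the key-generation step does not strictly need expect: ssh-keygen accepts the key file (-f) and an empty passphrase (-N '') on the command line. The sketch below is one possible alternative, not the author's method; ensure_ssh_key is a hypothetical name. Unlike the expect block above, which answers "y" to the overwrite prompt, it leaves an existing key untouched.

```shell
#!/bin/bash
# ensure_ssh_key [KEY_FILE]
# Hypothetical helper: create an RSA key pair without prompts,
# but only if KEY_FILE does not exist yet.
ensure_ssh_key() {
    local key_file=${1:-$HOME/.ssh/id_rsa}
    if [ -f "$key_file" ]; then
        echo "key already exists: $key_file"
        return 0
    fi
    mkdir -p "$(dirname "$key_file")"
    # -q: quiet, -N '': empty passphrase, -f: output file
    ssh-keygen -q -t rsa -N '' -f "$key_file" && echo "key created: $key_file"
}
```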

Make mianmiyao.sh executable and run it:

chmod +x mianmiyao.sh
./mianmiyao.sh

After the script finishes, verify the result manually with the following command. If you are logged in to the target server without being asked for a password, the setup succeeded.

ssh hostname2
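
With more than a couple of hosts, the same check can be run non-interactively. The sketch below assumes OpenSSH; BatchMode=yes makes ssh fail instead of prompting for a password, so a host where key-based login is not working shows up as FAILED. The function name check_passwordless is hypothetical.

```shell
#!/bin/bash
# check_passwordless HOST...
# Hypothetical helper: report for each host whether key-based login works.
check_passwordless() {
    local host
    for host in "$@"; do
        # "true" is a no-op remote command; only the exit status of ssh matters
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

For example: `check_passwordless hostname1 hostname2`.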

Preparing the environment for ElasticSearch monitoring:

Add the corresponding ES cluster host name, ES port, and ES master node server host name to the cpufreedisk_config configuration file.

vim cpufreedisk_config

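# Host names of all ES cluster nodes, separated by commas; the script must run on a machine with passwordless SSH configured
esHosts=hostname1,hostname2

# ES port
esport=9200

# Host name of the ES master node server
esMaster=hostname1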

Put the cpufreedisk.sh script file into the /usr/local/jiaoben/ directory of the ElasticSearch server

#!/bin/bash
# @Time    : 2023/02/01
# @Author  : JackTian
# @File    : cpufreedisk.sh
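# @Desc    : Monitors ES for hung, crashed, or otherwise abnormal processes, and for network outages or server downtime; once a server is reachable again, decides how to recover and collects the server's CPU, memory, and disk information.
# Prerequisite: passwordless SSH is configured for the ES cluster servers
# Usage: place cpufreedisk.sh in /usr/local/jiaoben/ on the ES server, and set the ES cluster host names, the port, and the ES master node host name in cpufreedisk_config
# Set up a scheduled task (you can run the script manually first)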
# 0 6 * * * source /etc/profile && cd /usr/local/jiaoben && ./cpufreedisk.sh
source /usr/local/jiaoben/cpufreedisk_config

function esStatus
{
curl --connect-timeout 30 -m 60 $1:$esport > resultEsCurl.log
echo "`cat resultEsCurl.log | grep cluster_name`"
}

function esLost
{
# Look up the node's IP in /etc/hosts, then check whether the master lists it in _cat/nodes
iptemp=`cat /etc/hosts | grep -w $1 | grep '^[^#]' | awk '{print $1}'`
curl --connect-timeout 30 -m 60 "$esMaster:$esport/_cat/nodes?v" | grep $iptemp > resultEsCurl1.log
echo "`cat resultEsCurl1.log`"
}

function esDie
{
ssh $1 "source /etc/profile && jps | grep Elasticsearch | awk '{print \$1}' | xargs"
}

function restart
{
ssh $1 <<EOF
echo "Please start the ES process manually"
exit
EOF
}


today=$(date +"%Y-%m-%d")
todaytime=`date`
# Detect ES hangs, outages, and crashes; log and handle them
serverroothostname=(${esHosts//,/ })
for rootHost in ${serverroothostname[*]}
do
    esStatusResult=`esStatus $rootHost`
    echo "Status of $rootHost: $esStatusResult"
    if [ -n "$esStatusResult" ];then
        esLostResult=`esLost $rootHost`
        echo "Status of $rootHost: $esLostResult"
        if [ -n "$esLostResult" ];then
            echo "ES is running normally."
        else
            echo "$rootHost has left the cluster."
            echo "${todaytime} ES node ${rootHost} has left the cluster. Please investigate manually." >> /usr/local/jiaoben/ESmanager.log
            restart $rootHost
        fi
    else
        echo "${todaytime} xxx system: the ES process on $rootHost is in an abnormal state, restarting..." >> /usr/local/jiaoben/ESmanager.log
        echo "${todaytime} xxx system: restarting $rootHost" >> /usr/local/jiaoben/ESmanager.log

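        # Collect disk, memory, and CPU information from the node and append it to the log
        ssh $rootHost <<EOF >>/usr/local/jiaoben/ESmanager.log
        mkdir -p /usr/local/jiaoben/
        cd /usr/local/jiaoben/
        echo "--------------------------------------server separator-------------------------------------------"
        echo "${rootHost} disk information"
        df -h
        echo "${rootHost} memory information (human-readable units)"
        free -h
        echo "${rootHost} CPU information"
        vmstat
        exit
EOF
        # $? is the exit status of the ssh command above
        if [ $? -eq 0 ];then
            esDieResult=`esDie $rootHost`
            if [ -n "$esDieResult" ];then
                echo "${todaytime} xxx system: ES appears to be hung; a restart was triggered as a temporary fix, see the log for details" >> /usr/local/jiaoben/ESmanager.log
            else
                echo "${todaytime} xxx system: ES is not running; a restart was triggered as a temporary fix, see the log for details" >> /usr/local/jiaoben/ESmanager.log
            fi
        else
            echo "${todaytime} xxx system: the ES server appears to be down: cannot log in via ssh" >> /usr/local/jiaoben/ESmanager.log
        fi
        restart $rootHost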
    fi

done
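
The branching in the loop above amounts to a small decision table: an HTTP response plus cluster membership means healthy; an HTTP response without membership means the node has left the cluster; no HTTP response leads to an ssh check, and then to a check for a running process. The sketch below restates that logic as a pure function so it can be read and tested in isolation. The function name and the state labels are hypothetical and do not appear in cpufreedisk.sh.

```shell
#!/bin/bash
# classify_es_state HTTP_OK IN_CLUSTER SSH_OK PID
#   HTTP_OK    "yes" if the node answered the HTTP request on the ES port
#   IN_CLUSTER "yes" if the master lists the node in _cat/nodes
#   SSH_OK     "yes" if ssh to the node succeeded
#   PID        PID of the Elasticsearch process, empty if none was found
classify_es_state() {
    local http_ok=$1 in_cluster=$2 ssh_ok=$3 pid=$4
    if [ "$http_ok" = yes ]; then
        if [ "$in_cluster" = yes ]; then
            echo "healthy"
        else
            echo "left-cluster"
        fi
    elif [ "$ssh_ok" != yes ]; then
        echo "host-unreachable"      # server down or network outage
    elif [ -n "$pid" ]; then
        echo "hung"                  # process exists but does not respond
    else
        echo "not-started"           # no Elasticsearch process found
    fi
}
```

For example, `classify_es_state no no yes 4242` prints `hung`, which corresponds to the branch where esDie returns a PID.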

Make the cpufreedisk.sh script executable and run it:

chmod +x cpufreedisk.sh
./cpufreedisk.sh

Set up a scheduled task so that the script runs automatically every day (at 06:00 in this example):

crontab -e
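# Monitors ES for hung, crashed, or otherwise abnormal processes, and for network outages or server downtime; once a server is reachable again, decides how to recover and collects the server's CPU, memory, and disk information.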
0 6 * * * source /etc/profile && cd /usr/local/jiaoben && ./cpufreedisk.sh

The above is all the content to be shared today.

Origin blog.csdn.net/jake_tian/article/details/128845318