Shell script cooperates with zabbix to realize fault self-healing of tomcat


1. Background and implementation methods

Java applications running in Tomcat often fail with memory overflow (out-of-memory) errors. Such failures are usually handled by hand after an alarm arrives, which delays recovery. A fault self-healing mechanism removes the need for human intervention and the cost that comes with it.

There are several ways to make a service self-healing:

  • Shell script + scheduled task
    • A shell script checks the application's state: 1 means abnormal, 0 means normal. When the state is 1, the self-healing script is triggered to recover the program.
    • Detection by script has to be driven by a scheduled task, which has drawbacks: the check must run periodically (for example every 5 minutes), which adds some load to the server.
  • Blue Whale PaaS fault self-healing platform
    • The Blue Whale automated operations platform includes a fault self-healing module that can easily consume Zabbix alarm information and then carry out self-healing.
    • Deploying Blue Whale only for self-healing is overkill: the platform is complex to build and requires many servers, so this approach is not recommended.
    • An article on Java program fault self-healing with Blue Whale: https://jiangxl.blog.csdn.net/article/details/118731222
  • Shell script + Zabbix trigger action
    • The recommended approach.
    • Add a service-status monitoring item and a trigger in Zabbix, then configure a Zabbix action. When the trigger raises an abnormal-status alarm, Zabbix executes the self-healing script on the remote server to recover the program.
    • This approach is not perfect. For example, if a server runs 10 Tomcat instances on different ports, the port of the faulty service cannot be obtained from the Zabbix trigger alarm, so each Tomcat instance needs its own Zabbix action.
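For the first approach, a minimal sketch of the 0/1 status check might look like the following. It assumes, as the script in section 2 does, that the application answers /check with "true" when it is healthy; the function names, script paths and cron schedule are hypothetical.

```shell
#!/bin/bash
# Sketch of the "shell script + scheduled task" approach.
# Assumption: the application returns "true" from /check when healthy.
# tomcat_status and check_and_heal are hypothetical helper names.

# Print 0 (normal) or 1 (abnormal) for the Tomcat instance at host:port
tomcat_status() {
    local host=$1 port=$2 resp
    # --max-time keeps a hung Tomcat from blocking the check indefinitely
    resp=$(curl -s --max-time 5 "http://${host}:${port}/check" 2>/dev/null)
    if [[ "$resp" == "true" ]]; then
        echo 0
    else
        echo 1
    fi
}

# Run the given self-healing command only when the status is 1 (abnormal)
check_and_heal() {
    local host=$1 port=$2 heal_cmd=$3
    if [[ "$(tomcat_status "$host" "$port")" == "1" ]]; then
        "$heal_cmd"
    fi
}

# Example crontab entry (hypothetical path) running the check every 5 minutes:
#   */5 * * * * /usr/local/bin/check_tomcat.sh
# where check_tomcat.sh defines these functions and calls:
#   check_and_heal 192.168.10.100 7180 /usr/local/bin/selfheal-java-7180.sh
```

The same 0/1 value could also be exposed to Zabbix as a monitoring item (for example through an agent UserParameter), which is what the Zabbix-based approach builds on.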

General steps for implementing fault self-healing with shell + Zabbix:

1. Add service status monitoring and triggers in zabbix.

2. Write the fault self-healing recovery scripts. Services on different ports each need a separate script.

3. Configure a Zabbix action for each Tomcat instance that needs self-healing.
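Because the script in section 2 reads the port from server.xml, its only per-instance setting is the java_node line. One way to produce the separate scripts required by step 2 is to stamp them out of a single template. A sketch, where the function name and file layout are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: generate one self-healing script per Tomcat instance
# from a template whose only per-instance line is "java_node=...".
# Usage: gen_selfheal_scripts TEMPLATE OUT_DIR NODE...

gen_selfheal_scripts() {
    local template=$1 out_dir=$2 node
    shift 2
    mkdir -p "$out_dir"
    for node in "$@"; do
        # Replace the java_node assignment with this instance's directory name
        sed "s/^java_node=.*/java_node=${node}/" "$template" >"${out_dir}/selfheal-${node}.sh"
        chmod +x "${out_dir}/selfheal-${node}.sh"
    done
}
```

For example, gen_selfheal_scripts selfheal-template.sh /data/scripts java-7180 java-7181 would create two scripts, each of which then gets its own Zabbix action (step 3).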

2. Write a fault self-healing script

#!/bin/bash
# Self-healing script for one Tomcat instance; each instance needs its own copy
java_node=java-7180
java_dir=/data/tomcat/${java_node}
# Port of the first HTTP/1.1 connector in server.xml
java_port=$(grep 'protocol="HTTP/1.1"' "${java_dir}/conf/server.xml" | awk -F'"' 'NR==1{print $2}')
host_ip=192.168.10.100
day=$(date +%F)
selflheal_logdir=/var/log/java_selflheal
logfile=${selflheal_logdir}/selflheal-${day}.log
mkdir -p "${selflheal_logdir}"

# Append a message with the current timestamp to today's log
log() {
	echo "$(date '+%F %H:%M:%S') ${java_node} $*" >>"${logfile}"
}

log "starting self-healing..."

# Stop Tomcat: kill every process whose command line contains the instance directory
# (xargs -r: do not run kill when no process matches)
ps aux | grep "${java_dir}" | grep -v grep | awk '{print $2}' | xargs -r kill -9

# Start the service as user www (su - requires root)
if su - www -c "${java_dir}/bin/startup.sh"; then
	sleep 10s
	for i in {1..20}
	do
		sleep 3s
		echo "Attempt $i"
		# The application's /check endpoint returns "true" when it is healthy
		ava=$(curl -s "http://${host_ip}:${java_port}/check")
		if [[ "$ava" = "true" ]]; then
			log "self-healing succeeded!!!"
			echo "=====================================================" >>"${logfile}"
			break
		fi
		if [ "$i" -ge 20 ]; then
			log "self-healing failed!!!"
			echo "=====================================================" >>"${logfile}"
			exit 1
		fi
	done
else
	log "startup.sh failed, self-healing failed!!!"
	exit 1
fi
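The port lookup and the health-check retry loop in the script above can be factored into small functions, which makes each step easier to test on its own. A sketch with hypothetical function names:

```shell
#!/bin/bash
# Hypothetical helpers factoring out two steps of the self-healing script.

# Print the port of the first HTTP/1.1 connector in the given server.xml.
# Assumes port="..." is the first quoted attribute on the matching line,
# e.g. <Connector port="7180" protocol="HTTP/1.1" ...
get_http_port() {
    grep 'protocol="HTTP/1.1"' "$1" | awk -F'"' 'NR==1{print $2}'
}

# Run a check command up to $1 times, sleeping $2 seconds before each try.
# Returns 0 as soon as the check succeeds, 1 if every attempt fails.
wait_until_healthy() {
    local attempts=$1 interval=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        sleep "$interval"
        if "$@"; then
            return 0
        fi
    done
    return 1
}
```

In the script, the loop would then reduce to something like check_ok() { [[ "$(curl -s "http://${host_ip}:${java_port}/check")" == "true" ]]; } followed by wait_until_healthy 20 3 check_ok.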

3. Configure a Zabbix action to implement Tomcat fault self-healing

Goal: recover from the fault automatically and also send a notification message.

3.1. Creating Actions

Enter a name for the action and associate it with the trigger for the abnormal service status.


3.2. Configure fault self-healing message content and execute self-healing script

1) The fault self-healing alarm message is as follows:

-----------Fault self-healing event triggered-------
Fault: {TRIGGER.STATUS}, server: {HOSTNAME1}
Fault trigger name: {EVENT.NAME}
IP address: {HOST.CONN}
Fault time: {EVENT.DATE} {EVENT.TIME}
Fault event: {ITEM.NAME}:{ITEM.VALUE}

2) Add an operation that executes a remote command

Zabbix achieves self-healing by executing commands on a remote host.

Set the operation type to remote command, add the server running Tomcat to the target list, and enter the command that runs the fault self-healing script.
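For the remote command to run, the Zabbix agent on the Tomcat server must allow it. As far as we know, agents from Zabbix 5.0.2 onward use AllowKey=system.run[*] (older agents use EnableRemoteCommands=1), and the agent normally runs as the zabbix user, whereas the script's su - www needs root. A hedged sketch under those assumptions; the function names and script path are hypothetical:

```shell
#!/bin/bash
# Sketch: prepare a Zabbix agent host for self-healing remote commands.
# Assumptions: Zabbix agent >= 5.0.2 (AllowKey syntax); agent runs as user "zabbix".
# allow_remote_commands and sudoers_rule are hypothetical helper names.

# Append AllowKey=system.run[*] to the agent config unless it is already present
allow_remote_commands() {
    local conf=$1
    grep -qxF 'AllowKey=system.run[*]' "$conf" || echo 'AllowKey=system.run[*]' >>"$conf"
}

# Print a sudoers rule letting the zabbix user run one script as root
sudoers_rule() {
    echo "zabbix ALL=(root) NOPASSWD: $1"
}
```

On the Tomcat server one might then run allow_remote_commands /etc/zabbix/zabbix_agentd.conf, write the output of sudoers_rule /data/scripts/selfheal-java-7180.sh to a file under /etc/sudoers.d/ (validating it with visudo -c), restart the agent, and enter sudo /data/scripts/selfheal-java-7180.sh as the action's command.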

3.3. Action creation completed


4. Observe the fault self-healing

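Besides the Zabbix alarm messages, the result can be checked on the Tomcat server through the log the script writes. A hypothetical helper, assuming the log naming used in section 2 (selflheal-YYYY-MM-DD.log under /var/log/java_selflheal):

```shell
#!/bin/bash
# Sketch: print the last N lines of the newest self-healing log.
# show_selfheal_log is a hypothetical helper name.
# Usage: show_selfheal_log [LOG_DIR] [LINES]

show_selfheal_log() {
    local logdir=${1:-/var/log/java_selflheal} lines=${2:-20} latest
    # ISO dates sort lexicographically, so the last file listed is the newest
    latest=$(ls -1 "$logdir"/selflheal-*.log 2>/dev/null | sort | tail -n 1)
    if [[ -z "$latest" ]]; then
        echo "no self-healing log found in $logdir" >&2
        return 1
    fi
    tail -n "$lines" "$latest"
}
```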


Source: blog.csdn.net/weixin_44953658/article/details/123268526