shell script to monitor k8s cluster job status, error monitoring If there is an alarm triggered by the process of cloud Ali

#!/bin/bash

while [ 1 ]
   
do
 
   job_error_no=`kubectl get pod -n weifeng |grep -i "job"|grep -ci error`
   
   
 
   if [ $job_error_no -gt 0  ];then
      ps -fe|grep k8s_job_status_monitor|grep -v grep|awk '{print $2}'|xargs kill -9
      echo "k8s job running  is not stable " >> /tmp/k8s_job_error_no.log
 
   fi
   sleep 60
 
done

  

If k8s cluster job status appear error, the script automatically kill off their montior process, triggered by an alarm monitoring process to monitor cloud Ali cloud  

Ali and so the monitoring process monitoring document   https://www.cnblogs.com/weifeng1463/p/11591796.html

 

Guess you like

Origin www.cnblogs.com/weifeng1463/p/11776633.html