Business scenario
Files serve as the interface between the applications. To simplify the design, no separate file transfer module is used; instead, uploads and downloads are handled at the system level through a shared NFS server. During the maintenance phase, however, the NFS server often becomes unreachable. This causes application I/O errors and disrupts normal business processing.
Cause analysis
1> The server hosting the NFS service was restarted without the client maintenance personnel being informed, so the existing mount point can no longer reach the NFS server and must be remounted
2> The NFS client host was restarted and the share was either not mounted automatically or the mount failed, causing the application to report errors
3> A network failure makes the NFS connection unavailable
Countermeasures
Use a shell program to check the state of the NFS mounts automatically
Program implementation
When checking NFS manually, I use the df command. If the NFS connection has a problem, df blocks, because the I/O is synchronous, and waits until the connection becomes available again. Whether df completes can therefore be used to judge whether NFS is healthy. This has to be done with two processes: if a single process ran df itself, the whole program would hang as soon as df blocked. Hence two shell programs are needed.
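The same idea, running the probe in the background and checking later whether it is still alive, can also be sketched inside a single script. The function name probe_with_deadline and the one-second polling interval are illustrative assumptions, not part of the original programs. The probed command is passed as arguments, so the function can wrap df or any stand-in command.

```shell
#!/bin/sh
# probe_with_deadline SECONDS COMMAND [ARGS...]
# Hypothetical helper (not from the original scripts): runs COMMAND in the
# background and polls once per second.
# Returns 0 if COMMAND finished within SECONDS,
# returns 2 if it is still running at the deadline (the "df is blocked" case).
probe_with_deadline() {
    deadline=$1
    shift
    # Discard output so the background process does not hold our stdout.
    "$@" >/dev/null 2>&1 &
    probe_pid=$!
    elapsed=0
    while [ "$elapsed" -lt "$deadline" ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        if ! kill -0 "$probe_pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    if kill -0 "$probe_pid" 2>/dev/null; then
        return 2
    fi
    return 0
}

# Example call (assumption: a 30-second limit, as in the steps of this article):
#   probe_with_deadline 30 df || exit 2
```

Compared with a fixed 30-second sleep, polling lets the check return as soon as the probe finishes. Like the original approach, this sketch leaves a blocked probe running rather than killing it.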
1. Main shell program
Invoke the sub-shell program and run it in the background
Sleep for 30 seconds, then use the PID to check whether the sub-shell process has exited
If the child process is still running, the NFS connection is abnormal: exit with code 2
Note: the sub-shell process is not killed here. Once the NFS connection recovers, df returns and the child process exits on its own. You can of course also kill it manually.
If the child process has exited, run df|grep to check whether each mount point exists
If both mount points exist, exit with code 0
If a mount point does not exist, run the mount command to mount it
If the mount succeeds, exit with code 0
If the mount fails, exit with code 1
2. Subshell program
Write its own PID to a file
Execute the df command
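One detail of the df|grep check is worth noting: grep matches substrings, so a hypothetical mount point such as /data/download2 would also satisfy a search for /data/download. A stricter variant, as a sketch only, compares the last column of df -P output exactly. The function name is_mounted_in is an assumption; it reads df-style text from standard input so that it can be tried on sample data, and it assumes mount points without spaces.

```shell
#!/bin/sh
# is_mounted_in MOUNTPOINT
# Hypothetical helper (not from the original scripts): reads `df -P`-style
# output on stdin, skips the header line, and succeeds only if the last
# field of some line equals MOUNTPOINT exactly.
is_mounted_in() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { found = 1 } END { exit !found }'
}

# Example call (only after the background df probe has returned):
#   df -P | is_mounted_in /data/download || echo "not mounted"
```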
Main program (checknfs.sh)
#!/bin/sh
echo "checknfs.sh start..."
workdir="/home/wk"
mdirdown="/data/download"
mdirup="/data/upload"
remoteip=1.2.3.4
rcdown=0
rcup=0
cd $workdir
# Run the df probe in the background so a blocked df cannot hang this script
sh ./checkdf.sh&
sleep 30
pid=`cat checkdfpid.log`
echo "got checkdf.sh pid is:"$pid
# ps -p fails (non-zero) when the probe process has already exited
ps -p $pid
if [ $? -ne 0 ]; then
    # df returned, so NFS is responding; now check that both mount points exist
    downcount=`df|grep $mdirdown|wc -l`
    upcount=`df|grep $mdirup|wc -l`
    if [ $downcount -ge 1 ] && [ $upcount -ge 1 ]; then
        echo "checknfs.sh end normally..."
        exit 0
    else
        # Remount whichever mount point is missing
        if [ $downcount -lt 1 ]; then
            mount $remoteip:$mdirdown $mdirdown
            rcdown=$?
            if [ $rcdown -ne 0 ]; then
                echo "(mount fail) -> "$mdirdown
            fi
        fi
        if [ $upcount -lt 1 ]; then
            mount $remoteip:$mdirup $mdirup
            rcup=$?
            if [ $rcup -ne 0 ]; then
                echo "(mount fail) -> "$mdirup
            fi
        fi
        if [ $rcdown -eq 0 ] && [ $rcup -eq 0 ]; then
            echo "checknfs.sh end normally(remounted)..."
            exit 0
        else
            echo "checknfs.sh end abnormally(mount fail)..."
            exit 1
        fi
    fi
else
    # The probe is still running after 30 seconds: df is blocked on NFS
    echo "checknfs.sh end abnormally(df block)..."
    exit 2
fi
Sub-program (checkdf.sh)
#!/bin/sh
echo "checkdf.sh start..."
echo 'checkdf pid:'$$
echo $$ > checkdfpid.log
df
echo "checkdf.sh end normally..."
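Because the main program reports its outcome through the exit code (0, 1 or 2), a caller such as a scheduled job can react to each case separately. The following is a minimal sketch. The function name describe_checknfs_rc and the message texts are illustrative assumptions; only the three exit codes come from the main program.

```shell
#!/bin/sh
# describe_checknfs_rc CODE
# Hypothetical helper: maps the exit code of checknfs.sh to a short message.
describe_checknfs_rc() {
    case "$1" in
        0) echo "nfs ok" ;;
        1) echo "nfs mount failed" ;;
        2) echo "nfs not responding (df blocked)" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}

# Example call (assumes checknfs.sh is in the current directory):
#   sh ./checknfs.sh; describe_checknfs_rc $?
```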