The main function of the script: every day, analyze the previous day's nginx log for the site, extract the request paths whose status code is 404 and whose User-Agent identifies the Baidu spider, and write them to the death.txt file in the website's root directory for submission to Baidu's dead-link tool.
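Because the log filename embeds the previous day's date, the script is meant to run once per day. A minimal cron sketch, assuming the script below is saved at the hypothetical path /root/death.sh:

```shell
# Hypothetical crontab entry (add it with `crontab -e`): runs the script at
# 00:05 every day, after nginx has rotated out the previous day's log.
# 5 0 * * * /bin/bash /root/death.sh
```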
#!/bin/bash
#Desc: Dead link file script
#Author: ZhangGe
#Blog: http://zhangge.net/5038.html
#Date: 2015-05-03

#Initialize variables
#Define spider UA information (default is Baidu spider)
UA='+http://www.baidu.com/search/spider.html'

#The date of the previous day (nginx log)
DATE=$(date +%Y-%m-%d -d "1 day ago")

#Define the log path
logfile=/www/wwwlogs/www.80rc.com.log_${DATE}.log

#Define the storage path of the dead link file
deathfile=/www/wwwroot/80rc/death.txt

#Define the website access address
website=http://www.80rc.com

#Analyze the log and save the dead link data
for url in $(awk -v str="${UA}" '$9=="404" && $15~str {print $7}' "${logfile}")
do
    grep -q "$url" "${deathfile}" || echo "${website}${url}" >> "${deathfile}"
done