Shell script to automatically collect spider 404 dead links and submit them to search engines

The main function of the script: every day it analyzes the website's nginx log from the previous day, extracts the request paths that returned status code 404 and whose UA identifies the Baidu spider, and writes them to the death.txt file in the website root directory so the file can be submitted to Baidu as a dead-link list.

#!/bin/bash
#Desc: Death Chain File Script
#Author: ZhangGe
#Blog: http://zhangge.net/5038.html
#Date: 2015-05-03

#Initialize variables
#Define the spider UA string (Baidu spider by default)
UA='+http://www.baidu.com/search/spider.html'

#Date of the previous day (nginx log)
DATE=`date +%Y-%m-%d -d "1 day ago"`

#Define the log path
logfile=/www/wwwlogs/www.80rc.com.log_${DATE}.log

#Define the storage path of the dead-link file
deathfile=/www/wwwroot/80rc/death.txt

#Define the website access address
website=http://www.80rc.com

#Analyze the log and save dead-link data
for url in `awk -v str="${UA}" '$9=="404" && $15~str {print $7}' ${logfile}`
do
        grep -q "$url" ${deathfile} || echo ${website}${url} >> ${deathfile}
done
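The awk filter relies on nginx's default combined log format: splitting on whitespace, field $7 is the request path, $9 is the status code, and $15 falls inside the quoted User-Agent, which for Baiduspider ends with the +http://www.baidu.com/search/spider.html marker. Below is a quick sanity check of the filter against a single hypothetical log line (the IP, timestamp, and path are made up; GNU awk treats the leading + in the pattern as a literal character):

# Hypothetical combined-format log line; only the field positions matter here.
echo '220.181.108.75 - - [03/May/2015:06:25:14 +0800] "GET /missing-page/ HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"' \
  | awk -v str='+http://www.baidu.com/search/spider.html' '$9=="404" && $15~str {print $7}'
# Expected output: /missing-page/

If your nginx log_format differs from the default combined format, the field numbers $7, $9 and $15 will need to be adjusted accordingly.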

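To run the analysis automatically every day as described above, the script can be scheduled with cron. A minimal sketch, assuming the script has been saved as /root/death.sh (a hypothetical path) and made executable:

# Add via `crontab -e`: run daily at 03:00, after the previous day's log
# has been rotated (/root/death.sh is an assumed location, adjust as needed).
0 3 * * * /bin/bash /root/death.sh >/dev/null 2>&1

The generated death.txt (reachable at http://www.80rc.com/death.txt) can then be submitted as a dead-link file in Baidu's webmaster tools, which is the submission step referred to above.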