[Notes] Python self-study: commonly used settings for a Scrapy crawler project
Note: The following code is added to or modified in the Scrapy project's settings.py.
- Set the log level to filter out unwanted log output
LOG_LEVEL = "WARNING"  # show only messages at WARNING level or above
Types of log information:
ERROR : general errors
WARNING : warnings
INFO : general information
DEBUG : debugging information
The default display level is DEBUG.
Setting the log level and the output destination:
In the settings.py configuration file, add LOG_LEVEL = 'the desired log level'.
LOG_FILE = 'log.txt' writes the log output to the specified file for storage.
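Scrapy's logging is built on Python's standard logging module, so the effect of a level threshold can be sketched without Scrapy at all. The following is only an illustration using the standard library; the helper name emitted_messages and the logger names are made up for this example and are not part of Scrapy.

```python
import io
import logging


def emitted_messages(level_name):
    """Return the level names of the records that pass the given threshold."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))

    # A dedicated logger per threshold keeps the demo isolated (hypothetical name).
    logger = logging.getLogger("log_level_demo_" + level_name)
    logger.setLevel(getattr(logging, level_name))
    logger.propagate = False
    logger.addHandler(handler)

    # Emit one record at each of the four levels listed in the notes.
    for method in ("debug", "info", "warning", "error"):
        getattr(logger, method)("message")

    logger.removeHandler(handler)
    return stream.getvalue().split()


if __name__ == "__main__":
    print(emitted_messages("DEBUG"))    # every level is shown
    print(emitted_messages("WARNING"))  # DEBUG and INFO are filtered out
```

With the threshold at WARNING, only WARNING and ERROR records survive, which is why LOG_LEVEL = "WARNING" makes the crawl output much quieter.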
- Set ROBOTSTXT_OBEY to False so that the crawler does not obey the site's robots.txt rules
ROBOTSTXT_OBEY = False
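To make clear what is being switched off, here is a rough sketch of the kind of check a robots.txt-obeying crawler performs. It uses the standard library's urllib.robotparser rather than Scrapy's own implementation; the robots.txt content, the example.com URLs and the helper name is_allowed are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the sample robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("https://example.com/public/page"))
    print(is_allowed("https://example.com/private/page"))
```

With ROBOTSTXT_OBEY = True, Scrapy would skip requests that such rules disallow; setting it to False skips the check entirely, so it is worth confirming that crawling the site is actually permitted.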
- Add a request header so that requests look like they come from a browser
# Uncomment this line and copy the USER_AGENT value from the browser's developer tools
#USER_AGENT = 'ctwp_spider (+http://www.yourdomain.com)'
# For example:
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
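Scrapy sends the USER_AGENT value as the User-Agent header of its outgoing requests. As a minimal sketch of what that header looks like on a request, the snippet below builds a request object with the standard library's urllib.request (not Scrapy) and reads the header back; no network access happens. The helper name build_request and the example.com URL are placeholders for this illustration.

```python
from urllib.request import Request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36"
)


def build_request(url):
    """Build (but do not send) a request carrying the browser-like User-Agent."""
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    request = build_request("https://example.com/")
    # urllib stores header names capitalized as "User-agent".
    print(request.get_header("User-agent"))
```

A site that filters on this header sees the same string a desktop Chrome browser would send, rather than the default identifier of the crawling library.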