Set cookies and headers when using scrapy shell

When testing XPath expressions, it is often most convenient to download a page temporarily and experiment from the command line. However, many pages require authentication and cannot be fetched directly with a plain scrapy shell command, so the request has to be built by hand with cookies and headers set.

First, install IPython into the Python environment where Scrapy is installed:

# In a plain Python (pip) environment
pip install ipython
# In a conda environment
conda install ipython

Then start scrapy shell; it uses IPython automatically when IPython is installed:

scrapy shell

Convert the cookies into a dictionary, then build and send the request:

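# URL of the page to request
url = 'https://novel18.syosetu.com/n7016er/31/'
# Custom request headers (a custom User-Agent is generally recommended while debugging, to get past the most basic User-Agent checks)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
# Cookies to send with the request, as a dictionary
cookies = {"key_1": "value_1", "key_2": "value_2", "key_3": "value_3"}
# Build the Request object
req = scrapy.Request(url, cookies=cookies, headers=headers)
# Send the request
fetch(req)
# Open the fetched page in the system default browser (mainly to check that the inner page was actually retrieved)
view(response)
# Response body as bytes
response.body
# Response body as str
response.text
# XPath selector (pass the expression you want to test; '//title/text()' is only an example)
response.xpath('//title/text()').get()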
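
Cookies copied from the browser's developer tools usually come as a single "name=value; name=value" string rather than a dictionary. The following is a minimal sketch of the conversion, not part of Scrapy itself: the helper name cookie_header_to_dict and the sample string are made up for illustration.

```python
def cookie_header_to_dict(raw_cookie):
    """Turn a raw Cookie header string into a dict (hypothetical helper)."""
    cookies = {}
    for pair in raw_cookie.split(";"):
        pair = pair.strip()
        # Skip empty fragments and anything that is not a name=value pair
        if not pair or "=" not in pair:
            continue
        # Split on the first "=" only, since cookie values may contain "="
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Sample string for illustration; paste the real Cookie header value here
raw = "key_1=value_1; key_2=value_2; key_3=value_3"
cookies = cookie_header_to_dict(raw)
print(cookies)
```

The resulting dictionary can be passed as the cookies argument of scrapy.Request, as in the commands above.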

 



Original link: https://blog.csdn.net/u010741500/article/details/100974510

Origin www.cnblogs.com/yoyowin/p/12348047.html