Bypassing User-Agent anti-crawler checks in practice
A User-Agent anti-crawler check is a server-side measure that inspects the value of the User-Agent request header to distinguish normal users from crawlers. It is one of the most basic anti-crawler measures.
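To make the idea concrete, here is a minimal sketch of what such a server-side check might look like. The keyword list, the function names `is_probable_crawler` and `status_for`, and the 403 status code are all assumptions made for illustration; the actual rules used by any particular site are not known.

```python
# Hypothetical blocklist of substrings found in the default User-Agent
# values of common HTTP clients (an assumption, not any site's real rules).
BLOCKED_UA_KEYWORDS = ("python", "curl", "wget", "postman")


def is_probable_crawler(user_agent):
    """Return True if the User-Agent value looks like it came from a script."""
    if not user_agent:
        # A missing or empty User-Agent is treated as suspicious.
        return True
    ua = user_agent.lower()
    return any(keyword in ua for keyword in BLOCKED_UA_KEYWORDS)


def status_for(headers):
    """Return the status code this hypothetical server would answer with."""
    return 403 if is_probable_crawler(headers.get("User-Agent")) else 200


# requests identifies itself as "python-requests/<version>" by default.
script_ua = "python-requests/2.31.0"
browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(status_for({"User-Agent": script_ua}))   # rejected by the check
print(status_for({"User-Agent": browser_ua}))  # accepted by the check
```

A check like this only looks at a header the client controls, which is why it counts as a basic measure.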
"""
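Bypassing User-Agent anti-crawler checks in practice
Example 1. User-Agent anti-crawler check on the campus news site list page
Task: scrape the news titles from the "Hot This Week" list on the right side of the campus news site page
URL: http://www.porters.vip/verify/uas/index.html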
"""
import requests
from parsel import Selector
url = 'http://www.porters.vip/verify/uas/index.html'
# Send a request to the target site
resp = requests.get(url=url)
# Print the status code
print(resp.status_code)
# If the status code is 200, continue; otherwise report the failure
if resp.status_code == 200:
    sel = Selector(resp.text)
    # Extract the news titles from the response body by HTML tag and attribute
    res = sel.css('.list-group-item::text').extract()
    print(res)
else:
    print('This request failed!')
The request did not succeed, yet the same page opens normally in a browser. Why? To check whether the problem lies with the site itself, we can send the same request with Postman. The Postman request gives the following result: