Anti-crawling with cookies

Cookie anti-crawling

Cookie anti-crawling is a technique in which the server verifies the Cookie value in the request headers to distinguish normal users from crawlers. This approach is widely used in web applications.
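The server-side check described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `looks_like_browser` is hypothetical, and the cookie name and value simply reuse the `isfirst=789kq7uc1pp4c` pair from the request example in this article. How the real site validates the cookie is not known.

```python
from http.cookies import SimpleCookie

# Assumed cookie name and value, reused from the article's request example.
EXPECTED_NAME = "isfirst"
EXPECTED_VALUE = "789kq7uc1pp4c"


def looks_like_browser(request_headers):
    """Return True if the Cookie request header carries the expected value."""
    raw = request_headers.get("Cookie", "")
    jar = SimpleCookie()
    jar.load(raw)  # parses "name=value; other=value" pairs
    morsel = jar.get(EXPECTED_NAME)
    return morsel is not None and morsel.value == EXPECTED_VALUE


# A request that carries the cookie passes; one without it is treated as a crawler.
print(looks_like_browser({"Cookie": "isfirst=789kq7uc1pp4c"}))
print(looks_like_browser({}))
```

A real server would typically return an error page or a redirect when the check fails, rather than just a boolean.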

Bypassing cookie anti-crawling in practice

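"""
Bypassing cookie anti-crawling in practice
Example 2: travel site announcement detail page
Site: http://www.porters.vip/verify/cookie/content.html
Task: scrape the announcement title from the travel site's announcement detail page
"""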


import requests
from lxml import etree

url = 'http://www.porters.vip/verify/cookie/content.html'
headers = {'Cookie': 'isfirst=789kq7uc1pp4c'}
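# Send the request to the target site
resp = requests.get(url, headers=headers)
# Print the status code
print(resp.status_code)
# Continue only if the status code is 200; otherwise report failure
if resp.status_code == 200:
    html = etree.HTML(resp.text)
    # Extract the title from the document by HTML tag and class name
    # (cssselect() requires the separate cssselect package to be installed)
    res = html.cssselect('.page-header h1')[0].text
    print(res)
else:
    print('This request failed!')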



How cookie anti-crawling works

By default, most crawlers request only the HTML text resource, which means they do not save cookies the way a browser does. Cookie anti-crawling takes advantage of exactly this behavior. So how does a browser obtain a cookie and set it?

The browser automatically checks the response headers for a Set-Cookie field. If one is present, it stores the value locally and automatically sends it back in the Cookie header of each subsequent request. The server then only needs to verify the Cookie value in the request headers. The server
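The Set-Cookie round trip can be tried locally with a small sketch that uses only the standard library. Everything here is an assumption made for illustration: the `/index` and `/content` paths, the `DemoHandler` class, and the 403 response for rejected requests are hypothetical, and the cookie value is reused from the request example in this article. One client ignores cookies, as a basic crawler would; the other keeps a cookie jar, as a browser does.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = "isfirst=789kq7uc1pp4c"  # assumed value, reused from the article's example


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/index":
            # Entry page: hand the cookie to the client via Set-Cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", TOKEN + "; Path=/")
            self.end_headers()
            self.wfile.write(b"index")
        elif TOKEN in self.headers.get("Cookie", ""):
            # Protected page: the request carried the expected cookie.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"content")
        else:
            # No valid cookie: treat the client as a crawler.
            self.send_response(403)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_status(opener, url):
    """Return the HTTP status code for url, including error statuses."""
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # ignore any system proxy settings
    try:
        # Crawler-like client: does not store cookies.
        plain = urllib.request.build_opener(no_proxy)
        # Browser-like client: stores Set-Cookie values and sends them back.
        with_jar = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(CookieJar()))
        no_cookie = fetch_status(plain, base + "/content")
        fetch_status(with_jar, base + "/index")  # receives Set-Cookie here
        with_cookie = fetch_status(with_jar, base + "/content")
        return no_cookie, with_cookie
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())
```

The first status in the printed pair belongs to the client without cookies and the second to the client with a cookie jar, so the pair shows the rejection and the acceptance side by side.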

Origin blog.csdn.net/weixin_43870646/article/details/105179604