UA camouflage

Web Scraper-UA Camouflage

Disguising the identity of the request carrier:

User-Agent:

The User-Agent header identifies the carrier of a request. When a request is sent from a browser, the carrier is the browser, and the User-Agent is the browser's identity string. When a request is sent by a crawler program, the carrier is the crawler, and the User-Agent is the crawler's identity string. The server can use this value to judge whether a request came from a browser or from a crawler.
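To make the idea of a "carrier identity" concrete, the short sketch below prints the User-Agent that Python's standard-library HTTP client attaches when none is set. It uses urllib rather than requests only so that it runs without third-party packages; requests behaves similarly and announces itself as python-requests/<version>. The variable names are illustrative.

```python
import urllib.request

# A freshly built opener carries the headers urllib adds to every request.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# urllib stores the header name as "User-agent".
default_ua = default_headers["User-agent"]
print(default_ua)  # a string of the form Python-urllib/<version>
```

A value like this tells the server immediately that the request did not come from a browser.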

Anti-crawling mechanism:

Some portal websites inspect the User-Agent of every incoming request. If the UA identifies a crawler, the server refuses to return the requested data.
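A server-side check of this kind might look roughly like the sketch below. It is not taken from any particular website: the token list, the function names, and the choice of status code 403 are assumptions made for illustration, and real sites usually combine the UA with other signals.

```python
# Hypothetical substrings that mark a request as coming from a crawler;
# a real site would maintain its own list.
CRAWLER_TOKENS = ("python-requests", "python-urllib", "scrapy", "curl")


def is_crawler_ua(user_agent):
    """Return True when the User-Agent is missing or names a known crawler."""
    if not user_agent:
        return True  # assumption: a request without a UA is treated as suspicious
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def respond(user_agent):
    """Return an HTTP-style status code: 403 for crawlers, 200 otherwise."""
    return 403 if is_crawler_ua(user_agent) else 200


browser_ua = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36")

print(respond("python-requests/2.31.0"))  # 403
print(respond(browser_ua))                # 200
```

Because the check relies only on a string that the client supplies, changing that string is enough to get past it.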

Counter-strategy (anti-anti-crawling):

Disguise the crawler's UA as a browser's UA by setting the User-Agent request header to a browser's identity string.

import requests

# Ask the user for the search keyword
kew_word = input("Enter the keyword to search for: ")
url = "https://www.sogou.com/web"
parm = {
    'query': kew_word
}
# Replace the crawler's UA with a browser's UA
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
}
res = requests.get(url=url, params=parm, headers=headers)
# Save the returned page to <keyword>.html
with open(f"{kew_word}.html", "w", encoding="utf-8") as fw:
    fw.write(res.text)
print("Crawl succeeded")

Reprinted from: https://www.cnblogs.com/whnbky/p/11520538.html
