UA camouflage

Collector -UA camouflage page

UA camouflage

Request the identity of the carrier camouflage identified:

User-Agent:

Request carrier identity, initiated by the browser request, request support for the browser, the request of the User-Agent for the identity of the browser, if a request crawler initiated the request of the carrier for the reptile program, the User-Agent request to identify the identity of the crawler. The server can be judged by the value of the browser that initiated the request or reptiles.

Anti-climbing mechanisms:

Some portals have a User-Agent request access to the site of capture and determine if the request is UA crawlers, then declined to provide the requested data.

Anti-anti-climbing strategy:

The reptile UA disguised as an identity of a browser

import requests
kew_word=input("请输入查询的关键字:")
url="https://www.sogou.com/web"
parm={
    'query':kew_word
}
#修改爬虫的UA为浏览器的UA
headers={
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" } res=requests.get(url=url,params=parm,headers=headers) with open(f"{kew_word}.html","w",encoding="utf-8")as fw: fw.write(res.text) print("爬取成功")

Guess you like

Origin www.cnblogs.com/whnbky/p/11520538.html