Simulating a browser in a crawler -- the headers property

# Simulate the browser
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")

Commonly used "User-Agent" values:

ua_list = [
    "Mozilla/5.0 (Windows NT 6.1; ) Apple.... ",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0)... ",
    "Mozilla/5.0 (Macintosh; U; PPC Mac OS X.... ",
    "Mozilla/5.0 (Macintosh; Intel Mac OS... "
]

import random
user_agent = random.choice(ua_list)

There are two ways to make a crawler simulate a browser:

Method 1: Use build_opener() to modify the headers
Since urlopen() does not support some advanced HTTP features, if we want to modify the headers we can use urllib.request.build_opener(). For example:
import urllib.request

url = "http://blog.csdn.net/weiwei_pig/article/details/51178226"
header = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36")
opener = urllib.request.build_opener()  # Create an opener object
opener.addheaders = [header]  # Add the header information
data = opener.open(url).read()  # Open the URL and read the response

# At this point the request is made as if from a browser; save the crawled page
fhandle = open("F:/python/part4/3.html", "wb")
a = fhandle.write(data)  # print(a) shows the number of bytes written
fhandle.close()
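As a variation on Method 1, the customized opener can also be installed as the global default with urllib.request.install_opener(), so that every subsequent urlopen() call sends the browser-style header automatically. This is a minimal sketch reusing the header value from the example above; the network call itself is left commented out:

```python
import urllib.request

# Build an opener carrying a browser-style User-Agent, then install it
# globally so plain urlopen() calls use it too.
header = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36")
opener = urllib.request.build_opener()
opener.addheaders = [header]
urllib.request.install_opener(opener)  # from now on, urlopen() uses this opener

# After install_opener(), a plain call would carry the header:
# data = urllib.request.urlopen("http://blog.csdn.net/weiwei_pig/article/details/51178226").read()
```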
Method 2: Use add_header() to add headers
In addition to the method above, you can also call add_header() on a urllib.request.Request() object to simulate a browser:
import urllib.request

url = "http://blog.csdn.net/weiwei_pig/article/details/51178226"
req = urllib.request.Request(url)  # Create a request object
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36")
data = urllib.request.urlopen(req).read()
data = data.decode("utf-8")  # Decode the raw bytes as UTF-8
print(data)
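The random User-Agent idea from earlier combines naturally with Method 2: pick a string from a pool with random.choice() and attach it via add_header(). A minimal sketch, where the pool entries are illustrative examples (the Chrome string is the one used above, the Firefox string is a placeholder) and the actual network call is left commented out:

```python
import random
import urllib.request

# Example User-Agent pool; entries are illustrative, not authoritative.
ua_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

url = "http://blog.csdn.net/weiwei_pig/article/details/51178226"
req = urllib.request.Request(url)
req.add_header("User-Agent", random.choice(ua_list))  # random header per request

# The header is stored on the request and sent when it is opened:
# data = urllib.request.urlopen(req).read()
print(req.get_header("User-agent"))  # urllib normalizes the key's capitalization
```

Rotating the User-Agent this way makes repeated requests look less uniform than sending one fixed string.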


