Introduction
When a crawler sends requests, it very often needs to add request headers; otherwise the server may treat the request as illegitimate and deny access.
import requests
url = 'https://www.zhihu.com/question/315387406/answer/812734512'
response = requests.get(url=url)
print(response.status_code) # 400
Of all the request headers, the one added most often is user-agent, which disguises the request as coming from a browser.
User Agent (UA for short) is a special header string that lets the server identify the client's operating system and version, CPU type, browser and version, browser rendering engine, browser language, and browser plug-ins.
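To make the description above concrete, here is a minimal sketch that pulls a few of those fields out of a UA string with regular expressions. The helper name summarize_user_agent is made up for illustration, and real servers use far more thorough parsers; this only shows roughly where the information sits in the string.

```python
import re

# Sample UA string, the same one used in the hand-written example in this post.
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36")


def summarize_user_agent(ua):
    """Hypothetical helper: extract a few rough fields from a UA string."""
    # The first parenthesised group usually carries OS and CPU details.
    platform = re.search(r"\(([^)]*)\)", ua)
    # product/version tokens such as "Chrome/76.0.3809.132".
    products = dict(re.findall(r"([A-Za-z]+)/([\d.]+)", ua))
    return {
        "platform": platform.group(1) if platform else None,
        "products": products,
    }


info = summarize_user_agent(UA)
print(info["platform"])
print(info["products"].get("Chrome"))
```

The first print shows the platform part (operating system and CPU type), the second the Chrome version token.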
So how do people usually set the user-agent? Everyone has their own tricks, but most of the time it is simply written by hand:
import requests
url = 'https://www.zhihu.com/question/315387406/answer/812734512'
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
}
response = requests.get(url=url, headers=headers)
print(response.status_code) # 200
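If you want to check what a request will carry before sending anything, the sketch below builds a request object and inspects its headers offline. It swaps requests for the standard library's urllib.request purely so that it runs without a network connection or third-party packages; that swap is my assumption, not part of the original example.

```python
from urllib.request import Request

url = 'https://www.zhihu.com/question/315387406/answer/812734512'
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/76.0.3809.132 Safari/537.36"
}

# Building the Request object does not send anything over the network.
req = Request(url, headers=headers)

# urllib normalises header names with str.capitalize(), giving "User-agent".
sent_headers = dict(req.header_items())
print(sent_headers)
```

Header names are case-insensitive in HTTP, so the different capitalisation does not matter to the server.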
With fake_useragent, however, you no longer have to worry about writing it yourself:
import requests
from fake_useragent import UserAgent
# Instantiate the user-agent object
ua = UserAgent()
url = 'https://www.zhihu.com/question/315387406/answer/812734512'
headers = {"user-agent": ua.chrome}  # user-agent for a specific browser
# Alternatively, generate a random user-agent in one step:
# headers = {"user-agent": UserAgent().random}
response = requests.get(url=url, headers=headers)
print(response.status_code)  # 200
About
What is fake_useragent?
In short, fake_useragent is a handy helper that generates user-agent strings for us flexibly, so we no longer have to write them by hand.
Install
pip install fake-useragent
Update
pip install -U fake-useragent
View version
import fake_useragent
print(fake_useragent.VERSION) # 0.1.11
Usage
Generate a user-agent for a specific browser
import fake_useragent
# Instantiate the user-agent object
ua = fake_useragent.UserAgent()
# ua.ie
print(ua.ie) # Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; chromeframe/13.0.782.215)
# ua.msie
print(ua['Internet Explorer']) # Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; SLCC1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727)
# ua.opera
print(ua.opera) # Opera/9.80 (Windows NT 6.1; U; en-US) Presto/2.7.62 Version/11.01
# ua.chrome
print(ua.chrome) # Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36
# ua.google
print(ua['google chrome']) # Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
# ua.firefox
print(ua.firefox) # Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/21.0.1
# ua.ff
print(ua.ff) # Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/29.0
# ua.safari
print(ua.safari) # Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
Generate a random user-agent
import fake_useragent
# Instantiate the user-agent object
ua = fake_useragent.UserAgent()
print(ua.random) # Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36
print(ua.random) # Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
print(ua.random) # Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
Each call produces a different, randomly chosen UA, which makes the crawler's requests look much more like real browser traffic.
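The idea of rotating the UA on every request can also be sketched without the library, which is handy when it cannot be installed. The pool below is hypothetical and simply reuses the sample outputs shown above; this is not how fake_useragent works internally, only a stand-in for the same technique.

```python
import random

# Hypothetical local pool, copied from the sample outputs in this post.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36",
    "Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; "
    "Intel Mac OS X 10_7_3; Trident/6.0)",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
]


def random_headers(pool=UA_POOL):
    """Return a headers dict whose user-agent is picked at random from pool."""
    return {"user-agent": random.choice(pool)}


# Each call may pick a different entry from the pool.
for _ in range(3):
    print(random_headers())
```

The returned dict can be passed straight to requests.get(url=url, headers=...) as in the earlier examples.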
Other Uses
Download the remote user-agent JSON file to the local machine
fake_useragent keeps its user-agent data in a JSON file that is hosted online:
import fake_useragent
print(fake_useragent.settings.CACHE_SERVER)
'''
# The URL; it is actually a JSON file
https://fake-useragent.herokuapp.com/browsers/0.1.11
'''
Since it is a JSON file hosted online, we can download it to the local machine:
from fake_useragent import UserAgent, VERSION
location = './fake_useragent%s.json' % VERSION  # VERSION is imported directly above
ua = UserAgent(path=location)
If you hit fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached, simply re-run the code.
You will then find a JSON file in the same directory as the script.
If you only want to update the JSON file saved locally:
from fake_useragent import UserAgent
ua = UserAgent()
ua.update()
If you do not want to cache the database, or the file system is not writable:
from fake_useragent import UserAgent
ua = UserAgent(cache=False)
If you do not want to use the hosted cache server:
from fake_useragent import UserAgent
ua = UserAgent(use_cache_server=False)
Handling Exceptions
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries
import requests
from fake_useragent import UserAgent
# Disable the hosted cache server: use_cache_server=False
headers = {"User-Agent": UserAgent(use_cache_server=False).chrome}
response = requests.get(url=url, headers=headers)  # url as defined in the earlier examples
print(response.status_code)  # 200
FakeUserAgentError('Maximum amount of retries reached')
import requests
from fake_useragent import UserAgent
# Option 1: disable the hosted cache server (use_cache_server=False)
headers = {"User-Agent": UserAgent(use_cache_server=False).chrome}
# Option 2: skip SSL verification
headers = {"User-Agent": UserAgent(verify_ssl=False).chrome}
# Option 3: do not cache the data
headers = {"User-Agent": UserAgent(cache=False).chrome}
response = requests.get(url=url, headers=headers)  # url as defined in the earlier examples
print(response.status_code)  # 200
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
from fake_useragent import UserAgent, VERSION
location = './fake_useragent%s.json' % VERSION  # VERSION is imported directly above
ua = UserAgent(path=location)
When writing the online JSON file to the local disk, I ran into urllib.error.URLError: <urlopen error timed out>; re-running the code fixed it, and the local file was downloaded completely.
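Since the usual cure for these errors is "run it again", the re-run can be automated. The sketch below retries a callable a few times and falls back to a hand-written UA if every attempt fails. The names get_user_agent, broken_factory, and FALLBACK_UA are made up for illustration, and the failing factory only stands in for a fake_useragent call so that the example runs offline.

```python
import time

# Hand-written UA from the start of this post, used as a last resort.
FALLBACK_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36")


def get_user_agent(factory, retries=3, delay=0.0, fallback=FALLBACK_UA):
    """Call factory() up to `retries` times; return `fallback` if all fail."""
    for attempt in range(retries):
        try:
            return factory()
        except Exception as exc:  # e.g. FakeUserAgentError or URLError
            print("attempt %d failed: %s" % (attempt + 1, exc))
            time.sleep(delay)  # pause before the next attempt
    return fallback


def broken_factory():
    """Stand-in that always fails, to exercise the fallback path offline."""
    raise RuntimeError("Maximum amount of retries reached")


print(get_user_agent(broken_factory))
```

With the library installed, the factory could be something like lambda: UserAgent().chrome, and a non-zero delay gives the remote server a moment before the next attempt.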