python reptile User Agent User Agent

UserAgent Profile

UserAgent Chinese called user agent, is part Http protocol, part of the header field belonging, UserAgent also referred UA. It is a special string head, providing a browser type and version you are using to access the site, and operating system version, browser kernel, and other identifying information. Through this identification, the site the user visits can display a different layout so as to provide a better experience or information statistics; for example, using a mobile phone to access Google and computer access is not the same, these are Google judged according to UA visitors of. UA can be disguised. 
UA string browser standard format: Browser Identification (operating system identity; the level of encryption identifier; Browser Language) rendering engine that identifies the version information. But each browser is different.

We do reptiles, when not properly accessed through a browser, it will be denied access to many sites, this time we need to manually add the headers in UA property to masquerade as browser access.

Common UserAgent value

Use the time we can directly copy, put headers in the corresponding User-Agent parameter

UserAgent method to add two

1 headers defined directly a dictionary, then passed to the Request to instantiate an object class, and then passed to the urlopen, the following format:

1
2
3
4
5
6
7
8
9
10
from  urllib  import  request
 
url  =  'http://baidu.com'
 
headers  =  { 'User-Agent' : 'Mozilla/5.0(iPhone;U;CPUiPhoneOS4_3_3likeMacOSX;en-us)AppleWebKit/533.17.9(KHTML,likeGecko)Version/5.0.2Mobile/8J2Safari/6533.18.5' }
 
req  =  request.Request(url,headers = headers)
response  =  request.urlopen(req)
 
print (response.read().decode())

2. Use the add_header () method

 
 
1
2
3
4
5
6
7
8
9
from  urllib  import  request
 
url  =  'http://baidu.com'
 
req  =  request.Request(url)
req.add_header( 'User-Agent' , 'Mozilla/5.0(iPhone;U;CPUiPhoneOS4_3_3likeMacOSX;en-us)AppleWebKit/533.17.9(KHTML,likeGecko)Version/5.0.2Mobile/8J2Safari/6533.18.5' )
response  =  request.urlopen(req)
 
print (response.read().decode())

Guess you like

Origin www.cnblogs.com/mxk123/p/12006966.html