urllib: sending data and dealing with anti-crawler measures ---------- Python crawler learning notes

------------------ Transmitting data with urllib ------------------
The urlopen method cannot attach extra headers or a cookie directly, so before calling urlopen we instantiate a Request object; that object is where we add the cookie, set the headers, and attach the data to send:
urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
Parameter description:
url: the URL to request. This is the only required parameter; the others are optional.
data: must be of bytes (byte stream) type. It can be produced with the urlencode() method of the urllib.parse module, followed by encoding.
headers: a dictionary; the headers can be set through this parameter directly when constructing the Request. It usually holds the User-Agent (identifier) and the Cookie (temporary identity card).
origin_req_host: the host name or IP address of the requesting party.
unverifiable: indicates whether the request is unverifiable; the default is False. It means the user did not have the opportunity to approve the request (for example, an image fetched automatically while a page loads).
method: a string indicating the method the request uses, such as GET, POST, or PUT.
The most commonly used parameters are headers, data, and method.
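To make those three concrete, here is a minimal sketch that sets all of them at once (httpbin.org/post is only an assumed public test endpoint, not from the original article):
import urllib.parse
from urllib import request

url = 'http://httpbin.org/post'  # assumed test endpoint that echoes what it receives
head = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'}
# data must be bytes: urlencode first, then encode to utf-8
data = urllib.parse.urlencode({'wd': 'python'}).encode('utf-8')
req = request.Request(url, data=data, headers=head, method='POST')
rep = request.urlopen(req)
print(rep.getcode())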
GET transmits data by linking it onto the URL, so we need to concatenate the encoded query string onto the URL; the data has to be encoded before it can be spliced on.
urllib.parse provides the urlencode() method for encoding content and parse_qs() for decoding encoded content.
 
Example:
data={'wd':'python'}
data=urllib.parse.urlencode(data)
data
'wd=python'
data=urllib.parse.parse_qs(data)
data
{'wd': ['python']}
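Note that parse_qs returns every value as a list, because a query string may repeat a key. When re-encoding such a dict, urlencode takes doseq=True; a small sketch extending the example above:
import urllib.parse
decoded = urllib.parse.parse_qs('wd=python&wd=urllib')
decoded
{'wd': ['python', 'urllib']}
urllib.parse.urlencode(decoded, doseq=True)
'wd=python&wd=urllib'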
 
Example:
url = 'https://www.baidu.com/s?'
import urllib.parse
from urllib import request
data={'wd':'python'}
data=urllib.parse.urlencode(data)
url=url+data
a = request.Request(url=url)
a.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
rep=request.urlopen(a)
rep.read()
 
POST data transmission: the data sent by POST also needs to be encoded, and additionally converted to bytes.
Example:
import urllib.parse
from urllib import request
# the url is taken from a packet-capture tool
url = 'http://**.**.com/login/'
req = request.Request(url)
# headers can be set with add_header, or added when instantiating request.Request
req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11')
data = {'username': 'Melon', 'password': 'passwd', 'authcode': '1234'}
# POST data must be bytes: urlencode, then encode
encoded_data = urllib.parse.urlencode(data).encode('utf-8')
# add_data() no longer exists in Python 3, so pass the data to Request instead
req = request.Request(url, encoded_data)
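To actually transmit the POST and inspect the result (assuming the masked login url above were real), the response object exposes the standard HTTPResponse methods; a short sketch:
rep = request.urlopen(req)        # sends the POST, since req carries data
rep.getcode()                     # HTTP status code, e.g. 200
rep.geturl()                      # final URL after any redirects
rep.info()                        # the response headers
rep.read().decode('utf-8')        # the response body as text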

 

------------------ urllib solutions to common anti-crawler measures ------------------

The usual way to deal with anti-crawler measures is to set, in headers (the request headers), the User-Agent (identifier), the Cookie (temporary identity card), and the Referer (where the request came from). Some sites also verify the IP address or count the requests coming from it; those checks can be handled by setting a proxy IP.

Headers can be added as a dictionary when the object is instantiated, or after instantiation via add_header().
 
Setting the User-Agent (identifier) in urllib
req = request.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11')
rep = request.urlopen(req)
rep.read()
# User-Agent can be set with add_header, or added when instantiating request.Request
head = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'}
a = request.Request(url, headers=head)
rep = request.urlopen(a)
rep.read()

 

Setting the Referer (where the request came from) in urllib
req = request.Request(url)
req.add_header('Referer', 'http://www.xxx.com/')
rep = request.urlopen(req)
rep.read()
# Referer can be set with add_header, or added when instantiating request.Request
head = {'Referer': 'http://www.xxx.com/'}
a = request.Request(url, headers=head)
rep = request.urlopen(a)
rep.read()

 

Setting the Cookie (temporary identity card) in urllib
Way one: passively add a Cookie header to the headers, added the same way as the User-Agent and Referer above, either when instantiating the object or after instantiation. The following simulates visiting a cnblogs blog page after logging in (passive).

Example:
url = 'https://www.cnblogs.com/lcyzblog/'
from urllib import request
req = request.Request(url)
req.add_header('Cookie', '_ga=GA1.2.439597965.1564067765; ....key parameters omitted')
rep = request.urlopen(req)
rep.read().decode('utf-8')
# the Cookie header can be set with add_header, or added when instantiating request.Request
head = {'Cookie': '_ga=GA1.2.439597965.1564067765; ....key parameters omitted'}
a = request.Request(url, headers=head)
rep = request.urlopen(a)
rep.read().decode('utf-8')

 

Way two: use the http.cookiejar module to actively fetch cookies when visiting or logging in.
Classes in http.cookiejar:
CookieJar: manages HTTP cookie values, stores the cookies generated by HTTP requests, and adds cookies to outgoing HTTP requests. The cookies are kept entirely in memory, so they are lost once the CookieJar instance is garbage-collected.

FileCookieJar(filename, delayload=None, policy=None): derived from CookieJar; creates a FileCookieJar instance that can retrieve cookie information from, and store cookies in, a file. filename is the name of the file the cookies are stored in. delayload=True enables lazy file access: the file is read, or data is stored in it, only when needed.

MozillaCookieJar(filename, delayload=None, policy=None): derived from FileCookieJar; creates a FileCookieJar instance compatible with the Mozilla browser's cookies.txt format.

LWPCookieJar(filename, delayload=None, policy=None): derived from FileCookieJar; creates a FileCookieJar instance compatible with the libwww-perl standard Set-Cookie3 file format.
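The FileCookieJar subclasses above can persist cookies across runs. A minimal sketch of saving and reloading with MozillaCookieJar (the file name cookies.txt is just an assumption for illustration):
from http.cookiejar import MozillaCookieJar
from urllib import request

cj = MozillaCookieJar('cookies.txt')   # assumed file name
opener = request.build_opener(request.HTTPCookieProcessor(cj))
opener.open('http://www.baidu.com/')
# session cookies have no persistent expiry, so keep them explicitly
cj.save(ignore_discard=True, ignore_expires=True)

# in a later run, load the saved cookies back
cj2 = MozillaCookieJar()
cj2.load('cookies.txt', ignore_discard=True, ignore_expires=True)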
 
Fetching cookies via GET generally does not require logging in; only the idea is provided here:
from http.cookiejar import CookieJar
from urllib import request
url = 'http://www.baidu.com/'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.3427.400 QQBrowser/9.6.12513.400'
}
cookie = CookieJar()
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
# wrap the url in a Request so the headers are actually sent
req = request.Request(url, headers=headers)
resp = opener.open(req)
cookie
<CookieJar[Cookie(version=0, name='BAIDUID', value='...omitted
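A CookieJar is iterable, so the captured cookies can be inspected one by one; a small sketch:
# iterate over the cookies the server returned
for c in cookie:
    print(c.name, c.value, c.domain)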

  

Ideas for fetching cookies from a site that requires a POST login ~_~ I could not find a login site without a captcha, so only the idea is provided.
The POST idea is almost the same as GET; the only difference is that what opener.open opens is a request.Request object into which the data to be sent has already been added.
Example:
from urllib import request
import urllib.parse
from http.cookiejar import CookieJar
url = 'http://www.xxx.com'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36'}
data = {'username': 'xxxxxxxxx', 'password': 'xxxxxxx'}
data = urllib.parse.urlencode(data).encode('utf-8')
cookie = CookieJar()
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
# pass headers here as well, otherwise they are never sent
req = request.Request(url, data, headers=headers)
resp = opener.open(req)

 

Summary of the approach:
Step 1: import the corresponding packages and modules.
Step 2: set the url and the counter-anti-crawler parameters, such as the User-Agent in headers... When fetching the cookie with GET there is no need to instantiate a Request; with POST you must instantiate one first and pass in the data to be sent.
Step 3: use CookieJar to instantiate an object, which in these steps we call the cookie object. Then feed it into request's HTTPCookieProcessor(cookie object) to build another object, called the handler object. Then feed that into request's build_opener(handler object) to produce an object for fetching the returned information, called the opener object. For POST, first use request.Request to build an object carrying the data to send; for GET, pass in the url directly.
Last step: use the opener object's open method on the object produced by request.Request, or on a plain url. At this point, accessing the cookie object yields the cookie data returned by the server.
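Tying the four steps together, a compact sketch (the helper name fetch_with_cookies is just for illustration):
from http.cookiejar import CookieJar
from urllib import request
import urllib.parse

def fetch_with_cookies(url, headers=None, data=None):
    # step 3: cookie object -> handler object -> opener object
    cookie = CookieJar()
    handler = request.HTTPCookieProcessor(cookie)
    opener = request.build_opener(handler)
    if data is not None:
        data = urllib.parse.urlencode(data).encode('utf-8')  # POST needs bytes
    req = request.Request(url, data=data, headers=headers or {})
    resp = opener.open(req)  # last step: open the Request
    return resp, cookie

resp, cookie = fetch_with_cookies('http://www.baidu.com/')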
 
 
How to set an IP proxy
First you need to obtain a proxy IP. For testing you can search Baidu for free proxy IPs; in real work it is recommended to buy stable ones.
A proxy consists of two parts: the first is the protocol, the second is the IP address and port number. In Python it is passed in as a dictionary.
from urllib import request
url = "http://www.baidu.com"
# the dictionary key is the protocol the proxy handles; it must match the url's scheme
proxy = {'http': '218.91.112.139:9999'}
proxy_handler = request.ProxyHandler(proxy)
opener = request.build_opener(proxy_handler)
req = request.Request(url)
response = opener.open(req)
response.getcode()
200
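build_opener accepts several handlers at once, so a proxy can be combined with the cookie handling shown earlier; a sketch:
from http.cookiejar import CookieJar
from urllib import request

cookie = CookieJar()
proxy_handler = request.ProxyHandler({'http': '218.91.112.139:9999'})
# one opener that both goes through the proxy and captures cookies
opener = request.build_opener(proxy_handler, request.HTTPCookieProcessor(cookie))
resp = opener.open('http://www.baidu.com')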
