I. Sending a POST request
    import requests

    url = ""
    # Send a POST request
    data = {}
    response = requests.post(url, data=data)
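When data is a dict, requests sends it as a form-encoded body (Content-Type: application/x-www-form-urlencoded). The sketch below uses only the standard library's urllib.parse.urlencode to show roughly what that body looks like on the wire. The field names and values are hypothetical, and this illustrates the encoding rather than reproducing requests' own code.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields; a real site defines its own names.
form_data = {"username": "alice", "comment": "hello world & more"}

# Form-encode the dict the way an HTML form submission is encoded.
body = urlencode(form_data)
print(body)

# Decoding the body recovers the original fields.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
print(decoded)
```

Spaces become "+" and reserved characters such as "&" are percent-encoded, which is why the server can split the body back into fields unambiguously.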
II. Intranet authentication
    import requests

    # user, pwd, and url must be defined beforehand
    auth = (user, pwd)
    response = requests.get(url, auth=auth)
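A (user, password) tuple passed as auth is treated by requests as HTTP Basic authentication. As a minimal sketch of what that means, the standard-library code below builds the Authorization header value for ASCII credentials. The function name basic_auth_header and the credentials are hypothetical.

```python
import base64

def basic_auth_header(user: str, pwd: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{pwd}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Hypothetical credentials, for illustration only.
header_value = basic_auth_header("user", "pwd")
print(header_value)
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. Basic authentication is only safe over HTTPS or on a trusted network.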
III. Proxies
    import requests

    url = "http://www.baidu.com"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
    }
    # Map each URL scheme to the proxy that should handle it
    free_proxy = {'http': '118.187.58.35:53281'}
    response = requests.get(url, headers=headers, proxies=free_proxy)
    data = response.content.decode()
    with open("proxy.html", "w", encoding="utf-8") as f:
        f.write(data)
IV. Third-party CA certificates
As covered on day 1, the difference between HTTP and HTTPS is that HTTPS relies on authentication by a certificate issued by a third-party CA. Some sites use HTTPS without a CA-issued certificate, however (they may issue their own, as 12306 used to), and a request to such a site fails with a certificate verification error (requests raises requests.exceptions.SSLError).
Solution: tell requests to skip certificate verification.
The code is as follows:
    import requests

    url = 'https://10.10.10.9/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
    }
    # verify=False skips certificate verification for this request
    response = requests.get(url=url, headers=headers, verify=False)
    data = response.content.decode()
    with open('03-ssl.html', 'w', encoding='utf-8') as f:
        f.write(data)
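To make concrete what "ignoring the certificate" switches off, here is a sketch that uses the standard library's ssl module instead of requests. A default context checks that the server certificate chains to a trusted CA and matches the host name, and the second context disables both checks. This is only an analogy for verify=False, not a description of how requests is implemented.

```python
import ssl

# Default context: the server certificate must chain to a trusted CA
# and match the requested host name.
strict_ctx = ssl.create_default_context()

# Relaxed context: both checks disabled, so a self-signed certificate is accepted.
# check_hostname must be switched off before verify_mode can be set to CERT_NONE.
relaxed_ctx = ssl.create_default_context()
relaxed_ctx.check_hostname = False
relaxed_ctx.verify_mode = ssl.CERT_NONE

print(strict_ctx.verify_mode, strict_ctx.check_hostname)
print(relaxed_ctx.verify_mode, relaxed_ctx.check_hostname)
```

With verification off, the connection is still encrypted but the client can no longer tell the real server from an impostor, so this setting suits hosts you already trust, such as an internal test server.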
V. Cookies
1. Cookie workflow:
When user A visits a site that uses cookies, the site's server generates a unique identification code for A and creates an entry in its back-end database indexed by that code. It then adds a header line named Set-cookie to the HTTP response sent to A. The header field name is "Set-cookie", and the value that follows is the identification code assigned to the user. For example, the header line might be:
Set-cookie:31d4d96e407aad42
When A receives the response, the browser adds a line to the cookie file it manages, recording the server's host name and the identification code that followed Set-cookie. As A continues to browse the site, for every HTTP request it sends, the browser looks up the site's identification code in its cookie file and puts it in a Cookie header line of the HTTP request message:
Cookie:31d4d96e407aad42
As a result, the site can track the activity of user 31d4d96e407aad42 (user A) on the site. Note that the server does not need to know the user's real name or other personal information. It can, however, know which pages user 31d4d96e407aad42 visited, at what times, and in what order. If A is shopping online, the server can keep a list of the items A has selected so that A can pay for them all together at the end.
If A visits the site again a few days later, the browser will again send the header line Cookie: 31d4d96e407aad42 in its HTTP requests, and the site can recommend products based on A's past visits. If A has registered and paid by credit card, the site will have stored A's name, e-mail address, and credit card number. When A shops on the site again from the same computer, the browser's HTTP requests carry the same Cookie header line, so the server can use the cookie to recognize user A, and A does not have to retype the name, credit card number, and other details. This is clearly convenient for the customer.
Although cookies simplify online shopping, their use has been controversial. Some people believe that cookies bring computer viruses onto the user's computer. This is a misunderstanding: a cookie is just a small text file, not an executable program, so it cannot spread viruses or read information from the user's hard drive. The other controversy concerns user privacy. For example, a site's server holds some information about A and could sell it to third parties. Cookies can also be used to collect data on user behavior on a site. All of this is personal information. To reassure customers, some sites state publicly that they will protect customer privacy and will not sell or transfer identification codes or personal information to other companies.
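The exchange described above can be sketched as a small in-memory simulation using the standard library's http.cookies module. Everything here is an assumption made for illustration: the cookie name id, the handle_request function, and the visits dict that stands in for the server's database. Real sites use their own names and storage.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side store: identification code -> pages visited (stands in for the database).
visits = {}

def handle_request(path, cookie_header=None):
    """Return (response_headers, user_id) for one simulated request."""
    cookie = SimpleCookie(cookie_header or "")
    response_headers = []
    if "id" in cookie and cookie["id"].value in visits:
        user_id = cookie["id"].value        # returning visitor
    else:
        user_id = secrets.token_hex(8)      # new visitor: issue a code
        visits[user_id] = []
        response_headers.append(f"Set-Cookie: id={user_id}")
    visits[user_id].append(path)
    return response_headers, user_id

# First visit: no Cookie header, so the server issues one.
headers, user_id = handle_request("/index")
# The browser stores the value and sends it back on later requests.
stored = headers[0].split(": ", 1)[1]       # "id=<code>"
later_headers, same_id = handle_request("/cart", cookie_header=stored)

print(headers, later_headers, visits[user_id])
```

The second request carries the stored cookie, so the server recognizes the same user, sends no new Set-Cookie header, and appends the page to that user's history.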
2. Two ways to use cookies
(1) Obtain the cookie manually, then add it to the request
In the requests module, cookies are passed as a dictionary or a CookieJar object.
The cookie copied from a captured request is usually a string, and converting it to a dict by hand is tedious (one option is a regular-expression find-and-replace in Sublime Text, then copying the resulting dictionary over). The code below does the conversion directly in Python and produces the cookie as a dict:
    import requests

    member_url = 'https://www.yaozh.com/member/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
    }
    # The cookie string copied from the browser
    cookies = 'acw_tc=2f624a2a15635259218338401e63e5d881869f24ac893d6eef3e3a125160f6; _ga=GA1.2.217274285.1563525921; _gid=GA1.2.2017411935.1563525921; _gat=1; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1563611476; yaozh_userId=786936; UtzD_f52b_saltkey=vG51G41g; UtzD_f52b_lastvisit=1563607885; yaozh_uidhas=1; acw_tc=2f624a2a15635259218338401e63e5d881869f24ac893d6eef3e3a125160f6; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1563522881%2C1563523792%2C1563525921%2C1563611474; UtzD_f52b_ulastactivity=1563523048%7C0; UtzD_f52b_creditnotice=0D0D2D0D0D0D0D0D0D696235; UtzD_f52b_creditbase=0D0D0D0D0D0D0D0D0; UtzD_f52b_creditrule=%E6%AF%8F%E5%A4%A9%E7%99%BB%E5%BD%95; _ga=GA1.1.2067661254.1563611505; _gid=GA1.1.484577145.1563611505; MEIQIA_VISIT_ID=1OGl2skSTjPKvqcpWkd9Il9EqnL; yaozh_logintime=1563611638; yaozh_user=786936%09jjfeng123; db_w_auth=696235%09jjfeng123; UtzD_f52b_lastact=1563611638%09uc.php%09; UtzD_f52b_auth=9e94yt5nXd1XEO2zCaqRPMm3nVIP4co2Z5Rt3At8%2BvqcTkbtetREmvBHW5EhMJEd5tAFnnwR6GelOYZc53%2F6GPrQcJo; MEIQIA_VISIT_ID=1OGl2skSTjPKvqcpWkd9Il9EqnL; PHPSESSID=c1395bscfl2h36ksdoverjk9o5; yaozh_mylogin=1563690804'

    # Method 1: str -> dict with a loop
    # cook_dict = {}
    # cookies_list = cookies.split('; ')
    # for cookie in cookies_list:
    #     cook_dict[cookie.split('=', 1)[0]] = cookie.split('=', 1)[1]

    # Method 2: dict comprehension (split on the first '=' only)
    cook_dict = {cookie.split('=', 1)[0]: cookie.split('=', 1)[1] for cookie in cookies.split('; ')}

    response = requests.get(member_url, headers=headers, cookies=cook_dict)
    data = response.content
    print(type(data))
    with open('03-cookie.html', 'wb') as f:
        f.write(data)
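One pitfall when converting a cookie string by hand: splitting a pair on every "=" loses part of any value that itself contains "=". The sketch below splits each pair on the first "=" only, using str.partition. The helper name cookie_string_to_dict and the sample string are hypothetical.

```python
def cookie_string_to_dict(cookie_string):
    """Convert a 'k1=v1; k2=v2' Cookie header value into a dict."""
    result = {}
    for pair in cookie_string.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:                      # skip empty or malformed fragments
            result[name] = value     # a repeated name keeps the last value
    return result

# Hypothetical cookie string; note the '=' inside the token value.
sample = "user_id=786936; token=abc=def==; _gat=1"
cook_dict = cookie_string_to_dict(sample)
print(cook_dict)
```

The resulting dict can be passed to requests through the cookies parameter in the same way as the hand-built one.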
(2) Use a session. A session automatically saves the cookies set at login and sends them with later requests to the site. For example:
    import requests

    # URL of the data to request
    member_url = 'https://www.yaozh.com/member/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
    }
    # A session object saves cookies automatically, much like a CookieJar
    session = requests.session()

    # 1. Log in
    login_url = 'https://www.yaozh.com/login'
    login_form_data = {
        'username': 'J',
        'pwd': 'Y',
        'formhash': 'EAACD4636B',
        'backurl': 'https%3A%2F%2Fwww.yaozh.com%2F',
    }
    # Send the login request
    login_response = session.post(login_url, data=login_form_data, headers=headers)
    print(login_response.content.decode())

    # 2. After a successful login, request the target data with the now-valid cookies
    data = session.get(member_url, headers=headers).content
    with open('04-cookie2.html', 'wb') as f:
        f.write(data)
VI. Regular expression matching (parsing data)
(1) In Python, regular expression matching is greedy by default: it matches as much as possible while still satisfying the pattern. Solution: the non-greedy operator "?". Placing "?" after "*" or "+" makes the match as short as possible.
'.' matches any character except the newline \n (pass re.S to make it match newlines as well).
Note: findall returns its results as a list. Matching is also strictly case-sensitive (re.I makes it ignore case).
    import re

    # '.' matches any character except the newline \n
    one = """
    mjfkajndgkdjfgkdfgn
    sdjfaksjdfasdjfsdkfsdnN
    """
    pattern = re.compile(r'm(.*)n')
    result = pattern.findall(one)
    print(result)
At this point the match result is ['jfkajndgkdjfgkdfg'] (the greedy result). The n on the next line is not matched, because "." does not match a newline.
If line 8 of the code (the re.compile line) is changed to pattern = re.compile(r'm(.*?)n'), the result is ['jfkaj'].
If line 8 is changed to
pattern = re.compile(r'm(.*)n', re.S)
The matching result is: ['jfkajndgkdjfgkdfgn\nsdjfaksjdfasdjfsdkfsd']
If line 8 is changed to
    pattern = re.compile(r'm(.*)n', re.S | re.I)  # "|" combines flags; re.I ignores case
The matching result is: ['jfkajndgkdjfgkdfgn\nsdjfaksjdfasdjfsdkfsdn']
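The same three behaviors (greedy versus non-greedy, re.S, and re.I) show up constantly when pulling text out of HTML. The following is a minimal sketch on made-up markup. The strings and variable names are hypothetical and chosen only to make the differences visible.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the LAST </b>, swallowing the tags in between.
greedy = re.findall(r"<b>(.*)</b>", html)
# Non-greedy: .*? stops at the FIRST </b>, so each element matches separately.
lazy = re.findall(r"<b>(.*?)</b>", html)

multiline = "<p>line1\nline2</p>"
# Without re.S, '.' cannot cross the newline, so nothing matches.
no_dotall = re.findall(r"<p>(.*?)</p>", multiline)
# re.S lets '.' match '\n'; re.I lets the upper-case pattern match lower-case tags.
dotall_ignorecase = re.findall(r"<P>(.*?)</P>", multiline, re.S | re.I)

print(greedy, lazy, no_dotall, dotall_ignorecase)
```

For scraping, the non-greedy form combined with re.S is usually the one you want, since page content often spans several lines.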
(2) match, search, findall, sub, split
a. match: matches from the beginning of the string, and only once
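    import re

    one = 'abc 123'
    patter = re.compile(r'\d+')
    # match: matches from the start of the string
    result = patter.match(one)  # type: <class '_sre.SRE_Match'> (re.Match in Python 3.7+), or None if nothing matches
    print(result.group() if result else None)
Result: None. The string starts with "abc", not a digit, so match finds nothing and returns None.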
b. search: scans from any position and returns the first match
If line 6 of the code is changed to
    result = patter.search(one)  # type: <class '_sre.SRE_Match'> (re.Match in Python 3.7+)
Result: 123
c. findall: finds all content that matches the pattern and returns a list
If line 6 is changed as follows (the result is a list, so line 7 must also be changed to print(result))
    result = patter.findall(one)  # type: <class 'list'>
Result: ['123']
d. sub: replaces matches in a string
If line 6 is changed as follows (line 7 must also be changed to print(result))
    result = patter.sub('#', one)  # type: <class 'str'>; a string has no group() method
Result: abc #
e. split: splits a string
    import re

    one = 'abc 123'
    # patter = re.compile(r'\d+')

    # split: break the string apart at each match of the pattern
    patter = re.compile(' ')
    result = patter.split(one)  # type: <class 'list'>
    print(result)
Result:
['abc', '123']
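For a side-by-side comparison, the five methods can be run on the same input in one script. This is a sketch that reuses the string 'abc 123' from the examples above. The variable names are hypothetical, and the split step uses a whitespace pattern, which is an assumption rather than the single-space pattern shown earlier.

```python
import re

one = "abc 123"
digits = re.compile(r"\d+")

match_result = digits.match(one)        # None: the string does not start with a digit
search_result = digits.search(one)      # first match anywhere in the string
findall_result = digits.findall(one)    # list of all matches
sub_result = digits.sub("#", one)       # new string with matches replaced
split_result = re.split(r"\s+", one)    # list of pieces between separators

# match succeeds only when the pattern fits at position 0.
match_at_start = digits.match("123 abc")

print(match_result, search_result.group(), findall_result, sub_result, split_result)
print(match_at_start.group())
```

Only match and search return match objects (or None), so only their results have a group() method. findall and split return lists, and sub returns a string.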