Web crawler study notes, day 5

I. Sending a POST request

import requests

url = ""
# Send a POST request
data = {
}
response = requests.post(url, data=data)
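To make the data= argument concrete: when requests.post is given a dict, the fields go out as a form-encoded request body. The sketch below reproduces that encoding with the standard library only, so it runs without a network. The field names and values are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical form fields; a real site defines its own names.
form_data = {"username": "alice", "page": "1"}

# Form-encode the dict the way an HTML form submission would.
body = urlencode(form_data)
print(body)
```

Printing the body this way is a quick check that the keys match what the site's login or search form expects.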

II. Intranet authentication

auth = (user, pwd)  # user and pwd hold the intranet account's username and password
response = requests.get(url, auth=auth)
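Passing a (user, pwd) tuple as auth makes requests use HTTP Basic authentication. As a rough sketch of what that puts on the wire, the header value can be built by hand with the standard library. The credentials are placeholders, and basic_auth_header is a hypothetical helper name, not part of requests.

```python
import base64

def basic_auth_header(user: str, pwd: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{pwd}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# Placeholder credentials for illustration only.
print(basic_auth_header("user", "pass"))
```

Because the token is only base64-encoded, not encrypted, Basic authentication is normally used over https or on a trusted intranet.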

III. Proxies

import requests

url = "http://www.baidu.com"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}
# Map each URL scheme to the proxy that should handle it
free_proxy = {'http': '118.187.58.35:53281'}
response = requests.get(url, headers=headers, proxies=free_proxy)

# Decode the body (UTF-8 by default) and save it
data = response.content.decode()
with open("proxy.html", "w", encoding="utf-8") as f:
    f.write(data)

 

IV. Third-party CA certificate

As noted on day 1, the difference between http and https is that https relies on a certificate issued by a third-party CA. Some sites use https without a CA-issued certificate (they may sign their own, as 12306 used to), so requests to them fail with an SSL certificate verification error.

Solution: tell requests to skip certificate verification (verify=False).

The code is as follows:

import requests

url = 'https://10.10.10.9/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
}

response = requests.get(url=url, headers=headers, verify=False)
data = response.content.decode()

with open('03-ssl.html', 'w') as f:
    f.write(data)
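For intuition about what skipping verification means, here is a standard-library sketch. It is not how requests implements verify=False internally. It builds one TLS context that checks certificates (the default) and one that accepts anything, including self-signed certificates.

```python
import ssl

# Default context: the certificate chain and the host name are both checked.
strict_ctx = ssl.create_default_context()

# Relaxed context: accept any certificate, including self-signed ones.
relaxed_ctx = ssl.create_default_context()
relaxed_ctx.check_hostname = False  # must be turned off before verify_mode
relaxed_ctx.verify_mode = ssl.CERT_NONE

print(strict_ctx.verify_mode, relaxed_ctx.verify_mode)
```

With verification off, nothing proves the server is the one you meant to reach, so this setting fits test or intranet hosts like the one above better than public sites.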

V. Cookie

1. How cookies work:

  When user A browses a site that uses cookies, the site's server generates a unique identification code for A and uses it as an index to create an entry in the server's back-end database. The server then adds a header line called Set-cookie to the HTTP response message it sends to A. Here the header field name is "Set-cookie", and the value that follows is the identification code assigned to the user. For example, such a header line looks like this:

Set-cookie:31d4d96e407aad42

  When A receives this response, A's browser adds a line to the cookie file it manages, containing the server's host name and the identification code that followed Set-cookie. As A continues to browse the site, each time an HTTP request message is sent, the browser looks up the site's identification code in its cookie file and places it in the Cookie header line of the HTTP request message:

Cookie:31d4d96e407aad42

  As a result, the site can track the activity of user 31d4d96e407aad42 (user A) on the site. Note that the server does not need to know the user's real name or other personal details. It can, however, tell which pages user 31d4d96e407aad42 visited, at what times, and in what order. If A is shopping online, the server can maintain a list of the items A has chosen so that A can pay for them all together at the end.

  If A visits the site again a few days later, the browser again puts the header line Cookie: 31d4d96e407aad42 in its HTTP request messages, and the site's server can recommend products based on A's past visits. If A has registered and paid by credit card, the site will have stored A's name, e-mail address, and credit card number. When A shops on the site again from the same computer, the browser's HTTP request messages carry the same Cookie header line, so the server can use the cookie to recognize user A, and A does not have to type in the name, credit card number, and other details again. This is obviously convenient for the customer.

  Although cookies simplify online shopping for users, their use has been controversial. Some people believe that cookies bring computer viruses onto the user's computer. This is a misunderstanding. A cookie is just a small text file, not an executable program, so it cannot spread viruses or read information from the user's hard drive. The other controversy concerns user privacy. For example, a site's server knows some information about A and could sell it to third parties. Cookies can also be used to collect information about a user's behavior on a site. All of this is the user's personal privacy. To reassure customers, some sites state publicly that they protect customer privacy and will not sell or transfer identification codes or personal information to other vendors.
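The Set-cookie and Cookie exchange described above can be sketched in a few lines. This is a toy model using only the standard library, not a real browser. Real cookies are name=value pairs, so the cookie name id and the host shop.example are assumptions added for illustration. Only the identification code comes from the notes.

```python
from http.cookies import SimpleCookie

# Per-host cookie store, standing in for the browser's cookie file.
cookie_file = {}

def on_response(host: str, set_cookie_value: str) -> None:
    """Record the cookies a server sent in a Set-Cookie header."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    store = cookie_file.setdefault(host, {})
    for name, morsel in parsed.items():
        store[name] = morsel.value

def cookie_header(host: str) -> str:
    """Build the Cookie header value for the next request to host."""
    return "; ".join(f"{k}={v}" for k, v in cookie_file.get(host, {}).items())

# The server assigns an identification code in its first response...
on_response("shop.example", "id=31d4d96e407aad42")
# ...and the browser sends it back with every later request to that host.
print("Cookie:", cookie_header("shop.example"))
```

Keying the store by host mirrors the point in the text that the browser records the server's host name next to the identification code, so other sites never receive it.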

2. Two ways to use cookies


(1) Obtain the cookie manually, then add it to the request

The requests module accepts cookies as a dictionary or a CookieJar object.

 

A cookie captured from a packet is usually a string, and converting it to a dict by hand is tedious (one option is a regex find-and-replace in Sublime's graphical interface, then copying the resulting dictionary over). The code below does the conversion in code and produces the cookie as a dict:

import requests

member_url = 'https://www.yaozh.com/member/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
}

# The cookie string
cookies = 'acw_tc=2f624a2a15635259218338401e63e5d881869f24ac893d6eef3e3a125160f6; _ga=GA1.2.217274285.1563525921; _gid=GA1.2.2017411935.1563525921; _gat=1; Hm_lpvt_65968db3ac154c3089d7f9a4cbb98c94=1563611476; yaozh_userId=786936; UtzD_f52b_saltkey=vG51G41g; UtzD_f52b_lastvisit=1563607885; yaozh_uidhas=1; acw_tc=2f624a2a15635259218338401e63e5d881869f24ac893d6eef3e3a125160f6; Hm_lvt_65968db3ac154c3089d7f9a4cbb98c94=1563522881%2C1563523792%2C1563525921%2C1563611474; UtzD_f52b_ulastactivity=1563523048%7C0; UtzD_f52b_creditnotice=0D0D2D0D0D0D0D0D0D696235; UtzD_f52b_creditbase=0D0D0D0D0D0D0D0D0; UtzD_f52b_creditrule=%E6%AF%8F%E5%A4%A9%E7%99%BB%E5%BD%95; _ga=GA1.1.2067661254.1563611505; _gid=GA1.1.484577145.1563611505; MEIQIA_VISIT_ID=1OGl2skSTjPKvqcpWkd9Il9EqnL; yaozh_logintime=1563611638; yaozh_user=786936%09jjfeng123; db_w_auth=696235%09jjfeng123; UtzD_f52b_lastact=1563611638%09uc.php%09; UtzD_f52b_auth=9e94yt5nXd1XEO2zCaqRPMm3nVIP4co2Z5Rt3At8%2BvqcTkbtetREmvBHW5EhMJEd5tAFnnwR6GelOYZc53%2F6GPrQcJo; MEIQIA_VISIT_ID=1OGl2skSTjPKvqcpWkd9Il9EqnL; PHPSESSID=c1395bscfl2h36ksdoverjk9o5; yaozh_mylogin=1563690804' 
# Method 1: str -> dict with a loop
# cook_dict = {}
# cookies_list = cookies.split('; ')
# for cookie in cookies_list:
#     cook_dict[cookie.split('=')[0]] = cookie.split('=')[1]
# Method 2: dict comprehension
cook_dict = {cookie.split('=')[0]: cookie.split('=')[1] for cookie in cookies.split('; ')}

response = requests.get(member_url, headers=headers, cookies=cook_dict)
data = response.content
print(type(data))
with open('03-cookie.html','wb') as f:
    f.write(data)
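One caveat with taking the second piece of a split on '=': a value that itself contains '=' (common with base64-style tokens) is cut short. A slightly more defensive variant is sketched below. cookie_str_to_dict is a hypothetical helper name and the sample string is made up.

```python
def cookie_str_to_dict(cookie_str: str) -> dict:
    """Convert 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    result = {}
    for pair in cookie_str.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty pieces, e.g. after a trailing ';'
        name, _, value = pair.partition("=")  # split on the first '=' only
        result[name] = value
    return result

# Made-up sample; the 'token' value has '=' inside it.
sample = "uid=786936; token=abc==; theme=dark;"
print(cookie_str_to_dict(sample))
```

If a name appears twice in the string, as acw_tc does in the captured cookie above, the later value overwrites the earlier one, the same as in the dict comprehension.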

 

(2) Use a session (log in through a session, which saves cookies automatically, then request the site), as follows:

import requests

# URL of the data to request
member_url = 'https://www.yaozh.com/member/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'
}
# The session class saves cookies automatically, equivalent to a CookieJar
session = requests.session()
# 1. Log in from code
login_url = 'https://www.yaozh.com/login'
login_form_data = {
    'username': 'J',
    'pwd': 'Y',
    'formhash': 'EAACD4636B',
    'backurl': 'https%3A%2F%2Fwww.yaozh.com%2F',
}
# Send the login request
login_response = session.post(login_url, data=login_form_data, headers=headers)
print(login_response.content.decode())
# 2. After a successful login, request the target data with the now-valid cookies
data = session.get(member_url, headers=headers).content

with open('04-cookie2.html','wb') as f:
    f.write(data)

 

VI. Regular expression matching (parsing data)

(1) Regex matching in Python is greedy by default (it matches as much as it can while still satisfying the pattern). Solution: the non-greedy operator "?". Put "?" after "*" or "+" to make the match as short as possible.

'.' matches any character except the newline \n (pass re.S to make it match newlines too)

Note: findall returns its results as a list. Regex matching is also strictly case-sensitive (re.I means ignore case).

 1 import re
 2
 3 # '.' matches any character except the newline \n
 4 one = """
 5     mjfkajndgkdjfgkdfgn
 6     sdjfaksjdfasdjfsdkfsdnN
 7 """
 8 pattern = re.compile(r'm(.*)n')
 9 result = pattern.findall(one)
10 print(result)  # findall returns a list, which has no group() method

At this point the match result is ['jfkajndgkdjfgkdfg'] (the greedy result). The n on the next line is not matched, because '.' does not match a newline.

If line 8 of the code is changed to pattern = re.compile(r'm(.*?)n'), the result is ['jfkaj']

If line 8 of the code is changed to

pattern = re.compile(r'm(.*)n', re.S)

The match result is: ['jfkajndgkdjfgkdfgn\n    sdjfaksjdfasdjfsdkfsd']

If line 8 of the code is changed to

pattern = re.compile(r'm(.*)n', re.S | re.I)  # "|" combines flags; re.I ignores case

The match result is: ['jfkajndgkdjfgkdfgn\n    sdjfaksjdfasdjfsdkfsdn']
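The four variants can also be compared in one run instead of editing line 8 each time. This sketch uses the same letters as the example but drops the indentation inside the string, so the captured text contains only a bare \n between the two lines.

```python
import re

text = "mjfkajndgkdjfgkdfgn\nsdjfaksjdfasdjfsdkfsdnN"

greedy = re.findall(r"m(.*)n", text)                     # as long as possible, one line
lazy = re.findall(r"m(.*?)n", text)                      # as short as possible
dotall = re.findall(r"m(.*)n", text, re.S)               # '.' also matches '\n'
dotall_icase = re.findall(r"m(.*)n", text, re.S | re.I)  # and 'N' counts as 'n'

for label, found in [("greedy", greedy), ("lazy", lazy),
                     ("re.S", dotall), ("re.S | re.I", dotall_icase)]:
    print(f"{label:12} {found!r}")
```

The last two results differ by a single trailing n: with re.I the final N can close the match, so the lowercase n before it becomes part of the captured group.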

(2) match, search, findall, sub, split

a. match: match once, starting from the beginning of the string

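1 import re
2
3 one = 'abc 123'
4 patter = re.compile(r'\d+')
5 # match: match once, from the start of the string
6 result = patter.match(one)  # a match object (<class '_sre.SRE_Match'>), or None if there is no match
7 print(result.group())

Result: match returns None, because the string starts with 'abc' rather than a digit (so result.group() on line 7 raises AttributeError; print(result) shows None)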

b. search: scan from any position, match once

If line 6 of the code is changed to

result = patter.search(one)  # type <class '_sre.SRE_Match'>

Result: 123

c. findall: find everything that matches the pattern, returned as a list

If line 6 is changed as follows (the result is a list, so line 7 is changed to print(result))

result = patter.findall(one)  # type <class 'list'>

Result: ['123']

d. sub: replace the matched text

If line 6 is changed as follows (line 7 is also changed to print(result))

result = patter.sub('#', one)  # type <class 'str'>; a string has no group() method

Result: abc #

e. split: split the string

import re

one = 'abc 123'
# patter = re.compile(r'\d+')
# split: break the string apart at each match
patter = re.compile(' ')
result = patter.split(one)  # <class 'list'>
print(result)

Result:

['abc', '123']
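The five calls can be run side by side on the same string, which makes the differences in return type easier to see. This is a compact restatement of the examples above; only the variable name digits is new.

```python
import re

one = "abc 123"
digits = re.compile(r"\d+")

matched = digits.match(one)          # anchored at the start of the string
searched = digits.search(one)        # first match anywhere in the string
found = digits.findall(one)          # every match, as a list of strings
replaced = digits.sub("#", one)      # every match replaced, as a new string
parts = re.compile(" ").split(one)   # string cut at every match, as a list

print(matched)           # None: the string starts with 'abc'
print(searched.group())  # 123
print(found)             # ['123']
print(replaced)          # abc #
print(parts)             # ['abc', '123']
```

Only match and search return a match object (or None), so only their results need group(). Checking for None before calling group() avoids the AttributeError in the no-match case.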

 

  

 


Origin www.cnblogs.com/jj1106/p/11221218.html