Simulating login and POST/GET requests in Python

When we use a web page, we usually enter data into an input box; the page then sends a GET, POST, or other request to the server, and on success the returned data is displayed in the front end. The following is a brief introduction to Python's requests library.
Python and the requests library must be installed first.
Install requests:

pip install requests

Test URL used in the requests below: url = 'http://www.test.com'

One, GET request

1. No request parameters: request the URL directly to get the data

import requests

url = 'http://www.test.com'  # test URL
result = requests.get(url=url)
print(result.status_code)  # response status code
print(result.url)  # requested URL
print(result.text)  # response body

2. With request parameters: pass them as key-value pairs

result = requests.get(url=url, params={'keyword1': 'val1', 'keyword2': 'val2'})
# Alternatively, build the query string by hand first
# new_url = url + '?keyword1=' + 'val1' + '&keyword2=' + 'val2'
# result = requests.get(url=new_url)

print(result.status_code)  # response status code
print(result.url)  # requested URL, including the query string
print(result.text)  # response body
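
If you do build the URL by hand, the standard library's urllib.parse.urlencode takes care of escaping special characters. A minimal sketch; build_url is a hypothetical helper name, not part of requests:

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a percent-encoded query string."""
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + urlencode(params)


print(build_url('http://www.test.com', {'keyword1': 'val1', 'keyword2': 'val2'}))
# A value with a space and an ampersand is escaped instead of breaking the query
print(build_url('http://www.test.com', {'q': 'a b&c'}))
```

Passing params to requests.get does the same escaping internally, which is why it is usually the safer choice.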

3. With request headers: pass them as key-value pairs

header = {
    'Host': 'test.com',
    'Content-Type': 'application/json; charset=UTF-8',
}
# The keyword argument is headers, not header
result = requests.get(url=url, headers=header)
print(result.status_code)  # response status code
print(result.url)  # requested URL
print(result.text)  # response body

Two, POST request

1. The request body is application/x-www-form-urlencoded

# A dict passed as data is form-encoded by requests
result = requests.post(url=url,
                       data={'keyword1': 'val1', 'keyword2': 'val2'},
                       headers={'Content-Type': 'application/x-www-form-urlencoded'})
print(result.status_code)  # response status code
print(result.url)  # requested URL
print(result.text)  # response body

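2. The request body is multipart/form-data

# Pass the fields through files as (filename, value) tuples with filename None,
# so that requests builds the multipart body and sets the Content-Type header
# together with its boundary. Setting 'multipart/form-data' by hand while
# passing data= sends a form-encoded body under a header that has no boundary.
result = requests.post(url=url, files={
    'keyword1': (None, 'val1'), 'keyword2': (None, 'val2')})
print(result.status_code)  # response status code
print(result.url)  # requested URL
print(result.text)  # response body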
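
To see why a multipart request needs more than a bare 'multipart/form-data' header, it helps to look at the body itself. The sketch below builds one by hand with the standard library; encode_multipart is a hypothetical helper for illustration, and requests normally produces the equivalent body for you when fields are passed through files:

```python
import uuid


def encode_multipart(fields):
    """Return (body, content_type) for simple text fields."""
    boundary = uuid.uuid4().hex  # random separator between the parts
    lines = []
    for name, value in fields.items():
        lines.append('--' + boundary)
        lines.append('Content-Disposition: form-data; name="{}"'.format(name))
        lines.append('')
        lines.append(str(value))
    lines.append('--' + boundary + '--')  # closing delimiter
    lines.append('')
    body = '\r\n'.join(lines).encode('utf-8')
    # The server can only split the parts if the header names the boundary
    content_type = 'multipart/form-data; boundary=' + boundary
    return body, content_type


body, content_type = encode_multipart({'keyword1': 'val1', 'keyword2': 'val2'})
print(content_type)
print(body.decode('utf-8'))
```

The boundary appears both in the header and between the parts, so the header and the body have to be generated together.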

3. The request body is application/json

import json

data = {'keyword1': 'val1', 'keyword2': 'val2'}
json_data = json.dumps(data)  # serialize the dict to a JSON string
result = requests.post(url=url, data=json_data,
                       headers={'Content-Type': 'application/json'})
print(result.status_code)  # response status code
print(result.url)  # requested URL
print(result.text)  # response body

As shown in the figure below:
[Figure: screenshot not reproduced]
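
The form-encoded and JSON content types differ in how the same dictionary is serialized into the request body. A small offline comparison using only the standard library:

```python
import json
from urllib.parse import urlencode

data = {'keyword1': 'val1', 'keyword2': 'val2'}

form_body = urlencode(data)   # body for application/x-www-form-urlencoded
json_body = json.dumps(data)  # body for application/json

print(form_body)
print(json_body)
```

The server decides how to parse the body from the Content-Type header, so the header and the serialization have to match.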
Here is a pitfall I ran into earlier: the request method was POST, and the body was sent as a Request Payload.
At first I assumed it worked like form data, so I passed only the URL and the data, without serializing the data to JSON. The result was status 415 (Unsupported Media Type): the server could not process the media format of the request body. After looking into it, I changed the body format and the request headers, and the data came back as expected.
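
The 415 behaviour can be reproduced offline. The sketch below is not the server from this article: JsonOnlyHandler is a made-up local endpoint that rejects anything but application/json, and the client uses the standard library's urllib instead of requests so that it runs without extra packages.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonOnlyHandler(BaseHTTPRequestHandler):
    """Made-up endpoint that only accepts JSON request bodies."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length)
        content_type = self.headers.get('Content-Type', '')
        if not content_type.startswith('application/json'):
            self.send_response(415)  # Unsupported Media Type
            self.send_header('Content-Length', '0')
            self.end_headers()
            return
        reply = json.dumps({'received': json.loads(body)}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Empty ProxyHandler: ignore any system proxy when talking to 127.0.0.1
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post(url, body, content_type):
    """POST raw bytes and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={'Content-Type': content_type})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


server = HTTPServer(('127.0.0.1', 0), JsonOnlyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:{}/products'.format(server.server_port)

payload = {'keyword1': 'val1', 'keyword2': 'val2'}
form_status = post(url, b'keyword1=val1&keyword2=val2',
                   'application/x-www-form-urlencoded')
json_status = post(url, json.dumps(payload).encode('utf-8'),
                   'application/json; charset=UTF-8')
server.shutdown()
server.server_close()
print(form_status, json_status)
```

The same payload is rejected when sent form-encoded and accepted once both the body and the Content-Type header say JSON.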
The complete demo is as follows:

import json
import requests

url = 'http://test.com/products'
# Request payload
payload_data = {'keyword1': 'val1', 'keyword2': 'val2'}
# Request headers
payload_header = {
    'Host': 'test.com',
    'Content-Type': 'application/json; charset=UTF-8',
}
# Download timeout in seconds
timeout = 30
# Proxy IPs: the proxies dict takes one entry per lowercase scheme, so pick one of
# http://210.22.5.117:3128, http://163.172.189.32:8811, http://180.153.144.138:8800
# proxy_list = {'http': 'http://210.22.5.117:3128'}
json_data = json.dumps(payload_data)
# allow_redirects controls whether redirects are followed
# result = requests.post(url=url, data=json_data, headers=payload_header, timeout=timeout, proxies=proxy_list, allow_redirects=True)

result = requests.post(url, data=json_data, headers=payload_header, timeout=timeout, allow_redirects=True)
# Passing the dict through the json parameter also works; requests serializes it itself
# result = requests.post(url, json=payload_data, headers=payload_header)
print("Elapsed: {0}, status code: {1}, result: {2}".format(result.elapsed, result.status_code, result.text))
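
One detail about the json parameter of requests.post: it serializes whatever it is given, so it expects the dict, not a string that has already been through json.dumps. The offline sketch below uses only the json module to show what the server would decode in each case:

```python
import json

payload_data = {'keyword1': 'val1', 'keyword2': 'val2'}
json_data = json.dumps(payload_data)

# data=json_data sends json_data unchanged: the server decodes an object
once = json.loads(json_data)
# json=json_data would serialize the string a second time: the server
# decodes a plain string that merely contains JSON text
twice = json.loads(json.dumps(json_data))

print(type(once).__name__)
print(type(twice).__name__)
```

So use either data=json_data or json=payload_data, but do not combine json= with an already serialized string.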

Three, simulating login before sending a POST request

Sometimes you want to automate small operations on a page; for example, after logging in, you need to send an AJAX request to modify some data. If only a handful of records change, doing it by hand in the front end is faster. For large-scale changes, it is better to have a program loop over the records and apply them.
Login page:
[Figure: the login page]
First press F12 to open the browser's developer tools, then enter any data in the form above and click login. It does not matter that the credentials are wrong; we only want to see the format of the data that the login request submits, as shown below:
[Figure: form data submitted by the login request]

Some of the submitted values are hidden fields that we did not enter ourselves. They come from the form in the page source: right-click, view the page source, and search for the values of "__VIEWSTATE", "__VIEWSTATEGENERATOR", and "__EVENTVALIDATION", for example:
[Figure: hidden form fields in the page source]

In other words, we have to visit the source code of the page in advance and parse to obtain the above attribute values:

import requests
import urllib.request, urllib.parse, http.cookiejar
import lxml.html

login_url = "http://test.com/Login.aspx"
response = urllib.request.urlopen(login_url)
f = response.read()  # raw HTML of the login page
doc = lxml.html.document_fromstring(f)

# Each xpath call returns a list of the matching value attributes
VIEWSTATE = doc.xpath("//input[@id='__VIEWSTATE']/@value")
VIEWSTATEGENERATOR = doc.xpath("//input[@id='__VIEWSTATEGENERATOR']/@value")
EVENTVALIDATION = doc.xpath("//input[@id='__EVENTVALIDATION']/@value")
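
If lxml is not available, the hidden fields can also be collected with the standard library's html.parser. This is an alternative sketch, not the approach used in this article; HiddenInputParser is a hypothetical class name, and sample_html is a made-up fragment with placeholder values:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get('type') or '').lower() == 'hidden'
        if is_hidden and attributes.get('name'):
            self.fields[attributes['name']] = attributes.get('value') or ''


# Made-up page fragment; the values are placeholders, not real ASP.NET state
sample_html = '''
<form method="post" action="Login.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="state-abc" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="gen-123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="valid-xyz" />
  <input type="text" name="TextAdminName" />
</form>
'''

parser = HiddenInputParser()
parser.feed(sample_html)
print(parser.fields)
```

Collecting every hidden input, rather than three named ones, also picks up any additional hidden field the page may add later.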

After getting these values, put them back into the form data:

import urllib.parse

login_data = urllib.parse.urlencode({
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': VIEWSTATE[0],
    '__VIEWSTATEGENERATOR': VIEWSTATEGENERATOR[0],
    '__EVENTVALIDATION': EVENTVALIDATION[0],
    'TextCustomerID': 'real merchant ID',
    'TextAdminName': 'real user name',
    'TextPassword': 'real password',
    'btnLogin.x': 40,
    'btnLogin.y': 10
}).encode('utf-8')  # urllib needs the request body as bytes

Encoding the login parameters matters: urllib sends the request body as bytes, so the urlencoded string must be encoded (here as UTF-8). Otherwise the following error is raised:

Traceback (most recent call last):
  File "c:\users\user\appdata\local\programs\python\python38\lib\http\client.py", line 965, in send
    self.sock.sendall(data)
  File "c:\users\user\appdata\local\programs\python\python38\lib\ssl.py", line 1201, in sendall
    with memoryview(data) as view, view.cast("B") as byte_view:
TypeError: memoryview: a bytes-like object is required, not 'str'
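
The difference between the two types is easy to check without sending anything. In this sketch the form values are placeholders, and the Request object is only constructed, never opened:

```python
import urllib.parse
import urllib.request

# Placeholder credentials, for illustration only
form = {'TextAdminName': 'demo user', 'TextPassword': 'secret', 'btnLogin.x': 40}

encoded = urllib.parse.urlencode(form)  # str: not accepted as a request body
body = encoded.encode('utf-8')          # bytes: what urllib can send

# Supplying data makes urllib treat the request as a POST
request = urllib.request.Request('http://test.com/Login.aspx', data=body)

print(encoded)
print(type(encoded).__name__, type(body).__name__, request.get_method())
```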

With the form data ready, the next step is to collect the request headers:
[Figure: request headers of the login request]

headers = {
    'Host': 'www.test.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'http://www.test.com',
    'Connection': 'keep-alive',
    'Referer': 'http://www.test.com/Login.aspx',
    # Header values should be strings; requests rejects an integer here
    'Upgrade-Insecure-Requests': '1'
}

Simulate login and save cookies:

# Build the simulated login request
login_request = urllib.request.Request(login_url, login_data, headers)
# Create a cookie jar so that the login persists across requests
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
login_result = opener.open(login_request)
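
How the cookie jar keeps the login alive can be tried against a throwaway local server. Everything about the server here is made up for the demonstration: LoginHandler sets a fixed session cookie on any POST and returns 403 on GET when the cookie is missing.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoginHandler(BaseHTTPRequestHandler):
    """Made-up site: POST /login sets a cookie, GET /data requires it."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)  # consume the form body; not validated here
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=demo-token; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        logged_in = 'session=demo-token' in self.headers.get('Cookie', '')
        self.send_response(200 if logged_in else 403)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def status_of(opener, url, data=None):
    """Open url with the given opener and return the HTTP status code."""
    try:
        with opener.open(url, data=data, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


server = HTTPServer(('127.0.0.1', 0), LoginHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:{}'.format(server.server_port)

no_proxy = urllib.request.ProxyHandler({})  # ignore any system proxy
cj = http.cookiejar.CookieJar()
with_cookies = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor(cj))
without_cookies = urllib.request.build_opener(no_proxy)

login_status = status_of(with_cookies, base + '/login', data=b'TextAdminName=demo')
kept_status = status_of(with_cookies, base + '/data')     # cookie sent back
lost_status = status_of(without_cookies, base + '/data')  # no cookie jar
server.shutdown()
server.server_close()
print(login_status, kept_status, lost_status)
```

Only the opener that owns the cookie jar stays logged in, which is why later requests should go through the same opener (or pass the jar along explicitly).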

Once the simulated login has succeeded, you can collect data from a page by opening its link with opener.open (a plain urllib.request.urlopen call does not send the saved cookies) and reading the page source. If a batch of data needs to be processed through POST/GET requests, fetch the data to be processed first, then loop over it and send a POST or GET request for each item:

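import time, random

datas = [...]  # the items to process (elided in the original)
for data in datas:
    # GET: send the item as query parameters
    response = requests.get(url, headers=headers, params=data, cookies=cj)
    # or POST: send the item as form data
    # response = requests.post(url, headers=headers, data=data, cookies=cj)
    time.sleep(random.randint(3, 5))  # pause 3 to 5 seconds between requests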
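
The loop can be wrapped in a small function so that the sending step is replaceable and the delay is adjustable. This is a sketch under assumptions: process_batch and fake_send are hypothetical names, and fake_send stands in for a real call such as requests.post(url, headers=headers, data=item, cookies=cj) so that the example runs offline.

```python
import random
import time


def process_batch(items, send, min_delay=3.0, max_delay=5.0, sleep=time.sleep):
    """Call send(item) for every item, pausing a random time between calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Random pause so the requests do not arrive in a tight burst
            sleep(random.uniform(min_delay, max_delay))
        results.append(send(item))
    return results


def fake_send(item):
    """Stand-in sender: returns a pretend status code instead of a response."""
    return 200 if item.get('keyword1') else 400


statuses = process_batch(
    [{'keyword1': 'val1'}, {'keyword1': ''}],
    fake_send,
    min_delay=0.0,  # no waiting in the demo
    max_delay=0.0,
)
print(statuses)
```

Collecting the results makes it easy to retry only the items that failed.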

Origin blog.csdn.net/Lin_Hv/article/details/105137605