Python crawler learning: request parameters of the requests module (Part 2)

1. Request parameters of the requests module

1. This is the complete explanation of the parameters:


method: the request method, GET or POST
url: the address of the link to request

The three anti-scraping musketeers:
headers: (optional) keyword argument for the request header fields; build a dict
cookies: (optional) keyword argument for passing the cookie fields; build a dict
proxies: (optional) keyword argument for IP proxies; build a dict

The three request-parameter musketeers:
params: (optional) keyword argument for the URL query parameters
data: (optional) keyword argument for form-encoded request parameters
json: (optional) keyword argument for request parameters submitted as JSON; build a dict

timeout: (optional) sets the response timeout in seconds; the program raises an error if it is exceeded; 180 seconds at most
allow_redirects: (optional) whether to allow redirection; by default, if a redirect occurs it is followed automatically; set a boolean
verify: (optional) whether to verify the website's certificate (CA certificate, SSL certificate); defaults to True, so certificates are validated by default, and disabling validation triggers a warning

files: (optional) file uploads
stream: (optional) streaming data, i.e. whether the response is a data stream of continuously updated data (live video, bullet comments, stock quotes)
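
As a quick illustration of the two "musketeer" groups above, here is a minimal sketch that sends a single GET request with headers, cookies, params and a timeout set (the proxy entry is left commented out). All values are made-up placeholders, and httpbin.org is assumed only as an echo service that reflects back what it receives.

import requests

# All of the values below are illustrative placeholders.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',  # pretend to be a browser
}
cookies = {
    'session_id': 'abc123',  # hypothetical cookie name and value
}
proxies = {
    'http': 'http://127.0.0.1:7890',   # hypothetical local proxy
    'https': 'http://127.0.0.1:7890',
}
params = {'q': 'python'}  # appended to the URL as the query string: ?q=python

# httpbin.org echoes the request back, which makes the effect easy to verify
response = requests.get(
    'https://httpbin.org/get',
    headers=headers,
    cookies=cookies,
    # proxies=proxies,  # uncomment only if a proxy is actually running
    params=params,
    timeout=10,
)
print(response.json())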

2. The json request keyword

1. In some cases, a request sent with POST submits its parameters as JSON through the json keyword, for example this address: url='http://www.zfcg.sh.gov.cn/'



import requests

url = 'http://www.zfcg.sh.gov.cn/front/search/category'
json_data = {
    "utm": "sites_group_front.2ef5001f.0.0.07ec2550d86011edb93db70f086e4f9a",
    "categoryCode": "ZcyAnnouncement3012",
    "pageSize": '15',
    "pageNo": '1'
}
# the json keyword submits the request parameters as a JSON string
response = requests.post(url=url, json=json_data)
print(response.json())


2. Remember that these parameters are found under "Request Payload" in the browser's developer tools; if they are not there, there is no need to construct json parameters.
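
To see the difference between the data and json keywords without contacting a real server, you can inspect a prepared request. This is a minimal sketch; the payload and URL are invented for illustration.

import requests

payload = {"pageNo": 1, "pageSize": 15}  # hypothetical parameters

# Prepare (but do not send) the same payload both ways.
form_req = requests.Request("POST", "http://example.com", data=payload).prepare()
json_req = requests.Request("POST", "http://example.com", json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # pageNo=1&pageSize=15
print(json_req.headers["Content-Type"])  # application/json
print(json_req.body)                     # b'{"pageNo": 1, "pageSize": 15}'

In practice: if the developer tools show "Form Data", use the data keyword; if they show a JSON "Request Payload", use the json keyword.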

3. Constructing the cookies keyword

1. In Python, the cookies keyword can be constructed in three ways:

①. Build the Cookie field together with the other fields in the request headers

②. Construct the cookies keyword separately, as a dictionary holding the whole cookie string


# Build the cookies dictionary separately
# cookies = {'Cookie': 'REALTIME_TRANS_SWITCH=1; SOUND_SPD_SWITCH=1; HISTORY_SWITCH=1; FANYI_WORD_SWITCH=1; SOUND_PREFER_SWITCH=1; PSTM=1657895499; BIDUPSID=D26C29435949C22624426B7C5A1F52F3; ab_sr=1.0.1_MDJhNTY0OGI3NzhkNjMxNGE5ZWY3MzNiNGI3OGJiMjRmYjJlNGQ2NThkYjYyNzc5OTllMWEwZWFiMDM5MjBlODYwOWI4Y2M0Zjc5NWNkMGFjNmI5OGM2NDkwOTBmNjAxYzVjZTdiMTc3ZjkxMWQ4ZTM0OWNkYTA0MjA1ZDI4MjE5ZmIyMGJlYjM2MjY2NTBjM2EzNGI5NmIxMDEzYjJmOTFjM2FhNDliYWQ5Y2M5YjdlYWU0MWJhZTU2YzRiYmM3'}

③. Construct each fragment of the cookie string separately, as key-value pairs in a dictionary


cookies = {
    'BAIDUID': '963EC08DDD8CA5647A50D2ED99D0CCF2:SL=0:NR=10:FG=1',
    'BAIDUID_BFESS': '963EC08DDD8CA5647A50D2ED99D0CCF2:SL=0:NR=10:FG=1',
    'ZFY': 'fVb9op8tO3yhpq3TJlvkhdkE8iS3bLYoA53APCw5awg:C',
    'ab_sr': '1.0.1_MDJhNTY0OGI3NzhkNjMxNGE5ZWY3MzNiNGI3OGJiMjRmYjJlNGQ2NThkYjYyNzc5OTllMWEwZWFiMDM5MjBlODYwOWI4Y2M0Zjc5NWNkMGFjNmI5OGM2NDkwOTBmNjAxYzVjZTdiMTc3ZjkxMWQ4ZTM0OWNkYTA0MjA1ZDI4MjE5ZmIyMGJlYjM2MjY2NTBjM2EzNGI5NmIxMDEzYjJmOTFjM2FhNDliYWQ5Y2M5YjdlYWU0MWJhZTU2YzRiYmM3',
    # ... (remaining cookie fields omitted)
}

2. If the first approach returns no data, use the second; if the second does not work either, use the third. (A small helper for the third approach is sketched below.)
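
For approach ③, splitting a raw cookie string by hand is tedious, so it is common to convert the string copied from the browser into a dictionary automatically. A minimal sketch, using a shortened version of the cookie string above:

# Convert a raw cookie string (as copied from the browser) into a dict
cookie_str = 'REALTIME_TRANS_SWITCH=1; SOUND_SPD_SWITCH=1; HISTORY_SWITCH=1'  # shortened example

cookies = dict(pair.split('=', 1) for pair in cookie_str.split('; '))
print(cookies)  # {'REALTIME_TRANS_SWITCH': '1', 'SOUND_SPD_SWITCH': '1', 'HISTORY_SWITCH': '1'}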

4. The use of verify


import requests

requests.packages.urllib3.disable_warnings()  # suppress the warning triggered after disabling certificate verification

url = 'https://data.stats.gov.cn/'
# verify=False: do not validate the certificate when sending the request
response = requests.get(url=url, verify=False)
print(response.text)

"""
requests.exceptions.SSLError: raised when the website's certificate cannot be verified,
because the requests module validates certificates by default
"""

1. Some websites cannot be visited because their certificates fail validation: the requests module validates certificates by default and raises an SSLError when validation fails. In that case we use verify=False to skip certificate verification and access the site.

5. The use of timeout


import requests

url = 'https://github.com/'
# timeout=0.1 sets the maximum request time in seconds; exceeding it raises an
# exception, which can be handled with try/except
response = requests.get(url=url, timeout=0.1)
print(response.text)


1. Some websites respond very slowly. With timeout we can fail fast and filter these sites out, as sketched below.
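
A minimal sketch of that filtering idea: catch the Timeout exception and skip the slow sites. The list of URLs is invented for illustration.

import requests

urls = ['https://github.com/', 'https://example.com/']  # hypothetical site list

for url in urls:
    try:
        response = requests.get(url, timeout=3)  # allow each site 3 seconds at most
        print(url, response.status_code)
    except requests.exceptions.Timeout:
        print(url, 'timed out, skipped')  # slow sites are filtered out here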

6. The use of allow_redirects


import requests

url = 'http://github.com/'
# allow_redirects=False prevents the redirect from being followed
response = requests.get(url=url, allow_redirects=False)
print(response.status_code)
print(response.url)


1. This parameter controls whether redirects are followed. With allow_redirects=False, the response keeps the original redirect status code (here a 301, since http://github.com/ redirects to HTTPS) instead of the 200 of the final page.
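
When redirects are allowed (the default), the followed hops are recorded in response.history; a minimal sketch:

import requests

# With the default allow_redirects=True, requests follows the redirect chain.
response = requests.get('http://github.com/')
print(response.status_code)          # 200, the final page
print(response.url)                  # https://github.com/, after the redirect
for hop in response.history:         # every followed redirect is kept here
    print(hop.status_code, hop.url)  # e.g. 301 http://github.com/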

7. The remaining two parameters, files and stream, are rarely needed; if you do need them, look up the documentation yourself. For completeness, a brief sketch of both follows.
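
A minimal sketch of both, assuming httpbin.org as a test endpoint; the uploaded file name is a made-up placeholder that must exist locally for the upload part to run.

import requests

# files: upload a file as multipart/form-data (the file name is hypothetical)
with open('example.txt', 'rb') as f:
    response = requests.post('https://httpbin.org/post', files={'file': f})
print(response.status_code)

# stream: do not load the whole body at once; read it piece by piece instead
# (useful for large downloads and continuously updated data)
response = requests.get('https://httpbin.org/stream/3', stream=True)
for line in response.iter_lines():
    print(line)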
