The requests library (a third-party library)
1. Sending GET/POST requests
import requests
# Add headers and query parameters
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'
}
url = 'https://www.baidu.com/'
kw = {'wd': '中国'}
response = requests.get(url, headers=headers, params=kw)
print(response)
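# Inspect the response content
# print(response.text)  # returns the body as a Unicode str
#
# print(response.content)  # returns the body as raw bytes
# print(response.content.decode('utf-8'))  # decode manually when the text is garbled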
print(response.url)
print(response.encoding)  # character encoding of the response
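The heading also mentions POST, which the snippet above does not show. As a rough sketch, a request can be prepared without being sent, which makes it easy to see how params ends up in the URL and how a data dict becomes a form-encoded POST body. The httpbin.org/post URL and the form fields are placeholders chosen for illustration, not part of the original example.

```python
import requests

# Build the same GET request as above, but only prepare it (nothing is sent).
get_req = requests.Request(
    'GET',
    'https://www.baidu.com/',
    params={'wd': '中国'},
).prepare()
print(get_req.url)  # the query string is percent-encoded as UTF-8

# A POST request: the data dict becomes a form-encoded body.
# The URL and the form fields are placeholders for illustration.
post_req = requests.Request(
    'POST',
    'http://httpbin.org/post',
    data={'name': 'test', 'page': '1'},
).prepare()
print(post_req.body)
print(post_req.headers['Content-Type'])
```

To actually send the POST, requests.post(url, data=..., headers=...) takes the same arguments as the Request object here.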
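Note the difference between content and text:
response.text returns the body decoded to a Unicode str, using the encoding reported by response.encoding.
response.content returns the raw bytes. If response.text comes out garbled, decode the bytes yourself with the page's real encoding, for example response.content.decode('utf-8').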
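Underneath, this is the ordinary bytes versus str distinction in Python. The sketch below uses a hand-made byte string as a stand-in for response.content (an assumption, no request is made) and shows how the wrong charset garbles the text. ISO-8859-1 is used as the wrong charset because requests falls back to it for text responses that declare no charset.

```python
# Stand-in for response.content: the UTF-8 bytes of some page text.
content = '中国'.encode('utf-8')

# Decoding with the wrong charset produces garbled text.
garbled = content.decode('iso-8859-1')

# Decoding with the right charset recovers the original text.
text = content.decode('utf-8')

print(type(content).__name__, len(content))  # bytes 6 (two characters, three bytes each)
print(repr(garbled))
print(text)
```

With a real response, setting response.encoding = 'utf-8' before reading response.text has the same effect as decoding response.content by hand.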
2. Proxies
Proxy IPs were covered earlier in the introduction to the urllib library, and the principle is the same in requests. The difference is that requests is more concise and convenient: pass the proxy directly through the proxies parameter of the request method, as follows:
import requests
url = 'http://httpbin.org/ip'
proxy = {
    # key: scheme of the target URL; value: address of the proxy server
    'http': 'http://123.160.68.74:9999'
}
resp = requests.get(url, proxies=proxy)
print(resp.text)
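The proxies dict is keyed by the scheme of the target URL, so a dict with only an 'http' entry leaves https:// requests unproxied. A minimal sketch that covers both schemes and attaches the proxies to a Session follows. The host and port are hypothetical placeholders, and build_proxies is a helper name introduced here for illustration.

```python
import requests

# Hypothetical proxy address; replace with a proxy you are allowed to use.
PROXY_HOST = '127.0.0.1'
PROXY_PORT = 8888


def build_proxies(host, port):
    """Return a proxies dict that routes both HTTP and HTTPS traffic."""
    proxy_url = f'http://{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


proxies = build_proxies(PROXY_HOST, PROXY_PORT)

# A Session can carry the proxies so every request made through it uses them.
session = requests.Session()
session.proxies.update(proxies)
print(session.proxies)
```

The same dict can also be passed per call, as in requests.get(url, proxies=proxies).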
3. Cookies
If a response contains cookies, you can read them through its cookies attribute.
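The cookies attribute is a RequestsCookieJar. To show what it offers without depending on a live site, the sketch below fills a jar by hand as a stand-in for resp.cookies. The cookie names, values, and domain are made up for illustration.

```python
from requests.cookies import RequestsCookieJar

# Build a jar by hand to stand in for resp.cookies (no request is made).
# The cookie names, values and domain are made up for illustration.
jar = RequestsCookieJar()
jar.set('sessionid', 'abc123', domain='example.com', path='/')
jar.set('theme', 'dark', domain='example.com', path='/')

print(jar.get('sessionid'))  # look up a single cookie by name
print(jar.get_dict())        # plain dict of name -> value
for cookie in jar:           # full Cookie objects with domain, path, etc.
    print(cookie.name, cookie.value, cookie.domain)
```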
3.1 Using cookies to simulate a login
import requests
# resp = requests.get('https://www.baidu.com/')
# print(resp.cookies)
# print(resp.cookies.get_dict())
url = 'https://www.zhihu.com/hot'
# The cookie value below was copied from a logged-in browser session; replace it with your own
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
'cookie': '_zap=9cb16e80-2e5a-442a-ad83-8c4e56151274; d_c0="AHBUkeFrFhGPThKJGfZWuvXPYdeDQlxWqI4=|1586342451"; _xsrf=KGrwON9rdqf1Va6QrWyiLwNOTRoK5SPY; _ga=GA1.2.925220024.1595327992; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1595327992,1595330061,1595376694; capsion_ticket="2|1:0|10:1599905310|14:capsion_ticket|44:NzhiNDdjZDFjNjBiNDAxOThhNWI3ODQ0MDJhMGQxZGU=|c18f9b858f5a3b1953d240092ab6d1be2fcdd60cb4ca8bdcb531a2161f93fb1b"; z_c0="2|1:0|10:1599905438|4:z_c0|92:Mi4xbkV5a0JRQUFBQUFBY0ZTUjRXc1dFU2NBQUFDRUFsVk5uaXVFWHdEQXZmUFJ5Y0x4WC1ySS1wQ0dYQnl5ZHh3RVhB|29705d2526c129e3642b869de321e6f086c38b17aa2c1285a131192d2de3477b"; tst=h; tshl=; q_c1=1de7075e7f0448aeb62af8961806c2f1|1599916711000|1588725781000; KLBRSID=2177cbf908056c6654e972f5ddc96dc2|1599917151|1599915145'
}
resp = requests.get(url, headers=headers)
print(resp.text)
3.2 Sharing cookies with a Session
import requests
post_url = 'https://i.meishi.cc/login.php?redirect=https%3A%2F%2Fwww.meishij.net%2F'
post_data = {
    # Placeholders: substitute your own account credentials
    'username': 'your_username',
    'password': 'your_password'
}
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
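# Log in; the session stores the cookies returned by the server
session = requests.Session()
session.post(post_url, headers=headers, data=post_data)
# Visit the personal page; the session sends the stored cookies automatically
url = 'https://i.meishi.cc/cook.php?id=13686422'
resp = session.get(url, headers=headers)
print(resp.text)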
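What the session does with stored cookies can be checked without logging in anywhere. In this sketch a cookie is placed in the session by hand, standing in for one set by a login response, and a follow-up request is prepared but not sent. The cookie, the domain, and the URL are assumptions made for illustration.

```python
import requests

session = requests.Session()

# Pretend a login response set this cookie (made-up name, value and domain).
session.cookies.set('sessionid', 'abc123', domain='example.com', path='/')

# Headers set on the session are merged into every request it sends.
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# Prepare a follow-up request without sending it, to inspect what would go out.
request = requests.Request('GET', 'https://example.com/profile')
prepared = session.prepare_request(request)
print(prepared.headers.get('Cookie'))
print(prepared.headers.get('User-Agent'))
```

Setting headers on the session this way also avoids passing headers=headers to every call.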
4. Dealing with untrusted SSL certificates
4.1 SSL certificate
An SSL certificate is a type of digital certificate, similar to an electronic copy of a driver's license, passport, or business license. Because it is installed on the server, it is also called an SSL server certificate.
An SSL certificate follows the SSL protocol and is issued by a trusted certificate authority (CA) after the CA has verified the server's identity. It provides two functions: server identity verification and encryption of data in transit.
https://baike.baidu.com/item/SSL%E8%AF%81%E4%B9%A6/5201468?fr=aladdin
If the site's SSL certificate is not trusted, the request fails with an error (requests raises an SSLError), as with the following request:
import requests
url = 'https://inv-veri.chinatax.gov.cn/'
resp = requests.get(url)
print(resp.text)
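In this situation, passing verify=False makes requests skip certificate verification, so the site can be accessed normally. Note that this also removes the protection verification provides, and a warning is emitted for each such request.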
import requests
url = 'https://inv-veri.chinatax.gov.cn/'
resp = requests.get(url, verify=False)
print(resp.text)
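The verify parameter also accepts a path to a CA bundle, which keeps verification on while trusting an additional certificate authority. The following is a sketch under assumptions rather than a definitive recipe: fetch is a helper name introduced here, and ca_bundle is a hypothetical path to a PEM file you would have to obtain yourself. When no bundle is given it falls back to verify=False and silences the resulting InsecureRequestWarning.

```python
import requests
import urllib3


def fetch(url, ca_bundle=None, timeout=10):
    """Fetch url, verifying against ca_bundle when one is given.

    ca_bundle is a hypothetical path to a PEM file containing the CA
    certificate that signed the site's certificate.
    """
    if ca_bundle is not None:
        # Preferred: keep verification on, but trust an extra CA bundle.
        return requests.get(url, verify=ca_bundle, timeout=timeout)
    # Fallback: skip verification and silence the warning it triggers.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return requests.get(url, verify=False, timeout=timeout)
```

Calling fetch('https://inv-veri.chinatax.gov.cn/') would behave like the verify=False request above, without the warning output.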