1. requests module
1.1 requests Introduction
requests is a powerful, easy-to-use HTTP library. Compared with the urllib module used previously, requests is more convenient. (It is essentially a wrapper around urllib3.)
You can install it with the command pip install requests, but the download is susceptible to network problems, so I looked for a domestic mirror to speed it up.
I then found the Douban mirror:
pip install <package-name> -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Just replace <package-name> with the package you want (here, requests) and the module downloads quickly.
1.2 requests Request
There are many request methods, but we cover only the two most common: GET and POST.
1.2.1 GET request
requests.get() sends a GET request to the target URL and returns a Response object, which is explained in detail in the next section.
GET method parameters:
url: required; specifies the URL to request
params: a dictionary of request parameters (the query string), commonly used when sending a GET request
example:
import requests

url = 'http://www.httpbin.org/get'
params = {
    'key1': 'value1',
    'key2': 'value2'
}
# params is appended to the URL as a query string
response = requests.get(url=url, params=params)
print(response.text)
result:
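To see what the params dictionary turns into without touching the network, here is a minimal sketch that uses only the standard library's urllib.parse rather than requests itself. The helper name build_get_url is hypothetical; requests performs an equivalent URL-encoding step before sending the request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(base_url, params):
    """Append params to base_url as a URL-encoded query string."""
    return base_url + "?" + urlencode(params)


url = build_get_url(
    "http://www.httpbin.org/get",
    {"key1": "value1", "key2": "value2"},
)
print(url)
# http://www.httpbin.org/get?key1=value1&key2=value2

# Parsing the query string back recovers the original keys and values
# (parse_qs returns each value as a list).
print(parse_qs(urlsplit(url).query))
# {'key1': ['value1'], 'key2': ['value2']}
```

The URL printed here should match the url field that httpbin echoes back for the request above.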
headers: a dictionary specifying the request headers
example:
import requests

url = 'http://www.httpbin.org/headers'
headers = {
    'USER-AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.text)
result:
proxies: a dictionary specifying the proxies to use, keyed by URL scheme
example:
import requests

url = 'http://www.httpbin.org/ip'
# keys are URL schemes, so each scheme gets its own entry
proxies = {
    'http': '113.116.127.164:8123',
    'https': '113.116.127.164:80'
}
response = requests.get(url=url, proxies=proxies)
print(response.text)
result:
cookies: a dictionary specifying the cookies to send
example:
import requests

url = 'http://www.httpbin.org/cookies'
cookies = {
    'name1': 'value1',
    'name2': 'value2'
}
response = requests.get(url=url, cookies=cookies)
print(response.text)
result:
auth: a tuple specifying the username and password to log in with
example:
import requests

url = 'http://www.httpbin.org/basic-auth/user/password'
auth = ('user', 'password')
response = requests.get(url=url, auth=auth)
print(response.text)
result:
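When auth is a (username, password) tuple, requests uses HTTP Basic authentication, which sends the credentials in an Authorization header. The sketch below rebuilds that header value with the standard library only, so it runs offline. The function name basic_auth_header is hypothetical and is not part of requests.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("user", "password")
print(header)
# Basic dXNlcjpwYXNzd29yZA==
```

Base64 is an encoding, not encryption, so anyone who sees the header can recover the credentials. This is one reason to prefer HTTPS URLs when using auth.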
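verify: a boolean specifying whether to verify the site's certificate when making the request. The default is True, meaning the certificate is verified; set it to False if you do not want verification.
import requests

response = requests.get(url='https://www.httpbin.org/', verify=False)
result:
In this case, however, a warning usually appears, because Python expects us to use certificate verification.
If you do not want to see the warning, you can suppress it with the following code:
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
timeout: specifies the timeout in seconds; if no response is received within that time, an exception is raised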
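The timeout behaviour can be tried without an external site. The following is a rough sketch that swaps in the standard library's urllib.request for requests so that it is self-contained: a local socket accepts connections but never answers, so the client gives up after the timeout. The helper names open_silent_server and fetch_or_none are hypothetical. With requests, the analogous call would be requests.get(url, timeout=0.5), which raises requests.exceptions.Timeout in this situation.

```python
import socket
import urllib.request


def open_silent_server():
    """Listen on a free local port but never answer (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)               # connections are queued, never served
    return server


def fetch_or_none(url, seconds):
    """Return the response body, or None if no response arrives in time."""
    try:
        with urllib.request.urlopen(url, timeout=seconds) as response:
            return response.read()
    except OSError:                # includes timeouts and connection errors
        return None


server = open_silent_server()
port = server.getsockname()[1]
print(fetch_or_none(f"http://127.0.0.1:{port}/", 0.5))
server.close()
```

Catching the exception and returning None is only one possible design; a crawler might instead retry or log the failure.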
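1.2.2 POST request
The difference between a POST request and a GET request is that POST data does not appear in the address bar, and HTTP itself sets no upper limit on the size of the data.
So POST accepts almost all of the parameters that GET does; the difference is that POST sends its data with the data parameter instead of params.
data: a dictionary specifying the form data, commonly used when sending a POST request
example:
import requests

url = 'http://www.httpbin.org/post'
data = {
    'key1': 'value1',
    'key2': 'value2'
}
response = requests.post(url=url, data=data)
print(response.text)
result: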
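To make the contrast with GET concrete, here is a small offline sketch that builds (but does not send) a POST request with the standard library's urllib.request instead of requests. It shows that the form data travels in the request body while the URL stays unchanged. requests encodes a data dictionary as a form body in a comparable way.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"key1": "value1", "key2": "value2"}
url = "http://www.httpbin.org/post"

# Form data is URL-encoded and carried in the request body as bytes.
body = urlencode(form).encode("ascii")
post_request = Request(url, data=body)  # built only, never sent

print(post_request.get_method())  # POST
print(post_request.full_url)      # http://www.httpbin.org/post
print(post_request.data)          # b'key1=value1&key2=value2'
```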
1.3 requests Response
1.3.1 Response attributes
After a GET or POST request, you receive a Response object. Its commonly used attributes and methods are listed below:
response.url: the URL of the requested site
response.status_code: the status code of the response
response.encoding: the encoding of the response
response.cookies: the cookies of the response
response.headers: the response headers
response.content: the response body as bytes
response.text: the response body as str, that is, response.content decoded with response.encoding (or with a guessed encoding when that is None); this is the same as response.content.decode('utf-8') when the encoding is UTF-8
response.json(): the response body parsed as JSON (a dict for a JSON object), roughly equivalent to json.loads(response.text)
import requests

response = requests.get('http://www.httpbin.org/get')
print(type(response))
# <class 'requests.models.Response'>
print(response.url)  # URL of the requested site
# http://www.httpbin.org/get
print(response.status_code)  # status code of the response
# 200
print(response.encoding)  # encoding of the response
# None
print(response.cookies)  # cookies of the response
# <RequestsCookieJar[]>
print(response.headers)  # response headers
# {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Mon, 16 Dec 2019 03:16:22 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '189', 'Connection': 'keep-alive'}
print(type(response.content))  # response body as bytes
# <class 'bytes'>
print(type(response.text))  # response body as str
# <class 'str'>
print(type(response.json()))  # response body as dict
# <class 'dict'>
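The relationship between content, text and json() can be reproduced offline with plain bytes. In this sketch the content value is a made-up body standing in for response.content; it is an assumption for illustration, not real httpbin output.

```python
import json

# Hypothetical response body, standing in for response.content.
content = b'{"args": {"key1": "value1"}, "url": "http://www.httpbin.org/get"}'

text = content.decode("utf-8")  # what response.text gives for a UTF-8 body
data = json.loads(text)         # what response.json() gives for a JSON object

print(type(content).__name__, type(text).__name__, type(data).__name__)
# bytes str dict
print(data["args"]["key1"])
# value1
```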
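1.3.2 Encoding issues
# encoding issue
import requests

response = requests.get('http://www.autohome.com/news/')
# The Autohome site returns pages encoded in gb2312, while requests defaults to
# ISO-8859-1 for text responses that declare no charset; unless the encoding is
# set to gbk, the Chinese text is garbled.
# response.encoding = 'gbk'
print(response.text)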
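The garbling described above can be reproduced without fetching the page. The sketch below encodes a sample string (chosen here for illustration) as GB2312 and decodes it once with the wrong codec and once with GBK, which mirrors what setting response.encoding = 'gbk' changes.

```python
page_title = "汽车之家"
raw_bytes = page_title.encode("gb2312")   # bytes as a GB2312 server would send them

garbled = raw_bytes.decode("iso-8859-1")  # wrong codec: one character per byte
fixed = raw_bytes.decode("gbk")           # GBK is a superset of GB2312

print(len(garbled), garbled == page_title)
# 8 False
print(fixed == page_title)
# True
```

ISO-8859-1 maps every byte to some character, so the wrong decode never raises an error. That is why the problem shows up as garbled text rather than as an exception.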