In-depth understanding of the requests library and how to use it


1. Introduction to requests

First we need to understand what the requests library is

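#Introduction: requests simulates the requests a browser sends. Compared with urllib, which we used before, its API is far more convenient (under the hood it is a wrapper around urllib3)

#Note: requests only downloads the page content; it does not execute JavaScript. For JS-driven content you have to analyze the target site yourself and send the follow-up requests

#Install: pip3 install requests

#Request methods: the most commonly used are requests.get() and requests.post()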
 
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})
>>> r = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')
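These shortcut functions differ only in the HTTP method they use. As a minimal sketch, one can build (but not send) one request per method with requests.Request(...).prepare() and inspect what would go on the wire; the URL http://httpbin.org/anything and the sample payload are illustrative choices, not from the original.

```python
import requests

# Build, without sending, one request per HTTP method.
# prepare() turns a Request into the PreparedRequest that would be transmitted.
payload = {"key": "value"}
prepared = {}
for method in ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]:
    # In this sketch only POST and PUT carry a form-encoded body
    data = payload if method in ("POST", "PUT") else None
    prepared[method] = requests.Request(method, "http://httpbin.org/anything", data=data).prepare()

for method, p in prepared.items():
    print(p.method, p.url, p.body)
```

Because nothing is sent, this runs offline, which makes prepare() handy for checking a request before firing it at a real server.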

2. GET request based on requests

1. Basic request
import requests
response=requests.get('http://dig.chouti.com/')
print(response.text) # response.text is a str; response.content holds the raw bytes
2. GET request with parameters -> params
# Disguise the request as a browser through the request headers, otherwise Baidu will not return the page content normally
url = 'https://www.baidu.com/s?wd=软件测试&pn=1'
headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36'}
 
import requests
response=requests.get(url=url, headers=headers)
print(response.text)
 
# If the query keyword is Chinese or contains other special characters, it must be URL-encoded
from urllib.parse import urlencode
wd='软件测试'
encode_res=urlencode({'k':wd},encoding='utf-8')
keyword=encode_res.split('=')[1]
print(keyword)
# Then build the URL from it
url='https://www.baidu.com/s?wd=%s&pn=1'%keyword
response=requests.get(url,headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36'})
res1=response.text
# All of the above can be done with the params argument of requests; under the hood it still calls urlencode
wd='软件测试'
pn=1
 
response=requests.get('https://www.baidu.com/s',
                      params={
                          'wd':wd,
                          'pn':pn
                      },
                      headers={
                        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
                      })
res2=response.text
 
# Verify the result: a.html and b.html should show the same page content
with open('a.html','w',encoding='utf-8') as f:
    f.write(res1)
with open('b.html', 'w', encoding='utf-8') as f:
    f.write(res2)
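The same comparison can be made without contacting Baidu by preparing the params request offline and comparing URLs. This sketch uses urllib.parse.quote_plus (the quoting function urlencode applies by default) in place of the split('=') trick above.

```python
from urllib.parse import quote_plus

import requests

wd = "软件测试"

# Manual route: percent-encode the keyword ourselves and splice it into the URL
manual_url = "https://www.baidu.com/s?wd=%s&pn=1" % quote_plus(wd)

# params route: let requests build the query string (prepared only, not sent)
prepared = requests.Request("GET", "https://www.baidu.com/s", params={"wd": wd, "pn": 1}).prepare()

print(manual_url)
print(prepared.url)
print(prepared.url == manual_url)
```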
3. GET request with parameters->headers
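#We usually need to send request headers: they are the key to disguising ourselves as a browser. The most useful common ones are:
#Host
#Referer     large sites often use it to check where a request came from
#User-Agent  identifies the client
#Cookie      although cookies travel in the request headers, requests has a separate cookies parameter for them, so don't put them in headers={}
#Adding headers matters: servers inspect them, and a request without them may be refused (for example https://www.zhihu.com/explore)
import requests
response=requests.get('https://www.zhihu.com/explore')
print(response.status_code) #500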
 
#Customize the headers
headers={
    'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36',
}
response=requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.status_code) #200
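To see why the header makes a difference, one can inspect what requests sends by default and how a custom User-Agent replaces it. A minimal offline sketch: Session.prepare_request() merges the session's default headers with the per-request ones without sending anything.

```python
import requests

# The User-Agent requests sends when none is set: "python-requests/<version>",
# which tells the server plainly that the client is a script.
default_ua = requests.utils.default_headers()["User-Agent"]
print(default_ua)

mobile_ua = ("Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36")

# Per-request headers override the session defaults; the other defaults are kept
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "https://www.zhihu.com/explore", headers={"User-Agent": mobile_ua})
)
print(prepared.headers["User-Agent"])
print(sorted(prepared.headers))
```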
4. GET request with parameters->cookies
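#Log in to GitHub, then copy the cookies from the browser; from then on you can log in with the cookie directly, without entering the username and password
#Username: admin, email: [email protected], password: 123456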
import requests
Cookies={'user_session':'wGMHFJKgDcmRIVvcA14_Wrt_3xaUyJNsBnPbYzEL6L0bHcfc',
}
response=requests.get('https://github.com/settings/emails',cookies=Cookies) 
print('[email protected]' in response.text) #True
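The cookies= argument is simply turned into an ordinary Cookie request header. A minimal offline sketch; the cookie value below is a made-up placeholder, not a real session.

```python
import requests

# Placeholder value for illustration only
cookies = {"user_session": "example-session-value"}

# Prepare (but do not send) the request to see the header requests builds
prepared = requests.Request("GET", "https://github.com/settings/emails", cookies=cookies).prepare()
print(prepared.headers["Cookie"])
```

This is also why the Cookie header should not be duplicated in headers={}: requests already writes it from cookies=.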

3. POST request based on requests

1. Introduction


# GET request: GET is HTTP's default request method
(1) There is no request body
(2) The amount of data is limited by the maximum URL length, which browsers and servers cap (typically at a few KB)
(3) The data of a GET request is exposed in the browser's address bar
Common operations that send GET requests:
(1) Entering a URL directly in the browser's address bar always sends a GET request
(2) Clicking a hyperlink on a page also sends a GET request
(3) A submitted form uses GET by default, but it can be set to POST
# POST request
(1) The data does not appear in the address bar
(2) HTTP itself sets no upper limit on the data size (servers may still impose their own)
(3) There is a request body
(4) Non-ASCII text such as Chinese in a form-encoded request body is URL-encoded
#!!! requests.post() is used exactly like requests.get(); what sets it apart is that the request body is normally passed through its data parameter
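Where the data ends up for GET versus POST, including the URL-encoding of Chinese text from point (4), can be seen by preparing both requests offline. A minimal sketch; httpbin.org and the payload are only illustrative.

```python
import requests

payload = {"wd": "软件测试"}

# GET: the data goes into the URL's query string and there is no body
get_req = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()
# POST with data=: the URL stays clean and the data travels in the body
post_req = requests.Request("POST", "http://httpbin.org/post", data=payload).prepare()

print(get_req.url)
print(get_req.body)
print(post_req.url)
print(post_req.body)
print(post_req.headers["Content-Type"])
```

Both sides carry the same percent-encoded keyword; only its location differs.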

2. Send a POST request to simulate the browser’s login behavior

#To analyze a login, deliberately enter a wrong username or password and then capture the traffic. If you enter the correct password, the browser immediately redirects to another page and the login request gets buried, so you can search for a long time without finding the packet you need.

'''
Part 1: Analyze the target site
    Open https://github.com/login in the browser
    Enter a wrong username and password, and capture the traffic
    The login turns out to be a POST to https://github.com/session
    The request headers contain a cookie
    The request body contains:
        commit:Sign in
        utf8:✓
        authenticity_token:lbI8IJCwGslZS8qJPnof5e7ZkCoSoMn6jmDTsL1r/m06NLyIbw7vCrpwrFAPzHMep3Tmf/TSJVoXWrvDZaVwxQ==
        login:admin
        password:123456

Part 2: Analyze the flow
    First GET https://github.com/login to obtain the initial cookie and the authenticity_token
    Then POST to https://github.com/session with the initial cookie and the request body (authenticity_token, username, password, etc.)
    Finally obtain the logged-in cookie

    PS: if the password is sent encrypted, enter a wrong username but the correct password, then copy the encrypted password from the captured request in the browser. GitHub sends the password in plain text.
'''

import requests
import re

#First request
r1=requests.get('https://github.com/login')
r1_cookie=r1.cookies.get_dict() #initial cookie (not yet authorized)
authenticity_token=re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] #extract the CSRF token from the page

#Second request: POST to the login endpoint with the initial cookie, the token, and the account credentials
data={
    'commit':'Sign in',
    'utf8':'✓',
    'authenticity_token':authenticity_token,
    'login':'[email protected]',
    'password':'alex3714'
}
r2=requests.post('https://github.com/session',
             data=data,
             cookies=r1_cookie
             )

login_cookie=r2.cookies.get_dict()

#Third request: from now on, login_cookie is all we need, e.g. to view personal settings
r3=requests.get('https://github.com/settings/emails',
                cookies=login_cookie)

print('[email protected]' in r3.text) #True
# The same login with a session, which keeps the cookies between requests
import requests
import re

session=requests.session()
#First request
r1=session.get('https://github.com/login')
authenticity_token=re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] #extract the CSRF token from the page
#Second request
data={
    'commit':'Sign in',
    'utf8':'✓',
    'authenticity_token':authenticity_token,
    'login':'[email protected]',
    'password':'alex3714'
}
r2=session.post('https://github.com/session',
             data=data,
             )
#Third request
r3=session.get('https://github.com/settings/emails')
print('[email protected]' in r3.text) #True
requests.session() saves the cookie information for us automatically, so there is no need to pass cookies= on every request.
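What the session does with stored cookies can be checked offline. In this sketch a cookie is set on the session by hand to stand in for one the server would have set (the name and value are made up), and prepare_request() shows that it is attached only to requests for the matching domain.

```python
import requests

session = requests.Session()
# Stand-in for a cookie set by an earlier response (hypothetical name and value)
session.cookies.set("logged_in", "yes", domain="github.com")

# The session attaches the cookie to later requests for github.com...
to_github = session.prepare_request(requests.Request("GET", "https://github.com/settings/emails"))
# ...but not to requests for other domains
to_other = session.prepare_request(requests.Request("GET", "https://example.com/"))

print(to_github.headers.get("Cookie"))
print(to_other.headers.get("Cookie"))
```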
3. Supplement
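import requests
requests.post(url='http://httpbin.org/post',data={'xxx':'yyy'}) #no Content-Type given, so the default is application/x-www-form-urlencoded
#If we set Content-Type to application/json ourselves but pass the values through data, the body is still form-encoded, so a server expecting JSON cannot read the values
requests.post(url='http://httpbin.org/post',
              data={'key':1,},
              headers={
                  'content-type':'application/json'
              })
requests.post(url='http://httpbin.org/post',json={'key':1,},) #json= serializes the body and sets Content-Type: application/json automatically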
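The three cases above can be compared by preparing them offline and looking at the header and body each one would send. A minimal sketch using httpbin.org as an illustrative URL.

```python
import json

import requests

url = "http://httpbin.org/post"

form = requests.Request("POST", url, data={"key": 1}).prepare()
as_json = requests.Request("POST", url, json={"key": 1}).prepare()
# data= plus a hand-written JSON Content-Type: the header claims JSON, the body is form-encoded
mismatch = requests.Request("POST", url, data={"key": 1},
                            headers={"content-type": "application/json"}).prepare()

print(form.headers["Content-Type"], form.body)
print(as_json.headers["Content-Type"], json.loads(as_json.body))
print(mismatch.headers["Content-Type"], mismatch.body)
```

The mismatched request is exactly the case the comment warns about: a JSON parser on the server receives key=1 and fails.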

4. Response

1. Response attributes
import requests
response=requests.get('http://www.jianshu.com')
# response attributes
print(response.text)
print(response.content)
print(response.status_code)
print(response.headers)
print(response.cookies)
print(response.cookies.get_dict())
print(response.cookies.items())
print(response.url)
print(response.history)
print(response.encoding)
#Closing: response.close()
from contextlib import closing
with closing(requests.get('http://httpbin.org/get',stream=True)) as response:
    for chunk in response.iter_content(chunk_size=1024):
        pass # process each chunk here
2. Encoding issues
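#Encoding issues
import requests
response=requests.get('http://www.autohome.com/news')
# response.encoding='gbk' #Autohome returns gb2312-encoded pages. requests takes the encoding from the Content-Type header and falls back to ISO-8859-1 when a text response declares no charset, so unless this is set to gbk the Chinese text is garbled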
print(response.text)
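The garbling can be reproduced with plain bytes, no request needed. This sketch encodes a sample string as GB2312 to stand in for the page bytes (not a real response); requests also offers response.apparent_encoding, a guess based on the content itself, as another way to pick the encoding.

```python
# Stand-in for the bytes of a GB2312 page (sample string, not a real response)
raw = "汽车之家".encode("gb2312")

garbled = raw.decode("ISO-8859-1")  # what response.text shows with the ISO-8859-1 fallback
fixed = raw.decode("gbk")           # what it shows after response.encoding = 'gbk' (GBK is a superset of GB2312)

print(repr(garbled))
print(fixed)

# ISO-8859-1 maps every byte to exactly one character, so nothing is lost:
# already-garbled text can be repaired by going back to bytes first.
repaired = garbled.encode("ISO-8859-1").decode("gbk")
print(repaired == fixed)
```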
3. Get binary data
import requests
 
response=requests.get('https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=123456&di=712e4ef3ab258b36e9f4b48e85a81c9d&imgtype=0&src=http%3A%2F%2Fc.hiphotos.baidu.com%2Fimage%2Fpic%2Fitem%2F11385343fbf2b211e1fb58a1c08065380dd78e0c.jpg')
with open('a.jpg','wb') as f:
    f.write(response.content)
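#stream parameter: fetch the content piece by piece. When downloading a video of, say, 100 GB, reading response.content and writing it to the file all at once is unreasonable
import requests

response=requests.get('https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo-transcode/1767502_56ec685f9c7ec542eeaf6eac93a65dc7_6fe25cd1347c_3.mp4',
                      stream=True)

with open('b.mp4','wb') as f:
    for chunk in response.iter_content(chunk_size=1024*1024): # iter_content() reads 1 byte at a time by default, so ask for larger chunks
        f.write(chunk)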
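The streaming pattern can be wrapped in a small reusable function. This is a sketch; download_file is a hypothetical helper, not part of requests. Using the response as a context manager (supported by recent requests versions) releases the connection even if writing fails, and raise_for_status() avoids saving an error page as if it were the file.

```python
import requests


def download_file(url, path, chunk_size=1024 * 1024, timeout=30):
    """Stream url into path chunk by chunk and return the number of bytes written.

    Hypothetical helper for illustration.
    """
    written = 0
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # stop on 4xx/5xx instead of saving the error body
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Usage would look like download_file(video_url, 'b.mp4'), where video_url is the .mp4 URL above.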
4. Parse json
#Parse JSON
import requests
response=requests.get('http://httpbin.org/get')

import json
res1=json.loads(response.text) #more verbose
res2=response.json() #get the JSON data directly
print(res1 == res2) #True
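response.json() raises an exception when the body is not valid JSON, for example when the server returns an HTML error page. A minimal sketch of guarding against that; get_json is a hypothetical helper. Catching ValueError works across requests versions because their JSON decode errors subclass it.

```python
import requests


def get_json(url, timeout=10):
    """Fetch url and return the decoded JSON, or None if the body is not JSON.

    Hypothetical helper for illustration.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:  # JSON decode errors raised by response.json() subclass ValueError
        return None
```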
5. Redirection and History
import requests
import re
 
#First request
r1=requests.get('https://github.com/login')
r1_cookie=r1.cookies.get_dict() #initial cookie (not yet authorized)
authenticity_token=re.findall(r'name="authenticity_token".*?value="(.*?)"',r1.text)[0] #extract the CSRF token from the page

#Second request: POST to the login endpoint with the initial cookie, the token, and the account credentials
data={
    'commit':'Sign in',
    'utf8':'✓',
    'authenticity_token':authenticity_token,
    'login':'[email protected]',
    'password':'alex3714'
}

#Test 1: without allow_redirects=False, a Location header in the response is followed to the new page, and r2 is the response of that new page
r2=requests.post('https://github.com/session',
             data=data,
             cookies=r1_cookie
             )
print(r2.status_code) #200
print(r2.url) #the URL after the redirect
print(r2.history) #the response(s) before the redirect
print(r2.history[0].text) #the body of the response before the redirect

#Test 2: with allow_redirects=False, a Location header is not followed, and r2 is still the original response
r2=requests.post('https://github.com/session',
             data=data,
             cookies=r1_cookie,
             allow_redirects=False
             )

print(r2.status_code) #302
print(r2.url) #the URL before the redirect: https://github.com/session
print(r2.history) #[]
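With allow_redirects=False one can also walk a redirect chain by hand and record every hop. A sketch under stated assumptions: trace_redirects is a hypothetical helper, and it only issues GET requests, so unlike requests' built-in handling it does not model how a POST turns into a GET after a 302.

```python
from urllib.parse import urljoin

import requests


def trace_redirects(url, max_hops=10, timeout=10):
    """Follow redirects one hop at a time and return a list of (status_code, url).

    Hypothetical helper for illustration; GET requests only.
    """
    chain = []
    with requests.Session() as session:
        for _ in range(max_hops):
            r = session.get(url, allow_redirects=False, timeout=timeout)
            chain.append((r.status_code, r.url))
            if not r.is_redirect:  # True only for a redirect status with a Location header
                break
            # Location may be relative, so resolve it against the current URL
            url = urljoin(r.url, r.headers["Location"])
    return chain
```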

5. Advanced usage

1. SSL Cert Verification
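#Certificate verification (most sites use https)
import requests
response=requests.get('https://www.12306.cn') #for an https request the certificate is checked first; if it is invalid, an exception is raised and the program stops

#Improvement 1: no exception, but a warning is printed
import requests
response=requests.get('https://www.12306.cn',verify=False) #skip certificate verification: a warning is printed, returns 200
print(response.status_code)

#Improvement 2: no exception and no warning
import requests
from requests.packages import urllib3
urllib3.disable_warnings() #turn off the warning
response=requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

#Improvement 3: send a client certificate
#Many sites use https but can be accessed without a client certificate; in most cases sending one is optional
#Zhihu, Baidu, etc. work either way
#Where it is a hard requirement it must be sent, e.g. sites restricted to designated users, who gain access only after obtaining a certificate
import requests
response=requests.get('https://www.12306.cn',cert=('/path/client.crt','/path/client.key')) #cert=(certificate file, private key file)
print(response.status_code)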
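Besides True and False, verify also accepts a path to a CA bundle, which is usually a better fix than turning verification off when a site uses a private certificate authority. A minimal sketch; the bundle path is a placeholder.

```python
import os

import requests
import requests.certs

# CA bundle requests uses when verify=True (shipped by the certifi package)
default_bundle = requests.certs.where()
print(default_bundle, os.path.exists(default_bundle))

# Trust a specific CA bundle instead of disabling verification.
# The path is a placeholder, e.g. a bundle that includes a company's internal CA.
session = requests.Session()
session.verify = "/path/to/internal-ca.pem"
# session.get("https://intranet.example/")  # hypothetical host, verified against that bundle
```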
2. Use a proxy
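#Official docs: http://docs.python-requests.org/en/master/user/advanced/#proxies

#Proxy setup: the request goes to the proxy first, and the proxy sends it on for us (useful because IP bans are common)
import requests
proxies={
    'http':'http://localhost:9743',
    'https':'http://localhost:9743', #the proxy URL's scheme describes how we talk to the proxy; most proxies use plain http even for https traffic
}
#For a proxy that needs a username and password, put them before the @ sign: 'http':'http://egon:123@localhost:9743'
#A dict holds each key only once, so a second 'http' entry would silently replace the first
response=requests.get('https://www.12306.cn',proxies=proxies)

print(response.status_code)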
 
#SOCKS proxies are supported too; install the extra with: pip install requests[socks]
import requests
proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port'
}
response=requests.get('https://www.12306.cn',proxies=proxies)
print(response.status_code)
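Different proxies for different hosts do not need duplicate keys: requests also accepts keys of the form 'scheme://hostname', which take priority over the plain scheme key (and it reads the HTTP_PROXY/HTTPS_PROXY environment variables when no proxies are passed). The sketch below uses select_proxy, the helper in requests.utils that requests uses internally to pick a proxy, only to show which entry wins; port 9744 is a hypothetical value.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://localhost:9743",
    "https": "http://localhost:9743",
    # Host-specific entry; it wins over the scheme-wide "https" key (hypothetical port)
    "https://www.12306.cn": "http://egon:123@localhost:9744",
}

print(select_proxy("http://httpbin.org/get", proxies))
print(select_proxy("https://www.12306.cn/index/", proxies))
print(select_proxy("https://www.baidu.com/", proxies))
```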
3. Timeout setting
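#Timeout settings
#Two forms: a float or a tuple
#timeout=0.1 #applies to both the connect timeout and the read (receive data) timeout
#timeout=(0.1,0.2) #0.1 is the connect timeout, 0.2 is the read timeout
#The read timeout limits how long to wait between bytes from the server, not the total download time

import requests
response=requests.get('https://www.baidu.com',timeout=0.0001) #deliberately far too short: this will almost certainly raise a timeout exception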
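With the tuple form, the two phases raise different exceptions, ConnectTimeout and ReadTimeout, which lets a script report what actually went wrong. A minimal sketch; get_with_timeouts is a hypothetical helper, and the 3.05-second connect timeout follows the requests documentation's advice to use slightly more than a multiple of 3 seconds (the TCP retransmission window).

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


def get_with_timeouts(url, connect_timeout=3.05, read_timeout=27):
    """GET url with separate connect/read timeouts; return the response or None.

    Hypothetical helper for illustration.
    """
    try:
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except ConnectTimeout:
        # No TCP connection could be established in time
        print(f"could not connect within {connect_timeout}s")
    except ReadTimeout:
        # Connected, but the server sent nothing for read_timeout seconds
        print(f"server was silent for {read_timeout}s")
    return None
```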
4. Authentication settings
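#Official docs: http://docs.python-requests.org/en/master/user/authentication/

# Authentication: when you open the site, a dialog pops up asking for a username and password (much like a JavaScript alert), and the HTML cannot be fetched until you provide them
# Under the hood the credentials are just assembled into a request header: r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
# Most sites do not use this default scheme but implement their own
# In that case we have to follow the site's scheme and write our own function similar to _basic_auth_str
# and add the resulting string to the request header:
# r.headers['Authorization'] =func('.....')

#Here is the default scheme (HTTP Basic auth, which only Base64-encodes the credentials); sites usually do not rely on it
import requests
from requests.auth import HTTPBasicAuth
r=requests.get('http://httpbin.org/basic-auth/user/password',auth=HTTPBasicAuth('user','password'))
print(r.status_code)

#HTTPBasicAuth can be abbreviated as follows
import requests
r=requests.get('http://httpbin.org/basic-auth/user/password',auth=('user','password'))
print(r.status_code)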
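Both ideas above can be checked offline: the Basic header is only "Basic " plus Base64 of "user:password", and the "write your own func" approach maps onto subclassing requests.auth.AuthBase, whose __call__ receives the prepared request. A sketch; TokenAuth and its "Token <token>" format are a hypothetical site scheme.

```python
import base64

import requests
from requests.auth import AuthBase, HTTPBasicAuth

url = "http://httpbin.org/basic-auth/user/password"

# HTTP Basic auth: "Basic " + base64("user:password")
manual = "Basic " + base64.b64encode(b"user:password").decode("ascii")
basic = requests.Request("GET", url, auth=HTTPBasicAuth("user", "password")).prepare()
print(manual, basic.headers["Authorization"] == manual)


class TokenAuth(AuthBase):
    """Hypothetical site-specific scheme: sends 'Authorization: Token <token>'."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # requests passes in the PreparedRequest; modify it and return it
        r.headers["Authorization"] = f"Token {self.token}"
        return r


custom = requests.Request("GET", url, auth=TokenAuth("abc123")).prepare()
print(custom.headers["Authorization"])
```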
5. Exception handling
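#Exception handling
import requests
from requests.exceptions import ConnectionError, ReadTimeout, RequestException, Timeout #see requests.exceptions for all exception types
try:
    r=requests.get('http://www.baidu.com',timeout=0.00001)
except ReadTimeout: #connected, but the server sent no data in time
    print('===:')
# except Timeout: #any timeout, including ConnectTimeout
#     print('aaaaa')
# except ConnectionError: #network unreachable
#     print('-----')
except RequestException: #base class of all requests exceptions, so it catches everything else
    print('Error')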
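The order of except clauses matters because of how the exception classes are related, and a 4xx/5xx status is not an exception at all until raise_for_status() is called. A minimal offline sketch: the Response object is filled in by hand (status, reason, URL are illustrative) so no request is sent.

```python
import requests
from requests.exceptions import ConnectionError, ConnectTimeout, HTTPError, ReadTimeout, Timeout

# ConnectTimeout is both a ConnectionError and a Timeout; ReadTimeout is only a Timeout
print(issubclass(ConnectTimeout, ConnectionError), issubclass(ConnectTimeout, Timeout))
print(issubclass(ReadTimeout, Timeout), issubclass(ReadTimeout, ConnectionError))

# Hand-filled response standing in for a real 404
fake = requests.Response()
fake.status_code = 404
fake.reason = "Not Found"
fake.url = "http://httpbin.org/status/404"

error_message = None
try:
    fake.raise_for_status()  # turns a 4xx/5xx status into HTTPError
except HTTPError as exc:
    error_message = str(exc)
print(error_message)
```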
6. Upload files
import requests
with open('a.jpg','rb') as f: # close the file once the upload is done
    files={'file':f}
    response=requests.post('http://httpbin.org/post',files=files)
print(response.status_code)
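files= produces a multipart/form-data body, and a tuple (filename, file object, content type) controls how the part is labeled. A minimal offline sketch; the in-memory bytes stand in for a real a.jpg and the extra note field is illustrative.

```python
import io

import requests

files = {"file": ("a.jpg", io.BytesIO(b"fake image bytes"), "image/jpeg")}  # in-memory stand-in for a.jpg
prepared = requests.Request("POST", "http://httpbin.org/post",
                            files=files, data={"note": "hello"}).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data with a random boundary
print(prepared.body.decode("latin-1"))
```

Ordinary form fields passed through data= end up as separate parts of the same multipart body.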

Origin blog.csdn.net/qq_73332379/article/details/133205057