The requests library and common assertion libraries for interface testing

When testing interfaces, the responses come back in formats such as HTML, XML, or JSON. A common scenario is that we need to process the returned data, for example by running assertions that verify it.

Plugin one: JSON parsing library jsonpath

Official documentation: https://goessner.net/articles/JsonPath/

Usage scenarios:

The JSON data returned by a curl request can be piped to jq to pretty-print it: | jq '.'

Combined with grep to look up an element value by keyword: | jq '.' | grep name

An element can be located by its path in the JSON tree: | jq -r '.data.stocks[0].name'

For complex assertions:

For the dictionary element in the stocks list whose symbol value is 'F006947', assert whether its name value equals "Warburg short-term debt Bond A":

import jsonpath
from hamcrest import assert_that, equal_to

assert jsonpath.jsonpath(r.json(),
    "$.data.stocks[?(@.symbol == 'F006947')].name")[0] == "Warburg short-term debt Bond A"
assert_that(jsonpath.jsonpath(r.json(), "$.data.stocks[?(@.symbol == 'F006947')].name")[0],
    equal_to("Warburg short-term debt Bond B"), "compare the listed fund code with its name")
{ "store": {
    "book": [ 
      { "category": "reference",
        "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95 }, { "category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99 }, { "category": "fiction", "author": "Herman Melville", "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99 }, { "category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99 } ], "bicycle": { "color": "red", "price": 19.95 } } }

 

XPath                  JSONPath                  Result
/store/book/author     $.store.book[*].author    the authors of all books in the store
//author               $..author                 all authors
/store/*               $.store.*                 all things in the store, which are some books and a red bicycle
/store//price          $.store..price            the price of everything in the store
//book[3]              $..book[2]                the third book
//book[last()]         $..book[(@.length-1)]
                       $..book[-1:]              the last book in order
//book[position()<3]   $..book[0,1]
                       $..book[:2]               the first two books
//book[isbn]           $..book[?(@.isbn)]        filter all books with an isbn number
//book[price<10]       $..book[?(@.price<10)]    filter all books cheaper than 10
//*                    $..*                      all elements in the XML document / all members of the JSON structure
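To make the expressions above concrete, here is a minimal sketch (not from the original post), assuming the jsonpath package used in the assertion above, run against a trimmed copy of the bookstore document:

import jsonpath  # pip install jsonpath

store = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    }
}

# the authors of all books in the store
assert jsonpath.jsonpath(store, "$.store.book[*].author") == ["Nigel Rees", "Evelyn Waugh"]
# the price of everything in the store
assert sorted(jsonpath.jsonpath(store, "$.store..price")) == [8.95, 12.99, 19.95]
# filter: books cheaper than 10 (jsonpath returns False when nothing matches)
assert jsonpath.jsonpath(store, "$..book[?(@.price<10)]")[0]["title"] == "Sayings of the Century"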

 

Plugin two: HTML parsing library BeautifulSoup (CSS selector + XPath)

References: https://foofish.net/crawler-beautifulsoup.html

BeautifulSoup is a Python library for working with HTML documents. When initializing BeautifulSoup, you need to pass in the HTML document as a string and specify a parser. BeautifulSoup has three common data types: Tag, NavigableString, and BeautifulSoup. There are two ways to find HTML elements, traversing the document tree and searching the document tree, and the two are usually combined to locate data quickly. Any node located through BeautifulSoup is a Tag object of the HTML document.

 

from bs4 import BeautifulSoup  
text = """ <html>  <head>  <title >hello, world</title>  </head>  <body>  <h1>BeautifulSoup</h1>  <p class="bold">如何使用BeautifulSoup</p>  <p class="big" id="key1"> 第二个p标签</p>  <a href="http://foofish.net">python</a>  </body> </html> """ soup = BeautifulSoup(text, "html.parser") # title 标签 >>> soup.title <title>hello, world</ Title > # p label >>> Soup . P < P class = "Bold" > \ u5982 \ u4f55 \ u4f7f \ u7528BeautifulSoup </ P > Content # p tag >>> Soup . P . String U ' \ u5982 \ u4f55 \ u4f7f \ u7528 BeautifulSoup '

 

Plugin three: XML parsing library XPath
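The original post leaves this section empty. As a minimal sketch, assuming lxml is the XPath library in use, locating and asserting on XML response data could look like this:

from lxml import etree  # pip install lxml

xml_text = """<store>
  <book category="reference"><title>Sayings of the Century</title><price>8.95</price></book>
  <book category="fiction"><title>Sword of Honour</title><price>12.99</price></book>
</store>"""
root = etree.fromstring(xml_text)

# all book titles
assert root.xpath("//book/title/text()") == ["Sayings of the Century", "Sword of Honour"]
# filter: books cheaper than 10, mirroring the //book[price<10] row in the table above
assert root.xpath("//book[price<10]/title/text()") == ["Sayings of the Century"]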

 

Plugin four: text parsing library regex
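This section is also empty in the original; a minimal sketch using Python's standard-library re module (an assumption, since the post only names "regex") for pulling values out of a plain-text response:

import re

body = 'code=F006947; name=short-term debt Bond A; price=8.95'
# extract a single named field
match = re.search(r'code=(\w+)', body)
assert match.group(1) == "F006947"
# extract every decimal number in the text
assert re.findall(r'\d+\.\d+', body) == ["8.95"]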

 

Plugin five: assertion library hamcrest

Official documentation: https://github.com/hamcrest/PyHamcrest

from hamcrest import *
import unittest

class BiscuitTest(unittest.TestCase):
    def testEquals(self):
        theBiscuit = Biscuit('Ginger')
        myBiscuit = Biscuit('Ginger')
        assert_that(theBiscuit, equal_to(myBiscuit))

if __name__ == '__main__':
    unittest.main()

def test_hamcrest(self):
    assert_that(0.1 * 0.1, close_to(0.01, 0.000000000000001))
    # assert_that(0.1 * 0.1, close_to(0.01, 0.000000000000000001))
    assert_that(
        ["a", "b", "c"],
        all_of(
            has_items("c", "d"),
            has_items("c", "a")
        )
    )
 

Plugin six: structure assertion library jsonschema

Official documentation: https://github.com/Julian/jsonschema

Practical scenario: when making assertions on JSON we sometimes want to save the JSON data structure and assert against the structure of the whole returned JSON, rather than judging by individual values.

 
 
import json
from jsonschema import validate

schema = json.load(open("list_schema.json"))  # "list_schema.json" is a schema file generated in advance
validate(instance=r.json(), schema=schema)    # validate checks that the incoming JSON data matches the structure described by the schema

Example:

>>> from jsonschema import validate

>>> # A sample schema, like what we'd get from json.load()
>>> schema = {
...     "type" : "object",
...     "properties" : {
...         "price" : {"type" : "number"},
...         "name" : {"type" : "string"},
...     },
... }

>>> # If no exception is raised by validate(), the instance is valid.
>>> validate(instance={"name" : "Eggs", "price" : 34.99}, schema=schema)

>>> validate(
...     instance={"name" : "Eggs", "price" : "Invalid"}, schema=schema,
... )  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
    ...
ValidationError: 'Invalid' is not of type 'number'

It can also be used from the command line:

$ jsonschema -i sample.json sample.schema
 

Plugin seven: template assertion library pystache

Official documentation: https://github.com/defunkt/pystache

Practical scenario: when preparing data for interface tests, the request parameters of an interface are often very large, but usually we only want to update a few specific parameters dynamically and leave the rest untouched.

Example:

>>> import pystache
>>> print(pystache.render('Hi {{person}}!', {'person': 'Mom'}))
Hi Mom!

@classmethod
def parse(cls, template_path, context):
    # replaces the {{placeholders}} in the template with values from the given dict;
    # the template file holds the request JSON
    template = "".join(open(template_path).readlines())
    return pystache.render(template, context)
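A hypothetical usage sketch (the template string and field names below are made up for illustration): keep the full request JSON in a template and substitute only the fields that change per test case.

import json
import pystache

# hypothetical request template: only the fund code varies between test cases
template = '{"fundCode": "{{code}}", "appId": "test", "pageSize": 20}'
body = pystache.render(template, {"code": "F006947"})
payload = json.loads(body)
assert payload["fundCode"] == "F006947"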

 

The requests part:

Original article: https://foofish.net/http-requests.html

requests can be installed directly with pip: pip install requests

>>> import requests
# GET request
>>> response = requests.get("https://foofish.net")

The return value is a Response object. The Response object wraps the response data that, in the HTTP protocol, the server returns to the browser; the main elements of a response include the status code, reason phrase, response headers, response body, and so on, and all of these are exposed as attributes of the Response object.

# status code
>>> response.status_code
200

# reason phrase
>>> response.reason
'OK'

# response headers
>>> for name,value in response.headers.items():
...     print("%s:%s" % (name, value))
...
Content-Encoding:gzip
Server:nginx/1.10.2
Date:Thu, 06 Apr 2017 16:28:01 GMT

# response body
>>> response.content
'<html><body> ... ten thousand words omitted here ... </body></html>'

Besides GET requests, requests supports all the other methods in the HTTP specification, including POST, PUT, DELETE, HEAD, and OPTIONS.

>>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})
>>> r = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')

Building query parameters

Many URLs carry a long string of parameters. We call these the URL's query parameters; they are appended to the URL after a "?", and multiple parameters are separated by "&", for example: http://fav.foofish.net/?p=4&s=20. You can now build the query parameters from a dictionary:

>>> args = {"p": 4, "s": 20}
>>> response = requests.get("http://fav.foofish.net", params = args)
>>> response.url
'http://fav.foofish.net/?p=4&s=20'

Building request headers

requests makes it very simple to specify request header fields (Headers), for example to set a User-Agent that disguises the request as coming from a browser in order to fool the server. Just pass a dictionary object to the headers keyword argument.

>>> r = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})

Building POST request data

requests is very flexible about building the data a POST request needs. If the server expects form data, specify the data keyword argument; if it expects a JSON-formatted string, use the json keyword argument. In both cases the value can be passed as a dictionary.

Sending the data to the server as form data

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)

Sending the data to the server as a JSON-formatted string

>>> import json
>>> url = 'http://httpbin.org/post'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)

The response body in Response

A very important part of the HTTP response message is the response body. requests handles the response body very flexibly; the related attributes are content, text, and json().

content is of type bytes and is suitable when you want to save the content straight to the file system or send it over the network.

>>> r = requests.get("https://pic1.zhimg.com/v2-2e92ebadb4a967829dcd7d05908ccab0_b.jpg")
>>> type(r.content)
<class 'bytes'>
# save it as test.jpg
>>> with open("test.jpg", "wb") as f:
...     f.write(r.content)

text is of type str; use text when the response is, for example, an ordinary HTML page that needs further text analysis.

>>> r = requests.get("https://foofish.net/understand-http.html")
>>> type(r.text)
<class 'str'>
>>> import re
>>> re.compile('xxx').findall(r.text)

When you fetch data from a third-party open platform or an API and the returned content is JSON, you can call the json() method directly to get an object that has already been processed by json.loads().

>>> r = requests.get('https://www.v2ex.com/api/topics/hot.json')
>>> r.json()
[{'id': 352833, 'title': '在长沙,父母同住...

Proxy settings

When a crawler fetches content from a server too frequently it is easily blocked, so if you want to keep crawling data smoothly, using a proxy is a wise choice. If you want to crawl data from outside the firewall, setting a proxy solves that as well; requests has excellent proxy support.

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)

Timeout settings

When requests sends a request, by default the thread blocks until a response comes back and only then runs the subsequent logic. If the server never responds, the problem becomes serious: the whole application stays blocked and cannot handle other requests.

>>> import requests
>>> r = requests.get("http://www.google.coma")
... blocks indefinitely

The correct approach is to explicitly specify a timeout for every request.

>>> r = requests.get("http://www.google.coma", timeout=5)
an error is raised after 5 seconds:
Traceback (most recent call last):
socket.timeout: timed out
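As a small sketch (not in the original article), the timeout can also be caught so that a slow or unreachable server does not crash or block the test run:

import requests

try:
    r = requests.get("http://www.google.coma", timeout=5)
except requests.exceptions.Timeout:
    # the server did not respond within 5 seconds
    print("request timed out")
except requests.exceptions.ConnectionError:
    # e.g. DNS failure or refused connection
    print("connection failed")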

Session

HTTP is a stateless protocol; to maintain the communication state between client and server, Cookies are used to keep the two sides in a shared state.

Some pages require login before they can be crawled. The principle of login is that after the browser first logs in with a username and password, the server sends the client a random Cookie; the next time the browser requests another page, it sends that cookie along with the request, so the server knows the user is already logged in.

import requests
# build a session
session = requests.Session()
# the login url
session.post(login_url, data={"username": username, "password": password})
# a url that can only be accessed after logging in
r = session.get(home_url)
session.close()

After a session is created, the client's first request logs in to the account and the server's cookie information is automatically saved in the session object; on the second request, requests automatically sends the cookie stored in the session back to the server, keeping the communication state.


 
