Using Requests and XPath

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/LXJRQJ/article/details/100672795

Using Requests

Installation: pip3 install requests

1. Common response attributes and methods:

One: GET requests

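import requests

response = requests.get('https://httpbin.org/get')  # example URL; any reachable page works
print(response.text)             # page source, decoded to text
print(response.status_code)      # HTTP status code
print(response.headers)          # response headers
print(response.request.headers)  # headers of the request that was sent
print(response.content)          # response body as raw bytes
* response.encoding = 'utf-8' sets the encoding used to decode response.text
* response.encoding returns the current encoding
* response.json() is the built-in JSON decoder and returns the parsed body; the body must be valid JSON, otherwise parsing fails and an exception is raised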
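Because response.json() raises when the body is not JSON, it can help to wrap the call. The following is a minimal sketch, not part of the original article: the helper names describe_response and safe_json are hypothetical, and the URL in the usage comment is only an illustration.

```python
import requests


def describe_response(response):
    """Collect the commonly used response fields into a dict."""
    return {
        'status': response.status_code,
        'encoding': response.encoding,
        'content_type': response.headers.get('Content-Type'),
        'size': len(response.content),  # body length in bytes
    }


def safe_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # The decoding error raised by requests is a ValueError subclass
        return None


# Example usage (needs network access):
# response = requests.get('https://httpbin.org/get', timeout=10)
# print(describe_response(response))
# print(safe_json(response))
```

Returning None is just one possible design choice; a scraper might prefer to log the failure and re-raise instead.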

Two: POST requests

response = requests.post(url=url, data=data)

	* url: the target URL of the POST request
	* data: the form data sent with the POST request
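To see what the data parameter actually does, the request can be built without sending it. This is a small sketch under assumptions: the form fields are made up, and it uses requests.Request(...).prepare() purely to inspect the encoded body offline.

```python
import requests

# Hypothetical endpoint and form fields, used only for illustration
url = 'https://httpbin.org/post'
data = {'username': 'alice', 'page': '1'}

# Build the request without sending it, to inspect what would go over the wire
prepared = requests.Request('POST', url, data=data).prepare()

print(prepared.method)                   # POST
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # username=alice&page=1
```

A dict passed as data is form-encoded, which is the same format a browser uses when submitting an HTML form.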

Submitting a form and uploading a file with POST

import requests

# Form fields for the login request
form_data = {
    'username': 'LXJ',
    'password': '292143060li'
}
url = 'http://127.0.0.1:8001/api/login/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
response = requests.post(url=url, data=form_data, headers=headers)

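url = 'https://httpbin.org/post'
# Open the local file in binary mode so the bytes are sent unchanged,
# and close it automatically once the upload has finished
with open('pages.html', 'rb') as f:
    files = {'file': f}
    response = requests.post(url=url, files=files, headers=headers)
if response.status_code == 200:
    print('File uploaded successfully')
    print(response.text)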
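The files parameter makes requests send a multipart/form-data body. The sketch below illustrates this without touching the network or the disk: it assumes an in-memory io.BytesIO object as a stand-in for pages.html, and the (filename, file object, content type) tuple is one of the forms the files parameter accepts.

```python
import io

import requests

# In-memory stand-in for a local file, so the sketch runs without 'pages.html'
fake_file = io.BytesIO(b'<html><body>hello</body></html>')
files = {'file': ('pages.html', fake_file, 'text/html')}

# Prepare the upload request without sending it
prepared = requests.Request('POST', 'https://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])                  # multipart/form-data
print(b'filename="pages.html"' in prepared.body)   # True
```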

Set up a proxy (proxies parameter)

import requests

url = 'http://college.gaokao.com/schlist/'
# params: the query parameters appended after the ? in a GET request URL
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

# One entry per URL scheme; dictionary keys must be unique
# Placeholder addresses; replace them with a reachable proxy host:port
proxies = {
    'http': '192.168.2.913:8992',
    'https': '192.168.2.913:9923',
}
response = requests.get(url, params=None, headers=headers, proxies=proxies)
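How the params and proxies arguments are interpreted can be checked offline. In this sketch the proxy addresses and the page query parameter are assumptions chosen for illustration; select_proxy is a helper from requests.utils that returns the proxy entry matching a URL.

```python
import requests
from requests.utils import select_proxy

url = 'http://college.gaokao.com/schlist/'

# Placeholder proxy addresses, one per URL scheme
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# The entry is picked by the scheme of the target URL
print(select_proxy(url, proxies))  # http://10.10.1.10:3128

# params are appended to the URL after the '?' (hypothetical parameter name)
prepared = requests.Request('GET', url, params={'page': '2'}).prepare()
print(prepared.url)  # http://college.gaokao.com/schlist/?page=2
```

Because a dict keeps only one value per key, writing 'http' twice would leave only the last address in effect.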

XPath selectors

<1> What is XPath?

  • XPath (XML Path Language) is a language for finding information in an XML document; it can be used to traverse the elements and attributes of an XML document.

<2> The most common XPath path expressions:

  • / Selects from the root node.
  • // Selects matching nodes anywhere below the current node, regardless of their location.
  • . Selects the current node.
  • .. Selects the parent of the current node.
  • @ Selects attributes.
  • bookstore Selects all elements named bookstore that are children of the current node.
  • /bookstore Selects the root element bookstore. Note: a path that starts with a forward slash (/) always represents an absolute path to an element!
  • bookstore/book Selects all book elements that are children of bookstore.
  • //book Selects all book elements, regardless of their position in the document.
  • bookstore//book Selects all book elements that are descendants of the bookstore element, no matter where they are located below bookstore.
  • //@lang Selects all attributes named lang.
  • /bookstore/* Selects all child elements of the bookstore element.
  • //* Selects all elements in the document.
  • html/node()/meta/@* Selects all attributes of every meta element located under any child node of html.
  • //title[@*] Selects all title elements that have at least one attribute.
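A few of the expressions above can be tried with Python's standard library. This is a sketch under assumptions: the sample XML is invented, and xml.etree.ElementTree supports only a subset of XPath (relative paths, no attribute selection such as //@lang), so a full XPath engine such as lxml would be needed for the complete list.

```python
import xml.etree.ElementTree as ET

# Invented sample document, used only for illustration
xml_text = """
<bookstore>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <magazine><title>Weekly</title></magazine>
</bookstore>
"""
root = ET.fromstring(xml_text)  # root is the <bookstore> element

# book: child elements of the current node named book
books = root.findall('book')
# .//title: title elements anywhere below the current node
titles = [t.text for t in root.findall('.//title')]
# .//title[@lang]: only the titles that carry a lang attribute
with_lang = [t.text for t in root.findall('.//title[@lang]')]
# *: all child elements of the current node
children = [c.tag for c in root.findall('*')]
# .//title/..: the parent element of each title
parents = [p.tag for p in root.findall('.//title/..')]

print(len(books))   # 2
print(titles)       # ['Learning XML', 'Harry Potter', 'Weekly']
print(with_lang)    # ['Learning XML', 'Harry Potter']
print(children)     # ['book', 'book', 'magazine']
print(parents)
```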
