Getting request and response information in Selenium, including the response headers and response body

  When we load a web page with Selenium, we sometimes don't want to extract data from the HTML that the browser has rendered. Parsing is easier if we can read the JSON returned by the URL directly, just like the response data you get from requests or a Scrapy crawler. So how can we do that with Selenium?

Selenium itself does not expose response data. The selenium-wire library fills this gap: it extends Selenium's Python bindings to give access to the underlying requests made by the browser. Code is written the same way as with plain Selenium.

  • Install the selenium-wire library:
pip install selenium-wire
  • Change the import:
# from selenium import webdriver
from seleniumwire import webdriver

 

  • Get the captured network traffic, that is, the URL requests and their responses as shown in the figure:
driver.requests
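
As a rough sketch of walking that list, the helper below collects the method, URL, and status code of every captured request that already has a response. The name `summarize_requests` and the `SimpleNamespace` stand-ins are hypothetical, used here so the example runs without a browser; with selenium-wire you would pass `driver.requests` instead.

```python
from types import SimpleNamespace


def summarize_requests(requests):
    """Return (method, url, status_code) for each request that has a response."""
    summary = []
    for request in requests:
        # response is None when no reply has been captured for this request
        if request.response is None:
            continue
        summary.append((request.method, request.url, request.response.status_code))
    return summary


# Hypothetical stand-ins that mimic only the attributes used above
fake_requests = [
    SimpleNamespace(method="GET", url="https://example.com/api/items",
                    response=SimpleNamespace(status_code=200)),
    SimpleNamespace(method="GET", url="https://example.com/pending", response=None),
]

for method, url, status in summarize_requests(fake_requests):
    print(method, url, status)
```

Skipping entries whose `response` is `None` avoids an `AttributeError` on requests that are still in flight.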

 

  •  Parse the headers of a captured request
driver.requests returns a list. Iterate over it to find the URL you want and read its request information (the response headers are available in the same way through request.response.headers). The following example captures the request headers of a specific URL; those headers can then be reused directly to send the same request yourself.
# Extract the request information for the API endpoint
get_conver_header = {}
get_conver_url = ""
for request in driver.requests:
    if "https://coXXX.com/api/" in request.url:
        get_conver_url = request.url
        for header_key in request.headers:
            get_conver_header[header_key] = request.headers[header_key]
        break
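
One possible way to reuse the captured headers is sketched below with the standard library's `urllib`. The function `build_replay_request`, the `SKIPPED_HEADERS` set, and the sample URL and header values are all assumptions made for illustration; in practice you would pass `get_conver_url` and `get_conver_header` from the loop above.

```python
from urllib.request import Request

# Headers that are usually better recomputed by the client than copied
# (this choice is an assumption, adjust it for your target site)
SKIPPED_HEADERS = {"content-length", "accept-encoding", "host"}


def build_replay_request(url, headers):
    """Build a urllib Request that reuses captured headers, minus SKIPPED_HEADERS."""
    kept = {k: v for k, v in headers.items() if k.lower() not in SKIPPED_HEADERS}
    return Request(url, headers=kept)


# Hypothetical captured values standing in for get_conver_url / get_conver_header
captured_url = "https://example.com/api/conversations"
captured_headers = {
    "User-Agent": "demo-agent",
    "Cookie": "session=abc",
    "Accept-Encoding": "gzip, deflate, br",
}

replay = build_replay_request(captured_url, captured_headers)
print(replay.full_url)
print(replay.header_items())
```

To actually send the request, pass `replay` to `urllib.request.urlopen`. Dropping `Accept-Encoding` typically makes the server return an uncompressed body, which matters because `urllib` does not decompress responses for you.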
  • Get the response body
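from io import BytesIO
import gzip

# Clear the request information captured so far
del driver.requests

# Trigger the request you want to capture here (for example, load the page or
# click a button); otherwise driver.requests is empty and [0] raises IndexError

# Get the body of the first captured response
rp_body = driver.requests[0].response.body

# The body is bytes and needs to be decoded as UTF-8, but decoding it directly fails:
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
# When printed, the bytes start with b'\x1f\x8b\x08', which means the data is
# gzip-compressed, so it has to be decompressed before decoding
buff = BytesIO(rp_body)
f = gzip.GzipFile(fileobj=buff)
htmls = f.read().decode('utf-8')
print(htmls)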
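
The decompress-then-decode step can be wrapped in a small helper that also copes with bodies that are not compressed. This is a minimal sketch: `decode_body` and `GZIP_MAGIC` are hypothetical names, and the sample payload is generated locally in place of a real `response.body`.

```python
import gzip
import json

# gzip streams start with these two bytes
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(body, charset="utf-8"):
    """Decompress the body if it looks gzip-compressed, then decode it to str."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(charset)


# Hypothetical payload standing in for response.body
raw = gzip.compress(json.dumps({"status": "ok", "items": [1, 2]}).encode("utf-8"))

text = decode_body(raw)
data = json.loads(text)  # parse the JSON once it is plain text
print(data["status"])
```

A stricter version would read the `Content-Encoding` response header instead of sniffing the first bytes, and other encodings such as `br` need their own decompressor. The selenium-wire README also shows a `seleniumwire.utils.decode` helper for this purpose; check that the version you have installed provides it.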

Origin blog.csdn.net/qq_48811377/article/details/131900761