requests module - the difference between the text and content of the response object

Send a GET request

  1. Requirement: use requests to send a request to the Baidu homepage and obtain the page's source code
  2. Run the following code and observe the printed output

Sending a GET request with Requests is very simple: just call the requests.get() method. For example:

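# Import the requests library
import requests

# Target URL
url = 'https://www.baidu.com'

# Send a GET request to Baidu and get the response object
response = requests.get(url)

# Print the response body as text, i.e. the HTML source of the Baidu homepage
print(response.text)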

The above code sends a GET request to https://www.baidu.com and prints the response text.

Looking at the output, the returned data contains garbled characters (mojibake).


The garbled characters appear because the bytes were decoded with a different character set from the one used to encode them. The following code decodes the raw bytes explicitly to fix the garbled Chinese:

# Import the requests library
import requests

# Target URL
url = 'https://www.baidu.com'

# Send a GET request to the target URL
response = requests.get(url)

# Print the response body
# print(response.text)
print(response.content.decode())  # note this line!
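
The mismatch behind the garbled output can be reproduced without any network access. The sketch below is only an illustration: the sample string and the choice of ISO-8859-1 as the wrong character set are assumptions, not values taken from Baidu's actual response.

```python
# Sample text standing in for part of a UTF-8 encoded page (an assumption).
original = "百度一下"

# What the server sends: the text encoded as UTF-8 bytes.
raw = original.encode("utf-8")

# Decoding with the wrong character set does not fail here, but yields mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the character set the bytes were written in restores the text.
restored = raw.decode("utf-8")

print(repr(garbled))
print(restored)
```

The bytes are identical in both cases; only the character set chosen for decoding differs.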


Further knowledge

What is the difference between response.text and response.content in requests?

When sending a request with Python's requests library, we can read the response body through either response.text or response.content. Both are attributes of the response object, not methods, so they are written without parentheses. The difference between them is the type of data returned.

response.text
  • Type: str
  • Decoding: requests makes an educated guess at the encoding of the response based on the HTTP headers
  • To change the encoding: response.encoding = "gbk"

response.text returns a str (Unicode text) and is usually used for textual content. If the response body is a text format such as XML or HTML, response.text gives you a string that can be passed straight to a parser.

When you access response.text, requests decodes the body using response.encoding, which it takes from the charset in the Content-Type header. For a text/* response whose header names no charset, requests assumes ISO-8859-1, which is a common cause of garbled Chinese. Only when no encoding can be determined from the headers at all does requests guess from the body itself, using chardet or charset_normalizer depending on the installed version, and that guess can be wrong.
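
One common way to act on this is to override response.encoding before reading response.text. The helper below is a minimal sketch, not part of requests: the names text_with_detected_encoding and fetch_html are hypothetical, and the helper assumes the object it receives exposes headers, encoding, apparent_encoding, and text the way a requests.Response does.

```python
def text_with_detected_encoding(response):
    """Return response.text, preferring the detected encoding when the
    Content-Type header does not declare a charset (hypothetical helper)."""
    content_type = response.headers.get("Content-Type", "")
    if "charset" not in content_type.lower():
        # apparent_encoding is the guess made from the body bytes.
        response.encoding = response.apparent_encoding
    return response.text


def fetch_html(url):
    """Fetch a page and decode it with the helper above (needs network access)."""
    import requests  # imported here so the helper itself has no dependency

    response = requests.get(url, timeout=10)
    return text_with_detected_encoding(response)
```

When the page's encoding is already known, setting response.encoding = "utf-8" by hand is the more predictable choice, since detection can guess wrong.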

Note also that if the response body is binary data rather than text, response.text does not raise an exception. Bytes that cannot be decoded are replaced, so the result is garbled and the original data cannot be recovered from it.
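
A small offline illustration of why text decoding is the wrong tool for binary data. The bytes below are the standard 8-byte PNG file signature; decoding with errors="replace" is used on the assumption that it mirrors how requests builds response.text.

```python
# The first eight bytes of every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"

# Lenient decoding does not raise, but the invalid byte 0x89 is replaced
# with U+FFFD, so the original bytes can no longer be recovered.
as_text = png_signature.decode("utf-8", errors="replace")

# Re-encoding the text does not give back the original bytes.
round_trip_ok = as_text.encode("utf-8") == png_signature

print(repr(as_text))
print(round_trip_ok)
```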

response.content
  • Type: bytes
  • Decoding: none; no character decoding is applied
  • To decode with a chosen encoding: response.content.decode("utf-8")

response.content returns the body as bytes, which suits multimedia files such as images, audio, and video. Writing response.content to a file saves the response locally with the original binary data intact.

Note that response.content returns the raw byte string without applying any character decoding. To turn it into a string, call the decode() method of the bytes object with the correct encoding, for example response.content.decode("utf-8"). You can also write response.content.decode() directly, because bytes.decode() defaults to UTF-8.
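
When the encoding of a page is not known in advance, one possible approach is to try a short list of candidates in order. This is a sketch only: decode_body is a hypothetical helper, and the candidate list ("utf-8", then "gbk") is an assumption that suits many Chinese sites but should be adjusted to the pages you actually fetch.

```python
def decode_body(raw, encodings=("utf-8", "gbk")):
    """Try each candidate encoding in order and return the first clean decode.

    raw is a bytes object such as response.content. The strictest
    encoding should come first, since looser ones may accept wrong input.
    """
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# bytes.decode() with no argument behaves like decode("utf-8").
sample = "中文".encode("utf-8")
print(sample.decode() == sample.decode("utf-8"))

# GBK bytes are not valid UTF-8, so the helper falls through to "gbk".
print(decode_body("中文".encode("gbk")))
```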

In addition, the data returned by response.content can be saved locally or passed to other systems in binary form without any extra encoding or decoding steps.
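
Saving the bytes unchanged might look like the sketch below. The function names save_body and download are hypothetical, and any URL or file path passed to them is up to the caller; only requests.get, raise_for_status, and response.content come from the requests library.

```python
from pathlib import Path


def save_body(response, path):
    """Write the raw response bytes to path and return the number of bytes written."""
    return Path(path).write_bytes(response.content)


def download(url, path):
    """Fetch url and store the body unchanged (needs network access)."""
    import requests  # imported here so save_body has no dependency

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    return save_body(response, path)
```

Because the file is written in binary mode, no character set is involved at any point, which is what keeps images and other media intact.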

In summary, response.text is suited to text, and response.content is suited to binary data. With either one, make sure the data type and the encoding match what the server actually sent.


Origin blog.csdn.net/m0_67268191/article/details/131754269