Headers are metadata carried in HTTP requests and responses; they pass additional parameters and configuration alongside the message body. Sending a request with specific headers allows customized behavior and more precise control over how the server handles it. The following are some common HTTP header fields and their purposes:
Header field | Purpose | Example |
---|---|---|
Authorization | Provide authentication credentials to allow access to resources that require permissions | Authorization: Bearer <token> |
User-Agent | Identifies the type and version of the client sending the request | User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) |
Content-Type | Specifies the media type of the body part of the request or response | Content-Type: application/json |
Accept | Specifies the response content types that the client can accept | Accept: application/json |
Cookie | Used to pass session information between client and server | Cookie: session_id=ABC123 |
Referer | Indicates the address of the page the request came from; servers sometimes check it as one defense against cross-site request forgery | Referer: https://example.com/page1 |
If-Modified-Since | Asks the server to send the resource only if it has changed since the given time; otherwise the server replies 304 Not Modified and the client uses its cached copy, reducing data transfer | If-Modified-Since: Sun, 01 Jan 2023 00:00:00 GMT |
X-Custom-Header | A user-defined header field, which can be used to pass custom information | X-Custom-Header: custom_value |
Note: Header field names are not case-sensitive.
Different header fields carry different information in an HTTP request, which makes the request and response process more flexible. When setting headers, follow the relevant HTTP specifications, and make sure the data you send is secure and that you are permitted to send it.
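As a small illustration of the note on case-insensitivity, the sketch below stores a few of the headers from the table in the `CaseInsensitiveDict` class that requests uses for its own header objects. The header values are just the examples from the table, and the token is a placeholder, not a real credential.

```python
from requests.structures import CaseInsensitiveDict

# Example values taken from the table above; "<token>" is a placeholder.
headers = CaseInsensitiveDict({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json",
    "Authorization": "Bearer <token>",
})

# Lookups ignore case, mirroring how HTTP treats header field names.
print(headers["user-agent"])
print("ACCEPT" in headers)
```

A plain `dict` is also accepted by requests; the case-insensitive class is used here only to make the behavior visible.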
Let's first write some code that fetches Baidu's homepage:
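# Import the requests library
import requests
# URL to request
url = 'https://www.baidu.com'
# Send a GET request and get the response
response = requests.get(url)
# Print the response body
print(response.content.decode())
# Print the headers of the request that produced this response
print(response.request.headers)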
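The last line above prints the headers that requests actually sent. The same defaults can be inspected without making a network call, which may be a convenient way to see how the library identifies itself. This is a minimal sketch using `requests.utils.default_headers()`; the exact values depend on the installed version.

```python
import requests

# Headers that requests attaches when none are supplied explicitly.
default_headers = requests.utils.default_headers()

for name, value in default_headers.items():
    print(f"{name}: {value}")
```

Note that the default User-Agent names the requests library rather than a browser.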
Think
- Compare the source code of the Baidu homepage shown in the browser with the source code obtained by the code. What is the difference?
  - To view the source code of a web page in the browser: right-click and choose "View page source", or right-click and choose "Inspect".
- Compare the response content for the URL in the browser with the source code of the Baidu homepage obtained by the code. What is the difference?
  - To view the response content for the URL in the browser:
    - Right-click and choose "Inspect".
    - Click the Network tab.
    - Tick "Preserve log".
    - Refresh the page.
    - In the Name column, select the entry whose URL matches the browser's address bar, then open its Response tab.
- The source code of the Baidu homepage obtained by the code is very short. Why?
- The request needs to carry request header information. Recall the idea behind a crawler: simulate a browser, deceive the server, and obtain the same content the browser gets.
- The request header contains many fields. Among them, the User-Agent field, which describes the client's operating system and browser, is essential.
- To send a request with request headers:
  requests.get(url, headers=headers)
  - The headers parameter takes the request headers as a dictionary.
  - Each header field name is a key, and the field's value is the corresponding dictionary value.
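To see how a headers dictionary ends up in the outgoing request, the sketch below prepares a request through a `Session` without sending it. The shortened User-Agent string is only an example value; any browser string could be substituted.

```python
import requests

url = "https://www.baidu.com"
# Example browser-like User-Agent; substitute the one copied from your browser.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Build the request without sending it, so the final headers can be inspected offline.
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

# The supplied User-Agent replaces the library default; other defaults are kept.
print(prepared.headers["User-Agent"])
print(prepared.headers)
```

Headers passed in the dictionary override the session defaults with the same name, while defaults that are not overridden remain in place.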
Complete code implementation
Copy the User-Agent string from your browser and use it to build a headers dictionary. Then run the following code and look at the result.
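# Import the requests library
import requests
# URL to request
url = 'https://www.baidu.com'
# Build the request headers dictionary to simulate a browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
}
# Send the request with the User-Agent header, simulating a browser
response = requests.get(url, headers=headers)
# Print the response body (decoded, so it can be compared with the first example)
print(response.content.decode())
# Print the headers of the request that was sent
print(response.request.headers)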
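The effect can also be reproduced offline. The sketch below starts a small local server that returns a short page unless the User-Agent looks like a browser, then requests it with and without the header. The server, its `UADependentHandler` class, and its "Mozilla" check are hypothetical and written only for this illustration; they are not how Baidu decides what to return.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class UADependentHandler(BaseHTTPRequestHandler):
    """Hypothetical server: full page for browser-like clients, stub otherwise."""

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        if "Mozilla" in user_agent:
            body = b"<html>" + b"full page content " * 50 + b"</html>"
        else:
            body = b"<html>stub</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging.


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), UADependentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

browser_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

with requests.Session() as session:
    session.trust_env = False  # Ignore proxy environment variables for this local test.
    plain = session.get(url, timeout=5)
    with_ua = session.get(url, headers=browser_headers, timeout=5)

# Compare the body sizes of the two responses.
print(len(plain.content), len(with_ua.content))

server.shutdown()
server.server_close()
```

The request without the header receives the short page, and the request with the browser-like User-Agent receives the longer one, which mirrors the difference seen between the two Baidu examples.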