Project scenario:
I recently started learning network data collection with Python. When I used Python's requests library to fetch a web page, the server returned HTTP status 418 even though I had added a user-agent to the request headers.
Problem Description
Most websites now have some form of anti-crawler mechanism, so when collecting page data with Python requests you need to add headers; otherwise the request is easily flagged by the site's anti-crawler checks and status code 418 is returned.
Yet even with a user-agent added to the headers, my request still failed: printing res showed that the returned status code was <Response [418]>. Part of the code is shown below:
import requests

def get_data(n):
    base_url = 'https://book.douban.com/top250'
    headers = {
        'User - Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
    }
    params = {
        'start': (n - 1) * 25
    }
    res = requests.get(base_url, headers=headers, params=params)
    print(res)

get_data(1)
Cause Analysis:
After repeatedly inspecting the code, I found the problem was in the headers: the key was written as 'User - Agent', with spaces around the hyphen. This happened because, for convenience, I copied the user-agent directly from the browser's Network panel, as the picture shows.
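To see why the spaces matter, here is a minimal sketch (my own illustration, not from the original post): requests sends whatever header names you give it verbatim, so 'User - Agent' goes out as a header literally named "User - Agent", and the standard User-Agent header is never sent at all. We can confirm this by inspecting a prepared request without touching the network:

```python
import requests

# Build (but do not send) a request with the malformed header key.
req = requests.Request(
    'GET',
    'https://book.douban.com/top250',
    headers={'User - Agent': 'Mozilla/5.0'},
)
prepared = req.prepare()

# The real 'User-Agent' header is absent; the malformed name is sent instead.
print('User-Agent' in prepared.headers)    # False
print('User - Agent' in prepared.headers)  # True
```

Since the server sees no valid User-Agent, the request looks like a bare bot and Douban answers with 418 ("I'm a teapot"), its way of rejecting suspected crawlers.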
Solution:
After deleting the stray spaces so the key reads 'User-Agent', the request finally returned <Response [200]>.
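For reference, a corrected sketch of the snippet above (the URL and user-agent string are from the original code; the `timeout` argument is my own addition as a precaution):

```python
import requests

# Correct header name: 'User-Agent', no spaces around the hyphen.
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/104.0.0.0 Safari/537.36')
}

def get_data(n):
    base_url = 'https://book.douban.com/top250'
    params = {'start': (n - 1) * 25}  # each Douban Top 250 page lists 25 books
    res = requests.get(base_url, headers=HEADERS, params=params, timeout=10)
    return res

# print(get_data(1))  # expected: <Response [200]>
```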
Summary: While writing code, we often run into problems caused by small shortcuts. Such problems are minor, but they can be surprisingly hard to spot.