1. Crawl the original page
The page to crawl is an Amazon product page: https://www.amazon.cn/gp/product/B01M8L5Z3Y
2. Analysis of error-prone points
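Amazon checks the source of each request and may reject requests that identify themselves as coming from a script. By default, requests sends a User-Agent of the form python-requests/<version>, so the code has to change the request header information (the headers) before it can crawl the page. Use a dictionary to construct the key-value pairs:
kv = {'user-agent': 'Mozilla/5.0'}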
For a detailed explanation, see the article I wrote earlier (go find it yourself, hehe)
Link: https://blog.csdn.net/weixin_44578172/article/details/109302571
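To see what the header override actually changes, here is a minimal sketch that builds the request without sending it and compares the default User-Agent with the overridden one. The variable names (default_ua, session, request, prepared) are my own and not part of the original code; it only assumes the requests library is installed.

```python
import requests

# Default User-Agent that requests sends when none is supplied.
# It has the form "python-requests/<version>"; the version depends on the install.
default_ua = requests.utils.default_user_agent()

# The override used in this article.
kv = {'user-agent': 'Mozilla/5.0'}

# Build the request without sending it, so the final headers can be inspected.
# Going through a Session merges the library's default headers with ours.
session = requests.Session()
request = requests.Request(
    'GET', 'https://www.amazon.cn/gp/product/B01M8L5Z3Y', headers=kv
)
prepared = session.prepare_request(request)

print("default:   ", default_ua)
print("overridden:", prepared.headers['User-Agent'])
```

Header names are matched case-insensitively, so the lowercase 'user-agent' key replaces the library's default 'User-Agent' entry rather than adding a second header.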
3. Complete code
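import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Use a dictionary to construct the key-value pairs: Mozilla/5.0 replaces
    # the user-agent in the header of the request that was sent before
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    # Raise an exception if the status code indicates an error (4xx or 5xx)
    r.raise_for_status()
    # Decode the body with the encoding guessed from its content
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException:
    print("Crawl failed")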
If the request succeeds, the script prints the first 1000 characters of the page's HTML.
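The same steps can also be wrapped in a reusable function. This is only a sketch under my own assumptions: the function name fetch_page_text and the timeout parameter do not appear in the original code, and the timeout is added so that a stalled connection does not hang the script forever.

```python
import requests


def fetch_page_text(url, timeout=10):
    """Return the decoded page text, or None if the request fails.

    fetch_page_text and timeout are hypothetical names chosen for this sketch.
    """
    headers = {'user-agent': 'Mozilla/5.0'}
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()              # error out on 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs, and HTTP error statuses
        print("Crawl failed:", exc)
        return None
```

Calling fetch_page_text with the product URL from this article and printing the first 1000 characters of the result should behave like the script above. Catching requests.RequestException instead of using a bare except keeps unrelated problems, such as a typo in a variable name, from being silently reported as a crawl failure.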
That's the end of this article. If you spot any errors, please point them out~
Reference
China University MOOC (中国大学MOOC), "Python Web Crawling and Information Extraction" (Python网络爬虫与信息提取)
https://www.icourse163.org/course/BIT-1001870001