Crawler detected: you're a teapot (HTTP 418)
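import urllib.request
url = "https://movie.douban.com/top250?start=%s&filter=" % 0  # fill in the page offset (first page)
# Build the request object (no custom headers)
req = urllib.request.Request(url)
# Send the request and read the response body
response_1 = urllib.request.urlopen(req).read().decode('utf-8')
print(response_1)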
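Error: urllib.error.HTTPError: HTTP Error 418:
Cause: the site recognized the request as coming from a crawler and answered with status 418. Without a headers argument, urllib sends a default User-Agent of the form Python-urllib/3.x, which gives the script away.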
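To see what the site sees, you can inspect the User-Agent urllib sends when no headers are given, and catch the 418 instead of letting it crash the script. This is a minimal sketch: `default_user_agent` and `fetch_status` are hypothetical helper names, and calling `fetch_status` needs network access.

```python
import urllib.error
import urllib.request


def default_user_agent():
    """Return the User-Agent urllib adds when the request sets none."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers in `addheaders`,
    # a list of (name, value) pairs like ('User-agent', 'Python-urllib/3.x').
    return dict(opener.addheaders)["User-agent"]


def fetch_status(url, headers=None):
    """Hypothetical helper: return the HTTP status code for `url`.

    HTTPError is caught so a 418 (or any other error status) comes back
    as a number instead of being raised.
    """
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code


# Usage (needs network access):
#   print(default_user_agent())   # e.g. Python-urllib/3.11
#   print(fetch_status("https://movie.douban.com/top250?start=0&filter="))
```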
Solution: add headers so the request pretends to come from a browser
import urllib.request
# Define the headers: a browser User-Agent string
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
url = "https://movie.douban.com/top250?start=%s&filter=" % 0  # first page
# Build the request object and attach the headers to it
req = urllib.request.Request(url, headers=headers)
response_1 = urllib.request.urlopen(req).read().decode('utf-8')
print(response_1)
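Because the URL carries a `start=%s` placeholder, the same header trick can be reused for every page. A sketch under the assumption that the Top 250 list is split into 10 pages of 25 entries (start=0, 25, ..., 225); `page_urls`, `build_request` and `fetch_all` are hypothetical helpers, and the User-Agent string is the one from the example above.

```python
import time
import urllib.request

BASE_URL = "https://movie.douban.com/top250?start=%s&filter="
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
    )
}


def page_urls(total=250, per_page=25):
    """Fill the %s placeholder with each page offset (assumes 25 per page)."""
    return [BASE_URL % start for start in range(0, total, per_page)]


def build_request(url):
    """Attach the browser-like headers to a Request object."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_all(delay=1.0):
    """Download every page; needs network access and may still be blocked."""
    pages = []
    for url in page_urls():
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            pages.append(resp.read().decode("utf-8"))
        time.sleep(delay)  # pause between requests to reduce load on the site
    return pages
```

Note that urllib stores header names via `str.capitalize()`, so the header is read back as `req.get_header("User-agent")`.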