Your crawler got caught: you are a teapot
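import urllib.request
# Fill the start placeholder with 0 to request the first page
url = "https://movie.douban.com/top250?start=%s&filter=" % 0
# Build the request object (no custom headers yet)
req = urllib.request.Request(url)
# Send the request and decode the response body
response_1 = urllib.request.urlopen(req).read().decode('utf-8')
print(response_1)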
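Error: urllib.error.HTTPError: HTTP Error 418:
Cause: the server detected that the request came from a crawler and returned 418. By default urllib sends a User-Agent of the form Python-urllib/3.x, which is easy to identify.
Fix: add headers so the request looks like it comes from a browser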
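To confirm that the failure really is a 418 rather than some other error, one option is to catch the exception and inspect its status code. The following is only a minimal sketch: the function name `try_fetch` and its `opener` parameter are hypothetical names introduced here for illustration (the parameter lets the function be exercised without touching the network), and they are not part of the original script.

```python
import urllib.error
import urllib.request


def try_fetch(url, opener=urllib.request.urlopen):
    """Return (status_code, body_text); body_text is None if the server refused.

    `opener` is a hypothetical hook that defaults to urllib.request.urlopen.
    """
    req = urllib.request.Request(url)
    try:
        with opener(req) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status, e.g. 418 when the crawler is detected
        return err.code, None
```

Calling `try_fetch(url)` on the address above and checking whether the first element of the result equals 418 would show whether the site rejected the request for this reason.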
import urllib.request
# Define headers with a browser User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
# Fill the start placeholder with 0 to request the first page
url = "https://movie.douban.com/top250?start=%s&filter=" % 0
# Build the request object and attach the headers to it
req = urllib.request.Request(url, headers=headers)
# Send the request and decode the response body
response_1 = urllib.request.urlopen(req).read().decode('utf-8')
print(response_1)
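The `start=%s` placeholder in the URL suggests that the list is paged. As a rough sketch, the same headers can be attached to one request per page. The page size of 25 and the total of 250 are assumptions made for this example, and `page_requests`, `URL_TEMPLATE` and `HEADERS` are hypothetical names that do not appear in the original script.

```python
import urllib.request

URL_TEMPLATE = "https://movie.douban.com/top250?start=%s&filter="
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
    )
}


def page_requests(total=250, page_size=25):
    """Yield one Request per page, each carrying the browser-like headers.

    total and page_size are assumed values, not confirmed by the site.
    """
    for start in range(0, total, page_size):
        yield urllib.request.Request(URL_TEMPLATE % start, headers=HEADERS)
```

Each yielded object can be passed to `urllib.request.urlopen` exactly as `req` is above. Pausing between pages is probably wise, since a browser-like User-Agent alone may not be enough if requests arrive too quickly.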
HTTP response code 418