Fixing the 403 error when scraping Maoyan Movies with requests

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/weixin_40567229/article/details/84545576

The original code was as follows:

import requests
from requests.exceptions import RequestException


def one_page_code(url):
    try:
        page = requests.get(url)
        if page.status_code == 200:
            return page.text
        print("Failed\nStatus code: %d" % page.status_code)
    except RequestException:
        print("Exception")

def main():
    url = 'http://maoyan.com'
    print(one_page_code(url))

if __name__ == '__main__':
    main()

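This code prints the page source without problems when it requests Baidu, Taobao, or Douban, but when it scrapes Maoyan the server returns a 403 error.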

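It turns out I had overlooked an important part of the request: the request headers. By default, requests identifies itself with a User-Agent of the form python-requests/<version>, which Maoyan appears to reject.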
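To see what the first version of the script was actually sending, the default headers can be printed without making any network request. This is only a small sketch; the variable name `default_headers` is chosen here for illustration, and the exact version string depends on the installed requests release.

```python
import requests

# Headers that requests attaches when no custom headers are passed.
default_headers = requests.utils.default_headers()

# The User-Agent has the form "python-requests/<version>", which makes
# the client easy for a site to recognise as a script, not a browser.
print(default_headers["User-Agent"])
```

A site that filters on User-Agent can refuse such a request with 403 while still serving the same URL to a browser.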

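Open the browser's developer tools (Inspect Element), copy the request headers shown in the Network tab, and add them to the request function: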

import requests
from requests.exceptions import RequestException


def one_page_code(url):
    try:
        # User-Agent copied from the browser's request headers
        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'}
        page = requests.get(url, headers=header)
        if page.status_code == 200:
            return page.text
        print("Failed\nStatus code: %d" % page.status_code)
    except RequestException:
        print("Exception")

def main():
    url = 'http://maoyan.com/board/4'
    print(one_page_code(url))

if __name__ == '__main__':
    main()

With this change, the page source is returned normally.
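If several pages are going to be fetched, one possible refinement is to set the header once on a `requests.Session` so every request reuses it. The sketch below is not part of the original script: `BROWSER_HEADERS`, `build_session`, `fetch`, and the 10-second timeout are names and values assumed here for illustration, and it also assumes a browser-like User-Agent is all the site checks.

```python
import requests

# Assumption: a desktop Chrome User-Agent is enough to avoid the 403.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/70.0.3538.102 Safari/537.36'
}


def build_session(headers=BROWSER_HEADERS):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch(session, url, timeout=10):
    """Return the page text, or None if the request fails or is rejected."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError on 4xx/5xx, e.g. 403
        return response.text
    except requests.RequestException as exc:
        print("Request failed: %s" % exc)
        return None


# Usage (performs a real network request):
#     session = build_session()
#     print(fetch(session, 'http://maoyan.com/board/4'))
```

Using `raise_for_status()` routes a 403 through the same `RequestException` handler as connection errors, so the caller only has to check for `None`.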

