Web Scraping Primer, Example 004: Scraping a JD.com Product Page

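import requests

url = "https://item.jd.com/100006349791.html"
try:
    r = requests.get(url, timeout=10)
    # Raise an HTTPError for 4xx/5xx status codes
    r.raise_for_status()
    # Use the encoding guessed from the response body
    r.encoding = r.apparent_encoding
    # [:1000] is a string slice: the first 1000 characters
    print(r.text[:1000])
except requests.RequestException:
    print("Scrape failed")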

JD.com has anti-scraping measures, so this request fails.

The fixed code:

import requests
headers={
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36",
    "Cookie":"unpl=V2_ZzNsbUJSFxJ0AUVSZ0kOAWUfFwgXVF8dcwoVSH5MC1FhChJbQwNEEGlJKFRzEVQZJkB8XkJfQwklTShUehhaAWAzEVxBVl8UcBRGXWoZVQ5kBRlZRmdDJXUJR1V6GloGbgIibXJXQSV0OEZdexhYBmECGlpyUkZFdQhBBi8FXVdkUA5YS1FLCXwBQVVnHFwFZVBHVRBXSx13OEBS; __jda=122270672.1810527096.1583840132.1583840132.1587205676.1; __jdv=122270672|kong|t_1000027280_100756|zssc|14e60827-ac53-4dd2-973b-4dfe78170e64-p_1999-pr_2191-at_100756|1587205676408; __jdc=122270672; __jdu=1810527096; shshshfpa=5c31968b-41e5-cba8-dba2-b39569a132c9-1587205677; shshshfpb=um3kANbAFjMv5MDiflvoSBQ%3D%3D; 3AB9D23F7A4B3C9B=D35J7U2PGKXJ2GUPPEPRNBKPJWDZYS34NGT3TOIN5D7WXWYROAMHFI2GO6BGTEFJHQO6BPSSQX7BTCQ35PFLX64BUY; areaId=6; ipLoc-djd=6-379-388-0; shshshfp=0ca00c61b8d65ee5c54c217cb3fe41ca"
}
url = "https://item.jd.com/100006349791.html"
try:
    # Send the browser-like headers with the request
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException:
    print("Scrape failed")

How the fix works:

Add a request header. Plenty of User-Agent strings are published online; any common browser one will do.
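To see what the header changes, the sketch below compares the User-Agent that requests sends by default with the one sent after passing a custom dictionary. Nothing goes over the network: the request is only prepared, not sent. The shortened browser string is an illustrative placeholder, and whether JD.com actually filters on this header is an assumption, not something this post confirms.

```python
import requests

# Headers that requests adds when you pass none of your own.
default_ua = requests.utils.default_headers()["User-Agent"]
print(default_ua)  # starts with "python-requests/"

# Shortened, illustrative browser string; substitute a full one in practice.
browser_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Build the request without sending it, then inspect what would go out.
session = requests.Session()
request = requests.Request(
    "GET", "https://item.jd.com/100006349791.html", headers=browser_headers
)
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
```

The default value openly identifies the client as a script, which is one plausible reason a site could treat the request differently from a browser visit.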

Cookie: sent in the same headers dictionary as the User-Agent, as in the code above.
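A long cookie string is hard to read and edit inside a headers dictionary. As a rough sketch, the hypothetical helper `parse_cookie_header` below (not part of requests) splits a raw `name=value; name2=value2` string into a dictionary. The sample string reuses three short pairs from the cookie in this post.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a raw 'name=value; name2=value2' Cookie string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first "=" only, so values containing "=" stay intact.
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


sample = "areaId=6; ipLoc-djd=6-379-388-0; __jdc=122270672"
cookie_dict = parse_cookie_header(sample)
print(cookie_dict)
```

The resulting dictionary can be passed to `requests.get` through its `cookies` parameter instead of writing a `Cookie` entry into `headers` by hand.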

Reposted from blog.csdn.net/qq_42753878/article/details/105603197