Python web scraping: requests

In the previous chapter we introduced the urllib library. It is powerful, but a little tedious to use. Today we'll cover the famous requests library, which is just as powerful but far simpler to work with.

Installing requests

pip install requests
Chinese documentation: http://docs.python-requests.org/zh_CN/latest/index.html
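
To confirm the installation worked, you can import the package and print its version:

import requests
print(requests.__version__)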

GET and POST requests

GET requests

If we want to visit Baidu:

import requests

resp = requests.get("http://www.baidu.com")
print(resp.status_code)  # HTTP status code, e.g. 200
print(resp.headers)      # response headers as a case-insensitive dict
print(resp.text)         # response body decoded to str
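
One thing to watch: resp.text decodes the body using the encoding requests guesses from the response headers, which can produce garbled Chinese text; resp.content is the raw bytes, which you can decode yourself. A minimal sketch:

import requests

resp = requests.get("http://www.baidu.com")
print(resp.encoding)                 # the encoding requests guessed from the headers
print(resp.content.decode("utf-8"))  # decode the raw bytes explicitly
resp.encoding = "utf-8"              # or tell requests which encoding to use
print(resp.text)                     # now decoded as utf-8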

Adding parameters and request headers

For example, searching Baidu for my handle:

import requests

# query-string parameters; requests URL-encodes them for us
data = {
    'wd': 'ermuv5'
}
# a browser User-Agent so Baidu doesn't reject the scripted request
header = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
}
resp = requests.get("http://www.baidu.com/s", params=data, headers=header)
print(resp.status_code)
print(resp.headers)
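
Because requests builds the query string for you, you can inspect the final URL it actually requested; continuing the example above:

print(resp.url)  # http://www.baidu.com/s?wd=ermuv5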

Sending a POST request

The most basic POST request is a single line: response = requests.post("http://www.baidu.com/", data=data)
Here is a fuller example that passes in data and request headers:

import requests

url = "https://www.lagou.com/jobs/positionAjax.json?city=%E5%8C%97%E4%BA%AC&needAddtionalResult=false"

header = {
    "Refer": "https://www.lagou.com/jobs/list_%E7%88%AC%E8%99%AB%E5%B7%A5%E7%A8%8B%E5%B8%88?labelWords=sug&fromSearch=true&suginput=%E7%88%AC%E8%99%AB",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    "X-Anit-Forge-Code": "0",
    "X-Anit-Forge-Token": "None",
    "X-Requested-With": "XMLHttpRequest",
    "Origin": "https://www.lagou.com",
    "Pragma": "no-cache",
    "Cookie": "_ga=GA1.2.1285415081.1535156393; user_trace_token=20180825081952-961f8d7c-a7fc-11e8-a50b-525400f775ce; LGUID=20180825081952-961f9284-a7fc-11e8-a50b-525400f775ce; index_location_city=%E5%8C%97%E4%BA%AC; JSESSIONID=ABAAABAAAGGABCBCF65F9D2618DED9BAEEFB90369B05D3A; _gid=GA1.2.1519692043.1538068027; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1536140500,1536204980,1536204986,1538068028; TG-TRACK-CODE=search_code; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1538068118; LGRID=20180928010838-fa08177b-c277-11e8-a748-525400f775ce; SEARCH_ID=81a30ba9597447c3b6dcdd1bd1417eb0"
}

data = {
    "first": "true",
    "pn": "1",
    "kd": "爬虫工程师",
    "city":"北京"
}

# route the request through an HTTP proxy
proxy = {
    'http': "218.59.193.14:47138"
}
response = requests.post(url, data=data, headers=header, proxies=proxy)

print(response.json())
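
Note that data= sends a form-encoded body (Content-Type: application/x-www-form-urlencoded). If an API expects a JSON body instead, requests can serialize the dict for you via the json= parameter. A minimal sketch against httpbin.org, the echo service used later in this article:

import requests

# data= sends application/x-www-form-urlencoded; httpbin echoes it under "form"
resp = requests.post("http://httpbin.org/post", data={"kd": "爬虫工程师"})
print(resp.json()["form"])

# json= serializes the dict and sets Content-Type: application/json
resp = requests.post("http://httpbin.org/post", json={"kd": "爬虫工程师"})
print(resp.json()["json"])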

Notice that adding a proxy here took just one extra argument. For comparison, here is how the same proxy is added with urllib:

from urllib import request

# without a proxy:
# resp = request.urlopen('http://httpbin.org/get')
# print(resp.read().decode("utf-8"))

# with a proxy:
handler = request.ProxyHandler({"http": "218.59.193.14:47138"})
opener = request.build_opener(handler)
req = request.Request("http://httpbin.org/ip")
resp = opener.open(req)
print(resp.read().decode("utf-8"))
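
The requests equivalent of all of the above is a single call:

import requests

resp = requests.get("http://httpbin.org/ip",
                    proxies={"http": "218.59.193.14:47138"})
print(resp.text)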

Handling untrusted certificates

For sites whose SSL certificate is already trusted, such as https://www.baidu.com/, requests returns the response normally with no extra work. For a site whose certificate is not trusted, pass verify=False to skip certificate verification. Sample code:

resp = requests.get('http://www.12306.cn/mormhweb/', verify=False)
print(resp.content.decode('utf-8'))
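
With verify=False, requests emits an InsecureRequestWarning on every call. If skipping verification is a deliberate choice, you can silence the warning; a sketch using urllib3, the package requests is built on:

import urllib3

# suppress the InsecureRequestWarning emitted when verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)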

Summary

This section covered requests' GET requests, POST requests, proxies, and handling untrusted certificates. As we saw, most tasks in requests take just one step, which truly lives up to its "HTTP for Humans" motto.


Reposted from blog.csdn.net/lovemenghaibin/article/details/82875497