Spider: the requests module (HTTP for Humans)

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_37049781/article/details/81872164

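urllib and urllib2 can already handle ordinary crawling needs, but they are still not very friendly for humans. The requests module covers everything urllib2 can do (it is built on urllib3 rather than urllib2) and additionally supports HTTP keep-alive and connection pooling, cookie-based sessions, file uploads, automatic detection of the response encoding, and more.
requests documentation (Chinese): http://docs.python-requests.org/zh_CN/latest/index.html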

Basic requests

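import requests
# The URL must include a scheme (http:// or https://), otherwise requests raises MissingSchema
response = requests.get("http://www.baidu.com")
# data is the form body sent with the POST request
data = {"wd": "python"}
response = requests.post("http://www.baidu.com", data=data)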
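
A request can fail for many reasons (bad URL, timeout, HTTP error status), so it is usually worth wrapping it. The following is a minimal sketch, not part of the original post: the helper name `fetch` and the 5-second timeout are assumptions chosen for illustration. It also shows what happens when the scheme is left out of the URL.

```python
import requests

def fetch(url, timeout=5):
    """Return the response for url, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise HTTPError for 4xx/5xx status codes
        return response
    except requests.exceptions.RequestException as exc:
        # MissingSchema, ConnectionError, Timeout and HTTPError all derive from RequestException
        print("request failed:", exc)
        return None

# A URL without a scheme is rejected while the request is being prepared,
# before any network traffic is sent.
result = fetch("www.baidu.com")
print(result)
```

Calling `fetch("http://www.baidu.com")` instead would return a response object whose `status_code` and `text` attributes can then be inspected.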

Setting headers and passing parameters

import requests
# Define the query parameters
kw = {"wd":"python"}
headers = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
# requests URL-encodes the parameter dict automatically; for a POST request, pass the parameters with data instead of params
response = requests.get("http://www.baidu.com", params = kw, headers = headers)
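
To see what that automatic encoding actually produces, a request can be prepared without being sent. This is only a sketch: the `/s` path and the keyword "python crawler" are assumptions for illustration, and nothing here touches the network.

```python
import requests

kw = {"wd": "python crawler"}            # hypothetical search keyword containing a space
headers = {"User-Agent": "Mozilla/5.0"}  # shortened User-Agent for readability

# GET: params are URL-encoded and appended to the URL as a query string
get_req = requests.Request("GET", "http://www.baidu.com/s", params=kw, headers=headers).prepare()
print(get_req.url)
print(get_req.headers["User-Agent"])

# POST: data is URL-encoded into the request body instead
post_req = requests.Request("POST", "http://www.baidu.com/s", data=kw).prepare()
print(post_req.body)
print(post_req.headers["Content-Type"])
```

A prepared request like this can later be sent with `requests.Session().send(get_req)`.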

Setting a proxy for a request

import requests

# The proxy is chosen by the scheme of the URL being requested
proxies = {
  "http": "http://12.34.56.79:9527",
  "https": "http://12.34.56.79:9527",
}

response = requests.get("http://www.baidu.com", proxies = proxies)
print(response.text)
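
How the keys of the proxies dict are matched against a URL can be checked offline. The sketch below uses `requests.utils.select_proxy`, a helper that requests uses internally and that is not prominently documented, so treat it as an illustration rather than a stable API. The second port (9528) is a made-up value so the two entries can be told apart.

```python
import requests

proxies = {
    "http": "http://12.34.56.79:9527",
    "https": "http://12.34.56.79:9528",  # hypothetical second port, for illustration only
}

# The key that matches the URL's scheme decides which proxy is used
http_proxy = requests.utils.select_proxy("http://www.baidu.com", proxies)
https_proxy = requests.utils.select_proxy("https://www.baidu.com", proxies)
# No entry for this scheme, so no proxy is selected
ftp_proxy = requests.utils.select_proxy("ftp://example.com", proxies)

print(http_proxy)
print(https_proxy)
print(ftp_proxy)
```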
  • Private proxy authentication
import requests

# If the proxy requires HTTP Basic Auth, put the credentials in the proxy URL (the scheme is required):
proxy = { "http": "http://account:password@host:port" }
response = requests.get("http://www.baidu.com", proxies = proxy)
print(response.text)
  • Web authentication (HTTP Basic Auth)
import requests
auth=('account', 'passwd')
# 'host' is a placeholder; replace it with the full URL, including the scheme
response = requests.get('host', auth = auth)
print(response.text)
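
What the auth tuple turns into on the wire can be inspected by preparing a request without sending it. This is a sketch under assumptions: the httpbin.org URL is only a stand-in for a page protected by Basic Auth, and no network access takes place.

```python
import base64
import requests

auth = ("account", "passwd")
prepared = requests.Request(
    "GET", "http://httpbin.org/basic-auth/account/passwd", auth=auth
).prepare()

# A (user, password) tuple is sent as an "Authorization: Basic <base64>" header
header = prepared.headers["Authorization"]
scheme, token = header.split(" ", 1)
decoded = base64.b64decode(token)

print(scheme)
print(decoded)
```

The credentials are only base64-encoded, not encrypted, so Basic Auth is best used over HTTPS.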

Setting cookies and sessions

  • Adding cookies
import requests
url = 'http://httpbin.org/cookies'
# Pass cookies as a plain dict
cookies = {'cookies_are':'working'}
r = requests.get(url, cookies=cookies)
# Or use a RequestsCookieJar, which can scope each cookie to a domain and path
jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
r = requests.get(url, cookies=jar)
  • Getting cookies
import requests
response = requests.get("http://www.baidu.com/")
# Look up a single cookie by name; get() returns None if the server did not set it
name = response.cookies.get("cookie_name")
print(name)
# Convert the cookies to a dict
cookiejar = response.cookies
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)
  • Using a session
import requests

# 1. Create a session object; it stores cookies across requests
ssion = requests.session()

# 2. Prepare the headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# 3. Send the login request; the session keeps the cookies returned by the server
data = {"email":"account", "password":"password"}
ssion.post("http://www.renren.com/PLogin.do", data = data, headers = headers)
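
What a session attaches to each outgoing request can be checked without logging in anywhere. The sketch below is an illustration under assumptions: the httpbin.org URL and the shortened User-Agent are placeholders, and the request is only prepared, never sent.

```python
import requests

session = requests.Session()
# Headers set on the session are merged into every request it makes
session.headers.update({"User-Agent": "Mozilla/5.0"})
# Cookies stored on the session are sent only when domain and path match
session.cookies.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
session.cookies.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

req = requests.Request("GET", "http://httpbin.org/cookies")
prepared = session.prepare_request(req)  # merges session state into the request

print(prepared.headers["User-Agent"])
print(prepared.headers.get("Cookie"))

# The whole jar can still be dumped to a plain dict
cookiedict = requests.utils.dict_from_cookiejar(session.cookies)
print(cookiedict)
```

After a real login, the cookies returned by the server land in `session.cookies` in the same way, which is why later requests through the same session stay logged in.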

Skipping SSL certificate verification

r = requests.get("https://www.12306.cn/mormhweb/", verify = False)
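
With verify = False, urllib3 prints an InsecureRequestWarning for each HTTPS request. A minimal sketch of turning verification off for a whole session and silencing that warning follows; the helper name `fetch_insecure` and the 5-second timeout are assumptions for illustration, and the helper is defined but not called here.

```python
import requests
import urllib3

# Silence the warning that urllib3 emits for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
session.verify = False  # applies to every request made through this session

def fetch_insecure(url):
    """Fetch url without checking the server certificate."""
    return session.get(url, timeout=5)
```

Skipping verification leaves the connection open to man-in-the-middle attacks, so it is best kept for testing. verify also accepts a path to a CA bundle file, which is the safer choice when a site uses a certificate that the default bundle does not trust.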
