Python Crawlers for Beginners (17): Requests Basics

Life is short, I use Python.

Portal to the previous posts:

Python Crawlers for Beginners (1): Opening Remarks

Python Crawlers for Beginners (2): Pre-preparation (I) - Installing the Basic Libraries

Python Crawlers for Beginners (3): Pre-preparation (II) - Linux Basics

Python Crawlers for Beginners (4): Pre-preparation (III) - Docker Basics

Python Crawlers for Beginners (5): Pre-preparation (IV) - Database Basics

Python Crawlers for Beginners (6): Pre-preparation (V) - Installing the Crawler Frameworks

Python Crawlers for Beginners (7): HTTP Basics

Python Crawlers for Beginners (8): Web Page Basics

Python Crawlers for Beginners (9): Crawler Basics

Python Crawlers for Beginners (10): Sessions and Cookies

Python Crawlers for Beginners (11): urllib Basics (I)

Python Crawlers for Beginners (12): urllib Basics (II)

Python Crawlers for Beginners (13): urllib Basics (III)

Python Crawlers for Beginners (14): urllib Basics (IV)

Python Crawlers for Beginners (15): urllib Basics (V)

Python Crawlers for Beginners (16): urllib in Action - Crawling Images

Introduction

In the earlier preparation posts we installed quite a few third-party request libraries, such as Requests, AioHttp, and so on. If you don't remember them, you can look back through the earlier articles.

In the previous articles we got a general picture of the basic usage of urllib, and it does have quite a few inconvenient spots: for example, handling Cookies or making requests through a proxy requires building an Opener and a Handler to deal with it.

This is where the more powerful Requests library comes in, as a matter of course. With Requests, we can perform these higher-order operations far more simply and conveniently.
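As a small taste of that convenience, here is a minimal sketch (the proxy address below is a made-up placeholder, not a real proxy) of how Requests takes proxies and Cookies as plain keyword arguments, where urllib would need a Handler and an Opener:

import requests

# A made-up local proxy address, purely for illustration
proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}

# With urllib this would require a ProxyHandler and a custom Opener;
# with Requests it is a single keyword argument (uncomment if you have a proxy):
# r = requests.get('https://httpbin.org/get', proxies=proxies)

# Sending Cookies is just as direct:
r = requests.get('https://httpbin.org/cookies', cookies={'name': 'geekdigging'})
print(r.text)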

A Brief Introduction

As usual, first of all, here are the various official links:

The reason for listing the various official documents here is the hope that you will develop the habit of consulting official documentation. After all, I'm only human and make mistakes too; by comparison, the error rate of official documentation is very low, and sometimes even tricky problems can be solved just by reading it.

We already covered all the basic concepts when introducing the basics of urllib, so I won't ramble on about them again here. Let's get straight to the good stuff: writing code.

The test address we use here is still the one mentioned earlier: https://httpbin.org/

GET Requests

GET is the request we use most often, so let's first see how to send a GET request with Requests. The code is as follows:

import requests

r = requests.get('https://httpbin.org/get')
print(r.text)

The result is as follows:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0"
  }, 
  "origin": "116.234.254.11, 116.234.254.11", 
  "url": "https://httpbin.org/get"
}

There isn't much to explain here; it works just like urllib did earlier.

If we want to add query parameters to a GET request, how do we do that?

import requests

params = {
    'name': 'geekdigging',
    'age': '18'
}

r1 = requests.get('https://httpbin.org/get', params=params)
print(r1.text)

The result is as follows:

{
  "args": {
    "age": "18", 
    "name": "geekdigging"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0"
  }, 
  "origin": "116.234.254.11, 116.234.254.11", 
  "url": "https://httpbin.org/get?name=geekdigging&age=18"
}

As you can see, the request URL is automatically constructed as: https://httpbin.org/get?name=geekdigging&age=18

One point worth noting is that r1.text returns data of type str, even though the content is actually JSON. If you want to parse that JSON directly into a dictionary we can use, call the json() method:

print(type(r1.text))
print(r1.json())
print(type(r1.json()))

The result is as follows:

<class 'str'>
{'args': {'age': '18', 'name': 'geekdigging'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0'}, 'origin': '116.234.254.11, 116.234.254.11', 'url': 'https://httpbin.org/get?name=geekdigging&age=18'}
<class 'dict'>
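One caveat: if the response body is not valid JSON, json() raises an exception rather than returning None. A minimal sketch of defensive usage, assuming the same httpbin endpoint:

import requests

r = requests.get('https://httpbin.org/get')
try:
    data = r.json()      # parse the body as JSON into a dict
    print(data['url'])
except ValueError:       # the JSON decode error raised here is a ValueError subclass
    print('Response body is not valid JSON:', r.text[:100])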

Adding request headers:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'referer': 'https://www.geekdigging.com/'
}
r2 = requests.get('https://httpbin.org/get', headers=headers)
print(r2.text)

The result is as follows:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "Referer": "https://www.geekdigging.com/", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
  }, 
  "origin": "116.234.254.11, 116.234.254.11", 
  "url": "https://httpbin.org/get"
}

Just as with urllib.request, we pass the header information through the headers parameter.
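If the same headers are needed on every request, a requests.Session can set them once and reuse them; a Session also keeps Cookies between requests automatically. A minimal sketch:

import requests

s = requests.Session()
# Headers set on the Session are sent with every request it makes
s.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'})

r = s.get('https://httpbin.org/get')
print(r.json()['headers']['User-Agent'])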

What if we want to grab a file such as an image or a video?

These files are all essentially made up of binary data; it is only because they have specific storage formats and corresponding parsing methods that we can see all these varied media. So to grab them, we need to get hold of their binary content.

For example, let's grab the Baidu logo image, whose address is: https://www.baidu.com/img/superlogo_c4d7df0a003d3db9b65e9ef0fe6da1ec.png

import requests

r3 = requests.get("https://www.baidu.com/img/superlogo_c4d7df0a003d3db9b65e9ef0fe6da1ec.png")
# content holds the raw bytes of the response body, so write in binary mode
with open('baidu_logo.png', 'wb') as f:
    f.write(r3.content)

I won't show the result here; the image downloads normally.
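For large files it is worth knowing that content reads the whole body into memory at once. A minimal sketch of a streamed download, assuming the same Baidu logo URL, which writes the file in chunks instead:

import requests

url = 'https://www.baidu.com/img/superlogo_c4d7df0a003d3db9b65e9ef0fe6da1ec.png'

# stream=True defers downloading the body until we iterate over it
with requests.get(url, stream=True) as r:
    with open('baidu_logo_streamed.png', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):  # read 8 KB at a time
            f.write(chunk)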

POST Requests

Next, let's look at the equally common POST request. As with the GET request above, we still use https://httpbin.org/post for testing. The sample code is as follows:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'referer': 'https://www.geekdigging.com/'
}

params = {
    'name': 'geekdigging',
    'age': '18'
}

r = requests.post('https://httpbin.org/post', data=params, headers=headers)
print(r.text)

The result is as follows:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "18", 
    "name": "geekdigging"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "23", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "Referer": "https://www.geekdigging.com/", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
  }, 
  "json": null, 
  "origin": "116.234.254.11, 116.234.254.11", 
  "url": "https://httpbin.org/post"
}

In this POST request we added both request headers and parameters; you can see them echoed back in the headers and form fields above.
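A related point: the data parameter sends a form-encoded body, as the Content-Type above shows. If the server expects a JSON body instead, Requests offers a json parameter that serializes the dict and sets the header for us. A minimal sketch:

import requests

payload = {'name': 'geekdigging', 'age': '18'}

# json= serializes payload to a JSON body and sets
# Content-Type: application/json automatically
r = requests.post('https://httpbin.org/post', json=payload)
print(r.json()['json'])   # httpbin echoes the parsed JSON body back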

The Response Object

Above we used text and json() to get the response content. Besides those two, the Response object has many other properties and methods for getting at other information.

Let's demonstrate by visiting the Baidu homepage:

import requests

r = requests.get('https://www.baidu.com')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)

The result is as follows:

<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'Keep-Alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Thu, 05 Dec 2019 13:24:11 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:23:55 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
<class 'str'> https://www.baidu.com/
<class 'list'> []

Here we print the status_code property to get the status code, the headers property to get the response headers, the cookies property to get the Cookies, the url property to get the URL, and the history property to get the request history.
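status_code also pairs with a handy check: raise_for_status() throws an exception for 4xx and 5xx responses, so a crawler can fail fast instead of parsing an error page. A minimal sketch, using an httpbin endpoint that deliberately returns 404:

import requests

r = requests.get('https://httpbin.org/status/404')
print(r.status_code)       # 404

try:
    r.raise_for_status()   # raises requests.exceptions.HTTPError for 4xx/5xx
except requests.exceptions.HTTPError as e:
    print('Request failed:', e)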

Sample Code

All the code in this series will be put in the code repositories on Github and Gitee, so everyone can grab it easily.

Sample Code - Github

Sample Code - Gitee
