Sesame HTTP: Basic Usage of the Requests Library for Python Crawlers

Foreword

Previously we used the urllib library, which is a good starting point: it helps you understand the basic concepts of crawlers and the overall crawling workflow. Once the basics are in place, it is worth picking up more advanced tools that make crawling easier, so this section briefly introduces the basic usage of the requests library.

Install

Install with pip:

$ pip install requests

Or use easy_install:

$ easy_install requests

Either of the two commands above will complete the installation.
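
To confirm the installation, you can import the library and print its version; this is a minimal check, and the version you see will depend on what pip installed.

import requests

# if this runs without an ImportError, requests is installed correctly
print(requests.__version__)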

Introduction

Let's start with a small example to get a feel for the library:


import requests

r = requests.get('http://cuiqingcai.com')
print(type(r))
print(r.status_code)
print(r.encoding)
# print(r.text)
print(r.cookies)

In the code above, we request the URL of this site and then print the type of the returned object, the status code, the encoding, the cookies, and so on.

The result is as follows:


<class 'requests.models.Response'>
200
UTF-8
<RequestsCookieJar[]>

Isn't that convenient? And it gets even more convenient later on.

Basic requests

The requests library provides functions for all the basic HTTP request methods. For example:

r = requests.post("http://httpbin.org/post")
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")

In short, each HTTP method is a single function call.
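
As a quick sanity check (assuming httpbin.org is reachable), each of these calls returns a Response object whose status code and request method you can inspect; the snippet below is just an illustrative sketch.

import requests

# httpbin.org simply echoes requests back, so this should return 200
r = requests.delete("http://httpbin.org/delete")
print(r.status_code)     # 200
print(r.request.method)  # DELETE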

Basic GET request

The most basic GET request uses the get method directly:

r = requests.get("http://httpbin.org/get")

If you want to add query parameters, use the params argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)
print(r.url)

The result is:

http://httpbin.org/get?key2=value2&key1=value1
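
A value in the params dictionary can also be a list, in which case requests repeats the key in the query string. A small sketch (the key names are purely illustrative):

import requests

# a list value is encoded as repeated keys: key1=value1&key2=value2&key2=value3
payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get("http://httpbin.org/get", params=payload)
print(r.url)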

If the response body is JSON, you can use the json() method to parse it.

For example, create a JSON file named a.json with the content below. Note that requests can only fetch URLs, not local file paths, so serve the directory over HTTP with a simple local server (for example, python -m SimpleHTTPServer, which serves the current directory on port 8000).

["foo", "bar", {
  "foo": "bar"
}]

Then use the following program to request and parse it:

import requests

# a.json is served by the local HTTP server started in the same directory
r = requests.get("http://localhost:8000/a.json")
print(r.text)
print(r.json())

The output is shown below. The first part is the raw text printed directly, and the second is the result of parsing with json(); compare the two to see the difference.

["foo", "bar", {
  "foo": "bar"
}]
[u'foo', u'bar', {u'foo': u'bar'}]
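
Keep in mind that json() only succeeds when the body really is valid JSON; otherwise it raises an error (a ValueError in this version of requests). A hedged sketch of defensive parsing:

import requests

r = requests.get("http://httpbin.org/html")  # returns HTML, not JSON
try:
    data = r.json()
except ValueError:
    # the body was not valid JSON
    data = None
print(data)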

If you want the raw socket response from the server, you can access r.raw, but you must set stream=True in the original request.

r = requests.get('https://github.com/timeline.json', stream=True)
r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

In this way you get the raw, undecoded response content.
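
For most downloads, though, a simpler way to stream a large response is Response.iter_content, which reads the body in chunks so it never has to fit in memory all at once. A minimal sketch (the URL, filename, and chunk size are just examples):

import requests

r = requests.get('http://httpbin.org/bytes/102400', stream=True)

# write the body to disk 1 KB at a time instead of loading it all into memory
with open('downloaded.bin', 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)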

If you want to add request headers, pass the headers argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
headers = {'content-type': 'application/json'}
r = requests.get("http://httpbin.org/get", params=payload, headers=headers)
print(r.url)

Any header fields you need can be added to the request this way.
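
For a crawler, the most common use of custom headers is setting the User-Agent, since many sites treat the default python-requests User-Agent differently from a browser. A small sketch (the User-Agent string is only an example):

import requests

# pretend to be an ordinary browser; the exact string is illustrative
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
r = requests.get("http://httpbin.org/get", headers=headers)
print(r.json()['headers']['User-Agent'])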

Basic POST request

 

For POST requests we usually need to send some parameters along. The most basic way to pass them is the data argument:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)

The result is:



{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": null,
  "url": "http://httpbin.org/post"
}

You can see the parameters were sent successfully, and the server echoed back the data we posted in the form field.

Sometimes the data we need to send is not form-encoded but JSON. In that case we can serialize the dictionary ourselves with json.dumps() and send the resulting string as the request body:

import json
import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print(r.text)

The result is:

{
  "args": {},
  "data": "{\"some\": \"data\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "16",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": {
    "some": "data"
  },
  "url": "http://httpbin.org/post"
}

With this approach we can POST data in JSON format.
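
Versions of requests from 2.4.2 onward (including the 2.9.1 shown in the output above) also provide a json keyword argument that serializes the dictionary for you and sets the Content-Type header to application/json. A minimal sketch:

import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}

# requests serializes the dict and sets Content-Type: application/json automatically
r = requests.post(url, json=payload)
print(r.json()['json'])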

If you want to upload a file, use the files argument:

Create a new test.txt file containing Hello World!, then run:

import requests

url = 'http://httpbin.org/post'
files = {'file': open('test.txt', 'rb')}
r = requests.post(url, files=files)
print(r.text)

The result is as follows:



{
  "args": {},
  "data": "",
  "files": {
    "file": "Hello World!"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "156",
    "Content-Type": "multipart/form-data; boundary=7d8eb5ff99a04c11bb3e862ce78d7000",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": null,
  "url": "http://httpbin.org/post"
}

The file was uploaded successfully.
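
If you need to control the filename or content type of the uploaded part, the files dictionary also accepts a tuple instead of a bare file object. A sketch (the filename and content type here are just examples):

import requests

url = 'http://httpbin.org/post'

# (filename, file object, content type); a fourth element with extra headers is also allowed
files = {'file': ('report.txt', open('test.txt', 'rb'), 'text/plain')}
r = requests.post(url, files=files)
print(r.status_code)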

requests also supports streaming uploads, which let you send large files or data streams without first reading them into memory. To use a streaming upload, simply provide a file-like object as the request body (opened in binary mode):



with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

This is a very useful and convenient feature.
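
You can also pass a generator as the body, in which case requests sends the request with chunked transfer encoding. A minimal sketch (the generator here just yields two example chunks, and httpbin.org is used as a stand-in endpoint):

import requests

def gen():
    # each yielded chunk becomes part of a chunked-encoded request body
    yield b'hello '
    yield b'world'

r = requests.post('http://httpbin.org/post', data=gen())
print(r.status_code)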
