Foreword
In the previous sections we used the urllib library, which is a good tool for getting started: it helps you understand the basic concepts of crawlers and the overall crawling process. Once past the basics, though, we need more advanced tools to make crawling easier. So this section briefly introduces the basic usage of the requests library.
Install
Install with pip
$ pip install requests
Or with easy_install (an older tool; pip is generally preferred)
$ easy_install requests
Either of the two methods above will complete the installation.
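To check that the installation worked, you can import the library and print its version (the exact version string will depend on what pip installed):

```python
# Quick sanity check: if this runs without an ImportError,
# requests is installed correctly.
import requests

print(requests.__version__)
```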
Introduction
First, let's look at a small example to get a feel for the library:
import requests

r = requests.get('http://cuiqingcai.com')
print(type(r))
print(r.status_code)
print(r.encoding)
#print(r.text)
print(r.cookies)
In the above code, we request the URL of this site, and then print out the type of returned result, status code, encoding method, Cookies, etc.
The running result is as follows
<class 'requests.models.Response'>
200
UTF-8
<RequestsCookieJar[]>
Convenient, isn't it? And it gets even better later on.
Basic requests
The requests library provides all the basic HTTP request methods. For example:
r = requests.post("http://httpbin.org/post")
r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")
Each of these is a single line of code.
Basic GET request
The most basic GET request can be sent directly with the get method:
r = requests.get("http://httpbin.org/get")
If you want to add query parameters, pass a dict via the params argument:
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)
print(r.url)
The output is as follows:
http://httpbin.org/get?key1=value1&key2=value2
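As a side note (not covered above), a params dict can also map a key to a list of values, in which case requests repeats the query parameter. We can inspect the URL that would be sent without touching the network by preparing the request with requests.Request — a small sketch:

```python
import requests

# A list value becomes a repeated query parameter.
payload = {'key1': 'value1', 'key2': ['a', 'b']}

# Build and prepare the request locally instead of sending it,
# so we can inspect the final URL offline.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
print(prepared.url)
```

The printed URL contains key2=a&key2=b, one pair per list element.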
If the response body is a JSON document, you can parse it with the json() method.
For example, write a JSON file named a.json yourself with the following content:
["foo", "bar", { "foo": "bar" }]
Serve the file over HTTP and request it with the following program. Note that requests needs a full URL, not a bare filename; the localhost URL below assumes the file is served by a local web server:

import requests

# Assumes a.json is served locally, e.g. by running `python -m http.server 8000`
# in the directory that contains it.
r = requests.get("http://localhost:8000/a.json")
print(r.text)
print(r.json())
The output is shown below. The first line prints the content directly; the second uses json() to parse it into Python objects. Note the difference between them:
["foo", "bar", { "foo": "bar" }]
['foo', 'bar', {'foo': 'bar'}]
If you want access to the raw socket response from the server, you can use r.raw, but you must set stream=True in the initial request:
>>> r = requests.get('https://github.com/timeline.json', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
This gives you the raw, undecoded bytes of the response.
If you want to add headers, you can pass the headers parameter
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
headers = {'content-type': 'application/json'}
r = requests.get("http://httpbin.org/get", params=payload, headers=headers)
print(r.url)
Any headers passed this way are sent along with the request.
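A common use of custom headers is overriding the default User-Agent. The User-Agent string below is a made-up example; again we only prepare the request locally to see what would be sent, without making a network call:

```python
import requests

headers = {'User-Agent': 'my-crawler/0.1'}  # hypothetical User-Agent string

# Prepare the request locally so we can inspect the outgoing headers.
prepared = requests.Request('GET', 'http://httpbin.org/get', headers=headers).prepare()
print(prepared.headers['User-Agent'])  # my-crawler/0.1
```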
Basic POST request
For POST requests we usually need to send some data along. The most basic way to pass parameters is the data argument:
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)
The output is as follows:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": null,
  "url": "http://httpbin.org/post"
}
As you can see, the parameters were passed successfully, and the server echoed back the data we sent.
Sometimes the data we need to send is not form-encoded; the server may expect a JSON body instead. In that case we can serialize the dict ourselves with json.dumps():
import json
import requests

url = 'http://httpbin.org/post'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print(r.text)
The output is as follows:
{
  "args": {},
  "data": "{\"some\": \"data\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "16",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": {
    "some": "data"
  },
  "url": "http://httpbin.org/post"
}
Through the above method, we can POST data in JSON format
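requests also provides a json keyword argument that does the json.dumps() call for you and sets the Content-Type header to application/json automatically. Preparing the request locally (no network call) shows the effect:

```python
import json
import requests

# The json= keyword serializes the dict and sets the Content-Type header.
prepared = requests.Request('POST', 'http://httpbin.org/post',
                            json={'some': 'data'}).prepare()
print(prepared.body)                     # the serialized JSON body
print(prepared.headers['Content-Type'])  # application/json
```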
If you want to upload a file, use the files parameter.
Create a new file named a.txt containing Hello World!, then run:
import requests

url = 'http://httpbin.org/post'
files = {'file': open('a.txt', 'rb')}
r = requests.post(url, files=files)
print(r.text)
You can see the results of the operation are as follows
{
  "args": {},
  "data": "",
  "files": {
    "file": "Hello World!"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "156",
    "Content-Type": "multipart/form-data; boundary=7d8eb5ff99a04c11bb3e862ce78d7000",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.9.1"
  },
  "json": null,
  "url": "http://httpbin.org/post"
}
So we have successfully completed the upload of a file.
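You can also control the uploaded filename and content type by passing a tuple instead of a bare file object. The sketch below supplies the content as an in-memory bytes string, so no file on disk is needed, and only prepares the request rather than sending it:

```python
import requests

# (filename, content, content-type) — content can be bytes, no real file needed.
files = {'file': ('a.txt', b'Hello World!', 'text/plain')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()
print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
```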
requests supports streaming uploads, which let you send large data streams or files without reading them into memory first. To stream an upload, simply provide a file-like object (opened in binary mode) as your request body:
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
This is a very useful and convenient feature.
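Relatedly, if you pass a generator as data, requests sends the body with chunked transfer encoding, since the total length is not known up front. Preparing the request locally shows the resulting header without sending anything:

```python
import requests

def body_chunks():
    # Yields the body piece by piece; total length unknown in advance.
    yield b'first chunk '
    yield b'second chunk'

prepared = requests.Request('POST', 'http://httpbin.org/post',
                            data=body_chunks()).prepare()
print(prepared.headers.get('Transfer-Encoding'))  # chunked
```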