Getting started with Python crawlers (2): common methods and usage of the requests library

Preface

Learning makes me happy, games make me sad. Today's rush B was another day of getting nowhere.
Brothers, let's learn the methods and usage of the requests library together and immerse ourselves in the world of code. Woooo~~

1. Introduction to the requests library

First, the official website of the requests library is here: Requests library Chinese official website.
The library's own tagline is "Requests: HTTP for Humans" (domineering).

The official site has detailed usage documentation and some small examples, so if you don't want to read my long-winded blog you can go straight there and learn.

The requests library is based on the urllib framework.

2. Common methods of the requests library

Method  Description
requests.get()  Fetches a page (sends an HTTP GET request)
requests.post()  Sends an HTTP POST request
requests.put()  Sends an HTTP PUT request
requests.delete()  Sends an HTTP DELETE request
requests.head()  Fetches the HTTP headers (sends an HTTP HEAD request)
requests.options()  Sends an HTTP OPTIONS request
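As a quick orientation, here is a minimal sketch that calls all six methods against http://httpbin.org, the practice site introduced at the end of this article; each call returns a response object, which we look at in the next section.

import requests

# a minimal sketch: one call per method, all aimed at httpbin.org,
# a test service that echoes back whatever it receives
r = requests.get('http://httpbin.org/get')        # fetch a page
r = requests.post('http://httpbin.org/post')      # send a POST request
r = requests.put('http://httpbin.org/put')        # send a PUT request
r = requests.delete('http://httpbin.org/delete')  # send a DELETE request
r = requests.head('http://httpbin.org/get')       # fetch only the response headers
r = requests.options('http://httpbin.org/get')    # ask which methods are allowed

print(r)  # every call returns a response object, e.g. <Response [200]>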

3. Properties of the response object

Before looking at the properties of the response object, let's first see what a response object actually is. Open a browser, search for CSDN, for example, and press F12 to open the developer tools. Under Network you can see the two sets of headers circled in red in the screenshot below.
[Screenshot: Request Headers and Response Headers in the browser developer tools]
The request object is the request the browser sends to the server, and the response object is the server's reply to that request. Let's look at the code below.

import requests

r = requests.get('https://www.baidu.com')

So we can think of 'https://www.baidu.com' as the request, and the requests.get() method returns a response, which gives us the response object r. Let's try printing r directly.

import requests

r = requests.get('https://www.baidu.com')
print(r)

Execution result:
[Screenshot of the printed output]
As you can see, the result is not the HTML of the Baidu page we were expecting, but something like <Response [200]>, which only shows the response status code. That is because r is a response object, and we need to access its properties to get the information we want.
Properties of the response object:

Attribute  Description
response.text  The text content of the HTTP response body
response.encoding  The encoding used to decode response.text, taken from the response headers
response.apparent_encoding  The encoding guessed by analysing the response content itself
response.status_code  The HTTP response status code
response.content  The HTTP response body in binary (bytes) form
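As a quick illustration, here is a small sketch that prints each of these properties for a simple GET request (the exact values will vary):

import requests

r = requests.get('https://www.baidu.com')

print(r.status_code)        # the HTTP status code, e.g. 200
print(r.encoding)           # encoding taken from the response headers
print(r.apparent_encoding)  # encoding guessed from the response body
print(type(r.content))      # <class 'bytes'>, the raw binary body
print(len(r.text))          # length of the decoded text body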

Let's take a look at the following code

import requests

r = requests.get('https://www.baidu.com')
r.encoding = 'utf-8'

print(r.text)

Execution result:
[Screenshot of the Baidu homepage HTML]
This time we get the HTML of the Baidu homepage.
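Instead of hardcoding 'utf-8', we can also let requests guess the encoding from the page content itself; a small variation of the code above:

import requests

r = requests.get('https://www.baidu.com')
# use the encoding detected from the page content
# instead of a hardcoded value
r.encoding = r.apparent_encoding

print(r.text)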

4. Using the common methods of the requests library

1. Using requests.get()

The get method is the most commonly used method in the requests library; by changing its parameters we can get the HTTP response we want.
For example, let's visit the Douban website directly and see what happens.
Code:

import requests

r = requests.get('https://www.douban.com')
r.encoding = 'utf-8'

print(r.text)

Execution result:
[Screenshot: the output is empty]
Hey, what's going on? There's nothing there, and the code isn't wrong! Why is there no text?
Let's print the HTTP response status code instead.
Code:

import requests

r = requests.get('https://www.douban.com')
r.encoding = 'utf-8'

print(r.status_code)

Execution result:
[Screenshot: the output is 418]
You can see that the HTTP status code is 418. What does this status code mean? Status code 418 ("I'm a teapot") is the joke error a server returns when a client asks a teapot to brew coffee. Some websites use it as an easter egg, and some also use it as a warning to crawlers. In other words, we have been warned: Douban is telling us, "You naughty crawler, if you don't follow the rules, we won't show you anything! Humph ╭(╯^╰)╮!" So we need to access Douban the normal way. Fortunately, requests.get() lets us add parameters that make the request look like it comes from a browser. Now we can say, "Hey, you can't get away."
So how do we imitate a browser?
Open any webpage, press F12 to open the developer tools, go to Network, and click any item under Name. You will see something like this:
[Screenshot: Request Headers in the developer tools, with User-Agent circled in red]
The User-Agent entry circled in red under Request Headers is what tells the server which browser is making the request. In code, we write it like this:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 \
            (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
}  # imitate a browser

r = requests.get('https://www.douban.com', headers=headers)
r.encoding = 'utf-8'

print(r.text)
print('Status code:')
print(r.status_code)

Execution result:
[Screenshot: the Douban HTML and status code 200]
You can see that this time we get the Douban HTML, along with the status code 200.
requests.get() has other useful parameters as well. For example, you can write

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)

instead of

r = requests.get("http://httpbin.org/get?key1=value1&key2=value2")

You can explore the rest on the official requests website and write more code to practice on your own.

2. Using requests.post()

The requests library also lets us send HTTP POST requests, because some web pages do not accept key/value pairs appended to the URL and only accept data submitted as a form. We can write:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)

This way we can fill in the contents of the form ourselves. As for what an HTML form is, that is something you will have to look up yourself.
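httpbin.org echoes back what it received, so we can check that the form data actually arrived; a minimal sketch:

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('http://httpbin.org/post', data=payload)

# httpbin returns JSON describing the request it received;
# the submitted form fields show up under the 'form' key
print(r.json()['form'])  # {'key1': 'value1', 'key2': 'value2'}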

3. Using put, delete, and other methods

The other methods are used in much the same way. Once you know the basics of HTTP requests, it becomes clear which method to use for the web resources you want to crawl.
Usage examples:

r = requests.put('http://httpbin.org/put', data={'key': 'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
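For head() and options() there is no page body worth reading; what we usually look at are the status code and the response headers. A small sketch (the exact headers depend on the server):

import requests

r = requests.head('http://httpbin.org/get')
print(r.status_code)  # 200
print(r.headers)      # only the response headers; r.text is empty

r = requests.options('http://httpbin.org/get')
# many servers report the allowed methods in the Allow header
print(r.headers.get('Allow'))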

4. A practice website

http://httpbin.org is a website built for testing requests and responses, so it is a good place to practice.
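For example, we can send a GET request with parameters and headers and inspect exactly what httpbin received; a rough sketch (the User-Agent value here is made up just for practice):

import requests

headers = {'User-Agent': 'my-practice-crawler'}  # arbitrary value, for practice only
payload = {'q': 'python'}

r = requests.get('http://httpbin.org/get', params=payload, headers=headers)

data = r.json()                       # httpbin echoes the request back as JSON
print(data['args'])                   # the query parameters we sent
print(data['headers']['User-Agent'])  # the User-Agent the server saw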
Come on!


Origin blog.csdn.net/Bob_ganxin/article/details/108740320