Getting the response content
The response object has the following attributes:
r.text: the response body decoded to a string
r.status_code: the HTTP status code
r.encoding: the encoding used to decode the body into r.text
r.content: the response body as bytes; when printed, escapes such as \n (newline), \t (tab), and \r (carriage return) appear literally
r.json(): if the body is JSON, returns the parsed result using the JSON decoder built into Requests
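The relationship between the bytes, the text, and the parsed JSON can be sketched without any network call. This is only an approximation using the standard library; the sample body and the utf-8 encoding are made-up assumptions, not output from a real request.

```python
import json

# Made-up response body as raw bytes (what r.content would hold).
content = b'{\n  "args": {"key1": "value1"},\n  "msg": "caf\xc3\xa9"\n}\n'
encoding = "utf-8"  # assumed value of r.encoding for this sketch

text = content.decode(encoding)  # roughly what r.text gives
data = json.loads(text)          # roughly what r.json() gives

print(repr(content[:12]))  # in bytes, \n is shown as a literal escape
print(text)                # in text, \n becomes a real line break
print(data["args"]["key1"], data["msg"])
```

If the encoding is wrong, the decoded text will be garbled, which is why checking r.encoding is worthwhile before relying on r.text.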
Passing request parameters
import requests
# Named params rather than dict, so the built-in dict type is not shadowed
params = {'key1' : 'value1', 'key2' : 'value2'}
link = 'http://httpbin.org/get'
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36', 'Content-Type': 'text/html'}
r = requests.get(link, headers=headers, params=params)
print(r.content)
print(r.status_code)
print(r.json())
Program output:
b'{\n "args": {\n "key1": "value1", \n "key2": "value2"\n }, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Type": "text/html", \n "Host": "httpbin.org", \n "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"\n }, \n "origin": "223.72.90.250, 223.72.90.250", \n "url": "https://httpbin.org/get?key1=value1&key2=value2"\n}\n'
200
{'args': {'key1': 'value1', 'key2': 'value2'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'text/html', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}, 'origin': '223.72.90.250, 223.72.90.250', 'url': 'https://httpbin.org/get?key1=value1&key2=value2'}
The output shows that the dictionary passed via params was sent correctly as key1=value1&key2=value2. To view the JSON in a more readable layout, you can use an online formatter such as http://www.bejson.com/
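How the dictionary ends up in the URL can be approximated with the standard library. The sketch below is not the Requests implementation, only an illustration of the same encoding. It also pretty-prints JSON locally as an alternative to an online formatter. The sample dictionary is made up for the example.

```python
import json
from urllib.parse import urlencode

params = {'key1': 'value1', 'key2': 'value2'}
link = 'http://httpbin.org/get'

# Roughly what passing params=... does: encode the pairs and append after '?'.
url = link + '?' + urlencode(params)
print(url)

# Pretty-print JSON-like data locally (sample is a made-up stand-in
# for the dictionary returned by r.json()).
sample = {'args': params, 'url': url}
pretty = json.dumps(sample, indent=2, sort_keys=True)
print(pretty)
```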
Custom request headers
The example above passed a User-Agent through the headers parameter. We can send more header information in the same way, for example:
import requests
link = 'http://httpbin.org/get'
headers = {'Host': 'www.santostang.com', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36', 'Content-Type': 'text/html'}
r = requests.get(link, headers=headers)
print(r.status_code)
You can pass many more headers. Look at the Request Headers your browser sends for the page and add the ones you need.
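When copying Request Headers from the browser, they usually arrive as plain "Name: value" lines. The helper below is a hypothetical convenience (parse_raw_headers is not part of Requests) that turns such text into a dictionary suitable for the headers parameter; the raw text is a shortened sample.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines into a dict (hypothetical helper)."""
    headers = {}
    for line in raw.strip().splitlines():
        # Split on the first colon only, since values may contain colons.
        name, sep, value = line.partition(':')
        if sep and name.strip():  # skip lines with no name or no colon
            headers[name.strip()] = value.strip()
    return headers

raw = """
Host: www.santostang.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64)
Content-Type: text/html
"""
headers = parse_raw_headers(raw)
print(headers)
```

The resulting dictionary could then be passed as headers=headers to requests.get.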
Sending a POST request
import requests
# Named payload rather than dict, so the built-in dict type is not shadowed
payload = {'key1' : 'value1', 'key2' : 'value2'}
headers = {'Host' : 'www.santostang.com', 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36', 'Content-Type': 'text/html'}
r = requests.post('http://httpbin.org/post', headers=headers, data=payload)
print(r.text)
Output:
{
  "args": {},
  "data": "key1=value1&key2=value2",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "text/html",
    "Host": "www.santostang.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
  },
  "json": null,
  "origin": "223.72.90.250, 223.72.90.250",
  "url": "https://www.santostang.com/post"
}
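For a POST request, the parameters are passed through the data argument. Because this example sets Content-Type to text/html, httpbin reports the encoded body under "data" and leaves "form" empty.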
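The "data" and "Content-Length" values in the output can be checked by hand. The sketch below uses the standard library to reproduce the form encoding of the dictionary; it approximates what is sent and is not the Requests code itself.

```python
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': 'value2'}

# Form-encode the dictionary the way a data=... body is sent.
body = urlencode(payload)
print(body)

# The byte length of the body is what Content-Length reports.
length = len(body.encode('ascii'))
print(length)
```

The length printed here matches the "Content-Length": "23" header in the output above.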
Setting a timeout
import requests
r = requests.post('http://httpbin.org/post', timeout=0.001)
print(r.text)
Result:
Because the timeout of 0.001 seconds is far too small, the program fails with a timeout error and the traceback shows socket.timeout: timed out
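One way to cope with a timeout that is too small is to retry with a larger value. The sketch below shows that idea only. The names first_success and fake_send are hypothetical, and fake_send stands in for a real request so the example runs without a network.

```python
import socket

def first_success(send, timeouts=(0.001, 1.0, 5.0),
                  errors=(socket.timeout, TimeoutError)):
    """Call send(timeout) with growing timeouts; return the first result.

    send is any callable that performs the request (hypothetical here).
    """
    for timeout in timeouts:
        try:
            return send(timeout)
        except errors:
            print(f"timed out after {timeout} s, retrying")
    raise TimeoutError("all attempts timed out")

def fake_send(timeout):
    # Stand-in for a server that needs about 0.5 s to answer.
    if timeout < 0.5:
        raise socket.timeout("timed out")
    return f"ok (timeout={timeout})"

result = first_success(fake_send)
print(result)
```

With Requests, send could be something like lambda t: requests.post(url, timeout=t), and errors would be set to (requests.exceptions.Timeout,), since that is the exception class Requests raises for timeouts.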
Crawling the Douban Top 250 movies
import requests
from bs4 import BeautifulSoup
def getMovies():
    headers = {'Host' : 'movie.douban.com', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    movies = []
    # 10 pages with 25 movies each
    for i in range(0, 10):
        # The list page is fetched with a GET request
        r = requests.get('https://movie.douban.com/top250?start=' + str(i * 25), headers=headers)
        soup = BeautifulSoup(r.text, 'lxml')
        # Each title is the first <span> inside the <a> of a <div class="hd">
        div_list = soup.find_all('div', class_='hd')
        for div in div_list:
            title = div.a.span.text
            movies.append(title)
    return movies
movies = getMovies()
# Print each title with its rank, starting from 1
for i, movie in enumerate(movies):
    print(str(i+1) + "==" + movie)
Running the program prints the titles of the Douban Top 250 movies.
For the BeautifulSoup documentation, see https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
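The crawler covers all 250 entries by stepping the start parameter in increments of 25. The page URLs it visits can be listed on their own, as in this small sketch; page_size and the variable names are chosen for illustration.

```python
base = 'https://movie.douban.com/top250?start='
page_size = 25  # movies per page, as used in the loop above

# One URL per page: start=0, 25, ..., 225
urls = [base + str(page * page_size) for page in range(10)]

print(urls[0])
print(urls[-1])
print(len(urls) * page_size)  # total number of movies covered
```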