[Share] Huawei Cloud white paper: Requests, the most important Python crawler library

Requests is the most important and most commonly used Python crawler library; be sure to master it.

1. Getting to know the library

import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print(type(r))        # the response object type
print(r.status_code)  # HTTP status code
print(r.encoding)     # encoding guessed from the response headers
# print(r.content)    # raw response body (bytes)
print(r.cookies)      # cookies the server set


Output:
<class 'requests.models.Response'>
200
ISO-8859-1
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

2. GET request method

values = {'user': 'aaa', 'id': '123'}
url = 'http://www.baidu.com'
r = requests.get(url, params=values)  # params are appended to the URL as a query string
print(r.url)

Output: http://www.baidu.com/?user=aaa&id=123

3. POST request method

values = {'user': 'aaa', 'id': '123'}
url = 'http://www.baidu.com'
r = requests.post(url, data=values)  # form data is sent in the request body
print(r.url)
# print(r.text)

Output:
http://www.baidu.com/
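
Note the difference from GET: with POST the form data travels in the request body, not in the URL, which is why no query string appears above. As a minimal sketch of how to verify this (httpbin.org is a public echo service, not part of the original article):

import requests

values = {'user': 'aaa', 'id': '123'}
# httpbin.org/post echoes back what it receives, so we can inspect the body
r = requests.post('http://httpbin.org/post', data=values)
print(r.json()['form'])  # {'user': 'aaa', 'id': '123'} -- sent in the body
print(r.url)             # no query string is appended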

4. Processing request headers

user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4295.400 QQBrowser/9.7.12661.400'
header = {'User-Agent': user_agent}
url = 'http://www.baidu.com/'
r = requests.get(url, headers=header)
print(r.content)

A note on processing request headers:
servers often check whether a request comes from a browser, so when we make a request it is best to disguise it as a browser request by setting the request headers. This prevents denied-access and similar errors, and it is a basic counter-measure against anti-crawler checks.

A special note: whatever request we make, always bring the headers along; do not skip them to save time. Treat this like a traffic rule: running a red light will not necessarily cause an accident, but it is dangerous, so we stop on red and go on green. The same applies to crawler requests: always add the headers, just in case.

import urllib.request  # urllib2 in Python 2

user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4295.400 QQBrowser/9.7.12661.400'
header = {'User-Agent': user_agent}
url = 'http://www.qq.com/'
request = urllib.request.Request(url, headers=header)
response = urllib.request.urlopen(request)
# note: the page content read here must be decoded; check the page's charset first
print(response.read().decode('gbk'))

Open www.qq.com in a browser and press F12 to see the User-Agent:

User-Agent: some servers or proxies use this value to judge whether the request was sent by a browser
Content-Type: when using a REST interface, the server checks this value to decide how to parse the contents of the HTTP body:
application/xml: used in XML RPC calls, such as RESTful/SOAP
application/json: used in JSON RPC calls
application/x-www-form-urlencoded: used when the browser submits a web form
When using a RESTful or SOAP service provided by the server, an incorrect Content-Type setting can cause the server to refuse service.
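
With requests you rarely set Content-Type by hand: passing data= produces application/x-www-form-urlencoded, while json= produces application/json. A minimal sketch (httpbin.org is a placeholder echo service, an assumption not in the original):

import requests

payload = {'user': 'aaa', 'id': '123'}
url = 'http://httpbin.org/post'  # placeholder echo service

r1 = requests.post(url, data=payload)      # web-form style submission
print(r1.request.headers['Content-Type'])  # application/x-www-form-urlencoded

r2 = requests.post(url, json=payload)      # JSON RPC style call
print(r2.request.headers['Content-Type'])  # application/json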

5. Response codes and response headers

url = 'http://www.baidu.com'
r = requests.get(url)

if r.status_code == requests.codes.ok:
    print(r.status_code)
    print(r.headers)
    print(r.headers.get('content-type'))  # get() is the recommended way to fetch a header field
else:
    r.raise_for_status()  # raises an HTTPError for 4xx/5xx responses

Output:
200
{'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Server': 'bfe/1.0.8.18', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:57 GMT', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Date': 'Wed, 17 Jan 2018 07:21:21 GMT', 'Content-Type': 'text/html'}
text/html

6. Cookie processing

url = 'https://www.zhihu.com/'
r = requests.get(url)
print(r.cookies)         # RequestsCookieJar holding the cookies the server set
print(r.cookies.keys())  # just the cookie names

Output:
<RequestsCookieJar[<Cookie aliyungf_tc=AQAAACYMglZy2QsAEnaG2yYR0vrtlxfz for www.zhihu.com/>]>
['aliyungf_tc']
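
Reading cookies is only half of the story: you can also send cookies with a request, or use a Session so that cookies set by the server are carried automatically into later requests. A minimal sketch (the cookie name below is illustrative, not from the original article):

import requests

url = 'https://www.zhihu.com/'

# attach a cookie to a single request
r = requests.get(url, cookies={'example_cookie': '1'})  # hypothetical cookie name

# or let a Session store and resend cookies across requests
s = requests.Session()
s.get(url)               # any Set-Cookie from the server is stored in s.cookies
print(s.cookies.keys())  # these names are sent back on subsequent s.get() calls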

7. Redirects and history

To handle redirects you only need to set the allow_redirects parameter: True allows redirection (the default for GET), False disables it.

url = 'http://www.baidu.com'
r = requests.get(url, allow_redirects=True)
print(r.url)
print(r.status_code)
print(r.history)  # list of intermediate responses, oldest first

Output:
http://www.baidu.com/
200
[]
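
The history list is empty above because http://www.baidu.com answers directly. To see r.history in action, request a URL that actually redirects; a hedged sketch assuming http://github.com, which normally 301-redirects to HTTPS (an example not in the original article):

import requests

r = requests.get('http://github.com', allow_redirects=True)
print(r.url)      # https://github.com/
print(r.history)  # [<Response [301]>] -- the intermediate redirect response

r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code)  # 301 -- the raw redirect, not followed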

8. Timeout setting

The timeout is set through the timeout parameter:

url = 'http://www.baidu.com'
r = requests.get(url, timeout=2)
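
If no response arrives within the timeout, requests raises an exception instead of returning a response, so it is worth catching. A minimal sketch:

import requests

url = 'http://www.baidu.com'
try:
    r = requests.get(url, timeout=2)  # give the server at most 2 seconds
    print(r.status_code)
except requests.exceptions.Timeout:
    print('request timed out')  # raised when the 2-second limit is exceeded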

9. Proxy Settings

# map each URL scheme to the proxy that should handle it
proxies = {
    'http': 'http://10.10.1.10:3128',   # placeholder proxy host/port
    'https': 'http://10.10.1.10:1080',  # placeholder proxy host/port
}

url = 'http://www.baidu.com'
r = requests.get(url, proxies=proxies)
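
If the proxy requires HTTP basic authentication, requests accepts credentials embedded in the proxy URL. A minimal sketch (user, password, host, and port below are placeholders):

import requests

proxies = {
    'http': 'http://user:password@10.10.1.10:3128',  # placeholder credentials and host
}
r = requests.get('http://www.baidu.com', proxies=proxies)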

Author: Ni Pingyu



Origin blog.csdn.net/devcloud/article/details/104307408