Requests library update

1. Fetching web content: the requests library
"We need to understand the HTTP protocol."
> The seven main methods of the Requests library
 
Method               Explanation
requests.request()   Constructs a request; the base method that supports all the methods below
requests.get()       Main method for fetching an HTML page; corresponds to HTTP GET
requests.head()      Fetches the header information of an HTML page; corresponds to HTTP HEAD
requests.post()      Submits a POST request to an HTML page; corresponds to HTTP POST
requests.put()       Submits a PUT request to an HTML page; corresponds to HTTP PUT
requests.patch()     Submits a partial-modification request to an HTML page; corresponds to HTTP PATCH
requests.delete()    Submits a delete request to an HTML page; corresponds to HTTP DELETE
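
As a rough illustration of how these functions map onto HTTP methods, the sketch below builds requests without sending them, so it runs offline. The URL, the `build` helper and the parameter values are made up for this example; `requests.Request(...).prepare()` is used here only to make the result easy to inspect.

```python
import requests

URL = "http://example.com/items"  # hypothetical URL, for illustration only

# Each convenience function corresponds to one HTTP method;
# requests.request(method, url, ...) is the general form underneath them.
HTTP_METHODS = ["GET", "HEAD", "POST", "PUT", "PATCH", "DELETE"]


def build(method, **kwargs):
    """Prepare a request without sending it, so it can be inspected offline."""
    return requests.Request(method, URL, **kwargs).prepare()


prepared = {m: build(m) for m in HTTP_METHODS}

# GET usually carries its arguments in the query string,
# while POST carries them in the request body.
get_req = build("GET", params={"q": "python"})
post_req = build("POST", data={"name": "value"})

print(get_req.method, get_req.url)
print(post_req.method, post_req.body)
```

To actually send such a request, `requests.get(URL, params={"q": "python"})` and `requests.request("GET", URL, params={"q": "python"})` should behave the same way.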
> Exceptions in the Requests library
 
Exception                   Explanation
requests.ConnectionError    Network connection error, such as a failed DNS lookup or a refused connection
requests.HTTPError          HTTP error, for example one raised by raise_for_status()
requests.URLRequired        A required URL is missing
requests.TooManyRedirects   The maximum number of redirects was exceeded
requests.ConnectTimeout     Timed out while connecting to the remote server
requests.Timeout            The request to the URL timed out
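
One way to tell these exceptions apart is sketched below. The helper names `describe_error` and `fetch` and the message strings are invented for this example. The one point that matters is the ordering: `ConnectTimeout` is a subclass of both `ConnectionError` and `Timeout`, so the most specific checks come first.

```python
import requests


def describe_error(exc):
    """Map a requests exception to a short description, most specific first."""
    if isinstance(exc, requests.ConnectTimeout):
        return "timed out while connecting to the server"
    if isinstance(exc, requests.Timeout):
        return "request timed out"
    if isinstance(exc, requests.ConnectionError):
        return "network connection error"
    if isinstance(exc, requests.HTTPError):
        return "HTTP error status"
    if isinstance(exc, requests.TooManyRedirects):
        return "too many redirects"
    if isinstance(exc, requests.URLRequired):
        return "URL missing"
    return "other requests error"


def fetch(url):
    """Return the page text, or a short error description if the request fails."""
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        return r.text
    except requests.RequestException as exc:
        # RequestException is the common base class of the exceptions above.
        return "error: " + describe_error(exc)
```

Calling `fetch("http://www.baidu.com")` would then return either the page text or a string starting with `error:`.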
> The requests.get() method
 
requests.get() returns a Response object. It holds the server's response to the request and has its own attributes and methods.
The main attributes of the Response object are as follows:
 
Attribute            Explanation
status_code          HTTP status code of the response; 200 means success, 404 means the page was not found
text                 Response body as a string, i.e. the content of the page at the URL
encoding             Encoding of the response, guessed from the HTTP headers
content              Response body in binary (bytes) form
headers              Dictionary-like object holding the server's response headers
url                  Final URL of the response
apparent_encoding    Encoding inferred by analyzing the response content (an alternative to encoding)
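
A small sketch of reading these attributes in one place follows. The `summarize` helper and its dictionary keys are made up for this example; it only reads attributes listed in the table above.

```python
import requests


def summarize(r):
    """Collect the main attributes of a requests.Response into a dict."""
    return {
        "status_code": r.status_code,
        "url": r.url,
        "encoding": r.encoding,                    # guessed from the headers
        "apparent_encoding": r.apparent_encoding,  # guessed from the content
        "content_type": r.headers.get("Content-Type"),
        "size_bytes": len(r.content),              # content is bytes
        "preview": r.text[:60],                    # text is a decoded string
    }


# Typical use (needs network access, so it is left as a comment here):
# print(summarize(requests.get("http://www.baidu.com", timeout=10)))
```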
The difference between encoding and apparent_encoding:
encoding: taken from the charset field of the response header; if no charset is declared for text content, encoding is assumed to be ISO-8859-1
apparent_encoding: the encoding inferred by analyzing the page content
apparent_encoding is generally the more accurate of the two
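
The fallback described above can be checked offline. The sketch below uses `requests.utils.get_encoding_from_headers`, the helper Requests uses for the header-based guess. The sample headers and the sample string are made up for this example.

```python
from requests.utils import get_encoding_from_headers

# Header-based guess: this is where Response.encoding comes from.
with_charset = get_encoding_from_headers({"content-type": "text/html; charset=utf-8"})
without_charset = get_encoding_from_headers({"content-type": "text/html"})
print(with_charset)     # utf-8
print(without_charset)  # ISO-8859-1

# Why the fallback matters: UTF-8 bytes decoded as ISO-8859-1 come out garbled.
raw = "中文".encode("utf-8")
garbled = raw.decode("iso-8859-1")
correct = raw.decode("utf-8")
print(garbled == "中文", correct == "中文")  # False True
```

This is why setting `r.encoding = r.apparent_encoding` before reading `r.text` is a common precaution for pages that do not declare a charset.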
 
> A general code framework for fetching pages
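import requests

def getHTMLText(url):
    """Fetch a page and return its text, or a short message if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # decode with the encoding inferred from the content
        return r.text
    except requests.RequestException:
        return 'An exception occurred'

url = "http://www.baidu.com"
print(getHTMLText(url))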
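
To see which status codes make raise_for_status() raise, the sketch below builds Response objects by hand, so no network is involved. The `status_raises` helper is invented for this example; a real Response would come from requests.get().

```python
import requests


def status_raises(code):
    """Return True if raise_for_status() raises HTTPError for this status code."""
    r = requests.Response()  # built by hand for illustration, no request is sent
    r.status_code = code
    try:
        r.raise_for_status()
    except requests.HTTPError:
        return True
    return False


for code in (200, 301, 404, 500):
    print(code, status_raises(code))
# 200 False
# 301 False
# 404 True
# 500 True
```

Only 4xx and 5xx codes raise here, so a status other than 200 does not automatically mean an exception.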
 
 
 


Origin www.cnblogs.com/1328497946TS/p/11016505.html