Python Web Crawler: HTTP Protocol and the Requests Library (crawler learning, day one)

HTTP protocol:

HTTP stands for Hypertext Transfer Protocol. A URL is the Internet path used to access a resource over HTTP; each URL corresponds to one data resource.
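To make the idea of a URL as a resource path concrete, here is a small sketch that splits a URL into its parts using only the standard library. The URL itself is a made-up placeholder, not one from the course.

```python
from urllib.parse import urlsplit

# Placeholder URL chosen for illustration only
parts = urlsplit("http://example.com:8080/docs/index.html?lang=en")

print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # server that holds the resource
print(parts.port)      # optional port number
print(parts.path)      # path of the resource on that server
print(parts.query)     # optional query string
```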

HTTP protocol operations for resources:

The Requests library provides all the basic HTTP request methods. Official documentation: http://www.python-requests.org/en/master

The six main methods of the Requests library:
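A minimal sketch, assuming the six methods meant here are the helper functions that match the six common HTTP operations (GET, HEAD, POST, PUT, PATCH, DELETE). The URL is a placeholder and is never contacted: requests.Request(...).prepare() only builds the request, it does not send it.

```python
import requests

URL = "http://example.com/resource"  # placeholder, no network traffic happens

# Assumed mapping: one Requests helper per HTTP operation
helpers = {
    "GET": requests.get,
    "HEAD": requests.head,
    "POST": requests.post,
    "PUT": requests.put,
    "PATCH": requests.patch,
    "DELETE": requests.delete,
}

for method, func in helpers.items():
    # Build (but do not send) a request to show what each method produces
    prepared = requests.Request(method, URL).prepare()
    print(f"{method:6} requests.{func.__name__}()  ->  {prepared.method} {prepared.url}")
```

In real use each helper is called directly, for example requests.get(url, timeout=30).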

Exceptions in the Requests library:
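The sketch below lists some exception classes that Requests exposes and shows that they all derive from requests.RequestException, so one except clause can catch them. Which exceptions the original notes listed is not known, so this selection is an assumption. The helper name safe_get is hypothetical, and the invalid URL is used because it fails before any network access.

```python
import requests

# A selection of exception classes exported by the requests package
names = ["ConnectionError", "HTTPError", "URLRequired",
         "TooManyRedirects", "ConnectTimeout", "Timeout"]

for name in names:
    exc_class = getattr(requests, name)
    # Every one of them is a subclass of RequestException
    print(name, issubclass(exc_class, requests.RequestException))


def safe_get(url):
    """Hypothetical helper: return the response, or a short error message."""
    try:
        return requests.get(url, timeout=5)
    except requests.RequestException as exc:
        return f"request failed: {type(exc).__name__}"


# A URL without a scheme is rejected while the request is being prepared
print(safe_get("no-scheme-example"))
```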

The Requests library has two important objects: Request and Response. The Request object supports multiple request methods; the Response object contains all the information returned by the server, and it also keeps the information of the request that produced it.
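The following sketch shows a Response object and the request stored inside it. To avoid depending on an external site, it starts a tiny local HTTP server from the standard library. The names DemoHandler and fetch_demo_response are hypothetical, and the local server is only an assumption made to keep the example self-contained.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny local handler so the example needs no Internet access."""

    def do_GET(self):
        body = "<html><body>hello crawler</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_demo_response():
    # Port 0 lets the operating system pick a free port
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy settings for localhost
        url = f"http://127.0.0.1:{server.server_port}/"
        return session.get(url, timeout=5)
    finally:
        server.shutdown()
        server.server_close()


r = fetch_demo_response()
print(r.status_code)        # status code returned by the server
print(r.encoding)           # encoding guessed from the response header
print(r.apparent_encoding)  # encoding guessed from the content itself
print(r.text)               # body decoded to str using r.encoding
print(r.content)            # body as raw bytes
print(r.request.method)     # the Response keeps the request that produced it
```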

Attributes of the Response object:

Among these attributes, r.encoding is guessed from the charset field of the HTTP header: if the header contains no charset, the encoding is assumed to be ISO-8859-1.
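The header-based guess can be tried without sending a request. This sketch calls requests.utils.get_encoding_from_headers, which to my knowledge is the helper Requests uses to fill r.encoding; the header values are made up for the demonstration. The last lines show why the ISO-8859-1 fallback is a problem for Chinese pages.

```python
from requests.utils import get_encoding_from_headers

# Header with an explicit charset: the charset is used as the encoding
with_charset = get_encoding_from_headers({"content-type": "text/html; charset=utf-8"})
# Text header without a charset: falls back to ISO-8859-1
without_charset = get_encoding_from_headers({"content-type": "text/html"})

print(with_charset)
print(without_charset)

# UTF-8 bytes decoded with the fallback encoding come out garbled
raw = "网络爬虫".encode("utf-8")
garbled = raw.decode("ISO-8859-1")
print(garbled == "网络爬虫")
print(raw.decode("utf-8"))
```

This is why the code framework in these notes assigns r.apparent_encoding to r.encoding before reading r.text.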

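r.raise_for_status() checks r.status_code and raises requests.HTTPError when the status indicates a failed request (a 4xx or 5xx code) rather than a success such as 200.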
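A small sketch of this behaviour, built without any network access. The Response objects are created by hand purely for the demonstration (normally they come back from requests.get), and the helper name check is hypothetical.

```python
import requests


def check(status_code):
    """Hypothetical helper: report what raise_for_status() does for a status code."""
    r = requests.Response()      # built by hand here; normally returned by requests.get()
    r.status_code = status_code
    try:
        r.raise_for_status()
        return "ok"
    except requests.HTTPError as exc:
        return f"HTTPError: {exc}"


for code in (200, 404, 500):
    print(code, check(code))
```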

Comparison of HTTP protocol operations and Requests library methods:

Generic code framework for fetching a web page:

    import requests

    def getHtmlText(url):
        try:
            r = requests.get(url, timeout=30)
            # Raises HTTPError if the status code signals an error (4xx or 5xx)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'An exception occurred'

For example, fetching the PMCAFF home page:

    import requests

    def getHtmlText(url):
        try:
            r = requests.get(url, timeout=30)
            # Raise HTTPError for a 4xx or 5xx response
            r.raise_for_status()
            # Use the encoding guessed from the content to avoid garbled text
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return 'An exception occurred'

    if __name__ == '__main__':
        url = 'https://www.pmcaff.com/'
        print(getHtmlText(url))

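Generic code framework for fetching web pages. Environment: Windows, Python 3.6

Reference: China University MOOC course "Python Web Crawling and Information Extraction" (《Python网络爬虫与信息提取》)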

Origin www.cnblogs.com/ltn26/p/10981294.html