Web crawler study notes (1): Getting started with the Requests library

Install the requests library

pip install requests

Using a domestic (China) PyPI mirror as the pip index is recommended for faster downloads.
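For example, the Tsinghua University PyPI mirror (one common domestic source, used here only as an illustration) can be passed via pip's `-i` option:

```shell
# Install requests from the Tsinghua PyPI mirror instead of pypi.org
pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple
```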

r = requests.get(url)

  • Constructs a Request object that asks the server for a resource; get() returns a Response object containing the server's resources

  • The Response object contains the content returned by the crawler, so it is the important one

  • Response object properties:
    r.status_code: HTTP status code of the request; 200 means success
    r.text: the response body as a string, decoded with r.encoding
    r.encoding: the response encoding guessed from the HTTP headers
    r.apparent_encoding: the encoding sniffed from the content itself
    r.content: the raw response body as bytes

  • r.encoding: if the response headers contain no charset field, the encoding is assumed to be ISO-8859-1

  • r.apparent_encoding: the encoding inferred from the page content itself; usually more accurate than r.encoding
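The two encoding attributes can be compared offline by building a Response by hand; note that `requests.models.Response` and the `_content` attribute are internals, used here purely as a sketch (a real Response comes from `requests.get`).

```python
import requests

# Sketch: construct a Response by hand (normally the transport adapter
# builds it) to look at the two encoding attributes side by side.
r = requests.models.Response()
r.status_code = 200
r.headers["Content-Type"] = "text/html"    # no charset field given
r._content = ("咖啡" * 50).encode("utf-8")  # UTF-8 body bytes

# r.encoding is derived from the headers; in this hand-built sketch
# no adapter ran, so it is still unset (None).
print(r.encoding)
# r.apparent_encoding sniffs the body bytes instead.
print(r.apparent_encoding)
```

When `r.encoding` is unset, `r.text` falls back to `r.apparent_encoding` to decode the bytes, which is why assigning `r.encoding = r.apparent_encoding` (as in the framework below) yields readable text for pages whose headers omit the charset.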

Generally, check r.status_code first; if it is 200, you can use

r.text
r.encoding 
r.apparent_encoding 
r.content

to parse the information in the returned object; otherwise a 404 or any other status code means something went wrong, and the request should be treated as an exception.

Common code framework

import requests

def get_html_text(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise HTTPError if the status code is not 200
        r.encoding = r.apparent_encoding
        return r.text
    except Exception:
        return "An exception occurred"
  • Full format
requests.get(url, params=None, **kwargs)

url: URL of the page to fetch
params: extra parameters to append to the URL, as a dictionary or byte stream; optional
**kwargs: 12 optional keyword arguments that control access
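As a sketch of how `params` is merged into the URL, `requests.Request(...).prepare()` builds the final request without sending it (the example.com URL and query values below are placeholders):

```python
import requests

# Prepare (but do not send) a GET request to see how params
# are URL-encoded into the query string.
req = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "python", "page": "2"},
)
prepared = req.prepare()
print(prepared.url)  # https://example.com/search?q=python&page=2
```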

Requests library exceptions:
  requests.ConnectionError: network connection error, such as DNS failure or connection refused
  requests.HTTPError: HTTP error (raised by raise_for_status for non-200 codes)
  requests.URLRequired: a valid URL is required
  requests.TooManyRedirects: exceeded the maximum number of redirects
  requests.ConnectTimeout: timed out while connecting to the remote server
  requests.Timeout: the request as a whole timed out
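These exceptions can be caught individually instead of with a bare `except`; a minimal sketch (the localhost URL below is a placeholder chosen to fail fast):

```python
import requests

def fetch(url):
    """Return the page text, or a short message naming what went wrong."""
    try:
        r = requests.get(url, timeout=1)
        r.raise_for_status()  # turn non-200 status codes into HTTPError
        return r.text
    except requests.Timeout:          # also catches ConnectTimeout
        return "timed out"
    except requests.HTTPError as e:
        return "bad status: %d" % e.response.status_code
    except requests.ConnectionError:
        return "connection failed"

# Nothing normally listens on port 9, so this request fails quickly.
print(fetch("http://127.0.0.1:9/"))
```

Catching `requests.Timeout` before `requests.ConnectionError` matters because `ConnectTimeout` subclasses both.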



Origin blog.csdn.net/weixin_43951831/article/details/104842559