Web crawler: requests explained (crawl all the papers in a community, source code included) — if you don't know requests yet, come on in.


Before reading this blog, you should already know about regular expressions and browser disguise (making a crawler look like a browser). If you are not familiar with them, please click the blue links on the left.

Installing requests

On Windows, press Win + R, type cmd to open the command line, then run pip install requests and wait patiently for it to finish.
On Linux, simply run pip install requests and wait for it to finish.

requests basics

Request methods in requests

requests mainly supports three request methods: get, post, and put…

Sending a request with requests:

import requests
# GET request
res = requests.get("https://yq.aliyun.com/search/articles/")
# POST request
res = requests.post("URL")
# Result: the page's source code (wrapped in a Response object)

Response attributes in requests (Table 1)

attribute	meaning
text	the response body as text
content	the response body as bytes (binary format)
encoding	the page's character encoding
url	the URL of the current request
status_code	the HTTP status code of the current request
cookies	the cookies returned by the server

Using these attributes:

import requests
# GET request
res = requests.get("https://yq.aliyun.com/search/articles/")
print(res.text)
# Result: "<!DOCTYPE html>······</html>", i.e. the page source
print(res.content)
# Result: the same text as above, but as bytes, e.g. "\xe6\x90\x9……"
print(res.encoding)
# Result: "utf-8"
print(res.url)
# Result: "https://yq.aliyun.com/search/articles/"
print(res.status_code)
# Result: 200 (on a failed request you get codes like 404 or 505)
print(res.cookies)
# Result: "<Requests···n.com/>]>", i.e. your cookies
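To make the relationship between content, text, and encoding concrete, here is a minimal standalone sketch. The sample bytes below are made up for illustration; they are not taken from the page above:

```python
# content is raw bytes; decoding it with the page's encoding yields text
raw = b"\xe4\xbd\xa0\xe5\xa5\xbd, requests"  # UTF-8 bytes (made-up sample)
encoding = "utf-8"                           # what res.encoding would report
text = raw.decode(encoding)                  # roughly what res.text does
print(text)  # 你好, requests
```

This is why res.content is preferred for binary data such as images, while res.text is convenient for HTML.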

Request parameters in requests (Table 2)

parameter	meaning
params	the query parameters of a GET request
headers	header information, used to disguise the crawler as a browser
proxies	adds a proxy IP to the request
cookies	sends cookies with the request
data	the form data of a POST request

Using these parameters:
If you are unclear about browser disguise, click here: an explanation of browser disguise for crawlers.

import requests

# params usage: the query parameters of a GET request
getdata = {
    "q": key,
    "p": str(i + 1)
}
requests.get(url, params=getdata)

# headers usage: disguise the request as coming from a browser
# (headers must be a dict, not a tuple)
UA = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
requests.get("https://yq.aliyun.com/search/articles/", headers=UA)
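The remaining parameters from Table 2 (proxies, cookies, data) are passed the same way. A minimal sketch — the proxy address, cookie, and form fields below are made-up placeholders, not values from the original project:

```python
# proxies usage: route the request through a proxy IP (placeholder address)
proxies = {
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
}

# cookies usage: cookies to send with the request (placeholder value)
cookies = {"session_id": "abc123"}

# data usage: the form data of a POST request (placeholder fields)
postdata = {"username": "user", "password": "123456"}

# With a real URL this would be:
# requests.post("URL", data=postdata, cookies=cookies, proxies=proxies)
```

Note that data is what distinguishes a POST from a GET: params ends up in the URL's query string, while data travels in the request body.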

Get the source code of a real requests project

After reading this blog, if you feel you only half understand, I recommend working through some hands-on projects with requests. There is a project that crawls all the papers in a community, with full source code — everyone is welcome to grab it!
Reply "crawlers 129" to the WeChat official account "Proud programmer" to receive the source code; more learning videos are waiting for you~


Origin blog.csdn.net/xiaozhezhe0470/article/details/104219329