Crawling data with the requests module: the basic workflow

1. Introduction to the requests module

requests is a third-party module that simulates browser requests and is used for network access. Several modules do similar work, such as urllib and urllib2 (urllib2 exists only in Python 2). Compared with urllib, requests has a more convenient API; under the hood it is built on urllib3.
Note: after requests downloads a web page, the JavaScript in that page is not executed. Any content that the page's scripts load later is therefore missing from the response, so we have to analyze the target site ourselves, find the request that actually delivers the data, and send that request directly.
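The JavaScript note can be seen on a small local example. This is a sketch under stated assumptions: a throwaway server built on the standard library's http.server stands in for a real site, and the page, the `/api/list` endpoint and the `LocalSite` handler are all hypothetical. The page's list is filled in by a script, so requests only sees the empty list; requesting the data URL directly (the one you would normally find in the browser's developer tools) returns the data.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical page: the list is empty in the HTML and only filled in by the script.
PAGE = b"""<html><body>
<ul id="list"></ul>
<script>
fetch('/api/list').then(r => r.json()).then(items => {
  document.getElementById('list').innerHTML = items.map(i => '<li>' + i + '</li>').join('');
});
</script>
</body></html>"""
API = b'["item 1", "item 2"]'  # hypothetical data behind the script's request

# path -> (status, content type, body)
ROUTES = {
    "/": (200, "text/html; charset=utf-8", PAGE),
    "/api/list": (200, "application/json", API),
}


class LocalSite(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = ROUTES.get(self.path, (404, "text/plain", b"not found"))
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), LocalSite)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

page = requests.get(base_url + "/", timeout=5)
# The script arrives as plain text but is never run, so the list is still empty.
print('<ul id="list"></ul>' in page.text)  # True
print("item 1" in page.text)               # False

# Request the URL the script would have fetched, and get the data directly.
items = requests.get(base_url + "/api/list", timeout=5).json()
print(items)  # ['item 1', 'item 2']

server.shutdown()
server.server_close()
```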

2. Configuring the environment for the requests module

Anaconda environment variables:
The entries in the PATH environment variable let the cmd terminal find executable files.
When you type python, the system searches each directory listed in PATH, in order, for python.exe.
Two entries need to be configured:
1. python.exe: the Anaconda root directory C:\Anaconda3, so that the system can find python.exe.
2. pip: C:\Anaconda3\Scripts, so that the system can find pip.exe.
Move these entries to the top of PATH so that they take precedence over any other Python installation.
To check which python.exe and pip are found, open cmd and enter:
where python
where pip
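To see what `where` does, here is a minimal Python sketch of the same lookup. `find_on_path` is a hypothetical helper written for illustration: it walks the PATH directories in order and, on Windows, tries each extension listed in PATHEXT. It only approximates `where` and ignores some of its details; the standard-library function `shutil.which` returns just the first match, which is the executable the terminal would actually run.

```python
import os
import shutil


def find_on_path(name):
    """Hypothetical helper: list every executable called `name` in PATH order, roughly like `where`."""
    if os.name == "nt":
        # On Windows, PATHEXT lists the extensions that count as executable (.COM;.EXE;.BAT;...).
        extensions = os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    else:
        extensions = [""]
    matches = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                matches.append(candidate)
    return matches


# The first match is the one that runs, which is why the order of PATH entries matters.
print(find_on_path("python"))
print(find_on_path("pip"))
# The standard library already provides the "first match" lookup.
print(shutil.which("python"))
```

If `where python` lists several interpreters, the first line is the one cmd uses, so moving C:\Anaconda3 up in PATH changes which Python runs.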

3. Installation of requests

pip install requests
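After installing, it is worth confirming that requests landed in the same interpreter you will run, since a pip from a different directory on PATH installs into a different Python. A minimal check, using the standard library's `importlib.util.find_spec` (the `is_installed` helper is hypothetical). Running `python -m pip install requests` instead of plain `pip` is a common way to make sure pip belongs to that same python.exe.

```python
import importlib.util


def is_installed(module_name):
    """Hypothetical helper: True if `module_name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("requests"):
    import requests
    print("requests", requests.__version__, "is installed")
else:
    print("requests is missing; try: python -m pip install requests")
```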

4. How to use the requests module

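1. There are two main kinds of requests, GET and POST, so requests provides two corresponding methods: get and post.
2. Steps:
	1. Import the package
		import requests
	2. Determine the base URL (the URL to be crawled)
		base_url = 'https://www.baidu.com'
	3. Send the request and get the response
		response = requests.get(base_url)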
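3. Parameters of the get method
	requests.get(
		url=url,          # the URL to request
		headers=headers,  # dict of request headers
		params=params,    # dict of query-string parameters
		timeout=timeout,  # timeout in seconds
	)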
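4. The response object (res below is the object returned by requests.get)
	An HTTP response consists of the status line, the response headers, a blank line, and the response body.
	(1) Response content:
		As a string: res.text
		As binary data (bytes): res.content
			The binary form is useful for fixing garbled text, e.g.
				print(res.content.decode('utf-8'))
			and for downloading images, videos, and other binary content.
	(2) Encoding of the response content
		res.text decodes the body into a string using the encoding stored in res.encoding.
		First fix for garbled text: if res.text is garbled, set res.encoding to the correct encoding (e.g. res.encoding = 'utf-8') before reading res.text.
		Second fix: decode res.content yourself with the correct encoding, as in (1).
	(3) Response content parsed as JSON:
		res.json()
	(4) res.status_code: the status code
	(5) res.url: the URL of the request (the final URL, after any redirects)
	(6) res.headers: the response headers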
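The three steps in item 2 can be tried without internet access. This is a sketch: a throwaway local server from the standard library's http.server stands in for https://www.baidu.com, and the `LocalSite` handler and its page are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LocalSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real site: every GET returns the same small HTML page."""

    def do_GET(self):
        body = "<html><head><title>demo page</title></head></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), LocalSite)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Step 1 is `import requests` above.
# Step 2: the base URL. Against the real site this would be 'https://www.baidu.com'.
base_url = f"http://127.0.0.1:{server.server_port}/"
# Step 3: send the request and get the response (a timeout keeps a dead server from hanging the script).
response = requests.get(base_url, timeout=5)
print(response.status_code)  # 200
print(response.text)

server.shutdown()
server.server_close()
```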
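To see what the parameters in item 3 actually do, a request can be built without sending it: `requests.Request(...).prepare()` produces the same URL and headers that `requests.get` would send. The URL, the `wd` keyword parameter and the User-Agent string are illustrative assumptions, not values the original specifies.

```python
import requests

# Illustrative values (assumed): Baidu's search page taking the keyword in `wd`.
url = "https://www.baidu.com/s"
headers = {"User-Agent": "Mozilla/5.0"}  # make the request look like it came from a browser
params = {"wd": "python"}               # appended to the URL as ?wd=python

# Build the request without sending it, to inspect what would go over the wire.
prepared = requests.Request("GET", url, headers=headers, params=params).prepare()
print(prepared.url)                    # https://www.baidu.com/s?wd=python
print(prepared.headers["User-Agent"])  # Mozilla/5.0

# Sending it for real needs network access. timeout is in seconds and may also be
# a (connect timeout, read timeout) tuple.
# response = requests.get(url, headers=headers, params=params, timeout=5)
```

Passing params as a dict lets requests handle URL encoding, instead of concatenating the query string by hand.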
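For downloading images and other binary files, as mentioned in item 4 (1), the raw bytes in res.content are written to a file opened in binary mode. A sketch, again against a hypothetical local server; the "image" is just the 8-byte PNG file signature used as stand-in data, and the `logo.png` name is made up.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Stand-in "image": the 8-byte PNG file signature (not valid UTF-8 text).
FAKE_PNG = b"\x89PNG\r\n\x1a\n"


class LocalSite(BaseHTTPRequestHandler):
    """Hypothetical image server: every GET returns FAKE_PNG."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(FAKE_PNG)))
        self.end_headers()
        self.wfile.write(FAKE_PNG)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), LocalSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
image_url = f"http://127.0.0.1:{server.server_port}/logo.png"  # hypothetical image URL

response = requests.get(image_url, timeout=5)
# res.content holds the raw bytes; "wb" mode writes them unchanged.
save_path = os.path.join(tempfile.mkdtemp(), "logo.png")
with open(save_path, "wb") as f:
    f.write(response.content)

with open(save_path, "rb") as f:
    saved = f.read()
print(saved == response.content)  # True

server.shutdown()
server.server_close()
```

For large files, requests also supports `stream=True` together with `response.iter_content(chunk_size=...)`, so the whole file does not have to sit in memory at once.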
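Both fixes for garbled text in item 4 (2) can be reproduced with a hypothetical GBK-encoded page whose Content-Type header omits the charset. In that case requests falls back to ISO-8859-1 for text/* responses, so res.text comes out garbled; setting res.encoding, or decoding res.content yourself, recovers the text. requests also offers `res.apparent_encoding`, a guess based on the body itself, though a guess can be wrong for short pages.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

TEXT = "你好世界"           # "Hello, world" in Chinese
BODY = TEXT.encode("gbk")  # the hypothetical page is GBK-encoded


class LocalSite(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # No charset here, so requests cannot know the body is GBK.
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), LocalSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

response = requests.get(url, timeout=5)
detected = response.encoding
print(detected)             # ISO-8859-1
garbled = response.text     # decoded with the wrong encoding

# Fix 1: set the correct encoding, then read .text again.
response.encoding = "gbk"
fixed_via_encoding = response.text

# Fix 2: decode the raw bytes yourself.
fixed_via_decode = response.content.decode("gbk")
print(fixed_via_encoding == TEXT, fixed_via_decode == TEXT)  # True True

server.shutdown()
server.server_close()
```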
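For item 4 (3), `res.json()` parses a JSON body into Python dicts and lists. A sketch with a hypothetical local API endpoint and made-up payload; if the body is not valid JSON, `json()` raises an exception derived from ValueError.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

DATA = {"page": 1, "items": ["a", "b"]}  # hypothetical API payload
BODY = json.dumps(DATA).encode("utf-8")


class LocalSite(BaseHTTPRequestHandler):
    """Hypothetical JSON API: every GET returns DATA."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), LocalSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
api_url = f"http://127.0.0.1:{server.server_port}/api"  # hypothetical endpoint

response = requests.get(api_url, timeout=5)
data = response.json()  # dict parsed from the JSON body
print(data["items"])    # ['a', 'b']

server.shutdown()
server.server_close()
```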
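Items 4 (4) to (6) can be checked together. In this sketch a hypothetical local server answers `/ok` with 200 and every other path with 404; `raise_for_status()`, a method of the response object, turns a 4xx or 5xx status into a `requests.HTTPError`, which is often handier than comparing status codes by hand.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class LocalSite(BaseHTTPRequestHandler):
    """Hypothetical site: /ok exists, everything else is 404."""

    def do_GET(self):
        found = self.path.startswith("/ok")  # self.path includes the query string
        body = b"ok" if found else b"not found"
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), LocalSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

ok = requests.get(base_url + "/ok", params={"page": 2}, timeout=5)
print(ok.status_code)              # 200
print(ok.url)                      # the full URL, ending in /ok?page=2
print(ok.headers["Content-Type"])  # text/plain; charset=utf-8

missing = requests.get(base_url + "/missing", timeout=5)
print(missing.status_code)  # 404
try:
    missing.raise_for_status()
    raised = False
except requests.HTTPError:
    raised = True

server.shutdown()
server.server_close()
```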


Source: blog.csdn.net/Smile_Lai/article/details/101312179