foreword
Hello everyone, this is the Demon King~
Development environment:
- Python 3.8
- Pycharm
Modules used:
- requests >>> pip install requests (sends data requests)
- parsel >>> pip install parsel (parsing module, extracts data)
How to install Python third-party modules:
- Press Win + R, type cmd and click OK, then enter the installation command pip install module-name (e.g. pip install requests) and press Enter
- Or click Terminal in PyCharm and enter the same installation command there
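To check that an install actually worked, one simple sanity test (my addition, not part of the original steps) is to import each module and print its version:

# If either import raises ImportError, the corresponding pip install did not succeed
import requests
import parsel

print(requests.__version__)  # both libraries expose a __version__ string
print(parsel.__version__)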
How to configure the Python interpreter in PyCharm?
- Select File >>> Settings >>> Project >>> Python Interpreter
- Click the gear icon and select Add
- Add the path of your Python installation
How to install plugins in PyCharm?
- Select File >>> Settings >>> Plugins
- Click Marketplace and enter the name of the plugin you want to install. For example: for a translation plugin, search Translation; for a Chinese language pack, search Chinese
- Select the matching plugin and click Install
- After the installation succeeds, a prompt to restart PyCharm pops up; click OK. The plugin takes effect after the restart
The basic process of crawler:
1. Data source analysis
- Which website are we crawling, and what data do we want from it? For example, to crawl pictures, start from the page where the pictures are listed.
- Use the browser's developer tools for packet-capture analysis and compare the URL addresses of the pictures we want.
2. Steps to implement the crawler code (see the skeleton sketch after this list):
- Send a request to the URL address obtained from the analysis
  Request URL
  Request method
  Request header parameters >>> masquerading disguises the Python code as a browser (client). What happens if you don't disguise it? >>> the server will not return the data you want
- Get data: receive the response data the server returns
- Parse the data: extract the image URL address and image title we want
- Save the data: write the image data to a local file
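A minimal skeleton of those four steps might look like this (example.com is a placeholder URL, not the site used in the full code below):

# Minimal skeleton of the four crawler steps; example.com is a placeholder
import requests  # data request module
import parsel    # data parsing module

url = 'https://example.com/page.html'              # 1. the URL found during analysis
headers = {'user-agent': 'Mozilla/5.0'}            # masquerade as a browser (client)
response = requests.get(url=url, headers=headers)  # send the request
selector = parsel.Selector(response.text)          # 2. get + 3. parse the response text
# ...css()/getall() would extract the pieces we want, and step 4 saves them
# with open(..., mode='wb'), exactly as the full code below does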
Basic syntax:
- for loop
- custom variable assignment
- string formatting methods
- dictionary creation
- function keyword parameters
- zip built-in function
- print output function
- file operations
- requests: simple use of GET requests to fetch data
- parsel: simple use of CSS selector syntax
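Here is a tiny, self-contained demo of the pieces that matter most below: parsel CSS selectors plus zip and f-strings (the HTML snippet is inline for illustration, not from the target site):

import parsel

html = '<div class="pic"><img src="/a.jpg" alt="cat"><img src="/b.jpg" alt="dog"></div>'
selector = parsel.Selector(html)
src = selector.css('.pic img::attr(src)').getall()  # every src attribute -> list
alt = selector.css('.pic img::attr(alt)').getall()  # every alt attribute -> list
for img_url, title in zip(src, alt):                # zip pairs the two lists item by item
    print(f'{title}: {img_url}')                    # f-string formatting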
code
To get through review, it is better for me to delete the URL from the code. Friends who want it can check the comments or private-message me to get it~
# Import the data request module (an imported but unused module shows greyed out)
import requests  # pip install requests
# Import the data parsing module
import parsel  # pip install parsel
import os  # standard library, used to make sure the save folder exists

"""
1. Send a request
headers: request header parameters; you can copy them straight from the developer
tools. headers is a dict data type, key-value pairs
user-agent: user agent, the browser's basic identity
cookie: user information, lets the site check whether the user is logged in
"""
os.makedirs('img', exist_ok=True)  # create the save folder if it does not exist yet
for page in range(2, 11):
    url = f'https://sj.enterdesk.com/woman/{page}.html'
    headers = {
        'cookie': 't=f2cf055ce8713058cbfdbd1561c38e86; r=1281; Hm_lvt_86200d30c9967d7eda64933a74748bac=1645625923,1646892448; Hm_lpvt_86200d30c9967d7eda64933a74748bac=1646894465',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)  # <Response [200]> response object; status code 200 means the request succeeded
    """
    2. Get data: the content the server returns, here the response object's text (string data)
    If the returned content differs from what you see in the developer tools, the
    server has identified you as a crawler program and is not returning the real data
    """
    # print(response.text)
    """
    3. Parse the data
    CSS selectors, XPath and re all work; pick whichever fits best
    CSS selector: extract data content by tag attributes
    Convert response.text into a Selector object:
    <Selector xpath=None data='<html xmlns="http://www.w3.org/1999/x...'>
    .egeli_pic_li .egeli_pic_dl dd a img locates the tag we want
    img::attr(src) takes the src attribute of that img tag
    getall() returns every matching result as a list
    """
    selector = parsel.Selector(response.text)
    src = selector.css('.egeli_pic_li .egeli_pic_dl dd a img::attr(src)').getall()
    alt = selector.css('.egeli_pic_li .egeli_pic_dl dd a img::attr(alt)').getall()
    for img_url, title in zip(src, alt):
        # swap the 360x360 thumbnail path for the full-resolution source path
        img_url = img_url.replace('edpic_360_360', 'edpic_source')
        # 4. Save the data
        img_content = requests.get(url=img_url, headers=headers).content  # get the binary content
        with open('img\\' + title + '.jpg', mode='wb') as f:
            f.write(img_content)
        print(img_url, title)
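One caveat worth flagging: titles scraped from a page can contain characters that Windows forbids in file names (\ / : * ? " < > |), which would make open() fail. A small helper like this (my addition, safe_filename is not part of the original code) sanitizes them first:

import re

def safe_filename(title: str) -> str:
    # Replace characters Windows forbids in file names with underscores
    return re.sub(r'[\\/:*?"<>|]', '_', title)

print(safe_filename('cat: the "best"?'))  # -> cat_ the _best__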
epilogue
Well, that's the end of this article!
If you have more suggestions or questions, feel free to comment or private-message me! Let's work hard together (ง •_•)ง
If you like this article, follow the blogger, or give it a like and a comment!!!