Collecting ultra-high-definition mobile phone wallpapers with Python, as many as you like~

Foreword

Hello everyone, this is the Demon King~

Development environment:

  • Python 3.8
  • PyCharm

Modules used:

  • requests >>> pip install requests (data request module, sends HTTP requests)
  • parsel >>> pip install parsel (parsing module, extracts data)

How to install third-party Python modules:

  1. Press Win + R, type cmd, and click OK. Then type the installation command pip install followed by the module name (for example, pip install requests) and press Enter.
  2. Alternatively, click Terminal in PyCharm and enter the installation command there.
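
To confirm that the installation worked, a quick check like the one below can be run with the same interpreter. This is only a sketch: `is_installed` is a hypothetical helper name, and it relies on the standard-library function `importlib.util.find_spec`, which looks a module up without importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two modules this article uses.
for name in ("requests", "parsel"):
    if is_installed(name):
        print(f"{name}: installed")
    else:
        print(f"{name}: missing, run: pip install {name}")
```

If a module is reported as missing even though pip said it was installed, pip and PyCharm are probably using two different Python installations.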

How do you configure the Python interpreter in PyCharm?

  1. Select File >>> Settings >>> Project >>> Python Interpreter
  2. Click the gear icon and select Add
  3. Add the Python installation path
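
To see which interpreter PyCharm actually runs your script with, a small check like this can help. It is a sketch only; `interpreter_info` is a hypothetical helper name, and it uses nothing beyond the standard `sys` module.

```python
import sys


def interpreter_info() -> dict:
    """Collect the path and version of the interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


info = interpreter_info()
print(info["executable"])  # path of the python executable in use
print(info["version"])     # for example 3.8.x for the environment in this article
```

The printed path should match the installation path added in step 3.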

How do you install plugins in PyCharm?

  1. Select File >>> Settings >>> Plugins
  2. Click Marketplace and enter the name of the plugin you want to install. For example, search for "translation" to find a translation plugin, or "Chinese" to find the Chinese language plugin.
  3. Select the matching plugin and click Install.
  4. After the installation succeeds, PyCharm offers to restart. Click OK; the plugin takes effect after the restart.

The basic process of a crawler:

1. Data source analysis

  1. Decide which website to crawl and what data or content you want to get from it. For
    example, when crawling pictures, pick one picture and analyze it:
    use the browser's developer tools to capture the network traffic, then compare the parameters in the URL of the picture we want.
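
The comparison can also be done in code. The sketch below splits two image addresses into path segments and reports where they differ. Both URLs are made-up placeholders (the host `img.example.com` and the file name are assumptions); only the segment names `edpic_360_360` and `edpic_source` are taken from the script later in this article. `differing_segments` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def differing_segments(url_a: str, url_b: str) -> list:
    """Compare two URL paths position by position and return the segments that differ."""
    parts_a = urlsplit(url_a).path.strip("/").split("/")
    parts_b = urlsplit(url_b).path.strip("/").split("/")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(parts_a, parts_b))
        if a != b
    ]


# Placeholder addresses: the thumbnail on the list page vs. the full-size file.
thumbnail = "https://img.example.com/edpic_360_360/aa/bb/cc/sample.jpg"
original = "https://img.example.com/edpic_source/aa/bb/cc/sample.jpg"

print(differing_segments(thumbnail, original))
# [(0, 'edpic_360_360', 'edpic_source')]
```

When only one segment changes between the small and the large picture, a simple string replacement is enough to get from one address to the other.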

2. Steps to implement the crawler code:

  1. Send a request to the URL obtained from the analysis above
    Request URL
    Request method
    Request header parameters >>> these disguise the Python code as a browser (client) when it sends the request.
    What happens if you don't disguise it? >>> The server may not return the data you want
  2. Get the data: receive the response data returned by the server
  3. Parse the data: extract the content we want, here the image URL and the image title
  4. Save the data: write the image data to local disk
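
Step 1 can be sketched as follows. `build_headers` and `fetch_html` are hypothetical helper names, and the 10-second timeout is an assumption rather than something the article specifies. `requests` is imported inside `fetch_html` only so that the header-building part can be tried without the dependency or network access.

```python
def build_headers(user_agent: str, cookie: str = "") -> dict:
    """Build browser-like request headers; the cookie is optional."""
    headers = {"user-agent": user_agent}
    if cookie:
        headers["cookie"] = cookie
    return headers


def fetch_html(url: str, headers: dict, timeout: float = 10.0) -> str:
    """Send a GET request and return the response text, raising on HTTP errors."""
    import requests  # pip install requests

    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text


headers = build_headers(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
)
print(headers)
```

Without a `user-agent` header, requests identifies itself as `python-requests/<version>`, which is easy for a server to recognize as a script.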

Basic syntax:

  • for loops
  • variable assignment
  • string formatting methods
  • dictionary creation
  • passing function arguments by keyword
  • the zip built-in function
  • the output function (print)
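
A compact illustration of these pieces, written as a sketch with made-up values: the host `example.com`, the file names, and the `describe` function are placeholders and are not taken from the crawler itself.

```python
# for loop + f-string formatting: build list-page addresses for pages 2 to 4.
page_urls = []
for page in range(2, 5):
    page_urls.append(f"https://example.com/woman/{page}.html")

# Dictionary creation: key-value pairs, the same shape as request headers.
headers = {"user-agent": "Mozilla/5.0"}


# Keyword arguments: each argument is named at the call site.
def describe(url, headers):
    return f"GET {url} with {len(headers)} header(s)"


print(describe(url=page_urls[0], headers=headers))

# zip: walk two lists in step, pairing each image address with its title.
sources = ["a.jpg", "b.jpg"]
titles = ["first", "second"]
for img_url, title in zip(sources, titles):
    print(img_url, title)
```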

File operations

  1. requests: basic use, GET requests to fetch data
  2. parsel: basic use, CSS selector syntax
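
The CSS syntax can be tried offline before touching the real site. In the sketch below, `SAMPLE_HTML` is a made-up fragment shaped to match the selector used later in this article; the real page layout may differ. `extract_images` is a hypothetical helper name, and `parsel` is imported inside it so the sample data can be inspected even if the module is not installed yet.

```python
SAMPLE_HTML = """
<div class="egeli_pic_li">
  <dl class="egeli_pic_dl">
    <dd><a href="/1.html"><img src="https://img.example.com/edpic_360_360/1.jpg" alt="first"></a></dd>
  </dl>
</div>
<div class="egeli_pic_li">
  <dl class="egeli_pic_dl">
    <dd><a href="/2.html"><img src="https://img.example.com/edpic_360_360/2.jpg" alt="second"></a></dd>
  </dl>
</div>
"""


def extract_images(html: str) -> list:
    """Return (src, alt) pairs for every img tag matched by the CSS selector."""
    import parsel  # pip install parsel

    selector = parsel.Selector(text=html)
    # ::attr(name) reads an attribute from each matched tag; getall() returns a list.
    src = selector.css(".egeli_pic_li .egeli_pic_dl dd a img::attr(src)").getall()
    alt = selector.css(".egeli_pic_li .egeli_pic_dl dd a img::attr(alt)").getall()
    return list(zip(src, alt))
```

Calling `extract_images(SAMPLE_HTML)` returns one `(src, alt)` pair per image, here two pairs in document order.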

Code

I would rather leave the URL out of the code so that this post can pass review. If you want it, check the comments or send me a private message~

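# Import the data request module (PyCharm grays out an import until the code uses it)
import os
import requests   # pip install requests
# Import the data parsing module
import parsel   # pip install parsel

# Create the output folder first; open() does not create missing folders
os.makedirs('img', exist_ok=True)
"""
1. Send the request
headers: request header parameters; they can be copied directly from the developer tools. headers is a dictionary (key-value pairs)
user-agent: the user agent, the browser's basic identity
cookie: user information, used to check whether the user is logged in
"""
for page in range(2, 11):   # list pages 2 through 10
    url = f'https://sj.enterdesk.com/woman/{page}.html'
    headers = {
        'cookie': 't=f2cf055ce8713058cbfdbd1561c38e86; r=1281; Hm_lvt_86200d30c9967d7eda64933a74748bac=1645625923,1646892448; Hm_lpvt_86200d30c9967d7eda64933a74748bac=1646894465',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)   # <Response [200]>: the response object; status code 200 means the request succeeded
    """
    2. Get the data: the content returned by the server, as the response object's text (a string)
        If the returned content differs from what the developer tools show, the server has recognized
        the program as a crawler and did not return the data
    """
    # print(response.text)
    """
    3. Parse the data
        CSS selectors, XPath and re (regular expressions) all work; choose whichever fits best
    CSS selectors: extract data according to tag attributes
    Convert response.text into a Selector object: <Selector xpath=None data='<html xmlns="http://www.w3.org/1999/x...'>
    ::attr() is the attribute selector; .egeli_pic_li .egeli_pic_dl dd a img locates the tag we want
    img::attr(src) takes the src attribute of the img tag
    getall() returns the data of every matched tag as a list
    """
    selector = parsel.Selector(response.text)
    src = selector.css('.egeli_pic_li .egeli_pic_dl dd a img::attr(src)').getall()
    alt = selector.css('.egeli_pic_li .egeli_pic_dl dd a img::attr(alt)').getall()
    for img_url, title in zip(src, alt):
        # Switch from the 360x360 thumbnail path to the source image path
        img_url = img_url.replace('edpic_360_360', 'edpic_source')
        # 4. Save the data
        img_content = requests.get(url=img_url, headers=headers).content  # binary content of the image
        with open(os.path.join('img', title + '.jpg'), mode='wb') as f:
            f.write(img_content)
        print(img_url, title)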
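
One thing the script does not guard against is an image title containing characters that are not allowed in file names, which would make `open()` fail. A minimal sketch of a fix is shown below; `safe_filename` is a hypothetical helper, and the set of replaced characters is the one Windows forbids, which is an assumption about where the script runs.

```python
import re


def safe_filename(title: str, default: str = "untitled") -> str:
    """Replace characters that Windows forbids in file names and trim the result."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" .")
    return cleaned or default


print(safe_filename('sunset: 4K/"HD"?'))  # sunset_ 4K__HD__
```

In the save step, the title would be passed through this helper before it is joined with the folder name and the `.jpg` extension.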

Epilogue

Well, that is the end of this article!

If you have suggestions or questions, feel free to comment or send me a private message! Let's work hard together (ง •_•)ง

If you enjoyed it, follow me, or like and comment on the article!!!

Origin blog.csdn.net/python56123/article/details/124106803