While my roommate was eating instant noodles, I downloaded thousands of mobile phone wallpapers with a dozen lines of Python code

For most of us guys, aren't good-looking phone and computer wallpapers a favorite thing? With a dozen lines of code you can download loads of them in minutes, more than you could ever use up. Come on, let's get started!


1. To do a good job, you must first sharpen your tools

1. Development environment

Here we use Python; any version works as long as it is not Python 2. For the editor I use PyCharm. The 2021 version has many convenient tools; anyone who has used it knows, so I won't go into details.

If you don't have the software yet, you can download it from the official website.

2. Third-party modules

requests 
parsel 

requests is a module for sending HTTP requests, and parsel is a module for parsing and extracting data. Both can be installed directly with pip.

3. Installing modules and common problems

  • To install a Python third-party module, do either of the following:
    1. Press win + R, type cmd and click OK, then type the installation command pip install module name (for example, pip install requests) and press Enter
    2. Click Terminal in PyCharm and type the installation command there
  • Reasons an installation can fail:
    • Failure 1: pip is not recognized as an internal or external command
      Solution: Add Python (and its Scripts folder) to the PATH environment variable

    • Failure 2: A lot of red error text appears (read timed out)
      Solution: The network connection timed out, so you need to switch to a mirror source
      Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple
      Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
      University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
      Huazhong University of Science and Technology: http://pypi.hustunique.com/
      Shandong University of Technology: http://pypi.sdutlinux.org/
      Douban: http://pypi.douban.com/simple/
      For example: pip3 install -i https://pypi.doubanio.com/simple/ module name

    • Failure 3: cmd shows that the module is already installed, or that the installation succeeded, but it still cannot be imported in PyCharm
      Solution: You may have several Python versions installed (anaconda and python: one of the two is enough), so just uninstall the extra one.
      Alternatively, the Python interpreter in PyCharm may not be set up to point at the Python you installed the module into.
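
To track down Failure 3, it can help to ask the interpreter itself which Python is running and which modules it can see. Below is a minimal sketch using only the standard library; the helper name missing_modules is my own, not something from requests or parsel. Run it once from cmd and once from PyCharm and compare the interpreter paths.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Path of the Python that is running this script. If cmd and PyCharm print
    # different paths, pip installed the module into the other interpreter.
    print("Interpreter:", sys.executable)
    print("Missing:", missing_modules(["requests", "parsel"]))
```

If the two paths differ, either install the modules from PyCharm's Terminal or point PyCharm's interpreter setting at the Python that already has them.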

2. Process

1. Data source analysis
Work out which website you are crawling and what kind of data you want to get from it.
For example, when crawling pictures, start the analysis from a single picture:
capture the traffic with the browser's developer tools, and compare the parameters in the url addresses of the pictures we want.

2. Steps to implement the crawler code:
1). Send a request to the url address obtained from the analysis

  • Request url
  • Request method
  • Request header parameters >>> these disguise the Python code as a browser (client) when it sends the request.
    What happens if you don't disguise it >>> the server may not return the data you want;

2). Get the data: read the response data returned by the server;
3). Parse the data: extract the content we want, namely each picture's url address and title;
4). Save the data: write the picture data to local disk;

3. Code display

1. Import the module

import requests
import parsel 

2. Send a request

  • The headers request header parameters can be copied directly from the developer tools; headers is a dictionary of key-value pairs
  • user-agent: the user agent, which carries the browser's basic identity information
  • cookie: user information, used to detect whether the user is logged in to an account
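
To see why the user-agent matters, here is a small sketch that shows which User-Agent requests would send with and without custom headers, without touching the network. The helper name effective_user_agent and the example.com address are placeholders of mine; the header value is the one used in this article.

```python
import requests

# Browser-like header copied from the developer tools (same value as in this article).
BROWSER_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
    ),
}


def effective_user_agent(headers=None):
    """Return the User-Agent requests would send; no request is actually made."""
    session = requests.Session()
    # example.com is only a placeholder address for building the request.
    request = requests.Request("GET", "https://example.com/", headers=headers)
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


if __name__ == "__main__":
    print(effective_user_agent())                 # the library's default value
    print(effective_user_agent(BROWSER_HEADERS))  # the browser disguise
```

Without headers, the value starts with python-requests/, which is easy for a server to spot.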
for page in range(2, 11):  # pages 2 through 10
    url = f'https://sj..com/woman/{page}.html'
    headers = {
        'cookie': 't=f2cf055ce8713058cbfdbd1561c38e86; r=1281; Hm_lvt_86200d30c9967d7eda64933a74748bac=1645625923,1646892448; Hm_lpvt_86200d30c9967d7eda64933a74748bac=1646894465',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    }
    # Returns a response object such as <Response [200]>; status code 200 means the request succeeded
    response = requests.get(url=url, headers=headers)

3. Get data

Read the text of the response object to get the content returned by the server.

print(response.text)

If the returned string differs from what you saw in the developer tools, the server has recognized your request as coming from a crawler, so it is not returning the data to you.
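
Rather than comparing the output by eye, you can do a rough check in code. This is only a sketch under one assumption: that a genuine page always contains the egeli_pic_li class used by the selectors in this article. The function name looks_blocked is my own.

```python
def looks_blocked(html, marker="egeli_pic_li"):
    """Guess whether the server withheld the real page.

    `marker` is a string that should appear in a genuine page. The default is
    the CSS class this article's selectors rely on (an assumption about the site).
    """
    return marker not in html


# Usage inside the page loop (response comes from requests.get):
#     if looks_blocked(response.text):
#         print("No picture list in the response; check the headers")
```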

4. Parse the data

Three parsing methods are available: css selectors, xpath, and re (regular expressions). Choose whichever suits the page best.

css selector: extracts data content based on tag attributes

First convert the response.text you got into a Selector object.

.egeli_pic_li .egeli_pic_dl dd a img is the positioning part of the selector: it tells parsel which tags to match, and img is the tag we finally land on.

img::attr(src) gets the src attribute of the img tag, and ::attr(alt) does the same for alt. getall() gets the data from all matching tags and returns it as a list.

selector = parsel.Selector(response.text)
src = selector.css('.egeli_pic_li .egeli_pic_dl dd a img::attr(src)').getall()
alt = selector.css('.egeli_pic_li .egeli_pic_dl dd a img::attr(alt)').getall()
for img_url, title in zip(src, alt):
    img_url = img_url.replace('edpic_360_360', 'edpic_source')
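
The replace() call is where the url comparison from the data source analysis pays off: the thumbnail address and the full-size address differ only in one path segment. Here is a small sketch of that step on made-up data; the helper name to_source_url and the example.com addresses are mine.

```python
def to_source_url(thumbnail_url):
    """Swap the thumbnail path segment for the full-size one."""
    return thumbnail_url.replace("edpic_360_360", "edpic_source")


# Made-up stand-ins for the src and alt lists returned by getall().
srcs = [
    "https://example.com/edpic_360_360/a.jpg",
    "https://example.com/edpic_360_360/b.jpg",
]
alts = ["First", "Second"]

# zip() pairs the n-th url with the n-th title.
pairs = [(to_source_url(src), alt) for src, alt in zip(srcs, alts)]
print(pairs)
```

Note that zip() stops at the shorter list, so if a picture has no alt text the urls and titles can drift out of step.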

5. Save data

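# These lines belong inside the for loop over zip(src, alt) above
img_content = requests.get(url=img_url, headers=headers).content  # get the binary data of the picture
with open('img\\' + title + '.jpg', mode='wb') as f:  # the img folder must already exist
    f.write(img_content)
print(img_url, title)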
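
Two things can trip up the save step: the img folder may not exist, and a title may contain characters that Windows does not allow in file names. Below is a hedged sketch of one way to handle both with the standard library. The names safe_filename and save_image are my own, and the replacement rule is just one reasonable choice.

```python
import re
from pathlib import Path


def safe_filename(title, default="untitled"):
    """Replace characters that Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned or default


def save_image(content, title, folder="img"):
    """Write binary picture data to folder/title.jpg and return the path."""
    directory = Path(folder)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = directory / (safe_filename(title) + ".jpg")
    path.write_bytes(content)
    return path


# Usage inside the loop, in place of the open(...) block:
#     save_image(img_content, title)
```

Using pathlib also avoids hard-coding the backslash, so the same code works outside Windows.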

4. Results

The crawl brings in more than enough wallpapers.

Brothers, if you're tired from all that reading, give your hands a little exercise: like and favorite this post. The next one will be even more interesting.


Origin blog.csdn.net/fei347795790/article/details/123482652