Python: Batch-download beautiful pictures to your computer

When browsing the web, we sometimes come across pictures we like and want to keep, but saving them one by one is tedious. That is a job for Python.

Batch downloading is something Python handles well, so let's try it together today.

1. If you want to do good work, you must first sharpen your tools

Knowledge points:

  • 1. Systematic analysis of the target page
  • 2. Parsing data from HTML tags
  • 3. Saving a large number of images in one go

Software:

Python 3.8
PyCharm 2021 Professional Edition

If you do not have the software yet, you can download it from the official websites.

Modules that need to be installed:

requests # third-party module for sending HTTP requests
parsel # parses the data

Press Win+R, type cmd, and press Enter to open a Command Prompt window, then run pip install requests. Install parsel the same way with pip install parsel.
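To confirm that both modules installed correctly, a quick check like the following can be run before the crawler. It is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name, not part of the original code.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party modules this article relies on
    missing = missing_modules(["requests", "parsel"])
    if missing:
        print("Please install: " + ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If a name is reported as missing, run pip install for that module and check again.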

2. The approach of this article

1. Analyze the website (thought analysis)

① Determine the content to be crawled

The data is on the kanxiaojiejie site. The address is abbreviated here and in the code below, so fill in the full address yourself.

② Analyze the data by viewing the page's source code

2. The crawler process

send request - get data - parse data - save data

  • Send a network request to the target website, kanxiaojiejie
  • Get the data (page source code)
  • Parse the data: extract the album detail-page addresses and titles
  • Send a network request to each detail page
  • Get the data (page source code)
  • Parse the data: extract the image links
  • Send a request to each image link
  • Save the data
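
The steps above form a two-level loop: list page, then detail pages, then images. The sketch below shows only that control flow, with the network and parsing steps passed in as functions so it can be tried offline. The names crawl, FAKE_SITE, and demo are illustrative assumptions, and the fake "pages" are plain lists rather than real HTML.

```python
def crawl(list_url, fetch, parse_albums, parse_images, save):
    """Run the flow: list page -> detail pages -> images."""
    saved = []
    list_html = fetch(list_url)                    # request the list page, get its source
    for album_url in parse_albums(list_html):      # extract the detail-page addresses
        detail_html = fetch(album_url)             # request each detail page
        for img_url in parse_images(detail_html):  # extract the image links
            save(img_url)                          # request the image and save it
            saved.append(img_url)
    return saved


# Stand-in "site" for an offline trial; every address here is made up.
FAKE_SITE = {
    "list": ["album/1", "album/2"],
    "album/1": ["a.jpg", "b.jpg"],
    "album/2": ["c.jpg"],
}


def demo():
    downloaded = []
    return crawl(
        "list",
        fetch=lambda url: FAKE_SITE[url],  # the "page source" is just a list here
        parse_albums=lambda page: page,
        parse_images=lambda page: page,
        save=downloaded.append,
    )
```

In the real crawler, the fetch, parse, and save roles are played by get_response, parse_1, parse_2, and save from the code section.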

3. Code display

Import modules

import requests  # sends HTTP requests
import parsel    # parses HTML with CSS selectors

Send request

def get_response(html_url):
    # Send a browser-like User-Agent so the request looks like a normal visit
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
    }
    # requests.get() is used here because pages are only being fetched; post() is for submitting data
    response = requests.get(url=html_url, headers=headers)
    return response

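Save data

import os

def save(img_url):
    # Reuse get_response so the image request carries the same headers
    img_data = get_response(img_url).content
    # Use the last segment of the URL as the file name
    img_name = img_url.split('/')[-1]
    print("Downloading: " + img_name)
    # Create the img folder if it is missing, then write the image bytes
    os.makedirs("img", exist_ok=True)
    with open(os.path.join("img", img_name), mode='wb') as f:
        f.write(img_data)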
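
Taking the last segment of the URL as the file name works when the link ends in the file name. If a link carries a query string (for example ?size=large), that text would end up in the file name too. A possible variant, assuming nothing about the real site's links, parses the URL first. image_path is a hypothetical helper and the example.com addresses are placeholders.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def image_path(img_url, folder="img"):
    """Build a local path from an image URL, ignoring any query string."""
    # urlsplit separates the path from the query; unquote decodes %xx escapes
    url_path = unquote(urlsplit(img_url).path)
    # URL paths always use forward slashes, so take the name with PurePosixPath
    return Path(folder) / PurePosixPath(url_path).name


if __name__ == "__main__":
    print(image_path("https://example.com/a/b/pic.jpg?size=large"))
```

The returned path can be passed straight to open(..., mode='wb') in place of the string built in save.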

Parse data

Extract the album detail-page addresses from the list page

def parse_1(html_data):
    selector = parsel.Selector(html_data)
    # Each album title links to its detail page
    link_list = selector.css('.entry-title a::attr(href)').getall()
    return link_list

Extract the image URLs from a detail page

def parse_2(html_data):
    selector_1 = parsel.Selector(html_data)
    # The images sit inside <p> tags in the post body
    img_list = selector_1.css('.entry.themeform p img::attr(src)').getall()
    return img_list

Main function

def run(url):
    # Fetch one list page and collect its album links
    data_html = get_response(url).text
    link_list = parse_1(data_html)
    for link in link_list:
        # Fetch each album page and save every image on it
        data_html_1 = get_response(link).text
        img_list = parse_2(data_html_1)
        for img in img_list:
            save(img)

Call the main function to run

# Crawl list pages 1 to 111; the address is abbreviated, so fill in the full one
for page in range(1, 112):
    url = f'kanxiaojiejie/page/{page}'
    run(url)
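
Note that range(1, 112) stops before 112, so the loop visits pages 1 through 111. Below is a small sketch of how the page addresses are built, assuming a placeholder base address: https://example.com stands in for the abbreviated one, and page_urls is a hypothetical helper.

```python
def page_urls(base_url, first_page, last_page):
    """Return the list-page addresses from first_page to last_page inclusive."""
    # range() excludes its end value, hence last_page + 1
    return [f"{base_url}/page/{page}" for page in range(first_page, last_page + 1)]


if __name__ == "__main__":
    urls = page_urls("https://example.com", 1, 111)  # placeholder base address
    print(len(urls), urls[0], urls[-1])
```

For a first trial it may be easier to pass a small range, such as pages 1 to 2, before crawling all of them.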

I won't show the output here. Try running it yourself~

Origin blog.csdn.net/fei347795790/article/details/123330709