Use Python to download all the images from a website and save them to a folder

Table of contents

1. Required libraries

2. Approach

3. Step-by-step implementation

4. Complete example code and results


1. Required libraries

If any of the following libraries are missing, install them with pip.

1. os (an interface to the operating system, providing many functions for manipulating files and directories)

Methods used:

os.mkdir(path): Create a new directory (folder).

os.path.join(): join path components into a single path.
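A short sketch of these two calls (the folder name `./images` is just an example):

```python
import os

save_folder = "./images"
# Create the directory only if it does not already exist
if not os.path.exists(save_folder):
    os.mkdir(save_folder)

# Join the folder and a file name into a platform-appropriate path
path = os.path.join(save_folder, "example.png")
print(path)  # ./images/example.png on Linux/macOS
```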

2. requests (an HTTP client library for Python that can send HTTP/1.1 requests)

Methods used:

requests.get(url, params=None, **kwargs): Send a GET request.
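To illustrate how the params argument is encoded into the URL, a request can be built without actually being sent (the page parameter here is hypothetical, purely for demonstration):

```python
import requests

# Build a GET request but do not send it; prepare() shows the final URL
req = requests.Request("GET", "https://sc.chinaz.com/tupian/", params={"page": 2})
prepared = req.prepare()
print(prepared.url)  # https://sc.chinaz.com/tupian/?page=2
```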

3. from bs4 import BeautifulSoup (a Python library for parsing data in HTML and XML documents)

Methods used:

soup = BeautifulSoup(html_doc, 'html.parser')

find_all(): returns all matching elements
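A minimal, self-contained sketch of these two calls, using a tiny hand-written HTML snippet instead of a real page:

```python
from bs4 import BeautifulSoup

html_doc = '<html><body><img src="/a.png"><img src="/b.png"></body></html>'
soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns every matching tag as a list
images = soup.find_all("img")
print(len(images))           # 2
print(images[0].get("src"))  # /a.png
```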

2. Approach

Since each picture on the page is stored behind its own link, getting the pictures means first collecting the link addresses of all the images, then downloading them one by one.

Each image URL appears in the src attribute of its img tag.

1). First, request the whole page with requests, then parse it with BeautifulSoup to get the page's HTML.

2). Next, collect all img tags with BeautifulSoup's find_all("img") method.

3). Extract the link from each tag's src attribute with BeautifulSoup's get("src") method.

For example, a src value looks like this: https://t14.baidu.com/it/u=3871151578,586465891&fm=179&app=42&size=w621&n=0&f=PNG?s=56F72C72CCB47E904B7DA3C40300A026&sec=1679072400&t=7c02d2ed4ea860881d26c57e9469f20c. The image itself is stored at that link.

4). Then request each image address with requests.get; the response body is the image data.

Finally, open the target folder, give each picture a file name, and save it to disk.

3. Step-by-step implementation

1). Import the libraries

import os
import requests
from bs4 import BeautifulSoup

2).

#Choose the website to scrape
url = "https://sc.chinaz.com/tupian/"
#Set the folder name; create the folder if it does not already exist
save_folder = "./images"
if not os.path.exists(save_folder):
    os.makedirs(save_folder)

3).

#Request the page and parse the returned HTML
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

4).

#Find all img tags and use BeautifulSoup's get method to read src, i.e. the link where each image is stored
images = soup.find_all("img")
for image in images:
    img_url = image.get("src")

5). Note the check here: it verifies that img_url starts with http, because some src values are not http links, and requests.get would then raise an error.

    #If the URL is valid, request the image address with requests.get
    #Build the file name with os.path.join
    #Open the file and write the downloaded content to it
    if img_url and img_url.startswith('http'):
        img_response = requests.get(img_url)
        filename = os.path.join(save_folder, os.path.basename(img_url))
        with open(filename, "wb") as f:
            f.write(img_response.content)
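As a possible refinement (not part of the original code), relative src values could be resolved into absolute URLs with urllib.parse.urljoin from the standard library, instead of being skipped:

```python
from urllib.parse import urljoin

page_url = "https://sc.chinaz.com/tupian/"

# A relative src is resolved against the page URL
print(urljoin(page_url, "/upload/pic.jpg"))
# https://sc.chinaz.com/upload/pic.jpg

# An already-absolute src passes through unchanged
print(urljoin(page_url, "https://example.com/x.png"))
# https://example.com/x.png
```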

4. Complete example code and results

import os
import requests
from bs4 import BeautifulSoup

url = input('Enter the URL you want to scrape: ')
save_folder = "./images"

if not os.path.exists(save_folder):
    os.makedirs(save_folder)

response = requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
images = soup.find_all("img")

for image in images:
    img_url = image.get("src")
    if img_url and img_url.startswith('http'):
        img_response = requests.get(img_url)
        filename = os.path.join(save_folder,os.path.basename(img_url))
        with open(filename,"wb") as f:
            f.write(img_response.content)

Result display: (you can copy the code directly, then paste in a URL to try it)

Origin blog.csdn.net/dogxixi/article/details/129598965