Crawling images with urllib

Use the urllib library to crawl images

import urllib.request 

# Image URL link 
image_url = "http://img.netbian.com/file/2023/0415/235643ofSA0.jpg" 

# Get the image and save it to the specified path 
urllib.request.urlretrieve(image_url, "image.jpg")

In this code, we first specify the URL of the image to be crawled, and then use the  urllib.request.urlretrieve() function to download the image to the specified path. You can replace  image_url with the actual image URL, and  "image.jpg" with the path and file name for the saved image. Note that urlretrieve() does not create missing directories: if the save path contains folders that do not exist yet, you must create them first (for example with os.makedirs()), or the call will fail.
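As a minimal sketch of that preparation step (the folder names images/wallpapers here are purely hypothetical), you can create the missing directories with os.makedirs() before calling urlretrieve():

import os 
import urllib.request 

# Image URL link 
image_url = "http://img.netbian.com/file/2023/0415/235643ofSA0.jpg" 

# Hypothetical save path containing folders that may not exist yet 
save_path = "images/wallpapers/image.jpg" 

# urlretrieve() does not create missing folders, so create them first 
os.makedirs(os.path.dirname(save_path), exist_ok=True) 

# Get the image and save it to the specified path 
urllib.request.urlretrieve(image_url, save_path)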

Note that when using the urllib library to download images, you need to make sure the image URL is valid; otherwise the program may fail because the connection cannot be established or the image does not exist. In addition, some websites impose restrictions on crawlers (for example by checking the User-Agent header), so corresponding measures may be needed depending on the site.
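As a hedged sketch of such measures, the code below wraps the download in a try/except block and installs a global opener that sends a browser-like User-Agent header (the header value is only an illustrative example; urlretrieve() uses the installed opener internally):

import urllib.request 
import urllib.error 

image_url = "http://img.netbian.com/file/2023/0415/235643ofSA0.jpg" 

# Some sites reject urllib's default User-Agent; send a browser-like one 
# (the value below is only an example) 
opener = urllib.request.build_opener() 
opener.addheaders = [('User-Agent', 'Mozilla/5.0')] 
urllib.request.install_opener(opener) 

try: 
    urllib.request.urlretrieve(image_url, "image.jpg") 
except urllib.error.URLError as e: 
    # Covers unreachable hosts and HTTP errors (HTTPError is a subclass) 
    print("Download failed:", e)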

Using the urllib library to crawl multiple images can be achieved by looping through the image links. The specific steps are as follows:

  1. Import the urllib.request module.
  2. Define a list of image links or crawl image links from web pages and save them in a list.
  3. Use a loop to iterate over the image links in the list.
  4. Use the urllib.request.urlretrieve() method to download the image and save it to a local file.

Here is the sample code:

import urllib.request 

# Image link list 
img_urls = [ 
    'http://img.netbian.com/file/2023/0414/small234647agSR11681487207.jpg', 
    'http://img.netbian.com/file/2023/0415/small2350329sMTe1681573832.jpg', 
    'http://img.netbian.com/file/2023/0414/small233653zJreD1681486613.jpg' 
] 

# Cycle through image links and download and save 
for img_url in img_urls: 
    # Extract the file name from the image link to use as the local file name 
    file_name = img_url.split('/')[-1] 
    # Download the image and save it to a local file 
    urllib.request.urlretrieve(img_url, file_name)

Note: the code above can crawl images, but crawling images from websites may raise copyright issues. Please abide by the relevant laws and regulations, and do not crawl or use images illegally.

You can use the urlopen function in the urllib library to obtain the source code of the web page, then use regular expressions to match the image links in it, and finally use the urlretrieve function in the urllib library to download the image.

The following sample code fetches the page source of the specified URL and downloads all the images in it to the local directory.

import re 
import urllib.request 

# Get page source code 
response = urllib.request.urlopen('http://www.netbian.com/') 
# Decode the raw bytes to a string (adjust the encoding to match the page) 
html = response.read().decode('utf-8', errors='ignore') 

# Use regular expressions to match image links 
img_pattern = re.compile(r'<img.*?src="(.*?)"') 
img_urls = re.findall(img_pattern, html) 

# Download the image and save it locally 
for img_url in img_urls: 
    urllib.request.urlretrieve(img_url, img_url.split('/')[-1])

In the above code, we use a regular expression to match the src attribute values in all img tags, and then use urlretrieve to download each image and save it in the current directory.
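One caveat worth noting: the matched src values may be relative paths (for example /file/xxx.jpg), which urlretrieve cannot fetch directly. A minimal sketch of resolving them with urllib.parse.urljoin, assuming the same page and regular expression as above:

import re 
import urllib.request 
from urllib.parse import urljoin 

base_url = 'http://www.netbian.com/' 
response = urllib.request.urlopen(base_url) 
html = response.read().decode('utf-8', errors='ignore') 

img_urls = re.findall(r'<img.*?src="(.*?)"', html) 

for img_url in img_urls: 
    # Resolve relative links against the page URL before downloading 
    full_url = urljoin(base_url, img_url) 
    urllib.request.urlretrieve(full_url, full_url.split('/')[-1])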
