Solving out-of-memory problems when Python processes large batches of images

1. The problem

Processing a large number of images can lead to high memory usage, especially if all images are loaded into memory at the same time.

2. Main solutions

  1. Batch processing: divide the images into small batches, process one batch at a time, and release its memory before moving on to the next batch (see the sample code in section 3).

  2. Use generators: load images one by one with a Python generator instead of reading them all into memory at once, so an image is only loaded when it is actually needed (see the sample code in section 4).

  3. Compress images: if possible, apply a suitable image compression algorithm to reduce the space each image occupies in memory (see the first sketch below).

  4. Release resources: after processing each image, promptly release resources that are no longer needed, for example by opening images in a "with" block so their file handles close automatically.

  5. Parallel processing: if the work can be parallelized, process images in parallel to speed things up; keep each worker's load small so that total memory across workers stays bounded (see the second sketch below).

  6. Use a capable image processing library: if your library supports streaming or lazy loading, use those features to reduce memory usage; Pillow, for example, does not decode pixel data until it is accessed (see the third sketch below).

  7. Reduce image resolution: if full resolution is not essential, downscale images to cut their memory footprint (the first sketch below covers this as well).

Choose whichever of these methods fit your specific needs and the image processing library you use, and always release resources promptly after each image is processed; this alone goes a long way toward keeping memory usage down.
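
For compression and downscaling (items 3 and 7), here is a minimal Pillow sketch; the function name, file names, and the size and quality values are illustrative, not fixed:

from PIL import Image

def shrink_image(src_path, dst_path, max_size=(1024, 1024), quality=60):
    # thumbnail() downscales in place while preserving the aspect ratio;
    # saving as JPEG with a lower quality setting reduces the size on disk
    with Image.open(src_path) as img:
        img.thumbnail(max_size)
        rgb = img.convert("RGB")  # JPEG does not support an alpha channel
        rgb.save(dst_path, "JPEG", quality=quality, optimize=True)

shrink_image("photo.png", "photo_small.jpg")  # illustrative file names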
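
For parallel processing (item 5), a sketch using the standard library's concurrent.futures; the worker count is illustrative, and shrink_image is the helper from the previous sketch:

from concurrent.futures import ProcessPoolExecutor

def process_one(path):
    # Each worker process holds only one image at a time, so peak
    # memory per worker stays at roughly a single decoded image
    shrink_image(path, path + ".small.jpg")
    return path

if __name__ == "__main__":  # required for multiprocessing on some platforms
    paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
    with ProcessPoolExecutor(max_workers=4) as pool:
        for done in pool.map(process_one, paths):
            print("finished:", done)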
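
For lazy loading (item 6): Pillow's Image.open only reads the file header, and for JPEGs the draft() method can ask the decoder for a reduced-size decode up front. A small sketch; the file name and target size are illustrative:

from PIL import Image

with Image.open("image1.jpg") as img:
    # draft() configures the JPEG decoder to decode at roughly the
    # requested size, using far less memory than a full-size decode
    img.draft("RGB", (512, 512))
    img.load()  # pixel data is only decoded here
    print(img.size)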

3. Batch processing: Python sample code

from PIL import Image

def process_images_batch(image_paths, batch_size):
    for i in range(0, len(image_paths), batch_size):
        batch_paths = image_paths[i:i+batch_size]
        batch_images = []

        for path in batch_paths:
            image = Image.open(path)
            # Add your per-image processing code here,
            # e.g. resizing, compression, format conversion
            batch_images.append(image)

        # Add your batch-level processing code here,
        # e.g. a uniform operation applied to the whole batch

        # After a batch has been processed, release its memory
        for image in batch_images:
            image.close()

# Example usage
# Assume image_paths is a list of image file paths
image_paths = ["image1.jpg", "image2.jpg", "image3.jpg", ...]
batch_size = 10  # number of images per batch

process_images_batch(image_paths, batch_size)

The code above processes images in batches and releases memory after each batch, which avoids running out of memory. Inside the process_images_batch function you can add whatever per-image processing you need, such as resizing, compressing, or converting formats.

Adapt the code to your specific needs, and tune the value of batch_size to your number of images and your memory limit: a batch size that is too large defeats the purpose, while one that is too small adds overhead. A quick way to check the effect of a given batch size is shown below.
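
For example, a rough check using the third-party psutil package (pip install psutil); measuring resident memory before and after a run shows how much a given batch_size costs:

import os
import psutil  # third-party: pip install psutil

proc = psutil.Process(os.getpid())
before = proc.memory_info().rss
process_images_batch(image_paths, batch_size)
after = proc.memory_info().rss
# A rough indicator only: the OS may not return freed memory immediately
print(f"resident memory grew by {(after - before) / 1024 / 1024:.1f} MiB")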

4. Generators: Python sample code

Generators are an effective way to reduce memory usage, especially when dealing with large amounts of data. A generator is a special kind of iterator that produces items one at a time instead of materializing everything in memory at once, which greatly cuts memory usage when working with large datasets or large numbers of files.

def read_lines(filename):
    # Yield the file's lines one at a time instead of reading them all
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

def process_data(data):
    # Add your data processing code here,
    # e.g. transform the line and return the result
    return data.upper()  # placeholder transformation

filename = "large_data.txt"

# Read the file line by line with the generator
data_generator = read_lines(filename)

# Process each line
for line in data_generator:
    processed_data = process_data(line)
    # Continue working with the processed data here, e.g. print or save it
    print(processed_data)

In the example above, the read_lines function is a generator that reads the specified file line by line and yields each line. The for loop then consumes the lines one at a time, so only one line is held in memory at any moment instead of the whole file.

The key to optimizing memory with generators is to produce results incrementally rather than loading a large amount of data at once; when processing large volumes of data this significantly reduces the memory footprint.

Generators are useful beyond file processing in any scenario that has to handle a lot of data, such as database queries or processing streamed network responses. A sketch of the database case follows.
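
For instance, a generator over a database query, assuming the standard library's sqlite3 module; the database file and the readings table are hypothetical, for illustration only:

import sqlite3

def iter_rows(db_path, query, chunk_size=1000):
    # fetchmany() retrieves rows from the cursor in chunks, so the
    # whole result set never has to be held in memory at once
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(query)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            for row in rows:
                yield row
    finally:
        conn.close()

# Hypothetical database and table
for row in iter_rows("data.db", "SELECT * FROM readings"):
    print(row)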

Tailor the generator's design to the situation: choose sensible chunk sizes and yield results incrementally so that memory usage stays as low as possible.

5. Combining generators with batch processing: Python sample code

from PIL import Image
import os

def image_generator(image_folder):
    # Yield image paths one by one instead of building a full list
    for filename in os.listdir(image_folder):
        image_path = os.path.join(image_folder, filename)
        if os.path.isfile(image_path):
            yield image_path

def process_image_batch(image_paths, batch_size):
    batch_images = []
    for image_path in image_paths:
        with Image.open(image_path) as image:
            # Add your per-image processing code here,
            # e.g. resizing, compression, format conversion
            processed_image = process_image(image)
            batch_images.append(processed_image)

        if len(batch_images) == batch_size:
            # Add your batch-level processing code here,
            # e.g. a uniform operation applied to the whole batch
            for processed_image in process_batch(batch_images):
                yield processed_image
            batch_images = []

    # Handle the final, possibly smaller, batch
    if batch_images:
        for processed_image in process_batch(batch_images):
            yield processed_image

def process_image(image):
    # Placeholder: copy the image so it stays usable after the "with"
    # block closes the underlying file; replace with your own
    # resizing, compression, or conversion logic
    return image.copy()

def process_batch(batch_images):
    # Placeholder: replace with your own batch-level processing
    return batch_images

image_folder = "images_directory"
batch_size = 10  # number of images per batch

# Load image paths lazily with the generator
image_gen = image_generator(image_folder)

# Process the images batch by batch; processed images are yielded one at a time
for processed_image in process_image_batch(image_gen, batch_size):
    # Continue working with each processed image here, e.g. save or display it
    processed_image.show()

In the example above, the image_generator function is a generator that yields the paths of the images in the given folder one by one. The process_image_batch function loads images one at a time, processes a batch once batch_size images have accumulated, and then yields the processed images one by one, keeping memory usage to a minimum.

Combine generators with batching along these lines, adapted to your own image processing logic, and tune batch_size to your image count and memory limits to process large numbers of images efficiently.

Origin: blog.csdn.net/xun527/article/details/132087128