Deep learning training: using Python to quickly copy selected images from a dataset into a new folder in batches

When preparing data for deep learning training, I collected my dataset as video and extracted it frame by frame afterwards. Many of the frames are near-duplicates, so only a subset of them is worth keeping as training data. My original dataset, for example, has 80625 images, and picking them out by hand would be far too labor-intensive. Here I summarize two methods for extracting them quickly.

Method 1: The lightning-fast method

Use code to extract a large batch directly. In my case I keep one image every 30 frames. The code is as follows:

import os
import shutil

# Create the destination folder if it does not exist yet
new_img_folder = "Images/Image_New_INF"
if os.path.exists(new_img_folder):
    print("Directory already exists")
else:
    os.makedirs(new_img_folder)

# Walk the original dataset folder and copy the images that match the rule
dir_path = "Images/INF2022530"       # path to the original dataset
for root, dirs, files in os.walk(dir_path):
    for file in files:
        num_name, ext = os.path.splitext(file)   # split "30.jpg" into "30" and ".jpg"
        if ext.lower() != ".jpg" or not num_name.isdigit():
            continue                             # skip files that are not named <number>.jpg
        if int(num_name) % 30 == 0:              # keep every 30th frame
            shutil.copy(os.path.join(root, file), new_img_folder)

The original dataset looks like this:

[Image: original dataset folder]

The extracted dataset looks like this:

[Image: extracted dataset folder]

That is how I extracted my own dataset, and you can adapt the code above to your own needs. The principle is simple: it relies mainly on the os and shutil libraries, and the extraction is very fast.
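If you want to adapt the idea to your own folders, the same logic can be wrapped in a function. The following is only a minimal sketch under my own assumptions: the function name copy_every_nth and its parameters step and suffix are made up for illustration, and it assumes every image is named with its frame number (for example 30.jpg) and that the destination folder is not inside the source folder.

```python
import shutil
from pathlib import Path


def copy_every_nth(src_dir, dst_dir, step=30, suffix=".jpg"):
    """Copy images whose numeric file name is a multiple of `step`.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the list of copied file names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)    # create the target folder if needed
    copied = []
    for path in sorted(src.rglob(f"*{suffix}")):
        if not path.stem.isdecimal():         # skip names that are not plain numbers
            continue
        if int(path.stem) % step == 0:        # keep every `step`-th frame
            shutil.copy(path, dst / path.name)
            copied.append(path.name)
    return copied
```

With the folders from this post, the call would be copy_every_nth("Images/INF2022530", "Images/Image_New_INF", step=30). If you would rather move the files than copy them, shutil.move can be used in place of shutil.copy, but then the originals are gone from the source folder.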

Method 2: The labor-intensive method

This method is simple and crude. Scroll to the bottom of the image folder, click the left mouse button to start selecting one column of images, hold the button down and drag upward until you reach the top of the folder, then release the button. Press Ctrl + C (copy), go to the new folder and press Ctrl + V (paste), and the selected images are copied straight into the new folder, as shown below. This method is suitable when the number of images is small. If the number of images is large, Method 1 above is recommended.

[Images: screenshots of the manual selection steps]
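Selecting one column in the file manager roughly amounts to taking every k-th file, where k is the number of columns in the folder view. For comparison, the same selection can be sketched in code. This is my own illustration, not part of the manual method: the function name copy_every_kth_file is hypothetical, and the files are sorted by plain string order, which may differ from the order your file manager shows unless the names are zero-padded.

```python
import shutil
from pathlib import Path


def copy_every_kth_file(src_dir, dst_dir, k, suffix=".jpg"):
    """Copy every k-th image from src_dir, in sorted file-name order.

    Hypothetical helper for illustration; it does not look into subfolders.
    Returns the list of copied file names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)    # create the target folder if needed
    files = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    selected = files[::k]                     # first file, then every k-th after it
    for path in selected:
        shutil.copy(path, dst / path.name)
    return [p.name for p in selected]
```

Unlike Method 1, this variant does not need the file names to be numbers, so it may be handy when the frames were saved under other names.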

That is how I quickly extract a usable dataset in batches when doing deep learning. If the methods I have summarized are useful to you, I would appreciate your support. Thank you!

Origin blog.csdn.net/qq_40280673/article/details/125863170