Removing redundant images from a video-derived dataset: a script that keeps one image every X frames (X is customizable)

Foreword:

Many image datasets are built by extracting frames from videos. Consecutive frames are nearly identical, so a dataset built this way contains a large number of redundant images. Training on all of them adds unnecessary training time.

╮(╯▽╰)╭

To this end, I wrote a script that thins out the dataset by interval sampling. Instead of keeping every frame, it keeps one image every N frames and leaves the rest behind. N is customizable: for example, you can keep every 5th frame, and the code below uses N = 25. This reduces the number of redundant images.
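The sampling rule in `init()` keeps an image whenever a running counter is a multiple of `frame`. In listing order, that selects the N-th, 2N-th, 3N-th ... images. The minimal sketch below writes the same rule as a list slice. The function name `sample_every_nth` and the file names are hypothetical and not part of the original script. The sketch also assumes the names are already in frame order.

```python
def sample_every_nth(names, n):
    """Return the n-th, 2n-th, 3n-th ... items, mirroring `count % frame == 0`.

    Assumes `names` is already in frame order (e.g. sorted, zero-padded names).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # Index n-1 is the item where count == n; a step of n then picks count == 2n, 3n, ...
    return list(names)[n - 1::n]


# 100 hypothetical frame names: img00001.jpg ... img00100.jpg
frames = [f"img{i:05d}.jpg" for i in range(1, 101)]
kept = sample_every_nth(frames, 25)
print(kept)  # ['img00025.jpg', 'img00050.jpg', 'img00075.jpg', 'img00100.jpg']
```

With 100 frames and `n = 25`, four images survive. This is the same subset the counter loop in the original script selects, provided both see the files in the same order.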

code:

import glob
import shutil
import os

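def init():
    count = 0
    txtList = []
    # Note: adjust the sampling interval; currently one image is kept every 25 frames
    frame = 25
    # Interval sampling; sorted() so that "every N-th image" follows the frame order
    for image_path in sorted(glob.glob(r'E:\project\dataset\UA-DETRAC\images\train/*.jpg')):
        count = count + 1
        # keep one image out of every `frame` images
        if count % frame == 0:
            # bare file name (no directory), with a .jpg extension
            image_name = os.path.splitext(os.path.basename(image_path))[0] + '.jpg'
            txtList.append(image_name)

    # Select everything (used for transferring annotation files, etc.)
    #for image_path in glob.glob(r'E:\project\毕业论文数据集\UA-DETRAC\UA-DETRAC\labels\val/*.txt'):
    #    image_name = os.path.splitext(os.path.basename(image_path))[0] + '.jpg'
    #    txtList.append(image_name)

    return txtList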


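# Move the selected training/test files to a new directory
def move_file(txt_list):
    old_path = r'E:\project\dataset\UA-DETRAC\images\train\old_image'  # original file directory (must contain the files listed by init())
    new_path = r'E:\project\dataset\UA-DETRAC\images\new_image'  # new file directory
    os.makedirs(new_path, exist_ok=True)  # create the destination if it does not exist yet

    file_list = os.listdir(old_path)  # all files in the directory; listdir returns bare names without the path

    if len(file_list) == 0:
        print("The directory is empty! Please check the path.")
        return

    wanted = set(txt_list)  # set membership is O(1), much faster than searching a list
    for file in file_list:
        # move only the selected files
        if file in wanted:
            train_src = os.path.join(old_path, file)  # source image path
            train_dsc = os.path.join(new_path, file)  # destination image path
            print('train_src:', train_src)
            print('train_dsc:', train_dsc)
            shutil.move(train_src, train_dsc)
            print('train image moved successfully!')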



if __name__ == '__main__':

    txt_list = init()
    print(len(txt_list))
    move_file(txt_list)

How to use

1. Parameter modification

Set the parameter frame to the sampling interval you need: one image is kept out of every frame images.
Change the file paths, meaning both the original file directory and the new file directory.
Then run the script. The interval-sampled images are moved to the new directory. Next, their annotation files need to be transferred to match the images in the new directory.
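If you would rather not edit paths inside the function bodies, you can pass the interval and both directories as parameters. The sketch below is an alternative written with `pathlib`, not the original script. The function name `sample_and_move` and the demo file names are hypothetical. It assumes the file names are zero-padded and frame-ordered, so that sorting by name matches frame order.

```python
import shutil
import tempfile
from pathlib import Path


def sample_and_move(src_dir, dst_dir, every=25, pattern="*.jpg"):
    """Move every `every`-th file matching `pattern` from src_dir into dst_dir.

    Files are sorted by name first, so "every N-th" follows frame order only if
    the names are zero-padded and frame-ordered (an assumption). Returns the
    names of the moved files.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    moved = []
    # sorted() materializes the listing before any file is moved
    for count, path in enumerate(sorted(src.glob(pattern)), start=1):
        if count % every == 0:  # same rule as `count % frame == 0` in init()
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved


# Demo on a throwaway directory with 50 hypothetical frames, not a real dataset
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "train"
    src.mkdir()
    for i in range(1, 51):
        (src / f"img{i:05d}.jpg").touch()
    demo_moved = sample_and_move(src, Path(tmp) / "new_image", every=25)
    demo_left = len(list(src.glob("*.jpg")))

print(demo_moved)  # ['img00025.jpg', 'img00050.jpg']
print(demo_left)   # 48
```

Returning the moved names lets you print or log exactly which frames were kept. You could then feed that list into the annotation-transfer step.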

2. Transfer annotation files

Comment out the interval-sampling for loop above and use the "select all" for loop instead to transfer the annotations.
Principle: the script reads the image file names in your new directory. Based on those names, it moves the matching annotation files from the original annotation directory to a new annotation directory.
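The principle above can be sketched as a standalone helper. This sketch is a hypothetical helper, not the author's loop, which is not reproduced here. It assumes each image has one annotation file with the same base name, as with YOLO-style `.txt` labels. The name `move_matching`, its parameters, and the demo file names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def move_matching(image_dir, src_dir, dst_dir, suffix=".txt"):
    """For each file in image_dir, move the file with the same base name and
    `suffix` from src_dir to dst_dir. Returns (moved, missing) name lists."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the new annotation directory
    moved, missing = [], []
    for image in sorted(Path(image_dir).iterdir()):
        if not image.is_file():
            continue  # skip sub-directories
        label = src / (image.stem + suffix)  # img00025.jpg -> img00025.txt
        if label.exists():
            shutil.move(str(label), str(dst / label.name))
            moved.append(label.name)
        else:
            missing.append(label.name)  # image without an annotation file
    return moved, missing


# Demo: 2 sampled images and 50 label files, all hypothetical
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "new_image").mkdir()
    (root / "labels").mkdir()
    for stem in ("img00025", "img00050"):
        (root / "new_image" / f"{stem}.jpg").touch()
    for i in range(1, 51):
        (root / "labels" / f"img{i:05d}.txt").touch()
    demo_moved, demo_missing = move_matching(
        root / "new_image", root / "labels", root / "new_labels"
    )
    demo_left = len(list((root / "labels").iterdir()))

print(demo_moved)    # ['img00025.txt', 'img00050.txt']
print(demo_missing)  # []
print(demo_left)     # 48
```

Only the base name is matched, so changing `suffix` (for example to `.xml`) lets the same helper move other kinds of companion files. The `missing` list flags images that have no annotation file.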

Modify the directory path
Execute it.

That is the basic workflow. The script is fairly flexible because it only matches file names, so it can also be used to move other kinds of files. Feel free to experiment.


Source: blog.csdn.net/lafsca5/article/details/129932051