Splitting a VOC-format or YOLO-format data set into train and val sets

The images and their label files must be split together: both are needed to train the model, and each image corresponds to exactly one label file. In other words, if an image goes into the validation set, its label file must go into the validation set as well.

This can be done with the following code:

import os, random, shutil


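def moveimg(fileDir, tarDir):
    pathDir = os.listdir(fileDir)  # names of the files in the source image folder
    filenumber = len(pathDir)
    rate = 0.1  # fraction of images to draw, e.g. 10 out of 100 images is 0.1
    picknumber = int(filenumber * rate)  # number of images to draw at this rate
    sample = random.sample(pathDir, picknumber)  # randomly pick picknumber images
    print(sample)
    os.makedirs(tarDir, exist_ok=True)  # create the target folder if it is missing
    for name in sample:
        # os.path.join works with or without a trailing separator on fileDir
        shutil.move(os.path.join(fileDir, name), os.path.join(tarDir, name))
    return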


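def movelabel(file_list, file_label_train, file_label_val):
    # without an existing folder, shutil.move would rename the label file to this path
    os.makedirs(file_label_val, exist_ok=True)
    for i in file_list:
        if i.endswith('.jpg'):
            stem = os.path.splitext(i)[0]  # image name without the .jpg extension
            # filename = os.path.join(file_label_train, stem + '.xml')  # use this line for xml labels and comment out the next one
            filename = os.path.join(file_label_train, stem + '.txt')  # use this line for txt labels and comment out the previous one
            if os.path.exists(filename):
                shutil.move(filename, file_label_val)
                print(i + " processed successfully!")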

if __name__ == '__main__':
    fileDir = r"D:\code\mydata\image\JPEGImages" + "\\"  # source image folder
    tarDir = r'D:\code\mydata\image\JPEGImages_val'  # folder the validation images are moved to
    moveimg(fileDir, tarDir)
    file_list = os.listdir(tarDir)
    file_label_train = r"D:\code\mydata\labels"  # source folder of the txt or xml labels
    file_label_val = r"D:\code\mydata\labels_val"  # folder the validation labels are moved to
    # move the labels of the validation images to the new folder
    movelabel(file_list, file_label_train, file_label_val)
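
Since the whole point is that every image keeps its label, it can be worth checking the result after the script has run. The following is only a minimal sketch of such a check, not part of the original script: the helper name find_unpaired and its label_ext parameter are made up for illustration, and it assumes .jpg images whose label files share the same base name.

```python
import os


def find_unpaired(image_dir, label_dir, label_ext=".txt"):
    """Return the .jpg file names in image_dir that have no label in label_dir.

    Hypothetical helper for illustration; label_ext is ".txt" for YOLO
    labels or ".xml" for VOC labels.
    """
    missing = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".jpg":
            continue  # skip anything that is not a .jpg image
        if not os.path.exists(os.path.join(label_dir, stem + label_ext)):
            missing.append(name)
    return missing
```

Calling find_unpaired(tarDir, file_label_val) after the split lists the validation images whose label was not moved. An empty list means every validation image has its label next to it; the same call on the training folders checks the other half.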

Notes

rate: the fraction of the whole data set that is randomly drawn as the validation set; the rest is used as the training set

filename: the statement differs for txt labels and xml labels; use the line that matches your label format and comment out the other one

fileDir: the path of the original image folder; after the validation images are moved out, the images that remain are the training set

tarDir: the path of the folder that the validation images are moved to

file_label_train: the path of the txt or xml label folder; after the validation labels are moved out, the labels that remain are the training set

file_label_val: the path of the folder that the validation labels are moved to
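
To make the rate and filename notes concrete, here is a small sketch of the two calculations on their own, without moving any files. It is an illustration under assumptions rather than the script's own code: pick_validation, label_path, the seed argument and the label_ext argument are hypothetical names introduced here. The seed is an optional extra that makes the random pick repeatable, and label_ext replaces the commenting and uncommenting of the txt and xml lines.

```python
import os
import random


def pick_validation(names, rate=0.1, seed=None):
    """Split file names into (train, val), drawing int(len(names) * rate) for val."""
    picknumber = int(len(names) * rate)  # int() truncates, so 15 names at 0.1 gives 1
    rng = random.Random(seed)  # a fixed seed makes the pick repeatable
    val = rng.sample(list(names), picknumber)
    picked = set(val)
    train = [n for n in names if n not in picked]  # everything not drawn stays in train
    return train, val


def label_path(label_dir, image_name, label_ext=".txt"):
    """Build the label path for an image: same base name, label_ext as extension."""
    stem = os.path.splitext(image_name)[0]
    return os.path.join(label_dir, stem + label_ext)
```

With 100 image names and rate 0.1, pick_validation returns 10 validation names and 90 training names. label_path("labels", "img001.jpg", ".xml") points at img001.xml inside the labels folder, which is the VOC counterpart of the default .txt case.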

Origin blog.csdn.net/m0_63769180/article/details/129335080