Object detection: dividing a dataset into a training set and a validation set

When training a deep learning object detection model, the first step is to obtain a dataset and label it. The labeled dataset is then divided into a training set and a validation set, which makes model training and testing more convenient. The code for dividing the dataset is as follows.

import os, random, shutil


def moveimg(fileDir, tarDir):
    pathDir = os.listdir(fileDir)  # file names in the source image folder
    filenumber = len(pathDir)
    rate = 0.1  # fraction of images to extract, e.g. 10 out of 100 is 0.1
    picknumber = int(filenumber * rate)  # number of images to take at this rate
    sample = random.sample(pathDir, picknumber)  # randomly pick picknumber images
    print(sample)
    for name in sample:
        # os.path.join works whether or not fileDir ends with a path separator
        shutil.move(os.path.join(fileDir, name), os.path.join(tarDir, name))
    return

def movelabel(file_list, file_label_train, file_label_val):
    for i in file_list:
        if i.endswith('.jpg'):
            # For XML label files, change '.txt' to '.xml' on the next line
            filename = os.path.join(file_label_train, i[:-4] + '.txt')
            if os.path.exists(filename):
                shutil.move(filename, file_label_val)
                print(i + " processed successfully!")



if __name__ == '__main__':
    fileDir = r"C:\Users\86159\Desktop\hat\JPEGImages" + "\\"  # source image folder
    tarDir = r'C:\Users\86159\Desktop\hat\JPEGImages_val'  # folder the extracted images are moved to
    moveimg(fileDir, tarDir)
    file_list = os.listdir(tarDir)
    file_label_train = r"C:\Users\86159\Desktop\hat\Annotations_yolo"  # source label folder
    file_label_val = r"C:\Users\86159\Desktop\hat\Annotations_val"  # folder the matching labels are moved to
    movelabel(file_list, file_label_train, file_label_val)

The code above works by randomly extracting a fraction of the pictures in a folder (you can set the fraction yourself by changing the value of rate in moveimg) and moving them into a new folder, which you need to create beforehand. The source folder then becomes the training set and the new folder becomes the validation set. The code then moves the label file of each extracted picture into another new folder, which also needs to be created beforehand. The result is a training set and a validation set of pictures, each with its corresponding label files. Running the same code again with different target folders also splits off a test set, which is convenient for testing.
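The same idea can be sketched as a single function that creates the target folders itself and moves each picture together with its label. This is only a minimal sketch, not the code from the post: the name split_pairs and the parameters label_ext and seed are hypothetical, and it assumes that the pictures are .jpg files and that each label file shares its picture's base name.

```python
import os
import random
import shutil


def split_pairs(image_dir, label_dir, out_image_dir, out_label_dir,
                rate=0.1, label_ext=".txt", seed=None):
    """Move a random fraction of images, plus their label files, to new folders.

    Returns the list of image file names that were moved.
    """
    # Create the destination folders if they do not exist yet.
    os.makedirs(out_image_dir, exist_ok=True)
    os.makedirs(out_label_dir, exist_ok=True)

    # Only consider .jpg files; sorting makes a fixed seed give a repeatable split.
    images = sorted(f for f in os.listdir(image_dir) if f.lower().endswith(".jpg"))
    picked = random.Random(seed).sample(images, int(len(images) * rate))

    for name in picked:
        shutil.move(os.path.join(image_dir, name), os.path.join(out_image_dir, name))
        label = os.path.splitext(name)[0] + label_ext
        label_path = os.path.join(label_dir, label)
        if os.path.exists(label_path):  # an image without a label is moved alone
            shutil.move(label_path, os.path.join(out_label_dir, label))
    return picked
```

Calling it a second time on the remaining training folders, with different output folders, would split off a test set. Note that rate then applies to the pictures that are still left, so taking 10% of the original pictures for testing after a 10% validation split needs a rate of about 0.111.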

Origin blog.csdn.net/didiaopao/article/details/119927280