Divide the data set proportionally (training set + test set) with a python program

There is an existing data set goods of retail product pictures without any division. Under the goods directory, the name of the subdirectory is the category of the picture, and there are multiple pictures under each category. The file structure is shown in Figure 1. Now, we need to split it into a training set and a test set according to a certain ratio.

Schematic representation of data set division
Figure 1. Schematic diagram of data set division

The idea of ​​realizing this function is to create a new directory tree with the same structure, and randomly select some pictures in proportion to each category in the original directory tree, and move them to the corresponding category in the new directory tree. The complete code is as follows:

import os
import random
import shutil

#源数据集路径和目标数据集路径
path_source = './goods'
path_target = './goods_test'

#参数:源路径、目标路径和测试集所占比例
def seperate(path_source, path_target, percent):
    #生成包含path_source下所有目录名的列表
    categories = os.listdir(path_source)
    for name in categories:
        #在path_target下建立相同名称的子目录
        os.makedirs(os.path.join(path_target, name))
        #生成包含子目录下所有图片的列表
        nums = os.listdir(os.path.join(path_source, name))
        #随机按比例抽取一部分图片
        nums_target = random.sample(nums, int(len(nums)*percent))
        #把图片剪切到目标路径
        for pic in nums_target:
            shutil.move(os.path.join(path_source, name, pic), os.path.join(path_target, name, pic))

#执行完成后,path_source为训练集,path_target为测试集。
seperate(path_source, path_target, 0.3)

If the data set is of another structure, a slight change can be made on this basis.

Among the key functions:

os.listdir(path): List the names of all items under path (including directories and files)

os.makedirs(path): recursively create the directory on the path path, if the directory already exists, an error will be reported

os.makedir(path) can be used to create only one directory

random.sample(list, num): Randomly select num elements from the list to form a new list

shutil.move(source, target): Move the source file to the target file

Note that shutil.move is a very dangerous operation. It is recommended to comment it out when debugging the code to ensure that the code is correct before executing it.

Guess you like

Origin blog.csdn.net/diqiudq/article/details/126026233