Python: read all files in a dataset folder, shuffle them, and split them into training and test txt files (generates train.txt and test.txt in random order, default ratio 8:2)

Writing the loading code every time I read a dataset is a waste of time, and I am lazy. So I wrote a small Python tool that does it in one run: it reads all the files under the dataset folder, shuffles them, and splits them into training and test txt files (train.txt and test.txt, in random order, default ratio 8:2). Very convenient.

Before running the script, you need to organize the folders into this form:
root = r"C:\Users\hq\Desktop\HoldingObject\pokemon" (remember to change this to your own root path)
The path tree is as follows (the number of folders and pictures is unlimited):
pokemon
-- 0 (the folder name is the label)
---- 1.jpg, 2.jpg, 3.jpg, ... (the file names do not matter)
-- 1 (the folder name is the label)
---- 1.jpg, 2.jpg, 3.jpg, ... (the file names do not matter)
-- 2 (the folder name is the label)
---- 1.jpg, 2.jpg, 3.jpg, ... (the file names do not matter)
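
To check that a folder really follows this layout before splitting it, a short helper along these lines may be useful. This is only a sketch: `count_images_per_label` is a hypothetical name chosen for illustration, and it assumes that every subfolder directly under the root is a label folder.

```python
import os
from collections import Counter


def count_images_per_label(root):
    """Return a Counter mapping each label folder name to its number of files."""
    counts = Counter()
    for label in os.listdir(root):
        dir_path = os.path.join(root, label)
        if not os.path.isdir(dir_path):
            continue  # ignore stray files sitting directly under root
        counts[label] = len(os.listdir(dir_path))
    return counts
```

Calling `count_images_per_label(root)` with your own root path gives one count per label, which makes an empty or misplaced folder easy to spot.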

The author's folder looks like this:
[Figure: screenshot of the author's dataset folder]

import os
import numpy as np

root = r"C:\Users\hq\Desktop\HoldingObject\pokemon"

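# Build the list of all file names; each subfolder name is the label
filename = []
for label in os.listdir(root):
    dir_path = os.path.join(root, label)
    if not os.path.isdir(dir_path):
        continue  # skip stray files sitting directly under root
    for n in os.listdir(dir_path):
        # one entry per file: "<path>\t<label>"
        filename.append(os.path.join(dir_path, n) + '\t' + label)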

# Shuffle the list of file names
np.random.shuffle(filename)
# Split into training and test sets; the default ratio is 8:2
split = int(len(filename) * 0.8)
train = filename[:split]
test = filename[split:]

# Write the two lists to train.txt and test.txt
with open('train.txt', 'w') as f1, open('test.txt', 'w') as f2:
    for i in train:
        f1.write(i + '\n')
    for j in test:
        f2.write(j + '\n')

print('Success!')
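
The script above gives a different split on every run. If you need the same split each time, or a ratio other than 8:2, the same steps can be wrapped in a function. The following is a minimal sketch, not the script's original form: `split_dataset`, `train_ratio`, and `seed` are names introduced here for illustration, and it uses the standard library's `random.Random` in place of `np.random.shuffle`.

```python
import os
import random


def split_dataset(root, train_ratio=0.8, seed=None):
    """Return (train, test) lists of 'path<TAB>label' strings."""
    entries = []
    # Sort the listings so that a fixed seed always yields the same split,
    # regardless of the order in which the OS returns directory entries.
    for label in sorted(os.listdir(root)):
        dir_path = os.path.join(root, label)
        if not os.path.isdir(dir_path):
            continue  # skip stray files sitting directly under root
        for name in sorted(os.listdir(dir_path)):
            entries.append(os.path.join(dir_path, name) + '\t' + label)
    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(entries)
    cut = int(len(entries) * train_ratio)
    return entries[:cut], entries[cut:]
```

For example, `train, test = split_dataset(root, seed=42)` returns the same two lists on every run as long as the folder contents do not change; the lists can then be written to train.txt and test.txt exactly as in the script.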

The generated files are shown below:

[Figure: the generated files]
train.txt
[Figure: contents of train.txt]
test.txt
[Figure: contents of test.txt]
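
Each line of train.txt and test.txt holds a file path and its label separated by a tab, so reading the files back is simple. A minimal sketch, assuming the files were written by the script above; `read_list` is a hypothetical helper name, and it raises an error on any non-empty line that lacks a tab.

```python
def read_list(path):
    """Read a train.txt / test.txt style file into (file_path, label) pairs."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # tolerate blank lines
            # Split on the last tab, since the label is always the final field.
            file_path, label = line.rsplit('\t', 1)
            samples.append((file_path, label))
    return samples
```

The labels come back as strings (the folder names), so convert them with `int(label)` if your training code expects integer class indices.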

Origin blog.csdn.net/weixin_44414948/article/details/110205546