Writing the loading code from scratch every time I read a new data set is a waste of time, and I am lazy. So I wrote a small Python tool that does it in one step: it reads all the files under the data set folder, shuffles them, and splits them into training and test lists, written to train.txt and test.txt (the order is random, the default ratio is 8:2). Very convenient.
Before running the script, you need to organize the folders into this form:
root = r"C:\Users\hq\Desktop\HoldingObject\pokemon" (remember to change this to your own root path)
The path tree is as follows (the number of folders and pictures is unlimited):
-pokemon
--0 (the folder name is the label)
----1.jpg, 2.jpg, 3.jpg, ... (the file names do not matter)
--1 (the folder name is the label)
----1.jpg, 2.jpg, 3.jpg, ... (the file names do not matter)
--2 (the folder name is the label)
----1.jpg, 2.jpg, 3.jpg, ... (the file names do not matter)
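Before generating the lists, it can help to confirm that the folder really has this shape. The following is only a minimal sketch, not part of the original tool: the helper name count_images_per_label is hypothetical, and it assumes every sub-folder of the root is a label folder.

```python
import os
from collections import Counter


def count_images_per_label(root):
    """Return a Counter mapping each label folder name to its file count.

    Hypothetical helper: assumes every sub-folder of `root` is a label.
    """
    counts = Counter()
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue  # skip stray files sitting next to the label folders
        counts[label] = sum(
            1
            for name in os.listdir(label_dir)
            if os.path.isfile(os.path.join(label_dir, name))
        )
    return counts


# Example call (change the path to your own root):
# print(count_images_per_label(r"C:\Users\hq\Desktop\HoldingObject\pokemon"))
```

A label folder with a count of 0 or a missing label shows up immediately in the printed result.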
The author's folder is as shown in the figure below:
import os
import numpy as np

root = r"C:\Users\hq\Desktop\HoldingObject\pokemon"

# Build the list of all file names; the folder name is the label
filename = []
for label in os.listdir(root):
    label_dir = os.path.join(root, label)
    if not os.path.isdir(label_dir):
        continue  # ignore stray files in the root folder
    for n in os.listdir(label_dir):
        # One entry per image: full path, a tab, then the label
        filename.append(os.path.join(label_dir, n) + '\t' + label)

# Shuffle the list of file names in place
np.random.shuffle(filename)

# Split into training and test sets, default ratio 4:1
split = int(len(filename) * 0.8)
train = filename[:split]
test = filename[split:]

# Write the two lists to train.txt and test.txt
with open('train.txt', 'w') as f1, open('test.txt', 'w') as f2:
    for i in train:
        f1.write(i + '\n')
    for j in test:
        f2.write(j + '\n')

print('Success!')
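If you want a different ratio or a split that is the same on every run, the same steps can be wrapped in a function. This is a sketch under assumptions, not the original tool: the names split_dataset, train_ratio, seed and write_lines are made up for illustration, and it uses the standard library random module instead of numpy so that a seed can be passed directly.

```python
import os
import random


def split_dataset(root, train_ratio=0.8, seed=None):
    """Return (train, test) lists of 'path<TAB>label' strings.

    Hypothetical variant of the script above; `seed` makes the shuffle
    repeatable, `train_ratio` replaces the hard-coded 0.8.
    """
    entries = []
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue  # only folders are treated as labels
        for name in sorted(os.listdir(label_dir)):
            entries.append(os.path.join(label_dir, name) + '\t' + label)
    # A private Random instance keeps the global random state untouched
    random.Random(seed).shuffle(entries)
    cut = int(len(entries) * train_ratio)
    return entries[:cut], entries[cut:]


def write_lines(path, lines):
    """Write one entry per line, in the same format as train.txt/test.txt."""
    with open(path, 'w') as f:
        for line in lines:
            f.write(line + '\n')


# Example call (change the path to your own root):
# train, test = split_dataset(r"C:\Users\hq\Desktop\HoldingObject\pokemon", seed=0)
# write_lines('train.txt', train)
# write_lines('test.txt', test)
```

Sorting the folder and file names before shuffling means the result depends only on the seed, not on the order in which the operating system lists the files.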
The generated files are shown below; each line holds an image path and its label, separated by a tab:
train.txt
test.txt
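To use the lists later, each line has to be split back into a path and a label. A minimal sketch of that step, assuming the tab-separated format written by the script; the function name read_split_file is hypothetical.

```python
def read_split_file(path):
    """Parse a train.txt/test.txt file into (image_path, label) tuples.

    Hypothetical helper: assumes one 'path<TAB>label' entry per line.
    """
    samples = []
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # tolerate blank lines
            # Split on the last tab so a tab inside the path cannot break parsing
            image_path, label = line.rsplit('\t', 1)
            samples.append((image_path, label))
    return samples


# Example call:
# for image_path, label in read_split_file('train.txt'):
#     print(image_path, label)
```

The labels come back as strings (the folder names); convert them with int(label) if your folders are named 0, 1, 2 as in the tree above.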