After downloading and extracting ImageNet2012 (ILSVRC2012), I found that the training set is organized into 1,000 category folders, while the validation set is just 50,000 unsorted images. I heard that a .sh script can sort them automatically, but I could not find that file, and .sh scripts are meant for Linux, which does not help on my Windows machine, so I simply wrote a classification script by hand. Most of the label files shared by other bloggers have expired. Don't worry, I provide the files here.
1. Create 1000 empty folders corresponding to the training set
Copy the label information from the mkdir.txt I put on GitHub and paste it into a new local file named mkdir.txt.
Code: (All paths are absolute, so the Python script can be placed anywhere. You only need to change the two paths marked by comments in the code.)
import os

files_name = []
# Path to mkdir.txt (change this)
with open(r"E:\ILSVRC2012\done\mkdir.txt", "r") as f:
    for line in f:
        # Characters 9 to 17 of each line hold the category folder name
        files_name.append(line[9:18])
# print(files_name[0], files_name[-1])
# print(len(files_name))
for each in files_name:
    # Path where the 1,000 folders will be created (change this)
    os.makedirs("E://ILSVRC2012//done//after_categories//{}".format(each), exist_ok=True)
print("make dir done")
After completion, you will get a thousand empty folders corresponding to the training set.
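The script above depends on fixed character positions in each line. As a minimal sketch of a more tolerant variant, assuming each line of mkdir.txt looks like `mkdir -p n01440764` (the exact line format is an assumption inferred from the slice `[9:18]`), the folder name can be taken as the last whitespace-separated token. The function names `parse_wnid` and `make_category_dirs` are hypothetical helpers, not part of the original script.

```python
import os


def parse_wnid(line):
    """Return the last whitespace-separated token of a line, or None if blank.

    Assumed line format: 'mkdir -p n01440764'.
    """
    parts = line.split()
    return parts[-1] if parts else None


def make_category_dirs(list_path, out_root):
    """Create one folder per listed category and return how many were listed."""
    count = 0
    with open(list_path, "r") as f:
        for line in f:
            wnid = parse_wnid(line)
            if wnid is None:
                continue  # skip blank lines
            # exist_ok=True lets the function be re-run without errors
            os.makedirs(os.path.join(out_root, wnid), exist_ok=True)
            count += 1
    return count
```

It could be called as `make_category_dirs(r"E:\ILSVRC2012\done\mkdir.txt", r"E:\ILSVRC2012\done\after_categories")`, which should report 1,000 if the list file is complete.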
2. Put the 50,000 validation images into folders
I have put the corresponding label file in categories.txt, which is available for download. It has 50,000 lines, each a Linux move (mv) command for one image.
Code: (The Python script can be placed anywhere. You only need to change the three paths marked by comments in the code.)
import os
from PIL import Image

lines = []
# Path to categories.txt (change this)
with open(r"D:\Users\CaiJiyuan\Desktop\categories.txt", "r") as f:
    for each in f:
        # (image file name, category folder name), sliced from fixed columns
        lines.append((each[3:31], each[32:41]))
# print(len(lines))
for i, item in enumerate(lines):
    if i % 100 == 0:
        print(i, 'done')
    # Path to each image in the original validation set ILSVRC2012_img_val (change this)
    image = Image.open('E://ILSVRC2012//done//ILSVRC2012_img_val//{}'.format(item[0]))
    # Path to the 1,000 empty folders created by the script above (change this)
    path = 'E://ILSVRC2012//done//after_categories//{}'.format(item[1])
    image.save(os.path.join(path, item[0]))
    image.close()
The run was slow; it took about 40 minutes to finish.
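Opening each image with PIL and saving it again decodes and re-encodes every JPEG, which may account for part of that runtime and also means the saved files are no longer byte-identical to the originals. A minimal sketch of an alternative, assuming a plain file copy (or move) is acceptable and that each line of categories.txt looks like `mv ILSVRC2012_val_00000001.JPEG n01751748/` (the line format is an assumption inferred from the slices `[3:31]` and `[32:41]`). The helpers `parse_move_line` and `sort_validation_images` are hypothetical names, not part of the original script.

```python
import os
import shutil


def parse_move_line(line):
    """Split an assumed 'mv <file> <folder>/' line into (file, folder).

    Returns None for lines that do not match that shape.
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] != "mv":
        return None
    return parts[1], parts[2].strip("/")


def sort_validation_images(list_path, src_dir, dst_root, move=False):
    """Copy (or move) every listed image into its category folder.

    Files are transferred as-is, without decoding or re-encoding.
    Returns the number of files transferred.
    """
    transfer = shutil.move if move else shutil.copy2
    done = 0
    with open(list_path, "r") as f:
        for line in f:
            parsed = parse_move_line(line)
            if parsed is None:
                continue  # skip blank or unexpected lines
            name, wnid = parsed
            target_dir = os.path.join(dst_root, wnid)
            os.makedirs(target_dir, exist_ok=True)
            transfer(os.path.join(src_dir, name), os.path.join(target_dir, name))
            done += 1
    return done
```

With the paths from the script above, a call might look like `sort_validation_images(r"D:\Users\CaiJiyuan\Desktop\categories.txt", r"E:\ILSVRC2012\done\ILSVRC2012_img_val", r"E:\ILSVRC2012\done\after_categories")`. Copying keeps the original validation folder intact; passing `move=True` saves disk space instead.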
Each folder of the sorted validation set now contains fifty images, and the folders correspond one-to-one with those of the training set.
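This result can be checked with a short script. The following is a minimal sketch, not part of the original scripts, that counts the files in each category folder and reports any folder that does not hold the expected fifty. The function names are hypothetical.

```python
import os


def count_images_per_folder(root):
    """Map each sub-folder name under root to the number of entries it holds."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            counts[name] = len(os.listdir(folder))
    return counts


def folders_with_unexpected_count(root, expected=50):
    """Return only the folders whose entry count differs from expected."""
    counts = count_images_per_folder(root)
    return {name: n for name, n in counts.items() if n != expected}
```

Calling `folders_with_unexpected_count(r"E:\ILSVRC2012\done\after_categories")` should return an empty dictionary if every one of the 1,000 folders holds exactly fifty files.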