A working method for sorting the ImageNet2012 validation set (val) into 1000 category folders, with the corresponding txt label files

After downloading and decompressing ImageNet2012, I found that the training set is already organized into 1,000 category folders, while the validation set is just 50,000 unsorted pictures. I heard that a .sh file can be used to classify them automatically, but I could not find that file, and .sh scripts are meant for a Linux shell, which doesn't work on my Windows machine. So I simply hand-wrote a classification script in Python. Most of the label files provided by other bloggers are no longer available. Don't worry, I'm here with the files.

1. Create 1000 empty folders corresponding to the training set

Copy the contents of the mkdir.txt label file on GitHub and paste them into a new local file named mkdir.txt.
Code: (All paths are absolute, so the Python script can be saved anywhere. You only need to change the two paths marked by comments in the code.)

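import os

files_name = []

# Path to mkdir.txt (change this)
with open(r"E:\ILSVRC2012\done\mkdir.txt", "r") as f:
    for line in f:
        # Skip blank lines; characters 9 to 17 hold the 9-character folder name
        if line.strip():
            files_name.append(line[9:18])

# print(files_name[0], files_name[-1])
# print(len(files_name))

for each in files_name:
    # Folder that will hold the 1000 category folders (change this)
    os.makedirs("E://ILSVRC2012//done//after_categories//{}".format(each), exist_ok=True)
print("make dir done")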

After completion, you will get a thousand empty folders corresponding to the training set.
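The fixed slice `line[9:18]` only works if every line has exactly the same layout. As a minimal sketch of a more tolerant variant, assuming each line of mkdir.txt looks like `mkdir -p n01440764` (a guess based on the slice positions, not confirmed by the file itself), the folder name can be taken as the last whitespace-separated token. The helper names `parse_wnids` and `make_class_dirs` are made up for this illustration.

```python
import os


def parse_wnids(lines):
    """Return the last whitespace-separated token of each non-empty line.

    Assumes lines of the form 'mkdir -p n01440764' (hypothetical layout).
    """
    wnids = []
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            wnids.append(parts[-1])
    return wnids


def make_class_dirs(wnids, root):
    """Create one folder per category name under root."""
    for wnid in wnids:
        # exist_ok=True lets the script be re-run without raising an error
        os.makedirs(os.path.join(root, wnid), exist_ok=True)
```

To use it, open mkdir.txt, pass the file object to `parse_wnids`, and pass the result together with the target folder (for example `E:/ILSVRC2012/done/after_categories`) to `make_class_dirs`.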

2. Put the 50,000 validation pictures into the folders

I have placed the corresponding label file in categories.txt. Click Download to get it. The file has 50,000 lines, each one a Linux move command for one picture.
Code: (The Python script can be saved anywhere. Only the three paths marked by comments in the code need to be changed.)

import os
from PIL import Image

lines = []

# Path to categories.txt (change this)
with open(r"D:\Users\CaiJiyuan\Desktop\categories.txt", "r") as f:
    for each in f:
        # each[3:31] is the image file name, each[32:41] is the category folder name
        if each.strip():
            lines.append((each[3:31], each[32:41]))

# print(len(lines))

for i, item in enumerate(lines):
    if i % 100 == 0:
        print(i, 'done')

    # Path of each image in the original validation set ILSVRC2012_img_val (change this)
    image = Image.open('E://ILSVRC2012//done//ILSVRC2012_img_val//{}'.format(item[0]))

    # Folder holding the 1000 empty folders created by the script above (change this)
    path = 'E://ILSVRC2012//done//after_categories//{}'.format(item[1])
    image.save(os.path.join(path, item[0]))
    image.close()

The run was slow: it took about 40 minutes to finish.
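Part of that time probably goes into decoding and re-encoding every JPEG, since the script opens each picture with PIL and saves it again. A possible alternative, sketched below, copies the files byte for byte with `shutil.copy2`, which also leaves the image data unchanged. It assumes each line of categories.txt looks like `mv ILSVRC2012_val_00000001.JPEG n01751748` (inferred from the slice positions, not verified), and the function names are hypothetical.

```python
import os
import shutil


def parse_move_line(line):
    """Split a line like 'mv <file> <folder>' into (file, folder).

    Returns None for blank or malformed lines. The line layout is an assumption.
    """
    parts = line.split()
    if len(parts) < 3:
        return None
    # Drop a trailing slash in case the folder is written as 'n01751748/'
    return parts[-2], parts[-1].rstrip("/")


def sort_validation_images(mapping_lines, src_dir, dst_root):
    """Copy each listed image from src_dir into dst_root/<folder>/.

    Returns the number of files copied.
    """
    copied = 0
    for line in mapping_lines:
        parsed = parse_move_line(line)
        if parsed is None:
            continue
        filename, wnid = parsed
        target_dir = os.path.join(dst_root, wnid)
        os.makedirs(target_dir, exist_ok=True)
        # copy2 copies the bytes and file metadata without re-encoding the image
        shutil.copy2(os.path.join(src_dir, filename),
                     os.path.join(target_dir, filename))
        copied += 1
    return copied
```

Pass it the opened categories.txt file, the ILSVRC2012_img_val folder, and the folder that holds the 1000 category folders. Replacing `shutil.copy2` with `shutil.move` would avoid keeping two copies of the validation set, at the cost of changing the original folder.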

There are fifty pictures in each folder in the validation set, and the folders correspond to the training set one-to-one:
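This can be checked with a short script rather than by opening folders by hand. The sketch below counts the entries in each category folder and reports any folder that does not contain the expected 50. The function names are hypothetical, and the count includes every entry in a folder, so stray non-image files would be counted too.

```python
import os


def count_images_per_class(root):
    """Map each sub-folder name under root to the number of entries it contains."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


def find_incomplete_classes(counts, expected=50):
    """Return only the folders whose count differs from the expected value."""
    return {name: n for name, n in counts.items() if n != expected}
```

If the sorting worked, `count_images_per_class` on the output folder should return 1000 entries and `find_incomplete_classes` should return an empty dictionary.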


Origin blog.csdn.net/Tommy_Nike/article/details/130415947