Overview
This is a detailed walkthrough of building a dataset for the YOLOv5 model with labelme. The main steps are:
- Installing labelme and learning how to annotate with it
- Converting the JSON annotation files to txt format with Python
- Extracting files of a specified format with Python
The example uses a batch of cattle and horse images to build a dataset for a YOLOv5 cattle and horse detection and recognition task. The dataset looks like this:
1. Installing labelme and annotating with it
(1) Open PyCharm and enter the following in the terminal:
pip install labelme
(2) After installing labelme, enter labelme in the terminal to start the annotation tool:
(3) Click Open Dir and select the folder that contains the images to be annotated:
(4) Click Edit and select the annotation method (I chose rectangle annotation, Create Rectangle):
(5) Draw a box around the target object and click the left mouse button. A dialog for the label name pops up; enter the label value (this experiment annotates cattle and horses, so my label values are cattle and horse) and click OK.
(6) After annotating one image, save it and continue with the next one. Use the shortcut Ctrl+S to save, and press 'd' to switch to the next image:
(7) After all the images are annotated, the folder looks roughly like this (the JSON annotation results are saved in the same directory as the images):
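For orientation, each JSON file that labelme writes stores the annotated shapes together with the image size. The sketch below builds a small in-memory stand-in for such a file and counts the labels in it. The file name and coordinates are made up for illustration; the key names `shapes`, `label`, `points`, `shape_type`, `imageWidth`, and `imageHeight` are the ones labelme uses for rectangle annotations.

```python
import json
from collections import Counter

# Hypothetical example of a labelme annotation (values are illustrative only).
sample_annotation = json.loads("""
{
  "shapes": [
    {"label": "cattle", "points": [[50.0, 60.0], [250.0, 260.0]], "shape_type": "rectangle"},
    {"label": "horse", "points": [[300.0, 80.0], [500.0, 400.0]], "shape_type": "rectangle"}
  ],
  "imagePath": "example.jpg",
  "imageHeight": 480,
  "imageWidth": 640
}
""")


def count_labels(annotation):
    """Return a Counter mapping each label name to its number of shapes."""
    return Counter(shape["label"] for shape in annotation["shapes"])


label_counts = count_labels(sample_annotation)
print(dict(label_counts))
```

Running the same function over every JSON file in the folder is a quick way to spot misspelled label values before converting anything.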
2. Converting JSON files to txt format with Python
Convert the JSON annotation files produced by labelme into the txt format that the YOLOv5 model requires:
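import os
import json
from glob import glob
from os import getcwd
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
# Class names; the index in this list is the class id written to the txt file
classes = ["cattle", "horse"]
# 1. Path of the folder that holds the labelme JSON files
labelme_path = r"C:/Users/xxxx/Desktop/images/dataset/cattle/"
isUseTest = True  # whether to create a test set
# 2. Collect the files to process (file names without the .json extension)
files = glob(labelme_path + "*.json")
files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
print(files)
# 3. Split off the test set, then split the rest into train and val
if isUseTest:
    trainval_files, test_files = train_test_split(files, test_size=0.1, random_state=55)
else:
    trainval_files = files
    test_files = []  # keeps the later ChangeToYolo5(test_files, "test") call valid
train_files, val_files = train_test_split(trainval_files, test_size=0.1, random_state=55)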
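def convert(size, box):
    """Convert a pixel box (xmin, xmax, ymin, ymax) to normalized YOLO (x, y, w, h)."""
    dw = 1. / size[0]  # size = (image width, image height)
    dh = 1. / size[1]
    # Box center; labelme coordinates are 0-based, so no "- 1" offset is applied
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    # Normalize by the image width and height
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)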
wd = getcwd()
print(wd)  # the tmp/ folder is created under this working directory
def ChangeToYolo5(files, txt_Name):
    """Write one YOLO txt label file per image and a tmp/<txt_Name>.txt image list."""
    os.makedirs('tmp/', exist_ok=True)
    with open('tmp/%s.txt' % txt_Name, 'w') as list_file:
        for json_file_ in files:
            json_filename = labelme_path + json_file_ + ".json"
            imagePath = labelme_path + json_file_ + ".jpg"
            # Record the absolute image path in the list file
            list_file.write('%s\n' % os.path.abspath(imagePath))
            with open(json_filename, "r", encoding="utf-8") as f:
                json_file = json.load(f)
            # The image size is needed to normalize the coordinates
            height, width, channels = cv2.imread(imagePath).shape
            # The txt label file is saved next to the image and the JSON file
            with open(labelme_path + json_file_ + ".txt", 'w') as out_file:
                for multi in json_file["shapes"]:
                    points = np.array(multi["points"])
                    # Bounding box of the shape; negative coordinates are clamped to 0
                    xmin = max(points[:, 0].min(), 0)
                    xmax = max(points[:, 0].max(), 0)
                    ymin = max(points[:, 1].min(), 0)
                    ymax = max(points[:, 1].max(), 0)
                    label = multi["label"]
                    # Skip degenerate boxes with no width or height
                    if xmax <= xmin or ymax <= ymin:
                        continue
                    cls_id = classes.index(label)
                    b = (float(xmin), float(xmax), float(ymin), float(ymax))
                    bb = convert((width, height), b)
                    out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
                    print(json_filename, xmin, ymin, xmax, ymax, cls_id)
ChangeToYolo5(train_files, "train")
ChangeToYolo5(val_files, "val")
ChangeToYolo5(test_files, "test")
- Code explanation:
Two places in the code need to be modified before running it (the two marked boxes in the figure below). The first box holds the label values: cattle corresponds to class 0 and horse corresponds to class 1. The second box is the path of the folder that contains the JSON files.
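As a worked example of the conversion, the sketch below normalizes one rectangle into the `class x_center y_center width height` line that YOLOv5 expects. The function name `to_yolo_line` and the sample numbers are assumptions made for illustration; it computes the plain box center divided by the image size.

```python
CLASSES = ["cattle", "horse"]  # same order as in the conversion script


def to_yolo_line(label, xmin, ymin, xmax, ymax, img_w, img_h):
    """Format one pixel-space box as a YOLO label line (values normalized to 0..1)."""
    cls_id = CLASSES.index(label)
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return "%d %.6f %.6f %.6f %.6f" % (cls_id, x_center, y_center, width, height)


# A horse box from (300, 80) to (500, 400) in a 640x480 image (made-up numbers).
line = to_yolo_line("horse", 300, 80, 500, 400, 640, 480)
print(line)  # 1 0.625000 0.500000 0.312500 0.666667
```

The leading 1 is the index of horse in the class list, which is why the order of the label values matters.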
- Running results:
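The txt files generated from the JSON files are saved in the same folder as the images and JSON files, as shown below:
In addition, a tmp folder containing train.txt, val.txt, and test.txt (the lists of image paths for each split) is generated in the current working directory: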
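Before moving on, it can be worth checking that the generated txt files are well formed. The following is a minimal sketch of such a check; the helper names `check_label_line` and `check_label_file` are hypothetical, and the only assumption is the five-field line format written by the script above.

```python
def check_label_line(line, num_classes):
    """Return True if a label line has a valid class id and four values in [0, 1]."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        cls_id = int(parts[0])
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    return 0 <= cls_id < num_classes and all(0.0 <= v <= 1.0 for v in values)


def check_label_file(path, num_classes):
    """Return the 1-based numbers of the malformed lines in one label file."""
    bad_lines = []
    with open(path, "r", encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if line.strip() and not check_label_line(line, num_classes):
                bad_lines.append(number)
    return bad_lines


print(check_label_line("0 0.5 0.5 0.25 0.25", num_classes=2))  # well-formed line
print(check_label_line("2 0.5 0.5 0.25 0.25", num_classes=2))  # class id out of range
```

An empty list from `check_label_file` means every non-empty line in that file passed the check.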
3. Extracting files of a specified format with Python
After the second step, the folder is cluttered: the images, the JSON annotation files, and the generated txt files all sit in one folder, so each file type needs to be extracted to its own folder. Implementation code:
import os
import shutil
# Folder that holds the source files
source_folder = r"C:/Users/xxx/Desktop/data/images/"
# Folder to which the extracted files are saved
destination_folder = r"C:/Users/xxx/Desktop/data/train/labels/"
# Create the output folder if it does not exist
os.makedirs(destination_folder, exist_ok=True)
# Walk through all subfolders
for parent_folder, _, file_names in os.walk(source_folder):
    # Loop over every file in the current folder
    for file_name in file_names:
        # To extract image files instead, use this condition:
        # if file_name.endswith(('jpg', 'jpeg', 'png', 'gif')):
        if file_name.endswith('.txt'):  # extract txt files to the target folder
            # Build the source and destination file paths
            source_path = os.path.join(parent_folder, file_name)
            destination_path = os.path.join(destination_folder, file_name)
            # Copy the file to the destination folder
            shutil.copy(source_path, destination_path)
- Code explanation:
As shown below, only three places in the code need to be modified. The first red box is the path where the original files are stored, the second is the target path where the extracted files are saved, and the third is the format of the files to extract. The following extracts the images; I save them to the path:
C:/Users/xxx/Desktop/data/train/images/
- Running results:
- Extract the txt files in the same way:
- Extraction results:
Then split the data into training, validation, and test sets at a ratio of 7:2:1, which gives the following data:
The format of the dataset is as follows:
dataset
|——test
|    |——images
|——train
|    |——images
|    |——labels
|——val
|    |——images
|    |——labels
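The 7:2:1 split behind this layout can be sketched with the standard library alone. In the example below, the function name `split_7_2_1`, the fixed seed, and the file stems are assumptions for illustration. It only partitions a list of file stems; copying the matching images and txt files into each folder would then follow the extraction code above.

```python
import random


def split_7_2_1(stems, seed=55):
    """Shuffle file stems and split them into train/val/test at roughly 7:2:1."""
    stems = sorted(stems)  # sort first so the result is reproducible
    random.Random(seed).shuffle(stems)
    n_train = len(stems) * 7 // 10
    n_val = len(stems) * 2 // 10
    return {
        "train": stems[:n_train],
        "val": stems[n_train:n_train + n_val],
        "test": stems[n_train + n_val:],  # whatever remains, about 10%
    }


# Hypothetical file stems standing in for the annotated images.
splits = split_7_2_1(["img_%03d" % i for i in range(100)])
print({name: len(items) for name, items in splits.items()})
```

Splitting by file stem rather than by full file name keeps each image and its txt label in the same subset.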
Each images folder contains the pictures:
Each labels folder contains the txt files converted from the annotation results:
Note that every image in the images folder of train or val must have a matching file with the same name in the corresponding labels folder:
At this point, the standard YOLOv5 dataset is complete.
See the next chapter for how to train and reproduce the YOLOv5 model.
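The image and label correspondence can be checked with a few lines of Python. This is a minimal sketch rather than part of the original workflow: the function name `find_unpaired` is hypothetical, and it assumes that an image and its label share the same file name apart from the extension.

```python
import os


def find_unpaired(images_dir, labels_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return (images without a label, labels without an image) as sorted stem lists."""
    image_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(images_dir)
        if name.lower().endswith(image_exts)
    }
    label_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(labels_dir)
        if name.lower().endswith(".txt")
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, `find_unpaired("dataset/train/images", "dataset/train/labels")` should return two empty lists for a consistent training set (the paths here are placeholders).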