Creating a YOLOv5 data set with labelme

Overview

This is a detailed walkthrough of using labelme to produce a data set for the YOLOv5 model. The main steps are:

  1. Installing labelme and annotating the images
  2. Converting the json annotation files to txt format with Python
  3. Extracting files of a specified format with Python

The example uses a batch of cattle and horse images, which YOLOv5 will later be trained on for cattle and horse detection and recognition.

1. Install labelme and annotate the images

(1) Open PyCharm and run the following in its terminal:

pip install labelme

(2) Once labelme is installed, type labelme in the terminal to launch the annotation tool.
(3) Click Open Dir and select the folder that contains the images to be labeled.
(4) Click Edit and choose an annotation method (I chose rectangle labeling, Create Rectangle).
(5) Frame the target object with the left mouse button. A dialog pops up asking for the label name; enter the label value and click OK. This experiment labels cattle and horses, so my label values are cattle and horse.
(6) After annotating an image, save it with Ctrl+S and press D to switch to the next image.
(7) After all the images are annotated, each image has a json file with its annotation results, saved in the same directory as the image.
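
Each json file written by labelme keeps its annotations in a shapes list, where every entry has a label and a list of points. As a minimal sketch for checking the annotations before converting them, the following counts how many shapes were drawn per label. The function name count_labels is a hypothetical helper and not part of labelme.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(folder):
    """Count how many shapes carry each label across all labelme json files in folder."""
    counts = Counter()
    for json_path in sorted(Path(folder).glob("*.json")):
        with open(json_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # labelme stores one dict per drawn shape under the "shapes" key
        for shape in data.get("shapes", []):
            counts[shape["label"]] += 1
    return counts
```

Calling count_labels on the annotated folder returns a Counter keyed by label value, so a misspelled label shows up as an unexpected key next to cattle and horse.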

2. Convert json files to txt format with Python

Convert the json files produced by labelme into the txt label format required by the YOLOv5 model:

import os
import json
from glob import glob

import cv2
import numpy as np
from sklearn.model_selection import train_test_split

# Label values; the index in this list is the YOLO class id (cattle -> 0, horse -> 1)
classes = ["cattle", "horse"]
# 1. Folder that holds the images and the labelme json files
labelme_path = r"C:/Users/xxxx/Desktop/images/dataset/cattle/"
# 2. Whether to create a test set
isUseTest = True
# 3. Collect the files to process (base names without the .json extension)
files = glob(os.path.join(labelme_path, "*.json"))
files = [os.path.splitext(os.path.basename(i))[0] for i in files]
print(files)
if isUseTest:
    trainval_files, test_files = train_test_split(files, test_size=0.1, random_state=55)
else:
    trainval_files, test_files = files, []
# Split the remaining files into training and validation sets
train_files, val_files = train_test_split(trainval_files, test_size=0.1, random_state=55)


def convert(size, box):
    """Convert a pixel box (xmin, xmax, ymin, ymax) to normalized (x_center, y_center, w, h)."""
    dw = 1. / size[0]
    dh = 1. / size[1]
    # labelme coordinates are 0-based, so no "- 1" offset is applied to the center
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def ChangeToYolo5(files, txt_Name):
    os.makedirs('tmp', exist_ok=True)
    # tmp/<txt_Name>.txt lists the image paths that belong to this split
    with open('tmp/%s.txt' % txt_Name, 'w') as list_file:
        for json_file_ in files:
            json_filename = os.path.join(labelme_path, json_file_ + ".json")
            imagePath = os.path.join(labelme_path, json_file_ + ".jpg")  # assumes .jpg images
            image = cv2.imread(imagePath)
            if image is None:
                print("skipped, cannot read image:", imagePath)
                continue
            height, width = image.shape[:2]
            list_file.write('%s\n' % os.path.abspath(imagePath))
            with open(json_filename, "r", encoding="utf-8") as f:
                json_file = json.load(f)
            # The YOLO label file is written next to the image and its json file
            with open(os.path.join(labelme_path, json_file_ + ".txt"), 'w') as out_file:
                for multi in json_file["shapes"]:
                    points = np.array(multi["points"])
                    # Bounding box of the shape, with negative coordinates clamped to 0
                    xmin = max(float(points[:, 0].min()), 0.0)
                    xmax = max(float(points[:, 0].max()), 0.0)
                    ymin = max(float(points[:, 1].min()), 0.0)
                    ymax = max(float(points[:, 1].max()), 0.0)
                    label = multi["label"]
                    if xmax <= xmin or ymax <= ymin:
                        continue  # skip degenerate boxes
                    if label not in classes:
                        print("skipped, unknown label:", json_filename, label)
                        continue
                    cls_id = classes.index(label)
                    bb = convert((width, height), (xmin, xmax, ymin, ymax))
                    out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
                    print(json_filename, xmin, ymin, xmax, ymax, cls_id)


ChangeToYolo5(train_files, "train")
ChangeToYolo5(val_files, "val")
if isUseTest:
    ChangeToYolo5(test_files, "test")
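
Each line of a YOLOv5 label file has the form class_id x_center y_center width height, with the four box values divided by the image width or height so that they fall between 0 and 1. The following is a small worked example of that arithmetic, separate from the script above. The names to_yolo and to_pixels and the image size are illustrative assumptions.

```python
def to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a pixel box to YOLO's normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def to_pixels(img_w, img_h, x_center, y_center, width, height):
    """Inverse of to_yolo: recover (xmin, ymin, xmax, ymax) in pixels."""
    xmin = (x_center - width / 2.0) * img_w
    ymin = (y_center - height / 2.0) * img_h
    xmax = (x_center + width / 2.0) * img_w
    ymax = (y_center + height / 2.0) * img_h
    return xmin, ymin, xmax, ymax


# A 200 x 240 pixel box inside a 640 x 480 image, labeled cattle (class id 0)
box = to_yolo(640, 480, 100, 120, 300, 360)
print("0 " + " ".join(str(v) for v in box))  # prints: 0 0.3125 0.5 0.3125 0.5
```

Converting the normalized values back with to_pixels is a quick way to confirm that a label line still describes the box that was drawn.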

  • Two places in the code need to be changed before running it. The first is the classes list, which holds the label values: cattle corresponds to class 0 and horse corresponds to class 1.
    The second is labelme_path, the path of the folder where the json files are stored.

  • Running results:
    The txt files converted from the json files are saved in the same folder as the json files.
    In addition, a tmp folder is generated in the current working directory. It contains train.txt, val.txt, and test.txt, which list the image paths of each split.
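
Before moving on, it can be worth checking that the generated txt files are well formed. The sketch below assumes the five-value line format described in this section. The function name check_label_file is a hypothetical helper and not part of YOLOv5 or labelme.

```python
def check_label_file(path, num_classes):
    """Return a list of problems found in one YOLO label file (an empty list means OK)."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            if len(parts) != 5:
                problems.append("line %d: expected 5 values, got %d" % (line_no, len(parts)))
                continue
            try:
                cls_id = int(parts[0])
                values = [float(v) for v in parts[1:]]
            except ValueError:
                problems.append("line %d: non-numeric value" % line_no)
                continue
            if not 0 <= cls_id < num_classes:
                problems.append("line %d: class id %d out of range" % (line_no, cls_id))
            if any(v < 0.0 or v > 1.0 for v in values):
                problems.append("line %d: box value outside [0, 1]" % line_no)
    return problems
```

With the two classes used here, check_label_file(path, 2) should return an empty list for every generated file. A box value above 1 usually means the box extends past the image border.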

3. Extract files of a specified format with Python

After the second step, the files are mixed together: the images, the json annotation files, and the generated txt files are all in one folder, so each type needs to be extracted to its own folder. Implementation code:

import os
import shutil

# Folder that holds the original files
source_folder = r"C:/Users/xxx/Desktop/data/images/"
# Folder the extracted files are saved to
destination_folder = r"C:/Users/xxx/Desktop/data/train/labels/"
# Create the output folder automatically
os.makedirs(destination_folder, exist_ok=True)

# Walk through all subfolders
for parent_folder, _, file_names in os.walk(source_folder):
    # Go through every file in the current subfolder
    for file_name in file_names:
        # To extract image files instead, use this condition:
        # if file_name.endswith(('jpg', 'jpeg', 'png', 'gif')):
        if file_name.endswith('.txt'):  # extract the txt files to the destination folder
            # Build the source and destination file paths
            source_path = os.path.join(parent_folder, file_name)
            destination_path = os.path.join(destination_folder, file_name)
            # Copy the file to the destination folder
            shutil.copy(source_path, destination_path)
  • Code explanation:
    Only three places in the code need to be changed: the source path where the original files are stored, the destination path where the extracted files are saved, and the extension of the files to extract. To extract the images, for example, I set the destination path to:

C:/Users/xxx/Desktop/data/train/images/
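
Because only the two paths and the extension change between runs, the same logic can be wrapped in a function and called once per file type. This is a sketch of that idea. The function name extract_files and its parameters are illustrative assumptions, and the destination folder is expected to sit outside the source folder.

```python
import os
import shutil


def extract_files(source_folder, destination_folder, extensions):
    """Copy every file under source_folder whose name ends with one of extensions."""
    os.makedirs(destination_folder, exist_ok=True)
    copied = []
    for parent_folder, _, file_names in os.walk(source_folder):
        for file_name in file_names:
            # str.endswith accepts a tuple of suffixes; compare in lower case
            if file_name.lower().endswith(tuple(extensions)):
                shutil.copy(os.path.join(parent_folder, file_name),
                            os.path.join(destination_folder, file_name))
                copied.append(file_name)
    return copied
```

For example, extract_files(source_folder, destination_folder, (".txt",)) copies the label files, and passing (".jpg", ".jpeg", ".png") with a different destination copies the images.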

  • Extract the txt files in the same way.
  • Then split the data into a training set, validation set, and test set at a ratio of 7:2:1. The resulting data set has the following format:
dataset
  |——test
  |    |——images
  |——train
  |    |——images
  |    |——labels
  |——val
       |——images
       |——labels
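
The source does not say how the 7:2:1 split was carried out. One possible way to do it is sketched below, using a seeded shuffle from the standard library. The function names split_names and build_dataset, the .jpg extension, and the seed are all assumptions made for illustration. Unlike the tree above, this sketch also creates a labels folder for the test set.

```python
import os
import random
import shutil


def split_names(names, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle names and split them into train/val/test lists by the given ratios."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_train = round(len(names) * ratios[0])
    n_val = round(len(names) * ratios[1])
    return {
        "train": names[:n_train],
        "val": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],  # whatever remains
    }


def build_dataset(images_dir, labels_dir, dataset_dir, image_ext=".jpg", seed=0):
    """Copy image/label pairs into dataset_dir/<split>/images and dataset_dir/<split>/labels."""
    stems = [os.path.splitext(f)[0] for f in os.listdir(images_dir) if f.endswith(image_ext)]
    splits = split_names(stems, seed=seed)
    for split, split_stems in splits.items():
        for sub in ("images", "labels"):
            os.makedirs(os.path.join(dataset_dir, split, sub), exist_ok=True)
        for stem in split_stems:
            shutil.copy(os.path.join(images_dir, stem + image_ext),
                        os.path.join(dataset_dir, split, "images"))
            label = os.path.join(labels_dir, stem + ".txt")
            if os.path.exists(label):  # images without annotations have no txt file
                shutil.copy(label, os.path.join(dataset_dir, split, "labels"))
    return splits
```

Copying each image together with its label keeps the pairs in the same split, which is easy to get wrong when the two file types are moved by hand.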

The images folders contain the pictures, and the labels folders contain the txt files converted from the annotation results. Note that each image in the images folder of train or val corresponds to a txt file with the same name in the labels folder.

At this point, the standard YOLOv5 data set is complete.
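
The correspondence between images and labels can be checked by comparing file names without their extensions. The following is a minimal sketch of such a check. The function name find_unpaired and the list of image extensions are assumptions for illustration.

```python
import os


def find_unpaired(split_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return (images without a label file, label files without an image) for one split."""
    images_dir = os.path.join(split_dir, "images")
    labels_dir = os.path.join(split_dir, "labels")
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(images_dir)
                   if f.lower().endswith(image_exts)}
    label_stems = {os.path.splitext(f)[0] for f in os.listdir(labels_dir)
                   if f.endswith(".txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Running find_unpaired on the train and val folders should return two empty lists when every image has its label file and no stray label files remain.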

See the next chapter for how to train and reproduce the YOLOv5 model.

Origin blog.csdn.net/weixin_45736855/article/details/129583272