YOLO V3 SPP ultralytics Section 1: Convert VOC annotation file (xml) to YOLO annotation format (txt) and how to customize YOLO data samples

Table of contents

1 Introduction

2. About PASCAL VOC dataset xml --> YOLO txt format

2.1 Path setting

2.2 Function to read xml file

2.3 xml ---> yolo txt

2.4 yolo's label file

2.6 Results

2.7 Code

3. Customize YOLO dataset

3.1 Preparatory work

3.2 open labelimg

3.3 Drawing


The code follows a tutorial by a Bilibili uploader: 3.2 YOLOv3 SPP source code analysis (PyTorch version)

Link to PASCAL VOC dataset: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/

 

The converted YOLO-format dataset is split into two parts, because a single archive is too large to upload:

Training set: PASCAL VOC object detection training set in YOLO format

Validation set: PASCAL VOC object detection validation set in YOLO format

1 Introduction

Label files for object detection differ from those for classification and segmentation. In classification, images of the same category are usually placed in the same directory, and the directory name gives the category. In segmentation, each training image is paired with a mask image, so both the training sample and its label are images.

An object detection label has two parts. One is the category of the object to detect, such as cat or dog. The other is the object's position, marked with a bounding box, usually a rectangle given by xmin, xmax, ymin, ymax.

Object detection labels are usually stored in XML files.

For example, the annotation below contains two objects, a horse and a person, and the four parameters under each category describe its bounding box.

YOLO, however, does not read this XML directly. It expects its own txt format, so the XML must be converted.

In the YOLO label below, 12 is the category index, and the next four values are the x, y, w, h of the bounding box.

A YOLO bounding box is given by its center coordinates x, y and its width and height w, h, all expressed relative to the size of the whole image.
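One way to check your understanding of this format is to invert it. The sketch below decodes a single YOLO label line back into pixel corners. The function name yolo_to_corners and the sample line are illustrative assumptions, not part of the original script.

```python
def yolo_to_corners(line, img_width, img_height):
    """Decode one YOLO label line into (class_index, xmin, ymin, xmax, ymax) in pixels."""
    fields = line.split()
    class_index = int(fields[0])
    xc, yc, w, h = (float(v) for v in fields[1:5])
    # Scale the relative values back to pixels
    xc, w = xc * img_width, w * img_width
    yc, h = yc * img_height, h * img_height
    # Center plus/minus half the size gives the corners
    return class_index, xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical label line for a 500x442 image
print(yolo_to_corners("12 0.524 0.573529 0.836 0.753394", 500, 442))
```

Because the stored values are rounded to 6 decimals, the recovered corners are only approximately equal to the original integer pixel coordinates.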

2. About PASCAL VOC dataset xml --> YOLO txt format

This chapter only covers data conversion.

At the start, my_yolo_dataset and my_data_label.names do not exist. They are generated by trans_voc2yolo.py, which converts the data in VOCdevkit.

2.1 Path setting

The VOC dataset is split into parts for different tasks; only the object detection part is used here.

  • Annotations holds the XML label files for object detection
  • train.txt and val.txt list the file names of the training and validation sets (file name only, without extension or absolute path)
  • JPEGImages holds all VOC images
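Since train.txt and val.txt store bare file names, each entry has to be joined with a folder and an extension before it can be opened. A minimal sketch of that mapping, assuming the standard VOCdevkit layout used in section 2.7 (the helper name voc_paths is hypothetical):

```python
import os


def voc_paths(voc_root, voc_version, stem):
    """Return (image_path, xml_path) for one entry of train.txt / val.txt."""
    base = os.path.join(voc_root, voc_version)
    return (os.path.join(base, "JPEGImages", stem + ".jpg"),
            os.path.join(base, "Annotations", stem + ".xml"))


img_path, xml_path = voc_paths("VOCdevkit", "VOC2012", "2008_000008")
print(img_path)
print(xml_path)
```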

2.2 Function to read xml file

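The function is parse_xml_to_dict, listed in full in section 2.7.

The code is implemented recursively: a leaf tag returns its text, and every other tag returns a dictionary of its children, with repeated object tags collected into a list. I don't fully understand the recursion myself; knowing how to call it is enough here.

Reading an xml file returns a dictionary like the following: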

{'annotation': {'folder': 'VOC2012', 'filename': '2008_000008.jpg', 'source': {'database': 'The VOC2008 Database', 'annotation': 'PASCAL VOC2008', 'image': 'flickr'}, 'size': {'width': '500', 'height': '442', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'horse', 'pose': 'Left', 'truncated': '0', 'occluded': '1', 'bndbox': {'xmin': '53', 'ymin': '87', 'xmax': '471', 'ymax': '420'}, 'difficult': '0'}, {'name': 'person', 'pose': 'Unspecified', 'truncated': '1', 'occluded': '0', 'bndbox': {'xmin': '158', 'ymin': '44', 'xmax': '289', 'ymax': '167'}, 'difficult': '0'}]}}
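To see the recursion at work on something smaller, here is a sketch that applies the same logic to an inline XML string. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the XML snippet is a made-up example, not a real VOC file.

```python
import xml.etree.ElementTree as ET


def parse_xml_to_dict(node):
    """Recursively turn an XML element into nested dictionaries."""
    if len(node) == 0:  # leaf element: return its text
        return {node.tag: node.text}
    result = {}
    for child in node:
        child_result = parse_xml_to_dict(child)
        if child.tag != "object":
            result[child.tag] = child_result[child.tag]
        else:
            # There may be several <object> tags, so collect them in a list
            result.setdefault("object", []).append(child_result["object"])
    return {node.tag: result}


xml_str = """<annotation>
  <filename>demo.jpg</filename>
  <size><width>500</width><height>442</height></size>
  <object><name>horse</name></object>
  <object><name>person</name></object>
</annotation>"""

data = parse_xml_to_dict(ET.fromstring(xml_str))["annotation"]
print(data["size"])                            # {'width': '500', 'height': '442'}
print([obj["name"] for obj in data["object"]])  # ['horse', 'person']
```

Note that every value comes back as a string, which is why the conversion code has to call int() or float() before doing arithmetic.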

2.3 xml ---> yolo txt

This part is the most important, so go through it step by step.

parse_xml_to_dict returns a dictionary whose first key is annotation, so take that value out into data first.

Then iterate over the bounding boxes stored under the object key.

Note that index is the position of the object in the list, starting from 0, and obj is the dictionary for that object.

Finally, convert each bounding box to center coordinates plus width and height, and then divide by the image size to get values relative to the whole image.
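The steps above can be sketched as a small function that works on the parsed dictionary alone, without any file I/O. The function name objects_to_yolo_lines is hypothetical, and the class id 13 for horse is an assumed 1-based id of the kind stored in the json label file.

```python
def objects_to_yolo_lines(data, class_dict):
    """Convert the 'annotation' dictionary of one image into YOLO label lines."""
    img_w = int(data["size"]["width"])
    img_h = int(data["size"]["height"])
    lines = []
    for obj in data["object"]:
        box = obj["bndbox"]
        xmin, xmax = float(box["xmin"]), float(box["xmax"])
        ymin, ymax = float(box["ymin"]), float(box["ymax"])
        if xmax <= xmin or ymax <= ymin:  # skip degenerate boxes
            continue
        class_index = class_dict[obj["name"]] - 1  # YOLO class ids start at 0
        # Center and size in pixels, then relative to the image size
        xcenter = (xmin + xmax) / 2 / img_w
        ycenter = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        values = [round(v, 6) for v in (xcenter, ycenter, w, h)]
        lines.append(" ".join(str(v) for v in [class_index] + values))
    return lines


data = {
    "size": {"width": "500", "height": "442"},
    "object": [{"name": "horse",
                "bndbox": {"xmin": "53", "ymin": "87", "xmax": "471", "ymax": "420"}}],
}
class_dict = {"horse": 13}  # assumed 1-based id
print(objects_to_yolo_lines(data, class_dict))  # ['12 0.524 0.573529 0.836 0.753394']
```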

2.4 yolo's label file

The implementation is the create_class_names function in section 2.7.

It is very simple: take the keys of the VOC class dictionary and write them out, one per line.
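A minimal sketch of the same idea, using a made-up three-class dictionary and a temporary directory instead of ./data. It relies on Python dictionaries keeping insertion order (Python 3.7+), and assumes the keys in the json file are already ordered by class id, so that line N of the names file corresponds to YOLO class index N.

```python
import os
import tempfile


def write_class_names(class_dict, names_path):
    """Write the class names to a .names file, one per line, no trailing newline."""
    with open(names_path, "w") as f:
        f.write("\n".join(class_dict.keys()))


class_dict = {"aeroplane": 1, "bicycle": 2, "bird": 3}  # hypothetical subset
names_path = os.path.join(tempfile.mkdtemp(), "my_data_label.names")
write_class_names(class_dict, names_path)

with open(names_path) as f:
    names = f.read().splitlines()
print(names)  # ['aeroplane', 'bicycle', 'bird']
```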

2.6 Results

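Running the script shows a progress bar for the train split and then for the val split.

The generated YOLO dataset directory has the following layout:

my_yolo_dataset
    train
        images
        labels
    val
        images
        labels

Each txt file under labels holds the YOLO label information, one object per line: class index, x, y, w, h.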
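After conversion it can be worth checking that every label line is well formed. The helper below is a hypothetical sanity check, not part of the original script: it accepts a line only if it has five fields, a valid class index, and four values in the range 0 to 1.

```python
def check_label_line(line, num_classes):
    """Return True if a YOLO label line has the form 'class x y w h' with sane values."""
    fields = line.split()
    if len(fields) != 5:
        return False
    try:
        class_index = int(fields[0])
        values = [float(v) for v in fields[1:]]
    except ValueError:
        return False
    # Class index must be in range and all coordinates relative to the image
    return 0 <= class_index < num_classes and all(0.0 <= v <= 1.0 for v in values)


print(check_label_line("12 0.524 0.573529 0.836 0.753394", 20))  # True
print(check_label_line("20 0.5 0.5 0.2 0.2", 20))                # False
```

To check a whole dataset, read each txt file under labels and apply this function to every line.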

2.7 Code

The converted code is as follows:

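"""
This script does two things:
1. Converts VOC annotations (.xml) to YOLO annotation format (.txt) and copies the image files to the corresponding folders
2. Generates the matching names label file (my_data_label.names) from the json label file
"""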

import os
from tqdm import tqdm
from lxml import etree
import json
import shutil


# Read an xml file and return its contents as a dictionary
def parse_xml_to_dict(xml):
    """
    Parse an xml tree into a dictionary, following TensorFlow's recursive_parse_xml_to_dict
    Args:
        xml: xml tree obtained by parsing XML file contents using lxml.etree

    Returns:
        Python dictionary holding XML contents.
    """

    if len(xml) == 0:  # reached a leaf: return the text of this tag
        return {xml.tag: xml.text}

    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into the child tags
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:  # there may be several objects, so store them in a list
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}


# Convert xml files to YOLO txt files
def translate_info(file_names: list, save_root: str, class_dict: dict, train_val='train'):
    """
    :param file_names: file names (without extension) of all training/validation images
    :param save_root:  root directory where the YOLO files are saved
    :param class_dict: class dictionary loaded from the VOC json label file
    :param train_val:  whether the training set or the validation set is being converted
    """

    save_txt_path = os.path.join(save_root, train_val, "labels")            # YOLO txt label files
    os.makedirs(save_txt_path, exist_ok=True)

    save_images_path = os.path.join(save_root, train_val, "images")         # YOLO training images
    os.makedirs(save_images_path, exist_ok=True)

    for file in tqdm(file_names, desc="translate {} file...".format(train_val)):
        # Check that the image file exists
        img_path = os.path.join(voc_images_path, file + ".jpg")
        assert os.path.exists(img_path), "file:{} not exist...".format(img_path)

        # Check that the xml file exists
        xml_path = os.path.join(voc_xml_path, file + ".xml")
        assert os.path.exists(xml_path), "file:{} not exist...".format(xml_path)

        # read xml
        with open(xml_path) as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        data = parse_xml_to_dict(xml)["annotation"]     # contents of the xml file
        img_height = int(data["size"]["height"])        # image height
        img_width = int(data["size"]["width"])          # image width

        # Check that the xml contains ground truth
        assert "object" in data.keys(), "file: '{}' lack of object key.".format(xml_path)
        if len(data["object"]) == 0:
            # If the xml has no objects, print a warning and skip this sample
            print("Warning: in '{}' xml, there are no objects.".format(xml_path))
            continue

        # Create the YOLO txt label file for this xml and write to it
        with open(os.path.join(save_txt_path, file + ".txt"), "w") as f:
            lines = []
            for index, obj in enumerate(data["object"]):    # index starts at 0, obj is the dict of one object
                # Read the box of each object
                xmin = float(obj["bndbox"]["xmin"])
                xmax = float(obj["bndbox"]["xmax"])
                ymin = float(obj["bndbox"]["ymin"])
                ymax = float(obj["bndbox"]["ymax"])
                class_name = obj["name"]        # category of the bounding box
                class_index = class_dict[class_name] - 1  # YOLO class ids start at 0

                # Some annotations have w or h equal to 0, which makes the regression loss nan
                if xmax <= xmin or ymax <= ymin:
                    print("Warning: in '{}' xml, object {} has bbox w/h <= 0".format(xml_path, index))
                    continue

                # Convert the box to YOLO format
                xcenter = xmin + (xmax - xmin) / 2      # center coordinates
                ycenter = ymin + (ymax - ymin) / 2
                w = xmax - xmin                         # box width and height
                h = ymax - ymin

                # Absolute to relative coordinates, rounded to 6 decimals
                xcenter = round(xcenter / img_width, 6)
                ycenter = round(ycenter / img_height, 6)
                w = round(w / img_width, 6)
                h = round(h / img_height, 6)

                info = [str(i) for i in [class_index, xcenter, ycenter, w, h]]
                lines.append(" ".join(info))

            # One object per line; joining avoids a leading blank line when the first box is skipped
            f.write("\n".join(lines))

        # Copy the image into the matching split
        path_copy_to = os.path.join(save_images_path, os.path.basename(img_path))
        if os.path.exists(path_copy_to) is False:
            shutil.copyfile(img_path, path_copy_to)


# Create the YOLO label (names) file
def create_class_names(class_dict: dict):
    keys = class_dict.keys()
    with open("./data/my_data_label.names", "w") as w:
        for index, k in enumerate(keys):
            if index + 1 == len(keys):  # no newline after the last name
                w.write(k)
            else:
                w.write(k + "\n")


def main():
    # Load the json label file of the original VOC data
    with open(label_json_path, 'r') as json_file:
        class_dict = json.load(json_file)

    # Read every line of the VOC training split file train.txt, dropping empty lines
    with open(train_txt_path, "r") as r:
        train_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]

    # Convert VOC to YOLO and copy the image files to the corresponding folder
    translate_info(train_file_names, save_file_root, class_dict, "train")

    # Read every line of the VOC validation split file val.txt, dropping empty lines
    with open(val_txt_path, "r") as r:
        val_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]
    # Convert VOC to YOLO and copy the image files to the corresponding folder
    translate_info(val_file_names, save_file_root, class_dict, "val")

    # Create the my_data_label.names file
    create_class_names(class_dict)


if __name__ == "__main__":
    # Root directory and version of the VOC dataset
    voc_root = "VOCdevkit"
    voc_version = "VOC2012"

    # txt files listing the training and validation sets to convert
    train_txt = "train.txt"
    val_txt = "val.txt"

    # Output directory for the converted files, in YOLO format
    save_file_root = "./my_yolo_dataset"
    if os.path.exists(save_file_root) is False:
        os.makedirs(save_file_root)

    # json file mapping label names to ids
    label_json_path = './data/pascal_voc_classes.json'

    voc_images_path = os.path.join(voc_root, voc_version, "JPEGImages")                         # VOC images
    voc_xml_path = os.path.join(voc_root, voc_version, "Annotations")                           # VOC xml label files
    train_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", train_txt)        # VOC training split file
    val_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", val_txt)            # VOC validation split file

    # Check that all files and folders exist
    assert os.path.exists(voc_images_path), "VOC images path not exist..."
    assert os.path.exists(voc_xml_path), "VOC xml path not exist..."
    assert os.path.exists(train_txt_path), "VOC train txt file not exist..."
    assert os.path.exists(val_txt_path), "VOC val txt file not exist..."
    assert os.path.exists(label_json_path), "label_json_path does not exist..."

    # Start the conversion
    main()

3. Customize YOLO dataset

labelimg is used for annotation here. Install it as follows:

pip install labelimg

Then type labelimg in a terminal to launch it. The interface looks like this:

3.1 Preparatory work

Create a new demo folder containing these three items:

  • annotation is the folder where the YOLO bounding box files are saved
  • img is the folder of images to annotate
  • labels.txt is the label file

The label file stores the class names, one per line.
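The folder can also be prepared from Python. This is a minimal sketch: the helper name make_demo_folder and the class names cat and dog are placeholders, and a temporary directory stands in for wherever you keep the demo folder.

```python
import os
import tempfile


def make_demo_folder(root, class_names):
    """Create the annotation/ and img/ folders and a labels.txt with one class per line."""
    os.makedirs(os.path.join(root, "annotation"), exist_ok=True)
    os.makedirs(os.path.join(root, "img"), exist_ok=True)
    with open(os.path.join(root, "labels.txt"), "w") as f:
        f.write("\n".join(class_names))


demo_root = os.path.join(tempfile.mkdtemp(), "demo")
make_demo_folder(demo_root, ["cat", "dog"])  # placeholder class names
print(sorted(os.listdir(demo_root)))  # ['annotation', 'img', 'labels.txt']
```

Copy the images you want to annotate into img before starting labelimg.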

3.2 open labelimg

Open a terminal in the demo folder and launch labelimg with two arguments: the first is the image folder and the second is the path to the label file. With the folder layout above, that is:

labelimg img labels.txt

3.3 Drawing

After opening, the window looks like this. First, change the save format to YOLO. Then, under save dir, select the annotation folder.

The file list on the right shows the contents of img, which holds two images here.

When drawing a box, just select its category.

The end result looks like this:


Origin blog.csdn.net/qq_44886601/article/details/130767420