The code follows a tutorial by a Bilibili uploader: 3.2 YOLOv3 SPP source code analysis (PyTorch version)
Link to PASCAL VOC dataset: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
The converted YOLO-format dataset is split into two parts, because a single archive is too large to upload
Training set: PASCAL VOC object detection training set in YOLO format
Validation set: PASCAL VOC object detection validation set in YOLO format
1. Introduction
Label files for object detection differ from those for classification and segmentation. In a classification task, images of the same category are usually placed in the same directory, and the directory name serves as the category name. In a segmentation task, each training image is paired with a label image (a mask), so both the input and the label are images.
An object detection label has two parts. One is the category of the object to be detected, such as cat or dog. The other is the position of the object, marked with a bounding box, usually a rectangle given by xmin, xmax, ymin, ymax.
Object detection labels are usually stored as xml files
For example, the annotation below contains two objects, a horse and a person, and the four values under each object describe its bounding box
However, YOLO cannot read this xml directly, so the xml annotations have to be converted to the YOLO txt format
In the YOLO label below, 12 is the class index of the detected object, and the next four values are the x, y, w, h of the bounding box
A YOLO bounding box is given by its center coordinates x, y and its width w and height h, all relative to the size of the whole image
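The relation between the two formats can be sketched as follows, assuming W and H denote the image width and height in pixels (these two symbols are introduced here for illustration and do not appear in the original text):

```latex
\begin{aligned}
x &= \frac{x_{min} + x_{max}}{2W}, & y &= \frac{y_{min} + y_{max}}{2H}, \\
w &= \frac{x_{max} - x_{min}}{W}, & h &= \frac{y_{max} - y_{min}}{H}
\end{aligned}
```

All four values are therefore fractions of the image size and do not depend on the image resolution.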
2. About PASCAL VOC dataset xml --> YOLO txt format
This chapter only covers the data conversion
At the start, my_yolo_dataset and my_data_label.names do not exist; running trans_voc2yolo.py on the VOCdevkit data generates them
2.1 Path setting
The VOC dataset is organized into separate folders for different tasks; only the parts needed for object detection are used here
- Annotations holds the xml annotation files for object detection
- train.txt and val.txt list the file names of the training set and the validation set (names only, with no extension and no path)
- JPEGImages holds all VOC images
2.2 Function to read xml file
The function is as follows:
It is implemented recursively. I do not fully understand it; knowing how to use it is enough.
Below is the dictionary returned after reading one xml file
{'annotation': {'folder': 'VOC2012', 'filename': '2008_000008.jpg', 'source': {'database': 'The VOC2008 Database', 'annotation': 'PASCAL VOC2008', 'image': 'flickr'}, 'size': {'width': '500', 'height': '442', 'depth': '3'}, 'segmented': '0', 'object': [{'name': 'horse', 'pose': 'Left', 'truncated': '0', 'occluded': '1', 'bndbox': {'xmin': '53', 'ymin': '87', 'xmax': '471', 'ymax': '420'}, 'difficult': '0'}, {'name': 'person', 'pose': 'Unspecified', 'truncated': '1', 'occluded': '0', 'bndbox': {'xmin': '158', 'ymin': '44', 'xmax': '289', 'ymax': '167'}, 'difficult': '0'}]}}
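To make the recursion easier to follow, here is a minimal, self-contained sketch of the same idea. It is not the script's exact setup: it assumes the standard-library xml.etree.ElementTree in place of lxml, and SAMPLE_XML is a trimmed, hand-written annotation whose values are taken from the dictionary above.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


def parse_xml_to_dict(xml):
    """Recursively turn an XML element into nested dicts; 'object' tags become a list."""
    if len(xml) == 0:  # leaf element with no children: return its text
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into the child element
        if child.tag != "object":
            result[child.tag] = child_result[child.tag]
        else:
            # an image may contain several objects, so collect them in a list
            result.setdefault("object", []).append(child_result["object"])
    return {xml.tag: result}


# Hand-written sample (assumption), trimmed to the fields used by the conversion
SAMPLE_XML = """
<annotation>
  <filename>2008_000008.jpg</filename>
  <size><width>500</width><height>442</height><depth>3</depth></size>
  <object>
    <name>horse</name>
    <bndbox><xmin>53</xmin><ymin>87</ymin><xmax>471</xmax><ymax>420</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>158</xmin><ymin>44</ymin><xmax>289</xmax><ymax>167</ymax></bndbox>
  </object>
</annotation>
"""

data = parse_xml_to_dict(ET.fromstring(SAMPLE_XML.strip()))["annotation"]
object_names = [obj["name"] for obj in data["object"]]
print(data["size"])
print(object_names)
```

Every value comes back as a string, which is why the conversion code later wraps the sizes and coordinates in int() and float().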
2.3 xml ---> yolo txt
This part matters more, so go through it step by step
Pay attention to the part in the box: parse_xml_to_dict returns a dictionary whose first key is annotation, so take that value out of the result first
Then iterate over the bounding boxes stored under the object key
Note that index is the loop index, starting from 0. Below are the values of index and obj in the first iteration
Finally, convert each bounding box to center coordinates plus width and height, then divide by the image size to get values relative to the whole image.
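The conversion step can be tried in isolation with the short sketch below. The helper name voc_box_to_yolo is hypothetical (the script does this inline), and the input numbers are the horse box and image size from the dictionary in section 2.2.

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert a pixel-space corner box to YOLO's relative (xcenter, ycenter, w, h)."""
    w = xmax - xmin            # box width in pixels
    h = ymax - ymin            # box height in pixels
    xcenter = xmin + w / 2     # box center in pixels
    ycenter = ymin + h / 2
    # divide by the image size to get relative values, rounded to 6 decimals
    return (
        round(xcenter / img_width, 6),
        round(ycenter / img_height, 6),
        round(w / img_width, 6),
        round(h / img_height, 6),
    )


# horse box from the example annotation: image is 500 x 442 pixels
horse = voc_box_to_yolo(53.0, 87.0, 471.0, 420.0, 500, 442)
print(" ".join(str(v) for v in horse))
```

For this box the four values come out at roughly 0.524, 0.5735, 0.836 and 0.7534, all inside the range 0 to 1 as expected for relative coordinates.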
2.4 yolo's label file
The implementation code is as follows:
This part is simple: take the keys (class names) of the VOC class dictionary and write them out, one per line
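As a rough sketch of this step, the snippet below builds the text of a names file from a class dictionary. The three-entry dictionary is a hypothetical excerpt used only for illustration, and build_names_text is a made-up helper name; the 1-based values mirror the way the script later subtracts 1 to get YOLO class indices.

```python
# Hypothetical excerpt (assumption); the script loads ./data/pascal_voc_classes.json
class_dict = {"aeroplane": 1, "bicycle": 2, "bird": 3}


def build_names_text(class_dict):
    """Return the content of a .names file: one class name per line, no trailing newline."""
    # sort by index so that line number i holds the class with YOLO index i
    ordered = sorted(class_dict, key=class_dict.get)
    return "\n".join(ordered)


names_text = build_names_text(class_dict)
# YOLO class indices start from 0, while the json values start from 1
yolo_index = {name: idx - 1 for name, idx in class_dict.items()}
print(names_text)
print(yolo_index)
```

Sorting by index is a design choice of this sketch; the script itself relies on the key order of the json file, which gives the same result when the file is already ordered.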
2.6 Results
The conversion runs as follows
The generated yolo dataset directory is as follows:
yolo's label information:
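One way to sanity-check a generated label is to map it back to pixel coordinates and compare with the xml. The sketch below does that for a single line; the function name yolo_line_to_voc_box is hypothetical, and the sample line is an assumption built from the horse box of section 2.2 with class index 12, as in the introduction.

```python
def yolo_line_to_voc_box(line, img_width, img_height):
    """Parse 'class xcenter ycenter w h' and return (class_index, (xmin, ymin, xmax, ymax))."""
    class_index, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    # undo the normalization, then move from center/size back to corners
    xmin = (xc - w / 2) * img_width
    xmax = (xc + w / 2) * img_width
    ymin = (yc - h / 2) * img_height
    ymax = (yc + h / 2) * img_height
    # labels are rounded to 6 decimals, so round back to whole pixels
    box = tuple(round(v) for v in (xmin, ymin, xmax, ymax))
    return int(class_index), box


sample_line = "12 0.524 0.573529 0.836 0.753394"  # assumed label line for a 500 x 442 image
restored = yolo_line_to_voc_box(sample_line, 500, 442)
print(restored)
```

If the restored corners match the bndbox values in the xml, the label was written correctly.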
2.7 Code
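The conversion code is as follows:
"""
This script does two things:
1. Converts the VOC annotations (.xml) to YOLO label files (.txt) and copies the image files to the matching folders
2. Generates the names file (my_data_label.names) from the json label file
"""
import os
from tqdm import tqdm
from lxml import etree
import json
import shutil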
# Read an xml file and return its contents as a dictionary
def parse_xml_to_dict(xml):
    """
    Parse an xml tree into a dictionary, following tensorflow's recursive_parse_xml_to_dict
    Args:
        xml: xml tree obtained by parsing XML file contents using lxml.etree
    Returns:
        Python dictionary holding XML contents.
    """
    if len(xml) == 0:  # reached a leaf: return the text stored under this tag
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml_to_dict(child)  # recurse into the child tag
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:  # there may be several objects, so store them in a list
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}
# Convert xml files to YOLO txt files
def translate_info(file_names: list, save_root: str, class_dict: dict, train_val='train'):
    """
    :param file_names: file names (without extension) of all training/validation images
    :param save_root: root directory in which the YOLO files are saved
    :param class_dict: class dictionary loaded from the VOC json label file
    :param train_val: whether the training set or the validation set is being converted
    """
    # voc_images_path and voc_xml_path are module-level variables set under __main__
    save_txt_path = os.path.join(save_root, train_val, "labels")  # YOLO txt label files
    os.makedirs(save_txt_path, exist_ok=True)
    save_images_path = os.path.join(save_root, train_val, "images")  # YOLO training images
    os.makedirs(save_images_path, exist_ok=True)

    for file in tqdm(file_names, desc="translate {} file...".format(train_val)):
        # check that the image file exists
        img_path = os.path.join(voc_images_path, file + ".jpg")
        assert os.path.exists(img_path), "file:{} not exist...".format(img_path)

        # check that the xml file exists
        xml_path = os.path.join(voc_xml_path, file + ".xml")
        assert os.path.exists(xml_path), "file:{} not exist...".format(xml_path)

        # read xml
        with open(xml_path) as fid:
            xml_str = fid.read()
        xml = etree.fromstring(xml_str)
        data = parse_xml_to_dict(xml)["annotation"]  # contents of the xml file
        img_height = int(data["size"]["height"])  # image height
        img_width = int(data["size"]["width"])  # image width

        # check that the xml contains ground truth
        assert "object" in data.keys(), "file: '{}' lack of object key.".format(xml_path)
        if len(data["object"]) == 0:
            # no objects in this xml: report the file and skip the sample
            print("Warning: in '{}' xml, there are no objects.".format(xml_path))
            continue

        lines = []  # one YOLO label line per valid object
        for obj in data["object"]:  # obj is the dictionary of one object
            # bounding box of the object
            xmin = float(obj["bndbox"]["xmin"])
            xmax = float(obj["bndbox"]["xmax"])
            ymin = float(obj["bndbox"]["ymin"])
            ymax = float(obj["bndbox"]["ymax"])
            class_name = obj["name"]  # category of the bounding box
            class_index = class_dict[class_name] - 1  # class ids start from 0

            # some annotations have w or h equal to 0, which makes the regression loss nan
            if xmax <= xmin or ymax <= ymin:
                print("Warning: in '{}' xml, there are some bbox w/h <=0".format(xml_path))
                continue

            # convert the box to YOLO format
            xcenter = xmin + (xmax - xmin) / 2  # center coordinates
            ycenter = ymin + (ymax - ymin) / 2
            w = xmax - xmin  # width and height of the box
            h = ymax - ymin

            # absolute to relative coordinates, rounded to 6 decimal places
            xcenter = round(xcenter / img_width, 6)
            ycenter = round(ycenter / img_height, 6)
            w = round(w / img_width, 6)
            h = round(h / img_height, 6)

            lines.append(" ".join(str(i) for i in [class_index, xcenter, ycenter, w, h]))

        # write the YOLO txt label file; joining avoids a leading blank line
        # when the first object of an image is skipped
        with open(os.path.join(save_txt_path, file + ".txt"), "w") as f:
            f.write("\n".join(lines))

        # copy the image to the matching set
        path_copy_to = os.path.join(save_images_path, os.path.basename(img_path))
        if os.path.exists(path_copy_to) is False:
            shutil.copyfile(img_path, path_copy_to)
# Create the YOLO names file
def create_class_names(class_dict: dict):
    keys = class_dict.keys()
    with open("./data/my_data_label.names", "w") as w:
        for index, k in enumerate(keys):
            if index + 1 == len(keys):  # no newline after the last class name
                w.write(k)
            else:
                w.write(k + "\n")
def main():
    # read the json label file of the original VOC data
    with open(label_json_path, 'r') as json_file:
        class_dict = json.load(json_file)

    # read all lines of the VOC training list train.txt and drop empty lines
    with open(train_txt_path, "r") as r:
        train_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]
    # convert VOC annotations to YOLO and copy the images to the matching folder
    translate_info(train_file_names, save_file_root, class_dict, "train")

    # read all lines of the VOC validation list val.txt and drop empty lines
    with open(val_txt_path, "r") as r:
        val_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]
    # convert VOC annotations to YOLO and copy the images to the matching folder
    translate_info(val_file_names, save_file_root, class_dict, "val")

    # create the my_data_label.names file
    create_class_names(class_dict)
if __name__ == "__main__":
    # root directory and version of the VOC dataset
    voc_root = "VOCdevkit"
    voc_version = "VOC2012"

    # txt files listing the training set and the validation set
    train_txt = "train.txt"
    val_txt = "val.txt"

    # output directory for the converted files in YOLO format
    save_file_root = "./my_yolo_dataset"
    if os.path.exists(save_file_root) is False:
        os.makedirs(save_file_root)

    # json file that maps class names to indices
    label_json_path = './data/pascal_voc_classes.json'

    voc_images_path = os.path.join(voc_root, voc_version, "JPEGImages")  # VOC images
    voc_xml_path = os.path.join(voc_root, voc_version, "Annotations")  # VOC xml annotation files
    train_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", train_txt)  # training list
    val_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", val_txt)  # validation list

    # check that all files and folders exist
    assert os.path.exists(voc_images_path), "VOC images path not exist..."
    assert os.path.exists(voc_xml_path), "VOC xml path not exist..."
    assert os.path.exists(train_txt_path), "VOC train txt file not exist..."
    assert os.path.exists(val_txt_path), "VOC val txt file not exist..."
    assert os.path.exists(label_json_path), "label_json_path does not exist..."

    # start the conversion
    main()
3. Customize YOLO dataset
labelimg is used here for annotation; install it as follows
pip install labelimg
Type labelimg in the terminal to launch it; the interface is as follows:
3.1 Preparatory work
Create a new demo folder and put the following three items in it
- annotation is the folder where the YOLO bounding box files are saved
- img is the folder of images
- labels.txt is the class label file
The labels are stored as follows:
3.2 Open labelimg
Open a terminal in the demo folder and launch labelimg with two arguments: the first is the image folder and the second is the path of the labels file, for example: labelimg img labels.txt
3.3 Drawing
After opening, the window looks like this. First change the save format to YOLO, then choose the annotation folder under Change Save Dir
The file list on the right shows the contents of the img folder, which holds two images here
When drawing a box, just select the category it belongs to
The end result looks like this