Converting object detection annotation files from YOLOv5 (txt) format to COCO (json) format: detailed explanation and code implementation

Reference:https://blog.csdn.net/qq_39686950/article/details/119153685

Preface
I was working on an object detection task, and switching models meant I needed annotation files in a different format. I searched for blog posts on the topic for a long time and found that most contain only code or incomplete explanations, which is not friendly to beginners. It took me a long time of debugging before the conversion worked, so I wrote this post to explain the process as completely as I can and help others avoid the same detours.
1. YOLOv5 format (txt)
The YOLOv5 annotation format is relatively simple.
Each image has a corresponding .txt file, and each line of that file describes one bounding box, so the file has as many lines as the image has boxes. Each line has five columns: the class index, the relative horizontal center x_center of the box, the relative vertical center y_center of the box, the relative width w of the box, and the relative height h of the box. Note that these four values are normalized: x_center and w are the pixel values divided by the image width, and y_center and h are the pixel values divided by the image height.
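
Reading one label line and scaling it back to pixels can be sketched as follows. This is only a minimal illustration: the function name yolo_line_to_pixels, the sample label line, and the 640x480 image size are assumptions made for the example.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Parse one YOLO label line and return (class_id, x_center, y_center, w, h) in pixels."""
    class_id, x_c, y_c, w, h = line.split()
    # x values are relative to the image width, y values to the image height
    return (int(class_id),
            float(x_c) * img_w,
            float(y_c) * img_h,
            float(w) * img_w,
            float(h) * img_h)


# Hypothetical label line for a 640x480 image
print(yolo_line_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480))
# (0, 320.0, 240.0, 160.0, 240.0)
```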
2. COCO format (json)
The COCO format described in this article is the object instances format of the standard COCO dataset. A COCO box is stored as (xmin, ymin, w, h), where (xmin, ymin) is the top-left corner of the box, and all four values are absolute pixel values. The top-level structure of a COCO file is as follows:

{
    "info": info,                   # information about the dataset, a dictionary
    "licenses": [license],          # a list of dictionaries
    "images": [image],              # image information: a list of dictionaries, one per image
    "annotations": [annotation],    # bounding box information: a list of dictionaries, one per bounding box
    "categories": [category]        # category information: a list of dictionaries, one per category
}

Unlike YOLOv5, which uses one .txt file per image, COCO stores the annotations of all images in a single .json file. That file holds the dictionary shown above, which has five keys. The value of each key is described below:

info{
    "year": int,               # year
    "version": str,            # dataset version
    "description": str,        # dataset description
    "contributor": str,        # who provided the dataset
    "url": str,                # download URL of the dataset
    "date_created": datetime,  # creation date of the dataset
}
license{
    "id": int,
    "name": str,
    "url": str,
}
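image{
    "id": int,                    # image identifier, unique for each image
    "width": int,                 # image width
    "height": int,                # image height
    "file_name": str,             # image file name; note this is only the name, not the path
    "license": int,
    "flickr_url": str,            # Flickr URL of the image
    "coco_url": str,              # URL of the image
    "date_captured": datetime,    # date the image was captured
}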
annotation{
    "id": int,                                # bounding box identifier, unique for each bounding box
    "image_id": int,                          # image identifier, matches "id" in image
    "category_id": int,                       # category id
    "segmentation": RLE or [polygon],         # segmentation info: polygon format when iscrowd=0, RLE format when iscrowd=1
    "area": float,                            # area of the annotated region
    "bbox": [x,y,width,height],               # bounding box coordinates, as described above
    "iscrowd": 0 or 1,                        # 0 for a single object, 1 for a group (crowd) of objects
}
category{
    "id": int,                                # category id; note that it starts from 1, not 0
    "name": str,                              # category name
    "supercategory": str,                     # the supercategory this category belongs to
}

That completes the walkthrough of the COCO annotation format.
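
To make the structure concrete, here is a minimal sketch of a COCO dictionary with one image and one box. Everything in it is hypothetical: the file name, the category name, the 640x480 image size, and the helper yolo_to_coco_bbox, which applies the center-to-corner conversion described in this section. Leaving "segmentation" empty is an assumption that suits detection-only use.

```python
import json


def yolo_to_coco_bbox(x_c, y_c, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center, size) to a COCO box [xmin, ymin, width, height] in pixels."""
    return [(x_c - w / 2) * img_w, (y_c - h / 2) * img_h, w * img_w, h * img_h]


bbox = yolo_to_coco_bbox(0.5, 0.5, 0.25, 0.5, 640, 480)

# Hypothetical single-image dataset, for illustration only
coco = {
    "info": {"description": "toy example", "version": "1.0", "year": 2022},
    "licenses": [{"id": 1, "name": None, "url": None}],
    "images": [{"id": 0, "file_name": "00001.jpg", "width": 640, "height": 480}],
    "annotations": [{
        "id": 0,
        "image_id": 0,                 # must match an "id" in "images"
        "category_id": 1,              # must match an "id" in "categories"
        "bbox": bbox,                  # [xmin, ymin, width, height] in pixels
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
        "segmentation": [],            # left empty in this detection-only sketch
    }],
    "categories": [{"id": 1, "name": "class_a", "supercategory": "None"}],
}

text = json.dumps(coco, indent=2)
print(bbox)                              # [240.0, 120.0, 160.0, 240.0]
print(sorted(json.loads(text).keys()))   # ['annotations', 'categories', 'images', 'info', 'licenses']
```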
3. Conversion code (YOLOv5 to COCO)
The code in this part is the key point, and I will do my best to explain every place that needs to be changed.
As the previous two sections show, the YOLOv5 format only carries the image name, the class, and the bounding box coordinates, while the COCO format carries much more. Open source code generally only uses the information that the YOLOv5 format also provides, so there is no need to worry much about the extra COCO fields.
The full code with comments is as follows:

import os
import json
from PIL import Image

coco_format_save_path='D:\\yolov5\\CCTSDB-2021\\train\\'   # folder where the generated COCO-format label file is saved
yolo_format_classes_path='D:\\yolov5\\CCTSDB-2021\\names.txt'     # class file, one class name per line
yolo_format_annotation_path='D:\\yolov5\\CCTSDB-2021\\labels\\train\\'  # folder containing the YOLO-format labels
img_pathDir='D:\\yolov5\\CCTSDB-2021\\images\\train\\'    # folder containing the images

with open(yolo_format_classes_path,'r') as fr:                               # open and read the class file
    lines1=fr.readlines()
# print(lines1)
categories=[]                                                                 # list that stores the categories
for j,label in enumerate(lines1):
    label=label.strip()
    categories.append({'id':j+1,'name':label,'supercategory':'None'})         # add the category; ids start at 1
# print(categories)

write_json_context=dict()                                                      # top-level dictionary written to the .json file
write_json_context['info']={'description': '', 'url': '', 'version': '', 'year': 2022, 'contributor': '纯粹ss', 'date_created': '2022-07-8'}
write_json_context['licenses']=[{'id':1,'name':None,'url':None}]
write_json_context['categories']=categories
write_json_context['images']=[]
write_json_context['annotations']=[]

# The code below fills in the 'images' and 'annotations' lists
imageFileList=os.listdir(img_pathDir)                                           # names of all files in the image folder
for i,imageFile in enumerate(imageFileList):
    imagePath = os.path.join(img_pathDir,imageFile)                             # full path of the image
    image = Image.open(imagePath)                                               # open the image to read its width and height
    W, H = image.size

    img_context={}                                                              # dictionary holding this image's information
    img_context['file_name']=imageFile                                          # file name only, not the path
    img_context['height']=H
    img_context['width']=W
    img_context['date_captured']='2022-07-8'
    img_context['id']=i                                                         # id of this image
    img_context['license']=1
    img_context['coco_url']=''
    img_context['flickr_url']=''
    write_json_context['images'].append(img_context)                            # add this image to the 'images' list


    txtFile=os.path.splitext(imageFile)[0]+'.txt'                              # label file with the same base name as the image; works for any name length, unlike a fixed slice such as imageFile[:5]
    txtPath=os.path.join(yolo_format_annotation_path,txtFile)
    if not os.path.exists(txtPath):                                            # image without a label file: no boxes to add
        continue
    with open(txtPath,'r') as fr:
        lines=fr.readlines()                                                   # lines is a list holding all annotations of this image
    for j,line in enumerate(lines):
        if not line.strip():                                                   # skip blank lines
            continue
        bbox_dict = {}                                                         # dictionary holding one bounding box

        class_id,x,y,w,h=line.strip().split()                                  # the five fields of one bounding box
        class_id,x, y, w, h = int(class_id), float(x), float(y), float(w), float(h)       # convert the strings to int and float

        xmin=(x-w/2)*W                                                         # normalized center format to absolute corner coordinates
        ymin=(y-h/2)*H
        xmax=(x+w/2)*W
        ymax=(y+h/2)*H
        w=w*W
        h=h*H

        bbox_dict['id']=i*10000+j                                              # bounding box id; unique as long as no image has 10000 or more boxes
        bbox_dict['image_id']=i
        bbox_dict['category_id']=class_id+1                                    # YOLO class indices start at 0, category ids start at 1
        bbox_dict['iscrowd']=0
        height,width=abs(ymax-ymin),abs(xmax-xmin)
        bbox_dict['area']=height*width
        bbox_dict['bbox']=[xmin,ymin,w,h]
        bbox_dict['segmentation']=[[xmin,ymin,xmax,ymin,xmax,ymax,xmin,ymax]]
        write_json_context['annotations'].append(bbox_dict)                    # add this bounding box to the 'annotations' list

name = os.path.join(coco_format_save_path,"train"+ '.json')
with open(name,'w') as fw:                                                                # write the dictionary to the .json file
    json.dump(write_json_context,fw,indent=2)
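
After the file is written, it can be worth checking that the ids line up and that the boxes stay inside their images. The following is only a sketch of such a check, not part of the original script: the function check_coco and the toy data are hypothetical, and the small tolerance for floating-point rounding is an assumption.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dictionary."""
    problems = []
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    seen_ann_ids = set()
    eps = 1e-6  # tolerance for floating-point rounding
    for ann in coco["annotations"]:
        if ann["id"] in seen_ann_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_ann_ids.add(ann["id"])
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        xmin, ymin, w, h = ann["bbox"]
        if (xmin < -eps or ymin < -eps
                or xmin + w > img["width"] + eps
                or ymin + h > img["height"] + eps):
            problems.append(f"annotation {ann['id']}: bbox outside image")
    return problems


# Hypothetical toy data: the second box refers to an image id that does not exist
toy = {
    "images": [{"id": 0, "file_name": "00001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "class_a", "supercategory": "None"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [240.0, 120.0, 160.0, 240.0]},
        {"id": 1, "image_id": 7, "category_id": 1, "bbox": [0.0, 0.0, 10.0, 10.0]},
    ],
}
print(check_coco(toy))
# ['annotation 1: unknown image_id 7']
```

To check the generated file, load it with json.load and pass the resulting dictionary to check_coco; an empty list means none of these problems were found.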

Key points
Parts of the code that may raise questions:

coco_format_save_path='D:\\yolov5\\CCTSDB-2021\\train\\'   # folder where the generated COCO-format label file is saved
yolo_format_classes_path='D:\\yolov5\\CCTSDB-2021\\names.txt'     # class file, one class name per line
yolo_format_annotation_path='D:\\yolov5\\CCTSDB-2021\\labels\\train\\'  # folder containing the YOLO-format labels
img_pathDir='D:\\yolov5\\CCTSDB-2021\\images\\train\\'    # folder containing the images

1 #coco_format_save_path This is a path you choose yourself. I created two new folders, train and val, under the directory that holds the YOLO images and labels, to store the newly generated COCO (json) files, because my images and labels folders each contain two subfolders holding the training set and the validation set. For the first run, coco_format_save_path, yolo_format_annotation_path, and img_pathDir all end in train, so the generated training-set json file is saved in the new train folder. Before the second run, change the end of these paths to val so that the generated validation-set json file is saved in the new val folder. Note that the output file name at the end of the script is hard-coded as train.json, so change it as well if you want the validation file to be named val.json.
2 #yolo_format_classes_path This points to a txt file of class names. You need to create it yourself, because the dataset does not come with one. Write one class name per line, in the order of the class indices used in the label files. If you do not know the order, look at the images and compare them with the class indices in the corresponding labels.
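
The two points above can be sketched in code. This is a hypothetical helper layout, not the original script: the function names paths_for_split and categories_from_names, the root folder name dataset_root, and the class names class_a and class_b are all assumptions for illustration. It assumes the images/<split> and labels/<split> folder layout described in point 1.

```python
import os


def paths_for_split(root, split):
    """Build the four paths used by the conversion script for one split ('train' or 'val')."""
    return {
        "coco_format_save_path": os.path.join(root, split),
        "yolo_format_classes_path": os.path.join(root, "names.txt"),
        "yolo_format_annotation_path": os.path.join(root, "labels", split),
        "img_pathDir": os.path.join(root, "images", split),
    }


def categories_from_names(lines):
    """Turn class names (one per line, in class-index order) into COCO categories with ids starting at 1."""
    names = [line.strip() for line in lines if line.strip()]
    return [{"id": idx + 1, "name": name, "supercategory": "None"}
            for idx, name in enumerate(names)]


# One pass per split instead of editing the paths by hand between runs
for split in ("train", "val"):
    print(paths_for_split("dataset_root", split)["img_pathDir"])

# Hypothetical names.txt content: line 1 is class index 0, line 2 is class index 1
print(categories_from_names(["class_a\n", "class_b\n"]))
```

The class file is shared by both splits, which is why only the other three paths change between the train run and the val run.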

txtFile=imageFile[:5]+'.txt'   # from the annotation part of the script above

This line builds the label file name txtFile from the image name. My image names are numbers, and my label files are named with the same numbers. When I printed imageFile on its own, I got 15454.jpg. Since this line links each image to its label, the .jpg extension has to be removed and replaced with .txt. The slice imageFile[:5] therefore has to be adjusted to your own data: it must keep the image name without the extension. In my dataset, [:5] happens to give exactly 15454. A form that does not depend on the length of the name is os.path.splitext(imageFile)[0].
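
A small sketch of the length-independent way to derive the label name, using os.path.splitext from the standard library. The helper name label_name_for and the second file name are hypothetical.

```python
import os


def label_name_for(image_file):
    """Return the label file name that shares the image's base name."""
    stem, _ext = os.path.splitext(image_file)
    return stem + ".txt"


print(label_name_for("15454.jpg"))        # 15454.txt
print(label_name_for("img_000123.jpeg"))  # img_000123.txt
```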
Those are the main things to watch out for: the places most likely to raise questions or cause errors. I hope this helps those who need it.
If it helped you, please like and bookmark it. Hee hee

Summary
The above is my understanding and code implementation of the YOLOv5 format and the COCO format in object detection. If you have a deeper understanding or a better implementation, feel free to leave a comment or send a private message to discuss.

Origin blog.csdn.net/qq_45294476/article/details/125685035