Converting VisDrone2019 to a COCO-format dataset (covering DET and VID)

The COCO dataset format

This hardly needs an introduction; anyone who has spent time in CV is already more than familiar with it.
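
Still, for reference: a COCO detection annotation file is a single JSON object with three top-level lists, images, annotations, and categories. A minimal skeleton, expressed here as a Python dict with made-up values, looks like this:

# Minimal COCO detection skeleton (all values are illustrative only)
coco_skeleton = {
    "images": [
        {"id": 1, "file_name": "0000001.jpg", "width": 1360, "height": 765},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 4,
         "bbox": [100, 120, 50, 80],  # [top-left x, top-left y, width, height]
         "area": 50 * 80, "iscrowd": 0, "segmentation": []},
    ],
    "categories": [
        {"id": 4, "name": "car"},
    ],
}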

visdrone2019(DET)

Label fields

  1. The x coordinate of the top-left corner of the bounding box
  2. The y coordinate of the top-left corner of the bounding box
  3. The width of the bounding box
  4. The height of the bounding box
  5. In the DETECTION file, the score indicates the confidence of the predicted bounding box enclosing an object instance. In the GROUNDTRUTH file, the score is 1 or 0: 1 means the bounding box is considered in evaluation, while 0 means it is ignored.
  6. The object category: ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
  7. In the DETECTION file, this field should be set to the constant -1. In the GROUNDTRUTH file, it indicates the degree to which object parts appear outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1%~50%).
  8. In the DETECTION file, this field should be set to the constant -1. In the GROUNDTRUTH file, it indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), heavy occlusion = 2 (occlusion ratio 50%~100%).
 <bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>


     Name                                                 Description
 -------------------------------------------------------------------------------------------------------------------------------
  <bbox_left>        The x coordinate of the top-left corner of the predicted bounding box

  <bbox_top>         The y coordinate of the top-left corner of the predicted object bounding box

  <bbox_width>       The width in pixels of the predicted object bounding box

  <bbox_height>      The height in pixels of the predicted object bounding box

  <score>            The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
                     an object instance.
                     The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in
                     evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>  The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1),
                     people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9),
                     motor (10), others (11)

  <truncation>       The score in the DETECTION result file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame
                     (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

  <occlusion>        The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
                     (occlusion ratio 50% ~ 100%)).

Note: two auxiliary annotations are useful here: truncation (the truncation ratio) and occlusion (the occlusion ratio). The occlusion ratio is defined by the fraction of the object that is occluded, while the truncation ratio indicates the degree to which object parts appear outside the frame. Notably, any target whose truncation ratio exceeds 50% is skipped during evaluation.
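
As a quick sanity check on the format, one GROUNDTRUTH line can be unpacked into the eight fields above (the line itself is a made-up example, not taken from the dataset):

# Made-up GROUNDTRUTH line with the eight comma-separated DET fields
line = "684,8,273,116,1,4,0,0"
(bbox_left, bbox_top, bbox_width, bbox_height,
 score, object_category, truncation, occlusion) = (int(v) for v in line.split(","))
print(object_category)  # 4 -> car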

Conversion code

import json
import os

import cv2
from tqdm import tqdm


def test():
    # Root directory; it must contain "annotations" and "images" subfolders
    root = r'D:\pythonProjects\Test\visdrone2coco'
    annotations_path = os.path.join(root, "annotations")
    images_path = os.path.join(root, "images")

    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"},
    ]

    images = []
    annotations = []
    id_num = 0

    for txt_name in tqdm(os.listdir(annotations_path)):
        if not txt_name.endswith(".txt"):
            continue  # skip anything that is not an annotation file
        name = txt_name.replace(".txt", "")
        file_name = name + ".jpg"
        # Read the image only to get its height and width
        height, width = cv2.imread(os.path.join(images_path, file_name)).shape[:2]
        images.append({
            "file_name": file_name,
            "height": height,
            "width": width,
            "id": name,  # the file stem is used as the (string) image id
        })

        with open(os.path.join(annotations_path, txt_name), "r") as f:
            for line in f:
                line = line.strip().rstrip(",")  # drop trailing newline and comma
                if not line:
                    continue
                fields = [int(x) for x in line.split(",")]
                bbox_xywh = fields[0:4]  # <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>
                annotations.append({
                    "image_id": name,
                    "score": fields[4],  # VisDrone score; not a standard COCO field, kept as-is
                    "bbox": bbox_xywh,
                    "category_id": fields[5],
                    "id": id_num,
                    "iscrowd": 0,
                    "segmentation": [],
                    "area": bbox_xywh[2] * bbox_xywh[3],
                })
                id_num += 1

    # Write the JSON once, after all annotation files have been processed
    dataset_dict = {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }
    with open("./output.json", "w") as json_file:
        json.dump(dataset_dict, json_file)
    print("json file write done...")


if __name__ == '__main__':
    test()
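
If pycocotools is installed, one way to sanity-check the generated file is simply to load it back, as in the sketch below. Note that some downstream COCO tooling expects integer image ids, while the script above uses the file stem (a string) as the id.

from pycocotools.coco import COCO

coco = COCO("./output.json")             # parses and indexes the json
print(len(coco.imgs), len(coco.anns))    # number of images / annotations indexed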

visdrone2019(VID)

Label fields

  1. The frame index of the video frame
  2. The target id, which provides the temporal correspondence of bounding boxes across different frames
  3. The x coordinate of the top-left corner of the bounding box
  4. The y coordinate of the top-left corner of the bounding box
  5. The width of the bounding box
  6. The height of the bounding box
  7. In the DETECTION file, the score indicates the confidence of the predicted bounding box enclosing an object instance.
     In the GROUNDTRUTH file, the score is 1 or 0: 1 means the bounding box is considered in evaluation, while 0 means it is ignored.
  8. The object category: ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
  9. In the DETECTION file, this field should be set to the constant -1.
     In the GROUNDTRUTH file, it indicates the degree to which object parts appear outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1%~50%).
  10. In the DETECTION file, this field should be set to the constant -1.
      In the GROUNDTRUTH file, it indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1%~50%), heavy occlusion = 2 (occlusion ratio 50%~100%).
 <frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

        Name              Description
 ----------------------------------------------------------------------------------------------------------------------------------
  <frame_index>      The frame index of the video frame

  <target_id>        In the DETECTION result file, the identity of the target should be set to the constant -1.
                     In the GROUNDTRUTH file, the identity of the target is used to provide the temporal correspondence of
                     the bounding boxes in different frames.

  <bbox_left>        The x coordinate of the top-left corner of the predicted bounding box

  <bbox_top>         The y coordinate of the top-left corner of the predicted object bounding box

  <bbox_width>       The width in pixels of the predicted object bounding box

  <bbox_height>      The height in pixels of the predicted object bounding box

  <score>            The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
                     an object instance.
                     The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in
                     evaluation, while 0 indicates the bounding box will be ignored.

  <object_category>  The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1),
                     people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9),
                     motor (10), others (11)

  <truncation>       The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside the frame
                     (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

  <occlusion>        The score in the DETECTION file should be set to the constant -1.
                     The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0
                     (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
                     (occlusion ratio 50% ~ 100%)).

Note: as with DET, the two auxiliary annotations, truncation (the truncation ratio) and occlusion (the occlusion ratio), are useful. The occlusion ratio is defined by the fraction of the object that is occluded, while the truncation ratio indicates the degree to which object parts appear outside the frame. Notably, any target whose truncation ratio exceeds 50% is skipped during evaluation.
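
For illustration, here are two made-up GROUNDTRUTH lines for the same target in consecutive frames; the shared <target_id> (5) is what links the two boxes over time:

1,5,684,8,273,116,1,4,0,0
2,5,686,10,273,116,1,4,0,0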

Restructuring the dataset

A quick look shows that the VisDrone-DET format pairs each image with its own txt file. In VID, however, each video contains many images (one per frame), so the per-sequence txt needs to be reorganized by frame_index: all lines sharing the same frame_index are collected into one txt, named like 0000XXX.
Goal: one txt file per image, so the DET conversion code can then be reused to convert VID to the COCO format.

  1. Split the files in annotations by frame index (see the sketch after this list)
  2. Rename the images in sequences and copy them into the images folder

Each resulting txt contains the remaining 8 fields, with the <frame_index> and <target_id> fields removed.
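
Before the downloadable script below, here is a minimal sketch of steps 1 and 2, assuming the standard VisDrone2019-VID layout (one txt per sequence under annotations, frame images named 0000001.jpg and so on under sequences/<sequence name>). The function name split_vid_annotations is mine, and, deviating slightly from the 0000XXX naming above, the output files are prefixed with the sequence name to avoid collisions across sequences:

import os
import shutil
from collections import defaultdict


def split_vid_annotations(vid_ann_dir, out_ann_dir, out_img_dir, seq_dir):
    os.makedirs(out_ann_dir, exist_ok=True)
    os.makedirs(out_img_dir, exist_ok=True)
    for txt_name in os.listdir(vid_ann_dir):
        seq_name = txt_name.replace('.txt', '')
        per_frame = defaultdict(list)
        with open(os.path.join(vid_ann_dir, txt_name)) as f:
            for line in f:
                fields = line.strip().rstrip(',').split(',')
                if len(fields) < 10:
                    continue
                # keep the last 8 fields, dropping <frame_index> and <target_id>
                per_frame[int(fields[0])].append(','.join(fields[2:]))
        for frame_index, lines in per_frame.items():
            new_name = f'{seq_name}_{frame_index:07d}'
            with open(os.path.join(out_ann_dir, new_name + '.txt'), 'w') as out:
                out.write('\n'.join(lines) + '\n')
            # copy and rename the matching frame (assumes 7-digit frame names)
            src = os.path.join(seq_dir, seq_name, f'{frame_index:07d}.jpg')
            shutil.copy(src, os.path.join(out_img_dir, new_name + '.jpg'))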

Conversion code

Download here: https://download.csdn.net/download/qq_44824148/86814694?spm=1001.2014.3001.5501

import shutil

# Copy a file into the destination folder
def copyfile(old_file_path, new_folder_path):
    shutil.copy(old_file_path, new_folder_path)

# Conversion
......
# Rename
......
import json
import os

import cv2
from tqdm import tqdm


def test():
    # Change root to your own path; it must contain "annotations" and "images" subfolders
    root = '/usr/ldw/visdrone2coco/'

    categories = [
        {"id": 0, "name": "ignored regions"},
        {"id": 1, "name": "pedestrian"},
        {"id": 2, "name": "people"},
        {"id": 3, "name": "bicycle"},
        {"id": 4, "name": "car"},
        {"id": 5, "name": "van"},
        {"id": 6, "name": "truck"},
        {"id": 7, "name": "tricycle"},
        {"id": 8, "name": "awning-tricycle"},
        {"id": 9, "name": "bus"},
        {"id": 10, "name": "motor"},
        {"id": 11, "name": "others"},
    ]

    images = []
    annotations = []
    id_num = 0

    # Change annotations_path to point at the annotations folder, e.g.
    # annotations_path = r'J:\Dataset\visdrone\Task 2_ Object Detection in Videos\VisDrone2019-VID-train\annotations'
    annotations_path = os.path.join(root, 'annotations')
    # Change images_path to point at the images folder, e.g.
    # images_path = r'J:\Dataset\visdrone\Task 2_ Object Detection in Videos\VisDrone2019-VID-train\images'
    images_path = os.path.join(root, 'images')

    for txt_name in tqdm(os.listdir(annotations_path)):
        if not txt_name.endswith('.txt'):
            continue  # skip anything that is not an annotation file (e.g. a previously written json)
        name = txt_name.replace('.txt', '')
        file_name = name + '.jpg'
        # Read the image only to get its height and width
        height, width = cv2.imread(os.path.join(images_path, file_name)).shape[:2]
        images.append({
            "file_name": file_name,
            "height": height,
            "width": width,
            "id": name,  # the file stem is used as the (string) image id
        })

        with open(os.path.join(annotations_path, txt_name), 'r') as f:
            for line in f:
                line = line.strip().rstrip(',')  # drop trailing newline and comma
                if not line:
                    continue
                fields = [int(x) for x in line.split(',')]
                bbox_xywh = fields[0:4]  # <bbox_left>, <bbox_top>, <bbox_width>, <bbox_height>
                annotations.append({
                    "image_id": name,
                    "score": fields[4],  # VisDrone score; not a standard COCO field, kept as-is
                    "bbox": bbox_xywh,
                    "category_id": fields[5],
                    "id": id_num,
                    "iscrowd": 0,
                    "segmentation": [],
                    "area": bbox_xywh[2] * bbox_xywh[3],
                })
                id_num += 1

    # Write the JSON once, after all annotation files have been processed
    dataset_dict = {
        "images": images,
        "annotations": annotations,
        "categories": categories,
    }
    # Change url; the output file extension must be .json
    url = '/usr/ldw/visdrone2coco/annotations/a1.json'
    with open(url, 'w') as json_file:
        json.dump(dataset_dict, json_file)
    print("json file write done...")


if __name__ == '__main__':
    test()
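
Putting the pieces together, a hypothetical end-to-end run, using the split_vid_annotations sketch from earlier (the VisDrone2019-VID-train paths below are placeholders), could look like:

split_vid_annotations(
    '/usr/ldw/VisDrone2019-VID-train/annotations',   # per-sequence VID txt files
    '/usr/ldw/visdrone2coco/annotations',            # per-frame txt files (output)
    '/usr/ldw/visdrone2coco/images',                 # renamed frame images (output)
    '/usr/ldw/VisDrone2019-VID-train/sequences')     # original frame folders
test()  # then convert the per-frame txt files to a COCO json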

Reposted from: blog.csdn.net/qq_44824148/article/details/127270393