MS CoCo dataset

1. Introduction

The MS COCO dataset is a very large and widely used dataset. It supports tasks such as object detection, image segmentation, and image captioning.

Dataset address: link

Address of the paper describing the dataset: link

One thing to note: the dataset's categories come in two sets, object80 and stuff91, where object80 is a subset of stuff91. Stuff that does not belong to the object set refers to materials and regions without clear boundaries, such as sky and grass.

When learning object detection, we can use the object80 set as the classification categories.

Compared with PASCAL VOC, MS COCO contains significantly more images and more ground-truth instances per category, so a model pre-trained on COCO is often used to initialize one's own model. (Pre-training on COCO takes a long time.)

2. Dataset download

The test set is not downloaded here, because its file structure is the same as that of the validation set. You can skip the test set and use the validation set for testing instead.

3. Inspect the validation set and annotation files

3.1 View with Python's json library

The training set is relatively large, so it is not downloaded. Here we mainly look at the validation set and its annotation files.

import json


json_path = "./annotations/instances_val2017.json"
with open(json_path, "r") as f:
    json_file = json.load(f)
print(json_file["info"])

Setting a breakpoint, we can get the following:

json_file is a dictionary with five keys: info, licenses, images, annotations, and categories.

key | value type | value
--- | --- | ---
info | dict | description, url, version, year, contributor, date_created
licenses | list | len = 8
images | list | 5000 dicts, each with the keys license, file_name, coco_url, height, width, date_captured, flickr_url, id
annotations | list | 36871 dicts, each with the keys segmentation, area, iscrowd, image_id, bbox, category_id, id
categories | list | len = 80, each dict with the keys supercategory, name, id

Where:

images is a list (one element per image). Each element is a dict holding the information for one image, including the file name, width, height, and other fields.

annotations is a list (one element per annotated object across all images). Each element is a dict holding the annotation for one object: the segmentation, the bounding box (the first two numbers are the x and y coordinates of the top-left corner, and the next two are the width and height of the box), image_id (the id of the image the object belongs to), category_id (the id of the object's category), and iscrowd (0 for a single object, 1 for a group of objects).

categories is a list (one element per detection category). Each element is a dict holding the information for one category: the category id, the category name, and the supercategory (that is, the parent class).
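As a small illustration of the bbox convention described above, the sketch below converts a COCO-style [x, y, width, height] box into corner coordinates. The annotation dict is a made-up example (its values are not taken from the dataset), and `xywh_to_xyxy` is a hypothetical helper name, not part of pycocotools.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Made-up annotation with the same keys as an entry of json_file["annotations"].
ann = {
    "id": 1,
    "image_id": 42,
    "category_id": 18,
    "bbox": [10.0, 20.0, 30.0, 40.0],
    "area": 1200.0,
    "iscrowd": 0,
}

box = xywh_to_xyxy(ann["bbox"])
print(box)  # [10.0, 20.0, 40.0, 60.0]
```

This is the same arithmetic the drawing code in section 3.2.1 performs before calling draw.rectangle.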

3.2. View with the official API

3.2.1 View object detection annotations

import os
from pycocotools.coco import COCO
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

json_path = "./annotations/instances_val2017.json"
img_path = "./val2017"

# load coco data
coco = COCO(annotation_file=json_path)

# get all image index info
ids = list(sorted(coco.imgs.keys()))
print("number of images: {}".format(len(ids)))

# get all coco class labels
coco_classes = dict([(v["id"], v["name"]) for k, v in coco.cats.items()])

# iterate over the first three images
for img_id in ids[:3]:
    # get the ids of all annotations that belong to this image id
    ann_ids = coco.getAnnIds(imgIds=img_id)

    # load all annotation dicts for those annotation ids
    targets = coco.loadAnns(ann_ids)

    # get image file name
    path = coco.loadImgs(img_id)[0]['file_name']

    # read image
    img = Image.open(os.path.join(img_path, path)).convert('RGB')
    draw = ImageDraw.Draw(img)
    # draw box to image
    for target in targets:
        x, y, w, h = target["bbox"]
        x1, y1, x2, y2 = x, y, int(x + w), int(y + h)
        draw.rectangle((x1, y1, x2, y2))
        draw.text((x1, y1), coco_classes[target["category_id"]])

    # show image
    plt.imshow(img)
    plt.show()

3.2.2 View segmentation annotation

code:


import os
import random

import numpy as np
from pycocotools.coco import COCO
from pycocotools import mask as coco_mask
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

random.seed(0)

json_path = "./annotations/instances_val2017.json"
img_path = "./val2017"

# random pallette
pallette = [0, 0, 0] + [random.randint(0, 255) for _ in range(255*3)]

# load coco data
coco = COCO(annotation_file=json_path)

# get all image index info
ids = list(sorted(coco.imgs.keys()))
print("number of images: {}".format(len(ids)))

# get all coco class labels
coco_classes = dict([(v["id"], v["name"]) for k, v in coco.cats.items()])

# iterate over the first three images
for img_id in ids[:3]:
    # get the ids of all annotations that belong to this image id
    ann_ids = coco.getAnnIds(imgIds=img_id)
    # load all annotation dicts for those annotation ids
    targets = coco.loadAnns(ann_ids)

    # get image file name
    path = coco.loadImgs(img_id)[0]['file_name']
    # read image
    img = Image.open(os.path.join(img_path, path)).convert('RGB')
    img_w, img_h = img.size

    masks = []
    cats = []
    for target in targets:
        cats.append(target["category_id"])  # get object class id
        polygons = target["segmentation"]   # get object polygons
        rles = coco_mask.frPyObjects(polygons, img_h, img_w)
        mask = coco_mask.decode(rles)
        if len(mask.shape) < 3:
            mask = mask[..., None]
        mask = mask.any(axis=2)
        masks.append(mask)

    cats = np.array(cats, dtype=np.int32)
    if masks:
        masks = np.stack(masks, axis=0)
    else:
        masks = np.zeros((0, img_h, img_w), dtype=np.uint8)

    # merge all instance masks into a single segmentation map
    # with its corresponding categories
    target = (masks * cats[:, None, None]).max(axis=0)
    # discard overlapping instances
    target[masks.sum(0) > 1] = 255
    target = Image.fromarray(target.astype(np.uint8))

    target.putpalette(pallette)
    plt.imshow(target)
    plt.show()

3.2.3 View keypoint annotations

code:

import numpy as np
from pycocotools.coco import COCO

json_path = "./annotations/person_keypoints_val2017.json"
coco = COCO(json_path)
img_ids = list(sorted(coco.imgs.keys()))

# iterate over the person keypoint info in the first 5 images (note: not every image contains a person)
for img_id in img_ids[:5]:
    idx = 0
    img_info = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id)
    anns = coco.loadAnns(ann_ids)
    for ann in anns:
        xmin, ymin, w, h = ann['bbox']
        # print the person's bbox
        print(f"[image id: {img_id}] person {idx} bbox: [{xmin:.2f}, {ymin:.2f}, {xmin + w:.2f}, {ymin + h:.2f}]")
        keypoints_info = np.array(ann["keypoints"]).reshape([-1, 3])
        visible = keypoints_info[:, 2]
        keypoints = keypoints_info[:, :2]
        # print the keypoint coordinates and their visibility flags
        print(f"[image id: {img_id}] person {idx} keypoints: {keypoints.tolist()}")
        print(f"[image id: {img_id}] person {idx} keypoints visible: {visible.tolist()}")
        idx += 1

output:

loading annotations into memory...

Done (t=0.34s)

creating index...

index created!

[image id: 139] person 0 bbox: [412.80, 157.61, 465.85, 295.62]

[image id: 139] person 0 keypoints: [[427, 170], [429, 169], [0, 0], [434, 168], [0, 0], [441, 177], [446, 177], [437, 200], [430, 206], [430, 220], [420, 215], [445, 226], [452, 223], [447, 260], [454, 257], [455, 290], [459, 286]]

[image id: 139] person 0 keypoints visible: [1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

[image id: 139] person 1 bbox: [384.43, 172.21, 399.55, 207.95]

[image id: 139] person 1 keypoints: [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]

[image id: 139] person 1 keypoints visible: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
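In the COCO keypoint format each keypoint is stored as (x, y, v), where v is 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. The sketch below uses the first five keypoints of person 0 from the output above, re-flattened by hand, to split a flat keypoints list into triples and filter by the flag. `split_keypoints` is a hypothetical helper, not a pycocotools function.

```python
def split_keypoints(flat):
    """Group a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# First five keypoints of person 0 in image 139, re-flattened from the output above.
flat = [427, 170, 1, 429, 169, 2, 0, 0, 0, 434, 168, 2, 0, 0, 0]

triples = split_keypoints(flat)
labeled = [(x, y) for x, y, v in triples if v > 0]   # v == 0 means "not labeled"
visible = [(x, y) for x, y, v in triples if v == 2]  # v == 2 means "labeled and visible"

print(labeled)  # [(427, 170), (429, 169), (434, 168)]
print(visible)  # [(429, 169), (434, 168)]
```

This also explains person 1 in the output: all of its flags are 0, so none of its keypoints are labeled and every coordinate is [0, 0].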

3.3. Interpreting the official COCO API code

The three pieces of code above visualize bbox, segmentation, and keypoint annotations respectively, mainly through the official API. Next we walk through the official API code, taking the bbox visualization as an example.

coco = COCO(annotation_file=json_path)

This line creates a COCO object. annotation_file is the path to the json file; the class is mainly used to read the json file and visualize annotations.

In the initialization function:

Four empty dictionaries are created (dataset, anns, cats, imgs), along with two default dictionaries (imgToAnns and catToImgs).

It then checks whether the path is empty. If it is not, the annotation file is loaded with the json library; the loaded dataset is a dict.

It then calls the createIndex function.

In this method, three empty dictionaries anns, cats, and imgs are created, and the code checks whether the keys annotations, images, and categories are present in the dataset dictionary.

dataset["annotations"] is a list of all annotated objects, each element being a dict. The imgToAnns dictionary maps each image id to the list of all objects annotated in that image.

The anns dictionary maps each annotation id to the dict of that object's annotation.

dataset["images"] is a list of dicts, one per image. The imgs dictionary maps each image id to that image's information dict.

dataset["categories"] is a list of dicts. The cats dictionary maps each category id to that category's information.

The catToImgs dictionary maps each category id to the ids of the images that contain that category.
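The lookup tables described above can be sketched in plain Python. This is a simplified illustration of the idea, not the library's actual code: `build_index` is a hypothetical function name, and the tiny dataset is made up so that the snippet runs without the annotation file.

```python
from collections import defaultdict


def build_index(dataset):
    """Build lookup tables similar to those described above (illustrative only)."""
    anns, cats, imgs = {}, {}, {}
    img_to_anns = defaultdict(list)
    cat_to_imgs = defaultdict(list)

    for ann in dataset.get("annotations", []):
        img_to_anns[ann["image_id"]].append(ann)  # image id -> its annotations
        anns[ann["id"]] = ann                      # annotation id -> annotation
        cat_to_imgs[ann["category_id"]].append(ann["image_id"])  # category id -> image ids
    for img in dataset.get("images", []):
        imgs[img["id"]] = img                      # image id -> image info
    for cat in dataset.get("categories", []):
        cats[cat["id"]] = cat                      # category id -> category info
    return anns, cats, imgs, img_to_anns, cat_to_imgs


# Tiny made-up dataset with the same top-level keys as the annotation file.
dataset = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [2, 2, 4, 4]},
    ],
}

anns, cats, imgs, img_to_anns, cat_to_imgs = build_index(dataset)
print(sorted(imgs.keys()))  # [1, 2]
print(len(img_to_anns[1]))  # 2
print(cat_to_imgs[1])       # [1, 1]
```

Using defaultdict(list) means an image or category with no entries yet simply starts from an empty list, which is why the two "to" dictionaries differ from the three plain ones.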

ids = list(sorted(coco.imgs.keys()))

ids is a sorted list of image ids.

coco_classes = dict([(v["id"], v["name"]) for k, v in coco.cats.items()])

coco_classes takes the category id and category name from each dict in the categories list and builds a dictionary that maps id to name.
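The same construction can be tried on a small hand-written stand-in for coco.cats. Because the category ids in the annotation file are not contiguous (80 categories, with ids running up to 90), detection code often also maps them to contiguous indices. The second mapping below is one common way to do that, shown as an assumption rather than something the code above does; the `cats` dict and the name `id_to_index` are made up for illustration.

```python
# Hand-written stand-in for coco.cats (category id -> category dict).
cats = {
    1: {"supercategory": "person", "id": 1, "name": "person"},
    11: {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"},
    13: {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
}

# Same idea as coco_classes above: category id -> category name.
coco_classes = {v["id"]: v["name"] for v in cats.values()}

# Hypothetical extra step: category id -> contiguous index starting at 0.
id_to_index = {cat_id: i for i, cat_id in enumerate(sorted(coco_classes))}

print(coco_classes)  # {1: 'person', 11: 'fire hydrant', 13: 'stop sign'}
print(id_to_index)   # {1: 0, 11: 1, 13: 2}
```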

My understanding of the lower-level code is not yet deep enough; I will come back to it when I have learned more.

4. Evaluate mAP for the object detection task

4.1 Prediction result output format

Object Detection Prediction Format

Suppose we have the following prediction results (obtained from the original blogger's training).

We save them as the file predict_results.json.
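For reference, pycocotools' loadRes expects detection results as a JSON list with one dict per predicted box, using the keys image_id, category_id, bbox (in [x, y, width, height] form) and score. The sketch below writes such a file. The two predictions are made-up values, not the blogger's actual results, and the file is written to a temporary directory rather than to ./predict.

```python
import json
import os
import tempfile

# Made-up predictions in the COCO detection results format.
results = [
    {"image_id": 139, "category_id": 1, "bbox": [412.8, 157.6, 53.1, 138.0], "score": 0.92},
    {"image_id": 139, "category_id": 62, "bbox": [290.7, 218.0, 61.8, 98.5], "score": 0.57},
]

out_path = os.path.join(tempfile.mkdtemp(), "predict_results.json")
with open(out_path, "w") as f:
    json.dump(results, f)

# Read the file back to confirm it round-trips.
with open(out_path) as f:
    loaded = json.load(f)
print(len(loaded), sorted(loaded[0].keys()))
```

Note that category_id must be the original COCO category id, so a model trained on contiguous class indices needs to map its outputs back before saving.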

Then run the following code to compare the predictions with the ground truth:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval


# load the COCO 2017 validation set annotation file
coco_true = COCO(annotation_file="./annotations/instances_val2017.json")
# load the network's predictions on the COCO 2017 validation set
coco_pre = coco_true.loadRes('./predict/predict_results.json')

coco_evaluator = COCOeval(cocoGt=coco_true, cocoDt=coco_pre, iouType="bbox")
coco_evaluator.evaluate()
coco_evaluator.accumulate()
coco_evaluator.summarize()

Get the output:

loading annotations into memory...

Done (t=0.71s)

creating index...

index created!

Loading and preparing results...

DONE (t=0.79s)

creating index...

index created!

Running per image evaluation...

Evaluate annotation type *bbox*

DONE (t=19.72s).

Accumulating evaluation results...

DONE (t=3.82s).

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.233

 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.415

 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.233

 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.104

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.262

 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.323

 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.216

 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.319

 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.327

 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.145

 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.361

 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.463
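After summarize() runs, COCOeval stores the same twelve numbers, in the printed order, in its stats attribute. The sketch below pairs them with short labels. The values are copied from the output above so that the snippet runs without pycocotools, and the label names are my own shorthand rather than anything defined by the library.

```python
# Short labels for the 12 summary metrics, in the order they are printed.
STAT_NAMES = [
    "AP", "AP50", "AP75", "AP_small", "AP_medium", "AP_large",
    "AR_1", "AR_10", "AR_100", "AR_small", "AR_medium", "AR_large",
]

# Values copied from the printed summary; with pycocotools available,
# list(coco_evaluator.stats) could be used here instead.
stats = [0.233, 0.415, 0.233, 0.104, 0.262, 0.323,
         0.216, 0.319, 0.327, 0.145, 0.361, 0.463]

summary = dict(zip(STAT_NAMES, stats))
print(summary["AP"], summary["AP50"], summary["AR_100"])  # 0.233 0.415 0.327
```

The first entry (AP averaged over IoU thresholds from 0.50 to 0.95) is the number usually quoted as COCO mAP.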

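The next post covers the evaluation metrics for object detection: "CoCo dataset: the object detection metric mAP" (SL1029_'s blog, CSDN)

5. Blogs and videos used for study

"Introduction to the MS COCO dataset and basic use of pycocotools" (太阳花的小绿豆's blog, CSDN)

"Introduction to the COCO dataset and basic use of pycocotools" (video, bilibili)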

CoCo dataset official website: COCO - Common Objects in Context (cocodataset.org)

CoCo official API address: cocodataset/cocoapi: COCO API - Dataset @ http://cocodataset.org/ (github.com)

Origin blog.csdn.net/SL1029_/article/details/130698509