【Dataset Research】MS COCO


1. Introduction

  • MS COCO: Microsoft Common Objects in Context

  • MS COCO is a large dataset created by Microsoft Corporation for image recognition and object detection.

  • There are two versions of the MS COCO dataset: MS COCO 2014 and MS COCO 2017. Here we introduce the 2017 version.
    The 2017 version extends and updates the 2014 version. Compared with the 2014 version, the changes are:

    • The 2017 and 2014 versions use exactly the same images
    • The 2017 training/validation split is 118K/5K, while the 2014 split is 83K/41K
    • The detection and keypoint annotations in the 2017 version are the same as in the 2014 version, but stuff annotations are added for 40K training images (a subset of the 118K training set) and for all validation images (stuff categories are introduced later)
    • The 2017 test set has only two parts (dev/challenge), while the 2014 version has four parts (dev/standard/reserve/challenge)
    • The 2017 release adds 120,000 unlabeled images from COCO, which follow the same class distribution as the labeled images and can be used for semi-supervised learning

Official website address: https://cocodataset.org/
Official paper: https://arxiv.org/pdf/1405.0312.pdf


2. Dataset characteristics

COCO is a large-scale dataset for object detection, segmentation, and image captioning. It has the following characteristics:

[Image: list of COCO dataset characteristics, including 80 object categories and 91 stuff categories]

The two items most likely to cause confusion are the 80 object categories and the 91 stuff categories:

  • Stuff categories: the paper describes them as follows: "stuff" categories include materials and objects with no clear boundaries (sky, street, grass). In other words, the stuff annotations label 91 types of regions that have no clear boundaries, such as sky, street, and grass.
  • The difference between the 80 object categories and the 91 categories in the paper: the paper uses a paragraph to describe it. Simply put, the paper originally defines 91 object categories, and the 80 annotated object categories are a subset of those 91, with some difficult-to-classify and confusing categories removed. These 91 object categories are separate from the 91 stuff categories. If you do object detection, you basically only use the 80 object classes.
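Because only 80 of the originally defined categories are annotated, the category ids stored in the annotation files are not contiguous (they run up to 90 with gaps). Detection code usually wants labels 0..N-1, so a remapping step is common. The following is a minimal sketch of that step, not an official API: `build_label_map` is a hypothetical helper name, and `sample_categories` is a hand-written three-entry stand-in for the real `categories` list.

```python
def build_label_map(categories):
    """Map original COCO category ids to contiguous labels 0..N-1."""
    sorted_ids = sorted(cat["id"] for cat in categories)
    return {cat_id: label for label, cat_id in enumerate(sorted_ids)}


# Hand-written stand-in for the "categories" list (assumption: same field names
# as the real annotation file; only three of the 80 entries are shown).
sample_categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 11, "name": "fire hydrant", "supercategory": "outdoor"},
    {"id": 13, "name": "stop sign", "supercategory": "outdoor"},
]

label_map = build_label_map(sample_categories)
print(label_map)  # {1: 0, 11: 1, 13: 2}
```

With the full list loaded from the annotation file, the same function would yield labels 0 to 79.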

The 80 object categories:
[Image: list of the 80 object categories]


3. Computer Vision Tasks Supported by MS COCO Dataset

[Image: computer vision tasks supported by the MS COCO dataset]


4. MS COCO 2017 dataset download

For object detection, semantic segmentation, instance segmentation, image captioning, and keypoint detection, you only need the 3 files highlighted below: (1) training set image files, (2) validation set image files, (3) training set and validation set annotation files
[Image: official download page with the three files highlighted]
Here are the download links:
Training set images (2017 train): http://images.cocodataset.org/zips/train2017.zip
Validation set images (2017 val): http://images.cocodataset.org/zips/val2017.zip
Training/validation set annotations (2017 annotations): http://images.cocodataset.org/annotations/annotations_trainval2017.zip
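If you prefer to script the download, here is a minimal sketch using only the Python standard library. The URLs are the three given above; `download_all` and the `COCO2017` target folder are hypothetical names chosen for illustration. The archives are several gigabytes, so the example call is left commented out.

```python
import urllib.request
from pathlib import Path

# The three archives listed above, keyed by the local file name to save them under.
COCO_2017_URLS = {
    "train2017.zip": "http://images.cocodataset.org/zips/train2017.zip",
    "val2017.zip": "http://images.cocodataset.org/zips/val2017.zip",
    "annotations_trainval2017.zip": "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
}


def download_all(target_dir, urls=COCO_2017_URLS):
    """Download each archive into target_dir, skipping files that already exist."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for file_name, url in urls.items():
        destination = target / file_name
        if not destination.exists():
            urllib.request.urlretrieve(url, str(destination))
        saved.append(destination)
    return saved


# Example call (large download, so it is not run here):
# download_all("COCO2017")
```

The existence check only looks at the file name, so a partially downloaded archive should be deleted by hand before rerunning.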

After the data is downloaded and extracted, the file structure is as follows:
[Image: directory tree of the extracted dataset]
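A quick way to confirm the extraction worked is to check for the expected folders and files. This is a sketch under an assumption: it expects the three archives to be extracted side by side into one root folder, giving `train2017/`, `val2017/`, and `annotations/`. `find_missing` and `EXPECTED_ENTRIES` are hypothetical names, and the list only covers the instance annotation files.

```python
from pathlib import Path

# Assumed layout after extracting the three archives into one root folder.
EXPECTED_ENTRIES = [
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
]


def find_missing(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Example call: an empty list means everything expected is present.
# print(find_missing("COCO2017"))
```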


5. MS COCO file annotation format

Taking instances_train2017.json as an example, after the data is read in, it is a dict containing 5 keys: info, licenses, images, annotations, categories

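import json

# Path to the instance annotation file of the 2017 training set
json_path = "COCO2017/annotations/instances_train2017.json"
with open(json_path, 'r') as f:
    json_labels = json.load(f)  # the whole file is loaded as one Python dict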

[Image: top-level keys of the loaded dict]
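To see the top-level structure without a debugger, you can print the type and size of each entry. The sketch below runs on a tiny hand-written dict (`tiny_coco`, an assumption made so the example works without the real file); it uses the plural key names `images` and `annotations`, as the real annotation file does. `summarize` is a hypothetical helper name.

```python
def summarize(coco):
    """Return {key: (type name, number of entries)} for each top-level key."""
    summary = {}
    for key, value in coco.items():
        size = len(value) if isinstance(value, (list, dict)) else 1
        summary[key] = (type(value).__name__, size)
    return summary


# Tiny stand-in with the same five top-level keys (the values are made up).
tiny_coco = {
    "info": {"year": 2017},
    "licenses": [{"id": 1}],
    "images": [{"id": 10}, {"id": 11}],
    "annotations": [{"id": 100, "image_id": 10}],
    "categories": [{"id": 1, "name": "person"}],
}

for key, (type_name, size) in summarize(tiny_coco).items():
    print(f"{key}: {type_name} with {size} entries")
```

Passing the `json_labels` dict loaded above instead of `tiny_coco` should give the same five keys with much larger counts.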

1) info
[Image: contents of the info field]

2) licenses
[Image: contents of the licenses field]

3) images
[Image: contents of the images field]

4) annotations
[Image: contents of the annotations field]
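Each annotation entry refers to its image through `image_id`, and its `bbox` is stored as [x, y, width, height] measured from the top-left corner. A minimal sketch of two common preprocessing steps follows: grouping annotations per image and converting boxes to corner format. `sample_annotations` is hand-written stand-in data (only a few of the real fields are included), and both function names are hypothetical.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def group_by_image(annotations):
    """Collect annotation entries under the image_id they belong to."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


# Hand-written stand-in entries (values are made up for illustration).
sample_annotations = [
    {"id": 1, "image_id": 10, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0], "iscrowd": 0},
    {"id": 2, "image_id": 10, "category_id": 13, "bbox": [0.0, 0.0, 5.0, 5.0], "iscrowd": 0},
    {"id": 3, "image_id": 11, "category_id": 1, "bbox": [1.0, 2.0, 3.0, 4.0], "iscrowd": 0},
]

by_image = group_by_image(sample_annotations)
boxes_for_10 = [xywh_to_xyxy(ann["bbox"]) for ann in by_image[10]]
print(boxes_for_10)  # [[10.0, 20.0, 40.0, 60.0], [0.0, 0.0, 5.0, 5.0]]
```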

5) categories
[Image: contents of the categories field]
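The `categories` list links each numeric `category_id` used in the annotations to a readable name. As a rough sketch, the snippet below counts how many annotations each category has. The two lists are hand-written stand-ins, and `count_per_category` is a hypothetical helper name.

```python
from collections import Counter


def count_per_category(categories, annotations):
    """Count annotations per category name."""
    id_to_name = {cat["id"]: cat["name"] for cat in categories}
    return Counter(id_to_name[ann["category_id"]] for ann in annotations)


# Hand-written stand-in data (assumption: same field names as the real file).
sample_categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 13, "name": "stop sign", "supercategory": "outdoor"},
]
sample_annotations = [
    {"id": 1, "category_id": 1},
    {"id": 2, "category_id": 13},
    {"id": 3, "category_id": 1},
]

counts = count_per_category(sample_categories, sample_annotations)
print(counts.most_common())  # [('person', 2), ('stop sign', 1)]
```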

Origin blog.csdn.net/weixin_37804469/article/details/129800790