MS COCO dataset study notes (Common Objects in COntext)

First, the data source

The COCO images are sourced from the photo-sharing site Flickr.

Second, the purpose of the dataset

The dataset is used to train image recognition models, mainly for the following three tasks:

(1) object instances

(2) object keypoints

(3) image captions

Each task has two annotation files, one for the training set and one for the validation set.
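The file names can be generated as in the sketch below. It assumes the naming used by the 2017 release (the prefixes instances, person_keypoints, and captions); other releases use a different year suffix, and the helper name annotation_files is made up for these notes.

```python
# Prefix of the annotation files for each task (assumed: 2017 release naming).
TASK_PREFIXES = {
    "object instances": "instances",
    "object keypoints": "person_keypoints",
    "image captions": "captions",
}


def annotation_files(year="2017"):
    """Return {task: [train file, val file]} for the given release year."""
    return {
        task: [f"{prefix}_{split}{year}.json" for split in ("train", "val")]
        for task, prefix in TASK_PREFIXES.items()
    }


for task, files in annotation_files().items():
    print(task, "->", files)
```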

Third, the annotation structure

All three tasks share three basic fields: info, image, and license (stored under the keys info, images, and licenses). Only the annotation field differs from task to task.

3.1 Common fields

  • General - info field

 

Example:
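The info field can be sketched as a Python dictionary. The key names follow the published COCO data format; every value below is a placeholder made up for illustration, not copied from an annotation file.

```python
# Placeholder values only; the keys follow the COCO data format.
info = {
    "year": 2017,                      # release year (int)
    "version": "1.0",                  # version string
    "description": "placeholder description",
    "contributor": "placeholder contributor",
    "url": "http://example.com",       # placeholder URL
    "date_created": "2017/09/01",      # placeholder date
}

for key, value in info.items():
    print(f"{key}: {value!r}")
```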

 

 

  • General - image field

 

Example:
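One entry of the images list can be sketched as follows. The key names follow the published COCO data format; the values (ids, sizes, file name, URLs, date) are placeholders invented for this sketch. Because annotations refer to images through their id, building an index by id is a common first step.

```python
# One entry of the "images" list; all values are placeholders.
images = [
    {
        "id": 1,
        "width": 640,
        "height": 480,
        "file_name": "placeholder_000001.jpg",
        "license": 1,  # id of an entry in the "licenses" list
        "flickr_url": "http://example.com/flickr/placeholder.jpg",
        "coco_url": "http://example.com/coco/placeholder.jpg",
        "date_captured": "2013-11-14 17:02:52",
    }
]

# Index images by id so annotations can be matched via their image_id.
images_by_id = {image["id"]: image for image in images}
print(images_by_id[1]["file_name"])
```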

 

  • General - license field

Example:
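Each license entry has an id, a name, and a url, and an image points at its license through that id. The sketch below uses placeholder values and a hypothetical helper name, license_name_for, to show the lookup.

```python
# Entries of the "licenses" list; the name and url are placeholders.
licenses = [
    {"id": 1, "name": "placeholder license name", "url": "http://example.com/license/1"},
    {"id": 2, "name": "another placeholder license", "url": "http://example.com/license/2"},
]

# A reduced image entry: only the keys needed for the lookup.
image = {"id": 10, "license": 2}


def license_name_for(image, licenses):
    """Return the name of the license that the image entry refers to."""
    by_id = {entry["id"]: entry for entry in licenses}
    return by_id[image["license"]]["name"]


print(license_name_for(image, licenses))
```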

 

3.2 Task-specific fields

  • annotation-Object Instance

iscrowd = 0: the annotation is a single object, and its outline is given as polygons, i.e., the segmentation field is a list of polygons (each a flat list of x, y vertex coordinates).
iscrowd = 1: the annotation covers a group of objects that are not separated individually, and the outline is given in RLE (run-length encoding), i.e., the segmentation field holds an RLE-encoded mask.
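The two forms of the segmentation field can be told apart by their type. The sketch below is a simplified illustration with made-up coordinates: it handles only uncompressed RLE (counts as a list of integers) and assumes runs start with zeros and pixels are stored column by column. For real annotation files, the mask utilities shipped with pycocotools are the safer choice. The function names here are hypothetical.

```python
def segmentation_kind(annotation):
    """Return 'rle' if segmentation is a dict, otherwise 'polygon'."""
    return "rle" if isinstance(annotation["segmentation"], dict) else "polygon"


def decode_uncompressed_rle(rle):
    """Expand uncompressed RLE into rows of 0/1 values.

    Runs alternate between 0s and 1s, starting with 0s, and pixels are
    laid out column by column (column-major order).
    """
    height, width = rle["size"]
    flat = []
    value = 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    return [[flat[x * height + y] for x in range(width)] for y in range(height)]


single = {
    "iscrowd": 0,
    # One polygon as a flat list: x1, y1, x2, y2, ...
    "segmentation": [[10.0, 10.0, 50.0, 10.0, 50.0, 40.0, 10.0, 40.0]],
}
crowd = {
    "iscrowd": 1,
    # A tiny 2 x 3 mask: 1 zero, 2 ones, 3 zeros (column-major).
    "segmentation": {"size": [2, 3], "counts": [1, 2, 3]},
}

print(segmentation_kind(single))
print(segmentation_kind(crowd))
print(decode_uncompressed_rle(crowd["segmentation"]))
```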
 
  • annotation-Object keypoint
Compared with the Object Instance annotation, two fields are added: keypoints and num_keypoints.
keypoints is a flat array of length 3 * k, where k is the total number of keypoints defined for the category.
For the i-th keypoint (counting from 0), keypoints[3i] and keypoints[3i+1] are its (x, y) coordinates, and keypoints[3i+2] is a visibility flag v.
v = 0: the keypoint is not labeled; v = 1: the keypoint is labeled but not visible; v = 2: the keypoint is labeled and visible. num_keypoints is the number of labeled keypoints (v > 0).
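A minimal sketch of reading the keypoints array is shown below. It follows the COCO data format, where the array is flat and the flag is 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. The function names and the sample coordinates are made up for illustration.

```python
def split_keypoints(flat):
    """Group a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0, which is what num_keypoints records."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Three keypoints: not labeled, labeled but not visible, labeled and visible.
keypoints = [0, 0, 0, 120, 80, 1, 130, 95, 2]

print(split_keypoints(keypoints))
print(count_labeled(keypoints))
```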
 

 

category field:

 The keypoints field is an array of keypoint names, and skeleton defines the connections between keypoints (e.g., wrist and elbow). Keypoints are annotated only for the person category.
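The relationship between keypoints and skeleton can be sketched as below. The 17 keypoint names are those of the person category. The skeleton shown is only a three-edge excerpt, and it assumes the indices are 1-based positions in the keypoints list, as in the released person keypoint files. The helper name named_edges is hypothetical.

```python
person_category = {
    "supercategory": "person",
    "id": 1,
    "name": "person",
    "keypoints": [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle",
    ],
    # Excerpt only: a few edges, given as 1-based indices into "keypoints".
    "skeleton": [[16, 14], [8, 10], [9, 11]],
}


def named_edges(category):
    """Translate skeleton index pairs into pairs of keypoint names."""
    names = category["keypoints"]
    return [(names[a - 1], names[b - 1]) for a, b in category["skeleton"]]


for start, end in named_edges(person_category):
    print(start, "-", end)
```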

 

 The Image Caption annotation is very simple compared with the ones above, so its table is skipped here.
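In the COCO data format a caption annotation has only three keys: id, image_id, and caption. The sketch below uses invented ids and caption text to show how captions might be grouped per image; an image usually has several captions.

```python
from collections import defaultdict

# Caption annotations; the ids and caption text are placeholders.
caption_annotations = [
    {"id": 1, "image_id": 10, "caption": "placeholder caption one"},
    {"id": 2, "image_id": 10, "caption": "placeholder caption two"},
    {"id": 3, "image_id": 11, "caption": "placeholder caption three"},
]

# Group caption strings by the image they describe.
captions_by_image = defaultdict(list)
for annotation in caption_annotations:
    captions_by_image[annotation["image_id"]].append(annotation["caption"])

for image_id, captions in captions_by_image.items():
    print(image_id, captions)
```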


Origin www.cnblogs.com/punkcure/p/11614332.html