First, the data source
The images in COCO are sourced from the photo-sharing site Flickr.
Second, the purpose of the dataset
The dataset was created for image recognition training, mainly for the following three tasks:
(1) object instances
(2) object keypoints
(3) image captions
For each task there are two annotation files: one for the training set and one for the validation set.
Third, the annotation structure
The three tasks share three basic types of information, stored in the info, images, and licenses fields. Only the annotations field varies from task to task.
3.1 Description of the general fields
- General: info field
Example:
- General: image field
Example:
- General: license field
Example:
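As a rough illustration of the three shared fields, here is a minimal sketch of a COCO-style annotation document in Python. The top-level keys follow the COCO layout, but every value (file name, license name, URL) is a made-up placeholder, and the helper functions `missing_shared_keys` and `license_name_for_image` are hypothetical names introduced only for this example.

```python
# Hypothetical miniature annotation document; all values are placeholders,
# not data taken from the real COCO files.
coco_like = {
    "info": {"description": "example dataset", "version": "1.0", "year": 2017},
    "images": [
        {
            "id": 1,
            "file_name": "000000000001.jpg",
            "width": 640,
            "height": 480,
            "license": 1,  # refers to the "id" of an entry in "licenses"
        }
    ],
    "licenses": [
        {"id": 1, "name": "example license", "url": "https://example.com/license"}
    ],
    "annotations": [],  # the task-specific part, empty in this sketch
}

SHARED_KEYS = ("info", "images", "licenses")


def missing_shared_keys(document):
    """Return the shared top-level keys that are absent from the document."""
    return [key for key in SHARED_KEYS if key not in document]


def license_name_for_image(document, image_id):
    """Look up the license name of an image through its license id."""
    names = {lic["id"]: lic["name"] for lic in document["licenses"]}
    for image in document["images"]:
        if image["id"] == image_id:
            return names.get(image["license"])
    return None


print(missing_shared_keys(coco_like))
print(license_name_for_image(coco_like, 1))
```

The same check could be applied to a real annotation file after loading it with `json.load`.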
3.2 Description of the variant field
- annotation: object instances
iscrowd = 0: the annotation is a single, separate object whose contour is given as a polygon (a list of polygon points), i.e., the segmentation field is in polygon form
iscrowd = 1: the annotation covers a group of objects that are not separated individually, and the contour is RLE encoded, i.e., the segmentation field is in RLE form
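The two segmentation forms can be read with a few lines of Python. This is a minimal sketch, not the reference implementation: it assumes a polygon is a flat list [x1, y1, x2, y2, ...], and that the RLE form is an uncompressed dictionary with `size` ([height, width]) and `counts` (run lengths that alternate between 0s and 1s, start with 0s, and traverse the mask column by column). The function names are hypothetical.

```python
def polygon_points(flat):
    """Turn a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("polygon list must contain an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


def decode_uncompressed_rle(rle):
    """Decode an uncompressed RLE dict into a height x width mask (list of rows).

    Assumes runs alternate 0s, 1s, 0s, ... and that pixels are stored in
    column-major order.
    """
    height, width = rle["size"]
    flat = []
    value = 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    # Column-major layout: the pixel at (row y, column x) sits at x * height + y.
    return [[flat[x * height + y] for x in range(width)] for y in range(height)]


def read_segmentation(annotation):
    """Dispatch on iscrowd as described above."""
    if annotation.get("iscrowd", 0) == 1:
        return decode_uncompressed_rle(annotation["segmentation"])
    return [polygon_points(poly) for poly in annotation["segmentation"]]


# Tiny hand-made annotations for illustration only.
single = {"iscrowd": 0, "segmentation": [[10, 10, 20, 10, 20, 20]]}
crowd = {"iscrowd": 1, "segmentation": {"size": [2, 3], "counts": [1, 3, 2]}}
print(read_segmentation(single))
print(read_segmentation(crowd))
```

For real data, the pycocotools package provides mask utilities that handle both forms, including compressed RLE.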
- annotation: object keypoints
Compared with object instance annotations, two fields are added: keypoints and num_keypoints
keypoints is a flat array of length 3 * k, where k is the total number of keypoints defined for the category. num_keypoints is the number of labeled keypoints (v > 0).
For the i-th keypoint, keypoints[3 * i] and keypoints[3 * i + 1] are the coordinates (x, y), and keypoints[3 * i + 2] is a visibility flag v
v = 0: the keypoint is not labeled; v = 1: the keypoint is labeled but not visible; v = 2: the keypoint is labeled and visible
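The flat layout described above can be unpacked as follows. This is a small sketch with hypothetical helper names and hand-made coordinates; it only assumes the (x, y, v) triple layout and the meaning of v given above.

```python
def split_keypoints(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0, which is what num_keypoints records."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Three illustrative keypoints: unlabeled, labeled but hidden, labeled and visible.
example = [0, 0, 0, 120, 85, 1, 130, 90, 2]
print(split_keypoints(example))
print(count_labeled(example))
```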
category field:
The keypoints field is an array of keypoint names, and skeleton defines the connections between keypoints (e.g., between wrist and elbow). Keypoints are annotated only for the person category.
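To show how keypoints and skeleton relate, here is a sketch that resolves skeleton entries to keypoint names. It assumes that each skeleton entry is a pair of 1-based indices into the keypoints array. The category below is a shortened, hand-made stand-in with three keypoints, not the full person category from the dataset, and `named_edges` is a hypothetical helper.

```python
# Shortened illustrative category; the real person category defines more keypoints.
category = {
    "supercategory": "person",
    "id": 1,
    "name": "person",
    "keypoints": ["left_shoulder", "left_elbow", "left_wrist"],
    "skeleton": [[1, 2], [2, 3]],  # assumed to be 1-based index pairs
}


def named_edges(cat):
    """Translate skeleton index pairs into pairs of keypoint names."""
    names = cat["keypoints"]
    return [(names[a - 1], names[b - 1]) for a, b in cat["skeleton"]]


print(named_edges(category))
```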
The annotation for image captions is very simple compared with the above, so no table is given for it here.
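For completeness, a caption annotation can be handled with a sketch like the one below. It assumes each caption annotation carries an id, an image_id, and a caption string; the sample captions are invented and `captions_by_image` is a hypothetical helper.

```python
from collections import defaultdict


def captions_by_image(annotations):
    """Group caption strings by the image they describe."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


# Invented caption annotations for illustration.
caption_annotations = [
    {"id": 1, "image_id": 7, "caption": "A dog runs across a field."},
    {"id": 2, "image_id": 7, "caption": "A brown dog playing outside."},
    {"id": 3, "image_id": 9, "caption": "Two people riding bicycles."},
]
print(captions_by_image(caption_annotations))
```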