[Computer Vision] A brief introduction to the COCO Caption dataset

Recently, while working on open-domain object detection, I keep running into one dataset: the COCO Caption dataset.

Here is an introduction to the dataset.

COCO Caption dataset:

The Microsoft COCO Caption dataset builds on the earlier Microsoft Common Objects in COntext (COCO) dataset.

In the paper "Microsoft COCO Captions: Data Collection and Evaluation Server", the authors describe in detail how they built the MS COCO Caption dataset on top of the MS COCO dataset.

Briefly, for the roughly 330,000 images in the original COCO dataset, workers on Amazon's Mechanical Turk service wrote at least 5 captions for each image, for a total of more than 1.5 million captions. Amazon Mechanical Turk is a crowdsourcing platform; in other words, people are paid to write the captions by hand.
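To make the "several sentences per image" structure concrete, here is a minimal Python sketch that groups captions by image. It assumes the commonly distributed COCO caption annotation layout: a JSON object with an `images` list and an `annotations` list whose entries carry `image_id` and `caption` fields. The sample records, file names, and the helper name `group_captions` are made up for illustration.

```python
import json
from collections import defaultdict

# Tiny hand-written sample in the assumed COCO caption annotation layout.
# Real annotation files use the same keys but contain far more entries.
SAMPLE = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "example_1.jpg"},
    {"id": 2, "file_name": "example_2.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog runs across a grassy field."},
    {"id": 11, "image_id": 1, "caption": "A brown dog playing outside."},
    {"id": 12, "image_id": 2, "caption": "Two people ride bicycles on a road."}
  ]
}
""")


def group_captions(dataset):
    """Return a dict mapping image id -> list of caption strings."""
    captions = defaultdict(list)
    for ann in dataset["annotations"]:
        captions[ann["image_id"]].append(ann["caption"].strip())
    return dict(captions)


captions_by_image = group_captions(SAMPLE)
for image in SAMPLE["images"]:
    refs = captions_by_image.get(image["id"], [])
    print(image["file_name"], len(refs), "captions")
```

With a real annotation file, the same grouping step would be applied after `json.load` on the opened file; every image would then be expected to map to at least 5 captions.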

In fact, the COCO Caption dataset contains two datasets:

  • The first dataset is MS COCO c5. Its training, validation, and test images are the same as those of the original MS COCO dataset, except that each image has 5 human-written reference captions;
  • The second dataset is MS COCO c40. It contains only 5,000 images, randomly selected from the test set of the MS COCO dataset. Unlike c5, each of its images has 40 human-written reference captions.

MS COCO c40 was created because many automatic evaluation metrics for algorithm-generated captions correlate better with human judgment when more reference captions are available. A possible next step is to collect 40 human-written captions for all the images in the MS COCO validation set as well.
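The intuition behind using more references can be illustrated with a toy metric. The sketch below is not BLEU, METEOR, or CIDEr as used in practice; it is a simplified clipped unigram precision (in the spirit of BLEU-1, without a brevity penalty), and the sentences and the function name `clipped_unigram_precision` are invented for illustration. It only shows the mechanism: with more references, more valid wordings of the same scene are covered, so a reasonable caption is less likely to be penalized for a word choice that a single reference happens not to use.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Fraction of candidate words that are matched in the references.

    Each candidate word count is clipped by the maximum count of that
    word in any single reference, as in BLEU-style modified precision.
    """
    cand_counts = Counter(candidate.lower().split())
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    matched = sum(
        min(count, max_ref_counts[word]) for word, count in cand_counts.items()
    )
    total = sum(cand_counts.values())
    return matched / total if total else 0.0


candidate = "a puppy runs on the grass"
references = [
    "a dog runs across a field",
    "a small puppy is playing outside",
    "a dog running on the grass",
]

score_one_ref = clipped_unigram_precision(candidate, references[:1])
score_all_refs = clipped_unigram_precision(candidate, references)
print(score_one_ref, score_all_refs)
```

Against the first reference alone, only "a" and "runs" match; once all three references are available, every word of the candidate is covered.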

Summary:

In short, the MS COCO Caption dataset was created for the image captioning problem. It offers a large number of images with their captions, along with a ready-made evaluation server and evaluation code. Judging from the papers published at top venues so far, the MS COCO Caption dataset has increasingly become the first choice of researchers.
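As a rough illustration of what gets scored, the sketch below serializes generated captions into a results list. It assumes the results layout documented for the COCO captioning task, a JSON list of objects with an `image_id` and a `caption`; the `predictions` dictionary stands in for the output of a hypothetical captioning model, and the exact submission requirements should be checked against the official instructions.

```python
import json

# Hypothetical model outputs: image id -> generated caption.
predictions = {
    1: "a dog runs across a field",
    2: "two people riding bicycles",
}

# Assumed results layout: a JSON list of {"image_id": int, "caption": str}.
results = [
    {"image_id": image_id, "caption": caption}
    for image_id, caption in sorted(predictions.items())
]

results_json = json.dumps(results, indent=2)
print(results_json)
```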

Origin blog.csdn.net/wzk4869/article/details/129800523