Introduction to and use of the COCO dataset

        This article is a set of notes I made while preparing for the postgraduate entrance exam, mainly for my own reference. It combines the original paper and its source code, and borrows from some Bilibili videos and explanations in other people's blogs. Links are posted at the end of the article.

Table of contents

Introduction to the COCO dataset

COCO dataset annotation format

Installation of pycocotools and brief introduction of coco api

Installation of pycocotools

A brief introduction to coco api

pycocotools simple usage example


Introduction to the COCO dataset

        The COCO dataset is a dataset funded by Microsoft and is mainly used for object detection, segmentation, and image captioning. According to the official website, it has the following characteristics:

  • Object segmentation
  • Recognition in context: scene-level recognition
  • Superpixel stuff segmentation
  • 330K images (>200K labeled): 330K images, of which more than 200K are labeled
  • 1.5 million object instances
  • 80 object categories
  • 91 stuff categories
  • 5 captions per image
  • 250K people with keypoints: 250,000 person instances annotated with keypoints

        The first few features are easy to understand and correspond to several popular research directions. The main source of confusion is the distinction between 80 object categories and 91 stuff categories. Let's explain:

  • For the so-called "stuff categories", the description in the paper is: "stuff" categories include materials and objects with no clear boundaries (sky, street, grass). That is, 91 kinds of things without clear boundaries (such as sky, streets, and grass) are annotated.
  • Second, note the difference between the 80 object categories and the 91 stuff categories. The paper devotes a paragraph to their differences. Simply put, the 80 categories are a subset of the 91 categories, with some categories that are hard to classify or easily confused removed. The examples below use this 80-category set.

The 80 categories are:

person  
bicycle, car, motorbike, aeroplane, bus, train, truck, boat  
traffic light, fire hydrant, stop sign, parking meter, bench  
bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe  
backpack, umbrella, handbag, tie, suitcase  
frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket  
bottle, wine glass, cup, fork, knife, spoon, bowl  
banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake  
chair, sofa, pottedplant, bed, dining table, toilet  
tvmonitor, laptop, mouse, remote, keyboard, cell phone  
microwave, oven, toaster, sink, refrigerator  
book, clock, vase, scissors, teddy bear, hair drier, toothbrush


COCO dataset annotation format

        First download the dataset from the COCO official website. This article takes COCO 2017 as an example: download train2017, val2017, and the annotations, and organize them into a coco2017 directory:

coco2017
     ├── train2017: training set images
     ├── val2017: validation set images
     └── annotations
               ├── instances_train2017.json: Annotation file of the training set for object detection and segmentation tasks
               ├── instances_val2017.json: Annotation file of the validation set for object detection and segmentation tasks
               ├── captions_train2017.json: Annotation file of the training set for image captioning
               ├── captions_val2017.json: Annotation file of the validation set for image captioning
               ├── person_keypoints_train2017.json: Annotation file of the training set for human keypoint detection
               └── person_keypoints_val2017.json: Annotation file of the validation set for human keypoint detection

         Among the annotation files, this article only looks at the two object-detection files, instances_train2017.json and instances_val2017.json.

        Next, let's analyze the useful information in the annotation file. Use the json library to view it with the following program:

import json
file_path = './instances_val2017.json'
json_info = json.load(open(file_path,'r'))
print(json_info["info"])

        Then set a breakpoint at the fourth line (the print statement) and debug; the variable view shows the following information:

        The "info" field and the "licenses" field are completely useless and are not interpreted.

        I mainly focus on the "images" field: clicking on it shows the information for all images, as in the figure below.

         Take the first "0000" as an example, as shown in the figure below, the main information inside is "file_name", indicating the file name of the picture; "coco_url", indicating the url address that can be downloaded to this picture; "height" and " "weight" indicates the height and width of the picture; and the rest of the information is of no use to me, so I won't go into details here.

         Next, look at the "annotations" field. Expanding it shows information similar in structure to the "images" field:

        Again taking the expansion of entry 00000 as an example: the "segmentation" field gives the coordinate points of the segmented object; the "area" field gives the object's area; the "iscrowd" field indicates whether the annotated objects in the image overlap; the "image_id" field is the id of the image the annotation belongs to; the "bbox" field is the bounding box. Pay attention to the bounding box format here: the first two values are the coordinates of the top-left corner of the box, and the last two values are its width and height. The "category_id" field is the category of the object (its index among the 91 categories). Other information will not be described in detail.
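        As a quick sanity check on the bbox convention, here is a single hand-made annotation entry in the format just described (the numbers are invented). Note that "area" is the area of the segmentation mask, so it is generally smaller than the w × h area of the bounding box:

```python
# A hand-made annotation entry in the instances_*.json format.
ann = {
    "image_id": 139,
    "category_id": 64,
    "iscrowd": 0,
    "bbox": [236.98, 142.51, 24.7, 69.5],  # [x, y, w, h]
    "area": 1035.75,                       # segmentation-mask area
}

x, y, w, h = ann["bbox"]
bbox_area = w * h
print(round(bbox_area, 2))        # 1716.65
print(ann["area"] <= bbox_area)   # True: the mask fits inside the box
```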

         The last field is "categories". It has length 80, meaning it lists the 80 categories. Expanding it shows, for each entry, the category's name and the supercategory it belongs to (for example, bicycles and cars belong to vehicle). Not too much detail here; see the figure below:
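        The category list is easy to turn into lookup tables. A small sketch over two hand-made entries (real entries in the file carry the same three keys):

```python
# Hypothetical subset of the "categories" list from an instances_*.json file.
categories = [
    {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    {"id": 3, "name": "car", "supercategory": "vehicle"},
]

# Map category id -> name, and category id -> supercategory.
id_to_name = {c["id"]: c["name"] for c in categories}
id_to_super = {c["id"]: c["supercategory"] for c in categories}

print(id_to_name[3])    # car
print(id_to_super[2])   # vehicle
```

Note that the ids come from the 91-category numbering, so they are not a contiguous 1 to 80 range; always go through a mapping like this instead of using the id as a direct index.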

        So far, the annotation information of the coco dataset is basically introduced.


Installation of pycocotools and brief introduction of coco api

Installation of pycocotools

        I installed pycocotools from the Anaconda Prompt in a Windows environment with the following command (on Linux or macOS, the package is named pycocotools and is installed with pip install pycocotools):

pip install pycocotools-windows

A brief introduction to coco api

  Initialize the coco instance:

from pycocotools.coco import COCO
val_annotation_file = './instances_val2017.json'
coco = COCO(annotation_file = val_annotation_file)

The coco variable is shown in the following figure:

 Right-click on COCO and choose "Go to" → "Implementation"; we can see that the COCO class contains the following functions:

         COCO.getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None): the main parameter is imgIds, the id of the input image; it returns the list of ids of all annotations for that image. The detailed parameters are shown in the figure below:

        getAnnIds is usually used together with loadAnns. COCO.loadAnns(self, ids=[]) takes the annotation ids and returns the corresponding annotation records. The detailed parameter list is shown in the figure below:
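        Conceptually, getAnnIds filters the raw "annotations" list by image id, and loadAnns looks records up by annotation id. Below is a simplified plain-Python sketch of that behaviour over hand-made data (the real implementation also supports the catIds, areaRng, and iscrowd filters):

```python
# Hand-made annotations: two for image 139, one for image 285.
annotations = [
    {"id": 1, "image_id": 139, "category_id": 64, "bbox": [236.98, 142.51, 24.7, 69.5]},
    {"id": 2, "image_id": 139, "category_id": 72, "bbox": [5.66, 167.76, 149.2, 94.9]},
    {"id": 3, "image_id": 285, "category_id": 23, "bbox": [0.0, 50.0, 100.0, 200.0]},
]

def get_ann_ids(anns, img_id):
    """Return the ids of all annotations belonging to one image (like getAnnIds)."""
    return [a["id"] for a in anns if a["image_id"] == img_id]

def load_anns(anns, ids):
    """Return the full annotation records for the given ids (like loadAnns)."""
    by_id = {a["id"]: a for a in anns}
    return [by_id[i] for i in ids]

ids = get_ann_ids(annotations, 139)
print(ids)                                             # [1, 2]
print(load_anns(annotations, ids)[0]["category_id"])   # 64
```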

         Note that each returned dictionary contains a series of fields. The bbox field holds the box as x, y, w, h: the first two values are the coordinates of the top-left corner, and the latter two are the width and height of the box. The detailed form is shown in the figure below:
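        Since PIL's rectangle drawing expects corner coordinates while COCO stores x, y, w, h, a small conversion helper is handy (the function name here is my own):

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to (x1, y1, x2, y2) corner form."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

print(xywh_to_xyxy([10.0, 20.0, 30.0, 40.0]))  # (10.0, 20.0, 40.0, 60.0)
```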

         COCO.loadImgs(self, ids=[]): given an image id, returns the detailed information of the corresponding image, as shown in the figure below:

pycocotools simple usage example

        The following program uses the api above to read an image from the validation set and draw its bounding boxes and category labels:

import os
from pycocotools.coco import COCO
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

val_annotation_file = "./annotations/instances_val2017.json"
val_img_file = './val2017'

coco = COCO(annotation_file=val_annotation_file)
coco_classes = {v["id"]: v["name"] for v in coco.cats.values()}

idx = list(sorted(coco.imgs.keys()))
img_id = idx[0]  # the smallest image id after sorting is 139, i.e. img_id = 139

ann_idx = coco.getAnnIds(imgIds=img_id)
objects = coco.loadAnns(ann_idx)
# Get the image
## Get the image file name
path = coco.loadImgs(img_id)[0]["file_name"]
## Read image No. 139
img = Image.open(os.path.join(val_img_file, path)).convert('RGB')
# Draw the bounding boxes on the image
draw = ImageDraw.Draw(img)
## An image may contain several boxes; draw each of them
for obj in objects:
    x, y, w, h = obj["bbox"]
    x1, y1, x2, y2 = x, y, int(x + w), int(y + h)
    draw.rectangle((x1, y1, x2, y2))
    draw.text((x1, y1), coco_classes[obj["category_id"]])
## Display with matplotlib
plt.imshow(img)
plt.show()

 The final result is shown in the figure:


Introduction to MS COCO dataset and simple use of pycocotools

 COCO dataset paper download address


Origin blog.csdn.net/Tiao_12/article/details/120270913