Annotation format for the PASCAL VOC dataset

The PASCAL VOC Challenge (PASCAL Visual Object Classes) is a well-known computer vision challenge. PASCAL stands for Pattern Analysis, Statistical Modelling and Computational Learning, and VOC stands for Visual Object Classes. Many strong computer vision models for classification, localization, detection, segmentation, and action recognition were developed and evaluated on the PASCAL VOC challenge and its datasets.

The first PASCAL VOC challenge was held in 2005, and it then ran every year until 2012.

Anyone working on deep-learning object detection or semantic segmentation will come across the PASCAL VOC dataset. You may rarely use the whole dataset, but most public code is written against the VOC or COCO data format, and you are generally expected to prepare your own dataset in the same format. This article therefore explains the PASCAL VOC format in detail, including the directory structure and the contents of each folder, so that you can build your own datasets in the standard VOC layout and speed up your projects.

1. Overview of the directory structure

The top-level folder is named VOCdevkit. This article uses the 2012 release, since the 2007 release is too old.

.
└── VOCdevkit                    # root directory
    └── VOC2012                  # one dataset year; only 2012 is downloaded here, other years such as 2007 also exist
        ├── Annotations          # XML files, one per image in JPEGImages, describing the image content
        ├── ImageSets            # TXT files; each line holds an image name, and in the per-class files a trailing 1 or -1 marks positive and negative samples
        │   ├── Action           # action recognition task, not covered here
        │   ├── Layout           # person layout task, not covered here
        │   ├── Main
        │   │   ├── train.txt    # training set
        │   │   ├── trainval.txt # training set plus validation set
        │   │   ├── val.txt      # validation set
        │   │   └── test.txt     # test set
        │   └── Segmentation     # semantic segmentation
        ├── JPEGImages           # source images
        ├── SegmentationClass    # label images for semantic segmentation
        └── SegmentationObject   # label images for instance segmentation
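When preparing your own dataset in this layout, the empty directory tree can be created up front. The following is a minimal sketch, not part of any official tooling: the function name `make_voc_skeleton` and the `VOC_SUBDIRS` list are hypothetical names introduced here, and the subdirectories simply mirror the tree above.

```python
from pathlib import Path

# Subdirectories of a VOC-style dataset, relative to VOCdevkit/VOC<year>.
# This list mirrors the tree shown in this article.
VOC_SUBDIRS = [
    "Annotations",
    "ImageSets/Action",
    "ImageSets/Layout",
    "ImageSets/Main",
    "ImageSets/Segmentation",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
]


def make_voc_skeleton(root, year="2012"):
    """Create an empty VOC-style directory tree and return the dataset directory."""
    dataset_dir = Path(root) / "VOCdevkit" / f"VOC{year}"
    for sub in VOC_SUBDIRS:
        # parents=True also creates VOCdevkit/VOC<year> and ImageSets as needed.
        (dataset_dir / sub).mkdir(parents=True, exist_ok=True)
    return dataset_dir
```

Calling `make_voc_skeleton("data")` would create `data/VOCdevkit/VOC2012` with all subfolders; rerunning it is harmless because existing directories are left alone.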

2. Contents of each folder

※ The Annotations folder stores XML files that describe the images. Each image has one XML file with the same name.

※ The ImageSets folder stores TXT files that divide the images of the dataset into subsets. For example, train.txt under Main lists the images used for training.

※ The JPEGImages folder stores the original images of the dataset.

※ The SegmentationClass and SegmentationObject folders store images that hold the segmentation results (label maps).
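Because every image is expected to have an XML file with the same name, a quick consistency check is useful when assembling your own dataset. The sketch below assumes the images use the `.jpg` extension and the annotations use `.xml`; the function name `find_unpaired` is hypothetical.

```python
from pathlib import Path


def find_unpaired(dataset_dir):
    """Return (images without XML, XML files without an image) as sorted lists of stems."""
    dataset_dir = Path(dataset_dir)
    # Compare file names without extension, e.g. "2007_000027".
    image_stems = {p.stem for p in (dataset_dir / "JPEGImages").glob("*.jpg")}
    xml_stems = {p.stem for p in (dataset_dir / "Annotations").glob("*.xml")}
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)
```

Two empty lists mean that JPEGImages and Annotations are in one-to-one correspondence.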

2.1, Annotations folder

The Annotations folder contains one XML file per image. Each XML file records the basic information about its image: the folder it belongs to, the file name, the source, the image size, and the objects in the image together with their attributes. The main fields are:

  • filename: the image file name

  • source, owner: the image source and its owner

  • size: the image size (width, height, depth)

  • segmented: whether the image has segmentation annotations

  • object: one annotated target; its child elements describe that target

    • name: the object category, one of 20 classes
    • pose: the viewing angle: frontal, rear, left, right, or unspecified
    • truncated: whether the object is truncated, for example because it extends beyond the image or is occluded (by more than 15%)
    • difficult: whether the object is hard to detect, judged mainly by the object size, lighting changes, and image quality
    • bndbox: the four coordinates of the bounding box, that is, the top-left corner (xmin, ymin) and the bottom-right corner (xmax, ymax)


<annotation>
  <folder>VOC2012</folder>                        <!-- dataset folder the image belongs to -->
  <filename>2007_000027.jpg</filename>            <!-- image file name -->
  <source>                                        <!-- information about the image source -->
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
  </source>
  <size>                                          <!-- image size -->
    <width>486</width>
    <height>500</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>                        <!-- whether the image has segmentation annotations -->
  <object>                                        <!-- one annotated object -->
    <name>person</name>                           <!-- object category -->
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>                                      <!-- bounding box of the object -->
      <xmin>174</xmin>
      <ymin>101</ymin>
      <xmax>349</xmax>
      <ymax>351</ymax>
    </bndbox>
    <part>                                        <!-- head of the person -->
      <name>head</name>
      <bndbox>
        <xmin>169</xmin>
        <ymin>104</ymin>
        <xmax>209</xmax>
        <ymax>146</ymax>
      </bndbox>
    </part>
    <part>                                        <!-- hand of the person -->
      <name>hand</name>
      <bndbox>
        <xmin>278</xmin>
        <ymin>210</ymin>
        <xmax>297</xmax>
        <ymax>233</ymax>
      </bndbox>
    </part>
    <part>                                        <!-- first foot -->
      <name>foot</name>
      <bndbox>
        <xmin>273</xmin>
        <ymin>333</ymin>
        <xmax>297</xmax>
        <ymax>354</ymax>
      </bndbox>
    </part>
    <part>                                        <!-- second foot -->
      <name>foot</name>
      <bndbox>
        <xmin>319</xmin>
        <ymin>307</ymin>
        <xmax>340</xmax>
        <ymax>326</ymax>
      </bndbox>
    </part>
  </object>
</annotation>
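An annotation like the one above can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch under a few assumptions: the function name `parse_voc_annotation` and the shape of the returned dictionary are choices made here for illustration, the embedded `SAMPLE_XML` is a shortened copy of the example, and `part` elements are ignored.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example annotation (parts omitted for brevity).
SAMPLE_XML = """
<annotation>
  <filename>2007_000027.jpg</filename>
  <size><width>486</width><height>500</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox><xmin>174</xmin><ymin>101</ymin><xmax>349</xmax><ymax>351</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Extract the file name, image size, and object boxes from a VOC XML string."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    objects = []
    # findall("object") only matches direct children, so nested <part> boxes are skipped.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": obj.findtext("difficult", default="0") == "1",
            # float() first, in case a tool wrote coordinates such as "174.0".
            "bbox": tuple(int(float(box.findtext(tag)))
                          for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


annotation = parse_voc_annotation(SAMPLE_XML)
```

For a file on disk, the same function could be fed with `Path(xml_path).read_text()`. The bounding box is kept as (xmin, ymin, xmax, ymax), matching the order used in the XML.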


2.2, ImageSets folder

ImageSets contains four subfolders: Action, Layout, Main, and Segmentation.

Each subfolder holds TXT files for a particular task. For example, the Main folder contains a file named aeroplane_train.txt, which, as the name implies, lists the training data for the aeroplane category. The image names in these files are written without the file extension and look like this:


2008_000008

2008_000015

2008_000019

2008_000023

2008_000028


The train.txt and trainval.txt files look similar. They contain only the image names, with no image attributes or path information.
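Reading such a split file takes only a few lines. The sketch below assumes that each line holds an image name, optionally followed by whitespace and an integer label as in the per-class files; the function name `read_image_set` is hypothetical.

```python
def read_image_set(path):
    """Return a list of (image_id, label) pairs; label is None when the line has no label."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            image_id = parts[0]
            # Assumption: a second column, when present, is an integer label such as 1 or -1.
            label = int(parts[1]) if len(parts) > 1 else None
            entries.append((image_id, label))
    return entries
```

Since the files carry no path information, an image path has to be rebuilt from the name, for example as `JPEGImages/<image_id>.jpg` relative to the dataset folder.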

2.3, JPEGImages folder

This folder stores all the source images of the dataset.

2.4, SegmentationClass folder

This folder relates to semantic segmentation. It stores the label maps for the source images that have segmentation annotations. Note that the labels are the integers 0, 1, 2, 3, ...
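To make the integer labels concrete, the sketch below counts how many pixels of each class appear in a label map. It rests on assumptions that this article does not state: label 0 is background, labels 1 to 20 follow the alphabetical order of the 20 VOC class names, and 255 marks void or boundary pixels. Decoding the PNG file is left out; `label_map` is a plain nested list of integers, and the names `label_name` and `count_labels` are hypothetical.

```python
from collections import Counter

# Assumed index-to-name mapping: 0 is background, 1..20 are the VOC classes.
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
VOID_LABEL = 255  # assumed value for void/boundary pixels


def label_name(value):
    """Map an integer label to a readable name."""
    if value == VOID_LABEL:
        return "void"
    if 0 <= value < len(VOC_CLASSES):
        return VOC_CLASSES[value]
    return f"unknown({value})"


def count_labels(label_map):
    """Count pixels per class in a 2-D label map given as rows of integers."""
    counts = Counter(value for row in label_map for value in row)
    return {label_name(value): n for value, n in sorted(counts.items())}
```

For a real label image, `label_map` would be obtained from an image library of your choice, reading the stored indices rather than the displayed colours.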

2.5, SegmentationObject folder

This folder relates to instance segmentation. As with SegmentationClass, the labels are the integers 0, 1, 2, 3, ...


Origin blog.csdn.net/weixin_38353277/article/details/128716519