Brief description of the VOC dataset

1. Understanding VOC

VOC usually refers both to the PASCAL VOC dataset itself and to its annotation layout; a dataset organized the same way is said to be in VOC format.

The PASCAL VOC (Visual Object Classes) Challenge is a well-known computer vision benchmark.
It mainly includes the following tasks:

  • Image classification (Object Classification)
  • Object Detection
  • Object Segmentation
  • Action recognition (Action Classification), etc.

1.1 VOC dataset download

The two versions in common use today are VOC2007 and VOC2012.
Link address: VOC official website
VOC2007 dataset download
VOC2012 dataset download
Download with a browser or with Thunder (usually the fastest).

2. VOC file structure

To open a shell in the dataset folder on Windows, hold Shift and right-click inside the folder. The layout below uses VOC2012 as an example.

VOCdevkit
    └── VOC2012
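         ├── Annotations               annotation files for all images (XML)
         ├── ImageSets
         │   ├── Action                image lists for person actions
         │   ├── Layout                image lists for person body parts
         │   │
         │   ├── Main                  image lists for detection and classification
         │   │     ├── train.txt       training set (5717)
         │   │     ├── val.txt         validation set (5823)
         │   │     └── trainval.txt    training + validation set (11540)
         │   │
         │   └── Segmentation          image lists for segmentation
         │         ├── train.txt       training set (1464)
         │         ├── val.txt         validation set (1449)
         │         └── trainval.txt    training + validation set (2913)
         │
         ├── JPEGImages                all image files
         ├── SegmentationClass         semantic segmentation png masks (by class)
         └── SegmentationObject        instance segmentation png masks (by object)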

2.1 Annotations

Label files are stored in XML format, one per image in JPEGImages. Each XML file records the location and category (C = 20 classes) of every annotated target. It normally has the same name as its image apart from the extension, and coordinates are stored as (x, y) pixel positions. LabelImg can be used for labeling and viewing.
XML file walkthrough:

<annotation>
    <folder>VOC2012</folder>
    <filename>2007_000392.jpg</filename> <!-- file name -->
    <source>                             <!-- image source (not important) -->
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
    </source>
    <size>                              <!-- image size (width, height, number of channels) -->
        <width>500</width>
        <height>332</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>            <!-- whether the image is used for segmentation (0 or 1 does not matter for detection) -->
    <object>                            <!-- an annotated object -->
        <name>horse</name>              <!-- object category -->
        <pose>Right</pose>              <!-- viewing angle -->
        <truncated>0</truncated>        <!-- whether the object is truncated (0 means complete) -->
        <difficult>0</difficult>        <!-- whether the object is hard to recognize (0 means easy) -->
        <bndbox>                        <!-- bounding box (xy coordinates of the top-left and bottom-right corners) -->
            <xmin>100</xmin>
            <ymin>96</ymin>
            <xmax>355</xmax>
            <ymax>324</ymax>
        </bndbox>
    </object>
    <object>              <!-- one <object> element per annotated object -->
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>198</xmin>
            <ymin>58</ymin>
            <xmax>286</xmax>
            <ymax>197</ymax>
        </bndbox>
    </object>
</annotation>
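
The fields above can be read with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, not an official loader: the function name parse_voc_xml and the abridged SAMPLE string are illustrative assumptions, and only a few of the fields are extracted.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Return (filename, (width, height), objects) from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    objects = []
    # Only direct <object> children of <annotation> are annotated targets.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        bbox = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            "bbox": bbox,  # (xmin, ymin, xmax, ymax) in pixels
        })
    return root.findtext("filename"), (width, height), objects


# Abridged copy of the annotation shown above (illustrative only).
SAMPLE = """<annotation>
  <filename>2007_000392.jpg</filename>
  <size><width>500</width><height>332</height><depth>3</depth></size>
  <object>
    <name>horse</name><difficult>0</difficult>
    <bndbox><xmin>100</xmin><ymin>96</ymin><xmax>355</xmax><ymax>324</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>0</difficult>
    <bndbox><xmin>198</xmin><ymin>58</ymin><xmax>286</xmax><ymax>197</ymax></bndbox>
  </object>
</annotation>"""

filename, size, objects = parse_voc_xml(SAMPLE)
for obj in objects:
    print(filename, size, obj["name"], obj["bbox"])
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring(xml_text).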

2.2 ImageSets

  • Action: stores data on people's actions
  • Layout: stores data with human body parts (head, feet, etc.)
  • Main: stores the image lists for classification and object detection
  • Segmentation: stores the image lists for segmentation

Training and validation splits

  • train.txt: the training set.
    Each line is an image name without the file extension, in the form 2007_000392. The per-class files such as aeroplane_train.txt have two columns: the first is the image name and the second is a flag, where 1 means the class appears in the image, -1 means it does not, and 0 means it appears only as objects marked difficult.
  • val.txt: the validation set
  • trainval.txt: the training and validation sets combined
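
A two-column list of this kind can be parsed in a few lines. The sketch below is only an illustration: the helper name parse_class_split is hypothetical, and the image names in sample are made up to show the layout rather than copied from a real split file.

```python
def parse_class_split(lines):
    """Parse 'image_name flag' lines into a {image_name: flag} dict."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        image_id, flag = parts
        labels[image_id] = int(flag)  # 1 = present, -1 = absent
    return labels


# Made-up lines in the two-column layout (names are illustrative).
sample = [
    "2008_000008 -1",
    "2008_000015  1",
    "2008_000019 -1",
    "",
]

labels = parse_class_split(sample)
positives = [name for name, flag in labels.items() if flag == 1]
print(positives)
```

With a real file, the lines would come from something like open(path).read().splitlines().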

2.3 JPEGImages

All images used by every split and task, 17125 in total for VOC2012.

2.4 SegmentationClass

2.5 SegmentationObject

3. Target detection task

How to use the data in the dataset for target detection?

  1. First read a txt file in VOC2012\ImageSets\Main. Besides train.txt, val.txt and trainval.txt, the folder holds per-class lists:
    • xxx_train: the training set for class xxx
    • xxx_val: the validation set for class xxx
    • xxx_trainval: the training and validation sets for class xxx
  2. Use each index from the list to find the corresponding annotation file (.xml) in VOC2012\Annotations.
  3. Find the corresponding picture in the JPEGImages folder through the filename field of the annotation file. For example, the filename field in 2007_000323.xml is 2007_000323.jpg, so the image is 2007_000323.jpg in the JPEGImages folder.
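
The lookup steps can be sketched with pathlib. This is an illustration under stated assumptions: the helper names read_image_ids and detection_paths are hypothetical, the root folder VOCdevkit/VOC2012 follows the tree shown earlier, and the image path is built from the index directly instead of reading the filename field, which assumes the two always share the same stem.

```python
from pathlib import Path


def read_image_ids(split_text):
    """Return the first token of every non-empty line of a split file."""
    return [line.split()[0] for line in split_text.splitlines() if line.strip()]


def detection_paths(voc_root, image_id):
    """Build the annotation and image paths for one index."""
    root = Path(voc_root)
    xml_path = root / "Annotations" / f"{image_id}.xml"
    # Assumption: the image has the same stem as the annotation file.
    jpg_path = root / "JPEGImages" / f"{image_id}.jpg"
    return xml_path, jpg_path


# In real use the text would come from
# Path("VOCdevkit/VOC2012/ImageSets/Main/train.txt").read_text()
split_text = "2007_000323\n2007_000392\n"

ids = read_image_ids(split_text)
pairs = [detection_paths("VOCdevkit/VOC2012", image_id) for image_id in ids]
for xml_path, jpg_path in pairs:
    print(xml_path, jpg_path)
```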

4. Semantic Segmentation Task

How to use the dataset for semantic segmentation tasks?

  1. Read the corresponding txt file in VOC2012\ImageSets\Segmentation
  └── Segmentation          image lists for segmentation
        ├── train.txt       training set (1464)
        ├── val.txt         validation set (1449)
        └── trainval.txt    training + validation set (2913)

  2. Find the corresponding picture in VOC2012\JPEGImages
  3. Find the corresponding annotation image (png) in VOC2012\SegmentationClass

Note that when a label image (.png) for semantic segmentation is opened with PIL's Image.open(), it is in P (palette) mode by default, which is a single-channel image. Background pixels have the value 0. Pixels along target edges have the value 255, and areas with the value 255 are generally ignored during training. Each target area is filled with the category index of the target; for example, the index for person is 15, so a person region is filled with 15.
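
The pixel conventions can be checked with a short script. This is a sketch under assumptions: load_mask_pixels and class_pixel_counts are hypothetical helper names, Pillow and NumPy are assumed to be installed for the loading step, and the toy pixel list stands in for a real mask.

```python
from collections import Counter

IGNORE_INDEX = 255  # edge / ignore value described in the text


def load_mask_pixels(png_path):
    """Read a label PNG as a flat list of class indices (needs Pillow and NumPy)."""
    import numpy as np
    from PIL import Image

    # Converting a P-mode image to an array yields the palette indices,
    # which are the class indices, not RGB colours.
    mask = np.array(Image.open(png_path))
    return mask.ravel().tolist()


def class_pixel_counts(pixels, ignore_index=IGNORE_INDEX):
    """Count pixels per class index, skipping the ignore value."""
    return Counter(p for p in pixels if p != ignore_index)


# Toy mask: background (0), a person region (15) and ignored edge pixels (255).
toy_pixels = [0, 0, 15, 15, 15, 255, 255, 0]
counts = class_pixel_counts(toy_pixels)
print(dict(counts))
```

Calling .convert("RGB") before building the array would replace the indices with palette colours, so the mask is usually left in P mode.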

5. Instance Segmentation Task

Note that when a label image (.png) for instance segmentation is opened with PIL's Image.open(), it is in P mode by default, which is a single-channel image. Background pixels have the value 0, and pixels on target edges or in areas to be ignored have the value 255 (generally ignored during training). Next, find the corresponding xml file in the Annotations folder. Parsing it gives the information for each target, and the pixel values of the targets in the label image (.png) follow the order of the targets in the xml file. In other words, the serial number of each target in the xml file corresponds to the pixel value of that target in the label image.
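
The correspondence between pixel values and xml order can be sketched as a small lookup. This is an illustration only: the helper name instance_id_to_name is hypothetical, and it assumes that numbering starts at 1, so pixel value i refers to the i-th object element in the xml file. That starting index is an assumption to verify against real data.

```python
def instance_id_to_name(object_names, pixel_values, ignore_index=255):
    """Map each instance pixel value to the name of the matching xml object."""
    instance_ids = sorted(
        v for v in set(pixel_values) if v not in (0, ignore_index)
    )
    mapping = {}
    for instance_id in instance_ids:
        # Assumption: value i corresponds to the i-th <object> (1-based).
        if 1 <= instance_id <= len(object_names):
            mapping[instance_id] = object_names[instance_id - 1]
    return mapping


# Object names in xml order, and a toy flattened instance mask.
names_in_xml_order = ["horse", "person"]
toy_pixels = [0, 1, 1, 2, 255, 0, 2]

mapping = instance_id_to_name(names_in_xml_order, toy_pixels)
print(mapping)
```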

6. Correspondence between category index and name

{
    "background": 0,
    "aeroplane": 1,
    "bicycle": 2,
    "bird": 3,
    "boat": 4,
    "bottle": 5,
    "bus": 6,
    "car": 7,
    "cat": 8,
    "chair": 9,
    "cow": 10,
    "diningtable": 11,
    "dog": 12,
    "horse": 13,
    "motorbike": 14,
    "person": 15,
    "pottedplant": 16,
    "sheep": 17,
    "sofa": 18,
    "train": 19,
    "tvmonitor": 20
}
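
Reading a mask usually needs the reverse lookup, from index to name. The sketch below copies the table above into a dict and inverts it; the variable names VOC_CLASSES and INDEX_TO_NAME are illustrative choices.

```python
VOC_CLASSES = {
    "background": 0, "aeroplane": 1, "bicycle": 2, "bird": 3, "boat": 4,
    "bottle": 5, "bus": 6, "car": 7, "cat": 8, "chair": 9, "cow": 10,
    "diningtable": 11, "dog": 12, "horse": 13, "motorbike": 14,
    "person": 15, "pottedplant": 16, "sheep": 17, "sofa": 18,
    "train": 19, "tvmonitor": 20,
}

# Invert the mapping so a pixel value can be turned back into a class name.
INDEX_TO_NAME = {index: name for name, index in VOC_CLASSES.items()}

print(INDEX_TO_NAME[15])
```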



Origin blog.csdn.net/qq_43718758/article/details/128065362