Analysis of VOC2007/2012 Data Set

Download address (mirror of the official data): https://pjreddie.com/projects/pascal-voc-dataset-mirror/

Challenge tasks of PASCAL VOC

  • Classification/Detection Competitions
    Classification: for each of the 20 classes, determine whether an object of that class is present in the test image;
    Detection: locate each target object in the test image and give its bounding box coordinates.
  • Segmentation Competition
    Object segmentation
  • Action Classification Competition
    Human action recognition
  • Large Scale Visual Recognition Competition (ImageNet)
    ImageNet large-scale visual recognition competition
  • Person Layout Taster Competition
    Person layout (locating the head, hands, and feet of a person)

VOC2007 basic information

VOC2007 contains 9,963 images in total: 5,011 in the training/validation (trainval) set and 4,952 in the test set. The objects belong to 20 categories:

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor 

Download

Download training and validation sets, test sets, toolkits:

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar

Extract all three tar archives in the same directory; their contents unpack into a directory called VOCdevkit:

tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar

The basic structure is as follows

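└── VOCdevkit     # root directory
    └── VOC2007   # one directory per dataset year; only 2007 is downloaded here
        ├── Annotations  # XML files, one per image in JPEGImages, describing the image contents
        ├── ImageSets    # txt files that split the images into sets, e.g. Main/train.txt lists the images used for training
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages         # source images
        ├── SegmentationClass  # images (masks) for semantic segmentation
        └── SegmentationObject # images (masks) for instance segmentation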
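
The layout above can be turned into a small path helper. This is only a sketch: the function names `voc_paths` and `read_split` and the `data` root directory are made up for illustration, and it assumes that image IDs in the split files map to `<id>.jpg` and `<id>.xml`.

```python
from pathlib import Path


def voc_paths(root, image_id, year="2007"):
    """Return the image and annotation paths for one image ID."""
    base = Path(root) / "VOCdevkit" / f"VOC{year}"
    return {
        "image": base / "JPEGImages" / f"{image_id}.jpg",
        "annotation": base / "Annotations" / f"{image_id}.xml",
    }


def read_split(root, split="train", year="2007"):
    """Read image IDs from ImageSets/Main/<split>.txt, assuming one ID per line."""
    split_file = (
        Path(root) / "VOCdevkit" / f"VOC{year}" / "ImageSets" / "Main" / f"{split}.txt"
    )
    with open(split_file) as f:
        return [line.strip() for line in f if line.strip()]


# "data" is a hypothetical directory in which the tar files were extracted.
paths = voc_paths("data", "000005")
print(paths["image"])
print(paths["annotation"])
```

Once the data is actually on disk, `read_split("data", "train")` would return the list of IDs to pass to `voc_paths`.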

Annotations folder
This folder contains one annotation file per image, in XML format. XML is a markup language that looks similar to HTML. Each XML file holds the labeling result for one image; the annotation file (000005.xml) corresponding to 000005.jpg is as follows:

<annotation>
    <folder>VOC2007</folder>
    <!-- file name -->
    <filename>000005.jpg</filename>
    <!-- data source -->
    <source>
        <!-- source database -->
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <!-- the image comes from flickr, a photo-sharing website (owned by Yahoo at the time); the id below is of no use to us -->
        <image>flickr</image>
        <flickrid>325991873</flickrid>
    </source>
    <!-- owner of the image; also of no use to us -->
    <owner>
        <flickrid>archintent louisville</flickrid>
        <name>?</name>
    </owner>
    <!-- image size: width, height, depth (number of channels) -->
    <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <!-- whether the image has segmentation annotations: 1 = yes, 0 = no -->
    <segmented>0</segmented>
    <!-- the annotated objects in the image follow; each object element describes one annotated object -->
    <object>
        <!-- object name and pose (viewing angle) -->
        <name>chair</name>
        <pose>Rear</pose>
        <!-- whether the object is truncated: 0 = complete, 1 = incomplete -->
        <truncated>0</truncated>
        <!-- whether the object is hard to recognize: 0 = easy, 1 = difficult -->
        <difficult>0</difficult>
        <!-- the four coordinates of the bounding box -->
        <bndbox>
            <xmin>263</xmin>
            <ymin>211</ymin>
            <xmax>324</xmax>
            <ymax>339</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>165</xmin>
            <ymin>264</ymin>
            <xmax>253</xmax>
            <ymax>372</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>1</difficult>
        <bndbox>
            <xmin>5</xmin>
            <ymin>244</ymin>
            <xmax>67</xmax>
            <ymax>374</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>241</xmin>
            <ymin>194</ymin>
            <xmax>295</xmax>
            <ymax>299</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>1</difficult>
        <bndbox>
            <xmin>277</xmin>
            <ymin>186</ymin>
            <xmax>312</xmax>
            <ymax>220</ymax>
        </bndbox>
    </object>
</annotation>
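
An annotation file like the one above can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch, not part of the VOC toolkit: the function name `parse_voc_annotation` and the shortened `SAMPLE` string are made up for illustration, and coordinates are assumed to be numeric.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Parse a VOC annotation string into a dict with image size and objects."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    # findall("object") only visits direct children of the root element.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "pose": obj.findtext("pose"),
            "truncated": int(obj.findtext("truncated", default="0")),
            "difficult": int(obj.findtext("difficult", default="0")),
            # int(float(...)) tolerates coordinates written as "263.0".
            "bbox": tuple(
                int(float(box.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")
            ),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


# Shortened copy of 000005.xml, containing only the first object.
SAMPLE = """<annotation>
  <filename>000005.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>chair</name><pose>Rear</pose>
    <truncated>0</truncated><difficult>0</difficult>
    <bndbox><xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax></bndbox>
  </object>
</annotation>"""

info = parse_voc_annotation(SAMPLE)
print(info["filename"], info["width"], info["height"])
print(info["objects"][0]["name"], info["objects"][0]["bbox"])
```

For a file on disk, the same function could be called with the file's contents, for example `parse_voc_annotation(open(path).read())`.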

ImageSets folder
This folder stores the image lists for each type of challenge. For example, the Main folder contains a file named aeroplane_train.txt which, as the name suggests, lists the training images for the aeroplane category. Each line holds an image ID followed by a label: 1 marks a positive sample (the image contains an aeroplane), -1 marks a negative sample, and 0 marks an image where the only aeroplanes are flagged as difficult.
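
A per-class file such as aeroplane_train.txt can be read into a dictionary of labels. This is a sketch under the assumption that each line holds an image ID and a label separated by whitespace; the function name `parse_class_split` and the three sample lines are made up for illustration.

```python
def parse_class_split(lines):
    """Map image ID -> label (1 positive, -1 negative, 0 difficult)."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        image_id, flag = parts
        labels[image_id] = int(flag)
    return labels


# Illustrative lines in the "<image id> <label>" layout, not copied from the dataset.
sample = ["000005 -1", "000032  1", "000033  0"]
labels = parse_class_split(sample)
positives = [image_id for image_id, flag in labels.items() if flag == 1]
print(labels)
print(positives)
```

Because an open file yields its lines when iterated, a real file can be passed directly: `parse_class_split(open(path))`.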

Basic information of VOC2012

The VOC2012 data set is an upgraded version of the VOC2007 data set, with a total of 11530 images.

  • For the detection task, the trainval/test sets of VOC2012 contain all the corresponding images from 2008 to 2011. trainval has 11,540 images and a total of 27,450 objects.
  • For the segmentation task, the trainval set of VOC2012 contains all the corresponding images from 2007 to 2011, while the test set contains only those from 2008 to 2011. trainval has 2913 images and 6929 objects.

The VOC2012 data set has 20 object categories (21 when the background is counted as a category), grouped as follows:

Person: person 
Animal: bird, cat, cow, dog, horse, sheep 
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train 
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
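
For training code, the class names are usually mapped to integer indices. The sketch below assumes the alphabetical ordering used in the VOC2007 class list, with index 0 reserved for the background; the names `VOC_CLASSES` and `CLASS_TO_INDEX` are made up for illustration, and a detection pipeline may well use a different convention (for example, no background entry).

```python
# The 20 object classes in alphabetical order, as listed for VOC2007.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Assumption: background takes index 0, object classes take 1..20.
CLASS_TO_INDEX = {"background": 0}
CLASS_TO_INDEX.update({name: i for i, name in enumerate(VOC_CLASSES, start=1)})

# Reverse lookup, useful when decoding predictions.
INDEX_TO_CLASS = {i: name for name, i in CLASS_TO_INDEX.items()}

print(len(CLASS_TO_INDEX))
print(CLASS_TO_INDEX["person"], INDEX_TO_CLASS[20])
```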

Download and unzip

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

There are 5 folders under VOCdevkit/VOC2012; their contents are roughly the same as in VOC2007.

Annotations
The Annotations folder stores label files in XML format. Each XML file corresponds to an image in the JPEGImages folder, for a total of 17,125 files. For example:

<annotation>
	<folder>VOC2012</folder>  <!-- indicates the image source -->
	<filename>2007_000027.jpg</filename> <!-- image file name -->
	<source>                  <!-- information about the image source -->
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
	</source>
	<size>     <!-- image size -->
		<width>486</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented> <!-- whether the image is used for segmentation -->
	<object>  <!-- an object contained in the image -->
		<name>person</name> <!-- object class -->
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>  <!-- bounding box of the object -->
			<xmin>174</xmin>
			<ymin>101</ymin>
			<xmax>349</xmax>
			<ymax>351</ymax>
		</bndbox>
		<part> <!-- the person's head -->
			<name>head</name>
			<bndbox>
				<xmin>169</xmin>
				<ymin>104</ymin>
				<xmax>209</xmax>
				<ymax>146</ymax>
			</bndbox>
		</part>
		<part>   <!-- the person's hand -->
			<name>hand</name>
			<bndbox>
				<xmin>278</xmin>
				<ymin>210</ymin>
				<xmax>297</xmax>
				<ymax>233</ymax>
			</bndbox>
		</part>
		<part>
			<name>foot</name>
			<bndbox>
				<xmin>273</xmin>
				<ymin>333</ymin>
				<xmax>297</xmax>
				<ymax>354</ymax>
			</bndbox>
		</part>
		<part>
			<name>foot</name>
			<bndbox>
				<xmin>319</xmin>
				<ymin>307</ymin>
				<xmax>340</xmax>
				<ymax>326</ymax>
			</bndbox>
		</part>
	</object>
</annotation>
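
The nested `<part>` elements in the example above can be collected per person. This is a minimal sketch using the standard `xml.etree.ElementTree` module; the names `read_box` and `parse_person_parts` and the shortened `SAMPLE` string are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_box(node):
    """Read the bndbox that is a direct child of node as (xmin, ymin, xmax, ymax)."""
    box = node.find("bndbox")
    return tuple(int(float(box.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax"))


def parse_person_parts(xml_text):
    """Return one entry per person object, with its box and its part boxes."""
    root = ET.fromstring(xml_text)
    people = []
    for obj in root.findall("object"):
        if obj.findtext("name") != "person":
            continue
        # Each part has its own name (head, hand, foot) and bounding box.
        parts = [(p.findtext("name"), read_box(p)) for p in obj.findall("part")]
        people.append({"bbox": read_box(obj), "parts": parts})
    return people


# Shortened copy of 2007_000027.xml, keeping only the head and hand parts.
SAMPLE = """<annotation>
  <object>
    <name>person</name>
    <bndbox><xmin>174</xmin><ymin>101</ymin><xmax>349</xmax><ymax>351</ymax></bndbox>
    <part>
      <name>head</name>
      <bndbox><xmin>169</xmin><ymin>104</ymin><xmax>209</xmax><ymax>146</ymax></bndbox>
    </part>
    <part>
      <name>hand</name>
      <bndbox><xmin>278</xmin><ymin>210</ymin><xmax>297</xmax><ymax>233</ymax></bndbox>
    </part>
  </object>
</annotation>"""

people = parse_person_parts(SAMPLE)
print(people[0]["bbox"])
print(people[0]["parts"])
```

Since `find("bndbox")` only looks at direct children, the person's own box is not confused with the boxes nested inside its parts.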

ImageSets
There are four folders in ImageSets:

Action stores data on human actions (such as running and jumping, which are also part of the VOC challenge).

Layout stores data with human body parts (head, hands, feet, etc., which are also part of the VOC challenge).

Main stores the image object recognition data, which is divided into 20 categories.

Segmentation stores data that can be used for segmentation.

Origin blog.csdn.net/W1995S/article/details/112805724