A detailed explanation of the directory structure of the VOC data set

PASCAL VOC provides a complete set of standardized, high-quality data sets for image recognition and classification. From 2005 to 2012, an image recognition challenge was held every year. The VOC data set is available for download, mainly in two releases: 2007 and 2012.

After the download is complete, unzip it, and you will find that the contents of the folder are as follows:
[Figure: contents of the unzipped VOC folder]
For object detection, we only need to pay attention to the first three folders: Annotations, ImageSets, and JPEGImages. Let's look at the specific contents of these three folders.

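VOCdevkit
    VOC2007              # the year in the folder name is up to you, but it must match the year used in your other files (see the next step)
        Annotations      # put all the xml files here
        ImageSets
            Main         # put the train.txt and val.txt files here
        JPEGImages       # put all the image files here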
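
As a rough sketch of the layout above, the following Python snippet creates an empty folder skeleton and reports which of the expected subfolders are missing. The names `make_voc_skeleton`, `missing_voc_dirs`, and `EXPECTED_DIRS` are hypothetical and chosen only for illustration; they are not part of any VOC tooling.

```python
from pathlib import Path

# Subfolders described above, relative to VOCdevkit/VOC<year>.
EXPECTED_DIRS = ["Annotations", "ImageSets/Main", "JPEGImages"]


def make_voc_skeleton(root, year="2007"):
    """Create an empty VOC-style folder tree and return the VOC<year> path."""
    voc_dir = Path(root) / "VOCdevkit" / f"VOC{year}"
    for sub in EXPECTED_DIRS:
        (voc_dir / sub).mkdir(parents=True, exist_ok=True)
    return voc_dir


def missing_voc_dirs(voc_dir):
    """Return the expected subfolders that do not exist under voc_dir."""
    return [sub for sub in EXPECTED_DIRS if not (Path(voc_dir) / sub).is_dir()]
```

Calling `missing_voc_dirs` on an existing data set folder is a quick way to confirm that the year in the folder name and the subfolder names match what the rest of your files expect.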

①Annotations

The following figure shows the contents of the Annotations folder:
[Figure: contents of the Annotations folder]

The Annotations folder stores label files in XML format, and each XML file corresponds to one image in the JPEGImages folder. The content of the first XML file is shown below:

<?xml version="1.0"?>
<annotation>
	<folder>VOC2007</folder>
	<filename>000005.jpg</filename> <!-- image file name -->
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>325991873</flickrid>
	</source>
	<owner>
		<flickrid>archintent louisville</flickrid>
		<name>?</name>
	</owner>
	<size> <!-- image size -->
		<width>500</width>
		<height>375</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object> <!-- an object in the image that belongs to one of the categories -->
		<name>chair</name> <!-- object name -->
		<pose>Rear</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox> <!-- the object's bounding box: top-left and bottom-right corner coordinates -->
			<xmin>263</xmin>
			<ymin>211</ymin>
			<xmax>324</xmax>
			<ymax>339</ymax>
		</bndbox>
	</object>
	<object> <!-- another object -->
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>165</xmin>
			<ymin>264</ymin>
			<xmax>253</xmax>
			<ymax>372</ymax>
		</bndbox>
	</object>
	<object> <!-- another object -->
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>1</truncated>
		<difficult>1</difficult>
		<bndbox>
			<xmin>5</xmin>
			<ymin>244</ymin>
			<xmax>67</xmax>
			<ymax>374</ymax>
		</bndbox>
	</object>
	<object> <!-- another object -->
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>241</xmin>
			<ymin>194</ymin>
			<xmax>295</xmax>
			<ymax>299</ymax>
		</bndbox>
	</object>
	<object> <!-- another object -->
		<name>chair</name>
		<pose>Unspecified</pose>
		<truncated>1</truncated>
		<difficult>1</difficult>
		<bndbox>
			<xmin>277</xmin>
			<ymin>186</ymin>
			<xmax>312</xmax>
			<ymax>220</ymax>
		</bndbox>
	</object>
</annotation>

The corresponding image is 000005.jpg. Each XML file stores the bounding-box coordinates and category of every annotated object in its image.
[Figure: 000005.jpg]

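
To make the annotation format concrete, here is a minimal sketch that reads such a file with Python's standard `xml.etree.ElementTree` module. The function name `parse_voc_annotation` and the returned dictionary layout are assumptions made for this example, and `SAMPLE` is a shortened copy of the annotation above that keeps only the first object.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Parse one VOC annotation and return its file name, size, and objects."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    # Each <object> that is a direct child of <annotation> is one labeled box.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            "bbox": tuple(int(box.findtext(tag))
                          for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


# Shortened copy of the annotation shown above (first object only).
SAMPLE = """<annotation>
  <filename>000005.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>chair</name>
    <difficult>0</difficult>
    <bndbox><xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax></bndbox>
  </object>
</annotation>"""

info = parse_voc_annotation(SAMPLE)
print(info["filename"], info["width"], info["height"])  # 000005.jpg 500 375
for obj in info["objects"]:
    print(obj["name"], obj["bbox"])  # chair (263, 211, 324, 339)
```

To read a file from disk instead, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.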
②ImageSets
[Figure: contents of the ImageSets folder]

  • Layout stores the data annotated with human body parts (head, hands, feet, etc.), which is also part of the VOC challenge.
  • Main stores the object detection data, which is divided into 20 categories in total.
  • Segmentation stores the data that can be used for segmentation.

In fact, we only need to pay attention to the data under the Main folder, as shown below:
[Figure: contents of the Main folder]

For each of the 20 categories, the Main folder contains three files: ***_train.txt, ***_val.txt, and ***_trainval.txt, where *** is the category name.
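
As a small illustration of this naming scheme, the sketch below lists the three file names for a given category. The helper name `class_split_filenames` is hypothetical, and `VOC_CLASSES` is the standard list of 20 VOC categories as commonly given, which the original post does not spell out.

```python
# The 20 PASCAL VOC object categories (assumed standard list, not from the post).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def class_split_filenames(class_name):
    """Return the three per-category file names found under ImageSets/Main."""
    return [f"{class_name}_{split}.txt" for split in ("train", "val", "trainval")]


print(class_split_filenames("chair"))
# ['chair_train.txt', 'chair_val.txt', 'chair_trainval.txt']
```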

Opening one of these files shows content like the following:
[Figure: contents of one of the files in the Main folder]

  • The number at the start of each line is the image name; a trailing 1 marks a positive sample and -1 marks a negative sample. (A 0 can also appear, marking an image in which the category is present only as a "difficult" object.)
  • _train.txt lists the data used for training, _val.txt lists the data used for validation, and _trainval.txt combines the two.
  • There are also three category-independent files, train.txt, val.txt, and trainval.txt, which record which images are used for training and which for validation. They contain only the image names and no other information.
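
The two file formats described above can be read with a few lines of Python. This is only a sketch: the function names `read_class_split` and `read_split` are hypothetical, and the sample lines are illustrative values, not copied from the real files.

```python
def read_class_split(text):
    """Parse per-category lines such as '000012 -1' into {image_id: label}."""
    labels = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        image_id, label = parts
        labels[image_id] = int(label)
    return labels


def read_split(text):
    """Parse a plain train.txt / val.txt / trainval.txt listing into image names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Illustrative content only; real files list many more images.
sample_class_file = "000012 -1\n000017  1\n000023 -1\n"
sample_split_file = "000012\n000017\n000023\n"

labels = read_class_split(sample_class_file)
positives = [image_id for image_id, label in labels.items() if label == 1]
print(positives)                      # ['000017']
print(read_split(sample_split_file))  # ['000012', '000017', '000023']
```

Image names are kept as strings rather than converted to integers so that the leading zeros, which are part of the file names, are preserved.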

③JPEGImages

The JPEGImages folder contains all the images provided by PASCAL VOC, including both training and test images. The images appear in the same order as the XML files, and each image shares its base name with its XML file: 000005.jpg, for example, is annotated by 000005.xml.

[Figure: contents of the JPEGImages folder]
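
Because images and annotations share a base name, pairing them is a matter of swapping the folder and the extension. The sketch below assumes the folder layout described earlier; the function names `paths_for_image` and `images_without_annotation` are hypothetical.

```python
from pathlib import Path


def paths_for_image(voc_dir, image_id):
    """Return (image path, annotation path) for an image name such as '000005'."""
    voc_dir = Path(voc_dir)
    image_path = voc_dir / "JPEGImages" / f"{image_id}.jpg"
    annotation_path = voc_dir / "Annotations" / f"{image_id}.xml"
    return image_path, annotation_path


def images_without_annotation(voc_dir):
    """List image file names in JPEGImages that have no matching XML file."""
    voc_dir = Path(voc_dir)
    annotated = {p.stem for p in (voc_dir / "Annotations").glob("*.xml")}
    return sorted(
        p.name
        for p in (voc_dir / "JPEGImages").glob("*.jpg")
        if p.stem not in annotated
    )
```

Running `images_without_annotation` on your own data before training is one way to catch images that were copied in without their label files.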

Origin blog.csdn.net/qq_39507748/article/details/110816926