Super classic! An introduction to datasets for segmentation tasks


Foreword

When exploring a network, one of the most basic and important jobs is understanding the data. Today I will summarize the segmentation datasets I have used so far. This post introduces three basic datasets in detail:
IRSTD-1k (Infrared Small Target Detection; described as the largest real single-frame dataset for infrared dim small target detection; supports binary semantic segmentation);
Pascal VOC2012 (PASCAL stands for Pattern Analysis, Statistical Modelling and Computational Learning; the dataset of a world-class computer vision challenge; supports multi-class semantic segmentation and multi-class instance segmentation);
iSAID (A Large-scale Dataset for Instance Segmentation in Aerial Images; the first benchmark dataset for instance segmentation in aerial images).


1. IRSTD-1k

IRSTD-1k comes from the CVPR 2022 paper "ISNet: Shape Matters for Infrared Small Target Detection" (first author Mingjin Zhang). Links: paper , dataset .
Datasets for infrared dim small target detection and segmentation are characterized by targets that are "weak" and "small". "Weak" means the target has a low signal-to-noise ratio, poor contrast with the background, and weak infrared radiation intensity; "small" means the target covers only a few pixels, so texture information is hard to obtain during detection. The IRSTD-1k dataset provides 1,000 real images with varied target shapes, different target sizes, and rich cluttered backgrounds, all with precise pixel-level annotations. The dataset is divided into two folders: IRSTD1k_Img stores the real images, and IRSTD1k_Label stores the label masks.

[Figure: sample XDU9, shown as the image (IMAGES-XDU9) and its label mask (MASK-XDU9)]
This dataset can be used for deep learning image segmentation, and also for studying infrared dim small target detection algorithms based on filtering, on the human visual system, on image data structures, or on deep learning object detectors.
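
Since images and masks live in two separate folders, the first practical step is to pair them up. Below is a minimal Python sketch of that step. It assumes that an image and its mask share the same file name (for example XDU9.png in both folders) and that mask pixels greater than zero mark the target; I have not verified either assumption against the official release, and the function names are my own.

```python
from pathlib import PurePath


def pair_by_stem(image_paths, mask_paths):
    """Pair each image with the mask that has the same file stem.

    Assumption: IRSTD1k_Img/XDU9.png belongs to IRSTD1k_Label/XDU9.png.
    Images without a matching mask are skipped.
    """
    masks = {PurePath(p).stem: p for p in mask_paths}
    pairs = []
    for img in sorted(image_paths):
        stem = PurePath(img).stem
        if stem in masks:
            pairs.append((img, masks[stem]))
    return pairs


def binarize(mask_rows, threshold=0):
    """Turn a grayscale mask (rows of ints) into 0/1 labels.

    Assumption: any value above `threshold` is target, the rest is background.
    """
    return [[1 if v > threshold else 0 for v in row] for row in mask_rows]
```

The path lists can come from something like `sorted(Path(root, "IRSTD1k_Img").iterdir())`, where `root` is wherever you unpacked the dataset. The binarized mask is what a binary semantic segmentation loss expects.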


2. Pascal VOC2012

1. Introduction to the data

The Pascal VOC2012 dataset comes from the PASCAL VOC (Visual Object Classes) challenge, a world-class computer vision challenge. PASCAL stands for Pattern Analysis, Statistical Modelling and Computational Learning, a network organization funded by the European Union. The challenge mainly covers object classification, object detection, object segmentation, and action classification, among others. One dataset can serve four kinds of task. Links: paper , dataset , introduction .

  1. Image classification and object detection tasks

  2. Segmentation tasks. Note that image segmentation generally includes semantic segmentation, instance segmentation, and panoptic segmentation. Instance segmentation gives each individual object its own color, whereas semantic segmentation gives all objects of the same category the same color.

  3. Action recognition task

  4. Person layout detection task

2. Introduction to the segmentation task dataset

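  • Downloading the dataset gives the following directory layout:
VOCdevkit
    └── VOC2012
         ├── Annotations               annotation info for all images (XML files)
         ├── ImageSets    
         │   ├── Action                image lists for human actions
         │   ├── Layout                image lists for human body parts
         │   │
         │   ├── Main                  image lists for detection and classification
         │   │     ├── train.txt       training set (5717)
         │   │     ├── val.txt         validation set (5823)
         │   │     └── trainval.txt    training + validation set (11540)
         │   │
         │   └── Segmentation          image lists for segmentation
         │         ├── train.txt       training set (1464)
         │         ├── val.txt         validation set (1449)
         │         └── trainval.txt    training + validation set (2913)
         ├── JPEGImages                all image files
         ├── SegmentationClass         semantic segmentation PNG masks (by class)
         └── SegmentationObject        instance segmentation PNG masks (by object)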
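
With this layout, each line of a split file such as ImageSets/Segmentation/train.txt is one image ID, and the ID gives you both the JPEG and the PNG mask. Here is a small Python sketch of that lookup. The function name and the `mask_dir` parameter are my own, not part of any VOC tooling.

```python
from pathlib import Path


def voc_segmentation_pairs(voc_root, split_lines, mask_dir="SegmentationClass"):
    """Map image IDs from a split file to (image_path, mask_path) pairs.

    voc_root    : path to the VOC2012 folder
    split_lines : iterable of lines, e.g. an open train.txt file
    mask_dir    : "SegmentationClass" (semantic) or "SegmentationObject" (instance)
    """
    root = Path(voc_root)
    pairs = []
    for line in split_lines:
        image_id = line.strip()
        if not image_id:  # skip blank lines
            continue
        image_path = root / "JPEGImages" / f"{image_id}.jpg"
        mask_path = root / mask_dir / f"{image_id}.png"
        pairs.append((image_path, mask_path))
    return pairs
```

A typical call would be `with open(root / "ImageSets" / "Segmentation" / "train.txt") as f: pairs = voc_segmentation_pairs(root, f)`. Passing `mask_dir="SegmentationObject"` switches the same split to instance masks.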

  • Semantic segmentation task
    First, read the corresponding txt file in the ImageSets/Segmentation folder. For example, to train on the data in train.txt, read the file and parse it line by line; each line is one image ID. This task uses the ImageSets/Segmentation, JPEGImages and SegmentationClass folders.
    Note that in the semantic segmentation masks each category has its own color. For example, person has class index 15, so the pixels of a person region are filled with (192, 128, 128). There are 21 categories in total (20 object classes plus background). Object borders do not count as a category.
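
The class colors are not arbitrary. As far as I know, the VOC development kit derives them by spreading the bits of the class index across the R, G and B channels. Below is a minimal Python sketch of that scheme; the function name `voc_colormap` is my own.

```python
def voc_colormap(n=256):
    """Return a list of (R, G, B) tuples, one per class index.

    Bits 0, 1, 2 of the index feed the top bit of R, G, B; bits 3, 4, 5
    feed the next lower bit of each channel, and so on.
    """
    colors = []
    for index in range(n):
        r = g = b = 0
        c = index
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        colors.append((r, g, b))
    return colors


cmap = voc_colormap()
print(cmap[15])   # person -> (192, 128, 128)
print(cmap[255])  # index 255 marks object borders -> (224, 224, 192)
```

Index 0 is the background and maps to black. Index 255 is the value used for border pixels, which is why borders are usually ignored in the loss rather than treated as a 22nd class.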


  • Instance segmentation task
    This task uses the ImageSets/Segmentation, JPEGImages and SegmentationObject folders.
    The instance labels correspond one-to-one, in order, with the objects in the detection annotation, and each instance is drawn in its own color.
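
To train an instance segmentation model you usually need one binary mask per object rather than one colored image. Below is a minimal Python sketch of that split. It assumes the PNG has already been decoded into a 2D grid of instance indices (VOC masks are palette images, so reading them without converting to RGB gives indices), with 0 for background and 255 for borders. The function name is hypothetical.

```python
def instance_masks(index_rows, ignore=(0, 255)):
    """Split a 2D grid of instance indices into one binary mask per instance.

    index_rows : list of rows, each a list of integer instance indices
    ignore     : indices that are not objects (background and border)
    Returns a dict {instance_index: binary_mask}, in ascending index order.
    """
    ids = sorted({v for row in index_rows for v in row} - set(ignore))
    return {
        i: [[1 if v == i else 0 for v in row] for row in index_rows]
        for i in ids
    }
```

Because the instance order follows the detection annotation, the k-th mask can be matched with the k-th object in the corresponding XML file to recover its class name.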

  • Remarks: Part of the content here is reproduced. For more details, please click the original link: https://blog.csdn.net/qq_37541097/article/details/115787033

3. iSAID

The iSAID dataset and DOTA, the well-known remote sensing dataset for rotated bounding box object detection, are both maintained by Xia Guisong's team at Wuhan University. The official website is: iSAID . iSAID contains 15 categories and a total of 655,451 object instances across 2,806 images. A single image can contain up to 8,000 instances, with an average of 239. It is the first large-scale instance segmentation dataset in the field of remote sensing.

iSAID adds pixel-level annotations to the images of the DOTA dataset and corrects labeling errors in DOTA. Compared with the 188,282 object instances in DOTA, iSAID offers far more samples and much more precise labels. The categories are: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, swimming pool, and soccer ball field, which basically covers the key targets in urban remote sensing interpretation. Of the annotated images, 1/2 are used for training, 1/6 for validation, and 1/3 for testing. Images and ground-truth annotations are released together for the training and validation sets; for the test set only the images can be downloaded. An official evaluation server is available for evaluating algorithm performance on the test set online.

iSAID fully reflects the feature and scale distribution differences that are common in remote sensing images. The authors define objects of 10 to 144 pixels as small, 144 to 1024 pixels as medium, and 1024 pixels and above as large. The ratio of these three sizes is 52.0 : 33.7 : 9.7. The area of the largest object can be as much as 20,000 times that of the smallest. In addition, the dataset contains many objects with extreme aspect ratios, up to 90, with an average of 2.4.
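
These size thresholds are easy to apply when you want to check the scale distribution of your own crops or predictions. Below is a minimal Python sketch. How the exact boundary values (144 and 1024) are assigned is my assumption, since the text only gives the ranges, and objects under 10 pixels are left ungrouped.

```python
from collections import Counter


def size_group(area_px):
    """Classify an object by pixel area using the iSAID size ranges.

    Assumption: each range includes its lower bound and excludes its upper bound.
    Areas below 10 pixels fall outside the three groups and return None.
    """
    if area_px < 10:
        return None
    if area_px < 144:
        return "small"
    if area_px < 1024:
        return "medium"
    return "large"


def size_histogram(areas):
    """Count how many objects fall into each size group."""
    return Counter(size_group(a) for a in areas)
```

For example, `size_histogram([12, 100, 500, 5000])` counts two small objects, one medium, and one large.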
The dataset format is relatively simple, so it is not covered in detail here.

  • Remarks: Part of the content here is reprinted. For more details, please click the original link: https://zhuanlan.zhihu.com/p/461021557

Summary

In short, understanding the dataset is essential. If you have any questions, please leave a comment.

Origin blog.csdn.net/jijiarenxiaoyudi/article/details/128042741