In detail: MIT's classic semantic segmentation dataset ADE20K, with a download link

Friends, OpenDataLab is back with another share! This time we bring you a detailed guide to the ADE20K dataset to support your model training.

Don't miss this large dataset released by MIT for scene perception, parsing, segmentation, multi-object recognition, and semantic understanding.

1. Dataset Introduction

Published by: MIT CSAIL Computer Vision Group

Release time: 2016

Background: Semantic understanding of visual scenes is a key problem in computer vision. Despite the community's efforts in data collection, few image datasets cover a broad range of scenes and object categories while also providing pixel-wise annotations for scene understanding.

Summary: ADE20K provides annotations of scenes, objects, parts of objects, and in some cases even parts of parts. It contains 25K images of complex everyday scenes with objects in natural spatial contexts. Each image has on average 19.5 instances and 10.5 object classes.

2. Dataset details

1. Amount of annotated data

● Training set: 20210 images

● Validation set: 2000 images

● Test set: 3000 images

2. Annotation categories

The annotation of the dataset consists of three visual concepts:

● Discrete objects: things with a well-defined shape, such as a car or a person;

● Stuff: amorphous background regions, such as grass or sky;

● Object parts: components of an existing object instance that have functional meaning, such as a head or a leg.

A total of 3169 categories are annotated across the three visual concepts: 2693 categories cover discrete objects and amorphous background stuff, and the remaining 476 classes are object parts.


3. Visualization

Figure 1: The first row shows sample images, the second row the object annotations, and the third row the object-part annotations. The color scheme encodes both object class and object instance: different object classes have large color differences, while different instances of the same class have small color differences (e.g., the person instances in the first image have slightly different colors).

3. Dataset task definition and introduction

1. Scene parsing

● Definition

Scene parsing is the dense segmentation of the whole image into semantic classes, where each pixel is assigned a class label, such as regions of trees and regions of buildings.

● Benchmark

The authors select the top 150 categories in the ADE20K dataset, ranked by their total pixel ratio, and construct a scene parsing benchmark for ADE20K, called SceneParse150.

Among the 150 categories, 35 are stuff classes (e.g., wall, sky, road) and 115 are discrete object classes (e.g., car, person, table). The annotated pixels of these 150 classes account for 92.75% of all pixels in the dataset; stuff classes account for 60.92% and discrete object classes for 31.83%.

Results are reported in four metrics commonly used for semantic segmentation:

- Pixel accuracy: the proportion of correctly classified pixels;

- Mean accuracy: the proportion of correctly classified pixels, averaged over all categories;

- Mean IoU: the intersection-over-union between predicted and ground-truth pixels, averaged over all classes;

- Weighted IoU: the IoU of each class, weighted by that class's total pixel ratio.
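As a sketch, the four metrics above can be computed from two flattened label maps. This is a minimal pure-Python illustration, not the official evaluation code; `segmentation_metrics` is a hypothetical helper, and classes absent from the ground truth are ignored when averaging:

```python
from collections import defaultdict

def segmentation_metrics(pred, gt, num_classes):
    """Compute the four SceneParse150-style metrics from two flat
    sequences of per-pixel class labels of equal length."""
    correct = defaultdict(int)   # per-class pixels where pred == gt
    gt_count = defaultdict(int)  # per-class ground-truth pixel counts
    union = defaultdict(int)     # per-class size of the union of pred and gt
    total_correct = 0
    for p, g in zip(pred, gt):
        gt_count[g] += 1
        if p == g:
            correct[g] += 1
            total_correct += 1
            union[g] += 1        # pixel is in both sets, count it once
        else:
            union[g] += 1
            union[p] += 1
    n = len(gt)
    classes = [c for c in range(num_classes) if gt_count[c] > 0]
    pixel_acc = total_correct / n
    mean_acc = sum(correct[c] / gt_count[c] for c in classes) / len(classes)
    iou = {c: correct[c] / union[c] for c in classes}
    mean_iou = sum(iou.values()) / len(classes)
    weighted_iou = sum((gt_count[c] / n) * iou[c] for c in classes)
    return pixel_acc, mean_acc, mean_iou, weighted_iou
```

In practice the label maps come from decoded `_seg.png` files; here they are plain Python sequences to keep the sketch dependency-free.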

2. Instance Segmentation

● Definition

Instance segmentation detects the object instances in an image and further generates an accurate segmentation mask for each object. It differs from scene parsing in that scene parsing has no notion of instances within a segmented region, whereas in instance segmentation, if there are three people in the scene, the network must segment each person separately.

● Benchmark

To benchmark instance segmentation, the authors select 100 foreground object categories from the full dataset, which they call InstSeg100. InstSeg100 contains 218K object instances in total, with an average of 2.2K instances per object category and 10 instances per image; every category except ship has more than 100 instances.

Results are reported with the following metrics:

The overall mean average precision (mAP), along with measures at different object scales: mAP_S (objects smaller than 32×32 pixels), mAP_M (between 32×32 and 96×96 pixels), and mAP_L (larger than 96×96 pixels).
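The scale buckets can be illustrated with a tiny helper. The 32×32 and 96×96 thresholds come from the text above; the exact handling of objects that fall on a boundary is an assumption here, as is the `scale_bucket` name:

```python
def scale_bucket(width, height):
    """Assign an object to the mAP scale bucket used for reporting,
    based on its pixel area (32x32 and 96x96 thresholds)."""
    area = width * height
    if area < 32 * 32:
        return "mAP_S"
    if area <= 96 * 96:
        return "mAP_M"
    return "mAP_L"
```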

4. Dataset file structure

Directory structure:

ADE20K_2021_17_01/
    images/
        training/
            cultural/
                apse__indoor/
                    <filename0>.jpg         # original image
                    <filename0>_seg.png     # segmentation map: channels R and G encode
                                            # the object class ID, channel B encodes the instance ID
                    <filename0>_parts_{i}.png
                                            # part segmentation map; i is the part level,
                                            # e.g. car belongs to the first part level and
                                            # wheel to the second
                    <filename0>.json        # polygons, attributes, etc. for every instance in the image
                    <filename0>/            # directory with the mask of every instance in the image
                        instance_000_<filename0>.png
                        instance_001_<filename0>.png
                    ...
                ...
            ...
        validation/
            cultural/
                apse__indoor/
                    <filename1>.jpg
                    <filename1>_seg.png
                    <filename1>_parts_{i}.png
                    <filename1>.json
                    <filename1>/
                        instance_000_<filename1>.png
                        instance_001_<filename1>.png
                        ...
                    ...
                ...
            ...
    index_ade20k.pkl                    # statistics of the data and the image folders
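Given the channel encoding described for `_seg.png`, a single RGB pixel can be decoded roughly as follows. The `(r // 10) * 256 + g` formula follows the decoding in the official `utils_ade20k.py` loader; treat it as an assumption and verify it against the release you use:

```python
def decode_seg_pixel(r, g, b):
    """Decode one RGB pixel of <filename>_seg.png.

    Channels R and G jointly encode the object class ID, and channel B
    carries the instance ID."""
    class_id = (r // 10) * 256 + g
    instance_id = b
    return class_id, instance_id
```

In a real pipeline you would apply this vectorized over a whole image array rather than pixel by pixel.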

<filename>.json file format:

{
    "annotation": {
        "filename": "<filename>.jpg",   # image name
        "folder": "ADE20K_2021_17_01/images/ADE/training/urban/street",
                                        # relative path where the image is stored
        "imsize": [                     # image height, width, number of channels
            1536,
            2048,
            3
        ],
        "source": {                     # image source information
            "folder": "static_sun_database/s/street",
            "filename": "labelme_acyknxirsfolpon.jpg",
            "origin": ""
        },
        "scene": [                      # image scene information
            "outdoor",
            "urban",
            "street"
        ],
        "object": [                     # list of annotated instances
            {
                "id": 0,                # instance ID
                "name": "traffic light, traffic signal, stoplight",
                                        # instance label
                "name_ndx": 2836,       # instance label index
                "hypernym": [           # hypernyms
                    "traffic light, traffic signal, stoplight",
                    "light",
                    "visual signal",
                    "signal, signaling, sign",
                    "communication",
                    "abstraction, abstract entity",
                    "entity"
                ],
                "raw_name": "traffic light",
                "attributes": [],       # attributes
                "depth_ordering_rank": 1,
                                        # depth ordering
                "occluded": "no",       # occlusion status
                "crop": "0",
                "parts": {              # part information
                    "hasparts": [],
                    "ispartof": [],
                    "part_level": 0
                },
                "instance_mask": "<filename>/instance_000_<filename>.png",
                                        # corresponding instance mask
                "polygon": {            # polygon coordinates
                    "x": [346, ...],
                    "y": [781, ...]
                },
                "saved_date": "18-Dec-2005 06:56:48"
            },
            ...
        ]
    }
}
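Once a `<filename>.json` file has been parsed with `json.load`, pulling out the annotated instances is straightforward. A minimal sketch using only the fields shown in the sample above (`list_instances` is a hypothetical helper name):

```python
def list_instances(ann):
    """Given a parsed <filename>.json dict, return (id, raw_name,
    part_level) for every annotated instance; part_level 0 means the
    entry is a whole object rather than a part."""
    return [(o["id"], o["raw_name"], o["parts"]["part_level"])
            for o in ann["annotation"]["object"]]
```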

Figure 2: The format of the index_ade20k.pkl file after opening it with Python

The meaning of each field in index_ade20k.pkl:

'filename': Array of length N=27574 with image filenames.

'folder': An array of length N containing the names of the image folders.

'objectIsPart': array of size [C, N] counting, for each image, the number of times each object class occurs as a part of another object: objectIsPart[c,i]=m if an object of class c is a part of another object m times in image i.

'objectPresence': array of size [C, N] with per-image object counts: objectPresence[c,i]=n if there are n instances of object class c in image i.

'objectcounts': array of length C, the number of instances of each object class.

'objectnames': Array of length C with object class names.

'proportionClassIsPart': array of length C with the proportion of times each class occurs as a part. proportionClassIsPart[c]=0 means class c is a main object (e.g. car, chair, ...).

'scene': array of length N giving each image's scene name (same classes as the Places database).

'wordnet_found': array of length C. It indicates whether the object name was found in Wordnet.

'wordnet_level1': list of length C. List of WordNet associations.

'wordnet_synset': list of length C. A WordNet synset for each object name.

'wordnet_hypernym': list of length C. List of WordNet hypernyms for each object name.

'wordnet_gloss': list of length C. The definition of each WordNet synset.

'wordnet_frequency': array of length C. The number of occurrences of each WordNet synset.

'description': description of each field in index_ade20k.pkl.
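A minimal sketch for loading the index and querying these fields, assuming the pickle deserializes to a plain dict keyed by the names listed above (`load_index`, `top_object_classes`, and `images_containing` are hypothetical helper names):

```python
import pickle

def load_index(path):
    """Deserialize index_ade20k.pkl into a dict of per-dataset arrays."""
    with open(path, "rb") as f:
        return pickle.load(f)

def top_object_classes(index, k=5):
    """The k object classes with the most annotated instances,
    pairing 'objectnames' with 'objectcounts'."""
    pairs = sorted(zip(index["objectnames"], index["objectcounts"]),
                   key=lambda nc: nc[1], reverse=True)
    return pairs[:k]

def images_containing(index, class_idx):
    """Indices of images holding at least one instance of class_idx,
    read from row class_idx of the [C, N] 'objectPresence' matrix."""
    return [i for i, n in enumerate(index["objectPresence"][class_idx]) if n > 0]
```

The real index stores these fields as NumPy arrays, but the queries above only rely on iteration and indexing, so they work either way.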

5. Dataset resources

The OpenDataLab platform now hosts the ADE20K dataset, providing complete dataset information and fast download speeds; come and try it!

ADE20K 2021 Dataset

References:

[1] Official website: https://groups.csail.mit.edu/vision/datasets/ADE20K/

[2] Paper: Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic Understanding of Scenes through the ADE20K Dataset. International Journal of Computer Vision (IJCV).

[3] GitHub: https://github.com/CSAILVision/ADE20K

More datasets are being added, along with more comprehensive dataset interpretations, online Q&A, and an active community of peers. Add WeChat opendatalab_yunying to join the official OpenDataLab group.

Origin blog.csdn.net/OpenDataLab/article/details/125293382