CityScapes is one of the most authoritative and professional image semantic segmentation benchmarks in the field of autonomous driving. It focuses on understanding urban road environments in real-world scenes, with tasks that are more difficult and closer to practical needs such as autonomous driving. Let's take a closer look.
1. Dataset Introduction
Published by: Daimler AG R&D, TU Darmstadt, MPI Informatics
Release time: 2015
Background:
Focus on semantic understanding of urban street scenes.
Introduction:
The CityScapes dataset has the following characteristics:
● Diverse annotations, including instance segmentation, semantic segmentation, and polygon annotations;
● Complex categories, with 30+ classes organized in a two-level group/class hierarchy;
● Scene variety, covering 50 cities, different seasons, and different times of day (daytime);
● Large scale, with 5,000 finely annotated images and 20,000 coarsely annotated images.
2. Dataset details
1. Amount of labeled data (taking the fine annotations as an example)
Training set: 2975 images
Validation set: 500 images
Test set: 1525 images
Each image comes with three annotation files at the same time (instance segmentation, semantic segmentation, and polygon annotations).
2. Labeling category
The 30+ labeled classes and the groups they belong to are as follows:
Group | Class |
--- | --- |
flat | road · sidewalk · parking+ · rail track+ |
human | person* · rider* |
vehicle | car* · truck* · bus* · on rails* · motorcycle* · bicycle* · caravan*+ · trailer*+ |
construction | building · wall · fence · guard rail+ · bridge+ · tunnel+ |
object | pole · pole group+ · traffic sign · traffic light |
nature | vegetation · terrain |
sky | sky |
void | ground+ · dynamic+ · static+ |
Here, * marks classes with single-instance annotations; when the boundaries between instances cannot be clearly distinguished, the connected region is labeled as a whole group, such as "car group". + marks classes that are excluded from evaluation and treated as void.
3. Visualization
A visualization of the fine annotations is shown below:
3. Dataset task definition and introduction
1. Semantic Segmentation
● Task definition
Scene parsing is the dense segmentation of the whole image into semantic classes, where each pixel is assigned a class label, such as regions of trees and regions of buildings.
● Evaluation metrics
Four metrics are commonly used for semantic segmentation:
Pixel Accuracy (PA): the proportion of correctly classified pixels.
Mean Pixel Accuracy (MPA): the per-class pixel accuracy, averaged over all classes.
Mean Intersection over Union (MIoU): the intersection over union between predicted and ground-truth pixels of each class, averaged over all classes.
Weighted IoU (WIoU): the per-class IoU weighted by each class's share of the total pixels.
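All four metrics can be derived from a single confusion matrix. Below is a minimal NumPy sketch; the function name is our own, not part of any official toolkit:

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Compute PA, MPA, MIoU and WIoU from flat label arrays."""
    # Confusion matrix: rows = ground-truth class, cols = predicted class.
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    gt_count = cm.sum(axis=1).astype(float)    # pixels per ground-truth class
    pred_count = cm.sum(axis=0).astype(float)  # pixels per predicted class

    pa = tp.sum() / cm.sum()                       # Pixel Accuracy
    mpa = np.nanmean(tp / gt_count)                # Mean Pixel Accuracy
    iou = tp / (gt_count + pred_count - tp)        # per-class IoU
    miou = np.nanmean(iou)                         # Mean IoU
    wiou = np.nansum((gt_count / cm.sum()) * iou)  # Weighted (frequency-weighted) IoU
    return pa, mpa, miou, wiou
```

For example, with predictions `[0, 0, 1, 1]` against ground truth `[0, 1, 1, 1]` and two classes, three of four pixels are correct, so PA = 0.75.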
2. Instance Segmentation
● Task definition
Instance segmentation detects object instances in an image and further generates an accurate segmentation mask for each object. It differs from scene parsing in that scene parsing has no notion of instances within a segmented region, whereas in instance segmentation, if there are three people in the scene, the network must segment each person separately.
● Evaluation metrics
Instance segmentation is commonly evaluated with the following metrics:
Average Precision (AP): precision averaged over recall levels, where precision P = TP/(TP+FP); the Cityscapes benchmark additionally averages AP over multiple mask-IoU overlap thresholds.
Pixel Accuracy (PA): the proportion of correctly classified pixels.
Mean Pixel Accuracy (MPA): the per-class pixel accuracy, averaged over all classes.
Mean Intersection over Union (MIoU): the intersection over union between predicted and ground-truth pixels of each class, averaged over all classes.
Weighted IoU (WIoU): the per-class IoU weighted by each class's share of the total pixels.
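As a toy illustration of how precision is computed for instance masks, the sketch below greedily matches predicted masks to ground-truth masks at a single IoU threshold. This is a simplification: the official Cityscapes benchmark averages AP over a range of overlap thresholds, and the function names here are our own:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def precision_at_iou(pred_masks, gt_masks, thr=0.5):
    """Greedy one-to-one matching of predictions (assumed sorted by
    confidence, highest first) to ground truth at one IoU threshold."""
    matched = set()
    tp = 0
    for pm in pred_masks:
        for i, gm in enumerate(gt_masks):
            if i not in matched and mask_iou(pm, gm) >= thr:
                matched.add(i)
                tp += 1
                break
    fp = len(pred_masks) - tp
    return tp / (tp + fp) if pred_masks else 0.0  # P = TP / (TP + FP)
```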
4. Dataset file structure
1. Directory structure
Directory structure (taking the fine annotations as an example):
dataset_root/
├── gtFine/
| ├── test/ # test set
| | ├── berlin/ # city name
| | | ├── berlin_000543_000019_gtFine_color.png # color-coded segmentation visualization
| | | ├── berlin_000543_000019_gtFine_instanceIds.png # instance segmentation annotation
| | | ├── berlin_000543_000019_gtFine_labelIds.png # semantic segmentation annotation
| | | ├── berlin_000543_000019_gtFine_polygons.json # class names and region boundaries
| | | └── ...
| | ├── bielefeld/ # city name
| | | ├── bielefeld_000000_066495_gtFine_color.png # color-coded segmentation visualization
| | | ├── bielefeld_000000_066495_gtFine_instanceIds.png # instance segmentation annotation
| | | ├── bielefeld_000000_066495_gtFine_labelIds.png # semantic segmentation annotation
| | | ├── bielefeld_000000_066495_gtFine_polygons.json # class names and region boundaries
| | | └── ...
| | └── ...
| ├── train/ # training set
| | ├── aachen/ # city name
| | | ├── aachen_000000_000019_gtFine_color.png # color-coded segmentation visualization
| | | ├── aachen_000000_000019_gtFine_instanceIds.png # instance segmentation annotation
| | | ├── aachen_000000_000019_gtFine_labelIds.png # semantic segmentation annotation
| | | ├── aachen_000000_000019_gtFine_polygons.json # class names and region boundaries
| | | └── ...
| | └── ...
| └── val/ # validation set
| ├── frankfurt/ # city name
| | ├── frankfurt_000000_000294_gtFine_color.png # color-coded segmentation visualization
| | ├── frankfurt_000000_000294_gtFine_instanceIds.png # instance segmentation annotation
| | ├── frankfurt_000000_000294_gtFine_labelIds.png # semantic segmentation annotation
| | ├── frankfurt_000000_000294_gtFine_polygons.json # class names and region boundaries
| | └── ...
| └── ...
└── leftImg8bit/
├── test/
| ├── berlin/ # city name
| | ├── berlin_000543_000019_leftImg8bit.png # test set image
| | └── ...
| ├── bielefeld/ # city name
| | ├── bielefeld_000000_066495_leftImg8bit.png # test set image
| | └── ...
| └── ...
├── train/
| ├── aachen/ # city name
| | ├── aachen_000000_000019_leftImg8bit.png # training set image
| | └── ...
| └── ...
└── val/
├── frankfurt/ # city name
| ├── frankfurt_000000_000294_leftImg8bit.png # validation set image
| └── ...
└── ...
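Given this layout, an image and its annotation files can be paired purely by filename convention. Below is a minimal sketch; the helper name `gtfine_paths` is our own, not part of cityscapesScripts:

```python
def gtfine_paths(img_path):
    """Map a leftImg8bit image path to its gtFine annotation paths.

    e.g. .../leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png
      -> .../gtFine/train/aachen/aachen_000000_000019_gtFine_labelIds.png, etc.
    """
    base = img_path.replace("leftImg8bit", "gtFine", 1)  # swap top-level folder only
    base = base.replace("_leftImg8bit.png", "")          # strip the file suffix
    return {
        "color": base + "_gtFine_color.png",
        "instanceIds": base + "_gtFine_instanceIds.png",
        "labelIds": base + "_gtFine_labelIds.png",
        "polygons": base + "_gtFine_polygons.json",
    }
```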
2. Label format
Cityscapes annotations come in two formats: segmentation maps (.png files) used for instance and semantic segmentation, and polygon annotations in JSON (.json files).
The png files are single-channel label maps with the same size as the original image; each pixel's value is the category id (or instance id) of the corresponding pixel in the original image. In the instance segmentation map, classes annotated with instances encode both in one value, pixel value = category id × 1000 + instance index, while pixels of classes without instance annotations store the plain category id.
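This encoding can be decoded directly. A small sketch, assuming the standard categoryId × 1000 + instanceIndex convention; the helper name is our own:

```python
import numpy as np

def decode_instance_ids(instance_map):
    """Split a gtFine_instanceIds map into per-pixel class ids and instance indices."""
    instance_map = np.asarray(instance_map)
    has_instance = instance_map >= 1000  # instance-level classes use classId*1000 + index
    class_ids = np.where(has_instance, instance_map // 1000, instance_map)
    instance_idx = np.where(has_instance, instance_map % 1000, -1)  # -1: no instance
    return class_ids, instance_idx
```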
For the visualization effect diagram of the segmentation map, please refer to the previous visualization section.
The file format of the json annotation is as follows:
{
    "imgHeight": 1024,
    "imgWidth": 2048,
    "objects": [
        {
            "label": "ego vehicle",
            "polygon": [
                [
                    271,
                    1023
                ],
                [
                    387,
                    1009
                ],
                ...
            ]
        },
        ...
    ]
}
It contains the following fields:
attribute name | meaning | data type |
--- | --- | --- |
imgHeight | image height | int |
imgWidth | image width | int |
objects | list of annotated instances | list |
label | class name | str |
polygon | list of boundary point coordinates | list |
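These fields can be read with Python's standard json module. The sample string below is synthetic, modeled on the format shown above, and the `summarize` helper is our own:

```python
import json

sample = """
{
  "imgHeight": 1024,
  "imgWidth": 2048,
  "objects": [
    {"label": "ego vehicle", "polygon": [[271, 1023], [387, 1009], [398, 1023]]}
  ]
}
"""

def summarize(annotation):
    """Return (label, vertex count) for each annotated object."""
    ann = json.loads(annotation)
    return [(obj["label"], len(obj["polygon"])) for obj in ann["objects"]]
```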
5. Dataset download link
The OpenDataLab platform provides complete information on the CityScapes dataset, intuitive data distribution statistics, fast downloads, and convenient visualization scripts. You are welcome to try it out.
https://opendatalab.com/CityScapes
References
[1] Official website: https://www.cityscapes-dataset.com/
[2] Paper: M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, The Cityscapes Dataset for Semantic Urban Scene Understanding, in CVPR, 2016.
(Paper download link: https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2016cityscapes.pdf)
[3]Github:https://github.com/mcordts/cityscapesScripts
More datasets are coming online, along with more comprehensive dataset interpretations, online Q&A, and an active community of peers. Welcome to add WeChat opendatalab_yunying to join the official OpenDataLab communication group.