Introduction to object detection annotation formats and labeling tools

1. Data annotation format

Three annotation formats are commonly used in object detection: Pascal VOC, COCO, and YOLO. The annotation format has to match the data-loading requirements of the training framework: for example, PaddleDetection and MMDetection support the COCO and Pascal VOC formats, while YOLO-series detection frameworks expect data in YOLO format.

1.1. Pascal VOC format

Background

Pascal VOC was originally a competition that ran from 2005 to 2012, covering object classification, object detection, object segmentation, human layout, and action classification. The competition used the Pascal VOC dataset, whose main releases are Pascal VOC2007 and Pascal VOC2012, and the way this dataset is organized is known as the Pascal VOC format.

Data format and data representation

(1) A Pascal VOC dataset consists of three folders: JPEGImages, Annotations, and ImageSets/Main.

(2) JPEGImages stores all the images;
(3) Annotations stores the corresponding annotation files (one annotation file per image), usually in XML. Note: in the PaddleSeg framework, for semantic segmentation, the PNG image annotations are stored in the Annotations directory;
(4) ImageSets/Main stores train.txt, val.txt, test.txt, and trainval.txt, i.e. the partitioning of the dataset.
(5) train.txt, val.txt, test.txt, and trainval.txt specify which images belong to the training set, validation set, test set, and train+validation set (training set + validation set) respectively. Each line of a txt file is one image name without its extension: if cat10086.jpg is assigned to the training set, some line of train.txt will read cat10086.
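Since VOC leaves the split up to you, these txt files can be generated with a small script. The sketch below is illustrative: the helper name split_voc and the default 80/20 ratio are choices made here, not part of the VOC specification, and test.txt is omitted for brevity.

```python
import random
from pathlib import Path

def split_voc(jpeg_dir, main_dir, train_ratio=0.8, seed=0):
    """Write train.txt / val.txt / trainval.txt under ImageSets/Main.
    Names are written without extensions, as the VOC txt files expect."""
    stems = sorted(p.stem for p in Path(jpeg_dir).glob("*.jpg"))
    random.Random(seed).shuffle(stems)  # fixed seed -> reproducible split
    n_train = int(len(stems) * train_ratio)
    splits = {"train": stems[:n_train],
              "val": stems[n_train:],
              "trainval": stems}
    out = Path(main_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        (out / f"{name}.txt").write_text("\n".join(items) + "\n")
    return splits
```

Calling, for example, split_voc("JPEGImages", "ImageSets/Main") would then produce the three files; a held-out test.txt can be added the same way.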

Each XML file records the image's filename and size, together with each object's class name and bounding-box corner coordinates (xmin, ymin, xmax, ymax).
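As a concrete example, the snippet below embeds a minimal hand-written VOC-style annotation and reads it back with Python's standard xml.etree.ElementTree; the file name, class, and box values are made up for illustration, and real VOC files carry a few extra fields (pose, truncated, segmented).

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation; values are illustrative.
VOC_XML = """<annotation>
    <folder>JPEGImages</folder>
    <filename>cat10086.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>cat</name>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
</annotation>"""

root = ET.fromstring(VOC_XML)
for obj in root.iter("object"):
    name = obj.find("name").text
    box = [int(obj.find("bndbox").find(t).text)
           for t in ("xmin", "ymin", "xmax", "ymax")]
    print(name, box)  # prints: cat [48, 240, 195, 371]
```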

Frameworks supporting this format

MMDetection, PaddleDetection, and other mainstream object detection frameworks.

1.2. COCO format

Background

COCO is, first of all, a large-scale benchmark dataset for object detection built by Microsoft; its data organization is known as the COCO format. It supports detection, segmentation, keypoint estimation, and other tasks. The current release is the COCO2017 dataset, available at COCO - Common Objects in Context (cocodataset.org).
Basic layout:
Three folders are needed: train, val, and annotations. train and val hold the training and validation images respectively, and annotations holds the labels. Unlike VOC, where each image has its own annotation file, COCO saves the annotations for all images of a split in a single JSON file: all images under train2017 are described by one instances_train2017.json, and all images under val by one instances_val2017.json.
It follows that if you label directly in COCO format, the training and validation sets must be split before labeling.

Data format and data representation

To be added later.
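For orientation, here is a minimal sketch of what a COCO instances JSON contains. All ids and values below are illustrative, and real files also carry info and licenses blocks; note that bbox is [x, y, width, height] of the top-left corner, in pixels.

```python
import json

# A minimal COCO-style instances annotation, built as a Python dict.
coco = {
    "images": [
        {"id": 1, "file_name": "cat10086.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,                # which image the box belongs to
            "category_id": 1,             # index into "categories"
            "bbox": [48, 240, 147, 131],  # [x, y, width, height] in pixels
            "area": 147 * 131,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "cat"}],
}

# The dict round-trips through JSON, i.e. it could be saved as
# instances_train2017.json.
print(json.dumps(coco["annotations"][0]["bbox"]))
```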

Frameworks supporting this format

MMDetection, PaddleDetection, and other mainstream object detection frameworks.

1.3. YOLO dataset format

Background

The YOLO dataset format exists mainly for training YOLO models, since data is loaded simply by editing the model's configuration file. The one thing to note is that YOLO annotations normalize the position information of the target box (normalization here means dividing by the image width and height).

Data format and data representation

The directory layout has no fixed requirement (usually an images folder stores the original images and a labels folder stores the txt label files, one txt per image with the same file stem).

Each line of a txt label has the form {class id} {normalized x of the box center} {normalized y of the box center} {normalized box width w} {normalized box height h}. Unlike the other formats, YOLO labels contain only class ids, never class names. Moreover, because the xywh of the box is expressed in relative size, the labels are unaffected by changes in image size.

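The normalization described above is easy to compute from VOC-style corner coordinates. The helper below (voc_box_to_yolo is a name chosen here for illustration) converts a pixel box to a YOLO label line:

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a VOC corner box (pixels) to YOLO's normalized
    (cx, cy, w, h), each divided by the image width/height."""
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h

# Class id 0, box (48, 240, 195, 371) in a 640x480 image:
cx, cy, w, h = voc_box_to_yolo(48, 240, 195, 371, 640, 480)
print(f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
```

Because every value is a fraction of the image size, the same line stays valid if the image is later resized.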

Frameworks supporting this format

YOLO-series object detection frameworks such as YOLOv5, YOLOv6, YOLOv7, and YOLOv8.

2. Dataset annotation tools

Annotation tools operate on the raw data, and the label files a tool produces usually differ from the detection annotation formats described above. Common labeling tools include labelimg, labelme, and anylabeling.

2.1 Labelimg

labelimg is an open-source annotation tool that can export labels in three formats. It can only draw axis-aligned rectangular boxes, which means it is only suitable for object detection tasks. The software is a Python program, so the install and launch commands must be run in a terminal with a Python environment.

Software installation command

Open a terminal with a Python environment and enter the following command:
pip install labelimg -i https://pypi.tuna.tsinghua.edu.cn/simple

Software startup command

labelimg or python -m labelimg

Software interface function description

Before annotating, first open the image data in the software, then click Create RectBox to draw boxes.

The generated annotations strictly follow the Pascal VOC format, but you must split the dataset and create the txt files yourself.

2.2 labelme

labelme can annotate image data in many shapes (polygons, rectangles, circles, polylines, line segments, and points), so it can serve object detection, image segmentation, and other tasks. Images can also be annotated with flags, which is useful for image classification and data-cleaning tasks. labelme stores annotation information in JSON files, which cannot be fed directly to an object detection framework; you have to write your own conversion code. The software is a Python program, so the install and launch commands must be run in a terminal with a Python environment.

Software installation command

pip install labelme

Software startup command

labelme or python -m labelme

Software interface function description

Before annotating, first open the image data in the software.

The output is a JSON file. The annotation information is stored in the shapes field: each element is one annotation, whose shape_type describes the annotation type (polygon, rectangle, line segment, ...) and whose label gives the object category (person, car, ...).
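Because detection frameworks cannot read these JSON files directly, a small converter is typically needed. The sketch below (labelme_to_yolo is a hypothetical helper, and the sample annotation is hand-written; real labelme files carry extra fields such as imagePath and imageData) turns rectangle shapes into YOLO label lines:

```python
import json

# A hand-written stand-in for a labelme JSON file.
SAMPLE = json.loads("""{
  "imageWidth": 640,
  "imageHeight": 480,
  "shapes": [
    {"label": "cat", "shape_type": "rectangle",
     "points": [[48, 240], [195, 371]]}
  ]
}""")

def labelme_to_yolo(data, class_ids):
    """Return YOLO label lines 'id cx cy w h', all normalized."""
    iw, ih = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        if shape["shape_type"] != "rectangle":
            continue  # polygons, points, etc. need different handling
        (x1, y1), (x2, y2) = shape["points"]
        xmin, xmax = sorted((x1, x2))
        ymin, ymax = sorted((y1, y2))
        cid = class_ids[shape["label"]]
        lines.append("%d %.6f %.6f %.6f %.6f" % (
            cid,
            (xmin + xmax) / 2 / iw, (ymin + ymax) / 2 / ih,
            (xmax - xmin) / iw, (ymax - ymin) / ih))
    return lines

print(labelme_to_yolo(SAMPLE, {"cat": 0}))
```

A VOC converter would be similar, writing one XML file per image instead of txt lines.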

2.3 anylabeling

anylabeling is an assisted labeling tool built on deep learning models; it supports polygon, rectangle, circle, point, line, and other annotation types. Its assisted labeling offers polygon annotation (backed by the SAM model) and rectangular-box annotation (backed by YOLO models). For assisted annotation you must select a model (the software downloads it automatically), and running the model depends on the onnxruntime library. The label files it generates are in exactly the same JSON format as labelme's, so converting them to a training format also requires your own code.

Software installation command

pip install anylabeling

Software startup command

anylabeling
or
python -m anylabeling.app

Software interface function description

By default the software starts in manual annotation mode; click the brain icon at the bottom of the toolbar to enable assisted labeling.

The workflow of its assisted labeling is as follows:
Select the Brain button on the left to activate automatic labeling.
Choose a model of type Segment Anything from the Model drop-down menu. Accuracy and speed vary by model: Segment Anything Model (ViT-B) is the fastest but least accurate, Segment Anything Model (ViT-H) is the slowest and most accurate, and Quant marks a quantized model. The first time a model is selected it has to be downloaded, which may take a while.
Mark objects with the automatic segmentation tool:
+Point: add a point that belongs to the object.
-Point: remove a point you want to exclude from the object.
+Rect: draw a rectangle containing the object; Segment Anything segments it automatically.
Clear: clear all automatic segmentation marks.
Finish the object (f): when the current object is done, press the shortcut key f, enter the label name, and save.
The generated JSON file is identical in format to labelme's.



Source: blog.csdn.net/m0_74259636/article/details/132393909