How can one model cover the whole world? Integrating multiple datasets to train a general object detection model, explained in detail

In practical applications of object detection, a generalized detection system is often required. For example, in urban security, the detection system needs to recognize enough categories of targets to be effective. However, commonly used object detection datasets each contain a limited number of categories, and a model trained on a single dataset can no longer meet this demand. How to jointly train on multiple datasets has therefore become a hot research direction.

1. Problems in integrating multiple data sets

Simply concatenating multiple datasets for training often yields no performance improvement, for the following main reasons:

NO.1

The label space of each dataset is not uniform

Each dataset defines its own label space, and different label spaces overlap. However, labels with the same name may carry different semantics in different datasets: for example, mouse in COCO refers to the computer mouse, while Mouse in OpenImages refers to the animal. Conversely, labels with different names may share the same semantics: for example, aeroplane in VOC and airplane in COCO both refer to airplanes.

Aligning labels simply by name therefore leads to ambiguity in the semantics of the labels.
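A few lines of Python make this ambiguity concrete. This is a toy sketch with abbreviated label lists, not the datasets' real label spaces:

```python
# Naive name-based merging: labels that merely share a name are collapsed,
# while true synonyms with different names stay separate.
voc_labels = ["aeroplane", "person", "dog"]
coco_labels = ["airplane", "person", "mouse"]    # "mouse" = computer mouse
oid_labels = ["Person", "Mouse"]                 # "Mouse" = the animal

def merge_by_name(*label_lists):
    """Naive union of lower-cased label names - the ambiguous strategy."""
    merged = set()
    for labels in label_lists:
        merged.update(name.lower() for name in labels)
    return sorted(merged)

unified = merge_by_name(voc_labels, coco_labels, oid_labels)
```

Here "aeroplane" and "airplane" wrongly survive as two separate classes, while the two unrelated "mouse" concepts are wrongly collapsed into one.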

NO.2

Inconsistent labeling across datasets

Since the label spaces are not unified, the foreground and background definitions of the datasets differ. As shown in Figure 1, dataset A contains the person and car categories but not face, while dataset B contains person and face but not car. In their respective annotations, dataset A marks cars as foreground but ignores faces as background, and dataset B marks faces as foreground but ignores cars as background [2]. This makes the definition of foreground and background ambiguous across datasets.

Figure 1 Labeling inconsistency problem
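The conflict in Figure 1 can be sketched schematically; the class sets below are illustrative:

```python
# Each dataset supervises only its own classes, so a valid detection of a
# class the dataset never annotates is penalized as background.
DATASET_CLASSES = {
    "A": {"person", "car"},   # no "face" annotations
    "B": {"person", "face"},  # no "car" annotations
}

def loss_treats_as_background(dataset, detected_class):
    """True if this dataset's annotations would call the detection background."""
    return detected_class not in DATASET_CLASSES[dataset]
```

A correct face detection is thus supervised as a false positive on dataset A but as a true positive on dataset B, which is exactly the annotation inconsistency described above.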

NO.3

Domain differences across datasets

Because the datasets' application scenarios and image-acquisition methods differ, there may be large domain gaps between them. The left of Figure 2 shows an autonomous-driving scene and the right an indoor scene. The styles of the two images are clearly different, which leads the model to extract different feature distributions.

 Figure 2 Domain difference problem

2. Interpretation of the Simple Multi-dataset Detection paper

Due to limited space, this article introduces only one paper, "Simple Multi-dataset Detection" [1], which jointly trains multiple datasets by building a unified label space. Below is an overview of the paper's method, an analysis of its experimental results, and a brief introduction to using its code, to help readers better understand joint training.

NO.1

Method

UT-Austin proposed a simple method for training a general-purpose object detector on multiple large datasets (COCO, Objects365, OpenImages, etc.), and with this method won first place in both the object detection and instance segmentation tracks of the ECCV 2020 Robust Vision Challenge.

Figure 3 Different types of target detection models

The traditional object detection model is shown in Figure 3(a): a single dataset corresponds to a single model, so a model trained on the COCO dataset cannot detect targets from the Objects365 dataset. The author therefore adopts the approach shown in Figure 3(b): train a multi-head (multi-detection-head) model that shares the backbone, uses a different detection head for each dataset, and computes each dataset's loss separately. A model trained this way can recognize targets from multiple datasets, but it still lacks a unified label space: for example, when the COCO, Objects365, and OpenImages heads each detect an object of the Car category, the model outputs three duplicate detection boxes.

This nonetheless provides the basis for a unified label space: the detection heads of all datasets can be run on the same input image, so the labels of different datasets can be aligned by comparing their detection results.
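A shape-level sketch of the multi-head design may help. The class counts below are the usual label-space sizes of the three datasets, and the plain-Python stand-ins are illustrative, not the paper's detectron2 implementation:

```python
# Figure 3(b) schematically: one shared backbone, one detection head per
# dataset, each head scoring its own label space.
class MultiHeadDetector:
    def __init__(self, head_num_classes):
        # dataset name -> number of classes its head predicts
        self.head_num_classes = dict(head_num_classes)

    def backbone(self, image):
        # stand-in for the shared feature extractor
        return {"features": image}

    def forward(self, image, dataset):
        feats = self.backbone(image)         # shared across all datasets
        n = self.head_num_classes[dataset]   # dataset-specific head
        return {"dataset": dataset, "num_classes": n, **feats}

model = MultiHeadDetector({"coco": 80, "objects365": 365, "openimages": 500})
out = model.forward("img.jpg", "objects365")
```

During training, only the head matching the current batch's source dataset produces a loss, while the backbone receives gradients from every dataset.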

The author proposes a novel data-driven way to learn a unified label space: learn it from the detection results produced by the different detection heads on each dataset's validation set. The author casts the fusion of the label spaces as learning a mapping-relationship matrix from each original space L_k to the unified space L.

The specific optimization objective is the following formula:

\min_{\{T_k\}} \sum_k L_c\left(D\, T_k,\ \tilde{D}^k\right)
The meanings of the variables appearing in the above formula are as follows:

● T_k: the mapping matrix corresponding to dataset k

● D: the category probability distribution of each prediction box in the unified label space

● \tilde{D}^k: the category probability distribution of each prediction box in the label space of dataset k

● L_c: the loss function being optimized

Since the mapping matrix cannot be optimized directly (the initial mapping relationship is unknown), the author cleverly reformulates the problem as exhaustively enumerating the possible mappings of each category in each dataset to find the best mapping matrix. See the derivation in the paper [1].
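The enumeration idea can be illustrated with a toy example. The confidence values below are made up, and the simple L1 disagreement stands in for the paper's actual loss L_c over full mapping matrices:

```python
# Run both heads on the SAME detection boxes, then exhaustively try each
# candidate alignment and keep the one with the lowest disagreement.
head_a = {"car": [0.9, 0.8, 0.1], "mouse": [0.0, 0.1, 0.9]}  # dataset A head
head_b_label = "automobile"                                   # dataset B head
head_b_scores = [0.85, 0.75, 0.15]

def merge_cost(scores_a, scores_b):
    """L1 disagreement if two labels are mapped to one unified label."""
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b))

# exhaustively test mapping the B label onto each A label
best_match = min(head_a, key=lambda la: merge_cost(head_a[la], head_b_scores))
```

Because "automobile" and "car" fire on the same boxes with similar confidences, the search merges them, while "mouse" stays separate.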

Table 1 Effects of different sampling strategies

As shown in Table 1, the author compares the effects of different sampling strategies. Uniform dataset sampling balances the number of samples drawn from each dataset; class-aware sampling counteracts the long-tail effect within a dataset by increasing the probability that tail categories are sampled. The results show that both uniform dataset sampling and class-aware sampling are very important.
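The two strategies can be sketched as sampling weights; the dataset sizes and class counts below are illustrative:

```python
# Uniform dataset sampling: each dataset contributes equally overall, so a
# single image's weight is inversely proportional to its dataset's size.
dataset_sizes = {"coco": 118_000, "objects365": 1_700_000}

def uniform_dataset_weights(sizes):
    return {name: (1.0 / len(sizes)) / n for name, n in sizes.items()}

w = uniform_dataset_weights(dataset_sizes)

# Class-aware sampling: images containing rarer classes are sampled more
# often, counteracting the long tail.
def class_aware_weight(image_classes, class_image_counts):
    return max(1.0 / class_image_counts[c] for c in image_classes)

counts = {"person": 100_000, "unicycle": 50}
```

Under uniform dataset sampling, a single COCO image is drawn far more often than a single Objects365 image; under class-aware sampling, an image containing a rare class like "unicycle" outweighs one containing only "person".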

NO.2

Experimental results

The author used Cascade R-CNN with a ResNet-50 backbone for the experiments. First, as shown in Table 2, the proposed multi-detection-head model already matches the single-detection-head models (8x schedule) even without fusing the label space, providing reliable detection results for the subsequent learning of the label mapping matrix.

Table 2 Effect comparison between multi-detection head model and single detection head model

After learning the unified label space through the above method, the author compares the impact of different versions of the unified label space on the model performance in Table 3 (2x schedule).

● GloVe embedding: merging similar labels using GloVe word vectors;

● Learned, distortion and Learned, AP: unified label spaces learned with the paper's method under different optimization loss functions;

● Expert human: a unified label space merged manually by human experts.

Table 3 Comparison of label space effects of different versions

As shown in Table 4, the model with the unified label space generalizes to other downstream datasets and achieves better results on most of them.

Table 4 Generalization performance on other datasets

Finally, when the model is scaled up, using a ResNeSt-200 backbone and an 8x training schedule, its performance is comparable to some SOTA models.

Table 5 Expanding the scale of the model

The author analyzed the learned unified label space and found that this method successfully separates labels with the same name but different semantics, and fuses labels with different names but the same semantics.

3. Actual operation

The author has open-sourced the paper's code; download address: https://github.com/xingyizhou/UniDet

The code relies on the detectron2 framework, which must be installed first according to its instructions. Documentation: https://detectron2.readthedocs.io/en/latest/tutorials/install.html

After installing detectron2, you need to put the paper code in the projects directory:

cd /path/to/detectron2/projects
git clone https://github.com/xingyizhou/UniDet.git

It is recommended to download the required COCO, Objects365, and OpenImages (2019 challenge version) datasets into a single unified folder. (OpenImages download address: https://storage.googleapis.com/openimages/web/challenge2019.html). The downloaded COCO and Objects365 data can be used directly, but OpenImages requires conversion:

# Convert using UniDet/tools/convert_datasets/convert_oid.py
python convert_oid.py --path /path/to/openimage --version challenge_2019 --subsets train
python convert_oid.py --path /path/to/openimage --version challenge_2019 --subsets val --expand_label

Use the following command to point the DETECTRON2_DATASETS environment variable at the folder containing the downloaded datasets, so that detectron2 can read them:

export DETECTRON2_DATASETS=/path/to/datasets

Next, download the challenge-2019-label500-hierarchy.json file (download address: https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-label500-hierarchy.json), and change oid_hierarchy_path in UniDet/unidet/evaluation/oideval.py to the file's actual path.

 

In addition, the OpenImages-related config files in UniDet/configs/ reference an openimages_challenge_2019_train_v2_cat_info.json file that is not provided in the code repository.

From the training code, we can see that this file stores, for each category in the OpenImages dataset, the number of images containing that category. You can generate it yourself by counting the images per category; the format is as follows:

 [{"id": 419, "image_count": 45938}, {"id": 231, "image_count": 31351}, {"id": 71, "image_count": 130723}, {"id": 114, "image_count": 378077}, {"id": 117, "image_count": 3262}, {"id": 30, "image_count": 289999}, {"id": 11, "image_count": 58145}, {"id": 165, "image_count": 265635}, {"id": 345, "image_count": 29521}, ...]
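A short script can regenerate this file from COCO-format annotations. The function below is a sketch under the assumption that the converted OpenImages annotations follow the standard COCO layout; the file names in the usage comment are placeholders:

```python
import json
from collections import defaultdict

def build_cat_info(coco_dict):
    """Count, per category id, how many distinct images contain it."""
    images_per_cat = defaultdict(set)
    for ann in coco_dict["annotations"]:
        images_per_cat[ann["category_id"]].add(ann["image_id"])
    return [{"id": cid, "image_count": len(imgs)}
            for cid, imgs in sorted(images_per_cat.items())]

# usage (paths are placeholders - point at your converted annotations):
# with open("oid_challenge_2019_train.json") as f:
#     cat_info = build_cat_info(json.load(f))
# with open("openimages_challenge_2019_train_v2_cat_info.json", "w") as f:
#     json.dump(cat_info, f)
```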

At this point, all preparations are complete and training can begin. To train a multi-detection-head model with a 2x schedule, use the following command:

python UniDet/train_net.py --config-file UniDet/configs/Partitioned_COI_R50_2x.yaml

In UniDet/datasets/label_spaces/, the author provides several versions of the unified label space, including the one learned by the method proposed in the paper.

To reproduce the label-space learning step, run inference with the trained multi-detection-head model on the validation sets of the three datasets, then follow the steps in UniDet/tools/UniDet_learn_labelspace_mAP.ipynb.

To add a custom dataset, register it in UniDet/unidet/data/datasets/ and modify the config file accordingly. For details on registration, see the detectron2 documentation: https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html

4. Dataset download

The datasets mentioned in this article are available for free, high-speed download at the following addresses:

● COCO 2014:https://opendatalab.org.cn/COCO_2014

● COCO 2017:https://opendatalab.org.cn/COCO_2017

● Objects365:https://opendatalab.org.cn/Objects365

● OpenImages Challenge 2019:https://opendatalab.org.cn/Open_Images_Challenge_2019

● OpenImages V4:https://opendatalab.org.cn/Open_Images_V4

● OpenImages V6:https://opendatalab.org.cn/OpenImages-v6

What other datasets would you like to learn about? For more resources, visit the OpenDataLab official website, which offers 3700+ massive, safe, and convenient dataset resources to meet your needs. Welcome to explore and download.

● OpenDataLab official website:https://opendatalab.org.cn/

5. Postscript

This article introduced the problems of jointly training on multiple datasets, walked through a simple paper on the topic, and showed how to use its code and the related dataset resources. Future posts will cover other multi-dataset training methods; likes, forwards, and shares help keep the updates coming~

References

[1] Zhou, Xingyi, Vladlen Koltun, and Philipp Krähenbühl. "Simple multi-dataset detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

[2] Zhao, Xiangyun, et al. "Object detection with a unified label space from multiple datasets." European Conference on Computer Vision. Springer, Cham, 2020.

Author丨Langy 

believe in the power of light

- End -

That is all for this installment. For massive dataset resources, visit the OpenDataLab official website; for more open-source tools and projects, visit the OpenDataLab GitHub space. If there is anything else you would like to see, tell our assistant: more datasets coming online, more comprehensive dataset interpretations, responsive online Q&A, and an active community of peers await. Add WeChat opendatalab_yunying to join the official OpenDataLab communication group.


Origin blog.csdn.net/OpenDataLab/article/details/127791600