[Paper Reading] Active Learning for Deep Object Detection via Probabilistic Modeling

Paper address: https://openaccess.thecvf.com/content/ICCV2021/html/Choi_Active_Learning_for_Deep_Object_Detection_via_Probabilistic_Modeling_ICCV_2021_paper.html
Code: https://github.com/NVlabs/AL-MDN
Published in: ICCV 2021

Abstract

Active learning aims to reduce labeling cost by selecting only the most informative samples in a dataset. Few existing works address active learning for object detection, and most of them either rely on multiple models or are direct extensions of classification methods, using only the classification head to estimate the informativeness of an image. In this paper, we propose a novel deep active learning method for object detection. Our approach relies on mixture density networks to estimate a probability distribution for the output of each localization and classification head, explicitly estimating both aleatoric and epistemic uncertainty in a single forward pass of a single model. A scoring function aggregates these two uncertainties across the two heads to obtain an informativeness score for each image. We demonstrate the effectiveness of our method on the PASCAL VOC and MS-COCO datasets: it outperforms single-model-based methods and performs on par with multi-model-based methods, at low computational cost.

I. Introduction

This paper is one of the few active learning methods tailored specifically to object detection models. In fact, the existing active learning classification methods that claim to be applicable to object detection are, for the most part, not open source. In terms of novelty, the highlight of this paper is that it decouples the traditional notion of uncertainty into two parts:

  • Aleatoric uncertainty: the model's uncertainty about the sample itself. That is,
    if the model has low confidence in its predictions on the current sample, the aleatoric uncertainty for that sample is high.
  • Epistemic uncertainty: the discrepancy between the current sample and the samples already in the training set. That is, if the sample's features lie far from the feature space covered by the labeled set, the epistemic uncertainty is high.

It is easy to see that, taken separately, these two kinds of uncertainty correspond to the two mainstream families of solutions: most task-agnostic methods are based on aleatoric uncertainty, while most methods that rely on the task model are based on epistemic uncertainty. The distinguishing feature of this paper is that both ideas are used at the same time, and both uncertainties are obtained through "probabilistic modeling".
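Concretely, the paper reads both quantities off the parameters of the Gaussian mixture that the network predicts (the mixture output itself is introduced in the Method section below). Here is a minimal sketch of that standard variance decomposition of a mixture; the function name and toy numbers are illustrative, not taken from the paper's code:

```python
import torch

def mixture_uncertainties(pi, mu, sigma_sq):
    """Split the predictive variance of a K-component Gaussian mixture.

    pi:       (K,) mixture weights (already softmax-normalized)
    mu:       (K,) component means
    sigma_sq: (K,) component variances

    aleatoric = sum_k pi_k * sigma_k^2           (noise each component expects)
    epistemic = sum_k pi_k * (mu_k - mu_bar)^2   (disagreement between components)
    """
    mu_bar = (pi * mu).sum()                      # overall mixture mean
    aleatoric = (pi * sigma_sq).sum()             # weighted average variance
    epistemic = (pi * (mu - mu_bar) ** 2).sum()   # weighted spread of the means
    return aleatoric, epistemic

# Components that agree closely -> low epistemic, moderate aleatoric.
pi = torch.tensor([0.5, 0.3, 0.2])
mu = torch.tensor([1.00, 1.05, 0.95])
sigma_sq = torch.tensor([0.20, 0.30, 0.25])
print(mixture_uncertainties(pi, mu, sigma_sq))
```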

II. Method

This paper is one of the few active learning algorithms designed natively for object detection. Previously, the usual route was to design an algorithm for classification and then extend it to object detection (i.e., apply it to the classification head of the detection model). So why do so few people take the native approach? Because the localization head outputs a fixed box (x, y, w, h), it is not straightforward to derive an uncertainty from this deterministic output. If you insist on deriving one, you most likely have to resort to ensemble-style methods: train several slightly different models and compute the uncertainty from the disagreement between their outputs. But the training cost of that approach is huge. The classification head, by contrast, does not have this problem: although its prediction is also essentially a point estimate, an uncertainty can still be computed simply by looking at the output of the softmax layer, as in the sketch below.
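For reference, the "simple" classification-head route mentioned above is typically something like the entropy of the softmax output; a quick illustrative sketch (not from the paper):

```python
import torch
import torch.nn.functional as F

def softmax_entropy(logits):
    """Entropy of the predicted class distribution; higher = less confident."""
    p = F.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

print(softmax_entropy(torch.tensor([5.0, 0.1, 0.1])))  # confident -> low entropy
print(softmax_entropy(torch.tensor([1.0, 1.0, 1.0])))  # ambiguous -> log(3)
```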

So is there a cheaper way to estimate the uncertainty of the localization head? This is the core innovation of this article, the "Probabilistic Modeling" in the title, implemented through a plug-in module: the Mixture Density Network (MDN). For an ordinary network, an input x produces a point estimate y, and deriving uncertainty from y is an indirect process. For an MDN, an input x produces a Gaussian mixture distribution, and the final y is taken from that distribution. In other words, the uncertainty can be read off directly from properties of the distribution such as its means and variances; in fact, both the aleatoric and the epistemic uncertainty in this paper are derived heuristically from this distribution. A minimal sketch of such a head follows.
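Below is what such an MDN head can look like in PyTorch, predicting a K-component mixture for a single regression target (say, one box coordinate). This is a simplified illustration under my own naming, not the exact AL-MDN architecture; the official code is at the GitHub link above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    """Maps a feature vector to the parameters (pi, mu, sigma^2) of a
    K-component Gaussian mixture over one regression target."""

    def __init__(self, in_dim: int, k: int = 3):
        super().__init__()
        self.pi = nn.Linear(in_dim, k)       # mixture weight logits
        self.mu = nn.Linear(in_dim, k)       # component means
        self.log_var = nn.Linear(in_dim, k)  # log variances, for positivity

    def forward(self, feat):
        pi = F.softmax(self.pi(feat), dim=-1)
        mu = self.mu(feat)
        sigma_sq = torch.exp(self.log_var(feat))
        return pi, mu, sigma_sq

head = MDNHead(in_dim=256, k=3)
pi, mu, sigma_sq = head(torch.randn(1, 256))
y_hat = (pi * mu).sum(dim=-1)  # point prediction = mixture mean
# (pi, mu, sigma_sq) then feed the decomposition sketched earlier, giving
# aleatoric and epistemic uncertainty for this output in one forward pass.
```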


Origin blog.csdn.net/qq_40714949/article/details/123502715