The impact of negative and positive samples on the performance of object detection algorithms

Recently, while working on a pedestrian detection task, I noticed after cleaning the data that some samples had empty labels. This made me want to examine the impact of these empty-label samples on model performance.

1. Concept definition

Negative samples: in an object detection task, some images in the dataset contain no targets; these images are usually called negative samples.

Positive samples: images that contain at least one target.

Background: the region of an image that does not contain any object. This is different from a negative sample: a negative sample refers to an entire image, while background refers to the regions outside the bounding boxes.
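To make the distinction concrete, here is a minimal sketch that splits a dataset into positive and negative samples. It assumes a hypothetical YOLO-style layout where each image has a .txt label file and an empty file marks a negative sample; the function name and layout are illustrative, not a standard API.

```python
import os

def split_by_label(label_dir):
    """Split a dataset into positive and negative samples.

    Assumes one YOLO-style .txt label file per image; a file with no
    box lines marks a negative sample (an image with no targets).
    """
    positives, negatives = [], []
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name)) as f:
            has_boxes = any(line.strip() for line in f)
        (positives if has_boxes else negatives).append(name)
    return positives, negatives
```

Counting the two lists is enough to compute the negative-sample proportion discussed below.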

2. Discussion

1. In an object detection task, some images in the dataset contain no targets (i.e., there are many negative samples). How does removing or keeping these images affect model performance?

In object detection tasks, both removing and keeping images without objects can affect model performance. Specifically, if the dataset contains a large number of target-free images and they are not removed, the model may learn misleading features and fail to identify target objects correctly. For example, ImageNet contains many negative samples and some labeling errors, yet large models tend to develop a degree of error tolerance and can compensate through extensive learning; see, for instance, reports that removing ImageNet label errors substantially changes model rankings.

On the other hand, removing these target-free images shrinks the dataset, which may hurt the model's generalization ability, that is, its ability to correctly detect objects in new, unseen data. With a smaller dataset, the model may not learn enough features, reducing its generalization ability.

Therefore, trade-offs need to be made case by case. If target-free images make up a large proportion of the dataset, consider removing some of them; if the proportion is small, keep them and use other methods (such as data augmentation) to improve the model's generalization ability.

The question is roughly what proportion warrants removal. Generally speaking, if negative samples make up a small share of the dataset, they can be kept; if they make up a large share, some should be removed. However, simply shrinking the dataset is not a good solution; the better approach is to balance positive and negative samples.

Detailed analysis is as follows:

(1) Whether the proportion of target-free images is acceptable must be decided case by case. As a rough guideline, a "small" proportion usually means target-free images make up less than 10% of the dataset. This is not a hard rule: if the dataset is small, or the target objects are hard to identify, a higher proportion of target-free images may be present and tolerable.
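As a sketch of this guideline, the helper below caps negatives at a chosen fraction of the final dataset and randomly subsamples the rest. The function name `balance_negatives` and the 10% default are my own illustrative choices, not a standard recipe.

```python
import random

def balance_negatives(positives, negatives, max_neg_ratio=0.1, seed=0):
    """Keep negatives up to max_neg_ratio of the final dataset.

    The 0.1 default mirrors the rough 10% guideline above. Solving
    n_neg / (n_pos + n_neg) <= r for n_neg gives the cap below.
    """
    n_pos = len(positives)
    max_neg = int(max_neg_ratio * n_pos / (1.0 - max_neg_ratio))
    if len(negatives) <= max_neg:
        return positives + negatives  # already within the budget
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    return positives + rng.sample(negatives, max_neg)
```

In practice the ratio should be tuned experimentally, as the surrounding discussion suggests.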

In addition, even when the proportion of target-free images is small, their effect on the model still deserves attention. If the target-free images contain backgrounds or scenes similar to those surrounding the target objects, they can still provide useful information; in that case, keeping them may help improve the model's generalization ability.

In general, deciding whether to keep target-free images requires weighing factors such as dataset size, the difficulty of the target objects, and how similar the target-free images are to real scenes. The final decision should be based on experiments and practice.

(2) If the model's deployment scenario overlaps with the scenarios in the test set, and the test set is large enough, consider keeping the target-free images: this better reflects the model's performance in real scenes. However, if the test set contains too many target-free images, the evaluation may be biased, because the model has mostly learned to recognize images with objects rather than images without them. Likewise, if the training set contains too many target-free images, the model may overfit; in that case, consider removing a portion of them to reduce the risk of overfitting.

On the other hand, if the deployment scenario never involves target-free images, or the test set contains very few of them, these images can be removed. This lowers the learning difficulty and lets the model focus on recognizing target objects. Note, however, that excluding them may bias the evaluation, because the model is then never assessed on images without objects.

To sum up, whether to remove target-free images from the test set depends on the specific scenario and requirements, and the factors above must be weighed against each other.

2. Removing negative samples from the test set can make the evaluation metrics worse.

Theoretical analysis: in an object detection task, test metrics can worsen after target-free samples are removed from the test set. This may be because the test-set distribution then differs from the distribution of the real scenario, so the model's measured generalization performance drops.

Specifically, if the test set originally contains many target-free samples and the real scenario also contains many target-free images, then removing those samples makes the test-set distribution inconsistent with the real distribution, and the reported performance no longer reflects how the model behaves in deployment.

In addition, removing target-free samples from the test set biases the evaluation in another way: if the model never learned how to handle target-free images during training, removing them from the test set means the model is never properly evaluated on those cases.

Therefore, if the test set contains many target-free samples, the right response is to address the model's performance on those cases rather than to remove them. For example, data augmentation can be used to add target-free samples to training so the model better learns the characteristics of such scenes.

Metric analysis: going through the evaluation metrics shows that removing negative samples changes the total number of images, which affects any metric computed over that total, such as FPPI (false positives per image).
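FPPI illustrates the effect directly: it divides the false-positive count by the number of images evaluated, so deleting negative images shrinks the denominator and inflates FPPI even when the detector's outputs are unchanged. A minimal sketch:

```python
def fppi(num_false_positives, num_images):
    """False Positives Per Image: total false positives / total images."""
    return num_false_positives / num_images

# Same detector, same 20 false positives, but 50 negative images removed:
# fppi(20, 200) = 0.10, while fppi(20, 150) ≈ 0.13 — the metric worsens
# purely because the test set shrank.
```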

3. Methods for balancing positive and negative samples

(1) Resampling: balance the numbers of positive and negative samples by adding minority-class samples or reducing majority-class samples, e.g., via undersampling and oversampling.
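A minimal sketch of both options, assuming the samples are simply lists of items; the function name `resample` and its `mode` parameter are illustrative:

```python
import random

def resample(majority, minority, mode="undersample", seed=0):
    """Balance two sample lists by random under- or over-sampling."""
    rng = random.Random(seed)
    if mode == "undersample":
        # shrink the majority class to the minority size
        majority = rng.sample(majority, len(minority))
    elif mode == "oversample":
        # duplicate random minority samples until the sizes match
        minority = minority + rng.choices(minority, k=len(majority) - len(minority))
    return majority, minority
```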

(2) Class Weighting: assign different weights to samples of different classes so the model pays more attention to rare classes during training, e.g., via sample weighting (Sample Weighting) or loss-function weighting (Loss Weighting).
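A common simple scheme is to weight each class inversely to its frequency and scale the per-sample loss accordingly; the two helpers below are an illustrative sketch, not a fixed API:

```python
import math

def inverse_freq_weights(counts):
    """Weight each class inversely to its frequency: w_c = total / count_c."""
    total = sum(counts.values())
    return {c: total / n for c, n in counts.items()}

def weighted_ce(prob_true_class, class_weight):
    """Cross-entropy for one sample, scaled by its class weight."""
    return -class_weight * math.log(prob_true_class)
```

With 10 pedestrians and 90 background samples, the pedestrian class gets a 9x larger weight, so its errors contribute correspondingly more to the total loss.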

(3) Data Augmentation: expand the training set by applying transformations (rotation, flipping, scaling, cropping, etc.) to existing samples, balancing the number and class distribution of positive and negative samples. Random Scaling, Random Cropping, and Random Rotation are typical methods.
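As one concrete example, a horizontal flip is cheap and preserves labels exactly: for YOLO-format boxes (normalized center x, center y, width, height) only the x-center changes. A minimal sketch:

```python
def hflip_boxes(boxes):
    """Horizontally flip YOLO-format boxes (cx, cy, w, h, normalized to [0, 1]).

    Mirroring the image maps each x-center to 1 - cx; widths, heights,
    and y-centers are unchanged.
    """
    return [(1.0 - cx, cy, w, h) for cx, cy, w, h in boxes]
```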

(4) Generative Adversarial Networks (GANs): generate synthetic minority-class samples so the model can better learn minority-class characteristics, e.g., with Conditional GANs.

(5) Algorithm-level adjustment: for some detectors, such as Faster R-CNN, hyperparameters like the RPN's IoU thresholds and the NMS threshold can be tuned to balance positive and negative samples.
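Faster R-CNN's RPN already does one form of this internally: it subsamples anchors so that positives fill at most a fixed fraction of each mini-batch. The sketch below is a simplified illustration of that idea; the parameter names echo torchvision's `rpn_batch_size_per_image` / `rpn_positive_fraction` settings, but the code itself is my own simplification, not the library implementation.

```python
import random

def sample_anchors(labels, batch_size=256, positive_fraction=0.5, seed=0):
    """Subsample anchor labels RPN-style.

    labels: list of 1 (positive), 0 (negative), -1 (ignored).
    At most positive_fraction * batch_size positives are kept; the rest
    of the mini-batch is filled with randomly chosen negatives.
    """
    rng = random.Random(seed)
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    n_pos = min(len(pos), int(batch_size * positive_fraction))
    n_neg = min(len(neg), batch_size - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)
```

Even when negatives outnumber positives 100:1 among the anchors, the sampled mini-batch stays roughly balanced.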

The following focuses on methods from the literature; the list here is incomplete, so take it as a brief overview (to be expanded in future updates):

Oksuz et al. surveyed the imbalance problems in object detection in detail and divided them into four categories: spatial imbalance, objective imbalance, class imbalance, and scale imbalance. Spatial and objective imbalance mainly concern the spatial properties of bounding boxes and the multiple loss functions, while class imbalance arises from significant inequality between different classes in the training data. Park, K., Kim, S., Sohn, K.: Unified multi-spectral pedestrian detection based on probabilistic fusion networks. Pattern Recognition 80, 143-155 (2018)

RetinaNet addresses class imbalance with the focal loss, which reshapes the standard cross-entropy loss so that the huge number of easy negative samples does not overwhelm the detector.
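The binary focal loss from the RetinaNet paper is FL(pt) = -αt (1 - pt)^γ log(pt), with defaults α = 0.25 and γ = 2: the (1 - pt)^γ factor shrinks the loss of well-classified (easy) samples. A minimal sketch:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - pt)**gamma * log(pt).

    p is the predicted probability of the positive class, y in {0, 1}.
    Easy samples (pt close to 1) are down-weighted by (1 - pt)**gamma,
    so masses of simple background anchors no longer dominate training.
    """
    pt = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)
```

An easy negative (p = 0.01, y = 0) contributes almost nothing, while a hard negative (p = 0.9, y = 0) keeps a large loss.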

AP-Loss and DR Loss also offer loss-function designs for the class-imbalance problem. Chen, K., Li, J., Lin, W., See, J., Wang, J., Duan, L., Chen, Z., He, C., Zou, J.: Towards accurate one-stage object detection with AP-loss. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5119-5127 (2019)

Scale imbalance occurs when object bounding boxes of certain sizes are over-represented. For example, SSD makes independent predictions from features at different layers; since different layers carry different levels of abstraction, predicting directly from each backbone layer is unreliable. FPN adds a top-down pathway to obtain a balanced mixture of features across scales, and can be further enhanced by integrating and refining the pyramid feature maps.
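The core of FPN's top-down pathway is: upsample the coarser (semantically stronger) map and add it to the finer one. The sketch below is heavily simplified, using nearest-neighbor upsampling and element-wise addition on plain nested lists and omitting the 1x1 lateral convolutions; the function names are my own.

```python
def upsample2x(feat):
    """Nearest-neighbor 2x upsample of a 2D feature map (list of lists)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate each row
    return out

def top_down_merge(coarse, fine):
    """One FPN top-down step: upsample the coarser map, add the finer one."""
    up = upsample2x(coarse)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, fine)]
```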

Beyond balancing different feature levels, in two-stream networks the features of different modalities should also be balanced and fused; that is, they should be fully integrated and represented so that training is optimized in a modality-balanced way.

This is just my own rough understanding; feel free to discuss and exchange ideas in the comments.


Source: blog.csdn.net/qq_37424778/article/details/129802859