1. Introduction to small target detection
1.1 Definition of small objects
1) Taking COCO, a general-purpose dataset in the field of object detection, as an example: a small object is one whose area is smaller than 32×32 pixels (a medium object is between 32×32 and 96×96 pixels, and a large object is larger than 96×96 pixels);
2) In practical applications, small objects are more often defined relative to the original image: multiply the width and height of the object's bounding box, divide by the product of the width and height of the whole image, and take the square root. If the result is less than 3% (0.03), the object is called a small object;
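The two definitions above can be written as a short Python sketch. Box area stands in for object area here, which is a simplification (COCO's own evaluation uses the annotation's `area` field), and the function names are illustrative only.

```python
import math

# COCO size thresholds, expressed as areas in pixels.
SMALL_MAX_AREA = 32 * 32    # 1024
MEDIUM_MAX_AREA = 96 * 96   # 9216


def coco_size_category(box_w: float, box_h: float) -> str:
    """Absolute definition: bucket an object by its area in pixels."""
    area = box_w * box_h
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def relative_scale(box_w: float, box_h: float, img_w: float, img_h: float) -> float:
    """sqrt((box_w * box_h) / (img_w * img_h)): the box's scale relative to the image."""
    return math.sqrt((box_w * box_h) / (img_w * img_h))


def is_small_relative(box_w, box_h, img_w, img_h, threshold=0.03) -> bool:
    """Relative definition: small if the relative scale is below 3%."""
    return relative_scale(box_w, box_h, img_w, img_h) < threshold


print(coco_size_category(40, 40))                      # medium (1600 px^2)
print(round(relative_scale(40, 40, 1920, 1080), 4))    # 0.0278 (40 / 1440)
print(is_small_relative(40, 40, 1920, 1080))           # True
print(is_small_relative(40, 40, 640, 480))             # False
```

The 40×40 box shows that the two definitions can disagree: it is "medium" under the COCO thresholds but "small" relative to a 1920×1080 frame.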
1.2 Difficulties
1) Few samples contain small objects, which can bias the detection model toward medium and large objects;
2) Small objects cover a smaller area, so the positions at which they appear lack diversity. We speculate that this makes it difficult for small object detection to generalize at validation time;
2. Solutions to the difficulties of small target detection
These difficulties are mainly addressed through data optimization (e.g., graffiti data augmentation and mosaic augmentation), network optimization, attention mechanisms, and loss optimization.
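Mosaic augmentation stitches four training images into one. The NumPy sketch below is a simplified illustration, not YOLOv5's implementation: it picks a random mosaic centre, stretches each source image to fill one quadrant with nearest-neighbour resizing (aspect ratio is not preserved), and maps the boxes into mosaic coordinates. The helper names `mosaic4` and `_resize_nearest` are hypothetical.

```python
import numpy as np


def _resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize (keeps the sketch free of OpenCV/PIL)."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def mosaic4(images, boxes_list, out_size=640, rng=None):
    """Stitch 4 images into one out_size x out_size canvas.

    images:     list of 4 HxWx3 uint8 arrays
    boxes_list: list of 4 arrays of shape (N, 4), (x1, y1, x2, y2) in pixels
    Returns the mosaic image and all boxes mapped into mosaic coordinates.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = out_size
    canvas = np.zeros((s, s, 3), dtype=np.uint8)  # every pixel is overwritten below
    # Random mosaic centre, kept away from the borders.
    xc = int(rng.uniform(0.25 * s, 0.75 * s))
    yc = int(rng.uniform(0.25 * s, 0.75 * s))
    # Quadrants: top-left, top-right, bottom-left, bottom-right.
    regions = [(0, 0, xc, yc), (xc, 0, s, yc), (0, yc, xc, s), (xc, yc, s, s)]
    out_boxes = []
    for img, boxes, (x1, y1, x2, y2) in zip(images, boxes_list, regions):
        rh, rw = y2 - y1, x2 - x1
        h, w = img.shape[:2]
        # Resize each source image to fill its quadrant exactly.
        canvas[y1:y2, x1:x2] = _resize_nearest(img, rh, rw)
        # Scale the boxes by the same factors, then shift them into the quadrant.
        b = np.asarray(boxes, dtype=np.float64).reshape(-1, 4).copy()
        b[:, [0, 2]] = b[:, [0, 2]] * (rw / w) + x1
        b[:, [1, 3]] = b[:, [1, 3]] * (rh / h) + y1
        out_boxes.append(b)
    return canvas, np.concatenate(out_boxes, axis=0)
```

Because each source image is usually shrunk to fit its quadrant, objects tend to become smaller in the mosaic, which is one reason mosaic augmentation is often used when small objects matter.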
2.1 Attention improves the detection accuracy of small targets
2.1.1 Context information: CAM (context augmentation module)
Because of their low resolution and small size, tiny objects are hard to detect. The main reasons for the poor performance of tiny object detection are the limitations of the network and the imbalance of the training dataset. This work proposes a novel feature pyramid network that combines context augmentation and feature refinement. Features obtained by multi-scale dilated convolution are fused and injected into the feature pyramid network from top to bottom to supplement context information. During multi-scale feature fusion, channel and spatial feature refinement mechanisms are introduced to suppress conflicting information across scales and prevent tiny objects from being submerged in it. In addition, a data augmentation method called copy-reduce-paste is proposed, which increases the contribution of tiny objects to the loss during training and thus ensures more balanced training.
2.1.2 ConvNeXt
2.1.3 ECVBlock
Reference: "YoloV5-based CFPNet: ECVBlock for small target detection, plug and play, helps improve detection accuracy" (AI Little Monster, CSDN blog). The EVC (Explicit Visual Center) proposed in CFPNet consists mainly of two blocks connected in parallel: a lightweight MLP captures the global long-range dependencies (i.e., global information) of the top-level features, while a learnable visual center mechanism aggregates local corner-region features (i.e., local information).
2.1.4 Context Aggregation: a generalized building block for multi-head context aggregation
2.2 Multiple detection heads
YOLOv5 has three detection heads and can detect objects at multiple scales, but its ability to detect tiny objects may be poor. Adding an extra detection head dedicated to tiny objects can therefore raise accuracy considerably, with a clear improvement in mAP;
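To see why an extra head helps, the arithmetic below compares the output grids of YOLOv5's default heads (strides 8, 16 and 32 on P3 to P5, three anchors per cell) with a configuration that adds a stride-4 head on P2. Placing the tiny-object head at stride 4 is an assumption about the usual way this is done, and the helper names are hypothetical.

```python
# YOLOv5's three default heads sit on P3/P4/P5 (strides 8, 16, 32).
DEFAULT_STRIDES = (8, 16, 32)
# Assumed extra tiny-object head on P2 (stride 4).
WITH_P2 = (4, 8, 16, 32)


def grid_shapes(img_size, strides):
    """Side length of each head's output grid for a square input."""
    return {s: img_size // s for s in strides}


def footprint_cells(obj_px, stride):
    """Object side length measured in grid cells of the head with this stride."""
    return obj_px / stride


def num_predictions(img_size, strides, anchors_per_cell=3):
    """Total anchor predictions across all heads."""
    return sum((img_size // s) ** 2 * anchors_per_cell for s in strides)


print(grid_shapes(640, DEFAULT_STRIDES))   # {8: 80, 16: 40, 32: 20}
print(grid_shapes(640, WITH_P2))           # {4: 160, 8: 80, 16: 40, 32: 20}
for s in WITH_P2:
    print(s, footprint_cells(8, s))        # an 8x8 px object: 2.0, 1.0, 0.5, 0.25 cells
print(num_predictions(640, DEFAULT_STRIDES), num_predictions(640, WITH_P2))  # 25200 102000
```

An 8×8 pixel object spans only a quarter of a cell on the stride-32 head but 2×2 cells on a stride-4 head, so the finer grid keeps more of its spatial detail. The trade-off is cost: the number of predictions roughly quadruples, from 25200 to 102000 at 640×640.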
2.3 Loss optimization
2.3.1 Wasserstein Distance Loss
1) It analyzes how sensitive IoU is to positional deviations of small objects and proposes NWD (Normalized Wasserstein Distance) as a better metric for the similarity between two bounding boxes;
2) It builds a strong tiny object detector by applying NWD to label assignment, NMS, and the loss function of anchor-based detectors;
3) NWD significantly improves the tiny object detection (TOD) performance of popular anchor-based detectors; for Faster R-CNN on the AI-TOD dataset, performance improves from 11.1% to 17.6%;
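For reference, the quantities behind NWD can be written out as follows. This follows the NWD paper's formulation as understood here: each box (c_x, c_y, w, h) is modelled as a 2-D Gaussian, and C is a dataset-dependent normalizing constant.

```latex
% A box (c_x, c_y, w, h) modelled as a 2-D Gaussian N(mu, Sigma)
\mu = \begin{bmatrix} c_x \\ c_y \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} w^2/4 & 0 \\ 0 & h^2/4 \end{bmatrix}

% Squared 2-Wasserstein distance between two such Gaussians
W_2^2(\mathcal{N}_a, \mathcal{N}_b) =
\left\lVert
\left[c_{x_a}, c_{y_a}, \tfrac{w_a}{2}, \tfrac{h_a}{2}\right]^{\top}
- \left[c_{x_b}, c_{y_b}, \tfrac{w_b}{2}, \tfrac{h_b}{2}\right]^{\top}
\right\rVert_2^2

% Normalized form and the corresponding loss (p: prediction, g: ground truth)
\mathrm{NWD}(\mathcal{N}_a, \mathcal{N}_b) =
\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a, \mathcal{N}_b)}}{C}\right),
\qquad
\mathcal{L}_{\mathrm{NWD}} = 1 - \mathrm{NWD}(\mathcal{N}_p, \mathcal{N}_g)
```

Because the covariances are diagonal, the Wasserstein distance reduces to a plain Euclidean distance between the vectors (c_x, c_y, w/2, h/2), which is why it stays well defined and smooth even when the boxes do not overlap.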
The main advantages of the Wasserstein distance are:
- It measures the similarity between the two distributions even when the small objects' boxes do not overlap at all (where IoU is simply 0);
- NWD is insensitive to object scale, which makes it better suited to measuring the similarity between small objects.
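Both advantages can be checked numerically with a short sketch. It uses the Gaussian box model with the closed-form 2-Wasserstein distance for diagonal covariances; the constant C = 12.8 is an illustrative choice (C is dataset-dependent), and `iou` / `nwd` are hypothetical helper names.

```python
import math


def _corners(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a, b):
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = _corners(a)
    bx1, by1, bx2, by2 = _corners(b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union


def nwd(a, b, c=12.8):
    """exp(-W2 / C) for boxes modelled as N((cx, cy), diag(w^2/4, h^2/4))."""
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


tiny, big = (10, 10, 6, 6), (50, 50, 36, 36)
# Same 2-pixel shift: IoU drops far more for the tiny box, NWD gives both the same value.
print(iou(tiny, (12, 10, 6, 6)), round(iou(big, (52, 50, 36, 36)), 3))            # 0.5 0.895
print(round(nwd(tiny, (12, 10, 6, 6)), 3), round(nwd(big, (52, 50, 36, 36)), 3))  # 0.855 0.855
# No overlap: IoU is 0 for both shifts, NWD still ranks the nearer box higher.
print(iou(tiny, (18, 10, 6, 6)), iou(tiny, (30, 10, 6, 6)))                       # 0.0 0.0
print(round(nwd(tiny, (18, 10, 6, 6)), 3), round(nwd(tiny, (30, 10, 6, 6)), 3))   # 0.535 0.21
```

With IoU, every non-overlapping prediction looks equally bad, so label assignment and the loss receive no signal about which one is closer; NWD keeps that signal.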