A Preliminary Study of Small Target Detection in Deep Learning

This article was first published on the WeChat public account CVHub.

Introduction

Small object detection is an extremely challenging problem in computer vision. As deep learning and computer vision continue to advance, more and more application scenarios demand accurate detection and recognition of small objects. This article starts from the definition, significance, and challenges of small object detection, then surveys the main families of solutions.


Definition

Broadly speaking, small object detection refers to detecting and recognizing objects that occupy a small size and area in an image. The exact definition depends on the application scenario, but a common convention, used by the COCO dataset as shown below, is that a small object is one smaller than 32 × 32 pixels. Of course, the size and area thresholds may differ across tasks and applications.

The COCO dataset defines evaluation metrics for three object scales (small, medium, large): about 41% of its objects are small (area < 32 × 32), 34% are medium (32 × 32 < area < 96 × 96), and 24% are large (area > 96 × 96). Among these, the AP on small objects is by far the hardest to improve!
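The COCO-style size buckets can be expressed in a few lines. A minimal sketch (areas in pixels squared; the 32 × 32 and 96 × 96 thresholds come from the COCO evaluation protocol described above):

```python
def coco_size_bucket(area: float) -> str:
    """Classify an object by its pixel area, COCO-style."""
    if area < 32 * 32:
        return "small"
    elif area < 96 * 96:
        return "medium"
    return "large"

print(coco_size_bucket(20 * 20))    # small
print(coco_size_bucket(64 * 64))    # medium
print(coco_size_bucket(128 * 128))  # large
```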

Significance

Small object detection matters because it broadens the range of scenarios the technology can serve, and it helps us understand fine details in images. It is already widely used in daily life, for example in traffic monitoring, medical image analysis, and drone aerial photography:

  • In traffic monitoring, small object detection can be used to identify traffic lights, license plates, etc.
  • In medical image analysis, it can be used to identify tiny tumor cells, etc.
  • In autonomous driving, it can be used to identify tiny obstacles, compensating for cases that are hard for lidar to detect.

Challenges

Anyone who has worked on detection tasks knows this well: small object detection has always been a very challenging problem. A few small examples to give you a feel for it:

  1. Small objects usually occupy only a tiny area of the image, so even deep learning methods struggle to extract useful information from them, let alone traditional feature extraction methods. For example, consider an indoor conference room with the camera mounted high in the upper-left corner. If you train a detection model and deploy it there, you will observe that detection in the diagonal region farthest from the camera is much worse than elsewhere, with frequent missed and false detections.

  2. Small objects lack the rich texture and color details of normal-sized objects, which makes them harder to detect and easy for the model to mistake for noise.

  3. Small objects are sometimes hard even to define. Take the simplest case of pedestrians and vehicles, shown in the picture below:

Roughly speaking, the objects inside the green boxes are easy to annotate; the trouble comes from those inside the red boxes, most of which occupy only a handful of pixels: awkward to label, yet awkward to leave unlabeled. You can of course use an ignore label so the region contributes no loss, or simply mask the area out, but in practice such "small objects" are still very likely to be missed, and having too many of them easily makes the training curve jitter.

Solutions

Now let us focus on how to tackle small object detection. Keep a critical mindset and choose the methods that fit your actual situation.

Note that two-stage detectors contain operations such as RoI Pooling, which resample (and effectively enlarge) the features of small objects, giving them sharper feature outlines and usually a higher detection rate. This article, however, focuses mainly on the more mature single-stage detectors.

Increase input image resolution

Image resolution is the biggest culprit. If an image's resolution is too low, then after, say, 32× downsampling, information about objects smaller than that stride is essentially lost. Therefore, when dealing with small object detection, it is usually necessary to increase the input resolution to better capture the objects' details, which improves both the precision and recall of small object detection. Of course, in practice this is often limited by various factors (money); well, everyone understands.
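A quick back-of-the-envelope check of this point (a generic calculation, not tied to any particular detector): with a total downsampling stride of 32, an object's footprint on the final feature map is its pixel size divided by 32. Anything under 32 px collapses to less than one feature-map cell.

```python
def feature_footprint(obj_px: int, stride: int = 32) -> float:
    """Size of an object on the final feature map, in cells."""
    return obj_px / stride

print(feature_footprint(20))      # 0.625 -> less than one cell, easily lost
print(feature_footprint(20 * 2))  # 1.25  -> doubling input resolution recovers it
```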

Increase model input size

Image scaling is another common remedy that can also improve the accuracy of small object detection. A common practice is to enable "multi-scale training" with a relatively large size range. However, increasing the model input size raises computation cost and reduces speed, so you need to strike a balance between accuracy and efficiency, tuning the input size to your actual needs and available resources.

Similarly, during inference you can enable Test-Time Augmentation (TTA) when the situation calls for it, especially in competitions.
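As an illustration of one TTA step (a minimal sketch; the detector itself is out of scope, and the box format `[x1, y1, x2, y2]` is an assumption): run the model on a horizontally flipped image, then map the predicted boxes back to original coordinates before merging.

```python
def unflip_boxes(boxes, img_w):
    """Map boxes predicted on a horizontally flipped image back to the
    original image's coordinate frame. Boxes are [x1, y1, x2, y2]."""
    return [[img_w - x2, y1, img_w - x1, y2] for x1, y1, x2, y2 in boxes]

# A box predicted at the left edge of a flipped 100-px-wide image
# corresponds to a box at the right edge of the original:
print(unflip_boxes([[0, 10, 20, 30]], img_w=100))  # [[80, 10, 100, 30]]
```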

Feature fusion

Multi-scale feature fusion

Because small objects are tiny, their feature information is spread across multiple scales of the image, so feature maps at several scales need to be fused to improve the model's perception of small objects. Common multi-scale fusion methods include the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN).
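The data flow of FPN-style top-down fusion can be sketched with toy tensors (channels and the 1×1 lateral / 3×3 smoothing convolutions of a real FPN are omitted here for brevity; this only shows the upsample-and-add pattern):

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (H, W) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_topdown(features):
    """features: list of (H, W) maps, finest first (e.g. [P3, P4, P5]).
    Each finer map is enriched by the upsampled, already-fused coarser map."""
    fused = [features[-1]]
    for f in reversed(features[:-1]):
        fused.append(f + upsample2x(fused[-1]))
    return fused[::-1]  # finest first again

p5 = np.ones((2, 2))   # coarse, semantically strong
p4 = np.ones((4, 4))
p3 = np.ones((8, 8))   # fine, detail-rich: where small objects live
out = fpn_topdown([p3, p4, p5])
print([f.shape for f in out])  # [(8, 8), (4, 4), (2, 2)]
```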

Extended Feature Pyramid Network for Small Object Detection

Long skip connections

A long skip connection fuses feature maps from different levels, helping the model capture feature information at multiple depths. As is well known, shallow feature maps are rich in detail but weak in semantics, while deep feature maps are the opposite. In small object detection, fusing low-level and high-level feature maps can therefore strengthen the localization of small objects.

Attention mechanisms

An attention mechanism focuses the model on important regions. By re-weighting the feature map, more attention can be concentrated on the regions where small objects lie, improving their detection. Common attention modules include SENet, SKNet, etc.
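The channel re-weighting idea behind SENet can be sketched in numpy (a minimal squeeze-excite block; the weight matrices `w1`/`w2` are random stand-ins for learned parameters, and the reduction ratio is illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """x: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                   # squeeze: global average pool -> (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))   # excite: FC -> ReLU -> FC -> sigmoid
    return x * e[:, None, None]               # channel-wise re-weighting

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))            # 8 channels, 4x4 spatial
w1 = rng.standard_normal((2, 8))              # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
y = se_block(x, w1, w2)
print(y.shape)  # (8, 4, 4): same shape, channels rescaled into (0, 1)
```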


Data augmentation

Data augmentation increases the number and diversity of training samples through random transformations that leave the underlying labels intact, improving the model's generalization and robustness. For small object detection tasks, the following augmentations help:

Scale transformation

Since small objects are small by nature, you can increase scale diversity by shrinking or enlarging the original images. For example, scaling an image down produces additional samples in which the objects become smaller still.

Random cropping

For images containing small objects, random cropping (without altering where the objects sit in the scene) yields many different samples and increases data diversity. Non-rectangular crops, such as polygonal crops, can also be used to better fit irregularly shaped small objects.
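The bookkeeping step of cropping is adjusting the boxes: shift them into the crop's coordinate frame, clip them to its borders, and drop boxes left with no area. A minimal sketch (box and crop format `[x1, y1, x2, y2]` assumed):

```python
def crop_boxes(boxes, crop):
    """Translate, clip, and filter boxes for a rectangular crop."""
    cx1, cy1, cx2, cy2 = crop
    kept = []
    for x1, y1, x2, y2 in boxes:
        nx1 = max(x1, cx1) - cx1
        ny1 = max(y1, cy1) - cy1
        nx2 = min(x2, cx2) - cx1
        ny2 = min(y2, cy2) - cy1
        if nx2 > nx1 and ny2 > ny1:  # keep only boxes with area remaining
            kept.append([nx1, ny1, nx2, ny2])
    return kept

boxes = [[10, 10, 30, 30], [200, 200, 220, 220]]
# The second box falls entirely outside a (0, 0, 100, 100) crop:
print(crop_boxes(boxes, (0, 0, 100, 100)))  # [[10, 10, 30, 30]]
```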

Advanced combinations

The best-known example here is probably the Mosaic augmentation in YOLO, which stitches several original images into one, so that each training image is more likely to contain small objects. Beyond that, methods such as Copy-Paste can "copy and paste" small objects around the scene, increasing their exposure and hence their probability of being detected.
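A toy 2×2 Mosaic sketch with numpy (real Mosaic, as used in YOLOv4/v5, also randomizes the joint point and rescales each tile; this only shows the stitch-and-offset pattern):

```python
import numpy as np

def mosaic(images, box_lists):
    """Stitch four equal-sized (H, W, 3) images into a 2H x 2W canvas and
    offset each image's [x1, y1, x2, y2] boxes into canvas coordinates."""
    h, w = images[0].shape[:2]
    canvas = np.zeros((2 * h, 2 * w, 3), dtype=images[0].dtype)
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # TL, TR, BL, BR
    all_boxes = []
    for img, boxes, (ox, oy) in zip(images, box_lists, offsets):
        canvas[oy:oy + h, ox:ox + w] = img
        all_boxes += [[x1 + ox, y1 + oy, x2 + ox, y2 + oy]
                      for x1, y1, x2, y2 in boxes]
    return canvas, all_boxes

imgs = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(4)]
canvas, boxes = mosaic(imgs, [[[0, 0, 10, 10]]] * 4)
print(canvas.shape, boxes[3])  # (128, 128, 3) [64, 64, 74, 74]
```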


Splitting large images

Tiling

Tiling is an effective preprocessing operation for splitting large images; the image above is a demonstration from the Roboflow platform. Tiling effectively lets the detector focus on small objects while keeping the small input resolution needed for fast inference. Note, however, that the same tiling scheme must be kept consistent at inference time.
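A sketch of the grid computation (the tile size and overlap values below are illustrative): cut the large image into overlapping fixed-size tiles, remembering each tile's top-left corner so per-tile detections can later be mapped back to full-image coordinates.

```python
def tile_grid(img_w, img_h, tile, overlap):
    """Return (x, y) top-left corners of overlapping tiles covering the image."""
    step = tile - overlap
    xs = list(range(0, max(img_w - tile, 0) + 1, step))
    ys = list(range(0, max(img_h - tile, 0) + 1, step))
    # make sure the right and bottom edges are fully covered
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y) for y in ys for x in xs]

corners = tile_grid(1000, 1000, tile=512, overlap=64)
print(len(corners), corners[0], corners[-1])  # 9 (0, 0) (488, 488)
```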

SAHI

Tiling is an old technique. Currently, the author strongly recommends Slicing Aided Hyper Inference (SAHI), an inference framework dedicated to small object detection that, in principle, can be plugged into any detector without any fine-tuning. It has already been integrated into many mature detection frameworks and models, such as YOLOv5, Detectron2, and MMDetection.
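One reason a framework like SAHI is convenient is the merge step: overlapping slices can detect the same object twice, so the per-slice predictions need a global non-maximum suppression pass after being mapped back to full-image coordinates. A minimal greedy-NMS sketch of that step (not SAHI's actual implementation; box format `[x1, y1, x2, y2]` assumed):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping duplicates."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep

# Two slices report the same object with slightly different boxes:
boxes = [[10, 10, 50, 50], [12, 11, 51, 52], [100, 100, 140, 140]]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: the duplicate is suppressed
```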

Loss function

Weighted loss

This one is easy to understand: since we have the ground-truth boxes, we can define our own size threshold for "small" and assign those targets a larger weight when computing the loss, so that the network pays more attention to them.
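A hedged sketch of this idea (the 32 × 32 threshold reuses the COCO convention from earlier; the weight value 2.0 is an illustrative choice, not from any specific paper):

```python
def weighted_loss(per_box_losses, areas, small_thr=32 * 32, small_w=2.0):
    """Sum per-box losses, up-weighting boxes below the small-object threshold."""
    total = 0.0
    for loss, area in zip(per_box_losses, areas):
        total += loss * (small_w if area < small_thr else 1.0)
    return total

# A small object's loss counts double, a large one's counts once:
print(weighted_loss([1.0, 1.0], [20 * 20, 100 * 100]))  # 3.0
```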

Stitcher

Stitcher was published a few years ago, in the paper "Stitcher: Feedback-driven Data Provider for Object Detection". Through statistical analysis, the authors observed that small objects detect poorly largely because they contribute very little to the training loss (they are either absent from a batch or contribute almost nothing when present). The paper therefore proposes a dynamic feedback mechanism during training: based on the computed loss, it automatically decides whether to stitch images together for the next iteration.
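The feedback rule can be sketched schematically (an illustrative simplification of the paper's idea, not its exact implementation; the 0.1 ratio threshold is a stand-in):

```python
def next_batch_mode(small_loss, total_loss, ratio_thr=0.1):
    """Return 'stitched' when small objects are under-represented in the
    previous iteration's loss, otherwise feed regular images."""
    if total_loss <= 0:
        return "regular"
    return "stitched" if small_loss / total_loss < ratio_thr else "regular"

print(next_batch_mode(small_loss=0.02, total_loss=1.0))  # stitched
print(next_batch_mode(small_loss=0.30, total_loss=1.0))  # regular
```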

Other

The following is a brief list of representative small object detection papers.

2023

  • TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors

  • YOLO-Drone: Airborne real-time detection of dense small targets from high-altitude perspective

2022

  • Towards Large-Scale Small Object Detection: Survey and Benchmarks

2020

  • Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network

2019

  • Augmentation for small object detection


Origin: blog.csdn.net/CVHub/article/details/131270568