Optimizing and improving the YOLOv5 algorithm: adding the SPD-Conv module so small targets have nowhere to hide (super detailed)

1 SPD-Conv module

Paper: https://arxiv.org/pdf/2208.03641v1.pdf

Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in computer vision tasks such as image classification and object detection. However, their performance degrades rapidly when image resolution is low or objects are small. This stems from a flawed yet common design in existing CNN architectures, namely the use of strided convolutions and/or pooling layers, which causes the loss of fine-grained information and the learning of less effective feature representations. To this end, we propose a new CNN building block called SPD-Conv to replace each strided convolution and each pooling layer (thus eliminating them altogether). SPD-Conv consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer and can be applied to most CNN architectures. We explain this new design using two of the most representative computer vision tasks: object detection and image classification. We then build new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our method significantly outperforms state-of-the-art deep learning models, especially on harder tasks involving low-resolution images and small objects.

SPD-Conv is a new building block that replaces the strided convolution and pooling layers in existing CNN architectures. It consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer. The SPD layer downsamples each spatial dimension of the input feature map by folding the removed spatial positions into the channel dimension, so no fine-grained information is discarded; the subsequent stride-1 convolution then learns to compress the enlarged channel dimension.
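As a concrete illustration, here is a minimal PyTorch sketch of this building block for a downsampling scale of 2. The class names SPD/SPDConv, the kernel size, and the BatchNorm + SiLU choices are illustrative assumptions for this sketch, not the paper's official code.

```python
import torch
import torch.nn as nn


class SPD(nn.Module):
    """Space-to-depth: rearranges each 2x2 spatial block into the channel
    dimension, halving H and W while quadrupling C (no information is lost)."""
    def forward(self, x):
        # Take the four pixels of every 2x2 neighborhood and stack them along channels.
        return torch.cat([x[..., ::2, ::2],
                          x[..., 1::2, ::2],
                          x[..., ::2, 1::2],
                          x[..., 1::2, 1::2]], dim=1)


class SPDConv(nn.Module):
    """SPD layer followed by a non-strided (stride=1) convolution: the block that
    replaces a stride-2 convolution or a pooling layer."""
    def __init__(self, in_channels, out_channels, k=3):
        super().__init__()
        self.spd = SPD()
        # After SPD the channel count is 4 * in_channels; stride stays at 1.
        self.conv = nn.Conv2d(4 * in_channels, out_channels, k,
                              stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(self.spd(x))))


# Usage: a 640x640 feature map is downsampled to 320x320 without strided conv or pooling.
x = torch.randn(1, 32, 640, 640)
y = SPDConv(32, 64)(x)
print(y.shape)  # torch.Size([1, 64, 320, 320])
```

The key point of the design is that downsampling is handled entirely by the lossless SPD rearrangement, while the stride-1 convolution keeps learnable parameters in the path that decides which of the retained information to keep.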

Origin: blog.csdn.net/qq_40716944/article/details/134130153