Convolution for low resolution or small objects - SPDConv

Summary

Convolutional neural networks (CNNs) have achieved great success in many computer vision tasks. However, their performance degrades rapidly on harder tasks where images are of low resolution or objects are small. In this paper, we point out that this is rooted in a flawed yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and the learning of less effective feature representations. To this end, we propose a new CNN building block, called SPD-Conv, that replaces each strided convolution layer and each pooling layer (thus eliminating them altogether). SPD-Conv consists of a space-to-depth (SPD) layer followed by a non-strided convolution layer, and can be applied to most CNN architectures. We explain this new design under two of the most representative computer vision tasks, object detection and image classification, and then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet.
Code address: https://github.com/LabSAINT/SPD-Conv

Introduction

Since AlexNet was proposed, convolutional neural networks have performed well in many computer vision tasks. However, all these CNN models need good-quality inputs during training and inference. For example, AlexNet was initially trained and evaluated on clear 227 x 227 images, but after the image resolution was reduced to 1/4 and 1/8, its classification accuracy dropped by 14% and 30%, respectively; similar drops were observed for VGG and ResNet. In the case of object detection, small-object detection is a very challenging task, because smaller objects inherently have lower resolution and offer only limited contextual information for the model to learn from. Moreover, they usually coexist with large objects in the same image, and the large objects tend to dominate the feature-learning process, leaving the small objects undetected.
This paper argues that a flawed but common design in existing CNNs causes this performance degradation: the use of strided convolution and/or pooling, especially in the early layers of the CNN architecture. The adverse effect of this design usually does not show up, because most of the scenarios studied are friendly: the images have good resolution and the objects are of moderate size. In such cases there is a large amount of redundant pixel information that strided convolution and pooling can conveniently skip, and the model can still learn good features. However, in harder tasks, when images are blurry or objects are small, the lavish assumption of redundant information no longer holds, and the current design starts to suffer from loss of fine-grained information and poorly learned features.
To solve this problem, we propose a new CNN building block called SPD-Conv that completely replaces strided convolution and pooling. SPD-Conv is a space-to-depth (SPD) layer followed by a non-strided (stride-1) convolution layer. The SPD layer downsamples a feature map without losing information: it generalizes an image transformation technique originally applied to raw images to the downsampling of feature maps inside and throughout the network. In addition, we add a non-strided convolution after each SPD layer to reduce the (increased) number of channels using learnable parameters. Our proposed approach is general and unified: SPD-Conv can be applied to most CNN architectures, and it replaces strided convolution and pooling in the same uniform way.
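
As a concrete sketch (our own illustration; the class name SPDConv and the 3x3 kernel size are assumptions, not fixed by the paper), an SPD-Conv block with scale 2 could look like this in PyTorch:

# A minimal SPD-Conv sketch, assuming PyTorch; SPDConv is an illustrative name.
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # Space-to-depth quadruples the channels, so the non-strided
        # convolution takes 4 * in_channels inputs and reduces them to
        # out_channels with learnable parameters.
        self.conv = nn.Conv2d(4 * in_channels, out_channels, kernel_size,
                              stride=1, padding=kernel_size // 2)

    def forward(self, x):
        # Rearrange each 2x2 spatial block into 4 channel groups:
        # (B, C, H, W) -> (B, 4C, H/2, W/2); every pixel is kept.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

print(SPDConv(64, 128)(torch.zeros(1, 64, 64, 64)).shape)  # torch.Size([1, 128, 32, 32])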

A New Building Block: SPD-Conv

Paper address: https://arxiv.org/pdf/2208.03641v1.pdf
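
As a quick sanity check (ours, not from the paper), the scale-2 SPD rearrangement carries exactly the same pixel values as PyTorch's built-in nn.PixelUnshuffle(2); only the channel ordering differs:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)

# SPD slicing: four spatially shifted sub-maps stacked along channels.
spd = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                 x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)

# PixelUnshuffle packs the same 2x2 blocks, in a different channel order.
pu = nn.PixelUnshuffle(2)(x)

print(spd.shape, pu.shape)  # both torch.Size([1, 12, 4, 4])
# Same values at each spatial location, merely permuted across channels:
print(torch.equal(spd.sort(dim=1).values, pu.sort(dim=1).values))  # True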

Appendix

Code:

# SPD-Conv: space-to-depth layer (scale = 2)
import torch
import torch.nn as nn

class Spd(nn.Module):
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension  # dimension along which the sub-maps are concatenated

    def forward(self, x):
        # Split the feature map into four spatially shifted sub-maps and
        # stack them along the channel dimension:
        # (B, C, H, W) -> (B, 4C, H/2, W/2); no pixel information is lost.
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], self.d)

data = torch.zeros(3, 64, 640, 640)
con = Spd()
print(con(data).shape)  # torch.Size([3, 256, 320, 320])
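
To see how this layer would replace a strided convolution in practice (a sketch under our own assumptions, reusing the Spd class and imports from the snippet above, not the repository's exact wiring), a stride-2 downsampling convolution and its SPD-Conv counterpart produce outputs of the same shape, but the latter feeds the convolution every pixel instead of skipping three out of four positions:

# Conventional downsampling: a stride-2 convolution skips pixels.
strided = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)

# SPD-Conv replacement: lossless rearrangement, then a non-strided conv.
spd_conv = nn.Sequential(Spd(),
                         nn.Conv2d(4 * 64, 128, kernel_size=3,
                                   stride=1, padding=1))

x = torch.zeros(3, 64, 640, 640)
print(strided(x).shape)   # torch.Size([3, 128, 320, 320])
print(spd_conv(x).shape)  # torch.Size([3, 128, 320, 320])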


Original post: https://blog.csdn.net/shengweiit/article/details/132297912