YOLOv7 improvement series

1. YOLOv7 improved structure series: the latest combination of new CNN convolution building blocks for small targets

Reference: YOLOv7 improved structure series: the latest combination of new CNN convolution building blocks for small targets (Mango Juice No Mango, CSDN blog)

Theoretical part of the SPD-Conv paper
Convolutional neural networks (CNNs) have achieved great success in many computer vision tasks such as image classification and object detection. However, their performance degrades rapidly on harder tasks where the image resolution is low or the objects are small. In this paper, we point out that this stems from a flawed but common design in existing CNN architectures: the use of strided convolution and/or pooling layers, which leads to a loss of fine-grained information and less efficient learning of feature representations. To this end, we propose a new CNN building block named SPD-Conv to replace each strided convolutional layer and each pooling layer (thus eliminating them entirely). SPD-Conv consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and it can be applied to most, if not all, CNN architectures. We explain this new design under two of the most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically demonstrate that our method significantly outperforms state-of-the-art deep learning models, especially on harder tasks with low-resolution images and small objects.
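The core idea above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's code: the tensor sizes and the 128-channel output are arbitrary choices for the demo, and the scale factor is assumed to be 2 (the common case). Space-to-depth rearranges each 2x2 spatial block into channels without discarding any pixels, and a stride-1 convolution then mixes them, replacing the stride-2 convolution that would drop fine detail:

```python
import torch
import torch.nn as nn

# Dummy feature map: batch 1, 64 channels, 40x40 spatial resolution
x = torch.randn(1, 64, 40, 40)

# Space-to-depth: gather the four 2x2 sub-grids and stack them along
# channels -> (1, 64, 40, 40) becomes (1, 256, 20, 20); nothing is lost
spd = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                 x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)

# Non-strided convolution mixes the rearranged information
conv = nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1)
y = conv(spd)

print(spd.shape)  # torch.Size([1, 256, 20, 20])
print(y.shape)    # torch.Size([1, 128, 20, 20])
```

The resolution is halved, as with a stride-2 convolution, but the fine-grained information survives in the channel dimension instead of being skipped over.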

2. YOLOv7 improved RepFPN structure | the latest combination: a 2023 paper designs an efficient RepFPN structure using an efficient RepVGG-style ConvNet built with hardware-aware neural network design, which delivers strong performance

EfficientRep An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design 

Reference: EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design (Xiaomengxinxin, CSDN blog)

3. Object detection model design guidelines | interpretation of the ELAN model referenced by YOLOv7, the source of the YOLO series design ideas

Reference: Object detection model design guidelines | interpretation of the ELAN model referenced by YOLOv7 (Artificial Intelligence Algorithm Research Institute, CSDN blog)

=================================================================

YOLOv5 improved with SPD-Conv

yaml file

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],     # 0-P1/2
   [-1, 1, Conv, [128, 3, 1]],  # 1
   [-1, 1, space_to_depth, [1]], # 2-P2/4
   [-1, 3, C3, [128]],          # 3
   [-1, 1, Conv, [256, 3, 1]],  # 4
   [-1, 1, space_to_depth, [1]], # 5-P3/8
   [-1, 6, C3, [256]],          # 6
   [-1, 1, Conv, [512, 3, 1]],  # 7-P4/16
   [-1, 1, space_to_depth, [1]], # 8-P4/16
   [-1, 9, C3, [512]],          # 9
   [-1, 1, Conv, [1024, 3, 1]], # 10-P5/32
   [-1, 1, space_to_depth, [1]], # 11-P5/32
   [-1, 3, C3, [1024]],         # 12
   [-1, 1, SPPF, [1024, 5]],    # 13
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],                    # 14
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],    # 15
   [[-1, 9], 1, Concat, [1]],                     # 16 cat backbone P4
   [-1, 3, C3, [512, False]],                     # 17

   [-1, 1, Conv, [256, 1, 1]],                    # 18
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],    # 19
   [[-1, 6], 1, Concat, [1]],                     # 20 cat backbone P3
   [-1, 3, C3, [256, False]],                     # 21 (P3/8-small)

   [-1, 1, Conv, [256, 3, 1]],                    # 22
   [-1, 1, space_to_depth, [1]],                  # 23
   [[-1, 18], 1, Concat, [1]],                    # 24 cat head P4
   [-1, 3, C3, [512, False]],                     # 25 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 1]],                    # 26
   [-1, 1, space_to_depth, [1]],                  # 27
   [[-1, 14], 1, Concat, [1]],                    # 28 cat head P5
   [-1, 3, C3, [1024, False]],                    # 29 (P5/32-large)

   [[21, 25, 29], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

common.py

class space_to_depth(nn.Module):
    # SPD layer: move each 2x2 spatial block into the channel dimension,
    # halving H and W and multiplying the channel count by 4 (lossless)
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension

    def forward(self, x):
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
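The layer can be exercised on a dummy tensor as a quick sanity check. This snippet repeats the class definition so it runs stand-alone (in the actual repo it lives in common.py); the input sizes are arbitrary:

```python
import torch
import torch.nn as nn

class space_to_depth(nn.Module):
    # SPD layer: move each 2x2 spatial block into the channel dimension
    def __init__(self, dimension=1):
        super().__init__()
        self.d = dimension

    def forward(self, x):
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                          x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)

# 32 channels at 16x16 -> 4x32 = 128 channels at 8x8
x = torch.randn(2, 32, 16, 16)
y = space_to_depth()(x)
print(tuple(y.shape))  # (2, 128, 8, 8)
```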

yolo.py
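The yolo.py changes from the original post (shown there as screenshots) did not survive extraction. The usual edit, as in the reference SPD-Conv implementations, is to teach parse_model that space_to_depth multiplies the channel count by 4 so the layer-channel bookkeeping stays correct. The elif fragment below is a sketch of that edit (it belongs inside the existing parse_model branch chain in models/yolo.py, where ch tracks per-layer channels); the helper function is just a stand-alone check of the channel arithmetic:

```python
# Sketch (assumption): the parse_model branch added for space_to_depth
# in models/yolo.py, so downstream layers see the right channel count:
#
#     elif m is space_to_depth:
#         c2 = 4 * ch[f]
#
# Stand-alone check of that channel arithmetic:
def spd_out_channels(in_channels: int) -> int:
    """Channel count after a space_to_depth layer with block size 2."""
    return 4 * in_channels

print(spd_out_channels(128))  # 512
```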


Origin blog.csdn.net/Hoshea_sun/article/details/129402305