Foreword

YOLOv5 is like a gold mine, with countless things to learn from it. My previous blog posts always treated YOLOv5 as a black box, looking only at the model's inputs and outputs in order to build on top of it.
This post goes one level deeper into the "gold mine" and tries to replace part of the model structure.

Model building analysis

YOLOv5 builds its model architecture from a model configuration file in yaml format. I covered this in my previous blog post, [Target Detection] YOLOv5: Model Construction Analysis, so I will not repeat it here.

YOLOv5 models fall mainly into two groups: version 5.0, and version 6.0 and above. There are a few differences between the two; this article focuses on the latter.

The YOLOv5s model architecture diagram is as follows. This diagram comes from the post "Summary of the target detection YOLOv5 network, v6.0 version".

[Figure: YOLOv5s v6.0 architecture diagram]

Modify the model

The goal of this modification is to change the two convolution blocks at layers 18 and 21. In the original model, each of these performs downsampling with a single convolution of kernel size 3 and stride 2. My goal is to replace each with two convolutions of different kernel sizes, where the output is the sum of the two convolution outputs.

[Figure: diagram of the modified structure]

Verify the dimensions

The most troublesome part of modifying the structure is keeping the tensor dimensions consistent, so before changing the model, it is best to feed in simulated data and check the output shape of the modified part on its own.
Here is a test example:

import torch.nn as nn
import torch

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    # Pad to 'same' shape outputs
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class Multi_Conv(nn.Module):
    # Multi Different Kernel-size Conv
    def __init__(self, c1, c2, e=1.0):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 3, 2)
        self.cv2 = Conv(c1, c_, 7, 2)

    def forward(self, x):
        return self.cv1(x) + self.cv2(x)


if __name__ == '__main__':
    input_tensor = torch.rand(1, 128, 80, 80)
    conv = Conv(128, 256, 3, 2)
    mult_conv = Multi_Conv(128, 256)
    output_tensor1 = conv(input_tensor)
    print(output_tensor1.shape)  # torch.Size([1, 256, 40, 40])
    output_tensor2 = mult_conv(input_tensor)
    print(output_tensor2.shape)   # torch.Size([1, 256, 40, 40])

Note: Conv is not a native PyTorch convolution. The author of YOLOv5 wrapped it and added the autopad function, which fills in the padding automatically when the kernel size changes so that the output keeps the same spatial dimensions.

As the example above shows, after switching to my custom dual-kernel structure Multi_Conv, the output dimensions match those of the single-kernel Conv.
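The reason both kernel sizes give the same spatial size follows from the standard convolution output-size formula. The sketch below is a plain-Python check that needs no PyTorch; conv_out_size is a helper name introduced here for illustration, and autopad is reduced to integer kernel sizes.

```python
def autopad(k, p=None, d=1):
    # Same rule as autopad above, restricted to integer kernel sizes
    if d > 1:
        k = d * (k - 1) + 1  # effective kernel size under dilation
    return k // 2 if p is None else p


def conv_out_size(size, k, s, d=1):
    # Standard output-size formula for a 2D convolution along one axis
    p = autopad(k, d=d)
    return (size + 2 * p - d * (k - 1) - 1) // s + 1


for k in (3, 5, 7):
    print(k, conv_out_size(80, k, s=2))  # each kernel size gives 40
```

For odd kernel sizes without dilation, 2 * p equals k - 1, so the formula reduces to (size - 1) // s + 1 and no longer depends on k. This does not hold for even kernel sizes, where the padding has to be chosen by hand.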

Embed the module in the model

There are two main ways to modify the model. The first is to change the layer layout directly in the configuration file (.yaml). The yaml file mainly controls how the layers are chained together, so after adding or removing a layer, the layer indices referenced by later entries also need to be adjusted, which is more troublesome.
The other approach is module replacement: swap a single-kernel module in the model for a more complex structure. Here I choose the second method.
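To illustrate why the first approach is tedious: each yaml entry names its input layers in a from field, and absolute indices there go stale as soon as a layer is inserted. The sketch below uses a toy layer list in the [from, number, module, args] layout. insert_layer is a hypothetical helper, not part of YOLOv5, and the module names are placeholders.

```python
def insert_layer(layers, insert_at, new_layer):
    """Insert new_layer at index insert_at and shift absolute 'from' indices."""
    def fix(f):
        if isinstance(f, list):
            return [fix(x) for x in f]
        # Negative values are relative to the current layer and are left alone
        return f + 1 if f >= insert_at else f

    shifted = [[fix(f), n, m, args] for f, n, m, args in layers]
    return shifted[:insert_at] + [new_layer] + shifted[insert_at:]


# Toy configuration (illustrative only, not the real yolov5s.yaml)
layers = [
    [-1, 1, "Conv", [64, 3, 2]],    # 0
    [-1, 1, "C3", [64]],            # 1
    [-1, 1, "Conv", [128, 3, 2]],   # 2
    [-1, 1, "C3", [128]],           # 3
    [[1, 3], 1, "Detect", []],      # 4
]

new_layers = insert_layer(layers, 2, [-1, 1, "Conv", [64, 3, 1]])
print(new_layers[-1])  # [[1, 4], 1, 'Detect', []]
```

Module replacement avoids this bookkeeping because the number of layers stays the same and only the module name in one entry changes.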

First, add the custom structure to the models/common.py file:

class Multi_Conv(nn.Module):
    # Multi Different Kernel-size Conv
    def __init__(self, c1, c2, e=1.0):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 3, 2)
        self.cv2 = Conv(c1, c_, 7, 2)

    def forward(self, x):
        return self.cv1(x) + self.cv2(x)

Then, in models/yolo.py, add Multi_Conv to the module set:

if m in {
        Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
        BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, Multi_Conv}:
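Why membership in this set matters: as far as I understand the model parser in yolo.py, modules in the set get the input channel count prepended to their arguments, and their output channel count is scaled by the width multiple. The following is a simplified paraphrase of that behavior, not the actual YOLOv5 code. resolve_args, CHANNEL_AWARE, and the yaml entry [-1, 1, Multi_Conv, [256]] are assumptions made for illustration.

```python
import math


def make_divisible(x, divisor):
    # Round up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


# Abridged stand-in for the module set shown above
CHANNEL_AWARE = {"Conv", "C3", "Multi_Conv"}


def resolve_args(name, args, c_in, width_multiple):
    """Hypothetical helper: constructor args a module would end up receiving."""
    if name in CHANNEL_AWARE:
        c2 = make_divisible(args[0] * width_multiple, 8)
        return [c_in, c2, *args[1:]]
    return list(args)


# Assumed yaml entry [-1, 1, Multi_Conv, [256]] with 128 input channels
# and the yolov5s width_multiple of 0.50
print(resolve_args("Multi_Conv", [256], c_in=128, width_multiple=0.5))  # [128, 128]
```

Under these assumptions, a module left out of the set would receive only [256], and Multi_Conv(c1, c2) would fail for lack of its second argument.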

After the addition is complete, run yolo.py, and you can see that the custom module has been loaded successfully:
[Figure: yolo.py output showing the loaded Multi_Conv module]

View speed and parameters

When designing a network model, it helps to be able to see how efficiently each layer runs. In yolo.py, the author provides the line-profile argument for this. After setting it to True, you can see the time, GFLOPs, and parameter count of each layer of the model:

time (ms)     GFLOPs     params  module
      6.75       0.73       3520  models.common.Conv
      0.70       0.96      18560  models.common.Conv
      2.09       0.98      18816  models.common.C3
      0.54       0.95      73984  models.common.Conv
      1.86       1.49     115712  models.common.C3
      0.40       0.95     295424  models.common.Conv
      2.59       2.01     625152  models.common.C3
      0.60       0.95    1180672  models.common.Conv
      1.40       0.95    1182720  models.common.C3
      0.60       0.53     656896  models.common.SPPF
      0.20       0.11     131584  models.common.Conv
      0.10       0.00          0  torch.nn.modules.upsampling.Upsample
      0.00       0.00          0  models.common.Concat
      1.50       1.16     361984  models.common.C3
      0.30       0.11      33024  models.common.Conv
      0.00       0.00          0  torch.nn.modules.upsampling.Upsample
      0.00       0.00          0  models.common.Concat
      1.40       1.17      90880  models.common.C3
      6.65       3.04     950784  models.common.Multi_Conv
      0.00       0.00          0  models.common.Concat
      1.40       0.95     296448  models.common.C3
      4.19       1.52    1901056  models.common.Multi_Conv
      0.00       0.00          0  models.common.Concat
      1.30       0.90    1117184  models.common.C3
      0.50       0.73     229245  Detect
     35.05          -          -  Total
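The params column can be sanity-checked by hand. The sketch below counts the parameters of a Conv block as the convolution weights (no bias) plus the BatchNorm weight and bias. The channel counts used (32 to 64 for the second Conv row, 128 to 128 for the first Multi_Conv row) are assumptions based on the yolov5s widths; they are not stated in the table.

```python
def conv_params(c1, c2, k):
    # Conv2d weights (bias=False) plus BatchNorm2d weight and bias
    return c1 * c2 * k * k + 2 * c2


def multi_conv_params(c1, c2):
    # Multi_Conv holds one 3x3 Conv and one 7x7 Conv with the same channels
    return conv_params(c1, c2, 3) + conv_params(c1, c2, 7)


print(conv_params(32, 64, 3))       # 18560, matches the second Conv row
print(conv_params(128, 128, 3))     # 147712, a single 3x3 Conv at 128 channels
print(multi_conv_params(128, 128))  # 950784, matches the first Multi_Conv row
```

Under these assumptions, the first Multi_Conv has about 6.4 times the parameters of the single 3x3 Conv it replaces, and nearly all of the increase comes from the 7x7 branch.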

[Target Detection] YOLOv5: Modify your own network structure