Foreword
YOLOv5 is a gold mine with a great deal to learn from. My previous blog posts treated YOLOv5 as a black box: they looked only at the model's inputs and outputs and built secondary development on top of them.
This post goes one level deeper into the "gold mine" and tries to replace part of the model structure.
Model building analysis
YOLOv5 builds its model architecture from a model configuration file in YAML format. I covered this in my previous blog post, "[Target Detection] YOLOv5: Model Construction Analysis", so I will not repeat it here.
YOLOv5 models fall mainly into two groups: version 5.0, and version 6.0 and above. The two differ in a few places. This article focuses on the latter.
The YOLOv5s model architecture diagram is as follows. The diagram comes from "Summary of the target detection YOLOv5 network, v6.0 version".
Modify the model
The goal of this modification is to change the two convolution blocks at layers 18 and 21. In the original model, each of them downsamples with a single convolution whose kernel size is 3 and stride is 2. I want to replace each one with two convolutions of different kernel sizes, whose outputs are summed to give the result.
Verifying the dimensions
The most troublesome part of modifying a model is the change in tensor dimensions, so before modifying anything it is best to feed in simulated data and check the output shape of the modified part on its own.
Here is a test example:
import torch.nn as nn
import torch


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    # Pad to 'same' shape outputs
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class Multi_Conv(nn.Module):
    # Multi Different Kernel-size Conv
    def __init__(self, c1, c2, e=1.0):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 3, 2)
        self.cv2 = Conv(c1, c_, 7, 2)

    def forward(self, x):
        return self.cv1(x) + self.cv2(x)


if __name__ == '__main__':
    input_tensor = torch.rand(1, 128, 80, 80)
    conv = Conv(128, 256, 3, 2)
    mult_conv = Multi_Conv(128, 256)
    output_tensor1 = conv(input_tensor)
    print(output_tensor1.shape)  # torch.Size([1, 256, 40, 40])
    output_tensor2 = mult_conv(input_tensor)
    print(output_tensor2.shape)  # torch.Size([1, 256, 40, 40])
Note: Conv is not PyTorch's native convolution. The YOLOv5 author wrapped it and added the autopad function, which picks the padding automatically when the kernel size changes, so that the output size does not depend on the kernel size.
As the example above shows, after swapping in my own dual-kernel structure, Multi_Conv, the output shape is the same as that of the single-kernel Conv.
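The matching shapes follow from the standard convolution output-size formula, out = floor((n + 2p - k) / s) + 1. The following is a minimal sketch in plain Python, assuming an integer kernel size, dilation 1, and the autopad default of p = k // 2. The helper name conv_out_size is hypothetical and is not part of YOLOv5.

```python
def conv_out_size(n, k, s, p=None):
    """Spatial output size of a convolution along one axis (dilation 1)."""
    if p is None:
        p = k // 2  # same default padding that autopad picks for an int kernel
    return (n + 2 * p - k) // s + 1


if __name__ == "__main__":
    for k in (3, 7):
        # An 80-pixel axis with stride 2 gives 40 for both kernel sizes
        print(k, conv_out_size(80, k, 2))
```

Because the padding grows with the kernel, both the 3 x 3 and the 7 x 7 branch halve an 80 x 80 map to 40 x 40 at stride 2, which is why their outputs can be added elementwise.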
Embedding into the model
There are two main ways to modify the model. The first is to edit the configuration file (.yaml) directly. The YAML file mainly controls how the layers are chained together, so after inserting or removing layers, the layer indices referenced by later layers also have to be adjusted, which is more troublesome.
The other approach is module replacement: a single-convolution module in the model is swapped for a more complex structure in the same position. I choose the second method here.
First, add the newly created structure to the models/common.py file:
class Multi_Conv(nn.Module):
    # Multi Different Kernel-size Conv
    def __init__(self, c1, c2, e=1.0):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 3, 2)
        self.cv2 = Conv(c1, c_, 7, 2)

    def forward(self, x):
        return self.cv1(x) + self.cv2(x)
Then add Multi_Conv to the module set checked in parse_model in models/yolo.py:
if m in {
        Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
        BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, Multi_Conv}:
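Registering Multi_Conv in this set matters because, for modules in the set, the model builder fills in the input channel count and scales the output channel count by the width multiple before constructing the module. The following is a simplified sketch of that step in plain Python, not the actual YOLOv5 code. The function name resolve_args and its parameters are hypothetical, the rounding helper is assumed to round up to a multiple of 8, and the YAML entry shown is an assumed example.

```python
import math


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def resolve_args(c1, yaml_args, width_multiple):
    """Turn YAML args [c2, ...] into constructor args [c1, c2_scaled, ...]."""
    c2 = make_divisible(yaml_args[0] * width_multiple)
    return [c1, c2, *yaml_args[1:]]


if __name__ == "__main__":
    # Assumed YAML entry [-1, 1, Multi_Conv, [256]] fed by a 128-channel layer,
    # with width_multiple = 0.5 as in a YOLOv5s-sized model
    print(resolve_args(128, [256], 0.5))  # [128, 128]
```

Under these assumptions, a module left out of the set would not receive c1 automatically, so Multi_Conv(c1, c2) could not be built from the YAML arguments alone.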
After the addition is complete, run yolo.py and you can see that the new module has been loaded successfully.
View speed and parameters
When designing a network model, it helps to be able to inspect the runtime efficiency of each layer directly. In yolo.py, the author provides the --line-profile option for this. After enabling it (setting it to True), you can see the time, GFLOPs, and parameter count of each layer of the model:
 time (ms)     GFLOPs     params  module
      6.75       0.73       3520  models.common.Conv
      0.70       0.96      18560  models.common.Conv
      2.09       0.98      18816  models.common.C3
      0.54       0.95      73984  models.common.Conv
      1.86       1.49     115712  models.common.C3
      0.40       0.95     295424  models.common.Conv
      2.59       2.01     625152  models.common.C3
      0.60       0.95    1180672  models.common.Conv
      1.40       0.95    1182720  models.common.C3
      0.60       0.53     656896  models.common.SPPF
      0.20       0.11     131584  models.common.Conv
      0.10       0.00          0  torch.nn.modules.upsampling.Upsample
      0.00       0.00          0  models.common.Concat
      1.50       1.16     361984  models.common.C3
      0.30       0.11      33024  models.common.Conv
      0.00       0.00          0  torch.nn.modules.upsampling.Upsample
      0.00       0.00          0  models.common.Concat
      1.40       1.17      90880  models.common.C3
      6.65       3.04     950784  models.common.Multi_Conv
      0.00       0.00          0  models.common.Concat
      1.40       0.95     296448  models.common.C3
      4.19       1.52    1901056  models.common.Multi_Conv
      0.00       0.00          0  models.common.Concat
      1.30       0.90    1117184  models.common.C3
      0.50       0.73     229245  Detect
     35.05          -          -  Total
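The Multi_Conv parameter counts in this profile can be checked by hand. The following is a minimal sketch in plain Python, assuming each Conv holds a bias-free convolution (c1 * c2 * k * k weights) plus a BatchNorm2d weight and bias (2 * c2). The helper names are hypothetical. It also gives a convolution-only GFLOPs estimate, assuming 2 FLOPs per multiply-accumulate and a 40 x 40 output feature map.

```python
def conv_params(c1, c2, k):
    """Parameters of Conv: bias-free k x k convolution plus BatchNorm2d weight and bias."""
    return c1 * c2 * k * k + 2 * c2


def multi_conv_params(c1, c2):
    """Parameters of Multi_Conv: one 3 x 3 branch plus one 7 x 7 branch."""
    return conv_params(c1, c2, 3) + conv_params(c1, c2, 7)


def multi_conv_gflops(c1, c2, out_h, out_w):
    """Convolution-only estimate, counting 2 FLOPs per multiply-accumulate."""
    macs = out_h * out_w * c1 * c2 * (3 * 3 + 7 * 7)
    return 2 * macs / 1e9


if __name__ == "__main__":
    print(conv_params(128, 128, 3))     # 147712
    print(multi_conv_params(128, 128))  # 950784
    print(multi_conv_params(256, 128))  # 1901056
    print(round(multi_conv_gflops(128, 128, 40, 40), 2))  # 3.04
```

With 128 input and 128 output channels, Multi_Conv has 950784 parameters, which matches the first Multi_Conv row, against 147712 for a single 3 x 3 Conv of the same width. The second row's 1901056 matches 256 input and 128 output channels. Most of the extra cost comes from the 7 x 7 branch, whose kernel has 49 weights per channel pair instead of 9.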