YOLOv6 Pro introduction link: YOLOv6 Pro | Makes it easier to build a YOLOv6 network and replace its modules, to support research on network structure improvements, covering the Backbone, Neck, and DecoupleHead (following the way YOLOv5 builds a network)
· YOLOv6 Pro is based on the overall architecture of the official YOLOv6 and uses YOLOv5's network construction method to build the YOLOv6 network, including the backbone, neck, and effidehead structures.
· You can freely modify or add modules in the yaml file, and each modified file can be run independently; the goal is to make research easier.
· In the future, more network structure improvements will be added based on the modules in yolov5 and yoloair.
· The pre-trained weights have been converted from the official weights to ensure that they match.
· A p6 model (unofficial) has been pre-released.
Project link: GitHub - yang-0201/YOLOv6_pro: Make it easier for yolov6 to change the network structure
If you are interested, feel free to Star and Fork the project, and report any problems you run into. The project is at an early stage, so feature suggestions will be considered and developed, and PRs are welcome. The project will continue to be updated, so stay tuned!
Now to the topic: this is a step-by-step tutorial on building the YOLOv6 network structure in the yaml file format. It is packed with practical content, and I hope it helps you become more and more proficient at building and modifying networks!
(The construction method is the same as in the official yolov5, so it also applies if you want to modify yolov5.)
We take the YOLOv6-L model as an example. The n, t, and s models of yolov6 are structured differently from M and L, mainly because they are built from different modules: the small models use RepBlock, while the large models use CSPStackRep.
First we need to be familiar with the network structure of YOLOv6; once it is understood, building it from scratch is much easier. Let's start with the official YOLOv6 code that builds the backbone.
In yolov6/models/efficientrep.py:
class CSPBepBackbone(nn.Module):
    """
    CSPBepBackbone module.
    """

    def __init__(
        self,
        in_channels=3,
        channels_list=None,
        num_repeats=None,
        block=RepVGGBlock,
        csp_e=float(1)/2,
    ):
        super().__init__()

        assert channels_list is not None
        assert num_repeats is not None

        self.stem = block(
            in_channels=in_channels,
            out_channels=channels_list[0],
            kernel_size=3,
            stride=2
        )

        self.ERBlock_2 = nn.Sequential(
            block(
                in_channels=channels_list[0],
                out_channels=channels_list[1],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[1],
                out_channels=channels_list[1],
                n=num_repeats[1],
                e=csp_e,
                block=block,
            )
        )

        self.ERBlock_3 = nn.Sequential(
            block(
                in_channels=channels_list[1],
                out_channels=channels_list[2],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[2],
                out_channels=channels_list[2],
                n=num_repeats[2],
                e=csp_e,
                block=block,
            )
        )

        self.ERBlock_4 = nn.Sequential(
            block(
                in_channels=channels_list[2],
                out_channels=channels_list[3],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[3],
                out_channels=channels_list[3],
                n=num_repeats[3],
                e=csp_e,
                block=block,
            )
        )

        channel_merge_layer = SimSPPF
        if block == ConvWrapper:
            channel_merge_layer = SPPF

        self.ERBlock_5 = nn.Sequential(
            block(
                in_channels=channels_list[3],
                out_channels=channels_list[4],
                kernel_size=3,
                stride=2,
            ),
            BepC3(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                n=num_repeats[4],
                e=csp_e,
                block=block,
            ),
            channel_merge_layer(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                kernel_size=5
            )
        )

    def forward(self, x):
        outputs = []
        x = self.stem(x)
        x = self.ERBlock_2(x)
        x = self.ERBlock_3(x)
        outputs.append(x)
        x = self.ERBlock_4(x)
        outputs.append(x)
        x = self.ERBlock_5(x)
        outputs.append(x)
        return tuple(outputs)
The parameters passed in are:
backbone=dict(
    type='CSPBepBackbone',
    num_repeats=[1, 6, 12, 18, 6],
    out_channels=[64, 128, 256, 512, 1024],
    csp_e=float(1)/2,
),
training_mode = "conv_silu"
The backbone network is mainly composed of stem, ERBlock_2, ERBlock_3, ERBlock_4, and ERBlock_5 (which includes SPPF). Each ERBlock contains a block module and a BepC3 module. Which block is used can be found in the code that builds the backbone (or, in the training code, set the conf file to configs/office/yolov6l.py and inspect it while debugging):
block = get_block(config.training_mode)

def get_block(mode):
    if mode == 'repvgg':
        return RepVGGBlock
    elif mode == 'hyper_search':
        return LinearAddBlock
    elif mode == 'repopt':
        return RealVGGBlock
    elif mode == 'conv_relu':
        return SimConvWrapper
    elif mode == 'conv_silu':
        return ConvWrapper
    else:
        raise NotImplementedError("Undefied Repblock choice for mode {}".format(mode))
So the block is the ConvWrapper module. Next, look in yolov6/layers/common.py to see what this ConvWrapper actually is:
class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)

    def forward(self, x):
        return self.block(x)
As the official docstring says, this is just an ordinary convolution with a SiLU activation. It is effectively the same as the Conv module in yolov5 and serves as the basic convolution module of YOLOv6.
In short, the backbone consists of:
stem: ConvWrapper basic convolution module
ERBlock_2: ConvWrapper (downsampling) + BepC3
ERBlock_3: ConvWrapper (downsampling) + BepC3
ERBlock_4: ConvWrapper (downsampling) + BepC3
ERBlock_5: ConvWrapper (downsampling) + BepC3 + SPPF
Next, look at how the channel counts and num_block values are distributed; they can be read off one by one from the parameters passed in.
c1 stands for in_channel, c2 for out_channel, k for kernel size, and s for stride.
stem: ConvWrapper basic convolution module: c1: 3, c2: 64, k= 3, s= 2
ERBlock_2: ConvWrapper (downsampling): c1: 64, c2: 128, k= 3, s= 2
+BepC3: c1: 128, c2: 128, num_block = 6, csp_e = 0.5, block = ConvWrapper
ERBlock_3: ConvWrapper (downsampling): c1: 128, c2: 256, k= 3, s= 2
+BepC3: c1: 256, c2: 256, num_block = 12, csp_e = 0.5, block = ConvWrapper
ERBlock_4: ConvWrapper (downsampling): c1: 256, c2: 512, k= 3, s= 2
+BepC3: c1: 512, c2: 512, num_block = 18, csp_e = 0.5, block = ConvWrapper
ERBlock_5: ConvWrapper (downsampling): c1: 512, c2: 1024, k= 3, s= 2
+BepC3: c1: 1024, c2: 1024, num_block = 6, csp_e = 0.5, block = ConvWrapper
+SPPF: c1: 1024, c2: 1024, k= 5
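The table above can be derived mechanically from the two config lists. The following is a minimal sketch of that derivation, not code from YOLOv6 itself; the function name backbone_rows and the tuple layout are made up for illustration. It mirrors how CSPBepBackbone indexes channels_list and num_repeats (note that num_repeats[0] is never read by the backbone).

```python
# Config values copied from the YOLOv6-L backbone dict shown above.
channels_list = [64, 128, 256, 512, 1024]
num_repeats = [1, 6, 12, 18, 6]


def backbone_rows(channels, repeats, block="ConvWrapper", in_channels=3):
    """Hypothetical helper: list (stage, module, c1, c2, extra) in execution order."""
    # The stem is a single stride-2 block from the image channels to channels[0].
    rows = [("stem", block, in_channels, channels[0], {"k": 3, "s": 2})]
    for i in range(1, len(channels)):
        stage = f"ERBlock_{i + 1}"
        # Downsampling block: channels[i-1] -> channels[i], stride 2.
        rows.append((stage, block, channels[i - 1], channels[i], {"k": 3, "s": 2}))
        # BepC3 keeps the channel count and uses repeats[i] as num_block.
        rows.append((stage, "BepC3", channels[i], channels[i], {"n": repeats[i], "e": 0.5}))
    # Only the last stage ends with SPPF (SPPF is chosen because block is ConvWrapper).
    rows.append((f"ERBlock_{len(channels)}", "SPPF", channels[-1], channels[-1], {"k": 5}))
    return rows


rows = backbone_rows(channels_list, num_repeats)
for stage, module, c1, c2, extra in rows:
    print(f"{stage:10s} {module:12s} c1={c1:<5d} c2={c2:<5d} {extra}")
```

Each printed row corresponds to one line of the yaml backbone built later in this article.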
Next, we can see what structure BepC3 is:
class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1,block=RepVGGBlock, e=0.5, concat=True): # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())

        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)

    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))
(a) RepBlock consists of N stacked RepVGG blocks, each followed by a ReLU activation
(b) During inference, each RepVGG block is converted to RepConv
(c) The CSPStackRep module, i.e. the BepC3 module, consists of three 1x1 convolutions and N/2 sub-blocks, each made of two RepVGG blocks (RepConv at inference) with a residual connection; the stacked branch is then concatenated with a parallel 1x1 branch
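To make the "N/2 sub-blocks" statement concrete, here is a small sketch that mirrors the arithmetic in RepBlock.__init__ when block is BottleRep (the function name bepc3_inner_blocks is hypothetical). It only counts modules; it does not build any layers.

```python
def bepc3_inner_blocks(n):
    """Count the BottleRep sub-blocks and basic blocks inside BepC3 for a given n.

    Mirrors RepBlock with block=BottleRep: conv1 is always one BottleRep,
    then n is halved and (n - 1) more BottleReps are stacked only if n > 1.
    """
    half = n // 2
    bottle_reps = 1 + (half - 1 if half > 1 else 0)
    basic_blocks = 2 * bottle_reps  # each BottleRep holds conv1 and conv2
    return bottle_reps, basic_blocks


for n in (6, 12, 18):
    reps, basics = bepc3_inner_blocks(n)
    print(f"n={n}: {reps} BottleRep sub-blocks, {basics} basic blocks")
```

For the YOLOv6-L values 6, 12, and 18 this gives 3, 6, and 9 sub-blocks, so the number of basic blocks equals n. For very small n the count never drops below one sub-block, because conv1 is always created.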
Now that we understand all the modules and parameters used to build the backbone, we can start building the network with a yaml file!
For those unfamiliar with the format, here is the meaning of the parameters [from, number, module, args]:
· The first parameter indicates where the input of this module comes from; -1 means the previous layer, and it can also be a specific layer index such as 2 or 3.
· The second parameter is the number of times this module is stacked, which corresponds to the module's num_block parameter; set it to 1 when no stacking is needed.
· The third parameter is the module name.
· The fourth parameter is the list of arguments passed to the module.
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
depth_multiple and width_multiple are the depth coefficient and the width coefficient, respectively.
width_multiple scales the number of channels, and depth_multiple scales the num_block count of a module.
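As a rough illustration of what the two coefficients do, the sketch below follows the convention used by YOLOv5's parse_model: channels are multiplied by the width coefficient and rounded up to a multiple of 8, and repeat counts greater than 1 are multiplied by the depth coefficient with a floor of 1. The helper names scale_channels and scale_depth and the multipliers 0.5 and 0.33 are illustrative assumptions, not values taken from a YOLOv6 config.

```python
import math


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor.
    return math.ceil(x / divisor) * divisor


def scale_channels(c2, width_multiple):
    # Hypothetical helper: channel count after applying the width coefficient.
    return make_divisible(c2 * width_multiple, 8)


def scale_depth(n, depth_multiple):
    # Hypothetical helper: repeat count after applying the depth coefficient.
    return max(round(n * depth_multiple), 1) if n > 1 else n


# With both coefficients at 1.0 (the L model here) nothing changes.
print(scale_channels(128, 1.0), scale_depth(6, 1.0))
# Illustrative smaller coefficients shrink both width and depth.
print(scale_channels(128, 0.5), scale_depth(6, 0.33))
```

With both coefficients set to 1.0, as in the yaml for the L model, the channel counts and repeat counts written in the file are used unchanged.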
First comes the stem, which is a ConvWrapper basic convolution module.
So the first line is [-1, 1, ConvWrapper, [64, 3, 2]]
The args are [64, 3, 2]: 64 is the output channel count, 3 is the kernel size, and 2 is the stride. The parsing code takes the input channel count from the output of the previous layer by default, so it does not need to be specified.
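To see what kernel 3 and stride 2 do to the feature map, the sketch below applies the standard convolution output-size formula, assuming the padding = kernel_size // 2 rule used by the Conv module in yolov6/layers/common.py. The 640-pixel input size and the function name conv_out_size are assumptions for illustration.

```python
def conv_out_size(size, kernel=3, stride=2):
    """Spatial size after one convolution with padding = kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


# Five stride-2 ConvWrapper layers appear in the backbone (stem + 4 ERBlocks).
size = 640  # assumed input resolution
sizes = []
for _ in range(5):
    size = conv_out_size(size)
    sizes.append(size)

print(sizes)
```

Each stride-2 ConvWrapper halves the spatial size, which is where the P1/2 to P5/32 labels in the yaml comments come from.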
The second and third lines are:
[-1, 1, ConvWrapper, [128, 3, 2]],
[-1, 1, BepC3, [128, 6, "ConvWrapper"]],
In BepC3, 128 is the output channel count, 6 is num_block, and "ConvWrapper" specifies the block type.
The final result is:
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
backbone:
  # [from, number, module, args]
  [[-1, 1, ConvWrapper, [64, 3, 2]],  # 0-P1/2
   [-1, 1, ConvWrapper, [128, 3, 2]],  # 1-P2/4
   [-1, 1, BepC3, [128, 6, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [256, 3, 2]],  # 3-P3/8
   [-1, 1, BepC3, [256, 12, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [512, 3, 2]],  # 5-P4/16
   [-1, 1, BepC3, [512, 18, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, BepC3, [1024, 6, "ConvWrapper"]],
   [-1, 1, SPPF, [1024, 5]]]  # 9
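A quick way to sanity-check a backbone definition like this is to walk the rows and track channels and cumulative stride. The sketch below does that on a hand-copied Python list rather than a parsed yaml file; the trace function is hypothetical and only handles rows whose from field is -1.

```python
# The backbone rows above, copied by hand (module names kept as strings).
backbone = [
    [-1, 1, "ConvWrapper", [64, 3, 2]],        # 0-P1/2
    [-1, 1, "ConvWrapper", [128, 3, 2]],       # 1-P2/4
    [-1, 1, "BepC3", [128, 6, "ConvWrapper"]],
    [-1, 1, "ConvWrapper", [256, 3, 2]],       # 3-P3/8
    [-1, 1, "BepC3", [256, 12, "ConvWrapper"]],
    [-1, 1, "ConvWrapper", [512, 3, 2]],       # 5-P4/16
    [-1, 1, "BepC3", [512, 18, "ConvWrapper"]],
    [-1, 1, "ConvWrapper", [1024, 3, 2]],      # 7-P5/32
    [-1, 1, "BepC3", [1024, 6, "ConvWrapper"]],
    [-1, 1, "SPPF", [1024, 5]],                # 9
]


def trace(rows, in_channels=3):
    """Return (index, module, c1, c2, cumulative_stride) for each row."""
    channels, stride, table = in_channels, 1, []
    for index, (src, number, module, args) in enumerate(rows):
        assert src == -1, "this sketch only handles sequential layers"
        c2 = args[0]                 # first arg is always the output channels
        if module == "ConvWrapper":
            stride *= args[2]        # third arg of ConvWrapper is the stride
        table.append((index, module, channels, c2, stride))
        channels = c2                # next layer's input is this layer's output
    return table


table = trace(backbone)
for index, module, c1, c2, stride in table:
    print(f"{index}: {module:12s} {c1:>4d} -> {c2:<4d} stride {stride}")
```

Layers 4, 6, and 9 end ERBlock_3, ERBlock_4, and ERBlock_5 respectively, so they carry the three feature maps that the official forward method returns, at strides 8, 16, and 32.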
Next, the first step is to register the modules in yolov6/layers/common.py, or in common.py of yolov5, adding every module involved. If you are using yolov5, it is recommended to create a new py file for them.
import torch
import torch.nn as nn
from torch.nn.parameter import Parameter
import numpy as np
class RepVGGBlock(nn.Module):
    '''RepVGGBlock is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    '''
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
        super(RepVGGBlock, self).__init__()
        """ Initialization of the class.
        Args:
            in_channels (int): Number of channels in the input image
            out_channels (int): Number of channels produced by the convolution
            kernel_size (int or tuple): Size of the convolving kernel
            stride (int or tuple, optional): Stride of the convolution. Default: 1
            padding (int or tuple, optional): Zero-padding added to both sides of
                the input. Default: 1
            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
            groups (int, optional): Number of blocked connections from input
                channels to output channels. Default: 1
            padding_mode (string, optional): Default: 'zeros'
            deploy: Whether to be deploy status or training status. Default: False
            use_se: Whether to use se. Default: False
        """
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels
        self.out_channels = out_channels

        assert kernel_size == 3
        assert padding == 1

        padding_11 = padding - kernel_size // 2

        self.nonlinearity = nn.ReLU()

        if use_se:
            raise NotImplementedError("se block not supported yet")
        else:
            self.se = nn.Identity()

        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
                                         padding=padding, dilation=dilation, groups=groups, bias=True, padding_mode=padding_mode)
        else:
            self.rbr_identity = nn.BatchNorm2d(num_features=in_channels) if out_channels == in_channels and stride == 1 else None
            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups)
            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride, padding=padding_11, groups=groups)

    def forward(self, inputs):
        '''Forward process'''
        if hasattr(self, 'rbr_reparam'):
            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))

        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)

        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def switch_to_deploy(self):
        if hasattr(self, 'rbr_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels, out_channels=self.rbr_dense.conv.out_channels,
                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation, groups=self.rbr_dense.conv.groups, bias=True)
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_dense')
        self.__delattr__('rbr_1x1')
        if hasattr(self, 'rbr_identity'):
            self.__delattr__('rbr_identity')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')
        self.deploy = True

class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1,block=RepVGGBlock, e=0.5, concat=True): # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())

        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)

    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))

class Conv_C3(nn.Module):
    '''Standard convolution in BepC3-Block'''
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))

class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)

    def forward(self, x):
        return self.block(x)

class Conv(nn.Module):  # if you are using yolov5, this needs to be modified
    '''Normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))

class RepBlock(nn.Module):
    '''
        RepBlock is a stage block with rep-style basic block
    '''
    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, basic_block=RepVGGBlock):
        super().__init__()
        self.conv1 = block(in_channels, out_channels)
        self.block = nn.Sequential(*(block(out_channels, out_channels) for _ in range(n - 1))) if n > 1 else None
        if block == BottleRep:
            self.conv1 = BottleRep(in_channels, out_channels, basic_block=basic_block, weight=True)
            n = n // 2
            self.block = nn.Sequential(*(BottleRep(out_channels, out_channels, basic_block=basic_block, weight=True) for _ in range(n - 1))) if n > 1 else None

    def forward(self, x):
        x = self.conv1(x)
        if self.block is not None:
            x = self.block(x)
        return x

class BottleRep(nn.Module):
    def __init__(self, in_channels, out_channels, basic_block=RepVGGBlock, weight=False):
        super().__init__()
        self.conv1 = basic_block(in_channels, out_channels)
        self.conv2 = basic_block(out_channels, out_channels)
        if in_channels != out_channels:
            self.shortcut = False
        else:
            self.shortcut = True
        if weight:
            self.alpha = Parameter(torch.ones(1))
        else:
            self.alpha = 1.0

    def forward(self, x):
        outputs = self.conv1(x)
        outputs = self.conv2(outputs)
        return outputs + self.alpha * x if self.shortcut else outputs

def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    '''Basic cell for rep-style block, including conv and bn'''
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result

def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
Then add the following branches to the model-parsing code in yolov6/models/yolo.py, or in yolo.py of yolov5:
elif m in [ConvWrapper]:
    c1 = ch[f]
    c2 = args[0]
    args = [c1, c2, *args[1:]]
elif m in [BepC3]:
    c1, c2 = ch[f], args[0]
    c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]
    if m in [RepBlock]:
        args.insert(2, n)  # number of repeats
        n = 1
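To show what these branches do to a yaml row, here is a minimal standalone sketch. The two classes are empty stand-ins for the real modules, and build_args and REGISTRY are hypothetical names. YOLOv5's parse_model turns string arguments such as "ConvWrapper" into Python objects with eval; the dictionary lookup below is a simpler substitute, and it is an assumption that YOLOv6 Pro resolves the string in an equivalent way.

```python
import math


class ConvWrapper:  # stand-in for the real module, for illustration only
    pass


class BepC3:  # stand-in for the real module, for illustration only
    pass


REGISTRY = {"ConvWrapper": ConvWrapper, "BepC3": BepC3}  # hypothetical lookup table


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor.
    return math.ceil(x / divisor) * divisor


def build_args(module, args, c1, gw=1.0):
    """Turn yaml args into constructor args, given the previous layer's channels c1."""
    # Resolve string arguments (e.g. the block type) to the registered class.
    args = [REGISTRY.get(a, a) if isinstance(a, str) else a for a in args]
    if module is ConvWrapper:
        c2 = args[0]                            # width coefficient is not applied here
        return [c1, c2, *args[1:]]
    if module is BepC3:
        c2 = make_divisible(args[0] * gw, 8)    # width coefficient applied
        return [c1, c2, *args[1:]]
    return args


print(build_args(ConvWrapper, [64, 3, 2], c1=3))
print(build_args(BepC3, [128, 6, "ConvWrapper"], c1=128))
```

The first row becomes ConvWrapper(3, 64, 3, 2), and the second becomes BepC3(128, 128, 6, ConvWrapper), which matches the positional order in_channels, out_channels, n, block in the BepC3 constructor.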
That completes the backbone network. There is still more to cover: the next article will build the YOLOv6 Rep-PAN!