YOLOv6 Pro | Building the YOLOv6-L network structure step by step from yaml files (1): the backbone — an at-a-glance, beginner-friendly walkthrough (also applicable to improving the official YOLOv5)

YOLOv6 Pro introduction post: YOLOv6 Pro | Making it easier to build networks and replace modules in YOLOv6, to help with network-structure improvements in research, covering the Backbone, Neck, and DecoupleHead (following the way YOLOv5 builds networks)

· YOLOv6 Pro keeps the overall architecture of the official YOLOv6 but uses YOLOv5's network construction method to build the YOLOv6 network, including the backbone, neck, and effidehead structures.
· You can modify or add modules freely in the yaml file, and every modified file runs on its own; the goal is to make research easier.
· More network-structure improvements will be added over time, based on the modules in yolov5 and yoloair.
· Pre-trained weights have been converted from the official weights, so they are guaranteed to match.

· A p6 model has been pre-released (unofficial)

Project link: GitHub - yang-0201/YOLOv6_pro: Make it easier for yolov6 to change the network structure

If you are interested, please Star and Fork the repo, and report any problems promptly. The project is in its early stages, so feature suggestions may well be adopted and implemented; PRs are also welcome. The project will keep being updated, so stay tuned!

Now to the topic: today's tutorial builds the YOLOv6 network structure step by step in yaml file format. It is packed with practical detail, and I hope it helps you get more and more fluent at building and modifying networks!

(The construction method is the same as in the official YOLOv5, so this also applies to anyone who wants to modify YOLOv5.)

Today we take the YOLOv6-L model as the example. The structure of the YOLOv6 s, t, and n models differs from that of the m and l models, mainly because they are built from different modules: the small models use RepBlock, while the large models use CSPStackRep.

First, we need to be familiar with the YOLOv6 network structure; once we understand it, building it from scratch is much easier. Let's start by looking at how the official YOLOv6 code builds the backbone. Without further ado, let's get to work!

In yolov6/models/efficientrep.py:

class CSPBepBackbone(nn.Module):
    """
    CSPBepBackbone module.
    """
    def __init__(
        self,
        in_channels=3,
        channels_list=None,
        num_repeats=None,
        block=RepVGGBlock,
        csp_e=float(1)/2,
    ):
        super().__init__()

        assert channels_list is not None
        assert num_repeats is not None

        self.stem = block(
            in_channels=in_channels,
            out_channels=channels_list[0],
            kernel_size=3,
            stride=2
        )

        self.ERBlock_2 = nn.Sequential(
            block(
                in_channels=channels_list[0],
                out_channels=channels_list[1],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[1],
                out_channels=channels_list[1],
                n=num_repeats[1],
                e=csp_e,
                block=block,
            )
        )

        self.ERBlock_3 = nn.Sequential(
            block(
                in_channels=channels_list[1],
                out_channels=channels_list[2],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[2],
                out_channels=channels_list[2],
                n=num_repeats[2],
                e=csp_e,
                block=block,
            )
        )

        self.ERBlock_4 = nn.Sequential(
            block(
                in_channels=channels_list[2],
                out_channels=channels_list[3],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[3],
                out_channels=channels_list[3],
                n=num_repeats[3],
                e=csp_e,
                block=block,
            )
        )

        channel_merge_layer = SimSPPF
        if block == ConvWrapper:
            channel_merge_layer = SPPF

        self.ERBlock_5 = nn.Sequential(
            block(
                in_channels=channels_list[3],
                out_channels=channels_list[4],
                kernel_size=3,
                stride=2,
            ),
            BepC3(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                n=num_repeats[4],
                e=csp_e,
                block=block,
            ),
            channel_merge_layer(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                kernel_size=5
            )
        )

    def forward(self, x):

        outputs = []
        x = self.stem(x)
        x = self.ERBlock_2(x)
        x = self.ERBlock_3(x)
        outputs.append(x)
        x = self.ERBlock_4(x)
        outputs.append(x)
        x = self.ERBlock_5(x)
        outputs.append(x)

        return tuple(outputs)

 The parameters passed in are:

backbone=dict(
        type='CSPBepBackbone',
        num_repeats=[1, 6, 12, 18, 6],
        out_channels=[64, 128, 256, 512, 1024],
        csp_e=float(1)/2,
        ),
training_mode = "conv_silu"
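
To make this concrete, here is a minimal sketch (assuming the module paths quoted in this article; the 640x640 input is just an example) that instantiates the backbone with exactly these parameters and checks its three output feature maps:

import torch
from yolov6.models.efficientrep import CSPBepBackbone
from yolov6.layers.common import ConvWrapper

# training_mode = "conv_silu" resolves to the ConvWrapper block (see get_block below)
backbone = CSPBepBackbone(
    in_channels=3,
    channels_list=[64, 128, 256, 512, 1024],
    num_repeats=[1, 6, 12, 18, 6],
    block=ConvWrapper,
    csp_e=float(1) / 2,
)
p3, p4, p5 = backbone(torch.randn(1, 3, 640, 640))
print(p3.shape, p4.shape, p5.shape)
# strides 8/16/32: [1, 256, 80, 80], [1, 512, 40, 40], [1, 1024, 20, 20]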

The backbone is mainly composed of stem, ERBlock_2, ERBlock_3, ERBlock_4, and ERBlock_5 (which includes SPPF). Each ERBlock contains a block plus a BepC3 module, as you can see from the backbone-building code above (or while debugging the training code with the conf file set to configs/office/yolov6l.py):

block = get_block(config.training_mode)
def get_block(mode):
    if mode == 'repvgg':
        return RepVGGBlock
    elif mode == 'hyper_search':
        return LinearAddBlock
    elif mode == 'repopt':
        return RealVGGBlock
    elif mode == 'conv_relu':
        return SimConvWrapper
    elif mode == 'conv_silu':
        return ConvWrapper
    else:
        raise NotImplementedError("Undefined Repblock choice for mode {}".format(mode))
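
With the function above in scope, a quick sanity check of this mapping:

block = get_block("conv_silu")
print(block.__name__)  # ConvWrapper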

So the block is the ConvWrapper module. Now let's head to yolov6/layers/common.py and see what this curious ConvWrapper actually is:

class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)

    def forward(self, x):
        return self.block(x)

As it turns out, this is nothing more than a standard convolution followed by a SiLU activation — essentially the same as the Conv module in YOLOv5 — and it serves as the basic convolution module of YOLOv6.
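
For example (a quick sketch, assuming the class above is in scope), the stride-2 stem halves the spatial resolution, since the padding inside Conv is kernel_size // 2 ("same" padding):

import torch

stem = ConvWrapper(3, 64, kernel_size=3, stride=2)
print(stem(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 64, 320, 320])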

So in short, the overall network consists of:

stem: ConvWrapper basic convolution module

ERBlock_2: ConvWrapper (downsampling) + BepC3

ERBlock_3: ConvWrapper (downsampling) + BepC3

ERBlock_4: ConvWrapper (downsampling) + BepC3

ERBlock_5: ConvWrapper (downsampling) + BepC3 + SPPF

Let's look at how the channel counts and num_block values are distributed; they can be read off one by one from the parameters passed in.

c1 stands for in_channels, c2 for out_channels, k for kernel size, and s for stride.

stem: ConvWrapper basic convolution module: c1: 3, c2: 64, k = 3, s = 2

ERBlock_2: ConvWrapper (downsampling): c1: 64, c2: 128, k = 3, s = 2

+ BepC3: c1: 128, c2: 128, num_block = 6, csp_e = 0.5, block = ConvWrapper

ERBlock_3: ConvWrapper (downsampling): c1: 128, c2: 256, k = 3, s = 2

+ BepC3: c1: 256, c2: 256, num_block = 12, csp_e = 0.5, block = ConvWrapper

ERBlock_4: ConvWrapper (downsampling): c1: 256, c2: 512, k = 3, s = 2

+ BepC3: c1: 512, c2: 512, num_block = 18, csp_e = 0.5, block = ConvWrapper

ERBlock_5: ConvWrapper (downsampling): c1: 512, c2: 1024, k = 3, s = 2

+ BepC3: c1: 1024, c2: 1024, num_block = 6, csp_e = 0.5, block = ConvWrapper

+ SPPF: c1: 1024, c2: 1024, k = 5

Next, let's look at what the BepC3 structure is:

class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1,block=RepVGGBlock, e=0.5, concat=True):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())

        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)

    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))

(a) A RepBlock is a stack of N RepVGG blocks, each with a ReLU activation

(b) During inference, each RepVGG block is re-parameterized into a single RepConv

(c) The CSPStackRep module — the BepC3 module in this code — consists of three 1x1 convolutions plus a stack of N/2 sub-blocks, where each sub-block (BottleRep) holds two basic blocks with a residual connection; the two branches are then fused with a concat operation
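
As a small illustration of point (c) — a sketch, assuming BepC3 and ConvWrapper from yolov6/layers/common.py are in scope:

import torch

m = BepC3(in_channels=128, out_channels=128, n=6, block=ConvWrapper, e=0.5)
# cv1/cv2 project 128 -> 64 hidden channels; the RepBlock stacks n // 2 = 3
# BottleRep units (two ConvWrappers each); cv3 fuses the concatenated 128 channels
print(m(torch.randn(1, 128, 160, 160)).shape)  # torch.Size([1, 128, 160, 160])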

OK, now we understand all the modules and parameters needed to build the backbone, so we can start building the network with yaml files!

For those unfamiliar with it, here is the meaning of the parameters [from, number, module, args]:

· The first parameter says where this module's input comes from: -1 means the previous layer, and it can also be an explicit layer index such as 2 or 3

· The second parameter is the number of times this module is stacked, equivalent to the module's num_block parameter; the default of 1 means no stacking

· The third parameter is the module name

· The fourth parameter is the list of arguments passed to the module

depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

depth_multiple and width_multiple are the depth and width coefficients, respectively:

width_multiple scales the number of channels, and depth_multiple scales each module's num_block (repeat count).
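
For instance, here is a sketch of the usual YOLOv5-style scaling convention (the 0.5/0.33 multiples are hypothetical — YOLOv6-L uses 1.0 for both — and make_divisible is assumed to follow YOLOv5's definition):

import math

def make_divisible(x, divisor=8):
    # round the channel count up to the nearest multiple of the divisor
    return math.ceil(x / divisor) * divisor

width_multiple, depth_multiple = 0.5, 0.33
c2 = make_divisible(256 * width_multiple, 8)  # scaled output channels -> 128
n = max(round(12 * depth_multiple), 1)        # scaled repeat count -> 4
print(c2, n)  # 128 4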


The first layer is the stem, a ConvWrapper basic convolution module.

So the first line is [-1, 1, ConvWrapper, [64, 3, 2]]

Its args are [64, 3, 2]. The parsing code later takes the input channel from the previous layer's output channel by default, so it does not need to be specified; 64 is the output channel, 3 the kernel size, and 2 the stride.

The second and third lines are

[-1, 1, ConvWrapper, [128, 3, 2]],

[-1, 1, BepC3, [128, 6, "ConvWrapper"]],

In the BepC3 line, 128 is the output channel, 6 is num_block, and "ConvWrapper" specifies the basic block.

The final yaml is

depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

backbone:
  # [from, number, module, args]
  [[-1, 1, ConvWrapper, [64, 3, 2]],  # 0-P1/2
   [-1, 1, ConvWrapper, [128, 3, 2]],  # 1-P2/4
   [-1, 1, BepC3, [128, 6, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [256, 3, 2]],  # 3-P3/8
   [-1, 1, BepC3, [256, 12, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [512, 3, 2]],  # 5-P4/16
   [-1, 1, BepC3, [512, 18, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, BepC3, [1024, 6, "ConvWrapper"]],
   [-1, 1, SPPF, [1024, 5]]]  # 9
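
If you save the snippet above as a yaml file, a quick way to sanity-check it is to load it and print each layer spec (the file name here is just an example):

import yaml

with open("yolov6l_backbone.yaml") as f:  # example file name
    cfg = yaml.safe_load(f)
for i, (frm, num, module, args) in enumerate(cfg["backbone"]):
    print(i, frm, num, module, args)
# 0 -1 1 ConvWrapper [64, 3, 2]
# 1 -1 1 ConvWrapper [128, 3, 2]
# ...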

The first step is then to register the module definitions in yolov6/layers/common.py (or in YOLOv5's common.py), adding every module involved. For YOLOv5, it is recommended to put these additions in a new .py file:

import torch
import torch.nn as nn
from torch.nn.parameter import Parameter
import numpy as np
class RepVGGBlock(nn.Module):
    '''RepVGGBlock is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    '''
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
        super(RepVGGBlock, self).__init__()
        """ Initialization of the class.
        Args:
            in_channels (int): Number of channels in the input image
            out_channels (int): Number of channels produced by the convolution
            kernel_size (int or tuple): Size of the convolving kernel
            stride (int or tuple, optional): Stride of the convolution. Default: 1
            padding (int or tuple, optional): Zero-padding added to both sides of
                the input. Default: 1
            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
            groups (int, optional): Number of blocked connections from input
                channels to output channels. Default: 1
            padding_mode (string, optional): Default: 'zeros'
            deploy: Whether to be deploy status or training status. Default: False
            use_se: Whether to use se. Default: False
        """
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels
        self.out_channels = out_channels

        assert kernel_size == 3
        assert padding == 1

        padding_11 = padding - kernel_size // 2

        self.nonlinearity = nn.ReLU()

        if use_se:
            raise NotImplementedError("se block not supported yet")
        else:
            self.se = nn.Identity()

        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
                                         padding=padding, dilation=dilation, groups=groups, bias=True, padding_mode=padding_mode)

        else:
            self.rbr_identity = nn.BatchNorm2d(num_features=in_channels) if out_channels == in_channels and stride == 1 else None
            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups)
            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride, padding=padding_11, groups=groups)

    def forward(self, inputs):
        '''Forward process'''
        if hasattr(self, 'rbr_reparam'):
            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))

        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)

        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def switch_to_deploy(self):
        if hasattr(self, 'rbr_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels, out_channels=self.rbr_dense.conv.out_channels,
                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation, groups=self.rbr_dense.conv.groups, bias=True)
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_dense')
        self.__delattr__('rbr_1x1')
        if hasattr(self, 'rbr_identity'):
            self.__delattr__('rbr_identity')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')
        self.deploy = True
class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1,block=RepVGGBlock, e=0.5, concat=True):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())

        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)

    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))
class Conv_C3(nn.Module):
    '''Standard convolution in BepC3-Block'''
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        return self.act(self.conv(x))
class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)

    def forward(self, x):
        return self.block(x)
class Conv(nn.Module):  # if you are using YOLOv5, this needs a small tweak
    '''Normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))
class RepBlock(nn.Module):
    '''
        RepBlock is a stage block with rep-style basic block
    '''
    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, basic_block=RepVGGBlock):
        super().__init__()

        self.conv1 = block(in_channels, out_channels)
        self.block = nn.Sequential(*(block(out_channels, out_channels) for _ in range(n - 1))) if n > 1 else None
        if block == BottleRep:
            self.conv1 = BottleRep(in_channels, out_channels, basic_block=basic_block, weight=True)
            n = n // 2
            self.block = nn.Sequential(*(BottleRep(out_channels, out_channels, basic_block=basic_block, weight=True) for _ in range(n - 1))) if n > 1 else None

    def forward(self, x):
        x = self.conv1(x)
        if self.block is not None:
            x = self.block(x)
        return x
class BottleRep(nn.Module):

    def __init__(self, in_channels, out_channels, basic_block=RepVGGBlock, weight=False):
        super().__init__()
        self.conv1 = basic_block(in_channels, out_channels)
        self.conv2 = basic_block(out_channels, out_channels)
        if in_channels != out_channels:
            self.shortcut = False
        else:
            self.shortcut = True
        if weight:
            self.alpha = Parameter(torch.ones(1))
        else:
            self.alpha = 1.0

    def forward(self, x):
        outputs = self.conv1(x)
        outputs = self.conv2(outputs)
        return outputs + self.alpha * x if self.shortcut else outputs
def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    '''Basic cell for rep-style block, including conv and bn'''
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                                  kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result
def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
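
Incidentally, point (b) from earlier — re-parameterizing the RepVGG block at inference time — can be verified with the classes just added; a small sketch:

import torch

block = RepVGGBlock(64, 64).eval()  # eval() so BatchNorm uses running statistics
x = torch.randn(1, 64, 32, 32)
with torch.no_grad():
    y_train = block(x)        # three-branch training-time forward
    block.switch_to_deploy()  # fuse the branches into a single 3x3 conv
    y_deploy = block(x)
print(torch.allclose(y_train, y_deploy, atol=1e-5))  # True: outputs match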

Then add the following to yolov6/models/yolo.py (or to YOLOv5's yolo.py):

        elif m in [ConvWrapper]:
            c1 = ch[f]
            c2 = args[0]
            args = [c1, c2, *args[1:]]
        elif m in [BepC3]:
            c1, c2 = ch[f], args[0]
            c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
            if m in [RepBlock]:
                args.insert(2, n)  # number of repeats
                n = 1
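
For context, these branches live inside the model builder's parse loop. Below is a heavily abridged, YOLOv5-style sketch of that loop — not the real function: the name parse_backbone is made up, make_divisible is the helper sketched earlier, and ConvWrapper, BepC3, and SPPF are assumed to be in scope. Note that block-name strings in args, such as "ConvWrapper", must also be resolved to actual classes at some point:

import torch.nn as nn

def parse_backbone(d, gd, gw):
    # d: loaded yaml dict; gd/gw: depth_multiple/width_multiple
    ch, layers = [3], []  # ch tracks each layer's output channels
    for f, n, m, args in d["backbone"]:
        m = eval(m) if isinstance(m, str) else m                   # module name -> class
        args = [eval(a) if isinstance(a, str) else a for a in args]  # e.g. "ConvWrapper"
        n = max(round(n * gd), 1) if n > 1 else n                  # apply depth multiple
        if m in [ConvWrapper, SPPF]:
            c1, c2 = ch[f], args[0]
            args = [c1, c2, *args[1:]]
        elif m in [BepC3]:
            c1, c2 = ch[f], args[0]
            c2 = make_divisible(c2 * gw, 8)                        # apply width multiple
            args = [c1, c2, *args[1:]]
        layers.append(nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args))
        ch.append(c2)
    # simplified: the real builder also records which layers feed the neck/head
    return nn.Sequential(*layers), ch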

With that, the backbone network is fully built. There is still more to cover, though: the next article will build the YOLOv6 Rep-PAN neck!

Origin blog.csdn.net/qq_43000647/article/details/128258692