Adding MobileOne to YOLOv7

Code: GitHub - apple/ml-mobileone: This repository contains the official implementation of the research paper "An Improved One millisecond Mobile Backbone".

Paper address: https://arxiv.org/abs/2206.04040

MobileOne comes from Apple. Its authors report that inference with MobileOne on an iPhone 12 takes only 1 millisecond, which is where the "One" in the name comes from. How quickly MobileOne has been put into practice shows the potential of re-parameterization on mobile devices: simple, efficient, and plug-and-play.

The left part of Figure 3 shows a complete MobileOne building block. It has two parts: the upper part is based on depthwise convolution and the lower part on pointwise convolution. Both terms come from MobileNet. A depthwise convolution is a grouped convolution whose number of groups g equals the number of input channels; a pointwise convolution is a 1×1 convolution.
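To make the saving from the depthwise plus pointwise split concrete, here is a small weight-count sketch. The helper names are my own, biases and BN parameters are ignored, and the 32-channel example is only an illustration.

```python
def conv_weights(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a 2D convolution without bias."""
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k


def depthwise_separable_weights(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (groups == c_in) followed by a pointwise 1 x 1 conv."""
    depthwise = conv_weights(c_in, c_in, k, groups=c_in)
    pointwise = conv_weights(c_in, c_out, 1)
    return depthwise + pointwise


standard = conv_weights(32, 32, 3)                   # 32 * 32 * 9 = 9216
separable = depthwise_separable_weights(32, 32, 3)   # 32 * 9 + 32 * 32 = 1312
print(standard, separable)
```

For a 3×3 convolution with 32 input and 32 output channels, the separable version needs roughly one seventh of the weights.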

The depthwise convolution module in Figure 3 has three branches. The leftmost branch is a 1×1 convolution; the middle branch is an over-parameterized 3×3 convolution, that is, k parallel 3×3 convolutions; the right branch is a skip connection containing a BN layer. Both the 1×1 and the 3×3 convolutions here are depthwise convolutions (grouped convolutions whose number of groups g equals the number of input channels).
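Because convolution is linear, parallel branches like these can be summed into one 3×3 kernel. The sketch below checks this on a single channel in pure Python. It is a toy under stated assumptions: BN layers are left out, the skip branch is treated as a plain identity, and all helper names and numbers are made up for illustration.

```python
def conv2d_same(image, kernel):
    """3x3 cross-correlation with zero padding 1 on a single-channel image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def pad_1x1_to_3x3(weight):
    """Place a 1x1 weight at the centre of an otherwise zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, weight, 0.0], [0.0, 0.0, 0.0]]


def add_kernels(*kernels):
    """Element-wise sum of 3x3 kernels."""
    return [[sum(k[i][j] for k in kernels) for j in range(3)] for i in range(3)]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3_a = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4], [0.2, 0.1, -0.3]]
k3_b = [[-0.2, 0.0, 0.1], [0.3, 0.25, -0.1], [0.0, 0.5, 0.2]]

# Two 3x3 branches (k = 2), one 1x1 branch, and an identity skip (BN omitted).
branches = [k3_a, k3_b, pad_1x1_to_3x3(0.75), pad_1x1_to_3x3(1.0)]

# Training-time view: evaluate every branch separately and sum the outputs.
outs = [conv2d_same(image, k) for k in branches]
multi_branch = [[sum(o[i][j] for o in outs) for j in range(3)] for i in range(3)]

# Inference-time view: sum the kernels first, then run one convolution.
merged = conv2d_same(image, add_kernels(*branches))

max_diff = max(abs(multi_branch[i][j] - merged[i][j])
               for i in range(3) for j in range(3))
print(max_diff)  # expected to be zero up to floating-point rounding
```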

The pointwise convolution module in Figure 3 has two branches. The left branch is an over-parameterized 1×1 convolution made of k parallel 1×1 convolutions; the right branch is a skip connection containing a BN layer. During training, MobileOne is a stack of these building blocks. Once training is finished, the multi-branch block on the left of Figure 3 can be re-parameterized into the plain structure on the right of Figure 3.
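The other ingredient of re-parameterization is folding each BN layer into the convolution before it. The following scalar sketch (one channel, 1×1 weights) uses the usual fold w' = w * gamma / std and b' = beta - mean * gamma / std with std = sqrt(var + eps), then sums the folded branches. All statistics are invented for illustration, and the BN-only skip is modelled as a convolution with weight 1.

```python
import math


def conv_bn(x, weight, gamma, beta, running_mean, running_var, eps=1e-5):
    """Training-time branch in eval mode: scalar conv followed by BN."""
    y = weight * x
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta


def fold_bn(weight, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BN into the preceding conv; returns (fused_weight, fused_bias)."""
    std = math.sqrt(running_var + eps)
    return weight * gamma / std, beta - running_mean * gamma / std


# k = 2 over-parameterized conv branches plus a BN-only skip (weight 1.0).
branches = [
    dict(weight=0.8, gamma=1.2, beta=0.1, running_mean=0.3, running_var=0.9),
    dict(weight=-0.4, gamma=0.7, beta=-0.2, running_mean=-0.1, running_var=1.5),
    dict(weight=1.0, gamma=0.9, beta=0.05, running_mean=0.2, running_var=1.1),
]

x = 2.5
train_out = sum(conv_bn(x, **b) for b in branches)

fused = [fold_bn(**b) for b in branches]
w_final = sum(w for w, _ in fused)
b_final = sum(b for _, b in fused)
deploy_out = w_final * x + b_final

print(abs(train_out - deploy_out))  # expected to be zero up to rounding
```

The fold relies on BN's running statistics, so it only matches the model's behavior in eval mode, not the batch statistics used during training.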

yolov7-tiny is used for the demonstration here; the modification for the full v7 is almost the same. My approach is not to swap in MobileOne as the whole backbone, but to keep every ELAN block of v7-tiny and replace the 3*3 convolutions inside each block with the re-parameterizable depthwise separable convolution from Figure 3. This keeps the overall network structure while adding re-parameterized MobileOne blocks to it.

[-1, 1, Conv, [32, 1, 1, None, 1]],
[-2, 1, Conv, [32, 1, 1, None, 1]],
[-1, 1, Conv, [32, 3, 1, None, 1]],  # replace
[-1, 1, Conv, [32, 3, 1, None, 1]],  # replace
[[-1, -2, -3, -4], 1, Concat, [1]],
[-1, 1, Conv, [64, 1, 1, None, 1]],

That is, the two 3*3 convolutions marked "# replace" above are the ones being replaced.

Here I have simplified the structure above; for details, see the CSDN blog post "yolov7 simplified yaml configuration file".

First create yolov7-tiny-ELANMO.yaml

# parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

activation: nn.ReLU()
# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# yolov7-tiny backbone
backbone:
  # [from, number, module, args]; ELANMO args: c2, k=1, s=1, p=None, g=1, num_blocks_per_stage=1, num_conv_branches=4
  [[-1, 1, Conv, [32, 3, 2, None, 1]],  # 0-P1/2

   [-1, 1, Conv, [64, 3, 2, None, 1]],  # 1-P2/4
   [-1, 1, ELANMO, [64, 1, 1, None, 1, 1, 4]],  # 2

   [-1, 1, MP, []],  # 3-P3/8
   [-1, 1, ELANMO, [128, 1, 1, None, 1, 1, 4]],  # 4

   [-1, 1, MP, []],  # 5-P4/16
   [-1, 1, ELANMO, [256, 1, 1, None, 1, 1, 4]],  # 6

   [-1, 1, MP, []],  # 7-P5/32
   [-1, 1, ELANMO, [512, 1, 1, None, 1, 1, 4]],  # 8
  ]

# yolov7-tiny head
head:
  [[-1, 1, SPPCSPCSIM, [256]], # 9

   [-1, 1, Conv, [128, 1, 1, None, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [6, 1, Conv, [128, 1, 1, None, 1]], # route backbone P4
   [[-1, -2], 1, Concat, [1]], # 13

   [-1, 1, ELANMO, [128, 1, 1, None, 1, 1, 4]],  # 14

   [-1, 1, Conv, [64, 1, 1, None, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [4, 1, Conv, [64, 1, 1, None, 1]], # route backbone P3
   [[-1, -2], 1, Concat, [1]],

   [-1, 1, ELANMO, [64, 1, 1, None, 1, 1, 4]],  # 19

   [-1, 1, Conv, [128, 3, 2, None, 1]],
   [[-1, 14], 1, Concat, [1]],

   [-1, 1, ELANMO, [128, 1, 1, None, 1, 1, 4]],  # 22

   [-1, 1, Conv, [256, 3, 2, None, 1]],
   [[-1, 9], 1, Concat, [1]],

   [-1, 1, ELANMO, [256, 1, 1, None, 1, 1, 4]],  # 25

   [19, 1, Conv, [128, 3, 1, None, 1]],
   [22, 1, Conv, [256, 3, 1, None, 1]],
   [25, 1, Conv, [512, 3, 1, None, 1]],

   [[26,27,28], 1, Detect, [nc, anchors]],   # Detect(P3, P4, P5)
  ]

Add the following to common.py:

import torch.nn.functional as F


class SEBlock(nn.Module):
    """ Squeeze and Excite module.

        Pytorch implementation of `Squeeze-and-Excitation Networks` -
        https://arxiv.org/pdf/1709.01507.pdf
    """

    def __init__(self,
                 in_channels: int,
                 rd_ratio: float = 0.0625) -> None:
        """ Construct a Squeeze and Excite Module.

        :param in_channels: Number of input channels.
        :param rd_ratio: Input channel reduction ratio.
        """
        super(SEBlock, self).__init__()
        self.reduce = nn.Conv2d(in_channels=in_channels,
                                out_channels=int(in_channels * rd_ratio),
                                kernel_size=1,
                                stride=1,
                                bias=True)
        self.expand = nn.Conv2d(in_channels=int(in_channels * rd_ratio),
                                out_channels=in_channels,
                                kernel_size=1,
                                stride=1,
                                bias=True)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        """ Apply forward pass. """
        b, c, h, w = inputs.size()
        x = F.avg_pool2d(inputs, kernel_size=[h, w])
        x = self.reduce(x)
        x = F.relu(x)
        x = self.expand(x)
        x = torch.sigmoid(x)
        x = x.view(-1, c, 1, 1)
        return inputs * x


class MobileOneBlock(nn.Module):
    """ MobileOne building block.

        This block has a multi-branched architecture at train-time
        and plain-CNN style architecture at inference time
        For more details, please refer to our paper:
        `An Improved One millisecond Mobile Backbone` -
        https://arxiv.org/pdf/2206.04040.pdf
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: int,
                 stride: int = 1,
                 padding: int = 0,
                 dilation: int = 1,
                 groups: int = 1,
                 inference_mode: bool = False,
                 use_se: bool = False,
                 num_conv_branches: int = 1) -> None:
        """ Construct a MobileOneBlock module.

        :param in_channels: Number of channels in the input.
        :param out_channels: Number of channels produced by the block.
        :param kernel_size: Size of the convolution kernel.
        :param stride: Stride size.
        :param padding: Zero-padding size.
        :param dilation: Kernel dilation factor.
        :param groups: Group number.
        :param inference_mode: If True, instantiates model in inference mode.
        :param use_se: Whether to use SE-ReLU activations.
        :param num_conv_branches: Number of linear conv branches.
        """
        super(MobileOneBlock, self).__init__()
        self.inference_mode = inference_mode
        self.groups = groups
        self.stride = stride
        self.kernel_size = kernel_size
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_conv_branches = num_conv_branches

        # Check if SE-ReLU is requested
        if use_se:
            self.se = SEBlock(out_channels)
        else:
            self.se = nn.Identity()
        self.activation = nn.ReLU()

        if inference_mode:
            self.reparam_conv = nn.Conv2d(in_channels=in_channels,
                                          out_channels=out_channels,
                                          kernel_size=kernel_size,
                                          stride=stride,
                                          padding=padding,
                                          dilation=dilation,
                                          groups=groups,
                                          bias=True)
        else:
            # Re-parameterizable skip connection
            self.rbr_skip = nn.BatchNorm2d(num_features=in_channels) \
                if out_channels == in_channels and stride == 1 else None

            # Re-parameterizable conv branches
            rbr_conv = list()
            for _ in range(self.num_conv_branches):
                rbr_conv.append(self._conv_bn(kernel_size=kernel_size,
                                              padding=padding))
            self.rbr_conv = nn.ModuleList(rbr_conv)

            # Re-parameterizable scale branch
            self.rbr_scale = None
            if kernel_size > 1:
                self.rbr_scale = self._conv_bn(kernel_size=1,
                                               padding=0)

    def forward(self, x: torch.Tensor):
        """ Apply forward pass. """
        # Inference mode forward pass.
        if self.inference_mode:
            return self.activation(self.se(self.reparam_conv(x)))

        # Multi-branched train-time forward pass.
        # Skip branch output
        identity_out = 0
        if self.rbr_skip is not None:
            identity_out = self.rbr_skip(x)

        # Scale branch output
        scale_out = 0
        if self.rbr_scale is not None:
            scale_out = self.rbr_scale(x)

        # Other branches
        out = scale_out + identity_out
        for ix in range(self.num_conv_branches):
            out += self.rbr_conv[ix](x)

        return self.activation(self.se(out))

    def reparameterize(self):
        """ Following works like `RepVGG: Making VGG-style ConvNets Great Again` -
        https://arxiv.org/pdf/2101.03697.pdf. We re-parameterize multi-branched
        architecture used at training time to obtain a plain CNN-like structure
        for inference.
        """
        if self.inference_mode:
            return
        kernel, bias = self._get_kernel_bias()
        self.reparam_conv = nn.Conv2d(in_channels=self.rbr_conv[0].conv.in_channels,
                                      out_channels=self.rbr_conv[0].conv.out_channels,
                                      kernel_size=self.rbr_conv[0].conv.kernel_size,
                                      stride=self.rbr_conv[0].conv.stride,
                                      padding=self.rbr_conv[0].conv.padding,
                                      dilation=self.rbr_conv[0].conv.dilation,
                                      groups=self.rbr_conv[0].conv.groups,
                                      bias=True)
        self.reparam_conv.weight.data = kernel
        self.reparam_conv.bias.data = bias

        # Delete un-used branches
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_conv')
        self.__delattr__('rbr_scale')
        if hasattr(self, 'rbr_skip'):
            self.__delattr__('rbr_skip')

        self.inference_mode = True

    def _get_kernel_bias(self):
        """ Method to obtain re-parameterized kernel and bias.
        Reference: https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py#L83

        :return: Tuple of (kernel, bias) after fusing branches.
        """
        # get weights and bias of scale branch
        kernel_scale = 0
        bias_scale = 0
        if self.rbr_scale is not None:
            kernel_scale, bias_scale = self._fuse_bn_tensor(self.rbr_scale)
            # Pad scale branch kernel to match conv branch kernel size.
            pad = self.kernel_size // 2
            kernel_scale = torch.nn.functional.pad(kernel_scale,
                                                   [pad, pad, pad, pad])

        # get weights and bias of skip branch
        kernel_identity = 0
        bias_identity = 0
        if self.rbr_skip is not None:
            kernel_identity, bias_identity = self._fuse_bn_tensor(self.rbr_skip)

        # get weights and bias of conv branches
        kernel_conv = 0
        bias_conv = 0
        for ix in range(self.num_conv_branches):
            _kernel, _bias = self._fuse_bn_tensor(self.rbr_conv[ix])
            kernel_conv += _kernel
            bias_conv += _bias

        kernel_final = kernel_conv + kernel_scale + kernel_identity
        bias_final = bias_conv + bias_scale + bias_identity
        return kernel_final, bias_final

    def _fuse_bn_tensor(self, branch):
        """ Method to fuse batchnorm layer with preceding conv layer.
        Reference: https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py#L95

        :param branch: Conv-BN nn.Sequential, or the nn.BatchNorm2d skip branch.
        :return: Tuple of (kernel, bias) after fusing batchnorm.
        """
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = torch.zeros((self.in_channels,
                                            input_dim,
                                            self.kernel_size,
                                            self.kernel_size),
                                           dtype=branch.weight.dtype,
                                           device=branch.weight.device)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim,
                                 self.kernel_size // 2,
                                 self.kernel_size // 2] = 1
                self.id_tensor = kernel_value
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def _conv_bn(self,
                 kernel_size: int,
                 padding: int) -> nn.Sequential:
        """ Helper method to construct conv-batchnorm layers.

        :param kernel_size: Size of the convolution kernel.
        :param padding: Zero-padding size.
        :return: Conv-BN module.
        """
        mod_list = nn.Sequential()
        mod_list.add_module('conv', nn.Conv2d(in_channels=self.in_channels,
                                              out_channels=self.out_channels,
                                              kernel_size=kernel_size,
                                              stride=self.stride,
                                              padding=padding,
                                              groups=self.groups,
                                              bias=False))
        mod_list.add_module('bn', nn.BatchNorm2d(num_features=self.out_channels))
        return mod_list


class ELANMO(nn.Module):
    # Yolov7 ELANMO with args(ch_in, ch_out, kernel, stride, padding, groups, num_blocks, num_conv, activation)
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1,
                 num_blocks_per_stage=1,
                 num_conv_branches=4,
                 act=True,
                 down_sample=False,
                 use_se=False,
                 inference_mode=False):
        """ Construct an ELAN module with MobileOneBlock.

        :param c1: Number of channels in the input.
        :param c2: Number of channels produced by the block.
        :param k: Size of the convolution kernel.
        :param s: Stride size.
        :param p: Zero-padding size.
        :param g: Group number.
        :param num_blocks_per_stage: Number of MobileOne blocks (depthwise + pointwise pairs) per stage.
        :param num_conv_branches: Number of linear conv branches.
        :param act: If True, use activations.
        :param down_sample: If True, the first block of each stage uses stride 2.
        :param use_se: Whether to use SE-ReLU activations.
        :param inference_mode: If True, instantiates model in inference mode.
        """
        super().__init__()
        c_ = int(c2 // 2)
        c_out = c_ * 4
        self.inference_mode = inference_mode
        self.in_planes = c_
        self.down_sample = down_sample
        self.use_se = use_se
        self.num_blocks_per_stage = num_blocks_per_stage
        self.num_conv_branches = num_conv_branches
        # self.cur_layer_idx = 1

        self.cv1 = Conv(c1, c_, k=k, s=s, p=p, g=g, act=act)
        self.cv2 = Conv(c1, c_, k=k, s=s, p=p, g=g, act=act)
        self.cv3 = self._make_stage(c_, self.num_blocks_per_stage, num_se_blocks=0)
        self.cv4 = self._make_stage(c_, self.num_blocks_per_stage, num_se_blocks=0)
        self.cv5 = Conv(c_out, c2, k=k, s=s, p=p, g=g, act=act)

    def _make_stage(self,
                    planes: int,
                    num_blocks: int,
                    num_se_blocks: int) -> nn.Sequential:
        """ Build a stage of MobileOne model.

        :param planes: Number of output channels.
        :param num_blocks: Number of blocks in this stage.
        :param num_se_blocks: Number of SE blocks in this stage.
        :return: A stage of MobileOne model.
        """
        # Get strides for all layers
        strides = [2 if self.down_sample else 1] + [1] * (num_blocks - 1)
        blocks = []
        for ix, stride in enumerate(strides):
            use_se = False
            if num_se_blocks > num_blocks:
                raise ValueError("Number of SE blocks cannot "
                                 "exceed number of layers.")
            if ix >= (num_blocks - num_se_blocks):
                use_se = True

            # Depthwise conv
            blocks.append(MobileOneBlock(in_channels=self.in_planes,
                                         out_channels=self.in_planes,
                                         kernel_size=3,
                                         stride=stride,
                                         padding=1,
                                         groups=self.in_planes,
                                         inference_mode=self.inference_mode,
                                         use_se=use_se,
                                         num_conv_branches=self.num_conv_branches))
            # Pointwise conv
            blocks.append(MobileOneBlock(in_channels=self.in_planes,
                                         out_channels=planes,
                                         kernel_size=1,
                                         stride=1,
                                         padding=0,
                                         groups=1,
                                         inference_mode=self.inference_mode,
                                         use_se=use_se,
                                         num_conv_branches=self.num_conv_branches))
            self.in_planes = planes
            # self.cur_layer_idx += 1
        return nn.Sequential(*blocks)

    def forward(self, x):
        x1 = self.cv1(x)
        x2 = self.cv2(x)
        x3 = self.cv3(x2)
        x4 = self.cv4(x3)
        x5 = torch.cat((x1, x2, x3, x4), 1)
        return self.cv5(x5)
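As a quick check of the channel bookkeeping in ELANMO (c_ = c2 // 2, four c_-channel tensors concatenated, then cv5 back to c2), here is a tiny sketch. The function name is hypothetical and it only mirrors the arithmetic, not the module itself.

```python
def elanmo_channels(c2: int) -> dict:
    """Channel counts inside an ELANMO block with output width c2."""
    c_ = c2 // 2          # width of cv1, cv2, cv3 and cv4 outputs
    concat = 4 * c_       # x1, x2, x3, x4 concatenated along channels
    return {"branch": c_, "concat": concat, "out": c2}


print(elanmo_channels(64))
```

With c2 = 64 this gives four 32-channel branches and a final 64-channel 1×1 convolution, the same widths as the original ELAN snippet listed earlier.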

Add ELANMO in parse_model of yolo.py

        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
                 BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x, SPPCSPC, RepConv,
                 RFEM, ELAN, SPPCSPCSIM, ELANMO):
            c1, c2 = ch[f], args[0]
            if c2 != no:  # if not output
                c2 = make_divisible(c2 * gw, 8)

            args = [c1, c2, *args[1:]]
            if m in [BottleneckCSP, C3, C3TR, C3Ghost, C3x]:
                args.insert(2, n)  # number of repeats
                n = 1
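To see how a yaml entry such as [64, 1, 1, None, 1, 1, 4] reaches the ELANMO constructor, the sketch below mimics the line args = [c1, c2, *args[1:]] and pairs the result with the positional parameter names. It assumes width_multiple = 1.0 (so make_divisible leaves 64 unchanged) and an input of 64 channels from the previous layer; build_args and PARAM_NAMES are illustrative names, not part of yolo.py.

```python
def build_args(c1, yaml_args):
    """Mimic `args = [c1, c2, *args[1:]]` for width_multiple = 1.0."""
    c2 = yaml_args[0]
    return [c1, c2, *yaml_args[1:]]


# Positional parameter order of ELANMO.__init__ (self omitted).
PARAM_NAMES = ["c1", "c2", "k", "s", "p", "g",
               "num_blocks_per_stage", "num_conv_branches"]

args = build_args(64, [64, 1, 1, None, 1, 1, 4])
bound = dict(zip(PARAM_NAMES, args))
print(bound)
```

The last two values land on num_blocks_per_stage and num_conv_branches, so each stage has one MobileOne block with four conv branches.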

At the same time, add reparameterize() in the BaseModel of yolo.py

    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
        LOGGER.info('Fusing layers... ')
        for m in self.model.modules():
            if isinstance(m, (Conv, DWConv)) and hasattr(m, 'bn'):
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                delattr(m, 'bn')  # remove batchnorm
                m.forward = m.forward_fuse  # update forward
            if isinstance(m, RepConv):
                # print(f" fuse_repvgg_block")
                m.fuse_repvgg_block()
                # m.switch_to_deploy()
            if hasattr(m, 'reparameterize'):
                m.reparameterize()
        self.info()
        return self

Switch to the new configuration file and run yolo.py.

The parameter count and computation of the original yolov7-tiny:

Compared with the original tiny model, the modified model has far fewer parameters and much less computation.

After exporting to ONNX, you can inspect the network structure. The following picture shows the original v7-tiny network structure:

The network structure with MobileOne blocks added, before the re-parameterized branches are fused:

This structure looks complicated, but it becomes simple after fusion.

The network structure after the re-parameterized branches are fused:

After fusion, the two 3*3 convolutions in each ELAN block have effectively been replaced with depthwise separable convolutions.

Origin blog.csdn.net/athrunsunny/article/details/132784492