Detailed explanation of the YOLOv7 improvements

Introduction

YOLOv7 is currently the most advanced algorithm in the YOLO series, surpassing previous YOLO versions in both accuracy and speed.

SPPCSPC module

SPPCSPC combines the spatial pyramid pooling (SPP) structure with the CSP structure. The input is fed into two branches: one passes through a stack of convolutions and parallel max-pooling layers (the 3×3 convolution in this branch is not grouped, it is still a standard convolution), while the other branch is a 1×1 point conv shortcut. Finally, the outputs of the two branches are concatenated.

import torch
import torch.nn as nn

# Conv below is the Conv2d + BatchNorm + SiLU block from YOLOv7's models/common.py

class SPPCSPC(nn.Module):
    # CSP-style SPP block, see https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)   # entry of the SPP branch
        self.cv2 = Conv(c1, c_, 1, 1)   # point-conv shortcut branch
        self.cv3 = Conv(c_, c_, 3, 1)   # standard (ungrouped) 3x3 conv
        self.cv4 = Conv(c_, c_, 1, 1)
        # parallel max-pooling with kernel sizes 5, 9, 13 (stride 1, padded to keep resolution)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)  # fuse the pooled features
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)  # fuse the two branches

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))  # SPP branch
        y2 = self.cv2(x)                                                       # shortcut branch
        return self.cv7(torch.cat((y1, y2), dim=1))
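
A quick shape check of the module (assuming Conv is the Conv2d + BatchNorm + SiLU block from YOLOv7's models/common.py, as used above; the input size here is only an example): the spatial resolution is preserved and c1 channels are mapped to c2 channels.

spp = SPPCSPC(c1=1024, c2=512)
x = torch.randn(1, 1024, 20, 20)   # e.g. the deepest (stride-32) feature map for a 640x640 input
print(spp(x).shape)                # torch.Size([1, 512, 20, 20]), resolution unchanged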


CSPVoVNet module

Starting from the characteristics of memory access cost, Ma et al. also analyzed the influence of the input/output channel ratio, the number of architecture branches, and element-wise operations on network inference speed. Dollár et al. additionally took activations into account when performing model scaling (Fast and Accurate Model Scaling), that is, the number of elements in the output tensors of the convolutional layers. The CSPVoVNet architecture, in addition to considering the basic design issues above, also analyzes the gradient path so that the weights of different layers can learn more diverse features. This gradient-path analysis makes inference faster and more accurate.
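
As a rough illustration of the "activation" quantity mentioned above (the total number of elements in the convolutional output tensors, the measure considered by Dollár et al.), here is a minimal sketch; the layer shapes are made up for illustration and are not an actual YOLOv7 configuration.

def count_activations(output_shapes):
    # output_shapes: (channels, height, width) of each convolutional layer's output tensor
    return sum(c * h * w for c, h, w in output_shapes)

example_outputs = [(64, 160, 160), (128, 80, 80), (256, 40, 40)]  # made-up layer outputs
print(count_activations(example_outputs))  # 1638400 + 819200 + 409600 = 2867200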

E-ELAN module

  • How to design an efficient network?
    ELAN reaches a conclusion: by controlling the shortest and longest gradient paths, a deeper network can learn and converge efficiently. A large-scale ELAN reaches a stable state regardless of the gradient path length and the number of stacked computational blocks. However, if more computational blocks are stacked without limit, this stable state may be destroyed and parameter utilization drops.
  • What did E-ELAN do?
    E-ELAN uses expand, shuffle, and merge cardinality to continuously enhance the learning ability of the network without destroying the original gradient path, and to guide different groups of computational blocks to learn more diverse features (a toy sketch of this idea follows below).
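
The sketch below is only a toy illustration of the expand / shuffle / merge-cardinality idea; it is not the official YOLOv7 E-ELAN code, and the block shapes are chosen purely for illustration. Several parallel computational blocks process the same input (expand), their channels are shuffled across groups (shuffle), and the groups are fused back together (merge cardinality).

class EELANSketch(nn.Module):
    # Toy sketch of the expand / shuffle / merge-cardinality idea; not the official implementation.
    def __init__(self, c, groups=2):
        super().__init__()
        self.groups = groups
        # "expand": each group gets its own computational block (here a simple 3x3 conv block)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, c, 3, 1, 1), nn.BatchNorm2d(c), nn.SiLU())
            for _ in range(groups)
        )
        # transition applied after the groups are merged
        self.merge = nn.Conv2d(groups * c, c, 1, 1)

    def forward(self, x):
        # every group block sees the same input but learns different features
        feats = [blk(x) for blk in self.blocks]               # expand
        y = torch.cat(feats, dim=1)                           # groups * c channels
        # shuffle: interleave channels across the group dimension
        b, gc, h, w = y.shape
        y = y.view(b, self.groups, gc // self.groups, h, w).transpose(1, 2).reshape(b, gc, h, w)
        return self.merge(y)                                  # merge cardinality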

DownC module

The DownC module is assembled from three basic structures: a 1×1 point conv, a 3×3 standard conv, and a MaxPool layer for the MP operation. One branch applies the point conv followed by the 3×3 conv with stride k to downsample, the other branch applies MaxPool followed by a point conv, and the two branches are concatenated to form the DownC module.

class DownC(nn.Module):
    # DownC: downsampling block that combines a strided-convolution branch with a
    # MaxPool branch, each branch producing half of the output channels
    def __init__(self, c1, c2, n=1, k=2):
        super(DownC, self).__init__()
        c_ = int(c1)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)        # 1x1 point conv before the strided conv
        self.cv2 = Conv(c_, c2 // 2, 3, k)   # 3x3 standard conv with stride k (downsampling)
        self.cv3 = Conv(c1, c2 // 2, 1, 1)   # 1x1 point conv after the MaxPool branch
        self.mp = nn.MaxPool2d(kernel_size=k, stride=k)

    def forward(self, x):
        # concatenate the conv-downsampled branch and the pooled branch along channels
        return torch.cat((self.cv2(self.cv1(x)), self.cv3(self.mp(x))), dim=1)
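
A quick shape check (again assuming Conv from YOLOv7's models/common.py; the sizes are only an example): with the default k=2 the spatial resolution is halved and each branch contributes half of the c2 output channels.

down = DownC(c1=256, c2=512)
x = torch.randn(1, 256, 80, 80)    # example feature map
print(down(x).shape)               # torch.Size([1, 512, 40, 40])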

The information above is compiled from sources on the Internet.

Origin: blog.csdn.net/lijiahao1212/article/details/128159157