Building the YOLOv7 network structure in PyTorch


Foreword

The official YOLOv7 was released not long ago and attracted a lot of attention as another major work from the official authors, so I took some time to study its structure.
The link to the code is as follows: https://github.com/WongKinYiu/yolov7
The link to the paper is as follows: https://arxiv.org/abs/2207.02696


1. Network structure diagram

I referred to two structure diagrams.
The first was found on a WeChat official account. For various reasons it cannot be reproduced here, but it can be viewed at the link below.
From: https://mp.weixin.qq.com/s?__biz=MzU3ODk2Njc5Mg==&mid=2247496582&idx=1&sn=df6ca2fdebd524d2116c97c05c00424e&chksm=fd6ff7e1ca187ef77f4d110dd41c7d220f8bdcb22906f32558c897240bee0c94125272f9ac11&scene=21#wechat_redirect
The second diagram is as follows:
[Figure: YOLOv7 network structure diagram]
From: https://blog.csdn.net/u010899190/article/details/125883770
Combining these two diagrams, I completed a first reproduction of the network structure.

2. Implementation of each module

1. BConv module

The backbone of YOLOv7 starts with several BConv modules, each consisting of Conv + BN + SiLU. The structure diagram is as follows:
[Figure: BConv module structure diagram]
Note that the different colors in the diagram represent different kernel sizes and strides. The implementation is as follows:

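import torch
import torch.nn as nn


class Bconv(nn.Module):
    def __init__(self, ch_in, ch_out, k, s):
        '''
        :param ch_in: number of input channels
        :param ch_out: number of output channels
        :param k: kernel size
        :param s: stride
        '''
        super(Bconv, self).__init__()
        # padding=k//2 keeps the spatial size unchanged when s=1 (for odd k)
        self.conv = nn.Conv2d(ch_in, ch_out, k, s, padding=k // 2)
        self.bn = nn.BatchNorm2d(ch_out)
        self.act = nn.SiLU()

    def forward(self, x):
        '''
        :param x: input tensor of shape (N, ch_in, H, W)
        :return: output tensor with ch_out channels
        '''
        return self.act(self.bn(self.conv(x)))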
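
As a quick check of the padding=k//2 choice, the standard convolution output-size formula floor((n + 2p - k) / s) + 1 can be applied to the four stem layers used later in the backbone. This is only a sketch: the helper name conv_out_size is illustrative, and the 640-pixel input is an assumption that matches the test at the end of this article.

```python
def conv_out_size(n, k, s, p):
    """Spatial output size of a convolution (dilation 1) on an input of size n."""
    return (n + 2 * p - k) // s + 1


# (kernel, stride) of the four stem Bconv layers; padding is k // 2 as in Bconv
stem = [(3, 1), (3, 2), (3, 1), (3, 2)]

size = 640  # assumed input resolution
sizes = []
for k, s in stem:
    size = conv_out_size(size, k, s, k // 2)
    sizes.append(size)

print(sizes)  # [640, 320, 320, 160]
```

The two stride-2 layers each halve the resolution, so the stem reduces a 640*640 image to 160*160.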

2. E-ELAN module

This module leaves the height and width of the feature map unchanged and doubles the number of channels. Note that the last E-ELAN of the backbone outputs 1024 channels rather than 2048 (as in the official code). The structure diagram is as follows:
[Figure: E-ELAN module structure diagram]

The code is as follows:

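class E_ELAN(nn.Module):
    def __init__(self, ch_in, ch_out, flg=False):
        '''
        :param ch_in: number of input channels
        :param ch_out: number of output channels of the intermediate layers
        :param flg: True for the last E-ELAN of the backbone, whose output channel count differs
        '''
        super(E_ELAN, self).__init__()
        # branch 1: a single 1x1 convolution
        self.conv1 = Bconv(ch_in, ch_out, k=1, s=1)
        # branch 2: a 1x1 convolution followed by four 3x3 convolutions.
        # Each layer gets its own weights; reusing one module would share
        # parameters and make the two 1x1 outputs identical.
        self.conv2_1 = Bconv(ch_in, ch_out, k=1, s=1)
        self.conv2_2 = Bconv(ch_out, ch_out, k=3, s=1)
        self.conv2_3 = Bconv(ch_out, ch_out, k=3, s=1)
        self.conv2_4 = Bconv(ch_out, ch_out, k=3, s=1)
        self.conv2_5 = Bconv(ch_out, ch_out, k=3, s=1)

        # convolution after the concat of four ch_out-channel feature maps
        if flg:
            self.conv3 = Bconv(4 * ch_out, ch_in, k=1, s=1)
        else:
            self.conv3 = Bconv(4 * ch_out, 2 * ch_in, k=1, s=1)

    def forward(self, x):
        '''
        :param x: input tensor
        :return: tensor with 2 * ch_in channels (ch_in when flg is True)
        '''
        # branch 1 output
        output1 = self.conv1(x)

        # branch 2 outputs
        output2_1 = self.conv2_1(x)
        output2_2 = self.conv2_2(output2_1)
        output2_3 = self.conv2_3(output2_2)
        output2_4 = self.conv2_4(output2_3)
        output2_5 = self.conv2_5(output2_4)
        output_cat = torch.cat((output1, output2_1, output2_3, output2_5), dim=1)
        return self.conv3(output_cat)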
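
To make the channel arithmetic of this module concrete, here is a small bookkeeping sketch in plain Python. The function name e_elan_channels is only illustrative, and it assumes the intermediate width is half the input width (ch_out = ch_in // 2), which is how the module is used in the backbone later in this article.

```python
def e_elan_channels(ch_in, ch_mid, last=False):
    """Return (concat_channels, output_channels) for one E-ELAN block.

    Four feature maps with ch_mid channels each are concatenated, then a
    1x1 convolution maps them to 2 * ch_in channels (ch_in for the last block).
    """
    concat = 4 * ch_mid
    out = ch_in if last else 2 * ch_in
    return concat, out


# (ch_in, ch_mid, last) for the four E-ELAN blocks of the backbone
blocks = [(128, 64, False), (256, 128, False), (512, 256, False), (1024, 512, True)]
for ch_in, ch_mid, last in blocks:
    print(ch_in, e_elan_channels(ch_in, ch_mid, last))
# 128 (256, 256)
# 256 (512, 512)
# 512 (1024, 1024)
# 1024 (2048, 1024)
```

Only the last block reduces the channel count after the concat (2048 to 1024); the other three keep it.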

3. MPConv module

Nothing in this module needs special attention. The structure diagram is as follows:
[Figure: MPConv module structure diagram]
The code is as follows:

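class MPConv(nn.Module):
    def __init__(self, ch_in, ch_out):
        '''
        :param ch_in: number of input channels
        :param ch_out: number of output channels of each branch (the module outputs 2 * ch_out)
        '''
        super(MPConv, self).__init__()
        # branch 1: max pooling followed by a 1x1 convolution
        self.conv1 = nn.Sequential(
            nn.MaxPool2d(2, 2),
            Bconv(ch_in, ch_out, 1, 1),
        )
        # branch 2: 1x1 convolution followed by a stride-2 3x3 convolution
        self.conv2 = nn.Sequential(
            Bconv(ch_in, ch_out, 1, 1),
            Bconv(ch_out, ch_out, 3, 2),
        )

    def forward(self, x):
        # branch 1 output
        output1 = self.conv1(x)

        # branch 2 output
        output2 = self.conv2(x)
        return torch.cat((output1, output2), dim=1)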
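
The concat at the end of MPConv only works if both branches produce the same spatial size. The sketch below checks this with bare layers standing in for the two downsampling steps (BN and SiLU do not change shapes); the 8-channel toy tensors are an assumption made for illustration.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)                                   # downsampling step of branch 1
conv = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1)  # downsampling step of branch 2

for size in (80, 81):
    x = torch.randn(1, 8, size, size)
    # prints the input size, then the output size of each branch
    print(size, pool(x).shape[-1], conv(x).shape[-1])
# 80 40 40
# 81 40 41
```

For even sizes the two branches agree, but for odd sizes they differ by one pixel and torch.cat would fail. This is one reason to keep the input resolution a multiple of 32.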

4. SPPCSPC module

This module modifies the traditional SPP structure. The modified structure diagram is as follows:
[Figure: SPPCSPC module structure diagram]
The code is as follows:

class SppCSPC(nn.Module):
    def __init__(self, ch_in, ch_out):
        '''
        :param ch_in: number of input channels
        :param ch_out: number of output channels
        '''
        super(SppCSPC, self).__init__()
        # branch 1
        self.conv1 = nn.Sequential(
            Bconv(ch_in, ch_out, 1, 1),
            Bconv(ch_out, ch_out, 3, 1),
            Bconv(ch_out, ch_out, 1, 1)
        )
        # branch 2 (SPP): stride 1 and padding k//2 keep the spatial size
        self.mp1 = nn.MaxPool2d(5, 1, 5 // 2)    # pooling with kernel size 5
        self.mp2 = nn.MaxPool2d(9, 1, 9 // 2)    # pooling with kernel size 9
        self.mp3 = nn.MaxPool2d(13, 1, 13 // 2)  # pooling with kernel size 13

        # convolutions after the concat
        self.conv1_2 = nn.Sequential(
            Bconv(4 * ch_out, ch_out, 1, 1),
            Bconv(ch_out, ch_out, 3, 1)
        )

        # branch 3
        self.conv3 = Bconv(ch_in, ch_out, 1, 1)

        # final convolution of this module
        self.conv4 = Bconv(2 * ch_out, ch_out, 1, 1)

    def forward(self, x):
        # branch 1 output
        output1 = self.conv1(x)

        # outputs of the pooling layers in branch 2
        mp_output1 = self.mp1(output1)
        mp_output2 = self.mp2(output1)
        mp_output3 = self.mp3(output1)

        # concatenate them with the unpooled features and convolve
        result1 = self.conv1_2(torch.cat((output1, mp_output1, mp_output2, mp_output3), dim=1))

        # branch 3
        result2 = self.conv3(x)

        return self.conv4(torch.cat((result1, result2), dim=1))
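
The SPP part relies on each pooling layer preserving the spatial size, so that the pooled maps can be concatenated with the unpooled one. A minimal check, assuming a toy 16-channel 20*20 feature map:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 20, 20)  # assumed toy feature map

# stride 1 and padding k // 2 keep height and width unchanged for odd k
pools = [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)]
outs = [x] + [p(x) for p in pools]
y = torch.cat(outs, dim=1)

print([tuple(o.shape) for o in outs])  # four tensors of shape (1, 16, 20, 20)
print(tuple(y.shape))                  # (1, 64, 20, 20)
```

The concat therefore has four times the channels of its input, which is why the convolution after it takes 4*ch_out input channels.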

5. CatConv module

The name is taken directly from the reference article. This structure is similar to the E-ELAN structure described above, but the concat part is different. The structure diagram is as follows:
[Figure: CatConv module structure diagram]

The code is as follows:

class CatConv(nn.Module):
    def __init__(self, ch_in, ch_out):
        '''
        :param ch_in: number of input channels
        :param ch_out: base channel count; the concatenated output has 4 * ch_out channels
        '''
        super(CatConv, self).__init__()
        c_ = ch_out // 2  # hidden channels
        # branch 1
        self.conv1 = Bconv(ch_in, ch_out, 1, 1)

        # branch 2
        self.conv2 = Bconv(ch_in, ch_out, 1, 1)
        self.conv3 = Bconv(ch_out, c_, 3, 1)
        self.conv4 = Bconv(c_, c_, 3, 1)
        self.conv5 = Bconv(c_, c_, 3, 1)
        self.conv6 = Bconv(c_, c_, 3, 1)

    def forward(self, x):
        conv1 = self.conv1(x)

        conv2 = self.conv2(x)
        conv3 = self.conv3(conv2)
        conv4 = self.conv4(conv3)
        conv5 = self.conv5(conv4)
        conv6 = self.conv6(conv5)
        # all six feature maps are concatenated along the channel dimension
        return torch.cat((conv1, conv2, conv3, conv4, conv5, conv6), dim=1)
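
Because CatConv returns the raw concat, its output is wider than ch_out. The short sketch below counts the channels; catconv_out_channels is a hypothetical helper name used only for this illustration.

```python
def catconv_out_channels(ch_out):
    """Channels after the concat in CatConv: two ch_out maps plus four ch_out // 2 maps."""
    c_ = ch_out // 2
    return 2 * ch_out + 4 * c_


for ch_out in (128, 256, 512):
    print(ch_out, catconv_out_channels(ch_out))
# 128 512
# 256 1024
# 512 2048
```

These totals are the input widths of the 1*1 Bconv layers that follow each CatConv in the head (512, 1024 and 2048).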

6. RepConv module

Note that during training, when the input and output channel counts are equal, a BN-only (identity) branch is added alongside the 3*3 and 1*1 convolution branches, and the output is the sum of the three. For deployment, the branches are replaced by a single 3*3 convolution. The structure diagram is as follows:
[Figure: RepConv module structure diagram]

The code here implements only the training-time structure:

class RepConv(nn.Module):
    def __init__(self, ch_in, ch_out, s=1):
        '''
        :param ch_in: number of input channels
        :param ch_out: number of output channels
        :param s: stride of the convolutions
        '''
        super(RepConv, self).__init__()
        self.ch_out = ch_out
        # 3x3 convolution branch (the stride s is now actually applied)
        self.conv1 = nn.Sequential(
            nn.Conv2d(ch_in, ch_out, 3, s, padding=3 // 2),
            nn.BatchNorm2d(ch_out)
        )
        # 1x1 convolution branch
        self.conv2 = nn.Sequential(
            nn.Conv2d(ch_in, ch_out, 1, s, padding=0),
            nn.BatchNorm2d(ch_out)
        )
        # identity (BN only) branch, valid only when the shape is preserved
        if ch_in == ch_out and s == 1:
            self.bn = nn.BatchNorm2d(self.ch_out)
        else:
            self.bn = None

    def forward(self, x):
        output1 = self.conv1(x)
        output2 = self.conv2(x)
        if self.bn is None:
            output3 = 0
        else:
            output3 = self.bn(x)
        return output1 + output2 + output3
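
For the deployment phase, the branches can be merged into one 3*3 convolution. The following is a minimal sketch of the usual re-parameterization idea (fold each BN into its convolution, pad the 1*1 kernel to 3*3, express the identity as a 3*3 kernel, then add everything up). It is not taken from the official YOLOv7 code; the function names fuse_conv_bn and fuse_repconv are illustrative, and the layers are passed in separately so that the example is self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_conv_bn(weight, bias, bn):
    """Fold an eval-mode BatchNorm2d into the preceding convolution's weight and bias."""
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    return weight * scale.reshape(-1, 1, 1, 1), bn.bias + (bias - bn.running_mean) * scale


@torch.no_grad()
def fuse_repconv(conv3, bn3, conv1, bn1, bn_id=None):
    """Merge the 3x3, 1x1 and optional identity-BN branches into one 3x3 conv."""
    w3, b3 = fuse_conv_bn(conv3.weight, conv3.bias, bn3)
    w1, b1 = fuse_conv_bn(conv1.weight, conv1.bias, bn1)
    w = w3 + F.pad(w1, [1, 1, 1, 1])  # a 1x1 kernel is a 3x3 kernel that is zero off-centre
    b = b3 + b1
    if bn_id is not None:  # requires in_channels == out_channels and stride 1
        ch = conv3.out_channels
        w_id = torch.zeros_like(w3)
        w_id[torch.arange(ch), torch.arange(ch), 1, 1] = 1.0  # identity as a 3x3 kernel
        wi, bi = fuse_conv_bn(w_id, torch.zeros(ch), bn_id)
        w, b = w + wi, b + bi
    fused = nn.Conv2d(conv3.in_channels, conv3.out_channels, 3, conv3.stride, padding=1)
    fused.weight.copy_(w)
    fused.bias.copy_(b)
    return fused


torch.manual_seed(0)
ch = 8
conv3, bn3 = nn.Conv2d(ch, ch, 3, 1, padding=1), nn.BatchNorm2d(ch)
conv1, bn1 = nn.Conv2d(ch, ch, 1, 1, padding=0), nn.BatchNorm2d(ch)
bn_id = nn.BatchNorm2d(ch)
for bn in (bn3, bn1, bn_id):
    # give each BN non-trivial statistics and parameters, then freeze it
    bn.running_mean.normal_()
    bn.running_var.uniform_(0.5, 1.5)
    nn.init.uniform_(bn.weight, 0.5, 1.5)
    nn.init.normal_(bn.bias)
    bn.eval()

x = torch.randn(2, ch, 16, 16)
fused = fuse_repconv(conv3, bn3, conv1, bn1, bn_id)
with torch.no_grad():
    ref = bn3(conv3(x)) + bn1(conv1(x)) + bn_id(x)  # three-branch training structure
    out = fused(x)                                  # single fused convolution
print(torch.allclose(ref, out, atol=1e-4))
```

The fusion is only exact when the BN layers are in eval mode, because it uses their running statistics rather than batch statistics.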

3. Overall implementation

After each RepConv, a 1*1 convolutional layer is added to reduce the channel count to the output dimension. The full network is as follows:

class YoloV7(nn.Module):
    def __init__(self,ch_in=3,cl=85):
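        '''
        :param ch_in: number of input channels
        :param cl: number of channels predicted by each head (default 85, i.e. 80 classes + 4 box values + 1 objectness)
        '''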
        super(YoloV7, self).__init__()
        #backbone
        self.bconv1=nn.Sequential(
            Bconv(ch_in,32,3,1),
            Bconv(32,64,3,2),
            Bconv(64,64,3,1),
            Bconv(64,128,3,2)
        )
        self.e_elan1=nn.Sequential(
            E_ELAN(128,64),
            )
        self.mpconv1=MPConv(256,128)
        self.e_elan2=E_ELAN(256,128)

        self.mpconv2=MPConv(512,256)
        self.e_elan3=E_ELAN(512,256)


        self.mpconv3=MPConv(1024,512)
        self.e_elan4=E_ELAN(1024,512,flg=True)



        #head
        self.sppcsp=SppCSPC(1024,512)

        self.bconv2=nn.Sequential(
            Bconv(512,256,1,1),
            nn.Upsample(None, 2, "nearest")  # upsample by a factor of 2
        )
        self.bconv3=Bconv(1024,256,1,1)
        self.catconv1=CatConv(512,256)
        self.bconv4=Bconv(1024,256,1,1)

        self.bconv5=nn.Sequential(
            Bconv(256,128,1,1),
            nn.Upsample(None,2,"nearest") # upsample by a factor of 2
        )
        self.bconv6=Bconv(512,128,1,1)
        self.catconv2=CatConv(256,128)
        self.bconv7=Bconv(512,128,1,1)
        self.rep1=RepConv(128,256)
        self.head1=nn.Conv2d(256,cl,1)

        self.mpconv4=MPConv(128,128)
        self.catconv3=CatConv(512,256)
        self.bconv8=Bconv(1024,256,1,1)
        self.rep2=RepConv(256,512)
        self.head2=nn.Conv2d(512,cl,1)

        self.mpconv5=MPConv(256,256)
        self.catconv4=CatConv(1024,512)
        self.bconv9=Bconv(2048,512,1,1)
        self.rep3=RepConv(512,1024)
        self.head3=nn.Conv2d(1024,cl,1)

    def forward(self,x):
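        '''
        :param x: input image tensor of shape (N, ch_in, H, W)
        :return: the three head outputs, from the highest to the lowest resolution
        '''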
        output1_0=self.bconv1(x)
        output1_1=self.e_elan1(output1_0)
        output1_2=self.mpconv1(output1_1)
        result1=self.e_elan2(output1_2)
        print(result1.shape)

        output2_0=self.mpconv2(result1)
        result2=self.e_elan3(output2_0)
        print(result2.shape)

        output3_0=self.mpconv3(result2)
        result3=self.e_elan4(output3_0)
        print(result3.shape)
        #head
        spp_output=self.sppcsp(result3)

        bconv2_output=self.bconv2(spp_output)
        bconv3_output=torch.cat((self.bconv3(result2),bconv2_output),dim=1)
        catconv1_output=self.catconv1(bconv3_output)
        bconv4_output=self.bconv4(catconv1_output)

        bconv5_output=self.bconv5(bconv4_output)
        bconv6_output=torch.cat((self.bconv6(result1),bconv5_output),dim=1)
        catconv2_output=self.catconv2(bconv6_output)
        bconv7_output=self.bconv7(catconv2_output)
        rep1_output=self.rep1(bconv7_output)
        head1=self.head1(rep1_output)
        print(head1.shape)

        mpconv4_output=torch.cat((self.mpconv4(bconv7_output),bconv4_output),dim=1)
        catconv3_output=self.catconv3(mpconv4_output)
        bconv8_output=self.bconv8(catconv3_output)
        rep2_output=self.rep2(bconv8_output)
        head2=self.head2(rep2_output)
        print(head2.shape)

        mpconv5_output=torch.cat((self.mpconv5(bconv8_output),spp_output),dim=1)
        catconv4_output=self.catconv4(mpconv5_output)
        bconv9_output=self.bconv9(catconv4_output)
        rep3_output=self.rep3(bconv9_output)
        head3=self.head3(rep3_output)
        print(head3.shape)
        return head1,head2,head3
        
if __name__ == '__main__':
    # random input; torch.Tensor(1,3,640,640) would return uninitialized memory
    x=torch.randn(1,3,640,640)
    model=YoloV7(3)
    result=model(x)

The output dimensions are as follows:
[Figure: screenshot of the printed output shapes]
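
The head shapes can also be derived by hand. The stem downsamples by 4 and each backbone MPConv by 2, so the three heads sit at strides 8, 16 and 32. The sketch below assumes an input side that is a multiple of 32 and the default of 85 output channels; head_shapes is a hypothetical helper, not part of the model.

```python
def head_shapes(size=640, channels=85, batch=1):
    """Shapes of the three head outputs, which sit at strides 8, 16 and 32."""
    return [(batch, channels, size // s, size // s) for s in (8, 16, 32)]


for shape in head_shapes():
    print(shape)
# (1, 85, 80, 80)
# (1, 85, 40, 40)
# (1, 85, 20, 20)
```

By the same reasoning, the three backbone feature maps printed before the heads have shapes (1, 512, 80, 80), (1, 1024, 40, 40) and (1, 1024, 20, 20).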


Summary

That is all for this article. If you find any mistakes, corrections in the comments are welcome, or join the QQ group 995760755 to discuss.

Origin blog.csdn.net/qq_55068938/article/details/125981242