Darknet-19 and Darknet-53

Darknet is a classic family of deep networks. By borrowing the residual design of ResNet, it preserves strong feature expressiveness while avoiding the gradient problems caused by very deep networks. The two main variants are Darknet-19 and Darknet-53.

Introduction: why study this

The backbone that YOLO v3 uses to extract features is Darknet-53, which builds on the Darknet-19 structure from YOLO v2, as the names suggest. Unlike Darknet-19, Darknet-53 introduces a large number of residual structures and uses 3×3 convolutional layers (Conv2D) with a stride of 2 in place of the max-pooling layers (MaxPooling2D). Judged by its classification performance on ImageNet, Darknet-53 preserves accuracy while keeping the network fast after these changes, which demonstrates its effectiveness at feature extraction.
[Figure: Darknet-53 network structure]
As the figure shows, the network stacks a large number of residual structures, and a convolutional layer with a stride of 2 and a 3×3 kernel is inserted between every two residual stages to perform downsampling. In the source code, the input size of the Darknet-53 network is 416×416, and the final convolutional layer outputs a feature map of size 13×13 with 1024 channels. For a classification task, the last residual structure is followed by a global average pooling layer (Global Avgpool), a fully connected layer (Connected) with 1000 neurons, and a Softmax activation layer. In YOLO v3, however, Darknet-53 is used only to extract features, so these last three layers are removed (the network acts purely as a backbone, with the neck and head attached instead), and it outputs feature maps at three different scales: 13×13, 26×26, and 52×52, corresponding to total strides of 32, 16, and 8 (416/32 = 13, 416/16 = 26, 416/8 = 52).

Network structure of residual units in Darknet-53

[Figure: structure of the residual unit in Darknet-53]

Basic introduction

The Darknet-53 network is a deep convolutional neural network commonly used in object detection, image recognition, and other computer vision tasks. It is designed to detect objects and patterns even in difficult images, such as those taken in low-light or low-contrast environments where simpler recognition networks struggle. With 53 convolutional layers, it can learn and recognize image features at different scales and levels of abstraction, and its architecture enables fast and accurate image processing, making it a powerful tool for a wide range of image and video analysis applications. Darknet-53 has gained popularity in recent years for its accuracy, efficiency, and generality.

Network design

Network structure design concept
Darknet-53 adds a large number of residual structures (Residual) on top of Darknet-19, and replaces the max-pooling layers (MaxPooling2D) with 3×3 convolutional layers (Conv2D) with a stride of 2. Why did the author make these two improvements?

Residual structure

First, the purpose of adding the residual structure is to increase the depth of the network so that it can extract higher-level semantic features, while the residual connections help us avoid vanishing or exploding gradients. The shortcut shows up directly in backpropagation: for a residual block y = x + F(x), the gradient ∂y/∂x = 1 + ∂F/∂x, so the identity term lets the gradient flow to layers far earlier in the network and weakens the multiplicative chain of the backward pass.
Second, notice that the input of the residual unit first passes through a 1×1 convolution that halves the number of channels, and only then through a 3×3 convolution. This reduces the amount of computation considerably, making the network run faster and more efficiently.
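
As a back-of-the-envelope check (my own sketch, not from the original post): for C input channels, a plain 3×3 convolution from C to C channels has 9C² weights, while the bottleneck of a 1×1 convolution (C to C/2) followed by a 3×3 convolution (C/2 back to C) has C²/2 + 9C²/2 = 5C², roughly 45% fewer parameters and multiply-accumulates at the same spatial size. The snippet below confirms the counts with PyTorch:

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

C = 256
plain = nn.Conv2d(C, C, 3, padding=1, bias=False)        # 9*C*C = 589824 weights
bottleneck = nn.Sequential(
    nn.Conv2d(C, C // 2, 1, bias=False),                 # C*C/2 = 32768 weights
    nn.Conv2d(C // 2, C, 3, padding=1, bias=False),      # 9*(C/2)*C = 294912 weights
)
print(n_params(plain), n_params(bottleneck))             # 589824 327680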

Convolution with a stride of 2 replaces the pooling layer

Functionally, a convolution with a stride of 2 can replace the pooling layer for downsampling. In fact, pooling layers are now relatively rare in neural networks, and other downsampling methods, such as stride-2 convolution, have been tried instead. So why make this replacement? (See the discussion "Why does CNN not need pooling layer downsampling?".) My personal understanding of the difference is this: the pooling layer is a prior, hand-designed downsampling rule (max pooling selects the largest value in its window, on the assumption that the maximum carries the most information), whereas the parameters of a stride-2 convolutional layer are learned, so the sampling rule is not fixed in advance. This flexibility increases the learning capacity of the network, as the small comparison below illustrates.
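
A minimal comparison (my own illustration, not from the original post): both operations halve the spatial resolution, but max pooling has no parameters while the stride-2 convolution learns its downsampling rule.

import torch
import torch.nn as nn

x = torch.randn(1, 32, 416, 416)

pool = nn.MaxPool2d((2, 2), 2)                                 # fixed rule, no parameters
conv = nn.Conv2d(32, 32, 3, stride=2, padding=1, bias=False)   # learned downsampling

print(pool(x).shape)                                # torch.Size([1, 32, 208, 208])
print(conv(x).shape)                                # torch.Size([1, 32, 208, 208])
print(sum(p.numel() for p in conv.parameters()))    # 32*32*3*3 = 9216 learnable weights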

Code

import torch.nn as nn

# basic Conv2d + BatchNorm2d + LeakyReLU building block
class Conv_BN_LeakyReLU(nn.Module):
    def __init__(self, in_channels, out_channels, ksize, padding=0, dilation=1):
        super(Conv_BN_LeakyReLU, self).__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, ksize, padding=padding, dilation=dilation),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.1, inplace=True)
        )

    def forward(self, x):
        return self.convs(x)

class DarkNet_19(nn.Module):
    def __init__(self, num_classes=1000):
        print("Initializing the darknet19 network ......")
        
        super(DarkNet_19, self).__init__()
        # backbone network : DarkNet-19
        # output : stride = 2, c = 32
        self.conv_1 = nn.Sequential(
            Conv_BN_LeakyReLU(3, 32, 3, 1),
            nn.MaxPool2d((2,2), 2),
        )

        # output : stride = 4, c = 64
        self.conv_2 = nn.Sequential(
            Conv_BN_LeakyReLU(32, 64, 3, 1),
            nn.MaxPool2d((2,2), 2)
        )

        # output : stride = 8, c = 128
        self.conv_3 = nn.Sequential(
            Conv_BN_LeakyReLU(64, 128, 3, 1),
            Conv_BN_LeakyReLU(128, 64, 1),
            Conv_BN_LeakyReLU(64, 128, 3, 1),
            nn.MaxPool2d((2,2), 2)
        )

        # output : stride = 16, c = 256
        self.conv_4 = nn.Sequential(
            Conv_BN_LeakyReLU(128, 256, 3, 1),
            Conv_BN_LeakyReLU(256, 128, 1),
            Conv_BN_LeakyReLU(128, 256, 3, 1),
            nn.MaxPool2d((2,2), 2)
        )

        # output : stride = 32, c = 512
        self.conv_5 = nn.Sequential(
            Conv_BN_LeakyReLU(256, 512, 3, 1),
            Conv_BN_LeakyReLU(512, 256, 1),
            Conv_BN_LeakyReLU(256, 512, 3, 1),
            Conv_BN_LeakyReLU(512, 256, 1),
            Conv_BN_LeakyReLU(256, 512, 3, 1),
            nn.MaxPool2d((2,2), 2)
        )

        # output : stride = 32, c = 1024
        self.conv_6 = nn.Sequential(
            Conv_BN_LeakyReLU(512, 1024, 3, 1),
            Conv_BN_LeakyReLU(1024, 512, 1),
            Conv_BN_LeakyReLU(512, 1024, 3, 1),
            Conv_BN_LeakyReLU(1024, 512, 1),
            Conv_BN_LeakyReLU(512, 1024, 3, 1)
        )

        self.conv_7 = nn.Conv2d(1024, num_classes, 1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, x):
        x = self.conv_1(x)
        x = self.conv_2(x)
        x = self.conv_3(x)
        x = self.conv_4(x)
        x = self.conv_5(x)
        x = self.conv_6(x)

        x = self.conv_7(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        return x
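
A quick sanity check of my own (not part of the original post): push a dummy batch through DarkNet_19 and confirm that the classifier returns one logit per class.

import torch

model = DarkNet_19(num_classes=1000)
x = torch.randn(2, 3, 224, 224)      # dummy batch of two RGB images
logits = model(x)
print(logits.shape)                  # torch.Size([2, 1000])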

The code for Darknet-53:

import torch.nn as nn

class Conv_BN_LeakyReLU(nn.Module):  # the module labeled DBL in the figure
    def __init__(self, in_channels, out_channels, ksize, padding=0, stride=1, dilation=1):
        super(Conv_BN_LeakyReLU, self).__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, ksize, padding=padding, stride=stride, dilation=dilation),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.1, inplace=True)
        )

    def forward(self, x):
        return self.convs(x)

class resblock(nn.Module):  # the module labeled Res_unit in the figure
    def __init__(self, ch, nblocks=1):
        super().__init__()
        self.module_list = nn.ModuleList()
        for _ in range(nblocks):
            resblock_one = nn.Sequential(
                Conv_BN_LeakyReLU(ch, ch//2, 1),
                Conv_BN_LeakyReLU(ch//2, ch, 3, padding=1)
            )
            self.module_list.append(resblock_one)

    def forward(self, x):
        for module in self.module_list:
            x = module(x) + x
        return x

# darknet-53 code
class DarkNet_53(nn.Module):
    def __init__(self, num_classes=1000):
        super(DarkNet_53, self).__init__()
        # stride = 2
        self.layer_1 = nn.Sequential(
            Conv_BN_LeakyReLU(3, 32, 3, padding=1),
            Conv_BN_LeakyReLU(32, 64, 3, padding=1, stride=2),
            resblock(64, nblocks=1)
        )
        # stride = 4
        self.layer_2 = nn.Sequential(
            Conv_BN_LeakyReLU(64, 128, 3, padding=1, stride=2),
            resblock(128, nblocks=2)
        )
        # stride = 8
        self.layer_3 = nn.Sequential(
            Conv_BN_LeakyReLU(128, 256, 3, padding=1, stride=2),
            resblock(256, nblocks=8)
        )
        # stride = 16
        self.layer_4 = nn.Sequential(
            Conv_BN_LeakyReLU(256, 512, 3, padding=1, stride=2),
            resblock(512, nblocks=8)
        )
        # stride = 32
        self.layer_5 = nn.Sequential(
            Conv_BN_LeakyReLU(512, 1024, 3, padding=1, stride=2),
            resblock(1024, nblocks=4)
        )

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        x = self.layer_4(x)
        x = self.layer_5(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x
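
A usage sketch of my own for the classification variant (assumed, not from the original post): with a 416×416 input, the features reach 13×13×1024 before global pooling, and the head returns one logit per class.

import torch

model = DarkNet_53(num_classes=1000)
x = torch.randn(1, 3, 416, 416)
print(model(x).shape)                # torch.Size([1, 1000])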


PyTorch implementation of the structure

Quoted from this post

import math
from collections import OrderedDict

import torch.nn as nn


#---------------------------------------------------------------------#
#   Residual block:
#   a 1x1 convolution reduces the number of channels, then a 3x3
#   convolution extracts features and restores the channel count;
#   finally the shortcut (residual) connection is added.
#---------------------------------------------------------------------#
class BasicBlock(nn.Module):
    def __init__(self, inplanes, planes):
        super(BasicBlock, self).__init__()
        self.conv1  = nn.Conv2d(inplanes, planes[0], kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1    = nn.BatchNorm2d(planes[0])
        self.relu1  = nn.LeakyReLU(0.1)
        
        self.conv2  = nn.Conv2d(planes[0], planes[1], kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2    = nn.BatchNorm2d(planes[1])
        self.relu2  = nn.LeakyReLU(0.1)

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu2(out)

        out += residual
        return out

class DarkNet(nn.Module):
    def __init__(self, layers):
        super(DarkNet, self).__init__()
        self.inplanes = 32
        # 416,416,3 -> 416,416,32
        self.conv1  = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1    = nn.BatchNorm2d(self.inplanes)
        self.relu1  = nn.LeakyReLU(0.1)

        # 416,416,32 -> 208,208,64
        self.layer1 = self._make_layer([32, 64], layers[0])
        # 208,208,64 -> 104,104,128
        self.layer2 = self._make_layer([64, 128], layers[1])
        # 104,104,128 -> 52,52,256
        self.layer3 = self._make_layer([128, 256], layers[2])
        # 52,52,256 -> 26,26,512
        self.layer4 = self._make_layer([256, 512], layers[3])
        # 26,26,512 -> 13,13,1024
        self.layer5 = self._make_layer([512, 1024], layers[4])

        self.layers_out_filters = [64, 128, 256, 512, 1024]

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    #---------------------------------------------------------------------#
    #   In each layer, first downsample with a 3x3 convolution of stride 2,
    #   then stack the residual blocks.
    #---------------------------------------------------------------------#
    def _make_layer(self, planes, blocks):
        layers = []
        # downsampling: stride 2, kernel size 3
        layers.append(("ds_conv", nn.Conv2d(self.inplanes, planes[1], kernel_size=3, stride=2, padding=1, bias=False)))
        layers.append(("ds_bn", nn.BatchNorm2d(planes[1])))
        layers.append(("ds_relu", nn.LeakyReLU(0.1)))
        # stack the residual blocks
        self.inplanes = planes[1]
        for i in range(0, blocks):
            layers.append(("residual_{}".format(i), BasicBlock(self.inplanes, planes)))
        return nn.Sequential(OrderedDict(layers))

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)

        x = self.layer1(x)
        x = self.layer2(x)
        out3 = self.layer3(x)
        out4 = self.layer4(out3)
        out5 = self.layer5(out4)

        return out3, out4, out5

def darknet53():
    model = DarkNet([1, 2, 8, 8, 4])
    return model
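
Finally, a usage sketch of my own (not from the quoted post): for a 416×416 input, this backbone returns the three multi-scale feature maps that YOLO v3's neck and head consume.

import torch

model = darknet53()
x = torch.randn(1, 3, 416, 416)
out3, out4, out5 = model(x)
print(out3.shape)    # torch.Size([1, 256, 52, 52])    stride 8
print(out4.shape)    # torch.Size([1, 512, 26, 26])    stride 16
print(out5.shape)    # torch.Size([1, 1024, 13, 13])   stride 32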


Original post: blog.csdn.net/weixin_42001184/article/details/129866875