Artificial intelligence (PyTorch) model building, part 19: building the DarkNet object detection model step by step with the PyTorch framework and displaying its network structure

Hello everyone, I am Wei Xue AI. In this article, part 19 of the artificial intelligence (PyTorch) model-building series, I build the DarkNet object detection model step by step with the PyTorch framework and display its network structure. As deep learning has developed, many convolutional neural network architectures have appeared. Among them, DarkNet is widely used in computer vision as a fast and accurate backbone for object detection. This article introduces the architecture and principles of the DarkNet model, then defines the network in PyTorch and walks through an example training loop.

1. DarkNet model architecture and principles

1.1 Introduction to DarkNet

DarkNet is an open source neural network framework created by Joseph Redmon. It is implemented in C and CUDA and supports CPU and GPU computing. This framework is very lightweight and easy to use.

1.2 DarkNet model architecture

In deep learning, the term "Darknet" comes up often. Here it refers not to the framework itself but to the backbone network used by the YOLO series of object detection algorithms, which is the subject of this article.

This article covers Darknet53, a deep convolutional neural network used for image recognition and object detection. It is the backbone feature extraction network introduced with YOLOv3.

The Darknet53 network uses 53 convolutional layers. Instead of pooling layers, it downsamples the feature map with convolutions of stride 2. Because the downsampling filters are learned, this design is intended to retain more of the detail in the image than a fixed pooling rule would.
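
To make the stride-2 downsampling concrete, here is a small arithmetic sketch. It assumes 3×3 kernels with padding 1 and a 256×256 input, which are the settings used by the model code in section 3.2; the helper name conv_output_size is only illustrative.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution (dilation 1, floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 256
sizes = [size]
for _ in range(5):  # five stride-2 downsampling convolutions
    size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [256, 128, 64, 32, 16, 8]
```

Each stride-2 convolution halves the height and width, so five of them reduce the input by a factor of 32.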

The input to Darknet53 is a raw image. A series of convolution, batch normalization, and activation operations gradually extracts more abstract features. The network mainly uses 3×3 and 1×1 convolution kernels, with LeakyReLU activations providing the nonlinearity. When Darknet53 serves as the YOLOv3 backbone, feature maps taken at several depths are passed to the detection head, which makes predictions at different scales.
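
As a rough illustration of the convolution, batch normalization, and LeakyReLU pattern described above, the following minimal sketch stacks a 1×1 and a 3×3 unit and adds a residual connection. The helper name conv_bn_leaky and the LeakyReLU slope of 0.1 are assumptions made for this sketch (0.1 is the value commonly used in YOLO configurations); the code in section 3.2 uses F.leaky_relu with PyTorch's default slope of 0.01.

```python
import torch
import torch.nn as nn


def conv_bn_leaky(in_channels, out_channels, kernel_size):
    """Conv2d -> BatchNorm2d -> LeakyReLU, padded so the spatial size is preserved."""
    padding = kernel_size // 2
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size,
                  stride=1, padding=padding, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.LeakyReLU(0.1),  # assumed slope, see the note above
    )


# The 1x1 convolution halves the channels, the 3x3 convolution restores them.
unit = nn.Sequential(conv_bn_leaky(64, 32, 1), conv_bn_leaky(32, 64, 3))

x = torch.randn(2, 64, 32, 32)
y = x + unit(x)  # residual connection keeps the input shape
print(y.shape)
```

The 1×1 convolution acts as a cheap channel bottleneck before the more expensive 3×3 convolution.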

Darknet53 is deep and downsamples the input five times, so its later layers have a large receptive field while earlier layers keep finer detail, which lets it capture image features at different scales and levels. Combined with a detection head, it can detect and localize multiple objects in an image.

1.3 How DarkNet works

An incoming image first passes through the stacked convolutional layers, which extract feature information and, through the stride-2 convolutions, progressively reduce the spatial resolution. In the classification form of the network defined in section 3.2, the final feature map is averaged by a global pooling layer and passed to a fully connected layer, which outputs the class scores.

2. Application background: Target detection task

As deep learning is adopted across more industries, object detection has become an important research direction in computer vision and is widely used in areas such as autonomous driving and security monitoring.

3. Code practice: Use PyTorch to train and test Darknet53

The following example uses the PyTorch framework to show how to read image samples listed in CSV files, train the DarkNet53 model, and print the loss value.

3.1 Data preparation

First, we prepare the image samples in CSV format. Suppose we already have a directory of CSV files, each listing a set of image paths; the example below reads one of them.

import pandas as pd

# Read the CSV file
data = pd.read_csv("my_csv")
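
The article does not show how the CSV rows become training batches, so here is one possible sketch. Everything specific in it is an assumption: the class name CsvImageDataset, the column names image_path and label, and the use of Pillow and NumPy to load images. The two dummy images written to a temporary directory exist only so that the example runs on its own.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class CsvImageDataset(Dataset):
    """Reads (image path, class label) rows from a CSV file.

    Assumed CSV layout (hypothetical): a header row with the columns
    "image_path" and "label", where label is an integer class index.
    """

    def __init__(self, csv_path, image_size=256):
        self.frame = pd.read_csv(csv_path)
        self.image_size = image_size

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        # Load as RGB and resize to a fixed square so samples can be batched.
        image = Image.open(row["image_path"]).convert("RGB")
        image = image.resize((self.image_size, self.image_size))
        # HWC uint8 -> CHW float32 in [0, 1].
        array = np.asarray(image, dtype=np.float32) / 255.0
        tensor = torch.from_numpy(array).permute(2, 0, 1).contiguous()
        return tensor, int(row["label"])


# Demo data: two dummy images and a CSV that lists them.
tmp_dir = tempfile.mkdtemp()
rows = []
for i in range(2):
    path = os.path.join(tmp_dir, f"img_{i}.png")
    Image.new("RGB", (32, 32), color=(i * 100, 0, 0)).save(path)
    rows.append({"image_path": path, "label": i})
csv_path = os.path.join(tmp_dir, "samples.csv")
pd.DataFrame(rows).to_csv(csv_path, index=False)

train_loader = DataLoader(CsvImageDataset(csv_path, image_size=64),
                          batch_size=2, shuffle=True)
inputs, labels = next(iter(train_loader))
print(inputs.shape, labels.dtype)
```

A loader built this way yields (images, labels) batches in the form that a standard PyTorch classification training loop expects.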

3.2 Model definition

Next I will define the DarkNet model. This is implemented using the PyTorch framework:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

class SE(nn.Module):
    """Squeeze-and-excitation: rescales each channel by a learned weight."""

    def __init__(self, in_chnls, ratio):
        super(SE, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d((1, 1))
        self.compress = nn.Conv2d(in_chnls, in_chnls // ratio, 1, 1, 0)
        self.excitation = nn.Conv2d(in_chnls // ratio, in_chnls, 1, 1, 0)

    def forward(self, x):
        out = self.squeeze(x)       # global average per channel
        out = self.compress(out)
        out = F.relu(out)
        out = self.excitation(out)
        # torch.sigmoid replaces the deprecated F.sigmoid;
        # the result is the rescaled input, not just the channel weights
        return x * torch.sigmoid(out)
    
class BN_Conv2d(nn.Module):
    """Conv2d -> BatchNorm2d -> optional activation (ReLU by default)."""

    # Note: the default activation is one nn.ReLU instance, created when the class
    # is defined and shared by every BN_Conv2d that does not pass its own.
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation=1, groups=1, bias=False, activation=nn.ReLU(inplace=True)):
        super(BN_Conv2d, self).__init__()
        layers = [nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                            padding=padding, dilation=dilation, groups=groups, bias=bias),
                  nn.BatchNorm2d(out_channels)]
        if activation is not None:
            layers.append(activation)
        self.seq = nn.Sequential(*layers)

    def forward(self, x):
        return self.seq(x)
    
class BN_Conv2d_Leaky(nn.Module):
    """Conv2d -> BatchNorm2d -> LeakyReLU."""

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation=1, groups=1, bias=False):
        super(BN_Conv2d_Leaky, self).__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                      padding=padding, dilation=dilation, groups=groups, bias=bias),
            nn.BatchNorm2d(out_channels)
        )

    def forward(self, x):
        # F.leaky_relu uses its default negative slope of 0.01 here
        return F.leaky_relu(self.seq(x))
    
class Dark_block(nn.Module):
    """Residual block for DarkNet: 1x1 conv -> 3x3 conv -> BN, plus a skip connection."""
    def __init__(self, channels, is_se=False, inner_channels=None):
        super(Dark_block, self).__init__()
        self.is_se = is_se
        if inner_channels is None:
            inner_channels = channels // 2
        self.conv1 = BN_Conv2d_Leaky(channels, inner_channels, 1, 1, 0)
        # bias is left at the nn.Conv2d default (True), although the BatchNorm that follows makes it redundant
        self.conv2 = nn.Conv2d(inner_channels, channels, 3, 1, 1)
        self.bn = nn.BatchNorm2d(channels)
        if self.is_se:
            self.se = SE(channels, 16)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.bn(out)
        if self.is_se:
            # SE.forward already returns the rescaled feature map, so use it directly
            # (multiplying by it again would apply the rescaling twice)
            out = self.se(out)
        out = out + x  # residual connection
        return F.leaky_relu(out)
    
class DarkNet(nn.Module):

    def __init__(self, layers, num_classes, is_se=False):
        super(DarkNet, self).__init__()
        self.is_se = is_se
        filters = [64, 128, 256, 512, 1024]

        self.conv1 = BN_Conv2d(3, 32, 3, 1, 1)
        self.redu1 = BN_Conv2d(32, 64, 3, 2, 1)
        self.conv2 = self.__make_layers(filters[0], layers[0])
        self.redu2 = BN_Conv2d(filters[0], filters[1], 3, 2, 1)
        self.conv3 = self.__make_layers(filters[1], layers[1])
        self.redu3 = BN_Conv2d(filters[1], filters[2], 3, 2, 1)
        self.conv4 = self.__make_layers(filters[2], layers[2])
        self.redu4 = BN_Conv2d(filters[2], filters[3], 3, 2, 1)
        self.conv5 = self.__make_layers(filters[3], layers[3])
        self.redu5 = BN_Conv2d(filters[3], filters[4], 3, 2, 1)
        self.conv6 = self.__make_layers(filters[4], layers[4])
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(filters[4], num_classes)

    def __make_layers(self, num_filter, num_layers):
        layers = []
        for _ in range(num_layers):
            layers.append(Dark_block(num_filter, self.is_se))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.redu1(out)
        out = self.conv2(out)
        out = self.redu2(out)
        out = self.conv3(out)
        out = self.redu3(out)
        out = self.conv4(out)
        out = self.redu4(out)
        out = self.conv5(out)
        out = self.redu5(out)
        out = self.conv6(out)
        out = self.global_pool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so no softmax is applied here
        return out


def darknet_53(num_classes=1000):
    return DarkNet([1, 2, 8, 8, 4], num_classes)


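def test():
    net = darknet_53()
    # The model lives on the CPU here, so torchsummary must build its dummy input
    # on the CPU as well (its default device is "cuda")
    summary(net, (3, 256, 256), device="cpu")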

test()

3.3 Training model

After defining the model, we can start training:

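import torch
import torch.nn as nn

# Instantiate the model and switch it to training mode
model = darknet_53()
model.train()

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

num_epochs = 100

# train_loader is assumed to be a torch.utils.data.DataLoader that yields
# (images, labels) batches; it must be built from your own data before this loop runs.
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data

        # Forward pass
        outputs = model(inputs)

        # Compute the loss
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print the loss of the last batch in this epoch
    print('Epoch [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, loss.item()))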
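
The loop above reports only the loss. If accuracy is wanted as well, a top-1 accuracy helper along the following lines could be called on each batch of outputs. The function name top1_accuracy and the toy tensors are illustrative and not part of the original code.

```python
import torch


def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()


# Toy batch: 4 samples, 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 0.1, 3.0],
                       [0.5, 1.5, 0.1],
                       [1.0, 0.2, 0.3]])
labels = torch.tensor([0, 2, 0, 0])

print(top1_accuracy(logits, labels))  # 0.75
```

Because the model returns raw logits, argmax can be taken directly; applying softmax first would not change which class scores highest.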

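3.4 Model structure display

The torchsummary output printed by test() for a 3×256×256 input is shown below. Each ReLU is listed six times, most likely because the default activation of BN_Conv2d is a single nn.ReLU instance shared by all six BN_Conv2d layers, so torchsummary attaches one hook to it per layer. The repeated rows add no parameters.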

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 32, 256, 256]             864
       BatchNorm2d-2         [-1, 32, 256, 256]              64
              ReLU-3         [-1, 32, 256, 256]               0
              ReLU-4         [-1, 32, 256, 256]               0
              ReLU-5         [-1, 32, 256, 256]               0
              ReLU-6         [-1, 32, 256, 256]               0
              ReLU-7         [-1, 32, 256, 256]               0
              ReLU-8         [-1, 32, 256, 256]               0
         BN_Conv2d-9         [-1, 32, 256, 256]               0
           Conv2d-10         [-1, 64, 128, 128]          18,432
      BatchNorm2d-11         [-1, 64, 128, 128]             128
             ReLU-12         [-1, 64, 128, 128]               0
             ReLU-13         [-1, 64, 128, 128]               0
             ReLU-14         [-1, 64, 128, 128]               0
             ReLU-15         [-1, 64, 128, 128]               0
             ReLU-16         [-1, 64, 128, 128]               0
             ReLU-17         [-1, 64, 128, 128]               0
        BN_Conv2d-18         [-1, 64, 128, 128]               0
           Conv2d-19         [-1, 32, 128, 128]           2,048
      BatchNorm2d-20         [-1, 32, 128, 128]              64
  BN_Conv2d_Leaky-21         [-1, 32, 128, 128]               0
           Conv2d-22         [-1, 64, 128, 128]          18,496
      BatchNorm2d-23         [-1, 64, 128, 128]             128
       Dark_block-24         [-1, 64, 128, 128]               0
           Conv2d-25          [-1, 128, 64, 64]          73,728
      BatchNorm2d-26          [-1, 128, 64, 64]             256
             ReLU-27          [-1, 128, 64, 64]               0
             ReLU-28          [-1, 128, 64, 64]               0
             ReLU-29          [-1, 128, 64, 64]               0
             ReLU-30          [-1, 128, 64, 64]               0
             ReLU-31          [-1, 128, 64, 64]               0
             ReLU-32          [-1, 128, 64, 64]               0
        BN_Conv2d-33          [-1, 128, 64, 64]               0
           Conv2d-34           [-1, 64, 64, 64]           8,192
      BatchNorm2d-35           [-1, 64, 64, 64]             128
  BN_Conv2d_Leaky-36           [-1, 64, 64, 64]               0
           Conv2d-37          [-1, 128, 64, 64]          73,856
      BatchNorm2d-38          [-1, 128, 64, 64]             256
       Dark_block-39          [-1, 128, 64, 64]               0
           Conv2d-40           [-1, 64, 64, 64]           8,192
      BatchNorm2d-41           [-1, 64, 64, 64]             128
  BN_Conv2d_Leaky-42           [-1, 64, 64, 64]               0
           Conv2d-43          [-1, 128, 64, 64]          73,856
      BatchNorm2d-44          [-1, 128, 64, 64]             256
       Dark_block-45          [-1, 128, 64, 64]               0
           Conv2d-46          [-1, 256, 32, 32]         294,912
      BatchNorm2d-47          [-1, 256, 32, 32]             512
             ReLU-48          [-1, 256, 32, 32]               0
             ReLU-49          [-1, 256, 32, 32]               0
             ReLU-50          [-1, 256, 32, 32]               0
             ReLU-51          [-1, 256, 32, 32]               0
             ReLU-52          [-1, 256, 32, 32]               0
             ReLU-53          [-1, 256, 32, 32]               0
        BN_Conv2d-54          [-1, 256, 32, 32]               0
           Conv2d-55          [-1, 128, 32, 32]          32,768
      BatchNorm2d-56          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-57          [-1, 128, 32, 32]               0
           Conv2d-58          [-1, 256, 32, 32]         295,168
      BatchNorm2d-59          [-1, 256, 32, 32]             512
       Dark_block-60          [-1, 256, 32, 32]               0
           Conv2d-61          [-1, 128, 32, 32]          32,768
      BatchNorm2d-62          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-63          [-1, 128, 32, 32]               0
           Conv2d-64          [-1, 256, 32, 32]         295,168
      BatchNorm2d-65          [-1, 256, 32, 32]             512
       Dark_block-66          [-1, 256, 32, 32]               0
           Conv2d-67          [-1, 128, 32, 32]          32,768
      BatchNorm2d-68          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-69          [-1, 128, 32, 32]               0
           Conv2d-70          [-1, 256, 32, 32]         295,168
      BatchNorm2d-71          [-1, 256, 32, 32]             512
       Dark_block-72          [-1, 256, 32, 32]               0
           Conv2d-73          [-1, 128, 32, 32]          32,768
      BatchNorm2d-74          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-75          [-1, 128, 32, 32]               0
           Conv2d-76          [-1, 256, 32, 32]         295,168
      BatchNorm2d-77          [-1, 256, 32, 32]             512
       Dark_block-78          [-1, 256, 32, 32]               0
           Conv2d-79          [-1, 128, 32, 32]          32,768
      BatchNorm2d-80          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-81          [-1, 128, 32, 32]               0
           Conv2d-82          [-1, 256, 32, 32]         295,168
      BatchNorm2d-83          [-1, 256, 32, 32]             512
       Dark_block-84          [-1, 256, 32, 32]               0
           Conv2d-85          [-1, 128, 32, 32]          32,768
      BatchNorm2d-86          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-87          [-1, 128, 32, 32]               0
           Conv2d-88          [-1, 256, 32, 32]         295,168
      BatchNorm2d-89          [-1, 256, 32, 32]             512
       Dark_block-90          [-1, 256, 32, 32]               0
           Conv2d-91          [-1, 128, 32, 32]          32,768
      BatchNorm2d-92          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-93          [-1, 128, 32, 32]               0
           Conv2d-94          [-1, 256, 32, 32]         295,168
      BatchNorm2d-95          [-1, 256, 32, 32]             512
       Dark_block-96          [-1, 256, 32, 32]               0
           Conv2d-97          [-1, 128, 32, 32]          32,768
      BatchNorm2d-98          [-1, 128, 32, 32]             256
  BN_Conv2d_Leaky-99          [-1, 128, 32, 32]               0
          Conv2d-100          [-1, 256, 32, 32]         295,168
     BatchNorm2d-101          [-1, 256, 32, 32]             512
      Dark_block-102          [-1, 256, 32, 32]               0
          Conv2d-103          [-1, 512, 16, 16]       1,179,648
     BatchNorm2d-104          [-1, 512, 16, 16]           1,024
            ReLU-105          [-1, 512, 16, 16]               0
            ReLU-106          [-1, 512, 16, 16]               0
            ReLU-107          [-1, 512, 16, 16]               0
            ReLU-108          [-1, 512, 16, 16]               0
            ReLU-109          [-1, 512, 16, 16]               0
            ReLU-110          [-1, 512, 16, 16]               0
       BN_Conv2d-111          [-1, 512, 16, 16]               0
          Conv2d-112          [-1, 256, 16, 16]         131,072
     BatchNorm2d-113          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-114          [-1, 256, 16, 16]               0
          Conv2d-115          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-116          [-1, 512, 16, 16]           1,024
      Dark_block-117          [-1, 512, 16, 16]               0
          Conv2d-118          [-1, 256, 16, 16]         131,072
     BatchNorm2d-119          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-120          [-1, 256, 16, 16]               0
          Conv2d-121          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-122          [-1, 512, 16, 16]           1,024
      Dark_block-123          [-1, 512, 16, 16]               0
          Conv2d-124          [-1, 256, 16, 16]         131,072
     BatchNorm2d-125          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-126          [-1, 256, 16, 16]               0
          Conv2d-127          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-128          [-1, 512, 16, 16]           1,024
      Dark_block-129          [-1, 512, 16, 16]               0
          Conv2d-130          [-1, 256, 16, 16]         131,072
     BatchNorm2d-131          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-132          [-1, 256, 16, 16]               0
          Conv2d-133          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-134          [-1, 512, 16, 16]           1,024
      Dark_block-135          [-1, 512, 16, 16]               0
          Conv2d-136          [-1, 256, 16, 16]         131,072
     BatchNorm2d-137          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-138          [-1, 256, 16, 16]               0
          Conv2d-139          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-140          [-1, 512, 16, 16]           1,024
      Dark_block-141          [-1, 512, 16, 16]               0
          Conv2d-142          [-1, 256, 16, 16]         131,072
     BatchNorm2d-143          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-144          [-1, 256, 16, 16]               0
          Conv2d-145          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-146          [-1, 512, 16, 16]           1,024
      Dark_block-147          [-1, 512, 16, 16]               0
          Conv2d-148          [-1, 256, 16, 16]         131,072
     BatchNorm2d-149          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-150          [-1, 256, 16, 16]               0
          Conv2d-151          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-152          [-1, 512, 16, 16]           1,024
      Dark_block-153          [-1, 512, 16, 16]               0
          Conv2d-154          [-1, 256, 16, 16]         131,072
     BatchNorm2d-155          [-1, 256, 16, 16]             512
 BN_Conv2d_Leaky-156          [-1, 256, 16, 16]               0
          Conv2d-157          [-1, 512, 16, 16]       1,180,160
     BatchNorm2d-158          [-1, 512, 16, 16]           1,024
      Dark_block-159          [-1, 512, 16, 16]               0
          Conv2d-160           [-1, 1024, 8, 8]       4,718,592
     BatchNorm2d-161           [-1, 1024, 8, 8]           2,048
            ReLU-162           [-1, 1024, 8, 8]               0
            ReLU-163           [-1, 1024, 8, 8]               0
            ReLU-164           [-1, 1024, 8, 8]               0
            ReLU-165           [-1, 1024, 8, 8]               0
            ReLU-166           [-1, 1024, 8, 8]               0
            ReLU-167           [-1, 1024, 8, 8]               0
       BN_Conv2d-168           [-1, 1024, 8, 8]               0
          Conv2d-169            [-1, 512, 8, 8]         524,288
     BatchNorm2d-170            [-1, 512, 8, 8]           1,024
 BN_Conv2d_Leaky-171            [-1, 512, 8, 8]               0
          Conv2d-172           [-1, 1024, 8, 8]       4,719,616
     BatchNorm2d-173           [-1, 1024, 8, 8]           2,048
      Dark_block-174           [-1, 1024, 8, 8]               0
          Conv2d-175            [-1, 512, 8, 8]         524,288
     BatchNorm2d-176            [-1, 512, 8, 8]           1,024
 BN_Conv2d_Leaky-177            [-1, 512, 8, 8]               0
          Conv2d-178           [-1, 1024, 8, 8]       4,719,616
     BatchNorm2d-179           [-1, 1024, 8, 8]           2,048
      Dark_block-180           [-1, 1024, 8, 8]               0
          Conv2d-181            [-1, 512, 8, 8]         524,288
     BatchNorm2d-182            [-1, 512, 8, 8]           1,024
 BN_Conv2d_Leaky-183            [-1, 512, 8, 8]               0
          Conv2d-184           [-1, 1024, 8, 8]       4,719,616
     BatchNorm2d-185           [-1, 1024, 8, 8]           2,048
      Dark_block-186           [-1, 1024, 8, 8]               0
          Conv2d-187            [-1, 512, 8, 8]         524,288
     BatchNorm2d-188            [-1, 512, 8, 8]           1,024
 BN_Conv2d_Leaky-189            [-1, 512, 8, 8]               0
          Conv2d-190           [-1, 1024, 8, 8]       4,719,616
     BatchNorm2d-191           [-1, 1024, 8, 8]           2,048
      Dark_block-192           [-1, 1024, 8, 8]               0
AdaptiveAvgPool2d-193           [-1, 1024, 1, 1]               0
          Linear-194                 [-1, 1000]       1,025,000
================================================================

Parameter summary:

Total params: 41,620,488
Trainable params: 41,620,488
Non-trainable params: 0

Input size (MB): 0.75
Forward/backward pass size (MB): 472.52
Params size (MB): 158.77
Estimated Total Size (MB): 632.04
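
The totals above can be cross-checked by hand. The sketch below recomputes the parameter count from the layer configuration in section 3.2, assuming the code as listed there: 3×3 downsampling convolutions without bias, a 1×1 convolution without bias and a 3×3 convolution with bias in each block, two learnable parameters per BatchNorm channel, and a 1000-class linear layer. The helper names are illustrative.

```python
def conv_params(c_in, c_out, k, bias):
    """Weights of a k x k convolution, plus one bias per output channel if used."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def bn_params(channels):
    """BatchNorm2d learns a scale and a shift per channel."""
    return 2 * channels


def bn_conv_params(c_in, c_out):
    """BN_Conv2d as used in the model: 3x3 convolution without bias, then BatchNorm."""
    return conv_params(c_in, c_out, 3, bias=False) + bn_params(c_out)


def block_params(channels):
    """Dark_block: 1x1 conv (no bias) + BN, then 3x3 conv (with bias) + BN."""
    inner = channels // 2
    return (conv_params(channels, inner, 1, bias=False) + bn_params(inner)
            + conv_params(inner, channels, 3, bias=True) + bn_params(channels))


filters = [64, 128, 256, 512, 1024]
repeats = [1, 2, 8, 8, 4]

total = bn_conv_params(3, 32)  # first convolution (stride 1)
c_in = 32
for channels, n_blocks in zip(filters, repeats):
    total += bn_conv_params(c_in, channels)      # stride-2 downsampling layer
    total += n_blocks * block_params(channels)   # residual blocks of this stage
    c_in = channels
total += 1024 * 1000 + 1000                      # final linear layer

print(total)                            # 41620488
print(round(total * 4 / 1024 ** 2, 2))  # 158.77 (MB at 4 bytes per float32)
```

Both numbers agree with the "Total params" and "Params size (MB)" lines reported by torchsummary.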

Origin blog.csdn.net/weixin_42878111/article/details/133173199