Hello everyone, I am Wei Xue AI. This is article 19 in the series on building artificial intelligence models with PyTorch: a step-by-step guide to building the DarkNet target detection model in PyTorch and displaying its network structure. As deep learning has developed, a steady stream of new convolutional neural network models has appeared. Among them, DarkNet, a fast and accurate model for target detection, is widely used in computer vision. This article introduces the architecture and principles of the DarkNet model and demonstrates its use through a training example.
1. DarkNet model architecture and principles
1.1 Introduction to DarkNet
DarkNet is an open source neural network framework created by Joseph Redmon. It is implemented in C and CUDA and supports CPU and GPU computing. This framework is very lightweight and easy to use.
1.2 DarkNet model architecture
In deep learning, the term "Darknet" comes up often. Besides the framework described above, it also refers to the backbone network used by the YOLO series of target detection algorithms, and that network is the subject of this article.
This article covers the Darknet53 model, a deep convolutional neural network used for image recognition and target detection. It was introduced with YOLOv3 as that detector's backbone feature extraction network.
Darknet53 takes its name from its 53 weighted layers: 52 convolutional layers plus one fully connected layer. Instead of pooling layers, it downsamples the feature map with 3×3 convolutions of stride 2, so the network learns how to reduce resolution rather than applying a fixed pooling rule, which helps preserve detail in the image. Inside each residual block, a 1×1 convolution first halves the channel count, which keeps the parameter count and computation down.
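To make the stride-2 downsampling concrete, the sketch below applies the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, five times in a row. The helper name `conv_out_size` and the 256-pixel input are illustrative assumptions; kernel size 3, stride 2, and padding 1 are the values used by the downsampling layers in section 3.2.

```python
def conv_out_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


size = 256
sizes = [size]
for _ in range(5):  # Darknet53 downsamples five times
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [256, 128, 64, 32, 16, 8]
```

Each stride-2 convolution halves the height and width, so a 256×256 input ends as an 8×8 feature map, which agrees with the output shapes listed in section 3.4.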
The input to Darknet53 is a raw image. A sequence of convolution, batch normalization, and activation operations gradually extracts increasingly abstract features. The network mainly uses 3×3 and 1×1 convolution kernels, with LeakyReLU activations providing the nonlinearity. When Darknet53 serves as a detection backbone, its feature maps are passed to a detection head that makes predictions at several scales; the version built in this article instead ends with global average pooling and a fully connected classification layer.
Because it is deep and downsamples the input five times, Darknet53 has a large receptive field and captures image features at different scales and levels. Combining Darknet53 with a target detection head makes it possible to detect and localize multiple targets in an image.
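As a rough illustration of what "different scales" means in practice: YOLOv3 attaches its detection heads to backbone feature maps at strides 8, 16, and 32, and each head predicts 3 anchor boxes per grid cell. The sketch below works out the resulting grid sizes and output channel counts. The function name `head_shapes`, the 416-pixel input, and the 80 classes are assumptions chosen for illustration (they are common YOLOv3 settings); none of this is part of the classification model built later in this article.

```python
def head_shapes(input_size: int, num_classes: int, anchors_per_cell: int = 3):
    """Grid size and channel count of each detection scale (YOLOv3 convention)."""
    # Per anchor: 4 box coordinates + 1 objectness score + class scores
    channels = anchors_per_cell * (4 + 1 + num_classes)
    return [(input_size // stride, channels) for stride in (8, 16, 32)]


shapes = head_shapes(416, 80)
print(shapes)  # [(52, 255), (26, 255), (13, 255)]
```

The finest grid (52×52) is suited to small objects and the coarsest (13×13) to large ones.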
1.3 How DarkNet works
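For an input image, features are first extracted by a stack of convolutional layers and residual blocks, with stride-2 convolutions reducing the resolution step by step. In the classification version built here, the final feature map is reduced by global average pooling and passed to a fully connected layer, whose output gives the class scores. For detection, the fully connected layer is replaced by a detection head that performs classification and box regression.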
2. Application background: Target detection task
As deep learning is adopted across more industries, target detection has become an important research direction in computer vision, with applications such as autonomous driving and security monitoring.
3. Code practice: Use PyTorch to train and test Darknet53
The following example uses the PyTorch framework to show how to read image data samples listed in CSV files, define the DarkNet53 model, train it, and print the loss value.
3.1 Data preparation
First, we need to prepare the image data samples in CSV format. Suppose we already have a directory of CSV files, each listing the paths of a set of images. One such file can be read as follows:
import pandas as pd

# Read the CSV file ("my_csv" is a placeholder for the actual file path)
data = pd.read_csv("my_csv")
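The article does not show what the CSV files contain, so the following is only a sketch under an assumed layout: one row per image, with hypothetical columns `image_path` and `label`. It shows one way to turn class names into the integer indices that `nn.CrossEntropyLoss` (used in section 3.3) expects as targets. The inline CSV text stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real column names depend on your data.
csv_text = """image_path,label
images/cat_001.jpg,cat
images/dog_001.jpg,dog
images/cat_002.jpg,cat
"""

data = pd.read_csv(io.StringIO(csv_text))

# Map each class name to an integer index
class_names = sorted(data["label"].unique())
class_to_idx = {name: idx for idx, name in enumerate(class_names)}
data["label_idx"] = data["label"].map(class_to_idx)

# (path, class index) pairs that a Dataset could load lazily
samples = list(zip(data["image_path"], data["label_idx"]))
print(class_to_idx)  # {'cat': 0, 'dog': 1}
```

A custom `torch.utils.data.Dataset` could then open each path in `samples` and return the image tensor together with its class index.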
3.2 Model definition
Next I will define the DarkNet model. This is implemented using the PyTorch framework:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary


class SE(nn.Module):
    """Squeeze-and-Excitation block: rescales each channel using global context."""

    def __init__(self, in_chnls, ratio):
        super(SE, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d((1, 1))
        self.compress = nn.Conv2d(in_chnls, in_chnls // ratio, 1, 1, 0)
        self.excitation = nn.Conv2d(in_chnls // ratio, in_chnls, 1, 1, 0)

    def forward(self, x):
        out = self.squeeze(x)  # (N, C, 1, 1) channel descriptor
        out = self.compress(out)
        out = F.relu(out)
        out = self.excitation(out)
        # torch.sigmoid replaces the deprecated F.sigmoid
        return x * torch.sigmoid(out)


class BN_Conv2d(nn.Module):
    """Conv2d -> BatchNorm2d -> optional activation (ReLU by default)."""

    # Note: the default ReLU module is created once and shared by every
    # BN_Conv2d that uses it. This is harmless because ReLU holds no state.
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int, padding: int,
                 dilation=1, groups=1, bias=False, activation=nn.ReLU(inplace=True)) -> None:
        super(BN_Conv2d, self).__init__()
        layers = [nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                            padding=padding, dilation=dilation, groups=groups, bias=bias),
                  nn.BatchNorm2d(out_channels)]
        if activation is not None:
            layers.append(activation)
        self.seq = nn.Sequential(*layers)

    def forward(self, x):
        return self.seq(x)


class BN_Conv2d_Leaky(nn.Module):
    """Conv2d -> BatchNorm2d -> LeakyReLU."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int, padding: int,
                 dilation=1, groups=1, bias=False) -> None:
        super(BN_Conv2d_Leaky, self).__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                      padding=padding, dilation=dilation, groups=groups, bias=bias),
            nn.BatchNorm2d(out_channels)
        )

    def forward(self, x):
        return F.leaky_relu(self.seq(x))


class Dark_block(nn.Module):
    """Residual block for DarkNet: 1x1 bottleneck conv, 3x3 conv, skip connection."""

    def __init__(self, channels, is_se=False, inner_channels=None):
        super(Dark_block, self).__init__()
        self.is_se = is_se
        if inner_channels is None:
            inner_channels = channels // 2
        self.conv1 = BN_Conv2d_Leaky(channels, inner_channels, 1, 1, 0)
        self.conv2 = nn.Conv2d(inner_channels, channels, 3, 1, 1)
        self.bn = nn.BatchNorm2d(channels)
        if self.is_se:
            self.se = SE(channels, 16)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.bn(out)
        if self.is_se:
            # SE.forward already returns the rescaled tensor, so assign it
            # directly instead of multiplying by it a second time
            out = self.se(out)
        out = out + x  # residual (skip) connection
        return F.leaky_relu(out)


class DarkNet(nn.Module):
    """Darknet-style backbone followed by global pooling and a linear classifier."""

    def __init__(self, layers, num_classes, is_se=False) -> None:
        # layers: number of Dark_block modules in each of the five stages
        super(DarkNet, self).__init__()
        self.is_se = is_se
        filters = [64, 128, 256, 512, 1024]

        # Stem, then five stages: a stride-2 conv followed by residual blocks
        self.conv1 = BN_Conv2d(3, 32, 3, 1, 1)
        self.redu1 = BN_Conv2d(32, 64, 3, 2, 1)
        self.conv2 = self.__make_layers(filters[0], layers[0])
        self.redu2 = BN_Conv2d(filters[0], filters[1], 3, 2, 1)
        self.conv3 = self.__make_layers(filters[1], layers[1])
        self.redu3 = BN_Conv2d(filters[1], filters[2], 3, 2, 1)
        self.conv4 = self.__make_layers(filters[2], layers[2])
        self.redu4 = BN_Conv2d(filters[2], filters[3], 3, 2, 1)
        self.conv5 = self.__make_layers(filters[3], layers[3])
        self.redu5 = BN_Conv2d(filters[3], filters[4], 3, 2, 1)
        self.conv6 = self.__make_layers(filters[4], layers[4])

        # Classification head
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(filters[4], num_classes)

    def __make_layers(self, num_filter, num_layers):
        layers = []
        for _ in range(num_layers):
            layers.append(Dark_block(num_filter, self.is_se))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.redu1(out)
        out = self.conv2(out)
        out = self.redu2(out)
        out = self.conv3(out)
        out = self.redu3(out)
        out = self.conv4(out)
        out = self.redu4(out)
        out = self.conv5(out)
        out = self.redu5(out)
        out = self.conv6(out)
        out = self.global_pool(out)
        out = out.view(out.size(0), -1)  # flatten to (N, 1024)
        out = self.fc(out)
        # Return raw logits; nn.CrossEntropyLoss applies log-softmax itself
        return out
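

def darknet_53(num_classes=1000):
    return DarkNet([1, 2, 8, 8, 4], num_classes)


def test():
    net = darknet_53()
    # torchsummary defaults to device="cuda"; the model lives on the CPU here,
    # so request a CPU input explicitly
    summary(net, (3, 256, 256), device="cpu")


test()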
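The block counts `[1, 2, 8, 8, 4]` passed to `DarkNet` above account for the "53" in the model's name. The short tally below is a sketch of that arithmetic; the variable names are introduced here for illustration only.

```python
blocks_per_stage = [1, 2, 8, 8, 4]  # argument passed to DarkNet in darknet_53()

stem_convs = 1                             # conv1
downsample_convs = len(blocks_per_stage)   # redu1 ... redu5
block_convs = 2 * sum(blocks_per_stage)    # one 1x1 and one 3x3 conv per Dark_block
fc_layers = 1                              # final nn.Linear

conv_layers = stem_convs + downsample_convs + block_convs
total_layers = conv_layers + fc_layers
print(conv_layers, total_layers)  # 52 53
```

Batch normalization and activation layers are not counted, since the name refers only to convolutional and fully connected layers.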
3.3 Training model
After defining the model, we can start training:
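import torch

# Instantiate the model and switch it to training mode
model = darknet_53()
model.train()

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# train_loader is assumed to be a torch.utils.data.DataLoader that yields
# (images, labels) batches; it is not defined in this article.
num_epochs = 100

# Training loop
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        # Forward pass
        outputs = model(inputs)
        # Compute the loss
        loss = criterion(outputs, labels)
        # Backward pass and optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Report the loss of the last batch in each epoch
    print('Epoch [%d/%d], Loss: %.4f' % (epoch + 1, num_epochs, loss.item()))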
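The loop above reports only the loss. If accuracy is wanted as well, one simple option is top-1 accuracy computed from the logits. The helper below is a minimal sketch; the name `top1_accuracy` and the tiny hand-made batch are assumptions for illustration, not part of the original code.

```python
import torch


def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()


# Tiny hand-made batch: 3 samples, 2 classes
logits = torch.tensor([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
labels = torch.tensor([0, 1, 1])
acc = top1_accuracy(logits, labels)
print(f"accuracy: {acc:.4f}")  # accuracy: 0.6667
```

Inside the training loop it could be called as `top1_accuracy(outputs, labels)` right after the forward pass. For a meaningful figure, evaluate on a held-out set with the model in `eval()` mode.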
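3.4 Model structure display
Calling test() from section 3.2 prints the following torchsummary report for a 3×256×256 input. Each ReLU row appears six times in a row, most likely because every BN_Conv2d shares the same default ReLU instance, so torchsummary attaches one hook to it per BN_Conv2d: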
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 256, 256] 864
BatchNorm2d-2 [-1, 32, 256, 256] 64
ReLU-3 [-1, 32, 256, 256] 0
ReLU-4 [-1, 32, 256, 256] 0
ReLU-5 [-1, 32, 256, 256] 0
ReLU-6 [-1, 32, 256, 256] 0
ReLU-7 [-1, 32, 256, 256] 0
ReLU-8 [-1, 32, 256, 256] 0
BN_Conv2d-9 [-1, 32, 256, 256] 0
Conv2d-10 [-1, 64, 128, 128] 18,432
BatchNorm2d-11 [-1, 64, 128, 128] 128
ReLU-12 [-1, 64, 128, 128] 0
ReLU-13 [-1, 64, 128, 128] 0
ReLU-14 [-1, 64, 128, 128] 0
ReLU-15 [-1, 64, 128, 128] 0
ReLU-16 [-1, 64, 128, 128] 0
ReLU-17 [-1, 64, 128, 128] 0
BN_Conv2d-18 [-1, 64, 128, 128] 0
Conv2d-19 [-1, 32, 128, 128] 2,048
BatchNorm2d-20 [-1, 32, 128, 128] 64
BN_Conv2d_Leaky-21 [-1, 32, 128, 128] 0
Conv2d-22 [-1, 64, 128, 128] 18,496
BatchNorm2d-23 [-1, 64, 128, 128] 128
Dark_block-24 [-1, 64, 128, 128] 0
Conv2d-25 [-1, 128, 64, 64] 73,728
BatchNorm2d-26 [-1, 128, 64, 64] 256
ReLU-27 [-1, 128, 64, 64] 0
ReLU-28 [-1, 128, 64, 64] 0
ReLU-29 [-1, 128, 64, 64] 0
ReLU-30 [-1, 128, 64, 64] 0
ReLU-31 [-1, 128, 64, 64] 0
ReLU-32 [-1, 128, 64, 64] 0
BN_Conv2d-33 [-1, 128, 64, 64] 0
Conv2d-34 [-1, 64, 64, 64] 8,192
BatchNorm2d-35 [-1, 64, 64, 64] 128
BN_Conv2d_Leaky-36 [-1, 64, 64, 64] 0
Conv2d-37 [-1, 128, 64, 64] 73,856
BatchNorm2d-38 [-1, 128, 64, 64] 256
Dark_block-39 [-1, 128, 64, 64] 0
Conv2d-40 [-1, 64, 64, 64] 8,192
BatchNorm2d-41 [-1, 64, 64, 64] 128
BN_Conv2d_Leaky-42 [-1, 64, 64, 64] 0
Conv2d-43 [-1, 128, 64, 64] 73,856
BatchNorm2d-44 [-1, 128, 64, 64] 256
Dark_block-45 [-1, 128, 64, 64] 0
Conv2d-46 [-1, 256, 32, 32] 294,912
BatchNorm2d-47 [-1, 256, 32, 32] 512
ReLU-48 [-1, 256, 32, 32] 0
ReLU-49 [-1, 256, 32, 32] 0
ReLU-50 [-1, 256, 32, 32] 0
ReLU-51 [-1, 256, 32, 32] 0
ReLU-52 [-1, 256, 32, 32] 0
ReLU-53 [-1, 256, 32, 32] 0
BN_Conv2d-54 [-1, 256, 32, 32] 0
Conv2d-55 [-1, 128, 32, 32] 32,768
BatchNorm2d-56 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-57 [-1, 128, 32, 32] 0
Conv2d-58 [-1, 256, 32, 32] 295,168
BatchNorm2d-59 [-1, 256, 32, 32] 512
Dark_block-60 [-1, 256, 32, 32] 0
Conv2d-61 [-1, 128, 32, 32] 32,768
BatchNorm2d-62 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-63 [-1, 128, 32, 32] 0
Conv2d-64 [-1, 256, 32, 32] 295,168
BatchNorm2d-65 [-1, 256, 32, 32] 512
Dark_block-66 [-1, 256, 32, 32] 0
Conv2d-67 [-1, 128, 32, 32] 32,768
BatchNorm2d-68 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-69 [-1, 128, 32, 32] 0
Conv2d-70 [-1, 256, 32, 32] 295,168
BatchNorm2d-71 [-1, 256, 32, 32] 512
Dark_block-72 [-1, 256, 32, 32] 0
Conv2d-73 [-1, 128, 32, 32] 32,768
BatchNorm2d-74 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-75 [-1, 128, 32, 32] 0
Conv2d-76 [-1, 256, 32, 32] 295,168
BatchNorm2d-77 [-1, 256, 32, 32] 512
Dark_block-78 [-1, 256, 32, 32] 0
Conv2d-79 [-1, 128, 32, 32] 32,768
BatchNorm2d-80 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-81 [-1, 128, 32, 32] 0
Conv2d-82 [-1, 256, 32, 32] 295,168
BatchNorm2d-83 [-1, 256, 32, 32] 512
Dark_block-84 [-1, 256, 32, 32] 0
Conv2d-85 [-1, 128, 32, 32] 32,768
BatchNorm2d-86 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-87 [-1, 128, 32, 32] 0
Conv2d-88 [-1, 256, 32, 32] 295,168
BatchNorm2d-89 [-1, 256, 32, 32] 512
Dark_block-90 [-1, 256, 32, 32] 0
Conv2d-91 [-1, 128, 32, 32] 32,768
BatchNorm2d-92 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-93 [-1, 128, 32, 32] 0
Conv2d-94 [-1, 256, 32, 32] 295,168
BatchNorm2d-95 [-1, 256, 32, 32] 512
Dark_block-96 [-1, 256, 32, 32] 0
Conv2d-97 [-1, 128, 32, 32] 32,768
BatchNorm2d-98 [-1, 128, 32, 32] 256
BN_Conv2d_Leaky-99 [-1, 128, 32, 32] 0
Conv2d-100 [-1, 256, 32, 32] 295,168
BatchNorm2d-101 [-1, 256, 32, 32] 512
Dark_block-102 [-1, 256, 32, 32] 0
Conv2d-103 [-1, 512, 16, 16] 1,179,648
BatchNorm2d-104 [-1, 512, 16, 16] 1,024
ReLU-105 [-1, 512, 16, 16] 0
ReLU-106 [-1, 512, 16, 16] 0
ReLU-107 [-1, 512, 16, 16] 0
ReLU-108 [-1, 512, 16, 16] 0
ReLU-109 [-1, 512, 16, 16] 0
ReLU-110 [-1, 512, 16, 16] 0
BN_Conv2d-111 [-1, 512, 16, 16] 0
Conv2d-112 [-1, 256, 16, 16] 131,072
BatchNorm2d-113 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-114 [-1, 256, 16, 16] 0
Conv2d-115 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-116 [-1, 512, 16, 16] 1,024
Dark_block-117 [-1, 512, 16, 16] 0
Conv2d-118 [-1, 256, 16, 16] 131,072
BatchNorm2d-119 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-120 [-1, 256, 16, 16] 0
Conv2d-121 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-122 [-1, 512, 16, 16] 1,024
Dark_block-123 [-1, 512, 16, 16] 0
Conv2d-124 [-1, 256, 16, 16] 131,072
BatchNorm2d-125 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-126 [-1, 256, 16, 16] 0
Conv2d-127 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-128 [-1, 512, 16, 16] 1,024
Dark_block-129 [-1, 512, 16, 16] 0
Conv2d-130 [-1, 256, 16, 16] 131,072
BatchNorm2d-131 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-132 [-1, 256, 16, 16] 0
Conv2d-133 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-134 [-1, 512, 16, 16] 1,024
Dark_block-135 [-1, 512, 16, 16] 0
Conv2d-136 [-1, 256, 16, 16] 131,072
BatchNorm2d-137 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-138 [-1, 256, 16, 16] 0
Conv2d-139 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-140 [-1, 512, 16, 16] 1,024
Dark_block-141 [-1, 512, 16, 16] 0
Conv2d-142 [-1, 256, 16, 16] 131,072
BatchNorm2d-143 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-144 [-1, 256, 16, 16] 0
Conv2d-145 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-146 [-1, 512, 16, 16] 1,024
Dark_block-147 [-1, 512, 16, 16] 0
Conv2d-148 [-1, 256, 16, 16] 131,072
BatchNorm2d-149 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-150 [-1, 256, 16, 16] 0
Conv2d-151 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-152 [-1, 512, 16, 16] 1,024
Dark_block-153 [-1, 512, 16, 16] 0
Conv2d-154 [-1, 256, 16, 16] 131,072
BatchNorm2d-155 [-1, 256, 16, 16] 512
BN_Conv2d_Leaky-156 [-1, 256, 16, 16] 0
Conv2d-157 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-158 [-1, 512, 16, 16] 1,024
Dark_block-159 [-1, 512, 16, 16] 0
Conv2d-160 [-1, 1024, 8, 8] 4,718,592
BatchNorm2d-161 [-1, 1024, 8, 8] 2,048
ReLU-162 [-1, 1024, 8, 8] 0
ReLU-163 [-1, 1024, 8, 8] 0
ReLU-164 [-1, 1024, 8, 8] 0
ReLU-165 [-1, 1024, 8, 8] 0
ReLU-166 [-1, 1024, 8, 8] 0
ReLU-167 [-1, 1024, 8, 8] 0
BN_Conv2d-168 [-1, 1024, 8, 8] 0
Conv2d-169 [-1, 512, 8, 8] 524,288
BatchNorm2d-170 [-1, 512, 8, 8] 1,024
BN_Conv2d_Leaky-171 [-1, 512, 8, 8] 0
Conv2d-172 [-1, 1024, 8, 8] 4,719,616
BatchNorm2d-173 [-1, 1024, 8, 8] 2,048
Dark_block-174 [-1, 1024, 8, 8] 0
Conv2d-175 [-1, 512, 8, 8] 524,288
BatchNorm2d-176 [-1, 512, 8, 8] 1,024
BN_Conv2d_Leaky-177 [-1, 512, 8, 8] 0
Conv2d-178 [-1, 1024, 8, 8] 4,719,616
BatchNorm2d-179 [-1, 1024, 8, 8] 2,048
Dark_block-180 [-1, 1024, 8, 8] 0
Conv2d-181 [-1, 512, 8, 8] 524,288
BatchNorm2d-182 [-1, 512, 8, 8] 1,024
BN_Conv2d_Leaky-183 [-1, 512, 8, 8] 0
Conv2d-184 [-1, 1024, 8, 8] 4,719,616
BatchNorm2d-185 [-1, 1024, 8, 8] 2,048
Dark_block-186 [-1, 1024, 8, 8] 0
Conv2d-187 [-1, 512, 8, 8] 524,288
BatchNorm2d-188 [-1, 512, 8, 8] 1,024
BN_Conv2d_Leaky-189 [-1, 512, 8, 8] 0
Conv2d-190 [-1, 1024, 8, 8] 4,719,616
BatchNorm2d-191 [-1, 1024, 8, 8] 2,048
Dark_block-192 [-1, 1024, 8, 8] 0
AdaptiveAvgPool2d-193 [-1, 1024, 1, 1] 0
Linear-194 [-1, 1000] 1,025,000
================================================================
Parameter summary:
Total params: 41,620,488
Trainable params: 41,620,488
Non-trainable params: 0
Input size (MB): 0.75
Forward/backward pass size (MB): 472.52
Params size (MB): 158.77
Estimated Total Size (MB): 632.04
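The totals above can be cross-checked by hand. The sketch below recomputes the parameter count from the layer definitions in section 3.2, using the usual formulas: a convolution has c_in × c_out × k × k weights plus c_out biases if enabled, and a batch normalization layer has 2 learnable values per channel. The helper names are introduced here for illustration, and the size in MB assumes 4 bytes per float32 parameter.

```python
def conv_params(c_in, c_out, k, bias):
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn_params(c):
    return 2 * c  # learnable scale and shift


def block_params(c):
    # Dark_block: 1x1 conv (no bias) + BN, then 3x3 conv (with bias) + BN
    half = c // 2
    return (conv_params(c, half, 1, False) + bn_params(half)
            + conv_params(half, c, 3, True) + bn_params(c))


def downsample_params(c_in, c_out):
    # BN_Conv2d: 3x3 conv (no bias) + BN
    return conv_params(c_in, c_out, 3, False) + bn_params(c_out)


filters = [64, 128, 256, 512, 1024]
blocks = [1, 2, 8, 8, 4]
num_classes = 1000

total = downsample_params(3, 32)  # stem conv1 has the same conv + BN structure
c_prev = 32
for c, n in zip(filters, blocks):
    total += downsample_params(c_prev, c) + n * block_params(c)
    c_prev = c
total += filters[-1] * num_classes + num_classes  # fully connected layer

print(f"{total:,}")                       # 41,620,488
print(f"{total * 4 / 1024 ** 2:.2f} MB")  # 158.77 MB
```

Both numbers match the "Total params" and "Params size (MB)" lines of the report.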