Literature reading: Corn leaf disease classification based on improved ConvNext


CBAM attention mechanism module:

1: The channel attention module applies global average pooling (AvgPool) and global max pooling (MaxPool) to the input feature map (both pooling operations act over its height and width). The two pooled results are then passed through a shared fully connected network (Shared MLP), the two MLP outputs are added, and a Sigmoid activation produces the channel attention map, i.e. a weight between 0 and 1 for each channel of the input feature map. Finally, these weights are multiplied channel by channel onto the feature map.
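A minimal PyTorch sketch of this channel attention module follows (the reduction ratio of the shared MLP is an assumed value, since the text does not specify one):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: global AvgPool and MaxPool over H/W, a shared MLP, addition, Sigmoid."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio 16 is an assumption
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling over height and width
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # global max pooling over height and width
        self.shared_mlp = nn.Sequential(         # shared fully connected layers (Shared MLP)
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # add the two MLP outputs, squash to (0, 1), then reweight each channel of x
        attn = self.sigmoid(self.shared_mlp(self.avg_pool(x)) + self.shared_mlp(self.max_pool(x)))
        return x * attn  # weights broadcast over H and W
```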

2: Unlike the channel attention module, the spatial attention module focuses on which spatial locations of the input carry the most important information, and it complements the channel attention module. To compute spatial attention, average pooling and max pooling are first applied along the channel dimension at every spatial position (both pooling operations act over the channels of the input feature map), aggregating the channel information of the feature map into two effective 2D descriptors: the channel-wise average-pooled map and the channel-wise max-pooled map. These two maps are stacked and passed through a standard convolutional layer with a single output channel (which adjusts the channel number back to 1), and a Sigmoid activation then yields the two-dimensional spatial attention map, i.e. a weight between 0 and 1 for every position of the input feature map. Finally, these weights are multiplied onto the feature map position by position.
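Continuing the sketch above (it reuses torch, nn, and the ChannelAttention class already defined), the spatial attention module and the combined CBAM block could look like this; the 7×7 kernel is the usual CBAM choice and is an assumption here:

```python
class SpatialAttention(nn.Module):
    """Spatial attention: per-position mean and max over channels, concat, conv to 1 channel, Sigmoid."""
    def __init__(self, kernel_size: int = 7):  # 7x7 is the common CBAM choice (assumed)
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = torch.mean(x, dim=1, keepdim=True)    # [N, 1, H, W], average over channels
        max_map, _ = torch.max(x, dim=1, keepdim=True)  # [N, 1, H, W], max over channels
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))  # [N, 1, H, W]
        return x * attn  # reweight every spatial position


class CBAM(nn.Module):
    """CBAM applies channel attention first, then spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_attn = ChannelAttention(channels)
        self.spatial_attn = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial_attn(self.channel_attn(x))
```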

ConvNext network structure

First, a 224×224 image is fed into a 4×4 convolution with stride 4, followed by a normalization layer. The features then pass through alternating downsampling layers and ConvNeXt blocks, and finally through global average pooling, a normalization layer (which reduces differences between samples and improves the model's generalization to the input), and a fully connected layer that outputs the final classification.
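A rough PyTorch skeleton of this macro layout is sketched below. The widths and depths are the ConvNeXt-T defaults and are assumed rather than taken from the paper; `Block` refers to the ConvNeXt Block class shown later in this post, and the extra normalization layers around the stem and downsampling convolutions are omitted for brevity:

```python
import torch
import torch.nn as nn

class ConvNeXtSketch(nn.Module):
    """Illustrative skeleton: stem -> four stages of ConvNeXt blocks with downsampling -> GAP -> norm -> FC."""
    def __init__(self, num_classes=4, dims=(96, 192, 384, 768), depths=(3, 3, 9, 3)):  # ConvNeXt-T values (assumed)
        super().__init__()
        self.stem = nn.Conv2d(3, dims[0], kernel_size=4, stride=4)  # 4x4 conv, stride 4: 224x224 -> 56x56
        self.stages = nn.ModuleList(
            nn.Sequential(*[Block(dim) for _ in range(depth)]) for dim, depth in zip(dims, depths)
        )
        self.downsamples = nn.ModuleList(
            nn.Conv2d(dims[i], dims[i + 1], kernel_size=2, stride=2) for i in range(len(dims) - 1)
        )
        self.norm = nn.LayerNorm(dims[-1])            # final normalization before the classifier
        self.head = nn.Linear(dims[-1], num_classes)  # fully connected output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < len(self.downsamples):
                x = self.downsamples[i](x)  # halve spatial size, double channels between stages
        x = x.mean(dim=(-2, -1))            # global average pooling over H and W
        return self.head(self.norm(x))
```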

Composition of the ConvNeXt block: the input first passes through a 7×7 depthwise convolution with stride 1 and padding 3. (In a depthwise convolution each kernel is responsible for exactly one channel and is convolved only with that channel, so the number of kernels equals the number of input channels, and the number of output channels stays equal to the number of input channels and to the number of kernels. Depthwise convolution therefore only changes the spatial size of the feature map, not the number of channels; however, because each channel is convolved independently, it does not combine feature information from different channels at the same spatial position.) A LayerNorm follows, then a 1×1 convolution (equivalent to a fully connected layer) expands the channel dimension from dim to 4×dim, and a GELU activation is applied (GELU, Gaussian Error Linear Unit, is an activation function based on the Gaussian error function; compared with ReLU it is smoother, which helps the convergence speed and performance of training). A second 1×1 convolution then projects the channels back from 4×dim to dim, Layer Scale rescales each channel, and finally the result passes through Drop Path regularization and is added to the shortcut to give the block output.

ConvNeXt original framework:
[Figure: original ConvNeXt framework]

Improved framework:
[Figure: improved ConvNeXt framework]

ConvNeXt Block module code:


import torch
import torch.nn as nn
from timm.models.layers import DropPath  # stochastic depth, as used in the official ConvNeXt code


class Block(nn.Module):  # ConvNeXt Block module
    def __init__(self, dim, drop_rate=0., layer_scale_init_value=1e-6):  # initialization
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        # The official ConvNeXt code uses a custom LayerNorm(dim, eps=1e-6, data_format="channels_last"),
        # which in the channels_last case is equivalent to nn.LayerNorm.
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise/1x1 conv, implemented with a linear layer (same effect)
        self.act = nn.GELU()  # GELU activation function
        self.pwconv2 = nn.Linear(4 * dim, dim)  # note: pwconv1 and pwconv2 have different in/out channels
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim,)),  # Layer Scale parameters
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_rate) if drop_rate > 0. else nn.Identity()  # DropPath layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # forward pass
        shortcut = x
        x = self.dwconv(x)  # depthwise convolution
        x = x.permute(0, 2, 3, 1)  # reorder dimensions: [N, C, H, W] -> [N, H, W, C]
        x = self.norm(x)  # LayerNorm
        x = self.pwconv1(x)  # 1x1 conv (expand dim -> 4*dim)
        x = self.act(x)  # GELU activation
        x = self.pwconv2(x)  # 1x1 conv (project 4*dim -> dim)
        if self.gamma is not None:
            x = self.gamma * x  # scale each channel (Layer Scale)
        x = x.permute(0, 3, 1, 2)  # restore dimension order: [N, H, W, C] -> [N, C, H, W]

        x = shortcut + self.drop_path(x)  # apply DropPath and add the shortcut
        return x

Activation functions: ReLU and LeakyReLU

Because the output of ReLU is zero on the negative half-axis, some neurons in the network become inactive, which increases the sparsity of the model. However, with a zero negative half-axis, a neuron whose input is negative stops learning, since its gradient is zero. The LeakyReLU activation function was therefore chosen to deal with this problem: it replaces the zero output with a small negative slope.
[Figures: ReLU and LeakyReLU activation curves]
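A small PyTorch illustration of the difference (the negative slope 0.01 is PyTorch's default; the value used in the paper is not stated):

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

relu = nn.ReLU()
leaky_relu = nn.LeakyReLU(negative_slope=0.01)  # default slope, assumed

print(relu(x))        # tensor([0., 0., 0., 1., 3.])  negative inputs are zeroed, so they stop learning
print(leaky_relu(x))  # tensor([-0.0300, -0.0100, 0.0000, 1.0000, 3.0000])  a small slope keeps gradients alive
```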

The increase in the number of channels corresponds to the number of image features extracted by the convolution

A convolutional layer separates the features in its input into a new feature map, where each output channel corresponds to one feature extracted by a convolution kernel. In this process the ReLU activation acts as a filter, removing negatively correlated responses and keeping the positively correlated ones. The larger the number of output channels, the more distinct features can be represented, although duplicate features also become more likely, since which features are learned is ultimately probabilistic.
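As a toy illustration of this idea (channel counts and shapes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                      # one RGB image
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)    # 64 output channels = 64 candidate feature maps
features = torch.relu(conv(x))                       # ReLU keeps positive responses, removes negative ones

print(features.shape)                  # torch.Size([1, 64, 224, 224])
print((features == 0).float().mean())  # fraction of responses suppressed by ReLU
```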

Data augmentation

Dataset introduction

The dataset used in this study underwent data augmentation. Augmentation methods such as rotation, Gaussian blur, adding random noise, adding occlusions at random positions, and brightness adjustment simulate external factors encountered during image acquisition, such as different shooting angles, occlusion by other background leaves, and varying weather; this helps prevent the model from overfitting while improving its robustness and generalization ability. The original dataset is split into training, validation, and test sets at a ratio of 6:2:2. The experiments cover three diseases common in corn cultivation (corn gray leaf spot, corn rust, and corn leaf spot) as well as healthy leaves. The public PlantVillage dataset and the dataset from the Jilin Agricultural Science and Technology Institute's "Smart Agriculture" platform were used as test objects for the data augmentation processing.
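A hedged torchvision sketch of such a pipeline is shown below; the parameter values are illustrative only, and RandomErasing stands in for the random occlusion, since the paper does not specify its implementation:

```python
import torch
from torchvision import transforms

# Illustrative parameters only; the paper does not list the augmentation settings it used.
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=30),                      # simulate different shooting angles
    transforms.ColorJitter(brightness=0.3),                     # brightness adjustment (lighting / weather)
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),   # Gaussian blur
    transforms.ToTensor(),
    transforms.Lambda(lambda img: img + 0.01 * torch.randn_like(img)),  # add random noise
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.2)),         # occlusion at a random position
])
```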


Experiment procedure

This experiment uses the PaddlePaddle 2.3.2 deep learning framework with Python 3.7, and training is accelerated with a 4-core CPU and a Tesla V100 GPU.

The network is trained with the cross-entropy loss function and adaptive moment estimation (Adam) as the optimizer, which adaptively adjusts the update step for each parameter during training. Training runs for 100 epochs with a batch size of 64 and a learning rate of 0.000001.
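The paper trains with PaddlePaddle; a rough PyTorch equivalent of this configuration (with dummy tensors standing in for the real leaf images and a trivial stand-in model) would be:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data stands in for the real dataset; batch size 64 as in the paper.
dataset = TensorDataset(torch.randn(256, 3, 224, 224), torch.randint(0, 4, (256,)))
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 4))  # stand-in for the improved ConvNeXt
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)  # Adam with learning rate 0.000001

for epoch in range(100):                                   # 100 training epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```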

Experimental results

[Figures: experimental results]

Origin blog.csdn.net/perfectzxiny/article/details/134836351