[Pytorch Neural Network Theory] 48 Image classification models: ResNet model + DenseNet model + EfficientNet model

 

1 ResNet model

In deep learning, the deeper the model, the stronger its fitting ability. Overfitting is therefore normal, but an increase in training error as the network deepens is not.

1.1 Why the training error gets bigger and bigger

In backpropagation, each layer's gradient is computed from the gradient passed down from the layer above it. As the number of layers increases, the gradient becomes smaller and smaller as it propagates through them, until it vanishes; as a result, the training error grows larger and larger as layers are added.
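As a toy numeric illustration (not from the original text): if each layer scales the backpropagated gradient by a factor of about 0.5, the gradient that reaches the first layer shrinks exponentially with depth.

```python
# Toy illustration of gradient vanishing: a per-layer factor of ~0.5
# makes the gradient at the first layer shrink exponentially with depth.
for depth in (5, 20, 50):
    print(depth, 0.5 ** depth)
# 5  0.03125
# 20 ~9.5e-07
# 50 ~8.9e-16
```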

The ResNet model was motivated by the problem that a network cannot be trained once it becomes very deep. Drawing on the idea of the Highway Network model, it introduces a residual connection module, which allows the model's depth to reach 152 layers.

1.2 Structure of Residual Connections

On top of a standard feedforward convolutional neural network, a direct shortcut connection is added that bypasses the intermediate layers, so that the input can reach the output directly.

1.2.1 Structure diagram of residual connection

1.2.2 Definition and Description of Residual Connections
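As a working definition, here is a minimal PyTorch sketch of a residual block (the two 3×3 convolutions and the single channel count are illustrative assumptions, not details from the original):

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut: input added at the Addition node

# Shape check: the block preserves the input shape.
y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))
```

The key line is `out + x`: the unchanged input is added to the transformed output, which is the Addition node discussed next.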

1.3 The principle of residual connection

The residual connection passes the original input directly to the Addition node, bypassing the transformations in the middle. During backpropagation, the error arriving at the input layer is the sum of two terms: the error that came through the multi-layer network on the left, and the original error that came through the shortcut on the right. The left term gets smaller and smaller as the number of layers increases, while the right term reaches the input layer directly through the Addition node, so its gradient is preserved. The summed gradient received by the input layer is therefore not too small, which ensures the error can continue to propagate downward.

1.3.1 The principle of residual connection

This method appears to solve only the problem of the ever-shrinking gradient, but the residual connection also plays a role in the forward pass.

In terms of the forward pass, the network structure becomes a parallel model; that is, the effect of the residual connection is to turn the network from serial into parallel.

This is also the principle behind the Inception V4 model: although it does not use residual connections, its parallel structure achieves the same effect as the Inception-ResNet V2 model, which combines Inception with the residual network.

2 DenseNet model

The DenseNet model was proposed in 2017. It is a densely connected convolutional neural network (CNN): each layer takes all the layers before it as input, which means the input of each layer is the concatenation of the outputs of all previous layers.

2.1 Network structure of DenseNet model

2.1.1 DenseNet model

Each feature map is connected to the feature maps of all previous layers, that is, each layer takes all preceding layers as its input. For an L-layer network, the DenseNet model therefore contains L(L+1)/2 connections in total (for example, L = 5 gives 5 × 6 / 2 = 15 connections).

2.1.2 DenseNet model diagram

2.2 Advantages of DenseNet Model

  1. Each layer of the DenseNet model is directly connected to all the layers before it, so the final error signal can be back-propagated directly to every layer. This alleviates the vanishing-gradient problem and makes the network easier to train.
  2. The DenseNet model implements shortcut connections by concatenating feature maps, thereby enabling feature reuse; it adopts a small growth rate, so the feature maps unique to each layer are relatively small.
  3. By strengthening the propagation of feature maps, earlier layers' feature maps are passed directly to later layers, making full use of features at different levels.

2.3 Defects of DenseNet Model

The DenseNet model can consume a lot of GPU memory; deeper DenseNet models often do not fit on a graphics card and require careful optimization.

2.4 Dense blocks

Dense blocks are a unique structure in the DenseNet model.

2.4.1 Composition of dense blocks

A dense block contains convolutional layers with two different kernel sizes (1×1 and 3×3). Each dense block consists of L densely connected layers.

Dense connectivity exists only within a single dense block; there are no dense connections between different dense blocks. In other words, dense connections occur only inside a dense block.
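A minimal PyTorch sketch of a dense block under these rules (the growth rate, bottleneck factor, and layer count are illustrative assumptions):

```python
import torch
from torch import nn

class DenseLayer(nn.Module):
    """One layer of a dense block: 1x1 bottleneck conv, then 3x3 conv."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4 * growth_rate, 1, bias=False),
            nn.BatchNorm2d(4 * growth_rate), nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth_rate, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, x):
        # Concatenate the new feature maps with all earlier ones:
        # dense connectivity stays inside the block.
        return torch.cat([x, self.body(x)], dim=1)

class DenseBlock(nn.Module):
    def __init__(self, num_layers, in_channels, growth_rate=12):
        super().__init__()
        self.block = nn.Sequential(*[
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        return self.block(x)

# 24 input channels + 4 layers x growth rate 12 -> 72 output channels.
y = DenseBlock(4, 24)(torch.randn(1, 24, 8, 8))
```

With growth rate k, the l-th layer receives k0 + k(l - 1) input channels, which is why each layer's own output can stay small.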

3 EfficientNet model

The MnasNet model is an automated neural architecture search method, proposed by the Google team, for CNN models on resource-constrained devices; it is implemented using the idea of reinforcement learning.

3.1 Steps of EfficientNet Model

1. Use the MnasNet method, implemented with a reinforcement learning algorithm, to generate the baseline model EfficientNet-B0.
2. Use the compound scaling method: under preset memory and computation constraints, simultaneously scale the EfficientNet-B0 model along three dimensions, namely depth, width (the number of channels of the feature maps), and image size. The scaling factors for the three dimensions are obtained by grid search, and the final EfficientNet model is output (see the sketch below).
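The grid-searched factors reported in the paper (arXiv 1905.11946) for the base model are roughly α = 1.2 for depth, β = 1.1 for width, and γ = 1.15 for resolution, under the constraint α·β²·γ² ≈ 2. A minimal sketch of how a single coefficient φ then drives all three dimensions:

```python
# Compound scaling sketch (factor values from arXiv:1905.11946):
# one coefficient phi scales depth, width, and resolution together.
alpha, beta, gamma = 1.2, 1.1, 1.15  # grid-searched on the base model

def compound_scale(phi):
    depth_mult = alpha ** phi       # multiplier for the number of layers
    width_mult = beta ** phi        # multiplier for channels per layer
    resolution_mult = gamma ** phi  # multiplier for input image size
    return depth_mult, width_mult, resolution_mult

for phi in range(4):  # roughly corresponds to EfficientNet-B0..B3
    print(phi, compound_scale(phi))
```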

3.2 Schematic diagram of parameter tuning of the EfficientNet model

  • Figure 1-12(a) is the benchmark model.
  • Figure 1-12(b) performs width scaling based on the benchmark model, that is, it increases the number of channels of the feature maps.
  • Figure 1-12(c) is depth scaling based on the baseline model, that is, increasing the number of layers of the network.
  • Figure 1-12(d) scales the image size based on the benchmark model.
  • Figure 1-12(e) scales the depth, width, and size of the image at the same time on the basis of the benchmark model.

3.3 Effect of EfficientNet Model

On the ImageNet dataset, the EfficientNet model reaches a Top-1 accuracy of 84.4% and a Top-5 accuracy of 97.1%, yet it is only 1/8.4 the size of the best existing deep convolutional model and runs 6.1 times faster.

The EfficientNet model meets the goal of reducing a model's computation and memory requirements without reducing its accuracy (see the paper numbered "1905.11946" on the arXiv website).

3.4 MBConv convolution block

The interior of the EfficientNet model is built from multiple MBConv convolution blocks.

3.4.1 Features of MBConv convolution block

The MBConv convolution block uses a structure similar to a residual connection. The differences are that an SE module is used on the shortcut-connection part, the commonly used ReLU activation function is replaced by the Swish activation function, and a DropConnect layer is used in place of the traditional Dropout layer.

3.4.2 Structure of MBConv Convolutional Block

3.4.3 Tip

No BN operation is used inside the SE module, and its Sigmoid activation function is not replaced by the Swish activation function.
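A minimal PyTorch sketch of an SE module consistent with this tip (the reduction ratio of 4 is an illustrative assumption; nn.SiLU is PyTorch's built-in Swish):

```python
import torch
from torch import nn

class SqueezeExcite(nn.Module):
    """SE module: squeeze spatial dims, then gate channels.
    Per the tip above: no BN inside, and the gate stays Sigmoid."""
    def __init__(self, channels, reduction=4):  # reduction=4 is an assumption
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),                        # Swish activation
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                     # not replaced by Swish
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # reweight each channel
```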

3.5 DropConnect

In deep neural networks, both the DropConnect layer and the Dropout layer serve to prevent the model from overfitting; the DropConnect layer tends to work better.

3.5.1 Comparison between DropConnect layer and Dropout layer

DropConnect layer: while training the neural network model, the inputs of hidden-layer nodes (individual connection weights) are randomly dropped.

Dropout layer: while training the neural network model, the outputs of hidden-layer nodes are randomly dropped.
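A toy sketch of the difference on a single linear layer (illustrative only; the 1/(1-p) rescaling follows the usual inverted-dropout convention):

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 3)   # weight matrix of a small linear layer
x = torch.randn(3)      # input vector
p = 0.5                 # drop probability

# Dropout: randomly zero the layer's *outputs* (one mask entry per node).
out_mask = (torch.rand(4) > p).float()
dropout_out = (w @ x) * out_mask / (1 - p)

# DropConnect: randomly zero individual *weights* (one mask entry per
# connection, i.e. per node input), then apply the thinned matrix.
w_mask = (torch.rand(4, 3) > p).float()
dropconnect_out = ((w * w_mask) @ x) / (1 - p)

print(dropout_out, dropconnect_out, sep="\n")
```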

3.5.2 DropConnect layer and Dropout layer structure diagram

 

   
