Convolutional Neural Network Model

Convolutional Neural Network (LeNet)

Model structure: convolutional layer block + fully connected layer block

  • Convolutional layer block: two convolutional layers, each paired with a max pooling layer. Since LeNet is an early CNN, each convolution + pooling stage uses a sigmoid activation to transform its output; nowadays ReLU is used more often.
  • Fully connected layer block: its input is two-dimensional (batch size × features). When the output of the convolutional layer block is passed to the fully connected block, each sample in the mini-batch is flattened into a vector.

As LeNet deepens, the height and width of the feature maps gradually decrease while the number of channels increases.
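A minimal PyTorch sketch of the stack described above, assuming 1x28x28 grayscale inputs (e.g. Fashion-MNIST) and the commonly used channel/unit sizes (6, 16, 120, 84), which are not stated in the text:

```python
import torch
from torch import nn

# LeNet-style stack: 2x (conv + sigmoid + 2x2 max pooling), then flatten
# and three fully connected layers. Channel/unit sizes are assumptions.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),                       # each sample is flattened before the FC block
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

x = torch.rand(1, 1, 28, 28)
print(lenet(x).shape)  # torch.Size([1, 10])
```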

Deep Convolutional Neural Network (AlexNet)

Model structure: 5 convolutional layers + 2 fully connected hidden layers + 1 fully connected output layer (see the code sketch after this list)

  • Convolutional layers: the first two use 11x11 and 5x5 kernels, the rest use 3x3 kernels. The first, second, and fifth convolutional layers are each followed by a 3x3 max pooling layer with stride 2.
  • Fully connected layers: the two fully connected layers with 4096 outputs together account for nearly 1GB of model parameters.
  • Activation function: AlexNet uses ReLU. Compared with sigmoid, ReLU is cheaper to compute and easier to train under different initializations. For example, when the output of sigmoid is very close to 0 or 1, its gradient is almost 0, which under some initializations makes it hard for the model to keep updating; the gradient of ReLU in the positive interval, by contrast, is always 1.
  • Overfitting: AlexNet uses dropout to control model complexity and prevent overfitting, and applies extensive image augmentation (flipping, cropping, color changes, etc.) to further reduce overfitting.
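A PyTorch sketch of the layer stack just described; the channel counts (96, 256, 384) and the 1x224x224 input size are assumptions following the common textbook variant, not specified above:

```python
import torch
from torch import nn

# AlexNet-style stack: 5 conv layers (11x11, 5x5, then 3x3), max pooling after
# conv 1, 2 and 5, two 4096-unit FC layers with dropout, and an output layer.
alexnet = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 10),
)

print(alexnet(torch.rand(1, 1, 224, 224)).shape)  # torch.Size([1, 10])
```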

Networks using repeating elements (VGG)

Model structure: VGG block + fully connected layer block

  • VGG block: convolutional layers + a pooling layer. The convolutional layers all use 3x3 kernels with padding 1, followed by a max pooling layer with a 2x2 window and stride 2 (see the sketch below).
  • Fully connected layer block: similar to LeNet

VGG is a very regular network: from block to block the number of channels grows while the height and width shrink, each by a constant factor. Compared with AlexNet, it offers a simple, fixed template for building deep convolutional models.
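A PyTorch sketch of a single VGG block as described; the number of convolutions and the channel counts in the usage line are illustrative assumptions:

```python
import torch
from torch import nn

# A VGG block: num_convs 3x3 convolutions with padding 1 (height/width preserved),
# each followed by ReLU, closed by a 2x2 max pooling layer with stride 2 that
# halves the height and width.
def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

blk = vgg_block(2, 64, 128)
print(blk(torch.rand(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 16, 16])
```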

Networks in Networks (NiN)

Model structure: NiN block

  • NiN block: AlexNet stacks several convolutional layers and ends with fully connected layers for output. NiN proposes a different idea: build the network by chaining small blocks, each made of a convolutional layer followed by layers that act like fully connected layers. Since a fully connected layer's input is two-dimensional while a convolutional layer's is generally four-dimensional, the NiN block uses 1x1 convolutional layers in place of fully connected layers (each element in the spatial dimensions, height and width, plays the role of a sample, and the channels play the role of features). The ordinary convolutional layers mirror AlexNet: 11x11, 5x5, and 3x3. Each NiN block is followed by a max pooling layer with a 3x3 window and stride 2.

Compared with AlexNet, NiN removes the last 3 fully connected layers. Instead, it uses a NiN block whose number of output channels equals the number of label classes, followed by a global average pooling layer that averages all elements in each channel and uses the result directly for classification. The benefit is a significant reduction in the number of model parameters, though it can increase training time.
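A PyTorch sketch of a NiN block and of the global-average-pooling classification head described above; the specific channel numbers (384 inputs, 10 classes) are assumed for illustration:

```python
import torch
from torch import nn

# A NiN block: one ordinary convolution followed by two 1x1 convolutions acting
# as per-pixel "fully connected" layers.
def nin_block(in_channels, out_channels, kernel_size, stride, padding):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
    )

# Classification head: NiN block with 10 output channels (one per class),
# then global average pooling and a flatten, with no fully connected layers.
head = nn.Sequential(
    nin_block(384, 10, kernel_size=3, stride=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
)
print(head(torch.rand(1, 384, 5, 5)).shape)  # torch.Size([1, 10])
```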

Networks with parallel connections (GoogLeNet)

  • Inception block: the basic building block of GoogLeNet, which borrows NiN's idea of networks within a network. Each Inception block contains 4 parallel branches. The first three branches use 1x1, 3x3, and 5x5 convolutional layers to extract feature information at different spatial scales; in the second and third branches, a 1x1 convolutional layer first reduces the number of input channels to lower the model's complexity. The last branch uses a 3x3 max pooling layer followed by a 1x1 convolutional layer to change the number of channels. All 4 branches use appropriate padding so that the input and output have the same height and width, and their outputs are concatenated along the channel dimension.
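A PyTorch sketch of one Inception block as described; the branch channel counts in the usage example are illustrative assumptions:

```python
import torch
from torch import nn
import torch.nn.functional as F

# Four parallel branches whose outputs keep the input height/width and are
# concatenated along the channel dimension. c1..c4 are the output channels of
# each branch (c2 and c3 are pairs: the 1x1 reduction, then the 3x3 / 5x5 conv).
class Inception(nn.Module):
    def __init__(self, in_channels, c1, c2, c3, c4):
        super().__init__()
        # Branch 1: 1x1 convolution
        self.b1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # Branch 2: 1x1 convolution to reduce channels, then 3x3 convolution
        self.b2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
        self.b2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Branch 3: 1x1 convolution to reduce channels, then 5x5 convolution
        self.b3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
        self.b3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Branch 4: 3x3 max pooling, then 1x1 convolution to change channels
        self.b4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.b4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

    def forward(self, x):
        b1 = F.relu(self.b1_1(x))
        b2 = F.relu(self.b2_2(F.relu(self.b2_1(x))))
        b3 = F.relu(self.b3_2(F.relu(self.b3_1(x))))
        b4 = F.relu(self.b4_2(self.b4_1(x)))
        return torch.cat((b1, b2, b3, b4), dim=1)

blk = Inception(192, 64, (96, 128), (16, 32), 32)
print(blk(torch.rand(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```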

Residual network (ResNet)

[Figure: ResNet residual block (https://d2l.ai/_images/resnet-block.svg)]

  • Residual block: normally, the input to an activation function is the result computed by the network layer by layer. But as the network keeps getting deeper, gradient instability (exploding or vanishing gradients) easily occurs, and the error does not necessarily keep decreasing with depth. The residual block is designed to address this instability: through a skip connection, the block's output also refers directly to its input.

  • Residual block principle: $a^{[l+2]} = g(z^{[l+2]} + a^{[l]}) = g(w^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]})$. Ignoring $b^{[l+2]}$ for now: when the gradient vanishes, $w^{[l+2]} = 0$, so $a^{[l+2]} = g(a^{[l]})$, which is equivalent to passing the earlier layer's output directly through the ReLU. Vanishing gradients therefore cause no negative impact.
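A PyTorch sketch of a residual block implementing the skip connection above; the batch normalization layers and the optional 1x1 convolution on the skip path follow a common implementation and are not mentioned in the text:

```python
import torch
from torch import nn
import torch.nn.functional as F

# The input x is added to the output of two 3x3 convolutions before the final
# ReLU, so even if the convolutional branch contributes nothing, the block can
# pass x through as an identity. An optional 1x1 convolution matches
# channels/stride on the skip path.
class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.conv3 = (nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
                      if use_1x1conv else None)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        if self.conv3 is not None:
            x = self.conv3(x)
        return F.relu(y + x)  # skip connection: g(z + a)

blk = Residual(64, 64)
print(blk(torch.rand(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```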

Densely connected network (DenseNet)

Model structure: dense block + transition layer

  • Dense block: DenseNet is very similar to ResNet; the difference is that instead of adding the module's input to its output as ResNet does, DenseNet concatenates them along the channel dimension.
  • Transition layer: to keep the repeated channel concatenation from making the model too complex, the transition layer reduces the number of channels with a 1x1 convolutional layer and halves the height and width with an average pooling layer of stride 2, further reducing complexity.
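A PyTorch sketch of a dense block and a transition layer as described; the growth rate and channel counts are illustrative assumptions:

```python
import torch
from torch import nn

# Each conv in the dense block sees the concatenation of all previous outputs
# along the channel axis (torch.cat instead of ResNet's addition); the
# transition layer then shrinks channels with a 1x1 convolution and halves
# height/width with average pooling.
def conv_block(in_channels, out_channels):
    return nn.Sequential(
        nn.BatchNorm2d(in_channels), nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
    )

class DenseBlock(nn.Module):
    def __init__(self, num_convs, in_channels, growth_rate):
        super().__init__()
        self.net = nn.ModuleList(
            conv_block(in_channels + i * growth_rate, growth_rate)
            for i in range(num_convs)
        )

    def forward(self, x):
        for blk in self.net:
            y = blk(x)
            x = torch.cat((x, y), dim=1)  # concatenate on the channel dimension
        return x

def transition_block(in_channels, out_channels):
    return nn.Sequential(
        nn.BatchNorm2d(in_channels), nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=1),  # reduce channels
        nn.AvgPool2d(kernel_size=2, stride=2),                # halve height/width
    )

dense = DenseBlock(2, 3, 10)          # 3 + 2*10 = 23 output channels
trans = transition_block(23, 10)
print(trans(dense(torch.rand(4, 3, 8, 8))).shape)  # torch.Size([4, 10, 4, 4])
```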

Source: blog.csdn.net/Kevin_Carpricron/article/details/124070006