A Clear and Straightforward Look at the AlexNet Network Structure

  • AlexNet is the network that won the ILSVRC 2012 competition, raising the classification accuracy from around 70% to around 80%. Since the field was in a bottleneck period at the time, a 10% improvement was very impressive. The network was designed by Hinton and his student Alex Krizhevsky, and it was from this year that deep learning began to develop rapidly.
    The following figure is taken from the original AlexNet paper:
    [Figure: AlexNet architecture diagram from the paper]
    The highlights of the network are:
    1> GPUs are used to accelerate network training for the first time;
    2> The ReLU activation function is used instead of the traditional Sigmoid and Tanh activation functions (Sigmoid is more troublesome to differentiate and its gradient vanishes when the network is deep);
    3> LRN (Local Response Normalization) is used;
    4> Dropout is used in the first two fully connected layers to randomly deactivate neurons and reduce overfitting.
    Note: the Dropout operation is applied between the flattened output of the last max-pooling downsampling layer and the fully connected layers, and it is the input neurons of the fully connected layer that are deactivated.
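
To make the first three highlights concrete, here is a minimal PyTorch sketch, assuming an illustrative channel count and the LRN hyperparameters reported in the paper; Dropout is illustrated separately in the next section. This is only a sketch, not the full AlexNet configuration:

```python
import torch
import torch.nn as nn

# Illustrative AlexNet-style block: ReLU instead of Sigmoid/Tanh, LRN, then max pooling.
# The channel count and LRN hyperparameters here are assumptions for the sketch.
block = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),                                        # ReLU activation
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),   # local response normalization
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.randn(1, 3, 224, 224)
print(block(x).shape)   # torch.Size([1, 96, 27, 27])
```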

So what exactly is overfitting?
The root causes of overfitting are too many feature dimensions, overly complex model assumptions, too little training data, and too much noise. An overfitted network predicts the training set almost perfectly, but performs poorly on new data in the test set: it fits the training data without considering the generalization ability of the model.

[Figure: illustration of overfitting]
How can overfitting be addressed?

  • The authors of AlexNet proposed using Dropout to randomly deactivate some neurons during the forward propagation of the network. In the figure, the left side shows a normal fully connected forward pass, where every node is fully connected to the nodes of the next layer. After Dropout is applied, a portion of the neurons in each layer is randomly deactivated. The Dropout operation can therefore be seen as reducing the number of parameters being trained in the network, which helps relieve overfitting. A minimal code sketch follows the figure below.
    [Figure: fully connected network without Dropout (left) vs. with Dropout (right)]
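
As a hedged illustration (the layer sizes below are placeholders, not AlexNet's real dimensions), the following snippet shows how nn.Dropout zeroes part of its input during training and is switched off in evaluation mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny fully connected head; the sizes are illustrative only.
dropout = nn.Dropout(p=0.5)   # each input neuron is deactivated with probability 0.5
fc = nn.Linear(8, 4)

x = torch.ones(1, 8)

dropout.train()               # training mode: Dropout is active
print(fc(dropout(x)))         # about half of the inputs are zeroed (survivors scaled by 1/(1-p))

dropout.eval()                # evaluation mode: Dropout is a no-op
print(fc(dropout(x)))         # all inputs pass through unchanged
```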

Let's start with a detailed explanation of AlexNet's network structure:
In the original paper the network is drawn as an upper half and a lower half, because the authors used two GPUs for parallel computation at the time. To make it easier to understand, we only need to analyze one half, since the upper and lower halves of the network are exactly the same. From the figure, the input is a color image of size [224, 224, 3].

  • The first convolutional layer has kernels of size [11, 11], stride = 4, and 48 kernels (48 in each of the upper and lower halves, so 96 in total). The figure does not give the padding; it only shows that the feature map after this convolution is [55, 55, 96]. Using the output-size formula N = (W − F + 2P) / S + 1 (W = input size, F = kernel size, P = padding, S = stride), this works out if we pad one column of zeros on the left of the input, two columns on the right, one row on top, and two rows on the bottom, i.e. a total padding of 3 in each dimension: (224 − 11 + 3) / 4 + 1 = 55 (see the shape-check sketch after this list);
  • The second layer is a max-pooling downsampling layer. The figure does not show the kernel size or stride of the pooling, but other references give kernel_size = 3, padding = 0, stride = 2, so the output is [27, 27, 96]. Note that a pooling operation only changes the width and height of the feature map; it does not change the depth of the feature matrix;
  • Next comes another convolutional layer. According to the annotations in the figure, the number of kernels is 128 × 2 = 256 and the kernel size is 5. From other references and source code, padding = [2, 2] and stride = 1, so the formula gives an output of [27, 27, 256];
  • This is followed by another max-pooling downsampling with kernel size 3, padding = 0, stride = 2, and the output is [13, 13, 256];
  • The third convolutional layer: from the figure, the number of kernels is 192 × 2 = 384 and the kernel size is 3; other references give padding = [1, 1], stride = 1. Substituting into the formula gives an output of [13, 13, 384];
  • The fourth convolutional layer: its configuration is exactly the same as the third convolutional layer, so both its input and output are [13, 13, 384];
  • The fifth convolutional layer: the number of kernels is 128 × 2 = 256, the kernel size is 3, padding = [1, 1], stride = 1, and the output is [13, 13, 256];
  • The last max-pooling downsampling layer: the figure gives no information about it, but from other materials and source code, kernel_size = 3, padding = 0, stride = 2, so the final output is [6, 6, 256];
  • Finally, three fully connected layers follow. They need little analysis: the output of the last downsampling layer is simply flattened and passed through the fully connected layers. The last layer is worth mentioning: it has 1000 nodes in the figure because the dataset used in the paper has one thousand classes. If we want to apply the network to our own dataset, we just change this number to however many classes we have.
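
To double-check the sizes derived above, here is a small hedged PyTorch sketch of the single-branch convolutional pipeline. Activations and LRN are omitted because they do not change shapes, and the asymmetric padding on the first layer is the assumption described above:

```python
import torch
import torch.nn as nn

# Shape check for the single-branch view of AlexNet described above.
features = nn.Sequential(
    nn.ZeroPad2d((1, 2, 1, 2)),                       # 1 col left, 2 right, 1 row top, 2 bottom
    nn.Conv2d(3, 96, kernel_size=11, stride=4),       # -> [96, 55, 55]
    nn.MaxPool2d(kernel_size=3, stride=2),            # -> [96, 27, 27]
    nn.Conv2d(96, 256, kernel_size=5, padding=2),     # -> [256, 27, 27]
    nn.MaxPool2d(kernel_size=3, stride=2),            # -> [256, 13, 13]
    nn.Conv2d(256, 384, kernel_size=3, padding=1),    # -> [384, 13, 13]
    nn.Conv2d(384, 384, kernel_size=3, padding=1),    # -> [384, 13, 13]
    nn.Conv2d(384, 256, kernel_size=3, padding=1),    # -> [256, 13, 13]
    nn.MaxPool2d(kernel_size=3, stride=2),            # -> [256, 6, 6]
)

x = torch.randn(1, 3, 224, 224)
for layer in features:
    x = layer(x)
    print(layer.__class__.__name__, tuple(x.shape))
# The final feature map is [256, 6, 6], i.e. 256 * 6 * 6 = 9216 values feed the fully connected layers.
```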

[Figure: layer-by-layer dimensions of AlexNet]
The overall structure of the AlexNet network is as follows:
[Figure: overall structure of the AlexNet network]
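
Since the figure is not reproduced here, a hedged single-branch PyTorch sketch of the overall structure might look like the following. It is based on the dimensions derived above, not the paper's exact two-GPU implementation, and num_classes is the value you would change for your own dataset:

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-branch sketch of AlexNet based on the dimensions above (illustrative only)."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.ZeroPad2d((1, 2, 1, 2)),
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> [256, 6, 6]
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                   # change num_classes for your own dataset
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        return self.classifier(x)

if __name__ == "__main__":
    model = AlexNetSketch(num_classes=5)                    # e.g. a hypothetical 5-class dataset
    print(model(torch.randn(1, 3, 224, 224)).shape)         # torch.Size([1, 5])
```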

Source: blog.csdn.net/qq_42308217/article/details/110004915