An in-depth understanding of the AlexNet network

  Any discussion of image classification has to mention the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which is regarded as the benchmark for measuring progress in deep learning research on the image classification task. The AlexNet network won the ILSVRC 2012 competition with roughly a 10% performance margin over the second-place entry. AlexNet is also the foundation and starting point of later classic networks such as VGGNet, GoogLeNet, ResNet, and DenseNet.
  There are three common convolution modes: full convolution, same convolution, and valid convolution. Suppose the input image is N × N pixels, the convolution kernel is F × F, and the stride is S. With valid convolution, no padding is used and the output size is (N − F) / S + 1.
  With the other mode, same convolution, the input is zero-padded so that the output size depends only on the input size and the stride: ⌈N / S⌉ (with a stride of 1 the output has the same size as the input, which is where the name comes from).
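  As a quick sanity check, here is a minimal Python sketch of these two formulas (the function name and the `mode` argument are just illustrative choices):

```python
import math

def conv_output_size(n, f, stride=1, mode="valid"):
    """Spatial output size of a convolution over an n x n input
    with an f x f kernel, for 'valid' and 'same' padding modes."""
    if mode == "valid":
        # no padding: the kernel must fit entirely inside the input
        return (n - f) // stride + 1
    if mode == "same":
        # zero-pad so the output depends only on input size and stride
        return math.ceil(n / stride)
    raise ValueError("mode must be 'valid' or 'same'")

print(conv_output_size(227, 11, stride=4, mode="valid"))  # 55
print(conv_output_size(224, 11, stride=4, mode="same"))   # 56
```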
  Alex Krizhevsky and his collaborators proposed the AlexNet network in the paper ImageNet Classification with Deep Convolutional Neural Networks. Krizhevsky et al. trained the convolutional neural network on the ImageNet training set using two GTX 580 3 GB GPUs and achieved high accuracy. They also pointed out that simply having faster GPUs and larger datasets would yield even better network performance.
  The AlexNet network proposed in the paper has 8 layers: 5 convolutional layers (the first, second, and fifth convolutional layers are each followed by a max pooling layer) and 3 fully connected layers (the last of which feeds a 1000-way softmax classifier). The network is split across two GPUs, which exchange data only at certain fixed layers. The cross-GPU parallelization technique spreads the convolutions evenly across the two GPUs, so the two GPUs hold identical network structures. For the analysis here the two GPUs are treated as a single whole; the structure is shown in Figure 3, where the local response normalization, dropout, and other operations that do not change the data size are not drawn. This version of the network uses valid convolution, so the relationship between input and output sizes follows the first formula above.
  The kernels of the second, fourth, and fifth convolutional layers are connected only to the kernel maps of the previous layer that reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps of the second layer. The neurons of the fully connected layers are connected to all neurons in the preceding layer. Response normalization is applied after the first and second convolutional layers. Max pooling layers follow the response normalization layers as well as the fifth convolutional layer. The ReLU activation function is applied to the output of every convolutional and fully connected layer.
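  On a single modern device this per-GPU connectivity can be approximated with grouped convolutions. The PyTorch sketch below is my own illustration, not the original two-GPU implementation: groups=2 splits the input maps into two halves, which reproduces the 5 × 5 × 48 kernels of conv2, while conv3 sees all 256 maps.

```python
import torch.nn as nn

# conv2/conv4/conv5 connect only to the kernel maps on "their" GPU; groups=2
# mimics this by letting each kernel see only 48 of the 96 input maps.
conv2 = nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2)
print(conv2.weight.shape)   # torch.Size([256, 48, 5, 5]) -> 5x5x48 kernels

# conv3, by contrast, is connected to all 256 maps of the previous layer.
conv3 = nn.Conv2d(256, 384, kernel_size=3, padding=1)
print(conv3.weight.shape)   # torch.Size([384, 256, 3, 3]) -> 3x3x256 kernels
```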
  The first convolutional layer filters the 227 × 227 × 3 input image with 96 kernels of size 11 × 11 × 3, using a stride of 4. The second convolutional layer takes the output of the first convolutional layer as its input and convolves it with 256 kernels of size 5 × 5 × 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer is connected to the (pooled and normalized) output of the second convolutional layer and has 384 kernels of size 3 × 3 × 256. The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth has 256 kernels of size 3 × 3 × 192. Each fully connected layer has 4096 neurons. This is the most concise description of the AlexNet network I've ever seen.
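  Putting the layer parameters above together, here is a minimal single-device PyTorch sketch of the 8-layer network, with the two GPU halves merged just as the analysis here does. The pooling and normalization hyperparameters follow the original paper and are assumptions on my part rather than something stated in this post.

```python
import torch
import torch.nn as nn

# A single-device sketch of the 8-layer AlexNet described above
# (the two GPU halves are merged into one stack of layers).
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),                 # conv1: 227x227x3 -> 55x55x96
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2),      # conv2: 5x5x48 kernels
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                       # -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1),                # conv3: 3x3x256 kernels
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1, groups=2),      # conv4: 3x3x192 kernels
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1, groups=2),      # conv5: 3x3x192 kernels
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                        # -> 6x6x256
)
classifier = nn.Sequential(
    nn.Dropout(0.5),
    nn.Linear(256 * 6 * 6, 4096),                                 # fc6
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(4096, 4096),                                        # fc7
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                                        # fc8: feeds the 1000-way softmax
)

x = torch.randn(1, 3, 227, 227)
out = classifier(features(x).flatten(1))
print(out.shape)   # torch.Size([1, 1000])
```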
  There is also a version of the network structure that uses same convolution. The network configuration is the same, but the sizes of the data flowing through it differ. It is shown in Figure 4; as before, the local response normalization, dropout, and other operations that do not change the data size are not drawn.
  A detailed analysis follows. Figure 5 is the data flow diagram of the conv1 stage.
  Here the first convolutional layer, the first pooling layer, and the local response normalization are considered together as the conv1 stage. The input image of the first convolutional layer is resized to a fixed 224 × 224 × 3. The input image is then convolved with 11 × 11 × 3 kernels, and the convolution mode chosen is same convolution. Each convolution produces one new pixel of the feature map; the kernel moves over the original image to the right and downward, with a stride of 4 pixels per step. Where the original image runs out of pixels, zeros are padded. In each row, the movement of the kernel produces 56 new pixels in the feature map, so 56 rows and 56 columns of pixels are formed after the original image is convolved. The conv1 stage therefore produces feature maps of size 56 × 56 × 96. These feature maps pass through ReLU, and their size is still 56 × 56 × 96. The pooling layer then pools the feature maps output by the convolutional layer. The zero-padding scheme is the same as in the first convolutional layer, i.e. same convolution. After pooling, the width and height of the output are 26 pixels, so the pooled data has size 26 × 26 × 96. Finally, local response normalization is applied; the data size does not change. The output is fed into the conv2 stage, where the convolution and pooling process is repeated; since the other convolutional stages are similar to conv1, they are omitted.
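  The convolution step of this conv1 stage can be checked with a short PyTorch sketch. Note that PyTorch's padding='same' does not allow a stride of 4, so the zero padding is added by hand here; the asymmetric 3/4 split is my own choice and only meant to reproduce the 56 × 56 × 96 output size.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 224, 224)        # input image after resizing to 224x224x3
w = torch.randn(96, 3, 11, 11)          # 96 convolution kernels of size 11x11x3

# "same" convolution with stride 4: the target size is ceil(224 / 4) = 56,
# which needs (56 - 1) * 4 + 11 - 224 = 7 zero-padded pixels per axis,
# split here as 3 on the top/left and 4 on the bottom/right.
x_padded = F.pad(x, (3, 4, 3, 4))
feat = F.relu(F.conv2d(x_padded, w, stride=4))
print(feat.shape)                       # torch.Size([1, 96, 56, 56])
```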
  Although Figure 5 does not draw the local response normalization operation, AlexNet has a local response normalization layer after each pooling layer. The purpose of the normalization layer is to prevent the information from shrinking layer by layer as the network gets deeper, and it helps accelerate the convergence of the neural network. However, studies in recent years indicate that local response normalization layers help the training of neural networks very little, so they have gradually fallen out of use.
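  For reference, PyTorch provides this operation as nn.LocalResponseNorm; the hyperparameters below are the ones from the AlexNet paper (n = 5, α = 1e-4, β = 0.75, k = 2) and are not taken from this post.

```python
import torch
import torch.nn as nn

# Local response normalization: each activation is divided by a term computed
# from the activations of neighboring channels, so the 56x56x96 size is kept.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
feat = torch.randn(1, 96, 56, 56)
print(lrn(feat).shape)   # torch.Size([1, 96, 56, 56])
```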
  Figure 6 analyzes the data flow of the fc6 stage.
  The sixth layer of AlexNet is a fully connected layer. Its input data has size 7 × 7 × 256, and it is convolved with kernels of exactly the same size. Each convolution of the 7 × 7 × 256 input produces a single value as the result of that convolution. The fc6 layer has 4096 neurons in total; each neuron outputs one convolution result, so 4096 convolution results are output altogether. The convolution results pass through the ReLU activation function and then through a random deactivation (dropout) operation that keeps each neuron with probability 50%. The final output of the fc6 stage is a vector of size 4096.
  During the fc6 stage, the convolution kernels used have the same size as the feature maps, 7 × 7 × 256; that is, each parameter in the kernel is multiplied by exactly one pixel value of the feature map. In the other convolutional layers, each kernel coefficient corresponds to multiple pixel values. This one-to-one correspondence is the origin of the name "fully connected layer".
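  The equivalence between this whole-feature-map convolution and an ordinary fully connected layer can be seen in a short sketch (the layer names here are illustrative):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 256, 7, 7)              # conv5 output in the same-convolution variant

# fc6 as a convolution: 4096 kernels of size 7x7x256, each covering the whole
# feature map, so every kernel weight multiplies exactly one pixel value.
fc6_as_conv = nn.Conv2d(256, 4096, kernel_size=7)
print(fc6_as_conv(feat).shape)                 # torch.Size([1, 4096, 1, 1])

# The equivalent fully connected layer on the flattened 7*7*256 vector:
fc6_as_linear = nn.Linear(256 * 7 * 7, 4096)
print(fc6_as_linear(feat.flatten(1)).shape)    # torch.Size([1, 4096])
```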
  The random deactivation (dropout) strategy used in Figure 6 can effectively prevent overfitting. The principle is that during training, in each batch, some hidden-layer neurons are randomly made not to participate in the computation, while the weights of these neurons are still retained. The conclusion behind the dropout strategy is that the neural network improves its performance by reducing the interdependence between neurons.
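  A minimal sketch of the dropout behaviour described here, using PyTorch's nn.Dropout with the 50% keep probability mentioned above:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each neuron is kept with probability 0.5 during training
x = torch.ones(8)

drop.train()
print(drop(x))   # roughly half the entries are zeroed; the survivors are scaled by 1/0.5
drop.eval()
print(drop(x))   # at inference time dropout is a no-op: all ones
```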
  The fc7 stage is the same as the fc6 stage. The 4096 values output by the fc6 layer are fully connected to the 4096 neurons of fc7, pass through the ReLU activation function and then through dropout, and a vector of size 4096 is output. The fc7 output is then fully connected to the 10 neurons of the eighth layer; after training, the softmax layer outputs the probability values of the 10 target classes.
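  A small sketch of this classification head, using the 10-class output described here (the original ILSVRC model uses 1000 classes); the tensor shapes and names are illustrative:

```python
import torch
import torch.nn as nn

# fc7 output (4096 values) feeds the final fully connected layer; softmax turns
# the 10 raw scores into class probabilities, as in the 10-class setup above.
fc7_out = torch.randn(1, 4096)
fc8 = nn.Linear(4096, 10)
probs = torch.softmax(fc8(fc7_out), dim=1)
print(probs.shape)          # torch.Size([1, 10]); the probabilities sum to 1
```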
  The AlexNet network made two important contributions. First and foremost, many of the settings used by AlexNet became the common default settings for convolutional neural networks and are still widely used. In addition, it showed researchers that a highly optimized GPU implementation of 2D convolution is sufficient to support the training of large-scale CNN networks on large, high-resolution datasets.
  The AlexNet network is a very significant milestone. In terms of gradient descent training time, saturating nonlinearities such as the tanh or sigmoid functions are much slower than the non-saturating ReLU function. AlexNet uses ReLU as its activation function, which speeds up the training of the neural network; today ReLU has almost become the default activation function for deep networks. The use of data augmentation and the random deactivation (dropout) strategy to prevent overfitting of the training data has also been adopted by many later networks. AlexNet's pattern of stacked convolution and pooling layers followed by fully connected layers remains the foundation of today's most advanced networks.
