SIGAI Deep Learning, Episode 8: Convolutional Neural Networks 2

This episode covers LeNet, AlexNet, VGGNet, GoogLeNet and other classic convolutional neural networks, the Inception module, small convolution kernels, 1x1 convolution kernels, and visualization of convolution layers implemented with deconvolution.

Outline:

LeNet network

AlexNet network

VGG network

GoogLeNet network

Deconvolution Visualization

Mathematical properties

Reconstructing the image from convolution results

Summary of this episode

LeNet network:

The LeNet-5 network was proposed by Y. LeCun in 1998. Y. LeCun is now revered as the father of convolutional neural networks; he later went to Facebook's AI lab.

This was the first widely circulated convolutional network. It is very small in scale but complete (convolution, pooling, and fully connected layers are all present), was designed for handwritten character recognition, and uses the standard structure of convolution layers, pooling layers, and fully connected layers. Later convolutional network designs have all borrowed its ideas.

In the ten-odd years after this method was proposed it saw neither large-scale use nor widespread attention; SVM and AdaBoost held the advantage. Between the appearance of LeNet in 1998 and the appearance of AlexNet in 2012, convolutional neural networks did not develop very well.

LeNet network structure:

The network is trained on the MNIST dataset and has two convolution layers, two pooling layers, and several fully connected layers. The input is a 32 × 32 grayscale image. The first convolution layer has 6 convolution kernels of size 5 × 5, so after this layer there are 6 images (6 channels) of size 28 × 28; after 2 × 2 pooling they become 6 images of size 14 × 14.

The second convolution layer has 16 groups of convolution kernels, each of size 5 × 5. When multichannel convolution was discussed earlier, the conventional practice was described: since the pooling layer above outputs 6 channels, each of the 16 kernel groups should convolve all 6 input channels and sum the results. LeNet-5 does not do this: the 0th kernel group, for example, convolves only the first 3 channels, and the later kernel groups each connect to a different subset of channels, as shown in the figure.

[Figure: connection table between the 6 input channels of the pooling layer and the 16 kernel groups of the second convolution layer]

After this convolution there are 16 images (16 channels) of size 10 × 10, and the second pooling layer downsamples them to 16 images of size 5 × 5.

Then come the fully connected layers: the 16 images of 5 × 5 are flattened, and all the pixels are spliced into a single vector that serves as the input of the fully connected network. This is followed by a fully connected layer of 120 neurons, then a fully connected layer of 84 neurons, and finally an output layer of 10 neurons, i.e. the 10 class labels.

Each convolution layer and each fully connected layer is followed by an activation function; the tanh function is used throughout. The loss function is the Euclidean distance (earlier networks mostly used Euclidean loss; later networks use the cross-entropy loss). Training is done with gradient descent, and the sample labels (ten classes) are one-hot encoded.
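To make this concrete, here is a minimal sketch of a LeNet-5-style network in PyTorch. This is an assumption on my part: the lecture names no framework, and the partially connected second convolution layer is replaced by a full 6-to-16 convolution, as most modern re-implementations do; training uses the Euclidean (MSE) loss on one-hot labels as described above.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 input -> 6 channels of 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 6 channels of 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16 channels of 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 16 channels of 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                     # 16 * 5 * 5 = 400 inputs
            nn.Linear(400, 120), nn.Tanh(),   # fully connected layer, 120 neurons
            nn.Linear(120, 84), nn.Tanh(),    # fully connected layer, 84 neurons
            nn.Linear(84, num_classes),       # output layer, 10 classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One gradient-descent step with Euclidean (MSE) loss on one-hot labels.
model = LeNet5()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 1, 32, 32)                 # a batch of 32x32 grayscale images
y = nn.functional.one_hot(torch.randint(0, 10, (8,)), 10).float()
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```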

AlexNet network:

Because of limits on computing power (GPUs were not yet used for large-scale machine learning) and on the number of training samples (digital cameras and mobile phones were not yet widespread), and because increasing the number of layers caused problems such as vanishing gradients, convolutional neural networks did not receive widespread attention or large-scale application between the proposal of LeNet in 1989 and 2012.
It was not until 2012 that Hinton and his colleagues designed a deep convolutional neural network called AlexNet (the network is named after its first author Alex; it is a deep neural network and not much different from LeNet in nature) that succeeded on the image classification task (on the ImageNet dataset, which has far more images and labels, whereas LeNet used the MNIST dataset).

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. 2012.

[Figure: AlexNet network structure]

The convolution kernels of the first layer are 11 × 11, and the input is a 224 × 224 three-channel RGB color image (LeNet's input is a single-channel 32 × 32 image with 5 × 5 kernels). The convolutions are split into two groups (the original implementation ran them on two GPUs); after the convolutions the output is transformed with softmax, and the convolution kernels become gradually smaller from layer to layer. Compared with LeNet, the structure is deeper, with more neurons and many more parameters.
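For a quick hands-on look, the sketch below instantiates torchvision's AlexNet variant and pushes one 224 × 224 RGB image through it. This assumes a recent torchvision (0.13 or later), and note that torchvision's version differs slightly from the original two-group, two-GPU network of 2012.

```python
import torch
from torchvision.models import alexnet

model = alexnet(weights=None)        # untrained AlexNet-style network (torchvision variant)
x = torch.randn(1, 3, 224, 224)      # one 224x224 RGB image, as described above
logits = model(x)
print(logits.shape)                  # torch.Size([1, 1000]) -> the 1000 ImageNet classes
```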

Major improvements of the AlexNet network:

More layers, more parameters, a larger scale, more training samples (the ImageNet dataset, whereas LeNet used the MNIST dataset), and the use of GPU acceleration.
The real innovations:

① A new activation function, the ReLU function (LeNet used tanh; even earlier networks used sigmoid). The tanh function saturates easily, causing the gradient to vanish.

ReLU(x) = max(0, x). It alleviates the vanishing gradient problem to a certain extent (alleviates, not eradicates): its derivative is simple, equal to the constant 1 for x > 0, so the gradient is not shrunk by being multiplied many times by derivatives smaller than 1; for x < 0 the derivative is 0 and the neuron is simply not activated, which does not affect the overall picture.
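As a tiny numeric check of this claim (my own sketch, not from the lecture), the snippet below compares the gradients that ReLU and tanh pass back: ReLU's gradient is exactly 1 wherever the input is positive and 0 elsewhere, while tanh's gradient is always below 1 and shrinks toward 0 once the unit saturates.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)     # tensor([0., 0., 1., 1.]) -> gradient is 0 or exactly 1

y = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
torch.tanh(y).sum().backward()
print(y.grad)     # tanh'(y) = 1 - tanh(y)^2 < 1, close to 0 for saturated inputs
```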

② Dropout mechanism

Dropout is a regularization technique. During training, a portion of the neurons is randomly picked not to participate in training — for example, out of 128 neurons, 64 might be randomly selected; their data is passed through unchanged and they are not updated during backpropagation. The mechanism is used only while training the neural network; once training is complete it is no longer applied.

In each training iteration a randomly selected portion of the neurons takes part in forward propagation and backpropagation, while the parameter values of the other neurons are kept unchanged; the purpose is to reduce over-fitting.
The dropout mechanism means that each neuron is trained on only a portion of the samples in the sample set, which amounts to sampling the sample set — i.e. the practice of bagging. The result can be viewed as a combination of multiple neural networks, although this is not a strict interpretation.

It is similar to the bagging mechanism in machine learning, where a training set is randomly drawn from the whole large set of N samples to train each weak learner, so that the trained learners are independent of each other given their respective training sample sets. Here, however, what is trained is a single whole neural network, so it cannot strictly be seen as an ensemble of multiple weak learners.
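Below is a minimal sketch of dropout in the form used by modern frameworks (inverted dropout, as in torch.nn.Dropout). It is my assumption of the concrete form rather than code from the lecture: during training a random subset of activations is zeroed (roughly 64 of 128 at p = 0.5) and the survivors are rescaled, while at test time the function is the identity.

```python
import torch

def dropout(x, p=0.5, training=True):
    """Zero a random fraction p of activations during training and rescale the
    rest so the expected activation stays the same; do nothing at test time."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()   # keep each activation with probability 1-p
    return x * mask / (1.0 - p)

h = torch.randn(4, 128)                       # e.g. 128 hidden activations per sample
h_train = dropout(h, p=0.5)                   # training: roughly 64 of the 128 survive
h_eval = dropout(h, p=0.5, training=False)    # test time: identity, all neurons active
```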

This small trick achieves very good results: it reduces, though does not eliminate, over-fitting. In 2012 AlexNet won first place on the ImageNet 1000-class classification problem, well ahead of second place and improving considerably on the 2011 winning score, and from then on convolutional neural networks received large-scale attention. The ReLU function itself appeared much earlier; this was the first time it was used in a convolutional neural network, and it alleviates rather than eradicates the vanishing gradient problem.



Origin www.cnblogs.com/wisir/p/11826291.html