Advantages of the ReLU activation function

ReLU advantages:

1. It makes the network train faster.

  Compared with sigmoid and tanh, the derivative of ReLU is easier to compute. Back-propagation updates the parameters at every step, and because ReLU's derivative has such a simple form, each update is cheaper.

2. It adds non-linearity to the network.

 ReLU is itself a non-linear function; adding it lets the neural network fit non-linear mappings.

3. It helps prevent vanishing gradients.

When the input is very large or very small, the derivatives of sigmoid and tanh are close to 0; ReLU is a non-saturating activation function and does not suffer from this.

4. It makes the network sparse.

Since the part of the input below 0 is set to 0 and only the part above 0 passes through, many activations are zero, which can help reduce over-fitting (see the sketch below).
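
To make points 1, 3, and 4 concrete, here is a minimal NumPy sketch (the helper names relu, relu_grad and sigmoid_grad are illustrative, not from any library): ReLU's derivative is a cheap 0/1 mask that does not shrink for positive inputs, while the sigmoid derivative is at most 0.25 and vanishes for large inputs, and the negative inputs are zeroed out, giving sparse activations.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and zeroes out the rest
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is simply 1 for x > 0 and 0 otherwise, so it is cheap to compute
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, and close to 0 for large |x|

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu(x))          # [0. 0. 0. 1. 5.]  -> sparse activations
print(relu_grad(x))     # [0. 0. 0. 1. 1.]  -> gradient does not shrink for x > 0
print(sigmoid_grad(x))  # ~[0.007 0.197 0.25 0.197 0.007] -> vanishes for large |x|
```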

The role of softmax:

It turns the output of the neural network into a probability distribution:

1. The values sum to 1.

2. Negative values are mapped to positive ones.
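
A minimal NumPy sketch of softmax (the softmax helper below is illustrative; deep-learning frameworks ship their own implementations). Exponentiation turns negative scores into positive values, and dividing by the sum makes them add up to 1.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; exp() maps negative scores to positive values
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, -3.0])
probs = softmax(logits)
print(probs)        # ~[0.73 0.27 0.005], all positive
print(probs.sum())  # 1.0 (up to floating-point rounding)
```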

Cross entropy

Cross entropy measures the distance between two probability distributions:

1. The smaller the value, the closer the distributions.

2. The larger the value, the farther apart they are.
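
A small NumPy sketch of the cross-entropy H(p, q) = -sum_i p_i * log(q_i) with a one-hot target; the helper name and the example distributions are made up for illustration. A prediction close to the target yields a small value, a prediction far from it a larger one.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); eps avoids log(0)
    return -np.sum(p * np.log(q + eps))

target = np.array([0.0, 1.0, 0.0])      # true distribution (one-hot label)
good   = np.array([0.05, 0.90, 0.05])   # prediction close to the target
bad    = np.array([0.60, 0.20, 0.20])   # prediction far from the target

print(cross_entropy(target, good))  # ~0.105, small value, distributions are close
print(cross_entropy(target, bad))   # ~1.609, larger value, distributions are far apart
```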

Advantages of AlexNet:

(1) It successfully used ReLU as the activation function of a CNN and verified that it outperforms sigmoid in deeper networks, solving the gradient-diffusion (vanishing-gradient) problem that sigmoid causes in deep networks.

(2) It used Dropout during training to randomly ignore a subset of neurons and avoid over-fitting. Although Dropout was discussed in a separate paper, AlexNet put it to practical use and confirmed its effect in practice. In AlexNet, Dropout is mainly applied in the last few fully connected layers.
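
As a rough illustration of the dropout idea, here is a NumPy sketch using the modern "inverted dropout" formulation (rescaling at training time), which differs slightly from the original paper's test-time rescaling; the dropout helper and the drop probability p=0.5 are assumptions for the example.

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    # Inverted dropout: during training, randomly zero a fraction p of the units
    # and rescale the survivors so the expected activation stays unchanged.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask

fc_out = np.ones((1, 8))       # stand-in for a fully connected layer's output
print(dropout(fc_out, p=0.5))  # roughly half the units are zeroed, the rest scaled to 2.0
```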

(3) It used overlapping max pooling in the CNN. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, avoiding the blurring effect of average pooling. It also made the stride smaller than the pooling kernel size, so the outputs of neighbouring pooling windows overlap, enriching the features.
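
A 1-D NumPy sketch of overlapping max pooling with window size 3 and stride 2, the configuration AlexNet uses; because the stride is smaller than the window, neighbouring windows share elements. The helper is illustrative, not a framework API.

```python
import numpy as np

def max_pool_1d(x, size=3, stride=2):
    # Overlapping pooling: the window size (3) is larger than the stride (2),
    # so neighbouring windows share elements.
    out = []
    for start in range(0, len(x) - size + 1, stride):
        out.append(x[start:start + size].max())
    return np.array(out)

feature_row = np.array([1.0, 5.0, 2.0, 8.0, 3.0, 0.0, 4.0])
print(max_pool_1d(feature_row))  # windows [1,5,2], [2,8,3], [3,0,4] -> [5. 8. 4.]
```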

(4) It proposed the LRN layer, which creates a competition mechanism among local neuron activities: responses with relatively large values become relatively larger, while neurons with smaller responses are suppressed, improving the model's generalization ability.
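
A rough NumPy sketch of cross-channel LRN at a single spatial position, following the formula b_i = a_i / (k + alpha * sum_j a_j^2)^beta from the AlexNet paper, where the sum runs over n neighbouring channels; the hyper-parameters shown (n=5, k=2, alpha=1e-4, beta=0.75) are the values reported there.

```python
import numpy as np

def lrn(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # Cross-channel LRN: each activation is divided by a term that grows with the
    # squared activity of its n neighbouring channels, so strong responses
    # suppress their weaker neighbours.
    N = len(a)
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N, i + n // 2 + 1)
        b[i] = a[i] / (k + alpha * np.sum(a[lo:hi] ** 2)) ** beta
    return b

channel_acts = np.array([1.0, 3.0, 10.0, 3.0, 1.0])  # activations of 5 channels at one pixel
print(lrn(channel_acts))
```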

(5) It used CUDA to accelerate the training of the deep convolutional network, exploiting the powerful parallel computing capability of GPUs to handle the large number of matrix operations involved in training. AlexNet was trained on two GTX 580 GPUs, and its design lets the GPUs communicate only in certain layers of the network, keeping the performance cost of communication under control.

(6) Data augmentation. Random 224*224 patches are cropped from the original 256*256 images (together with horizontal flips/mirroring), PCA is applied to the RGB pixel data, and a Gaussian perturbation with mean 0 and standard deviation 0.1 is added along the principal components to introduce noise.
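
A minimal NumPy sketch of the random-crop and horizontal-flip part of this augmentation; the PCA colour perturbation is omitted for brevity, and the helper name and random stand-in image are illustrative.

```python
import numpy as np

def random_crop_and_flip(img, crop=224):
    # Randomly crop a 224x224 patch from a larger image and
    # flip it horizontally with probability 0.5.
    h, w, _ = img.shape
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]
    return patch

image = np.random.rand(256, 256, 3)       # stand-in for a 256x256 RGB training image
print(random_crop_and_flip(image).shape)  # (224, 224, 3)
```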
