Vanishing and Exploding Gradients

Vanishing gradients and exploding gradients can be explained from the same point of view. The fundamental cause is that a neural network is trained by backpropagation: the weights between neurons are updated layer by layer using the chain rule of differentiation, so the gradient that reaches an early layer is a product of per-layer weights and activation-function derivatives. Each neuron's input is passed through an activation function; if we choose the sigmoid as the activation function,

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

its derivative is

$$\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \frac{1}{4}.$$

From this we can see that if w is initialized with standard small values, each layer contributes a factor equal to a weight between 0 and 1 multiplied by the derivative of the activation function, which for the sigmoid is at most 0.25. Multiplying many such factors drives the result toward zero, and the gradient vanishes. Conversely, if w is initialized with large values, so that |w| times the activation derivative exceeds 1 at every layer, the repeated multiplication makes the gradient grow exponentially, producing a gradient explosion.
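As a rough numerical illustration of this argument (a minimal sketch with made-up depth and weight values, not code from the original post), we can multiply the per-layer factor w · σ'(x) across many layers and watch the product either collapse toward zero or blow up:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # at most 0.25, reached at x = 0

layers = 30                 # assumed depth, for illustration only
x = 0.0                     # pre-activation at which we evaluate the derivative

# Each backpropagated factor is roughly w * sigmoid'(x).
for w in (0.5, 5.0):        # "small" vs "large" initial weights (illustrative values)
    factor = w * sigmoid_grad(x)
    total = factor ** layers
    print(f"w = {w}: per-layer factor = {factor:.3f}, "
          f"product over {layers} layers = {total:.3e}")

# Roughly: with w = 0.5 the product is ~1e-27 (vanishing gradient),
# with w = 5.0 it is ~8e+2 and keeps growing with depth (exploding gradient).
```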

How can these problems be solved?

    1. Replace the activation function, e.g. with ReLU or Tanh. Note that the derivative of Tanh is still at most 1, so vanishing/exploding gradients can still occur with it.

The derivative of ReLU is a constant 1 over the positive part of its input, so repeated multiplication through the layers neither shrinks nor amplifies the gradient, and ReLU does not by itself cause vanishing or exploding gradients.

In addition, the ReLU function has several other advantages:

It is simple and fast to compute.
It alleviates the vanishing-gradient problem and speeds up convergence.
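To make the comparison concrete, here is a small PyTorch sketch (the depth, width, and loss are assumptions chosen purely for illustration) that builds the same deep MLP once with sigmoid and once with ReLU activations and prints the gradient norm of the first layer; with sigmoid the norm is typically many orders of magnitude smaller:

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation, depth=20, width=64, seed=0):
    """Build a deep MLP, run one backward pass, and return the
    gradient norm of the first layer's weights."""
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), activation()]
    model = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(32, width)
    loss = model(x).pow(2).mean()
    loss.backward()
    return model[0].weight.grad.norm().item()

print("sigmoid:", first_layer_grad_norm(nn.Sigmoid))
print("relu:   ", first_layer_grad_norm(nn.ReLU))
```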

    2. Clip the gradients/weights, constraining w to a fixed range; this is what WGAN does (see the sketch after this list).
    3. Residual connections.
    4. Batch normalization (BN).
    5. Regularization, which adds a penalty on the weights (e.g. an L2 penalty) and limits their magnitude.
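The remaining remedies can be sketched together in a few lines of PyTorch (again an assumed, minimal setup rather than code from the original post): a residual block with batch normalization, an L2 weight penalty via weight decay, gradient-norm clipping, and WGAN-style weight clipping:

```python
import torch
import torch.nn as nn

# A residual block with batch normalization: the identity shortcut gives the
# gradient a direct path back, and BN keeps pre-activations in a well-scaled range.
class ResidualBlock(nn.Module):
    def __init__(self, width=64):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.bn1 = nn.BatchNorm1d(width)
        self.fc2 = nn.Linear(width, width)
        self.bn2 = nn.BatchNorm1d(width)

    def forward(self, x):
        out = torch.relu(self.bn1(self.fc1(x)))
        out = self.bn2(self.fc2(out))
        return torch.relu(out + x)      # residual (skip) connection

model = nn.Sequential(*[ResidualBlock() for _ in range(10)], nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)  # L2 penalty on weights

x, y = torch.randn(32, 64), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Gradient clipping: rescale gradients whose global norm exceeds a threshold.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()

# WGAN-style weight clipping: constrain each parameter to a fixed interval.
with torch.no_grad():
    for p in model.parameters():
        p.clamp_(-0.01, 0.01)
```

In practice only a subset of these would be combined in one model; weight clipping in particular is specific to the WGAN critic, while residual connections, BN, and gradient clipping are general-purpose.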

Reference: https://blog.csdn.net/weixin_39853245/article/details/90085307
