Causes of and solutions for vanishing and exploding gradients


1. Causes

  • The root cause of both vanishing and exploding gradients lies in backpropagation: by the chain rule, the gradient at each layer is a product of many factors, and this repeated multiplication can drive the result toward zero (vanishing) or toward very large values (exploding).
  • The main contributing factors are:

1) Activation function: saturating activations such as sigmoid or tanh have very small derivatives when their input is large in magnitude, which easily leads to vanishing gradients (a small numerical sketch follows this list).
2) Inappropriate parameter initialization: weights initialized too large or too small push the chain-rule product toward exploding or vanishing gradients.
3) Network too deep: as the number of layers grows, the gradient accumulates more multiplicative factors during backpropagation, making vanishing or exploding gradients more likely.
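
To make the saturation point concrete, here is a minimal PyTorch sketch (my own illustration, not code from the original post; the layer count and unit weights are assumed): backpropagating through a stack of sigmoids multiplies many derivatives that are at most 0.25, so the gradient collapses toward zero.

```python
import torch

# Assumed toy setup: 30 stacked sigmoid "layers" with weights fixed at 1.
# Each sigmoid contributes a derivative <= 0.25 to the chain-rule product,
# so the gradient that reaches the input is vanishingly small.
x = torch.zeros(1, requires_grad=True)
h = x
for _ in range(30):
    h = torch.sigmoid(h)
h.backward()
print(x.grad)  # on the order of 1e-19 -- effectively zero
```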

2. Solutions

1) Choose a more suitable activation function, such as ReLU (see the model sketch after this list)
2) Choose an appropriate weight-initialization strategy, such as Xavier or He initialization
3) Use a BN (batch normalization) layer to normalize the input distribution of each layer
4) Use residual connections: they allow the network to be made deeper while alleviating the vanishing-gradient problem
5) Use gradient clipping to prevent gradient explosion (see the training sketch after this list)
6) Use a more suitable optimizer, such as Adam
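
For items 1) through 4), here is a minimal PyTorch sketch (the architecture and dimensions are my own assumptions, not from the original post) combining ReLU, He (Kaiming) initialization, batch normalization, and a residual connection in a single block:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Assumed example block: BN + ReLU + He init + skip connection."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.BatchNorm1d(dim),   # normalize each layer's input distribution
            nn.ReLU(),
            nn.Linear(dim, dim),
            nn.BatchNorm1d(dim),
        )
        # He initialization is suited to ReLU: it keeps activation variance
        # roughly stable across layers instead of shrinking or blowing up.
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        # The skip connection x + f(x) gives gradients a direct path back,
        # easing vanishing gradients when many such blocks are stacked.
        return torch.relu(x + self.net(x))
```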
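For items 5) and 6), a sketch of one training step using Adam plus gradient-norm clipping (the toy model, learning rate, and max_norm value are placeholders, not values from the post):

```python
import torch
import torch.nn as nn

# Hypothetical small MLP and data, just to show the training-side mitigations.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 64)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Clip the global gradient norm before the update to guard against explosion.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```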


Source: blog.csdn.net/m0_48086806/article/details/132336725