Saturated vs. non-saturated activation functions

1. What is a saturated (non-saturated) activation function?

If h(x) satisfies lim_{x→+∞} h′(x) = 0 and lim_{x→−∞} h′(x) = 0, i.e. its derivative approaches zero as the input grows in magnitude, then h(x) is called a saturated activation function, such as sigmoid and tanh; otherwise it is a non-saturated activation function, such as ReLU and its variants.
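
A minimal numerical sketch of this definition (using NumPy and the well-known closed-form derivatives of each function): the derivatives of sigmoid and tanh shrink toward 0 as |x| grows, while ReLU's derivative stays at 1 for any x > 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # sigmoid'(x) = s(x) * (1 - s(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2   # tanh'(x) = 1 - tanh(x)^2

def d_relu(x):
    return 1.0 if x > 0 else 0.0   # ReLU'(x) = 1 for x > 0, else 0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={d_sigmoid(x):.6f}  "
          f"tanh'={d_tanh(x):.6f}  ReLU'={d_relu(x):.0f}")
```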

2. Two advantages of non-saturated activation functions

  • Helps alleviate the so-called "vanishing gradient" problem (see the sketch after this list)
  • Speeds up model convergence
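
As a toy illustration of the first point, the sketch below backpropagates through a chain of `depth` layers whose weights are all fixed to 1 (an assumption made purely to isolate the activation's effect): with sigmoid the gradient is a product of factors no larger than 0.25 and shrinks rapidly with depth, while ReLU keeps it at 1 as long as the activations stay positive.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return max(x, 0.0)

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_relu(x):
    return 1.0 if x > 0 else 0.0

def chain_gradient(act, d_act, depth, x=1.0):
    """Gradient of a depth-layer chain act(act(...act(x))) w.r.t. x,
    i.e. the product of per-layer derivatives (all weights fixed to 1)."""
    grad, a = 1.0, x
    for _ in range(depth):
        grad *= d_act(a)   # multiply in this layer's local derivative
        a = act(a)         # forward to the next layer
    return grad

for depth in (5, 10, 20):
    print(f"depth={depth:2d}  sigmoid grad={chain_gradient(sigmoid, d_sigmoid, depth):.2e}  "
          f"ReLU grad={chain_gradient(relu, d_relu, depth):.1f}")
```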

3. ReLU (Rectified Linear Units)

  • Accelerates convergence due to its linear, non-saturated form
  • Compared with the exponential operations in sigmoid and tanh, the ReLU computation is simple
  • For x > 0 the gradient is always 1, which effectively alleviates vanishing gradients (gradient dispersion) and gradient explosion
  • Provides sparse representations for the neural network (ReLU sets the output of some neurons to 0, which reduces over-dependence between neurons and helps alleviate over-fitting)

Disadvantages:

  • Dead ReLU Problem: as training progresses, some neurons "die"; from that point on the gradient flowing through them is always 0, so their weights can no longer be updated (a minimal sketch follows this list)
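
A minimal sketch of the dead-ReLU mechanism (the example inputs and the 0.01 leak slope are illustrative assumptions): when a neuron's pre-activation is negative for every input, the ReLU gradient is 0 everywhere, so its weights receive no updates; a leaky variant keeps a small nonzero gradient and can recover.

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)             # 1 where z > 0, else 0

def leaky_relu_grad(z, alpha=0.01):          # alpha = 0.01 is an illustrative choice
    return np.where(z > 0, 1.0, alpha)

# Pre-activations of one neuron that are negative for every training example:
z = np.array([-3.0, -1.5, -0.2, -4.1])

print("ReLU gradients:      ", relu_grad(z))        # all zero -> weights never update ("dead")
print("Leaky ReLU gradients:", leaky_relu_grad(z))  # small but nonzero -> the unit can recover
```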
