1. What is a saturated (non-saturated) activation function?
If h(x) satisfies lim(x→+∞) h′(x) = 0 and lim(x→−∞) h′(x) = 0, then h(x) is called a saturated activation function, such as sigmoid and tanh; otherwise it is a non-saturated activation function, such as ReLU and its variants.
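As a quick numeric illustration (a minimal sketch; the sample points and variable names are assumptions, not from the original text), the derivatives of sigmoid and tanh shrink toward 0 as |x| grows, while the ReLU derivative stays at 1 for x > 0:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.0, 2.0, 5.0, 10.0])

# Derivatives evaluated analytically at a few points.
sigmoid_grad = sigmoid(x) * (1.0 - sigmoid(x))   # -> 0 as x grows (saturates)
tanh_grad = 1.0 - np.tanh(x) ** 2                # -> 0 as x grows (saturates)
relu_grad = (x > 0).astype(float)                # stays 1 for x > 0 (non-saturating)

print("x           ", x)
print("sigmoid'(x) ", np.round(sigmoid_grad, 6))
print("tanh'(x)    ", np.round(tanh_grad, 6))
print("relu'(x)    ", relu_grad)
```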
2. Non-saturated activation functions have two advantages
- They help alleviate the so-called "vanishing gradient" problem (see the sketch after this list)
- They speed up model convergence
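A rough back-of-the-envelope sketch of why this matters (the depth of 20 layers is an illustrative assumption, not from the original): backpropagation multiplies one activation derivative per layer, so sigmoid's maximum derivative of 0.25 shrinks the gradient geometrically with depth, while ReLU's derivative of 1 on active paths leaves it intact:

```python
layers = 20  # hypothetical network depth, chosen for illustration

# Best case for sigmoid: its derivative is at most 0.25 (attained at x = 0).
sigmoid_chain = 0.25 ** layers
# ReLU on an active path (input > 0 at every layer): derivative is exactly 1.
relu_chain = 1.0 ** layers

print(f"sigmoid gradient factor over {layers} layers: {sigmoid_chain:.3e}")  # ~9.1e-13
print(f"relu    gradient factor over {layers} layers: {relu_chain:.3e}")     # 1.0
```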
3. ReLU (Rectified Linear Unit)
- Accelerates convergence due to its linear, non-saturated form
- Compared with the exponential operations in sigmoid and tanh, the ReLU computation is cheap
- If x > 0, the gradient is always 1, which effectively alleviates vanishing and exploding gradients
- Provides sparse representations in the network (ReLU sets the output of some neurons to 0, which reduces interdependence between neurons and helps mitigate over-fitting)

Disadvantages:
- Dead ReLU problem: as training progresses, some neurons "die"; the gradient flowing through such a neuron is 0 from that point onwards, so its weights can no longer be updated (a minimal sketch of ReLU and this failure mode follows)
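A minimal NumPy sketch of these points (the array values and names are illustrative assumptions, not from the original): the forward pass zeroes out negative inputs (sparsity), the gradient is 1 where the input was positive and 0 elsewhere, and a unit whose pre-activation is always negative receives zero gradient and stops updating (the dead ReLU problem):

```python
import numpy as np

def relu(x):
    """ReLU forward pass: max(0, x), computed element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """ReLU gradient w.r.t. its input: 1 where x > 0, else 0."""
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

print("output  ", relu(x))        # negatives become 0 -> sparse activations
print("gradient", relu_grad(x))   # exactly 1 for positive inputs, 0 otherwise

# Dead ReLU: if a unit's pre-activation is negative for every input it sees,
# its gradient is 0 everywhere and its incoming weights never update again.
dead_input = np.array([-1.2, -4.0, -0.3])
print("dead unit gradient", relu_grad(dead_input))  # all zeros
```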