1. Numerical stability: gradient explosion and vanishing gradients
2. Making training more stable: keep each layer's outputs and gradients at a reasonable scale, ideally so that each layer behaves close to the identity mapping f(x) = x
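A quick NumPy sketch (my own illustration, not code from the lecture) of why stability matters: backpropagation chains one Jacobian per layer, so multiplying many random matrices makes values blow up or shrink to zero depending on their scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def repeated_product(scale, depth=50, dim=4):
    """Multiply `depth` random matrices with entries of the given scale,
    mimicking how gradients are chained through layers by backprop."""
    m = np.eye(dim)
    for _ in range(depth):
        m = m @ (scale * rng.standard_normal((dim, dim)))
    return np.abs(m).max()

print(repeated_product(1.0))   # huge -> gradient explosion
print(repeated_product(0.01))  # near zero -> vanishing gradient
```

Keeping each layer near the identity keeps this product near 1 regardless of depth.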
3. Q&A
- NaN is generally caused by invalid operations such as 0/0; inf means a value has overflowed to infinity, e.g. from dividing by an infinitesimally small value
- sigmoid easily causes vanishing gradients: its output lies in (0, 1), so its derivative σ'(x) = σ(x)(1 − σ(x)) is at most 0.25 and approaches 0 once the input saturates; multiplying many such factors across layers drives the gradient toward 0
- The normal distribution is mathematically easier to work with in derivations
- 4 * sigmoid(x) - 2 can improve stability: since sigmoid(x) ≈ 1/2 + x/4 near 0, the scaled-and-shifted sigmoid behaves like f(x) = x there
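A small NumPy sketch (my own, not from the lecture) of the NaN/inf behavior described in the Q&A, following IEEE-754 floating-point rules:

```python
import numpy as np

# Suppress the runtime warnings these operations would otherwise emit.
with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
    pos_inf = np.float64(1.0) / np.float64(0.0)       # inf: dividing by zero
    not_a_number = np.float64(0.0) / np.float64(0.0)  # nan: undefined 0/0
    overflow = np.float64(1e308) * np.float64(10.0)   # inf: exceeds float64 range

print(pos_inf, not_a_number, overflow)  # prints: inf nan inf
```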
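The two sigmoid claims above can be checked numerically (my own sketch, not lecture code): the derivative never exceeds 0.25, and 4·sigmoid(x) − 2 tracks the identity near 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 2001)

# sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at 0.25 at x = 0 and
# vanishes for saturated inputs, so chained factors shrink the gradient.
grad = sigmoid(x) * (1 - sigmoid(x))
print(grad.max())  # 0.25

# sigmoid(x) ~ 1/2 + x/4 near 0, so 4*sigmoid(x) - 2 ~ x there.
small = np.linspace(-0.1, 0.1, 5)
print(np.max(np.abs(4 * sigmoid(small) - 2 - small)))  # ~1e-4 at worst
```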
Reference: https://www.bilibili.com/video/BV1u64y1i75a?p=1