Numerical stability (exploding and vanishing gradients), model initialization, and activation functions: Hands-on Deep Learning v2, PyTorch

1. Numerical stability: exploding and vanishing gradients
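
The two failure modes can be sketched in a few lines of PyTorch. This is only an illustration under assumptions, not a reconstruction of the lecture slides: the 4x4 Gaussian matrices stand in for per-layer Jacobians, and the depths (100 matrix products, 20 stacked sigmoids) are arbitrary choices.

```python
import torch

torch.manual_seed(0)

# Exploding: a gradient in a deep network is a product of per-layer
# Jacobians. Random 4x4 Gaussian matrices are used here as stand-ins.
M = torch.normal(0, 1, size=(4, 4))
for _ in range(100):
    M = M @ torch.normal(0, 1, size=(4, 4))
explode_norm = M.norm().item()
print("norm after 100 products:", explode_norm)

# Vanishing: stack 20 sigmoids and backpropagate to the input.
# Each sigmoid scales the gradient by at most 0.25.
x = torch.tensor(0.5, requires_grad=True)
h = x
for _ in range(20):
    h = torch.sigmoid(h)
h.backward()
vanish_grad = x.grad.item()
print("gradient through 20 sigmoids:", vanish_grad)
```

The first number grows by many orders of magnitude, while the second shrinks toward zero; both make training numerically fragile, especially in low-precision floating point.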


2. Make training more stable


Near zero, the activation function should behave like the identity: f(x) = x
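
The f(x) = x condition can be checked numerically for common activation functions. The sketch below is an assumption-laden illustration: the sample points and the choice of activations are mine, and the points are nonnegative because relu only matches the identity for x >= 0.

```python
import torch

# Small nonnegative inputs around zero (relu equals x only for x >= 0).
x = torch.linspace(0, 0.01, 5)

activations = {
    "tanh": torch.tanh,
    "relu": torch.relu,
    "sigmoid": torch.sigmoid,
    "4*sigmoid-2": lambda t: 4 * torch.sigmoid(t) - 2,
}

# Largest deviation from the identity f(x) = x on the sample points.
deviation = {
    name: (f(x) - x).abs().max().item() for name, f in activations.items()
}
for name, d in deviation.items():
    print(f"{name:12s} max |f(x) - x| = {d:.6f}")
```

tanh, relu, and the rescaled sigmoid stay very close to the identity near zero, while the plain sigmoid is offset by about 0.5, which is why it needs the shift and scale.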


3. Q&A

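  1. nan is generally caused by an undefined operation such as 0/0; inf means the value has overflowed to positive or negative infinity (not an infinitesimal), typically because the learning rate or the initial weights are too large
  2. sigmoid easily causes vanishing gradients: its output is confined to (0, 1), it saturates for large |x|, and its derivative sigmoid(x) * (1 - sigmoid(x)) never exceeds 0.25, so the gradient shrinks each time it passes through a layer
  3. The normal distribution is used for initialization mainly because it makes the derivation easier; what matters is the mean and variance of the weights
  4. 4 * sigmoid(x) - 2 can improve stability, because near 0 it behaves like f(x) = x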
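
Answers 1 and 2 can be reproduced with a short PyTorch sketch. The specific values (0/0, 1/0, 3e38 * 10, and the grid on [-10, 10]) are illustrative assumptions, chosen only to trigger each behavior in float32.

```python
import torch

zero = torch.tensor(0.0)
one = torch.tensor(1.0)

nan_value = zero / zero             # undefined operation -> nan
inf_value = one / zero              # nonzero divided by zero -> inf
overflow = torch.tensor(3e38) * 10  # exceeds the float32 range -> inf
print(nan_value, inf_value, overflow)

# Derivative of sigmoid, computed with autograd on a grid of inputs.
x = torch.linspace(-10, 10, 2001, requires_grad=True)
torch.sigmoid(x).sum().backward()
max_grad = x.grad.max().item()   # peak of the derivative, at x = 0
tail_grad = x.grad[-1].item()    # derivative at x = 10 (saturated)
print("max sigmoid'(x):", max_grad)
print("sigmoid'(10):   ", tail_grad)
```

The derivative peaks at about 0.25 and is close to zero in the saturated region, so a stack of sigmoid layers multiplies the gradient by a small factor at every layer.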

Reference

https://www.bilibili.com/video/BV1u64y1i75a?p=1

Origin blog.csdn.net/zgpeace/article/details/123932629