Common activation functions in deep learning: Sigmoid, Tanh, ReLU, Leaky_ReLU, SiLU, Mish

The purpose of an activation function is to introduce nonlinearity into the network.

Vanishing gradient: the gradient becomes (close to) 0 and cannot be backpropagated, so the parameters are not updated.
Gradient saturation: as the input changes, the gradient barely changes.
Exploding gradient: the gradient keeps growing and the network cannot converge.

Causes of vanishing gradients:
1. The backpropagation path is too long, so the gradient gradually shrinks as many small factors are multiplied together.
2. The input falls into the saturation region of the activation function.

How to solve it:
1. Choose an appropriate activation function, e.g. ReLU or SiLU.
2. Normalize intermediate activations with Batch Normalization (BN).
3. Use residual connections (ResNet) to shorten the backpropagation path (see the sketch after this list).
4. Use gated memory structures such as LSTM in recurrent networks.
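
As a concrete illustration of points 1–3, here is a minimal sketch, assuming PyTorch, of a residual block that combines a non-saturating activation (ReLU), batch normalization, and a skip connection. The layer sizes are illustrative only, not taken from the original post.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # fix 2: normalize activations
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)     # fix 1: non-saturating activation

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # fix 3: the skip connection shortens the gradient path
        return self.relu(out + x)

if __name__ == "__main__":
    x = torch.randn(2, 16, 8, 8)
    print(ResidualBlock(16)(x).shape)  # torch.Size([2, 16, 8, 8])
```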

1、Sigmoid

Function and derivative:
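The figure of the curve is not reproduced here; for reference, the standard definition and its derivative are:

$$
\sigma(x)=\frac{1}{1+e^{-x}},\qquad \sigma'(x)=\sigma(x)\bigl(1-\sigma(x)\bigr)
$$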
Features: for inputs far from 0 (at either end), the derivative tends to 0, causing vanishing gradients, so deep networks built on sigmoid are hard to train to convergence. Batch normalization (BN) can mitigate this problem.

2、Tanh

Function and derivative:
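In place of the figure, the standard definition and derivative are:

$$
\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}},\qquad \tanh'(x)=1-\tanh^{2}(x)
$$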
Features: similar to sigmoid, except that it maps inputs to (-1, 1) instead of (0, 1).

3、ReLU

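The figures are omitted; for reference, the standard definition and derivative are (at $x=0$ the derivative is undefined, and implementations typically use 0 there):

$$
\mathrm{ReLU}(x)=\max(0,x),\qquad
\mathrm{ReLU}'(x)=\begin{cases}1, & x>0\\ 0, & x<0\end{cases}
$$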
Features: simple and effective; it alleviates vanishing gradients because the derivative is 1 over the active (positive) region. Neurons with negative inputs are suppressed, which makes the activations sparse, helps suppress over-fitting, encourages the network to learn useful features, and speeds up convergence.

4、Leaky_ReLU

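In place of the figures, the standard definition and derivative are shown below; $\alpha$ is a small positive constant (0.01 is a common default, not a value from the original post):

$$
\mathrm{LeakyReLU}(x)=\begin{cases}x, & x>0\\ \alpha x, & x\le 0\end{cases},\qquad
\mathrm{LeakyReLU}'(x)=\begin{cases}1, & x>0\\ \alpha, & x\le 0\end{cases}
$$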
Features: an improvement on ReLU; inputs below 0 still receive a small activation (slope α), so their gradient is not exactly zero and those neurons can still be updated.

5、SiLU (Swish)

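In place of the figures, the standard definition and derivative are:

$$
\mathrm{SiLU}(x)=x\,\sigma(x)=\frac{x}{1+e^{-x}},\qquad
\mathrm{SiLU}'(x)=\sigma(x)\bigl(1+x\,(1-\sigma(x))\bigr)
$$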
Features: an improvement on ReLU that is smooth around 0. Disadvantage: the exponential operation increases the computational cost.

6、Mish

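In place of the figure, the standard definition is:

$$
\mathrm{Mish}(x)=x\,\tanh\bigl(\mathrm{softplus}(x)\bigr)=x\,\tanh\bigl(\ln(1+e^{x})\bigr)
$$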

Features: similar to SiLU.
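
To make the six activations above easy to compare numerically, here is a minimal NumPy sketch (added for reference, not from the original post); the Leaky ReLU slope of 0.01 is just a common default.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def silu(x):
    return x * sigmoid(x)                      # x * sigmoid(x), a.k.a. Swish

def mish(x):
    return x * np.tanh(np.log1p(np.exp(x)))    # x * tanh(softplus(x))

if __name__ == "__main__":
    xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    for f in (sigmoid, tanh, relu, leaky_relu, silu, mish):
        print(f.__name__, np.round(f(xs), 4))
```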


Origin: blog.csdn.net/long630576366/article/details/128854678