The activation function applies a nonlinear mapping to the feature information extracted by the network, giving the network its nonlinear modeling capability. Common activation functions include Sigmoid, Tanh, ReLU, LeakyReLU, and ELU.
1. Sigmoid activation function
Sigmoid is a common nonlinear activation function that takes a real-valued input and squeezes it into the (0, 1) range. Its disadvantages are: when the input value is very large or very small, the gradient approaches 0, causing the vanishing gradient problem; the output of the function is not zero-centered; and the exponential operation is relatively expensive to compute. Its calculation formula is as follows:

Sigmoid(x) = 1 / (1 + e^(-x))
(Figure: Sigmoid activation function curve)