Activation function summary (sigmoid, tanh, ReLU, Leaky ReLU, PReLU, ELU, GELU and SELU)

sigmoid function


Features: the output values lie in (0, 1) for x ranging over (−∞, +∞).
Disadvantages:
1. Saturation (soft saturation): for large positive or negative inputs the gradient becomes essentially zero, so the neurons are barely updated.
2. The output is always positive (not zero-centered), which leads to the so-called zig-zag update phenomenon.
3. Computationally expensive (requires exp).
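To make the saturation concrete, here is a minimal NumPy sketch (not from the original post) of the sigmoid and its derivative, showing how the gradient vanishes for large positive or negative inputs:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: sigma(x) = 1 / (1 + exp(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative sigma'(x) = sigma(x) * (1 - sigma(x)); near zero for large |x| (saturation)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))       # ~[0, 0.5, 1]
print(sigmoid_grad(x))  # gradient is largest at 0 and vanishes in the tails
```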

tanh(x)

Note that the function values lie in (−1, 1). tanh is closely related to sigmoid; from the formulas you can see their shapes are identical, only the scale and range differ: tanh(x) = 2·sigmoid(2x) − 1. tanh is zero-centered, but it still saturates.
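A quick NumPy sketch (for illustration only) verifying the relationship between tanh and sigmoid stated above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1, output in (-1, 1)
x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```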

ReLU

Commonly used in CNNs. Positive inputs pass through unchanged; negative inputs are set to zero. It is non-saturating on the positive side and hard-saturating on the negative side. ReLU is cheaper to compute than sigmoid or tanh because it needs no exp, so convergence is faster. It helps counter the gradient attenuation that sigmoid suffers from as the number of layers grows. However, it is still not zero-centered.

ReLU is known for the "dead ReLU" phenomenon in the negative region: a neuron whose inputs are always negative receives zero gradient and can never recover. To keep ReLU units from dying right after initialization, some people initialize with a small positive number such as 0.01, but whether this method actually helps is disputed.
Reference: Deep Sparse Rectifier Neural Networks
The function values lie in [0, +∞).
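A minimal NumPy sketch (my illustration, not from the referenced paper) of ReLU and its gradient, which also shows the zero-gradient negative region behind the dead-ReLU problem:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x): identity for positive inputs, zero for negative inputs."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient is 1 for x > 0 and 0 for x < 0 (hard saturation / 'dead' region)."""
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```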

Leaky ReLU

Proposed to solve the dead-ReLU phenomenon mentioned above: a small slope is used in the negative region so that neurons there do not saturate and die. The slope here is a fixed, predetermined constant (see the sketch after the reference below).
Reference: Rectifier Nonlinearities Improve Neural Network Acoustic Models
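A minimal NumPy sketch of Leaky ReLU; the slope value 0.01 below is a commonly used default, not something specified in the original post:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x > 0, alpha * x otherwise; alpha is a small fixed slope."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(leaky_relu(x))  # negative inputs keep a small, non-zero gradient path
```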

PReLU

f(x) = max(ax, x), but here a is not a fixed constant; it is a parameter learned along with the other network weights.
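A minimal NumPy sketch of PReLU. In a real framework a would be a trainable parameter updated by backprop; here I only show the function and its gradient with respect to a, and the initial value 0.25 is purely illustrative:

```python
import numpy as np

def prelu(x, a):
    """PReLU: f(x) = max(a*x, x); unlike Leaky ReLU, the slope a is learnable."""
    return np.maximum(a * x, x)

def prelu_grad_a(x, a):
    """Gradient of f w.r.t. a (for 0 < a < 1): x where x < 0, else 0.
    This non-zero gradient is what lets a be learned."""
    return np.where(x < 0, x, 0.0)

x = np.array([-2.0, -0.5, 1.0, 3.0])
a = 0.25  # hypothetical initial value; in practice updated by backprop
print(prelu(x, a), prelu_grad_a(x, a))
```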

ELU

ELU keeps the advantages of ReLU, and its average output is close to zero (in fact PReLU and Leaky ReLU also share this advantage). It has a negative saturation region, which gives it some robustness to noise. It can be seen as something between ReLU and Leaky ReLU. However, this function also needs to compute exp, so the computational cost is higher.
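A minimal NumPy sketch of ELU showing the negative saturation region; the scale alpha = 1.0 below is the usual default, not a value taken from the original post:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0; saturates to -alpha for very negative x."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(elu(x))  # the negative tail flattens near -alpha, pushing the mean output toward zero
```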

GELU

In neural network modelling, the nonlinearity of the model is essential; at the same time, for the model to generalize it needs stochastic regularization, e.g. Dropout (randomly setting outputs to zero, which is in effect a disguised form of stochastic nonlinear activation). Regularization and the nonlinear activation are usually treated as two separate things, but in fact a unit's output is jointly determined by the nonlinearity and the stochastic regularizer. GELU merges the two ideas by weighting the input by the probability that it is kept: GELU(x) = x·Φ(x), where Φ is the standard normal CDF.
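A minimal NumPy sketch of GELU, using the common tanh approximation rather than the exact erf form (my choice, to keep the example dependency-free):

```python
import numpy as np

def gelu(x):
    """GELU(x) = x * Phi(x), via the tanh approximation
    0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(gelu(x))  # smooth curve: ~0 for very negative x, ~x for large positive x
```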

SELU

In a self-normalizing neural network (SNN) there is a mapping g from the mean and variance of one layer's activations to the mean and variance of the next layer; if the activations converge to a fixed point of that mapping, the network is self-normalizing. SNNs keep the activation variance under control, are more robust to perturbations, and learn faster.
Reference: Self-Normalizing Neural Networks
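A minimal NumPy sketch of SELU using the constants from the paper; with roughly standard-normal inputs, the output mean and variance stay close to 0 and 1, which is the self-normalizing property in action:

```python
import numpy as np

# Constants from the Self-Normalizing Neural Networks paper (Klambauer et al.)
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """SELU(x) = lambda * (x if x > 0 else alpha * (exp(x) - 1)):
    a scaled ELU whose constants keep mean ~0 and variance ~1 across layers."""
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

x = np.random.randn(100_000)   # roughly standard-normal pre-activations
y = selu(x)
print(y.mean(), y.var())        # both stay close to 0 and 1 respectively
```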
