Explanation of the Sigmoid, Tanh, ReLU, and LeakyReLU functions in PyTorch's activation function module (with source code)

Activation functions are an important part of neural networks. In a multi-layer network, the output of one layer's nodes is related to the input of the next layer's nodes through a function. If this function is nonlinear, the expressive power of a deep network increases dramatically: it can approximate almost any function. These nonlinear functions are called activation functions, and their role is to give the network its nonlinear modeling capability.
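
As a minimal illustration of why the nonlinearity matters (a NumPy sketch added here for clarity, not part of the original post): without an activation function, two stacked linear layers collapse into a single linear layer, so the extra depth adds no expressive power.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # a small batch of 4 inputs with 3 features
W1 = rng.normal(size=(3, 5))   # weights of the first "layer"
W2 = rng.normal(size=(5, 2))   # weights of the second "layer"

# Two linear layers with no activation in between...
deep = x @ W1 @ W2
# ...are exactly equivalent to one linear layer with weights W1 @ W2.
shallow = x @ (W1 @ W2)
print(np.allclose(deep, shallow))        # True: depth alone adds nothing

# Inserting a nonlinearity (here tanh) breaks the collapse.
nonlinear = np.tanh(x @ W1) @ W2
print(np.allclose(nonlinear, shallow))   # False in general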

1. Sigmoid function

The Sigmoid function is an S-shaped curve that saturates at both ends. It is among the most widely used activation functions and, in a physical sense, is the closest to how biological neurons behave.

Since its output lies in (0, 1), it can also be interpreted as a probability or used to normalize the input; for this reason it is sometimes described as a "squashing" function.

Sigmoid function graph and formula: sigmoid(x) = 1 / (1 + exp(-x))

torch.sigmoid(): a function applied directly to tensors
torch.nn.Sigmoid(): a network layer (module)
torch.nn.functional.sigmoid(): the functional form, typically called inside forward()
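
A minimal usage sketch of these three entry points (assuming a recent PyTorch version; note that torch.nn.functional.sigmoid has been deprecated in favor of torch.sigmoid):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.linspace(-5.0, 5.0, steps=5)

y1 = torch.sigmoid(x)   # plain function applied to a tensor
y2 = nn.Sigmoid()(x)    # module form, usable inside nn.Sequential
y3 = F.sigmoid(x)       # functional form, typically called in forward()

print(torch.allclose(y1, y2) and torch.allclose(y1, y3))   # True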

 

The Sigmoid output gives a good picture of whether a stimulated neuron is activated and passes its signal on: when the value is close to 0 the neuron is barely activated, and when it is close to 1 the neuron is almost fully activated.

The main disadvantage of the sigmoid function is that it is prone to vanishing gradients, and in rare cases can even contribute to exploding gradients.
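
To make the vanishing-gradient point concrete: the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x)), which never exceeds 0.25, so the gradient shrinks geometrically when many sigmoid layers are chained. A small illustration (ignoring weights, added here as a sketch):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.0       # the point where the sigmoid derivative is largest (0.25)
grad = 1.0
for layer in range(10):
    s = sigmoid(x)
    grad *= s * (1.0 - s)   # multiply in the local derivative, at most 0.25
    x = s                   # feed the activation into the "next layer"

print(grad)   # roughly 4e-7 after only 10 layers: the gradient has all but vanished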

Its analytical form contains an exponential, which is relatively expensive for the computer to evaluate; for a reasonably large network this noticeably increases training time.

The output of sigmoid is not zero-mean, so neurons in later layers receive inputs with a non-zero mean, which biases the gradients and slows convergence.

The plotting code is shown below:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10)
y_sigmoid = 1 / (1 + np.exp(-x))   # sigmoid(x) = 1 / (1 + e^(-x))

fig = plt.figure()
# plot sigmoid
ax = fig.add_subplot()
ax.plot(x, y_sigmoid)
ax.grid()
ax.set_title('Sigmoid')
plt.show()

2. Tanh function

The tanh function is a rescaled and shifted version of the sigmoid function; the two are related by tanh(x) = 2 * sigmoid(2x) - 1.
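
The identity tanh(x) = 2 * sigmoid(2x) - 1 is easy to check numerically (a quick verification added here, not from the original post):

import numpy as np

x = np.linspace(-10, 10, 200)
sigmoid_2x = 1 / (1 + np.exp(-2 * x))               # sigmoid evaluated at 2x
print(np.allclose(np.tanh(x), 2 * sigmoid_2x - 1))  # True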

Tanh function graph and formula: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

It maps its output into (-1, 1), which solves the non-zero-mean problem of the sigmoid function.

The tanh function has drawbacks too: it still suffers from vanishing and exploding gradients.

Its exponentials are also relatively expensive to compute.

To reduce saturation, a batch normalization step can be added before the activation function, so that in each layer the inputs to the activation stay as close as possible to a zero-centered distribution with a small spread.
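
One way to realize this in PyTorch is to place a BatchNorm layer between the linear transform and the activation, as in the sketch below (the layer sizes are arbitrary and only for illustration):

import torch
import torch.nn as nn

# Linear transform -> batch normalization -> tanh activation.
# BatchNorm1d keeps each feature roughly zero-mean with unit variance,
# so tanh receives inputs near its non-saturated region around 0.
block = nn.Sequential(
    nn.Linear(64, 32),
    nn.BatchNorm1d(32),
    nn.Tanh(),
)

x = torch.randn(8, 64)   # a batch of 8 samples with 64 features each
out = block(x)
print(out.shape)         # torch.Size([8, 32])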

The plotting code is shown below:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10)
y_tanh = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))   # tanh(x)

fig = plt.figure()
# plot tanh
ax = fig.add_subplot()
ax.plot(x, y_tanh)
ax.grid()
ax.set_title('Tanh')
plt.show()

3. ReLU function

ReLU is short for Rectified Linear Unit. Compared with the sigmoid and tanh functions, ReLU greatly speeds up the convergence of stochastic gradient descent.

ReLU has become a popular activation function in recent years and is currently one of the most commonly used in deep learning.

ReLU function graph and formula: ReLU(x) = max(0, x)

The ReLU function involves no exponential operations, so its computational cost is negligible.

It converges quickly, is simple to compute, and has some biological plausibility (one-sided inhibition with a wide excitation range), which helps alleviate the vanishing-gradient problem.

Its disadvantage is that it can be fragile and may cause neurons to die: for example, after a large gradient passes through a ReLU unit, the weight update may leave the unit's input always negative, so its output becomes 0 and it is never activated again.
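
In PyTorch, ReLU is available as a plain function, a module, and a functional-form call; the sketch below also illustrates the "dead" behaviour: for negative pre-activations both the output and the gradient are exactly zero (an illustration added here, not from the original post):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)

y1 = torch.relu(x)   # plain function
y2 = nn.ReLU()(x)    # module form, usable inside nn.Sequential
y3 = F.relu(x)       # functional form, typically called in forward()

# For negative inputs the output is 0 and so is the gradient,
# which is what makes a "dead" ReLU unit stop learning.
y1.sum().backward()
print(y1)       # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000], grad_fn=...)
print(x.grad)   # tensor([0., 0., 0., 1., 1.])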

The plotting code is shown below:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10)
y_relu = np.maximum(0, x)   # ReLU(x) = max(0, x)

fig = plt.figure()
# plot ReLU
ax = fig.add_subplot()
ax.plot(x, y_relu)
ax.grid()
ax.set_title('ReLU')
plt.show()

4. LeakyReLU function

In the formula, γ is the small slope applied to inputs on the negative axis, a small positive constant (PyTorch calls it negative_slope, with a default of 0.01).

LeakyReLU function graph and formula: LeakyReLU(x) = x for x >= 0, and γ * x for x < 0

LeakyReLU addresses the problem of ReLU killing some neurons: it assigns a small non-zero slope to negative inputs, so the output on the negative axis is no longer zero and the information carried by negative inputs is preserved, which keeps those neurons from dying.
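
In PyTorch the slope on the negative side is the negative_slope argument (default 0.01); a minimal sketch, using 0.2 to match the plot further below:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

leaky = nn.LeakyReLU(negative_slope=0.2)    # module form
print(leaky(x))                             # tensor([-0.6000, -0.2000,  0.0000,  1.0000,  3.0000])

print(F.leaky_relu(x, negative_slope=0.2))  # functional form, same result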

However, in practice LeakyReLU is not always better than ReLU.

Generally speaking, it is rare to mix several different activation functions within one network.

A reasonable strategy is to try ReLU first; if the results are unsatisfactory, try LeakyReLU, tanh, and so on, and do not reach for sigmoid lightly. In short, the choice of activation function depends on the specific model: try several, keep the one that works best, and analyze each problem on its own terms rather than applying a blanket rule.
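
One practical way to follow this advice is to make the activation a constructor argument so it can be swapped in one place; a minimal sketch (the class name and layer sizes here are made up for illustration):

import torch
import torch.nn as nn

class SmallMLP(nn.Module):
    # A tiny MLP whose activation can be swapped without touching the rest.
    def __init__(self, activation: nn.Module = nn.ReLU()):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(16, 32),
            activation,          # the activation is injected here
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(4, 16)
for act in (nn.ReLU(), nn.LeakyReLU(0.1), nn.Tanh()):
    model = SmallMLP(activation=act)
    print(type(act).__name__, model(x).shape)   # each prints torch.Size([4, 1])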

The plotting code is shown below:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10)
y_leaky_relu = np.where(x < 0, 0.2 * x, x)   # LeakyReLU with a slope of 0.2 on the negative axis

fig = plt.figure()
# plot Leaky ReLU
ax = fig.add_subplot()
ax.plot(x, y_leaky_relu)
ax.grid()
ax.set_title('Leaky ReLU')
plt.show()
