PyTorch - activation function

What is an activation function?

  • The layers of a neural network perform linear operations, so activation functions are needed for the network to solve nonlinear problems such as classification.

  • A traditional fully connected network passes the data through alternating linear layers and activation layers to produce the final prediction, as the sketch below illustrates.
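
A minimal sketch of this idea (the layer sizes are made-up illustration values, not from any particular model): a fully connected network in PyTorch is just a stack of nn.Linear layers with activation layers in between.

import torch
import torch.nn as nn

# Tiny fully connected network: linear -> activation -> linear -> activation.
# The sizes (4 inputs, 8 hidden units, 1 output) are arbitrary illustration values.
model = nn.Sequential(
    nn.Linear(4, 8),   # linear operation
    nn.ReLU(),         # activation layer introduces the nonlinearity
    nn.Linear(8, 1),   # linear operation producing the raw prediction
    nn.Sigmoid(),      # squashes the output to (0, 1)
)

x = torch.randn(2, 4)  # a batch of 2 samples with 4 features each
print(model(x))        # predictions in (0, 1)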


Sigmoid function

The sigmoid function is the most classic and earliest used activation function. Its formula is as follows:

$\rho = \frac{1}{1 + e^{-z}}$

(Figure: graph of the sigmoid function)

  • The Sigmoid function is differentiable everywhere on its domain. Its derivative is $\rho(z)(1-\rho(z))$, which is at most 1/4, and when the input is very large or very small the derivative becomes very small and the gradient approaches 0.
  • If the gradient shrinks at every layer and the neural network has many layers, the gradient will approach 0 after passing through them. This is the vanishing gradient phenomenon, and the model can no longer converge. The sigmoid function was widely used in the past, but it is rarely used now.
import torch
import torch.nn as nn
# Implementation 1: functional form
x = torch.tensor([-1.0, 1.0, 2.0, 3.0])
output = torch.sigmoid(x)
print(output)
# Implementation 2: module form
s = nn.Sigmoid()
output = s(x)
print(output)

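As a quick check of the vanishing-gradient behaviour described above, a minimal sketch using torch.autograd (the input values are chosen arbitrarily):

import torch

# Inputs with small and large magnitudes.
z = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0], requires_grad=True)
torch.sigmoid(z).sum().backward()
# The gradient equals sigmoid(z) * (1 - sigmoid(z)): at most 0.25 (at z = 0)
# and about 4.5e-5 for |z| = 10, which is why deep sigmoid networks
# suffer from vanishing gradients.
print(z.grad)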


Tanh activation function

  • Tanh is the hyperbolic tangent function, one of the hyperbolic functions. In mathematics, it is defined in terms of the hyperbolic sine and hyperbolic cosine functions:

  • $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$

(Figure: graph of the tanh function)

  • It is very similar to the Sigmoid function, but the output range of the Tanh function is (-1, 1), while the output range of the Sigmoid function is (0, 1).
  • There are likewise two ways to use it in PyTorch:
import torch
import torch.nn as nn
x = torch.tensor([-1.0, 1.0, 2.0, 3.0])
# Option 1: functional form
output = torch.tanh(x)
print(output)
# Option 2: module form
t = nn.Tanh()
output = t(x)
print(output)

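A minimal sketch comparing the two ranges on the same (arbitrary) inputs: the sigmoid outputs lie in (0, 1), while the tanh outputs lie in (-1, 1) and are centered on 0.

import torch

x = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])  # arbitrary sample inputs
print(torch.sigmoid(x))  # all values in (0, 1)
print(torch.tanh(x))     # values in (-1, 1), symmetric around 0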


ReLU activation function

  • The hyperbolic tangent function is similar to the sigmoid function: both suffer from vanishing gradients, and both involve exponential operations in their formulas, which makes them relatively slow to compute.

  • To solve the vanishing gradient problem, the Rectified Linear Unit (ReLU) was invented.

The ReLU function is one of the most commonly used activation functions today:

(Figure: the ReLU function, $\mathrm{ReLU}(x) = \max(0, x)$)

  • When x < 0, the ReLU value is 0 and the gradient is also 0, which reduces the cost of the gradient computation.

  • When x >= 0, the ReLU value is x and the gradient is a constant 1, which avoids the vanishing gradient problem.
    (Figure: graph of the ReLU function)

import torch
import torch.nn as nn
x = torch.tensor([-1.0, 1.0, 2.0, 3.0])
# Option 1: functional form
output = torch.relu(x)
print(output)
# Option 2: module form
r = nn.ReLU()
output = r(x)
print(output)
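
As a sanity check of the gradient behaviour described in the bullets above, a minimal sketch using torch.autograd (the inputs are arbitrary):

import torch

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
# The gradient is 0 for negative inputs and 1 for positive inputs,
# so it does not shrink as it propagates through many layers.
print(x.grad)  # tensor([0., 0., 1., 1.])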

Summary

  • For binary classification problems, the last layer is generally a Sigmoid layer. As can be seen from the sigmoid curve, the function's range is (0, 1), which represents a probability well.

  • When an activation function is needed inside the neural network (in the hidden layers), ReLU or one of its improved variants is generally used.

  • If it is a binary classification problem, a Sigmoid layer is added as the last layer of the neural network.

  • If it is a multi-class classification problem, a Softmax layer is added as the last layer of the neural network.

  • The Softmax function is usually used together with the cross-entropy loss, as in the sketch below.
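
A minimal sketch of these two output-layer choices (the layer sizes and class count are made-up illustration values; note that PyTorch's nn.CrossEntropyLoss applies log-softmax internally, so the multi-class model outputs raw logits rather than Softmax probabilities):

import torch
import torch.nn as nn

x = torch.randn(2, 4)  # a batch of 2 samples with 4 features each

# Binary classification: Sigmoid output in (0, 1) + binary cross-entropy loss.
binary_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
binary_target = torch.tensor([[1.0], [0.0]])  # labels for the 2 samples
print(nn.BCELoss()(binary_model(x), binary_target))

# Multi-class classification (3 classes): the model outputs raw logits;
# nn.CrossEntropyLoss combines log-softmax and the cross-entropy loss.
multi_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
class_labels = torch.tensor([0, 2])           # class indices for the 2 samples
print(nn.CrossEntropyLoss()(multi_model(x), class_labels))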

Source: blog.csdn.net/bjsyc123456/article/details/124896207