PyTorch (1) - the sigmoid activation function and the MSE and CrossEntropyLoss loss functions

1. Activation functions

A fully connected network is also called a multilayer perceptron. Its basic unit, the neuron, mimics the excitation and inhibition mechanism of biological neurons: it computes a weighted sum of its inputs, and if the sum exceeds a certain threshold the artificial neuron outputs 1; otherwise it outputs 0. In other words, the original activation function is a step function. Because training optimizes the weights w, the activation function must be continuous and differentiable so that gradients with respect to w can be computed; the sigmoid function is easy to differentiate, so in practice the sigmoid function is used as the activation function.
Summary of the activation function's characteristics: a continuous, smooth function that is easy to differentiate.
[Figures: the step function and the sigmoid function]
The sigmoid function is a smooth approximation of the step function.
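As a quick illustration (a minimal sketch, not from the original post), the sigmoid built into PyTorch is smooth and has a well-defined derivative everywhere, unlike the step function:

import torch

# sigmoid(x) = 1 / (1 + exp(-x)): smooth, values in (0, 1), differentiable everywhere
x = torch.linspace(-6.0, 6.0, steps=7, requires_grad=True)
y = torch.sigmoid(x)
y.sum().backward()           # gradient is sigmoid(x) * (1 - sigmoid(x))
print(y)
print(x.grad)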

2. Loss functions

The loss function of a neural network is used to measure the gap between the network output and the desired output. When the neural network is used for multi-class classification, the desired output is commonly encoded as a one-hot vector.

Take handwritten digit recognition on MNIST as an example: when an image is fed into the network, the network produces a 10-dimensional output vector Y. Dimension k of this vector can be viewed as the probability that the picture shows the digit k, so the most likely digit for the picture is the one corresponding to the dimension with the largest output.
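For example (toy values, not from the original post), the predicted digit is simply the index of the largest entry of the output vector:

import torch

# Hypothetical 10-dimensional output for one image; entry k is the score for digit k
Y = torch.tensor([0.01, 0.02, 0.05, 0.10, 0.03, 0.04, 0.02, 0.60, 0.08, 0.05])
predicted_digit = torch.argmax(Y).item()   # index of the maximum dimension
print(predicted_digit)                     # 7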

The loss function is used to measure this gap. In other words, the closer the network output is to the desired output, the smaller the loss value should be; and when the network output equals the desired output, the loss should be zero, which means the loss function is non-negative. Summary of the loss function's characteristics:
(1) the closer the network output is to the desired output, the smaller the loss value should be;
(2) the loss function is non-negative.
Any function satisfying the above two constraints can be used as a loss function (see the sketch below). In practice, a well-designed loss function may even spark a new wave of research, so designing the loss function is an important part of building neural networks.
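For illustration only (a minimal sketch, not part of the original post), here is one such function, the mean absolute error, which satisfies both constraints:

import torch

def l1_loss(output, target):
    # Non-negative, and zero exactly when output == target
    return (output - target).abs().mean()

out = torch.tensor([0.2, 0.7, 0.1])
tgt = torch.tensor([0.0, 1.0, 0.0])
print(l1_loss(out, tgt))   # > 0: the outputs differ from the target
print(l1_loss(tgt, tgt))   # 0:  the outputs equal the target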

Below, some common loss functions are summarized, together with the corresponding PyTorch usage examples.

2.1 Mean squared error loss function

[Formula from the documentation: MSELoss(x, y) = \frac{1}{n}\sum_i (x_i - y_i)^2]
Explanation from the official website: compute the mean squared error between the network output vector x and the target vector y: take the difference in each dimension, square it, sum over all dimensions, and average. If reduction='sum', only the summation is performed, without the averaging.
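A quick numeric check with toy values (not from the original post) of the default reduction='mean' versus reduction='sum':

import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [3.0, 6.0]])
print(nn.MSELoss()(x, y))                 # (1 + 0 + 0 + 4) / 4 = 1.25
print(nn.MSELoss(reduction='sum')(x, y))  # 1 + 0 + 0 + 4 = 5.0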

Mean squared error loss function usage example:

import torch
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(3, 10, requires_grad=True)   # minibatch of 3, each a 10-dimensional output vector
target = torch.randn(3, 10)                      # desired outputs with the same shape
output = loss(input, target)
output.backward()

Input data format required by the loss function (3 is the minibatch size): Input: (3, 10), Target: (3, 10).
The output of the last layer of the network uses the sigmoid function as its activation function.
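Putting this together (a minimal sketch with an assumed 784-to-10 final layer, not from the original post): the last layer's output is activated with sigmoid and compared against one-hot targets with MSELoss:

import torch
import torch.nn as nn

fc = nn.Linear(784, 10)                        # assumed final fully connected layer
x = torch.randn(3, 784)                        # minibatch of 3 flattened images
out = torch.sigmoid(fc(x))                     # shape (3, 10), values in (0, 1)
target = nn.functional.one_hot(torch.tensor([3, 8, 1]), num_classes=10).float()
loss = nn.MSELoss()(out, target)
loss.backward()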

2.2 Cross entropy loss function

The concept of cross entropy comes from information theory; it measures the difference between the probability distribution estimated by a model and the true probability distribution. For a random variable X ~ p(n), let q(n) be an approximate probability distribution of p(n); then the cross entropy between the random variable X and the model q is:
H(X, q) = -\sum_n p(n) \log q(n)
A short derivation shows that the cross entropy equals the entropy of the random variable plus the gap between the true distribution and the model distribution:
H(X, q) = H(X) + D(p \| q)

Here D(p \| q) is the relative entropy (KL divergence) and H(X) is the entropy of the random variable. For the same random variable (so H(X) is fixed), the smaller the gap between the true distribution and the model distribution (i.e. the relative entropy D(p \| q)), the smaller the cross entropy. In other words, cross entropy satisfies the two characteristics of a loss function, so it can be used as a loss function.
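A quick numeric check of H(X, q) = H(X) + D(p \| q) with two toy distributions (assumed values, not from the original post):

import torch

p = torch.tensor([0.7, 0.2, 0.1])            # true distribution
q = torch.tensor([0.5, 0.3, 0.2])            # model's approximate distribution

cross_entropy = -(p * q.log()).sum()         # H(X, q)
entropy       = -(p * p.log()).sum()         # H(X)
kl            =  (p * (p / q).log()).sum()   # D(p || q)
print(cross_entropy, entropy + kl)           # the two values match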

Cross entropy measures the gap between two probability distributions, so the output of the last layer of the fully connected network should be passed through the softmax() function to turn it into a probability distribution, and the desired output (the label vector) should be one-hot encoded. After the weighted output vector x of the last layer is activated by softmax():
softmax(x)_k = \frac{e^{x_k}}{\sum_i e^{x_i}}
the cross entropy between the network output and the target output is computed as
Loss(x, label) = H(label, softmax(x)) = -\log \frac{e^{x_j}}{\sum_i e^{x_i}}
where j is the dimension of label that equals 1, i.e. the class that the target belongs to.

Since the two formulas above can be combined, we can take the final result directly:

Loss(x, label) = -\log \frac{e^{x_j}}{\sum_i e^{x_i}}

Therefore, in PyTorch the inputs of the **CrossEntropyLoss()** function are: the weighted output vector of the last fully connected layer (without softmax() activation), and the position of the 1 in the one-hot encoding of the desired output (the class index, a scalar: for handwritten digit recognition with classes 0-9, the index of digit k is simply k).
Cross entropy loss function usage example:

import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 10, requires_grad=True)         # minibatch=3: 3 rows, each a 10-dimensional score vector
target = torch.empty(3, dtype=torch.long).random_(10)  # 3 class indices, each an integer in [0, 10)
output = loss(input, target)
output.backward()
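To connect this back to the formula above, here is a quick check (toy values, not from the original post) that CrossEntropyLoss applied to the raw scores matches -log softmax(x)_j averaged over the minibatch:

import torch
import torch.nn.functional as F

x = torch.randn(3, 10)
label = torch.tensor([7, 0, 4])
manual  = -torch.log(F.softmax(x, dim=1)[torch.arange(3), label]).mean()
builtin = F.cross_entropy(x, label)
print(manual, builtin)   # same value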

2.3 NLLLoss()

PyTorch also provides the combination of nn.LogSoftmax() and NLLLoss(), which together implement the cross entropy loss function. In this case, the weighted output of the last layer of the network must first be activated by nn.LogSoftmax().
NLLLoss() usage example:

import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()
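As a sanity check (toy values, not from the original post), CrossEntropyLoss on the raw scores gives the same result as NLLLoss applied after LogSoftmax:

import torch
import torch.nn as nn

x = torch.randn(3, 5)
t = torch.tensor([1, 0, 4])
ce  = nn.CrossEntropyLoss()(x, t)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), t)
print(torch.allclose(ce, nll))   # True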


Official documentation (English): https://pytorch.org/docs/stable/nn.html


Origin blog.csdn.net/sinat_40624829/article/details/89819040