Loss functions of neural networks: study notes

1. What is a loss function?
The first step in training any model is to define the loss function. Training is essentially optimizing the loss function, i.e. finding the model parameters that minimize it. The loss function measures the difference between the network's output and the true value.
The loss function does not use test data to measure the performance of the network; it is used to guide the training process, so that the network's parameters change in the direction that reduces the loss.
Suppose our neural network does classification and the loss is defined as cross-entropy. The output layer applies softmax, so for the current input the network outputs a probability for each category, and the category with the highest probability is taken as the prediction. The cross-entropy loss then measures how far the network's output distribution is from the true label, and this error is back-propagated to adjust the network.
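
As a minimal sketch of this pipeline (with made-up logits and label; NumPy is used here, but any numerical library would do):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw network outputs (logits) for a 3-class problem.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)               # probabilities for each category

pred = probs.argmax()                 # category with the highest probability
true_class = 0                        # ground-truth label (assumed)
loss = -np.log(probs[true_class])     # cross-entropy with a one-hot label

print(probs, pred, loss)
```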

2. Absolute error function (Absolute value, L1-norm)
$E = \sum_i |y_i - t_i|$, where $y$ is the network output and $t$ the target value.

Differentiating this function gives a gradient of constant magnitude: even when the error is large, the gradient stays the same, so the L1 loss is not sensitive to outliers.
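
A quick numeric illustration (predictions and targets are made up): the L1 gradient with respect to each prediction is just the sign of the error, so an outlier pulls no harder than any other point.

```python
import numpy as np

y = np.array([1.0, 2.0, 50.0])   # predictions; the last one is an outlier
t = np.array([1.5, 1.0, 3.0])    # targets

l1_grad = np.sign(y - t)         # dE/dy for E = sum(|y - t|)
print(l1_grad)                   # [-1.  1.  1.] -- constant magnitude
```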
3. Squared error function (square error, Euclidean loss, L2-norm)
$E = \frac{1}{2}\sum_i (y_i - t_i)^2$

When the error is large, the computed gradient also grows large, so this loss is sensitive to outliers.
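
Repeating the same illustration for the squared error (same made-up values as above) shows the gradient growing with the error, so the outlier dominates the update:

```python
import numpy as np

y = np.array([1.0, 2.0, 50.0])   # predictions; the last one is an outlier
t = np.array([1.5, 1.0, 3.0])    # targets

l2_grad = y - t                  # dE/dy for E = 0.5 * sum((y - t)**2)
print(l2_grad)                   # [-0.5  1.  47.] -- the outlier dominates
```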

4. Cross-entropy loss
$E = -\sum_{k=1}^{K} L_k \log S_k$

$S$ is the softmax output
$K$ is the number of categories
$L$ is the one-hot encoded label
Multiply each probability output by softmax with the corresponding entry of the one-hot label and sum; because the label is one-hot, only the probability of the true class actually enters the loss.
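
A small worked sketch with hypothetical numbers: the element-wise product with the one-hot label zeroes out every term except the true class's log-probability.

```python
import numpy as np

S = np.array([0.7, 0.2, 0.1])   # softmax output, K = 3 categories
L = np.array([1.0, 0.0, 0.0])   # one-hot label: the true class is class 0

E = -np.sum(L * np.log(S))      # only the true-class term survives
print(E)                        # 0.3567 == -log(0.7)
```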

Derivative calculation:

$\frac{\partial E}{\partial z_k} = S_k - L_k$

where $z_k$ is the input (logit) to the softmax for class $k$: the gradient of softmax combined with cross-entropy is simply the predicted probability minus the one-hot label.
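
This result can be sanity-checked numerically (a sketch with made-up logits and label): the analytic gradient $S - L$ should match a finite-difference estimate of the loss.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(z, L):
    return -np.sum(L * np.log(softmax(z)))

z = np.array([2.0, 1.0, 0.1])   # logits (assumed values)
L = np.array([1.0, 0.0, 0.0])   # one-hot label

analytic = softmax(z) - L       # dE/dz = S - L

# Central finite differences, one logit at a time.
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(3)[k], L) -
     cross_entropy(z - eps * np.eye(3)[k], L)) / (2 * eps)
    for k in range(3)
])

print(np.allclose(analytic, numeric))   # True
```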

5. Multi-label classification

The categories are not mutually exclusive: one input can belong to multiple categories at once.
The final output layer does not use softmax; instead a sigmoid is applied to each output independently, and each sigmoid output is that class's probability.
For example, suppose the sigmoid applied to each of the three raw outputs yields probabilities of 0.6, 0.7, and 0.8; the input then belongs to the three categories with those probabilities. Unlike softmax, the probabilities do not have to sum to 1.
$E = -\sum_{k=1}^{K} \left[ L_k \log S_k + (1 - L_k) \log (1 - S_k) \right]$

$K$ is the set of label categories.
Accumulate $\log S_k$ over the classes that belong to the true label, plus $\log(1 - S_k)$ over the classes that do not.
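
A sketch of this multi-label loss with made-up sigmoid outputs and a label containing two classes:

```python
import numpy as np

S = np.array([0.6, 0.7, 0.8])   # independent sigmoid outputs per class
L = np.array([1.0, 0.0, 1.0])   # multi-label target: classes 0 and 2 apply

# log(S_k) where the label is 1, log(1 - S_k) where it is 0.
E = -np.sum(L * np.log(S) + (1 - L) * np.log(1 - S))
print(E)                        # ~1.94
```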

Source: blog.csdn.net/weixin_43391596/article/details/128157608