Summary of loss functions (continuously updated)

In an interview, the interviewer asked me which loss functions I know. I was instantly stunned.

Here I summarize several common loss functions:

(1) 0-1 loss: L(Y, f(x)) = 1 if Y != f(x) else 0

The 0-1 loss feels like it is rarely used in practice. It just checks whether the prediction matches the ground truth (GT); it is non-convex, and summing it simply counts how many samples were predicted wrongly.

Errata: the 0-1 loss appears in the SVM formulation, but because it is non-convex and discontinuous, its mathematical properties are poor and the objective is hard to optimize, so the hinge loss is generally used instead. This is called a surrogate loss: a surrogate loss is generally a convex continuous function and an upper bound of the 0-1 loss (which you can see by plotting both).
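As a small illustration, here is a minimal NumPy sketch (the function names `zero_one_loss` and `hinge_loss` are my own, and labels are assumed to be in {-1, +1}) showing that the hinge loss upper-bounds the 0-1 loss at every sample:

```python
import numpy as np

def zero_one_loss(y, score):
    # 1 if the sign of the score disagrees with the label, else 0 (non-convex, non-differentiable)
    return (np.sign(score) != y).astype(float)

def hinge_loss(y, score):
    # Convex, continuous surrogate used by SVMs; upper-bounds the 0-1 loss
    return np.maximum(0.0, 1.0 - y * score)

y = np.array([1, -1, 1, 1])              # ground-truth labels in {-1, +1}
score = np.array([2.0, 0.5, -0.3, 0.1])  # raw model scores f(x)

print(zero_one_loss(y, score))  # [0. 1. 1. 0.]
print(hinge_loss(y, score))     # [0.  1.5 1.3 0.9]  >= the 0-1 loss everywhere
```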


(2) Squared loss, also known as the mean squared error (MSE):

L(Y, f(x)) = (Y - f(x))^2

This is the most commonly used loss function. It has some issues: when the prediction deviates a lot, the loss becomes very large (when the error > 1, squaring amplifies it; when the error < 1, squaring shrinks it), and every sample gets the same weight. However, in our own project we actually want exactly this property, namely that deviations of different magnitudes produce losses of different magnitudes. With a hip-centered representation, the arms and legs have a large range of motion, so their predicted positions deviate more and produce a large loss, and the network pays more attention to the hands and legs; the head and spine have a smaller range of motion and are closer to the hip, so their loss and backpropagated gradients are smaller, which differentiates them from the hands and legs.
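A minimal sketch of this per-joint behavior, with made-up hip-centered coordinates (the joint positions below are hypothetical, just to show a far-reaching joint dominating the squared loss):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # Per-element squared error; large deviations are amplified, small ones shrunk
    return (y_true - y_pred) ** 2

# Hypothetical hip-centered 2D joint positions (not real data)
y_true = np.array([[0.0, 0.0],    # hip
                   [0.1, 0.4],    # spine
                   [0.9, 1.2]])   # hand
y_pred = np.array([[0.0, 0.05],
                   [0.15, 0.35],
                   [0.5, 0.6]])

per_joint = squared_loss(y_true, y_pred).sum(axis=1)
print(per_joint)         # [0.0025 0.005  0.52  ]  <- the hand dominates
print(per_joint.mean())  # MSE over the joints
```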

(3) Absolute loss function:

L(Y, f(x)) = |Y - f(x)|

This loss function is also used less often on its own, but the final regression loss in the RPN network is the Smooth L1 loss: for |x| < 1 the loss is 0.5 x^2, and for |x| >= 1 it is |x| - 0.5. A function shaped like this is said to be more robust to outliers. My personal understanding: differentiating the L1 loss gives a gradient of (+/-) f'(x), while differentiating the L2 loss gives a gradient proportional to (Y - f(x)) * f'(x). So when f(x) is way off, the L1 gradient update depends only on f'(x) and not on how large the error is, whereas with outliers the derivative of the L2 loss easily explodes. In short, the L1 loss avoids the gradient explosion caused by outliers.
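A minimal sketch of Smooth L1 and of the gradient argument above, where x stands for the error Y - f(x) (the gradient with respect to the model parameters would carry an extra f'(x) factor by the chain rule; the function names are my own):

```python
import numpy as np

def smooth_l1(x):
    # Smooth L1: quadratic near 0, linear for large errors
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def grad_l1(x):
    # d|x|/dx: magnitude is always 1, no matter how large the error is
    return np.sign(x)

def grad_l2(x):
    # d(x^2)/dx = 2x: grows with the error, so outliers can blow up the gradient
    return 2 * x

errors = np.array([0.1, 0.5, 2.0, 50.0])   # the last value acts as an outlier
print(smooth_l1(errors))   # [ 0.005  0.125  1.5   49.5 ]
print(grad_l1(errors))     # [1. 1. 1. 1.]
print(grad_l2(errors))     # [  0.2    1.     4.   100. ]  <- explodes on the outlier
```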

(4) Logarithmic loss or log-likelihood loss function

L(Y, P(Y|X)) = -log P(Y|X)

This loss function is the objective function of logistic regression (LR).
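A minimal sketch of the binary log loss (the function name and example numbers are my own; p stands for the model's predicted P(Y = 1 | X)):

```python
import numpy as np

def log_loss(y, p):
    # Negative log-likelihood for binary labels y in {0, 1} and predicted probability p
    eps = 1e-12                      # clip to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.4])
print(log_loss(y, p))         # [0.105 0.223 0.916]
print(log_loss(y, p).mean())  # the average log loss that LR minimizes
```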



Finally, a fancier-sounding one: the multi-task loss.

It appears most often in object detection tasks. For example, the final loss in YOLO includes the score loss for label prediction, the IoU loss, and the losses on the bounding box h and w. My personal understanding of a multi-task loss is that several losses are linked together with some hyperparameter weights. Part of the reason this works is that when you differentiate with respect to a given parameter, the terms that do not involve it simply drop out, as in the toy sketch below.
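A toy sketch of "linking several losses with hyperparameters" (the weights and loss values below are made up for illustration, not taken from YOLO):

```python
def multi_task_loss(loss_cls, loss_iou, loss_box, w_cls=1.0, w_iou=1.0, w_box=5.0):
    # Weighted sum of per-task losses; the weights are hand-chosen hyperparameters.
    # When differentiating w.r.t. a parameter used only by, say, the box branch,
    # the classification and IoU terms are constants and drop out of the gradient.
    return w_cls * loss_cls + w_iou * loss_iou + w_box * loss_box

# Hypothetical per-task loss values (not from a real detector run)
print(multi_task_loss(loss_cls=0.3, loss_iou=0.8, loss_box=0.05))  # 1.35
```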
