L1 and L2 loss functions, and the Huber loss function


The L1-norm loss function is also referred to as least absolute deviations (LAD) or least absolute errors (LAE).

The L2-norm loss function is also referred to as least squares error (LSE).
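As a minimal sketch, the two losses can be written with NumPy (the function names are illustrative, not from any particular library):

```python
import numpy as np

def l1_loss(y_true, y_pred):
    """L1 / least-absolute-deviations loss: sum of absolute errors."""
    return np.sum(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    """L2 / least-squares loss: sum of squared errors."""
    return np.sum((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

print(l1_loss(y_true, y_pred))  # 0.5 + 0.0 + 1.0  = 1.5
print(l2_loss(y_true, y_pred))  # 0.25 + 0.0 + 1.0 = 1.25
```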

L2 loss function                L1 loss function
Not very robust                 Robust
Stable solution                 Unstable solution
Always one solution             Possibly multiple solutions

Robustness

Least absolute deviations is robust because it can handle outliers in the data. If any or all outliers need to be accommodated, least absolute deviations is the better choice.

The L2 norm squares the error (if an error is greater than 1, squaring magnifies it greatly), so the model's error on such a sample is far larger than under the L1 norm, and the model becomes much more sensitive to it: the model is adjusted to minimize that error. If the sample is an outlier, the model is adjusted to fit this single outlier at the expense of many normal samples, because the errors on those normal samples are smaller than the error on this single outlier.
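A small numeric illustration of this sensitivity (the residual values below are made up for demonstration): squaring lets a single outlier dominate the total loss far more than the absolute value does.

```python
import numpy as np

# Residuals for five samples; the last one is an outlier.
residuals = np.array([0.5, -0.3, 0.2, -0.4, 10.0])

l1_contrib = np.abs(residuals)
l2_contrib = residuals ** 2

# Share of the total loss contributed by the outlier under each norm.
print(l1_contrib[-1] / l1_contrib.sum())  # ~0.88
print(l2_contrib[-1] / l2_contrib.sum())  # ~0.99
```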

Stability

The instability of the least-absolute-deviations method means that for a small horizontal fluctuation in the data set, the regression line may jump a great deal.

Conversely, the least-squares solution is stable: for any small fluctuation in a data point, the regression line moves only slightly.

Summary

MSE squares the error, so if there is an outlier, the MSE becomes very large.

MAE's gradient is always the same magnitude when updating: even for a very small loss the gradient is large, so a varying (decaying) learning rate should be used. MSE's gradient shrinks as the loss shrinks, so it converges effectively even with a fixed learning rate.
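The gradient behavior can be sketched as follows (a simplified single-sample view; the factor of 2 in the MSE gradient is sometimes absorbed into the learning rate):

```python
import numpy as np

def mae_grad(y_true, y_pred):
    # d|e|/d y_pred = -sign(y_true - y_pred): magnitude 1 regardless of error size
    return -np.sign(y_true - y_pred)

def mse_grad(y_true, y_pred):
    # d(e^2)/d y_pred = -2 * (y_true - y_pred): shrinks as the error shrinks
    return -2.0 * (y_true - y_pred)

for err in [10.0, 1.0, 0.01]:
    # MAE's gradient stays at magnitude 1; MSE's scales with the error.
    print(err, mae_grad(err, 0.0), mse_grad(err, 0.0))
```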

In summary, when handling outliers the L1 loss function is more stable, but its derivative is discontinuous (at zero), so it is less computationally efficient. The L2 loss function is more sensitive to outliers, but by setting its derivative to 0 a more stable closed-form solution can be obtained.

Huber

The problems with L1 and L2:

Suppose 90% of the samples have a target value of 150, and the remaining 10% have target values between 0 and 30.

Then a model trained with MAE as the loss may ignore the 10% of outliers and predict 150 for all samples, because the model predicts according to the median;

a model trained with MSE would give many predictions between 0 and 30, because the model shifts toward the outliers.
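This median-vs-mean behavior can be checked numerically (the 90/10 split mirrors the example above; the random values are illustrative). A constant prediction that minimizes MAE is the median of the targets, while the one that minimizes MSE is the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
# 90% of targets at 150, the remaining 10% between 0 and 30.
targets = np.concatenate([np.full(90, 150.0), rng.uniform(0, 30, 10)])

print(np.median(targets))  # 150.0 -- the MAE-optimal constant ignores the outliers
print(np.mean(targets))    # pulled down toward the outliers (MSE-optimal constant)
```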

The simplest fix in such cases is to transform the target variable. Another is to change the loss function, which brings us to the third loss function: the Huber loss.

Huber loss: smoothed mean absolute error

Huber loss is less sensitive to outliers in the data than squared-error loss.

Essentially, Huber loss is an absolute error that becomes a squared error when the error is small. How small the error must be for it to become quadratic is controlled by the hyperparameter δ (delta). When the error lies in [-δ, δ], Huber loss is equivalent to MSE; on [-∞, -δ] and [δ, +∞] it is equivalent to MAE.

Huber loss thus combines the advantages of MSE and MAE and is more robust to outliers.
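A minimal NumPy sketch of the Huber loss, using the common piecewise form with a 0.5 factor so the two branches join smoothly at |error| = δ (the exact scaling convention varies between libraries):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    error = y_true - y_pred
    is_small = np.abs(error) <= delta
    squared = 0.5 * error ** 2                      # MSE-like branch
    linear = delta * (np.abs(error) - 0.5 * delta)  # MAE-like branch
    return np.where(is_small, squared, linear)

# Small errors behave like (half the) squared error...
print(huber_loss(0.5, 0.0))   # 0.125
# ...while large errors grow only linearly, like MAE.
print(huber_loss(10.0, 0.0))  # 9.5
```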

 


Origin www.cnblogs.com/pacino12134/p/11104446.html