We denote the true data by $Y = \{ y_1, y_2, \dots, y_n \}$ and the predicted data by $\hat Y = \{ \hat{y}_1, \hat{y}_2, \dots, \hat{y}_n \}$.
1: Mean squared error (MSE)
$MSE = \frac{1}{n} \sum \limits_{i=1}^{n} (y_i - \hat{y}_i)^2$
2: Root mean squared error (RMSE)
RMSE is the square root of the mean squared error.
$RMSE = \sqrt{ \frac{1}{n} \sum \limits_{i=1}^{n} (y_i - \hat{y}_i)^2 }$
3: Mean absolute error (MAE)
$MAE = \frac{1}{n} \sum \limits_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$
4: Root mean squared logarithmic error (RMSLE)
$RMSLE = \sqrt{ \frac{1}{n} \sum \limits_{i=1}^{n} (\log(\hat{y}_i + 1) - \log(y_i + 1))^2 }$
One benefit of using RMSLE:
If the true value is 1000 and the predicted value is 600, then RMSE = 400 and RMSLE ≈ 0.510.
If the true value is 1000 and the predicted value is 1400, then RMSE = 400 and RMSLE ≈ 0.336.
So for the same RMSE, a prediction below the true value produces a larger RMSLE than one above it; that is, RMSLE penalizes underprediction more heavily.
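The two RMSLE values above can be checked directly. For a single sample, RMSE reduces to the absolute error and RMSLE to the absolute difference of logs (a quick sketch using Python's `math` module):

```python
import math

def rmsle_single(y_true, y_pred):
    # For one sample, RMSLE reduces to |log(y_pred + 1) - log(y_true + 1)|
    return abs(math.log(y_pred + 1) - math.log(y_true + 1))

# Underprediction: RMSE = 400, RMSLE ≈ 0.510
print(abs(1000 - 600), round(rmsle_single(1000, 600), 3))
# Overprediction: RMSE = 400, but RMSLE ≈ 0.336 is smaller
print(abs(1000 - 1400), round(rmsle_single(1000, 1400), 3))
```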
The second benefit of using RMSLE:
Intuitively, when the data contain a few values that differ greatly from the rest, taking the logarithm reduces the impact of those values on the overall error.
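To illustrate, here is a hypothetical dataset (the numbers below are made up for demonstration) with one large, badly predicted value. Its squared error swamps MSE, while its log-error term stays on the same scale as the others:

```python
from sklearn.metrics import mean_squared_error, mean_squared_log_error

# Three well-predicted small values plus one large, badly predicted value
y_true = [10, 20, 30, 1000]
y_pred = [11, 19, 29, 600]

mse = mean_squared_error(y_true, y_pred)
msle = mean_squared_log_error(y_true, y_pred)

# The outlier contributes 400**2 = 160000 to the squared-error sum,
# dominating MSE; its log-error term is comparable to the others,
# so MSLE stays small.
print(mse)
print(msle)
```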
Use sklearn.metrics to calculate:
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_squared_log_error

y_true = [1, 2, 3, 4, 5]  # true values
y_pred = [2, 4, 3, 2, 6]  # predicted values

MSE = mean_squared_error(y_true, y_pred)
RMSE = MSE ** 0.5  # RMSE is the square root of MSE
MAE = mean_absolute_error(y_true, y_pred)
MSLE = mean_squared_log_error(y_true, y_pred)
RMSLE = MSLE ** 0.5  # RMSLE is the square root of MSLE
print(MSE, RMSE, MAE, RMSLE)
The result is:
2.0 1.4142135623730951 1.2 0.3768421477956514
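These values can be reproduced by hand from the formulas above (a quick check with Python's `math` module on the same toy data):

```python
import math

y_true = [1, 2, 3, 4, 5]
y_pred = [2, 4, 3, 2, 6]
n = len(y_true)

# Apply the four formulas directly
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = mse ** 0.5
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
rmsle = (sum((math.log(p + 1) - math.log(t + 1)) ** 2
             for t, p in zip(y_true, y_pred)) / n) ** 0.5

print(mse, rmse, mae, rmsle)  # matches the sklearn output above
```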