Machine Learning: Performance Metrics for Regression

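Mean absolute error (MAE) is the average of the absolute values of the regression prediction errors; mean squared error (MSE) is the average of the squared prediction errors.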

from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score

def test_mean_absolute_error():
    y_true=[1,1,1,1,1,2,2,2,0,0]
    y_pred=[0,0,0,1,1,1,0,0,0,0]
    
    print('Mean Absolute Error:',mean_absolute_error(y_true,y_pred))
test_mean_absolute_error()

def test_mean_squared_error():
    y_true=[1,1,1,1,1,2,2,2,0,0]
    y_pred=[0,0,0,1,1,1,0,0,0,0]
    
    print('Mean Absolute Error:',mean_absolute_error(y_true,y_pred))
    print('Mean Square Error:',mean_squared_error(y_true,y_pred))
test_mean_squared_error()
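
To make the two definitions concrete, here is a minimal pure-Python sketch that computes them by hand on the same data. The helper names `mae` and `mse` are made up for this illustration and are not part of scikit-learn.

```python
def mae(y_true, y_pred):
    """Mean of the absolute prediction errors (illustrative helper)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared prediction errors (illustrative helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 1, 1, 1, 2, 2, 2, 0, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

# Absolute errors are 1,1,1,0,0,1,2,2,0,0, which sum to 8.
print('MAE by hand:', mae(y_true, y_pred))  # 8 / 10
# Squared errors are 1,1,1,0,0,1,4,4,0,0, which sum to 12.
print('MSE by hand:', mse(y_true, y_pred))  # 12 / 10
```

Because MSE squares each error, the two predictions that are off by 2 account for 8 of the 12 units of squared error, while in MAE they account for only 4 of 8.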

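SSE = sum((y_actual - y_predict)^2)
On the same dataset, a smaller SSE means a smaller error and a better model.
Drawback: the magnitude of SSE has no meaning on its own. Every added sample contributes a non-negative term, so SSE can only grow (or stay the same) as the number of samples increases.
In other words, comparing SSE values across different datasets is not meaningful.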
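
The dependence on sample count can be sketched as follows. The helper `sse` is a made-up name for this illustration; the example simply repeats the same data twice to show that SSE doubles while the per-sample average (MSE) does not change.

```python
def sse(y_true, y_pred):
    """Sum of squared errors (illustrative helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


y_true = [1, 1, 1, 1, 1, 2, 2, 2, 0, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

# Same predictions, same quality, but twice as many samples.
y_true_twice = y_true * 2
y_pred_twice = y_pred * 2

sse_once = sse(y_true, y_pred)
sse_twice = sse(y_true_twice, y_pred_twice)

print('SSE, 10 samples:', sse_once)
print('SSE, 20 samples:', sse_twice)
# Dividing by the sample count removes the dependence on dataset size.
print('MSE, 10 samples:', sse_once / len(y_true))
print('MSE, 20 samples:', sse_twice / len(y_true_twice))
```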

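R^2 = 1 - sum((y_actual - y_predict)^2) / sum((y_actual - y_mean)^2)
Interpretation:
The denominator measures the dispersion of the original data around its mean; the numerator is the squared error between the predictions and the original data (the SSE).
Dividing one by the other removes the influence of how dispersed the original data is.
The coefficient of determination describes how good a fit is by the share of the variation in the data that it accounts for.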
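
As a rough illustration of the formula, the sketch below computes the numerator and denominator separately in plain Python. The helper name `r2_parts` is an assumption made for this example, not a library function.

```python
def r2_parts(y_true, y_pred):
    """Return (ss_res, ss_tot, r2) for two equal-length sequences."""
    y_mean = sum(y_true) / len(y_true)
    # Numerator: squared error between predictions and original data (SSE).
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Denominator: dispersion of the original data around its mean.
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    # ss_tot is 0 when every value in y_true is identical; R^2 is undefined then.
    return ss_res, ss_tot, 1 - ss_res / ss_tot


y_true = [1, 1, 1, 1, 1, 2, 2, 2, 0, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

ss_res, ss_tot, r2 = r2_parts(y_true, y_pred)
print('numerator (SSE):', ss_res)
print('denominator:', ss_tot)
# Negative here: the numerator is larger than the denominator,
# so these predictions are worse than always predicting the mean.
print('r2:', r2)
```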

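In theory R^2 takes values in (-∞, 1]; the usual range is [0, 1].
In practice R^2 is usually computed for a curve that already fits reasonably well, so strongly negative values are rare. A negative value means the model predicts worse than always predicting the mean of y.
The closer to 1, the more of the variation in y the model's variables explain, and the better the model fits the data.
The closer to 0, the worse the fit; at 0 the model does no better than always predicting the mean.
Rule of thumb: > 0.4 is often taken as a good fit, though what counts as good depends on the application.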
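Drawback: R^2 depends on the dataset. It is scaled by the dispersion of y in that dataset, and for a least-squares fit evaluated on its own training data it never decreases when more features are added, so comparing R^2 across different datasets or feature counts is only approximate.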
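
The reference points of this range can be checked with a small sketch. The helper `r2` is a hand-written stand-in used only for this illustration; it follows the R^2 formula given earlier.

```python
def r2(y_true, y_pred):
    """Coefficient of determination, computed by hand (illustrative helper)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1, 1, 1, 1, 1, 2, 2, 2, 0, 0]
y_mean = sum(y_true) / len(y_true)

perfect = r2(y_true, y_true)                        # every prediction exact
baseline = r2(y_true, [y_mean] * len(y_true))       # always predict the mean
poor = r2(y_true, [0, 0, 0, 1, 1, 1, 0, 0, 0, 0])   # worse than the mean

print('perfect predictions:', perfect)   # upper bound of the range
print('mean as prediction:', baseline)   # the zero point
print('poor predictions:', poor)         # below zero
```

There is no lower bound: the worse the predictions are relative to the mean baseline, the more negative the value becomes.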

def test_r2():
    y_true=[1,1,1,1,1,2,2,2,0,0]
    y_pred=[0,0,0,1,1,1,0,0,0,0]
    
    print('Mean Absolute Error:',mean_absolute_error(y_true,y_pred))
    print('Mean Squared Error:',mean_squared_error(y_true,y_pred))
    print('r2:',r2_score(y_true,y_pred))
test_r2()

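# Adjusted R2 (adjusted coefficient of determination)
def test_r2_adjusted():
    '''
    R^2_adjusted = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    n is the number of samples, p is the number of features.

    Corrects R^2 for the number of samples and the number of features.
    '''
    y_true=[1,1,1,1,1,2,2,2,0,0]
    y_pred=[0,0,0,1,1,1,0,0,0,0]

    print('Mean Absolute Error:',mean_absolute_error(y_true,y_pred))
    print('Mean Squared Error:',mean_squared_error(y_true,y_pred))
    r2=r2_score(y_true,y_pred)
    print('r2:',r2)
    n=len(y_true)  # number of samples
    p=0            # number of features; no model is fitted here, so with p=0 adjusted equals r2
    adjusted=1-((1-r2)*(n-1)/(n-p-1))
    print('adjusted:',adjusted)
test_r2_adjusted()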
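
Since the example above uses p=0, the correction has no visible effect there. The following sketch shows how the penalty behaves once p is positive. The helper `adjusted_r2` and the input values (an R^2 of 0.8 on 10 samples) are hypothetical and chosen only for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p features (illustrative helper)."""
    if n - p - 1 <= 0:
        # The formula divides by n - p - 1, so it needs n > p + 1.
        raise ValueError('adjusted R^2 requires n > p + 1')
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = 0.8   # hypothetical R^2 value
n = 10     # hypothetical number of samples

# With no features the correction factor (n - 1) / (n - p - 1) is 1.
no_penalty = adjusted_r2(r2, n, 0)
# With p = 3 the value is 1 - 0.2 * 9 / 6, i.e. about 0.7.
three_features = adjusted_r2(r2, n, 3)
# The same R^2 with more features is penalized more strongly.
six_features = adjusted_r2(r2, n, 6)

print('p=0:', no_penalty)
print('p=3:', three_features)
print('p=6:', six_features)
```

For a fixed R^2 and sample count, the adjusted value drops as p grows, which is what lets it compare models with different numbers of features.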
