Evaluation of machine learning algorithms: linear regression (simple linear regression)

//2019.08.04
# Linear Regression Basics (Linear Regression)
1. Linear regression is a classic supervised learning algorithm for solving regression problems. It has the following characteristics:
(1) It is a typical regression algorithm and can solve practical regression problems;
(2) The idea is simple and easy to implement;
(3) It is the foundation of many powerful nonlinear models;
(4) Its results have good interpretability;
(5) It embodies many important ideas in machine learning.
2. The difference between linear regression and classification lies in whether the label is continuous or discrete. It also shows in how the data points are plotted: in a classification problem both coordinate axes are sample features, while in linear regression one axis must be the label (i.e., the output value).

Figure 1
3. Simple linear regression: when the sample data has only one feature, the linear regression problem is called simple linear regression.

Figure 2
4. Almost all parametric regression algorithms in machine learning follow the same mathematical approach: analyze the specific problem to determine a loss function (or utility function); then solve a mathematical optimization problem on that function to determine the model parameters, which yields the final machine learning model. Throughout this process, optimization theory, and convex optimization in particular, plays a key role.

Figure 3
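As a minimal sketch of this "define a loss, then optimize it" workflow (using synthetic illustrative data and scipy's general-purpose optimizer rather than a closed-form solution):

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data roughly following y = 2x + 1 (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Loss function: sum of squared errors between predictions a*x + b and true y
def loss(params):
    a, b = params
    return np.sum((y - (a * x + b)) ** 2)

# Minimize the loss numerically to obtain the model parameters a and b
result = minimize(loss, x0=[0.0, 0.0])
a, b = result.x
print(a, b)  # converges near a = 1.95, b = 1.15 for this data
```

For simple linear regression the optimum can be written down directly (least squares, point 5 below); the numerical route shown here is the one that generalizes to models without a closed-form solution.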
5. For simple linear regression, i.e., a dataset whose samples have only one feature, the goal is to find the parameters a and b that minimize the loss function (here, the sum of squared errors between the predicted and true values). This particular method is called least squares, and it yields closed-form formulas for the best parameters a and b, as follows:

 

Figure 4
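The standard closed-form least-squares solution for the model $\hat{y} = ax + b$, which the figure above presents, can be written as:

```latex
a = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{m} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```

where $\bar{x}$ and $\bar{y}$ are the sample means. These follow from setting the partial derivatives of the squared-error loss with respect to $a$ and $b$ to zero.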
6. In simple linear regression, the computation of the parameters a and b should, as far as possible, be reduced to vectorized operations. Vectorization greatly reduces the total amount of computation and improves overall efficiency.
Vectorization is a very important idea in machine learning algorithms and a very effective way to improve their computational efficiency.

 

Figure 5
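A sketch of what vectorization means here (illustrative data assumed): the loop and the dot-product version compute the same least-squares parameter a, but the second expresses the sums as vector operations.

```python
import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])
x_mean, y_mean = x.mean(), y.mean()

# Loop version: accumulate numerator and denominator element by element
num, den = 0.0, 0.0
for x_i, y_i in zip(x, y):
    num += (x_i - x_mean) * (y_i - y_mean)
    den += (x_i - x_mean) ** 2
a_loop = num / den

# Vectorized version: the same sums expressed as dot products
a_vec = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)

b = y_mean - a_vec * x_mean
print(a_loop, a_vec, b)  # both give a = 1.95; b = 1.15
```

On large arrays the vectorized form is typically much faster, since the arithmetic runs in optimized native code rather than in the Python loop.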
7. The main evaluation metrics for linear regression are the following:
(1) MSE (mean squared error): the mean of the squared differences between predicted and true values;
(2) RMSE (root mean squared error): the square root of the MSE;
(3) MAE (mean absolute error): the mean of the absolute differences between predicted and true values.
Figure 6
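The three metrics can be sketched directly from their definitions (the y values below are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical true values and predictions, for illustration only
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_predict = np.array([2.5, 5.5, 6.5, 9.5])

mse = np.mean((y_predict - y_test) ** 2)   # mean squared error
rmse = np.sqrt(mse)                        # root mean squared error
mae = np.mean(np.abs(y_predict - y_test))  # mean absolute error
print(mse, rmse, mae)  # 0.25 0.5 0.5
```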
8. Calling the regression metrics in scikit-learn:
# Import the metric functions from sklearn's metrics module and call them directly
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_predict))
print(mean_absolute_error(y_test, y_predict))
9. The metrics RMSE and MAE both have the same units (dimension) as the original y values, so both are commonly used to evaluate and compare different trained models.
From their formulas, RMSE squares the errors, sums them, and then takes the square root; squaring tends to amplify the larger errors in the sample, so minimizing RMSE is more meaningful, since it means the largest errors present in the sample are kept small, whereas MAE is simply the average of all absolute errors.
Moreover, the objective function we optimize during training has the same form as the function inside RMSE, which helps drive the objective value on the test data down as well. In summary, across different trained models it is generally more meaningful to make RMSE as small as possible.

Figure 7
10. For linear regression, the best evaluation metric is R², and the accuracy returned by score() in sklearn is in fact this R². It is computed as follows:

Figure 8
From the way R² is computed, the accuracy of different linear regression models can be compared on a common scale: R² is at most 1, and for any model that beats the baseline of always predicting the mean it falls between 0 and 1; the larger R² is, the more accurate the trained model.
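A sketch of the R² computation from its definition, checked against sklearn's r2_score (the y values are hypothetical, for illustration only):

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions, for illustration only
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_predict = np.array([2.5, 5.5, 6.5, 9.5])

# R^2 = 1 - MSE(y_predict, y_test) / Var(y_test)
r2_manual = 1 - np.mean((y_predict - y_test) ** 2) / np.var(y_test)

# The library function computes the same quantity
r2_lib = r2_score(y_test, y_predict)
print(r2_manual, r2_lib)
```

The rewriting of R² as 1 - MSE/Var(y) is what makes it scale-free: it measures how much of the variance in y the model explains, independent of the units of y.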

All four regression metrics above (MSE, RMSE, MAE, and R²) can be obtained directly through sklearn; the code is as follows:

# Import the metric functions from sklearn's metrics module and call them directly
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score  # call the library function directly to output R²
print(mean_squared_error(y_test, y_predict))
print(np.sqrt(mean_squared_error(y_test, y_predict)))  # RMSE: the square root of MSE
print(mean_absolute_error(y_test, y_predict))
print(r2_score(y_test, y_predict))
The results are as follows:



Origin www.cnblogs.com/Yanjy-OnlyOne/p/11299330.html