Machine Learning Basics (7): Linear Regression, Part 2

Compared with simple (single-variable) linear regression, multiple linear regression just has more parameters to solve for.
The formula (the normal equation): K = (Xᵀ·X)⁻¹·Xᵀ·y, where X already carries a leading column of ones.
The linear expression: y = k0 + k1·x1 + k2·x2 + k3·x3 + …
x_predict: the samples we want to predict on
K: the vector collecting all the parameters k we are solving for
y_predict: the prediction we are after
So the prediction is simply y_predict = x_predict.dot(K); the vector product expands exactly into the linear expression above.
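To make the vector form concrete, here is a minimal sketch with made-up numbers (my own illustration, not from the original post): prepending a 1 to the sample pairs it with the intercept k0, so the dot product reproduces the linear expression.

import numpy as np

K = np.array([5.0, 2.0, -1.0])          # [k0, k1, k2], hypothetical values
x_predict = np.array([1.0, 3.0, 4.0])   # the leading 1 pairs with the intercept k0
print(x_predict.dot(K))                 # 5 + 2*3 + (-1)*4 = 7.0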

Jupyter Notebook

Import the required libraries

import numpy as np
from sklearn import datasets  # sklearn's built-in datasets
from sklearn.model_selection import train_test_split  # split into training and test sets

# play with the Boston housing dataset here
# (note: load_boston was removed in scikit-learn 1.2; this ran on an older version)
boston = datasets.load_boston()

boston.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

X = boston.data
Y = boston.target

# prices in this dataset are capped at 50, so drop those censored samples
X = X[Y < 50]
Y = Y[Y < 50]


X_train, X_test, Y_train, Y_test = train_test_split(X, Y)

X_train.shape
(367, 13)

Multiple linear regression

Now we transform X_train, prepending a column of all ones (it pairs with the intercept):
X_train = np.hstack([np.ones((len(X_train), 1)), X_train])

X_train.shape
(367, 14)

X_train[:10,:]
array([[1.00000e+00, 6.89900e-02, 0.00000e+00, 2.56500e+01, 0.00000e+00,
        5.81000e-01, 5.87000e+00, 6.97000e+01, 2.25770e+00, 2.00000e+00,
        1.88000e+02, 1.91000e+01, 3.89150e+02, 1.43700e+01],
       [1.00000e+00, 6.39312e+00, 0.00000e+00, 1.81000e+01, 0.00000e+00,
        5.84000e-01, 6.16200e+00, 9.74000e+01, 2.20600e+00, 2.40000e+01,
        6.66000e+02, 2.02000e+01, 3.02760e+02, 2.41000e+01],
       [1.00000e+00, 1.25179e+00, 0.00000e+00, 8.14000e+00, 0.00000e+00,
        5.38000e-01, 5.57000e+00, 9.81000e+01, 3.79790e+00, 4.00000e+00,
        3.07000e+02, 2.10000e+01, 3.76570e+02, 2.10200e+01],
       [1.00000e+00, 4.52700e-02, 0.00000e+00, 1.19300e+01, 0.00000e+00,
        5.73000e-01, 6.12000e+00, 7.67000e+01, 2.28750e+00, 1.00000e+00,
        2.73000e+02, 2.10000e+01, 3.96900e+02, 9.08000e+00],
       [1.00000e+00, 4.41700e-02, 7.00000e+01, 2.24000e+00, 0.00000e+00,
        4.00000e-01, 6.87100e+00, 4.74000e+01, 7.82780e+00, 5.00000e+00,
        3.58000e+02, 1.48000e+01, 3.90860e+02, 6.07000e+00],
       [1.00000e+00, 9.60400e-02, 4.00000e+01, 6.41000e+00, 0.00000e+00,
        4.47000e-01, 6.85400e+00, 4.28000e+01, 4.26730e+00, 4.00000e+00,
        2.54000e+02, 1.76000e+01, 3.96900e+02, 2.98000e+00],
       [1.00000e+00, 4.83567e+00, 0.00000e+00, 1.81000e+01, 0.00000e+00,
        5.83000e-01, 5.90500e+00, 5.32000e+01, 3.15230e+00, 2.40000e+01,
        6.66000e+02, 2.02000e+01, 3.88220e+02, 1.14500e+01],
       [1.00000e+00, 8.37000e-02, 4.50000e+01, 3.44000e+00, 0.00000e+00,
        4.37000e-01, 7.18500e+00, 3.89000e+01, 4.56670e+00, 5.00000e+00,
        3.98000e+02, 1.52000e+01, 3.96900e+02, 5.39000e+00],
       [1.00000e+00, 1.20830e-01, 0.00000e+00, 2.89000e+00, 0.00000e+00,
        4.45000e-01, 8.06900e+00, 7.60000e+01, 3.49520e+00, 2.00000e+00,
        2.76000e+02, 1.80000e+01, 3.96900e+02, 4.21000e+00],
       [1.00000e+00, 1.31100e-02, 9.00000e+01, 1.22000e+00, 0.00000e+00,
        4.03000e-01, 7.24900e+00, 2.19000e+01, 8.69660e+00, 5.00000e+00,
        2.26000e+02, 1.79000e+01, 3.95930e+02, 4.81000e+00]])

# this corresponds to the formula for K above (the normal equation)
K = np.linalg.inv(X_train.T.dot(X_train)).dot(X_train.T).dot(Y_train)

intercept = K[:1]  # the intercept; the remaining entries K[1:] are the feature coefficients
intercept
array([32.94768575])
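A side note, not in the original post: explicitly inverting X_train.T.dot(X_train) can lose precision (or fail outright) when the matrix is near-singular. np.linalg.lstsq solves the same least-squares problem more robustly; a minimal sketch:

# solve the normal equations without forming the explicit inverse
K_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_train, Y_train, rcond=None)
np.allclose(K, K_lstsq)  # expected True, up to floating-point error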
Pass X_test in to predict (it needs the same column of ones):
X_test = np.hstack([np.ones((len(X_test), 1)), X_test])

y_predict = X_test.dot(K)

# evaluate with the R squared metric
from sklearn.metrics import r2_score

r2_score(Y_test,y_predict)
0.7227007236489184
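For reference, r2_score implements R² = 1 − SS_res / SS_tot; a hand-rolled version (a sketch; it should reproduce the value above):

ss_res = np.sum((Y_test - y_predict) ** 2)         # residual sum of squares
ss_tot = np.sum((Y_test - np.mean(Y_test)) ** 2)   # total sum of squares
print(1 - ss_res / ss_tot)                         # same value as r2_score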

And that's it, we're done.
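One last sanity check (my addition, not from the original post): sklearn's LinearRegression solves the same problem, so fitting it on the raw features, without the manually added ones column, should give essentially the same intercept and R²:

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()                  # fit_intercept=True by default
lin_reg.fit(X_train[:, 1:], Y_train)          # drop the manually added ones column
print(lin_reg.intercept_)                     # ≈ K[0], the intercept above
print(lin_reg.score(X_test[:, 1:], Y_test))   # ≈ the R squared above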

Reposted from blog.csdn.net/qq_37982109/article/details/87924933