(6) Linear regression algorithm

1. Import the package

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()
boston.keys()

 dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)   # prints a text description of the dataset

2. Get the data (no data cleaning is done here)

x = boston.data
y = boston.target

x.shape   # (506, 13)
y.shape   # (506,)

3. Split the data into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=666)

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
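As a quick sanity check on the split, the sketch below uses synthetic data of the same shape as the Boston set (506 samples, 13 features), since `load_boston` has been removed from recent scikit-learn releases. With `test_size=0.2`, roughly 20% of the rows (rounded up) go to the test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data: 506 samples, 13 features
rng = np.random.default_rng(666)
x = rng.normal(size=(506, 13))
y = rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=666)

# test_size=0.2 puts ceil(506 * 0.2) = 102 rows in the test set
print(X_train.shape, X_test.shape)  # (404, 13) (102, 13)
```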

4. Model training 

reg.fit(X_train, y_train)

 LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

5. Model parameters: the coefficient for each feature and the intercept

# the coefficient corresponding to each feature
reg.coef_
array([-7.56857766e-02,  4.93306230e-02,  6.85902135e-02,  2.55876122e+00,
       -1.60400649e+01,  4.09692993e+00,  6.55718540e-03, -1.41742836e+00,
        2.92373287e-01, -1.41859462e-02, -9.68019957e-01,  1.16809189e-02,
       -5.33536333e-01])
# the intercept
reg.intercept_
32.926954792283404
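These parameters are all the model needs to make predictions: a prediction is simply the dot product of the features with `coef_` plus `intercept_`. A minimal sketch on synthetic data (used here instead of `load_boston`, which recent scikit-learn no longer ships):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic regression data: 100 samples, 13 features, known linear signal
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))
y = X @ rng.normal(size=13) + 3.0 + rng.normal(scale=0.1, size=100)

reg = LinearRegression().fit(X, y)

# Rebuild the predictions by hand from coef_ and intercept_
manual = X @ reg.coef_ + reg.intercept_
print(np.allclose(manual, reg.predict(X)))  # True
```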

6. Model evaluation metric: R squared

# R Squared(r2 score)
reg.score(X_test, y_test)
0.6336069713055628

R2_score measures how much of the variance in the target the model explains, taking the baseline (mean) prediction as the reference error.

  • R2_score = 1: the predictions match the true values exactly, with no error;
  • R2_score = 0: the numerator equals the denominator, so the model's error equals the baseline error and the model is no better than always predicting the mean;
  • R2_score < 0: the model predicts worse than the mean baseline; the data may have no linear relationship with the features.
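The definition above can be checked directly: R² = 1 − SS_res / SS_tot, where SS_tot is the error of the mean baseline. A sketch on synthetic data, comparing the hand-computed value with `reg.score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# R^2 = 1 - SS_res / SS_tot, with the mean model as the baseline
ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # baseline (mean) sum of squares
r2 = 1 - ss_res / ss_tot
print(np.isclose(r2, reg.score(X, y)))  # True
```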


Origin blog.csdn.net/qq_29644709/article/details/115055553