git: https://github.com/linyi0604/MachineLearning
When doing linear regression prediction, higher-degree polynomial functions are often used to build the model in order to improve its generalization ability:
f = k*x + b                          linear function
f = a*x^2 + b*x + w                  quadratic function
f = a*x^3 + b*x^2 + c*x + w          cubic function
...
Generalization: making predictions on data samples the model was not trained on.
Underfitting: the model fits the training samples poorly, so its generalization ability is insufficient.
Overfitting: the model fits the training samples very well, but it has also learned features it was not meant to learn (such as noise), so its generalization ability is insufficient.
Before building a linear regression model of degree greater than one, the original features must be expanded into polynomial features, which are then fed into the model:
from sklearn.preprocessing import PolynomialFeatures

poly2 = PolynomialFeatures(degree=2)  # degree-2 polynomial feature generator
x_train_poly2 = poly2.fit_transform(x_train)
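To see exactly what the generator produces, here is a minimal self-contained check (the output follows from the definition of degree-2 features; the leading 1 is the bias column):

from sklearn.preprocessing import PolynomialFeatures

# A single feature x is expanded to [1, x, x^2]
poly2 = PolynomialFeatures(degree=2)
print(poly2.fit_transform([[6]]))  # [[ 1.  6. 36.]]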
The following example simulates predicting the price of a cake from its diameter.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Training data for the samples: features (diameter) and target values (price)
x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

# Fit a degree-1 linear regression model
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Plot the fitted regression line
xx = np.linspace(0, 25, 100)  # 100 evenly spaced points from 0 to 25 for the x-axis
xx = xx.reshape(xx.shape[0], 1)
yy = regressor.predict(xx)  # predicted y for each point
plt.scatter(x_train, y_train)  # plot the training data points
plt1, = plt.plot(xx, yy, label="Degree1")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1])
plt.show()
Fitting the curve with a degree-1 linear function is a case of underfitting.
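To make the underfit concrete, one can print the parameters the model actually learned for f = k*x + b together with its R^2 score on the training set. A minimal sketch (the printed numbers depend on the fit above):

# Learned slope k and intercept b of f = k*x + b
print("k =", regressor.coef_, "b =", regressor.intercept_)
# R^2 score of the degree-1 model on its own training data
print("Degree-1 model score on training data:", regressor.score(x_train, y_train))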
Next, a degree-2 polynomial regression model is built for prediction:
# Degree-2 polynomial regression for prediction
poly2 = PolynomialFeatures(degree=2)  # degree-2 polynomial feature generator
x_train_poly2 = poly2.fit_transform(x_train)
# Build the model and fit it
regressor_poly2 = LinearRegression()
regressor_poly2.fit(x_train_poly2, y_train)
# Plot the degree-2 regression curve
xx_poly2 = poly2.transform(xx)
yy_poly2 = regressor_poly2.predict(xx_poly2)
plt.scatter(x_train, y_train)
plt1, = plt.plot(xx, yy, label="Degree1")
plt2, = plt.plot(xx, yy_poly2, label="Degree2")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1, plt2])
plt.show()
# Print the degree-2 model's score on the training samples
print("Degree-2 model score on training data:", regressor_poly2.score(x_train_poly2, y_train))  # 0.9816421639597427
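The learned weights can be read back against the quadratic form f = a*x^2 + b*x + w. A quick inspection sketch (the expanded columns are ordered [1, x, x^2], and the weight on the constant column is typically absorbed into the intercept):

# Weights for the expanded columns [1, x, x^2] and the intercept term
print("coefficients:", regressor_poly2.coef_)
print("intercept:", regressor_poly2.intercept_)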
The curve fitted by the degree-2 polynomial regression model fits the training data noticeably better than the degree-1 line.
Next, a degree-4 polynomial regression model is fitted:
# Degree-4 polynomial regression for prediction
poly4 = PolynomialFeatures(degree=4)  # degree-4 polynomial feature generator
x_train_poly4 = poly4.fit_transform(x_train)
# Build the model and fit it
regressor_poly4 = LinearRegression()
regressor_poly4.fit(x_train_poly4, y_train)
# Plot the degree-4 regression curve
xx_poly4 = poly4.transform(xx)
yy_poly4 = regressor_poly4.predict(xx_poly4)
plt.scatter(x_train, y_train)
plt1, = plt.plot(xx, yy, label="Degree1")
plt2, = plt.plot(xx, yy_poly2, label="Degree2")
plt4, = plt.plot(xx, yy_poly4, label="Degree4")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1, plt2, plt4])
plt.show()
# Print the degree-4 model's score on the training samples
print("Degree-4 model score on training data:", regressor_poly4.score(x_train_poly4, y_train))  # 1.0
The degree-4 model scores a perfect 1.0 on the training samples, but the fitted curve is clearly implausible: outside the sample points its predictions can be wildly inaccurate. This is overfitting.
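A standard way to confirm the overfit is to score all three models on samples they were not trained on. In the sketch below the test points are hypothetical (they are not from the original post) and serve only to illustrate the procedure; a degree-4 score that drops below the degree-2 score on held-out points is the signature of overfitting.

# Hypothetical held-out samples (illustrative only, not from the original post)
x_test = [[7], [9], [12], [16]]
y_test = [[8], [11], [15], [18]]

print("degree=1 test score:", regressor.score(x_test, y_test))
print("degree=2 test score:", regressor_poly2.score(poly2.transform(x_test), y_test))
print("degree=4 test score:", regressor_poly4.score(poly4.transform(x_test), y_test))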