The road to machine learning: polynomial feature generation with PolynomialFeatures in Python, underfitting and overfitting

git: https://github.com/linyi0604/MachineLearning 

When doing linear regression prediction, polynomial functions of higher degree are often used to build the model in order to improve its ability to fit the data:

f = k*x + b                      linear function
f = a*x^2 + b*x + w              quadratic function
f = a*x^3 + b*x^2 + c*x + w      cubic function
...

Generalization:
Making predictions on data samples that the model was not trained on.

Underfitting:
The model fits the training samples poorly, so its generalization ability is insufficient.

Overfitting:
The model fits the training samples very well, but it has also learned patterns (such as noise) that it was not meant to learn, so its generalization ability is insufficient.


Before building a regression model on polynomial functions of degree greater than one, you need to generate polynomial features from the original features and then feed them into the model:
  poly2 = PolynomialFeatures(degree=2)   # degree-2 polynomial feature generator
  x_train_poly2 = poly2.fit_transform(x_train)
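
For intuition (a small added illustration, not part of the original post), the degree-2 generator expands each single-feature row [x] into [1, x, x^2]:

  from sklearn.preprocessing import PolynomialFeatures

  poly2 = PolynomialFeatures(degree=2)   # generates the bias term, x, and x^2
  print(poly2.fit_transform([[6], [8]]))
  # [[ 1.  6. 36.]
  #  [ 1.  8. 64.]]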


The following example simulates predicting the price of a cake from its diameter.


from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
import matplotlib.pyplot as plt

'''
Simulate predicting the price of a cake from its diameter,
comparing regression models built on polynomial features of
degree 1, 2 and 4 to illustrate underfitting and overfitting.
'''

# Training data: features (diameter) and target values (price)
x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

# Fit a degree-1 (plain) linear regression model
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Draw the fitted regression line
xx = np.linspace(0, 25, 100)     # 100 evenly spaced points on [0, 25] for the x-axis
xx = xx.reshape(xx.shape[0], 1)
yy = regressor.predict(xx)       # predicted y for each point
plt.scatter(x_train, y_train)    # plot the training data points
plt1, = plt.plot(xx, yy, label="degree=1")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1])
plt.show()
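
As an added check (not in the original script), the degree-1 model's R^2 on its own training data can be printed with the same score call used for the later models:

# R^2 of the degree-1 model on the training samples
print("Degree-1 linear model training score:", regressor.score(x_train, y_train))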

The curve fitted with the degree-1 linear function is a case of underfitting:

Next, a degree-2 polynomial regression model is built for prediction:

# Degree-2 polynomial regression for prediction
poly2 = PolynomialFeatures(degree=2)     # degree-2 polynomial feature generator
x_train_poly2 = poly2.fit_transform(x_train)

# Build the model and fit it
regressor_poly2 = LinearRegression()
regressor_poly2.fit(x_train_poly2, y_train)

# Draw the degree-2 regression curve
xx_poly2 = poly2.transform(xx)           # reuse the generator already fitted on the training data
yy_poly2 = regressor_poly2.predict(xx_poly2)
plt.scatter(x_train, y_train)
plt1, = plt.plot(xx, yy, label="degree=1")
plt2, = plt.plot(xx, yy_poly2, label="degree=2")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1, plt2])
plt.show()

# Print the degree-2 model's score on the training samples
print("Degree-2 polynomial regression training score:", regressor_poly2.score(x_train_poly2, y_train))   # 0.9816421639597427
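
As a side note (an addition to the original post, and it assumes scikit-learn >= 1.0), the columns that poly2 generated can be inspected by name, which makes clear that the model is still linear in its parameters, just over the features 1, x and x^2:

# Inspect the generated feature names (requires scikit-learn >= 1.0)
print(poly2.get_feature_names_out(["diameter"]))   # ['1' 'diameter' 'diameter^2']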

Curve fitted by the degree-2 polynomial regression model:

The fit is clearly better than the degree-1 linear fit.

Next, a degree-4 polynomial regression model is fitted:

# Degree-4 polynomial regression for prediction
poly4 = PolynomialFeatures(degree=4)     # degree-4 polynomial feature generator
x_train_poly4 = poly4.fit_transform(x_train)

# Build the model and fit it
regressor_poly4 = LinearRegression()
regressor_poly4.fit(x_train_poly4, y_train)

# Draw the degree-4 regression curve
xx_poly4 = poly4.transform(xx)
yy_poly4 = regressor_poly4.predict(xx_poly4)
plt.scatter(x_train, y_train)
plt1, = plt.plot(xx, yy, label="degree=1")
plt2, = plt.plot(xx, yy_poly2, label="degree=2")
plt4, = plt.plot(xx, yy_poly4, label="degree=4")
plt.axis([0, 25, 0, 25])
plt.xlabel("Diameter")
plt.ylabel("Price")
plt.legend(handles=[plt1, plt2, plt4])
plt.show()

# Print the degree-4 model's score on the training samples
print("Degree-4 polynomial regression training score:", regressor_poly4.score(x_train_poly4, y_train))   # 1.0

The degree-4 model's training accuracy is 100%, but looking at the fitted curve there are clearly illogical stretches in the prediction.

Away from the sample points the predictions may be very inaccurate. This is overfitting.
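
One way to expose the overfitting (a sketch using made-up hold-out points, not data from the original post) is to score all three models on samples that were not used for training; the degree-4 model's perfect training score typically does not carry over:

# Hypothetical held-out samples, for illustration only
x_test = [[7], [9], [12], [16]]
y_test = [[8], [12], [15], [18]]
print("degree=1 test score:", regressor.score(x_test, y_test))
print("degree=2 test score:", regressor_poly2.score(poly2.transform(x_test), y_test))
print("degree=4 test score:", regressor_poly4.score(poly4.transform(x_test), y_test))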
