1. Basic Concepts
Polynomial regression is a regression analysis method that models the relationship between a dependent variable and one or more independent variables as a polynomial. With one independent variable it is called univariate polynomial regression; with multiple independent variables it is called multivariate polynomial regression.
1. In univariate regression analysis, if the relationship between the dependent variable y and the independent variable x is nonlinear and no single elementary function curve fits it well, univariate polynomial regression can be used instead.
2. The biggest advantage of polynomial regression is that the observed points can be approximated ever more closely by adding higher-order terms of x until the fit is satisfactory.
3. In fact, polynomial regression can handle a fairly broad class of nonlinear problems, and it plays an important role in regression analysis, because any continuous function can be approximated piecewise by polynomials.
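The points above rest on one idea: a degree-d polynomial fit is just a linear regression run on the expanded features [1, x, x^2, ...]. A minimal sketch (toy values, not from the house-price data) of how scikit-learn's PolynomialFeatures performs that expansion:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy samples of a single feature x
x = np.array([[2.0], [3.0]])

# degree=2 expands each x into the columns [1, x, x**2]
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(x))
# → [[1. 2. 4.]
#    [1. 3. 9.]]
```

Fitting an ordinary linear regression on these columns then recovers the polynomial's coefficients, which is exactly what the example in the next section does.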
2. Examples
We previously performed linear regression on known house sizes and transaction prices, which lets us predict the transaction price of a house of a given size. In practice, however, a straight-line fit is often not good enough, so here we perform polynomial regression on the same dataset.
Goal: establish a polynomial regression equation from the house transaction data, and use it to predict house prices.
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
# Import the polynomial feature construction module
from sklearn.preprocessing import PolynomialFeatures

datasets_X = []
datasets_Y = []
fr = open('prices.txt', 'r')
# Read the entire file at once
lines = fr.readlines()
# Process the data line by line
for line in lines:
    # Split each line on the comma separating size and price
    items = line.strip().split(',')
    # Convert the values to int and append them to datasets_X and datasets_Y
    datasets_X.append(int(items[0]))
    datasets_Y.append(int(items[1]))

# Total number of data points
length = len(datasets_X)
# Reshape datasets_X into a two-dimensional array, as required by the fitting function
datasets_X = np.array(datasets_X).reshape([length, 1])
# Convert datasets_Y into an array
datasets_Y = np.array(datasets_Y)

minX = min(datasets_X)
maxX = max(datasets_X)
# Build an evenly spaced sequence over the range of datasets_X for plotting
X = np.arange(minX, maxX).reshape([-1, 1])

# degree=2 builds the quadratic polynomial features X_poly of datasets_X
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(datasets_X)
lin_reg_2 = linear_model.LinearRegression()
lin_reg_2.fit(X_poly, datasets_Y)

# View the regression equation coefficients
print('Coefficients:', lin_reg_2.coef_)
# View the regression equation intercept
print('intercept', lin_reg_2.intercept_)

plt.scatter(datasets_X, datasets_Y, color='red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color='blue')
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()
Output:
Coefficients: [0.00000000e+00 4.93982848e-02 1.89186822e-05]
intercept 151.8469675050044
The relationship between the curve fitted by polynomial regression and the data points is shown in the figure below. Based on this polynomial regression equation, the transaction price of a house can be predicted from its size.
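To make such a prediction, the new size must be expanded with the same fitted PolynomialFeatures transformer before calling predict. A self-contained sketch with synthetic data (prices.txt and its fitted coefficients are not reproduced here; the quadratic below is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data following an assumed quadratic:
# price = 150 + 0.05 * size + 2e-5 * size**2
sizes = np.arange(50.0, 151.0).reshape(-1, 1)
prices = 150 + 0.05 * sizes[:, 0] + 2e-5 * sizes[:, 0] ** 2

poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(sizes), prices)

# Reuse the fitted transformer on the new input before predicting
pred = model.predict(poly.transform([[120.0]]))[0]
print(round(pred, 2))  # → 156.29, since 150 + 0.05*120 + 2e-5*120**2 = 156.288
```

Calling poly.transform (rather than fit_transform) on new data ensures the expansion matches the one the model was trained on.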