8. Machine learning with sklearn: polynomial regression (non-linear fitting of the relationship between house price and house size)

1. Basic Concepts

Polynomial regression is a regression analysis method that studies the polynomial relationship between a dependent variable and one or more independent variables. If there is only one independent variable, it is called univariate polynomial regression; if there are multiple independent variables, it is called multivariate polynomial regression. 
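For reference, the univariate case can be written out as follows. This is a sketch in standard notation; the symbols \beta_i (coefficients), n (degree) and \varepsilon (error term) are assumptions introduced here and do not appear in the original text.

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \varepsilon
```

The model is non-linear in x but linear in the coefficients \beta_0, ..., \beta_n, so it can be fitted with ordinary linear regression once the powers of x are treated as separate features.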

        

1. In univariate regression analysis, if the relationship between the dependent variable y and the independent variable x is non-linear but no suitable function curve can be found to fit it, univariate polynomial regression can be used.

2. The biggest advantage of polynomial regression is that the measured points can be approximated more and more closely by adding higher-order terms of x, until the fit is satisfactory.

3. Polynomial regression can handle a fairly broad class of non-linear problems, and it plays an important role in regression analysis because any continuous function on a bounded interval can be approximated piecewise by polynomials.
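The effect described in point 2 can be sketched with NumPy alone. The data below is synthetic (a noisy sine curve chosen purely for illustration, not the house price data), and the helper name `training_mse` is hypothetical.

```python
import numpy as np

# Synthetic data, assumed for illustration only: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 30)
y = np.sin(2.0 * x) + rng.normal(scale=0.05, size=x.size)


def training_mse(degree):
    """Least-squares fit of a polynomial of the given degree; returns the MSE on the fitted points."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals ** 2))


# Error on the measured points for increasing polynomial degree.
errors = {degree: training_mse(degree) for degree in (1, 2, 3, 5)}
for degree, mse in errors.items():
    print(f"degree={degree}: training MSE = {mse:.5f}")
```

The error on the measured points does not increase as higher-order terms are added, because each lower-degree polynomial is a special case of the higher-degree one. This only describes the fit to the points used for fitting; it says nothing about how well the curve predicts new data.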


2. Examples

Previously, we performed linear regression on known house transaction prices and house sizes, which lets us predict the transaction price of a house whose size is known but whose price is not. In practice, however, such a linear fit is often not good enough, so here we perform polynomial regression on the same dataset.

Goal: Establish a polynomial regression equation for the house transaction data, and predict house prices from that equation.


import matplotlib.pyplot as plt
import numpy as np
# Import the linear model and the polynomial feature construction module
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

datasets_X = []
datasets_Y = []
# Read the entire file at once; each line has the form "size,price".
with open('prices.txt', 'r') as fr:
    lines = fr.readlines()
# Operate line by line, looping through all the data
for line in lines:
    # Split each line on the comma separator
    items = line.strip().split(',')
    # Convert the fields to int and append them to datasets_X and datasets_Y respectively
    datasets_X.append(int(items[0]))
    datasets_Y.append(int(items[1]))

# Length of datasets_X, which is the total number of data points
length = len(datasets_X)
# Convert datasets_X into a two-dimensional array to meet the input
# requirements of the regression fitting function
datasets_X = np.array(datasets_X).reshape([length, 1])
# Convert datasets_Y into an array
datasets_Y = np.array(datasets_Y)

minX = datasets_X.min()
maxX = datasets_X.max()
# Build an arithmetic sequence (step 1) over the range of datasets_X
# to draw the fitted curve later
X = np.arange(minX, maxX).reshape([-1, 1])

# degree=2 builds the quadratic polynomial features X_poly of datasets_X
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(datasets_X)
lin_reg_2 = linear_model.LinearRegression()
lin_reg_2.fit(X_poly, datasets_Y)

# View the regression equation coefficients
print('Coefficients:', lin_reg_2.coef_)
# View the regression equation intercept
print('intercept', lin_reg_2.intercept_)
plt.scatter(datasets_X, datasets_Y, color='red')
# poly_reg is already fitted, so transform (not fit_transform) is enough here
plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color='blue')
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()

Output:
Coefficients: [0.00000000e+00 4.93982848e-02 1.89186822e-05]
intercept 151.8469675050044
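The three coefficients in the output correspond to the three columns that PolynomialFeatures generates for degree 2. The sketch below uses three made-up sizes (1, 2, 3) rather than the real data, simply to show what the expanded feature matrix looks like; the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Three made-up house sizes, shaped (n_samples, 1) as scikit-learn expects.
sizes = np.array([[1.0], [2.0], [3.0]])

poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(sizes)

# Each row is expanded to [1, x, x^2].
print(expanded)
```

The first column is the constant 1. LinearRegression fits its own intercept by default, which is most likely why the first coefficient in the output is 0 and the constant term is reported separately as the intercept.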

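The plot produced by the code shows the curve fitted by polynomial regression together with the data points. Using this polynomial regression equation, the transaction price of a house can be predicted from its size.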
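As a minimal sketch of that prediction step, the regression equation can be evaluated by hand using the coefficients and intercept printed in the output. The function name `predict_price` and the example area of 1000 are hypothetical; the units are whatever prices.txt uses, which the original does not state.

```python
# Values copied from the printed output of the fitted model.
intercept = 151.8469675050044
coef_x = 4.93982848e-02    # coefficient of x (house size)
coef_x2 = 1.89186822e-05   # coefficient of x^2


def predict_price(area):
    """Evaluate price = intercept + coef_x * area + coef_x2 * area^2."""
    return intercept + coef_x * area + coef_x2 * area ** 2


example_area = 1000  # hypothetical house size
print(predict_price(example_area))
```

With the fitted objects from the script still in memory, lin_reg_2.predict(poly_reg.transform([[1000]])) should give the same value up to the rounding of the printed coefficients.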
