Polynomial regression with sklearn

'''
    Polynomial regression: if you want the regression model to fit the training data more closely, you can use polynomial regression.
    Univariate polynomial regression:
        Mathematical model: y = w0 + w1 * x^1 + w2 * x^2 + ... + wn * x^n
        The higher-order terms of the single feature can be treated as new features:
                y = w0 + w1 * x1 + w2 * x2 + ... + wn * xn
        so univariate polynomial regression can be viewed as multiple linear regression, and a LinearRegression model can be trained on the expanded sample data.

        Implementing univariate polynomial regression takes two steps:
            1. Convert the polynomial regression problem into a multiple linear regression problem (only the maximum degree of the polynomial needs to be given).
            2. Treat the expanded terms x^1, x^2, ..., x^n from step 1 as sample features, and train a multiple linear regression model on them.
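The two steps can be sketched explicitly, without a pipeline; the small quadratic dataset below is invented purely for illustration:

```python
import numpy as np
import sklearn.preprocessing as sp
import sklearn.linear_model as lm

# Synthetic univariate samples (invented for illustration)
x = np.linspace(0, 4, 20).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() - 0.5 * x.ravel() ** 2

# Step 1: expand x into polynomial features (columns: 1, x, x^2)
expander = sp.PolynomialFeatures(2)
x_poly = expander.fit_transform(x)

# Step 2: train a multiple linear regression model on the expanded features
model = lm.LinearRegression()
model.fit(x_poly, y)
pred_y = model.predict(x_poly)
```

Because the sample data is exactly quadratic, the expanded linear model recovers it almost perfectly.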

    Choose an appropriate maximum degree: its model's R2 score should be higher than that of a plain linear regression model. If the degree is too high, overfitting occurs and the score on new data drops below the linear regression score.
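As a sketch of the degree-selection idea (the sine-plus-noise data below is invented for illustration): the training-set R2 rises as the degree grows, and comparing against the degree-1 score shows how much the polynomial terms help. The overfitting penalty described above only shows up on held-out data, not on the training set:

```python
import numpy as np
import sklearn.pipeline as pl
import sklearn.preprocessing as sp
import sklearn.linear_model as lm
import sklearn.metrics as sm

# Noisy nonlinear samples (invented for illustration)
rng = np.random.default_rng(7)
x = np.linspace(0, 3, 40).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.05, 40)

# Training-set R2 for a plain linear model and two polynomial degrees
scores = {}
for degree in (1, 3, 10):
    model = pl.make_pipeline(sp.PolynomialFeatures(degree),
                             lm.LinearRegression())
    model.fit(x, y)
    scores[degree] = sm.r2_score(y, model.predict(x))
    print('degree', degree, 'train R2:', round(scores[degree], 3))
```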

        sklearn provides a "pipeline" to chain the two steps and execute them in order:
            import sklearn.pipeline as pl
            import sklearn.preprocessing as sp
            import sklearn.linear_model as lm

            model = pl.make_pipeline(
                # 10: the highest degree of the polynomial
                sp.PolynomialFeatures(10),  # polynomial feature expander
                lm.LinearRegression())      # linear regressor

    Overfitting and underfitting:
        1. Overfitting: a model that is too complex achieves high accuracy on the training data but generally low accuracy on the test data; this phenomenon is called overfitting.
        2. Underfitting: a model that is too simple cannot reach sufficiently high prediction accuracy on either the training data or the test data; this phenomenon is called underfitting.
        3. An acceptable model should have similar prediction accuracy on the training and test data, and that accuracy should not be too low.
                Training set R2    Test set R2
                    0.3                0.4        Underfitting: too simple, fails to capture the patterns in the data
                    0.9                0.2        Overfitting: too complex, too specialized, lacks generality
                    0.7                0.6        Acceptable: moderate complexity, captures the data's patterns without losing generality
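A minimal sketch of diagnosing these three cases, assuming an invented sine-plus-noise dataset and sklearn's train_test_split: fit several degrees and compare the R2 scores on the training and test sets:

```python
import numpy as np
import sklearn.pipeline as pl
import sklearn.preprocessing as sp
import sklearn.linear_model as lm
import sklearn.metrics as sm
from sklearn.model_selection import train_test_split

# Noisy nonlinear samples (invented for illustration)
rng = np.random.default_rng(3)
x = np.linspace(0, 3, 60).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, 60)

train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.25, random_state=3)

# Compare training and test R2 for several polynomial degrees
results = {}
for degree in (1, 4, 15):
    model = pl.make_pipeline(sp.PolynomialFeatures(degree),
                             lm.LinearRegression())
    model.fit(train_x, train_y)
    train_r2 = sm.r2_score(train_y, model.predict(train_x))
    test_r2 = sm.r2_score(test_y, model.predict(test_x))
    results[degree] = (train_r2, test_r2)
    print('degree', degree, 'train R2:', round(train_r2, 3),
          'test R2:', round(test_r2, 3))
```

A large gap between training and test R2 at high degree signals overfitting; low scores on both at degree 1 signal underfitting.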

    Load the data from the single.txt file and train a regression model based on the polynomial regression algorithm.
        Steps:
            import packages ---> read the data ---> create the polynomial regression model ---> train the model and predict ---> obtain pred_y from the model, then draw the polynomial regression curve
'''
import sklearn.pipeline as pl
import sklearn.linear_model as lm
import sklearn.preprocessing as sp
import matplotlib.pyplot as mp
import numpy as np
import sklearn.metrics as sm

# Collect the data
x, y = np.loadtxt('./ml_data/single.txt', delimiter=',', usecols=(0, 1), unpack=True)
# Reshape the input into a two-dimensional array: one row per sample, one feature
x = x.reshape(-1, 1)

# Create the model
model = pl.make_pipeline(
    sp.PolynomialFeatures(10),  # polynomial feature expander
    lm.LinearRegression()       # linear regressor
)
# Train the model
model.fit(x, y)
# Compute the predicted values
pred_y = model.predict(x)

# Evaluate the model
print('mean absolute error:', sm.mean_absolute_error(y, pred_y))
print('mean squared error:', sm.mean_squared_error(y, pred_y))
print('median absolute error:', sm.median_absolute_error(y, pred_y))
print('R2 score:', sm.r2_score(y, pred_y))

# Draw the polynomial regression curve
px = np.linspace(x.min(), x.max(), 1000)
px = px.reshape(-1, 1)
pred_py = model.predict(px)

# Plot the results
mp.figure('Poly Regression', facecolor='lightgray')
mp.title('Poly Regression', fontsize=16)
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
mp.xlabel('x')
mp.ylabel('y')

mp.scatter(x, y, s=60, marker='o', c='dodgerblue', label='Points')
mp.plot(px, pred_py, c='orangered', label='PolyFit Line')
mp.tight_layout()
mp.legend()
mp.show()


Output:
mean absolute error: 0.4818952136579405
mean squared error: 0.35240714067500095
median absolute error: 0.47265950409692536
R2 score: 0.7868629092058499


Origin www.cnblogs.com/yuxiangyang/p/11183409.html