'''
Polynomial regression: if you want the regression model to fit the training
data more closely, you can use polynomial regression.

Univariate polynomial regression
Mathematical model:
    y = w0 + w1*x^1 + w2*x^2 + ... + wn*x^n
The higher-order terms can be treated as separate features
(x1 = x, x2 = x^2, ..., xn = x^n), which gives:
    y = w0 + w1*x1 + w2*x2 + ... + wn*xn
So univariate polynomial regression can be viewed as multivariate linear
regression, and the model can be trained with LinearRegression.

Implementing univariate polynomial regression takes two steps:
1. Expand the polynomial terms, converting the problem into multivariate
   linear regression (only the maximum degree of the polynomial needs to be
   given).
2. Treat the expanded terms x1, x2, ..., xn from step 1 as sample features
   and train a multivariate linear regression model on them.

With an appropriate maximum degree, the model's R2 score is higher than that
of a plain linear regression model; if the degree is too high, overfitting
occurs and the score drops below the linear regression score.

The two steps can be chained with the "pipeline" utilities provided by
sklearn:

import sklearn.pipeline as pl
import sklearn.preprocessing as sp
import sklearn.linear_model as lm

model = pl.make_pipeline(
    sp.PolynomialFeatures(10),  # polynomial feature expander; 10 = max degree
    lm.LinearRegression())      # linear regressor

Overfitting and underfitting:
1. Overfitting: the model is too complex; it achieves high accuracy on the
   training data but generally low accuracy on the test data.
2. Underfitting: the model is too simple; it cannot reach sufficiently high
   prediction accuracy on either the training data or the test data.
3. A well-performing model should have training and test accuracies that are
   close to each other, and neither should be too low.
Training set R2   Test set R2
0.3               0.4          Underfitting: too simple, fails to capture the data's pattern
0.9               0.2          Overfitting: too complex, too specific, lacks generality
0.7               0.6          Acceptable: moderate complexity, captures the pattern without losing generality

Load the data file single.txt and train a regression model with the
polynomial regression algorithm.
Steps: import packages ---> read data ---> create polynomial regression
model ---> train and predict ---> obtain pred_y from the model and render
the polynomial curve.
'''
import sklearn.pipeline as pl
import sklearn.linear_model as lm
import sklearn.preprocessing as sp
import matplotlib.pyplot as mp
import numpy as np
import sklearn.metrics as sm

# collect data
x, y = np.loadtxt('./ml_data/single.txt', delimiter=',',
                  usecols=(0, 1), unpack=True)
# reshape the input into a 2-D array: one row per sample, one feature per column
x = x.reshape(-1, 1)
# create the model
model = pl.make_pipeline(
    sp.PolynomialFeatures(10),  # polynomial feature expander
    lm.LinearRegression())      # linear regressor
# train the model
model.fit(x, y)
# compute predicted values
pred_y = model.predict(x)
# evaluate the model
print('mean absolute error:', sm.mean_absolute_error(y, pred_y))
print('mean squared error:', sm.mean_squared_error(y, pred_y))
print('median absolute error:', sm.median_absolute_error(y, pred_y))
print('R2 score:', sm.r2_score(y, pred_y))
# draw the polynomial regression curve
px = np.linspace(x.min(), x.max(), 1000)
px = px.reshape(-1, 1)
pred_py = model.predict(px)
# render the image
mp.figure('Poly Regression', facecolor='lightgray')
mp.title('Poly Regression', fontsize=16)
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
mp.xlabel('x')
mp.ylabel('y')
mp.scatter(x, y, s=60, marker='o', c='dodgerblue', label='Points')
mp.plot(px, pred_py, c='orangered', label='PolyFit Line')
mp.tight_layout()
mp.legend()
mp.show()

# output:
# mean absolute error: 0.4818952136579405
# mean squared error: 0.35240714067500095
# median absolute error: 0.47265950409692536
# R2 score: 0.7868629092058499
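The train/test R2 table above can be demonstrated directly. The sketch below is a minimal, self-contained illustration: since single.txt is not available here, it generates synthetic noisy cubic data (an assumption, not the original dataset) and compares the training and test R2 scores of pipelines with degrees 1, 3, and 15. With the degree too low the model underfits; a moderate degree scores well on both sets; a very high degree pushes the training R2 up without a matching test-set gain.

```python
import numpy as np
import sklearn.pipeline as pl
import sklearn.preprocessing as sp
import sklearn.linear_model as lm
import sklearn.metrics as sm
import sklearn.model_selection as ms

# synthetic noisy cubic data standing in for single.txt (assumed, for illustration)
rng = np.random.default_rng(7)
x = np.sort(rng.uniform(-3, 3, 100)).reshape(-1, 1)
y = 0.5 * x.ravel() ** 3 - x.ravel() + rng.normal(0, 1, 100)

# hold out a test set so over/underfitting becomes visible
train_x, test_x, train_y, test_y = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)

scores = {}
for degree in (1, 3, 15):
    model = pl.make_pipeline(
        sp.PolynomialFeatures(degree),  # polynomial feature expander
        lm.LinearRegression())          # linear regressor
    model.fit(train_x, train_y)
    scores[degree] = (
        sm.r2_score(train_y, model.predict(train_x)),  # training R2
        sm.r2_score(test_y, model.predict(test_x)))    # test R2
    print('degree %2d  train R2: %.3f  test R2: %.3f'
          % (degree, scores[degree][0], scores[degree][1]))
```

Because the degree-15 feature set contains the degree-3 and degree-1 feature sets, the training R2 never decreases as the degree grows; the test R2 is what reveals whether the added complexity generalizes.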