【Example】
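Suppose Y is correlated with X1, X2 and X3, and consider the full quadratic regression model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{33} X_3^2 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \varepsilon$$

Eight sets of observations of (X1, X2, X3, Y) are available; after preprocessing they are listed in the cards section of the SAS code below.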
Data preprocessing
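Because the quadratic model is linear in its coefficients, it can be fitted as a multiple linear regression once the squared and cross-product terms are added to the data set as new variables: with $x_1, x_2, x_3$ standing for X1, X2, X3, set $x_4 = x_1^2$, $x_5 = x_2^2$, $x_6 = x_3^2$, $x_7 = x_1 x_2$, $x_8 = x_1 x_3$, $x_9 = x_2 x_3$.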
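The preprocessing can also be done inside SAS instead of typing the derived columns by hand. The sketch below assumes the raw observations are the x1, x2, x3 and y columns of the data listed later in this example; the data set names `raw` and `quad` are hypothetical. It computes x8 as x1*x3, which is what the full quadratic model calls for; in the listed data the x8 column repeats x7, so a stepwise run on `quad` may not select the same terms as reported below.

```sas
/* Hypothetical raw data set: original predictors X1, X2, X3 and response Y */
data raw;
   input x1 x2 x3 y;
   cards;
38 47.5 23 66.0
41 21.3 17 43.0
34 36.5 21 36.0
35 18.0 14 23.0
31 29.5 11 27.0
34 14.2  9 14.0
29 21.0  4 12.0
32 10.0  8  7.6
;
run;

/* Add the squared and cross-product terms of the quadratic model */
data quad;
   set raw;
   x4 = x1**2;      /* square terms */
   x5 = x2**2;
   x6 = x3**2;
   x7 = x1*x2;      /* cross-product terms */
   x8 = x1*x3;
   x9 = x2*x3;
run;

proc print data=quad;
run;
```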
Basic syntax (parts in angle brackets are optional)
PROC REG DATA = data-set;
MODEL dependent-variable = independent-variable-list </ options>;
<RESTRICT equality-constraints-on-independent-variables;>
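As an illustration of the syntax, the sketch below uses data set d1 created in the SAS code of this example. The CLB option of the MODEL statement adds confidence limits for the parameter estimates; the RESTRICT statement shown (forcing the intercept to zero) is a hypothetical constraint chosen only to show the form of the statement, not part of the original analysis.

```sas
/* Assumes data set d1 from the data step in this example */
proc reg data=d1;
   model y = x4 x7 / clb;    /* CLB: confidence limits for the estimates   */
   restrict intercept = 0;   /* hypothetical equality constraint, for illustration */
run;
quit;
```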
SAS code for the processed data
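data d1;
   /* x1-x3 = X1, X2, X3; x4-x6 = their squares; x7-x9 = cross products */
   /* Note: as listed, column x8 repeats x7 (X1*X2) instead of X1*X3    */
   input x1-x9 y;
   cards;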
38 47.5 23 1444 2256.25 529 1805.00 1805.00 1092.50 66.0
41 21.3 17 1681 453.69 289 873.30 873.30 362.10 43.0
34 36.5 21 1156 1332.25 441 1241.00 1241.00 766.50 36.0
35 18.0 14 1225 324.00 196 630.00 630.00 252.00 23.0
31 29.5 11 961 870.25 121 914.50 914.50 324.50 27.0
34 14.2 9 1156 201.64 81 482.80 482.80 127.80 14.0
29 21.0 4 841 441.00 16 609.00 609.00 84.00 12.0
32 10.0 8 1024 100.00 64 320.00 320.00 80.00 7.6
;
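proc print data=d1;   /* list the processed data set */
run;

/* Stepwise selection over all nine terms; entry (sle) and stay (sls) levels 0.05 */
proc reg data=d1;
   model y = x1-x9
         / selection=stepwise
           sle=0.05 sls=0.05;
run;

/* Refit the regression with the selected variables x4 and x7 */
proc reg data=d1;
   model y = x4 x7;
run;
quit;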
Stepwise selection
Step 1 and Step 2 (SAS output tables not reproduced)
Selection stops after Step 2: all variables left in the model are significant at the 0.0500 level, and no other variable meets the 0.0500 significance level for entry into the model.
The selected variables are x4 and x7, that is, $X_1^2$ and $X_1 X_2$.
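Parameter estimates
So the fitted quadratic regression equation has the form $\hat{Y} = b_0 + b_4 X_1^2 + b_7 X_1 X_2$, where $b_0$, $b_4$ and $b_7$ are the estimates reported in the Parameter Estimates table.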
The parameter estimates table gives not only the coefficients of the regression equation but also a t test for each coefficient ($H_0: \beta_j = 0$) together with its significance probability (p-value).
For example, for a given significance level $\alpha$, if the p-values of the constant term and of every independent variable are all less than $\alpha$, each term contributes significantly to the regression equation. If some terms are not significant, then to obtain the optimal regression equation the least important independent variable should be deleted, the regression equation re-established with the remaining independent variables, and the new equation tested again. This is the meaning of variable screening.
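The delete-and-refit procedure described above is what PROC REG's backward elimination automates: it starts from a given set of terms and repeatedly removes the least significant one until every remaining term has a p-value below SLS. A full nine-term model plus intercept has more parameters than the 8 observations here, so the sketch below starts from a smaller candidate set; the choice of x1 x2 x4 x7 is hypothetical, for illustration only, and uses data set d1 from this example.

```sas
/* Backward elimination from a hypothetical candidate set of terms.   */
/* A term is removed while its p-value exceeds the stay level SLS.    */
proc reg data=d1;
   model y = x1 x2 x4 x7 / selection=backward sls=0.05;
run;
quit;
```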
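Analysis of variance
Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
Residual sum of squares: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Mean squared error: $MSE = SSE/(n - p - 1)$, where $n = 8$ is the number of observations and $p = 2$ is the number of independent variables in the fitted model.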
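The mean squared error is an unbiased estimate of the error variance $\sigma^2$ of the model.
Test statistic: $F = \dfrac{SSR/p}{SSE/(n - p - 1)}$, which under $H_0$ (all slope coefficients equal zero) follows an $F(p, n - p - 1)$ distribution. Here the significance probability (p-value) of the F test is very small, which means the fitted model is highly significant and explains the main part of the total variation in this data set.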
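The quantities in the analysis of variance can be recomputed from the residuals as a check on the formulas. The sketch below assumes data set d1 from this example; the data set names `fit`, `ss` and `anova` are hypothetical. It uses the fact that SSE is the sum of squared residuals and that the corrected sum of squares of y is the total sum of squares SST = SSR + SSE.

```sas
/* Fit y = x4 x7 and save the residuals e_i = y_i - yhat_i */
proc reg data=d1 noprint;
   model y = x4 x7;
   output out=fit r=resid;
run;
quit;

/* SSE = sum of squared residuals; SST = corrected sum of squares of y */
proc means data=fit noprint;
   var resid y;
   output out=ss uss(resid)=sse css(y)=sst n(y)=n;
run;

data anova;
   set ss;
   p    = 2;                            /* independent variables in the model */
   ssr  = sst - sse;                    /* regression sum of squares          */
   mse  = sse / (n - p - 1);            /* mean squared error                 */
   f    = (ssr / p) / mse;              /* F test statistic                   */
   pval = 1 - probf(f, p, n - p - 1);   /* significance probability           */
run;

proc print data=anova;
   var sst ssr sse mse f pval;
run;
```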
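Regression statistics
Coefficient of determination: $R^2 = SSR/SST = 1 - SSE/SST$, where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares; $R^2$ is the proportion of the total variation explained by the model.
Multiple correlation coefficient: $R = \sqrt{R^2}$
Estimated standard deviation: $s = \sqrt{MSE} = \sqrt{SSE/(n - p - 1)}$, reported by SAS as Root MSE.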
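The regression statistics can likewise be recomputed from SSE and SST as a cross-check of the R-Square and Root MSE values printed by PROC REG. This is a sketch assuming data set d1 from this example; the data set names `fit`, `ss` and `stats` are hypothetical.

```sas
/* Fit y = x4 x7 and save the residuals */
proc reg data=d1 noprint;
   model y = x4 x7;
   output out=fit r=resid;
run;
quit;

/* SSE from the residuals, SST from the corrected sum of squares of y */
proc means data=fit noprint;
   var resid y;
   output out=ss uss(resid)=sse css(y)=sst n(y)=n;
run;

data stats;
   set ss;
   p  = 2;                       /* independent variables in the model */
   r2 = 1 - sse / sst;           /* coefficient of determination       */
   r  = sqrt(r2);                /* multiple correlation coefficient   */
   s  = sqrt(sse / (n - p - 1)); /* estimated standard deviation       */
run;

proc print data=stats;
   var sse sst r2 r s;
run;
```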