Learn mathematical modeling algorithms and applications [Regression Analysis]

The first half covers the basic theory; the second half works through examples with MATLAB.

Univariate linear regression

Mathematical models and definitions

(Figures omitted.) The univariate linear regression model is y = a + bx + ε, where ε is a random error term; the least-squares principle chooses a and b to minimize the sum of squared vertical distances from all sample points to the sample regression line.

Model parameter estimation

(Figures omitted.) Setting the partial derivatives of the residual sum of squares to zero gives the least-squares estimates b̂ = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and â = ȳ - b̂x̄.
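The least-squares estimates have the closed form b̂ = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)², â = ȳ - b̂x̄. Here is a quick illustrative check in Python (not part of the original MATLAB workflow), using the (x, Y) data from the first worked example later in this post:

```python
# Illustrative Python sketch of the least-squares formulas
#   b = S_xy / S_xx,  a = ybar - b * xbar
# Data taken from the univariate worked example later in the post.
x = [143, 145, 146, 147, 149, 150, 153, 154, 155, 156, 157, 158, 159, 160, 162, 164]
y = [88, 85, 88, 91, 92, 93, 93, 95, 96, 98, 97, 96, 98, 99, 100, 102]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)                          # sum of squared x-deviations
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))     # cross-deviation sum
b = Sxy / Sxx            # slope
a = ybar - b * xbar      # intercept
print(round(a, 3), round(b, 4))   # -16.073 0.7194, matching the regress output below
```

The result agrees with the coefficients MATLAB's `regress` returns for the same data in the worked example.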

Inspection, prediction and control

(Figures omitted.)

Linearizable univariate nonlinear regression

(Figures omitted.)
This is a nonlinear regression, or curve regression, problem: a curve has to be fitted. The general procedure is as follows. First carry out n trials on the two variables x and y, obtaining observations (xi, yi), i = 1, 2, ..., n. Draw a scatter plot and use its shape to decide which type of curve to fit. Then determine the unknown parameters a and b of that curve from the n pairs of data. The usual method is to convert the nonlinear regression into a linear one through a change of variables, i.e. to linearize the nonlinear regression.
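As a concrete illustration of the substitution idea, here is a small Python sketch (the power model and the data are made up for illustration, not taken from the original): the power curve y = a·x^b becomes a straight line after taking logarithms.

```python
import math

# Hypothetical example: linearize the power model y = a * x**b
# via u = ln x, v = ln y, so that v = ln(a) + b*u is a straight line.
a_true, b_true = 2.5, 1.7                  # made-up "true" parameters
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [a_true * x ** b_true for x in xs]    # noise-free synthetic data

u = [math.log(x) for x in xs]
v = [math.log(y) for y in ys]
n = len(u)
ubar, vbar = sum(u) / n, sum(v) / n
b_hat = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v)) \
        / sum((ui - ubar) ** 2 for ui in u)
a_hat = math.exp(vbar - b_hat * ubar)      # back-transform the intercept
print(a_hat, b_hat)                        # recovers 2.5 and 1.7 (up to rounding error)
```

The same trick is what the later nonlinear-regression example exploits with the model y = a·e^(b/x).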
(Figure omitted.)
The difference between correlation analysis and regression analysis: correlation analysis examines whether variables are related and how strong the relationship is, while regression analysis constructs an explicit functional relationship between the variables and fits them with a function or equation.
Significance level: the probability of making an error when estimating that the population parameter falls within a given interval, generally taken as 5%.

Multiple linear regression

Regression commands and examples:
1. Multiple linear regression
2. Polynomial regression
3. Nonlinear regression
4. Stepwise regression

1. Mathematical models and definitions

(Figure omitted.) The multiple linear regression model is y = β0 + β1x1 + ... + βmxm + ε, where ε ~ N(0, σ²).

2. Model parameter estimation

(Figures omitted.) Writing the model in matrix form as Y = Xβ + ε, the least-squares estimate of the coefficients is β̂ = (XᵀX)⁻¹XᵀY.

3. Testing and prediction in multiple linear regression

(Figures omitted.)

4. Stepwise regression analysis

The "optimal" regression equation is one that includes every variable with a significant effect on Y and excludes every variable whose effect on Y is insignificant. There are several ways to select it:
(1) pick the best equation among the regressions on all possible combinations of factors (variables);
(2) start from the equation containing all variables and eliminate insignificant factors one by one;
(3) start from a single variable and introduce variables into the equation one by one;
(4) "in and out" stepwise regression. The fourth method, stepwise regression, is the most satisfactory way to screen variables.

The idea of stepwise regression:
• Starting from a single independent variable, introduce variables into the regression equation one by one according to how significant their effect on Y is.
• When a variable that was introduced earlier becomes insignificant because of variables introduced later, it is eliminated.
• Introducing one variable into the regression equation, or removing one variable from it, counts as one step of the stepwise procedure.
• A significance (F) test is performed at every step, so that before each new significant variable is introduced the regression equation contains only variables whose effect on Y is significant.
• The process is repeated until no insignificant variable can be removed from the equation and no significant variable outside it can be introduced.
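The selection loop described above can be sketched in Python. This is a simplified forward-selection sketch, not MATLAB's full stepwise procedure: variables are only ever added, and a crude RSS-reduction threshold stands in for the F test. The data are the x1..x4, y values from the stepwise worked example at the end of this post.

```python
# Minimal forward-selection sketch (illustrative only; real stepwise regression
# also re-tests earlier variables and can remove them).
x1 = [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10]
x2 = [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]
x3 = [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8]
x4 = [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12]
y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
cols = {"x1": x1, "x2": x2, "x3": x3, "x4": x4}

def rss(names):
    """Residual sum of squares of y regressed on an intercept plus the named columns."""
    X = [[1.0] + [cols[nm][i] for nm in names] for i in range(len(y))]
    k = len(X[0])
    # Normal equations X'X b = X'y, solved by Gaussian elimination with pivoting.
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, k):
            f = M[r][i] / M[i][i]
            for c in range(i, k + 1):
                M[r][c] -= f * M[i][c]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (M[i][k] - sum(M[i][c] * b[c] for c in range(i + 1, k))) / M[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for yi, row in zip(y, X))

selected = []
current = rss(selected)                  # intercept-only model
while True:
    remaining = [nm for nm in cols if nm not in selected]
    if not remaining:
        break
    best = min(remaining, key=lambda nm: rss(selected + [nm]))
    improved = rss(selected + [best])
    if current - improved < 0.05 * current:   # crude stand-in for a significance test
        break
    selected.append(best)
    current = improved
print(selected)
```

With this crude rule the sketch moves in x4 first and then x1, matching the first two Next-step clicks in the worked example; MATLAB's real stepwise procedure also re-tests earlier variables and can remove them, which is how the example ends up keeping only x1 and x2.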

Multiple linear regression

(Figures omitted.)

(Problem figure omitted.)

Solution code:

%Input data
x=[143 145 146 147 149 150 153 154 155 156 157 158 159 160 162 164]';
X=[ones(16,1) x];
Y=[88 85 88 91 92 93 93 95 96 98 97 96 98 99 100 102]';
%Regression analysis and tests
[b,bint,r,rint,stats]=regress(Y,X)
rcoplot(r,rint)%residual plot
%Plot the fitted line against the data
z=b(1)+b(2)*x;%fitted values at the observed x, for comparison with the data
plot(x,Y,'k+',x,z,'r')

(Figures omitted: regress output and residual plot.)
The residual plot shows that, except for the second data point, the residuals are close to zero and every residual confidence interval contains zero. This indicates that the regression model y = -16.073 + 0.7194x fits the original data well, while the second data point can be treated as an outlier.
(Figure omitted.)

Polynomial regression

(Figures omitted.)
Method 1: do the quadratic polynomial regression directly:

t=1/30:1/30:14/30;
s=[11.86 15.67 20.60 26.69 33.71 41.93 51.13 61.49  72.90 85.44 99.08 113.77 129.54 146.48];
[p,S]=polyfit(t,s,2)%2 is the degree of the fitted polynomial

Running results (figures omitted).
The second method converts the problem into a multiple linear regression: treat t² as a new regressor x2, so the model becomes linear in its coefficients and can be handled with regress.

%Polynomial regression recast as multiple linear regression
t=1/30:1/30:14/30;
s=[11.86 15.67 20.60 26.69 33.71 41.93 51.13 61.49 72.90 85.44 99.08 113.77 129.54 146.48];
T=[ones(14,1) t' (t.^2)'];%14-by-3 design matrix
[b,bint,r,rint,stats]=regress(s',T);%coefficients, their interval estimates, residuals, residual intervals
b,stats%coefficients and the statistics for testing the regression model
%Prediction and plotting (p and S come from the polyfit call in method 1)
Y=polyconf(p,t,S)%predicted values of the fitted polynomial at t
plot(t,s,'k+',t,Y,'r')
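For comparison, the same "treat t² as a new regressor" computation can be sketched in plain Python by solving the normal equations directly (an illustrative sketch, not part of the original MATLAB workflow):

```python
# Quadratic fit s = b0 + b1*t + b2*t^2 via the normal equations
# (illustrative Python counterpart of the regress-based method above).
t = [i / 30 for i in range(1, 15)]
s = [11.86, 15.67, 20.60, 26.69, 33.71, 41.93, 51.13, 61.49,
     72.90, 85.44, 99.08, 113.77, 129.54, 146.48]
X = [[1.0, ti, ti * ti] for ti in t]   # same design matrix as T in the MATLAB code

k = 3
# Build the augmented system [X'X | X's] and solve by Gaussian elimination.
M = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
     [sum(r[i] * si for r, si in zip(X, s))] for i in range(k)]
for i in range(k):
    p = max(range(i, k), key=lambda r: abs(M[r][i]))
    M[i], M[p] = M[p], M[i]
    for r in range(i + 1, k):
        f = M[r][i] / M[i][i]
        for c in range(i, k + 1):
            M[r][c] -= f * M[i][c]
b = [0.0] * k
for i in range(k - 1, -1, -1):
    b[i] = (M[i][k] - sum(M[i][c] * b[c] for c in range(i + 1, k))) / M[i][i]

fitted = [b[0] + b[1] * ti + b[2] * ti * ti for ti in t]
max_resid = max(abs(si - fi) for si, fi in zip(s, fitted))
print(b, max_resid)
```

The quadratic term dominates (the data describe a near-quadratic growth in t), and the residuals are tiny, confirming that the quadratic model fits this data very closely.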

Running results (figures omitted).

Multivariate quadratic regression

(Figures omitted.)
Method 1: use multivariate quadratic regression directly.

Code:

x1=[1000 600 1200 500 300 400 1300 1100 1300 300];
x2=[5 7 6 6 8 7 5 4 3 9];
y=[100 75 80 70 50 65 90 100 110 60]';
x=[x1' x2'];
rstool(x,y,'purequadratic')%multivariate polynomial regression, pure quadratic model

In the interactive rstool window (figures omitted), the values shown under x1 and x2 can be edited by hand; change them to whichever predictor values you need, and the corresponding prediction is displayed on the left. Clicking Export transfers beta, rmse and residuals to the MATLAB workspace (figures omitted).
As a rough rule of thumb, an rmse below 10, as obtained here, indicates an acceptable fit.
Method 2: convert the pure quadratic model into a multiple linear regression.

Code:

%Multivariate quadratic model recast as multiple linear regression
x1=[1000 600 1200 500 300 400 1300 1100 1300 300];
x2=[5 7 6 6 8 7 5 4 3 9];
y=[100 75 80 70 50 65 90 100 110 60]';
x=[ones(10,1) x1' x2' (x1.^2)' (x2.^2)'];
[b,bint,r,rint,stats]=regress(y,x);
b,stats
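A plain-Python sketch of the same conversion (illustrative only; R² is computed by hand to play the role of the first entry of stats):

```python
# Pure quadratic model y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x2^2,
# fitted as a multiple linear regression (illustrative Python sketch).
x1 = [1000, 600, 1200, 500, 300, 400, 1300, 1100, 1300, 300]
x2 = [5, 7, 6, 6, 8, 7, 5, 4, 3, 9]
y = [100, 75, 80, 70, 50, 65, 90, 100, 110, 60]
X = [[1.0, a, b, a * a, b * b] for a, b in zip(x1, x2)]

k = 5
# Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
M = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
     [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
for i in range(k):
    p = max(range(i, k), key=lambda r: abs(M[r][i]))
    M[i], M[p] = M[p], M[i]
    for r in range(i + 1, k):
        f = M[r][i] / M[i][i]
        for c in range(i, k + 1):
            M[r][c] -= f * M[i][c]
beta = [0.0] * k
for i in range(k - 1, -1, -1):
    beta[i] = (M[i][k] - sum(M[i][c] * beta[c] for c in range(i + 1, k))) / M[i][i]

fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
ybar = sum(y) / len(y)
r2 = 1 - sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) \
        / sum((yi - ybar) ** 2 for yi in y)
print([round(v, 4) for v in beta], round(r2, 4))
```

The high R² is consistent with the small rmse reported by rstool for the same model.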

Running results (figure omitted).

Nonlinear regression

(Figure omitted.)
Example: to fit the nonlinear model y = a·e^(b/x), first create the M-file volum.m containing the model function (its two lines are reproduced as comments at the top of the script below).


%Nonlinear fitting
%For the model y=a*e^(b/x), first create the M-file volum.m:
%  function yhat=volum(beta,x)
%  yhat=beta(1)*exp(beta(2)./x);
%Input data
x=2:16;
y=[6.42 8.20 9.58 9.5 9.7 10 9.93 9.99 10.49 10.59 10.60 10.80 10.60 10.90 10.76];
beta0=[8 2]';%initial values
%Estimate the regression coefficients
[beta,r,J]=nlinfit(x',y','volum',beta0);
beta
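Because y = a·e^(b/x) is linearizable (ln y = ln a + b/x), the nlinfit result can be cross-checked with an ordinary straight-line fit. The sketch below is illustrative Python; note that the linearized answer differs slightly from the true nonlinear least-squares fit.

```python
import math

# Cross-check of the nonlinear fit by linearization:
# ln y = ln a + b * (1/x), an ordinary straight-line fit in (1/x, ln y).
x = list(range(2, 17))
y = [6.42, 8.20, 9.58, 9.5, 9.7, 10, 9.93, 9.99, 10.49, 10.59,
     10.60, 10.80, 10.60, 10.90, 10.76]
u = [1.0 / xi for xi in x]
v = [math.log(yi) for yi in y]
n = len(u)
ubar, vbar = sum(u) / n, sum(v) / n
b_hat = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v)) \
        / sum((ui - ubar) ** 2 for ui in u)
a_hat = math.exp(vbar - b_hat * ubar)   # back-transform the intercept
print(a_hat, b_hat)   # compare with the nlinfit values 11.6036 and -1.0641
```

The linearized estimates land close to the nlinfit answer, which also makes them a reasonable choice of starting values beta0.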

Running results (figures omitted): the fitted regression model is
y = 11.6036·e^(-1.0641/x)
(Plot of the fit omitted.)

Stepwise regression

(Figures omitted.)
code

x1=[7 1 11 11 7 11 3 1 2 21 1 11 10]';
x2=[26 29 56 31 52 55 71 31 54 47 40 66 68]';
x3=[6 15 8 8 6 9 17 22 18 4 23 9 8]';
x4=[60 52 20 47 33 22 6 44 22 26 34 12 12]';
y=[78.5 74.3 104.3 87.6 95.9 109.2 102.7 72.5 93.1 115.9 83.8 113.3 109.4]';
x=[x1 x2 x3 x4];
stepwise(x,y)

Running stepwise opens an interactive window (figure omitted). Click Next Step to move variables into the model one at a time and watch how each statistic changes, until the best combination of independent variables is found. The first click moves in x4 (figure omitted); the next click moves in x1 (figure omitted). In these two steps RMSE drops sharply and the correlation coefficient rises markedly toward 1. Once the software can no longer move variables in on its own, you can click the red or blue lines in the plot to move individual variables in or out by hand.
After repeated checking, moving x1 or x2 out degrades the model noticeably, so only x1 and x2 are retained.
Then export the results; they are stored in the workspace. Finally, enter the code:

X=[ones(13,1) x1 x2];
b=regress(y,X)%linear regression of y on x1 and x2
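As a cross-check, the same two-variable fit can be reproduced in plain Python by solving the normal equations (an illustrative sketch; for this classical data set the result is approximately y = 52.58 + 1.468·x1 + 0.662·x2):

```python
# Two-predictor fit y = b0 + b1*x1 + b2*x2 via the normal equations
# (illustrative Python counterpart of the final regress call).
x1 = [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10]
x2 = [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]
y = [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4]
X = [[1.0, a, b] for a, b in zip(x1, x2)]

k = 3
# Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
M = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
     [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
for i in range(k):
    p = max(range(i, k), key=lambda r: abs(M[r][i]))
    M[i], M[p] = M[p], M[i]
    for r in range(i + 1, k):
        f = M[r][i] / M[i][i]
        for c in range(i, k + 1):
            M[r][c] -= f * M[i][c]
b = [0.0] * k
for i in range(k - 1, -1, -1):
    b[i] = (M[i][k] - sum(M[i][c] * b[c] for c in range(i + 1, k))) / M[i][i]
print([round(v, 3) for v in b])
```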

(Figure omitted.)


Source: blog.csdn.net/Luohuasheng_/article/details/128601423