Background
The main nutrients required for crop growth in a certain region are nitrogen (N), phosphorus (P), and potassium (K). A crop research institute ran a series of fertilization experiments on potatoes and lettuce in this region. Table 4-1 records the experimental data for potatoes and Table 4-2 the data for lettuce. Here hm² denotes hectares, t denotes tons, and kg denotes kilograms. When the application rate of one nutrient is varied, the rates of the other two nutrients are held at their seventh level. For example, in the N experiment on potato yield, P and K were applied at 196 kg/hm² and 372 kg/hm² respectively.
Objective: Analyze the relationship between fertilizer application and yield, and assess the results in terms of their practical value and how they could be improved.
Table 4-1 Effects of N, P, and K on potato yield:
N (kg/hm²) | Yield (t/hm²) | P (kg/hm²) | Yield (t/hm²) | K (kg/hm²) | Yield (t/hm²)
0   | 15.81 | 0   | 33.46 | 0   | 18.98
34  | 21.36 | 24  | 32.47 | 47  | 27.35
67  | 25.72 | 49  | 36.06 | 93  | 34.86
101 | 32.29 | 73  | 37.96 | 140 | 39.52
135 | 34.03 | 98  | 41.04 | 186 | 38.44
202 | 39.45 | 147 | 40.09 | 279 | 37.73
259 | 43.15 | 196 | 41.26 | 372 | 38.43
336 | 43.46 | 245 | 42.17 | 465 | 43.87
404 | 40.83 | 294 | 40.36 | 558 | 42.77
471 | 30.75 | 342 | 42.73 | 651 | 46.22
Table 4-2 Effects of N, P, and K on lettuce yield:
N (kg/hm²) | Yield (t/hm²) | P (kg/hm²) | Yield (t/hm²) | K (kg/hm²) | Yield (t/hm²)
0   | 11.2  | 0   | 6.39  | 0   | 15.75
28  | 12.70 | 49  | 9.48  | 47  | 16.67
56  | 14.56 | 98  | 12.46 | 93  | 16.89
84  | 16.27 | 147 | 14.33 | 140 | 16.24
112 | 17.75 | 196 | 17.10 | 186 | 17.56
169 | 22.59 | 294 | 21.94 | 279 | 19.20
224 | 21.63 | 391 | 22.64 | 372 | 17.97
280 | 19.34 | 489 | 21.34 | 465 | 15.84
336 | 16.12 | 587 | 22.07 | 558 | 20.11
392 | 14.11 | 685 | 24.53 | 651 | 19.40
>> % Potato data: tn/yn = N rate and yield, tp/yp = P rate and yield, tk/yk = K rate and yield
>> tn=[0 34 67 101 135 202 259 336 404 471];
>> yn=[15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];
>> tp=[0 24 49 73 98 147 196 245 294 342];
>> yp=[33.46 32.47 36.06 37.96 41.04 40.09 41.26 42.17 40.36 42.73];
>> tk=[0 47 93 140 186 279 372 465 558 651];
>> yk=[18.98 27.35 34.86 39.52 38.44 37.73 38.43 43.87 42.77 46.32];
>> % Lettuce data: sn/xn = N rate and yield, sp/xp = P rate and yield, sk/xk = K rate and yield
>> sn=[0 28 56 84 112 168 224 280 336 392];
>> xn=[11.02 12.70 14.56 16.27 17.75 22.59 21.63 19.34 16.12 14.11];
>> sp=[0 49 98 147 196 294 391 489 587 685];
>> xp=[6.39 9.48 12.46 14.33 17.10 21.49 22.46 21.34 22.07 24.53];
>> sk=[0 47 93 140 186 279 372 465 558 651];
>> xk=[15.75 16.76 16.89 16.24 17.56 19.20 17.97 15.84 20.11 19.40];
1. The relationship between potato yield and nitrogen fertilizer
1.1 Draw a scatter plot
Take potato yield against nitrogen application rate as the example. A first look at the scatter plot shows yield rising and then falling as the rate increases, so a quadratic function is used as the empirical equation.
1.2 Regression model
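$$Y_n = a_0 + a_1 t_n + a_2 t_n^2 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) \qquad [3.1]$$
where Yn is the potato yield and tn is the nitrogen application rate.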
1.3 Model parameter solution and significance test
Call MATLAB's regression function regress; its inputs and outputs are explained below.
>> clear % clear all variables from the workspace
>> load d:\tudou % reload the stored data
>> [b,bt,r,rt,st]=regress(yn',[ones(length(tn),1),tn',(tn.^2)']) % call syntax
- b: regression coefficients, in ascending powers (a0, a1, a2)
- bt: confidence intervals of the regression coefficients (95% by default)
- r: residuals, that is, the differences between the measured values and the fitted values
- rt: 95% confidence intervals of the residuals
- st: model test statistics (R2, F, p, sig2). R2 is the coefficient of determination, which measures how much of the variation in the dependent variable the equation explains; F and p give the significance test of the equation; sig2 is the estimate of σ².
- regress: MATLAB's regression command
- yn': the dependent variable (column vector)
- [ones(length(tn),1),tn',(tn.^2)']: the design matrix, whose i-th row is (1, t_i, t_i²):
$$X = \begin{pmatrix} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_{10} & t_{10}^2 \end{pmatrix}$$
Regression coefficients
b =
14.7416 a0
0.1971 a1
-0.0003 a2
95% confidence interval for regression coefficient
bt =
12.6301 16.8532
0.1736 0.2207
-0.0004 -0.0003
A coefficient is significant only if its confidence interval excludes 0. If the interval contains 0, the coefficient is not significantly different from zero at the 5% level, and the corresponding term may contribute little to the model. Here none of the three intervals contains 0.
The equation significance statistic F=251.7971>Fα(2,7), so the regression is significant.
Ideally the confidence interval of each residual contains 0, which indicates that the residual is consistent with zero-mean noise. If a residual's confidence interval excludes 0, that observation is likely an outlier.
Denote the residual sum of squares by SSE. Then SSE/(n-p-1) is an unbiased estimate of σ², where n is the number of observations and p is the number of regressors (here n=10 and p=2).
In this problem R2=0.9863, which means that the quadratic model explains about 98.6% of the variation in potato yield over the nitrogen rates tested.
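The statistics that regress returns in st can be reproduced from the residuals with base MATLAB, which makes their definitions concrete. The following is a minimal sketch, assuming the potato nitrogen vectors tn and yn as entered above; the names X, SSE, SST, R2, F, and sig2 are illustrative and do not appear in the original session.

```matlab
% Potato nitrogen data as entered in the session above
tn = [0 34 67 101 135 202 259 336 404 471];
yn = [15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];

X = [ones(length(tn),1), tn', (tn.^2)'];  % design matrix: columns 1, t, t^2
y = yn';
b = X \ y;                                % least-squares coefficients a0, a1, a2
r = y - X*b;                              % residuals: measured minus fitted

n = length(y);                            % number of observations
p = size(X,2) - 1;                        % number of regressors (constant excluded)
SSE = sum(r.^2);                          % residual sum of squares
SST = sum((y - mean(y)).^2);              % total sum of squares
R2   = 1 - SSE/SST;                       % coefficient of determination
F    = ((SST - SSE)/p) / (SSE/(n-p-1));   % F statistic on (p, n-p-1) degrees of freedom
sig2 = SSE/(n-p-1);                       % unbiased estimate of sigma^2
```

Comparing [R2 F sig2] with the first, second, and fourth entries of st is a quick consistency check on the regress output.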
1.4 Model correction or improvement
From the calculation above, the right endpoint of the residual confidence interval of the 10th sample is less than 0, so the 10th sample may be abnormal. After removing the 10th sample (alternatively, it can be replaced by linear interpolation), the regression gives R2=0.9956, F=678.5246, p<0.001.
The regression equation is highly significant.
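The outlier check described above can be automated by testing each row of rt. The sketch below assumes the Statistics and Machine Learning Toolbox (which provides regress) and the vectors tn and yn from the session; the names out, keep, b2, and st2 are illustrative only.

```matlab
% Potato nitrogen data as entered in the session above
tn = [0 34 67 101 135 202 259 336 404 471];
yn = [15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];

X = [ones(length(tn),1), tn', (tn.^2)'];
[b,bt,r,rt,st] = regress(yn', X);

% A residual interval that excludes 0 marks a suspect observation
out = find(rt(:,1) > 0 | rt(:,2) < 0);

% Refit the quadratic without the flagged observations
keep = setdiff(1:length(tn), out);
[b2,bt2,r2,rt2,st2] = regress(yn(keep)', X(keep,:));
```

Inspecting out before refitting is advisable, because dropping points changes the remaining residual intervals and a second pass may flag different samples.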
1.5 Forecast
According to the above analysis, for a given nitrogen application rate tn, the potato yield Yn follows the distribution
$$Y_n \sim N\left(a_0 + a_1 t_n + a_2 t_n^2,\ \sigma^2\right)$$
where σ² is replaced by its point estimate from the residual sum of squares, $\hat\sigma^2 = SSE/(n-p-1)$.
Then, in potato cultivation, when the nitrogen rate is tn=380 kg per hectare, the 95% confidence interval of the potato yield Yn is (39.8725, 42.9181).
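As a worked check, the reported interval can be read back into a point prediction and a spread. This is only a sketch: it assumes the interval is symmetric and was built as $\hat y \pm 1.96\,\hat\sigma$ with a normal quantile, which the source does not state.

```latex
\begin{aligned}
\hat y &= \frac{39.8725 + 42.9181}{2} = 41.3953 \\
1.96\,\hat\sigma &= \frac{42.9181 - 39.8725}{2} = 1.5228 \\
\hat\sigma &\approx \frac{1.5228}{1.96} \approx 0.777
\end{aligned}
```

If a t quantile was used instead, the implied $\hat\sigma$ would be somewhat smaller.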
2. Relationship between potato yield and phosphate fertilizer (1)
2.1 Model guessing
Once the phosphate application rate exceeds about 100 kg/hm², potato yield barely increases; that is, yield appears to have an upper bound as a function of the phosphate rate. The following empirical functions can be tried:
- Exponential function
- Hyperbolic function
- Logistic curve
- Logarithmic function
- Polynomial function, y = p_m(x)
- Power function
Apart from the polynomial, which is linear in its coefficients, each of these leads to nonlinear regression, for which error and reliability analysis is less mature and less convenient than for linear regression.
2.2 Data preprocessing
(1) Inspection of the data and the scatter plot shows that the potato yield at a phosphate rate of 24 kg/hm² is lower than the yield with no phosphate at all. The second data point is therefore treated as contaminated and replaced by linear interpolation between the first and third points.
(2) Whichever of the nonlinear models above is used, it involves a negative exponential of the independent variable. Because the raw phosphate rates are large, the negative exponential is close to 0 and the fit becomes overly sensitive. The phosphate rate is therefore standardized first:
$$t_p' = \frac{t_p - \bar{t}_p}{s_{t_p}}$$
where $\bar{t}_p$ and $s_{t_p}$ are the sample mean and standard deviation of tp.
2.3 Linearization of empirical formulas
As the figure above shows, the potato yield does not exceed 44, and the upper limit of the Weibull function is A. Taking A=44 gives
$$y = 44 - B e^{-C t_p'}$$
Setting Y=ln(44-y), a0=lnB, and a1=-C linearizes the Weibull function to Y = a0 + a1 tp'. A scatter plot of (tp', Y) can then be drawn to check the guess.
As the figure below shows, the points lie close to a straight line, which supports the guess.
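The linearization can be written out step by step. This derivation assumes the Weibull-type form $y = A - B e^{-Cx}$ with $B > 0$, $C > 0$, and $y < A$, which is the form consistent with the substitutions Y=ln(44-y), a0=lnB, a1=-C used in this section.

```latex
\begin{aligned}
y &= A - B e^{-Cx} \\
A - y &= B e^{-Cx} \\
\ln(A - y) &= \ln B - Cx \\
Y &= a_0 + a_1 x, \qquad Y = \ln(A - y),\quad a_0 = \ln B,\quad a_1 = -C
\end{aligned}
```

Because $e^{-Cx} \to 0$ as $x$ grows, $y$ approaches $A$ from below, which is why A plays the role of the maximum yield and why fixing A=44 is needed before the logarithm can be taken.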
2.4 Build model
Yp denotes the potato yield when phosphate fertilizer is applied and tp the phosphate application rate (tp' is its standardized value). The model is set to
$$Y_p = 44 - B e^{-C t_p'} \qquad [3.2]$$
Taking Y=ln(44-Yp), a0=lnB, a1=-C, [3.2] is equivalent to the following linear regression model
$$Y = a_0 + a_1 t_p' \qquad [3.3]$$
2.5 Parameter solution
The MATLAB code is as follows:
>> clear
>> load d:\tudou
>> yp(2)=(yp(1)+yp(3))/2;
>> tp1=(tp-mean(tp))/std(tp);
>> Y=log(44-yp);
>> plot(tp1,Y,'*')
>> [mean(tp),std(tp)]
ans =
146.8000 118.2547
>> [a,at,r,rt,st]=regress(Y',[ones(length(tp),1),tp1']);
>> a
a =
1.4041
-0.6211
>> at
at =
1.1495 1.6587
-0.8895 -0.3527
>> st
st =
0.7807 28.4829 0.0007 0.1219
>> rt
rt =
-0.5561 0.9161
-0.5894 0.9383
-0.6351 0.9434
-0.8102 0.8236
-1.2249 0.0744
-0.8761 0.7971
-0.9568 0.6814
-1.0436 0.4756
0.1942 1.1279
-0.8104 0.5307
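The residual interval of the 9th sample, (0.1942, 1.1279), does not contain 0, so the 9th sample is treated as abnormal. It is removed and the regression is recomputed.
>> Y(9)=[];tp1(9)=[];
>> [a,at,r,rt,st]=regress(Y',[ones(length(tp1),1),tp1']);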
>> st
st =
0.9154 75.7884 0.0001 0.0536
(highly significant)
>> a
a =
1.3133
-0.7467
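The regression result is
$$\ln(44 - Y_p) = 1.3133 - 0.7467\,t_p', \qquad \text{that is,} \qquad Y_p = 44 - e^{1.3133 - 0.7467\,t_p'}$$
where $t_p' = (t_p - 146.8)/118.2547$.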
The comparison between experimental values and theoretical values is shown in the figure below.
tp (kg/hm²) | Theoretical value | Experimental value
0   | 34.6242 | 33.4600
24  | 35.9398 | 34.7600
49  | 37.1144 | 36.0600
73  | 38.0806 | 37.9600
98  | 38.9432 | 41.0400
147 | 40.2863 | 40.0900
196 | 41.2726 | 41.2600
245 | 41.9970 | 42.1700
294 | 42.5290 | 40.3600
342 | 42.9129 | 42.7300
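The theoretical values come from inverting the transformation Y=ln(44-y). A minimal sketch of that back-transformation, assuming the rounded coefficients 1.3133 and -0.7467 reported above and the same standardization of tp; the name yfit is illustrative.

```matlab
tp = [0 24 49 73 98 147 196 245 294 342];  % phosphate rates, kg/hm^2
a  = [1.3133; -0.7467];                    % coefficients reported after removing sample 9

tp1  = (tp - mean(tp)) / std(tp);          % same standardization as in the session
yfit = 44 - exp(a(1) + a(2)*tp1);          % invert Y = ln(44 - y) to get predicted yield
```

Because a is rounded to four decimals here, yfit agrees with the listed theoretical values only to within a few hundredths.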
3. Relationship between potato yield and phosphate fertilizer (2)
In the relationship between potato yield and phosphate fertilizer (1) above, the maximum potato yield was assumed to be 44 before linearization, an assumption that is not always reasonable. When a model cannot be linearized, it has to be fitted by nonlinear regression.
3.1 Empirical model
$$Y_p = \beta_1 + \beta_2 \ln(t_p + \beta_3)$$
3.2 Model solution
Call MATLAB's least-squares curve-fitting function lsqcurvefit to solve for the model parameter vector beta. First write the M-file tpfun.m that defines the model's empirical function.
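The contents of tpfun.m are not shown in this text. The following is a hypothetical reconstruction, assuming a shifted logarithmic model; that form is inferred from the reported beta and the fitted values in section 3.3, which it reproduces approximately (for example, about 32.5365 at x=0).

```matlab
function y = tpfun(beta, x)
% TPFUN  Hypothetical reconstruction of the empirical function (not the original file).
%   beta : parameter vector [beta1 beta2 beta3]
%   x    : phosphate application rates (column vector)
%   y    : predicted potato yield, y = beta1 + beta2*log(x + beta3)
y = beta(1) + beta(2) * log(x + beta(3));
end
```

lsqcurvefit expects the model function to take the parameter vector first and the data second, which is why the signature is tpfun(beta, x).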
The calling format is:
>> clear
>> load d:\tudou
>> x=tp';y=yp';
>> [beta,res]=lsqcurvefit(@tpfun,[40,0.2,2],x,y)
In this call:
- beta: the returned parameter estimates, in the order used in tpfun.m
- res: the residual sum of squares, that is, the sum of squared differences between the fitted values and the experimental values
- @tpfun: a handle to the function defined in the M-file tpfun.m
- [40,0.2,2]: the initial value of the parameter vector beta
- x: the phosphate application rates in the potato trial (column vector)
- y: the potato yields in the potato trial (column vector)
The calculation result is
beta =
19.4643 3.9464 27.4523
res =
18.1411
That is, the fitted equation of potato yield against phosphate rate is:
$$Y_p = 19.4643 + 3.9464 \ln(t_p + 27.4523)$$
3.3 Error analysis
The fitting error of potato yield against phosphate rate, measured by the residual sum of squares, is res=18.1411.
Comparison of fitted values and experimental values
Experimental value y | Fitted value y*
33.4600 | 32.5365
32.4700 | 35.0157
36.0600 | 36.5785
37.9600 | 37.6559
41.0400 | 38.5330
40.0900 | 39.8342
41.2600 | 40.8111
42.1700 | 41.5935
40.3600 | 42.2462
42.7300 | 42.7954
Distribution diagram of the difference between experimental values and fitted values of potato yield
>> E=y-tpfun(beta,x)
E =
0.9235
-2.5457
-0.5185
0.3041
2.5070
0.2558
0.4489
0.5765
-1.8862
-0.0654
3.4 Normality hypothesis test
(1) Draw a normal probability plot
>> normplot(E)
Except for 3 points, the points lie close to the straight line and are spread evenly on both sides of it, so the data approximately follow a normal distribution.
(2) Jarque-Bera normality test (jbtest)
>> [h,p,jbstat,critval]=jbtest(E,0.05)
h =
0
p =
0.5000
jbstat =
0.1221
critval =
2.5239
jbtest rejects the null hypothesis H0 (that E is normally distributed) when h=1, or equivalently when jbstat>critval. Here h=0 and jbstat=0.1221<critval=2.5239, so there is no reason to reject H0.
E is therefore considered to follow a normal distribution with mean 0.
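The jbstat value can be reproduced from the sample skewness and kurtosis of E, which shows what the test measures. This is a minimal sketch in base MATLAB using the ten residuals printed above; the names d, m2, m3, m4, S, K, and JB are illustrative.

```matlab
% Residuals E as printed in section 3.3
E = [0.9235 -2.5457 -0.5185 0.3041 2.5070 0.2558 0.4489 0.5765 -1.8862 -0.0654]';

n  = length(E);
d  = E - mean(E);          % centered residuals
m2 = mean(d.^2);           % second central moment
m3 = mean(d.^3);           % third central moment
m4 = mean(d.^4);           % fourth central moment

S  = m3 / m2^1.5;          % sample skewness (0 for a normal distribution)
K  = m4 / m2^2;            % sample kurtosis (3 for a normal distribution)
JB = n/6 * (S^2 + (K - 3)^2/4);   % Jarque-Bera statistic
```

For these residuals JB evaluates to about 0.122, in line with the jbstat reported by jbtest; the small value reflects skewness near 0 and kurtosis near 3.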