Statistical model - analysis of fertilization effects

Background

The nutrients crops need for growth are mainly nitrogen (N), phosphorus (P), and potassium (K). A crop research institute carried out fertilization experiments on potatoes and lettuce in one region. The results are given in Table 4-1 (potatoes) and Table 4-2 (lettuce), where hm2 denotes hectares, t tons, and kg kilograms. When the application rate of one nutrient is varied, the rates of the other two are held at their seventh level. For example, in the N experiment on potatoes, P and K were applied at 196 kg/hm2 and 372 kg/hm2 respectively.

Objective: Analyze the relationship between fertilizer application and yield, and assess the practical value of the results and how the analysis could be improved.

Table 4-1 N, P and K effects on potatoes:

| N applied (kg/hm2) | Yield (t/hm2) | P applied (kg/hm2) | Yield (t/hm2) | K applied (kg/hm2) | Yield (t/hm2) |
|---|---|---|---|---|---|
| 0 | 15.81 | 0 | 33.46 | 0 | 18.98 |
| 34 | 21.36 | 24 | 32.47 | 47 | 27.35 |
| 67 | 25.72 | 49 | 36.06 | 93 | 34.86 |
| 101 | 32.29 | 73 | 37.96 | 140 | 39.52 |
| 135 | 34.03 | 98 | 41.04 | 186 | 38.44 |
| 202 | 39.45 | 147 | 40.09 | 279 | 37.73 |
| 259 | 43.15 | 196 | 41.26 | 372 | 38.43 |
| 336 | 43.46 | 245 | 42.17 | 465 | 43.87 |
| 404 | 40.83 | 294 | 40.36 | 558 | 42.77 |
| 471 | 30.75 | 342 | 42.73 | 651 | 46.22 |

Table 4-2 N, P and K effects on lettuce:

| N applied (kg/hm2) | Yield (t/hm2) | P applied (kg/hm2) | Yield (t/hm2) | K applied (kg/hm2) | Yield (t/hm2) |
|---|---|---|---|---|---|
| 0 | 11.2 | 0 | 6.39 | 0 | 15.75 |
| 28 | 12.70 | 49 | 9.48 | 47 | 16.67 |
| 56 | 14.56 | 98 | 12.46 | 93 | 16.89 |
| 84 | 16.27 | 147 | 14.33 | 140 | 16.24 |
| 112 | 17.75 | 196 | 17.10 | 186 | 17.56 |
| 169 | 22.59 | 294 | 21.94 | 279 | 19.20 |
| 224 | 21.63 | 391 | 22.64 | 372 | 17.97 |
| 280 | 19.34 | 489 | 21.34 | 465 | 15.84 |
| 336 | 16.12 | 587 | 22.07 | 558 | 20.11 |
| 392 | 14.11 | 685 | 24.53 | 651 | 19.40 |

>> % Potatoes: application rates tn, tp, tk (kg/hm2) and yields yn, yp, yk (t/hm2)
>> tn=[0 34 67 101 135 202 259 336 404 471];
>> yn=[15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];
>> tp=[0 24 49 73 98 147 196 245 294 342];
>> yp=[33.46 32.47 36.06 37.96 41.04 40.09 41.26 42.17 40.36 42.73];
>> tk=[0 47 93 140 186 279 372 465 558 651];
>> yk=[18.98 27.35 34.86 39.52 38.44 37.73 38.43 43.87 42.77 46.32];
>> % Lettuce: application rates sn, sp, sk (kg/hm2) and yields xn, xp, xk (t/hm2)
>> sn=[0 28 56 84 112 168 224 280 336 392];
>> xn=[11.02 12.70 14.56 16.27 17.75 22.59 21.63 19.34 16.12 14.11];
>> sp=[0 49 98 147 196 294 391 489 587 685];
>> xp=[6.39 9.48 12.46 14.33 17.10 21.49 22.46 21.34 22.07 24.53];
>> sk=[0 47 93 140 186 279 372 465 558 651];
>> xk=[15.75 16.76 16.89 16.24 17.56 19.20 17.97 15.84 20.11 19.40];

1. The relationship between potato yield and nitrogen fertilizer

1.1 Draw a scatter plot

A scatter plot of potato yield against nitrogen application shows the yield rising and then falling as more nitrogen is applied, so a quadratic function is adopted as the empirical equation.
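The scatter plot itself is not reproduced here. A minimal sketch of how it could be drawn is given below, using the tn and yn vectors defined earlier. The quadratic overlay uses polyfit, which is a choice made for this illustration and not necessarily what the original plot used.

```matlab
% Potato nitrogen trial, copied from the data vectors defined earlier.
tn = [0 34 67 101 135 202 259 336 404 471];                          % N applied, kg/hm2
yn = [15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];  % yield, t/hm2

% Scatter plot of yield against nitrogen application.
plot(tn, yn, 'o');
xlabel('N applied (kg/hm^2)');
ylabel('Potato yield (t/hm^2)');
hold on;

% Overlay a quadratic least-squares fit (assumption: polyfit is used here
% only for illustration; it returns coefficients in descending powers).
c  = polyfit(tn, yn, 2);
tt = linspace(min(tn), max(tn), 200);
plot(tt, polyval(c, tt), '-');
hold off;
```

A negative leading coefficient c(1) corresponds to the downward-opening parabola suggested by the scatter.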

1.2 Regression model

$$ y_n = a_0 + a_1 t_n + a_2 t_n^2 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \qquad [3.1] $$

1.3 Model parameter solution and significance test

Call MATLAB's regression function regress; its inputs and outputs are explained below.

>> clear     % clear all variables from the workspace
>> load d:\tudou  % reload the stored data
>> [b,bt,r,rt,st]=regress(yn',[ones(length(tn),1),tn',(tn.^2)'])  % calling format
  • b: regression coefficients, in ascending powers
  • bt: confidence intervals of the regression coefficients (95% by default)
  • r: residuals, that is, the measured values minus the fitted values
  • rt: 95% confidence intervals of the residuals
  • st: model test statistics (R2, F, p, sig2). R2 is the coefficient of determination, which measures how well the independent variables explain the dependent variable; F and p give the significance test of the equation; sig2 is the estimate of σ2.

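regress: MATLAB's regression command

yn': the dependent variable (column vector)

[ones(length(tn),1),tn',(tn.^2)']: the design matrix, of the form

$$ X = \begin{pmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_{10} & t_{10}^2 \end{pmatrix} $$

where t_1, ..., t_10 are the entries of tn.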

Regression coefficients

b =
   14.7416    (a0)
    0.1971    (a1)
   -0.0003    (a2)

95% confidence intervals of the regression coefficients

bt =
   12.6301   16.8532
    0.1736    0.2207
   -0.0004   -0.0003

The confidence interval of a regression coefficient should not contain 0. If it does, the coefficient is not significantly different from zero at the 5% level, which means the corresponding term contributes little to the model.

 

The equation significance statistic F = 251.7971 > Fα(2,7) shows that the regression is significant.

The confidence interval of a residual should contain 0, which indicates that the residual is small. If the confidence interval of a residual does not contain 0, the corresponding observation is likely to be an outlier.

The residual sum of squares is denoted SSE, and SSE/(n-p-1) is an unbiased estimate of σ2, where n is the number of samples and p the number of regressors excluding the constant term (here n=10, p=2).

In this problem, R2=0.9863 means that about 98.6% of the variation in potato yield is explained by the quadratic model in nitrogen application.
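To make the entries of st concrete, the sketch below recomputes R2, F and the σ2 estimate directly from their definitions, using only base MATLAB. The variable names (SSE, SST, sig2) are chosen for this illustration, and the backslash solve stands in for regress.

```matlab
tn = [0 34 67 101 135 202 259 336 404 471];
yn = [15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];

X = [ones(numel(tn),1), tn', (tn.^2)'];  % design matrix: 1, t, t^2
y = yn';
b = X \ y;                               % least-squares coefficients, ascending powers
e = y - X*b;                             % residuals (measured minus fitted)

n = numel(y);                            % number of samples
p = size(X,2) - 1;                       % regressors excluding the constant

SSE  = sum(e.^2);                        % residual sum of squares
SST  = sum((y - mean(y)).^2);            % total sum of squares
R2   = 1 - SSE/SST;                      % coefficient of determination
F    = ((SST - SSE)/p) / (SSE/(n - p - 1));  % overall F statistic, df (p, n-p-1)
sig2 = SSE/(n - p - 1);                  % unbiased estimate of sigma^2

fprintf('R2 = %.4f, F = %.4f, sig2 = %.4f\n', R2, F, sig2);
```

These three quantities correspond to st(1), st(2) and st(4) returned by regress.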

1.4 Model correction or improvement

The calculation above shows that the right endpoint of the residual confidence interval of the 10th sample is less than 0, so the 10th sample may be an outlier. After removing it (alternatively, it can be replaced by a linearly interpolated value), the regression gives R2=0.9956, F=678.5246, p<0.0001, so the regression equation is highly significant.
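The outlier check and refit described above can be sketched as follows. This assumes the Statistics Toolbox function regress is available. The names out, keep, b2 and st2 are introduced only for this example, and sample 10 is dropped explicitly because that is the sample the text identifies.

```matlab
tn = [0 34 67 101 135 202 259 336 404 471];
yn = [15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];
X  = [ones(numel(tn),1), tn', (tn.^2)'];

% Full fit: rt holds the confidence interval of each residual.
[b, bt, r, rt, st] = regress(yn', X);

% Samples whose residual interval does not contain 0 are outlier candidates.
out = find(rt(:,1) > 0 | rt(:,2) < 0);
disp(out');

% Refit without the 10th sample, as described in the text.
keep = true(size(tn));
keep(10) = false;
[b2, bt2, r2, rt2, st2] = regress(yn(keep)', X(keep,:));
disp(st2);   % R2, F, p, sig2 of the refit
```

Removing a point is the simplest correction. Whether a suspicious observation should instead be re-measured or down-weighted is a judgment call that depends on how the experiment was run.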

 1.5 Forecast

According to the above analysis, for a given nitrogen application tn the potato yield Yn follows the distribution

$$ Y_n \sim N\left(a_0 + a_1 t_n + a_2 t_n^2,\ \sigma^2\right) $$

where σ2 is replaced by its point estimate based on the residual sum of squares.

Then, in potato cultivation, when the nitrogen application per hectare is tn=380 kg, the 95% interval for the potato yield is (39.8725, 42.9181).
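A forecast of this kind can be sketched as below. Two assumptions are made that the text does not state: the fit with the 10th sample removed is used, and the interval is taken as the point prediction plus or minus 1.96 estimated standard deviations. The interval printed by this sketch therefore need not match the one quoted above exactly.

```matlab
tn = [0 34 67 101 135 202 259 336 404 471];
yn = [15.18 21.36 25.72 32.29 34.03 39.45 43.15 43.46 40.83 30.75];

% Assumption: forecast from the fit without the 10th sample.
keep = 1:9;
X = [ones(numel(keep),1), tn(keep)', (tn(keep).^2)'];
[b, ~, ~, ~, st] = regress(yn(keep)', X);

t0    = 380;                    % nitrogen application, kg/hm2
y0    = [1, t0, t0^2] * b;      % point prediction of the yield
sigma = sqrt(st(4));            % st(4) is the estimate of sigma^2

% Assumption: normal approximation with the 0.975 quantile 1.96.
lo = y0 - 1.96*sigma;
hi = y0 + 1.96*sigma;
fprintf('Predicted yield %.4f, approximate 95%% interval (%.4f, %.4f)\n', y0, lo, hi);
```

With so few samples, a t quantile and an allowance for the uncertainty of the coefficients would give a somewhat wider interval.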

2. Relationship between potato yield and phosphate fertilizer (1)

2.1 Model guessing

Once the phosphate fertilizer application exceeds about 100 kg/hm2, the potato yield barely increases; that is, the yield appears to have an upper bound as a function of phosphate application. The following empirical functions can be tried:

  1. (exponential function)
  2. (hyperbolic function)
  3. (Logistic curve)
  4. (Logarithmic function)
  5. y=pm (x) (polynomial function)
  6. (Power function)

Most of these lead to nonlinear regression, for which error and reliability analysis are less mature and less convenient than for linear regression.

 2.2 Data preprocessing

(1) Inspection of the data and the scatter plot shows that the potato yield at a phosphate rate of 24 kg/hm2 is lower than the yield without fertilization. The second data point is therefore considered contaminated and is replaced by a value interpolated linearly from the first and third points (approximated in the code below by their average).

(2) Each of the nonlinear models above involves a negative exponential of the independent variable. Because the raw application rates are large, the exponential term is close to 0 and the fit becomes overly sensitive. The phosphate application is therefore standardized first:

$$ t_p' = \frac{t_p - \bar{t}_p}{s_{t_p}} $$

where \bar{t}_p and s_{t_p} are the sample mean and sample standard deviation of tp.

2.3 Linearization of empirical formulas

As the scatter plot shows, the potato yield does not exceed 44, and the upper bound of the Weibull function is A. Assuming A=44, we have

$$ y = 44 - B e^{-C t_p'} $$

Taking Y=ln(44-y), a0=lnB, a1=-C, the Weibull function is linearized to Y = a0 + a1 tp'. A scatter plot of (tp', Y) can then be drawn to verify the guess.

As shown in the picture below, the guess is correct.

2.4 Build model

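Yp represents the potato yield when phosphate fertilizer is applied, and tp represents the amount of phosphate fertilizer applied (tp' is its standardized value). The model is set to

$$ Y_p = 44 - B e^{-C t_p'} \qquad [3.2] $$

Taking Y=ln(44-Yp), a0=lnB, a1=-C, [3.2] is equivalent to the following linear regression model:

$$ Y = a_0 + a_1 t_p' + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \qquad [3.3] $$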

2.5 Parameter solution

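The MATLAB calculation code is as follows

>> clear
>> load d:\tudou
>> yp(2)=(yp(1)+yp(3))/2;       % replace the contaminated 2nd yield by the average of its neighbours
>> tp1=(tp-mean(tp))/std(tp);   % standardize the phosphate application
>> Y=log(44-yp);                % linearizing transform, assuming A=44
>> plot(tp1,Y,'*')              % scatter plot of (tp', Y)
>> [mean(tp),std(tp)]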
ans =
  146.8000  118.2547
>> [a,at,r,rt,st]=regress(Y',[ones(length(tp),1),tp1']);
>> a
a =
    1.4041
   -0.6211
>> at
at =
    1.1495    1.6587
   -0.8895   -0.3527
>> st
st =
    0.7807   28.4829    0.0007    0.1219
>> rt
rt =
   -0.5561    0.9161
   -0.5894    0.9383
   -0.6351    0.9434
   -0.8102    0.8236
   -1.2249    0.0744
   -0.8761    0.7971
   -0.9568    0.6814
   -1.0436    0.4756
    0.1942    1.1279
   -0.8104    0.5307

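The residual confidence interval of the 9th sample, (0.1942, 1.1279), does not contain 0, so the 9th sample is treated as an outlier and removed before refitting.

>> Y(9)=[];tp1(9)=[];
>> [a,at,r,rt,st]=regress(Y',[ones(length(tp1),1),tp1']);
>> st   % highly significant
st =
    0.9154   75.7884    0.0001    0.0536
>> a
a =
    1.3133
   -0.7467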

The regression result is

$$ \ln(44 - Y_p) = 1.3133 - 0.7467\, t_p', \qquad t_p' = \frac{t_p - 146.8}{118.2547} $$

The comparison between experimental values and theoretical values is shown below.

| tp (kg/hm2) | Theoretical value | Experimental value |
|---|---|---|
| 0 | 34.6242 | 33.4600 |
| 24 | 35.9398 | 34.7600 |
| 49 | 37.1144 | 36.0600 |
| 73 | 38.0806 | 37.9600 |
| 98 | 38.9432 | 41.0400 |
| 147 | 40.2863 | 40.0900 |
| 196 | 41.2726 | 41.2600 |
| 245 | 41.9970 | 42.1700 |
| 294 | 42.5290 | 40.3600 |
| 342 | 42.9129 | 42.7300 |
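The theoretical values can be recovered by inverting the transform Y=ln(44-Yp). The sketch below does this with the rounded coefficients a reported above, so its output is expected to agree with the theoretical values only to within a few hundredths. The name yfit is introduced for this example.

```matlab
tp = [0 24 49 73 98 147 196 245 294 342];   % phosphate applied, kg/hm2
a  = [1.3133; -0.7467];                     % coefficients reported after removing sample 9

% Standardize with the mean and standard deviation of all ten rates.
tp1 = (tp - mean(tp)) / std(tp);

% Invert Y = ln(44 - Yp):  Yp = 44 - exp(a0 + a1*tp').
yfit = 44 - exp(a(1) + a(2)*tp1);

disp([tp', yfit']);
```

Because the fitted curve is 44 minus a decaying exponential, it rises monotonically towards 44 and never exceeds it, which is exactly the assumption built into the linearization.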


3. Relationship between potato yield and phosphate fertilizer (2)

In section 2 above, the maximum potato yield was assumed to be 44 before linearization, an assumption that is not always reasonable. If the model cannot be linearized, it has to be fitted directly by nonlinear regression.

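3.1 Empirical model

$$ y_p = \beta_1 + \beta_2 \ln(t_p + \beta_3) $$

where β = (β1, β2, β3) are the parameters to be estimated.
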
3.2 Model solution

Call MATLAB's least-squares curve fitting function lsqcurvefit to solve for the model parameters beta. First write the empirical function of the model as the m-file tpfun.m.
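The contents of tpfun.m are not reproduced in the text. The sketch below is a reconstruction, not the original file: it assumes a shifted logarithmic form, chosen because with the parameters reported further down it reproduces the listed fitted values (for example 32.5365 at x=0).

```matlab
function y = tpfun(beta, x)
% TPFUN  Empirical yield function for the phosphate trial (reconstructed sketch).
%   beta : parameter vector [beta1 beta2 beta3]
%   x    : phosphate application (scalar or vector)
%   y    : beta1 + beta2*log(x + beta3), the assumed model form
y = beta(1) + beta(2) * log(x + beta(3));
end
```

lsqcurvefit expects the model function to take the parameter vector first and the data second, which is why the arguments are ordered as (beta, x).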

 The calling format is:

>> clear
>> load d:\tudou
>> x=tp';y=yp';
>> [beta,res]=lsqcurvefit(@tpfun,[40,0.2,2],x,y)

In the call [beta,res]=lsqcurvefit(@tpfun,[40,0.2,2],x,y):

  • beta: the estimated parameters, in the order used in tpfun.m
  • res: the residual sum of squares, that is, the sum of squared differences between the fitted and experimental values
  • @tpfun: handle to the function defined in tpfun.m
  • [40,0.2,2]: initial value of the parameter vector beta
  • x: phosphate application in the potato trial (column vector)
  • y: potato yield in the potato trial (column vector)

 The calculation result is

beta =
   19.4643    3.9464   27.4523
res =
   18.1411

That is, the fitted equation of potato yield on phosphate fertilizer is:

$$ y_p = 19.4643 + 3.9464 \ln(t_p + 27.4523) $$

3.3 Error analysis

The fitting error of potato yield on phosphate fertilizer is examined below.

[Figure: comparison of fitted and experimental values]

| Experimental value y | Fitted value y* |
|---|---|
| 33.4600 | 32.5365 |
| 32.4700 | 35.0157 |
| 36.0600 | 36.5785 |
| 37.9600 | 37.6559 |
| 41.0400 | 38.5330 |
| 40.0900 | 39.8342 |
| 41.2600 | 40.8111 |
| 42.1700 | 41.5935 |
| 40.3600 | 42.2462 |
| 42.7300 | 42.7954 |

[Figure: distribution of the differences between experimental and fitted potato yields]

>> E=y-tpfun(beta,x)
E =
    0.9235
   -2.5457
   -0.5185
    0.3041
    2.5070
    0.2558
    0.4489
    0.5765
   -1.8862
   -0.0654
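As a consistency check, the residual sum of squares returned by lsqcurvefit should equal the sum of the squared entries of E. The sketch below verifies this from the printed residuals. The names SSE and rmse are introduced for this example.

```matlab
% Residuals E as printed above.
E = [0.9235 -2.5457 -0.5185 0.3041 2.5070 0.2558 0.4489 0.5765 -1.8862 -0.0654]';

SSE  = sum(E.^2);             % should agree with res = 18.1411 up to rounding
rmse = sqrt(SSE / numel(E));  % root-mean-square fitting error, t/hm2

fprintf('SSE = %.4f, RMSE = %.4f\n', SSE, rmse);
```

The root-mean-square error expresses the typical misfit in the same units as the yield, which is easier to judge than the raw sum of squares.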

3.4 Normality hypothesis test

(1) Draw a normal probability diagram

>> normplot(E)

Except for 3 points, the points lie close to the reference line and are spread evenly on both sides of it, so the residuals are approximately normally distributed.

(2) Jarque-Bera normality test (jbtest)

>> [h,p,jbstat,critval]=jbtest(E,0.05)
h =
     0
p =
    0.5000
jbstat =
    0.1221
critval =
    2.5239

jbtest rejects the null hypothesis H0 (the data are normally distributed) when h=1, that is, when jbstat>critval. Here h=0 and jbstat=0.1221<critval=2.5239, so there is no reason to reject H0.

E is considered to obey a normal distribution with mean 0.
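The statistic reported by jbtest can be reproduced by hand from the sample skewness and kurtosis of E. The sketch below uses the moment (biased) estimators and base MATLAB only. The variable names are chosen for this illustration.

```matlab
% Residuals E as printed in section 3.3.
E = [0.9235 -2.5457 -0.5185 0.3041 2.5070 0.2558 0.4489 0.5765 -1.8862 -0.0654]';

n  = numel(E);
d  = E - mean(E);            % centered residuals
m2 = mean(d.^2);             % second central moment
S  = mean(d.^3) / m2^1.5;    % sample skewness
K  = mean(d.^4) / m2^2;      % sample kurtosis (3 for a normal distribution)

% Jarque-Bera statistic: large values indicate departure from normality.
JB = n * (S^2/6 + (K - 3)^2/24);
fprintf('skewness = %.4f, kurtosis = %.4f, JB = %.4f\n', S, K, JB);
```

Both the skewness and the excess kurtosis are small here, which is why the statistic stays far below the critical value.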


Origin blog.csdn.net/m0_63024355/article/details/133157775