MATLAB-based calculation of spatial data variogram and drawing of empirical semivariogram


  In an earlier post (https://blog.csdn.net/zhebushibiaoshifu/article/details/113943720), I went through several basic concepts of geoscientific computing and worked through their mathematical derivations. Next, in a series of new posts, I will practice and explain in detail the code and operations related to geoscientific computing. This post is the first of them: MATLAB-based calculation of the spatial data variogram and drawing of the empirical semivariogram.
  The theoretical concepts covered in that earlier post are fairly abstract and are often easier to understand when combined with practice. You can therefore read it alongside this post and the later geoscientific computing posts to better understand what the theory implies.
  The data used in this post are not mine, so unfortunately I cannot share them; but by following the ideas and the detailed code explanations here, you can use your own data to fully reproduce the whole process of calculating the variogram and drawing the empirical semivariogram.

1 Data processing

1.1 Data reading

  In this post, my initial data are the spatial locations (X and Y, in meters), pH, organic matter content, and total nitrogen content of 658 soil sampling points in a certain area. These data are all stored in the file "data.xls", and most of the later operations are carried out in MATLAB. It is therefore necessary to first import the source data into MATLAB, which can be done with the MATLAB function xlsread. The specific code is given in "1.3 Normal distribution test and conversion".

1.2 Elimination of abnormal data

  Because of errors in sampling records, laboratory testing, and other steps, the sampling point data may contain individual abnormal values. I use the "mean plus standard deviation method" to filter out and eliminate them.
  I tried both the "2S" and the "3S" variants of this method and found that the "2S" variant gave the better result, so the follow-up experiment continues with the result of the "2S" method.
  Here, the "2S" method treats any value more than 2 standard deviations above or below the mean as an outlier, and the "3S" method treats any value more than 3 standard deviations above or below the mean as an outlier.
  Once the outliers are identified, they are removed from the 658 sampling points, and the remaining sampling points are used in the subsequent operations.
  The specific code for this part is given in "1.3 Normal distribution test and conversion".
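  As a small illustration of the "2S" and "3S" rules, here is a minimal sketch on made-up values (the vector x is hypothetical and is not the soil data). It uses a logical mask instead of index lists, which is one possible way to write the rule.

```matlab
% Hypothetical sample values; the last one is an obvious outlier.
x = [6.1; 6.3; 5.9; 6.0; 6.2; 6.4; 5.8; 6.1; 6.0; 9.5];
m = mean(x);
s = std(x);
% Logical masks: true where a value lies more than 2 (or 3) standard deviations from the mean
isOut2 = abs(x - m) > 2*s;
isOut3 = abs(x - m) > 3*s;
xClean = x(~isOut2);   % keep only the values not flagged by the "2S" rule
fprintf('2S flags %d value(s), 3S flags %d value(s)\n', nnz(isOut2), nnz(isOut3));
```

  In this toy sample the "2S" rule flags the last value while the "3S" rule flags nothing, which shows how much more tolerant "3S" is. When the rule is applied to real data, the same mask also has to be applied to the X and Y coordinates so that attributes and locations stay aligned.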

1.3 Normal distribution test and conversion

  The variogram calculation here rests on the assumption that the data follow a normal distribution, and sampling point data do not necessarily do so. We therefore need to test the original data for normality.
  Normality can generally be judged by numerical tests and visually, with plots such as histograms and QQ plots. This post uses both numerical and graphical checks together.
  For the numerical test, I initially planned to use the Kolmogorov-Smirnov test; but after learning that in its usual form it only tests against the standard normal distribution, I switched to the Lilliefors test.
  The Kolmogorov-Smirnov test compares the empirical distribution function of the sample with a given, fully specified distribution function to infer whether the sample comes from that distribution; used as a normality test with default settings (as with MATLAB's kstest), it tests against the standard normal distribution only.
  The Lilliefors test is a modification of the Kolmogorov-Smirnov test for the case where the mean and variance are estimated from the sample, so it can be used as a test for a normal distribution in general.
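  The difference between the two tests can be seen with a short sketch on synthetic data (the sample x is hypothetical; both functions need the Statistics and Machine Learning Toolbox). The sample is normal but not standard normal, so kstest with its default null hypothesis rejects it, while lillietest estimates the mean and variance from the sample first.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
rng(1);                          % fix the seed so the run is repeatable
x = 6 + 0.5*randn(500,1);        % hypothetical sample: normal, but not standard normal

hKS  = kstest(x);                % null hypothesis: x comes from N(0,1)
hLil = lillietest(x);            % null hypothesis: x comes from some normal distribution

fprintf('kstest rejects: %d, lillietest rejects: %d\n', hKS, hLil);
```

  A returned value of 1 means the null hypothesis is rejected at the default 5% significance level.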
  A QQ plot (Quantile-Quantile Plot) is a kind of scatter plot. Its abscissa gives the quantiles of one sample and its ordinate gives the quantiles of another; each point pairs the two quantiles that correspond to the same cumulative probability. The QQ plot therefore has the following property with respect to the straight line
  y=x
  If the points of the scatter plot lie near this line, the two samples have the same distribution. So if the abscissa (ordinate) holds the quantiles of a standard normal distribution and the points lie near a straight line (the line y=x itself when the sample is standardized), the sample represented by the ordinate (abscissa) follows, or approximately follows, a normal distribution. In this post, the abscissa holds the normal quantiles.
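  To make the construction concrete, here is a minimal sketch that builds a normal QQ plot by hand using only base MATLAB functions. The sample x is synthetic, and the plotting positions (i-0.5)/n are one common convention that I am assuming here; qqplot may differ in detail.

```matlab
rng(2);
x = 6 + 0.5*randn(501,1);            % hypothetical sample
n = numel(x);
xs = sort(x);                        % sample quantiles (ordinate)
p  = ((1:n)' - 0.5) / n;             % plotting positions (an assumed convention)
q  = -sqrt(2) * erfcinv(2*p);        % standard normal quantiles (abscissa)

plot(q, xs, '.'); hold on;
plot(q, mean(x) + std(x)*q, '-');    % reference line through mean + std * quantile
hold off;
xlabel('Standard normal quantiles'); ylabel('Sample quantiles');
```

  For a sample that is close to normal, the points follow the reference line; systematic curvature at the ends points to skewness or heavy tails.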
  In addition, the PP plot (Probability-Probability Plot) can also be used to check for normality. The abscissa of a PP plot gives the cumulative probability of one sample and the ordinate gives the cumulative probability of another; the empirical cumulative probability of the variable is plotted against the corresponding cumulative probability of the specified theoretical distribution, which gives a visual check of whether the sample follows that distribution. As with the QQ plot, if the data follow the specified distribution, the points lie near the line y=x. So if the abscissa (ordinate) holds the cumulative probabilities of a normal distribution and the points lie near that line, the sample represented by the ordinate (abscissa) follows, or approximately follows, a normal distribution.
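  A PP plot can be sketched in the same hand-built way. The sample is again synthetic, and the theoretical distribution is a normal distribution with the sample's own mean and standard deviation, which is my assumption for the illustration.

```matlab
rng(3);
x = 6 + 0.5*randn(500,1);                 % hypothetical sample
n = numel(x);
xs = sort(x);
pEmp = ((1:n)' - 0.5) / n;                % empirical cumulative probabilities
z = (xs - mean(x)) / std(x);              % standardize with the sample mean and standard deviation
pTheo = 0.5 * erfc(-z / sqrt(2));         % normal cumulative probabilities, base MATLAB only

plot(pTheo, pEmp, '.'); hold on;
plot([0 1], [0 1], '-');                  % reference line y = x
hold off;
xlabel('Theoretical cumulative probability'); ylabel('Empirical cumulative probability');
```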
  Of the three soil properties, I took pH as the first example. The numerical and graphical checks above show that the pH data, after outlier removal, do not follow a normal distribution. I therefore tried logarithmic and square-root transformations of the data. Although the square root of pH still fails the stricter Lilliefors test, its histogram and QQ plot are visibly closer to a normal distribution than those of the original and the log-transformed data. The subsequent steps therefore continue with the square-root result.
  It is worth mentioning that, after obtaining the experimental variogram of the square root of pH and its scatter plot in the second half of this post, I performed the same operations on the other two attributes (organic matter content and total nitrogen content). It turned out that the total nitrogen data, after outlier removal with the "2S" method, pass the Lilliefors test in their original form, and their histogram and QQ plot are very close to a normal distribution.
  I also intended to try an arcsine transformation of the attribute data. However, the values of the three attributes do not all lie in the range -1 to 1, so the arcsine transformation was not applied.
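  The domain check behind that decision can be written in a few lines. This is only a sketch with hypothetical pH-like values, not the author's data.

```matlab
x = [5.8; 6.4; 7.1];                 % hypothetical pH-like values, all outside [-1, 1]
inDomain = all(abs(x) <= 1);         % asin is real-valued only on [-1, 1]
if inDomain
    y = asin(x);
else
    y = [];                          % skip the transform, as in the text
end
fprintf('arcsine transform applicable: %d\n', inDomain);
```

  Without such a check, asin applied to a real value outside [-1, 1] returns a complex number in MATLAB, which would make the later normality tests meaningless.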
  The graphical check results after the tests and transformations above are shown below.
[Figure: histograms and QQ plots of pH, the natural logarithm of pH, and the square root of pH]
  The code for the part above is as follows:

clc;clear;
% Columns of data.xls: X, Y, pH, organic matter, total nitrogen
info=xlsread('data.xls');
oPH=info(:,3);
oOM=info(:,4);
oTN=info(:,5);

% Outlier detection: values beyond the mean plus or minus 2 (or 3) standard deviations
mPH=mean(oPH);
sPH=std(oPH);
num2=find(oPH>(mPH+2*sPH)|oPH<(mPH-2*sPH));
num3=find(oPH>(mPH+3*sPH)|oPH<(mPH-3*sPH));
% Continue with the "2S" result: delete the flagged rows directly
PH=oPH;
PH(num2)=[];

% kstest(PH) would only test against the standard normal distribution
H1=lillietest(PH);

% Natural logarithm transform
lPH=log(PH);
H2=lillietest(lPH);

% Square root transform
sqPH=sqrt(PH);
H3=lillietest(sqPH);

% Arcsine transform, not used because the values lie outside [-1,1]
% arcPH=asin(PH);
% H4=lillietest(arcPH);

% Top row: histograms; bottom row: normal QQ plots
subplot(2,3,1),histogram(PH),title("Distribution Histogram of pH");
subplot(2,3,2),histogram(lPH),title("Distribution Histogram of Natural Logarithm of pH");
subplot(2,3,3),histogram(sqPH),title("Distribution Histogram of Square Root of pH");
subplot(2,3,4),qqplot(PH),title("Quantile Quantile Plot of pH");
subplot(2,3,5),qqplot(lPH),title("Quantile Quantile Plot of Natural Logarithm of pH");
subplot(2,3,6),qqplot(sqPH),title("Quantile Quantile Plot of Square Root of pH");

2 Distance measurement

  Next, the distances between the retained sampling points need to be measured. This is a fairly involved step that relies on loop statements.
  The specific code of this part is as follows.

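poX=info(:,1);
poY=info(:,2);
% Drop the coordinates of the points removed as outliers, so that row i of
% poX and poY refers to the same sampling point as sqPH(i)
poX(num2)=[];
poY(num2)=[];
% Upper-triangular matrix of pairwise Euclidean distances
dis=zeros(length(sqPH),length(sqPH));
for i=1:length(sqPH)
    for j=i+1:length(sqPH)
        dis(i,j)=sqrt((poX(i,1)-poX(j,1))^2+(poY(i,1)-poY(j,1))^2);
    end
end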
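
  As an alternative to the double loop, the same upper-triangular distance matrix can be built in vectorized form. The sketch below uses three hypothetical points and assumes MATLAB R2016b or later for implicit expansion.

```matlab
% Hypothetical coordinates in meters: the corners of a 3-4-5 right triangle
X = [0; 3; 0];
Y = [0; 0; 4];
% Implicit expansion builds all pairwise coordinate differences at once
D = sqrt((X - X.').^2 + (Y - Y.').^2);
% Keep only the upper triangle, matching the layout of dis in the loop version
Dup = triu(D, 1);
disp(Dup);
```

  For several hundred points this is usually faster than the loop and gives the same values; the full matrix D is symmetric, so nothing is lost by keeping only the upper triangle.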

3 Distance grouping

  After calculating the distances between all pairs of sampling points, we need to group the distance values into classes according to a fixed rule.
  Grouping first requires a step size (lag width). Experiments show that if the step is too large, the resulting scatter plot is coarser and less accurate, and if the step is too small, each group may contain too few point pairs. The step is therefore set to 500 meters here. Next, the maximum lag distance is determined; here it is taken as half of the maximum distance between any two sampling points. The number of lag classes and the upper and lower bounds of each class are then calculated.
  The specific code for this part is given in "4 Average distance, semi-variance calculation and drawing" in this post.
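  The grouping rule can also be expressed without explicit bounds checks: with classes of the form (500(k-1), 500k], the class index of a distance d is ceil(d/500). The sketch below uses hypothetical distances and a hypothetical maximum lag; the names k, inRange, loBound, and hiBound are mine.

```matlab
ste = 500;                                   % step size in meters, as in the text
d = [120; 500; 501; 1499; 2600];             % hypothetical pair distances in meters
maxLag = 1500;                               % hypothetical maximum lag distance
clnu = floor(maxLag/ste) + 1;                % number of lag classes, same rule as the code in section 4
k = ceil(d/ste);                             % class k covers (ste*(k-1), ste*k]
inRange = k >= 1 & k <= clnu;                % pairs beyond the last class are ignored
loBound = ste*((1:clnu)' - 1);               % lower bound of each class
hiBound = ste*(1:clnu)';                     % upper bound of each class
disp([d(inRange) k(inRange)]);
```

  A distance of exactly 500 m falls in class 1 and 501 m in class 2, which matches the condition dis(i,j)>midite && dis(i,j)<=madite used in the code of section 4.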

4 Average distance, semi-variance calculation and drawing

  For each group, count the point pairs that fall in it and accumulate the sum of the distances between the pairs and the sum of the squared differences of the attribute values between the pairs. From these, compute the average pair distance of each group and its semivariance, that is, half of the mean squared attribute difference.
  With the average pair distance of each group on the horizontal axis and the semivariance of each group on the vertical axis, the empirical semivariogram is drawn.
  The code for this part and the part above is as follows.
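  Written as formulas, the two quantities computed for each lag class k are the classical empirical semivariance and the mean pair distance. The notation is my own assumption, since this post does not fix symbols: S_k is the set of point pairs (i, j) whose distance d_ij satisfies 500(k-1) < d_ij <= 500k, N_k is the number of such pairs, and z_i is the attribute value (here the square root of pH) at point i.

```latex
\hat{\gamma}_k = \frac{1}{2 N_k} \sum_{(i,j) \in S_k} \left( z_i - z_j \right)^2,
\qquad
\bar{h}_k = \frac{1}{N_k} \sum_{(i,j) \in S_k} d_{ij}
```

  In the code, N_k corresponds to ponu, the mean distance to todiav, and the semivariance to diffav.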

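madi=max(max(dis));
midi=min(min(dis(dis>0)));
radi=madi-midi;
ste=500;                        % step size (lag width) in meters
clnu=floor((madi/2)/ste)+1;     % number of lag classes up to half the maximum distance
ponu=zeros(clnu,1);             % number of point pairs per class
todi=ponu;                      % sum of pair distances per class
todiav=todi;                    % mean pair distance per class
sqdiff=ponu;                    % sum of squared attribute differences per class
                                % (renamed from diff to avoid shadowing the built-in diff)
diffav=sqdiff;                  % semivariance per class
for k=1:clnu
    midite=ste*(k-1);
    madite=ste*k;
    for i=1:length(sqPH)
        for j=i+1:length(sqPH)
            if dis(i,j)>midite && dis(i,j)<=madite
                ponu(k,1)=ponu(k,1)+1;
                todi(k,1)=todi(k,1)+dis(i,j);
                sqdiff(k,1)=sqdiff(k,1)+(sqPH(i)-sqPH(j))^2;
            end
        end
    end
    % A class without point pairs gives NaN, which plot skips
    todiav(k,1)=todi(k,1)/ponu(k,1);
    diffav(k,1)=sqdiff(k,1)/ponu(k,1)/2;
end
plot(todiav(:,1),diffav(:,1)),title("Empirical Semivariogram of Square Root of pH");
xlabel("Separation Distance (Metre)"),ylabel("Semivariance");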
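
  The triple loop above can also be replaced by a vectorized version based on accumarray. The following is a sketch on four hypothetical points rather than a drop-in replacement; it assumes that no two points coincide (every pair distance is greater than 0) and reuses the names ponu and todiav only for readability.

```matlab
% Hypothetical data: four points on a line, 400 m apart, with attribute values z
X = [0; 400; 800; 1200];
Y = [0; 0; 0; 0];
z = [1; 2; 4; 7];
ste = 500;                                          % step size in meters

D = sqrt((X - X.').^2 + (Y - Y.').^2);              % pairwise distances
[r, c] = find(triu(true(size(D)), 1));              % each pair once (r < c)
d  = D(sub2ind(size(D), r, c));
sq = (z(r) - z(c)).^2;                              % squared attribute differences

k = ceil(d/ste);                                    % lag class of each pair
keep = k <= floor((max(d)/2)/ste) + 1;              % same class count rule as the loop version
ponu   = accumarray(k(keep), 1);                    % pairs per class
todiav = accumarray(k(keep), d(keep)) ./ ponu;      % mean distance per class
gam    = accumarray(k(keep), sq(keep)) ./ ponu / 2; % semivariance per class
disp([ponu todiav gam]);
```

  Here class 1 holds the three pairs that are 400 m apart and class 2 the two pairs that are 800 m apart; the single 1200 m pair lies beyond the last class and is dropped.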

5 Drawing results

  Through the above process, the line graph and scatter plot of the experimental variogram of the square root of pH are obtained.
[Figures: line graph and scatter plot of the experimental variogram of the square root of pH]
  The experimental variogram of the square root of pH agrees fairly well with a spherical or exponential model with a sill. The function value rises rapidly between 0 and 8000 meters; beyond 8000 meters it rises more slowly, and the range is about 25000 meters. In other words, the overall trend of "rising quickly, then slowing down, then stabilizing" is quite clear. The values themselves are small, however: the nugget is about 0.004 and the sill is only about 0.013. To check that values of this magnitude are plausible, the same full process was also carried out for organic matter and total nitrogen.
  This gives the line graphs and scatter plots of the two corresponding variograms.

[Figures: line graphs and scatter plots of the experimental variograms of organic matter and total nitrogen]

  From the six plots above, in three groups (the line graph and scatter plot of the experimental variogram for the square root of pH, for organic matter, and for total nitrogen), it can be seen that the magnitude of the experimental variogram differs from attribute to attribute, but the overall trend of "rising rapidly, then slowing down, then stabilizing" is very consistent.
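  This "rise, slow down, stabilize" shape is what bounded models such as the spherical and exponential models describe. The sketch below draws both, using the approximate nugget, sill, and range read off the plot for the square root of pH earlier in this section. The parameters are not fitted, and the exponential model is written in the "practical range" form, which is my assumption.

```matlab
% Approximate values quoted in the text for the square root of pH
c0 = 0.004;              % nugget
sill = 0.013;            % sill (nugget plus partial sill)
a = 25000;               % range in meters
c = sill - c0;           % partial sill (derived here, not stated in the text)

% Spherical model: reaches the sill exactly at h = a
sph = @(h) (h <= a) .* (c0 + c*(1.5*h/a - 0.5*(h/a).^3)) + (h > a) .* sill;
% Exponential model: approaches the sill asymptotically, about 95% of c at h = a
expo = @(h) c0 + c*(1 - exp(-3*h/a));

h = linspace(1, 40000, 200)';     % start just above 0 because of the nugget jump at h = 0
plot(h, sph(h), '-', h, expo(h), '--');
legend('Spherical', 'Exponential', 'Location', 'southeast');
xlabel('Separation Distance (Metre)'); ylabel('Semivariance');
```

  Overlaying these curves on the empirical points is a quick visual check of which model shape fits better before doing a proper least-squares fit.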
  In addition, as mentioned above, among the nine data states (three attributes, namely pH, organic matter content, and total nitrogen content, each in three forms: original value, logarithm, and square root), the one closest to a normal distribution, and the only one that passes the Lilliefors test, is the original total nitrogen content after outlier removal. Its graphical normality check results are shown below.
[Figure: graphical normality check of total nitrogen content after outlier removal]
  At this point, we have completed all the operations and analysis.


Origin blog.csdn.net/zhebushibiaoshifu/article/details/114030470