Gray relational analysis algorithm (with MATLAB source code and examples)

    This article was compiled from my notes, and I cannot guarantee that it is free of errors. If you find any, please point them out in the comment area or by private message.

Table of contents

Foreword

1. Determine the parent sequence and subsequences

2. Data normalization

        1) Initialization

        2) Averaging

3. Calculate the absolute differences

4. Calculate the gray relational coefficient

PS: Averaging data demonstration

Code


Foreword

Gray relational analysis (GRA) is a method of multi-factor statistical analysis. Put simply, it tells us how strongly a quantity is influenced by each of several factors. For example, GDP is influenced by the primary, secondary, and tertiary industries; how much does each of these three industries affect GDP? This is the kind of question that gray relational analysis can answer.

In the following discussion we use the data in Table 1 as a demonstration (the data is for learning only; its authenticity and plausibility are not guaranteed).

Table 1 Direct economic losses of disasters and related influencing factors (original data)
Year | Direct economic loss of disasters (100 million yuan) | Crop area affected by disasters (thousand hectares) | Earthquake disaster losses (100 million yuan) | Marine disaster losses (100 million yuan) | Forest fire losses (100 million yuan) | Mudslide losses (100 million yuan)
2000 | 2045.3 | 34374 | 14.6792 | 120.9 | 0.3069 | 49.4201
2001 | 1942.2 | 31793 | 14.8449 | 100.1 | 0.7409 | 34.8699
2002 | 1637.2 | 27319 | 1.4774 | 65.9 | 0.361 | 50.974
2003 | 1884.2 | 32516 | 46.604 | 80.52 | 3.7 | 50.4325
2004 | 1602.3 | 16279 | 9.4959 | 54.22 | 2.0213 | 40.8828

Gray relational analysis is roughly divided into four steps:

1. Determine the parent sequence and subsequence

        Parent sequence: the data sequence that reflects the behavior of the system. In the example above, the direct economic loss of disasters is the parent sequence.

        Subsequences: the data sequences of the factors that affect the system's behavior. In the example above, the crop area affected by disasters, earthquake disaster losses, marine disaster losses, forest fire losses, and mudslide losses are the subsequences.
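As a minimal sketch of this step, Table 1 can be held in a matrix with the parent sequence in the first column, which is also the layout the script at the end of the article relies on. The variable name X matches that script; the names parent and subs are only illustrative.

```matlab
% Table 1 as a matrix: one row per year (2000-2004), one column per indicator.
X = [2045.3 34374 14.6792 120.9  0.3069 49.4201;
     1942.2 31793 14.8449 100.1  0.7409 34.8699;
     1637.2 27319  1.4774  65.9  0.361  50.974;
     1884.2 32516 46.604   80.52 3.7    50.4325;
     1602.3 16279  9.4959  54.22 2.0213 40.8828];

parent = X(:, 1);      % parent sequence: direct economic loss of disasters
subs   = X(:, 2:end);  % subsequences: the five influencing factors

fprintf('%d years, %d subsequences\n', size(subs, 1), size(subs, 2));
```

Keeping the parent sequence in column 1 means the later steps only ever need to compare columns 2 to m against column 1.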

2. Data normalization

        Normalization makes the data dimensionless. Because the indicators have different dimensions (units), their values can differ by orders of magnitude, and if the large values are left untreated they swamp the influence of the small ones. The data are therefore rescaled so that all indicators lie in a comparable range. This is data normalization.

        There are two main normalization methods: initialization (dividing by the initial value) and averaging (dividing by the mean).

        1) Initialization

        Put simply, divide every value of each indicator by that indicator's initial value. In Table 1 the initial values are those of the year 2000: [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201].

        The processed data are as follows (rounded to four decimal places):

Table 2.1 Data after initialization (dimensionless)
Year | Direct economic loss of disasters | Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
2000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
2001 | 0.9496 | 0.9249 | 1.0113 | 0.8280 | 2.4141 | 0.7056
2002 | 0.8005 | 0.7948 | 0.1006 | 0.5451 | 1.1763 | 1.0314
2003 | 0.9212 | 0.9460 | 3.1748 | 0.6660 | 12.0560 | 1.0205
2004 | 0.7834 | 0.4736 | 0.6469 | 0.4485 | 6.5862 | 0.8273
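Initialization can be sketched in one line of MATLAB. This is not the author's loop-based version (that appears in the Code section); it assumes implicit expansion, which needs MATLAB R2016b or later (or GNU Octave). The name Z_init is illustrative.

```matlab
% Table 1: rows are the years 2000-2004, column 1 is the parent sequence.
X = [2045.3 34374 14.6792 120.9  0.3069 49.4201;
     1942.2 31793 14.8449 100.1  0.7409 34.8699;
     1637.2 27319  1.4774  65.9  0.361  50.974;
     1884.2 32516 46.604   80.52 3.7    50.4325;
     1602.3 16279  9.4959  54.22 2.0213 40.8828];

% Initialization: divide each column by its first-row (year 2000) value.
% On releases without implicit expansion, use bsxfun(@rdivide, X, X(1, :)).
Z_init = X ./ X(1, :);

% Round to four decimals for comparison with Table 2.1.
disp(round(Z_init * 1e4) / 1e4);
```

After this step the first row is all ones, so every sequence starts from the same point.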

        2) Averaging

        Averaging first computes the mean of the parent sequence and of each subsequence, and then divides every value of each indicator by the mean of its column. A worked averaging example is given in the PS section at the end.
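A minimal sketch of averaging, under the same assumptions as before (implicit expansion, so MATLAB R2016b or later or GNU Octave; the names col_mean and Z_mean are illustrative):

```matlab
% Table 1: rows are the years 2000-2004, column 1 is the parent sequence.
X = [2045.3 34374 14.6792 120.9  0.3069 49.4201;
     1942.2 31793 14.8449 100.1  0.7409 34.8699;
     1637.2 27319  1.4774  65.9  0.361  50.974;
     1884.2 32516 46.604   80.52 3.7    50.4325;
     1602.3 16279  9.4959  54.22 2.0213 40.8828];

col_mean = mean(X, 1);     % mean of each column (1-by-m row vector)
Z_mean   = X ./ col_mean;  % divide each column by its own mean

% Sanity check: after averaging, every column has mean 1.
disp(mean(Z_mean, 1));
```

Unlike initialization, averaging does not depend on which year happens to come first, only on the overall level of each column.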

3. Calculate the absolute value difference

       This step computes the absolute difference between each subsequence and the corresponding elements of the reference sequence. The reference sequence is generally the normalized parent sequence, here [1.0000, 0.9496, 0.8005, 0.9212, 0.7834]. Concretely, if the normalized table has m columns, subtract the first column from each of the other m-1 columns and take the absolute value, which gives a table with m-1 columns.

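Table 3.1 Absolute differences (initialization)
Year | Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
2000 | 0 | 0 | 0 | 0 | 0
2001 | 0.0247 | 0.0617 | 0.1216 | 1.4645 | 0.2440
2002 | 0.0057 | 0.6998 | 0.2554 | 0.3758 | 0.2310
2003 | 0.0247 | 2.2536 | 0.2552 | 11.1348 | 0.0993
2004 | 0.3098 | 0.1365 | 0.3349 | 5.8028 | 0.0438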

Then take the minimum value a = 0 and the maximum value b = 11.1348 over the whole table.
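The difference table and its two extremes can be sketched as follows, again assuming implicit expansion (MATLAB R2016b or later, or GNU Octave) and using initialization for the normalization. The names Z and D are illustrative; a and b follow the text.

```matlab
% Table 1: rows are the years 2000-2004, column 1 is the parent sequence.
X = [2045.3 34374 14.6792 120.9  0.3069 49.4201;
     1942.2 31793 14.8449 100.1  0.7409 34.8699;
     1637.2 27319  1.4774  65.9  0.361  50.974;
     1884.2 32516 46.604   80.52 3.7    50.4325;
     1602.3 16279  9.4959  54.22 2.0213 40.8828];

Z = X ./ X(1, :);                % initialization
D = abs(Z(:, 2:end) - Z(:, 1));  % |subsequence - parent|, an n-by-(m-1) matrix
a = min(D(:));                   % minimum over the whole table, not per column
b = max(D(:));                   % maximum over the whole table, not per column

fprintf('a = %.4f, b = %.4f\n', a, b);
```

Note that a and b are taken over all entries at once. With initialization, a is always 0 because the first row of every normalized column equals 1.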

4. Calculate the gray relational coefficient

        Call each entry of Table 3.1 x. The gray relational coefficient is then

                $\xi = \frac{a + \rho b}{x + \rho b}$

        where a and b are the minimum and maximum of Table 3.1 and $\rho$ is the resolution coefficient (generally taken as 0.5). This gives the gray relational coefficient table.

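Table 4.1 Gray relational coefficients (initialization)
Year | Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
2000 | 1 | 1 | 1 | 1 | 1
2001 | 0.9956 | 0.9890 | 0.9786 | 0.7917 | 0.9580
2002 | 0.9990 | 0.8883 | 0.9561 | 0.9368 | 0.9602
2003 | 0.9956 | 0.7119 | 0.9562 | 0.3333 | 0.9825
2004 | 0.9473 | 0.9761 | 0.9433 | 0.4896 | 0.9922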
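As a quick check of the formula, here is a worked instance for two entries, using a = 0, b = 11.1348 and \rho = 0.5 from the text. The subscripts are only illustrative labels for the crop area in 2001 (x = 0.0247) and the forest fire losses in 2003 (x = b = 11.1348).

```latex
\begin{aligned}
\xi_{\text{crop},\,2001} &= \frac{0 + 0.5 \times 11.1348}{0.0247 + 0.5 \times 11.1348}
                          = \frac{5.5674}{5.5921} \approx 0.9956 \\
\xi_{\text{fire},\,2003} &= \frac{0 + 0.5 \times 11.1348}{11.1348 + 0.5 \times 11.1348}
                          = \frac{0.5}{1.5} \approx 0.3333
\end{aligned}
```

The second line shows a general property: whenever a = 0, the entry with the largest difference gets the coefficient \rho / (1 + \rho), which is 1/3 for \rho = 0.5.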

        Then take the mean of each column; this is the relational degree between that column's influencing factor and the direct economic loss of disasters.

Table 5.1 Relational degree between the direct economic loss of disasters and each influencing factor (initialization)
Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
0.9875 | 0.9131 | 0.9668 | 0.7103 | 0.9786
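Steps 3 and 4 together can be sketched as below, ending with the factors sorted from most to least related. This is a vectorized sketch rather than the author's script; it assumes implicit expansion (MATLAB R2016b or later, or GNU Octave), and the names names, xi, R and order are illustrative.

```matlab
% Table 1: rows are the years 2000-2004, column 1 is the parent sequence.
X = [2045.3 34374 14.6792 120.9  0.3069 49.4201;
     1942.2 31793 14.8449 100.1  0.7409 34.8699;
     1637.2 27319  1.4774  65.9  0.361  50.974;
     1884.2 32516 46.604   80.52 3.7    50.4325;
     1602.3 16279  9.4959  54.22 2.0213 40.8828];
names = {'crop area', 'earthquake', 'marine', 'forest fire', 'mudslide'};
rho = 0.5;                             % resolution coefficient

Z  = X ./ X(1, :);                     % initialization
D  = abs(Z(:, 2:end) - Z(:, 1));       % absolute differences
a  = min(D(:));
b  = max(D(:));
xi = (a + rho * b) ./ (D + rho * b);   % gray relational coefficient table
R  = mean(xi, 1);                      % relational degree of each factor

% List the factors from most to least related to the parent sequence.
[~, order] = sort(R, 'descend');
for k = order
    fprintf('%-12s %.4f\n', names{k}, R(k));
end
```

For this data the crop area comes out on top and forest fire losses last, in line with Table 5.1.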

PS: Averaging data demonstration

                


        Taking the mean of each column of Table 1 gives [1822.24, 28456.2, 17.42028, 84.328, 1.42602, 45.31586]. Dividing each column by its mean gives Table 2.2.

Table 2.2 Data after averaging (dimensionless)
Year | Direct economic loss of disasters | Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
2000 | 1.1224 | 1.2079 | 0.8427 | 1.4337 | 0.2152 | 1.0906
2001 | 1.0658 | 1.1173 | 0.8522 | 1.1870 | 0.5196 | 0.7695
2002 | 0.8985 | 0.9600 | 0.0848 | 0.7815 | 0.2532 | 1.1249
2003 | 1.0340 | 1.1427 | 2.6753 | 0.9548 | 2.5946 | 1.1129
2004 | 0.8793 | 0.5721 | 0.5451 | 0.6430 | 1.4174 | 0.9022

        

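        Subtracting the parent column from each subsequence column and taking the absolute value gives Table 3.2.

Table 3.2 Absolute differences (averaging)
Year | Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
2000 | 0.0855 | 0.2798 | 0.3113 | 0.9072 | 0.0318
2001 | 0.0514 | 0.2137 | 0.1212 | 0.5463 | 0.2963
2002 | 0.0616 | 0.8136 | 0.1170 | 0.6453 | 0.2264
2003 | 0.1087 | 1.6413 | 0.0792 | 1.5606 | 0.0789
2004 | 0.3072 | 0.3342 | 0.2363 | 0.5381 | 0.0229

        Here the minimum value is a = 0.0229 and the maximum value is b = 1.6413.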

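Table 4.2 Gray relational coefficients (averaging)
Year | Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
2000 | 0.9309 | 0.7665 | 0.7452 | 0.4882 | 0.9895
2001 | 0.9672 | 0.8155 | 0.8956 | 0.6171 | 0.7552
2002 | 0.9561 | 0.5161 | 0.8996 | 0.5754 | 0.8056
2003 | 0.9077 | 0.3426 | 0.9374 | 0.3542 | 0.9377
2004 | 0.7479 | 0.7304 | 0.7980 | 0.6208 | 1.0000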

Table 5.2 Relational degree between the direct economic loss of disasters and each influencing factor (averaging)
Crop area affected by disasters | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses
0.9020 | 0.6343 | 0.8552 | 0.5311 | 0.8976
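To compare the two normalizations side by side, the pipeline can be wrapped in a few anonymous functions. This is only a sketch: the helper names absdiff, coef and degree are hypothetical, and implicit expansion (MATLAB R2016b or later, or GNU Octave) is assumed.

```matlab
% Table 1: rows are the years 2000-2004, column 1 is the parent sequence.
X = [2045.3 34374 14.6792 120.9  0.3069 49.4201;
     1942.2 31793 14.8449 100.1  0.7409 34.8699;
     1637.2 27319  1.4774  65.9  0.361  50.974;
     1884.2 32516 46.604   80.52 3.7    50.4325;
     1602.3 16279  9.4959  54.22 2.0213 40.8828];
rho = 0.5;  % resolution coefficient (captured by the helpers below)

% Hypothetical helpers: differences, coefficients, relational degrees.
absdiff = @(Z) abs(Z(:, 2:end) - Z(:, 1));
coef    = @(D) (min(D(:)) + rho * max(D(:))) ./ (D + rho * max(D(:)));
degree  = @(Z) mean(coef(absdiff(Z)), 1);

R_init = degree(X ./ X(1, :));     % initialization
R_mean = degree(X ./ mean(X, 1));  % averaging

% Column indices of the factors, most related first, for each method.
[~, order_init] = sort(R_init, 'descend');
[~, order_mean] = sort(R_mean, 'descend');
disp([order_init; order_mean]);
```

The relational degrees differ between the two methods (compare Tables 5.1 and 5.2), but for this data the ranking of the five factors is the same: crop area, mudslide, marine, earthquake, forest fire.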

Code

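% Gray relational analysis
% Taking tourism income as an example: 6 rows and 4 columns in total; the rows are the years 2000-2005 and the columns are GDP, primary industry, secondary industry, tertiary industry
% The final result is the relational degree of the three industries with GDP
clear;clc;
% The MAT-file must contain an n-by-m matrix X whose first column is the parent sequence
load('灰色关联度分析-灾害直接经济损失.mat');
[n,m]=size(X);

% Initialization example: direct economic loss of disasters
% Averaging example: GDP

% Use either initialization or averaging, not both

% Averaging
update_x=zeros(n,m);% preallocate the normalized data
aveg=mean(X,1);% mean of each column
for j=1:m
   update_x(:,j)=X(:,j)./aveg(j);% divide each column by its mean
end

% Initialization
% for j=1:m
%     update_x(:,j)=X(:,j)./X(1,j);% divide each column by its initial value
% end


temp_x=zeros(n,m-1);% preallocate the absolute differences
for i=1:n
    for j=2:m
        temp_x(i,j-1)=abs(update_x(i,j)-update_x(i,1));% absolute difference between column j and the first column of update_x
    end
end
a=min(min(temp_x));% minimum value in temp_x
b=max(max(temp_x));% maximum value in temp_x
p=0.5;% resolution coefficient, generally 0.5
XX=(a+p*b)./(temp_x+p*b);% gray relational coefficients
R=mean(XX,1)% gray relational degree of each subsequence (printed)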

   

               

Origin blog.csdn.net/m0_62558103/article/details/126803195