This article was compiled from my own notes, and I cannot guarantee that it is free of errors. If you find any problems, please point them out in the comments or by private message!
Table of contents
Foreword
1. Determine the parent sequence and subsequences
2. Data normalization
3. Calculate the absolute differences
4. Calculate the gray relational coefficients
PS: Averaging (mean normalization) worked example
Code
Foreword
Gray Relational Analysis (GRA) is a multi-factor statistical analysis method. Put simply, it measures how strongly a quantity of interest is related to each of several other factors. For example, GDP is influenced by the primary, secondary, and tertiary industries: how much does each of the three affect GDP? This is the kind of question we want to discuss, and it is one that gray relational analysis can answer.
The discussion below uses the data in the following table (Table 1) for demonstration. The data are for learning purposes only; their accuracy and plausibility are not guaranteed.
Year | Direct economic loss from disasters (100 million yuan) | Crop area affected by disasters (thousand hectares) | Earthquake disaster losses (100 million yuan) | Marine disaster losses (100 million yuan) | Forest fire losses (100 million yuan) | Mudslide losses (100 million yuan) |
2000 | 2045.3 | 34374 | 14.6792 | 120.9 | 0.3069 | 49.4201 |
2001 | 1942.2 | 31793 | 14.8449 | 100.1 | 0.7409 | 34.8699 |
2002 | 1637.2 | 27319 | 1.4774 | 65.9 | 0.361 | 50.974 |
2003 | 1884.2 | 32516 | 46.604 | 80.52 | 3.7 | 50.4325 |
2004 | 1602.3 | 16279 | 9.4959 | 54.22 | 2.0213 | 40.8828 |
Gray relational analysis can be roughly divided into four steps:
1. Determine the parent sequence and subsequences
Parent sequence (reference sequence): the data sequence that reflects the behavior of the system. In the example above, the direct economic loss from disasters is the parent sequence.
Subsequences (comparison sequences): the data sequences formed by the factors that affect the system's behavior. In the example above, the crop area affected, earthquake disaster losses, marine disaster losses, forest fire losses, and mudslide losses are the subsequences.
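As a minimal Python sketch (an illustration only; the author's own code, in MATLAB, is at the end), Table 1 can be held as a list of rows, one per year, with the parent sequence in column 0. The names `data`, `parent`, and `subsequences` are hypothetical.

```python
# Table 1: one row per year (2000-2004).
# Column 0 is the parent sequence (direct economic loss from disasters);
# columns 1-5 are the subsequences (crop area, earthquake, marine, forest fire, mudslide).
data = [
    [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201],  # 2000
    [1942.2, 31793, 14.8449, 100.1, 0.7409, 34.8699],  # 2001
    [1637.2, 27319, 1.4774, 65.9, 0.361, 50.974],      # 2002
    [1884.2, 32516, 46.604, 80.52, 3.7, 50.4325],      # 2003
    [1602.3, 16279, 9.4959, 54.22, 2.0213, 40.8828],   # 2004
]

# Parent (reference) sequence: the first column, read down the years.
parent = [row[0] for row in data]

# Subsequences (comparison sequences): one list per remaining column.
subsequences = [[row[j] for row in data] for j in range(1, len(data[0]))]

print(len(subsequences), "subsequences of length", len(parent))  # prints: 5 subsequences of length 5
```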
2. Data normalization
Normalization makes the data dimensionless. Because the indicators have different units, their values can differ by orders of magnitude; if the data are left unprocessed, the influence of the small-valued variables is swamped by the large ones. It is therefore necessary to reduce the differences in the absolute magnitudes of the data and bring them onto a comparable scale. This is data normalization.
There are two main normalization methods: initialization and averaging.
1) Initialization
Put simply, each indicator's data are divided by that indicator's initial value. Here the initial values are the year-2000 row of Table 1: [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201].
The processed data are as follows (rounded to four decimal places):
Year | Direct economic loss | Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
2000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
2001 | 0.9496 | 0.9249 | 1.0113 | 0.8280 | 2.4141 | 0.7056 |
2002 | 0.8005 | 0.7948 | 0.1006 | 0.5451 | 1.1763 | 1.0314 |
2003 | 0.9212 | 0.9460 | 3.1748 | 0.6660 | 12.0560 | 1.0205 |
2004 | 0.7834 | 0.4736 | 0.6469 | 0.4485 | 6.5862 | 0.8273 |
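A minimal Python sketch of initialization, assuming Table 1 is stored row by row with the parent sequence in column 0 (the `data` layout and the `initialize` helper are illustrative names, not from the original). Printing the rows should reproduce the table above up to rounding.

```python
# Table 1 (rows = 2000..2004; column 0 = parent sequence, columns 1-5 = subsequences)
data = [
    [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201],
    [1942.2, 31793, 14.8449, 100.1, 0.7409, 34.8699],
    [1637.2, 27319, 1.4774, 65.9, 0.361, 50.974],
    [1884.2, 32516, 46.604, 80.52, 3.7, 50.4325],
    [1602.3, 16279, 9.4959, 54.22, 2.0213, 40.8828],
]


def initialize(rows):
    """Initialization: divide every column by its first (year-2000) value."""
    first = rows[0]
    return [[x / first[j] for j, x in enumerate(row)] for row in rows]


init = initialize(data)
for year, row in zip(range(2000, 2005), init):
    print(year, " ".join(f"{v:.4f}" for v in row))
```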
2) Averaging
Averaging first computes the mean of the parent sequence and of each subsequence, then divides each indicator's data by the mean of its column. A worked averaging example is given at the end of the article.
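A minimal Python sketch of averaging under the same assumed row-by-row layout of Table 1; `average_normalize` is a hypothetical helper name. The column means it computes can be compared with those listed in the worked example at the end.

```python
# Table 1 (rows = 2000..2004; column 0 = parent sequence, columns 1-5 = subsequences)
data = [
    [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201],
    [1942.2, 31793, 14.8449, 100.1, 0.7409, 34.8699],
    [1637.2, 27319, 1.4774, 65.9, 0.361, 50.974],
    [1884.2, 32516, 46.604, 80.52, 3.7, 50.4325],
    [1602.3, 16279, 9.4959, 54.22, 2.0213, 40.8828],
]


def average_normalize(rows):
    """Averaging: divide every column by that column's mean over all years.
    Returns (column means, normalized rows)."""
    n = len(rows)
    means = [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]
    normalized = [[x / means[j] for j, x in enumerate(row)] for row in rows]
    return means, normalized


means, avg = average_normalize(data)
print([round(m, 5) for m in means])
```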
3. Calculate the absolute differences
This step computes the absolute difference between each element of every evaluated index sequence and the corresponding element of the reference sequence. The reference sequence is the normalized parent sequence, here [1.00, 0.95, 0.80, 0.92, 0.78]. Concretely, if the normalized table has m columns, subtract the first column from each of the last m-1 columns and take absolute values, which gives a table of m-1 columns:
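Year | Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
2000 | 0 | 0 | 0 | 0 | 0 |
2001 | 0.0247 | 0.0617 | 0.1216 | 1.4645 | 0.2440 |
2002 | 0.0057 | 0.6998 | 0.2554 | 0.3758 | 0.2310 |
2003 | 0.0247 | 2.2536 | 0.2552 | 11.1348 | 0.0993 |
2004 | 0.3098 | 0.1365 | 0.3349 | 5.8028 | 0.0438 |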
Then take the minimum a = 0 and the maximum b = 11.1348 over the whole table.
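A minimal Python sketch of this step, assuming initialization as in the text; `initialize` and `abs_differences` are hypothetical helper names, and the block recomputes the normalized data from Table 1 so that it runs on its own. Working from unrounded values, the 2004 crop-area entry comes out as 0.3098.

```python
# Table 1 (rows = 2000..2004; column 0 = parent sequence, columns 1-5 = subsequences)
data = [
    [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201],
    [1942.2, 31793, 14.8449, 100.1, 0.7409, 34.8699],
    [1637.2, 27319, 1.4774, 65.9, 0.361, 50.974],
    [1884.2, 32516, 46.604, 80.52, 3.7, 50.4325],
    [1602.3, 16279, 9.4959, 54.22, 2.0213, 40.8828],
]


def initialize(rows):
    """Initialization: divide every column by its first (year-2000) value."""
    first = rows[0]
    return [[x / first[j] for j, x in enumerate(row)] for row in rows]


def abs_differences(norm):
    """For each year, |subsequence value - parent value|; the parent is column 0."""
    return [[abs(x - row[0]) for x in row[1:]] for row in norm]


delta = abs_differences(initialize(data))
a = min(min(row) for row in delta)  # global minimum over the whole table
b = max(max(row) for row in delta)  # global maximum over the whole table
print(f"a = {a:.4f}, b = {b:.4f}")  # prints: a = 0.0000, b = 11.1348
```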
4. Calculate the gray relational coefficients
Denote each entry of Table 3 (the absolute differences above) by x. With a and b the minimum and maximum found in the previous step, and ρ the resolution coefficient, the gray relational coefficient is

$$\xi = \frac{a + \rho b}{x + \rho b}$$

The resolution coefficient is generally taken as ρ = 0.5, which gives the following table of gray relational coefficients:
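Year | Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
2000 | 1 | 1 | 1 | 1 | 1 |
2001 | 0.9956 | 0.9890 | 0.9786 | 0.7917 | 0.9580 |
2002 | 0.9990 | 0.8883 | 0.9561 | 0.9368 | 0.9602 |
2003 | 0.9956 | 0.7119 | 0.9562 | 0.3333 | 0.9825 |
2004 | 0.9473 | 0.9761 | 0.9433 | 0.4896 | 0.9922 |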
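As a hand check, here is the coefficient formula ξ = (a + ρb)/(x + ρb) applied to two entries of the table above, with a = 0, b = 11.1348, and ρ = 0.5:

$$\xi_{2001,\ \text{crop}} = \frac{0 + 0.5 \times 11.1348}{0.0247 + 0.5 \times 11.1348} = \frac{5.5674}{5.5921} \approx 0.9956$$

$$\xi_{2003,\ \text{forest fire}} = \frac{0 + 0.5\,b}{b + 0.5\,b} = \frac{0.5}{1.5} = \frac{1}{3} \approx 0.3333$$

More generally, ξ decreases as x grows: an entry equal to the minimum a always gets coefficient 1, and an entry equal to the maximum b gets (a + ρb)/(b + ρb), which reduces to ρ/(1 + ρ) = 1/3 when a = 0 and ρ = 0.5. Every coefficient therefore lies between these two values.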
Finally, take the mean of each column. This is the gray relational degree between that influencing factor and the direct economic loss from disasters:
Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
0.9875 | 0.9131 | 0.9668 | 0.7103 | 0.9786 |
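Putting steps 2 to 4 together, here is a minimal Python sketch of the whole procedure. It is an illustration, not a line-by-line port of the author's MATLAB script; `gray_relational_degrees`, its `rho` parameter (the resolution coefficient), and the `method` switch are hypothetical names. The `"mean"` option performs the averaging used in the worked example at the end.

```python
# Table 1 (rows = 2000..2004; column 0 = parent sequence, columns 1-5 = subsequences)
data = [
    [2045.3, 34374, 14.6792, 120.9, 0.3069, 49.4201],
    [1942.2, 31793, 14.8449, 100.1, 0.7409, 34.8699],
    [1637.2, 27319, 1.4774, 65.9, 0.361, 50.974],
    [1884.2, 32516, 46.604, 80.52, 3.7, 50.4325],
    [1602.3, 16279, 9.4959, 54.22, 2.0213, 40.8828],
]


def gray_relational_degrees(rows, rho=0.5, method="initial"):
    """Gray relational degree of each subsequence (columns 1..m-1) with the
    parent sequence (column 0). method: "initial" or "mean" normalization."""
    n, m = len(rows), len(rows[0])
    # Step 2: normalize each column by its first value or by its mean
    if method == "initial":
        base = rows[0]
    elif method == "mean":
        base = [sum(r[j] for r in rows) / n for j in range(m)]
    else:
        raise ValueError("method must be 'initial' or 'mean'")
    norm = [[x / base[j] for j, x in enumerate(r)] for r in rows]
    # Step 3: absolute differences from the parent column, then global min/max
    delta = [[abs(x - r[0]) for x in r[1:]] for r in norm]
    a = min(min(r) for r in delta)
    b = max(max(r) for r in delta)
    # Step 4: relational coefficients, then the mean of each column
    coef = [[(a + rho * b) / (d + rho * b) for d in r] for r in delta]
    return [sum(r[k] for r in coef) / n for k in range(m - 1)]


names = ["crop area", "earthquake", "marine", "forest fire", "mudslide"]
R = gray_relational_degrees(data)
# Print the factors from most to least related to the parent sequence
for name, r in sorted(zip(names, R), key=lambda t: t[1], reverse=True):
    print(f"{name:<12} {r:.4f}")
```

With the default initialization it should list crop area affected first and forest fire losses last, agreeing with the table above to within rounding.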
PS: Averaging (mean normalization) worked example
Year | Direct economic loss from disasters (100 million yuan) | Crop area affected by disasters (thousand hectares) | Earthquake disaster losses (100 million yuan) | Marine disaster losses (100 million yuan) | Forest fire losses (100 million yuan) | Mudslide losses (100 million yuan) |
2000 | 2045.3 | 34374 | 14.6792 | 120.9 | 0.3069 | 49.4201 |
2001 | 1942.2 | 31793 | 14.8449 | 100.1 | 0.7409 | 34.8699 |
2002 | 1637.2 | 27319 | 1.4774 | 65.9 | 0.361 | 50.974 |
2003 | 1884.2 | 32516 | 46.604 | 80.52 | 3.7 | 50.4325 |
2004 | 1602.3 | 16279 | 9.4959 | 54.22 | 2.0213 | 40.8828 |
Taking the mean of each column gives [1822.24, 28456.2, 17.42028, 84.328, 1.42602, 45.31586]. Dividing each column by its mean gives (four decimal places):
Year | Direct economic loss | Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
2000 | 1.1224 | 1.2079 | 0.8427 | 1.4337 | 0.2152 | 1.0906 |
2001 | 1.0658 | 1.1173 | 0.8522 | 1.1870 | 0.5196 | 0.7695 |
2002 | 0.8985 | 0.9600 | 0.0848 | 0.7815 | 0.2532 | 1.1249 |
2003 | 1.0340 | 1.1427 | 2.6753 | 0.9548 | 2.5946 | 1.1129 |
2004 | 0.8793 | 0.5721 | 0.5451 | 0.6430 | 1.4174 | 0.9022 |
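Absolute differences from the parent sequence:
Year | Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
2000 | 0.0855 | 0.2798 | 0.3113 | 0.9072 | 0.0318 |
2001 | 0.0514 | 0.2137 | 0.1212 | 0.5463 | 0.2963 |
2002 | 0.0616 | 0.8136 | 0.1170 | 0.6453 | 0.2264 |
2003 | 0.1087 | 1.6413 | 0.0792 | 1.5606 | 0.0789 |
2004 | 0.3072 | 0.3342 | 0.2363 | 0.5381 | 0.0229 |
The minimum is a = 0.0229 and the maximum is b = 1.6413.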
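Gray relational coefficients (ρ = 0.5):
Year | Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
2000 | 0.9309 | 0.7665 | 0.7452 | 0.4882 | 0.9895 |
2001 | 0.9672 | 0.8155 | 0.8956 | 0.6171 | 0.7552 |
2002 | 0.9561 | 0.5161 | 0.8996 | 0.5754 | 0.8056 |
2003 | 0.9077 | 0.3426 | 0.9374 | 0.3542 | 0.9377 |
2004 | 0.7479 | 0.7304 | 0.7980 | 0.6208 | 1.0000 |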
Gray relational degrees (column means):
Crop area affected | Earthquake disaster losses | Marine disaster losses | Forest fire losses | Mudslide losses |
0.9020 | 0.6343 | 0.8552 | 0.5311 | 0.8976 |
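In this example the minimum absolute difference is not zero (a = 0.0229 and b = 1.6413, read off the absolute-difference table above), so the a term in ξ = (a + ρb)/(x + ρb) no longer vanishes. Checking two entries by hand with ρ = 0.5:

$$\xi_{2000,\ \text{crop}} = \frac{0.0229 + 0.5 \times 1.6413}{0.0855 + 0.5 \times 1.6413} = \frac{0.84355}{0.90615} \approx 0.9309$$

$$\xi_{2003,\ \text{earthquake}} = \frac{0.0229 + 0.5 \times 1.6413}{1.6413 + 0.5 \times 1.6413} = \frac{0.84355}{2.46195} \approx 0.3426$$

Compared with the initialization results (0.9875, 0.9131, 0.9668, 0.7103, 0.9786), the values differ, but for this data the ranking of the five factors is the same under both methods: crop area affected, mudslide losses, marine disaster losses, earthquake disaster losses, forest fire losses.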
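Code (MATLAB)
% Gray relational analysis
% Original example: tourism revenue, 6 rows x 4 columns in total; the rows are the years 2000-2005,
% and the columns are GDP, primary industry, secondary industry, tertiary industry.
% The final result is the relational degree of each of the three industries with GDP.
% In general: column 1 of X is the parent sequence, columns 2..m are the subsequences.
clear;clc;
% Load matrix X (file name: "gray relational analysis - direct economic loss from disasters")
load 灰色关联度分析-灾害直接经济损失.mat
[n,m]=size(X);
% Initialization example: direct economic loss from disasters
% Averaging example: GDP
% Use only ONE of initialization or averaging
% (the main walkthrough above uses initialization; the PS example uses averaging)
update_x=zeros(n,m);   % preallocate the normalized data
% Averaging
aveg=mean(X);   % mean of each column
for j=1:m
    update_x(:,j)=X(:,j)./aveg(j);   % divide each column by its mean
end
% Initialization
% for j=1:m
%     update_x(:,j)=X(:,j)./X(1,j);   % divide each column by its first value
% end
temp_x=zeros(n,m-1);   % preallocate the absolute differences
for i=1:n
    for j=2:m
        temp_x(i,j-1)=abs(update_x(i,j)-update_x(i,1));   % |column j - column 1| of update_x
    end
end
a=min(min(temp_x));   % minimum of temp_x
b=max(max(temp_x));   % maximum of temp_x
p=0.5;   % resolution coefficient, usually 0.5
XX=(a+p*b)./(temp_x+p*b);   % gray relational coefficients
R=mean(XX,1)   % gray relational degrees (no semicolon, so R is displayed)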