1. Overview of gray correlation analysis
When a system is composed of multiple factors, we usually want to know several things. Which factors are primary and which are secondary? Which factors strongly influence the development of the system, and which have only a small influence? Which factors promote the system's development and should be strengthened, and which hinder it and should be suppressed?
Regression analysis, analysis of variance, and principal component analysis from mathematical statistics all have shortcomings for this purpose:
- They require a large amount of data. With little data, statistical patterns are hard to find.
- The sample must follow a typical probability distribution, the data of each factor must be linearly related to the system characteristic data, and the factors must be independent of one another.
- They involve a large amount of computation.
Advantages of gray correlation analysis: it works equally well for large and small samples, whether or not the sample follows a regular distribution, and the computation is light and convenient.
The basic idea of gray correlation analysis:
Judge how closely sequences are connected by the similarity of the geometric shapes of their curves. The closer the curves, the greater the correlation between the corresponding sequences, and vice versa.
2. Application examples of gray correlation analysis
Example 1:
1. Determine the analysis sequence
- Parent sequence: a data sequence that reflects the behavioral characteristics of the system (similar to the dependent variable). In this example, the parent sequence is GDP, denoted $x_0$.
- Subsequence: a data sequence composed of a factor that affects system behavior (similar to an independent variable), denoted here as $x_1, x_2, \dots, x_m$ (in this example $m = 3$).
2. Preprocess variables
First compute the mean of each indicator, then divide every element of that indicator by its mean. This makes indicators with different units and magnitudes directly comparable.
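As a minimal sketch of this preprocessing step, the Python function below divides each indicator by its own mean. The function name `mean_normalize` and the sample numbers are hypothetical, chosen only for illustration; each inner list stands for one indicator's values over time (one column of the data matrix).

```python
def mean_normalize(indicators):
    """Divide every value of each indicator by that indicator's mean.

    `indicators` is a list of sequences, one per indicator (hypothetical
    layout: each inner list is one column of the data matrix).
    """
    normalized = []
    for values in indicators:
        mean = sum(values) / len(values)
        normalized.append([v / mean for v in values])
    return normalized


# Hypothetical data: two indicators observed at three time points.
print(mean_normalize([[2, 4, 6], [10, 10, 10]]))  # [[0.5, 1.0, 1.5], [1.0, 1.0, 1.0]]
```

This assumes every indicator has a nonzero mean, which holds for positive quantities such as output values.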
3. Calculate the correlation coefficient between each indicator in the subsequence and the parent sequence
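Subtract each subsequence from the parent sequence and take the absolute value, giving $|x_0(k) - x_i(k)|$ for every subsequence $x_i$ and every point $k$ (where $x_0$ is the parent sequence). The minimum of all these values is the two-level minimum difference, recorded as $a$, and the maximum is the two-level maximum difference, recorded as $b$:

$$a = \min_i \min_k |x_0(k) - x_i(k)|, \qquad b = \max_i \max_k |x_0(k) - x_i(k)|$$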
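A short Python sketch of this step, assuming the sequences have already been mean-normalized. The helper names `abs_differences` and `two_level_extremes` and the sample values are hypothetical; here each row of the difference matrix is one subsequence (the transpose of the MATLAB layout).

```python
def abs_differences(parent, subsequences):
    """Return |x0(k) - xi(k)|: one row per subsequence xi, one column per point k."""
    return [[abs(p - s) for p, s in zip(parent, sub)] for sub in subsequences]


def two_level_extremes(diffs):
    """Two-level minimum a and maximum b: min/max over all subsequences and all points."""
    a = min(min(row) for row in diffs)
    b = max(max(row) for row in diffs)
    return a, b


# Hypothetical, already mean-normalized sequences.
parent = [0.5, 1.0, 1.5]
subsequences = [[0.5, 1.25, 1.25], [1.0, 1.0, 1.0]]
diffs = abs_differences(parent, subsequences)
print(diffs)                      # [[0.0, 0.25, 0.25], [0.5, 0.0, 0.5]]
print(two_level_extremes(diffs))  # (0.0, 0.5)
```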
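Definition:

$$\xi_i(k) = \frac{a + \rho b}{|x_0(k) - x_i(k)| + \rho b}$$

where $\xi_i(k)$ is the correlation coefficient between subsequence $x_i$ and the parent sequence $x_0$ at point $k$, and $\rho$ is the resolution coefficient (generally taken as 0.5).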
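The coefficient formula can be sketched in Python as follows. The function name `correlation_coefficients`, the choice to raise an error when $b = 0$, and the sample data are assumptions for illustration; the inputs are taken to be mean-normalized already.

```python
def correlation_coefficients(parent, subsequences, rho=0.5):
    """Gray correlation coefficients (a + rho*b) / (|x0(k) - xi(k)| + rho*b).

    Returns one row per subsequence, one value per point k.
    """
    diffs = [[abs(p - s) for p, s in zip(parent, sub)] for sub in subsequences]
    a = min(min(row) for row in diffs)  # two-level minimum difference
    b = max(max(row) for row in diffs)  # two-level maximum difference
    if b == 0:
        # Every subsequence equals the parent; the formula would divide by zero.
        raise ValueError("all differences are zero; coefficients are undefined")
    return [[(a + rho * b) / (d + rho * b) for d in row] for row in diffs]


# Hypothetical, already mean-normalized sequences.
parent = [0.5, 1.0, 1.5]
subsequences = [[0.5, 1.25, 1.25], [1.0, 1.0, 1.0]]
for row in correlation_coefficients(parent, subsequences):
    print([round(c, 4) for c in row])
# [1.0, 0.5, 0.5]
# [0.3333, 1.0, 0.3333]
```

A coefficient equals 1 where the difference equals $a$ and is smallest where the difference equals $b$. A smaller $\rho$ lowers that smallest value, widening the spread of the coefficients, which is why $\rho$ is called the resolution coefficient.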
The calculation results are as follows:
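4. Calculate the gray correlation degree between $x_0$ and each $x_i$
The gray correlation degree is the mean of the correlation coefficients $\xi_i(k)$ of subsequence $x_i$ over all $n$ points:

$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k)$$

Comparing the degrees of the subsequences, the tertiary industry has the largest gray correlation degree with GDP.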
Therefore, we can conclude that the region’s GDP between 2000 and 2005 was most affected by the tertiary industry.
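The averaging and ranking step can be sketched in Python like this. The function names `gray_correlation_degrees` and `rank_factors`, the factor labels, and the coefficient values are hypothetical, used only to show how the factor with the largest degree is picked out.

```python
def gray_correlation_degrees(coefficients):
    """r_i = (1/n) * sum_k xi_i(k): average each subsequence's coefficients over n points."""
    return [sum(row) / len(row) for row in coefficients]


def rank_factors(names, degrees):
    """Order factor names from the largest to the smallest gray correlation degree."""
    pairs = sorted(zip(names, degrees), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in pairs]


# Hypothetical correlation coefficients for three factors at three points.
coefficients = [[0.5, 0.5, 0.5], [1.0, 0.5, 0.5], [1.0, 1.0, 0.5]]
degrees = gray_correlation_degrees(coefficients)
print([round(r, 4) for r in degrees])             # [0.5, 0.6667, 0.8333]
print(rank_factors(["x1", "x2", "x3"], degrees))  # ['x3', 'x2', 'x1']
```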
3. MATLAB implementation
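clear;clc
load gdp.mat % Load the data: a 6x4 matrix (column 1 is GDP, columns 2-4 are the factors)
Mean = mean(gdp); % Mean of each column, used for the preprocessing below
gdp = gdp ./ repmat(Mean,size(gdp,1),1); % Divide each element by its column mean
disp('Preprocessed matrix:'); disp(gdp)
Y = gdp(:,1); % Parent sequence
X = gdp(:,2:end); % Subsequences
absX0_Xi = abs(X - repmat(Y,1,size(X,2))) % |X0 - Xi| matrix (here X0 is Y)
a = min(min(absX0_Xi)) % Two-level minimum difference a
b = max(max(absX0_Xi)) % Two-level maximum difference b
rho = 0.5; % Resolution coefficient set to 0.5
gamma = (a+rho*b) ./ (absX0_Xi + rho*b) % Correlation coefficients between each subsequence indicator and the parent sequence
disp('Gray correlation degrees of the subsequence indicators:')
disp(mean(gamma)) % Average each column over all time points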
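For readers working outside MATLAB, here is a rough Python translation of the script above, using only the standard library. The function name `gray_relational_degrees`, the row/column layout (rows are time points, column 0 is the parent sequence), and the sample data are assumptions for illustration; the data is not the contents of `gdp.mat`.

```python
def gray_relational_degrees(data, rho=0.5):
    """Rough port of the MATLAB script: one gray correlation degree per subsequence.

    `data` is a list of rows (one per time point); column 0 is the parent
    sequence and the remaining columns are the subsequences.
    Assumes b > 0, i.e. not every subsequence coincides with the parent.
    """
    n_rows, n_cols = len(data), len(data[0])
    # Preprocess: divide each column by its mean.
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    norm = [[row[j] / means[j] for j in range(n_cols)] for row in data]
    parent = [row[0] for row in norm]
    subs = [[row[j] for row in norm] for j in range(1, n_cols)]
    # |x0(k) - xi(k)|, two-level extremes, and correlation coefficients.
    diffs = [[abs(p - s) for p, s in zip(parent, sub)] for sub in subs]
    a = min(min(d) for d in diffs)
    b = max(max(d) for d in diffs)
    coeffs = [[(a + rho * b) / (d + rho * b) for d in row] for row in diffs]
    # Gray correlation degree: average over all time points.
    return [sum(row) / len(row) for row in coeffs]


# Hypothetical data: 3 time points; column 1 is proportional to the parent, column 2 is flat.
data = [[1, 2, 3], [2, 4, 3], [3, 6, 3]]
print([round(r, 4) for r in gray_relational_degrees(data)])  # [1.0, 0.5556]
```

The subsequence proportional to the parent sequence gets a degree of exactly 1, matching the basic idea that a curve shaped like the parent's has the strongest correlation.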