MATLAB statistical analysis - correlation coefficient

Statistical Analysis - Correlation Coefficient

Correlation coefficient (Pearson and Spearman)

The Pearson correlation coefficient and the Spearman rank correlation coefficient can both be used to measure the correlation between two variables: Pearson measures **linear** correlation, while Spearman measures monotonic correlation based on ranks. Which coefficient to compute and analyze depends on the conditions the data satisfy.

Basic concepts

  1. Population : All the individuals of the object under study are called the population.
    We usually want to know certain characteristics of the population (such as its mean and variance).
  2. Sample : A subset of individuals drawn from the population is called a sample of the population.
  3. Statistics : Statistics computed from the sample are used to estimate the corresponding population quantities.
    For example, the sample mean and sample standard deviation are used to estimate the population mean (average level) and the population standard deviation (degree of dispersion).
  4. Correlation, as used here, means linear (straight-line) correlation.

Correlation coefficient calculation formula

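As a reference sketch, the standard textbook definitions of the Pearson correlation coefficient can be written as follows. The symbols (X, Y, x_i, y_i, n) are chosen here for illustration and are not necessarily those used in the original figures.

```latex
% Population Pearson correlation coefficient
\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}
          = \frac{E\left[(X - E[X])(Y - E[Y])\right]}{\sigma_X \sigma_Y}

% Sample Pearson correlation coefficient
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Both versions are unit-free and lie between -1 and 1.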

Common pitfalls


Summary

(1) If the two variables are known to be linearly related, a Pearson correlation coefficient with a large absolute value indicates a strong correlation, and one with a small absolute value indicates a weak correlation.
(2) If the form of the relationship between the two variables is unknown, even a very large Pearson correlation coefficient does not show that the two variables are linearly related, or even that they are related at all. Always draw a scatter plot to check.
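
To illustrate both points, here is a minimal sketch with made-up data (the vectors x1, y1, x2, y2 are hypothetical and not from the original post). The first case is a perfect but non-linear relationship that Pearson misses; the second is a single outlier that inflates the coefficient.

```matlab
% Case 1: y is fully determined by x, but the relationship is not linear.
x1 = (-5:5)';
y1 = x1.^2;
R1 = corrcoef(x1, y1);
r1 = R1(1,2);          % essentially 0: Pearson does not see the dependence

% Case 2: five weakly related points plus one extreme outlier.
x2 = [1 2 3 4 5 50]';
y2 = [3 1 2 3 1 50]';
R2 = corrcoef(x2, y2);
r2 = R2(1,2);          % above 0.99, driven almost entirely by the outlier

% The same data without the outlier.
R3 = corrcoef(x2(1:5), y2(1:5));
r3 = R3(1,2);          % weakly negative

fprintf('r1 = %.4f, r2 = %.4f, r3 = %.4f\n', r1, r2, r3);

% Scatter plots make both problems visible at a glance.
subplot(1,2,1); scatter(x1, y1, 'filled'); title('y = x^2');
subplot(1,2,2); scatter(x2, y2, 'filled'); title('one outlier');
```

In both cases the number alone would be misleading, which is why the scatter plot comes first.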

Correlation coefficient size


Descriptive statistics

Before analyzing numerical data, it is useful to compute a set of descriptive statistics first.

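% Compute descriptive statistics.
% Test is the data matrix (one column per variable), assumed to be loaded already.
% skewness and kurtosis require the Statistics and Machine Learning Toolbox.
MIN = min(Test);            % minimum of each column
MAX = max(Test);            % maximum of each column
MEAN = mean(Test);          % mean of each column
MEDIAN = median(Test);      % median of each column
SKEWNESS = skewness(Test);  % skewness of each column
KURTOSIS = kurtosis(Test);  % kurtosis of each column
STD = std(Test);            % standard deviation of each column
% Collect the statistics in one matrix, one row per statistic.
RESULT = [MIN;MAX;MEAN;MEDIAN;SKEWNESS;KURTOSIS;STD]

MATLAB corrcoef function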


Hypothesis testing

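As a sketch of what the significance test for a Pearson coefficient involves: under the null hypothesis that the true correlation is zero (and assuming approximately normal data), the statistic t = r*sqrt((n-2)/(1-r^2)) follows a t distribution with n-2 degrees of freedom. The example below uses made-up data (x and y are hypothetical, not from the original post) and compares the hand-computed two-sided p-value with the one returned by corrcoef. tcdf requires the Statistics and Machine Learning Toolbox.

```matlab
% Illustrative data (hypothetical).
x = [1.2 2.3 2.9 4.1 5.0 6.2 6.8 8.1]';
y = [2.0 2.8 3.9 4.2 5.9 6.1 7.7 8.0]';
n = numel(x);

% Correlation coefficient and p-value from the built-in function.
[R, P] = corrcoef(x, y);
r = R(1,2);

% Manual test of H0: rho = 0 against H1: rho ~= 0.
t = r * sqrt((n - 2) / (1 - r^2));      % test statistic, t(n-2) under H0
p_manual = 2 * tcdf(-abs(t), n - 2);    % two-sided p-value

fprintf('r = %.4f, t = %.4f\n', r, t);
fprintf('p (manual) = %.6g, p (corrcoef) = %.6g\n', p_manual, P(1,2));
```

The two p-values should agree up to rounding, which shows where the P matrix returned by corrcoef comes from.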

Judging significance by the p-value


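%% Compute the correlation coefficients between all columns and their p-values
[R,P] = corrcoef(Test)
% Logical masks for adding significance stars next to the values in an Excel table
P < 0.01                   % cells that get 3 stars
(P < 0.05) & (P >= 0.01)   % cells that get 2 stars
(P < 0.1) & (P >= 0.05)    % cells that get 1 star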
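
If the stars are wanted directly in MATLAB rather than added by hand in Excel, one possible approach is to build a cell array of labels. This is only a sketch: the matrix A and the variable names are hypothetical, and the thresholds are the three used above.

```matlab
% Illustrative data (hypothetical): columns 1 and 2 are strongly related,
% column 3 is not.
A = [1   2   3   4   5   6    7    8;
     2.1 3.9 6.2 8.1 9.8 12.2 13.9 16.1;
     5   3   6   2   7   1    8    4]';

[R, P] = corrcoef(A);

% Build one text label per cell: coefficient followed by its stars.
labels = cell(size(R));
for i = 1:numel(R)
    if P(i) < 0.01
        s = '***';
    elseif P(i) < 0.05
        s = '**';
    elseif P(i) < 0.1
        s = '*';
    else
        s = '';
    end
    labels{i} = sprintf('%.3f%s', R(i), s);
end
disp(labels)
```

The diagonal entries of P are 1, so the diagonal labels carry no stars.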


Testing whether the data follow a normal distribution

  1. Use normplot() to draw a normal probability plot and inspect the distribution of the data visually.
    (If the points lie close to the red reference line, the data are plausibly normally distributed. This is only a visual indication; a goodness-of-fit test for normality, such as lillietest or jbtest, is still needed to support the conclusion.)
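%% Normality tests
m=[1006.1,1014,1001.6,996.4,997.8,981.6,996.4,991.9,993.3,1000.6,987.3,1015.6,981.6,996.2,999.2,994.5,1005.9,1001.9,986.4,1007.6,1001.4,1014.6,1010.2,993.9,1001.4]

normplot(m)   % normal probability plot

[H,P,LSTAT,CV] = lillietest(m,'Alpha',0.05)   % Lilliefors test
[h,p,jbstat,critval] = jbtest(m,0.05)         % Jarque-Bera test

%{
Output:

lillietest
H = 0
P = 0.5000
LSTAT = 0.1028
CV =  0.1730

jbtest
h= 0
p = 0.5000
jbstat = 0.3112
critval = 4.1494

H = 0 means the null hypothesis (the data are normally distributed) is not rejected at the 5% level.
P = 0.5 is the largest p-value these functions tabulate, so it only says that the true p-value is at least 0.5; it is not the probability that the data are normal.
LSTAT is below the critical value 0.173, so the null hypothesis is not rejected.
(If LSTAT were above the critical value 0.173, the null hypothesis would be rejected.)

[h,p]=lillietest(X)
Return value h: either 0 or 1; h=0 means normality is not rejected, h=1 means normality is rejected
Return value p: the p-value, i.e. the probability under the null hypothesis of a test statistic at least as extreme as the observed one. If p<0.05 (the significance level is usually 0.05; 0.025 and 0.01 are also used), reject the null hypothesis; if p>0.05, do not reject it
Argument X:     the data to test
%}
%% Normality check
% Skewness and kurtosis of normally distributed data
x = normrnd(2,3,100,1);   % 100-by-1 random vector, each element drawn from a normal distribution with mean 2 and standard deviation 3
skewness(x)  % skewness, e.g. 0.1387 in one run (near 0 for normal data)
kurtosis(x)  % kurtosis, e.g. 3.0816 in one run (near 3 for normal data)
qqplot(x)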


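%% Jarque-Bera (JB) test for normality
%{
    Syntax of the JB test in MATLAB: [h,p] = jbtest(x,alpha)
    h = 1 means the null hypothesis is rejected; h = 0 means it cannot be rejected.
    alpha is the significance level, usually 0.05, which corresponds to a confidence level of 1 - 0.05 = 0.95.
    x is the sample to test; note that x must be a vector.
    p is the p-value, which is compared with alpha to reach the decision.
%}
%% Normality test
% Test whether the first column is normally distributed
[h,p] = jbtest(Test(:,1),0.05)
% Test every column in a loop
n_c = size(Test,2);   % number of columns
H = zeros(1,n_c);     % test decisions, one per column
P = zeros(1,n_c);     % p-values, one per column
for i = 1:n_c
    [h,p] = jbtest(Test(:,i),0.05);
    H(i)=h;
    P(i)=p;
end
disp(H)
disp(P)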
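
The JB statistic is built from the sample skewness S and the sample kurtosis K: JB = n*(S^2/6 + (K-3)^2/24), which is approximately chi-square distributed with 2 degrees of freedom for large samples. The sketch below recomputes it by hand for the sample vector m used earlier and compares it with the jbstat output of jbtest. It assumes that jbtest uses the same default (biased) moment estimators that skewness and kurtosis return; treat the comparison as a check of that assumption rather than a statement about the implementation.

```matlab
% Sample data from the normality example earlier in the post.
m = [1006.1,1014,1001.6,996.4,997.8,981.6,996.4,991.9,993.3,1000.6, ...
     987.3,1015.6,981.6,996.2,999.2,994.5,1005.9,1001.9,986.4,1007.6, ...
     1001.4,1014.6,1010.2,993.9,1001.4];
n = numel(m);

S = skewness(m);   % sample skewness (default, biased estimator)
K = kurtosis(m);   % sample kurtosis (default, biased estimator; 3 for a normal)

% Jarque-Bera statistic computed by hand.
jb_manual = n * (S^2 / 6 + (K - 3)^2 / 24);

% Statistic and critical value reported by the built-in test.
[h, p, jbstat, critval] = jbtest(m, 0.05);

fprintf('manual JB = %.4f, jbtest JB = %.4f, critical value = %.4f\n', ...
        jb_manual, jbstat, critval);
```

Normality is rejected when the statistic exceeds the critical value, which is the same decision that h encodes.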

Spearman coefficient

When using MATLAB's built-in function corr to compute the Spearman rank correlation coefficient, make sure that X and Y are both column vectors.

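%% Spearman correlation coefficient
X = [3 8 4 7 2]'  % must be a column vector; the apostrophe ' transposes
Y = [5 10 9 10 6]'
% Method 1: rank-difference formula 1 - 6*sum(d.^2)/(n*(n^2-1)) with n = 5
1-6*(1+0.25+0.25+1)/5/24   % = 0.875

% Method 2: built-in function
coeff = corr(X , Y , 'type' , 'Spearman')
% which is equivalent to the Pearson coefficient of the ranks:
RX = [2 5 3 4 1]
RY = [1 4.5 3 4.5 2]
R = corrcoef(RX,RY)
% Because Y contains a tie, method 2 gives about 0.8721 rather than 0.875;
% the two methods agree exactly only when there are no ties.

% Spearman correlation coefficients between all columns of a matrix
R = corr(Test, 'type' , 'Spearman') % general form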
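
The ranks in the listing above were typed in by hand. As a sketch, they can also be computed with tiedrank (Statistics and Machine Learning Toolbox), which assigns tied values the average of the ranks they span. The variable names rho_formula, rho_ranks and rho_builtin are chosen here for illustration.

```matlab
X = [3 8 4 7 2]';
Y = [5 10 9 10 6]';
n = numel(X);

% Ranks, with ties receiving the average of the ranks they span.
RX = tiedrank(X);
RY = tiedrank(Y);

% Rank-difference formula (exact only when there are no ties).
d = RX - RY;
rho_formula = 1 - 6 * sum(d.^2) / (n * (n^2 - 1));

% Pearson correlation of the ranks.
Rm = corrcoef(RX, RY);
rho_ranks = Rm(1,2);

% Built-in Spearman coefficient.
rho_builtin = corr(X, Y, 'Type', 'Spearman');

fprintf('formula: %.4f, ranks: %.4f, corr: %.4f\n', ...
        rho_formula, rho_ranks, rho_builtin);
```

The last two values coincide, while the formula value differs slightly because of the tie in Y.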

When to use each correlation coefficient

  1. For continuous, normally distributed data with a linear relationship, use the Pearson correlation coefficient. If these conditions are not met, use the Spearman correlation coefficient. (The Spearman correlation coefficient is applicable in a wider range of situations.)
  2. The correlation coefficient describes the linear relationship between variables; as noted in the summary, the size of the coefficient alone does not necessarily explain how the variables are related, so check a scatter plot as well.
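
A rough sketch of the selection rule in item 1, using made-up data (x and y are hypothetical). It checks only the normality condition with jbtest and is a simplification: continuity and linearity still have to be judged separately, for example from a scatter plot.

```matlab
% Illustrative data (hypothetical): y grows monotonically but not linearly with x.
x = (1:20)';
y = exp(0.4 * x);

% jbtest returns 1 when normality is rejected at the 5% level.
rejectX = jbtest(x);
rejectY = jbtest(y);

% Use Pearson only if normality is rejected for neither variable.
if ~rejectX && ~rejectY
    method = 'Pearson';
else
    method = 'Spearman';
end

rho = corr(x, y, 'Type', method);
fprintf('Using %s: rho = %.4f\n', method, rho);
```

For a whole data matrix, the same check could be run column by column before calling corr.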


Origin blog.csdn.net/weixin_43599390/article/details/131358251