Mathematical modeling --- Pearson correlation coefficient

Correlation coefficient

  1. Pearson correlation coefficient: a linear correlation coefficient
  2. Spearman rank correlation coefficient

A correlation coefficient measures the strength of the correlation between two variables. Which coefficient to compute and analyze depends on the conditions that the data satisfy.

Choice of correlation coefficient

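Use the Pearson coefficient when the scatter plot shows a linear relationship; its hypothesis test additionally requires normally distributed data. When these conditions are not met, the Spearman rank correlation coefficient is the usual alternative.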

Pearson correlation coefficient

1. Population Pearson correlation coefficient


  • The Pearson correlation coefficient measures the degree of linear correlation between two variables.

The magnitude of the covariance depends on the units (dimensions) of the two variables, so covariances are not suitable for direct comparison.
Pearson's correlation coefficient can be regarded as the covariance of the standardized variables, which removes the effect of the units.
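
For reference, the standard textbook definitions can be written as follows. The symbols (X_i, Y_i for the n population values, E(·) for the mean, σ for the population standard deviation) are notation assumed here, not taken from the original post.

```latex
\operatorname{Cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - E(X)\bigr)\bigl(Y_i - E(Y)\bigr),
\qquad
\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - E(X)\bigr)^2}

\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}
          = \frac{1}{n}\sum_{i=1}^{n}\frac{X_i - E(X)}{\sigma_X}\cdot\frac{Y_i - E(Y)}{\sigma_Y}
```

The second form of ρ makes the "standardized covariance" reading explicit: each variable is centered and divided by its own standard deviation before the products are averaged.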

2. Sample Pearson correlation coefficient


  • In the sample correlation coefficient, the denominator of the sample standard deviation is $n-1$, which makes the sample variance an unbiased estimator of the population variance.
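
In the same assumed notation, with \bar{X} and \bar{Y} as the sample means and S_X, S_Y as the sample standard deviations, the standard sample definitions are:

```latex
\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}),
\qquad
S_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}

r_{XY} = \frac{\operatorname{Cov}(X,Y)}{S_X\,S_Y}
       = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}
```

The factors of n − 1 cancel in the ratio, so the value of r itself does not depend on whether n or n − 1 is used, as long as the same denominator appears in the covariance and in both standard deviations.
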
3. Common misunderstandings of Pearson's correlation coefficient


  • Always draw a scatter plot first. Only when the plot shows a linear relationship can the Pearson correlation coefficient be used to measure how strong that relationship is.
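
A minimal MATLAB sketch of why the scatter plot comes first. The data are hypothetical and chosen so that y depends on x completely, but not linearly; the Pearson coefficient then comes out at essentially zero.

```matlab
x = -3:3;            % hypothetical values, symmetric about zero
y = x.^2;            % y is fully determined by x, but not linearly

R = corrcoef(x, y);  % 2-by-2 correlation coefficient matrix
r = R(1, 2);         % Pearson coefficient between x and y

scatter(x, y);       % the scatter plot shows a parabola, not a line
fprintf('r = %.4f\n', r);
```

A coefficient near zero here says only that there is no linear relationship, not that the variables are unrelated.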


Conditions for the hypothesis test on Pearson's correlation coefficient

The test requires normally distributed data, so normality must be checked first.

Descriptive statistics

Descriptive statistics for each indicator can be computed in MATLAB, Excel, or SPSS.
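
For the MATLAB route, a minimal sketch might look like the following. The matrix A is hypothetical data (rows are samples, columns are indicators), and the choice of statistics is an assumption about what is usually reported. skewness and kurtosis require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data: each column is one indicator, each row one sample.
A = [1.2 3.4; 2.3 4.1; 3.1 5.9; 4.8 7.2; 5.0 8.8];

MIN  = min(A);       % minimum of each column
MAX  = max(A);       % maximum of each column
MEAN = mean(A);      % mean of each column
MED  = median(A);    % median of each column
SD   = std(A);       % sample standard deviation (denominator n-1)
SKEW = skewness(A);  % skewness of each column
KURT = kurtosis(A);  % kurtosis of each column

% One row per statistic, one column per indicator.
summary = [MIN; MAX; MEAN; MED; SD; SKEW; KURT];
disp(summary);
```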

The process of finding the correlation coefficient

Pearson correlation coefficient: a linear correlation coefficient

Determine whether the data is normally distributed

Before testing the coefficient, check whether the data are normally distributed.
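
One common way to check normality in MATLAB is the Jarque-Bera test, sketched below. This is an assumption: the post does not say which test it used, and the sample x here is hypothetical. jbtest belongs to the Statistics and Machine Learning Toolbox and is typically applied to larger samples.

```matlab
rng(0);                       % fix the random seed so the run is repeatable
x = randn(200, 1);            % hypothetical sample to be checked

alpha = 0.05;                 % significance level
[h, p] = jbtest(x, alpha);    % H0: x comes from a normal distribution

if h == 0
    disp('Cannot reject normality at the 5% level');
else
    disp('Reject normality at the 5% level');
end
```

With several indicators, the same call would be repeated for each column of the data matrix.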

Draw a scatter plot for multiple indicators

When there are multiple indicators, draw a scatter plot for every pair of indicators and check from each plot whether the pair is linearly related (that is, whether the Pearson correlation coefficient can be used). SPSS makes this convenient.

  • Operation in SPSS
    Import data --> Graphs --> Legacy Dialogs --> Scatter/Dot --> Matrix Scatter
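
The same kind of scatter plot matrix can also be drawn in MATLAB with plotmatrix. This is a sketch on hypothetical data, offered as an alternative to the SPSS route rather than something the post itself describes.

```matlab
rng(1);                                      % repeatable hypothetical data
A = randn(50, 4);                            % 50 samples, 4 indicators
A(:, 2) = 2 * A(:, 1) + 0.3 * randn(50, 1);  % make columns 1 and 2 roughly linear

% Off-diagonal panels are pairwise scatter plots of the columns of A;
% diagonal panels are histograms of each column.
plotmatrix(A);
```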
Find Pearson's correlation coefficient
  • In MATLAB: the corrcoef function
[R,P] = corrcoef(A)

Each column of A is treated as one variable, that is, one indicator of the sample.
Each row of A is one sample (observation).
R: the correlation coefficient matrix of A
P: the P value of each correlation coefficient

corrcoef(A,B)

Returns the correlation coefficient matrix between two random variables A and B
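
A short usage sketch on hypothetical data, which also recomputes the coefficient from the sample formula to show what corrcoef returns:

```matlab
% Hypothetical data: 6 samples (rows), 2 indicators (columns).
A = [1 2.1; 2 3.9; 3 6.2; 4 7.8; 5 10.1; 6 12.0];

[R, P] = corrcoef(A);   % R: coefficient matrix, P: matrix of P values
r = R(1, 2);            % coefficient between column 1 and column 2

% The same coefficient from the sample formula.
x  = A(:, 1);
y  = A(:, 2);
dx = x - mean(x);       % deviations from the sample mean
dy = y - mean(y);
r_manual = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

fprintf('corrcoef: %.6f, manual: %.6f\n', r, r_manual);
```

R is symmetric with ones on the diagonal, so for two indicators the only informative entry is R(1,2).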

Visualize the correlation coefficient
  • Use Excel
Mark significance in the correlation coefficient table

1. In MATLAB:

  1. Find the probability density value
tpdf(x,n)

tpdf: returns the probability density of the t distribution with n degrees of freedom at the point x
x: the point (or vector of points) at which the density is evaluated
n: the degrees of freedom
For example:

x = -4:0.1:4;
y = tpdf(x,28);
plot(x,y);

Probability is the area between the probability density curve and the x-axis.

  2. Find the x corresponding to a given p (p is an area under the probability density curve)
x = tinv(p,n)

tinv: the inverse of the cumulative distribution function (cdf) of the t distribution
p: the area under the probability density curve from $-\infty$ to the point x
n: the degrees of freedom
For example:

x = tinv(0.975,28)    % x = 2.0484

That is, for the t distribution with 28 degrees of freedom, the area under the density curve from $-\infty$ to $x = 2.0484$ is 0.975.

  3. Find the cumulative probability p, which is the area between the density curve and the x-axis
p = tcdf(x,n)

tcdf: the cumulative distribution function of the t distribution
p: the area under the probability density curve from $-\infty$ to x, that is, the probability
n: the degrees of freedom
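
The relationship between the three functions can be checked numerically. The sketch below assumes 28 degrees of freedom; the integration grid (starting at -50 as a stand-in for negative infinity) is an arbitrary choice made for illustration. All three functions require the Statistics and Machine Learning Toolbox.

```matlab
n = 28;                      % degrees of freedom
x = tinv(0.975, n);          % point with area 0.975 to its left
p = tcdf(x, n);              % the cdf at that point recovers 0.975

% The cdf is the area under the density curve from -Inf to x.
% Integrating tpdf numerically from a far-left point approximates it.
xs   = linspace(-50, x, 200001);
area = trapz(xs, tpdf(xs, n));

fprintf('x = %.4f, tcdf = %.6f, integrated area = %.6f\n', x, p, area);
```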

  4. Compute the P value, as in probability theory and mathematical statistics
    Two methods:
  • Method 1
[R,P] = corrcoef(A)  % P holds the P values of the two-sided test
  • Method 2: from the value x of the test statistic and its degrees of freedom n
    One-sided (right-tailed) test:
    P = 1 - tcdf(x,n)
    Two-sided test:
    P = (1 - tcdf(abs(x),n)) * 2
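
The two methods can be compared on hypothetical data. The sketch below assumes the standard test statistic for a Pearson coefficient, t = r * sqrt((m - 2) / (1 - r^2)) with m - 2 degrees of freedom for m samples. The variable names m, dof and t are chosen here for illustration.

```matlab
% Hypothetical data: 8 samples (rows), 2 indicators (columns).
A = [1 2.0; 2 1.5; 3 3.9; 4 3.1; 5 5.2; 6 4.0; 7 6.8; 8 5.5];

% Method 1: corrcoef returns the two-sided P values directly.
[R, P] = corrcoef(A);
r = R(1, 2);

% Method 2: build the test statistic and use the t distribution.
m   = size(A, 1);                       % number of samples
dof = m - 2;                            % degrees of freedom of the test
t   = r * sqrt(dof / (1 - r^2));        % test statistic under H0: rho = 0

p_two = 2 * (1 - tcdf(abs(t), dof));    % two-sided P value
p_one = 1 - tcdf(t, dof);               % one-sided (right-tailed) P value

fprintf('corrcoef P = %.6g, manual two-sided P = %.6g\n', P(1, 2), p_two);
```

Taking abs(t) in the two-sided formula keeps the result valid when the correlation is negative.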
  5. Compare the P value with the significance level
    1. Marking it yourself:
    Reject | Cannot reject
    -------- | -----
    P < 0.01: reject the null hypothesis at the 99% confidence level | P > 0.01: cannot reject the null hypothesis at the 99% confidence level
    P < 0.05: reject the null hypothesis at the 95% confidence level | P > 0.05: cannot reject the null hypothesis at the 95% confidence level
    P < 0.10: reject the null hypothesis at the 90% confidence level | P > 0.10: cannot reject the null hypothesis at the 90% confidence level
  • Mark the correlation coefficient table
    P < 0.01: marked ***
    0.01 < P < 0.05: marked **
    0.05 < P < 0.10: marked *
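
A minimal sketch of this marking in MATLAB, on hypothetical data. It assumes the conventional three-level scheme (three, two, and one asterisk for the 0.01, 0.05, and 0.10 levels). The cell array T and the three-decimal format are illustrative choices.

```matlab
rng(2);                                       % repeatable hypothetical data
A = randn(30, 3);                             % 30 samples, 3 indicators
A(:, 2) = A(:, 1) + 0.5 * randn(30, 1);       % make columns 1 and 2 correlated

[R, P] = corrcoef(A);
k = size(R, 1);
T = cell(k, k);                               % table of formatted coefficients

for i = 1:k
    for j = 1:k
        if i == j
            stars = '';                       % diagonal: no test needed
        elseif P(i, j) < 0.01
            stars = '***';
        elseif P(i, j) < 0.05
            stars = '**';
        elseif P(i, j) < 0.10
            stars = '*';
        else
            stars = '';                       % not significant at 10%
        end
        T{i, j} = sprintf('%.3f%s', R(i, j), stars);
    end
end

disp(T);
```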


2. In SPSS:

Analyze -> Correlate -> Bivariate

  • Two-tailed / One-tailed: two-sided test / one-sided test
  • Flag significant correlations: mark the significant coefficients in the output table



Reference: Mathematical Modeling Breeze Video

Origin blog.csdn.net/qq_43779658/article/details/107748177