Concepts of the various correlation coefficients

I recently read "Quantitative Research and Statistical Analysis: SPSS (PASW) Data Analysis Example Analysis". I originally picked it up for the material on moderating variables, but then I came across the part on correlation. I had assumed there was nothing much to correlation, yet after reading it my understanding deepened a lot. There were many things I hadn't understood before, or had never seen summarized so systematically. Below I mainly sort out the concepts of the various correlation coefficients, covering the relationships among continuous, ordinal, and categorical variables. Out of laziness, a lot of this is copied directly from the book, with some of my own understanding added!

1. Partial correlation and part (semipartial) correlation

Partial correlation (sometimes translated as "net correlation"): suppose you want to study the correlation between x1 and x2, but both may also be related to other variables (x3, x4, ...). Looking only at the simple correlation between x1 and x2 is then not reasonable, because both are affected by those other variables at the same time. The correlation between x1 and x2 after the influence of the other variables has been removed from both of them is the partial correlation. Part (semipartial) correlation removes the influence of the other variables from only one of the two variables. To be honest, I didn't understand what part correlation is useful for. I don't think I have ever used it in my own data analysis.
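To make the difference concrete, here is a minimal sketch (my own, not from the book) for the case of a single control variable x3. It uses the standard first-order formulas: the correlation with x3 removed from both variables, (r12 - r13*r23) / sqrt((1 - r13^2)(1 - r23^2)), and the correlation with x3 removed from x2 only, (r12 - r13*r23) / sqrt(1 - r23^2). The function names and the toy data are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x1, x2, x3):
    """Correlation of x1 and x2 with x3 removed from BOTH variables."""
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def part_corr(x1, x2, x3):
    """Correlation of x1 with x2 after x3 is removed from x2 ONLY."""
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    return (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

# Made-up toy data: x1 and x2 both follow x3 closely.
x3 = [1, 2, 3, 4, 5, 6]
x1 = [2, 1, 4, 3, 6, 5]
x2 = [1, 3, 2, 5, 4, 6]

print("simple r12:", round(pearson(x1, x2), 3))
print("x3 removed from both:", round(partial_corr(x1, x2, x3), 3))
print("x3 removed from x2 only:", round(part_corr(x1, x2, x3), 3))
```

On this toy data the simple correlation between x1 and x2 is positive, but it turns negative once x3 is controlled for, which is exactly the situation where looking only at the simple correlation is misleading.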

2. Spearman correlation

This can be used for ordinal variables: it uses only the order (the ranks) of the data, not addition, subtraction, multiplication, or division on the values themselves. Because it uses much less of the information in the data than the Pearson correlation coefficient does, Pearson is preferred whenever it is applicable. In my experience, though, the two are usually very similar, so for the sake of robustness it is not a problem to always use the Spearman correlation coefficient.
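A minimal sketch of the idea, assuming the usual definition of the Spearman coefficient as the Pearson correlation computed on ranks, with tied values given their average rank. The helper names are my own.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks only."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relationship (made-up data).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print("pearson :", pearson(x, y))
print("spearman:", spearman(x, y))
```

Because only the order is used, any strictly increasing relationship gives a Spearman coefficient of 1, while the Pearson coefficient stays below 1 unless the relationship is exactly linear.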

3. Point-biserial correlation

I had never seen this before. I once saw someone compute a Pearson correlation coefficient between a continuous variable and a binary variable, and I thought his fundamentals were not solid. Now it turns out that I was the ignorant one, hhh, so embarrassing. When interpreting a point-biserial correlation, however, you cannot simply say that the closer to 1 the more positive the correlation and the closer to -1 the more negative, because the binary variable has no inherent magnitude and which category is coded 1 and which 0 is your own choice, so take a little care with the interpretation. The closer the absolute value of the point-biserial correlation is to 1, the more strongly the dichotomous variable and the continuous variable are related, but the direction has to be explained with reference to how the data are coded.
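A small sketch of why the sign depends on the coding, assuming the textbook formula r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the group means, s is the population standard deviation of the continuous variable, and p and q are the group proportions. It gives the same value as a plain Pearson correlation with the binary variable coded 0/1. The data are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(group, y):
    """Point-biserial correlation; group must be coded 0/1."""
    n = len(y)
    y1 = [v for g, v in zip(group, y) if g == 1]
    y0 = [v for g, v in zip(group, y) if g == 0]
    p = len(y1) / n
    q = 1 - p
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / n)  # population SD
    return (sum(y1) / len(y1) - sum(y0) / len(y0)) / sd * math.sqrt(p * q)

# Made-up data: group 1 has the larger values.
group = [0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
flipped = [1 - g for g in group]  # same groups, opposite 0/1 labels

print("original coding:", round(point_biserial(group, y), 3))
print("flipped coding :", round(point_biserial(flipped, y), 3))
print("plain Pearson  :", round(pearson(group, y), 3))
```

Swapping the 0/1 labels flips the sign but leaves the absolute value unchanged, so only the absolute value says how strong the relationship is.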

4. Eta coefficient

It is used to judge the strength of the association between a categorical variable and a continuous variable. Since one of the variables is categorical, there is no such thing as a positive or negative correlation. You can only say that the association between the two is strong or weak. Specifically, eta squared is defined as the proportion of the total variation that is between-group variation, and eta is its square root. It's still easy to understand.
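A minimal sketch of that definition, assuming the usual one-way decomposition: eta squared is the between-group sum of squares divided by the total sum of squares. The function name and the data are made up for illustration.

```python
import math

def eta_squared(groups, y):
    """Between-group sum of squares divided by total sum of squares."""
    n = len(y)
    grand_mean = sum(y) / n
    ss_total = sum((v - grand_mean) ** 2 for v in y)

    # Collect the continuous values belonging to each category.
    by_group = {}
    for g, v in zip(groups, y):
        by_group.setdefault(g, []).append(v)

    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return ss_between / ss_total

# Made-up data: three categories with clearly different means.
groups = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]

eta2 = eta_squared(groups, y)
print("eta squared:", eta2)
print("eta        :", math.sqrt(eta2))
```

The result always lies between 0 (all group means equal) and 1 (no variation inside any group), which is why it only expresses strength and never direction.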

5. Contingency coefficient

For two multi-category variables, you can use cross-tabulation (contingency table) analysis, and the contingency coefficient can be extracted to characterize the degree of association between the two variables. For a detailed introduction, please refer to this link: Contingency analysis.
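As a rough sketch, assuming the coefficient meant here is Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)), it can be computed from the cross-table like this. The function names and the example tables are my own, and the code assumes no row or column total is zero.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and sample size of a cross-table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Made-up cross-tables of counts (rows: variable A, columns: variable B).
independent = [[10, 20], [20, 40]]   # rows are proportional
perfect = [[10, 0], [0, 10]]         # A determines B completely

print("independent:", contingency_coefficient(independent))
print("perfect    :", contingency_coefficient(perfect))
```

Note that C is 0 for an independent table but stays below 1 even for a perfect association (for a 2x2 table the maximum is sqrt(0.5)), so values are only comparable between tables of the same size.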

6. Bibliography

The pictures are all from the book mentioned at the beginning, "Quantitative Research and Statistical Analysis: SPSS (PASW) Data Analysis Example Analysis".
So lazy, so lazy, hahaha

Origin blog.csdn.net/qq_39805362/article/details/105605596