SPSS--s04 canonical correlation analysis

Basic principles of canonical correlation

Canonical correlation analysis is a further development of principal component analysis and factor analysis. It studies the interdependence between two groups of variables by reducing the relationship between the two groups to the correlation between two new variables, without discarding the information in the original variables. The two new variables are linear combinations of the first group of variables and of the second group of variables, respectively. The two groups may contain different numbers of variables, and they may also represent different kinds of content.

For example, the dosage form, dose, route of administration, and time of administration of a certain drug form one group of factors, while the reactions of the various systems of the human body (nervous system, circulatory system, respiratory system, digestive system, etc.) after administration form another group. In this case, the correlation between "treatment" and "effect" needs to be analyzed between the two groups as a whole.

Simple correlation analysis can sometimes be affected by other factors, so that what it reflects is a superficial rather than an essential connection, and sometimes even a false appearance. Canonical correlation analysis therefore has its own unique role in correlation analysis.

Linear Combinations of Canonical Correlation Analysis

Assume two sets of variables X1, X2, ..., Xp and Y1, Y2, ..., Yq; their linear combinations can be expressed as:
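
Written out in standard notation, the first pair of combinations takes the form below. The coefficient symbols a and b are chosen here for illustration (the original does not name them), and q denotes the number of variables in the second group, which may or may not equal p:

```latex
\begin{aligned}
U_1 &= a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\
V_1 &= b_{11}Y_1 + b_{12}Y_2 + \cdots + b_{1q}Y_q
\end{aligned}
```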

These linear combinations are called the first pair of canonical variables. The definition extends to the general case, that is, the i-th (i ≥ 1) pair of canonical variables.
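
For the general case, a sketch in the usual textbook formulation is given below. The symbols a, b, and q (the size of the second group) are assumptions of this sketch, and λ_i is used for the i-th canonical correlation to match the notation in the results section. At most min(p, q) such pairs exist.

```latex
\begin{aligned}
U_i &= a_{i1}X_1 + \cdots + a_{ip}X_p, \qquad
V_i = b_{i1}Y_1 + \cdots + b_{iq}Y_q, \\
\lambda_i &= \operatorname{Corr}(U_i, V_i) \ \text{is maximal subject to} \\
&\operatorname{Var}(U_i) = \operatorname{Var}(V_i) = 1, \qquad
\operatorname{Corr}(U_i, U_j) = \operatorname{Corr}(V_i, V_j) = 0 \quad (j < i)
\end{aligned}
```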

Conditions to be met for canonical correlation analysis

Canonical correlation analysis presupposes that the original data meet certain conditions and assumptions. The conditions are that the original variables follow a multivariate normal distribution and that the sample size exceeds the number of original variables (generally 10 to 20 times the number of variables). The assumptions are that the two groups of variables are correlated with each other and that canonical variables can be synthesized from each group of original variables, that is, there is some correlation within each group. If these conditions and assumptions are not met, canonical correlation analysis should not be performed.
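
The sample-size rule of thumb can be checked with simple arithmetic. The sketch below assumes that "number of variables" means the total across both groups (an interpretation, not stated in the original) and uses the figures from the case analysis in this article: 84 cases and 4 + 4 variables. The function name and the default ratio are illustrative only.

```python
def sample_size_check(n_cases, n_x, n_y, min_ratio=10):
    """Return (cases per variable, whether the rule of thumb is met).

    Assumption: the rule is applied to the total number of variables
    in both groups (n_x + n_y).
    """
    n_vars = n_x + n_y
    ratio = n_cases / n_vars
    return ratio, ratio >= min_ratio


# Figures from the case analysis: 84 boys, 4 growth and 4 fitness indicators
ratio, ok = sample_size_check(84, 4, 4)
print(f"cases per variable: {ratio:.1f}, meets 10x rule: {ok}")
```

With these figures the ratio sits just above the lower end of the 10 to 20 range, so the case data meet the rule only narrowly.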

General steps of canonical correlation analysis

Case analysis

To explore the correlation between growth and development indicators and physical fitness in primary school students, a city surveyed the physical fitness of its primary school students. Canonical correlation analysis was carried out on four growth and development indicators of 84 ten-year-old boys, namely vital capacity (L), height (cm), weight (kg), and chest circumference (cm), and on four indicators of physical fitness, namely 50 m run (s), high jump (cm), long jump (m), and medicine ball throw (m). (Data source: Medical Statistics, 4th Edition, edited by Sun Zhenqiu and others)

Data view

Variable view

Step-by-step procedure

SPSS 23.0 and earlier versions have no menu entry for canonical correlation analysis, so it has to be run with syntax.

[1] Create a new syntax file: click "File" > "New" > "Syntax"

[2] In the syntax window that opens, enter the syntax code:

INCLUDE 'C:\Program Files\IBM\SPSS\Statistics\22\Samples\English\Canonical correlation.sps'.

This statement gives the installation location of "Canonical correlation.sps". Replace it with the location of the file on your own computer.

CANCORR SET1 = x1 x2 x3 x4 / SET2 = y1 y2 y3 y4 / .

[3] After entering the syntax, click "Run"
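
For readers who want to see what quantity is being estimated, the following is a minimal pure-Python sketch of the first canonical correlation. It relies on the standard result that the squared canonical correlations are the eigenvalues of Sxx⁻¹ Sxy Syy⁻¹ Syx. It is not the code of the SPSS macro; the function names and the toy data are hypothetical, and power iteration recovers only the largest correlation.

```python
from math import sqrt


def mean_center(rows):
    """Subtract each column's mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_cov(a, b):
    """Sample covariance between the columns of two centered data sets."""
    n = len(a)
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) / (n - 1)
             for j in range(len(b[0]))] for i in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (assumes m is nonsingular)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def first_canonical_correlation(x_rows, y_rows, iterations=200):
    """Largest canonical correlation between two groups of variables."""
    x, y = mean_center(x_rows), mean_center(y_rows)
    sxx, syy, sxy = cross_cov(x, x), cross_cov(y, y), cross_cov(x, y)
    syx = [list(col) for col in zip(*sxy)]
    m = matmul(matmul(inverse(sxx), sxy), matmul(inverse(syy), syx))
    # Power iteration: the norm of m @ v for a unit vector v converges to the
    # dominant eigenvalue, provided the start vector is not degenerate.
    k = len(m)
    v = [1.0 / sqrt(k)] * k
    eig = 0.0
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        eig = norm
        v = [c / norm for c in w]
    return sqrt(eig)


# Hypothetical toy data: y is an exact linear combination of the two x columns
toy_x = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 6]]
toy_y = [[x1 + 2 * x2] for x1, x2 in toy_x]
print(first_canonical_correlation(toy_x, toy_y))
```

With one variable in each group, the result reduces to the absolute value of the ordinary Pearson correlation, which is a convenient sanity check.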

Result analysis

Correlation coefficients among the variables x1, x2, x3, x4; correlation coefficients among the variables y1, y2, y3, y4

Correlation coefficient between two sets of variables

The first column gives the canonical correlation coefficients, which are λ1=0.871, λ2=0.312, λ3=0.164, λ4=0.053; the second column gives the test of each canonical correlation coefficient. The results show that only the first canonical correlation coefficient is statistically significant at α=0.05.
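
The significance pattern can be reproduced approximately from the four coefficients alone. The sketch below assumes that the test is Bartlett's chi-square approximation based on Wilks' lambda (the original does not name the test), with n = 84 cases and p = q = 4 variables from the case analysis. The critical values are the standard upper 5% points of the chi-square distribution; the function name is illustrative.

```python
from math import log, prod

# Canonical correlation coefficients reported in the output
CANONICAL_R = [0.871, 0.312, 0.164, 0.053]

# Upper 5% critical values of chi-square, keyed by degrees of freedom
CHI2_CRIT_05 = {16: 26.296, 9: 16.919, 4: 9.488, 1: 3.841}


def bartlett_tests(r, n, p, q):
    """Test, for each k, whether the k-th and all later correlations are zero.

    Returns a list of (index, Wilks' lambda, chi-square, degrees of freedom).
    """
    scale = n - 1 - (p + q + 1) / 2
    results = []
    for k in range(len(r)):
        wilks = prod(1 - ri ** 2 for ri in r[k:])
        chi2 = -scale * log(wilks)
        df = (p - k) * (q - k)
        results.append((k + 1, wilks, chi2, df))
    return results


for index, wilks, chi2, df in bartlett_tests(CANONICAL_R, n=84, p=4, q=4):
    significant = chi2 > CHI2_CRIT_05[df]
    print(f"lambda{index}: Wilks={wilks:.3f} chi2={chi2:.2f} "
          f"df={df} significant at 0.05: {significant}")
```

Under these assumptions only the first test exceeds its critical value, which agrees with the conclusion drawn from the SPSS output.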

Standardized U canonical variables and unstandardized (original) U canonical variables

Standardized V canonical variables and unstandardized (original) V canonical variables

The standardized first canonical correlation variable can be expressed as:

U1=-0.099X1-0.462X2-0.066X3-0.525X4

V1=0.176Y1-0.791Y2-0.153Y3-0.059Y4

The other canonical variables can be written in the same way.
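
To illustrate how these expressions are applied, the sketch below computes U1 and V1 for one case from standardized values. The coefficients are the ones given above; the z-scores are hypothetical values made up for illustration, and the function name is an assumption of this sketch.

```python
# Standardized coefficients of the first pair of canonical variables
U1_COEF = {"X1": -0.099, "X2": -0.462, "X3": -0.066, "X4": -0.525}
V1_COEF = {"Y1": 0.176, "Y2": -0.791, "Y3": -0.153, "Y4": -0.059}


def canonical_score(z_values, coefficients):
    """Weighted sum of standardized values using the given coefficients."""
    return sum(coefficients[name] * z_values[name] for name in coefficients)


# Hypothetical z-scores for a single boy (not taken from the data set)
z_x = {"X1": 0.5, "X2": 1.0, "X3": -0.2, "X4": 0.3}
z_y = {"Y1": -0.4, "Y2": 0.8, "Y3": 0.6, "Y4": 0.1}

u1 = canonical_score(z_x, U1_COEF)
v1 = canonical_score(z_y, V1_COEF)
print(f"U1 = {u1:.4f}, V1 = {v1:.4f}")
```

Because the coefficients are standardized, each input must first be converted to a z-score using the sample mean and standard deviation of that variable.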

From the above expressions, it can be seen that U1 is mainly affected by X2 (height) and X4 (chest circumference), and V1 is mainly affected by Y2 (high jump) and Y1 (50m run). In addition, the signs of the canonical coefficients indicate whether the X variables and the Y variables are positively or negatively related. Taking Y1 as an example: its coefficient in V1 is positive, while every coefficient in U1 is negative, so each variable in U1 is negatively correlated with Y1.
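
This reading of the coefficients can be made explicit by ranking the variables by absolute weight and inspecting the signs. The sketch below uses the standardized coefficients of U1 and V1 given earlier; the helper name is illustrative.

```python
U1_COEF = {"X1": -0.099, "X2": -0.462, "X3": -0.066, "X4": -0.525}
V1_COEF = {"Y1": 0.176, "Y2": -0.791, "Y3": -0.153, "Y4": -0.059}


def rank_by_weight(coefficients):
    """Variable names ordered from largest to smallest absolute coefficient."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]),
                  reverse=True)


print("U1 ranking:", rank_by_weight(U1_COEF))
print("V1 ranking:", rank_by_weight(V1_COEF))

# Sign pattern behind the Y1 example: all U1 weights negative, Y1 weight positive
all_x_negative = all(c < 0 for c in U1_COEF.values())
y1_positive = V1_COEF["Y1"] > 0
print("opposite signs for X variables and Y1:", all_x_negative and y1_positive)
```

Ranking by standardized coefficient is only a first indication of importance, since the coefficients can be unstable when variables within a group are strongly correlated.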

Origin blog.csdn.net/m0_72494332/article/details/132589663