[Mathematical Modeling] Canonical correlation analysis

Introduction to canonical correlation analysis

  • Purpose: a multivariate statistical method for studying the correlation between two sets of variables (each set may contain several indicators), revealing the intrinsic relationship between the two sets.

Canonical correlation analysis steps

  • ① In each set, find a linear combination of its variables such that the correlation coefficient between the two combinations is as large as possible;
  • ② Among linear combinations that are uncorrelated with the first pair, again choose the pair with the largest correlation coefficient;
  • ③ Repeat until the correlation between the two sets of variables has been fully extracted.
    • Deciding when extraction is complete: use hypothesis tests and stop once the next canonical correlation coefficient is no longer significant.
  • Key concepts
    • Canonical variables: the selected pairs of linear combinations.
    • Canonical correlation coefficient: the correlation coefficient between a pair of canonical variables ⇒ measures the strength of the relationship between the two sets of variables.
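The steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not SPSS's implementation: it assumes complete numeric data in arrays X (n × p) and Y (n × q), standardizes every column, and obtains the canonical correlations as the singular values of R_xx^(-1/2) R_xy R_yy^(-1/2), a standard equivalent formulation of "maximize the correlation, then repeat under uncorrelatedness constraints". The helper names `cca` and `_inv_sqrt` are hypothetical, the data are synthetic, and the sign of each coefficient column is arbitrary, so it may differ from SPSS output.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix (via eigendecomposition)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y):
    """Canonical correlation analysis of X (n x p) and Y (n x q).

    Returns the canonical correlations r (descending) and the standardized
    canonical coefficients A (p x m) and B (q x m), with m = min(p, q).
    Canonical variables are U = Zx @ A and V = Zy @ B, where Zx, Zy are the
    column-standardized data; each has unit variance and corr(U_k, V_k) = r_k.
    """
    n = X.shape[0]
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    Rxx = Zx.T @ Zx / (n - 1)  # within-set correlation matrices
    Ryy = Zy.T @ Zy / (n - 1)
    Rxy = Zx.T @ Zy / (n - 1)  # between-set correlation matrix
    Kx, Ky = _inv_sqrt(Rxx), _inv_sqrt(Ryy)
    # Singular values of the whitened cross-correlation matrix are the canonical correlations
    U_, s, Vt = np.linalg.svd(Kx @ Rxy @ Ky)
    m = min(X.shape[1], Y.shape[1])
    A = Kx @ U_[:, :m]
    B = Ky @ Vt.T[:, :m]
    return s[:m], A, B

# Synthetic data: both sets share one latent factor z
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + rng.normal(size=200) for _ in range(3)])
Y = np.column_stack([z + rng.normal(size=200) for _ in range(3)])
r, A, B = cca(X, Y)
print(np.round(r, 3))  # first value should be large, the others close to zero
```

Because each single variable is itself a linear combination, the first canonical correlation can never be smaller than the largest absolute pairwise correlation between the two sets.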

Worked example: key steps of canonical correlation analysis

  • (1) Distributional assumption: the two sets of data follow a joint normal distribution.

    • In the paper, it can simply be stated that the two sets of data follow a (joint) normal distribution.
  • (2) First, test whether the two sets of variables are correlated at all (using a likelihood ratio statistic).

    • A p-value below 0.05 (0.1) means the null hypothesis that the two sets are uncorrelated is rejected at the 5% (10%) significance level, i.e., with 95% (90%) confidence, so the two sets of variables are considered correlated.
    • This test is optional, because the canonical correlation coefficients are tested later anyway: if the first canonical correlation coefficient is significant, the same conclusion follows.
  • (3) Determine the number of canonical variable pairs to keep: just look at the p-value for each canonical correlation coefficient.

  • (4) Use the standardized canonical variables to analyze the problem.

  • (5) Perform canonical loading analysis.

    • Canonical loadings reflect correlations in the data, namely the correlation between each composite indicator (canonical variable) and each original indicator.
  • (6) Compute the contribution of the first r canonical variables to the total sample variance.

  • The specific SPSS operations are given in the example below.
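Outside SPSS, the tests in steps (2) and (3) can be reproduced from the canonical correlations alone. A minimal pure-Python sketch, assuming Bartlett's chi-square approximation to Wilks' lambda (SPSS may report an F approximation instead, so its p-values can differ slightly): the first row (k = 0) tests whether the two sets are correlated at all, which is step (2), and each later row tests whether the remaining canonical correlations are all zero, which is step (3). The function names `chi2_sf` and `bartlett_tests` are hypothetical, and the correlation values in the usage lines are made up for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df (closed forms)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-y) * sum_{i=0}^{k-1} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2k + 1: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{k} y^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def bartlett_tests(r, n, p, q):
    """Sequential likelihood-ratio tests for canonical correlations.

    Row k tests H0: the canonical correlations r[k], r[k+1], ... are all zero,
    using Wilks' lambda and Bartlett's chi-square approximation.
    Returns tuples (k, wilks_lambda, chi2_stat, df, p_value).
    """
    rows = []
    for k in range(len(r)):
        lam = 1.0
        for rj in r[k:]:
            lam *= 1.0 - rj ** 2  # Wilks' lambda for the remaining correlations
        stat = -(n - 1 - (p + q + 1) / 2.0) * math.log(lam)
        df = (p - k) * (q - k)
        rows.append((k, lam, stat, df, chi2_sf(stat, df)))
    return rows

# Hypothetical canonical correlations for n = 30 observations and two sets of 3 variables
rs = [0.6, 0.3, 0.1]
for k, lam, stat, df, pval in bartlett_tests(rs, n=30, p=3, q=3):
    print(f"H0: r{k + 1}..r{len(rs)} = 0  Wilks={lam:.3f}  chi2={stat:.2f}  df={df}  p={pval:.4f}")
```

Reading the rows in order, the number of canonical pairs to keep is the number of leading rows with p < 0.05; the first non-significant row ends the extraction.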


Specific examples

Question analysis

  • Question: what is the relationship between viewers' and industry insiders' opinions of a set of TV programs?
    • The first set of variables (audience ratings): surveys of low-education viewers (led), high-education viewers (hed), and an online survey (net);
    • The second set of variables (ratings by industry insiders): artists (arti), including actors and directors; distributors (com); and heads of various departments in the industry (man).
  • Idea: analyzing the variables pair by pair makes it hard to get a clear picture of the relationship between the two sets (audience and industry insiders). ⇒ Convert the correlations among many variables into the correlation between two representative variables.
    • Choosing the representatives: each should summarize its own set as comprehensively as possible. ⇒ The simplest composite of a set of variables is a linear combination of those variables.

SPSS operation steps

  • Note: SPSS 24 or later is required (earlier versions cannot run canonical correlation analysis directly from the menus and need syntax programming).

  • Step 1: Import the data from Excel into SPSS.

  • Step 2: Check the measurement level of each variable (all set to "Scale" here).

    • Scale: numeric variables (e.g., height, weight)
    • Ordinal: ordered categorical variables (e.g., grades A, B, C, D; good/bad)
    • Nominal: unordered categorical variables (e.g., male/female)
  • Step 3: From the menus, select the Canonical Correlation procedure.

  • Step 4: Move the variables into their corresponding sets.

    • Note that Python must be installed first for this procedure to run.
  • Step 5: Export the analysis results.

  • Step 6: Analyze the results

    • ① Notes on this step (this explanation does not need to be written in the paper):
      • To include the output in the paper, the five headers of the output table need to be modified.
      • Comparing the p-values shows that only the first row is meaningful (significant) ⇒ this gives the canonical correlation coefficient to use.
      • After obtaining the canonical correlation coefficient, find the canonical variables.
        • Non-standardized data are affected by units of measurement, so the standardized results must be used.
    • ② There are three specific parts, a to c (this is the part that needs to be written in the paper).
  • Step 7: Optionally analyze the canonical loadings and the variance explained.
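Step 6 recommends standardized rather than raw coefficients. The link between the two is simple: a standardized coefficient equals the raw coefficient multiplied by the sample standard deviation of its variable, so both describe the same canonical variable (up to centering), but only the standardized version is unit-free. A small NumPy sketch with hypothetical helper names, made-up raw coefficients, and synthetic data:

```python
import numpy as np

def standardize(X):
    """Column-wise z-scores using the sample standard deviation (ddof=1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def raw_to_standardized(a_raw, X):
    """Convert raw canonical coefficients (per original unit) into standardized ones."""
    return a_raw * X.std(axis=0, ddof=1)

# Synthetic data: three ratings measured on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])
a_raw = np.array([0.5, 0.03, 0.004])  # hypothetical raw coefficients

a_std = raw_to_standardized(a_raw, X)
# Both coefficient vectors produce the same (centered) canonical variable
u_raw = (X - X.mean(axis=0)) @ a_raw
u_std = standardize(X) @ a_std
print(np.allclose(u_raw, u_std))  # True

# Rescaling a variable (a change of units) changes its raw coefficient
# but leaves the standardized coefficient unchanged
scale = np.array([1.0, 1.0, 0.01])
print(np.allclose(raw_to_standardized(a_raw / scale, X * scale), a_std))  # True
```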


Canonical loading analysis

Canonical loadings

  • Definition: canonical loading analysis examines the correlations between the original variables and the canonical variables.

    • In Step 6 above we obtained the standardized canonical variables, and the strength of the correlation between each original variable and its canonical variable can be judged roughly from the absolute values of the standardized coefficients. This method is not rigorous, however. The rigorous approach is to compute the correlations themselves, which is what canonical loading analysis does.
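Canonical loadings are ordinary Pearson correlations between the original variables and the canonical-variable scores, so once the scores are available (for example, standardized data multiplied by the standardized coefficients) they can be computed directly. A minimal NumPy sketch; the helper name `loadings` is hypothetical, the data are synthetic, and the coefficient matrix is made up rather than the output of an actual CCA. When the coefficients come from a real CCA the canonical variables have unit variance and the loadings reduce to R_xx A. Passing the other set's canonical variables instead of the set's own gives the cross loadings.

```python
import numpy as np

def loadings(X, U):
    """Pearson correlations between each column of X (original variables)
    and each column of U (canonical-variable scores); shape (p, m)."""
    p = X.shape[1]
    R = np.corrcoef(X, U, rowvar=False)  # joint (p + m) x (p + m) correlation matrix
    return R[:p, p:]

# Synthetic example: 3 original variables, 2 linear combinations of them
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
A = np.array([[0.6, -0.2], [0.3, 0.9], [0.4, 0.1]])  # hypothetical standardized coefficients
U = Z @ A
print(np.round(loadings(X, U), 3))
```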

Cross loadings

  • Cross loadings are rarely used; generally only the canonical loadings are used to analyze relationships within each set.
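A standard identity shows how cross loadings relate to the within-set loadings. Assume standardized variables, unit-variance canonical variables U_k = Z_x a_k and V_k = Z_y b_k, and canonical correlation r_k; the CCA solution satisfies R_xy b_k = r_k R_xx a_k (and symmetrically R_yx a_k = r_k R_yy b_k), so each cross loading is the corresponding loading shrunk by the factor r_k:

```latex
\operatorname{corr}(x_i, V_k) = \mathbf{e}_i^{\top} R_{xy}\, \mathbf{b}_k
  = r_k\, \mathbf{e}_i^{\top} R_{xx}\, \mathbf{a}_k
  = r_k \operatorname{corr}(x_i, U_k)
\qquad
\operatorname{corr}(y_j, U_k) = r_k \operatorname{corr}(y_j, V_k)
```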

Canonical redundancy analysis (rarely used)

  • Canonical redundancy analysis: compute the proportion of variance explained by each of the three canonical variables to judge how strong each one's explanatory power is.
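The variance-explained and redundancy figures can be reproduced from the loadings. A sketch assuming standardized variables: the share of a set's total variance explained by its own k-th canonical variable is the mean of the squared loadings in column k, the running total over the first r columns gives the contribution asked for in step (6), and multiplying by r_k² gives the Stewart-Love redundancy index, i.e. the share explained by the other set's k-th canonical variable. The helper names and the loadings and correlations in the usage lines are hypothetical.

```python
import numpy as np

def variance_explained(load):
    """Share of a standardized variable set's total variance explained by each
    of its own canonical variables: mean of squared loadings per column."""
    return (np.asarray(load, dtype=float) ** 2).mean(axis=0)

def redundancy(load, r):
    """Redundancy index: share of this set's variance explained by the other
    set's k-th canonical variable (own variance explained times r_k squared)."""
    return variance_explained(load) * np.asarray(r, dtype=float) ** 2

# Hypothetical loadings of 3 variables on 2 canonical variables, and their canonical correlations
load = [[0.9, 0.1], [0.8, -0.3], [0.5, 0.4]]
r = [0.8, 0.2]
ve = variance_explained(load)
print(ve, np.cumsum(ve))  # share per canonical variable and cumulative share (step 6)
print(redundancy(load, r))
```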

Exercises

  • We want to explore the relationship between viewers' and industry insiders' opinions of a set of TV programs. Use canonical correlation analysis to answer this question and write a short paper.
    • Audience ratings come from three surveys, of low-education viewers (led), high-education viewers (hed), and online (net), which form the first set of variables;
    • Industry insiders' ratings come from three groups, namely artists (arti) including actors and directors, distributors (com), and heads of various departments in the industry (man), which form the second set of variables.
  • Read the paper "Evaluation of Wine", a first-prize paper for Problem A of the 2012 mathematical modeling contest.
  • Other video explanations: SPSS canonical correlation analysis

postscript


Source: blog.csdn.net/SHIE_Ww/article/details/129204305