【Mathematical Modeling】Statistical Analysis Methods

1. Regression Analysis

  • The sample size n needs to be large; the larger n is, the better the fit and the better the predictions
    • Generally n > 40 or 45 is preferable
  • Method
    • Establish the regression model yi = β0 + β1x1i + … + βkxki + εi
    • Write down the estimation formula
    • Substitute the data to find the regression coefficients [obtain β^ by least squares estimation]
    • [Individual] Test whether each regression coefficient β1, β2, …, βk is 0; a significant coefficient indicates that the corresponding independent variable x is significant
      • If βi = 0, y is not affected by xi, so xi can be dropped and the regression equation is simplified
    • [Overall] Test the regression equation
      • 0 ≤ R^2 ≤ 1 [the closer R^2 is to 1, the better the model; a small R^2 indicates a poor fit]
      • Analysis of variance (F test)
      • Sig [p-value]: the smaller the better; Sig < 0.01 indicates a highly significant result
    • Prediction

The εi are generally assumed to be iid (independent and identically distributed), with εi ~ N(0, σ^2)
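The regression steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the author's own code: the data are simulated, and the names (X, y, beta_hat, t_stats) are hypothetical. It computes the least squares estimate, R^2, the ANOVA F statistic and the t statistic of each coefficient; turning these statistics into Sig (p) values would additionally need the t and F distributions, which is left out here.

```python
import numpy as np

# Hypothetical toy data: n = 50 observations, k = 2 independent variables.
rng = np.random.default_rng(0)
n, k = 50, 2
X = rng.normal(size=(n, k))
eps = rng.normal(scale=0.5, size=n)             # iid errors, N(0, sigma^2)
y = 1.0 + 2.0 * X[:, 0] + 0.0 * X[:, 1] + eps   # true beta2 is 0 on purpose

# Design matrix with an intercept column, then the least squares estimate.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# [Overall] R^2 and the ANOVA F statistic.
resid = y - A @ beta_hat
sse = float(resid @ resid)                      # residual sum of squares
sst = float(((y - y.mean()) ** 2).sum())        # total sum of squares
r2 = 1.0 - sse / sst
f_stat = ((sst - sse) / k) / (sse / (n - k - 1))

# [Individual] t statistic of each coefficient (H0: beta_j = 0).
sigma2_hat = sse / (n - k - 1)
cov_beta = sigma2_hat * np.linalg.inv(A.T @ A)
t_stats = beta_hat / np.sqrt(np.diag(cov_beta))

# Prediction for a new observation x = (0.5, -1.0).
x_new = np.array([1.0, 0.5, -1.0])
y_pred = float(x_new @ beta_hat)

print("beta_hat:", beta_hat.round(3))
print("R^2:", round(r2, 3), "F:", round(f_stat, 1))
print("t statistics:", t_stats.round(2))
print("prediction:", round(y_pred, 3))
```

In this simulated setup the t statistic of the second variable should be small, which is the situation where that variable could be dropped to simplify the equation.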

2. Logistic Regression

  • The dependent variable is an attribute (categorical) variable; at least one of the independent variables is continuous
  • Model (standard binary form): ln(p / (1 − p)) = β0 + β1x1 + … + βkxk, where p = P(y = 1 | x)

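A minimal sketch of fitting a binary logistic regression, assuming the standard logit model with one continuous independent variable. The data are simulated, and the fitting method (plain gradient ascent on the log-likelihood with a fixed step size of 0.5) is chosen only for illustration; statistical packages normally use Newton-type iterations instead.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: a 0/1 dependent variable and one continuous x.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
p_true = sigmoid(-0.5 + 1.5 * x)
y = (rng.random(n) < p_true).astype(float)

# Maximize the mean log-likelihood by gradient ascent.
# Step size and iteration count are assumptions made for this sketch.
A = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(2000):
    p = sigmoid(A @ beta)
    beta += 0.5 * A.T @ (y - p) / n

# Fitted probabilities and in-sample classification accuracy.
p_hat = sigmoid(A @ beta)
accuracy = float(((p_hat > 0.5) == (y == 1)).mean())
print("beta:", beta.round(3), "accuracy:", round(accuracy, 3))
```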

3. Cluster Analysis

  • Hierarchical (systematic) clustering method [for small samples]
    Classes are merged step by step, so the number of classes keeps decreasing; the selection criterion is not unique
    • Cluster the samples
    • Cluster the variables
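The merging procedure can be sketched as below for a small sample. This is an illustrative agglomerative implementation with hypothetical data and names; the linkage argument reflects the point that the criterion is not unique (min gives single linkage, max gives complete linkage).

```python
import numpy as np

# Hypothetical small sample: 6 observations with 2 variables each.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

def agglomerate(X, n_clusters, linkage=min):
    """Merge the two closest clusters until n_clusters remain.

    linkage=min is single linkage, linkage=max is complete linkage.
    """
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between observations.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge b into a
        del clusters[b]
    return clusters

clusters = agglomerate(X, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

Passing the transposed data matrix would group the variables instead of the samples, although a correlation-based distance is usually preferred for that case.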

4. Discriminant Analysis

  • The selection criterion is unique; this is supervised learning
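As one possible illustration of a discriminant rule, the sketch below uses distance discrimination: a new observation is assigned to the class whose mean is nearest in Mahalanobis distance, with a pooled covariance matrix. The choice of this particular rule, the data and the names are assumptions; the original formulas were given only as images.

```python
import numpy as np

# Hypothetical labelled training sample (supervised learning): two classes.
X0 = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
X1 = np.array([[4.0, 4.5], [4.5, 4.0], [5.0, 5.0], [4.2, 5.2]])

means = [X0.mean(axis=0), X1.mean(axis=0)]

# Pooled within-class covariance matrix.
n0, n1 = len(X0), len(X1)
S = ((n0 - 1) * np.cov(X0, rowvar=False)
     + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
S_inv = np.linalg.inv(S)

def classify(x):
    """Return the class whose mean is closest in Mahalanobis distance."""
    d2 = [float((x - m) @ S_inv @ (x - m)) for m in means]
    return int(np.argmin(d2))

print(classify(np.array([1.4, 2.1])), classify(np.array([4.6, 4.7])))  # 0 1
```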

5. Principal Component Analysis

  • Purpose: dimensionality reduction, i.e. reducing the number of variables


  • Extract a subset of the principal components [example: y1, y2, y3]
  • Regress the dependent variable y on the extracted principal components
  • Estimate the coefficients a1, a2, a3 of y1, y2, y3
  • Interpret each principal component through the independent variables x whose coefficients in it are relatively large [factor analysis is more often used for this kind of interpretation]
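The steps above (extract components, regress y on them, read off the coefficients) can be sketched as follows. The data are simulated, and the 85% cumulative-variance cutoff used to choose the number of components is an assumption of this sketch, not something stated in the notes.

```python
import numpy as np

# Hypothetical data: 5 correlated independent variables and a dependent variable y.
rng = np.random.default_rng(2)
n = 60
latent = rng.normal(size=(n, 2))
W = np.array([[1.0, 0.8, 0.6, 0.0, 0.1],
              [0.0, 0.2, 0.5, 1.0, 0.9]])
X = latent @ W + 0.1 * rng.normal(size=(n, 5))
y = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.2 * rng.normal(size=n)

# Standardize x, then eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the first m components reaching 85% cumulative variance (assumed cutoff).
cum = np.cumsum(eigvals) / eigvals.sum()
m = int(np.searchsorted(cum, 0.85) + 1)
scores = Z @ eigvecs[:, :m]                     # principal components y1..ym

# Regress y on the components; a_hat[1:] are the coefficients a1..am.
A = np.column_stack([np.ones(n), scores])
a_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Coefficients of each x in each component, used for interpretation.
loadings = eigvecs[:, :m]
print("components kept:", m)
print("a_hat:", a_hat.round(3))
print("loadings:\n", loadings.round(2))
```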

6. Factor Analysis


  • Applicability: most elements of the correlation matrix should exceed 0.3, i.e. the variables are strongly correlated
  • Write out the factor model, then analyze it
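A small sketch of both points: checking what share of the correlations exceed 0.3, and then writing out a factor model. The data are simulated with one common factor, and the loadings are extracted with the principal component method, which is only one of several possible extraction methods and is an assumption here.

```python
import numpy as np

# Hypothetical data: 4 observed variables driven by one common factor.
rng = np.random.default_rng(3)
n = 100
f = rng.normal(size=(n, 1))
loadings_true = np.array([[0.9, 0.8, 0.7, 0.6]])
X = f @ loadings_true + 0.5 * rng.normal(size=(n, 4))

# Applicability check: share of off-diagonal correlations above 0.3.
R = np.corrcoef(X, rowvar=False)
off_diag = R[~np.eye(4, dtype=bool)]
share_large = float((np.abs(off_diag) > 0.3).mean())

# Factor model x = A f + e, with loadings A from the leading eigenpair of R.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
m = 1                                           # number of factors (assumed)
A = eigvecs[:, :m] * np.sqrt(eigvals[:m])       # loading matrix, shape (4, m)
communalities = (A ** 2).sum(axis=1)            # variance explained per variable

print("share of |r| > 0.3:", share_large)
print("loadings:", A.ravel().round(2))
print("communalities:", communalities.round(2))
```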

7. Correspondence Analysis

  • Both the row and column variables (horizontal and vertical axes) are treated as categorical variables

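A minimal sketch of correspondence analysis on a two-way contingency table, assuming the usual approach via the singular value decomposition of the standardized residuals. The table is hypothetical; its rows and columns stand for the categories of the two categorical variables.

```python
import numpy as np

# Hypothetical contingency table of counts (4 row categories, 3 column categories).
N = np.array([[20.0,  5.0,  5.0],
              [ 5.0, 20.0,  5.0],
              [ 5.0,  5.0, 20.0],
              [10.0, 10.0, 10.0]])

P = N / N.sum()            # correspondence matrix
r = P.sum(axis=1)          # row masses
c = P.sum(axis=0)          # column masses

# Standardized residuals from the independence model, then their SVD.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates of the row and column categories.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

# Total inertia times the grand total equals the Pearson chi-square statistic.
inertia = sv ** 2
chi2 = float(N.sum() * inertia.sum())

print("inertia per axis:", inertia.round(4))
print("row coordinates (first 2 axes):\n", row_coords[:, :2].round(3))
print("column coordinates (first 2 axes):\n", col_coords[:, :2].round(3))
```

Plotting the first two columns of row_coords and col_coords on the same axes gives the usual correspondence map; the fourth row, whose profile equals the average profile, lies at the origin.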


Origin blog.csdn.net/m0_73612212/article/details/131715302