1. Regression Analysis
- The sample size n should be large; the larger n is, the better the fit and the predictions tend to be
- As a rule of thumb, n > 40 (or 45) is preferable
- Method
- Set up the regression model y_i = β0 + β1 x_1i + … + βk x_ki + ε_i
- Write down the estimation formula
- Substitute the data to obtain the regression coefficients [estimate β^ by least squares]
- [Individual] Test whether each regression coefficient β1, β2, …, βk is 0; a significant coefficient indicates that the corresponding independent variable x is significant
- If βi = 0, the regression equation is not affected by xi, so xi can be dropped and the equation simplified
- [Overall] Test the regression equation
- 0 ≤ R^2 ≤ 1 [the closer R^2 is to 1, the better the model fits; a small R^2 indicates a poor fit]
- Analysis of variance (ANOVA)
- Sig [p-value]: the smaller the better; Sig < 0.01 is highly significant
- Prediction
- Model assumption: in y_i = β0 + β1 x_1i + … + βk x_ki + ε_i, the errors ε_i are generally i.i.d. (independent and identically distributed), with ε_i ~ N(0, σ^2)
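The least-squares step and the R^2 check can be sketched as follows. This is a minimal illustration using NumPy only; the function name `fit_ols` and the synthetic data (n = 50, two predictors, normal errors) are assumptions made for the example, not part of the original notes.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns (beta_hat, r_squared); beta_hat[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat
    ss_res = float(residuals @ residuals)         # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation
    return beta_hat, 1.0 - ss_res / ss_tot

# Synthetic data (assumed): true coefficients (1, 2, -3), errors ~ N(0, 0.5^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

beta_hat, r2 = fit_ols(X, y)
print("estimated coefficients:", beta_hat)
print("R^2:", r2)
```

Significance tests for the individual coefficients and the ANOVA table are not computed here; a statistics package would normally report them alongside the fit.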
2. Logistic regression
- The dependent variable is an attribute (categorical) variable; at least one independent variable is continuous
- Model
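The notes do not write the model out. One common form, assumed here, is the binary logit model P(y = 1 | x) = 1 / (1 + exp(-(β0 + β1 x1 + … + βk xk))). The sketch below fits it by plain gradient ascent on the log-likelihood; the function names, learning rate, iteration count, and toy data are all illustrative assumptions.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit a binary logit model by gradient ascent; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # intercept column first
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))     # current P(y = 1 | x)
        beta += lr * design.T @ (y - p) / len(y)     # mean log-likelihood gradient
    return beta

def predict_proba(beta, X):
    """Predicted P(y = 1 | x) for new observations."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-design @ beta))

# Toy data (assumed): one continuous predictor, binary outcome, classes overlap.
x_obs = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y_obs = np.array([0, 0, 0, 1, 0, 1, 1, 1])

beta = fit_logistic(x_obs, y_obs)
print("coefficients:", beta)
print("P(y=1) at x=0.5 and x=4.0:", predict_proba(beta, [0.5, 4.0]))
```

The classes in the toy data deliberately overlap: with perfectly separable data the maximum-likelihood coefficients diverge and the iteration would not settle.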
3. Cluster Analysis
- Hierarchical (systematic) clustering method [suited to small samples]
- Classes are merged step by step so that the number of classes keeps decreasing; the selection criterion is not unique
- Cluster the samples
- Cluster the variables
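The merging procedure can be sketched as a small agglomerative routine. The `linkage` argument illustrates why the selection criterion is not unique: single, complete, and average linkage are three common choices. The function name and toy points are assumptions for illustration, and the double loop is only practical for small samples.

```python
import numpy as np

def agglomerative(points, n_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all observations.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    criterion = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(len(points))]   # start: one cluster per point
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = criterion(dist[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # merge b into a
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Toy data (assumed): two well-separated groups of three points each.
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = agglomerative(pts, n_clusters=2)
print(labels)
```

To cluster variables instead of samples, one simple option (again an assumption, not the notes' prescription) is to pass the transposed data matrix, so that each column becomes a point.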
4. Discriminant analysis
- The selection criterion is unique; this is supervised learning
5. Principal Component Analysis
- Purpose: dimensionality reduction, i.e. reducing the number of variables
- Take out a part of the principal components [example: y1, y2, y3]
- Regression on y with the extracted principal components
- Estimate the coefficients a1, a2, a3 of y1, y2, y3
- Interpret each principal component through the independent variables x whose coefficients in it are relatively large [factor analysis is more often used for this]
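The extraction and regression steps can be sketched with an SVD-based principal component routine. The function names, the choice to center the data without rescaling it, and the synthetic data are assumptions; working from the correlation matrix (standardized variables) is an equally common convention.

```python
import numpy as np

def principal_components(X, n_components):
    """Return (scores, loadings, explained_ratio) for the leading components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components].T               # coefficients of x in each component
    scores = Xc @ loadings                       # component values y1, y2, ...
    explained = s ** 2 / (s ** 2).sum()          # share of variance per component
    return scores, loadings, explained[:n_components]

def pc_regression(X, target, n_components):
    """Regress the target on the leading components; returns [a0, a1, ..., am]."""
    scores, _, _ = principal_components(X, n_components)
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return coef

# Synthetic data (assumed): 30 observations of 3 variables and a response.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
target = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.1, size=30)

scores, loadings, explained = principal_components(X, 2)
coef = pc_regression(X, target, 2)
print("loadings:\n", loadings)
print("explained variance ratio:", explained)
print("regression coefficients on components:", coef)
```

Large absolute entries in a column of `loadings` show which original variables dominate that component, which is the basis for interpreting it.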
6. Factor analysis
- Applicable when most elements of the correlation matrix exceed 0.3, i.e. the variables are strongly correlated
- Write out the factor model, then analyze
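The correlation-matrix check can be sketched as a quick screening function. The function name and toy data are assumptions, and the sketch compares absolute correlations against the threshold, which is a slightly broader reading of the ">0.3" rule than the notes state.

```python
import numpy as np

def share_of_large_correlations(X, threshold=0.3):
    """Fraction of off-diagonal correlations whose absolute value exceeds threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    off_diag = corr[np.triu_indices_from(corr, k=1)]   # each variable pair once
    return float(np.mean(np.abs(off_diag) > threshold))

# Toy data (assumed): four variables driven by one common factor plus noise,
# i.e. x_j = loading_j * f + error_j.
rng = np.random.default_rng(2)
f = rng.normal(size=200)
factor_loadings = np.array([0.9, 0.8, 0.7, 0.6])
X = f[:, None] * factor_loadings + rng.normal(scale=0.5, size=(200, 4))

print("share of |r| > 0.3:", share_of_large_correlations(X))
```

A share close to 1 suggests the variables have enough common variation for a factor model to be worth fitting; a share close to 0 suggests they do not.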
7. Correspondence Analysis
- Both the row variable and the column variable are treated as categorical variables