Mathematical Modeling - Classification Models

This lecture introduces classification models. For binary classification, we cover two algorithms: logistic regression and Fisher linear discriminant analysis. For multi-class classification, we briefly walk through the SPSS steps for multi-class linear discriminant analysis and multinomial logistic regression.

This lecture works through a fruit-classification example.

 

 

 

Idea: apply logistic regression to the original problem

  1. Set a dummy variable y for the category.
  2. Run the regression; whichever dummy value the estimate ŷ is closer to, that is the predicted class.

 

E.g., code apple as 1 and orange as 0: if ŷ is close to 1 the fruit is classified as an apple, and if it is close to 0 it is an orange.

Data preprocessing to generate dummy variables

Independent variables: mass (weight), width (fruit width), height (fruit height), color_score (color score, 0-1)

Dependent variable: fruit_name (fruit name)

SPSS operation to generate the dummy variables: Transform -> Create Dummy Variables
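For readers working outside SPSS, here is a minimal sketch of the same dummy-variable step in Python with pandas. The file name fruits.csv is an assumption; the column names follow the variables listed above.

```python
import pandas as pd

df = pd.read_csv("fruits.csv")   # hypothetical file holding the fruit data

# One 0/1 column per fruit name, e.g. fruit_name_apple = 1 for apples, else 0
dummies = pd.get_dummies(df["fruit_name"], prefix="fruit_name")
df = pd.concat([df, dummies], axis=1)
print(df.head())
```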

3. Logistic regression: 

 

4. Build a model:

 It is not hard to see that the error term u is correlated with x, so the model suffers from endogeneity, which makes the estimates unreliable; the model therefore needs to be improved.

 One way around the endogeneity: model y directly as a two-point (Bernoulli) distribution.

How to choose the link function

The figure gives two candidate link functions: the standard normal CDF F(x) = Φ(x) (probit) and the logistic function F(x) = e^x / (1 + e^x) (logit). Both map x ∈ (-∞, +∞) into y ∈ (0, 1).

 

How to solve it?

 

Substitute the independent variables into the fitted formula to get ŷ and compare it with 0.5 (in this fruit example, ŷ above 0.5 means apple, otherwise orange).

Maximum likelihood estimation gives the coefficient estimates β̂, from which ŷ is computed for the final prediction.
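For reference, the function being maximized is the standard binary-logistic log-likelihood (with F the chosen link function):

$$\ln L(\beta) = \sum_{i=1}^{n}\Bigl[y_i \ln F(\mathbf{x}_i'\beta) + (1-y_i)\ln\bigl(1-F(\mathbf{x}_i'\beta)\bigr)\Bigr]$$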

How is it used for classification?

Here we choose the second equation, e^x / (1 + e^x).
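A minimal Python sketch of this link function, showing that it maps any real number into (0, 1) and how the 0.5 cut-off from the fruit example is applied:

```python
import numpy as np

def sigmoid(x):
    """Logistic link: e^x / (1 + e^x), maps (-inf, +inf) into (0, 1)."""
    return np.exp(x) / (1.0 + np.exp(x))

y_hat = sigmoid(np.array([-5.0, 0.0, 5.0]))
print(y_hat)            # approx [0.0067, 0.5, 0.9933]
print(y_hat > 0.5)      # classify as "apple" (1) when y_hat exceeds 0.5
```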

 SPSS solves binary logistic regression: 

 

 

Logistic regression coefficient table: 

 

What if the independent variables include a categorical variable?

 

What if the prediction accuracy is poor? 

 

 

Negative impact:

Adding too many squared (higher-order) terms of the independent variables makes the fitted curve match the sample data almost exactly, but the model then predicts new data poorly (overfitting).
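A hedged sketch of this effect, again assuming the fruit data sits in a CSV named fruits.csv with the columns listed earlier: as the polynomial degree grows, in-sample accuracy tends to rise while out-of-sample accuracy can fall.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("fruits.csv")                          # hypothetical file
X = df[["mass", "width", "height", "color_score"]]
y = (df["fruit_name"] == "apple").astype(int)           # dummy variable: 1 = apple

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree=degree)            # adds squared/interaction terms
    model = LogisticRegression(max_iter=5000)
    model.fit(poly.fit_transform(X_tr), y_tr)
    train_acc = model.score(poly.transform(X_tr), y_tr)  # in-sample accuracy
    test_acc = model.score(poly.transform(X_te), y_te)   # out-of-sample accuracy
    print(degree, train_acc, test_acc)
```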

 

How to choose an appropriate model? (One that not only fits the sample data but also predicts new data reliably.) 

 

Here we set aside three apples and three oranges as a test group and compare the predictions.
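The same idea can be automated with k-fold cross-validation instead of removing a few fruits by hand. This sketch swaps in scikit-learn's cross_val_score, again assuming the fruits.csv file and column names used above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("fruits.csv")                          # hypothetical file
X = df[["mass", "width", "height", "color_score"]]
y = (df["fruit_name"] == "apple").astype(int)

# 5-fold cross-validation: each fold is held out once for out-of-sample scoring
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(scores.mean())
```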

 

Fisher's Linear Discriminant Analysis 

 

The core problem: find the coefficient vector w 
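A minimal numpy sketch of the two-class Fisher solution, w = S_w^{-1}(m1 - m0), where S_w is the pooled within-class scatter matrix; the two synthetic class samples here are stand-ins for, say, oranges and apples.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(30, 4))   # class 0 samples (e.g. oranges)
X1 = rng.normal(1.0, 1.0, size=(30, 4))   # class 1 samples (e.g. apples)

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter: sum over both classes of (x - mean)(x - mean)^T
S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w = np.linalg.solve(S_w, m1 - m0)         # Fisher's discriminant direction
print(w)
```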

 

SPSS operation: 

 

 

 

Multi-class problem: 

 

Fisher discriminant for multi-class classification

1. Set the number of categories

 2. Summary table

 

3. Save: predicted group membership + group membership probabilities (see the sketch below)
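A brief scikit-learn sketch of those two saved quantities (predicted group membership and group membership probabilities); the iris data set is used here purely as a stand-in multi-class example.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict(X[:3]))         # predicted group membership
print(lda.predict_proba(X[:3]))   # group membership probabilities
```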

 

Fisher multi-classification discriminant results: 

 

Multi-class (multinomial) logistic regression:

 

SPSS operation:

Analyze -> Regression -> Multinomial Logistic

 

Statistics: select the classification option; decide for the remaining statistics whether they are needed.

 Save options: estimated response probabilities and predicted category.
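Outside SPSS, the same two saved quantities can be reproduced with multinomial logistic regression in scikit-learn (iris again as a stand-in data set):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# With the default lbfgs solver and more than two classes, the fit is multinomial
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))         # predicted category
print(clf.predict_proba(X[:3]))   # estimated response probabilities
```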

 

Result: 

 

After-class homework:

 

Answer:

To make the multi-class classification easier, we recode the category names as numbers, e.g. 1 for Iris versicolor, 2 for Iris setosa, and 3 for Iris virginica.

 

 The blogger chose multinomial logistic regression:

To guard against over-fitting the sample data and to check the prediction accuracy, we split the data into a training group and a test group, and then obtain the final classification results.
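A hedged sketch of that train/test split for the iris homework, using scikit-learn's bundled iris data (the 1/2/3 labelling above is the blogger's own coding; the built-in 0/1/2 labels are kept here):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # classification accuracy on the test group
```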

 

Prediction results:

 


Origin blog.csdn.net/weixin_73612682/article/details/132131155