Binary logit regression analysis

Case data

A classmate wants to study how college students manage their money and what their financial situation is. He designed a questionnaire that covers basic information such as gender, age and living expenses, as well as financial awareness and financial status. He now wants to use the collected data to analyze how the age, gender and living expenses of college students relate to whether they are willing to buy financial products. Part of the collected data is shown below:

Analyzing the problem

Analyzing the relationship between college students' age, gender and living expenses and their willingness to buy financial products amounts to assessing how age, gender and living expenses influence that willingness, which calls for regression analysis. Because willingness to buy financial products takes only two values, "yes" and "no", it is a binary categorical variable, so the appropriate model is binary logit regression. The regression therefore uses "willingness to buy financial products" as the dependent variable and "age", "gender" and "living expenses" as the independent variables. Before the analysis, the data must be preprocessed into the format the model requires.
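For reference, the binary logit model described above can be written as follows. Here p is the probability that a student is willing to buy (coded 1), and x1, x2, x3 are shorthand introduced only for this formula for the female dummy, age and living expenses:

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3)}},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Each OR is the factor by which the odds p/(1-p) change for a one-unit increase in the corresponding variable, with the other variables held fixed.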

Analysis preprocessing

In binary logit regression the dependent variable must be a binary categorical variable coded as 0 and 1. If the dependent variable is not coded this way, it can be recoded with [Data Coding] under [Data Processing] in SPSSAU.

Dependent variable

The dependent variable is recoded with SPSSAU's data coding function as follows:

In the encoded data, 0 means "no" and 1 means "yes".
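Outside SPSSAU, the same recoding is a simple lookup. Below is a minimal Python sketch, assuming the raw answers are stored as the strings "yes" and "no"; the sample answers and the encode_binary helper are hypothetical, and only the 0/1 scheme comes from the text.

```python
# Hypothetical raw answers to "Are you willing to buy financial products?"
raw_answers = ["yes", "no", "no", "yes", "yes"]

# Coding scheme from the text: 0 = "no", 1 = "yes"
CODING = {"no": 0, "yes": 1}


def encode_binary(answers, coding=CODING):
    """Map each text answer to its 0/1 code; an unknown answer raises KeyError."""
    return [coding[a.strip().lower()] for a in answers]


willing_to_buy = encode_binary(raw_answers)
print(willing_to_buy)  # [1, 0, 0, 1, 1]
```

Raising an error on unexpected answers, rather than silently mapping them to 0, keeps typos in the raw data from leaking into the dependent variable.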

Independent variable

Binary logit regression places no requirement on the data type of the independent variables. If an independent variable is categorical, it should in principle be converted into dummy variables, although this is not strictly necessary and depends on the analysis. In this example, "gender" is a binary categorical variable, so it is converted into a dummy variable with "male" as the reference item. The dummy variable is created with the "dummy variable" option of SPSSAU's [Generate Variable] function, as follows:
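Dummy coding itself is mechanical: each non-reference category gets a 0/1 indicator column, and the reference category is the case where all indicators are 0. A minimal Python sketch, assuming gender is stored as the strings "male" and "female"; the dummy_code helper and the sample values are hypothetical, not SPSSAU's implementation.

```python
def dummy_code(values, reference, prefix):
    """Return one 0/1 indicator column per non-reference category.

    The reference category gets no column of its own: it is the case
    where every indicator is 0.
    """
    if reference not in values:
        raise ValueError(f"reference category {reference!r} not found")
    categories = sorted(set(values) - {reference})
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical gender column; "male" is the reference item, as in the text
gender = ["male", "female", "female", "male", "female"]
dummies = dummy_code(gender, reference="male", prefix="gender")
print(dummies)  # {'gender_female': [0, 1, 1, 0, 1]}
```

With a two-category variable this yields a single column, gender_female, which corresponds to the "gender female" term in the regression.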

Univariate analysis

Once the dependent and independent variables are prepared, a univariate (single-factor) analysis can be performed. This step is optional, but it helps explore the relationship between each independent variable and the dependent variable. If the univariate analysis finds no effect but the subsequent regression does, the data should be checked for other problems. Because the independent variables include both a categorical variable (gender) and quantitative variables (age and living expenses), different methods are needed: with a binary dependent variable, the chi-square test is applied to gender and the independent-samples t-test to age and living expenses.

Chi-square test

A chi-square test is used to study the relationship between "willing to buy" financial products and "gender". The results are as follows:

The table above shows the chi-square test results. Among female respondents, the larger share, about 67%, are willing to buy financial products, whereas among male respondents the larger share, about 65%, are unwilling. The chi-square value is 52.594 and the p-value is far below 0.05, so the test is significant: gender has a significant effect on willingness to buy financial products. The other two variables are examined next with t-tests.
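The chi-square statistic compares the observed counts with the counts expected if gender and willingness were independent (row total x column total / n). The sketch below uses hypothetical counts, not the survey data, and computes the plain Pearson statistic without a continuity correction; the article does not say whether SPSSAU applies one. The p_value_df1 helper is only valid for a 2x2 table, where df = 1.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With df = 1 the statistic is a squared standard normal variable,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts (NOT the survey data):
# rows = male, female; columns = unwilling (0), willing (1)
counts = [[40, 20],
          [15, 45]]
chi2, df = pearson_chi_square(counts)
print(round(chi2, 3), df, p_value_df1(chi2) < 0.05)  # 20.979 1 True
```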

t-test

Because the dependent variable is binary, it splits the sample into two groups, so the independent-samples t-test is used to study the relationship between "willing to buy" financial products and "age" and "living expenses".
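The t statistic divides the difference between the two group means by its standard error. Below is a minimal Python sketch of the pooled-variance (Student) version; that choice is an assumption, since the article does not say whether SPSSAU reports the pooled or the Welch (unequal-variance) form. The ages are hypothetical, the p-value step (which needs the t distribution's CDF) is left out, and group 0 (unwilling) is passed first, so a lower mean in that group gives a negative t, matching the sign of the t values reported in this section.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance (Student's t-test).

    Returns (t, degrees_of_freedom). A negative t means group_a has the lower mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, n_a + n_b - 2


# Hypothetical ages (NOT the survey data), split by the 0/1 dependent variable
ages_unwilling = [20, 21, 21, 22, 20, 21]
ages_willing = [23, 22, 24, 23, 23, 24]
t, df = student_t(ages_unwilling, ages_willing)
print(round(t, 3), df)
```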

"Willing to Buy" & "Age"

The table above shows that the average age of respondents unwilling to buy financial products is 21, while that of respondents willing to buy is 23. Since the surveyed students are only 19 to 25 years old, a gap between 21 and 23 is large. The t-test confirms this: t = -15.848 and the p-value is far below 0.05, so age has a significant effect on willingness to buy financial products.

"Willing to Buy" & "Living Expenses"

The table above shows that the average living expenses of respondents unwilling to buy financial products are about 1312 yuan, compared with about 2026 yuan for those willing to buy. The t-test gives t = -38.377 with a p-value far below 0.05, so living expenses have a significant effect on willingness to buy financial products.

Analysis of binary logit results

Model validity check

First, check the likelihood ratio test of the model. Its p-value is below 0.05, so the model is statistically significant as a whole, that is, at least one independent variable has predictive value.
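The likelihood ratio test compares the fitted model with an intercept-only model: G = 2(LL_model - LL_null), referred to a chi-square distribution whose degrees of freedom equal the number of slope coefficients. A minimal pure-Python sketch follows; the log-likelihood values are hypothetical, and chi2_sf and likelihood_ratio_test are helpers written here (closed-form chi-square tail for integer degrees of freedom), not SPSSAU functions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = math.exp(-x / 2)
        total = term
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    # Odd df: the df = 1 tail plus correction terms
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def likelihood_ratio_test(ll_null, ll_model, n_predictors):
    """Whole-model LR test: G = 2 * (LL_model - LL_null) ~ chi-square(n_predictors)."""
    g = 2 * (ll_model - ll_null)
    return g, chi2_sf(g, n_predictors)


# Hypothetical log-likelihoods (NOT the article's output) for a model with 3 predictors
g, p = likelihood_ratio_test(ll_null=-67.3, ll_model=-50.0, n_predictors=3)
print(round(g, 2), p < 0.05)  # 34.6 True
```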

Regression Analysis Results

Based on the analysis above, a binary logit regression is performed with "whether to buy" as the dependent variable and "gender female", "living expenses" and "age" as the independent variables, using the [stepwise method]. The results are as follows:

The table above shows that the p-values of "gender female", "age" and "living expenses" are all below 0.05, so all three have a significant effect on the dependent variable "willing to buy" financial products, and their regression coefficients are all greater than 0, indicating positive effects. The OR (odds ratio) of "gender female" is 4.118: holding age and living expenses constant, the odds that a female student is willing to buy financial products are 4.118 times those of a male student. The ORs of the other variables are read the same way, per one-unit increase in age or living expenses. SPSSAU also provides the model formula, model predictions, the Hosmer-Lemeshow goodness-of-fit test and more; since this example focuses on whether the variables have an effect, these are not covered here.
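To show what the estimation step does, here is a minimal pure-Python sketch of binary logit fitting by maximum likelihood (Newton-Raphson). It is an illustration under stated assumptions, not SPSSAU's algorithm: it fits all supplied predictors at once rather than stepwise, reports no standard errors or p-values, and uses hypothetical data with the female dummy as the only predictor. The helpers solve and fit_logit are hypothetical names.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood binary logit fit by Newton-Raphson.

    X: rows of predictor values (an intercept column is added here).
    y: 0/1 outcomes. Returns (coefficients with intercept first, log-likelihood).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                       # X'(y - p)
        info = [[0.0] * k for _ in range(k)]   # X'WX with W = p(1 - p)
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for xi, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return beta, ll


# Hypothetical data (NOT the survey data): gender_female is the only predictor.
# Males: 40 unwilling, 20 willing; females: 15 unwilling, 45 willing.
X = [[0]] * 60 + [[1]] * 60
y = [0] * 40 + [1] * 20 + [0] * 15 + [1] * 45
beta, ll = fit_logit(X, y)
odds_ratio = math.exp(beta[1])
print([round(b, 4) for b in beta], round(odds_ratio, 3))  # [-0.6931, 1.7918] 6.0

# Predicted probabilities: 1/3 for males, 3/4 for females
p_male = 1 / (1 + math.exp(-beta[0]))
p_female = 1 / (1 + math.exp(-(beta[0] + beta[1])))
print(round(p_female / p_male, 3))  # 2.25: the probability ratio is not the OR
```

With a single 0/1 predictor the fitted slope equals the log of the 2x2 table's odds ratio, (45/15)/(20/40) = 6, which makes the result easy to check by hand. The second print shows why an OR must be read in terms of odds: the female/male ratio of predicted probabilities is 2.25, not 6. For the article's model, each row of X would hold the female dummy, age and living expenses.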

Summary

Binary logit regression was used to analyze how the age, gender and living expenses of college students relate to their willingness to buy financial products. Because binary logit regression requires a dependent variable coded 0/1, the dependent variable was recoded before the analysis, and the categorical independent variable (gender) was converted into a dummy variable. Before the formal analysis, a univariate analysis explored the relationship between each independent variable and the dependent variable and found all of them significant. The binary logit regression then showed that the model is valid and that "gender female", "age" and "living expenses" all have positive effects on the dependent variable; in this example, the odds that a female student is willing to buy financial products are 4.118 times those of a male student. This completes the analysis.


Origin blog.csdn.net/m0_37228052/article/details/129821664