2023 "Shenzhen Cup" (Three Northeastern Provinces) Topic A: Step-by-Step Walkthrough of a Complete Paper

Topic A: Analysis of Factors Affecting the Health of Urban Residents

Chronic non-communicable diseases (hereinafter referred to as chronic diseases), represented by cardiovascular and cerebrovascular diseases, diabetes, malignant tumors and chronic obstructive pulmonary disease, have become a major issue affecting the health of Chinese residents. As people's lifestyles change, the prevalence of chronic diseases continues to rise. Health status is closely related to age, eating habits, physical activity, occupation and so on. How to promote good health through sensibly arranged meals, moderate physical exercise and a healthy lifestyle is a common concern of the whole society. Appendix A1 is the survey questionnaire on "Chronic Non-communicable Diseases and Their Related Influencing Factors Epidemiology" that a city's health research department administered to some residents. Appendix A2 contains the corresponding survey results. Appendix A3 lists the eight guidelines for a balanced diet proposed in the revised Dietary Guidelines for Chinese Residents.

Overall display diagram:

A look at my table of contents shows that the solution approach for this topic is clearly structured. It is not one of the many low-effort papers in circulation, whose tables of contents are not even logical.

The following is an overview of the thesis. As you can see, chapters 5-8 correspond to the four sub-questions of the topic.

Question 1 Refer to Appendix A3, analyze the rationality of residents' eating habits in Appendix A2, and explain the main problems.

For Question 1, we need to refer to Appendix A3 to analyze how reasonable the eating habits of the residents in Appendix A2 are, and explain the main problems. After studying the 8 guidelines in Appendix A3, we extract the indicators in Appendix A2 that reflect them and use the data to measure the rationality of residents' eating habits. We then run a descriptive analysis on each indicator and draw charts to illustrate the gap between residents' eating habits and the "Dietary Guidelines for Chinese Residents". As shown in the figure below, after classifying indicators D4~D37, we extract in turn the items covered by Appendix A3, construct the indicators, and then carry out the gap analysis against A3.
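The indicator construction and gap analysis described above can be sketched roughly as follows. This is a minimal illustration, not the code from the paper: the column names, the sample rows and the threshold values are all made up for the example, and the real thresholds should be taken from Appendix A3.

```python
import pandas as pd

# Hypothetical extract: one row per resident and food item.
# Column names and values are illustrative, not the real A2 layout.
records = pd.DataFrame({
    "resident_id": [1, 1, 2, 2, 3, 3],
    "food": ["vegetables", "fruit"] * 3,
    "times": [2, 1, 1, 3, 5, 1],                             # how often it is eaten
    "per": ["day", "day", "day", "week", "week", "month"],   # frequency unit
    "grams_each_time": [200, 150, 150, 100, 250, 200],
})

# Convert every frequency to "times per day" so all foods are comparable.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30}
records["grams_per_day"] = (
    records["times"] / records["per"].map(DAYS_PER_UNIT) * records["grams_each_time"]
)

# Placeholder lower bounds in g/day; replace with the values stated in A3.
GUIDELINE_MIN = {"vegetables": 300, "fruit": 200}

# One row per resident, one column per constructed indicator.
intake = records.pivot(index="resident_id", columns="food", values="grams_per_day")

# Gap analysis: which residents reach the guideline, and what share overall.
meets = pd.DataFrame(
    {food: intake[food] >= minimum for food, minimum in GUIDELINE_MIN.items()}
)
share_meeting = meets.mean()
print(intake.round(1))
print(share_meeting)
```

The same pattern extends to the other guideline items: build one numeric indicator per item, then compare it with the corresponding bound.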

The following is the distribution of the constructed indicators; normal distribution plots, violin plots and box plots can be combined for display and analysis.

The results of the analysis can then be summarized, pointing out the gaps between residents' eating habits and the guidelines, followed by suggestions.


Question 1 produces the result after the code is run:

Question 2: Analyze whether the living habits and eating habits of residents are related to factors such as age, gender, marital status, education level, and occupation.

Solution 1: Correlation analysis. First sort out the variables for the living-habit and eating-habit indicators, then run a correlation analysis against age, gender, marital status, education level, occupation and other factors one by one. Next, integrate the results by averaging the correlation coefficients to judge whether the habits as a whole are correlated with each factor, and check individually which variables show low or no correlation.
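As a rough sketch of Solution 1, the snippet below computes Spearman correlations between habit indicators and demographic factors and then averages their absolute values per factor. The data is simulated and every column name is hypothetical. It also assumes the factors are numerically coded; nominal factors such as occupation would need a different association measure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Hypothetical, numerically coded demographic factors.
age = rng.integers(18, 80, size=n)
education = rng.integers(1, 6, size=n)
factors = pd.DataFrame({"age": age, "education": education})

# Hypothetical habit indicators; vegetable intake is simulated to depend on age.
habits = pd.DataFrame({
    "veg_g_per_day": 100 + 3 * age + rng.normal(0, 30, size=n),
    "exercise_days": rng.integers(0, 8, size=n),
})

# Spearman correlation is the Pearson correlation of the ranks.
ranked = pd.concat([habits, factors], axis=1).rank()
corr = ranked.corr().loc[habits.columns, factors.columns]

# Integrate the results: mean absolute correlation per demographic factor.
mean_abs_corr = corr.abs().mean()
print(corr.round(2))
print(mean_abs_corr.round(2))
```

Averaging absolute values is a choice made here so that positive and negative correlations do not cancel out; a plain mean is the other reading of the description above.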

Solution 2: Logistic regression. First sort out the variables related to the living-habit and eating-habit indicators and use them as X, then use age, gender, marital status, education level, occupation and other demographic factors as Y in turn. Taking gender as Y for example, first check whether the overall significance test of the model is significant (for logistic regression this is typically a likelihood-ratio chi-square test rather than an F test). If it is, the indicators as a whole are related to gender; then inspect the standardized regression coefficient of each item to check its individual significance.
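A minimal sketch of Solution 2 on simulated data is given below. It fits a logistic regression on standardized indicators, then computes an overall likelihood-ratio test by hand against an intercept-only model. The variable meanings are hypothetical, and scikit-learn with a very weak penalty is used here only as a convenient stand-in for whatever statistics package the paper used.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500

# Hypothetical habit indicators (X).
X = np.column_stack([
    rng.normal(2.0, 1.0, n),    # e.g. drinks per week (drives Y in this simulation)
    rng.normal(300, 80, n),     # e.g. vegetables in g/day (pure noise here)
])
# Hypothetical binary Y (e.g. gender coded 0/1), simulated from the first column only.
true_logit = 1.5 * (X[:, 0] - 2.0)
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

# Standardize X so the fitted coefficients are comparable across indicators.
X_std = StandardScaler().fit_transform(X)
model = LogisticRegression(C=1e6, max_iter=1000).fit(X_std, y)  # large C: almost no penalty

def log_likelihood(y_true, p):
    """Bernoulli log-likelihood of the observed labels under probabilities p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Overall test: full model versus intercept-only model.
ll_full = log_likelihood(y, model.predict_proba(X_std)[:, 1])
ll_null = log_likelihood(y, np.full(n, y.mean()))
lr_stat = 2 * (ll_full - ll_null)
p_value = chi2.sf(lr_stat, df=X.shape[1])

std_coefs = model.coef_[0]
print(f"LR chi-square = {lr_stat:.1f}, p = {p_value:.3g}")
print("standardized coefficients:", std_coefs.round(2))
```

scikit-learn does not report standard errors, so the per-coefficient significance tests mentioned above would need a statistics-oriented package; only the overall test and the standardized coefficients are shown here.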

Solution 3: Machine learning + model interpretation (SHAP). As in Solution 2, first sort out the indicators, then fit a machine-learning classification or regression model and pass the fitted model to SHAP, so that the impact of each indicator on the demographic factor (Y) can be determined from a nonlinear perspective.

Here I use Solution 2 for Question 2, because Question 3 uses Solution 3, which gives you more options. The paper for Question 2, shown below, differs slightly from the description of Solution 2: I used PCA for dimensionality reduction, because the sorted indicators number more than 380. The reduction is not applied to all indicators at once. In my paper the indicators are divided into 4 parts, each part is reduced separately, and its interpretability is analyzed in turn.
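The block-wise PCA idea can be sketched as follows. The four block names, the block sizes and the 80% variance threshold are assumptions made for this illustration; the paper's actual grouping and cut-off are not stated in this post.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200

def make_block(n_cols, n_latent):
    """Simulate correlated indicators driven by a few latent factors."""
    latent = rng.normal(size=(n, n_latent))
    loadings = rng.normal(size=(n_latent, n_cols))
    return latent @ loadings + 0.3 * rng.normal(size=(n, n_cols))

# Four hypothetical indicator blocks standing in for the 4 parts.
blocks = {
    "staple_food": make_block(20, 2),
    "meat_and_dairy": make_block(20, 3),
    "vegetables_and_fruit": make_block(20, 2),
    "lifestyle": make_block(20, 3),
}

reduced = {}
for name, block in blocks.items():
    scaled = StandardScaler().fit_transform(block)
    # A float in (0, 1) keeps just enough components to pass that variance share.
    pca = PCA(n_components=0.80)
    reduced[name] = pca.fit_transform(scaled)
    explained = pca.explained_variance_ratio_.sum()
    print(f"{name}: {block.shape[1]} -> {pca.n_components_} components, "
          f"{explained:.0%} of variance explained")

# Concatenate the component scores to get the reduced X for the regression.
X_reduced = np.hstack(list(reduced.values()))
print("reduced shape:", X_reduced.shape)
```

Reducing each block separately keeps the components interpretable, since every component mixes only indicators from the same part of the questionnaire.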
Question 2 produces results after the code is run:

Question 3 Based on the data in Appendix A2, analyze in depth how, and how strongly, common chronic diseases (such as hypertension, diabetes, etc.) are related to smoking, drinking, eating habits, living habits, nature of work, exercise and other factors.

This question is the same as Question 2; the only difference is Y, which is now (0: no disease, 1: high blood pressure or diabetes). After sorting out the variables, I recommend using Solution 3 from Question 2 and following the same workflow as Question 2, which reduces the difficulty of Question 3. If you want to show off your skills, you can compare several machine-learning models. Here I use the xgboost + shap model.
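To show the idea behind this step without depending on the xgboost and shap packages, here is a small stand-in: scikit-learn's GradientBoostingClassifier replaces xgboost, and Shapley values are computed exactly by enumerating feature coalitions, which is only feasible for a handful of features. The data is simulated and the feature names are hypothetical, so this is a sketch of the technique rather than the paper's code.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 400
feature_names = ["smoking", "drinking", "salt_g_per_day"]  # hypothetical
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.normal(8, 2, n),
]).astype(float)
# Simulated label: 1 = hypertension or diabetes, driven by smoking and salt only.
risk = -1.0 + 1.5 * X[:, 0] + 0.6 * (X[:, 2] - 8)
y = (rng.random(n) < 1 / (1 + np.exp(-risk))).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
background = X[:50]  # reference sample used for features outside a coalition

def coalition_value(x, subset):
    """Mean predicted risk when only the features in `subset` are fixed to x."""
    data = background.copy()
    idx = list(subset)
    if idx:
        data[:, idx] = x[idx]
    return model.predict_proba(data)[:, 1].mean()

def shapley_values(x):
    """Exact Shapley values of one resident by enumerating all coalitions."""
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                gain = coalition_value(x, subset + (i,)) - coalition_value(x, subset)
                phi[i] += weight * gain
    return phi

# Local explanations for 20 residents, then a global ranking by mean |value|.
phis = np.array([shapley_values(row) for row in X[:20]])
global_importance = np.abs(phis).mean(axis=0)
for name, value in zip(feature_names, global_importance):
    print(f"{name}: {value:.3f}")
```

For each resident the values sum to the gap between that resident's predicted risk and the average prediction on the background sample, which is what makes the per-factor "degree of influence" readable.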
Question 3 produces results after the code is run:

Question 4 According to the specific conditions of the residents in Appendix A2, reasonably classify the residents, and put forward reasonable suggestions on healthy diet and exercise for each group.

The key to this question is how to classify. Judging from the question, many classifications are possible: by disease status (high blood pressure or diabetes), by demographic characteristics (juvenile, young, middle-aged, elderly), by obesity, by eating habits, and so on. So there are many ways to do this question, but they all follow the same pattern: after classification, put forward reasonable suggestions on healthy diet, exercise and other aspects for each group. The analysis steps are the same as before; the analysis from Question 1 can be reused directly, this time split by group.

Here I first classify residents by their demographic attributes using k-means cluster analysis, and use the elbow rule to determine that residents fall into three groups. Category 0 is mainly elderly women with an average education level, mostly married or widowed. Category 1 is mainly young and middle-aged residents, with a similar ratio of men to women, a high education level, and mostly unmarried or married. Category 2 is mainly middle-aged and elderly women with an average education level, mostly married. The cluster label is then used as a filter to split the data into three groups. The analysis of dietary habits and their rationality follows the same method as Chapter 5; after listing the problems of each group, suggestions for diet and exercise that benefit health are given in combination with the guidelines.
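The clustering step can be sketched like this, assuming k-means on standardized demographic features. The two features, the simulated groups and the way the elbow is picked (largest second difference of the inertia curve) are all choices made for this example; in practice the elbow is usually read off the plot.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)

# Simulated residents: columns are age and years of education (hypothetical).
centers = np.array([[50.0, 15.0], [33.0, 9.0], [67.0, 9.0]])
X = np.vstack([c + rng.normal(0, [3.0, 0.6], size=(100, 2)) for c in centers])
X_std = StandardScaler().fit_transform(X)

# Elbow rule: within-cluster sum of squares (inertia) for k = 1..6.
ks = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std).inertia_
    for k in ks
]

# Simple elbow heuristic: the k where the curve bends most sharply.
curvature = np.diff(inertias, n=2)          # entry j belongs to k = ks[j + 1]
best_k = ks[int(np.argmax(curvature)) + 1]

# Final clustering; the label is then used as a filter for each group.
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_std)
for label in range(best_k):
    members = X[labels == label]
    print(f"category {label}: n={len(members)}, "
          f"mean age={members[:, 0].mean():.1f}, "
          f"mean education years={members[:, 1].mean():.1f}")
```

Each filtered group can then be passed through the same indicator and gap analysis as in Question 1 to find its specific dietary problems.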


Question 4 produces results after the code is run:

The complete video and the finished paper are available on Bilibili:

2023 Three Northeastern Provinces Shenzhen Cup Topic A: Complete Paper and Step-by-Step Tutorial, Analysis on the Health of Urban Residents (Bilibili)

Origin blog.csdn.net/weixin_44099072/article/details/132018094