2023 Shenzhen Cup Question A Detailed Analysis Version 1.1

Question A: Analysis of Factors Affecting the Health of Urban Residents

Attachment A1 is the questionnaire for the survey "Epidemiology of Chronic Non-communicable Diseases and Their Related Influencing Factors", which a city's health research department conducted among some residents. Attachment A2 contains the corresponding survey data. Attachment A3 lists the eight guidelines for a balanced diet proposed in the revised Dietary Guidelines for Chinese Residents.

picture

The data come from a questionnaire. The data as provided look clean, but I still think this is essentially a questionnaire-type problem, so I personally suggest including the steps that questionnaire competitions expect, such as reliability and validity tests of the questionnaire. For that reason I have collected excellent papers from the CP Cup (Chia Tai Cup), which holds a leading position among Chinese questionnaire competitions, and you can refer to them for similar questionnaire processing.
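As a rough sketch of what a reliability test can look like, the snippet below computes Cronbach's alpha, one common reliability coefficient, for a block of scale-type items. The scores are made up and the function name `cronbach_alpha` is my own; which items in Attachment A1 actually form a scale is something you would have to decide yourself.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the per-respondent total score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Made-up answers: 5 respondents x 3 scale items (not taken from Attachment A2).
scores = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
])
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items move together; the cutoff you accept is a judgment call that should be stated in the paper.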

picture

For data analysis, the data in Attachment A2 can be roughly grouped into several first-level indicators according to the questions in Attachment A1. The colors in the picture show which question each group relates to: blue relates to Question 3, and yellow and green relate to Question 2.

picture

Data preprocessing includes dimensionality reduction and the handling of outliers and missing values. One outlier is shown here: in the smoking section, the respondent with ID 12342, born in 1963, answered that they do not smoke, yet also reported smoking 7 cigarettes a day, 7 times a week. The record contradicts itself, so it must be incorrect data and should be treated as an outlier. I need to work through the data myself before I can share more detailed processing steps.
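A contradiction like this can be found with a simple consistency rule. The sketch below uses pandas on toy records; the column names and the two extra rows are hypothetical, and only the pattern for ID 12342 comes from the description above.

```python
import pandas as pd

# Toy records with hypothetical column names; the real columns in Attachment A2 differ.
df = pd.DataFrame({
    "id": [12341, 12342, 12343],
    "birth_year": [1970, 1963, 1985],
    "smokes": ["yes", "no", "no"],
    "cigarettes_per_day": [10, 7, 0],
    "smoking_times_per_week": [7, 7, 0],
})

# A record is contradictory if it answers "no" to smoking but reports a positive amount.
reports_amount = (df["cigarettes_per_day"] > 0) | (df["smoking_times_per_week"] > 0)
contradictory = (df["smokes"] == "no") & reports_amount

flagged_ids = df.loc[contradictory, "id"].tolist()
print(flagged_ids)

# One option: set flagged rows aside for review instead of silently dropping them.
clean = df.loc[~contradictory].copy()
```

Whether to drop a flagged record or to correct the "no" answer is a modeling decision; either way it is worth reporting how many records each rule removed.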

picture

Question 1. Referring to Appendix A3, analyze the rationality of residents' eating habits in Appendix A2, and explain the main problems.

Data Processing + Analysis

Question 1 asks how reasonable residents' eating habits are, which means analyzing the given data and adding the necessary written description. The question setter's intent is essentially for everyone to analyze and describe the data and reference material in a simple way, as a preliminary treatment. The approach is close to descriptive, verbal modeling: a single-factor descriptive analysis of the key indicators is enough.

I think the reasonableness of the diet can be judged mainly against items three, four, and five of the eight guidelines in Attachment A3.
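A single-factor descriptive analysis can be as small as the sketch below: summarize one indicator and report the share of residents who meet a guideline. The intake values, the column name, and the threshold `RECOMMENDED_MIN_G` are placeholders of mine; the real threshold should be read from Attachment A3.

```python
import pandas as pd

# Made-up daily vegetable intake in grams; the column name is hypothetical.
intake = pd.DataFrame({"vegetables_g_per_day": [150, 320, 400, 250, 90, 500]})

# Placeholder threshold: replace it with the amount stated in Attachment A3.
RECOMMENDED_MIN_G = 300

summary = intake["vegetables_g_per_day"].describe()   # count, mean, std, quartiles
meets = intake["vegetables_g_per_day"] >= RECOMMENDED_MIN_G
share_meeting = meets.mean()                           # fraction at or above the threshold

print(summary)
print(f"Share meeting the threshold: {share_meeting:.0%}")
```

Repeating this for each key indicator gives the numbers that the written description of the main problems can be built on.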

picture

Question 2.  Analyze whether the living habits and eating habits of residents are related to factors such as age, gender, marital status, education level, and occupation.

Dimensionality reduction processing + correlation analysis

Question 2 is essentially a correlation analysis between two groups of variables, living habits and eating habits, and the other given indicators. It can be treated as a multivariate analysis problem, from single variable to multiple variables; for methods, see Chapter 10, Multivariate Analysis, in Si Shoukui's textbook. Remember: do not analyze Question 2 too deeply. Questions 2 and 3 form one progressive process, so the analysis does not have to be completed within Question 2.
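Most of these indicators are categorical, so one simple way to measure association is a contingency table with the chi-square statistic, scaled to Cramér's V. This is only a sketch with made-up counts, and the choice of Cramér's V is my assumption, not something the problem prescribes.

```python
import numpy as np

def cramers_v(table: np.ndarray) -> float:
    """Cramér's V for an r x c table of counts (0 = no association, 1 = perfect)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

# Made-up counts: rows = gender, columns = breakfast habit (daily, sometimes, never).
counts = np.array([
    [40, 30, 30],
    [60, 25, 15],
])
v = cramers_v(counts)
print(f"Cramér's V = {v:.3f}")
```

Computing this for every pair of a habit indicator and a demographic factor gives a table of association strengths that is enough for Question 2 without going too deep.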

Question 3.  Based on the data in Appendix A2, deeply analyze the relationship and degree of common chronic diseases (such as hypertension, diabetes, etc.) with factors such as smoking, drinking, eating habits, living habits, nature of work, and exercise.

Question 3 can be understood as a deeper version of Question 2, although the targets are no longer the two variables of Question 2, living habits and eating habits. Instead, the question directly asks for the degree of correlation between common chronic diseases and the other indicators. In other words, we are expected to obtain an explicit function expression that describes the relationship between a chronic disease and the other indicators, with a result similar to y=k1x1+k2x2+k3x3+b.
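To make the form y=k1x1+k2x2+k3x3+b concrete, here is a minimal least-squares fit on synthetic data. The three indicators and the "true" coefficients are invented for the demonstration and say nothing about the real survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data only: three made-up indicators and a known linear rule plus noise.
X = rng.normal(size=(200, 3))
true_k = np.array([0.5, -0.3, 0.8])
true_b = 1.0
y = X @ true_k + true_b + rng.normal(scale=0.05, size=200)

# Append a column of ones so the intercept b is estimated together with k1..k3.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
k_hat, b_hat = coef[:3], coef[3]

print("k =", np.round(k_hat, 2), "b =", round(float(b_hat), 2))
```

If the disease indicator is a yes/no variable, a logistic model is a common alternative to this linear form; the size and sign of each k then play the same role of describing the degree of influence.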

For Question 3, it is definitely not advisable to analyze the correlation between common chronic diseases and the more than 200 other indicators directly. I therefore think that Questions 2 and 3 should both involve dimensionality reduction: reduce the groups with many indicators, such as eating habits, first, and then build a model similar to y=k1x1+k2x2+k3x3+b on the reduced variables.
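One standard option for the dimensionality reduction step is principal component analysis. The sketch below is my own illustration on synthetic data (six noisy columns driven by two hidden factors), not the author's worked solution, and the function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int):
    """Project standardized data onto its first principal components (via SVD)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each indicator
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)                    # variance share of each component
    scores = Z @ Vt[:n_components].T                   # component scores per respondent
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
# Synthetic stand-in for many diet indicators: 6 noisy columns driven by 2 hidden factors.
factors = rng.normal(size=(300, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
X = factors @ loadings + 0.1 * rng.normal(size=(300, 6))

scores, ratio = pca_scores(X, n_components=2)
print(scores.shape, np.round(ratio.sum(), 3))
```

The component scores can then serve as the x1, x2, ... of the final model, at the cost of having to interpret each component through its loadings.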

Question 4.  According to the specific conditions of the residents in Annex A2, reasonably classify the residents, and put forward reasonable suggestions on healthy diet and exercise for various groups of people.

For Question 4, we can choose a classification model to group the residents described in the problem in a reasonable way, for example something similar to Q-type and R-type cluster analysis. You can also refer to the Chia Tai (Zhengda) Cup papers: this question is similar to the task there of classifying customers and mining potential users.
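As a simple stand-in for clustering residents (the sample-wise, Q-type direction), the sketch below runs plain k-means with NumPy. Note that k-means is my substitution for illustration, not the hierarchical procedure usually meant by Q-type clustering, and the two features are synthetic.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain k-means: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # start from k random residents
    for _ in range(n_iter):
        # Assign each resident to the nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it in place if it has none.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(2)
# Synthetic residents described by two standardized scores (for example diet and exercise).
group_a = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2))
X = np.vstack([group_a, group_b])

labels, centers = kmeans(X, k=2)
print(np.bincount(labels))
```

Once each resident has a label, the cluster centers describe the typical profile of each group, which is what the diet and exercise suggestions can be written against.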


Note:

These ideas are current as of July 27 and may be revised in the future based on your comments. This is only my humble opinion, and I hope everyone understands.

1. Questionnaire

Questionnaire Reliability and Validity

Refer to the excellent papers of Chia Tai Cup (questionnaire category)

2. Data preprocessing

Outliers: for example, in the smoking section, the respondent with ID 12342, born in 1963, answered that they do not smoke, yet reported smoking 7 cigarettes a day, 7 times a week. This must be incorrect data and should be treated as an outlier.

3. The essence of the problem

Question 1. Data processing + data analysis

Question 2. Data dimensionality reduction + multivariate correlation analysis (covered in the 5th and 6th classes of the 10-class Award Guarantee Course)

Question 3. Data dimensionality reduction + deterministic function relationship

Question 4. Classification

4. Question 4 can follow the method for mining potential users in the CP Cup paper.


Origin blog.csdn.net/qq_33690821/article/details/131971301