[The 11th Teddy Cup Data Mining Challenge in 2023] Question A: Analysis of COVID-19 Epidemic Prevention and Control Data 32-page and 40-page papers and implementation code

Related Links

(1) Modeling scheme

[The 11th Teddy Cup Data Mining Challenge in 2023] Question A: Analysis and modeling scheme of COVID-19 epidemic prevention and control data and detailed explanation of python code

(2) Papers on relevant competition topics

[The 11th Teddy Cup Data Mining Challenge in 2023] Question A: Analysis of COVID-19 Epidemic Prevention and Control Data 32-page and 40-page papers and implementation code

[The 11th Teddy Cup Data Mining Challenge in 2023] Topic B: Data Analysis and Demand Forecasting of Product Orders 23-page paper and implementation code

[The 11th Teddy Cup Data Mining Challenge in 2023] Question C: Construction of a 27-page paper and implementation code for a two-way recommendation system for recruitment and job hunting on Teddy’s internal promotion platform

1 The problem

1. Background

Since the end of 2019, COVID-19 infections of varying severity have occurred across the country. Controlling the spread of the epidemic while maintaining normal social life and economic order is an important issue for epidemic prevention and control. Big data analysis provides efficient, convenient tools for precise prevention and control, and a reliable basis for personnel classification management, transmission route tracking, and epidemic trend assessment in particular. The epidemic data mainly includes personnel information.csv, site information.csv, personal self-examination and reporting information.csv, site code scanning information.csv, nucleic acid sampling and testing information.csv, and vaccination information.csv. This question provides data from the COVID-19 epidemic prevention system of a certain city. Please conduct a comprehensive analysis based on these data. The main tasks include data warehouse design, epidemic transmission route tracking, transmission index estimation, and epidemic trend assessment.

(1) Personnel Information Form: Attachment 2.csv

| No. | Field name | Description | Type | Default |
|---|---|---|---|---|
| 1 | user_id | Person ID: the unique identifier of the person | bigint(20) | |
| 2 | openid | WeChat OpenID | varchar(64) | null |
| 3 | gender | Gender: Male, Female | varchar(2) | null |
| 4 | nation | Nationality | varchar(20) | null |
| 5 | age | Age | int | null |
| 6 | birthdate | Date of birth | varchar(20) | null |
| 7 | create_time | Creation time | timestamp | null |

(2) Venue Information Form: Attachment 3.csv

| No. | Field name | Description | Type | Default |
|---|---|---|---|---|
| 1 | grid_point_id | Venue ID: the unique identifier of the venue | bigint(20) | |
| 2 | name | Venue name | varchar(255) | null |
| 3 | point_type | Venue type | varchar(50) | null |
| 4 | x_coordinate | X coordinate (unit: meter) | decimal(12,2) | null |
| 5 | y_coordinate | Y coordinate (unit: meter) | decimal(12,2) | null |
| 6 | create_time | Creation time | timestamp | null |

(3) Personal self-examination and reporting information form: Attachment 4.csv

| No. | Field name | Description | Type | Default |
|---|---|---|---|---|
| 1 | sno | Serial number: the unique identifier of the self-report record | bigint(20) | |
| 2 | user_id | Person ID: corresponds to user_id in the "Personnel Information Table" | bigint(20) | |
| 3 | x_coordinate | X coordinate of the reported location | decimal(12,2) | null |
| 4 | y_coordinate | Y coordinate of the reported location | decimal(12,2) | null |
| 5 | symptom | Symptoms: 1 fever, 2 fatigue, 3 dry cough, 4 nasal congestion, 5 runny nose, 6 diarrhea, 7 dyspnea, 8 asymptomatic | varchar(100) | null |
| 6 | nucleic_acid_result | Nucleic acid test result: 0 negative, 1 positive, 2 unknown (optional) | varchar(10) | null |
| 7 | resident_flag | Permanent resident or not: 0 unknown, 1 yes, 2 no | int | null |
| 8 | dump_time | Reporting time | timestamp | null |

(4) Site Code Scanning Information Form: Attachment 5.csv

| No. | Field name | Description | Type | Default |
|---|---|---|---|---|
| 1 | sno | Serial number: the unique identifier of the code scanning record | bigint(20) | |
| 2 | grid_point_id | Venue ID: corresponds to grid_point_id in the "Venue Information Table" | bigint(20) | |
| 3 | user_id | Person ID: corresponds to user_id in the "Personnel Information Table" | bigint(20) | |
| 4 | temperature | Body temperature | double | null |
| 5 | create_time | Code scanning record time | timestamp | null |

(5) Nucleic acid sampling and testing information form: Attachment 6.csv

| No. | Field name | Description | Type | Default |
|---|---|---|---|---|
| 1 | sno | Serial number: the unique identifier of the nucleic acid sampling record | bigint(20) | |
| 2 | user_id | Person ID: corresponds to user_id in the "Personnel Information Table" | bigint(20) | null |
| 3 | cysj | Sampling date and time | timestamp | null |
| 4 | jcsj | Testing date and time | timestamp | null |
| 5 | jg | Test result: negative, positive, unknown | varchar(50) | null |
| 6 | grid_point_id | Venue ID: corresponds to grid_point_id in the "Venue Information Table" | bigint(20) | |

(6) Vaccination Information Form: Attachment 7.csv

| No. | Field name | Description | Type | Default |
|---|---|---|---|---|
| 1 | sno | Serial number: the unique identifier of the vaccination record | bigint(20) | |
| 2 | inject_sn | Vaccination serial number | varchar(50) | |
| 3 | user_id | Person ID: corresponds to user_id in the "Personnel Information Table" | varchar(50) | |
| 4 | age | Age at vaccination | int | null |
| 5 | gender | Gender: 1 male, 2 female | varchar(10) | null |
| 6 | birthdate | Date of birth | varchar(50) | null |
| 7 | inject_date | Vaccination date | timestamp | null |
| 8 | inject_times | Dose number: 1 first dose, 2 second dose, 3 booster dose | varchar(30) | null |
| 9 | vaccine_type | Vaccine type: 1 inactivated vaccine, 2 recombinant protein vaccine, 3 viral vector vaccine, 4 nucleic acid vaccine, 5 attenuated vaccine | varchar(30) | null |
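The tables above reference each other through user_id and grid_point_id, so one plausible data governance step is a referential integrity check before loading the data warehouse. The following is a minimal sketch using only the Python standard library. The helper names `read_rows` and `orphan_rows` and the sample rows are hypothetical and are not taken from the competition data or from either paper.

```python
import csv
import io

# Tiny in-memory stand-ins for Attachment 2.csv and Attachment 5.csv.
# All rows are made up for illustration.
PERSONNEL_CSV = """user_id,gender,age
1001,Male,34
1002,Female,27
"""
SCAN_CSV = """sno,grid_point_id,user_id,temperature,create_time
1,501,1001,36.5,2022-10-01 08:00:00
2,501,1002,36.7,2022-10-01 08:20:00
3,502,9999,36.4,2022-10-01 09:00:00
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def orphan_rows(child_rows, parent_rows, key):
    """Return child rows whose `key` value has no match in the parent table."""
    known = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in known]


personnel = read_rows(PERSONNEL_CSV)
scans = read_rows(SCAN_CSV)
orphans = orphan_rows(scans, personnel, "user_id")
print([row["sno"] for row in orphans])
```

The same check could be repeated for grid_point_id against the venue table. Note that user_id is declared as varchar(50) in the vaccination table but bigint(20) elsewhere, so comparing the values as strings, as this sketch does, sidesteps a type mismatch.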

2. Problems

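  1. Based on the travel times and venues of people who tested positive in nucleic acid testing, trace their close contacts and save the results to "result1.csv". The file template is as follows:
     No. | Close contact ID | Close contact date | Close contact venue ID | Positive person ID
  2. Using the results of Problem 1, trace the corresponding secondary close contacts based on the close contacts' travel times and venues, and save the results to "result2.csv". The file template is as follows:
     No. | Secondary close contact ID | Secondary close contact date | Secondary close contact venue ID | Close contact ID
  3. Build a model to analyze the effect of vaccination on the virus transmission index.
  4. Based on the number of positive persons and their radiation range, analyze and determine the venues that require key control.
  5. To carry out epidemic prevention and control and personnel management more precisely, what other data do you think should be collected? Build a model based on these data and analyze its effect on precise prevention and control.

Note: When solving the problems above, you are required to build a data warehouse from the data tables provided and to implement data governance. State clearly in the paper what data governance work was done and how it was implemented.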
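Problems 1 and 2 can be sketched as a join of the code scanning records with themselves. The problem statement does not define "close contact", so this sketch assumes a contact is anyone who scanned in at the same venue within one hour of a source person. The function `trace_contacts`, the `WINDOW` constant, and the sample scans are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # assumed contact window; not given in the problem

# (user_id, grid_point_id, scan time); made-up rows for illustration
scans = [
    (1, 501, datetime(2022, 10, 1, 8, 0)),
    (2, 501, datetime(2022, 10, 1, 8, 30)),
    (3, 501, datetime(2022, 10, 1, 12, 0)),
    (2, 502, datetime(2022, 10, 1, 15, 0)),
    (4, 502, datetime(2022, 10, 1, 15, 20)),
]


def trace_contacts(scans, source_ids, exclude_ids=frozenset()):
    """Return sorted (contact_id, date, venue_id, source_id) tuples.

    A contact is a person outside source_ids and exclude_ids who scanned
    at the same venue within WINDOW of a source person's scan.
    """
    by_venue = defaultdict(list)
    for user_id, venue_id, ts in scans:
        by_venue[venue_id].append((user_id, ts))

    found = set()
    for venue_id, visits in by_venue.items():
        for src_id, src_ts in visits:
            if src_id not in source_ids:
                continue
            for other_id, other_ts in visits:
                if other_id in source_ids or other_id in exclude_ids:
                    continue
                if abs(other_ts - src_ts) <= WINDOW:
                    found.add(
                        (other_id, src_ts.date().isoformat(), venue_id, src_id)
                    )
    return sorted(found)


positives = {1}
close = trace_contacts(scans, positives)                 # rows for result1.csv
close_ids = {row[0] for row in close}
secondary = trace_contacts(scans, close_ids, exclude_ids=positives)  # result2.csv
print(close)
print(secondary)
```

This simplification treats every scan by a close contact as a possible exposure for others. A stricter version would only consider a close contact's scans that happen after that person's own exposure time.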

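2 Introduction to Paper 1

Analysis of COVID-19 Epidemic Prevention and Control Data: Big Data Analysis Based on Machine Learning Algorithms

Abstract

Since the outbreak of COVID-19, this highly infectious virus has spread and grown explosively around the world, with a severe impact on the global economy, on society, and on daily life. Measures to prevent, control, and treat the epidemic are therefore an extremely important topic worldwide.

Working from the partial data recorded since the outbreak that the problem provides, this paper uses Python to clean and process the data and to identify close contacts and secondary close contacts. The results are exported to result1.csv and result2.csv so that prevention and control can be more precise and effective. Next, based on infection status before and after vaccination, the paper uses the SEIR infectious disease model to solve for the virus transmission index, and applies the chi-square test together with the Pearson and Spearman rank correlation coefficients to test for correlation, in order to estimate the rate at which different groups become infected and transmit the virus in different settings. The results indicate that vaccination can slow the spread of the virus. Finally, based on the density of positive cases, the paper identifies public transport venues, residential communities, schools, and major entertainment venues as key control areas, so as to achieve better prevention and control. The analysis also finds that treating adults as a key group for prevention and control can slow the spread of the virus to some extent.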
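To make the SEIR step concrete, here is a minimal sketch of a standard SEIR model integrated with a forward Euler scheme. It is not the paper's code. All parameter values are illustrative, and modelling vaccination as a reduction of the transmission rate by `coverage * efficacy` is an assumption made only for this example. The function names are hypothetical.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, rec0, days, dt=0.1):
    """Integrate a basic SEIR model (no births or deaths) with Euler steps.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period. Returns final (S, E, I, R) and peak I.
    """
    n = s0 + e0 + i0 + rec0
    s, e, i, r = float(s0), float(e0), float(i0), float(rec0)
    peak = i
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / n * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        peak = max(peak, i)
    return (s, e, i, r), peak


def basic_reproduction_number(beta, gamma):
    """R0 for the basic SEIR model without vital dynamics."""
    return beta / gamma


# Illustrative parameters only; they are not estimated from the contest data.
BETA, SIGMA, GAMMA = 0.5, 0.2, 0.2
COVERAGE, EFFICACY = 0.6, 0.5            # assumed vaccination effect
beta_vax = BETA * (1 - COVERAGE * EFFICACY)

state_base, peak_base = simulate_seir(BETA, SIGMA, GAMMA, 9990, 0, 10, 0, days=200)
state_vax, peak_vax = simulate_seir(beta_vax, SIGMA, GAMMA, 9990, 0, 10, 0, days=200)

print(basic_reproduction_number(BETA, GAMMA), basic_reproduction_number(beta_vax, GAMMA))
print(round(peak_base), round(peak_vax))
```

Under these assumptions a lower transmission rate gives a lower reproduction number and a lower infection peak, which is the kind of comparison the abstract describes between vaccinated and unvaccinated settings.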

Based on this data survey and analysis, the authors hope to offer feasible measures that make epidemic prevention and control more precise and slow the spread of the virus. In addition, applying machine learning algorithms to practical problems can greatly reduce wasted human resources and solve those problems more efficiently.

**Keywords:** machine learning algorithm, SEIR infectious disease model, chi-square test, Pearson correlation coefficient, Spearman rank correlation coefficient
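As an illustration of the chi-square test named above, the following sketch tests whether infection status is independent of vaccination status in a 2x2 contingency table. The counts are invented, the function name `chi_square_2x2` is hypothetical, and the paper may have set up its test differently. For one degree of freedom the p-value can be computed with `math.erfc`, so no third-party library is needed.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows are vaccinated / unvaccinated,
# columns are positive / negative.
chi2, p_value = chi_square_2x2(20, 180, 40, 160)
print(chi2, p_value)
```

A small p-value here would suggest that infection status and vaccination status are not independent in the sample, which is the kind of evidence the abstract uses to support its conclusion about vaccination.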

3 Introduction to Paper 2

Trend Assessment for COVID-19 Epidemic Prevention and Control Based on Machine Learning

Abstract

Because COVID-19 infections of varying severity have emerged in one place after another, controlling the spread of the epidemic while maintaining normal social life and economic order is an important problem for prevention and control. Against this background, big data analysis provides an important reference for predicting the course and peak of the epidemic, improving governance efficiency, reducing casualties, and introducing response measures suited to China's national conditions. Big data has played an important role in personnel classification management, transmission route tracking, and epidemic trend assessment, providing a reliable basis for the decisions of health and epidemic prevention departments.

This paper proposes a solution for assessing the trend of COVID-19 prevention and control. It uses the K-Nearest Neighbor (KNN) machine learning algorithm to find people who had close contact with positive persons, and applies KNN again to trace secondary close contacts. A linear regression model is used to analyze the relationship between vaccination and the virus transmission index. Heat maps and the K-Means clustering algorithm are then used to find the number of positive persons and their radiation range, and from these to determine the venues that need key control. Finally, spatio-temporal analysis is applied to a directed graph of personnel flow, and cluster analysis divides locations into clusters. Combined with data governance, this provides a more accurate decision-making reference for epidemic prevention and control and personnel management.
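The K-Means step can be sketched as follows: cluster the coordinates where positive persons were recorded, then treat the cluster with the most points as a candidate area for key control. This is a plain-Python version of Lloyd's algorithm with a naive deterministic initialisation, not the paper's implementation. The coordinates, the choice k = 2, and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    labels = [nearest(p, centroids) for p in points]  # labels match final centroids
    return centroids, labels


# Made-up (x_coordinate, y_coordinate) pairs in meters for positive scans.
positive_points = [
    (0, 0), (100, 100), (1, 2), (2, 1), (101, 99), (99, 101), (100, 102),
]
centroids, labels = kmeans(positive_points, k=2)
hotspot = Counter(labels).most_common(1)[0][0]
print(centroids, labels, hotspot)
```

In practice the number of clusters would have to be chosen from the data, and a library implementation with better initialisation would be a more robust choice than this sketch.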

**Keywords:** K-Nearest Neighbors (KNN), linear regression, machine learning, K-Means clustering

4 How to obtain the papers

Open in a desktop browser:

(1) Paper 1

(2) Paper 2

Origin blog.csdn.net/weixin_43935696/article/details/130474426