2023 Electrician Cup Question B: Full Analysis and Modeling Code for "The Impact of Artificial Intelligence on College Students' Learning"

Question B

Problem restatement

Artificial intelligence (AI) was first proposed at a 1956 workshop at Dartmouth College in the United States. It has developed substantially over the past few decades and has had a broad impact on many areas of society. A questionnaire was designed to understand how AI affects college students' learning in different respects: Appendix 1 contains the questionnaire, and Appendix 2 contains the survey responses. Based on these data, establish mathematical models that analyze the impact of AI on college students' learning, and solve the following problems:

Question one

Analyze the data in Appendix 2, convert it into numerical form, and describe the processing methods used.

The data in Appendix 2 has 4605 rows. Each row has 30 features (columns), including gender, major, grade, personality, and other information.
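Before cleaning, it is worth confirming that the loaded table has these dimensions and finding where values are missing. The sketch below runs on a small hypothetical DataFrame. The helper `summarize_survey` and the toy column names are illustrative and do not come from Appendix 2. For the real data, pass it the DataFrame returned by `pd.read_excel('附件2.xlsx')`.

```python
import pandas as pd


def summarize_survey(df: pd.DataFrame) -> dict:
    """Return basic size and data-quality figures for a survey table."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # Number of missing answers in each column, as plain Python ints
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
        # Number of rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical stand-in for Appendix 2: three respondents, three questions
toy = pd.DataFrame({
    "gender": ["女", "男", "女"],
    "major": ["理工类", None, "理工类"],
    "grade": ["大一", "大二", "大一"],
})

summary = summarize_survey(toy)
print(summary["rows"], summary["columns"])  # 3 3
print(summary["missing_per_column"])        # {'gender': 0, 'major': 1, 'grade': 0}
print(summary["duplicate_rows"])            # 1
```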

Data analysis:

  1. Data preprocessing and cleaning:
    Clean and preprocess the data as the situation requires. This includes handling missing values, removing duplicate rows, and converting data types. If some columns contain missing values, fill them with the fillna() function. If there are duplicate rows, remove them with drop_duplicates(). If a column needs a different data type, convert it with astype().
  2. Analysis of individual features:
    Choose the features you consider relevant, analyze them, and visualize the results. For example, compute the male-to-female ratio of the respondents, or count the respondents in each major and draw a pie chart.
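Here is a minimal sketch of the cleaning steps and feature counts described above, run on a hypothetical toy table. The column names `gender`, `major`, `weekly_hours` and the placeholder value `'未填写'` ("not filled in") are illustrative assumptions, not headers or values from Appendix 2. The sketch applies the three pandas functions named in step 1. It then computes the male-to-female ratio and the per-major counts that a pie chart would display.

```python
import pandas as pd

# Hypothetical toy data standing in for a few columns of Appendix 2
raw = pd.DataFrame({
    "gender": ["女", "男", "女", "女", "男"],
    "major": ["理工类", "经管类", None, "理工类", "经管类"],
    "weekly_hours": ["10", "20", "15", "10", "20"],
})

cleaned = (
    raw
    # Fill missing majors with an explicit placeholder category
    .fillna({"major": "未填写"})
    # Remove rows that exactly repeat an earlier row
    .drop_duplicates()
    # Convert hours from text to integers for later arithmetic
    .astype({"weekly_hours": "int64"})
    .reset_index(drop=True)
)

# Feature analysis: counts per category, used for the ratio and a pie chart
gender_counts = cleaned["gender"].value_counts()
male_to_female = gender_counts["男"] / gender_counts["女"]
major_counts = cleaned["major"].value_counts()

print(len(cleaned))      # 3
print(male_to_female)    # 0.5
print(major_counts.to_dict())
```

Filling before deduplicating means rows that differ only in a missing value are compared with the placeholder in place. Whether that is desirable depends on the data.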

Numerical processing:

  1. Categorical variables (such as gender, major, grade, and personality) can be converted into numerical variables with one-hot encoding. A two-valued variable such as gender can be reduced to a single binary column, for example "female" to 1 and "male" to 0. A variable with several possible values, such as major, gets one binary bit per value: the major "humanities and law" becomes [1, 0, 0, ...].
  2. Numeric variables (such as weekly time online) can keep their original values.
  3. Yes/no answers can be converted into binary variables: "yes" becomes 1 and "no" becomes 0.
  4. Multiple-choice answers can be encoded the same way as one-hot encoding, with one binary bit per option. For example, for the question "What is your most common way of going online?", the answer "Using a mobile phone" becomes [1, 0, 0, ...].
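The four encoding rules above can be sketched in pandas as follows. The toy columns and English answer labels are hypothetical. `map` produces the 0/1 coding for two-valued answers. `pd.get_dummies` produces one binary column per category, which is one-hot encoding. `dtype=int` is passed so the dummy columns hold 0/1 integers; recent pandas versions default to booleans.

```python
import pandas as pd

# Hypothetical toy answers; real data would come from Appendix 2
answers = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "major": ["humanities/law", "science/engineering", "economics/management"],
    "weekly_hours": [12, 30, 8],
    "used_study_tool": ["yes", "no", "yes"],
    "main_access": ["mobile phone", "laptop in dorm", "mobile phone"],
})

encoded = answers.copy()
# Rules 1 and 3: two-valued answers become a single 0/1 column
encoded["gender"] = encoded["gender"].map({"female": 1, "male": 0})
encoded["used_study_tool"] = encoded["used_study_tool"].map({"yes": 1, "no": 0})
# Rule 2: numeric answers (weekly_hours) are left unchanged
# Rules 1 and 4: multi-valued answers get one binary column per value
encoded = pd.get_dummies(encoded, columns=["major", "main_access"], dtype=int)

print(encoded.columns.tolist())
print(encoded.loc[0].tolist())  # [1, 12, 1, 0, 1, 0, 0, 1]
```

`get_dummies` sorts the categories, so each one-hot row has exactly one 1 among the `major_` columns and one among the `main_access_` columns.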

Sample code:

Python

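import pandas as pd
import matplotlib.pyplot as plt

# Render the Chinese category labels in the charts
# (assumes a CJK font such as SimHei is installed; change it to one available on your system)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Read the Appendix 2 data file (reading .xlsx requires openpyxl)
raw = pd.read_excel('附件2.xlsx')

# Data preprocessing and cleaning
# Assume we only need some of the features, so select the relevant columns
# Renaming the headers yourself (e.g. removing the question numbers) is strongly recommended; this example keeps them unchanged
selected_columns = ['1、您的性别(1-22题为单选题)', '2、您的专业', '3、您所在的年级', '4、您的性格', '5、您最常通过哪种方式上网?', '6、您每周的上网时长大约是多少?']
data = raw[selected_columns]

# Handle missing values
data = data.dropna()  # drop rows that contain missing values

# Numerical processing
# One-hot encode the categorical variables
# Each new column is named "<original column name>_<category>".
# For example, if '1、您的性别(1-22题为单选题)' has the two categories "女" (female) and "男" (male),
# one-hot encoding creates '1、您的性别(1-22题为单选题)_女' and '1、您的性别(1-22题为单选题)_男'.
categorical_columns = ['1、您的性别(1-22题为单选题)', '2、您的专业', '3、您所在的年级', '4、您的性格', '5、您最常通过哪种方式上网?']
for column in categorical_columns:
    encoded_columns = pd.get_dummies(data[column], prefix=column)
    data = pd.concat([data, encoded_columns], axis=1)
print(data.columns)
# Output:
# Index(['1、您的性别(1-22题为单选题)', '2、您的专业', '3、您所在的年级', '4、您的性格', '5、您最常通过哪种方式上网?',
#        '6、您每周的上网时长大约是多少?', '1、您的性别(1-22题为单选题)_女', '1、您的性别(1-22题为单选题)_男',
#        '2、您的专业_文法类', '2、您的专业_理工类', '2、您的专业_经管类', '2、您的专业_艺术教育类', '3、您所在的年级_大一',
#        '3、您所在的年级_大三', '3、您所在的年级_大二', '3、您所在的年级_大四', '4、您的性格_其他', '4、您的性格_坚定型',
#        '4、您的性格_外向型', '4、您的性格_安静型', '4、您的性格_感性型', '4、您的性格_温顺型',
#        '5、您最常通过哪种方式上网?_其他', '5、您最常通过哪种方式上网?_在寝室用笔记本上网', '5、您最常通过哪种方式上网?_在网吧上',
#        '5、您最常通过哪种方式上网?_用平板电脑上网', '5、您最常通过哪种方式上网?_用手机上网'],
#       dtype='object')

# Data analysis and visualization
# Example: count the respondents of each gender and draw a bar chart
gender_counts = [data['1、您的性别(1-22题为单选题)_女'].sum(), data['1、您的性别(1-22题为单选题)_男'].sum()]
gender_labels = ['Female', 'Male']

plt.bar(gender_labels, gender_counts)
plt.xlabel('Gender')
plt.ylabel('Number of respondents')
plt.title('Gender distribution')
plt.show()

# Example: count the respondents in each major and draw a pie chart
major_counts = data['2、您的专业'].value_counts()

plt.pie(major_counts, labels=major_counts.index, autopct='%1.1f%%')
plt.axis('equal')
plt.title('Distribution of majors')
plt.show()

# Example: count the users of study software tools and draw a bar chart
# This column is not in selected_columns; the header below comes from the original example
# and should be checked against the actual header in 附件2.xlsx
tool_column = '您是否使用过学习软件工具?'
if tool_column in raw.columns:
    tool_users = raw[tool_column].value_counts()
    plt.bar(tool_users.index.astype(str), tool_users.values)
    plt.xlabel('Tool usage')
    plt.ylabel('Number of respondents')
    plt.title('Use of study software tools')
    plt.show()
else:
    print(f'Column not found: {tool_column}')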
pandas and matplotlib are third-party Python libraries. Install them with pip install pandas matplotlib. Reading .xlsx files with pandas also requires openpyxl (pip install openpyxl).

Matlab

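% Read the data, keeping the original Chinese headers
% ('VariableNamingRule' requires MATLAB R2020b or later)
raw = readtable('附件2.xlsx', 'VariableNamingRule', 'preserve');

% Data preprocessing and cleaning
selected_columns = {'1、您的性别(1-22题为单选题)', '2、您的专业', '3、您所在的年级', '4、您的性格', '5、您最常通过哪种方式上网?', '6、您每周的上网时长大约是多少?'};
data = raw(:, selected_columns);

% Handle missing values
data = rmmissing(data);

% Numerical processing: one-hot encode each categorical column
categorical_columns = {'1、您的性别(1-22题为单选题)', '2、您的专业', '3、您所在的年级', '4、您的性格', '5、您最常通过哪种方式上网?'};
for i = 1:numel(categorical_columns)
    column = categorical_columns{i};
    cats = categorical(data.(column));  % text answers -> categorical array
    data.(column) = cats;               % keep the categorical version for countcats below
    encoded = dummyvar(cats);           % numeric matrix, one 0/1 column per category
    names = strcat(column, '_', categories(cats))';
    data = [data, array2table(encoded, 'VariableNames', names)];
end

% Data analysis and visualization
% Example: count the respondents of each gender and draw a bar chart
female_count = sum(data.('1、您的性别(1-22题为单选题)_女'));
male_count = sum(data.('1、您的性别(1-22题为单选题)_男'));
gender_counts = [female_count, male_count];

gender_labels = {'Female', 'Male'};

figure
bar(gender_counts)
xlabel('Gender')
ylabel('Number of respondents')
title('Gender distribution')
set(gca, 'XTickLabel', gender_labels)

% Example: count the respondents in each major and draw a pie chart
majors = data.('2、您的专业');
major_counts = countcats(majors);

figure
pie(major_counts, categories(majors))
title('Distribution of majors')

% Example: count the users of study software tools and draw a bar chart
% This column is not in selected_columns; the header comes from the original example
% and should be checked against 附件2.xlsx
tool_column = '您是否使用过学习软件工具?';
if ismember(tool_column, raw.Properties.VariableNames)
    tools = categorical(raw.(tool_column));
    tool_users = countcats(tools);

    figure
    bar(tool_users)
    set(gca, 'XTickLabel', categories(tools))
    xlabel('Tool usage')
    ylabel('Number of respondents')
    title('Use of study software tools')
else
    disp(['Column not found: ', tool_column])
end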

More details can be found here:

2023 Electrician Cup (Question B) In-depth Analysis | Complete Code of Mathematical Modeling + Full Analysis of the Modeling Process - Zhihu (zhihu.com)


Source: blog.csdn.net/qq_25834913/article/details/132497541