2023 Higher Education Society Cup Mathematical Modeling Ideas - Review: Analysis of Campus Consumption Behavior

0 Question Ideas

(Ideas will be shared on CSDN as soon as the competition questions are released.)

https://blog.csdn.net/dc_sinor?type=blog

1 Competition background

The campus card is an information integration system that integrates multiple functions such as identity authentication, financial consumption, and data sharing. While providing high-quality and efficient information services to teachers and students, the system itself has also accumulated a large number of historical records, which contain information on students' consumption behavior and the operating status of various departments such as the school cafeteria.

Many colleges and universities are building "smart campuses" on top of the campus card system. For example, the Yangtze Evening News reported on January 27, 2016, under the headline "Nanjing University of Science and Technology provides 'heart-warming meal card subsidies' to poor students":

There is no need to apply or go through a review, yet hundreds of yuan quietly appear on your meal card... The reporter learned exclusively from Nanjing University of Science and Technology yesterday that the university's Education Foundation has officially launched the "Heart-warming Meal Card" project, which provides "targeted assistance" for the food and clothing problems of extremely poor students.

The project specifically helps poor undergraduate students with their "food and clothing problems". At the school's all-in-one card center, staff from the Education Foundation retrieved the card swipe records of more than 16,000 undergraduates from mid-September to mid-November and ran a big data analysis on all of them. In the end, more than 500 "candidates for assistance" were selected.

The Education Foundation will use a "seed fund" of 1 million yuan as start-up capital, determine the specific subsidy amount according to each poor student's circumstances, and then "quietly" transfer the money to the student's meal card to ensure that students in difficulty have enough to eat.

Source: Yangtze Evening News, January 27, 2016, "Nanjing University of Science and Technology provides 'heart-warming meal card subsidies' to poor students".

This competition question provides one month of operating data from the campus card system of a domestic university. Contestants are expected to use data analysis and modeling methods to mine the information contained in the data, analyze students' learning and living behaviors on campus, and provide information support for improving school services and for decision-making by relevant departments.

2 Analysis goals

  • 1. Analyze students’ consumption behavior and the operating status of the canteen, and provide suggestions for canteen operation.

  • 2. Construct a student consumption segmentation model to provide reference opinions for schools to determine students’ economic status.

3 Data description

The attachment is the card data of a school from April 1 to April 30, 2019.

There are 3 files in total: data1.csv, data2.csv, data3.csv

4 Data preprocessing

Load the three files data1.csv, data2.csv, and data3.csv from the attachment into the analysis environment, and refer to Appendix 1 for the meaning of each field. Check the data quality and handle problems such as missing values and outliers where necessary. Save the results as "task1_1_X.csv" (if there are several tables, number X starting from 1), and describe the processing in the report.

import numpy as np
import pandas as pd
import os
os.chdir('/home/kesci/input/2019B1631')
data1 = pd.read_csv("data1.csv", encoding="gbk")
data2 = pd.read_csv("data2.csv", encoding="gbk")
data3 = pd.read_csv("data3.csv", encoding="gbk")
data1.head(3)

data1.columns = ['序号', '校园卡号', '性别', '专业名称', '门禁卡号']
data1.dtypes

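data1.to_csv('/home/kesci/work/output/2019B/task1_1_1.csv', index=False, encoding='gbk')
# data2 and data3 are saved the same way after cleaning, because the next step reads task1_1_2.csv and task1_1_3.csv
data2.to_csv('/home/kesci/work/output/2019B/task1_1_2.csv', index=False, encoding='gbk')
data3.to_csv('/home/kesci/work/output/2019B/task1_1_3.csv', index=False, encoding='gbk')
data2.head(3)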
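
The data-quality step itself is not shown above. Here is a minimal sketch of one way to do it, assuming pandas. The toy table, the helper names `quality_report` and `iqr_outlier_mask`, and the 1.5 * IQR rule are illustrative assumptions, not the method used in the original solution.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing-value count, and distinct-value count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=True),
    })


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule of thumb)."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy stand-in for data2.csv; the real file has more columns.
toy = pd.DataFrame({
    "校园卡号": [180001, 180001, 180002, 180003, 180003, 180004],
    "消费金额": [6.5, 7.0, 8.0, np.nan, 7.5, 500.0],
})

report = quality_report(toy)

# Drop rows with a missing amount, then drop amounts flagged as outliers.
cleaned = toy.dropna(subset=["消费金额"])
cleaned = cleaned[~iqr_outlier_mask(cleaned["消费金额"])]
print(report)
print(cleaned)
```

Whether a flagged value should be dropped, capped, or kept depends on the field. A large top-up, for instance, may be perfectly legitimate, so the flagged rows are worth inspecting before deleting anything.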

Associate the students' personal information in data1.csv with the consumption records in data2.csv and save the result as "task1_2_1.csv"; then associate the students' personal information in data1.csv with the access control records in data3.csv and save the result as "task1_2_2.csv".

data1 = pd.read_csv("/home/kesci/work/output/2019B/task1_1_1.csv", encoding="gbk")
data2 = pd.read_csv("/home/kesci/work/output/2019B/task1_1_2.csv", encoding="gbk")
data3 = pd.read_csv("/home/kesci/work/output/2019B/task1_1_3.csv", encoding="gbk")
data1.head(3)
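
The association step can be sketched with `pd.merge`. This is a minimal example on toy tables, under two assumptions: the consumption records join to students on 校园卡号, and the access control records join on 门禁卡号. The `access_time` column is a hypothetical placeholder, since the real column names of data3.csv are not shown here.

```python
import pandas as pd

# Toy stand-ins for the three cleaned tables.
students = pd.DataFrame({
    "校园卡号": [180001, 180002, 180003],
    "性别": ["男", "女", "男"],
    "专业名称": ["18会计", "18会计", "18机械制造"],
    "门禁卡号": [9001, 9002, 9003],
})
purchases = pd.DataFrame({
    "校园卡号": [180001, 180001, 180002, 189999],  # 189999 has no student record
    "消费金额": [6.5, 3.0, 8.0, 5.0],
})
access = pd.DataFrame({
    "门禁卡号": [9001, 9003, 9003],
    "access_time": ["2019-04-01 07:30", "2019-04-01 08:00", "2019-04-01 22:10"],  # hypothetical column
})

# An inner join keeps only records that match a known student.
# validate= raises an error if a key is duplicated on the student side.
task1_2_1 = pd.merge(students, purchases, on="校园卡号", how="inner", validate="one_to_many")
task1_2_2 = pd.merge(students, access, on="门禁卡号", how="inner", validate="one_to_many")

print(task1_2_1)
print(task1_2_2)
```

With `how="left"` instead, students without any record would be kept with missing values, which may be preferable if the later per capita figures should include students who never swiped.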

5 Data Analysis

5.1 Analysis of Dining Behavior in Canteens

Draw a pie chart of the share of diners in each canteen, analyze whether students choose significantly different places for breakfast, lunch, and dinner, and describe the findings in the report. (Tip: several card swipes within a very short time interval may belong to a single dining event.)

data = pd.read_csv('/home/kesci/work/output/2019B/task1_2_1.csv', encoding='gbk')
data.head()
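
The tip above suggests merging swipes that are close in time into one dining event before counting diners. Below is a minimal sketch, assuming a 30-minute gap threshold and assumed meal windows (before 10:00 breakfast, 10:00 to 15:00 lunch, later dinner). The helper name `mark_dining_events` and the toy data are illustrative only.

```python
import pandas as pd


def mark_dining_events(df: pd.DataFrame, gap_minutes: int = 30) -> pd.DataFrame:
    """Return a sorted copy with an 'is_new_meal' flag.

    A swipe starts a new dining event when it is the card's first swipe or
    when it comes more than gap_minutes after the card's previous swipe.
    """
    out = df.sort_values(["校园卡号", "消费时间"]).copy()
    gap = out.groupby("校园卡号")["消费时间"].diff()
    out["is_new_meal"] = gap.isna() | (gap > pd.Timedelta(minutes=gap_minutes))
    return out


# Toy swipe records.
swipes = pd.DataFrame({
    "校园卡号": [1, 1, 1, 2, 2, 2],
    "消费时间": pd.to_datetime([
        "2019-04-01 07:30", "2019-04-01 07:32", "2019-04-01 12:05",
        "2019-04-01 12:10", "2019-04-01 12:50", "2019-04-01 18:00",
    ]),
    "消费地点": ["第一食堂", "第一食堂", "第二食堂", "第二食堂", "第二食堂", "第三食堂"],
})

# Keep one row per dining event.
meals = mark_dining_events(swipes)
meals = meals[meals["is_new_meal"]].copy()

# Assign each dining event to a meal by hour of day (assumed windows).
meals["meal"] = pd.cut(
    meals["消费时间"].dt.hour,
    bins=[0, 10, 15, 24],
    right=False,
    labels=["breakfast", "lunch", "dinner"],
)

# Meal-by-canteen contingency table.
table = pd.crosstab(meals["meal"], meals["消费地点"])
print(table)
```

A table like this can be plotted as one pie chart per meal, or passed to a test of independence to judge whether the differences between meals are significant.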

import matplotlib as mpl
import matplotlib.pyplot as plt
# Embed figures in the notebook
%matplotlib inline
# Increase the figure resolution
%config InlineBackend.figure_format='retina'
from matplotlib.font_manager import FontProperties
# SimHei font so that the Chinese labels render correctly
font = FontProperties(fname="/home/kesci/work/SimHei.ttf")
import warnings
warnings.filterwarnings('ignore')
# Count the swipe records whose location mentions each canteen (第一食堂 = Canteen 1, and so on)
canteen1 = data['消费地点'].apply(str).str.contains('第一食堂').sum()
canteen2 = data['消费地点'].apply(str).str.contains('第二食堂').sum()
canteen3 = data['消费地点'].apply(str).str.contains('第三食堂').sum()
canteen4 = data['消费地点'].apply(str).str.contains('第四食堂').sum()
canteen5 = data['消费地点'].apply(str).str.contains('第五食堂').sum()
# Labels and values for the pie chart
canteen_name = ['食堂1', '食堂2', '食堂3', '食堂4', '食堂5']
man_count = [canteen1,canteen2,canteen3,canteen4,canteen5]
# Create the canvas
plt.figure(figsize=(10, 6), dpi=50)
# Draw the pie chart
plt.pie(man_count, labels=canteen_name, autopct='%1.2f%%', shadow=False, startangle=90, textprops={'fontproperties':font})
# Show the legend
plt.legend(prop=font)
# Add the title ("Share of dining visits by canteen")
plt.title("食堂就餐人次占比饼图", fontproperties=font)
# Keep the pie circular
plt.axis('equal')
# Show the figure
plt.show()

Using the canteen card swipe records, plot the dining time curves of the canteens for working days and non-working days separately, analyze the breakfast, lunch, and dinner peaks, and describe them in the report.

# Convert the 消费时间 (consumption time) column to datetime so that it supports date arithmetic; errors='coerce' turns unparseable values into NaT
data['消费时间'] = pd.to_datetime(data['消费时间'], format='%Y-%m-%d %H:%M', errors='coerce')
data.dtypes
# Add a 消费星期 (day of week) column derived from the consumption time, Monday=1, Sunday=7
data['消费星期'] = data['消费时间'].dt.dayofweek + 1
data.head(3)
# Treat Monday to Friday as working days and Saturday and Sunday as non-working days, and split the data into two groups
work_day_query = data.loc[:,'消费星期'] <= 5
unwork_day_query = data.loc[:,'消费星期'] > 5

work_day_data = data.loc[work_day_query,:]
unwork_day_data = data.loc[unwork_day_query,:]
# Count the working-day swipes that fall in each hour of the day
work_day_times = []
for i in range(24):
    work_day_times.append((work_day_data['消费时间'].dt.hour == i).sum())
# Use the hour as the x axis and the number of swipes in that hour as the y axis
x = []
for i in range(24):
    x.append('{:02d}:00'.format(i))
# Plot the curve
plt.plot(x, work_day_times, label='工作日')
# Axis labels (时间 = time, 次数 = count)
plt.xlabel('时间', fontproperties=font)
plt.ylabel('次数', fontproperties=font)
# Title ("Working-day consumption curve")
plt.title('工作日消费曲线图', fontproperties=font)
# Rotate the x tick labels by 60 degrees
plt.xticks(rotation=60)
# Show the legend
plt.legend(prop=font)
# Add a grid
plt.grid()

# Count the non-working-day swipes that fall in each hour of the day
unwork_day_times = []
for i in range(24):
    unwork_day_times.append((unwork_day_data['消费时间'].dt.hour == i).sum())
# Use the hour as the x axis and the number of swipes in that hour as the y axis
x = []
for i in range(24):
    x.append('{:02d}:00'.format(i))
plt.plot(x, unwork_day_times, label='非工作日')
plt.xlabel('时间', fontproperties=font)
plt.ylabel('次数', fontproperties=font)
# Title ("Non-working-day consumption curve")
plt.title('非工作日消费曲线图', fontproperties=font)
plt.xticks(rotation=60)
plt.legend(prop=font)
plt.grid()

Based on the results above, suggestions for canteen operation follow naturally, for example encouraging students to stagger their meals around the peak hours.
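
To back such suggestions with numbers, the peak hour of each meal can be read directly from the hourly counts. Here is a minimal sketch on toy timestamps. The meal windows in `MEAL_WINDOWS` and the helper names are assumptions for illustration, and weekends stand in for non-working days as in the code above.

```python
import pandas as pd

# Assumed meal windows, as hours of the day.
MEAL_WINDOWS = {
    "breakfast": range(6, 10),
    "lunch": range(10, 15),
    "dinner": range(16, 21),
}


def hourly_counts(times: pd.Series) -> pd.Series:
    """Count swipes per hour of day (0-23), keeping hours with no swipes as 0."""
    return times.dt.hour.value_counts().reindex(range(24), fill_value=0)


def peak_hours(counts: pd.Series) -> dict:
    """Return the busiest hour inside each assumed meal window."""
    return {meal: int(counts.loc[list(hours)].idxmax()) for meal, hours in MEAL_WINDOWS.items()}


# Toy swipe times: 2019-04-01 is a Monday, 2019-04-06 is a Saturday.
times = pd.Series(pd.to_datetime([
    "2019-04-01 07:10", "2019-04-01 07:40", "2019-04-01 11:50",
    "2019-04-01 12:10", "2019-04-01 12:20", "2019-04-01 18:05",
    "2019-04-06 08:30", "2019-04-06 09:10", "2019-04-06 09:40",
    "2019-04-06 12:30", "2019-04-06 17:20", "2019-04-06 17:45",
]))

is_workday = times.dt.dayofweek < 5
work_counts = hourly_counts(times[is_workday])
rest_counts = hourly_counts(times[~is_workday])

print(peak_hours(work_counts))
print(peak_hours(rest_counts))
```

Comparing the two dictionaries shows whether peaks shift on non-working days, for example a later breakfast, which is the kind of evidence a staffing or opening-hours suggestion can rest on.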

5.2 Analysis of Student Consumption Behavior

Based on students' overall campus consumption data, calculate the per capita card swipe frequency and the per capita consumption amount for the month, then select 3 majors and analyze the consumption characteristics of male and female students in each of them.

data = pd.read_csv('/home/kesci/work/output/2019B/task1_2_1.csv', encoding='gbk')
data.head()

# Per capita swipe frequency (total number of swipes / number of students)
cost_count = data['消费时间'].count()
student_count = data['校园卡号'].nunique()
average_cost_count = int(round(cost_count / student_count))
average_cost_count

# Per capita consumption amount (total amount spent / number of students)
cost_sum = data['消费金额'].sum()
average_cost_money = int(round(cost_sum / student_count))
average_cost_money

# Pick the 3 majors with the most swipe records for analysis
data['专业名称'].value_counts(dropna=False)

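# The 3 majors with the most swipe records are 连锁经营 (chain operations), 机械制造 (mechanical manufacturing), and 会计 (accounting)
major1 = data['专业名称'].apply(str).str.contains('18连锁经营')
major2 = data['专业名称'].apply(str).str.contains('18机械制造')
major3 = data['专业名称'].apply(str).str.contains('18会计')
# regex=False matches the parentheses literally; this apprenticeship class is excluded below
major4 = data['专业名称'].apply(str).str.contains('18机械制造(学徒)', regex=False)

data_new = data[(major1 | major2 | major3) & ~major4]
data_new['专业名称'].value_counts(dropna=False)

# Analyze the consumption characteristics of male and female students within each major
data_male = data_new[data_new['性别'] == '男']
data_female = data_new[data_new['性别'] == '女']
data_female.head()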
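
The comparison by major and gender stops at splitting the data. One way to continue is a two-level aggregation: first per student, then per major and gender. The following is a minimal sketch on a toy table. The output column names `swipes`, `total`, `students`, `swipes_per_student`, and `spend_per_student` are illustrative choices.

```python
import pandas as pd

# Toy stand-in for data_new (one row per swipe).
toy = pd.DataFrame({
    "校园卡号": [1, 1, 2, 3, 3, 3, 4],
    "性别": ["男", "男", "女", "男", "男", "男", "女"],
    "专业名称": ["18会计", "18会计", "18会计",
               "18机械制造", "18机械制造", "18机械制造", "18机械制造"],
    "消费金额": [6.0, 4.0, 8.0, 5.0, 5.0, 2.0, 9.0],
})

# Step 1: number of swipes and total spending per student.
per_student = toy.groupby(["专业名称", "性别", "校园卡号"], as_index=False).agg(
    swipes=("消费金额", "size"),
    total=("消费金额", "sum"),
)

# Step 2: per capita figures for each (major, gender) group.
summary = per_student.groupby(["专业名称", "性别"]).agg(
    students=("校园卡号", "nunique"),
    swipes_per_student=("swipes", "mean"),
    spend_per_student=("total", "mean"),
)
print(summary)
```

Aggregating per student first keeps heavy users from dominating the group averages, so each student carries equal weight in the per capita figures.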

Based on students' overall campus consumption behavior, select appropriate features, build a clustering model, and analyze the consumption characteristics of each resulting group of students.

data['专业名称'].value_counts(dropna=False).count()
# Features: gender, total amount spent, total number of swipes
data_1 = data[['校园卡号','性别']].drop_duplicates().reset_index(drop=True)
data_1['性别'] = data_1['性别'].astype(str).replace({'男': 1, '女': 0})
data_1.set_index(['校园卡号'], inplace=True)
data_2 = data.groupby('校园卡号')[['消费金额']].sum()
data_2.columns = ['总消费金额']
data_3 = data.groupby('校园卡号')[['消费时间']].count()
data_3.columns = ['总消费次数']
data_123 = pd.concat([data_1, data_2, data_3], axis=1)
data_123.head()

# Build the clustering model
from sklearn.cluster import KMeans
# k is the number of clusters, iteration is the maximum number of iterations, data_zs is the standardized data
k = 3    # adjust the number of clusters here
iteration = 500
data_zs = 1.0 * (data_123 - data_123.mean()) / data_123.std()
# KMeans has no n_jobs parameter since scikit-learn 1.0; n_init is set explicitly
model = KMeans(n_clusters=k, n_init=10, max_iter=iteration, random_state=1234)
model.fit(data_zs)
# r1 counts the members of each cluster, r2 holds the cluster centers
r1 = pd.Series(model.labels_).value_counts()
r2 = pd.DataFrame(model.cluster_centers_)
r = pd.concat([r2,r1], axis=1)
r.columns = list(data_123.columns) + ['类别数目']
r
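
The cluster centers in `r` are in standardized units, which makes them hard to describe in a report. Here is a minimal sketch of profiling each cluster in the original units instead. The `labels` series is a hand-written stand-in for the fitted model's labels, and the `students` and `avg_per_swipe` columns are illustrative additions.

```python
import pandas as pd

# Toy per-student features in original units (stand-in for data_123).
features = pd.DataFrame(
    {
        "总消费金额": [50.0, 60.0, 300.0, 320.0, 900.0, 950.0],
        "总消费次数": [10, 12, 60, 65, 120, 130],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name="校园卡号"),
)

# Stand-in for the cluster labels produced by a fitted model.
labels = pd.Series([0, 0, 1, 1, 2, 2], index=features.index, name="cluster")

# Mean of every feature within each cluster, in yuan and swipe counts.
profile = features.groupby(labels).mean()
profile["students"] = labels.value_counts().sort_index()
profile["avg_per_swipe"] = profile["总消费金额"] / profile["总消费次数"]
print(profile)
```

With the model above, the labels could be built as `pd.Series(model.labels_, index=data_zs.index)`, assuming `data_zs` has one row per student and no rows were dropped before fitting.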


# Select the swipe records of the 500 students with the lowest total spending
data_500 = data.groupby('校园卡号')[['消费金额']].sum()
data_500.sort_values(by=['消费金额'],ascending=True,inplace=True,na_position='first')
data_500 = data_500.head(500)
data_500_index = data_500.index.values
data_500 = data[data['校园卡号'].isin(data_500_index)]
data_500.head(10)
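
The pie chart that follows uses a variable named `data_max_place`, whose definition does not appear in this post. One plausible reconstruction, offered as an assumption only: for each low-spending student, find the place where they swipe most often, then count students per place. The helper name `most_frequent_place_counts` and the toy data are illustrative.

```python
import pandas as pd


def most_frequent_place_counts(df: pd.DataFrame) -> pd.Series:
    """Count students by the place where each of them swipes most often."""
    favorite = df.groupby("校园卡号")["消费地点"].agg(lambda s: s.value_counts().idxmax())
    return favorite.value_counts()


# Toy stand-in for data_500 (one row per swipe).
toy = pd.DataFrame({
    "校园卡号": [1, 1, 1, 2, 2, 3],
    "消费地点": ["第一食堂", "第一食堂", "第二食堂", "第二食堂", "第二食堂", "第一食堂"],
})

# Index = place name, values = number of students; the shape the pie chart code expects.
data_max_place_sketch = most_frequent_place_counts(toy)
print(data_max_place_sketch)
```

A simpler alternative would be `data_500['消费地点'].value_counts()`, which counts swipes rather than students. Which of the two the original used cannot be determined from the post.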

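# Draw the pie chart; data_max_place must already map each place name to its count for the low-spending students
canteen_name = list(data_max_place.index)
man_count = list(data_max_place.values)
# Create the canvas
plt.figure(figsize=(10, 6), dpi=50)
# Draw the pie chart
plt.pie(man_count, labels=canteen_name, autopct='%1.2f%%', shadow=False, startangle=90, textprops={'fontproperties':font})
# Show the legend
plt.legend(prop=font)
# Add the title ("Share of most-visited places among low-spending students")
plt.title("低消费学生常消费地点占比饼图", fontproperties=font)
# Keep the pie circular
plt.axis('equal')
# Show the figure
plt.show()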

Origin blog.csdn.net/math_assistant/article/details/132480447