2023 Huashu Cup Mathematical Modeling Ideas - Review: Analysis of Campus Consumer Behavior

0 Ideas for the competition

(Share on CSDN as soon as the competition questions come out)

https://blog.csdn.net/dc_sinor?type=blog

1 Background of the topic

The campus card is an information integration system that integrates multiple functions such as identity authentication, financial consumption, and data sharing. While providing high-quality and efficient information services for teachers and students, the system itself has accumulated a large number of historical records, which contain information such as the consumption behavior of students and the operation status of various departments such as school canteens.

Many colleges and universities build "smart campus" services on top of the campus card system. One example is the Yangtze Evening News report of January 27, 2016, "Nanjing University of Technology Gives Poor Students 'Heart-warming Meal Card' Subsidies":

No application, no review, and a few hundred yuan quietly added to the meal card... The reporter learned exclusively from Nanjing University of Technology yesterday that the Nanjing University of Technology Education Foundation has officially launched the "Heart-warming Meal Card" project, which provides "precision assistance" for the food and clothing problems of extremely poor students.

The project provides assistance specifically for the "food and clothing problem" of impoverished undergraduates. In the school card center, the staff of the Education Foundation found the card swiping records of more than 16,000 undergraduates in the school from mid-September to mid-November, and conducted big data analysis on all the records. In the end, more than 500 "quasi-aid objects" were selected.

The Education Foundation will put up 1 million yuan of "seed funding" as start-up capital, set the subsidy amount according to each poor student's situation, and then "quietly" load the money onto the student's meal card, so that students in difficulty have enough to eat.

Source: Yangtze Evening News, January 27, 2016, "Nanjing University of Technology provides 'heart-warming meal card subsidy' to poor students".

This question provides one month of operating data from the campus card system of a domestic university. Participants are expected to use data analysis and modeling to mine the information contained in the data, analyze students' study and living behavior on campus, and provide information support for improving school services and for decision-making by the relevant departments.

2 Analysis objectives

  • 1. Analyze the consumption behavior of students and the operation status of the canteen, and provide suggestions for the operation of the canteen.

  • 2. Construct a student consumption segmentation model to provide reference for schools to determine the economic status of students.

3 Data description

The attachment contains the campus card data of one school from April 1, 2019 to April 30, 2019.

There are 3 files in total: data1.csv, data2.csv, data3.csv

4 Data preprocessing

Load the three attachment files data1.csv, data2.csv, and data3.csv into the analysis environment, and refer to Appendix 1 for the meaning of the fields. Check data quality and handle problems such as missing values and outliers as needed. Save the result as "task1_1_X.csv" (if there are multiple tables, number X from 1 upward), and describe the procedure in the report.

import numpy as np
import pandas as pd
import os
os.chdir('/home/kesci/input/2019B1631')
data1 = pd.read_csv("data1.csv", encoding="gbk")
data2 = pd.read_csv("data2.csv", encoding="gbk")
data3 = pd.read_csv("data3.csv", encoding="gbk")
data1.head(3)

data1.columns = ['序号', '校园卡号', '性别', '专业名称', '门禁卡号']
data1.dtypes

data1.to_csv('/home/kesci/work/output/2019B/task1_1_1.csv', index=False, encoding='gbk')
data2.head(3)
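
The post only shows the column rename for data1; the cleaning of the consumption table is not included. A minimal sketch of such a data-quality pass is given below. The toy DataFrame and the rule "a valid purchase has a strictly positive amount" are assumptions for illustration, and the column names are the ones used later in this post, so the real data2.csv may need different handling.

```python
import pandas as pd

# Toy stand-in for the consumption table; the real data2.csv columns may differ.
records = pd.DataFrame({
    '校园卡号': [180001, 180001, 180002, 180003, 180003],
    '消费时间': ['2019-04-01 07:30', '2019-04-01 07:30', '2019-04-01 12:05',
               None, '2019-04-02 18:10'],
    '消费金额': [3.5, 3.5, 12.0, 8.0, -5.0],
})

def clean_consumption(df):
    """Drop exact duplicates, rows without a usable time, and non-positive amounts."""
    out = df.drop_duplicates().copy()
    # Unparseable or missing timestamps become NaT and are dropped below.
    out['消费时间'] = pd.to_datetime(out['消费时间'], errors='coerce')
    out = out.dropna(subset=['消费时间', '消费金额'])
    # Assumption: a valid purchase has a strictly positive amount.
    out = out[out['消费金额'] > 0]
    return out.reset_index(drop=True)

print(records.isnull().sum())   # missing values per column
cleaned = clean_consumption(records)
print(cleaned)
```

Whether negative or zero amounts are really errors (rather than refunds, for example) should be checked against Appendix 1 before dropping them.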

Associate the student personal information in data1.csv with the consumption records in data2.csv, and save the result as "task1_2_1.csv"; associate the student personal information in data1.csv with the access control records in data3.csv, and save the result as "task1_2_2.csv".

data1 = pd.read_csv("/home/kesci/work/output/2019B/task1_1_1.csv", encoding="gbk")
data2 = pd.read_csv("/home/kesci/work/output/2019B/task1_1_2.csv", encoding="gbk")
data3 = pd.read_csv("/home/kesci/work/output/2019B/task1_1_3.csv", encoding="gbk")
data1.head(3)
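
The post loads the three cleaned tables but does not show the association step itself. One way to do it is an inner join on the card numbers, sketched below on toy data. The join keys (校园卡号 for consumption, 门禁卡号 for access control) are assumed from the data1 column names, and the 进出时间 column is hypothetical.

```python
import pandas as pd

# Toy stand-ins; the real tables are the cleaned task1_1_*.csv files.
students = pd.DataFrame({
    '校园卡号': [180001, 180002, 180003],
    '性别': ['男', '女', '男'],
    '专业名称': ['18会计', '18会计', '18机械制造'],
    '门禁卡号': [9001, 9002, 9003],
})
consumption = pd.DataFrame({
    '校园卡号': [180001, 180001, 180002, 180999],   # 180999 has no student record
    '消费金额': [3.5, 12.0, 8.0, 5.0],
})
access = pd.DataFrame({
    '门禁卡号': [9001, 9003, 9003],
    # Hypothetical column name for the access timestamp.
    '进出时间': ['2019-04-01 07:00', '2019-04-01 07:05', '2019-04-01 22:00'],
})

# Inner joins keep only the records that match a known student.
consume_merged = pd.merge(students, consumption, on='校园卡号', how='inner')
access_merged = pd.merge(students, access, on='门禁卡号', how='inner')

print(consume_merged.shape, access_merged.shape)
# consume_merged.to_csv('task1_2_1.csv', index=False, encoding='gbk')
# access_merged.to_csv('task1_2_2.csv', index=False, encoding='gbk')
```

An inner join silently drops unmatched records; a left join from the record table would keep them with empty student fields if they need to be inspected.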

5 Data analysis

5.1 Analysis of Dining Behavior in Canteens

Draw a pie chart of each canteen's share of diners, analyze whether students choose significantly different places for breakfast, lunch and dinner, and describe the findings in the report. (Hint: several card-swipe records very close together in time may belong to a single dining event.)
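
The hint can be turned into code by collapsing consecutive swipes of the same card into one dining event whenever they are close in time. The sketch below is one possible reading: the 10-minute threshold, the helper name to_dining_events and the toy data are all assumptions, not part of the original solution.

```python
import pandas as pd

swipes = pd.DataFrame({
    '校园卡号': [1, 1, 1, 2, 2],
    '消费地点': ['第一食堂', '第一食堂', '第二食堂', '第一食堂', '第一食堂'],
    '消费时间': pd.to_datetime([
        '2019-04-01 07:30', '2019-04-01 07:33', '2019-04-01 12:00',
        '2019-04-01 07:31', '2019-04-01 18:02',
    ]),
    '消费金额': [2.0, 1.5, 10.0, 4.0, 9.0],
})

def to_dining_events(df, max_gap='10min'):
    """Merge swipes of one card that are at most max_gap apart into one event."""
    df = df.sort_values(['校园卡号', '消费时间'])
    gap = df.groupby('校园卡号')['消费时间'].diff()
    # A new event starts at a card's first swipe or after a long gap.
    new_event = gap.isna() | (gap > pd.Timedelta(max_gap))
    df = df.assign(event_id=new_event.cumsum())
    return (df.groupby('event_id')
              .agg({'校园卡号': 'first', '消费地点': 'first',
                    '消费时间': 'min', '消费金额': 'sum'})
              .reset_index(drop=True))

events = to_dining_events(swipes)
print(events)
```

Counting rows of events rather than raw records gives a per-meal count for the pie chart; the location of an event is taken from its first swipe, which is a simplification.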

data = pd.read_csv('/home/kesci/work/output/2019B/task1_2_1.csv', encoding='gbk')
data.head()

import matplotlib as mpl
import matplotlib.pyplot as plt
# Render figures inline in the notebook
%matplotlib inline
# Increase the figure resolution
%config InlineBackend.figure_format='retina'
# Font that can display the Chinese labels
from matplotlib.font_manager import FontProperties
font = FontProperties(fname="/home/kesci/work/SimHei.ttf")
import warnings
warnings.filterwarnings('ignore')
# Count the card-swipe records at each canteen
canteen1 = data['消费地点'].apply(str).str.contains('第一食堂').sum()
canteen2 = data['消费地点'].apply(str).str.contains('第二食堂').sum()
canteen3 = data['消费地点'].apply(str).str.contains('第三食堂').sum()
canteen4 = data['消费地点'].apply(str).str.contains('第四食堂').sum()
canteen5 = data['消费地点'].apply(str).str.contains('第五食堂').sum()
# Labels and values for the pie chart
canteen_name = ['食堂1', '食堂2', '食堂3', '食堂4', '食堂5']
man_count = [canteen1,canteen2,canteen3,canteen4,canteen5]
# Create the figure
plt.figure(figsize=(10, 6), dpi=50)
# Draw the pie chart
plt.pie(man_count, labels=canteen_name, autopct='%1.2f%%', shadow=False, startangle=90, textprops={'fontproperties':font})
# Show the legend
plt.legend(prop=font)
# Add the title
plt.title("食堂就餐人次占比饼图", fontproperties=font)
# Keep the pie circular
plt.axis('equal')
# Show the figure
plt.show()
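
The pie chart gives the overall shares but does not yet address whether the choice of canteen differs between breakfast, lunch and dinner. A rough sketch of that comparison is a meal-by-canteen contingency table with a chi-square statistic. The hour cut-offs for each meal, the meal_period helper and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

swipes = pd.DataFrame({
    '消费地点': ['第一食堂', '第一食堂', '第一食堂', '第二食堂', '第二食堂', '第二食堂'],
    '消费时间': pd.to_datetime([
        '2019-04-01 07:10', '2019-04-01 07:40', '2019-04-01 11:50',
        '2019-04-01 12:10', '2019-04-01 17:45', '2019-04-01 18:20',
    ]),
})

def meal_period(hour):
    """Map an hour of the day to a meal; the cut-offs are assumed, not given."""
    if 5 <= hour < 10:
        return 'breakfast'
    if 10 <= hour < 15:
        return 'lunch'
    if 16 <= hour < 21:
        return 'dinner'
    return 'other'

swipes['meal'] = swipes['消费时间'].dt.hour.map(meal_period)

# Counts and row-wise shares of each canteen per meal.
table = pd.crosstab(swipes['meal'], swipes['消费地点'])
share = pd.crosstab(swipes['meal'], swipes['消费地点'], normalize='index')

# Pearson chi-square statistic against the independence hypothesis.
observed = table.to_numpy(dtype=float)
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2 = ((observed - expected) ** 2 / expected).sum()

print(share)
print('chi-square statistic:', chi2)
```

A large statistic relative to the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom suggests the meal and the canteen are not independent; scipy.stats.chi2_contingency can supply the p-value if SciPy is available.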

Using the canteen card-swipe records, plot the dining-time curves for working days and non-working days separately, analyze the breakfast, lunch and dinner peaks, and describe them in the report.

# Convert 消费时间 to datetime so it supports date arithmetic; errors='coerce' turns unparseable values into NaT
data['消费时间'] = pd.to_datetime(data['消费时间'], format='%Y-%m-%d %H:%M', errors='coerce')
data.dtypes
# Add a 消费星期 (weekday) column derived from 消费时间: Monday=1, Sunday=7
data['消费星期'] = data['消费时间'].dt.dayofweek + 1
data.head(3)
# Treat Monday to Friday as working days and the weekend as non-working days, and split the data in two
work_day_query = data.loc[:,'消费星期'] <= 5
unwork_day_query = data.loc[:,'消费星期'] > 5

work_day_data = data.loc[work_day_query,:]
unwork_day_data = data.loc[unwork_day_query,:]
# Count the working-day swipes that fall in each hour of the day
work_day_times = []
for i in range(24):
    work_day_times.append(work_day_data['消费时间'].apply(str).str.contains(' {:02d}:'.format(i)).sum())
# x axis: hour of the day; y axis: number of swipes in that hour
x = []
for i in range(24):
    x.append('{:02d}:00'.format(i))
# Plot the curve
plt.plot(x, work_day_times, label='工作日')
# Axis labels
plt.xlabel('时间', fontproperties=font)
plt.ylabel('次数', fontproperties=font)
# Title
plt.title('工作日消费曲线图', fontproperties=font)
# Rotate the x tick labels by 60 degrees
plt.xticks(rotation=60)
# Show the legend
plt.legend(prop=font)
# Add a grid
plt.grid()

# Count the non-working-day swipes that fall in each hour of the day
unwork_day_times = []
for i in range(24):
    unwork_day_times.append(unwork_day_data['消费时间'].apply(str).str.contains(' {:02d}:'.format(i)).sum())
# x axis: hour of the day; y axis: number of swipes in that hour
x = []
for i in range(24):
    x.append('{:02d}:00'.format(i))
plt.plot(x, unwork_day_times, label='非工作日')
plt.xlabel('时间', fontproperties=font)
plt.ylabel('次数', fontproperties=font)
plt.title('非工作日消费曲线图', fontproperties=font)
plt.xticks(rotation=60)
plt.legend(prop=font)
plt.grid()
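
The hourly counts above are built by string matching on the formatted timestamp. As an alternative sketch, the same table can be produced with the datetime accessor and a single groupby, which also makes it easy to read off the peak hour of each group. The toy timestamps and the group labels 'workday' and 'non-workday' are assumptions for illustration.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2019-04-01 07:10', '2019-04-01 07:40', '2019-04-01 07:55',   # Monday
    '2019-04-01 12:05', '2019-04-01 12:30',
    '2019-04-06 08:45', '2019-04-06 12:15', '2019-04-06 12:40',   # Saturday
]), name='消费时间')

# dayofweek: Monday=0 ... Sunday=6, so values below 5 are Monday to Friday.
group = (times.dt.dayofweek < 5).map({True: 'workday', False: 'non-workday'})

# Rows: hour 0..23; columns: one count column per group.
hourly = (pd.DataFrame({'hour': times.dt.hour, 'group': group})
            .groupby(['hour', 'group']).size()
            .unstack('group', fill_value=0)
            .reindex(range(24), fill_value=0))

print(hourly.idxmax())   # hour with the most swipes in each group
# hourly.plot() would draw both curves on one set of axes.
```

Like the original code, this treats every Monday to Friday as a working day and does not account for public holidays within the month.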

Based on the analysis above, suggestions for canteen operation follow naturally, for example staggering meal times to relieve the peak hours.

5.2 Analysis of Student Consumption Behavior

Using students' overall campus consumption data, calculate this month's per-capita card-swipe frequency and per-capita spending, then select three majors and analyze the consumption characteristics of students of different genders in each major.

data = pd.read_csv('/home/kesci/work/output/2019B/task1_2_1.csv', encoding='gbk')
data.head()

# Per-capita swipe frequency (total number of swipes / number of students)
cost_count = data['消费时间'].count()
student_count = data['校园卡号'].nunique(dropna=False)
average_cost_count = int(round(cost_count / student_count))
average_cost_count

# Per-capita spending (total amount spent / number of students)
cost_sum = data['消费金额'].sum()
average_cost_money = int(round(cost_sum / student_count))
average_cost_money


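# Pick the 3 majors with the most consumption records
data['专业名称'].value_counts(dropna=False)

# The 3 majors with the most records are 连锁经营, 机械制造 and 会计
# regex=False matches the names literally, so the parentheses in '18机械制造(学徒)' are not read as a regex group
major1 = data['专业名称'].apply(str).str.contains('18连锁经营', regex=False)
major2 = data['专业名称'].apply(str).str.contains('18机械制造', regex=False)
major3 = data['专业名称'].apply(str).str.contains('18会计', regex=False)
major4 = data['专业名称'].apply(str).str.contains('18机械制造(学徒)', regex=False)

# Keep the three majors but exclude the apprenticeship class, which also matches '18机械制造'
data_new = data[(major1 | major2 | major3) & ~major4]
data_new['专业名称'].value_counts(dropna=False)

# Split by gender to analyze the consumption characteristics of students in each major
data_male = data_new[data_new['性别'] == '男']
data_female = data_new[data_new['性别'] == '女']
data_female.head()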
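
The post stops at splitting the records by gender. One possible next step, sketched below on toy data, is to aggregate to per-student totals first and then compare majors and genders. The output column names (total, swipes, students, mean_total, mean_swipes) are made up for this example.

```python
import pandas as pd

records = pd.DataFrame({
    '校园卡号': [1, 1, 2, 3, 3, 4],
    '性别': ['男', '男', '女', '男', '男', '女'],
    '专业名称': ['18会计', '18会计', '18会计', '18连锁经营', '18连锁经营', '18连锁经营'],
    '消费金额': [10.0, 6.0, 8.0, 5.0, 7.0, 20.0],
})

# Per-student totals first, so frequent card users do not dominate the averages.
per_student = (records.groupby(['专业名称', '性别', '校园卡号'])['消费金额']
                      .agg(total='sum', swipes='count')
                      .reset_index())

# Average spending and swipe count per student, by major and gender.
summary = (per_student.groupby(['专业名称', '性别'])
                      .agg(students=('校园卡号', 'nunique'),
                           mean_total=('total', 'mean'),
                           mean_swipes=('swipes', 'mean')))
print(summary)
```

Applied to data_new, the same two groupby steps would give one row per major and gender, which can then be compared directly or drawn as a grouped bar chart.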

Based on students' overall campus consumption behavior, select appropriate features, build a clustering model, and analyze the consumption characteristics of each student group.

# Number of distinct majors
data['专业名称'].value_counts(dropna=False).count()
# Features: gender, total amount spent, total number of swipes
data_1 = data[['校园卡号','性别']].drop_duplicates().reset_index(drop=True)
data_1['性别'] = data_1['性别'].astype(str).replace({'男': 1, '女': 0})
data_1.set_index(['校园卡号'], inplace=True)
# Select the column before aggregating so non-numeric columns are not summed
data_2 = data.groupby('校园卡号')[['消费金额']].sum()
data_2.columns = ['总消费金额']
data_3 = data.groupby('校园卡号')[['消费时间']].count()
data_3.columns = ['总消费次数']
data_123 = pd.concat([data_1, data_2, data_3], axis=1)
data_123.head()

# Build the clustering model
from sklearn.cluster import KMeans
# k: number of clusters; iteration: maximum number of iterations; data_zs: standardized data
k = 3    # adjust the number of clusters here
iteration = 500
data_zs = 1.0 * (data_123 - data_123.mean()) / data_123.std()
# Recent scikit-learn releases no longer accept n_jobs for KMeans, so it is not passed here
model = KMeans(n_clusters=k, max_iter=iteration, random_state=1234)
model.fit(data_zs)
# r1: number of students in each cluster; r2: cluster centers
r1 = pd.Series(model.labels_).value_counts()
r2 = pd.DataFrame(model.cluster_centers_)
r = pd.concat([r2,r1], axis=1)
r.columns = list(data_123.columns) + ['类别数目']
r
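
The number of clusters is fixed at k = 3 above without justification. One common way to check that choice is to compare silhouette scores over a range of k, sketched below. The synthetic two-feature data stands in for the standardized student features and is purely illustrative; the original post does not perform this step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic groups of (total amount, swipe count); not real student data.
centers = np.array([[100.0, 20.0], [300.0, 60.0], [600.0, 90.0]])
X = np.vstack([c + rng.normal(scale=[10.0, 3.0], size=(50, 2)) for c in centers])
# Standardize each feature, as done for data_zs above.
X = (X - X.mean(axis=0)) / X.std(axis=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1234).fit_predict(X)
    # Mean silhouette coefficient: closer to 1 means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('best k by silhouette:', best_k)
```

On the real data the scores may not show a clear maximum, in which case the choice of k also depends on how interpretable the resulting groups are.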


# Select the consumption records of the 500 students with the lowest total spending
data_500 = data.groupby('校园卡号')[['消费金额']].sum()
data_500.sort_values(by=['消费金额'],ascending=True,inplace=True,na_position='first')
data_500 = data_500.head(500)
data_500_index = data_500.index.values
data_500 = data[data['校园卡号'].isin(data_500_index)]
data_500.head(10)

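# Pie chart of where the low-spending students spend most often
# NOTE: data_max_place is not defined in the original post. As an assumption, it is taken here
# to be the number of records per consumption location among the 500 lowest-spending students.
data_max_place = data_500['消费地点'].value_counts()
canteen_name = list(data_max_place.index)
man_count = list(data_max_place.values)
# Create the figure
plt.figure(figsize=(10, 6), dpi=50)
# Draw the pie chart
plt.pie(man_count, labels=canteen_name, autopct='%1.2f%%', shadow=False, startangle=90, textprops={'fontproperties':font})
# Show the legend
plt.legend(prop=font)
# Add the title
plt.title("低消费学生常消费地点占比饼图", fontproperties=font)
# Keep the pie circular
plt.axis('equal')
# Show the figure
plt.show()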


Origin blog.csdn.net/dc_sinor/article/details/131996923