1. Experimental purpose
This exercise practices data analysis and visualization on the tips data set.
2. Experimental data
The tips data set used in this experiment ships with the Python library Seaborn; it has been converted into an Excel file.
[Figure: partial screenshot of the data set]
3. Experimental operation
1. Import module
# Import the packages needed for the experiment
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']  # font used to display Chinese labels
plt.rcParams['axes.unicode_minus']=False  # display the minus sign correctly
%matplotlib inline
2. Get the data.
Import the data and display the first 5 rows.
fdata=pd.read_excel('C:/Users/leglon/Desktop/ch4/tips.xls')  # read the data; reading .xls files requires xlrd
fdata.head()  # show the first five rows
The xlrd package must be installed beforehand, otherwise read_excel is likely to fail with: ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
To solve this problem, open cmd and enter: pip install xlrd
Then wait for the installation to complete. Alternatively, install xlrd through Anaconda.
Steps: Anaconda -> Environments -> tensorflow (the environment used here) -> Not installed, search for xlrd, check the option that appears, and click Apply. Then open it again.
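Before calling read_excel, it can help to confirm that xlrd is visible to the interpreter that runs the notebook. The following is a minimal sketch using only the standard library; the helper name `excel_engine_available` is hypothetical and not part of pandas or xlrd.

```python
import importlib.util


def excel_engine_available(module_name: str = "xlrd") -> bool:
    """Return True if the named package can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if not excel_engine_available():
    # Same fix as described above: install the missing dependency first.
    print("xlrd is not installed; run `pip install xlrd` and restart the kernel")
```

If the check still fails after installing, the package was probably installed into a different environment than the one the notebook uses.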
3. View data information
fdata.describe()  # view summary statistics of the data
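By default, describe() summarizes only the numeric columns, so text columns such as sex or smoker do not appear. As a sketch on a small hypothetical sample (the values below are illustrative, not the real file), `include="all"` and `isna()` give a fuller picture:

```python
import pandas as pd

# Hypothetical sample using the original English column names
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "sex": ["Female", "Male", "Male", "Male"],
})

numeric_summary = sample.describe()            # numeric columns only
full_summary = sample.describe(include="all")  # also unique/top/freq for text columns
missing_per_column = sample.isna().sum()       # count of missing values per column

print(full_summary)
print(missing_per_column)
```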
4. Modify the column name to Chinese
# Rename the columns to Chinese and show the first five rows
fdata.rename(columns={
    'total_bill':'消费总额','tip':'小费','sex':'性别','smoker':'是否吸烟',
    'day':'星期','time':'聚餐时间段','size':'人数'},inplace=True)
fdata.head()
5. Compute per capita consumption and view the first 5 rows
# Per capita consumption; show the first five rows
fdata['人均消费']=round(fdata['消费总额']/fdata['人数'],2)
fdata.head()
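The per capita column is an element-wise division of two columns followed by rounding. A minimal sketch on hypothetical rows, using the Series method `.round(2)` as an equivalent of the built-in round() call above:

```python
import pandas as pd

# Hypothetical rows: 消费总额 = total bill, 人数 = party size
sample = pd.DataFrame({
    "消费总额": [24.59, 10.34, 21.01],
    "人数": [4, 3, 3],
})

# Divide column by column, then round to two decimals
sample["人均消费"] = (sample["消费总额"] / sample["人数"]).round(2)
print(sample)
```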
6. Find the rows where smoking men have a per capita consumption greater than 15
# Select male smokers whose per capita consumption is greater than 15
fdata.query('是否吸烟=="Yes"&性别=="Male"&人均消费>15')
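The query string combines three conditions. The same filter can be written as an explicit boolean mask, which is sometimes easier to debug. This sketch runs both forms on a hypothetical sample; the names `sample`, `mask`, `by_mask` and `by_query` are illustrative only.

```python
import pandas as pd

# Hypothetical rows with the renamed columns
sample = pd.DataFrame({
    "性别": ["Male", "Male", "Female", "Male"],
    "是否吸烟": ["Yes", "No", "Yes", "Yes"],
    "人均消费": [19.4, 16.0, 17.5, 9.6],
})

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required here because & binds tighter than == in Python.
mask = (
    (sample["是否吸烟"] == "Yes")
    & (sample["性别"] == "Male")
    & (sample["人均消费"] > 15)
)
by_mask = sample[mask]

# Equivalent query-string form, as used in the step above
by_query = sample.query('是否吸烟 == "Yes" & 性别 == "Male" & 人均消费 > 15')

print(by_mask)
```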
7. Check the relationship between total consumption and tips
fdata.plot(kind='scatter',x='消费总额',y='小费')  # plot tips against total consumption
It can be seen from the figure that there is a positive correlation between tips and total consumption.
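A scatter plot shows the direction of the relationship; a correlation coefficient puts a number on it. As a sketch, `Series.corr` computes the Pearson coefficient by default. The values below are hypothetical and not taken from the tips file.

```python
import pandas as pd

# Hypothetical bills and tips
sample = pd.DataFrame({
    "消费总额": [10.0, 20.0, 30.0, 40.0],
    "小费": [1.5, 3.0, 3.5, 6.0],
})

# Pearson correlation: close to +1 means a strong positive linear relationship
r = sample["消费总额"].corr(sample["小费"])
print(f"Pearson r = {r:.3f}")
```

On the real data the same call would be `fdata['消费总额'].corr(fdata['小费'])`.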
8. Check the relationship between smoking and tipping
fdata.plot(kind='scatter',x='是否吸烟',y='小费')  # plot tips against smoking status
The figure suggests that smoking status has little effect on the size of the tip.
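Because smoking status has only two values, the points in a scatter plot stack into two columns and are hard to compare by eye. One option, sketched here on hypothetical data, is to summarize tips per group instead:

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    "是否吸烟": ["Yes", "No", "Yes", "No", "No"],
    "小费": [3.0, 2.0, 4.0, 3.5, 2.5],
})

# Count, mean and median of tips for smokers and non-smokers
summary = sample.groupby("是否吸烟")["小费"].agg(["count", "mean", "median"])
print(summary)
```

Similar means and medians in the two rows would support the reading that smoking status matters little.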
9. Compare the total consumption data of men and women
fdata.groupby('性别')['消费总额'].mean()
The result shows that, on average, men have a higher total bill than women.
10. Compare generosity between genders
# Compare the average tip by gender
fdata.groupby('性别')['小费'].mean()
On average, men tip more than women.
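The average tip depends partly on the size of the bill, and the previous step showed that the bills differ between genders. A complementary view, offered here only as a sketch, is the tip as a fraction of the bill. The column name `小费比例` (tip ratio) is hypothetical and the values are illustrative.

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    "性别": ["Male", "Male", "Female", "Female"],
    "消费总额": [20.0, 40.0, 10.0, 20.0],
    "小费": [3.0, 5.0, 2.0, 3.0],
})

# Tip as a share of the bill, then averaged per gender
sample["小费比例"] = sample["小费"] / sample["消费总额"]
ratio_by_sex = sample.groupby("性别")["小费比例"].mean()
print(ratio_by_sex)
```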
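11. Analyze the relationship between day of the week and tips
# Analyze the relationship between day of the week and tips
print(fdata['星期'].unique())  # show the distinct values of the day column
r=fdata.groupby('星期')['小费'].mean()
ax=r.plot(kind='bar',fontsize=12,rot=30)  # plot() returns a matplotlib Axes
ax.title.set_size(16)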
The figure shows that average tips are larger on Saturdays and Sundays than on Thursdays and Fridays.
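groupby sorts the group labels alphabetically, so the bars do not appear in calendar order. A small sketch of reordering the result with `reindex` before plotting; the day labels in `day_order` are an assumption and should match whatever unique() prints for the real data.

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    "星期": ["Sun", "Sat", "Thur", "Fri", "Sat", "Sun"],
    "小费": [3.0, 2.5, 2.0, 2.5, 3.5, 4.0],
})

day_order = ["Thur", "Fri", "Sat", "Sun"]  # assumed labels, in calendar order

# Mean tip per day, rearranged into calendar order
mean_tip = sample.groupby("星期")["小费"].mean().reindex(day_order)
print(mean_tip)
```

Calling `mean_tip.plot(kind='bar')` afterwards would draw the bars in this order.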
12. Analyze the generosity of gender and smoking combinations
# Analyze generosity by combination of gender and smoking status
r=fdata.groupby(['性别','是否吸烟'])['小费'].mean()
ax=r.plot(kind='bar',fontsize=12,rot=30)  # plot() returns a matplotlib Axes
ax.title.set_size(16)
It can be seen that non-smoking men are more generous and tip more; non-smoking women are more generous than smoking women.
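Grouping by two columns produces a Series with a two-level index. Pivoting one level into columns with `unstack` gives a small table that is easier to read and, if plotted, yields side-by-side bars. The rows below are hypothetical and do not reproduce the real result.

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    "性别": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "是否吸烟": ["Yes", "No", "Yes", "No", "No", "Yes"],
    "小费": [3.0, 3.5, 2.5, 2.0, 4.5, 3.5],
})

# Rows: gender; columns: smoking status; cells: mean tip
table = (
    sample.groupby(["性别", "是否吸烟"])["小费"]
    .mean()
    .unstack("是否吸烟")
)
print(table)
```

With matplotlib available, `table.plot(kind='bar', rot=0)` would draw one pair of bars per gender.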
13. Analyze the relationship between meal time and tipping
# Analyze the relationship between meal time and tips
r=fdata.groupby(['聚餐时间段'])['小费'].mean()
ax=r.plot(kind='bar',fontsize=15,rot=30)  # plot() returns a matplotlib Axes
ax.title.set_size(16)
The figure shows that average tips are larger at dinner than at lunch.
14. Analyze the relationship between the number of people and tips
# Analyze the relationship between party size and tips
r=fdata.groupby(['人数'])['小费'].mean()
ax=r.plot(kind='bar',fontsize=15,rot=30)  # plot() returns a matplotlib Axes
ax.title.set_size(16)
The figure shows that the more people there are at the meal, the larger the average tip tends to be.
4. Summary
Learning data visualization is very useful. By analyzing and plotting data we can extract important information, understand events better, and have more ways to respond to them.