Hands-On Data Mining Notes (Data Visualization)

Data mining notes:


Data Visualization

1. Visualize the number of survivors among men and women in the Titanic dataset (try a bar chart):

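import matplotlib.pyplot as plt

# assumes `text` is the Titanic DataFrame loaded in an earlier step
sex = text.groupby('Sex')['Survived'].sum()  # Survived is 0/1, so the sum counts survivors
sex.plot.bar()  # draw a bar chart
plt.title('survived_count')  # set the title
plt.show()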

[Figure: bar chart of survivors by sex]

2. Visualize the ratio of survivors to deaths among men and women in the Titanic dataset (try a bar chart):

import matplotlib.pyplot as plt

# Hint: count the deaths among men and women; 1 means survived, 0 means died
text.groupby(['Sex', 'Survived'])['Survived'].count().unstack().plot(kind='bar', stacked=True)
plt.title('survived_count')
plt.ylabel('count')
plt.show()

[Figure: stacked bar chart of survivors and deaths by sex]
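The task asks for a ratio, while the code above stacks raw counts. A minimal sketch of one way to turn the stacked bars into proportions, assuming a tiny hypothetical DataFrame `toy` in place of `text`, the non-interactive Agg backend, and a made-up output file name: divide each row of the unstacked table by its row total so every bar reaches 1.0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the Titanic `text` DataFrame
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0, 0],
})

# Rows: Sex; columns: Survived (0 = died, 1 = survived); values: passenger counts
counts = toy.groupby(["Sex", "Survived"])["Survived"].count().unstack(fill_value=0)

# Divide each row by its own total so the stacked segments of every bar sum to 1.0
proportions = counts.div(counts.sum(axis=1), axis=0)

ax = proportions.plot(kind="bar", stacked=True)
ax.set_title("survival proportion by sex")
ax.set_ylabel("proportion")
plt.savefig("survival_proportion.png")  # hypothetical file name
```

`div(..., axis=0)` aligns the row totals with the rows rather than the columns; swapping `toy` for `text` should produce the same kind of chart for the real data.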

Note that the stacked bar chart uses count() rather than sum(). The reason:

text.groupby(['Sex', 'Survived'])['Survived'].count()

Sex     Survived
female  0            81
        1           233
male    0           468
        1           109
Name: Survived, dtype: int64

text.groupby(['Sex', 'Survived'])['Survived'].sum()

Sex     Survived
female  0             0
        1           233
male    0             0
        1           109
Name: Survived, dtype: int64

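count() returns the number of rows in each group, whereas sum() adds up the values. Because Survived is 0 for every passenger who died, sum() yields 0 for each (Sex, Survived=0) group and the death counts are lost; count() counts those rows regardless of their value.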
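The difference can be checked on a small hypothetical DataFrame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: Survived is 1 for survivors and 0 for passengers who died
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 1, 0, 0],
})

grouped = toy.groupby(["Sex", "Survived"])["Survived"]

by_count = grouped.count()  # number of rows in each (Sex, Survived) group
by_sum = grouped.sum()      # sum of the 0/1 values in each group: always 0 when Survived == 0

print(by_count)
print(by_sum)
```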

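About stack() and unstack():
stack() turns the column labels of data (['one', 'two', 'three']) into the second (inner) level of the row index, producing a hierarchical Series (data2). unstack() does the reverse: it moves the inner row-index level of data2 back into the columns (the last level by default; another level can be chosen with the level argument), giving a DataFrame again (data3). In step 2, unstack() moves the Survived level into the columns, so 0 and 1 become the two stacked segments of each bar.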

[Figure: stack/unstack example showing data, data2 and data3]
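A minimal reconstruction of the stack/unstack round trip described above. The column labels follow the text; the row labels 'a' and 'b' and the values from np.arange are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: columns as in the text, made-up row labels and values
data = pd.DataFrame(np.arange(6).reshape(2, 3),
                    index=["a", "b"],
                    columns=["one", "two", "three"])

# stack(): the column labels become the inner (second) level of the row index -> Series
data2 = data.stack()

# unstack(): the innermost row-index level moves back into the columns -> DataFrame
data3 = data2.unstack()

# level=0 unstacks the outer level instead, so 'a' and 'b' become the columns
data4 = data2.unstack(level=0)

print(data2)
print(data4)
```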
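
3. Visualize the number of survivors and deaths at each fare in the Titanic dataset (try a line chart, with fare on the x-axis and the number of people on the y-axis):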

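value_counts() is used here: applied to the grouped Survived column, it counts how many passengers survived (1) and died (0) at each fare, which is convenient because 'Fare' takes many distinct values.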
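To see what the grouped value_counts() computes, here is a sketch on hypothetical toy fares. It yields one count per observed (Fare, Survived) pair, the same numbers as counting group sizes, and pairs that never occur are simply absent (which is why some fares in the output below have only one row):

```python
import pandas as pd

# Hypothetical toy fares; the notes use text['Fare'] and text['Survived']
toy = pd.DataFrame({
    "Fare": [7.25, 7.25, 7.25, 8.05, 8.05, 71.2833],
    "Survived": [0, 0, 1, 0, 0, 1],
})

# For each fare, count how often each Survived value (0 or 1) occurs
per_fare = toy.groupby("Fare")["Survived"].value_counts()

# The same counts as group sizes over (Fare, Survived), ordered by the keys
same_counts = toy.groupby(["Fare", "Survived"]).size()

print(per_fare)
```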

a. Before sorting:

fare_survived = text.groupby(['Fare'])['Survived'].value_counts()
fare_survived

Fare      Survived
0.0000    0           14
          1            1
4.0125    0            1
5.0000    0            1
6.2375    0            1
                      ..
247.5208  1            1
262.3750  1            2
263.0000  0            2
          1            2
512.3292  1            3
Name: Survived, Length: 330, dtype: int64

fig = plt.figure(figsize=(20, 18))
fare_survived.plot(grid=True)
plt.legend()  # plt.legend() adds a legend to the plot
plt.show()

[Figure: line plot of counts per (Fare, Survived) pair, unsorted]

b. After sorting (by count, in descending order):

fare_survived = text.groupby(['Fare'])['Survived'].value_counts().sort_values(ascending=False)
fare_survived

Fare     Survived
8.0500   0           38
7.8958   0           37
13.0000  0           26
7.7500   0           22
26.0000  0           16
                     ..
20.2500  1            1
         0            1
18.7875  1            1
         0            1
15.0500  0            1
Name: Survived, Length: 330, dtype: int64

fig = plt.figure(figsize=(20, 18))
fare_survived.plot(grid=True)
plt.legend()  # plt.legend() adds a legend to the plot
plt.show()

[Figure: line plot of the same counts sorted in descending order]
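In both plots the Series has a two-level (Fare, Survived) index, so the x-axis runs over index pairs rather than fare values, and after sort_values() the points are ordered by count instead of by fare. One alternative, not from the original notes, is to unstack Survived into columns so each outcome becomes its own line over ascending fares. A sketch on hypothetical toy data (the Agg backend and the file name are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy fares standing in for the Titanic `text` DataFrame
toy = pd.DataFrame({
    "Fare": [7.25, 7.25, 7.25, 8.05, 8.05, 71.2833],
    "Survived": [0, 0, 1, 0, 0, 1],
})

# One row per fare (ascending), one column per Survived value;
# fares where nobody survived (or nobody died) get 0 instead of a missing entry
fare_table = toy.groupby(["Fare", "Survived"]).size().unstack(fill_value=0)

ax = fare_table.plot(grid=True, figsize=(10, 5))
ax.set_xlabel("Fare")
ax.set_ylabel("count")
ax.legend(title="Survived")
plt.savefig("fare_survived.png")  # hypothetical file name
```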

4. Visualize the distribution of survivors and deaths across cabin classes (Pclass) in the Titanic dataset (try a bar chart):

Pclass_survived = text.groupby(['Pclass'])['Survived'].value_counts()
Pclass_survived

Pclass  Survived
1       1           136
        0            80
2       0            97
        1            87
3       0           372
        1           119
Name: Survived, dtype: int64

import matplotlib.pyplot as plt
import seaborn as sns

sns.countplot(x='Pclass', hue='Survived', data=text)  # sns.countplot() draws a bar of row counts per category
plt.show()

[Figure: count plot of survivors and deaths per cabin class]
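countplot() counts rows per category, split by hue. A rough pandas-only equivalent, sketched on hypothetical toy data (the Agg backend and the output file name are assumptions), makes the counted table explicit:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the Titanic `text` DataFrame
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Rows per (Pclass, Survived) pair, with Survived (the hue) as side-by-side bars
class_counts = toy.groupby(["Pclass", "Survived"]).size().unstack(fill_value=0)

ax = class_counts.plot(kind="bar")
ax.set_ylabel("count")
plt.savefig("pclass_survived.png")  # hypothetical file name
```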

5. Visualize the distribution of survivors and deaths by age in the Titanic dataset (any chart type):

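import matplotlib.pyplot as plt
import seaborn as sns

facet = sns.FacetGrid(text, hue='Survived', aspect=3)
facet.map(sns.kdeplot, 'Age', fill=True)  # `shade=` is deprecated since seaborn 0.11; use `fill=`
facet.set(xlim=(0, text['Age'].max()))
facet.add_legend()
plt.show()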

[Figure: KDE curves of age for survivors and non-survivors]
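Since any chart type is allowed here, a bar chart over age groups is another option. This sketch swaps the KDE for counts per 20-year bin using pd.cut; the toy data, the bin edges, the Agg backend and the file name are all hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; None marks a missing age, as in the real Age column
toy = pd.DataFrame({
    "Age": [2.0, 15.0, 22.0, 35.0, 38.0, 54.0, None],
    "Survived": [1, 0, 0, 1, 0, 0, 1],
})

# Hypothetical 20-year bins; right=False gives [0, 20), [20, 40), ...
age_bins = pd.cut(toy["Age"], bins=[0, 20, 40, 60, 80], right=False)

# Rows with a missing Age get NaN from pd.cut and are dropped by groupby
age_table = (toy.groupby([age_bins, "Survived"], observed=False)
             .size()
             .unstack(fill_value=0))

ax = age_table.plot(kind="bar", stacked=True)
ax.set_xlabel("Age group")
ax.set_ylabel("count")
plt.savefig("age_survived.png")  # hypothetical file name
```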

6. Visualize the age distribution for each cabin class in the Titanic dataset (try a line chart):

import matplotlib.pyplot as plt

# One KDE curve per cabin class (pandas' kind='kde' requires SciPy)
for pclass in [1, 2, 3]:
    text.loc[text['Pclass'] == pclass, 'Age'].plot(kind='kde', label=str(pclass))
plt.xlabel('Age')
plt.legend(title='Pclass', loc='best')
plt.show()

[Figure: KDE curves of age for each cabin class]
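pandas' kind='kde' needs SciPy. If SciPy is unavailable, density-normalized step histograms give a similar line-style comparison. This sketch swaps the KDE for that technique and uses synthetic, hypothetical ages generated with a fixed seed (the Agg backend and the file name are also assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical synthetic ages, 50 per class; replace with the real `text` DataFrame
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Pclass": np.repeat([1, 2, 3], 50),
    "Age": np.concatenate([rng.normal(38, 12, 50),
                           rng.normal(30, 12, 50),
                           rng.normal(25, 10, 50)]),
})

fig, ax = plt.subplots()
for pclass, ages in toy.groupby("Pclass")["Age"]:
    # density=True scales each outline to unit area, so classes of different sizes stay comparable
    ax.hist(ages.dropna(), bins=15, density=True, histtype="step", label=f"Pclass {pclass}")
ax.set_xlabel("Age")
ax.set_ylabel("density")
ax.legend(loc="best")
fig.savefig("age_by_pclass.png")  # hypothetical file name
```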