Learning Python: Data Visualization

Commonly Used Python Packages

  • Matplotlib
  • Seaborn
  • Pandas
  • Bokeh
  • Plotly
  • Vispy
  • Vega
  • Vega-Lite

Visualization with Matplotlib

Installing Matplotlib

pip install matplotlib -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

If that fails, try the following:
upgrade pip first, then install matplotlib

python -m pip install -U pip setuptools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
python -m pip install matplotlib -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

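Matplotlib includes two modules

  1. Plotting API: pyplot, typically used for visualization
  2. Convenience namespace: pylab, which combines pyplot with NumPy (its use is now discouraged)

Two rendering modes for Matplotlib plots (selected in Jupyter with the %matplotlib magic)

  1. inline: static plots
  2. notebook: interactive plots

plt.plot() draws on two-dimensional axes
plt.show() displays the result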

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.plot(women["height"],women["weight"])
plt.show()

To draw several lines in one call, pass repeated x, y pairs: plt.plot(x, y1, x, y2, x, y3, ...)

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 4.0, 0.1)
print(t)
plt.plot(t, t, t, t + 2, t, t ** 2, t, t + 8)
plt.show()
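
As a quick check of how the flattened argument list is read, the sketch below counts the line objects that plt.plot() returns for the same call. The Agg backend line is an assumption added here so the snippet runs without a display; drop it when plotting interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 4.0, 0.1)
# Each consecutive (x, y) pair becomes one line on the same axes.
lines = plt.plot(t, t, t, t + 2, t, t ** 2, t, t + 8)
print(len(lines))  # one Line2D object per (x, y) pair
plt.close("all")
```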

Changing Plot Properties

  1. Setting the marker type
    Pass a third positional argument to plt.plot(), such as 'o'

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.plot(women["height"],women["weight"],'o')
plt.show()
plt.plot(women["height"],women["weight"],'D')
plt.show()

  2. Setting the line color and style
    Change the third argument of plt.plot()

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.plot(women["height"],women["weight"],'g--')
plt.show()
plt.plot(women["height"],women["weight"],'rD')
plt.show()
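
The format strings used above pack color, marker, and line style into one short argument. A minimal sketch, assuming the same height and weight values, that writes each format string next to its keyword-argument equivalent (the Agg backend line is an assumption so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

height = [58, 59, 60, 61, 62]
weight = [115, 117, 120, 123, 126]

# 'g--' means green ('g') with a dashed line ('--').
(dashed,) = plt.plot(height, weight, 'g--')
(dashed_kw,) = plt.plot(height, weight, color='g', linestyle='--')

# 'rD' means red ('r') diamond markers ('D') with no connecting line.
(diamonds,) = plt.plot(height, weight, 'rD')
(diamonds_kw,) = plt.plot(height, weight, color='r', marker='D',
                          linestyle='None')

print(dashed.get_color(), dashed.get_linestyle())
print(diamonds.get_color(), diamonds.get_marker())
plt.close("all")
```

The keyword form is longer but easier to read once several properties are set at the same time.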

For details, see these two posts:

https://blog.csdn.net/cjcrxzz/article/details/79627483
https://blog.csdn.net/sinat_36219858/article/details/79800460?utm_source=distribute.pc_relevant.none-task

  3. Displaying Chinese characters

Set the font before calling plot
Common Chinese fonts: SimHei, Kaiti, Lisu, Fangsong, YouYuan

plt.rcParams['font.family'] = 'SimHei'
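
Whether a given font works depends on what is installed on the machine. The sketch below is one possible way to pick the first installed font from the list above; the candidates list and the fallback behavior are assumptions for illustration, not part of the original setup. It also turns off axes.unicode_minus, because many CJK fonts lack the Unicode minus sign and negative tick labels would otherwise show as empty boxes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Font names taken from the list above.
candidates = ["SimHei", "Kaiti", "Lisu", "Fangsong", "YouYuan"]

# Names of all fonts Matplotlib has found, compared case-insensitively.
installed = {font.name.lower() for font in font_manager.fontManager.ttflist}
available = [name for name in candidates if name.lower() in installed]

if available:
    plt.rcParams['font.family'] = available[0]
    # Use the ASCII hyphen for negative numbers instead of the Unicode minus.
    plt.rcParams['axes.unicode_minus'] = False

print(available)  # empty if none of the candidate fonts is installed
```
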
  4. Setting the title and the x/y axis labels

plt.title(), plt.xlabel(), and plt.ylabel() set the figure title, the x-axis label, and the y-axis label, respectively

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.rcParams['font.family'] = 'SimHei'
plt.plot(women["height"], women["weight"], 'g--')
plt.title("此处为图名")
plt.xlabel("x轴的名称")
plt.ylabel("y轴的名称")
plt.show()

  5. Legend position
    First add the label argument to plt.plot(), then call plt.legend(loc=...), where loc is the position, for example "upper left". The legend shows the text given in label

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.rcParams['font.family'] = 'SimHei'
plt.plot(women["height"], women["weight"], 'g--', label='weight')
plt.title("此处为图名")
plt.xlabel("x轴的名称")
plt.ylabel("y轴的名称")

plt.legend(loc="upper left")
plt.show()

Changing the Plot Type

plt.scatter() draws a scatter plot

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.scatter(women["height"], women["weight"])
plt.show()

Changing the Axis Ranges

Set the x-axis range: plt.xlim()
Set the y-axis range: plt.ylim()
Set both at once: plt.axis()
np.linspace(0, 10, 100) returns 100 evenly spaced values over the interval [0, 10]
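
A small check of what np.linspace(0, 10, 100) produces: 100 values running from 0 to 10 with both endpoints included.

```python
import numpy as np

x = np.linspace(0, 10, 100)

print(len(x))       # number of samples
print(x[0], x[-1])  # first and last value: both endpoints are included
print(x[1] - x[0])  # constant spacing, equal to (10 - 0) / (100 - 1)
```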

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.xlim(11, -2)  # x-axis runs from 11 down to -2 (reversed)
plt.ylim(2.2, -1.3)  # y-axis runs from 2.2 down to -1.3 (reversed)
plt.show()

plt.axis([a1, a2, b1, b2]): a1 and a2 are the x-axis limits, b1 and b2 are the y-axis limits

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.axis([-1, 21, -1.6, 1.6])
plt.show()
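
The limit functions also report the current state, which helps when checking what a call actually did. A minimal sketch, assuming the same sine curve (the Agg backend line is an assumption so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))

# Passing the larger value first reverses the direction of the axis.
plt.xlim(11, -2)
was_inverted = plt.gca().xaxis_inverted()
print(was_inverted)

# plt.axis() takes one list [xmin, xmax, ymin, ymax] and returns the limits.
limits = plt.axis([-1, 21, -1.6, 1.6])
print(limits)

# Called without arguments, xlim() and ylim() return the current range.
current_x = plt.xlim()
current_y = plt.ylim()
print(current_x, current_y)
plt.close("all")
```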

plt.axis("equal") gives the x-axis and y-axis the same scale, so one unit has the same length on both

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.axis("equal")
plt.show()

Removing the Margins

plt.axis("tight") sets the limits just large enough to show all the data

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.axis("tight")
plt.show()

Drawing Two Plots on the Same Axes

Call plt.plot() more than once

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x),label="sin(x)")
plt.plot(x, np.cos(x),label="cos(x)")
plt.axis("tight")
plt.legend()
plt.show()

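Displaying Multiple Plots

plt.subplot(x, y, z) places the next plot in the z-th cell of an x-row by y-column grid; cells are numbered from 1, left to right and top to bottom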

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.subplot(2, 3, 5)  # cell 5 of the 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.subplot(2, 3, 1)  # cell 1 of the 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.show()
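
The cell numbering can be worked out by hand. The helper below, subplot_cell, is a hypothetical function written for this illustration (it is not part of Matplotlib); it converts the 1-based index used by plt.subplot() into a 0-based row and column.

```python
def subplot_cell(index, n_rows, n_cols):
    """Return the 0-based (row, column) of a 1-based plt.subplot index."""
    if not 1 <= index <= n_rows * n_cols:
        raise ValueError("index must be between 1 and n_rows * n_cols")
    # Cells are counted left to right, then top to bottom.
    return divmod(index - 1, n_cols)


print(subplot_cell(5, 2, 3))  # second row, middle column
print(subplot_cell(1, 2, 3))  # first row, first column
```

In the example above, cell 5 therefore sits in the bottom row under the middle column, and cell 1 is the top-left corner.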

Saving a Figure

Replace plt.show() with plt.savefig("name.format")
The file is saved in the current working directory

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
plt.subplot(2, 3, 5)  # cell 5 of the 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.subplot(2, 3, 1)  # cell 1 of the 2x3 grid
plt.scatter(women["height"], women["weight"])
plt.savefig("sagefig.png")
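
plt.savefig() also accepts a full path and a few output options. A minimal sketch, assuming a temporary directory as the output location (the path and file name are made up for illustration). If both calls are needed, savefig() should come before show(), since the figure may be empty once the window is closed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

height = [58, 59, 60, 61, 62]
weight = [115, 117, 120, 123, 126]
plt.scatter(height, weight)

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "women.png")

# dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace.
plt.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close("all")

print(os.path.exists(out_path))
```

The file format is inferred from the extension, so "women.pdf" or "women.svg" would produce a vector file instead.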

Drawing Scatter Plots

Installing scikit-learn

pip install scikit-learn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

make_blobs: generates a random dataset of normally distributed (Gaussian) clusters
Parameters:

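  • n_samples: number of samples, i.e. rows
  • n_features: number of features per sample, i.e. columns
  • centers: number of clusters
  • random_state: seed for the random number generator, which makes the output reproducible
  • cluster_std: standard deviation of each cluster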

Returns:

  • X: the generated samples, an array of shape [n_samples, n_features]
  • y: the cluster label of each sample, an array of shape [n_samples]

Arguments to plt.scatter()

  • X[:, 0] and X[:, 1] are the x and y coordinates
  • c is the color
  • s is the marker size
  • cmap is the colormap, which maps the values in c to colors

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=1.0)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap="rainbow")
plt.show()
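
Before plotting, it can help to look at what make_blobs actually returns. A short sketch using the same arguments as above:

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=1.0)

print(X.shape)         # (n_samples, n_features); n_features defaults to 2
print(y.shape)         # one label per sample
print(np.unique(y))    # labels run from 0 to centers - 1
print(np.bincount(y))  # number of samples in each cluster
```

Because y holds integer labels, passing c=y to plt.scatter() colors each cluster differently through the colormap.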

Visualization with Pandas

Pandas' plotting functions make it easier to visualize DataFrame data
The kind argument of plot() determines the type of plot

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="bar")
plt.show()
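
Each value of kind also has a matching method on the plot accessor, so women.plot(kind="bar") and women.plot.bar() draw the same chart. A minimal sketch, assuming the same data built directly from lists (the Agg backend line is an assumption so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

women = pd.DataFrame({'height': [58, 59, 60, 61, 62],
                      'weight': [115, 117, 120, 123, 126]})

ax_kind = women.plot(kind="bar")  # plot type chosen through the kind argument
ax_attr = women.plot.bar()        # same plot through the accessor method

# Each call draws one bar per cell of the table: 5 rows x 2 columns.
print(len(ax_kind.patches), len(ax_attr.patches))
plt.close("all")
```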

barh produces a horizontal bar chart

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="barh")
plt.show()


import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="bar", x="height", y="weight", color='g')
plt.show()

kde draws a kernel density estimate curve (this plot type requires SciPy)

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="kde")
plt.show()

plt.legend(loc="best") lets Matplotlib pick the legend position that overlaps the data least

import matplotlib.pyplot as plt
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
women.plot(kind="bar", x="height", y="weight", color='g')
plt.legend(loc="best")
plt.show()

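Visualization with Seaborn

np.cumsum computes the cumulative sum of an array along an axis: np.cumsum(a, axis), or np.cumsum(a) for the flattened array. MATLAB has a function of the same name, called as B = cumsum(A, dim) or B = cumsum(A)
plt.legend() configures the legend

  • Legend entries: "abcdef", one character per plotted line
  • Number of legend columns: ncol=2
  • Legend position: loc="upper left"
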
import matplotlib.pyplot as plt
import numpy as np

plt.style.use("classic")
Rng = np.random.RandomState(0)
X = np.linspace(0, 10, 500)  # 500 evenly spaced values from 0 to 10
y = np.cumsum(Rng.randn(500, 6), 0)
plt.plot(X, y)
plt.legend("abcdef", ncol=2, loc="upper left")
plt.show()
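
To see what np.cumsum(..., 0) does in the example above, here is the same call on a small array (the array a is made up for illustration). Summing along axis 0 keeps a running total down each column, which is what turns six columns of random steps into six random walks.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

down_columns = np.cumsum(a, 0)  # axis 0: running total down each column
along_rows = np.cumsum(a, 1)    # axis 1: running total along each row
flattened = np.cumsum(a)        # no axis: running total over all elements

print(down_columns)
print(along_rows)
print(flattened)
```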

Installing Seaborn

pip install seaborn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

Calling sns.set() applies Seaborn's default style, which makes the figure look nicer

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

plt.style.use("classic")
Rng = np.random.RandomState(0)
X = np.linspace(0, 10, 500)
y = np.cumsum(Rng.randn(500, 6), 0)
sns.set()
plt.plot(X, y)
plt.legend("abcdef", ncol=2, loc="upper left")
plt.show()

Kernel Density Estimate (KDE) Plots

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.kdeplot(women.height, fill=True)  # fill replaces the deprecated shade argument (seaborn 0.11+)
plt.show()

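sns.distplot() draws a distribution plot: a histogram with a KDE curve on top. It has been deprecated since seaborn 0.11 in favor of sns.histplot() and sns.displot()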

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.distplot(women.height)
plt.show()

sns.pairplot(): scatter plot matrix

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.pairplot(women)
plt.show()

sns.jointplot(): joint distribution plot

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
sns.jointplot(x=women.height, y=women.weight, kind="reg")  # x and y must be keywords in recent seaborn
plt.show()

A with block can also change the style, limited to the code inside the block; note the trailing colon and the indentation

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
with sns.axes_style("white"):
    sns.jointplot(x=women.height, y=women.weight, kind="reg")
plt.show()

plt.hist() draws a histogram
The call can also be placed in a for loop to draw several variables on the same axes

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

dt = {'height': pd.Series([58, 59, 60, 61, 62], index=[0, 1, 2, 3, 4]),
      'weight': pd.Series([115, 117, 120, 123, 126], index=[0, 1, 2, 3, 4])}
women = pd.DataFrame(dt)
print(women)
'''
   height  weight
0      58     115
1      59     117
2      60     120
3      61     123
4      62     126
'''
for x in ["height", "weight"]:
    plt.hist(women[x], density=True, alpha=0.5)  # density replaces the removed normed argument
plt.show()

For more Seaborn operations, see:

https://www.jianshu.com/p/844f66d00ac1

Data Visualization in Practice

  1. Data preparation
import os
print(os.getcwd())  # E:\py_workspace\test2

Use read_csv() from pandas to load the file into the in-memory object salaries

import pandas as pd

salaries = pd.read_csv("salaries.csv", index_col=0)
# index_col=0 uses column 0 of the file as the index

Inspect the data

import pandas as pd

salaries = pd.read_csv("salaries.csv", index_col=0)
# index_col=0 uses column 0 of the file as the index
print(salaries.head())
'''
       rank discipline  yrs.since.phd  yrs.service   sex  salary
1      Prof          B             19           18  Male  139750
2      Prof          B             20           16  Male  173200
3  AsstProf          B              4            3  Male   79750
4      Prof          B             45           39  Male  115000
5      Prof          B             40           41  Male  141500
'''
  2. Importing the Python packages
import seaborn as sns
import matplotlib.pyplot as plt
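  3. Plotting

sns.set_style('darkgrid') sets Seaborn's plotting style (theme) to darkgrid (gray background with grid lines)
sns.stripplot() draws a strip plot, a scatter plot for categorical data
Parameters:

  • data: data source
  • x: variable on the x-axis
  • y: variable on the y-axis
  • jitter: whether to jitter the points
  • alpha: transparency

sns.boxplot() draws a box plot
Parameters:

  • data: data source
  • x: variable on the x-axis
  • y: variable on the y-axis
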
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

salaries = pd.read_csv("salaries.csv", index_col=0)
# index_col=0 uses column 0 of the file as the index
print(salaries.head())
'''
       rank discipline  yrs.since.phd  yrs.service   sex  salary
1      Prof          B             19           18  Male  139750
2      Prof          B             20           16  Male  173200
3  AsstProf          B              4            3  Male   79750
4      Prof          B             45           39  Male  115000
5      Prof          B             40           41  Male  141500
'''
sns.set_style('darkgrid')
sns.stripplot(data=salaries, x='rank', y='salary', jitter=True, alpha=0.5)
sns.boxplot(data=salaries, x='rank', y='salary')
plt.show()
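
To try the two plots without salaries.csv, the sketch below loads only the five rows printed by head() from an in-memory string. The CSV header layout (an unnamed first column holding the index) is an assumption about the file, and five rows are far too few to say anything about the real data; the point is only to exercise the calls. The Agg backend line is also an assumption so the snippet runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The five rows shown by salaries.head(), written as CSV text.
sample_csv = """,rank,discipline,yrs.since.phd,yrs.service,sex,salary
1,Prof,B,19,18,Male,139750
2,Prof,B,20,16,Male,173200
3,AsstProf,B,4,3,Male,79750
4,Prof,B,45,39,Male,115000
5,Prof,B,40,41,Male,141500
"""

# index_col=0 uses the unnamed first column as the index.
salaries = pd.read_csv(io.StringIO(sample_csv), index_col=0)
print(salaries.shape)

sns.set_style('darkgrid')
sns.stripplot(data=salaries, x='rank', y='salary', jitter=True, alpha=0.5)
sns.boxplot(data=salaries, x='rank', y='salary')
plt.close("all")
```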


Reposted from blog.csdn.net/weixin_43866408/article/details/104398299