(Three) Matplotlib data visualization

Case data: https://cloud.189.cn/t/aYbUv2JbEzUn

1. Matplotlib drawing area

A plot generally involves three levels, from largest to smallest: the drawing board, the canvas, and the drawing area. The window is the drawing board, a Figure is the drawing object (the canvas), and an Axes is the drawing area. A Figure can contain multiple Axes subplots, and each Axes has its own drawing area with its own coordinate system.

You can use plt.gcf (get current figure) to get the current drawing object, plt.gca (get current Axes) to get the current drawing area, and plt.sca (set current Axes) to set the drawing area that subsequent plotting calls operate on.
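As a small illustration of these three functions, the sketch below creates one Figure with two Axes and switches the current Axes with plt.sca. The variable names (ax_left, ax_right) are chosen here for illustration only.

```python
import matplotlib.pyplot as plt

# One Figure (drawing object) holding two Axes (drawing areas) side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# plt.gcf() returns the Figure that pyplot calls currently act on
current_fig = plt.gcf()
print(current_fig is fig)

# Make the left Axes current, then draw on it through the pyplot interface
plt.sca(ax_left)
plt.plot([1, 2, 3], [1, 4, 9])
print(plt.gca() is ax_left)

# Switch to the right Axes; the same plt.plot call now lands there
plt.sca(ax_right)
plt.plot([1, 2, 3], [9, 4, 1])
print(plt.gca() is ax_right)
```

Calling methods on the Axes objects directly (ax_left.plot(...)) avoids depending on the "current" state at all, which tends to be easier to follow once a figure has several subplots.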

1.1 Create drawing objects

  • figure(num=None,figsize=None,dpi=None,facecolor=None,...)
  • num: figure number or name
  • figsize: the width and height of the figure, in inches (1 inch = 2.54 cm)
  • dpi: the resolution of the drawing object, that is, how many pixels per inch
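To make the relationship between figsize and dpi concrete, here is a minimal sketch. The figure name "demo" and the chosen sizes are arbitrary values picked for illustration.

```python
import matplotlib.pyplot as plt

# A named figure, 6 x 4 inches, rendered at 100 pixels per inch
fig = plt.figure(num="demo", figsize=(6, 4), dpi=100, facecolor="white")

# Size in inches, as passed through figsize
width_in, height_in = fig.get_size_inches()

# Size in pixels is inches multiplied by dpi
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi

print(fig.get_label(), width_in, height_in, width_px, height_px)
```

Doubling dpi while keeping figsize fixed gives an image with twice as many pixels along each side, with text and lines scaled accordingly.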

1.2 Create a single subgraph

  • subplot(nrows,ncols,index,**kwargs)
  • nrows: total number of rows
  • ncols: total number of columns
  • index: the position of the subplot, counted from left to right and top to bottom, starting at 1. If nrows, ncols and index are all less than 10, the commas can be omitted and the three values written as a single three-digit number, such as subplot(222)
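The two ways of writing the arguments can be compared with a short sketch. It uses Figure.add_subplot on two separate figures (an assumption made here so that the two Axes are clearly distinct objects) and checks that both land in the same grid cell.

```python
import matplotlib.pyplot as plt

# Three separate integers: 2 rows, 2 columns, position 2 (top right)
fig1 = plt.figure()
ax_a = fig1.add_subplot(2, 2, 2)

# The same request written as one three-digit number
fig2 = plt.figure()
ax_b = fig2.add_subplot(222)

# Both Axes occupy the same cell of the 2 x 2 grid
same_position = ax_a.get_position().bounds == ax_b.get_position().bounds
print(same_position)

# The grid geometry is reported as (rows, cols, first cell, last cell),
# where the cell numbers are zero-based
print(ax_b.get_subplotspec().get_geometry())
```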

1.3 Example

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10,8))
fig.suptitle("test1")
ax1 = plt.subplot(221)
ax1.set_title("test221")
ax1.plot([1,2,3,4,5],[4,8,12,16,20])

ax2 = plt.subplot(222)
ax2.plot([5,4,3,2,1])

ax3 = plt.subplot(223)
ax3.plot([1,2,3,3,3])

ax4 = plt.subplot(224)
ax4.plot([5,4,3,3,3])

2. The display problem of Chinese characters and minus signs 

plt.figure(figsize=(6,4))
# Display Chinese characters correctly (the SimHei font must be installed)
plt.rcParams['font.sans-serif']=['SimHei']
plt.text(0,0,u'这是一个测试')

# Display minus signs correctly
plt.rcParams['axes.unicode_minus']=False

3. Common graphics

3.1 Scatter plot

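import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Set the figure size
plt.rcParams['figure.figsize']=8,6
# Set the color palette
sns.set_palette(sns.color_palette("muted"))

# Randomly generate 50 points; x values range from 0 to 20
x = np.random.rand(50)*20

# Randomly generate 50 points; y values range from 0 to 10
y = np.random.rand(50)*10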

plt.plot(x,y,'o')

3.2 Bubble chart

# Generate 50 points
N = 50
x = np.random.rand(N)
y = np.random.rand(N)

# Color of each point
colors = np.random.rand(N)
# Size of each point (s is the marker area in points squared)
area = (30*np.random.rand(N))**2
# alpha: transparency, here 0.5
plt.scatter(x,y,s=area,c=colors,alpha=0.5)
plt.show()

3.3 Line graph

t = np.arange(0.0, 2.0, 0.1)
s = np.sin(t*np.pi)

plt.plot(t,s,'r--',label='aaaa')
plt.plot(t*2, s, 'b--', label='bbbb')
plt.xlabel('x')
plt.ylabel('y')
plt.title('test')
# Show the legend built from the label arguments
plt.legend()

3.4 Violin Diagram

A violin plot is similar to a box plot: in addition to the maximum, minimum, and median, the curves on both sides show the estimated probability density of the data.

data = np.random.rand(20,5)
plt.violinplot(data,showmedians=True,showmeans=False)
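To see which elements the call draws, one can inspect the dictionary that violinplot returns. The sketch below uses a seeded random generator (an assumption made here for reproducibility) and the same flags as the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.random((20, 5))  # 20 samples for each of 5 groups

fig, ax = plt.subplots()
parts = ax.violinplot(data, showmedians=True, showmeans=False)

# 'bodies' holds one filled density shape per column of data
print(len(parts["bodies"]))

# The remaining keys depend on the show* flags that were passed
print(sorted(parts.keys()))
```

With showmedians=True the dictionary contains a 'cmedians' entry, and because showmeans=False there is no 'cmeans' entry; the extrema lines are included by default.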

3.5 Bar chart

# Import packages
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Read the data
df=pd.read_csv("./data/HR.csv")
s = df["salary"]
df.head(4)
# Bar chart
plt.title("SALARY")
plt.xlabel("salary")
plt.ylabel("count")

# Label the ticks on the horizontal axis
plt.xticks(np.arange(len(s.value_counts()))+0.5,s.value_counts().index)

# Set the minimum and maximum values of the x and y axes
plt.axis([0,4,0,10000])

# bar(x positions as int or float, bar heights as int or float, bar width between 0 and 1)
plt.bar(np.arange(len(s.value_counts()))+0.5,s.value_counts(),width=0.5)

# Write the count above each bar
for x,y in zip(np.arange(len(s.value_counts()))+0.5,s.value_counts()):
    plt.text(x,y,y,ha="center",va="bottom")
    
plt.show()

 

# Bar chart with seaborn
# Set the background style
sns.set_style(style="darkgrid")
plt.rcParams['figure.figsize']=6,6
# Set the context (font, font size, etc.)
sns.set_context(context="poster",font_scale=0.8)
# Set the color palette
# sns.set_palette("spring")
sns.set_palette(sns.color_palette("RdBu",n_colors=7))
sns.countplot(x="salary",data=df)

'''
plt.rcParams['font.sans-serif']=['SimHei'] # display Chinese labels
plt.rcParams['axes.unicode_minus']=False # display minus signs
plt.rcParams['figure.figsize'] = (16.0, 10.0) # adjust the size of generated figures
'''
# Set the figure size
# hue draws one group of bars per department within each salary level
plt.rcParams['figure.figsize']=10,8
sns.countplot(x="salary",hue="department",data=df)

 

3.6 Histogram

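# Remove missing values and outliers
df=df.dropna(how="any",axis=0)
# Combine the conditions into one mask instead of chaining boolean indexing
df=df[(df["last_evaluation"]<=1) & (df["salary"]!="nme") & (df["department"]!="sale")]

# Histogram
# Note: distplot is deprecated in recent seaborn releases; histplot and displot replace it
f = plt.figure()
f.add_subplot(1,3,1)
# distplot(kde=False): do not draw the density (KDE) curve
# distplot(hist=False): do not draw the histogram bars
sns.distplot(df["satisfaction_level"],kde=False,bins=10)
f.add_subplot(1,3,2)
sns.distplot(df["last_evaluation"],bins=10)
f.add_subplot(1,3,3)
sns.distplot(df["average_monthly_hours"],bins=10)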
plt.show()

3.7 Box plot

# Box plot
sns.boxplot(y=df["last_evaluation"],saturation=0.75)
plt.show()

3.8 Line chart

# Line chart
# First approach
# sub_df=df.groupby("time_spend_company").mean()
# sns.pointplot(x=sub_df.index,y=sub_df["left"])

# Another approach
sns.pointplot(x="time_spend_company",y="left",data=df)

3.9 Pie Chart

# Labels
lbs=df["department"].value_counts().index
# explode: pull a wedge away from the center; the "sales" wedge is offset by 0.1 of the radius
explodes=[0.1 if i=="sales" else 0 for i in lbs ]

# autopct: show percentages with one decimal place
plt.pie(df["department"].value_counts(normalize=True),explode=explodes,autopct='%1.1f%%',colors=sns.color_palette("Reds", n_colors=7),labels=lbs)
plt.show()

 


Origin blog.csdn.net/qq_29644709/article/details/114697327