Python - matplotlib in practice: plt.bar, plt.hist, plt.plot, plt.boxplot, plt.scatter

Bar chart plt.bar

Stacked bar chart: pd.pivot_table() + plt.bar(bottom=)

To compare the quality scores of different products across quarters, first reshape the data with pd.pivot_table(), which sets a new index column and new column headings (columns), and then draw the result as a bar chart.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data=pd.read_csv(r'my_csv_date.csv',encoding='gbk')
print(data)
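# Fix garbled Chinese characters in the plot
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
plt.figure(figsize=(6.4,4.8))  # figure size in inches (width, height)
print('Pivoted data: ==========')
# pivot_table rebuilds the table: one column of data becomes the index, another becomes the
# column headings (columns), and values names the column being aggregated. Prefer a numeric
# index where possible, because it is sorted automatically.
over_view=pd.pivot_table(data=data,index='Q',columns='GOODS',values='QUA',aggfunc='sum')
print(over_view)
# Product columns are selected by position (first, second, third product)
first,second,third=over_view.iloc[:,0],over_view.iloc[:,1],over_view.iloc[:,2]
# tick_label replaces the x-axis labels (first to fourth quarter)
plt.bar(x=over_view.index.values,height=first,color='green',tick_label=['第一季度','第二季度','第三季度','第四季度'])
plt.bar(x=over_view.index.values,height=second,bottom=first,color='red')
plt.bar(x=over_view.index.values,height=third,bottom=first+second,color='orange')
plt.xticks(rotation=45)  # rotate the x-axis labels by 45 degrees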
plt.ylabel('质量分数',fontsize=20,labelpad=20)
plt.show()

From the program above and its output, note the following points:
1. Prefer an index whose values sort naturally, such as numbers, so the bars come out in order automatically; then use plt.bar(tick_label=[]) to replace the x-axis labels with readable text.
2. The bars are drawn from the pivoted table (its index and columns), not from the raw data, and the aggregation is usually a sum ('sum' or np.sum), which yields numeric values.
3. To keep the layers from covering each other, pass bottom= to plt.bar with the heights of the data that should sit underneath. Without it, every series starts from the horizontal axis, and a series that is completely covered makes the chart meaningless.
In the example above, three products are scored on service quality in each quarter. The quarter and the product name (A, B, C) serve as the horizontal and vertical dimensions, and the service quality score is the value displayed.
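
The bottom= stacking described in these notes can be sketched on a small made-up table. The column names Q, GOODS and QUA mirror the CSV used above, but every value here is invented for illustration, and the running `bottom` accumulator is just one possible way to avoid writing each partial sum by hand. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: quality score (QUA) per quarter (Q) and product (GOODS).
data = pd.DataFrame({
    "Q":     [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "GOODS": ["A", "B", "C"] * 4,
    "QUA":   [3, 4, 5, 4, 4, 3, 5, 2, 4, 3, 5, 5],
})

# Rows become quarters, columns become products, cells hold the summed score.
over_view = pd.pivot_table(data, index="Q", columns="GOODS",
                           values="QUA", aggfunc="sum")

fig, ax = plt.subplots()
bottom = pd.Series(0, index=over_view.index)  # running height of each stack
for goods, color in zip(over_view.columns, ["green", "red", "orange"]):
    ax.bar(over_view.index, over_view[goods], bottom=bottom,
           color=color, label=goods)
    bottom = bottom + over_view[goods]  # the next layer starts on top of this one

ax.set_xticks(list(over_view.index))
ax.set_xticklabels(["Q1", "Q2", "Q3", "Q4"])
ax.legend()
```

After the loop, `bottom` holds the total height of each stacked bar, which is a quick way to check that no layer was drawn on top of another.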

Percentage stacked bar chart: pd.crosstab(normalize=) + plt.bar(bottom=)

A stacked chart on a common scale makes the percentage contributed by each category clear.
When drawing the bars, set the label attribute on each call; otherwise plt.legend() has nothing to display.
If plt.legend() puts the legend in an unsuitable position, pass bbox_to_anchor=(x, y) to plt.legend(). The upper right corner of the axes is (1, 1), so a value such as (1.01, 0.8) places the legend just outside the plot.

# The preceding lines are the same as in the previous program
over_view=pd.crosstab(data.Q,data.GOODS,normalize='index',values=data.QUA,aggfunc='sum')
print(over_view)
# Product columns are selected by position (first, second, third product)
first,second,third=over_view.iloc[:,0],over_view.iloc[:,1],over_view.iloc[:,2]
plt.bar(x=over_view.index.values,height=first,color='green',label='甲',tick_label=['第一','第二','第三','第四'])
plt.bar(x=over_view.index.values,height=second,bottom=first,label='乙',color='red')
plt.bar(x=over_view.index.values,height=third,bottom=first+second,label='丙',color='orange')
plt.xticks(rotation=45)  # rotate the x-axis labels
plt.ylabel('质量分数',fontsize=20,labelpad=20)
plt.legend(bbox_to_anchor=(1.01,0.8))
plt.show()

Program analysis: with pd.crosstab(normalize='index'), the values in each row (that is, for each index value) are converted to shares of that row's total, so every stacked bar adds up to 1.
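
A minimal sketch of what normalize='index' does, using a tiny made-up table (the column names follow the CSV above, the numbers are invented):

```python
import pandas as pd

# Hypothetical scores for two quarters and three products.
data = pd.DataFrame({
    "Q":     [1, 1, 1, 2, 2, 2],
    "GOODS": ["A", "B", "C"] * 2,
    "QUA":   [2, 3, 5, 1, 1, 2],
})

# Sum QUA per (Q, GOODS) cell, then divide each row by its row total.
shares = pd.crosstab(data.Q, data.GOODS, values=data.QUA,
                     aggfunc="sum", normalize="index")
row_totals = shares.sum(axis=1)

print(shares)
print(row_totals)
```

Each row of `shares` sums to 1, so the stacked bars built from it all reach the same height and only the proportions differ.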

Histogram plt.hist

Normal distribution function: the function below returns the normal probability density for each value of x.

def zhengtai_func(x,miu,sigma):
    zhen_y=np.exp(-(x-miu)**2/(2*(sigma**2)))/(sigma*np.sqrt(2*np.pi))
    return zhen_y

Normal distribution formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

data=pd.read_csv(r'my_csv_date.csv',encoding='gbk')
print(data)
# Fix garbled Chinese characters in the plot
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False

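# Drop rows with a missing 人数 (head count) value, then compute the mean,
# the standard deviation, and the points of the normal density curve
data.dropna(subset=['人数'],inplace=True)
print(data)
#x_mean=np.mean(data.人数)  # equivalent to the next line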
x_mean=data.人数.mean()
x_std=np.std(data.人数)
x=np.arange(data.人数.min(),data.人数.max(),0.1)
y=zhengtai_func(x,x_mean,x_std)

# Draw the histogram, the normal curve and the kernel density curve; label= sets the legend text
plt.hist(x=data.人数,bins=5,color='lightblue',label='年龄频率',
         edgecolor='orange',density=True)
plt.plot(x,y,color='red',linewidth=3,label='正态分布线')
data.人数.plot(kind='kde',color="black",xlim=[0,10],label='核密度图')
plt.xlabel('不同年龄',fontsize=15,labelpad=20)
plt.title('年龄频率分布图',fontsize=25,pad=25)
plt.legend(loc='best')
plt.show()

Analysis of the program and the figure:
1. Define a function that computes the normal probability density. It is called with the data x, the mean x_mean and the standard deviation x_std, and it returns the computed density values.
2. Read the data and delete rows with missing values using data.dropna(). Compute the mean and standard deviation, let x take many values across the data range, and draw the normal curve from the formula (the mean and standard deviation of the data determine the normal distribution).
3. Draw the histogram with plt.hist(), the normal curve with plt.plot(x, y), and the kernel density curve with data.plot(kind='kde').
4. Difference between the normal curve and the kernel density curve: the kernel density estimate follows the data closely at every point, while the normal curve is fixed entirely once the mean and standard deviation are known.
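
The reason density=True is passed to plt.hist in the program above can be checked numerically: only then do the bars share a scale with the normal curve. The sketch below uses randomly generated data in place of the CSV column, and the name `normal_pdf` is a stand-in for the function defined earlier.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Same formula as the normal density function defined above.
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Made-up sample standing in for the real data column.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=1.5, size=2000)
mu, sigma = sample.mean(), sample.std()

# With density=True, bar height times bar width sums to 1 over all bins.
heights, edges = np.histogram(sample, bins=20, density=True)
hist_area = (heights * np.diff(edges)).sum()

# The normal density also integrates to 1 (approximated here by a Riemann sum).
x = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 2001)
pdf_area = (normal_pdf(x, mu, sigma) * (x[1] - x[0])).sum()

print(hist_area, pdf_area)
```

Both areas come out at about 1, so the histogram and the curve can be overlaid directly. With raw counts on the y-axis the curve would look flat by comparison.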

Box plot plt.boxplot()

References: box plot basics, parameter descriptions

plt.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, bootstrap=None, usermedians=None, conf_intervals=None, meanline=None, showmeans=None, showcaps=None, showbox=None, showfliers=None, boxprops=None, labels=None, flierprops=None, medianprops=None, meanprops=None, capprops=None, whiskerprops=None, manage_xticks=True, autorange=False, zorder=None, hold=None, data=None)
(This signature comes from an older matplotlib release: hold was later removed and manage_xticks was renamed manage_ticks.)
x: the data to plot;
notch: whether to draw a notched box plot; not notched by default;
sym: the marker used for outlier points; the default is the + sign;
vert: whether to draw the boxes vertically; vertical by default;
whis: how far the whiskers reach beyond the lower and upper quartiles, as a multiple of the interquartile range; the default is 1.5;
positions: the positions of the boxes; the default is [1, 2, ..., N] for N boxes;
widths: the width of each box; the default is 0.5;
patch_artist: whether to fill the box with color;
meanline: whether to draw the mean as a line; by default it is drawn as a point;
showmeans: whether to show the mean; hidden by default;
showcaps: whether to show the caps at the ends of the whiskers; shown by default;
showbox: whether to show the box itself; shown by default;
showfliers: whether to show outliers; shown by default;
boxprops: box properties, such as border color and fill color;
labels: labels for the boxes, similar in role to a legend;
flierprops: outlier properties, such as marker shape, size and fill color;
medianprops: median line properties, such as line style and width;
meanprops: mean properties, such as point size and color;
capprops: properties of the caps at the ends of the whiskers, such as color and width;
whiskerprops: whisker properties, such as color, width and line style;

data=pd.read_csv(r'my_csv_date.csv',encoding='gbk')
print(data)
# Fix garbled Chinese characters in the plot
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False

plt.boxplot(x=data.人数,patch_artist=True,showmeans=True,meanline=True,
            boxprops={'facecolor':'green','color':'red'},
            flierprops={'marker':'o','markerfacecolor':'red','markersize':20},
            medianprops={'linestyle':'--','color':'orange'},
            meanprops={'linestyle':'-','color':'blue'})
plt.xlabel('不同标号的人数',fontsize=15,labelpad=20)
plt.title('人数箱图',fontsize=25,pad=25)
# plt.legend(loc='best')
plt.show()

Reading the figure: a box plot makes outliers easy to spot. The box spans the 25% to 75% quantiles. Lower limit: 25% quantile - 1.5 × (75% quantile - 25% quantile); upper limit: 75% quantile + 1.5 × (75% quantile - 25% quantile). Points outside these limits are drawn as outliers.
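
The limits described above can be computed by hand with NumPy. This is a sketch on a made-up sample, using NumPy's default linear-interpolation percentiles and the default factor of 1.5; the variable names are illustrative.

```python
import numpy as np

values = np.array([2, 3, 3, 4, 4, 5, 5, 6, 7, 20])  # made-up sample

q1, q3 = np.percentile(values, [25, 75])  # lower and upper quartiles
iqr = q3 - q1                             # interquartile range (height of the box)
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Anything outside the two limits would be drawn as an outlier point.
outliers = values[(values < lower_limit) | (values > upper_limit)]
print(q1, q3, lower_limit, upper_limit, outliers)
```

Here the quartiles are 3.25 and 5.75, the limits are -0.5 and 9.5, and the value 20 is the only outlier. Note that matplotlib draws each whisker to the most extreme data point that still lies inside the limit, not to the limit itself.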

Scatter plot plt.scatter

Thanks to the Iris dataset
Iris dataset official site (it may be slow to download or open)
To show the relationship between two continuous variables, use a loop to give each species its own display attributes. Make sure the column and species names match the data exactly.

data=pd.read_csv(r'iris.csv',encoding='gbk')

# print(data)
# Fix garbled Chinese characters in the plot
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False

# Color and marker for each species
species=['virginica','setosa','versicolor']
colors=['red','blue','orange']
marker=['o','s','x']

# Draw each species with its own color and marker
for i in range(0,3):
    plt.scatter(x=data.Width[data.Species==species[i]],
                y=data.Length[data.Species==species[i]],
                color=colors[i],marker=marker[i],label=species[i])
plt.legend(loc='best')
plt.ylabel('花瓣长度',fontsize=15,labelpad=10)
plt.xlabel('花瓣宽度',fontsize=15,labelpad=10)
plt.title('三种鸢尾花数据',fontsize=20,pad=20)

plt.show()
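
As an alternative to indexing three parallel lists, the same per-species loop can be sketched with groupby and a style dictionary. The six rows below are invented stand-ins for iris.csv, with the column names Species, Width and Length assumed from the program above; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows in the same layout as the iris file used above.
data = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
    "Width":   [0.2, 0.3, 1.3, 1.5, 2.1, 2.3],
    "Length":  [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
})

# One (color, marker) pair per species, looked up by name.
styles = {
    "setosa": ("blue", "s"),
    "versicolor": ("orange", "x"),
    "virginica": ("red", "o"),
}

fig, ax = plt.subplots()
for name, group in data.groupby("Species"):
    color, marker = styles[name]
    ax.scatter(group["Width"], group["Length"],
               color=color, marker=marker, label=name)
ax.legend(loc="best")
```

Looking styles up by name means a misspelled species raises a KeyError immediately instead of silently plotting an empty group.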


Line chart plt.plot()

Displaying trends in data

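# Set the figure size and read the data
fig=plt.figure(figsize=(8,7))
data=pd.read_csv(r'my_csv_date.csv',encoding='gbk')
print(data)

# Fix garbled Chinese characters in the plot
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False

# Use column 1 as the x values and columns 3, 4 and 5 as the y values
plt.plot(data.iloc[:,0],data.iloc[:,2],'bs--',
         data.iloc[:,0],data.iloc[:,3],'ro--',
         data.iloc[:,0],data.iloc[:,4],'gh--')

# Relabel the x-axis ticks: the first argument gives the tick positions, the second the new labels
plt.xticks(range(0,12,1),data.iloc[range(0,12,1),1],rotation=45,fontsize=10)
# The lines carry no label=, so pass the column names to the legend explicitly
plt.legend(list(data.columns[2:5]),loc='best')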
plt.ylabel('数据值',fontsize=15,labelpad=10)
plt.xlabel('顺序',fontsize=15,labelpad=10)
plt.title('数据变化',fontsize=20,pad=20)

plt.show()

The following shows legend options for the figure (the program above and the image do not correspond; the parameters below are the correct ones):

plt.legend(loc='best',frameon=False,ncol=1)
plt.legend(loc='best',frameon=True,ncol=1)
plt.legend(loc='best',frameon=False,ncol=3)
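
A small sketch combining the legend options from this post: frameon, ncol, and bbox_to_anchor for placing the legend outside the axes. The three series and their labels are made up, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = range(5)
# Each line gets a label= so the legend has something to show.
ax.plot(x, [1, 2, 3, 4, 5], "bs--", label="series 1")
ax.plot(x, [2, 2, 3, 3, 4], "ro--", label="series 2")
ax.plot(x, [5, 4, 3, 2, 1], "gh--", label="series 3")

# (1, 1) is the upper right corner of the axes, so (1.01, 0.8) sits just outside it.
# When bbox_to_anchor is given, loc names the corner of the legend box placed there.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.01, 0.8),
                   frameon=False, ncol=1)
```

Setting ncol=3 instead would lay the three entries out in a single row, and frameon=True would draw the border around them.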


Origin blog.csdn.net/weixin_43794311/article/details/105102003