[Python Data Processing - DataFrame Data Visualization] Pie chart, scatter chart, line chart, column chart, histogram




5. Data visualization

Related Notes

4.4 Python data processing: Matplotlib series (4)—plt.bar() and plt.barh bar charts

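Make the canvas wider

fig = plt.figure(figsize=(12,4))    # set the canvas size (width, height in inches)

Adjust the label font size

plt.tick_params(axis='x', labelsize=8)    # set the size of the x-axis tick labels

Rotate the labels

plt.bar(df['sport_type'], df['score'])
plt.xticks(rotation=60)    # rotate the x-axis tick labels by 60 degrees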
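
The three settings above (canvas size, tick label size, label rotation) can be combined in one figure. This is a minimal sketch, assuming a small made-up DataFrame in place of the df used in the notes; the values in `df_demo` and the 45-degree angle are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data: the real df is not shown in the notes.
df_demo = pd.DataFrame({
    "sport_type": ["basketball", "badminton", "swimming", "table tennis"],
    "score": [82, 75, 91, 68],
})

fig = plt.figure(figsize=(12, 4))          # wide canvas: 12 x 4 inches
bars = plt.bar(df_demo["sport_type"], df_demo["score"])
plt.tick_params(axis="x", labelsize=8)     # smaller x-axis tick labels
plt.xticks(rotation=45)                    # rotate the x-axis tick labels
```

In a notebook, the figure is shown after the cell runs; in a script, call plt.show() or fig.savefig(...) at the end.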

4.5.1 Pie chart: plt.pie(gb2.人数, labels=gb2.index, autopct='%.2f%%', colors=['b', 'pink', (0.5, 0.8, 0.3)], explode=[0, 0, 0, 0, 0.1])

Pie chart: also called a circle chart, it is a circular statistical chart divided into sectors. It shows at a glance what share of the whole each part accounts for.

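pie(x, labels=..., colors=..., explode=..., autopct=...)

x          sequence of values to plot
labels     label for each wedge
colors     color for each wedge; a color can be a name or an RGB tuple
explode    sequence of offsets, one per wedge; a non-zero value pulls that wedge out
autopct    format of the percentage text; '%.2f%%' keeps two decimal places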
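import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel(r'E:\Python\第4章数据\rz4.xlsx')
df    # display the data (notebook cell output)

# count the students in each class: 班级 = class, 学号 = student number, 人数 = head count
gb=df.groupby(by=['班级'])['学号'].agg([('人数',np.size)])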


  plt.pie(gb.人数,labels=gb.index,autopct='%.2f%%',colors=['b','pink',(0.5,0.8,0.3)],explode=[0,0.2,0])
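
The group-then-plot steps above can be tried without the Excel file. This is a sketch under stated assumptions: `df_demo`, its column names, and the class labels are made up, and the count uses pandas named aggregation with 'size' rather than the np.size tuple form from the notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: eight students in three classes.
df_demo = pd.DataFrame({
    "student_id": range(1, 9),
    "class": ["A", "A", "B", "B", "C", "C", "C", "C"],
})

# One row per class, with the number of students in a column named n_students.
gb_demo = df_demo.groupby("class")["student_id"].agg(n_students="size")

fig, ax = plt.subplots(figsize=(6, 6))      # square canvas keeps the pie circular
wedges, texts, autotexts = ax.pie(
    gb_demo["n_students"],
    labels=gb_demo.index,
    autopct="%.2f%%",                       # percentage text with two decimals
    colors=["b", "pink", (0.5, 0.8, 0.3)],  # color letter, color name, RGB tuple
    explode=[0, 0.2, 0],                    # pull the second wedge out
)
percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)  # ['25.00%', '25.00%', '50.00%']
```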

Insert image description here

Practice

plt.rcParams['font.sans-serif']=['SimHei']  # font that can render the Chinese labels
plt.rcParams['font.size']=30  # font size
plt.rcParams['figure.figsize']=[6,6]   # square canvas, so the pie is a true circle
df2= pd.read_excel(r'E:\Python\第4章数据\09电动1.xls')
# grade labels: 不及格 = fail, 及格 = pass, 中等 = fair, 良好 = good, 优秀 = excellent
df2['C语言程序设计']=pd.cut(df2.C语言程序设计,bins=[0,60,70,80,90,101],right=False,labels=['不及格','及格','中等','良好','优秀']) # right=False makes each bin closed on the left and open on the right
gb2=df2.groupby(by=['C语言程序设计'])['学号'].agg([('人数',np.size)]).fillna(0)  # fillna(0) replaces missing values with 0
plt.pie(gb2.人数,labels=gb2.index,autopct='%.2f%%',colors=['b','pink',(0.5,0.8,0.3)],explode=[0,0,0,0,0.1])
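
The binning step is the part of this exercise that is easiest to get wrong, so here is a small sketch of how pd.cut with right=False assigns boundary values. The scores are made up, and the English labels stand in for the Chinese grade labels used above.

```python
import pandas as pd

# Hypothetical scores, chosen to land on the bin edges.
scores = pd.Series([45, 60, 69, 70, 85, 90, 100])

grades = pd.cut(
    scores,
    bins=[0, 60, 70, 80, 90, 101],   # upper edge 101 so that a score of 100 is included
    right=False,                     # bins are [0, 60), [60, 70), ..., [90, 101)
    labels=["fail", "pass", "fair", "good", "excellent"],
)
counts = grades.value_counts(sort=False)

print(list(grades.astype(str)))
# ['fail', 'pass', 'pass', 'fair', 'good', 'excellent', 'excellent']
```

With the default right=True, a score of 60 would fall into the first bin instead, which is why the notes set right=False.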


  • plt.rcParams holds matplotlib's current default settings. For example, plt.rcParams['font.sans-serif'] lists the sans-serif fonts that are tried in order:

    plt.rcParams
    
    plt.rcParams['font.sans-serif']
    Out[43]: 
    ['DejaVu Sans',
     'Bitstream Vera Sans',
     'Computer Modern Sans Serif',
     'Lucida Grande',
     'Verdana',
     'Geneva',
     'Lucid',
     'Arial',
     'Helvetica',
     'Avant Garde',
     'sans-serif']
    
    plt.rcParams['font.sans-serif']=['SimHei']
    plt.rcParams['font.sans-serif']=['SimHei','...','...'] # if a font is not available, the next one in the list is tried
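
To see which font from such a list is actually installed, one option is to compare the names against the fonts matplotlib has registered. This is a sketch only: `first_available_font` is a hypothetical helper, and the candidate list is an assumption (SimHei from the notes, plus two other font names for illustration).

```python
from matplotlib import font_manager


def first_available_font(candidates):
    """Return the first candidate font name that matplotlib knows about, else None."""
    installed = {font.name for font in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


# DejaVu Sans ships with matplotlib, so this list always finds a match.
chosen = first_available_font(["SimHei", "Microsoft YaHei", "DejaVu Sans"])
print(chosen)
```

If the result is not a font with Chinese glyphs, Chinese labels will most likely render as empty boxes.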
    

4.5.2 Scatter plot: plt.plot(df.高代, df.数分, 'o', color='pink')

Scatter plot: a chart that puts one variable on the horizontal axis and another on the vertical axis, and uses the distribution of the points to show how the two variables are related.

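plot(x, y, '.', color=(r,g,b))
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')
plt.grid(True)
x, y        sequences for the X axis and the Y axis
'.' 'o'     small dots or large dots
color       color of the points; either an RGB tuple or a color name or letter
RGB color:  (red, green, blue), the red, green and blue components, each between 0 and 1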
df = pd.read_excel(r'E:\Python\第4章数据\rz4.xlsx')
df

gb=df.groupby(by=['班级'])['学号'].agg([('人数',np.size)])
plt.plot(df.英语,df.数分,'.',color='g')    # 英语 = English, 数分 = math analysis
plt.xlabel('英语')
plt.ylabel('数分')    # label for the y axis
plt.plot(df.高代,df.数分,'o',color='pink')    # 高代 = advanced algebra


plt.plot(df.高代,df.数分,'-',color='pink') # join the points with a line
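
A self-contained version of the scatter plot, with axis labels and a grid, might look like this. The data and the column names `x_score` and `y_score` are hypothetical stand-ins for the score columns in rz4.xlsx.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores for five students.
df_demo = pd.DataFrame({
    "x_score": [40, 47, 45, 72, 61],
    "y_score": [23, 47, 45, 62, 47],
})

fig, ax = plt.subplots()
# 'o' draws large dots and no connecting line, which gives a scatter plot.
(points,) = ax.plot(df_demo["x_score"], df_demo["y_score"], "o", color="pink")
ax.set_xlabel("x_score")
ax.set_ylabel("y_score")
ax.grid(True)
```

Replacing 'o' with '-' joins the same points with a line in row order, which is rarely meaningful unless the x values are sorted first.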

4.5.3 Line chart: plt.plot(df.学号, df.总分, '-', color='r')

Format string   Meaning
-               solid line
--              dashed line
-.              dash-dot line
:               dotted line
.               small dots (scatter plot)
o               large dots (scatter plot)
,               pixel markers (smaller than dots)
*               star markers
>               right-pointing triangle markers
<               left-pointing triangle markers
1 (2, 3, 4)     tri-down (tri-up, tri-left, tri-right) markers
s               square markers
p               pentagon markers
v               downward-pointing triangle markers
^               upward-pointing triangle markers
h               hexagon markers
d               thin diamond markers
       df = pd.read_excel(r'E:\Python\第4章数据\rz4.xlsx')
       df
       plt.plot(df.学号,df.总分,'-',color='r')
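
The format strings in the table can also be combined: one string may carry a color letter, a marker and a line style together. This is a short sketch with made-up numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

fig, ax = plt.subplots()
(line_a,) = ax.plot(x, [1, 4, 9, 16], "r--")   # red dashed line, no markers
(line_b,) = ax.plot(x, [2, 3, 5, 8], "g^")     # green upward triangles, no line
(line_c,) = ax.plot(x, [1, 2, 3, 4], "bs-.")   # blue squares joined by a dash-dot line

print(line_a.get_linestyle(), line_b.get_marker(), line_c.get_linestyle())
# -- ^ -.
```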

Columns: student number (学号), class (班级), name, gender, English (英语), physical education, military training, math analysis (数分), advanced algebra (高代), computer basics, total score (总分)

 0  2308024241  23080242  Jackie Chan     male    76  78  77  40  23  60  89  443
 1  2308024244  23080242  Zhou Yi         female  66  91  75  47  47  44  82  452
 2  2308024251  23080242  Zhang Bo        male    85  81  75  45  45  60  80  471
 3  2308024249  23080242  Zhu Hao         male    65  50  80  72  62  71  82  482
 4  2308024219  23080242  Seal            female  73  88  92  61  47  46  83  490
 5  2308024201  23080242  Chi Pei         male    60  50  89  71  76  71  82  499
 6  2308024347  23080243  Li Hua          female  67  61  84  61  65  78  83  499
 7  2308024307  23080243  Chen Tian       male    76  79  86  69  40  69  82  501
 8  2308024326  23080243  Yu Hao          male    66  67  85  65  61  71  95  510
 9  2308024320  23080243  Li Jia          female  62  60  90  60  67  77  95  511
10  2308024342  23080243  Li Shangchu     male    76  90  84  60  66  60  82  518
11  2308024310  23080243  Guo Dou         female  79  67  84  64  64  79  85  522
12  2308024435  23080244  Jiang Yitao     male    77  71  87  61  73  76  82  527
13  2308024432  23080244  Zhao Yu         male    74  74  88  68  70  71  85  530
14  2308024446  23080244  Zhou Lu         female  76  80  77  61  74  80  85  533
15  2308024421  23080244  Lin Jianxiang   male    72  72  81  63  90  75  85  538
16  2308024433  23080244  Li Daqiang      male    79  76  77  78  70  70  89  539
17  2308024428  23080244  Li Xiotong      male    64  96  91  69  60  77  83  540
18  2308024402  23080244  Wang Hui        female  73  74  93  70  71  75  88  544
19  2308024422  23080244  Li Xiaoliang    male    85  60  85  72  72  83  89  546

      df=df.sort_values('学号') 


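     df['学号后三位']=df.学号.astype(str).str.slice(-3,) # take the last three digits of the student number as a new column (the 学号 column itself is unchanged)
     plt.plot(df.学号后三位,df.总分,'-',color='r')  # draw the line chart
     plt.xticks(rotation=60)  # rotate the tick labels by 60 degrees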
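
The string-slicing step can be checked on its own. This is a minimal sketch using three student numbers taken from the table above.

```python
import pandas as pd

# Three student numbers from the data shown earlier.
ids = pd.Series([2308024241, 2308024347, 2308024402])

# Convert to strings first; .str.slice(-3) then keeps the last three characters.
last_three = ids.astype(str).str.slice(-3)

print(last_three.tolist())  # ['241', '347', '402']
```

Because the result is a column of strings, matplotlib treats the values as category labels and spaces them evenly along the x axis instead of placing them by numeric value.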


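      plt.rcParams['font.sans-serif']=['SimHei']    # set before plotting so the Chinese labels render
      plt.rcParams['figure.figsize']=[10,6]         # canvas size; only affects figures created afterwards
      df2= pd.read_excel(r'E:\Python\第4章数据\09电动1.xls')
      plt.plot(df2.姓名,df2.C语言程序设计,'--',color='g')    # 姓名 = name; green dashed line
      plt.xlabel('姓名')
      plt.ylabel('C语言程序设计')
      plt.xticks(rotation=90)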

4.5.4 Column chart: plt.bar(df.学号后三位, df.总分, width=1, color=['r', 'b'])

A column chart draws one rectangle per item, all of the same width, with a height proportional to the value. It is used to show how data changes over a period of time or to compare two or more items (by time or by category).

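 bar(x, height, width, color)
 barh(y, width, height, color)
 x         positions or category labels along the x axis (called left in old matplotlib versions); numeric positions are usually generated with arange
 height    values along the y axis, i.e. the bar heights; usually the data to display
 width     bar width; setting it to 1 is usually fine
 color     fill color of the bars
 For barh the roles are swapped: y gives the positions (called bottom in old versions), width the bar lengths, and height the bar thickness.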
df['学号后三位']=df.学号.astype(str).str.slice(-3,)
plt.bar(df.学号后三位,df.总分,width=1,color=['r','b'])  # column chart (vertical bars)
plt.xticks(rotation=60)

plt.barh(df.学号后三位,df.总分,0.6,color=['r','b'])  # bar chart (horizontal bars)
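
To compare the two calls side by side, the sketch below draws the same four values with bar and barh. The labels and totals are the first four rows of the data shown earlier; the two-panel layout and the 0.6 bar size are choices made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

labels = ["241", "244", "251", "249"]   # last three digits of the student numbers
totals = [443, 452, 471, 482]           # total scores

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: the value sets each bar's height; the two colors alternate.
v_bars = ax_v.bar(labels, totals, width=0.6, color=["r", "b"])
ax_v.tick_params(axis="x", labelrotation=60)

# Horizontal bars: the value sets each bar's width (its length).
h_bars = ax_h.barh(labels, totals, height=0.6, color=["r", "b"])
```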

[Figures: the bar (vertical) chart and the barh (horizontal) chart]

4.5.5 Histogram: plt.hist(df2.C语言程序设计, bins=10, color='g', cumulative=True)

Histogram: drawn as a series of rectangles of equal width and varying height. The width of each rectangle is a value interval, the height is the number of data points that fall in that interval, and the overall shape shows the distribution of the data.

Use it to see how often values occur.

hist(x, color, bins, cumulative=False)
	  x            data to plot
	  color        fill color of the bars
	  bins         number of bins
	  cumulative   whether to accumulate the counts; default is False
df2= pd.read_excel(r'E:\Python\第4章数据\09电动1.xls')
plt.hist(df2.C语言程序设计,bins=10,color='g',cumulative=False)
plt.hist(df2.C语言程序设计,bins=10,color='g',cumulative=True)
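
The difference between the two calls is easiest to see in the numbers that hist returns. This is a sketch with ten made-up scores and 5 bins instead of the 10 used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

scores = [52, 61, 65, 68, 72, 75, 78, 81, 88, 95]   # hypothetical scores

fig, (ax_plain, ax_cum) = plt.subplots(1, 2, figsize=(10, 4))

# hist returns the bar heights, the bin edges, and the drawn patches.
counts, edges, _ = ax_plain.hist(scores, bins=5, color="g")
cum_counts, _, _ = ax_cum.hist(scores, bins=5, color="g", cumulative=True)

# Each cumulative bar is the running total of the plain counts.
assert np.allclose(np.cumsum(counts), cum_counts)
```

With cumulative=True the last bar always equals the number of data points, here 10.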

[Figures: histograms with cumulative=False, cumulative=True, and bins=20]


Origin blog.csdn.net/Yedge/article/details/127593686