Python data processing - DataFrame data visualization
- 5. Data visualization
- Related Notes
- 4.5.1 Pie chart: plt.pie(gb2.人数, labels=gb2.index, autopct='%.2f%%', colors=['b', 'pink', (0.5, 0.8, 0.3)], explode=[0, 0, 0, 0, 0.1])
- 4.5.2 Scatter plot: plt.plot(df.高代, df.数分, 'o', color='pink')
- 4.5.3 Line chart: plt.plot(df.学号, df.总分, '-', color='r')
- 4.5.4 Column chart: plt.bar(df.学号后三位, df.总分, width=1, color=['r', 'b'])
- 4.5.5 Histogram: plt.hist(df2.C语言程序设计, bins=10, color='g', cumulative=True)
5. Data visualization
Related Notes
4.4 Python data processing: Matplotlib series (4)—plt.bar() and plt.barh bar charts
Widen the canvas
fig = plt.figure(figsize=(12,4)) # set the canvas size
Adjust the label font size
plt.tick_params(axis='x', labelsize=8) # set the x-axis tick label size
Rotate the labels
plt.bar(df['sport_type'], df['score'])
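The three tweaks above can be combined in one small script. This is only a sketch: the DataFrame contents are made up, and the 45-degree rotation angle is an assumed value, not one taken from these notes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; only the column names follow the snippet above.
df = pd.DataFrame({
    "sport_type": ["running", "swimming", "cycling", "basketball"],
    "score": [82, 91, 77, 68],
})

fig = plt.figure(figsize=(12, 4))        # wide canvas: 12 x 4 inches
plt.bar(df["sport_type"], df["score"])
plt.tick_params(axis="x", labelsize=8)   # smaller x-axis tick labels
plt.xticks(rotation=45)                  # assumed angle; pick what fits your labels
```

Rotating the labels is mostly useful when the category names are long enough to overlap at the chosen canvas width.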
4.5.1 Pie chart: plt.pie(gb2.人数, labels=gb2.index, autopct='%.2f%%', colors=['b', 'pink', (0.5, 0.8, 0.3)], explode=[0, 0, 0, 0, 0.1])
Pie chart: also known as a circle chart, it is a circular statistical chart divided into sectors. It shows at a glance what proportion of the whole each part makes up.
pie(x,labels,colors,explode,autopct)
x: the sequence of values to plot
labels: the label for each wedge
colors: the color of each wedge; RGB tuples can be used
explode: a sequence of offsets, one per wedge, for the wedges to pull out
autopct: display format of the percentages; %.2f keeps two decimal places
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel(r'E:\Python\第4章数据\rz4.xlsx')
df
gb=df.groupby(by=['班级'])['学号'].agg([('人数',np.size)])
plt.pie(gb.人数,labels=gb.index,autopct='%.2f%%',colors=['b','pink',(0.5,0.8,0.3)],explode=[0,0.2,0])
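To try the same call without the Excel file, here is a minimal sketch with made-up class sizes standing in for the grouped result gb. The Series name and the class labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical head counts per class (stand-in for gb.人数).
counts = pd.Series([6, 6, 8], index=["class A", "class B", "class C"])

fig, ax = plt.subplots(figsize=(6, 6))      # square canvas keeps the pie round
ax.pie(
    counts,
    labels=counts.index,
    autopct="%.2f%%",                       # two decimals plus a literal % sign
    colors=["b", "pink", (0.5, 0.8, 0.3)],  # letter code, color name, RGB tuple
    explode=[0, 0.2, 0],                    # pull the second wedge outward
)
```

Note that explode needs one entry per wedge, while colors may be shorter because Matplotlib cycles through the list.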
Practice
df2= pd.read_excel(r'E:\Python\第4章数据\09电动1.xls')
df2['C语言程序设计']=pd.cut(df2.C语言程序设计,bins=[0,60,70,80,90,101],right=False,labels=['不及格','及格','中等','良好','优秀']) # right=False makes each interval closed on the left and open on the right
gb2=df2.groupby(by=['C语言程序设计'])['学号'].agg([('人数',np.size)]).fillna(0) # fillna(0) fills missing values with 0
plt.pie(gb2.人数,labels=gb2.index,autopct='%.2f%%',colors=['b','pink',(0.5,0.8,0.3)],explode=[0,0,0,0,0.1])
plt.rcParams['font.sans-serif']=['SimHei'] # font
plt.rcParams['font.size']=30 # font size
plt.rcParams['figure.figsize']=[6,6] # square canvas, so the pie is a true circle
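The binning step in the practice code is easier to follow on a handful of values. The sketch below uses made-up scores and English grade labels, and it counts with value_counts instead of the groupby used above, so treat it as an illustration of pd.cut rather than a copy of that code.

```python
import pandas as pd

scores = pd.Series([59, 60, 69, 85, 90, 100])   # hypothetical scores
grades = pd.cut(
    scores,
    bins=[0, 60, 70, 80, 90, 101],
    right=False,   # intervals are [0, 60), [60, 70), ..., [90, 101)
    labels=["fail", "pass", "fair", "good", "excellent"],
)

# For a categorical result, value_counts also reports grades with no students.
counts = grades.value_counts(sort=False)
print(counts)
```

With right=False a score of exactly 60 lands in "pass" and 90 in "excellent"; the upper edge of 101 is what lets a score of 100 fall inside the last interval.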
plt.rcParams shows and sets the configured values, such as the font
plt.rcParams
plt.rcParams['font.sans-serif']
Out[43]: ['DejaVu Sans', 'Bitstream Vera Sans', 'Computer Modern Sans Serif', 'Lucida Grande', 'Verdana', 'Geneva', 'Lucid', 'Arial', 'Helvetica', 'Avant Garde', 'sans-serif']
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['font.sans-serif']=['SimHei','...','...'] # if a font is not available, the next one in the list is tried
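Assigning to plt.rcParams changes the setting for the rest of the session. If you only want the font for one figure, a possible alternative is plt.rc_context, sketched below. The fallback font 'DejaVu Sans' is my assumption; use whatever is installed on your machine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

default_fonts = list(plt.rcParams["font.sans-serif"])   # remember the current list

# rc_context applies the settings only inside the with-block.
with plt.rc_context({"font.sans-serif": ["SimHei", "DejaVu Sans"], "font.size": 30}):
    inside = list(plt.rcParams["font.sans-serif"])
    size_inside = plt.rcParams["font.size"]

restored = list(plt.rcParams["font.sans-serif"])        # back to the old value
```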
4.5.2 Scatter plot: plt.plot(df.高代, df.数分, 'o', color='pink')
Scatter plot: a graph that puts one variable on the horizontal axis and another on the vertical axis, and uses the distribution of the points to show the relationship between the two variables.
plot(x,y,'.',color=(r,g,b))
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')
plt.grid(True)
x, y: the sequences for the X axis and the Y axis
'.' or 'o': small dots or large dots
color: the color of the points; it can be given as an RGB tuple or as a letter code
RGB color setting: (red, green, blue), the red, green, and blue components, each between 0 and 1
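A small self-contained sketch of the template above, with made-up scores and axis names, showing an RGB tuple as the color:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [55, 62, 70, 78, 85, 93]   # hypothetical scores in one subject
y = [50, 65, 68, 80, 82, 95]   # hypothetical scores in another subject

fig, ax = plt.subplots()
(points,) = ax.plot(x, y, "o", color=(0.2, 0.4, 0.8))  # RGB components in 0..1
ax.set_xlabel("subject A")
ax.set_ylabel("subject B")
ax.grid(True)
```

Because the format string contains only a marker, no connecting line is drawn, which is what makes plot behave like a scatter plot here.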
df = pd.read_excel(r'E:\Python\第4章数据\rz4.xlsx')
df
gb=df.groupby(by=['班级'])['学号'].agg([('人数',np.size)])
plt.plot(df.英语,df.数分,'.',color='g')
plt.xlabel('英语')
plt.ylabel('数分')
plt.plot(df.高代,df.数分,'o',color='pink')
plt.plot(df.高代,df.数分,'-',color='pink') # connect the points with a line
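One thing to keep in mind with '-': the points are joined in row order, so unsorted data gives a zigzag. The sketch below, on a made-up four-row frame, draws the same data unsorted and then sorted by x. Sorting first is my suggestion, not a step from the notes above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, deliberately out of order on x.
df_demo = pd.DataFrame({"x": [3, 1, 4, 2], "y": [9, 1, 16, 4]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(df_demo.x, df_demo.y, "-")             # joined in row order: zigzag
ordered = df_demo.sort_values("x")
(line,) = ax2.plot(ordered.x, ordered.y, "-")   # joined from left to right
```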
4.5.3 Line chart: plt.plot(df.学号, df.总分, '-', color='r')
Parameter value | Comment |
---|---|
- | solid line |
-- | dashed line |
-. | dash-dot line |
: | dotted line |
. | small dots (scatter plot) |
o | large dots (scatter plot) |
, | pixel markers (smaller than dots) |
* | star markers |
> | right-pointing triangle markers |
< | left-pointing triangle markers |
1 (2, 3, 4) | tri-line markers pointing down (up, left, right) |
s | square markers |
p | pentagon markers |
v | downward-pointing triangle markers |
^ | upward-pointing triangle markers |
h | hexagon markers |
d | thin diamond markers |
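The values in the table can be combined in one format string: a color letter, a marker, and a line style. A short sketch with made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
fig, ax = plt.subplots()
(a,) = ax.plot(x, [1, 2, 3, 4], "-")      # solid line
(b,) = ax.plot(x, [2, 3, 4, 5], "--")     # dashed line
(c,) = ax.plot(x, [3, 4, 5, 6], "r^")     # red upward triangles, no line
(d,) = ax.plot(x, [4, 5, 6, 7], "gs-.")   # green squares joined by a dash-dot line
```

When the format string has a marker but no line style, as in "r^", only the markers are drawn.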
df = pd.read_excel(r'E:\Python\第4章数据\rz4.xlsx')
df
plt.plot(df.学号,df.总分,'-',color='r')
df (columns: student number, class, name, gender, seven subject scores including English, physical education, military training, mathematical analysis, advanced algebra, and computer basics, then the total score):
0 2308024241 23080242 Jackie Chan male 76 78 77 40 23 60 89 443
1 2308024244 23080242 Zhou Yi female 66 91 75 47 47 44 82 452
2 2308024251 23080242 Zhang Bo male 85 81 75 45 45 60 80 471
3 2308024249 23080242 Zhu Hao male 65 50 80 72 62 71 82 482
4 2308024219 23080242 Seal female 73 88 92 61 47 46 83 490
5 2308024201 23080242 Chi Pei male 60 50 89 71 76 71 82 499
6 2308024347 23080243 Li Hua female 67 61 84 61 65 78 83 499
7 2308024307 23080243 Chen Tian male 76 79 86 69 40 69 82 501
8 2308024326 23080243 Yu Hao male 66 67 85 65 61 71 95 510
9 2308024320 23080243 Li Jia female 62 60 90 60 67 77 95 511
10 2308024342 23080243 Li Shangchu male 76 90 84 60 66 60 82 518
11 2308024310 23080243 Guo Dou female 79 67 84 64 64 79 85 522
12 2308024435 23080244 Jiang Yitao male 77 71 87 61 73 76 82 527
13 2308024432 23080244 Zhao Yu male 74 74 88 68 70 71 85 530
14 2308024446 23080244 Zhou Lu female 76 80 77 61 74 80 85 533
15 2308024421 23080244 Lin Jianxiang male 72 72 81 63 90 75 85 538
16 2308024433 23080244 Li Daqiang male 79 76 77 78 70 70 89 539
17 2308024428 23080244 Li Xiotong male 64 96 91 69 60 77 83 540
18 2308024402 23080244 Wang Hui female 73 74 93 70 71 75 88 544
19 2308024422 23080244 Li Xiaoliang male 85 60 85 72 72 83 89 546
df=df.sort_values('学号')
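df['学号后三位']=df.学号.astype(str).str.slice(-3,) # take the last three digits of the student number and store them in a new column (the existing columns are not changed)
plt.plot(df.学号后三位,df.总分,'-',color='r') # draw the chart
plt.xticks(rotation=60) # rotation angle of the tick labels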
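The string slicing can be checked on its own before plotting. A minimal sketch using three student numbers from the data shown earlier; the variable names are mine:

```python
import pandas as pd

ids = pd.Series([2308024241, 2308024244, 2308024307])

# Convert to text first; the .str accessor only works on strings.
last3 = ids.astype(str).str[-3:]   # same result as .str.slice(-3)
print(last3.tolist())
```

Since the result is text, Matplotlib treats the values as categories and spaces them evenly along the x-axis, instead of placing them by numeric value.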
df2= pd.read_excel(r'E:\Python\第4章数据\09电动1.xls')
plt.plot(df2.姓名,df2.C语言程序设计,'--',color='g')
plt.xlabel('姓名')
plt.ylabel('C语言程序设计')
plt.xticks(rotation=90)
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['figure.figsize']=[10,6]
4.5.4 Column chart: plt.bar(df.学号后三位, df.总分, width=1, color=['r', 'b'])
A column chart shows how data changes over a period of time or compares items with each other. It is a statistical chart made of rectangles of equal width whose lengths are drawn in proportion to the data values, and it is used to compare two or more values (across time or across categories).
bar(x,height,width,color)
barh(y,width,height,color)
x: the sequence of bar positions on the x-axis (named left in old Matplotlib versions), usually generated with the arange function
height: the sequence of values on the y-axis, that is, the bar heights; usually this is the data we want to show
width: the width of the bars; setting it to 1 is usually fine (the default is 0.8)
color: the fill color of the bars
df['学号后三位']=df.学号.astype(str).str.slice(-3,)
plt.bar(df.学号后三位,df.总分,width=1,color=['r','b']) # vertical bar (column) chart
plt.xticks(rotation=60)
plt.barh(df.学号后三位,df.总分,0.6,color=['r','b']) # horizontal bar chart
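To compare the two calls side by side, here is a sketch that draws both on one figure. It uses four rows of the data shown earlier (last three digits and total score); the subplot layout is my own choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

labels = ["201", "219", "241", "244"]   # last three digits of the student number
totals = [499, 490, 443, 452]           # matching total scores

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
bars = ax1.bar(labels, totals, width=0.6, color=["r", "b"])     # vertical bars
hbars = ax2.barh(labels, totals, height=0.6, color=["r", "b"])  # horizontal bars
```

In bar the data values are the heights and width is the bar thickness; in barh the roles swap, so the data values are the widths and height is the thickness. The two-color list is repeated across the bars.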
4.5.5 Histogram: plt.hist(df2.C语言程序设计, bins=10, color='g', cumulative=True)
Histogram: drawn as a series of rectangles of equal width and varying height. The width represents the interval of the data range, the height represents how often values fall within that interval, and the overall shape of the heights shows the distribution of the data.
Use it to see how frequently values occur.
hist(x,color,bins,cumulative=False)
x: the vector of values to plot
color: the fill color of the histogram
bins: the number of bins (groups) in the histogram
cumulative: whether to accumulate the counts; the default is False
df2= pd.read_excel(r'E:\Python\第4章数据\09电动1.xls')
plt.hist(df2.C语言程序设计,bins=10,color='g',cumulative=False)
plt.hist(df2.C语言程序设计,bins=10,color='g',cumulative=True)
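hist also returns the counts it draws, which makes the effect of cumulative easy to check. A sketch with made-up scores; the choice of bins=5 is only to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores.
scores = np.array([45, 52, 58, 61, 63, 67, 70, 72, 75, 78, 81, 84, 88, 93, 97])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
n, edges, _ = ax1.hist(scores, bins=5, color="g")                  # count per bin
cum, _, _ = ax2.hist(scores, bins=5, color="g", cumulative=True)   # running total
```

With cumulative=True each bar holds the sum of all bins up to and including its own, so the last bar always equals the number of data points.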
Plots shown: cumulative=False, cumulative=True, and bins=20.