Python visualization - basic graphics drawing based on matplotlib.pyplot

Table of contents

1. Introduction to this article

2. Basic Graphics

2.1 Column chart

2.1.1 Ordinary column chart

2.1.2 Stacked column chart

2.1.3 Grouped column chart

2.2 Bar chart

2.2.1 Ordinary bar chart

2.2.2 Multiple bar chart

2.2.3 Stacked bar chart

2.3 Histogram

2.3.1 Ordinary histogram

2.3.2 Probability distribution histogram

2.4 Pie chart

2.4.1 Non-split pie chart (regular pie chart)

2.4.2 Exploded pie chart

2.4.3 Ring chart

2.5 Scatter plot

2.5 Polar chart (radar chart)

2.5.1 Ordinary polar chart

2.5.2 Radar chart

2.5.3 Multiple radar chart

2.6 Box plot

3. Summary


1. Introduction to this article

        This article draws basic charts with Python's matplotlib library: column charts, bar charts, histograms, pie charts, scatter plots (bubble charts), polar charts, and box plots. It does not cover every parameter; it sticks to the handful that come up most often when writing papers, so the author asks readers to bear with the gaps. The purpose of each chart type is mentioned but not discussed in depth. The author believes that real understanding comes from practice: once you have used these charts often enough, you will know which one fits the job.

        I hope this article is of some help. The author is still learning, so corrections and criticism are welcome. The libraries used throughout this article are imported as follows.

import matplotlib.pyplot as plt
import numpy as np

        To display Chinese text in a figure, add the following settings. SimHei is a Chinese Hei (sans-serif) typeface, and the second line keeps the minus sign rendering correctly once the font is switched. To use a different font, substitute its name.

plt.rcParams["font.sans-serif"] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False
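
Whether these settings take effect depends on SimHei actually being installed on the machine. A minimal sketch for checking this is given below; the helper name font_available is hypothetical (it is not part of matplotlib), and it only looks at the fonts that matplotlib's font manager has registered.

```python
from matplotlib import font_manager


def font_available(name):
    """Return True if matplotlib has registered a font with this family name."""
    # fontManager.ttflist holds one entry per font file that matplotlib found.
    return any(entry.name == name for entry in font_manager.fontManager.ttflist)


# DejaVu Sans ships with matplotlib; SimHei depends on the operating system.
for family in ("DejaVu Sans", "SimHei"):
    print(family, font_available(family))
```

If SimHei is reported as missing, Chinese labels are likely to show up as empty boxes, and another installed Chinese font can be named in font.sans-serif instead.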

2. Basic Graphics

2.1 Column chart

        bar(): A column chart (vertical bar chart) shows how values differ across categories or across points in time. The basic parameters are:

bar(x,y,width,bottom,align,hatch,color,lw,tick_label,edgecolor,label...)
Parameter: explanation and use
x: The x-axis position of each column; a sequence of strings is also accepted.
y: The height of each column, i.e. the value shown on the y-axis (matplotlib's own name for this parameter is height). Floats, lists, and ndarrays (NumPy arrays) are supported.
width: Column width, 0.8 by default. Choosing the width yourself is what makes a grouped column chart possible.
bottom: The base height of each column, 0 by default. When all values are large, a common non-zero base makes the differences stand out.
align: How the column is aligned to its x position, either center or edge. center puts the middle of the column on x; edge puts the left edge of the column on x, so the column sits to the right. To put the column to the left of x instead, use edge together with a negative width.
hatch: Fill pattern of the column. Available symbols include / | - + x o O . * and \ (the backslash must be escaped in a Python string, e.g. '\\').
color: Fill color of the column; a single color string or a sequence of color strings.
lw: Short for linewidth, the width of the column border, a float. Its companion is linestyle (short form: ls).
tick_label: The tick labels on the x-axis, given as a sequence.
edgecolor: Border color of the column; a single color or a sequence of colors.
label: The legend text, a string, used together with plt.legend(). If the columns drawn by one call have different colors, the legend shows only the first color.
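
The effect of align and of a negative width is easier to see in numbers than in words. The following is a small sketch, not taken from the original article: it draws three columns at the same x position and reads back where each rectangle ends up. The helper x_extent is a made-up name used only for this illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [1]

# align="center": the column is centered on x.
centered = ax.bar(x, [3], width=0.4, align="center", color="c")
# align="edge": the left edge of the column sits on x.
right_of_x = ax.bar(x, [2], width=0.4, align="edge", color="m")
# align="edge" with a negative width: the right edge of the column sits on x.
left_of_x = ax.bar(x, [1], width=-0.4, align="edge", color="y")


def x_extent(container):
    """Return (left, right) of the first rectangle in a bar container."""
    box = container.patches[0].get_bbox()
    return (box.xmin, box.xmax)


print(x_extent(centered), x_extent(right_of_x), x_extent(left_of_x))
# plt.show()  # uncomment to display the figure
```

The three printed intervals should straddle x = 1, start at x = 1, and end at x = 1 respectively.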

2.1.1 Ordinary column chart

        The entries in carray and earray are single-letter abbreviations for the fill and border colors. For a color that has no abbreviation, write out its full name.

x = [1, 2, 3, 4, 5]
# x = list("ABCDE")
y = [2.3, 2.6, 3.7, 4.1, 4.7]
width = [0.8, 0.2, 0.3, 0.5, 0.1]
carray = ['r', 'g', 'b', 'c', 'm']  # fill color of each column
earray = ['r', 'y', 'k', 'g', 'b']  # border color of each column
linewidth = 0.2
tlabel = ["a", 'b', 'c', 'd', 'e']
plt.bar(x, y, width=width, align="center",
        bottom=None,
        lw=linewidth,
        tick_label=tlabel,
        color=carray, 
        edgecolor=earray,
        hatch='+',  # fill pattern of the columns
        label="zzzz") 
plt.legend()
plt.show()

        In the resulting figure, the hatch pattern is drawn inside every column and is carried over to the legend entry as well. The tick labels on the x-axis are customizable and can also be set with plt.xticks().

2.1.2 Stacked column chart

        A stacked column chart shows how much of a larger category is taken up by one of its sub-categories. In matplotlib, the approach used here is to draw a second column chart on top of the first. The drawing order matters: draw the larger category first and the sub-category second, otherwise the smaller columns are hidden behind the larger ones.

x = [1, 2, 3, 4, 5]
y = [2.3, 2.6, 3.7, 4.1, 4.7]
y1 = [1, 1.5, 2.3, 1.8, 3.2]
width = [0.8, 0.2, 0.3, 0.5, 0.1]
carray = ['k', 'g', 'b', 'c', 'm']
earray = ['r', 'y', 'k', 'g', 'b']
linewidth = 0.2
tlabel = ["a", 'b', 'c', 'd', 'e']
plt.bar(x, y, width=width, align="center",
        bottom=None,
        lw=linewidth,
        tick_label=tlabel,
        color=carray,
        edgecolor=earray,
        hatch='/')  # fill pattern of the columns
plt.bar(x, y1, width, align="center",
        lw=linewidth, hatch='\\')
plt.show()

        Sometimes the layers do not separate as clearly as they do here. The likely causes are that the two bar() calls are in the wrong order, or that the y-axis range makes the gap between the two series too small to see, which can be solved by changing the y-axis range.
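
The overlay approach above depends on the drawing order. A different technique, shown here only as an alternative sketch and not as the article's method, is to stack the segments explicitly with the bottom parameter, so that no column can hide another. The array rest is made-up data, chosen so that sub + rest reproduces the totals y used above.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 6)
sub = np.array([1, 1.5, 2.3, 1.8, 3.2])     # the sub-category (y1 above)
rest = np.array([1.3, 1.1, 1.4, 2.3, 1.5])  # hypothetical remainder: sub + rest equals y above

fig, ax = plt.subplots()
lower = ax.bar(x, sub, width=0.6, label="sub-category")
# bottom=sub lifts each upper segment so that it starts where the lower one ends.
upper = ax.bar(x, rest, width=0.6, bottom=sub, label="rest")
ax.legend()

# The top of every stacked column is the total of its two segments.
tops = np.array([p.get_y() + p.get_height() for p in upper.patches])
print(tops)
# plt.show()  # uncomment to display the figure
```

With bottom, the order of the two calls no longer affects which segment is visible.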

2.1.3 Grouped column chart

        As mentioned earlier, the column width can be set. In a grouped chart the first parameter x cannot be a sequence of strings, because the width has to be added to x and a string cannot be added to a float. A grouped column chart compares several groups within each category, for example the number of consumers in each season across different years.

x = np.arange(5)
y = [6, 10, 4, 5, 1]
y1 = [2, 6, 3, 8, 5]
bar_width = 0.35
tick_label = [2016, 2017, 2018, 2019, 2020]
plt.bar(x, y, bar_width, color='c', align='center', label='Category A', alpha=0.5)
plt.bar(x+bar_width, y1, bar_width,
        color='b', align='center',
        alpha=0.5, label="Category B",
        tick_label=tick_label)  # unlike the stacked chart, shift the second series along the x-axis
plt.xlabel("Sales by product category and year")
plt.ylabel("Sales volume")
plt.legend()
plt.show()

        The grouping is achieved simply by shifting the second column chart along the x-axis. The code also uses an alpha parameter that does not appear in the parameter list above. It is passed through **kwargs, and the **kwargs of bar() are Rectangle properties. The author will not go into them here. In short, alpha sets the transparency of the columns, and it can be used in almost any kind of plot.
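
The same shifting idea extends to more than two series. The sketch below assumes a small helper, group_offsets (a hypothetical name), that centers any number of adjacent columns on each integer x position; series "C" is made-up data added only to show a third group.

```python
import numpy as np
import matplotlib.pyplot as plt


def group_offsets(n_series, bar_width):
    """Offsets that center n_series adjacent columns on each x position."""
    return (np.arange(n_series) - (n_series - 1) / 2) * bar_width


years = [2016, 2017, 2018, 2019, 2020]
series = {
    "A": [6, 10, 4, 5, 1],
    "B": [2, 6, 3, 8, 5],
    "C": [3, 4, 7, 2, 6],  # hypothetical third series
}
bar_width = 0.25
x = np.arange(len(years))

fig, ax = plt.subplots()
for offset, (name, values) in zip(group_offsets(len(series), bar_width), series.items()):
    ax.bar(x + offset, values, bar_width, label=name, alpha=0.5)
ax.set_xticks(x)           # one tick at the center of each group
ax.set_xticklabels(years)
ax.legend()
# plt.show()  # uncomment to display the figure
```

Keeping bar_width times the number of series below 1 leaves a gap between neighboring groups.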

2.2 Bar chart

        barh(): Does essentially the same job as the column chart, comparing values across categories, but with horizontal bars. In some situations it reads better than a column chart. The basic parameters are:

barh(y,width,height,left,align,...)
y: The position of each bar on the y-axis; accepts a sequence of strings or an array of numbers. Equivalent to x in bar().
width: The length of each bar along the x-axis, i.e. the value being displayed. Equivalent to y (height) in bar().
height: The thickness of each bar along the y-axis. Equivalent to width in bar().
left: Equivalent to bottom in bar(): the position where each bar starts. Accepts an array or a number.

        Other parameters are basically the same as in the bar() function, so I won’t repeat them here.

2.2.1 Ordinary bar chart

        An ordinary bar chart is a column chart rotated by 90°.

y = [2, 7, 4, 5, 6]
width = [2.3, 2.6, 3.7, 4.1, 4.7]
height = 0.2
carray = 'b'
earray = 'm'
tlabel = ['cc', 'mm', 'r', 'uu', 'pp']  # tick labels on the y-axis
plt.barh(y, width, height,
         align='center',
         tick_label=tlabel,
         edgecolor=earray,
         color=carray,
         left=10)
plt.show()
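
Horizontal bars leave room to print each value next to its bar. As a sketch that goes slightly beyond the article, the following uses Axes.bar_label, which assumes matplotlib 3.4 or newer; the data reuse the labels and values from the example above.

```python
import matplotlib.pyplot as plt

categories = ["cc", "mm", "r", "uu", "pp"]
values = [2.3, 2.6, 3.7, 4.1, 4.7]

fig, ax = plt.subplots()
bars = ax.barh(categories, values, height=0.5, color="b", edgecolor="m")
# bar_label writes the value of each bar at its end; padding is in points.
texts = ax.bar_label(bars, fmt="%.1f", padding=3)
print([t.get_text() for t in texts])
# plt.show()  # uncomment to display the figure
```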

2.2.2 Multiple bar chart

         A multiple bar chart is the horizontal counterpart of the grouped column chart. Note: both of them need a legend, otherwise the reader cannot tell the series apart.

y = np.arange(5)
h = [6, 10, 4, 5, 1]
h1 = [2, 6, 3, 8, 5]
bar_width = 0.35
tick_label = [2016, 2017, 2018, 2019, 2020]
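plt.barh(y, h, bar_width, color='c', align='center', label='Category A', alpha=0.5)
plt.barh(y+bar_width, h1, bar_width,
         color='b', align='center',
         alpha=0.5, label="Category B",
         tick_label=tick_label)  # unlike the stacked chart, shift the second series along the y-axis
plt.xlabel("Sales volume")  # in barh() the values run along the x-axis
plt.ylabel("Sales by product category and year")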
plt.legend()
plt.show()

2.2.3 Stacked bar chart

        The author is not sure this is the standard name, but it describes the chart well enough: it compares a sub-category against the larger category it belongs to. As with the stacked column chart, pay attention to the drawing order so that the smaller bars are not covered.

y = np.arange(5)
h = [2.3, 2.6, 3.7, 4.1, 4.7]
h1 = [1, 1.5, 2.3, 1.8, 3.2]
width = 0.8
linewidth = 0.2
tlabel = [2017, 2018, 2019, 2020, 2021]
plt.barh(y, h, height=width, align="center",
         lw=linewidth,
         tick_label=tlabel,
         color="blue",
         label="Grain output")
plt.barh(y, h1, height=width, align="center", color='red',
         lw=linewidth, label="Wheat output")
plt.title("Share of wheat in grain output by year")
plt.xlabel("Output (100 million tonnes)")
plt.legend()
plt.show()

         As for lw and width, the author suggests passing a single value rather than an array in most cases, because per-bar values make the chart ugly. The purpose of a chart is to show the data clearly, so do not put the cart before the horse.

2.3 Histogram

         hist(): A histogram shows the distribution of a data set.

         Differences from a column chart:

        (1) A column chart compares the sizes of values, whereas a histogram shows how the data are distributed.

        (2) The columns of a column chart are separated by gaps, whereas the columns of a histogram are adjacent.

        (3) The x-axis of a column chart usually holds categorical data, whereas the x-axis of a histogram holds quantitative data.

        Common parameters are:

hist(x, bins=None, range=None, density=False,
        cumulative=False, bottom=None, histtype='bar', 
        align='mid',rwidth=None, log=False, 
        color=None,label=None)
x: The data: an array or a sequence of arrays.
bins: An integer or a sequence. An integer gives the number of columns; a sequence gives the boundaries of the columns.
range: The lower and upper limits of the bins, in the form (xmin, xmax). Values outside the range are ignored, which can be used to drop outliers.
density: bool. If True, the y-axis shows the probability density instead of the count.
cumulative: bool. If True, each column shows the cumulative count (or cumulative density) up to that bin.
bottom: An array or a number: the position where each column starts, as in bar() and barh().
histtype: The histogram type. bar is the ordinary histogram; barstacked stacks several data sets on top of each other; step draws an unfilled outline; stepfilled draws a filled outline and looks much like bar.
align: The position of the columns. left centers each column on the left edge of its bin, mid centers it between the bin edges, and right centers it on the right edge.
rwidth: A float: the width of each column as a fraction of the bin width.
log: bool. If True, the count axis (the y-axis of a vertical histogram) uses a logarithmic scale.

        color sets the color, label sets the legend text, and alpha sets the transparency.
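
The bins and range parameters can be tried out without drawing anything, because plt.hist() uses np.histogram() to do the binning. The sketch below uses a tiny made-up data set in which 100 plays the role of an outlier.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 100])  # 100 plays the outlier

# An integer for bins plus range=(0, 5): five equal-width bins, the outlier is dropped.
counts, edges = np.histogram(data, bins=5, range=(0, 5))
print(edges)   # the bin boundaries
print(counts)  # the number of values in each bin

# A sequence for bins gives the boundaries directly, so bins may be unequal in width.
counts2, edges2 = np.histogram(data, bins=[0, 2, 4, 6])
print(counts2)
```

Each bin includes its left boundary and excludes its right one, except for the last bin, which includes both.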

2.3.1 Ordinary histogram

x = np.random.normal(size=1000)
plt.hist(x, bins=100, density=False,
         histtype="bar",  # selects the histogram style
         align="mid",
         alpha=0.6)
plt.show()

         np.random.normal(size=1000) draws 1000 random numbers from the standard normal distribution.

 2.3.2 Probability distribution histogram

x = np.random.normal(size=1000)
plt.hist(x, bins=100, density=True,
         histtype="bar",  # selects the histogram style
         align="mid",
         alpha=0.6)
plt.show()

        The biggest difference from the ordinary histogram is the scale of the y-axis. The ordinary histogram shows the count in each bin, whereas with density=True the y-axis shows the probability density, scaled so that the total area of the columns is 1.
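
What density=True does to the y-axis can be checked numerically. This is a minimal sketch using np.histogram(), with a fixed seed (an arbitrary choice) so that the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = rng.normal(size=1000)

counts, edges = np.histogram(x, bins=100, density=False)
density, _ = np.histogram(x, bins=100, density=True)
widths = np.diff(edges)

print(counts.sum())              # the total number of samples
print((density * widths).sum())  # the total area under the density histogram
# The density is the count divided by (sample size * bin width).
print(np.allclose(density, counts / (counts.sum() * widths)))
```

Because the bins here are narrow, individual density values can exceed the relative frequency of the bin; it is the area, not the height, that sums to 1.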

        

2.4 Pie chart

        pie(): Shows at a glance the share that each item takes of a one-dimensional data set. The data must not contain negative values, and a zero value produces an empty wedge. The commonly used parameters are as follows:

pie(x, explode=None, labels=None, colors=None, 
        autopct=None,pctdistance=0.6, shadow=False, 
        labeldistance=1.1,startangle=0, radius=1, 
        wedgeprops=None,textprops=None, center=(0, 0),
        rotatelabels=False)
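x: A one-dimensional array: the size of each wedge.
explode: The distance by which each wedge is offset from the center, as a fraction of the pie radius.
labels: The label of each wedge, a sequence of strings.
autopct: The format of the percentage label on each wedge. %d%%: integer percentage; %0.1f: one decimal place without a percent sign; %0.1f%%: percentage with one decimal place; %0.2f%%: percentage with two decimal places.
pctdistance: The distance of the percentage label from the center. Float.
shadow: Whether to draw a shadow beneath the pie for a slight three-dimensional look.
labeldistance: The distance of the outer label from the center.
startangle: The angle at which the first wedge starts. Float, 0 by default, which means starting from the x-axis; angles are measured counterclockwise.
radius: The radius of the pie. Float, 1 by default. If it is greater than 1, make the figure large enough.
wedgeprops: Wedge properties, a dictionary: border width, fill color, and so on.
textprops: Text properties of the labels, a dictionary: font, font size, color, and so on.
center: The coordinates of the pie center, a tuple of two floats.
rotatelabels: bool. Whether each outer label is rotated to the angle of its wedge.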

2.4.1 Non-split pie chart (regular pie chart)

x = [10, 30, 50, 20, 45]
colors = ['#377eb8', "#4daf4a", "#984ea3", "#ff7f00", "#e7c3b2"]  # wedge colors
labels = ['A', 'B', 'C', 'D', 'E']
plt.pie(x, colors=colors, labels=labels,
        autopct='%0.1f%%',
        startangle=45,
        pctdistance=0.7,
        labeldistance=1.2)
plt.show()

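2.4.2 Exploded pie chart

labels = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']
x = [0.35, 0.20, 0.15, 0.05, 0.25]
colors = ['r', 'g', 'b', 'c', 'm']
explode = [0.1, 0.2, 0.05, 0.1, 0.06]  # offset of each wedge, as a fraction of the radius
wedgeprops = {'linewidth': 1, 'edgecolor': "black"}
plt.pie(x, explode=explode, labels=labels, colors=colors,
        pctdistance=0.6,  # distance of the percentage labels from the center
        autopct='%0.2f%%',  # format of the percentage labels
        shadow=True,
        labeldistance=1.1,  # distance of the outer labels from the center
        startangle=0.00,  # start angle of the first wedge
        radius=1,  # radius of the pie
        wedgeprops=wedgeprops,  # wedge properties
        textprops={'fontsize': 12, 'color': 'black', "font": 'Kaiti'},
        center=(0, 0),  # coordinates of the pie center
        rotatelabels=False)  # whether to rotate the outer labels with the wedges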
plt.show()

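         The key parameter of an exploded pie chart is explode. Pulling a wedge out of the pie emphasizes the share of that item, so the reader's eye goes first to the part you want to highlight.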
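
When the goal is to emphasize a single item, the explode list can be built from the data instead of being typed by hand. The sketch below is one possible way to do this, with an arbitrary offset of 0.1 for the largest wedge and 0 for all others.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D", "E"]
x = [0.35, 0.20, 0.15, 0.05, 0.25]

# Offset only the largest wedge; every other wedge stays in place.
largest = int(np.argmax(x))
explode = [0.1 if i == largest else 0 for i in range(len(x))]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(x, explode=explode, labels=labels, autopct="%0.2f%%")
print(explode)
print(autotexts[largest].get_text())  # the percentage label of the highlighted wedge
# plt.show()  # uncomment to display the figure
```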

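2.4.3 Ring chart

        In matplotlib, a ring chart can be made by subtracting a small circle from a large one, that is, by covering the middle of the pie with a smaller full circle in the background color.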

x = [15, 20, 17, 25]
explode = (0, 0.1, 0, 0)
label = ['a', 'b', 'c', 'd']
x_0 = [1, 0, 0, 0]
plt.pie(x, explode=explode,
        labels=label,
        autopct="%3.1f%%",
        startangle=90,
        shadow=True)
plt.pie(x_0, radius=0.5, colors=['w'])  # white single-wedge circle that covers the center
plt.axis('equal')
plt.show()

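        For the labels of a ring chart to look good, set their distance from the center so that they stay clear of the hole. The covering circle is a pie with a single category: all other values are absent or 0, which is what x_0 = [1, 0, 0, 0] does.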
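
Covering the center works only when the background is a single known color. An alternative, sketched here as a different technique from the one in the article, is to give the wedges themselves a width through wedgeprops, so that the hole is genuinely empty. The value 0.5 and the white edge color are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

x = [15, 20, 17, 25]
label = ["a", "b", "c", "d"]

fig, ax = plt.subplots()
# width is measured from the outer edge inward, so 0.5 leaves a hole of radius 0.5.
wedges, texts, autotexts = ax.pie(
    x,
    labels=label,
    autopct="%3.1f%%",
    pctdistance=0.75,  # place the percentages in the middle of the ring
    startangle=90,
    wedgeprops={"width": 0.5, "edgecolor": "w"},
)
ax.axis("equal")
# plt.show()  # uncomment to display the figure
```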

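2.5 Scatter plot

        scatter(): Also known as a bubble chart. A scatter plot shows the relationship between two sets of data or reveals a trend, which makes its role somewhat similar to that of a line chart. The basic parameters are as follows: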

scatter(x, y, s=None, c=None, marker=None, 
        norm=None, vmin=None, vmax=None, alpha=None, 
        linewidths=None,edgecolors=None)
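x, y: The coordinates of the points on the x-axis and the y-axis.
s: The size (area) of the markers: a single number or one value per point. If omitted, matplotlib uses rcParams['lines.markersize'] ** 2.
c: The color of the markers: a single color, a sequence of colors, or a sequence of numbers that are mapped to colors. Blue by default.
marker: The shape of the markers, a circle ('o') by default. Options include . , o v ^ > < * | _ + x d
norm: A normalization that scales the numbers in c to the range 0-1 before they are mapped to colors. It only has an effect when c is a sequence of numbers.
linewidths: The width of the marker borders.
edgecolors: The color of the marker borders.
vmin, vmax: The data values mapped to the two ends of the color range, similar in purpose to norm. Do not pass them together with norm.

alpha sets the transparency of the markers. An example follows: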

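x = np.random.randn(1000)  # 1000 samples from the standard normal distribution
y = np.random.randn(1000)
plt.scatter(x, y,
            s=20,  # marker size; pass a 1-D array to give every point its own size
            c='g',
            marker='o',  # marker shape
            vmin=0, vmax=20,  # only relevant when c is numeric; no effect with c='g'
            alpha=0.5,
            linewidths=1,
            edgecolors='c')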
plt.show()
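
The name bubble chart fits best when s and c carry data of their own. The following sketch assumes two made-up extra variables, sizes and values, and shows the case in which vmin and vmax actually matter: c is numeric, so the range 0 to 20 is mapped onto the colormap.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
n = 50
x = rng.normal(size=n)
y = rng.normal(size=n)
sizes = rng.uniform(20, 200, size=n)  # hypothetical data: one marker area per point
values = rng.uniform(0, 20, size=n)   # hypothetical data: a third variable shown as color

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=values, cmap="viridis",
                    vmin=0, vmax=20, alpha=0.5,
                    edgecolors="k", linewidths=0.5)
fig.colorbar(points, ax=ax, label="value")  # legend for the color scale
print(points.get_clim())
# plt.show()  # uncomment to display the figure
```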

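2.5 Polar chart (radar chart)

        polar(): A radar chart is a good way to compare several individuals across a set of indicators, and it also shows how balanced one individual is across them. Commonly used parameters are as follows:

polar(theta,r,marker,linestyle,linewidth,color,ms...)
theta: The angle of each point, measured from the polar axis, in radians.
r: The distance of each point from the origin.
marker: The marker style: . , o v ^ > < * | + _ x d
linestyle: The line style: - -- -. :
linewidth: Float, the width of the line.
color: The color of the line.
ms: The size of the markers.

2.5.1 Ordinary polar chart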

angles = np.linspace(0, 2*np.pi, 6)
r = np.random.randint(1, 10, 6)
plt.polar(angles, r)
plt.show()

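        np.linspace(0, 2*np.pi, 6) returns 6 evenly spaced values from 0 to 2π, and np.random.randint(1, 10, 6) draws 6 random integers from 1 to 9 (the upper bound 10 is excluded). An ordinary polar chart is not closed.

2.5.2 Radar chart

        Unlike an ordinary polar chart, a radar chart is closed. To put labels on the polar axes, use plt.thetagrids().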

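angles = np.linspace(0, 2*np.pi, 6, endpoint=False)  # 6 distinct directions; 2π would repeat 0
labels = ['Ability A', 'Ability B', "Ability C", "Ability D", "Ability E", "Ability F"]
r = np.random.randint(1, 10, 6)
r = np.append(r, r[0])  # repeat the first point to close the outline
angles = np.append(angles, angles[0])
plt.polar(angles, r)
plt.thetagrids(angles=np.linspace(0, 360, 6, endpoint=False), labels=labels, fontsize=15)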
plt.show()

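(Note: the author has found that giving the angles as 0 to 2π sometimes produces a figure that is just a straight line, and that switching to 0 to 360 fixed it. Keep in mind that plt.polar() takes its angles in radians, whereas plt.thetagrids() takes degrees.)

        To close a polar chart in matplotlib, append the first value of r to the end of r and do the same for the angles. The first parameter of thetagrids() is the angles and the second is the labels.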
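
The closing step can be wrapped in a small helper. The sketch below uses a hypothetical function name, closed_polygon, and makes one assumption explicit: the angles are generated with endpoint=False, so that n values get n distinct directions and the angle 0 is not repeated as 2π.

```python
import numpy as np


def closed_polygon(values):
    """Return (angles, radii) for a closed radar outline with one axis per value."""
    values = np.asarray(values, dtype=float)
    # endpoint=False spreads the axes evenly without repeating 0 as 2*pi.
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    # Repeat the first point at the end so that the outline returns to its start.
    return np.append(angles, angles[0]), np.append(values, values[0])


angles, radii = closed_polygon([3, 7, 5, 9, 2, 6])
print(np.degrees(angles))
print(radii)
```

The two returned arrays can be passed straight to plt.polar() and plt.fill().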

To fill the inside of the outline, use the fill() function.

Its form is plt.fill(theta, r, color=..., alpha=...). For example:

angles = np.linspace(0, 2*np.pi, 6, endpoint=False)  # 6 distinct directions; 2π would repeat 0
labels = ['Ability A', 'Ability B', "Ability C", "Ability D", "Ability E", "Ability F"]
r = np.random.randint(1, 10, 6)
r = np.append(r, r[0])  # repeat the first point to close the outline
angles = np.append(angles, angles[0])
plt.polar(angles, r, marker='*', lw=1, ls='--', ms=10, color='r')
plt.thetagrids(angles=np.linspace(0, 360, 6, endpoint=False), labels=labels, fontsize=15)
plt.fill(angles, r, color='r', alpha=0.25)
plt.show()

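2.5.3 Multiple radar chart

        This is simply a comparison of several subjects across the same set of abilities, and it is more intuitive than a grouped column chart.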

angles = np.linspace(0, 2*np.pi, 6, endpoint=False)  # 6 distinct directions; 2π would repeat 0
labels = ['Ability A', 'Ability B', "Ability C", "Ability D", "Ability E", "Ability F"]
r = np.random.randint(1, 10, 6)
r = np.append(r, r[0])  # repeat the first point to close the outline
r1 = np.random.randint(1, 10, 6)
r1 = np.append(r1, r1[0])
angles = np.append(angles, angles[0])
plt.polar(angles, r, marker='*', lw=1, ls='--', ms=10, color='r', label="A")
plt.thetagrids(angles=np.linspace(0, 360, 6, endpoint=False), labels=labels, fontsize=15)
plt.fill(angles, r, color='r', alpha=0.25)
plt.polar(angles, r1, marker='*', lw=1, ls='--', ms=10, color='g', label="B")
plt.fill(angles, r1, color='m', alpha=0.25)
plt.legend()
plt.show()

2.6 Box plot

         boxplot(): Shows how spread out a data set is. A histogram also shows the spread, but less clearly. The parameters are as follows:

boxplot(x, notch=None, sym=None, vert=None, whis=None,
        positions=None, widths=None, patch_artist=None,
        bootstrap=None, usermedians=None, conf_intervals=None,
        meanline=None, showmeans=None, showcaps=None, showbox=None,
        showfliers=None, boxprops=None, labels=None, flierprops=None,
        medianprops=None, meanprops=None, capprops=None,
        whiskerprops=None)
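x: The data: a one-dimensional array, or multi-dimensional data to draw several boxes.
notch: bool. Whether to draw the box with a notch; no notch by default.
sym: The marker used for outliers, for example '+' or 'o'.
vert: bool. Whether to draw the boxes vertically; vertical by default.
whis: The reach of the whiskers beyond the lower and upper quartiles; 1.5 times the interquartile range by default.
positions: The positions of the boxes.
widths: The width of the boxes; 0.5 by default.
patch_artist: bool. Whether to fill the box with color.
meanline: bool. Whether to show the mean as a line; a point is used by default.
showcaps: bool. Whether to show the two caps at the ends of the whiskers; shown by default.
showmeans: bool. Whether to show the mean; hidden by default.
showbox: bool. Whether to show the box; shown by default.
showfliers: bool. Whether to show outliers; shown by default.
boxprops: Properties of the box, such as border color and fill color.
labels: Labels for the boxes, similar in role to a legend.
flierprops: Properties of the outliers, such as marker shape, size, and fill color.
medianprops: Properties of the median, such as line style and width.
meanprops: Properties of the mean, such as marker size and color.
capprops: Properties of the caps at the ends of the whiskers, such as color and width.
whiskerprops: Properties of the whiskers, such as color, width, and line style.

        At first glance boxplot() has a great many parameters, but for basic use it is enough to pass x; everything else is customization for special cases.

        How is a box plot read? Take the figure produced by the code below as an example. The top and bottom horizontal lines (the caps) mark the upper and lower limits. The top and bottom edges of the box are the upper and lower quartiles (about 7 and 2 on the y-axis), the line inside the box is the median, and the red dot, drawn because showmeans=True, is the mean.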

x = np.arange(10)
plt.boxplot(x,
            sym='o',
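            vert=None,  # whether to draw the box vertically (None keeps the default)
            widths=0.8,  # box width
            patch_artist=True,  # fill the box with color (this parameter is a bool, not a color)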
            showmeans=True,
            meanprops={'marker': 'o', 'markerfacecolor': 'red', 'markersize': 5})
plt.show()
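
The numbers behind the box can be computed directly, which helps when reading the figure. The sketch below works on the same data, np.arange(10), first with np.percentile() and then with matplotlib's cbook.boxplot_stats(), which returns one dictionary of statistics per data set.

```python
import numpy as np
from matplotlib import cbook

x = np.arange(10)

q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1
# With whis=1.5 the whiskers reach the most extreme data points inside these fences.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(q1, median, q3, x.mean())
print(lower_fence, upper_fence)

# One dict per data set, with keys such as q1, med, q3, whislo, whishi, and fliers.
stats = cbook.boxplot_stats(x, whis=1.5)[0]
print(stats["q1"], stats["med"], stats["q3"], stats["whislo"], stats["whishi"])
```

For this data set the median and the mean coincide, so the red mean marker sits on the median line, and no point lies outside the fences, so no outliers are drawn.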

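         To draw several boxes in one figure, pass multi-dimensional data as x. The way meanprops is set in the code can be applied to the other props parameters in the same way.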

x = np.arange(10)
y = np.arange(3, 12)
plt.boxplot([x, y],
            sym='o',
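            vert=None,  # whether to draw the boxes vertically (None keeps the default)
            widths=0.8,  # box width
            patch_artist=True,  # fill the boxes with color (this parameter is a bool, not a color)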
            showmeans=True,
            meanprops={'marker': 'o', 'markerfacecolor': 'red', 'markersize': 5})
plt.show()

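3. Summary

        In the author's view, these are the charts in common use, and whether for study or for work they are only the basics. To polish them, combine this article with the previous one, "Python visualization: a detailed explanation of the basic parameters of matplotlib.pyplot plotting", and use a richer color palette. The data, of course, matter most: the purpose of a chart is to make the state of the data easier to see.

        If anything is wrong or unclear, leave a comment or send the author a private message; criticism and corrections are welcome. Writing takes effort, so if you can, please like, follow, and bookmark!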

Origin blog.csdn.net/qq_60471758/article/details/128365567