Data visualization in Python with Matplotlib

When you analyze data, staring at a pile of raw numbers is uncomfortable. It is much better to present the changes in the data as graphics. Once the data is visualized, we can understand the information it conveys faster, more easily, and more clearly.

Python has many useful tools for displaying data, such as Matplotlib, Seaborn, and Pygal, all of which are popular packages. Let's take a look at how to display local data and data obtained from the network.

Visualizations can be roughly divided into 4 categories:

  • Comparison: for example, a line chart compares categories of data with each other, including how the data changes over time;
  • Relationship: for example, a scatter plot shows the relationship between two or more variables;
  • Composition: for example, a pie chart shows the share of each part of a whole, including how the shares change over time;
  • Distribution: for example, a histogram shows how one or more variables are distributed.

There are 10 commonly used chart types: scatter plot, line chart, histogram, bar chart, pie chart, heat map, box plot, spider (radar) chart, bivariate distribution plot, and pair plot.

1. Install Matplotlib

Open a terminal and run pip install matplotlib to install it automatically.

If you want to browse the Matplotlib developer documentation locally, run python -m pydoc -p 8899, then visit http://localhost:8899 once the server has started and find matplotlib (package) under the .../site-packages column.
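A quick way to confirm that the installation worked is to import the package and print its version. This is only a minimal sketch; the variable name installed_version is chosen here for illustration.

```python
import matplotlib

# If the import succeeds, the package is installed for this interpreter.
installed_version = matplotlib.__version__
print("Matplotlib version:", installed_version)
```

If the import fails with ModuleNotFoundError, pip probably installed the package for a different Python interpreter than the one you are running.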

2. Line chart

Have you recently seen yet another article titled "The Worst Graduation Season in History"? In fact, one is written every year, and every year is supposedly the hardest and most miserable.

We can find the corresponding data and draw it as a line chart, so that the trend in the number of graduates is visible at a glance.

This is the data on university graduates and postgraduates from 2010 to 2022 (unit: 10,000 people):

Year    University graduates    Postgraduates
2022    1076                    120
2021    909                     117.65
2020    874                     110.66
2019    834                     91.65
2018    821                     85.8
2017    795                     80.61
2016    765                     66.71
2015    749                     64.51
2014    700                     62.13
2013    699                     61.14
2012    680                     58.97
2011    660                     56.02
2010    631                     53.82

2.1 Data of college graduates from 2010 to 2022

Using the data above, let's first draw a line chart to see how the number of university graduates changed from 2010 to 2022. The X-axis is the year and the Y-axis is the number of people. The code is as follows:

import matplotlib.pyplot as plt

# Define the X-axis and Y-axis data
# X-axis: year; Y-axis: number of graduates (unit: 10,000)
xData = [2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022]
yData = [631,660,680,699,700,749,765,795,821,834,874,909,1076]

# Argument 1: values for the horizontal axis
# Argument 2: values for the vertical axis
plt.plot(xData, yData)

# Show the figure
plt.show()

Running the code produces the following figure:

[Figure: line chart of university graduates per year, 2010 to 2022]
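To put a number on the trend that the line chart shows, the year-on-year growth rate can be worked out from the same figures. The sketch below is one way to do the arithmetic; the names years, graduates and growth are chosen here for illustration.

```python
# Number of university graduates per year (unit: 10,000), from the table above
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
graduates = [631, 660, 680, 699, 700, 749, 765, 795, 821, 834, 874, 909, 1076]

# Year-on-year growth in percent: (this year - last year) / last year * 100
growth = [
    (current - previous) / previous * 100
    for previous, current in zip(graduates, graduates[1:])
]

# The first year has no previous year, so growth starts at 2011
for year, rate in zip(years[1:], growth):
    print(f"{year}: {rate:+.1f}%")
```

With these figures, the increase from 2021 to 2022 comes out at roughly 18%, while no other year in the table exceeds about 7%.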

2.2 Postgraduate data from 2010 to 2022

Next, suppose we want to check whether the number of postgraduates has grown just as sharply. That is not difficult: pass several pairs of X-axis and Y-axis lists to plot(), and you get a chart with multiple lines.

The code is as follows:

import matplotlib.pyplot as plt

# Define the X-axis and Y-axis data
# X-axis: year; Y-axis: number of graduates (unit: 10,000)
xData = [2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022]
yData = [631,660,680,699,700,749,765,795,821,834,874,909,1076]

# Add the number of postgraduates
yData2 = [53.82,56.02,58.97,61.14,62.13,64.51,66.71,80.61,85.8,91.65,110.66,117.65,120]

# Argument 1: horizontal-axis values of the first line
# Argument 2: vertical-axis values of the first line
# Argument 3: horizontal-axis values of the second line
# Argument 4: vertical-axis values of the second line
plt.plot(xData, yData, xData, yData2)

# Show the figure
plt.show()

Running the code produces the following figure:

[Figure: line chart with two lines, university graduates and postgraduates, 2010 to 2022]
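With two lines on one chart, it helps to say which is which. The following sketch is not from the original article: it draws each series with its own plot() call so that it can carry a label, then adds axis names, a title and a legend. The variable names and label texts are chosen here for illustration.

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
graduates = [631, 660, 680, 699, 700, 749, 765, 795, 821, 834, 874, 909, 1076]
postgraduates = [53.82, 56.02, 58.97, 61.14, 62.13, 64.51, 66.71,
                 80.61, 85.8, 91.65, 110.66, 117.65, 120]

fig, ax = plt.subplots()

# One plot() call per series so that each line gets its own legend entry
ax.plot(years, graduates, label='University graduates')
ax.plot(years, postgraduates, label='Postgraduates')

ax.set_xlabel('Year')
ax.set_ylabel('People (unit: 10,000)')
ax.set_title('Graduates per year, 2010 to 2022')
ax.legend()
```

Call plt.show() afterwards to display the figure.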

2.3 Modify the color and line thickness

You can also change the color and thickness of the lines, which is just as simple: specify the color with color and the thickness with linewidth. The code is as follows:

# Continues from section 2.2: xData, yData and yData2 are defined there
# color sets the line color
# linewidth sets the line thickness
plt.plot(xData, yData, color='orange', linewidth=5.0)
plt.plot(xData, yData2, color='green', linewidth=5.0)

# Show the figure
plt.show()

Running the code produces the following figure:

[Figure: the two lines drawn thick, in orange and green]

2.4 Four styles of polylines

If you don't like solid lines, you can change the style with linestyle. The four commonly used styles are:

The first type: - is a solid line (the default);

The second type: -- is a dashed line;

The third type: : is a dotted line;

The fourth type: -. is a dash-dot line, alternating short dashes and dots.

The code is as follows:

# Continues from section 2.2: xData, yData and yData2 are defined there
plt.plot(xData, yData, color='orange', linewidth=5.0, linestyle='--')
plt.plot(xData, yData2, color='green', linewidth=5.0, linestyle='-.')

# Show the figure
plt.show()

Running the code produces the following figure:

[Figure: the two lines drawn with dashed and dash-dot styles]
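To compare the four styles side by side, one option is to draw one horizontal line per style. This is only an illustrative sketch; the dictionary styles and the dummy data are made up for the example.

```python
import matplotlib.pyplot as plt

# The four linestyle strings and a readable name for each
styles = {'-': 'solid', '--': 'dashed', ':': 'dotted', '-.': 'dash-dot'}

x = [0, 1, 2, 3]
fig, ax = plt.subplots()

# Draw one flat line per style, each at a different height
for height, (style, name) in enumerate(styles.items()):
    ax.plot(x, [height] * len(x), linestyle=style, linewidth=2.0,
            label=f"'{style}' {name}")

ax.legend()
```

Call plt.show() afterwards to display the figure.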

3. Pie chart

3.1 Ordinary pie chart

A pie chart makes it easy to see the size of each part relative to the whole. In Matplotlib, we use the pie(x, labels=None) function to draw pie charts, where x is the data for the pie chart (one value per slice) and labels specifies the label of each slice.

Suppose we want to show the share of each category in a fruit shop. The code is as follows:

import matplotlib.pyplot as plt

# Data for the pie chart
datas = [22, 30, 35, 8, 15]
# Label of each slice
labs = ['apple', 'pear', 'peach', 'grape', 'cherry']

plt.pie(x = datas, labels = labs)
plt.show()

Running the code produces the following figure:

[Figure: pie chart of the five fruit categories]
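The values passed to pie() do not have to add up to 100: each slice is sized by its value divided by the total. The arithmetic can be sketched as follows, using the fruit shop data from above; the names total and shares are chosen here for illustration.

```python
datas = [22, 30, 35, 8, 15]
labs = ['apple', 'pear', 'peach', 'grape', 'cherry']

# Each slice's share is its value divided by the sum of all values
total = sum(datas)
shares = [value / total * 100 for value in datas]

for lab, share in zip(labs, shares):
    print(f"{lab}: {share:.2f}%")
```

Here the values sum to 110, so the apple slice, for example, takes up 22 / 110 = 20% of the circle.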

3.2 Separate Pie Charts

If you want to highlight a certain slice and make it stand out, pass explode to the pie() function. It takes one offset per slice, and a non-zero value pulls that slice away from the center.

import matplotlib.pyplot as plt

# Data for the pie chart
datas = [22, 30, 35, 8, 15]
# Label of each slice
labs = ['apple', 'pear', 'peach', 'grape', 'cherry']

# Offsets: pull the cherry slice away from the center
exp = [0,0,0,0,0.15]
plt.pie(x = datas, labels = labs, explode = exp)

plt.show()

Running the code produces the following figure:

[Figure: pie chart with the cherry slice pulled out]

3.3 More settings for the pie chart

There are many more display options you can set: the colors of the slices, the percentage format, the distance of the labels from the center, the starting angle, the center, the radius, the clockwise or counterclockwise direction, the outline of the slices, the title, and so on. Try them out.

The code is as follows:

import matplotlib.pyplot as plt

# Data for the pie chart
datas = [22, 30, 35, 8, 15]
# Label of each slice
labs = ['apple', 'pear', 'peach', 'grape', 'cherry']

# Offsets: pull the cherry slice away from the center
exp = [0,0,0,0,0.15]

# Custom colors
cols = ['orange', 'green', 'blue', 'red', 'purple']

plt.pie(x = datas,
        labels = labs,
        explode = exp,
        colors = cols,
        autopct = '%.2f%%',  # percentage format
        textprops = {'fontsize': 12, 'color': 'black'},  # text properties
       )

# Title
plt.title('Fruit Shop')

plt.show()

Running the code produces the following figure:

[Figure: pie chart with custom colors, percentages and the title "Fruit Shop"]
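Several of the other options mentioned above map onto keyword arguments of pie(). The sketch below shows one possible combination; the specific values (90 degrees, 1.15, 0.6 and so on) are arbitrary choices for illustration, not recommendations from the original article.

```python
import matplotlib.pyplot as plt

datas = [22, 30, 35, 8, 15]
labs = ['apple', 'pear', 'peach', 'grape', 'cherry']

fig, ax = plt.subplots()

# With autopct set, pie() returns the slices, the labels and the percentage texts
wedges, texts, autotexts = ax.pie(
    datas,
    labels=labs,
    autopct='%.1f%%',     # percentage format
    startangle=90,        # rotate the start of the first slice to 12 o'clock
    counterclock=False,   # lay the slices out clockwise
    labeldistance=1.15,   # label distance from the center, as a fraction of the radius
    pctdistance=0.6,      # percentage text distance from the center, same unit
    radius=1.0,           # radius of the pie
    wedgeprops={'edgecolor': 'white', 'linewidth': 1},  # outline of each slice
)
ax.set_title('Fruit Shop')
```

Call plt.show() afterwards to display the figure.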

4. Bar chart

4.1 Ordinary bar chart

Bar charts let us compare categories more clearly. A bar chart can be drawn with the bar(x, height) function, where x is the sequence of positions on the x-axis and height is the sequence of values on the y-axis.

For example, we can read an Excel file of student grades from local disk and draw it as a bar chart.

The Excel file contains some simple data, as shown in the figure:

[Figure: Excel sheet with student names and scores]

The specific code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

users = pd.read_excel('../../user.xlsx')
# sort_values sorts the rows; inplace=True modifies the DataFrame in place;
# ascending=False sorts from largest to smallest
users.sort_values(by='score', inplace=True, ascending=False)

# Draw the bar chart directly with plt.bar()
plt.bar(users['name'], users['score'], color='orange')

# Set the title and the x-axis and y-axis names; fontsize sets the font size
plt.title('Student Score', fontsize=16)
plt.xlabel('Name')
plt.ylabel('Score')

# If the x-axis labels are too long, rotate them by 90 degrees so they fit
# plt.xticks(rotation=90)

# Tight layout, so that long x-axis labels are shown in full
plt.tight_layout()
plt.show()

Running the code produces the following figure:

[Figure: bar chart of student scores, sorted from highest to lowest]
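If you do not have the Excel file at hand, the same chart can be tried with a small DataFrame built in code. The names and scores below are made up as a stand-in for the contents of user.xlsx; only the column names name and score come from the code above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the data in user.xlsx
users = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'score': [82, 95, 67, 74],
})

# Sort from the highest score to the lowest
users = users.sort_values(by='score', ascending=False)

fig, ax = plt.subplots()
bars = ax.bar(users['name'], users['score'], color='orange')
ax.set_title('Student Score', fontsize=16)
ax.set_xlabel('Name')
ax.set_ylabel('Score')
fig.tight_layout()
```

Call plt.show() afterwards to display the figure.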

If you want the title and the axis names in Chinese, you need to add Chinese font support.

import pandas as pd
import matplotlib.pyplot as plt
# Needed for Chinese font support
from matplotlib.font_manager import FontProperties

users = pd.read_excel('../../user.xlsx')
# sort_values sorts the rows; inplace=True modifies the DataFrame in place;
# ascending=False sorts from largest to smallest
users.sort_values(by='score', inplace=True, ascending=False)

# Draw the bar chart directly with plt.bar()
plt.bar(users['name'], users['score'], color='orange')

# SimSun.ttc is a Simplified Chinese font; fname is the path to the font file
font = FontProperties(fname="SimSun.ttc", size=16)

# Set the title and the x-axis and y-axis names; fontsize sets the font size
plt.title('学生分数', fontproperties=font)
plt.xlabel('名字', fontproperties=font, fontsize=14)
plt.ylabel('分数', fontproperties=font, fontsize=14)

# The x-axis labels are long, so rotate them by 90 degrees so they fit
plt.xticks(rotation=90)

# Tight layout, so that long x-axis labels are shown in full
plt.tight_layout()
plt.show()

4.2 Stacked bar chart

Suppose our students' test scores are out for three subjects: Chinese, mathematics, and English. To show them stacked on top of each other, we can pass stacked=True to the pandas plotting function.

As before, we first read the data from Excel and then draw it.

[Figure: Excel sheet with each student's Chinese, English and math scores]

The specific code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

users = pd.read_excel('../../user2.xlsx')

# Compute a total column, used for sorting
users['Total'] = users['chinese'] + users['english'] + users['math']

# sort_values sorts the rows; inplace=True modifies the DataFrame in place
users.sort_values(by='Total', inplace=True)

# stacked=True stacks the three subjects on top of each other
# Vertical stacked bar chart:
# users.plot.bar(x='name', y=['chinese', 'english', 'math'], stacked=True)
# Horizontal stacked bar chart (the h in barh stands for horizontal):
users.plot.barh(x='name', y=['chinese', 'english', 'math'], stacked=True)

# Tight layout, so that long axis labels are shown in full
plt.tight_layout()
plt.show()

Running the code produces the following figures:

(1) Vertical orientation:

[Figure: vertical stacked bar chart of the three subject scores per student]

(2) Horizontal orientation:

[Figure: horizontal stacked bar chart of the three subject scores per student]
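The stacked=True option belongs to the pandas plotting wrapper. With plain Matplotlib, a similar result can be sketched by telling each bar() call where its bars should start, using the bottom parameter. The student names and scores below are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical scores for three students
names = ['Alice', 'Bob', 'Carol']
chinese = np.array([80, 70, 90])
english = np.array([75, 85, 60])
maths = np.array([90, 65, 88])

fig, ax = plt.subplots()

# The first layer starts at zero
ax.bar(names, chinese, label='chinese')
# Each further layer starts where the previous layers end
ax.bar(names, english, bottom=chinese, label='english')
top = ax.bar(names, maths, bottom=chinese + english, label='math')

ax.legend()
```

The upper edge of the top layer then equals each student's total score. Call plt.show() afterwards to display the figure.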

5. Scatter plot

A scatter plot describes the relationship between two variables by displaying their values in a two-dimensional coordinate system. Each observation becomes one point, and the shape of the resulting point cloud shows how the two variables relate.

We can draw a scatter plot from random numbers to get a feel for it.

The specific code is as follows:

import matplotlib.pyplot as plt
import numpy as np

# Prepare the data (random data is used here)
N = 1000
x = np.random.randn(N)
y = np.random.randn(N)

# x: the X-axis data
# y: the Y-axis data
# marker: the shape of the scatter points
plt.scatter(x, y, marker='*')

# Show the scatter plot
plt.show()

Running the code produces the following figure:

[Figure: scatter plot of 1000 random points drawn as stars]

We drew the scatter plot with the scatter() function. Besides the parameters used above, several others are common. Here they are together:

  • x: specifies the X-axis data;

  • y: specifies the Y-axis data;

  • s: specifies the size of the points;

  • c: specifies the color of the points;

  • alpha: specifies the transparency of the points;

  • linewidths: specifies the thickness of the point borders;

  • edgecolors: specifies the color of the point borders;

  • marker: specifies the shape of the points (there are many):

    Marker  Description      Marker  Description
    .       point            s       square
    ,       pixel            p       pentagon
    o       circle           *       star
    v       triangle down    h       hexagon 1
    ^       triangle up      H       hexagon 2
    <       triangle left    +       plus
    >       triangle right   x       x (cross)
    1       tri down         D       diamond
    2       tri up           d       thin diamond
    3       tri left         |       vertical line
    4       tri right        _       horizontal line

  • cmap: specifies the colormap used to turn numeric c values into colors.
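The parameters listed above can be combined in a single call. In the following sketch, the size and color of each point depend on its distance from the origin; that choice, the seed and the names rng, distance and points are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random data with a fixed seed so that the figure is reproducible
rng = np.random.default_rng(0)
N = 200
x = rng.standard_normal(N)
y = rng.standard_normal(N)
distance = np.sqrt(x**2 + y**2)

fig, ax = plt.subplots()
points = ax.scatter(
    x, y,
    s=20 + 30 * distance,   # size grows with the distance from the origin
    c=distance,             # numeric values, mapped to colors through cmap
    cmap='viridis',
    alpha=0.6,              # partly transparent so overlapping points stay visible
    edgecolors='black',     # border color
    linewidths=0.5,         # border thickness
    marker='o',
)

# Add a color scale explaining what the colors mean
fig.colorbar(points, ax=ax, label='distance from origin')
```

Call plt.show() afterwards to display the figure.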

6. Histogram

A histogram, also called a quality distribution chart, uses a series of vertical bars of varying heights to represent the distribution of data. The horizontal axis represents the range of the data values, and the vertical axis represents how many values fall into each part of that range.

The horizontal axis is divided into a number of equal-width intervals (bins), and each interval is drawn as a rectangular bar whose height shows the number of values in that interval. This lets us observe the distribution of the data set.

The specific code is as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Prepare the data
a = np.random.randn(100)
s = pd.Series(a)

# Draw the histogram
plt.hist(s)
plt.show()

Running the code produces the following figure:

[Figure: histogram of 100 normally distributed random values]
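The division into intervals described above can be inspected directly, because hist() returns the count in each bin and the bin edges. The sketch below asks for 10 bins explicitly; the seed and the variable names are chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random data with a fixed seed so that the result is reproducible
rng = np.random.default_rng(0)
data = rng.standard_normal(100)

fig, ax = plt.subplots()

# counts: number of values per bin; edges: the bin boundaries; patches: the bars
counts, edges, patches = ax.hist(data, bins=10)

# 10 bins are bounded by 11 edges
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:+.2f} to {right:+.2f}: {int(count)} values")
```

The counts always add up to the number of data points, here 100. Call plt.show() afterwards to display the figure.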

7. Box plot

A box plot can be used to analyze the spread, dispersion, and outliers of data. It is built from five values: the maximum (max), the minimum (min), the median, and the upper and lower quartiles (Q3, Q1).

It can be drawn with the boxplot(x, labels=None) function, where x is the data to plot and labels sets the label of each box. In Matplotlib 3.9 and later this parameter is named tick_labels, and labels is deprecated.

The specific code is as follows:

import numpy as np
import matplotlib.pyplot as plt

# Prepare the data: 5 columns of 10 values, one box per column
data = np.random.normal(size=(10,5))
labels = ['A','B','C','D','E']

# Draw the box plot with Matplotlib
# (on Matplotlib 3.9 or later, use tick_labels=labels instead)
plt.boxplot(data, labels=labels)
plt.show()

Running the code produces the following figure:

[Figure: five box plots labeled A to E]
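The five values behind a box can be computed by hand with NumPy, which helps when reading the chart. This is a sketch with made-up random data; the variable names are chosen for illustration.

```python
import numpy as np

# Ten random values with a fixed seed, standing in for one column of data
rng = np.random.default_rng(0)
values = rng.normal(size=10)

# The five-number summary
minimum = values.min()
q1, median, q3 = np.percentile(values, [25, 50, 75])
maximum = values.max()

# The interquartile range is the height of the box
iqr = q3 - q1

print(f"min={minimum:.2f}, Q1={q1:.2f}, median={median:.2f}, "
      f"Q3={q3:.2f}, max={maximum:.2f}, IQR={iqr:.2f}")
```

Note that with its default settings, Matplotlib draws the whiskers only as far as the last data point within 1.5 times the interquartile range of the box, and draws anything beyond that as individual outlier points, so the whisker ends are not always the minimum and maximum.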

8. Radar chart

The figure below is an evaluation of my project management capabilities. This kind of diagram is called a radar chart, also known as a spider chart.

[Figure: radar chart evaluating project management capabilities]

It is a way of showing a one-to-many relationship: one subject scored on several dimensions. In the figure, it is easy to see which dimensions stand out and which lag behind.

When setting up the data, we need to specify the labels and the value that corresponds to each label, that is, labels and stats.

In addition, the chart is essentially a circle, so we need to calculate the angle of each axis from the number of labels and then plot the corresponding value at that angle.

The specific code is as follows:

import numpy as np
import matplotlib.pyplot as plt
# from matplotlib.font_manager import FontProperties

# Prepare the data
# Chinese, math, English, physics, chemistry, sport
labels = np.array(["Chinese","Math","English","Physics","Chemistry","Sport"])
# Score for each label
stats = [85, 60, 90, 77, 86, 85]

# Prepare the plotting data: angles and values
# Split the full circle evenly by the number of labels
angles = np.linspace(0, 2*np.pi, len(labels), endpoint=False)
# Close the polygon by repeating the first point at the end
stats = np.concatenate((stats,[stats[0]]))
angles = np.concatenate((angles,[angles[0]]))

# Draw the radar chart
fig = plt.figure()
# Split the figure into 1 row and 1 column and draw in position 1
ax = fig.add_subplot(111, polar=True)
ax.plot(angles, stats, 'o-', color='r', linewidth=2)
ax.fill(angles, stats, facecolor='r', alpha=0.25)

# To use a Chinese font for the labels:
# font = FontProperties(fname=r"C:\Windows\Fonts\simhei.ttf", size=14)
# ax.set_thetagrids(angles[:-1] * 180/np.pi, labels, fontproperties=font)
# One tick label per axis (the repeated closing angle is skipped)
ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)
plt.show()

Running the code produces the following figure:

[Figure: radar chart of the six subject scores]
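The angle calculation is the only unusual step in a radar chart, so it is worth looking at on its own. The sketch below prints the angle assigned to each label; it reuses the six subject names, and the variable degrees is introduced here for illustration.

```python
import numpy as np

labels = ["Chinese", "Math", "English", "Physics", "Chemistry", "Sport"]

# Split the full circle (2*pi radians) into as many equal steps as there are labels.
# endpoint=False leaves out 2*pi itself, which would coincide with angle 0.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)

# Convert radians to degrees for readability
degrees = np.degrees(angles)

for label, degree in zip(labels, degrees):
    print(f"{label}: {degree:.0f} degrees")
```

With six labels, the axes are spaced 360 / 6 = 60 degrees apart, at 0, 60, 120, 180, 240 and 300 degrees.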

9. Contour map

First look at the picture below. It is a contour map, a way of drawing a geographical distribution map from existing data.

[Figure: example contour map]

A contour map connects points of the same height on the ground into closed lines and projects them directly onto a plane as horizontal curves. Contour lines of different heights do not meet unless the surface has a cliff, where the lines become so dense that they overlap. Where the terrain is flat and open, the distance between the lines is wide.

In data analysis, besides geographical data, there is plenty of other data that can be displayed as a contour map. It requires three-dimensional data: the X and Y axis data determine the coordinate points, and the Z axis data is the height at each coordinate point.

Once you have all the data, you can draw it with the contour() function. If you want filled colors, use the contourf() function.

Commonly used parameters are:

  • X: specifies the X-axis data;
  • Y: specifies the Y-axis data;
  • Z: specifies the height at each X, Y coordinate;
  • colors: specifies the color of the contour lines at different heights;
  • alpha: specifies the transparency of the contour lines;
  • cmap: specifies the colormap of the contour lines;
  • linewidths: specifies the width of the contour lines;
  • linestyles: specifies the style of the contour lines.

Then let's draw one. The specific code is as follows:

import numpy as np
import matplotlib.pyplot as plt

# Define the X-axis and Y-axis data
# Values run from -3 to 3
# 5 points on each axis, 5*5 = 25 grid points in total
x = np.linspace(-3, 3, 5)
y = np.linspace(-3, 3, 5)

# Turn the X and Y data into a grid
X, Y = np.meshgrid(x, y)

# Height function for the contour map
def f(x, y):
    return x * (y * 0.2)

# Fill with color
plt.contourf(X, Y, f(X,Y), 10, alpha = 0.75, cmap = 'rainbow')

# Draw the contour lines
con = plt.contour(X, Y, f(X,Y), 10, colors = 'black', linewidths = 0.5)

# Show the value label of each contour line
plt.clabel(con, inline = True, fontsize = 10)

# Remove the axis ticks
plt.xticks(())
plt.yticks(())

plt.show()

Running the code produces the following figure:

[Figure: filled contour plot with labeled black contour lines]
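The grid step is easier to follow when you look at what meshgrid() actually returns. The following sketch uses the same x, y and f as the code above and only prints the resulting arrays; the name Z is introduced here for illustration.

```python
import numpy as np

x = np.linspace(-3, 3, 5)
y = np.linspace(-3, 3, 5)

# meshgrid turns the two 1-D axes into two 2-D arrays of the same shape:
# X holds the x coordinate of every grid point, Y holds the y coordinate
X, Y = np.meshgrid(x, y)

def f(x, y):
    return x * (y * 0.2)

# Evaluating f on the grid gives one height per grid point
Z = f(X, Y)

print(X.shape, Y.shape, Z.shape)
print(X[0])      # x varies along each row
print(Y[:, 0])   # y varies along each column
```

All three arrays have the shape (5, 5), one entry for each of the 25 grid points, which is the form that contour() and contourf() expect.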

10. 3D graphics

Drawing a 3D plot is fun. The data it needs is basically the same as for the contour map above: the X and Y axes specify the coordinate points, and the Z axis specifies the height at each point.

We can use the scatter3D() method of a 3D axes, or the plot_surface() method of the Axes3D object. Let's try both.

Let's start with the scatter3D() method. The specific code is as follows:

from matplotlib import pyplot as plt
# Registers the 3D projection on older Matplotlib versions
from mpl_toolkits.mplot3d import Axes3D
import numpy as np

ax = plt.axes(projection='3d')

# Define the X, Y, and Z coordinates
zd = 10 * np.random.random(100)
xd = 3 * np.sin(zd)
yd = 4 * np.cos(zd)

# Draw the scatter plot
# The keyword arguments are the usual Matplotlib plotting properties
ax.scatter3D(xd, yd, zd,
             c = 'red')
plt.show()

Running the code produces the following figure:

[Figure: 3D scatter plot of red points along a spiral]

Now let's look at the plot_surface() method of the Axes3D object. The specific code is as follows:

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize = (10, 8))
# Create the 3D axes through the figure; on recent Matplotlib versions,
# Axes3D(fig) no longer adds the axes to the figure automatically
ax = fig.add_subplot(projection='3d')

# Build the X and Y data
x = np.arange(-8,8,0.25)
y = np.arange(-8,8,0.25)
x,y = np.meshgrid(x,y)

r = np.sqrt(x**2 + y**2)
# Build the Z data
z = np.sin(r)/2

# Draw the surface with plot_surface()
ax.plot_surface(x, y, z,
                rstride = 1,
                cstride = 1,
                cmap = plt.get_cmap('rainbow'))

# Project filled contours of the surface onto the plane z = -2
ax.contourf(x, y, z,
            zdir = 'z',
            offset = -2)
plt.show()

Running the code produces the following figure:

[Figure: 3D surface plot with filled contours projected below it]

Origin blog.csdn.net/qq_41340258/article/details/125567812