Artificial Intelligence (7): Use of Matplotlib

1 HelloWorld of Matplotlib

1.1 What is Matplotlib

  • A library designed mainly for drawing 2D charts (it can also draw some 3D charts)
  • Supports progressive, interactive visualization of data

1.2 Why learn Matplotlib

Visualization is a key supporting tool throughout data mining: it helps us understand the data clearly so that we can adjust our analysis methods.

  • Visualizes data and presents it more intuitively
  • Makes data more objective and convincing

For example, the following two pictures show the same data as numbers and as a chart:

1.3 Implement a simple Matplotlib drawing — taking a line chart as an example

(1) matplotlib.pyplot module

matplotlib.pyplot contains a series of plotting functions similar to those in MATLAB.

import matplotlib.pyplot as plt

(2) Graphic drawing process

Create the canvas -- plt.figure()

plt.figure(figsize=(), dpi=)

figsize: the width and height of the figure, in inches

dpi: the resolution of the image, in dots per inch

Returns a Figure object

Draw the image -- plt.plot()

plt.plot(x, y)

A line chart is used as the example here

Display the image -- plt.show()

plt.show()
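As a small check of what figsize and dpi mean, the following sketch (not part of the original post) creates a figure and reads the values back. It assumes the non-interactive Agg backend so that it runs without a display. The pixel size is simply figsize multiplied by dpi.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots per inch
fig = plt.figure(figsize=(10, 4), dpi=100)

# Read the size back from the Figure object returned by plt.figure()
width_in, height_in = fig.get_size_inches()
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi
print(width_px, height_px)

plt.close(fig)
```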

Line chart drawing and display

Example: show the temperature in Shanghai over one week. The temperatures from Monday to Sunday are plotted as follows:

import matplotlib.pyplot as plt
# 1. Create the canvas
plt.figure(figsize=(10, 10), dpi=100)
# 2. Draw the line chart
plt.plot([1, 2, 3, 4, 5, 6, 7], [17, 17, 18, 15, 11, 11, 13])
# 3. Display the image
plt.show()

1.4 The Matplotlib image structure (for reference)

2 Basic drawing functions - taking line chart as an example

2.1 Improve the original line chart - add auxiliary functions to the graph

To get familiar with the basic drawing functions, we work through the basic API by plotting temperature changes.

Requirement: draw a line chart of a city's temperature for each minute from 11:00 to 12:00. The temperature ranges from 15 to 18 degrees.

Effect:

(1) Prepare data and draw initial line chart

import matplotlib.pyplot as plt
import random
# Plot the temperature changes
# 0. Prepare the x and y data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)
# 2. Draw the line chart
plt.plot(x, y_shanghai)
# 3. Display the image
plt.show()

(2) Add custom x, y scales

  • plt.xticks(x, **kwargs)

        x: tick values to display (an optional second argument gives the label text for each tick)

  • plt.yticks(y, **kwargs)

        y: tick values to display

# Add the following lines of code
# Build the x-axis tick labels
x_ticks_label = ["11点{}分".format(i) for i in x]
# Build the y-axis ticks
y_ticks = range(40)
# Change the ticks displayed on the x and y axes
plt.xticks(x[::5], x_ticks_label[::5])
plt.yticks(y_ticks[::5])
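To confirm which ticks were actually applied, they can be read back from the current axes. The sketch below is an illustration rather than part of the original post. It assumes the Agg backend, uses constant placeholder data, and swaps the Chinese labels for hypothetical ASCII labels such as "11:05" so that it does not depend on an installed Chinese font.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

x = range(60)
y = [16.5] * 60  # placeholder data; only the ticks matter here

fig, ax = plt.subplots()
ax.plot(x, y)

# Hypothetical ASCII labels standing in for the Chinese ones
tick_labels = ["11:{:02d}".format(i) for i in x]
plt.xticks(x[::5], tick_labels[::5])   # every 5th position and its label
plt.yticks(range(0, 40, 5))

fig.canvas.draw()  # render once so the tick label texts are populated
x_positions = [int(v) for v in ax.get_xticks()]
x_texts = [t.get_text() for t in ax.get_xticklabels()]
print(x_positions)
print(x_texts)

plt.close(fig)
```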

The complete code is as follows:

import matplotlib.pyplot as plt
import random
# Plot the temperature changes
# 0. Prepare the x and y data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)
# 2. Draw the line chart
plt.plot(x, y_shanghai)
# Add the following lines of code
# Build the x-axis tick labels
x_ticks_label = ["11点{}分".format(i) for i in x]
# Build the y-axis ticks
y_ticks = range(40)
# Change the ticks displayed on the x and y axes
plt.xticks(x[::5], x_ticks_label[::5])
plt.yticks(y_ticks[::5])
# 3. Display the image
plt.show()

If Chinese font support has not been configured, the Chinese tick labels are not rendered correctly, and the chart looks like this:

Note: solving the Chinese display problem

Solution one:

Download and install a Chinese font (SimHei, a Heiti typeface)

Step 1: Download the SimHei font (or another font that supports Chinese)

Step 2: Install the font

Under Linux: copy the font to /usr/share/fonts:

sudo cp ~/SimHei.ttf /usr/share/fonts/SimHei.ttf

Under Windows and macOS: double-click the font file to install it

Step 3: Delete the cache files in ~/.matplotlib (depending on the Matplotlib version and platform, the cache may instead live in another directory such as ~/.cache/matplotlib)

rm -r ~/.matplotlib/*

Step 4: Modify the configuration file matplotlibrc

vi ~/.matplotlib/matplotlibrc

Modify the file content to:

font.family : sans-serif

font.sans-serif : SimHei

axes.unicode_minus : False
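The cache and configuration paths used in the steps above are not the same on every system. As a hedged aid that is not part of the original post, the sketch below asks Matplotlib itself where its cache directory, user configuration directory, and active matplotlibrc file are.

```python
import os
import matplotlib

# Directory that holds Matplotlib's font cache
cache_dir = matplotlib.get_cachedir()
# Directory where a user-level matplotlibrc is looked up
config_dir = matplotlib.get_configdir()
# The matplotlibrc file that is currently in effect
rc_file = matplotlib.matplotlib_fname()

print("cache dir :", cache_dir)
print("config dir:", config_dir)
print("rc file   :", rc_file, os.path.isfile(rc_file))
```

If the printed paths differ from ~/.matplotlib, use the printed ones in Step 3 and Step 4.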

Solution two:

Setting rcParams dynamically in the Python script avoids having to change the configuration file. The code is as follows:

import matplotlib as mpl

# Set a font that can display Chinese

mpl.rcParams["font.sans-serif"] = ["SimHei"]

Sometimes, after the font is changed, the minus sign on the axes is no longer displayed correctly. In that case, change the axes.unicode_minus parameter:

# Display the minus sign correctly

mpl.rcParams["axes.unicode_minus"] = False
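Setting font.sans-serif only helps if the named font is actually installed. The following sketch is an addition to the original post. It lists the fonts Matplotlib has found and picks the first available one from a candidate list. The candidate names other than SimHei are assumptions and should be replaced by whatever Chinese-capable fonts exist on your machine.

```python
import matplotlib as mpl
from matplotlib import font_manager

# Names of all fonts Matplotlib has discovered on this machine
available = {entry.name for entry in font_manager.fontManager.ttflist}

# Hypothetical preference list; adjust it to the fonts you actually installed
candidates = ["SimHei", "Microsoft YaHei", "PingFang SC", "Noto Sans CJK SC"]
chosen = [name for name in candidates if name in available]

if chosen:
    # Put the Chinese-capable fonts first and keep the existing ones as fallback
    mpl.rcParams["font.sans-serif"] = chosen + list(mpl.rcParams["font.sans-serif"])
    mpl.rcParams["axes.unicode_minus"] = False
else:
    print("No candidate font found; Chinese text may not render correctly")
```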

(3) Add grid display

A grid makes it easier to read off the values on the graph:

plt.grid(True, linestyle='--', alpha=0.5)

(4) Add description information

Add x-axis, y-axis description information and title

The font size in the image can be modified through the fontsize parameter.

plt.xlabel("时间")
plt.ylabel("温度")
plt.title("中午11点0分到12点之间的温度变化图示", fontsize=20)

(5) Image saving

# Save the image to the specified path
plt.savefig("test.png")

Note: plt.show() releases the figure resources. If you call plt.savefig() after plt.show(), only an empty image is saved, so save first and show afterwards.
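One way to make the ordering less fragile is to keep a reference to the Figure object and save through it. The sketch below is an illustration under assumptions: it uses the Agg backend and writes to a temporary directory rather than to test.png in the working directory.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 3), dpi=100)
plt.plot([1, 2, 3], [17, 18, 15])

# Save first, through the Figure object
out_path = os.path.join(tempfile.mkdtemp(), "test.png")
fig.savefig(out_path)

# plt.show() would be called here, after the image has been saved
saved_bytes = os.path.getsize(out_path)
print(saved_bytes)

plt.close(fig)
```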

Complete code:

import matplotlib.pyplot as plt
import random
import matplotlib as mpl
# Set a font that can display Chinese
mpl.rcParams["font.sans-serif"] = ["SimHei"]
# Display the minus sign correctly
mpl.rcParams["axes.unicode_minus"] = False
# 0. Prepare the data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=100)
# 2. Draw the image
plt.plot(x, y_shanghai)
# 2.1 Add the x and y ticks
# Build the x-axis tick labels and the y-axis ticks
x_ticks_label = ["11点{}分".format(i) for i in x]
y_ticks = range(40)
# Display the ticks
plt.xticks(x[::5], x_ticks_label[::5])
plt.yticks(y_ticks[::5])
# 2.2 Add the grid
plt.grid(True, linestyle="--", alpha=0.5)
# 2.3 Add the description information
plt.xlabel("时间")
plt.ylabel("温度")
plt.title("中午11点--12点某城市温度变化图", fontsize=20)
# 2.4 Save the image
plt.savefig("./test.png")
# 3. Display the image
plt.show()

2.2 Draw multiple images in one coordinate system

(1) Multiple plots

Requirement: Add the temperature change of another city

We also collected Beijing's temperature for the same period; it ranged from 1 to 3 degrees. How do we add another line to the same coordinate system? Simply call plt.plot() again, and give the lines different styles so that they can be told apart, as shown below:

# Add the temperature data for Beijing
y_beijing = [random.uniform(1, 3) for i in x]
# Draw the line chart for Shanghai
plt.plot(x, y_shanghai)
# Calling plot several times draws several lines
plt.plot(x, y_beijing, color='r', linestyle='--')

The complete code is as follows:

import random
import matplotlib as mpl
import matplotlib.pyplot as plt
# Set a font that can display Chinese
mpl.rcParams["font.sans-serif"] = ["SimHei"]
# Plot the temperature changes
# 0. Prepare the x and y data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
# Add the temperature data for Beijing
y_beijing = [random.uniform(1, 3) for i in x]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=80)
# 2. Draw the line charts
plt.plot(x, y_shanghai)
# Calling plot several times draws several lines
plt.plot(x, y_beijing, color='r', linestyle='--')
# Build the x-axis tick labels
x_ticks_label = ["11点{}分".format(i) for i in x]
# Build the y-axis ticks
y_ticks = range(40)
# Change the ticks displayed on the x and y axes
plt.xticks(x[::5], x_ticks_label[::5])
plt.yticks(y_ticks[::5])
plt.xlabel("时间")
plt.ylabel("温度")
plt.title("中午11点0分到12点之间的温度变化图示", fontsize=20)
plt.grid(True, linestyle='--', alpha=0.5)
plt.savefig("test.png")
# 3. Display the image
plt.show()

Two things are new compared with the previous chart: the lines are drawn in different styles, and a legend is needed to tell them apart.

(2) Set graphic style

Color character    Color      Style character    Line style
r                  red        -                  solid line
g                  green      --                 dashed line
b                  blue       -.                 dash-dot line
w                  white      :                  dotted line
c                  cyan       ' '                blank (no line)
m                  magenta
y                  yellow
k                  black
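The characters in the table can be passed either as keyword arguments or combined into a short format string. The sketch below is an illustration that is not in the original post and assumes the Agg backend. It draws one line each way and reads the resulting style back from the returned Line2D objects.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [0, 1, 2, 3]
y = [1, 3, 2, 4]

fig, ax = plt.subplots()
# Keyword form, as used in this article: red dashed line
(line_kw,) = ax.plot(x, y, color="r", linestyle="--")
# Format-string shorthand: color character followed by the style characters
(line_fmt,) = ax.plot(x, y, "g-.")

print(line_kw.get_linestyle(), line_fmt.get_linestyle())

plt.close(fig)
```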

(3) Display legend

Note: setting label in plt.plot() alone does not display the legend; you also need to call plt.legend().

The code is as follows:

# Draw the line chart
plt.plot(x, y_shanghai, label="上海")
# Calling plot several times draws several lines
plt.plot(x, y_beijing, color='r', linestyle='--', label="北京")
# Display the legend
plt.legend(loc="best")


Location string    Location code
'best'             0
'upper right'      1
'upper left'       2
'lower left'       3
'lower right'      4
'right'            5
'center left'      6
'center right'     7
'lower center'     8
'upper center'     9
'center'           10
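As a brief illustration of the table (a sketch, not from the original post), loc accepts either the string or the matching code. The example assumes the Agg backend and uses English labels as hypothetical stand-ins for the Chinese ones so that it does not depend on an installed Chinese font.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [15, 17, 16], label="Shanghai")
ax.plot([0, 1, 2], [1, 3, 2], color="r", linestyle="--", label="Beijing")

# The string 'upper right' and the code 1 select the same position
legend = ax.legend(loc="upper right")
legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)

plt.close(fig)
```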

The complete code is as follows:

import random
import matplotlib as mpl
import matplotlib.pyplot as plt
# Set a font that can display Chinese
mpl.rcParams["font.sans-serif"] = ["SimHei"]
# 0. Prepare the data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
y_beijing = [random.uniform(1, 3) for i in x]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=100)
# 2. Draw the image
plt.plot(x, y_shanghai, label="上海")
plt.plot(x, y_beijing, color="r", linestyle="--", label="北京")
# 2.1 Add the x and y ticks
# Build the x-axis tick labels and the y-axis ticks
x_ticks_label = ["11点{}分".format(i) for i in x]
y_ticks = range(40)
# Display the ticks
plt.xticks(x[::5], x_ticks_label[::5])
plt.yticks(y_ticks[::5])
# 2.2 Add the grid
plt.grid(True, linestyle="--", alpha=0.5)
# 2.3 Add the description information
plt.xlabel("时间")
plt.ylabel("温度")
plt.title("中午11点--12点某城市温度变化图", fontsize=20)
# 2.4 Add the legend (before saving, so that it appears in the saved image)
plt.legend(loc=0)
# 2.5 Save the image
plt.savefig("./test.png")
# 3. Display the image
plt.show()

2.3 Multiple coordinate system display—plt.subplots (object-oriented drawing method)

If we want to display the temperature charts of Shanghai and Beijing in separate coordinate systems within the same figure, the effect is as follows:

This can be done with the subplots function (there is also the older subplot function, which is less convenient to use). The subplots function is recommended.

matplotlib.pyplot.subplots(nrows=1, ncols=1, **fig_kw) creates a figure with multiple axes (coordinate system/plot area)

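Parameters:
nrows, ncols : number of rows/columns of the subplot grid
    int, optional, default: 1
Returns:
    fig : the Figure object
    axes : the corresponding number of Axes (coordinate systems)
The methods for setting ticks, labels, and the title are named differently:
    set_xticks
    set_yticks
    set_xlabel
    set_ylabel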

For more methods about the axes sub-coordinate system: refer to https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes

Note: plt.functionname() corresponds to the procedural drawing style, and axes.set_methodname() corresponds to the object-oriented drawing style.
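The relationship between the two styles can be sketched as follows. This is an illustration rather than part of the original post; it assumes the Agg backend and uses English text for the labels. The pyplot functions act on the "current" axes, which here is the same Axes object returned by plt.subplots().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Procedural (pyplot) style: acts on the current axes
plt.plot([0, 1, 2], [15, 17, 16])
plt.xlabel("time")
plt.title("pyplot style")

# The current axes is the same object we got from subplots()
same_axes = plt.gca() is ax

# Object-oriented style: call the set_* methods on the Axes directly
ax.set_ylabel("temperature")
ax.set_title("object-oriented style")  # overwrites the title set above

print(same_axes, ax.get_xlabel(), ax.get_title())

plt.close(fig)
```

The full two-city example in the object-oriented style follows.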

import random
import matplotlib as mpl
import matplotlib.pyplot as plt
# Set a font that can display Chinese
mpl.rcParams["font.sans-serif"] = ["SimHei"]
# 0. Prepare the data
x = range(60)
y_shanghai = [random.uniform(15, 18) for i in x]
y_beijing = [random.uniform(1, 3) for i in x]
# 1. Create the canvas
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), dpi=100)
# 2. Draw the images
axes[0].plot(x, y_shanghai, label="上海")
axes[1].plot(x, y_beijing, color="r", linestyle="--", label="北京")
# 2.1 Add the x and y ticks
# Build the x-axis tick labels and the y-axis ticks
x_ticks_label = ["11点{}分".format(i) for i in x]
y_ticks = range(40)
# Display the ticks
axes[0].set_xticks(x[::5])
axes[0].set_yticks(y_ticks[::5])
axes[0].set_xticklabels(x_ticks_label[::5])
axes[1].set_xticks(x[::5])
axes[1].set_yticks(y_ticks[::5])
axes[1].set_xticklabels(x_ticks_label[::5])
# 2.2 Add the grid
axes[0].grid(True, linestyle="--", alpha=0.5)
axes[1].grid(True, linestyle="--", alpha=0.5)
# 2.3 Add the description information
axes[0].set_xlabel("时间")
axes[0].set_ylabel("温度")
axes[0].set_title("中午11点--12点某城市温度变化图", fontsize=20)
axes[1].set_xlabel("时间")
axes[1].set_ylabel("温度")
axes[1].set_title("中午11点--12点某城市温度变化图", fontsize=20)
# 2.4 Add the legends (before saving, so that they appear in the saved image)
axes[0].legend(loc=0)
axes[1].legend(loc=0)
# 2.5 Save the image
plt.savefig("./test.png")
# 3. Display the image
plt.show()
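Because every call is repeated once per coordinate system, the same figure can also be built with a loop over the Axes objects. This is only a sketch of one possible refactoring, not the original author's code. It assumes the Agg backend, and the English city names and axis labels are hypothetical stand-ins for the Chinese strings.

```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

x = range(60)
# Hypothetical English names with the temperature ranges used in this article
cities = {"Shanghai": (15, 18), "Beijing": (1, 3)}

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 8), dpi=100)
for ax, (name, (low, high)) in zip(axes, cities.items()):
    y = [random.uniform(low, high) for _ in x]
    ax.plot(x, y, label=name)
    ax.set_xticks(x[::5])
    ax.set_yticks(range(0, 40, 5))
    ax.grid(True, linestyle="--", alpha=0.5)
    ax.set_xlabel("time (minute)")
    ax.set_ylabel("temperature")
    ax.set_title(name, fontsize=20)
    ax.legend(loc=0)

titles = [ax.get_title() for ax in axes]
print(titles)

plt.close(fig)
```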

2.4 Application scenarios of line charts

  • Show the number of daily active users of a company's products (by region)
  • Show the number of app downloads per day
  • Show how the number of user clicks changes over time after a new product feature is launched
  • Extension: draw graphs of various mathematical functions

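Note: In addition to drawing line charts, plt.plot() can also be used to draw graphs of various mathematical functions.

Code:

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Set a font that can display Chinese
mpl.rcParams["font.sans-serif"] = ["SimHei"]
# Display the minus sign correctly (the x axis includes negative values)
mpl.rcParams["axes.unicode_minus"] = False
# 0. Prepare the data
x = np.linspace(-10, 10, 1000)
y = np.sin(x)
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=100)
# 2. Draw the graph of the function
plt.plot(x, y)
# 2.1 Add the grid
plt.grid()
# 3. Display the image
plt.show()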

3 Drawing common chart types

Matplotlib can draw line charts, scatter plots, bar charts, histograms, and pie charts.

We need to know what each type of statistical chart means in order to choose the right one for presenting our data.

3.1 Common graphic types and meanings

  • Line chart: a statistical chart that uses the rise or fall of a line to represent an increase or decrease in a quantity.

Features: shows the trend of the data and reflects how things change. (change)

api: plt.plot(x, y)

  • Scatter plot: uses two sets of data to form coordinate points and examines their distribution, to judge whether the two variables are correlated or to summarize the distribution pattern of the points.

Features: shows whether there is a quantitative correlation between variables and reveals outliers. (distribution)

api: plt.scatter(x, y)

  • Bar chart: data arranged in the columns or rows of a worksheet can be plotted as a bar chart.

Features: plots discrete data; the size of each value is visible at a glance, which makes the differences between values easy to compare. (statistics/comparison)

api: plt.bar(x, height, width=0.8, align='center', **kwargs)

Parameters:
x : the x coordinates of the bars
height : the heights of the bars (the data to plot)
width : the width of the bars
align : how each bar is aligned to its x coordinate
    {'center', 'edge'}, optional, default: 'center'
**kwargs :
color : the color of the bars

  • Histogram: a series of vertical bars of varying heights that represent the distribution of data. Generally, the horizontal axis represents the data range, and the vertical axis represents how many values fall in each range.

Features: plots continuous data to show the distribution of one or more sets of data. (statistics)

api: matplotlib.pyplot.hist(x, bins=None)

Parameters:
x : the data to pass in
bins : the number of bins (or a sequence of bin edges)
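The article gives no worked histogram example, so here is a minimal sketch under stated assumptions: the data are 250 made-up values drawn from a normal distribution (not data from the original post), and the Agg backend is used. hist() returns the count per bin, the bin edges, and the drawn bars.

```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

random.seed(0)
# Hypothetical data: 250 values scattered around 120
data = [random.gauss(120, 15) for _ in range(250)]

fig, ax = plt.subplots(figsize=(10, 4), dpi=100)
# bins=10 splits the data range into 10 equal-width bins
counts, bin_edges, patches = ax.hist(data, bins=10)
ax.grid(True, linestyle="--", alpha=0.5)

print(len(bin_edges), int(counts.sum()))

plt.close(fig)
```

Note that 10 bins produce 11 bin edges, and the counts add up to the number of data points.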

  • Pie chart: shows the proportion of each category; categories are compared by the size of their arcs.

Features: shows the proportions of categorical data. (proportion)

api: plt.pie(x, labels=, autopct=, colors=)

Parameters:
x : the quantities; the percentages are computed automatically
labels : the name of each part
autopct : the format of the displayed percentage, for example %1.2f%%
colors : the color of each part
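The article does not include a pie chart example either. The following sketch uses three made-up categories (an assumption, not data from the original post) and the Agg backend to show how the four parameters fit together. When autopct is given, pie() also returns the text objects that hold the formatted percentages.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Hypothetical categories and quantities
labels = ["A", "B", "C"]
sizes = [50, 30, 20]

fig, ax = plt.subplots(figsize=(6, 6), dpi=100)
wedges, texts, autotexts = ax.pie(
    sizes,
    labels=labels,
    autopct="%1.2f%%",       # percentage with two decimal places
    colors=["b", "r", "g"],
)
ax.axis("equal")  # keep the pie circular

percent_texts = [t.get_text() for t in autotexts]
print(percent_texts)

plt.close(fig)
```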

3.2 Scatter plot drawing

Requirement: explore the relationship between house area and house price

House area data:

x = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01, 20.67, 288.64,
163.56, 120.06, 207.83, 342.75, 147.9 , 53.06, 224.72, 29.51,
21.61, 483.21, 245.25, 399.25, 343.35]

House price data:

y = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61, 24.9 , 239.34,
140.32, 104.15, 176.84, 288.23, 128.79, 49.64, 191.74, 33.1 ,
30.74, 400.02, 205.35, 330.64, 283.45]

Code:

import matplotlib.pyplot as plt
# 0. Prepare the data
x = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01, 20.67, 288.64,
163.56, 120.06, 207.83, 342.75, 147.9 , 53.06, 224.72, 29.51,
21.61, 483.21, 245.25, 399.25, 343.35]
y = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61, 24.9 , 239.34,
140.32, 104.15, 176.84, 288.23, 128.79, 49.64, 191.74, 33.1 ,
30.74, 400.02, 205.35, 330.64, 283.45]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=100)
# 2. Draw the scatter plot
plt.scatter(x, y)
# 3. Display the image
plt.show()
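The scatter plot shows the relationship visually. One way to put a number on it, which is an addition and not part of the original post, is the Pearson correlation coefficient computed with numpy.corrcoef on the same area and price data. A value close to 1 indicates a strong positive linear relationship.

```python
import numpy as np

# House area and house price data from this section
x = [225.98, 247.07, 253.14, 457.85, 241.58, 301.01, 20.67, 288.64,
     163.56, 120.06, 207.83, 342.75, 147.9, 53.06, 224.72, 29.51,
     21.61, 483.21, 245.25, 399.25, 343.35]
y = [196.63, 203.88, 210.75, 372.74, 202.41, 247.61, 24.9, 239.34,
     140.32, 104.15, 176.84, 288.23, 128.79, 49.64, 191.74, 33.1,
     30.74, 400.02, 205.35, 330.64, 283.45]

# corrcoef returns the 2x2 correlation matrix; [0, 1] is corr(x, y)
r = float(np.corrcoef(x, y)[0, 1])
print(round(r, 3))
```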

3.3 Bar chart drawing

Requirement: compare the box office receipts of each movie

The movie data is shown in the figure below:

  • Prepare data
['雷神3:诸神黄昏','正义联盟','东方快车谋杀案','寻梦环游记','全球风暴', '降魔传','追捕','七十七天','密战','狂兽','其它']
[73853,57767,22354,15969,14839,8725,8716,8318,7916,6764,52222]
  • Draw a bar chart

Code:

import matplotlib.pyplot as plt
import matplotlib as mpl
# Set a font that can display Chinese
mpl.rcParams["font.sans-serif"] = ["SimHei"]
# 0. Prepare the data
# Movie names
movie_name = ['雷神3:诸神黄昏','正义联盟','东方快车谋杀案','寻梦环游记','全球风暴','降魔传','追捕','七十七天','密战','狂兽','其它']
# x coordinates
x = range(len(movie_name))
# Box office data
y = [73853,57767,22354,15969,14839,8725,8716,8318,7916,6764,52222]
# 1. Create the canvas
plt.figure(figsize=(20, 8), dpi=100)
# 2. Draw the bar chart
plt.bar(x, y, width=0.5, color=['b','r','g','y','c','m','y','k','c','g','b'])
# 2.1 Change the tick labels on the x axis
plt.xticks(x, movie_name)
# 2.2 Add the grid
plt.grid(linestyle="--", alpha=0.5)
# 2.3 Add the title
plt.title("电影票房收入对比")
# 3. Display the image
plt.show()

Origin blog.csdn.net/u013938578/article/details/133975831