Python data visualization: matplotlib basics

Use matplotlib to plot

Precautions:

  1. The default pyplot font cannot display Chinese characters, so change the font through the font.sans-serif parameter before drawing so that Chinese text renders correctly. After the font is changed, the minus sign on the axes may no longer be displayed, so also set the axes.unicode_minus parameter to False:
plt.rcParams['font.sans-serif']="SimHei"
plt.rcParams['axes.unicode_minus']=False
  2. Setting the drawing style
    When drawing with matplotlib, you can use one of the drawing styles preset by the system. Use plt.style.available to list all the drawing styles that are available, and plt.style.use("ggplot") to apply a preset style.
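The two precautions can be combined into a short setup sketch. It assumes the SimHei font is installed on the machine and uses a non-interactive backend so that it runs without a display; neither assumption comes from the original text.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt

# List the preset styles shipped with this matplotlib installation.
styles = plt.style.available
print(len(styles), "styles available, for example:", styles[:3])

# Apply a style only if it exists, which avoids an error on a misspelled name.
style_name = "ggplot"
if style_name in styles:
    plt.style.use(style_name)

# Set the font parameters after the style, because a style sheet
# may overwrite rc parameters that were set earlier.
plt.rcParams["font.sans-serif"] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False
```

Setting the font after the style is a design choice: some style sheets define their own font list, so the order matters.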

1. Create a canvas and subplots

First create a blank canvas. You can optionally divide it into several parts so that multiple plots can be drawn on the same canvas.

Function name        Function
plt.figure           Create a blank canvas; you can specify its size (figsize) and resolution (dpi)
figure.add_subplot   Create and select a subplot; you can specify the number of rows, the number of columns, and the index of the selected subplot
  • figure function: matplotlib.pyplot.figure()
    1. Everything matplotlib draws is placed in a Figure object.
    2. The figsize parameter sets the width and height of the figure in inches, and therefore its aspect ratio.
  • subplot function: plt.subplot(A, B, C)
    1. A and B divide the figure window into an A × B grid, that is, A rows and B columns.
    2. C is the index of the currently selected region to draw in (numbering starts at 1).
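As a minimal sketch of the two functions above, the following creates one canvas and two side-by-side subplots. The data values and variable names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt

# A canvas that is 8 x 4 inches at 100 dots per inch.
fig = plt.figure(figsize=(8, 4), dpi=100)

# Divide the canvas into 1 row and 2 columns, then select each cell in turn.
ax_left = fig.add_subplot(1, 2, 1)   # first cell
ax_right = fig.add_subplot(1, 2, 2)  # second cell

ax_left.plot([1, 2, 3], [1, 4, 9])
ax_right.plot([1, 2, 3], [9, 4, 1])
```

Call plt.show() in an interactive session, or fig.savefig() with a file name of your choice, to view the result.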

2. Add canvas content

The second step is the main part of the drawing. Adding the title, adding the axis names, and drawing the graphics are independent steps with no required order: you can draw the graphics first or add the labels first.

Function name   Function
plt.title       Add a title to the current plot; you can specify the text, position, color, font, and other parameters
plt.xlabel      Add a name for the x-axis of the current plot; you can specify the position, color, font, and other parameters
plt.ylabel      Add a name for the y-axis of the current plot; you can specify the position, color, font, and other parameters
plt.xlim        Specify the range of the current x-axis; only a numerical range is accepted, not string identifiers
plt.ylim        Specify the range of the current y-axis; only a numerical range is accepted, not string identifiers
plt.xticks      Specify the number and values of the x-axis ticks
plt.yticks      Specify the number and values of the y-axis ticks
plt.legend      Specify the legend of the current plot; you can specify its size, position, and labels
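The functions in the table can be tried together in one short sketch. The curves and label texts below are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
plt.figure(figsize=(6, 4))

# Draw two curves; the label text is what plt.legend() displays.
plt.plot(x, [v ** 2 for v in x], label="y = x^2")
plt.plot(x, [2 * v for v in x], label="y = 2x")

plt.title("Two simple curves")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim(0, 4)                  # numerical range of the x-axis
plt.ylim(0, 16)                 # numerical range of the y-axis
plt.xticks([0, 1, 2, 3, 4])     # tick positions on the x-axis
plt.yticks([0, 4, 8, 12, 16])   # tick positions on the y-axis
plt.legend(loc="upper left")
```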

3. Set the dynamic rc parameters of pyplot

pyplot uses rc configuration files to customize the default attributes of figures; these settings are called rc configuration or rc parameters. Almost all default attributes can be controlled this way, such as the figure size, line width, color, and style, the axes and grid properties, text, and fonts. The parameters can also be changed at run time through plt.rcParams.

3.1 Commonly used rc parameters for lines
rc parameter name   Explanation                       Value
lines.linewidth     Line width                        A non-negative number; the default is 1.5
lines.linestyle     Line style                        One of "-", "--", "-.", ":"; the default is "-"
lines.marker        Shape of the points on the line   More than 20 markers such as "o", "D", "h", ".", ",", "s"; the default is None
lines.markersize    Point size                        A non-negative number; the default is 6
3.2 Common line types
linestyle value   Meaning
-                 Solid line
--                Dashed line
-.                Dash-dot line
:                 Dotted line
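A small sketch of changing the line parameters dynamically: after the rc parameters are set, a line drawn without any style arguments picks up the new defaults. The specific values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt

# Change the defaults for every line drawn after this point.
plt.rcParams["lines.linewidth"] = 3
plt.rcParams["lines.linestyle"] = "-."
plt.rcParams["lines.marker"] = "o"
plt.rcParams["lines.markersize"] = 8

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # no style arguments: rc defaults apply

# Inspect the properties the line actually received.
print(line.get_linewidth(), line.get_linestyle(), line.get_marker(), line.get_markersize())

plt.rcdefaults()  # restore matplotlib's built-in defaults afterwards
```

Restoring the defaults at the end keeps the change from leaking into later plots in the same session.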

4. Graphics drawing

4.1 Scatter plot

A scatter plot, also known as a scatter diagram, is a chart in which one feature is the abscissa and another feature is the ordinate. The distribution of the coordinate points reflects the statistical relationship between the features.
Values are represented by the positions of the points, and categories are represented by different markers, so scatter plots are commonly used to compare data across categories.

  • scatter function: matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, alpha=None, **kwargs). The common parameters are explained as follows:
parameter name   Description
x, y             Receive arrays, the data for the x-axis and y-axis. No default.
s                Receives a number or a one-dimensional array that specifies the point size; a one-dimensional array gives the size of each point. The default is None.
c                Receives a color or a one-dimensional array that specifies the point color; a one-dimensional array gives the color of each point. The default is None.
marker           Receives a specific string that sets the type of the drawn points. The default is None.
alpha            Receives a decimal between 0 and 1 that sets the transparency of the points. The default is None.
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']="SimHei"
plt.rcParams['axes.unicode_minus']=False
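data1 = pd.DataFrame({
    "姓名": ["韩梅梅", "李雷", "Lucy", "Lily", "Jim", "小明", "Amy"],  # name
    "身高": [160, 170, 163, 165, 178, 182, 168],  # height (cm)
    "体重": [48, 55, 52, 50, 60, 58, 49],  # weight (kg)
})
# Draw a scatter plot to examine the relationship between height and weight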
figure = plt.figure(figsize=(6,5))
plt.scatter(data1["身高"],data1["体重"],color="b",marker="s")
plt.xlabel("height(cm)")
plt.ylabel("weight(kg)")
plt.title("身高体重关系图")

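The s and c parameters also accept one value per point. The sketch below illustrates this with randomly generated data; the numbers, the fixed seed, and the colormap choice are assumptions made for the example, not part of the original.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)               # fixed seed so the figure is reproducible
x = rng.uniform(150, 190, size=30)           # made-up heights (cm)
y = 0.5 * x - 30 + rng.normal(0, 3, 30)      # made-up weights (kg)
sizes = np.linspace(20, 200, 30)             # one size per point
colors = y                                   # one number per point, mapped through a colormap

fig, ax = plt.subplots(figsize=(6, 5))
points = ax.scatter(x, y, s=sizes, c=colors, marker="o", alpha=0.6, cmap="viridis")
fig.colorbar(points, ax=ax, label="weight (kg)")
ax.set_xlabel("height (cm)")
ax.set_ylabel("weight (kg)")
```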

4.2 Draw a line chart

A line chart connects data points in order; it can be viewed as a scatter plot whose points are joined in order of their x-coordinates. Its main purpose is to show how the dependent variable y changes with the independent variable x, so it is best suited to continuous data that changes over time (on a common scale). It also shows differences in quantity and changes in the growth trend.

  • plot function: matplotlib.pyplot.plot(*args, **kwargs)
    The commonly used parameters are described in the following table:
parameter name   Description
x, y             Receive arrays, the data for the x-axis and y-axis. No default.
color            Receives a specific string that sets the line color. The default is None.
linestyle        Receives a specific string that sets the line type. The default is "-".
marker           Receives a specific string that sets the type of the drawn points. The default is None.
alpha            Receives a decimal between 0 and 1 that sets the transparency of the line and its points. The default is None.
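import numpy as np
import matplotlib.pyplot as plt

# Simulate a random walk: each step changes the value by -1, 0, or +1
data_x = list(range(1, 1000))
data_y = [10]
for i in range(998):
    data_y.append(data_y[i] + np.random.randint(-1, 2))
plt.plot(data_x, data_y)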

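To see the color, linestyle, marker, and alpha parameters from the table in use, here is a sketch that draws two made-up monthly series on the same axes. The series values and names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 13)                                       # months 1 to 12
rng = np.random.default_rng(1)                             # fixed seed for reproducibility
series_a = 100 + np.cumsum(rng.integers(-5, 6, size=12))   # made-up values
series_b = 100 + np.cumsum(rng.integers(-5, 6, size=12))   # made-up values

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, series_a, color="tab:blue", linestyle="-", marker="o", label="series A")
ax.plot(x, series_b, color="tab:red", linestyle="--", marker="s", alpha=0.7, label="series B")
ax.set_xlabel("month")
ax.set_ylabel("value")
ax.legend()
```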

5. Analyze the internal data distribution and dispersion of features

5.1 Draw a histogram

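A histogram is a kind of statistical chart, generally used to show the distribution of continuous data. The horizontal axis usually represents the data bins, and the vertical axis represents the number or proportion of samples that fall in each bin.
A histogram gives an intuitive view of the distribution of a product's quality characteristics, which makes it easy to judge the overall quality distribution. It can reveal data patterns, sample frequency distributions, and population distributions that a distribution table cannot.

  • hist function: matplotlib.pyplot.hist(x, bins)
    where:
    x: the one-dimensional array to draw the histogram from
    bins: either an integer, meaning the data range is divided evenly into that many bins, or a sequence giving the bin edges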
# Normally distributed data
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']="SimHei"
plt.rcParams['axes.unicode_minus']=False
mu,sigma = 100,15
x = mu+sigma*np.random.randn(10000)
# the histogram of the data
n,bins,patches = plt.hist(x,50,density=False,facecolor="g",alpha=0.75)
plt.title("Histogram of IQ")
plt.xlabel("Smart")
plt.ylabel("频数")  # "frequency count"
plt.text(60,500,r"$\mu=100,\ \sigma=15$")
plt.axis([40,160,0,600])  # axis limits: [xmin, xmax, ymin, ymax]
plt.grid(True)

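Note: the density parameter of plt.hist() is a boolean and defaults to False. When it is False, the y-axis shows the count in each bin; when it is True, the y-axis shows a probability density, normalized so that the total area of the histogram is 1.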
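The effect of density can be checked numerically. The following sketch draws the same sample twice and compares the returned bin heights; the seed and subplot layout are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)              # fixed seed for reproducibility
x = 100 + 15 * rng.standard_normal(10000)

fig, (ax_count, ax_density) = plt.subplots(1, 2, figsize=(10, 4))
counts, edges, _ = ax_count.hist(x, bins=50, density=False, facecolor="g", alpha=0.75)
dens, edges_d, _ = ax_density.hist(x, bins=50, density=True, facecolor="b", alpha=0.75)
ax_count.set_ylabel("count")
ax_density.set_ylabel("density")

# With density=False the heights add up to the sample size.
print(counts.sum())
# With density=True the heights times the bin widths add up to 1.
print((dens * np.diff(edges_d)).sum())
```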

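5.2 Draw a bar chart

A bar chart is a kind of statistical chart that uses a series of vertical bars of different heights to show the distribution of categorical data. The horizontal axis usually represents the categories, and the vertical axis represents the number or proportion of samples in each category.
A bar chart gives an intuitive view of the distribution of a product's quality characteristics, which makes it easy to judge the overall quality.

  • bar() function: plt.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
  • Common parameters:
parameter name   Description
x                Receives an array, the x-axis data (called left in old matplotlib versions). No default.
height           Receives an array, the height of each bar, that is, the quantity for each x value. No default.
width            Receives a float (typically between 0 and 1) that sets the bar width. The default is 0.8.
color            Receives a specific string or an array of color strings that sets the bar color. The default is None.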
grades = ["高一","高二","高三"]
values = [879,517,725]
plt.bar(grades,values,color="b",width=.4)
plt.title("全校人数")

Combined chart example:

year = ["2017","2018","2019","2020"]
sales = np.random.rand(4)*1000000
conv = np.random.rand(4)
fig,ax = plt.subplots(figsize=(12,8))
ax1 = ax.twinx()  # create a secondary y-axis

ax.bar(year,sales,color="skyblue")
ax1.plot(year,conv,"-o",color = "y")


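5.3 Create a pie chart

A pie chart shows the size of each item and its share of the total in a single chart; the size of each slice indicates the proportion of that item.
A pie chart clearly reflects the proportions between part and part and between part and whole, makes it easy to see the size of each group relative to the whole, and is intuitive.

  • pie() function: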
matplotlib.pyplot.pie(x, explode=None, labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=None, radius=None,)


# Counts or frequencies
data = [1, 2, 3, 4, 5]  # share of each category
# Label of each category
label = ['猫', '狗', '牛', '羊', '马']  # cat, dog, cow, sheep, horse
# Color used to draw each category
color = ['lightblue', 'lightgreen', 'lightyellow', 'pink', 'orange']
explode = (0, 0, 0, 0, 0.1)  # radial offset of each slice
plt.pie(data,colors = color,labels=label,shadow=True,autopct="%.2f%%",explode=explode)


5.4 Draw a box plot

A box plot provides key information about the location and dispersion of data, and is especially useful for comparing the dispersion of different features.
A box plot uses five statistics: the minimum, lower quartile, median, upper quartile, and maximum. It gives a rough view of properties such as the symmetry and spread of the distribution, and can be used to compare several samples.

  • boxplot function
matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None,meanline=None, labels=None,)

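Common parameters:
parameter name   Description
x                Receives an array or a sequence of arrays, the data to draw; one box is drawn per array
notch            Receives a boolean, whether to draw a notched box
sym              Receives a specific string that sets the marker used for outlier points
vert             Receives a boolean, whether the boxes are drawn vertically
whis             Receives a number that sets how far the whiskers reach, as a multiple of the interquartile range; the default is 1.5
positions        Receives an array, the positions of the boxes
widths           Receives a number or an array, the width of each box
patch_artist     Receives a boolean, whether the boxes are filled with color
meanline         Receives a boolean, whether the mean is drawn as a line
labels           Receives an array, the label of each box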
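As an illustration of the boxplot function, the sketch below compares three made-up samples with the same mean but different spread. The data, seed, and colors are assumptions for the example; the tick labels are set on the axes rather than through the labels parameter, which keeps the sketch independent of the matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)  # fixed seed for reproducibility
# Three made-up samples: same mean, standard deviations 5, 10, and 20.
samples = [rng.normal(50, s, size=200) for s in (5, 10, 20)]

fig, ax = plt.subplots(figsize=(6, 4))
result = ax.boxplot(samples, notch=False, sym="r.", whis=1.5,
                    widths=0.5, patch_artist=True)
ax.set_xticklabels(["sd = 5", "sd = 10", "sd = 20"])
for box in result["boxes"]:
    box.set_facecolor("lightblue")  # possible because patch_artist=True
ax.set_ylabel("value")

# The quartiles of the first sample, computed directly with NumPy.
# Note that the whiskers stop at the most extreme points within
# 1.5 times the interquartile range, so they can differ from min and max.
q1, median, q3 = np.percentile(samples[0], [25, 50, 75])
print(samples[0].min(), q1, median, q3, samples[0].max())
```

A wider box and longer whiskers indicate greater dispersion, which is what the three samples are constructed to show.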

5.5 Radar chart

The radar chart is suitable for displaying variables in three or more dimensions. It displays multivariate data as a chart in which three or more variables are plotted on axes that start from the same point. The relative position and angle of the axes are usually meaningless.
Radar charts are very useful for seeing which variables have similar values and whether there are outliers among the variables. They also show which variables score higher or lower in a data set, so they are well suited to performance-related data and are often used to display rankings, evaluations, and reviews.

fig = plt.figure(figsize=(10,5))
# Player data (scores from 0 to 10)
data4 = np.array([[3.2, 1.7, 1.9, 2.5, 8.0],
                  [8.2, 6.9, 5.4, 1.7, 3.6],
                  [5.2, 4.2, 8.7, 0.5, 1.7],
                  [7.4, 5.4, 4.1, 3.5, 6.2]])
n,k = data4.shape  # number of rows (players) and columns (dimensions)
# Names of the dimensions: farming speed, kills and assists, damage output, control duration, damage absorbed
names = ['打钱速度', '击杀助攻', '输出能力', '控制时长', '吸收伤害']
ax = fig.add_subplot(111,polar=True)  # use polar coordinates
angles = np.linspace(0,2*np.pi,k,endpoint=False)  # evenly spaced angles around the circle
angles = np.concatenate((angles,[angles[0]]))  # close the loop by appending the first angle at the end
Linestyle = ['bo-', 'ro:', 'gD--', 'yv-.']  # marker and line formats
Fillcolor = ['b', 'r', 'g', 'y']  # fill colors, matching the line colors
for i in range(n):
    data = np.concatenate((data4[i],[data4[i][0]]))  # close each data series so that it forms a closed shape
    ax.plot(angles,data,Linestyle[i],linewidth = 2)
    ax.fill(angles,data,facecolor=Fillcolor[i],alpha=0.25)
ax.set_thetagrids(angles[:-1] * 180/np.pi, names)  # show the dimension names
ax.set_title("玩家能力值对比图", va='bottom')  # title: "Player ability comparison"
ax.set_rlim(0, 11)  # radial range of the scores
ax.grid(True)  # show the grid



Origin blog.csdn.net/ava_zhang2017/article/details/108497056