[Quick Review of Python Language]—Basics of Data Visualization

Table of contents

Introduction

1. Matplotlib module (commonly used)

1. Drawing process & commonly used diagrams

2. Draw subplots & add annotations

3. Object-oriented drawing

4. Pylab module application

2. Seaborn module (commonly used)

1. Commonly used plots

2. Code examples

3. Artist module

4. Pandas drawing

1. Data frame & series

2. Commonly used drawing functions in pandas


Introduction

There are many ways to implement data visualization in Python. This article introduces several popular data visualization modules, chosen based on actual project needs: the Pyplot module, the Seaborn module, the Artist module, and the Pandas module. (Personally, I most often use pyplot and seaborn.)

1. Matplotlib module (commonly used)

Matplotlib provides a set of command-style APIs similar to MATLAB, suitable for interactive plotting. It can also easily be embedded in GUI applications as a plotting widget. The gallery in the official documentation, https://matplotlib.org/3.1.1/gallery/index.html, includes the source code for each example figure.

1. Drawing process & commonly used diagrams

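① Import matplotlib.pyplot and numpy
② Define the x-axis values and, with them as the independent variable, define the function for the y-axis
③ Use figure() to specify the figure size (width and height, and therefore the aspect ratio)
④ Use plot() to draw the function
⑤ Use pyplot's property functions to set the figure's attributes
⑥ Use show() to display the figure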

Format:

plt.plot(x, y, other parameters)

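Other parameters include label, color, linewidth, and a format string such as "b--", which specifies the color and the line style at the same time. The line styles are solid line (-), dashed line (--), dash-dot line (-.), dotted line (:), and no line (''); a dot (.) draws point markers.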
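As a rough illustration of the format strings described above, the sketch below draws the same curve with four different format strings. The `styles` dictionary, the vertical offsets, and the choice of the non-interactive "Agg" backend are assumptions made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig, ax = plt.subplots(figsize=(8, 4))

# Each format string combines a color letter with a line style and/or a marker.
styles = {
    "b--": "blue, dashed line",
    "r-.": "red, dash-dot line",
    "g:": "green, dotted line",
    "k.": "black, point markers without a line",
}

lines = []
for offset, (fmt, description) in enumerate(styles.items()):
    # Shift each curve upwards so the four styles do not overlap.
    (line,) = ax.plot(x, np.sin(x) + offset, fmt, label=f"{fmt}: {description}")
    lines.append(line)

ax.legend()
```

The format string is passed as the third positional argument; keyword arguments such as color and linewidth can still be added after it.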

Commonly used graph types:

Line chart plt.plot demonstration:

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,10,1000)
y = np.sin(x)
z = np.cos(x**2)
plt.figure(figsize=(8,4))
plt.plot(x,y,label = "$sin(x)$",color = "red",linewidth = 2)  # plot and set the line's label, color, and width
plt.plot(x,z,label = "$cos(x^2)$",color = "blue",linewidth = 1)
plt.xlabel("Times")
plt.ylabel("Volt")
plt.title("PyplotTest")
plt.ylim(-1.2,1.2)  # display range of the y-axis
plt.legend()  # show the legend, which tells which line belongs to which function
plt.show()

2. Draw subplots & add annotations


In Matplotlib, an Axes object represents one drawing area. A drawing object (Figure) can contain multiple Axes, which can be understood as subplots. You can use the subplot function to quickly draw charts with multiple Axes:

subplot(numRows,numCols,plotNum)

Divide the drawing area into numRows x numCols sub-areas, numbered from left to right and top to bottom, starting with number 1. When all three parameters are less than 10, you can omit the comma between them.
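To make the numbering rule concrete, here is a small sketch of a 2 x 2 grid that mixes the comma form with the three-digit shorthand. The plotted curves and the titles are placeholders chosen for this example, and the "Agg" backend is assumed so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
fig = plt.figure(figsize=(8, 6))

# 2 rows x 2 columns; positions are numbered 1..4, left to right, top to bottom.
ax1 = plt.subplot(2, 2, 1)  # top-left
ax2 = plt.subplot(222)      # top-right, same as subplot(2, 2, 2)
ax3 = plt.subplot(2, 2, 3)  # bottom-left
ax4 = plt.subplot(224)      # bottom-right, same as subplot(2, 2, 4)

for number, ax in enumerate([ax1, ax2, ax3, ax4], start=1):
    ax.plot(x, np.sin(number * x))
    ax.set_title(f"plotNum = {number}")

fig.tight_layout()  # avoid overlapping titles and tick labels
```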

Annotations add explanatory text to a plot:
① The text() function places text at any position in the Axes to mark certain features of the plot
② The annotate() method provides helper features for positioning, making annotation accurate and convenient
Both the annotated point and the text position are given as (x, y) tuples: the parameter xy is the position of the annotated point, and the parameter xytext is the position of the text

③...

# Subplot demonstration (continues with x, y, z from the example above)
fig = plt.figure(figsize=(8,4))
ax = fig.add_subplot(211)  # create an Axes object
plt.subplot(2,1,1)  # 2 rows x 1 column of sub-areas, position 1
plt.plot(x,y,label = "$sin(x)$",color = "red",linewidth = 2)
plt.ylabel("y-Volt")
plt.legend()
plt.subplot(2,1,2)  # 2 rows x 1 column of sub-areas, position 2
plt.plot(x,z,label = "$cos(x^2)$",color = "blue",linewidth = 1)
plt.ylabel("z-Volt")
plt.xlabel("Times")
ax.annotate("sin(x)",xy=(2,1),xytext=(3,1.5),arrowprops = dict(facecolor='black',shrink = 0.05))  # add text and a black arrow (simple Artist types from the Artist module)
ax.set_ylim(-2,2)
plt.show()
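The demonstration above uses annotate() only. As a complement, the following sketch places a free-standing label with text() next to an annotate() call on the same Axes. The coordinates and label strings are made up for illustration, and the "Agg" backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), color="red")
ax.set_ylim(-2, 2)

# text(): put a string at data coordinates (6, 1.5); no arrow, no target point.
label = ax.text(6, 1.5, "free-standing text", fontsize=10)

# annotate(): xy is the point being marked, xytext is where the text is placed.
peak_x = np.pi / 2  # first maximum of sin(x)
note = ax.annotate(
    "local maximum",
    xy=(peak_x, 1.0),
    xytext=(3, 1.6),
    arrowprops=dict(facecolor="black", shrink=0.05),
)
```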

3. Object-oriented drawing

4. Pylab module application


Pylab is also a module in matplotlib. It provides tools for plotting two-dimensional and three-dimensional data, and it bundles common functions from the numpy and pyplot modules to make quick calculation and plotting easier.

2. Seaborn module (commonly used)

Seaborn is built on matplotlib and provides higher-level methods for statistical graphics.

1. Commonly used plots

2. Code examples

Below is a code demonstration taken from the feature engineering (data preprocessing) step of a logistic regression model (a classification algorithm) on the Titanic data set:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import preprocessing

titanic_data = pd.read_csv("titanic_data.csv")   # information on Titanic survivors and victims
titanic_data = titanic_data[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Embarked', 'Fare']].copy()  # select the 8 columns needed
# 1. Feature engineering
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].mean())  # Age has 177 missing values; replace them with the mean
titanic_data.dropna(inplace=True)  # Embarked has only 2 missing values; drop those rows
titanic_data_X = titanic_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Embarked', 'Fare']]
titanic_data_Y = titanic_data[['Survived']]  # separate the independent variables X from the dependent variable Y (two classes, 1 or 0: survived or not)
X_train, X_test, Y_train, Y_test = train_test_split(titanic_data_X, titanic_data_Y,test_size=0.20)  # split the data into a training set and a test set
X_train = X_train.copy()  # work on a copy so the column assignments below do not raise SettingWithCopyWarning
seaborn.countplot(x='Pclass', data = X_train)  # bar chart of Pclass (ticket class)
plt.show()

seaborn.displot(X_train['Age'], kde=True)      # distribution of Age (histogram + kernel density estimate)
plt.show()

seaborn.displot(X_train['Fare'], kde=True)     # distribution of Fare (histogram + kernel density estimate)
plt.show()

age_scaler = StandardScaler()                  # Z-score standardizer for the numeric feature Age
age_scaler.fit(X_train[['Age']])
X_train['Age'] = age_scaler.transform(X_train[['Age']]).ravel()   # double [] selects a 2-D DataFrame, as the scaler expects

fare_scaler = StandardScaler()                 # Z-score standardizer for the numeric feature Fare
fare_scaler.fit(X_train[['Fare']])
X_train['Fare'] = fare_scaler.transform(X_train[['Fare']]).ravel()  # double []

X_train['Sex'] = X_train['Sex'].map({'female': 0, 'male': 1})  # map Sex to 0/1

embarked_encoder = preprocessing.LabelEncoder()  # encoder for Embarked (3 ports of embarkation)
embarked_encoder.fit(X_train['Embarked'])        # LabelEncoder expects 1-D input, so single [] here
X_train['Embarked'] = embarked_encoder.transform(X_train['Embarked'])

# All columns are now numeric; use a heatmap to check the correlations between features
seaborn.heatmap(X_train.corr())
plt.show()
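The StandardScaler steps above apply Z-score standardization, z = (x - mean) / std. The sketch below reproduces that arithmetic with NumPy alone. The `ages` array holds made-up values, not Titanic data, and the population standard deviation (ddof=0) is used because that is what StandardScaler computes.

```python
import numpy as np

# Hypothetical ages, used only to illustrate the arithmetic.
ages = np.array([22.0, 38.0, 26.0, 35.0, 54.0])

mean = ages.mean()
std = ages.std()  # population standard deviation (ddof=0)

# Z-score: shift to zero mean, then scale to unit standard deviation.
z_scores = (ages - mean) / std

print("standardized ages:", z_scores)
print("mean after scaling:", z_scores.mean())
print("std after scaling:", z_scores.std())
```

After the transformation the column has mean 0 and standard deviation 1 (up to floating-point rounding), which puts Age and Fare on a comparable scale.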

3. Artist module


The API of the Matplotlib plotting library has three layers: the canvas (FigureCanvas, the area that is drawn on), the renderer (Renderer, which knows how to draw on the canvas), and artist.Artist (which knows how to use the renderer to draw). Compared with the Pyplot and Pylab APIs, Artist handles all the high-level structures, such as the drawing and layout of charts, text, and curves, so you do not need to deal with the low-level drawing details.
Artists come in two types: simple and container. Simple Artists are standard drawing components such as Line2D, Rectangle, Text, and AxesImage; container Artists group many simple Artists into a whole, such as Axis, Axes, and Figure.

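Steps:

① Create a Figure object
② Use the Figure object to create one or more Axes or Subplot objects
③ Call methods of the Axes (or similar) objects to create the various simple Artists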
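The three steps above can be sketched as follows. The sine curve and the title text are placeholders chosen for this example, and the "Agg" backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D
from matplotlib.text import Text

x = np.linspace(0, 10, 1000)

# Step 1: create the Figure (a container Artist).
fig = plt.figure(figsize=(8, 4))

# Step 2: use the Figure to create an Axes (also a container Artist).
ax = fig.add_subplot(1, 1, 1)

# Step 3: Axes methods create simple Artists and return them.
(line,) = ax.plot(x, np.sin(x), color="red", linewidth=2, label="$sin(x)$")  # Line2D
title = ax.set_title("Artist steps")                                         # Text
ax.legend()

print(type(line).__name__, type(title).__name__)
```

Keeping the returned objects (`line`, `title`) lets you adjust them later through their own methods instead of going through pyplot's global state.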

Every element in a chart drawn by Matplotlib is controlled by an Artist, and each Artist object has many attributes that control how it is displayed. Common attributes:

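alpha: transparency, from 0 (fully transparent) to 1 (fully opaque)
animated: Boolean, used when drawing animations
axes: the Axes object that contains this Artist; may be None
figure: the Figure object that contains this Artist; may be None
label: text label
picker: controls picking (selection) of the Artist
zorder: controls the drawing order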

All attributes can be read and written through the corresponding get_* and set_* functions. For example, to set alpha to half of its current value (this assumes alpha has already been set, because get_alpha() returns None otherwise):

fig.set_alpha(0.5*fig.get_alpha())

To set multiple properties in one line of code, use set():

fig.set(alpha = 0.5,zorder = 2,label = '$sin(x)$')
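The same getters and setters work on any Artist, not only on a Figure. The sketch below applies them to a Line2D and sets alpha explicitly before halving it, because get_alpha() returns None until a value has been set. The starting value 0.8 and the linewidth are arbitrary choices for this example, and the "Agg" backend is assumed so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))

initial_alpha = line.get_alpha()        # None: no alpha has been set yet
line.set_alpha(0.8)                     # give alpha a starting value
line.set_alpha(0.5 * line.get_alpha())  # halve the current value

# set() updates several properties in one call.
line.set(zorder=2, label="$sin(x)$", linewidth=3)

print(initial_alpha, line.get_alpha(), line.get_zorder(), line.get_label())
```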

4. Pandas drawing

Pandas is one of Python's most powerful data analysis and exploration tools, offering advanced data structures and sophisticated tools. It is built on numpy, which makes numpy-centered applications more convenient; it supports SQL-like data operations and has rich data processing functions; its plotting relies on matplotlib, and the two are usually used together.

1. Data frame & series

pandas comes with two important data structures: the data frame and the series
① Data frame

A data frame is a two-dimensional table in which both rows and columns are indexed, and row-oriented and column-oriented operations are symmetrical. There are many ways to create a data frame; a dictionary of equal-length lists or NumPy arrays is often used. The row index starts from 0 by default, and the column index comes from the dictionary keys. Both can be customized, but custom column names must match the dictionary keys, otherwise the data in that column is null.

import pandas as pd
data = {'name':['小明','小红','小刚','小强','大壮'],
        'age':[15,16,14,18,20],
        'score':[88,99,65,95,67]
        }
dataframe1 = pd.DataFrame(data)
dataframe2 = pd.DataFrame(data,columns=['name','age','score'],index=['one','two','three','four','five'])
print(dataframe1)
print(dataframe2)
Output:
  name  age  score
0   小明   15     88
1   小红   16     99
2   小刚   14     65
3   小强   18     95
4   大壮   20     67
      name  age  score
one     小明   15     88
two     小红   16     99
three   小刚   14     65
four    小强   18     95
five    大壮   20     67
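Two of the points above are not covered by the example: a column name that does not match a dictionary key, and building a data frame from a NumPy array. The sketch below shows both under simple assumptions; the column name "height" and the array contents are invented for this illustration.

```python
import numpy as np
import pandas as pd

data = {"name": ["A", "B", "C"], "age": [15, 16, 14]}

# Default row index: 0, 1, 2.
df_default = pd.DataFrame(data)

# "height" is not a key of the dictionary, so that column is filled with NaN.
df_missing = pd.DataFrame(data, columns=["name", "age", "height"])

# A 2-D NumPy array works as well; row and column labels are given explicitly.
array = np.arange(6).reshape(3, 2)
df_array = pd.DataFrame(array, columns=["a", "b"], index=["one", "two", "three"])

print(df_missing)
print(df_array)
```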

② Series

A series is a collection of values that share the same attribute. It can be understood as a one-dimensional array (a degenerate data frame).

print(dataframe2['name'])
Output:
one      小明
two      小红
three    小刚
four     小强
five     大壮
Name: name, dtype: object
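A series can also be created directly, without going through a data frame. The following sketch assumes a small, made-up set of scores; the index labels and the name "score" are illustrative only.

```python
import pandas as pd

# A one-dimensional labelled array: values, an index, and an optional name.
scores = pd.Series([88, 99, 65], index=["one", "two", "three"], name="score")

second = scores["two"]     # look up a value by its label
average = scores.mean()    # aggregate over all values

# Converting back gives a data frame with a single column named "score".
frame = scores.to_frame()

print(second, average)
print(frame)
```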

2. Commonly used drawing functions in pandas

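plot(): draws a two-dimensional line plot (available in both matplotlib and pandas)
pie(): draws a pie chart (available in both matplotlib and pandas)
hist(): draws a two-dimensional histogram (available in both matplotlib and pandas)
boxplot(): draws a box plot of the sample data (pandas)
plot(logy = True): draws the plot with a logarithmic y-axis (pandas)
plot(yerr = error): draws a plot with error bars (pandas)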
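The functions listed above can be tried together on a small data frame. This is only a sketch: the numbers reuse the age and score columns from the earlier example, the `error` values are hypothetical, and the "Agg" backend is assumed so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"age": [15, 16, 14, 18, 20],
                   "score": [88, 99, 65, 95, 67]})

fig, axes = plt.subplots(2, 3, figsize=(12, 6))

df.plot(ax=axes[0, 0], title="plot()")                        # line plot, one line per column
df["score"].plot.pie(ax=axes[0, 1], title="pie()")            # pie chart of one column
df["score"].plot.hist(ax=axes[0, 2], bins=5, title="hist()")  # histogram of one column
df.boxplot(ax=axes[1, 0])                                     # box plot, one box per column
df.plot(ax=axes[1, 1], logy=True, title="plot(logy=True)")    # logarithmic y-axis

error = [2, 3, 1, 2, 4]  # hypothetical measurement errors, one per row
df["score"].plot(ax=axes[1, 2], yerr=error, title="plot(yerr=error)")  # error bars

fig.tight_layout()
```

Passing `ax=` sends each pandas plot to a specific matplotlib Axes, which shows how the two libraries are used together.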

Origin blog.csdn.net/weixin_51658186/article/details/134165613