One, getting to know matplotlib
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and in cross-platform interactive environments. It can draw static, animated, and interactive charts.
Matplotlib can be used in Python scripts, the Python and IPython shells, Jupyter notebooks, web application servers, and various graphical user interface toolkits.
Matplotlib is the foundation of data visualization in Python and has become the recognized standard plotting tool for the language. The plotting interfaces of pandas and seaborn that we are familiar with are in fact high-level wrappers built on top of matplotlib.
To understand matplotlib better, let us start with some of its most basic concepts and then move gradually to more advanced techniques.
Two, the simplest plotting example
Matplotlib draws its images on figures (for example, windows or Jupyter widgets), and each figure contains one or more axes (a sub-area in which a coordinate system can be specified). The easiest way to create a figure and its axes is the pyplot.subplots function. Once the axes exist, you can call Axes.plot to draw the simplest line chart.
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots() # create a figure containing a single axes
ax.plot([1, 2, 3, 4], [1, 4, 2, 3]) # plot the data on the axes
[<matplotlib.lines.Line2D at 0x29936f8d588>]
As in MATLAB, you can also draw an image in a simpler way. The matplotlib.pyplot functions draw directly on the current axes. If no axes have been created yet, matplotlib creates a figure and an axes for you automatically. So the example above can be reduced to the following single line of plotting code.
plt.plot([1, 2, 3, 4], [1, 4, 2, 3])
[<matplotlib.lines.Line2D at 0x29937023948>]
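To make the "current axes" idea concrete, here is a minimal sketch that asks pyplot which figure and axes it created behind the scenes. The Agg backend is selected only as an assumption so the snippet runs without a display; in a notebook you can drop that line.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean state: no figure exists yet
lines = plt.plot([1, 2, 3, 4], [1, 4, 2, 3])  # pyplot creates a figure and an axes implicitly

fig = plt.gcf()  # "get current figure": the figure pyplot just created
ax = plt.gca()   # "get current axes": the axes the line was drawn on

# The Line2D returned by plt.plot belongs to the implicitly created axes.
same_axes = lines[0].axes is ax
n_axes = len(fig.axes)
print(same_axes, n_axes)
```

plt.gcf() and plt.gca() are how pyplot-style code can get hold of the same objects that plt.subplots() returns explicitly.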
Three, the composition of a Figure
Now let's take a closer look at what a figure is made of. Looking at the anatomy of a figure, a complete matplotlib image usually consists of the following four levels, also called containers, which are described in detail in the next section. In matplotlib, we manipulate each part of the image through the corresponding methods to achieve the final visualization. A complete image is really a collection of sub-elements.
- Figure: the top level, which holds all drawing elements
- Axes: the core of the matplotlib universe, containing the many elements that make up a subplot; a figure can consist of one or more subplots
- Axis: the level below Axes, which handles all elements related to the coordinate axes and the grid
- Tick: the level below Axis, which handles all elements related to the scale marks
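The four levels can be walked from top to bottom in code. The following is a minimal sketch, not a full tour of the container API; the Agg backend is again only an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])

# Figure -> Axes: a figure keeps a list of the axes it contains.
axes_in_figure = fig.axes

# Axes -> Axis: each axes owns an x-axis object and a y-axis object.
x_axis = ax.xaxis
y_axis = ax.yaxis

# Axis -> Tick: each axis owns the tick objects that mark its scale.
x_ticks = x_axis.get_major_ticks()

print(type(fig).__name__)         # Figure
print(type(x_axis).__name__)      # XAxis
print(type(x_ticks[0]).__name__)  # XTick
```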
Four, input types for plotting functions
All plotting functions expect numpy.array or numpy.ma.masked_array objects as input. "Array-like" classes, such as pandas data objects and numpy.matrix, may or may not work as expected. It is best to convert these to numpy.array objects before plotting.
- For example, to convert a pandas.DataFrame
import pandas
a = pandas.DataFrame(np.random.rand(4, 5), columns = list('abcde'))
a_asarray = a.values
a_asarray
array([[0.64315283, 0.71255487, 0.63248487, 0.12595602, 0.99449794],
[0.16308167, 0.61709654, 0.66262482, 0.80364086, 0.62175289],
[0.26725519, 0.40534232, 0.80591293, 0.18886383, 0.76421408],
[0.45280963, 0.54372388, 0.51494903, 0.77928021, 0.39712175]])
- Convert numpy.matrix
b = np.matrix([[1, 2], [3, 4]])
b_asarray = np.asarray(b)
b
matrix([[1, 2],
[3, 4]])
b_asarray
array([[1, 2],
[3, 4]])
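The other accepted input type, numpy.ma.masked_array, is handy when some values should be hidden from the plot. Below is a minimal sketch; the value -99.0 is a hypothetical "missing data" marker chosen for illustration, and the Agg backend is only an assumption so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

y = np.array([1.0, 4.0, -99.0, 3.0])           # -99.0 is a hypothetical missing-value marker
y_masked = np.ma.masked_where(y == -99.0, y)   # mask the marker so it is not drawn

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3, 4], y_masked)      # the line is broken at the masked point

# np.asarray turns a numpy.matrix into a plain ndarray holding the same values.
m = np.matrix([[1, 2], [3, 4]])
m_arr = np.asarray(m)
print(type(y_masked).__name__, type(m_arr).__name__)
```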
Five, two plotting interfaces
matplotlib provides two commonly used plotting interfaces:
- Explicitly create figures and axes and call plotting methods on them, also known as the OO (object-oriented) style
- Rely on pyplot to create figures and axes automatically and to do the plotting
Using the first interface looks like this:
x = np.linspace(0, 2, 100) # returns 100 evenly spaced numbers between 0 and 2
fig, ax = plt.subplots()
ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic')
ax.plot(x, x**3, label='cubic')
ax.set_xlabel('x label')
ax.set_ylabel('y label')
ax.set_title("Simple Plot")
ax.legend() # legend() displays the legend
<matplotlib.legend.Legend at 0x299370a6d88>
If you use the second drawing interface to draw the same graph, the code is like this:
x = np.linspace(0, 2, 100)
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')
plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
<matplotlib.legend.Legend at 0x29937137e48>
In fact, there is a third approach for embedding Matplotlib in GUI applications, which drops pyplot completely, even for figure creation. We won't discuss it here; see the corresponding section in the gallery for more information (Matplotlib embedded in the graphical user interface).
Matplotlib's documentation and examples use both the OO and the pyplot styles. Both are equally powerful, and you are free to use either one; however, it is better to pick one and stick to it instead of mixing them. In general, we recommend limiting pyplot to interactive plotting (for example, in Jupyter notebooks) and preferring the OO style for non-interactive plotting (in functions and scripts that are intended to be reused as part of a larger project).
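As an illustration of that recommendation, here is a minimal sketch of OO-style code wrapped in a reusable function for non-interactive use. The function name render_powers and its parameters are hypothetical, and the Agg backend is an assumption so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts and servers
import matplotlib.pyplot as plt
import numpy as np


def render_powers(max_x=2.0, n_points=100):
    """Hypothetical helper: plot x, x**2 and x**3 and return the image as PNG bytes."""
    x = np.linspace(0, max_x, n_points)
    fig, ax = plt.subplots()
    ax.plot(x, x, label="linear")
    ax.plot(x, x**2, label="quadratic")
    ax.plot(x, x**3, label="cubic")
    ax.set_xlabel("x label")
    ax.set_ylabel("y label")
    ax.set_title("Simple Plot")
    ax.legend()

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")  # write the figure into memory instead of showing it
    plt.close(fig)                     # release the figure so repeated calls do not pile up
    return buffer.getvalue()


png_bytes = render_powers()
print(len(png_bytes) > 0)
```

Because every call works on its own fig and ax, the function does not depend on pyplot's notion of a "current" figure, which is what makes the OO style easier to reuse inside larger programs.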
Six, plotting with a helper function
You will often find yourself drawing the same kind of plot over and over with different data sets, which calls for a specialized function that does the plotting. The recommended function signature looks like this:
def my_plotter(ax, data1, data2, param_dict):
    """
    A helper function to make a graph

    Parameters
    ----------
    ax : Axes
        The axes to draw to
    data1 : array
        The x data
    data2 : array
        The y data
    param_dict : dict
        Dictionary of kwargs to pass to ax.plot

    Returns
    -------
    out : list
        list of artists added
    """
    out = ax.plot(data1, data2, **param_dict)
    return out
You can then use it like this:
data1, data2, data3, data4 = np.random.randn(4, 100)
fig, ax = plt.subplots(1, 1)
my_plotter(ax, data1, data2, {'marker': 'x'})
[<matplotlib.lines.Line2D at 0x7ff6082f6e10>]
Or, if you want two subplots, like this:
fig, (ax1, ax2) = plt.subplots(1, 2)
my_plotter(ax1, data1, data2, {'marker': 'x'})
my_plotter(ax2, data3, data4, {'marker': 'o'})
[<matplotlib.lines.Line2D at 0x7ff604ec0c90>]
Reference
1. Matplotlib official website user guide
Exercise
When do you usually use data visualization in your work or study, and what do you hope to achieve through visualization?