Data visualization and Matplotlib

In recent years, data processing, analysis, and visualization have become one of the most important application areas of Python. Data visualization means presenting data as attractive charts in order to discover the patterns and hidden information the data contains. It is closely tied to data mining and big data analysis, and the ultimate goal of these fields, like that of the currently popular "deep learning", is to predict future situations from past data. Python does very well at data visualization: even a personal computer can explore data sets of a million records or more, and most of the work can be done with existing third-party libraries (no need to "reinvent the wheel"). Matplotlib stands out among Python plotting libraries. It includes many tools for creating all kinds of graphics (including scatter plots, line charts, histograms, pie charts, radar charts, and so on), and the Python scientific computing community often uses it for data visualization work.

Installing Matplotlib

You can install Matplotlib with pip, using the following command.

pip install matplotlib
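
One quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal check; the variable name version is chosen here for illustration.

```python
# Quick check that Matplotlib can be imported after installation
import matplotlib

version = matplotlib.__version__
print('Matplotlib version:', version)
```

If the import fails with ModuleNotFoundError, pip probably installed the package for a different Python interpreter than the one running the script.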

Draw a line chart

# coding: utf-8
import matplotlib.pyplot as plt


def main():
    # List holding the x-axis data
    x_values = [x for x in range(1, 11)]
    # List holding the y-axis data
    y_values = [x ** 2 for x in range(1, 11)]
    # Set the chart title and the x- and y-axis labels
    plt.title('Square Numbers')
    plt.xlabel('Value', fontsize=18)
    plt.ylabel('Square', fontsize=18)
    # Set the font size of the tick labels
    plt.tick_params(axis='both', labelsize=16)
    # Draw the line chart
    plt.plot(x_values, y_values)
    plt.show()


if __name__ == '__main__':
    main()

Running the program produces the result shown below.

[Image: ./res/result1.png]

If you are working in a Jupyter notebook, use the magic command %matplotlib inline so that the chart is displayed in the page, as shown below.

[Image: ./res/result-in-jupyter.png]

Draw a scatter plot

Replacing the plot function in the code above with the scatter function draws a scatter plot instead. The result is shown below.

[Image: ./res/result2.png]
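
The substitution described above might look like the following sketch. It reuses the data and labels from the line chart example. Two things are assumptions made here so that it runs without opening a window: the non-interactive Agg backend and the output file name scatter.png.

```python
import matplotlib
# Assumption: non-interactive Agg backend, so the sketch runs without a display
matplotlib.use('Agg')
import matplotlib.pyplot as plt


def draw_scatter():
    x_values = [x for x in range(1, 11)]
    y_values = [x ** 2 for x in range(1, 11)]
    plt.title('Square Numbers')
    plt.xlabel('Value', fontsize=18)
    plt.ylabel('Square', fontsize=18)
    plt.tick_params(axis='both', labelsize=16)
    # scatter draws one marker per (x, y) pair instead of a connected line
    return plt.scatter(x_values, y_values)


points = draw_scatter()
# Hypothetical output file name; with an interactive backend, call plt.show() instead
plt.savefig('scatter.png')
```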

You can also turn the line chart into a scatter plot directly by setting the line shape and color in the plot function. The code is shown below; the parameter 'xr' means that each point is drawn as an 'x' marker in red (r for red).

plt.plot(x_values, y_values, 'xr')

Run the program again; the result is shown below.

[Image: ./res/result3.png]
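
To make the 'xr' format string easier to read, the sketch below draws the same points twice: once with the format string and once with the equivalent keyword arguments. The variable names fmt_line and kw_line are illustrative, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
# Assumption: non-interactive Agg backend, so the sketch runs without a display
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x_values = [x for x in range(1, 11)]
y_values = [x ** 2 for x in range(1, 11)]

# Format string: 'x' marker, 'r' (red) color, no connecting line
fmt_line, = plt.plot(x_values, y_values, 'xr')
# The same style spelled out with keyword arguments
kw_line, = plt.plot(x_values, y_values, marker='x', color='red', linestyle='None')

print(fmt_line.get_marker(), fmt_line.get_color())
print(kw_line.get_marker(), kw_line.get_color())
```

The keyword form is longer but can be clearer when a style string would otherwise become cryptic.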

You may have noticed that the 'x' markers for 1 and 10 sit in the corners of the chart and are hard to see. To fix this, adjust the ranges of the x and y axes by adding the following code.

plt.axis([0, 12, 0, 120])

The adjusted result is shown below.

[Image: ./res/result4.png]
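
The list passed to plt.axis is read as [xmin, xmax, ymin, ymax]. As a small sketch (with the Agg backend assumed so that no window is needed), the call also returns the limits that are now in effect, which is a convenient way to check them.

```python
import matplotlib
# Assumption: non-interactive Agg backend, so the sketch runs without a display
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x_values = [x for x in range(1, 11)]
y_values = [x ** 2 for x in range(1, 11)]
plt.plot(x_values, y_values, 'xr')

# Widen the axes so the first and last markers no longer sit in the corners
limits = plt.axis([0, 12, 0, 120])
print(limits)
```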

Draw a sine curve

In the following program, we use a third-party library called NumPy to generate samples and compute their sine values. NumPy is a very fast math library used mainly for array computation. It lets you do vector and matrix math in Python and gives you access to many functions implemented in C under the hood. If you want to learn data science or machine learning with Python, you will need to learn NumPy.

# coding: utf-8
import matplotlib.pyplot as plt
import numpy as np


def main():
    # Set the sampling range and the number of samples
    x_values = np.linspace(0, 2 * np.pi, 1000)
    # Compute the sine of each sample
    y_values = np.sin(x_values)
    # Draw the line chart (line style '--', color blue)
    plt.plot(x_values, y_values, '--b')
    plt.show()


if __name__ == '__main__':
    main()

Running the program produces the result shown below.

[Image: ./res/result5.png]
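
The array-oriented style that NumPy provides can be sketched by comparing it with an ordinary Python loop. The example below is only illustrative (the variable names are made up here): it squares a small array in one expression and applies np.sin to every sample at once, then checks that the results match the element-by-element versions.

```python
import math

import numpy as np

# Element-by-element with plain Python
x_list = [1, 2, 3, 4]
squares_loop = [x ** 2 for x in x_list]

# The same computation as one vectorized expression
x_array = np.array(x_list)
squares_vec = x_array ** 2

# linspace returns evenly spaced samples, including both end points
samples = np.linspace(0, 2 * np.pi, 5)
sines_vec = np.sin(samples)                  # applied to the whole array
sines_loop = [math.sin(v) for v in samples]  # applied one value at a time

print(squares_vec.tolist() == squares_loop)
print(np.allclose(sines_vec, sines_loop))
```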

To draw several curves in the same coordinate system, modify the code as follows.

# coding: utf-8
import matplotlib.pyplot as plt
import numpy as np


def main():
    x_values = np.linspace(0, 2 * np.pi, 1000)
    plt.plot(x_values, np.sin(x_values), '--b')
    plt.plot(x_values, np.sin(2 * x_values), '--r')
    plt.show()


if __name__ == '__main__':
    main()

The modified code produces the result shown below.

[Image: ./res/result6.png]
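
When two curves share one coordinate system, a legend helps the reader tell them apart. The original program does not include one; the sketch below is one possible addition using the label argument and plt.legend. The label texts, the Agg backend, and the file name two_curves.png are assumptions made for illustration.

```python
import matplotlib
# Assumption: non-interactive Agg backend, so the sketch runs without a display
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np

x_values = np.linspace(0, 2 * np.pi, 1000)
# Give each curve a label so the legend can identify it
plt.plot(x_values, np.sin(x_values), '--b', label='sin(x)')
plt.plot(x_values, np.sin(2 * x_values), '--r', label='sin(2x)')
legend = plt.legend()
# Hypothetical output file name; with an interactive backend, call plt.show() instead
plt.savefig('two_curves.png')
```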

To draw the two curves in two separate coordinate systems, you can do the following.

# coding: utf-8
import matplotlib.pyplot as plt
import numpy as np


def main():
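    # Reduce the number of samples to 50
    x_values = np.linspace(0, 2 * np.pi, 50)
    # Use a 2-row, 1-column layout and make region 1 (the first plot) active
    plt.subplot(2, 1, 1)
    plt.plot(x_values, np.sin(x_values), 'o-b')
    # Use a 2-row, 1-column layout and make region 2 (the second plot) active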
    plt.subplot(2, 1, 2)
    plt.plot(x_values, np.sin(2 * x_values), '.-r')
    plt.show()


if __name__ == '__main__':
    main()

The result is shown below.

[Image: ./res/result7.png]
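
As an alternative to switching the active region with plt.subplot, Matplotlib also offers plt.subplots, which creates the figure and all of its axes in one call. The following is a sketch of the same two-row layout written that way. It is not the approach used in the program above, and the names top and bottom, the Agg backend, and the file name two_axes.png are assumptions.

```python
import matplotlib
# Assumption: non-interactive Agg backend, so the sketch runs without a display
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np

x_values = np.linspace(0, 2 * np.pi, 50)
# Create a figure with 2 rows and 1 column of axes in one call
fig, (top, bottom) = plt.subplots(2, 1)
# Draw on each axes object explicitly instead of relying on the active region
top.plot(x_values, np.sin(x_values), 'o-b')
bottom.plot(x_values, np.sin(2 * x_values), '.-r')
# Hypothetical output file name; with an interactive backend, call plt.show() instead
fig.savefig('two_axes.png')
```

Working with explicit axes objects tends to be easier to follow once a figure has more than two plots.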

Draw a histogram

We can generate normally distributed sample data with the normal function of NumPy's random module. Its three parameters are the mean (expected value), the standard deviation, and the number of samples. We then draw the data as a histogram, as in the following code.

# coding: utf-8
import matplotlib.pyplot as plt
import numpy as np


def main():
    # Generate 1000 normally distributed samples with the normal function of the random module
    data = np.random.normal(10.0, 5.0, 1000)
    # Draw the histogram (with 10 bins)
    plt.hist(data, 10)
    plt.show()


if __name__ == '__main__':
    main()

The result is shown below.

[Image: ./res/result8.png]
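
To see what the three parameters of normal mean in practice, the sketch below generates the same kind of sample and inspects it numerically. The fixed seed is an assumption added here so that repeated runs produce the same numbers. np.histogram is used to get the bin counts without drawing anything.

```python
import numpy as np

# Assumption: a fixed seed so that the sample is reproducible
np.random.seed(0)
# Mean 10.0, standard deviation 5.0, 1000 samples
data = np.random.normal(10.0, 5.0, 1000)

# The sample mean and standard deviation should be close to 10 and 5
print(data.mean(), data.std())

# Split the data range into 10 equal-width bins and count the samples in each
counts, edges = np.histogram(data, bins=10)
print(counts)
print(edges)
```

Ten bins are described by eleven edges, and the counts add up to the number of samples.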

Draw vector graphics with Pygal

Vector graphics are computer graphics that represent images with points, lines, and polygons based on mathematical equations. SVG, which stands for "Scalable Vector Graphics", is a widely used vector image file format. Unlike bitmap images, which are represented with pixels, SVG stores image data as XML. It is an open-standard vector graphics language defined by the W3C, and it can be used to design sharper images for the Web, because SVG is resolution-independent and loses no detail or clarity at any magnification. SVG describes an image directly in code, so you can open an SVG file with any text editor, and by changing the SVG code you can give an image interactive features.
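
Because SVG is plain XML, a small image can be written by hand and parsed like any other XML document. The following sketch uses only the Python standard library. The circle, its attributes, and the file name circle.svg are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hand-written SVG image: a red circle on a 120 x 120 canvas
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">'
    '<circle cx="60" cy="60" r="50" fill="red"/>'
    '</svg>'
)

# Parse it as ordinary XML and inspect the elements
root = ET.fromstring(svg_text)
circle = root[0]
print(root.tag)
print(circle.attrib)

# Hypothetical file name; the saved file can be opened in a web browser
with open('circle.svg', 'w', encoding='utf-8') as f:
    f.write(svg_text)
```

Changing the r attribute in the text changes the size of the circle without any loss of sharpness, which is the property that separates vector images from bitmaps.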

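In Python, Pygal can be used to generate SVG, and it can be installed with pip (pip install pygal). The following program rolls two dice 10,000 times and draws the frequency of each total as a bar chart.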

from random import randint
import pygal


def roll_dice(n=1):
    total = 0
    for _ in range(n):
        total += randint(1, 6)
    return total


def main():
    results = []
    # Roll two dice 10000 times and record the totals
    for _ in range(10000):
        face = roll_dice(2)
        results.append(face)
    freqs = []
    # Count how many times each total from 2 to 12 occurred
    for value in range(2, 13):
        freq = results.count(value)
        freqs.append(freq)
    # Draw the bar chart
    hist = pygal.Bar()
    hist.title = 'Result of rolling two dice'
    hist.x_labels = [x for x in range(2, 13)]
    hist.add('Frequency', freqs)
    # Save the vector image
    hist.render_to_file('result.svg')


if __name__ == '__main__':
    main()
    

Running the program above produces the result shown below.

[Image: ./res/result9.png]
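
The simulated frequencies can be compared with the exact distribution of the total of two fair dice. The sketch below is an addition for comparison, not part of the original program. It enumerates all 36 equally likely outcomes with the standard library and scales the probabilities to 10,000 rolls.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total
outcomes = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Exact probability of each total from 2 to 12
probabilities = {total: Fraction(outcomes[total], 36) for total in range(2, 13)}

# Expected number of occurrences in 10000 rolls
expected = {total: float(p) * 10000 for total, p in probabilities.items()}

for total in range(2, 13):
    print(total, probabilities[total], round(expected[total], 1))
```

A total of 7 can be formed in six ways, more than any other total, which is why the bar chart peaks in the middle.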

Postscript

Matplotlib and NumPy are powerful libraries, and we have only glimpsed the tip of the iceberg here. We will use both of them again in later chapters and introduce their other features as the need arises.


Origin blog.csdn.net/weixin_41818794/article/details/104257984