How the Experts Do Python Graphics and Visualization

Python has many visualization tools; this article focuses on Matplotlib.

Matplotlib is a 2D plotting library that produces hard-copy figures and supports interactive use across platforms. It can be used in Python scripts, in the IPython interactive shell, and in web applications. The project was started in 2002 by John Hunter, with the aim of building a MATLAB-style plotting interface for Python. When used with a GUI toolkit (for example Tk or Qt, including from within IPython), Matplotlib also provides interactive features such as zooming and panning. It supports many GUI backends on a variety of operating systems and can export figures to common vector and raster formats: PDF, SVG, JPG, PNG, BMP, GIF, and so on.

Matplotlib package
As the saying goes, "a picture is worth a thousand words." We often need to analyze data visually. Although Pandas has some plotting functions, Matplotlib is comparatively better at displaying graphics. Matplotlib provides a convenient Python interface: we usually work with it through the pyplot module, whose commands are in most cases similar to MATLAB's.

Here is a simple example using the Matplotlib package (install it first with pip install matplotlib):

import matplotlib.pyplot as plt  # conventional alias plt
# First define two functions (sine & cosine)
import numpy as np

X=np.linspace(-np.pi,np.pi,256,endpoint=True)  # 256 values from -π to +π
C,S=np.cos(X),np.sin(X)
plt.plot(X,C)
plt.plot(X,S)
# In an IPython interactive session this call is needed to display the figure
plt.show()

Output: [Figure: cosine and sine curves over -π to π]
Basic structure of plotting commands and their properties

From the example above we can see that almost all of the framework and plotting properties use default settings. Now let's look at the basic framework of pyplot plotting. Anyone who has used Photoshop knows that you first create a canvas; here the canvas is the Figure, and everything else is then "drawn" onto the Figure.
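The canvas analogy maps onto Matplotlib's object-oriented interface, where the Figure is the canvas and each Axes is a drawing area placed on it. The sketch below is a minimal illustration rather than the author's code; it selects the non-interactive Agg backend (an assumption made only so it also runs without a display) and draws the sine and cosine curves on an explicit Axes instead of through plt.plot.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line to get a GUI window
import matplotlib.pyplot as plt
import numpy as np

# The Figure is the canvas; the Axes is the area we actually draw on
fig, ax = plt.subplots(figsize=(6, 3))

x = np.linspace(-np.pi, np.pi, 256)
ax.plot(x, np.cos(x), label="cos")
ax.plot(x, np.sin(x), label="sin")
ax.set_title("Drawing on an explicit Axes")
ax.legend()

# The Axes knows its Figure, and the Figure keeps a list of its Axes
print(ax.figure is fig)   # True
print(len(fig.axes))      # 1
```

Keeping explicit references to fig and ax makes it easier to modify a specific panel later, which becomes useful once a Figure holds several Axes.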

1) Create a subplot in the Figure and set its properties

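import numpy as np
import matplotlib.pyplot as plt

x=np.linspace(0,10,1000)  # X-axis data
y1=np.sin(x)  # Y-axis data
y2=np.cos(x**2)  # Y-axis data; x**2 is x squared

plt.figure(figsize=(8,4))

plt.plot(x,y1,label="$sin(x)$",color="red",linewidth=2)  # text enclosed in $ is rendered as a math formula
plt.plot(x,y2,"b--",label="$cos(x^2)$")
# The format string sets the curve's color and line style, e.g. 'b--' is a blue dashed line (b: blue, --: dashed)

plt.xlabel("Time(s)")
plt.ylabel("Volt")
plt.title("PyPlot First Example")

# Keyword arguments set various properties of the plotted curve:
# label: a label for the curve, shown in the legend. If the label string begins and ends
#        with '$', Matplotlib renders it as a math formula with its built-in LaTeX-style engine
# color: the curve's color, which can be given as
#        an English color name,
#        '#' followed by three two-digit hex numbers, e.g. '#ff0000' is red,
#        an RGB tuple with values from 0 to 1, e.g. (1.0,0.0,0.0) is also red
# linewidth: the width of the curve; it need not be an integer, and the short name lw also works

plt.ylim(-1.5,1.5)
plt.legend()  # display the legend

plt.show()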

2) Create multiple subplots in a Figure

If you need to draw multiple figures, you can pass an integer to figure() to specify the figure number. If a figure with that number already exists, no new object is created; instead, the existing figure becomes the current one.

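import matplotlib.pyplot as plt

fig1=plt.figure(2)
plt.subplot(211)
# subplot(211) divides the figure into 2 rows x 1 column (two regions) and creates an Axes in region 1 (the top one)
plt.subplot(212)  # create an Axes in region 2 (the bottom one)
plt.show()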

Output: [Figure: two empty Axes, one above the other]
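The reuse rule for figure numbers can be checked directly. This is a small sketch, again assuming the Agg backend so that no windows open; plt.get_fignums() lists the numbers of all open figures.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no GUI windows needed for this check
import matplotlib.pyplot as plt

plt.close("all")           # start with no open figures

first = plt.figure(2)      # creates Figure number 2
plt.figure(3)              # creates Figure number 3, which becomes current
again = plt.figure(2)      # number 2 already exists: it is returned and made current

print(again is first)      # True
print(plt.gcf().number)    # 2
print(plt.get_fignums())   # [2, 3]
```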
We can also split regions further (similar to splitting cells in Word):

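import matplotlib.pyplot as plt

f1=plt.figure(5)  # if the figure is shown in a pop-up window, its title is "Figure 5"
plt.subplot(221)  # top-left cell of a 2x2 grid
plt.subplot(222)  # top-right cell of a 2x2 grid
plt.subplot(212)  # bottom half of a 2x1 grid, spanning the full width
plt.subplots_adjust(left=0.08,right=0.95,wspace=0.25,hspace=0.45)
# subplots_adjust works like margins in CSS: how far from the left edge? From the right edge?
# That depends on the size of the figure and the spacing you want between panels
plt.show()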

Output: [Figure: two Axes in the top row and one full-width Axes below]
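The same layout can also be described once with a GridSpec, which some readers may find easier to follow than mixing a 2x2 grid with a 2x1 grid. This is an alternative sketch, not the code used above; the margin and spacing values are copied from the subplots_adjust call, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(6, 4))
# One 2x2 grid with the same margins and gaps as the subplots_adjust call
gs = GridSpec(2, 2, figure=fig, left=0.08, right=0.95, wspace=0.25, hspace=0.45)

ax_top_left = fig.add_subplot(gs[0, 0])   # same cell as subplot(221)
ax_top_right = fig.add_subplot(gs[0, 1])  # same cell as subplot(222)
ax_bottom = fig.add_subplot(gs[1, :])     # whole bottom row, like subplot(212)

# The bottom Axes spans both columns, so it is wider than either top Axes
print(ax_bottom.get_position().width > ax_top_left.get_position().width)  # True
```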
3) Setting plot properties through Axes objects
In the operations above we drew directly on the Figure. But when there are many plots and each small panel needs different settings, Axes objects solve the problem much better.

import matplotlib.pyplot as plt

fig,axes=plt.subplots(nrows=2,ncols=2)  # create a 2x2 grid of plots
plt.show()

Output: [Figure: a 2x2 grid of empty Axes]
Now we operate on each plot (subplot) individually, setting its title and removing the ticks on both axes:

import matplotlib.pyplot as plt

fig,axes=plt.subplots(nrows=2,ncols=2)  # create a 2x2 grid of plots
axes[0,0].set(title='Upper Left')
axes[0,1].set(title='Upper Right')
axes[1,0].set(title='Lower Left')
axes[1,1].set(title='Lower Right')

# Iterate over all Axes through the flat attribute of the axes array
for ax in axes.flat:
    # set xticks and yticks to empty lists
    ax.set(xticks=[],yticks=[])
plt.show()

Output: [Figure: 2x2 grid titled Upper Left, Upper Right, Lower Left, Lower Right, with no ticks]
In fact, every plotting operation ultimately works on an Axes object. If we plot without creating an Axes explicitly, pyplot uses plt.subplot(111) by default, so a single plot is really just a special case of Axes.
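A quick way to confirm this default is to plot on an empty Figure and inspect what pyplot created. The sketch assumes the Agg backend; get_subplotspec().get_gridspec().get_geometry() reports the (rows, columns) of the grid the Axes was placed in.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

plt.close("all")
fig = plt.figure()
print(len(fig.axes))            # 0: a new canvas has no Axes yet

plt.plot([1, 2, 3], [1, 4, 9])  # pyplot creates a single Axes on demand
print(len(fig.axes))            # 1

ax = plt.gca()                  # the Axes that plt.plot drew on
print(ax.get_subplotspec().get_gridspec().get_geometry())  # (1, 1): a 1x1 grid, like subplot(111)
```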

4) Save the Figure object

The last operation is saving. If we want to use the figure in other work, or simply keep the result, we need to save it:

plt.savefig(r"C:\Users\123\Desktop\save_test.png",dpi=520)  # dpi sets the resolution; by default the figure's own dpi is used (100 in current Matplotlib)

Obviously, the higher the dpi, the larger the image file. Here we only used the savefig function to save the Figure.
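The pixel size of a saved raster image is the figure size in inches multiplied by the dpi, which is why a high dpi produces large files (for example, an 8 x 4 inch figure saved at 520 dpi is 4160 x 2080 pixels). The sketch below checks this by saving to an in-memory buffer instead of a file path and reading the PNG back; the buffer, the figure size and the small dpi value are illustrative choices.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

fig, ax = plt.subplots(figsize=(4, 3))  # 4 x 3 inches
ax.plot([0, 1], [0, 1])

buf = io.BytesIO()                      # save to memory instead of a file on disk
fig.savefig(buf, format="png", dpi=50)
buf.seek(0)

img = mpimg.imread(buf)                 # array of shape (height, width, channels)
print(img.shape[:2])                    # (150, 200): 3*50 rows, 4*50 columns
```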

Besides the basic operations above, Matplotlib has many other plotting features; here we have only briefly covered the points to watch when plotting.
Introduction to the Seaborn module

Above we introduced Matplotlib's basic plotting and property settings. For routine plotting, the Pandas plotting functions are sufficient, and if you study the Matplotlib API thoroughly there is almost no problem it cannot solve. But Matplotlib still has its shortcomings: it is highly customizable, yet learning which settings produce an attractive figure is quite difficult. To control the appearance of Matplotlib charts, the Seaborn module provides many custom themes and a high-level interface.

1) Without the Seaborn module

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

np.random.seed(sum(map(ord,"aesthetics")))
# First define a function that draws sine waves; it helps show the different style parameters we can control
def sinplot(flip=1):
    x=np.linspace(0,14,100)
    for i in range(1,7):
        plt.plot(x,np.sin(x+i*0.5)*(7-i)*flip)
sinplot()
plt.show()

Output: [Figure: six sine waves in the default Matplotlib style]
2) With the Seaborn module

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# Seaborn module added

np.random.seed(sum(map(ord,"aesthetics")))
# First define a function that draws sine waves; it helps show the different style parameters we can control
def sinplot(flip=1):
    x=np.linspace(0,14,100)
    for i in range(1,7):
        plt.plot(x,np.sin(x+i*0.5)*(7-i)*flip)
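# To switch to Seaborn styling, import the seaborn module and apply its theme
import seaborn as sns  # add the Seaborn module
sns.set()  # since Seaborn 0.8, importing alone no longer changes the style; sns.set() applies the default theme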
sinplot()
plt.show()

Output: [Figure: the same sine waves in the Seaborn style]
The author used Jupyter Notebook; the difference between the plots with and without the Seaborn module is obvious.

The advantages of using Seaborn are:

Seaborn's default style, a light gray background with white gridlines, uses softer colors than Matplotlib's defaults.
Seaborn sets the plot's style parameters separately from its data parameters.
Seaborn has two groups of functions for controlling style: axes_style()/set_style() and plotting_context()/set_context().

axes_style() and plotting_context() return dictionaries of parameters, while set_style() and set_context() apply those settings to Matplotlib.
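The difference between the getter and setter functions can be seen by inspecting what the getters return. A minimal sketch, assuming a recent Seaborn release and the Agg backend: the returned objects behave like dictionaries of Matplotlib rc parameters, and the result of axes_style() can also be used in a with statement to style a single figure temporarily.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import seaborn as sns

style = sns.axes_style("whitegrid")     # getter: returns parameters, changes nothing globally
paper = sns.plotting_context("paper")
poster = sns.plotting_context("poster")

print(style["axes.facecolor"])                             # white
print(poster["axes.labelsize"] > paper["axes.labelsize"])  # True: poster scales elements up

# The getter's return value also works as a context manager for a temporary style
with sns.axes_style("dark"):
    fig, ax = plt.subplots()            # only Axes created inside the block use "dark"
```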

Using the set_style() function

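import matplotlib.pyplot as plt
import seaborn as sns

# Seaborn has 5 predefined themes:
# darkgrid (gray background + white grid)
# whitegrid (white background + gray grid)
# dark (gray background only)
# white (white background only)
# ticks (axes with tick marks)
# The default theme is darkgrid; use the set_style function to change it
sns.set_style("whitegrid")
sinplot()  # the function defined in the previous code snippet
plt.show()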

Output: [Figure: sine waves with the whitegrid theme]
Using the set_context() function

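# The context sets the scale of plot elements (labels, line widths, etc.) for the intended output
# Seaborn has 4 predefined contexts: paper, notebook, talk and poster
# The notebook context is used by default
sns.set_context("poster")
sinplot()  # the function defined earlier
plt.show()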

Output: [Figure: sine waves in the poster context]
Using Seaborn to "look cool"

Seaborn can do more than change the background color or the size of plot elements; it has many other uses, such as the following example.

'''
Annotated heatmaps
================================
'''
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

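# Load data from Seaborn's built-in example datasets (the details of the data don't matter here)
flights_long=sns.load_dataset("flights")
flights=flights_long.pivot(index="month",columns="year",values="passengers")

# Draw a heatmap, annotating each cell with its data value
sns.heatmap(flights,annot=True,fmt="d",linewidths=.5)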
plt.show()

Output: [Figure: annotated heatmap of monthly passengers by year]
Graphical overview of descriptive statistics

Descriptive statistics describes data through numerical summaries or charts. In the data analysis stage of a data mining project, we can use descriptive statistics to describe or summarize the basic characteristics of the data; this helps us organize our thinking and present the results of the analysis to others more clearly. For numerical analysis we often need to compute statistical features of the data, and NumPy and SciPy, the scientific computing tools, meet this need. Matplotlib can be used to draw the charts needed for graphical analysis.
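As an illustration of the numerical side, the sketch below computes common summary statistics with NumPy. The sample is hypothetical (normally distributed heights drawn with NumPy's random Generator), chosen to resemble the data generated in the next step; SciPy's scipy.stats.describe offers a similar one-call summary.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(172, 6, size=1000)  # hypothetical heights

summary = {
    "mean": np.mean(sample),
    "median": np.median(sample),
    "std": np.std(sample, ddof=1),      # sample standard deviation
    "q1": np.percentile(sample, 25),    # lower quartile
    "q3": np.percentile(sample, 75),    # upper quartile
    "min": np.min(sample),
    "max": np.max(sample),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```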

1) Generating the data

We generate our own data: each person's height, weight, and number of books borrowed per year. (We make up the data because not every real dataset supports every analysis below; for example, some data cannot be drawn as a pie chart. Note also that the data has no real meaning and only serves to illustrate the analyses, which does not mean these analyses are useless in real applications.)

In addition, the charts below reflect the effect of the Seaborn library.

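# Case study
from numpy import array
from numpy.random import normal

def getData():
    heights=[]
    weights=[]
    books=[]
    N=10000
    for i in range(N):
        while True:
            # height follows a normal distribution with mean 172 and standard deviation 6
            height=normal(172,6)
            if 0<height:
                break
        while True:
            # weight comes from a linear regression model with height as the independent variable;
            # the error follows a standard normal distribution
            weight=(height-80)*0.7+normal(0,1)
            if 0<weight:
                break
        while True:
            # the number of books borrowed follows a normal distribution with mean 20 and standard deviation 5
            number=normal(20,5)
            if 0<=number and number<=50:
                # grade the borrowing amount: E below 10, D below 15, C below 20, B below 25, otherwise A
                book='E' if number<10 else ('D' if number<15 else ('C' if number<20 else ('B' if number<25 else 'A')))
                break
        heights.append(height)
        weights.append(weight)
        books.append(book)
    return array(heights),array(weights),array(books)
heights,weights,books=getData()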

2) Frequency analysis

(1) Qualitative analysis

Bar charts and pie charts are common tools for frequency analysis of qualitative data; before using them, we must count the frequency of each category.
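The frequency counting step can also be done without a hand-written dictionary loop. This sketch uses a small hypothetical array in place of the books array returned by getData() and shows two standard options, collections.Counter and np.unique with return_counts=True.

```python
from collections import Counter
import numpy as np

# Hypothetical borrowing grades standing in for the `books` array from getData()
books = np.array(["A", "C", "B", "C", "E", "A", "C", "D"])

# Option 1: the standard library's Counter
counts = Counter(books.tolist())

# Option 2: NumPy returns the sorted categories and their counts
labels, freqs = np.unique(books, return_counts=True)

print(dict(zip(labels.tolist(), freqs.tolist())))  # {'A': 2, 'B': 1, 'C': 3, 'D': 1, 'E': 1}
```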

Bar chart. The height of each bar represents the frequency of a category. The code below uses Matplotlib to draw a bar chart of the qualitative variable "books borrowed" (continuing from the previous code):

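from matplotlib import pyplot

# Draw a bar chart
def drawBar(books):
    xticks=['A','B','C','D','E']
    bookGroup={}
    # count the frequency of each borrowing grade
    for book in books:
        bookGroup[book]=bookGroup.get(book,0)+1
    # create the bar chart
    # first argument: x positions of the bars
    # second argument: heights of the bars
    # align: how each bar is aligned relative to its x position from the first argument
    pyplot.bar(range(5),[bookGroup.get(xtick,0) for xtick in xticks],align='center')

    # set the tick labels of the bars
    # first argument: x positions of the labels
    # second argument: the label text
    pyplot.xticks(range(5),xticks)
    # set the x-axis label
    pyplot.xlabel("Types of Students")
    # set the y-axis label
    pyplot.ylabel("Frequency")
    # set the title
    pyplot.title("Numbers of Books Students Read")
    # display the figure
    pyplot.show()
drawBar(books)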

Output: [Figure: bar chart "Numbers of Books Students Read"]
Pie chart. In a pie chart, the area of each sector represents the frequency of a category. The code below uses Matplotlib to draw a pie chart of the qualitative variable "books borrowed":

# Draw a pie chart
def drawPie(books):
    labels=['A','B','C','D','E']
    bookGroup={}
    for book in books:
        bookGroup[book]=bookGroup.get(book,0)+1
    # create the pie chart
    # first argument: sizes of the sectors
    # labels: text labels of the sectors
    # autopct: format string for each sector's percentage
    pyplot.pie([bookGroup.get(label,0) for label in labels],labels=labels,autopct='%1.1f%%')
    pyplot.title("Number of Books Students Read")
    pyplot.show()
drawPie(books)

Output: [Figure: pie chart "Number of Books Students Read"]
(2) Quantitative analysis

A histogram resembles a bar chart in that the bar height represents frequency; the difference is that the quantitative data are divided into several consecutive intervals, and a bar is drawn over each interval.
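The interval splitting that a histogram performs can be reproduced with np.histogram, which returns the count per interval and the interval edges; pyplot.hist returns the same two arrays as its first two return values. The heights below are hypothetical data drawn to resemble getData().

```python
import numpy as np

rng = np.random.default_rng(1)
heights = rng.normal(172, 6, size=10000)  # hypothetical heights

counts, edges = np.histogram(heights, bins=20)  # 20 consecutive, equal-width intervals

print(len(counts), len(edges))  # 20 21: n intervals need n + 1 edges
print(counts.sum())             # 10000: every value falls into exactly one interval
```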

Histogram. The code below uses Matplotlib to draw a histogram of the quantitative variable height:

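# Draw a histogram
def drawHist(heights):
    # create the histogram
    # first argument: the quantitative data to plot; unlike qualitative data, no frequency counting is done beforehand
    # second argument: the number of intervals (bins)
    pyplot.hist(heights,100)
    pyplot.xlabel('Heights')
    pyplot.ylabel('Frequency')
    pyplot.title('Height of Students')
    pyplot.show()
drawHist(heights)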

Output: [Figure: histogram "Height of Students"]
Cumulative curve. The code below uses Matplotlib to draw a cumulative curve of the quantitative variable height:

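# Draw a cumulative curve
def drawCumulativeHist(heights):
    # create the cumulative curve
    # first argument: the quantitative data to plot
    # second argument: the number of intervals (bins)
    # density: whether to normalize (the old normed argument was removed in Matplotlib 3.1)
    # histtype='step' draws a stepped curve
    # cumulative: whether to accumulate the counts
    pyplot.hist(heights,20,density=True,histtype='step',cumulative=True)
    pyplot.xlabel('Heights')
    pyplot.ylabel('Cumulative frequency')
    pyplot.title('Heights of Students')
    pyplot.show()
drawCumulativeHist(heights)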

Output: [Figure: cumulative curve "Heights of Students"]
3) Relationship analysis
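A cumulative curve is the running total of the histogram counts. The sketch below computes it with np.cumsum on hypothetical heights and normalizes by the total, which matches what hist(..., density=True, cumulative=True) shows: with both options set, Matplotlib normalizes the histogram so that the last bin equals 1.

```python
import numpy as np

rng = np.random.default_rng(2)
heights = rng.normal(172, 6, size=10000)  # hypothetical heights

counts, edges = np.histogram(heights, bins=20)
# Share of observations at or below each bin's right edge
cumulative = np.cumsum(counts) / counts.sum()

print(round(float(cumulative[-1]), 6))   # 1.0: the curve always ends at 100%
print(np.all(np.diff(cumulative) >= 0))  # True: a cumulative curve never decreases
```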

Scatter plot. In a scatter plot, the independent and dependent variables are used as the horizontal and vertical coordinates. When the two are linearly correlated, the points lie approximately along a straight line. Taking height as the independent variable and weight as the dependent variable, we examine the effect of height on weight. The Matplotlib code for the scatter plot is:

# Draw a scatter plot
def drawScatter(heights,weights):
    # create the scatter plot
    # first argument: x coordinates of the points
    # second argument: y coordinates of the points
    pyplot.scatter(heights,weights)
    pyplot.xlabel('Heights')
    pyplot.ylabel('Weight')
    pyplot.title('Heights & Weight of Students')
    pyplot.show()
drawScatter(heights,weights)

Output: [Figure: scatter plot "Heights & Weight of Students"]
4) Exploratory analysis
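To quantify what the scatter plot shows, one can compute the Pearson correlation coefficient and fit a least-squares line. The sketch regenerates hypothetical data with the same model as getData() (weight = 0.7 x (height - 80) plus standard normal noise), so the fitted slope should come out close to 0.7.

```python
import numpy as np

rng = np.random.default_rng(3)
heights = rng.normal(172, 6, size=10000)
weights = (heights - 80) * 0.7 + rng.normal(0, 1, size=10000)  # same model as getData()

r = np.corrcoef(heights, weights)[0, 1]             # Pearson correlation coefficient
slope, intercept = np.polyfit(heights, weights, 1)  # least-squares straight line

print(f"r = {r:.3f}, weight = {slope:.2f} * height + {intercept:.2f}")
```

A correlation close to 1 matches the near-straight band of points in the scatter plot; the intercept should be near 0.7 x (-80) = -56.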

Box plot. When there is no explicit goal for the analysis, we perform some exploratory analysis of the data, which tells us its central location, spread, and skewness. The Matplotlib code for a box plot of height is:

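# Draw a box plot
def drawBox(heights):
    # create the box plot
    # first argument: the quantitative data to plot
    # labels: the text label for each dataset (renamed tick_labels in Matplotlib 3.9)
    pyplot.boxplot([heights],labels=['Heights'])
    pyplot.title('Heights of Students')
    pyplot.show()
drawBox(heights)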

Output: [Figure: box plot "Heights of Students"]
Note:

The difference between the upper quartile and the lower quartile is called the interquartile range (IQR), a measure of how spread out the data are.
The upper and lower whiskers extend to the most extreme data points lying within 1.5 times the IQR above the upper quartile and below the lower quartile; data beyond these bounds are treated as outliers.
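The quartile and 1.5 x IQR rules above can be written out explicitly. The helper box_stats below is hypothetical (written only for this illustration); it uses np.percentile with NumPy's default linear interpolation, and its whiskers end at the most extreme points that still lie inside the fences.

```python
import numpy as np

def box_stats(data, whis=1.5):
    """Hypothetical helper: quartiles, IQR, whisker ends and outliers by the 1.5 x IQR rule."""
    data = np.asarray(data, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # whiskers stop at the most extreme points still inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": data[(data < low_fence) | (data > high_fence)],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats["q1"], stats["q3"], stats["iqr"])  # 3.25 7.75 4.5
print(stats["outliers"])                       # [40.]
```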
Descriptive statistics is an easy, simple, and intuitive means of data analysis. However, precisely because it is simple, it struggles to describe multivariate relationships. In real life there are usually several independent variables: weight depends not only on height but also on eating habits, obesity genes, and other factors. We can handle such cases with more advanced multivariate techniques. For example, feature engineering can use mutual information to select several features strongly correlated with the dependent variable as independent variables, and principal component analysis can remove redundancy among the independent variables to reduce computational complexity.
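Principal component analysis can be sketched with NumPy alone through a singular value decomposition of the centred data. The data below are hypothetical, generated with the same height and weight model as getData(); because weight is almost a linear function of height, the first component should carry nearly all of the variance, which is the kind of redundancy PCA helps remove.

```python
import numpy as np

rng = np.random.default_rng(4)
heights = rng.normal(172, 6, size=10000)
weights = (heights - 80) * 0.7 + rng.normal(0, 1, size=10000)
X = np.column_stack([heights, weights])  # one row per person, one column per variable

Xc = X - X.mean(axis=0)                  # PCA operates on centred data
_, singular_values, components = np.linalg.svd(Xc, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

print(explained.round(3))  # share of the variance carried by each principal component
print(components[0])       # direction of the first component in (height, weight) space
```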

Source: blog.csdn.net/chengxun02/article/details/105017183