A Python visualization tool that plots with one line of code

Anyone who has studied Python data analysis knows that there are many excellent third-party visualization libraries, such as matplotlib, seaborn, plotly, Bokeh and pyecharts. Each has its own strengths and is widely used in practice.

plotly, Bokeh and similar libraries are interactive visualization tools. Combined with Jupyter Notebook, they can display analysis results flexibly and conveniently. The results look great, but with plotly, for example, you have to write a long block of code every time, which is tedious and hard to maintain.

I think that in the data analysis stage, more time should go to the analysis itself: choosing dimensions, splitting and merging data, and understanding and judging the business. If you can cut the amount of code and still get attractive visualizations, efficiency improves greatly. Of course, this approach is aimed at quick visualization and analysis; if you have special requirements, it may not be enough.

This article introduces cufflinks, a tool that solves this problem well while producing output that looks just as good.

Introduction to cufflinks

Just as seaborn wraps matplotlib, cufflinks is a further wrapper on top of plotly, with unified methods and simple parameter configuration. It also plots directly from a pandas DataFrame, so it can be described as **"pandas-like visualization"**.

It is no exaggeration to say that one line of code is enough to draw many kinds of attractive charts, which is efficient and lowers the barrier to entry.

The github link of cufflinks is as follows:

https://github.com/santosjorge/cufflinks

Installing cufflinks

Installation needs little explanation; just use pip.

pip install cufflinks

How is cufflinks used?

The cufflinks library has been updated continuously; the version used in this article is v0.14.0, which supports plotly 3.0. First, let's see which chart types it supports, which can be listed through help.

import cufflinks as cf
cf.help()

Use 'cufflinks.help(figure)' to see the list of available parameters for the given figure.
Use 'DataFrame.iplot(kind=figure)' to plot the respective figure
Figures:
  bar
  box
  bubble
  bubble3d
  candle
  choroplet
  distplot
  heatmap
  histogram
  ohlc
  pie
  ratio
  scatter
  scatter3d
  scattergeo
  spread
  surface
  violin

Usage is simple. In summary, the format is roughly as follows:

picture

  • DataFrame: a pandas DataFrame;

  • Figure: one of the chart types listed above, such as bar, box or histogram;

  • iplot: the plotting method, whose many parameters let you adjust the chart to match your own style.
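
To make the DataFrame-plus-iplot pattern concrete, here is a minimal sketch of how a library could attach a plotting method to pandas DataFrames and turn each column into one trace. The name `toy_iplot` and the shape of the returned dictionary are illustrative assumptions, not the actual cufflinks implementation; the sketch only builds a plotly-style figure dictionary and does not render anything.

```python
import pandas as pd


def toy_iplot(self, kind="scatter", **layout_options):
    """Hypothetical stand-in for an iplot-style method.

    Builds a plotly-style figure dict with one trace per column.
    """
    traces = [
        {
            "type": kind,
            "name": str(column),
            "x": self.index.tolist(),
            "y": self[column].tolist(),
        }
        for column in self.columns
    ]
    return {"data": traces, "layout": dict(layout_options)}


# Attach the function to DataFrame so it reads like df.toy_iplot(kind=...).
pd.DataFrame.toy_iplot = toy_iplot

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
fig = df.toy_iplot(kind="bar", barmode="stack")
print([trace["name"] for trace in fig["data"]])
print(fig["layout"])
```

The point of the design is that the chart type and styling travel as keyword arguments, while the data, trace names and x-axis come from the DataFrame itself.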

cufflinks examples

Let's try the usage above with a few examples. Those who have used plotly may know that in online mode the number of charts you can generate is limited. So we first switch to offline mode to avoid that limit.

import pandas as pd
import cufflinks as cf
import numpy as np

cf.set_config_file(offline=True)

Next we follow the usage format above. First we need a DataFrame. If there is no data at hand, we can generate some random data. cufflinks has a module for this called datagen, which generates random data shaped for different chart types, as in the following examples.

lines line graph

cf.datagen.lines(1,500).ta_plot(study='sma',periods=[13,21,55])

1) cufflinks uses datagen to generate random data;

2) lines(1,500) produces one line series with 500 data points;

3) ta_plot then draws this time series, and study='sma' adds simple moving averages over three different periods (13, 21 and 55).

picture
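
For readers unfamiliar with the term, a simple moving average (SMA) over a period p is the mean of the most recent p values. The sketch below computes the same three periods with plain pandas, as an illustration of what the study represents rather than of how cufflinks computes it internally. The random-walk data and the column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Made-up random-walk series standing in for cf.datagen.lines(1, 500).
rng = np.random.default_rng(0)
series = pd.Series(rng.standard_normal(500).cumsum(), name="price")

# One SMA column per period; the first (period - 1) rows are NaN
# because a full window is not yet available.
periods = [13, 21, 55]
sma = pd.DataFrame(
    {f"sma_{p}": series.rolling(window=p).mean() for p in periods}
)

# Number of valid (non-NaN) values in each SMA column.
print(sma.notna().sum())
```

Longer periods smooth the series more but start later, which is why the three averages begin at different points on the chart.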

box box plot

The usage is the same as above, and one line of code does the job.

cf.datagen.box(20).iplot(kind='box',legend=False)

picture

Each box on the x-axis has its own name: with kind='box', cufflinks draws one box per column and labels it with the column name. If we only generate the random data without plotting it, it looks like this. By default, 100 rows of random data are generated, and we choose the number of columns ourselves.

picture
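
As a rough illustration of what each box summarizes, the sketch below builds a comparable 100-row random DataFrame with NumPy and computes the per-column quartiles. The column names and the normal distribution are assumptions for the example and are not what datagen.box produces.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# 100 rows by 3 columns of random data; the column names are made up.
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=["x1", "x2", "x3"])

# Minimum, quartiles and maximum per column. The quartiles define each box;
# how whiskers and outliers are drawn is up to the plotting library.
summary = df.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(summary)
```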

histogram histogram

cf.datagen.histogram(3).iplot(kind='histogram')

picture

As in plotly, the toolbar's box select and lasso select tools let you pick out a specific region of the chart, and all of this comes from one line of code.

Of course, random data is not required. Any other DataFrame works too, including data we import ourselves.
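
As a small sketch of the imported-data case, the snippet below loads CSV text into a DataFrame with pandas. The CSV content and column names are made up, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file you would load from disk.
csv_text = """date,sales,returns
2023-01-01,120,3
2023-01-02,135,5
2023-01-03,128,2
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
print(df.dtypes)

# With cufflinks imported and offline mode set, this frame could then be
# plotted the same way as the random data, e.g. df.iplot(kind='bar').
```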

bar bar graph

df=pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.iplot(kind='bar',barmode='stack')

picture

Above we generated a DataFrame of shape (10, 4) with columns named a, b, c and d. cufflinks then draws the chart according to the kind passed to iplot, and barmode='stack' stacks the bars.

scatter scatter plot

df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df.iplot(kind='scatter',mode='markers',colors=['orange','teal','blue','yellow'],size=10)

picture

bubble bubble chart

df.iplot(kind='bubble',x='a',y='b',size='c')

picture

scatter matrix scatter matrix plot

df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
df.scatter_matrix()

picture

subplots subplots

df=cf.datagen.lines(4)
df.iplot(subplots=True,shape=(4,1),shared_xaxes=True,vertical_spacing=.02,fill=True)

picture

df.iplot(subplots=True,subplot_titles=True,legend=False)

picture

Here is a slightly more complex example.

df=cf.datagen.bubble(10,50,mode='stocks')
figs=cf.figures(df,[dict(kind='histogram',keys='x',color='blue'),
                    dict(kind='scatter',mode='markers',x='x',y='y',size=5),
                    dict(kind='scatter',mode='markers',x='x',y='y',size=5,color='teal')],asList=True)
figs.append(cf.datagen.lines(1).figure(bestfit=True,colors=['blue'],bestfit_colors=['pink']))
base_layout=cf.tools.get_base_layout(figs)
sp=cf.subplots(figs,shape=(3,2),base_layout=base_layout,vertical_spacing=.15,horizontal_spacing=.03,
               specs=[[{'rowspan':2},{}],[None,{}],[{'colspan':2},None]],
               subplot_titles=['Histogram','Scatter 1','Scatter 2','Bestfit Line'])
sp['layout'].update(showlegend=False)
cf.iplot(sp)

picture
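
The specs argument is the least obvious part of this example. It follows the convention of plotly's subplot grid: one entry per grid cell, a dict for a cell where a subplot starts (optionally with rowspan or colspan) and None for a cell that is empty or covered by a span. The helper below is a hypothetical sketch, not part of cufflinks, that lists which zero-based grid cells each subplot covers.

```python
def occupied_cells(specs):
    """Return, for each subplot in row-major order, the grid cells it covers."""
    placements = []
    for r, row in enumerate(specs):
        for c, spec in enumerate(row):
            if spec is None:
                # None marks a cell that is empty or covered by a span.
                continue
            rowspan = spec.get("rowspan", 1)
            colspan = spec.get("colspan", 1)
            cells = [
                (r + dr, c + dc)
                for dr in range(rowspan)
                for dc in range(colspan)
            ]
            placements.append(cells)
    return placements


specs = [[{"rowspan": 2}, {}], [None, {}], [{"colspan": 2}, None]]
titles = ["Histogram", "Scatter 1", "Scatter 2", "Bestfit Line"]
result = occupied_cells(specs)
for title, cells in zip(titles, result):
    print(title, cells)
```

Read this way, the histogram takes the left column of the first two rows, the two scatter plots sit to its right, and the best-fit line spans the whole bottom row.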

shapes shape map

If we want to add some horizontal reference lines to a line chart, we can use the hline parameter.

df=cf.datagen.lines(3,columns=['a','b','c'])
df.iplot(hline=[dict(y=-1,color='blue',width=3),dict(y=1,color='pink',dash='dash')])

picture

To mark a horizontal band instead, use the hspan parameter.

df.iplot(hspan=[(-1,1),(2,5)])

picture

For a vertical band, use the vspan parameter.

df.iplot(vspan={'x0':'2015-02-15','x1':'2015-03-15','color':'teal','fill':True,'opacity':.4})

picture
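
Reference lines and bands like these correspond to entries in plotly's layout shapes list. The helper functions below are hypothetical and only build the dictionaries; the defaults are assumptions, and the exact shapes cufflinks emits may differ in detail.

```python
def hline_shape(y, color="gray", width=1, dash="solid"):
    """Horizontal line at height y spanning the full plot width."""
    return {
        "type": "line",
        "xref": "paper", "x0": 0, "x1": 1,  # full width in paper coordinates
        "yref": "y", "y0": y, "y1": y,
        "line": {"color": color, "width": width, "dash": dash},
    }


def hspan_shape(y0, y1, color="gray", opacity=0.2):
    """Horizontal band between y0 and y1 spanning the full plot width."""
    return {
        "type": "rect",
        "xref": "paper", "x0": 0, "x1": 1,
        "yref": "y", "y0": y0, "y1": y1,
        "fillcolor": color, "opacity": opacity,
        "line": {"width": 0},
    }


def vspan_shape(x0, x1, color="gray", opacity=0.2):
    """Vertical band between x0 and x1 spanning the full plot height."""
    return {
        "type": "rect",
        "xref": "x", "x0": x0, "x1": x1,
        "yref": "paper", "y0": 0, "y1": 1,  # full height in paper coordinates
        "fillcolor": color, "opacity": opacity,
        "line": {"width": 0},
    }


layout = {
    "shapes": [
        hline_shape(-1, color="blue", width=3),
        hline_shape(1, color="pink", dash="dash"),
        hspan_shape(-1, 1),
        vspan_shape("2015-02-15", "2015-03-15", color="teal", opacity=0.4),
    ]
}
print(len(layout["shapes"]))
```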

If you are not familiar with the parameters of iplot, you can run the following to look them up.

help(df.iplot)

Summary

Fast and convenient, isn't it? The examples above cover the common chart types, and you can of course build more visualizations to suit your needs. For a regular chart, one line is enough. cufflinks also has powerful color management features, which you can explore on your own if you are interested.

Origin blog.csdn.net/qq_34160248/article/details/132652477