Anyone who has studied data analysis in Python knows that there are many excellent third-party visualization libraries, such as matplotlib, seaborn, plotly, Bokeh, and pyecharts. Each has its own strengths and is widely used in practice.
Plotly and Bokeh are interactive visualization tools. Combined with Jupyter Notebook, they display analysis results flexibly and conveniently. The results look impressive, but with plotly, for example, every chart takes a long block of code, which is tedious to write and hard to maintain.
I think that during data analysis, more time should go to the analysis itself: choosing dimensions, splitting and merging data, and understanding and judging the business problem. Producing good-looking charts with less code greatly improves efficiency. Of course, this approach is meant for quick visualization and analysis; it is not intended for cases with special requirements.
This article introduces cufflinks, a great tool that solves this problem neatly while producing equally impressive charts.
Introduction to cufflinks
Just as seaborn wraps matplotlib, cufflinks is a further wrapper around plotly, with unified methods and simple parameter configuration. It also works directly with pandas DataFrames, so it can be described as **"pandas-like visualization"**.
It is no exaggeration to say that a single line of code is often enough to draw all kinds of impressive charts, which is very efficient and lowers the barrier to entry.
The github link of cufflinks is as follows:
https://github.com/santosjorge/cufflinks
Installing cufflinks
Installation is straightforward: just use pip.
pip install cufflinks
How is cufflinks used?
The cufflinks library is updated continuously; at the time of writing, the latest version is v0.14.0, which supports plotly 3.0. First, let's see which kinds of figures it supports, which can be listed through help.
import cufflinks as cf
cf.help()
Use 'cufflinks.help(figure)' to see the list of available parameters for the given figure.
Use 'DataFrame.iplot(kind=figure)' to plot the respective figure
Figures:
bar
box
bubble
bubble3d
candle
choroplet
distplot
heatmap
histogram
ohlc
pie
ratio
scatter
scatter3d
scattergeo
spread
surface
violin
Usage is actually very simple. To summarize, the format is roughly DataFrame.iplot(kind=Figure), where:
- DataFrame: a pandas DataFrame;
- Figure: one of the figure types listed above, such as bar, box, or histogram;
- iplot: the plotting method, which has many parameters for adjusting the chart to your own style.
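The pattern above can be sketched as a small helper. This is only a minimal sketch, not part of cufflinks: plot_frame is a hypothetical name, and it assumes the asFigure option of iplot, which returns the figure object instead of rendering it.

```python
import numpy as np
import pandas as pd


def plot_frame(df, kind, **options):
    """Build a Plotly figure from a DataFrame via cufflinks (hypothetical helper)."""
    # Imported here so the data-preparation code below runs even where
    # cufflinks is not installed; importing cufflinks adds DataFrame.iplot.
    import cufflinks as cf

    cf.set_config_file(offline=True)  # offline mode, as used in this article
    # asFigure=True returns the figure object instead of displaying it.
    return df.iplot(kind=kind, asFigure=True, **options)


# Any DataFrame will do; here, 10 rows of random values in 4 named columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# In a notebook with cufflinks installed:
# fig = plot_frame(df, "bar", barmode="stack")
```

Only the kind argument and a few options change from one chart type to the next, which is what keeps the calls so short.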
cufflinks examples
Let's try a few examples of the usage above. Those who have used plotly may know that in online mode the number of figures you can generate is limited. So we first switch to offline mode to avoid that limit.
import pandas as pd
import cufflinks as cf
import numpy as np
cf.set_config_file(offline=True)
Next, we follow the usage format above. First we need a DataFrame. If there is no data at hand, we can generate some random data. cufflinks provides a module called datagen for generating random data of different shapes, as in the following examples.
Line chart (lines)
cf.datagen.lines(1,500).ta_plot(study='sma',periods=[13,21,55])
1) cufflinks uses datagen to generate random data;
2) lines is the figure type, and the arguments (1,500) request one series of 500 points;
3) ta_plot then plots this time series, with study set to 'sma' to show simple moving averages over three different periods.
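To see what the three SMA curves represent, the same quantities can be computed with pandas alone. This is a rough sketch under assumptions: the random walk below is only a stand-in for the series that datagen produces, and a plain rolling mean is used as the simple moving average; cufflinks' internal calculation may differ in details.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", periods=500, freq="D")

# Stand-in for cf.datagen.lines(1, 500): one random-walk series of 500 points.
price = pd.Series(rng.standard_normal(500).cumsum(), index=dates, name="price")

# One simple moving average per period, matching periods=[13, 21, 55].
periods = [13, 21, 55]
sma = pd.DataFrame({f"SMA({p})": price.rolling(window=p).mean() for p in periods})

print(sma.tail())
```

Each moving average is undefined until a full window of observations is available, so longer periods start later and react more slowly to changes.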
Box plot (box)
The usage is the same as above: one line of code does it.
cf.datagen.box(20).iplot(kind='box',legend=False)
As you can see, each box on the x-axis has a corresponding name, because the columns that datagen generates are named automatically and the kind parameter tells cufflinks to draw them as boxes. If we only generate the data without plotting it, we get an ordinary DataFrame: by default it has 100 rows of random data, and we choose the number of columns ourselves.
Histogram (histogram)
cf.datagen.histogram(3).iplot(kind='histogram')
As with plotly, we can use the built-in tools, such as box select or lasso select, to pick out a specific region, all with one line of code.
Of course, besides random data, any other DataFrame works, including data we import ourselves.
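As a minimal sketch of the imported-data case, the snippet below loads a small CSV into a DataFrame. The CSV text, column names, and values are all made up for illustration and stand in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = """month,online,retail
2023-01,120,80
2023-02,135,76
2023-03,150,91
"""

# Use the month column as the index so it becomes the x-axis when plotted.
sales = pd.read_csv(io.StringIO(csv_text), index_col="month")
print(sales)

# With cufflinks imported, this frame plots like any other:
# sales.iplot(kind="bar")
```

Once the data is in a DataFrame, nothing about the plotting call depends on where it came from.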
Bar chart (bar)
df=pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.iplot(kind='bar',barmode='stack')
Above we generated a DataFrame of shape (10, 4) with columns named a, b, c, and d. cufflinks then draws the figure according to the kind passed to iplot, and barmode is set to stack so that the bars are stacked.
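What stacking means for the data can be checked with pandas alone. The sketch below is an illustration of the geometry, not of cufflinks internals: it computes where each column's segment starts and ends inside a stacked bar.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# In a stacked bar, each column's segment starts where the previous one ends.
tops = df.cumsum(axis=1)   # upper edge of each segment
bottoms = tops - df        # lower edge of each segment
totals = df.sum(axis=1)    # full height of each bar

print(pd.concat({"bottom": bottoms, "top": tops}, axis=1).head(3))
```

The top of the last segment equals the row total, so a stacked bar shows both the overall size of each row and how the columns contribute to it.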
Scatter plot (scatter)
df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df.iplot(kind='scatter',mode='markers',colors=['orange','teal','blue','yellow'],size=10)
Bubble chart (bubble)
df.iplot(kind='bubble',x='a',y='b',size='c')
Scatter matrix (scatter_matrix)
df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
df.scatter_matrix()
Subplots (subplots)
df=cf.datagen.lines(4)
df.iplot(subplots=True,shape=(4,1),shared_xaxes=True,vertical_spacing=.02,fill=True)
df.iplot(subplots=True,subplot_titles=True,legend=False)
Here is a slightly more complicated example.
df=cf.datagen.bubble(10,50,mode='stocks')
figs=cf.figures(df,[dict(kind='histogram',keys='x',color='blue'),
dict(kind='scatter',mode='markers',x='x',y='y',size=5),
dict(kind='scatter',mode='markers',x='x',y='y',size=5,color='teal')],asList=True)
figs.append(cf.datagen.lines(1).figure(bestfit=True,colors=['blue'],bestfit_colors=['pink']))
base_layout=cf.tools.get_base_layout(figs)
sp=cf.subplots(figs,shape=(3,2),base_layout=base_layout,vertical_spacing=.15,horizontal_spacing=.03,
               specs=[[{'rowspan':2},{}],[None,{}],[{'colspan':2},None]],
               subplot_titles=['Histogram','Scatter 1','Scatter 2','Bestfit Line'])
sp['layout'].update(showlegend=False)
cf.iplot(sp)
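The specs argument in the example above is the hardest part to read. The sketch below uses a hypothetical helper, occupied_cells, which is not part of cufflinks or plotly. It assumes the usual Plotly convention that None marks a cell with no subplot of its own and that rowspan and colspan extend a subplot over neighboring cells, and it shows which grid cell each figure ends up covering.

```python
def occupied_cells(specs):
    """Map every grid cell to the index of the subplot that covers it."""
    n_rows, n_cols = len(specs), len(specs[0])
    grid = [[None] * n_cols for _ in range(n_rows)]
    plot_index = 0
    for r, row in enumerate(specs):
        for c, spec in enumerate(row):
            if spec is None:
                # No subplot starts here (the cell is empty or spanned).
                continue
            # Fill every cell reached by this subplot's row and column span.
            for dr in range(spec.get("rowspan", 1)):
                for dc in range(spec.get("colspan", 1)):
                    grid[r + dr][c + dc] = plot_index
            plot_index += 1
    return grid


# Same specs as in the 3x2 example above.
specs = [[{"rowspan": 2}, {}], [None, {}], [{"colspan": 2}, None]]
grid = occupied_cells(specs)
for row in grid:
    print(row)
# prints:
# [0, 1]
# [0, 2]
# [3, 3]
```

Read against the subplot titles, figure 0 (Histogram) fills the left column of the first two rows, figures 1 and 2 (the two scatter plots) sit on the right, and figure 3 (Bestfit Line) spans the whole bottom row.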
Shapes (reference lines and bands)
If we want to add straight lines to a line chart as a reference, we can use the hline parameter.
df=cf.datagen.lines(3,columns=['a','b','c'])
df.iplot(hline=[dict(y=-1,color='blue',width=3),dict(y=1,color='pink',dash='dash')])
To mark a horizontal band, use the hspan parameter.
df.iplot(hspan=[(-1,1),(2,5)])
For a vertical band, use the vspan parameter.
df.iplot(vspan={'x0':'2015-02-15','x1':'2015-03-15','color':'teal','fill':True,'opacity':.4})
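Because x0 and x1 are dates on the DataFrame's index, the rows that fall inside the band can be checked with an ordinary pandas slice. This is a sketch under assumptions: the random-walk frame below is a stand-in for the datagen output, and its 2015-01-01 start date is assumed so that the band lies inside the data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Stand-in for cf.datagen.lines(3, columns=['a','b','c']); start date assumed.
index = pd.date_range("2015-01-01", periods=100, freq="D")
df = pd.DataFrame(
    rng.standard_normal((100, 3)).cumsum(axis=0),
    index=index,
    columns=["a", "b", "c"],
)

# Same boundaries as the vspan example; label slicing includes both ends.
band = {"x0": "2015-02-15", "x1": "2015-03-15"}
inside = df.loc[band["x0"]:band["x1"]]

print(len(inside), inside.index.min().date(), inside.index.max().date())
```

Summarizing the rows in inside (for example with inside.describe()) gives the numbers behind whatever the shaded region highlights.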
If you are not familiar with the parameters of iplot, you can look them up with the following code.
help(df.iplot)
Summary
Quick and convenient, isn't it? The examples above cover the commonly used figure types; you can of course build other charts to suit your needs. A regular chart usually takes just one line. cufflinks also has powerful color management features, which you can explore on your own if you are interested.