Pandas Data Analysis: Quick Chart Visualization, with Detailed Explanations of the Operations and Example Code (1)

Table of contents

Foreword

1. Basic plotting: plot

2. Basic chart types

 1. Bar chart

 2. Histogram

Follow so you don't get lost; if you spot any mistakes, please leave a comment. Thank you very much.


Foreword

When we do data mining or data analysis, or pull data out of a database during big-data development, staring at tabular data only gets us so far. We usually want to turn the data into a chart right away so that it can be read more intuitively. Building a visualization often means calling many libraries and functions, converting the data, and writing a fair amount of code, which is tedious work. Polished, engineered visualizations are usually the job of data analysts and professional reporting tools anyway. For day-to-day analysis, quickly producing a picture that fits our immediate need is enough, and Pandas offers exactly this. It still relies on the matplotlib library underneath, but it condenses the code considerably. Let's take a look at how to draw a chart quickly.

The Pandas data analysis column has been running for a long time and covers essentially every aspect of using pandas for everyday business work and routine data analysis. From the basic data structures through the processing of various kinds of data to in-depth explanations of common pandas functions, it took a lot of time and thought to create. If you have friends who work in data analysis or big-data development, feel free to recommend and subscribe to the column, so that you can learn the most practical and commonly used Pandas knowledge as soon as it is published. This post is long and involves various operations such as processing text data (str/object), so it is worth reading and practicing. I will pick out the essentials of Pandas and discuss them in detail. I will keep maintaining these posts over time; if you find mistakes or have questions, please point them out in the comments. Thank you for your support.


1. Basic drawing: plot

The plot method on Series and DataFrame is just a simple wrapper around plt.plot(). Here we demonstrate it on real data, a set of subway passenger-flow features ('客流量' is the passenger-flow column):

df_flow['客流量'].plot()

 If the index consists of dates, plot() calls gcf().autofmt_xdate() to try to format the x-axis nicely.
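Here is a small sketch of the date-index case. It uses synthetic data in place of the subway dataset, so the series name flow and its values are made up purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in for a daily passenger-flow column (values are made up)
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=60, freq="D")
flow = pd.Series(rng.normal(size=60).cumsum(), index=dates, name="flow")

# plot() returns the matplotlib Axes that the line was drawn on
ax = flow.plot()
ax.set_ylabel("flow")
```

Because plot() returns the Axes, any further tweaks (labels, limits, saving the figure) can be done with ordinary matplotlib calls.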

 On a DataFrame, plot() conveniently plots all columns with labels (here humidity '湿度', wind level '风级', and precipitation '降水量'):

df_flow_mark[['湿度','风级','降水量']].plot()

 

 You can use the x and y keywords in plot() to plot one column against another. For example, to plot Saturday's passenger flow ('星期六客流量') against Sunday's ('星期日客流量'):

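# First seven Sunday rows; rename the passenger-flow column so both days can sit side by side
df_flow_7 = df_flow[df_flow['日期'] == '星期日'].iloc[:7, :]
df_flow_7 = df_flow_7.rename(columns={'客流量': '星期日客流量'})
# First seven Saturday rows, renamed the same way
df_flow_6 = df_flow[df_flow['日期'] == '星期六'].iloc[:7, :]
df_flow_6 = df_flow_6.rename(columns={'客流量': '星期六客流量'})
# columns_convert_df is a helper that is not defined in this post
df_compare = pd.concat([columns_convert_df(df_flow_7['星期日客流量']), columns_convert_df(df_flow_6['星期六客流量'])], axis=1)
df_compare.plot(x='星期日客流量', y='星期六客流量')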
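Since the code above depends on the subway data and a helper that is not shown here, a self-contained sketch of the same x/y idea may help. The numbers and the English column names below are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import pandas as pd

# Made-up weekend passenger counts; the column names are illustrative only
df_compare = pd.DataFrame(
    {
        "sunday_flow": [120, 135, 150, 160, 170, 182, 195],
        "saturday_flow": [110, 140, 145, 165, 168, 190, 200],
    }
)

# x picks the column for the horizontal axis, y the column drawn against it
ax = df_compare.plot(x="sunday_flow", y="saturday_flow")
line = ax.get_lines()[0]
```

The single line on the returned Axes carries the sunday_flow values as its x data and the saturday_flow values as its y data.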

 

2. Basic chart types

Besides the default line plot, plot() supports a handful of other chart types, which are selected with the kind keyword argument.

You can also create these other plots with the DataFrame.plot.<kind> methods instead of passing the kind keyword argument. This makes it easier to discover the plot methods and the specific arguments they use:

df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie

In addition to these types, there are DataFrame.hist() and DataFrame.boxplot() methods, which use separate interfaces.

Finally, there are several plotting functions in pandas.plotting that take a Series or DataFrame as an argument. These include:

  • Scatter matrix
  • Andrews curves
  • Parallel coordinates
  • Lag plot
  • Autocorrelation plot
  • Bootstrap plot
  • RadViz

Plots can also be decorated with error bars or tables.
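To give a feel for a few of the pandas.plotting functions listed above, here is a brief sketch on random data. The DataFrame df and the Series walk are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot, lag_plot, scatter_matrix

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

# Scatter matrix: one panel per pair of columns, histograms on the diagonal
axes = scatter_matrix(df, alpha=0.5, figsize=(6, 6))

# Lag plot and autocorrelation plot both take a single Series
walk = pd.Series(rng.normal(size=200).cumsum())
plt.figure()
lag_ax = lag_plot(walk)
plt.figure()
acf_ax = autocorrelation_plot(walk)
```

Unlike the plot accessor, these are plain functions: you pass the data in rather than calling a method on it.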

 1. Bar Chart

df_flow_mark['客流量'].plot(kind='bar')
df_flow_mark['客流量'].plot.bar()

Several columns can also be drawn together as grouped bars:

df_flow_mark[['风级','降水量']].plot.bar()

 

To generate a stacked bar chart, pass stacked=True:

df_flow_mark[['风级','降水量']].plot.bar(stacked=True)
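To see what stacked=True actually changes, here is a small sketch on made-up weather readings (the English column names are illustrative). It inspects the bars that pandas draws:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import pandas as pd

# Made-up daily weather readings; column names are illustrative only
df = pd.DataFrame(
    {"wind_level": [2, 3, 1, 4], "precipitation": [0.2, 1.5, 0.5, 2.0]},
    index=["Mon", "Tue", "Wed", "Thu"],
)

ax_grouped = df.plot.bar()              # one bar per column, side by side
ax_stacked = df.plot.bar(stacked=True)  # columns piled on top of each other

# In the stacked chart, the second column's bars start where the first column's bars end
second_column_bottoms = [p.get_y() for p in list(ax_stacked.patches)[4:]]
```

Both charts contain the same eight bars; only their positions differ. In the stacked version the precipitation bars sit on top of the wind-level bars instead of next to them.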

 

Staring at matplotlib's default style for a long time gets tiring, so I switch the theme here; the chart itself is unchanged.
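The post does not say which theme was used. One possible way to switch, assuming matplotlib's built-in style sheets, is sketched below; "ggplot" is just one of the styles shipped with matplotlib, picked as an example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

print(plt.style.available[:5])  # a few of the style names shipped with matplotlib

# Assumption: "ggplot" stands in for whatever theme the post actually used.
# style.context applies the style only inside the with block.
with plt.style.context("ggplot"):
    df = pd.DataFrame({"wind_level": [2, 3, 1, 4]})
    ax = df.plot.bar()
```

Calling plt.style.use("ggplot") instead would switch the style for every chart drawn afterwards.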

To get a horizontal bar chart you can use the barh method:

df_flow_mark[['风级','降水量']].plot.barh(stacked=True)

 

2. Histogram

Histograms can be plotted using the DataFrame.plot.hist() and Series.plot.hist() methods.

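import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Three normally distributed columns, shifted so their histograms only partly overlap
df4 = pd.DataFrame(
    {
        "a": np.random.randn(1000) + 1,
        "b": np.random.randn(1000),
        "c": np.random.randn(1000) - 1,
    },
    columns=["a", "b", "c"],
)
plt.figure();
df4.plot.hist(alpha=0.5)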

 

Histograms can be stacked using stacked=True. The number of bins can be changed using the bins keyword.

df4.plot.hist(stacked=True, bins=20);

 

Additional keywords supported by matplotlib's hist can be passed through. For example, horizontal and cumulative histograms can be drawn with orientation='horizontal' and cumulative=True.
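As a quick sketch of those two keywords together, here is a horizontal cumulative histogram of random data (the Series values is made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
values = pd.Series(rng.normal(size=1000))

# Extra keywords are forwarded to matplotlib's hist
ax = values.plot.hist(orientation="horizontal", cumulative=True, bins=20)

# In a horizontal histogram the bar lengths are widths;
# with cumulative=True each bar also includes all the bins before it
bar_lengths = [p.get_width() for p in ax.patches]
```

The bar lengths never decrease from one bin to the next, and the last bar counts every sample.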

 See the hist method and the matplotlib hist documentation for details.
The existing DataFrame.hist interface can still be used to plot histograms:

plt.figure();
df_flow_mark['风级'].hist();

 

 DataFrame.hist() can plot histograms of columns over multiple subplots:

plt.figure();
df_flow_mark[['风级','降水量']].diff().hist(color="k", alpha=0.5, bins=50);
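The line above chains two steps, which the sketch below separates on made-up data (the English column names are illustrative): diff() turns each value into its change from the previous row, and DataFrame.hist() then draws one subplot per column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(
    {"wind_level": rng.integers(0, 8, 100), "precipitation": rng.random(100)}
)

# diff() replaces each value with its change from the previous row (the first row becomes NaN)
changes = df.diff()

# DataFrame.hist draws one subplot per column and returns the Axes as an array
axes = changes.hist(bins=50, alpha=0.5)
titles = sorted(ax.get_title() for ax in axes.ravel() if ax.get_title())
```

Each subplot is titled with the name of the column it shows, so titles ends up holding the two column names.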

 

 The by keyword can be specified to draw grouped histograms:

data = pd.Series(np.random.randn(1000))

data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4));

 

 Additionally, the by keyword can be specified in DataFrame.plot.hist() (pandas 1.4.0 and later):

data = pd.DataFrame(
    {
        "a": np.random.choice(["x", "y", "z"], 1000),
        "b": np.random.choice(["e", "f", "g"], 1000),
        "c": np.random.randn(1000),
        "d": np.random.randn(1000) - 1
    }
)
data.plot.hist(by=["a", "b"], figsize=(10, 5));

 

Follow so you don't get lost; if you spot any mistakes, please leave a comment. Thank you very much.

That's all for this issue. I'm fanstuck. If you have any questions, feel free to leave a comment to discuss. See you in the next issue.


Origin blog.csdn.net/master_hunter/article/details/126928777