Pandas Data Analysis: Quick Chart Visualization, Detailed Explanation of Various Operations + Example Code (2)

Table of contents

foreword

1. Box plot

2. Area chart

3. Scatter plot

Follow so you don't lose track of the series. If you spot any mistakes, please leave a comment. Thank you very much

 


foreword

When doing data mining or data analysis, or when pulling data out of a database during big data development, we are usually left staring at tabular data, and we often wish we could immediately turn it into a chart that presents the data more intuitively. Visualizing data normally means calling many libraries and functions, converting data, and writing a lot of processing code, which is tedious work. Production-grade visualization is usually handled by data analysts and professional reporting tools, so for everyday analysis there is no need to engineer anything: it is enough to quickly generate a plot for the question at hand, and Pandas has exactly this capability. It still relies on the matplotlib library underneath, but it takes much less code. Let's take a look at how to draw charts quickly.

Pandas Data Analysis: Quick Chart Visualization Detailed Explanation of Various Operations + Example Code (1)

The Pandas data analysis column has been running for a long time and covers most aspects of using pandas for daily business and routine data analysis, from the basic data structures to processing all kinds of data and explaining the commonly used pandas functions. It took a lot of time and thought to write. If you have friends who work in data analysis or big data development, feel free to recommend the column and subscribe, so you can learn the most practical and commonly used Pandas data analysis knowledge as soon as it is published. This post is long and involves various data visualization operations, so it is worth reading and practicing. I will pick out the essentials of Pandas and discuss them in detail, and I will keep maintaining these posts. If you find mistakes or have questions, please point them out in the comments. Thank you for your support.


1. Box plot

The data is the same as in the previous article.

Methods that can be called:

  • Series.plot.box()
  • DataFrame.plot.box()
  • DataFrame.boxplot()

A boxplot can be drawn to visualize the distribution of values in each column.

# Columns: 风级 = wind level, 降水量 = precipitation
df_flow_mark[['风级','降水量']].plot.box()
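The line above depends on the df_flow_mark DataFrame from the previous article, which is not reproduced here. To try the calls in this post without it, a rough stand-in can be sketched as below. Everything about this frame is an assumption: only the column names are taken from the code in this post, and the values and dtypes are random placeholders, not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 30

# Hypothetical stand-in for df_flow_mark; values are random placeholders.
df_flow_mark = pd.DataFrame(
    {
        "日期": pd.date_range("2022-01-01", periods=n, freq="D"),  # date
        "客流量": rng.integers(100, 1000, size=n),  # passenger flow
        "风级": rng.integers(0, 8, size=n),  # wind level
        "降水量": rng.random(n) * 20,  # precipitation
        "湿度": rng.random(n),  # humidity
    }
)

# Same call as in the article: one box per selected column.
ax = df_flow_mark[["风级", "降水量"]].plot.box()
```

With this in place, the other df_flow_mark snippets in this post can at least be run, although the resulting pictures will of course differ from the real ones.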

 

Boxplots can be colored by passing the color keyword. You can pass a dict whose keys are boxes, whiskers, medians, and caps. If some keys are missing from the dict, default colors are used for the corresponding parts. Boxplots also accept the sym keyword to specify the style of the fliers (the outlier points).

color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}
df_flow_mark[['风级','降水量']].plot.box(color=color, sym="r+")
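To make it clearer which points the sym style applies to, here is a minimal sketch of how fliers are usually determined. It assumes matplotlib's default whisker rule (whiskers reach at most 1.5 times the interquartile range beyond the quartiles); the sample Series s is made up purely for illustration.

```python
import pandas as pd

# Made-up sample with one obvious outlier.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q3 = s.quantile(0.25), s.quantile(0.75)  # edges of the box
iqr = q3 - q1                                # interquartile range

# Default whisker limits: 1.5 * IQR beyond the box edges.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points outside the limits are drawn as fliers (styled by sym).
fliers = s[(s < lower) | (s > upper)]
print(fliers.tolist())
```

In this sample only the value 30 falls outside the limits, so it is the only point that would be drawn with the "r+" marker.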

 

The effect is easier to see on a randomly generated dataset:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}
df.plot.box(color=color, sym="r+")

 

Additionally, other keywords supported by matplotlib boxplots can be passed. For example, horizontal and custom positioned boxplots can be drawn via vert=False and the positions keyword.

df.plot.box(vert=False, positions=[1, 4, 5, 6, 8])

 

The existing DataFrame.boxplot interface can still be used:

df.boxplot()

A stratified boxplot can be created by grouping with the by keyword argument. For example:

df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
plt.figure();
bp = df.boxplot(by="X")

 

 

You can also pass a subset of columns to plot, as well as group by multiple columns:

df = pd.DataFrame(np.random.rand(10, 3), columns=["Col1", "Col2", "Col3"])
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
df["Y"] = pd.Series(["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"])
plt.figure();
bp = df.boxplot(column=["Col1", "Col2"], by=["X", "Y"])

 

The same can be done with DataFrame.plot.box():

df = pd.DataFrame(np.random.rand(10, 3), columns=["Col1", "Col2", "Col3"])
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
plt.figure()
# The keyword is column (singular), matching DataFrame.boxplot
bp = df.plot.box(column=["Col1", "Col2"], by="X")

In boxplot, the return type can be controlled with the return_type keyword. Valid options are {"axes", "dict", "both", None}. Faceting, created by DataFrame.boxplot with the by keyword, also affects the output type:

np.random.seed(1234)
df_box = pd.DataFrame(np.random.randn(50, 2))
df_box["g"] = np.random.choice(["A", "B"], size=50)
df_box.loc[df_box["g"] == "B", 1] += 3
bp = df_box.boxplot(by="g")

 

The subplots above are split first by the numeric columns and then by the value of the g column. The subplots below are split first by the value of g and then by the numeric columns.

bp = df_box.groupby("g").boxplot()
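The return_type keyword mentioned earlier can be tried out with a small sketch like the following. It uses an ungrouped, randomly generated DataFrame (an assumption for illustration, not the data from this post) and only shows what comes back for three of the options.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=["A", "B", "C"])

# "axes": the matplotlib Axes the boxes were drawn on.
axes = df.boxplot(return_type="axes")

# "dict": a dict of matplotlib line artists, keyed by part name.
parts = df.boxplot(return_type="dict")

# "both": a namedtuple holding the Axes (.ax) and the dict (.lines).
both = df.boxplot(return_type="both")

print(sorted(parts))        # names of the parts that can be restyled
print(len(parts["boxes"]))  # one box per column
```

The dict form is handy when you want to restyle individual parts afterwards, for example by looping over parts["medians"] and calling set_linewidth on each line.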

 

2. Area chart

Area charts can be created using Series.plot.area() and DataFrame.plot.area(). By default, area charts are stacked. To generate a stacked area chart, each column must be either all positive or all negative.

When the input data contains NaNs, it will be automatically filled with 0s. If you want to drop or fill with different values, you can use DataFrame.dropna() or DataFrame.fillna() before calling plot.

The code is as follows (example):

df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df.plot.area();
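The two conditions described above (a consistent sign within each column, and the handling of NaNs) can be checked before plotting. The sketch below is only one way to do it: is_stackable is a hypothetical helper written for this example, not a pandas function, and filling with the column mean is just one possible choice of replacement value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd


def is_stackable(frame):
    """Hypothetical helper: True if every column is all >= 0 or all <= 0.

    NaNs are treated as 0 here, mirroring how they are filled when plotting.
    """
    values = frame.fillna(0)
    return bool(((values >= 0).all() | (values <= 0).all()).all())


df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df.loc[3, "b"] = np.nan  # inject one missing value

filled = df.fillna(df.mean())  # fill with column means instead of 0
dropped = df.dropna()          # or drop the incomplete row entirely

if is_stackable(filled):
    ax = filled.plot.area()
```

Whether to fill or drop depends on the data: dropping removes the whole row from every column, while filling keeps the x-axis intact.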

 

To generate an unstacked plot, pass stacked=False. Unless specified otherwise, the alpha value is set to 0.5 so that overlapping areas remain visible.

df.plot.area(stacked=False);
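If the default transparency does not suit your data, alpha can be passed explicitly. A small sketch on random data, where the value 0.3 is an arbitrary choice for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])

# Unstacked with the default transparency.
ax_default = df.plot.area(stacked=False)

# Unstacked with a lighter fill, so heavily overlapping areas stay readable.
ax_custom = df.plot.area(stacked=False, alpha=0.3)

# The filled regions are matplotlib collections; inspect their alpha.
print(ax_default.collections[0].get_alpha())
print(ax_custom.collections[0].get_alpha())
```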

 

3. Scatter plot

 Scatter plots can be drawn using the DataFrame.plot.scatter() method, which requires numeric columns for the x- and y-axes. These can be specified by the x and y keywords.

df_flow_mark.plot.scatter(x='日期',y='客流量')
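Since the method expects numeric columns, a date column such as 日期 may need to be converted first, depending on its dtype and your pandas version. One possible approach, sketched on a made-up five-row frame (df_demo and the day column are assumptions for this example), is to plot the number of days elapsed since the first date:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up stand-in data; only the column names follow the article.
df_demo = pd.DataFrame(
    {
        "日期": pd.date_range("2022-01-01", periods=5, freq="D"),  # date
        "客流量": [120, 340, 210, 480, 390],  # passenger flow
    }
)

# Days elapsed since the first date: a plain integer column.
df_demo["day"] = (df_demo["日期"] - df_demo["日期"].min()).dt.days

ax = df_demo.plot.scatter(x="day", y="客流量")
```

If the dates are stored as strings, pd.to_datetime can be applied to the column before the subtraction.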

To plot multiple column groups on a single axis, repeat the plot method and specify the target ax. It is recommended to specify the color and label keywords to distinguish each group.

df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])
df["species"] = pd.Categorical(
    ["setosa"] * 20 + ["versicolor"] * 20 + ["virginica"] * 10
)
ax = df.plot.scatter(x="a", y="b", color="DarkBlue", label="Group 1")
df.plot.scatter(x="c", y="d", color="DarkGreen", label="Group 2", ax=ax);

The keyword c can be given as the name of a column to provide a color for each point:

df.plot.scatter(x="a", y="b", c="c", s=50);

If you pass a categorical column to c, a discrete colorbar will be generated:

df.plot.scatter(x="a", y="b", c="species", cmap="viridis", s=50);

Additional keywords supported by matplotlib's scatter can be passed. The example below shows a bubble chart that uses one column of the DataFrame as the bubble size.

df_flow_mark.plot.scatter(x='日期',y='客流量',s=df_flow_mark['湿度']*200)
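The multiplier 200 above is a hand-picked scale. In matplotlib, s is the marker area in points squared, so another option is to rescale the column to a fixed size range. The sketch below does this with min-max scaling on random data; the range of 20 to 500 is an arbitrary assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(50, 3), columns=["a", "b", "c"])

# Min-max scale column c into marker areas between 20 and 500 (points squared).
c_min, c_max = df["c"].min(), df["c"].max()
sizes = 20 + 480 * (df["c"] - c_min) / (c_max - c_min)

ax = df.plot.scatter(x="a", y="b", s=sizes)
```

This keeps the smallest bubbles visible and stops the largest from covering the whole plot, whatever the units of the size column are.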

Follow so you don't lose track of the series. If you spot any mistakes, please leave a comment. Thank you very much

That's all for this issue. I'm fanstuck. If you have any questions, feel free to leave a comment to discuss. See you in the next issue.



Origin blog.csdn.net/master_hunter/article/details/126956553