Some tips for drawing box plots in Python

Introduction

Recently I needed to draw box plots in Python, but the requirements kept changing, so I had to redraw them again and again. Today it occurred to me that the process is worth summarizing. While I think about what to do next and have some free time, I will also review a few basic concepts.

Box plot principle

Regarding the principle, here are two well-written articles on this site:

Matplotlib box plots: boxplot() usage in full detail

Python box plot drawing and extraction of characteristic values

I also used these two articles as references. The box plot explanation follows the schematic in the second article:

[Figure: box plot schematic, from the second article above]
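The rule in the schematic can be written out in a few lines. This is a minimal NumPy sketch (tukey_box_stats is a hypothetical helper, not a library function) that, as far as I can tell, mirrors what matplotlib does by default with whis=1.5: the box spans Q1 to Q3, each whisker stops at the most extreme observation within 1.5 × IQR of the box, and anything beyond that is an outlier.

```python
import numpy as np


def tukey_box_stats(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and outliers (Tukey rule)."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])  # linear interpolation
    iqr = q3 - q1
    lo_limit = q1 - whis * iqr
    hi_limit = q3 + whis * iqr
    # Each whisker ends at the most extreme observation still inside its limit,
    # but never inside the box itself
    whislo = min(x[x >= lo_limit].min(), q1)
    whishi = max(x[x <= hi_limit].max(), q3)
    # Everything beyond the whisker ends is drawn as an outlier (flier)
    fliers = x[(x < whislo) | (x > whishi)]
    return {"whislo": whislo, "q1": q1, "med": med, "q3": q3,
            "whishi": whishi, "fliers": fliers}


stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
# q1=3.0, med=5.0, q3=7.0, whiskers end at 1.0 and 8.0, and 30 is an outlier
```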

When a box plot is drawn through pyplot, the relevant source code is:

# Autogenerated by boilerplate.py.  Do not edit as changes will be lost.
@_copy_docstring_and_deprecators(Axes.boxplot)
def boxplot(
        x, notch=None, sym=None, vert=None, whis=None,
        positions=None, widths=None, patch_artist=None,
        bootstrap=None, usermedians=None, conf_intervals=None,
        meanline=None, showmeans=None, showcaps=None, showbox=None,
        showfliers=None, boxprops=None, labels=None, flierprops=None,
        medianprops=None, meanprops=None, capprops=None,
        whiskerprops=None, manage_ticks=True, autorange=False,
        zorder=None, capwidths=None, *, data=None):
    return gca().boxplot(
        x, notch=notch, sym=sym, vert=vert, whis=whis,
        positions=positions, widths=widths, patch_artist=patch_artist,
        bootstrap=bootstrap, usermedians=usermedians,
        conf_intervals=conf_intervals, meanline=meanline,
        showmeans=showmeans, showcaps=showcaps, showbox=showbox,
        showfliers=showfliers, boxprops=boxprops, labels=labels,
        flierprops=flierprops, medianprops=medianprops,
        meanprops=meanprops, capprops=capprops,
        whiskerprops=whiskerprops, manage_ticks=manage_ticks,
        autorange=autorange, zorder=zorder, capwidths=capwidths,
        **({"data": data} if data is not None else {}))

(Referenced from: https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/pyplot.py#L2473-L2494)

Based on the explanations in those two articles, the main parameters can be summarized as follows (with some edits of my own):

x: The data to plot; a single dataset or a sequence of datasets (one box each).
notch: Whether to draw a notched box; default False (a plain rectangle).
sym: Marker symbol for outliers (for example 'b+' for blue plus signs); an empty string hides them, and flierprops gives finer control.
vert: Whether the boxes are vertical; default True, False draws them horizontally.
whis: Whisker reach; the whiskers extend to the most extreme data point within whis × IQR (interquartile range) beyond the box; default 1.5.
positions: Positions of the boxes on the axis; default range(1, N + 1), where N is the number of boxes.
widths: Box width; default 0.5, or 0.15 × the distance between the extreme positions if that is smaller.
patch_artist: Whether boxes are drawn as patches that can be filled with color; default False.
meanline: If True (and showmeans is True), the mean is drawn as a line across the box instead of a marker; default False.
showmeans: Whether to show the mean; default False.
manage_ticks: Whether tick positions and labels are adjusted to match the boxes; default True.
showcaps: Whether to draw the caps at the ends of the whiskers; default True.
showbox: Whether to draw the box; default True.
showfliers: Whether to draw outliers (points beyond the whiskers); default True.
boxprops: Style of the box, such as edge color and fill color.
labels: Labels for the datasets, shown as tick labels.
flierprops: Style of the outliers, such as marker shape, size and fill color.
medianprops: Style of the median line, such as line style and width.
meanprops: Style of the mean, such as marker size and color.
capprops: Style of the caps, such as color and line width.
whiskerprops: Style of the whiskers, such as color, width and line style.
autorange: If True and the 25th and 75th percentiles are equal, whis is set to (0, 100) so the whiskers reach the minimum and maximum; default False.
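To see a few of these parameters side by side, here is a small sketch on made-up data (the Agg backend is an assumption so it runs without a display). It contrasts the default whis=1.5 with whis=(0, 100), which stretches the whiskers to the minimum and maximum so that no point is drawn as an outlier. Tick labels are set with set_xticks/set_xticklabels rather than labels=, because, as far as I know, matplotlib 3.9 renamed that argument to tick_labels.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample data: two groups, each with one obvious outlier appended
group_a = np.append(rng.normal(10, 1, 50), 20.0)
group_b = np.append(rng.normal(12, 1.5, 50), 2.0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# Default Tukey whiskers (whis=1.5): the appended points become red "x" fliers
r1 = ax1.boxplot([group_a, group_b], notch=True, patch_artist=True,
                 flierprops={"marker": "x", "markeredgecolor": "red"})
ax1.set_title("whis=1.5, notch=True")

# whis=(0, 100): whiskers span min..max, so nothing is classified as a flier
r2 = ax2.boxplot([group_a, group_b], whis=(0, 100), showmeans=True)
ax2.set_title("whis=(0, 100), showmeans=True")

for ax in (ax1, ax2):
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["A", "B"])

# fig.savefig("boxplot_params.png") to inspect the result
```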

Now on to the hands-on part.

Box plot drawing

Since my data points are pedestrians extracted from a drone video stream, I skip the earlier processing steps and give a simplified version. First, compute how long each pedestrian takes to cross a region of interest:

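def throw_time(array, start_x, end_x, y):
    # Column layout inferred from the indexing below:
    # 0 = frame number, 1 = person ID, 2 = x, 3 = y, 4 = box width, 5 = box height
    indexs = []
    index = 1
    person_throw_time = []
    # IDs start at 1 (ID 0 is skipped); "+ 1" so the largest ID is included,
    # int() because range() rejects the float max of a float array
    for i in range(1, int(array[:, 1].max()) + 1):
        each_person_data = array[array[:, 1] == i]

        # Keep only the detections inside the region of interest
        each_person_data = each_person_data[each_person_data[:, 2] > start_x]
        each_person_data = each_person_data[each_person_data[:, 2] < end_x]
        each_person_data = each_person_data[each_person_data[:, 3] > y]
        if each_person_data.shape[0] < 4:  # too few detections to count as a track
            continue
        # Shift (x, y) from the box corner to the box centre
        # (not used by the time calculation below)
        each_person_data[:, 2] = each_person_data[:, 2] + (each_person_data[:, 4] / 2)
        each_person_data[:, 3] = each_person_data[:, 3] + (each_person_data[:, 5] / 2)
        # Frames between first and last detection, 0.04 s per frame (25 fps)
        person_time = (each_person_data[-1, 0] - each_person_data[0, 0]) * 0.04
        print("person time = ", person_time)
        if person_time < 5:  # drop tracks shorter than 5 s
            continue
        person_throw_time.append(person_time)
        indexs.append(index)
        index = index + 1
    return indexs, person_throw_time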

indexs1,person_throw_time1 = throw_time(array1,500,1400,400)      
# print(person_throw_time1)
# [10.36, 9.76, 9.48, 9.56, 6.16, 8.36, 8.6, 8.76, 5.6000000000000005, 9.84, 8.0, 9.88, 8.36, 9.16, 8.0, 8.92, 8.32, 9.68, 7.6000000000000005, 8.24, 7.08, 8.8, 8.6, 9.88, 9.64, 9.36, 10.16, 9.56, 7.4, 9.32, 8.48, 9.88, 9.16, 9.48, 9.64, 8.76]
indexs2,person_throw_time2 = throw_time(array2,500,1400,400)
indexs3,person_throw_time3 = throw_time(array3,450,1300,400)
indexs4,person_throw_time4 = throw_time(array4,600,1400,400)
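For comparison, here is a hypothetical vectorised variant of throw_time (crossing_times is my own name, not from the original code). It assumes the same column layout the indexing above implies ([frame, person_id, x, y, w, h]) and 25 fps (0.04 s per frame). It filters the whole array once instead of once per ID, and measures the span with max/min of the frame numbers, so it does not rely on the rows being sorted by frame. It returns only the times, not the running indices. The tiny tracks at the bottom are synthetic and only show the call.

```python
import numpy as np

FRAME_SECONDS = 0.04  # assumed 25 fps, matching the 0.04 factor in throw_time


def crossing_times(array, start_x, end_x, y_min, min_rows=4, min_seconds=5.0):
    """Hypothetical variant of throw_time returning per-person durations (s).

    Assumes columns [frame, person_id, x, y, w, h, ...].
    """
    # Keep only detections inside the region of interest, once for all IDs
    mask = (array[:, 2] > start_x) & (array[:, 2] < end_x) & (array[:, 3] > y_min)
    region = array[mask]
    times = []
    for pid in np.unique(region[:, 1]):  # sorted IDs, like the original loop
        if pid == 0:  # throw_time skips ID 0 as well
            continue
        frames = region[region[:, 1] == pid, 0]
        if frames.shape[0] < min_rows:
            continue
        seconds = (frames.max() - frames.min()) * FRAME_SECONDS
        if seconds >= min_seconds:
            times.append(float(seconds))
    return times


def track(pid, n_frames):
    """Synthetic track: one detection per frame at a fixed position."""
    f = np.arange(n_frames, dtype=float)
    return np.column_stack([f, np.full(n_frames, pid), np.full(n_frames, 800.0),
                            np.full(n_frames, 500.0), np.full(n_frames, 40.0),
                            np.full(n_frames, 80.0)])


# Person 1 spans 200 frames (8 s) and is kept; person 2 spans 50 frames (2 s)
demo = np.vstack([track(1, 201), track(2, 51)])
print(crossing_times(demo, 500, 1400, 400))
```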

Running throw_time on each dataset gives a list of per-person times (plus their indices), which can then be plotted:

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc("font", family='Times New Roman')

# Plot
fig, ax = plt.subplots()
ax.boxplot([person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4],
           widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           # meanline/meanprops only take effect together with showmeans=True
           showmeans=True, meanline=True,
           meanprops={'color': 'red', 'linewidth': 3})
ax.set_ylabel('time(s)', fontsize=18)
# Set the x-axis tick labels
ax.set_xticklabels(['List 1', 'List 2', 'List 3', 'List 4'], fontsize=14)
plt.show()

[Figure: box plot of the four lists]

This code draws four boxes, one for each of the lists person_throw_time1 to person_throw_time4. The boxes are filled with sky blue and outlined in black, and outliers are hidden. The red meanprops together with meanline=True are meant to mark each mean with a red line, but matplotlib only draws the mean when showmeans=True is also passed.
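The meanline detail is easy to check. According to the matplotlib documentation, meanline only has an effect when showmeans=True, so meanline=True on its own draws no mean at all. The sketch below uses made-up data and the Agg backend (both assumptions) and compares the artist dictionaries that boxplot returns.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

# Hypothetical stand-in data; any lists of numbers work
samples = [[8.0, 9.2, 9.5, 9.9, 10.4], [7.1, 8.3, 8.8, 9.0, 9.7]]
mean_style = {"color": "red", "linewidth": 3}

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
# meanline=True alone: no mean artists are created
r_without = ax1.boxplot(samples, meanline=True, meanprops=mean_style)
# meanline=True plus showmeans=True: a red line across each box at the mean
r_with = ax2.boxplot(samples, showmeans=True, meanline=True, meanprops=mean_style)

print(len(r_without["means"]), len(r_with["means"]))  # prints: 0 2
```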

This is probably enough for most needs, and I thought it was fine at first: the code above already reflects a first version with off colors and a second version that fixed a legend error. In the end, though, I went through six versions and changed the drawing logic again.

Drawing a box plot from its summary statistics

Then a new requirement came in: another set of data of unknown origin, delivered as an Excel sheet that gave the five box plot values of each group directly, with no raw data, and I was asked to produce a comparison chart. To match that format, I converted my own lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4] into a DataFrame and used describe() to get the corresponding five-number summaries. Because the real data involve security concerns, they are replaced with simple numbers here:

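import pandas as pd

# Suppose these are your four lists
person_throw_time1 = [1, 2, 3, 4, 5]
person_throw_time2 = [6, 7, 8, 9, 10]
person_throw_time3 = [11, 12, 13, 14, 15]
person_throw_time4 = [16, 17, 18, 19, 20]

# Combine the four lists into one DataFrame; wrapping each list in a Series
# also works when the lists have different lengths (missing cells become NaN)
data = pd.DataFrame({
    'data1': pd.Series(person_throw_time1), 'data2': pd.Series(person_throw_time2),
    'data3': pd.Series(person_throw_time3), 'data4': pd.Series(person_throw_time4)})

# Use describe() to compute the summary statistics
statistics = data.describe()

print(statistics)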

This prints:

           data1      data2      data3      data4
count   5.000000   5.000000   5.000000   5.000000
mean    3.000000   8.000000  13.000000  18.000000
std     1.581139   1.581139   1.581139   1.581139
min     1.000000   6.000000  11.000000  16.000000
25%     2.000000   7.000000  12.000000  17.000000
50%     3.000000   8.000000  13.000000  18.000000
75%     4.000000   9.000000  14.000000  19.000000
max     5.000000  10.000000  15.000000  20.000000
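These describe() rows map directly onto the per-box dictionaries that Axes.bxp accepts. The helper below is a hypothetical sketch (describe_to_bxp is my own name). Note that it uses the true minimum and maximum as whisker ends, which differs from the default 1.5 × IQR whiskers, and that it keeps the mean so that bxp(showmeans=True) could draw it.

```python
import pandas as pd


def describe_to_bxp(df):
    """Hypothetical helper: turn DataFrame.describe() output into bxp dicts."""
    desc = df.describe()
    stats = []
    for column in desc.columns:
        col = desc[column]
        stats.append({
            "label": column,              # bxp uses this as the tick label
            "whislo": float(col["min"]),  # whiskers at the true min/max,
            "q1": float(col["25%"]),      # not Tukey's 1.5 x IQR reach
            "med": float(col["50%"]),
            "q3": float(col["75%"]),
            "whishi": float(col["max"]),
            "mean": float(col["mean"]),   # needed for bxp(showmeans=True)
        })
    return stats


data = pd.DataFrame({
    "data1": [1, 2, 3, 4, 5],
    "data2": [6, 7, 8, 9, 10],
})
print(describe_to_bxp(data)[0])
```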

I then rearranged the results and combined three sets of experiments into one list, where each row is [label, min, Q1, median, Q3, max] (note: some values were edited, so these are not exact five-number summaries):

[
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]

The edits do not affect the method. Redrawing the data above with ax.boxplot() gives:

import matplotlib.pyplot as plt
import matplotlib

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]

# Extract the data and the labels
labels = [row[0] for row in data]
# Note: boxplot treats each five-value row as five raw observations
box_data = [row[1:] for row in data]

# Set the font
matplotlib.rc("font", family='Times New Roman')

# Draw the box plot
fig, ax = plt.subplots()
ax.boxplot(box_data, widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           meanline=True, meanprops={'color': 'red', 'linewidth': 3})

# Set the axis labels
ax.set_ylabel('time(s)', fontsize=18)
ax.set_xticklabels(labels, rotation=45, fontsize=12)

plt.show()

[Figure: box plot of the twelve summary rows drawn with ax.boxplot()]
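A quick way to see what boxplot does with these rows is to run one of them through matplotlib.cbook.boxplot_stats, the function Axes.boxplot uses to compute its statistics. The row below is "3epochs List 1" from the list above; the five summary values are treated as five raw observations.

```python
from matplotlib import cbook

# One summary row from above: [min, Q1, median, Q3, max] of "3epochs List 1"
row = [14.16, 18.41, 20.19, 21.08, 25.42]

# boxplot() recomputes quartiles and 1.5 x IQR whiskers from these five numbers
s = cbook.boxplot_stats(row)[0]
print("box:", s["q1"], s["q3"])
print("whiskers:", s["whislo"], s["whishi"])
# min and max lie beyond 1.5 x IQR, so they become fliers and both whiskers
# collapse onto the box edges (and showfliers=False hides the fliers)
print("treated as outliers:", s["fliers"].tolist())
```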

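But after drawing there is still a problem: the upper or lower whiskers of some boxes are missing. The reason is that boxplot treats each row as five raw observations. It recomputes the quartiles from them and only extends the whiskers to points within 1.5 × IQR of the box, so a minimum or maximum farther out than that is classified as an outlier (and hidden by showfliers=False), leaving the whisker collapsed onto the box. To draw the boxes from the given statistics instead, first convert the data into a list of dictionaries: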

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]


def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data

draw_data = convert_to_dict(data)
print(draw_data)
# [{'whislo': 6.1, 'q1': 9.15, 'med': 9.84, 'q3': 10.44, 'whishi': 11.16}, {'whislo': 7.0, 'q1': 9.47, 'med': 10.05, 'q3': 10.81, 'whishi': 12.02}, {'whislo': 14.16, 'q1': 18.41, 'med': 20.19, 'q3': 21.08, 'whishi': 25.42}, {'whislo': 6.54, 'q1': 8.65, 'med': 9.1, 'q3': 9.39, 'whishi': 10.08}, {'whislo': 7.31, 'q1': 9.1, 'med': 9.5, 'q3': 10.31, 'whishi': 10.86}, {'whislo': 10.32, 'q1': 14.18, 'med': 15.42, 'q3': 18.08, 'whishi': 20.72}, {'whislo': 6.14, 'q1': 8.1, 'med': 8.44, 'q3': 9.1, 'whishi': 9.82}, {'whislo': 6.22, 'q1': 8.3, 'med': 8.7, 'q3': 9.2, 'whishi': 10.12}, {'whislo': 8.72, 'q1': 10.61, 'med': 12.71, 'q3': 16.11, 'whishi': 17.91}, {'whislo': 7.1, 'q1': 8.75, 'med': 8.84, 'q3': 9.1, 'whishi': 10.96}, {'whislo': 7.3, 'q1': 8.85, 'med': 9.04, 'q3': 9.1, 'whishi': 11.19}, {'whislo': 7.6, 'q1': 8.3, 'med': 8.4, 'q3': 9.0, 'whishi': 12.55}]
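A slightly more defensive version of convert_to_dict could also keep the label (bxp uses a "label" key, when present, as the tick label) and reject rows whose values are out of order, which matters here because the values were edited by hand. rows_to_bxp_stats is a hypothetical name for this sketch.

```python
def rows_to_bxp_stats(rows):
    """Hypothetical helper: [label, min, q1, med, q3, max] rows -> bxp dicts."""
    keys = ("whislo", "q1", "med", "q3", "whishi")
    stats = []
    for label, *values in rows:
        # A box only makes sense if min <= q1 <= med <= q3 <= max
        if len(values) != 5 or values != sorted(values):
            raise ValueError(f"{label!r}: expected min <= q1 <= med <= q3 <= max")
        stats.append({"label": label, **dict(zip(keys, values))})
    return stats


example = rows_to_bxp_stats([["List 1", 6.1, 9.15, 9.84, 10.44, 11.16]])
print(example)
```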

With the rows converted to dictionaries, switch from ax.boxplot() to ax.bxp(). Both can draw several boxes; the difference is that boxplot() computes the statistics from raw data (and then calls bxp internally), whereas bxp() draws directly from precomputed statistics, one dict per box with the keys whislo (lower whisker end, here the minimum), q1 (lower quartile), med (median), q3 (upper quartile) and whishi (upper whisker end, here the maximum). So the code is:

import matplotlib.pyplot as plt
import matplotlib

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]


def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data

draw_data = convert_to_dict(data)

matplotlib.rc("font", family='Times New Roman')

fig, ax = plt.subplots()
# bxp draws each box straight from the statistics in draw_data.
# meanline/meanprops are omitted: bxp only draws a mean when showmeans=True,
# and that requires a "mean" key in every dict, which draw_data does not have.
ax.bxp(draw_data, widths=0.4, patch_artist=True, showfliers=False,
       boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'})
# Minimal version: ax.bxp(draw_data, showfliers=False)

ax.set_ylabel('time(s)', fontsize=18)
ax.set_xticklabels([row[0] for row in data], rotation=45, fontsize=14)
plt.show()

[Figure: box plot of the twelve summary rows drawn with ax.bxp()]
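When raw samples are available, another option is to let matplotlib compute the statistics and still draw with bxp. cbook.boxplot_stats returns one dict per dataset that already includes a "mean" key, so showmeans=True with meanline=True produces the red mean line that a bxp call on hand-entered five-number dicts cannot draw. The random samples below are hypothetical stand-ins for person_throw_time1 to person_throw_time4, and the Agg backend is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

# Hypothetical raw samples standing in for person_throw_time1..4
rng = np.random.default_rng(1)
raw = [rng.normal(loc, 0.8, 40) for loc in (9.0, 8.5, 8.2, 8.8)]

# Same statistics boxplot() would compute; each dict also carries "mean"
# and, because labels is given, a "label" that bxp uses for the ticks
stats = cbook.boxplot_stats(raw, labels=["List 1", "List 2", "List 3", "List 4"])

fig, ax = plt.subplots()
artists = ax.bxp(stats, widths=0.4, patch_artist=True, showfliers=False,
                 showmeans=True, meanline=True,
                 boxprops={"facecolor": "skyblue", "linewidth": 0.8,
                           "edgecolor": "black"},
                 meanprops={"color": "red", "linewidth": 3})
ax.set_ylabel("time(s)", fontsize=18)
```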


Source: blog.csdn.net/submarineas/article/details/130523114