Introduction
I recently needed to draw box plots in Python. The requirements kept changing, so I ended up redrawing the figure many times. Today it occurred to me that the process is worth summarizing, and since I have some free time, I will also review the basic concepts.
Box plot principle
For the principle, here are two well-written articles on this site:
Matplotlib - box plot, box plot boxplot () all usage details
Python box plot drawing and extraction of eigenvalues
I also use these two articles as references here. The box plot is introduced with the same schematic diagram as in the second article.
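To make the principle concrete, here is a minimal sketch of the numbers a box plot encodes, computed with NumPy. The sample values and the helper name `five_number_summary` are made up for illustration; the 1.5 × IQR whisker rule corresponds to matplotlib's default `whis=1.5`.

```python
import numpy as np


def five_number_summary(values, whis=1.5):
    """Return the statistics a box plot encodes (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # The whiskers end at the most extreme data points inside the fences.
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "whislo": inside.min(),
        "q1": q1,
        "med": med,
        "q3": q3,
        "whishi": inside.max(),
        # Anything outside the fences is drawn as an outlier (flier).
        "fliers": x[(x < lo_fence) | (x > hi_fence)],
    }


summary = five_number_summary([1, 2, 3, 4, 5, 20])
print(summary)
```

In this toy sample the value 20 lies above the upper fence, so it is reported as a flier and the upper whisker stops at 5.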
In matplotlib, the function that draws a box plot is defined as follows:
# Autogenerated by boilerplate.py. Do not edit as changes will be lost.
@_copy_docstring_and_deprecators(Axes.boxplot)
def boxplot(
        x, notch=None, sym=None, vert=None, whis=None,
        positions=None, widths=None, patch_artist=None,
        bootstrap=None, usermedians=None, conf_intervals=None,
        meanline=None, showmeans=None, showcaps=None, showbox=None,
        showfliers=None, boxprops=None, labels=None, flierprops=None,
        medianprops=None, meanprops=None, capprops=None,
        whiskerprops=None, manage_ticks=True, autorange=False,
        zorder=None, capwidths=None, *, data=None):
    return gca().boxplot(
        x, notch=notch, sym=sym, vert=vert, whis=whis,
        positions=positions, widths=widths, patch_artist=patch_artist,
        bootstrap=bootstrap, usermedians=usermedians,
        conf_intervals=conf_intervals, meanline=meanline,
        showmeans=showmeans, showcaps=showcaps, showbox=showbox,
        showfliers=showfliers, boxprops=boxprops, labels=labels,
        flierprops=flierprops, medianprops=medianprops,
        meanprops=meanprops, capprops=capprops,
        whiskerprops=whiskerprops, manage_ticks=manage_ticks,
        autorange=autorange, zorder=zorder, capwidths=capwidths,
        **({"data": data} if data is not None else {}))
(Referenced from: https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/pyplot.py#L2473-L2494)
Based on the explanations in the two articles above, with some descriptions revised, the parameters are:
Parameter | Description | Parameter | Description |
---|---|---|---|
x | The data to plot, either a single dataset or several datasets; | showcaps | Whether to draw the caps at the ends of the whiskers; shown by default; |
notch | Whether to draw a notched box; the default is a plain rectangular box; | showbox | Whether to draw the box itself; shown by default; |
sym | The marker used for outliers (fliers), shown as a blue + sign by default; | showfliers | Whether to show outliers; shown by default; |
vert | Whether to draw the boxes vertically; vertical by default, False gives horizontal boxes; | boxprops | Properties of the box, such as border color and fill color; |
whis | How far the whiskers may extend beyond the lower and upper quartiles; the default is 1.5 times the interquartile range; | labels | Labels for the datasets, used as tick labels; |
positions | Positions of the boxes; the default is range(1, N+1), where N is the number of boxes; | flierprops | Properties of the outliers, such as marker shape, size, and fill color; |
widths | Width of the boxes; the default is 0.5; | medianprops | Properties of the median line, such as line style and thickness; |
patch_artist | Whether to draw the boxes as fillable patches; the default is False; | meanprops | Properties of the mean, such as marker size and color; |
meanline | Whether to draw the mean as a line; by default it is drawn as a point; | capprops | Properties of the caps, such as color and thickness; |
showmeans | Whether to show the mean; not shown by default; | whiskerprops | Properties of the whiskers, such as color, thickness, and line style; |
manage_ticks | Whether to adjust the tick locations and labels to match the box positions; the default is True; | autorange | When True and the lower and upper quartiles are equal, the whiskers extend to the data minimum and maximum; the default is False; |
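As a quick illustration of a few of these parameters, the sketch below draws three groups of synthetic data (random numbers, not the data used in this article) and inspects the dictionary of artists that `boxplot()` returns. The `Agg` backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Three synthetic groups centered on 8, 9 and 10
groups = [rng.normal(loc=m, scale=1.0, size=50) for m in (8, 9, 10)]

fig, ax = plt.subplots()
artists = ax.boxplot(
    groups,
    widths=0.4,
    patch_artist=True,   # boxes become fillable patches
    showfliers=False,    # hide outlier markers
    showmeans=True,      # the mean is only drawn when this is True
    meanline=True,       # draw the mean as a line instead of a point
    boxprops={"facecolor": "skyblue", "edgecolor": "black"},
    meanprops={"color": "red"},
)
# The return value maps component names to lists of artists.
print(sorted(artists))
print(len(artists["boxes"]), len(artists["whiskers"]), len(artists["means"]))
```

Each box contributes one entry to `boxes`, `medians` and `means` and two entries to `whiskers` and `caps`, which is handy when individual boxes need different colors afterwards.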
With that covered, let's move on to practice.
Box plot drawing
Here is a simplified version. My data points are extracted from the people tracked in a drone video stream, so I omit the earlier details. The first step is to extract, from each pedestrian's trajectory, the time spent crossing the region of interest:
import numpy as np

def throw_time(array, start_x, end_x, y):
    # Columns as used below: 0 = frame, 1 = person ID, 2/3 = x/y, 4/5 = width/height
    indexs = []
    index = 1
    person_throw_time = []
    # Person IDs start at 1; the range includes the largest ID
    for i in range(1, int(array[:, 1].max()) + 1):
        each_person_data = array[array[:, 1] == i]
        # Keep only the detections inside the region of interest
        each_person_data = each_person_data[each_person_data[:, 2] > start_x]
        each_person_data = each_person_data[each_person_data[:, 2] < end_x]
        each_person_data = each_person_data[each_person_data[:, 3] > y]
        if each_person_data.shape[0] < 4:
            continue
        # Shift x and y by half the width and height (box corner to box center)
        each_person_data[:, 2] = each_person_data[:, 2] + (each_person_data[:, 4] / 2)
        each_person_data[:, 3] = each_person_data[:, 3] + (each_person_data[:, 5] / 2)
        # Frame difference times 0.04 s per frame gives the time in seconds
        person_time = (each_person_data[-1, 0] - each_person_data[0, 0]) * 0.04
        print("person time = ", person_time)
        if person_time < 5:
            continue
        person_throw_time.append(person_time)
        indexs.append(index)
        index = index + 1
    return indexs, person_throw_time
indexs1,person_throw_time1 = throw_time(array1,500,1400,400)
# print(person_throw_time1)
# [10.36, 9.76, 9.48, 9.56, 6.16, 8.36, 8.6, 8.76, 5.6000000000000005, 9.84, 8.0, 9.88, 8.36, 9.16, 8.0, 8.92, 8.32, 9.68, 7.6000000000000005, 8.24, 7.08, 8.8, 8.6, 9.88, 9.64, 9.36, 10.16, 9.56, 7.4, 9.32, 8.48, 9.88, 9.16, 9.48, 9.64, 8.76]
indexs2,person_throw_time2 = throw_time(array2,500,1400,400)
indexs3,person_throw_time3 = throw_time(array3,450,1300,400)
indexs4,person_throw_time4 = throw_time(array4,600,1400,400)
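To see what this computation does without the drone data, here is a compact sketch on a tiny synthetic array. The column layout (frame, person ID, x, y, width, height) and the reading of 0.04 as seconds per frame are assumptions inferred from the code above, and `dwell_times` is a hypothetical, simplified stand-in without the region filter, not the function itself.

```python
import numpy as np


def dwell_times(tracks, seconds_per_frame=0.04, min_frames=4, min_seconds=5.0):
    """Seconds between the first and last frame of each tracked person.

    Assumes the rows of each person are already sorted by frame number.
    """
    times = []
    for person_id in np.unique(tracks[:, 1]):
        rows = tracks[tracks[:, 1] == person_id]
        if rows.shape[0] < min_frames:
            continue  # too few detections
        duration = (rows[-1, 0] - rows[0, 0]) * seconds_per_frame
        if duration >= min_seconds:
            times.append(float(duration))
    return times


# Synthetic tracks: person 1 is seen on frames 0..200, person 2 on frames 0..50.
frames_1 = np.arange(0, 201, 50)
frames_2 = np.arange(0, 51, 10)
tracks = np.array(
    [[f, 1, 600, 450, 20, 40] for f in frames_1]
    + [[f, 2, 700, 450, 20, 40] for f in frames_2],
    dtype=float,
)
result = dwell_times(tracks)
print(result)
```

Person 1 spans 200 frames, i.e. 8 seconds, and is kept; person 2 spans only 2 seconds and is dropped by the 5-second threshold.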
This yields a list of times and their indices for each video, which can then be plotted:
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc("font", family='Times New Roman')
# Draw the plot
fig, ax = plt.subplots()
ax.boxplot([person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4],
           widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           # showmeans=True is needed for the mean line to be drawn at all
           showmeans=True, meanline=True,
           meanprops={'color': 'red', 'linewidth': 3})
ax.set_ylabel('time(s)', fontsize=18)
# Set the tick labels on the x axis
ax.set_xticklabels(['List 1', 'List 2', 'List 3', 'List 4'], fontsize=14)
plt.show()
This code creates a box plot with four boxes, one for each of the lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4]. The boxes are filled in sky blue with a black border, outliers are hidden, and the mean is styled as a red line (matplotlib draws the mean only when showmeans=True is also passed).
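The role of `showmeans` is easy to overlook, so here is a small check with made-up values: the same call is made with and without `showmeans=True`, and the number of mean artists in the returned dictionary is compared.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up sample values, only for illustration
samples = [[8.0, 8.5, 9.0, 9.5, 10.0], [7.0, 8.0, 9.0, 9.5, 12.0]]

fig, (ax_left, ax_right) = plt.subplots(1, 2)
# meanline and meanprops alone: no mean artists are created
without = ax_left.boxplot(samples, meanline=True, meanprops={"color": "red"})
# with showmeans=True: one mean line per box
with_means = ax_right.boxplot(samples, showmeans=True, meanline=True,
                              meanprops={"color": "red"})
print(len(without["means"]), len(with_means["means"]))
```

So `meanline` and `meanprops` only style the mean; whether it appears at all is decided by `showmeans`.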
This is probably enough for most needs, and at first I thought it was enough for mine: the version above already includes the color corrections from the first version and the legend fix from the second. In the end, though, I reached a sixth version and changed the drawing logic again.
Draw a box plot according to the principle
Then the requirements changed again: I was handed another set of data of unknown origin and asked to produce a comparison chart. The data arrived without warning as an Excel sheet that contained only the five summary values of each box plot, with no raw samples. So I organized the data, converted my own [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4] lists into a DataFrame, and used describe() to obtain the corresponding five-number summary. Because the real data is sensitive, simple numbers are used in its place here:
import pandas as pd
# Assume these are your four lists
person_throw_time1 = [1, 2, 3, 4, 5]
person_throw_time2 = [6, 7, 8, 9, 10]
person_throw_time3 = [11, 12, 13, 14, 15]
person_throw_time4 = [16, 17, 18, 19, 20]
# Combine the four lists into one DataFrame
data = pd.DataFrame({
    'data1': person_throw_time1, 'data2': person_throw_time2,
    'data3': person_throw_time3, 'data4': person_throw_time4})
# Compute the summary statistics with describe()
statistics = data.describe()
print(statistics)
Then the corresponding data can be obtained:
data1 data2 data3 data4
count 5.000000 5.000000 5.000000 5.000000
mean 3.000000 8.000000 13.000000 18.000000
std 1.581139 1.581139 1.581139 1.581139
min 1.000000 6.000000 11.000000 16.000000
25% 2.000000 7.000000 12.000000 17.000000
50% 3.000000 8.000000 13.000000 18.000000
75% 4.000000 9.000000 14.000000 19.000000
max 5.000000 10.000000 15.000000 20.000000
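The five values a box plot needs are the `min`, `25%`, `50%`, `75%` and `max` rows of that table. A small sketch of pulling them out into one row per dataset, using the same toy numbers (the variable names `five_numbers` and `rows` are illustrative):

```python
import pandas as pd

data = pd.DataFrame({
    "data1": [1, 2, 3, 4, 5],
    "data2": [6, 7, 8, 9, 10],
})
statistics = data.describe()

# Keep only the five-number summary and turn each column into a row.
five_numbers = statistics.loc[["min", "25%", "50%", "75%", "max"]].T
rows = [
    [name, *values]
    for name, values in zip(five_numbers.index, five_numbers.to_numpy().tolist())
]
print(rows)
```

Note that `min` and `max` here are the extremes of the data, not whisker ends computed with the 1.5 × IQR rule, so the two can differ when a dataset contains outliers.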
I rearranged the results and put the three sets of experiments together (PS: some values were modified, so these are not exact five-number summaries):
[
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
The modifications do not matter much here. Redrawing the figure with the data above:
import matplotlib.pyplot as plt
import matplotlib
data = [
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
# Extract the labels and the data
labels = [row[0] for row in data]
box_data = [row[1:] for row in data]
# Set the font
matplotlib.rc("font", family='Times New Roman')
# Draw the box plot
fig, ax = plt.subplots()
ax.boxplot(box_data, widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           meanline=True, meanprops={'color': 'red', 'linewidth': 3})
# Set the axis labels
ax.set_ylabel('time(s)', fontsize=18)
ax.set_xticklabels(labels, rotation=45, fontsize=12)
plt.show()
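However, the plot still has a problem: the whiskers and caps of some boxes are missing. The likely cause is that boxplot() treats the five numbers as raw samples and recomputes the whiskers with the 1.5 × IQR rule, so a minimum or maximum outside that range counts as an outlier and is hidden by showfliers=False. A different way of drawing is needed, one that takes the statistics directly, which means first converting the data into a list of dictionaries: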
data = [
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data
draw_data = convert_to_dict(data)
print(draw_data)
# [{'whislo': 6.1, 'q1': 9.15, 'med': 9.84, 'q3': 10.44, 'whishi': 11.16}, {'whislo': 7.0, 'q1': 9.47, 'med': 10.05, 'q3': 10.81, 'whishi': 12.02}, {'whislo': 14.16, 'q1': 18.41, 'med': 20.19, 'q3': 21.08, 'whishi': 25.42}, {'whislo': 6.54, 'q1': 8.65, 'med': 9.1, 'q3': 9.39, 'whishi': 10.08}, {'whislo': 7.31, 'q1': 9.1, 'med': 9.5, 'q3': 10.31, 'whishi': 10.86}, {'whislo': 10.32, 'q1': 14.18, 'med': 15.42, 'q3': 18.08, 'whishi': 20.72}, {'whislo': 6.14, 'q1': 8.1, 'med': 8.44, 'q3': 9.1, 'whishi': 9.82}, {'whislo': 6.22, 'q1': 8.3, 'med': 8.7, 'q3': 9.2, 'whishi': 10.12}, {'whislo': 8.72, 'q1': 10.61, 'med': 12.71, 'q3': 16.11, 'whishi': 17.91}, {'whislo': 7.1, 'q1': 8.75, 'med': 8.84, 'q3': 9.1, 'whishi': 10.96}, {'whislo': 7.3, 'q1': 8.85, 'med': 9.04, 'q3': 9.1, 'whishi': 11.19}, {'whislo': 7.6, 'q1': 8.3, 'med': 8.4, 'q3': 9.0, 'whishi': 12.55}]
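As for why some whiskers went missing in the boxplot() version, a possible explanation can be checked with `matplotlib.cbook.boxplot_stats`, which computes the kind of statistics dictionaries that `bxp()` consumes. The sketch feeds it the five numbers of "List 1" as if they were raw samples:

```python
from matplotlib import cbook

list_1 = [6.1, 9.15, 9.84, 10.44, 11.16]  # the five values of "List 1"

# Default rule: whiskers may reach at most 1.5 * IQR beyond the quartiles.
stats = cbook.boxplot_stats(list_1, whis=1.5)[0]
print(stats["q1"], stats["whislo"], stats["fliers"])

# Letting the whiskers span the 0th to 100th percentile keeps the extremes.
stats_full = cbook.boxplot_stats(list_1, whis=(0, 100))[0]
print(stats_full["whislo"], stats_full["whishi"])
```

With the default rule, the minimum 6.1 lies below Q1 − 1.5 × IQR, so it is reported as a flier and the lower whisker collapses onto Q1; with showfliers=False that end of the box simply disappears. Passing `whis=(0, 100)` to `boxplot()` would be one alternative to switching functions.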
With the list converted to dictionaries, ax.boxplot() is replaced by ax.bxp(). boxplot() computes the statistics from raw samples before drawing, whereas bxp() draws each box directly from precomputed statistics: the lower whisker end, lower quartile, median, upper quartile, and upper whisker end (here the minimum and maximum). So the code is:
import matplotlib.pyplot as plt
import matplotlib
data = [
["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]
def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data
draw_data = convert_to_dict(data)
matplotlib.rc("font", family='Times New Roman')
fig, ax = plt.subplots()
# bxp() draws from the precomputed statistics. meanline/meanprops have no effect
# here because showmeans is not set (it would need a "mean" key in every dict).
ax.bxp(draw_data, widths=0.4, patch_artist=True, showfliers=False,
       boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
       meanline=True, meanprops={'color': 'red', 'linewidth': 3})
ax.set_ylabel('time(s)', fontsize=18)
# Use the first element of each row as the tick label
ax.set_xticklabels([row[0] for row in data], fontsize=14)
plt.show()
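One detail worth noting: in `bxp()`, `meanline` and `meanprops` only take effect together with `showmeans=True`, and every statistics dictionary then needs a `mean` entry. The dictionaries may also carry a `label` entry. A sketch with two of the rows above; the mean values are placeholders, since the real means are not part of the data:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

stats = [
    {"label": "List 1", "whislo": 6.1, "q1": 9.15, "med": 9.84,
     "q3": 10.44, "whishi": 11.16, "mean": 9.5},   # mean is a placeholder
    {"label": "List 2", "whislo": 6.54, "q1": 8.65, "med": 9.1,
     "q3": 9.39, "whishi": 10.08, "mean": 8.9},    # mean is a placeholder
]

fig, ax = plt.subplots()
artists = ax.bxp(stats, widths=0.4, patch_artist=True, showfliers=False,
                 showmeans=True, meanline=True,
                 boxprops={"facecolor": "skyblue", "edgecolor": "black"},
                 meanprops={"color": "red", "linewidth": 3})
ax.set_ylabel("time(s)")
print(len(artists["boxes"]), len(artists["means"]))
```

If the source only provides the five summary values, leaving `showmeans` off, as in the listing above, avoids inventing a mean.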