Box plot
A box plot is a standardized way of displaying the distribution of data, based on a five-number summary. As the figure shows, it is built from the following quantities:
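- Median: the middle value of the data (50% quantile)
- Q1: the first quartile (25% quantile)
- Q3: the third quartile (75% quantile). The distance between Q1 and Q3 is called the interquartile range (IQR = Q3 - Q1).
- Minimum: Q1 - 1.5 * IQR (the lower whisker limit, not necessarily the smallest data value)
- Maximum: Q3 + 1.5 * IQR (the upper whisker limit, not necessarily the largest data value)
- Outliers: data points below Minimum or above Maximum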
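The quantities in the list above can be computed directly. The following is a minimal sketch that assumes NumPy's default linear interpolation for the quartiles (other quantile conventions give slightly different Q1 and Q3 on small samples); the helper name `box_plot_stats` and the sample data are hypothetical.

```python
import numpy as np


def box_plot_stats(data):
    """Hypothetical helper: median, quartiles, IQR, whisker limits and outliers."""
    x = np.asarray(data, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])  # linear interpolation by default
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # "Minimum" (lower whisker limit)
    upper = q3 + 1.5 * iqr  # "Maximum" (upper whisker limit)
    outliers = x[(x < lower) | (x > upper)]  # points outside the limits
    return {
        "median": median, "q1": q1, "q3": q3, "iqr": iqr,
        "minimum": lower, "maximum": upper, "outliers": outliers.tolist(),
    }


# Hypothetical sample: nine evenly spaced values plus one extreme value
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

Here Q1 = 3.25 and Q3 = 7.75, so the limits are -3.5 and 14.5 and only the value 30 is flagged as an outlier.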
If the data are normally distributed, the corresponding probabilities are shown in the figure below: only about 0.7% of the data fall below Minimum or above Maximum and are therefore classed as outliers.
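The 0.7% figure can be checked with a short calculation on the standard normal distribution. This sketch uses Python's built-in `statistics.NormalDist` (Python 3.8+); the limits come out at roughly ±2.698 standard deviations.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)  # about -0.6745 and +0.6745
iqr = q3 - q1  # about 1.349 standard deviations
upper = q3 + 1.5 * iqr  # "Maximum", about 2.698 standard deviations
# The distribution is symmetric, so double the upper-tail probability
p_outlier = 2 * (1 - z.cdf(upper))
print(f"Maximum = {upper:.3f} sd, P(outlier) = {p_outlier:.2%}")
```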
Usage
- Box plots are intended for continuous variables.
- If the data contain many outliers, you may need to transform them first (for example, by taking the logarithm).
- Box plots are most effective for comparison: draw grouped box plots that split the data by one or more categorical (qualitative) variables.
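To illustrate why a log transform can help with many outliers, the sketch below counts outliers with the 1.5 × IQR rule before and after taking the natural logarithm. The helper `count_outliers` and the doubling data set are hypothetical, the quartiles use NumPy's default linear interpolation, and the transform assumes all values are positive.

```python
import numpy as np


def count_outliers(values):
    """Hypothetical helper: count points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return int(np.sum((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)))


# Hypothetical right-skewed data: each value doubles the previous one
skewed = [2 ** k for k in range(12)]  # 1, 2, 4, ..., 2048
print("raw:", count_outliers(skewed))          # the two largest values are flagged
print("log:", count_outliers(np.log(skewed)))  # evenly spaced after the log, none flagged
```

On the raw scale the upper limit is 789.5, so 1024 and 2048 are flagged; after the log the values are evenly spaced and nothing is flagged.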
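A grouped box plot places one box per category side by side so the distributions can be compared. This is a minimal sketch with matplotlib; the function name `grouped_boxplot`, the category names and the randomly generated data are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def grouped_boxplot(groups):
    """Hypothetical helper: draw one box per category; `groups` maps name -> values."""
    names = list(groups)
    fig, ax = plt.subplots()
    # One box per category, placed at x = 1, 2, ..., len(names)
    result = ax.boxplot([groups[name] for name in names])
    ax.set_xticklabels(names)  # label each box with its category
    ax.set_xlabel("group")
    ax.set_ylabel("value")
    return fig, ax, result


# Hypothetical data: one continuous measurement split by a categorical variable
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(10, 2, size=50),
    "B": rng.normal(12, 3, size=50),
    "C": rng.lognormal(2, 0.5, size=50),
}
fig, ax, result = grouped_boxplot(groups)
plt.show()
```

Setting the tick labels after the call, rather than passing them to `boxplot`, sidesteps the `labels`/`tick_labels` rename between Matplotlib versions.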
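Use matplotlib.pyplot.boxplot to draw box plots. In Matplotlib 3.9 and later the `labels` parameter is named `tick_labels`; `labels` still works there but is deprecated.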
```python
import matplotlib.pyplot as plt

plt.boxplot([range(20), range(15)], labels=['a', 'b'])  # one box per data set, labelled 'a' and 'b'
plt.show()
```
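Matplotlib does not draw the whiskers exactly at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR: each whisker ends at the most extreme data point that still lies within those limits (the factor is set by the `whis` parameter, default 1.5), and points beyond the limits are drawn individually as "fliers". The sketch below checks this on hypothetical data.

```python
import matplotlib.pyplot as plt

data = list(range(20)) + [60]  # hypothetical: 0..19 plus one extreme value
fig, ax = plt.subplots()
result = ax.boxplot(data, whis=1.5)  # whis=1.5 is the default

# Quartiles (linear interpolation): Q1 = 5, Q3 = 15, so IQR = 10
# and the limits are 5 - 15 = -10 and 15 + 15 = 30.
lower_whisker = result["whiskers"][0].get_ydata()[1]  # lowest point >= -10
upper_whisker = result["whiskers"][1].get_ydata()[1]  # highest point <= 30
fliers = result["fliers"][0].get_ydata()               # points outside the limits
print(lower_whisker, upper_whisker, list(fliers))
plt.show()
```

The upper whisker stops at 19, the largest value inside the limit of 30, and 60 is drawn as a flier.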
References:
https://blog.csdn.net/dujiahei/article/details/82056283
https://zhuanlan.zhihu.com/p/110580568?from_voters_page=true