matplotlib - box plot

box plot

A box plot, also known as a box-and-whisker plot, is a statistical chart used to display the dispersion of a data set. It is named for its box-like shape. It is used in many fields and makes outliers quick to spot. The biggest advantage of the box plot is that its quartile-based summary is not strongly affected by outliers, so it describes the spread of the data accurately and stably, which also helps with data cleaning. Besides box plots, histograms and scatter plots are also commonly used for outlier detection. The structure of a box plot is as follows:
[Figure: structure of a box plot]
[Figure: horizontal box plot]
The box plot detects outliers by treating any sample point that does not lie between Q1 - 1.5 IQR and Q3 + 1.5 IQR as abnormal, where Q1 and Q3 are the first and third quartiles and IQR = Q3 - Q1. Alternatively, with the capping method, any value outside the 5th to 95th percentile range can be considered an outlier, or a data point that is three or more standard deviations from the mean can be considered an outlier. Note: since outlier detection only flags data points that are unusually influential, the choice of rule also depends on an understanding of the data and the business.
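The three rules above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the sample array `values` and the two injected extreme points (8.0 and -9.0) are made up for the example, and the quartiles are computed with NumPy's default percentile interpolation.

```python
import numpy as np

# Hypothetical sample: 200 standard-normal draws plus two injected extremes
rng = np.random.default_rng(0)
values = np.append(rng.normal(loc=0, scale=1, size=200), [8.0, -9.0])

# Rule 1: IQR fences, the rule used by the box plot
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = values[(values < lower) | (values > upper)]

# Rule 2: capping, values outside the 5th to 95th percentile range
# are pulled back to those bounds
p5, p95 = np.percentile(values, [5, 95])
capped = np.clip(values, p5, p95)

# Rule 3: points three or more standard deviations from the mean
mean, std = values.mean(), values.std()
sigma_outliers = values[np.abs(values - mean) >= 3 * std]

print("IQR fences:", lower, upper)
print("IQR outliers:", iqr_outliers)
print("3-sigma outliers:", sigma_outliers)
```

The rules will generally flag different points on the same data, which is one reason the choice depends on the data and the business context.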

Draw a boxplot

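import matplotlib.pyplot as plt
import numpy as np

# Fix the seed so the sample is reproducible
np.random.seed(100)
# 1000 draws from a standard normal distribution
data = np.random.normal(size=1000, loc=0, scale=1)
# sym='o' draws outliers as circles; whis=1.5 sets the whisker reach to 1.5 IQR
plt.boxplot(data, sym='o', whis=1.5)
plt.show()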
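To inspect the numbers behind the plot, one option is `matplotlib.cbook.boxplot_stats`, which computes the quartiles, whisker ends and outlier points (called "fliers" in matplotlib) for a data set. The sketch below reuses the same seed and data as the example above and assumes the same `whis=1.5` setting; it only prints the statistics and does not draw anything.

```python
import numpy as np
from matplotlib import cbook

# Same sample as in the plotting example
np.random.seed(100)
data = np.random.normal(size=1000, loc=0, scale=1)

# boxplot_stats returns one dict of statistics per data set
stats = cbook.boxplot_stats(data, whis=1.5)[0]

print("Q1, median, Q3:", stats["q1"], stats["med"], stats["q3"])
print("IQR:", stats["iqr"])
print("whisker ends:", stats["whislo"], stats["whishi"])
print("number of outliers:", len(stats["fliers"]))
```

The whisker ends are the most extreme data points that still lie within 1.5 IQR of the box, so they usually sit slightly inside the theoretical fences Q1 - 1.5 IQR and Q3 + 1.5 IQR.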

[Figure: box plot generated by the code above]


Origin blog.csdn.net/weixin_47166032/article/details/121317307