1 Introduction to box plots
A box plot is a statistical chart that shows the spread of a set of data.
Advantages: because it is built from quartiles, it is largely unaffected by outliers and describes the dispersion of the data accurately and stably
An example: the 14 numbers 12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37, already arranged from small to large
Important parameters:
1. Lower quartile Q1: The 25th percentile of all values in the sample arranged from small to large
The position of Q1=(14+1)/4=3.75, i.e. three quarters of the way from the 3rd item to the 4th item
Q1=0.25×3rd item+0.75×4th item=0.25×17+0.75×19=18.5;
2. Median (second quartile) Q2: The 50th percentile of all values in the sample arranged from small to large
The position of Q2=2×(14+1)/4=7.5
Q2=0.5×7th item+0.5×8th item=0.5×25+0.5×28=26.5;
3. Upper quartile Q3: Equal to the 75th percentile of all values in the sample arranged from small to large
The position of Q3=3×(14+1)/4=11.25
Q3=0.75×11th item+0.25×12th item=0.75×34+0.25×35=34.25;
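The three quartile calculations above follow one rule: find the position k×(n+1)/4, then interpolate linearly between the two neighbouring items. A minimal sketch of that rule follows; the function name `quartile_by_position` is made up for illustration, and the clamping for very small samples is an assumption that the text does not cover.

```python
import statistics

def quartile_by_position(values, k):
    """Return the k-th quartile (k = 1, 2, 3) using the k*(n+1)/4 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4      # 1-based position, may be fractional
    lower = int(position)           # 1-based index of the item just below the position
    fraction = position - lower     # how far to move toward the next item
    # Assumption: clamp to the ends so tiny samples never index outside the list.
    if lower < 1:
        return float(ordered[0])
    if lower >= n:
        return float(ordered[-1])
    return ordered[lower - 1] * (1 - fraction) + ordered[lower] * fraction

data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]
q1, q2, q3 = (quartile_by_position(data, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 18.5 26.5 34.25

# The standard library's default "exclusive" method uses the same (n+1) rule.
print(statistics.quantiles(data, n=4))  # [18.5, 26.5, 34.25]
```

Other tools may use a different interpolation rule by default, so their quartiles can differ slightly from the hand calculation.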
4. Interquartile range (IQR):
IQR=Q3-Q1
5. Upper limit: the maximum value within the non-anomalous range
Upper limit=Q3+1.5IQR
6. Lower limit: the minimum value within the non-anomalous range
Lower limit=Q1-1.5IQR
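Applying the IQR and limit formulas to the example data gives concrete numbers. This is a small sketch that takes the quartiles from `statistics.quantiles`, whose default "exclusive" method matches the (n+1)/4 position rule used above; the variable names are illustrative only.

```python
import statistics

data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]

# Default method="exclusive" corresponds to the (n+1)/4 position rule.
q1, _, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                   # interquartile range
upper_limit = q3 + 1.5 * iqr    # largest value still treated as non-anomalous
lower_limit = q1 - 1.5 * iqr    # smallest value still treated as non-anomalous

print(iqr, upper_limit, lower_limit)  # 15.75 57.875 -5.125

# Every value in the example lies inside the limits, so it has no outliers.
print(all(lower_limit <= v <= upper_limit for v in data))  # True
```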
7. Outliers: values outside the upper and lower limits above (the inner limits). The outer limits are conventionally placed at Q1-3IQR and Q3+3IQR. Outliers between the inner limit and the outer limit are mild outliers; outliers outside the outer limit are extreme outliers
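The mild/extreme distinction can be sketched as a small classifier. It assumes the common convention of inner limits at 1.5×IQR and outer limits at 3×IQR. The function name `classify_outliers` and the two extra values 70 and 90 are hypothetical, added only so that the example contains outliers.

```python
import statistics

def classify_outliers(values, inner=1.5, outer=3.0):
    """Split outliers into (mild, extreme) lists using inner and outer limits."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # (n+1)/4 position rule
    iqr = q3 - q1
    inner_low, inner_high = q1 - inner * iqr, q3 + inner * iqr
    outer_low, outer_high = q1 - outer * iqr, q3 + outer * iqr
    mild, extreme = [], []
    for v in values:
        if v < outer_low or v > outer_high:
            extreme.append(v)   # beyond the outer limit
        elif v < inner_low or v > inner_high:
            mild.append(v)      # between the inner and outer limits
    return mild, extreme

# Example data plus two hypothetical large values.
sample = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37, 70, 90]
mild, extreme = classify_outliers(sample)
print(mild, extreme)  # [70] [90]
```

Values that fall exactly on a limit are treated as non-outliers here; that boundary choice is an assumption.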
2 Box plot analysis
- Identification of outliers
- Judging the skewness and tail weight of the data
For a sample drawn from a standard normal distribution, only a few points are outliers. The more outliers, the heavier the tails (for a t-distribution, this corresponds to fewer degrees of freedom).
Skewness indicates the degree of asymmetry. If the outliers are concentrated on the side of the smaller values, the distribution is left-skewed; if the outliers are concentrated on the side of the larger values, the distribution is right-skewed.
- Comparing the shapes of several batches of data by placing their box plots side by side
From a box plot, you can read off the median, the quartiles, the range of the non-outlying values, and the outliers of the data; the average is visible only when the plot marks it explicitly
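The quantities read from a box plot, together with the outlier-side rule for skewness described in this section, can be collected in one summary. This is a rough sketch only: `box_plot_summary` is a hypothetical helper, the sample reuses the example data with two made-up large values, and counting outliers per side is a crude indicator rather than a formal skewness measure.

```python
import statistics

def box_plot_summary(values):
    """Collect what a box plot shows, plus a rough skew direction from outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # (n+1)/4 position rule
    iqr = q3 - q1
    lower_limit, upper_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    low_outliers = [v for v in values if v < lower_limit]
    high_outliers = [v for v in values if v > upper_limit]
    # Outliers mostly on the large side suggest right skew, and vice versa.
    if len(high_outliers) > len(low_outliers):
        skew = "right-skewed"
    elif len(low_outliers) > len(high_outliers):
        skew = "left-skewed"
    else:
        skew = "no clear skew from outliers"
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "low_outliers": low_outliers,
        "high_outliers": high_outliers,
        "skew": skew,
    }

# Example data plus two hypothetical large values.
sample = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37, 70, 90]
summary = box_plot_summary(sample)
print(summary["median"], summary["high_outliers"], summary["skew"])
```

Calling the same function on several batches gives numbers that can be compared in the same way as box plots drawn side by side.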