A brief introduction to box plots

1 Introduction to box plots

A box plot is a statistical chart that shows how a set of data is spread out.
Advantage: because it is built from quartiles, it is not distorted by outliers, so it describes the dispersion of the data accurately and stably.

Example: consider the 14 numbers 12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37, already sorted in ascending order.
Important parameters:
1. Lower quartile Q1: the value at the 25th percentile after all values in the sample are sorted from smallest to largest.
Position of Q1 = (14+1)/4 = 3.75, i.e. three quarters of the way from the 3rd item to the 4th item
Q1 = 0.25×3rd item + 0.75×4th item = 0.25×17 + 0.75×19 = 18.5
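The position rule above can be sketched in a few lines of Python. The helper `value_at_position` is a hypothetical name introduced here for illustration; it assumes the data are already sorted and that positions are 1-based, as in the text, and interpolates linearly between the two neighbouring items.

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based, possibly fractional, position by linear interpolation."""
    k = int(position)        # 1-based index of the item just below the position
    frac = position - k      # how far we are from item k toward item k + 1
    if k < 1:
        return sorted_values[0]
    if k >= len(sorted_values):
        return sorted_values[-1]
    lower = sorted_values[k - 1]   # item k (0-based index k - 1)
    upper = sorted_values[k]       # item k + 1
    return (1 - frac) * lower + frac * upper


data = sorted([12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37])
n = len(data)
pos_q1 = (n + 1) / 4                    # 3.75
q1 = value_at_position(data, pos_q1)    # 0.25*17 + 0.75*19
print(pos_q1, q1)                       # 3.75 18.5
```

The weight on the upper neighbour equals the fractional part of the position (0.75 here), which is why the 4th item gets the larger weight.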

2. Median (second quartile) Q2: the value at the 50th percentile after all values in the sample are sorted from smallest to largest.

Position of Q2 = 2×(14+1)/4 = 7.5
Q2 = 0.5×7th item + 0.5×8th item = 0.5×25 + 0.5×28 = 26.5
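As a cross-check, the median from the position rule can be compared with Python's standard `statistics.median`, which for an even number of values averages the two middle items. This is a small sketch using only the example data.

```python
import statistics

data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]  # already sorted
n = len(data)                                     # 14, an even count
pos_q2 = 2 * (n + 1) / 4                          # 7.5: halfway between items 7 and 8
q2_by_position = 0.5 * data[6] + 0.5 * data[7]    # 1-based items 7 and 8
q2_builtin = statistics.median(data)              # averages the two middle values
print(pos_q2, q2_by_position, q2_builtin)         # 7.5 26.5 26.5
```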

3. Upper quartile Q3: the value at the 75th percentile after all values in the sample are sorted from smallest to largest.

Position of Q3 = 3×(14+1)/4 = 11.25
Q3 = 0.75×11th item + 0.25×12th item = 0.75×34 + 0.25×35 = 34.25
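Python's `statistics.quantiles` (Python 3.8+) with its default `method="exclusive"` places the i-th quartile at position i×(n+1)/4 and interpolates linearly, so it should reproduce all three hand-computed quartiles at once. Be aware that other tools use different position rules (NumPy's default percentile method, for example), so they can report slightly different quartiles for the same data.

```python
import statistics

data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]

# "exclusive" (the default) uses positions i*(n+1)/4, matching the (n+1) rule in the text.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 18.5 26.5 34.25
```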

4. Interquartile range (IQR):

IQR=Q3-Q1
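For the example data, the IQR is 34.25 − 18.5 = 15.75. A minimal sketch, reusing `statistics.quantiles` with its default (n+1) position rule:

```python
import statistics

data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]
q1, _, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1                                 # width of the box
print(iqr)                                    # 15.75
```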

5. Upper limit (upper inner fence): the upper boundary of the non-outlier range; values above it are treated as outliers

Upper limit=Q3+1.5IQR

6. Lower limit (lower inner fence): the lower boundary of the non-outlier range; values below it are treated as outliers

Lower limit=Q1-1.5IQR
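Applying both limits to the example gives the range in which values are not treated as outliers. The sketch also finds the whisker ends, assuming the common Tukey-style convention that whiskers extend to the most extreme observations still inside the limits; in this example no value falls outside, so the whiskers reach the minimum and maximum.

```python
import statistics

data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                        # 15.75

upper_limit = q3 + 1.5 * iqr         # 34.25 + 23.625
lower_limit = q1 - 1.5 * iqr         # 18.5 - 23.625

# Whisker ends (assumed Tukey convention): most extreme data points inside the limits.
upper_whisker = max(x for x in data if x <= upper_limit)
lower_whisker = min(x for x in data if x >= lower_limit)
print(lower_limit, upper_limit)      # -5.125 57.875
print(lower_whisker, upper_whisker)  # 12 37
```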

7. Outliers: the upper and lower limits above are the inner limits; the outer limits are Q3+3IQR and Q1-3IQR. Outliers between an inner limit and the corresponding outer limit are mild outliers; outliers beyond an outer limit are extreme outliers.
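The mild/extreme distinction can be sketched as a small classifier. The function name `classify` is hypothetical, the outer limits assume the usual 3×IQR rule, and values lying exactly on a limit are treated as inside it. The test values 60, 90 and -10 are made-up observations, not part of the original example.

```python
import statistics


def classify(value, q1, q3):
    """Label a value 'normal', 'mild' or 'extreme' using inner (1.5*IQR) and outer (3*IQR) limits."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if inner_low <= value <= inner_high:
        return "normal"
    if outer_low <= value <= outer_high:
        return "mild"      # between an inner and an outer limit
    return "extreme"       # beyond an outer limit


data = [12, 15, 17, 19, 20, 23, 25, 28, 30, 33, 34, 35, 36, 37]
q1, _, q3 = statistics.quantiles(data, n=4)   # 18.5, 34.25

# Hypothetical new observations checked against the example's limits.
for x in (37, 60, 90, -10):
    print(x, classify(x, q1, q3))
# prints: 37 normal / 60 mild / 90 extreme / -10 mild
```

With this data the outer limits are -28.75 and 81.5, so 60 is only a mild outlier while 90 is extreme.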

2 Box plot analysis

  1. Identification of outliers
  2. Judging the skewness and tail weight of the data
    For a sample from a normal distribution, only a few points are outliers (about 0.7% of a normal population lies outside the inner limits). The more outliers there are, the heavier the tails; for example, a t distribution with fewer degrees of freedom has heavier tails and produces more outliers.
    Skewness describes how asymmetric the distribution is. If the outliers are concentrated on the side of the smaller values, the distribution is left-skewed; if they are concentrated on the side of the larger values, it is right-skewed.
  3. Compare the shapes of several batches of data by drawing their box plots side by side
    From each box plot you can read the median, the spread (box and whiskers) and the outliers of that batch; some box plots also mark the average.
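The skewness rule of thumb in point 2 can be sketched by counting outliers on each side of the box. The helpers `outlier_sides` and `skew_hint` are hypothetical, the sample is made up for illustration, and this is only a rough heuristic based on the 1.5×IQR limits, not a formal skewness statistic.

```python
import statistics


def outlier_sides(values):
    """Count outliers below the lower limit and above the upper limit (1.5*IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    below = sum(1 for v in values if v < low)
    above = sum(1 for v in values if v > high)
    return below, above


def skew_hint(values):
    """Rough direction of skew, judged only by which side has more outliers."""
    below, above = outlier_sides(values)
    if above > below:
        return "right-skewed"
    if below > above:
        return "left-skewed"
    return "no clear skew from outliers"


# Hypothetical sample with a long right tail.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 7, 20, 35]
print(outlier_sides(sample), skew_hint(sample))  # (0, 2) right-skewed
```

Running the same two helpers on several batches gives a quick text version of the side-by-side comparison in point 3.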


Origin blog.csdn.net/weixin_45913084/article/details/131109460