Why can box plots detect outliers, and what is the principle behind it?

 

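In summary:

Outliers are identified using 1.5 × IQR, where IQR = Q3 - Q1 is the width of the middle 50% of the data (from the 25th to the 75th percentile).

If the data were spread evenly, the distance from Q3 to the maximum (Q4 - Q3) would be only about 0.5 IQR, so a value more than 1.5 IQR beyond Q3 (or below Q1) indicates that the data point is abnormal.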
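One way to read the 0.5 IQR remark is to assume the data are spread evenly. The sketch below checks this on the integers 0 to 100 (an illustrative data set, not taken from the original post), using Python's standard `statistics.quantiles` with the `inclusive` method. Other quartile conventions give slightly different numbers.

```python
import statistics

# Illustrative, evenly spread data: the integers 0..100 (an assumption, not from the post).
data = list(range(101))

# Quartiles under the "inclusive" convention; other conventions differ slightly.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Distance from Q3 up to the maximum, measured in units of IQR.
top_spread = (max(data) - q3) / iqr
# Distance from Q1 down to the minimum, in the same units.
bottom_spread = (q1 - min(data)) / iqr

print(q1, q3, iqr, top_spread, bottom_spread)
```

Under that assumption each tail spans 0.5 IQR, so a point 1.5 IQR past a quartile sits three times farther out than an even spread would suggest.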

 

On the plot, the upper limit is the largest data value inside the non-anomalous range, and the lower limit is the smallest data value inside it.

The first thing to know is how to calculate the interquartile range:

Interquartile range: IQR = Q3 - Q1

The non-anomalous range is then bounded above by Q3 + 1.5 IQR and below by Q1 - 1.5 IQR.
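The two formulas above can be sketched in a few lines of Python. The function names `iqr_fences` and `find_outliers` and the sample data are hypothetical, chosen only for illustration, and the quartiles use the standard library's `inclusive` convention, which is one of several in common use.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) bounds Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" is one of several quartile conventions; this choice is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the bounds."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Made-up sample data for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_fences(sample))
print(find_outliers(sample))
```

Here Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the bounds are -3.5 and 14.5; only the value 100 falls outside them.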

 

For a worked numerical calculation, see:

https://baijiahao.baidu.com/s?id=1591167651227320027&wfr=spider&for=pc

 

The whiskers on either side end at the max and min values; the lines marking the outlier thresholds are not drawn.

The following are specific examples of box plots:

Figure 2. Example

This set of data shows:

  • Minimum = 5

  • Lower quartile (Q1) = 7

  • Median (Med, Q2) = 8.5

  • Upper quartile (Q3) = 9

  • Maximum = 10

  • Mean = 8

  • Interquartile range = Q3 - Q1 = 2 (i.e. ΔQ)

Values outside the interval from Q1-1.5ΔQ to Q3+1.5ΔQ are regarded as far out.

  • Far out: not drawn at its true position on the plot, only marked with a symbol ∇.

  • Upper bound for the maximum: Q3+1.5ΔQ = 9 + 3 = 12

  • Lower bound for the minimum: Q1-1.5ΔQ = 7 - 3 = 4

The maximum and minimum values are taken from within this interval. Values outside the interval are considered outliers and are displayed on the graph.

  • mild outlier = 3.5

  • extreme outlier = 0.5
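The example's numbers can be checked with a short Python sketch. The helper name `classify` is hypothetical, and the 3 × ΔQ cutoff separating mild from extreme outliers is an assumption: it is a common convention, but the text above does not state which cutoff it uses.

```python
# Quartiles taken from the example above.
Q1, Q3 = 7.0, 9.0

def classify(x, q1=Q1, q3=Q3, inner=1.5, outer=3.0):
    """Label x relative to the inner (1.5*IQR) and outer (3*IQR) bounds.

    The outer factor of 3 is an assumed convention, not stated in the text.
    """
    iqr = q3 - q1
    if x < q1 - outer * iqr or x > q3 + outer * iqr:
        return "extreme outlier"
    if x < q1 - inner * iqr or x > q3 + inner * iqr:
        return "mild outlier"
    return "inside"

for value in (5, 10, 3.5, 0.5):
    print(value, classify(value))
```

With ΔQ = 2 the inner bounds are 4 and 12 and the assumed outer bounds are 1 and 15, so 5 and 10 fall inside, 3.5 lies between the inner and outer bounds, and 0.5 lies beyond the outer bound.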

https://baike.baidu.com/item/%E7%AE%B1%E5%BD%A2%E5%9B%BE/10671164?fr=aladdin

 


Origin blog.csdn.net/kevin1993best/article/details/107565560