Understanding the confidence interval

Author: you know to play the data
link: https://www.zhihu.com/question/26419030/answer/129658977
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.

To understand the confidence interval, we must first understand the relationship between the population and the sample. Statistics is essentially the science of inferring properties of a population from samples. An example illustrates the two concepts. Suppose a pot of soup is the population. To find out how the soup tastes, we scoop out a small spoonful, and that spoonful is the sample. Whether a spoonful correctly reflects the taste of the whole pot depends on whether the soup has been stirred evenly, which in statistics corresponds to whether the sample is drawn at random.

With population and sample understood, let's turn to confidence intervals. Another example helps. Suppose we want to know the average height of the boys at a certain middle school. There are two methods.

Brute-force method: find every boy in the school, record his height, and compute the average. This method is accurate, but its cost is so high that it is practically infeasible.

Statistical method: randomly select 100 boys as a sample, and estimate the average height of all boys in the school (the true value) from the average height of these 100 boys (the estimate).

With the statistical method, the most natural idea is to take the average height of these 100 boys as the average height of all boys in the school. However, a single fixed value is almost never exactly right, and the averages obtained from different samples will generally differ. So statisticians came up with a clever alternative: report the inference as a numerical interval. An interval is of course far more likely to contain the true value than a single number is to equal it. This interval is the confidence interval. Because different samples give different data, the confidence intervals we obtain will also differ.
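As a rough illustration of the statistical method, here is a minimal Python sketch. It is not taken from the original answer: the population (2,000 boys with mean 170 cm and standard deviation 6 cm) is made up, the helper name `mean_confidence_interval` is hypothetical, and the interval uses the common normal approximation (sample mean plus or minus about 1.96 standard errors), which is only one of several ways to build such an interval.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical population: 2,000 boys, heights in cm (made-up parameters).
rng = random.Random(0)
population = [rng.gauss(170.0, 6.0) for _ in range(2000)]

# Two independent random samples of 100 boys give two different intervals.
for _ in range(2):
    sample = rng.sample(population, 100)
    low, high = mean_confidence_interval(sample)
    print(f"sample mean = {statistics.mean(sample):.1f} cm, "
          f"95% CI = ({low:.1f}, {high:.1f}) cm")
```

Each run of the loop draws a fresh sample, so the two printed intervals differ slightly, which is exactly the point made above.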
Suppose we repeat the sampling 100 times (100 boys each time); we then get 100 different confidence intervals. A 95% confidence level means that, of these 100 intervals, about 95 are expected to contain the true average height of the boys at the school.

Finally, a common misunderstanding: "a 95% confidence interval means that the true value has a 95% probability of falling within the current confidence interval." This statement is inaccurate. The true value is fixed, so for any one computed interval it is either in the interval or not. The 95% describes the proportion of intervals that contain the true value among the many confidence intervals obtained by repeated sampling.

As shown in the figure below, the vertical dashed line represents the true value, and each horizontal solid line represents one confidence interval. Of the 25 confidence intervals, only one (the red line) does not contain the true value, so 24 of 25 (96%) contain it.
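The repeated-sampling interpretation can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the author's code: heights are drawn from a made-up normal distribution (true mean 170 cm, standard deviation 6 cm), the function name `coverage_rate` is hypothetical, and each interval uses the normal approximation. It counts how often an interval built from a fresh sample of 100 contains the true mean.

```python
import math
import random
import statistics


def coverage_rate(true_mean=170.0, true_sd=6.0, sample_size=100,
                  repetitions=1000, confidence=0.95, seed=0):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(repetitions):
        # Draw a fresh sample from the assumed (made-up) population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        mean = statistics.mean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(sample_size)
        # Any single interval either contains the true mean or it does not.
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / repetitions


print(coverage_rate(repetitions=100))    # 100 intervals, as in the text
print(coverage_rate(repetitions=5000))   # more repetitions, steadier estimate
```

With only 100 repetitions the observed fraction fluctuates around 0.95 and can land a little above or below it; with more repetitions it settles close to the nominal level.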

Origin blog.csdn.net/JGL121314/article/details/113766455