To understand the confidence interval, start from the most basic idea in statistics: using a sample to estimate the population.
The confidence level is the long-run proportion of intervals, built by the same procedure from repeated samples, that contain the population parameter;
the confidence interval is the range around the sample statistic that is expected to contain the population parameter at a given confidence level.
For a fixed sample, the wider the confidence interval, the higher the confidence level.
1. The concept of confidence interval
1.1 Confidence interval
A confidence interval is a kind of interval estimate.
Let's first look at what point estimation and interval estimation are.
1.1.1 Point estimation and interval estimation
Scratch cards used to be very popular. The rules of the game are (assuming there is only one jackpot):
- The jackpot is fixed in advance and is printed on exactly one scratch card
- After buying a scratch card, you scratch it to find out whether you have won
We then have at least two strategies for winning the prize:
- Point estimation: buy one card, which amounts to guessing that this particular card wins
- Interval estimation: buy a whole box, which amounts to guessing that the winning card is somewhere in this box
Obviously, the hit rate of the interval estimate is higher (at a higher cost, of course, since that is the price of reducing the risk).
No experiment can completely eliminate error, so we report the result together with an acceptable error range. In statistics this range is called a confidence interval. The confidence interval is random: it is determined by the sample, so each sample drawn yields its own confidence interval.
A confidence interval is an interval estimate of a population parameter, constructed from sample statistics.
The confidence interval is also called the estimation interval and gives a range of plausible values for the parameter. Commonly seen ranges such as 52%-64% or 8-12 are confidence intervals (estimation intervals).
The confidence interval shows how confident we can be that the true value of the parameter lies near the measured (estimated) value.
Next, let's look at how the confidence interval is estimated.
1.1.2 Confidence interval
We use the estimation of human height to explain what a confidence interval is.
1.1.2.1 God's perspective
There is no way for us to know the true average height of human beings, because it is almost impossible to count everyone.
But this value does exist, and we can say that God knows it.
Here we introduce God's perspective, that is, the true distribution of human height as God sees it.
1.1.2.2 Point estimation
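As mere humans, we can only draw a sample from the crowd and compute statistics from it, for example using the sample mean as a point estimate of the true mean $\mu$.
A point estimate is a single value and says nothing about how far it may be from the truth; interval estimation can improve this problem.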
1.1.2.3 Confidence interval
A confidence interval provides a method of interval estimation.
However, compared with a point estimate:
- With either a point estimate or an interval estimate, we do not know whether a particular point or a particular interval is a good one
- However, for intervals constructed at the 95% confidence level, if I construct 100 such intervals, about 95 of them will contain $\mu$
It is like casting a fishing net. I know that if I cast it a hundred times I will catch the fish I want about 95 times, but I do not know whether the current cast is one of them.
The remaining problem is: how is a 95% confidence interval constructed?
1.1.2.4 95% confidence interval
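One common way to build a 95% interval for a mean is the normal approximation, sample mean ± 1.96 × s/√n. The sketch below is only an illustration under that assumption: the function name `mean_confidence_interval` and the height values are made up here, and for small samples a t quantile is usually preferred over 1.96.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (low, high) for the population mean as x_bar ± z * s / sqrt(n).

    z=1.96 is the normal quantile for a 95% confidence level (an assumption:
    for small samples a t quantile is more appropriate).
    """
    n = len(sample)
    x_bar = statistics.mean(sample)        # point estimate of the mean
    s = statistics.stdev(sample)           # sample standard deviation
    margin = z * s / math.sqrt(n)          # half-width of the interval
    return x_bar - margin, x_bar + margin

# Hypothetical heights in cm, invented purely for illustration.
heights_cm = [172.0, 168.5, 181.2, 165.3, 177.8, 170.1, 174.6, 169.9, 183.0, 166.4]
low, high = mean_confidence_interval(heights_cm)
print(f"95% CI for the mean height: {low:.1f} to {high:.1f} cm")
```

The interval is centred on the sample mean, so a different sample would give a different interval.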
1.1.2.5 Summary
In conclusion:
- The confidence interval treats the parameter being estimated as a fixed constant; it is the interval that varies from sample to sample
- 95% is called the confidence level; it is a convention in statistics and can be adjusted to suit the application
1.2 Confidence level
The confidence level, also called the degree of confidence, is the probability that an interval constructed this way contains the population mean.
For example, a 95% confidence level means: if 100 samples are drawn, giving 100 confidence intervals, about 95 of those intervals are expected to contain the true population mean.
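The "about 95 out of 100" reading can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a made-up mean of 170 and standard deviation of 10, the interval is the normal-approximation interval, and `coverage` is a hypothetical helper name.

```python
import math
import random
import statistics

def coverage(true_mean=170.0, true_sd=10.0, n=50, trials=1000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.mean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1                      # this interval "caught the fish"
    return hits / trials

print(f"share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95, while any single interval either contains the true mean or does not.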
1.2.1 Is a higher confidence level always better?
This depends on what you are measuring and on the economic cost of being wrong. Under normal circumstances, 95% is the commonly used confidence level; it corresponds to roughly ±2 standard errors, whereas 3-sigma control corresponds to about 99.7% (and some strict fields even use 6 sigma). 95% is already a high confidence level. As the confidence level increases, the confidence interval becomes wider, so the precision of the parameter estimate necessarily decreases. A point estimate is a single value: maximally precise, but with low confidence. The trade-off between precision and confidence is up to the analyst.
1.3 Steps to calculate the confidence interval
Step 1: Find the sample mean.
Step 2: Calculate the sampling error.
As a rule of thumb from survey practice:
- the sampling error for a sample of 100 is ±10%
- the sampling error for a sample of 500 is ±5%
- the sampling error for a sample of 1,200 is ±3%
Step 3: Add the sampling error from Step 2 to the sample mean from Step 1, and subtract it, to obtain the two endpoints of the confidence interval.
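The rule-of-thumb sampling errors above are close to what the usual formula for a surveyed proportion gives. The sketch below assumes a 95% level (z = 1.96) and the worst case p = 0.5; both function names are hypothetical.

```python
import math

def worst_case_margin(n, z=1.96):
    """Sampling error for a proportion, assuming the worst case p = 0.5."""
    return z * math.sqrt(0.25 / n)

def confidence_interval(estimate, margin):
    """Step 3: estimate minus and plus the sampling error."""
    return estimate - margin, estimate + margin

for n in (100, 500, 1200):
    print(f"n = {n:>5}: sampling error is about ±{worst_case_margin(n):.1%}")

low, high = confidence_interval(0.55, 0.03)
print(f"interval: {low:.0%} to {high:.0%}")
```

Under these assumptions the computed errors come out at roughly 9.8%, 4.4% and 2.8%, which round to the ±10%, ±5% and ±3% quoted above.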
1.4 Examples
Gallup, a United States polling company, surveyed 3,500 consumers (about 1,200 in each country) in the United States, Germany, and Japan on their perceptions of the quality of American products. The results: 55% of Americans think American products are of good quality, while only 26% of Germans and 17% of Japanese hold the same view. The sampling error is ±3%, and the confidence level is 95%. The confidence intervals for consumers in these three countries are then:
Country | Sample mean | Sampling error | Confidence interval |
---|---|---|---|
US | 55% | ±3% | 52%-58% |
Germany | 26% | ±3% | 23%-29% |
Japan | 17% | ±3% | 14%-20% |
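As a rough cross-check of these figures, the sketch below recomputes each interval from the reported ±3% and, for comparison, the margin given by the normal approximation for a proportion. It assumes n = 1,200 per country and z = 1.96; the function names are hypothetical and the survey's actual method is not stated in the text.

```python
import math

def proportion_interval(p_hat, margin):
    """Interval endpoints for an estimated proportion and a given margin."""
    return p_hat - margin, p_hat + margin

def normal_margin(p_hat, n, z=1.96):
    """Normal-approximation sampling error for a proportion (an assumption)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

survey = {"US": 0.55, "Germany": 0.26, "Japan": 0.17}   # shares from the text
for country, p_hat in survey.items():
    low, high = proportion_interval(p_hat, 0.03)
    approx = normal_margin(p_hat, 1200)
    print(f"{country}: {low:.0%}-{high:.0%} (approximate margin ±{approx:.1%})")
```

Under these assumptions each computed margin stays below 3%, so the quoted ±3% reads as a rounded, slightly conservative figure.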
2. About the width of the confidence interval
A narrow confidence interval provides more information about the population parameter than a wide one.
Assuming that the average exam score for the whole class is 65, then
Confidence interval | What the interval width tells you |
---|---|
0-100 points | 100, wide: it tells you almost nothing |
30-80 points | 50, narrower: you can estimate the approximate average score (55 points) |
60-70 points | 10, narrow: you can almost determine the average score of the whole class (65 points) |
3. The influence of sample size on the confidence interval
Impact: When the confidence level is fixed, the larger the sample size, the narrower the confidence interval.
The following is the change table of the relationship between the sample size and the confidence interval calculated in practice (assuming the confidence level is the same):
Sample size | Confidence interval | Interval width |
---|---|---|
100 | 50%-70% | 20, wide |
800 | 56.2%-63.2% | 7, narrower |
1,600 | 57.5%-63% | 5.5, narrower |
3,200 | 58.5%-62% | 3.5, narrower |
From the above table:
- 1. At the same confidence level, the larger the sample size, the narrower the confidence interval.
- 2. The confidence interval narrows more slowly than the sample size grows: doubling the sample size does not halve the interval width (the width shrinks roughly in proportion to 1/√n, so halving it takes about four times the sample). For this reason, once the sample size reaches a certain amount (usually 1,200, as in the example above, where each of the three countries sampled 1,200 consumers), adding more samples is usually not worthwhile.
The relationship between the confidence interval and the sample size can be verified from the formula for the confidence interval:
- Confidence interval = sample estimate ± (reliability coefficient × standard error)
The standard error shrinks as the sample size grows (for a mean it is σ/√n), so it can be seen from the formula that,
when other factors remain unchanged, the larger the sample size, the narrower the confidence interval.
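The square-root relationship can be seen numerically. The sketch below assumes a surveyed proportion of 0.6, a 95% level (z = 1.96) and the normal approximation; `interval_width` is a hypothetical helper, and the sample sizes are the ones from the table above.

```python
import math

def interval_width(n, p_hat=0.6, z=1.96):
    """Full width of the normal-approximation interval for a proportion."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 800, 1600, 3200):
    print(f"n = {n:>5}: width is about {interval_width(n):.1%}")

# Quadrupling the sample size halves the width.
print(interval_width(800) / interval_width(3200))
```

Going from 800 to 3,200 respondents (four times the sample) halves the width, which is in line with the 7 and 3.5 in the table.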
4. The impact of the confidence level on the confidence interval
Impact: In the case of the same sample size, the higher the confidence level, the wider the confidence interval.
For example, a survey of presidential job approval was conducted in the United States. Of the 1,200 people sampled, 60% approved of the president's work. At a 95% confidence level the sampling error was ±3%; if the confidence level is lowered to 90%, the sampling error shrinks to ±2.3%. The two sets of numbers compare as follows:
Sampling error | Confidence level | Confidence interval | Interval width |
---|---|---|---|
±3% | 95% | 60%±3%=57%-63% | 6, wide |
±2.3% | 90% | 60%±2.3%=57.7%-62.3% | 4.6, narrow |
From the above table:
In the case of the same sample size (both are 1,200 people), the higher the confidence level (95%), the wider the confidence interval.
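The two sampling errors can be approximately reproduced from the confidence level alone. The sketch below assumes the normal approximation for a proportion with p = 0.6 and n = 1,200; `margin_for_level` is a hypothetical name, and `statistics.NormalDist` is from the Python standard library (3.8+).

```python
import math
from statistics import NormalDist

def margin_for_level(level, p_hat=0.6, n=1200):
    """Sampling error for a two-sided interval at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 0.95, 1.645 for 0.90
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for level in (0.95, 0.90):
    print(f"{level:.0%} level: sampling error is about ±{margin_for_level(level):.1%}")
```

Under these assumptions the results are about ±2.8% and ±2.3%, close to the ±3% and ±2.3% in the table: lowering the confidence level lowers the reliability coefficient and so narrows the interval.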
5. The impact of sample size on confidence level
Impact: When the confidence interval remains unchanged, the larger the sample size, the higher the confidence level.
For example:
Confidence interval | Sample size | Confidence level |
---|---|---|
52%-58% | 1,200 | 95% |
(The example of Gallup, USA)
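To see how the confidence level moves when the interval is held fixed, the sketch below keeps the half-width at ±3% around 55% and varies only the sample size. It assumes the normal approximation for a proportion; `level_for_margin` is a hypothetical name, and the sample sizes other than 1,200 are illustrative and not from the survey.

```python
import math
from statistics import NormalDist

def level_for_margin(n, margin=0.03, p_hat=0.55):
    """Confidence level implied by a fixed half-width at sample size n."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error of the proportion
    return 2 * NormalDist().cdf(margin / se) - 1

for n in (300, 600, 1200, 2400):                # illustrative sample sizes
    print(f"n = {n:>5}: confidence level is about {level_for_margin(n):.1%}")
```

With the interval fixed, a larger sample gives a smaller standard error, so the same ±3% covers more standard errors and the implied confidence level rises.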
https://www.zhihu.com/question/26419030/answer/103956460
https://zhuanlan.zhihu.com/p/38755140
https://zhuanlan.zhihu.com/p/110612323