Confidence interval

To understand the confidence interval, it helps to start from the most basic and central idea of statistics: using a sample to estimate the population.

The confidence level is the probability that an interval built around the sample statistic, by a given procedure, contains the population parameter;
the confidence interval is the range of error between the sample statistic and the population parameter at a given confidence level.
For the same sample, a higher confidence level requires a wider confidence interval.

1. The concept of confidence interval

1.1 Confidence interval

A confidence interval is a kind of interval estimate.

First, let's look at what point estimation and interval estimation are.

1.1.1 Point estimation and interval estimation

A scratch card game used to be very popular. Its rules are as follows (assuming there is only one jackpot):

  • The jackpot is fixed in advance and must be printed on a scratch card
  • After buying the scratch card, scratch it to know if you have won the prize

So we have at least two strategies for winning the prize:

  • Point estimate: Buy one, which is equivalent to guessing that this one will win the prize
  • Interval estimation: Buy a box, which is equivalent to guessing that there will be a winning prize in this box

Obviously, the interval estimate has a higher hit rate (at a higher cost, of course, in exchange for the lower risk).

When we run an experiment, there is no way to eliminate error completely, so we report the result together with an acceptable error range. In statistics this range is called a confidence interval. The confidence interval is random: it is computed from the sample, so each sample drawn yields its own confidence interval.

A confidence interval is an interval estimate of a population parameter, constructed from sample statistics.

A confidence interval is also called an estimation interval; it is used to estimate the range of values a parameter may take. Ranges such as 52%-64% or 8-12 are typical confidence intervals (estimation intervals).

The confidence interval expresses how confident we can be that the true value of the parameter lies around the measured (estimated) value.

Next, let's look at how the confidence interval is estimated.

1.1.2 Confidence interval

We use the estimation of average human height to explain what a confidence interval is.

1.1.2.1 God's perspective

There is no way for us to know the true average height of human beings, because it is practically impossible to measure everyone.

But this value does exist, and we can say that God knows it.

Here we introduce God's perspective, that is, the true distribution of human height as seen by God.


1.1.2.2 Point estimation

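As mere humans, we can only draw a sample from the population and compute statistics from it. Using the sample mean as a point estimate of the true average height has a problem: it will almost never be exactly equal to the true value, and from a single number we cannot tell how far off it is. Interval estimation can mitigate this problem.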

1.1.2.3 Confidence interval

Confidence interval provides a method of interval estimation.

However, compared with a point estimate:

  • Neither point estimation nor interval estimation tells us whether the particular point or interval we obtained is a good one
  • But if we construct intervals with the 95% confidence-interval procedure and build 100 such intervals, about 95 of them will contain μ

It's like casting a fishing net: I know that if I cast it a hundred times, I will catch the fish I want about 95 times, but I don't know whether this particular cast is one of them.

The remaining question is: how is the 95% confidence interval constructed?

1.1.2.4 The 95% confidence interval

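A sketch of the standard textbook construction, assuming heights are normally distributed, the population standard deviation σ is known, and we take a simple random sample of size n with sample mean X̄ (these modelling choices are assumptions made for illustration):

```latex
\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)
\;\Longrightarrow\;
P\!\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) \approx 0.95
\;\Longrightarrow\;
P\!\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95
```

So the 95% confidence interval is [X̄ - 1.96σ/√n, X̄ + 1.96σ/√n], where 1.96 is the 97.5th percentile of the standard normal distribution. The random quantity is X̄ (and therefore the interval); μ stays fixed. When σ is unknown, it is usually replaced by the sample standard deviation and 1.96 by a quantile of the t-distribution.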

1.1.2.5 Summary

In conclusion:

  • The confidence interval requires the quantity being estimated (here the population mean μ) to be a fixed constant; it is the interval, not μ, that varies from sample to sample
  • The 95% is called the confidence level; 95% is a statistical convention, and the level can be adjusted to suit the application

1.2 Confidence level

The confidence level is the probability, over repeated sampling, that the confidence interval contains the population mean.

For example, a 95% confidence level means: if 100 samples are drawn and 100 confidence intervals are constructed, about 95 of those intervals will contain the true population mean.
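The repeated-sampling meaning of "about 95 out of 100" can be checked by simulation. The sketch below assumes a hypothetical population of heights that is normal with mean 170 cm and standard deviation 10 cm, treats σ as known, and builds each interval as x̄ ± 1.96σ/√n (the standard z-interval). The population values, sample size and function names are illustrative assumptions, not part of the article.

```python
import random
import statistics

# Hypothetical "God's perspective" population: heights ~ N(170, 10^2) in cm.
MU, SIGMA = 170.0, 10.0
N = 50          # size of each sample (assumed)
Z_95 = 1.96     # 97.5th percentile of the standard normal distribution

def z_interval(sample, sigma, z=Z_95):
    """95% interval for the mean, assuming the population sigma is known."""
    xbar = statistics.fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return xbar - half_width, xbar + half_width

def coverage(num_intervals=100, seed=0):
    """Draw num_intervals samples and count how many intervals contain MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        low, high = z_interval(sample, SIGMA)
        if low <= MU <= high:
            hits += 1
    return hits

print(coverage(100))                  # a count near 95, varies with the seed
print(coverage(10_000) / 10_000)      # a proportion close to 0.95
```

Each batch of 100 intervals gives a count near 95 rather than exactly 95; that is the "about" in the definition. Any single interval either contains μ or it does not.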

1.2.1 Is a higher confidence level always better?

It depends on what you are measuring and what the economic costs and benefits are. In most cases 95% is used as the standard confidence level; for a normal distribution it corresponds to about ±2σ (more precisely ±1.96σ) around the estimate. Stricter applications use 3σ (about 99.7%), and some fields even use 6σ. A 95% level is already high. As the confidence level increases, the confidence interval becomes wider, and the precision of the parameter estimate necessarily decreases. A point estimate is a single value: it is as precise as possible but carries very little confidence. The trade-off between precision and confidence is entirely up to the analyst.
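The trade-off can be made concrete: with σ and n fixed, the half-width of a normal-based interval is z·σ/√n, so the interval width is proportional to the two-sided critical value z of the chosen confidence level. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value z with P(-z <= Z <= z) = level for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(0.5 + level / 2)

# With sigma and n fixed, the half-width z * sigma / sqrt(n) grows in
# proportion to z, so a higher confidence level means a wider interval.
for level in (0.90, 0.95, 0.99, 0.997):
    print(f"{level:.1%}: z = {z_critical(level):.3f}")
```

Going from 95% (z ≈ 1.96) to about 99.7% (z ≈ 3) makes the interval roughly 1.5 times as wide for the same data.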

1.3 Steps to calculate the confidence interval

Step 1: Find the mean of a sample

Step 2: Calculate the sampling error.

In practice, surveys commonly use the following rules of thumb:

  • 100 respondents: sampling error of about ±10%
  • 500 respondents: sampling error of about ±5%
  • 1,200 respondents: sampling error of about ±3%

Step 3: Subtract the sampling error from Step 2 from the sample mean from Step 1, and add it to the sample mean, to obtain the two endpoints of the confidence interval.
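For survey percentages, the "sample mean" of Step 1 is the sample proportion (the mean of yes = 1 / no = 0 answers). A minimal sketch of the three steps, assuming a simple random sample, a 95% level (z = 1.96) and the normal approximation z·√(p(1-p)/n) for the sampling error; using the worst case p = 0.5 is an assumption that roughly reproduces the rules of thumb. Function names are hypothetical.

```python
import math

def sampling_error(n, p=0.5, z=1.96):
    """Step 2: margin of error z * sqrt(p * (1 - p) / n) for a proportion.

    p = 0.5 gives the largest (most conservative) value for a given n.
    """
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(estimate, error):
    """Step 3: the sample estimate minus and plus the sampling error."""
    return estimate - error, estimate + error

# Compare with the rules of thumb for 100, 500 and 1,200 respondents.
for n in (100, 500, 1200):
    print(n, f"±{sampling_error(n):.1%}")

# Gallup's US figure: a sample proportion of 55% with the quoted ±3%.
low, high = confidence_interval(0.55, 0.03)
print(f"{low:.0%}-{high:.0%}")
```

This prints about ±9.8%, ±4.4% and ±2.8%, so the ±10% and ±3% rules match closely and ±5% for 500 respondents is a conservative rounding.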

1.4 Examples

The Gallup Company of the United States surveyed 3,500 consumers (about 1,200 in each country) in the United States, Germany, and Japan about their perception of the quality of American products. The results: 55% of Americans think American products are of good quality, while only 26% of Germans and 17% of Japanese hold the same view. The sampling error is ±3% and the confidence level is 95%. The confidence intervals for consumers in the three countries are therefore:

Country   Sample mean   Sampling error   Confidence interval
US        55%           ±3%              52%-58%
Germany   26%           ±3%              23%-29%
Japan     17%           ±3%              14%-20%

2. About the width of the confidence interval

A narrow confidence interval provides more information about the population parameter than a wide one.

Suppose the average exam score of the whole class is 65. Then:

Confidence interval   Interval width   Meaning
0-100 points          100 (wide)       Tells you nothing
30-80 points          50 (narrower)    You can estimate the approximate average score (55 points)
60-70 points          10 (narrow)      You can almost determine the class average (65 points)

3. The influence of sample size on the confidence interval

Impact: When the confidence level is fixed, the larger the sample size, the narrower the confidence interval.
The following table, calculated in practice, shows how the confidence interval changes with the sample size (at the same confidence level):

Sample size   Confidence interval   Interval width
100           50%-70%               20 (wide)
800           56.2%-63.2%           7 (narrower)
1,600         57.5%-63%             5.5 (narrower)
3,200         58.5%-62%             3.5 (narrower)

From the table above:

  • At the same confidence level, the larger the sample size, the narrower the confidence interval.
  • The confidence interval narrows more slowly than the sample size grows: doubling the sample size does not halve the interval, because the width shrinks in proportion to 1/√n (halving the width requires four times the sample size). So once the sample size reaches a certain level (usually about 1,200; in the example above, each of the three countries sampled 1,200 consumers), further samples are generally not added.

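The relationship between the confidence interval and the sample size can be verified from the formula for the confidence interval:

  • Confidence interval = sample estimate ± (reliability coefficient × σ / √n)

where the reliability coefficient is the critical value for the chosen confidence level (for example, 1.96 at 95%), σ is the standard deviation, and n is the sample size.

It can be seen from this formula that, with all other factors unchanged, the larger the sample size n (and therefore √n in the denominator), the smaller σ / √n, and the narrower the confidence interval.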
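A short check of the 1/√n behaviour, assuming a proportion near 60% (roughly the centre of the sample-size table), a 95% level (z = 1.96) and the normal approximation σ = √(p(1-p)). The helper `interval_width` is hypothetical; its output only approximates the table, whose figures were calculated in practice.

```python
import math

def interval_width(n, p=0.6, z=1.96):
    """Full width 2 * z * sqrt(p * (1 - p) / n) of a confidence interval for a proportion."""
    return 2 * z * math.sqrt(p * (1 - p) / n)

for n in (100, 800, 1600, 3200):
    print(n, f"{interval_width(n):.1%}")

# Doubling n shrinks the width only by a factor of sqrt(2) (about 0.707);
# halving the width takes four times the sample size.
print(interval_width(200) / interval_width(100))
print(interval_width(400) / interval_width(100))
```

The widths come out near 19%, 7%, 5% and 3.4%, close to the 20, 7, 5.5 and 3.5 points in the table.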

4. The impact of the confidence level on the confidence interval

Impact: In the case of the same sample size, the higher the confidence level, the wider the confidence interval.

For example, a survey in the United States measured satisfaction with the president's job performance. Of the 1,200 people sampled, 60% approved of the president's work; the sampling error was ±3% at a 95% confidence level. If the confidence level is lowered to 90%, the sampling error shrinks to ±2.3%. The two sets of numbers compare as follows:

Sampling error   Confidence level   Confidence interval          Interval width
±3%              95%                60% ± 3% = 57%-63%           6 (wide)
±2.3%            90%                60% ± 2.3% = 57.7%-62.3%     4.6 (narrow)

From the table above:
With the same sample size (1,200 people in both cases), the higher confidence level (95%) gives the wider confidence interval.
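The two sampling errors can be recomputed, assuming a simple random sample and the normal approximation z·√(p(1-p)/n) with p = 0.60 and n = 1,200. The helper name `margin` is hypothetical; `statistics.NormalDist` is from Python's standard library.

```python
import math
from statistics import NormalDist

def margin(p, n, level):
    """Margin of error z * sqrt(p * (1 - p) / n) for a two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Presidential-approval example: p = 60%, n = 1,200.
for level in (0.95, 0.90):
    e = margin(0.60, 1200, level)
    print(f"{level:.0%}: ±{e:.1%} -> {0.60 - e:.1%} to {0.60 + e:.1%}")
```

Under these assumptions the 95% margin is about ±2.8% (quoted as ±3% in the survey) and the 90% margin is about ±2.3%, matching the second row of the table.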

5. The impact of sample size on confidence level

Impact: When the confidence interval is held fixed, the larger the sample size, the higher the confidence level.
For example:

Confidence interval   Sample size   Confidence level
52%-58%               1,200         95%

(From the Gallup example above.)
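The claim can be illustrated by inverting the margin-of-error formula: for a fixed margin E and standard error SE = √(p(1-p)/n), the confidence level reached is 2Φ(E/SE) - 1. A minimal sketch, assuming simple random sampling, the normal approximation and the worst case p = 0.5; the helper name `confidence_level` is hypothetical.

```python
import math
from statistics import NormalDist

def confidence_level(margin, n, p=0.5):
    """Confidence level achieved by a fixed margin of error for a proportion:
    2 * Phi(margin / SE) - 1, with SE = sqrt(p * (1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return 2 * NormalDist().cdf(margin / se) - 1

# Keep the interval fixed at ±3 points (e.g. 52%-58%) and vary the sample size.
for n in (600, 1200, 2400):
    print(n, f"{confidence_level(0.03, n):.1%}")
```

With n = 1,200 the level comes out a little above 95%, consistent with the Gallup figure; halving the sample lowers it and doubling the sample raises it.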

https://www.zhihu.com/question/26419030/answer/103956460
https://zhuanlan.zhihu.com/p/38755140
https://zhuanlan.zhihu.com/p/110612323


Origin blog.csdn.net/Anne033/article/details/109739681