Some strange knowledge points

1. Confidence interval

Reprinted from a highly upvoted Zhihu answer (https://www.zhihu.com/question/26419030?sort=created). The answer is essentially the same as the textbook treatment, but easier to understand.

First of all, the important thing needs to be said three times:

Confidence intervals are for random variables!
Confidence intervals are for random variables!
Confidence intervals are for random variables!

The most common misinterpretation of a confidence interval: "a 95% confidence interval has a 95% chance of containing the true parameter."

To understand confidence intervals, a few basic statistical concepts need to be clarified first; without them, any discussion of confidence intervals is empty talk. A confidence interval of what? The confidence interval is an interval for a parameter. So what is a parameter?

The parameter is a parameter of the population. How is the confidence interval computed? It is computed from a sample. So what is the connection between the sample and the population?

1) Population: all of the data. We can assume the population follows some distribution, for example a normal distribution. A normal distribution is uniquely determined by two parameters, the mean and the variance, both of which are fixed values rather than random quantities.

2) (Random) sample: a sample is data drawn from the population. For example, a draw from a normal distribution might give 0.54; that is a sample. One important point: a sample does not necessarily contain only one value. We might draw the sample (0.1, -5, 12), which contains 3 values; 3 is the size of this sample.

3) Parameter estimation. In practice the population distribution is usually unknown, but we can make assumptions, for example that human body weight follows a normal distribution. Once this assumption is made, the next question is: what are the parameters of that normal distribution? That is, how do we obtain the mean and the variance? The answer is parameter estimation. Statistics offers many methods, which I won't expand on here. The key point is that the parameters are estimated from the sample: sample -> population parameters.

4) Do different samples give the same parameter estimates? There is no reason they should, which raises the question: what do we do when different samples give different estimates of the population parameters? Interval estimation, that is, give an interval and let it contain the population parameter. But must the interval contain the population parameter? Obviously not necessarily; it depends on the sample. If an unlucky sample happens to be drawn, the estimate may be far from the population value.

The last point is also the most important one, and one that many people who claim to work in statistics get wrong: how should a confidence interval be interpreted?

5) For example, from a sample we compute the confidence interval [a, b] for the population mean. Does this mean there is a 95% probability that the population mean lies in this interval? That interpretation is the result of logical confusion: it fails to distinguish between a constant and a random variable.

First, the population parameter is a constant. You don't know it, so it is an unknown constant, but not knowing it does not make it random; those are two different concepts. Moreover, once the interval has been computed, the interval is also fixed, and the parameter is fixed, so there is no randomness left. It should now be clear what is wrong with the misunderstanding stated at the beginning of this answer, that "a 95% confidence interval has a 95% probability of including the true parameter."

So what is the correct interpretation? There are several ways to state it; here is one: a 95% confidence interval means that if you repeatedly draw samples and compute a confidence interval using the same procedure, then about 95% of the intervals produced by these independent repetitions will contain the true parameter value.

The figure below is an example: sampling 100 times and computing a confidence interval for the population parameter each time. Most of the time the confidence interval covers the true value, but sometimes it does not.
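A small simulation makes this concrete. The sketch below is my own addition, not from the original answer; it assumes a normal population with known standard deviation, so a simple z-based interval applies. It repeats the draw-sample-then-compute-interval procedure 100 times and counts how often the interval covers the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)

true_mu, sigma = 5.0, 2.0   # population parameters (fixed constants)
n, repeats = 30, 100        # sample size and number of repetitions
z = 1.96                    # z quantile for a 95% interval

covered = 0
for _ in range(repeats):
    sample = rng.normal(true_mu, sigma, size=n)   # draw one sample of size n
    x_bar = sample.mean()                         # point estimate of the mean
    half_width = z * sigma / np.sqrt(n)           # known-sigma interval half width
    low, high = x_bar - half_width, x_bar + half_width
    covered += (low <= true_mu <= high)           # does this interval cover the true mean?

print(f"{covered}/{repeats} intervals covered the true mean")  # typically close to 95
```

Note that in each repetition the interval endpoints are what vary; the true mean never moves.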

2. Total probability formula and Bayesian formula

1. Total probability formula

Total probability formula:
$P(A)=P(AB_{1})+P(AB_{2})+\cdots+P(AB_{n})$
$P(A)=P(B_{1})P(A|B_{1})+P(B_{2})P(A|B_{2})+\cdots+P(B_{n})P(A|B_{n})$

The significance of the total probability formula:
Event $A$ can occur due to various possible causes $B_{i}$ ($i=1,2,...,n$), where $B_{1},\dots,B_{n}$ are mutually exclusive and together cover all possibilities. If $A$ occurs via cause $B_{i}$, the probability of that route is $P(AB_{i})=P(B_{i})P(A|B_{i})$. Since each $B_{i}$ could lead to $A$, the probability that $A$ occurs is the sum of these probabilities over all causes; that sum is the total probability formula.
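As a toy worked example (my own numbers, purely illustrative): suppose a part comes from one of three factories $B_{1},B_{2},B_{3}$ and $A$ is the event that the part is defective.

```python
# Hypothetical numbers: p_B[i] is each factory's share of production,
# p_A_given_B[i] is that factory's defect rate.
p_B = [0.5, 0.3, 0.2]
p_A_given_B = [0.01, 0.02, 0.05]

# Total probability formula: P(A) = sum_i P(B_i) * P(A|B_i)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)  # 0.5*0.01 + 0.3*0.02 + 0.2*0.05 = 0.021
```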

2. Bayesian formula

Bayesian formula:
$P(B_{i}|A)=\frac{P(B_{i})P(A|B_{i})}{\sum_{j=1}^{n}P(B_{j})P(A|B_{j})}, \quad i=1,2,...,n$

Significance of the Bayesian formula:
Given that event $A$ has occurred, the Bayesian formula computes the probability that each possible cause $B_{i}$ was the one that produced $A$.
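Continuing the toy factory example above (the same invented numbers), the Bayesian formula tells us how likely each factory is to be the source once we know the part is defective:

```python
p_B = [0.5, 0.3, 0.2]
p_A_given_B = [0.01, 0.02, 0.05]

p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))  # total probability, 0.021

# Bayesian formula: P(B_i | A) = P(B_i) * P(A|B_i) / P(A)
p_B_given_A = [pb * pa / p_A for pb, pa in zip(p_B, p_A_given_B)]
print(p_B_given_A)  # roughly [0.238, 0.286, 0.476]; the posteriors sum to 1
```

Even though factory $B_{3}$ produces the smallest share of parts, its high defect rate makes it the most likely source of a defective part.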

3. Cross entropy loss function

$L=-\sum_{n=1}^{m}\sum_{i=1}^{T}y_{ni}\log s_{ni}$
Where:
$m$: total number of samples
$T$: total number of categories
$y_{ni}$: for the $n$-th sample, $y_{ni}$ is 1 if the true label is $i$, and 0 otherwise (one-hot label)
$s_{ni}$: for the $n$-th sample, the softmax probability assigned to class $i$
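A minimal NumPy sketch of this loss, as my own illustration rather than code from the original post; it assumes the inputs are raw logits (so the softmax $s_{ni}$ is computed inside) and the labels are integer class indices:

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    """logits: (m, T) raw scores; labels: (m,) integer class indices."""
    # Softmax with the usual max-subtraction for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    s = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # s_{ni}
    m = logits.shape[0]
    # y_{ni} is one-hot, so the double sum reduces to -sum_n log s_{n, label_n}
    return -np.log(s[np.arange(m), labels]).sum()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
print(cross_entropy_loss(logits, labels))
```

Because $y_{ni}$ is one-hot, only the log-probability of each sample's true class contributes to the sum, which is exactly what the fancy indexing `s[np.arange(m), labels]` picks out.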


Origin blog.csdn.net/qq_39439006/article/details/121611352