Hypothesis testing and confidence intervals: a full explanation and a discussion of how they are connected

In real life it is often hard to know the population mean. For example, I may know that the average age of a sample of 10,000 Internet users is 20, but I do not know the average age of all Internet users; it may be 30, or it may be 25. Hence the common joke: only God knows the population mean.

But we can estimate it from sample statistics.

Overview

Hypothesis testing: the population mean μ is treated as known (I hypothesize that it equals some value or lies in some interval), and I use the distribution of a statistic to test whether that hypothesis is correct.
Confidence intervals: the population mean μ is unknown, and I use a statistic to estimate the approximate range in which it lies.

The concept of hypothesis testing (really just two steps)

Step one: state a hypothesis

Step two: test the hypothesis

Specific steps of a hypothesis test

1. State the hypothesis about the event, find the statistic that goes with it, and write down the distribution of that statistic (normal, binomial, chi-square, t, F, etc.). Then check whether the population variance and the sample variance are known.

I introduced the four major distributions in an earlier blog post.

2. Then, taking the hypothesis to be true by default, use that distribution to compute the p-value of the hypothesized event and test with it.

This probability is the probability that the difference between the sample mean x̄ and the hypothesized population mean μ0 is greater than some number,
that is, the probability that an extreme case (a small-probability event) occurs when the hypothesis holds.
(By the central limit theorem, when we draw many samples, most of the sample means lie near the population mean. So when the difference exceeds a certain amount, we judge that the sample mean we observed is an extreme case, in other words a small-probability event.)
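The p-value described above can be sketched in a few lines. This is only an illustration under stated assumptions: the population standard deviation is taken as known, the sample mean is treated as normal, and the function name two_sided_p_value and all the numbers are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, mu0, sigma, n):
    """P(|X_bar - mu0| >= |sample_mean - mu0|) when the hypothesis mu = mu0 is true."""
    standard_error = sigma / sqrt(n)           # standard deviation of the sample mean
    z = (sample_mean - mu0) / standard_error   # standardized distance from x_bar to mu0
    # Area in both tails beyond |z| under the standard normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: the hypothesis says the mean age is 20,
# the population standard deviation is 5, and 100 people were sampled.
p = two_sided_p_value(sample_mean=21.0, mu0=20.0, sigma=5.0, n=100)
print(round(p, 4))   # 0.0455
```

Here the sample mean is two standard errors away from the hypothesized mean, so only about 4.6% of samples would land this far out or farther if the hypothesis were true.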

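1. First we decide which events count as small-probability events.
2. Then we add up the probabilities of those small-probability events.
3. By long-standing convention (the small-probability principle), the total probability assigned to the small-probability events is usually 0.05, 0.01, or 0.1.
4. We define this total probability of all the extreme events (small-probability events), for example 0.05 or 0.1, as the significance level.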
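To make the four points above concrete, here is a small sketch that finds, for each conventional significance level, the cutoff beyond which outcomes are called extreme, and then adds the two tail probabilities back up. It assumes a standard normal statistic and a two-sided test; the helper name two_sided_critical_value is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def two_sided_critical_value(alpha):
    """z* such that P(|Z| > z*) = alpha for a standard normal Z."""
    return std_normal.inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z_star = two_sided_critical_value(alpha)
    # Add up the probability of both extreme tails; the total should equal alpha.
    tail_total = std_normal.cdf(-z_star) + (1 - std_normal.cdf(z_star))
    print(alpha, round(z_star, 3), round(tail_total, 4))
```

The cutoffs come out near 1.645, 1.96 and 2.576, which are the familiar critical values for 0.1, 0.05 and 0.01.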

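The significance level is α. It draws the line for what counts as "extreme" in an extreme event,
or, put another way, it controls how extreme an event has to be.
The p-value is the probability, when the null hypothesis is true, of obtaining the observed sample result or a more extreme one.
For the p-value we generally take the more extreme (small-probability) side of the hypothesized event.
If I subjectively believe the hypothesized event is very likely, I define the p-value on the opposite of the hypothesis and compute the probability of that opposite; if I subjectively believe the hypothesized event is very unlikely, I define the p-value on the hypothesis itself.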

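But in general we take as the hypothesis the case we consider unlikely, so that we can judge whether the event really is a small-probability event. The smaller p is, the more strongly we reject the null hypothesis; the larger p is, the more we accept it (strictly speaking, the less reason we have to reject it).

The reason is simple: the significance level is a yardstick for small-probability events, so to compare a probability against it we should also take the unlikely case of the event as the hypothesis and compute its probability for the test.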

3. Compare the p-value with the significance level. We usually set the significance level to 0.05.

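If I think the hypothesized event is very likely, my p-value is defined on the opposite of the hypothesis:
When the p-value is smaller than the significance level, the opposite event is extremely unlikely, so the hypothesis is clearly correct.
When the p-value is larger than the significance level, the opposite event is not unlikely by the standard of the significance level, so we take the hypothesis to be wrong.
If I think the hypothesized event is very unlikely, my p-value is defined on the hypothesis itself:
When the p-value is smaller than the significance level, the hypothesized event is extremely unlikely, so the hypothesis is clearly wrong.
When the p-value is larger than the significance level, the hypothesized event is not unlikely by the standard of the significance level, so we take the hypothesis to be correct.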

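But in general we take as the hypothesis the case we consider unlikely, so that we can judge whether the event really is a small-probability event. The smaller p is, the more strongly we reject the null hypothesis; the larger p is, the more we accept it (strictly speaking, the less reason we have to reject it).

I say this twice on purpose, because someone always sets up the hypothesized event as a high-probability event.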
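The comparison in step 3 is short enough to write as a function. This is a minimal sketch for the usual setup, where the p-value is computed for the null hypothesis itself; the function name decide and the wording of the returned strings are my own choices.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if p_value < alpha:
        # The observed result would be a small-probability event under the hypothesis.
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))   # p is below 0.05
print(decide(0.20))   # p is above 0.05
```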

The probability of the small-probability event can be written out in terms of the distance from x̄ to μ0.

The values of x̄ that lie too far from μ0 form the rejection region.

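If the population variance is known, we standardize x̄ so that it follows the standard normal distribution.
If the population variance is unknown, we substitute the sample standard deviation, and the standardized statistic follows a t distribution.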
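The two cases can be put side by side in code. The sketch below only computes the standardized statistic and says which distribution it follows; it does not compute a t p-value, since the Python standard library has no t distribution. The function name standardized_statistic and the sample of ages are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_statistic(sample, mu0, population_sigma=None):
    """Return (name, value) of the standardized distance from x_bar to mu0."""
    n = len(sample)
    x_bar = mean(sample)
    if population_sigma is not None:
        # Population variance known: the statistic is standard normal (z).
        return "z", (x_bar - mu0) / (population_sigma / sqrt(n))
    # Population variance unknown: use the sample standard deviation
    # (n - 1 in the denominator); the statistic follows t with n - 1 degrees of freedom.
    return "t", (x_bar - mu0) / (stdev(sample) / sqrt(n))

ages = [22, 19, 24, 21, 20, 23, 18, 25, 21, 22]   # made-up sample
print(standardized_statistic(ages, mu0=20, population_sigma=2.0))
print(standardized_statistic(ages, mu0=20))
```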
So we can test either with the p-value or by checking whether x̄ falls in the rejection region; the two routes lead to the same decision.
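That the two routes agree can be checked numerically. The sketch below assumes a two-sided z-test with a known population standard deviation; the function name z_test_both_ways and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_both_ways(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (reject_by_p_value, reject_by_region) for a two-sided z-test."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu0) / standard_error

    # Route 1: compare the p-value with alpha.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p_value = p_value < alpha

    # Route 2: is x_bar farther from mu0 than the critical distance,
    # i.e. does it fall in the rejection region?
    critical_distance = std_normal.inv_cdf(1 - alpha / 2) * standard_error
    reject_by_region = abs(x_bar - mu0) > critical_distance

    return reject_by_p_value, reject_by_region

for x_bar in (20.5, 21.0, 22.0):
    print(x_bar, z_test_both_ways(x_bar, mu0=20.0, sigma=5.0, n=100))
```

In each row the two booleans match, because "p below α" and "x̄ beyond the critical distance" are the same condition written in two ways.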

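The concept of a confidence interval

Prerequisites: the central limit theorem and the law of large numbers

The law of large numbers in one sentence:
1. The more observations you sample, and the closer the sample size gets to the size of the population, the closer the sample mean gets to the population mean.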
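A quick simulation shows the idea. This is a sketch with a made-up population (100,000 ages drawn uniformly between 15 and 45); the seed is fixed only so that the run is repeatable, and the helper name sample_mean_error is hypothetical.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable

# Hypothetical population: 100,000 ages between 15 and 45.
population = [random.randint(15, 45) for _ in range(100_000)]
population_mean = sum(population) / len(population)

def sample_mean_error(sample_size):
    """Absolute gap between one sample mean and the population mean."""
    sample = random.sample(population, sample_size)   # draw without replacement
    return abs(sum(sample) / sample_size - population_mean)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(sample_mean_error(n), 4))
```

The gap typically shrinks as the sample grows (not necessarily at every single step), and it is exactly zero once the sample is the whole population.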

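The central limit theorem in two sentences:
1. The sample mean is approximately equal to the population mean.
2. Whatever the distribution of the population, when the sample is large enough the sample mean is approximately normally distributed.

(In plain language: by measuring the academic level of 1,000 students I can learn the academic level of all students in China, because the mean of the sample is about the same as the mean of the population. The sample means are also normally distributed, so if you draw 1,000 sample means, most of them lie around the population mean. Point 2 in fact backs up point 1.)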
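The second point can be seen in a simulation. The sketch below assumes an exponential population with mean 1, which is strongly skewed and clearly not normal, and draws 2,000 samples of size 50; these choices, the seed, and the helper name sample_mean are all just for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1."""
    return mean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2_000)]

center = mean(means)    # should be close to the population mean, 1.0
spread = stdev(means)   # should be close to 1 / sqrt(n), about 0.141
# For a normal distribution roughly 95% of values lie within 2 standard deviations.
within_2_sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(within_2_sd, 3))
```

Most of the sample means cluster around the population mean, and the share within two standard deviations is close to what a normal distribution would give.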

The Glivenko theorem in one sentence:
1. When the sample size is extremely large, the distribution of the sample is close to the distribution of the population.
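One way to see this is to measure the largest gap between the empirical distribution function of a sample and the true distribution function of the population. The sketch assumes a standard normal population; the function name max_cdf_gap is hypothetical.

```python
import random
from statistics import NormalDist

random.seed(2)   # fixed seed so the run is repeatable
true_dist = NormalDist(mu=0.0, sigma=1.0)   # assumed population distribution

def max_cdf_gap(n):
    """Largest gap between the empirical CDF of n draws and the true CDF."""
    xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))
    gap = 0.0
    for i, x in enumerate(xs):
        f = true_dist.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap

for n in (10, 100, 1_000, 10_000):
    print(n, round(max_cdf_gap(n), 4))
```

The gap tends to shrink as n grows, which is what "the distribution of the sample is close to the distribution of the population" means here.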

When the population is normally distributed, the sample mean is always normally distributed (this holds even when the sample size is 1).
When the population is not normally distributed, the sample mean is approximately normally distributed once the size of a single sample is >= 30 (central limit theorem).

A confidence interval is a method of interval estimation, for example a 95% confidence interval.

Therefore we need a sufficiently large sample, because then the sample mean follows a normal distribution.
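A 95% interval for the population mean can be sketched as follows. This assumes a large sample, so that the sample mean is approximately normal and the sample standard deviation can stand in for the population one; the function name mean_confidence_interval and the simulated ages are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean (large samples)."""
    n = len(sample)
    x_bar = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Cutoff that leaves (1 - confidence) / 2 in each tail; about 1.96 for 95%.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * standard_error
    return x_bar - margin, x_bar + margin

random.seed(3)
ages = [random.gauss(25, 6) for _ in range(400)]   # made-up sample of 400 ages
low, high = mean_confidence_interval(ages)
print(round(low, 2), round(high, 2))
```

The interval is centered on the sample mean, and its half-width is the cutoff times the standard error, so it narrows as the sample grows.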

Origin blog.csdn.net/qq_35050438/article/details/102988897