Introduction to Statistics (Khan Academy)
Sample and population
A sample is a subset of the population, selected so that it represents the entire group.
Further reading:
Sample Vs Population
Difference between sample and population
Summary of Population and Sample
Measures of central tendency of a data set: mean, median
Measures of dispersion: variance, standard deviation
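Concept | Description | Sample | Population |
---|---|---|---|
Mean | The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ |
Variance | In a population, variance is the average squared deviation from the population mean. The sample variance divides by $n-1$ instead of $n$, which makes it an unbiased estimate of the population variance. | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ | $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ |
Standard deviation | The standard deviation is the square root of the variance. | $s = \sqrt{s^2}$ | $\sigma = \sqrt{\sigma^2}$ |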
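A minimal sketch of the table using Python's standard `statistics` module, on a small hypothetical data set. `pvariance`/`pstdev` treat the data as a whole population (divide by $N$), while `variance`/`stdev` treat it as a sample (divide by $n-1$).

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # even count: average of the two middle values

# Treat the data as the whole population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treat the data as a sample: divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(mean, median, pop_var, pop_sd, sample_var, sample_sd)
```

For this data the mean is 5 and the median is 4.5. The squared deviations sum to 32, so the population variance is 32/8 = 4 while the sample variance is 32/7, slightly larger.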
Another formula for the population variance:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
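The shortcut form of the population variance, $\sigma^2 = \frac{1}{N}\sum x_i^2 - \mu^2$, follows from expanding the square: $\frac{1}{N}\sum (x_i-\mu)^2 = \frac{1}{N}\sum x_i^2 - 2\mu\cdot\frac{1}{N}\sum x_i + \mu^2 = \frac{1}{N}\sum x_i^2 - \mu^2$. A minimal sketch that checks both forms agree, using exact `Fraction` arithmetic so there is no rounding (the data values are hypothetical):

```python
from fractions import Fraction

def variance_definition(xs):
    """Population variance: average squared deviation from the mean."""
    n = len(xs)
    mu = sum(Fraction(x) for x in xs) / n
    return sum((Fraction(x) - mu) ** 2 for x in xs) / n

def variance_shortcut(xs):
    """Population variance: mean of the squares minus the square of the mean."""
    n = len(xs)
    mu = sum(Fraction(x) for x in xs) / n
    return sum(Fraction(x) ** 2 for x in xs) / n - mu ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(variance_definition(data), variance_shortcut(data))  # both are 4
```

In floating point the shortcut can lose precision when the mean is large relative to the spread, which is why libraries usually compute the definitional form.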
Law of Large Numbers
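See also my earlier post: The Law of Large Numbers and the Central Limit Theorem
Yihui Xie (谢益辉): The Law of Large Numbers and the Central Limit Theorem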
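In probability theory, the law of large numbers describes how the arithmetic mean of a sequence of random variables converges to the arithmetic mean of their expected values. When a random event is repeated a large number of times, a nearly certain regularity emerges, and that regularity is the law of large numbers. Put simply: if an experiment is repeated many times under unchanged conditions, the relative frequency of a random event approaches its probability.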
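Formally, let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed (i.i.d.) integrable random variables, and let $\mu = E[X_1]$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to \mu \quad \text{as } n \to \infty.$$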
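If this convergence holds in probability, the statement is called the weak law of large numbers; if it holds almost surely (almost everywhere), it is called the strong law of large numbers.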
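A minimal simulation sketch of the law of large numbers, assuming a fair six-sided die (so $\mu = E[X] = 3.5$). It prints the running mean $\bar{X}_n$ for growing $n$, and also compares the relative frequency of rolling a six with its probability $1/6$. The seed is fixed only so the run is reproducible.

```python
import random

def running_means(draws):
    """Return the average of the first i draws, for every i."""
    total = 0.0
    means = []
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means

rng = random.Random(0)  # fixed seed so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(100_000)]  # fair die: E[X] = 3.5
means = running_means(rolls)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} rolls: mean = {means[n - 1]:.4f}")

# Relative frequency of rolling a six, compared with its probability 1/6.
freq_six = rolls.count(6) / len(rolls)
print(f"frequency of 6: {freq_six:.4f} (probability {1 / 6:.4f})")
```

The early running means wander noticeably; by 100,000 rolls the mean typically sits within about 0.01 of 3.5.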
Central limit theorem
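The central limit theorem describes how the (standardized) mean of a sequence of random variables converges in distribution to a normal distribution.
In other words, the mean of a simple random sample drawn from any population with a finite mean and variance is approximately normally distributed once the sample is large enough (a common rule of thumb is $n \ge 30$).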
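The figure shows the probability distribution of a random variable. Suppose the sample size is 4: draw samples repeatedly and compute the mean of each sample.
Plotting a frequency histogram of these sample means shows that, as the sample size n increases, the distribution of the sample mean (itself a random variable) gets closer and closer to a normal distribution.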
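As the sample size $n$ tends to infinity, the frequency distribution of the means of samples of size $n$ approaches a normal distribution. Beyond a finite mean and variance, the theorem places no requirement on the shape of the original population: whatever distribution the population has, the distribution of its sample means approaches a normal distribution as the sample size grows, as in the figure above. This normal distribution is centered on the population mean $\mu$, and its variance satisfies $\sigma_{\bar{x}}^2 = \sigma^2 / n$, where $\sigma$ is the population standard deviation. Note that sampling must be repeated many times: a single sample of size $n$ yields only one sample mean and cannot by itself form a distribution.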
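A minimal simulation sketch of the experiment above. As an assumption, the population is exponential with rate 1, which is strongly right-skewed and has $\mu = 1$ and $\sigma^2 = 1$. For sample sizes 4 and 30 it draws 20,000 samples, then reports the mean, the variance (to compare with $\sigma^2/n$) and the skewness of the sample means; a skewness near 0 is what a normal shape would give.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return the means of `reps` independent samples, each of size n."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

def skewness(xs):
    """Mean cubed deviation divided by the cubed standard deviation (0 for a normal)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd**3

def draw_exponential(rng):
    # Assumed population: exponential with rate 1 (right-skewed, mean 1, variance 1).
    return rng.expovariate(1.0)

rng = random.Random(42)  # fixed seed for reproducibility
results = {}
for n in (4, 30):
    means = sample_means(draw_exponential, n, reps=20_000, rng=rng)
    results[n] = (statistics.fmean(means), statistics.pvariance(means), skewness(means))
    mean, var, skew = results[n]
    print(f"n={n}: mean={mean:.3f} var={var:.4f} (sigma^2/n={1 / n:.4f}) skew={skew:.2f}")
```

Both runs center on 1, the variance shrinks roughly like $1/n$, and the skewness drops as $n$ grows, even though the population itself is far from normal.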
Sampling distribution
What is a sampling distribution?
What is the distribution of the values that we could get for the statistic?
What is the frequency with which I can get different values for the statistic that is trying to estimate the parameter?
That distribution is a sampling distribution.
A sampling distribution of the sample mean for a sample size of 2
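For a tiny population the sampling distribution can be listed exactly instead of simulated. A minimal sketch, assuming a hypothetical population {1, 2, 3, 4} and samples of size 2 drawn with replacement, so each of the 16 ordered pairs is equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical population values

# Every ordered sample of size 2, drawn with replacement, is equally likely.
samples = list(product(population, repeat=2))
counts = Counter(Fraction(sum(s), len(s)) for s in samples)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, prob in sampling_dist.items():
    print(float(m), prob)

mean_of_means = sum(m * p for m, p in sampling_dist.items())
print("mean of the sample means:", mean_of_means)  # 5/2, the population mean
```

The most likely sample mean is 2.5 (probability 4/16), the distribution is symmetric around it, and its mean equals the population mean.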
Sampling Distribution of Sample Proportion
Example: draw balls from a bucket in which the proportion of yellow balls is p = 0.6.
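(Figures: the sampling distribution of the sample proportion for p = 0.6, p = 0.1 and p = 0.9; then for sample sizes n = 10 and n = 50, where n = 50 gives a tighter distribution.)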
The larger the sample size, the smaller the standard deviation of the sampling distribution: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
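A minimal sketch that computes the sampling distribution of the sample proportion exactly for the bucket example with p = 0.6. It assumes the number of yellow balls in a sample of size n follows a Binomial(n, p) distribution, which holds when drawing with replacement or from a very large bucket.

```python
from math import comb, sqrt

def sample_proportion_pmf(n, p):
    """Exact distribution of the sample proportion k/n, assuming k ~ Binomial(n, p)."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def mean_and_sd(pmf):
    """Mean and standard deviation of a discrete distribution given as {value: probability}."""
    mean = sum(x * w for x, w in pmf.items())
    var = sum((x - mean) ** 2 * w for x, w in pmf.items())
    return mean, sqrt(var)

p = 0.6  # proportion of yellow balls in the bucket
results = {}
for n in (10, 50):
    mean, sd = mean_and_sd(sample_proportion_pmf(n, p))
    results[n] = (mean, sd)
    print(f"n={n}: mean={mean:.4f} sd={sd:.4f} formula={sqrt(p * (1 - p) / n):.4f}")
```

Both sample sizes give a mean of 0.6, and the standard deviation falls from about 0.155 at n = 10 to about 0.069 at n = 50, matching $\sqrt{p(1-p)/n}$.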
Normal condition for the sampling distribution of the sample proportion
Under which conditions does the sampling distribution of the sample proportion look roughly normal, right-skewed, or left-skewed?
The mean of the sampling distribution of the sample proportion equals the population proportion $p$ (the population mean when each yellow ball counts as 1 and every other ball as 0).
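A common rule of thumb, often called the large counts condition, treats the sampling distribution of the sample proportion as roughly normal when $np \ge 10$ and $n(1-p) \ge 10$; some texts use 5 instead of 10. When the condition fails, the sign of the binomial skewness $(1-2p)/\sqrt{np(1-p)}$ gives the direction: small $p$ (such as 0.1) is right-skewed and large $p$ (such as 0.9) is left-skewed. A minimal sketch of that classification; the function names are hypothetical:

```python
from math import sqrt

def large_counts_ok(n, p):
    """Rule of thumb: at least 10 expected successes and 10 expected failures."""
    return n * p >= 10 and n * (1 - p) >= 10

def describe_shape(n, p):
    """Rough shape of the sampling distribution of the sample proportion."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if large_counts_ok(n, p):
        return "roughly normal"
    # Skewness of Binomial(n, p); dividing the count by n leaves it unchanged.
    skew = (1 - 2 * p) / sqrt(n * p * (1 - p))
    if skew > 0:
        return "right-skewed"
    if skew < 0:
        return "left-skewed"
    return "symmetric but coarse"

for n, p in [(10, 0.1), (10, 0.9), (10, 0.6), (50, 0.6)]:
    print(f"n={n}, p={p}: {describe_shape(n, p)}")
```

With n = 10, p = 0.1 expects only one yellow ball per sample, so the distribution piles up near 0 with a long right tail; with n = 50 and p = 0.6 both expected counts (30 and 20) clear the threshold.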