Statistics Summary

7. Parameter Estimation

  • Model and parameters
  • Properties of good estimators
    • Unbiasedness, consistency
    • UMVUE, efficiency
  • MLE
  • Bayesian Estimation
    • why?
    • Prior and Posterior
    • Conjugate distribution
    • Limitations

Reason: statistical estimation is not a general estimation problem; we estimate the parameter $\theta$ of a specified model $f(x;\theta)$ from i.i.d. samples.

  • Formulation:
    $X_1, X_2, \dots, X_n \ \text{i.i.d.} \sim f(x ; \theta), \quad \theta \in E \text{ unknown}$
    $\text{Estimator: } \hat{\theta}=\phi(X), \quad \phi: \mathbb{R}^{n} \rightarrow E$

Properties of Good Estimators:

Correctness:
  • Unbiasedness: the expectation of the estimator's sampling distribution equals the population parameter being estimated
    $E[\phi(X)]=\theta \text{ for } X \sim f(x ; \theta)$
  • Consistency: as the sample size grows, the estimator converges to the population parameter being estimated
    $\phi(X) \rightarrow \theta \text{ in probability for } X \sim f(x ; \theta)$
  • Example (see the simulation sketch after this list):
    $s^{2}=\frac{1}{n-1} \sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}$ (unbiased)
    $\hat{\sigma}^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}$ (biased, but consistent)
  • Accuracy: a good estimator should also have small variance (small mean squared error).
  • Efficiency: the estimator's variance attains the Cramér-Rao lower bound, at least asymptotically.
  • UMVUE is very restrictive; efficiency is a weaker condition.
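
A minimal simulation sketch (n = 10 and the N(0,1) population are illustrative choices) showing that $s^2$ is unbiased while $\hat\sigma^2$ is biased downward by the factor $(n-1)/n$:

set.seed(1)
n = 10
sims = replicate(10000, {
  x = rnorm(n)                        # population N(0,1), true variance 1
  ss = sum((x - mean(x))^2)
  c(ss / (n - 1), ss / n)             # s^2 and sigma.hat^2
})
rowMeans(sims)                        # approx. 1 (unbiased) and 0.9 = (n-1)/n (biased)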

Maximum Likelihood Estimation

Why?
MLE is a framework for designing consistent and efficient estimators under very general conditions.

Formulation

  • The likelihood function:
    $L(X ; \theta)=\prod_{i=1}^{n} f\left(X_{i} ; \theta\right), \quad X_i \ \text{i.i.d.} \sim f(x ; \theta), \ \theta \in E \text{ unknown}$
  • MLE: For given data samples X=x
    $\hat\theta=\underset{\theta \in E}{\operatorname{argmax}}\ L(x ; \theta), \quad \text{i.e.}\ \max_{\theta \in E} L(x ; \theta)=L(x ; \hat{\theta})$
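
A minimal sketch in R (the exponential model and simulated data are illustrative): the MLE is found by numerically minimizing the negative log-likelihood, and for Exp($\theta$) the closed form $\hat\theta = 1/\overline{x}$ checks the answer:

set.seed(1)
x = rexp(100, rate = 2)               # i.i.d. samples, true theta = 2
neg.loglik = function(theta) -sum(dexp(x, rate = theta, log = TRUE))
fit = optimize(neg.loglik, interval = c(1e-6, 100))
fit$minimum                           # numerical MLE
1 / mean(x)                           # closed-form MLE, should agree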

Limitations:

  • Solving the MLE, even numerically, can be very challenging.
  • MLE does not guarantee good performance in finite samples.

Bayesian Estimation

With Bayesian estimation, we can easily update our estimator as samples are collected sequentially.

Formulation:

  • $\theta \in E$, now treated as random with a distribution on $E$
  • $f_{0}(\theta)$ is the prior of $\theta$
  • $f_{1}(\theta)$ is the posterior, which gives the distribution of $\theta$ conditional on the data
    $f_{1}(\theta)=f(\theta \mid X)=\frac{L(x ; \theta) f_{0}(\theta)}{\int_{E} L(x ; u) f_{0}(u)\, d u}$

Sequential Bayesian Estimation
Intuitively, if more data $X_{n+1}, \dots, X_{n+m}$ are available, we can take the previous posterior $f_1$ as the new prior and update the belief again using the new data only:

$f_{2}(\theta)=\frac{L(x ; \theta) f_{1}(\theta)}{\int_{E} L(x ; u) f_{1}(u)\, d u}$
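
A minimal sketch with the conjugate Beta-Bernoulli pair (the Beta(1,1) prior and the simulated coin are illustrative): conjugacy makes the sequential update trivial, since each batch of data just adds its successes and failures to the Beta parameters:

set.seed(1)
a = b = 1                             # Beta(1,1) uniform prior on theta
x1 = rbinom(50, 1, 0.3)               # first batch of Bernoulli(0.3) samples
a = a + sum(x1); b = b + sum(1 - x1)  # posterior f1 = Beta(a, b)
x2 = rbinom(50, 1, 0.3)               # second batch arrives later
a = a + sum(x2); b = b + sum(1 - x2)  # f1 serves as the new prior, giving f2
a / (a + b)                           # posterior mean, close to 0.3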

Limitations:

  • Its dependence on the prior, which can be any distribution on E. A very strong prior can lead to an inconsistent estimator.
    • In the information-based trade example, what will happen if we pick p0 = 1?
      On the other hand, a weak prior could lead to slow convergence.
  • The computation of the posterior can be very costly when the parameter space E is large.

8. Confidence Interval

  • Three constructions of CI for i.i.d samples:
    • normal
    • t
    • bootstrap
  • When and how?

Central Limit Theorem

  • Theorem: $\{X_{i}\}$ is a sequence of i.i.d. samples of $X$ with $E[X]=\mu$ and
    $Var(X)=\sigma^{2}$. Then,
    $\frac{\sqrt{n}}{\sigma}\left(\overline{X}_{n}-\mu\right) \Rightarrow N(0,1)$
  • Therefore, when n is "large", for any $a > 0$
    $P\left(\left|\frac{\sqrt{n}}{\sigma}\left(\overline{X}_{n}-\mu\right)\right|>a\right) \approx P(|Z|>a)$
    where Z is a standard normal r.v.

Confidence Interval (z-distribution)

  • For any confidence level $a$, we simply choose $\phi$ such that
    $P(|Z|>\phi)=1-a$; then the $a$-level confidence interval is
    $\left[\overline{X}_{n}-\phi \frac{\sigma}{\sqrt{n}},\ \overline{X}_{n}+\phi \frac{\sigma}{\sqrt{n}}\right]$
  • A 95% CI means: if we repeated the sampling 100 times, roughly 95 of the resulting intervals would contain the true value, and about 5 would not.
    The standard error (s.e.) of the sample mean is $\sigma_{\overline{x}}=\sigma / \sqrt{n}$.
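
A minimal sketch in R (simulated data; $\sigma = 2$ is treated as known, which is what this construction assumes):

set.seed(1)
x = rnorm(400, mean = 5, sd = 2)      # sample with known sigma = 2
a = 0.95                              # confidence level
phi = qnorm(1 - (1 - a) / 2)          # phi = 1.96 solves P(|Z| > phi) = 1 - a
se = 2 / sqrt(length(x))              # sigma / sqrt(n)
c(mean(x) - phi * se, mean(x) + phi * se)   # the a-level CI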

The Effect of Sample Size

  • The magnitude of estimation error, measured by the half length of CI, is
    $\phi \frac{\sigma}{\sqrt{n}}$
  • In order to have the estimation error ≈ ε, we need the sample size
    $n \approx \frac{\phi^{2} \sigma^{2}}{\varepsilon^{2}}$
    Intuitively, to improve the estimation accuracy 10 times, we need to enlarge the sample size 100 times.
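    For instance (illustrative numbers), at the 95% level ($\phi \approx 1.96$) with $\sigma = 1$ and target error $\varepsilon = 0.01$:
    $n \approx \frac{1.96^{2} \times 1^{2}}{0.01^{2}} \approx 38{,}416$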

CI for Small Samples

  • Theorem: (CI of t-distribution)
    If $X_1, X_2, \dots, X_n$ are i.i.d. samples of a normal distribution $N(\mu, \sigma^{2})$, then
    $\frac{\sqrt{n}}{s}\left(\overline{X}_{n}-\mu\right) \sim t(n-1)$, a t-distribution with degree of freedom $n-1$.
  • Remark:
    • The t-distribution is more dispersed than the normal.
    • When n → ∞, t(n − 1) ⇒ N (0, 1).
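
A minimal sketch for a small normal sample (n = 12 is an illustrative choice); t.test reproduces the manual qt computation:

set.seed(1)
x = rnorm(12, mean = 5, sd = 2)
t.test(x)$conf.int                    # 95% t-based CI
phi = qt(0.975, df = length(x) - 1)   # t quantile replaces the normal one
mean(x) + c(-1, 1) * phi * sd(x) / sqrt(length(x))   # same interval by hand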

Bootstrap
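
A minimal sketch of a percentile bootstrap CI for the mean (B = 2000 resamples and the exponential sample are illustrative choices); it requires neither normality nor a large sample:

set.seed(1)
x = rexp(30)                          # small, clearly non-normal sample
B = 2000
boot.means = rep(0, B)
for (b in 1:B) boot.means[b] = mean(sample(x, replace = TRUE))  # resample with replacement
quantile(boot.means, c(0.025, 0.975)) # 95% percentile bootstrap CI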

9. Significance Test

  • Formulation of general hypothesis test
    • Parameter space
    • Hypothesis / Alternative
    • Hypothesis testing
  • Significance test
    • 5 steps
    • What is the intuition
    • How to choose the hypothesis and alternative
    • How to interpret the p-value
    • Type I and II errors

Steps of a Significance Test

  1. Assumptions: underlying probability model for population
  2. Hypothesis: Translate the statement or prediction in your research problem into a statement about the population parameter.
  3. Test Statistic: the test statistic measures how "far" the point estimate of the parameter is from its null-hypothesis value(s), assuming the null hypothesis is true.
  4. P-Value: the tail probability beyond the observed value of the test statistic, presuming the null hypothesis is true; it measures how implausible the observed outcome is under H0.
  5. Conclusion: Report and interpret the p-value in the context of the study. Make a decision about H0 based on p-value.

Type I & Type II errors & Interpreting P-Value

Inference on Single Variables

Population proportion
  • z-test
  • Difference from CI
  • Small sample: binomial test
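
A minimal sketch of the five significance-test steps for a proportion (the counts 560 out of 1000 are illustrative), testing H0: π = 0.5 against Ha: π ≠ 0.5; prop.test without continuity correction reproduces the hand computation:

x = 560; n = 1000                        # step 1: large-sample binomial data
p.hat = x / n                            # point estimate; step 2 is the hypothesis above
z = (p.hat - 0.5) / sqrt(0.5 * 0.5 / n)  # step 3: z statistic under H0
2 * pnorm(-abs(z))                       # step 4: two-sided p-value
prop.test(x, n, p = 0.5, correct = FALSE)$p.value  # same; step 5: reject H0 at 5%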

Population mean
  • t-test
  • Relation with CI
  • Small sample: bootstrap
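
A minimal sketch (simulated data; H0: μ = 5 is illustrative); note the duality with the CI: H0 is rejected at level α exactly when μ0 falls outside the (1 − α) confidence interval:

set.seed(1)
x = rnorm(40, mean = 5.5, sd = 2)
t.test(x, mu = 5)                     # t statistic, p-value, and 95% CI in one call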

Inference on Two Variables

  • Independent samples
    • Population proportion: z-test
    • Population mean: t-test
    • Small sample: permutation test
  • Paired data: t-test on the differences (a single-variable t-test)
  • z statistic for a proportion difference (its denominator is the standard error):
    $z=\frac{\left(p_{1}-p_{2}\right)-\left(\pi_{1}-\pi_{2}\right)}{\sqrt{\frac{p_{1}\left(1-p_{1}\right)}{n_{1}}+\frac{p_{2}\left(1-p_{2}\right)}{n_{2}}}}$

  • standard error of the mean difference:
    generally not required in this course; it will be given directly

  • Conclude CI:
    Given our estimate of the standard error for the estimated mean or proportion difference, we can construct the confidence interval for the mean or proportion difference:
    $\left[(\overline{x}-\overline{y})-\phi_{\alpha}\, se,\ (\overline{x}-\overline{y})+\phi_{\alpha}\, se\right]$
    The coefficient $\phi_{\alpha}$ is determined by $\alpha$ and the model assumptions (normal distribution for proportions, t distribution for means).
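
A minimal sketch computing the two-proportion z statistic and CI above by hand (the counts are illustrative); under H0: π1 = π2 the difference π1 − π2 is 0:

x1 = 120; n1 = 400; x2 = 90; n2 = 400    # successes and sample sizes
p1 = x1 / n1; p2 = x2 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # standard error of p1 - p2
z = (p1 - p2) / se                    # z statistic under H0
2 * pnorm(-abs(z))                    # two-sided p-value
(p1 - p2) + c(-1, 1) * qnorm(0.975) * se   # 95% CI for pi1 - pi2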

Permutation Test

Tests whether the two populations follow the same distribution.
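
A minimal sketch (simulated groups; 5000 permutations is an illustrative choice): under H0 the group labels are exchangeable, so the observed mean difference is compared with its distribution over random relabelings:

set.seed(1)
x = rnorm(15); y = rnorm(15, mean = 1)
obs = mean(x) - mean(y)               # observed test statistic
pooled = c(x, y)
perm = rep(0, 5000)
for (i in 1:5000) {
  idx = sample(length(pooled), length(x))   # random relabeling of the pooled data
  perm[i] = mean(pooled[idx]) - mean(pooled[-idx])
}
mean(abs(perm) >= abs(obs))           # two-sided permutation p-value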

Paired data

10. Multiple Regression

  • Assumptions
  • Interpretation of estimation results
  • Inference methods:
    • t-test for single coefficient
    • F-test for nested models
  • Residual analysis

Assumptions (linear regression model)

$y_{i}=\beta_{0}+\sum_{k=1}^{p} \beta_{k} g_{k}\left(x_{i k}\right)+\varepsilon_{i}$

where the functions $g_{k}$ are known. Besides, we assume the following conditions on $\varepsilon_{i}$:

  • Independence: the $\varepsilon_{i}$ are independent.
  • Zero mean: $E[\varepsilon \mid x]=0$ for all possible values of $x=(x_{1}, \dots, x_{m})$.
  • Equal variance: $Var(\varepsilon \mid x)=\sigma^{2}$.
  • Normality: the $\varepsilon_{i}$ are normal conditional on $x$.

T-test & F-test
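
A minimal sketch (simulated data; the regressors and transforms $g_k$ are illustrative): summary() reports the t-test for each single coefficient, and anova() performs the F-test comparing nested models:

set.seed(1)
x1 = runif(100); x2 = runif(100)
y = 1 + 2 * x1 + 0.5 * x2^2 + rnorm(100, sd = 0.3)
full = lm(y ~ x1 + I(x2^2))           # known transforms g_k enter via I()
reduced = lm(y ~ x1)                  # nested model without the x2 term
summary(full)                         # t-test for each coefficient
anova(reduced, full)                  # F-test for the nested comparison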

Residual analysis

  • DW test: tests whether the residuals are independent; the null hypothesis is that the residuals are independent (uncorrelated).
  • JB test: tests whether the residuals are normally distributed; the null hypothesis is that the residuals are normal.
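
A minimal sketch, assuming the CRAN packages lmtest and tseries are installed:

# install.packages(c("lmtest", "tseries"))   # one-time setup
library(lmtest); library(tseries)
set.seed(1)
x = runif(100); y = 1 + 2 * x + rnorm(100)
fit = lm(y ~ x)
dwtest(fit)                           # DW test, H0: residuals are uncorrelated
jarque.bera.test(residuals(fit))      # JB test, H0: residuals are normal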

Assumptions (logistic regression)
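
The standard logistic model assumes the log-odds are linear in the regressors:

$\log \frac{P(y_{i}=1 \mid x_{i})}{1-P(y_{i}=1 \mid x_{i})}=\beta_{0}+\sum_{k=1}^{p} \beta_{k} x_{i k}$

The R snippet below sketches an ROC curve by sweeping the classification threshold over every fitted probability; it assumes a data frame data with a column prob of fitted probabilities (e.g. from predict(fit, type = "response") on a glm fit) and a column obs of 0/1 outcomes: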

n = nrow(data)
tpr = fpr = rep(0, n)
# sort the fitted probabilities so the ROC curve is traced from left to right
thresholds = sort(data$prob, decreasing = TRUE)
# compute TPR and FPR for each threshold
for (i in 1:n)
{
  threshold = thresholds[i]
  tp = sum(data$prob > threshold & data$obs == 1)    # true positives
  fp = sum(data$prob > threshold & data$obs == 0)    # false positives
  fn = sum(data$prob <= threshold & data$obs == 1)   # false negatives
  tn = sum(data$prob <= threshold & data$obs == 0)   # true negatives
  tpr[i] = tp / (tp + fn)   # true positive rate (sensitivity)
  fpr[i] = fp / (fp + tn)   # false positive rate (1 - specificity)
}
# plot ROC with the 45-degree reference line
plot(fpr, tpr, type = 'l', ylim = c(0, 1), xlim = c(0, 1), main = 'ROC')
abline(a = 0, b = 1)


Reprinted from blog.csdn.net/weixin_37409506/article/details/90231581