[Statistical Learning Notes] Exercise Set 1


1.1.1 Maximum Likelihood Estimation for the Bernoulli Distribution

The Bernoulli distribution is defined by
$$P(X=1)=\theta,\quad P(X=0)=1-\theta$$
Suppose that in $n$ independent observations the random variable takes the value 1 exactly $k$ times and the value 0 the remaining $n-k$ times. The likelihood function is then:
$$L(\theta)=\prod_{i=1}^{n}P(x_i;\theta)=\theta^k(1-\theta)^{n-k}$$
Taking the logarithm:
$$\log L(\theta)=k\log\theta+(n-k)\log(1-\theta)$$
Differentiating with respect to $\theta$:
$$\frac{\partial\log L(\theta)}{\partial\theta}=\frac{k}{\theta}-\frac{n-k}{1-\theta}$$
The derivative is zero at $\theta=k/n$, and the log-likelihood is concave in $\theta$, so the maximum likelihood estimate of $\theta$ is $\hat\theta=k/n$.
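As a quick numerical check of the derivation above, the following minimal sketch compares the closed-form estimate $k/n$ with a brute-force grid search over the log-likelihood. The counts `k = 7` and `n = 20` and all function names are hypothetical, chosen only for illustration.

```python
import math


def bernoulli_log_likelihood(theta, k, n):
    """Log-likelihood k*log(theta) + (n-k)*log(1-theta), for 0 < theta < 1."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)


def mle_closed_form(k, n):
    """Closed-form maximum likelihood estimate k/n."""
    return k / n


def mle_grid_search(k, n, steps=10_000):
    """Maximize the log-likelihood over an open grid in (0, 1)."""
    # The endpoints 0 and 1 are excluded to avoid log(0).
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: bernoulli_log_likelihood(t, k, n))


# Hypothetical data: 7 ones out of 20 observations.
k, n = 7, 20
theta_closed = mle_closed_form(k, n)
theta_grid = mle_grid_search(k, n)
print(theta_closed, theta_grid)
```

The two values should agree up to the grid resolution, which is consistent with the stationary point found by setting the derivative to zero.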

1.1.2 Bayesian Estimation

Let $A_1,A_2,\cdots,A_n$ denote the $n$ observed outcomes. By Bayes' theorem:
$$P(\theta\mid A_1,A_2,\cdots,A_n)=\frac{P(A_1,A_2,\cdots,A_n\mid\theta)\,P(\theta)}{P(A_1,A_2,\cdots,A_n)}$$
The Bayesian estimate of $\theta$ is taken here to be the value that maximizes the posterior (the posterior mode). The denominator does not depend on $\theta$, and with a Beta prior $P(\theta)\propto\theta^{\alpha-1}(1-\theta)^{\beta-1}$ this gives:

$$\begin{aligned}\hat{\theta}&=\arg\max_\theta P(\theta\mid A_1,A_2,\cdots,A_n)\\&=\arg\max_\theta P(A_1,A_2,\cdots,A_n\mid\theta)\,P(\theta)\\&=\arg\max_\theta\theta^k(1-\theta)^{n-k}\theta^{\alpha-1}(1-\theta)^{\beta-1}\end{aligned}$$
Taking the logarithm and setting the derivative with respect to $\theta$ to zero gives
$$\hat\theta=\frac{k+(\alpha-1)}{n+(\alpha-1)+(\beta-1)}$$
where $\alpha$ and $\beta$ are the parameters of the Beta prior distribution on $\theta$.
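The closed-form posterior mode can be checked numerically in the same way. This is a minimal sketch: the counts and the prior parameters `alpha = 3`, `beta = 2` are hypothetical, and the grid search assumes $k+\alpha-1>0$ and $n-k+\beta-1>0$ so that the maximum lies strictly inside $(0,1)$.

```python
import math


def log_posterior_unnormalized(theta, k, n, alpha, beta):
    """Log of theta^(k+alpha-1) * (1-theta)^(n-k+beta-1), up to a constant."""
    return ((k + alpha - 1) * math.log(theta)
            + (n - k + beta - 1) * math.log(1 - theta))


def posterior_mode_closed_form(k, n, alpha, beta):
    """Closed-form posterior mode (k + alpha - 1) / (n + alpha + beta - 2)."""
    return (k + alpha - 1) / (n + (alpha - 1) + (beta - 1))


def posterior_mode_grid_search(k, n, alpha, beta, steps=10_000):
    """Maximize the unnormalized log-posterior over an open grid in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(
        grid,
        key=lambda t: log_posterior_unnormalized(t, k, n, alpha, beta),
    )


# Hypothetical data and prior: 7 ones out of 20, Beta(3, 2) prior.
k, n, alpha, beta = 7, 20, 3, 2
mode_closed = posterior_mode_closed_form(k, n, alpha, beta)
mode_grid = posterior_mode_grid_search(k, n, alpha, beta)
print(mode_closed, mode_grid)

# With a uniform prior Beta(1, 1) the posterior mode reduces to k/n.
print(posterior_mode_closed_form(k, n, 1, 1))
```

With $\alpha=\beta=1$ the prior is uniform and the estimate coincides with the maximum likelihood estimate from 1.1.1.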

1.2 Maximum Likelihood Estimation Is a Special Case of Empirical Risk Minimization

Empirical risk minimization solves the optimization problem:
$$\min_{f\in\mathcal{F}}\frac{1}{N}\sum_{i=1}^{N}L(y_i,f(x_i))$$
When the model is a conditional probability distribution and the loss function is the log loss $L(y,P(y\mid x))=-\log P(y\mid x)$, the problem becomes:
$$\min_{\theta\in\Theta}-\frac{1}{N}\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta)$$
Minimizing the negative of a function is the same as maximizing the function, so this is equivalent to maximum likelihood estimation:
$$\max_{\theta\in\Theta}\frac{1}{N}\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta)$$
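The equivalence can be illustrated with a small sketch. The conditional model $P(y=1\mid x;\theta)=\sigma(\theta x)$, the toy data set, and the grid of candidate $\theta$ values are all assumptions made for this example; they are not part of the exercise.

```python
import math


def prob_y_given_x(y, x, theta):
    """Hypothetical conditional model: P(y=1 | x; theta) = sigmoid(theta * x)."""
    p1 = 1.0 / (1.0 + math.exp(-theta * x))
    return p1 if y == 1 else 1.0 - p1


def empirical_risk(data, theta):
    """Average log loss: -(1/N) * sum_i log P(y_i | x_i; theta)."""
    return sum(-math.log(prob_y_given_x(y, x, theta)) for x, y in data) / len(data)


def avg_log_likelihood(data, theta):
    """Average log-likelihood: (1/N) * sum_i log P(y_i | x_i; theta)."""
    return sum(math.log(prob_y_given_x(y, x, theta)) for x, y in data) / len(data)


# Hypothetical (x, y) pairs; not linearly separable, so the optimum is finite.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]

# Candidate parameters theta in [-5, 5] with step 0.01.
thetas = [i / 100 for i in range(-500, 501)]

theta_erm = min(thetas, key=lambda t: empirical_risk(data, t))
theta_mle = max(thetas, key=lambda t: avg_log_likelihood(data, t))
print(theta_erm, theta_mle)
```

Because the empirical risk under log loss is exactly the negative of the average log-likelihood, the minimizer of one and the maximizer of the other are the same grid point.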


Reposted from: blog.csdn.net/qq_39573785/article/details/107210393