Solutions to the Chapter 4 exercises of Statistical Learning Methods (《统计学习方法》) (reposted + reformatted + my own commentary)

4.1 Use maximum likelihood estimation to derive the prior probability estimate (4.8) and the conditional probability estimate (4.9) of the naive Bayes method.

First, formula (4.8):
$$P(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}$$
################### Proof ###############################
In what follows, $a_{jl}$ denotes the $l$-th value that the $j$-th feature can take, and $x_i^{(j)}$ denotes the $j$-th feature of the $i$-th sample. Formula (4.9), to be proved afterwards, reads

$$P(x^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^{N} I(y_i=c_k)}$$

Let $p=P(Y=c_k)$. Drawing the training set is equivalent to drawing $N$ i.i.d. samples, where the label of the $i$-th sample is $y_i$.

The likelihood is
$$P(y_1,y_2,\dots,y_N)=p^{\sum_{i=1}^{N} I(y_i=c_k)}\cdot(1-p)^{\sum_{i=1}^{N} I(y_i\neq c_k)}$$
To maximize the likelihood, differentiate with respect to $p$ and set the derivative to zero:
$$\frac{dP(y_1,y_2,\dots,y_N)}{dp}=\sum_{i=1}^{N} I(y_i=c_k)\,p^{\sum_{i=1}^{N} I(y_i=c_k)-1}(1-p)^{\sum_{i=1}^{N} I(y_i\neq c_k)}-\sum_{i=1}^{N} I(y_i\neq c_k)\,(1-p)^{\sum_{i=1}^{N} I(y_i\neq c_k)-1}p^{\sum_{i=1}^{N} I(y_i=c_k)}$$
$$=p^{\left[\sum_{i=1}^{N} I(y_i=c_k)\right]-1}(1-p)^{\left[\sum_{i=1}^{N} I(y_i\neq c_k)\right]-1}\left[(1-p)\sum_{i=1}^{N} I(y_i=c_k)-p\sum_{i=1}^{N} I(y_i\neq c_k)\right]=0$$

Therefore
$$(1-p)\sum_{i=1}^{N} I(y_i=c_k)-p\sum_{i=1}^{N} I(y_i\neq c_k)=0,$$
and hence
$$\sum_{i=1}^{N} I(y_i=c_k)=p\left(\sum_{i=1}^{N} I(y_i=c_k)+\sum_{i=1}^{N} I(y_i\neq c_k)\right)=pN,$$
so
$$p=P(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}\qquad ①$$
This completes the proof of (4.8).
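As a quick sanity check on the maximization step above, here is a small sketch (assuming the `sympy` package is available) that maximizes the log-likelihood symbolically; writing $m=\sum_{i=1}^{N} I(y_i=c_k)$, it recovers $p=m/N$. The log is used only for convenience, since it has the same maximizer as the likelihood itself.

```python
# Symbolic sanity check of the MLE step (assumes sympy is installed).
import sympy as sp

p, m, N = sp.symbols('p m N', positive=True)

# m stands for sum_i I(y_i = c_k); this is the log of p**m * (1-p)**(N-m).
log_likelihood = m * sp.log(p) + (N - m) * sp.log(1 - p)

# d/dp log-likelihood = m/p - (N - m)/(1 - p); setting it to zero yields p = m/N.
print(sp.solve(sp.Eq(sp.diff(log_likelihood, p), 0), p))  # -> [m/N]
```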
###############################################
Next we prove (4.9):
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^{N} I(y_i=c_k)}$$
where $j\in[1,n]$, $l\in[1,S_j]$, $k\in[1,K]$.
############## Proof #########
Applying the same maximum likelihood argument to the joint event $\{Y=c_k,\;x^{(j)}=a_{jl}\}$ gives

$$P(Y=c_k,\,x^{(j)}=a_{jl})=\frac{\sum_{i=1}^{N} I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{N}\qquad ②$$

The left-hand side of the formula to be proved can be written, by the definition of conditional probability, as

$$P(x^{(j)}=a_{jl}\mid Y=c_k)=\frac{P(Y=c_k,\,x^{(j)}=a_{jl})}{P(Y=c_k)}\qquad ③$$
Next, substitute ① into the denominator of ③ and ② into its numerator, which gives
$$P(x^{(j)}=a_{jl}\mid Y=c_k)=\frac{\left[\frac{\sum_{i=1}^{N} I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{N}\right]}{\left[\frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}\right]}=\frac{\sum_{i=1}^{N} I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{\sum_{i=1}^{N} I(y_i=c_k)}$$

This completes the proof of (4.9).
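To make (4.8) and (4.9) concrete, here is a minimal counting sketch in plain Python; the toy dataset and all variable names are illustrative assumptions, not taken from the book.

```python
# MLE estimates (4.8) and (4.9) by direct counting on a hypothetical toy dataset.
from collections import Counter

samples = [  # ((x^(1), x^(2)), y)
    ((1, 'S'), -1), ((1, 'M'), -1), ((1, 'M'), 1),
    ((2, 'S'), -1), ((2, 'M'), 1),  ((2, 'L'), 1),
]
N = len(samples)

# (4.8): P(Y = c_k) = #{y_i = c_k} / N
class_counts = Counter(y for _, y in samples)
prior = {c: cnt / N for c, cnt in class_counts.items()}

# (4.9): P(X^(j) = a_jl | Y = c_k) = #{x_i^(j) = a_jl, y_i = c_k} / #{y_i = c_k}
def cond_prob(j, a_jl, c_k):
    joint = sum(1 for x, y in samples if x[j] == a_jl and y == c_k)
    return joint / class_counts[c_k]

print(prior)                 # {-1: 0.5, 1: 0.5}
print(cond_prob(1, 'M', 1))  # P(X^(2)='M' | Y=1) = 2/3
```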

####### Proof of (4.11) ########################
If the prior were uniform, we would have
$$p=\frac{1}{K}\;\Rightarrow\;pK-1=0\qquad(1)$$
In addition, from ①, i.e., formula (4.8), we have
$$pN-\sum_{i=1}^{N} I(y_i=c_k)=0\qquad(2)$$
Note: strictly speaking, the $p$ in (1) and the $p$ in (2) are not the same $p$. The $p$ in (1) corresponds to a perfectly uniform class distribution, whereas the $p$ in (2) is the value determined by the actual sample distribution.
Combining them as $(1)\cdot\lambda+(2)=0$:
$$\lambda(pK-1)+pN-\sum_{i=1}^{N} I(y_i=c_k)=0$$
Solving for $p$ gives
$$P(Y=c_k)=\frac{\lambda+\sum_{i=1}^{N} I(y_i=c_k)}{\lambda K+N}$$
This completes the proof of (4.11).
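The resulting formula is easy to exercise numerically; below is a minimal sketch of the smoothed prior (4.11) using the same hypothetical toy counts as before (with $\lambda=1$ this is Laplace smoothing).

```python
# Smoothed prior (4.11): (count + lambda) / (N + K * lambda).
def smoothed_prior(class_counts, N, K, lam=1.0):
    return {c: (cnt + lam) / (N + K * lam) for c, cnt in class_counts.items()}

# Toy counts matching the earlier example: 3 samples per class, N = 6, K = 2.
print(smoothed_prior({1: 3, -1: 3}, N=6, K=2, lam=1.0))  # {1: 0.5, -1: 0.5}
```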
###############################

####### Proof of (4.10) ########
From (4.9), the maximum likelihood estimate is
$$p=P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^{N} I(y_i=c_k)}$$
$$\Rightarrow\quad p\sum_{i=1}^{N} I(y_i=c_k)-\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)=0\qquad(3)$$

Observe that (4.10) is very similar to (4.9), except that (4.10) adds a smoothing term to both the numerator and the denominator. We introduce a smoothing condition: given $Y=c_k$, since the $j$-th feature has $S_j$ possible values, assume every value of that feature accounts for the same number of samples. Then
$$p=P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{1}{S_j}\;\Rightarrow\;p\cdot S_j-1=0\qquad(4)$$
Combining them as $(3)+\lambda\cdot(4)=0$:
$$p\left[\sum_{i=1}^{N} I(y_i=c_k)+S_j\lambda\right]-\lambda-\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)=0$$
Solving for $p$ gives
$$p=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)+\lambda}{\sum_{i=1}^{N} I(y_i=c_k)+S_j\lambda}$$
This completes the proof of (4.10).
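Finally, a matching sketch of the smoothed conditional probability (4.10); the data layout is the same hypothetical one used earlier, and for the sketch $S_j$ is taken as the number of distinct values of feature $j$ observed in the data.

```python
# Smoothed conditional (4.10):
#   (#{x_i^(j)=a_jl, y_i=c_k} + lambda) / (#{y_i=c_k} + S_j * lambda)
def smoothed_cond_prob(samples, j, a_jl, c_k, lam=1.0):
    joint = sum(1 for x, y in samples if x[j] == a_jl and y == c_k)
    class_count = sum(1 for _, y in samples if y == c_k)
    s_j = len({x[j] for x, _ in samples})  # distinct values of feature j
    return (joint + lam) / (class_count + s_j * lam)

samples = [
    ((1, 'S'), -1), ((1, 'M'), -1), ((1, 'M'), 1),
    ((2, 'S'), -1), ((2, 'M'), 1),  ((2, 'L'), 1),
]
# The unsmoothed estimate P(X^(2)='L' | Y=-1) would be 0/3;
# with lambda = 1 and S_2 = 3 it becomes (0 + 1) / (3 + 3) = 1/6.
print(smoothed_cond_prob(samples, 1, 'L', -1))  # 0.1666...
```

The point of the smoothing term is visible in the comment above: a feature value that never co-occurs with a class in the training set no longer receives probability zero.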


Reposted from blog.csdn.net/appleyuchi/article/details/82933297