Deriving the Loss Function and the Gradient Descent Formula for the Parameter w

Following Section 6.1 of Chapter 6 of *Statistical Learning Methods* (《统计学习方法》), we derive the loss function and the gradient descent update formula for the parameter $w$.
The sigmoid function is:
$$g(z)=\frac{1}{1+e^{-z}}$$
Given a sample $x$, a linear function combines its components (with $x_0=1$ so that $w_0$ is the bias term):
$$z=w_0+w_1x_1+w_2x_2+\dots+w_nx_n=\sum_{i=0}^{n}w_ix_i=w^TX$$
Applying the sigmoid function, the prediction function is:
$$h_w(x)=g(w^TX)=\frac{1}{1+e^{-w^TX}}$$
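The sigmoid and prediction function above can be sketched in NumPy as follows (a minimal sketch; the names `sigmoid` and `h` are mine, and `X` is assumed to carry a leading column of ones so that `w[0]` plays the role of $w_0$):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def h(w, X):
    # h_w(x) = g(w^T x), evaluated for every row x of X;
    # X is assumed to include a leading column of ones for the bias w_0.
    return sigmoid(X @ w)
```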
The class probabilities are:
$$P(Y=1\mid X)=h_w(x)$$
$$P(Y=0\mid X)=1-h_w(x)$$
These two cases combine into a single expression:
$$P(Y\mid X)=h_w(x)^y\,(1-h_w(x))^{1-y}$$

The maximum likelihood function is:
$$L(w)=\prod_{i=1}^m h_w(x_i)^{y_i}(1-h_w(x_i))^{1-y_i}$$
$$\log L(w)=\sum_{i=1}^m\log\left[h_w(x_i)^{y_i}(1-h_w(x_i))^{1-y_i}\right]=\sum_{i=1}^m\left[y_i\log h_w(x_i)+(1-y_i)\log(1-h_w(x_i))\right]$$
The loss function (the negative log-likelihood averaged over the $m$ samples):
$$\begin{aligned}J(w)&=-\frac{1}{m}\sum_{i=1}^m\left[y_i\log h_w(x_i)+(1-y_i)\log(1-h_w(x_i))\right]\\&=-\frac{1}{m}\sum_{i=1}^m\left[y_i\ln\frac{1}{1+e^{-wx_i}}+(1-y_i)\ln\frac{e^{-wx_i}}{1+e^{-wx_i}}\right]\\&=-\frac{1}{m}\sum_{i=1}^m\left[\ln\frac{1}{1+e^{wx_i}}+y_i\ln\frac{1}{e^{-wx_i}}\right]\\&=\frac{1}{m}\sum_{i=1}^m\left[-wx_iy_i+\ln(1+e^{wx_i})\right]\end{aligned}$$
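A direct NumPy transcription of both the cross-entropy form and the simplified final form can confirm the algebra (a sketch with my own function names; the rows of `X` are the $x_i$, including a leading 1 for the bias):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    # Simplified final form: J(w) = (1/m) * sum_i [ -w x_i y_i + ln(1 + e^{w x_i}) ]
    z = X @ w
    return np.mean(-y * z + np.log1p(np.exp(z)))

def loss_cross_entropy(w, X, y):
    # Original form: J(w) = -(1/m) * sum_i [ y_i log h + (1 - y_i) log(1 - h) ]
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

The two functions return the same value on any input, which matches the derivation above.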
For gradient descent, the gradient of $J(w)$ with respect to the $j$-th component $w_j$ is:
$$\frac{\partial J(w)}{\partial w_j}=\frac{1}{m}\sum_{i=1}^m\left[-x_{i,j}y_i+\frac{x_{i,j}\cdot e^{wx_i}}{1+e^{wx_i}}\right]=\frac{1}{m}\sum_{i=1}^m x_{i,j}\left(\frac{1}{1+e^{-wx_i}}-y_i\right)=\frac{1}{m}\sum_{i=1}^m\left[h_w(x_i)-y_i\right]x_{i,j}$$
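In vector form the per-component formula above is $\frac{1}{m}X^T(h_w(X)-y)$, which is how it is usually coded (a NumPy sketch under the same assumptions as before):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, X, y):
    # dJ/dw_j = (1/m) * sum_i (h_w(x_i) - y_i) * x_{i,j}, i.e. (1/m) X^T (h - y)
    m = X.shape[0]
    return X.T @ (sigmoid(X @ w) - y) / m
```

A quick finite-difference check of this analytic gradient against the loss is a standard way to verify the derivation.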

So the final update rule for each component $w_j$ (absorbing the constant factor $\frac{1}{m}$ into the learning rate $\alpha$) is:
$$w_j := w_j-\alpha\sum_{i=1}^m\left[h_w(x_i)-y_i\right]x_{i,j}$$
For stochastic gradient descent, which updates on a single sample $(x,y)$ at a time, the rule becomes:
$$w_j := w_j-\alpha\left[h_w(x)-y\right]x_j$$
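The two update rules above can be put together into complete training loops (a minimal NumPy sketch; the function names, learning rate, and iteration counts are my own illustrative choices, not from the original text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gd(X, y, alpha=0.1, iters=500):
    # Batch rule: w_j := w_j - alpha * sum_i (h_w(x_i) - y_i) * x_{i,j},
    # using all m samples at every step.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= alpha * (X.T @ (sigmoid(X @ w) - y))
    return w

def sgd(X, y, alpha=0.1, epochs=200, seed=0):
    # Stochastic rule: w_j := w_j - alpha * (h_w(x) - y) * x_j,
    # one shuffled sample at a time.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            w -= alpha * (sigmoid(X[i] @ w) - y[i]) * X[i]
    return w
```

On linearly separable data both loops drive $h_w(x_i)$ toward $y_i$; SGD trades the exact batch gradient for cheaper, noisier per-sample updates.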


Reposted from blog.csdn.net/qq_43019451/article/details/90415239