Solving with Gradient Descent

The Gradient Descent Solution Process

Linear Regression

Machine learning iterates on an objective function, driving its value toward the minimum.
For gradient descent on linear regression, the objective (cost) function is:

$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)^2$$
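
As a concrete reference, here is a minimal NumPy sketch of this cost function. The names `X`, `y`, and `theta` and the vectorized form are my own assumptions; the post itself gives only the formula.

```python
import numpy as np

def linear_cost(theta, X, y):
    """J(theta) = 1/(2m) * sum((h_theta(x_i) - y_i)^2).

    X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters,
    with the linear hypothesis h_theta(x) = X @ theta.
    """
    m = len(y)
    residual = X @ theta - y               # h_theta(x_i) - y_i for each sample
    return (residual @ residual) / (2 * m)
```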


Batch gradient descent:

$$\frac{\partial J(\theta)}{\partial\theta_j}=-\frac{1}{m}\sum_{i=1}^m\big(y_i-h_\theta(x_i)\big)x_{ij}$$

$$\theta_j'=\theta_j+\alpha\frac{1}{m}\sum_{i=1}^m\big(y_i-h_\theta(x_i)\big)x_{ij}$$

Batch gradient descent uses all $m$ samples per update, so each step is slow, but it moves reliably toward the optimum.
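
A sketch of one such full-batch update, reusing the setup above; the learning rate `alpha` and its default value are illustrative assumptions:

```python
def batch_gd_step(theta, X, y, alpha=0.1):
    """One batch gradient-descent update over all m samples."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m   # (1/m) * sum((h - y) * x_ij) for every j
    return theta - alpha * grad        # same as theta + alpha*(1/m)*sum((y - h)*x)
```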


Stochastic gradient descent:

$$\theta_j'=\theta_j+\alpha\big(y_i-h_\theta(x_i)\big)x_{ij}$$

Each update uses a single sample, so iteration is fast, but an individual step does not necessarily move in the direction of convergence.
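
The corresponding single-sample update might look like the sketch below; a smaller `alpha` is assumed here, since noisy individual steps usually call for one.

```python
def sgd_step(theta, x_i, y_i, alpha=0.01):
    """Stochastic update from one sample (x_i, y_i); x_i has shape (n,)."""
    h = x_i @ theta                        # prediction for this single sample
    return theta + alpha * (y_i - h) * x_i
```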


Mini-batch gradient descent (here with a batch size of 10):

$$\theta_j'=\theta_j-\alpha\frac{1}{10}\sum_{k=i}^{i+9}\big(h_\theta(x_k)-y_k\big)x_{kj}$$

Each update uses a small subset of the data, combining the strengths of the two approaches above.
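
A sketch of one mini-batch update with the batch size of 10 from the formula; the parameter names `start` and `alpha` are my own choices:

```python
def minibatch_gd_step(theta, X, y, start, alpha=0.1, batch=10):
    """Update from the batch of samples start .. start+batch-1."""
    Xb, yb = X[start:start + batch], y[start:start + batch]
    grad = Xb.T @ (Xb @ theta - yb) / len(yb)  # (1/batch) * sum((h - y) * x_kj)
    return theta - alpha * grad
```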


Logistic Regression

Logistic regression: regression analysis for classification data.
The sigmoid function:

$$g(z)=\frac{1}{1+e^{-z}}$$

Its argument can be any real number, and its output lies in the interval $(0,1)$.
It maps every value on the real line into $(0,1)$, converting a raw value into a probability.
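
This maps directly to code; a minimal sketch, reusing the NumPy import from above:

```python
def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

sigmoid(0.0)     # 0.5 -- the natural decision threshold
sigmoid(-10.0)   # ~4.5e-05, saturating toward 0
```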

Prediction function:

$$h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$$

where the linear combination (using the convention $x_0=1$ so the bias term $\theta_0$ is included in the sum) is:

$$\theta_0+\theta_1x_1+\cdots+\theta_nx_n=\sum_{i=0}^n\theta_ix_i=\theta^Tx$$
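
A sketch of the prediction function under the $x_0=1$ convention; the helper name `predict_proba` is mine, echoing common library naming:

```python
def predict_proba(theta, X):
    """h_theta(x) = g(theta^T x) for each row of X.

    X is assumed to carry a bias column x_0 = 1 so theta_0 takes effect.
    """
    return sigmoid(X @ theta)
```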

Classification task:

$$\begin{aligned}
p(y=1\mid x;\theta)&=h_\theta(x) &\cdots ①\\
p(y=0\mid x;\theta)&=1-h_\theta(x) &\cdots ②\\
p(y\mid x;\theta)&=\big(h_\theta(x)\big)^y\big(1-h_\theta(x)\big)^{1-y} &\cdots ③
\end{aligned}$$

For the binary task, ① and ② are merged into ③: substituting $y=1$ recovers ①, and $y=0$ recovers ②.

Likelihood function:

$$L(\theta)=\prod_{i=1}^mp(y_i\mid x_i;\theta)=\prod_{i=1}^m\big(h_\theta(x_i)\big)^{y_i}\big(1-h_\theta(x_i)\big)^{1-y_i}$$

Log-likelihood:

$$l(\theta)=\log L(\theta)=\sum_{i=1}^m\Big(y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big)$$

Introduce $J(\theta)=-\frac{1}{m}l(\theta)$; maximizing the log-likelihood then becomes minimizing $J(\theta)$, which gradient descent handles directly.
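
A sketch of this loss, assuming labels $y_i\in\{0,1\}$ and reusing `predict_proba` from above (no numerical clipping of $h$ is done here, for brevity):

```python
def logistic_loss(theta, X, y):
    """J(theta) = -(1/m) * l(theta): the average negative log-likelihood."""
    m = len(y)
    h = predict_proba(theta, X)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```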

Solution process (the step from the third to the fourth line below uses the sigmoid derivative $g'(z)=g(z)\big(1-g(z)\big)$):
$$\begin{aligned}
l(\theta)=\log L(\theta)&=\sum_{i=1}^m\Big(y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big)\\
\frac{\partial J(\theta)}{\partial\theta_j}&=-\frac{1}{m}\sum_{i=1}^m\bigg(y_i\frac{1}{h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i)-(1-y_i)\frac{1}{1-h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i)\bigg)\\
&=-\frac{1}{m}\sum_{i=1}^m\bigg(y_i\frac{1}{g(\theta^Tx_i)}-(1-y_i)\frac{1}{1-g(\theta^Tx_i)}\bigg)\frac{\partial}{\partial\theta_j}g(\theta^Tx_i)\\
&=-\frac{1}{m}\sum_{i=1}^m\bigg(y_i\frac{1}{g(\theta^Tx_i)}-(1-y_i)\frac{1}{1-g(\theta^Tx_i)}\bigg)g(\theta^Tx_i)\big(1-g(\theta^Tx_i)\big)\frac{\partial}{\partial\theta_j}\theta^Tx_i\\
&=-\frac{1}{m}\sum_{i=1}^m\Big(y_i\big(1-g(\theta^Tx_i)\big)-(1-y_i)g(\theta^Tx_i)\Big)x_{ij}\\
&=-\frac{1}{m}\sum_{i=1}^m\big(y_i-g(\theta^Tx_i)\big)x_{ij}\\
&=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)x_{ij}
\end{aligned}$$
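
One way to sanity-check this derivation is to compare the closed-form gradient against a finite-difference estimate of $\partial J/\partial\theta_j$; a sketch, with both helper names being mine:

```python
def logistic_grad(theta, X, y):
    """Closed form from the derivation: dJ/dtheta = (1/m) * X^T (h - y)."""
    return X.T @ (predict_proba(theta, X) - y) / len(y)

def numeric_partial(theta, X, y, j, eps=1e-6):
    """Finite-difference estimate of dJ/dtheta_j, for checking the algebra."""
    t_plus, t_minus = theta.copy(), theta.copy()
    t_plus[j] += eps
    t_minus[j] -= eps
    return (logistic_loss(t_plus, X, y) - logistic_loss(t_minus, X, y)) / (2 * eps)
```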

Logistic regression parameter update:
$$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)x_{ij}$$
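
Putting the pieces together, a hypothetical end-to-end run on synthetic data; the data shapes, seed, learning rate, and iteration count are all illustrative assumptions:

```python
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])  # bias column x_0 = 1
true_theta = np.array([0.5, 2.0, -1.0])
# Draw Bernoulli labels with p = sigmoid(X @ true_theta)
y = (rng.uniform(size=100) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
for _ in range(2000):
    # theta_j := theta_j - alpha * (1/m) * sum((h_theta(x_i) - y_i) * x_ij)
    theta = theta - 0.1 * logistic_grad(theta, X, y)
```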


Reprinted from blog.csdn.net/rankiy/article/details/103383258