[Machine Learning Notes] Gradient Descent

Copyright notice: original post; please credit the source when reprinting! https://blog.csdn.net/caipengbenren/article/details/89469771

Notes on Gradient Descent from Andrew Ng's Machine Learning course.


Linear Regression Model
$h_\theta(x) = \theta_0 + \theta_1 x$
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
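As a minimal sketch of these two formulas (assuming NumPy arrays `x` and `y` holding the $m$ training examples; the function names are mine, not from the course):

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x, applied element-wise."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum of squared prediction errors."""
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return (errors ** 2).sum() / (2 * m)
```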


Gradient Descent algorithm

repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
  (for $j = 0$ and $j = 1$)
}
$\alpha$ is the learning rate; it controls the size of each gradient descent step.
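For the linear model above, the partial derivatives work out to $\frac{\partial}{\partial \theta_0} J = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})$ and $\frac{\partial}{\partial \theta_1} J = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)}$. A sketch, continuing the snippet above (the helper name is again just illustrative):

```python
def gradients(theta0, theta1, x, y):
    """Partial derivatives of J(theta0, theta1) for the linear model."""
    m = len(y)
    errors = (theta0 + theta1 * x) - y   # h_theta(x^(i)) - y^(i), element-wise
    d_theta0 = errors.sum() / m          # (1/m) * sum(h - y)
    d_theta1 = (errors * x).sum() / m    # (1/m) * sum((h - y) * x)
    return d_theta0, d_theta1
```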


Correct: Simultaneous update
$\text{temp0} := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\text{temp1} := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0 := \text{temp0}$
$\theta_1 := \text{temp1}$

Notice that $\theta_0$ and $\theta_1$ must be updated simultaneously: compute both temp values before assigning either one.
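Putting the pieces together, a sketch of the full loop (reusing the hypothetical `gradients` helper above, with a fixed iteration budget standing in for a convergence test):

```python
def gradient_descent(x, y, alpha, n_iters=1000):
    """Repeat the simultaneous update n_iters times."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        d0, d1 = gradients(theta0, theta1, x, y)
        temp0 = theta0 - alpha * d0    # compute both temps first...
        temp1 = theta1 - alpha * d1
        theta0, theta1 = temp0, temp1  # ...then overwrite both at once
    return theta0, theta1
```

If the update were not simultaneous (assigning $\theta_0$ before computing $\theta_1$'s gradient), the second derivative term would be evaluated at a mixed point, which is a subtly different algorithm.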

A different starting point may lead to a different result (gradient descent can settle into a different local minimum), as the pictures below show.
[Figures: the cost surface $J(\theta_0, \theta_1)$, with gradient descent runs from two different starting points ending in two different local minima]


The size of the learning rate $\alpha$

If $\alpha$ is too small, gradient descent can be slow.
If $\alpha$ is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
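To see both failure modes on a toy dataset where the answer is known ($y = 2x$, so the optimum is $\theta_0 = 0$, $\theta_1 = 2$), one can run the sketch above with a tiny, a moderate, and an overly large step; note that what counts as "too large" depends on the data, and $\alpha = 0.3$ happens to diverge on this particular set:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])    # exact fit at theta0 = 0, theta1 = 2

for alpha in (0.0001, 0.1, 0.3):      # too small / reasonable / too large here
    t0, t1 = gradient_descent(x, y, alpha, n_iters=200)
    print(f"alpha={alpha}: theta0={t0:.4g}, theta1={t1:.4g}")
```

With $\alpha = 0.0001$ the parameters have barely moved after 200 steps, with $\alpha = 0.1$ they land near $(0, 2)$, and with $\alpha = 0.3$ they blow up instead of settling, which is exactly the divergence the text warns about.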
