Machine Learning Algorithms 2: Gradient Descent

Part 2 of the Machine Learning Algorithms series

  1. Content of this article: derivation of the gradient descent algorithm
  2. Based on multivariate linear regression
  3. Math-focused

I. Logical Derivation

  1. We have the data Data (superscripts are row/sample indices, subscripts are column/feature indices):
    $$Data=\begin{bmatrix} x_1^1 & x_2^1 & \dots & x_n^1 & y^1 \\ x_1^2 & x_2^2 & \dots & x_n^2 & y^2 \\ x_1^3 & x_2^3 & \dots & x_n^3 & y^3 \\ \vdots & \vdots & & \vdots & \vdots \\ x_1^m & x_2^m & \dots & x_n^m & y^m \end{bmatrix}$$
    Let $x\_data=\begin{bmatrix} x_1^1 & x_2^1 & \dots & x_n^1 \\ x_1^2 & x_2^2 & \dots & x_n^2 \\ x_1^3 & x_2^3 & \dots & x_n^3 \\ \vdots & \vdots & & \vdots \\ x_1^m & x_2^m & \dots & x_n^m \end{bmatrix}$
    Let $y\_data=\begin{bmatrix} y^1 \\ y^2 \\ y^3 \\ \vdots \\ y^m \end{bmatrix}$

  2. Define the regression hypothesis $h_\theta(x)=\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n$.
    Substituting any point $P(x_1,x_2,\dots,x_n)$ into this expression and comparing the result with its $y$ value gives $d$, the error of point P, which can be read as its "distance" to the regression hyperplane.

  3. The goal of linear regression: find suitable parameters $(\theta_1,\theta_2,\dots,\theta_n)$ defining the hyperplane $h_\theta(x)$
    such that the sum of squared errors obtained by substituting all the training points into it is minimized.

That is: $\min\left(\sum d^2\right)$
That is: achieve the best possible fit.

  4. Based on item 3, build the cost function:
    $$J(\theta_1,\theta_2,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m \left(h_\theta(x^i)-y^i\right)^2$$
    This function substitutes all m training points and measures (halved and averaged over m) the sum of their squared distances to the regression hyperplane.

  5. A shift of perspective: during training, the x and y in the cost function are known quantities and the $\theta$ are the variables.
    The goal stated in item 3 is therefore equivalent to finding the variables $(\theta_1,\theta_2,\dots,\theta_n)$ at which $J(\theta_1,\theta_2,\dots,\theta_n)$ attains its minimum value (see the short sketch below).
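As a quick illustration, here is a minimal NumPy sketch of this cost function. The `x_data`/`y_data` values are hypothetical toy numbers (not from the article); `cost` simply evaluates $J(\theta)$ as defined above.

```python
import numpy as np

# Hypothetical toy data (illustrative only): m = 4 samples, n = 2 features.
x_data = np.array([[1.0, 2.0],
                   [2.0, 1.0],
                   [3.0, 4.0],
                   [4.0, 3.0]])
y_data = np.array([5.0, 4.0, 11.0, 10.0])

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^i) - y^i)^2."""
    m = X.shape[0]
    residuals = X @ theta - y          # h_theta(x^i) - y^i for every sample i
    return residuals @ residuals / (2 * m)

print(cost(np.array([1.0, 2.0]), x_data, y_data))
```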

II. Gradient Descent

  • Concept:
    Repeatedly move the variables $(\theta_1,\theta_2,\dots,\theta_n)$ toward the values $(\omega_1,\omega_2,\dots,\omega_n)$ at which the function attains its extremum, thereby obtaining the parameters $(\theta_1,\theta_2,\dots,\theta_n)$ that minimize the cost function.

Method 1:

Method 1 is easier to understand, but its computational cost is higher and the derivatives are harder to work out.


  1. Assign random initial values to the variables $(\theta_1,\theta_2,\dots,\theta_n)$, for example:
    $\theta_1=1,\ \theta_2=5,\ \theta_3=0.5,\ \dots,\ \theta_n=10$

  2. Take the partial derivative of the cost function with respect to each variable $(\theta_1,\theta_2,\dots,\theta_n)$, giving the gradient $\nabla J(\theta_1,\theta_2,\dots,\theta_n)=\left(\frac{\partial f}{\partial\theta_1},\frac{\partial f}{\partial\theta_2},\frac{\partial f}{\partial\theta_3},\dots,\frac{\partial f}{\partial\theta_n}\right)$

    $$\frac{\partial f}{\partial\theta_1}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_1^i \\ \frac{\partial f}{\partial\theta_2}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_2^i \\ \dots \\ \frac{\partial f}{\partial\theta_n}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_n^i$$

  3. Set a learning rate, which controls how far the variables move in each iteration:
    $lR=0.001$ (this value is only an example; adjust it to the actual problem)

  4. Update each variable in turn:
    $$\theta_1=\theta_1-lR\frac{\partial f}{\partial\theta_1} \\ \theta_2=\theta_2-lR\frac{\partial f}{\partial\theta_2} \\ \theta_3=\theta_3-lR\frac{\partial f}{\partial\theta_3} \\ \dots \\ \theta_n=\theta_n-lR\frac{\partial f}{\partial\theta_n}$$

5. Repeat step 4 enough times to obtain variables $(\theta_1,\theta_2,\dots,\theta_n)$ that are extremely close to the values $(\omega_1,\omega_2,\dots,\omega_n)$ at which the cost function attains its minimum (a code sketch follows).
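A sketch of Method 1 in NumPy, assuming the same hypothetical `x_data`/`y_data` arrays as in the earlier snippet; the nested loops mirror steps 1–5 literally, which is exactly why this method is the slower one.

```python
import numpy as np

def gradient_descent_loops(X, y, lR=0.001, n_iters=10000):
    """Method 1: per-parameter updates written out with explicit sums."""
    m, n = X.shape
    theta = np.random.rand(n)                  # step 1: random initial values
    for _ in range(n_iters):                   # step 5: repeat step 4 enough times
        grad = np.zeros(n)
        for j in range(n):                     # step 2: partial derivative w.r.t. theta_j
            for i in range(m):
                h_i = X[i] @ theta             # h_theta(x^i)
                grad[j] += (h_i - y[i]) * X[i, j]
            grad[j] /= m
        theta = theta - lR * grad              # steps 3-4: move by learning rate * gradient
    return theta

# Usage with the hypothetical arrays from the earlier snippet:
# theta_hat = gradient_descent_loops(x_data, y_data)
```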

Method 2

Method 2: write the cost function in matrix form, which greatly reduces both the computational cost and the difficulty of taking derivatives.

  1. Matricize the sub-expressions (the shapes of these quantities are checked in the sketch after these bullets)
  • $\theta=\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \vdots \\ \theta_n \end{bmatrix}$

  • $J(\theta_1,\theta_2,\dots,\theta_n) \Rightarrow J(\theta)$

  • $y^i\Rightarrow Y=y\_data=\begin{bmatrix} y^1 \\ y^2 \\ y^3 \\ \vdots \\ y^m \end{bmatrix}$

  • $x^i\Rightarrow X=x\_data=\begin{bmatrix} x_1^1 & x_2^1 & \dots & x_n^1 \\ x_1^2 & x_2^2 & \dots & x_n^2 \\ x_1^3 & x_2^3 & \dots & x_n^3 \\ \vdots & \vdots & & \vdots \\ x_1^m & x_2^m & \dots & x_n^m \end{bmatrix}$

  • $h_\theta(x^i)\Rightarrow \begin{bmatrix} h_\theta(x^1) \\ h_\theta(x^2) \\ h_\theta(x^3) \\ \vdots \\ h_\theta(x^m) \end{bmatrix}=\begin{bmatrix} \theta_1x_1^1+\theta_2x_2^1+\dots+\theta_nx_n^1 \\ \theta_1x_1^2+\theta_2x_2^2+\dots+\theta_nx_n^2 \\ \theta_1x_1^3+\theta_2x_2^3+\dots+\theta_nx_n^3 \\ \vdots \\ \theta_1x_1^m+\theta_2x_2^m+\dots+\theta_nx_n^m \end{bmatrix}=X\cdot\theta$

  • $\frac{\partial f}{\partial\theta_i}\Rightarrow \nabla J(\theta)=\begin{bmatrix} \frac{\partial f}{\partial\theta_1} \\ \frac{\partial f}{\partial\theta_2} \\ \frac{\partial f}{\partial\theta_3} \\ \vdots \\ \frac{\partial f}{\partial\theta_n} \end{bmatrix}$
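To make the matricization concrete, here is a small shape-check sketch; the sizes and random values are hypothetical and only illustrate how the matricized quantities fit together.

```python
import numpy as np

m, n = 4, 2                                 # hypothetical sizes, matching the toy data above
X = np.random.rand(m, n)                    # x_data: one row per sample, one column per feature
Y = np.random.rand(m, 1)                    # y_data as a column vector
theta = np.random.rand(n, 1)                # parameter column vector

print((X @ theta).shape)                    # (m, 1): the column vector of h_theta(x^i)
print((X.T @ (X @ theta - Y)).shape)        # (n, 1): same shape as nabla J(theta)
```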

2. Derivation steps

Because
$$\frac{\partial f}{\partial\theta_1}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_1^i \\ \frac{\partial f}{\partial\theta_2}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_2^i \\ \dots \\ \frac{\partial f}{\partial\theta_n}=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_n^i$$

it follows that
$$\nabla J(\theta)=\begin{bmatrix} \frac{\partial f}{\partial\theta_1} \\ \frac{\partial f}{\partial\theta_2} \\ \frac{\partial f}{\partial\theta_3} \\ \vdots \\ \frac{\partial f}{\partial\theta_n} \end{bmatrix}=\frac{1}{m}\begin{bmatrix} x_1^1(h_\theta(x^1)-y^1)+x_1^2(h_\theta(x^2)-y^2)+\dots+x_1^m(h_\theta(x^m)-y^m) \\ x_2^1(h_\theta(x^1)-y^1)+x_2^2(h_\theta(x^2)-y^2)+\dots+x_2^m(h_\theta(x^m)-y^m) \\ x_3^1(h_\theta(x^1)-y^1)+x_3^2(h_\theta(x^2)-y^2)+\dots+x_3^m(h_\theta(x^m)-y^m) \\ \vdots \\ x_n^1(h_\theta(x^1)-y^1)+x_n^2(h_\theta(x^2)-y^2)+\dots+x_n^m(h_\theta(x^m)-y^m) \end{bmatrix}$$
After a neat rearrangement this becomes
$$\nabla J(\theta)=\begin{bmatrix} \frac{\partial f}{\partial\theta_1} \\ \frac{\partial f}{\partial\theta_2} \\ \frac{\partial f}{\partial\theta_3} \\ \vdots \\ \frac{\partial f}{\partial\theta_n} \end{bmatrix}=\frac{1}{m}\begin{bmatrix} x_1^1 & x_2^1 & \dots & x_n^1 \\ x_1^2 & x_2^2 & \dots & x_n^2 \\ x_1^3 & x_2^3 & \dots & x_n^3 \\ \vdots & \vdots & & \vdots \\ x_1^m & x_2^m & \dots & x_n^m \end{bmatrix}^T\begin{bmatrix} h_\theta(x^1)-y^1 \\ h_\theta(x^2)-y^2 \\ h_\theta(x^3)-y^3 \\ \vdots \\ h_\theta(x^m)-y^m \end{bmatrix}$$

Transforming once more:

$$\nabla J(\theta)=\begin{bmatrix} \frac{\partial f}{\partial\theta_1} \\ \frac{\partial f}{\partial\theta_2} \\ \frac{\partial f}{\partial\theta_3} \\ \vdots \\ \frac{\partial f}{\partial\theta_n} \end{bmatrix}=\frac{1}{m}\begin{bmatrix} x_1^1 & x_2^1 & \dots & x_n^1 \\ x_1^2 & x_2^2 & \dots & x_n^2 \\ x_1^3 & x_2^3 & \dots & x_n^3 \\ \vdots & \vdots & & \vdots \\ x_1^m & x_2^m & \dots & x_n^m \end{bmatrix}^T \left( \begin{bmatrix} h_\theta(x^1) \\ h_\theta(x^2) \\ h_\theta(x^3) \\ \vdots \\ h_\theta(x^m) \end{bmatrix}-\begin{bmatrix} y^1 \\ y^2 \\ y^3 \\ \vdots \\ y^m \end{bmatrix} \right)$$

Substituting the matricized sub-expressions gives:
$$\nabla J(\theta)=\frac{1}{m}X^T(X\theta-Y)$$

And because
$$\theta_1=\theta_1-lR\frac{\partial f}{\partial\theta_1} \\ \theta_2=\theta_2-lR\frac{\partial f}{\partial\theta_2} \\ \theta_3=\theta_3-lR\frac{\partial f}{\partial\theta_3} \\ \dots \\ \theta_n=\theta_n-lR\frac{\partial f}{\partial\theta_n}$$

substituting the sub-expressions and $\nabla J(\theta)$ gives the final update rule:
$$\theta=\theta-lR\,\nabla J(\theta)$$
In this equation, everything except $lR$ is a vector or matrix.

Repeating this update enough times yields variables $(\theta_1,\theta_2,\dots,\theta_n)$ that are extremely close to the values $(\omega_1,\omega_2,\dots,\omega_n)$ at which the cost function attains its minimum.
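Putting the final update rule into a vectorized NumPy sketch (again keeping the 1/m factor for consistency with Method 1; argument names and usage values are hypothetical):

```python
import numpy as np

def gradient_descent_vectorized(X, Y, lR=0.001, n_iters=10000):
    """Method 2: theta = theta - lR * (1/m) * X^T (X theta - Y)."""
    m, n = X.shape
    theta = np.random.rand(n, 1)               # random initialization
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - Y) / m       # nabla J(theta) in matrix form
        theta = theta - lR * grad              # the final update rule
    return theta

# Usage with the hypothetical data from the earlier snippets (Y as a column vector):
# theta_hat = gradient_descent_vectorized(x_data, y_data.reshape(-1, 1))
```

Compared with the explicit loops of Method 1, the single matrix product per iteration is what makes this form so much cheaper.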


Reposted from blog.csdn.net/weixin_44341114/article/details/88539037