Machine Learning Algorithms 3: Ridge Regression

Part 3 of the machine learning algorithm series

  1. Topic of this post: deriving the ridge regression algorithm
  2. Builds on multiple linear regression
  3. Math-heavy

Background:

When solving linear regression with the normal equation method, the matrix derivation of the cost function breaks down once x_data has more features than rows (samples): the closed form below no longer applies, because $X^TX$ is not full rank and therefore has no inverse.

$$\theta=(X^TX)^{-1}X^TY$$
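A quick numerical illustration of this failure (a minimal sketch; the shapes and random data here are made up for demonstration, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8                        # 5 samples but 8 features: n > m
X = rng.normal(size=(m, n))

XtX = X.T @ X                      # n x n Gram matrix, but its rank is at most m
print(np.linalg.matrix_rank(XtX))  # prints 5 (< 8): XtX is singular, no inverse exists
```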


The Idea of Ridge Regression

  • To fix this, a penalty term $\frac{\lambda}{m}\sum_{j=1}^n\theta_j^2$ is introduced into the original cost function.

Standard (normal equation) cost function:
$$J(\theta_1,\theta_2,\dots,\theta_n)=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)^2\tag{1}$$

Ridge regression cost function:
$$J(\theta_1,\theta_2,\dots,\theta_n)=\frac{1}{m}\left\{\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)^2+\lambda\sum_{j=1}^n\theta_j^2\right\}\tag{2}$$

Writing (2) in matrix form and solving gives:
$$\theta=\left(X^TX+\lambda I\right)^{-1}X^TY\tag{3}$$
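Equation (3) translates directly into code. A minimal sketch (the function name `ridge_theta` and the data are my own, not from the original post); `np.linalg.solve` is used instead of forming the inverse explicitly, which is the standard numerically safer choice:

```python
import numpy as np

def ridge_theta(X, Y, lam):
    """Closed-form ridge solution: theta = (X^T X + lam*I)^{-1} X^T Y."""
    n = X.shape[1]
    # Solve (X^T X + lam*I) theta = X^T Y rather than inverting explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)

# Works even when there are more features than samples:
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Y = rng.normal(size=(5, 1))
theta = ridge_theta(X, Y, lam=0.1)
print(theta.shape)  # (8, 1)
```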

Advantages of Ridge Regression

  1. It fixes the inversion problem of the normal equation method (described above; see the sketch after this list).
  2. Tuning $\lambda$ can produce better estimates (this is L2 regularization; a dedicated post will analyze it later).
  3. It mitigates multicollinearity (also outside the scope of this post).
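On point 1: adding $\lambda I$ shifts every eigenvalue of the positive semi-definite matrix $X^TX$ up by $\lambda>0$, so the sum is positive definite and always invertible. A small sketch, continuing the made-up shapes from above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # more features than samples, as before
XtX = X.T @ X
lam = 0.1

# X^T X is positive semi-definite with some (near-)zero eigenvalues ...
print(np.round(np.linalg.eigvalsh(XtX), 6))
# ... adding lam*I shifts every eigenvalue up by lam, so the matrix
# becomes positive definite and can always be inverted.
print(np.round(np.linalg.eigvalsh(XtX + lam * np.eye(8)), 6))
```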

Derivation

  • The derivation below expands the transformation from equation (2) to equation (3).

  • First, assume we have data Data (subscripts index the columns/features, superscripts index the rows/samples):

$$\begin{bmatrix}
x_1^1 & x_2^1 & \dots & x_n^1 & y^1 \\
x_1^2 & x_2^2 & \dots & x_n^2 & y^2 \\
x_1^3 & x_2^3 & \dots & x_n^3 & y^3 \\
\vdots & \vdots & & \vdots & \vdots \\
x_1^m & x_2^m & \dots & x_n^m & y^m
\end{bmatrix}$$

Let

$$x\_data=\begin{bmatrix}
x_1^1 & x_2^1 & \dots & x_n^1 \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
x_1^3 & x_2^3 & \dots & x_n^3 \\
\vdots & \vdots & & \vdots \\
x_1^m & x_2^m & \dots & x_n^m
\end{bmatrix},
\qquad
y\_data=\begin{bmatrix} y^1 \\ y^2 \\ y^3 \\ \vdots \\ y^m \end{bmatrix}$$
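In NumPy terms, a minimal sketch with arbitrary values (m and n as in the text):

```python
import numpy as np

m, n = 6, 3                         # m samples (rows), n features (columns)
rng = np.random.default_rng(0)
x_data = rng.normal(size=(m, n))    # row i is sample x^i, column j is feature x_j
y_data = rng.normal(size=(m, 1))    # column vector of targets y^1..y^m
print(x_data.shape, y_data.shape)   # (6, 3) (6, 1)
```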


  • Next, convert each sub-expression of cost function (2) into matrix form:

$$\theta=\begin{bmatrix}\theta_1\\\theta_2\\\theta_3\\\vdots\\\theta_n\end{bmatrix}\tag{4}$$


$$J(\theta_1,\theta_2,\dots,\theta_n)\Rightarrow J(\theta)\tag{5}$$


$$\sum_{j=1}^n\theta_j^2=\theta_1^2+\theta_2^2+\theta_3^2+\dots+\theta_n^2=\theta^T\theta\tag{6}$$


$$y^i\Rightarrow Y=\begin{bmatrix}y^1\\y^2\\y^3\\\vdots\\y^m\end{bmatrix}\tag{7}$$


$$x^i\Rightarrow X=\begin{bmatrix}
x_1^1 & x_2^1 & \dots & x_n^1 \\
x_1^2 & x_2^2 & \dots & x_n^2 \\
x_1^3 & x_2^3 & \dots & x_n^3 \\
\vdots & \vdots & & \vdots \\
x_1^m & x_2^m & \dots & x_n^m
\end{bmatrix}\tag{8}$$


$$h_\theta(x^i)\Rightarrow\begin{bmatrix}h_\theta(x^1)\\h_\theta(x^2)\\h_\theta(x^3)\\\vdots\\h_\theta(x^m)\end{bmatrix}
=\begin{bmatrix}
\theta_1x_1^1+\theta_2x_2^1+\dots+\theta_nx_n^1\\
\theta_1x_1^2+\theta_2x_2^2+\dots+\theta_nx_n^2\\
\theta_1x_1^3+\theta_2x_2^3+\dots+\theta_nx_n^3\\
\vdots\\
\theta_1x_1^m+\theta_2x_2^m+\dots+\theta_nx_n^m
\end{bmatrix}=X\theta\tag{9}$$
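A quick sanity check of (9), as a sketch with made-up values: computing each $h_\theta(x^i)=\sum_j\theta_j x_j^i$ row by row gives the same vector as the single product $X\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
X = rng.normal(size=(m, n))
theta = rng.normal(size=(n, 1))

# h_theta(x^i) = sum_j theta_j * x_j^i, computed row by row ...
h_rows = np.array([(X[i] * theta.ravel()).sum() for i in range(m)]).reshape(m, 1)
# ... matches the single matrix product X @ theta from (9).
print(np.allclose(h_rows, X @ theta))  # True
```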


  • Finally, carry out the transformation. Start from (2):

$$J(\theta_1,\theta_2,\dots,\theta_n)=\frac{1}{m}\left\{\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)^2+\lambda\sum_{j=1}^n\theta_j^2\right\}\tag{2}$$


$$\Rightarrow J(\theta_1,\theta_2,\dots,\theta_n)=\frac{1}{m}\left\{\left[h_\theta(x^1)-y^1\right]^2+\left[h_\theta(x^2)-y^2\right]^2+\left[h_\theta(x^3)-y^3\right]^2+\dots+\left[h_\theta(x^m)-y^m\right]^2+\lambda\left(\theta_1^2+\theta_2^2+\theta_3^2+\dots+\theta_n^2\right)\right\}$$

Applying the identity
$$\begin{bmatrix}a\\b\\c\end{bmatrix}^T\begin{bmatrix}a\\b\\c\end{bmatrix}=a^2+b^2+c^2,$$
the expression above can be rewritten as
$$\Rightarrow J(\theta)=\frac{1}{m}\left\{\begin{bmatrix}h_\theta(x^1)-y^1\\h_\theta(x^2)-y^2\\h_\theta(x^3)-y^3\\\vdots\\h_\theta(x^m)-y^m\end{bmatrix}^T\begin{bmatrix}h_\theta(x^1)-y^1\\h_\theta(x^2)-y^2\\h_\theta(x^3)-y^3\\\vdots\\h_\theta(x^m)-y^m\end{bmatrix}+\lambda\theta^T\theta\right\}$$


$$\Rightarrow J(\theta)=\frac{1}{m}\left\{\left(\begin{bmatrix}h_\theta(x^1)\\h_\theta(x^2)\\h_\theta(x^3)\\\vdots\\h_\theta(x^m)\end{bmatrix}-\begin{bmatrix}y^1\\y^2\\y^3\\\vdots\\y^m\end{bmatrix}\right)^T\left(\begin{bmatrix}h_\theta(x^1)\\h_\theta(x^2)\\h_\theta(x^3)\\\vdots\\h_\theta(x^m)\end{bmatrix}-\begin{bmatrix}y^1\\y^2\\y^3\\\vdots\\y^m\end{bmatrix}\right)+\lambda\theta^T\theta\right\}$$

Substituting the matrix sub-expressions (7)–(9):

$$J(\theta)=\frac{1}{m}\left[(X\theta-Y)^T(X\theta-Y)+\lambda\theta^T\theta\right]$$

$$\Rightarrow J(\theta)=\frac{1}{m}\left[\theta^TX^TX\theta-\theta^TX^TY-Y^TX\theta+Y^TY+\lambda\theta^T\theta\right]$$

Differentiating with standard matrix-calculus identities (note that $\theta^TX^TY$ and $Y^TX\theta$ are equal scalars):
$$\frac{dJ(\theta)}{d\theta}=\frac{1}{m}\left[2X^TX\theta-2X^TY+2\lambda\theta\right]$$

Setting the derivative to zero and cancelling the factor of 2 gives
$$\theta=\left(X^TX+\lambda I\right)^{-1}X^TY,$$
which is exactly equation (3).
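Putting it all together, a sketch (with made-up data) that checks both ends of the derivation: the matrix form of the cost equals the elementwise sum in (2), and the gradient vanishes at the closed-form $\theta$ from (3):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 6, 3, 0.5
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m, 1))

def cost_matrix(theta):
    """Matrix form: (1/m) [ (X theta - Y)^T (X theta - Y) + lam * theta^T theta ]."""
    r = X @ theta - Y
    return (r.T @ r + lam * theta.T @ theta).item() / m

def cost_sum(theta):
    """Elementwise form of equation (2)."""
    s = sum((X[i] @ theta.ravel() - Y[i, 0]) ** 2 for i in range(m))
    return (s + lam * (theta.ravel() ** 2).sum()) / m

theta = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ Y)   # equation (3)

print(np.isclose(cost_matrix(theta), cost_sum(theta)))        # True: the two forms agree
grad = (2 / m) * (X.T @ X @ theta - X.T @ Y + lam * theta)    # derivative from above
print(np.allclose(grad, 0))                                   # True: theta is a stationary point
```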


Reposted from blog.csdn.net/weixin_44341114/article/details/88555645