Machine Learning: Understanding Matrix and Vector Derivatives

Preface

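Most people know how to differentiate a function, but usually only in the scalar-by-scalar case, or for a function of several variables with respect to one of them. Machine learning (and deep learning in particular) routinely needs vector-by-vector and matrix-by-vector derivatives, so it is well worth mastering these rules.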

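This article moves from simple cases to complex ones and explains how to differentiate with respect to matrices and vectors. It can also serve as a handbook: look up the relevant rule when you need it.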

1. Derivatives Between Vectors and Scalars

1.1 Vector by Scalar

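Vectors can be row vectors or column vectors. Differentiating either kind with respect to a single variable works the same way and is easy to understand.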

By convention, a bare vector symbol $\mathbf x$ denotes a column vector; a row vector is written with a transpose, $\mathbf x^T$.

  • Column vector
    $\mathbf y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}$, where $x$ is a scalar; then $\frac{\partial \mathbf y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_n}{\partial x} \end{bmatrix}$

  • Row vector
    $\mathbf y^T=[y_1,y_2,\cdots,y_n]$, where $x$ is a scalar; then $\frac{\partial \mathbf y^T}{\partial x} = \left[\frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},\cdots,\frac{\partial y_n}{\partial x}\right]$
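
As a concrete check of the componentwise definition, here is a minimal Python sketch that estimates $\frac{\partial \mathbf y}{\partial x}$ with central finite differences. The helper name `d_vector_d_scalar` and the step `h` are illustrative assumptions, not something from the original. A plain list holds the $n$ partials; reading it as a column or as a row is only the layout convention above, the numbers are identical.

```python
import math


def d_vector_d_scalar(f, x, h=1e-6):
    """Estimate dy/dx for y = f(x), where f maps a scalar to a list of n
    numbers. Returns [dy_1/dx, ..., dy_n/dx] using central differences."""
    y_plus, y_minus = f(x + h), f(x - h)
    return [(a - b) / (2 * h) for a, b in zip(y_plus, y_minus)]


# Example: y(x) = [x, x^2, sin x], whose exact derivative is [1, 2x, cos x].
y = lambda x: [x, x ** 2, math.sin(x)]
dy = d_vector_d_scalar(y, 0.5)
print(dy)  # close to [1.0, 1.0, cos(0.5)]
```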

1.2 Scalar by Vector

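This is the reverse of the vector-by-scalar case: numerator and denominator swap roles. The idea is the same, though: the result is a new vector made of the derivatives of the scalar with respect to each element of the vector.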

  • Column vector
    $\mathbf x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}$, where $y$ is a scalar; then $\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_n} \end{bmatrix}$
  • Row vector
    $\mathbf x^T=[x_1,x_2,\cdots,x_n]$, where $y$ is a scalar; then $\frac{\partial y}{\partial \mathbf x^T} = \left[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\cdots,\frac{\partial y}{\partial x_n}\right]$
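
The same finite-difference idea gives the scalar-by-vector derivative: perturb one $x_i$ at a time. Below is a minimal sketch; `d_scalar_d_vector` is a hypothetical helper name, and the test function $y = x_1^2 + x_2^2 + x_3^2$ is an example chosen here, with exact partials $2x_i$.

```python
def d_scalar_d_vector(f, x, h=1e-6):
    """Estimate the partials of a scalar function f w.r.t. each x_i.
    Returns [df/dx_1, ..., df/dx_n]; read as a column this is df/dx,
    read as a row it is df/dx^T."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Example: y = x_1^2 + x_2^2 + x_3^2, so dy/dx_i = 2 * x_i.
f = lambda x: sum(v * v for v in x)
g = d_scalar_d_vector(f, [1.0, -2.0, 3.0])
print(g)  # close to [2.0, -4.0, 6.0]
```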

2. Derivatives Between Matrices and Scalars

2.1 Matrix by Scalar

Differentiating a matrix by a scalar works like the vector case: each element is differentiated with respect to the variable.

Let $Y= \begin{bmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{m1} & \cdots & y_{mn} \end{bmatrix}$ and let $x$ be a scalar; then
$$\frac{\partial Y}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}$$
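
A minimal sketch of the elementwise rule, again using central differences (the helper name `d_matrix_d_scalar` and the example $Y(x)$ are assumptions made here for illustration). The result has the same $m \times n$ shape as $Y$.

```python
def d_matrix_d_scalar(F, x, h=1e-6):
    """Estimate dY/dx for an m x n matrix Y = F(x), given as a list of rows.
    Every entry y_ij is differentiated with respect to the scalar x."""
    Y_plus, Y_minus = F(x + h), F(x - h)
    return [
        [(a - b) / (2 * h) for a, b in zip(row_p, row_m)]
        for row_p, row_m in zip(Y_plus, Y_minus)
    ]


# Example: Y(x) = [[x, x^2], [3x, 5]], so dY/dx = [[1, 2x], [3, 0]].
Y = lambda x: [[x, x ** 2], [3 * x, 5.0]]
dY = d_matrix_d_scalar(Y, 2.0)
print(dY)  # close to [[1.0, 4.0], [3.0, 0.0]]
```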

2.2 Scalar by Matrix

Let $X= \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}$ and let $y$ be a scalar; then
$$\frac{\partial y}{\partial X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{m1}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix}$$
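
To see the layout in action, the sketch below perturbs each $x_{ij}$ separately. The example $y = \mathbf a^T X \mathbf b = \sum_{i,j} a_i x_{ij} b_j$ is chosen here (not from the original); its partial with respect to $x_{ij}$ is $a_i b_j$, so $\frac{\partial y}{\partial X} = \mathbf a \mathbf b^T$. The helper `d_scalar_d_matrix` is hypothetical.

```python
def d_scalar_d_matrix(f, X, h=1e-6):
    """Estimate dy/dX for a scalar y = f(X); X is a p x q list of rows.
    The result has X's shape, with entry (i, j) = dy/dx_ij."""
    p, q = len(X), len(X[0])
    out = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            out[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return out


# Example: y = a^T X b, whose partial w.r.t. x_ij is a_i * b_j.
a = [1.0, 2.0]
b = [3.0, -1.0, 0.5]
f = lambda X: sum(a[i] * X[i][j] * b[j] for i in range(2) for j in range(3))
X0 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
G = d_scalar_d_matrix(f, X0)  # should match the outer product a b^T
```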

3. Vector by Vector

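Derivatives between vectors are a little more involved than the scalar cases, but once you understand the scalar-by-vector derivative, a vector-by-vector derivative is just a combination of several scalar-by-vector derivatives. We discuss four cases: column by row, row by column, column by column, and row by row.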

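Vector-by-vector derivatives are simple to state but easy to mix up; it is common to lose track of which quantity is differentiated with respect to which. Here is a mnemonic I like: look at the denominator (the lower $\partial$). If it is a row vector, keep the numerator whole and split along the denominator; if it is a column vector, keep the denominator whole and split along the numerator. After this single step, every element of the result is a derivative between a scalar and a vector, which is easy to compute.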

3.1 Column Vector by Row Vector

Given $\mathbf y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}$ and $\mathbf x^T =[x_1,x_2,\cdots,x_n]$, find $\frac{\partial \mathbf y}{\partial \mathbf x^T}$.

The denominator is a row vector, so keep $\partial \mathbf y$ whole and split along $\partial \mathbf x^T$:


$$\frac{\partial \mathbf y}{\partial \mathbf x^T} = \left[\frac{\partial \mathbf y}{\partial x_1},\cdots,\frac{\partial \mathbf y}{\partial x_n}\right] = \begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}\end{bmatrix}$$

The last step holds because each $\frac{\partial \mathbf y}{\partial x_i} = \begin{bmatrix}\frac{\partial y_1}{\partial x_i} \\ \vdots \\ \frac{\partial y_m}{\partial x_i} \end{bmatrix}$ is a column vector; placing these $n$ columns side by side gives an $m \times n$ matrix.
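
This $m \times n$ matrix, with entry $(i, j) = \partial y_i / \partial x_j$, is the familiar Jacobian. The sketch below builds it column by column, exactly as in the derivation: column $j$ is $\frac{\partial \mathbf y}{\partial x_j}$. The helper name `d_col_d_row` and the linear test map $\mathbf y = A\mathbf x$ (for which the answer is $A$ itself) are illustrative choices.

```python
def d_col_d_row(f, x, h=1e-6):
    """Estimate dy/dx^T for a column y = f(x) of length m and x of length n.
    Column j of the result is dy/dx_j, so entry (i, j) is dy_i/dx_j (m x n)."""
    n = len(x)
    cols = []  # cols[j] = [dy_1/dx_j, ..., dy_m/dx_j]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(x_plus), f(x_minus))])
    m = len(cols[0])
    # Place the n columns side by side.
    return [[cols[j][i] for j in range(n)] for i in range(m)]


# Example: y = A x, so dy_i/dx_j = A[i][j] and dy/dx^T = A.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
f = lambda x: [sum(A[i][k] * x[k] for k in range(3)) for i in range(2)]
J = d_col_d_row(f, [0.3, -0.7, 1.1])
```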

3.2 Row Vector by Column Vector

Given $\mathbf y^T =[y_1,y_2,\cdots,y_m]$ and $\mathbf x=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}$, find $\frac{\partial \mathbf y^T}{\partial \mathbf x}$.

The denominator is a column vector, so keep it whole and split along the numerator:


$$\frac{\partial \mathbf y^T}{\partial \mathbf x} = \left[\frac{\partial y_1}{\partial \mathbf x},\cdots,\frac{\partial y_m}{\partial \mathbf x}\right] = \begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}\end{bmatrix}$$

The last step holds because each $\frac{\partial y_i}{\partial \mathbf x} = \begin{bmatrix}\frac{\partial y_i}{\partial x_1} \\ \vdots \\ \frac{\partial y_i}{\partial x_n} \end{bmatrix}$ is a column vector; the result consists of these $m$ columns, an $n \times m$ matrix.
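
Comparing with 3.1, this $n \times m$ result has entry $(j, i) = \partial y_i / \partial x_j$, so it is the transpose of the column-by-row result. The sketch below (hypothetical helper `d_row_d_col`, same illustrative map $\mathbf y = A\mathbf x$) makes column $i$ the gradient $\frac{\partial y_i}{\partial \mathbf x}$ and should therefore return $A^T$.

```python
def d_row_d_col(f, x, h=1e-6):
    """Estimate dy^T/dx for y = f(x) (length m) and a column x (length n).
    Column i of the result is the gradient dy_i/dx, so entry (j, i) is
    dy_i/dx_j and the result is n x m."""
    out = []  # row j = [dy_1/dx_j, ..., dy_m/dx_j]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        out.append([(a - b) / (2 * h) for a, b in zip(f(x_plus), f(x_minus))])
    return out


# Example: y = A x, so the result should be A transposed.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
f = lambda x: [sum(A[i][k] * x[k] for k in range(3)) for i in range(2)]
D = d_row_d_col(f, [0.3, -0.7, 1.1])
```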

3.3 Column Vector by Column Vector

Given $\mathbf y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}$ and $\mathbf x=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}$, find $\frac{\partial \mathbf y}{\partial \mathbf x}$.

The denominator is a column vector, so split along the numerator:

$$\frac{\partial \mathbf y}{\partial \mathbf x} =\begin{bmatrix} \frac{\partial y_1}{\partial \mathbf x}\\ \frac{\partial y_2}{\partial \mathbf x}\\ \vdots\\ \frac{\partial y_m}{\partial \mathbf x}\end{bmatrix} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} \\ \vdots \\ \frac{\partial y_1}{\partial x_n} \\ \vdots \\ \frac{\partial y_m}{\partial x_1} \\ \vdots \\ \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$

The result is one long column vector of length $mn$.
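
The only subtle point here is the ordering of the $mn$ entries: all partials of $y_1$ come first, then those of $y_2$, and so on. A minimal sketch (hypothetical helper `d_col_d_col`, illustrative map $\mathbf y = A\mathbf x$) returns the long column as a flat list; for a linear map it should equal $A$ read row by row.

```python
def d_col_d_col(f, x, h=1e-6):
    """Estimate dy/dx with y = f(x) (length m) and x (length n) both columns.
    Stacks the n x 1 gradients dy_1/dx, ..., dy_m/dx into one column of
    length m*n, returned as a flat list."""
    n = len(x)
    partials = []  # partials[j][i] = dy_i/dx_j
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        partials.append([(a - b) / (2 * h) for a, b in zip(f(x_plus), f(x_minus))])
    m = len(partials[0])
    # Outer loop over y_i, inner loop over x_j:
    # dy_1/dx_1, ..., dy_1/dx_n, dy_2/dx_1, ..., dy_m/dx_n
    return [partials[j][i] for i in range(m) for j in range(n)]


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
f = lambda x: [sum(A[i][k] * x[k] for k in range(3)) for i in range(2)]
v = d_col_d_col(f, [0.3, -0.7, 1.1])  # approximately [1, 2, 3, 4, 5, 6]
```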

3.4 Row Vector by Row Vector

Given $\mathbf y^T =[y_1,y_2,\cdots,y_m]$ and $\mathbf x^T=[x_1,x_2,\cdots,x_n]$, find $\frac{\partial \mathbf y^T}{\partial \mathbf x^T}$.

The denominator is a row vector, so split along the denominator:

$$\frac{\partial \mathbf y^T}{\partial \mathbf x^T} =\left[ \frac{\partial \mathbf y^T}{\partial x_1}, \frac{\partial \mathbf y^T}{\partial x_2} ,\cdots, \frac{\partial \mathbf y^T}{\partial x_n}\right] = \left[\frac{\partial y_1}{\partial x_1},\cdots,\frac{\partial y_m}{\partial x_1},\cdots,\frac{\partial y_1}{\partial x_n},\cdots,\frac{\partial y_m}{\partial x_n}\right]$$

The result is one long row vector of length $mn$.
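
Here the ordering is reversed relative to 3.3: the derivatives with respect to $x_1$ come first, then those with respect to $x_2$, and so on. In the sketch below (hypothetical helper `d_row_d_row`, illustrative map $\mathbf y = A\mathbf x$) the flat result should equal $A$ read column by column.

```python
def d_row_d_row(f, x, h=1e-6):
    """Estimate dy^T/dx^T with y = f(x) (length m) and x (length n) both rows.
    Concatenates the 1 x m rows dy^T/dx_1, ..., dy^T/dx_n into one row of
    length m*n, returned as a flat list."""
    out = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        # Append [dy_1/dx_j, ..., dy_m/dx_j] for this x_j.
        out.extend((a - b) / (2 * h) for a, b in zip(f(x_plus), f(x_minus)))
    return out


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
f = lambda x: [sum(A[i][k] * x[k] for k in range(3)) for i in range(2)]
r = d_row_d_row(f, [0.3, -0.7, 1.1])  # approximately [1, 4, 2, 5, 3, 6]
```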

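Derivatives between vectors take a bit more effort to understand and remember, but working through them carefully yourself is enough to master them.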

4. Derivatives Between Matrices and Vectors

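Matrix-vector derivatives can be handled by analogy with the vector-vector case: a matrix is made up of vectors, so we only need to decompose it into vectors and simplify step by step.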

4.1 Matrix by Column Vector

Given $Y= \begin{bmatrix}y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{m1} & \cdots & y_{mn}\end{bmatrix}$ and $\mathbf x = \begin{bmatrix}x_1 \\ x_2 \\ \vdots\\ x_p\end{bmatrix}$, find $\frac{\partial Y}{\partial \mathbf x}$.

By the same reasoning, $\mathbf x$ is a column vector, so it stays whole and we split $Y$; here I split along rows and columns at once:

$$\frac{\partial Y}{\partial \mathbf x}= \begin{bmatrix} \frac{\partial y_{11}}{\partial \mathbf x} & \cdots & \frac{\partial y_{1n}}{\partial \mathbf x}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_{m1}}{\partial \mathbf x} & \cdots & \frac{\partial y_{mn}}{\partial \mathbf x} \end{bmatrix}$$
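
Each block $\frac{\partial y_{ij}}{\partial \mathbf x}$ is a $p \times 1$ column, so the assembled result is an $mp \times n$ matrix whose entry at row $ip + k$, column $j$ (0-based) is $\partial y_{ij}/\partial x_k$. The sketch below (hypothetical helper `d_matrix_d_col`, example $Y(\mathbf x)$ chosen here) builds exactly that layout with central differences.

```python
def d_matrix_d_col(F, x, h=1e-6):
    """Estimate dY/dx for an m x n matrix Y = F(x) and a column x of length p.
    Block (i, j) is the p x 1 column dy_ij/dx, so the result is (m*p) x n
    with entry [i*p + k][j] = dy_ij/dx_k."""
    p = len(x)
    dY = []  # dY[k] = entrywise dY/dx_k, an m x n matrix
    for k in range(p):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        dY.append([
            [(a - b) / (2 * h) for a, b in zip(row_p, row_m)]
            for row_p, row_m in zip(F(x_plus), F(x_minus))
        ])
    m, n = len(dY[0]), len(dY[0][0])
    # Row index runs over i (outer) and k (inner), i.e. i*p + k.
    return [[dY[k][i][j] for j in range(n)] for i in range(m) for k in range(p)]


# Example with m = n = p = 2.
F = lambda x: [[x[0] * x[1], x[0]], [x[1] ** 2, 3 * x[0] + x[1]]]
R = d_matrix_d_col(F, [2.0, 5.0])
# Expected 4 x 2 result: [[5, 1], [2, 0], [0, 3], [10, 1]]
```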

4.2 Matrix by Row Vector

Given $Y= \begin{bmatrix}y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{m1} & \cdots & y_{mn}\end{bmatrix}$ and $\mathbf x^T = [x_1,x_2,\cdots,x_p]$, find $\frac{\partial Y}{\partial \mathbf x^T}$.

$\mathbf x^T$ is a row vector, so split directly along the denominator:

$$\frac{\partial Y}{\partial \mathbf x^T} = \left[\frac{\partial Y}{\partial x_1},\frac{\partial Y}{\partial x_2},\cdots,\frac{\partial Y}{\partial x_p}\right]$$
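
Here the $p$ blocks $\frac{\partial Y}{\partial x_k}$ (each $m \times n$, computed as in 2.1) sit side by side, giving an $m \times np$ matrix with entry at row $i$, column $kn + j$ (0-based) equal to $\partial y_{ij}/\partial x_k$. A minimal sketch with the hypothetical helper `d_matrix_d_row`, reusing an illustrative $Y(\mathbf x)$:

```python
def d_matrix_d_row(F, x, h=1e-6):
    """Estimate dY/dx^T = [dY/dx_1, ..., dY/dx_p] for an m x n matrix Y = F(x).
    The p blocks (each m x n) sit side by side, giving an m x (n*p) result
    with entry [i][k*n + j] = dy_ij/dx_k."""
    p = len(x)
    dY = []  # dY[k] = entrywise dY/dx_k, an m x n matrix
    for k in range(p):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        dY.append([
            [(a - b) / (2 * h) for a, b in zip(row_p, row_m)]
            for row_p, row_m in zip(F(x_plus), F(x_minus))
        ])
    m, n = len(dY[0]), len(dY[0][0])
    return [[dY[k][i][j] for k in range(p) for j in range(n)] for i in range(m)]


F = lambda x: [[x[0] * x[1], x[0]], [x[1] ** 2, 3 * x[0] + x[1]]]
R = d_matrix_d_row(F, [2.0, 5.0])
# Expected 2 x 4 result: [[5, 1, 2, 0], [0, 3, 10, 1]]
```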

4.3 Column Vector by Matrix

Given $\mathbf y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}$ and $X= \begin{bmatrix}x_{11} & \cdots & x_{1q} \\ \vdots & \ddots & \vdots \\ x_{p1} & \cdots & x_{pq}\end{bmatrix}$, find $\frac{\partial \mathbf y}{\partial X}$.

When the denominator is a matrix, split it into vectors first and then apply the rules above.

Write $X = [\mathbf x_1,\mathbf x_2,\cdots,\mathbf x_q]$ as a row of its column vectors; $X$ can then be treated as a row vector and split along:

$$\frac{\partial \mathbf y}{\partial X} = \left[\frac{\partial \mathbf y}{\partial \mathbf x_1},\frac{\partial \mathbf y}{\partial \mathbf x_2},\cdots,\frac{\partial \mathbf y}{\partial \mathbf x_q} \right]$$
Each $\frac{\partial \mathbf y}{\partial \mathbf x_i}$ is then computed with the column-by-column rule.

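There is another way to view this, by analogy with the scalar-by-matrix derivative. When the denominator is a matrix, look at the numerator instead: if the numerator is a column vector, split along the numerator; otherwise, split along the denominator.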

This gives
$$\frac{\partial \mathbf y}{\partial X} = \begin{bmatrix} \frac{\partial y_1}{\partial X} \\ \vdots\\ \frac{\partial y_m}{\partial X} \end{bmatrix}$$
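
Both routes should produce the same $mp \times q$ matrix, whose entry at row $kp + r$, column $c$ (0-based) is $\partial y_k/\partial x_{rc}$. The sketch below computes all partials once and then assembles them both ways, to check that they agree. The helper names (`partials`, `split_columns_of_X`, `split_entries_of_y`) and the example $\mathbf y(X)$ are hypothetical, chosen only for illustration.

```python
def partials(f, X, h=1e-6):
    """D[r][c] = [dy_1/dx_rc, ..., dy_m/dx_rc] for y = f(X), X a p x q matrix."""
    p, q = len(X), len(X[0])
    D = [[None] * q for _ in range(p)]
    for r in range(p):
        for c in range(q):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[r][c] += h
            X_minus[r][c] -= h
            D[r][c] = [(a - b) / (2 * h) for a, b in zip(f(X_plus), f(X_minus))]
    return D


def split_columns_of_X(D, m, p, q):
    """Route 1: [dy/dx_1, ..., dy/dx_q]; column c stacks dy_1/dx_c, ...,
    dy_m/dx_c (the column-by-column rule), giving an (m*p) x q matrix."""
    cols = [[D[r][c][k] for k in range(m) for r in range(p)] for c in range(q)]
    return [[cols[c][t] for c in range(q)] for t in range(m * p)]


def split_entries_of_y(D, m, p, q):
    """Route 2: stack dy_1/dX, ..., dy_m/dX (each p x q) vertically."""
    blocks = [[[D[r][c][k] for c in range(q)] for r in range(p)] for k in range(m)]
    return [row for block in blocks for row in block]


# Example with m = 3 and a 2 x 2 matrix X.
f = lambda X: [X[0][0] + 2 * X[0][1], X[1][0] * X[1][1], X[0][0] ** 2]
X0 = [[1.0, 2.0], [3.0, 4.0]]
D = partials(f, X0)
R1 = split_columns_of_X(D, 3, 2, 2)
R2 = split_entries_of_y(D, 3, 2, 2)
print(R1 == R2)  # True: both routes give the same 6 x 2 matrix
```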

4.4 Row Vector by Matrix

Given $\mathbf y^T=[y_1,y_2,\cdots,y_m]$ and $X= \begin{bmatrix}x_{11} & \cdots & x_{1q} \\ \vdots & \ddots & \vdots \\ x_{p1} & \cdots & x_{pq}\end{bmatrix}$, find $\frac{\partial \mathbf y^T}{\partial X}$.

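One option is again to split the matrix into vectors and handle them with the vector rules. Here I go straight to the second approach and choose what to split based on the numerator: since $\mathbf y^T$ is a row vector, split along the denominator: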

$$\frac{\partial \mathbf y^T}{\partial X} = \begin{bmatrix}\frac{\partial \mathbf y^T}{\partial x_{11}} & \cdots & \frac{\partial \mathbf y^T}{\partial x_{1q}}\\ \vdots & \ddots & \vdots\\ \frac{\partial \mathbf y^T}{\partial x_{p1}} & \cdots & \frac{\partial \mathbf y^T}{\partial x_{pq}} \end{bmatrix}$$
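
Each entry $x_{rc}$ of $X$ is replaced by the $1 \times m$ row $\frac{\partial \mathbf y^T}{\partial x_{rc}}$, so the result is a $p \times qm$ matrix with entry at row $r$, column $cm + k$ (0-based) equal to $\partial y_k / \partial x_{rc}$. A minimal sketch; the helper names `partials` and `d_row_d_matrix` and the example $\mathbf y(X)$ are assumptions for illustration.

```python
def partials(f, X, h=1e-6):
    """D[r][c] = [dy_1/dx_rc, ..., dy_m/dx_rc] for y = f(X), X a p x q matrix."""
    p, q = len(X), len(X[0])
    D = [[None] * q for _ in range(p)]
    for r in range(p):
        for c in range(q):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[r][c] += h
            X_minus[r][c] -= h
            D[r][c] = [(a - b) / (2 * h) for a, b in zip(f(X_plus), f(X_minus))]
    return D


def d_row_d_matrix(f, X, h=1e-6):
    """Estimate dy^T/dX: entry (r, c) of X becomes the 1 x m row dy^T/dx_rc,
    giving a p x (q*m) matrix with [r][c*m + k] = dy_k/dx_rc."""
    D = partials(f, X, h)
    p, q, m = len(X), len(X[0]), len(D[0][0])
    return [[D[r][c][k] for c in range(q) for k in range(m)] for r in range(p)]


f = lambda X: [X[0][0] + 2 * X[0][1], X[1][0] * X[1][1], X[0][0] ** 2]
X0 = [[1.0, 2.0], [3.0, 4.0]]
R = d_row_d_matrix(f, X0)
# Expected 2 x 6 result: [[1, 0, 2, 2, 0, 0], [0, 4, 0, 0, 3, 0]]
```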

5. Matrix by Matrix

Given $Y= \begin{bmatrix}y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{m1} & \cdots & y_{mn}\end{bmatrix}$ and $X= \begin{bmatrix}x_{11} & \cdots & x_{1q} \\ \vdots & \ddots & \vdots \\ x_{p1} & \cdots & x_{pq}\end{bmatrix}$, find $\frac{\partial Y}{\partial X}$.

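Split $Y$ into its rows and $X$ into its columns: $Y = \begin{bmatrix}\mathbf y_1^T\\ \mathbf y_2^T\\ \vdots\\ \mathbf y_m^T \end{bmatrix}$, $X = [\mathbf x_1, \mathbf x_2,\cdots, \mathbf x_q]$. The problem then becomes a column vector (whose entries are the rows $\mathbf y_i^T$) differentiated by a row vector (whose entries are the columns $\mathbf x_j$), i.e., the column-by-row case.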

It follows that
$$\frac{\partial Y}{\partial X} = \begin{bmatrix}\frac{\partial \mathbf y_1^T}{\partial \mathbf x_1} & \cdots & \frac{\partial \mathbf y_1^T}{\partial \mathbf x_q} \\ \vdots & \ddots & \vdots\\ \frac{\partial \mathbf y_m^T}{\partial \mathbf x_1} & \cdots & \frac{\partial \mathbf y_m^T}{\partial \mathbf x_q} \end{bmatrix}$$
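
By the row-by-column rule, each block $\frac{\partial \mathbf y_i^T}{\partial \mathbf x_j}$ is a $p \times n$ matrix with entry $(r, c) = \partial y_{ic}/\partial x_{rj}$, so the whole result is $mp \times nq$. The sketch below assembles that layout numerically. The helper `d_matrix_d_matrix` and the example $Y = AX$ are illustrative assumptions; for this example $\partial y_{ic}/\partial x_{rj} = a_{ir}$ when $c = j$ and $0$ otherwise.

```python
def d_matrix_d_matrix(F, X, h=1e-6):
    """Estimate dY/dX for an m x n matrix Y = F(X) and a p x q matrix X.
    Block (i, j) is dy_i^T/dx_j (p x n), so the result is (m*p) x (n*q)
    with entry [i*p + r][j*n + c] = dy_ic/dx_rj."""
    p, q = len(X), len(X[0])
    D = [[None] * q for _ in range(p)]  # D[r][j] = entrywise dY/dx_rj (m x n)
    for r in range(p):
        for j in range(q):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[r][j] += h
            X_minus[r][j] -= h
            D[r][j] = [
                [(a - b) / (2 * h) for a, b in zip(row_p, row_m)]
                for row_p, row_m in zip(F(X_plus), F(X_minus))
            ]
    m, n = len(D[0][0]), len(D[0][0][0])
    # Rows: i (outer), r (inner). Columns: j (outer), c (inner).
    return [[D[r][j][i][c] for j in range(q) for c in range(n)]
            for i in range(m) for r in range(p)]


# Example: Y = A X with A 2 x 2 and X 2 x 3, so Y is 2 x 3.
A = [[1.0, 2.0], [3.0, 4.0]]
F = lambda X: [[sum(A[i][r] * X[r][c] for r in range(2)) for c in range(3)]
               for i in range(2)]
X0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
R = d_matrix_d_matrix(F, X0)  # a 4 x 9 matrix
```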


Reposted from blog.csdn.net/GentleCP/article/details/104657019