Derivatives of Vectors, and Derivatives of Vector Cross Products and Dot Products

Purpose: I have recently been writing optimization code that requires differentiating the variables of a function and computing their Jacobian matrices, which led me to vector and matrix differentiation.

A vector can be written as $Y=[y_1,y_2,\ldots,y_m]^T$.
The basic vector derivatives fall into the following cases:
1) A vector $Y=[y_1,y_2,\ldots,y_m]^T$ differentiated with respect to a scalar $x$:
$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}$$
If $Y=[y_1,y_2,\ldots,y_m]$ is a row vector, the derivative is
$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \ldots & \dfrac{\partial y_m}{\partial x} \end{bmatrix}$$
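As a quick sanity check, here is a minimal sketch (assuming NumPy; the curve `Y` and the step size are illustrative choices of mine, not from the original) comparing the stacked analytic derivative against a central finite difference:

```python
# A minimal sketch: derivative of a vector Y(x) with respect to a scalar x,
# checked against a central finite difference.
import numpy as np

def Y(x):
    # example curve in R^3; any smooth vector-valued function works
    return np.array([np.sin(x), np.cos(x), x**2])

def dY_dx(x):
    # analytic derivative, stacked component-wise as in the definition
    return np.array([np.cos(x), -np.sin(x), 2.0 * x])

x0, h = 0.7, 1e-6
numeric = (Y(x0 + h) - Y(x0 - h)) / (2.0 * h)
print(np.allclose(dY_dx(x0), numeric, atol=1e-6))  # True
```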

2) A scalar $y$ differentiated with respect to a vector $X=[x_1,x_2,\ldots,x_m]^T$:
$$\frac{\partial y}{\partial X}=\begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_m} \end{bmatrix}$$
If $X=[x_1,x_2,\ldots,x_m]$ is a row vector:
$$\frac{\partial y}{\partial X}=\begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \ldots & \dfrac{\partial y}{\partial x_m} \end{bmatrix}$$
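The same kind of check works for a gradient. A minimal sketch (assuming NumPy; the function `f` is my illustrative example):

```python
# A minimal sketch: gradient of a scalar function of a vector, stacked per
# the definition above, checked by central differences.
import numpy as np

def f(x):
    return x[0] ** 2 + np.sin(x[1]) * x[2]

def grad_f(x):
    # analytic partials stacked as a column of dy/dx_i
    return np.array([2.0 * x[0], np.cos(x[1]) * x[2], np.sin(x[1])])

x0, h = np.array([1.0, 0.4, 2.0]), 1e-6
numeric = np.array([(f(x0 + h * np.eye(3)[i]) - f(x0 - h * np.eye(3)[i])) / (2 * h)
                    for i in range(3)])
print(np.allclose(grad_f(x0), numeric, atol=1e-6))  # True
```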

3) A vector $Y=[y_1,y_2,\ldots,y_m]^T$ differentiated with respect to a vector $X=[x_1,x_2,\ldots,x_n]$:
$$\frac{\partial Y}{\partial X}=\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \ldots & \dfrac{\partial y_1}{\partial x_n} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \ldots & \dfrac{\partial y_2}{\partial x_n} \\ \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \ldots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}$$
The derivative of a vector with respect to a vector is the so-called Jacobian matrix, which appears very often in optimization.
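In code, the Jacobian can be approximated column by column by perturbing one $x_j$ at a time. A minimal sketch, assuming NumPy; `numerical_jacobian` is a hypothetical helper name of mine, reused in the later sketches:

```python
# A minimal sketch: build the m-by-n Jacobian of a vector-valued function
# by central finite differences, one column per input coordinate x_j.
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * h)  # column j = dF/dx_j
    return J

# example: F(x) = [x0*x1, sin(x2), x0 + x2]
F = lambda x: np.array([x[0] * x[1], np.sin(x[2]), x[0] + x[2]])
print(numerical_jacobian(F, [1.0, 2.0, 0.5]))
```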

For matrices: when $Y$ is a matrix, it is written as
$$Y=\begin{bmatrix} y_{11} & y_{12} & \ldots & y_{1n} \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \vdots \\ y_{m1} & y_{m2} & \ldots & y_{mn} \end{bmatrix}$$
and when $X$ is a matrix, it is written as
$$X=\begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{bmatrix}$$

There are two kinds of matrix derivatives:
1) A matrix $Y$ differentiated with respect to a scalar $x$:
$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \dfrac{\partial y_{11}}{\partial x} & \dfrac{\partial y_{12}}{\partial x} & \ldots & \dfrac{\partial y_{1n}}{\partial x} \\ \dfrac{\partial y_{21}}{\partial x} & \dfrac{\partial y_{22}}{\partial x} & \ldots & \dfrac{\partial y_{2n}}{\partial x} \\ \vdots \\ \dfrac{\partial y_{m1}}{\partial x} & \dfrac{\partial y_{m2}}{\partial x} & \ldots & \dfrac{\partial y_{mn}}{\partial x} \end{bmatrix}$$
2) A scalar $y$ differentiated with respect to a matrix $X$:
$$\frac{\partial y}{\partial X}=\begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \ldots & \dfrac{\partial y}{\partial x_{1n}} \\ \dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \ldots & \dfrac{\partial y}{\partial x_{2n}} \\ \vdots \\ \dfrac{\partial y}{\partial x_{m1}} & \dfrac{\partial y}{\partial x_{m2}} & \ldots & \dfrac{\partial y}{\partial x_{mn}} \end{bmatrix}$$
These are the basic definitions of vector and matrix derivatives. Combining them with a few elementary rules of calculus yields the composite formulas below, which are very useful when programming geometric algorithms.
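For instance, the matrix-by-scalar rule is just entrywise differentiation. A minimal sketch (assuming NumPy; the 2D rotation matrix $R(\theta)$ is my illustrative example, chosen because it is common in geometric code):

```python
# A minimal sketch: the matrix-by-scalar rule applied to a 2D rotation
# matrix R(theta); each entry is differentiated in place.
import numpy as np

def R(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

def dR_dt(t):
    # entrywise derivative of R(theta)
    return np.array([[-np.sin(t), -np.cos(t)],
                     [ np.cos(t), -np.sin(t)]])

t0, h = 0.3, 1e-6
numeric = (R(t0 + h) - R(t0 - h)) / (2.0 * h)
print(np.allclose(dR_dt(t0), numeric, atol=1e-6))  # True
```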

In practice a formula usually involves several vectors with dependencies between them, so when differentiating we would like a chain rule analogous to the scalar one.
Suppose the vectors depend on each other as $U \rightarrow V \rightarrow W$.
Then the partial derivatives satisfy
$$\frac{\partial W}{\partial U}=\frac{\partial W}{\partial V}\,\frac{\partial V}{\partial U}$$

Proof: simply expand and differentiate element by element:
$$\frac{\partial w_i}{\partial u_j} = \sum_{k}\frac{\partial w_i}{\partial v_k}\,\frac{\partial v_k}{\partial u_j} = \frac{\partial w_i}{\partial V}\,\frac{\partial V}{\partial u_j}$$
This shows that $\dfrac{\partial w_i}{\partial u_j}$ equals the inner product of row $i$ of the matrix $\dfrac{\partial W}{\partial V}$ with column $j$ of the matrix $\dfrac{\partial V}{\partial U}$, which is exactly the definition of matrix multiplication.
This generalizes easily to any number of intermediate variables.
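Here is a minimal numerical check of the chain rule (assuming NumPy and the `numerical_jacobian` helper sketched earlier; the two functions are illustrative choices of mine):

```python
# A minimal sketch: verify dW/dU = (dW/dV)(dV/dU) on a composition U -> V -> W.
# Assumes numerical_jacobian(f, x) from the sketch above.
import numpy as np

V_of_U = lambda u: np.array([u[0] * u[1], u[0] + u[1], np.sin(u[1])])
W_of_V = lambda v: np.array([v[0] + v[2], v[1] * v[2]])

u0 = np.array([0.4, 1.3])
v0 = V_of_U(u0)

J_WV = numerical_jacobian(W_of_V, v0)                       # 2x3, at V(u0)
J_VU = numerical_jacobian(V_of_U, u0)                       # 3x2
J_WU = numerical_jacobian(lambda u: W_of_V(V_of_U(u)), u0)  # 2x2

print(np.allclose(J_WU, J_WV @ J_VU, atol=1e-5))  # True
```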

A case that comes up constantly is a scalar-valued $f$ whose intermediate variables are all vectors, with the dependency chain
$$X \rightarrow V \rightarrow U \rightarrow f$$
By the transitivity of Jacobian matrices,
$$\frac{\partial f}{\partial X} = \frac{\partial f}{\partial U}\,\frac{\partial U}{\partial V}\,\frac{\partial V}{\partial X}$$
Since $f$ is a scalar, this is written in the form
$$\frac{\partial f}{\partial X^T} = \frac{\partial f}{\partial U^T}\,\frac{\partial U}{\partial V}\,\frac{\partial V}{\partial X}$$
For the computation to work out, the gradients of $f$ above must be taken with respect to the row vectors $U^T$ and $X^T$, i.e., as row vectors. This point is very important.
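A minimal sketch of this row-gradient accumulation (assuming NumPy and the `numerical_jacobian` helper sketched earlier; the functions are illustrative):

```python
# A minimal sketch: for a scalar f, the row gradient df/dX^T is accumulated
# left-to-right as (df/dU^T)(dU/dV)(dV/dX), exactly as in the formula above.
# Assumes numerical_jacobian(f, x) from the sketch above.
import numpy as np

V_of_X = lambda x: np.array([x[0] + x[1], x[1] * x[2]])
U_of_V = lambda v: np.array([v[0] * v[1], np.cos(v[0])])
f_of_U = lambda u: u[0] ** 2 + u[1]

x0 = np.array([0.2, 0.5, 1.1])
v0 = V_of_X(x0)
u0 = U_of_V(v0)

grad_f_U = np.array([2.0 * u0[0], 1.0]).reshape(1, -1)  # 1x2 row vector
J_UV = numerical_jacobian(U_of_V, v0)                   # 2x2
J_VX = numerical_jacobian(V_of_X, x0)                   # 2x3

grad_f_X = grad_f_U @ J_UV @ J_VX                       # 1x3 row vector
numeric = numerical_jacobian(lambda x: np.array([f_of_U(U_of_V(V_of_X(x)))]), x0)
print(np.allclose(grad_f_X, numeric, atol=1e-5))        # True
```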

Below are the common rules encountered when differentiating vectors. They fall into two categories.

1) The derivative of the dot product of two (column) vectors $U$, $V$ with respect to $W$:
$$\frac{\partial (U^T V)}{\partial W} = \left(\frac{\partial U}{\partial W}\right)^T V + \left(\frac{\partial V}{\partial W}\right)^T U \qquad (4)$$
Proof: let $U=\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}$ and $V=\begin{bmatrix} v_0 \\ v_1 \\ v_2 \end{bmatrix}$ be three-dimensional vectors. Their dot product $f=U^T V$ is the scalar $f=u_0v_0+u_1v_1+u_2v_2$. Differentiating it with respect to $W$:

$$\begin{aligned}
\frac{\partial f}{\partial W}&=\frac{\partial (u_0v_0+u_1v_1+u_2v_2)}{\partial W} \\
&=\frac{\partial u_0}{\partial W}v_0 + \frac{\partial v_0}{\partial W}u_0 + \frac{\partial u_1}{\partial W}v_1 + \frac{\partial v_1}{\partial W}u_1 + \frac{\partial u_2}{\partial W}v_2 + \frac{\partial v_2}{\partial W}u_2 \\
&=\left(\frac{\partial u_0}{\partial W}v_0 + \frac{\partial u_1}{\partial W}v_1 + \frac{\partial u_2}{\partial W}v_2\right) + \left(\frac{\partial v_0}{\partial W}u_0 + \frac{\partial v_1}{\partial W}u_1 + \frac{\partial v_2}{\partial W}u_2\right) \\
&=\left(\frac{\partial U}{\partial W}\right)^T V + \left(\frac{\partial V}{\partial W}\right)^T U
\end{aligned}$$

This generalizes to other dimensions, completing the proof.

If $W$ is a scalar, it can be substituted into $(4)$ directly. If $W$ is a vector, however, in computations one usually treats $W$ as a row vector, because the Jacobian matrix is defined by differentiating a column vector with respect to a row vector. Writing $W$ as the row vector $W^T$, $(4)$ becomes:
$$\frac{\partial (U^T V)}{\partial W^T} = \left(\frac{\partial U}{\partial W^T}\right)^T V + \left(\frac{\partial V}{\partial W^T}\right)^T U$$
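A minimal numerical check of formula (4) (assuming NumPy and the `numerical_jacobian` helper sketched earlier; $U(W)$ and $V(W)$ are illustrative functions of mine):

```python
# A minimal sketch: verify the dot-product rule (4) with U and V as
# functions of a vector W. Assumes numerical_jacobian(f, x) from above.
import numpy as np

U_of_W = lambda w: np.array([w[0] * w[1], np.sin(w[2]), w[0]])
V_of_W = lambda w: np.array([w[2], w[0] + w[1], w[1] ** 2])

w0 = np.array([0.3, 0.8, 1.2])
J_U = numerical_jacobian(U_of_W, w0)  # 3x3
J_V = numerical_jacobian(V_of_W, w0)  # 3x3

analytic = J_U.T @ V_of_W(w0) + J_V.T @ U_of_W(w0)  # gradient, length 3
numeric = numerical_jacobian(lambda w: np.array([U_of_W(w) @ V_of_W(w)]), w0).ravel()
print(np.allclose(analytic, numeric, atol=1e-5))    # True
```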

2) The derivative of the cross product of two (column) vectors $U$, $V$ with respect to $W$:
$$\frac{\partial (U \times V)}{\partial W} = -\mathrm{Skew}(V)\,\frac{\partial U}{\partial W} + \mathrm{Skew}(U)\,\frac{\partial V}{\partial W} \qquad (5)$$
where
$$\mathrm{Skew}(U) = \begin{bmatrix} 0 & -u_2 & u_1 \\ u_2 & 0 & -u_0 \\ -u_1 & u_0 & 0 \end{bmatrix}$$
Here $\mathrm{Skew}(U)$ is the matrix that turns a cross product into a matrix-vector product, i.e., $\mathrm{Skew}(U)\,V = U \times V$. This is very easy to verify by simply expanding the matrix product.
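A minimal sketch of that identity (assuming NumPy; `skew` is my helper name, reused in the final sketch):

```python
# A minimal sketch: Skew(u) as defined above, so that Skew(u) @ v equals
# the cross product u x v.
import numpy as np

def skew(u):
    return np.array([[0.0,   -u[2],  u[1]],
                     [u[2],   0.0,  -u[0]],
                     [-u[1],  u[0],  0.0]])

u = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 0.7, 2.0])
print(np.allclose(skew(u) @ v, np.cross(u, v)))  # True
```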
When several vectors are crossed together, the formula needs to be rewritten. Since the cross product obeys the distributive law, the same derivative can also be stated as
$$\frac{\partial (U \times V)}{\partial W} = \frac{\partial U}{\partial W} \times V + U \times \frac{\partial V}{\partial W} \qquad (6)$$
Formulas (5) and (6) are equivalent; they differ only in notation (in (6) the cross products are taken column by column of the Jacobians). The proof follows.

Proof: let $U=\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}$ and $V=\begin{bmatrix} v_0 \\ v_1 \\ v_2 \end{bmatrix}$ be three-dimensional vectors. Then

$$U \times V = \begin{vmatrix} i & j & k \\ u_0 & u_1 & u_2 \\ v_0 & v_1 & v_2 \end{vmatrix} = (u_1v_2 - u_2v_1)\,i + (u_2v_0 - u_0v_2)\,j + (u_0v_1 - u_1v_0)\,k$$

This is a vector, so expanded in components it reads:

$$U \times V = \begin{bmatrix} u_1v_2 - u_2v_1 \\ u_2v_0 - u_0v_2 \\ u_0v_1 - u_1v_0 \end{bmatrix}$$

Differentiating row by row (each $\frac{\partial u_i}{\partial W}$, $\frac{\partial v_i}{\partial W}$ below is a row of the corresponding Jacobian):

$$\begin{aligned}
\frac{\partial (U \times V)}{\partial W} &= \begin{bmatrix} \dfrac{\partial (u_1v_2 - u_2v_1)}{\partial W} \\ \dfrac{\partial (u_2v_0 - u_0v_2)}{\partial W} \\ \dfrac{\partial (u_0v_1 - u_1v_0)}{\partial W} \end{bmatrix}
= \begin{bmatrix} v_2\dfrac{\partial u_1}{\partial W} + u_1\dfrac{\partial v_2}{\partial W} - v_1\dfrac{\partial u_2}{\partial W} - u_2\dfrac{\partial v_1}{\partial W} \\ v_0\dfrac{\partial u_2}{\partial W} + u_2\dfrac{\partial v_0}{\partial W} - v_2\dfrac{\partial u_0}{\partial W} - u_0\dfrac{\partial v_2}{\partial W} \\ v_1\dfrac{\partial u_0}{\partial W} + u_0\dfrac{\partial v_1}{\partial W} - v_0\dfrac{\partial u_1}{\partial W} - u_1\dfrac{\partial v_0}{\partial W} \end{bmatrix} \\
&= \begin{bmatrix} v_2\dfrac{\partial u_1}{\partial W} - v_1\dfrac{\partial u_2}{\partial W} \\ v_0\dfrac{\partial u_2}{\partial W} - v_2\dfrac{\partial u_0}{\partial W} \\ v_1\dfrac{\partial u_0}{\partial W} - v_0\dfrac{\partial u_1}{\partial W} \end{bmatrix}
+ \begin{bmatrix} u_1\dfrac{\partial v_2}{\partial W} - u_2\dfrac{\partial v_1}{\partial W} \\ u_2\dfrac{\partial v_0}{\partial W} - u_0\dfrac{\partial v_2}{\partial W} \\ u_0\dfrac{\partial v_1}{\partial W} - u_1\dfrac{\partial v_0}{\partial W} \end{bmatrix} \\
&= \frac{\partial U}{\partial W} \times V - \frac{\partial V}{\partial W} \times U
= -V \times \frac{\partial U}{\partial W} + U \times \frac{\partial V}{\partial W}
= -\mathrm{Skew}(V)\,\frac{\partial U}{\partial W} + \mathrm{Skew}(U)\,\frac{\partial V}{\partial W}
\end{aligned}$$

Here we used the fact that for any vectors $a$, $b$,
$$a \times b = -b \times a$$

This argument can be extended beyond the three-dimensional case. This completes the proof.
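Finally, a minimal numerical check of formula (5) (assuming NumPy plus the `numerical_jacobian` and `skew` helpers sketched earlier; $U(W)$ and $V(W)$ are illustrative):

```python
# A minimal sketch: verify the cross-product rule (5) with U and V as
# functions of a vector W. Assumes numerical_jacobian and skew from above.
import numpy as np

U_of_W = lambda w: np.array([w[0] * w[1], w[2], np.sin(w[0])])
V_of_W = lambda w: np.array([w[1], w[0] + w[2], w[2] ** 2])

w0 = np.array([0.9, -0.4, 0.6])
J_U = numerical_jacobian(U_of_W, w0)  # 3x3
J_V = numerical_jacobian(V_of_W, w0)  # 3x3

analytic = -skew(V_of_W(w0)) @ J_U + skew(U_of_W(w0)) @ J_V
numeric = numerical_jacobian(lambda w: np.cross(U_of_W(w), V_of_W(w)), w0)
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```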


Reposted from blog.csdn.net/weixin_43851636/article/details/125340140