[Machine Learning] Gradients of a Function with Respect to Vectors and Matrices (Vector and Matrix Derivatives)


This post mainly discusses the gradient of a real-valued function with respect to a matrix or a vector. We begin with the definition: if $f:\mathbb{R}^{m\times n}\rightarrow \mathbb{R}$, then $\frac{\partial f}{\partial X}$ is also an $m\times n$ matrix, satisfying:

$$\left( \frac{\partial f}{\partial X} \right)_{ij}=\frac{\partial f}{\partial x_{ij}}$$

This is the gradient of a real-valued function with respect to a matrix, denoted $\nabla_{\boldsymbol{X}}f$.
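As a concrete illustration, take $f(X)=\sum_{i,j}x_{ij}^{2}$, whose partials are $\partial f/\partial x_{ij}=2x_{ij}$, so the gradient is the $m\times n$ matrix $2X$. Below is a minimal numerical sketch (assuming NumPy; the test matrix and the function are illustrative choices, not from the original post):

```python
import numpy as np

# f(X) = sum_ij x_ij^2, so df/dx_ij = 2*x_ij and the gradient is 2X,
# an m x n matrix with the same shape as X.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])               # m = 2, n = 3
f = lambda M: np.sum(M ** 2)

eps = 1e-6
grad = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[i, j] += eps                       # perturb a single entry
        grad[i, j] = (f(Xp) - f(X)) / eps     # one partial derivative

print(grad.shape)                             # (2, 3), same shape as X
print(np.allclose(grad, 2 * X, atol=1e-4))    # True
```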

Likewise, if $f:\mathbb{R}^{m}\rightarrow \mathbb{R}$, then $\frac{\partial f}{\partial \boldsymbol{x}}$ is also an $m$-dimensional column vector, satisfying:

$$\left( \frac{\partial f}{\partial \boldsymbol{x}} \right)_i=\frac{\partial f}{\partial x_i}$$

This is the gradient of a real-valued function with respect to a vector, denoted $\nabla_{\boldsymbol{x}}f$.

In summary: the gradient of a real-valued function with respect to a vector or matrix has the same shape as that vector or matrix. Starting from the definition, we now derive the vector/matrix gradient formulas commonly used in machine learning.
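To make the derivations below easy to sanity-check, here is a small finite-difference helper (a sketch assuming NumPy; `num_grad` is a name introduced here, not from the original post). It estimates each partial derivative by a central difference, entry by entry, exactly as the definitions above prescribe; the checks after each formula reuse it.

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x.

    Returns an array with the same shape as x, whose (i, j)-th
    (or i-th) entry approximates df/dx_ij (or df/dx_i).
    """
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps                 # nudge one entry up...
        xm[idx] -= eps                 # ...and down
        g[idx] = (f(xp) - f(xm)) / (2 * eps)
    return g
```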

$$\nabla \left( \boldsymbol{a}^T\boldsymbol{x} \right) =\nabla \left( \boldsymbol{x}^T\boldsymbol{a} \right) =\boldsymbol{a}$$

Proof:

$$\nabla \left( \boldsymbol{a}^T\boldsymbol{x} \right)_i=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{x} \right)}{\partial x_i}=\frac{\partial \left( \sum_j{a_jx_j} \right)}{\partial x_i}=a_i\;\Rightarrow\;\nabla \left( \boldsymbol{a}^T\boldsymbol{x} \right) =\boldsymbol{a}$$
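A quick numerical check of this identity (reusing the `num_grad` helper sketched above; `a` and `x` are arbitrary test values):

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])
x = np.array([0.5, 4.0, -1.0])

# The gradient of f(x) = a^T x should equal a itself.
g = num_grad(lambda v: a @ v, x)
print(np.allclose(g, a))  # True
```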

$$\nabla \lVert \boldsymbol{x} \rVert _{2}^{2}=\nabla \left( \boldsymbol{x}^T\boldsymbol{x} \right) =2\boldsymbol{x}$$

Proof:

$$\nabla \left( \boldsymbol{x}^T\boldsymbol{x} \right)_i=\frac{\partial \left( \sum_j{x_{j}^{2}} \right)}{\partial x_i}=2x_i\;\Rightarrow\;\nabla \left( \boldsymbol{x}^T\boldsymbol{x} \right) =2\boldsymbol{x}$$
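Checked numerically the same way (again with the `num_grad` helper above and an arbitrary test vector):

```python
import numpy as np

x = np.array([0.5, 4.0, -1.0])

# The gradient of f(x) = x^T x = ||x||_2^2 should equal 2x.
g = num_grad(lambda v: v @ v, x)
print(np.allclose(g, 2 * x))  # True
```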

$$\nabla \left( \boldsymbol{x}^T\boldsymbol{Ax} \right) =\left( \boldsymbol{A}+\boldsymbol{A}^T \right) \boldsymbol{x}$$

Proof: by the product rule, differentiate one occurrence of $\boldsymbol{x}$ at a time, writing $\boldsymbol{x}_C$ for the copy held constant during differentiation:

$$LHS=\nabla \left( \boldsymbol{x}_{C}^{T}\boldsymbol{Ax} \right) +\nabla \left( \boldsymbol{x}^T\boldsymbol{Ax}_C \right) =\left( \boldsymbol{x}_{C}^{T}\boldsymbol{A} \right)^T+\boldsymbol{Ax}_C=\left( \boldsymbol{A}+\boldsymbol{A}^T \right) \boldsymbol{x}=RHS$$
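And the quadratic-form rule, verified with a deliberately non-symmetric $\boldsymbol{A}$ so that $\left( \boldsymbol{A}+\boldsymbol{A}^T \right)\boldsymbol{x}$ and $2\boldsymbol{Ax}$ actually differ (same `num_grad` helper; the random seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # non-symmetric in general
x = rng.standard_normal(3)

# The gradient of f(x) = x^T A x should equal (A + A^T) x, not 2 A x.
g = num_grad(lambda v: v @ A @ v, x)
print(np.allclose(g, (A + A.T) @ x))  # True
```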

Reposted from blog.csdn.net/FRIGIDWINTER/article/details/114685016