Matrix Derivatives (Part 2)

This article continues from Part 1 and develops the technique of differentiating a matrix with respect to a matrix. A lowercase letter such as $x$ denotes a scalar, a bold lowercase letter such as $\boldsymbol{x}$ denotes a column vector, and an uppercase letter such as $X$ denotes a matrix. Matrix-by-matrix differentiation is built on the idea of vectorization, and is often used when solving optimization problems with second-order methods.

First, let us think about the definition. What should the derivative of a matrix with respect to a matrix look like? First, the derivative of a matrix $F$ ($p\times q$) with respect to a matrix $X$ ($m\times n$) should contain all $mnpq$ partial derivatives $\frac{\partial F_{kl}}{\partial X_{ij}}$, so that no information is lost. Second, the derivative should have a concise link to the differential, because that link is how derivatives are computed and applied. Third, the derivative should obey concise rules that operate on whole matrices. To this end, we first define the derivative of a vector $\boldsymbol{f}$ ($p\times 1$) with respect to a vector $\boldsymbol{x}$ ($m\times 1$) as $\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}} = \left[\frac{\partial f_j}{\partial x_i}\right]$ ($m\times p$), so that $d\boldsymbol{f} = \left(\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}\right)^T d\boldsymbol{x}$; we then define the column-major vectorization $\mathrm{vec}(X) = [X_{11}, \ldots, X_{m1}, X_{12}, \ldots, X_{m2}, \ldots, X_{1n}, \ldots, X_{mn}]^T$ ($mn\times 1$), and define the derivative of the matrix $F$ with respect to the matrix $X$ as $\frac{\partial F}{\partial X} = \frac{\partial\,\mathrm{vec}(F)}{\partial\,\mathrm{vec}(X)}$ ($mn\times pq$). Derivative and differential are linked by $\mathrm{vec}(dF) = \left(\frac{\partial F}{\partial X}\right)^T \mathrm{vec}(dX)$. A few remarks:

  1. Under this definition, the derivative of a scalar $f$ with respect to a matrix $X$ ($m\times n$) is $\frac{\partial f}{\partial X} = \frac{\partial f}{\partial\,\mathrm{vec}(X)}$, an $mn\times 1$ vector. This is not compatible with the definition in Part 1, but the two are easily interconverted. To avoid confusion, we use the symbol $\nabla_X f$ for the $m\times n$ matrix defined in Part 1, so that $\frac{\partial f}{\partial X} = \mathrm{vec}(\nabla_X f)$. Although the technique of this part could also handle the special case of scalar-by-matrix derivatives, the technique of Part 1 is more convenient there. Readers may verify the results of Part 1 by converting between the two methods.
  2. The second derivative of a scalar $f$ with respect to a matrix $X$, also called the Hessian matrix, is defined as $\nabla^2_X f = \frac{\partial^2 f}{\partial X^2} = \frac{\partial}{\partial X}\frac{\partial f}{\partial X}$ ($mn\times mn$), a symmetric matrix. The Hessian can be obtained by differentiating either the vector $\frac{\partial f}{\partial X}$ or the matrix $\nabla_X f$, but starting from the matrix $\nabla_X f$ is easier.
  3. $\frac{\partial F}{\partial X} = \frac{\partial\,\mathrm{vec}(F)}{\partial\,\mathrm{vec}(X)}$: vectorizing the matrices has the drawback of destroying the matrix structure to some extent, which can make results take a complicated form; the benefit is that the conclusions of multivariate calculus about gradients and Hessians can be used as-is, simply by vectorizing the matrices. For example, for the optimization problem $\min_X f(X)$, the Newton update $\Delta X$ satisfies $\mathrm{vec}(\Delta X) = -(\nabla^2_X f)^{-1}\mathrm{vec}(\nabla_X f)$ (see the sketch after this list).
  4. In the literature there are other definitions of the matrix-by-matrix derivative, such as $\frac{\partial F}{\partial X} = \left[\frac{\partial F}{\partial X_{ij}}\right]$ ($mp\times nq$) or $\frac{\partial F}{\partial X} = \left[\frac{\partial F_{kl}}{\partial X}\right]$ ($mp\times nq$). These are compatible with Part 1's definition of the scalar-by-matrix derivative, but their link between derivative and differential ($dF$ equals the inner products of the $m\times n$ sub-blocks of $\frac{\partial F}{\partial X}$ with $dX$, one by one) is not concise enough, which makes computation and application inconvenient. Reference [5] reviews definitions of this kind and criticizes them as bad definitions; only a definition that cooperates with the differential operation is a good one.
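
To make remark 3 concrete, here is a minimal NumPy sketch of the vectorized Newton update on the illustrative quadratic objective $f(X) = \frac{1}{2}\|AX-B\|_F^2$ (this objective and all names in the code are chosen here for illustration only). Its gradient is $\nabla_X f = A^T(AX-B)$ and, by the technique developed below, its Hessian is $\nabla^2_X f = I_n\otimes(A^TA)$; since $f$ is quadratic, a single Newton step lands exactly on the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 5, 3, 4                     # A: m x k, X: k x n, B: m x n
A = rng.standard_normal((m, k))
B = rng.standard_normal((m, n))
X = rng.standard_normal((k, n))       # initial point

# Gradient and Hessian of f(X) = 0.5 * ||AX - B||_F^2 in the vec convention:
# vec(grad) = vec(A^T (AX - B)),  Hessian = I_n kron (A^T A)  (kn x kn).
grad = (A.T @ (A @ X - B)).flatten(order="F")
H = np.kron(np.eye(n), A.T @ A)

# Newton update: vec(dX) = -H^{-1} vec(grad); reshape back column-major.
dX = np.linalg.solve(H, -grad).reshape(k, n, order="F")
X_new = X + dX

# For a quadratic objective, one Newton step reaches the least-squares solution.
X_star = np.linalg.lstsq(A, B, rcond=None)[0]
print(np.allclose(X_new, X_star))     # True
```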

 

Next we establish the rules of the algebra. We still use the link between derivative and differential, $\mathrm{vec}(dF) = \left(\frac{\partial F}{\partial X}\right)^T \mathrm{vec}(dX)$. The rules for taking differentials are the same as in Part 1; to obtain the derivative from the differential we need some vectorization tricks:

  1. Linearity: $\mathrm{vec}(A+B) = \mathrm{vec}(A) + \mathrm{vec}(B)$.
  2. Matrix multiplication: $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$, where $\otimes$ denotes the Kronecker product; the Kronecker product of $A$ ($m\times n$) and $B$ ($p\times q$) is $A\otimes B = [A_{ij}B]$ ($mp\times nq$). For a proof of this formula, see pages 107-108 of Zhang Xianda's "Matrix Analysis and Applications" [1].
  3. Transpose: $\mathrm{vec}(A^T) = K_{mn}\,\mathrm{vec}(A)$, where $A$ is an $m\times n$ matrix and $K_{mn}$ ($mn\times mn$) is the commutation matrix, which turns column-major vectorization into row-major vectorization. For example, $K_{22} = \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix}$.
  4. Element-wise multiplication: $\mathrm{vec}(A\odot X) = \mathrm{diag}(A)\,\mathrm{vec}(X)$, where $\mathrm{diag}(A)$ ($mn\times mn$) is the diagonal matrix whose diagonal entries are the elements of $A$ in column-major order (these rules are checked numerically in the sketch after this list).
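
These rules are easy to check numerically. The sketch below verifies rules 2-4 with NumPy (the `vec` and `commutation` helpers are illustrative implementations, not library functions); note that NumPy flattens row-major by default, so column-major vectorization requires `order="F"`.

```python
import numpy as np

def vec(A):
    # Column-major vectorization.
    return A.flatten(order="F")

def commutation(m, n):
    # K_mn (mn x mn): K_mn vec(A) = vec(A^T) for any m x n matrix A.
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(0)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((p, m))       # for rule 2: A X B with X m x n
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

# Rule 2: vec(AXB) = (B^T kron A) vec(X)
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))

# Rule 3: vec(X^T) = K_mn vec(X)
print(np.allclose(vec(X.T), commutation(m, n) @ vec(X)))

# Rule 4 (element-wise product): vec(C * X) = diag(vec(C)) vec(X)
C = rng.standard_normal((m, n))
print(np.allclose(vec(C * X), np.diag(vec(C)) @ vec(X)))
```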

 

We can now assert: if the matrix function $F$ is built from the matrix $X$ by addition, subtraction, multiplication, inverse, determinant, element-wise functions, and so on, then we take the differential of $F$ using the corresponding rules, vectorize it using the tricks above, move everything else to the left of $\mathrm{vec}(dX)$, and compare with the link between derivative and differential, $\mathrm{vec}(dF) = \left(\frac{\partial F}{\partial X}\right)^T \mathrm{vec}(dX)$, to read off the derivative.

In particular, if the matrices degenerate to vectors, comparing with the link between derivative and differential $d\boldsymbol{f} = \left(\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}\right)^T d\boldsymbol{x}$ likewise yields the derivative.

 

Next, composition: suppose we have already obtained $\frac{\partial F}{\partial Y}$, and $Y$ is a function of $X$; how do we find $\frac{\partial F}{\partial X}$? Starting from the link between derivative and differential, $\mathrm{vec}(dF) = \left(\frac{\partial F}{\partial Y}\right)^T \mathrm{vec}(dY) = \left(\frac{\partial F}{\partial Y}\right)^T \left(\frac{\partial Y}{\partial X}\right)^T \mathrm{vec}(dX)$, we can deduce the chain rule $\frac{\partial F}{\partial X} = \frac{\partial Y}{\partial X}\,\frac{\partial F}{\partial Y}$.
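
As an illustrative check of the chain rule, take $F = AY$ with $Y = XB$ and $X$ of size $m\times n$: then $\frac{\partial Y}{\partial X} = B\otimes I_m$ and $\frac{\partial F}{\partial Y} = I_p\otimes A^T$, and the chain rule should reproduce the direct result $\frac{\partial F}{\partial X} = B\otimes A^T$ for $F = AXB$:

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, n, p = 2, 3, 4, 5
A = rng.standard_normal((l, m))
B = rng.standard_normal((n, p))

# F = A Y with Y = X B (X is m x n, Y is m x p, F is l x p).
dYdX = np.kron(B, np.eye(m))          # mn x mp
dFdY = np.kron(np.eye(p), A.T)        # mp x lp
chain = dYdX @ dFdY                   # mn x lp

direct = np.kron(B, A.T)              # derivative of F = AXB w.r.t. X
print(np.allclose(chain, direct))     # True
```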

Compared with scalar-by-matrix derivatives, matrix-by-matrix derivatives take more complicated forms, and different approaches often yield results of different forms. Some identities involving Kronecker products and commutation matrices can be used for equivalent transformations:

  1. $(A\otimes B)^T = A^T \otimes B^T$.
  2. $\mathrm{vec}(\boldsymbol{a}\boldsymbol{b}^T) = \boldsymbol{b}\otimes\boldsymbol{a}$.
  3. $(A\otimes B)(C\otimes D) = (AC)\otimes(BD)$. This can be proved by differentiating $F = ACXD^TB^T$: on the one hand, direct differentiation gives $\frac{\partial F}{\partial X} = (BD)^T \otimes (AC)^T$; on the other hand, introducing $Y = CXD^T$, we have $\frac{\partial F}{\partial Y} = B^T\otimes A^T$ and $\frac{\partial Y}{\partial X} = D^T\otimes C^T$, and the chain rule gives $\frac{\partial F}{\partial X} = (D^T\otimes C^T)(B^T\otimes A^T)$.
  4. $K_{mn} = K_{nm}^T$, $K_{mn}K_{nm} = I$.
  5. $K_{pm}(A\otimes B)K_{nq} = B\otimes A$, where $A$ is an $m\times n$ matrix and $B$ is a $p\times q$ matrix. This can be proved by vectorizing $AXB^T$ (with $X$ an $n\times q$ matrix): on the one hand, $\mathrm{vec}(AXB^T) = (B\otimes A)\,\mathrm{vec}(X)$; on the other hand, $\mathrm{vec}(AXB^T) = K_{pm}\,\mathrm{vec}(BX^TA^T) = K_{pm}(A\otimes B)\,\mathrm{vec}(X^T) = K_{pm}(A\otimes B)K_{nq}\,\mathrm{vec}(X)$. These identities are checked numerically in the sketch after this list.
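
The following illustrative sketch verifies all five identities numerically, using the same `vec` and `commutation` helpers as before:

```python
import numpy as np

def vec(A):
    return A.flatten(order="F")

def commutation(m, n):
    # K_mn vec(A) = vec(A^T) for an m x n matrix A.
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(2)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))
C = rng.standard_normal((n, 6))
D = rng.standard_normal((q, 7))
a = rng.standard_normal(m)
b = rng.standard_normal(p)

# 1. (A kron B)^T = A^T kron B^T
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))
# 2. vec(a b^T) = b kron a
print(np.allclose(vec(np.outer(a, b)), np.kron(b, a)))
# 3. (A kron B)(C kron D) = (AC) kron (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
# 4. K_mn = K_nm^T and K_mn K_nm = I
print(np.allclose(commutation(m, n), commutation(n, m).T))
print(np.allclose(commutation(m, n) @ commutation(n, m), np.eye(m * n)))
# 5. K_pm (A kron B) K_nq = B kron A
print(np.allclose(commutation(p, m) @ np.kron(A, B) @ commutation(n, q),
                  np.kron(B, A)))
```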

 

Next we work through some examples.

Example 1: $F = AX$, where $X$ is an $m\times n$ matrix. Find $\frac{\partial F}{\partial X}$.

Solution: First take the differential: $dF = A\,dX$. Then vectorize, using the matrix multiplication trick; note that an identity matrix is appended on the right of $dX$: $\mathrm{vec}(dF) = \mathrm{vec}(A\,dX\,I_n) = (I_n\otimes A)\,\mathrm{vec}(dX)$. Comparing with the link between derivative and differential gives $\frac{\partial F}{\partial X} = I_n\otimes A^T$.

Special case: if $X$ degenerates to a vector, i.e. $\boldsymbol{f} = A\boldsymbol{x}$, then by the link between derivative and differential for vectors, $d\boldsymbol{f} = \left(\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}}\right)^T d\boldsymbol{x}$, we obtain $\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}} = A^T$.
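
An illustrative finite-difference check of Example 1: under this article's convention, $\frac{\partial F}{\partial X}$ has its rows indexed by the entries of $\mathrm{vec}(X)$, i.e. it is the transpose of the conventional Jacobian of $\mathrm{vec}(F)$ with respect to $\mathrm{vec}(X)$, which is what the helper below computes.

```python
import numpy as np

def num_derivative(f, X, eps=1e-6):
    # Numerical dF/dX in this article's convention: an (mn x pq) matrix
    # whose rows are indexed by the entries of vec(X).
    x0 = X.flatten(order="F")
    f0 = f(X).flatten(order="F")
    J = np.zeros((x0.size, f0.size))
    for k in range(x0.size):
        xk = x0.copy()
        xk[k] += eps
        fk = f(xk.reshape(X.shape, order="F")).flatten(order="F")
        J[k] = (fk - f0) / eps
    return J

rng = np.random.default_rng(3)
p, m, n = 4, 3, 2
A = rng.standard_normal((p, m))
X = rng.standard_normal((m, n))

numeric = num_derivative(lambda X: A @ X, X)
analytic = np.kron(np.eye(n), A.T)                # dF/dX = I_n kron A^T
print(np.allclose(numeric, analytic, atol=1e-5))  # True
```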

 

Example 2: $f = \log|X|$, where $X$ is an $n\times n$ matrix. Find $\nabla_X f$ and $\nabla^2_X f$.

Solution: Using the technique of Part 1, $\nabla_X f = X^{-1T}$. To find $\nabla^2_X f$, first take the differential: $d\nabla_X f = -(X^{-1}\,dX\,X^{-1})^T$, then vectorize, using the transpose and matrix multiplication tricks: $\mathrm{vec}(d\nabla_X f) = -K_{nn}\,\mathrm{vec}(X^{-1}\,dX\,X^{-1}) = -K_{nn}(X^{-1T}\otimes X^{-1})\,\mathrm{vec}(dX)$. Comparing with the link between derivative and differential gives $\nabla^2_X f = -K_{nn}(X^{-1T}\otimes X^{-1})$; note that it is a symmetric matrix. When $X$ is a symmetric matrix, this simplifies to $\nabla^2_X f = -X^{-1}\otimes X^{-1}$.
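
An illustrative numeric check of Example 2: differentiate the gradient $\nabla_X f = X^{-1T}$ by finite differences and compare with $-K_{nn}(X^{-1T}\otimes X^{-1})$.

```python
import numpy as np

def commutation(m, n):
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def num_derivative(f, X, eps=1e-6):
    x0 = X.flatten(order="F")
    f0 = f(X).flatten(order="F")
    J = np.zeros((x0.size, f0.size))
    for k in range(x0.size):
        xk = x0.copy()
        xk[k] += eps
        fk = f(xk.reshape(X.shape, order="F")).flatten(order="F")
        J[k] = (fk - f0) / eps
    return J

rng = np.random.default_rng(4)
n = 3
X = rng.standard_normal((n, n)) + 5 * np.eye(n)    # keep X well-conditioned

grad = lambda X: np.linalg.inv(X).T                # gradient of log|X|
Xinv = np.linalg.inv(X)
hess = -commutation(n, n) @ np.kron(Xinv.T, Xinv)  # analytic Hessian

numeric = num_derivative(grad, X)
print(np.allclose(numeric, hess, atol=1e-4))       # True
print(np.allclose(hess, hess.T))                   # symmetric
```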

 

Example 3: $F = A\exp(XB)$, where $A$ is an $l\times m$ matrix, $X$ is an $m\times n$ matrix, $B$ is an $n\times p$ matrix, and $\exp$ is applied element-wise. Find $\frac{\partial F}{\partial X}$.

Solution: First take the differential: $dF = A\left(\exp(XB)\odot(dX\,B)\right)$. Then vectorize, using the matrix multiplication trick: $\mathrm{vec}(dF) = (I_p\otimes A)\,\mathrm{vec}\left(\exp(XB)\odot(dX\,B)\right)$; then the element-wise multiplication trick: $= (I_p\otimes A)\,\mathrm{diag}(\exp(XB))\,\mathrm{vec}(dX\,B)$; then the matrix multiplication trick again: $= (I_p\otimes A)\,\mathrm{diag}(\exp(XB))(B^T\otimes I_m)\,\mathrm{vec}(dX)$. Comparing with the link between derivative and differential gives $\frac{\partial F}{\partial X} = (B\otimes I_m)\,\mathrm{diag}(\exp(XB))(I_p\otimes A^T)$.
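
An illustrative finite-difference check of Example 3 (the `num_derivative` helper is the same as in Example 1):

```python
import numpy as np

def num_derivative(f, X, eps=1e-6):
    x0 = X.flatten(order="F")
    f0 = f(X).flatten(order="F")
    J = np.zeros((x0.size, f0.size))
    for k in range(x0.size):
        xk = x0.copy()
        xk[k] += eps
        fk = f(xk.reshape(X.shape, order="F")).flatten(order="F")
        J[k] = (fk - f0) / eps
    return J

rng = np.random.default_rng(5)
l, m, n, p = 2, 3, 4, 2
A = rng.standard_normal((l, m))
X = rng.standard_normal((m, n)) * 0.1
B = rng.standard_normal((n, p)) * 0.1

F = lambda X: A @ np.exp(X @ B)
d = np.diag(np.exp(X @ B).flatten(order="F"))      # diag(exp(XB)), mp x mp
analytic = np.kron(B, np.eye(m)) @ d @ np.kron(np.eye(p), A.T)

print(np.allclose(num_derivative(F, X), analytic, atol=1e-4))  # True
```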

 

Example 4 (binary logistic regression): $l = -y\,\boldsymbol{x}^T\boldsymbol{w} + \log\left(1+\exp(\boldsymbol{x}^T\boldsymbol{w})\right)$. Find $\nabla_{\boldsymbol{w}} l$ and $\nabla^2_{\boldsymbol{w}} l$, where $y$ is a scalar taking the value 0 or 1, and $\boldsymbol{x}$, $\boldsymbol{w}$ are $n\times 1$ column vectors.

Solution: Using the technique of Part 1, $\nabla_{\boldsymbol{w}} l = \boldsymbol{x}\left(\sigma(\boldsymbol{x}^T\boldsymbol{w}) - y\right)$, where $\sigma(a) = \frac{\exp(a)}{1+\exp(a)}$ is the sigmoid function. To find $\nabla^2_{\boldsymbol{w}} l$, first take the differential: $d\nabla_{\boldsymbol{w}} l = \boldsymbol{x}\,\sigma'(\boldsymbol{x}^T\boldsymbol{w})\,\boldsymbol{x}^T d\boldsymbol{w}$, where $\sigma'(a) = \frac{\exp(a)}{(1+\exp(a))^2}$ is the derivative of the sigmoid function. Comparing with the link between derivative and differential gives $\nabla^2_{\boldsymbol{w}} l = \boldsymbol{x}\,\sigma'(\boldsymbol{x}^T\boldsymbol{w})\,\boldsymbol{x}^T$.

Generalization: given samples $(\boldsymbol{x}_1, y_1), \ldots, (\boldsymbol{x}_N, y_N)$ and $l = \sum_{i=1}^N\left(-y_i\boldsymbol{x}_i^T\boldsymbol{w} + \log\left(1+\exp(\boldsymbol{x}_i^T\boldsymbol{w})\right)\right)$, find $\nabla_{\boldsymbol{w}} l$ and $\nabla^2_{\boldsymbol{w}} l$. There are two methods. Method 1: differentiate for each sample, then sum. Method 2: define the matrix $X = [\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N]^T$ and the vector $\boldsymbol{y} = [y_1, \ldots, y_N]^T$, write $l$ in matrix form as $l = -\boldsymbol{y}^T X\boldsymbol{w} + \boldsymbol{1}^T\log\left(1+\exp(X\boldsymbol{w})\right)$, and use the technique of Part 1 to obtain $\nabla_{\boldsymbol{w}} l = X^T\left(\sigma(X\boldsymbol{w}) - \boldsymbol{y}\right)$. To find $\nabla^2_{\boldsymbol{w}} l$, first take the differential, then use the element-wise multiplication trick: $d\nabla_{\boldsymbol{w}} l = X^T\left(\sigma'(X\boldsymbol{w})\odot(X\,d\boldsymbol{w})\right) = X^T\mathrm{diag}\left(\sigma'(X\boldsymbol{w})\right)X\,d\boldsymbol{w}$. Comparing with the link between derivative and differential gives $\nabla^2_{\boldsymbol{w}} l = X^T\mathrm{diag}\left(\sigma'(X\boldsymbol{w})\right)X$.
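
An illustrative numeric sketch of Method 2, checking the matrix-form gradient and Hessian against finite differences; it uses the equivalent form $\sigma'(a) = \sigma(a)(1-\sigma(a))$.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n = 8, 3
X = rng.standard_normal((N, n))       # rows are the samples x_i^T
y = rng.integers(0, 2, N).astype(float)
w = rng.standard_normal(n)

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

def loss(w):
    a = X @ w
    return -y @ a + np.sum(np.log1p(np.exp(a)))

grad = X.T @ (sigma(X @ w) - y)                    # X^T (sigma(Xw) - y)
s = sigma(X @ w)
hess = X.T @ np.diag(s * (1 - s)) @ X              # X^T diag(sigma'(Xw)) X

# Finite-difference checks of gradient and Hessian.
eps = 1e-6
g_num = np.array([(loss(w + eps * e) - loss(w)) / eps for e in np.eye(n)])
print(np.allclose(g_num, grad, atol=1e-4))         # True
H_num = np.array([
    (X.T @ (sigma(X @ (w + eps * e)) - y) - grad) / eps for e in np.eye(n)
])
print(np.allclose(H_num, hess, atol=1e-4))         # True
```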

 

Example 5 (multiclass logistic regression): $l = -\boldsymbol{y}^T\log\,\mathrm{softmax}(W\boldsymbol{x})$. Find $\nabla_W l$ and $\nabla^2_W l$, where $\boldsymbol{y}$ is an $m\times 1$ column vector with one element equal to 1 and all other elements 0, $W$ is an $m\times n$ matrix, $\boldsymbol{x}$ is an $n\times 1$ column vector, and $l$ is a scalar.

Solution: Part 1 already gives $\nabla_W l = \left(\mathrm{softmax}(W\boldsymbol{x}) - \boldsymbol{y}\right)\boldsymbol{x}^T$. To find $\nabla^2_W l$, first take the differential. Define $\boldsymbol{a} = W\boldsymbol{x}$, so $\mathrm{softmax}(\boldsymbol{a}) = \frac{\exp(\boldsymbol{a})}{\boldsymbol{1}^T \exp(\boldsymbol{a})}$ and $d\,\mathrm{softmax}(\boldsymbol{a}) = \frac{\exp(\boldsymbol{a})\odot d\boldsymbol{a}}{\boldsymbol{1}^T\exp(\boldsymbol{a})} - \frac{\exp(\boldsymbol{a})\left(\boldsymbol{1}^T\left(\exp(\boldsymbol{a})\odot d\boldsymbol{a}\right)\right)}{\left(\boldsymbol{1}^T\exp(\boldsymbol{a})\right)^2}$. Note that the element-wise products simplify away: in the first term, $\exp(\boldsymbol{a})\odot d\boldsymbol{a} = \mathrm{diag}(\exp(\boldsymbol{a}))\,d\boldsymbol{a}$, and in the second term, $\boldsymbol{1}^T\left(\exp(\boldsymbol{a})\odot d\boldsymbol{a}\right) = \exp(\boldsymbol{a})^T d\boldsymbol{a}$. Defining the matrix $D(\boldsymbol{a}) = \mathrm{diag}\left(\mathrm{softmax}(\boldsymbol{a})\right) - \mathrm{softmax}(\boldsymbol{a})\,\mathrm{softmax}(\boldsymbol{a})^T$, we get $d\nabla_W l = D(\boldsymbol{a})\,d\boldsymbol{a}\,\boldsymbol{x}^T = D(W\boldsymbol{x})\,dW\,\boldsymbol{x}\boldsymbol{x}^T$. Vectorizing with the matrix multiplication trick gives $\nabla^2_W l = (\boldsymbol{x}\boldsymbol{x}^T)\otimes D(W\boldsymbol{x})$.
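
An illustrative numeric check of Example 5: differentiate the gradient $\left(\mathrm{softmax}(W\boldsymbol{x})-\boldsymbol{y}\right)\boldsymbol{x}^T$ by finite differences and compare with $(\boldsymbol{x}\boldsymbol{x}^T)\otimes D(W\boldsymbol{x})$.

```python
import numpy as np

def num_derivative(f, X, eps=1e-6):
    x0 = X.flatten(order="F")
    f0 = f(X).flatten(order="F")
    J = np.zeros((x0.size, f0.size))
    for k in range(x0.size):
        xk = x0.copy()
        xk[k] += eps
        fk = f(xk.reshape(X.shape, order="F")).flatten(order="F")
        J[k] = (fk - f0) / eps
    return J

def softmax(a):
    e = np.exp(a - a.max())            # stabilized softmax
    return e / e.sum()

rng = np.random.default_rng(7)
m, n = 4, 3
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = np.eye(m)[0]                                   # one-hot vector

grad = lambda W: np.outer(softmax(W @ x) - y, x)   # m x n gradient matrix
p = softmax(W @ x)
D = np.diag(p) - np.outer(p, p)
hess = np.kron(np.outer(x, x), D)                  # (xx^T) kron D(Wx)

numeric = num_derivative(grad, W)
print(np.allclose(numeric, hess, atol=1e-4))       # True
print(np.allclose(hess, hess.T))                   # symmetric
```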

 

Finally, a summary. We have developed a technique for matrix differentiation that works on whole matrices, and the link between derivative and differential is the pivot of every computation. For a scalar with respect to a matrix, the link is $df = \mathrm{tr}\left((\nabla_X f)^T dX\right)$: first take the differential of $f$, then use the trace trick to obtain the derivative; in particular, for a scalar with respect to a vector, the link is $df = (\nabla_{\boldsymbol{x}} f)^T d\boldsymbol{x}$. For a matrix with respect to a matrix, the link is $\mathrm{vec}(dF) = \left(\frac{\partial F}{\partial X}\right)^T \mathrm{vec}(dX)$: first take the differential of $F$, then use the vectorization tricks to obtain the derivative; in particular, for a vector with respect to a vector, the link is $d\boldsymbol{f} = \left(\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{x}}\right)^T d\boldsymbol{x}$.

 

 

References:

  1. Zhang Xianda. Matrix Analysis and Applications. Tsinghua University Press, 2004.
  2. Fackler, Paul L. "Notes on Matrix Calculus." North Carolina State University, 2005.
  3. Petersen, Kaare Brandt, and Michael Syskind Pedersen. "The Matrix Cookbook." Technical University of Denmark 7 (2008): 15.
  4. Hu, Pili. "Matrix Calculus: Derivation and Simple Application." 2012.
  5. Magnus, Jan R., and Heinz Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, 2019.
