Machine Learning Primer (3): Linear Regression ---- Side Story (Matrix Derivatives)

The smile in my heart

has now turned false on my face.

I no longer cry out;

the sound from the depths of my soul

grows faint in my throat,

so I do not speak.

The second method for solving the linear regression problem finds the optimal parameters by differentiating the loss function directly and setting the derivative to zero. This requires taking derivatives with respect to matrices, so below we introduce matrix derivatives. A great leap forward for me. . . .

Definition of the matrix derivative

A matrix function maps an \(m \times n\) matrix \(A\) to a real number, i.e. \(f: \mathbb{R}^{m \times n} \mapsto \mathbb{R}\).
The derivative with respect to the matrix is itself a matrix of the same size. The element at position \((i, j)\) of the derivative is defined as the partial derivative of the matrix function with respect to the variable at position \((i, j)\) of \(A\), i.e.,
\[ \nabla_{A} f(A)=\left[ \begin{array}{ccc}{\frac{\partial f}{\partial A_{11}}} & {\cdots} & {\frac{\partial f}{\partial A_{1n}}} \\ {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial f}{\partial A_{m1}}} & {\cdots} & {\frac{\partial f}{\partial A_{mn}}} \end{array}\right] \]
For example, consider the matrix function \(f: \mathbb{R}^{2 \times 2} \mapsto \mathbb{R}\),
\[ f(A)=\frac{3}{2} A_{11}+5 A_{12}^{2}+A_{21} A_{22} \]
Its derivative with respect to the matrix \(A\) is,
\[ \nabla_{A} f(A)=\left[ \begin{array}{cc}{\frac{3}{2}} & {10 A_{12}} \\ {A_{22}} & {A_{21}} \end{array}\right] \]
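The closed-form gradient above is easy to sanity-check numerically. The sketch below (my own check, not from the original post) approximates each entry of \(\nabla_A f(A)\) with central finite differences and compares against the formula:

```python
import numpy as np

def f(A):
    """The example function f(A) = 3/2*A11 + 5*A12^2 + A21*A22."""
    return 1.5 * A[0, 0] + 5.0 * A[0, 1] ** 2 + A[1, 0] * A[1, 1]

def numerical_gradient(f, A, eps=1e-6):
    """Entrywise central differences: (f(A + eps*E_ij) - f(A - eps*E_ij)) / (2*eps)."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            grad[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return grad

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# Closed form from the text: [[3/2, 10*A12], [A22, A21]]
analytic = np.array([[1.5, 10 * A[0, 1]], [A[1, 1], A[1, 0]]])
print(np.allclose(numerical_gradient(f, A), analytic, atol=1e-5))  # → True
```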

Properties of the trace

The trace operator (trace) is defined as the sum of the diagonal elements, i.e.,
\[ \operatorname{tr} A=\sum_{i=1}^{n} A_{ii} \]
For a product of two square matrices \(AB\), the trace has the following property,
\[ \operatorname{tr} AB=\operatorname{tr} BA \]
from which the following corollaries follow,
\[ \begin{aligned} \operatorname{tr} ABC &=\operatorname{tr} CAB=\operatorname{tr} BCA \\ \operatorname{tr} ABCD &=\operatorname{tr} DABC=\operatorname{tr} CDAB=\operatorname{tr} BCDA \end{aligned} \]
With \(a\) a real number, the following trace properties are also not difficult to verify,
\[ \begin{aligned} \operatorname{tr} A &=\operatorname{tr} A^{T} \\ \operatorname{tr}(A+B) &=\operatorname{tr} A+\operatorname{tr} B \\ \operatorname{tr} a A &=a \operatorname{tr} A \end{aligned} \]
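The cyclic property of the trace can be spot-checked numerically on random matrices; this small sketch (mine, not from the post) verifies \(\operatorname{tr} ABC = \operatorname{tr} CAB = \operatorname{tr} BCA\):

```python
import numpy as np

# Random square matrices for a numerical spot-check of the cyclic property.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

t1 = np.trace(A @ B @ C)  # tr(ABC)
t2 = np.trace(C @ A @ B)  # tr(CAB)
t3 = np.trace(B @ C @ A)  # tr(BCA)
print(np.isclose(t1, t2) and np.isclose(t2, t3))  # → True
```

A spot-check on random inputs is of course not a proof, but the identity follows directly from \(\operatorname{tr} AB = \operatorname{tr} BA\) by grouping the product as \((AB)C\) or \(A(BC)\).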

Properties of the matrix derivative

The matrix derivative has the following properties,
\[ \begin{aligned} \nabla_{A} \operatorname{tr} AB &=B^{T} \\ \nabla_{A^{T}} f(A) &=\left(\nabla_{A} f(A)\right)^{T} \\ \nabla_{A} \operatorname{tr} ABA^{T} C &=CAB+C^{T} AB^{T} \\ \nabla_{A}|A| &=|A|\left(A^{-1}\right)^{T} \end{aligned} \]
When differentiating with respect to a matrix, the matrix is treated as the variable. For the first property, suppose we have a fixed \(n \times m\) matrix \(B\); then \(\operatorname{tr} AB\) can be regarded as a matrix function \(f: \mathbb{R}^{m \times n} \mapsto \mathbb{R}\), and the first property says that the derivative at position \((i, j)\) of \(A\) is the element of \(B\) at position \((j, i)\).
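The less obvious third property, \(\nabla_{A} \operatorname{tr} ABA^{T}C = CAB + C^{T}AB^{T}\), can also be confirmed by finite differences. The sketch below is my own verification, not part of the original post:

```python
import numpy as np

def num_grad(f, A, eps=1e-6):
    """Entrywise central differences of a scalar matrix function f at A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# f(X) = tr(X B X^T C); closed form of the gradient: C X B + C^T X B^T.
f = lambda X: np.trace(X @ B @ X.T @ C)
analytic = C @ A @ B + C.T @ A @ B.T
print(np.allclose(num_grad(f, A), analytic, atol=1e-4))  # → True
```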

Read it over once without stress; knowing the results is enough. If you really have spare time, you can go and do the "not difficult to verify" part yourself.

Originally published at www.cnblogs.com/qizhien/p/11569481.html