Getting Started with Machine Learning (4): Linear Regression ---- Normal Equations

A look at the least-squares problem

Using matrix differentiation, we can derive an analytical expression (closed-form solution) for the parameter values that minimize the loss function. First, let us express the loss function in vector form.

Stacking each training sample as a row, we obtain the \(m \times n\) design matrix \(X\),
\[
X=\left[\begin{array}{c}
{-\left(x^{(1)}\right)^{T}-} \\
{-\left(x^{(2)}\right)^{T}-} \\
{\vdots} \\
{-\left(x^{(m)}\right)^{T}-}
\end{array}\right]
\]
and we let \(\vec{y}\) denote the \(m\)-dimensional column vector containing the corresponding labels,
\[
\vec{y}=\left[\begin{array}{c}
{y^{(1)}} \\
{\vdots} \\
{y^{(m)}}
\end{array}\right]
\]
Accordingly,
\[
\begin{aligned}
X \theta-\vec{y} &=\left[\begin{array}{c}
{\left(x^{(1)}\right)^{T} \theta} \\
{\vdots} \\
{\left(x^{(m)}\right)^{T} \theta}
\end{array}\right]-\left[\begin{array}{c}
{y^{(1)}} \\
{\vdots} \\
{y^{(m)}}
\end{array}\right] \\
&=\left[\begin{array}{c}
{h_{\theta}\left(x^{(1)}\right)-y^{(1)}} \\
{\vdots} \\
{h_{\theta}\left(x^{(m)}\right)-y^{(m)}}
\end{array}\right]
\end{aligned}
\]
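As a concrete illustration, here is a minimal NumPy sketch (with made-up toy data) that stacks the samples into a design matrix \(X\), collects the labels into \(\vec{y}\), and evaluates the residual vector \(X\theta-\vec{y}\) for an arbitrary \(\theta\):

```python
import numpy as np

# Toy data, invented for illustration: m = 4 samples, n = 3 features,
# with a leading column of ones so theta[0] plays the role of the intercept.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 0.0],
              [1.0, 4.0, 2.0],
              [1.0, 3.0, 5.0]])          # design matrix, shape (m, n)
y = np.array([6.0, 2.0, 8.0, 11.0])      # label vector, shape (m,)
theta = np.array([0.5, 1.0, 1.5])        # an arbitrary parameter vector

# Each entry of X @ theta is (x^(i))^T theta = h_theta(x^(i)),
# so X @ theta - y is exactly the residual vector derived above.
print(X @ theta - y)
```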
Now we write the mean squared error loss function \(J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}\) in vector form:
\[
\begin{aligned}
\frac{1}{2}(X \theta-\vec{y})^{T}(X \theta-\vec{y}) &=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2} \\
&=J(\theta)
\end{aligned}
\]
So the loss function is in fact a matrix function of the parameter vector \(\theta\).
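As a quick sanity check, here is a self-contained sketch (random synthetic data, for illustration only) showing that the vectorized expression and the element-wise sum give the same loss value:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))              # random design matrix
y = rng.normal(size=m)                   # random labels
theta = rng.normal(size=n)               # random parameter vector

residual = X @ theta - y
J_vec = 0.5 * residual @ residual        # (1/2)(X theta - y)^T (X theta - y)
J_sum = 0.5 * sum((X[i] @ theta - y[i]) ** 2 for i in range(m))

assert np.isclose(J_vec, J_sum)          # both forms give the same loss value
```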
Using the two matrix derivative identities derived earlier,
\[
\begin{aligned}
\nabla_{A^{T}} f(A) &=\left(\nabla_{A} f(A)\right)^{T} \\
\nabla_{A} \operatorname{tr} A B A^{T} C &=C A B+C^{T} A B^{T}
\end{aligned}
\]
we have
\[
\nabla_{A^{T}} \operatorname{tr} A B A^{T} C=B^{T} A^{T} C^{T}+B A^{T} C
\]
Next, differentiating the loss function with respect to the parameter vector \(\theta\), we get
\[
\begin{aligned}
\nabla_{\theta} J(\theta) &=\nabla_{\theta} \frac{1}{2}(X \theta-\vec{y})^{T}(X \theta-\vec{y}) \\
&=\frac{1}{2} \nabla_{\theta}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} \vec{y}-\vec{y}^{T} X \theta+\vec{y}^{T} \vec{y}\right) \\
&=\frac{1}{2} \nabla_{\theta} \operatorname{tr}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} \vec{y}-\vec{y}^{T} X \theta+\vec{y}^{T} \vec{y}\right) \\
&=\frac{1}{2} \nabla_{\theta}\left(\operatorname{tr} \theta^{T} X^{T} X \theta I-2 \operatorname{tr} \vec{y}^{T} X \theta\right) \\
&=\frac{1}{2}\left(X^{T} X \theta+X^{T} X \theta-2 X^{T} \vec{y}\right) \\
&=X^{T} X \theta-X^{T} \vec{y}
\end{aligned}
\]
The third equality uses the fact that a scalar equals its own trace. In the fifth equality, the first term uses the identity \(\nabla_{A^{T}} \operatorname{tr} A B A^{T} C=B^{T} A^{T} C^{T}+B A^{T} C\) above (with \(A^{T}=\theta\), \(B=X^{T} X\), \(C=I\)), and the second term uses \(\nabla_{A} \operatorname{tr} A B=B^{T}\).
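Before moving on, we can verify the gradient formula \(X^{T} X \theta-X^{T} \vec{y}\) numerically against central finite differences; this is only a small sketch on random data, not a rigorous proof:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
theta = rng.normal(size=n)

def J(t):
    """Mean squared error loss J(theta) = (1/2) ||X t - y||^2."""
    r = X @ t - y
    return 0.5 * r @ r

# Analytic gradient from the derivation above: X^T X theta - X^T y
grad_analytic = X.T @ X @ theta - X.T @ y

# Central finite differences as an independent numerical check
eps = 1e-6
grad_numeric = np.array([
    (J(theta + eps * np.eye(n)[j]) - J(theta - eps * np.eye(n)[j])) / (2 * eps)
    for j in range(n)
])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-5)
```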

Setting this derivative to zero, we obtain the normal equation,
\[
X^{T} X \theta=X^{T} \vec{y}
\]
from which the closed-form solution for the parameters follows,
\[
\theta=\left(X^{T} X\right)^{-1} X^{T} \vec{y}
\]
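Here is a minimal sketch of solving the normal equation in NumPy on synthetic data. Note this assumes \(X^{T} X\) is invertible, and numerically it is better to call a linear solver or a least-squares routine than to form the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 3
X = rng.normal(size=(m, n))
# Synthetic labels from a known parameter vector plus small noise (illustration only)
y = X @ np.array([1.5, -2.0, 0.7]) + 0.01 * rng.normal(size=m)

# Solve the normal equation X^T X theta = X^T y without forming an explicit inverse
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)
assert np.allclose(theta_hat, theta_lstsq)
```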
Done!

