Linear Regression:
Hypothesis function: \(h_\theta(x) = \theta_0 + \theta_1x_1 + \cdots + \theta_nx_n = X\theta\), where \(h_\theta(x)\) is an \(m \times 1\) vector of predictions, \(\theta\) is an \((n+1) \times 1\) vector holding the \(n+1\) model parameters, and \(X\) is an \(m \times (n+1)\) matrix.
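A minimal NumPy sketch of this matrix form (the toy sizes and variable names are assumptions, not from the text); the leading column of ones folds \(\theta_0\) into the matrix product:

```python
import numpy as np

# h_theta(x) = X @ theta, with shapes as in the text:
# X is m x (n+1) (leading column of ones for the intercept),
# theta is (n+1) x 1, so h is an m x 1 vector of predictions.
m, n = 5, 3                                # assumed toy sizes
rng = np.random.default_rng(0)
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # m x (n+1)
theta = rng.normal(size=(n + 1, 1))        # (n+1) x 1
h = X @ theta                              # m x 1
print(h.shape)                             # (5, 1)
```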
Maximum likelihood estimate
Principle: an event with a large probability is more likely to occur in a single trial; conversely, if an event is observed to occur in a trial, its probability should be large.
Goal: find the parameter values under which the observed data are produced with the highest probability.
Derivation
\(y^{(i)} = \theta^Tx^{(i)} + \epsilon ^{(i)}\)
- \(y^{(i)}\): the label of the i-th sample
- \(x^{(i)}\): the i-th sample
- \(\theta^Tx^{(i)}\): the predicted value for the i-th sample under the current \(\theta\)
- \(\epsilon^{(i)}\): the error between the predicted and actual values under the current \(\theta\)
- The errors \(\epsilon^{(i)} (1 \leq i \leq m)\) are independent and identically distributed, following a Gaussian distribution with mean 0 and variance \(\sigma^2\) (central limit theorem)
- In practical problems, many random phenomena can be seen as the combined effect of many independent factors, and thus often follow a normal distribution
Gaussian distribution: \(p(x)=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
For the i-th sample:
- \(p(\epsilon^{(i)})=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\epsilon^{(i)})^2}{2\sigma^2}}\)
- \(p(y^{(i)}|x^{(i)};\theta)=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
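As a small sketch (toy values assumed; `gaussian_pdf` is a hypothetical helper, not from the text), the conditional density of a label is just the Gaussian density evaluated at the residual:

```python
import numpy as np

def gaussian_pdf(z, sigma):
    # N(0, sigma^2) density evaluated at z
    return np.exp(-z**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

sigma = 0.5                                # assumed known noise level
x_i = np.array([1.0, 2.0, -1.0])           # i-th sample (leading 1 = bias term)
theta = np.array([0.3, 1.1, -0.7])
y_i = theta @ x_i + 0.1                    # label with error eps_i = 0.1

# p(y_i | x_i; theta) equals p(eps_i), since eps_i = y_i - theta^T x_i
print(gaussian_pdf(y_i - theta @ x_i, sigma))
```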
Likelihood function:
- \(L(\theta)=\prod^m_{i=1}p(y^{(i)}|x^{(i)};\theta)=\prod^m_{i=1}\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
Take the logarithm (the logarithm does not change where the extremum is attained, and it simplifies the calculation): \(l(\theta)=logL(\theta)\)
=\(log\prod^m_{i=1}\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
= \(\sum^m_{i=1}log\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
= \(\sum_{i=1}^mlog\frac{1}{\sigma \sqrt{2\pi}}-\frac{1}{\sigma^2}\cdot{\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2\)
Therefore, maximizing \(l(\theta)\) is equivalent to minimizing \({\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2\). Reason: \(\sigma\), the error variance, is a constant.
Loss function: \(loss(y,\hat{y})=J(\theta)={\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2\) (a numerical sketch follows the list below)
- \(y^{(i)}\): the label of the i-th sample
- \(x^{(i)}\): the i-th sample
- \(\theta\): the model parameters to learn; the goal is to minimize the loss function
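A sketch of \(J(\theta)\) on synthetic data (all names and values below are assumptions); it also checks numerically that a \(\theta\) with lower loss has higher log-likelihood, as the derivation claims:

```python
import numpy as np

def loss(theta, X, y):
    # J(theta) = 1/2 * sum of squared residuals
    r = y - X @ theta
    return 0.5 * np.sum(r**2)

def log_likelihood(theta, X, y, sigma):
    # l(theta) from the derivation above; the first term is constant in theta
    m = len(y)
    return m * np.log(1 / (sigma * np.sqrt(2 * np.pi))) - loss(theta, X, y) / sigma**2

rng = np.random.default_rng(1)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + rng.normal(scale=0.5, size=50)

for theta in (true_theta, true_theta + 0.5):
    print(loss(theta, X, y), log_likelihood(theta, X, y, sigma=0.5))
```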
Solution 1: set the derivative equal to 0:
\(J(\theta)={\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2=\frac{1}{2}(X\theta-Y)^T(X\theta-Y)\) -> \(min_\theta J(\theta)\)
\(\nabla_\theta J(\theta)=\nabla_\theta(\frac{1}{2}(X\theta-Y)^T(X\theta-Y))\)
= \(\nabla_\theta(\frac{1}{2}(\theta^TX^T-Y^T)(X\theta-Y))\)
= \(\nabla_\theta(\frac{1}{2}(\theta^TX^TX\theta-\theta^TX^TY-Y^TX\theta+Y^TY))\)
= \(\frac{1}{2}(2X^TX\theta-X^TY-(Y^TX)^T)\)
= \(X^TX\theta-X^TY\)
If \(X^TX\) is invertible => \(\theta=(X^TX)^{-1}X^TY\)
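A sketch of this closed-form solution on synthetic data (all names and values assumed); `np.linalg.solve` is used rather than forming the inverse explicitly, which is numerically safer:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 3))])
true_theta = np.array([0.5, 1.0, -2.0, 3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=100)

# theta = (X^T X)^{-1} X^T y, assuming X^T X is invertible
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # close to true_theta
```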
In practice, \(X^TX\) is often not invertible, for reasons such as the number of features exceeding the number of samples; adding the term \(\lambda I\) makes the resulting matrix invertible => \(\theta=(X^TX+\lambda I)^{-1}X^TY\) (this is exactly the closed-form solution of ridge regression: \(J(\theta)={\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2+\lambda\sum_{j=1}^n\theta^2_j\))
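A sketch of the ridge variant (the \(\lambda\) value is assumed): with \(\lambda > 0\), \(X^TX + \lambda I\) is positive definite, so the solve succeeds even with more features than samples. Following the formula as written, this sketch penalizes the intercept along with the other coefficients:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # theta = (X^T X + lambda * I)^{-1} X^T y
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(3)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 30))])  # 20 samples, 31 columns
y = rng.normal(size=20)
print(ridge_fit(X, y, lam=1.0)[:5])  # solvable although X^T X is singular here
```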
The derivation here uses matrix differentiation identities.
Solution 2: gradient descent, which converges to the global (or a local) optimal solution:
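A minimal gradient-descent sketch (the learning rate and step count are assumed values); the update uses the gradient \(X^TX\theta - X^TY = X^T(X\theta - Y)\) derived above:

```python
import numpy as np

def gradient_descent(X, y, lr=0.001, steps=5000):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T @ (X @ theta - y)   # gradient of J(theta)
    return theta

rng = np.random.default_rng(4)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 3))])
true_theta = np.array([0.5, 1.0, -2.0, 3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=100)
print(gradient_descent(X, y))   # close to the closed-form solution
```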
Publishing this first without polishing; the typesetting and follow-up content are still to come.