That is, we are looking for the value of θ that minimizes the cost function.
Gradient descent
Before regularization, to minimize the cost function of linear regression we iterate θ with the following update rule:

$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
We know that after regularization the cost function penalizes θ1 through θn but leaves θ0 untouched, so we separate θ0 from the rest of the iterative update. This gives the following iterative equations:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}, \quad j = 1, \dots, n$$
In fact nothing has changed yet; we have only written the θ0 update separately, with θj now running from j = 1 to n. If we want to use this method to minimize the regularized objective function, we need to add one more term to the θj equation. After adding it:

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right], \quad j = 1, \dots, n$$
Rearranging, this equation takes the following form:

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
Here 1 − α(λ/m) is less than 1, and α(λ/m) is a very small number, so 1 − α(λ/m) is close to 1; for instance, with α = 0.1, λ = 10, and m = 100 it equals 1 − 0.1·10/100 = 0.99.
So the updated θj is approximately 0.99 times the original θj, only a little smaller, before the usual gradient term is subtracted.
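Below is a minimal NumPy sketch of this regularized gradient-descent update. The function name, parameter values, and the synthetic data are my own illustrative choices, not part of the course material:

```python
import numpy as np

def regularized_gradient_descent(X, y, alpha=0.1, lam=1.0, iters=5000):
    """Gradient descent for linear regression with L2 regularization.

    X is m x (n+1) with a leading column of ones; theta_0 is not penalized.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iters):
        error = X @ theta - y            # h_theta(x^(i)) - y^(i) for all samples
        grad = (X.T @ error) / m         # (1/m) * sum of error * x_j, unregularized
        reg = (lam / m) * theta          # (lambda/m) * theta_j term
        reg[0] = 0.0                     # do not penalize theta_0
        theta = theta - alpha * (grad + reg)
    return theta

if __name__ == "__main__":
    # Made-up data: 100 samples, 3 features, plus an intercept column.
    rng = np.random.default_rng(0)
    X_raw = rng.normal(size=(100, 3))
    y = X_raw @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)
    X = np.hstack([np.ones((100, 1)), X_raw])
    print(regularized_gradient_descent(X, y))
```

Zeroing the regularization term for index 0 is what keeps the θ0 update identical to the unregularized one, matching the separate equation above.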
For the partial-derivative sum in the second term, you can see another article I wrote:
https://blog.csdn.net/Ace_bb/article/details/103996097
The normal equation
Assume we have a dataset with n features and m samples.
All of the samples together form the m × (n+1) design matrix X (the extra column is the constant feature x0 = 1), and the target values of the m samples form the m-dimensional vector y.
Our goal is the vector θ that minimizes the cost function J(θ). With regularization it can be computed directly with the following formula, which can be used whenever λ > 0:

$$\theta = \left(X^{T}X + \lambda L\right)^{-1}X^{T}y$$
The matrix L in the middle is an (n+1) × (n+1) matrix: its first diagonal element is 0, the remaining diagonal elements are all 1, and every off-diagonal element is 0.

$$L = \begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}$$
When m < n, the matrix X^T X may be non-invertible, so without regularization the normal equation cannot be used in that case; with the regularization term added (λ > 0), X^T X + λL becomes invertible.
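Here is a short NumPy sketch of this regularized normal equation. The helper name and the example data are illustrative assumptions, not from the original post:

```python
import numpy as np

def regularized_normal_equation(X, y, lam=1.0):
    """Closed-form solution theta = (X^T X + lam * L)^(-1) X^T y.

    X is m x (n+1) with a leading column of ones; L is the identity
    matrix with its (0, 0) entry set to 0 so theta_0 is not penalized.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

if __name__ == "__main__":
    # Same kind of made-up data as in the gradient descent sketch.
    rng = np.random.default_rng(0)
    X_raw = rng.normal(size=(100, 3))
    y = X_raw @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)
    X = np.hstack([np.ones((100, 1)), X_raw])
    print(regularized_normal_equation(X, y, lam=1.0))
```

With λ > 0 the system being solved is invertible, so `np.linalg.solve` avoids the singularity issue described above.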
------------------------------
The course screenshots are from Andrew Ng's lectures:
https://www.bilibili.com/video/av9912938?p=43