# Week2 Linear Regression with One Variable

## 1. Model Representation

Take house-price prediction as an example; a picture is worth a thousand words:
(figure omitted)
Here h represents the hypothesis, a function that maps from the input x to the output y.

## 2. Cost Function

For univariate linear regression, the hypothesis function is:
\[h_{\theta}(x) = \theta_0 + \theta_1 x\]
The next question is how to choose the parameters \(\theta_0\) and \(\theta_1\).
These two parameters determine the gap between the model's predictions and the actual values in the training set, known as the modeling error.
For regression, a reasonable choice of cost function is the following squared-error function:
\[J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2\]
Here \(m\) is the number of samples in the training set, \(x^{(i)}\) is the size of the \(i\)-th house, and \(y^{(i)}\) is its actual price.
The goal is simply to find the parameters that minimize \(J(\theta_0, \theta_1)\).
The division by 2 mainly cancels the factor of 2 produced when the square is differentiated in the later gradient descent derivation.
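
As a concrete illustration (not part of the original notes), here is a minimal NumPy sketch of this cost function; the function name `compute_cost` and the toy data are assumptions made for the example:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) for univariate linear regression."""
    m = len(y)                           # number of training samples
    predictions = theta0 + theta1 * x    # h_theta(x^(i)) for every sample
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data: house sizes vs. prices (made-up numbers, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 3.5, 4.5])
print(compute_cost(0.0, 1.0, x, y))   # cost for theta0 = 0, theta1 = 1
```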

## 3. Gradient Descent

Gradient descent is used to find the minimum of the cost function:

  • Start from some (e.g., random) initial combination of parameters and compute \(J\)
  • Repeatedly update the parameters in the direction that decreases \(J\) the fastest, until reaching a local optimum

It is like walking downhill: at every step you choose the direction of steepest descent, until you reach a local minimum.
In batch gradient descent (each update uses all training samples), all parameters are updated simultaneously:
\[\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \quad j = 0, 1\]
\(\alpha\) is the learning rate, which determines how large a step each update takes.
If \(\alpha\) is too small, the updates will be very slow; if \(\alpha\) is too large, gradient descent may overshoot the minimum and even diverge.
As a local optimum is approached, the slope becomes smaller and smaller, so the steps automatically shrink; there is no need to decrease the learning rate \(\alpha\) over time.
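
A tiny sketch of this effect (my own illustration, using an assumed one-parameter cost \(J(\theta) = \theta^2\) and a made-up starting point): with a fixed \(\alpha\), the step \(\alpha \cdot \text{slope}\) shrinks on its own as the slope flattens.

```python
# Gradient descent on the toy cost J(theta) = theta^2 with a FIXED alpha.
# The step alpha * slope shrinks by itself as the slope flattens near 0.
theta, alpha = 4.0, 0.1
for i in range(5):
    slope = 2 * theta          # dJ/dtheta for J(theta) = theta^2
    step = alpha * slope
    theta -= step
    print(f"iter {i}: step = {step:+.4f}, theta = {theta:.4f}")
```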

## 4. Gradient Descent for Linear Regression

Now apply the gradient descent algorithm to the earlier regression model:
take the partial derivatives of \(J(\theta_0, \theta_1)\) with respect to \(\theta_0\) and \(\theta_1\) and substitute them into the parameter update rule, which gives:
\[\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)\]
\[\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x^{(i)}\]
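
Putting the two update rules together, here is a minimal Python sketch of batch gradient descent for this model; the function name, initial parameters, learning rate, and iteration count are illustrative assumptions, not values from the course:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y   # h_theta(x^(i)) - y^(i)
        # Simultaneous update: both gradients are computed from the
        # current parameters before either parameter is changed.
        grad0 = np.sum(error) / m
        grad1 = np.sum(error * x) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Same toy data as in the cost-function sketch above
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 3.5, 4.5])
theta0, theta1 = gradient_descent(x, y)
print(theta0, theta1)   # converges toward the least-squares fit
```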
