[2014 Stanford Machine Learning tutorial notes] Chapter 2 - Gradient Descent for Linear Regression

In the previous section we discussed the gradient descent algorithm. In this section we combine gradient descent with the squared-error cost function to obtain a linear regression algorithm, which can be used to fit a straight-line model to data.

First, let's review what we already have. On one side is the gradient descent algorithm; on the other is the linear regression model, consisting of the linear hypothesis and the squared-error cost function:
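
\[
h_\theta(x) = \theta_0 + \theta_1 x,
\qquad
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
\]
\[
\text{repeat until convergence: } \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\quad (\text{for } j = 0 \text{ and } j = 1)
\]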

What we have to do is apply the gradient descent algorithm to minimize the squared-error cost function. To apply gradient descent, the key step is to work out the partial derivative terms.
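
Differentiating the cost function with respect to each parameter gives:
\[
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr),
\qquad
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
\]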

Plugging these derivatives back into the gradient descent algorithm, we get the update rule shown below.
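
\[
\text{repeat until convergence: }
\begin{cases}
\theta_0 := \theta_0 - \alpha \dfrac{1}{m} \displaystyle\sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) \\[8pt]
\theta_1 := \theta_1 - \alpha \dfrac{1}{m} \displaystyle\sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
\end{cases}
\]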

This is our linear regression algorithm. Note: when implementing it, we must update θ₀ and θ₁ simultaneously.
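
A minimal NumPy sketch of this algorithm (the data, learning rate `alpha`, and iteration count below are illustrative choices of mine, not values from the original notes):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)."""
    m = len(y)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

def gradient_descent(x, y, theta0=0.0, theta1=0.0, alpha=0.01, num_iters=1500):
    """Batch gradient descent for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(y)
    for _ in range(num_iters):
        error = (theta0 + theta1 * x) - y      # h(x^(i)) - y^(i) for every example
        grad0 = np.sum(error) / m              # partial derivative w.r.t. theta0
        grad1 = np.sum(error * x) / m          # partial derivative w.r.t. theta1
        # Simultaneous update: both gradients above were computed from the
        # old theta0, theta1 before either parameter is overwritten.
        theta0 = theta0 - alpha * grad0
        theta1 = theta1 - alpha * grad1
    return theta0, theta1

# Illustrative data that roughly follows y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])
t0, t1 = gradient_descent(x, y, alpha=0.05, num_iters=2000)
print(f"theta0 = {t0:.2f}, theta1 = {t1:.2f}, cost = {compute_cost(x, y, t0, t1):.4f}")
```

The key detail is the simultaneous update the note warns about: both gradients are computed from the old parameter values before either parameter is changed.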

Next, let's look at how gradient descent behaves here. In our earlier example, gradient descent could easily get stuck in a local optimum: starting from two different initial positions led to two different local optima.

However, the cost function for linear regression is always a bowl-shaped function; in technical terms, it is a convex function.


Such a function has only one global optimum. When you run gradient descent on this cost function with linear regression, it will always converge to the global optimum (assuming the learning rate is not too large), because there are no other local optima to get stuck in.

Now let's watch the algorithm in action. On one side we plot the hypothesis h(x) against the data; on the other we plot the cost function J, as a function of the parameters, using a contour plot.

We initialize the parameters to some values, which corresponds to some initial straight line.

If we take one step of gradient descent, we move a little toward the lower left from that point on the contour plot, and we get a new line. With every step of descent we get a new line with a lower cost, and as we keep descending, the straight line fits the data better and better. Finally, we arrive at the global minimum.

At that point the line clearly fits the data well. This is gradient descent.
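
A small self-contained sketch of that process (the synthetic data and step counts are illustrative choices of mine), printing the cost every few hundred steps so you can see each step producing a lower-cost line:

```python
import numpy as np

# Illustrative data that roughly follows y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])
m = len(y)

theta0, theta1, alpha = 0.0, 0.0, 0.05
for step in range(1, 2001):
    error = (theta0 + theta1 * x) - y
    # Simultaneous update of both parameters.
    theta0, theta1 = theta0 - alpha * np.sum(error) / m, theta1 - alpha * np.sum(error * x) / m
    if step % 400 == 0:
        cost = np.sum(((theta0 + theta1 * x) - y) ** 2) / (2 * m)
        print(f"step {step:4d}: J = {cost:.5f}, line: y = {theta0:.2f} + {theta1:.2f} x")
```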

The algorithm we have just learned is sometimes called batch gradient descent. The name refers to the fact that on every step of gradient descent we use all of the training examples: when computing the derivative term, we sum over the entire training set, so each individual step of gradient descent ends up computing a sum over all m training examples. The word "batch" indicates that we consider the whole batch of training samples at each step. In fact, there are other variants of gradient descent that are not "batch" methods; instead of using the entire training set, each step looks only at some small subset of the training set. We will introduce those methods in later lessons.
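
As a rough preview (this is my own sketch, not part of the original notes), the only structural difference in such a variant is that each step sums the error over a small random subset instead of over all m examples:

```python
import numpy as np

def minibatch_step(x, y, theta0, theta1, alpha=0.01, batch_size=2):
    """One gradient step that uses only a small random subset of the training set."""
    idx = np.random.choice(len(y), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    error = (theta0 + theta1 * xb) - yb
    # Same update formulas as before, but averaged over the subset only.
    theta0 = theta0 - alpha * np.mean(error)
    theta1 = theta1 - alpha * np.mean(error * xb)
    return theta0, theta1
```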

What we have described is batch gradient descent applied to linear regression: the gradient descent method used to fit a linear regression model.

If you have studied linear algebra before, you may know that there is a numerical method for finding the minimum of the cost function that does not require an iterative algorithm like gradient descent. We will discuss it in a later lesson: it solves for the minimum of the cost function directly, without many steps of gradient descent, and it is called the normal equations method. That said, when the amount of data is large, gradient descent is more suitable than the normal equation.
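
For reference, the normal equation gives the minimizing parameters in closed form; here X is the design matrix (a column of ones followed by a column of the x values) and y is the vector of targets:

\[
\theta = \bigl(X^{\top} X\bigr)^{-1} X^{\top} y
\]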

Now that we have mastered gradient descent, we can use it in many different settings, and we will apply it to many different machine learning problems. So congratulations on successfully learning your first machine learning algorithm.

 
