[Ch04-02] Solving linear regression with gradient descent

This post is part of a series; the original is maintained by the author on GitHub: https://aka.ms/beginnerAI .
Don't be stingy with a star: the more stars, the harder the author works.

4.2 Gradient descent

With the least squares method as a benchmark, we now solve for w and b with gradient descent, so the results of the two methods can be compared.

4.2.1 Mathematical principle

In the equations below, x is the feature value of a sample (single feature), y is the label value of the sample, z is the predicted value, and the subscript \(i\) denotes one particular sample.

Hypothesis function

It is a linear function:

\[z_i = x_i \cdot w + b \tag{1}\]

Loss function

We use the mean squared error function:

\[loss(w,b) = \frac{1}{2} (z_i-y_i)^2 \tag{2}\]

Comparing with the least squares method, we can see that gradient descent and least squares share the same model and the same loss function: a linear model plus a mean squared error loss. The model is used to fit the data, and the loss function is used to evaluate the quality of the fit.

The difference is that the least squares method differentiates the loss function and obtains the analytical solution directly, whereas gradient descent, like the neural networks that come later, uses derivatives to propagate the error and then approaches the solution step by step, approximately, in an iterative manner. A small contrast of the two approaches is sketched below.
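To make that contrast concrete, here is a minimal sketch. It is my own illustration, not the book's code: the synthetic NumPy data, the random seed, and the learning rate eta = 0.1 are all assumptions for demonstration. It first solves w and b analytically (the least squares way), then approaches them iteratively with the per-sample updates that are derived in the next section.

import numpy as np

# Synthetic single-feature data (illustrative only): y is roughly 2x + 3 plus noise
np.random.seed(4)
x = np.random.rand(200)
y = 2.0 * x + 3.0 + np.random.randn(200) * 0.1

# Least squares: differentiate the loss and solve analytically in one shot
w_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_ls = y.mean() - w_ls * x.mean()

# Gradient descent: propagate the error sample by sample and iterate
eta, w_gd, b_gd = 0.1, 0.0, 0.0
for xi, yi in zip(x, y):
    zi = xi * w_gd + b_gd        # prediction
    dz = zi - yi                 # error
    w_gd -= eta * dz * xi        # step w against its gradient
    b_gd -= eta * dz             # step b against its gradient

print("least squares:    w=%.4f b=%.4f" % (w_ls, b_ls))
print("gradient descent: w=%.4f b=%.4f" % (w_gd, b_gd))

The two results will not be identical after a single pass over the data, which previews the discussion at the end of this section.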

4.2.2 Gradient calculation

Computing the gradient of z

From Equation 2:
\[ {\partial loss \over \partial z_i}=z_i - y_i \tag{3} \]

Computing the gradient of w

We use the value of loss as the measure of error. By finding the influence of w on it, that is, the partial derivative of loss with respect to w, we obtain the gradient of w. Since loss is linked to w only indirectly, through Equation 2 -> Equation 1, we apply the chain rule and differentiate with respect to a single sample.

From Equation 1 and Equation 3:

\[ {\partial{loss} \over \partial{w}} = \frac{\partial{loss}}{\partial{z_i}}\frac{\partial{z_i}}{\partial{w}}=(z_i-y_i)x_i \tag{4} \]

Computing the gradient of b

\[ \frac{\partial{loss}}{\partial{b}} = \frac{\partial{loss}}{\partial{z_i}}\frac{\partial{z_i}}{\partial{b}}=z_i-y_i \tag{5} \]
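As a quick sanity check on Equations 3-5, the sketch below (my own illustration; the sample value, label, and parameter values are made up) compares the analytic gradients with central finite-difference approximations of the loss in Equation 2. The two should agree to several decimal places.

xi, yi = 1.5, 4.0                      # one made-up sample
w, b = 1.0, 0.5                        # current parameter values (arbitrary)

def loss(w, b):
    z = xi * w + b                     # Equation 1
    return 0.5 * (z - yi) ** 2         # Equation 2

z = xi * w + b
dz = z - yi                            # Equation 3: dloss/dz
dw = dz * xi                           # Equation 4: dloss/dw
db = dz                                # Equation 5: dloss/db

eps = 1e-6                             # finite-difference step size
dw_num = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
db_num = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
print("dw analytic=%.6f numeric=%.6f" % (dw, dw_num))   # both about -3.0
print("db analytic=%.6f numeric=%.6f" % (db, db_num))   # both about -2.0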

4.2.3 Code implementation

if __name__ == '__main__':

    reader = SimpleDataReader()
    reader.ReadData()
    X,Y = reader.GetWholeTrainSamples()

    eta = 0.1
    w, b = 0.0, 0.0
    for i in range(reader.num_train):
        # get x and y value for one sample
        xi = X[i]
        yi = Y[i]
        # Equation 1
        zi = xi * w + b
        # Equation 3
        dz = zi - yi
        # Equation 4
        dw = dz * xi
        # Equation 5
        db = dz
        # update w,b
        w = w - eta * dw
        b = b - eta * db

    print("w=", w)    
    print("b=", b)

As you can see, the code follows the derived formulas exactly: the famous gradient descent really amounts to turning the derivation results into mathematical formulas and code and placing them directly inside the iterative process! Note also that we never compute the value of the loss function itself; it enters only through the derived formulas.
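If you do want to watch the loss shrink, it can be evaluated purely for monitoring. The variant below is an illustrative sketch, not the book's code: it assumes the same reader, X, Y, eta, w, and b set up in the listing above, performs the same updates, and prints the Equation 2 loss for every 50th sample.

for i in range(reader.num_train):
    xi, yi = X[i], Y[i]
    zi = xi * w + b                    # Equation 1
    loss_i = 0.5 * (zi - yi) ** 2      # Equation 2, evaluated only for display
    dz = zi - yi                       # Equation 3
    w = w - eta * dz * xi              # Equation 4
    b = b - eta * dz                   # Equation 5
    if i % 50 == 0:
        print("sample", i, "loss =", loss_i)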

4.2.4 Running results

w= [1.71629006]
b= [3.19684087]

Readers may notice that the result above differs considerably from that of the least squares method (w1 = 2.056827, b1 = 2.965434); we leave this question to be resolved later in this chapter.

Code location

ch04, Level2

Origin www.cnblogs.com/woodyh5/p/11988496.html