Python Neural Network Learning (4)--Machine Learning--Linear Regression

foreword

I finally feel that I understand this chapter deeply enough to write this article, and I have put together a decent code implementation for your reference. Thanks to everyone for waiting so long.

linear regression

What is linear regression?

Linear regression is often used for predicting continuous values. The most classic example: suppose the salary level depends only on the number of working years. We then want to find a straight line that, although it cannot pass through every sample point, keeps the overall error as small as possible, and use it to make predictions, as shown in the figure below:

Once we have this straight line, we can plug in a number of working years and compute an approximate salary. It may not exactly match the actual salary, but it gives a rough estimate.

linear model

It is easy to see that the regression line is a straight line, and we all know the equation of a straight line by heart:

y = a*x + b

Or we could write it like this:

y = a * x + b * 1

In this example, according to the figure, x is the number of working years and y is the salary level.

The goal is to find a straight line y = a*x + b that predicts the salary level as accurately as possible from the number of working years.
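As a purely made-up example (the numbers are not from any real data): if a = 2 and b = 4, with salary measured in thousands, then someone with 3 years of working experience would be predicted to earn y = 2*3 + 4 = 10, i.e. about ten thousand.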

analysis

Now let's analyze this formula. Before that, a question: in the formula above, what is known and what is unknown?

When fitting this linear regression model, we compute the target straight line from known samples of the form (input, output), i.e. (working years, salary level). The target line's slope and intercept, however, are not known in advance; they have to be obtained by calculation. So (x, y) is known and (a, b) is unknown. Make sure you are clear on this point.

Now that we have figured out what is known and what is unknown, it is time to work out how to compute the two unknowns.

The goal is to find a straight line that is as accurate as possible, which means that for all samples, the error should be as small as possible .

Suppose that, for the given data, the ideal straight line is y = a*x + b. Following the earlier idea: first initialize a random straight line, then gradually correct its parameters according to the error, and finally take the nearly optimal parameters as the final ones.

Randomly initialize a straight line: h = \hat{a} * x + \hat{b}. For this random line, inputting a number of working years yields a predicted salary level h; this h differs from the true y, so there is an error.

Then, the single-sample error is: loss = h - y.

To compute the error over all samples, we could simply add up the per-sample errors, but then positive and negative errors would cancel each other out. There are two ways to eliminate the sign:

1. Absolute value.

2. Square.

For the convenience of differentiation, we choose method 2 and square the error, that is (assuming a total of n samples):

Loss(a, b) = \sum_{i=1}^{n} (h_{i} - y_{i})^2 = \sum_{i=1}^{n}(a*x_{i}+b-y_{i})^2

The a and b here are the \hat{a} and \hat{b} in h.
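To make the formula concrete, here is a minimal Python sketch of Loss(a, b) (the sample arrays are the same toy data used in the full code later, and a = 1, b = 1 is just an arbitrary trial line):

import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.5, 2.1, 2.3, 3.0, 3.3])  # working years (toy data)
y = np.array([5.0, 5.6, 5.3, 6.0, 7.0, 6.8, 9.1, 10.5])  # salary level (toy data)

def loss(a, b):
    # Loss(a, b) = sum over i of (a*x_i + b - y_i)^2
    return ((a * x + b - y) ** 2).sum()

print(loss(1, 1))  # total squared error of the trial line y = 1*x + 1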

Let's review our goal once more: find a straight line that is as accurate as possible. In other words, the total error, Loss(a, b), must be minimized.

The problem has now become one of finding the extremum of a function of two variables: find the minimum of Loss(a, b). Following the usual procedure, we take the partial derivatives with respect to a and b (treating x and y as constants). The results are:

\frac{\partial Loss}{\partial a} = 2\sum_{i=1}^{n}(a*x_{i}+b-y_{i}) * x_{i} ,\quad \frac{\partial Loss}{\partial b} = 2\sum_{i=1}^{n}(a*x_{i}+b-y_{i})
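As a quick sanity check (my own addition, not part of the original derivation), you can compare these analytic partial derivatives against finite-difference approximations on the toy data; the two should agree to several decimal places:

import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.5, 2.1, 2.3, 3.0, 3.3])
y = np.array([5.0, 5.6, 5.3, 6.0, 7.0, 6.8, 9.1, 10.5])

def loss(a, b):
    return ((a * x + b - y) ** 2).sum()

a, b, eps = 1.0, 1.0, 1e-6
# Analytic partial derivatives from the formulas above
dloss_da = 2 * ((a * x + b - y) * x).sum()
dloss_db = 2 * (a * x + b - y).sum()
# Central finite-difference approximations
num_da = (loss(a + eps, b) - loss(a - eps, b)) / (2 * eps)
num_db = (loss(a, b + eps) - loss(a, b - eps)) / (2 * eps)
print(dloss_da, num_da)  # the two numbers in each pair should nearly match
print(dloss_db, num_db)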

After derivation, there are two methods at this time:

1. Least squares method

With the least squares method, we set the two partial derivatives above to 0 and solve the resulting equations simultaneously to obtain a and b, so the straight line is found by direct calculation. It is essentially just solving a system of equations, so it is only mentioned here in passing.
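For completeness, here is a rough sketch (my own addition) of what that direct solution looks like in code: setting both partial derivatives to zero and solving the resulting 2x2 system gives the standard closed-form expressions for a and b:

import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.5, 2.1, 2.3, 3.0, 3.3])
y = np.array([5.0, 5.6, 5.3, 6.0, 7.0, 6.8, 9.1, 10.5])
n = len(x)

# Closed-form least squares solution of the 2x2 normal equations
a = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x * x).sum() - x.sum() ** 2)
b = y.mean() - a * x.mean()
print(a, b)  # should be very close to what gradient descent finds below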

2. Gradient descent method (the reasoning below is only meant to aid understanding and has not been verified by a professional; if you want the most rigorous derivation, consult Baidu Encyclopedia or a textbook.)

Some readers may not know what a gradient is. Here is the simplest way I can put it; it may not match the formal definition, but it helps with intuition:

Differentiating a function of one variable gives a slope; differentiating a function of two or more variables gives a gradient. You can think of it this way: the gradient is the high-dimensional analogue of the slope, just as "perpendicular" in the plane becomes "orthogonal" in higher dimensions.

Let's pause for a moment and review the single-variable case: y = f(x) = x^2 + 2. This f(x) is a concave function (unless otherwise specified, concavity/convexity in this article follows the definitions used in the 2021 Chinese postgraduate entrance examination mathematics, as of September 12, 2021), so it attains a minimum somewhere. Suppose the minimum is attained at x_{0}; then the derivative there is 0, that is:

f'(x_{0}) = 0. But we are not going to solve for x_{0} directly; that is the least squares approach. We are talking about gradient descent now, so we need some means of arriving at this point through an iterative procedure.

Suppose we start from some point x_{1}. We do not know whether this point is the minimum, but the slope at this point is particularly easy to find:

k = f'(x_{1})

If k = 0, everyone is happy: this x_{1} is exactly the minimum point we are looking for.

If k \neq 0, suppose k > 0; then the curve is sloping upward at this point:

Clearly x_{1} is then to the right of the minimum, and the next point x_{2} needs to move a little to the left, but not too much.

That is to say: x_{2} = x_{1} - \alpha k = x_{1} - \alpha f'(x_{1})

At this point we can already see the prototype of the gradient descent method.

Generalizing further, we arrive at a very simple formula:

x_{n+1} = x_{n} - \alpha f'(x_{n})

That is, given an initial x, after a certain number of iterations it gradually approaches the minimum.
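Here is a minimal sketch of this iteration applied to f(x) = x^2 + 2 (the starting point, learning rate, and number of iterations are arbitrary choices for illustration):

def f_prime(x):
    # derivative of f(x) = x**2 + 2
    return 2 * x

x = 5.0       # arbitrary starting point
alpha = 0.1   # learning rate
for _ in range(100):
    x = x - alpha * f_prime(x)  # x_{n+1} = x_n - alpha * f'(x_n)

print(x)  # very close to 0, where f attains its minimum value of 2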

That covers the single-variable case; now back to the two-variable case, with a quick review:

The loss function for the error: Loss(a, b) = \sum_{i=1}^{n}(a*x_{i}+b-y_{i})^2

Partial derivative with respect to a: \frac{\partial Loss}{\partial a} = 2\sum_{i=1}^{n}(a*x_{i}+b-y_{i}) * x_{i}

Partial derivative with respect to b: \frac{\partial Loss}{\partial b} = 2\sum_{i=1}^{n}(a*x_{i}+b-y_{i})

A partial derivative is the derivative taken along one particular direction. Holding one parameter fixed at a time, you can view the loss as Loss(a) or Loss(b), so the reasoning from the single-variable case carries over.

Then we obtain the update rules for a and b:

a_{n+1} = a_{n} - \alpha \frac{\partial Loss}{\partial a_{n}}, \quad b_{n+1} = b_{n} - \alpha \frac{\partial Loss}{\partial b_{n}}

The \alpha in front can be regarded as a learning rate. Using these two formulas, the near-optimal parameters of the target straight line can be computed step by step.

 Code

The code implementation is very simple, and the comments are all in the code; I think they are written clearly enough. I have simplified every step that could be simplified, and I hope everyone can follow it. If anything is unclear, feel free to leave a message in the comment area.

# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt

def main():
    """Linear regression via gradient descent"""
    # A hand-made set of sample points; numpy arrays support elementwise arithmetic
    x = np.array([0.5, 0.7, 1.0, 1.5, 2.1, 2.3, 3.0, 3.3])
    y = np.array([5.0, 5.6, 5.3, 6.0, 7.0, 6.8, 9.1, 10.5])

    # Initial parameters of y = a*x + b
    a, b = 1, 1
    times = 10000  # number of training iterations
    learning_rate = 0.001

    for i in range(times):  # start training
        # Compute the two partial derivatives of Loss with respect to a and b
        dloss_da = 2 * ((a * x + b - y) * x).sum()
        dloss_db = 2 * (a * x + b - y).sum()

        # Update the parameters according to the update rules
        a = a - learning_rate * dloss_da
        b = b - learning_rate * dloss_db

    # Points on the final line corresponding to x
    final_y = a * x + b
    # Scatter plot of the samples
    plt.scatter(x, y, label="test data point")
    plt.plot(x, final_y, label='final regression line')  # regression line after training
    plt.legend()  # attach the labels to the figure
    plt.show()  # display the figure

if __name__ == "__main__":
    main()

 The final effect is shown in the figure: 
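If you want to double-check the result (my own suggestion, not part of the original post), NumPy's polyfit fits the same least squares line directly, and its a and b should be nearly identical to those found by the gradient descent loop:

import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.5, 2.1, 2.3, 3.0, 3.3])
y = np.array([5.0, 5.6, 5.3, 6.0, 7.0, 6.8, 9.1, 10.5])

a_ref, b_ref = np.polyfit(x, y, 1)  # degree-1 polynomial fit, i.e. the least squares line
print(a_ref, b_ref)  # compare with the a and b learned above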

Notice

When using gradient descent, pay attention to the size of \alpha. If it is too large, a single step can jump from the current value right past the minimum to the other side, as follows:

In that case, consider reducing \alpha. In practice, the largest \alpha I have ever used is 0.5. I have never tried \alpha = 0.000001 and don't know whether it would work. Tune it to the actual situation; if you are unsure, start small and increase it gradually.
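To see this effect for yourself, you can rerun the training loop with different learning rates and compare the final loss; in this sketch (my own addition, using the same toy data) the small rate should shrink the loss while the large rate should make it explode:

import numpy as np

x = np.array([0.5, 0.7, 1.0, 1.5, 2.1, 2.3, 3.0, 3.3])
y = np.array([5.0, 5.6, 5.3, 6.0, 7.0, 6.8, 9.1, 10.5])

for learning_rate in (0.001, 0.5):
    a, b = 1.0, 1.0
    for _ in range(20):
        dloss_da = 2 * ((a * x + b - y) * x).sum()
        dloss_db = 2 * (a * x + b - y).sum()
        a -= learning_rate * dloss_da
        b -= learning_rate * dloss_db
    final_loss = ((a * x + b - y) ** 2).sum()
    print(learning_rate, final_loss)  # 0.001: loss drops; 0.5: loss blows up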

 conclusion

The gap between this post and the previous one is indeed quite long, but the series will not be interrupted, because I want to explain things to everyone in the most concise way possible. Where my explanation may be imprecise, I have marked it and left it for you to explore on your own once you have a deeper understanding and enough mathematical background (for example, the derivation of the gradient descent algorithm: the real derivation is certainly not like this, but I think this is the easiest way to understand it, and the only prerequisite is knowing how to take derivatives).

I hope everyone has learned something. That's all for this post; see you in the next one! Feel free to leave a message in the comment area.


Origin blog.csdn.net/qq_38431572/article/details/120247790