Linear Regression: One of the Ten Classic Machine Learning Algorithms

Linear regression addresses continuous-value prediction: given an input x and model parameters θ, we want the model's output to approximate the true value y as closely as possible.

Here is a simple example of continuous value prediction:

y = w * x + b

Suppose we observe two (x, y) pairs:

1.567 = w * 1 + b

3.043 = w * 2 + b

With two equations and two unknowns, w and b can be solved exactly by elimination: w = 1.476, b = 0.091, close to the true parameters w = 1.477 and b = 0.089 used to generate these points.
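This two-equation system can also be checked numerically; here is a small sketch using numpy (an illustration, not part of the original post):

```python
import numpy as np

# The two observations written as a linear system A @ [w, b] = y:
#   1.567 = w * 1 + b
#   3.043 = w * 2 + b
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
y = np.array([1.567, 3.043])

w, b = np.linalg.solve(A, y)
print(w, b)  # w ≈ 1.476, b ≈ 0.091
```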

In real life, however, an exact solution is usually out of reach. First, the true model equation is unknown and the collected data carry some bias; second, the data we observe are noisy. We therefore add a noise term ε to the model:

y = w * x + b + ε, where we assume ε ~ N(0, 1), i.e. ε follows a Gaussian distribution with mean 0 and variance 1.

That is, most noise values are concentrated near 0, and values farther from 0 occur less and less often.
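This concentration near 0 is easy to confirm by sampling; a short numpy sketch (an illustration, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(loc=0.0, scale=1.0, size=100_000)  # eps ~ N(0, 1)

# For a standard normal, roughly 68% of samples lie within one standard
# deviation of the mean and roughly 95% within two.
print(f"within +/-1: {np.mean(np.abs(eps) < 1):.2%}")
print(f"within +/-2: {np.mean(np.abs(eps) < 2):.2%}")
```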

With this noise model, the solution process above becomes:

1.567 = w * 1 + b + eps

3.043 = w * 2 + b + eps

4.519 = w * 3 + b + eps

To find suitable values of w and b, we need to observe more data; by iterating over many sets of observations we can obtain the w and b that perform best overall.

So, how do we solve the two parameters w and b?

Here we introduce the concept of a loss function: a measure of the error between the true values and the model's predictions. For linear regression we use the mean squared error over the N observations:

loss = (1/N) * Σ (w * x_i + b - y_i)^2

The best-performing w and b are precisely the ones that minimize this loss, which sums up the error contributed by each observation.

Therefore, we have transformed the problem of estimating model parameters w and b into a problem of minimizing the loss function.

Next, we use the gradient descent algorithm to determine the model parameters w and b. Without giving a full treatment of gradient descent here: the gradient can be understood as the derivative of the function, and it points in the direction in which the function's value increases. For example:

Take an objective function f(x). At any point, the derivative points in the direction in which f increases, i.e. toward larger function values. To minimize the loss function, we therefore update the model parameters in the direction opposite to the gradient, taking a fixed step size (the learning rate) at each update. Repeating this iteration drives the loss toward its minimum, and the w and b at that minimum are the model parameters we want.
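The idea can be sketched in one dimension (an illustrative example, not from the original post): minimizing f(x) = x^2, whose derivative is 2x, by repeatedly stepping against the derivative:

```python
def gradient_descent_1d(x, learning_rate=0.1, num_iterations=100):
    """Minimize f(x) = x**2 by stepping against its derivative f'(x) = 2x."""
    for _ in range(num_iterations):
        grad = 2 * x                   # gradient points toward increasing f
        x = x - learning_rate * grad   # so we move the opposite way
    return x

print(gradient_descent_1d(10.0))  # converges toward the minimum at x = 0
```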

Therefore, we need the partial derivatives of the loss with respect to w and b:

∂loss/∂w = (2/N) * Σ x_i * (w * x_i + b - y_i)

∂loss/∂b = (2/N) * Σ (w * x_i + b - y_i)

and the parameters are updated against the gradient:

w ← w - learning_rate * ∂loss/∂w

b ← b - learning_rate * ∂loss/∂b

Python implementation:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Seven'

import numpy as np


# y = wx + b
def calculate_loss_function(w, b, points):
    total_error = 0
    for i in range(len(points)):
        x = points[i, 0]
        y = points[i, 1]
        total_error += ((w * x + b) - y) ** 2
    return total_error / float(len(points))


def step_gradient(w_current, b_current, points, learning_rate):
    w_gradient = 0
    b_gradient = 0
    N = float(len(points))
    for i in range(len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # ∂loss/∂w = (2/N) * Σ x * (w*x + b - y)
        w_gradient += 2 / N * x * ((w_current * x + b_current) - y)
        # ∂loss/∂b = (2/N) * Σ (w*x + b - y)
        b_gradient += 2 / N * ((w_current * x + b_current) - y)

    new_w = w_current - learning_rate * w_gradient
    new_b = b_current - learning_rate * b_gradient
    return [new_w, new_b]


def gradient_descent_runner(starting_w, starting_b, learning_rate, num_iterations, points):
    w = starting_w
    b = starting_b
    for i in range(num_iterations):
        w, b = step_gradient(w, b, points, learning_rate)
    return [w, b]


def run():
    # Build synthetic data with per-point Gaussian noise, then fit y = 1.477x + 0.089
    x = np.random.uniform(0, 100, 100)
    y = 1.477 * x + 0.089 + np.random.normal(0, 1, 100)
    points = np.array([[i, j] for i, j in zip(x, y)])
    learning_rate = 0.0001
    initial_b = 0
    initial_w = 0
    num_iterations = 1000
    print(f'Initial loss: {calculate_loss_function(initial_w, initial_b, points)}, w={initial_w}, b={initial_b}')
    w, b = gradient_descent_runner(initial_w, initial_b, learning_rate, num_iterations, points)
    print(f'After {num_iterations} iterations, loss: {calculate_loss_function(w, b, points)}, w={w}, b={b}')


if __name__ == '__main__':
    run()

 

Running the script shows that after 1000 iterations, the value of w is about 1.49 and the value of b is about 0.08, against true values of w = 1.477 and b = 0.089. The fitted parameters are very close to the real ones.
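As a sanity check (not part of the original post), the gradient-descent estimate can be compared against the closed-form least-squares fit, for example via np.polyfit on the same kind of synthetic data:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 100, 100)
y = 1.477 * x + 0.089 + rng.normal(0, 1, 100)  # one noise sample per point

# A degree-1 polynomial fit minimizes the same squared error in closed form.
w_fit, b_fit = np.polyfit(x, y, deg=1)
print(w_fit, b_fit)  # should land close to 1.477 and 0.089
```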


Origin blog.csdn.net/gf19960103/article/details/104655278