1. Simple regression problem

1. Gradient descent
1. Prediction function
Here is a set of sample points, whose horizontal and vertical coordinates represent a pair of causally related variables.


Our task is to design an algorithm that lets the machine fit these data and work out the parameter w for us.


We can start from a randomly chosen straight line through the origin and compute every point's deviation from the line (that is, the error).


Then the slope w of the line is adjusted according to the size of the error; here y = w·x is the prediction function.

2. Loss function / cost function
For a point (x1, y1), the error is e1 = y1 − w·x1

Here we use the squared error, as in ordinary least squares (OLS):

e1² = (y1 − w·x1)²

Write out the squared error for every point; here the xi, yi and the number of points n are all known numbers:

e1² = (y1 − w·x1)² = x1²·w² − 2·x1·y1·w + y1², and likewise for e2², …, en²

Add them up, take the average, and collect like terms in w:

loss = (1/n)·Σ(yi − w·xi)² = [(1/n)·Σxi²]·w² − [(2/n)·Σxi·yi]·w + (1/n)·Σyi²

which can be expressed as

loss = a·w² + b·w + c, where a = (1/n)·Σxi² > 0

That is the cost function (also called the loss function); since a > 0, its graph is a parabola opening upwards.

This completes the mapping from the prediction function to the cost function: as w increases in the prediction function, the corresponding point on the cost curve moves to the right.
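To make this concrete, here is a minimal sketch (the sample points and the helper name cost are made up for illustration, not the tutorial's dataset) that evaluates the loss at several values of w and traces out the parabola:

import numpy as np

# made-up sample points that roughly follow y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def cost(w):
    # mean squared error of the prediction y_hat = w * x
    return np.mean((y - w * x) ** 2)

for w in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(f"w={w:.1f}  cost={cost(w):.4f}")
# the cost is large at w=0, smallest near w=2, then grows again:
# a parabola in w, exactly as derived above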


3. Gradient calculation
Our goal is to minimize the loss, i.e. to find the value of the parameter w at which the parabola reaches its minimum.
Starting from any point on the curve, the process of walking down to the lowest point is gradient descent.
The direction of descent is the direction opposite to the gradient, which is the direction of steepest descent.

The gradient is the derivative of the cost function:

gradient = d(loss)/dw = 2·a·w + b
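To check this concretely, here is a small sketch (same made-up points as above, with illustrative helper names) that compares the analytic derivative with a numerical finite-difference estimate of the cost function's slope:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def cost(w):
    return np.mean((y - w * x) ** 2)

def gradient(w):
    # d(loss)/dw = (2/n) * sum((w*x_i - y_i) * x_i)
    return np.mean(2 * (w * x - y) * x)

w = 0.5
eps = 1e-6
numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)
print(gradient(w), numeric)  # the two values agree closely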

4. Learning rate
How much of the error is used in each parameter update is controlled by a parameter called the learning rate, also known as the step size.
Choosing a good learning rate matters because it determines whether we converge quickly to the global minimum:
a learning rate that is too small needs many updates to reach the bottom, takes a long time, and in more general problems can easily get stuck in a local minimum;
a learning rate that is too large causes drastic updates that may keep circling the global minimum without ever converging to it;
a well-chosen learning rate reaches the bottom quickly (see the sketch after the update rule below).


Each update: new w = old w − slope × learning rate,
where slope = f′(w), the derivative of the cost function, i.e. the gradient.

Loop iteration: define the cost function → choose a starting point → compute the gradient → step forward according to the learning rate → compute the gradient → step forward according to the learning rate → … until the lowest point is reached.
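Here is a minimal sketch of this loop (made-up points and illustrative helper names, not the tutorial's dataset), run with three different learning rates to show the behaviours described above:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def gradient(w):
    # derivative of the mean squared error with respect to w
    return np.mean(2 * (w * x - y) * x)

def descend(lr, steps=100):
    w = 0.0                       # starting point on the curve
    for _ in range(steps):
        w = w - lr * gradient(w)  # step against the gradient
    return w

for lr in [0.001, 0.05, 0.15]:
    print(f"lr={lr}: w after 100 steps = {descend(lr):.4f}")
# lr=0.001 is still short of the optimum after 100 steps,
# lr=0.05 converges to w ≈ 2, and lr=0.15 overshoots and diverges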

2. Linear Regression
Linear regression is a commonly used prediction method in statistics and machine learning for modeling a linear relationship between an independent variable (also called a feature) and a dependent variable. It assumes the independent and dependent variables are linearly related and makes predictions by fitting a best-fit line (or hyperplane). The goal of linear regression is to find that best-fit line or hyperplane by minimizing the difference (the error, or residual) between the predicted values and the actual observed values.

In simple linear regression there is a single independent variable and a single dependent variable, and the relationship between them can be expressed as the equation of a straight line: y = w·x + b.
From this prediction function we get the error e = (w·x + b − y)² for each point; summing (and averaging) these squared errors over all points gives the loss function:

loss = (1/N)·Σ(w·xi + b − yi)²

The goal is to find the values of w and b that minimize this loss (error).


For a system of two linear equations in two unknowns, we can use elimination to solve exactly for the parameters b and w. A solution that can be computed exactly like this is called a closed-form solution.
Real data, however, contain errors, so we can only obtain an approximate solution: the actual relationship is y = w·x + b + ε, where ε is called Gaussian noise, and it is this noise that makes the data imperfect.
With many pairs of x and y data, the estimate can get closer and closer to the closed-form solution.
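The following sketch (entirely synthetic data, with assumed true parameters w = 1.5 and b = 0.9) illustrates both points: two noise-free points determine w and b exactly, while under Gaussian noise a least-squares fit tends to approach the true parameters as the number of samples grows:

import numpy as np

rng = np.random.default_rng(0)
true_w, true_b = 1.5, 0.9   # assumed ground truth for this demo

# noise-free case: two points determine w and b exactly (closed-form solution)
x2 = np.array([1.0, 3.0])
y2 = true_w * x2 + true_b
A = np.column_stack([x2, np.ones(2)])
w_exact, b_exact = np.linalg.solve(A, y2)
print(w_exact, b_exact)     # recovers 1.5 and 0.9

# noisy case: y = w*x + b + eps; more data brings the estimate closer
for n in [10, 100, 10000]:
    x = rng.uniform(0, 10, n)
    y = true_w * x + true_b + rng.normal(0, 1, n)  # Gaussian noise eps
    w_hat, b_hat = np.polyfit(x, y, 1)             # least-squares line fit
    print(n, round(w_hat, 3), round(b_hat, 3))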

The following code uses gradient descent to fit the two parameters of this linear model.

Data download: Extraction code: zn73
Each row of the data set represents one data point: the first column is the independent variable xi, and the second column is the dependent variable yi.


1. Calculate the error function of the linear regression model
The code iterates over the data points, computing for each one the error between the regression model's prediction and the actual observed value. The square of each error is added to the running total totalError. Finally, the total error is divided by the number of data points to obtain the average error, which is returned.


The indexing operation points[i,0] retrieves the value of the independent variable x for the i-th data point, since it sits in the first column of each row (index 0); similarly, points[i,1] retrieves the value of the dependent variable y, since it sits in the second column (index 1).
b: intercept of the regression model.
w: slope of the regression model.
points: the collection of data points, where each data point consists of an independent variable x and a dependent variable y.

def compute_error_for_line_given_points(b,w,points):
    totalError=0
    for i in range(0,len(points)):
        x=points[i,0]  # independent variable of the i-th point
        y=points[i,1]  # dependent variable of the i-th point
        totalError+=(y-(w*x+b))**2  # squared error of the prediction w*x+b
    return totalError/float(len(points))  # average (mean squared) error
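As a quick sanity check (a tiny made-up array standing in for the CSV data), points lying exactly on y = 2x + 1 should give zero error for the matching parameters:

import numpy as np

# two points that lie exactly on the line y = 2x + 1
pts = np.array([[1.0, 3.0],
                [2.0, 5.0]])
print(compute_error_for_line_given_points(b=1, w=2, points=pts))  # 0.0
print(compute_error_for_line_given_points(b=0, w=2, points=pts))  # 1.0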

2. Parameter update in gradient descent
First the intercept gradient b_gradient and the slope gradient w_gradient are initialized to 0. Then, iterating over the data points, the gradient contribution of each point is computed for the coming update: for each point, a prediction is made from the current intercept and slope, and the gradient is derived from the error between that prediction and the actual observed value. The contributions of all data points are accumulated and divided by the number of points N, giving the average gradient. The intercept and slope are then updated with the gradient-descent rule: from the current values, subtract the learning rate times the corresponding gradient to obtain the new intercept new_b and new slope new_w. Finally the updated intercept and slope are returned as a list.

b_current: current intercept value.
w_current: current slope value.
points: A collection of data points, where each data point consists of an independent variable x and a dependent variable y.
learningRate: Learning rate, used to control the step size of each update.

def step_gradient(b_current,w_current,points,learningRate):
    b_gradient=0
    w_gradient=0
    N=float(len(points))
    for i in range(0,len(points)):
        x=points[i,0]
        y=points[i,1]
        b_gradient+=(2*(w_current*x+b_current-y))/N  # partial derivative of the loss w.r.t. b
        w_gradient+=(2*(w_current*x+b_current-y)*x)/N  # partial derivative of the loss w.r.t. w
    new_b=b_current-learningRate*b_gradient
    new_w=w_current-learningRate*w_gradient
    return [new_b,new_w]
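A single call should move the parameters toward the data and reduce the error (again with a tiny made-up array):

import numpy as np

pts = np.array([[1.0, 3.0],
                [2.0, 5.0]])  # points on y = 2x + 1
b, w = 0.0, 0.0
print(compute_error_for_line_given_points(b, w, pts))        # 17.0
b, w = step_gradient(b, w, pts, learningRate=0.1)
print(b, w, compute_error_for_line_given_points(b, w, pts))  # error drops to 1.685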

3. The main loop part of the gradient descent algorithm
points: a collection of data points, where each data point consists of an independent variable x and a dependent variable y.
starting_b: initial intercept value.
starting_w: The initial slope value.
learning_rate: Learning rate, used to control the step size of each update.
num_iterations: The number of iterations, indicating the number of steps to run gradient descent.

import numpy as np
def gradient_descent_runner(points,starting_b,starting_w,learning_rate,num_iterations):
    b=starting_b
    w=starting_w
    for i in range(num_iterations):
        b,w=step_gradient(b,w,np.array(points),learning_rate)
    return [b,w]  # return the result of the last iteration, i.e. the final parameters

4. Run

def run():
    points=np.genfromtxt("data.csv",delimiter=",")  # data.csv更换为文件的存放地址
    learning_rate=0.0001
    initial_b=0
    initial_w=0
    num_iterations=1000
    print("Starting gradient descent at b={0},w={1},error={2}".format(initial_b,initial_w,compute_error_for_line_given_points(initial_b,initial_w,points)))
    [b,w]=gradient_descent_runner(points,initial_b,initial_w,learning_rate,num_iterations)
    print("After {0} interations b={1},w={2},error={3}".format(num_iterations,b,w,compute_error_for_line_given_points(b,w,points)))

Here np.genfromtxt is a function in the NumPy library that loads data from a text file and produces a NumPy array. It can handle text data in various formats, including comma-separated value (CSV) files and files with other delimiters.
Example: given a CSV file named 'data.csv' whose values are separated by commas, np.genfromtxt loads the file's contents into a NumPy array stored in the variable data, and print(data) prints the loaded data.

import numpy as np
# load data from the CSV file named 'data.csv'
data = np.genfromtxt('data.csv', delimiter=',')
# print the loaded data
print(data)

Full code

import numpy as np
def compute_error_for_line_given_points(b,w,points):
    totalError=0
    for i in range(0,len(points)):
        x=points[i,0]
        y=points[i,1]
        totalError+=(y-(w*x+b))**2
    return totalError/float(len(points))
def step_gradient(b_current,w_current,points,learningRate):
    b_gradient=0
    w_gradient=0
    N=float(len(points))
    for i in range(0,len(points)):
        x=points[i,0]
        y=points[i,1]
        b_gradient+=(2*(w_current*x+b_current-y))/N
        w_gradient+=(2*(w_current*x+b_current-y)*x)/N
    new_b=b_current-learningRate*b_gradient
    new_w=w_current-learningRate*w_gradient
    return [new_b,new_w]
def gradient_descent_runner(points,starting_b,starting_w,learning_rate,num_iterations):
    b=starting_b
    w=starting_w
    for i in range(num_iterations):
        b,w=step_gradient(b,w,np.array(points),learning_rate)
    return [b,w]
def run():
    points=np.genfromtxt("D:/Deep-Learning-with-PyTorch-Tutorials/lesson04-简单回归案例实战/data.csv",delimiter=",")
    learning_rate=0.0001
    initial_b=0
    initial_w=0
    num_iterations=1000
    print("Starting gradient descent at b={0},w={1},error={2}".format(initial_b,initial_w,compute_error_for_line_given_points(initial_b,initial_w,points)))
    [b,w]=gradient_descent_runner(points,initial_b,initial_w,learning_rate,num_iterations)
    print("After {0} interations b={1},w={2},error={3}".format(num_iterations,b,w,compute_error_for_line_given_points(b,w,points)))
run()
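The per-point Python loop above mirrors the math closely but is slow on large arrays. A vectorized NumPy version of the same two functions (a sketch with equivalent behavior, not part of the original tutorial; the _vectorized names are illustrative) could look like this:

import numpy as np

def compute_error_vectorized(b, w, points):
    x, y = points[:, 0], points[:, 1]
    return np.mean((y - (w * x + b)) ** 2)

def step_gradient_vectorized(b, w, points, learning_rate):
    x, y = points[:, 0], points[:, 1]
    residual = w * x + b - y
    b_gradient = 2 * np.mean(residual)      # partial derivative w.r.t. b
    w_gradient = 2 * np.mean(residual * x)  # partial derivative w.r.t. w
    return [b - learning_rate * b_gradient, w - learning_rate * w_gradient]

Here np.mean over the residual vector replaces the explicit accumulation and division by N.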

Running result: the program prints the initial b, w and error, followed by the final b, w and error after 1000 iterations of gradient descent.

Origin blog.csdn.net/weixin_45825865/article/details/131272825