Getting Started with PyTorch 3: Gradient Propagation + Linear Regression

In the previous blog, Getting Started with PyTorch 2: Playing with Tensors (Viewing, Extracting, Transforming), I introduced the basic data transformations of tensors. By mastering these transformations, we can work fluently with all kinds of data in PyTorch for scientific computing, providing data-level support for building neural networks. This blog goes a step further and explains how to use PyTorch for gradient propagation, then applies gradient propagation to a practical linear regression example.

1 Introduction to Gradient Propagation Mechanism

A neural network is essentially a complex function $y = f(x)$ with many parameters, and deep learning determines those parameters from the data $x$ and the labels $y$. This is at heart an optimization problem: finding the optimal parameters inevitably requires updating them, and modern neural networks update their parameters almost exclusively through the gradient propagation mechanism.

Note: This series of blogs focuses on how to use PyTorch to implement neural networks; it does not dwell on the basic concepts of neural networks or their underlying mathematics. If you are not yet familiar with these fundamentals, it is recommended to learn the basics of neural networks first, for example by watching Andrew Ng's deep learning course.

2 Gradient Propagation Using PyTorch

When creating a tensor, you can set requires_grad=True to indicate that it requires gradients. This makes PyTorch track the tensor's computation history and compute gradients automatically.
A simple example below demonstrates automatic gradient computation with PyTorch: we define the function $y = 3x^2 + 2x + 1$ and compute its gradient at $x = 2$.

import torch
# Create a tensor with requires_grad=True to enable gradient computation
x = torch.tensor(2., requires_grad=True)
# Define a function
y = 3*x**2 + 2*x + 1
# Backpropagate and compute the gradient
y.backward()
# Inspect the gradient of x
print(x.grad)

Running the above code and inspecting the gradient of x via x.grad prints 14.0; that is, the gradient of $y = 3x^2 + 2x + 1$ at $x = 2$ is 14. (You can verify by hand: $dy/dx = 6x + 2$, which equals 14 at $x = 2$.)
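
As a quick check, here is a small sketch (an addition, not part of the original program) that compares autograd's result with the analytic derivative $dy/dx = 6x + 2$ at several points:

import torch

# Check autograd's gradient of y = 3x^2 + 2x + 1 against the
# analytic derivative dy/dx = 6x + 2 at several points
for value in [0.0, 1.0, 2.0, -3.0]:
    x = torch.tensor(value, requires_grad=True)
    y = 3*x**2 + 2*x + 1
    y.backward()
    print(f"x={value}: autograd={x.grad.item()}, analytic={6*value + 2}")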

3 Introduction to linear regression example data

3.1 Dataset generation

In this linear regression experiment, we prepared a set of x and y data in advance and stored it in a .csv file. The data is generated by a program and roughly follows $y = 3x + 4$, to which we add some noise. The data generation program is:

import numpy as np
import pandas as pd

# Set a random seed so each run generates the same random numbers
np.random.seed(42)

# Generate random x values
x = np.random.rand(100)

# Compute the y value for each x and add some random noise
y = 3 * x + 4 + 0.2 * np.random.randn(100)

# Stack x and y into a 2-D array and convert it to a DataFrame
data = pd.DataFrame(np.column_stack([x, y]), columns=['x', 'y'])

# Save the DataFrame as a csv file
data.to_csv('./data1.csv', index=False)

You can generate the data yourself with this program; the file data1.csv is then loaded by the linear regression programs that follow.
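
As a quick sanity check on the generated file, a short sketch (an addition, assuming the data1.csv produced above) fits a degree-1 polynomial; the coefficients should come out near 3 and 4:

import numpy as np
import pandas as pd

# Load the generated file and fit a straight line as a sanity check
data = pd.read_csv('./data1.csv')
slope, intercept = np.polyfit(data['x'], data['y'], 1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")  # expect about 3 and 4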

3.2 Dataset visualization

Visualize the data:

import matplotlib.pyplot as plt
import numpy as np

# Load the data (skip_header=1 skips the csv header row)
points = np.genfromtxt("./data1.csv", delimiter=",", skip_header=1)

# Draw a scatter plot
plt.scatter(points[:, 0], points[:, 1])

# Set the axis labels
plt.xlabel('x label')
plt.ylabel('y label')

# Show the figure
plt.show()

The visualized result is a scatter plot of the 100 noisy points, clustered around the line $y = 3x + 4$.

4 Implementing Linear Regression Using NumPy

To help you better understand the underlying logic of linear regression, the following implements it manually, relying only on NumPy.

4.1 Loss function

In this linear regression experiment, we define the regression function as $y = wx + b$. At the start, $w$ and $b$ are initialized (to 0 in the program below), and they are then updated continuously through the gradient propagation mechanism until the optimal regression result is obtained. To measure the quality of the regression, we define the loss for a single point as $Loss = (wx + b - y)^2$ and average it over all data points.
The loss computation is:

# Mean squared error of the line y = wx + b over all data points
def compute_error_for_line_given_points(b, w, points):
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (w * x + b)) ** 2
    return totalError / float(len(points))

4.2 Parameter update

Following the gradient propagation mechanism, the update formulas for the two parameters $w$ and $b$ of the regression function are:

$w = w - \alpha\frac{\partial L}{\partial w}$

$b = b - \alpha\frac{\partial L}{\partial b}$

where $L$ is the value of the loss function and $\alpha$ is the learning rate.
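
Expanding the mean squared loss defined in 4.1 over the $N$ data points, the two partial derivatives work out to the expressions that the code below accumulates point by point:

$\frac{\partial L}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\left(y_i - (w x_i + b)\right)$

$\frac{\partial L}{\partial w} = -\frac{2}{N}\sum_{i=1}^{N}x_i\left(y_i - (w x_i + b)\right)$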

# One gradient descent step: accumulate dL/db and dL/dw over all
# points, then move b and w against the gradient by learningRate
def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((w_current * x) + b_current))
        w_gradient += -(2/N) * x * (y - ((w_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = w_current - (learningRate * w_gradient)
    return [new_b, new_m]
    

4.3 Iterative update

Through a loop, we repeatedly compute the loss and update $w$ and $b$, finally taking the optimal $w$ and $b$ as the result of the linear regression.

# Run num_iterations gradient descent steps from the initial b and m
def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, np.array(points), learning_rate)
    return [b, m]
    

4.4 Overall Implementation

The complete NumPy-based linear regression program is as follows:

import numpy as np

# y = wx + b
def compute_error_for_line_given_points(b, w, points):
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (w * x + b)) ** 2
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((w_current * x) + b_current))
        w_gradient += -(2/N) * x * (y - ((w_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = w_current - (learningRate * w_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, np.array(points), learning_rate)
    return [b, m]

def run():
    points = np.genfromtxt("./data1.csv", delimiter=",", skip_header=1)
    learning_rate = 0.1
    initial_b = 0 # initial y-intercept guess
    initial_m = 0 # initial slope guess
    num_iterations = 1000
    print("Running...")
    [b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, m = {2}, error = {3}".
          format(num_iterations, b, m,
                 compute_error_for_line_given_points(b, m, points))
          )

if __name__ == '__main__':
    run()
    

5 Implementing Linear Regression Using PyTorch

As Part 4 shows, even a simple linear regression requires a fair amount of code, and that approach is clearly impractical for building more complex neural networks. PyTorch was born to solve this problem: it lets users quickly call the built-in neural network components, saving a great deal of low-level programming time so that they can devote their energy to higher-level algorithm design.

5.1 Overall procedure

import numpy as np
import torch

# Set a random seed
torch.manual_seed(42)

# Define the model
class LinearRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # a single linear layer: y = wx + b

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

def run():
    # 1 Load the dataset (skip_header=1 skips the csv header row)
    points = np.genfromtxt("./data1.csv", delimiter=",", skip_header=1)
    x = points[:, 0]
    y = points[:, 1]
    # Convert the NumPy arrays to tensors of shape (100, 1)
    x_data = torch.from_numpy(x.reshape(100, 1)).float()
    y_data = torch.from_numpy(y.reshape(100, 1)).float()

    # 2 Define the linear regression model
    model = LinearRegressionModel()

    # 3 Define the loss function and the optimizer
    criterion = torch.nn.MSELoss(reduction='mean')
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

    # 4 Train the model
    num_iterations = 1000
    print("Running...")
    for epoch in range(num_iterations):
        # Forward pass
        y_pred = model(x_data)
        # Compute the loss
        loss = criterion(y_pred, y_data)
        # Zero the gradients
        optimizer.zero_grad()
        # Backward pass
        loss.backward()
        # Update the parameters
        optimizer.step()

    # 5 Print the training results
    # Convert the tensor parameters back to NumPy types
    m_matrix = model.linear.weight.detach().numpy()
    b_matrix = model.linear.bias.detach().numpy()
    m = m_matrix[0][0]
    b = b_matrix[0]
    print("After {0} iterations b = {1}, m = {2}, error = {3}".
          format(num_iterations, b, m, loss.item())
          )

if __name__ == '__main__':
    run()
    

5.2 Explanation of key parts

5.2.1 Linear layer torch.nn.Linear

class torch.nn.Linear(in_features: int, out_features: int, bias: bool = True): this module applies a linear transformation to the incoming data and carries learnable parameters (weight and bias); you specify the input and output sizes in the constructor.

Parameters:

  • in_features: the size of each input sample
  • out_features: the size of each output sample
  • bias: if set to False, the layer will not learn an additive bias. Default: True

In this program, both the input and the output are a single value (one-dimensional), so we construct the linear regression function with torch.nn.Linear(1, 1).
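
As a small illustration (an addition, not from the original post), the shapes of the learnable parameters in torch.nn.Linear(1, 1) can be inspected directly:

import torch

layer = torch.nn.Linear(1, 1)
print(layer.weight.shape)  # torch.Size([1, 1]) -- the learnable w
print(layer.bias.shape)    # torch.Size([1])    -- the learnable b

x = torch.randn(100, 1)    # a batch of 100 one-dimensional samples
print(layer(x).shape)      # torch.Size([100, 1])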

5.2.2 Loss functions and optimizers

Compared with the earlier manual implementation of $Loss = (wx + b - y)^2$, PyTorch provides a variety of built-in loss functions; the program above calls torch.nn.MSELoss(reduction='mean'), a loss function based on the mean squared error.
In the NumPy implementation, the learning rate (step size) is fixed throughout the parameter updates, whereas PyTorch provides better strategies that adjust the effective step size as training proceeds, so the network converges faster toward the optimum. In this program we use the Adam optimizer to speed up convergence.
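
For comparison, fixed-step gradient descent as in the NumPy version corresponds to torch.optim.SGD; swapping optimizers is a one-line change (a sketch, assuming the model object from the program in 5.1):

# Fixed-step gradient descent, analogous to the NumPy implementation
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Adam adapts the effective step size per parameter
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)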

5.3 PyTorch neural network programming paradigm

Although the above program only implements linear regression, it already contains most of the components needed to build a neural network with PyTorch. Taking it as an example, let's summarize the general programming paradigm for building neural networks in PyTorch. Master this paradigm and you can write all kinds of complex neural networks; readers who want to learn PyTorch quickly should focus on it. A minimal skeleton following these steps appears after the list.

  1. Prepare data: load the dataset into memory
  2. Define the network structure: in the model class's __init__() function, define each layer of the model (convolution, pooling, fully connected, etc.) and set the input and output sizes; in the forward() function, chain the layers together to complete the network
  3. Initialize network parameters: preprocess the input data and initialize the network's weights and biases
  4. Define the loss function: choose an appropriate loss function to measure the gap between the network's output and the training labels
  5. Define the optimizer: choose an appropriate optimization algorithm to update the weights and biases and improve the model's performance
  6. Train the network: feed the training data into the network, computing the loss and updating the model parameters after each batch
  7. Test the network: evaluate the trained network on test data and record the results (not shown in the program above)
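
As an illustration of the paradigm, a minimal skeleton might look like the sketch below; MyNet, its layers, the random stand-in data, and the hyperparameters are all placeholders to be replaced for a real task:

import torch

# Steps 1-2: define the network structure
class MyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(   # placeholder layers
            torch.nn.Linear(10, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.layers(x)

# Step 3: prepare data and the model (random data as a stand-in)
x_data = torch.randn(100, 10)
y_data = torch.randn(100, 1)
model = MyNet()

# Steps 4-5: loss function and optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Step 6: training loop
for epoch in range(100):
    y_pred = model(x_data)            # forward pass
    loss = criterion(y_pred, y_data)  # compute the loss
    optimizer.zero_grad()             # zero the gradients
    loss.backward()                   # backward pass
    optimizer.step()                  # update the parameters

# Step 7: evaluation would go here (e.g. model.eval() on held-out data)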

6 Comparison of experimental results

For the linear regressions implemented by the two methods above, we set the number of iterations to 1000 and compare the fitting results.
The NumPy-based run produces:

Running...
After 1000 iterations b = 4.043019455022063, m = 2.9080449128447525, error = 0.032263382558699455

The PyTorch-based result is:

Running...
After 1000 iterations b = 4.04301643371582, m = 2.908050060272217, error = 0.0322633795440197

From the dataset generation section, we know the correct $b$ and $w$ should be close to 4 and 3 respectively (because of the noise in the data, they cannot equal those values exactly). Both implementations come close to this result, indicating that the programs have the fitting ability we expected.

7 Summary

Gradient propagation is the general method by which neural networks update their parameters; it is what gives a network the ability to learn. PyTorch provides a complete gradient propagation mechanism. Although we rarely call this functionality in isolation in practice, understanding its underlying logic and the necessary calling conventions deepens our understanding of networks and improves our programming ability. Finally, the PyTorch-based linear regression program in this blog is very representative: it covers almost all the steps needed to build a neural network from scratch and is worth studying.
