A Linear Regression Model for Deep Learning (Principle + PyTorch Implementation)

Table of contents

Introduction

1. Principle

2. Hands-on implementation

2.1. Generate sample data set

2.2. Load the dataset

2.3. Create a model

2.4. Define the loss function

2.5. Define the optimization algorithm

2.6. Train the model

Introduction

        I think the linear model can be regarded as the starting point of deep learning. Through this model we can understand in depth how deep learning trains a model and what the general training workflow looks like. Once we master this routine, we can follow the same pattern and build models according to our own ideas.

1. Principle

        Linear regression assumes that the relationship between the independent variable x and the dependent variable y is linear, i.e. y can be obtained as a weighted sum of the elements of x. For example, suppose x is an n-dimensional vector (x1, x2, ..., xn) and each element xi has a corresponding weight wi. Then y can be expressed as y = x1*w1 + x2*w2 + ... + xn*wn + b. The term b can be understood as the intercept of the linear model, which we call the bias; it only shifts the output. Here the xi are the attributes (features) of a single sample. If we have multiple samples, say m samples each with n features, the samples can be stacked row by row to form a matrix X with m rows and n columns, where each row corresponds to one sample.

Here we can write the above formula in matrix form:

ŷ = Xw + b

       The number of columns of X matches the number of rows of w (if this is unclear, review matrix multiplication). The b here is automatically expanded by the broadcast mechanism to match the number of rows of the product Xw, i.e. one copy per sample.
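        As a quick sanity check of the shapes involved, here is a minimal sketch (the numbers are made up purely for illustration):

import torch

X = torch.randn(3, 2)             # 3 samples, 2 features each
w = torch.tensor([[4.0], [5.0]])  # 2 rows of weights, matching X's 2 columns
b = torch.tensor([3.0])           # the bias is broadcast across all 3 rows
y_hat = torch.matmul(X, w) + b    # one prediction per sample
print(y_hat.shape)                # torch.Size([3, 1])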

        From this formula we can make the connection to a neural network: the inputs are x1 through xn, there is a single output y, every xi is connected to y, and each connection carries a weight wi. Viewed this way, we can construct linear regression as the simplest neural network.

        We know that the job of the neural network is to predict: given a sample's input, it predicts the sample's label. The gap between the predicted value and the sample's true value can be expressed as a loss, and through the loss value we can optimize our parameters, namely the wi and b; because the prediction is computed from them, the loss is a function of them. What we hope is that this loss becomes small enough, meaning the gap between prediction and truth is small and our predictions are accurate. The common way to achieve this is gradient descent: we backpropagate the loss, compute the gradient of each parameter (i.e. the derivative of the loss with respect to that parameter), and then update each parameter a small step against its gradient, so that the linear model fits the true values of our samples more closely.
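        Expressed as code, one iteration of this loop looks roughly like the following sketch; the single parameter and the toy loss are made up for illustration, and autograd does the differentiation for us:

import torch

w = torch.tensor(1.0, requires_grad=True)  # a hypothetical single parameter
loss = (2.0 * w - 3.0) ** 2 / 2            # a toy loss, minimized at w = 1.5
loss.backward()                            # backpropagation: d(loss)/dw lands in w.grad
with torch.no_grad():
    w -= 0.1 * w.grad                      # step against the gradient
    w.grad.zero_()                         # clear the gradient for the next iteration
print(w)                                   # tensor(1.2000, ...), closer to 1.5 than before

        Sections 2.4 through 2.6 below carry out exactly these steps on the real model.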

2. Hands-on implementation

        Having clarified the ideas above, we can write the code by hand to implement them and verify that they hold. Practice is the sole criterion for testing truth, and only when theory and practice are combined can we truly understand. Without further ado, let's implement it step by step.

2.1. Generate sample data set

        In later studies we will use ready-made datasets such as MNIST, Fashion-MNIST and CIFAR-10. For the linear regression model here, we define a dataset ourselves.

        Our definition is simple. Suppose the weight w has only two components, true_w = [4.0, 5.0], and the bias is true_b = [3.0]; only after defining this w and b can we generate our sample labels. Think of true_w and true_b as the real weights and bias; later we will use the neural network to fit approximations of these parameters. With the weights and bias in hand, we can generate the sample labels.

        We randomly initialize our samples with the torch.normal() function, which takes a mean, a standard deviation, and the shape of the tensor to return.

        The predicted values are obtained through matrix multiplication, for which we use the torch.matmul() function; it performs matrix multiplication on tensors.
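        A minimal sketch of both functions (the shapes here are chosen arbitrarily):

import torch

a = torch.normal(0, 1, (4, 2))   # a 4x2 tensor drawn from N(mean=0, std=1)
v = torch.tensor([4.0, 5.0])     # a 1-D vector of length 2
print(torch.matmul(a, v).shape)  # matrix-vector product: torch.Size([4])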

The corresponding code is as follows:

import random
import torch
# Step 1: construct a custom dataset
true_w = torch.tensor([4.0, 5])                 # define our true weights
true_b = torch.tensor([3.0])                    # define our true bias
data_num = 1000                                 # generate 1000 samples
def data_create(w, b, data_num):                # function that generates our samples
    X = torch.normal(0, 1, (data_num, len(w)))  # note the shape of X: one row per sample, one column per weight
    Y = torch.matmul(X, w) + b                  # the label vector Y has as many entries as X has rows
    Y += torch.normal(0, 0.01, Y.shape)         # add noise to Y: mean 0, standard deviation 0.01; this does no harm
    return X, torch.reshape(Y, (-1, 1))         # return the features and labels, with Y reshaped to a column vector
X, Y = data_create(true_w, true_b, data_num)    # call our function
for i in range(5):                              # inspect the first five (X, Y) pairs
    print(f'X[{i}]={X[i]}, Y[{i}]={Y[i]}')      # print them

         Running this prints the first five feature/label pairs.

2.2. Load the dataset

        After creating the dataset, we can load it in batches, taking a few samples from the dataset at a time, and the order in which we take the samples should be random.

        The function used here is random.shuffle(), whose main job is to randomly shuffle the elements of a list.

        yield turns the function into a generator, i.e. an iterable object; each read resumes execution from the current position.
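        If yield is unfamiliar, this tiny example (unrelated to our dataset) shows the behavior: each iteration resumes the function right where it left off:

def count_by_twos(limit):        # a generator function: calling it returns an iterable
    n = 0
    while n < limit:
        yield n                  # pause here and hand n back to the caller
        n += 2                   # on the next iteration, resume from this line

for value in count_by_twos(6):
    print(value)                 # prints 0, then 2, then 4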

        The corresponding code is as follows:

# Step 2: draw small random batches from the dataset we created
def data_loader(X, Y, batch_size):            # parameters: feature set X, label set Y, samples per batch batch_size
    data_length = len(X)                      # first get the size of the dataset
    print('data_length =', data_length)
    data_list = list(range(data_length))      # list of sample indices, from 0 to the dataset length
    print('data_list =', data_list)
    random.shuffle(data_list)                 # shuffle the indices so that drawing them in order yields a random sample order
    print('shuffled data_list =', data_list)
    for i in range(0, data_length, batch_size):  # walk over the dataset with stride batch_size
        data_index = torch.tensor(data_list[i:min(i + batch_size, data_length)])  # min() guards the last batch: if fewer than batch_size samples remain, just take what is left
        yield X[data_index], Y[data_index]    # generator: each step returns one batch of samples and labels
for features, labels in data_loader(X, Y, 2): # print the batches; this loop sits outside the function, and must not reuse the names X and Y, which are needed again during training
    print('X=', features)
    print('Y=', labels)

        Each iteration of the loop prints one batch of two samples and their labels.

2.3. Create a model

        Once we have the data, we can create our model, which computes the predicted result y from a sample's input features. Since we read batch_size samples at a time, we also feed that many samples into the model at once, and the resulting y is a vector with one prediction per sample in the batch.

        Here we need to initialize the model parameters, the weights and the bias; the subsequent task is to optimize them. PyTorch has a mechanism for recording parameter gradients: when we perform computations on a parameter, the system automatically records the computation graph involving it and allocates memory for the subsequent gradient computation. We only need to declare requires_grad=True explicitly when initializing the parameters.
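        A minimal illustration of this mechanism, using a throwaway scalar rather than the model's parameters:

import torch

x = torch.tensor(3.0, requires_grad=True)  # ask autograd to track this tensor
y = x ** 2                                 # the computation graph is recorded
y.backward()                               # backpropagate: compute dy/dx
print(x.grad)                              # tensor(6.), since dy/dx = 2x = 6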

        The corresponding code is as follows:

# Step 3: define the model (a linear model)
w = torch.normal(0, 0.1, (2, 1), requires_grad=True)  # randomly initialize our weights
b = torch.zeros(1, requires_grad=True)                # initialize b to zero
def model(X, w, b):                                   # define our network model
    return torch.matmul(X, w) + b                     # the output is the vector of predictions Y

2.4. Define the loss function

        There are many possible loss functions. Here we use the squared loss, i.e. the difference between the predicted value y_hat and the true value y, squared. Don't worry about the division by 2; it merely makes the derivative tidier and affects nothing else.
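        To see why the 2 is harmless, differentiate the per-sample loss with respect to the prediction:

d/dy_hat [ (y_hat - y)² / 2 ] = y_hat - y

        The factor of 1/2 simply cancels the 2 produced by the power rule, so the gradient carries no stray constant.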

        The corresponding code is as follows:

# Step 4: define the loss function
def square_loss(y_hat, Y):                                 # takes the predicted values and the true values
    return (y_hat - torch.reshape(Y, y_hat.shape))**2 / 2  # returns the squared loss; we do not divide by the sample count here, so this is a vector holding each sample's squared loss

2.5. Define the optimization algorithm

        The so-called optimization algorithm specifies how we update the parameters w and b according to the current loss value, so that the model outputs more accurate predictions. Here we use mini-batch gradient descent to optimize w and b. As for torch.no_grad(): to put it bluntly, the parameter update itself should not be recorded in the computation graph, so we wrap it in this context manager.
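        A small sketch of the effect, using a throwaway tensor:

import torch

t = torch.ones(2, requires_grad=True)
with torch.no_grad():
    u = t * 2                # computed without being recorded in the graph
print(u.requires_grad)       # False: no gradient will ever flow through u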

        The corresponding code is as follows:

# Step 5: define the optimization algorithm (i.e. how we update our parameters)
# Key point: the parameter update itself must not be tracked by autograd, and after updating we must reset the current gradients to zero
def sgd(params, lr, batch_size):                    # parameters to update, learning rate lr (the step size), batch size batch_size
    with torch.no_grad():                           # the update step needs no gradient tracking
        for param in params:
            param -= lr * param.grad / batch_size   # update the parameters w and b
            param.grad.zero_()                      # reset this parameter's gradient to zero, so later backward passes do not accumulate onto it

2.6. Train the model

        Training the model is the core of our work. The modules declared and defined above are all put to use here.

The mean() function computes the average of a tensor's elements (not the sum); we use it below to report the average loss over the whole dataset.
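A quick example of the difference between sum() and mean(), since the training loop below uses both:

import torch

v = torch.tensor([1.0, 2.0, 3.0])
print(v.sum())   # tensor(6.): the total, used for the per-batch loss
print(v.mean())  # tensor(2.): the average, used to report the overall loss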

# Step 6: train our model
# Define the hyperparameters: learning rate lr, number of passes over the dataset num_epochs, and batch size batch_size
lr = 0.01
num_epochs = 5
batch_size = 10
for i in range(num_epochs):
    for trains, targets in data_loader(X, Y, batch_size=batch_size):  # get one batch of training data and labels
        loss = square_loss(model(trains, w, b), targets).sum()        # feed the batch through the model and sum the per-sample losses into one overall batch loss
        loss.backward()                                               # backpropagate the loss to compute the parameter gradients
        sgd([w, b], lr, batch_size)                                   # update the parameters w and b from their gradients
    with torch.no_grad():                                             # after each pass over the dataset, measure the current model's loss on the whole dataset
        global_loss = square_loss(model(X, w, b), Y)
        print(f'epoch {i}: overall loss = {global_loss.mean()}')
    print(f'epoch {i}: error between w and the true value: {w.reshape(true_w.shape) - true_w}')
    print(f'epoch {i}: error between b and the true value: {float(b - true_b)}')

         Running the training shows that the loss becomes very small, and that the error between the optimized w and b and the true_w and true_b defined at the start of the article is also very small.

That completes the linear regression model. To really understand the program above, you should first understand the theory; it makes the program much easier to follow.
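As a follow-up, the same model can also be written with PyTorch's high-level API. This is only a minimal sketch, assuming X, Y and the data_loader from the steps above are already defined; nn.Linear and torch.optim.SGD stand in for our hand-written model and sgd functions:

import torch
from torch import nn

net = nn.Linear(2, 1)                     # built-in linear layer: 2 inputs, 1 output
loss_fn = nn.MSELoss()                    # built-in mean squared error loss
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

for i in range(5):
    for features, labels in data_loader(X, Y, 10):
        loss = loss_fn(net(features), labels)  # forward pass and loss
        optimizer.zero_grad()                  # clear the old gradients
        loss.backward()                        # backpropagate
        optimizer.step()                       # update the weights and bias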


Origin: blog.csdn.net/BaoITcore/article/details/124752745