Deep Learning Record 1 (Implementation of Linear Regression)

1. Overall thinking

According to the definition of linear regression, y = Xw + b, we build a linear regression model and use the L2 loss (squared error) as the loss function. The model is optimized with minibatch stochastic gradient descent.
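For reference, the model, the per-sample loss, and the minibatch update used by the code in section 2 can be written out as follows. The notation is introduced here for illustration and is not from the original text: B is the current minibatch and \eta is the learning rate (lr in the code).

```latex
\begin{align}
\hat{y} &= Xw + b \\
\ell_i(w, b) &= \tfrac{1}{2}\left(\hat{y}_i - y_i\right)^2 \\
(w, b) &\leftarrow (w, b) - \frac{\eta}{|B|} \sum_{i \in B} \nabla_{(w, b)}\, \ell_i(w, b)
\end{align}
```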

2. Detailed code analysis

import random
import torch
from d2l import torch as d2l
def synthetic_data(w, b, num_examples):
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))
 
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

Since no existing data set is used, the correctness of the model is verified on data that we generate ourselves. The synthetic_data function generates a random data set from the given parameters: its inputs are the true w, the true b, and the number of samples required. With this function and the w and b defined above, the data follow y = Xw + b + ε, where ε is a small noise term.

This produces a data set with a total of 1000 samples. X is drawn from a standard normal distribution; each row is a sample and each column is a feature. y is computed from X. To make the data set closer to a real situation, a small perturbation drawn from a normal distribution with standard deviation 0.01 is added to y.

The return values are the generated features and their corresponding labels. (Note: Under this method,)
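A quick sanity check of the generated data could look like the following sketch. It repeats synthetic_data so that the block runs on its own; the fixed seed and the residual check are assumptions added for illustration.

```python
import torch

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

print(features.shape)  # torch.Size([1000, 2])
print(labels.shape)    # torch.Size([1000, 1])

# The residual is the added noise, so its standard deviation should be close to 0.01
residual = labels - (torch.matmul(features, true_w) + true_b).reshape((-1, 1))
print(float(residual.std()))
```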

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # Shuffle the indices so that the samples are read in random order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        # Yield inside the loop so that every minibatch is produced
        yield features[batch_indices], labels[batch_indices]
 
batch_size = 10

features and labels are tensors that can be indexed by position, so to read the data in random order you only need to shuffle the indices used for reading, which also ensures that every sample is read exactly once per pass. indices initially holds the natural numbers from 0 to num_examples-1 in order and is then shuffled with random.shuffle(). The for loop slices indices so that batch_size samples are output each time; the last batch may be smaller if num_examples is not a multiple of batch_size. (Note: This function is a generator, so it produces one batch of batch_size samples per iteration instead of returning all the data at once.)
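The index-shuffling idea can be seen in isolation with plain Python lists. This is a minimal sketch, not the author's code: batch_index_iter is a hypothetical helper that yields only the index slices, without any tensors.

```python
import random

def batch_index_iter(batch_size, num_examples):
    """Hypothetical helper: yield shuffled index batches, as data_iter does internally."""
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        yield indices[i:i + batch_size]  # slicing already stops at the end of the list

batches = list(batch_index_iter(batch_size=4, num_examples=10))
print([len(batch) for batch in batches])  # [4, 4, 2]

# Every index appears exactly once across the batches
seen = sorted(index for batch in batches for index in batch)
print(seen == list(range(10)))  # True
```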

w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

Initialize the parameters to be learned. Since there are two weights to learn (one per feature) and one output, w has shape (2, 1). Gradients are needed later, so requires_grad=True is set.
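To see why the shape (2, 1) fits, the sketch below multiplies a batch of 10 samples with 2 features by w and adds b. The batch of random inputs is made up for illustration.

```python
import torch

w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

X = torch.normal(0, 1, (10, 2))  # assumption: an illustrative batch of 10 samples
y_hat = torch.matmul(X, w) + b   # (10, 2) @ (2, 1) -> (10, 1); b is broadcast over the rows

print(y_hat.shape)          # torch.Size([10, 1])
print(y_hat.requires_grad)  # True
```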

def linreg(X, w, b): 
    """Linear regression model."""
    return torch.matmul(X, w) + b
 
def squared_loss(y_hat, y): 
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape))**2 / 2
 
def sgd(params, lr, batch_size):  
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

Define the linear regression model, the squared loss, and the minibatch stochastic gradient descent function. According to the formula, the loss should be averaged over the batch, that is, divided by batch_size. Because differentiation is linear, it makes no difference to the final result whether this division is done when the loss is computed or when the parameters are updated; here it is done in sgd. Since PyTorch accumulates gradients, the stored gradient must be cleared after each parameter update.
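Both points in the paragraph above can be checked numerically. The following is a small sketch on made-up random data (the tensors and the helper grad_with are illustrative, not part of the original code). It compares summing the loss and dividing the gradient by batch_size against averaging the loss directly, and then shows that a second backward() call adds to the stored gradient.

```python
import torch

torch.manual_seed(0)
X = torch.normal(0, 1, (10, 2))
y = torch.normal(0, 1, (10, 1))
batch_size = len(X)

def grad_with(reduce):
    """Hypothetical helper: gradient of the reduced squared loss with respect to w."""
    w = torch.zeros((2, 1), requires_grad=True)
    l = (torch.matmul(X, w) - y) ** 2 / 2
    reduce(l).backward()
    return w.grad.clone()

# Dividing after the backward pass (as sgd does) matches averaging the loss beforehand
grad_sum = grad_with(lambda l: l.sum())
grad_mean = grad_with(lambda l: l.mean())
print(torch.allclose(grad_sum / batch_size, grad_mean, atol=1e-5))

# Gradients accumulate across backward() calls until they are cleared
w = torch.ones(1, requires_grad=True)
(3 * w).sum().backward()
first = w.grad.clone()   # tensor([3.])
(3 * w).sum().backward()
second = w.grad.clone()  # tensor([6.]), the second gradient was added to the first
w.grad.zero_()           # reset, as sgd does after each update
print(first, second, w.grad)
```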

lr = 0.03  # learning rate
num_epochs = 3
net = linreg
loss = squared_loss
 
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
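        l = loss(net(X, w, b), y)  # minibatch loss on `X` and `y`
        # `l` has shape (`batch_size`, 1) rather than being a scalar, so all of its
        # elements are summed and the gradients with respect to [`w`, `b`] are computed from the sum
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients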
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

l.sum() is needed here because the loss computed for each batch is a column vector of shape (batch_size, 1), while backward() called without arguments requires a scalar. The per-sample losses of the current batch are therefore summed before backpropagation, and the division by the number of samples is done in sgd. With this setup, the parameters are updated 3*(1000/10) = 300 times in total.
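The update count follows from one sgd call per minibatch. As a small sketch (num_updates is a hypothetical helper, and the ceiling covers the case where the batch size does not divide the data set evenly):

```python
import math

def num_updates(num_epochs, num_examples, batch_size):
    """Hypothetical helper: one parameter update per minibatch, per epoch."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return num_epochs * batches_per_epoch

print(num_updates(3, 1000, 10))  # 300
print(num_updates(3, 1005, 10))  # 303, the last batch of each epoch holds only 5 samples
```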


Origin blog.csdn.net/zhaomengsen/article/details/131352563