Hands-on Deep Learning Notes (3) - Linear Regression Code Implementation

After understanding the key ideas of linear regression, it is time to implement linear regression with code.

1.1 Implementation from scratch

Although modern deep learning frameworks can automate almost all of this work, implementing it from scratch ensures that you really know what you are doing. Understanding how things work in detail also makes it easier to customize models, layers, or loss functions.

%matplotlib inline
import random
import torch
from d2l import torch as d2l

1.1.1 Generate dataset

Generate a dataset of 1000 samples, where each sample has 2 features drawn from a standard normal distribution. The labels are computed as y = Xw + b plus Gaussian noise with standard deviation 0.01.

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

Each row in features contains a two-dimensional data sample, and each row in labels contains a one-dimensional label value (a scalar).

print('features:', features[0],'\nlabel:', labels[0])
features: tensor([ 0.6631, -0.7805])
label: tensor([8.1842])
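
As a quick sanity check, the following sketch regenerates the data and confirms the shapes described above. The fixed seed and the residual check are illustrative additions, not part of the original notes.

```python
import torch

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise (same logic as above)."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

torch.manual_seed(0)  # assumed seed, only to make the run repeatable
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# Each row of features is one 2-D sample; each row of labels is one scalar.
print(features.shape, labels.shape)

# Subtracting the noiseless prediction leaves only the injected noise,
# whose standard deviation should be close to 0.01.
residual = labels.reshape(-1) - (features @ true_w + true_b)
print(float(residual.std()))
```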

By generating a scatter plot of the second feature features[:, 1] and labels, the linear relationship between the two can be visually observed.

d2l.set_figsize()
d2l.plt.scatter(features[:, (1)].detach().numpy(), labels.detach().numpy(), 1);

1.1.2 Read the dataset

Define a function, data_iter, that shuffles the samples in the dataset and yields them in mini-batches. It takes the batch size, the feature matrix, and the label vector as input and generates mini-batches of size batch_size, each containing a set of features and their labels.

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # The samples are read in random order, with no particular sequence
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break

output:

tensor([[-0.8841,  0.0372],
        [-0.3387,  0.3164],
        [ 0.3212,  2.0915],
        [ 0.4819, -1.2344],
        [-0.2791, -0.1832],
        [ 1.6380,  0.6086],
        [ 0.7341,  1.1638],
        [ 1.0234, -1.5223],
        [-2.6958,  0.1999],
        [-1.5663, -2.0430]]) 
 tensor([[ 2.2757],
        [ 2.4552],
        [-2.2666],
        [ 9.3455],
        [ 4.2686],
        [ 5.4153],
        [ 1.6975],
        [11.4085],
        [-1.8559],
        [ 8.0139]])
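
The iterator also works when the number of samples is not divisible by the batch size: the last mini-batch is simply smaller. A minimal sketch, using made-up toy tensors (10 samples, batch size 4) purely for illustration:

```python
import random
import torch

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

# Toy data (an assumption for illustration): 10 samples, labels 0..9.
toy_features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
toy_labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)

# The final batch holds the 2 leftover samples.
batch_sizes = [len(X) for X, y in data_iter(4, toy_features, toy_labels)]
print(batch_sizes)  # [4, 4, 2]

# Every sample appears exactly once per pass, in shuffled order.
seen = sorted(int(v) for X, y in data_iter(4, toy_features, toy_labels)
              for v in y.reshape(-1))
print(seen == list(range(10)))  # True
```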

1.1.3 Initializing model parameters

The weights are initialized by sampling random numbers from a normal distribution with mean 0 and standard deviation 0.01, and biases are initialized to 0.

w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

After initializing the parameters, our task is to update them until they fit our data sufficiently well.
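
Setting requires_grad=True is what lets autograd record operations on w and b so that gradients can be computed later. The sketch below illustrates this with a toy input; the input values are assumptions chosen so the gradients are easy to read off.

```python
import torch

w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# No gradient exists before backward() has been called.
print(w.grad, b.grad)  # None None

# A toy scalar output (illustrative only) that depends on both parameters.
X = torch.tensor([[1.0, 2.0]])
out = (torch.matmul(X, w) + b).sum()
out.backward()

# d(out)/dw is X transposed, i.e. [[1.], [2.]]; d(out)/db is [1.].
print(w.grad)
print(b.grad)
```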

1.1.4 Defining the model

To compute the output of the linear model, we simply take the matrix-vector product of the input features X and the model weights w and then add the bias b. Xw is a vector and b is a scalar. When we add a scalar to a vector, the scalar is added to each component of the vector (broadcasting).

def linreg(X, w, b):
    """The linear regression model."""
    return torch.matmul(X, w) + b
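
To see the broadcasting concretely, here is a small sketch with hand-picked values (they are illustrative, not taken from the dataset above), so each output can be checked by hand.

```python
import torch

def linreg(X, w, b):
    """The linear regression model."""
    return torch.matmul(X, w) + b

# Hand-picked values (illustrative only) so the result is easy to verify.
X = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
w = torch.tensor([[2.0], [-3.0]])
b = torch.tensor([4.0])

y_hat = linreg(X, w, b)
print(y_hat.shape)  # torch.Size([3, 1])
# Rows: 2 + 4 = 6, -3 + 4 = 1, 2 - 3 + 4 = 3; b is added to every row.
print(y_hat)
```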

1.1.5 Define the loss function

Use a squared loss function.

def squared_loss(y_hat, y):
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2
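
The function computes (y_hat - y)^2 / 2 for each sample. The reshape matters: if y_hat has shape (n, 1) and y has shape (n,), subtracting them directly would broadcast to an (n, n) matrix. A small sketch with made-up values shows both the per-sample loss and this pitfall.

```python
import torch

def squared_loss(y_hat, y):
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

# Made-up predictions and labels (illustrative only).
y_hat = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1)
y = torch.tensor([1.0, 2.0, 5.0])            # shape (3,)

# Per-sample losses: 0, 0, and (3 - 5)^2 / 2 = 2.
loss_values = squared_loss(y_hat, y)
print(loss_values.shape)  # torch.Size([3, 1])

# Without the reshape, (3, 1) - (3,) silently broadcasts to (3, 3).
print((y_hat - y).shape)  # torch.Size([3, 3])
```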

1.1.6 Defining the Optimization Algorithm

At each step, we draw a mini-batch at random from the dataset and compute the gradient of the loss with respect to the parameters. We then update the parameters in the direction that reduces the loss. The following function implements the mini-batch stochastic gradient descent update. It takes a set of model parameters, the learning rate, and the batch size as input.

def sgd(params, lr, batch_size):
    """Mini-batch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
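
A single update can be traced by hand. The sketch below applies sgd to one toy parameter; the parameter values, learning rate, and batch size are assumptions chosen to keep the arithmetic simple.

```python
import torch

def sgd(params, lr, batch_size):
    """Mini-batch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

# Toy parameter (illustrative only).
p = torch.tensor([1.0, 2.0], requires_grad=True)
(p ** 2).sum().backward()  # gradient is 2 * p = [2, 4]

# Update: p <- p - 0.1 * [2, 4] / 2 = [0.9, 1.8]
sgd([p], lr=0.1, batch_size=2)
print(p)

# The gradient is reset so the next backward() starts from zero.
print(p.grad)
```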

1.1.7 Training

In each iteration, we read a mini-batch of training samples and pass it through our model to obtain a set of predictions. After computing the loss, we run backpropagation, which stores the gradient of each parameter. Finally, we call the optimization algorithm sgd to update the model parameters. To recap, we execute the following loop:

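  • Initialize the parameters (w, b).
  • Repeat the following steps until done:
    • Compute the gradient g of the mini-batch loss with respect to (w, b), averaged over the batch.
    • Update the parameters: (w, b) ← (w, b) − lr · g.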

In each epoch, we use the data_iter function to traverse the entire dataset, using every sample in the training set once (assuming the number of samples is divisible by the batch size). The number of epochs num_epochs and the learning rate lr are both hyperparameters, set here to 3 and 0.03, respectively.

lr = 0.03
num_epochs = 3
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
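        l = loss(net(X, w, b), y)  # Mini-batch loss for X and y
        # l has shape (batch_size, 1) rather than being a scalar, so its elements
        # are summed and the gradients with respect to [w, b] come from that sum
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # Update the parameters using their gradients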
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
epoch 1, loss 0.040067
epoch 2, loss 0.000148
epoch 3, loss 0.000049
print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')
print(f'error in estimating b: {true_b - b}')
error in estimating w: tensor([ 0.0005, -0.0003], grad_fn=<SubBackward0>)
error in estimating b: tensor([-0.0001], grad_fn=<RsubBackward1>)
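
The grad_fn entries in this output appear because w and b still require gradients, so the subtraction itself is tracked by autograd. If that is not wanted, one option is to compute the errors under torch.no_grad() or on a detached tensor. A minimal sketch, with an assumed stand-in for the trained weights:

```python
import torch

true_w = torch.tensor([2, -3.4])
# Stand-in for a trained weight (values assumed for illustration).
w = torch.tensor([[1.9995], [-3.3997]], requires_grad=True)

# Computed normally, the result is part of the autograd graph.
tracked = true_w - w.reshape(true_w.shape)
print(tracked.requires_grad)  # True

# Inside no_grad(), the same expression is not tracked.
with torch.no_grad():
    untracked = true_w - w.reshape(true_w.shape)
print(untracked.requires_grad)  # False

# detach() gives an untracked view of the same values.
print(w.detach().requires_grad)  # False
```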

1.2 Concise implementation

We can use open-source frameworks to automate the repetitive parts of gradient-based learning algorithms, such as data iterators, loss functions, optimizers, and neural network layers.

1.2.1 Generate dataset

import numpy as np
import torch
from torch.utils import data
from d2l import torch as d2l

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)

1.2.2 Read the dataset

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

Construct a Python iterator with iter, and use next to get the first item from the iterator.

next(iter(data_iter))
[tensor([[ 0.3142, -0.9088],
         [ 0.3454, -0.0573],
         [ 0.6141,  0.2420],
         [-0.6898,  1.4459],
         [ 0.8067, -0.3340],
         [ 0.4517, -0.0349],
         [-0.0894,  1.7150],
         [-0.2578, -1.3239],
         [ 1.8576, -0.1634],
         [-0.1818, -2.7210]]),
 tensor([[ 7.9111],
         [ 5.0854],
         [ 4.6106],
         [-2.0876],
         [ 6.9367],
         [ 5.2169],
         [-1.8110],
         [ 8.1817],
         [ 8.4688],
         [13.1087]])]
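
A DataLoader built this way also reports how many mini-batches it yields per pass and, like the hand-written iterator, returns a smaller final batch when the sizes do not divide evenly. The toy tensors below (10 samples, batch size 4) are assumptions for illustration.

```python
import torch
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

# Toy data (illustrative only): 10 samples.
toy_features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
toy_labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)
toy_iter = load_array((toy_features, toy_labels), batch_size=4)

# Number of mini-batches per pass: ceil(10 / 4) = 3.
num_batches = len(toy_iter)
print(num_batches)

# The loader can be iterated repeatedly; the last batch has 2 samples.
batch_sizes = [len(X) for X, y in toy_iter]
print(batch_sizes)  # [4, 4, 2]
```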

1.2.3 Defining the model

For standard deep learning models, we can use the framework's predefined layers. This allows us to focus only on which layers are used to construct the model, rather than on the implementation details of the layers. We first define a model variable net, which is an instance of the Sequential class. The Sequential class concatenates multiple layers together. When given input data, the Sequential instance passes the data into the first layer, then uses the output of the first layer as the input of the second layer, and so on. In the example below, our model contains only one layer, so Sequential is not actually needed. But since almost all models in the future are multi-layered, using Sequential here will familiarize you with "standard pipelines".

In PyTorch, fully connected layers are defined in the Linear class. It's worth noting that we pass two parameters into nn.Linear. The first specifies the input feature shape, which is 2, and the second specifies the output feature shape, which is a single scalar, so 1.

# nn is an abbreviation for neural networks
from torch import nn
net = nn.Sequential(nn.Linear(2, 1))
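
The two arguments determine the shapes of the layer's parameters, which can be inspected directly. A short sketch (the all-zero input batch is just a placeholder for checking the output shape):

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

# nn.Linear stores the weight as (out_features, in_features).
print(net[0].weight.shape)  # torch.Size([1, 2])
print(net[0].bias.shape)    # torch.Size([1])

# A batch of 5 samples with 2 features each yields 5 scalar predictions.
X = torch.zeros(5, 2)
out = net(X)
print(out.shape)  # torch.Size([5, 1])
```

Note that this weight layout, (1, 2), is the transpose of the (2, 1) weight used in the from-scratch version.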

1.2.4 Initialize model parameters

Before using net, we need to initialize the model parameters, such as the weights and biases of the linear regression model. Deep learning frameworks usually provide predefined methods for initializing parameters. Here, we specify that each weight should be sampled from a normal distribution with mean 0 and standard deviation 0.01, and that the bias should be initialized to zero.

net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
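
An alternative that avoids touching .data is the torch.nn.init module, which provides in-place initializers. This is a sketch of an equivalent approach, not the method used in the original notes:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

# In-place initializers; they modify the parameters without tracking gradients.
nn.init.normal_(net[0].weight, mean=0.0, std=0.01)
nn.init.zeros_(net[0].bias)

print(net[0].weight)
print(net[0].bias)
```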

1.2.5 Define the loss function

The mean squared error (also known as the squared L2 norm) is computed with the MSELoss class. By default it returns the average of the losses over all samples.

loss = nn.MSELoss()
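
One detail worth noticing: MSELoss averages (y_hat - y)^2 without the factor 1/2 used by the from-scratch squared_loss, so the reported values differ by a factor of 2. The sketch below compares the two on made-up numbers and also shows the reduction='sum' option.

```python
import torch
from torch import nn

# Made-up predictions and labels (illustrative only).
y_hat = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[1.0], [2.0], [5.0]])

# Default reduction='mean': (0 + 0 + 4) / 3
mse_mean = nn.MSELoss()(y_hat, y)
# reduction='sum': 0 + 0 + 4
mse_sum = nn.MSELoss(reduction='sum')(y_hat, y)
# From-scratch squared loss, averaged: (0 + 0 + 2) / 3
half_squared_mean = ((y_hat - y) ** 2 / 2).mean()

print(float(mse_mean), float(mse_sum), float(half_squared_mean))
```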

1.2.6 Defining the Optimization Algorithm

The mini-batch stochastic gradient descent algorithm is a standard tool for optimizing neural networks, and PyTorch implements many variants of it in the optim module. When we instantiate SGD, we specify the parameters to optimize (available from our model via net.parameters()) and the hyperparameters required by the optimization algorithm, passed as keyword arguments. Mini-batch stochastic gradient descent only requires the lr value, which is set to 0.03 here.

trainer = torch.optim.SGD(net.parameters(), lr=0.03)
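
With no momentum or weight decay, a single optimizer step is just p ← p − lr · grad. Unlike the from-scratch sgd function, there is no division by the batch size here, because the default MSELoss already averages over the batch. A minimal sketch on a toy parameter (values chosen for easy arithmetic):

```python
import torch

# Toy parameter (illustrative only).
p = torch.tensor([1.0, 2.0], requires_grad=True)
trainer = torch.optim.SGD([p], lr=0.1)

loss = (p ** 2).sum()
trainer.zero_grad()
loss.backward()  # p.grad = 2 * p = [2, 4]
trainer.step()   # p <- p - 0.1 * [2, 4] = [0.8, 1.6]

print(p)
```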

1.2.7 Training

In each epoch, we traverse the entire dataset (through data_iter), repeatedly fetching a mini-batch of inputs and the corresponding labels. For each mini-batch, we perform the following steps:

  • Generate predictions by calling net(X) and compute the loss l (forward propagation).
  • Gradients are computed by doing backpropagation.
  • Update model parameters by calling the optimizer.

To better measure the training effect, we calculate the loss after each epoch and print it to monitor the training process.

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')
epoch 1, loss 0.000275
epoch 2, loss 0.000107
epoch 3, loss 0.000108
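
The call to trainer.zero_grad() in the loop is needed because PyTorch accumulates gradients across backward() calls instead of overwriting them. The following sketch shows the accumulation on a toy parameter (an illustrative setup, separate from the model above):

```python
import torch

p = torch.tensor([1.0], requires_grad=True)

(3 * p).sum().backward()
first = p.grad.clone()   # gradient of 3 * p is 3

(3 * p).sum().backward()  # no zeroing in between
second = p.grad.clone()  # accumulated: 3 + 3 = 6

p.grad.zero_()           # reset, as zero_grad() does for every parameter
print(first, second, p.grad)
```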

Compare the real parameters of the generated dataset with the model parameters obtained by training with limited data. To access the parameters, we first access the desired layer from net, then read the weights and biases of that layer. As in the implementation from scratch, our estimated parameters are very close to the true parameters of the generated data.

w = net[0].weight.data
print('error in estimating w:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('error in estimating b:', true_b - b)
error in estimating w: tensor([-0.0002, -0.0001])
error in estimating b: tensor([0.0014])
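
Besides indexing the layer directly, parameters can also be listed by name, which scales better once a model has several layers. A short sketch using a freshly constructed (untrained) net of the same shape:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

# Names are prefixed with the layer's position inside Sequential.
param_names = [name for name, param in net.named_parameters()]
for name, param in net.named_parameters():
    print(name, tuple(param.shape))

# state_dict() maps the same names to the underlying tensors.
state = net.state_dict()
state_names = list(state.keys())
print(state_names)  # ['0.weight', '0.bias']
```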

Origin blog.csdn.net/qq_52118067/article/details/122563903