Translation: 3.3. Concise Implementation of Linear Regression in PyTorch

The broad and intense interest in deep learning over the past few years has inspired companies, academics, and hobbyists to develop a variety of mature open-source frameworks that automate the repetitive work of gradient-based learning algorithms. In Section 3.2, we relied only on:

  • (i) tensors for data storage and linear algebra;
  • (ii) automatic differentiation for computing gradients.

In practice, because data iterators, loss functions, optimizers, and neural network layers are so common, modern libraries implement these components for us as well.

In this section, we will show you how to implement the linear regression model from Section 3.2 concisely, using the deep learning framework's high-level APIs.

3.3.1. Generating the Dataset

First, we will generate the same dataset as in Section 3.2.

import numpy as np
import torch
from torch.utils import data
from d2l import torch as d2l

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)

3.3.2. Reading the Dataset

Instead of rolling our own iterator, we can call the framework's existing API to read data. We pass in features and labels as arguments and specify batch_size when instantiating the data iterator object. In addition, the boolean value is_train indicates whether we want the data iterator object to shuffle the data on each epoch (each pass through the dataset).

def load_array(data_arrays, batch_size, is_train=True):  #@save
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)
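
For reference, the following is a rough sketch of what the shuffled DataLoader replaces; it is essentially the from-scratch iterator of Section 3.2, not the library's actual implementation:

import random

def manual_data_iter(batch_size, features, labels):
    """Sketch of a hand-rolled minibatch iterator, as in Section 3.2."""
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # shuffle so each epoch draws minibatches in a new order
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i: i + batch_size])
        yield features[batch_indices], labels[batch_indices]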

Now we can use data_iter in much the same way as we called the data_iter function in Section 3.2. To verify that it is working, we can read and print the first minibatch of examples. In contrast to Section 3.2, here we use iter to construct a Python iterator and use next to obtain the first item from the iterator.

next(iter(data_iter))
[tensor([[-0.4517, -0.3277],
         [-0.5566,  0.3060],
         [-0.6281, -0.2933],
         [ 0.4836, -0.8837],
         [ 0.3179, -0.4385],
         [ 0.9690,  0.4170],
         [ 0.6503, -2.3574],
         [-0.1246,  2.4129],
         [ 1.6695,  0.8556],
         [ 0.1999, -0.3050]]),
 tensor([[ 4.4221],
         [ 2.0469],
         [ 3.9289],
         [ 8.1781],
         [ 6.3392],
         [ 4.7124],
         [13.5202],
         [-4.2467],
         [ 4.6434],
         [ 5.6319]])]

3.3.3. Defining the Model

When we implemented linear regression from scratch in Section 3.2, we explicitly defined the model parameters and encoded the computation using basic linear algebra operations to produce the output. You should know how to do this. But once your model gets more complex, and you have to do it almost every day, you'll be happy to get help. The situation is similar to writing your own blog from scratch. Doing it once or twice is rewarding and enlightening, but if you spend a month reinventing the wheel every time you need a blog, you're going to be a terrible web developer.

For standard operations, we can use the framework's predefined layers, which allow us to focus specifically on the layers used to build the model, rather than on the implementation. We will first define a model variable net which will refer to an instance of the Sequential class. The class Sequential defines a container for multiple layers that will be chained together. Given input data, a Sequential instance passes it through the first layer, then passes the output as the input to the second layer, and so on. In the example below, our model only contains one layer, so we don't really need Sequential. But since almost all of our future models will involve multiple layers, we'll use it anyway to familiarize you with the most standard work process.

Recall the architecture of a single-layer network, as shown in Figure 3.1.2. This layer is called a fully connected layer because each of its inputs is connected to each of its outputs via matrix-vector multiplication.
In PyTorch, the fully connected layer is defined in the Linear class. Note that we pass two arguments into nn.Linear. The first one specifies the input feature dimension, which is 2, and the second one is the output feature dimension, which is a single scalar and therefore 1.

# `nn` is an abbreviation for neural networks
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))
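
To make the chaining concrete, a deeper model (purely hypothetical here) would simply list its layers in order; Sequential feeds each layer's output into the next:

# Hypothetical example only: two Linear layers chained by Sequential,
# with the first layer's (activated) output becoming the second layer's input.
deeper_net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))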

3.3.4. Initializing Model Parameters

Before using net, we need to initialize the model parameters, such as the weights and bias in the linear regression model. Deep learning frameworks usually provide predefined methods for initializing parameters. Here, we specify that each weight parameter should be randomly sampled from a normal distribution with mean 0 and standard deviation 0.01. The bias parameter will be initialized to zero.

Since we specified the input and output dimensions when constructing nn.Linear, we can now access the parameters directly to specify their initial values. We first locate the layer with net[0], the first layer in the network, and then use the weight.data and bias.data attributes to access the parameters. Next, we use the in-place methods normal_ and fill_ to overwrite the parameter values.

net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
tensor([0.])
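
Equivalently, the same initialization can be written with the helpers in torch.nn.init; this is just a minor variation, shown for reference:

# Same effect as the in-place calls above, using torch.nn.init helpers.
nn.init.normal_(net[0].weight, mean=0, std=0.01)
nn.init.zeros_(net[0].bias)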

3.3.5. Defining the Loss Function

The MSELoss class computes the mean squared error. By default, it returns the average loss over the examples.

loss = nn.MSELoss()
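
As a quick sanity check with made-up values, the default reduction averages the squared errors over all elements:

# (2 - 1)^2 = 1 and (0 - 2)^2 = 4, so the mean squared error is 2.5.
print(loss(torch.tensor([2.0, 0.0]), torch.tensor([1.0, 2.0])))  # tensor(2.5000)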

3.3.6. Defining the Optimization Algorithm

Minibatch stochastic gradient descent is a standard tool for optimizing neural networks, and thus PyTorch supports it, along with a number of variations on this algorithm, in the optim module. When we instantiate an SGD instance, we specify the parameters to optimize (obtainable from our network via net.parameters()) and a dictionary of hyperparameters required by our optimization algorithm. Minibatch stochastic gradient descent only requires us to set the value lr, which is set to 0.03 here.

trainer = torch.optim.SGD(net.parameters(), lr=0.03)
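
Conceptually, for plain minibatch SGD (no momentum or weight decay), each call to trainer.step() amounts to the following sketch, assuming the gradients have already been populated by a backward pass:

# Sketch only: what a single SGD update does with lr = 0.03.
with torch.no_grad():
    for param in net.parameters():
        if param.grad is not None:
            param -= 0.03 * param.grad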

3.3.7. Training

You may have noticed that expressing our model via the high-level API of a deep learning framework requires relatively few lines of code. We don't have to assign parameters individually, define loss functions, or implement mini-batch stochastic gradient descent. Once we start working with more complex models, the benefits of the high-level API will greatly increase. However, once we have all the basic parts ready, the training loop itself is very similar to what we would do when we implemented everything from scratch.

Note that PyTorch records the data needed for gradients and accumulates them, so we need to clear them with trainer.zero_grad() on every iteration. Recall that for some number of epochs, we will make a complete pass over the dataset, and for each minibatch we go through the following steps:

  • Generate predictions by calling net(X) and compute the loss l (the forward pass).
  • Compute the gradients by running backpropagation.
  • Update the model parameters by calling our optimizer.

For good measure, we compute the loss after each epoch and print it to monitor progress.

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')
epoch 1, loss 0.000325
epoch 2, loss 0.000103
epoch 3, loss 0.000101

Below, we compare the model parameters learned by training on finite data with the actual parameters that generated the dataset. To access the parameters, we first access the layer we need from net and then that layer's weights and bias. As in our from-scratch implementation, note that our estimated parameters are close to their true counterparts.

w = net[0].weight.data
print('error in estimating w:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('error in estimating b:', true_b - b)
error in estimating w: tensor([ 0.0004, -0.0006])
error in estimating b: tensor([-0.0010])

3.3.8. Summary

  • Using PyTorch's high-level API, we can implement models more concisely.

  • In PyTorch, the data module provides tools for data processing, and the nn module defines a large number of neural network layers and common loss functions.

  • We can initialize parameters by replacing their values with in-place methods whose names end in _, such as normal_ and fill_.

3.3.9. Exercises

  1. If we replace nn.MSELoss() with nn.MSELoss(reduction='sum'), how do we need to change the learning rate for the code to behave identically? Why?
From the PyTorch documentation: reduction (string, optional) specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two arguments will override reduction. Default: 'mean'. With reduction='sum', the loss (and therefore each gradient) is batch_size times larger than with the default 'mean', so dividing the learning rate by batch_size keeps the parameter updates the same.
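A minimal sketch of that adjustment, reusing the batch_size defined earlier (my reading of the exercise, not an official solution):

loss = nn.MSELoss(reduction='sum')
trainer = torch.optim.SGD(net.parameters(), lr=0.03 / batch_size)  # scale lr down by the batch size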
  2. Check the PyTorch documentation to see what loss functions and initialization methods are provided. Replace the loss with Huber's loss.
    CLASS torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')
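    To actually swap in this loss, replacing the MSELoss instance is enough; SmoothL1Loss is used here as the Huber-style loss (newer PyTorch versions also provide nn.HuberLoss):

    loss = nn.SmoothL1Loss()  # Huber-style loss in place of the squared loss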

  3. How do you access the gradient of net[0].weight?

print(net[0].weight.grad)  # grad is an attribute, not a method

For more on accessing parameters and gradients, see: https://d2l.ai/chapter_deep-learning-computation/parameters.html

Reference

https://d2l.ai/chapter_linear-networks/linear-regression-concise.html


Origin: blog.csdn.net/zgpeace/article/details/123702022