Implementing basic linear regression in PyTorch

For the theoretical background on linear regression, see the reference article: Linear Regression.

Below we implement linear regression from scratch, including generating the dataset, defining the model and the loss function, and building a mini-batch stochastic gradient descent optimizer.

1. Imports

%matplotlib inline
import random
import torch
from d2l import torch as d2l

2. Generate the dataset

We will generate a dataset containing 1000 samples, each with 2 features sampled from the standard normal distribution, so our synthetic dataset is a matrix $\mathbf{X} \in \mathbb{R}^{1000 \times 2}$.
Using the linear model parameters $\mathbf{w} = [2, -3.4]^\top$ and $b = 4.2$, together with a noise term $\epsilon$, we generate the dataset and its labels: $\mathbf{y} = \mathbf{X}\mathbf{w} + b + \epsilon$.
Here $\epsilon$ can be viewed as the potential observation error in the model predictions and labels. We assume the standard assumptions hold, i.e. $\epsilon$ follows a normal distribution with mean 0. To simplify the problem, we set its standard deviation to 0.01.

def synthetic_data(w, b, num_examples):  #@save
    """Generate y = Xw + b + noise."""
    # torch.normal(mean, std, size) returns a tensor of random numbers drawn from a normal distribution
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)  # add Gaussian noise with standard deviation 0.01
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

Each row in features contains a two-dimensional data sample, and each row in labels contains a one-dimensional label value (a scalar). We can visually confirm the linear relationship between the two by generating a scatter plot of the second feature features[:, 1] against labels.

d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1);

[Figure: scatter plot of features[:, 1] against labels, showing a clear linear relationship]

3. Read the data set

When training a model, we iterate over the data set, taking a small batch of samples at a time, and use them to update our model. Since this process is the basis for training machine learning algorithms, it is necessary to define a function that shuffles the samples in the dataset and obtains the data in small batches.
We define a data_iter function that takes the batch size, the feature matrix, and the label vector as input and yields mini-batches of size batch_size. Each mini-batch contains a set of features and the corresponding labels.

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))  # generate the sample indices
    # The samples are read in random order, with no particular structure
    random.shuffle(indices)  # shuffle the indices so samples are visited in random order
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])  # up to batch_size random indices
        yield features[batch_indices], labels[batch_indices]  # a torch.Tensor can be indexed by a tensor of indices

When we run iterations, we continuously obtain different mini-batches until we have traversed the entire data set.
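
To see what a single mini-batch looks like, we can read the first batch and stop. The batch size of 10 below is just an illustrative choice, matching the value used later for training.

batch_size = 10  # illustrative mini-batch size

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)  # X has shape (10, 2), y has shape (10, 1)
    break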

4. Initialize model parameters

Initialize the weights by sampling random numbers from a normal distribution with mean 0 and standard deviation 0.01, and initialize the biases to 0.

w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)  # requires_grad=True lets PyTorch compute gradients automatically
b = torch.zeros(1, requires_grad=True)

After initializing the parameters, our task is to update them until they adequately fit our data. Each update requires computing the gradient of the loss function with respect to the model parameters. With this gradient, we can update each parameter in the direction that reduces the loss. Because calculating gradients manually is tedious and error-prone, we rely on automatic differentiation (autograd) to compute them.
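
As a minimal sketch of what autograd gives us (the _demo variables below are purely illustrative and not part of the training code): calling backward() on a scalar loss fills in the .grad attribute of every tensor that was created with requires_grad=True.

w_demo = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b_demo = torch.zeros(1, requires_grad=True)

X_demo, y_demo = features[:10], labels[:10]   # a small batch of inputs and labels
l_demo = ((torch.matmul(X_demo, w_demo) + b_demo - y_demo) ** 2 / 2).sum()
l_demo.backward()                             # populates w_demo.grad and b_demo.grad
print(w_demo.grad.shape, b_demo.grad.shape)   # torch.Size([2, 1]) torch.Size([1])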

5. Define the model

def linreg(X, w, b):  #@save
    """The linear regression model."""
    return torch.matmul(X, w) + b  # b is added via broadcasting
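
The broadcasting comment refers to the shapes involved: torch.matmul(X, w) has shape (num_examples, 1) while b has shape (1,), so PyTorch expands b across the first dimension when adding. A quick shape check on our data (assuming the w and b initialized in the previous section):

out = linreg(features, w, b)
print(torch.matmul(features, w).shape, b.shape, out.shape)
# torch.Size([1000, 1]) torch.Size([1]) torch.Size([1000, 1])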

6. Loss function

def squared_loss(y_hat, y):  #@save
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2
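
The reshape makes the loss robust to the label tensor's shape: y is coerced to y_hat's shape, so the subtraction is elementwise whether y arrives as (batch_size,) or (batch_size, 1). A small check with made-up values:

y_hat_demo = torch.tensor([[2.5], [0.0], [4.0]])  # predictions, shape (3, 1)
y_demo = torch.tensor([3.0, -0.5, 4.0])           # labels, shape (3,)
print(squared_loss(y_hat_demo, y_demo))           # elementwise losses, shape (3, 1)
# tensor([[0.1250], [0.1250], [0.0000]])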

7. Optimization algorithm

At each step, we use a mini-batch randomly drawn from the dataset and compute the gradient of the loss with respect to the parameters. We then update the parameters in the direction that reduces the loss. The following function implements the mini-batch stochastic gradient descent update. It accepts a list of model parameters, a learning rate, and a batch size as input. The size of each update step is determined by the learning rate lr. Because the loss we compute is the sum over a batch of samples, we normalize the step size by the batch size (batch_size) so that the step size does not depend on our choice of batch size.

def sgd(params, lr, batch_size):  #@save
    """Mini-batch stochastic gradient descent."""
    with torch.no_grad():
        # torch.no_grad is a context manager: when you are sure you will not call
        # Tensor.backward(), it disables gradient tracking, and tensors computed
        # under it have requires_grad=False
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()  # reset the gradient to zero

8. Training

In each iteration cycle (epoch), we use the data_iter function to iterate through the entire data set and use all samples in the training data set once (assuming the number of samples is evenly divisible by the batch size). The number of iteration cycles num_epochs and the learning rate lr here are both hyperparameters, set to 3 and 0.03 respectively. Setting hyperparameters is tricky and requires tuning through trial and error.

batch_size = 10  # mini-batch size; must be defined before the training loop
lr = 0.03        # learning rate; try different values
num_epochs = 3
net = linreg          # the model
loss = squared_loss   # the loss function

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)  # mini-batch loss on X and y
        # l has shape (batch_size, 1) rather than being a scalar, so we sum its
        # elements and compute the gradients of [w, b] from that sum
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)  # loss on the full dataset with the updated parameters
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

[Output: the training loss printed after each of the three epochs]
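
Because the data was synthesized from known parameters, we can also check how close the learned w and b came to true_w and true_b (a simple sanity check; small errors indicate successful training):

print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')
print(f'error in estimating b: {true_b - b}')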

Origin blog.csdn.net/Luo_LA/article/details/128670531