Hands-on Deep Learning [1] - Linear Regression

Hands-on Deep Learning website: Dive into Deep Learning (https://d2l.ai)

Note: This part only briefly introduces the basics and attaches a complete code implementation. For more detail, please refer to the website above.

Brief description

Required preliminaries

  • Partial derivatives (multivariable calculus)
  • Linear algebra

Linear model

Regression is a class of methods that can model the relationship between one or more independent variables and a dependent variable. In the natural and social sciences, regression is often used to represent the relationship between inputs and outputs.

Linear regression rests on a few simple assumptions: first, that the relationship between the independent variables x and the dependent variable y is linear, i.e., y can be expressed as a weighted sum of the elements of x, usually allowing for some noise in the observations; second, that any noise is well behaved, e.g., that it follows a normal distribution.
Given a dataset, the goal of linear regression is to find the model's weights w and bias b. For a sample with d features, the prediction is:

$$\hat{y} = w_1 x_1 + \cdots + w_d x_d + b$$

Collecting the weights into a vector, this becomes:

$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b$$
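For a whole dataset of n examples stacked into a design matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$, the predictions for all samples can be written compactly as:

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{w} + b$$

This vectorized form is exactly what the torch.matmul(X, w) + b calls compute in the code further below.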

Loss function

The squared error for sample i is:

$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

where $y^{(i)}$ is the true label.
Averaged over the entire dataset of n samples, the loss is:

$$L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2$$

The ultimate goal is to find a set of parameters $(\mathbf{w}^*, b^*)$ that minimizes the total loss over all training samples, namely:

$$\mathbf{w}^*, b^* = \operatorname*{argmin}_{\mathbf{w},\, b}\; L(\mathbf{w}, b)$$
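For instance, a single prediction $\hat{y}^{(i)} = 1.5$ against a true label $y^{(i)} = 1.0$ incurs a loss of $\frac{1}{2}(1.5 - 1.0)^2 = 0.125$.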

Optimization

During training we iteratively update w and b so that the loss is minimized, which requires an optimization method; the most common is gradient descent.
Plain gradient descent must traverse the entire dataset before every parameter update, which is expensive. Therefore, we usually sample a small batch of examples at random each time we need to compute an update; this variant is called minibatch stochastic gradient descent. For linear regression, the parameter update takes the form:

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)$$

where $\mathcal{B}$ is the sampled minibatch and $\eta$ is the learning rate.

Expanding the partial derivatives, the updates become:

$$\mathbf{w} \leftarrow \mathbf{w} - \frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}} \mathbf{x}^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b - y^{(i)}\right), \qquad b \leftarrow b - \frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b - y^{(i)}\right)$$
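To make the update rule concrete, here is a minimal sketch of a single minibatch SGD step written directly from the two expanded equations above (the names X_batch, y_batch, and eta are illustrative placeholders, not identifiers from the original code):

import torch

# One minibatch SGD step for linear regression, following the equations above.
# X_batch: (B, d) features, y_batch: (B,) labels, w: (d,) weights, b: scalar bias.
def sgd_step(X_batch, y_batch, w, b, eta):
    # residual_i = w^T x^(i) + b - y^(i), shape (B,)
    residual = X_batch @ w + b - y_batch
    w_new = w - eta / len(y_batch) * (X_batch.T @ residual)
    b_new = b - eta / len(y_batch) * residual.sum()
    return w_new, b_new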

The code

Implement linear regression from scratch

1. Generate a dataset

We generate a dataset with 1000 samples, each containing 2 features sampled from a standard normal distribution.

import torch
import random
from d2l import torch as d2l

# Generate the dataset
def generate_data(w, b, num_examples):
    # Features drawn from a standard normal distribution
    X = torch.normal(0, 1, (num_examples, len(w)))
    # Matrix-vector product: y = Xw + b
    y = torch.matmul(X, w) + b
    # Add Gaussian noise with standard deviation 0.01
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

# Ground-truth w
true_w = torch.tensor([2, -3.4])
# Ground-truth b
true_b = 4.2
# features shape: N x len(w)
# labels shape: N x 1
features, labels = generate_data(true_w, true_b, 1000)
print('features:', features[0], '\nlabel:', labels[0])
# Visualize the generated data
d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1)

2. Read the data

Define a data_iter function that takes a batch size, a feature matrix, and a label vector as input, and yields minibatches of size batch_size.

# Read the dataset in minibatches
def data_iter(batch_size, features, labels):
    # Size of the first dimension, i.e., the number of examples
    num_examples = len(features)
    indices = list(range(num_examples))
    # Shuffle the order so examples are visited at random
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

# Call the function with a minibatch size of 10
batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break

3. Define the relevant parts of the model

(1) Initialize the parameters
We initialize the weights by sampling from a normal distribution with mean 0 and standard deviation 0.01, and initialize the bias to 0.

(2) Define the model
Use the form wx + b.

(3) Loss function
Use the squared loss function.

(4) Optimization method
In each step, we draw a minibatch at random from the dataset and compute the gradient of the loss with respect to the parameters. We then update the parameters in the direction that reduces the loss. The size of each update step is determined by the learning rate lr. Because the loss we compute is a sum over a batch of samples, we normalize the step size by the batch size (batch_size) so that the step size does not depend on our choice of batch size.

# Initialize the model parameters
def init_params():
    # w is drawn from a normal distribution with mean 0 and standard deviation 0.01, shape (2, 1)
    w = torch.normal(0, 0.01, (2, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return w, b

# Define the model
def linear_reg(X, w, b):
    # wx + b
    return torch.matmul(X, w) + b

# Define the loss function
def squared_loss(y_pred, y):
    return (y_pred - y.reshape(y_pred.shape)) ** 2 / 2

# Define the optimization algorithm
def sgd(params, lr, batch_size):
    # Disable gradient tracking while updating the parameters in place
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            # Manually reset the gradient to zero so the next backward pass
            # does not accumulate onto this one
            param.grad.zero_()

4. Train the model

Execute the following loop:

  • Initialize the parameters
  • Repeat until done:
    • Compute the gradient (l.sum().backward())
    • Update the parameters (sgd)
# Training
# Learning rate
lr = 0.02
# Number of epochs
num_epoches = 3
net = linear_reg
loss = squared_loss
w, b = init_params()
for epoch in range(num_epoches):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)
        # Backpropagate to compute the gradients
        l.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

print(f'estimation error of w: {true_w - w.reshape(true_w.shape)}')
print(f'estimation error of b: {true_b - b}')

Framework implementation of linear regression

# Concise implementation
import torch
from torch import nn
from torch.utils import data
from d2l import torch as d2l

true_w = torch.tensor([2, -3.4])
true_b = 4.2
# Generate the dataset
features, labels = d2l.synthetic_data(true_w, true_b, 1000)

def load_dataset(data_arrs, batch_size, is_train=True):
    dataset = data.TensorDataset(*data_arrs)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_dataset((features, labels), batch_size)
# iter() builds an iterator; peek at the first minibatch
next(iter(data_iter))

# Define the model
net = nn.Sequential(nn.Linear(2, 1))
# Initialize the parameters. nn.Sequential wraps the layers in a sequence,
# so net[0] refers to the linear layer.
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
# Loss function
loss = nn.MSELoss()
# Optimization algorithm
trainer = torch.optim.SGD(net.parameters(), lr=0.03)
# Training
num_epoches = 3
for epoch in range(num_epoches):
    for X, y in data_iter:
        l = loss(net(X), y)
        # Reset the gradients to zero
        trainer.zero_grad()
        # Compute the gradients
        l.backward()
        # Update all parameters
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

w = net[0].weight.data
b = net[0].bias.data
print(f'estimation error of w: {true_w - w.reshape(true_w.shape)}')
print(f'estimation error of b: {true_b - b}')
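As a quick sanity check, the trained net can be used to predict on a new input. For example, with made-up features [1.0, 2.0], the prediction should land near 2·1.0 − 3.4·2.0 + 4.2 = −0.6:

# Hypothetical usage example: the input values here are made up for illustration.
x_new = torch.tensor([[1.0, 2.0]])
with torch.no_grad():
    y_pred = net(x_new)
print(y_pred)  # expected to be close to -0.6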

Origin: blog.csdn.net/qq_41234663/article/details/129326941