3 - Hands-on Deep Learning from Scratch: Implementing Linear Regression

Having covered the basics of linear regression, we can now implement the algorithm from scratch. Although powerful deep learning frameworks can spare us a lot of repetitive work, relying too heavily on the convenience they offer makes it hard to understand how deep learning actually works. Therefore, this section shows how to implement the training of a linear regression model using only Tensor and autograd.

First, import the packages and modules needed for this section. The matplotlib package is used for plotting and is configured to display figures inline.

%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random
Generating a data set

We construct a simple artificial training dataset, so that we can visually compare the learned parameters with the true parameters that generated the data.
Let the training set contain 1000 samples, each with 2 input features. We randomly generate the feature matrix \(X \in \mathbb{R}^{1000 \times 2}\), and use the true weights \(w = [2, -3.4]^{\top}\) and bias \(b = 4.2\) of a linear regression model, together with a random noise term \(\epsilon\), to generate the labels:
\[y = Xw + b + \epsilon,\]
where the noise term \(\epsilon\) follows a normal distribution with mean 0 and standard deviation 0.01. The noise represents meaningless perturbations in the data.

num_inputs = 2   # number of features
num_examples = 1000   # number of samples
true_w = [2, -3.4]   # true weights
true_b = 4.2   # true bias
features = torch.randn(num_examples, num_inputs, dtype=torch.float32)   # generate random features
labels = true_w[0]*features[:, 0] + true_w[1]*features[:, 1] + true_b   # generate labels from the linear model
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float32)   # add random noise to the labels
print(features[0], labels[0])   # inspect the first sample

tensor([-0.4866,  0.9289]) tensor(0.0616)
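As a quick sanity check (a minimal sketch added here, not part of the original post; the name residual is introduced only for illustration), we can subtract the noiseless linear combination from the labels and confirm the leftover term roughly matches the noise distribution with mean 0 and standard deviation 0.01:

# Illustrative check: the residual between the labels and the noiseless
# linear combination should have mean ~0 and std ~0.01.
residual = labels - (true_w[0]*features[:, 0] + true_w[1]*features[:, 1] + true_b)
print(residual.mean().item(), residual.std().item())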
def use_svg_display():
    # render figures as svg in the notebook
    display.set_matplotlib_formats('svg')

def set_figsize(figsize=(3.5, 2.5)):
    # set the default figure size
    use_svg_display()
    plt.rcParams['figure.figsize'] = figsize

## These two functions can be placed in a separate module for reuse,
## e.g. added to d2lzh_pytorch and then imported:
# import sys
# sys.path.append('..')
# from d2lzh_pytorch import *

set_figsize()
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1)   # scatter plot of the second feature against the labels
<matplotlib.collections.PathCollection at 0x1215b51d0>

Reading the data
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)   # read the samples in random order
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i:min(i+batch_size, num_examples)])   # the last batch may be smaller than batch_size
        yield features.index_select(0, j), labels.index_select(0, j)
batch_size=10
for X ,y in data_iter(batch_size,features,labels):
    print(X,y)
    break
tensor([[ 0.0097,  0.3166],
        [-0.9294, -0.5351],
        [ 0.5398,  0.4626],
        [ 0.5905,  0.9588],
        [ 0.1730, -0.3228],
        [ 1.3608, -0.8205],
        [ 1.5391, -0.6738],
        [-1.4577,  0.6428],
        [-1.4004,  0.3694],
        [-0.6668, -0.4032]]) tensor([ 3.1422,  4.1823,  3.7059,  2.1282,  5.6544,  9.7055,  9.5682, -0.9014,
         0.1326,  4.2385])
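As an optional check (a small sketch, not in the original post; the name total is introduced only for illustration), one full pass over data_iter should yield every sample exactly once:

# Illustrative check: one pass of data_iter covers all 1000 samples.
total = sum(X.shape[0] for X, _ in data_iter(batch_size, features, labels))
print(total)   # expected: 1000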
Initializing model parameters
# initialize w as a 2x1 tensor drawn from a normal distribution, and b as 0
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
# during training we need to update these parameters by their gradients,
# so we set requires_grad=True to track operations on them
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
tensor([0.], requires_grad=True)
Defining the model
def linreg(X, w, b):
    # linear regression model: y_hat = Xw + b
    return torch.mm(X, w) + b
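To make the shapes concrete (a hedged illustration, not from the original post; X_demo is a name introduced here): torch.mm(X, w) takes X of shape (batch_size, 2) and w of shape (2, 1), producing a (batch_size, 1) output to which the scalar b is broadcast.

# Shape illustration: a mini-batch of 10 samples maps to a (10, 1) prediction.
X_demo = torch.randn(10, num_inputs)
print(linreg(X_demo, w, b).shape)   # torch.Size([10, 1])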
Defining the loss function
def squared_loss(y_hat, y):
    # squared loss; y is reshaped to match y_hat to avoid unintended broadcasting
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
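Note the y.view(y_hat.size()) call: y_hat returned by linreg has shape (batch_size, 1) while y has shape (batch_size,); subtracting them directly would broadcast to a (batch_size, batch_size) matrix. A small sketch of the pitfall (the names y_hat_demo and y_demo are introduced only for illustration):

# Illustration of the shape pitfall that y.view(y_hat.size()) avoids.
y_hat_demo = torch.zeros(10, 1)
y_demo = torch.ones(10)
print(((y_hat_demo - y_demo) ** 2).shape)                          # torch.Size([10, 10]) -- unintended broadcast
print(((y_hat_demo - y_demo.view(y_hat_demo.size())) ** 2).shape)  # torch.Size([10, 1])  -- intended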
Defining the optimization algorithm
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size
        # note: param is updated through param.data, so the update itself is not tracked by autograd
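Updating through param.data keeps the parameter update out of the autograd graph. An equivalent way to achieve the same effect (a sketch, not the original code; sgd_no_grad is a name introduced here) is to perform the in-place update inside torch.no_grad():

# Equivalent sketch: in-place updates under torch.no_grad() are also
# excluded from gradient tracking.
def sgd_no_grad(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size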
Training the model

During training, we iterate over the model parameters multiple times. In each iteration, we read a mini-batch of data samples (features X and labels y), call backward to compute the mini-batch stochastic gradient by backpropagation, and then call the optimization algorithm sgd to update the model parameters.
Since batch_size was set to 10 earlier, the per-batch loss \(l\) has shape (10, 1). Because \(l\) is not a scalar, we call .sum() to reduce it to a scalar, and then run l.backward() to obtain the gradients of the model parameters. Note that the parameter gradients must be cleared after each parameter update.

lr = 0.03   # learning rate
num_epochs = 3   # number of passes over the dataset
net = linreg
loss = squared_loss
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()   # scalar loss for the mini-batch
        l.backward()                      # backpropagate to compute gradients
        sgd([w, b], lr, batch_size)       # update parameters

        w.grad.data.zero_()               # clear gradients after the update
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d loss %f' % (epoch + 1, train_l.mean().item()))
epoch 1 loss 0.038065
epoch 2 loss 0.000144
epoch 3 loss 0.000050
print(true_w,'\n',w)
print(true_b,'\n',b)
[2, -3.4] 
 tensor([[ 1.9996],
        [-3.4005]], requires_grad=True)
4.2 
 tensor([4.1997], requires_grad=True)
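The learned parameters are very close to the true ones. With them, we can also predict on a new input and compare against the noiseless generating formula (a small usage sketch, not part of the original post; x_new, pred, and truth are names introduced here):

# Usage sketch: predict on a new sample and compare with the noiseless ground truth.
x_new = torch.tensor([[1.0, 2.0]])
pred = net(x_new, w, b)
truth = true_w[0]*x_new[0, 0] + true_w[1]*x_new[0, 1] + true_b
print(pred.item(), truth.item())   # the two values should be very close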
