Learning PyTorch: Linear Regression from Scratch

 

Linear Regression

The main contents include:

  1. The basic elements of linear regression

  2. Implementing linear regression from scratch

  3. A concise implementation of linear regression with PyTorch

The basic elements of linear regression

Model

For simplicity, we assume that the price of a house depends on only two factors: its area (in square meters) and its age (in years). We want to explore the specific relationship between these two factors and the price. Linear regression assumes a linear relationship between the inputs and the output:

\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b

Data set

We usually collect a series of real data points, such as the actual selling prices of multiple houses together with their corresponding areas and ages. We hope to find model parameters that minimize the error between the prices predicted by the model and the real prices. In machine learning terminology, this collection is called the training data set or training set, one house is a sample, its real selling price is the label, and the two factors used to predict the label are the features. Features are used to characterize the samples; for example, one house is represented by the feature vector (area, age), and its selling price is its label.

Loss function

In model training, we need to measure the error between the predicted price and the true price. Usually we choose a non-negative number as the error, with a smaller value indicating a smaller error. A common choice is the squared function. The error for sample i is expressed as

l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2

Optimization function - stochastic gradient descent

When the model and the loss function have a simple enough form, the solution of the error-minimization problem above can be written down in a closed formula. Such a solution is called an analytical solution. Linear regression with the squared error used in this section falls exactly into this category. However, most deep learning models have no analytical solution; instead, the value of the loss function is reduced as much as possible by iterating the model parameters a finite number of times with an optimization algorithm. Such a solution is called a numerical solution.

Among the optimization algorithms used for numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is very simple: first select a set of initial values for the model parameters, for example at random; then iterate the parameters multiple times, so that each iteration may reduce the value of the loss function. In each iteration, we first uniformly sample at random a fixed number of training samples to form a mini-batch, then compute the derivative (gradient) of the average loss on the mini-batch with respect to the model parameters, and finally use the product of this gradient and a pre-set positive number (the learning rate) as the amount by which the model parameters are decreased in the current iteration.

Learning rate: the size of the step taken in each parameter update.
Batch size: the number of samples in each mini-batch used to compute the gradient.

In summary, the optimization function has the following two steps:

  • (i) initialize the model parameters, in general by random initialization;

  • (ii) iterate over the data several times, updating each parameter by moving it in the negative direction of its gradient, as sketched below.
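
The full implementation for the housing example comes later in this post; the following is only a minimal, self-contained sketch of these two steps on hypothetical toy data.

# a minimal sketch of the two optimization steps (toy data, illustrative only)
import torch

# (i) random initialization of the model parameters
w = torch.randn(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.03                      # learning rate: size of each update step
X = torch.randn(10, 2)         # one toy mini-batch of 10 samples
y = torch.randn(10, 1)         # toy labels

# (ii) one iteration: forward pass, mini-batch average loss, gradient step
loss = ((X.mm(w) + b - y) ** 2 / 2).sum() / X.shape[0]
loss.backward()                # gradients of the averaged loss w.r.t. w and b
with torch.no_grad():          # update in the negative gradient direction
    w -= lr * w.grad
    b -= lr * b.grad
    w.grad.zero_()
    b.grad.zero_()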

Vector calculation

When training the model or making predictions, we often handle multiple data samples at once and rely on vector calculations. Before introducing the vectorized expression of linear regression, let us consider two methods for adding two vectors.

  1. One method is to add the two vectors element by element with scalar additions.

  2. The other method is to add the two vectors directly with a single vector addition.

import torch
import time

# init variable a, b as 1000 dimension vector
n = 1000
a = torch.ones(n)
b = torch.ones(n)
# define a timer class to record time
class Timer(object):
    """Record multiple running times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()

    def stop(self):
        # stop the timer and record time into a list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]

    def avg(self):
        # calculate the average and return
        return sum(self.times)/len(self.times)

    def sum(self):
        # return the sum of recorded time
        return sum(self.times)

Now we can test them. First, add the two vectors element by element with scalar additions in a loop.

timer = Timer()
c = torch.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]
'%.5f sec' % timer.stop()

Output: '0.01232 sec'

Then use torch to add the two vectors directly with a single vector addition:

timer.start()
d = a + b
'%.5f sec' % timer.stop()

Output: '0.00029 sec'

The result is obvious: the latter operation is faster than the former. Therefore, we should use vector calculations as much as possible to improve computational efficiency.

Implementing linear regression from scratch

# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

print(torch.__version__)

Output: 1.3.0

Generating a data set

We use a linear model to generate a data set of 1000 samples, with the following linear relationship:

\mathbf{y} = \mathbf{X}\mathbf{w} + b + \epsilon

where the true weight is \mathbf{w} = [2, -3.4]^\top, the true bias is b = 4.2, and the noise \epsilon is drawn from a normal distribution with mean 0 and standard deviation 0.01.

# set input feature number 
num_inputs = 2
# set example number
num_examples = 1000

# set true weight and bias in order to generate corresponded label
true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs,
                      dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)

Visualize the generated data with a scatter plot:

plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);

# note: this regenerates `features` with new random values, so the labels
# computed above no longer correspond to these features
features = torch.randn(num_examples, num_inputs,
                      dtype=torch.float32)
print(features)

Output: tensor([[ 0.0908, -0.8646],
        [-1.6370,  1.6305],
        [-0.1965,  0.8613],
        ...,
        [-0.9776,  0.0575],
        [ 1.9371, -0.1497],
        [-0.1417, -1.0046]])

Reading the data set

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # shuffle the indices so samples are read in random order
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)]) # the last batch may be smaller than batch_size
        yield  features.index_select(0, j), labels.index_select(0, j)
        
batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
Output: tensor([[ 1.3591,  0.6950],
        [ 0.5206, -0.2726],
        [-0.6639,  0.9716],
        [ 2.7164, -0.6513],
        [-1.0642,  1.9331],
        [-2.2240, -0.3616],
        [-0.9094,  0.6691],
        [-0.2991,  0.2488],
        [ 1.8312,  0.2209],
        [ 0.2833, -1.1672]]) 
 tensor([6.9694, 6.0005, 9.5797, 0.6944, 4.1964, 6.8519, 2.5178, 4.4217, 5.4679,
        9.9754])

Initializing the model parameters

w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)

w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

Defining the model

Define the linear regression model used for training:

def linreg(X, w, b):
    return torch.mm(X, w) + b
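
As a quick sanity check (the demo tensors below are made up for illustration): a mini-batch X of shape (batch_size, num_inputs) multiplied by w of shape (num_inputs, 1) gives predictions of shape (batch_size, 1), with b broadcast across the batch.

# shape check with made-up inputs
X_demo = torch.randn(10, 2)          # (batch_size, num_inputs)
w_demo = torch.zeros(2, 1)           # (num_inputs, 1)
b_demo = torch.zeros(1)
print(linreg(X_demo, w_demo, b_demo).shape)  # torch.Size([10, 1]); b is broadcast over the batch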

Defining the loss function

We use the squared loss function:

l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2,

def squared_loss(y_hat, y): 
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
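
The y.view(y_hat.size()) call matters: in the training loop below, y_hat has shape (batch_size, 1) while y has shape (batch_size,), and subtracting them without the reshape would broadcast to a (batch_size, batch_size) matrix. A small illustration with made-up values:

# why the reshape is needed (made-up shapes)
y_hat_demo = torch.zeros(3, 1)             # predictions: (3, 1)
y_demo = torch.tensor([1.0, 2.0, 3.0])     # labels: (3,)
print((y_hat_demo - y_demo).shape)                           # torch.Size([3, 3]) -- unintended broadcasting
print((y_hat_demo - y_demo.view(y_hat_demo.size())).shape)   # torch.Size([3, 1]) -- intended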

Defining the optimization function

Here we use mini-batch stochastic gradient descent as the optimization function:

(\mathbf{w},b) \leftarrow (\mathbf{w},b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w},b)} l^{(i)}(\mathbf{w},b)

def sgd(params, lr, batch_size): 
    for param in params:
        # update in place via .data so the step itself is not tracked by autograd;
        # param.grad holds the gradient of the summed batch loss, so divide by batch_size
        param.data -= lr * param.grad / batch_size 

Training

Once the data set, model, loss function, and optimization function are defined, we are ready to train the model.

# hyperparameter initialization
lr = 0.03
num_epochs = 5

net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, all the samples in dataset will be used once
    
    # X is the feature and y is the label of a batch sample
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()  
        # calculate the gradient of batch sample loss 
        l.backward()  
        # use mini-batch stochastic gradient descent to update the model parameters
        sgd([w, b], lr, batch_size)  
        # reset parameter gradient
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
    
Output: epoch 1, loss 7.605014
epoch 2, loss 7.521966
epoch 3, loss 7.550967
epoch 4, loss 7.542496
epoch 5, loss 7.535208

Note that the loss plateaus around 7.5 instead of dropping toward zero. This is because features was regenerated above after labels had been computed, so the labels no longer correspond to the current features and the model can do little better than predict the average price. Without that regeneration step, the training loss falls close to zero.

A concise implementation of linear regression with PyTorch

import torch
from torch import nn
import numpy as np
torch.manual_seed(1)
torch.set_default_tensor_type('torch.FloatTensor')

Generating a data set

The data set is generated in exactly the same way as in the from-scratch implementation.

num_inputs = 2
num_examples = 1000

true_w = [2, -3.4]
true_b = 4.2

features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)

Reading the data set

import torch.utils.data as Data

batch_size = 10

# combine features and labels into a dataset
dataset = Data.TensorDataset(features, labels)

# put dataset into DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,            # torch TensorDataset format
    batch_size=batch_size,      # mini batch size
    shuffle=True,               # whether shuffle the data or not
    num_workers=2,              # read data with multiple worker processes
)
for X, y in data_iter:
    print(X, '\n', y)
    break
    
Output: tensor([[ 0.5584, -0.4995],
        [-0.1495, -1.6520],
        [-0.3280,  0.2594],
        [-0.4857, -1.2976],
        [ 1.8603,  0.4539],
        [-0.3628,  0.0064],
        [ 1.3235, -0.3536],
        [-2.3426, -0.5968],
        [-0.6290, -0.2948],
        [-0.0787,  0.2180]]) 
 tensor([7.0088, 9.5071, 2.6718, 7.6535, 6.3802, 3.4601, 8.0475, 1.5223, 3.9682,
        3.2977])

Defining the model

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()      # call the parent class constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`

    def forward(self, x):
        y = self.linear(x)
        return y
    
net = LinearNet(num_inputs)
# alternative ways to construct the same single-layer network
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
    )

# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......

# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
          ('linear', nn.Linear(num_inputs, 1))
          # ......
        ]))
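
All three ways build the same single-layer network. With the nn.Sequential versions, the linear layer can also be reached by integer index, which the initialization code below relies on:

print(net)     # shows the Sequential container with its 'linear' layer
print(net[0])  # the nn.Linear(num_inputs, 1) layer itself, accessed by index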

Initializing the model parameters

from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or you can use `net[0].bias.data.fill_(0)` to modify it directly

Output: Parameter containing:
tensor([0.], requires_grad=True)

for param in net.parameters():
    print(param)
Output: Parameter containing:
tensor([[-0.0142, -0.0161]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)

Defining the loss function

loss = nn.MSELoss()    # nn built-in squared loss function
                       # function prototype: `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`
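
Note that with the default reduction='mean', nn.MSELoss averages (y_hat - y)^2 over the batch and does not include the 1/2 factor used in the from-scratch squared_loss. A quick comparison with made-up values:

# compare nn.MSELoss with the from-scratch recipe (made-up values)
y_hat_demo = torch.tensor([[1.0], [2.0]])
y_demo = torch.tensor([[0.0], [4.0]])
print(loss(y_hat_demo, y_demo))                      # (1^2 + 2^2) / 2 = 2.5
print(((y_hat_demo - y_demo) ** 2 / 2).sum() / 2)    # sum of half squared errors / batch_size = 1.25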

Defining the optimization function

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)   # built-in stochastic gradient descent optimizer
print(optimizer)  # function prototype: `torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)`
Output: SGD (
Parameter Group 0
    dampening: 0
    lr: 0.03
    momentum: 0
    nesterov: False
    weight_decay: 0
)
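
The learning rate lives in the optimizer's parameter groups and can be adjusted there after construction, for example to decay it during training. This is only an illustration; the training below keeps lr fixed at 0.03.

print(optimizer.param_groups[0]['lr'])   # 0.03
# for param_group in optimizer.param_groups:
#     param_group['lr'] *= 0.1           # e.g. decay the learning rate to a tenth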

Training

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad() # reset gradient, equal to net.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
# compare the learned parameters with the true ones
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)

Output: epoch 1, loss: 0.000103
epoch 2, loss: 0.000097
epoch 3, loss: 0.000079

 


Origin blog.csdn.net/xiewenrui1996/article/details/104309458