Boyu Learning Platform "Hands-on Deep Learning" - Linear Regression

Summary

  1. The basic elements of linear regression
  2. Implementing linear regression from scratch
  3. A concise implementation of linear regression with PyTorch

The basic elements of linear regression

Model

Assume that the price of a house depends only on its area (square meters) and its age (years), and that we want to explore the specific relationship between the price and these two factors. Linear regression assumes a linear relationship between the output and each input:

$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$

where $w_{\mathrm{area}}$ and $w_{\mathrm{age}}$ are the weights for the area and the age of the house, and $b$ is the bias.
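As a quick numeric illustration of the model (the weights, bias, and inputs below are made-up values, purely for illustration):

# hypothetical weights, bias, and one house's features
w_area, w_age, b = 1.2, -0.5, 10.0
area, age = 100.0, 5.0
price = w_area * area + w_age * age + b
print(price)  # 127.5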

Dataset

We first collect data, such as the actual selling prices of multiple houses and their corresponding areas and ages. We want to find model parameters that minimize the error between the model's predicted prices and the real prices on this data. In machine learning terminology, this dataset is called the training dataset or training set, each house is called a sample, its true selling price is called the label, and the two factors used to predict the label (area and age) are called features. Features are used to characterize the sample.

Loss function

In model training, we need to measure the error between the predicted price and the true price. Usually we choose a non-negative number as the error, where a smaller value indicates a smaller error. A common choice is the squared function. For sample $i$, the error is

$$\ell^{(i)}(w, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2,$$

where $\hat{y}^{(i)}$ is the predicted value and $y^{(i)}$ is the true label.
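For example, a quick numeric sketch of this squared error (with hypothetical values):

# hypothetical prediction and true label for one sample
y_hat, y = 127.5, 130.0
loss_i = (y_hat - y) ** 2 / 2
print(loss_i)  # 3.125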

Optimization function

When the model and the loss function are relatively simple in form, the above error minimization problem can be solved with a direct formula. Such a solution is called an analytical solution. The linear regression and squared error used here fall exactly into this category. However, most deep learning models have no analytical solution; the value of the loss function can only be reduced as much as possible by updating the model parameters over a finite number of iterations of an optimization algorithm. Such a solution is called a numerical solution.

Among the optimization algorithms for numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is simple: first select an initial set of values for the model parameters, for example at random; then iterate the parameters multiple times, so that each iteration may reduce the value of the loss function. In each iteration, we first uniformly sample a fixed number of training samples at random to form a mini-batch $\mathcal{B}$, then compute the derivative (gradient) of the average loss of the samples in the mini-batch with respect to the model parameters, and finally multiply this result by a preset positive number to obtain the amount by which the model parameters decrease in the current iteration:

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} \ell^{(i)}(\mathbf{w}, b)$$

Here, $\eta$ is the learning rate, the step size of each optimization step, which is set manually; $|\mathcal{B}|$ is the batch size of the mini-batch. The optimization function has the following two steps (see the sketch after this list):

  • Initialize the model parameters, generally with random initialization;
  • Iterate over the data multiple times, updating each parameter by moving it in the negative direction of its gradient.
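A minimal sketch of a single mini-batch SGD step on one scalar parameter (all values are assumptions for illustration):

import numpy as np

eta = 0.03                          # learning rate
grads = np.array([0.2, -0.1, 0.4])  # hypothetical per-sample gradients of the loss w.r.t. w
w = 1.0
w -= eta * grads.mean()             # step against the average mini-batch gradient
print(w)                            # 0.995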

Implementing linear regression from scratch

Import the necessary libraries

import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

Generating the dataset

Here we use a linear model to generate a dataset of 1000 samples. The data are generated from the linear relationship

$$y = 2 x_1 - 3.4 x_2 + 4.2 + \epsilon,$$

where the noise term $\epsilon$ follows a normal distribution with mean 0 and standard deviation 0.01:

# set the number of input features to 2
num_inputs = 2
# set the number of samples to 1000
num_examples = 1000

# set the true weights and bias
true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs, dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# add noise drawn from a normal distribution
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float32)
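We can inspect the first sample to check the generated data (the exact values vary from run to run):

print(features[0], labels[0])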

We can use matplotlib to visualize the generated data.

def use_svg_display():
    display.set_matplotlib_formats('svg')
    
def set_figsize(figsize=(3.5, 2.5)):
    use_svg_display()
    plt.rcParams['figure.figsize'] = figsize

set_figsize()
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);

The resulting scatter plot shows the labels against the second feature, with a roughly linear negative trend.

Reading the dataset

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # shuffle the indices so samples are read in random order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # the last batch may be smaller than batch_size
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)
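As a quick check, we can read and print one mini-batch (assuming a batch size of 10):

batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break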

Initializing model parameters

# initialize the weights from a normal distribution and the bias to zero
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)

# enable gradient tracking so autograd can compute gradients for w and b
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

Defining the model

def linreg(X, w, b):
    # matrix product of the features and weights, plus the bias
    return torch.mm(X, w) + b
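A quick sanity check on the output shape (the predictions for the first three samples should form a 3x1 tensor):

print(linreg(features[:3], w, b).shape)  # torch.Size([3, 1])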

Defining the loss function

Here we use the squared loss function:

$$\ell(\hat{y}, y) = \frac{(\hat{y} - y)^2}{2}$$

def squared_loss(y_hat, y):
    # reshape y to match y_hat before computing the element-wise squared error
    return (y_hat - y.view(y_hat.size())) ** 2 / 2

Defining the optimization function

Here we use mini-batch stochastic gradient descent as the optimization function:

def sgd(params, lr, batch_size):
    for param in params:
        # update in place via .data so the step itself is not tracked by autograd
        param.data -= lr * param.grad / batch_size

Training

Once the dataset, model, loss function, and optimization function are defined, we can start training the model.

# hyperparameter initialization
lr = 0.03
num_epochs = 5
batch_size = 10  # mini-batch size used by data_iter

net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, all the samples in dataset will be used once
    
    # X is the feature and y is the label of a batch sample
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()  
        # calculate the gradient of batch sample loss 
        l.backward()  
        # using small batch random gradient descent to iter model parameters
        sgd([w, b], lr, batch_size)  
        # reset parameter gradient
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))

The printed output shows the loss decreasing steadily as training proceeds. Finally, we output the learned w and b alongside the true true_w and true_b.
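A minimal way to produce this printout (assuming the training loop above has completed):

print('w:', w, '\ntrue_w:', true_w)
print('b:', b, '\ntrue_b:', true_b)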

w: tensor([[ 2.0007], [-3.3999]], requires_grad=True)
true_w: [2, -3.4]
b: tensor([4.2005], requires_grad=True)
true_b: 4.2

Comparing the two, the learned parameters are very close to the true values.

A concise implementation of linear regression with PyTorch

Using PyTorch's built-in functions and modules makes the linear regression implementation much cleaner.

Generating the dataset

The dataset generation is exactly the same as in the from-scratch implementation.

Reading the dataset

Here we use the data module from PyTorch's utils package to read the dataset, wrapping it in DataLoader format.

import torch.utils.data as Data

batch_size = 10

# combine features and labels of the dataset
dataset = Data.TensorDataset(features, labels)

# put dataset into DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,            # torch TensorDataset format
    batch_size=batch_size,      # mini batch size
    shuffle=True,               # whether shuffle the data or not
    num_workers=2,              # number of worker processes for loading data
)
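Again we can read and print one mini-batch to verify the DataLoader:

for X, y in data_iter:
    print(X, '\n', y)
    break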

Defining the model

We define a network class that encapsulates the linear regression model as a simple single-layer neural network.

import torch.nn as nn

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()      # call the parent class constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`

    def forward(self, x):
        y = self.linear(x)
        return y
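We can instantiate the class and print the network structure:

net = LinearNet(num_inputs)
print(net)  # prints the structure of the network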

The network has only one linear layer. As needed, we can also construct a network (including multi-layer ones) in any of the three ways shown below.

# ways to init a multilayer network
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
    )

# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......

# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
          ('linear', nn.Linear(num_inputs, 1))
          # ......
        ]))
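Printing the network built with nn.Sequential shows its structure and lets us index its layers:

print(net)     # the whole network
print(net[0])  # the first (and only) linear layer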

Initializing model parameters

Here we use the init module from PyTorch's nn package to initialize the model parameters. Since net was built with nn.Sequential, its first layer can be accessed as net[0] (for the LinearNet class version, the layer would be net.linear instead).

from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)

Defining the loss function

PyTorch provides a series of ready-made loss functions that we can simply call; for example, for the squared loss we use MSELoss().

loss = nn.MSELoss()

Defining the optimization function

PyTorch also provides a series of ready-made optimization functions; for example, the mini-batch stochastic gradient descent we use here is SGD().

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)
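Printing the optimizer shows its configuration (learning rate, momentum, and so on):

print(optimizer)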

Training

During training we use PyTorch's automatic differentiation mechanism.

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad() # reset gradient, equal to net.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
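As a final check (a sketch assuming the nn.Sequential version of net), we can compare the learned parameters with the ground truth:

# compare the learned parameters with the true ones
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)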
