Linear Regression
The main contents include:
- The basic elements of linear regression
- Implementing linear regression from scratch
- A concise implementation of linear regression using PyTorch
The basic elements of linear regression
Model
For simplicity, we assume that the price of a house depends on only two factors: its area (in square meters) and its age (in years). We want to explore how the price relates to these two factors. Linear regression assumes a linear relationship between each input and the output:
$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
Here $w_{\mathrm{area}}$ and $w_{\mathrm{age}}$ are the weights and $b$ is the bias.
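As a minimal sketch of the model described above, the prediction is a weighted sum of the two features plus a bias. The function name `predict_price` and the parameter values are made up for illustration and are not fitted to any data.

```python
def predict_price(area, age, w_area, w_age, b):
    """Linear model: price = w_area * area + w_age * age + b."""
    return w_area * area + w_age * age + b

# Hypothetical parameters, chosen only to illustrate the formula.
w_area, w_age, b = 2.0, -3.4, 4.2

# A 100 square-meter house that is 10 years old.
price = predict_price(area=100.0, age=10.0, w_area=w_area, w_age=w_age, b=b)
print(price)
```

Training, discussed next, is the process of finding values for the weights and bias from data instead of picking them by hand.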
Data set
We usually collect a set of real data, such as the actual selling prices of a number of houses together with their areas and ages. We want to find model parameters on this data that minimize the error between the price predicted by the model and the real price. In machine learning terminology, this data set is called the training data set or training set, a single house is called a sample, its real selling price is called the label, and the two factors used to predict the label are called features. Features describe the characteristics of a sample.
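To make the terminology concrete, here is a small sketch of how a training set splits into features and labels. The three houses and their numbers are invented for illustration only.

```python
# Hypothetical training set: each entry is one sample (one house).
training_set = [
    {"area": 80.0, "age": 5.0, "price": 300.0},
    {"area": 120.0, "age": 20.0, "price": 350.0},
    {"area": 60.0, "age": 1.0, "price": 260.0},
]

# Features: the factors used to predict the label.
features = [(sample["area"], sample["age"]) for sample in training_set]
# Labels: the real selling prices.
labels = [sample["price"] for sample in training_set]

print(features)
print(labels)
```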
Loss function
In model training, we need to measure the error between the predicted price and the true price. We usually choose a non-negative number as the error, where a smaller value means a smaller error. A common choice is the squared function. For the sample with index $i$, the error is
$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
where $\hat{y}^{(i)}$ is the predicted price and $y^{(i)}$ is the true price.
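The squared error can be sketched in plain Python as follows. The factor 1/2 matches the loss used later in this article; the predictions and true prices are made-up numbers, and `mean_squared_loss` is a helper name introduced here for illustration.

```python
def squared_loss(y_hat, y):
    """Squared error for one sample, with the conventional factor 1/2."""
    return (y_hat - y) ** 2 / 2

def mean_squared_loss(y_hats, ys):
    """Average of the per-sample losses over a data set."""
    losses = [squared_loss(y_hat, y) for y_hat, y in zip(y_hats, ys)]
    return sum(losses) / len(losses)

# Hypothetical predictions and true prices.
predictions = [3.0, 5.0, 8.0]
true_prices = [2.0, 5.0, 10.0]
print(mean_squared_loss(predictions, true_prices))
```

The loss is zero only when every prediction equals its label, and larger deviations are penalized more than proportionally.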
Optimization function - stochastic gradient descent
When the model and the loss function have a simple form, the solution to the error-minimization problem above can be expressed directly as a formula. Such a solution is called an analytical solution. The linear regression and squared error used in this section fall into this category. However, most deep learning models have no analytical solution; the value of the loss function can only be reduced as far as possible by iterating on the model parameters a finite number of times with an optimization algorithm. Such a solution is called a numerical solution.
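As an illustration of an analytical solution, the sketch below fits a line with a single feature using the standard closed-form least-squares formulas. The single-feature setting, the function name `fit_line`, and the noise-free data are simplifying assumptions for this example; the article's model has two features.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w * x + b for a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # w = covariance(x, y) / variance(x); b then follows from the means.
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    w = cov_xy / var_x
    b = y_mean - w * x_mean
    return w, b

# Noise-free data generated from y = 2 * x + 4.2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [4.2 + 2.0 * x for x in xs]
w, b = fit_line(xs, ys)
print(w, b)
```

No iteration is needed here: the parameters come out of one formula, which is what distinguishes an analytical solution from a numerical one.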
Among optimization algorithms for finding numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is simple. First, choose initial values for the model parameters, for example at random. Then iterate on the parameters many times, so that each iteration is likely to reduce the value of the loss function. In each iteration, first sample uniformly at random a mini-batch $\mathcal{B}$ consisting of a fixed number of training samples, then compute the derivative (gradient) of the average loss over the mini-batch with respect to the model parameters, and finally subtract the product of this gradient and a preset positive number from the parameters:
$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$$
where $l^{(i)}$ is the loss on sample $i$.
Learning rate: $\eta$ is the size of the step taken in each optimization step.
Batch size: $|\mathcal{B}|$ is the number of samples in each mini-batch.
In summary, optimization takes the following two steps:
- (i) Initialize the model parameters, usually at random.
- (ii) Iterate over the data several times, updating each parameter by moving it in the direction of the negative gradient.
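The two steps above can be sketched in plain Python for a model with one feature. This is a simplified illustration, not the PyTorch implementation given later: the function name `minibatch_sgd`, the learning rate, the batch size, the number of epochs, and the noise-free data are all assumptions chosen for the example.

```python
import random

def minibatch_sgd(xs, ys, lr=0.1, batch_size=4, num_epochs=200, seed=0):
    """Fit y = w * x + b with mini-batch SGD on the loss (y_hat - y)^2 / 2."""
    rng = random.Random(seed)
    # Step (i): initialize the parameters (small random w, zero b).
    w, b = rng.gauss(0.0, 0.01), 0.0
    indices = list(range(len(xs)))
    # Step (ii): iterate over the data several times.
    for _ in range(num_epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of the loss over the mini-batch.
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Move against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(-10, 11)]  # 21 inputs in [-1, 1]
ys = [2.0 * x + 4.2 for x in xs]       # noise-free labels
w, b = minibatch_sgd(xs, ys)
print(w, b)
```

On this noise-free data the learned parameters should end up close to the values 2.0 and 4.2 used to generate the labels.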
Vectorized computation
When training a model or making predictions, we often process many data samples at once, which calls for vectorized computation. Before introducing the vectorized expression of linear regression, consider two ways of adding two vectors.
- One way is to add the two vectors element by element, one scalar addition at a time.
- The other way is to add the two vectors directly as a single vector operation.
import torch
import time

# initialize a and b as 1000-dimensional vectors
n = 1000
a = torch.ones(n)
b = torch.ones(n)

# define a timer class to record running times
class Timer(object):
    """Record multiple running times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()

    def stop(self):
        # stop the timer and append the elapsed time to the list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]

    def avg(self):
        # return the average of the recorded times
        return sum(self.times)/len(self.times)

    def sum(self):
        # return the sum of the recorded times
        return sum(self.times)
Now we can test both. First, add the two vectors one element at a time with a for loop of scalar additions.
timer = Timer()
c = torch.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]
'%.5f sec' % timer.stop()
Output: '0.01232 sec'
Next, use torch to add the two vectors directly:
timer.start()
d = a + b
'%.5f sec' % timer.stop()
Output: '0.00029 sec'
Clearly, the latter is much faster than the former. We should therefore use vectorized computation wherever possible to improve efficiency.
Implementing linear regression from scratch
# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random
print(torch.__version__)
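Output: 1.3.0
Generating a data set
We use a linear model to generate a data set of 1000 samples. The labels follow the linear relationship below, with true weights 2 and -3.4 and true bias 4.2, plus Gaussian noise with standard deviation 0.01:
$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$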
# set input feature number
num_inputs = 2
# set example number
num_examples = 1000
# set the true weights and bias used to generate the labels
true_w = [2, -3.4]
true_b = 4.2
features = torch.randn(num_examples, num_inputs,
                       dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# add Gaussian noise with standard deviation 0.01
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)
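Visualize the generated data with a scatter plot of the second feature against the labels:
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);
# Print the features. Do not re-sample `features` here: `labels` was computed
# from the tensor above, and new random features would no longer match it.
print(features)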
Output:
tensor([[ 0.0908, -0.8646],
        [-1.6370,  1.6305],
        [-0.1965,  0.8613],
        ...,
        [-0.9776,  0.0575],
        [ 1.9371, -0.1497],
        [-0.1417, -1.0046]])
Reading the data set
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # shuffle so that samples are read in random order
    for i in range(0, num_examples, batch_size):
        # the last batch may be smaller than batch_size
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)

batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
Output:
tensor([[ 1.3591,  0.6950],
        [ 0.5206, -0.2726],
        [-0.6639,  0.9716],
        [ 2.7164, -0.6513],
        [-1.0642,  1.9331],
        [-2.2240, -0.3616],
        [-0.9094,  0.6691],
        [-0.2991,  0.2488],
        [ 1.8312,  0.2209],
        [ 0.2833, -1.1672]])
 tensor([6.9694, 6.0005, 9.5797, 0.6944, 4.1964, 6.8519, 2.5178, 4.4217, 5.4679,
        9.9754])
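The `min(i + batch_size, num_examples)` bound in `data_iter` matters when the number of samples is not a multiple of the batch size. The sketch below isolates that index arithmetic in plain Python; `batch_slices` is a helper name introduced here, and 25 samples is a hypothetical count chosen so that the last batch is short.

```python
def batch_slices(num_examples, batch_size):
    """Yield (start, end) index pairs that cover num_examples in order."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)

sizes = [end - start for start, end in batch_slices(25, 10)]
print(sizes)  # [10, 10, 5]
```

With the 1000 samples and batch size 10 used in this article, every batch is full, so the bound never takes effect.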
Initializing model parameters
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
Defining the model
Define the model whose parameters will be trained:
def linreg(X, w, b):
    return torch.mm(X, w) + b
Defining the loss function
We use the squared error loss, computed per sample:
def squared_loss(y_hat, y):
    # reshape y to the shape of y_hat before subtracting
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
Defining the optimization function
The optimization function is mini-batch stochastic gradient descent:
def sgd(params, lr, batch_size):
    for param in params:
        # update through .data so that the step is not tracked by autograd
        param.data -= lr * param.grad / batch_size
Training
With the data set, model, loss function, and optimization function defined, we are ready to train the model.
# initialize hyperparameters
lr = 0.03
num_epochs = 5
net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, every sample in the data set is used once
    # X holds the features and y the labels of one mini-batch
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()
        # compute the gradient of the mini-batch loss
        l.backward()
        # update the model parameters with mini-batch stochastic gradient descent
        sgd([w, b], lr, batch_size)
        # reset the parameter gradients
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
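Output:
epoch 1, loss 7.605014
epoch 2, loss 7.521966
epoch 3, loss 7.550967
epoch 4, loss 7.542496
epoch 5, loss 7.535208
The loss plateaus near 7.5 instead of approaching zero. This is consistent with `features` having been re-sampled after `labels` was computed, which leaves the two unrelated; with matching features and labels the loss should fall close to zero within a few epochs.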
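To get a sense of how much work the training loop above does, the sketch below counts the parameter updates implied by the settings used in this section (1000 samples, batch size 10, 5 epochs). It is plain arithmetic and does not run any training; the variable names `updates_per_epoch` and `total_updates` are introduced here.

```python
import math

num_examples, batch_size, num_epochs = 1000, 10, 5

# Each epoch visits every sample once, with one parameter update per mini-batch.
updates_per_epoch = math.ceil(num_examples / batch_size)
total_updates = updates_per_epoch * num_epochs
print(updates_per_epoch, total_updates)  # 100 500
```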
A concise implementation of linear regression using PyTorch
import torch
from torch import nn
import numpy as np
torch.manual_seed(1)
torch.set_default_tensor_type('torch.FloatTensor')
Generating a data set
The data set is generated exactly as in the from-scratch implementation.
num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
Reading the data set
import torch.utils.data as Data
batch_size = 10
# combine the features and labels of the data set
dataset = Data.TensorDataset(features, labels)
# put the data set into a DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,        # torch TensorDataset format
    batch_size=batch_size,  # mini-batch size
    shuffle=True,           # whether to shuffle the data
    num_workers=2,          # load data with 2 worker subprocesses
)
for X, y in data_iter:
    print(X, '\n', y)
    break
Output:
tensor([[ 0.5584, -0.4995],
        [-0.1495, -1.6520],
        [-0.3280,  0.2594],
        [-0.4857, -1.2976],
        [ 1.8603,  0.4539],
        [-0.3628,  0.0064],
        [ 1.3235, -0.3536],
        [-2.3426, -0.5968],
        [-0.6290, -0.2948],
        [-0.0787,  0.2180]])
 tensor([7.0088, 9.5071, 2.6718, 7.6535, 6.3802, 3.4601, 8.0475, 1.5223, 3.9682,
        3.2977])
Defining the model
class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()  # call the parent class constructor
        # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`
        self.linear = nn.Linear(n_feature, 1)

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)

# ways to build a multilayer network
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
)

# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......

# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1))
    # ......
]))
Initializing model parameters
from torch.nn import init
init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or modify it directly with `net[0].bias.data.fill_(0)`
Output:
Parameter containing:
tensor([0.], requires_grad=True)
for param in net.parameters():
    print(param)
Output:
Parameter containing:
tensor([[-0.0142, -0.0161]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)
Defining the loss function
loss = nn.MSELoss()  # built-in mean squared error loss from nn
# function prototype: `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`
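Note that `nn.MSELoss` with the default `reduction='mean'` averages the squared differences and has no factor 1/2, unlike the `squared_loss` function in the from-scratch version. The plain-Python sketch below mirrors both definitions on made-up numbers to show the factor of two. It does not call PyTorch, and the helper names `mse_mean` and `half_squared_mean` are introduced here for illustration.

```python
def mse_mean(y_hats, ys):
    """Mean of squared differences, mirroring reduction='mean' (no 1/2 factor)."""
    return sum((y_hat - y) ** 2 for y_hat, y in zip(y_hats, ys)) / len(ys)

def half_squared_mean(y_hats, ys):
    """Mean of (y_hat - y)^2 / 2, as in the from-scratch squared_loss."""
    return sum((y_hat - y) ** 2 / 2 for y_hat, y in zip(y_hats, ys)) / len(ys)

# Hypothetical predictions and labels.
y_hats = [3.0, 5.0, 8.0]
ys = [2.0, 5.0, 10.0]
print(mse_mean(y_hats, ys), half_squared_mean(y_hats, ys))
```

The two losses have the same minimizer, so the fitted parameters agree; only the reported loss values and the gradient scale differ by a constant factor.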
Defining the optimization function
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.03)  # built-in stochastic gradient descent
print(optimizer)  # function prototype: `torch.optim.SGD(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False)`; `lr` is required
Output:
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.03
    momentum: 0
    nesterov: False
    weight_decay: 0
)
Training
num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad()  # reset the gradients, equivalent to net.zero_grad()
        l.backward()
        optimizer.step()
    # report the loss of the last mini-batch in this epoch
    print('epoch %d, loss: %f' % (epoch, l.item()))

# compare the learned parameters with the true ones
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)
Output:
epoch 1, loss: 0.000103
epoch 2, loss: 0.000097
epoch 3, loss: 0.000079