Linear regression
Main contents:
Basic elements of linear regression
Implementing a linear regression model from scratch
Implementing a linear regression model concisely with PyTorch
Basic elements of linear regression
Model
For simplicity, we assume that the price of a house depends on only two factors: its area (in square meters) and its age (in years). We want to explore how the price relates to these two factors. Linear regression assumes a linear relationship between each input and the output:
$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
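As a quick illustration of the linear model above, the sketch below evaluates the price of one house. The weights, the bias, and the house's area and age are made-up numbers chosen only for illustration, and `predict_price` is a hypothetical helper name.

```python
# Hypothetical parameters, chosen only for illustration.
w_area = 2.0   # price change per square meter
w_age = -3.4   # price change per year of house age
b = 4.2        # bias (intercept)

def predict_price(area, age):
    """Linear model: price = w_area * area + w_age * age + b."""
    return w_area * area + w_age * age + b

price = predict_price(area=100.0, age=10.0)
print(price)
```

Each weight tells how much the predicted price changes when its factor increases by one unit while the other factor stays fixed.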
Data set
We usually collect a series of real data, such as the actual selling prices of a number of houses together with their areas and ages. We want to find the model parameters that minimize the error between the price predicted by the model and the real price on this data. In machine learning terminology, this data set is called the training data set or training set, a single house is called a sample, its real selling price is called the label, and the two factors used to predict the label are called features. Features describe the characteristics of a sample.
Loss function
In model training, we need to measure the error between the predicted price and the true price. We usually choose a non-negative number as the error, where a smaller value means a smaller error. A common choice is the squared error. For the sample with index $i$, the error and the resulting loss over all $n$ training samples are
$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2,$$
$$L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2.$$
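To make the two loss formulas concrete, here is a small sketch that computes the per-sample squared errors and their mean in plain Python. The predictions and labels are invented numbers, not data from this article.

```python
# Invented predictions and labels for three samples.
y_hat = [2.5, 0.0, 2.0]
y = [3.0, -0.5, 2.0]

# Per-sample loss: l_i = 0.5 * (y_hat_i - y_i) ** 2
per_sample = [0.5 * (p - t) ** 2 for p, t in zip(y_hat, y)]

# Training loss: the mean of the per-sample losses.
mean_loss = sum(per_sample) / len(per_sample)

print(per_sample)
print(mean_loss)
```

The factor of one half does not change where the minimum lies; it only cancels the 2 that appears when the square is differentiated.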
Optimization function: stochastic gradient descent
When the model and the loss function have a relatively simple form, the solution to the error minimization problem above can be expressed directly as a formula. Such a solution is called an analytical solution. Linear regression with the squared error used in this section falls into this category. However, most deep learning models have no analytical solution, and the value of the loss function can only be reduced as far as possible by updating the model parameters over a finite number of iterations of an optimization algorithm. Such a solution is called a numerical solution.
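For linear regression with squared error, the analytical solution can be obtained with a standard least-squares solver. The sketch below uses NumPy's `np.linalg.lstsq` on a small noise-free data set. The data, the random seed, and the trick of appending a column of ones so that the bias is estimated as an extra weight are all choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data generated from known parameters.
true_w = np.array([2.0, -3.4])
true_b = 4.2
X = rng.normal(size=(50, 2))
y = X @ true_w + true_b

# Append a column of ones so the bias is estimated as a third weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution of X_aug @ theta = y.
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = theta[:2], theta[2]
print(w_hat, b_hat)
```

Because the data contain no noise, the recovered parameters match the true ones up to floating-point error.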
Among the optimization algorithms for numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is simple: first choose initial values for the model parameters, for example at random; then update the parameters over many iterations so that each iteration may reduce the value of the loss function. In each iteration, first sample uniformly at random a mini-batch $\mathcal{B}$ consisting of a fixed number of training samples, then compute the derivative (gradient) of the average loss over the samples in the mini-batch with respect to the model parameters, and finally multiply this result by a preset positive number and use the product as the amount by which the model parameters are decreased in this iteration.
$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$$
Learning rate: $\eta$ is the size of the step taken in each optimization update.
Batch size: $|\mathcal{B}|$ is the number of samples in each mini-batch.
In summary, optimization consists of the following two steps:
(i) initialize the model parameters, usually at random;
(ii) iterate over the data several times, updating each parameter by moving it in the direction of the negative gradient.
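The two steps can be sketched in plain Python for a model with a single feature, y = w * x + b. The data, learning rate, batch size, and number of epochs below are arbitrary choices for illustration, and the gradients are written out by hand rather than computed automatically.

```python
import random

random.seed(0)

# Synthetic data from y = 2x + 1 (no noise), invented for this sketch.
xs = [i / 10 for i in range(-20, 21)]
ys = [2.0 * x + 1.0 for x in xs]

# Step (i): initialize the parameters.
w, b = 0.0, 0.0
lr, batch_size = 0.1, 8

# Step (ii): iterate over the data in shuffled mini-batches.
for epoch in range(200):
    indices = list(range(len(xs)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # Average gradient of 0.5 * (w*x + b - y)**2 over the mini-batch.
        grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
        grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
        # Move against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)
```

With noise-free data the parameters approach w = 2 and b = 1. With noisy data they would instead fluctuate around the best fit, with the size of the fluctuation depending on the learning rate.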
Vector calculation
When training a model or making predictions, we often process multiple data samples at once, which calls for vector calculation. Before introducing the vectorized expression of linear regression, let us consider two ways of adding two vectors.
One way is to add the two vectors element by element, using scalar addition.
The other way is to add the two vectors directly with a single vector addition.
import torch
import time

# initialize a and b as 1000-dimensional vectors
n = 1000
a = torch.ones(n)
b = torch.ones(n)
print(a, b)
# define a timer class to record time
class Timer(object):
    """Record multiple running times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()

    def stop(self):
        # stop the timer and record the elapsed time in a list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]

    def avg(self):
        # return the average of the recorded times
        return sum(self.times) / len(self.times)

    def sum(self):
        # return the sum of the recorded times
        return sum(self.times)
Now we can run a test. First, add the two vectors one element at a time in a for loop, using scalar addition.
timer = Timer()      # instantiate the timer
c = torch.zeros(n)   # initialize an n-dimensional result vector c
for i in range(n):
    c[i] = a[i] + b[i]
'%.5f sec' % timer.stop()  # elapsed time
'0.02496 sec'
Next, use torch to add the two vectors directly:
timer.start()
d = a + b  # vector addition done directly by torch
'%.5f sec' % timer.stop()
'0.00100 sec'
The results show that the second approach is much faster than the first. We should therefore use vector calculation wherever possible to improve computational efficiency.
Implementing a linear regression model from scratch
# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random
print(torch.__version__)
1.3.1
Generating the data set
We use a linear model to generate a data set of 1000 samples. The following linear relationship is used to generate the data:
$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
# set the number of input features
num_inputs = 2
# set the number of examples
num_examples = 1000
# set the true weights and bias used to generate the corresponding labels
true_w = [2, -3.4]
true_b = 4.2
features = torch.randn(num_examples, num_inputs,
                       dtype=torch.float32)  # a 1000 x 2 matrix of features
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b  # exactly linear
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)  # add normally distributed noise
Visualizing the generated data
def use_svg_display():
    # display figures as vector graphics (SVG)
    display.set_matplotlib_formats('svg')

def set_figsize(figsize=(3.5, 2.5)):
    use_svg_display()
    # set the size of the figure
    plt.rcParams['figure.figsize'] = figsize

set_figsize()
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);
[Figure: scatter plot of the second feature against the labels]
Reading the data set
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # read the samples in random order
    for i in range(0, num_examples, batch_size):
        # the last batch may contain fewer than batch_size samples
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)
batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
tensor([[-2.0243, -2.0945],
        [-0.8934, -0.8337],
        [-1.0098,  0.3432],
        [-1.1994, -1.1753],
        [-1.1607,  2.5120],
        [-1.3316,  1.4151],
        [-1.2848, -0.2235],
        [ 0.8184,  0.3788],
        [ 1.9521,  0.3147],
        [ 0.2946, -0.7865]])
 tensor([ 7.2761,  5.2392,  1.0158,  5.8023, -6.6642, -3.2760,  2.4018,  4.5289,
         7.0451,  7.4536])
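When the number of examples is not a multiple of the batch size, the last mini-batch is smaller than the others. The sketch below shows this on a toy data set of 7 examples. The data-reading function is repeated so that the sketch runs on its own, and `toy_features` and `toy_labels` are hypothetical names introduced here.

```python
import random
import torch

def data_iter(batch_size, features, labels):
    # Same logic as the reading function above, repeated for self-containment.
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)

# A toy data set with 7 examples; 7 is not a multiple of the batch size 3.
toy_features = torch.arange(14, dtype=torch.float32).view(7, 2)
toy_labels = torch.arange(7, dtype=torch.float32)

batch_sizes = [len(y) for X, y in data_iter(3, toy_features, toy_labels)]
print(batch_sizes)  # [3, 3, 1]
```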
Initializing the model parameters
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
w.requires_grad_(requires_grad=True)  # track gradients for w
b.requires_grad_(requires_grad=True)  # track gradients for b
tensor([0.], requires_grad=True)
Defining the model
Define the model whose parameters will be trained:
$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
def linreg(X, w, b):
    return torch.mm(X, w) + b
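To see what shapes the model function above works with, the following sketch applies the same expression, `torch.mm(X, w) + b`, to a hand-written batch of three samples. The numbers are invented for illustration.

```python
import torch

X = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])      # 3 samples, 2 features
w = torch.tensor([[2.0], [-1.0]])   # one weight per feature, shape (2, 1)
b = torch.tensor([0.5])             # shape (1,), broadcast over the batch

y_hat = torch.mm(X, w) + b
print(y_hat.shape)  # torch.Size([3, 1])
print(y_hat)
```

The matrix product yields one prediction per row of `X`, and the single bias value is added to every row by broadcasting.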
Defining the loss function
We use the squared error as the loss function:
$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
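The reshaping of the labels in the loss above matters because the predictions have shape (n, 1) while the labels have shape (n,). The sketch below, with invented values, shows what broadcasting would do if the reshape were left out.

```python
import torch

y_hat = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model output
y = torch.tensor([1.0, 2.0, 3.0])            # shape (3,), like the labels

# Without reshaping, broadcasting silently produces a 3 x 3 matrix.
wrong = (y_hat - y) ** 2 / 2
# Reshaping the labels first gives one loss value per sample.
right = (y_hat - y.view(y_hat.size())) ** 2 / 2

print(wrong.shape)  # torch.Size([3, 3])
print(right.shape)  # torch.Size([3, 1])
```

No error is raised in the first case, so a mistake of this kind would only show up as a loss value that is too large.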
Defining the optimization function
Here the optimization function is mini-batch stochastic gradient descent:
$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$$
def sgd(params, lr, batch_size):
    for param in params:
        # use .data to update param without tracking the operation in autograd
        param.data -= lr * param.grad / batch_size
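An alternative way to keep the parameter update out of the autograd graph is to wrap it in `torch.no_grad()`. The sketch below is one possible variant, not the method used in this article; `sgd_no_grad` is a hypothetical name, and the tiny check at the end uses invented values.

```python
import torch

def sgd_no_grad(params, lr, batch_size):
    """Mini-batch SGD step, written with torch.no_grad() instead of .data."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size

# Tiny check: the gradient of sum(w ** 2) is 2 * w.
w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()                 # w.grad is now [2.0, -4.0]
sgd_no_grad([w], lr=0.1, batch_size=2)
print(w)
```

The update subtracts `lr * grad / batch_size` from each entry, so the two values move from 1.0 and -2.0 to roughly 0.9 and -1.8.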
Training
Once the data set, model, loss function, and optimization function are defined, we are ready to train the model.
# initialize hyperparameters
lr = 0.03
num_epochs = 5
net = linreg
loss = squared_loss
# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, every sample in the data set is used once
    # X holds the features and y the labels of one mini-batch
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()
        # compute the gradient of the mini-batch loss
        l.backward()
        # update the model parameters with mini-batch stochastic gradient descent
        sgd([w, b], lr, batch_size)
        # reset the parameter gradients to zero
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
epoch 1, loss 0.055429
epoch 2, loss 0.000273
epoch 3, loss 0.000053
epoch 4, loss 0.000052
epoch 5, loss 0.000052
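The training loop resets the gradients after every update because PyTorch adds each newly computed gradient to the value already stored in `.grad`. The following sketch demonstrates this accumulation on a single parameter with invented values.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()         # the gradient of 3 * w is 3

(3 * w).sum().backward()       # no zeroing in between
accumulated = w.grad.clone()   # 3 + 3 = 6

w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.clone()   # back to 3

print(first, accumulated, after_reset)
```

Without the reset, each update would use the sum of the gradients from all previous mini-batches rather than the gradient of the current one.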
w, true_w, b, true_b
(tensor([[ 2.0005],
         [-3.3996]], requires_grad=True),
 [2, -3.4],
 tensor([4.2004], requires_grad=True),
 4.2)
Implementing a linear regression model concisely with PyTorch
import torch
from torch import nn
import numpy as np
torch.manual_seed(1)
print(torch.__version__)
torch.set_default_tensor_type('torch.FloatTensor')
1.3.1
Generating the data set
The data set is generated in exactly the same way as in the from-scratch implementation.
num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
Reading the data set
import torch.utils.data as Data

batch_size = 10
# combine the features and labels of the data set
dataset = Data.TensorDataset(features, labels)
# put the data set into a DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,        # torch TensorDataset format
    batch_size=batch_size,  # mini-batch size
    shuffle=True,           # whether to shuffle the data
    num_workers=2,          # number of worker subprocesses used to load data
)
for X, y in data_iter:
    print(X, '\n', y)
    break
tensor([[ 0.7056, 0.7436],
[ 1.4163, -1.6029],
[-1.7271, 0.4230],
[ 0.8822, 0.8994],
[ 0.3909, 0.9114],
[-1.1081, -2.0318],
[ 1.2801, -1.0039],
[ 0.2195, -0.3447],
[ 0.6381, 1.0030],
[-0.7795, 1.8384]])
tensor([ 3.0932, 12.4683, -0.6865, 2.9290, 1.8722, 8.8914, 10.1730, 5.8373,
2.0714, -3.6013])
Defining the model
class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()  # call the parent class constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: torch.nn.Linear(in_features, out_features, bias=True)

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
print(net)
LinearNet(
  (linear): Linear(in_features=2, out_features=1, bias=True)
)
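The linear layer used above performs the same computation as the from-scratch model, except that `nn.Linear` stores its weight with shape (out_features, in_features). The sketch below checks this numerically on random inputs. The seed and input size are arbitrary, and the tolerance passed to `torch.allclose` is an assumption chosen to absorb floating-point rounding.

```python
import torch
from torch import nn

torch.manual_seed(0)
linear = nn.Linear(2, 1)      # weight has shape (1, 2), bias has shape (1,)
X = torch.randn(4, 2)

# The layer output should equal X times the transposed weight, plus the bias.
manual = torch.mm(X, linear.weight.t()) + linear.bias
print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(linear(X), manual, atol=1e-6))
```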
# ways to initialize a multilayer network
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
)
# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module …
# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1))
    # …
]))
print(net)
print(net[0])
Sequential(
  (linear): Linear(in_features=2, out_features=1, bias=True)
)
Linear(in_features=2, out_features=1, bias=True)
Initializing the model parameters
from torch.nn import init
init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or modify it directly with net[0].bias.data.fill_(0)
Parameter containing:
tensor([0.], requires_grad=True)
for param in net.parameters():
    print(param)
Parameter containing:
tensor([[-0.0142, -0.0161]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)
Defining the loss function
loss = nn.MSELoss()  # built-in mean squared error loss
# function prototype: torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
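Note that the built-in loss differs slightly from the from-scratch one: with `reduction='mean'`, `nn.MSELoss` averages the squared differences without the factor of one half. The sketch below compares the two on invented predictions and labels.

```python
import torch
from torch import nn

y_hat = torch.tensor([[2.5], [0.0], [2.0]])
y = torch.tensor([[3.0], [-0.5], [2.0]])

mse = nn.MSELoss()(y_hat, y)                  # mean of (y_hat - y) ** 2
half_squared = ((y_hat - y) ** 2 / 2).mean()  # mean of the from-scratch loss

print(mse.item(), half_squared.item())
```

The built-in value is twice the from-scratch value, which is one reason the loss numbers printed by the two implementations are not directly comparable.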
Defining the optimization function
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)  # built-in stochastic gradient descent
print(optimizer)  # function prototype: torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.03
    momentum: 0
    nesterov: False
    weight_decay: 0
)
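The parameter group shown in the printout can also be modified after the optimizer has been created, for example to lower the learning rate during training. The sketch below is a minimal illustration that builds its own small network so it runs on its own; the factor 0.1 is an arbitrary choice.

```python
import torch
from torch import nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(2, 1))
optimizer = optim.SGD(net.parameters(), lr=0.03)

# Shrink the learning rate of every parameter group to one tenth.
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.1

print(optimizer.param_groups[0]['lr'])
```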
Training
num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad()  # reset the gradients, equivalent to net.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
epoch 1, loss: 0.000366
epoch 2, loss: 0.000100
epoch 3, loss: 0.000112
# compare the results
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)
[2, -3.4] tensor([[ 1.9998, -3.3989]])
4.2 tensor([4.1998])
Comparison of the two implementations
Implementation from scratch (recommended for learning)
Gives a better understanding of the principles underlying models and neural networks
Concise implementation using PyTorch
Lets the design and implementation of a model be completed more quickly