Task 1.1: Linear regression

Linear regression
The main contents include:

the basic elements of linear regression
implementing a linear regression model from scratch
a concise implementation of linear regression with PyTorch

The basic elements of linear regression

Model

For simplicity, we assume that the price of a house depends only on two factors of its condition: the area (in square meters) and the building age (in years). We want to explore the specific relationship between these two factors and the price. Linear regression assumes a linear relationship between the inputs and the output:

price = w_area · area + w_age · age + b
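For concreteness, here is a tiny sketch of what this model computes for a single house; the parameter values w_area = 2.0, w_age = -0.5, b = 1.0 are made up purely for illustration, not learned from data.

# hypothetical parameters, chosen only to illustrate the formula
w_area, w_age, b = 2.0, -0.5, 1.0
area, age = 120.0, 10.0          # one house: 120 square meters, 10 years old
price = w_area * area + w_age * age + b
print(price)                     # 2.0*120 - 0.5*10 + 1.0 = 236.0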
Dataset

We usually collect a series of real data, for example the actual selling prices of multiple houses together with their areas and ages. We want to find model parameters that minimize the error between the prices predicted by the model and the real prices on these data. In machine learning terminology, this dataset is called the training data set (or training set), each house is called a sample, its real selling price is called the label, and the two factors used to predict the label are called features. Features are used to characterize the sample.
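As a minimal sketch of this terminology (the numbers are invented for illustration), a training set of three houses can be stored as a feature matrix, where each row is one sample and each column one feature, together with a label vector of the real selling prices:

import torch

# features: each row is one sample [area (m^2), age (years)]
features = torch.tensor([[120.0, 10.0],
                         [ 80.0, 25.0],
                         [ 95.0,  3.0]])
# labels: the real selling price of each house
labels = torch.tensor([236.0, 150.0, 210.0])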

Loss function

In model training, we need to measure the error between the predicted price and the true value. Normally we choose a non-negative number as the error, where a smaller value means a smaller error. A common choice is the squared function. The expression for evaluating the error of sample i, and the training loss averaged over all n samples, are:

l^(i)(w, b) = 1/2 (ŷ^(i) − y^(i))²

L(w, b) = (1/n) ∑_{i=1}^{n} l^(i)(w, b) = (1/n) ∑_{i=1}^{n} 1/2 (w⊤x^(i) + b − y^(i))²
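As a quick numeric check of the per-sample formula (the numbers are invented): if the predicted price of house i is ŷ^(i) = 105 and its real price is y^(i) = 100, then l^(i)(w, b) = 1/2 (105 − 100)² = 12.5. The training loss L(w, b) is simply the average of these per-sample errors over all n samples.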
Optimization function: stochastic gradient descent

When the model and the loss function are in a relatively simple form, the solution of the error-minimization problem above can be expressed directly by a formula. Such a solution is called an analytical solution. The linear regression with squared error used in this section falls exactly into this category. However, most deep learning models have no analytical solution; the value of the loss function can only be reduced as far as possible by updating the model parameters over a finite number of iterations of an optimization algorithm. Such a solution is called a numerical solution.
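For the squared loss used here, that analytical solution is the classic least-squares formula. Below is a minimal sketch of it in PyTorch, shown only to contrast with the iterative approach that follows; the synthetic data mirrors the dataset generated later in this section (true w = [2, -3.4], b = 4.2), and this code is not part of the original tutorial.

import torch

# tiny synthetic dataset, similar to the one generated later in this section
features = torch.randn(1000, 2)
labels = 2 * features[:, 0] - 3.4 * features[:, 1] + 4.2 + 0.01 * torch.randn(1000)

# absorb the bias by appending a column of ones to the design matrix
X = torch.cat([features, torch.ones(1000, 1)], dim=1)
# normal equation: theta = (X^T X)^{-1} X^T y  ->  [w_area, w_age, b]
theta = torch.inverse(X.t() @ X) @ (X.t() @ labels.reshape(-1, 1))
print(theta)  # should be close to [2, -3.4, 4.2]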

Among the optimization algorithms that compute numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is very simple: first select a set of initial values for the model parameters, for example at random; then iterate the parameters several times, so that each iteration may reduce the value of the loss function. In each iteration, first uniformly sample at random a fixed number of training samples to form a mini-batch B, then compute the derivative (gradient) of the average loss of this mini-batch with respect to the model parameters, and finally subtract the product of this gradient and a preset positive number (the learning rate) from the current model parameters.

(w, b) ← (w, b) − (η/|B|) ∑_{i∈B} ∂_(w,b) l^(i)(w, b)
learning rate: η is the size of the step taken in each optimization step
batch size: |B| is the number of samples in each mini-batch, i.e. the batch size

In summary, the optimization function has the following two steps:

(i) initialize the model parameters, generally with random initialization;
(ii) iterate over the data several times, updating each parameter by moving it in the negative direction of its gradient (see the sketch below).
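A compact sketch of these two steps on one randomly drawn mini-batch is shown here; the mini-batch data are random placeholders, and the full from-scratch implementation later in this section repeats the same update over many batches and epochs.

import torch

lr, batch_size = 0.03, 10

# step (i): randomly initialize the model parameters
w = (0.01 * torch.randn(2, 1)).requires_grad_()
b = torch.zeros(1, requires_grad=True)

# step (ii): one update on a randomly sampled mini-batch (X, y) -- placeholder data here
X, y = torch.randn(batch_size, 2), torch.randn(batch_size)
loss = (((X @ w + b).squeeze() - y) ** 2 / 2).sum()
loss.backward()
with torch.no_grad():                 # move each parameter in the negative gradient direction
    w -= lr * w.grad / batch_size
    b -= lr * b.grad / batch_size
    w.grad.zero_()
    b.grad.zero_()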
Vector calculation

When training the model or making predictions, we often handle multiple data samples at once, which calls for vectorized calculation. Before introducing the vectorized expression of linear regression, let us consider two methods for adding two vectors.

One method is to add the two vectors element by element with scalar additions.
The other method is to add the two vectors directly with a single vector addition.
import torch
import time

init variables a and b as 1000-dimensional vectors

n = 1000
a = torch.ones(n)
b = torch.ones(n)

print(a, b)


define a timer class to record time

class Timer(object):
    """Record multiple running times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()

    def stop(self):
        # stop the timer and record the elapsed time into a list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]

    def avg(self):
        # calculate the average recorded time and return it
        return sum(self.times) / len(self.times)

    def sum(self):
        # return the sum of recorded times
        return sum(self.times)

Now we can test them. First, add the two vectors element by element with scalar additions in a loop.

timer = Timer()  # instantiate the Timer
c = torch.zeros(n)  # initialize an n-dimensional vector c
for i in range(n):
    c[i] = a[i] + b[i]
'%.5f sec' % timer.stop()  # print the elapsed time
'0.02496 sec'
Then use torch to add the two vectors directly with a single vector addition:

timer.start()
d = a + b  # directly use torch vector addition
'%.5f sec' % timer.stop()
'0.00100 sec'
The results show that the latter operation is clearly faster than the former. Therefore, we should use vectorized computation as far as possible to improve computational efficiency.

Implementing the linear regression model from scratch


import packages and modules

%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

print(torch.__version__)
1.3.1
Generating the dataset

We use a linear model to generate a dataset of 1000 samples, with the following linear relationship:

price = w_area · area + w_age · age + b

set input feature number

num_inputs = 2

set example number

num_examples = 1000

set true weight and bias in order to generate the corresponding labels

true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs,
                       dtype=torch.float32)  # 1000 x 2 feature matrix
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b  # labels with a strict linear relationship
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)  # add normally distributed random noise
use an image to show the generated data

def use_svg_display():
    # display in vector graphics
    display.set_matplotlib_formats('svg')

def set_figsize(figsize=(3.5, 2.5)):
    use_svg_display()
    # set the size of the figure
    plt.rcParams['figure.figsize'] = figsize

set_figsize()
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1)  # scatter plot of the second feature vs. the labels

Reading the data set

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # read the samples in random order
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])  # the last batch may not be full
        yield features.index_select(0, j), labels.index_select(0, j)
batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
tensor([[-2.0243, -2.0945],
        [-0.8934, -0.8337],
        [-1.0098,  0.3432],
        [-1.1994, -1.1753],
        [-1.1607,  2.5120],
        [-1.3316,  1.4151],
        [-1.2848, -0.2235],
        [ 0.8184,  0.3788],
        [ 1.9521,  0.3147],
        [ 0.2946, -0.7865]])
tensor([ 7.2761,  5.2392,  1.0158,  5.8023, -6.6642, -3.2760,  2.4018,  4.5289,
         7.0451,  7.4536])
Initializing the model parameters

w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)

w.requires_grad_(requires_grad=True)  # attach gradient tracking
b.requires_grad_(requires_grad=True)
tensor([0.], requires_grad=True)
model definition

Define the model used to train the parameters:

price = w_area · area + w_age · age + b
def linreg(X, w, b):
    return torch.mm(X, w) + b
Defining the loss function

We use the mean square error loss function:
l^(i)(w, b) = 1/2 (ŷ^(i) − y^(i))²
def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2  # reshape y to match y_hat before subtracting
Defining the optimization function

Here the optimization function is mini-batch stochastic gradient descent:

(w, b) ← (w, b) − (η/|B|) ∑_{i∈B} ∂_(w,b) l^(i)(w, b)
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size  # use .data to operate on param without gradient tracking
Training

Once the dataset, model, loss function, and optimization function are defined, we are ready to train the model.


hyperparameter init

lr = 0.03
num_epochs = 5

net = linreg
loss = squared_loss

training

for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, all the samples in the dataset will be used once

    # X is the feature and y is the label of a batch of samples
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()
        # calculate the gradient of the batch sample loss
        l.backward()
        # use mini-batch stochastic gradient descent to update the model parameters
        sgd([w, b], lr, batch_size)
        # reset the parameter gradients to zero
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))

epoch 1, loss 0.055429
epoch 2, loss 0.000273
epoch 3, loss 0.000053
epoch 4, loss 0.000052
epoch 5, loss 0.000052
w, true_w, b, true_b
(tensor([[ 2.0005],
         [-3.3996]], requires_grad=True),
 [2, -3.4],
 tensor([4.2004], requires_grad=True),
 4.2)
Concise implementation of the linear regression model with PyTorch

import torch
from torch import nn
import numpy as np
torch.manual_seed(1)

print(torch.__version__)
torch.set_default_tensor_type('torch.FloatTensor')
1.3.1
Generating the dataset

Here the dataset is generated in exactly the same way as in the implementation from scratch.

num_inputs = 2
num_examples = 1000

true_w = [2, -3.4]
true_b = 4.2

features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
Reading the dataset

import torch.utils.data as Data

batch_size = 10

combine features and labels of the dataset

dataset = Data.TensorDataset(features, labels)

put dataset into DataLoader

data_iter = Data.DataLoader(
    dataset=dataset,        # torch TensorDataset format
    batch_size=batch_size,  # mini-batch size
    shuffle=True,           # whether to shuffle the data or not
    num_workers=2,          # read data with multiple worker processes
)
for X, y in data_iter:
    print(X, '\n', y)
    break
tensor([[ 0.7056, 0.7436],
[ 1.4163, -1.6029],
[-1.7271, 0.4230],
[ 0.8822, 0.8994],
[ 0.3909, 0.9114],
[-1.1081, -2.0318],
[ 1.2801, -1.0039],
[ 0.2195, -0.3447],
[ 0.6381, 1.0030],
[-0.7795, 1.8384]])
tensor([ 3.0932, 12.4683, -0.6865, 2.9290, 1.8722, 8.8914, 10.1730, 5.8373,
2.0714, -3.6013])
model definition

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()  # call the parent constructor to initialize
        self.linear = nn.Linear(n_feature, 1)  # function prototype: torch.nn.Linear(in_features, out_features, bias=True)

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
print(net)
LinearNet(
(linear): Linear(in_features=2, out_features=1, bias=True)
)

ways to init a multilayer network

method one

net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
)

method two

net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))

net.add_module …

method three

from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1))
    # …
]))

print(net)
print(net[0])
Sequential(
  (linear): Linear(in_features=2, out_features=1, bias=True)
)
Linear(in_features=2, out_features=1, bias=True)
Initializing the model parameters

from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0) # or you can use net[0].bias.data.fill_(0) to modify it directly
Parameter containing:
tensor([0.], requires_grad=True)
for param in net.parameters():
    print(param)
Parameter containing:
tensor([[-0.0142, -0.0161]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)
Defining the loss function

loss = nn.MSELoss()  # nn built-in squared loss function
# function prototype: torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
Defining the optimization function

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03) # built-in random gradient descent function
print(optimizer) # function prototype: torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)
SGD (
Parameter Group 0
dampening: 0
lr: 0.03
momentum: 0
nesterov: False
weight_decay: 0
)
Training

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad()  # reset gradients, equivalent to net.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
epoch 1, loss: 0.000366
epoch 2, loss: 0.000100
epoch 3, loss: 0.000112

result comparison

dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)
[2, -3.4] tensor([[ 1.9998, -3.3989]])
4.2 tensor([4.1998])
Comparison of the two implementations

The implementation from scratch (recommended for learning)

gives a better understanding of the underlying principles of models and neural networks.

The concise implementation with PyTorch

allows the design and implementation of a model to be completed more quickly.

