Neural Networks
Neural networks can be constructed using the torch.nn package.
The previous section introduced autograd; nn depends on autograd to define models and differentiate them.
An nn.Module contains layers, and a method forward(input) that returns the output.
For example, the network defined below (LeNet) is a simple feed-forward network: it takes the input, feeds it through several layers one after the other, and finally gives the output.
A typical training procedure for a neural network is as follows:
- Define the neural network, which has some learnable parameters (or weights);
- Iterate over a dataset of inputs;
- Process each input through the network;
- Compute the loss (how far the output is from the correct value);
- Propagate the gradients back into the network's parameters;
- Update the weights of the network, typically using a simple update rule:
  weight = weight - learning_rate * gradient
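As a concrete illustration of the update rule, here is a single step on one scalar weight. The toy loss w**2 (whose gradient is 2*w), the starting value and the learning rate are assumptions chosen so the arithmetic can be checked by hand:

```python
import torch

# One learnable scalar weight and a toy loss L(w) = w**2, so dL/dw = 2*w.
w = torch.tensor(2.0, requires_grad=True)
loss = w ** 2
loss.backward()            # w.grad now holds 2 * 2.0 = 4.0

learning_rate = 0.1
with torch.no_grad():      # the update itself should not be recorded by autograd
    w -= learning_rate * w.grad   # 2.0 - 0.1 * 4.0 = 1.6

print(w.item())
```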
Define the network
Let's define this network:
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
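The in_features=400 of fc1 comes from tracking the spatial size of a 32x32 input through the layers. A small sketch of that arithmetic, assuming the usual output-size formula for convolution and pooling with no padding; the helper names conv2d_out and pool2d_out are hypothetical:

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    # Output length along one spatial dimension of a convolution
    return (size + 2 * padding - kernel) // stride + 1

def pool2d_out(size, kernel):
    # Max pooling defaults to stride == kernel size
    return conv2d_out(size, kernel, stride=kernel)

size = 32
size = conv2d_out(size, 5)   # conv1: 32 -> 28
size = pool2d_out(size, 2)   # pool:  28 -> 14
size = conv2d_out(size, 5)   # conv2: 14 -> 10
size = pool2d_out(size, 2)   # pool:  10 -> 5
flat = 16 * size * size      # 16 channels * 5 * 5
print(size, flat)            # 5 400
```

This is also why the input must be 32x32: any other size changes the flattened length and no longer matches fc1.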
You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you by autograd.
You can use any Tensor operation in the forward function.
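To illustrate that only forward needs to be written, here is a sketch of a hypothetical module whose forward uses arbitrary Tensor operations (multiplication, sin, sum). No backward is defined, yet autograd produces the gradient, which we compare against the analytic derivative:

```python
import torch
import torch.nn as nn

class SinModule(nn.Module):
    # Hypothetical module for illustration: y = sum(sin(w * x)).
    # Only forward is defined; autograd derives the backward pass.
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        return torch.sin(self.w * x).sum()

m = SinModule()
x = torch.tensor([0.5, 1.0])
y = m(x)
y.backward()

# Analytic gradient: d/dw sum(sin(w*x)) = sum(x * cos(w*x))
expected = (x * torch.cos(m.w.detach() * x)).sum()
print(m.w.grad, expected)
```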
The learnable parameters (weights) of a model are returned by net.parameters():
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
10
torch.Size([6, 1, 5, 5])
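To see what those 10 tensors are, the sketch below lists each parameter's name and shape and counts the learnable values. So that it runs on its own, it rebuilds only the layers of Net (without its forward) inside an nn.ModuleDict; that container choice is an assumption for illustration:

```python
import torch.nn as nn

# Same layer shapes as Net above, collected so the snippet is self-contained.
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

for name, p in layers.named_parameters():
    print(name, tuple(p.shape), p.numel())   # e.g. conv1.weight (6, 1, 5, 5) 150

total = sum(p.numel() for p in layers.parameters())
print("learnable values:", total)
```

Each layer contributes a weight and a bias, which is where the count of 10 parameter tensors comes from.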
Let's try a random 32x32 input.
NOTE: The expected input size of this network (LeNet) is 32x32. To train it on the MNIST dataset, resize the images from the dataset to 32x32.
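MNIST images are 28x28, so they need to be enlarged. A minimal sketch of two ways to do this on a batch of tensors; the random stand-in batch is an assumption (no real MNIST data is loaded). When loading with torchvision, transforms.Resize(32) is a common alternative:

```python
import torch
import torch.nn.functional as F

# Stand-in for an MNIST batch: 8 grayscale 28x28 images (random values, illustration only)
mnist_like = torch.rand(8, 1, 28, 28)

# Option 1: resample to 32x32 with bilinear interpolation
resized = F.interpolate(mnist_like, size=(32, 32), mode="bilinear", align_corners=False)

# Option 2: keep the pixels as-is and zero-pad 2 pixels on every side (2 + 28 + 2 = 32)
padded = F.pad(mnist_like, (2, 2, 2, 2))

print(resized.shape, padded.shape)
```

Padding leaves the original pixel values untouched, while interpolation slightly blurs them; either produces the 32x32 shape the network expects.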
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[ 0.1470, -0.0240, 0.0103, 0.0705, 0.0650, -0.0010, -0.0083, 0.0556,
-0.0686, -0.0675]], grad_fn=<AddmmBackward>)
Zero the gradient buffers of all parameters, then backpropagate with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
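out is a 1x10 tensor rather than a scalar, so backward() needs a gradient tensor of the same shape, and autograd computes the vector-Jacobian product with it. A sketch with a toy function y = 2x (the values are assumptions for illustration) showing both that product and the error raised when the argument is omitted:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = 2 * x                           # non-scalar output; dy/dx is 2 times the identity
v = torch.tensor([0.1, 1.0, 10.0])  # plays the role of torch.randn(1, 10) above
y.backward(v)                       # x.grad = v^T (dy/dx) = 2 * v
print(x.grad)

# Without a gradient argument, backward() on a non-scalar output raises RuntimeError.
raised = False
try:
    (2 * torch.ones(3, requires_grad=True)).backward()
except RuntimeError:
    raised = True
print(raised)
```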
Note
``torch.nn`` only supports mini-batches. The entire ``torch.nn`` package only supports inputs that are a mini-batch of samples, not a single sample. For example, ``nn.Conv2d`` takes a 4D Tensor of ``nSamples x nChannels x Height x Width``. If you have a single sample, just use ``input.unsqueeze(0)`` to add a fake batch dimension. (Newer PyTorch releases also accept unbatched 3D input for many layers, including ``nn.Conv2d``.)
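A quick sketch of adding the batch dimension to a single 1x32x32 sample before passing it to a convolution layer with the same shape as conv1 (the random sample is illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)
single = torch.randn(1, 32, 32)   # one sample: channels x height x width
batch = single.unsqueeze(0)       # add a batch dimension of size 1 -> 1 x 1 x 32 x 32
out = conv(batch)
print(batch.shape, out.shape)     # torch.Size([1, 1, 32, 32]) torch.Size([1, 6, 28, 28])
```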
Before proceeding further, let's recap all the classes we have seen so far.
Recap:
- torch.Tensor: a multi-dimensional array with support for autograd operations like backward(). It also holds the gradient with respect to the tensor.
- nn.Module: a neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
- nn.Parameter: a kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
- autograd.Function: implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node, which connects to the functions that created the Tensor and encodes its history.
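The automatic registration of nn.Parameter can be seen in a tiny hypothetical module: the attribute wrapped in nn.Parameter shows up in named_parameters(), while a plain Tensor attribute does not:

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    # Hypothetical module for illustration only
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3))   # registered as a parameter
        self.scale = torch.ones(3)             # plain Tensor attribute: not registered

m = Tiny()
names = [name for name, _ in m.named_parameters()]
print(names)   # ['w']
```

This is why an optimizer given m.parameters() would update w but never touch scale.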
At this point, we have covered:
- Defining a neural network
- Processing inputs and calling backward
Still left:
- Computing the loss
- Updating the weights of the network
Loss function
A loss function takes the (output, target) pair of inputs and computes a value that estimates how far away the output is from the target.
Translator's note: output is the output of the network, and target is the actual (correct) value.
There are several different loss functions in the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.
For example:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
tensor(0.7241, grad_fn=<MseLossBackward>)
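With its default reduction="mean", nn.MSELoss averages the squared differences over all elements. A sketch with small hand-picked tensors (values are illustrative) comparing a manual computation with the built-in loss:

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.0, 0.0, 5.0]])

# Squared differences are 0, 4 and 4, so the mean is 8 / 3
manual = ((output - target) ** 2).mean()
builtin = nn.MSELoss()(output, target)
print(manual.item(), builtin.item())
```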
Now, if you follow loss in the backward direction using its .grad_fn attribute, you will see a graph of computations that looks like this:
::

    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> view -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss
So, when we call loss.backward(), the whole graph is differentiated, and every leaf Tensor in the graph that has requires_grad=True (such as the network's parameters) has the gradient accumulated into its .grad Tensor.
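"Accumulated" means each backward pass adds to whatever is already stored in .grad rather than overwriting it. A minimal sketch with a toy scalar weight (value chosen for easy arithmetic):

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * w).backward()    # d(w^2)/dw = 2*w = 6
first = w.grad.item()

(w * w).backward()    # a second backward adds another 6 to the existing .grad
second = w.grad.item()
print(first, second)  # 6.0 12.0
```

This is the reason gradients must be cleared before every new backward pass.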
For illustration, let us follow a few steps backward:
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad for fc3.bias (first input of the Linear's addmm)
<MseLossBackward object at 0x0000001FCC3CEEB8>
<AddmmBackward object at 0x0000001FCC3CEBE0>
<AccumulateGrad object at 0x0000001FCC3CEEB8>
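Instead of following one branch by hand, the whole graph can be walked recursively through next_functions. A sketch on a small stand-in model (the nn.Sequential network, its sizes and the helper name walk are assumptions for illustration); exact node names such as MseLossBackward0 vary slightly between PyTorch versions:

```python
import torch
import torch.nn as nn

def walk(fn, depth=0):
    # Hypothetical helper: depth-first traversal of the autograd graph,
    # collecting (depth, node type name) pairs.
    if fn is None:        # inputs that do not require grad appear as None
        return []
    nodes = [(depth, type(fn).__name__)]
    for next_fn, _ in fn.next_functions:
        nodes += walk(next_fn, depth + 1)
    return nodes

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
x, target = torch.randn(2, 4), torch.randn(2, 1)
loss = nn.MSELoss()(model(x), target)

nodes = walk(loss.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)
```

The AccumulateGrad leaves of this tree are the parameters whose .grad will be filled by loss.backward().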
Back Propagation
To backpropagate the error, all we have to do is call loss.backward(). Before that, though, the existing gradients must be cleared; otherwise the new gradients are accumulated onto the existing ones.
Now we will call loss.backward() and look at the gradient of conv1's bias before and after the backward pass.
net.zero_grad()  # zero the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0024, 0.0044, -0.0027, 0.0066, -0.0034, 0.0067])
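The "before" printout above shows zeros. In recent PyTorch releases (2.0 and later), zero_grad() defaults to set_to_none=True, so the same code may print None instead. A sketch on a stand-in nn.Linear layer showing both settings explicitly:

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)

layer(torch.ones(1, 2)).sum().backward()
layer.zero_grad(set_to_none=False)   # keep the .grad tensors, fill them with zeros
zeroed = layer.bias.grad.clone()
print(zeroed)                        # tensor([0.])

layer(torch.ones(1, 2)).sum().backward()
layer.zero_grad(set_to_none=True)    # drop the .grad tensors entirely
cleared = layer.bias.grad
print(cleared)                       # None
```

Setting gradients to None saves memory and a fill operation; backward() simply creates fresh .grad tensors on the next pass.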
Now we have seen how to use loss functions.
Read later:
The nn package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is available in the official torch.nn documentation.
The only thing left to learn is:
- Updating the weights of the network
Update weights
In practice, the simplest weight update rule is stochastic gradient descent (SGD):
``weight = weight - learning_rate * gradient``
We can implement this rule with simple Python code:
learning_rate = 0.01
with torch.no_grad():  # the update itself must not be tracked by autograd
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)
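Putting the steps from the start of this tutorial together, a minimal manual training loop might look like the sketch below. The toy problem (fitting y = 2x + 1 with a single nn.Linear layer), the learning rate and the number of steps are assumptions chosen for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 20).unsqueeze(1)   # 20 inputs, shape (20, 1)
y = 2 * x + 1                                 # targets from a known line

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
learning_rate = 0.1

for step in range(500):
    model.zero_grad()                 # clear gradients from the previous step
    loss = criterion(model(x), y)     # forward pass and loss
    loss.backward()                   # backpropagate
    with torch.no_grad():             # weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad

print(model.weight.item(), model.bias.item(), loss.item())
```

The learned weight and bias should end up very close to 2 and 1.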
However, as you use neural networks, you will want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, that implements all these methods.
Using it is very simple:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
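To confirm that optim.SGD (without momentum or weight decay) applies the same rule as the hand-written loop, a sketch that performs one step both ways on two identical copies of a small stand-in model (model size, data and learning rate are illustrative assumptions):

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_manual = nn.Linear(3, 1)
model_optim = copy.deepcopy(model_manual)   # identical starting weights
x, target = torch.randn(4, 3), torch.randn(4, 1)
criterion = nn.MSELoss()
lr = 0.01

# Manual rule: weight = weight - learning_rate * gradient
criterion(model_manual(x), target).backward()
with torch.no_grad():
    for p in model_manual.parameters():
        p -= lr * p.grad

# The same step through torch.optim.SGD
optimizer = optim.SGD(model_optim.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_optim(x), target).backward()
optimizer.step()

same = [torch.allclose(a, b) for a, b in zip(model_manual.parameters(), model_optim.parameters())]
print(same)   # [True, True]
```

The benefit of torch.optim is that switching rules touches only one line, for example optim.Adam(model_optim.parameters(), lr=1e-3), while the rest of the loop stays the same.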
.. Note::
    Observe how the gradient buffers had to be manually set to zero using
    ``optimizer.zero_grad()``. This is because gradients are accumulated,
    as explained in the Back Propagation section.