Pytorch Tutorial【Chapter 3. Simple Neural Network】
Chapter 3. Simple Neural Network
3.1 Train Neural Network Procedure
A typical neural network training process consists of the following steps:
- Define a neural network with trainable parameters
- Iterate over a dataset of inputs
- Process each input through the network
- Compute the loss
- Backpropagate the gradients to the network's parameters
- Update the network's parameters, typically with a simple update rule: weight = weight - learning_rate * gradient
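The sketch below runs all six steps end to end on a toy problem. It is a minimal illustration, not the chapter's network. The single nn.Linear model and the synthetic data y = 2x + 1 (plus noise) are assumptions chosen so the loop runs instantly. The update is the plain rule weight = weight - learning_rate * gradient, written by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. A network with trainable parameters (here a single linear layer)
model = nn.Linear(1, 1)

# Toy data (an assumption for illustration): y = 2x + 1 plus a little noise
xs = torch.linspace(-1, 1, 64).unsqueeze(1)   # shape (64, 1)
ys = 2 * xs + 1 + 0.05 * torch.randn_like(xs)

learning_rate = 0.1
losses = []
for epoch in range(100):                # 2. iterate over the input (full passes over the toy data)
    pred = model(xs)                    # 3. process the input through the network
    loss = ((pred - ys) ** 2).mean()    # 4. compute the loss (mean squared error)
    model.zero_grad()                   # clear gradients left over from the previous pass
    loss.backward()                     # 5. backpropagate gradients to the parameters
    with torch.no_grad():               # 6. weight = weight - learning_rate * gradient
        for p in model.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
print(model.weight.item(), model.bias.item())   # should end up close to 2 and 1
```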
3.2 Build Neural Network Procedure
- Define a class that inherits from torch.nn.Module
- Build the network structure inside the class from torch.nn components and torch.nn.functional components
- Override the class's forward method. In this method, complete the network structure (e.g., activation functions and pooling layers) and return the output
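Here is a minimal sketch of these three steps. It uses a hypothetical one-layer class, TinyNet, which is not part of the chapter's example. It also shows that a layer assigned as an attribute in `__init__` is registered automatically as a parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):           # step 1: inherit from torch.nn.Module (hypothetical example class)
    def __init__(self):
        super().__init__()
        # step 2: a torch.nn layer assigned as an attribute is registered
        # automatically, so its weight and bias show up in .parameters()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):           # step 3: override forward and return the output
        # torch.nn.functional supplies parameter-free operations such as ReLU
        return F.relu(self.fc(x))


model = TinyNet()
out = model(torch.randn(5, 4))      # call the module itself; it dispatches to forward()
print(out.shape)                    # torch.Size([5, 2])
print([name for name, _ in model.named_parameters()])   # ['fc.weight', 'fc.bias']
```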
torch.nn
First, a brief overview of the torch.nn components used in this chapter.
torch.nn.Conv2d(in_channels, out_channels, kernel_size)
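Applies a 2D convolution. The input has shape $(N, C_{in}, H_{in}, W_{in})$ and the output has shape $(N, C_{out}, H_{out}, W_{out})$ (see Conv2d). The convolution changes the image size as follows (and likewise for $W_{out}$):
$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$
With the defaults (padding 0, dilation 1, stride 1), this reduces to $H_{in} - \text{kernel\_size} + 1$. For example, a 5×5 kernel turns a 32×32 image into 28×28.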
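The output-size formula from the Conv2d documentation can be checked directly against the layer. The helper `conv2d_out_size` is a hypothetical name introduced for illustration. It applies floor((H_in + 2·padding − dilation·(kernel_size − 1) − 1)/stride + 1) to one spatial dimension.

```python
import math

import torch
import torch.nn as nn


def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output height (or width) of nn.Conv2d along one spatial dimension."""
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)


conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
x = torch.randn(1, 1, 32, 32)         # (N, C_in, H_in, W_in)
y = conv(x)
print(y.shape)                        # torch.Size([1, 6, 28, 28])
print(conv2d_out_size(32, 5))         # 28

# A strided, padded convolution for comparison
conv_s = nn.Conv2d(1, 6, kernel_size=3, stride=2, padding=1)
print(conv_s(x).shape, conv2d_out_size(32, 3, stride=2, padding=1))
```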
torch.nn.Linear(in_features, out_features)
Applies an affine transformation $y = xA^T + b$ (see Linear)
torch.nn.Flatten(start_dim=1, end_dim=-1)
Flattens a contiguous range of dimensions into a single dimension. With the defaults, every dimension except the batch dimension is flattened (see Flatten for details)
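The sketch below confirms what these two layers compute. nn.Linear stores $A$ as `weight` with shape (out_features, in_features), so its output should match `x @ weight.T + bias`. nn.Flatten() with its defaults keeps the batch dimension. The sizes (400, 120, 16×5×5) are borrowed from the example network below, and the random inputs are only placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(in_features=400, out_features=120)
x = torch.randn(3, 400)
y = fc(x)
# fc.weight has shape (out_features, in_features) = (120, 400), i.e. the matrix A
y_manual = x @ fc.weight.T + fc.bias
print(y.shape, torch.allclose(y, y_manual, atol=1e-5))

flat = nn.Flatten()                   # start_dim=1, end_dim=-1
feat = torch.randn(3, 16, 5, 5)
print(flat(feat).shape)               # torch.Size([3, 400])
print(nn.Flatten(0, -1)(feat).shape)  # torch.Size([1200]): the batch dimension is flattened too
```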
torch.nn.functional
The torch.nn.functional components used in this chapter are:
- torch.nn.functional.relu(): applies the ReLU activation function to the input, computing $\text{ReLU}(x) = \max(0, x)$ for each element (see ReLU)
- torch.nn.functional.softmax(): applies the Softmax activation function to the input along the dimension dim, computing $\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$ (see Softmax for details)
- torch.nn.functional.max_pool2d(input, kernel_size, stride): applies 2D max pooling. The input has shape $(\text{minibatch}, \text{in\_channels}, iH, iW)$, and stride defaults to kernel_size (see Max_Pool2D for details)
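The three functional calls can be tried on small hand-written tensors. The values below are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0],
                  [3.0, -2.0, 1.0]])
print(F.relu(x))                     # negative entries become 0

p = F.softmax(x, dim=1)              # softmax over each row
manual = x.exp() / x.exp().sum(dim=1, keepdim=True)   # exp(x_i) / sum_j exp(x_j)
print(p.sum(dim=1))                  # each row sums to 1 (up to rounding)

img = torch.arange(16.0).reshape(1, 1, 4, 4)   # (minibatch, in_channels, iH, iW)
pooled = F.max_pool2d(img, kernel_size=2)      # stride defaults to kernel_size
print(pooled)                        # maximum of each non-overlapping 2x2 window: 5, 7, 13, 15
```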
For example, the code below builds a small convolutional network. The input is a batch of images of shape $[\text{batch}, \text{channels}, \text{height}, \text{width}]$, i.e., the batch size, number of channels, image height, and image width. Taking a batch size of 1, the input in this example is $[1, 1, 32, 32]$, and the tensor shape changes as follows:
$[1,1,32,32] \xrightarrow{\text{conv1 + relu}} [1,6,28,28] \xrightarrow{\text{max\_pool}} [1,6,14,14] \xrightarrow{\text{conv2 + relu}} [1,16,10,10] \xrightarrow{\text{max\_pool}} [1,16,5,5] \xrightarrow{\text{flatten}} [1,400] \xrightarrow{\text{fc1 + relu}} [1,120] \xrightarrow{\text{fc2 + relu}} [1,84] \xrightarrow{\text{fc3 + softmax}} [1,10]$
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input channel, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # affine function y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of size 5x5
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # flatten the features into a vector
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
Alternatively, nn.Flatten can replace the manual flattening:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input channel, 6 output channels, 5x5 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.flat = nn.Flatten(1, -1)  # flatten every dimension except the batch dimension
        # affine function y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # flatten the features into a vector
        x = self.flat(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x


net = Net()
print(net)
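Printing the network shows its structure. The output below is from the first version; the second version additionally lists (flat): Flatten(start_dim=1, end_dim=-1).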
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
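To see the shape changes listed earlier in action, the sketch below rebuilds the same layers as an nn.Sequential. This is an assumed equivalent of Net: it uses the module forms nn.ReLU, nn.MaxPool2d and nn.Softmax instead of the functional calls. It then prints the shape after each stage.

```python
import torch
import torch.nn as nn

# Same layers as Net, written as nn.Sequential so every stage can be inspected
layers = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10), nn.Softmax(dim=1),
)

x = torch.randn(1, 1, 32, 32)
shapes = []
for layer in layers:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:<10} -> {list(x.shape)}")
```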
The trainable parameters of a model are returned by net.parameters():
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
The results are as follows
10
torch.Size([6, 1, 5, 5])
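named_parameters() pairs each parameter tensor with its name, and summing numel() gives the total number of trainable values. The sketch below uses an nn.Sequential with the same layers as Net, an assumption made so the snippet is self-contained. As a result, the names read 0.weight, 3.weight, ... rather than conv1.weight, .... The counts match Net because ReLU, pooling, flattening and softmax have no parameters.

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10), nn.Softmax(dim=1),
)

for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.numel())

num_tensors = len(list(net.parameters()))
total = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(num_tensors, total)   # 10 61706
```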
We can then feed some custom input through the network to obtain its output:
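input = torch.randn(3, 1, 32, 32)  # a batch of 3 random single-channel 32x32 images
out = net(input)
print(out.detach())  # detach() gives the values without the autograd history
# each row is a softmax distribution, so its entries should sum to about 1
total = 0
for i in out.detach()[0]:
    total += i.item()
print(total)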
The output is as follows
tensor([[0.0996, 0.1000, 0.0907, 0.0946, 0.0930, 0.1067, 0.1044, 0.1119, 0.1073,
0.0918],
[0.0975, 0.1005, 0.0913, 0.0935, 0.0939, 0.1070, 0.1045, 0.1121, 0.1065,
0.0933],
[0.0968, 0.1009, 0.0896, 0.0978, 0.0903, 0.1107, 0.1055, 0.1130, 0.1043,
0.0913]])
0.9999999925494194
torch.nn.Module.zero_grad()
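Resets the gradients of all parameters. In PyTorch 2.0 and later, they are set to None by default rather than filled with zeros. Below, we clear the gradient buffers and then backpropagate a random gradient through the network:
net.zero_grad()  # reset the gradient buffers of all parameters
out.backward(torch.randn_like(out))  # out is not a scalar, so backward() needs a gradient tensor (random here)
print(net.conv1.bias.grad)  # inspect the gradients of conv1's bias ...
print(net.conv1.weight.grad)  # ... and of its weight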
The results are as follows
tensor([-0.0064, 0.0136, -0.0046, 0.0008, -0.0044, 0.0021])
...
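Because out has shape (3, 10) rather than being a scalar, backward() must be given a gradient tensor. out.backward(v) computes the same gradients as calling backward() on the scalar (out * v).sum(). The sketch below checks this on a single nn.Linear layer; the layer and random tensors are illustrative assumptions, not the network above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(4, 3)
x = torch.randn(2, 4)
v = torch.randn(2, 3)            # plays the role of torch.randn_like(out)

# out is not a scalar, so backward() receives the gradient tensor v explicitly
out = fc(x)
out.backward(v)
grad_a = fc.bias.grad.clone()

# The same gradient comes from backpropagating the scalar (out * v).sum()
fc.zero_grad()
(fc(x) * v).sum().backward()
grad_b = fc.bias.grad.clone()

print(torch.allclose(grad_a, grad_b))          # True
print(torch.allclose(grad_a, v.sum(dim=0)))    # True: the bias gradient is v summed over the batch
```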
3.3 Use Loss Function for Backpropagation
So far we have:
- Defined a neural network
- Processed inputs and called backpropagation
What remains:
- Computing the loss
- Updating the network weights
Loss function
A loss function takes a pair of inputs, the model output and the target, and computes a value that measures how far the output is from the target.
The nn package provides several loss functions (see loss-functions for details). A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.
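With its default reduction='mean', nn.MSELoss is simply the average of the squared differences. Here is a quick check on random tensors; they are placeholders shaped like the network's (3, 10) output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
predict = torch.randn(3, 10)
target = torch.randn(3, 10)

criterion = nn.MSELoss()                        # reduction='mean' by default
loss = criterion(predict, target)
manual = ((predict - target) ** 2).mean()       # average of the squared errors
print(loss.item(), manual.item())

loss_sum = nn.MSELoss(reduction="sum")(predict, target)   # sum instead of mean
```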
input = torch.randn(3, 1, 32, 32)
target = torch.randn(3, 10)
predict = net(input)
criterion = nn.MSELoss()
loss = criterion(predict, target)
print(loss)
print(loss.grad_fn.next_functions[0][0])
Having computed the MSE loss, we can follow loss.grad_fn (and its next_functions) backward through the computational graph, which looks like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear -> softmax
      -> MSELoss
      -> loss
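The graph can also be walked programmatically. The sketch below uses a hypothetical helper, graph_nodes, and a small Linear + softmax model instead of the full network. It follows next_functions from loss.grad_fn and collects the class name of every autograd node it reaches. In recent PyTorch versions, node names such as MseLossBackward0 carry a numeric suffix, so the checks use startswith.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
fc = nn.Linear(4, 3)
x = torch.randn(2, 4)
target = torch.randn(2, 3)
loss = nn.MSELoss()(F.softmax(fc(x), dim=1), target)


def graph_nodes(fn):
    """Depth-first walk over the autograd graph, returning node class names."""
    names, stack = [], [fn]
    while stack:
        node = stack.pop()
        if node is None:          # inputs that do not require grad have no node
            continue
        names.append(type(node).__name__)
        for next_fn, _ in node.next_functions:
            stack.append(next_fn)
    return names


names = graph_nodes(loss.grad_fn)
print(names)   # e.g. MseLossBackward0, SoftmaxBackward0, AddmmBackward0, ...
```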
When we call loss.backward(), the whole computational graph is differentiated with respect to the loss, and each parameter in the graph accumulates its gradient into its .grad attribute. So to backpropagate the loss, all we need to do is call loss.backward(). Before that, clear the existing gradients; otherwise the newly computed gradients are added to the ones already stored from earlier passes.
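Gradient accumulation is easy to observe on a small layer. Without zero_grad(), a second backward pass over the same loss doubles the stored gradient. The layer and data below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(3, 1)
x, y = torch.randn(4, 3), torch.randn(4, 1)
criterion = nn.MSELoss()

criterion(fc(x), y).backward()
first = fc.bias.grad.clone()           # gradient from one backward pass

criterion(fc(x), y).backward()         # no zero_grad(): the new gradient is added on top
doubled = fc.bias.grad.clone()
print(torch.allclose(doubled, 2 * first))    # True

fc.zero_grad()                         # clear the stored gradients first
criterion(fc(x), y).backward()
print(torch.allclose(fc.bias.grad, first))   # True
```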
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
The results are as follows
conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([-3.4418e-04, -1.0766e-04, 1.0913e-04, -5.5018e-05, 1.7342e-04,
-5.3316e-04])
3.4 Update Parameters of NN
3.4.1 Update Manually
We can implement stochastic gradient descent by hand to update the parameters (see Chapter 6. Stochastic Approximation for details):
$w_{k+1} = w_k - a_k \nabla_{w_k} f(w_k, x_k)$
where $a_k$ is the step size (learning rate) at step $k$.
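learning_rate = 0.01
with torch.no_grad():  # the update itself must not be recorded by autograd
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)  # sub_() is in-place subtraction: f = f - learning_rate * grad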
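Does the hand-written rule match plain torch.optim.SGD (without momentum or weight decay)? The sketch below applies one manual step to one model and one optimizer step to an identical copy, then compares the weights. The small nn.Linear model and random data are assumptions for illustration.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_a = nn.Linear(4, 2)
net_b = copy.deepcopy(net_a)            # identical starting weights
x, target = torch.randn(8, 4), torch.randn(8, 2)
criterion = nn.MSELoss()
learning_rate = 0.01

# Manual update: w <- w - learning_rate * grad
criterion(net_a(x), target).backward()
with torch.no_grad():
    for f in net_a.parameters():
        f.sub_(f.grad * learning_rate)

# The same step through torch.optim.SGD
optimizer = optim.SGD(net_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(net_b(x), target).backward()
optimizer.step()

same = all(torch.allclose(p, q) for p, q in zip(net_a.parameters(), net_b.parameters()))
print(same)   # True
```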
3.4.2 Update Automatically
However, when training neural networks you will often want other update rules, such as SGD, Nesterov-SGD, Adam, or RMSProp. For this, PyTorch provides a small package, torch.optim, that implements all of these methods. Calling optimizer.step() replaces the manually implemented update above, and it is very simple to use:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()  # compute the gradients
optimizer.step() # Does the update
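In a real training loop, these calls are repeated every iteration, and switching the update rule changes only how the optimizer is constructed. The sketch below makes several assumptions: a single nn.Linear is fitted to y = 2x + 1, train is a hypothetical helper, and the learning rates were picked for this toy problem rather than tuned.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train(make_optimizer, steps=200):
    """Fit y = 2x + 1 with one linear layer; return (first_loss, last_loss)."""
    torch.manual_seed(0)                 # same initial weights for every optimizer
    model = nn.Linear(1, 1)
    xs = torch.linspace(-1, 1, 64).unsqueeze(1)
    ys = 2 * xs + 1
    criterion = nn.MSELoss()
    optimizer = make_optimizer(model.parameters())
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = criterion(model(xs), ys)
        loss.backward()                  # compute the gradients
        optimizer.step()                 # apply this optimizer's update rule
        losses.append(loss.item())
    return losses[0], losses[-1]


# Only the optimizer construction differs between update rules
results = {}
for name, make in {
    "SGD": lambda p: optim.SGD(p, lr=0.1),
    "Nesterov-SGD": lambda p: optim.SGD(p, lr=0.1, momentum=0.9, nesterov=True),
    "Adam": lambda p: optim.Adam(p, lr=0.05),
    "RMSprop": lambda p: optim.RMSprop(p, lr=0.02),
}.items():
    results[name] = train(make)
    first, last = results[name]
    print(f"{name:<13} first={first:.4f} last={last:.6f}")
```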