Typical process for building a neural network
- Define a neural network with learnable parameters
- Iterate over the training data set
- Pass the input data through the network
- Calculate the loss value
- Backpropagate the gradients of the network parameters
- Update the weights of the network according to certain rules
We first define a neural network implemented in PyTorch:
# Import the required packages
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a simple network class
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # First convolutional layer: input channels = 1, output channels = 6, kernel size 3x3
        self.conv1 = nn.Conv2d(1, 6, 3)
        # Second convolutional layer: input channels = 6, output channels = 16, kernel size 3x3
        self.conv2 = nn.Conv2d(6, 16, 3)
        # Three fully connected layers
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Apply max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        # Compute the size of all dimensions except the 0th (the batch_size)
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
Output:
Note:
All trainable parameters in the model can be obtained through net.parameters().
params = list(net.parameters())
print(len(params))
print(params[0].size())
Output:
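To see exactly which parameters these are, here is a small sketch (not part of the original text) that iterates over net.named_parameters() and prints the name and shape of every learnable tensor:
# Illustrative: list the name and shape of each learnable parameter
for name, param in net.named_parameters():
    print(name, param.size())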
- Assume the size of the input image is 32 * 32:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
Output:
- Once you have the output tensor, you can perform gradient zeroing and backpropagation operations.
net.zero_grad()
out.backward(torch.randn(1, 10))
- Note
- The neural network built by torch.nn only supports mini-batch inputs; it does not support the input of a single sample.
- For example, nn.Conv2d requires a 4D Tensor with a shape of (nSamples, nChannels, Height, Width). If your input is only a single sample, you need to call input.unsqueeze(0) to expand the 3D Tensor into a 4D Tensor, as in the sketch below.
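A quick illustration of this note (assumed code, not from the original text): a single sample of shape (1, 32, 32) is expanded into a mini-batch of shape (1, 1, 32, 32) with unsqueeze(0) before being passed to the network.
# A single sample with shape (nChannels, Height, Width)
single_sample = torch.randn(1, 32, 32)
# unsqueeze(0) adds a batch dimension: (nSamples, nChannels, Height, Width)
batched = single_sample.unsqueeze(0)
print(batched.shape)   # torch.Size([1, 1, 32, 32])
out = net(batched)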
Loss function
- The loss function takes a pair of inputs (output, target) and computes a numerical value that evaluates how far the output is from the target.
- There are several different loss functions available in torch.nn. For example, nn.MSELoss evaluates the difference between the input and the target by calculating the mean squared error loss.
- An example of applying nn.MSELoss to calculate loss:
output = net(input)
target = torch.randn(10)
# Reshape target into a 2D tensor so that its shape matches output
target = target.view(1, -1)
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Output:
- Regarding the backpropagation chain: if we follow the direction of backpropagation from loss and print it using the .grad_fn attribute, we will see the complete computation graph below:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss
- When loss.backward() is called, the whole computation graph is differentiated with respect to loss. All Tensors with the attribute requires_grad=True take part in the gradient computation, and the gradients are accumulated into the .grad attribute of those Tensors.
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU
Output:
Backpropagation
- Performing backpropagation in PyTorch is very simple; the entire operation is loss.backward().
- Before performing backpropagation, the gradients must be cleared to zero, otherwise the gradients will accumulate across different batches of data.
A small example of performing backpropagation:
# Code for clearing the gradients to zero in PyTorch
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
# Code for performing backpropagation in PyTorch
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Output:
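The earlier note that gradients accumulate unless they are cleared can be checked directly. The sketch below (not from the original text, reusing the net, input, target, and criterion defined above) calls backward twice without zeroing in between; the second gradient is added on top of the first:
net.zero_grad()
loss1 = criterion(net(input), target)
loss1.backward()
g1 = net.conv1.bias.grad.clone()
# Backward again WITHOUT zeroing: the new gradient is added to the old one
loss2 = criterion(net(input), target)
loss2.backward()
print(torch.allclose(net.conv1.bias.grad, g1 * 2))   # True: the gradients accumulated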
Update network parameters
- The simplest algorithm for updating parameters is SGD (stochastic gradient descent).
- The update rule is: weight = weight - learning_rate * gradient.
First, implement SGD with plain Python code as follows:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
Then use the standard code officially recommended by PyTorch as follows:
# First import the optimizer package; optim contains several common optimization algorithms such as SGD, Adam, etc.
import torch.optim as optim

# Create an optimizer object through optim
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Clear the gradients through the optimizer
optimizer.zero_grad()
output = net(input)
loss = criterion(output, target)

# Perform backpropagation on the loss value
loss.backward()

# The parameter update is performed with one line of standard code
optimizer.step()
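Putting the pieces together, the following is a minimal training-loop sketch (not from the original text; the random data below is only a stand-in for a real data set) that follows the typical process listed at the top of this section:
# Fake data: 4 mini-batches, each one 1x32x32 image with a random 10-dimensional target
fake_batches = [(torch.randn(1, 1, 32, 32), torch.randn(1, 10)) for _ in range(4)]
optimizer = optim.SGD(net.parameters(), lr=0.01)
criterion = nn.MSELoss()
for epoch in range(2):                       # number of passes over the data
    for inputs, targets in fake_batches:     # iterate over the data set
        optimizer.zero_grad()                # clear accumulated gradients
        outputs = net(inputs)                # pass the input through the network
        loss = criterion(outputs, targets)   # calculate the loss value
        loss.backward()                      # backpropagate the gradients
        optimizer.step()                     # update the weights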
Section Summary
Learned the typical process of building a neural network:
- Define a neural network with learnable parameters
- Iterate over the training data set
- Pass the input data through the network
- Calculate the loss value
- Backpropagate the gradients of the network parameters
- Update the weights of the network according to certain rules
Learned the definition of the loss function:
- Use torch.nn.MSELoss() to calculate the mean squared error.
- When backpropagation is performed through loss.backward(), the whole computation graph is differentiated with respect to loss. All Tensors with the attribute requires_grad=True take part in the gradient computation, and the gradients are accumulated into the .grad attribute of those Tensors.
Learned the calculation method of backpropagation:
- Performing backpropagation in PyTorch is very simple; the entire operation is loss.backward().
- Before performing backpropagation, the gradients must be cleared to zero, otherwise the gradients will accumulate across different batches of data.
- net.zero_grad()
- loss.backward()
Learned how to update parameters:
- Define an optimizer to perform optimization and update of parameters:
optimizer = optim.SGD(net.parameters(), lr=0.01)
- Specific parameter updates are performed through the optimizer:
optimizer.step()