PyTorch study notes: the complete model training workflow

1. Points to note when training a model

(1) We usually group the hyperparameter settings in one place, which makes the code easier to read and modify:

BATCH_SIZE = 64
LEARNING_RATE = 0.01
EPOCH = 10

(2) In each epoch we first train on the training set and then measure accuracy on the test set, so we usually keep two counters, total_train_step and total_test_step, to record how many training and test steps have run so far.
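The train-then-test structure of an epoch can be sketched as follows. This is only an illustration under stated assumptions: the data is random, nn.Linear stands in for a real network, and counting one test step per epoch (rather than per test batch) is a choice made here, not something these notes prescribe.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

BATCH_SIZE = 64
LEARNING_RATE = 0.01
EPOCH = 10

# Hypothetical synthetic data: 8 features per sample, 2 classes.
train_set = TensorDataset(torch.randn(256, 8), torch.randint(0, 2, (256,)))
test_set = TensorDataset(torch.randn(128, 8), torch.randint(0, 2, (128,)))
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE)

model = nn.Linear(8, 2)  # stand-in for a real network
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)

total_train_step = 0  # number of parameter updates so far
total_test_step = 0   # number of completed test rounds so far (assumed: one per epoch)

for epoch in range(EPOCH):
    # Training phase
    model.train()
    for inputs, targets in train_loader:
        loss = loss_function(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_train_step += 1

    # Test phase
    model.eval()
    correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:
            correct += (model(inputs).argmax(dim=1) == targets).sum().item()
    total_test_step += 1
    print(f"epoch {epoch + 1}: test accuracy {correct / len(test_set):.3f}")
```

With 256 training samples and a batch size of 64, each epoch performs 4 updates, so total_train_step ends at 40 and total_test_step at 10.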

(3) The model should generally be set to training mode before training starts and to evaluation mode before testing. The two modes only change the behavior of certain layers, such as Dropout and BatchNorm:

model.train()
# training code goes here

model.eval()
# evaluation code goes here
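To see what the two modes actually change, here is a small sketch using a standalone nn.Dropout layer (the layer size and p=0.5 are arbitrary choices for illustration). In training mode Dropout zeroes elements at random and rescales the rest; in evaluation mode it passes the input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)  # roughly half the entries become 0, the rest are scaled to 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # identity: no entries are dropped

print((train_out == 0).sum().item())  # number of dropped entries (random)
print(torch.equal(eval_out, x))       # True
```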

(4) In classification problems, accuracy is usually calculated as follows:

import torch

a = torch.tensor([
    [0.3, 0.7],
    [0.6, 0.4]
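])  # suppose these are the binary classification scores for two objects

b = torch.tensor([0, 0])  # the correct labels

print(a.argmax(dim=1))  # tensor([1, 0]): argmax along dim 1 gives the index of the largest value in each row, which is used as the predicted class

print(a.argmax(dim=1) == b)  # tensor([False,  True]): compared with the labels, the first prediction is wrong and the second is correct

print((a.argmax(dim=1) == b).sum())  # tensor(1): summing the comparison results counts the True entries, i.e. the number of correct predictions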
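The count of correct predictions becomes an accuracy rate once it is divided by the number of samples. A minimal sketch, assuming the scores arrive batch by batch (the two batches below are made-up values):

```python
import torch

# Hypothetical (scores, labels) pairs standing in for batches from a test loader.
batches = [
    (torch.tensor([[0.3, 0.7], [0.6, 0.4]]), torch.tensor([0, 0])),
    (torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]), torch.tensor([1, 0, 0])),
]

total_correct = 0
total_samples = 0
for outputs, targets in batches:
    # .item() converts the 0-dim tensor into a plain Python int
    total_correct += (outputs.argmax(dim=1) == targets).sum().item()
    total_samples += targets.size(0)

accuracy = total_correct / total_samples
print(accuracy)  # 0.6 (3 correct out of 5)
```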

(5) Testing must not modify the model, so there is no need to compute gradients during the test. Wrap the evaluation code in torch.no_grad() every time you test:

with torch.no_grad():
    # evaluation code goes here
    pass
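The effect of torch.no_grad() can be checked on a small example. This sketch uses nn.Linear as a stand-in model: the same forward pass tracks gradients outside the context manager and does not inside it.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for a real network
x = torch.randn(3, 4)

out_train = model(x)     # gradient tracking is on by default

with torch.no_grad():
    out_eval = model(x)  # same computation, but no graph is recorded

print(out_train.requires_grad)  # True
print(out_eval.requires_grad)   # False
```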

2. Use GPU for training

Prerequisite: the computer has an NVIDIA graphics card and CUDA is configured. You can use torch.cuda.is_available() to check whether CUDA is available.
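A quick check along these lines might look like the sketch below. It only reports what PyTorch can see and runs on machines without a GPU as well.

```python
import torch

cuda_available = torch.cuda.is_available()
print("CUDA available:", cuda_available)

if cuda_available:
    # These calls are only meaningful when at least one GPU is visible.
    print("GPU count:", torch.cuda.device_count())
    print("GPU 0 name:", torch.cuda.get_device_name(0))
```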

When training on a GPU, the Module objects and Tensor data must be moved to the GPU for computation. In practice this means moving the network model, the data, and the loss function.

There are two ways to train on the GPU. The first is to use the cuda() function, for example:

from torch import nn

# Network model
model = MyNetwork()
model = model.cuda()

# Loss function
loss_function = nn.CrossEntropyLoss()
loss_function = loss_function.cuda()

# Data
for step, data in enumerate(data_loader):
    imgs, targets = data
    imgs = imgs.cuda()
    targets = targets.cuda()
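One caveat: calling cuda() on a machine without a working CUDA setup raises an error. A common workaround is to guard the calls, as in this sketch, where nn.Linear and the random tensors are placeholders for MyNetwork and a real batch:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # placeholder for MyNetwork
loss_function = nn.CrossEntropyLoss()
imgs = torch.randn(8, 4)                # placeholder batch of inputs
targets = torch.randint(0, 2, (8,))     # placeholder labels

# Only move things to the GPU when one is actually available.
if torch.cuda.is_available():
    model = model.cuda()
    loss_function = loss_function.cuda()
    imgs = imgs.cuda()
    targets = targets.cuda()

loss = loss_function(model(imgs), targets)
print(loss.item())
```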

The other is to use to(device), where device is the device we choose to train the model on. Keep the following in mind about its return value (the same holds for cuda()):

For Tensor data (images, labels, etc.), to(device) does not modify the original Tensor. It returns a Tensor on the target device, so you must assign the return value.
For Module objects (network model, loss function), to(device) moves the module in place, so assigning the return value is optional. Assigning it anyway is harmless.

For example:

from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # 'cuda:0' refers to GPU number 0

# Network model
model = MyNetwork()
model.to(device)

# Loss function
loss_function = nn.CrossEntropyLoss()
loss_function.to(device)

# Data
for step, data in enumerate(data_loader):
    imgs, targets = data
    imgs = imgs.to(device)
    targets = targets.to(device)
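The return-value behavior of to(device) can be checked with a short, self-contained sketch. nn.Linear and the tensor of ones are placeholders, and the code falls back to the CPU when no GPU is present.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2)  # placeholder for MyNetwork
returned = model.to(device)
same_module = returned is model  # Module.to() moves the module in place and returns the module itself
print(same_module)               # True

x = torch.ones(3, 4)
x = x.to(device)  # Tensor.to() does not change the original, so rebind the name

out = model(x)    # works because the model and the data are on the same device
devices_match = x.device.type == next(model.parameters()).device.type
print(devices_match)  # True
```

Because device is chosen once at the top, the same script runs unchanged on machines with or without a GPU.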