1. Loss function
Readers with a theoretical grounding in deep learning will already be familiar with loss functions and backpropagation, so the theory is not covered in detail here. A loss function measures the difference between the predicted value and the label (target) value. Machine learning offers many loss functions to choose from, typically based on a distance such as the absolute error or the squared error. A loss function serves two purposes:
- Calculate the gap between the actual output and the target.
- Provide a basis for updating the model parameters (backpropagation).
Official documentation for loss functions: Loss Functions.
(1) nn.L1Loss: Mean Absolute Error (MAE). The calculation is simple: take the average of the absolute errors between the predicted values and the true values. The formula is:
$$loss=\frac{|x_1-t_1|+|x_2-t_2|+\dots+|x_n-t_n|}{n}$$
In PyTorch 1.13, nn.L1Loss accepts data of the following shapes:
- Input: (*), where * means any number of dimensions.
- Target: (*), the same shape as the input.
Earlier versions required a batch_size dimension, which is no longer needed. The reduction parameter controls how the element-wise errors are combined: the default, mean, takes their average, and sum takes their sum.
The test code is as follows:
import torch.nn as nn
import torch
input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([4.0, -2.0, 5.0])
loss = nn.L1Loss()
result = loss(input, target)
print(result) # tensor(3.)
loss = nn.L1Loss(reduction='sum')
result = loss(input, target)
print(result) # tensor(9.)
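As a small sketch of what the reduction parameter does, the snippet below keeps the per-element errors with reduction='none' (a third option offered by nn.L1Loss) and checks that averaging or summing them by hand matches the two results above. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, 2.0, 3.0])
t = torch.tensor([4.0, -2.0, 5.0])

# reduction='none' keeps one absolute error per element instead of reducing
per_element = nn.L1Loss(reduction='none')(x, t)
print(per_element)  # tensor([3., 4., 2.])

# Reducing by hand reproduces the 'mean' and 'sum' modes
manual_mean = (x - t).abs().mean()
manual_sum = (x - t).abs().sum()
print(manual_mean, manual_sum)
```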
(2) nn.MSELoss: Mean Squared Error (MSE), the average of the squared differences between the predicted values and the true values. The formula is:
$$loss=\frac{(x_1-t_1)^2+(x_2-t_2)^2+\dots+(x_n-t_n)^2}{n}$$
This loss function is used in much the same way as nn.L1Loss. The code is as follows:
import torch.nn as nn
import torch
input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([4.0, -2.0, 5.0])
loss = nn.MSELoss()
result = loss(input, target)
print(result) # tensor(9.6667)
loss = nn.MSELoss(reduction='sum')
result = loss(input, target)
print(result) # tensor(29.)
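The loss also works on tensors with more than one dimension, in which case n in the formula is the total number of elements. A minimal sketch with a made-up 2 x 3 input (the values in the second row are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Two samples with three values each; the squared errors are 9, 16, 4, 1, 1, 1
x = torch.tensor([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
t = torch.tensor([[4.0, -2.0, 5.0], [1.0, 1.0, 1.0]])

loss_mean = nn.MSELoss()(x, t)                 # averages over all 6 elements
loss_sum = nn.MSELoss(reduction='sum')(x, t)   # adds up all 6 squared errors

# The same numbers computed directly from the formula
manual_mean = ((x - t) ** 2).mean()
print(loss_mean, loss_sum, manual_mean)
```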
(3) nn.CrossEntropyLoss: cross-entropy error. This loss function is commonly used when training a classification model with $C$ categories. Note that nn.CrossEntropyLoss applies the Softmax (as log-softmax) internally, so it expects the raw, unnormalized scores of the network rather than the output of a separate Softmax layer. Suppose $x=[0.1,0.7,0.2]$ is the output for a three-category prediction ($C=3$) and $target=1$ is the index of the correct label (indices start from 0). The loss is then calculated as:
$$loss(x,target)=-w_{target}\log\frac{\exp(x_{target})}{\sum_{i=0}^{C-1}\exp(x_i)}=w_{target}\left(-x_{target}+\log\sum_{i=0}^{C-1}\exp(x_i)\right)$$
Here $w_{target}$ is the optional per-class weight, which is 1 when no weight is given.
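In PyTorch 1.13, nn.CrossEntropyLoss accepts data of the following shapes (for targets given as class indices):
- Input: (C), (N, C), or (N, C, d_1, ..., d_K), where C is the number of classes and N is the batch size.
- Target: (), (N), or (N, d_1, ..., d_K), with each value in the range [0, C).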
The test code is as follows:
import torch.nn as nn
import torch
input = torch.tensor([0.1, 0.7, 0.2])
target = torch.tensor(1)
loss = nn.CrossEntropyLoss()
result = loss(input, target)
print(result) # tensor(0.7679)
input = torch.tensor([0.8, 0.1, 0.1])
result = loss(input, target)
print(result) # tensor(1.3897)
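To connect the formula to the code, the sketch below recomputes the first result by hand (with the weight taken as 1) and then passes both examples as one batch of shape (N, C). The names logits, manual and batch_loss are illustrative, not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([0.1, 0.7, 0.2])
target = torch.tensor(1)

builtin = nn.CrossEntropyLoss()(logits, target)

# Right-hand side of the formula: -x_target + log(sum(exp(x_i)))
manual = -logits[target] + torch.logsumexp(logits, dim=0)

# Equivalent view: negative log-softmax at the target index
via_log_softmax = -F.log_softmax(logits, dim=0)[target]
print(builtin, manual, via_log_softmax)  # all about 0.7679

# Batched input of shape (N, C) with targets of shape (N);
# the default reduction averages the per-sample losses
batch_logits = torch.tensor([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]])
batch_targets = torch.tensor([1, 1])
batch_loss = nn.CrossEntropyLoss()(batch_logits, batch_targets)
print(batch_loss)  # mean of the two losses printed above
```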
2. Backpropagation
Next, take the CIFAR10 dataset as an example and use the neural network built in the previous section (PyTorch study notes - neural network model building practice). First set batch_size to 1 and look at the output:
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn
class CIFAR10_Network(nn.Module):
    def __init__(self):
        super(CIFAR10_Network, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 32, 32]
            nn.MaxPool2d(kernel_size=2),  # [32, 16, 16]
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 16, 16]
            nn.MaxPool2d(kernel_size=2),  # [32, 8, 8]
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=1, padding=2),  # [64, 8, 8]
            nn.MaxPool2d(kernel_size=2),  # [64, 4, 4]
            nn.Flatten(),  # [1024]
            nn.Linear(in_features=1024, out_features=64),  # [64]
            nn.Linear(in_features=64, out_features=10)  # [10]
        )
    def forward(self, input):
        output = self.model(input)
        return output
network = CIFAR10_Network()
test_set = datasets.CIFAR10('dataset/CIFAR10', train=False, transform=transforms.ToTensor())
data_loader = DataLoader(test_set, batch_size=1)
loss = nn.CrossEntropyLoss()
for step, data in enumerate(data_loader):
    imgs, targets = data
    output = network(imgs)
    output_loss = loss(output, targets)
    print(output)
    print(targets)
    print(output_loss)
# tensor([[ 0.1252, -0.1069, -0.0747, 0.0232, 0.0852, 0.1019, 0.0688, -0.1068,
# 0.0854, -0.0740]], grad_fn=<AddmmBackward0>)
# tensor([3])
# tensor(2.2960, grad_fn=<NllLossBackward0>)
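A quick sanity check on the printed loss, offered as a sketch: an untrained network produces scores close to zero, so every class gets roughly the same probability, and the cross-entropy for 10 equally likely classes is ln(10), about 2.3026. The all-zero scores below are an assumed stand-in for such an untrained output.

```python
import math
import torch
import torch.nn as nn

# All-zero scores give each of the 10 classes the same probability, 1/10
uniform_scores = torch.zeros(1, 10)
target = torch.tensor([3])

baseline = nn.CrossEntropyLoss()(uniform_scores, target)
print(baseline.item(), math.log(10))  # both are about 2.3026
```

The value 2.2960 printed above is close to this baseline, which is what one would expect before any training.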
Now let's turn to the second purpose: how the loss function provides a basis for updating the model parameters (backpropagation).
Take a convolutional layer as an example: every parameter in the convolution kernel needs to be adjusted. Each parameter has a grad attribute that holds its gradient. Backpropagation computes the gradient for every parameter to be updated; the optimizer then adjusts the parameters according to these gradients, which ultimately reduces the value of the loss function.
In PyTorch, backpropagation is performed by calling the backward function on the loss:
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn
class CIFAR10_Network(nn.Module):
    def __init__(self):
        super(CIFAR10_Network, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 32, 32]
            nn.MaxPool2d(kernel_size=2),  # [32, 16, 16]
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 16, 16]
            nn.MaxPool2d(kernel_size=2),  # [32, 8, 8]
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=1, padding=2),  # [64, 8, 8]
            nn.MaxPool2d(kernel_size=2),  # [64, 4, 4]
            nn.Flatten(),  # [1024]
            nn.Linear(in_features=1024, out_features=64),  # [64]
            nn.Linear(in_features=64, out_features=10)  # [10]
        )
    def forward(self, input):
        output = self.model(input)
        return output
network = CIFAR10_Network()
test_set = datasets.CIFAR10('dataset/CIFAR10', train=False, transform=transforms.ToTensor())
data_loader = DataLoader(test_set, batch_size=1)
loss = nn.CrossEntropyLoss()
for step, data in enumerate(data_loader):
    imgs, targets = data
    output = network(imgs)
    output_loss = loss(output, targets)
    output_loss.backward()  # backpropagation: fills in .grad for every parameter
If we set a breakpoint just before the backpropagation call, we can inspect the gradient of a layer's parameters in the debugger; before backpropagation it is None:
After the backpropagation call executes, grad holds a value:
Now that every parameter has a gradient, we can choose a suitable optimizer to update these parameters.
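The same check can be done without a debugger. The sketch below uses a single small nn.Linear layer and random data as a hypothetical stand-in for one layer of the network, so it runs without downloading CIFAR10.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 3)            # stand-in for one layer of the network
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(2, 4)              # two random samples with 4 features each
targets = torch.tensor([0, 2])

grad_before = layer.weight.grad    # no backward pass has run yet
print(grad_before)                 # None

loss_fn(layer(x), targets).backward()

grad_after = layer.weight.grad     # now a tensor shaped like the weight
print(grad_after.shape)            # torch.Size([3, 4])
```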
3. Optimizer
Official documentation for the torch.optim optimizers: TORCH.OPTIM.
The optimizer updates the learnable parameters of the model during training. Commonly used optimizers include SGD, RMSprop, and Adam. When an optimizer is initialized, it is given the learnable parameters of the model along with hyperparameters such as lr and momentum, for example:
import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)
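To make the role of lr concrete, here is a minimal sketch of what a single step of plain SGD does to a parameter: it subtracts lr times the gradient. Momentum is left out for simplicity, and the tensor w and the toy loss are illustrative assumptions.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()     # the gradient of this toy loss is 2 * w = [2, -4]
loss.backward()

# What plain SGD is expected to produce: w - lr * grad
expected = w.detach() - 0.1 * w.grad

optimizer.step()          # updates w in place
print(w.detach(), expected)  # both approximately [0.8, -1.6]
```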
During training, first call optimizer.zero_grad() to clear the gradients, then call backward() on the loss to backpropagate, and finally call optimizer.step() to update the model parameters, for example:
for step, data in enumerate(data_loader):
    imgs, targets = data
    output = network(imgs)
    output_loss = loss(output, targets)
    optimizer.zero_grad()
    output_loss.backward()
    optimizer.step()
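The reason for calling optimizer.zero_grad() on every step is that backward() adds to any gradient already stored in grad rather than replacing it. A small sketch with a single made-up parameter w:

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

(3 * w).sum().backward()
first = w.grad.clone()          # the gradient of 3w is 3

(3 * w).sum().backward()        # no zero_grad() in between
accumulated = w.grad.clone()    # 3 + 3 = 6: the new gradient was added to the old one

optimizer.zero_grad()           # resets grad (to zeros or to None, depending on the PyTorch version)
(3 * w).sum().backward()
fresh = w.grad.clone()          # back to 3
print(first, accumulated, fresh)
```

Without the reset, each step would update the parameters with a mix of the current gradient and stale gradients from earlier batches.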
Next, let's train the neural network for 20 epochs and watch how the loss value changes:
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
class CIFAR10_Network(nn.Module):
    def __init__(self):
        super(CIFAR10_Network, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 32, 32]
            nn.MaxPool2d(kernel_size=2),  # [32, 16, 16]
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 16, 16]
            nn.MaxPool2d(kernel_size=2),  # [32, 8, 8]
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=1, padding=2),  # [64, 8, 8]
            nn.MaxPool2d(kernel_size=2),  # [64, 4, 4]
            nn.Flatten(),  # [1024]
            nn.Linear(in_features=1024, out_features=64),  # [64]
            nn.Linear(in_features=64, out_features=10)  # [10]
        )
    def forward(self, input):
        output = self.model(input)
        return output
network = CIFAR10_Network()
test_set = datasets.CIFAR10('dataset/CIFAR10', train=False, transform=transforms.ToTensor())
data_loader = DataLoader(test_set, batch_size=64)
loss = nn.CrossEntropyLoss()
optimizer = optim.SGD(network.parameters(), lr=0.01)
for epoch in range(20):  # train for 20 epochs
    total_loss = 0.0
    for step, data in enumerate(data_loader):
        imgs, targets = data
        output = network(imgs)
        output_loss = loss(output, targets)
        total_loss += output_loss.item()  # .item() adds a plain float, so no autograd graph is kept alive
        optimizer.zero_grad()
        output_loss.backward()
        optimizer.step()
    print(total_loss)  # sum of the batch losses for this epoch
The training results are shown in the figure below. The sum of the loss values over all batches in each epoch does decrease steadily: