PyTorch study notes - loss function and backpropagation

1. Loss function

Readers with a theoretical background in deep learning will already be familiar with loss functions and backpropagation, so I will not cover the theory in detail here. A loss function measures the difference between the label (target) value and the predicted value. Many loss functions are available in machine learning, typically based on the absolute error or the squared error between the prediction and the target. A loss function serves two purposes:

  1. Calculate the gap between the actual output and the target.
  2. Provide a basis for updating the model parameters (backpropagation).

Official documentation for loss functions: Loss Functions.

(1) nn.L1Loss: Mean Absolute Error (MAE). The calculation is very simple: take the average of the absolute errors between the predicted values and the true values. The formula is:

$$loss=\frac{|x_1-t_1|+|x_2-t_2|+\dots+|x_n-t_n|}{n}$$

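In PyTorch 1.13, the data shapes for nn.L1Loss are specified as follows:

  Input: (*), where * means any number of dimensions.
  Target: (*), same shape as the input.
  Output: a scalar by default; with reduction='none', (*), same shape as the input.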

Earlier versions required the input to carry a batch dimension, which is no longer needed. The reduction parameter defaults to 'mean', which averages the element-wise errors; it can also be set to 'sum', which, as the name implies, adds them up.

The test code is as follows:

import torch.nn as nn
import torch

input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([4.0, -2.0, 5.0])

loss = nn.L1Loss()
result = loss(input, target)

print(result)  # tensor(3.)

loss = nn.L1Loss(reduction='sum')
result = loss(input, target)

print(result)  # tensor(9.)
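
To make the reduction parameter more concrete, here is a minimal sketch that reproduces the two results above by hand and also shows reduction='none', which keeps one error per element. The variable names (pred, per_element, and so on) are my own and are not part of the PyTorch API.

```python
import torch
import torch.nn as nn

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([4.0, -2.0, 5.0])

# reduction='none' keeps one absolute error per element
per_element = nn.L1Loss(reduction='none')(pred, target)

# The 'mean' and 'sum' reductions are just the mean / sum of those errors
manual_mean = torch.abs(pred - target).mean()
manual_sum = torch.abs(pred - target).sum()

print(per_element)  # tensor([3., 4., 2.])
print(manual_mean)  # tensor(3.)
print(manual_sum)   # tensor(9.)
```

The manual values match the outputs of nn.L1Loss() and nn.L1Loss(reduction='sum') shown above.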

(2) nn.MSELoss: Mean Squared Error (MSE), which is the average of the squared differences between the predicted values and the true values. The formula is:

$$loss=\frac{(x_1-t_1)^2+(x_2-t_2)^2+\dots+(x_n-t_n)^2}{n}$$

This loss function is used in the same way as nn.L1Loss. The code is as follows:

import torch.nn as nn
import torch

input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([4.0, -2.0, 5.0])

loss = nn.MSELoss()
result = loss(input, target)

print(result)  # tensor(9.6667)

loss = nn.MSELoss(reduction='sum')
result = loss(input, target)

print(result)  # tensor(29.)
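
The same formula applies to inputs with more than one dimension: the average is taken over all elements. The sketch below checks this on a made-up batch of 2 samples with 3 values each (the numbers in the second row are hypothetical, chosen only to keep the arithmetic easy).

```python
import torch
import torch.nn as nn

# A hypothetical batch of 2 samples with 3 values each
pred = torch.tensor([[1.0, 2.0, 3.0],
                     [0.0, 0.0, 0.0]])
target = torch.tensor([[4.0, -2.0, 5.0],
                       [1.0, 1.0, 1.0]])

mse = nn.MSELoss()(pred, target)

# Same value by hand: square the differences, then average over all 6 elements
manual = ((pred - target) ** 2).mean()

print(mse)     # tensor(5.3333)
print(manual)  # tensor(5.3333)
```

Here the squared errors are 9, 16, 4, 1, 1, 1, so the mean is 32 / 6 ≈ 5.3333.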

(3) nn.CrossEntropyLoss: Cross-entropy error. This loss function is commonly used when training classification models with C classes. Note that in PyTorch, nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and applies the softmax internally, so the model should not end with a Softmax layer of its own. Suppose x is the output of a three-class prediction (C = 3): [0.1, 0.7, 0.2], and target = 1 is the label of the correct class (indices start from 0). The loss is then calculated as:

$$loss(x, target)=-w_{target}\log\frac{\exp(x_{target})}{\sum_{i=0}^{C-1}\exp(x_i)}=w_{target}\left(-x_{target}+\log\sum_{i=0}^{C-1}\exp(x_i)\right)$$

Here w_{target} is the optional weight of the target class, which is 1 by default.

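In PyTorch 1.13, the data shapes for nn.CrossEntropyLoss are specified as follows:

  Input: (C), (N, C), or (N, C, d_1, ..., d_K), where C is the number of classes and N is the batch size.
  Target (class indices): (), (N), or (N, d_1, ..., d_K), with each value in the range [0, C).
  Output: a scalar by default; with reduction='none', the same shape as the target.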

The test code is as follows:

import torch.nn as nn
import torch

input = torch.tensor([0.1, 0.7, 0.2])
target = torch.tensor(1)

loss = nn.CrossEntropyLoss()
result = loss(input, target)

print(result)  # tensor(0.7679)

input = torch.tensor([0.8, 0.1, 0.1])
result = loss(input, target)

print(result)  # tensor(1.3897)
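
As a check on the formula above, the following sketch recomputes the first result by hand with w_target = 1, and then passes both examples as one batch of shape (N, C). The variable names are my own; torch.logsumexp and F.log_softmax are standard PyTorch functions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([0.1, 0.7, 0.2])  # raw scores, no softmax applied
target = torch.tensor(1)

builtin = nn.CrossEntropyLoss()(logits, target)

# Formula with w_target = 1: -x_target + log(sum_i exp(x_i))
manual = -logits[target] + torch.logsumexp(logits, dim=0)

# Equivalent view: negative log of the softmax probability of the target class
via_softmax = -F.log_softmax(logits, dim=0)[target]

print(builtin, manual, via_softmax)

# Batched input of shape (N, C) with targets of shape (N,);
# the default reduction averages the per-sample losses
batch_logits = torch.tensor([[0.1, 0.7, 0.2],
                             [0.8, 0.1, 0.1]])
batch_targets = torch.tensor([1, 1])
batch_loss = nn.CrossEntropyLoss()(batch_logits, batch_targets)
print(batch_loss)
```

All three single-sample values agree, and the batched loss is the average of the two losses printed in the test code above.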

2. Backpropagation

Next, taking the CIFAR10 dataset as an example, we use the neural network built in the previous section (PyTorch study notes - neural network model building practice). First set batch_size to 1 and look at the output:

from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn

class CIFAR10_Network(nn.Module):
    def __init__(self):
        super(CIFAR10_Network, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 32, 32]
            nn.MaxPool2d(kernel_size=2),  # [32, 16, 16]
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 16, 16]
            nn.MaxPool2d(kernel_size=2),  # [32, 8, 8]
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=1, padding=2),  # [64, 8, 8]
            nn.MaxPool2d(kernel_size=2),  # [64, 4, 4]
            nn.Flatten(),  # [1024]
            nn.Linear(in_features=1024, out_features=64),  # [64]
            nn.Linear(in_features=64, out_features=10) # [10]
        )

    def forward(self, input):
        output = self.model(input)
        return output

network = CIFAR10_Network()

test_set = datasets.CIFAR10('dataset/CIFAR10', train=False, transform=transforms.ToTensor())
data_loader = DataLoader(test_set, batch_size=1)

loss = nn.CrossEntropyLoss()

for step, data in enumerate(data_loader):
    imgs, targets = data
    output = network(imgs)
    output_loss = loss(output, targets)
    print(output)
    print(targets)
    print(output_loss)

# tensor([[ 0.1252, -0.1069, -0.0747,  0.0232,  0.0852,  0.1019,  0.0688, -0.1068,
#           0.0854, -0.0740]], grad_fn=<AddmmBackward0>)

# tensor([3])

# tensor(2.2960, grad_fn=<NllLossBackward0>)
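
A quick sanity check on the loss value 2.2960: the network is untrained, so its ten output scores are all close to each other, and the softmax assigns roughly 1/10 to every class. In the idealized case of exactly equal scores, the cross-entropy is ln(10) ≈ 2.3026, which is close to what we see. The sketch below assumes all-zero logits purely for illustration.

```python
import math
import torch
import torch.nn as nn

num_classes = 10
# All-zero logits: softmax assigns probability 1/10 to every class
uniform_logits = torch.zeros(1, num_classes)
target = torch.tensor([3])

loss_value = nn.CrossEntropyLoss()(uniform_logits, target)

print(loss_value)             # tensor(2.3026)
print(math.log(num_classes))  # 2.302585092994046
```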

Now let's turn to the second purpose: how the loss function provides a basis for updating the model parameters (backpropagation).

Take a convolutional layer as an example: every value in its convolution kernels is a parameter that we need to adjust. Each parameter has a grad attribute that holds its gradient. During backpropagation, the gradient of the loss with respect to every learnable parameter is computed and stored there. The optimizer then adjusts the parameters according to these gradients, with the goal of reducing the value of the loss function.

In PyTorch, backpropagation is performed by calling the backward function on the loss:

from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn

class CIFAR10_Network(nn.Module):
    def __init__(self):
        super(CIFAR10_Network, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 32, 32]
            nn.MaxPool2d(kernel_size=2),  # [32, 16, 16]
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 16, 16]
            nn.MaxPool2d(kernel_size=2),  # [32, 8, 8]
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=1, padding=2),  # [64, 8, 8]
            nn.MaxPool2d(kernel_size=2),  # [64, 4, 4]
            nn.Flatten(),  # [1024]
            nn.Linear(in_features=1024, out_features=64),  # [64]
            nn.Linear(in_features=64, out_features=10) # [10]
        )

    def forward(self, input):
        output = self.model(input)
        return output

network = CIFAR10_Network()

test_set = datasets.CIFAR10('dataset/CIFAR10', train=False, transform=transforms.ToTensor())
data_loader = DataLoader(test_set, batch_size=1)

loss = nn.CrossEntropyLoss()

for step, data in enumerate(data_loader):
    imgs, targets = data
    output = network(imgs)
    output_loss = loss(output, targets)
    output_loss.backward()  # backpropagation

If we set a breakpoint just before the backward call, we can inspect the gradient of a certain layer's parameters in the debugger. Before backpropagation, grad is None:

[Figure: debugger view of a parameter's grad attribute before backpropagation]

After the backpropagation line has executed, grad holds a value:

[Figure: debugger view of a parameter's grad attribute after backpropagation]

Now that every parameter has a gradient, we can choose a suitable optimizer to update these parameters.
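
The same before-and-after check can be done in code without a debugger. This is a minimal sketch under some assumptions: a single convolution layer plus a linear layer stands in for the full network, and a random tensor stands in for a CIFAR10 image, so the dataset is not needed. The names conv, head, and grad_before are my own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single convolution layer stands in for the first layer of the network
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2)
head = nn.Sequential(conv, nn.Flatten(), nn.Linear(32 * 32 * 32, 10))

img = torch.randn(1, 3, 32, 32)  # random stand-in for one CIFAR10 image
target = torch.tensor([3])

grad_before = conv.weight.grad
print(grad_before)               # None: nothing has been backpropagated yet

output_loss = nn.CrossEntropyLoss()(head(img), target)
output_loss.backward()

# After backward(), grad is a tensor with the same shape as the weight
print(conv.weight.grad.shape)    # torch.Size([32, 3, 5, 5])
```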

3. Optimizer

Official documentation for the torch.optim optimizers: TORCH.OPTIM.

The optimizer updates the learnable parameters of the model during training. Commonly used optimizers include SGD, RMSprop, and Adam. When an optimizer is initialized, it is given the model's learnable parameters together with hyperparameters such as lr and momentum, for example:

import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

During training, first call optimizer.zero_grad() to clear the gradients left over from the previous step, then call backward() on the loss to backpropagate, and finally call optimizer.step() to update the model parameters, for example:

for step, data in enumerate(data_loader):
    imgs, targets = data
    output = network(imgs)
    output_loss = loss(output, targets)
    optimizer.zero_grad()
    output_loss.backward()
    optimizer.step()
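
To see why the gradients need to be cleared and what step() actually does, here is a small sketch with a single parameter and a toy loss (w - 3)^2. The toy loss and the names w, toy_loss, accumulated, and fresh are hypothetical, and plain SGD without momentum is assumed, so the update is simply w <- w - lr * grad.

```python
import torch
import torch.optim as optim

# One learnable parameter and a toy loss (w - 3)^2, whose gradient is 2 * (w - 3)
w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

def toy_loss(w):
    return ((w - 3.0) ** 2).sum()

# Without zero_grad(), gradients from successive backward() calls add up
toy_loss(w).backward()
toy_loss(w).backward()
accumulated = w.grad.clone()  # tensor([-8.]) = 2 * (-4)

# The usual order: clear, backpropagate, update
optimizer.zero_grad()
toy_loss(w).backward()
fresh = w.grad.clone()        # tensor([-4.])
optimizer.step()              # w <- w - lr * grad = 1.0 - 0.1 * (-4.0)

print(accumulated, fresh, w)  # w is now 1.4
```

If zero_grad() were skipped, each step would use the sum of all gradients computed so far rather than the gradient of the current batch.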

Next, let's train the neural network for 20 epochs and see how the value of the loss function changes:

from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim

class CIFAR10_Network(nn.Module):
    def __init__(self):
        super(CIFAR10_Network, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 32, 32]
            nn.MaxPool2d(kernel_size=2),  # [32, 16, 16]
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=1, padding=2),  # [32, 16, 16]
            nn.MaxPool2d(kernel_size=2),  # [32, 8, 8]
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=1, padding=2),  # [64, 8, 8]
            nn.MaxPool2d(kernel_size=2),  # [64, 4, 4]
            nn.Flatten(),  # [1024]
            nn.Linear(in_features=1024, out_features=64),  # [64]
            nn.Linear(in_features=64, out_features=10) # [10]
        )

    def forward(self, input):
        output = self.model(input)
        return output

network = CIFAR10_Network()

test_set = datasets.CIFAR10('dataset/CIFAR10', train=False, transform=transforms.ToTensor())
data_loader = DataLoader(test_set, batch_size=64)

loss = nn.CrossEntropyLoss()
optimizer = optim.SGD(network.parameters(), lr=0.01)

for epoch in range(20):  # train for 20 epochs
    total_loss = 0.0
    for step, data in enumerate(data_loader):
        imgs, targets = data
        output = network(imgs)
        output_loss = loss(output, targets)
        # .item() gives a plain Python float, so the running total
        # does not keep the autograd graph of every batch alive
        total_loss += output_loss.item()
        optimizer.zero_grad()
        output_loss.backward()
        optimizer.step()
    print(total_loss)

The training results are shown in the figure below. The sum of the loss values over all batches does decrease steadily from one epoch to the next:

[Figure: printed total loss for each of the 20 epochs]
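
For readers who want to try the training loop without downloading CIFAR10, here is a self-contained sketch of the same pattern. It rests on several assumptions of my own: the data is synthetic (random features with labels generated from a random linear rule), the model is a single linear layer rather than the CIFAR10 network, and the learning rate is 0.1. Only the structure of the loop mirrors the code above.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic 3-class data standing in for CIFAR10 (hypothetical, for illustration only)
features = torch.randn(256, 20)
labels = (features @ torch.randn(20, 3)).argmax(dim=1)
data_loader = DataLoader(TensorDataset(features, labels), batch_size=64)

model = nn.Linear(20, 3)
loss = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    total_loss = 0.0
    for inputs, targets in data_loader:
        output_loss = loss(model(inputs), targets)
        optimizer.zero_grad()
        output_loss.backward()
        optimizer.step()
        # .item() converts the 0-dim loss tensor to a Python float
        total_loss += output_loss.item()
    epoch_losses.append(total_loss)

# Total loss of the first and the last epoch
print(epoch_losses[0], epoch_losses[-1])
```

Collecting the per-epoch totals in a list makes it easy to compare the first and last epochs or to plot the curve afterwards.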

Origin blog.csdn.net/m0_51755720/article/details/128083208