[Deep Learning] Loss Function Series (1): Squared Loss Function, Cross-Entropy Loss Function (including label_smoothing, ignore_index, etc.)

1. Squared loss function (Quadratic Loss / MSELoss):

$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$

PyTorch implementation:

import torch
from torch import nn

loss = nn.MSELoss(reduction="none")
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)

The parameters of nn.MSELoss() are as follows:

  • size_average (bool, optional): Deprecated (see reduction). By default, the loss is averaged over each loss element in the batch. Note that for some losses there may be multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: True
  • reduce (bool, optional): Deprecated (see reduction). By default, the loss is averaged or summed over observations for each mini-batch depending on size_average. When reduce is False, the loss per batch element is returned instead and size_average is ignored. Default: True
  • reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two arguments will override reduction. Default: 'mean'. (A short example after this list illustrates the three modes.)
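For example, a minimal sketch (variable names are illustrative) showing how the three reduction modes relate to the element-wise squared error:

import torch
from torch import nn

pred = torch.randn(3, 5)
target = torch.randn(3, 5)

elementwise = nn.MSELoss(reduction="none")(pred, target)  # shape (3, 5): one loss per element
mean_loss = nn.MSELoss(reduction="mean")(pred, target)    # scalar: average over all 15 elements
sum_loss = nn.MSELoss(reduction="sum")(pred, target)      # scalar: sum over all 15 elements

# The reductions are consistent with the element-wise squared error
assert torch.allclose(elementwise, (pred - target) ** 2)
assert torch.allclose(mean_loss, elementwise.mean())
assert torch.allclose(sum_loss, elementwise.sum())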

2. Cross-entropy loss function (cross-entropy loss)

  1. Binary classification cross-entropy loss function
    $L = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right]$

PyTorch implementation:

import torch
from torch import nn

target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
output = torch.full([10, 64], 1.5)  # a prediction (logits)
pos_weight = torch.ones([64])  # all positive-class weights equal to 1
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target)  # -log(sigmoid(1.5))
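As a quick sanity check (a small illustrative snippet, not part of the original example), the value above can be reproduced directly from the binary cross-entropy definition with target 1:

import torch

manual = -torch.log(torch.sigmoid(torch.tensor(1.5)))
print(manual)  # tensor(0.2014), matching criterion(output, target) above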

The parameters of nn.BCEWithLogitsLoss() are as follows:

  • weight (Tensor, optional): A manual rescaling weight applied to the loss. If given, it must be a tensor of size C. Otherwise, all weights are treated as 1.
  • size_average (bool, optional): Deprecated (see reduction). By default, the loss is averaged over each loss element in the batch. Note that for some losses there may be multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: None
  • reduce (bool, optional): Deprecated (see reduction). By default, the loss is averaged or summed over observations for each mini-batch depending on size_average. When reduce is False, the loss per batch element is returned instead and size_average is ignored. Default: None
  • reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': a weighted average of the output is taken; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two arguments will override reduction. Default: 'mean'
  • pos_weight (Tensor, optional): Specifies the weight of positive samples. Must be a vector with length equal to the number of classes.

Note 1: The difference between BCEWithLogitsLoss and BCELoss is that the former applies sigmoid() to the input internally and uses the log-sum-exp trick, which gives better numerical stability.
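A minimal sketch of this equivalence (values are illustrative): applying sigmoid manually and then BCELoss gives the same result as BCEWithLogitsLoss on the raw logits, but the fused version is numerically safer for logits of large magnitude.

import torch
from torch import nn

logits = torch.randn(4, 3)
targets = torch.empty(4, 3).random_(2)  # random 0/1 targets

fused = nn.BCEWithLogitsLoss()(logits, targets)          # sigmoid + BCE in one numerically stable op
separate = nn.BCELoss()(torch.sigmoid(logits), targets)  # manual sigmoid, then BCE

assert torch.allclose(fused, separate, atol=1e-6)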

Note 2: The pos_weight parameter specifies the weight of positive samples. Its purpose is to give positive samples a higher weight when dealing with class-imbalance problems, so as to balance the classes. By default, all classes are considered equally important, i.e. each class has a weight of 1. If a certain class has fewer samples than the others, you can set pos_weight to give that class a higher weight. pos_weight is a vector whose length equals the number of classes, and each element is the weight of the corresponding class. When the weight of a class is set to a value greater than 1, that class contributes more to the loss.
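For example (a hypothetical setup, not from the original post), in a multi-label task where positives are rare, a common choice is pos_weight = num_negatives / num_positives per class:

import torch
from torch import nn

# Hypothetical per-class counts in the training set (3 classes)
num_pos = torch.tensor([100.0, 10.0, 1.0])
num_neg = torch.tensor([900.0, 990.0, 999.0])

pos_weight = num_neg / num_pos  # rarer positives get larger weights: [9., 99., 999.]
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 3)
targets = torch.empty(8, 3).random_(2)
loss = criterion(logits, targets)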

  2. Multi-class cross-entropy loss function:

$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log \hat{y}_{i,c}$

PyTorch implementation:

import torch
from torch import nn

loss = nn.CrossEntropyLoss()

# Example of a target with class indices
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)

The parameters of nn.CrossEntropyLoss() are as follows:

  • weight (Tensor, optional): A manual rescaling weight given to each class. If given, it must be a tensor of size C.

  • size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there may be multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: True

  • ignore_index (int, optional): Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Note that ignore_index only applies when the target contains class indices. (See the example after this list.)

  • reduce (bool, optional): Deprecated (see reduction). By default, the loss is averaged or summed over observations for each mini-batch depending on size_average. When reduce is False, the loss per batch element is returned instead and size_average is ignored. Default: True

  • reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': a weighted average of the output is taken; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two arguments will override reduction. Default: 'mean'

  • label_smoothing (float, optional): The amount of smoothing used when computing the loss, in the range [0.0, 1.0], where 0.0 means no smoothing. The targets become a mixture of the original labels and a uniform distribution, as described in Rethinking the Inception Architecture for Computer Vision. Default: 0.0
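To make ignore_index concrete, here is a minimal sketch (the values are illustrative) showing that targets equal to the ignore value contribute nothing to the loss or the gradient:

import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

logits = torch.randn(4, 5)                  # 4 samples, 5 classes
targets = torch.tensor([1, -100, 3, -100])  # positions with -100 are ignored

loss = loss_fn(logits, targets)
# Equivalent to averaging the loss over the non-ignored samples only
manual = nn.CrossEntropyLoss()(logits[[0, 2]], targets[[0, 2]])
assert torch.allclose(loss, manual)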

Note 1: CrossEntropyLoss() in PyTorch combines softmax, log, and NLLLoss into a single operation (a short verification follows the list below).

  • The outputs of Softmax all lie between 0 and 1, so after taking ln the values range from negative infinity to 0.
  • Taking the log of the Softmax results turns multiplication into addition, which reduces computation while preserving the monotonicity of the function.
  • Finally, NLLLoss is applied to the log-probabilities:
    $\text{loss}(x, \text{class}) = -\log\frac{\exp(x_{\text{class}})}{\sum_j \exp(x_j)} = -x_{\text{class}} + \log\sum_j \exp(x_j)$
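A short verification of this decomposition (an illustrative sketch): computing log_softmax followed by NLLLoss gives the same value as CrossEntropyLoss applied to the raw logits.

import torch
import torch.nn.functional as F
from torch import nn

logits = torch.randn(3, 5)
targets = torch.tensor([0, 2, 4])

ce = nn.CrossEntropyLoss()(logits, targets)

# Step by step: softmax -> log (fused as log_softmax for stability) -> NLLLoss
log_probs = F.log_softmax(logits, dim=1)
nll = nn.NLLLoss()(log_probs, targets)

assert torch.allclose(ce, nll)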

Note 2: label_smoothing is a technique used to reduce overconfidence when computing the cross-entropy loss. When training a classifier, labels are usually treated as one-hot vectors, i.e. only the correct class has probability 1 and the rest are 0. However, this can lead to a model that is very confident in its predicted classes even when the predicted distribution is far from the actual distribution. To alleviate this, label smoothing can be used: the probability of the correct label is reduced from 1 to 1 − ε, while the probability of each wrong label is increased from 0 to ε/(C − 1), where C is the number of classes. This reduces the influence of the correct label on the predicted distribution, makes the model more cautious, and avoids overfitting to noise in the training set. In nn.CrossEntropyLoss, the label_smoothing parameter specifies the strength of the smoothing, ranging from 0 to 1. By default no label smoothing is done, i.e. label_smoothing=0.
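A minimal sketch of the effect (values are illustrative). Note that PyTorch mixes the one-hot target with a uniform distribution over all C classes, so the correct class receives 1 − ε + ε/C and every other class ε/C, which is slightly different bookkeeping from the ε/(C − 1) formulation above:

import torch
import torch.nn.functional as F
from torch import nn

logits = torch.randn(3, 5)
targets = torch.tensor([1, 0, 4])
eps, C = 0.1, 5

plain = nn.CrossEntropyLoss()(logits, targets)
smoothed = nn.CrossEntropyLoss(label_smoothing=eps)(logits, targets)

# Manual check: mix the one-hot targets with a uniform distribution over C classes
log_probs = F.log_softmax(logits, dim=1)
one_hot = F.one_hot(targets, num_classes=C).float()
soft_targets = (1 - eps) * one_hot + eps / C
manual = -(soft_targets * log_probs).sum(dim=1).mean()

assert torch.allclose(smoothed, manual, atol=1e-6)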

BCEWithLogitsLoss does not implement label_smoothing or ignore_index, but we can implement them ourselves; refer to pytorch-toolbelt:

from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn, Tensor

class SoftBCEWithLogitsLoss(nn.Module):

    __constants__ = [
        "weight",
        "pos_weight",
        "reduction",
        "ignore_index",
        "smooth_factor",
    ]

    def __init__(
        self,
        weight: Optional[torch.Tensor] = None,
        ignore_index: Optional[int] = -100,
        reduction: str = "mean",
        smooth_factor: Optional[float] = None,
        pos_weight: Optional[torch.Tensor] = None,
    ):
        """Drop-in replacement for torch.nn.BCEWithLogitsLoss with few additions: ignore_index and label_smoothing

        Args:
            ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient.
            smooth_factor: Factor to smooth target (e.g. if smooth_factor=0.1 then [1, 0, 1] -> [0.9, 0.1, 0.9])

        Shape
             - **y_pred** - torch.Tensor of shape NxCxHxW
             - **y_true** - torch.Tensor of shape NxHxW or Nx1xHxW
        """
        super().__init__()
        self.ignore_index = ignore_index
        self.reduction = reduction
        self.smooth_factor = smooth_factor
        self.register_buffer("weight", weight)
        self.register_buffer("pos_weight", pos_weight)

    def forward(self, y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
        """
        Args:
            y_pred: torch.Tensor of shape (N, C, H, W)
            y_true: torch.Tensor of shape (N, H, W)  or (N, 1, H, W)

        Returns:
            loss: torch.Tensor
        """

        if self.smooth_factor is not None:
            soft_targets = (1 - y_true) * self.smooth_factor + y_true * (1 - self.smooth_factor)
        else:
            soft_targets = y_true

        loss = F.binary_cross_entropy_with_logits(
            y_pred,
            soft_targets,
            self.weight,
            pos_weight=self.pos_weight,
            reduction="none",
        )

        if self.ignore_index is not None:
            not_ignored_mask = y_true != self.ignore_index
            loss *= not_ignored_mask.type_as(loss)

        if self.reduction == "mean":
            loss = loss.mean()

        if self.reduction == "sum":
            loss = loss.sum()

        return loss
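A minimal usage sketch (shapes and values are illustrative), smoothing the binary targets and ignoring pixels marked with the ignore value:

import torch

# Hypothetical segmentation-style batch: 2 images, 1 channel, 4x4 pixels
y_pred = torch.randn(2, 1, 4, 4)                    # raw logits
y_true = torch.randint(0, 2, (2, 1, 4, 4)).float()  # binary targets
y_true[0, 0, 0, 0] = -100                           # mark one pixel to be ignored

criterion = SoftBCEWithLogitsLoss(ignore_index=-100, smooth_factor=0.1)
loss = criterion(y_pred, y_true)
print(loss)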


Origin blog.csdn.net/qq_43456016/article/details/130459645