1. Square loss function (quadratic loss / MSELoss):
PyTorch implementation:
import torch
from torch import nn

loss = nn.MSELoss(reduction="none")
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
The parameters of nn.MSELoss() are as follows:
- size_average (bool, optional): Deprecated (see reduction). By default, the loss is averaged over each loss element in the batch. Note that for some losses there may be multiple elements per sample. If size_average is set to False, the loss is instead summed for each mini-batch. Ignored when reduce is False. Default: True
- reduce (bool, optional): Deprecated (see reduction). By default, the loss is averaged or summed over the observations of each mini-batch, depending on size_average. When reduce is False, the loss per batch element is returned instead and size_average is ignored. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of them overrides reduction. Default: 'mean'
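To make the reduction modes concrete, here is a minimal sketch (the tensors are random, so the values are illustrative only):

import torch
from torch import nn

input = torch.randn(3, 5)
target = torch.randn(3, 5)

per_element = nn.MSELoss(reduction="none")(input, target)  # shape (3, 5)
mean_loss = nn.MSELoss(reduction="mean")(input, target)    # scalar
sum_loss = nn.MSELoss(reduction="sum")(input, target)      # scalar

assert torch.allclose(mean_loss, per_element.mean())
assert torch.allclose(sum_loss, per_element.sum())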
2. Cross-entropy loss function
- Binary cross-entropy loss function
PyTorch implementation:
import torch
from torch import nn

target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
output = torch.full([10, 64], 1.5)  # a prediction (logit)
pos_weight = torch.ones([64])  # all weights equal to 1
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target)  # -log(sigmoid(1.5))
The parameters of nn.BCEWithLogitsLoss() are as follows:
- weight (Tensor, optional): a manual rescaling weight for each class. If given, it must be a tensor of size C; otherwise, all weights are treated as 1.
- size_average (bool, optional): Deprecated (see reduction). By default, the loss is averaged over each loss element in the batch. Note that for some losses there may be multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: None
- reduce (bool, optional): Deprecated (see reduction). By default, the observations of each mini-batch are averaged or summed, depending on size_average. When reduce is False, the loss per batch element is returned instead and size_average is ignored. Default: None
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': a weighted mean of the output is taken; 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of them overrides reduction. Default: 'mean'
- pos_weight (Tensor, optional): a weight for positive examples. It must be a vector whose length equals the number of classes.
Note 1: The difference between BCEWithLogitsLoss and BCELoss is that the former applies sigmoid() to the input internally and fuses it with the loss computation, using the log-sum-exp trick for numerical stability.
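A minimal sketch of that relationship (random tensors; the atol is just a loose numerical tolerance):

import torch
from torch import nn

logits = torch.randn(4, 3)
targets = torch.rand(4, 3)

fused = nn.BCEWithLogitsLoss()(logits, targets)               # sigmoid fused in
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)       # explicit sigmoid
assert torch.allclose(fused, two_step, atol=1e-6)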
Note 2: The pos_weight parameter specifies the weight of positive examples. Its purpose is to give positive examples a higher weight when dealing with class imbalance, so as to rebalance the classes. By default all classes are considered equally important, i.e. every class has a weight of 1. If a class has fewer examples than the others, you can use pos_weight to give that class a higher weight. pos_weight is a vector whose length is the number of classes, and each element is the weight of the corresponding class. When a class's weight is set to a value greater than 1, that class's contribution to the loss becomes more important.
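A minimal sketch with made-up numbers: suppose class 1 is roughly 100x rarer than the others, so it gets a pos_weight of 100:

import torch
from torch import nn

num_classes = 3
pos_weight = torch.tensor([1.0, 100.0, 1.0])  # upweight the rare class 1
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_classes)
targets = torch.randint(0, 2, (8, num_classes)).float()
loss = criterion(logits, targets)  # positives of class 1 count ~100x more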
- Multi-class cross-entropy loss function
PyTorch implementation:
import torch
from torch import nn

loss = nn.CrossEntropyLoss()
# Example of a target with class indices
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
The parameters of nn.CrossEntropyLoss() are as follows:
- weight (Tensor, optional): a manual rescaling weight for each class. If given, it must be a tensor of size C.
- size_average (bool, optional): Deprecated (see reduction). By default, the losses are averaged over each batch. Note that for some losses there may be multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: True
- ignore_index (int, optional): Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Note that ignore_index only applies when the target contains class indices (see the sketch after this list).
- reduce (bool, optional): Deprecated (see reduction). By default, the loss is averaged or summed over the observations of each mini-batch, depending on size_average. When reduce is False, the loss per batch element is returned instead and size_average is ignored. Default: True
- reduction (str, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': a weighted mean of the output is taken; 'sum': the output is summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of them overrides reduction. Default: 'mean'
- label_smoothing (float, optional): the amount of smoothing to use when computing the loss, in the range [0.0, 1.0], where 0.0 means no smoothing. The targets become a mixture of the original labels and the uniform distribution, as described in Rethinking the Inception Architecture for Computer Vision. Default: 0.0
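A minimal sketch of ignore_index (the value -1 is an arbitrary choice for this illustration):

import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss(ignore_index=-1)
input = torch.randn(3, 5)
target = torch.tensor([1, -1, 3])  # the second sample is ignored
loss = loss_fn(input, target)      # averaged over the 2 non-ignored targets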
Note 1: In PyTorch, CrossEntropyLoss() fuses Softmax, log, and NLLLoss into a single operation:
- The Softmax outputs all lie between 0 and 1, so after applying ln the values range from negative infinity to 0.
- Taking the log of the Softmax results turns multiplication into addition, which reduces the amount of computation while preserving the monotonicity of the function.
- NLLLoss is then computed on the resulting log-probabilities.
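A minimal sketch verifying that decomposition:

import torch
from torch import nn
import torch.nn.functional as F

input = torch.randn(3, 5)
target = torch.tensor([0, 2, 4])

ce = nn.CrossEntropyLoss()(input, target)
nll = nn.NLLLoss()(F.log_softmax(input, dim=1), target)
assert torch.allclose(ce, nll)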
Note 2: label_smoothing is a technique for reducing overconfidence when computing the cross-entropy loss. When training a classifier, labels are usually treated as one-hot vectors, i.e. only the correct class has probability 1 and all the rest are 0. However, this can produce a model that is extremely confident in its predicted classes even when the predicted distribution is far from the actual one. Label smoothing alleviates this problem: the probability of the correct label is reduced from 1 to 1-ε, while the probability of each wrong label is increased from 0 to ε/(C-1), where C is the number of classes. This reduces the influence of the correct label on the predicted distribution, makes the model more cautious, and avoids overfitting to noise in the training set. In nn.CrossEntropyLoss, the label_smoothing parameter specifies the strength of the smoothing, ranging from 0 to 1. By default no label smoothing is applied, i.e. label_smoothing=0.
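One detail worth noting: PyTorch's implementation mixes the one-hot target with a uniform distribution over all C classes, so each class receives ε/C and the correct class receives 1-ε+ε/C. A minimal sketch verifying that behavior:

import torch
from torch import nn
import torch.nn.functional as F

eps, C = 0.1, 5
input = torch.randn(3, C)
target = torch.tensor([0, 2, 4])

smoothed = nn.CrossEntropyLoss(label_smoothing=eps)(input, target)

one_hot = F.one_hot(target, C).float()
soft = (1 - eps) * one_hot + eps / C        # uniform mass over all C classes
manual = -(soft * F.log_softmax(input, dim=1)).sum(dim=1).mean()
assert torch.allclose(smoothed, manual)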
BCEWithLogitsLoss does not implement label_smoothing or ignore_index, but we can implement them ourselves; see pytorch-toolbelt:
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn, Tensor


class SoftBCEWithLogitsLoss(nn.Module):
    __constants__ = [
        "weight",
        "pos_weight",
        "reduction",
        "ignore_index",
        "smooth_factor",
    ]

    def __init__(
        self,
        weight: Optional[torch.Tensor] = None,
        ignore_index: Optional[int] = -100,
        reduction: str = "mean",
        smooth_factor: Optional[float] = None,
        pos_weight: Optional[torch.Tensor] = None,
    ):
        """Drop-in replacement for torch.nn.BCEWithLogitsLoss with a few
        additions: ignore_index and label smoothing.

        Args:
            ignore_index: Specifies a target value that is ignored and does not
                contribute to the input gradient.
            smooth_factor: Factor to smooth the target (e.g. if smooth_factor=0.1
                then [1, 0, 1] -> [0.9, 0.1, 0.9]).

        Shape:
            - **y_pred** - torch.Tensor of shape NxCxHxW
            - **y_true** - torch.Tensor of shape NxHxW or Nx1xHxW
        """
        super().__init__()
        self.ignore_index = ignore_index
        self.reduction = reduction
        self.smooth_factor = smooth_factor
        # Buffers move with the module across devices but are not trained.
        self.register_buffer("weight", weight)
        self.register_buffer("pos_weight", pos_weight)

    def forward(self, y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
        """
        Args:
            y_pred: torch.Tensor of shape (N, C, H, W)
            y_true: torch.Tensor of shape (N, H, W) or (N, 1, H, W)

        Returns:
            loss: torch.Tensor
        """
        if self.smooth_factor is not None:
            # Label smoothing: 1 -> 1 - smooth_factor, 0 -> smooth_factor
            soft_targets = (1 - y_true) * self.smooth_factor + y_true * (1 - self.smooth_factor)
        else:
            soft_targets = y_true

        loss = F.binary_cross_entropy_with_logits(
            y_pred,
            soft_targets,
            self.weight,
            pos_weight=self.pos_weight,
            reduction="none",
        )

        if self.ignore_index is not None:
            # Zero out the loss wherever the target equals ignore_index.
            not_ignored_mask = y_true != self.ignore_index
            loss *= not_ignored_mask.type_as(loss)

        if self.reduction == "mean":
            loss = loss.mean()

        if self.reduction == "sum":
            loss = loss.sum()

        return loss
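A usage sketch (the shapes here are assumed, mimicking a binary segmentation-style target):

criterion = SoftBCEWithLogitsLoss(smooth_factor=0.1)
y_pred = torch.randn(2, 1, 4, 4)                    # logits
y_true = torch.randint(0, 2, (2, 1, 4, 4)).float()  # hard binary targets
loss = criterion(y_pred, y_true)                    # smoothed BCE, scalar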