1. Calling the function
PyTorch's cross-entropy loss function has the following signature:
torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
In practice it is usually written as:
import torch.nn.functional as F
F.cross_entropy(input, target)
2. Parameter description
- input (Tensor) – of shape (N, C), where C = number of classes; or (N, C, H, W) in the case of 2D loss; or (N, C, d1, d2, ..., dK) with K ≥ 1 in the case of K-dimensional loss.
- target (Tensor) – of shape (N), where each value satisfies 0 ≤ target[i] ≤ C-1; or (N, d1, d2, ..., dK) with K ≥ 1 for a K-dimensional loss.
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it must be a tensor of size C.
- size_average (bool, optional) – deprecated. By default, the loss is averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: True
- ignore_index (int, optional) – specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Default: -100
- reduce (bool, optional) – deprecated. By default, the loss is averaged or summed over observations for each mini-batch depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
- reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are being deprecated; in the meantime, specifying either of those two arguments overrides reduction. Default: 'mean'
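To make weight, ignore_index, and reduction concrete, here is a minimal sketch (the tensor values and variable names are invented for illustration):
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                       # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])
w = torch.tensor([1.0, 2.0, 1.0])                # class 1 weighs twice as much as the others

# reduction='none' returns one loss value per sample instead of a single scalar
per_sample = F.cross_entropy(logits, labels, weight=w, reduction='none')
print(per_sample.shape)                          # torch.Size([4])

# Samples whose label equals ignore_index are skipped and do not contribute to the gradient
labels_with_padding = torch.tensor([0, 2, -100, 2])
mean_loss = F.cross_entropy(logits, labels_with_padding, ignore_index=-100)
print(mean_loss)                                 # scalar, averaged over the 3 non-ignored samples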
3. Example
Code:
import torch
import torch.nn.functional as F
input = torch.randn(3, 5, requires_grad=True)        # raw scores (logits) for 3 samples and 5 classes
target = torch.randint(5, (3,), dtype=torch.int64)   # a class index in [0, 4] for each sample
loss = F.cross_entropy(input, target)
loss.backward()
Variable values:
input:
tensor([[-0.6314, 0.6876, 0.8655, -1.8212, 0.0963],
[-0.5437, 0.2778, -0.1662, -0.0784, -0.6565],
[-0.1164, 0.3882, 0.2487, -0.5318, 0.3943]], requires_grad=True)
target:
tensor([1, 0, 0])
loss:
tensor(1.6557, grad_fn=<NllLossBackward>)
4. Notes
The Python implementation of torch.nn.functional.cross_entropy is:
def cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100,
reduce=None, reduction='mean'):
if size_average is not None or reduce is not None:
reduction = _Reduction.legacy_get_string(size_average, reduce)
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
Note 1: The input does not need to be passed through softmax first. The raw scores coming straight out of the final fully connected layer can be fed to cross_entropy, because cross_entropy applies log_softmax to the input internally.
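As a quick sanity check of Note 1, the following sketch (values chosen at random) verifies that feeding raw scores to cross_entropy gives the same result as applying log_softmax manually and calling nll_loss:
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)      # raw outputs of the last fully connected layer, no softmax applied
target = torch.randint(5, (3,))

a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(a, b))     # True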
Note 2: The labels do not need to be one-hot encoded, because nll_loss already performs the equivalent of a one-hot lookup internally. Keep in mind that the class indices must start from 0, so labels recorded as [1, 2, 3] should first be remapped to [0, 1, 2].
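For example, with a hypothetical three-class problem whose labels were originally recorded as 1, 2, 3:
import torch
import torch.nn.functional as F

logits = torch.randn(3, 3)          # 3 samples, 3 classes
raw_labels = torch.tensor([1, 2, 3])

# Shift to 0-based class indices; no one-hot encoding is needed
target = raw_labels - 1             # tensor([0, 1, 2])
loss = F.cross_entropy(logits, target)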
Official documentation: torch.nn.functional — PyTorch master documentation https://pytorch.org/docs/1.2.0/nn.functional.html#torch.nn.functional.cross_entropy