PyTorch's cross-entropy loss function (cross_entropy) explained (with Python code)

1. Call

First, PyTorch's cross-entropy loss function is called as follows:

torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

Generally it will be written as:

import torch.nn.functional as F
F.cross_entropy(input, target)

2. Parameter description

  • input (Tensor) – of shape (N, C), where C is the number of classes; of shape (N, C, H, W) in the case of a 2D loss; or, more generally, of shape (N, C, d1, d2, ..., dK) with K ≥ 1 in the case of a K-dimensional loss.

  • target (Tensor) – of shape (N), where each value satisfies 0 ≤ target[i] ≤ C-1; or, for a K-dimensional loss with K ≥ 1, of shape (N, d1, d2, ..., dK).

  • weight (Tensor, optional) – a manual rescaling weight for each class. If given, it must be a tensor of size C.

  • size_average (bool, optional) – deprecated. By default, the loss is averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each mini-batch. Ignored when reduce is False. Default: True

  • ignore_index (int, optional) – specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Default: -100

  • reduce (bool, optional) – deprecated. By default, the loss is averaged or summed over the observations of each mini-batch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True

  • reduction (string, optional) – specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are being deprecated; in the meantime, specifying either of those two parameters overrides reduction. Default: 'mean'. A short sketch of these parameters in use follows this list.
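As a quick sketch of how weight, ignore_index, and reduction interact (the tensors and values below are made up for illustration and are not part of the original example):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                     # N = 4 samples, C = 3 classes
target = torch.tensor([0, 2, 1, 2])

# weight: per-class rescaling factors, must have size C (here class 1 counts double)
w = torch.tensor([1.0, 2.0, 1.0])
loss_weighted = F.cross_entropy(logits, target, weight=w)

# ignore_index: samples whose label equals -100 (the default) are skipped entirely
target_ignored = torch.tensor([0, -100, 1, 2])
loss_ignored = F.cross_entropy(logits, target_ignored)

# reduction: 'none' keeps one loss per sample, 'sum' adds them, 'mean' (default) averages
per_sample = F.cross_entropy(logits, target, reduction='none')   # shape (4,)
summed = F.cross_entropy(logits, target, reduction='sum')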

3. Example

code:

import torch
import torch.nn.functional as F
input = torch.randn(3, 5, requires_grad=True)
target = torch.randint(5, (3,), dtype=torch.int64)
loss = F.cross_entropy(input, target)
loss.backward()

The resulting variable values:


input:
tensor([[-0.6314,  0.6876,  0.8655, -1.8212,  0.0963],
        [-0.5437,  0.2778, -0.1662, -0.0784, -0.6565],
        [-0.1164,  0.3882,  0.2487, -0.5318,  0.3943]], requires_grad=True)
target:
tensor([1, 0, 0])
loss:
tensor(1.6557, grad_fn=<NllLossBackward>)
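Since cross_entropy averages -log softmax(input), picked at the target index, over the batch, the printed scalar can be reproduced by hand. The following minimal check (not part of the original code) continues from the example above and reuses its input and target:

log_probs = F.log_softmax(input, dim=1)               # log of softmax over the 5 classes
manual = -log_probs[torch.arange(3), target].mean()   # mean of -log p(target class)
print(manual)                                         # equals the loss above, tensor(1.6557)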

4. Notes

The Python implementation of torch.nn.functional.cross_entropy is:

def cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100,
                  reduce=None, reduction='mean'):
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)

Note 1: The input tensor does not need to be passed through softmax first; the raw output of the final fully connected (fc) layer can be fed directly to cross_entropy, because cross_entropy already applies (log-)softmax to its input internally.
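A small illustrative check of Note 1 (the tensors here are made up): if the logits have already been softmaxed, softmax is effectively applied twice, which yields a different, incorrect loss.

import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)                    # raw fc-layer output, no softmax applied
target = torch.tensor([1, 0, 0])

correct = F.cross_entropy(logits, target)
# applying softmax yourself is a mistake: cross_entropy applies log_softmax on top of it
double_softmax = F.cross_entropy(F.softmax(logits, dim=1), target)
print(correct, double_softmax)                # the two values generally differ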

Note 2: There is no need to one-hot encode the labels, because nll_loss already performs the equivalent lookup internally. However, class labels must start from 0: if the classes are [1, 2, 3], they should be remapped to [0, 1, 2], as sketched below.
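For example (hypothetical labels, just to illustrate Note 2), labels that start at 1 can simply be shifted into the 0-based range expected by cross_entropy:

import torch
import torch.nn.functional as F

raw_labels = torch.tensor([1, 2, 3, 1])   # classes labelled 1..3
target = raw_labels - 1                   # remap to 0..2 -> tensor([0, 1, 2, 0])

logits = torch.randn(4, 3)                # C = 3 classes
loss = F.cross_entropy(logits, target)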

The official documentation is here: torch.nn.functional — PyTorch documentation: https://pytorch.org/docs/1.2.0/nn.functional.html#torch.nn.functional.cross_entropy

Putting this together took some effort; likes, bookmarks, and follows are welcome!


Original article: blog.csdn.net/qq_38308388/article/details/121640312