PyTorch documentation: torch.nn.KLDivLoss
import torch
import torch.nn as nn
import torch.nn.functional as F

kl_loss = nn.KLDivLoss(reduction="batchmean")
# input should be a distribution in the log space
input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
# Sample a batch of distributions. Usually this would come from the dataset
target = F.softmax(torch.rand(3, 5), dim=1)
output = kl_loss(input, target)

# Alternatively, pass the target in log space as well
kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
log_target = F.log_softmax(torch.rand(3, 5), dim=1)
output = kl_loss(input, log_target)
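To see what reduction="batchmean" actually computes, here is a minimal sketch that recomputes the loss by hand, assuming the pointwise formula from the PyTorch docs: sum target * (log(target) - input) over all elements, then divide by the batch size. The seed, shapes and variable names (log_pred, builtin, manual) are illustrative and not from the original example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed for reproducibility

batch_size, num_classes = 3, 5
# Predictions as log-probabilities, which is what KLDivLoss expects for `input`
log_pred = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
# Targets as probabilities (each row sums to 1)
target = F.softmax(torch.rand(batch_size, num_classes), dim=1)

# Built-in loss with batchmean reduction
builtin = nn.KLDivLoss(reduction="batchmean")(log_pred, target)

# Manual version: sum of the pointwise terms divided by the batch size
manual = (target * (target.log() - log_pred)).sum() / batch_size

# Same loss, but with the target passed in log space
builtin_log = nn.KLDivLoss(reduction="batchmean", log_target=True)(log_pred, target.log())

print(builtin.item(), manual.item(), builtin_log.item())
```

batchmean divides by the batch size only, so the result is the per-sample KL divergence averaged over the batch; reduction="mean" would instead divide by the total number of elements.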
TensorFlow documentation: tf.keras.losses.KLDivergence
import tensorflow as tf

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()
# 0.458
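The 0.458 can be reproduced by hand. The pure-Python sketch below assumes that Keras clips both arguments to [epsilon, 1] with its default epsilon of 1e-7, computes sum(y_true * log(y_true / y_pred)) per sample, and then averages over the batch. The helper name keras_style_kl is hypothetical. Clipping is why the all-zero second row of y_true contributes almost nothing instead of producing log(0).

```python
import math

EPSILON = 1e-7  # assumed: Keras' default backend epsilon


def keras_style_kl(y_true, y_pred, eps=EPSILON):
    """Hypothetical helper: mean over rows of sum(y_true * log(y_true / y_pred)),
    with both inputs clipped to [eps, 1] first."""
    per_row = []
    for t_row, p_row in zip(y_true, y_pred):
        total = 0.0
        for t, p in zip(t_row, p_row):
            t = min(max(t, eps), 1.0)  # clip y_true
            p = min(max(p, eps), 1.0)  # clip y_pred
            total += t * math.log(t / p)
        per_row.append(total)
    # 'sum_over_batch_size' reduction: average the per-sample losses
    return sum(per_row) / len(per_row)


y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
loss = keras_style_kl(y_true, y_pred)
print(round(loss, 3))  # 0.458
```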
Let's look at another example:
Calculation in TensorFlow:
The two distributions are identical, so the KL divergence is 0.
Calculation in PyTorch:
The two distributions are clearly identical, so why does PyTorch not output 0?
This is because, according to the PyTorch documentation, KLDivLoss expects its input (y_pred) to already be in log space, so you have to take the log of y_pred yourself. The documentation gives the pointwise loss as:
if not log_target:  # default: target holds probabilities
    loss_pointwise = target * (target.log() - input)
else:  # target holds log-probabilities
    loss_pointwise = target.exp() * (target - input)
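The formula above can be mirrored in plain Python to check that both branches compute the same quantity, KL(y_true || y_pred) = sum(y_true * log(y_true / y_pred)), as long as input is log(y_pred). The function kl_pointwise and the example distributions are hypothetical, chosen only for illustration; the sketch assumes strictly positive targets, since math.log(0) raises a ValueError.

```python
import math


def kl_pointwise(input, target, log_target=False):
    """Hypothetical pure-Python mirror of the pointwise formula in the PyTorch docs.

    `input` holds log-probabilities; `target` holds probabilities, or
    log-probabilities when log_target=True. Assumes strictly positive targets.
    """
    if not log_target:  # default
        return [t * (math.log(t) - i) for i, t in zip(input, target)]
    return [math.exp(t) * (t - i) for i, t in zip(input, target)]


# Hypothetical distributions for illustration
y_true = [0.1, 0.2, 0.3, 0.4]
y_pred = [0.25, 0.25, 0.25, 0.25]
log_pred = [math.log(p) for p in y_pred]
log_true = [math.log(t) for t in y_true]

kl_prob = sum(kl_pointwise(log_pred, y_true))
kl_log = sum(kl_pointwise(log_pred, log_true, log_target=True))
# Both should equal KL(y_true || y_pred) computed directly
kl_direct = sum(t * math.log(t / p) for t, p in zip(y_true, y_pred))
# Identical distributions: every pointwise term is exactly 0
kl_same = sum(kl_pointwise(log_true, y_true))

print(kl_prob, kl_log, kl_direct, kl_same)
```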
Once y_pred is passed through log() first, the calculation is correct and PyTorch also returns 0 for identical distributions.
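The original example values are not shown, so the sketch below assumes a single distribution [0.1, 0.2, 0.3, 0.4] used as both y_true and y_pred. It contrasts passing probabilities directly as input (the mistake) with passing their log (the fix).

```python
import torch
import torch.nn as nn

# Assumed example: y_true and y_pred are the same distribution (batch of 1)
y_true = torch.tensor([[0.1, 0.2, 0.3, 0.4]])
y_pred = y_true.clone()

kl_loss = nn.KLDivLoss(reduction="batchmean")

# Mistake: passing probabilities as `input`, so the result is not 0
wrong = kl_loss(y_pred, y_true)
# Fix: take the log of y_pred first, giving 0 for identical distributions
right = kl_loss(y_pred.log(), y_true)

print(wrong.item(), right.item())
```

Without the log, each term becomes y_true * (log(y_true) - y_pred); for these values the sum is about -1.58. A negative "divergence" like this is a good hint that the input was not in log space.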
Note: in practice you usually need to apply softmax so that the values form a probability distribution, i.e. the values in y_true add up to 1 (for example, 0.1+0.2+0.3+0.4=1).
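As a quick illustration of this note, softmax turns each row of raw scores into a distribution that sums to 1, and F.log_softmax returns the log of that softmax in one numerically more stable step, which is the form KLDivLoss expects for input. The random scores are a hypothetical stand-in for model outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(3, 5)  # hypothetical raw, unnormalized scores

probs = F.softmax(scores, dim=1)          # each row now sums to 1
log_probs = F.log_softmax(scores, dim=1)  # log of softmax, computed stably

print(probs.sum(dim=1))                   # each entry should be (close to) 1
print(torch.allclose(log_probs, probs.log(), atol=1e-5))
```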
The following is an annotated PyTorch example:
kl_loss = nn.KLDivLoss(reduction="batchmean")
# input should be a distribution in the log space
input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)  # y_pred: apply softmax, then take the log
# Sample a batch of distributions. Usually this would come from the dataset
target = F.softmax(torch.rand(3, 5), dim=1)  # y_true: apply softmax
output = kl_loss(input, target)

kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
log_target = F.log_softmax(torch.rand(3, 5), dim=1)  # y_true in log space
output = kl_loss(input, log_target)