Video (bilibili): 55. PyTorch's cross entropy, information entropy, binary cross entropy, negative log likelihood, KL divergence, and cosine similarity: principles and code walkthrough
Here is a more complete summary of the video content.
1.Cross Entropy Loss
PyTorch API:CrossEntropyLoss — PyTorch 1.13 documentation
1.1 Introduction
Parameters:
- weight: (Tensor, optional) one weight per class. This is typically used to balance the loss when the number of training samples differs greatly between classes.
- ignore_index: (int) a target value to ignore; samples whose target equals this value do not contribute to the loss.
- reduction: (string) one of 'none', 'mean', 'sum'. 'none' applies no reduction and returns one loss per sample, with the same shape as a class-index target; 'mean' averages the loss over the batch; 'sum' sums the loss over the batch.
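The behavior of these three parameters can be sketched as follows. This is a minimal illustration, not taken from the video: the tensor shapes and the class weights are made-up values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # [batch_size, num_class]
target = torch.tensor([0, 2, 1, 2])   # class indices, shape [batch_size]

# reduction: 'none' keeps one loss per sample, 'mean' and 'sum' return a scalar
loss_none = nn.CrossEntropyLoss(reduction="none")(logits, target)
loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, target)
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits, target)
print(loss_none.shape, loss_mean.shape, loss_sum.shape)

# weight: one weight per class (illustrative values);
# with 'mean' the result is a weighted average of the per-sample losses
class_weight = torch.tensor([1.0, 2.0, 0.5])
loss_weighted = nn.CrossEntropyLoss(weight=class_weight)(logits, target)
sample_weight = class_weight[target]
manual_weighted = (sample_weight * loss_none).sum() / sample_weight.sum()
print(torch.allclose(loss_weighted, manual_weighted))

# ignore_index: samples whose target equals this value are skipped
loss_ignore = nn.CrossEntropyLoss(ignore_index=2)(logits, target)
kept = target != 2
print(torch.allclose(loss_ignore, loss_none[kept].mean()))
```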
Input:
Target comes in two forms. If Target holds class indices, its shape is the shape of Input without the channel (class) dimension; if Target holds class probabilities, its shape is the same as the shape of Input. Please see the official documentation for details.
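The two target forms can be compared directly. The sketch below assumes PyTorch 1.10 or later (where probability targets are supported) and uses a one-hot distribution as the probability target, which should give the same loss as the corresponding class indices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_class = 2, 7
logits = torch.randn(batch_size, num_class)                  # Input: [BS, NC]
index_target = torch.randint(num_class, size=(batch_size,))  # class indices: [BS]
prob_target = F.one_hot(index_target, num_class).float()     # class probabilities: [BS, NC]

loss_fn = nn.CrossEntropyLoss()
loss_from_index = loss_fn(logits, index_target)
loss_from_prob = loss_fn(logits, prob_target)

print(index_target.shape, prob_target.shape)
print(torch.allclose(loss_from_index, loss_from_prob))
```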
1.2 Code examples
import torch
import torch.nn as nn
import torch.nn.functional as F
# logits shape:[BS, NC]
batch_size = 2
num_class = 7
logits = torch.randn(batch_size, num_class) # input unnormalized score
target_indices = torch.randint(num_class, size=(batch_size,)) # delta target distribution: integer class indices, shape: (2,)
target_logits = torch.randn(batch_size, num_class) # non-delta target distribution (before softmax), shape: [2, 7]
ce_loss_fn = nn.CrossEntropyLoss() # instantiate the loss
## method 1 for CE loss
ce_loss = ce_loss_fn(logits, target_indices)
print(f"cross entropy loss1: {ce_loss}")
## method 2 for CE loss
ce_loss = ce_loss_fn(logits, torch.softmax(target_logits, dim=-1))
# softmax normalizes target_logits over the channel dimension into a probability distribution that sums to 1
print(f"cross entropy loss2: {ce_loss}")
The output results are all scalars:
cross entropy loss1: 3.269336700439453
cross entropy loss2: 2.0783615112304688
2.Negative Log Likelihood loss (NLL loss)
PyTorch API:NLLLoss — PyTorch 1.13 documentation
2.1 Introduction
For Negative Log Likelihood loss (NLL loss), the input is the log-probabilities of each class, and the target can only be class indices. In fact, wherever cross entropy loss is used with class-index targets, negative log-likelihood loss can be used as well, which is verified in the code section below.
2.2 Code examples
nll_fn = nn.NLLLoss()
nll_loss = nll_fn(torch.log(torch.softmax(logits, dim=-1)), target_indices)
print(f"negative log-likelihood loss: {nll_loss}")
Reusing the logits initialized above, apply softmax over the channel dimension to obtain normalized probabilities, then take the logarithm to obtain the log-probabilities of each class for the batch_size samples. target_indices holds the class indices.
The output result is a scalar:
negative log-likelihood loss: 3.269336700439453
If you reuse the logits and target_indices from the CE loss example above, you can see that cross entropy loss1 and the negative log-likelihood loss give the same output. In other words, cross entropy loss = softmax + log + NLL loss (this is a personal mnemonic and may not be formal).
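The same relation can be checked with the functional API. This sketch uses F.log_softmax in place of torch.log(torch.softmax(...)); the two are mathematically equivalent, and log_softmax is generally the more numerically stable choice. The shapes are the same illustrative ones used above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 7)
target_indices = torch.randint(7, size=(2,))

ce = F.cross_entropy(logits, target_indices)

# softmax + log in one step, followed by NLL loss
log_probs = F.log_softmax(logits, dim=-1)
nll = F.nll_loss(log_probs, target_indices)

# NLL loss written out by hand: pick the log-probability of the true class
manual = -log_probs[torch.arange(2), target_indices].mean()

print(torch.allclose(ce, nll), torch.allclose(ce, manual))
```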
3.Kullback-Leibler divergence loss (KL loss)
PyTorch API:KLDivLoss — PyTorch 1.13 documentation
3.1 Introduction
P and Q can be viewed as two systems (probability distributions). The KL divergence is defined as:
$$D_{KL}(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log\left(\frac{P(x)}{Q(x)}\right)$$
Here P comes first, which means P is taken as the reference against which the difference of Q is measured. KL divergence is non-negative, and if the two distributions P and Q are the same, then $D_{KL}(P \| Q) = 0$.
In the formula on the official website, P corresponds to $y_{true}$ and Q corresponds to $y_{pred}$.
Note that input and target should have the same shape. Moreover, to avoid underflow problems, input is expected in log space, while target can be in either log space or linear space (selected with the log_target argument).
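The definition and the log-space convention can be checked against each other. In this sketch (illustrative shapes, not from the video), the formula is written out by hand and compared with F.kl_div, once with a linear-space target and once with a log-space target. reduction="batchmean" is used because it sums over classes and averages over the batch, which matches the definition.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
log_q = F.log_softmax(torch.randn(2, 7), dim=-1)  # input: prediction Q, in log space
p = torch.softmax(torch.randn(2, 7), dim=-1)      # target: P, in linear space

# sum_x P(x) * (log P(x) - log Q(x)) per sample, then averaged over the batch
manual_kl = (p * (p.log() - log_q)).sum(dim=-1).mean()

kl_linear_target = F.kl_div(log_q, p, reduction="batchmean")
kl_log_target = F.kl_div(log_q, p.log(), reduction="batchmean", log_target=True)

print(torch.allclose(manual_kl, kl_linear_target))
print(torch.allclose(manual_kl, kl_log_target))
```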
3.2 Code examples
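# note: 'mean' averages over all elements (batch_size * num_class);
# reduction='batchmean' is the option that matches the mathematical definition of KL divergence
kld_loss_fn = nn.KLDivLoss(reduction='mean')
kld_loss = kld_loss_fn(torch.log(torch.softmax(logits, dim=-1)), torch.softmax(target_logits, dim=-1))
print(f"Kullback-Leibler divergence loss : {kld_loss}")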
The output result is a scalar:
Kullback-Leibler divergence loss : 0.07705999910831451
4. Verify that $D_{KL}(P \| Q) = H(P, Q) - H(P)$
4.1 Introduction
First, what the formula says: the KL divergence of $P$ and $Q$ equals the cross entropy of $P$ and $Q$ minus the information entropy of $P$, where the information entropy of $P$ is the information entropy of the target distribution.
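One way to see why this holds is to expand the logarithm in the KL definition from section 3.1. The derivation below assumes the standard definitions of cross entropy, $H(P, Q) = -\sum_x P(x) \log Q(x)$, and information entropy, $H(P) = -\sum_x P(x) \log P(x)$, which are not spelled out in the text above.

```latex
\begin{aligned}
D_{KL}(P \| Q)
  &= \sum_{x \in \mathcal{X}} P(x) \log\left(\frac{P(x)}{Q(x)}\right) \\
  &= \sum_{x \in \mathcal{X}} P(x) \log P(x) - \sum_{x \in \mathcal{X}} P(x) \log Q(x) \\
  &= \left(-\sum_{x \in \mathcal{X}} P(x) \log Q(x)\right) - \left(-\sum_{x \in \mathcal{X}} P(x) \log P(x)\right) \\
  &= H(P, Q) - H(P)
\end{aligned}
```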
4.2 Code verification
# 4. Verify CE = IE + KLD (cross entropy = information entropy + KL divergence)
# Cross entropy
ce_loss_fn_sample = nn.CrossEntropyLoss(reduction="none") # returns the loss of each sample, without averaging
ce_loss_sample = ce_loss_fn_sample(logits, torch.softmax(target_logits, dim=-1)) # (b,)
print(f"cross entropy loss sample: {ce_loss_sample}")
# KL divergence
kld_loss_fn_sample = nn.KLDivLoss(reduction="none")
kld_loss_sample = kld_loss_fn_sample(torch.log(torch.softmax(logits, dim=-1)), torch.softmax(target_logits, dim=-1)).sum(-1)
print(f"Kullback-Leibler divergence loss sample: {kld_loss_sample}")
# Information entropy of the target distribution
target_information_entropy = torch.distributions.Categorical(probs=torch.softmax(target_logits, dim=-1)).entropy()
print(f"information entropy sample: {target_information_entropy}") # IE is a constant; if the target distribution is a delta distribution, IE = 0
# Check whether CE = IE + KLD holds
print(torch.allclose(ce_loss_sample, kld_loss_sample + target_information_entropy)) # compare floating-point tensors within a tolerance
The output results are as follows:
cross entropy loss sample: tensor([1.7736, 2.5585])
Kullback-Leibler divergence loss sample: tensor([0.1108, 1.1138])
information entropy sample: tensor([1.6628, 1.4447])
True
As you can see, the final output is True, which shows that the formula holds. It is worth mentioning that since the target in a dataset is fixed, its information entropy is also fixed and is a constant. Therefore, optimizing the cross-entropy loss is equivalent to optimizing the KL divergence loss; the two losses differ in value only by a constant.
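Since the two losses differ only by a constant, their gradients with respect to the logits should be identical. The sketch below checks this under a few assumptions: PyTorch 1.10 or later (for probability targets in F.cross_entropy), illustrative shapes, and reduction="batchmean" for the KL divergence so that both losses are averaged over the batch in the same way.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 7, requires_grad=True)
target_probs = torch.softmax(torch.randn(2, 7), dim=-1)  # fixed target distribution

# Cross entropy against a probability target, averaged over the batch
ce = F.cross_entropy(logits, target_probs)
(grad_ce,) = torch.autograd.grad(ce, logits)

# KL divergence on the same inputs, also averaged over the batch
kld = F.kl_div(F.log_softmax(logits, dim=-1), target_probs, reduction="batchmean")
(grad_kld,) = torch.autograd.grad(kld, logits)

# The values differ by the (constant) entropy of the target, the gradients do not
target_entropy = -(target_probs * target_probs.log()).sum(dim=-1).mean()
grads_match = torch.allclose(grad_ce, grad_kld, atol=1e-6)
print(grads_match, (ce - kld).item(), target_entropy.item())
```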
5.Binary Cross Entropy loss (BCE loss)
PyTorch API:BCELoss — PyTorch 1.13 documentation
5.1 Introduction
The losses introduced above are typically used for multi-class classification, while BCE loss (binary cross entropy) is suited to binary classification. The formula in the API is as follows:
$$l_n = -w_n \left[ y_n \cdot \log(x_n) + (1 - y_n) \cdot \log(1 - x_n) \right]$$
Target and Input must have the same shape, and the target values should be between 0 and 1.
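The formula can be written out term by term and compared with nn.BCELoss. This is a minimal sketch with $w_n = 1$ and made-up inputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.sigmoid(torch.randn(4))       # Input: probabilities in (0, 1)
y = torch.tensor([0.0, 1.0, 1.0, 0.0])  # Target: same shape as Input, values in [0, 1]

# l_n = -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)], with w_n = 1
manual = -(y * torch.log(x) + (1 - y) * torch.log(1 - x))

per_sample = nn.BCELoss(reduction="none")(x, y)
mean_loss = nn.BCELoss()(x, y)

print(torch.allclose(manual, per_sample))
print(torch.allclose(manual.mean(), mean_loss))
```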
5.2 Code examples
# 5. Binary Cross Entropy loss (BCE loss), the loss function for binary classification
bce_loss_fn = nn.BCELoss()
logits = torch.randn(batch_size)
prob_1 = torch.sigmoid(logits) # use sigmoid to get the probability of class 1
target = torch.randint(2, size=(batch_size,)) # binary classification: only 0 and 1
bce_loss = bce_loss_fn(prob_1, target.float())
print(f"binary Cross Entropy loss: {bce_loss}")
# 1) BCEWithLogitsLoss = sigmoid + BCELoss; it takes the unnormalized logits
bce_logits_loss_fn = nn.BCEWithLogitsLoss()
bce_logits_loss = bce_logits_loss_fn(logits, target.float())
print(f"binary Cross Entropy with logits loss: {bce_logits_loss}")
print(torch.allclose(bce_logits_loss, bce_loss)) # compare floating-point values within a tolerance
# 2) BCE loss is a special case of NLL loss
nll_fn = nn.NLLLoss()
prob_0 = 1 - prob_1.unsqueeze(-1) # add a channel dimension: [BS, 1]
prob = torch.cat([prob_0, prob_1.unsqueeze(-1)], dim=-1) # [BS, 2]
nll_loss_binary = nll_fn(torch.log(prob), target)
print(f"negative likelihood loss binary: {nll_loss_binary}")
print(torch.allclose(bce_loss, nll_loss_binary)) # compare floating-point values within a tolerance
The output results are as follows:
binary Cross Entropy loss: 0.963399350643158
binary Cross Entropy with logits loss: 0.963399350643158
True
negative likelihood loss binary: 0.963399350643158
True
It can be seen from the output results: 1) BCEWithLogitsLoss = sigmoid + BCELoss; 2) BCE loss is a special case of NLL loss. NLL loss can handle multi-class classification, and when the target only takes the two indices 0 and 1, it also covers the binary case.
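The equality in 1) holds for ordinary inputs, but the two can diverge for extreme logits. The PyTorch documentation describes BCEWithLogitsLoss as more numerically stable than a separate sigmoid followed by BCELoss, and notes that BCELoss clamps its log outputs at -100. The sketch below uses a deliberately extreme, made-up logit to show the difference.

```python
import torch
import torch.nn as nn

extreme_logit = torch.tensor([200.0])  # confidently predicts class 1
target = torch.tensor([0.0])           # but the true class is 0

# sigmoid saturates to 1.0 in float32, so log(1 - p) becomes log(0);
# BCELoss clamps the log term to keep the loss finite
loss_via_sigmoid = nn.BCELoss()(torch.sigmoid(extreme_logit), target)

# BCEWithLogitsLoss works on the logit directly and avoids the saturation
loss_with_logits = nn.BCEWithLogitsLoss()(extreme_logit, target)

print(loss_via_sigmoid.item(), loss_with_logits.item())
```

For this reason, passing raw logits to BCEWithLogitsLoss is usually the safer choice during training.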
6.Cosine Similarity Loss
PyTorch API:CosineEmbeddingLoss — PyTorch 1.13 documentation
6.1 Introduction
Cosine Embedding Loss is based on cosine similarity and measures whether two inputs are similar or dissimilar. It is widely used in contrastive learning, self-supervised learning, learning the similarity of images or texts, image search in search engines, etc.
input1 and input2 must have the same shape, and each value of target is either 1 or -1.
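According to the PyTorch documentation, the per-sample loss is $1 - \cos(x_1, x_2)$ when the target is 1 and $\max(0, \cos(x_1, x_2) - \text{margin})$ when the target is -1. The sketch below writes this out by hand and compares it with nn.CosineEmbeddingLoss; the inputs and targets are made-up values, and margin is left at its default of 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x1 = torch.randn(4, 512)
x2 = torch.randn(4, 512)          # same shape as x1
y = torch.tensor([1, -1, 1, -1])  # 1: should be similar, -1: should be dissimilar
margin = 0.0                      # default margin

cos = F.cosine_similarity(x1, x2, dim=-1)
# similar pairs are pushed toward cos = 1, dissimilar pairs toward cos <= margin
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0)).mean()

builtin = nn.CosineEmbeddingLoss(margin=margin)(x1, x2, y)
print(torch.allclose(manual, builtin, atol=1e-5))
```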
6.2 Code examples
cosine_loss_fn = nn.CosineEmbeddingLoss()
v1 = torch.randn(batch_size, 512)
v2 = torch.randn(batch_size, 512)
target = torch.randint(2, size=(batch_size, )) * 2 - 1 # each target is either -1 or 1
cosine_loss = cosine_loss_fn(v1, v2, target)
print(f"cosine similarity loss: {cosine_loss}")
The output result is a scalar:
cosine similarity loss: 0.07631295919418335