1. Description

In this article, all common loss functions used in deep learning are discussed and implemented in NumPy, PyTorch, and TensorFlow.

(2-5) see

6. Sparse Categorical Cross-Entropy Loss

Sparse categorical cross-entropy loss is similar to categorical cross-entropy loss, but is used when the true labels are provided as integers rather than one-hot encoding. It is often used as a loss function in multiclass classification problems.

The formula for sparse categorical cross-entropy loss is:

L = -1/N * sum(log(Y_hat_i))

where Y_hat_i is the predicted probability of the true class label for sample i and N is the number of samples.

In other words, the formula computes the negative log of the predicted probability of the true class label for each sample, and then averages these values over all samples.

Unlike the categorical cross-entropy loss, which uses one-hot encoding for the true labels, the sparse categorical cross-entropy loss uses integer labels directly. The true label for each sample i is represented as a single integer value between 0 and C-1, where C is the number of classes.

6.1 Implementation in NumPy

import numpy as np

def sparse_categorical_crossentropy(y_true, y_pred):
    # convert true labels to one-hot encoding
    y_true_onehot = np.zeros_like(y_pred)
    y_true_onehot[np.arange(len(y_true)), y_true] = 1

    # calculate loss
    loss = -np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=-1))

    return loss

In this implementation, y_true is an array of integer labels and y_pred is an array of predicted probabilities for each sample. The function first converts the ground truth labels to a one-hot encoded format using NumPy's advanced indexing, creating an array of shape (N, C), where N is the number of samples and C is the number of classes, with each row corresponding to the true label distribution for a single sample.

The function then calculates the loss using the formula above: -1/N * sum(log(Y_hat_i)). This is implemented as the element-wise product y_true_onehot * np.log(y_pred), which yields an array of shape (N, C) where each element is the product of the corresponding elements of y_true_onehot and np.log(y_pred). The sum function then sums over the C dimension, and the mean function averages over the N dimension.

Here is an example of how to use this function:

# define true labels as integers and predicted probabilities as an array
y_true = np.array([1, 2, 0])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss)

This will output the value of the sparse categorical cross-entropy loss for the given input.
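Because only the predicted probability of the true class contributes to the sum, the one-hot step can be skipped. The following is a minimal sketch of that idea, not part of the original implementation: the function name sparse_categorical_crossentropy_indexed and the eps clipping constant are assumptions made for illustration.

```python
import numpy as np

def sparse_categorical_crossentropy_indexed(y_true, y_pred, eps=1e-12):
    # Hypothetical variant: pick the probability of the true class directly
    # instead of building a one-hot matrix.
    y_pred = np.clip(y_pred, eps, 1.0)  # assumed guard against log(0)
    true_class_probs = y_pred[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(true_class_probs))

y_true = np.array([1, 2, 0])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

loss = sparse_categorical_crossentropy_indexed(y_true, y_pred)
print(loss)  # approximately 0.6109, i.e. -(log 0.8 + log 0.5 + log 0.4) / 3
```

For the example inputs, this should give the same value as the one-hot version, since the zeros in the one-hot matrix remove every term except the true-class one.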

6.2 Implementation in TensorFlow

import tensorflow as tf

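def sparse_categorical_crossentropy(y_true, y_pred):
    # the Keras function returns one loss value per sample
    per_sample = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
    # average over the samples to match the formula above
    loss = tf.reduce_mean(per_sample)
    return loss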

# define true labels as integers and predicted probabilities as a tensor
y_true = tf.constant([1, 2, 0])
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss.numpy())

In this implementation, y_true is a tensor of integer labels and y_pred is a tensor of predicted probabilities for each sample. The function uses tf.keras.losses.sparse_categorical_crossentropy, provided by TensorFlow, to calculate the loss; that function returns one loss value per sample. Set the from_logits parameter to False to indicate that y_pred contains probabilities rather than logits.

6.3 Implementation in PyTorch

import torch.nn.functional as F
import torch

def sparse_categorical_crossentropy(y_true, y_pred):
    loss = F.cross_entropy(y_pred, y_true)
    return loss

# define true labels as integers and predicted logits as a tensor
y_true = torch.tensor([1, 2, 0])
y_pred = torch.tensor([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss.item())

In this implementation, y_true is a tensor of integer labels and y_pred is a tensor of predicted logits for each sample. The function uses PyTorch's F.cross_entropy to calculate the loss. F.cross_entropy applies log-softmax to its input internally, so y_pred should contain raw scores rather than probabilities. The y_pred tensor should have shape (N, C), where N is the number of samples and C is the number of classes.
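To make the difference between logits and probabilities concrete, here is a NumPy sketch of the log-softmax followed by negative log-likelihood that cross-entropy on logits corresponds to. The name cross_entropy_from_logits is hypothetical, and the code is an illustration of the computation rather than PyTorch's actual implementation.

```python
import numpy as np

def cross_entropy_from_logits(y_true, logits):
    # Log-softmax, with the row maximum subtracted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the true class, averaged over samples.
    return -np.mean(log_probs[np.arange(len(y_true)), y_true])

y_true = np.array([1, 2, 0])
probs = np.array([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# The same numbers treated as raw logits: softmax is applied on top of them.
loss_raw = cross_entropy_from_logits(y_true, probs)
# Log-probabilities used as logits: softmax recovers the original probabilities.
loss_log = cross_entropy_from_logits(y_true, np.log(probs))
print(loss_raw, loss_log)
```

The two values differ. Only the second one, approximately 0.6109, matches the probability-based NumPy and TensorFlow results, which is why values that are already probabilities should not be passed directly to a loss that expects logits.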

7. Dice Loss

Dice loss is derived from the Sørensen-Dice coefficient (equivalent to the F1 score) and is used in image segmentation tasks to measure the overlap between the predicted segmentation and the ground truth. The Dice coefficient ranges from 0 to 1, where 0 means no overlap and 1 means full overlap. The Dice loss is 1 minus this coefficient, so a loss of 0 means full overlap and a loss of 1 means no overlap.

Dice loss is defined as:

Dice Loss = 1 - (2 * intersection + smooth) / (sum of squares of prediction + sum of squares of ground truth + smooth)

where intersection is the sum of the element-wise product of the prediction and the ground truth mask, smooth is a smoothing constant (usually a small value such as 1e-5) to prevent division by zero, and the sums run over all elements of the mask.

Dice loss can be implemented in various deep learning frameworks such as TensorFlow, PyTorch, and NumPy. The implementation involves computing intersections and sums of squares using the element-wise product-and-sum operations available in the framework.

7.1 Implementation in NumPy

import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = np.sum(y_true * y_pred, axis=(1,2,3))
    sum_of_squares_pred = np.sum(np.square(y_pred), axis=(1,2,3))
    sum_of_squares_true = np.sum(np.square(y_true), axis=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

In this implementation, y_true and y_pred are the ground truth and prediction masks, respectively. The smooth parameter is used to prevent division by zero. The sum and square functions are used to compute the intersection and the sums of squares, respectively. Finally, the Dice loss is calculated using the formula above, giving one value per sample in the batch.

Note that this implementation assumes y_true and y_pred are 4D arrays with dimensions (batch_size, height, width, num_classes). If your mask has a different shape, you may need to modify your implementation accordingly.
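As a quick sanity check of the formula, the sketch below evaluates the NumPy function on two toy masks: one where the prediction matches the ground truth exactly and one where it does not overlap at all. The mask values and shapes are made up for illustration.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = np.sum(y_true * y_pred, axis=(1, 2, 3))
    sum_of_squares_pred = np.sum(np.square(y_pred), axis=(1, 2, 3))
    sum_of_squares_true = np.sum(np.square(y_true), axis=(1, 2, 3))
    return 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)

# Toy masks of shape (batch_size=2, height=2, width=2, num_classes=1).
y_true = np.zeros((2, 2, 2, 1))
y_true[:, 0, :, 0] = 1.0   # top row is foreground in both samples

y_pred = np.zeros((2, 2, 2, 1))
y_pred[0, 0, :, 0] = 1.0   # sample 0: prediction matches the mask exactly
y_pred[1, 1, :, 0] = 1.0   # sample 1: prediction covers only the bottom row

losses = dice_loss(y_true, y_pred)
print(losses)  # one value per sample: close to 0, then close to 1
```

The function returns one loss per sample. To train with it, these values would typically be averaged over the batch, for example with np.mean(losses).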

7.2 Implementation in TensorFlow

import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = tf.reduce_sum(y_true * y_pred, axis=(1,2,3))
    sum_of_squares_pred = tf.reduce_sum(tf.square(y_pred), axis=(1,2,3))
    sum_of_squares_true = tf.reduce_sum(tf.square(y_true), axis=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

In this implementation, y_true and y_pred are TensorFlow tensors representing the ground truth and predicted masks, respectively. The smooth parameter is used to prevent division by zero. The reduce_sum and square functions are used to compute the intersection and the sums of squares, respectively. Finally, the Dice loss is calculated using the formula above.

Note that this implementation assumes y_true and y_pred are 4D tensors with dimensions (batch_size, height, width, num_classes). If your mask has a different shape, you may need to modify your implementation accordingly.

7.3 Implementation in PyTorch

import torch

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = torch.sum(y_true * y_pred, dim=(1,2,3))
    sum_of_squares_pred = torch.sum(torch.square(y_pred), dim=(1,2,3))
    sum_of_squares_true = torch.sum(torch.square(y_true), dim=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

In this implementation, y_true and y_pred are PyTorch tensors representing the ground truth and prediction masks, respectively. The smooth parameter is used to prevent division by zero. The sum and square functions are used to compute the intersection and the sums of squares, respectively. Finally, the Dice loss is calculated using the formula above.

8. KL Divergence Loss

The KL (Kullback-Leibler) divergence loss is a measure of how different two probability distributions are from each other. In the context of machine learning, it is often used as a loss function to train a model that generates new samples from a given distribution.

The KL divergence between two probability distributions p and q is defined as:

KL(p||q) = sum(p(x) * log(p(x) / q(x)))

In the context of machine learning, p represents the true distribution and q represents the predicted distribution. The KL divergence loss measures how well the predicted distribution matches the true distribution.

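The KL divergence loss can be used in various tasks such as image generation, text generation, and reinforcement learning. However, it is asymmetric (KL(p||q) generally differs from KL(q||p)), and although it is convex in q for a fixed p, the resulting objective is usually non-convex in the model parameters, so it can be difficult to optimize.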

In practice, KL divergence loss is often combined with other loss functions such as cross-entropy loss. By adding the KL divergence loss to the cross-entropy loss, the model is encouraged to generate samples that not only match the target distribution, but also have a similar distribution to the training data.

8.1 Implementation in NumPy

import numpy as np

def kl_divergence_loss(p, q):
    return np.sum(p * np.log(p / q))

In this implementation, p and q are NumPy arrays representing the true and predicted distributions, respectively. The KL divergence loss is calculated using the above formula.

Note that this implementation assumes p and q have the same shape. If they have different shapes, the implementation may need to be modified accordingly.
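A small worked example may help here. The sketch below evaluates the NumPy function on two three-class distributions (values chosen for illustration) and shows that swapping the arguments changes the result, while comparing a distribution with itself gives zero.

```python
import numpy as np

def kl_divergence_loss(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.2, 0.3, 0.5])  # true distribution
q = np.array([0.4, 0.3, 0.3])  # predicted distribution

kl_pq = kl_divergence_loss(p, q)  # KL(p||q)
kl_qp = kl_divergence_loss(q, p)  # KL(q||p), arguments swapped
kl_pp = kl_divergence_loss(p, p)  # identical distributions
print(kl_pq, kl_qp, kl_pp)  # approximately 0.1168, 0.1240, 0.0
```

Note that this simple version assumes all entries of p and q are strictly positive; an entry of exactly zero would produce nan or inf and needs separate handling.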

8.2 Implementation in TensorFlow

tf.keras.losses.KLDivergence() is a built-in loss in TensorFlow that computes the KL divergence loss between two probability distributions. It can be used as a loss function in various machine learning tasks such as image generation, text generation, and reinforcement learning.

Here is an example usage of tf.keras.losses.KLDivergence():

import tensorflow as tf

# define true distribution and predicted distribution
p = tf.constant([0.2, 0.3, 0.5])
q = tf.constant([0.4, 0.3, 0.3])

# compute KL divergence loss
kl_loss = tf.keras.losses.KLDivergence()(p, q)

print(kl_loss.numpy())

In this example, p and q are TensorFlow tensors representing the true and predicted distributions, respectively. The tf.keras.losses.KLDivergence() loss is used to compute the KL divergence between p and q. The result is a scalar tensor representing the loss value.

Note that when p and q have different shapes, tf.keras.losses.KLDivergence() relies on broadcasting them to a common shape. Additionally, you can set the reduction parameter, which controls how the per-sample losses are aggregated.

8.3 Implementation in PyTorch

In PyTorch, the KL divergence loss can be computed using the torch.nn.KLDivLoss module. Here is an example implementation:

import torch

def kl_divergence_loss(p, q):
    criterion = torch.nn.KLDivLoss(reduction='batchmean')
    # KLDivLoss takes log-probabilities of the prediction first, then the target
    loss = criterion(torch.log(q), p)
    return loss

In this implementation, p and q are PyTorch tensors representing the true distribution and predicted distribution, respectively. The torch.nn.KLDivLoss module is used to compute the KL divergence loss between p and q. The reduction parameter is set to 'batchmean', which sums the loss and divides it by the batch size (the size of the first dimension), so p and q should have shape (N, C).

Note that p and q should be probabilities summing to 1 along the last dimension. The torch.log function is used to take the logarithm of q before passing it to the torch.nn.KLDivLoss module. This is because the module expects its first argument (the prediction) to be log-probabilities, while the second argument (the target) is given as probabilities.

9. Mean Absolute Error (MAE) Loss / L1 Loss

L1 loss, also known as mean absolute error (MAE) loss, is a common loss function used in deep learning for regression tasks. It measures the absolute difference between the predicted and true value of the target variable.

The formula for L1 loss is:

L1 Loss = 1/n * Σ|y_pred - y_true|

where n is the number of samples, y_pred is the predicted value, and y_true is the true value.

In simple terms, L1 loss is the average of the absolute differences between predicted and true values. Because errors enter linearly rather than squared, it is less sensitive to outliers than the mean squared error (MSE) loss, so it is a good choice when the data may contain outliers.

9.1 Implementation in NumPy

import numpy as np

def l1_loss(y_pred, y_true):
    loss = np.mean(np.abs(y_pred - y_true))
    return loss

The NumPy implementation of the L1 loss closely follows the formula: it subtracts the true values from the predicted values and takes the absolute value. These absolute differences are then averaged across all samples to obtain the mean L1 loss.
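The claim that L1 loss is less sensitive to outliers than MSE can be checked on a small example. In the sketch below, the data values are made up and mse_loss is a hypothetical helper added only for comparison.

```python
import numpy as np

def l1_loss(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))

def mse_loss(y_pred, y_true):
    # Hypothetical helper, included only for comparison with l1_loss.
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 14.0])  # the last prediction is off by 10

mae = l1_loss(y_pred, y_true)
mse = mse_loss(y_pred, y_true)
print(mae, mse)  # 2.75 25.125
```

The single large error contributes 10 to the sum of absolute errors but 100 to the sum of squared errors, so it dominates the MSE far more than the MAE.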

9.2 Implementation in TensorFlow

import tensorflow as tf

def l1_loss(y_pred, y_true):
    loss = tf.reduce_mean(tf.abs(y_pred - y_true))
    return loss

In TensorFlow, you can use the tf.reduce_mean() function to calculate the average of the absolute differences between predicted and true values across all samples.

9.3 Implementation in PyTorch

import torch

def l1_loss(y_pred, y_true):
    loss = torch.mean(torch.abs(y_pred - y_true))
    return loss

In PyTorch, you can use the torch.mean() function to calculate the mean of the absolute differences between predicted and true values across all samples.

10. Huber Loss

Huber loss is a loss function used in regression tasks that is less sensitive to outliers than mean squared error (MSE) loss. It is defined as a combination of MSE loss and mean absolute error (MAE) loss, where the loss function is MSE for small errors and MAE for larger errors. This makes Huber loss more robust to outliers than MSE loss.

The Huber loss function is defined as follows:

L(y_pred, y_true) = 1/n * sum(l_i), with e_i = y_pred_i - y_true_i and

l_i = 0.5 * e_i^2                      if |e_i| <= delta
l_i = delta * |e_i| - 0.5 * delta^2    otherwise

where n is the number of samples, y_pred is the predicted value, y_true is the true value, and delta is a hyperparameter that determines the threshold for switching between MSE and MAE losses.

When |y_pred - y_true| <= delta, the loss is quadratic, like the MSE loss. When |y_pred - y_true| > delta, the loss is linear, like the MAE loss, with a slope of delta.

In practice, delta is usually set to a value that balances MSE and MAE losses, e.g. 1.0.

10.1 Implementation in NumPy

import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    error = y_pred - y_true
    abs_error = np.abs(error)
    quadratic = np.minimum(abs_error, delta)
    linear = (abs_error - quadratic)
    return np.mean(0.5 * quadratic ** 2 + delta * linear)

This function takes as input the predicted values y_pred, the ground truth values y_true, and the hyperparameter delta, and returns the Huber loss.

The function first calculates the absolute error between the predicted value and the true value, and then splits the error into two components according to the hyperparameter delta. The quadratic component is the MSE-like loss when abs_error <= delta, and the linear component is the MAE-like loss when abs_error > delta. Finally, the function returns the average Huber loss over all samples.

You can use this function in NumPy-based regression tasks by calling it with the predicted and true values and the desired delta value.
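It may not be obvious that the minimum-based split reproduces the piecewise definition. The sketch below compares it with a literal piecewise version on one small and one large error; huber_loss_piecewise is a hypothetical reference function and the data values are made up for illustration.

```python
import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    abs_error = np.abs(y_pred - y_true)
    quadratic = np.minimum(abs_error, delta)
    linear = abs_error - quadratic
    return np.mean(0.5 * quadratic ** 2 + delta * linear)

def huber_loss_piecewise(y_pred, y_true, delta=1.0):
    # Hypothetical reference version that follows the piecewise definition literally.
    abs_error = np.abs(y_pred - y_true)
    per_sample = np.where(abs_error <= delta,
                          0.5 * abs_error ** 2,
                          delta * abs_error - 0.5 * delta ** 2)
    return np.mean(per_sample)

y_true = np.array([0.0, 0.0])
y_pred = np.array([0.5, 2.0])  # one small error, one large error

print(huber_loss(y_pred, y_true))            # 0.8125
print(huber_loss_piecewise(y_pred, y_true))  # 0.8125
```

For the error of 0.5 the loss is 0.5 * 0.5^2 = 0.125, and for the error of 2.0 it is 1.0 * 2.0 - 0.5 * 1.0^2 = 1.5, so the mean is 0.8125 in both versions.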

10.2 Implementation in TensorFlow

import tensorflow as tf

def huber_loss(y_pred, y_true, delta=1.0):
    error = y_pred - y_true
    abs_error = tf.abs(error)
    quadratic = tf.minimum(abs_error, delta)
    linear = (abs_error - quadratic)
    return tf.reduce_mean(0.5 * quadratic ** 2 + delta * linear)

This function takes as input the predicted values y_pred, the ground truth values y_true, and the hyperparameter delta, and returns the Huber loss.

The function first uses tf.abs to calculate the absolute error between the predicted value and the true value, and then uses tf.minimum and the - operator to split the error into two components based on the hyperparameter delta. The quadratic component is the MSE-like loss when abs_error <= delta, and the linear component is the MAE-like loss when abs_error > delta. Finally, the function returns the average Huber loss over all samples using tf.reduce_mean.

You can use this function in TensorFlow-based regression tasks by calling it with the predicted and true values and the desired delta value.

10.3 Implementation in PyTorch

import torch

def huber_loss(y_pred, y_true, delta=1.0):
    error = y_pred - y_true
    abs_error = torch.abs(error)
    # torch.clamp accepts a Python float bound; torch.min(tensor, float) does not
    quadratic = torch.clamp(abs_error, max=delta)
    linear = abs_error - quadratic
    # average over all samples, matching the NumPy and TensorFlow versions
    return torch.mean(0.5 * quadratic ** 2 + delta * linear)

This function takes as input the predicted values y_pred, the ground truth values y_true, and the hyperparameter delta, and returns the Huber loss.

The function first uses torch.abs to calculate the absolute error between the predicted value and the true value, and then uses torch.clamp and the - operator to split the error into two components based on the hyperparameter delta. The quadratic component is the MSE-like loss when abs_error <= delta, and the linear component is the MAE-like loss when abs_error > delta. Finally, the function computes 0.5 * quadratic ** 2 + delta * linear for each sample and returns the average over all samples using torch.mean.

You can use this function in PyTorch-based regression tasks by calling it with the predicted and true values and the desired delta value.

[All loss functions for deep learning] Implemented in NumPy, TensorFlow and PyTorch (2/2)
