[All loss functions for deep learning] Implemented in NumPy, TensorFlow and PyTorch (1/2)

1. Description

In this article, all common loss functions used in deep learning are discussed and implemented in NumPy, PyTorch, and TensorFlow.

2. Summary 

The loss functions covered in this article are listed as follows:

  1. Mean squared error (MSE) loss
  2. Binary cross-entropy loss
  3. Weighted binary cross-entropy loss
  4. Categorical cross-entropy loss
  5. Sparse categorical cross-entropy loss
  6. Dice loss
  7. Kullback-Leibler (KL) divergence loss
  8. Mean absolute error (MAE) / L1 loss
  9. Huber loss

        In the following, we will demonstrate the different implementation methods one by one. 

3. Mean squared error (MSE) loss

        The mean squared error (MSE) loss is a commonly used loss function in regression problems where the goal is to predict a continuous variable. The loss is calculated as the average of the squared differences between the predicted and true values. The formula for MSE loss is:

MSE loss = (1/n) * sum((y_pred - y_true)²)

        where:

  • n is the number of samples in the dataset
  • y_pred is the predicted value of the target variable
  • y_true is the true value of the target variable

        The MSE loss is sensitive to outliers and heavily penalizes large errors, which may not be desirable in some cases. In this case, other loss functions such as mean absolute error (MAE) or Huber loss can be used.
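        To make that outlier sensitivity concrete, here is a small NumPy comparison with made-up values, where one prediction is off by a large margin:

import numpy as np

# arbitrary example values; the last prediction is a large outlier error
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 10.0])

mse = np.mean((y_pred - y_true) ** 2)   # squaring lets the single outlier dominate
mae = np.mean(np.abs(y_pred - y_true))  # absolute error grows only linearly

print(mse)  # ~9.02
print(mae)  # ~1.60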

        Implementation in NumPy

import numpy as np

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.
    
    Args:
    - y_pred: predicted values
    - y_true: true values
    
    Returns:
    - mse_loss: mean squared error loss
    """
    n = len(y_true)
    mse_loss = np.sum((y_pred - y_true) ** 2) / n
    return mse_loss

        In this implementation, y_pred and y_true are NumPy arrays containing the predicted and true values, respectively. The function first calculates the squared differences between y_pred and y_true and then averages these values to obtain the MSE loss. The variable n is the number of samples in the dataset and is used to normalize the loss.
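
        As a quick sanity check of the function above (the values are arbitrary):

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.8, 2.9])

print(mse_loss(y_pred, y_true))  # (0.25 + 0.04 + 0.01) / 3 = 0.1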

Implementation in TensorFlow

import tensorflow as tf

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.
    
    Args:
    - y_pred: predicted values
    - y_true: true values
    
    Returns:
    - mse_loss: mean squared error loss
    """
    mse = tf.keras.losses.MeanSquaredError()
    mse_loss = mse(y_true, y_pred)
    return mse_loss

In this implementation, y_pred and y_true are TensorFlow tensors containing the predicted and true values, respectively. The tf.keras.losses.MeanSquaredError() object calculates the MSE loss between y_pred and y_true, and the mse_loss variable contains the calculated loss.
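
        By default, MeanSquaredError averages the loss over the batch. If per-sample values are needed instead, the reduction argument can be changed; a brief sketch, assuming TensorFlow 2.x:

# per-sample losses instead of a single averaged scalar
mse_none = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.NONE)

y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.5], [1.8], [2.9]])

print(mse_none(y_true, y_pred).numpy())  # one loss value per sample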

Implementation in PyTorch

import torch

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.
    
    Args:
    - y_pred: predicted values
    - y_true: true values
    
    Returns:
    - mse_loss: mean squared error loss
    """
    mse = torch.nn.MSELoss()
    mse_loss = mse(y_pred, y_true)
    return mse_loss

In this implementation, y_pred and y_true are PyTorch tensors containing the predicted and true values, respectively. The torch.nn.MSELoss() object calculates the MSE loss between y_pred and y_true, and the mse_loss variable contains the calculated loss.
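
        torch.nn.MSELoss also takes a reduction argument ('mean' by default); 'sum' and 'none' are available when a different aggregation is wanted. A quick sketch with arbitrary values:

y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 1.8, 2.9])

# per-element squared errors, no averaging
per_element = torch.nn.MSELoss(reduction='none')(y_pred, y_true)
print(per_element)                      # tensor([0.2500, 0.0400, 0.0100])
print(mse_loss(y_pred, y_true).item())  # default 'mean' reduction ≈ 0.1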

4. Binary cross entropy loss

        Binary cross-entropy loss, also known as log loss, is a common loss function used in binary classification problems. It measures the difference between the predicted probability distribution and the actual binary label distribution.

        The formula for binary cross-entropy loss is as follows:

        L(y, ŷ) = -[y * log(ŷ) + (1 - y) * log(1 - ŷ)]

        where y is the true binary label (0 or 1), ŷ is the predicted probability (ranging from 0 to 1), and log is the natural logarithm.

        The first term of the equation computes the loss when the true label is 1, and the second term computes the loss when the true label is 0. The total loss is the sum of the two terms.

        The loss is lower when the predicted probability is close to the true label, and higher when the predicted probability is far from the true label. This loss function is typically used in neural network models that use a sigmoid activation function in the output layer to predict binary labels.

4.1 Implementation in NumPy

        In numpy, the binary cross-entropy loss can be implemented using the formula we described earlier. Here's an example of how to calculate it:

import numpy as np

# define true labels and predicted probabilities
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])

# calculate the binary cross-entropy loss
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()

# print the loss
print(loss)
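
        One practical caveat with the direct formula: if any predicted probability is exactly 0 or 1, np.log produces -inf. A common safeguard (the epsilon value below is an arbitrary choice) is to clip the probabilities first:

eps = 1e-7  # keep probabilities strictly inside (0, 1) so log() stays finite
y_pred_clipped = np.clip(y_pred, eps, 1 - eps)

loss = -(y_true * np.log(y_pred_clipped)
         + (1 - y_true) * np.log(1 - y_pred_clipped)).mean()
print(loss)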

4.2 Implementation in TensorFlow

        In TensorFlow, the binary cross-entropy loss can be implemented using the tf.keras.losses.BinaryCrossentropy() function. Here's an example of how to use it:

import tensorflow as tf

# define true labels and predicted probabilities
y_true = tf.constant([0, 1, 1, 0])
y_pred = tf.constant([0.1, 0.9, 0.8, 0.3])

# define the loss function
bce_loss = tf.keras.losses.BinaryCrossentropy()

# calculate the loss
loss = bce_loss(y_true, y_pred)

# print the loss
print(loss)
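
        If the model outputs raw logits rather than probabilities, BinaryCrossentropy can apply the sigmoid internally via from_logits=True. A short sketch; the logit values are illustrative:

# raw, unbounded scores instead of probabilities
y_logits = tf.constant([-2.2, 2.2, 1.4, -0.8])

bce_from_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
loss = bce_from_logits(tf.constant([0.0, 1.0, 1.0, 0.0]), y_logits)
print(loss)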

4.3 Implementation in PyTorch

        In PyTorch, the binary cross-entropy loss can be implemented using the torch.nn.BCELoss() class. Here's an example of how to use it:

import torch

# define true labels and predicted probabilities
y_true = torch.tensor([0, 1, 1, 0], dtype=torch.float32)
y_pred = torch.tensor([0.1, 0.9, 0.8, 0.3], dtype=torch.float32)

# define the loss function
bce_loss = torch.nn.BCELoss()

# calculate the loss
loss = bce_loss(y_pred, y_true)

# print the loss
print(loss)
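
        For numerical stability, PyTorch also provides torch.nn.BCEWithLogitsLoss, which fuses the sigmoid and the binary cross-entropy into one step and expects raw logits rather than probabilities. A brief sketch with illustrative logit values:

# raw logits instead of sigmoid outputs
y_logits = torch.tensor([-2.2, 2.2, 1.4, -0.8])

bce_logits_loss = torch.nn.BCEWithLogitsLoss()
loss = bce_logits_loss(y_logits, y_true)  # (input, target) argument order
print(loss)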

4.4 Weighted binary cross-entropy loss

        Weighted binary cross-entropy loss is a variant of binary cross-entropy loss that allows assigning different weights to positive and negative examples. This is useful when dealing with imbalanced datasets, where one class is significantly underrepresented compared to the other.

        The formula for weighted binary cross-entropy loss is as follows:

L(y, ŷ) = -[w_pos * y * log(ŷ) + w_neg * (1 - y) * log(1 - ŷ)]

        where y is the true binary label (0 or 1), ŷ is the predicted probability (ranging from 0 to 1), log is the natural logarithm, and w_pos and w_neg are positive and negative weights, respectively.

        The first term of the equation computes the loss when the true label is 1, and the second term computes the loss when the true label is 0. The total loss is the sum of two terms, each weighted by a corresponding weight.

        Positive and negative weights can be chosen based on the relative importance of each class. For example, if the positive class is more important, it can be assigned a higher weight. Likewise, if negative classes are more important, they can be assigned higher weights.

        The loss is lower when the predicted probability is close to the true label, and higher when the predicted probability is far from the true label. This loss function is typically used in neural network models that use a sigmoid activation function in the output layer to predict binary labels.
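
        The section above gives the formula but no implementation, so here is a minimal NumPy sketch following that formula. The function name and the default weights are illustrative choices, not a library API:

import numpy as np

def weighted_bce_loss(y_pred, y_true, w_pos=2.0, w_neg=1.0):
    """Weighted binary cross-entropy: w_pos scales the positive term, w_neg the negative term."""
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)  # keep log() finite
    loss = -(w_pos * y_true * np.log(y_pred)
             + w_neg * (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

# arbitrary example values
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])
print(weighted_bce_loss(y_pred, y_true))

        For a built-in alternative, TensorFlow offers tf.nn.weighted_cross_entropy_with_logits, which takes a pos_weight argument and operates on logits.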

5. Categorical cross-entropy loss

        Categorical cross-entropy loss is a common loss function used in multi-class classification problems. It measures the difference between the true label and the predicted probability for each class.

        The formula for categorical cross-entropy loss is:

L = -1/N * sum(sum(Y * log(Y_hat)))

        where Y is the true label matrix in one-hot encoded format, Y_hat is the predicted probability matrix for each class, N is the number of samples, and log represents the natural logarithm.

        In this formula, Y has shape (N, C), where N is the number of samples and C is the number of classes. Each row represents the true label distribution for a single sample, with a value of 1 in the column corresponding to the true label and 0 in all other columns.

        Similarly, Y_hat has shape (N, C), where each row represents the predicted probability distribution for a single sample, with a probability value for each class.

        The log function is applied element-wise to the predicted probability matrix Y_hat. The sum function is used twice to sum over both dimensions of the matrix.

        The resulting value L represents the average cross-entropy loss over all N samples in the dataset. The goal of training a neural network is to minimize this loss function.

        The loss function penalizes the model more for making big mistakes in predicting classes with low probability. The goal is to minimize the loss function, which means making the predicted probabilities as close as possible to the true labels.

5.1 Implementation in NumPy

        In numpy, the categorical cross-entropy loss can be implemented using the formula we described earlier. Here's an example of how to calculate it:

import numpy as np

# define true labels and predicted probabilities as NumPy arrays
y_true = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# calculate the loss
loss = -1/len(y_true) * np.sum(np.sum(y_true * np.log(y_pred)))

# print the loss
print(loss)

        In this example, y_true represents the true labels in one-hot encoded format and y_pred represents the predicted probabilities for each class, both as NumPy arrays. The loss is calculated using the formula above and then printed to the console with print. Note that np.sum is applied twice to sum over the two dimensions of the matrix.

5.2 Implementation in TensorFlow

        In TensorFlow, the categorical cross-entropy loss can be easily computed using the tf.keras.losses.CategoricalCrossentropy class. Here's an example of how to use it:

import tensorflow as tf

# define true labels and predicted probabilities as TensorFlow Tensors
y_true = tf.constant([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = tf.constant([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
cce_loss = tf.keras.losses.CategoricalCrossentropy()

# calculate the loss
loss = cce_loss(y_true, y_pred)

# print the loss
print(loss.numpy())

        In this example, y_true represents the true labels in one-hot encoded format and y_pred represents the predicted probabilities for each class, both as TensorFlow tensors. The CategoricalCrossentropy class is used to create an instance of the loss function, and the loss is then calculated by passing the true labels and the predicted probabilities as arguments. Finally, the .numpy() method is used to print the calculated loss to the console.

Note that CategoricalCrossentropy expects the ground truth labels to already be in one-hot encoded format, as in the example above. If your labels are integer class indices instead, use tf.keras.losses.SparseCategoricalCrossentropy, which handles integer labels directly.
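
        As a quick illustration of the sparse variant mentioned above, reusing the y_pred tensor from the previous example:

# same data, but with integer class indices instead of one-hot rows
y_true_sparse = tf.constant([1, 2, 0])

scce_loss = tf.keras.losses.SparseCategoricalCrossentropy()
loss = scce_loss(y_true_sparse, y_pred)
print(loss.numpy())  # same value as the one-hot version above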

5.3 Implementation in PyTorch

        In PyTorch, the categorical cross-entropy loss can be easily computed using the torch.nn.CrossEntropyLoss class. Here's an example of how to use it:

import torch

# define true labels and predicted logits as PyTorch Tensors
y_true = torch.LongTensor([1, 2, 0])
y_logits = torch.Tensor([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
ce_loss = torch.nn.CrossEntropyLoss()

# calculate the loss
loss = ce_loss(y_logits, y_true)

# print the loss
print(loss.item())

        In this example, y_true represents the true labels in integer format and y_logits represents the predicted logits (unnormalized scores) for each class, both as PyTorch tensors. The CrossEntropyLoss class is used to create an instance of the loss function, and the loss is then calculated by passing the predicted logits and the true labels as arguments. Finally, the .item() method is used to print the calculated loss to the console.

        Note that CrossEntropyLoss combines the softmax activation and the categorical cross-entropy loss into a single operation (internally, a log-softmax followed by the negative log-likelihood), so you don't need to apply softmax separately. Also note that the true labels should be integer class indices, not one-hot encoded.
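
        To see that combination explicitly, the same value can be reproduced with log_softmax followed by the negative log-likelihood loss; a sketch reusing the tensors from the example above:

import torch.nn.functional as F

# CrossEntropyLoss is equivalent to log_softmax + NLLLoss on the raw logits
log_probs = F.log_softmax(y_logits, dim=1)
loss_manual = F.nll_loss(log_probs, y_true)
print(loss_manual.item())  # matches the CrossEntropyLoss value above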


Origin blog.csdn.net/gongdiwudu/article/details/132248620