Organization and Summary of Loss Functions in Deep Learning

Introduction

Loss functions are very important in machine learning and deep learning. Suppose you are working on a problem: you have trained a machine learning model on a dataset and are ready to put it in front of your clients. But how can you be sure which model will give the best results? Are there metrics or techniques that can help you quickly evaluate your model on your dataset?

Yes, and this is where the loss function plays an important role in machine learning and deep learning.

In this article, I organize and summarize different types of loss functions.

What is a loss function?

Wikipedia says that in mathematical optimization and decision theory, a loss or cost function (sometimes called an error function) is a function that maps an event or the values of one or more variables onto a real number, intuitively representing some "cost" associated with the event.
In simple terms, a loss function is a way of evaluating how well an algorithm models a dataset; it is a mathematical function of the parameters of the machine learning algorithm.

Why are loss functions important?

Famed author Peter Drucker said: "You can't improve what you can't measure." That is why loss functions are used to evaluate how well your algorithm models your dataset. If the value of the loss function is low, the model is good; otherwise, we have to change the parameters of the model to minimize the loss.

Loss function and cost function

Many people confuse the loss function with the cost function, so let us clarify what each one means. Although the two terms are often treated as synonyms and used interchangeably, they are different.

Loss function:

The loss function (or error function) measures the error for a single training example/input.

Cost function:

The cost function is the average loss over the entire training dataset.
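
To make the distinction concrete, here is a minimal NumPy sketch (the array values are invented for illustration): each squared error is the loss for one example, and their average over the dataset is the cost.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5])   # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 4.0])   # model predictions (illustrative)

loss_per_example = (y_true - y_pred) ** 2  # loss: one value per training example
cost = loss_per_example.mean()             # cost: average loss over the whole dataset

print(loss_per_example)  # [0.25 0.   2.25]
print(cost)              # 0.8333...
```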

Loss Functions in Deep Learning

  1. Regression
    MSE (Mean Squared Error)
    MAE (Mean Absolute Error)
    Huber Loss

  2. Classification
    Binary Cross Entropy
    Categorical Cross Entropy

  3. Autoencoder
    KL Divergence

  4. GAN
    Discriminator loss
    Minimax GAN loss

  5. Object detection
    Focal loss

  6. Word embedding
    Triplet loss

In this article, we will learn about regression loss and classification loss.

A. Regression Loss

1. Mean squared error/squared loss/L2 loss

Mean squared error (MSE) is the simplest and most common loss function. To calculate MSE, you take the difference between the actual value and the value predicted by the model, square it, and average it over the whole dataset.

MSE = (1/n) · Σᵢ (yᵢ − ŷᵢ)²

where n is the number of data points, yᵢ is the actual value, and ŷᵢ is the value predicted by the model.
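
As a minimal sketch, MSE can be computed with NumPy like this (the function name and sample values are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # 0.8333...
```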

Advantages

  1. Easy to interpret.
  2. Always differentiable, thanks to the square term.
  3. Has only one local minimum (the global minimum).

Disadvantages

  1. The error is expressed in squared units, which are harder to interpret than the units of the original output.

  2. Not robust to outliers.

    Note: for regression, use a linear activation function in the last neuron.

2. Mean Absolute Error/L1 Loss

Mean absolute error (MAE) is another simple loss function. To calculate MAE, you take the absolute difference between the actual value and the value predicted by the model and average it over the whole dataset.

MAE = (1/n) · Σᵢ |yᵢ − ŷᵢ|

where yᵢ and ŷᵢ are the actual and predicted values, and n is the number of data points.
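
Likewise, a minimal NumPy sketch of MAE (the function name and sample values are illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

print(mae([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # 0.6667...
```
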
Advantages

  1. Intuitive and simple.
  2. The error unit is the same as that of the output column.
  3. Robust to outliers.

Disadvantages

  1. The gradient is not defined at zero, so gradient descent cannot be applied directly; subgradient methods must be used instead.
    Note: for regression, use a linear activation function in the last neuron.

3. Huber Loss

In statistics, Huber loss is a loss function for robust regression that is less sensitive to outliers in the data than squared error loss.

Huber = (1/n) · Σᵢ L_δ(yᵢ, ŷᵢ), where

L_δ(y, ŷ) = ½ · (y − ŷ)²            if |y − ŷ| ≤ δ
L_δ(y, ŷ) = δ · (|y − ŷ| − ½ · δ)   otherwise

n - the number of data points.
y - the actual value of the data point, also known as the true value.
ŷ - the predicted value of the data point, which is returned by the model.
δ - Defines the point at which the Huber loss function transitions from quadratic to linear.
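
A minimal NumPy sketch of the formula above (the function name and the default δ of 1.0 are illustrative choices):

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    error = np.asarray(y_true) - np.asarray(y_pred)
    quadratic = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(np.abs(error) <= delta, quadratic, linear))

print(huber([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # 0.375
```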

Advantages

  1. Robust to outliers.
  2. It sits between MAE and MSE, combining the benefits of both.

Disadvantages

  1. Its main disadvantage is the added complexity: to maximize model accuracy, the hyperparameter δ must also be optimized, which increases the training requirements.

B. Classification loss

1. Binary cross entropy/log loss

It is used for binary classification problems, that is, problems with two classes: for example, whether a person is infected with COVID-19, or whether my article is popular or not.
Binary cross-entropy compares each predicted probability with the actual class output, which can be 0 or 1. It then calculates a penalty score based on how far the predicted probability is from the actual value.
BCE = −(1/n) · Σᵢ [ yᵢ · log(ŷᵢ) + (1 − yᵢ) · log(1 − ŷᵢ) ]

yᵢ - the actual value (0 or 1)
ŷᵢ - the probability predicted by the neural network
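
A minimal NumPy sketch of the formula above (the eps clipping is an extra assumption added to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true)
    y_pred = np.clip(np.asarray(y_pred), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.7]))  # ~0.2284
```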

Advantages

  1. The cost function is differentiable.

Disadvantages

  1. Multiple local minima.
  2. Not intuitive.

Note: for binary classification, use a sigmoid activation function in the last neuron.

2. Categorical cross entropy

Categorical cross-entropy is used for multiclass classification problems and also for softmax regression.

CCE = −(1/n) · Σᵢ Σⱼ yᵢⱼ · log(ŷᵢⱼ)

where the inner sum runs over the k classes, yᵢⱼ is 1 if example i belongs to class j and 0 otherwise (one-hot), and ŷᵢⱼ is the predicted probability for that class.

Note: for multiclass classification, use a softmax activation function in the last neuron.
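
A minimal NumPy sketch of the formula above (again, the eps clipping is an extra assumption to avoid log(0)):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross-entropy; y_true is one-hot, rows of y_pred sum to 1."""
    y_pred = np.clip(np.asarray(y_pred), eps, 1.0)
    return -np.mean(np.sum(np.asarray(y_true) * np.log(y_pred), axis=1))

one_hot = [[1, 0, 0], [0, 1, 0]]                  # illustrative one-hot targets
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]        # illustrative softmax outputs
print(categorical_cross_entropy(one_hot, probs))  # ~0.2899
```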

When should you use categorical cross-entropy vs. sparse categorical cross-entropy?

If the target column is one-hot encoded (e.g. [0, 0, 1], [0, 1, 0], [1, 0, 0]), use categorical cross-entropy. If the target column is integer encoded (e.g. 1, 2, 3, 4, …, n), use sparse categorical cross-entropy.
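
As a sketch of what this looks like in practice, assuming TensorFlow/Keras is available (the labels and predictions are invented), the two losses expect different target formats but yield the same value:

```python
import numpy as np
import tensorflow as tf

y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])           # illustrative softmax outputs

one_hot_targets = np.array([[1.0, 0.0, 0.0],   # class 0, one-hot encoded
                            [0.0, 1.0, 0.0]])  # class 1, one-hot encoded
integer_targets = np.array([0, 1])             # the same labels, integer encoded

cce = tf.keras.losses.CategoricalCrossentropy()
scce = tf.keras.losses.SparseCategoricalCrossentropy()

print(cce(one_hot_targets, y_pred).numpy())   # ~0.2899
print(scce(integer_targets, y_pred).numpy())  # same value from integer labels
```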

Which is faster?

Sparse categorical cross-entropy is generally faster than categorical cross-entropy, mainly because it works directly with integer labels and avoids building one-hot target vectors, which also saves memory.

Conclusion

In this article, I organized and summarized the different types of loss functions in deep learning for your reference and study. If anything here is wrong, please correct me. Thank you!

Source: blog.csdn.net/vcsir/article/details/126093858