[TensorFlow 2.0] Loss functions: losses

In general, the objective function of supervised learning consists of a loss function and regularization terms. (Objective = Loss + Regularization)

For a Keras model, the regularization terms of the objective function are usually specified in the individual layers. For example, the kernel_regularizer and bias_regularizer parameters of Dense can be used to apply l1 or l2 regularization to the layer's weights. In addition, kernel_constraint and bias_constraint can be used to constrain the range of the weight values, which is also a means of regularization.

The loss function is specified when the model is compiled. For regression models, the commonly used loss function is the mean squared error loss mean_squared_error.

For binary classification models, the binary cross-entropy loss function binary_crossentropy is usually used.

For multi-classification models, if the labels are one-hot encoded, the categorical cross-entropy loss function categorical_crossentropy is used. If the labels are encoded as category indices (serial numbers), you need to use the sparse categorical cross-entropy loss function sparse_categorical_crossentropy instead.

If necessary, you can also define a custom loss function. A custom loss function needs to accept two tensors, y_true and y_pred, as input parameters and output a scalar as the loss value.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers,models,losses,regularizers,constraints

First, the loss function and regularization term

tf.keras.backend.clear_session()
 
model = models.Sequential()
model.add(layers.Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01), 
                activity_regularizer=regularizers.l1(0.01),
                kernel_constraint = constraints.MaxNorm(max_value=2, axis=0))) 
model.add(layers.Dense(10,
        kernel_regularizer=regularizers.l1_l2(0.01,0.01),activation = "sigmoid"))
model.compile(optimizer = "rmsprop",
        loss = "sparse_categorical_crossentropy",metrics = ["AUC"])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 64)                4160      
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 4,810
Trainable params: 4,810
Non-trainable params: 0
_________________________________________________________________
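
As a quick check (this line is not part of the original example), the regularization terms declared on the layers can be inspected through the model's losses property; Keras adds them to the compiled loss during training:

# kernel/bias regularization penalties are available once the layers are built;
# activity regularization penalties are only added when the model is called on data
print(model.losses)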

Second, the built-in loss function

The built-in loss function generally has two forms: class implementation and function implementation.

For example, CategoricalCrossentropy and categorical_crossentropy are both the categorical cross-entropy loss; the former is the class implementation form, and the latter is the function implementation form.
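
A minimal sketch with made-up values showing how the two forms are used (illustrated here with the mean squared error loss): the class form is instantiated and then called, returning the batch-averaged scalar, while the function form returns the per-sample losses.

y_true = tf.constant([[0.], [1.]])
y_pred = tf.constant([[0.2], [0.7]])

# class implementation form: instantiate first, then call; returns a batch-averaged scalar
print(losses.MeanSquaredError()(y_true, y_pred))    # tf.Tensor(0.065, ...)

# function implementation form: returns the loss per sample
print(losses.mean_squared_error(y_true, y_pred))    # tf.Tensor([0.04 0.09], ...)

When compiling a model, either form can be passed as the loss, or simply the string name (e.g. loss = "mse").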

Some commonly used built-in loss functions are described below.

  • mean_squared_error (squared error loss, used for regression, abbreviated as mse, class implementation form is MeanSquaredError and MSE)

  • mean_absolute_error (absolute value error loss, used for regression, abbreviated as mae, class implementation form is MeanAbsoluteError and MAE)

  • mean_absolute_percentage_error (mean percentage error loss, used for regression, abbreviated as mape, class implementation form is MeanAbsolutePercentageError and MAPE)

  • Huber (Huber loss, only a class implementation form, used for regression, between mse and mae, is more robust to outliers, and has certain advantages over mse)

  • binary_crossentropy (binary cross entropy, used for binary classification, class implementation form is BinaryCrossentropy)

  • categorical_crossentropy (categorical cross entropy, used for multi-classification, requires the labels to be one-hot encoded; the class implementation form is CategoricalCrossentropy)

  • sparse_categorical_crossentropy (sparse categorical cross entropy, used for multi-classification, requires the labels to be encoded as category indices (serial numbers); the class implementation form is SparseCategoricalCrossentropy; a short sketch comparing the two label encodings follows this list)

  • hinge (hinge loss function, used for binary classification, the most famous application is as a loss function of support vector machine SVM, the class implementation form is Hinge)

  • kld (relative entropy loss, also known as KL divergence, commonly used as the loss function of the expectation-maximization (EM) algorithm; it is an information-theoretic measure of the difference between two probability distributions. The class implementation forms are KLDivergence and KLD)

  • cosine_similarity (cosine similarity, can be used for multi-classification, the class implementation form is CosineSimilarity)
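
As noted in the list above, categorical_crossentropy and sparse_categorical_crossentropy compute the same quantity and differ only in the label encoding they expect; a minimal sketch with made-up values:

y_pred = tf.constant([[0.1, 0.8, 0.1], [0.2, 0.3, 0.5]])

# one-hot encoded labels -> categorical_crossentropy
y_onehot = tf.constant([[0., 1., 0.], [0., 0., 1.]])
print(losses.categorical_crossentropy(y_onehot, y_pred))

# category index (serial number) labels -> sparse_categorical_crossentropy
y_index = tf.constant([1, 2])
print(losses.sparse_categorical_crossentropy(y_index, y_pred))

# both print the same per-sample values: [-log(0.8), -log(0.5)]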

Third, the custom loss function

The custom loss function receives two tensors y_true and y_pred as input parameters and outputs a scalar as the loss function value.

You can also subclass tf.keras.losses.Loss and override its call method to implement the loss calculation logic, which gives the class implementation form of the loss function.

The following is a custom implementation of Focal Loss as a demonstration. Focal Loss is an improved form of the binary_crossentropy loss function.

When the classes are imbalanced or the training set contains hard samples, it can achieve better results than binary cross entropy.

For details, see "How to evaluate Kaiming's Focal Loss for Dense Object Detection?"

https://www.zhihu.com/question/63581984
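
Before the code, it may help to write out the formula being implemented (the standard binary formulation of Focal Loss; gamma and alpha correspond to the parameters used below):

    focal_loss = -alpha * (1 - p)^gamma * log(p)          when y_true = 1
    focal_loss = -(1 - alpha) * p^gamma * log(1 - p)      when y_true = 0

where p = y_pred is the predicted probability of the positive class. With gamma = 0 this reduces to a class-weighted binary cross entropy; a larger gamma down-weights well-classified (easy) samples so that training focuses on the hard ones.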

def focal_loss(gamma=2., alpha=0.25):
 
    def focal_loss_fixed(y_true, y_pred):
        # pt_1: predicted probability of the positive samples (1 elsewhere, so their log term is 0)
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        # pt_0: predicted probability of the negative samples (0 elsewhere)
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        loss = -tf.reduce_sum(alpha * tf.pow(1. - pt_1, gamma) * tf.math.log(1e-07 + pt_1)) \
               -tf.reduce_sum((1 - alpha) * tf.pow(pt_0, gamma) * tf.math.log(1. - pt_0 + 1e-07))
        return loss
    return focal_loss_fixed
 
class FocalLoss(losses.Loss):
 
    def __init__(self, gamma=2.0, alpha=0.25, **kwargs):
        super().__init__(**kwargs)  # initializes the reduction and name of the base Loss class
        self.gamma = gamma
        self.alpha = alpha
 
    def call(self, y_true, y_pred):
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        loss = -tf.reduce_sum(self.alpha * tf.pow(1. - pt_1, self.gamma) * tf.math.log(1e-07 + pt_1)) \
               -tf.reduce_sum((1 - self.alpha) * tf.pow(pt_0, self.gamma) * tf.math.log(1. - pt_0 + 1e-07))
        return loss
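
Either form can then be passed to compile. A minimal usage sketch (the tiny one-layer model below is made up purely for illustration):

demo_model = models.Sequential([layers.Dense(1, input_dim=8, activation="sigmoid")])

# function implementation form: pass the closure returned by focal_loss
demo_model.compile(optimizer="adam", loss=focal_loss(gamma=2., alpha=0.25), metrics=["AUC"])

# class implementation form: pass an instance of the Loss subclass
demo_model.compile(optimizer="adam", loss=FocalLoss(gamma=2.0, alpha=0.25), metrics=["AUC"])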

 

References:

Open source e-book address: https://lyhue1991.github.io/eat_tensorflow2_in_30_days/

GitHub project address: https://github.com/lyhue1991/eat_tensorflow2_in_30_days
