Loss functions in Keras

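These notes are a record of how to choose a loss function, with the formulas included. I could not find formulas for some of the loss functions; I will add them once I do.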

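The loss function is the quantity a model is optimized against, which is why it is also called the objective function or optimization score function. In Keras, the loss argument of model.compile selects the loss function, and it can be specified in two ways: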

model.compile(loss='mean_squared_error', optimizer='sgd')

or

from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')

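You can pass either the name of an existing loss function or a TensorFlow/Theano symbolic function. The symbolic function returns one scalar per data point and takes the following two arguments:

y_true: the ground-truth labels, a TensorFlow/Theano tensor.
y_pred: the predictions, a TensorFlow/Theano tensor with the same shape as y_true.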

The actual optimization objective is the mean of this output array over all data points.
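A minimal NumPy sketch of that reduction, for illustration only (it is not Keras code, and it ignores sample weights and masking): the loss function averages over the last axis to produce one value per sample, and the objective is the mean of those per-sample values.

```python
import numpy as np

def mse_per_sample(y_true, y_pred):
    # Mirrors K.mean(K.square(y_pred - y_true), axis=-1): one value per sample
    return np.mean(np.square(y_pred - y_true), axis=-1)

y_true = np.array([[0.0, 1.0], [2.0, 2.0]])
y_pred = np.array([[0.0, 3.0], [1.0, 2.0]])

per_sample = mse_per_sample(y_true, y_pred)  # shape (2,): one loss per sample
objective = per_sample.mean()                # the scalar the optimizer minimizes
print(per_sample, objective)
```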

mean_squared_error

mean_squared_error(y_true, y_pred)

Source (K is the Keras backend module, keras.backend):

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

Explanation (n is the size of the last axis, i.e. the number of output values per sample, because the mean is taken over axis=-1):

MSE:

$L = \frac{1}{n} \sum_{i=1}^{n} (y_{pred}^{(i)} - y_{true}^{(i)})^2$

mean_absolute_error

mean_absolute_error(y_true, y_pred)

Source:

def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)

Explanation:

MAE:

$L = \frac{1}{n} \sum_{i=1}^{n} |y_{pred}^{(i)} - y_{true}^{(i)}|$
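To make the difference from MSE concrete, here is a small NumPy re-implementation of both losses (for illustration, not the Keras code). A single prediction that is off by 10 contributes 10 to the MAE sum but 100 to the MSE sum, so MAE is far less dominated by one outlier.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean(np.square(y_pred - y_true), axis=-1)

def mean_absolute_error(y_true, y_pred):
    # Mirrors K.mean(K.abs(y_pred - y_true), axis=-1)
    return np.mean(np.abs(y_pred - y_true), axis=-1)

y_true = np.array([[1.0, 2.0, 3.0, 4.0]])
y_pred = np.array([[1.0, 2.0, 3.0, 14.0]])  # one prediction is off by 10

mae = mean_absolute_error(y_true, y_pred)  # the error of 10 counts linearly
mse = mean_squared_error(y_true, y_pred)   # the error of 10 counts as 100
print(mae, mse)
```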

mean_absolute_percentage_error

mean_absolute_percentage_error(y_true, y_pred)

Source:

def mean_absolute_percentage_error(y_true, y_pred):
    diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true),
                                            K.epsilon(),
                                            None))
    return 100. * K.mean(diff, axis=-1)

Explanation:

MAPE:

$L = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_{pred}^{(i)} - y_{true}^{(i)}}{y_{true}^{(i)}} \right| \cdot 100$
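A NumPy sketch of the same computation (illustrative only; EPSILON = 1e-7 is assumed to match Keras's default K.epsilon()). The clip only guards against division by zero: targets close to zero still produce very large percentages, as the second sample shows, where an absolute error of 0.001 becomes a 100% error on that element.

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def mean_absolute_percentage_error(y_true, y_pred):
    # Divide by |y_true|, clipped from below so it is never zero
    denom = np.clip(np.abs(y_true), EPSILON, None)
    diff = np.abs((y_true - y_pred) / denom)
    return 100.0 * np.mean(diff, axis=-1)

y_true = np.array([[100.0, 200.0], [0.001, 1.0]])
y_pred = np.array([[110.0, 180.0], [0.002, 1.0]])
mape = mean_absolute_percentage_error(y_true, y_pred)
print(mape)
```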

mean_squared_logarithmic_error

mean_squared_logarithmic_error(y_true, y_pred)

Source:

def mean_squared_logarithmic_error(y_true, y_pred):
    first_log = K.log(K.clip(y_pred, K.epsilon(), None) + 1.)
    second_log = K.log(K.clip(y_true, K.epsilon(), None) + 1.)
    return K.mean(K.square(first_log - second_log), axis=-1)

Explanation:

MSLE:

$L = \frac{1}{n} \sum_{i=1}^{n} (\log(y_{true}^{(i)} + 1) - \log(y_{pred}^{(i)} + 1))^2$
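Because MSLE compares log(y + 1), it behaves like a relative error: the same absolute error costs much less at a larger scale. A NumPy sketch (illustrative; EPSILON = 1e-7 is an assumed default of K.epsilon()). Since both inputs are clipped to at least EPSILON, the loss is only meaningful for non-negative targets and predictions.

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def mean_squared_logarithmic_error(y_true, y_pred):
    first_log = np.log(np.clip(y_pred, EPSILON, None) + 1.0)
    second_log = np.log(np.clip(y_true, EPSILON, None) + 1.0)
    return np.mean(np.square(first_log - second_log), axis=-1)

# The same absolute error (10) at two different scales
y_true = np.array([[100.0], [1000.0]])
y_pred = np.array([[110.0], [1010.0]])
msle = mean_squared_logarithmic_error(y_true, y_pred)
print(msle)  # the first sample has a much larger loss than the second
```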

squared_hinge

squared_hinge(y_true, y_pred)

Source:

def squared_hinge(y_true, y_pred):
    return K.mean(K.square(K.maximum(1. - y_true * y_pred, 0.)), axis=-1)

$L = \frac{1}{n} \sum_{i=1}^{n} (\max(0, 1 - y_{pred}^{(i)} \cdot y_{true}^{(i)}))^2$

hinge

hinge(y_true, y_pred)

Source:

def hinge(y_true, y_pred):
    return K.mean(K.maximum(1. - y_true * y_pred, 0.), axis=-1)

Explanation:

$L = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_{pred}^{(i)} \cdot y_{true}^{(i)})$
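A NumPy sketch of hinge and squared_hinge side by side (illustrative, not the Keras code). With the implementation shown above, the labels are expected to be -1/+1 so that y_true * y_pred is a signed margin; predictions with a margin of at least 1 cost nothing, and squared_hinge penalizes large violations more heavily.

```python
import numpy as np

def hinge(y_true, y_pred):
    return np.mean(np.maximum(1.0 - y_true * y_pred, 0.0), axis=-1)

def squared_hinge(y_true, y_pred):
    return np.mean(np.square(np.maximum(1.0 - y_true * y_pred, 0.0)), axis=-1)

# Labels are -1 / +1; predictions are raw scores
y_true = np.array([[1.0, -1.0, 1.0]])
y_pred = np.array([[2.0, -0.5, -1.0]])  # margins: 2 (ok), 0.5 (too small), -1 (wrong)

h = hinge(y_true, y_pred)           # per-element terms 0, 0.5, 2
sh = squared_hinge(y_true, y_pred)  # per-element terms 0, 0.25, 4
print(h, sh)
```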

categorical_hinge

categorical_hinge(y_true, y_pred)

Source:

def categorical_hinge(y_true, y_pred):
    pos = K.sum(y_true * y_pred, axis=-1)
    neg = K.max((1. - y_true) * y_pred, axis=-1)
    return K.maximum(0., neg - pos + 1.)
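Reading the code above, for a one-hot y_true with correct class c it computes $L = \max(0, 1 + \max_{j \ne c} y_{pred,j} - y_{pred,c})$: the correct class must beat the best wrong class by a margin of 1. Because (1 - y_true) * y_pred sets the correct class's entry to 0 rather than removing it, this reading holds when the wrong-class scores are non-negative (e.g. softmax outputs). A NumPy sketch for illustration:

```python
import numpy as np

def categorical_hinge(y_true, y_pred):
    # Score of the correct class (y_true is one-hot)
    pos = np.sum(y_true * y_pred, axis=-1)
    # Highest score among the other classes (correct class masked to 0)
    neg = np.max((1.0 - y_true) * y_pred, axis=-1)
    return np.maximum(0.0, neg - pos + 1.0)

y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]])
loss = categorical_hinge(y_true, y_pred)
print(loss)  # the second sample ranks a wrong class highest, so it costs more
```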

logcosh

logcosh(y_true, y_pred)

Source:

def logcosh(y_true, y_pred):
    """Logarithm of the hyperbolic cosine of the prediction error.
    `log(cosh(x))` is approximately equal to `(x ** 2) / 2` for small `x` and
    to `abs(x) - log(2)` for large `x`. This means that 'logcosh' works mostly
    like the mean squared error, but will not be so strongly affected by the
    occasional wildly incorrect prediction.
    # Arguments
        y_true: tensor of true targets.
        y_pred: tensor of predicted targets.
    # Returns
        Tensor with one scalar loss entry per sample.
    """
    def _logcosh(x):
        return x + K.softplus(-2. * x) - K.log(2.)
    return K.mean(_logcosh(y_pred - y_true), axis=-1)

categorical_crossentropy

categorical_crossentropy(y_true, y_pred)

Source:

def categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)

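Note: when using the categorical_crossentropy loss, your targets should be in categorical format (i.e. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index of the sample's class). To convert integer targets into categorical targets, you can use the Keras utility function to_categorical: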

from keras.utils.np_utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
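The loss itself is $L = -\sum_{j} y_{true,j} \log(y_{pred,j})$, summed over the classes. Below is a NumPy sketch of the one-hot conversion and of what the TensorFlow backend's K.categorical_crossentropy roughly does for probability inputs in Keras 2.x (normalize each row, clip, then sum). The helper name one_hot is hypothetical and EPSILON = 1e-7 is an assumed default of K.epsilon().

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def one_hot(int_labels, num_classes):
    # Hypothetical helper, same idea as to_categorical:
    # row i gets a 1 at column int_labels[i]
    return np.eye(num_classes)[int_labels]

def categorical_crossentropy(y_true, y_pred):
    # Normalize rows to sum to 1, clip away 0 and 1, then -sum(y * log(p))
    y_pred = y_pred / np.sum(y_pred, axis=-1, keepdims=True)
    y_pred = np.clip(y_pred, EPSILON, 1.0 - EPSILON)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

int_labels = np.array([2, 0])
y_true = one_hot(int_labels, 3)  # [[0, 0, 1], [1, 0, 0]]
y_pred = np.array([[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # only the probability assigned to the true class matters
```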

sparse_categorical_crossentropy

sparse_categorical_crossentropy(y_true, y_pred)

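Source (the first function is the Keras loss; the second is the TensorFlow backend function K.sparse_categorical_crossentropy that it calls):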

def sparse_categorical_crossentropy(y_true, y_pred):
    return K.sparse_categorical_crossentropy(y_true, y_pred)

def sparse_categorical_crossentropy(target, output, from_logits=False):
    """Categorical crossentropy with integer targets.

    # Arguments
        target: An integer tensor.
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.

    # Returns
        Output tensor.
    """
    # Note: tf.nn.sparse_softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output)

    output_shape = output.get_shape()
    targets = cast(flatten(target), 'int64')
    logits = tf.reshape(output, [-1, int(output_shape[-1])])
    res = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets,
        logits=logits)
    if len(output_shape) >= 3:
        # if our output includes timestep dimension
        # or spatial dimensions we need to reshape
        return tf.reshape(res, tf.shape(output)[:-1])
    else:
        return res
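In other words, sparse_categorical_crossentropy computes the same quantity as categorical_crossentropy, but takes integer class indices instead of one-hot vectors, which saves building the one-hot matrix when there are many classes. A NumPy sketch of that equivalence (illustrative only; EPSILON = 1e-7 is an assumed default of K.epsilon(), and the reference function name categorical_crossentropy_onehot is hypothetical):

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def sparse_categorical_crossentropy(int_labels, y_pred):
    # Clip probabilities and take the log, as the backend does when from_logits=False
    logits = np.log(np.clip(y_pred, EPSILON, 1.0 - EPSILON))
    # Softmax cross-entropy on those logits: compute log-softmax,
    # then pick the entry at each sample's integer label
    log_softmax = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return -log_softmax[np.arange(len(int_labels)), int_labels]

def categorical_crossentropy_onehot(y_true, y_pred):
    # Reference: the same loss with one-hot targets
    p = np.clip(y_pred, EPSILON, 1.0 - EPSILON)
    return -np.sum(y_true * np.log(p), axis=-1)

int_labels = np.array([2, 0])
y_pred = np.array([[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]])
sparse_loss = sparse_categorical_crossentropy(int_labels, y_pred)
dense_loss = categorical_crossentropy_onehot(np.eye(3)[int_labels], y_pred)
print(sparse_loss, dense_loss)
```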

binary_crossentropy

binary_crossentropy(y_true, y_pred)

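Source (the first function is the Keras loss; the second is the TensorFlow backend function K.binary_crossentropy that it calls):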

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
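The backend converts probabilities back to logits with log(p / (1 - p)) and then calls a logits-based loss. A NumPy sketch checking that this path gives the familiar formula $-(y \log p + (1 - y) \log(1 - p))$. The function sigmoid_cross_entropy_with_logits below is a hand-written stand-in using the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) documented for TensorFlow's op, not TensorFlow itself; EPSILON = 1e-7 is an assumed default.

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def sigmoid_cross_entropy_with_logits(labels, logits):
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0.0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

def binary_crossentropy(y_true, y_pred):
    p = np.clip(y_pred, EPSILON, 1.0 - EPSILON)
    logits = np.log(p / (1.0 - p))  # transform probabilities back to logits
    return np.mean(sigmoid_cross_entropy_with_logits(y_true, logits), axis=-1)

y_true = np.array([[1.0, 0.0, 1.0]])
y_pred = np.array([[0.9, 0.2, 0.6]])
bce = binary_crossentropy(y_true, y_pred)
print(bce)
```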

kullback_leibler_divergence

kullback_leibler_divergence(y_true, y_pred)

Source:

def kullback_leibler_divergence(y_true, y_pred):
    y_true = K.clip(y_true, K.epsilon(), 1)
    y_pred = K.clip(y_pred, K.epsilon(), 1)
    return K.sum(y_true * K.log(y_true / y_pred), axis=-1)
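The code computes $L = \sum_{j} y_{true,j} \log(y_{true,j} / y_{pred,j})$, summed (not averaged) over the last axis. It is zero when the two distributions are identical and is not symmetric in its arguments. A NumPy sketch for illustration (EPSILON = 1e-7 is an assumed default of K.epsilon()):

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def kullback_leibler_divergence(y_true, y_pred):
    y_true = np.clip(y_true, EPSILON, 1)
    y_pred = np.clip(y_pred, EPSILON, 1)
    return np.sum(y_true * np.log(y_true / y_pred), axis=-1)

p = np.array([[0.5, 0.5]])
q = np.array([[0.9, 0.1]])
same = kullback_leibler_divergence(p, p)  # identical distributions
pq = kullback_leibler_divergence(p, q)
qp = kullback_leibler_divergence(q, p)
print(same, pq, qp)  # KL(p || q) differs from KL(q || p)
```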

poisson

poisson(y_true, y_pred)

Source:

def poisson(y_true, y_pred):
    return K.mean(y_pred - y_true * K.log(y_pred + K.epsilon()), axis=-1)

Explanation:

$L = \frac{1}{n} \sum_{i=1}^{n} (y_{pred}^{(i)} - y_{true}^{(i)} \cdot \log(y_{pred}^{(i)}))$
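The Poisson loss suits count-valued targets. Per element, $p - y \log p$ has derivative $1 - y/p$, which vanishes at $p = y$, so a single constant prediction minimizes the loss at the mean of the targets. A NumPy sketch that checks this with a simple grid search (illustrative; EPSILON = 1e-7 is an assumed default of K.epsilon()):

```python
import numpy as np

EPSILON = 1e-7  # assumed to match Keras's default K.epsilon()

def poisson(y_true, y_pred):
    return np.mean(y_pred - y_true * np.log(y_pred + EPSILON), axis=-1)

y_true = np.array([[1.0, 2.0, 3.0, 6.0]])  # count data, mean = 3
candidates = np.linspace(0.5, 6.0, 56)      # constant predictions 0.5, 0.6, ..., 6.0
losses = [poisson(y_true, np.full_like(y_true, c))[0] for c in candidates]
best = candidates[int(np.argmin(losses))]
print(best)  # the best constant prediction is the mean count
```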

cosine_proximity

cosine_proximity(y_true, y_pred)

Source:

def cosine_proximity(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.sum(y_true * y_pred, axis=-1)

Explanation:

$L = - \frac{\sum_{i=1}^{n} y_{true}^{(i)} \cdot y_{pred}^{(i)}}{\sqrt{\sum_{i=1}^{n} (y_{true}^{(i)})^2} \cdot \sqrt{\sum_{i=1}^{n} (y_{pred}^{(i)})^2}}$
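This loss is the negative cosine similarity, so it ranges from -1 (same direction) through 0 (orthogonal) to 1 (opposite direction), and it ignores the vectors' lengths. A NumPy sketch; the l2_normalize helper is written to follow the behaviour of TensorFlow's l2_normalize (divide by sqrt(max(sum(x²), epsilon)) with epsilon = 1e-12), which is an assumption about what K.l2_normalize delegates to:

```python
import numpy as np

def l2_normalize(x, axis=-1, epsilon=1e-12):
    # Divide by the L2 norm, guarding against division by zero
    norm = np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))
    return x / norm

def cosine_proximity(y_true, y_pred):
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return -np.sum(y_true * y_pred, axis=-1)

y_true = np.array([[1.0, 2.0], [1.0, 0.0], [1.0, 0.0]])
y_pred = np.array([[2.0, 4.0], [0.0, 3.0], [-5.0, 0.0]])  # same, orthogonal, opposite
loss = cosine_proximity(y_true, y_pred)
print(loss)
```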

Aliases

mse = MSE = mean_squared_error
mae = MAE = mean_absolute_error
mape = MAPE = mean_absolute_percentage_error
msle = MSLE = mean_squared_logarithmic_error
kld = KLD = kullback_leibler_divergence
cosine = cosine_proximity
