In Keras: model.compile(loss='objective function', optimizer='adam', metrics=['accuracy'])

The objective function

The objective function, also known as the loss function, measures the performance of the network and is one of the two parameters required to compile a model. Because there are many kinds of loss functions, the following uses the official Keras documentation as the example.

The official documentation at keras.io lists the following:

  1. mean_squared_error or mse

  2. mean_absolute_error or mae

  3. mean_absolute_percentage_error or mape

  4. mean_squared_logarithmic_error or msle

  5. squared_hinge

  6. hinge

  7. binary_crossentropy (also known as log loss, logloss)

  8. categorical_crossentropy: also known as the multi-class log loss. Note that when using this objective function, the labels need to be converted to binary (one-hot) sequences of shape (nb_samples, nb_classes).

  9. sparse_categorical_crossentropy: as above, but accepts sparse (integer) labels. Note that when using this function the labels still need the same number of dimensions as the output, so you may need to add a dimension to the labels: np.expand_dims(y, -1). A short sketch contrasting items 8 and 9 follows this list.

  10. kullback_leibler_divergence: the information gain from the predicted probability distribution Q to the true probability distribution P, used to measure the difference between the two distributions.

  11. cosine_proximity: i.e., the negative of the mean cosine distance between the predicted values and the true labels.
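
Shown below is a minimal sketch of the compile call with two of these losses (the tiny model, layer sizes, and random data are made up purely for illustration, assuming TensorFlow's bundled Keras):

```python
import numpy as np
from tensorflow import keras

# A tiny 10-class classifier; the architecture is arbitrary,
# only the loss/label pairing matters here.
model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])

x = np.random.rand(100, 20)
y_int = np.random.randint(0, 10, size=(100,))   # integer class labels

# categorical_crossentropy expects one-hot labels of shape (nb_samples, nb_classes)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, keras.utils.to_categorical(y_int, 10), epochs=1, verbose=0)

# sparse_categorical_crossentropy accepts the integer labels directly
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y_int, epochs=1, verbose=0)
```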

1. mean_squared_error

As the name suggests, this is the mean squared error, abbreviated MSE; it reflects the degree of dispersion of a data set.

For a set of measurements, the square root of the mean of the squared errors is known as the root mean squared error (closely related to the standard deviation); MSE is simply that quantity before taking the square root.

Formula:
\[ M S E=\frac{1}{n} \sum_{i=1}^{n}\left(\tilde{Y}_{i}-Y_{i}\right)^{2} \]

Meaning of the formula: it can be understood as the distance from a point to a line in n-dimensional space. (This is a geometric intuition; how to interpret it is up to the individual.)
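
A minimal NumPy sketch of the formula (the sample values below are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_pred - y_true) ** 2)

print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1) / 3 ≈ 0.4167
```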

2. mean_absolute_error

Translated as mean absolute error, abbreviated MAE.
The mean absolute error is the average of the absolute deviations between the individual observations and the arithmetic mean.
Formula:
\[ \mathrm{MAE}=\frac{1}{n} \sum_{i=1}^{n}\left|f_{i}-y_{i}\right|=\frac{1}{n} \sum_{i=1}^{n}\left|e_{i}\right| \]
( \(f_i\) is the predicted value, \(y_i\) is the actual value, and the absolute error is \(\left|e_{i}\right|=\left|f_{i}-y_{i}\right|\) )
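
A corresponding NumPy sketch (sample values made up):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: |f_i - y_i| averaged over all samples."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.5 + 0 + 1) / 3 = 0.5
```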

3. mean_absolute_percentage_error

Translated as mean absolute percentage error, acronym MAPE.

Formula:
\[ \mathrm{MAPE}=\frac{1}{n} \sum_{t=1}^{n}\left|\frac{A_{t}-F_{t}}{A_{t}}\right| \]
( \(A_t\) represents the actual value, \(F_t\) represents the predicted value)
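
A NumPy sketch of the formula (sample values made up; note that the Keras built-in additionally scales the result by 100 to report a percentage):

```python
import numpy as np

def mape(a_true, f_pred):
    """Mean absolute percentage error: |(A_t - F_t) / A_t| averaged over t."""
    a_true, f_pred = np.asarray(a_true), np.asarray(f_pred)
    return np.mean(np.abs((a_true - f_pred) / a_true))

print(mape([100.0, 200.0], [110.0, 180.0]))  # (0.1 + 0.1) / 2 = 0.1
```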

4. mean_squared_logarithmic_error

Translated as mean squared logarithmic error, abbreviated MSLE.

Formula:
\[ \varepsilon=\frac{1}{n} \sum_{i=1}^{n}\left(\log \left(p_{i}+1\right)-\log \left(a_{i}+1\right)\right)^{2} \]
(n is the number of observations in the data set, \(p_i\) is the predicted value, \(a_i\) is the true value)
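
A NumPy sketch of the formula (sample values made up):

```python
import numpy as np

def msle(a_true, p_pred):
    """Mean squared logarithmic error: mean of (log(p_i + 1) - log(a_i + 1))^2."""
    a_true, p_pred = np.asarray(a_true), np.asarray(p_pred)
    return np.mean((np.log1p(p_pred) - np.log1p(a_true)) ** 2)

print(msle([1.0, 10.0], [1.0, 5.0]))  # only the second pair contributes to the error
```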

5. squared_hinge

Formula: max(0, 1 - y_true * y_pred)^2 .mean(axis=-1). That is, for each sample take the larger of 0 and (1 minus the product of the true value and the predicted value), square it, and average the results.

6. hinge

Formula: max(0, 1 - y_true * y_pred).mean(axis=-1). That is, for each sample take the larger of 0 and (1 minus the product of the true value and the predicted value), and average the results.

Hinge loss is most commonly used for maximum-margin classification, notably in SVMs.

For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as:

L(y) = max(0, 1 - t*y)

y = w*x + b

It can be seen that when t and y have the same sign (meaning y predicts the correct class) and

|y| >= 1

the hinge loss is

L(y) = 0

but if they have opposite signs,

L(y) increases linearly with y, a one-sided error. (Translated from Wikipedia)
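
A NumPy sketch of both hinge variants for ±1 labels (the sample values are made up), mirroring the max(0, 1 - y_true * y_pred) expressions from the two sections above:

```python
import numpy as np

def hinge(y_true, y_pred):
    """max(0, 1 - y_true * y_pred), averaged; y_true is expected in {-1, +1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

def squared_hinge(y_true, y_pred):
    """The same margin term, squared before averaging."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred) ** 2)

y_true = np.array([1.0, -1.0, 1.0])
y_pred = np.array([0.8, -2.0, -0.5])   # the last sample is misclassified
print(hinge(y_true, y_pred))           # (0.2 + 0 + 1.5) / 3
print(squared_hinge(y_true, y_pred))   # (0.04 + 0 + 2.25) / 3
```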

7. binary_crossentropy

That is, the log loss function (logloss), the loss function that corresponds to a sigmoid output.

  Formula: L(Y, P(Y|X)) = -log P(Y|X)

  This function is mainly used in maximum likelihood estimation, where it simplifies the computation: differentiating the likelihood directly is cumbersome, so one usually takes the logarithm first and then differentiates to find the extreme points.

  The loss over a data set is usually built from the losses of the individual samples; after taking the logarithm, the per-sample losses can simply be added up. The negative sign means that maximizing the likelihood corresponds to minimizing the loss.
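
A minimal NumPy sketch of this log loss for a sigmoid output (the probabilities and labels are made up; the clipping constant is an assumption added to avoid log(0)):

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """-log P(Y|X): -[y*log(p) + (1-y)*log(1-p)], averaged over the samples."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return np.mean(-(y_true * np.log(p_pred) + (1.0 - y_true) * np.log(1.0 - p_pred)))

print(binary_crossentropy([1, 0, 1], [0.9, 0.1, 0.8]))  # small loss: confident, correct predictions
```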

8. categorical_crossentropy

The multi-class log loss function, i.e., the loss function corresponding to the softmax classifier; the reasoning is the same as above.

  tip: this loss function and the previous one both belong to the family of log loss functions; the difference lies mainly in sigmoid vs. softmax: sigmoid is used for binary classification, softmax for multi-class classification.

One explanation:
Softmax formula:

The objective function of logistic regression is based on maximum likelihood. That is, assuming x belongs to class y and the predicted probability of class y is o_y, we need to maximize o_y.

Computing softmax_loss involves 2 steps:

(1) compute the softmax-normalized probabilities
\[ x_{i} \leftarrow x_{i}-\max \left(x_{1}, \ldots, x_{n}\right) \]
\[ p_{i}=\frac{e^{x_{i}}}{\sum_{j=1}^{n} e^{x_{j}}} \]
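
A NumPy sketch of step (1), including the max subtraction for numerical stability (the logits are made up):

```python
import numpy as np

def softmax(x):
    """Subtract the max for stability, then normalize exp(x) into probabilities."""
    x = np.asarray(x, dtype=float)
    x = x - np.max(x)            # x_i <- x_i - max(x_1, ..., x_n)
    e = np.exp(x)
    return e / np.sum(e)         # p_i = e^{x_i} / sum_j e^{x_j}

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p, p.sum())                # the probabilities sum to 1

# The categorical cross-entropy for a true class index y is then -log(p[y]).
```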

References:

https://www.cnblogs.com/smuxiaolei/p/8662177.html
[official documentation] https://keras.io/zh/losses/
