[Comprehensive summary] The metrics evaluation functions in the model.compile method

A summary of the metrics evaluation functions passed to Keras's model.compile method

Problem introduction

  When running experiments, you will often find yourself writing a parameter into model.compile, such as metrics=['accuracy'], and at that point few articles or code comments explain why that particular value was chosen or what it means. As a novice, the author ran several binary-classification neural network experiments early on and even believed that writing metrics=['accuracy'] everywhere could never cause a problem. That belief is wrong, and rather absurd: this parameter offers different choices and serves different purposes depending on the data set and the problem at hand.
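  For context, here is a minimal sketch of where this parameter appears; the model architecture below is hypothetical and serves only to show the compile call:

```python
import tensorflow as tf

# A hypothetical two-class model, only to illustrate where metrics=... goes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])  # the parameter this article is about
```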

A brief explanation of the evaluation function

  First of all, this parameter defines the evaluation function. What is an evaluation function? Anyone who has trained a model knows that when the results are printed, you see the acc and loss of each epoch. Those acc and val_acc values are produced by the evaluation function. In short, the choice of evaluation function directly determines the accuracy figures you get.

Types of evaluation functions (with examples to aid understanding)

  Which evaluation function you choose for a given problem therefore determines your final training and validation scores, which makes it quite important. Moreover, during tuning some people prefer watching the validation score val_acc rather than the loss, and for that too you need to understand how the evaluation functions are classified.

  In fact, Keras defines six different accuracy metrics for us. The most commonly used is the accuracy just mentioned. Let's now give a more precise example:

  With accuracy, the true labels and the model predictions are both scalars. If the true label sequence is [1, 1, 3, 0, 2, 5] and the prediction sequence is [1, 2, 3, 1, 2, 5], you can see that four positions match, so accuracy = 4/6 ≈ 0.6667.
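  A quick way to check this in code (a sketch using tf.keras.metrics.Accuracy, assuming TensorFlow 2.x):

```python
import tensorflow as tf

m = tf.keras.metrics.Accuracy()
# True labels and predictions are both plain class indices.
m.update_state([1, 1, 3, 0, 2, 5], [1, 2, 3, 1, 2, 5])
print(m.result().numpy())  # 0.6666667 -> 4 hits out of 6
```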

  Another common evaluation function is binary_accuracy, which suits binary classification problems. The sample set has a true label sequence, say [0, 1, 1, 0], while the model predicts a probability sequence, say [0.3, 0.7, 0.6, 0.9]. How do we compare a probability sequence against a label sequence? The evaluation function has a threshold parameter, whose default value is 0.5: in the prediction sequence, any probability > threshold is set to 1 and any probability <= threshold is set to 0. The model prediction is thus converted from [0.3, 0.7, 0.6, 0.9] to [0, 1, 1, 1], and then the first accuracy calculation applies (3 of 4 hit, so binary_accuracy = 3/4 = 0.7500).
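  The same example, verified with the built-in metric (a sketch under the same TensorFlow 2.x assumption):

```python
import tensorflow as tf

m = tf.keras.metrics.BinaryAccuracy(threshold=0.5)
# Probabilities above the threshold are treated as class 1.
m.update_state([0, 1, 1, 0], [0.3, 0.7, 0.6, 0.9])
print(m.result().numpy())  # 0.75 -> [0, 1, 1, 1] vs [0, 1, 1, 0], 3 of 4 hit
```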

  When facing a multi-class problem, the evaluation function is usually one of two: categorical_accuracy and sparse_categorical_accuracy.

  The first is categorical_accuracy. Here both the true value and the predicted value are vectors, the true value being one-hot. The strategy of this evaluation function is to compare whether the index of the largest element is the same in the two vectors. Note that only one value is compared, namely the index of the maximum, which makes it suitable for multi-class single-label tasks but not for multi-label tasks. For example, if the true value is [0, 0, 1, 0] and the predicted value is [0.2, 0.1, 0.9, 0.5], both maxima sit at index 2, so the prediction is considered accurate.
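  In code (a sketch; the metric compares the argmax of each pair of vectors):

```python
import tensorflow as tf

m = tf.keras.metrics.CategoricalAccuracy()
# One-hot true label; the argmax of both vectors is index 2, so this counts as a hit.
m.update_state([[0, 0, 1, 0]], [[0.2, 0.1, 0.9, 0.5]])
print(m.result().numpy())  # 1.0
```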

  With sparse_categorical_accuracy the true value is already an index, while the predicted value is still a vector. The check is whether the element of the prediction sequence at the true value's index is the largest in the whole sequence; if so, the prediction is deemed accurate. For example, if the true value is 2 (note that indices start from 0) and the prediction sequence is [0.1, 0.38, 0.79, 0.5], the largest score 0.79 sits at index 2, so the result is counted as accurate.
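  Again as a small sketch:

```python
import tensorflow as tf

m = tf.keras.metrics.SparseCategoricalAccuracy()
# True label is the index 2; the largest predicted score sits at index 2 (0.79).
m.update_state([2], [[0.1, 0.38, 0.79, 0.5]])
print(m.result().numpy())  # 1.0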

  The two evaluation functions below are easier to grasp once you see their shared core idea: a prediction counts as correct if the index of the true class appears among the indices of the top k largest elements of the prediction sequence. Stated like that it sounds abstract, so let's work through examples step by step.

  First look at top_k_categorical_accuracy, which is categorical_accuracy with top_k added, so it is no longer a single index comparison. categorical_accuracy requires the sample's predicted score on the true class to be the maximum over all classes before the prediction counts as correct; top_k_categorical_accuracy only requires the predicted score on the true class to rank among the top k predicted scores over all classes.

  Here is a detailed example. Suppose there are 5 samples whose true values are [[0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]] and whose prediction sequences are [[0.8, 0.2, 0.1], [0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]. From what we covered above we can compute categorical_accuracy = 40%, but if the evaluation function you choose is top_k_categorical_accuracy the answer is completely different. top_k imposes a more relaxed constraint, since the true class only has to rank within the top k for the prediction to count as correct. This also means the reported accuracy is closely tied to the value of k. In the example just given, if k >= 3 then top_k_categorical_accuracy is 100%, because each vector has only 3 classes and the true class is necessarily within the top 3. So it only makes sense to set k smaller than the vector length 3; if we set k = 2, then top_k_categorical_accuracy = 80%. The calculation goes as follows:

  1) Convert the true value sequence out of one-hot form, i.e. [1, 2, 1, 1, 0] (take the index of the maximum of each vector to form a new sequence).

  2) Compute the top_k labels of the predicted values. With k = 2 the prediction sequences become [[0, 1], [0, 1], [0, 1], [0, 1], [0, 2]] (the indices of the two largest values in each prediction sequence).

  3) Count a sample as correct if its true label is within the top_k predicted labels. For the 5 samples above: 1 is in [0, 1], 2 is not in [0, 1], 1 is in [0, 1], 1 is in [0, 1], 0 is in [0, 2]; 4 of the 5 samples are predicted correctly, so top_k_categorical_accuracy = 80% when k = 2.

  Note that in Keras the default value of k is 5, so if you call the metric with the default you need to make sure the number of classes is greater than 5, otherwise the metric is trivially 100%.
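  The whole worked example can be reproduced with the built-in metrics (a sketch, same TensorFlow 2.x assumption as above):

```python
import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]]
y_pred = [[0.8, 0.2, 0.1], [0.3, 0.6, 0.1], [0.5, 0.4, 0.1],
          [0.3, 0.6, 0.1], [0.9, 0.0, 0.1]]

cat = tf.keras.metrics.CategoricalAccuracy()
cat.update_state(y_true, y_pred)
print(cat.result().numpy())  # 0.4 -> only samples 4 and 5 have a matching argmax

top2 = tf.keras.metrics.TopKCategoricalAccuracy(k=2)
top2.update_state(y_true, y_pred)
print(top2.result().numpy())  # 0.8 -> only sample 2's true class misses the top 2
```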

  sparse_top_k_categorical_accuracy follows the same idea as top_k_categorical_accuracy, except that the true values of sparse_top_k are not in one-hot form. (Recalling the difference between the two multi-class scoring functions above will help here.)

  Suppose we are given 4 samples with true value sequence [2, 1, 2, 2] and prediction sequences [[0.2, 0.5, 0.15], [0.5, 0.3, 0.1], [0.3, 0.7, 0.2], [0.9, 0.05, 0.4]]. The hit rate of sparse_top_k_categorical_accuracy is calculated as follows. Suppose k = 2 is chosen; the prediction sequences are first converted into [[0, 1], [0, 1], [0, 1], [0, 2]], and we check sample by sample as before: 2 is not in [0, 1], 1 is in [0, 1], 2 is not in [0, 1], and 2 is in [0, 2]. So 2 of the 4 samples hit, and the hit rate is 50%.
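  And the corresponding sketch:

```python
import tensorflow as tf

m = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=2)
# True labels are plain indices; a sample counts if its label is among the top-2 scores.
m.update_state([2, 1, 2, 2],
               [[0.2, 0.5, 0.15], [0.5, 0.3, 0.1],
                [0.3, 0.7, 0.2], [0.9, 0.05, 0.4]])
print(m.result().numpy())  # 0.5 -> samples 2 and 4 hit, samples 1 and 3 miss
```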

  In fact, if you set k to 1 in the two top_k methods, top_k_categorical_accuracy and sparse_top_k_categorical_accuracy degenerate into categorical_accuracy and sparse_categorical_accuracy respectively, since only the single largest prediction is considered.
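  This degeneration is easy to verify on the data from the earlier example (a sketch, same assumptions as above):

```python
import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]]
y_pred = [[0.8, 0.2, 0.1], [0.3, 0.6, 0.1], [0.5, 0.4, 0.1],
          [0.3, 0.6, 0.1], [0.9, 0.0, 0.1]]

top1 = tf.keras.metrics.TopKCategoricalAccuracy(k=1)
top1.update_state(y_true, y_pred)
print(top1.result().numpy())  # 0.4 -> identical to categorical_accuracy above
```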

Summary of usage scenarios

  Following the principles just introduced, you should now have a good understanding of the six evaluation functions. Based on their characteristics, a summary of usage scenarios is given here (a compile-level sketch follows the list). It is worth mentioning that Keras offers many more accuracy metrics than these; choose the one appropriate to your actual situation. What we discuss here are just a few of the more common ones.

  1) If the true labels and the predicted values are both concrete index values (e.g. true sequence = [1, 1, 1], y_pred = [0, 1, 1]), the accuracy evaluation function can be used directly and covers this case. (This is the simplest scenario, where the data set carries explicit class index labels.)

  2) If the true labels are concrete index values while the predicted values are vectors, and the problem is multi-class (e.g. true values = [1, 1, 1], prediction sequences = [[0.2, 0.3, 0.5], [0.45, 0.2, 0.35], [0, 0.24, 0.78]]), use the sparse_categorical_accuracy evaluation function.

  3) If the true labels are in one-hot form and the predicted values are vectors (e.g. true values = [[0, 1, 0], [0, 0, 1], [1, 0, 0]], predicted values = [[0.52, 0.33, 0.15], [0.9, 0.1, 0], [0, 0.4, 0.6]]), use the categorical_accuracy evaluation function.
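  Put together as compile calls, the scenarios above look roughly like this (a sketch; the model, losses, and optimizer are placeholders, not a recommendation):

```python
import tensorflow as tf

# A hypothetical 3-class model; layers, loss, and optimizer are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])

# Scenario 2: integer index labels, e.g. y = [1, 1, 1]
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])

# Scenario 3: one-hot labels, e.g. y = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])

# Scenario 1 (index labels vs. index predictions) rarely arises inside compile;
# tf.keras.metrics.Accuracy, shown earlier, covers it directly.
```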

  The experiments done so far are limited, so this understanding remains at a fairly shallow level; the main purpose of this article is to help everyone grasp the principles of each evaluation function, and I hope it helps. Of course, the author is also learning while standing on the shoulders of giants; the usage summarized in this article draws on the blogs of three authors:

  https://blog.csdn.net/qq_36588760/article/details/105689736

  https://blog.csdn.net/weixin_44866160/article/details/106437277

  https://blog.csdn.net/qq_20011607/article/details/89213908

  Learning never ends. Respect to those who came before, and may we all keep working hard together.

Origin: blog.csdn.net/qq_39381654/article/details/108747701