Evaluation metrics for classification networks

I previously did research on object detection. Object detection involves two main tasks: classification and bounding-box (position) regression. The evaluation metrics used there include AP, mAP, Recall, Precision, and the F1 score; the first two are used most often, and AP is the area under the curve formed by P and R. This article, however, focuses on the evaluation metrics for classification networks.

There are actually many evaluation metrics commonly used for classification networks, such as accuracy (Acc), error rate (ErrorRate), precision (Precision), recall (Recall), F1, ROC, etc.

1. Accuracy (Acc)

When we train a classification model, we often print the acc after each training iteration. So how is acc calculated? Let's take a look at the formula first:

Accuracy=\frac{TP+TN}{TP+TN+FP+FN}

The meaning of the variables in the formula:

TP (true positive): a positive sample that is correctly classified as positive;

TN (true negative): a negative sample that is correctly classified as negative;

FP (false positive): a negative sample that is incorrectly classified as positive;

FN (false negative): a positive sample that is incorrectly classified as negative.

To sum up: acc is the proportion of samples correctly identified by the classification network among all samples!
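As a quick sanity check, here is a minimal NumPy sketch of these counts; the helper name `confusion_counts` and the toy labels are my own illustration, not from any particular library:

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem (illustrative helper)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # toy predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = (tp + tn) / (tp + tn + fp + fn)
print(acc)  # 0.75 -- 6 of the 8 samples are classified correctly
```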

We use this metric a lot, but it has a clear limitation: when the classes in the dataset are heavily imbalanced, acc cannot objectively evaluate the performance of the algorithm.

Therefore, when we get a dataset, we should analyze its class distribution before training the network.
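To make this concrete, here is a toy illustration (the 95/5 class split is an arbitrary number of mine, not from the text) of how a useless classifier can still score a high acc on an imbalanced dataset:

```python
import numpy as np

# Toy imbalanced dataset: 95 negative samples, 5 positive samples (illustrative split)
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)   # a "model" that simply predicts the negative class every time

acc = np.mean(y_pred == y_true)
print(acc)  # 0.95 -- looks impressive, yet not a single positive sample was found
```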

2. Error rate

Once you understand accuracy, the error rate follows directly; it is easy to calculate:

ErrorRate=1-Acc

Like acc, this metric is not objective on an imbalanced dataset, and it cannot tell us where the errors actually come from.

3. Precision and Recall

Precision (P) and recall (R) are also commonly used evaluation metrics in object detection. They measure different things, as explained next.

Precision, also called the precision rate, is defined with respect to the prediction results: it is the proportion of samples predicted as positive that are actually positive.

For example, in a binary classification task, suppose the network identifies 100 targets as the class "dog", but only 80 of those 100 are actually dogs (i.e., positive samples); then P is 80%. The higher the P value, the more accurate the predictions. The formula is as follows:

Precision=\frac{TP}{TP+FP}

Recall, also called the recall rate (or completeness rate), is defined with respect to the original samples: it is the proportion of actual positive samples that are predicted as positive; in plain terms, it measures how many of the real positive samples the network actually found. The higher, the better. The formula is as follows:

Recall=\frac{TP}{TP+FN}

Ideally we want both to be as high as possible, but in practice they pull against each other: when one is high, the other tends to be low.
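To tie the two formulas to the dog example above, here is a small sketch; the helper name `precision_recall` and the assumed total of 120 real dogs (hence FN = 40) are my own additions, not stated in the original example:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts (illustrative helper)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Dog example: 100 samples predicted as "dog", 80 of them real dogs (TP=80, FP=20).
# Assume there are 120 real dogs in total, so 40 were missed (FN=40).
p, r = precision_recall(tp=80, fp=20, fn=40)
print(p, r)  # 0.8, ~0.667 -- predictions are accurate, but a third of the dogs were missed
```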

4. F1 score

The F1 score is the harmonic mean of P and R. It is a single metric that reflects the overall quality of both P and R; the higher the F1 score, the better. The formula is as follows:

F1=\frac{2\times P\times R}{P+R}
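Here is a minimal sketch of the formula, cross-checked against scikit-learn (assumed to be installed); the toy labels are the same ones used in the accuracy sketch above:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)   # TP=3, FP=1 -> 0.75
r = recall_score(y_true, y_pred)      # TP=3, FN=1 -> 0.75
f1_manual = 2 * p * r / (p + r)       # harmonic mean, as in the formula above
print(f1_manual, f1_score(y_true, y_pred))  # both 0.75
```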

5. ROC curve

Now for the highlight, which is also the focus of this article. ROC stands for receiver operating characteristic; the ROC curve describes how the classifier's behavior changes as the classification threshold is varied.

The conclusion first: the steeper the curve and the closer it hugs the upper-left corner, the better the model.

Here is why. The x-axis is the false positive rate (FPR), which is related to the FP above: it is the proportion of all real negative samples that are (incorrectly) identified as positive. The calculation formula is as follows:

FPR=\frac{FP}{FP+TN}

The y-axis is the true positive rate (TPR), which is related to the TP above: it is the proportion of all real positive samples that are identified as positive, i.e., it is exactly the recall. The formula is as follows:

TPR=\frac{TP}{TP+FN}

We want FPR to be as low as possible (few false detections) and TPR to be as high as possible (most positives found), so the steeper the curve and the closer it is to the upper-left corner, the better: the smaller the x-value and the larger the y-value, the better.
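A minimal sketch of computing TPR and FPR at a single threshold; the helper name `tpr_fpr` and the made-up classifier scores are my own, for illustration only:

```python
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """TPR and FPR at a single decision threshold (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1]   # made-up classifier scores
print(tpr_fpr(y_true, scores, threshold=0.5))        # (0.75, 0.25)
```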

The point at the lower-left corner of the ROC curve (FPR = 0, TPR = 0) corresponds to classifying every sample as negative;

the point at the upper-right corner (FPR = 1, TPR = 1) corresponds to classifying every sample as positive.

So FPR can be understood as measuring the model's degree of false detection, while TPR measures how completely the model recalls the positive samples.

And why is ROC a more objective metric than Acc? Because ROC is insensitive to class imbalance: TPR is computed only over the positive samples and FPR only over the negative samples, so a skewed class ratio does not distort either axis.

As mentioned before, the ROC curve is traced out as the threshold changes. If the threshold is small, TPR is high (we accept more samples as positive), but FPR also rises (false detections increase), and the operating point moves toward the upper-right corner. If the threshold is large, TPR drops and FPR drops too (fewer false detections), and the operating point moves toward the lower-left corner.

Summarized as:

The larger the threshold, the lower both TPR and FPR, and the operating point moves toward the lower-left corner;

The smaller the threshold, the higher both TPR and FPR, and the operating point moves toward the upper-right corner.
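This sweep over thresholds is exactly what `sklearn.metrics.roc_curve` computes. Here is a sketch (scikit-learn assumed to be installed, scores are the same made-up ones as above) that prints one (FPR, TPR) point per candidate threshold plus the area under the curve:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])  # made-up classifier scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC =", roc_auc_score(y_true, scores))

# Thresholds are returned in decreasing order: lowering the threshold
# moves the operating point up and to the right, as described above.
for t, x, y in zip(thresholds, fpr, tpr):
    print(f"threshold={t:.2f}  FPR={x:.2f}  TPR={y:.2f}")
```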

Therefore, we can use the ROC curve to choose a recommended threshold, i.e., the operating threshold for a production task. Two common methods:

1. Find the point on the ROC curve closest to the upper-left corner; the threshold corresponding to that point is taken as the optimal threshold (see the sketch after this list);

2. Based on the needs of the specific business scenario, weigh the relative importance of the positive and negative classes, i.e., decide whether recalling more positives matters more or reducing false detections matters more, and choose the threshold accordingly.
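A sketch of method 1, assuming scikit-learn is available and reusing the made-up scores from above; "closest to the upper-left corner" is taken literally as the smallest Euclidean distance to the ideal point (FPR = 0, TPR = 1):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])  # made-up classifier scores

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Method 1: pick the threshold whose (FPR, TPR) point is closest to the corner (0, 1).
dist = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)
best = np.argmin(dist)
print(thresholds[best], fpr[best], tpr[best])  # for these toy scores: 0.6, FPR=0.25, TPR=0.75
```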


