Detailed explanation of concepts related to machine vision performance indicators - confusion matrix, IoU, ROC curve, mAP, etc.

Table of contents

0. Preface

1. Image classification performance indicators

1.1 Confusion Matrix

1.2 Precision

1.3 Recall

1.4 F1 score

1.5 ROC curve (Receiver Operating Characteristic curve)

1.6 mAP (mean Average Precision)

2. Image segmentation performance indicators

2.1 Intersection over Union (IoU)

2.2 Precision & Recall & F1 score

2.3 Dice coefficient


0. Preface

In accordance with international practice, let me first declare: this article is only my own understanding from studying the topic. Although I have drawn on others' valuable insights, it may contain inaccuracies. If you find any errors, please point them out so that we can make progress together.

This article systematically explains machine vision performance indicators through examples. These indicators can be roughly divided into two categories: image classification performance indicators and image segmentation performance indicators.

1. Image classification performance indicators

This type of indicator is used to evaluate how accurately a model classifies images: the proportion of objects in the image that are correctly assigned to their categories, the proportion of missed recognitions, the proportion of incorrect recognitions, and so on.

1.1 Confusion Matrix

The confusion matrix is the basis of this type of performance metric. It is an evaluation method used in supervised learning to assess the predictive ability of a classification model on a test data set. A confusion matrix is a two-dimensional matrix; in the tables below, each row represents a predicted label and each column represents an actual label.

The four basic indicators of the confusion matrix are True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). They are, respectively, the number of positive samples correctly classified as positive, the number of negative samples incorrectly classified as positive, the number of negative samples correctly classified as negative, and the number of positive samples incorrectly classified as negative.

The following example illustrates the confusion matrix. Suppose we have a deep learning model that identifies whether Ultraman appears in an image, and 9 test samples. After running them through the model, the outputs are as follows:

It can be seen that the counts for the four indicators above are:

  • TP (the model predicts Ultraman, and the image actually contains Ultraman): 5
  • TN (the model predicts no Ultraman, and the image actually contains no Ultraman): 1
  • FP (the model predicts Ultraman, but the image actually contains no Ultraman): 2
  • FN (the model predicts no Ultraman, but the image actually contains Ultraman): 1

The corresponding confusion matrix is:

Confusion matrix for judging Ultraman
                                 True category
                                 yes         no
Prediction category   yes        5 (TP)      2 (FP)
                      no         1 (FN)      1 (TN)
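
As a quick illustration, here is a minimal Python sketch that recovers these four counts from label lists. The ground-truth and predicted labels are made up so that they reproduce the TP/FP/FN/TN values of the Ultraman example (1 = Ultraman present, 0 = no Ultraman).

```python
# Minimal sketch: the 2x2 confusion matrix counts for the Ultraman example.
# The sample values below are hypothetical, chosen to give TP=5, FP=2, FN=1, TN=1.

y_true = [1, 1, 1, 1, 1, 0, 0, 1, 0]  # ground-truth labels for 9 test samples
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0]  # model predictions for the same samples

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(f"TP={TP}, FP={FP}, FN={FN}, TN={TN}")  # TP=5, FP=2, FN=1, TN=1
```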

Although the above example is a binary classification problem (Ultraman present or not), the confusion matrix can also be extended to multi-class problems, such as judging whether the image contains Ultraman Tiga, Ultraman Taro, Ultraman Seven, and so on.

Confusion matrix for judging Ultraman
                                 True category
                                 Tiga      Taro      ……      Seven
Prediction category   Tiga
                      Taro
                      ……
                      Seven

1.2 Precision

The mathematical definition of precision is:

Precision = \frac{TP}{TP+FP}

Precision describes: of the samples for which the model outputs "yes", what proportion are actually "yes"; that is, how accurate the model's positive predictions are.

1.3 Recall

The mathematical definition of recall is:

Recall = \frac{TP}{TP+FN}

Recall describes: of all the samples that are actually "yes", what proportion does the model output "yes" for; that is, how complete the model's predictions are.

1.4 F1 score

The mathematical definition of F1 value is:

F1 score = \frac{2\times Precision\times Recall }{Precision+Recall}

Substituting the above formulas of Precision and Recall can be simplified to:

F1 score = \frac{2TP}{2TP+FN+FP}

The F1 score is an evaluation metric that combines the model's precision and recall. As the harmonic mean of the two, it gives a more comprehensive and balanced assessment of a binary classifier, and it is a useful reference in model selection, parameter tuning and result interpretation. The F1 score can also be used to compare the performance of different models or algorithms in order to select the best one.
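
A minimal sketch of these three formulas applied to the Ultraman example counts (TP = 5, FP = 2, FN = 1); both the harmonic-mean form and the simplified form of F1 are computed to show they agree.

```python
# Minimal sketch: Precision, Recall and F1 from the Ultraman example counts.
TP, FP, FN = 5, 2, 1

precision = TP / (TP + FP)                                  # 5/7 ≈ 0.714
recall    = TP / (TP + FN)                                  # 5/6 ≈ 0.833
f1        = 2 * precision * recall / (precision + recall)   # harmonic mean
f1_direct = 2 * TP / (2 * TP + FN + FP)                     # simplified form above

print(f"precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")
assert abs(f1 - f1_direct) < 1e-9  # both formulas give the same value
```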

1.5 ROC curve (Receiver Operating Characteristic curve)

This indicator is a bit complicated...

First, the abscissa of the ROC curve is the false positive rate FPR (False Positive Rate), FPR = FP/(FP+TN). The ordinate is the true positive rate TPR (True Positive Rate, i.e. Recall), TPR = Recall = TP/(TP+FN).

Now back to the Ultraman example above: we need to know that, for classification problems, the deep learning network does not directly output "yes" or "no", but rather a confidence probability between 0 and 1.

If we set a threshold, say 0.6, then whenever the confidence probability of Ultraman output by the model is above 0.6 we consider the model to have judged the image as containing Ultraman. The example above then becomes:

Obviously, if we adjust this decision threshold, the prediction results "yes" or "no" may change, and therefore both FPR and TPR may change, giving us a new coordinate point (FPR, TPR).

If we plot all of the (FPR, TPR) points and connect them in order, we get the ROC curve.

In particular, if we set the threshold to 0, all results output by the model are "yes", so TN = FN = 0 and (FPR, TPR) = (1, 1); if we set the threshold to 1, all results output by the model are "no", so TP = FP = 0 and (FPR, TPR) = (0, 0). From this we know that the ROC curve always runs between the two points (0, 0) and (1, 1), as in the following picture:

The slope and convexity of the ROC curve reflect the prediction performance of the classifier. The closer the ROC curve is to the upper left corner, the better the performance of the classifier. In addition, the area under the ROC curve AUC (Area Under the ROC Curve) is also a commonly used indicator. The larger the AUC value, the better the prediction performance of the classifier. An AUC value of 1 means that the prediction of the classifier is completely accurate.
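
The following sketch traces an ROC curve by sweeping the decision threshold over a set of hypothetical confidence scores (the labels and scores are invented for illustration) and approximates the AUC with the trapezoidal rule; real toolkits typically sweep only the unique score values, but the idea is the same.

```python
# Minimal sketch: tracing an ROC curve by sweeping the decision threshold.
# The labels and confidence scores below are hypothetical, for illustration only.
import numpy as np

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 1, 0])                        # 1 = Ultraman present
scores = np.array([0.9, 0.8, 0.75, 0.7, 0.65, 0.62, 0.61, 0.4, 0.3])  # model confidences

points = []
for thr in np.linspace(0.0, 1.0, 101):             # sweep the threshold from 0 to 1
    y_pred = (scores >= thr).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fpr = fp / (fp + tn) if (fp + tn) else 0.0     # false positive rate
    tpr = tp / (tp + fn) if (tp + fn) else 0.0     # true positive rate (recall)
    points.append((fpr, tpr))

points.sort()                                      # order the points by FPR
fpr_arr = np.array([p[0] for p in points])
tpr_arr = np.array([p[1] for p in points])
# Trapezoidal-rule approximation of the area under the ROC curve (AUC).
auc = np.sum((fpr_arr[1:] - fpr_arr[:-1]) * (tpr_arr[1:] + tpr_arr[:-1]) / 2)
print(f"AUC ≈ {auc:.3f}")
```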

1.6 mAP(mean Average Precision)

How should this name be parsed... the mean of the average precisions?

First we need to introduce AP. Following the same idea used to construct the ROC curve above, we create another curve: its abscissa is Recall and its ordinate is Precision. This time the threshold being adjusted is no longer the confidence probability but the IoU threshold (IoU will be introduced in section 2.1 below; it can be treated in the same way as a confidence score).

By sweeping the IoU threshold from 0 to 1, we obtain multiple (Recall, Precision) coordinate points; connecting them in sequence gives the following Precision-Recall curve:

Integrating this curve gives the AP:

AP = \int_{0}^{1} p(r)dr

If there are multiple classes of objects we want to identify (Ultraman Tiga, Ultraman Taro, Ultraman Seven, etc.), then we get one AP per class, and the average of these APs is the mAP.
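
As a sketch of the computation, the snippet below integrates a Precision-Recall curve numerically to obtain AP and then averages the per-class APs to obtain mAP. The (Recall, Precision) points for each Ultraman class are hypothetical, and a simple trapezoidal rule stands in for the interpolated AP that detection frameworks often use.

```python
# Minimal sketch: AP as the area under a Precision-Recall curve, and mAP as the
# mean of the per-class APs. The (Recall, Precision) points below are hypothetical.
import numpy as np

def average_precision(recall, precision):
    """Numerically integrate precision over recall: AP = ∫ p(r) dr (trapezoidal rule)."""
    order = np.argsort(recall)                      # integrate with recall increasing
    r = np.asarray(recall, dtype=float)[order]
    p = np.asarray(precision, dtype=float)[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2))

# One hypothetical PR curve per class.
pr_curves = {
    "Tiga":  ([0.0, 0.3, 0.6, 0.9, 1.0], [1.0, 0.95, 0.85, 0.70, 0.50]),
    "Taro":  ([0.0, 0.4, 0.7, 1.0],      [1.0, 0.90, 0.75, 0.55]),
    "Seven": ([0.0, 0.5, 1.0],           [1.0, 0.80, 0.60]),
}

aps = {cls: average_precision(r, p) for cls, (r, p) in pr_curves.items()}
mAP = sum(aps.values()) / len(aps)
print({cls: round(ap, 3) for cls, ap in aps.items()}, f"mAP = {mAP:.3f}")
```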

To summarize the distinction: the ROC curve is used to evaluate the performance of a binary classifier, while mAP (mean Average Precision) is an important indicator in object detection tasks, used to evaluate how accurately a model detects targets of multiple categories.

2. Image segmentation performance indicators

This type of indicator is used to evaluate the accuracy of image segmentation: whether the model can accurately segment the target from the image, and how much the predicted object position deviates from the actual position.

Let's again use Ultraman as an example:

The blue box A here is Ultraman's true location, which has been annotated in advance. The red box B is the boundary the model has segmented for Ultraman.

2.1 Intersection over Union (IoU)

IoU is the ratio of the intersection to the union of the predicted region and the true region:

IoU = \frac{A\bigcap B}{A\bigcup B}

mIoU (Mean Intersection over Union) is the average of the IoU of all categories and is used to evaluate the performance of multi-class segmentation models.
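
Here is a minimal sketch of IoU for two axis-aligned boxes, each given as (x1, y1, x2, y2) corner coordinates; the coordinates of boxes A and B are invented for illustration.

```python
# Minimal sketch: IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
# Box A is the (hypothetical) ground-truth location, box B the model's prediction.

def box_iou(a, b):
    # Intersection rectangle (width/height clamped at 0 if the boxes don't overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = area(A) + area(B) - intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

A = (10, 10, 110, 210)   # true box (blue box A)
B = (30, 40, 130, 230)   # predicted box (red box B)
print(f"IoU = {box_iou(A, B):.3f}")
```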

2.2 Precision & Recall & F1 score

These three indicators follow the same idea as in the classification problem above, so, stated together in terms of regions, their mathematical definitions are:

Precision = \frac{A\bigcap B}{B}

Recall = \frac{A\bigcap B}{A}

F1 score = \frac{2\times Precision\times Recall }{Precision+Recall}

2.3 Dice coefficient

The Dice coefficient is the ratio of twice the intersection of the predicted region and the true region to the sum of the two areas:

Dice = \frac{2(A\bigcap B)}{A+B}
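
To tie sections 2.2 and 2.3 together, here is a minimal sketch that computes IoU, Precision, Recall, F1 and Dice on two small hypothetical binary masks; note that on masks the Dice coefficient is numerically identical to the F1 score.

```python
# Minimal sketch: segmentation metrics on binary masks (True = Ultraman pixel).
# The two small masks are hypothetical; A is the ground truth, B the prediction.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0]], dtype=bool)      # true region
B = np.array([[0, 0, 1, 1],
              [0, 1, 1, 1],
              [0, 1, 1, 0]], dtype=bool)      # predicted region

inter = np.logical_and(A, B).sum()            # |A ∩ B|
union = np.logical_or(A, B).sum()             # |A ∪ B|

iou       = inter / union
precision = inter / B.sum()                   # correct pixels among predicted pixels
recall    = inter / A.sum()                   # correct pixels among true pixels
f1        = 2 * precision * recall / (precision + recall)
dice      = 2 * inter / (A.sum() + B.sum())

print(f"IoU={iou:.3f}, precision={precision:.3f}, recall={recall:.3f}, "
      f"F1={f1:.3f}, Dice={dice:.3f}")        # F1 and Dice come out identical
```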
