[CV] From classification to regression: common evaluation metrics such as ROC, mAP, etc.

Classification problems

Accuracy

Accuracy is one of the simplest and most commonly used classification metrics. It is the proportion of correctly predicted samples out of the total number of samples; the higher the accuracy, the better the classification performance of the model.
$$Accuracy=\frac{TP+TN}{TP+FP+TN+FN}$$
Here TP (true positives) is the number of samples that are actually positive and predicted as positive; TN (true negatives) is the number of samples that are actually negative and predicted as negative; FP (false positives) is the number of samples that are actually negative but predicted as positive; FN (false negatives) is the number of samples that are actually positive but predicted as negative.
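As a quick illustration, here is a minimal NumPy sketch (with made-up toy labels, 0 = negative and 1 = positive) that counts TP/TN/FP/FN and computes accuracy from them:

```python
import numpy as np

# Toy labels and predictions for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

TP = np.sum((y_true == 1) & (y_pred == 1))
TN = np.sum((y_true == 0) & (y_pred == 0))
FP = np.sum((y_true == 0) & (y_pred == 1))
FN = np.sum((y_true == 1) & (y_pred == 0))

accuracy = (TP + TN) / (TP + FP + TN + FN)
print(accuracy)  # 0.75 for these toy values; matches sklearn.metrics.accuracy_score(y_true, y_pred)
```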

Precision

Precision measures, among the samples that the model predicts as positive, the proportion that are actually positive; in other words, it measures how accurate the positive predictions are. The higher the precision, the larger the fraction of predicted positives that are truly positive.
$$Precision=\frac{TP}{TP+FP}$$

Recall (true positive rate)

Recall measures the proportion of all actually positive samples that the model predicts as positive (also called sensitivity, or the rate of effective alarms); in other words, it measures how completely the model covers the positives. The higher the recall, the better the model is at finding positive samples.
$$Recall=\frac{TP}{TP+FN}$$
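Both quantities are available directly in scikit-learn; a small self-contained sketch using the same toy labels as above:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # same toy labels as the accuracy sketch
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
print(precision, recall)                     # 0.75 0.75 for these toy values
```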

False Positive Rate (FPR)

FPR is the proportion of all actually negative samples that are (wrongly) predicted as positive, i.e. the false-alarm rate:
$$FPR = \frac{FP}{FP + TN}$$
where FP represents false positives and TN represents true negatives.

Specificity

Specificity is the proportion of all actually negative samples that are correctly predicted as negative. It is calculated as:
$$Specificity = \frac{TN}{TN + FP}$$
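FPR and specificity are complementary (FPR = 1 - specificity); a small sketch computing both from the confusion-matrix counts, again on the toy labels used above:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # same toy labels as above
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

FP = np.sum((y_true == 0) & (y_pred == 1))
TN = np.sum((y_true == 0) & (y_pred == 0))

fpr = FP / (FP + TN)
specificity = TN / (TN + FP)
print(fpr, specificity)  # 0.25 0.75; the two always sum to 1
```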

ROC curve

The ROC curve is a graphical tool for measuring the performance of a classification model. Its horizontal axis is the false positive rate (FPR), which equals 1 - specificity, and its vertical axis is the true positive rate (TPR); the shape of the curve reflects the performance of the model. The area under the ROC curve (AUC) is also a commonly used classification metric: the higher the AUC, the better the model. AUC ranges from 0 to 1, with a random classifier scoring about 0.5, and the closer the AUC is to 1, the better the classifier.
The ROC curve is drawn by sweeping the decision threshold of the classification model. The model's output is generally a continuous real-valued score, and a sample is labelled positive when its score exceeds the threshold. When the threshold is at positive infinity, every sample is judged negative (FPR = TPR = 0); when it is at negative infinity, every sample is judged positive (FPR = TPR = 1). By varying the threshold in between, we obtain different (FPR, TPR) pairs, which trace out the ROC curve.
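A minimal sketch of this procedure with scikit-learn, assuming `labels` holds 0/1 ground truth and `scores` holds the model's continuous outputs (both are toy values here):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and continuous scores; in practice the scores come from the model.
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

fpr, tpr, thresholds = roc_curve(labels, scores)  # one (FPR, TPR) pair per threshold
auc = roc_auc_score(labels, scores)
print(auc)  # 0.875 for these toy values
```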

PR curve (precision-recall curve)

For each class, the predicted boxes are ordered from highest to lowest confidence.

Define a series of thresholds $[t_1, t_2, ..., t_n]$, where $t_1$ is the confidence of the lowest-confidence predicted box and $t_n$ is the confidence of the highest-confidence predicted box.

For each threshold $t_i$, treat predicted boxes with confidence higher than $t_i$ as positive predictions and boxes with confidence lower than $t_i$ as negatives (discarded). The retained boxes are then matched against the ground-truth boxes by IoU, giving a set of matched positive samples and their corresponding IoU values.

Compute the precision and recall at this threshold: precision is the number of matched (true-positive) predicted boxes divided by the total number of predicted boxes above the threshold, and recall is the number of matched predicted boxes divided by the total number of ground-truth boxes.

Using the precision and recall values at all thresholds, draw the precision-recall curve: recall on the horizontal axis and precision on the vertical axis, with each point at coordinates $(R_i, P_i)$, where $P_i$ and $R_i$ are the precision and recall obtained at threshold $t_i$. A code sketch of this whole procedure is given below.
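The following is a minimal, single-class sketch of the steps above (the (x1, y1, x2, y2) box format, the IoU threshold of 0.5, and the greedy matching rule are assumptions for illustration; real evaluators such as the VOC/COCO tools differ in details):

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def pr_curve(detections, gt_boxes, iou_thr=0.5):
    """detections: list of (confidence, box) for one class; gt_boxes: list of boxes.
    Returns precision and recall arrays obtained by sweeping the confidence threshold."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched_gt = set()
    tp = np.zeros(len(detections))
    for i, (conf, box) in enumerate(detections):
        # Greedily match each detection to the unmatched ground-truth box with highest IoU.
        ious = [(iou(box, g), j) for j, g in enumerate(gt_boxes) if j not in matched_gt]
        if ious:
            best_iou, best_j = max(ious)
            if best_iou >= iou_thr:
                tp[i] = 1
                matched_gt.add(best_j)
    tp_cum = np.cumsum(tp)                                   # true positives among the top-k detections
    precision = tp_cum / np.arange(1, len(detections) + 1)   # TP / (number of predictions kept)
    recall = tp_cum / max(len(gt_boxes), 1)                  # TP / (number of ground-truth boxes)
    return precision, recall
```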

AP value

AP measures the area under the precision-recall curve and represents the model's average precision across all recall levels. To compute AP, the area under the curve is integrated according to the shape of the precision-recall curve (in practice, by summing over a discretised set of recall points). AP ranges from 0 to 1, and the closer it is to 1, the better the model.
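A common way to compute this area is the "all-point interpolation" used by many detection benchmarks: first make the precision monotonically non-increasing, then sum the rectangles between consecutive recall values. A small sketch, assuming the precision/recall arrays come from the pr_curve function above:

```python
import numpy as np

def average_precision(precision, recall):
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    # Make precision monotonically non-increasing from right to left (interpolation step).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangular areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```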

mAP (mean Average Precision)

Let's take the object detection task as an example. Suppose our model is asked to detect objects in an image that contains three kinds of fruit: apples, oranges, and bananas. After running the model on this image, we obtain its detections.

First, we sort all the detection results from high to low confidence, and get the following results:
| Category | Confidence | Bounding box |
| -------- | ---------- | ------------------ |
| Apple    | 0.8        | (10, 10, 100, 100) |
| Orange   | 0.7        | (20, 20, 120, 120) |
| Apple    | 0.6        | (50, 50, 150, 150) |
| Banana   | 0.5        | (30, 30, 130, 130) |
| Orange   | 0.4        | (60, 60, 160, 160) |
Then, we follow the steps above to calculate the AP value for each category. Taking the apple category as an example, we can get the following precision-recall table:
| Confidence threshold | Precision | Recall |
| -------------------- | --------- | ------ |
| 0.4                  | 0.5       | 1.0    |
| 0.6                  | 0.67      | 0.67   |
| 0.8                  | 1.0       | 0.67   |
From this table we can draw the precision-recall curve of the apple category and compute its AP value. In the same way, we can compute the AP values for the orange and banana categories. Finally, the mAP value is obtained by averaging the AP values of all categories.
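A minimal sketch of this last step, assuming the per-class AP values (hypothetical numbers here) have already been computed, e.g. with the average_precision function above:

```python
import numpy as np

# Hypothetical per-class AP values for illustration.
ap_per_class = {"apple": 0.72, "orange": 0.65, "banana": 0.58}
mAP = float(np.mean(list(ap_per_class.values())))
print(mAP)  # 0.65
```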
When plotting these curves, we usually draw the precision-recall curves of all categories in the same figure for comparison. We can also plot the curves under different IoU thresholds to evaluate the model's performance at each IoU threshold.

Computations in the COCO dataset

On the COCO dataset, the mAP metric is widely used to evaluate object detection and segmentation tasks. COCO is a large dataset with more than 330,000 images annotated with objects from 80 categories, so evaluation of detection and segmentation on this dataset is very representative.
The way mAP is computed on the COCO dataset differs slightly from the general mAP computation described above. The formulas are:
$$AP = \frac{1}{|R|} \sum_{i=1}^{|R|} P_i \Delta r_i$$
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
Here $R$ is the set of all ground-truth annotation boxes and $N$ is the number of test images. In the AP formula, $P_i$ is the precision at the $i$-th recall point and $\Delta r_i$ is the corresponding change in recall. In the mAP formula, $AP_i$ is the average precision of the $i$-th test image, obtained by averaging the AP over all object categories appearing in that image.

For each test image, the average precision (AP) of each object category in the image is computed, and the mean over all categories is taken as the AP of that image. Finally, averaging over all test images gives the mAP on the COCO dataset.
In addition, the COCO evaluation pays special attention to object scale: the main mAP is averaged over IoU thresholds from 0.5 to 0.95, and AP is also reported separately for small, medium, and large objects, so that the model's ability to detect and segment objects of different sizes can be evaluated more accurately.
In practice, open-source tools such as the COCO API (pycocotools) and standalone mAP calculators can be used to compute the COCO mAP metrics conveniently.
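As a sketch of how this is typically done with the official COCO API (the annotation and result file names below are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder file names: ground-truth annotations and detections in COCO JSON format.
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("detections.json")

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")  # use "segm" for segmentation masks
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP over IoU 0.50:0.95, AP50, AP75, and AP by object size
```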

F1 value (F1-Score)

The F1 value is the harmonic mean of precision and recall, which is used to measure the overall performance of the model. The higher the F1 value, the better the overall performance of the model.
$$F1 = 2\cdot\frac{Precision \cdot Recall}{Precision + Recall}$$
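A minimal check with scikit-learn, on the same toy labels used in the classification sketches above:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # same toy labels as above
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

f1 = f1_score(y_true, y_pred)
# Equivalent to 2 * precision * recall / (precision + recall)
print(f1)  # 0.75 for these toy values
```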

The harmonic mean is a type of mean in statistics: it is the reciprocal of the arithmetic mean of the reciprocals. Specifically, the harmonic mean of $n$ positive numbers $x_1, x_2, ..., x_n$ can be written as:
$$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + ... + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Unlike the arithmetic mean, geometric mean, and median, the harmonic mean is more sensitive to small values: a small value has a large reciprocal, so even a single small number in the data pulls the harmonic mean down sharply. For this reason the harmonic mean is often used where extreme values matter, such as averaging speeds or rates; in the F1 score it ensures the score is high only when precision and recall are both high.
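A quick numerical illustration of this sensitivity (toy numbers): with precision 0.9 and recall 0.1, the arithmetic mean is 0.5 but the harmonic mean (i.e. F1) is only 0.18.

```python
def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(xs) / sum(1.0 / x for x in xs)

print((0.9 + 0.1) / 2)            # arithmetic mean: 0.5
print(harmonic_mean([0.9, 0.1]))  # harmonic mean: 0.18
```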

When to use PR curve and when to use ROC curve

The PR curve and the ROC curve are both commonly used for evaluating classification models, but they suit different scenarios.
The PR curve is mainly used when the proportions of positive and negative samples are unbalanced, for example when positive samples are very rare. Its horizontal axis is recall and its vertical axis is precision, reflecting the model's performance at different recall levels. The advantage of the PR curve is that it better reflects the model's ability to classify positive samples; when positives are rare, it exposes performance differences between models more clearly.

The ROC curve is suitable for situations where the proportion of positive and negative samples is relatively balanced. The horizontal axis of the ROC curve is the false positive rate, and the vertical axis is the true positive rate, which reflects the classification performance of the model under different thresholds. In the ROC curve, the point (0,0) means that all samples are predicted as negative samples, and the point (1,1) means that all samples are predicted as positive samples. The closer the curve is to the upper left corner, the better the performance of the model.

In general, when the ratio of positive and negative samples is unbalanced, we can use the PR curve for model performance evaluation; when the ratio of positive and negative samples is relatively balanced, we can use the ROC curve for model performance evaluation.

Confusion matrix

The confusion matrix is a matrix that visualizes classification results; from it, metrics such as accuracy, precision, and recall can be computed. Its rows represent the actual classes, its columns represent the predicted classes, and each element counts the samples that fall into that combination of actual and predicted class.
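A minimal sketch with scikit-learn, again on the toy labels used above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # same toy labels as above
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[3 1]   -> [[TN, FP],
           #  [1 3]]      [FN, TP]] for binary 0/1 labels
```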

Regression problems

Mean squared error (MSE)
Mean squared error is one of the most commonly used evaluation indicators in regression problems, and it is used to measure the difference between the predicted value of the model and the true value. The smaller the mean square error, the better the predictive performance of the model.
$$MSE=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y_i})^2$$
where $n$ is the number of samples, $y_i$ is the true value of the $i$-th sample, and $\hat{y_i}$ is the model's predicted value.

Root mean square error (RMSE)
Root mean square error is the square root of the mean square error, which is also used to measure the difference between the model's predicted value and the true value. The smaller the root mean square error, the better the predictive performance of the model.
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y_i})^2}$$

Mean Absolute Error (MAE)
Mean Absolute Error is another commonly used evaluation metric in regression problems, and it is also used to measure the difference between the model's predicted value and the true value. The smaller the mean absolute error, the better the predictive performance of the model.
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y_i}\right|$$
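A minimal sketch computing all three regression metrics with NumPy and scikit-learn (toy values):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Toy true values and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # 0.375
rmse = np.sqrt(mse)                        # ~0.612
mae = mean_absolute_error(y_true, y_pred)  # 0.5
print(mse, rmse, mae)
```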


Origin blog.csdn.net/hh1357102/article/details/131376169