Performance evaluation of object detection and classification

  When building a deep learning network model, we want it to be fast, small in memory, and highly accurate, so its performance needs to be evaluated with quantitative metrics. The metrics commonly used are: mAP (mean Average Precision, an accuracy metric), FPS (the number of images processed per second, or equivalently the processing time per image, a speed metric under identical hardware conditions), and the size of the model parameters (a memory metric).

1. mAP (mean Average Precision)

    mAP is the average of the AP over all categories, and AP is the area under the PR curve (the Precision-Recall curve). So we first need to understand Precision, Recall, and the related metrics Accuracy, F-measure, and the ROC curve.

  Recall and Precision are commonly used to evaluate binary classification problems. Usually the category of interest is taken as the positive class and all other categories as the negative class; the classifier's results on the test data then fall into four cases:

    Predicted positive, actually positive: TP (True Positive)
    Predicted positive, actually negative: FP (False Positive)
    Predicted negative, actually positive: FN (False Negative)
    Predicted negative, actually negative: TN (True Negative)

    Precision = TP / (TP + FP),    Recall = TP / (TP + FN)

   Example calculation of Recall and Precision:

    Suppose we train a cat-recognition model on some dataset, and the test set contains 100 samples: 60 cats and 40 dogs. The test results show 52 images predicted as cats, of which 50 really are cats; in other words, the model missed 10 cats, and 2 of its detections are false positives. Because cats are the class we care about more, we treat cat as the positive class, so: TP = 50, TN = 38, FN = 10, FP = 2, and therefore P = 50/52, R = 50/60, acc = (50 + 38) / (50 + 38 + 10 + 2) = 0.88.
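    As a quick check, the numbers above can be reproduced with a small Python snippet (a minimal sketch; the counts are taken directly from the example):

def precision_recall_accuracy(tp, fp, fn, tn):
    precision = tp / (tp + fp)                    # of all samples predicted as cat, how many are cats
    recall = tp / (tp + fn)                       # of all real cats, how many were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)    # overall fraction of correct predictions
    return precision, recall, accuracy

p, r, acc = precision_recall_accuracy(tp=50, fp=2, fn=10, tn=38)
print(p, r, acc)   # 0.9615..., 0.8333..., 0.88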

  Why introduce Precision and Recall:

    Recall and Precision measure a model's performance along two different dimensions. In image classification tasks, accuracy is often the quantity examined, as in the ImageNet evaluation criterion. But for a single category, if recall is relatively high while precision is low, say most vehicles are recognized but many trucks are misidentified as cars, that points to one kind of problem; if recall is low while precision is high, say the detected aircraft are all correct but many aircraft are never found, that points to another.

    Recall measures "completeness": whether all positive samples were detected. For example, in tumor prediction the model needs high recall, because no tumor may be missed.

    Precision measures "exactness": whether the samples predicted as positive really are positive. For example, in spam filtering high precision is required, to guarantee that what goes into the junk folder really is spam.

  F-score / F-measure:

    The analysis above shows that precision and recall reflect two different aspects of classifier performance, and neither alone can evaluate a classifier comprehensively. In general, the higher the precision, the lower the recall, and vice versa. To balance the influence of precision and recall and evaluate a classifier more comprehensively, the F-score is introduced as a combined metric.

    The F-score is the weighted harmonic mean of precision and recall, calculated as follows:

        F_β = (1 + β²) · P · R / (β² · P + R)

    where the value of β reflects the relative importance of precision and recall in the particular evaluation; usually β is taken to be 1. Specifically:

      (1) When β = 1, we get the commonly used F1 value, which treats precision and recall as equally important and is calculated as F1 = 2 · P · R / (P + R);

      (2) When β > 1, recall is weighted more heavily than precision;

      (3) When β < 1, precision is weighted more heavily than recall.
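    A minimal sketch of this formula in Python (the function name f_score is just for illustration); the calls below reuse P and R from the cat example:

def f_score(precision, recall, beta=1.0):
    # weighted harmonic mean; beta > 1 favours recall, beta < 1 favours precision
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_score(50/52, 50/60))             # F1 ≈ 0.8929
print(f_score(50/52, 50/60, beta=2.0))   # F2 weights recall more heavily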

  Accuracy: an overall measure of how correct the model's predictions are, calculated as Accuracy = (TP + TN) / (TP + TN + FP + FN).

  AP / PR curve:

    Take recall as the abscissa and precision as the ordinate, and plot the precision obtained at different recall values; this gives the Precision-Recall (PR) curve. AP is the area under the PR curve, defined as AP = ∫₀¹ P(R) dR.

    An example makes this easier to understand.

    Classification:

        Suppose there are 100 images to be classified into three classes: cat, dog and chicken. The 100 images have 100 ground-truth labels, and the model's classification gives 100 corresponding predictions. If we take only the top 10 predictions (by confidence), we can count how many of them were predicted as cat and how many of those really are cats, which gives one (recall, precision) pair for the cat class. Taking the top 20 predictions gives a second pair in the same way, and continuing up to all 100 predictions yields 10 (recall, precision) pairs for the cat class, which can then be plotted. Note that as more predictions are included, recall can only increase or stay the same (the more predictions we include, the more cats are predicted, so recall must rise or stay flat); if recall stays the same after including more predictions, one recall value corresponds to two precision values, and the larger precision is generally kept. If instead we add one prediction at a time, we get roughly 100 (recall, precision) pairs, from which the PR curve for the cat class can be drawn; the area under it is the cat class's AP (Average Precision). Drawing the PR curves of the dog and chicken classes in the same way gives three AP values, and their average is the model's final mAP (mean Average Precision).

        [Figure: three example PR curves A, B and C]
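    The per-class procedure described above can be sketched in a few lines of Python (a minimal sketch for one binary class such as "cat"; numpy is assumed):

import numpy as np

def average_precision(scores, is_cat):
    # scores: the model's confidence that each image is a cat; is_cat: ground-truth flags
    order = np.argsort(scores)[::-1]                  # rank predictions by confidence, highest first
    hits = np.asarray(is_cat, dtype=float)[order]
    tp = np.cumsum(hits)                              # true positives within the top-1, top-2, ... predictions
    precision = tp / np.arange(1, len(hits) + 1)
    recall = tp / hits.sum()
    # when recall is unchanged, keep the larger precision (upper envelope of the PR curve)
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # area under the PR curve: sum precision over the steps where recall increases
    return precision[0] * recall[0] + np.sum((recall[1:] - recall[:-1]) * precision[1:])

print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]))   # AP for a toy set of 4 predictions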

    Object Detection:

      In object detection there is additionally the IoU (Intersection over Union), and whether a detection counts as a TP (True Positive) is determined by comparing the IoU between the detected bbox and the ground-truth bbox against a threshold. For example, with an IoU threshold of 0.7, a detection whose IoU is greater than 0.7 is judged a TP, otherwise an FP. So different IoU thresholds give different mAP values; averaging these values gives mmAP, and unless stated otherwise, mAP usually refers to this mmAP.

      Thus mAP for object detection is calculated as follows: given a set of IoU thresholds, at each IoU threshold compute the AP of every category and average them; this is the detection performance at that IoU threshold, called mAP (for example, mAP@0.5 denotes the mAP at an IoU threshold of 0.5). Finally, average the mAP over all IoU thresholds to obtain the final evaluation metric, mmAP.
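      A minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) form (the 0.7 threshold follows the example above):

def iou(box_a, box_b):
    # intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# a detection counts as TP only if its IoU with a ground-truth box exceeds the threshold
print(iou((0, 0, 10, 10), (5, 5, 15, 15)) > 0.7)   # False: IoU = 25 / 175 ≈ 0.14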

 

   ROC curve and AUC:

    Besides plotting the PR curve and computing AP, the ROC curve and AUC are sometimes used instead (see the references below).

    ROC (receiver operating characteristic): the ROC curve describes the relationship between TPR and FPR, with TPR as the ordinate and FPR as the abscissa, calculated as TPR = TP / (TP + FN) and FPR = FP / (FP + TN).

    AUC (area under curve): represents the area under the ROC curve.
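    Using the counts from the cat example, TPR and FPR at one operating point could be computed as below; sweeping the decision threshold traces out the full ROC curve, and the AUC is the area under it (a minimal sketch):

def tpr_fpr(tp, fn, fp, tn):
    tpr = tp / (tp + fn)   # true positive rate (identical to recall)
    fpr = fp / (fp + tn)   # false positive rate
    return tpr, fpr

print(tpr_fpr(tp=50, fn=10, fp=2, tn=38))   # (0.8333..., 0.05)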

 

 

 Reference: https://zhuanlan.zhihu.com/p/43068926

    https://zhuanlan.zhihu.com/p/55575423

    https://zhuanlan.zhihu.com/p/70306015

    https://zhuanlan.zhihu.com/p/30953081

 

 2. FLOPs (Floating Point Operations)

     FLOPs: (Floating Point Operations), with a lowercase s, is the number of floating-point operations and can be understood as the amount of computation. It is used to measure the complexity of an algorithm or model. Papers often report GFLOPs (1 GFLOPs = 10^9 FLOPs).

     FLOPS: (Floating Point Operations per Second), with a capital S, is the number of floating-point operations performed per second and can be understood as computation speed. It is a hardware performance metric.

    FLOPs are generally computed to measure a model's complexity: the smaller the FLOPs, the less computation is required and the faster the model runs. For convolutional and fully connected layers, commonly used approximations are:

        Conv layer:  FLOPs ≈ 2 · H · W · C_in · K² · C_out   (H, W: output feature-map size; K: kernel size)
        FC layer:    FLOPs ≈ 2 · I · O                        (I, O: input and output dimensions)

    where the factor 2 counts one multiplication and one addition.

    In addition, MAC (Memory Access Cost, the cost of memory accesses) is also used to measure a model's running speed; it is generally taken as MAC = 2 · FLOPs (one addition and one multiplication each counted once).
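    A minimal sketch of the FLOPs approximations above in Python (bias terms ignored; the layer sizes below are just an AlexNet-style illustration, not taken from the original post):

def conv_flops(h_out, w_out, c_in, c_out, k):
    # roughly 2 operations (one multiply + one add) per kernel weight per output position
    return 2 * h_out * w_out * c_in * k * k * c_out

def fc_flops(n_in, n_out):
    return 2 * n_in * n_out

# e.g. an 11x11 conv with 96 filters on a 3-channel input producing a 55x55 feature map
print(conv_flops(55, 55, 3, 96, 11) / 1e9, "GFLOPs")   # ≈ 0.21 GFLOPs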

    There is a PyTorch-based package, torchstat, that can compute a model's FLOPs, parameter size and other statistics. Sample code:

from torchstat import stat
import torchvision.models as models

model = models.alexnet()      # build the model to be analysed
stat(model, (3, 224, 224))    # report params, FLOPs, memory etc. for a 3x224x224 input

 

3. Model parameter size

 

   The number of parameters is commonly used to measure how much memory a model needs. By parameter count, models can be roughly divided into large models such as VGG, GoogLeNet and ResNet, and lightweight models such as SqueezeNet, MobileNet and ShuffleNet. The FLOPs and parameter counts of some common models are compared below:

   [Table: FLOPs and parameter counts of common models]
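   The parameter count (and a rough memory estimate) can also be read directly from a PyTorch model without extra tooling (a minimal sketch; AlexNet is used as the example and float32 storage is assumed):

import torchvision.models as models

model = models.alexnet()
n_params = sum(p.numel() for p in model.parameters())
print(n_params / 1e6, "M parameters")              # AlexNet has ~61.1 M parameters
print(n_params * 4 / 1024 ** 2, "MB as float32")   # rough memory footprint of the weights, ≈ 233 MB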

 

 

 

 

 

 

Reference: https://www.zhihu.com/question/65305385

     https://zhuanlan.zhihu.com/p/67009992


Origin: www.cnblogs.com/silence-cho/p/11619546.html