Machine Learning Evaluation Metrics - f1, precision, recall, acc, MCC

1 Introduction to TP, TN, FP, FN

        TP, TN, FP, FN are the counts obtained from the prediction results of a binary classification task; these four values make up the confusion matrix.

        The confusion matrix is laid out as follows:

                                    Predicted 0 (human)    Predicted 1 (fake)
                True 0 (human)              TN                     FP
                True 1 (fake)               FN                     TP

        The rows on the left represent the true label: human is marked 0 and fake (bot) is marked 1;

        The columns represent the predicted label;

        Therefore: TN (True Negative -- the prediction is 0 and it is correct) means the predicted label is 0 (human) and the true label is also 0 (human);

                     FN (False Negative -- the prediction is 0 and it is wrong) means the predicted label is 0 (human) but the true label is 1 (fake);

                     FP (False Positive -- the prediction is 1 and it is wrong) means the predicted label is 1 (fake) but the true label is 0 (human);

                     TP (True Positive -- the prediction is 1 and it is correct) means the predicted label is 1 (fake) and the true label is also 1 (fake);
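
        The four counts can be tallied directly from the label lists. The following is a minimal sketch (not from the original post), assuming labels are plain Python lists with 1 = fake/bot as the positive class and 0 = human as the negative class:

```python
# Minimal sketch: count TP, TN, FP, FN from true and predicted labels.
# Assumes the positive class is 1 (fake/bot) and the negative class is 0 (human).

def confusion_counts(y_true, y_pred):
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1          # predicted fake, actually fake
        elif p == 0 and t == 0:
            tn += 1          # predicted human, actually human
        elif p == 1 and t == 0:
            fp += 1          # predicted fake, actually human
        else:                # p == 0 and t == 1
            fn += 1          # predicted human, actually fake
    return tp, tn, fp, fn

# Example usage with made-up labels:
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # -> (3, 3, 1, 1)
```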

2 Introduction to f1, precision, recall, acc, MCC

        f1, precision, recall, acc, and MCC are all calculated from the four values of the confusion matrix above;

        Calculation formulas:

        acc = \frac{TP + TN}{TP+TN+FP+FN} 

                acc (accuracy) measures what fraction of all samples, across both classes, is predicted correctly;

        recall = \frac{TP}{TP+FN}

                recall is the ratio of correctly predicted bots (TP) to the total number of actual bots (TP + FN); that is, of all real bots, how many were found;

        Precision = \frac{TP}{TP + FP}

                Precision is the ratio of correctly predicted bots (TP) to the total number of samples predicted as bots (TP + FP); that is, of all samples flagged as bots, how many really are bots;

        f1 = \frac{2*Precision*recall}{Precision+recall}

        MCC = \frac{TP*TN - FP*FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}

        f1 and MCC are composite evaluation metrics;
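
        Putting the formulas together, here is a minimal sketch in Python (not from the original post; the zero-division guards are an added assumption, since the formulas above do not cover degenerate cases):

```python
import math

# Minimal sketch: compute acc, precision, recall, f1 and MCC from the four
# confusion-matrix counts.

def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, precision, recall, f1, mcc

# Example usage with the counts from the sketch above:
print(metrics(3, 3, 1, 1))  # acc=0.75, precision=0.75, recall=0.75, f1=0.75, mcc=0.5
```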

Analysis of the advantages and disadvantages of the above five metrics:

        Accuracy (acc) measures how many samples are correctly identified across both classes, but it does not indicate whether one class is recognized better than the other;

        High precision (Precision) indicates that most samples identified as 1 (bot) really are bots, but it says nothing about the 1 (bot) samples that were missed;

        That information is provided by recall, which indicates how many samples in the entire set of 1 (bot) samples were correctly identified: low recall means that many 1 (bot) samples were missed;

        F1 and MCC attempt to convey the quality of the prediction in a single value by combining the metrics above.

        MCC is considered an unbiased version of F1 because it uses all four elements of the confusion matrix. An MCC value close to 1 indicates that the prediction is very accurate; a value close to 0 means that the prediction is no better than random guessing, and a value close to -1 means that the prediction is strongly inconsistent with the true class.
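
        To make the imbalance point concrete, here is a hypothetical example (not from the original post): a classifier that labels every sample as 0 (human) on a set of 95 humans and 5 bots gets a high accuracy, while recall and the composite metrics expose the failure:

```python
# Hypothetical imbalanced example: 95 humans (0), 5 bots (1), and a classifier
# that predicts 0 (human) for every sample.
# Resulting counts: TP = 0, TN = 95, FP = 0, FN = 5.
tp, tn, fp, fn = 0, 95, 0, 5

acc = (tp + tn) / (tp + tn + fp + fn)  # 0.95 -- looks excellent
recall = tp / (tp + fn)                # 0.0  -- every bot was missed
# precision and f1 collapse to 0 here (TP + FP = 0), and MCC is taken as 0
# because one factor of its denominator is 0: the composite metrics show the
# classifier is no better than always guessing the majority class.
print(acc, recall)  # -> 0.95 0.0
```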

        


Origin blog.csdn.net/qq_40671063/article/details/126954237