Performance evaluation of algorithms in machine learning

Evaluation metrics in machine learning and recommendation systems: (1) Accuracy; (2) Error rate; (3) Precision; (4) Recall; (5) Sensitivity; (6) Specificity; (7) F-Measure / comprehensive evaluation index; (8) Calculation speed; (9) Robustness; (10) Scalability; (11) Interpretability; (12) ROC curve (Receiver Operating Characteristic); (13) PR curve (Precision-Recall curve); (14) AUC (Area Under the Curve).

Before introducing the above metrics, a few concepts must be understood (the "confusion matrix"): True and False indicate whether the prediction is correct, i.e. whether a prediction of P or N turned out to be right; Positive and Negative indicate the predicted class, i.e. whether the sample is predicted as P or N.

True Positive (TP): among the samples predicted as positive, the number predicted correctly (the true class is positive and the predicted class is positive), i.e. the number of actual positives predicted as positive. (Count the correct predictions among the Positive predictions.)

False Positive (FP): among the samples predicted as positive, the number predicted incorrectly (the true class is negative but the predicted class is positive), i.e. the number of actual negatives predicted as positive (false alarms). (Count the wrong predictions among the Positive predictions.)

True Negative (TN): among the samples predicted as negative, the number predicted correctly (the true class is negative and the predicted class is negative), i.e. the number of actual negatives predicted as negative. (Count the correct predictions among the Negative predictions.)

False Negative (FN): among the samples predicted as negative, the number predicted incorrectly (the true class is positive but the predicted class is negative), i.e. the number of actual positives predicted as negative (misses). (Count the wrong predictions among the Negative predictions.)

          Positive    Negative
True      TP          TN
False     FP          FN

Actual class vs. predicted class:

                              Predicted Yes/P         Predicted No/N          Total
Actually Yes (positive)       TP                      FN                      P (actually Yes)
Actually No (negative)        FP                      TN                      N (actually No)
Total                         P' (predicted as P)     N' (predicted as N)     P + N
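
A minimal sketch of how the four counts relate to a set of predictions; the label arrays below are hypothetical example values, and scikit-learn is assumed to be available only as a cross-check:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical example labels: 1 = positive, 0 = negative.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])   # actual classes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0])   # predicted classes

# Count the four confusion-matrix cells directly from the definitions above.
TP = int(np.sum((y_pred == 1) & (y_true == 1)))   # predicted P, actually P
FP = int(np.sum((y_pred == 1) & (y_true == 0)))   # predicted P, actually N
TN = int(np.sum((y_pred == 0) & (y_true == 0)))   # predicted N, actually N
FN = int(np.sum((y_pred == 0) & (y_true == 1)))   # predicted N, actually P

print(TP, FP, TN, FN)                             # 3 1 4 2
# scikit-learn arranges the same counts as [[TN, FP], [FN, TP]] for labels [0, 1]
print(confusion_matrix(y_true, y_pred))
```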

(1) Accuracy calculation formula A: (the number of correctly classified samples divided by the total number of samples)

                                                               ACC=A=\frac{TP+TN}{TP+TN+FP+FN}

(Accuracy is the number of correct predictions (TP and TN) divided by all predictions; T means the prediction was correct, whether the predicted class was P or N.)

The higher the accuracy, the better the classifier.
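
A minimal sketch of the accuracy formula, reusing the hypothetical example counts from the confusion-matrix sketch above:

```python
TP, FP, TN, FN = 3, 1, 4, 2                 # example counts from the sketch above

accuracy = (TP + TN) / (TP + TN + FP + FN)  # correctly classified / all samples
print(accuracy)                             # 0.7
```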

(2) Error rate calculation formula: (the error rate is the complement of the accuracy and describes the proportion of samples the classifier misclassifies)

                                                               error\ rate=\frac{FP+FN}{TP+TN+FP+FN}

Correct and incorrect predictions are mutually exclusive events, so accuracy = 1 - error rate.
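
A short sketch showing the complement relationship, with the same hypothetical counts as above:

```python
TP, FP, TN, FN = 3, 1, 4, 2                        # same example counts as above

error_rate = (FP + FN) / (TP + TN + FP + FN)       # 0.3
accuracy = (TP + TN) / (TP + TN + FP + FN)         # 0.7
assert abs(error_rate - (1 - accuracy)) < 1e-12    # error rate is the complement of accuracy
```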

(3) Precision calculation formula:

                                                              P=\frac{TP}{TP+FP}

Indicates the proportion of samples classified as positive that are actually positive.

(Precision and recall influence each other. Ideally both are high, but usually a high precision comes with a low recall and a high recall comes with a low precision; if both are low, something is wrong with the approach.)
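
A minimal sketch of the precision formula, reusing the hypothetical counts from above:

```python
TP, FP = 3, 1                  # example counts from the sketch above

precision = TP / (TP + FP)     # of everything predicted positive, how much really is positive
print(precision)               # 0.75
```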

(4) Recall calculation formula R:

                                                                recall=R=\frac{TP}{TP+FN}=\frac{TP}{P}=sensitive

Recall is a measure of coverage: it measures how many of the actual positive examples are classified as positive. (The actual positives consist of those correctly classified as positive, TP, and those wrongly classified as negative, FN.)

It can be seen that the recall rate and sensitivity are the same.
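
A minimal sketch of the recall formula, reusing the hypothetical counts from above:

```python
TP, FN = 3, 2               # example counts from the sketch above

recall = TP / (TP + FN)     # of all actual positives, how many were found
print(recall)               # 0.6
```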

(5) Sensitivity calculation formula: (represents the proportion of all actual positives P that are correctly predicted, TP, and measures the classifier's ability to recognize positive examples)

                                                               sensitive=\frac{TP}{P}

(6) Specificity calculation formula: (represents the proportion of all actual negatives N that are correctly predicted, TN, and measures the classifier's ability to recognize negative examples)

                                                              specificity=\frac{TN}{N}
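
A minimal sketch covering both (5) and (6), reusing the hypothetical counts from above:

```python
TP, FP, TN, FN = 3, 1, 4, 2            # example counts from the sketch above

sensitivity = TP / (TP + FN)           # = recall: share of actual positives recognized
specificity = TN / (TN + FP)           # share of actual negatives recognized
print(sensitivity, specificity)        # 0.6 0.8
```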

(7) Introduction to F-Measure / comprehensive evaluation index (also called F-Score):

When P and R conflict with each other, and both the precision P and the recall R need to be high, the F value is used to measure P and R jointly:

F-Measure is the weighted harmonic average of Precision and Recall:

                                                             F=\frac{\left ( a^{2}+1 \right )P \cdot R}{a^{2}P+R}

When the parameter a = 1, this gives the most common F1 (F1 score), that is

                                                                F1=\frac{2*P*R}{P+R}

It can be seen that F1 combines the results of P and R. A higher F1 indicates a more effective method; in general, when comparing multiple candidate models, the one with the higher F1 score is better.
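
A minimal sketch of the weighted F-measure; the precision and recall values are the hypothetical ones computed above, and the helper function f_measure is introduced here only for illustration:

```python
P, R = 0.75, 0.6                      # precision and recall from the sketches above

def f_measure(p, r, a=1.0):
    # weighted harmonic mean of precision and recall; a plays the role of beta
    return (a**2 + 1) * p * r / (a**2 * p + r)

print(f_measure(P, R))                # F1 = 2*P*R / (P+R) ≈ 0.667
print(f_measure(P, R, a=2.0))         # F2 weights recall more heavily: 0.625
```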

(8) Calculation speed: the time required for classifier training and prediction.

(9) Robustness: the ability to deal with missing values and outliers.

(10) Scalability: the ability to handle large data sets.

(11) Interpretability: how understandable the classifier's decision criteria are. The rules generated by a decision tree are easy to understand, while the many parameters of a neural network are not, so it has to be treated as a black box.

(12) ROC curve (Receiver Operating Characteristic):

The ROC curve plots the false positive rate FP_rate (the proportion of actual negatives that are predicted as positive) on the horizontal axis against the true positive rate TP_rate (the proportion of actual positives that are predicted as positive) on the vertical axis. The area under the ROC curve is called AUC; generally, the larger the AUC value, the better the model.
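
A minimal sketch of computing the ROC points with scikit-learn; y_true and the continuous scores y_score below are hypothetical example values:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical labels and continuous classifier scores (e.g. predicted probabilities).
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3, 0.2, 0.45])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FP_rate (x-axis), TP_rate (y-axis)
print(auc(fpr, tpr))                                # area under the ROC curve (AUC)
```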

 

(13) PR curve (Precision-Recall curve)

Suppose N_c >> P_c (that is, the number of negative samples is far greater than the number of positive samples). Even if FP is large, i.e. many negative samples are predicted as positive, FP_rate = FP / N remains small because N is huge, so the ROC curve would suggest the model performs well when in fact it does not. Precision, however, considers TP and FP together, so under extremely imbalanced data (very few positive samples) the PR curve may be more informative than the ROC curve.
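
A minimal sketch of computing the PR curve with scikit-learn, using the same hypothetical labels and scores as in the ROC sketch above:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Same hypothetical labels and scores as in the ROC sketch above.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3, 0.2, 0.45])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(average_precision_score(y_true, y_score))   # one-number summary of the PR curve
```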

(14) AUC (Area Under the Curve)

The area under the ROC curve is a measure of the ability of the classifier to classify, and is used as a summary of the ROC curve. The higher the AUC, the better the model's performance in distinguishing between positive and negative classes.

1) When AUC = 1, the classifier can correctly separate all positive and negative samples. If AUC = 0, the classifier ranks every negative above every positive, i.e. it predicts all negatives as positives and all positives as negatives.

2) When 0.5 < AUC < 1, the classifier separates positives from negatives better than chance, because it produces more true positives and true negatives than false positives and false negatives.

3) When AUC = 0.5, the classifier cannot distinguish positives from negatives; its output is no better than predicting a random class or a constant class for every data point.

Therefore, the higher the AUC value of the classifier, the better its ability to distinguish between positive and negative classes.
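
A minimal sketch of computing AUC directly with scikit-learn, again with the same hypothetical labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Same hypothetical labels and scores as in the ROC sketch above.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3, 0.2, 0.45])

print(roc_auc_score(y_true, y_score))   # 0.92 here; 1.0 = perfect ranking, 0.5 = random
```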

Reference link: http://www.jisuapp.cn/news/8752.html

 


Origin blog.csdn.net/m0_37957160/article/details/108375698