Classification metrics: accuracy, precision, recall, F1 score, ROC, AUC, macro average, micro average, weighted average

This article will introduce:

  • Confusion matrix
  • Accuracy
  • Recall
  • Precision
  • F1 score
  • ROC and AUC
  • Macro average (macro avg)
  • Micro average (micro avg)
  • Weighted average (weighted avg)

1. Confusion Matrix

For an n-class model, the results are summarized in a matrix with n rows and n columns. Each row corresponds to one true category, and the n entries in that row give how many of its samples were predicted as each of the n categories. A perfect prediction puts all counts on the main diagonal, where the row index equals the column index.
(Figure: confusion matrix layout showing TP, FP, FN, and TN)

  • TP (True Positive): the model predicts positive, and the prediction is correct (the sample is actually positive)
  • FP (False Positive): the model predicts positive, but the prediction is wrong (the sample is actually negative)
  • TN (True Negative): the model predicts negative, and the prediction is correct (the sample is actually negative)
  • FN (False Negative): the model predicts negative, but the prediction is wrong (the sample is actually positive)
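
As a quick sketch (the labels here are made up for illustration and are not from this article), scikit-learn's confusion_matrix computes the matrix described above:

```python
from sklearn.metrics import confusion_matrix

# Made-up ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels=[0, 1], rows are the true classes and columns the predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # TP=3, FP=1, TN=3, FN=1
```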

2. Accuracy

Calculation formula: accuracy = (TP + TN) / (TP + FP + TN + FN)

In imbalanced classification problems with few positive examples and many negative examples (e.g., disease diagnosis or terrorist screening), a model that predicts every sample as negative can still achieve a high accuracy, as the sketch below illustrates. This is why recall is introduced.
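
A minimal sketch of this pitfall, assuming a made-up dataset with 95 negative and 5 positive samples: a "model" that always predicts negative still reaches 95% accuracy while finding none of the positives.

```python
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced toy labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the negative class
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks good, but is misleading
print(recall_score(y_true, y_pred))    # 0.0  -- not a single positive example was found
```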

3. Recall

Recall can be understood as the ability of a model to find all data points of interest in a dataset.

Calculation formula: recall = TP / (TP + FN), i.e., of all the actual positive samples, how many were retrieved.

If we predict every individual as a positive sample, the model's recall is 1.0, but such a model is obviously useless, so precision is introduced next.
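
Continuing the same made-up imbalanced labels, the sketch below shows this degenerate strategy: predicting everything as positive gives a recall of 1.0 but, as the next section explains, a very poor precision.

```python
from sklearn.metrics import precision_score, recall_score

# Same imbalanced toy labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the positive class
y_pred = [1] * 100

print(recall_score(y_true, y_pred))     # 1.0  -- every positive example was "found"
print(precision_score(y_true, y_pred))  # 0.05 -- but almost every positive prediction is wrong
```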

4. Precision

Precision can be understood as how accurate the model is on the samples it labels as positive.

Calculation formula: precision = TP / (TP + FP), i.e., of all samples predicted as positive, how many are actually positive.

In general, as precision increases, recall decreases, and vice versa.
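
A small sketch of this trade-off with invented predicted probabilities: as the decision threshold rises, precision goes up while recall goes down.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Invented ground truth and predicted probabilities of the positive class
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.3  precision=0.44  recall=1.00
# threshold=0.5  precision=0.60  recall=0.75
# threshold=0.7  precision=0.67  recall=0.50
```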

5. F1 score

The F1 score is the harmonic mean of precision and recall:
F1 = 2 * P * R / (P + R), where P and R are precision and recall, respectively.

The reason we use the harmonic mean instead of a simple arithmetic mean is that the harmonic mean penalizes extreme values. A classifier with a precision of 1.0 and a recall of 0 has an arithmetic mean of 0.5 but an F1 score of 0. The F1 score gives equal weight to precision and recall; it is a special case of the more general Fβ metric. If we want a model with the best possible balance of precision and recall, we should maximize the F1 score.
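
To make the harmonic-mean argument concrete, here is a small sketch with hand-picked precision and recall values; sklearn.metrics.f1_score and fbeta_score compute the same quantities directly from labels.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The extreme case from the text: perfect precision, zero recall
p, r = 1.0, 0.0
print((p + r) / 2)   # arithmetic mean: 0.5, deceptively decent
print(f1(p, r))      # F1 score: 0.0, the useless classifier is penalized

# A more balanced classifier
print(f1(0.8, 0.7))  # ~0.747
```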

6. ROC and AUC

The idea is fairly simple: an ROC curve shows how the true positive rate (recall) and the false positive rate change as the threshold for classifying an example as positive is varied.

The calculation formulas are as follows:
TPR = TP / (TP + FN), where the TPR is the recall.
FPR = FP / (FP + TN), where the FPR is the proportion of negative examples that are incorrectly reported as positive.
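
As a tiny sketch with made-up confusion-matrix counts, the two rates follow directly from the definitions above:

```python
# Made-up confusion-matrix counts for illustration
tp, fp, tn, fn = 3, 1, 3, 1

tpr = tp / (tp + fn)  # true positive rate, i.e. recall
fpr = fp / (fp + tn)  # false positive rate
print(tpr, fpr)       # 0.75 0.25
```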

The figure below is a typical ROC curve:
(Figure: a typical ROC plot with two model curves and the diagonal of a random classifier)

  • The black diagonal line represents a random classifier, and the red and blue curves represent two different classification models. A given model produces a single curve, but we can move along that curve by adjusting the threshold for classifying an example as positive. Generally, lowering the threshold moves you up and to the right along the curve.

  • At a threshold of 1.0, we are at the bottom left of the plot, since no data points are identified as positive, so there are no true positives and no false positives (TPR = FPR = 0). As the threshold is lowered, we identify more data points as positive, which yields more true positives but also more false positives (TPR and FPR both increase). Finally, at a threshold of 0.0, we identify every data point as positive and end up in the upper right corner of the ROC curve (TPR = FPR = 1.0).

  • Finally, we can quantify a model's ROC curve by computing the area under the curve (AUC), a measure between 0 and 1, where higher values indicate better classification performance. In the plot above, the AUC of the blue curve is greater than the AUC of the red curve, meaning the blue model is better at trading off the true positive rate against the false positive rate. The random classifier (black line) achieves an AUC of 0.5.
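
A minimal sketch of computing an ROC curve and its AUC with scikit-learn, reusing the invented probabilities from the threshold example earlier; roc_curve sweeps the decision threshold and returns the matching FPR/TPR pairs.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Invented ground truth and predicted probabilities of the positive class
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9])

# Each threshold yields one (FPR, TPR) point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# Area under the ROC curve: 0.5 for a random classifier, 1.0 for a perfect one
print(roc_auc_score(y_true, y_prob))  # ~0.83 on this toy data
```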

7. Macro average (macro avg), micro average (micro avg), weighted average (weighted avg)

When we use the sklearn.metrics.classification_report tool to evaluate a model's test results, it outputs something like the following:
(Figure: example classification_report output with per-class precision, recall, F1 score, and support for the classes "no" and "yes")

1. Macro average (macro avg):

Compute precision, recall, and F1 for each class separately, then take the unweighted mean across classes.

Precision macro avg = (P_no + P_yes) / 2 = (0.24 + 0.73) / 2 ≈ 0.48

2. Micro average (micro avg):

Compute an overall precision, recall, and F1 without distinguishing between classes: the TP, FP, and FN counts of all classes are pooled first, and the metric is computed from the pooled counts.

Precision micro avg = (TP_no + TP_yes) / ((TP_no + FP_no) + (TP_yes + FP_yes)), i.e., the metric is computed from the pooled counts rather than by averaging the per-class values.

3. Weighted average (weighted avg):

An improvement on the macro average: each class's metric is weighted by the fraction of the total samples that belong to that class.

Precision weighted avg = P_no * (support_no / support_all) + P_yes * (support_yes / support_all) = 0.24 * (7535 / 29997) + 0.73 * (22462 / 29997) ≈ 0.61
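
The sketch below uses small made-up labels (not the report shown in the figure above) to show how these three averaging modes map to the average= parameter in scikit-learn:

```python
from sklearn.metrics import classification_report, precision_score

# Toy two-class labels; "no" is the minority class
y_true = ["no", "no", "yes", "yes", "yes", "yes", "no", "yes"]
y_pred = ["no", "yes", "yes", "yes", "no", "yes", "yes", "yes"]

# macro: unweighted mean of the per-class precisions
print(precision_score(y_true, y_pred, average="macro"))     # ~0.58
# micro: pool the TP/FP counts of all classes before dividing
print(precision_score(y_true, y_pred, average="micro"))     # ~0.62
# weighted: per-class precision weighted by each class's support
print(precision_score(y_true, y_pred, average="weighted"))  # ~0.60

# classification_report prints the per-class metrics together with these averages
print(classification_report(y_true, y_pred))
```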

Origin blog.csdn.net/TFATS/article/details/118334138