Classification metrics in detail: Precision, Recall, F1-Score

When applying machine learning algorithms, we need to evaluate the trained model to judge its quality. This article introduces several common evaluation metrics, all of which apply to classification problems.

In a binary classification task, assume there are only two classes: positive (1) and negative (0). True/False indicates whether the prediction is correct or wrong, and Positive/Negative indicates whether a sample is predicted as the positive or the negative class.

After feeding a data set labeled with positive and negative examples to the model, four situations can arise, which can be summarized in a confusion matrix:

  • True Positive (TP): the model classifies a positive instance as the positive class. (The prediction is correct; the predicted class is positive.)
  • True Negative (TN): the model classifies a negative instance as the negative class. (The prediction is correct; the predicted class is negative.)
  • False Negative (FN): the model classifies a positive instance as the negative class. (The prediction is wrong; the predicted class is negative.)
  • False Positive (FP): the model classifies a negative instance as the positive class. (The prediction is wrong; the predicted class is positive.)

Here, True/False indicates whether the judgment is correct, and Positive/Negative indicates the predicted class.
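As a minimal sketch (assuming scikit-learn and NumPy are available; the labels are hypothetical toy data), the four counts can be read directly off a confusion matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = positive, 0 = negative (hypothetical example data)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])

# confusion_matrix returns [[TN, FP], [FN, TP]] for labels ordered [0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```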

1. Precision

Precision: among the samples predicted as positive (TP + FP), the proportion that are actually positive instances (TP).

Precision = correctly predicted positive samples / total number of samples predicted as positive = TP / (TP + FP)

The higher the Precision, the better; a higher Precision means the model's "predicted positive" judgments are more reliable.
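Continuing the confusion-matrix sketch above, precision can be computed by hand from the counts or with scikit-learn (an assumed dependency):

```python
from sklearn.metrics import precision_score

precision_manual = tp / (tp + fp)                     # 3 / (3 + 1) = 0.75
precision_sklearn = precision_score(y_true, y_pred)   # same value, positive class = 1
```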

2. Recall (recall rate)

Recall: among all actual positive instances (TP + FN), the proportion correctly judged as positive (TP).

Recall = correctly predicted positive samples / total number of actual positive samples = TP / (TP + FN)

The higher the Recall, the better; a higher Recall means the model misjudges fewer "actually positive" samples, i.e. the probability of missing a positive is lower.

Note: although precision and recall are not strictly tied to each other, on large data sets the two metrics usually constrain one another. Generally speaking, when recall is high, precision tends to be low, and when precision is high, recall tends to be low.
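A corresponding sketch for recall, reusing the counts from the confusion-matrix example (again assuming scikit-learn):

```python
from sklearn.metrics import recall_score

recall_manual = tp / (tp + fn)                  # 3 / (3 + 1) = 0.75
recall_sklearn = recall_score(y_true, y_pred)   # same value, positive class = 1
```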

3. F1-Score

F1-Score: the harmonic mean of precision (Precision) and recall (Recall).

F1 = 2 × Precision × Recall / (Precision + Recall)

The closer the F1-Score is to 1, the better the model balances precision and recall. If either Precision or Recall is very poor (close to 0), the F1 value will also be very low.
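A short sketch of both ways to obtain F1, continuing the values computed above (the scikit-learn call is an assumed dependency):

```python
from sklearn.metrics import f1_score

f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
f1_sklearn = f1_score(y_true, y_pred)   # same value: 0.75 for the toy data above
```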

4. Accuracy

Accuracy: the proportion of samples the model classifies correctly (positive instances judged as positive plus negative instances judged as negative) out of all samples.

Accuracy = number of correctly predicted samples / total number of samples = (TP + TN) / (TP + TN + FP + FN)
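A sketch using the same toy counts, again assuming scikit-learn:

```python
from sklearn.metrics import accuracy_score

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)    # (3 + 3) / 8 = 0.75
accuracy_sklearn = accuracy_score(y_true, y_pred)    # same value
```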

Accuracy vs. Precision

Question: suppose an existing model A predicts the "incidence rate of malignant tumors" among the Chinese population, and the accuracy of its predictions is 99.7%. How good is this model? Is it usable?
Answer: it is hard to say. From accuracy alone, we do not know how many false positives (FP) and false negatives (FN) there are, nor their proportions. In fact, in 2017 the national incidence rate of malignant tumors was about 0.3%. A model that simply predicts that nobody gets sick already achieves 99.7% accuracy, yet such a prediction carries no useful information, as the sketch after the list below illustrates.

  • Accuracy measures the proportion of correctly classified samples over all of the data and evaluates the overall correctness of the classifier. When the data suffer from problems such as class imbalance, accuracy alone does not yield an informative judgment.

  • Precision corresponds to a particular class (the numerator is the number of correct predictions for that class, and the denominator is the number of samples predicted as that class). Precision evaluates how accurate the classifier is when it predicts that class.
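A minimal sketch of the tumor example (synthetic data; the 0.3% incidence figure is taken from the text above): a classifier that always predicts "healthy" reaches roughly 99.7% accuracy while its recall for the positive class is 0.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
# Synthetic population: about 0.3% positive (sick), 99.7% negative (healthy)
y_true = (rng.random(100_000) < 0.003).astype(int)
y_pred = np.zeros_like(y_true)            # "always healthy" classifier

print(accuracy_score(y_true, y_pred))     # ~0.997
print(recall_score(y_true, y_pred))       # 0.0 -- every sick person is missed
```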

5. PR Curve

PR curve (Precision-Recall curve): a curve composed of precision (P) and recall (R) values. By sweeping the model's decision threshold and plotting each resulting (R, P) pair as a point, the precision-recall curve is obtained.
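A sketch using scikit-learn's precision_recall_curve and matplotlib (both assumed dependencies), with hypothetical toy labels and scores:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Toy example: true 0/1 labels and continuous model scores (hypothetical values)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision, marker="o")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("PR curve")
plt.show()
```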


6. TPR、FPR

Looking over the metrics above, one problem appears: they do not use all of the available information. Precision and recall are considered, but the contribution of TN (True Negative) samples is not.

So, is there a metric that takes the entire confusion matrix into account?

Here, two metrics, the True Positive Rate and the False Positive Rate, can be introduced to address this missing information.

6.1 TPR (True Positive Rate)

TPR (True Positive Rate): among the "actually positive samples", the proportion that are predicted correctly: TPR = TP / (TP + FN).

"True rate" is "precision rate", but there are two different names for the same thing. The higher the true rate, the better, and the higher it means the model will misjudge "positive samples" less.

6.2 FPR (False Positive Rate)

FPR (False Positive Rate): among the "actually negative samples", the proportion that are predicted wrongly: FPR = FP / (FP + TN).

The lower the FPR, the better; a lower FPR means the model misjudges fewer negative samples.
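A sketch computing both rates from the confusion-matrix counts introduced earlier:

```python
tpr = tp / (tp + fn)   # True Positive Rate, identical to recall: 0.75
fpr = fp / (fp + tn)   # False Positive Rate: 0.25
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```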

TPR and FPR have an advantage: they are not affected by class imbalance.
Both are conditional rates computed over the actual samples of a single class: TPR is based only on the positive samples, and FPR only on the negative samples, so neither depends on how balanced the classes are.
The ROC curve and the AUC value are concepts derived from TPR and FPR.

7. ROC curve

The ROC curve (Receiver Operating Characteristic curve) is plotted with the false positive rate (FPR) on the horizontal axis and the true positive rate (TPR) on the vertical axis. The name comes from signal detection theory, where the curve describes the results a subject obtains under different judgment criteria for a given stimulus. Because it is built from TPR and FPR, the ROC curve stays essentially unchanged when the distribution of positive and negative samples in the test set changes.

On the ROC curve, points closer to the upper left corner (0, 1) correspond to better model parameters (higher TPR at lower FPR).

Each (FPR, TPR) point on the ROC curve is obtained under a particular threshold: set a threshold, classify instances with scores above it as positive and those below it as negative, run the model, and compute the resulting FPR and TPR. Repeating this for different thresholds yields a set of (FPR, TPR) pairs, which are connected to draw the ROC curve.
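A sketch of this threshold sweep using scikit-learn's roc_curve (an assumed dependency), reusing the toy labels and scores from the PR-curve sketch above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")   # diagonal of a random classifier
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("ROC curve")
plt.show()
```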

8. AUC (Area Under Curve)

AUC (Area Under Curve) is the area under the ROC curve, usually falling between 0.5 and 1.0 in practice. AUC is used as an evaluation criterion because in many cases the quality of a model cannot be judged from the ROC curve alone, while the AUC value quantifies the model's performance. The closer the AUC is to 1, the better the model performs and the more accurate its predictions. When comparing multiple models, the model with the larger AUC generally performs better than the one with the smaller AUC.

When AUC equals 0.5, the model is equivalent to a random classifier. The larger the AUC, the better the overall performance of the model.
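A sketch computing AUC with scikit-learn (an assumed dependency), again on the toy labels and scores used above:

```python
from sklearn.metrics import roc_auc_score

auc_value = roc_auc_score(y_true, y_score)
print(f"AUC = {auc_value:.3f}")   # 0.875 for the toy data; 1.0 = perfect ranking, 0.5 = random
```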

