Evaluation methods for classification models

(1) Confusion matrix

In a classification task, the predicted result (Predicted Condition) and the true label (True Condition) can be combined in four different ways, and these four combinations form the confusion matrix. It applies to both binary classification and multi-class classification tasks.
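
Below is a minimal sketch of how such a matrix can be produced with scikit-learn's confusion_matrix; the labels are toy data made up for illustration.

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # true labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted labels

    # For binary labels {0, 1}, rows are the true classes and columns the predicted classes:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))   # [[3 1]
                                              #  [1 3]]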

(2) Precision and Recall

Accuracy: the proportion of all samples that are predicted correctly = (TP + TN) / (TP + FP + FN + TN)

Precision: the proportion of samples predicted as positive that are truly positive = TP / (TP + FP)

Recall: the proportion of truly positive samples that are predicted as positive (how completely the positive samples are retrieved, i.e. the ability to find positive samples) = TP / (TP + FN)
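
As a quick illustration, all three quantities can be computed with scikit-learn's built-in scorers; the labels below are the same toy data as above.

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    print(accuracy_score(y_true, y_pred))    # (TP + TN) / (TP + FP + FN + TN) = 0.75
    print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 0.75
    print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 0.75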

(3) F1-score

F1-score is the harmonic mean of precision and recall, F1 = 2 · Precision · Recall / (Precision + Recall), and reflects the robustness of the model. Its value lies between 0 and 1, and the closer it is to 1, the better. It is suitable for multi-class problems as well.
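
A small sketch showing that scikit-learn's f1_score matches the harmonic mean of precision and recall on the same toy labels:

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)

    print(f1_score(y_true, y_pred))   # 0.75
    print(2 * p * r / (p + r))        # same value: 2 * P * R / (P + R)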

(4) Classification report API

 

  • sklearn.metrics.classification_report(y_true, y_pred, labels=[], target_names=None)
    • y_true: true target values
    • y_pred: target values predicted by the estimator
    • labels: the label values of the categories to include in the report
    • target_names: display names of the target categories
    • return: precision, recall and F1-score for each category (see the usage sketch below)
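
A minimal usage sketch of classification_report; the class names passed to target_names are illustrative, not from the original post.

    from sklearn.metrics import classification_report

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # Prints a text table with precision, recall, F1-score and support per class
    print(classification_report(y_true, y_pred,
                                labels=[0, 1],
                                target_names=["negative", "positive"]))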

(5) ROC curve and AUC index

TPR and FPR

  • TPR = TP / (TP + FN)
    • Among all samples whose true class is 1, the proportion predicted as 1 (this is exactly the recall)
  • FPR = FP / (FP + TN)
    • Among all samples whose true class is 0, the proportion predicted as 1 (both ratios are computed in the sketch below)
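
TPR and FPR can be read directly off the binary confusion matrix; a minimal sketch with the same toy labels:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    print(tp / (tp + fn))   # TPR = 0.75
    print(fp / (fp + tn))   # FPR = 0.25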

ROC (receiver operating characteristic) curve

  • The horizontal axis of the ROC curve is the FPR and the vertical axis is the TPR. When the two are always equal, it means that, no matter whether the true class is 1 or 0, the classifier predicts class 1 with the same probability; in that case AUC = 0.5 (see the plotting sketch below).
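
A minimal sketch of drawing the ROC curve with sklearn.metrics.roc_curve and matplotlib; the scores are made-up probabilities of the positive class.

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve

    y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]   # assumed probability of class 1

    # FPR on the horizontal axis, TPR on the vertical axis, one point per threshold
    fpr, tpr, thresholds = roc_curve(y_true, y_score)

    plt.plot(fpr, tpr, label="classifier")
    plt.plot([0, 1], [0, 1], linestyle="--", label="random guess (AUC = 0.5)")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.legend()
    plt.show()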

AUC indicator

  • The probabilistic meaning of AUC: if one positive and one negative sample are drawn at random, AUC is the probability that the positive sample's score is higher than the negative sample's score (checked numerically in the sketch after this list)
  • AUC ranges over [0, 1]; the closer to 1 the better, while a value near 0.5 means the classifier is no better than random guessing
  • AUC = 1: a perfect classifier. With such a model, a perfect prediction is obtained no matter what threshold is set. In most real prediction problems, no perfect classifier exists.
  • 0.5 < AUC < 1: better than random guessing. Such a classifier (model) has predictive value if the threshold is set properly.
  • AUC can only be used to evaluate binary classification
  • AUC is well suited to evaluating classifier performance on imbalanced datasets
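
The pairwise probability interpretation in the first bullet can be checked numerically; here is a minimal sketch with numpy and toy scores (the pair-counting logic is just for illustration):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1])

    pos = y_score[y_true == 1]   # scores of positive samples
    neg = y_score[y_true == 0]   # scores of negative samples

    # Fraction of (positive, negative) pairs where the positive score is higher; ties count as 0.5
    wins = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    print(wins.mean())                      # 0.9375
    print(roc_auc_score(y_true, y_score))   # same value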

AUC calculation API

from sklearn.metrics import roc_auc_score

  • sklearn.metrics.roc_auc_score(y_true, y_score)
    • Computes the area under the ROC curve, i.e. the AUC value
    • y_true: the true class of each sample, which must be labeled 0 (negative example) or 1 (positive example)
    • y_score: the prediction score, which can be the estimated probability of the positive class, a confidence value, or the return value of the classifier's decision function
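
A minimal usage sketch; in practice y_score would typically come from an estimator's predict_proba(X)[:, 1] or decision_function(X), but here it is just a hand-written list of assumed scores.

    from sklearn.metrics import roc_auc_score

    y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # 0 = negative, 1 = positive
    y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]    # assumed positive-class scores

    print(roc_auc_score(y_true, y_score))                 # area under the ROC curve = 0.9375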

 

 

Origin blog.csdn.net/qq_39197555/article/details/115288029