(1) Confusion matrix
In a classification task, the predicted result (Predicted Condition) and the correct label (True Condition) can combine in four ways (TP, FP, FN, TN); counting these combinations forms the confusion matrix, which applies to both binary and multi-class classification tasks.
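A minimal sketch of building one with sklearn (the labels and predictions below are made-up toy data, not from the text):

```python
from sklearn.metrics import confusion_matrix

# Made-up toy labels: 1 = positive example, 0 = negative example
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true conditions, columns are predicted conditions:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```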
(2) Precision and Recall
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision: the proportion of samples predicted positive that are truly positive (exactness) = TP / (TP + FP)
Recall: the proportion of truly positive samples that are predicted positive (completeness, the ability to find all positive samples) = TP / (TP + FN)
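These three quantities map directly onto sklearn functions; a minimal sketch on the same made-up toy data as above:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # (TP + TN) / (TP + FP + FN + TN)
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
```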
(3)F1-score
F1-score reflects the robustness of the model. It is the harmonic mean of precision and recall: F1 = 2 * Precision * Recall / (Precision + Recall). Its value ranges from 0 to 1, and the closer to 1, the better. With a suitable averaging strategy it is also applicable to multi-class classification.
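A minimal sketch with sklearn's f1_score, again on the made-up toy data:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Harmonic mean of precision and recall
print(f1_score(y_true, y_pred))
# For multi-class targets, pass average='macro', 'micro' or 'weighted'
```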
(4) Classification assessment report api
- sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None)
- y_true: true target values
- y_pred: target values predicted by the estimator
- labels: the label values of the categories to include in the report
- target_names: display names of the target categories
- return: precision, recall and f1-score for each category
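A minimal usage sketch (the toy data and category names are made up for illustration):

```python
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

report = classification_report(y_true, y_pred,
                               labels=[0, 1],
                               target_names=["negative", "positive"])
print(report)  # per-category precision, recall, f1-score and support
```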
(5) ROC curve and AUC index
TPR and FPR
- TPR = TP / (TP + FN)
- Among all samples whose true category is 1, the proportion predicted as category 1 (this is the recall)
- FPR = FP / (FP + TN)
- Among all samples whose true category is 0, the proportion predicted as category 1
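Both rates can be read off the confusion matrix; a minimal sketch on the same made-up toy data:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fn))  # TPR = TP / (TP + FN)
print(fp / (fp + tn))  # FPR = FP / (FP + TN)
```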
ROC (receiver operating characteristic) curve
- The horizontal axis of the ROC curve is the FPR and the vertical axis is the TPR. When the two are equal (the diagonal line), it means that regardless of whether the true category is 1 or 0, the classifier predicts category 1 with the same probability; in that case the AUC is 0.5.
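sklearn's roc_curve traces this curve by sweeping the decision threshold; a minimal sketch with made-up predicted scores:

```python
from sklearn.metrics import roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
# Made-up predicted probabilities of the positive class
y_score = [0.9, 0.3, 0.4, 0.8, 0.2, 0.6, 0.7, 0.1]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# Each (fpr[i], tpr[i]) pair is one point on the ROC curve,
# obtained by thresholding y_score at thresholds[i]
print(fpr)
print(tpr)
```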
AUC indicator
- The probabilistic meaning of AUC: randomly draw one positive and one negative sample; AUC is the probability that the positive sample's score is greater than the negative sample's score (demonstrated in the sketch after this list)
- AUC ranges over [0, 1]; the closer to 1, the better, while a value close to 0.5 means random guessing
- AUC = 1: a perfect classifier. With such a model, a perfect prediction is obtained no matter what threshold is set. In most practical situations there is no perfect classifier.
- 0.5 < AUC < 1: better than random guessing. The classifier has predictive value if the threshold is set properly.
- AUC can only be used to evaluate binary classification
- AUC is well suited to evaluating classifier performance under class imbalance
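To make the probabilistic meaning concrete, this sketch computes AUC directly from the pairwise definition and checks it against sklearn (the scores are made up; ties count as half a win):

```python
import itertools
from sklearn.metrics import roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.3, 0.4, 0.8, 0.2, 0.6, 0.7, 0.1]

pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]

# Probability that a random positive sample outscores a random negative one
wins = sum((p > n) + 0.5 * (p == n) for p, n in itertools.product(pos, neg))
print(wins / (len(pos) * len(neg)))    # pairwise definition
print(roc_auc_score(y_true, y_score))  # same value
```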
AUC calculation API
from sklearn.metrics import roc_auc_score
- sklearn.metrics.roc_auc_score(y_true, y_score)
- Calculate the area under the ROC curve, i.e. the AUC value
- y_true: the true category of each sample, which must be marked 0 (negative example) or 1 (positive example)
- y_score: the prediction score, which can be the estimated probability of the positive class, a confidence value, or the return value of the classifier's decision function
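A minimal end-to-end sketch showing both kinds of y_score (the synthetic dataset and logistic-regression model are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data just to exercise the API
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# y_score as the estimated probability of the positive class...
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
# ...or as a confidence value from the decision function
print(roc_auc_score(y_test, clf.decision_function(X_test)))
```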