Evaluation indicators for machine learning classification problems

Model evaluation metrics for classification problems

For regression problems, we can use the mean squared error to measure the quality of a model. For classification problems, however, we need to judge whether the model classifies samples correctly, which leads to the following evaluation criteria:

[Figure: the four prediction outcomes TP, FP, FN, TN]

True means the prediction is correct, and False means the prediction is wrong. Predicting a negative case as positive (a false positive) is called a Type I error, and predicting a positive case as negative (a false negative) is called a Type II error.

  • Accuracy = number of samples that are correct / total number of samples

  • Precision = TP/Number of samples predicted to be positive

  • Recall = TP/Number of samples whose true value is positive

  • F value = harmonic mean of Precision and Recall

We can understand these as follows:

Accuracy is the proportion of all samples that are predicted correctly.

Precision starts from the samples predicted to be a certain class: among them, the proportion that are predicted correctly.

Recall starts from the samples whose true class is a certain class: among them, the proportion that are predicted correctly.

The F value is the harmonic mean of precision and recall.

To sum up:

[Figure: summary table of TPR, FNR, FPR, and TNR]

Starting from the two kinds of true values, four indicators are derived according to whether the prediction is correct: TPR, FNR, FPR, and TNR, i.e. the true positive rate, false negative rate, false positive rate, and true negative rate. Note that the true positive rate (TPR) is the same as recall.
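
For reference, these four rates can be written out explicitly in terms of the confusion-matrix entries (standard definitions):

$$\text{TPR} = \frac{TP}{TP + FN},\qquad \text{FNR} = \frac{FN}{TP + FN},\qquad \text{FPR} = \frac{FP}{FP + TN},\qquad \text{TNR} = \frac{TN}{FP + TN}$$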

Confusion matrix

[Figure: confusion matrix]

Reference: https://en.wikipedia.org/wiki/Precision_and_recall

Accuracy

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

If 80 out of 100 samples are classified correctly, the accuracy is 0.8. Compute this value on the test data. The higher the accuracy, the better the model.
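
A minimal sketch of computing this with scikit-learn; the labels below are made-up so that exactly 80 of 100 predictions are correct:

```python
from sklearn.metrics import accuracy_score

# Hypothetical example: 100 samples, 80 of which are predicted correctly
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [0] * 40 + [1] * 10  # 80 correct, 20 wrong

print(accuracy_score(y_true, y_pred))  # 0.8
```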

Precision and recall

Assuming that the dots in the figure are Positive data and the crosses are Negative data:

[Figure: Positive (dots) and Negative (crosses) data points]

Suppose there are 100 samples, 95 of which are Negative. Then even in the extreme case where the model classifies every sample as Negative, accuracy is 0.95, i.e. the model is 95% accurate. But no matter how high the accuracy is, a model that classifies all data as Negative cannot be called a good model.
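
A minimal sketch of this extreme case, assuming 95 Negative and 5 Positive samples and a model that predicts everything as Negative: accuracy looks high while recall is zero.

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5   # 95 Negative, 5 Positive
y_pred = [0] * 100            # model predicts everything as Negative

print(accuracy_score(y_true, y_pred))  # 0.95
print(recall_score(y_true, y_pred))    # 0.0 -- no Positive sample is found
```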

Suppose the classification results of our model are as follows:

[Figure: the model's classification results]

Precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$
This metric looks only at the samples classified as Positive and measures what proportion of them are actually Positive:

[Figure: samples predicted as Positive]

As can be seen from the figure, precision = 1/3.

Recall:
$$\text{Recall} = \frac{TP}{TP + FN}$$
This metric looks only at the samples that are actually Positive and measures what proportion of them are correctly classified:

[Figure: samples that are actually Positive]

As can be seen from the figure, recall = 1/5.
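
The two figures can be reproduced numerically. The counts below (TP = 1, FP = 2, FN = 4, TN = 93) are an assumption consistent with precision = 1/3, recall = 1/5, and the 100-sample example above:

```python
from sklearn.metrics import precision_score, recall_score

# Assumed counts consistent with the figures: TP=1, FP=2, FN=4, TN=93
y_true = [1] * 5 + [0] * 95                  # 5 Positive, 95 Negative
y_pred = [1] + [0] * 4 + [1] * 2 + [0] * 93  # 1 TP, 4 FN, 2 FP, 93 TN

print(precision_score(y_true, y_pred))  # 0.333... = 1/3
print(recall_score(y_true, y_pred))     # 0.2      = 1/5
```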

F value

The example above shows that a model with high accuracy can still have very low precision and recall.
$$\text{F-measure} = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}}$$
This metric takes the balance between precision and recall into account: if either precision or recall is low, the F value is pulled down.

The formula can be rewritten as:
$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Strictly speaking, this F value is the F1 value; besides F1, there is also a weighted F value (the Fβ measure):
$$\text{Weighted F-measure} = \frac{\left(1+\beta^{2}\right) \cdot \text{Precision} \cdot \text{Recall}}{\beta^{2} \cdot \text{Precision} + \text{Recall}}$$

ROC and AUC

The ROC (Receiver Operating Characteristic) curve describes the relative change between two quantities: the vertical axis is the true positive rate (TPR) and the horizontal axis is the false positive rate (FPR). The ROC curve reflects how quickly TPR grows as FPR increases. The faster TPR increases, the more the curve bulges upward, the larger the AUC, and the better the model's classification performance.

AUC (Area Under Curve) is defined as the area under the ROC curve; obviously this area is never greater than 1. Because the ROC curve generally lies above the line y = x, the AUC value usually falls between 0.5 and 1. As a single number summarizing the ROC curve, AUC gives an intuitive measure of classifier quality: the larger the value, the better.

  • AUC = 1: a perfect classifier. With such a model, a perfect prediction can be obtained no matter what threshold is set. In most real prediction problems, no perfect classifier exists.
  • 0.5 < AUC < 1: better than random guessing. With a properly chosen threshold, this classifier (model) has predictive value.
  • AUC = 0.5: the same as random guessing (e.g. flipping a coin); the model has no predictive value.
  • AUC < 0.5: worse than random guessing; however, if its predictions are always inverted, it becomes better than random guessing.
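
A minimal sketch of computing the ROC curve and AUC with scikit-learn; the labels and scores below are made-up predicted probabilities for the positive class:

```python
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Hypothetical true labels and predicted probabilities for the positive class
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(auc(fpr, tpr))                    # area under the ROC curve
print(roc_auc_score(y_true, y_score))   # same value, computed directly
```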

[Figure: ROC curve examples with different AUC values]

Why is ROC a better evaluation metric when the data samples are imbalanced?

We know that TPR depends only on the positive samples and FPR depends only on the negative samples. Suppose 10% of all samples are positive and 90% are negative: computing TPR only requires the predictions on the 10% positive samples, and computing FPR only requires the predictions on the 90% negative samples. Therefore, no matter how the samples are distributed between the classes, the calculation of TPR and FPR is unaffected, and so the shape of the ROC curve is unaffected.
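
A small sketch illustrating this point: with a fixed set of per-sample predictions, adding 9 extra copies of each negative sample (a made-up replication factor) changes the class ratio but leaves TPR and FPR unchanged:

```python
from sklearn.metrics import confusion_matrix

def tpr_fpr(y_true, y_pred):
    # Unpack the binary confusion matrix: [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn), fp / (fp + tn)

# Made-up predictions: 5 positives and 5 negatives
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# Replicate every negative sample 9 more times (class ratio becomes 5:50)
y_true_imb = y_true + [0] * 45
y_pred_imb = y_pred + [0, 0, 0, 0, 1] * 9

print(tpr_fpr(y_true, y_pred))          # (0.6, 0.2)
print(tpr_fpr(y_true_imb, y_pred_imb))  # (0.6, 0.2) -- unchanged by the imbalance
```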

The scikit-learn function for each evaluation metric

| Metric | Description | scikit-learn function |
| --- | --- | --- |
| Precision | Precision | `from sklearn.metrics import precision_score` |
| Recall | Recall | `from sklearn.metrics import recall_score` |
| F1 | F1 score | `from sklearn.metrics import f1_score` |
| Confusion matrix | Confusion matrix | `from sklearn.metrics import confusion_matrix` |
| ROC | ROC curve | `from sklearn.metrics import roc_curve` |
| AUC | Area under the ROC curve | `from sklearn.metrics import auc` |
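
A short combined sketch using the imports listed above (the labels and scores are made-up):

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             confusion_matrix, roc_curve, auc)

y_true  = [0, 1, 1, 0, 1, 0, 1, 0]
y_pred  = [0, 1, 0, 0, 1, 1, 1, 0]
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.3]  # scores for the positive class

print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))

fpr, tpr, _ = roc_curve(y_true, y_score)
print(auc(fpr, tpr))
```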


Original article: blog.csdn.net/as604049322/article/details/114005634