Calculate classification indicators such as precision and recall based on sklearn

In the previous article, we introduced the definitions and calculation formulas of the classification indicators Precision, Recall, and F1-Score: Detailed explanation of the classification indicators Precision, Recall, and F1-Score.

From that article we know that precision, recall, and F1 are defined for binary classifiers. Their calculation depends only on y_true and y_pred, and requires that y_true and y_pred contain only two categories: 0 and 1.

When we actually compute the evaluation indicators for such a binary classification task, we can simply call the corresponding functions in sklearn.

1. Classification indicator function

1.1 precision_score function

The precision_score function is used to calculate the precision of the classification results.

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

1.2 recall_score function

The recall_score function is used to calculate the recall of the classification results.

sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

1.3 accuracy_score function

The accuracy_score function is used to calculate the accuracy of the classification results.

sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
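
As a quick illustration (the labels below are made up for this sketch), accuracy_score can return either the fraction or the raw number of correctly classified samples:

from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

print(accuracy_score(y_true, y_pred))                   # 0.75, fraction of correct predictions (normalize=True is the default)
print(accuracy_score(y_true, y_pred, normalize=False))  # 3, number of correctly classified samples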

1.4 f1_score function

The f1_score function is used to calculate the F1 value of the classification results.

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

1.5 precision_recall_curve function

The precision_recall_curve function is used to calculate the PR curve of the classification results.

sklearn.metrics.precision_recall_curve(y_true, probas_pred, pos_label=None, sample_weight=None)
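
A minimal sketch of how it can be called (the probability scores below are purely illustrative): y_true holds the true binary labels, probas_pred holds the predicted scores or probabilities of the positive class, and the function returns the precision and recall at each threshold together with the thresholds themselves.

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])   # illustrative predicted probabilities of the positive class

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
print(precision)    # precision at each threshold (the last element is 1.0)
print(recall)       # recall at each threshold (the last element is 0.0)
print(thresholds)   # the score thresholds used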

1.6 roc_curve function

The roc_curve function is used to calculate the ROC curve of the classification results. Its prototype is:

sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
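
A minimal sketch (again with illustrative scores): y_score would normally come from predict_proba or decision_function, and the function returns the false positive rate, true positive rate, and the corresponding thresholds, which can then be plotted as the ROC curve.

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])   # illustrative scores for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label=1)
print(fpr)          # false positive rate at each threshold
print(tpr)          # true positive rate at each threshold
print(thresholds)   # decreasing score thresholds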

1.7 roc_auc_score function

The roc_auc_score function is used to calculate the AUC (the area under the ROC curve) of the classification results.

sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)
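
Continuing with the same illustrative scores, roc_auc_score condenses the ROC curve into a single number:

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]   # illustrative scores for the positive class

print(roc_auc_score(y_true, y_score))   # 0.75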

1.8 classification_report function

The classification_report function in sklearn is used to produce a text report of the main classification indicators. The report shows the precision, recall, F1 value, and other information for each class.

sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2)

2. Two classification tasks

For a binary classification model, you can directly call precision_score, recall_score, and f1_score from sklearn.metrics, setting the average parameter of each function to binary (average='binary').

Application examples:

from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

precision = precision_score(y_true, y_pred, average='binary')
print('precision:', precision)

recall = recall_score(y_true, y_pred, average='binary')
print('recall:', recall)

f1 = f1_score(y_true, y_pred, average='binary')   # store the result under a new name to avoid shadowing the f1_score function
print('f1_score:', f1)

The output is as follows:

precision: 0.714285714286
recall: 0.833333333333
f1_score: 0.769230769231

We can also use the classification_report function to view the metrics for each category:

target_names = ['class 0', 'class 1']
cla_report = classification_report(y_true, y_pred, target_names=target_names)
print('cla_report:', cla_report)

The output is as follows:

             precision    recall  f1-score   support

    class 0       0.67      0.50      0.57         4
    class 1       0.71      0.83      0.77         6

avg / total       0.70      0.70      0.69        10

3. Multi-classification tasks

As mentioned earlier, the traditional calculation formulas of precision, recall, and F1 are only applicable to two-class models.

For multi-classification models, the Macro Average or Micro Average rule should be used to calculate F1 (or P, R).

3.1 Macro Average

Macro averaging first computes the indicator value for each class separately, and then takes the arithmetic mean over all classes.

Macro Average first calculates the evaluation indicators (precision, recall, and F1 score) for each class, and then averages them to obtain Macro Precision, Macro Recall, and Macro F1. The specific calculation method is as follows:

Macro-P  = (P1 + P2 + ... + Pn) / n
Macro-R  = (R1 + R2 + ... + Rn) / n
Macro-F1 = (F1_1 + F1_2 + ... + F1_n) / n

For example, consider a three-category classification model:

y_true=[1,2,3]
y_pred=[1,1,3]

The calculation process of Macro Average F1 is as follows:

(1) Treat the first category as positive (1) and all other categories as negative (0), then calculate P1 and R1.

y_true=[1,0,0]
y_pred=[1,1,0]
P1 = (samples predicted as 1 and predicted correctly) / (all samples predicted as 1) = TP/(TP+FP) = 1/(1+1) = 0.5
R1 = (samples predicted as 1 and predicted correctly) / (all samples whose true label is 1) = TP/(TP+FN) = 1/1 = 1.0
F1_1 = 2*(Precision*Recall)/(Precision+Recall) = 2*0.5*1.0/(0.5+1.0) = 0.6666667

(2) Treat the second category as positive (1) and all other categories as negative (0), then calculate P2 and R2.

y_true=[0,1,0]
y_pred=[0,0,0]
P2 = (samples predicted as 1 and predicted correctly) / (all samples predicted as 1) = TP/(TP+FP) = 0.0
R2 = (samples predicted as 1 and predicted correctly) / (all samples whose true label is 1) = TP/(TP+FN) = 0.0
F1_2 = 2*(Precision*Recall)/(Precision+Recall) = 0

(3) Treat the third category as positive (1) and all other categories as negative (0), then calculate P3 and R3.

y_true=[0,0,1]
y_pred=[0,0,1]
P3 = (samples predicted as 1 and predicted correctly) / (all samples predicted as 1) = TP/(TP+FP) = 1/1 = 1.0
R3 = (samples predicted as 1 and predicted correctly) / (all samples whose true label is 1) = TP/(TP+FN) = 1/1 = 1.0
F1_3 = 2*(Precision*Recall)/(Precision+Recall) = 2*1.0*1.0/(1.0+1.0) = 1.0

(4) The average of P1, P2, and P3 gives P; the average of R1, R2, and R3 gives R; and the average of F1_1, F1_2, and F1_3 gives F1.

P=(P1+P2+P3)/3=(0.5+0.0+1.0)/3=0.5
R=(R1+R2+R3)/3=(1.0+0.0+1.0)/3=0.6666666
F1 = (0.6666667+0.0+1.0)/3=0.5556

The P and R values obtained after averaging are the P and R values under the Macro rule. For this 3-class model, the Macro F1 is 0.5556.
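
Before averaging, we can also check the per-class values from steps (1)-(3) directly: setting average=None in precision_recall_fscore_support makes sklearn return the metric for each class instead of a single averaged number. This check is only a supplementary sketch of the same calculation.

from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 2, 3]
y_pred = [1, 1, 3]

# average=None returns one value per class, in the order given by labels
p, r, f1, support = precision_recall_fscore_support(y_true, y_pred, labels=[1, 2, 3], average=None)
print('per-class precision:', p)    # [0.5 0.  1. ]        -> P1, P2, P3
print('per-class recall:   ', r)    # [1. 0. 1.]           -> R1, R2, R3
print('per-class f1:       ', f1)   # [0.66666667 0. 1.]   -> F1_1, F1_2, F1_3
# (sklearn warns that precision for class 2 is ill-defined because nothing was predicted as 2; it is set to 0)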

[Implementing Macro Average based on sklearn]:

The following is an example of calculating Macro Average based on sklearn. Just set the average parameter in each function to macro (average='macro').

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 2, 3]
y_pred = [1, 1, 3]

precision = precision_score(y_true, y_pred, average='macro')
print('precision:', precision)

recall = recall_score(y_true, y_pred, average='macro')
print('recall:', recall)

f1 = f1_score(y_true, y_pred, average='macro')   # store the result under a new name to avoid shadowing the f1_score function
print('f1_score:', f1)

The output is as follows:

precision: 0.5
recall: 0.666666666667
f1_score: 0.555555555556

3.2 Micro Average

Micro Average takes into account the contribution of every sample: the predictions from all categories are pooled together, and then an overall performance metric is calculated.

Micro-P = (TP1 + TP2 + ... + TPn) / ((TP1 + FP1) + (TP2 + FP2) + ... + (TPn + FPn))
Micro-R = (TP1 + TP2 + ... + TPn) / ((TP1 + FN1) + (TP2 + FN2) + ... + (TPn + FNn))

For single-label multi-class classification, the denominator is simply the total number of samples given to the classifier, and the numerator is the number of correctly predicted samples (regardless of category).

Therefore Micro F1 = Micro Recall = Micro Precision = Accuracy.

[Implementing Micro Average based on sklearn]:

The following is an example of calculating Micro Average based on sklearn. Just set the average parameter in each function to micro (average='micro').

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 2, 3]
y_pred = [1, 1, 3]

precision = precision_score(y_true, y_pred, average='micro')
print('precision:', precision)

recall = recall_score(y_true, y_pred, average='micro')
print('recall:', recall)

f1 = f1_score(y_true, y_pred, average='micro')   # store the result under a new name to avoid shadowing the f1_score function
print('f1_score:', f1)

The output is as follows:

precision: 0.666666666667
recall: 0.666666666667
f1_score: 0.666666666667
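
As noted above, for a single-label multi-class task the micro-averaged scores coincide with plain accuracy, which is easy to verify:

from sklearn.metrics import accuracy_score

y_true = [1, 2, 3]
y_pred = [1, 1, 3]

print('accuracy:', accuracy_score(y_true, y_pred))   # 0.6666..., the same value as the micro P/R/F1 above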

3.3 Macro average vs. micro average

  • Macro average: calculate the performance indicators of each category separately, and then average these per-category values.
    – In Macro averaging, the precision, recall, and F1 score are calculated separately for each category, and these indicators are then simply averaged.
    – Macro averaging gives each category the same weight regardless of differences in sample size, so it treats every category equally.
    – Macro averaging is more suitable when the performance of each category is equally important to the overall performance.

  • Micro average: combine the prediction results of all categories and then calculate the overall performance indicators.
    – In Micro averaging, the total numbers of true positives, false positives, and false negatives across all categories are used to calculate precision, recall, and F1.
    – Micro averaging gives each sample the same weight, no matter which category it belongs to. Therefore, when the sample sizes are unbalanced, the Micro average is dominated by the categories with many samples.
    – Micro averaging is more suitable when the sample distribution across classes is significantly unbalanced and when overall performance matters more than the performance on each individual class.

To summarize: Macro averaging computes the result for each category independently and then averages the per-category results; it is suitable when the performance of each category is equally important. Micro averaging pools the results of all categories into one overall calculation; it is suitable when the sample sizes are unbalanced or when overall performance is what matters. Which averaging method to use depends on the specific problem and requirements.

Origin blog.csdn.net/u012856866/article/details/131829645