Recently, while using sklearn for classification, I needed the evaluation functions in sklearn.metrics. One of the most important is the F1 score (if the underlying theory is unfamiliar, look it up on Google or Baidu).
The function that computes F1 in sklearn is f1_score, which has a parameter average that controls how F1 is calculated. Today we will look at the difference between the micro and macro settings.
1. F1 formula description:
F1-score: 2*(P*R)/(P+R), where P is precision and R is recall
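As a quick sanity check, the formula can be written directly in Python (a minimal sketch; the helper name f1 is my own):

```python
def f1(p, r):
    """F1 is the harmonic mean of precision P and recall R."""
    return 2 * p * r / (p + r)

print(f1(1.0, 0.6))  # 0.75
```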
2. Usage of the parameter average in f1_score:
'micro'
:Calculate metrics globally by counting the total true positives, false negatives and false positives.
That is, first sum the TP, FN and FP counts over all classes, then compute F1 from those totals.
'macro'
:Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
That is, compute the F1 of each class separately, then average them (each class's F1 gets the same weight).
3. Preliminary understanding
From the parameter description, the literal meaning should already be clear. micro first counts the total TP, FN and FP across all classes, then applies the formula above to compute a single F1.
macro instead computes the F1 of each class first and then averages them. For example, the multi-class problem below has 4 classes: 1, 2, 3 and 4. We can compute F1_1 for class 1, F1_2 for class 2, F1_3 for class 3 and F1_4 for class 4, then average: (F1_1 + F1_2 + F1_3 + F1_4) / 4.
y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]
4. Further understanding
Let's use the example above to illustrate how sklearn computes micro and macro.
The micro calculation
First compute the total TP. This is easy: just count how many samples are classified correctly. TP = 3 + 2 + 2 + 1 = 8.
Next compute the total FP: for each class, count the samples that do not belong to it but were assigned to it. For example, one sample above does not belong to class 4 but was predicted as 4.
If this is still confusing, keep only the 4s when counting and replace everything else with 0; the FP count for class 4 then becomes obvious. This is just the One-vs-All (OvA) view: treat class 4 as the positive class and all other classes as the negative class.
In the same way we can count the FN values.
|    | Class 1 | Class 2 | Class 3 | Class 4 | Total |
|----|---------|---------|---------|---------|-------|
| TP | 3       | 2       | 2       | 1       | 8     |
| FP | 0       | 0       | 3       | 1       | 4     |
| FN | 2       | 2       | 1       | 1       | 6     |
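The one-vs-all counting described above can be sketched in a few lines (the helper name count_ova is my own):

```python
# Count TP/FP/FN for each class one-vs-all, reproducing the table above.
y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]

def count_ova(cls):
    """Treat `cls` as positive, everything else as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn

# Sum the per-class counts to get the micro totals.
totals = [sum(x) for x in zip(*(count_ova(c) for c in [1, 2, 3, 4]))]
print(totals)  # [8, 4, 6] -> total TP, FP, FN, matching the table
```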
So the micro precision is P = TP/(TP+FP) = 8/(8+4) = 0.666, the micro recall is R = TP/(TP+FN) = 8/(8+6) = 0.571, and therefore F1-micro = 2*(0.666*0.571)/(0.666+0.571) ≈ 0.6153.
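The same arithmetic in code, starting from the totals in the table:

```python
# Micro-averaged F1 from the pooled TP/FP/FN totals.
tp, fp, fn = 8, 4, 6          # totals from the table above
p = tp / (tp + fp)            # micro precision, 8/12
r = tp / (tp + fn)            # micro recall, 8/14
f1_micro = 2 * p * r / (p + r)
print(f1_micro)               # ≈ 0.61538
```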
You can verify with sklearn by setting average to 'micro':
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]
print(f1_score(y_true, y_pred, labels=[1, 2, 3, 4], average='micro'))
#>>> 0.615384615385
The macro calculation
macro first computes the F1 of each class. With the table above this is easy. Take class 1: its precision is P = 3/(3+0) = 1, its recall is R = 3/(3+2) = 0.6, so F1 = 2*(1*0.6)/(1+0.6) = 0.75.
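The per-class F1 values and their unweighted average can be computed by hand from the table (a sketch; the variable names are my own):

```python
# Per-class F1 from the TP/FP/FN table, then macro-averaged.
tp = {1: 3, 2: 2, 3: 2, 4: 1}
fp = {1: 0, 2: 0, 3: 3, 4: 1}
fn = {1: 2, 2: 2, 3: 1, 4: 1}

f1s = []
for c in [1, 2, 3, 4]:
    p = tp[c] / (tp[c] + fp[c])          # per-class precision
    r = tp[c] / (tp[c] + fn[c])          # per-class recall
    f1s.append(2 * p * r / (p + r))      # per-class F1

print([round(f, 4) for f in f1s])  # [0.75, 0.6667, 0.5, 0.5]
print(sum(f1s) / len(f1s))         # ≈ 0.604167, the macro-F1
```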
You can verify with sklearn by setting average to 'macro':
from sklearn.metrics import f1_score, precision_recall_fscore_support

y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 0, 0, 2, 2, 3, 3, 3, 4, 3, 4, 3]

# average=None returns the per-class P, R and F1 values
p_class, r_class, f_class, support = precision_recall_fscore_support(
    y_true=y_true, y_pred=y_pred, labels=[1, 2, 3, 4], average=None)
print('Per-class F1:', f_class)
print('Mean of per-class F1:', f_class.mean())
print(f1_score(y_true, y_pred, labels=[1, 2, 3, 4], average='macro'))
#>>> Per-class F1: [ 0.75        0.66666667  0.5         0.5       ]
#>>> Mean of per-class F1: 0.604166666667
#>>> 0.604166666667
If you reproduce this article, please credit the source. Thank you.