Model evaluation metric - the F1 value

Recently, in my spare time, I have been taking part in the Digital China Innovation Contest, where contestants are ranked by their model's F1 value. To understand this metric more deeply, I recently organized my notes on it and am now sharing them with anyone who needs them. During the competition I also ran into a problem: the algorithm fits the training set perfectly (KS = 1) but degrades noticeably on the test set, i.e. it overfits. If you have experience tuning against overfitting, you are welcome to get in touch through the official account; friends who are also competing, or who want to discuss the contest, can contact me as well.
  
For a classification model, once the model is built we want to evaluate it. Common metrics include the confusion matrix, the F1 value, the KS curve, the ROC curve, and the AUC; you can also define your own function, for example splitting the model scores into n (say 100) bins and computing the precision and coverage of the top bin. The confusion matrix and the KS curve were explained in earlier articles. This article explains the principle of the F1 value and a Python implementation example; the other metrics will be covered in detail in later articles, so stay tuned.
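As a small aside, that self-defined "top bin" check might be sketched as follows; this is only a rough illustration, and the column names y (true label) and score (model score) are assumptions rather than anything from the competition data:

import pandas as pd

# Rough sketch: precision and coverage of the top 1/n of model scores
# (y = true label column, score = model score column; both names are assumed)
def top_bin_stats(df, n=100):
    df = df.sort_values('score', ascending=False).reset_index(drop=True)
    top = df.head(len(df) // n)                  # the highest-scoring 1/n of samples
    precision = top['y'].mean()                  # share of true 1s inside the top bin
    coverage = top['y'].sum() / df['y'].sum()    # share of all true 1s captured by the top bin
    return precision, coverage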

  

1. The F1 value in detail

  

1 What is the F1 value

  
The F1 value, also called the F1 score (F1-Score), is a metric for classification problems: it is the harmonic mean of the precision P (Precision) and the recall R (Recall).

F1 value=2*P*R/(P+R)

  
The F1 value ranges from 0 to 1, and the closer it is to 1, the better the model's predictions; the reason is explained below.
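As a quick sanity check with made-up numbers (not from any real model), a few lines of Python show how the harmonic mean pulls F1 toward the weaker of the two components:

# Toy values for illustration only
def f1(p, r):
    return 2 * p * r / (p + r)

print(f1(1.0, 1.0))   # 1.0   -> perfect precision and recall
print(f1(0.5, 1.0))   # 0.667 -> F1 sits closer to the weaker score
print(f1(0.1, 1.0))   # 0.182 -> a very low P drags F1 down sharply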

  
  

2 A small example to understand the F1 value

  
Assume that 1 represents an account involved in gambling and fraud, and 0 represents a low-risk account that is not involved in gambling or fraud.

(Figure: 2×2 confusion matrix, with true labels as rows and predicted labels as columns.)

In the notation below, T means the prediction is correct, F means the prediction is wrong, P means predicted as 1 (positive), and N means predicted as 0 (negative).
  

  1. TP (True Positive): the number of samples correctly predicted as 1, i.e. the true value is 1 and the model predicts 1.
  2. FN (False Negative): the number of samples wrongly predicted as 0, i.e. the true value is 1 but the model predicts 0.
  3. FP (False Positive): the number of samples wrongly predicted as 1, i.e. the true value is 0 but the model predicts 1.
  4. TN (True Negative): the number of samples correctly predicted as 0, i.e. the true value is 0 and the model predicts 0.

Precision (P): among the samples the model predicts as 1, the proportion whose true value is 1. It is calculated as follows:
  

P(Precision)=TP/(TP+FP)

  
Recall (R): among the samples whose true value is 1, the proportion the model predicts as 1 (i.e. how many of the 1s the model retrieves). It is calculated as follows:
  

R(Recall)=TP/(TP+FN)
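To make the four counts and the two ratios concrete, here is a minimal sketch on ten hypothetical samples (the labels are invented purely for illustration):

# Hypothetical true labels and predictions, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # 5

P = TP / (TP + FP)   # 3/4 = 0.75
R = TP / (TP + FN)   # 3/4 = 0.75
print(TP, FN, FP, TN, P, R)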

  
Then:

F1 value = 2*P*R/(P+R)

  
Consider an extreme case: every gambling/fraud account receives a higher predicted probability than every normal account. That means I can find a cut-off point at which both P and R equal 1, so the F1 value is 1; in other words, the model can completely separate gambling/fraud accounts from non-gambling/fraud accounts. So the closer the F1 value is to 1, the better the model. To see the relationship among P, R, and F1 more clearly, we can rewrite the F1 formula in the following form:
  

F1 value=2/(1/P+1/R)

  
You can try deriving this form back to the original yourself. From it, you can see that when R is held constant, a larger P makes the denominator smaller and therefore F1 larger; the same holds for R. In other words, F1 increases whenever P or R increases.
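A tiny sketch with invented values makes this visible: hold R fixed and sweep P, and F1 moves in the same direction as P.

# Toy values: fix R and sweep P to watch F1 follow P
R = 0.8
for P in [0.2, 0.4, 0.6, 0.8]:
    F1 = 2 / (1 / P + 1 / R)
    print(f"P={P:.1f}, R={R:.1f} -> F1={F1:.3f}")
# prints F1 = 0.320, 0.533, 0.686, 0.800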

  
  

2. How to calculate the F1 value with Python

  
There are many ways to calculate the F1 value in Python; this article presents two. One is to write the calculation as a function yourself, and the other is to call sklearn.
  
  

1 Write a function to calculate the F1 value

  
First, let's look at writing the function ourselves:

# Recall = TP / (TP + FN)
# Precision = TP / (TP + FP)
import itertools

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Plot the confusion matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    
def plot_confu_matrix_cal_F1(thresholds, data):
    '''
    thresholds: cut-off values applied to the predict column, e.g. [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    data: the dataset, containing a y label column and a predict column (probabilities or labels)
    '''
    plt.figure(figsize=(10, 10))
    j = 1
    for i in thresholds:
        y_test_predictions_high_recall = data['predict'] > i

        plt.subplot(3, 3, j)
        j += 1

        # Compute the confusion matrix
        cnf_matrix = confusion_matrix(data.y, y_test_predictions_high_recall)
        np.set_printoptions(precision=2)

        # Recall = TP / (TP + FN), Precision = TP / (TP + FP)
        recall_score_1 = cnf_matrix[1, 1] / (cnf_matrix[1, 1] + cnf_matrix[1, 0])
        precision_score_1 = cnf_matrix[1, 1] / (cnf_matrix[1, 1] + cnf_matrix[0, 1])
        F1_score = 2 * recall_score_1 * precision_score_1 / (recall_score_1 + precision_score_1)
        print("threshold in the testing dataset:", i)
        print("Recall metric in the testing dataset:", recall_score_1)
        print("Precision metric in the testing dataset:", precision_score_1)
        print("F1 score:", F1_score)
        # Plot non-normalized confusion matrix
        class_names = [0, 1]
        plot_confusion_matrix(cnf_matrix, classes = class_names, title = 'Threshold >= %s' %i)

Since the F1 value is derived from the confusion matrix, the code above displays both the confusion matrix and the F1 value.

  
  

2 A concrete example of calculating the F1 value with the function

  
For ease of understanding, here is a concrete example (the input data):

plot_confu_matrix_cal_F1(list(np.arange(0.4, 0.7, 0.05)), train_date)

thresholds: the cut-off values applied to the predict column; in this example list(np.arange(0.4, 0.7, 0.05)) is used.
  
train_date: the dataset, which contains a y label column and a predict column (either probabilities or labels).
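The train_date used in the competition is not shown here; purely for illustration, a hypothetical stand-in that reproduces the confusion matrices printed below (900 accounts with y=0 and 300 with y=1, with predict already equal to the label) could be built like this:

import numpy as np
import pandas as pd

# Hypothetical stand-in for train_date: 900 low-risk accounts (y=0) and
# 300 gambling/fraud accounts (y=1); predict is already the 0/1 label
train_date = pd.DataFrame({
    'y':       np.concatenate([np.zeros(900, dtype=int), np.ones(300, dtype=int)]),
    'predict': np.concatenate([np.zeros(900, dtype=int), np.ones(300, dtype=int)]),
})

plot_confu_matrix_cal_F1(list(np.arange(0.4, 0.7, 0.05)), train_date)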
  
The result is as follows:

threshold in the testing dataset: 0.4
Recall metric in the testing dataset: 1.0
Precision metric in the testing dataset: 1.0
F1 score: 1.0
Confusion matrix, without normalization
[[900   0]
 [  0 300]]
threshold in the testing dataset: 0.45
Recall metric in the testing dataset: 1.0
Precision metric in the testing dataset: 1.0
F1 score: 1.0
Confusion matrix, without normalization
[[900   0]
 [  0 300]]
threshold in the testing dataset: 0.5
Recall metric in the testing dataset: 1.0
Precision metric in the testing dataset: 1.0
F1 score: 1.0
Confusion matrix, without normalization
[[900   0]
 [  0 300]]
threshold in the testing dataset: 0.55
Recall metric in the testing dataset: 1.0
Precision metric in the testing dataset: 1.0
F1 score: 1.0
Confusion matrix, without normalization
[[900   0]
 [  0 300]]
threshold in the testing dataset: 0.6
Recall metric in the testing dataset: 1.0
Precision metric in the testing dataset: 1.0
F1 score: 1.0
Confusion matrix, without normalization
[[900   0]
 [  0 300]]
threshold in the testing dataset: 0.6499999999999999
Recall metric in the testing dataset: 1.0
Precision metric in the testing dataset: 1.0
F1 score: 1.0
Confusion matrix, without normalization
[[900   0]
 [  0 300]]

(Figure: the six confusion-matrix plots produced, one per threshold.)
  
Since the predict column in train_date already holds labels divided into 0 and 1, cutting anywhere between 0.4 and 0.65 simply keeps the 0s in one group and the 1s in the other. From the results above, the precision P, the recall R, and the F1 value of the model are all 1; that is, the model fits the training set perfectly and can completely separate gambling/fraud accounts from non-gambling/fraud accounts.

  
  

3 Call sklearn to calculate the F1 value

  
This article only covers calculating the F1 value for binary classification. The signature of the f1_score function in sklearn is as follows:

from sklearn.metrics import f1_score
f1_score(y_true, y_pred, *, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn')

Description of common parameters:
  
y_true: true label column.
  
y_pred: Predicted label column.
  
labels: optional; defaults to None; a one-dimensional array of label values. This parameter is not needed for binary classification.
  
Using the same data, calling sklearn's f1_score looks like this:

f1_score(train_date.y, train_date.predict)

which returns:

1

The result is consistent with the hand-written function: both give 1. This concludes the explanation of the F1 value's principle and its Python implementation; interested readers can try implementing it themselves.
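As one more hedged cross-check on a less tidy, invented example, sklearn's f1_score and the hand formula give the same number:

from sklearn.metrics import f1_score

# Invented labels: TP=2, FN=2, FP=1, so P=2/3, R=1/2
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0]

print(f1_score(y_true, y_pred))             # 0.571...
print(2 * (2/3) * (1/2) / ((2/3) + (1/2)))  # same value from F1 = 2*P*R/(P+R)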
  




Origin blog.csdn.net/qq_32532663/article/details/130030545