PyTorch Advanced Learning (7): Plotting and Code for the Confusion Matrix, Recall, Precision, ROC Curve, and Other Metrics During Neural Network Model Validation

References

[Machine Learning] Understand how to evaluate a binary classification model in five minutes! Super simple explanations of the confusion matrix, recall, precision, and accuracy; a must-see for beginners! _哔哩哔哩_bilibili

The concept of the confusion matrix - GIS_JH's Blog - CSDN Blog

Confusion matrix in machine learning: accuracy, precision, recall, F1, ROC/AUC, AP/mAP - Yinfeng's blog - CSDN Blog

8. Source code sharing: confusion matrix, recall, precision, ROC curve and other metrics exported with one click [PyTorch for elementary school students] _哔哩哔哩_bilibili

Notes from the previous section: PyTorch Advanced Learning (6): How to optimize and validate a trained model and visualize the accuracy and loss during training, a beginner-friendly, extra-detailed record - CSDN Blog


Table of contents

I. Binary classification model evaluation metrics (theoretical introduction)

1. Confusion matrix

1.1 Introduction

1.2 TP, FP, FN, TN

2. Secondary metrics

2.1 Accuracy

2.2 Precision

2.3 Recall

3. Tertiary metric: F1

II. Visualization of the confusion matrix, recall, precision, ROC curve, and other metrics

1. Dataset generation and model training

2. Model validation

2.1 Specific steps

2.2 Notes on the eval function

2.3 Code

2.4 Running results

3. Plotting the confusion matrix, ROC curve, and other metrics

3.1 Code

3.2 Output results


I. Binary classification model evaluation metrics (theoretical introduction)

1. Confusion Matrix

1.1 Introduction

In machine learning, the confusion matrix (also called the possibility matrix or error matrix) is a visualization tool used mainly in supervised learning; in unsupervised learning it is usually called a matching matrix. In image accuracy evaluation, it is mainly used to compare classification results against ground-truth measurements, and the accuracy of the classification results can be displayed in a confusion matrix.

  • In the cat-prediction example in the figure below, the upper-left cell of the table (actual positive, predicted positive) and the lower-right cell (actual negative, predicted negative) are the correctly predicted values.

  • From the data in the table on the left, the numbers in the table on the right can be derived; that is, the confusion matrix is obtained.

(Figure: the prediction table and the resulting confusion matrix; image not reproduced here.)

1.2 TP, FP, FN, TN

  • True Positive (TP): the sample's true class is positive, and the model also predicts positive.
  • False Negative (FN): the sample's true class is positive, but the model predicts negative.
  • False Positive (FP): the sample's true class is negative, but the model predicts positive.
  • True Negative (TN): the sample's true class is negative, and the model predicts negative.

In the image below (see the sketch that follows):

  • Actually a cat, predicted as a cat (positive): TP
  • Actually not a cat, predicted as not a cat (negative): TN
  • Actually not a cat, predicted as a cat (positive): FP
  • Actually a cat, predicted as not a cat (negative): FN
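
A minimal sketch of how these four cells can be computed with scikit-learn, using toy labels chosen to match the cat example above (the original figure is not reproduced here, and the TN count of 4 is an assumption for illustration):

from sklearn.metrics import confusion_matrix

# 1 = cat (positive), 0 = not a cat (negative); toy labels for illustration
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 5 actual cats, 5 actual non-cats
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]   # 3 cats found, 2 missed, 1 false alarm
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)   # 3 1 2 4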

2. Secondary metrics

2.1 Accuracy 

The proportion of all samples that the model predicts correctly: Accuracy = (TP + TN) / (TP + TN + FP + FN).

2.2 Precision

Of the samples predicted to be positive (predicted to be cats: 4), the proportion that are actually positive (actually cats: 3): Precision = TP / (TP + FP) = 3/4.

2.3 Recall

Of the samples that are actually positive (actual cats: 5), the proportion predicted to be positive (predicted cats: 3): Recall = TP / (TP + FN) = 3/5.

3. Tertiary metric: F1

Since precision and recall need to be weighed together, the F1 score is introduced: the harmonic mean of precision and recall, F1 = 2 × Precision × Recall / (Precision + Recall).
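
Putting the cat example's counts together (TP = 3, FP = 1, FN = 2 from above; TN = 4 is assumed for illustration), the four metrics work out as follows:

TP, FP, FN, TN = 3, 1, 2, 4   # TN assumed for illustration

accuracy  = (TP + TN) / (TP + TN + FP + FN)         # 7/10 = 0.70
precision = TP / (TP + FP)                          # 3/4  = 0.75
recall    = TP / (TP + FN)                          # 3/5  = 0.60
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.667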

II. Visualization of the confusion matrix, recall, precision, ROC curve, and other metrics

1. Dataset generation and model training

The code used here to generate the dataset and train the model is the same as in the previous section; see that post for the full code:

PyTorch Advanced Learning (6): How to optimize and validate a trained model and visualize the accuracy and loss during training, a beginner-friendly, extra-detailed record

  1. CreateDataset.py generates the dataset files train.txt and test.txt.
  2. PreTrainedModel.py trains the model. ResNet-34 is used as the backbone; the pretrained weight file is downloaded, the fully connected layer is replaced, and the usual transfer-learning steps (parameter loading, freezing, training) are carried out, as sketched after this list. The network is then trained with epoch = 50, and the weights from the epoch with the highest accuracy are saved, in my case BEST_resnet_epoch_50_acc_87.1.pth, for later use.
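
A minimal sketch of the backbone setup described in step 2 (the full training code lives in the previous post; the freezing strategy shown here is an assumption about that setup):

from torch import nn
from torchvision.models import resnet34

model = resnet34(pretrained=True)              # download pretrained ImageNet weights
for param in model.parameters():
    param.requires_grad = False                # freeze the backbone parameters
model.fc = nn.Linear(model.fc.in_features, 5)  # new trainable head for the 5 flower classes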

 

2. Model Validation

2.1 Specific steps

  1. Load the pretrained model: the ResNet-34 network, with the fc layer's output changed to 5 classes.
  2. Load the model parameters: load the ./BEST_resnet_epoch_50_acc_87.1.pth file trained in the previous step into the model.
  3. Load the images to be predicted: we use the test.txt file for testing, which lists 866 images in total.
  4. Get the results: the eval function computes, for each image, the predicted label, its probability, and the full softmax probability distribution.

 

2.2 Notes on the eval function

        This code defines a function called eval that takes two parameters: dataloader and model. Inside the function, three empty lists are first defined: label_list, likelihood_list, and pred_list.

  • model.eval() puts the model into evaluation mode for inference.
  • The torch.no_grad() context manager disables gradient computation, so no gradients are tracked at inference time, which speeds up inference.
  • In the for loop, the dataloader yields the image data X and the true label y. The data is moved to the GPU, and the image is passed through the model to obtain the prediction pred.
  • pred_softmax is a NumPy array holding the model's predicted probability distribution over the classes; its i-th element is the probability that the model assigns to the i-th class. It is obtained by passing the pred tensor to torch.softmax and converting the result with cpu().numpy().
  • numpy.argmax then gives the label with the highest probability, which is appended to label_list, and numpy.max gives that highest probability itself, which is appended to likelihood_list.
  • pred_softmax is appended to pred_list. Finally, the function returns the three lists label_list, likelihood_list, and pred_list; their shapes are illustrated below.
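
For a single image and 5 classes, the three returned lists look like this (the numbers here are hypothetical, purely for illustration):

label_list      = [2]                                # argmax class index
likelihood_list = [0.91]                             # highest softmax probability
pred_list       = [[0.01, 0.03, 0.91, 0.02, 0.03]]   # full distribution, one row per image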

 

2.3 Code

The code displays a progress bar while predicting, so tqdm must be installed via pip first:

pip install tqdm

'''
    1. Single-image validation
    2. Multi-image validation
'''
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.models import resnet34
from utils import LoadData, write_result
import pandas as pd
from tqdm import tqdm

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'


def eval(dataloader, model):
    label_list = []
    likelihood_list = []
    pred_list = []
    model.eval()
    with torch.no_grad():
        # Iterate over the dataloader to get X (image data) and y (true labels)
        for X, y in tqdm(dataloader, desc="Model is predicting, please wait"):
            # Move the data to the GPU
            X = X.cuda()
            # Pass the image through the model to get the prediction pred
            pred = model(X)

            pred_softmax = torch.softmax(pred, 1).cpu().numpy()
            # Get the most likely label
            label = pred_softmax.argmax()
            label_list.append(label)
            # Get the highest probability value
            likelihood = pred_softmax.max()
            likelihood_list.append(likelihood)
            pred_list.append(pred_softmax.tolist()[0])

        return label_list, likelihood_list, pred_list


if __name__ == "__main__":

    '''
        Load the pretrained model
    '''
    # 1. Build the model structure
    model = resnet34(pretrained=False)
    num_ftrs = model.fc.in_features    # input size of the fully connected layer
    model.fc = nn.Linear(num_ftrs, 5)  # replace the fully connected layer with a 5-class output
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 2. Load the model parameters

    model_loc = "./BEST_resnet_epoch_50_acc_87.1.pth"

    model_dict = torch.load(model_loc)
    model.load_state_dict(model_dict)
    model = model.to(device)

    '''
       Load the images to be predicted
    '''
    valid_data = LoadData("test.txt", train_flag=False)
    test_dataloader = DataLoader(dataset=valid_data, num_workers=4, pin_memory=True, batch_size=1)


    '''
      Get the results
    '''
    # Get the model outputs
    label_list, likelihood_list, pred = eval(test_dataloader, model)

    # Save the outputs to a CSV file for later analysis
    label_names = ["daisy", "dandelion", "rose", "sunflower", "tulip"]     # write the class labels here
    df_pred = pd.DataFrame(data=pred, columns=label_names)

    df_pred.to_csv('pred_result.csv', encoding='gbk', index=False)
    print("Done!")

2.4 Running results

The model predicts the 866 images in the test set and generates the pred_result.csv file, which stores each image's predicted probability for every class.

 

pred_result.csv display (screenshot omitted):
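
A few illustrative rows of pred_result.csv (the header matches label_names above, but these probability values are made up; the real file contains 866 rows):

daisy,dandelion,rose,sunflower,tulip
0.93,0.02,0.01,0.03,0.01
0.04,0.88,0.02,0.04,0.02
0.01,0.02,0.05,0.01,0.91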

 

3. Plotting the confusion matrix, ROC curve, and other metrics

3.1 Code

The scikit-learn package is needed; install it via pip:

pip install scikit-learn

'''
    Model performance metrics
'''
from sklearn.metrics import *  # pip install scikit-learn
import matplotlib.pyplot as plt # pip install matplotlib
import numpy as np  # pip install numpy
from numpy import interp
from sklearn.preprocessing import label_binarize
import pandas as pd # pip install pandas

'''
Read the data

We need the labels output by the model (predict_label) and the original labels (true_label)

'''
target_loc = "test.txt"     # file containing the true labels
target_data = pd.read_csv(target_loc, sep="\t", names=["loc","type"])
true_label = [i for i in target_data["type"]]

# print(true_label)


predict_loc = "pred_result.csv"     # file generated by 3.ModelEvaluate.py

predict_data = pd.read_csv(predict_loc)#,index_col=0)

predict_label = predict_data.to_numpy().argmax(axis=1)

predict_score = predict_data.to_numpy().max(axis=1)

'''
    Common metrics: accuracy, precision, recall, F1-Score
'''
# Accuracy: the proportion of all samples that are predicted correctly
accuracy = accuracy_score(true_label, predict_label)
print("Accuracy: ", accuracy)

# Precision P: precision = TP / (TP + FP)

precision = precision_score(true_label, predict_label, labels=None, pos_label=1, average='macro') # 'micro', 'macro', 'weighted'
print("Precision P: ", precision)

# Recall R: the proportion of actual positives that are predicted correctly; recall = TP / (TP + FN)
recall = recall_score(true_label, predict_label, average='macro') # 'micro', 'macro', 'weighted'
print("Recall: ", recall)

# F1-Score
f1 = f1_score(true_label, predict_label, average='macro')     # 'micro', 'macro', 'weighted'
print("F1 Score: ", f1)


'''
Confusion matrix
'''
label_names = ["daisy", "dandelion", "rose", "sunflower", "tulip"]
confusion = confusion_matrix(true_label, predict_label, labels=[i for i in range(len(label_names))])


plt.matshow(confusion, cmap=plt.cm.Oranges)   # Greens, Blues, Oranges, Reds
plt.colorbar()
for i in range(len(confusion)):
    for j in range(len(confusion)):
        plt.annotate(confusion[j,i], xy=(i, j), horizontalalignment='center', verticalalignment='center')
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.xticks(range(len(label_names)), label_names)
plt.yticks(range(len(label_names)), label_names)
plt.title("Confusion Matrix")
plt.show()


'''
ROC curve (multi-class)
For a multi-class ROC curve, the target class is treated as the positive class and all other
classes as negatives, which inflates the number of negatives. Even if the model's accuracy
is low, the large number of TNs in the ROC curve makes the AUC larger than one might expect.
'''
n_classes = len(label_names)
# binarize_predict = label_binarize(predict_label, classes=[i for i in range(n_classes)])
binarize_predict = label_binarize(true_label, classes=[i for i in range(n_classes)])

# Read the prediction scores (full probability matrix)

predict_score = predict_data.to_numpy()



# Compute the ROC curve for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(binarize_predict[:,i], [score_i[i] for score_i in predict_score])
    roc_auc[i] = auc(fpr[i], tpr[i])

# print("roc_auc = ", roc_auc)

all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += interp(all_fpr, fpr[i], tpr[i])

# Finally average it and compute AUC
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
# Plot all ROC curves
lw = 2
plt.figure()
plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)


for i in range(n_classes):
    plt.plot(fpr[i], tpr[i], lw=lw, label='ROC curve of {0} (area = {1:0.2f})'.format(label_names[i], roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Multi-class receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
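
As an optional addition (not in the original script), scikit-learn's classification_report prints the per-class precision, recall, and F1 in one call, reusing the same true_label, predict_label, and label_names from above:

from sklearn.metrics import classification_report

# Per-class precision / recall / F1, plus macro and weighted averages
print(classification_report(true_label, predict_label, target_names=label_names, digits=3))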

 

3.2 Output results

Confusion matrix plot (image omitted)

ROC curve plot (image omitted)

Original post: blog.csdn.net/weixin_45662399/article/details/130131106