PyTorch Introductory Tutorial Series 11----Model Evaluation


Foreword


1. Overview of Model Evaluation

After training is complete, the model is used to make predictions on new data, and we need to check how well it performs. This is what model evaluation is for.

Model evaluation involves using the model to make predictions on new data and measuring its performance, usually with the same metrics as during training. For example, if accuracy was tracked during training, it can also be used at evaluation time to check how well the model predicts.

2. Evaluation Methods

PyTorch itself ships only the basic building blocks: common evaluation metrics are typically computed directly with tensor operations or taken from libraries such as scikit-learn. These metrics help us understand the performance of the model.

1. Accuracy

Accuracy is a metric for evaluating model performance that indicates how often the model's predictions match the true labels. In general, the higher the accuracy, the better the model performs.

Note that torch.nn.functional does not provide an accuracy() function, so the accuracy is computed directly from tensors by comparing the model's predicted classes with the labels:

# Use the model to predict on the data
outputs = model(inputs)

# Compute accuracy by comparing the predicted classes with the labels
preds = outputs.argmax(dim=1)
accuracy = (preds == labels).float().mean()

# Print the accuracy; the scalar value is obtained via accuracy.item()
print(accuracy.item())

2. ROC (Receiver Operating Characteristic)

The ROC (Receiver Operating Characteristic) curve measures the performance of a binary classifier. It plots the true positive rate against the false positive rate at different decision thresholds. The true positive rate is the probability that a positive sample is correctly classified as positive, and the false positive rate is the probability that a negative sample is misclassified as positive.

You can use the sklearn.metrics.roc_auc_score function (note that it comes from scikit-learn; torch.nn.functional has no roc_auc_score) to compute the area under the ROC curve (AUC). This function takes two parameters:

y_true: an array-like of true labels, with values 0 or 1.
y_score: an array-like of the classifier's prediction scores, typically the predicted probability of the positive class (a continuous decision score also works).

If you want to plot the ROC curve itself, you can use the roc_curve function from scikit-learn. It takes three parameters:

y_true: an array of true labels, with values 0 or 1.
y_score: an array of the classifier's prediction scores, typically the predicted probability of the positive class.
pos_label: the label value treated as the positive class.

The roc_curve function returns three values:

fpr: an array of false positive rates, one per threshold. When plotting the ROC curve, the false positive rate goes on the x-axis and the true positive rate on the y-axis.

tpr: an array of true positive rates, one per threshold.
thresholds: an array of the decision thresholds at which each (fpr, tpr) pair was computed.

After drawing the ROC curve, we can also summarize the classifier's performance with the area under the curve (AUC). The larger the AUC, the better the classifier. AUC ranges from 0 to 1: AUC = 1 means the classifier is perfect, while AUC = 0.5 means its performance is no better than random guessing.

import torch
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Define the true labels
y_true = torch.Tensor([0, 0, 1, 1])

# Define the prediction scores
y_score = torch.Tensor([0.1, 0.4, 0.35, 0.8])

# Compute the AUC (roc_auc_score comes from scikit-learn)
auc = roc_auc_score(y_true.numpy(), y_score.numpy())

# Plot the ROC curve: FPR on the x-axis, TPR on the y-axis
fpr, tpr, thresholds = roc_curve(y_true.numpy(), y_score.numpy(), pos_label=1)
plt.plot(fpr, tpr)
plt.show()

3. Confusion Matrix (confusion_matrix)

A confusion matrix is a matrix used to evaluate the performance of a classifier. For a binary classifier it counts four outcomes and uses them as the four cells of the matrix: true positives, true negatives, false positives, and false negatives.
PyTorch does not provide a torch.nn.functional.confusion_matrix function; you can use sklearn.metrics.confusion_matrix from scikit-learn instead. This function takes two parameters:

y_true: an array-like of true labels, with values 0 or 1.
y_pred: an array-like of predicted labels, with values 0 or 1.

For binary labels, the confusion_matrix function returns a 2x2 array containing these four counts.

import torch
from sklearn.metrics import confusion_matrix

# Define the true labels
y_true = torch.Tensor([0, 0, 1, 1])

# Define the predicted labels
y_pred = torch.Tensor([0, 1, 0, 1])

# Compute the confusion matrix (from scikit-learn)
cm = confusion_matrix(y_true.numpy(), y_pred.numpy())

# Print the result
print(cm)

The output is:

# Rows are true labels and columns are predicted labels, so the cells are:
# true negatives (1), false positives (1), false negatives (1), true positives (1).
[[1 1]
 [1 1]]
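
If you want the four counts as separate numbers, the 2x2 matrix can be flattened with ravel(), a common scikit-learn idiom; a minimal sketch continuing from the example above:

# Unpack the four cells of the binary confusion matrix
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 1 1 1 1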

4. Precision

Precision is a metric for evaluating model performance that indicates, among the samples the model predicted as positive, the proportion that are actually positive. In general, the higher the precision, the better the model performs.

You can use the sklearn.metrics.precision_score() function to calculate the model's precision; a combined example for precision, recall, and F1 follows the F1 section below.

5. Recall

Recall is a metric for evaluating model performance that indicates, among the samples that are actually positive, the proportion the model predicted as positive. In general, the higher the recall, the better the model performs.

You can use the sklearn.metrics.recall_score() function to calculate the model's recall.

6. F1 Score

The F1 score is a metric for evaluating model performance that represents the harmonic mean of the model's precision and recall. In general, the higher the F1 score, the better the model performs.

You can use the sklearn.metrics.f1_score() function to calculate the model's F1 score.
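
As a minimal, self-contained sketch of the three functions above, reusing the toy labels from the confusion matrix example (the 0.5 values are simply what this tiny example produces):

import torch
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy true and predicted labels (same as the confusion matrix example)
y_true = torch.Tensor([0, 0, 1, 1])
y_pred = torch.Tensor([0, 1, 0, 1])

# Precision: of the samples predicted positive, the share that is truly positive
precision = precision_score(y_true.numpy(), y_pred.numpy())

# Recall: of the truly positive samples, the share predicted positive
recall = recall_score(y_true.numpy(), y_pred.numpy())

# F1: harmonic mean of precision and recall
f1 = f1_score(y_true.numpy(), y_pred.numpy())

print(precision, recall, f1)  # 0.5 0.5 0.5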

3. Examples

The following code pulls these pieces together to evaluate a PyTorch model:

import torch
import sklearn.metrics

# Disable gradient tracking
with torch.no_grad():
    # Put the model in evaluation mode
    model.eval()

    # Use the model to predict on the data
    outputs = model(inputs)

    # Compute the loss
    loss = criterion(outputs, labels)

    # Compute accuracy by comparing the predicted classes with the labels
    preds = outputs.argmax(dim=1)
    accuracy = (preds == labels).float().mean()

    # Compute precision, recall, and F1 with scikit-learn; these functions
    # expect class labels, not raw model outputs (binary labels assumed here)
    precision = sklearn.metrics.precision_score(labels.numpy(), preds.numpy())
    recall = sklearn.metrics.recall_score(labels.numpy(), preds.numpy())
    f1 = sklearn.metrics.f1_score(labels.numpy(), preds.numpy())

    # Print the metric values
    print("Loss:", loss.item())
    print("Accuracy:", accuracy.item())
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1:", f1)

We first disable gradient tracking and put the model in evaluation mode. We then use the model to make predictions on the data and compute the loss with the criterion (for example, an instance of torch.nn.CrossEntropyLoss). Next, we compute the model's accuracy, precision, recall, and F1 score, and print the values of these metrics.


Summary

There is a rich set of functions for evaluating the performance of PyTorch models. They help us understand how the model performs on training and test data, so we can decide whether it needs further improvement. Commonly used metrics include accuracy, the confusion matrix, and the ROC curve; functions such as accuracy_score, confusion_matrix, and roc_auc_score (all from scikit-learn) can be used to compute them. Other metrics, such as the F1 score, precision, and recall, are also available and can be chosen according to actual needs.


Source: blog.csdn.net/weixin_46417939/article/details/128276146