"In-depth understanding of machine learning performance evaluation indicators: TP, TN, FP, FN, precision, recall, accuracy, F1-score and mAP"

Table of contents

Introduction

Classification criteria

Example: Cancer Detection

1. Precision

2. Recall

3. Accuracy

4. F1-score

5. mAP (mean average precision)

Summary and plain-language explanation


Introduction

One of the core goals of machine learning is to build models that perform well, and to evaluate a model's performance we rely on a set of standard metrics. In this article we take a close look at these metrics: True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), precision, recall, accuracy, F1-score, and the mean Average Precision (mAP) commonly used in object detection.

Classification criteria

Example: Cancer Detection

Suppose we are developing a cancer detection model and have 12 patient samples, of which 4 are positive cases (cancer) and 8 are negative cases (healthy).

# Model predictions (1 = predicted sick, 0 = predicted healthy)
predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
# Ground-truth labels (1 = actually sick, 0 = actually healthy)
actual =    [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
                predicted = 1   predicted = 0
true value = 1  3 (TP)          1 (FN)
true value = 0  1 (FP)          7 (TN)
  • TP = 3 (people who are actually sick and were correctly predicted as sick)
  • FP = 1 (people who are actually healthy but were incorrectly predicted as sick)
  • FN = 1 (people who are actually sick but were incorrectly predicted as healthy)
  • TN = 7 (people who are actually healthy and were correctly predicted as healthy)

With these basic concepts, we can calculate other important performance metrics.
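
These four counts can be checked directly against the two lists above; here is a minimal Python sketch reusing the predicted and actual variables from the snippet:

# Tally the four confusion-matrix cells by comparing each prediction with its label
predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
actual    = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

TP = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
FP = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
FN = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
TN = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))

print(TP, FP, FN, TN)  # 3 1 1 7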

1. Precision

Precision is the proportion of the samples the model predicts as positive that are actually positive. The formula is:

\text{Precision} = \frac{TP}{TP + FP}=\frac{3}{4}

  • Definition: precision is the proportion of samples that are actually positive among all samples the model predicts as positive.
  • Typical use cases
    • Fraud detection: measures what fraction of the transactions flagged as fraudulent really are fraudulent, keeping false positives low.
    • Medical diagnosis: evaluates how trustworthy a positive diagnosis is, reducing the misdiagnosis rate.
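
As a quick check of the formula on the cancer example (scikit-learn is assumed to be installed; it is used here only to verify the hand computation):

from sklearn.metrics import precision_score

predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
actual    = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

print(3 / (3 + 1))                         # 0.75, TP / (TP + FP) by hand
print(precision_score(actual, predicted))  # 0.75, computed from the raw lists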

2. Recall

Recall is the proportion of the samples that are actually positive that the model correctly predicts as positive. The formula is:

\text{Recall} = \frac{TP}{TP + FN} =\frac{3}{4}

  • Definition: recall is the proportion of actually positive samples that the model correctly predicts as positive.
  • Typical use cases
    • Malignant tumor screening: ensures the model does not miss cancer cases, keeping false negatives low.
    • Security checkpoints: ensures the model does not miss potential threats, maintaining public safety.
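
The same check for recall (again, sklearn.metrics serves only as an independent verification):

from sklearn.metrics import recall_score

predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
actual    = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

print(3 / (3 + 1))                      # 0.75, TP / (TP + FN) by hand
print(recall_score(actual, predicted))  # 0.75, computed from the raw lists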

3. Accuracy

Accuracy is the proportion of correctly predicted samples (positive or negative) out of the total number of samples. The formula is:

\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} = \frac{10}{12}

  • Definition: accuracy is the proportion of all correctly classified samples out of the total number of samples.
  • Typical use cases
    • Binary classification: measures the overall performance of the model when positive and negative samples are roughly balanced; under strong class imbalance it can be misleading (here, a model that always predicts "healthy" would still score 8/12).
    • Text classification: measures how often the model assigns the correct label to a document.
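
Checking accuracy on the example (10 of the 12 predictions match the labels):

from sklearn.metrics import accuracy_score

predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
actual    = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

print((3 + 7) / 12)                       # 0.8333..., (TP + TN) / total by hand
print(accuracy_score(actual, predicted))  # 0.8333..., computed from the raw lists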

4. F1-score

F1-score combines precision and recall into a single metric that balances the two. The formula is:

\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

  • Definition: the F1 score is the harmonic mean of precision and recall, used to weigh the trade-off between the two in a single number.
  • Typical use cases
    • Search engine result ranking: measures the quality of search results, balancing relevance and diversity.
    • Information retrieval: evaluates retrieval systems, ensuring the retrieved documents are both relevant and comprehensive.
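
With precision = recall = 3/4 in our example, the F1 score is also 3/4:

from sklearn.metrics import f1_score

predicted = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
actual    = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

p = r = 3 / 4
print(2 * p * r / (p + r))          # 0.75, the harmonic mean by hand
print(f1_score(actual, predicted))  # 0.75, computed from the raw lists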

5. mAP (mean average precision)

mAP is widely used in object detection and summarizes a model's performance across multiple categories. For each category, the Average Precision (AP) is the area under that category's precision-recall curve, obtained by sweeping the detection confidence threshold; mAP is the mean of the AP values over all categories. The calculation formulas are:

\text{AP} = \int_0^1 P(R)\, dR, \qquad \text{mAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AP}_i

where P(R) is precision as a function of recall and N is the number of categories.

[Figure omitted: precision-recall curve; AP is the area under the curve.]

  • Definition: mAP is a metric for measuring the performance of an object detection model. It computes the average precision for each category and takes the mean across categories as the final score (see the sketch after this list).
  • Typical use cases
    • Object detection: evaluates detection models, especially when there are multiple object categories.
    • Visual search: evaluates image retrieval systems, ensuring the retrieved images contain the relevant objects.
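
A toy sketch of the per-category AP computation (the function average_precision and the sample numbers are illustrative, not a standard API; it assumes each detection carries a confidence score and a label of 1 if it matches a ground-truth object, and that every ground-truth object appears in the list):

import numpy as np

def average_precision(scores, labels):
    # Rank detections from most to least confident
    order = np.argsort(scores)[::-1]
    hits = np.asarray(labels)[order]
    n_positive = hits.sum()  # assumed equal to the number of ground-truth objects
    cum_tp = np.cumsum(hits)
    precision = cum_tp / np.arange(1, len(hits) + 1)
    recall = cum_tp / n_positive
    # Area under the precision-recall curve: at each true positive,
    # add the precision there times the step in recall
    ap, prev_recall = 0.0, 0.0
    for p, r, hit in zip(precision, recall, hits):
        if hit:
            ap += p * (r - prev_recall)
            prev_recall = r
    return ap

# One category: 5 detections by descending confidence, 3 of them correct
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0]))  # ~0.81

mAP would then be the mean of average_precision over all categories.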

Summary and plain-language explanation

In plain terms, accuracy answers the question "how often are the model's predictions correct overall?"; precision answers "when the model predicts a positive, how often is it right?"; recall answers "how many of the real positives does the model find?"; F1-score combines precision and recall into a single, more balanced assessment; and mAP is a more comprehensive metric for complex multi-category tasks such as object detection. Together these metrics give a fuller picture of a model's performance and guide model selection and tuning.

  • TP: The model says "this person is sick", but in fact this person is really sick.
  • TN: The model says "this person is not sick", and in fact this person is not sick.
  • FP: The model misdiagnosed a healthy person as a patient.
  • FN: The model missed real patients.
  • Precision: the proportion of people the model says are "sick" who are actually sick. Reflects how trustworthy the model's positive predictions are.
  • Accuracy: Accuracy is a basic indicator for evaluating the prediction ability of a model. It reflects how many of all predictions of the model are correct. The higher the accuracy, the stronger the overall prediction ability of the model.
  • Recall: Recall focuses on the model’s ability to identify positive examples. It measures how many true positive examples the model correctly identified. A high recall rate means that the model has a strong ability to identify positive examples and will not miss too many real positive examples.
  • mAP: mAP is a more complex evaluation metric, mainly used in tasks such as object detection. It measures the model's performance across multiple categories and gives a more comprehensive evaluation. In practice, if the task is to have the model recognize multiple kinds of objects in an image, mAP is a very suitable metric.
  • F1-score: This is an evaluation metric that combines precision and recall. It tries to find a balance point so that both precision and recall reach a relatively high level. A high F1-score means that the model performs well in both precision and recall.
