Machine Learning Notes - IOU, Precision, Recall, F-score

1. What is Intersection over Union (IoU)?

1. Overview of Intersection over Union (IoU)

        Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a specific dataset. Any algorithm that outputs predicted bounding boxes can be evaluated with IoU.

        As long as the test set has hand-labeled (ground-truth) bounding boxes and our model produces predicted bounding boxes, the IoU can be calculated.

  • R1: the region covered by the ground-truth bounding box;
  • R2: the region covered by the predicted bounding box;
  • RoI: the region where R1 and R2 overlap.

        As shown in the figure below.

        The IoU value reflects the accuracy of a single object prediction. In general, an IoU score > 0.5 is considered a good annotation, although the threshold varies from project to project.
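        In terms of the regions defined above, the IoU is the area of the overlap divided by the area of the union of the two boxes (the standard formulation, written out here for reference):

        \mathrm{IoU} = \frac{\mathrm{Area}(R_1 \cap R_2)}{\mathrm{Area}(R_1 \cup R_2)} = \frac{R_{oI}}{R_1 + R_2 - R_{oI}}

        where R1 and R2 here denote the areas of the two rectangles and RoI the area of their overlap.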

2. Calculate the Intersection over Union (IoU)

        Reference Code

import numpy as np

# compute the IoU (overlap ratio) of two boxes given as [x1, y1, x2, y2]
def iou(a, b):
    # area of box a
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    # area of box b
    area_b = (b[2] - b[0]) * (b[3] - b[1])

    # top-left corner (x, y) of the intersection
    iou_x1 = np.maximum(a[0], b[0])
    iou_y1 = np.maximum(a[1], b[1])
    # bottom-right corner (x, y) of the intersection
    iou_x2 = np.minimum(a[2], b[2])
    iou_y2 = np.minimum(a[3], b[3])

    # width and height of the intersection, clamped to 0 when the boxes do not overlap
    iou_w = np.maximum(iou_x2 - iou_x1, 0.0)
    iou_h = np.maximum(iou_y2 - iou_y1, 0.0)

    # intersection area
    area_iou = iou_w * iou_h
    # IoU = intersection area / union area
    iou = area_iou / (area_a + area_b - area_iou)

    return iou

# [x1, y1, x2, y2]
a = np.array((50, 50, 150, 150), dtype=np.float32)

b = np.array((60, 60, 170, 160), dtype=np.float32)

print(iou(a, b))

        Calculate the IoU of the following two rectangles:

a = np.array((50, 50, 150, 150), dtype=np.float32)
b = np.array((60, 60, 170, 160), dtype=np.float32)

        Output:

0.627907
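
        As a quick hand check of this value (using the formula above): the boxes overlap on the region from (60, 60) to (150, 150), so

        \text{intersection} = 90 \times 90 = 8100, \qquad \text{union} = 10000 + 11000 - 8100 = 12900, \qquad \mathrm{IoU} = \frac{8100}{12900} \approx 0.6279

        which matches the printed result.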

        Visualization (figure omitted).

 2. Assess the overall accuracy of the model

        As noted above, IoU evaluates the accuracy of a single object prediction. But how do we evaluate the recognition quality of the model or task as a whole? This leads to the F1 score, F2 score, and so on (the F-beta family of scores), discussed below.

1. Basic Concepts

(1) TP/TN/FN/FP

        Before evaluating a model's score on a given dataset, let's define some terminology:

        True Positive (TP): an annotation that is correctly drawn, with an IoU score > 0.5.

        True Negative (TN): no annotation is drawn where none is needed.

        False Negative (FN): no annotation is drawn where one is required.

        False Positive (FP): an annotation that is incorrectly drawn, with an IoU score < 0.5.

        These terms can also be understood from a classification perspective, for example classifying melons as good or bad:

  • TP: a melon that is truly good and is predicted by the model to be good
  • TN: a melon that is truly bad and is predicted by the model to be bad
  • FN: a melon that is truly good but is predicted by the model to be bad
  • FP: a melon that is truly bad but is predicted by the model to be good
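
        Arranged as a confusion matrix (rows are the true class, columns are the model's prediction), the four cases look like this:

                            Predicted good    Predicted bad
        Actually good            TP                FN
        Actually bad             FP                TN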

(2) Accuracy

        We can use accuracy to measure the performance of our model. Accuracy is the most intuitive measure of task performance: it is simply the ratio of correct predictions to the total number of cases, i.e. Accuracy = (TP + TN) / (TP + TN + FP + FN).

        While accuracy is intuitive and simple, it is also the least insightful. In most real-life situations there is severe class imbalance, and because accuracy does not distinguish FN from FP, it can lead to biased or wrong conclusions.

(3) Precision

        Precision is the ratio of correctly drawn annotations to the total number of annotations drawn, i.e. Precision = TP / (TP + FP).

(4) Recall

        Recall is the ratio of correctly drawn annotations to the total number of annotations that should have been drawn, i.e. Recall = TP / (TP + FN).
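
        As a minimal sketch of these three definitions in Python (the counts below are made-up placeholders, purely for illustration):

# compute accuracy, precision and recall from raw counts
# NOTE: tp, tn, fp, fn are illustrative placeholder counts, not results from a real model
tp, tn, fp, fn = 80, 50, 20, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct cases / all cases
precision = tp / (tp + fp)                   # correct annotations / all drawn annotations
recall = tp / (tp + fn)                      # correct annotations / all annotations that should be drawn

print('accuracy=%.3f, precision=%.3f, recall=%.3f' % (accuracy, precision, recall))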

 2. What is F-score?

(1) F-score overview

        The F-score (also called the F-measure) is commonly used in the field of information retrieval to measure search, document classification, and query classification performance, and it is also used in machine learning. The F-score appears widely in the natural language processing literature, for example in the evaluation of named entity recognition and word segmentation.

        The F-score uses precision and recall to measure the accuracy of a test.

        The general formula for the F-score is:

        F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}

        For the F1 score, β = 1; for the F2 score, β = 2; and so on.

        sklearn.metrics provides the fbeta_score function for computing this metric.

(2) F1 score

        The F1 score is the harmonic mean of precision and recall, F1 = 2 * (Precision * Recall) / (Precision + Recall), and it is a better measure of incorrectly drawn annotations than accuracy.

# calculate the f1-measure
from sklearn.metrics import fbeta_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
# 50% precision, perfect recall
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
f = fbeta_score(y_true, y_pred, beta=1.0)
print('Result: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))
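
        With these labels precision is 0.5 and recall is 1.0, so running the snippet should print approximately: Result: p=0.500, r=1.000, f=0.667 (since F1 = 2 * 0.5 * 1.0 / (0.5 + 1.0) ≈ 0.667).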

(3) F2 score

        The intuition behind the F2 score is that it weights recall more than precision.

# calculate the f2-measure
from sklearn.metrics import fbeta_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
# 50% precision, perfect recall
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
f = fbeta_score(y_true, y_pred, beta=2.0)
print('Result: p=%.3f, r=%.3f, f=%.3f' % (p, r, f))
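
        With the same labels (precision 0.5, recall 1.0), the F2 score weights recall more heavily, so the snippet should print approximately: Result: p=0.500, r=1.000, f=0.833 (since F2 = 5 * 0.5 * 1.0 / (4 * 0.5 + 1.0) ≈ 0.833).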
